I have a table with data like this
+-------------+--------------+------------+----------------+ | CustomerSID | StartDateSID | EndDateSID | MarketingOptIn | +-------------+--------------+------------+----------------+ | 12345 | 20101019 | 20131016 | Y | | 12345 | 20131017 | 20140413 | Y | | 12345 | 20140414 | 20140817 | N | | 12345 | 20140818 | 20141228 | N | | 12345 | 20141229 | 20150125 | Y | | 12345 | 20150126 | 0 | Y | +-------------+--------------+------------+----------------+
I need to create a view on top of this table to have data formatted in the below format for the Flag, basically the duration for which the Flag was Y or N. (EndDateSID – 0 is currently active, so today’s date)
+-------------+--------------+------------+----------------+ | CustomerSID | StartDateSID | EndDateSID | MarketingOptIn | +-------------+--------------+------------+----------------+ | 12345 | 20101019 | 20140413 | Y | | 12345 | 20140414 | 20141228 | N | | 12345 | 20141229 | 20150825 | Y | +-------------+--------------+------------+----------------+
Most customers only have a change in their Flag once, hence below query works:
SELECT CH1.CustomerSID ,MIN(CH1.StartDateSID) StartDate ,MAX(ISNULL(NULLIF(CH1.EndDateSID,0),CONVERT(INT, CONVERT(VARCHAR, GETDATE(), 112)))) EndDate ,CH1.MarketingOptIn FROM DWH.DimCustomerHistory CH1 GROUP BY CH1.CustomerSID, CH1.MarketingOptIn ORDER BY CH1.CustomerSID, CH1.MarketingOptIn
How can I achieve the intended output for customers like the one above, having changes in the flag more than once?
Advertisement
Answer
This is a gaps and islands problem. You need to use ROW_NUMBER()
to identify your gaps, so the start stage would be:
SELECT CustomerSID, StartDateSID, EndDateSID, MarketingOptIn, TotalRowNum = ROW_NUMBER() OVER(PARTITION BY CustomerSID ORDER BY StartDateSID), RowNumInGroup = ROW_NUMBER() OVER(PARTITION BY CustomerSID, MarketingOptIn ORDER BY StartDateSID), GroupID = ROW_NUMBER() OVER(PARTITION BY CustomerSID ORDER BY StartDateSID) - ROW_NUMBER() OVER(PARTITION BY CustomerSID, MarketingOptIn ORDER BY StartDateSID) FROM dbo.YourTable;
Output:
CustomerSID StartDateSID EndDateSID MarketingOptIn TotalRowNum RowNumInGroup GroupID --------------------------------------------------------------------------------------------------- 12345 20101019 20131016 Y 1 1 0 12345 20131017 20140413 Y 2 2 0 12345 20140414 20140817 N 3 1 2 12345 20140818 20141228 N 4 2 2 12345 20141229 20150125 Y 5 3 2 12345 20150126 0 Y 6 4 2
The key here is that by taking the row number of each row, and also the row number of each row withing the group, you can get a unique identifier (GroupID + MarketingOptIn) which identifies each of your islands. Then it is just a case of grouping by this identifier when doing your aggregates:
FULL WORKING EXAMPLE
DECLARE @T TABLE ( CustomerSID INT, StartDateSID INT, EndDateSID INT, MarketingOptIn CHAR(1) ) INSERT @T VALUES (12345, 20101019, 20131016, 'Y'), (12345, 20131017, 20140413, 'Y'), (12345, 20140414, 20140817, 'N'), (12345, 20140818, 20141228, 'N'), (12345, 20141229, 20150125, 'Y'), (12345, 20150126, 0, 'Y'); WITH CTE AS ( SELECT CustomerSID, StartDateSID, EndDateSID, MarketingOptIn, GroupID = ROW_NUMBER() OVER(PARTITION BY CustomerSID ORDER BY StartDateSID) - ROW_NUMBER() OVER(PARTITION BY CustomerSID, MarketingOptIn ORDER BY StartDateSID) FROM @T ) SELECT CustomerSID, StartDateSID = MIN(StartDateSID), EndDateSID = CASE WHEN MIN(EndDateSID) = 0 THEN CONVERT(INT, CONVERT(VARCHAR(8), GETDATE(), 112)) ELSE MAX(EndDateSID) END, MarketingOptIn FROM CTE GROUP BY CustomerSID, MarketingOptIn, GroupID ORDER BY CustomerSID, StartDateSID;