My end goal is to create islands of continuous enrollment days for each CLIENTID
for a single sub-population: ‘Adult Expansion’ for calendar years 2019 and 2020. A CLIENTID
can be associated with multiple sub-populations in a calendar year, but can never be associated with more than one sub-population at once (there is no overlap in enrollment). My data go back to 2016, but I am only interested in 2019 and 2020. The data are structured that each row is a single enrollment period, with start and end dates of enrollment, associated with a sub-population.
I’ve included below some dummy data and a desired output to better illustrate my goal:
CREATE TABLE #t ( CLIENTID NVARCHAR(9), DEMONSTRATION_POPULATION NVARCHAR(30), ELIGBEGIN DATE, ELIGEND DATE, AGE INT ) INSERT INTO #t VALUES ('123456789', 'Adult Expansion', '2019-12-16', '2019-12-31', 52) , ('123456789', 'Adult Expansion', '2020-01-01', '2020-01-15', 52) , ('123456789', 'Adult Expansion', '2020-03-01', '2020-03-31', 52) , ('123456789', 'Adult Expansion', '2020-04-01', '2020-04-30', 52) , ('123456789', 'Adult Expansion', '2020-05-01', '2020-05-31', 52) , ('123456789', 'Adult Expansion', '2020-06-01', '2020-06-30', 52) , ('123456789', 'Adult Expansion', '2020-07-01', '2020-07-31', 52) , ('123456789', 'Adult Expansion', '2020-08-01', '2020-08-31', 52) , ('123456789', 'Adult Expansion', '2020-09-01', '2020-09-30', 52) , ('123456789', 'Adult Expansion', '2020-10-01', '2020-10-31', 52) , ('123456789', 'Adult Expansion', '2020-11-01', '2020-11-30', 52) , ('123456789', 'Adult Expansion', '2020-12-01', '2020-12-31', 52) ------------------------NEW CLIENTID----------------------------- ,('012345678', 'Demonstration Population #3', '2019-10-01', '2019-10-31', 52) ,('012345678', 'Demonstration Population #3', '2019-11-01', '2019-11-30', 52) ,('012345678', 'Demonstration Population #3', '2019-12-01', '2019-12-31', 52) ,('012345678', 'Demonstration Population #3', '2020-01-01', '2020-01-31', 52) ,('012345678', 'Adult Expansion', '2020-02-01', '2020-02-28', 52) ,('012345678', 'Demonstration Population #3', '2020-02-29', '2020-02-29', 52) ,('012345678', 'Adult Expansion', '2020-03-01', '2020-03-31', 52) ,('012345678', 'Adult Expansion', '2020-04-01', '2020-04-30', 52) ,('012345678', 'Adult Expansion', '2020-05-01', '2020-05-31', 52) ,('012345678', 'Adult Expansion', '2020-06-01', '2020-06-30', 52) ,('012345678', 'Adult Expansion', '2020-07-01', '2020-07-31', 52) ,('012345678', 'Adult Expansion', '2020-08-01', '2020-08-31', 52) ,('012345678', 'Adult Expansion', '2020-09-01', '2020-09-30', 52) ,('012345678', 'Adult Expansion', '2020-10-01', '2020-10-31', 52) ,('012345678', 'Adult Expansion', '2020-11-01', '2020-11-30', 52) ,('012345678', 'Adult Expansion', '2020-12-01', '2020-12-31', 52) ---------------------------NEW CLIENTID--------------------------- ,('020234587', 'Adult Expansion', '2019-06-01', '2019-06-30', 36) ,('020234587', 'Adult Expansion', '2019-08-01', '2019-08-31', 36) ,('020234587', 'Adult Expansion', '2019-09-01', '2019-09-30', 36) ,('020234587', 'Adult Expansion', '2019-10-01', '2019-10-31', 36) ,('020234587', 'Adult Expansion', '2019-11-01', '2019-11-30', 36) ,('020234587', 'Non-1115-Waiver', '2019-12-01', '2019-12-31', 36) ,('020234587', 'Non-1115-Waiver', '2020-01-01', '2020-01-31', 36) ,('020234587', 'Non-1115-Waiver', '2020-02-01', '2020-02-29', 36)
DESIRED OUTPUT:
CLIENTID | AGE | ELIGBEGIN | ELIGEND | Sequence_ID |
---|---|---|---|---|
123456789 | 52 | 2019-12-19 | 2019-12-31 | 1 |
123456789 | 52 | 2020-01-01 | 2020-01-15 | 2 |
123456789 | 52 | 2020-03-01 | 2020-12-31 | 3 |
012345678 | 52 | 2020-02-01 | 2020-28-20 | 1 |
012345678 | 52 | 2020-03-01 | 2020-03-31 | 2 |
020234587 | 36 | 2019-06-01 | 2019-06-30 | 1 |
020234587 | 36 | 2019-08-01 | 2019-11-30 | 2 |
Then here is the current code I am using to attempt and solve this problem.
SELECT * INTO #y1 FROM #t WHERE YEAR(ELIGBEGIN) = '2019' SELECT s1.CLIENTID , s1.AGE , CAST(s1.ELIGBEGIN AS DATETIME) AS ELIGBEGIN , MIN(CAST(t1.ELIGEND AS DATETIME)) AS ELIGEND , ROW_NUMBER() OVER(PARTITION BY s1.CLIENTID ORDER BY CAST(s1.ELIGBEGIN AS DATETIME)) AS Sequence_ID INTO #CY19 FROM --[dbo].[Eligs] --#t s1 #y1 s1 INNER JOIN #y1--#t t1 ON t1.CLIENTID = s1.CLIENTID AND CAST(s1.ELIGBEGIN AS DATETIME) <= CAST(t1.ELIGEND AS DATETIME) AND t1.DEMONSTRATION_POPULATION = 'Adult Expansion' AND NOT EXISTS ( SELECT * FROM #y1 t2--#t t2 WHERE t2.CLIENTID = t1.CLIENTID AND (CAST(t1.ELIGEND AS DATETIME) + 1) >= CAST(t2.ELIGBEGIN AS DATETIME) AND CAST(t1.ELIGEND AS DATETIME) < CAST(t2.ELIGEND AS DATETIME) ) WHERE --s1.DEMONSTRATION_POPULATION = 'Adult Expansion' AND NOT EXISTS ( SELECT * FROM #y1 s2--#t s2 WHERE s2.CLIENTID = s1.CLIENTID AND CAST(s1.ELIGBEGIN AS DATETIME) > CAST(s2.ELIGBEGIN AS DATETIME) AND (CAST(s1.ELIGBEGIN AS DATETIME) - 1) <= CAST(s2.ELIGEND AS DATETIME) ) --AND s1.DEMONSTRATION_POPLUATION = 'Adult Expansion' --AND s1.UMIC = '1' GROUP BY s1.CLIENTID, CAST(s1.ELIGBEGIN AS DATETIME), s1.AGE, s1.ELIGBEGIN, s1.DEMONSTRATION_POPULATION ORDER BY s1.CLIENTID, CAST(s1.ELIGBEGIN AS DATETIME) --do the same for the 2020 calendar year: SELECT * INTO #y2 FROM #t WHERE YEAR(ELIGBEGIN) = '2020' SELECT s1.CLIENTID , s1.AGE , CAST(s1.ELIGBEGIN AS DATETIME) AS ELIGBEGIN , MIN(CAST(t1.ELIGEND AS DATETIME)) AS ELIGEND , ROW_NUMBER() OVER(PARTITION BY s1.CLIENTID ORDER BY CAST(s1.ELIGBEGIN AS DATETIME)) AS Sequence_ID INTO #CY20 FROM --[dbo].[Eligs] --#t s1 #y2 s1 INNER JOIN #y2--#t t1 ON t1.CLIENTID = s1.CLIENTID AND CAST(s1.ELIGBEGIN AS DATETIME) <= CAST(t1.ELIGEND AS DATETIME) AND s1.DEMONSTRATION_POPULATION = 'Adult Expansion' AND NOT EXISTS ( SELECT * FROM #y2 t2--#t t2 WHERE t2.CLIENTID = t1.CLIENTID AND (CAST(t1.ELIGEND AS DATETIME) + 1) >= CAST(t2.ELIGBEGIN AS DATETIME) AND CAST(t1.ELIGEND AS DATETIME) < CAST(t2.ELIGEND AS DATETIME) ) WHERE --s1.DEMONSTRATION_POPLUATION = 'Adult Expansion' --AND NOT EXISTS ( SELECT * FROM #y2 s2--#t s2 WHERE s2.CLIENTID = s1.CLIENTID AND CAST(s1.ELIGBEGIN AS DATETIME) > CAST(s2.ELIGBEGIN AS DATETIME) AND (CAST(s1.ELIGBEGIN AS DATETIME) - 1) <= CAST(s2.ELIGEND AS DATETIME) ) --AND s1.DEMONSTRATION_POPLUATION = 'Adult Expansion' --AND s1.UMIC = '1' GROUP BY s1.CLIENTID, CAST(s1.ELIGBEGIN AS DATETIME), s1.AGE --ORDER BY s1.CLIENTID, CAST(s1.ELIGBEGIN AS DATETIME) ; SELECT CLIENTID , AGE , ELIGBEGIN , ELIGEND , Sequence_ID = ROW_NUMBER() OVER (PARTITION BY CLIENTID ORDER BY ELIGBEGIN) INTO #testcase FROM ( SELECT * FROM #CY19 UNION SELECT * FROM #CY20) a SELECT * FROM #testcase ORDER BY CLIENTID
However, the actual output of the final SELECT * FROM #testcase
is the following:
ACTUAL RESULT:
CLIENTID | AGE | ELIGBEGIN | ELIGEND | Sequence_ID |
---|---|---|---|---|
123456789 | 52 | 2019-12-19 | 2019-12-31 | 1 |
123456789 | 52 | 2020-01-01 | 2020-01-15 | 2 |
123456789 | 52 | 2020-03-01 | 2020-12-31 | 3 |
020234587 | 36 | 2019-06-01 | 2019-06-30 | 1 |
020234587 | 36 | 2019-08-01 | 2019-11-30 | 2 |
As you can see there are three major issues I’m facing with the actual output not matching the desired output.
- I have to run the same query twice for two different years because I couldn’t figure out how to otherwise segment enrollment for a member with continuous enrollment from 2019-12-31 to 2020-01-31 and beyond. If I run the code table
#t
instead of table#y1
, the first CLIENTID’s output would be 12/16/19 through 1/15/20, which I do not want. - I entirely lose one CLIENTID because they were in the ‘Adult Expansion’ sub-population, then in the ‘Demonstration Population #3’, and back into the ‘Adult Expansion’
- For some reason, the code will properly ignore preceding rows where a client’s
DEMONSTRATION_POPULATION != 'Adult Expansion'
but if a CLIENTID changes from ‘Adult Expansion’ to a different sub-population, then the actual output still includes the enrollment dates from this different sub-population which is directly against my desired output. I only want the enrollment segments when an individual is enrolled in ‘Adult Expansion’.
If anyone can assist in solving for the three above problems, I would be super grateful!
Thanks, in advance, for any tips!
Advertisement
Answer
Here’s a solution using a gaps-and-islands approach:
;WITH prevNextCTE AS ( SELECT CLIENTID, DEMONSTRATION_POPULATION, ELIGBEGIN, ELIGEND, AGE, -- compare the year of the previous record to the current to create split at year end CASE WHEN YEAR(LAG(ELIGEND,1) OVER (PARTITION BY CLIENTID ORDER BY ELIGBEGIN)) = YEAR(ELIGBEGIN) THEN LAG(ELIGEND,1) OVER (PARTITION BY CLIENTID ORDER BY ELIGBEGIN) END AS prevELIGEND, -- compare the year of the next record to the current to create split at year end CASE WHEN YEAR(LEAD(ELIGBEGIN,1) OVER (PARTITION BY CLIENTID ORDER BY ELIGBEGIN)) = YEAR(ELIGEND) THEN LEAD(ELIGBEGIN,1) OVER (PARTITION BY CLIENTID ORDER BY ELIGBEGIN) END AS nextELIGBEGIN FROM #t WHERE DEMONSTRATION_POPULATION = 'Adult Expansion' ) ,islandStartCTE AS ( SELECT *, ROW_NUMBER() OVER (PARTITION BY CLIENTID ORDER BY ELIGBEGIN) AS sequence_no FROM prevNextCTE WHERE prevELIGEND IS NULL OR DATEADD(DAY,1,prevELIGEND) < ELIGBEGIN ) ,islandEndCTE AS ( SELECT *, ROW_NUMBER() OVER (PARTITION BY CLIENTID ORDER BY ELIGBEGIN) AS sequence_no FROM prevNextCTE WHERE nextELIGBEGIN IS NULL OR DATEADD(DAY,1,ELIGEND) < nextELIGBEGIN ) SELECT iss.CLIENTID, iss.AGE, iss.ELIGBEGIN, ise.ELIGEND, iss.sequence_no FROM islandStartCTE AS iss -- left join here is not necessary but makes it easier to spot errors -- when a start position has no matching end (which should not occur) LEFT JOIN islandEndCTE AS ise ON ise.CLIENTID = iss.CLIENTID AND ise.sequence_no = iss.sequence_no