First time post, but I’ve stumbled across Stackoverflow for years.
A patient might get a particular lab test many many times in a span of a few months. However, to reduce duplication, I only want to capture these results once every 45 days. So if a patient had a lab test today, I can ignore the next 45 days’ worth of lab tests. But if a patient gets a repeat lab test, say, on day 51, I want to reset the “clock” and ignore the next 45 days’ worth of lab tests.
Here’s generally what I’m trying to do:
PatientID Date DayNum KeepThis? DaysSinceLastKeptTest 1 3/2/2020 0 Yes 0 1 3/5/2020 3 3 1 4/6/2020 35 35 1 4/10/2020 39 39 1 5/15/2020 74 Yes 0 (more than 45 days since 3/2 test) 1 5/25/2020 84 10 1 5/30/2020 89 15 1 6/6/2020 96 22 1 7/1/2020 121 Yes 0 (more than 45 days since 5/15 test) 1 7/4/2020 124 3 1 7/15/2020 135 14
(in my final product, I don’t necessarily need the DayNum, KeepThis, or DaysSinceLastKeptTest columns. I just need the 3 “Yes” rows.)
I’m trying to do this in SQL, but can’t figure it out. I’ve thought about temporary tables or subqueries, and I’m just learning about OVER/PARTITION BY as a possible way to tackle this.
I appreciate any help or ideas you may have. Thanks!
Advertisement
Answer
With a recursive CTE you can do it:
with cte_rn as ( select *,ROW_NUMBER() over (partition by patientId order by testdate ) as seq from patients ), cte_recursive as ( select *,cte_rn.testdate as usedTestDate,1 as toUse from cte_rn where seq=1 union all select rn.patientid,rn.testdate,rn.seq,case when datediff(day ,r.usedTestDate,rn.testdate)>45 then rn.testdate else r.usedTestDate end as usedTestDate,case when datediff(day ,r.usedTestDate,rn.testdate)>45 then 1 else 0 end as toUse from cte_recursive r join cte_rn rn on r.seq = rn.seq-1 ) select * from cte_recursive