I have a table with ID, Date, Value. For each row, I want to return the most recent non-null value where Date <= Date minus 1 year.
I’m trying variations of window functions with last_value or lag, and partitioning on ID, order by Date, with the IGNORE NULLS clause. However, I’m not sure where to include the condition where Date <= Date minus 1 year
Sample Data:
ID | Date | Value |
---|---|---|
1 | 01-Jan-19 | NULL |
1 | 02-Jan-19 | NULL |
1 | 03-Jan-19 | NULL |
1 | 04-Jan-19 | NULL |
1 | 05-Jan-19 | NULL |
1 | 01-Jan-20 | NULL |
1 | 02-Jan-20 | 20 |
1 | 03-Jan-20 | NULL |
1 | 04-Jan-20 | NULL |
1 | 05-Jan-20 | 30 |
1 | 01-Jan-21 | 20 |
1 | 02-Jan-21 | NULL |
1 | 03-Jan-21 | NULL |
1 | 04-Jan-21 | NULL |
1 | 05-Jan-21 | 30 |
2 | 01-Jan-19 | NULL |
2 | 02-Jan-19 | NULL |
2 | 03-Jan-19 | NULL |
2 | 04-Jan-19 | 30 |
2 | 05-Jan-19 | 30 |
2 | 01-Jan-20 | 25 |
2 | 02-Jan-20 | 20 |
2 | 03-Jan-20 | NULL |
2 | 04-Jan-20 | NULL |
2 | 05-Jan-20 | 30 |
2 | 01-Jan-21 | 20 |
2 | 02-Jan-21 | 15 |
2 | 03-Jan-21 | 15 |
2 | 04-Jan-21 | 15 |
2 | 05-Jan-21 | 30 |
Desired Results:
ID | Date | Value | Most_Recent_Prior_Year_Date | Most_Recent_Prior_Year_Value |
---|---|---|---|---|
1 | 01-Jan-19 | NULL | NULL | NULL |
1 | 02-Jan-19 | NULL | NULL | NULL |
1 | 03-Jan-19 | NULL | NULL | NULL |
1 | 04-Jan-19 | NULL | NULL | NULL |
1 | 05-Jan-19 | NULL | NULL | NULL |
1 | 01-Jan-20 | NULL | NULL | NULL |
1 | 02-Jan-20 | 20 | NULL | NULL |
1 | 03-Jan-20 | NULL | NULL | NULL |
1 | 04-Jan-20 | NULL | NULL | NULL |
1 | 05-Jan-20 | 30 | NULL | NULL |
1 | 01-Jan-21 | 20 | NULL | NULL |
1 | 02-Jan-21 | NULL | 02-Jan-20 | 20 |
1 | 03-Jan-21 | NULL | 02-Jan-20 | 20 |
1 | 04-Jan-21 | NULL | 02-Jan-20 | 20 |
1 | 05-Jan-21 | 30 | 05-Jan-20 | 30 |
2 | 01-Jan-19 | NULL | NULL | NULL |
2 | 02-Jan-19 | NULL | NULL | NULL |
2 | 03-Jan-19 | NULL | NULL | NULL |
2 | 04-Jan-19 | 30 | NULL | NULL |
2 | 05-Jan-19 | 30 | NULL | NULL |
2 | 01-Jan-20 | 25 | NULL | NULL |
2 | 02-Jan-20 | 20 | NULL | NULL |
2 | 03-Jan-20 | NULL | NULL | NULL |
2 | 04-Jan-20 | NULL | 04-Jan-19 | 30 |
2 | 05-Jan-20 | 30 | 05-Jan-19 | 30 |
2 | 01-Jan-21 | 20 | 01-Jan-20 | 25 |
2 | 02-Jan-21 | 15 | 02-Jan-20 | 20 |
2 | 03-Jan-21 | 15 | 02-Jan-20 | 20 |
2 | 04-Jan-21 | 15 | 02-Jan-20 | 20 |
2 | 05-Jan-21 | 30 | 05-Jan-20 | 30 |
Advertisement
Answer
IF you strickly have only one row per day per id then you can use:
A single LEFT JOIN and WHERE/QUALIFY will work here:
SELECT a.ID, a.Date, a.Value b.Data AS Most_Recent_Prior_Year_Date b.Value AS Most_Recent_Prior_Year_Value FROM table AS a LEFT JOIN table AS b on a.id = b.id AND DATEADD('year',-1,a.date) >= b.date QUALIFY ROW_NUMBER() OVER( PARTITION BY a.id, a.date ORDER BY b.date DESC NULLS LAST) = 1
Otherwise you will want to first separate out a single value per day aka MAX, MIN, FIRST_VALE and then run the above over that output.