Skip to content
Advertisement

Return most recent non-null value prior to date minus 1 year — Snowflake

I have a table with ID, Date, Value. For each row, I want to return the most recent non-null value where Date <= Date minus 1 year.

I’m trying variations of window functions with last_value or lag, and partitioning on ID, order by Date, with the IGNORE NULLS clause. However, I’m not sure where to include the condition where Date <= Date minus 1 year

Sample Data:

ID Date Value
1 01-Jan-19 NULL
1 02-Jan-19 NULL
1 03-Jan-19 NULL
1 04-Jan-19 NULL
1 05-Jan-19 NULL
1 01-Jan-20 NULL
1 02-Jan-20 20
1 03-Jan-20 NULL
1 04-Jan-20 NULL
1 05-Jan-20 30
1 01-Jan-21 20
1 02-Jan-21 NULL
1 03-Jan-21 NULL
1 04-Jan-21 NULL
1 05-Jan-21 30
2 01-Jan-19 NULL
2 02-Jan-19 NULL
2 03-Jan-19 NULL
2 04-Jan-19 30
2 05-Jan-19 30
2 01-Jan-20 25
2 02-Jan-20 20
2 03-Jan-20 NULL
2 04-Jan-20 NULL
2 05-Jan-20 30
2 01-Jan-21 20
2 02-Jan-21 15
2 03-Jan-21 15
2 04-Jan-21 15
2 05-Jan-21 30

Desired Results:

ID Date Value Most_Recent_Prior_Year_Date Most_Recent_Prior_Year_Value
1 01-Jan-19 NULL NULL NULL
1 02-Jan-19 NULL NULL NULL
1 03-Jan-19 NULL NULL NULL
1 04-Jan-19 NULL NULL NULL
1 05-Jan-19 NULL NULL NULL
1 01-Jan-20 NULL NULL NULL
1 02-Jan-20 20 NULL NULL
1 03-Jan-20 NULL NULL NULL
1 04-Jan-20 NULL NULL NULL
1 05-Jan-20 30 NULL NULL
1 01-Jan-21 20 NULL NULL
1 02-Jan-21 NULL 02-Jan-20 20
1 03-Jan-21 NULL 02-Jan-20 20
1 04-Jan-21 NULL 02-Jan-20 20
1 05-Jan-21 30 05-Jan-20 30
2 01-Jan-19 NULL NULL NULL
2 02-Jan-19 NULL NULL NULL
2 03-Jan-19 NULL NULL NULL
2 04-Jan-19 30 NULL NULL
2 05-Jan-19 30 NULL NULL
2 01-Jan-20 25 NULL NULL
2 02-Jan-20 20 NULL NULL
2 03-Jan-20 NULL NULL NULL
2 04-Jan-20 NULL 04-Jan-19 30
2 05-Jan-20 30 05-Jan-19 30
2 01-Jan-21 20 01-Jan-20 25
2 02-Jan-21 15 02-Jan-20 20
2 03-Jan-21 15 02-Jan-20 20
2 04-Jan-21 15 02-Jan-20 20
2 05-Jan-21 30 05-Jan-20 30

Advertisement

Answer

IF you strickly have only one row per day per id then you can use:

A single LEFT JOIN and WHERE/QUALIFY will work here:

SELECT 
    a.ID,
    a.Date,
    a.Value
    b.Data AS Most_Recent_Prior_Year_Date
    b.Value AS Most_Recent_Prior_Year_Value
FROM table AS a
LEFT JOIN table AS b 
    on a.id = b.id AND DATEADD('year',-1,a.date) >= b.date
QUALIFY ROW_NUMBER() OVER( PARTITION BY a.id, a.date ORDER BY b.date DESC NULLS LAST) = 1

Otherwise you will want to first separate out a single value per day aka MAX, MIN, FIRST_VALE and then run the above over that output.

User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement