Skip to content
Advertisement

Snowflake/SQL: create a time-series table such that every ID is visible, and if ID is null, it uses the previous value? (similar to shift)

Suppose I have the following table:

Day ID Value
2022-11-05 0 A
2022-11-06 1 B
2022-11-07 0 C

Now given a time window of 1 day, I want to create a time-series table that:

  • The Day column granular unit is 1 day
  • Each Day row displays every ID in the table (like cross-join)
  • Moreover, if for that day, the ID is not recorded, then it uses the Value from the previous day. If it does not exist before this day, we can ignore it.

Let’s say I want to view this time series from 2022-11-05 to 2022-11-08, this is the desired output:

Day ID Value
2022-11-05 0 A
2022-11-06 0 A
2022-11-06 1 B
2022-11-07 0 C
2022-11-07 1 B
2022-11-08 0 C
2022-11-08 1 B

Explanation: ID=0 is not recorded on 11-06 so it uses the value from the previous day. ID=1 does not record new value on 11-07 so it uses the value from 11-06.

Note that the number of columns can be large, so if possible, I am looking for a solution that handles it too.

Advertisement

Answer

Way One:

  • first we start with some data
  • then we find the_days in the period we are interested in
  • then we find the data_start for each id
  • then we join those values together, and use LAG with the IGNORE NULLS OVER clause to find the “prior values” if the current values in not present via NVL
with data(Day, ID, Value) as (
    select * from values
        ('2022-11-05'::date, 0, 'A'),
        ('2022-11-06'::date, 1, 'B'),
        ('2022-11-07'::date, 0, 'C')
), the_days as (
    select 
        row_number() over (order by null)-1 as rn
        ,dateadd('day', rn, from_day) as day
    from (
        select 
            min(day) as from_day
            ,'2022-11-08' as to_day
            ,datediff('days', from_day, to_day) as days
        from data
    ), table(generator(ROWCOUNT => 200))
    qualify rn <= days
), data_starts as (
    select 
        id, 
        min(day) as start_day
    from data
    group by 1
)
select 
    td.day,
    ds.id,
    nvl(d.value, lag(d.value) ignore nulls over (partition by ds.id order by td.day)) as value
from data_starts as ds
join the_days as td 
    on td.day >= ds.start_day
left join data as d
    on ds.id = d.id and d.day = td.day
order by 1,2;

gives:

DAY ID VALUE
2022-11-05 0 A
2022-11-06 0 A
2022-11-06 1 B
2022-11-07 0 C
2022-11-07 1 B
2022-11-08 0 C
2022-11-08 1 B

Way Two:

with data(Day, ID, Value) as (
    select * from values
        ('2022-11-05'::date, 0, 'A'),
        ('2022-11-06'::date, 1, 'B'),
        ('2022-11-07'::date, 0, 'C')
), the_days as (
    select 
        dateadd('day', row_number() over (order by null)-1, '2022-11-05') as day
    from table(generator(ROWCOUNT => 4))
)
select 
    td.day,
    i.id,
    nvl(d.value, lag(d.value) ignore nulls over (partition by i.id order by td.day)) as _value
from the_days as td
cross join (select distinct id from data) as i
left join data as d
    on i.id = d.id and d.day = td.day
qualify _value is not null
order by 1,2;

this requires a unique name for the _values output so it can be referenced in the qualify without needing to duplicate the code.

User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement