Suppose I have the following table:
Day | ID | Value |
---|---|---|
2022-11-05 | 0 | A |
2022-11-06 | 1 | B |
2022-11-07 | 0 | C |
Now given a time window of 1 day, I want to create a time-series table that:
- The
Day
column granular unit is 1 day - Each
Day
row displays every ID in the table (like cross-join) - Moreover, if for that day, the ID is not recorded, then it uses the
Value
from the previous day. If it does not exist before this day, we can ignore it.
Let’s say I want to view this time series from 2022-11-05
to 2022-11-08
, this is the desired output:
Day | ID | Value |
---|---|---|
2022-11-05 | 0 | A |
2022-11-06 | 0 | A |
2022-11-06 | 1 | B |
2022-11-07 | 0 | C |
2022-11-07 | 1 | B |
2022-11-08 | 0 | C |
2022-11-08 | 1 | B |
Explanation: ID=0
is not recorded on 11-06 so it uses the value from the previous day. ID=1
does not record new value on 11-07 so it uses the value from 11-06.
Note that the number of columns can be large, so if possible, I am looking for a solution that handles it too.
Advertisement
Answer
Way One:
- first we start with some
data
- then we find
the_days
in the period we are interested in - then we find the
data_start
for each id - then we join those values together, and use
LAG
with theIGNORE NULLS OVER
clause to find the “prior values” if the current values in not present viaNVL
with data(Day, ID, Value) as ( select * from values ('2022-11-05'::date, 0, 'A'), ('2022-11-06'::date, 1, 'B'), ('2022-11-07'::date, 0, 'C') ), the_days as ( select row_number() over (order by null)-1 as rn ,dateadd('day', rn, from_day) as day from ( select min(day) as from_day ,'2022-11-08' as to_day ,datediff('days', from_day, to_day) as days from data ), table(generator(ROWCOUNT => 200)) qualify rn <= days ), data_starts as ( select id, min(day) as start_day from data group by 1 ) select td.day, ds.id, nvl(d.value, lag(d.value) ignore nulls over (partition by ds.id order by td.day)) as value from data_starts as ds join the_days as td on td.day >= ds.start_day left join data as d on ds.id = d.id and d.day = td.day order by 1,2;
gives:
DAY | ID | VALUE |
---|---|---|
2022-11-05 | 0 | A |
2022-11-06 | 0 | A |
2022-11-06 | 1 | B |
2022-11-07 | 0 | C |
2022-11-07 | 1 | B |
2022-11-08 | 0 | C |
2022-11-08 | 1 | B |
Way Two:
with data(Day, ID, Value) as ( select * from values ('2022-11-05'::date, 0, 'A'), ('2022-11-06'::date, 1, 'B'), ('2022-11-07'::date, 0, 'C') ), the_days as ( select dateadd('day', row_number() over (order by null)-1, '2022-11-05') as day from table(generator(ROWCOUNT => 4)) ) select td.day, i.id, nvl(d.value, lag(d.value) ignore nulls over (partition by i.id order by td.day)) as _value from the_days as td cross join (select distinct id from data) as i left join data as d on i.id = d.id and d.day = td.day qualify _value is not null order by 1,2;
this requires a unique name for the _values
output so it can be referenced in the qualify without needing to duplicate the code.