Suppose I have the following table:
| Day | ID | Value |
|---|---|---|
| 2022-11-05 | 0 | A |
| 2022-11-06 | 1 | B |
| 2022-11-07 | 0 | C |
Now given a time window of 1 day, I want to create a time-series table that:
- The
Daycolumn granular unit is 1 day - Each
Dayrow displays every ID in the table (like cross-join) - Moreover, if for that day, the ID is not recorded, then it uses the
Valuefrom the previous day. If it does not exist before this day, we can ignore it.
Let’s say I want to view this time series from 2022-11-05 to 2022-11-08, this is the desired output:
| Day | ID | Value |
|---|---|---|
| 2022-11-05 | 0 | A |
| 2022-11-06 | 0 | A |
| 2022-11-06 | 1 | B |
| 2022-11-07 | 0 | C |
| 2022-11-07 | 1 | B |
| 2022-11-08 | 0 | C |
| 2022-11-08 | 1 | B |
Explanation: ID=0 is not recorded on 11-06 so it uses the value from the previous day. ID=1 does not record new value on 11-07 so it uses the value from 11-06.
Note that the number of columns can be large, so if possible, I am looking for a solution that handles it too.
Advertisement
Answer
Way One:
- first we start with some
data - then we find
the_daysin the period we are interested in - then we find the
data_startfor each id - then we join those values together, and use
LAGwith theIGNORE NULLS OVERclause to find the “prior values” if the current values in not present viaNVL
with data(Day, ID, Value) as (
select * from values
('2022-11-05'::date, 0, 'A'),
('2022-11-06'::date, 1, 'B'),
('2022-11-07'::date, 0, 'C')
), the_days as (
select
row_number() over (order by null)-1 as rn
,dateadd('day', rn, from_day) as day
from (
select
min(day) as from_day
,'2022-11-08' as to_day
,datediff('days', from_day, to_day) as days
from data
), table(generator(ROWCOUNT => 200))
qualify rn <= days
), data_starts as (
select
id,
min(day) as start_day
from data
group by 1
)
select
td.day,
ds.id,
nvl(d.value, lag(d.value) ignore nulls over (partition by ds.id order by td.day)) as value
from data_starts as ds
join the_days as td
on td.day >= ds.start_day
left join data as d
on ds.id = d.id and d.day = td.day
order by 1,2;
gives:
| DAY | ID | VALUE |
|---|---|---|
| 2022-11-05 | 0 | A |
| 2022-11-06 | 0 | A |
| 2022-11-06 | 1 | B |
| 2022-11-07 | 0 | C |
| 2022-11-07 | 1 | B |
| 2022-11-08 | 0 | C |
| 2022-11-08 | 1 | B |
Way Two:
with data(Day, ID, Value) as (
select * from values
('2022-11-05'::date, 0, 'A'),
('2022-11-06'::date, 1, 'B'),
('2022-11-07'::date, 0, 'C')
), the_days as (
select
dateadd('day', row_number() over (order by null)-1, '2022-11-05') as day
from table(generator(ROWCOUNT => 4))
)
select
td.day,
i.id,
nvl(d.value, lag(d.value) ignore nulls over (partition by i.id order by td.day)) as _value
from the_days as td
cross join (select distinct id from data) as i
left join data as d
on i.id = d.id and d.day = td.day
qualify _value is not null
order by 1,2;
this requires a unique name for the _values output so it can be referenced in the qualify without needing to duplicate the code.