Skip to content
Advertisement

Snowflake/SQL: create a time-series table such that every ID is visible, and if ID is null, it uses the previous value? (similar to shift)

Suppose I have the following table:

Day ID Value
2022-11-05 0 A
2022-11-06 1 B
2022-11-07 0 C

Now given a time window of 1 day, I want to create a time-series table that:

  • The Day column granular unit is 1 day
  • Each Day row displays every ID in the table (like cross-join)
  • Moreover, if for that day, the ID is not recorded, then it uses the Value from the previous day. If it does not exist before this day, we can ignore it.

Let’s say I want to view this time series from 2022-11-05 to 2022-11-08, this is the desired output:

Day ID Value
2022-11-05 0 A
2022-11-06 0 A
2022-11-06 1 B
2022-11-07 0 C
2022-11-07 1 B
2022-11-08 0 C
2022-11-08 1 B

Explanation: ID=0 is not recorded on 11-06 so it uses the value from the previous day. ID=1 does not record new value on 11-07 so it uses the value from 11-06.

Note that the number of columns can be large, so if possible, I am looking for a solution that handles it too.

Advertisement

Answer

Way One:

  • first we start with some data
  • then we find the_days in the period we are interested in
  • then we find the data_start for each id
  • then we join those values together, and use LAG with the IGNORE NULLS OVER clause to find the “prior values” if the current values in not present via NVL

gives:

DAY ID VALUE
2022-11-05 0 A
2022-11-06 0 A
2022-11-06 1 B
2022-11-07 0 C
2022-11-07 1 B
2022-11-08 0 C
2022-11-08 1 B

Way Two:

this requires a unique name for the _values output so it can be referenced in the qualify without needing to duplicate the code.

User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement