Data Table:
Dates | Customer | Subscription |
---|---|---|
20/02/2020 | A | free |
21/02/2020 | A | free |
22/02/2020 | A | free |
23/02/2020 | B | free |
23/03/2020 | A | full |
01/03/2020 | B | full |
01/03/2020 | A | full |
02/03/2020 | A | full |
Need to fill gaps in dates by the value in the previous date
Output:
Dates | Customer | Last Subscription |
---|---|---|
20/02/2020 | A | free |
21/02/2020 | A | free |
22/02/2020 | A | free |
23/03/2020 | A | full |
23/03/2020 | B | free |
24/02/2020 | A | full |
24/02/2020 | B | free |
25/02/2020 | A | full |
25/02/2020 | B | free |
26/02/2020 | A | full |
26/02/2020 | B | free |
27/02/2020 | A | full |
27/02/2020 | B | free |
28/02/2020 | A | full |
28/02/2020 | B | free |
01/03/2020 | A | full |
01/03/2020 | B | full |
02/03/2020 | A | full |
02/03/2020 | B | full |
I found a similar solution Duplicate groups of records to fill multiple date gaps in Google BigQuery, but it is not suitable because in my example each Customer
has a different start date.
Advertisement
Answer
Consider below
with temp as ( select customer, dates from ( select customer, min(dates) min_date, max(dates) max_date from `project.dataset.table` group by customer ), unnest(generate_date_array(min_date, max_date)) dates ) select customer, dates, first_value(subscription ignore nulls) over win as subscription from temp a left join `project.dataset.table` b using(customer, dates) window win as (partition by customer order by dates desc rows between current row and unbounded following) # order by dates, customer
If to apply to sample data in y our question – output is