Data Table:
| Dates | Customer | Subscription |
|---|---|---|
| 20/02/2020 | A | free |
| 21/02/2020 | A | free |
| 22/02/2020 | A | free |
| 23/02/2020 | B | free |
| 23/03/2020 | A | full |
| 01/03/2020 | B | full |
| 01/03/2020 | A | full |
| 02/03/2020 | A | full |
Need to fill gaps in dates by the value in the previous date
Output:
| Dates | Customer | Last Subscription |
|---|---|---|
| 20/02/2020 | A | free |
| 21/02/2020 | A | free |
| 22/02/2020 | A | free |
| 23/03/2020 | A | full |
| 23/03/2020 | B | free |
| 24/02/2020 | A | full |
| 24/02/2020 | B | free |
| 25/02/2020 | A | full |
| 25/02/2020 | B | free |
| 26/02/2020 | A | full |
| 26/02/2020 | B | free |
| 27/02/2020 | A | full |
| 27/02/2020 | B | free |
| 28/02/2020 | A | full |
| 28/02/2020 | B | free |
| 01/03/2020 | A | full |
| 01/03/2020 | B | full |
| 02/03/2020 | A | full |
| 02/03/2020 | B | full |
I found a similar solution Duplicate groups of records to fill multiple date gaps in Google BigQuery, but it is not suitable because in my example each Customer has a different start date.
Advertisement
Answer
Consider below
with temp as (
select customer, dates from (
select customer, min(dates) min_date, max(dates) max_date
from `project.dataset.table`
group by customer
), unnest(generate_date_array(min_date, max_date)) dates
)
select customer, dates,
first_value(subscription ignore nulls) over win as subscription
from temp a
left join `project.dataset.table` b
using(customer, dates)
window win as (partition by customer order by dates desc rows between current row and unbounded following)
# order by dates, customer
If to apply to sample data in y our question – output is
