I have two tables. The first is a date_incremental table just holing dates from 1900-01-01 to current in yyyy-mm-dd format. The second table has user behavior. Here are the tables in visual form:
date_incremental
date 1900-01-01 .... 2022-10-05
The user table
date | user | purchase | country 2020-01-01 | 1 | 10 | US 2020-01-04 | 1 | 8 | US 2020-01-08 | 1 | 2 | US 2020-02-03 | 2 | 45 | GER 2020-02-05 | 2 | 81 | GER 2020-02-06 | 2 | 22 | GER
Now, I want to get a table output where no date gaps exist between the users purchases, i.e. display the history subsequently. If there a days where a user did not purchase anything, the purchase column should be 0 whereas the country column should be the last non-null value. Here is the output I am looking for:
date | user | purchase | country 2020-01-01 | 1 | 10 | US 2020-01-02 | 1 | 0 | US NEW INSERTED ROW 2020-01-03 | 1 | 0 | US NEW INSERTED ROW 2020-01-04 | 1 | 8 | US 2020-01-05 | 1 | 0 | US NEW INSERTED ROW 2020-01-06 | 1 | 0 | US NEW INSERTED ROW 2020-01-07 | 1 | 0 | US NEW INSERTED ROW 2020-01-08 | 1 | 2 | US 2020-02-03 | 2 | 45 | GER 2020-02-04 | 2 | 0 | GER NEW INSERTED ROW 2020-02-05 | 2 | 81 | GER 2020-02-06 | 2 | 22 | GER
I tried the following query but didnt get this output:
Select a.* from user_tbl a right join date_incremental b on a.date=b.date
- My output just returns the exact same user_table output for some reason. How can I fix this?
- How do I specify that I want to use the last non-null values when filling the gaps for all non-numeric columns. i.e. Country and fill 0 for numeric values?
Any advise is super appreciated.
Advertisement
Answer
So you have to fetch the “window of dates” per user, and bind those for each user, so you can display those results, I do this with yet another cte.
Then you can do the LEFT JOIN to bind the data, and LAG over this
with date_incremental (date) as ( select * from values ('2019-12-31'::date), ('2020-01-01'::date), ('2020-01-02'::date), ('2020-01-03'::date), ('2020-01-04'::date), ('2020-01-05'::date), ('2020-01-06'::date), ('2020-01-07'::date), ('2020-01-08'::date), ('2020-01-09'::date), ('2020-02-02'::date), ('2020-02-03'::date), ('2020-02-04'::date), ('2020-02-05'::date), ('2020-02-06'::date), ('2020-02-07'::date) ), user(date,user,purchase,country) as ( select * from values ('2020-01-01'::date, 1, 10, 'US'), ('2020-01-04'::date, 1, 8, 'US'), ('2020-01-08'::date, 1, 2, 'US'), ('2020-02-03'::date, 2, 45, 'GER'), ('2020-02-05'::date, 2, 81, 'GER'), ('2020-02-06'::date, 2, 22, 'GER') ), enriched_user as ( select user ,min(date) as min_date ,max(date) as max_date from user group by 1 ) select d.date ,e.user ,zeroifnull(a.purchase) as purchase ,nvl(a.country, lag(a.country) ignore nulls over (partition by e.user order by d.date)) as country from enriched_user as e join date_incremental as d on d.date between e.min_date and e.max_date left join user as a on a.user = e.user and a.date = d.date order by 2, 1;
gives:
DATE | USER | PURCHASE | COUNTRY |
---|---|---|---|
2020-01-01 | 1 | 10 | US |
2020-01-02 | 1 | 0 | US |
2020-01-03 | 1 | 0 | US |
2020-01-04 | 1 | 8 | US |
2020-01-05 | 1 | 0 | US |
2020-01-06 | 1 | 0 | US |
2020-01-07 | 1 | 0 | US |
2020-01-08 | 1 | 2 | US |
2020-02-03 | 2 | 45 | GER |
2020-02-04 | 2 | 0 | GER |
2020-02-05 | 2 | 81 | GER |
2020-02-06 | 2 | 22 | GER |