Skip to content
Advertisement

Right join to an incremental date table not working in SQL

I have two tables. The first is a date_incremental table just holing dates from 1900-01-01 to current in yyyy-mm-dd format. The second table has user behavior. Here are the tables in visual form:

date_incremental

date 
1900-01-01
....
2022-10-05

The user table

date       | user | purchase | country
2020-01-01 | 1    | 10       | US
2020-01-04 | 1    | 8        | US
2020-01-08 | 1    | 2        | US
2020-02-03 | 2    | 45       | GER
2020-02-05 | 2    | 81       | GER
2020-02-06 | 2    | 22       | GER

Now, I want to get a table output where no date gaps exist between the users purchases, i.e. display the history subsequently. If there a days where a user did not purchase anything, the purchase column should be 0 whereas the country column should be the last non-null value. Here is the output I am looking for:

date       | user | purchase | country
2020-01-01 | 1    | 10       | US
2020-01-02 | 1    | 0        | US NEW INSERTED ROW
2020-01-03 | 1    | 0        | US NEW INSERTED ROW
2020-01-04 | 1    | 8        | US
2020-01-05 | 1    | 0        | US NEW INSERTED ROW
2020-01-06 | 1    | 0        | US NEW INSERTED ROW
2020-01-07 | 1    | 0        | US NEW INSERTED ROW
2020-01-08 | 1    | 2        | US

2020-02-03 | 2    | 45       | GER
2020-02-04 | 2    | 0        | GER NEW INSERTED ROW
2020-02-05 | 2    | 81       | GER
2020-02-06 | 2    | 22       | GER

I tried the following query but didnt get this output:

Select a.* 
from user_tbl a 
right join date_incremental b 
on a.date=b.date
  1. My output just returns the exact same user_table output for some reason. How can I fix this?
  2. How do I specify that I want to use the last non-null values when filling the gaps for all non-numeric columns. i.e. Country and fill 0 for numeric values?

Any advise is super appreciated.

Advertisement

Answer

So you have to fetch the “window of dates” per user, and bind those for each user, so you can display those results, I do this with yet another cte.

Then you can do the LEFT JOIN to bind the data, and LAG over this

with date_incremental (date) as (
    select * from values
    ('2019-12-31'::date), 
    ('2020-01-01'::date), 
    ('2020-01-02'::date), 
    ('2020-01-03'::date), 
    ('2020-01-04'::date), 
    ('2020-01-05'::date), 
    ('2020-01-06'::date), 
    ('2020-01-07'::date), 
    ('2020-01-08'::date), 
    ('2020-01-09'::date), 
    ('2020-02-02'::date), 
    ('2020-02-03'::date), 
    ('2020-02-04'::date), 
    ('2020-02-05'::date), 
    ('2020-02-06'::date),
    ('2020-02-07'::date)
), user(date,user,purchase,country) as (
    select * from values
    ('2020-01-01'::date, 1, 10, 'US'),
    ('2020-01-04'::date, 1, 8, 'US'),
    ('2020-01-08'::date, 1, 2, 'US'),
    ('2020-02-03'::date, 2, 45, 'GER'),
    ('2020-02-05'::date, 2, 81, 'GER'),
    ('2020-02-06'::date, 2, 22, 'GER')
), enriched_user as (
    select user
        ,min(date) as min_date
        ,max(date) as max_date
    from user
    group by 1
)
select
    d.date
    ,e.user
    ,zeroifnull(a.purchase) as purchase
    ,nvl(a.country, lag(a.country) ignore nulls over (partition by e.user order by d.date)) as country
from enriched_user as e
join date_incremental as d
    on d.date between e.min_date and e.max_date
left join user as a
    on a.user = e.user and a.date = d.date
order by 2, 1;

gives:

DATE USER PURCHASE COUNTRY
2020-01-01 1 10 US
2020-01-02 1 0 US
2020-01-03 1 0 US
2020-01-04 1 8 US
2020-01-05 1 0 US
2020-01-06 1 0 US
2020-01-07 1 0 US
2020-01-08 1 2 US
2020-02-03 2 45 GER
2020-02-04 2 0 GER
2020-02-05 2 81 GER
2020-02-06 2 22 GER
User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement