
Redshift – Missing latest date when joining two tables

I have two tables, which I'll call A and B.

Table A contains only the last month of data. Table B stores all of the historical data.

|user | table_A_date | amount_table_A|
|-----| ------------ | ------------- |
| A   |2019-11-30    |1111.0         |
| A   |2019-12-02    |1111.0         |
| A   |2019-12-05    |1111.0         |
| A   |2019-12-09    |1111.0         |


|user | table_B_date | amount_table_B|
|-----| ------------ | ------------- |
| A   |2019-11-25    |1111.0         |
| A   |2019-12-02    |1111.0         |
| A   |2019-12-05    |1111.0         |
| A   |2019-12-10    |1111.0         |

I need to find the difference between the dates in the two tables, but when I left join A to B, some of the B dates come back null:

|user | table_A_date | table_B_date | amount_table_A|
|-----| ------------ | ------------ | ------------- |
| A   |2019-11-30    |NULL          |1111.0         |
| A   |2019-12-02    |2019-12-02    |1111.0         |
| A   |2019-12-05    |2019-12-05    |1111.0         |
| A   |2019-12-09    |NULL          |1111.0         |

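I tried filling the gaps with the last_value() over () window function, but the first null is still missing, because the B row it should pick up is older than anything in table A. How can I get each user's most recent earlier B date (for user A, that is 2019-11-25)?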


Answer

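You can use a full join instead of a left join, fill the gaps with lag() ... ignore nulls, and then filter back down to the rows from A. The full join matters because it keeps the B rows that fall before A's first date (such as 2019-11-25), which a left join from A drops, so the window function can pick them up. Order the window by the combined date rather than by b.date alone; otherwise the A-only rows, whose b.date is null, sort to the end of each partition: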

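-- Full join so that B rows older than A's first date (e.g. 2019-11-25) are visible to lag().
-- "user" is quoted because USER is a reserved word in Redshift.
select ab.*
from (select coalesce(a."user", b."user") as "user",
             a.table_A_date as a_date,
             a.amount_table_A as a_amount,
             -- keep B's own date if it matched; otherwise take the latest earlier B date
             coalesce(b.table_B_date,
                      lag(b.table_B_date) ignore nulls
                          over (partition by coalesce(a."user", b."user")
                                order by coalesce(a.table_A_date, b.table_B_date))
                     ) as b_date,
             coalesce(b.amount_table_B,
                      lag(b.amount_table_B) ignore nulls
                          over (partition by coalesce(a."user", b."user")
                                order by coalesce(a.table_A_date, b.table_B_date))
                     ) as b_amount
      from a
      full join b
        on a."user" = b."user"
       and a.table_A_date = b.table_B_date
     ) ab
-- keep only the rows that come from table A
where a_date is not null;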
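If only the B date is needed (the question is ultimately about the difference between the dates), a possibly simpler alternative that avoids window functions is a range join: match each A row to every B row of the same user on or before that date and keep the maximum. This is a sketch, not part of the original answer. It assumes hypothetical temporary tables a and b loaded with the question's sample rows and column names, quotes "user" because USER is reserved, and sticks to standard SQL so it should also run on PostgreSQL or SQLite for testing.

```sql
-- Hypothetical setup: the question's sample rows loaded into temp tables a and b.
-- "user" is quoted because USER is a reserved word in Redshift and PostgreSQL.
create temp table a ("user" varchar(16), table_A_date date, amount_table_A decimal(12, 2));
create temp table b ("user" varchar(16), table_B_date date, amount_table_B decimal(12, 2));

insert into a values
  ('A', '2019-11-30', 1111.0),
  ('A', '2019-12-02', 1111.0),
  ('A', '2019-12-05', 1111.0),
  ('A', '2019-12-09', 1111.0);

insert into b values
  ('A', '2019-11-25', 1111.0),
  ('A', '2019-12-02', 1111.0),
  ('A', '2019-12-05', 1111.0),
  ('A', '2019-12-10', 1111.0);

-- Range join: each A row is matched to all B rows of the same user on or before
-- its date; max() then keeps the latest of those B dates (null if there is none).
select a."user",
       a.table_A_date      as a_date,
       a.amount_table_A    as a_amount,
       max(b.table_B_date) as b_date
from a
left join b
  on b."user" = a."user"
 and b.table_B_date <= a.table_A_date
group by a."user", a.table_A_date, a.amount_table_A
order by a."user", a_date;
```

On the sample data this maps 2019-11-30 to 2019-11-25 and 2019-12-09 to 2019-12-05, while matching dates map to themselves. In Redshift the gap in days could then be computed with datediff(day, b_date, a_date). Because each A row joins to every earlier B row, the intermediate result grows with the size of table B, so on a large history table the window-function query above may scale better.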
User contributions licensed under: CC BY-SA