Skip to content
Advertisement

Repeat previous values by date in hive

I have 2 tables – Dates and Data with data as follows: Table: Dates

Date
1/02/2020
2/02/2020
3/02/2020
4/02/2020
5/02/2020
6/02/2020
7/02/2020
8/02/2020
9/02/2020
10/02/2020

Table: Data

Date    Id  Amount
1/02/2020   1   10
3/02/2020   1   30
10/02/2020  1   50
2/02/2020   2   5
6/02/2020   2   10
9/02/2020   2   20

Expected results:

Date    Id  Amount
1/02/2020   1   10
2/02/2020   1   10
3/02/2020   1   30
4/02/2020   1   30
5/02/2020   1   30
6/02/2020   1   30
7/02/2020   1   30
8/02/2020   1   30
9/02/2020   1   30
10/02/2020  1   50
2/02/2020   2   5
3/02/2020   2   5
4/02/2020   2   5
5/02/2020   2   5
6/02/2020   2   10
7/02/2020   2   10
8/02/2020   2   10
9/02/2020   2   20

I have tried using PARTITION OVER but getting duplicate vales and not all dates from Dates table, therefore not getting desired results. Really appreciate if you can help with the code in Hive SQL.

Advertisement

Answer

Use a cross join to generate the rows. Then a left join combine the two tables. And finally use last_value() to bring in missing values:

select d.date, i.id,
       last_value(amount, TRUE) over (partition by id order by d.date) as amount
from date d cross join
     (select distinct id from data) i left join
     data t
     on d.date = t.date and i.id = t.id;

The last_value() with the second argument ignores NULL values, so it will “go back” to get the most recent non-NULL value.

User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement