I have 2 tables – Dates and Data with data as follows: Table: Dates
x
Date
1/02/2020
2/02/2020
3/02/2020
4/02/2020
5/02/2020
6/02/2020
7/02/2020
8/02/2020
9/02/2020
10/02/2020
Table: Data
Date Id Amount
1/02/2020 1 10
3/02/2020 1 30
10/02/2020 1 50
2/02/2020 2 5
6/02/2020 2 10
9/02/2020 2 20
Expected results:
Date Id Amount
1/02/2020 1 10
2/02/2020 1 10
3/02/2020 1 30
4/02/2020 1 30
5/02/2020 1 30
6/02/2020 1 30
7/02/2020 1 30
8/02/2020 1 30
9/02/2020 1 30
10/02/2020 1 50
2/02/2020 2 5
3/02/2020 2 5
4/02/2020 2 5
5/02/2020 2 5
6/02/2020 2 10
7/02/2020 2 10
8/02/2020 2 10
9/02/2020 2 20
I have tried using PARTITION OVER but getting duplicate vales and not all dates from Dates table, therefore not getting desired results. Really appreciate if you can help with the code in Hive SQL.
Advertisement
Answer
Use a cross join
to generate the rows. Then a left join
combine the two tables. And finally use last_value()
to bring in missing values:
select d.date, i.id,
last_value(amount, TRUE) over (partition by id order by d.date) as amount
from date d cross join
(select distinct id from data) i left join
data t
on d.date = t.date and i.id = t.id;
The last_value()
with the second argument ignores NULL
values, so it will “go back” to get the most recent non-NULL
value.