I have 2 tables – Dates and Data with data as follows: Table: Dates
Date 1/02/2020 2/02/2020 3/02/2020 4/02/2020 5/02/2020 6/02/2020 7/02/2020 8/02/2020 9/02/2020 10/02/2020
Table: Data
Date Id Amount 1/02/2020 1 10 3/02/2020 1 30 10/02/2020 1 50 2/02/2020 2 5 6/02/2020 2 10 9/02/2020 2 20
Expected results:
Date Id Amount 1/02/2020 1 10 2/02/2020 1 10 3/02/2020 1 30 4/02/2020 1 30 5/02/2020 1 30 6/02/2020 1 30 7/02/2020 1 30 8/02/2020 1 30 9/02/2020 1 30 10/02/2020 1 50 2/02/2020 2 5 3/02/2020 2 5 4/02/2020 2 5 5/02/2020 2 5 6/02/2020 2 10 7/02/2020 2 10 8/02/2020 2 10 9/02/2020 2 20
I have tried using PARTITION OVER but getting duplicate vales and not all dates from Dates table, therefore not getting desired results. Really appreciate if you can help with the code in Hive SQL.
Advertisement
Answer
Use a cross join to generate the rows. Then a left join combine the two tables. And finally use last_value() to bring in missing values:
select d.date, i.id,
last_value(amount, TRUE) over (partition by id order by d.date) as amount
from date d cross join
(select distinct id from data) i left join
data t
on d.date = t.date and i.id = t.id;
The last_value() with the second argument ignores NULL values, so it will “go back” to get the most recent non-NULL value.