I have 2 tables – Dates and Data with data as follows: Table: Dates
Date 1/02/2020 2/02/2020 3/02/2020 4/02/2020 5/02/2020 6/02/2020 7/02/2020 8/02/2020 9/02/2020 10/02/2020
Table: Data
Date Id Amount 1/02/2020 1 10 3/02/2020 1 30 10/02/2020 1 50 2/02/2020 2 5 6/02/2020 2 10 9/02/2020 2 20
Expected results:
Date Id Amount 1/02/2020 1 10 2/02/2020 1 10 3/02/2020 1 30 4/02/2020 1 30 5/02/2020 1 30 6/02/2020 1 30 7/02/2020 1 30 8/02/2020 1 30 9/02/2020 1 30 10/02/2020 1 50 2/02/2020 2 5 3/02/2020 2 5 4/02/2020 2 5 5/02/2020 2 5 6/02/2020 2 10 7/02/2020 2 10 8/02/2020 2 10 9/02/2020 2 20
I have tried using PARTITION OVER but getting duplicate vales and not all dates from Dates table, therefore not getting desired results. Really appreciate if you can help with the code in Hive SQL.
Advertisement
Answer
Use a cross join
to generate the rows. Then a left join
combine the two tables. And finally use last_value()
to bring in missing values:
select d.date, i.id, last_value(amount, TRUE) over (partition by id order by d.date) as amount from date d cross join (select distinct id from data) i left join data t on d.date = t.date and i.id = t.id;
The last_value()
with the second argument ignores NULL
values, so it will “go back” to get the most recent non-NULL
value.