I have two tables. One with metadata, one with billing data. I need to join those effiently in order to assign metadata to costs.
Table 1 (metadata) looks like this:
year month day id label1 label2 2021 06 04 892221805 foo aaa 2021 06 30 892221805 bar aaa 2021 06 04 594083437 baz aaa 2021 06 04 552604244 baz bbb
Table 2 (billing data) looks like this:
year month id cost 2021 06 892221805 1.00 $ 2021 06 892221805 1.00 $ 2021 06 594083437 1.00 $ 2021 06 552604244 1.00 $
For each combination of year, month, id in Table 2, there is an corresponding ID in Table 1.
For each year, month, id in T2, i need label1, label2 from the row in T1 which matches year, month, id, and has the highest date (in that month), so that the result may look like this:
year month id cost label1 label2 2021 06 892221805 1.00 $ bar aaa 2021 06 892221805 1.00 $ bar aaa 2021 06 594083437 1.00 $ baz aaa 2021 06 552604244 1.00 $ baz bbb
I.e. the first row of T1 is not used, as the second row has labels with a newer date in that month.
I am using Atheana on Amazon Webservices, which should be Presto compatible, I think.
How do I select this correctly? Preferably, in a way that can be used as a view.
Advertisement
Answer
You can use row_number()
to get to the last row in a month:
select t2.*, t1.label1, t1.label2 from table2 t2 left join (select t1.* row_number() over (partition by year, month, id order by day desc) as seqnum from table1 t1 ) t1 on t1.id = t2.id and t1.year = t2.year and t1.month = t2.month and seqnum = 1;