Skip to content
Advertisement

How to join two tables while only selecting the highest day of each month from one table

I have two tables. One with metadata, one with billing data. I need to join those effiently in order to assign metadata to costs.

Table 1 (metadata) looks like this:

year    month   day id          label1     label2
2021    06      04  892221805   foo        aaa
2021    06      30  892221805   bar        aaa     
2021    06      04  594083437   baz        aaa
2021    06      04  552604244   baz        bbb

Table 2 (billing data) looks like this:

year    month   id          cost
2021    06      892221805   1.00 $
2021    06      892221805   1.00 $    
2021    06      594083437   1.00 $
2021    06      552604244   1.00 $

For each combination of year, month, id in Table 2, there is an corresponding ID in Table 1.

For each year, month, id in T2, i need label1, label2 from the row in T1 which matches year, month, id, and has the highest date (in that month), so that the result may look like this:

year    month   id          cost   label1 label2
2021    06      892221805   1.00 $ bar    aaa
2021    06      892221805   1.00 $ bar    aaa
2021    06      594083437   1.00 $ baz    aaa
2021    06      552604244   1.00 $ baz    bbb

I.e. the first row of T1 is not used, as the second row has labels with a newer date in that month.

I am using Atheana on Amazon Webservices, which should be Presto compatible, I think.

How do I select this correctly? Preferably, in a way that can be used as a view.

Advertisement

Answer

You can use row_number() to get to the last row in a month:

select t2.*, t1.label1, t1.label2
from table2 t2 left join
     (select t1.*
             row_number() over (partition by year, month, id order by day desc) as seqnum
      from table1 t1
     ) t1
     on t1.id = t2.id and t1.year = t2.year and
        t1.month = t2.month and seqnum = 1;
User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement