Skip to content
Advertisement

PostgreSQL: Select latest entries only when values differ in a column

Imagine the data:

id   audit_id   val     updated_at
1    11         43      October 09, 2021, 07:55 AM
1    12         34      October 11, 2021, 11:03 PM
1    13         88      January 23, 2022, 01:03 AM
1    14         34      January 23, 2022, 09:41 AM
2    21         200     June 28, 2021, 08:07 PM
2    22         200     December 23, 2021, 03:20 PM
2    23         205     January 12, 2022, 10:15 AM
2    24         211     May 13, 2022, 04:02 AM

Per id, I would like to return the two latest entries of different dates (not just timestamp, the date part should be different):

id   audit_id   val     updated_at
1    12         34      October 11, 2021, 11:03 PM
1    14         34      January 23, 2022, 09:41 AM
2    23         205     January 12, 2022, 10:15 AM
2    24         211     May 13, 2022, 04:02 AM

I assume I need to use partition and lag on the audit_id but I don’t know how to start structuring it.

Advertisement

Answer

I would attack this in two parts. The first would make sure only the latest from a single date is kept. The second numbers the rows starting with the latest.

with by_day as (
  select *, 
         updated_at::date != 
           lag(updated_at::date) over (partition by id 
                                           order by updated_at desc) keep
    from imagined_data
), numbered as (
  select *, row_number() over (partition by id
                                   order by updated_at desc) as rn
    from by_day
   where coalesce(keep, true)
)
select id, audit_id, val, updated_at
  from numbered
 where rn <= 2;

db<>fiddle here

User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement