Skip to content
Advertisement

Select only newer records among the result of a query that extracts duplicate rows in a db

I have a db where I can have multiple rows with same field “aid” (it’s a sort of historical changelog of same items). With the following query, I’m extracting rows with same aid orderd by descending date:

select aid,data_ril,specie,id from scu.censimento t1
where (select count(*) from scu.censimento t2
where t1.aid = t2.aid) > 1
order by aid, data_ril desc

That should be a child table related to a father layer (made by a postgres materialized view that shows only newest aid records) and the following is the result I get:

    +------+------------+--------+------+
    | aid  | data_ril   | specie | id   |
    +------+------------+--------+------+
    | 349  | 2020-06-18 | 35     | 349  |
    +------+------------+--------+------+
    | 349  | 2020-06-17 | 35     | 2004 |
    +------+------------+--------+------+
    | 700  | 2020-08-07 | 58     | 700  |
    +------+------------+--------+------+
    | 700  | 2020-07-06 | 58     | 2006 |
    +------+------------+--------+------+
    | 700  | 2020-05-02 | 15     | 1956 |
    +------+------------+--------+------+
    | 1316 | 2020-09-02 | 1      | 1316 |
    +------+------------+--------+------+
    | 1316 | 2020-08-27 | 1      | 2005 |
    +------+------------+--------+------+
    | 1317 | 2020-09-02 | 2      | 1317 |
    +------+------------+--------+------+
    | 1317 | 2020-08-27 | 2      | 1996 |
    +------+------------+--------+------+
    | 1481 | 2020-12-03 | 21     | 2112 |
    +------+------------+--------+------+
    | 1481 | 2020-09-08 | 49     | 1481 |
    +------+------------+--------+------+
    | 1492 | 2020-09-28 | 6      | 1492 |
    +------+------------+--------+------+
    | 1492 | 2020-09-08 | 6      | 1999 |
    +------+------------+--------+------+
    | 1688 | 2020-11-03 | 72     | 1688 |
    +------+------------+--------+------+
    | 1688 | 2020-10-08 | 72     | 2000 |
    +------+------------+--------+------+

I’d like to know the SQL syntax to modify the above query in order to show all duplicate rows except the ones with latest dates, so that I can have a table like the following:

+------+------------+--------+------+
| aid  | data_ril   | specie | id   |
+------+------------+--------+------+
| 349  | 2020-06-17 | 35     | 2004 |
+------+------------+--------+------+
| 700  | 2020-07-06 | 58     | 2006 |
+------+------------+--------+------+
| 700  | 2020-05-02 | 15     | 1956 |
+------+------------+--------+------+
| 1316 | 2020-08-27 | 1      | 2005 |
+------+------------+--------+------+
| 1317 | 2020-08-27 | 2      | 1996 |
+------+------------+--------+------+
| 1481 | 2020-09-08 | 49     | 1481 |
+------+------------+--------+------+
| 1492 | 2020-09-08 | 6      | 1999 |
+------+------------+--------+------+
| 1688 | 2020-10-08 | 72     | 2000 |
+------+------------+--------+------+

Thanks in advance.

Advertisement

Answer

You could do this with window functions. The idea is to rank records having the same aid by descending data_ril, and then filter out the top record per group.

select aid, data_ril, specie, id
from (
    select t.*, 
        row_number() over(partition by aid order by data_ril desc) rn
    from mytable t 
) t
where rn > 1
order by aid, data_ril
User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement