I have the following table called areas in a PostgreSQL server:
| id | name | code | shape | created | cars ======================================================= | 1 | name 1 | code 1 | shape 1 | created1 | cars 1 | 2 | name 1 | code 1 | shape 2 | created2 | NULL | 3 | name 2 | code 2 | shape 3 | created3 | cars 2 | 4 | name 2 | code 2 | shape 4 | created4 | NULL . . .
So basically it’s nearly duplicate entries that have the same name and code, but different id, shape and created timestamp. What I want is to merge those duplicate rows, but overwrite the “shape” and “created” fields with the values of the newest entry. So eventually the above table would look like the following :
| id | name | code | shape | created | cars ====================================================== | 1 | name 1 | code 1 | shape 2 | created2 | cars 1 | 3 | name 2 | code 2 | shape 4 | created4 | cars 2 . . .
Is there a way to achieve this? There is the limitation that the duplicate, newer, entries already exist in the table, so it’s not possible to just update the previous entries by inserting the new ones. I want to store afterwards the results in the same table.
Advertisement
Answer
Do it in two steps:
- update the lower ids using the later ids
- delete the later ids that are no longer needed.
UPDATE ztable dst SET shape = src.shape , created = src.created FROM ztable src WHERE src.name = dst.name AND src.code = dst.code AND src.id > dst.id ; DELETE FROM ztable del WHERE EXISTS ( SELECT * FROM ztable x WHERE x.name = del.name AND x.code = del.code AND x.id < del.id );
If there are more than two duplicates you’ll have a problem: which of the upper ones should you use for the update?