SQL – Merge two rows with similar values

I have the following table called areas in a PostgreSQL server:

| id | name    | code    | shape   | created  | cars
=======================================================
| 1  | name 1  | code 1  | shape 1 | created1 | cars 1
| 2  | name 1  | code 1  | shape 2 | created2 | NULL
| 3  | name 2  | code 2  | shape 3 | created3 | cars 2
| 4  | name 2  | code 2  | shape 4 | created4 | NULL
.
.
.

​x
 
| id | name    | code    | shape   | created  | cars=======================================================| 1  | name 1  | code 1  | shape 1 | created1 | cars 1| 2  | name 1  | code 1  | shape 2 | created2 | NULL| 3  | name 2  | code 2  | shape 3 | created3 | cars 2| 4  | name 2  | code 2  | shape 4 | created4 | NULL...​

So basically it’s nearly duplicate entries that have the same name and code, but different id, shape and created timestamp. What I want is to merge those duplicate rows, but overwrite the “shape” and “created” fields with the values of the newest entry. So eventually the above table would look like the following :

| id | name    | code    | shape   | created  | cars
======================================================
| 1  | name 1  | code 1  | shape 2 | created2 | cars 1
| 3  | name 2  | code 2  | shape 4 | created4 | cars 2
.
.
.

 
| id | name    | code    | shape   | created  | cars======================================================| 1  | name 1  | code 1  | shape 2 | created2 | cars 1| 3  | name 2  | code 2  | shape 4 | created4 | cars 2...​

Is there a way to achieve this? There is the limitation that the duplicate, newer, entries already exist in the table, so it’s not possible to just update the previous entries by inserting the new ones. I want to store afterwards the results in the same table.

Answer

Do it in two steps:

update the lower ids using the later ids
delete the later ids that are no longer needed.

UPDATE ztable dst
SET shape = src.shape
        , created = src.created
FROM ztable src
WHERE src.name = dst.name
AND src.code = dst.code
AND src.id > dst.id
        ;

DELETE FROM ztable del
WHERE EXISTS ( SELECT *
        FROM ztable x
        WHERE x.name = del.name
        AND x.code = del.code
        AND x.id < del.id
        );

 
UPDATE ztable dstSET shape = src.shape        , created = src.createdFROM ztable srcWHERE src.name = dst.nameAND src.code = dst.codeAND src.id > dst.id        ;​DELETE FROM ztable delWHERE EXISTS ( SELECT *        FROM ztable x        WHERE x.name = del.name        AND x.code = del.code        AND x.id < del.id        );​

If there are more than two duplicates you’ll have a problem: which of the upper ones should you use for the update?

Advertisement

Answer