Skip to content
Advertisement

SQL – Merge two rows with similar values

I have the following table called areas in a PostgreSQL server:

| id | name    | code    | shape   | created  | cars
=======================================================
| 1  | name 1  | code 1  | shape 1 | created1 | cars 1
| 2  | name 1  | code 1  | shape 2 | created2 | NULL
| 3  | name 2  | code 2  | shape 3 | created3 | cars 2
| 4  | name 2  | code 2  | shape 4 | created4 | NULL
.
.
.

So basically it’s nearly duplicate entries that have the same name and code, but different id, shape and created timestamp. What I want is to merge those duplicate rows, but overwrite the “shape” and “created” fields with the values of the newest entry. So eventually the above table would look like the following :

| id | name    | code    | shape   | created  | cars
======================================================
| 1  | name 1  | code 1  | shape 2 | created2 | cars 1
| 3  | name 2  | code 2  | shape 4 | created4 | cars 2
.
.
.

Is there a way to achieve this? There is the limitation that the duplicate, newer, entries already exist in the table, so it’s not possible to just update the previous entries by inserting the new ones. I want to store afterwards the results in the same table.

Advertisement

Answer

Do it in two steps:

  • update the lower ids using the later ids
  • delete the later ids that are no longer needed.

UPDATE ztable dst
SET shape = src.shape
        , created = src.created
FROM ztable src
WHERE src.name = dst.name
AND src.code = dst.code
AND src.id > dst.id
        ;

DELETE FROM ztable del
WHERE EXISTS ( SELECT *
        FROM ztable x
        WHERE x.name = del.name
        AND x.code = del.code
        AND x.id < del.id
        );

If there are more than two duplicates you’ll have a problem: which of the upper ones should you use for the update?

User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement