Skip to content
Advertisement

Deleting duplicate rows with primary keys that are connected to other tables

A process was causing duplicate rows in a table where there were not supposed to be any. There are several great answers to deleting duplicate rows online. But, what if those duplicates with ID primary keys all have data in other tables tied to them?

Is there a way to delete all duplicates in the first table and migrate all data tied to those keys to the single PK ID that wasn’t deleted?

For example:

TABLE 1
+-------+----------+----------+------------+
| ID(PK)| Model    | ItemType | Color      |
+-------+----------+----------+------------+
| 1     | 4        | B        | Red        |
| 2     | 4        | B        | Red        |
| 3     | 5        | A        | Blue       |
+-------+----------+----------+------------+

TABLE 2
+-------+----------+---------+
| ID(PK)| OtherID  | Type    |
+-------+----------+---------+
| 1     | 1        | Type1   |
| 2     | 1        | Type2   |
| 3     | 2        | Type3   |
| 4     | 2        | Type4   |
| 5     | 2        | Type5   |
+-------+----------+---------+

So I would theoretically want to delete the entry with ID: 2 from TABLE 1, and then have the OtherID fields in TABLE 2 switch to 1. This would actually be needed for X number of tables. This particular situation has 4 tables connected to its ID PK.

Advertisement

Answer

You cannot do this automatically. But you can do this with some queries. First, you set all the foreign keys to the correct id, which is presumably the smallest one:

with ids (
      select t1.*, min(id) over (partition by Model, ItemType, Color) as min_id
      from table1 t1
     )
update t2
    set t2.otherid = ids.min_id
    from table2 t2 join
         ids
         on t2.otherid = ids.id
    where ids.id <> ids.min_id; 

Then delete the ids that are either duplicated or not referenced in table2 (depending on which you actually want):

with ids (
      select t1.*, min(id) over (partition by Model, ItemType, Color) as min_id
      from table1 t1
     )
delete from ids
    where id <> min_id;

Note: If the database has concurrent users, you might want to put it in single user mode for this operation or lock the tables so they are not modified during these two operations.

User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement