A process was causing duplicate rows in a table where there were not supposed to be any. There are several great answers to deleting duplicate rows online. But, what if those duplicates with ID primary keys all have data in other tables tied to them?
Is there a way to delete all duplicates in the first table and migrate all data tied to those keys to the single PK ID that wasn’t deleted?
For example:
TABLE 1 +-------+----------+----------+------------+ | ID(PK)| Model | ItemType | Color | +-------+----------+----------+------------+ | 1 | 4 | B | Red | | 2 | 4 | B | Red | | 3 | 5 | A | Blue | +-------+----------+----------+------------+ TABLE 2 +-------+----------+---------+ | ID(PK)| OtherID | Type | +-------+----------+---------+ | 1 | 1 | Type1 | | 2 | 1 | Type2 | | 3 | 2 | Type3 | | 4 | 2 | Type4 | | 5 | 2 | Type5 | +-------+----------+---------+
So I would theoretically want to delete the entry with ID: 2 from TABLE 1, and then have the OtherID fields in TABLE 2 switch to 1. This would actually be needed for X number of tables. This particular situation has 4 tables connected to its ID PK.
Advertisement
Answer
You cannot do this automatically. But you can do this with some queries. First, you set all the foreign keys to the correct id, which is presumably the smallest one:
with ids ( select t1.*, min(id) over (partition by Model, ItemType, Color) as min_id from table1 t1 ) update t2 set t2.otherid = ids.min_id from table2 t2 join ids on t2.otherid = ids.id where ids.id <> ids.min_id;
Then delete the ids that are either duplicated or not referenced in table2
(depending on which you actually want):
with ids ( select t1.*, min(id) over (partition by Model, ItemType, Color) as min_id from table1 t1 ) delete from ids where id <> min_id;
Note: If the database has concurrent users, you might want to put it in single user mode for this operation or lock the tables so they are not modified during these two operations.