Skip to content
Advertisement

Delete duplicated rows and Update references

How do I Delete duplicated rows in one Table and update References in another table to the remaining row? The duplication only occurs in the name. The Id Columns are Identity columns.

Example:

Assume we have two tables Doubles and Data.

Doubles table (
   Id int,
   Name varchar(50)
)

Data Table (
    Id int,
    DoublesId int
)

Now I Have Two entries in the Doubls table:

Id Name
1  Foo
2  Foo

And two entries in the Data Table:

ID DoublesId
1  1
2  2

At the end there should be only one entry in the Doubles Table:

Id Name
1  Foo

And two entries in the Data Table:

Id DoublesId
1  1
2  1 

In the doubles Table there can be any number of duplicated rows per name (up to 30) and also regular ‘single’ rows.

Advertisement

Answer

I’ve not run this, but hopefully it should be correct, and close enough to the final soln to get you there. Let me know any mistakes if you like and I’ll update the answer.

--updates the data table to the min ids for each name
update Data
set id = final_id
from
  Data
join
  Doubles 
on Doubles.id = Data.id
join
(
  select 
    name
    min(id) as final_id
  from Doubles
  group by name
) min_ids
on min_ids.name = Doubles.name

--deletes redundant ids from the Doubles table
delete 
from Doubles
where id not in
(
  select 
    min(id) as final_id
  from Doubles
  group by name
)
User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement