Skip to content
Advertisement

Is it possible to delete duplicate rows from a table without knowing it’s columns?

Is there a way to delete all the duplicate rows from an arbitrary table in a database when you don’t know the table’s structure (it’s columns) using only plain SQL (without using PL/SQL)?

-- The code should look something like this
DELETE FROM tableName -- tableName can be changed
WHERE ...

So, as an example, for the following table:

.-------.
| A | B |
|---|---|
| 1 | 1 | -- (1)
| 2 | 1 | -- (2)
| 2 | 1 | -- (3) duplicate of (2)
| 3 | 2 | -- (4)
| 3 | 2 | -- (5) duplicate of (4)
| 3 | 2 | -- (6) duplicate of (4)
'-------'

the result should be as follows:

.-------.
| A | B |
|---|---|
| 1 | 1 | -- (1)
| 2 | 1 | -- (2)
| 3 | 2 | -- (4)
'-------'

Advertisement

Answer

SELECT distinct * from TableA INTO TableB; 
truncate TableA;
insert into TableA select * from tableB;
drop tableB;

Additional answers from the comments:

Establish a CTE (construct this dynamically using parameterized SQL and information schema)

;WITH cteExample (val1, val2, RN) as 
( select *, ROW_NUMBER() over (PARTITION by val1, val2 order by val1, val2) as RN from tableA) 
-- select * from cteExample -- to verify results
delete from cteExample where RN > 1 -- to delete duplicate records

Utilize distinct if you know the tables will have a primary key (as they should)

;WITH cteExample as (select distinct * from TableA) delete a from cteExample b right join TableA a on b.primaryKey = a.PrimaryKey where b.PrimaryKey is null;

Lastly, skip the CTE entirely if you want:

delete a from TableA a left join (select distinct * from TableA) b on a.PK = b.PK 
where b.PK is null

All of these statements would be best served constructed dynamically using a stored procedure or the equivalent of exec sp_executeSQL (if avail) in order to grab the column names without knowledge of the columns ahead of time.

User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement