Is there a way to delete all the duplicate rows from an arbitrary table in a database when you don’t know the table’s structure (it’s columns) using only plain SQL (without using PL/SQL)?
x
-- The code should look something like this
DELETE FROM tableName -- tableName can be changed
WHERE
So, as an example, for the following table:
.-------.
| A | B |
|---|---|
| 1 | 1 | -- (1)
| 2 | 1 | -- (2)
| 2 | 1 | -- (3) duplicate of (2)
| 3 | 2 | -- (4)
| 3 | 2 | -- (5) duplicate of (4)
| 3 | 2 | -- (6) duplicate of (4)
'-------'
the result should be as follows:
.-------.
| A | B |
|---|---|
| 1 | 1 | -- (1)
| 2 | 1 | -- (2)
| 3 | 2 | -- (4)
'-------'
Advertisement
Answer
SELECT distinct * from TableA INTO TableB;
truncate TableA;
insert into TableA select * from tableB;
drop tableB;
Additional answers from the comments:
Establish a CTE (construct this dynamically using parameterized SQL and information schema)
;WITH cteExample (val1, val2, RN) as
( select *, ROW_NUMBER() over (PARTITION by val1, val2 order by val1, val2) as RN from tableA)
-- select * from cteExample -- to verify results
delete from cteExample where RN > 1 -- to delete duplicate records
Utilize distinct if you know the tables will have a primary key (as they should)
;WITH cteExample as (select distinct * from TableA) delete a from cteExample b right join TableA a on b.primaryKey = a.PrimaryKey where b.PrimaryKey is null;
Lastly, skip the CTE entirely if you want:
delete a from TableA a left join (select distinct * from TableA) b on a.PK = b.PK
where b.PK is null
All of these statements would be best served constructed dynamically using a stored procedure or the equivalent of exec sp_executeSQL (if avail) in order to grab the column names without knowledge of the columns ahead of time.