How to remove duplicate entries from a large Oracle table (200M rows, 20 columns)?
The below query from 2014 is slow. It took 2 minutes to delete 1 duplicate entry for one specific combination of columns (i.e. where col1 = 1 and .. col20 = 'Z'
).
DELETE sch.table1 WHERE rowid NOT IN (SELECT MIN(rowid) FROM sch.table1 GROUP BY col1, col2, col3, col4,.. ., col20)
Any way to speed it up, e.g. with indexing?
Advertisement
Answer
Rather than using an anti-join (and finding the non-matching ROWID
and then deleting all the others), you can use the ROW_NUMBER
analytic function to directly find the ROWID
s to delete:
DELETE FROM sch.table1 t WHERE EXISTS( SELECT 1 FROM ( SELECT ROW_NUMBER() OVER ( PARTITION BY col1, col2, col3, col4, ..., col20 ORDER BY rowid ) AS rn FROM sch.table1 ) x WHERE x.ROWID = t.ROWID AND x.rn > 1 );
or:
DELETE FROM sch.table1 t WHERE ROWID IN ( SELECT ROWID FROM ( SELECT ROW_NUMBER() OVER ( PARTITION BY col1, col2, col3, col4, ..., col20 ORDER BY rowid ) AS rn FROM sch.table1 ) WHERE rn > 1 );