Skip to content
Advertisement

Speed up removal of duplicates in Oracle with indexing

How to remove duplicate entries from a large Oracle table (200M rows, 20 columns)?

The below query from 2014 is slow. It took 2 minutes to delete 1 duplicate entry for one specific combination of columns (i.e. where col1 = 1 and .. col20 = 'Z').

DELETE sch.table1
 WHERE rowid NOT IN
       (SELECT MIN(rowid)
          FROM sch.table1
         GROUP BY col1, col2, col3, col4,.. ., col20)

Any way to speed it up, e.g. with indexing?

Advertisement

Answer

Rather than using an anti-join (and finding the non-matching ROWID and then deleting all the others), you can use the ROW_NUMBER analytic function to directly find the ROWIDs to delete:

DELETE FROM sch.table1 t
WHERE EXISTS(
  SELECT 1
  FROM   (
    SELECT ROW_NUMBER() OVER (
             PARTITION BY col1, col2, col3, col4, ..., col20
             ORDER BY rowid
           ) AS rn
    FROM   sch.table1
  ) x
  WHERE x.ROWID = t.ROWID
  AND   x.rn > 1
);

or:

DELETE FROM sch.table1 t
WHERE ROWID IN (
  SELECT ROWID
  FROM   (
    SELECT ROW_NUMBER() OVER (
             PARTITION BY col1, col2, col3, col4, ..., col20
             ORDER BY rowid
           ) AS rn
    FROM   sch.table1
  )
  WHERE rn > 1
);

fiddle

User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement