Skip to content
Advertisement

How to delete rows that have duplicate column combination

I need to delete some rows that have the same column combination (except ID). The row that should be kept is the one that has the maximum reference date

ID      Column1 Column2   RefDate
GUID1   34578   BRKNRI    2018-05-03
GUID2   34578   BRKNRI    2018-05-02
GUID3   12381   BRSAPR    2018-05-03
GUID4   12381   BRSAPR    2018-05-02
GUID5   12381   BRSAPR    2018-05-01

So, after the query the table should be like this

ID      Column1 Column2   RefDate
GUID1   34578   BRKNRI    2018-05-03
GUID3   12381   BRSAPR    2018-05-03

I know that the query below will return the table I want, but I don’t know how to delete the “duplicate” entries and leave only the one with the maximum date.

SELECT Column1,
       Column2,
       max(RefDate) as MaxDate
FROM Table
GROUP BY Column1, Column2)

Advertisement

Answer

You can try to use delete JOIN

DELETE t1
FROM [Table] t1
INNER JOIN (
  SELECT Column1,
       Column2,
       max(RefDate) as MaxDate
  FROM [Table]
  GROUP BY Column1, Column2
) t2
  ON t1.Column1 = t2.Column1
 AND t1.Column2 = t2.Column2
 AND t1.RefDate <> t2.MaxDate

or use EXISTS subquery.

DELETE t1
FROM [Table] t1
WHERE EXISTS (
  SELECT 1
  FROM [Table] t2
  WHERE t1.Column1 = t2.Column1
  AND t1.Column2 = t2.Column2
  HAVING max(t2.RefDate) <> t1.RefDate
)

sqlfiddle

User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement