I have this query:
SELECT T1.ID_NUMBER, T1.INCEPTION_DATE, T1.OCCURRENCE, T1.TRANSACTION_DATE, T1.FILE_LOAD_DATE, T1.BATCH_NUM FROM mastertable T1 INNER JOIN (SELECT ID_NUMBER, INCEPTION_DATE, OCCURRENCE, COUNT(*) AS DUPL_COUNT FROM mastertable WHERE SOURCE_SYSTEM ='LEGACY' GROUP BY ID_NUMBER, INCEPTION_DATE, OCCURRENCE HAVING COUNT(*) > 1) t2 ON T2.ID_NUMBER = T1.ID_NUMBER AND T2.INCEPTION_DATE = T1.INCEPTION_DATE AND T2.OCCURRENCE= T1.OCCURRENCE ORDER BY 1, 2, 3, 4, 5
Which is returning the following results
ID_NUMBER | INCEPTION_DATE | OCCURRENCE | TRANSACTION_DATE | FILE_LOAD_DATE | BATCH_NUM |
---|---|---|---|---|---|
112897732 | 2008-09-15 | 4 | 2008-07-03 | 2008-07-07 17:57:19 | 06341 |
112897732 | 2008-09-15 | 4 | 2008-07-13 | 2008-07-18 03:35:55 | 06753 |
828194721 | 2008-11-11 | 1 | 2008-09-06 | 2008-09-17 02:50:44 | 97334 |
828194721 | 2008-11-11 | 1 | 2008-09-23 | 2008-09-24 02:55:27 | 98331 |
456457422 | 2008-09-28 | 1 | 2008-12-03 | 2008-07-13 08:08:39 | 00734 |
456457422 | 2008-09-28 | 1 | 2008-12-03 | 2008-07-18 13:35:55 | 00991 |
999272910 | 2008-05-07 | 3 | 2008-05-03 | 2008-10-13 08:08:38 | 11432 |
999272910 | 2008-05-07 | 3 | 2008-05-28 | 2008-10-18 03:35:55 | 13342 |
875328642 | 2008-03-01 | 3 | 2008-04-28 | 2008-01-23 08:08:38 | 74542 |
875328642 | 2008-03-01 | 3 | 2008-04-30 | 2008-01-25 12:55:11 | 77536 |
011028734 | 2008-07-12 | 2 | 2008-12-03 | 2008-08-07 11:57:03 | 23422 |
011028734 | 2008-07-12 | 2 | 2008-12-03 | 2008-08-11 17:23:29 | 25748 |
018264981 | 2008-07-09 | 0 | 2008-12-03 | 2008-12-07 02:18:12 | 00432 |
018264981 | 2008-07-09 | 0 | 2008-12-03 | 2008-12-11 17:44:19 | 00773 |
The earlier FILE_LOAD_DATE
of each ID_NUMBER
or lesser BATCH_NUM
is the record I want to keep.
Is there a way to write a query that will delete the other records, perhaps using a CTE with ROW_NUMBER()
?
I am hoping for something that is DRY in case this issue happens again. Thanks!
(Also if it isn’t too much trouble please explain how the solution works)
Advertisement
Answer
You could use a deletable CTE here:
WITH cte AS ( SELECT *, ROW_NUMBER() OVER (PARTITION BY ID_NUMBER, INCEPTION_DATE, OCCURRENCE ORDER BY FILE_LOAD_DATE, BATCH_NUM) rn FROM mastertable WHERE SOURCE_SYSTEM = 'LEGACY' ) DELETE FROM cte WHERE rn > 1;
The logic is to assign a row number to each group of records having the same values for ID_NUMBER
, INCEPTION_DATE
, and OCCURRENCE
. The first row number value of 1 will be assigned to the record having the earliest FILE_LOAD_DATE
. In cases of two or more records tied for the earliest FILE_LOAD_DATE
, the tie breaker will be determined by the earliest BATCH_NUM
.
The delete statement removes all records except for this earliest record.