I have this query:
SELECT
T1.ID_NUMBER,
T1.INCEPTION_DATE,
T1.OCCURRENCE,
T1.TRANSACTION_DATE,
T1.FILE_LOAD_DATE,
T1.BATCH_NUM
FROM
mastertable T1
INNER JOIN
(SELECT
ID_NUMBER, INCEPTION_DATE, OCCURRENCE,
COUNT(*) AS DUPL_COUNT
FROM
mastertable
WHERE
SOURCE_SYSTEM ='LEGACY'
GROUP BY
ID_NUMBER, INCEPTION_DATE, OCCURRENCE
HAVING
COUNT(*) > 1) t2 ON T2.ID_NUMBER = T1.ID_NUMBER
AND T2.INCEPTION_DATE = T1.INCEPTION_DATE
AND T2.OCCURRENCE= T1.OCCURRENCE
ORDER BY
1, 2, 3, 4, 5
Which is returning the following results
ID_NUMBER | INCEPTION_DATE | OCCURRENCE | TRANSACTION_DATE | FILE_LOAD_DATE | BATCH_NUM |
---|---|---|---|---|---|
112897732 | 2008-09-15 | 4 | 2008-07-03 | 2008-07-07 17:57:19 | 06341 |
112897732 | 2008-09-15 | 4 | 2008-07-13 | 2008-07-18 03:35:55 | 06753 |
828194721 | 2008-11-11 | 1 | 2008-09-06 | 2008-09-17 02:50:44 | 97334 |
828194721 | 2008-11-11 | 1 | 2008-09-23 | 2008-09-24 02:55:27 | 98331 |
456457422 | 2008-09-28 | 1 | 2008-12-03 | 2008-07-13 08:08:39 | 00734 |
456457422 | 2008-09-28 | 1 | 2008-12-03 | 2008-07-18 13:35:55 | 00991 |
999272910 | 2008-05-07 | 3 | 2008-05-03 | 2008-10-13 08:08:38 | 11432 |
999272910 | 2008-05-07 | 3 | 2008-05-28 | 2008-10-18 03:35:55 | 13342 |
875328642 | 2008-03-01 | 3 | 2008-04-28 | 2008-01-23 08:08:38 | 74542 |
875328642 | 2008-03-01 | 3 | 2008-04-30 | 2008-01-25 12:55:11 | 77536 |
011028734 | 2008-07-12 | 2 | 2008-12-03 | 2008-08-07 11:57:03 | 23422 |
011028734 | 2008-07-12 | 2 | 2008-12-03 | 2008-08-11 17:23:29 | 25748 |
018264981 | 2008-07-09 | 0 | 2008-12-03 | 2008-12-07 02:18:12 | 00432 |
018264981 | 2008-07-09 | 0 | 2008-12-03 | 2008-12-11 17:44:19 | 00773 |
The earlier FILE_LOAD_DATE
of each ID_NUMBER
or lesser BATCH_NUM
is the record I want to keep.
Is there a way to write a query that will delete the other records, perhaps using a CTE with ROW_NUMBER()
?
I am hoping for something that is DRY in case this issue happens again. Thanks!
(Also if it isn’t too much trouble please explain how the solution works)
Advertisement
Answer
You could use a deletable CTE here:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY ID_NUMBER, INCEPTION_DATE, OCCURRENCE
ORDER BY FILE_LOAD_DATE, BATCH_NUM) rn
FROM mastertable
WHERE SOURCE_SYSTEM = 'LEGACY'
)
DELETE
FROM cte
WHERE rn > 1;
The logic is to assign a row number to each group of records having the same values for ID_NUMBER
, INCEPTION_DATE
, and OCCURRENCE
. The first row number value of 1 will be assigned to the record having the earliest FILE_LOAD_DATE
. In cases of two or more records tied for the earliest FILE_LOAD_DATE
, the tie breaker will be determined by the earliest BATCH_NUM
.
The delete statement removes all records except for this earliest record.