Skip to content
Advertisement

How to remove rest of the rows with the same ID starting from the first duplicate?

I have the following structure for the table DataTable: every column is of the datatype int, RowID is an identity column and the primary key. LinkID is a foreign key and links to rows of an other table.

RowID   LinkID   Order  Data    DataSpecifier
1       120      1      1       1
2       120      2      1       3
3       120      3      1       10
4       120      4      1       13
5       120      5      1       10
6       120      6      1       13
7       371      1      6       2
8       371      2      3       5
9       371      3      8       1
10      371      4      10      1
11      371      5      7       2
12      371      6      3       3
13      371      7      7       2
14      371      8      17      4
.................................
.................................

I’m trying to do a query which alters every LinkID batch in the following way:

  • Take every row with same LinkID (e.g. the first batch is the first 6 rows here)
  • Order them by the Order column
  • Look at Data and DataSpecifier columns as one compare unit (They can be thought as one column, called dataunit):
    • Keep as many rows from Order 1 onwards, until a duplicate dataunit comes by
    • Delete every row from that first duplicate onwards for that LinkID

So for the LinkID 120:

  • Sort the batch (already sorted here, but should still do it)
  • Start looking from the top (So Order=1 here), go as long as you don’t see a duplicate.
  • Stop at the first duplicate Order = 5 (dataunit 1 10 was already seen).
  • Delete everything which has the LinkID=120 AND Order>=5

After similar process for LinkID 371 (and every other LinkID in the table), the processed table will look like this:

RowID   LinkID   Order  Data    DataSpecifier
1       120      1      1       1
2       120      2      1       3
3       120      3      1       10
4       120      4      1       13
7       371      1      6       2
8       371      2      3       5
9       371      3      8       1
10      371      4      10      1
11      371      5      7       2
12      371      6      3       3
.................................
.................................

I’ve done quite a lot of SQL queries, but never something this complicated. I know I need to use a query which is something like this:

DELETE FROM DataTable  
WHERE RowID IN (SELECT RowID
                FROM DataTable
                WHERE -- ?
                GROUP BY LinkID
                HAVING COUNT(*) > 1 -- ?
                ORDER BY [Order]);

But I just can’t seem to wrap my head around this and get the query right. I would preferably do this in pure SQL, with one executable (and reusable) query.

Advertisement

Answer

You can use the ROW_NUMBER() window function to identify any rows that come after the original. After that you can delete and rows with a matching LinkID and a greater than or equal to any encountered Order with a row number greater than one.

(I originally used a second CTE to get the MIN order, but I realized that it wasn’t necessary as long as the join to order was greater than equal to any order where there was a second instance of the DataUnitId. By removing the MIN the query plan became quite simple and efficient.)

WITH DataUnitInstances AS (
  SELECT *
    , ROW_NUMBER() OVER
      (PARTITION BY LinkID, [Data], [DataSpecifier] ORDER BY [Order]) DataUnitInstanceId
  FROM DataTable
)
DELETE FROM DataTable
FROM DataTable dt
INNER JOIN DataUnitInstances dup ON dup.LinkID = dt.LinkID 
  AND dup.[Order] <= dt.[Order]
  AND dup.DataUnitInstanceId > 1

Here is the output from your sample data which matches your desired result:

+-------+--------+-------+------+---------------+
| RowID | LinkID | Order | Data | DataSpecifier |
+-------+--------+-------+------+---------------+
| 1     | 120    | 1     | 1    | 1             |
| 2     | 120    | 2     | 1    | 3             |
| 3     | 120    | 3     | 1    | 10            |
| 4     | 120    | 4     | 1    | 13            |
| 7     | 371    | 1     | 6    | 2             |
| 8     | 371    | 2     | 3    | 5             |
| 9     | 371    | 3     | 8    | 1             |
| 10    | 371    | 4     | 10   | 1             |
| 11    | 371    | 5     | 7    | 2             |
| 12    | 371    | 6     | 3    | 3             |
+-------+--------+-------+------+---------------+
User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement