Skip to content
Advertisement

How to remove rest of the rows with the same ID starting from the first duplicate?

I have the following structure for the table DataTable: every column is of the datatype int, RowID is an identity column and the primary key. LinkID is a foreign key and links to rows of an other table.

I’m trying to do a query which alters every LinkID batch in the following way:

  • Take every row with same LinkID (e.g. the first batch is the first 6 rows here)
  • Order them by the Order column
  • Look at Data and DataSpecifier columns as one compare unit (They can be thought as one column, called dataunit):
    • Keep as many rows from Order 1 onwards, until a duplicate dataunit comes by
    • Delete every row from that first duplicate onwards for that LinkID

So for the LinkID 120:

  • Sort the batch (already sorted here, but should still do it)
  • Start looking from the top (So Order=1 here), go as long as you don’t see a duplicate.
  • Stop at the first duplicate Order = 5 (dataunit 1 10 was already seen).
  • Delete everything which has the LinkID=120 AND Order>=5

After similar process for LinkID 371 (and every other LinkID in the table), the processed table will look like this:

I’ve done quite a lot of SQL queries, but never something this complicated. I know I need to use a query which is something like this:

But I just can’t seem to wrap my head around this and get the query right. I would preferably do this in pure SQL, with one executable (and reusable) query.

Advertisement

Answer

You can use the ROW_NUMBER() window function to identify any rows that come after the original. After that you can delete and rows with a matching LinkID and a greater than or equal to any encountered Order with a row number greater than one.

(I originally used a second CTE to get the MIN order, but I realized that it wasn’t necessary as long as the join to order was greater than equal to any order where there was a second instance of the DataUnitId. By removing the MIN the query plan became quite simple and efficient.)

Here is the output from your sample data which matches your desired result:

User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement