I have the following structure for the table DataTable
: every column is of the datatype int, RowID
is an identity column and the primary key. LinkID
is a foreign key and links to rows of an other table.
RowID LinkID Order Data DataSpecifier 1 120 1 1 1 2 120 2 1 3 3 120 3 1 10 4 120 4 1 13 5 120 5 1 10 6 120 6 1 13 7 371 1 6 2 8 371 2 3 5 9 371 3 8 1 10 371 4 10 1 11 371 5 7 2 12 371 6 3 3 13 371 7 7 2 14 371 8 17 4 ................................. .................................
I’m trying to do a query which alters every LinkID
batch in the following way:
- Take every row with same
LinkID
(e.g. the first batch is the first 6 rows here) - Order them by the
Order
column - Look at
Data
andDataSpecifier
columns as one compare unit (They can be thought as one column, calleddataunit
): - Keep as many rows from
Order=1
onwards, until adataunit
comes by which appears more than one time in the batch - Keep that final row, but delete rest of the rows with same
LinkID
and greaterOrder
value
So for the LinkID
120
:
- Sort the batch by the
Order
column (already sorted here, but should still do it) - Start looking from the top (So
Order=1
here), go as long as you don’t see a value which appears more than 1 time in the batch - Stop at the first duplicate
Order=3
(dataunit
1 10
is also onOrder
5
). - Delete everything which has the
LinkID=120 AND Order>=4
After similar process for LinkID
371
(and every other LinkID
in the table), the processed table will look like this:
RowID LinkID Order Data DataSpecifier 1 120 1 1 1 2 120 2 1 3 3 120 3 1 10 7 371 1 6 2 8 371 2 3 5 9 371 3 8 1 10 371 4 10 1 11 371 5 7 2 ................................. .................................
I’ve never done an SQL query which this complicated. I know the query has to be something like this:
DELETE FROM DataTable WHERE RowID IN (SELECT RowID FROM DataTable WHERE -- ? GROUP BY LinkID HAVING COUNT(*) > 1 -- ? ORDER BY [Order]);
but I just can’t seem to wrap my head around this and get the query right. I would preferably do this in pure SQL, with one executable (and reusable) query.
I asked a very similar question here: How to remove rest of the rows with the same ID starting from the first duplicate?
But since I realized that my original filtering logic in the question was not actually what I needed and that question had already been answered correctly, I had to make this new question.
Advertisement
Answer
Here, my previous solution updated. A few GROUP BY
should be enough. Code is simple enough to be understood by reading.
Set up:
IF OBJECT_ID('tempdb..#YourData') IS NOT NULL DROP TABLE #YourData CREATE TABLE #YourData ( RowID INT, LinkID INT, [Order] INT, Data INT, DataSpecifier INT) INSERT INTO #YourData ( RowID, LinkID, [Order], Data, DataSpecifier) VALUES ('1', ' 120', '1', '1', ' 1'), ('2', ' 120', '2', '1', ' 3'), ('3', ' 120', '3', '1', ' 10'), ('4', ' 120', '4', '1', ' 13'), ('5', ' 120', '5', '1', ' 10'), ('6', ' 120', '6', '1', ' 13'), ('7', ' 371', '1', '6', ' 2'), ('8', ' 371', '2', '3', ' 5'), ('9', ' 371', '3', '8', ' 1'), ('10', '371', '4', '10', '1'), ('11', '371', '5', '7', ' 2'), ('12', '371', '6', '3', ' 3'), ('13', '371', '7', '7', ' 2'), ('14', '371', '8', '17', '4')
Solution:
;WITH DuplicatesByLinkID AS ( SELECT Y.LinkID, Y.Data, Y.DataSpecifier, [Order] = MIN([Order]) FROM #YourData AS Y GROUP BY Y.LinkID, Y.Data, Y.DataSpecifier HAVING COUNT(*) > 1 ), FirstDuplicateByLinkID AS ( SELECT D.LinkID, MinOrder = MIN(D.[Order]) FROM DuplicatesByLinkID AS D GROUP BY D.LinkID ) DELETE Y FROM #YourData AS Y INNER JOIN FirstDuplicateByLinkID AS M ON Y.LinkID = M.LinkID AND Y.[Order] > M.MinOrder SELECT * FROM #YourData
Result:
RowID LinkID Order Data DataSpecifier 1 120 1 1 1 2 120 2 1 3 3 120 3 1 10 7 371 1 6 2 8 371 2 3 5 9 371 3 8 1 10 371 4 10 1 11 371 5 7 2