Please help me with this one, I’m stuck and cant figure out how to write my Query. I’m working with SQL Server 2014.
Table A (approx 65k ROWS) CEID = primary key
CEID State Checksum
1 2 666
2 2 666
3 2 666
4 2 333
5 2 333
6 9 333
7 9 111
8 9 111
9 9 741
10 2 656
Desired output
CEID State Checksum
3 2 666
6 9 333
8 9 111
9 9 741
10 2 656
I want to keep the row with highest CEID if “state” is equal for all duplicate checksums. If state differs but Checksum is equal i want to keep the row with highest CEID for State=9. Unique rows like CEID 9 and 10 should be included in result regardless of State.
This join returns all duplicates:
SELECT a1.*, a2.*
FROM tableA a1
INNER JOIN tableA a2 ON a1.ChecksumI = a2.ChecksumI
AND a1.CEID <> a2.CEID
I’ve also identified MAX(CEID) for each duplicate checksum with this query
SELECT a.Checksum, a.State, MAX(a.CEID) CEID_MAX ,COUNT(*) cnt FROM tableA a GROUP BY a.Checksum, a.State HAVING COUNT(*) > 1 ORDER BY a.Checksum, a.State
With the first query, I can’t figure out how to SELECT the row with the highest CEID per Checksum.
The problem I encounter with last one is that GROUP BY isn’t allowed in subqueries when I try to join on it.
Advertisement
Answer
You can use row_number() with partition by checksum and order by State desc and CEID desc. Note both of your condition may be fulfill by ORDER BY State desc, CEID desc
And take the first row_number
;with
cte as
(
select *, rn = row_number() over (Partition by Checksum order by State desc, CEID desc)
from TableA
)
select *
from cte
where rn = 1
order by CEID;