Filter and keep most recent duplicate

Please help me with this one, I’m stuck and cant figure out how to write my Query. I’m working with SQL Server 2014.

Table A (approx 65k ROWS) CEID = primary key

 CEID     State    Checksum
    1        2          666
    2        2          666
    3        2          666
    4        2          333
    5        2          333
    6        9          333
    7        9          111
    8        9          111
    9        9          741
   10        2          656

Desired output

 CEID     State    Checksum
    3        2          666
    6        9          333
    8        9          111
    9        9          741
   10        2          656

I want to keep the row with highest CEID if “state” is equal for all duplicate checksums. If state differs but Checksum is equal i want to keep the row with highest CEID for State=9. Unique rows like CEID 9 and 10 should be included in result regardless of State.

This join returns all duplicates:

SELECT a1.*, a2.*
FROM  tableA a1  
INNER JOIN tableA a2 ON a1.ChecksumI = a2.ChecksumI
                     AND a1.CEID <> a2.CEID

I’ve also identified MAX(CEID) for each duplicate checksum with this query

SELECT a.Checksum, a.State, MAX(a.CEID) CEID_MAX ,COUNT(*) cnt
FROM tableA a
GROUP BY a.Checksum, a.State
HAVING COUNT(*) > 1
ORDER BY a.Checksum, a.State

With the first query, I can’t figure out how to SELECT the row with the highest CEID per Checksum.

The problem I encounter with last one is that GROUP BY isn’t allowed in subqueries when I try to join on it.

Answer

You can use row_number() with partition by checksum and order by State desc and CEID desc. Note both of your condition may be fulfill by ORDER BY State desc, CEID desc

And take the first row_number

;with 
cte as
(
    select  *, rn = row_number() over (Partition by Checksum order by State desc, CEID desc)
    from    TableA
)
select  *
from    cte
where   rn = 1
order by CEID;

Advertisement

Answer