Skip to content
Advertisement

Removing duplicates from Join with grouped result

I use an sql server with EF Core. Due to lack of GROUP BY support by EF I run raw sql query.

So, I have a history table

+------------+---------+------------------+
| HardwareId | StateId |    Timestamp     |
+------------+---------+------------------+
| Dev1       |       1 | 2019-10-11 11:00 |
| Dev2       |       2 | 2019-10-11 11:01 |
| Dev1       |       3 | 2019-10-11 11:09 |
| Dev2       |       1 | 2019-10-11 11:10 |
| Dev2       |       2 | 2019-10-11 11:10 |
| Dev1       |       3 | 2019-10-11 11:11 |
+------------+---------+------------------+

Here some Device records can have the same timestamps.

Now I want to have the last state of each device:

+------------+---------+------------------+
| HardwareId | StateId |    Timestamp     |
+-----==-----+---------+------------------+
| Dev2       |       1 | 2019-10-11 11:10 |
| Dev1       |       3 | 2019-10-11 11:11 |
+------------+---------+------------------+

I run

SELECT H.TimeStamp, H.HardwareId, H.ErrorCode, SD.Description FROM History H
INNER JOIN 
(SELECT HardwareId, MAX(TimeStamp) LastDateTime from History 
GROUP BY HardwareId) AS LastStates 
ON H.TimeStamp = LastStates.LastDateTime
INNER JOIN StateDescription SD ON H.ErrorCode = SD.Id ORDER BY H.HardwareId

but it gives me duplicates (due to timestamp duplicates for a given device, kind of multiple last states of a device. I would prefer to select any of these states)

+------------+---------+------------------+
| HardwareId | StateId |    Timestamp     |
+------------+---------+------------------+
| Dev2       |       1 | 2019-10-11 11:10 |
| Dev2       |       2 | 2019-10-11 11:10 |
| Dev1       |       3 | 2019-10-11 11:11 |
+------------+---------+------------------+

adding DISTINCT fixes it

SELECT DISTINCT H.TimeStamp, H.HardwareId, H.ErrorCode, SD.Description FROM History H
INNER JOIN 
(SELECT HardwareId, MAX(TimeStamp) LastDateTime from History 
GROUP BY HardwareId) AS LastStates 
ON H.TimeStamp = LastStates.LastDateTime
INNER JOIN StateDescription SD ON H.ErrorCode = SD.Id ORDER BY H.HardwareId

but if I want to add a record Id (H.Id) to the final result I obviously get duplicates again.

  SELECT DISTINCT H.TimeStamp, H.Id, H.HardwareId, H.ErrorCode, SD.Description FROM History H
    INNER JOIN 
    (SELECT HardwareId, MAX(TimeStamp) LastDateTime from History 
    GROUP BY HardwareId) AS LastStates 
    ON H.TimeStamp = LastStates.LastDateTime
    INNER JOIN StateDescription SD ON H.ErrorCode = SD.Id ORDER BY H.HardwareId

How could I get some kind of TOP(1) record for each devices?

Advertisement

Answer

I would just use row_number():

select h.*
from (select h.*,
             row_number() over (partition by HardwareId order by timestamp desc) as seqnum
      from history h
     ) h
where seqnum = 1;

This will select exactly one row per device per day. If there are duplicates on the day, it will return an arbitrary value. If you want all of them, then use rank() instead of row_number().

User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement