Skip to content
Advertisement

MS SQL identify duplicate rows based on log time

I have a fairly simple database table that logs every time a tray passes over an RFID reader. What sometimes happens is the data is being sent twice, so I have been asked if I can find out how often this happens. Rather than spending the next few days going through every record in the log table, I presume there must be a way of extracting this data using a SQL query?

For example, my data looks like this:

ID TYPE POINT RFID LOGGED
1 101 4101 1234 2021-01-20 06:31:25:154
2 101 4101 4567 2021-01-20 06:32:24:165
3 101 4102 1234 2021-01-20 06:35:55:154
4 101 4102 1234 2021-01-20 06:35:55:516

What I want to do is identify where I have duplicate RFID records where the POINT is the same and the time difference between the records is less than 5 seconds.

In the example above, IDs 3 & 4 would be returned as duplicate because they are both from POINT 4102, have the same RFID and the LOGGED difference is less than 5 seconds.

Although ID 1 has the same RFID (1234) as ID 3 & 4, this would not be a duplicate because the POINT is different.

I tried to write a query to return the duplicates, but I get no results, even though I have found some by looking through the table:

DECLARE @TheDate date
SET @TheDate = '2021-01-20'

SELECT 
    [Type]
    ,[Point]
    ,[RFID]
    ,[Logged]
FROM 
    [Log]
WHERE 
    @TheDate = CONVERT(date,[Logged])
    AND [Point] IN ('4101', '4102')
    AND [Type] = '101'
GROUP BY 
    [Type], [Point], [RFID], [Logged]
HAVING 
    COUNT([RFID]) > 1

The other problem I have is that I have no idea how to do the datetime comparison to see if the difference in time is less than 5 seconds.

Advertisement

Answer

You can use lead() and lag(). For all such rows:

select l.*
from (select l.*,
             lag(logged) over (partition by rfid order by logged) as prev_logged,
             lead(logged) over (partition by rfid order by logged) as next_logged
      from logs l
     ) l
where prev_logged > dateadd(second, -5, logged) or
      next_logged < dateadd(second, 5, logged);

Your sample code looks like SQL Server so this uses SQL Server syntax.

If you just want the first record in a sequence of duplicates, you can use similar logic:

select l.*
from (select l.*,
             lag(logged) over (partition by rfid order by logged) as prev_logged,
             lead(logged) over (partition by rfid order by logged) as next_logged
      from logs l
     ) l
where (prev_logged < dateadd(second, -5, logged) or prev_logged is null) and
      next_logged < dateadd(second, 5, logged);

Note that both of these only use the rfid. I’m not sure if the rfid should really be used in conjunction with other columns, such as point, but your question explicitly mentioned only duplicate rfid.

User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement