I have a fairly simple database table that logs every time a tray passes over an RFID reader. What sometimes happens is the data is being sent twice, so I have been asked if I can find out how often this happens. Rather than spending the next few days going through every record in the log table, I presume there must be a way of extracting this data using a SQL query?
For example, my data looks like this:
| ID | TYPE | POINT | RFID | LOGGED |
|---|---|---|---|---|
| 1 | 101 | 4101 | 1234 | 2021-01-20 06:31:25:154 |
| 2 | 101 | 4101 | 4567 | 2021-01-20 06:32:24:165 |
| 3 | 101 | 4102 | 1234 | 2021-01-20 06:35:55:154 |
| 4 | 101 | 4102 | 1234 | 2021-01-20 06:35:55:516 |
What I want to do is identify where I have duplicate RFID records where the POINT is the same and the time difference between the records is less than 5 seconds.
In the example above, IDs 3 & 4 would be returned as duplicate because they are both from POINT 4102, have the same RFID and the LOGGED difference is less than 5 seconds.
Although ID 1 has the same RFID (1234) as ID 3 & 4, this would not be a duplicate because the POINT is different.
I tried to write a query to return the duplicates, but I get no results, even though I have found some by looking through the table:
DECLARE @TheDate date
SET @TheDate = '2021-01-20'
SELECT
[Type]
,[Point]
,[RFID]
,[Logged]
FROM
[Log]
WHERE
@TheDate = CONVERT(date,[Logged])
AND [Point] IN ('4101', '4102')
AND [Type] = '101'
GROUP BY
[Type], [Point], [RFID], [Logged]
HAVING
COUNT([RFID]) > 1
The other problem I have is that I have no idea how to do the datetime comparison to see if the difference in time is less than 5 seconds.
Advertisement
Answer
You can use lead() and lag(). For all such rows:
select l.*
from (select l.*,
lag(logged) over (partition by rfid order by logged) as prev_logged,
lead(logged) over (partition by rfid order by logged) as next_logged
from logs l
) l
where prev_logged > dateadd(second, -5, logged) or
next_logged < dateadd(second, 5, logged);
Your sample code looks like SQL Server so this uses SQL Server syntax.
If you just want the first record in a sequence of duplicates, you can use similar logic:
select l.*
from (select l.*,
lag(logged) over (partition by rfid order by logged) as prev_logged,
lead(logged) over (partition by rfid order by logged) as next_logged
from logs l
) l
where (prev_logged < dateadd(second, -5, logged) or prev_logged is null) and
next_logged < dateadd(second, 5, logged);
Note that both of these only use the rfid. I’m not sure if the rfid should really be used in conjunction with other columns, such as point, but your question explicitly mentioned only duplicate rfid.