I have a fairly simple database table that logs every time a tray passes over an RFID reader. What sometimes happens is the data is being sent twice, so I have been asked if I can find out how often this happens. Rather than spending the next few days going through every record in the log table, I presume there must be a way of extracting this data using a SQL query?
For example, my data looks like this:
ID | TYPE | POINT | RFID | LOGGED |
---|---|---|---|---|
1 | 101 | 4101 | 1234 | 2021-01-20 06:31:25:154 |
2 | 101 | 4101 | 4567 | 2021-01-20 06:32:24:165 |
3 | 101 | 4102 | 1234 | 2021-01-20 06:35:55:154 |
4 | 101 | 4102 | 1234 | 2021-01-20 06:35:55:516 |
What I want to do is identify where I have duplicate RFID records where the POINT is the same and the time difference between the records is less than 5 seconds.
In the example above, IDs 3 & 4 would be returned as duplicate because they are both from POINT 4102, have the same RFID and the LOGGED difference is less than 5 seconds.
Although ID 1 has the same RFID (1234) as ID 3 & 4, this would not be a duplicate because the POINT is different.
I tried to write a query to return the duplicates, but I get no results, even though I have found some by looking through the table:
DECLARE @TheDate date SET @TheDate = '2021-01-20' SELECT [Type] ,[Point] ,[RFID] ,[Logged] FROM [Log] WHERE @TheDate = CONVERT(date,[Logged]) AND [Point] IN ('4101', '4102') AND [Type] = '101' GROUP BY [Type], [Point], [RFID], [Logged] HAVING COUNT([RFID]) > 1
The other problem I have is that I have no idea how to do the datetime comparison to see if the difference in time is less than 5 seconds.
Advertisement
Answer
You can use lead()
and lag()
. For all such rows:
select l.* from (select l.*, lag(logged) over (partition by rfid order by logged) as prev_logged, lead(logged) over (partition by rfid order by logged) as next_logged from logs l ) l where prev_logged > dateadd(second, -5, logged) or next_logged < dateadd(second, 5, logged);
Your sample code looks like SQL Server so this uses SQL Server syntax.
If you just want the first record in a sequence of duplicates, you can use similar logic:
select l.* from (select l.*, lag(logged) over (partition by rfid order by logged) as prev_logged, lead(logged) over (partition by rfid order by logged) as next_logged from logs l ) l where (prev_logged < dateadd(second, -5, logged) or prev_logged is null) and next_logged < dateadd(second, 5, logged);
Note that both of these only use the rfid
. I’m not sure if the rfid
should really be used in conjunction with other columns, such as point
, but your question explicitly mentioned only duplicate rfid
.