Skip to content
Advertisement

Conditional removing duplicate records

I’m storing some realtime data in SQLite. Now, I want to remove duplicate records to reduce data and enlarge its timeframe to 20 seconds by SQL commands.

Sample data:

id     t        col1    col2  
-----------------------------
23  9:19:18     15      16   
24  9:19:20     10      11
25  9:19:20     10      11   
26  9:19:35     10      11   
27  9:19:45     10      11   
28  9:19:53     10      11   
29  9:19:58     14      13

Logic: In above sample, records 25-28 have same value in col1 and col2 field, so they are duplicate. But because keeping one (for example, record 25) and removing others will cause timeframe (= time difference between subsequent data) to be more than 20s, i don’t want to remove all of records 26-28. So, in above sample, row=25 will be kept because, it’s not duplicate of its previous row. Row=26 will be kept, because although its duplicate of its previous row, removing this row causes to have timeframe to more than 20s (19:45 – 19:20). Row=27 will be removed, meeting these 2 conditions and row=28 will be kept.

I can load data to C# datatable and apply this logic in code in a loop over records, but it is slow comparing to run SQL in database. I’m not sure this can be implemented in SQL. Any help would be greatly appreciated.

Edit: I’ve added another row before row =25 to show rows with the same time. Fiddle is here: Link

Advertisement

Answer

OK so here’s an alternate answer that handles the duplicate record scenario you’ve described, uses LAG and LEAD and also ends up considerably simpler as it turns out!

delete from t1 where id in
(
with cte as (
select id, 
       lag(t, 1) over(partition by col1, col2 order by t) as prev_t,
       lead(t, 1) over(partition by col1, col2 order by t) as next_t
from t1
)
select id
from cte
where strftime('%H:%M:%S',next_t,'-20 seconds') < strftime('%H:%M:%S',prev_t)
)

Online demo here

User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement