Skip to content
Advertisement

SQL – Combine two rows if difference is below threshhold

I have a table like this in SQL Server:

id  start_time  end_time
1   10:00:00    10:34:00
2   10:38:00    10:52:00
3   10:53:00    11:23:00
4   11:24:00    11:56:00
5   14:20:00    14:40:00
6   14:41:00    14:59:00
7   15:30:00    15:40:00

What I would like to have is a query that outputs consolidated records based on the time difference between two consecutive records (end_time of row n and start_time row n+1) . All records where the time difference is less than 2 minutes should be combined into one time entry and the ID of the first record should be kept. This should also combine more than two records if multiple consecutive records have a time difference less than 2 minutes.

This would be the expected output:

id  start_time  end_time
1   10:00:00    10:34:00
2   10:38:00    11:56:00
5   14:20:00    14:59:00
7   15:30:00    15:40:00

Thanks in advance for any tips how to build the query.

Edit: I started with following code to calculate the lead_time and the time difference but do not know how to group and consolidate.

WITH rows AS
        (
        SELECT  *, ROW_NUMBER() OVER (ORDER BY Id) AS rn
        FROM #temp
        )
SELECT  mc.id, mc.start_time, mc.end_time, mp.start_time lead_time, DATEDIFF(MINUTE, mc.[end_time], mp.[start_time]) as DiffToNewSession
FROM    rows mc
LEFT JOIN    rows mp
ON      mc.rn = mp.rn - 1

Advertisement

Answer

The window function in t-sql can realize a lot of data statistics, such as

create table #temp(id int identity(1,1), start_time time, end_time time)
insert into #temp(start_time, end_time)
values  ('10:00:00', '10:34:00')
      , ('10:38:00', '10:52:00')
      , ('10:53:00', '11:23:00')
      , ('11:24:00', '11:56:00')
      , ('14:20:00', '14:40:00')
      , ('14:41:00', '14:59:00')
      , ('15:30:00', '15:40:00')

;with c0 as(
select *, LAG(end_time,1,'00:00:00') over (order by id) as lag_time
from #temp
), c1 as(
select *, case when DATEDIFF(MI, lag_time, start_time) <= 2 then 1 else -0 end as gflag
from c0
), c2 as(
select *, SUM(case when gflag=0 then 1 else 0 end) over(order by id) as gid
from c1
)
select MIN(id) as id, MIN(start_time) as start_time, MAX(end_time) as end_time
from c2
group by gid

In order to better describe the process of data construction, I simply use c0, c1, c2… to represent levels, you can merge some levels and optimize. If you can’t use id as a sorting condition, then you need to change the sorting part in the above statement.

User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement