Skip to content
Advertisement

How to group data within a range of contigious timestamps

I have a table made up of rows of data collected through an indeterministic polling process. Each row has a start and end timestamp denoting the time period in which the data was collected. In some cases the data was collected contiguously, in which case the timestamp of one row will have the same value as the start timestamp for the next row. In other cases there is a break in time between one row and the next.

For example, in the table below, rows number 1,2,3 and 4 are all part of one time series of data. Similarly for rows 5, 6, 7 and 8 and again for rows 9 and 10. In between are time periods for which I do not have data.

Row  Start_Timestamp      End_Timestamp           Data_Item
---  ---------------      --------------          ---------
1    2019-08-12_22:07:53  2019-08-12_22:09:57     100      
2    2019-08-12_22:09:57  2019-08-12_22:12:01     203      
3    2019-08-12_22:12:01  2019-08-12_22:13:03     487      
4    2019-08-12_22:13:03  2019-08-12_22:16:19     113      
5    2019-08-12_22:24:34  2019-08-12_22:26:37     632      
6    2019-08-12_22:26:37  2019-08-12_22:27:40     532      
7    2019-08-12_22:27:40  2019-08-12_22:28:42     543      
8    2019-08-12_22:28:42  2019-08-12_22:31:57     142      
9    2019-08-13_19:56:06  2019-08-13_19:57:08     351      
10   2019-08-13_19:57:08  2019-08-13_19:58:10     982      

I would like to groups these contiguous time series ideally as follows:

Row  Series  Start_Timestamp      End_Timestamp           Data_Item   
---  ------  ---------------      --------------          -----------  
1    1       2019-08-12_22:07:53  2019-08-12_22:09:57     100
2    1       2019-08-12_22:09:57  2019-08-12_22:12:01     203
3    1       2019-08-12_22:12:01  2019-08-12_22:13:03     487
4    1       2019-08-12_22:13:03  2019-08-12_22:16:19     113
5    2       2019-08-12_22:24:34  2019-08-12_22:26:37     632
6    2       2019-08-12_22:26:37  2019-08-12_22:27:40     532
7    2       2019-08-12_22:27:40  2019-08-12_22:28:42     543
8    2       2019-08-12_22:28:42  2019-08-12_22:31:57     142
9    3       2019-08-13_19:56:06  2019-08-13_19:57:08     351
10   3       2019-08-13_19:57:08  2019-08-13_19:58:10     982

I am new to SQL and have been struggling with this problem. I appreciate any insights or advice on how I might achieve this.

Advertisement

Answer

This is a simplified gaps-and-island problem. Assuming that your RDBMS support window functions, you can approach this with a window sum. When the Start_Timestamp of record is different than the End_Timestamp of the previous record, a new group starts:

select
    t.Row,
    sum(case when Start_Timestamp = lag_End_Timestamp then 0 else 1 end) 
        over(order by End_Timestamp) series,
    t.Start_Timestamp,
    t.End_Timestamp,
    t.Data_Item
from (
    select
        t.*,
        lag(End_Timestamp) over (order by End_Timestamp) lag_End_Timestamp
    from mytable t
) t

Demo on DB Fiddle:

Row | series | Start_Timestamp     | End_Timestamp       | Data_Item
--: | -----: | :------------------ | :------------------ | --------:
  1 |      1 | 2019-08-12 22:07:53 | 2019-08-12 22:09:57 |       100
  2 |      1 | 2019-08-12 22:09:57 | 2019-08-12 22:12:01 |       203
  3 |      1 | 2019-08-12 22:12:01 | 2019-08-12 22:13:03 |       487
  4 |      1 | 2019-08-12 22:13:03 | 2019-08-12 22:16:19 |       113
  5 |      2 | 2019-08-12 22:24:34 | 2019-08-12 22:26:37 |       632
  6 |      2 | 2019-08-12 22:26:37 | 2019-08-12 22:27:40 |       532
  7 |      2 | 2019-08-12 22:27:40 | 2019-08-12 22:28:42 |       543
  8 |      2 | 2019-08-12 22:28:42 | 2019-08-12 22:31:57 |       142
  9 |      3 | 2019-08-13 19:56:06 | 2019-08-13 19:57:08 |       351
 10 |      3 | 2019-08-13 19:57:08 | 2019-08-13 19:58:10 |       982
User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement