Skip to content
Advertisement

Cumulative elapsed minutes on hourly basis in PostgreSQL

I have a datetime column. I need to derive a column of total minutes elapsed from the first to the last value of every hour grouped by hour, but, in cases of overlapping event, the time should be distributed between two hours. There is also a condition where if the elapsed time exceeds 30 minutes in between two consecutive records, then it has to be ignored.

Below, I’ve explained in three phases, Original, Intermediate (calculating the running total) and Final.

enter image description here

And, I’m planning to take hourly incremental data on the same, so, how can we properly merge it with the old data is another question.

Sample data:

 Moves_TS
1/4/2020 10:00
1/4/2020 10:25
1/4/2020 10:42
1/4/2020 10:56
1/4/2020 10:59
1/4/2020 11:02
1/4/2020 11:24
1/4/2020 11:43
1/4/2020 11:55
1/4/2020 12:26
1/4/2020 12:29

Intermediate layer:

Moves_TS    Hour    Running Total
1/4/2020 10:00  10  0
1/4/2020 10:25  10  25
1/4/2020 10:42  10  42
1/4/2020 10:56  10  56
1/4/2020 10:59  10  60
1/4/2020 11:02  11  2
1/4/2020 11:24  11  24
1/4/2020 11:43  11  43
1/4/2020 11:55  11  55
1/4/2020 12:26  12  0
1/4/2020 12:29  12  3

Final Output:

Hour    Work done/Hour
10  60
11  55
12  3

Advertisement

Answer

This is a gaps-and-islands problem with some twists. First, I would summarize by the “islands” defined by the gaps of 30 minutes:

select min(moves_ts) as start_ts, max(moves_ts) as end_ts
from (select o.*,
             count(prev_moves_ts) filter (where moves_ts > prev_moves_ts + interval '30 minute') over (order by moves_ts) as grp
      from (select o.*, lag(moves_ts) over (order by moves_ts) as prev_moves_ts
            from original o
           ) o
     ) o
group by grp;

Then you can use this with generate_series() to expand the data and calculate the overlaps with each hour:

with islands as (
      select min(moves_ts) as start_ts, max(moves_ts) as end_ts
      from (select o.*,
                   count(prev_moves_ts) filter (where moves_ts > prev_moves_ts + interval '30 minute') over (order by moves_ts) as grp
            from (select o.*, lag(moves_ts) over (order by moves_ts) as prev_moves_ts
                  from original o
                 ) o
           ) o
      group by grp
     )
select hh.hh,
       sum( least(hh.hh + interval '1 hour', i.end_ts) -
            greatest(hh.hh, i.start_ts)
          ) as duration           
from (select generate_series(date_trunc('hour', min(moves_ts)),
                             date_trunc('hour', max(moves_ts)),
                             interval '1 hour'
                            ) hh
      from original o
     ) hh left join
     islands i
     on i.start_ts < hh.hh + interval '1 hour' and
        i.end_ts >= hh.hh
group by hh.hh
order by hh.hh;

Here is a db<>fiddle.

User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement