Skip to content
Advertisement

Finding Gaps in Timestamps for Multiple Users in PostgreSQL

I am working with a dataset containing Check-In and Check-Out times for multiple office rooms over the last 5 years. One of the projects I was asked to work on was calculating the amount of time each room is busy and vacant over various time ranges (daily, weekly, monthly, etc.) assuming normal operational hours (8am to 5pm). A sample of the dataset for two days looks like this:

room_id         start_dt                end_dt
Room: Room 3    2019-05-04 09:00:00     2019-05-04 11:30:00
Room: Room 3    2019-05-04 11:30:00     2019-05-04 12:15:00
Room: Room 3    2019-05-04 12:30:00     2019-05-04 13:00:00
Room: Room 3    2019-05-05 09:00:00     2019-05-05 13:00:00
Room: Room 4    2019-05-04 08:00:00     2019-05-04 09:00:00
Room: Room 4    2019-05-04 09:00:00     2019-05-04 11:00:00
Room: Room 4    2019-05-04 14:00:00     2019-05-04 16:00:00
Room: Room 4    2019-05-05 08:30:00     2019-05-05 09:30:00

I borrowed and modified some code written in a previous StackOverflow post by @Branko Dimitrijevic (full post: SQL Query to show gaps between multiple date ranges) to try and handle multiple different rooms. Below is my modified code with two instances of room_id in the SELECT clause for debugging purposes:

SELECT t1.room_id, t2.room_id, end_dt, start_dt, start_dt - end_dt as gap_dur
FROM
    (
        SELECT DISTINCT room_id, start_dt, ROW_NUMBER() OVER (ORDER BY start_dt) RN
        FROM my_table T1
        WHERE
            NOT EXISTS (
                SELECT *
                FROM my_table T2
                WHERE (T1.start_dt > T2.start_dt and t1.resource = t2.resource)
                    AND (T1.start_dt < T2.end_dt and t1.resource = t2.resource)
            )
        ) T1
    JOIN (
        SELECT DISTINCT resource, end_dt, ROW_NUMBER() OVER (ORDER BY end_dt) RN
        FROM my_table T1
        WHERE
            NOT EXISTS (
                SELECT *
                FROM my_table T2
                WHERE (T1.end_dt > T2.start_dt and t1.resource = t2.resource)
                    AND (T1.end_dt < T2.end_dt and t1.resource = t2.resource)
            )
    ) T2
    ON T1.RN - 1 = T2.RN
WHERE
    end_dt < start_dt

And this is the output I am receiving:

room_id         room_id         end_dt                  start_dt                gap_dur
Room: Exam 4    Room: Exam 4    2019-05-04 16:00:00     2019-05-05 08:30:00     16:30:00
Room: Exam 4    Room: Exam 3    2019-05-04 13:00:00     2019-05-04 14:00:00     01:00:00
Room: Exam 3    Room: Exam 3    2019-05-04 12:15:00     2019-05-04 12:30:00     00:15:00

However, this is becoming confused between different rooms and I don’t know how to implement workday constraints, such as finding time gaps between 8am and the first scheduled event. Below is an optimal output, or at least a data format that would be usable to calculate the statistics I would need with some simple GROUP BY scripts:

room_id         end_dt                  start_dt                gap_dur
Room: Exam 3    2019-05-04 08:00:00     2019-05-04 09:00:00     01:00:00
Room: Exam 3    2019-05-04 12:15:00     2019-05-04 12:30:00     00:15:00
Room: Exam 3    2019-05-04 13:00:00     2019-05-04 17:00:00     04:00:00
Room: Exam 3    2019-05-05 08:00:00     2019-05-05 09:00:00     01:00:00
Room: Exam 3    2019-05-05 13:00:00     2019-05-05 17:00:00     04:00:00
Room: Exam 4    2019-05-04 11:00:00     2019-05-04 14:00:00     03:00:00
Room: Exam 4    2019-05-04 16:00:00     2019-05-04 17:00:00     01:00:00
Room: Exam 4    2019-05-05 08:00:00     2019-05-05 08:30:00     00:30:00
Room: Exam 4    2019-05-05 09:30:00     2019-05-05 17:00:00     09:30:00

Any help on this would be greatly appreciated and happy to provide additional information if it helps!

Advertisement

Answer

One of the projects I was asked to work on was calculating the amount of time each room is busy and vacant over various time ranges (daily, weekly, monthly, etc.) assuming normal operational hours (8am to 5pm).

Based on your sample data, two assumptions seem reasonable:

  • The “busy” periods don’t overlap.
  • “Busy” periods are all within one day.

If these are not true, I would suggest that you ask a NEW question with appropriate explanation and sample data.

The calculation is then pretty simple for a given day:

select date_trunc('day', start_dt),
       sum( least(extract(epoch from end_dt), v.epoch2) - 
            greatest(extract(epoch from start_dt), epoch1)
          ) as busy_seconds,
       (epoch2 - epoch1 -
        sum( least(extract(epoch from end_dt), v.epoch2) - 
             greatest(extract(epoch from start_dt), epoch1)
           )
       ) as free_seconds
from rooms r cross join
     (values (extract(epoch from date_trunc('day', start_dt) + interval '8 hour'),
              extract(epoch from date_trunc('day', start_dt) + interval '17 hour')
             )
     ) v(epoch1, epoch2)                  
group by date_trunc('day', start_dt)
User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement