I am working with a dataset containing Check-In and Check-Out times for multiple office rooms over the last 5 years. One of the projects I was asked to work on was calculating the amount of time each room is busy and vacant over various time ranges (daily, weekly, monthly, etc.) assuming normal operational hours (8am to 5pm). A sample of the dataset for two days looks like this:
room_id start_dt end_dt Room: Room 3 2019-05-04 09:00:00 2019-05-04 11:30:00 Room: Room 3 2019-05-04 11:30:00 2019-05-04 12:15:00 Room: Room 3 2019-05-04 12:30:00 2019-05-04 13:00:00 Room: Room 3 2019-05-05 09:00:00 2019-05-05 13:00:00 Room: Room 4 2019-05-04 08:00:00 2019-05-04 09:00:00 Room: Room 4 2019-05-04 09:00:00 2019-05-04 11:00:00 Room: Room 4 2019-05-04 14:00:00 2019-05-04 16:00:00 Room: Room 4 2019-05-05 08:30:00 2019-05-05 09:30:00
I borrowed and modified some code written in a previous StackOverflow post by @Branko Dimitrijevic (full post: SQL Query to show gaps between multiple date ranges) to try and handle multiple different rooms. Below is my modified code with two instances of room_id in the SELECT clause for debugging purposes:
SELECT t1.room_id, t2.room_id, end_dt, start_dt, start_dt - end_dt as gap_dur FROM ( SELECT DISTINCT room_id, start_dt, ROW_NUMBER() OVER (ORDER BY start_dt) RN FROM my_table T1 WHERE NOT EXISTS ( SELECT * FROM my_table T2 WHERE (T1.start_dt > T2.start_dt and t1.resource = t2.resource) AND (T1.start_dt < T2.end_dt and t1.resource = t2.resource) ) ) T1 JOIN ( SELECT DISTINCT resource, end_dt, ROW_NUMBER() OVER (ORDER BY end_dt) RN FROM my_table T1 WHERE NOT EXISTS ( SELECT * FROM my_table T2 WHERE (T1.end_dt > T2.start_dt and t1.resource = t2.resource) AND (T1.end_dt < T2.end_dt and t1.resource = t2.resource) ) ) T2 ON T1.RN - 1 = T2.RN WHERE end_dt < start_dt
And this is the output I am receiving:
room_id room_id end_dt start_dt gap_dur Room: Exam 4 Room: Exam 4 2019-05-04 16:00:00 2019-05-05 08:30:00 16:30:00 Room: Exam 4 Room: Exam 3 2019-05-04 13:00:00 2019-05-04 14:00:00 01:00:00 Room: Exam 3 Room: Exam 3 2019-05-04 12:15:00 2019-05-04 12:30:00 00:15:00
However, this is becoming confused between different rooms and I don’t know how to implement workday constraints, such as finding time gaps between 8am and the first scheduled event. Below is an optimal output, or at least a data format that would be usable to calculate the statistics I would need with some simple GROUP BY scripts:
room_id end_dt start_dt gap_dur Room: Exam 3 2019-05-04 08:00:00 2019-05-04 09:00:00 01:00:00 Room: Exam 3 2019-05-04 12:15:00 2019-05-04 12:30:00 00:15:00 Room: Exam 3 2019-05-04 13:00:00 2019-05-04 17:00:00 04:00:00 Room: Exam 3 2019-05-05 08:00:00 2019-05-05 09:00:00 01:00:00 Room: Exam 3 2019-05-05 13:00:00 2019-05-05 17:00:00 04:00:00 Room: Exam 4 2019-05-04 11:00:00 2019-05-04 14:00:00 03:00:00 Room: Exam 4 2019-05-04 16:00:00 2019-05-04 17:00:00 01:00:00 Room: Exam 4 2019-05-05 08:00:00 2019-05-05 08:30:00 00:30:00 Room: Exam 4 2019-05-05 09:30:00 2019-05-05 17:00:00 09:30:00
Any help on this would be greatly appreciated and happy to provide additional information if it helps!
Advertisement
Answer
One of the projects I was asked to work on was calculating the amount of time each room is busy and vacant over various time ranges (daily, weekly, monthly, etc.) assuming normal operational hours (8am to 5pm).
Based on your sample data, two assumptions seem reasonable:
- The “busy” periods don’t overlap.
- “Busy” periods are all within one day.
If these are not true, I would suggest that you ask a NEW question with appropriate explanation and sample data.
The calculation is then pretty simple for a given day:
select date_trunc('day', start_dt), sum( least(extract(epoch from end_dt), v.epoch2) - greatest(extract(epoch from start_dt), epoch1) ) as busy_seconds, (epoch2 - epoch1 - sum( least(extract(epoch from end_dt), v.epoch2) - greatest(extract(epoch from start_dt), epoch1) ) ) as free_seconds from rooms r cross join (values (extract(epoch from date_trunc('day', start_dt) + interval '8 hour'), extract(epoch from date_trunc('day', start_dt) + interval '17 hour') ) ) v(epoch1, epoch2) group by date_trunc('day', start_dt)