
Find rows with adjoining date ranges and accumulate their durations

My PostgreSQL database stores school vacations, public holidays and weekends so that parents can plan their vacation. School vacations are often directly adjacent to weekends or public holidays. I want to display the total number of non-school days for a school vacation, including any adjacent weekend or public holiday.

Example Data

locations

SELECT id, name, is_federal_state 
FROM locations 
WHERE is_federal_state = true;
| id | name              | is_federal_state |
|----|-------------------|------------------|
| 2  | Baden-Württemberg | true             |
| 3  | Bayern            | true             |

holiday_or_vacation_types

SELECT id, name FROM holiday_or_vacation_types;
| id | name                  |
|----|-----------------------|
| 1  | Herbst                |
| 8  | Wochenende            |

“Herbst” is German for “autumn” and “Wochenende” is German for “weekend”.

periods

SELECT id, starts_on, ends_on, holiday_or_vacation_type_id 
FROM periods 
WHERE location_id = 2 
ORDER BY starts_on;
| id  | starts_on    | ends_on      | holiday_or_vacation_type_id |
|-----|--------------|--------------|-----------------------------|
| 670 | "2019-10-26" | "2019-10-27" | 8                           |
| 532 | "2019-10-28" | "2019-10-30" | 1                           |
| 533 | "2019-10-31" | "2019-10-31" | 1                           |
| 671 | "2019-11-02" | "2019-11-03" | 8                           |
| 672 | "2019-11-09" | "2019-11-10" | 8                           |
| 673 | "2019-11-16" | "2019-11-17" | 8                           |

Task

I want to select all periods where location_id equals 2. And I want to calculate the duration of each period in days. That can be done with this SQL query:

SELECT id, starts_on, ends_on, 
       (ends_on - starts_on + 1) AS duration, 
       holiday_or_vacation_type_id 
FROM periods
WHERE location_id = 2
ORDER BY starts_on;
| id  | starts_on    | ends_on      | duration | holiday_or_vacation_type_id |
|-----|--------------|--------------|----------|-----------------------------|
| 670 | "2019-10-26" | "2019-10-27" | 2        | 8                           |
| 532 | "2019-10-28" | "2019-10-30" | 3        | 1                           |
| 533 | "2019-10-31" | "2019-10-31" | 1        | 1                           |
| 671 | "2019-11-02" | "2019-11-03" | 2        | 8                           |
| 672 | "2019-11-09" | "2019-11-10" | 2        | 8                           |
| 673 | "2019-11-16" | "2019-11-17" | 2        | 8                           |

Any human looking at the calendar would see that the ids 670 (weekend), 532 (fall vacation) and 533 (fall vacation) are adjacent to each other, so together they add up to a 6-day vacation period. So far I compute this in a separate program, but that takes quite a lot of resources (the actual table contains some 500,000 rows).

Problem 1

Which SQL query would produce the following output (it adds a real_duration column)? Is that even possible with SQL?

| id  | starts_on    | ends_on      | duration | real_duration | holiday_or_vacation_type_id |
|-----|--------------|--------------|----------|---------------|-----------------------------|
| 670 | "2019-10-26" | "2019-10-27" | 2        | 6             | 8                           |
| 532 | "2019-10-28" | "2019-10-30" | 3        | 6             | 1                           |
| 533 | "2019-10-31" | "2019-10-31" | 1        | 6             | 1                           |
| 671 | "2019-11-02" | "2019-11-03" | 2        | 2             | 8                           |
| 672 | "2019-11-09" | "2019-11-10" | 2        | 2             | 8                           |
| 673 | "2019-11-16" | "2019-11-17" | 2        | 2             | 8                           |

Problem 2

Is it possible to list the adjoining periods in a part_of_range field? This would be the result. Can that be done with SQL?

| id  | starts_on    | ends_on      | duration | part_of_range | holiday_or_vacation_type_id |
|-----|--------------|--------------|----------|---------------|-----------------------------|
| 670 | "2019-10-26" | "2019-10-27" | 2        | 670,532,533   | 8                           |
| 532 | "2019-10-28" | "2019-10-30" | 3        | 670,532,533   | 1                           |
| 533 | "2019-10-31" | "2019-10-31" | 1        | 670,532,533   | 1                           |
| 671 | "2019-11-02" | "2019-11-03" | 2        |               | 8                           |
| 672 | "2019-11-09" | "2019-11-10" | 2        |               | 8                           |
| 673 | "2019-11-16" | "2019-11-17" | 2        |               | 8                           |


Answer

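This is a gaps-and-islands problem. In this case you can use lag() to see where an island starts (a row whose previous period ended more than one day before it begins) and then take a cumulative sum of those starts, which gives every row in the same island the same group number.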
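To make those two steps concrete, here is a minimal sketch that runs them on the sample rows from the question. It assumes the rows are inlined as a CTE named periods (so it runs without the real table), and the column name starts_new_island is only an illustrative label, not something from the original schema.

```sql
-- Sample rows copied from the question (location_id = 2), inlined for illustration.
WITH periods (id, starts_on, ends_on, location_id) AS (
    VALUES (670, DATE '2019-10-26', DATE '2019-10-27', 2),
           (532, DATE '2019-10-28', DATE '2019-10-30', 2),
           (533, DATE '2019-10-31', DATE '2019-10-31', 2),
           (671, DATE '2019-11-02', DATE '2019-11-03', 2),
           (672, DATE '2019-11-09', DATE '2019-11-10', 2),
           (673, DATE '2019-11-16', DATE '2019-11-17', 2)
),
flagged AS (
    -- Step 1: fetch the end date of the previous period in the same location.
    SELECT id, starts_on, ends_on, location_id,
           lag(ends_on) OVER (PARTITION BY location_id ORDER BY starts_on) AS prev_eo
    FROM periods
)
SELECT id, starts_on, ends_on, prev_eo,
       -- True when there is a gap of at least one day before this period.
       (prev_eo < starts_on - INTERVAL '1 day') AS starts_new_island,
       -- Step 2: running count of island starts = island number.
       count(*) FILTER (WHERE prev_eo < starts_on - INTERVAL '1 day')
           OVER (PARTITION BY location_id ORDER BY starts_on) AS grp
FROM flagged
ORDER BY starts_on;
```

On these six rows grp comes out as 0, 0, 0, 1, 2, 3: the first three periods share island 0, and each later weekend forms its own island. The first row has no predecessor, so prev_eo is NULL and the row is not counted as a new start.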

The final operation is some aggregation (using window functions):

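SELECT p.*,
       -- Span of the whole island: latest end minus earliest start, inclusive.
       (max(ends_on) OVER (PARTITION BY location_id, grp) - min(starts_on) OVER (PARTITION BY location_id, grp)) + 1 AS real_duration,
       -- Ids of all periods in the same island (partition by grp, not just by location).
       array_agg(p.id) OVER (PARTITION BY location_id, grp) AS part_of_range
FROM (SELECT p.*,
             -- Running count of island starts gives each island its own number.
             count(*) FILTER (WHERE prev_eo < starts_on - INTERVAL '1 day') OVER (PARTITION BY location_id ORDER BY starts_on) AS grp
      FROM (SELECT id, starts_on, ends_on, location_id, holiday_or_vacation_type_id,
                   -- End date of the previous period in the same location.
                   lag(ends_on) OVER (PARTITION BY location_id ORDER BY starts_on) AS prev_eo
            FROM periods
           ) p
     ) p;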
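If the output should match the tables in Problem 1 and Problem 2 exactly (a real_duration column plus a comma-separated part_of_range that stays empty for periods standing alone), one possible variant is sketched below. It is not part of the original answer: the inlined periods CTE, the named window w, and the CASE that blanks out single-period islands are all assumptions made for illustration.

```sql
-- Sample rows copied from the question (location_id = 2), inlined for illustration.
WITH periods (id, starts_on, ends_on, location_id, holiday_or_vacation_type_id) AS (
    VALUES (670, DATE '2019-10-26', DATE '2019-10-27', 2, 8),
           (532, DATE '2019-10-28', DATE '2019-10-30', 2, 1),
           (533, DATE '2019-10-31', DATE '2019-10-31', 2, 1),
           (671, DATE '2019-11-02', DATE '2019-11-03', 2, 8),
           (672, DATE '2019-11-09', DATE '2019-11-10', 2, 8),
           (673, DATE '2019-11-16', DATE '2019-11-17', 2, 8)
),
flagged AS (
    -- End date of the previous period in the same location.
    SELECT *,
           lag(ends_on) OVER (PARTITION BY location_id ORDER BY starts_on) AS prev_eo
    FROM periods
),
grouped AS (
    -- Running count of island starts assigns an island number to every row.
    SELECT *,
           count(*) FILTER (WHERE prev_eo < starts_on - INTERVAL '1 day')
               OVER (PARTITION BY location_id ORDER BY starts_on) AS grp
    FROM flagged
)
SELECT id, starts_on, ends_on,
       ends_on - starts_on + 1 AS duration,
       -- Inclusive length of the whole island.
       max(ends_on) OVER w - min(starts_on) OVER w + 1 AS real_duration,
       -- List the island's ids in date order, but only if it has more than one period.
       CASE WHEN count(*) OVER w > 1
            THEN string_agg(id::text, ',')
                     OVER (w ORDER BY starts_on
                           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
       END AS part_of_range,
       holiday_or_vacation_type_id
FROM grouped
WINDOW w AS (PARTITION BY location_id, grp)
ORDER BY starts_on;
```

The explicit frame on string_agg matters here: once ORDER BY is added to a window, the default frame stops at the current row, so without it the first row of an island would list only its own id. On the real table, the inlined CTE would be dropped so that periods refers to the actual table.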
User contributions licensed under: CC BY-SA