I am trying to solve “gaps and islands” and group consecutive checks together. My data looks like this
site_id date_id location_id reservation_id revenue
5 20210101 125 792727 100
5 20210101 126 792728 90
5 20210101 228 792757 200
5 20210102 217 792977 50
5 20210102 218 792978 120
5 20210102 219 792979 100
I want to group by consecutive location_id and consecutive reservation_id (both should be consecutive respectively) within same date and site_id, and sum revenue. so for the example above the output should be:
site_id date_id location_id reservation_id revenue
5 20210101 125 792727 190
5 20210101 228 792757 200
5 20210102 217 792977 270
Location_id and reservation_id are of no importance except for this particular task, so a simple MAX() or MIN() for these two columns will work.
Advertisement
Answer
Try sessionization:
Two nested queries. First, a counter that is at 0 when a condition is false, at 1 when it’s true; in our case, the previous reservation id not being exactly one less than the current one.
The second query queries the first query, and makes a running sum of the counter obtained before. This gives a session id.
Then, group by site id, date id and the obtained session id.
WITH
indata(site_id,date_id,location_id,reservation_id,revenue) AS (
SELECT 5,DATE '2021-01-01',125,792727,100
UNION ALL SELECT 5,DATE '2021-01-01',126,792728,90
UNION ALL SELECT 5,DATE '2021-01-01',228,792757,200
UNION ALL SELECT 5,DATE '2021-01-02',217,792977,50
UNION ALL SELECT 5,DATE '2021-01-02',218,792978,120
UNION ALL SELECT 5,DATE '2021-01-02',219,792979,100
)
,
with_counter AS (
SELECT
site_id
, date_id
, location_id
, reservation_id
, revenue
, CASE
WHEN reservation_id - LAG(reservation_id) OVER(
PARTITION BY site_id ORDER BY date_id,reservation_id
) > 1
THEN 1
ELSE 0
END AS counter
FROM indata
)
,
with_session AS (
SELECT
site_id
, date_id
, location_id
, reservation_id
, revenue
, SUM(counter) OVER(
PARTITION BY site_id ORDER BY date_id,reservation_id
) AS session_id
FROM with_counter
-- test output
-- out site_id | date_id | location_id | reservation_id | revenue | session_id
-- out ---------+------------+-------------+----------------+---------+------------
-- out 5 | 2021-01-01 | 125 | 792727 | 100 | 0
-- out 5 | 2021-01-01 | 126 | 792728 | 90 | 0
-- out 5 | 2021-01-01 | 228 | 792757 | 200 | 1
-- out 5 | 2021-01-02 | 217 | 792977 | 50 | 2
-- out 5 | 2021-01-02 | 218 | 792978 | 120 | 2
-- out 5 | 2021-01-02 | 219 | 792979 | 100 | 2
)
SELECT
site_id
, date_id
, MIN(location_id ) AS location_id
, MIN(reservation_id) AS reservation_id
, SUM(revenue ) AS revenue
FROM with_session
GROUP BY
site_id
, date_id
, session_id
ORDER BY
site_id
, date_id
;
-- out site_id | date_id | location_id | reservation_id | revenue
-- out ---------+------------+-------------+----------------+---------
-- out 5 | 2021-01-01 | 125 | 792727 | 190
-- out 5 | 2021-01-01 | 228 | 792757 | 200
-- out 5 | 2021-01-02 | 217 | 792977 | 270