I am trying to solve “gaps and islands” and group consecutive checks together. My data looks like this
site_id date_id location_id reservation_id revenue 5 20210101 125 792727 100 5 20210101 126 792728 90 5 20210101 228 792757 200 5 20210102 217 792977 50 5 20210102 218 792978 120 5 20210102 219 792979 100
I want to group by consecutive location_id and consecutive reservation_id (both should be consecutive respectively) within same date and site_id, and sum revenue. so for the example above the output should be:
site_id date_id location_id reservation_id revenue 5 20210101 125 792727 190 5 20210101 228 792757 200 5 20210102 217 792977 270
Location_id and reservation_id are of no importance except for this particular task, so a simple MAX() or MIN() for these two columns will work.
Advertisement
Answer
Try sessionization:
Two nested queries. First, a counter that is at 0 when a condition is false, at 1 when it’s true; in our case, the previous reservation id not being exactly one less than the current one.
The second query queries the first query, and makes a running sum of the counter obtained before. This gives a session id.
Then, group by site id, date id and the obtained session id.
WITH indata(site_id,date_id,location_id,reservation_id,revenue) AS ( SELECT 5,DATE '2021-01-01',125,792727,100 UNION ALL SELECT 5,DATE '2021-01-01',126,792728,90 UNION ALL SELECT 5,DATE '2021-01-01',228,792757,200 UNION ALL SELECT 5,DATE '2021-01-02',217,792977,50 UNION ALL SELECT 5,DATE '2021-01-02',218,792978,120 UNION ALL SELECT 5,DATE '2021-01-02',219,792979,100 ) , with_counter AS ( SELECT site_id , date_id , location_id , reservation_id , revenue , CASE WHEN reservation_id - LAG(reservation_id) OVER( PARTITION BY site_id ORDER BY date_id,reservation_id ) > 1 THEN 1 ELSE 0 END AS counter FROM indata ) , with_session AS ( SELECT site_id , date_id , location_id , reservation_id , revenue , SUM(counter) OVER( PARTITION BY site_id ORDER BY date_id,reservation_id ) AS session_id FROM with_counter -- test output ... -- out site_id | date_id | location_id | reservation_id | revenue | session_id -- out ---------+------------+-------------+----------------+---------+------------ -- out 5 | 2021-01-01 | 125 | 792727 | 100 | 0 -- out 5 | 2021-01-01 | 126 | 792728 | 90 | 0 -- out 5 | 2021-01-01 | 228 | 792757 | 200 | 1 -- out 5 | 2021-01-02 | 217 | 792977 | 50 | 2 -- out 5 | 2021-01-02 | 218 | 792978 | 120 | 2 -- out 5 | 2021-01-02 | 219 | 792979 | 100 | 2 ) SELECT site_id , date_id , MIN(location_id ) AS location_id , MIN(reservation_id) AS reservation_id , SUM(revenue ) AS revenue FROM with_session GROUP BY site_id , date_id , session_id ORDER BY site_id , date_id ; -- out site_id | date_id | location_id | reservation_id | revenue -- out ---------+------------+-------------+----------------+--------- -- out 5 | 2021-01-01 | 125 | 792727 | 190 -- out 5 | 2021-01-01 | 228 | 792757 | 200 -- out 5 | 2021-01-02 | 217 | 792977 | 270