Group By based on consequtive flag in Redshift (Gaps and Islands problem)

Question

I am trying to solve "gaps and islands" and group consecutive checks together. My data looks like this I want to group by consecutive location_id and consecutive reservation_id (both should be consecutive respectively) within same date and site_id, and sum revenue. so for the example above the output should be: Location_id and reservation_id are of no importance except for this

Accepted Answer

Try sessionization:Two nested queries. First, a counter that is at 0 when a condition is false, at 1 when it&#8217;s true; in our case, the previous reservation id not being exactly one less than the current one.The second query queries the first query, and makes a running sum of the counter obtained before. This gives a session id.Then, group by site id, date id and the obtained session id.WITHindata(site_id,date_id,location_id,reservation_id,revenue) AS (          SELECT 5,DATE '2021-01-01',125,792727,100UNION ALL SELECT 5,DATE '2021-01-01',126,792728,90UNION ALL SELECT 5,DATE '2021-01-01',228,792757,200UNION ALL SELECT 5,DATE '2021-01-02',217,792977,50UNION ALL SELECT 5,DATE '2021-01-02',218,792978,120UNION ALL SELECT 5,DATE '2021-01-02',219,792979,100),with_counter AS (  SELECT    site_id  , date_id  , location_id  , reservation_id  , revenue  , CASE      WHEN reservation_id - LAG(reservation_id) OVER(         PARTITION BY site_id ORDER BY date_id,reservation_id      ) > 1      THEN 1      ELSE 0    END AS counter  FROM indata),with_session AS (  SELECT    site_id  , date_id  , location_id  , reservation_id  , revenue  , SUM(counter) OVER(      PARTITION BY site_id ORDER BY date_id,reservation_id    ) AS session_id  FROM with_counter  -- test output ...  -- out  site_id |  date_id   | location_id | reservation_id | revenue | session_id   -- out ---------+------------+-------------+----------------+---------+------------  -- out        5 | 2021-01-01 |         125 |         792727 |     100 |          0  -- out        5 | 2021-01-01 |         126 |         792728 |      90 |          0  -- out        5 | 2021-01-01 |         228 |         792757 |     200 |          1  -- out        5 | 2021-01-02 |         217 |         792977 |      50 |          2  -- out        5 | 2021-01-02 |         218 |         792978 |     120 |          2  -- out        5 | 2021-01-02 |         219 |         792979 |     100 |          2)SELECT  site_id, date_id, MIN(location_id   ) AS location_id, MIN(reservation_id) AS reservation_id, SUM(revenue       ) AS revenueFROM with_sessionGROUP BY  site_id, date_id, session_idORDER BY  site_id, date_id;-- out  site_id |  date_id   | location_id | reservation_id | revenue -- out ---------+------------+-------------+----------------+----------- out        5 | 2021-01-01 |         125 |         792727 |     190-- out        5 | 2021-01-01 |         228 |         792757 |     200-- out        5 | 2021-01-02 |         217 |         792977 |     270

Advertisement

Answer