
Group By based on consecutive flag in Redshift (Gaps and Islands problem)

I am trying to solve a “gaps and islands” problem and group consecutive checks together. My data looks like this:

I want to group rows with consecutive location_id and consecutive reservation_id (each must be consecutive in its own right) within the same date and site_id, and sum the revenue. So for the example above the output should be:

location_id and reservation_id matter only for the grouping itself, so a simple MAX() or MIN() on these two columns is fine in the output.


Answer

Try sessionization:

Use two nested queries. The first computes a counter that is 1 when a condition is true and 0 when it is false; in our case the condition is that the previous reservation id is not exactly one less than the current one, i.e. the row starts a new island.

The second query selects from the first and computes a running sum of that counter. This gives a session id.

Then, group by site id, date id and the obtained session id.
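
The steps above can be sketched roughly as follows. This is only an illustration under assumptions: the table name `checks`, the column names `date_id` and `revenue`, the sample rows, and ordering each partition by `reservation_id` are all made up here, since the original data is not shown. The flag also checks `location_id` as well as `reservation_id`, which is my reading of the question rather than something the steps above spell out. The SQL is run through Python's built-in sqlite3 (SQLite 3.25+ for window functions) so it can be tried locally; it uses only LAG, a windowed SUM with an explicit frame, and GROUP BY, so it should port to Redshift with little or no change, but I have not run it there.

```python
import sqlite3

# Hypothetical sample rows: (site_id, date_id, location_id, reservation_id, revenue).
ROWS = [
    (1, "2020-01-01", 10, 100, 5.0),
    (1, "2020-01-01", 11, 101, 7.0),
    (1, "2020-01-01", 12, 102, 3.0),
    (1, "2020-01-01", 20, 200, 4.0),
    (1, "2020-01-01", 21, 201, 6.0),
    (1, "2020-01-02", 22, 202, 9.0),
    (2, "2020-01-01", 30, 300, 1.0),
]

QUERY = """
WITH flagged AS (
    -- Step 1: counter is 0 when the row continues the previous one, else 1.
    -- On the first row of a partition LAG() is NULL, so the ELSE branch yields 1.
    SELECT *,
           CASE
               WHEN reservation_id - LAG(reservation_id) OVER (
                        PARTITION BY site_id, date_id ORDER BY reservation_id) = 1
                AND location_id - LAG(location_id) OVER (
                        PARTITION BY site_id, date_id ORDER BY reservation_id) = 1
               THEN 0 ELSE 1
           END AS new_session
    FROM checks
),
sessions AS (
    -- Step 2: running sum of the counter gives a session id per island.
    SELECT *,
           SUM(new_session) OVER (
               PARTITION BY site_id, date_id
               ORDER BY reservation_id
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS session_id
    FROM flagged
)
-- Step 3: aggregate per site, date and session.
SELECT site_id, date_id,
       MIN(location_id)    AS location_id,
       MIN(reservation_id) AS reservation_id,
       SUM(revenue)        AS revenue
FROM sessions
GROUP BY site_id, date_id, session_id
ORDER BY site_id, date_id, session_id
"""


def sum_islands(rows):
    """Load rows into an in-memory table and return one summed row per island."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE checks (site_id INTEGER, date_id TEXT, "
        "location_id INTEGER, reservation_id INTEGER, revenue REAL)"
    )
    conn.executemany("INSERT INTO checks VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in sum_islands(ROWS):
        print(row)
```

If only reservation_id should decide where an island breaks, drop the location_id condition from the CASE expression.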

User contributions licensed under: CC BY-SA