Skip to content
Advertisement

Gap and Island problem – query not working for all periods

I have to create a query to find the gaps and islands between dates. This seems to be a standard gaps and island problem. To show my issue I will use sample of data. The queries are executed in Snowflake.

This code gives me a sample of table.

Then I have the code to find gaps and islands:

The results are:

enter image description here

As you see the problem is for period 8/28/2017 – 9/19/2017. This period should not be a separate island, because it should be included in the period: 8/23/2017 – 9/23/2017.

Do you have any idea how I can modify my query to get the correct results (so instead 6 I should have 5 islands as 8/28/2017 – 9/19/2017 should not be island). This just example of data, so I am looking for unversal solution, but so far I have not figure out the correct approach.

Advertisement

Answer

You can express the gaps-and-islands logic like this:

Here is a db<>fiddle.

The idea is to look for the maximum enddate on all the “earlier” rows. This value is used to check if there is an overlap.

So, the innermost subquery calculates the previous enddate. The middle subquery does a cumulative sum of the beginnings of groups to assign a group identifier.

The outer query just aggregates by the group identifier.

User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement