
What is the best way in SQL to combine sequential events based on matching end time to start time?

The database I work in records events by part ID and the times during which each part is active. The problem is that these events are truncated to fit within a single day: if a part's active time carries over into the next day, the event is split into one record per day it spans. In that case, the active start timestamp on the next day matches the active end timestamp of the previous day's record. I am trying to find a way to combine these split events into a single record with the “true” start and end times for when each part is active.


Here is a sample of what the dataset looks like:

I am trying to reduce it down to the following:

There are ~70 different part numbers, each with up to 200 active events over the observation period. Active events can last up to several days. As I am fairly inexperienced in SQL, any help would be greatly appreciated.


Answer

This is a gaps-and-islands problem, where you want to group together adjacent rows.

Here is one solution that uses window functions:

The innermost query retrieves the end date of the previous record that has the same part_id. The intermediate query computes a window sum that increases by 1 every time the previous end date is not equal to the current start date; this running total defines the groups of adjacent rows. Finally, the outer query aggregates by group and computes the start and end of each range.
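The three steps described above can be sketched as follows. This is only an illustration under stated assumptions, not the exact query from the answer: the table name `events`, the column names `active_start` and `active_end`, the aliases `true_start` and `true_end`, and the sample rows are all hypothetical (only `part_id` appears in the text). The query is wrapped in Python's `sqlite3` module so it can be run as-is; it needs SQLite 3.25 or later for window functions, and timestamps are stored as ISO-8601 text so that they sort correctly.

```python
import sqlite3

# Hypothetical sample rows: (part_id, active_start, active_end).
# Part A has one event split across three days plus a separate later event;
# part B has one event split across two days.
ROWS = [
    ("A", "2020-01-01 08:00:00", "2020-01-02 00:00:00"),
    ("A", "2020-01-02 00:00:00", "2020-01-03 00:00:00"),
    ("A", "2020-01-03 00:00:00", "2020-01-03 10:30:00"),
    ("A", "2020-01-05 09:00:00", "2020-01-05 17:00:00"),
    ("B", "2020-01-01 22:00:00", "2020-01-02 00:00:00"),
    ("B", "2020-01-02 00:00:00", "2020-01-02 03:15:00"),
]

QUERY = """
SELECT part_id,
       MIN(active_start) AS true_start,
       MAX(active_end)   AS true_end
FROM (
    -- Intermediate query: running count of "breaks" per part.
    -- A break is a row whose start does not equal the previous end
    -- (the first row of each part has a NULL prev_end, so it counts too).
    SELECT t.*,
           SUM(CASE WHEN prev_end = active_start THEN 0 ELSE 1 END)
               OVER (PARTITION BY part_id ORDER BY active_start) AS grp
    FROM (
        -- Innermost query: end time of the previous row for the same part.
        SELECT e.*,
               LAG(active_end)
                   OVER (PARTITION BY part_id ORDER BY active_start) AS prev_end
        FROM events e
    ) t
) g
-- Outer query: collapse each group of adjacent rows into one record.
GROUP BY part_id, grp
ORDER BY part_id, true_start
"""


def merge_events(rows):
    """Load rows into an in-memory table and return the merged ranges."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (part_id TEXT, active_start TEXT, active_end TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    merged = conn.execute(QUERY).fetchall()
    conn.close()
    return merged


if __name__ == "__main__":
    for row in merge_events(ROWS):
        print(row)
```

With these sample rows, the three split records for part A collapse into one range, its later event stays separate because its start does not match the preceding end, and the two records for part B collapse into one. The SQL itself uses only `LAG`, a windowed `SUM`, and `GROUP BY`, so it should carry over to most databases that support window functions, though timestamp types and syntax details may differ.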

User contributions licensed under: CC BY-SA