Skip to content
Advertisement

Remove duplicate data in ‘date-ranged’ rows

I have a table like the following.

The date range is used to know for what time period the rest of the Attributes were valid, the problem i have is that there are several consecutive time ranges where the Attributes ramain the same, what I would like is to obtain the same data but without the duplicate rows.

From the previous example, my expected end result would be like this:

What I did was merge the 2nd and 3rd row into one (All attribute except the date were the same), but I kept the StartDate of the 2nd and the endDate of the 3rd row.

I first thought of grouping by the values obtaining the MAX and MIN like this

But as soon as I run it I realized that when the attributes change several times and go back to their original value I’d end up with overlapping intervals. I’ve been stuck for a while now trying to figure out how to fix this problem.

Here’s an example of what I meant in my previous statement.

When the initial data looks like the following:

Grouping the results would end up like the following

And what I’d like to obtain is this

Any help would be welcomed 🙂

EDIT: I’ll be uploading some sample data soon to make my problem a bit easier to understand.

EDIT2: Here’s a script with some of my data. From that sample what I’d like to obtain are the following rows.

Advertisement

Answer

EDITED, following comments. Try:

User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement