Merge lines over timespan in SCD2 table

I have the following table, sourced from an SCD2 table. From the source table I selected only a few columns, which results in several rows that look exactly alike. I want to remove the redundant rows (those that repeat the same data) and have the ValidFrom column show the first value and the ValidTo column show the last value within each “timespan group”.

Source data:

| Item     | Color      | ValidFrom     | ValidTo    |
| -------- | ---------- | ------------- | ---------- |
| Ball     | Red        | 2020-01-01    | 2020-03-24 |
| Ball     | Blue       | 2020-03-25    | 2020-04-12 |
| Ball     | Blue       | 2020-04-13    | 2020-05-07 |
| Ball     | Blue       | 2020-05-08    | 2020-11-14 |
| Ball     | Red        | 2020-11-15    | 9999-12-31 |
| Doll     | Yellow     | 2020-01-01    | 2020-03-24 |
| Doll     | Green      | 2020-03-25    | 2020-04-12 |
| Doll     | Green      | 2020-04-13    | 2020-05-07 |
| Doll     | Green      | 2020-05-08    | 2020-11-14 |
| Doll     | Pink       | 2020-11-15    | 9999-12-31 | 

What I want to accomplish is this:

| Item     | Color      | ValidFrom     | ValidTo    |
| -------- | ---------- | ------------- | ---------- |
| Ball     | Red        | 2020-01-01    | 2020-03-24 |
| Ball     | Blue       | 2020-03-25    | 2020-11-14 |
| Ball     | Red        | 2020-11-15    | 9999-12-31 |
| Doll     | Yellow     | 2020-01-01    | 2020-03-24 |
| Doll     | Green      | 2020-03-25    | 2020-11-14 |
| Doll     | Pink       | 2020-11-15    | 9999-12-31 | 

Note that the Item Ball at first has the color Red, then Blue and then goes back to Red. That makes things a bit more complicated, from what I have learned.

Thanks for your help.

Answer

Your data is very regular. You seem to just want to combine adjacent, tiled records that have no overlaps or gaps. However, the following also handles gaps and simple overlaps, since each row is compared with the previous row for the same item and color:

select item, color,
       min(validfrom) as validfrom, max(validto) as validto
from (select t.*,
             -- running count of island starts within each item
             sum(case when prev_validto >= dateadd(day, -1, validfrom)
                      then 0 else 1
                 end) over (partition by item order by validfrom) as grp
      from (select t.*,
                   -- end date of the previous row with the same item and color
                   lag(validto) over (partition by item, color order by validfrom) as prev_validto
            from t
            ) t
     ) t
group by item, color, grp;

You are looking for islands of rows in the original data, where each “island” has the same item and color and adjacent dates. The query determines where islands start by looking at the previous row for the same item and color. If there is no such row, or that row ends more than one day before the current row begins, then the current row is the beginning of an island.

The grp is then the cumulative sum of “island beginnings”, and that can be used for aggregating and getting the final results.
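To make the two steps concrete, here is a minimal Python sketch that traces them over the Ball rows from the question. It is an illustration of the logic, not a replacement for the SQL: the names `BALL`, `assign_groups` and `merge` are made up for this example, and it handles a single item at a time (the SQL does the same per item via `partition by item`).

```python
from datetime import date, timedelta

# The Ball rows from the question: (color, valid_from, valid_to).
BALL = [
    ("Red",  date(2020, 1, 1),   date(2020, 3, 24)),
    ("Blue", date(2020, 3, 25),  date(2020, 4, 12)),
    ("Blue", date(2020, 4, 13),  date(2020, 5, 7)),
    ("Blue", date(2020, 5, 8),   date(2020, 11, 14)),
    ("Red",  date(2020, 11, 15), date(9999, 12, 31)),
]


def assign_groups(rows):
    """Return (color, valid_from, valid_to, is_start, grp) in valid_from order."""
    last_valid_to = {}  # color -> valid_to of the previous row with that color
    grp = 0
    result = []
    for color, valid_from, valid_to in sorted(rows, key=lambda r: r[1]):
        prev = last_valid_to.get(color)  # plays the role of lag(validto)
        # Same test as the SQL: the previous row of this color must reach
        # at least the day before the current row starts.
        continues = prev is not None and prev >= valid_from - timedelta(days=1)
        is_start = 0 if continues else 1
        grp += is_start  # cumulative sum of island starts
        result.append((color, valid_from, valid_to, is_start, grp))
        last_valid_to[color] = valid_to
    return result


def merge(rows):
    """Collapse each (color, grp) island to its first valid_from and last valid_to."""
    spans = {}
    for color, valid_from, valid_to, _, grp in assign_groups(rows):
        lo, hi = spans.get((color, grp), (valid_from, valid_to))
        spans[(color, grp)] = (min(lo, valid_from), max(hi, valid_to))
    merged = [(color, lo, hi) for (color, _), (lo, hi) in spans.items()]
    return sorted(merged, key=lambda r: r[1])


if __name__ == "__main__":
    for row in assign_groups(BALL):
        print(row)
    for row in merge(BALL):
        print(row)
```

For the Ball rows the start flags come out as 1, 1, 0, 0, 1, so grp is 1, 2, 2, 2, 3. The second Red row gets its own group because the previous Red row ended months earlier, which is why Red appears twice in the result.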

Your specific data is quite constrained: it is perfectly tiled, with each row ending the day before the next one begins. For data like that, you can do something very similar using a left join:

select item, color,
       min(validfrom) as validfrom, max(validto) as validto
from (select t.*,
             -- no matching previous row means this row starts an island
             sum(case when tprev.color is null then 1 else 0
                 end) over (partition by t.item order by t.validfrom) as grp
      from t left join
           t tprev
           on tprev.item = t.item and
              tprev.color = t.color and
              tprev.validto = dateadd(day, -1, t.validfrom)
     ) t
group by item, color, grp
order by item, min(validfrom);

Here is a db<>fiddle illustrating both methods
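If you want to try both queries locally, the following sketch runs them against the sample data using Python's built-in `sqlite3` module. Several things here are assumptions rather than part of the answer: the queries are ported to SQLite, so `dateadd(day, -1, x)` becomes `date(x, '-1 day')`; the dates are stored as ISO text; the output aliases `merged_from` and `merged_to` and the helper `run` are invented for the example; and window functions require SQLite 3.25 or later.

```python
import sqlite3

# Sample data from the question, with dates as ISO-8601 strings.
ROWS = [
    ("Ball", "Red",    "2020-01-01", "2020-03-24"),
    ("Ball", "Blue",   "2020-03-25", "2020-04-12"),
    ("Ball", "Blue",   "2020-04-13", "2020-05-07"),
    ("Ball", "Blue",   "2020-05-08", "2020-11-14"),
    ("Ball", "Red",    "2020-11-15", "9999-12-31"),
    ("Doll", "Yellow", "2020-01-01", "2020-03-24"),
    ("Doll", "Green",  "2020-03-25", "2020-04-12"),
    ("Doll", "Green",  "2020-04-13", "2020-05-07"),
    ("Doll", "Green",  "2020-05-08", "2020-11-14"),
    ("Doll", "Pink",   "2020-11-15", "9999-12-31"),
]

# SQLite port of the lag() version.
LAG_QUERY = """
select item, color, min(validfrom) as merged_from, max(validto) as merged_to
from (select t2.*,
             sum(case when prev_validto >= date(validfrom, '-1 day')
                      then 0 else 1
                 end) over (partition by item order by validfrom) as grp
      from (select t.*,
                   lag(validto) over (partition by item, color
                                      order by validfrom) as prev_validto
            from t) t2) t3
group by item, color, grp
order by item, merged_from
"""

# SQLite port of the left join version.
JOIN_QUERY = """
select item, color, min(validfrom) as merged_from, max(validto) as merged_to
from (select t.*,
             sum(case when tprev.color is null then 1 else 0
                 end) over (partition by t.item order by t.validfrom) as grp
      from t left join t tprev
           on tprev.item = t.item
          and tprev.color = t.color
          and tprev.validto = date(t.validfrom, '-1 day')) x
group by item, color, grp
order by item, merged_from
"""


def run(query):
    """Load ROWS into an in-memory table t and return the query result."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "create table t (item text, color text, validfrom text, validto text)"
    )
    con.executemany("insert into t values (?, ?, ?, ?)", ROWS)
    result = con.execute(query).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for name, query in (("lag", LAG_QUERY), ("left join", JOIN_QUERY)):
        print(name)
        for row in run(query):
            print("  ", row)
```

On this sample data both queries return the six rows listed in the question's desired output.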

User contributions licensed under: CC BY-SA