I have the following table, sourced from an SCD2 table. I selected only a few columns from the source, which results in several rows that look identical apart from their dates. I want to remove the redundant rows, those that contain the same data, and have the ValidFrom column show the first value and the ValidTo column show the last value within each “timespan group”.
Source data:
| Item | Color  | ValidFrom  | ValidTo    |
| ---- | ------ | ---------- | ---------- |
| Ball | Red    | 2020-01-01 | 2020-03-24 |
| Ball | Blue   | 2020-03-25 | 2020-04-12 |
| Ball | Blue   | 2020-04-13 | 2020-05-07 |
| Ball | Blue   | 2020-05-08 | 2020-11-14 |
| Ball | Red    | 2020-11-15 | 9999-12-31 |
| Doll | Yellow | 2020-01-01 | 2020-03-24 |
| Doll | Green  | 2020-03-25 | 2020-04-12 |
| Doll | Green  | 2020-04-13 | 2020-05-07 |
| Doll | Green  | 2020-05-08 | 2020-11-14 |
| Doll | Pink   | 2020-11-15 | 9999-12-31 |
What I want to accomplish is this:
| Item | Color  | ValidFrom  | ValidTo    |
| ---- | ------ | ---------- | ---------- |
| Ball | Red    | 2020-01-01 | 2020-03-24 |
| Ball | Blue   | 2020-03-25 | 2020-11-14 |
| Ball | Red    | 2020-11-15 | 9999-12-31 |
| Doll | Yellow | 2020-01-01 | 2020-03-24 |
| Doll | Green  | 2020-03-25 | 2020-11-14 |
| Doll | Pink   | 2020-11-15 | 9999-12-31 |
Note that the item Ball is first Red, then Blue, and then goes back to Red. From what I have learned, that makes things a bit more complicated, because a plain group by on Item and Color would merge the two Red periods.
Thanks for your help.
Answer
Your data is very regular. You seem to just want to combine adjacent, tiled records that have no overlaps or gaps. However, the following also handles gaps and more general overlaps:
```sql
select item, color, min(validfrom), max(validto)
from (select t.*,
             sum(case when prev_validto >= dateadd(day, -1, validfrom) then 0 else 1 end)
                 over (partition by item order by validfrom) as grp
      from (select t.*,
                   lag(validto) over (partition by item, color order by validfrom) as prev_validto
            from t
           ) t
     ) t
group by item, color, grp;
```
You are looking for islands of rows in the original data, where an “island” is a run of rows with the same item, the same color, and adjacent dates. The query determines where islands start by looking at the previous row for the same item and color. If there is no such row, or that row ends more than one day before the current row begins, then the current row is the beginning of an island.
The `grp` is then the cumulative sum of “island beginnings”, and that can be used for aggregating and getting the final results.
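To make the island logic easier to follow, here is a small sketch that exposes the intermediate columns for the Ball rows from the question. It assumes SQL Server syntax (the answer uses `dateadd`); the CTE name `flagged` and the column name `is_start` are made up for this illustration and do not appear in the original query.

```sql
-- Sample rows for one item, copied from the question.
with t as (
    select item, color,
           cast(validfrom as date) as validfrom,
           cast(validto   as date) as validto
    from (values
        ('Ball', 'Red',  '2020-01-01', '2020-03-24'),
        ('Ball', 'Blue', '2020-03-25', '2020-04-12'),
        ('Ball', 'Blue', '2020-04-13', '2020-05-07'),
        ('Ball', 'Blue', '2020-05-08', '2020-11-14'),
        ('Ball', 'Red',  '2020-11-15', '9999-12-31')
    ) v(item, color, validfrom, validto)
),
-- Previous end date among rows with the same item AND color.
flagged as (
    select t.*,
           lag(validto) over (partition by item, color order by validfrom) as prev_validto
    from t
)
select item, color, validfrom, validto, prev_validto,
       -- 1 marks the first row of an island, 0 a continuation.
       case when prev_validto >= dateadd(day, -1, validfrom) then 0 else 1 end as is_start,
       -- Running total of island starts per item: the island number.
       sum(case when prev_validto >= dateadd(day, -1, validfrom) then 0 else 1 end)
           over (partition by item order by validfrom) as grp
from flagged
order by item, validfrom;
```

For these five rows the intermediate result should look like this:

| item | color | validfrom  | validto    | prev_validto | is_start | grp |
| ---- | ----- | ---------- | ---------- | ------------ | -------- | --- |
| Ball | Red   | 2020-01-01 | 2020-03-24 | NULL         | 1        | 1   |
| Ball | Blue  | 2020-03-25 | 2020-04-12 | NULL         | 1        | 2   |
| Ball | Blue  | 2020-04-13 | 2020-05-07 | 2020-04-12   | 0        | 2   |
| Ball | Blue  | 2020-05-08 | 2020-11-14 | 2020-05-07   | 0        | 2   |
| Ball | Red   | 2020-11-15 | 9999-12-31 | 2020-03-24   | 1        | 3   |

The second Red row sees the first Red row as its predecessor, but that row ended long before 2020-11-14, so it starts a new island. That is why the two Red periods end up in different groups.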
Your specific data is quite constrained: it is perfectly tiled, with one row ending the day before the next begins. In that case you can do something very similar using `left join`:
```sql
select item, color, min(validfrom), max(validto)
from (select t.*,
             sum(case when tprev.color is null then 1 else 0 end)
                 over (partition by t.item order by t.validfrom) as grp
      from t left join
           t tprev
           on tprev.item = t.item and
              tprev.color = t.color and
              tprev.validto = dateadd(day, -1, t.validfrom)
     ) t
group by item, color, grp
order by item, min(validfrom);
```
Here is a db<>fiddle illustrating both methods