I am building a table for a data warehouse that needs to have a row for each change that occurs. The issue is that there are sometimes changes that occur in the subgroups and I can’t figure out how to show those changes.
For example, I have the following table:
RowNumber | Code | CorrectedProductYear | ProductYear | Product | CategoryYear | Category | PartYear | Parts | KeepRow |
---|---|---|---|---|---|---|---|---|---|
1 | ABC | 2012 | 2012 | Sport Car | 1995 | Auto | 1980 | Wheels | N |
2 | ABC | 2012 | 2012 | Sport Car | 1996 | Cars | 1980 | Wheels | N |
3 | ABC | 2012 | 2012 | Sport Car | 1998 | Transport | 1980 | Wheels | Y |
4 | ABC | 2014 | 2014 | Sedan | 1995 | Auto | 1980 | Wheels | N |
5 | ABC | 2014 | 2014 | Sedan | 1996 | Cars | 1980 | Wheels | N |
6 | ABC | 2014 | 2014 | Sedan | 1998 | Transport | 1980 | Wheels | Y |
7 | ABC | 2018 | 2018 | Luxury Car | 1995 | Auto | 1980 | Wheels | N |
8 | ABC | 2018 | 2018 | Luxury Car | 1996 | Cars | 1980 | Wheels | N |
9 | ABC | 2018 | 2018 | Luxury Car | 1998 | Transport | 1980 | Wheels | N |
10 | ABC | 2016 | 2018 | Sedan | 2016 | Vehicle | 1980 | Wheels | Y |
11 | ABC | 2018 | 2018 | Luxury Car | 2017 | Motorcar | 1980 | Wheels | Y |
12 | ABC | 2018 | 2018 | Luxury Car | 2017 | Motorcar | 1980 | Wheels | N |
13 | ABC | 2019 | 2018 | Luxury Car | 2017 | Motorcar | 2019 | Fancy Wheels | Y |
14 | ABC | 2020 | 2020 | Super Luxury Car | 2017 | Motorcar | 2019 | Fancy Wheels | N |
15 | ABC | 2020 | 2020 | Super Luxury Car | 2017 | Motorcar | 2019 | Fancy Wheels | Y |
I marked the KeepRow with a ‘Y’ to show what the find result should look like. It should be this:
RowNumber | Code | CorrectedProductYear | Product | CategoryYear | Category | PartYear | Parts | KeepRow |
---|---|---|---|---|---|---|---|---|
3 | ABC | 2012 | Sport Car | 1998 | Transport | 1980 | Wheels | Y |
6 | ABC | 2014 | Sedan | 1998 | Transport | 1980 | Wheels | Y |
10 | ABC | 2016 | Sedan | 2016 | Vehicle | 1980 | Wheels | Y |
11 | ABC | 2018 | Luxury Car | 2017 | Motorcar | 1980 | Wheels | Y |
13 | ABC | 2019 | Luxury Car | 2017 | Motorcar | 2019 | Fancy Wheels | Y |
15 | ABC | 2020 | Super Luxury Car | 2017 | Motorcar | 2019 | Fancy Wheels | Y |
In this table the ProductYear is the “main” year. When the subgroupings (Category and Parts) have a change in their description/year, that needs to be captured in the ProductYear. This occurs in:
- Row 10 – where the Category changes to Vehicle in 2016, so it is between the ProductYear of 2014 Sedan and 2018 Luxury Car. Additionally, the Product needs to revert back to Sedan because Luxury Car had not yet been produced.
- Row 13 – Parts changes to Fancy Wheels in 2019 which is between the ProductYear of 2018 and 2020.
I know that I can do an Max Effective Date query to only pull the most recent CategoryYear and PartYear, but then I miss the changes that might occur between the ProductYear.
I’ve tried different variations of LAG and LEAD and well as LAST_VALUE and some of the other Window functions, but I’m at stuck and can’t figure this one out.
I also think that this might not be able to be completed with just SQL. I do have the ability to do some SSIS if that is a better way to proceed.
Any help would be enormously appreciated!
Thank you.
Advertisement
Answer
I think this is something like what you are looking for. I had to split it into two queries as you are filtering once for CategoryYear and then again for PartYear. This might help you get on the right path:
SELECT RowNumber,Code,CorrectedProductYear,ProductYear,Product,CategoryYear,Category,PartYear,Parts FROM ( SELECT *, ROW_NUMBER() OVER(PARTITION BY Product order by ProductYear) RN, LAG(ProductYear,1) OVER(order by ProductYear) LastProdYear, LEAD(ProductYear,1) OVER(order by ProductYear) NextProdYear, LAG(Category,1) OVER(Order by Product,ProductYear) LastCategory FROM Products ) T WHERE CategoryYear > LastProdYear and CategoryYear < NextProdYear and KeepRow = 'Y' and Category <> LastCategory and RN > 1 UNION ALL SELECT RowNumber,Code,CorrectedProductYear,ProductYear,Product,CategoryYear,Category,PartYear,Parts FROM ( SELECT *, ROW_NUMBER() OVER(PARTITION BY Product order by ProductYear) RN, LAG(ProductYear,1) OVER(order by ProductYear) LastProdYear, LEAD(ProductYear,1) OVER(order by ProductYear) NextProdYear, LAG(Parts,1) OVER(Order by Product,ProductYear) LastParts FROM Products ) T WHERE PartYear > LastProdYear and PartYear < NextProdYear and KeepRow = 'Y' and Parts <> LastParts and RN > 1