Skip to content
Advertisement

Using SQL, when there are multiple columns with dates. how can I determine and return a row where the date is between the other rows?

I am building a table for a data warehouse that needs to have a row for each change that occurs. The issue is that there are sometimes changes that occur in the subgroups and I can’t figure out how to show those changes.

For example, I have the following table:

RowNumber Code CorrectedProductYear ProductYear Product CategoryYear Category PartYear Parts KeepRow
1 ABC 2012 2012 Sport Car 1995 Auto 1980 Wheels N
2 ABC 2012 2012 Sport Car 1996 Cars 1980 Wheels N
3 ABC 2012 2012 Sport Car 1998 Transport 1980 Wheels Y
4 ABC 2014 2014 Sedan 1995 Auto 1980 Wheels N
5 ABC 2014 2014 Sedan 1996 Cars 1980 Wheels N
6 ABC 2014 2014 Sedan 1998 Transport 1980 Wheels Y
7 ABC 2018 2018 Luxury Car 1995 Auto 1980 Wheels N
8 ABC 2018 2018 Luxury Car 1996 Cars 1980 Wheels N
9 ABC 2018 2018 Luxury Car 1998 Transport 1980 Wheels N
10 ABC 2016 2018 Sedan 2016 Vehicle 1980 Wheels Y
11 ABC 2018 2018 Luxury Car 2017 Motorcar 1980 Wheels Y
12 ABC 2018 2018 Luxury Car 2017 Motorcar 1980 Wheels N
13 ABC 2019 2018 Luxury Car 2017 Motorcar 2019 Fancy Wheels Y
14 ABC 2020 2020 Super Luxury Car 2017 Motorcar 2019 Fancy Wheels N
15 ABC 2020 2020 Super Luxury Car 2017 Motorcar 2019 Fancy Wheels Y

I marked the KeepRow with a ‘Y’ to show what the find result should look like. It should be this:

RowNumber Code CorrectedProductYear Product CategoryYear Category PartYear Parts KeepRow
3 ABC 2012 Sport Car 1998 Transport 1980 Wheels Y
6 ABC 2014 Sedan 1998 Transport 1980 Wheels Y
10 ABC 2016 Sedan 2016 Vehicle 1980 Wheels Y
11 ABC 2018 Luxury Car 2017 Motorcar 1980 Wheels Y
13 ABC 2019 Luxury Car 2017 Motorcar 2019 Fancy Wheels Y
15 ABC 2020 Super Luxury Car 2017 Motorcar 2019 Fancy Wheels Y

In this table the ProductYear is the “main” year. When the subgroupings (Category and Parts) have a change in their description/year, that needs to be captured in the ProductYear. This occurs in:

  • Row 10 – where the Category changes to Vehicle in 2016, so it is between the ProductYear of 2014 Sedan and 2018 Luxury Car. Additionally, the Product needs to revert back to Sedan because Luxury Car had not yet been produced.
  • Row 13 – Parts changes to Fancy Wheels in 2019 which is between the ProductYear of 2018 and 2020.

I know that I can do an Max Effective Date query to only pull the most recent CategoryYear and PartYear, but then I miss the changes that might occur between the ProductYear.

I’ve tried different variations of LAG and LEAD and well as LAST_VALUE and some of the other Window functions, but I’m at stuck and can’t figure this one out.

I also think that this might not be able to be completed with just SQL. I do have the ability to do some SSIS if that is a better way to proceed.

Any help would be enormously appreciated!

Thank you.

Advertisement

Answer

I think this is something like what you are looking for. I had to split it into two queries as you are filtering once for CategoryYear and then again for PartYear. This might help you get on the right path:

SELECT RowNumber,Code,CorrectedProductYear,ProductYear,Product,CategoryYear,Category,PartYear,Parts 
FROM
(
    SELECT *, ROW_NUMBER() OVER(PARTITION BY Product order by ProductYear) RN,
        LAG(ProductYear,1) OVER(order by ProductYear) LastProdYear,
        LEAD(ProductYear,1)  OVER(order by ProductYear) NextProdYear,
        LAG(Category,1) OVER(Order by Product,ProductYear) LastCategory
    FROM Products 
) T

WHERE  CategoryYear > LastProdYear and CategoryYear < NextProdYear
and KeepRow = 'Y' and Category <> LastCategory and RN > 1

UNION ALL

SELECT RowNumber,Code,CorrectedProductYear,ProductYear,Product,CategoryYear,Category,PartYear,Parts 
FROM
(
    SELECT *, ROW_NUMBER() OVER(PARTITION BY Product order by ProductYear) RN,
        LAG(ProductYear,1) OVER(order by ProductYear) LastProdYear,
        LEAD(ProductYear,1)  OVER(order by ProductYear) NextProdYear,
        LAG(Parts,1) OVER(Order by Product,ProductYear) LastParts
    FROM Products 
) T
WHERE  PartYear > LastProdYear and PartYear < NextProdYear
and KeepRow = 'Y' and Parts <> LastParts and RN > 1
User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement