I have the following table (#CategoryWeight) where weight and factor values are stored for each category with date ranges. I need to aggregate / simplify this data, when possible, so that consecutive data ranges are combined into a wide range for the same values of weight and factor.
DROP TABLE IF EXISTS #CategoryWeight; CREATE TABLE #CategoryWeight ( [CategoryId] bigint, [weight] float(8), [factor] float(8), [startYear] nvarchar(60), [endYear] nvarchar(60) ) INSERT INTO #CategoryWeight VALUES ( 42, 1, 0, N'2009', N'2014' ), ( 42, 1, 0, N'2009', N'2019' ), ( 42, 1, 0, N'2015', N'2017' ), ( 42, 1, 0, N'2018', N'2019' ), ( 42, 1, 1, N'2020', N'9999' ), ( 40, 1, 0, N'2009', N'2014' ), ( 40, 1, 0, N'2009', N'2017' ), ( 40, 1, 0, N'2015', N'2017' ), ( 40, 1, 0, N'2020', N'9999' ), ( 40, 1, 1, N'2018', N'2019' ), ( 45, 1, 0, N'2009', N'2014' ), ( 45, 0, 0, N'2015', N'2017' ), ( 45, 1, 0, N'2020', N'9999' ), ( 45, 0, 1, N'2018', N'2019' );' CategoryID weight factor startYear endYear 42 1 0 2009 2014 42 1 0 2009 2019 42 1 0 2015 2017 42 1 0 2018 2019 42 1 1 2020 9999 40 1 0 2009 2014 40 1 0 2009 2017 40 1 0 2015 2017 40 1 0 2020 9999 40 1 1 2018 2019 45 1 0 2009 2014 45 0 0 2015 2017 45 1 0 2020 9999 45 0 1 2018 2019
Expected result:
CategoryID weight factor startYear endYear 42 1 0 2009 2019 42 1 1 2020 9999 40 1 0 2009 2017 40 1 1 2018 2019 40 1 0 2020 9999 45 1 0 2009 2014 45 0 0 2015 2017 45 0 1 2018 2019 45 1 0 2020 9999
Advertisement
Answer
You have overlapping time periods. That makes any assumption about the data troublesome — because the same year could have different values on different rows (nothing in your question precludes this).
Hence, the approach that I recommend is to expand out the data and then recombine it into chunks where the values are the same. The following uses a recursive CTE to expand the data and then a gaps-and-islands trick to recombine it:
with cte as ( select categoryid, weight, factor, convert(int, startyear) as year, convert(int, endyear) as endyear from categoryweight union all select categoryid, weight, factor, year + 1, endyear from cte where year < endyear ) select categoryid, weight, factor, min(year), max(year) from (select categoryid, weight, factor, year, row_number() over (partition by categoryid, weight, factor order by year) as seqnum from (select distinct categoryid, weight, factor, year from cte) cte ) cte group by categoryid, weight, factor, (year - seqnum) order by categoryid, min(year) option (maxrecursion 0);
Here is a db<>fiddle.
I notice a few things about your data.
- You are using
float
for some of the values. This is very dangerous, because two values might look the same but really differ by a very small amount. Use thedecimal
/numeric
type instead, so what-you-see-is-what-you-get. - The
year
values are strings, when they should be integers. Use the proper data types!