I have the following table (#CategoryWeight) where weight and factor values are stored for each category with date ranges. I need to aggregate / simplify this data, when possible, so that consecutive data ranges are combined into a wide range for the same values of weight and factor.
DROP TABLE IF EXISTS #CategoryWeight;
CREATE TABLE #CategoryWeight ( [CategoryId] bigint, [weight] float(8), [factor] float(8), [startYear] nvarchar(60), [endYear] nvarchar(60) )
INSERT INTO #CategoryWeight
VALUES
( 42, 1, 0, N'2009', N'2014' ),
( 42, 1, 0, N'2009', N'2019' ),
( 42, 1, 0, N'2015', N'2017' ),
( 42, 1, 0, N'2018', N'2019' ),
( 42, 1, 1, N'2020', N'9999' ),
( 40, 1, 0, N'2009', N'2014' ),
( 40, 1, 0, N'2009', N'2017' ),
( 40, 1, 0, N'2015', N'2017' ),
( 40, 1, 0, N'2020', N'9999' ),
( 40, 1, 1, N'2018', N'2019' ),
( 45, 1, 0, N'2009', N'2014' ),
( 45, 0, 0, N'2015', N'2017' ),
( 45, 1, 0, N'2020', N'9999' ),
( 45, 0, 1, N'2018', N'2019' );'
CategoryID weight factor startYear endYear
42 1 0 2009 2014
42 1 0 2009 2019
42 1 0 2015 2017
42 1 0 2018 2019
42 1 1 2020 9999
40 1 0 2009 2014
40 1 0 2009 2017
40 1 0 2015 2017
40 1 0 2020 9999
40 1 1 2018 2019
45 1 0 2009 2014
45 0 0 2015 2017
45 1 0 2020 9999
45 0 1 2018 2019
Expected result:
CategoryID weight factor startYear endYear
42 1 0 2009 2019
42 1 1 2020 9999
40 1 0 2009 2017
40 1 1 2018 2019
40 1 0 2020 9999
45 1 0 2009 2014
45 0 0 2015 2017
45 0 1 2018 2019
45 1 0 2020 9999
Advertisement
Answer
You have overlapping time periods. That makes any assumption about the data troublesome — because the same year could have different values on different rows (nothing in your question precludes this).
Hence, the approach that I recommend is to expand out the data and then recombine it into chunks where the values are the same. The following uses a recursive CTE to expand the data and then a gaps-and-islands trick to recombine it:
with cte as (
select categoryid, weight, factor,
convert(int, startyear) as year, convert(int, endyear) as endyear
from categoryweight
union all
select categoryid, weight, factor,
year + 1, endyear
from cte
where year < endyear
)
select categoryid, weight, factor, min(year), max(year)
from (select categoryid, weight, factor, year,
row_number() over (partition by categoryid, weight, factor order by year) as seqnum
from (select distinct categoryid, weight, factor, year from cte) cte
) cte
group by categoryid, weight, factor, (year - seqnum)
order by categoryid, min(year)
option (maxrecursion 0);
Here is a db<>fiddle.
I notice a few things about your data.
- You are using
float
for some of the values. This is very dangerous, because two values might look the same but really differ by a very small amount. Use thedecimal
/numeric
type instead, so what-you-see-is-what-you-get. - The
year
values are strings, when they should be integers. Use the proper data types!