Skip to content
Advertisement

Arranging data by year and proportion

I have one table with the following format:

+---------+------------+------------+------------+----------+--+--+
| ENTITY  | SUB_ENTITY | PROPORTION | START_DATE | END_DATE |  |  |
+---------+------------+------------+------------+----------+--+--+
| 1000160 |     855960 | 0.6        |   19621001 | 20080702 |  |  |
| 1000203 |     054453 | 0.07       |   19720101 | 20170503 |  |  |
| 1000203 |     739846 | 0.07       |   19720101 | 20170503 |  |  |
| 1000203 |     283733 | 0.07       |   19720101 | 20170503 |  |  |
| 1000203 |     547953 | 3.33       |   19720101 | 20170503 |  |  |
| 1000203 |     984244 | 3.33       |   19720101 | NULL     |  |  |
| 1000233 |     857387 | 0.02       |   19541118 | NULL     |  |  |
| 1000233 |     862361 | 0.02       |   19541118 | NUILL    |  |  |
| 1000233 |     973876 | 0.02       |   19541118 | 20120321 |  |  |
| 1000233 |     017440 | NULL       |   19541118 | 20110714 |  |  |
| 1000233 |     575824 | NULL       |   19541118 | 20071127 |  |  |
| 1000241 |     006706 | 26         |   20030601 | 20130628 |  |  |
| 1000241 |     985828 | 27.2       |   20030601 | 20130628 |  |  |
| 1000241 |     060678 | 46.21      |   20030601 | 20180914 |  |  |
| 1000241 |     319255 | NULL       |   20030608 | 20130628 |  |  |
| 1000267 |     206424 | 5.73       |   19580901 | 20120530 |  |  |
| 1000267 |     599785 | 6.15       |   19580901 | 20120530 |  |  |
| 1000267 |     709129 | 6.7        |   19580901 | 20120530 |  |  |
| 1000267 |     805343 | 35.75      |   19580901 | 20120530 |  |  |
+---------+------------+------------+------------+----------+--+--+

Note 1: [End_Date = NULL] means that participation has not ended. [Proportion = NULL] means that participation is 0.

Output: I want to have the proportion of each sub_entity for every active year.

Note 2: One Sub_Entity can be a proportion of different Entities.

The final table will be something like:

+---------+------------+------------------+------------+
| Entity  | Sub_Entity |       Year       | Proportion |
+---------+------------+------------------+------------+
| 1000160 |     855960 | 1962             | 0.6        |
| 1000160 |     855960 | .                | 0.6        |
| 1000160 |     855960 | .                | 0.6        |
| 1000160 |     855960 | 2008             | 0.6        |
| 1000203 |     054453 | 1972             | 0.07       |
| 1000203 |     054453 | .                | 0.07       |
| 1000203 |     054453 | .                | 0.07       |
| 1000203 |     054453 | 2017             | 0.07       |
| 1000203 |     739846 | 1972             | 0.07       |
| 1000203 |     739846 | .                | 0.07       |
| 1000203 |     739846 | .                | 0.07       |
| 1000203 |     739846 | 2017             | 0.07       |
| 1000203 |     547953 | 1972             | 3.33       |
| 1000203 |     547953 | .                | 3.33       |
| 1000203 |     547953 | .                | 3.33       |
| 1000203 |     547953 | 2017             | 3.33       |
| 1000203 |     984244 | 1972             | 3.33       |
| 1000203 |     984244 | .                | 3.33       |
| 1000203 |     984244 | .                | 3.33       |
| 1000203 |     984244 | (This Year 2019) | 3.33       |
|         |            |                  |            |
+---------+------------+------------------+------------+

Note 3: I used points to denote years in between that range.

Advertisement

Answer

Try this Regressionist:

select 1000160 as ENTITY,    855960 as SUB_ENTITY, 0.6    as PROPORTION,  
19621001 as START_DATE, 20080702 as END_DATE 
into #tmp
union select 1000203, 054453, 0.07      ,   19720101, 20170503
union select 1000203, 739846, 0.07      ,   19720101, 20170503 
union select 1000203, 283733, 0.07      ,   19720101, 20170503 
union select 1000203, 547953, 3.33      ,   19720101, 20170503 
union select 1000203, 984244, 3.33      ,   19720101, NULL     
union select 1000233, 857387, 0.02      ,   19541118, NULL     
union select 1000233, 862361, 0.02      ,   19541118, NULL    
union select 1000233, 973876, 0.02      ,   19541118, 20120321 
union select 1000233, 017440, NULL      ,   19541118, 20110714 
union select 1000233, 575824, NULL      ,   19541118, 20071127 
union select 1000241, 006706, 26        ,   20030601, 20130628 
union select 1000241, 985828, 27.2      ,   20030601, 20130628 
union select 1000241, 060678, 46.21     ,   20030601, 20180914 
union select 1000241, 319255, NULL      ,   20030608, 20130628 
union select 1000267, 206424, 5.73      ,   19580901, 20120530 
union select 1000267, 599785, 6.15      ,   19580901, 20120530 
union select 1000267, 709129, 6.7       ,   19580901, 20120530 
union select 1000267, 805343, 35.75     ,   19580901, 20120530

select ENTITY, SUB_ENTITY , left(START_DATE,4) + n,  PROPORTION  from #tmp cross join 
(
    SELECT TOP (100) n = CONVERT(INT, ROW_NUMBER() OVER (ORDER BY s1.[object_id]))-1
    FROM sys.all_objects AS s1 CROSS JOIN sys.all_objects AS s2 
)a
where left(START_DATE,4)  + n <= left(isnull(END_DATE, year(getdate())),4) --and ENTITY =  1000203 and SUB_ENTITY = 984244     
order by 2

Explanation:

This portion:
(
    SELECT TOP (100) n = CONVERT(INT, ROW_NUMBER() OVER (ORDER BY s1.[object_id]))-1
    FROM sys.all_objects AS s1 CROSS JOIN sys.all_objects AS s2 
)a

gives me number between 0 and 99. With the cross join I add those number to every year in the START_DATE as long as START_DATE + (0..99) doesnt exceed END_DATE or the current year is END_DATE is null.

Hope its a little clearier!

User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement