I have 2 dimension tables and 1 fact table as follows:
user_dim
user_id | user_name | user_joining_date |
---|---|---|
1 | Steve | 2013-01-04 |
2 | Adam | 2012-11-01 |
3 | John | 2013-05-05 |
4 | Tony | 2012-01-01 |
5 | Dan | 2010-01-01 |
6 | Alex | 2019-01-01 |
7 | Kim | 2019-01-01 |
bundle_dim
bundle_id | bundle_name | bundle_type | bundle_cost_per_day |
---|---|---|---|
101 | movies and TV | prime | 5.5 |
102 | TV and sports | prime | 6.5 |
103 | Cooking | prime | 7 |
104 | Sports and news | prime | 5 |
105 | kids movie | extra | 2 |
106 | kids educative | extra | 3.5 |
107 | spanish news | extra | 2.5 |
108 | Spanish TV and sports | extra | 3.5 |
109 | Travel | extra | 2 |
plans_fact
user_id | bundle_id | bundle_start_date | bundle_end_date |
---|---|---|---|
1 | 101 | 2019-10-10 | 2020-10-10 |
2 | 107 | 2020-01-15 | (null) |
2 | 106 | 2020-01-15 | 2020-12-31 |
2 | 101 | 2020-01-15 | (null) |
2 | 103 | 2020-01-15 | 2020-02-15 |
1 | 101 | 2020-10-11 | (null) |
1 | 107 | 2019-10-10 | 2020-10-10 |
1 | 105 | 2019-10-10 | 2020-10-10 |
4 | 101 | 2021-01-01 | 2021-02-01 |
3 | 104 | 2020-02-17 | 2020-03-17 |
2 | 108 | 2020-01-15 | (null) |
4 | 102 | 2021-01-01 | (null) |
4 | 103 | 2021-01-01 | (null) |
4 | 108 | 2021-01-01 | (null) |
5 | 103 | 2020-01-15 | (null) |
5 | 101 | 2020-01-15 | 2020-02-15 |
6 | 101 | 2021-01-01 | 2021-01-17 |
6 | 101 | 2021-01-20 | (null) |
6 | 108 | 2021-01-01 | (null) |
7 | 104 | 2020-02-17 | (null) |
7 | 103 | 2020-01-17 | 2020-01-18 |
1 | 102 | 2020-12-11 | (null) |
2 | 106 | 2021-01-01 | (null) |
7 | 107 | 2020-01-15 | (null) |
note: NULL bundle_end_date refers to active subscription.
user active days can be calculated as: bundle_end_date - bundle_start_date
(for the given bundle)
total revenue per user could be calculated as : total no. of active days * bundle rate per day
I am looking to write a query to find revenue generated per user per year.
Here is what I have for the overall revenue per user:
select pf.user_id , sum(datediff(day, pf.bundle_start_date, coalesce(pf.bundle_end_date, getdate())) * bd.price_per_day) total_cost_per_bundle from plans_fact pf inner join bundle_dim bd on bd.bundle_id = pf.bundle_id group by pf.user_id order by pf.user_id;
Advertisement
Answer
You need a ‘year’ table to help parse out each multi-year spanning row into it’s seperate years. For each year, you need to also recalculate the start and end dates. That’s what I do in the yearParsed
cte in the code below. I hard code the years into the join statement that creates y
. You probably will do it different but however you get those values will work.
After that, pretty much sum as you did before, just adding the year column to your grouping.
Aside from that, all I did was move the null coalesce logic to the cte to make the overall logic simpler.
with yearParsed as ( select pf.*, y.year, startDt = iif(pf.bundle_start_date > y.startDt, pf.bundle_start_date, y.startDt), endDt = iif(ap.bundle_end_date < y.endDt, ap.bundle_end_date, y.endDt) from plans_fact pf cross apply (select bundle_end_date = isnull(pf.bundle_end_date, getdate())) ap join (values (2019, '2019-01-01', '2019-12-31'), (2020, '2020-01-01', '2020-12-31'), (2021, '2021-01-01', '2021-12-31') ) y (year, startDt, endDt) on pf.bundle_start_date <= y.endDt and ap.bundle_end_date >= y.startDt ) select yp.user_id, yp.year, total_cost_per_bundle = sum(datediff(day, yp.startDt, yp.endDt) * bd.bundle_cost_per_day) from yearParsed yp join bundle_dim bd on bd.bundle_id = yp.bundle_id group by yp.user_id, yp.year order by yp.user_id, yp.year;
Now, if this is common, you should probably create a base-table for your ‘year’ table. But if it’s not common, but for this report you don’t want to have to keep coming back to hard-code the year information into the y
table, you can do this:
declare @yearTable table ( year int, startDt char(10), endDt char(10) ); with y as ( select year = year(min(pf.bundle_start_date)) from @plans_fact pf union all select year + 1 from y where year < year(getdate()) ) insert @yearTable select year, startDt = convert(char(4),year) + '-01-01', endDt = convert(char(4),year) + '-12-31' from y;
and it will create the appropriate years for you. But you can see why creating a base table may be preferred if you have this or a similar need often.