I have some monthly data as below:
Month      | Category      | Monthly Value
2020-07-01 | Food          | 1
2020-07-01 | Entertainment | 4
2020-08-01 | Entertainment | 2
2020-09-01 | Entertainment | 1
I want to calculate the cumulative sum for each Category and get the result as below:
Month | Category | Cumulative Sum
2020-07-01 | Food | 1
2020-08-01 | Food | 1
2020-09-01 | Food | 1
2020-07-01 | Entertainment | 4
2020-08-01 | Entertainment | 6
2020-09-01 | Entertainment | 7
I’m writing the window sum query as below:
SELECT
    month
  , category
  , sum("monthly value") OVER (PARTITION BY "category" ORDER BY "month" ASC ROWS UNBOUNDED PRECEDING) AS "Cumulative Sum"
from (
    select date_trunc('month', daily_date) as month, category, sum(daily_value) as "monthly value"
    from sample_table
    -- no alias is allowed inside GROUP BY, and the derived table needs a name
    group by date_trunc('month', daily_date), category
) as monthly
But I’m getting the following:
Month | Category | Cumulative Sum
2020-07-01 | Food | 1
2020-07-01 | Entertainment | 4
2020-08-01 | Entertainment | 6
2020-09-01 | Entertainment | 7
Why is the “Food” category’s cumulative sum not showing up for the months of 2020-08-01 and 2020-09-01? How can I make the result display as expected (shown in the 2nd table)?
I’m using Redshift btw. Thanks!
Answer
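A window function only computes over rows that already exist, so it cannot produce Food rows for months in which Food has no data. Use a cross join to generate every month/category combination and then a left join to bring in the values: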
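-- t stands for the monthly aggregate (the subquery from the question)
select m.month, c.category, t.monthly_value,
       -- Redshift requires an explicit frame clause when an aggregate window function uses ORDER BY
       sum(t.monthly_value) over (partition by c.category order by m.month rows unbounded preceding) as running_monthly_value
from (select distinct month from t) m cross join
     (select distinct category from t) c left join
     t
     on t.month = m.month and t.category = c.category;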
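
To check the cross join / left join approach against the sample data from the question, here is a minimal sketch that runs the same query shape in an in-memory SQLite database from Python. SQLite is only a stand-in for Redshift here (it needs SQLite 3.25 or newer for window functions), and the table name monthly and the column monthly_value are hypothetical names chosen for the illustration. The final ORDER BY is added only to list Food first, as in the expected table.

```python
import sqlite3

# In-memory SQLite as a stand-in for Redshift (an assumption for illustration).
# The table "monthly" plays the role of the monthly aggregate; its name and
# column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE monthly (month TEXT, category TEXT, monthly_value INTEGER)"
)
conn.executemany(
    "INSERT INTO monthly VALUES (?, ?, ?)",
    [
        ("2020-07-01", "Food", 1),
        ("2020-07-01", "Entertainment", 4),
        ("2020-08-01", "Entertainment", 2),
        ("2020-09-01", "Entertainment", 1),
    ],
)

query = """
SELECT m.month,
       c.category,
       SUM(t.monthly_value) OVER (
           PARTITION BY c.category
           ORDER BY m.month
           ROWS UNBOUNDED PRECEDING
       ) AS cumulative_sum
FROM (SELECT DISTINCT month FROM monthly) m
CROSS JOIN (SELECT DISTINCT category FROM monthly) c
LEFT JOIN monthly t
       ON t.month = m.month AND t.category = c.category
ORDER BY c.category DESC, m.month
"""

# Every month/category pair now exists; months with no data contribute NULL,
# which SUM ignores, so the running total carries forward.
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Note that SUM returns NULL rather than 0 for a category that has no data yet in its earliest months; wrapping the window sum in coalesce(..., 0) is one way to handle that if it matters for your data.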