Skip to content
Advertisement

Keep the sum even on days without revenue in cumulative sum when using window function in Presto

So my problem is that I have sales data from, say for the sake of clarity, 3 different products. I will be selling 10 of these a week and I want to visualize them in a cumulative sum. I have been using the following little snippet to get the cumulative sum of the revenue.

SUM(revenue) OVER (PARTITION BY purchase_date, product ORDER BY day) AS cumulative_revenue

However, this is not sufficient since I have the product in the window function. It works only as long as I have sales data on every single day of the week for each product. If I sell product_1 four pieces a week, the result from the query will show cumulative revenue only from those days. So this means that if I have all sales of that product on Monday to Wednesday, the rest of the week won’t have them listed in the output. This causes problems if I try to visualize the data by stacking the results as the rest of the week will have lower cumulative revenue that the beginning of the week.

So what I want is to get it to show 0 on revenue for all products on all days of the week. I can of course do this with some cross join magic, but it is sloooow since I have quite a lot of rows, so is there a way to do it with a window function?

My data looks a bit like this

purchase_date|product|buyer|revenue
-----------------------------------
12/12/2020   | pr_1  |  a  | 100.0
12/12/2020   | pr_2  |  b  | 200.0
13/12/2020   | pr_1  |  d  | 100.0
14/12/2020   | pr_1  |  t  | 100.0
...

Advertisement

Answer

You can generate a row for all product/date combinations by using a cross join and then a left join. I suspect you want something like this:

select p.product, d.purchase_date,
       sum(t.revenue) as revenue_on_date,
       sum(sum(t.revenue)) over (partition by product order by d.purchase_date) as cumulative_revenue
from (select distinct product from t) p cross join
     (select distinct purchase_date from t) d left join
     t
     on p.product = t.product and
        d.purchase_date = t.purchase_date
group by p.product, p.purchase_date;

Note: This assumes that there is at least one purchase on each day. Otherwise, you might need another source for all dates in the range you care about.

User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement