Given the following data sample, I’m trying to perform a running sum() isolated to date, account_no, service. Hoping someone here can help. I started with this code: The above worked as expected but then I wanted to see if I could add in other data features such as account_name and service such that I c…

Redshift Running SUM challenge

Given the following data sample, I’m trying to perform a running sum() isolated to date, account_no, service. Hoping someone here can help.

snapshot_dt             | account_no | account_name | service | cost_one
2022-02-22 00:00:00.000 | 00001      | name_1       | svc_1   | 0.10
2022-02-22 00:00:00.000 | 00001      | name_1       | svc_2   | 0.10
2022-02-22 00:00:00.000 | 00001      | name_1       | svc_3   | 0.10
2022-02-22 00:00:00.000 | 00002      | name_2       | svc_1   | 0.10
2022-02-22 00:00:00.000 | 00002      | name_2       | svc_2   | 0.10
2022-02-22 00:00:00.000 | 00002      | name_2       | svc_3   | 0.10
2022-02-22 00:00:00.000 | 00003      | name_3       | svc_1   | 0.10
2022-02-22 00:00:00.000 | 00003      | name_3       | svc_2   | 0.10
2022-02-22 00:00:00.000 | 00003      | name_3       | svc_3   | 0.10
2022-02-23 00:00:00.000 | 00001      | name_1       | svc_1   | 0.10
2022-02-23 00:00:00.000 | 00001      | name_1       | svc_2   | 0.10
2022-02-23 00:00:00.000 | 00001      | name_1       | svc_3   | 0.10
2022-02-23 00:00:00.000 | 00002      | name_2       | svc_1   | 0.10
2022-02-23 00:00:00.000 | 00002      | name_2       | svc_2   | 0.10
2022-02-23 00:00:00.000 | 00002      | name_2       | svc_3   | 0.10
2022-02-23 00:00:00.000 | 00003      | name_3       | svc_1   | 0.10
2022-02-23 00:00:00.000 | 00003      | name_3       | svc_2   | 0.10
2022-02-23 00:00:00.000 | 00003      | name_3       | svc_3   | 0.10
...

I started with this code:

with _data as (
    select
        snapshot_dt
        , sum(cost_1) as total_cost_1
    from my_table 
    group by snapshot_dt
)
select 
    snapshot_dt
    , sum(total_cost_1) over (order by snapshot_dt rows unbounded preceding) as running_cost_1
from _data

The above worked as expected but then I wanted to see if I could add in other data features such as account_name and service such that I could get specific running costs by snapshot_dt, account_name, or service.

I tried adding these data features in with some grouping statements and ultimately what appeared to happen was a running total based on prior row, I suspect related to rows unbounded preceding.

Answer

Your original code with all the places you need to include account_name and service in between double-stars (**). Take out the double stars and it be fine:

with _data as (
    select
    snapshot_dt, sum(cost_1) as total_cost_1
    from my_table 
    group by **account_name, service,** snapshot_dt
)
select **account_name, service,** snapshot_dt
, sum(total_cost_1) over (Partition By **account_name, service,** order by snapshot_dt rows unbounded preceding) as running_cost_1
from _data
** Order By account_name, service, snapshot_dt **

Advertisement

Answer