I have an input table like this:
date   ID  Flag
01.01  A   0
01.02  A   0
01.03  A   1
01.04  A   1
01.05  A   1
01.06  A   0
01.07  A   1
01.08  A   1
01.09  A   0
01.01  B   1
01.02  B   0
01.03  B   1
01.04  B   1
01.05  B   1
01.06  B   1
01.07  B   1
01.08  B   0
01.09  B   0
I would like to calculate the cumulative sum of the flag for each ID, but the running total should start over whenever the flag is 0 again.
So, the output table should be:
date   ID  Flag  CUMSUM
01.01  A   0     0
01.02  A   0     0
01.03  A   1     1
01.04  A   1     2
01.05  A   1     3
01.06  A   0     0
01.07  A   1     1
01.08  A   1     2
01.09  A   0     0
01.01  B   1     1
01.02  B   0     0
01.03  B   1     1
01.04  B   1     2
01.05  B   1     3
01.06  B   1     4
01.07  B   1     5
01.08  B   0     0
01.09  B   0     0
Any suggestions for how to solve this? I tried this:
sum(FLAG) over (partition by ID order by date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
But I have no idea how to “refresh” the cumulative calculation. Any ideas?
Thanks in advance!
Answer
You need to assign a grouping. The grouping is the number of 0s seen so far for that id, up to and including the current row. Then use that for partitioning:
select t.*,
       (case when flag = 1
             then sum(flag) over (partition by id, grp order by date)
             else 0
        end) as cumulative_sum
from (select t.*,
             sum(case when flag = 0 then 1 else 0 end) over (partition by id order by date) as grp
      from t
     ) t;
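To see why the grouping works, here is a minimal Python sketch that mirrors the two steps of the query for ID A from the question. It is only an illustration of the logic, not how any database executes it; the function name `reset_cumsum` and the tuple layout are made up for this example.

```python
# Sample rows for ID A from the question: (date, id, flag)
rows = [
    ("01.01", "A", 0), ("01.02", "A", 0), ("01.03", "A", 1),
    ("01.04", "A", 1), ("01.05", "A", 1), ("01.06", "A", 0),
    ("01.07", "A", 1), ("01.08", "A", 1), ("01.09", "A", 0),
]

def reset_cumsum(rows):
    """Return (date, id, flag, grp, cumsum) tuples (hypothetical helper)."""
    out = []
    zeros_seen = {}  # per id: running count of 0 flags, i.e. grp
    running = {}     # per (id, grp): running sum of flag
    # Process rows in the same order the window uses: by id, then by date.
    for date, id_, flag in sorted(rows, key=lambda r: (r[1], r[0])):
        # Step 1 (subquery): grp = number of 0s up to and including this row.
        grp = zeros_seen.get(id_, 0) + (1 if flag == 0 else 0)
        zeros_seen[id_] = grp
        # Step 2 (outer query): running sum of flag within (id, grp).
        key = (id_, grp)
        running[key] = running.get(key, 0) + flag
        cumsum = running[key] if flag == 1 else 0
        out.append((date, id_, flag, grp, cumsum))
    return out

result = reset_cumsum(rows)
for row in result:
    print(row)
```

Every 0 opens a new group, so the rows with flag 1 that follow it share its group number and their running sum starts again from 1.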
Note: The window frame specification (rows between . . .) is not needed, assuming that the dates are unique within each id (as in your sample data), because the default frame then gives the same result. So, I just left it out.
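The uniqueness assumption matters because, when `order by` is present and no frame is given, the default frame is `range between unbounded preceding and current row`, which includes all rows tied with the current one. The sketch below shows the difference on a tiny table with a duplicated date. It uses Python's built-in `sqlite3` purely as a convenient stand-in (the question does not say which database is in use, and window functions need SQLite 3.25 or later); the table `demo` and column `d` are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (d TEXT, flag INTEGER)")
# Two rows share the date 01.01, so they are peers in ORDER BY d.
conn.executemany(
    "INSERT INTO demo VALUES (?, ?)",
    [("01.01", 1), ("01.01", 1), ("01.02", 1)],
)

# No frame given: the default RANGE frame sums all peers at once.
default_frame = [r[1] for r in conn.execute(
    "SELECT d, SUM(flag) OVER (ORDER BY d) AS s FROM demo ORDER BY d, s"
)]

# Explicit ROWS frame: strictly one row at a time.
rows_frame = [r[1] for r in conn.execute(
    "SELECT d, SUM(flag) OVER (ORDER BY d ROWS BETWEEN UNBOUNDED PRECEDING "
    "AND CURRENT ROW) AS s FROM demo ORDER BY d, s"
)]

print(default_frame)  # tied rows get the same running total
print(rows_frame)     # running total advances row by row
```

With unique dates per id there are no ties, so both forms return identical running totals.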
Also, since flag is only ever 0 or 1, you can simplify the calculation in the subquery to:
sum(1 - flag) over (partition by id order by date) as grp
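As a quick end-to-end check, the following sketch loads the sample data from the question into an in-memory SQLite database and runs the query with `sum(1 - flag)` as the grouping expression, which assumes flag only takes the values 0 and 1. SQLite (3.25 or later, via Python's `sqlite3` module) is an assumption made for the sake of a runnable example; the table name `t` follows the answer.

```python
import sqlite3

dates = [f"01.0{i}" for i in range(1, 10)]  # 01.01 .. 01.09
flags = {
    "A": [0, 0, 1, 1, 1, 0, 1, 1, 0],
    "B": [1, 0, 1, 1, 1, 1, 1, 0, 0],
}
# CUMSUM column from the desired output in the question.
expected = {
    "A": [0, 0, 1, 2, 3, 0, 1, 2, 0],
    "B": [1, 0, 1, 2, 3, 4, 5, 0, 0],
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (date TEXT, id TEXT, flag INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(d, id_, f) for id_, fs in flags.items() for d, f in zip(dates, fs)],
)

query = """
SELECT t.id, t.date, t.flag,
       CASE WHEN flag = 1
            THEN SUM(flag) OVER (PARTITION BY id, grp ORDER BY date)
            ELSE 0
       END AS cumulative_sum
FROM (SELECT t.*,
             SUM(1 - flag) OVER (PARTITION BY id ORDER BY date) AS grp
      FROM t
     ) t
ORDER BY t.id, t.date
"""

# Collect the computed cumulative sums per id, in date order.
actual = {"A": [], "B": []}
for id_, date, flag, cumulative_sum in conn.execute(query):
    actual[id_].append(cumulative_sum)

print(actual)
print(actual == expected)
```

The comparison against the CUMSUM column from the question confirms that the simplified grouping produces the same result as the `case` version for 0/1 flags.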