Skip to content
Advertisement

Cummulative sum up to a date

I have a table with 4 columns. that looks something like this.

 +-------------------------------------------------------------+
 |         ID      |      date |  a1      |         a2         |
 +-------------------------------------------------------------+
 |1                | 31AUG2015 | 100      |        70
 +-------------------------------------------------------------+
 |1                | 01SEPT2015| 150      |        80
 +-------------------------------------------------------------+
 |2                | 31AUG2015 | 900      |        0
 +-------------------------------------------------------------+
 |2                | 01SEPT2015| 150      |        100
 +-------------------------------------------------------------+

I want to add a1 and a2 for all rows before and up to that date, so I have something more like this:

 +-------------------------------------------------------------+
 |         ID      |      date |  a1      |         a2         |
 +-------------------------------------------------------------+
 |1                | 31AUG2015 | 100      |        70
 +-------------------------------------------------------------+
 |1                | 01SEPT2015| 250      |        150
 +-------------------------------------------------------------+
 |2                | 31AUG2015 | 900      |        0
 +-------------------------------------------------------------+
 |2                | 01SEPT2015| 1050     |        100
 +-------------------------------------------------------------+

This was my attempt. A self join up to that date:

proc sql;
create table want as
select
    a.id
    ,a.date
    ,sum(a.a1)
    ,sum(a.a2)
from 
    have a,
    have b
where 
    a.id = b.id and
    a.dt <=  b.dt

group by
    a.id
    ,a.date

quit;

The results are a mess and nothing lines up with my expectations. I am sure I am making a big mistake somewhere, but I would appreciate some guidance with either fixes to the proc sql or a data step.

Advertisement

Answer

A data step will make it easy. Sort it by your id and date, then use a sum statement to accumulate your values.

proc sort data=have;
     by id date;
run;

data want;
   set have;
   by id date;

   /* Reset cumulative sum at the start of each ID */
   if(first.id) then call missing(a1_cume, a2_cume);

   a1_cume+a1;
   a2_cume+a2;
run;
User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement