Extrapolate running total from .LAST value of a BY…

I trying to figure out how to create a running total, if all we have is the total sum in the .LAST variable. I created a sample dataset that should make it a little clearer what I am after:

DATA SALES_DATA; 
    INPUT REGION_ID STORE_ID YEAR SALES; 
    DATALINES; 
        1   1   2000    .
        1   1   2001    .
        1   1   2002    .
        1   1   2003    40
        1   2   1977    .
        1   2   1978    .
        1   2   1979    .
        1   2   1980    .
        1   2   1981    12
        2   3   1999    .
        2   3   2000    .
        2   3   2001    .
        2   4   2002    17
        3   4   1956    .
        3   4   1957    22

​x
 
DATA SALES_DATA;     INPUT REGION_ID STORE_ID YEAR SALES;     DATALINES;         1   1   2000    .        1   1   2001    .        1   1   2002    .        1   1   2003    40        1   2   1977    .        1   2   1978    .        1   2   1979    .        1   2   1980    .        1   2   1981    12        2   3   1999    .        2   3   2000    .        2   3   2001    .        2   4   2002    17        3   4   1956    .        3   4   1957    22​

;

so, as you can see we only have data for the last time the store was in business, which includes all the sales from previous years. Assuming sales have been completely linear and added up year over year, how would I tell SAS to grab the STORE_ID.LAST value then divide it by count of years we have data for to put it into the STORE_ID.FIRST’s SALES field? Once I figure out how to get the value from the last field into the first I am planning on just running the usual running total (after dividing by the count, that can be created by something like:

DATA SALES; 
    SET SALES; 
    BY REGION_ID STORE_ID; 
    IF FIRST.STORE = 1 THEN 
        COUNT =0;
        COUNT+1; 
run;

 
DATA SALES;     SET SALES;     BY REGION_ID STORE_ID;     IF FIRST.STORE = 1 THEN         COUNT =0;        COUNT+1; run;​

So, ideally the final table would start of like:

DATA SALES_DATA; 
INPUT REGION_ID STORE_ID YEAR SALES; 
DATALINES; 
    1   1   2000    10
    1   1   2001    20
    1   1   2002    30
    1   1   2003    40

 
DATA SALES_DATA; INPUT REGION_ID STORE_ID YEAR SALES; DATALINES;     1   1   2000    10    1   1   2001    20    1   1   2002    30    1   1   2003    40​

…

I looked into PROC EXPAND, but I could not get it to work for my case. Any suggestions are greatly welcomed!

Answer

Just a little improvement of Chris J’s code, simply extrapolating in proc sql. Do not forget to give the full credit to him 🙂

proc sql;
    create table filled(rename=(sales_filled=sales)) as
    select REGION_ID, STORE_ID, YEAR, max(SALES)/(max(YEAR)-min(YEAR)+1)*(YEAR-min(YEAR)+1) as sales_filled
    from sales_data
    group by REGION_ID, STORE_ID
    order by REGION_ID, STORE_ID, Year;
quit;

 
proc sql;    create table filled(rename=(sales_filled=sales)) as    select REGION_ID, STORE_ID, YEAR, max(SALES)/(max(YEAR)-min(YEAR)+1)*(YEAR-min(YEAR)+1) as sales_filled    from sales_data    group by REGION_ID, STORE_ID    order by REGION_ID, STORE_ID, Year;quit;​

Note: this works even if the years were not consecutive (e.g. year 2001 is somehow missing) because the sales value are filled based on the “ratio of timespan” instead of “counts”.

Extrapolate running total from .LAST value of a BY GROUP

Advertisement

Answer