I trying to figure out how to create a running total, if all we have is the total sum in the .LAST variable. I created a sample dataset that should make it a little clearer what I am after:
DATA SALES_DATA; INPUT REGION_ID STORE_ID YEAR SALES; DATALINES; 1 1 2000 . 1 1 2001 . 1 1 2002 . 1 1 2003 40 1 2 1977 . 1 2 1978 . 1 2 1979 . 1 2 1980 . 1 2 1981 12 2 3 1999 . 2 3 2000 . 2 3 2001 . 2 4 2002 17 3 4 1956 . 3 4 1957 22
;
so, as you can see we only have data for the last time the store was in business, which includes all the sales from previous years. Assuming sales have been completely linear and added up year over year, how would I tell SAS to grab the STORE_ID.LAST value then divide it by count of years we have data for to put it into the STORE_ID.FIRST’s SALES field? Once I figure out how to get the value from the last field into the first I am planning on just running the usual running total (after dividing by the count, that can be created by something like:
DATA SALES; SET SALES; BY REGION_ID STORE_ID; IF FIRST.STORE = 1 THEN COUNT =0; COUNT+1; run;
So, ideally the final table would start of like:
DATA SALES_DATA; INPUT REGION_ID STORE_ID YEAR SALES; DATALINES; 1 1 2000 10 1 1 2001 20 1 1 2002 30 1 1 2003 40
…
I looked into PROC EXPAND, but I could not get it to work for my case. Any suggestions are greatly welcomed!
Advertisement
Answer
Just a little improvement of Chris J’s code, simply extrapolating in proc sql. Do not forget to give the full credit to him 🙂
proc sql; create table filled(rename=(sales_filled=sales)) as select REGION_ID, STORE_ID, YEAR, max(SALES)/(max(YEAR)-min(YEAR)+1)*(YEAR-min(YEAR)+1) as sales_filled from sales_data group by REGION_ID, STORE_ID order by REGION_ID, STORE_ID, Year; quit;
Note: this works even if the years were not consecutive (e.g. year 2001 is somehow missing) because the sales value are filled based on the “ratio of timespan” instead of “counts”.