I have table in SAS Enterprise Guide like below:
name of table: table123
date typ:
- ID – numeric
- DT – numeric
- VAL1 – string
- VAL2 – string
ID | DT | VAL1 | VAL2 |
---|---|---|---|
123 | 20210523 | PR | P |
123 | 20211222 | P | P |
222 | 20210502 | PR | M |
222 | 20210711 | M | PR |
38 | 20210513 | PR | PR |
991 | 20211123 | M | P |
And I need to create column “COL1” with values 0/1:
- If ID changed value from column “VAL1” from PR to value P or M in column “VAL2” and has not changed back to PR for 4 months or more -> 1 else 0
SO I need something like below:
ID | DT | VAL1 | VAL2 | COL1 |
---|---|---|---|---|
123 | 20210523 | PR | P | 1 |
123 | 20211222 | P | P | 1 |
222 | 20210502 | PR | M | 0 |
222 | 20210711 | M | PR | 0 |
38 | 20210513 | PR | PR | 0 |
991 | 20211123 | M | P | 0 |
Because:
- 123 – has 1 in “COL1” because changed PR to P and does not changed back to PR for 4 month or more
- 222 – has 0 in “COL1” because changed PR to M but did not last like this for at least 4 months (after 2 months backed to PR)
- 38 – has 0 in “COL1” because did not change PR to P or M
- 991 – has 0 in “COL1” because did not change PR to P or M
How can I do that in SAS Enterprise Guide in PROC SQL ?
Advertisement
Answer
You’ll want to use a data step for this one since it uses fairly complex logic. You must first convert your date into a SAS date for this to work.
data col1_flag; set have; by id dt; /* Do not reset these variables to missing at each row */ retain flag_pr_to_p_pm dt_from_pr_to_p_m; /* Reset flags and dates at each new ID */ if(first.id) then call missing(flag_pr_to_p_pm, dt_from_pr_to_p_m); /* If PR (VAL1) changes to P or M (VAL2), then set a flag. If VAL1 changes back from P or M, reset the flag to 0*/ if(VAL1 = 'PR' and VAL2 IN('P', 'M') ) then do; flag_pr_to_p_pm = 1; /* Store the number of months when it originally went from PR --> P or M */ dt_from_pr_to_p_m = dt; end; else if(VAL1 NOT IN ('P', 'M') ) then flag_pr_to_p_pm = 0; /* 1. We've encountered a situation where PR --> P or M 2. It's been 4 months since PR --> P or M 3. P or M has not changed back to PR 4. OR: There's one row that goes from PR --> P or M */ COL1 = ( flag_pr_to_p_pm AND intck('month', dt_from_pr_to_p_m, dt) > 4 AND NOT (VAL1 IN('P', 'M') AND VAL2 = 'PR') OR (flag_pr_to_p_pm AND first.id AND last.id) ) ; if(COL1) then output; format dt_from_pr_to_p_m date9.; keep id; run;
Output:
ID DT VAL1 VAL2 COL1 38 13MAY2021 PR PR 0 123 23MAY2021 PR P 1 123 22DEC2021 P P 1 222 02MAY2021 PR M 0 222 11JUL2021 M PR 0 991 23NOV2021 M P 0