Skip to content
Advertisement

Compare last row to previous row by group and populate new column

I need to compare the last row of a group to the row above it, see if changes occur in a few columns, and populate a new column with 1 if a change occurs. The data presentation below will explain better.

Also need to account for having a group with only 1 row.

what we have:

 Group    Name     Sport     DogName   Eligibility 
   1      Tom      BBALL     Toto        Yes
   1      Tom      BBall     Toto        Yes
   1      Tom      golf      spot        Yes
   2     Nancy     vllyball  Jimmy       yes
   2     Nancy     vllyball  rover       no
  

what we want:

 Group    Name     Sport     DogName   Eligibility N_change  S_change  D_Change  E_change
     1      Tom      BBALL     Toto        Yes      0           0        0       0
     1      Tom      BBall     Toto        Yes      0           0        0       0
     1      Tom      golf      spot        Yes      0           1        1       0
     2     Nancy     vllyball  Jimmy       yes      0           0        0       0
     2     Nancy     vllyball  rover       no       0           0        1       1

Only care about changes from row to row within group. Thank you for any help in advance.

The rows are already ordered so we only need to last two of the group. If it is easier to compare sequential rows in a group then that is just as good for my purposes.

I did know this would be arrays and I struggle with these because never use them for my typical sas modeling. Wanted to keep things short and sweet.

Advertisement

Answer

Use the data step and lag statements. Ensure your data is sorted by group first, and that the rows within groups are sorted in the correct order. Using arrays will make your code much smaller.

The logic below will compare each row with the previous row. A flag of 1 will be set only if:

  1. It’s not the first row of the group
  2. The current value differs from the previous value.

The syntax var = (test logic); is a shortcut to automatically generate dummy flags.

data want;
    set have;
    by group;

    array var[*]        name sport dogname eligibility;
    array lagvar[*] $   lag_name lag_sport lag_dogname lag_eligibility;
    array changeflag[*] N_change S_change D_change E_change;

    do i = 1 to dim(var);
        lagvar[i]     = lag(var[i]);
        changeflag[i] = (var[i] NE lagvar[i] AND NOT first.group);
    end;
  
   drop lag: i;
run;
User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement