Skip to content
Advertisement

Why ‘where’ statement seems to filter expected rows in SAS proc SQL?

I full joined 2 tables first and then full joined a 3rd table, now I got 1000 more rows in the result. But when I added a where statement following the join process I can only get 200 more rows and it seems that some expected rows were filtered. I don’t know what I’ve done wrong.

proc sql;
    create table ECG.RECON as 
    select a.SUBJID as SUBJID_004 , 
        a.VISIT as VISIT_004, 
        input(a.EGDAT, yymmdd10.) as EGDAT_004 , 
        ...
        b.SUBJID as SUBJID_001 , 
        ...
        c.DSDECOD 
    from 
    SOURCE.A a full join SOURCE.B b on 
            (a.SUBJID = b.SUBJID and a.VISIT = b.VISIT )
    full join SOURCE.C as c on b.SUBJID = c.SUBJID
    where c.EPOCH = "SCR" and c.DSDECOD ne "FAILURE" and a.TEST = "Inter";
quit;


Advertisement

Answer

Your where clause is causing empty rows to be filtered. Consider a simplified schema:

TableA

Col1    Col2
----------------
1        A
2        B

TableB

Col1    Col2
----------------
1        X
3        Y

And a simple full join with no filter:

SELECT  *
FROM    TableA AS A
        FULL JOIN TableB AS B
            ON A.Col1 = B.Col1

Which will return

A.Col1   A.Col2     B.Col1    B.Col2
---------------------------------------
1        A          1           X
2        B          NULL        NULL
NULL    NULL        3           Y

Now, if you apply a filter to anything from A, e.g. WHERE A.Col1 = 1, you’ll get rid of the 2nd Row (probably as intended) since 2 <> 1, but you’ll also remove the 3rd row, since A.Col is NULL, and NULL <> 1. As you have removed all rows with no matching record in TableA you have effectively turned your full join into a left join. If you then apply a further predicate on TableB, your left join becomes an inner join.

With Full joins, I find the easiest solution is to apply your filters before the join by using subqueries, e.g.:

SELECT  *
FROM    (SELECT * FROM TableA WHERE Col1 = 1) AS A
        FULL JOIN TableB AS B
            ON A.Col1 = B.Col1;

Which removes the 2nd row, but still retains the 3rd row from the previous results:

A.Col1   A.Col2     B.Col1    B.Col2
---------------------------------------
1        A          1           X
NULL    NULL        3           Y

You can also use OR, but the more predicates you have the more convoluted this can get, e.g.

SELECT  *
FROM    TableA AS A
        FULL JOIN TableB AS B
            ON A.Col1 = B.Col1
WHERE   (Col1 = 1 OR A.Col1 IS NULL);
User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement