I full joined 2 tables first and then full joined a 3rd table, now I got 1000 more rows in the result. But when I added a where statement following the join process I can only get 200 more rows and it seems that some expected rows were filtered. I don’t know what I’ve done wrong.
proc sql; create table ECG.RECON as select a.SUBJID as SUBJID_004 , a.VISIT as VISIT_004, input(a.EGDAT, yymmdd10.) as EGDAT_004 , ... b.SUBJID as SUBJID_001 , ... c.DSDECOD from SOURCE.A a full join SOURCE.B b on (a.SUBJID = b.SUBJID and a.VISIT = b.VISIT ) full join SOURCE.C as c on b.SUBJID = c.SUBJID where c.EPOCH = "SCR" and c.DSDECOD ne "FAILURE" and a.TEST = "Inter"; quit;
Advertisement
Answer
Your where clause is causing empty rows to be filtered. Consider a simplified schema:
TableA
Col1 Col2 ---------------- 1 A 2 B
TableB
Col1 Col2 ---------------- 1 X 3 Y
And a simple full join with no filter:
SELECT * FROM TableA AS A FULL JOIN TableB AS B ON A.Col1 = B.Col1
Which will return
A.Col1 A.Col2 B.Col1 B.Col2 --------------------------------------- 1 A 1 X 2 B NULL NULL NULL NULL 3 Y
Now, if you apply a filter to anything from A, e.g. WHERE A.Col1 = 1
, you’ll get rid of the 2nd Row (probably as intended) since 2 <> 1
, but you’ll also remove the 3rd row, since A.Col
is NULL
, and NULL <> 1
. As you have removed all rows with no matching record in TableA
you have effectively turned your full join into a left join. If you then apply a further predicate on TableB
, your left join becomes an inner join.
With Full joins, I find the easiest solution is to apply your filters before the join by using subqueries, e.g.:
SELECT * FROM (SELECT * FROM TableA WHERE Col1 = 1) AS A FULL JOIN TableB AS B ON A.Col1 = B.Col1;
Which removes the 2nd row, but still retains the 3rd row from the previous results:
A.Col1 A.Col2 B.Col1 B.Col2 --------------------------------------- 1 A 1 X NULL NULL 3 Y
You can also use OR
, but the more predicates you have the more convoluted this can get, e.g.
SELECT * FROM TableA AS A FULL JOIN TableB AS B ON A.Col1 = B.Col1 WHERE (Col1 = 1 OR A.Col1 IS NULL);