I full joined 2 tables first and then full joined a 3rd table, now I got 1000 more rows in the result. But when I added a where statement following the join process I can only get 200 more rows and it seems that some expected rows were filtered. I don’t know what I’ve done wrong.
proc sql;
create table ECG.RECON as
select a.SUBJID as SUBJID_004 ,
a.VISIT as VISIT_004,
input(a.EGDAT, yymmdd10.) as EGDAT_004 ,
b.SUBJID as SUBJID_001 ,
c.DSDECOD
from
SOURCE.A a full join SOURCE.B b on
(a.SUBJID = b.SUBJID and a.VISIT = b.VISIT )
full join SOURCE.C as c on b.SUBJID = c.SUBJID
where c.EPOCH = "SCR" and c.DSDECOD ne "FAILURE" and a.TEST = "Inter";
quit;
Advertisement
Answer
Your where clause is causing empty rows to be filtered. Consider a simplified schema:
TableA
Col1 Col2
----------------
1 A
2 B
TableB
Col1 Col2
----------------
1 X
3 Y
And a simple full join with no filter:
SELECT *
FROM TableA AS A
FULL JOIN TableB AS B
ON A.Col1 = B.Col1
Which will return
A.Col1 A.Col2 B.Col1 B.Col2
---------------------------------------
1 A 1 X
2 B NULL NULL
NULL NULL 3 Y
Now, if you apply a filter to anything from A, e.g. WHERE A.Col1 = 1
, you’ll get rid of the 2nd Row (probably as intended) since 2 <> 1
, but you’ll also remove the 3rd row, since A.Col
is NULL
, and NULL <> 1
. As you have removed all rows with no matching record in TableA
you have effectively turned your full join into a left join. If you then apply a further predicate on TableB
, your left join becomes an inner join.
With Full joins, I find the easiest solution is to apply your filters before the join by using subqueries, e.g.:
SELECT *
FROM (SELECT * FROM TableA WHERE Col1 = 1) AS A
FULL JOIN TableB AS B
ON A.Col1 = B.Col1;
Which removes the 2nd row, but still retains the 3rd row from the previous results:
A.Col1 A.Col2 B.Col1 B.Col2
---------------------------------------
1 A 1 X
NULL NULL 3 Y
You can also use OR
, but the more predicates you have the more convoluted this can get, e.g.
SELECT *
FROM TableA AS A
FULL JOIN TableB AS B
ON A.Col1 = B.Col1
WHERE (Col1 = 1 OR A.Col1 IS NULL);