Skip to content
Advertisement

Unexpected NULLs in outer join

I have 3 tables I’m trying to join. Let’s call them TableA, TableB, and TableC:

DECLARE @TableA TABLE
(
    Key1 int,
    PRIMARY KEY
    (
        Key1
    )
)
DECLARE @TableB TABLE
(
    Key1 int,
    Key2 int,
    PRIMARY KEY
    (
        Key1,
        Key2
    )
)
DECLARE @TableC TABLE
(
    Key3 int NOT NULL,
    Key1 int NOT NULL,
    Key2 int NULL,
    PRIMARY KEY
    (
        Key3
    )
)

Here’s some sample data:

INSERT INTO @TableA (Key1) VALUES (1);
INSERT INTO @TableB (Key1, Key2) VALUES (1, 1), (1, 2), (1, 3), (1, 4)
INSERT INTO @TableC (Key3, Key1, Key2) VALUES (1, 1, NULL), (2, 1, NULL), (3, 1, 1), (4, 1, 3)

TableB and TableC both have foreign keys to TableA via Key1. In practice, TableC can also refer to TableB via Key1 and Key2 combined, if Key2 is not null, but there’s no actual foreign key. Key3 is irrelevant, other than the fact that Key1 and Key2 are not part of the primary key of TableC.

I’m trying to write a query which combines TableB and TableC:

SELECT
    TableA.Key1 AS [A1],
    TableB.Key1 AS [B1],
    TableB.Key2 AS [B2],
    TableC.Key1 AS [C1],
    TableC.Key2 AS [C2],
    TableC.Key3 AS [C3]
FROM @TableA AS TableA
FULL OUTER JOIN @TableC AS TableC
    ON TableC.Key1 = TableA.Key1
FULL OUTER JOIN @TableB AS TableB
    ON (TableB.Key1 = TableA.Key1 AND TableC.Key1 IS NULL)
    OR (TableC.Key1 = TableB.Key1 AND TableC.Key2 = TableB.Key2)
WHERE (TableA.Key1 = TableB.Key1 OR TableA.Key1 = TableC.Key1)
ORDER BY TableB.Key2, TableC.Key2

My expectation is that TableB and TableC should both include all their rows, matching up the ones that match on both keys, and NULLS where they don’t match.

I expect to get this:

A1       B1       B2       C1       C2      C3
1        NULL     NULL     1        NULL    1
1        NULL     NULL     1        NULL    2
1        1        1        1        1       3
1        1        2        NULL     NULL    NULL -- THIS ROW IS MISSING
1        1        3        1        3       4
1        1        4        NULL     NULL    NULL -- THIS ROW IS MISSING

But instead I get this:

A1       B1       B2       C1       C2      C3
1        NULL     NULL     1        NULL    1
1        NULL     NULL     1        NULL    2
1        1        1        1        1       3
1        1        3        1        3       4

If I comment out the WHERE clause, I get all the rows I expect, except A1 is NULL for the missing rows:

A1       B1       B2       C1       C2      C3
1        NULL     NULL     1        NULL    1
1        NULL     NULL     1        NULL    2
1        1        1        1        1       3
NULL     1        2        NULL     NULL    NULL   -- A1 should be 1
1        1        3        1        3       4
NULL     1        4        NULL     NULL    NULL   -- A1 should be 1

Why is TableA.Key1 coming back NULL and causing it to exclude rows where TableB.Key2 is missing?

EDIT:

Here’s the final fixed query after I learned what I was doing wrong:

SELECT
    TableA.Key1 AS A1,
    Subquery.*
FROM @TableA AS TableA
INNER JOIN
(
    SELECT
        TableB.Key1 AS [B1],
        TableB.Key2 AS [B2],
        TableC.Key1 AS [C1],
        TableC.Key2 AS [C2],
        TableC.Key3 AS [C3]
    FROM @TableC AS TableC
    FULL OUTER JOIN @TableB AS TableB
        ON TableB.Key1 = TableC.Key1 AND TableB.Key2 = TableC.Key2
) AS Subquery 
    ON Subquery.B1 = TableA.Key1 OR Subquery.C1 = TableA.Key1
ORDER BY Subquery.B2, Subquery.C2

Advertisement

Answer

Why is TableA.Key1 coming back NULL and causing it to exclude rows where TableB.Key2 is missing?

A full outer join is the same as an INNER JOIN but any unmatched rows from either side are added back in with NULL for the columns from the other side.

Your query does full outer join of A and C first so start by looking at the result of that.

SELECT
    TableA.Key1 AS [A1],
    TableC.Key1 AS [C1],
    TableC.Key2 AS [C2],
    TableC.Key3 AS [C3]
FROM @TableA AS TableA
FULL OUTER JOIN @TableC AS TableC
    ON TableC.Key1 = TableA.Key1

This returns the following virtual table (VT1) going on to the next stage. As this is the same result as INNER JOIN I doubt it needs any explanation. Every row in @TableC successfully matched the single row in @TableA.

+----+----+------+----+
| A1 | C1 |  C2  | C3 |
+----+----+------+----+
|  1 |  1 | NULL |  1 |
|  1 |  1 | NULL |  2 |
|  1 |  1 | 1    |  3 |
|  1 |  1 | 3    |  4 |
+----+----+------+----+

This is then full outer joined onto B. The contents of B are

+------+------+
| Key1 | Key2 |
+------+------+
|    1 |    1 |
|    1 |    2 |
|    1 |    3 |
|    1 |    4 |
+------+------+

The INNER JOIN of those two result sets with predicate ON (TableB.Key1 = [A1] AND [C1] IS NULL) OR ([C1] = TableB.Key1 AND [C2] = TableB.Key2) returns only 2 rows.

+----+----+----+----+----+----+
| A1 | B1 | B2 | C1 | C2 | C3 |
+----+----+----+----+----+----+
|  1 |  1 |  1 |  1 |  1 |  3 |
|  1 |  1 |  3 |  1 |  3 |  4 |
+----+----+----+----+----+----+

Unmatched rows from VT1 are added back in as per LEFT JOIN (these are the ones where C3 is 1 or 2)

+----+------+------+----+------+----+
| A1 |  B1  |  B2  | C1 |  C2  | C3 |
+----+------+------+----+------+----+
|  1 | NULL | NULL |  1 | NULL |  1 |
|  1 | NULL | NULL |  1 | NULL |  2 |
|  1 | 1    | 1    |  1 | 1    |  3 |
|  1 | 1    | 3    |  1 | 3    |  4 |
+----+------+------+----+------+----+

and unmatched rows from B as per RIGHT JOIN (these were the ones where B2 is 2 or 4)

giving you the final result

+------+------+------+------+------+------+
|  A1  |  B1  |  B2  |  C1  |  C2  |  C3  |
+------+------+------+------+------+------+
| 1    | NULL | NULL | 1    | NULL | 1    |
| 1    | NULL | NULL | 1    | NULL | 2    |
| 1    | 1    | 1    | 1    | 1    | 3    |
| 1    | 1    | 3    | 1    | 3    | 4    |
| NULL | 1    | 2    | NULL | NULL | NULL |
| NULL | 1    | 4    | NULL | NULL | NULL |
+------+------+------+------+------+------+
User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement