I’m using Teradata SQL and I wrote the following query (pay attention at the LEFT JOIN)
SELECT key1, secondvalue, count(DISTINCT firstvalue) FROM ( SELECT t1.val AS firstvalue, t1.key1, t2.val AS secondvalue FROM table1 t1 LEFT JOIN table2 t2 ON t1.key1 = t1.key1 AND t1.key2 = t2.key2 AND t1.key3 = t2.key3 ) AS Testcase GROUP BY 1, 2
Some entries for t1.key2 und t1.key3 (of the left sided table) are NULL. When that’s the case, the rows are not showing in the result, why? Is that Teradata specific, I would expect a LEFT JOIN to show rows with NULL values.
If I for example choose to reduce the LEFT JOIN statement to
SELECT key1, secondvalue, count(DISTINCT firstvalue) FROM ( SELECT t1.val AS firstvalue, t1.key1, t2.val AS secondvalue FROM table1 t1 LEFT JOIN table2 t2 ON t1.key1 = t1.key1 ) AS Testcase GROUP BY 1, 2
More distinct values for key1 are showing up, which shouldn’t be the case, right? I’d want to see all distinct values for key1, even when then key2 and key3 are NULL. If not, I want to see the specific value from the second table.
I really don’t want to map the NULL values to another value first.
Advertisement
Answer
Some entries for t1.key2 und t1.key3 (of the left sided table) are NULL. When that’s the case, the rows are not showing in the result, why?
Likely because group by
and distinct
will group all nulls together. It is true that in SQL, null
does not necessarily equal null
, but I believe that from a distinct and group by perspective, nulls
are considered identical.
I would expect a LEFT JOIN to show rows with NULL values.
I’ve never tried a left outer join where null is on the left side. I can’t say whether the join would emit a result for such a row, but I can tell you that it wouldn’t match an equi-join because null
does not equal null
. Are you looking for a full outer join?
LEFT JOIN table2 t2 ON t1.key1 = t1.key1
This predicate will match every row where t1.key1 is not null
, which is almost a Cartesian product. That is why your count is so much larger.
I think you want
LEFT JOIN table2 t2 ON t1.key1 = t2.key1
I’d want to see all distinct values for key1, even when then key2 and key3 are NULL
Overall, I think what you want is
LEFT JOIN table2 t2 ON t1.key1 = t2.key1 AND coalesce(t1.key2, -1) = coalesce(t2.key2, -1) AND coalesce(t1.key3, -1) = coalesce(t2.key3, -1)
You want the second argument to coalesce
to be a syntactically valid value of whatever type key2
and key3
are, but be a value that is not valid in your usage of it (otherwise, we’d joining rows where we have -1
on one side and null
on the other.