I’m trying to build a tricky join across grains, with tableA being at a lower grain than tableB. In this example, I’m trying to accomplish the following results:
tableA.id | tableA.fieldA | tableB.id |
---|---|---|
123 | 1 | 123 |
123 | 2 | null |
123 | 3 | null |
234 | 1 | 234 |
234 | 2 | null |
234 | 3 | null |
Here is my query to accomplish this:
Select
*
from
tableA
left join tableB
on tableB.id = tableA.id
and tableA.fieldA = 1
Unfortunately, the “tableA.fieldA = 1” filter is acting as a filter on the entire query, and not just on the join, resulting in the following:
tableA.id | tableA.fieldA | tableB.id |
---|---|---|
123 | 1 | 123 |
234 | 1 | 234 |
Can anyone tell me what’s going on and how to accomplish what I’m going for? Thanks!
Advertisement
Answer
I wrote the following code per your description and SQL on my Redshift and I get the answer you are looking for.
CREATE TABLE table_a (
ID int,
A int);
INSERT INTO table_a VALUES (123, 1) ;
INSERT INTO table_a VALUES (123, 2) ;
INSERT INTO table_a VALUES (123, 3) ;
INSERT INTO table_a VALUES (234, 1) ;
INSERT INTO table_a VALUES (234, 2) ;
INSERT INTO table_a VALUES (234, 3) ;
CREATE TABLE table_b (
ID int);
INSERT INTO table_b VALUES (123) ;
INSERT INTO table_b VALUES (234) ;
Select
*
from
table_A
left join table_B
on table_B.id = table_A.id
and table_A.a = 1;
Results in:
id a id
123 1 123
123 2 NULL
123 3 NULL
234 1 234
234 2 NULL
234 3 NULL
I see a few possibilities at the moment – 1) your cluster is having issues / bug in the exact version you are running OR 2) your question write up doesn’t represent the whole picture OR 3) my code doesn’t represent your situation.
Does the code I provided above recreate the issue on your cluster? If not can you provide the necessary ingredients (DDL, SQL) to recreate the issue?