I’m trying to build a tricky join across grains, with tableA being at a lower grain than tableB. In this example, I’m trying to accomplish the following results:
tableA.id | tableA.fieldA | tableB.id |
---|---|---|
123 | 1 | 123 |
123 | 2 | null |
123 | 3 | null |
234 | 1 | 234 |
234 | 2 | null |
234 | 3 | null |
Here is my query to accomplish this:
Select * from tableA left join tableB on tableB.id = tableA.id and tableA.fieldA = 1
Unfortunately, the “tableA.fieldA = 1” filter is acting as a filter on the entire query, and not just on the join, resulting in the following:
tableA.id | tableA.fieldA | tableB.id |
---|---|---|
123 | 1 | 123 |
234 | 1 | 234 |
Can anyone tell me what’s going on and how to accomplish what I’m going for? Thanks!
Advertisement
Answer
I wrote the following code per your description and SQL on my Redshift and I get the answer you are looking for.
CREATE TABLE table_a ( ID int, A int); INSERT INTO table_a VALUES (123, 1) ; INSERT INTO table_a VALUES (123, 2) ; INSERT INTO table_a VALUES (123, 3) ; INSERT INTO table_a VALUES (234, 1) ; INSERT INTO table_a VALUES (234, 2) ; INSERT INTO table_a VALUES (234, 3) ; CREATE TABLE table_b ( ID int); INSERT INTO table_b VALUES (123) ; INSERT INTO table_b VALUES (234) ; Select * from table_A left join table_B on table_B.id = table_A.id and table_A.a = 1;
Results in:
id a id 123 1 123 123 2 NULL 123 3 NULL 234 1 234 234 2 NULL 234 3 NULL
I see a few possibilities at the moment – 1) your cluster is having issues / bug in the exact version you are running OR 2) your question write up doesn’t represent the whole picture OR 3) my code doesn’t represent your situation.
Does the code I provided above recreate the issue on your cluster? If not can you provide the necessary ingredients (DDL, SQL) to recreate the issue?