Skip to content
Advertisement

Why is my join filter being applied to my entire query in redshift?

I’m trying to build a tricky join across grains, with tableA being at a lower grain than tableB. In this example, I’m trying to accomplish the following results:

tableA.id tableA.fieldA tableB.id
123 1 123
123 2 null
123 3 null
234 1 234
234 2 null
234 3 null

Here is my query to accomplish this:

Select
    *
from
    tableA

    left join tableB
    on tableB.id = tableA.id
    and tableA.fieldA = 1

Unfortunately, the “tableA.fieldA = 1” filter is acting as a filter on the entire query, and not just on the join, resulting in the following:

tableA.id tableA.fieldA tableB.id
123 1 123
234 1 234

Can anyone tell me what’s going on and how to accomplish what I’m going for? Thanks!

Advertisement

Answer

I wrote the following code per your description and SQL on my Redshift and I get the answer you are looking for.

CREATE TABLE table_a (
  ID int,
  A int);
                   
INSERT INTO table_a VALUES (123, 1)  ;  
INSERT INTO table_a VALUES (123, 2)  ;
INSERT INTO table_a VALUES (123, 3)  ;
INSERT INTO table_a VALUES (234, 1)  ;
INSERT INTO table_a VALUES (234, 2)  ;
INSERT INTO table_a VALUES (234, 3)  ;

CREATE TABLE table_b (
  ID int);
  
INSERT INTO table_b VALUES (123)  ;  
INSERT INTO table_b VALUES (234)  ;

Select
    *
from
    table_A
    left join table_B
    on table_B.id = table_A.id
    and table_A.a = 1;

Results in:

id  a   id
123 1   123
123 2   NULL
123 3   NULL
234 1   234
234 2   NULL
234 3   NULL

I see a few possibilities at the moment – 1) your cluster is having issues / bug in the exact version you are running OR 2) your question write up doesn’t represent the whole picture OR 3) my code doesn’t represent your situation.

Does the code I provided above recreate the issue on your cluster? If not can you provide the necessary ingredients (DDL, SQL) to recreate the issue?

User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement