Skip to content
Advertisement

SQL/Presto: right join bigger than original table due to NULL

I need to right join 2 tables with 3 conditions but the resulting table is bigger than left or right table.

left_table a is like the following:

 capacity value  group_id   level_id    tags
 100       3      a            ab        
 120       5      a            afb       lala
 122       4      b            afg       hhh
 122       6      c            adfg      

right table b is like the following: bigger than left table

 user      group_id    level_id    tags 
 adsf      a           ab          
 af        a           abf         df
 sf        a           afb         lala
 dsf       b           afg         hhh
 sdf       c           adfg        

I want to append the value and capacity value to the right table b. I have used the following query but the resulting table is larger than the right table. I noticed that it is due to the NULL in tags in both the right and left tables, but i am wondering how to resolve this issue.

  select a.capacity, a.value, b.*
   from a
   right join b
   on a.group_id = b._group_id
   and a.level_id = b.level_id
   and a.tags = b.tags

Advertisement

Answer

I noticed that it is due to the NULL in tags in both the right and left tables

No, this is not the cause of duplicates. In fact NULL values fail the comparison, so you will not get a match at all if either value is NULL. That is, the row in b will be returned with NULL values for the columns from a.

If you want NULL values to match as being equal, then you need a NULL-safe comparison — and Presto supports the SQL Standard is not distinct from. I also strongly prefer left join over right join:

select a.capacity, a.value, b.*
from b left join
     a
     on a.group_id = b._group_id and
        a.level_id = b.level_id and
        a.tags is not distinct from b.tags;

If you are getting duplicates, it is because you have duplicates in a. You can check for this using:

select group_id, level_id, tags, count(*)
from a
group by group_id, level_id, tags
having count(*) >= 2;
User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement