I need to right join 2 tables with 3 conditions but the resulting table is bigger than left or right table.
left_table a is like the following:
capacity value group_id level_id tags 100 3 a ab 120 5 a afb lala 122 4 b afg hhh 122 6 c adfg
right table b is like the following: bigger than left table
user group_id level_id tags adsf a ab af a abf df sf a afb lala dsf b afg hhh sdf c adfg
I want to append the value and capacity value to the right table b. I have used the following query but the resulting table is larger than the right table. I noticed that it is due to the NULL in tags in both the right and left tables, but i am wondering how to resolve this issue.
select a.capacity, a.value, b.* from a right join b on a.group_id = b._group_id and a.level_id = b.level_id and a.tags = b.tags
Advertisement
Answer
I noticed that it is due to the NULL in tags in both the right and left tables
No, this is not the cause of duplicates. In fact NULL
values fail the comparison, so you will not get a match at all if either value is NULL
. That is, the row in b
will be returned with NULL
values for the columns from a
.
If you want NULL
values to match as being equal, then you need a NULL
-safe comparison — and Presto supports the SQL Standard is not distinct from
. I also strongly prefer left join
over right join
:
select a.capacity, a.value, b.* from b left join a on a.group_id = b._group_id and a.level_id = b.level_id and a.tags is not distinct from b.tags;
If you are getting duplicates, it is because you have duplicates in a
. You can check for this using:
select group_id, level_id, tags, count(*) from a group by group_id, level_id, tags having count(*) >= 2;