Skip to content
Advertisement

Is it right situation to do left outer join?

I have two datatset, A and B.(A is a big dataset like personal info and B is like a small summarized demographic statistic data) I want to append summary info to A data with demographic info.

As one single row of B can matches with multiple row of A, is it right to send a hive ql like A left outer join B?

For example, A is

|---------|---------|
|age_range| region  |
|---------|---------|
|14-20    |    1    |
|---------|---------|
|14-20    |    1    |
|---------|---------|
|20-25    |    1    |
|---------|---------|
|20-25    |    2    |
|---------|---------|

and B is

|---------|--------|--------|
|age_range| region | ratio  |
|---------|--------|--------|
|14-20    |   1    |   0.1  |
|---------|--------|--------|
|20-25    |   1    |   0.2  |
|---------|--------|--------|

Advertisement

Answer

From what you describe a join is appropriate. If all rows in a have a matching row in b, then an inner join is sufficient:

select . . .
from a join
     b
     on <matching columns>;

You would use left join when a has rows that do not match in b and you want them in the result set.

User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement