Is it right situation to do left outer join?

I have two datatset, A and B.(A is a big dataset like personal info and B is like a small summarized demographic statistic data) I want to append summary info to A data with demographic info.

As one single row of B can matches with multiple row of A, is it right to send a hive ql like A left outer join B?

For example, A is

|---------|---------|
|age_range| region  |
|---------|---------|
|14-20    |    1    |
|---------|---------|
|14-20    |    1    |
|---------|---------|
|20-25    |    1    |
|---------|---------|
|20-25    |    2    |
|---------|---------|

​x
 
|---------|---------||age_range| region  ||---------|---------||14-20    |    1    ||---------|---------||14-20    |    1    ||---------|---------||20-25    |    1    ||---------|---------||20-25    |    2    ||---------|---------|​

and B is

|---------|--------|--------|
|age_range| region | ratio  |
|---------|--------|--------|
|14-20    |   1    |   0.1  |
|---------|--------|--------|
|20-25    |   1    |   0.2  |
|---------|--------|--------|

 
|---------|--------|--------||age_range| region | ratio  ||---------|--------|--------||14-20    |   1    |   0.1  ||---------|--------|--------||20-25    |   1    |   0.2  ||---------|--------|--------|​

Answer

From what you describe a join is appropriate. If all rows in a have a matching row in b, then an inner join is sufficient:

select . . .
from a join
     b
     on <matching columns>;

 
select . . .from a join     b     on <matching columns>;​

You would use left join when a has rows that do not match in b and you want them in the result set.

Advertisement

Answer