I have two datatset, A and B.(A is a big dataset like personal info and B is like a small summarized demographic statistic data) I want to append summary info to A data with demographic info.
As one single row of B can matches with multiple row of A, is it right to send a hive ql like A left outer join B
?
For example, A is
|---------|---------| |age_range| region | |---------|---------| |14-20 | 1 | |---------|---------| |14-20 | 1 | |---------|---------| |20-25 | 1 | |---------|---------| |20-25 | 2 | |---------|---------|
and B is
|---------|--------|--------| |age_range| region | ratio | |---------|--------|--------| |14-20 | 1 | 0.1 | |---------|--------|--------| |20-25 | 1 | 0.2 | |---------|--------|--------|
Advertisement
Answer
From what you describe a join
is appropriate. If all rows in a
have a matching row in b
, then an inner join is sufficient:
select . . . from a join b on <matching columns>;
You would use left join
when a
has rows that do not match in b
and you want them in the result set.