Suppose I have a dataset like this
|-----------------|----------------| | ID | Values | |-----------------|----------------| | 123 | aaaa | |-----------------|----------------| | 234 | bbb | |-----------------|----------------| | 123 | ab3d | |-----------------|----------------| | 264 | 34g3ff | |-----------------|----------------| | 783 | 341g5h | |-----------------|----------------| | 921 | 7jdfew | |-----------------|----------------| | 264 | 53fj | |-----------------|----------------|
I would like to randomly select, say, 3 values from the distinct ID values. One possibility is to get a table like this
|-----------------|----------------| | ID | Values | |-----------------|----------------| | 123 | aaaa | |-----------------|----------------| | 123 | ab3d | |-----------------|----------------| | 783 | 341g5h | |-----------------|----------------| | 921 | 7jdfew | |-----------------|----------------|
How shall I do that in Hive?
Advertisement
Answer
Here is one option using a join and rand()
:
select id, values from mytable t inner join (select distinct id from mytable order by rand() limit 3) i on i.id = t.id
The subquery randomly selects 3 id
s, then the outer query brings all related rows.