
Hive: randomly select N values from distinct values of one column

Suppose I have a dataset like this

I would like to randomly select, say, 3 values from the distinct ID values. One possibility is to get a table like this

How shall I do that in Hive?


Answer

One option is to join the table against a subquery that uses rand(). The subquery randomly selects 3 of the distinct ids, and the outer query then returns all rows that belong to those ids.
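A minimal sketch of such a query is shown here. It assumes a table named `mytable` with a column named `id`; both names are placeholders, since the original table is not shown. The sample differs on every run because `rand()` is unseeded.

```sql
-- Hypothetical names: mytable (source table), id (column to sample from).
SELECT t.*
FROM mytable t
JOIN (
    -- Deduplicate ids first so that every id has the same chance of being
    -- picked, regardless of how many rows it has.
    SELECT d.id
    FROM (SELECT DISTINCT id FROM mytable) d
    ORDER BY rand()   -- shuffle the distinct ids
    LIMIT 3           -- keep 3 of them
) s
ON t.id = s.id;       -- bring back every row for the sampled ids
```

Sorting by `rand()` forces a global sort of the distinct ids, which is usually acceptable when the number of distinct values is modest. If there are fewer than 3 distinct ids, the query simply returns all rows.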

User contributions licensed under: CC BY-SA