Skip to content
Advertisement

Automating Repeated Unions

I’m running a query like this:

This returns a random sampling. I might have 7 items from class_1 and 3 items from class_2. I would like to return exactly 5 items from each class, and the following code works:

This gets unwieldy if I want a random sampling from ten classes, instead of only three. What is the best way to do this?

(I’m using Presto/Hive, so any tips for those engines would be appreciated).

Advertisement

Answer

Use a function like row_number to do this. This makes the selection independent of the number of types.

User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement