Skip to content
Advertisement

Automating Repeated Unions

I’m running a query like this:

SELECT id FROM table
WHERE table.type IN (1, 2, 3)
LIMIT 15

This returns a random sampling. I might have 7 items from class_1 and 3 items from class_2. I would like to return exactly 5 items from each class, and the following code works:

SELECT id FROM (
SELECT id, type FROM table WHERE type = 1 LIMIT 5
UNION
SELECT id, type FROM table WHERE type = 2 LIMIT 5
UNION ...
ORDER BY type ASC)

This gets unwieldy if I want a random sampling from ten classes, instead of only three. What is the best way to do this?

(I’m using Presto/Hive, so any tips for those engines would be appreciated).

Advertisement

Answer

Use a function like row_number to do this. This makes the selection independent of the number of types.

SELECT id,type
FROM (SELECT id, type, row_number() over(partition by type order by id) as rnum --adjust the partition by and order by columns as needed
      FROM table
     ) T 
WHERE rnum <= 5 
User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement