I’m running a query like this:
SELECT id FROM table WHERE table.type IN (1, 2, 3) LIMIT 15
This returns a random sampling. I might have 7 items from class_1
and 3 items from class_2
. I would like to return exactly 5 items from each class, and the following code works:
SELECT id FROM ( SELECT id, type FROM table WHERE type = 1 LIMIT 5 UNION SELECT id, type FROM table WHERE type = 2 LIMIT 5 UNION ... ORDER BY type ASC)
This gets unwieldy if I want a random sampling from ten classes, instead of only three. What is the best way to do this?
(I’m using Presto/Hive, so any tips for those engines would be appreciated).
Advertisement
Answer
Use a function like row_number
to do this. This makes the selection independent of the number of types.
SELECT id,type FROM (SELECT id, type, row_number() over(partition by type order by id) as rnum --adjust the partition by and order by columns as needed FROM table ) T WHERE rnum <= 5