I have a table like this
type Total
A 100
A 123
A 154
A 50
A 54
B 200
B 166
B 423
B 342
B 213
C 520
C 130
C 234
C 512
I want to select the top 3 total by groups. How can i do it?
Advertisement
Answer
In most [big data] use cases using ROW_NUMBER() is not fine as it ends up with resource exceeded error. This is because it requires all point of same group be present in same/one node which in case of data skew leads to above mentioned error in BigQuery
Option 1
One of the usual ways to address this issue is using ARRAY_AGG() function as it is in below below example
#standardSQL
SELECT type, total FROM (
SELECT type, ARRAY_AGG(total ORDER BY total DESC LIMIT 3) arr
FROM `project.dataset.table` GROUP BY type
), UNNEST(arr) total
If to run above against data example from your question
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'A' type, 100 total UNION ALL
SELECT 'A', 123 UNION ALL
SELECT 'A', 154 UNION ALL
SELECT 'A', 50 UNION ALL
SELECT 'A', 54 UNION ALL
SELECT 'B', 200 UNION ALL
SELECT 'B', 166 UNION ALL
SELECT 'B', 423 UNION ALL
SELECT 'B', 342 UNION ALL
SELECT 'B', 213 UNION ALL
SELECT 'C', 520 UNION ALL
SELECT 'C', 130 UNION ALL
SELECT 'C', 234 UNION ALL
SELECT 'C', 512
)
SELECT type, total FROM (
SELECT type, ARRAY_AGG(total ORDER BY total DESC LIMIT 3) arr
FROM `project.dataset.table` GROUP BY type
), UNNEST(arr) total
-- ORDER BY type
you will get expected result as
Row type total
1 A 154
2 A 123
3 A 100
4 B 423
5 B 342
6 B 213
7 C 520
8 C 512
9 C 234
Option 2
But there is yet another interesting option to consider for really big data – to use APPROX_TOP_SUM() function as in below example
#standardSQL
SELECT type, value AS total FROM (
SELECT type, APPROX_TOP_SUM(total, total, 3) arr
FROM `project.dataset.table` GROUP BY type
), UNNEST(arr)
obviously, with the same output as above for sample data