I have a table like this
type Total A 100 A 123 A 154 A 50 A 54 B 200 B 166 B 423 B 342 B 213 C 520 C 130 C 234 C 512
I want to select the top 3 total by groups. How can i do it?
Advertisement
Answer
In most [big data] use cases using ROW_NUMBER() is not fine as it ends up with resource exceeded error. This is because it requires all point of same group be present in same/one node which in case of data skew leads to above mentioned error in BigQuery
Option 1
One of the usual ways to address this issue is using ARRAY_AGG() function as it is in below below example
#standardSQL SELECT type, total FROM ( SELECT type, ARRAY_AGG(total ORDER BY total DESC LIMIT 3) arr FROM `project.dataset.table` GROUP BY type ), UNNEST(arr) total
If to run above against data example from your question
#standardSQL WITH `project.dataset.table` AS ( SELECT 'A' type, 100 total UNION ALL SELECT 'A', 123 UNION ALL SELECT 'A', 154 UNION ALL SELECT 'A', 50 UNION ALL SELECT 'A', 54 UNION ALL SELECT 'B', 200 UNION ALL SELECT 'B', 166 UNION ALL SELECT 'B', 423 UNION ALL SELECT 'B', 342 UNION ALL SELECT 'B', 213 UNION ALL SELECT 'C', 520 UNION ALL SELECT 'C', 130 UNION ALL SELECT 'C', 234 UNION ALL SELECT 'C', 512 ) SELECT type, total FROM ( SELECT type, ARRAY_AGG(total ORDER BY total DESC LIMIT 3) arr FROM `project.dataset.table` GROUP BY type ), UNNEST(arr) total -- ORDER BY type
you will get expected result as
Row type total 1 A 154 2 A 123 3 A 100 4 B 423 5 B 342 6 B 213 7 C 520 8 C 512 9 C 234
Option 2
But there is yet another interesting option to consider for really big data – to use APPROX_TOP_SUM() function as in below example
#standardSQL SELECT type, value AS total FROM ( SELECT type, APPROX_TOP_SUM(total, total, 3) arr FROM `project.dataset.table` GROUP BY type ), UNNEST(arr)
obviously, with the same output as above for sample data