BigQuery: Select top 3 with GROUP BY condition

I have a table like this

I want to select the top 3 totals within each group. How can I do it?

Answer

In most big data use cases, ROW_NUMBER() is not a good fit because it tends to fail with a "Resources exceeded" error. The function requires all rows of the same group to be present on a single node, which, when the data is skewed, leads to that error in BigQuery.
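For reference, the ROW_NUMBER() pattern being discussed usually looks something like the sketch below. The column names `grp`, `name`, and `total` and the inline `sample` rows are assumptions made up for illustration, since the table from the question is not reproduced here.

```sql
-- Illustrative rows only; grp, name and total are hypothetical column names.
WITH sample AS (
  SELECT 'A' AS grp, 'a1' AS name, 10 AS total UNION ALL
  SELECT 'A', 'a2', 40 UNION ALL
  SELECT 'A', 'a3', 30 UNION ALL
  SELECT 'A', 'a4', 20 UNION ALL
  SELECT 'B', 'b1', 5 UNION ALL
  SELECT 'B', 'b2', 15
)
SELECT grp, name, total
FROM (
  SELECT
    grp,
    name,
    total,
    -- Ranks rows inside each group; the whole partition is sorted together,
    -- which is what becomes expensive for very large or skewed groups.
    ROW_NUMBER() OVER (PARTITION BY grp ORDER BY total DESC) AS rn
  FROM sample
)
WHERE rn <= 3
ORDER BY grp, total DESC;
```

This works fine on small tables; the concern above only applies once individual groups get very large.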

Option 1

One of the usual ways to address this issue is to use the ARRAY_AGG() function, as in the example below
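A minimal sketch of the ARRAY_AGG() approach, assuming hypothetical columns `grp`, `name`, and `total` and a few made-up rows in place of the table from the question: each group is aggregated into an array that is ordered by `total` and cut off at 3 elements, and the array is then flattened back into rows.

```sql
-- Illustrative rows only; grp, name and total are hypothetical column names.
WITH sample AS (
  SELECT 'A' AS grp, 'a1' AS name, 10 AS total UNION ALL
  SELECT 'A', 'a2', 40 UNION ALL
  SELECT 'A', 'a3', 30 UNION ALL
  SELECT 'A', 'a4', 20 UNION ALL
  SELECT 'B', 'b1', 5 UNION ALL
  SELECT 'B', 'b2', 15
)
SELECT grp, top.name, top.total
FROM (
  SELECT
    grp,
    -- Keep only the 3 rows with the highest total in each group.
    ARRAY_AGG(STRUCT(name, total) ORDER BY total DESC LIMIT 3) AS top3
  FROM sample
  GROUP BY grp
), UNNEST(top3) AS top  -- flatten the per-group arrays back into rows
ORDER BY grp, top.total DESC;
```

Because of the LIMIT inside ARRAY_AGG(), only three elements per group are kept in the aggregated array, rather than a full ranking of every row.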

If you run the above against the sample data from your question, you will get the expected result.

Option 2

There is yet another interesting option to consider for really big data: the APPROX_TOP_SUM() function, as in the example below
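A sketch of the APPROX_TOP_SUM() variant, again with hypothetical columns `grp`, `name`, and `total` and made-up rows. The function returns an array of structs with the fields `value` and `sum`, so those are renamed after flattening.

```sql
-- Illustrative rows only; grp, name and total are hypothetical column names.
WITH sample AS (
  SELECT 'A' AS grp, 'a1' AS name, 10 AS total UNION ALL
  SELECT 'A', 'a2', 40 UNION ALL
  SELECT 'A', 'a3', 30 UNION ALL
  SELECT 'A', 'a4', 20 UNION ALL
  SELECT 'B', 'b1', 5 UNION ALL
  SELECT 'B', 'b2', 15
)
SELECT grp, top.value AS name, top.sum AS total
FROM (
  SELECT
    grp,
    -- Approximate top 3 values of name, weighted by total, per group.
    APPROX_TOP_SUM(name, total, 3) AS top3
  FROM sample
  GROUP BY grp
), UNNEST(top3) AS top
ORDER BY grp, total DESC;
```

Two caveats apply: the result is approximate by design, and if the same `name` appears more than once within a group its weights are summed, so this matches the ARRAY_AGG() version only when each `name` occurs once per group.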

For the sample data, this returns the same output as above.

User contributions licensed under: CC BY-SA