Skip to content
Advertisement

Google BigQuery: How to filter out rows by a particular column’s value frequency

Say that I only want to return rows where a column value occurs at least twice.

I would do something like

SELECT 
table1.columnA
from table1
GROUP BY 
table1.columnA
HAVING COUNT(*) > 2

That works for just one column, but if I want to do return several columns but only have the filter apply to one column, it doesn’t work. My attempt is

SELECT 
table1.columnA,
table1.columnB,
from table1
GROUP BY 
table1.columnA
HAVING COUNT(*) > 2

Which gives a “ColumnB which is neither GROUPED nor AGGREGATED ” error.

From this post, it seems that I need to have all values in SELECT to be grouped or aggregated, but I only one to filter by one particular column

BIGQUERY SELECT list expression references column CHANNEL_ID which is neither grouped nor aggregated at [10:13]

So I’m still trying to figure out a way to filter by value frequency for a particular column.

Advertisement

Answer

You can use window function to count frequency and then filter. For example:

select distinct 
    columnA,
    columnB
from
    (select 
        *,
        row_number() over(partition by columnA) as rn 
     from table1)
where rn > 2

Let me know, if it is still not working for you.

User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement