Skip to content
Advertisement

BigQuery count each item in array across table

I cannot quite find what I’m looking for, so here goes:

I’m looking for a way to get a count of the number of times an item occurs in an array across the entire table.

Imagine you have a table child_names with two columns – user_id and children

  • I know it’s unusual to have two children with same name, but bear with me

.

user_id   children
1          Bob, Jane, Bob 
2          Jeff, Jane
3          Bob, Matt
4          Jane, John

I am looking for a result that would have two columns

Bob      3
Jane     3
Jeff     1
Matt     1
John     1

So far I have this

SELECT
  ARRAY(
    SELECT AS STRUCT child, `count`
    FROM t.children child
    
    LEFT JOIN (
      SELECT AS STRUCT child, COUNT(1) `count`
      FROM t.children child
      GROUP BY child
      
    ) stats
    USING(child) 
  ) hashtag
FROM `child_names` t,
UNNEST(children)

But this gives me a count of how many children have that name per parent, not per table.

I get

Bob    2
Jane   1
Jeff   1
Jane   1
Bob    1
Matt   1
etc.

I hope that makes sense. Any help would be appreciated.

Advertisement

Answer

Use below

SELECT name, COUNT(*) cnt
FROM child_names, 
UNNEST(children) name
GROUP BY name       

if applied to sample data in your question – output is

enter image description here

User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement