Skip to content
Advertisement

How to do the equivalent of ‘distinct’ on array field in BQ?

Let’s take the following data:

enter image description here

It can also be generated in BQ with the following statement:

with x as (
    select 1 as RowID, DATE('2014-01-01') as Date, [200, 300] as Payments, ["Contractor", "Monday"] as Tags union all
    select 2 as RowID, DATE('2014-01-02') as Date, [100, 200, 200, 200, 700] as Payments, ["Contractor", "Bi-Weekly"] as Tags
)
select * from x

How would I do the following two ‘distinct’ totals on the right of the following:

enter image description here

That is, I want to get a “total” that doesn’t double-count, or rather, only gets the distinct items based on the RowID.

Advertisement

Answer

If I understand correctly, it is the total row that you really care about. You can calculate the first two columns by unnesting both tags and payments in parallel.

Then for the total, unnest the payments without the tags:

with x as (
      select 1 as RowID, DATE('2014-01-01') as Date, [200, 300] as Payments, ["Contractor", "Monday"] as Tags union all
      select 2 as RowID, DATE('2014-01-02') as Date, [100, 200, 200, 200, 700] as Payments, ["Contractor", "Bi-Weekly"] as Tags
     )
select tag, count(*), sum(payment)
from x cross join
     unnest(x.payments) payment cross join
     unnest(x.tags) tag
group by 1
union all 
select 'Total', count(*), sum(payment)
from x cross join
      unnest(x.payments) payment;
User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement