BigQuery – JOIN on two tables using string and array

We have two tables in BigQuery like below:

Table A

 Name | Question  | Answer
 -----+-----------+-------
 Bob  | Interest  | a
 Bob  | Interest  | b
 Sue  | Interest  | a
 Sue  | Interest  | c
 Joe  | Interest  | a
 Joe  | Interest  | b
 Joe  | Interest  | c
 Joe  | Interest  | d

​x
 
 Name | Question  | Answer -----+-----------+------- Bob  | Interest  | a Bob  | Interest  | b Sue  | Interest  | a Sue  | Interest  | c Joe  | Interest  | a Joe  | Interest  | b Joe  | Interest  | c Joe  | Interest  | d​

Table B (Static)

           Interests                        |   Segment
--------------------------------------------+------------------
["a"]                                       |   S1
["a","b"]                                   |   S2 
["a", "b", "c", "d"]                        |   S3

 
           Interests                        |   Segment--------------------------------------------+------------------["a"]                                       |   S1["a","b"]                                   |   S2 ["a", "b", "c", "d"]                        |   S3​

Expected table

 User | Question  | Answer
 -----+-----------+-------
 Bob  | Interest  | a
 Bob  | Interest  | b
 Sue  | Interest  | a
 Sue  | Interest  | c
 Joe  | Interest  | a
 Joe  | Interest  | b
 Joe  | Interest  | c
 Joe  | Interest  | d
          (+)
 Bob  | Segment   | S1
 Bob  | Segment   | S2
 Sue  | Segment   | S1
 Joe  | Segment   | S1
 Joe  | Segment   | S2
 Joe  | Segment   | S3

 
 User | Question  | Answer -----+-----------+------- Bob  | Interest  | a Bob  | Interest  | b Sue  | Interest  | a Sue  | Interest  | c Joe  | Interest  | a Joe  | Interest  | b Joe  | Interest  | c Joe  | Interest  | d          (+) Bob  | Segment   | S1 Bob  | Segment   | S2 Sue  | Segment   | S1 Joe  | Segment   | S1 Joe  | Segment   | S2 Joe  | Segment   | S3 ​

In the above tables, the Answer field is of string type and Interests is of array type.

Pointers:

One user can have one or more interests.
One or more interests can belong to one segment.
A user will be assigned to a segment only when all of his interests are matched.

Any inputs/thoughts in the right direction would be greatly appreciated.

Answer

Below is for BigQuery Standard SQL

#standardSQL
select name, question, answer from `project.dataset.tableA`
union all
select name, 'Segment', segment
from (
  select 
    name, 'Segment', segment,
    ( select countif(y is null)
      from b.interest x
      left join a.answers y
      on x = y
    ) = 0 qualified
  from (
    select name, array_agg(answer) answers
    from `project.dataset.tableA`
    group by name
  ) a, `project.dataset.tableB` b
)
where qualified

 
#standardSQLselect name, question, answer from `project.dataset.tableA`union allselect name, 'Segment', segmentfrom (  select     name, 'Segment', segment,    ( select countif(y is null)      from b.interest x      left join a.answers y      on x = y    ) = 0 qualified  from (    select name, array_agg(answer) answers    from `project.dataset.tableA`    group by name  ) a, `project.dataset.tableB` b)where qualified    ​

if to apply to sample data from your question – output is

Advertisement

Answer