We have two tables in BigQuery like below:
Table A
Name | Question | Answer -----+-----------+------- Bob | Interest | a Bob | Interest | b Sue | Interest | a Sue | Interest | c Joe | Interest | a Joe | Interest | b Joe | Interest | c Joe | Interest | d
Table B (Static)
Interests | Segment --------------------------------------------+------------------ ["a"] | S1 ["a","b"] | S2 ["a", "b", "c", "d"] | S3
Expected table
User | Question | Answer -----+-----------+------- Bob | Interest | a Bob | Interest | b Sue | Interest | a Sue | Interest | c Joe | Interest | a Joe | Interest | b Joe | Interest | c Joe | Interest | d (+) Bob | Segment | S1 Bob | Segment | S2 Sue | Segment | S1 Joe | Segment | S1 Joe | Segment | S2 Joe | Segment | S3
In the above tables, the Answer field is of string type and Interests is of array type.
Pointers:
- One user can have one or more interests.
- One or more interests can belong to one segment.
- A user will be assigned to a segment only when all of his interests are matched.
Any inputs/thoughts in the right direction would be greatly appreciated.
Advertisement
Answer
Below is for BigQuery Standard SQL
#standardSQL select name, question, answer from `project.dataset.tableA` union all select name, 'Segment', segment from ( select name, 'Segment', segment, ( select countif(y is null) from b.interest x left join a.answers y on x = y ) = 0 qualified from ( select name, array_agg(answer) answers from `project.dataset.tableA` group by name ) a, `project.dataset.tableB` b ) where qualified
if to apply to sample data from your question – output is