We have two tables in BigQuery like below:
Table A
x
Name | Question | Answer
-----+-----------+-------
Bob | Interest | a
Bob | Interest | b
Sue | Interest | a
Sue | Interest | c
Joe | Interest | a
Joe | Interest | b
Joe | Interest | c
Joe | Interest | d
Table B (Static)
Interests | Segment
--------------------------------------------+------------------
["a"] | S1
["a","b"] | S2
["a", "b", "c", "d"] | S3
Expected table
User | Question | Answer
-----+-----------+-------
Bob | Interest | a
Bob | Interest | b
Sue | Interest | a
Sue | Interest | c
Joe | Interest | a
Joe | Interest | b
Joe | Interest | c
Joe | Interest | d
(+)
Bob | Segment | S1
Bob | Segment | S2
Sue | Segment | S1
Joe | Segment | S1
Joe | Segment | S2
Joe | Segment | S3
In the above tables, the Answer field is of string type and Interests is of array type.
Pointers:
- One user can have one or more interests.
- One or more interests can belong to one segment.
- A user will be assigned to a segment only when all of his interests are matched.
Any inputs/thoughts in the right direction would be greatly appreciated.
Advertisement
Answer
Below is for BigQuery Standard SQL
#standardSQL
select name, question, answer from `project.dataset.tableA`
union all
select name, 'Segment', segment
from (
select
name, 'Segment', segment,
( select countif(y is null)
from b.interest x
left join a.answers y
on x = y
) = 0 qualified
from (
select name, array_agg(answer) answers
from `project.dataset.tableA`
group by name
) a, `project.dataset.tableB` b
)
where qualified
if to apply to sample data from your question – output is