This question is a continuation of these two:
We have a table in Big Query like below.
Input table:
Name | Question | Answer -----+-----------+------- Bob | Interest | ["a"] Sue | Interest | ["a", "b"] Joe | Interest | ["b"] Joe | Gender | Male Bob | Gender | Female Sue | DOB | 2020-10-17 Bob | Others | { "country" : "es", "language" : "ca"}
Note: All the values in the Answer column are stringified values and the Arrays / JSON objects are dynamic.
We want to convert the above table to the below format to make it BI/Visualisation friendly.
Desired table:
+-------------------------------------------------------------+ | Name | a | b | c | Gender | DOB | country | language | +-------------------------------------------------------------+ | Bob | 1 | 0 | 0 | Female | 2020-10-17 | es | ca | | Sue | 1 | 1 | 0 | - | - | - | - | | Joe | 0 | 1 | 0 | Male | - | - | - | +-------------------------------------------------------------+
Advertisement
Answer
Below is for BigQuery Standard SQL
#standardSQL create temp table data as select name, question, value as answer from `project.dataset.table`, unnest(split(translate(answer, '[]" ', ''))) value where question = 'Interest' union all select name, question, answer from `project.dataset.table` where not question in ('Interest', 'Others') union all select name, split(value, ':')[offset(0)] as question, split(value, ':')[offset(1)] as answer from `project.dataset.table`, unnest(split(translate(answer, '{}" ', ''))) value where question = 'Others'; EXECUTE IMMEDIATE ( SELECT """ SELECT name, """ || STRING_AGG("""MAX(IF(answer = '""" || value || """', 1, 0)) AS """ || value, ', ') FROM ( SELECT DISTINCT answer value FROM data WHERE question = 'Interest' ORDER BY value )) || ( SELECT ", " || STRING_AGG("""MAX(IF(question = '""" || value || """', answer, '-')) AS """ || value, ', ') FROM ( SELECT DISTINCT question value FROM data WHERE question != 'Interest' ORDER BY value )) || """ FROM data GROUP BY name """;
if to apply to sample data from your question
with `project.dataset.table` AS ( select 'Bob' name, 'Interest' question, '["a"]' answer union all select 'Sue', 'Interest', '["a", "b"]' union all select 'Joe', 'Interest', '["b"]' union all select 'Joe', 'Gender', 'Male' union all select 'Bob', 'Gender', 'Female' union all select 'Sue', 'DOB', '2020-10-17' union all select 'Bob', 'Others', '{ "country" : "es", "language" : "ca"}' )
the output is
Note: EXECUTE IMMEDIATE
part of above script is exactly the same as in previous post – the change is only in preparing original data into temp table data
and than using it in EXECUTE IMMEDIATE