Skip to content
Advertisement

Big Query – Transpose array/json objects into columns

This question is a continuation of these two:

  1. Big Query – Transpose arrays into colums
  2. Big Query – Transpose Specific fields into Columns

We have a table in Big Query like below.

Input table:

 Name | Question  | Answer
 -----+-----------+-------
 Bob  | Interest  | ["a"]     
 Sue  | Interest  | ["a", "b"]
 Joe  | Interest  | ["b"]
 Joe  | Gender    | Male
 Bob  | Gender    | Female
 Sue  | DOB       | 2020-10-17
 Bob  | Others    | { "country" : "es", "language" : "ca"}

Note: All the values in the Answer column are stringified values and the Arrays / JSON objects are dynamic.

We want to convert the above table to the below format to make it BI/Visualisation friendly.

Desired table:

 +-------------------------------------------------------------+
 | Name | a | b | c | Gender | DOB        | country | language |
 +-------------------------------------------------------------+
 | Bob  | 1 | 0 | 0 | Female | 2020-10-17 |   es    |   ca     |
 | Sue  | 1 | 1 | 0 |   -    |     -      |   -     |   -      |
 | Joe  | 0 | 1 | 0 |  Male  |     -      |   -     |   -      |
 +-------------------------------------------------------------+

Advertisement

Answer

Below is for BigQuery Standard SQL

#standardSQL
create temp table data as
select name, question, value as answer 
from `project.dataset.table`, 
unnest(split(translate(answer, '[]" ', ''))) value
where question = 'Interest'
union all
select name, question, answer 
from `project.dataset.table`
where not question in ('Interest', 'Others')
union all
select name, 
  split(value, ':')[offset(0)] as question, 
  split(value, ':')[offset(1)] as answer 
from `project.dataset.table`, 
unnest(split(translate(answer, '{}" ', ''))) value
where question = 'Others';

EXECUTE IMMEDIATE (
  SELECT """
    SELECT name, """ || STRING_AGG("""MAX(IF(answer = '""" || value || """', 1, 0)) AS """ || value, ', ')   
FROM (
  SELECT DISTINCT answer value FROM data
  WHERE question = 'Interest' ORDER BY value
)) || (
  SELECT ", " || STRING_AGG("""MAX(IF(question = '""" || value || """', answer, '-')) AS """ || value, ', ')   
FROM (
    SELECT DISTINCT question value FROM data
    WHERE question != 'Interest' ORDER BY value
)) || """  
  FROM data 
  GROUP BY name
  """;     

if to apply to sample data from your question

with `project.dataset.table` AS (
  select 'Bob' name, 'Interest' question, '["a"]' answer union all
  select 'Sue', 'Interest', '["a", "b"]' union all
  select 'Joe', 'Interest', '["b"]' union all
  select 'Joe', 'Gender', 'Male' union all
  select 'Bob', 'Gender', 'Female' union all
  select 'Sue', 'DOB', '2020-10-17' union all
  select 'Bob', 'Others', '{ "country" : "es", "language" : "ca"}' 
)    

the output is

enter image description here

Note: EXECUTE IMMEDIATE part of above script is exactly the same as in previous post – the change is only in preparing original data into temp table data and than using it in EXECUTE IMMEDIATE

User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement