Skip to content
Advertisement

BigQuery: How to aggregate records in a STRUCT or JSON field?

I want to aggregate a pair of columns as a dictionary of key,value pairs to construct a STRUCT or a JSON STRING over multiple records.

Currently my implementation leverages the STRING nature of JSON & STRING_AGG to build such a JSON value:

SELECT
    id, 
    (
        "{" ||
        STRING_AGG(
            '"' || base.key || '":' || CAST(base.val AS STRING),
            ','
        ) || "}"
    )
    AS json_val
FROM (
    SELECT 1 AS id, "a" AS `key`, 100 AS val
    UNION ALL
    SELECT 1 AS id, "b" AS `key`, 200 AS val
    UNION ALL
    SELECT 1 AS id, "c" AS `key`, 300 AS val
    UNION ALL
    SELECT 2 AS id, "a" AS `key`, 400 AS val
    UNION ALL
    SELECT 2 AS id, "b" AS `key`, 500 AS val
    UNION ALL
    SELECT 2 AS id, "c" AS `key`, 600 AS val
    UNION ALL
    SELECT 3 AS id, "a" AS `key`, 700 AS val
) base
GROUP BY 1

Which results in the following:

id,json_val
1 ,{"a":100,"b":200,"c":300}    
2 ,{"a":400,"b":500,"c":600}    
3 ,{"a":700}

Is there a more readable approach? Kind of a STRUCT_AGG(key_field STRING, value_field ) or equivalently signed JSON_DICT_AGG/JSON_STRUCT_AGG?

Advertisement

Answer

Consider below approach

SELECT id, 
  REPLACE(TRANSLATE(FORMAT('%T', ARRAY_AGG(STRUCT(key, val))), '[]()', '{}'), '", ', '":') json_val
FROM (
    SELECT 1 AS id, "a" AS `key`, 100 AS val UNION ALL
    SELECT 1 AS id, "b" AS `key`, 200 AS val UNION ALL
    SELECT 1 AS id, "c" AS `key`, 300 AS val UNION ALL
    SELECT 2 AS id, "a" AS `key`, 400 AS val UNION ALL
    SELECT 2 AS id, "b" AS `key`, 500 AS val UNION ALL
    SELECT 2 AS id, "c" AS `key`, 600 AS val UNION ALL
    SELECT 3 AS id, "a" AS `key`, 700 AS val
) base
GROUP BY id       

with output

enter image description here

If you want to wrap up this into UDF – see below example

CREATE TEMP FUNCTION ARRAY_TO_JSON(arr ANY TYPE) AS (
  REPLACE(TRANSLATE(FORMAT('%T', arr), '[]()', '{}'), '", ', '":')
);
SELECT id, 
  ARRAY_TO_JSON(ARRAY_AGG(STRUCT(key, val))) json_val
FROM (
. . .
) base
GROUP BY id       

And btw, you can do exactly same “trick” with your own version – as in below example

CREATE TEMP FUNCTION ARRAY_TO_JSON(arr ANY TYPE) AS ((
  SELECT "{" || STRING_AGG('"' || key || '":' || val) || "}"
  FROM UNNEST(arr)
));
SELECT id, 
  ARRAY_TO_JSON(ARRAY_AGG(STRUCT(key, val))) json_val
FROM (
. . . 
) base
GROUP BY id     

All with same output already presented on the top of the answer

User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement