I am extracting JSON data out of a BigQuery column using JSON_EXTRACT
. Now I want to extract lists of values and run aggregate functions (like AVG
) against them. Testing the JsonPath expression .objects[*].v
succeeds on http://jsonpath.curiousconcept.com/. But the query:
SELECT JSON_EXTRACT(json_column, "$.id") as id, AVG(JSON_EXTRACT(json_column, "$.objects[*].v")) as average_value FROM [tablename]
throws a JsonPath parse error on BigQuery. Is this possible on BigQuery? Or do I need to preprocess my data in order to run aggregate functions against data inside of my JSON?
My data looks similar to this:
# Record 1 { "id": "abc", "objects": [ { "id": 1, "v": 1 }, { "id": 2, "v": 3 } ] } # Record 2 { "id": "def", "objects": [ { "id": 1, "v": 2 }, { "id": 2, "v": 5 } ] }
This is related to another question.
Update: The problem can be simplified by running two queries. First, run JSON_EXTRACT
and save the results into a view. Secondly, run the aggregate function against this view. But even then I need to correct the JsonPath expression $.objects[*].v
to prevent the JSONPath parse error
.
Advertisement
Answer
Leverage SPLIT() to pivot repeatable fields into separate rows. Also might be easier/cleaner to put this into a subquery and put AVG outside:
SELECT id, AVG(v) as average FROM ( SELECT JSON_EXTRACT(json_column, "$.id") as id, INTEGER( REGEXP_EXTRACT( SPLIT( JSON_EXTRACT(json_column, "$.objects") ,"},{" ) ,r'"v":([^,]+),')) as v FROM [mytable] ) GROUP BY id;