I have JSON data which is saved in BigQuery as a string.
{ "event":{ "action":"prohibitedSoftwareCheckResult", "clientTime":"2017-07-16T12:55:40.828Z", "clientTimeZone":"3", "serverTime":"2017-07-16T12:55:39.000Z", "processList":{ "1":"outlook.exe", "2":"notepad.exe" } }, "user":{ "id":123456, } }
I want to have a result set where each process will be in a different row. Something like:
UserID ProcessName ------------------------- 123456 outlook.exe 123456 notepad.exe
I saw there is an option to query repeated data but the field needs to be RECORD type to my understanding.
Is it possible to convert to RECORD type “on the fly” in a subquery? (I can’t change the source field to RECORD).
Or, is there a different way to return the desired result set?
Advertisement
Answer
This could be a possible work around for you:
SELECT user_id, processListValues FROM( SELECT JSON_EXTRACT_SCALAR(json_data, '$.user.id') user_id, REGEXP_EXTRACT_ALL(JSON_EXTRACT(json_data, '$.event.processList'), r':"([a-zA-Z0-9.]+)"') processListValues FROM data ), UNNEST(processListValues) processListValues
Using your JSON as example:
WITH data AS( SELECT """{ "event":{ "action":"prohibitedSoftwareCheckResult", "clientTime":"2017-07-16T12:55:40.828Z", "clientTimeZone":"3", "serverTime":"2017-07-16T12:55:39.000Z", "processList":{ "1":"outlook.exe", "2":"notepad.exe", "3":"outlo3245345okexe" } }, "user":{ "id":123456, } }""" as json_data ) SELECT user_id, processListValues FROM( SELECT JSON_EXTRACT_SCALAR(json_data, '$.user.id') user_id, REGEXP_EXTRACT_ALL(JSON_EXTRACT(json_data, '$.event.processList'), r':"([a-zA-Z0-9.]+)"') processListValues FROM data ), UNNEST(processListValues) processListValues
Results:
Row user_id processListValues 1 123456 outlook.exe 2 123456 notepad.exe 3 123456 outlo3245345okexe