I need to fetch key value pairs from the second object in array. Also, need to create new columns with the fetched data. I am only interested in the second object, some arrays have 3 objects, some have 4 etc. The data looks like this:
[{'adUnitCode': ca-pub, 'id': 35, 'name': ca-pub}, {'adUnitCode': hmies, 'id': 49, 'name': HMIES}, {'adUnitCode': moda, 'id': 50, 'name': moda}, {'adUnitCode': nova, 'id': 55, 'name': nova}, {'adUnitCode': listicle, 'id': 11, 'name': listicle}] [{'adUnitCode': ca-pub, 'id': 35, 'name': ca-pub-73}, {'adUnitCode': hmiuk-jam, 'id': 23, 'name': HM}, {'adUnitCode': recipes, 'id': 26, 'name': recipes}] [{'adUnitCode': ca-pub, 'id': 35, 'name': ca-pub-733450927}, {'adUnitCode': digital, 'id': 48, 'name': Digital}, {'adUnitCode': movies, 'id': 50, 'name': movies}, {'adUnitCode': cannes-film-festival, 'id': 57, 'name': cannes-film-festival}, {'adUnitCode': article, 'id': 57, 'name': article}]
The desired output:
adUnitCode id name hmies 49 HMIES hmiuk-jam 23 HM digital 48 Digital
Advertisement
Answer
Below is for BigQuery Standard SQL
#standardSQL select json_extract_scalar(second_object, "$.adUnitCode") as adUnitCode, json_extract_scalar(second_object, "$.id") as id, json_extract_scalar(second_object, "$.name") as name from `project.dataset.table`, unnest( [json_extract_array(regexp_replace(mapping, r"(: )([w-]+)(,|})", "\1'\2'\3"))[safe_offset(1)]] ) as second_object
if applied to sample data from your question – output is
as you can see, the “trick” here is to use proper regexp in regexp_replace function. I’ve included now any alphabetical chars and -
. you can include more as you see needed
As an alternative yo can try regexp_replace(mapping, r"(: )([^,}]+)", "\1'\2'")
as in below example – so you will cover potentially more cases without changes in code
#standardSQL select json_extract_scalar(second_object, "$.adUnitCode") as adUnitCode, json_extract_scalar(second_object, "$.id") as id, json_extract_scalar(second_object, "$.name") as name from `project.dataset.table`, unnest( [json_extract_array(regexp_replace(mapping, r"(: )([^,}]+)", "\1'\2'"))[safe_offset(1)]] ) as second_object