I have a schema that has a repeated field nested into another repeated field like so: person.children.toys
. I want to make this inner field not repeated (so child can have only single nullable toy). I know that for such change I need to make a new table with new schema and run SQL query that inserts modified results into it, but I don’t know how to make the query. I need it to select first toy (or null) for each child and insert resulting objects into new table. There is a guarantee that in source table all children have no more than 1 toy.
Advertisement
Answer
Below is for BigQuery Standard SQL
I know – it might look over-complicated – but it totally preserves original schema while eliminating all but first (or null) toys. This can be handy if your real schema has more than just few fields so you don’t need to worry about them
#standardSQL WITH `project.dataset.table` AS ( SELECT 1 id, STRUCT([STRUCT('mike' AS name, ['woody'] AS toys)] AS children) AS person UNION ALL SELECT 2 id, STRUCT([STRUCT('nik', ['buzz', 'bobeep']), ('john', ['car', 'buzz', 'bobeep'])] AS children) AS person UNION ALL SELECT 3 id, STRUCT([STRUCT('vincent', IF(TRUE,[],['']))] AS children) AS person ) SELECT * REPLACE( (SELECT AS STRUCT * REPLACE ( (SELECT ARRAY_AGG(t) FROM (SELECT * REPLACE((SELECT toy FROM UNNEST(toys) toy WITH OFFSET ORDER BY OFFSET LIMIT 1) AS toys) FROM UNNEST(children)) t) AS children) FROM UNNEST([person])) AS person) FROM `project.dataset.table`
If to apply to below data
Row id person.children.name person.children.toys 1 1 mike toy1 2 2 nik toy2 toy3 john toy4 toy5 toy6 3 3 vincent
result will be
Row id person.children.name person.children.toys 1 1 mike toy1 2 2 nik toy2 john toy4 3 3 vincent null
Note: toys field originally REPEATED STRING
becomes just STRING