I have a schema that has a repeated field nested into another repeated field like so: person.children.toys
. I want to make this inner field not repeated (so child can have only single nullable toy). I know that for such change I need to make a new table with new schema and run SQL query that inserts modified results into it, but I don’t know how to make the query. I need it to select first toy (or null) for each child and insert resulting objects into new table. There is a guarantee that in source table all children have no more than 1 toy.
Advertisement
Answer
Below is for BigQuery Standard SQL
I know – it might look over-complicated – but it totally preserves original schema while eliminating all but first (or null) toys. This can be handy if your real schema has more than just few fields so you don’t need to worry about them
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, STRUCT([STRUCT('mike' AS name, ['woody'] AS toys)] AS children) AS person UNION ALL
SELECT 2 id, STRUCT([STRUCT('nik', ['buzz', 'bobeep']), ('john', ['car', 'buzz', 'bobeep'])] AS children) AS person UNION ALL
SELECT 3 id, STRUCT([STRUCT('vincent', IF(TRUE,[],['']))] AS children) AS person
)
SELECT *
REPLACE(
(SELECT AS STRUCT *
REPLACE (
(SELECT ARRAY_AGG(t) FROM
(SELECT * REPLACE((SELECT toy FROM UNNEST(toys) toy WITH OFFSET ORDER BY OFFSET LIMIT 1) AS toys) FROM UNNEST(children)) t)
AS children)
FROM UNNEST([person]))
AS person)
FROM `project.dataset.table`
If to apply to below data
Row id person.children.name person.children.toys
1 1 mike toy1
2 2 nik toy2
toy3
john toy4
toy5
toy6
3 3 vincent
result will be
Row id person.children.name person.children.toys
1 1 mike toy1
2 2 nik toy2
john toy4
3 3 vincent null
Note: toys field originally REPEATED STRING
becomes just STRING