Skip to content
Advertisement

How to remove Repeated field in BigQuery schema?

I have a schema that has a repeated field nested into another repeated field like so: person.children.toys. I want to make this inner field not repeated (so child can have only single nullable toy). I know that for such change I need to make a new table with new schema and run SQL query that inserts modified results into it, but I don’t know how to make the query. I need it to select first toy (or null) for each child and insert resulting objects into new table. There is a guarantee that in source table all children have no more than 1 toy.

Advertisement

Answer

Below is for BigQuery Standard SQL

I know – it might look over-complicated – but it totally preserves original schema while eliminating all but first (or null) toys. This can be handy if your real schema has more than just few fields so you don’t need to worry about them

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1 id, STRUCT([STRUCT('mike' AS name, ['woody'] AS toys)] AS children) AS person    UNION ALL
  SELECT 2 id, STRUCT([STRUCT('nik', ['buzz', 'bobeep']), ('john', ['car', 'buzz', 'bobeep'])] AS children) AS person   UNION ALL
  SELECT 3 id, STRUCT([STRUCT('vincent', IF(TRUE,[],['']))] AS children) AS person
)
SELECT * 
  REPLACE(
    (SELECT AS STRUCT * 
        REPLACE (
          (SELECT ARRAY_AGG(t) FROM
          (SELECT * REPLACE((SELECT toy FROM UNNEST(toys) toy WITH OFFSET ORDER BY OFFSET LIMIT 1) AS toys) FROM UNNEST(children)) t)
          AS children)
      FROM UNNEST([person]))
  AS person)
FROM `project.dataset.table`     

If to apply to below data

Row id  person.children.name    person.children.toys     
1   1   mike                    toy1     
2   2   nik                     toy2     
                                toy3     
        john                    toy4     
                                toy5     
                                toy6     
3   3   vincent                

result will be

Row id  person.children.name    person.children.toys     
1   1   mike                    toy1     
2   2   nik                     toy2     
        john                    toy4     
3   3   vincent                 null       

Note: toys field originally REPEATED STRING becomes just STRING

User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement