Skip to content
Advertisement

Fix duplicate rows while selecting nested fields from BigQuery using JOIN and UNNEST

I am trying to copy data from one BQ table to another and having trouble with un-nesting and JOINs. I have two fields of REPEATED RECORD type in my schema and when I try to JOIN those, I get a lot of duplicate rows. HereĀ“s my query:

SELECT
  Title as Title,
  ARRAY_AGG(STRUCT(Address.Field1, Address.Field2)) AS Address,
  ARRAY_AGG(STRUCT(Work.Field1, Work.Field2)) AS Work
  
FROM
   source_table
   LEFT JOIN UNNEST(Address) AS Address,
   LEFT JOIN UNNEST(Work) AS Work

GROUP BY
  Title

What I want to achieve is:

| Title | Address.Field1|Address.Field2| Work.Field1|Work.Field2|
|-------|---------------|--------------|------------|-----------|
| A     |     aaa       |     ccc      |     eee    |     ggg   |
|       |     bbb       |     ddd      |     fff    |     hhh   |

But when I JOIN Address and Work, I get a log of duplicate rows like this:

| Title | Address.Field1|Address.Field2| Work.Field1|Work.Field2|
|-------|---------------|--------------|------------|-----------|
| A     |     aaa       |     ccc      |     eee    |     ggg   |
|       |     aaa       |     ccc      |     fff    |     hhh   |
|       |     aaa       |     ccc      |     eee    |     ggg   |
|       |     bbb       |     ddd      |     fff    |     hhh   |

I need help with fixing the JOINS and would appreciate some suggestions. Thanks in advance.

Advertisement

Answer

Try subqueries:

SELECT
  Title as Title,
  ARRAY(select STRUCT(Field1, Field2) from UNNEST(source_table.Address)) AS Address,
  ARRAY(select STRUCT(Field1, Field2) from UNNEST(source_table.Work)) AS Work
FROM
   source_table
User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement