Skip to content
Advertisement

How do I use BigQuery DML to transform some fields of a struct nested within an array, within a struct, within an array?

I think this is a more complex version of the question in Update values in struct arrays in BigQuery.

I’m trying to update some of the fields in a struct, where the struct is heavily nested. I’m having trouble creating the SQL to do it. Here’s my table schema:

CREATE TABLE `my_dataset.test_data_for_so`
(
  date DATE,
  hits ARRAY<STRUCT<search STRUCT<query STRING, other_column STRING>, metadata ARRAY<STRUCT<key STRING, value STRING>>>>
);

This is what the schema looks like in the BigQuery GUI after I create the table:

BigQuery table schema from GUI

Here’s the data I’ve inserted:

INSERT INTO `my_dataset.test_data_for_so` (date, hits)
VALUES (
  CAST('2021-01-01' AS date),
  [
    STRUCT(
      STRUCT<query STRING, other_column STRING>('foo bar', 'foo bar'),
      [
        STRUCT<key STRING, value STRING>('foo bar', 'foo bar')
      ]
    )
  ]
)

My goal is to transform the “search.query” and “metadata.value” fields. For example, uppercasing them, leaving every other column (and every other struct field) in the row unchanged.

I’m looking for a solution involving either manually specifying each column in the SQL, or preferably, one where I can only mention the columns/fields I want to transform in the SQL, omitting all other columns/fields. This is a minimal example. The table I’m working on in production has hundreds of columns and fields.

For example, that row, when transformed this way, would change from:

[
  {
    "date": "2021-01-01",
    "hits": [
      {
        "search": {
          "query": "foo bar",
          "other_column": "foo bar"
        },
        "metadata": [
          {
            "key": "foo bar",
            "value": "foo bar"
          }
        ]
      }
    ]
  }
]

to:

[
  {
    "date": "2021-01-01",
    "hits": [
      {
        "search": {
          "query": "FOO BAR",
          "other_column": "foo bar"
        },
        "metadata": [
          {
            "key": "foo bar",
            "value": "FOO BAR"
          }
        ]
      }
    ]
  }
]

Advertisement

Answer

preferably, one where I can only mention the columns/fields I want to transform in the SQL …

Use below approach – it does exactly what you wish – ONLY those fields that are to be updated are in use, all other (tens or hundreds …) are preserved as is

update your_table
set hits = array(
  select as struct *
    replace(
      (select as struct * replace (upper(query) as query) from unnest([search])) as search, 
      array(select as struct * replace(upper(value) as value) from unnest(metadata)) as metadata
    )
  from unnest(hits)
)
where true;            

if applied to sample data in your question – result is

enter image description here

User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement