Skip to content
Advertisement

Add nested column to BigQuery table, joining on value of another nested column in standard SQL

I have a reasonably complex dataset being pulled into BigQuery table via an Airflow DAG which cannot easily be adjusted.

This job pulls data into a table with this format:

| Line_item_id | Device         |
|--------------|----------------|
| 123          | 202; 5; 100    |
| 124          | 100; 2         |
| 135          | 504; 202; 2    |

At the moment, I am using this query (written in standard SQL within the BQ Web UI) to split the device ids into individual nested rows:

SELECT 
  Line_item_id,
  ARRAY(SELECT AS STRUCT(SPLIT(RTRIM(Device,';'),'; '))) as Device,

Output:

| Line_item_id | Device |
|--------------|--------|
| 123          | 202    |
|              | 203    |
|              | 504    |
| 124          | 102    |
|              | 2      |
| 135          | 102    |

The difficulty I am facing is I have a separate match table containing the device ids and their corresponding names. I need to add the device names to the above table, as nested values next to their corresponding ids.

The match table looks something like this (with many more rows):

| Device_id | Device_name |
|-----------|-------------|
| 202       | Smartphone  |
| 203       | AppleTV     |
| 504       | Laptop      |

The ideal output I am looking for would be:

| Line_item_id | Device_id | Device_name |
|--------------|-----------|-------------|
| 123          | 202       | Android     |
|              | 203       | AppleTV     |
|              | 504       | Laptop      |
| 124          | 102       | iphone      |
|              | 2         | Unknown     |
| 135          | 102       | iphone      |

If anybody knows how to achieve this I would be grateful for help.

EDIT:

Gordon’s solution works perfectly, but in addition to this, if anybody wants to re-nest the data afterwards (so you effectively end up with the same table and additional nested rows), this was the query I finally ended up with:

select t.line_item_id, ARRAY_AGG(STRUCT(d as id, ot.name as name)) as device
from first_table t cross join
     unnest(split(Device, '; ')) d join
     match_table ot
     on ot.id  = d
GROUP BY line_item_id

Advertisement

Answer

You can move the parsing logic to the from clause and then join in what you want:

select *
from (select 124 as line_item_id, '203; 100; 6; 2' as device) t cross join
     unnest(split(device, '; ')) d join
     other_table ot
     on ot.device = d;
User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement