I have a reasonably complex dataset being pulled into BigQuery table via an Airflow DAG which cannot easily be adjusted.
This job pulls data into a table with this format:
| Line_item_id | Device |
|--------------|----------------|
| 123 | 202; 5; 100 |
| 124 | 100; 2 |
| 135 | 504; 202; 2 |
At the moment, I am using this query (written in standard SQL within the BQ Web UI) to split the device ids into individual nested rows:
SELECT
Line_item_id,
ARRAY(SELECT AS STRUCT(SPLIT(RTRIM(Device,';'),'; '))) as Device,
Output:
| Line_item_id | Device |
|--------------|--------|
| 123 | 202 |
| | 203 |
| | 504 |
| 124 | 102 |
| | 2 |
| 135 | 102 |
The difficulty I am facing is I have a separate match table containing the device ids and their corresponding names. I need to add the device names to the above table, as nested values next to their corresponding ids.
The match table looks something like this (with many more rows):
| Device_id | Device_name |
|-----------|-------------|
| 202 | Smartphone |
| 203 | AppleTV |
| 504 | Laptop |
The ideal output I am looking for would be:
| Line_item_id | Device_id | Device_name |
|--------------|-----------|-------------|
| 123 | 202 | Android |
| | 203 | AppleTV |
| | 504 | Laptop |
| 124 | 102 | iphone |
| | 2 | Unknown |
| 135 | 102 | iphone |
If anybody knows how to achieve this I would be grateful for help.
EDIT:
Gordon’s solution works perfectly, but in addition to this, if anybody wants to re-nest the data afterwards (so you effectively end up with the same table and additional nested rows), this was the query I finally ended up with:
select t.line_item_id, ARRAY_AGG(STRUCT(d as id, ot.name as name)) as device
from first_table t cross join
unnest(split(Device, '; ')) d join
match_table ot
on ot.id = d
GROUP BY line_item_id
Advertisement
Answer
You can move the parsing logic to the from
clause and then join
in what you want:
select *
from (select 124 as line_item_id, '203; 100; 6; 2' as device) t cross join
unnest(split(device, '; ')) d join
other_table ot
on ot.device = d;