I have a reasonably complex dataset being pulled into BigQuery table via an Airflow DAG which cannot easily be adjusted.
This job pulls data into a table with this format:
| Line_item_id | Device | |--------------|----------------| | 123 | 202; 5; 100 | | 124 | 100; 2 | | 135 | 504; 202; 2 |
At the moment, I am using this query (written in standard SQL within the BQ Web UI) to split the device ids into individual nested rows:
SELECT Line_item_id, ARRAY(SELECT AS STRUCT(SPLIT(RTRIM(Device,';'),'; '))) as Device,
Output:
| Line_item_id | Device | |--------------|--------| | 123 | 202 | | | 203 | | | 504 | | 124 | 102 | | | 2 | | 135 | 102 |
The difficulty I am facing is I have a separate match table containing the device ids and their corresponding names. I need to add the device names to the above table, as nested values next to their corresponding ids.
The match table looks something like this (with many more rows):
| Device_id | Device_name | |-----------|-------------| | 202 | Smartphone | | 203 | AppleTV | | 504 | Laptop |
The ideal output I am looking for would be:
| Line_item_id | Device_id | Device_name | |--------------|-----------|-------------| | 123 | 202 | Android | | | 203 | AppleTV | | | 504 | Laptop | | 124 | 102 | iphone | | | 2 | Unknown | | 135 | 102 | iphone |
If anybody knows how to achieve this I would be grateful for help.
EDIT:
Gordon’s solution works perfectly, but in addition to this, if anybody wants to re-nest the data afterwards (so you effectively end up with the same table and additional nested rows), this was the query I finally ended up with:
select t.line_item_id, ARRAY_AGG(STRUCT(d as id, ot.name as name)) as device from first_table t cross join unnest(split(Device, '; ')) d join match_table ot on ot.id = d GROUP BY line_item_id
Advertisement
Answer
You can move the parsing logic to the from
clause and then join
in what you want:
select * from (select 124 as line_item_id, '203; 100; 6; 2' as device) t cross join unnest(split(device, '; ')) d join other_table ot on ot.device = d;