I have an input table which has a list of all queries run in Bigquery. I need to create two columns one with the dataset name and other with table name.
The queries example is as follows (some could be nested). I need to separate the dataset name and tablename using the dot in the string
Queries | dataset | table |
---|---|---|
select * from advp.accounts ; | advp | accounts |
select * from gqd.customers where id in (select id from gwq.facts) ; | gqd | customers |
gwq | facts | |
create table hsq.sales (id int64, ss string); | hsq | sales |
Advertisement
Answer
You can try using REGEXP_EXTRACT_ALL
to extract all instances of “<database>.<table>“, then unnest the corresponding arrays.
WITH cte AS ( SELECT *, REGEXP_EXTRACT_ALL(Queries, r'([a-zA-Z0-9]+).') AS db_names, REGEXP_EXTRACT_ALL(Queries, r'(.[a-zA-Z0-9]+)') AS table_names FROM tab ) SELECT cte.Queries, dataset, table_ FROM cte, cte.db_names AS dataset, cte.table_names AS table_