Skip to content
Advertisement

Regex to retrieve dataset and table name using dot in SQL BigQUery

I have an input table which has a list of all queries run in Bigquery. I need to create two columns one with the dataset name and other with table name.

The queries example is as follows (some could be nested). I need to separate the dataset name and tablename using the dot in the string

Queries dataset table
select * from advp.accounts ; advp accounts
select * from gqd.customers where id in (select id from gwq.facts) ; gqd customers
gwq facts
create table hsq.sales (id int64, ss string); hsq sales

Advertisement

Answer

You can try using REGEXP_EXTRACT_ALL to extract all instances of “<database>.<table>“, then unnest the corresponding arrays.

WITH cte AS (
    SELECT *, REGEXP_EXTRACT_ALL(Queries, r'([a-zA-Z0-9]+).') AS db_names, 
              REGEXP_EXTRACT_ALL(Queries, r'(.[a-zA-Z0-9]+)') AS table_names
    FROM tab
)
SELECT cte.Queries, dataset, table_
FROM cte, cte.db_names AS dataset, cte.table_names AS table_
User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement