Skip to content
Advertisement

Extract all elements from a string separated by underscores using regex

I am trying to capture all elements and store in separate column from the below string,seprated via underscores(campaign name for an advertisement) and then I wish to compare it with a master table having the true values to determine how accurate the data is being recorded.

eg: Input : enter image description here

Expected output is : enter image description here

My first element extraction was : REGEXP_EXTRACT(campaign_name, r"[^_+]{3}")) as parsed_campaign_agency

I only extracted first 3 letters because according to the naming convention(truth table), the agency name is made of only 3 letters.

Caveat: Some elements can have variable lengths too. eg. The third element “CrossBMC” could be 3 letters in length or more.

I am new to regex and the data lies in a SQL table(in BigQuery) so I thought it could be achieved via SQL’s regex_extract but what I am having trouble is to extract all elements at once.

Any help is appreciated 🙂

Advertisement

Answer

If number of underscores constant and knows you can use SUBSTRING_INDEX like:

SELECT 
    SUBSTRING_INDEX(campaign_name,'_',1) first,
    SUBSTRING_INDEX(SUBSTRING_INDEX(campaign_name,'_',2),'_',-1) second,
    SUBSTRING_INDEX(SUBSTRING_INDEX(campaign_name,'_',3),'_',-1) third
FROM your_table; 

Here you can try an example SQLize.online

User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement