I am trying to capture all elements and store in separate column from the below string,seprated via underscores(campaign name for an advertisement) and then I wish to compare it with a master table having the true values to determine how accurate the data is being recorded.
My first element extraction was : REGEXP_EXTRACT(campaign_name, r"[^_+]{3}")) as parsed_campaign_agency
I only extracted first 3 letters because according to the naming convention(truth table), the agency name is made of only 3 letters.
Caveat: Some elements can have variable lengths too. eg. The third element “CrossBMC” could be 3 letters in length or more.
I am new to regex and the data lies in a SQL table(in BigQuery) so I thought it could be achieved via SQL’s regex_extract but what I am having trouble is to extract all elements at once.
Any help is appreciated 🙂
Advertisement
Answer
If number of underscores constant and knows you can use SUBSTRING_INDEX like:
SELECT SUBSTRING_INDEX(campaign_name,'_',1) first, SUBSTRING_INDEX(SUBSTRING_INDEX(campaign_name,'_',2),'_',-1) second, SUBSTRING_INDEX(SUBSTRING_INDEX(campaign_name,'_',3),'_',-1) third FROM your_table;
Here you can try an example SQLize.online