I have this query that helps me to find separate key words within strings (very useful with utm_campaign and utm_content):
SELECT utm_campaign, splits[SAFE_OFFSET(0)] AS country, splits[SAFE_OFFSET(1)] AS product, splits[SAFE_OFFSET(2)] AS budget, splits[SAFE_OFFSET(3)] AS source, splits[SAFE_OFFSET(4)] AS campaign, splits[SAFE_OFFSET(5)] AS audience FROM ( SELECT utm_campaign, SPLIT(REGEXP_REPLACE( utm_campaign, r'([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_(.+)', r'1|2|3|4|5|6|7'), '|') AS splits FROM funnel_campaign)
For example: if I have a umt_campaign like this:
us_latam_mkt_google_black-friday_audiencie-custom_NNN-NNN_nnn_trafic_responsiv
The query from above will help me to separate each word with a _ in between. So I’ll have a result like this:
utm_campaign | country | product | budget | source | campaign | audience |
---|---|---|---|---|---|---|
us_latam_mkt_google_black-friday_audiencie-custom_NNN-NNN_nnn_trafic_responsiv | us | latam | mkt | black-friday | audience-custom |
- What I want from the query from above is to give me in this case the
audience
column. I tried to add the query from above as a sub-query on this query in REVENUE because in this table I don’t have theaudience
column but I have theutm_campaign
column. Inside the utm_campaign string, the sixth fragment is the audience (with this query I have the error “Scalar subquery produced more than one element”):
WITH COST AS ( SELECT POS AS POS, DATE AS DATE, EXTRACT(WEEK FROM DATE) AS WEEK, SOURCE AS SOURCE, MEDIUM AS MEDIUM, CAMPAIGN AS CAMPAIGN, AD_CONTENT, FORMAT AS FORMAT, "" AS BU_OD, SUM(CLICKS)/1000 AS CLICKS, SUM(IMPRESSIONS)/1000 AS IMPRESSIONS, SUM(COST)/1000 AS COST, sum(0) as SESSIONS, SUM(0) AS TRANSACTIONS, SUM(0) AS search_flight_pv, SUM(0) AS search_flight_upv, SUM(0) AS PAX, SUM(0) AS REVENUE, FROM MSR_funnel_campaign_table WHERE DATE >= DATE '2019-01-01' AND MEDIUM NOT LIKE 'DISPLAY_CORP' GROUP BY 1,2,3,4,5,6,7,8,9 ), REVENUE AS( SELECT POS AS POS, date AS DATE, EXTRACT(WEEK FROM DATE) AS WEEK, SOURCE_CAT AS SOURCE, medium_group_2 AS MEDIUM, CAMPAIGN AS CAMPAIGN, AD_CONTENT, CASE WHEN SOURCE_CAT = 'FACEBOOK' THEN ( SELECT splits[SAFE_OFFSET(5)] AS FORMAT, FROM ( SELECT ad_content, SPLIT(REGEXP_REPLACE( ad_content, r'([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_(.+)', r'1|2|3|4|5|6|7'), '|') AS splits FROM ga_digital_marketing)) END AS FORMAT, BU_OD AS BU_OD, SUM(0) AS CLICKS, SUM(0) AS IMPRESSIONS, SUM(0) AS COST, sum(sessions)/1000 as SESSIONS, SUM(TRANSACTIONS)/1000 AS TRANSACTIONS, SUM(search_flight_pv)/1000 AS search_flight_pv, SUM(search_flight_upv)/1000 AS search_flight_upv, SUM(PAX)/1000 AS PAX, SUM(REVENUE)/1000 AS REVENUE, FROM ga_digital_marketing WHERE PAX_TYPE = 'PAID' AND DATE >= DATE '2019-01-01' AND MEDIUM NOT LIKE 'DISPLAY_CORP' GROUP BY 1,2,3,4,5,6,7,8,9 ), COST_REVENUE AS ( SELECT * FROM COST UNION ALL SELECT * FROM REVENUE ) SELECT DATE, WEEK, POS, SOURCE, MEDIUM, CAMPAIGN, AD_CONTENT, FORMAT, BU, CLICKS, IMPRESSIONS, SESSIONS, TRANSACTIONS, search_flight_pv, search_flight_upv, COST, PAX, REVENUE, FROM COST_REVENUE WHERE 1=1 AND DATE >= '2019-01-01'
What am I doing wrong here?
What I would like too see is having a match between the format dimension from COST and the format dimension from REVENUE (which it doesn’t exists, but it is within the campaign column).
Advertisement
Answer
You don’t really need the interior select
statements as your campaign data should be in the same row of the table.
Change this:
CASE WHEN SOURCE_CAT = 'FACEBOOK' THEN ( SELECT splits[SAFE_OFFSET(5)] AS FORMAT, FROM ( SELECT ad_content, SPLIT(REGEXP_REPLACE( ad_content, r'([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_(.+)', r'1|2|3|4|5|6|7'), '|') AS splits FROM ga_digital_marketing)) END AS FORMAT,
to something like this:
-- also replacing case with if for only 1 case IF(SOURCE_CAT = 'FACEBOOK', SPLIT(REGEXP_REPLACE( ad_content, r'([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_(.+)', r'1|2|3|4|5|6|7'), '|')[SAFE_OFFSET(5)], NULL) AS FORMAT,