Skip to content
Advertisement

BigQuery – Scalar subquery produced more than one element –

I have this query that helps me to find separate key words within strings (very useful with utm_campaign and utm_content):

    SELECT
    utm_campaign,
    splits[SAFE_OFFSET(0)] AS country,
    splits[SAFE_OFFSET(1)] AS product,
    splits[SAFE_OFFSET(2)] AS budget,
    splits[SAFE_OFFSET(3)] AS source,
    splits[SAFE_OFFSET(4)] AS campaign,
    splits[SAFE_OFFSET(5)] AS audience
    
    FROM (
      SELECT
        utm_campaign,
        SPLIT(REGEXP_REPLACE(
                utm_campaign,
                r'([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_(.+)',
                r'1|2|3|4|5|6|7'),
              '|') AS splits
      FROM funnel_campaign)

For example: if I have a umt_campaign like this:

us_latam_mkt_google_black-friday_audiencie-custom_NNN-NNN_nnn_trafic_responsiv

The query from above will help me to separate each word with a _ in between. So I’ll have a result like this:

utm_campaign country product budget source campaign audience
us_latam_mkt_google_black-friday_audiencie-custom_NNN-NNN_nnn_trafic_responsiv us latam mkt google black-friday audience-custom
  • What I want from the query from above is to give me in this case the audience column. I tried to add the query from above as a sub-query on this query in REVENUE because in this table I don’t have the audience column but I have the utm_campaign column. Inside the utm_campaign string, the sixth fragment is the audience (with this query I have the error “Scalar subquery produced more than one element”):
    WITH COST AS  (
    
    SELECT 
    
    POS AS POS,
    DATE AS DATE,
    EXTRACT(WEEK FROM DATE) AS WEEK,
    SOURCE AS SOURCE,
    MEDIUM AS MEDIUM,
    CAMPAIGN AS CAMPAIGN,
    AD_CONTENT,
    FORMAT AS FORMAT,
    "" AS BU_OD,
    SUM(CLICKS)/1000 AS CLICKS,
    SUM(IMPRESSIONS)/1000 AS IMPRESSIONS,
    SUM(COST)/1000 AS COST,
    sum(0) as SESSIONS,
    SUM(0) AS TRANSACTIONS,
    SUM(0) AS search_flight_pv,
    SUM(0) AS search_flight_upv,
    SUM(0) AS PAX,
    SUM(0) AS REVENUE,
    
    FROM MSR_funnel_campaign_table 
    WHERE DATE >= DATE '2019-01-01' 
    AND MEDIUM NOT LIKE 'DISPLAY_CORP' 
    GROUP BY 1,2,3,4,5,6,7,8,9
    ),
    
    REVENUE AS(
    SELECT
    POS AS POS,
    date AS DATE,
    EXTRACT(WEEK FROM DATE) AS WEEK,
    SOURCE_CAT AS SOURCE,
    medium_group_2 AS MEDIUM,
    CAMPAIGN AS CAMPAIGN,
    AD_CONTENT,
    CASE
      WHEN SOURCE_CAT = 'FACEBOOK' THEN 
    (
    
    SELECT
    splits[SAFE_OFFSET(5)] AS FORMAT,
   
    FROM  (
      SELECT
        ad_content,
        SPLIT(REGEXP_REPLACE(
                ad_content,
                r'([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_(.+)',
                r'1|2|3|4|5|6|7'),
              '|') AS splits 
    FROM ga_digital_marketing)) END AS FORMAT,
   
    
    BU_OD AS BU_OD,
    SUM(0) AS CLICKS,
    SUM(0) AS IMPRESSIONS,
    SUM(0) AS COST,
    sum(sessions)/1000 as SESSIONS,
    SUM(TRANSACTIONS)/1000 AS TRANSACTIONS,
    SUM(search_flight_pv)/1000 AS search_flight_pv,
    SUM(search_flight_upv)/1000 AS search_flight_upv,
    SUM(PAX)/1000 AS PAX,
    SUM(REVENUE)/1000 AS REVENUE,  
    
    FROM ga_digital_marketing 
    WHERE PAX_TYPE = 'PAID'
    AND DATE >= DATE '2019-01-01'
    AND MEDIUM NOT LIKE 'DISPLAY_CORP'
    GROUP BY 1,2,3,4,5,6,7,8,9
    ),
    
    COST_REVENUE AS (
    SELECT *
    FROM COST
    
    UNION ALL
    
    SELECT *
    FROM REVENUE
    
    )
    
    SELECT
    DATE,
    WEEK,
    POS,
    SOURCE,
    MEDIUM,
    CAMPAIGN,
    AD_CONTENT,
    FORMAT,
    BU,
    CLICKS,
    IMPRESSIONS,
    SESSIONS,
    TRANSACTIONS,
    search_flight_pv,
    search_flight_upv,
    COST,
    PAX,
    REVENUE,
    
    FROM COST_REVENUE
    
    WHERE
     1=1 
    AND DATE >= '2019-01-01'

What am I doing wrong here?

What I would like too see is having a match between the format dimension from COST and the format dimension from REVENUE (which it doesn’t exists, but it is within the campaign column).

Advertisement

Answer

You don’t really need the interior select statements as your campaign data should be in the same row of the table.

Change this:

    CASE
      WHEN SOURCE_CAT = 'FACEBOOK' THEN 
    (
    
    SELECT
    splits[SAFE_OFFSET(5)] AS FORMAT,
   
    FROM  (
      SELECT
        ad_content,
        SPLIT(REGEXP_REPLACE(
                ad_content,
                r'([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_(.+)',
                r'1|2|3|4|5|6|7'),
              '|') AS splits 
    FROM ga_digital_marketing)) END AS FORMAT,

to something like this:

    -- also replacing case with if for only 1 case
    IF(SOURCE_CAT = 'FACEBOOK', 
            SPLIT(REGEXP_REPLACE(
                ad_content,
                r'([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_(.+)',
                r'1|2|3|4|5|6|7'),
              '|')[SAFE_OFFSET(5)], NULL) AS FORMAT,
   
User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement