Skip to content
Advertisement

Summing totals from nested and unnested data in a single query

I am working on nested Google Analytics data to build a query, I need to unnest 3 levels in order to get all the fields I need, but once I have unnested the SUM() of my .totals fields are far too high, I assume because their values are being repeated.

I have no way if identifying, for example, bounces, at hit level, so I need to use totals.bounces to get this value.

How could I adjust my query below to get the correct sum for bounces and revenue, alongside my totals for the unnested values?

SELECT
  customDimension.value AS UserID,
  # Visits from this individual
  SUM(totals.visits) AS visits,
  # Orders from this individual
  COUNT(DISTINCT hits.transaction.transactionId) AS orders,
  # AOV
  SAFE_DIVIDE(SUM(hits.transaction.transactionRevenue)/1000000 , COUNT(DISTINCT hits.transaction.transactionId)) AS AOV,
  #Bounces from this individual
   IFNULL(SUM(totals.bounces),
    0) AS bounces,
  IFNULL(SUM(hits.transaction.transactionRevenue)/1000000,
    0) AS revenue,
  # Conversion rate of the individual
  SAFE_DIVIDE(COUNT(DISTINCT hits.transaction.transactionId),COUNT(DISTINCT CONCAT(CAST(fullVisitorId AS STRING),CAST(visitId AS STRING)))) AS conversion_rate,

  ROUND(IFNULL(SUM(totals.bounces)/COUNT(DISTINCT CONCAT(CAST(fullVisitorId AS STRING),CAST(visitId AS STRING))),
      0),5) AS bounce_rate,

FROM
  `MY.DATA.ga_sessions_20*` AS t
CROSS JOIN
  UNNEST (hits) AS hits
CROSS JOIN
  UNNEST(t.customdimensions) AS customDimension
CROSS JOIN
  UNNEST(hits.product) AS hits_product

  WHERE parse_date('%y%m%d', _table_suffix) between 
DATE_sub(current_date(), interval 7 day) and
DATE_sub(current_date(), interval 1 day)

  AND customDimension.index = 2
  AND customDimension.value NOT LIKE "true"
  AND customDimension.value NOT LIKE "false"
  AND customDimension.value NOT LIKE "undefined"
  AND customDimension.value IS NOT NULL
GROUP BY
  UserID,
  hits.eventInfo.eventCategory

Advertisement

Answer

Now that I have a little more experience with BigQuery, I can answer this question based on how I achieve this today.

Using WITH() I create multiple queries and includes onto the fields I need at those levels, for example, my first WITH() statement will have no unnesting and will correctly sum the totals. fields. A second WITH() can then UNNEST() the hits level and sum the fields I want to count within here.

These queries can then be united with a common join and will show the correct values, without duplication, at each level of the unnesting.

User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement