I get the following Table with the query underneath:
SELECT
fullVisitorId,
COUNT(fullVisitorId) as id_count,
ARRAY_AGG(trafficSource.medium) AS trafic_medium
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_20170101`
GROUP BY
fullVisitorId
ORDER BY
id_count DESC
For each of the values in the trafic_medium
column (e.g.: cpc, referral, organic, etc.) I am trying to figure out how often each value occurred in the array, so preferably add a new column ‘count’ that shows ho often that value occurred?
+-----------+---------+------+
| array_agg | medium | count|
+-----------+---------+------+
| 123 | cpc | 2 |
+-----------+---------+------+
| | organic | 1 |
+-----------+---------+------+
| | cpc | 2 |
+-----------+---------+------+
| 456 | organic | 2 |
+-----------+---------+------+
| | organic | 2 |
+-----------+---------+------+
| | cpc | 1 |
+-----------+---------+------+
I’m new to SQL so I’m quite stuck.
I tried this so far:
WITH medium AS
(
SELECT
fullVisitorId,
COUNT(fullVisitorId) as id_count,
ARRAY_AGG(trafficSource.medium) AS trafic_medium
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_20170101`
GROUP BY
fullVisitorId
ORDER BY
id_count DESC
)
SELECT
fullVisitorId,
trafic_medium,
(SELECT AS STRUCT Any_Value(trafic_medium) AS name, COUNT(*) AS count
FROM
UNNEST(trafic_medium) AS trafic_medium) AS trafic_medium_2,
FROM
medium
Based on this thread: How to count frequency of elements in a bigquery array field
However this only shows the number of ‘Any_Value not for all distinct.
I would appreciate some help!
p.s. I am doing this in BigQuery on the ‘bigquery-public-dataset.google_analytics_sample’
Advertisement
Answer
Below is for BigQuery Standard SQL to help you get started
#standardSQL
SELECT id, trafic_medium,
ARRAY(
SELECT AS STRUCT medium, COUNT(1) `count`
FROM t.trafic_medium medium
GROUP BY medium
) stats
FROM `project.dataset.table` t
if to apply to sample/dummy data from you question as in below example
#standardSQL
WITH `project.dataset.table` AS (
SELECT 123 id, ['cpc', 'organic', 'cpc'] trafic_medium UNION ALL
SELECT 456, ['organic', 'organic', 'cpc']
)
SELECT id, trafic_medium,
ARRAY(
SELECT AS STRUCT medium, COUNT(1) `count`
FROM t.trafic_medium medium
GROUP BY medium
) stats
FROM `project.dataset.table` t
-- ORDER BY id
result will be
As an option – you can use below version
#standardSQL
SELECT id,
ARRAY(
SELECT AS STRUCT medium, `count`
FROM t.trafic_medium medium
LEFT JOIN (
SELECT AS STRUCT medium, COUNT(1) `count`
FROM t.trafic_medium medium
GROUP BY medium
) stats
USING(medium)
) trafic_medium
FROM `project.dataset.table` t
-- ORDER BY id
which (if to apply to the same dummy data) will output below
This version looks more aligned with your expected result