On BQ I’m trying to find sessions that have visited both pageA (URL contains keyword “main”) and pageB (URL contains keyword “side”), and the pages that session visited. Here is my logic, I first wanted to find out sessions that have visited pageAs (URL contains keyword “main”), then I wanted to do a join, so to find out those sessions who have visited pageAs, what else pages on the site have they visited. Below is my query:
x
select a. sessionID, b.pagepath
from
(SELECT
CONCAT(fullVisitorID, CAST(visitID AS string), date) AS sessionID,
hits.page.pagepath as pagepath
FROM
`xx.xxx.ga_sessions_*`,
UNNEST(hits) AS hits
WHERE
totals.visits = 1
and hits.page.pagepath like '%main%'
AND _TABLE_SUFFIX BETWEEN '20220214'
AND '20220225') a
left join
(SELECT
CONCAT(fullVisitorID, CAST(visitID AS string), date) AS sessionID,
hits.page.pagepath as pagepath
FROM
`xx.xxx.ga_sessions_*`,
UNNEST(hits) AS hits
WHERE
totals.visits = 1
AND _TABLE_SUFFIX BETWEEN '20220214'
AND '20220225') b
on a.sessionID=b.sessionID
order by 1 desc
I am attaching an example output here:
sessionID | Pagepath |
---|---|
123 | /main-size |
123 | /main-size |
456 | /main-size |
456 | /main-size |
456 | /side-hide |
456 | /side-build |
456 | /June-event |
In this case session 456 meets my condition because it has visited both page contains “main” and page contains “side”, however I wonder through this output how I can query to only get below output:
sessionID | Pagepath |
---|---|
456 | /main-size |
456 | /main-size |
456 | /side-hide |
456 | /side-build |
456 | /June-event |
Advertisement
Answer
Consider below query.
SELECT * EXCEPT(path)
FROM sample_table, UNNEST([REGEXP_EXTRACT(Pagepath, r'(main|side)')]) path
QUALIFY COUNT(DISTINCT path) OVER (PARTITION BY sessionID) = 2
-- Query results
+-----+-----------+-------------+
| Row | sessionID | Pagepath |
+-----+-----------+-------------+
| 1 | 456 | /main-size |
| 2 | 456 | /main-size |
| 3 | 456 | /side-hide |
| 4 | 456 | /June-event |
| 5 | 456 | /side-build |
+-----+-----------+-------------+