On BQ I’m trying to find sessions that have visited both pageA (URL contains keyword “main”) and pageB (URL contains keyword “side”), and the pages that session visited. Here is my logic, I first wanted to find out sessions that have visited pageAs (URL contains keyword “main”), then I wanted to do a join, so to find out those sessions who have visited pageAs, what else pages on the site have they visited. Below is my query:
select a. sessionID, b.pagepath from (SELECT CONCAT(fullVisitorID, CAST(visitID AS string), date) AS sessionID, hits.page.pagepath as pagepath FROM `xx.xxx.ga_sessions_*`, UNNEST(hits) AS hits WHERE totals.visits = 1 and hits.page.pagepath like '%main%' AND _TABLE_SUFFIX BETWEEN '20220214' AND '20220225') a left join (SELECT CONCAT(fullVisitorID, CAST(visitID AS string), date) AS sessionID, hits.page.pagepath as pagepath FROM `xx.xxx.ga_sessions_*`, UNNEST(hits) AS hits WHERE totals.visits = 1 AND _TABLE_SUFFIX BETWEEN '20220214' AND '20220225') b on a.sessionID=b.sessionID order by 1 desc
I am attaching an example output here:
sessionID | Pagepath |
---|---|
123 | /main-size |
123 | /main-size |
456 | /main-size |
456 | /main-size |
456 | /side-hide |
456 | /side-build |
456 | /June-event |
In this case session 456 meets my condition because it has visited both page contains “main” and page contains “side”, however I wonder through this output how I can query to only get below output:
sessionID | Pagepath |
---|---|
456 | /main-size |
456 | /main-size |
456 | /side-hide |
456 | /side-build |
456 | /June-event |
Advertisement
Answer
Consider below query.
SELECT * EXCEPT(path) FROM sample_table, UNNEST([REGEXP_EXTRACT(Pagepath, r'(main|side)')]) path QUALIFY COUNT(DISTINCT path) OVER (PARTITION BY sessionID) = 2 -- Query results +-----+-----------+-------------+ | Row | sessionID | Pagepath | +-----+-----------+-------------+ | 1 | 456 | /main-size | | 2 | 456 | /main-size | | 3 | 456 | /side-hide | | 4 | 456 | /June-event | | 5 | 456 | /side-build | +-----+-----------+-------------+