How to extract users’ journey data between two specific pages in Google Analytics BigQuery Export data?
Example:
Website has 100 pages: hits.page.pagePath=/page_1
to hits.page.pagePath=/page_100
.
Goal is to extract user journey data from /page_13
to /page_22
, including all in-between pages.
The challenge is that the journey is not sequential, like /page_13
-> /page14
-> ...
-> /page_22
.
But could be /page13
-> /page_5
-> /page_41
-> /page_99
-> /page_22
.
Advertisement
Answer
You can use array_agg()
. If I understand correctly, you want a group to being when it first hits page_13 and end when it hits page_22.
Let me assume that for each user you want the first hit of 13 to the first hit of 22. You can identify the group by these two characteristics:
select h.* from (select h.*, countif( page like '%/page_13' ) over (partition by user order by hit_time) as hit_13, countif( page like '%/page_22' ) over (partition by user order by hit_time) as hit_22, countif( page like '%/page_22' ) over (partition by user) as has_22 from hits h ) h where has_22 and hit_13 > 0 and (hit_22 = 0 or page like '%/page_22);
This returns the pages that start with 13, end with 22 and ensure that users have both.
Now for the journey, just use aggregate. But, alas, BQ does not allow aggregation on arrays — if you subsequently want to summarize by the journeys. So, I’ll use string_agg()
instead:
select h.user, string_agg(page order by hit_time, ' -> ') from (select h.* from (select h.*, countif( page like '%/page_13' ) over (partition by user order by hit_time) as hit_13, countif( page like '%/page_22' ) over (partition by user order by hit_time) as hit_22, countif( page like '%/page_22' ) over (partition by user) as has_22 from hits h ) h where has_22 and hit_13 > 0 and (hit_22 = 0 or page like '%/page_22) ) h group by user;