Skip to content
Advertisement

Extract user journey data between two pages in BigQuery Google Analytics data

How to extract users’ journey data between two specific pages in Google Analytics BigQuery Export data?

Example:

Website has 100 pages: hits.page.pagePath=/page_1 to hits.page.pagePath=/page_100.

Goal is to extract user journey data from /page_13 to /page_22, including all in-between pages.

The challenge is that the journey is not sequential, like /page_13 -> /page14 -> ... -> /page_22.

But could be /page13 -> /page_5 -> /page_41 -> /page_99 -> /page_22.

Advertisement

Answer

You can use array_agg(). If I understand correctly, you want a group to being when it first hits page_13 and end when it hits page_22.

Let me assume that for each user you want the first hit of 13 to the first hit of 22. You can identify the group by these two characteristics:

select h.*
from (select h.*,
             countif( page like '%/page_13' ) over (partition by user order by hit_time) as hit_13,
             countif( page like '%/page_22' ) over (partition by user order by hit_time) as hit_22,
             countif( page like '%/page_22' ) over (partition by user) as has_22
      from hits h
     ) h
where has_22 and
      hit_13 > 0 and
      (hit_22 = 0 or page like '%/page_22);

This returns the pages that start with 13, end with 22 and ensure that users have both.

Now for the journey, just use aggregate. But, alas, BQ does not allow aggregation on arrays — if you subsequently want to summarize by the journeys. So, I’ll use string_agg() instead:

select h.user,
       string_agg(page order by hit_time, ' -> ')
from (select h.*
      from (select h.*,
                   countif( page like '%/page_13' ) over (partition by user order by hit_time) as hit_13,
                   countif( page like '%/page_22' ) over (partition by user order by hit_time) as hit_22,
                   countif( page like '%/page_22' ) over (partition by user) as has_22
            from hits h
           ) h
      where has_22 and
            hit_13 > 0 and
            (hit_22 = 0 or page like '%/page_22)
     ) h
group by user;
User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement