Skip to content
Advertisement

Using where clause within a partition

I have huge pageview level table, where each session has multiple pageviews.

Session -> Pageviews – 1 : M

From the pageviews table, I need to consolidate the data into one row per session with meta data like FirstPageview, LastPageview, #ofvisitstoLogin

  1. Firstpageview is defined as the very first pageview for a session (this can be login)

  2. Lastpageview is defined as a pageview that is not /login (this cannot be login, if it is pick the pageview before this)

  3. #ofvisitstologin is defined as the #oftimesthesession landed on login

visitorid sessionid timestamp webpagepath
abc 123 2021-01-01 10:30:00 homepage
abc 123 2021-01-01 10:30:50 login
abc 123 2021-01-01 10:31:00 page2
abc 123 2021-01-01 10:32:50 page3
abc 123 2021-01-01 10:33:00 page2
abc 123 2021-01-01 10:35:50 login

The output that I want,

visitorid sessionid FristPageview LastPageview #ofvisitstoLogin
abc 123 homepage page2 2

Is it possible to do this with window functions? Without using subqueries / joins because this is a huge table and it affects performance.

Sample query that I have,

select 
     visitor_id, 
     session_id,
     first_value(webpage_path) over (partition by visitor_id, session_id order by timestamp asc) as first_pageview,
      last_value(webpage_path) over (partition by visitor_id, session_id order by timestamp asc) as last_page_view,
      sum(iff(webpage_path = 'login', 1, 0)) over (partition by visitor_id, session_id timestamp asc) as noofvisitstologin
 from pageviews;

But this code doesn’t work because for the last_page_view column will be “login” if i run this code. But i need it to be “page2” instead.

Is this achievable without using subqueries / self join?

Advertisement

Answer

LAST_VALUE supports IGNORE NULLS clause and it could be combined with NULLIF(webpage_path, 'login'):

select 
 visitor_id, 
 session_id,
 first_value(webpage_path) 
 over (partition by visitor_id, session_id order by timestamp asc) as first_pageview,
 last_value(NULLIF(webpage_path,'login')) ignore nulls 
 over (partition by visitor_id, session_id order by timestamp asc) as last_page_view,
 count_if(webpage_path = 'login') 
 over (partition by visitor_id, session_id timestamp asc) as noofvisitstologin
from pageviews;

Second: SUM(IIF(...)) could be replaced with COUNT_IF().

User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement