Skip to content
Advertisement

Using where clause within a partition

I have huge pageview level table, where each session has multiple pageviews.

Session -> Pageviews – 1 : M

From the pageviews table, I need to consolidate the data into one row per session with meta data like FirstPageview, LastPageview, #ofvisitstoLogin

  1. Firstpageview is defined as the very first pageview for a session (this can be login)

  2. Lastpageview is defined as a pageview that is not /login (this cannot be login, if it is pick the pageview before this)

  3. #ofvisitstologin is defined as the #oftimesthesession landed on login

visitorid sessionid timestamp webpagepath
abc 123 2021-01-01 10:30:00 homepage
abc 123 2021-01-01 10:30:50 login
abc 123 2021-01-01 10:31:00 page2
abc 123 2021-01-01 10:32:50 page3
abc 123 2021-01-01 10:33:00 page2
abc 123 2021-01-01 10:35:50 login

The output that I want,

visitorid sessionid FristPageview LastPageview #ofvisitstoLogin
abc 123 homepage page2 2

Is it possible to do this with window functions? Without using subqueries / joins because this is a huge table and it affects performance.

Sample query that I have,

But this code doesn’t work because for the last_page_view column will be “login” if i run this code. But i need it to be “page2” instead.

Is this achievable without using subqueries / self join?

Advertisement

Answer

LAST_VALUE supports IGNORE NULLS clause and it could be combined with NULLIF(webpage_path, 'login'):

Second: SUM(IIF(...)) could be replaced with COUNT_IF().

User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement