I have table as shown below. It has 3 columns user
, page
and timestamp
.
+------+------------------+-----------+ | user | page | timestamp | +------+------------------+-----------+ | 1 | homepage?c=1234 | 1234 | | 1 | homepage?c=1234 | 1245 | | 1 | homepage?c=1234 | 1260 | | 1 | homepage?c=1234 | 1280 | | 1 | Signup?=1233 | 1293 | | 1 | Signup?=121asd | 1303 | | 1 | Signup?=212 | 1317 | | 1 | Signup?123213 | 1337 | | 1 | homepage?c=1234 | 1357 | | 1 | Hotels | 1370 | | 1 | Hotels | 1384 | | 1 | Hotels | 1398 | | 1 | Signup?=121asd | 1413 | | 1 | Signup?=121as123 | 1433 | | 1 | homepage?c=1234 | 1447 | | 1 | homepage?c=1234 | 1463 | | 1 | homepage?c=1234 | 1482 | | 1 | homepage?c=1234 | 1496 | +------+------------------+-----------+
In the above table I want to calculate number of times a user visits a particular page. Each record in the above table is a hit, so just grouping by column page
wont help.
I essentially want to count continuous records for a particular page only once. I only want to increment the counter for a particular page when user visits some other page in between.
Something as shown below:
The user visits homepage 3 times (yellow color), Signup page 2 times (blue color) and Hotels page 1 time (orange in color).
The output expected is as below:
+-------+----------+--------+--------+ | | Homepage | Signup | Hotels | +-------+----------+--------+--------+ | users | 3 | 2 | 1 | +-------+----------+--------+--------+
Advertisement
Answer
So what you want to do is first get the page name out of the url, so homepage?c=1234 becomes homepage. Then for each visit you can use LAG
to see if the page changed between and use a window SUM
function to sum up all the changes up until this point. Each of the page visit groups will then get a unique grouping number, then it’s just a matter of counting the distinct grouping numbers for each of them.
I’ve split this into individual CTEs for clarity and understanding:
WITH pagesplit AS ( SELECT user , (split(page, '?')[safe_ordinal(1)]) AS page , ts FROM so.visits ), page_with_prev AS ( SELECT user , page , COALESCE(LAG(_page) OVER (PARTITION BY user ORDER BY _ts), '?') AS prev_page , ts FROM pagesplit ), count_consecutive AS ( SELECT user , page , SUM(CASE WHEN page != prev_page THEN 1 ELSE NULL END) OVER (PARTITION BY user ORDER BY ts) AS grouping_no FROM page_with_prev ) SELECT user , page , COUNT(DISTINCT grouping_no) AS visit_count FROM count_streaks GROUP BY user, page
Pivoting the result into columns may not make much sense unless you know exactly how many you will have and that no new ones will ever be added. You can do that if you want of course.