Skip to content
Advertisement

Calculate number of times a user visited a page in SQL

I have table as shown below. It has 3 columns user, page and timestamp.

+------+------------------+-----------+
| user |       page       | timestamp |
+------+------------------+-----------+
|    1 | homepage?c=1234  |      1234 |
|    1 | homepage?c=1234  |      1245 |
|    1 | homepage?c=1234  |      1260 |
|    1 | homepage?c=1234  |      1280 |
|    1 | Signup?=1233     |      1293 |
|    1 | Signup?=121asd   |      1303 |
|    1 | Signup?=212      |      1317 |
|    1 | Signup?123213    |      1337 |
|    1 | homepage?c=1234  |      1357 |
|    1 | Hotels           |      1370 |
|    1 | Hotels           |      1384 |
|    1 | Hotels           |      1398 |
|    1 | Signup?=121asd   |      1413 |
|    1 | Signup?=121as123 |      1433 |
|    1 | homepage?c=1234  |      1447 |
|    1 | homepage?c=1234  |      1463 |
|    1 | homepage?c=1234  |      1482 |
|    1 | homepage?c=1234  |      1496 |
+------+------------------+-----------+ 

In the above table I want to calculate number of times a user visits a particular page. Each record in the above table is a hit, so just grouping by column page wont help.

I essentially want to count continuous records for a particular page only once. I only want to increment the counter for a particular page when user visits some other page in between.

Something as shown below:

enter image description here

The user visits homepage 3 times (yellow color), Signup page 2 times (blue color) and Hotels page 1 time (orange in color).

The output expected is as below:

+-------+----------+--------+--------+
|       | Homepage | Signup | Hotels |
+-------+----------+--------+--------+
| users |        3 |      2 |      1 |
+-------+----------+--------+--------+

Advertisement

Answer

So what you want to do is first get the page name out of the url, so homepage?c=1234 becomes homepage. Then for each visit you can use LAG to see if the page changed between and use a window SUM function to sum up all the changes up until this point. Each of the page visit groups will then get a unique grouping number, then it’s just a matter of counting the distinct grouping numbers for each of them.

I’ve split this into individual CTEs for clarity and understanding:

WITH pagesplit AS
(
    SELECT user
         , (split(page, '?')[safe_ordinal(1)]) AS page
         , ts 
      FROM so.visits
),
page_with_prev AS
(
    SELECT user
         , page
         , COALESCE(LAG(_page) OVER (PARTITION BY user ORDER BY _ts), '?') AS prev_page
         , ts
      FROM pagesplit
),
count_consecutive AS
(
    SELECT user
         , page
         , SUM(CASE WHEN page != prev_page THEN 1 ELSE NULL END) OVER (PARTITION BY user ORDER BY ts) AS grouping_no
      FROM page_with_prev
)
SELECT user
     , page
     , COUNT(DISTINCT grouping_no) AS visit_count
  FROM count_streaks
 GROUP BY user, page

Pivoting the result into columns may not make much sense unless you know exactly how many you will have and that no new ones will ever be added. You can do that if you want of course.

User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement