I have table in PostgreSQL with timestamp column created_at and integer column user_id.
id | created_at | user_id 1 | 2019-10-14 09:26:53.813 | 1 2 | 2019-10-14 09:26:54.813 | 1 3 | 2019-10-14 09:46:53.813 | 1 4 | 2019-10-14 09:46:54.813 | 2 5 | 2019-10-14 09:46:55.813 | 1 6 | 2019-10-14 09:46:56.813 | 1 7 | 2019-10-14 09:46:57.813 | 2
Each row represents some action from user. I need to calculate average user session length. Session is defined as group of actions with time difference less than 10 minutes. When there is difference 10 minutes or more between two user actions, new session started.
I was thinking about:
- Group rows by
user_id. - Group rows (somehow) by difference between current and next item for single user. (to get sessions)
- Calculate diff between first and last row in one group (session). (to get session lengths)
- Calculate average session length. (to get average session length)
But I am not able to write it in SQL.
Could you give me some advice/example how can be this done in SQL?
Advertisement
Answer
SELECT
user_id,
AVG(diff)
FROM (
SELECT DISTINCT
user_id,
group_id,
first_value(created_at) OVER (PARTITION BY user_id, group_id ORDER BY created_at DESC)
- first_value(created_at) OVER (PARTITION BY user_id, group_id ORDER BY created_at) as diff
FROM (
SELECT
id, created_at, user_id,
SUM(group_id) OVER (PARTITION BY user_id ORDER BY created_at) AS group_id
FROM (
SELECT
*,
(created_at
- lag(created_at, 1, created_at) OVER (PARTITION BY user_id ORDER BY created_at)
> interval '10 minutes')::int AS group_id
FROM
mytable
)s
)s
)s
GROUP BY user_id
(created_at - lag(created_at, 1, created_at) OVER (PARTITION BY user_id ORDER BY created_at) > interval '10 minutes')::int AS group_id: thelag()window function takes thecreated_atvalue from the previous record (second parameter: step length, third parameter: default value if no previous record == current value) within the ordereduser_idpartition (group). Then the difference between currentcreated_atvalue an the previous one is calculated. If this value is > 10 minutes, the result istrue;falseotherwise. This boolean can be casted into anintvalue, which results in0or1- Cumulative
SUM()over the0/1values, which results ingroup_ids for every new session peruser_id - The first created_at timestamp per
user_idand sessiongroup_idcan be get byfirst_value()window function, the last one byfirst_value()andDESCorder. The difference gives you the length of each session. TheDISTINCTclause is used, because the difference value is put onto every record. But we need this only once. - Finally, you can group and
AVG()this difference for the users.