I have table in PostgreSQL with timestamp column
created_at and integer column
id | created_at | user_id 1 | 2019-10-14 09:26:53.813 | 1 2 | 2019-10-14 09:26:54.813 | 1 3 | 2019-10-14 09:46:53.813 | 1 4 | 2019-10-14 09:46:54.813 | 2 5 | 2019-10-14 09:46:55.813 | 1 6 | 2019-10-14 09:46:56.813 | 1 7 | 2019-10-14 09:46:57.813 | 2
Each row represents some action from user. I need to calculate average user session length. Session is defined as group of actions with time difference less than 10 minutes. When there is difference 10 minutes or more between two user actions, new session started.
I was thinking about:
- Group rows by
- Group rows (somehow) by difference between current and next item for single user. (to get sessions)
- Calculate diff between first and last row in one group (session). (to get session lengths)
- Calculate average session length. (to get average session length)
But I am not able to write it in SQL.
Could you give me some advice/example how can be this done in SQL?
SELECT user_id, AVG(diff) FROM ( SELECT DISTINCT user_id, group_id, first_value(created_at) OVER (PARTITION BY user_id, group_id ORDER BY created_at DESC) - first_value(created_at) OVER (PARTITION BY user_id, group_id ORDER BY created_at) as diff FROM ( SELECT id, created_at, user_id, SUM(group_id) OVER (PARTITION BY user_id ORDER BY created_at) AS group_id FROM ( SELECT *, (created_at - lag(created_at, 1, created_at) OVER (PARTITION BY user_id ORDER BY created_at) > interval '10 minutes')::int AS group_id FROM mytable )s )s )s GROUP BY user_id
(created_at - lag(created_at, 1, created_at) OVER (PARTITION BY user_id ORDER BY created_at) > interval '10 minutes')::int AS group_id: the
lag()window function takes the
created_atvalue from the previous record (second parameter: step length, third parameter: default value if no previous record == current value) within the ordered
user_idpartition (group). Then the difference between current
created_atvalue an the previous one is calculated. If this value is > 10 minutes, the result is
falseotherwise. This boolean can be casted into an
intvalue, which results in
1values, which results in
group_ids for every new session per
- The first created_at timestamp per
group_idcan be get by
first_value()window function, the last one by
DESCorder. The difference gives you the length of each session. The
DISTINCTclause is used, because the difference value is put onto every record. But we need this only once.
- Finally, you can group and
AVG()this difference for the users.