I have table in PostgreSQL with timestamp column created_at
and integer column user_id
.
id | created_at | user_id 1 | 2019-10-14 09:26:53.813 | 1 2 | 2019-10-14 09:26:54.813 | 1 3 | 2019-10-14 09:46:53.813 | 1 4 | 2019-10-14 09:46:54.813 | 2 5 | 2019-10-14 09:46:55.813 | 1 6 | 2019-10-14 09:46:56.813 | 1 7 | 2019-10-14 09:46:57.813 | 2
Each row represents some action from user. I need to calculate average user session length. Session is defined as group of actions with time difference less than 10 minutes. When there is difference 10 minutes or more between two user actions, new session started.
I was thinking about:
- Group rows by
user_id
. - Group rows (somehow) by difference between current and next item for single user. (to get sessions)
- Calculate diff between first and last row in one group (session). (to get session lengths)
- Calculate average session length. (to get average session length)
But I am not able to write it in SQL.
Could you give me some advice/example how can be this done in SQL?
Advertisement
Answer
SELECT user_id, AVG(diff) FROM ( SELECT DISTINCT user_id, group_id, first_value(created_at) OVER (PARTITION BY user_id, group_id ORDER BY created_at DESC) - first_value(created_at) OVER (PARTITION BY user_id, group_id ORDER BY created_at) as diff FROM ( SELECT id, created_at, user_id, SUM(group_id) OVER (PARTITION BY user_id ORDER BY created_at) AS group_id FROM ( SELECT *, (created_at - lag(created_at, 1, created_at) OVER (PARTITION BY user_id ORDER BY created_at) > interval '10 minutes')::int AS group_id FROM mytable )s )s )s GROUP BY user_id
(created_at - lag(created_at, 1, created_at) OVER (PARTITION BY user_id ORDER BY created_at) > interval '10 minutes')::int AS group_id
: thelag()
window function takes thecreated_at
value from the previous record (second parameter: step length, third parameter: default value if no previous record == current value) within the ordereduser_id
partition (group). Then the difference between currentcreated_at
value an the previous one is calculated. If this value is > 10 minutes, the result istrue
;false
otherwise. This boolean can be casted into anint
value, which results in0
or1
- Cumulative
SUM()
over the0
/1
values, which results ingroup_id
s for every new session peruser_id
- The first created_at timestamp per
user_id
and sessiongroup_id
can be get byfirst_value()
window function, the last one byfirst_value()
andDESC
order. The difference gives you the length of each session. TheDISTINCT
clause is used, because the difference value is put onto every record. But we need this only once. - Finally, you can group and
AVG()
this difference for the users.