# Group rows by datediff and then use diff function

I have table in PostgreSQL with timestamp column `created_at` and integer column `user_id`.

```id | created_at | user_id
1 | 2019-10-14 09:26:53.813 | 1
2 | 2019-10-14 09:26:54.813 | 1
3 | 2019-10-14 09:46:53.813 | 1
4 | 2019-10-14 09:46:54.813 | 2
5 | 2019-10-14 09:46:55.813 | 1
6 | 2019-10-14 09:46:56.813 | 1
7 | 2019-10-14 09:46:57.813 | 2
```

Each row represents some action from user. I need to calculate average user session length. Session is defined as group of actions with time difference less than 10 minutes. When there is difference 10 minutes or more between two user actions, new session started.

1. Group rows by `user_id`.
2. Group rows (somehow) by difference between current and next item for single user. (to get sessions)
3. Calculate diff between first and last row in one group (session). (to get session lengths)
4. Calculate average session length. (to get average session length)

But I am not able to write it in SQL.

Could you give me some advice/example how can be this done in SQL?

step-by-step demo:db<>fiddle

```SELECT
user_id,
AVG(diff)
FROM (
SELECT DISTINCT
user_id,
group_id,
first_value(created_at) OVER (PARTITION BY user_id, group_id ORDER BY created_at DESC)
- first_value(created_at) OVER (PARTITION BY user_id, group_id ORDER BY created_at) as diff
FROM (
SELECT
id, created_at, user_id,
SUM(group_id) OVER (PARTITION BY user_id ORDER BY created_at) AS group_id
FROM (
SELECT
*,
(created_at
- lag(created_at, 1, created_at) OVER (PARTITION BY user_id ORDER BY created_at)
> interval '10 minutes')::int AS group_id
FROM
mytable
)s
)s
)s
GROUP BY user_id
```
1. `(created_at - lag(created_at, 1, created_at) OVER (PARTITION BY user_id ORDER BY created_at) > interval '10 minutes')::int AS group_id`: the `lag()` window function takes the `created_at` value from the previous record (second parameter: step length, third parameter: default value if no previous record == current value) within the ordered `user_id` partition (group). Then the difference between current `created_at` value an the previous one is calculated. If this value is > 10 minutes, the result is `true`; `false` otherwise. This boolean can be casted into an `int` value, which results in `0` or `1`
2. Cumulative `SUM()` over the `0`/`1` values, which results in `group_id`s for every new session per `user_id`
3. The first created_at timestamp per `user_id` and session `group_id` can be get by `first_value()` window function, the last one by `first_value()` and `DESC` order. The difference gives you the length of each session. The `DISTINCT` clause is used, because the difference value is put onto every record. But we need this only once.
4. Finally, you can group and `AVG()` this difference for the users.
User contributions licensed under: CC BY-SA
8 People found this is helpful