I’m trying to calculate average hours between two adjacent sessions using the data from the following table:
user_id | event_timestamp | session_num |
---|---|---|
A | 2021-04-16 10:00:00.000 UTC | 1 |
A | 2021-04-16 11:00:00.000 UTC | 2 |
A | 2021-04-16 13:00:00.000 UTC | 3 |
A | 2021-04-16 16:00:00.000 UTC | 4 |
B | 2021-04-16 12:00:00.000 UTC | 1 |
B | 2021-04-16 14:00:00.000 UTC | 2 |
B | 2021-04-16 19:00:00.000 UTC | 3 |
C | 2021-04-16 10:00:00.000 UTC | 1 |
C | 2021-04-16 17:00:00.000 UTC | 2 |
C | 2021-04-16 18:00:00.000 UTC | 3 |
So, for user A we have
1 hour between session_num = 2 and session_num = 1, 2 hours between session_num = 3 and session_num = 2, 3 hours between session_num = 4 and session_num = 3.
Same for the other users:
2, 5
hours for user B;
7, 1
hours for user C.
The result I expect to get should be the arithmetic average of this date_diff(HOUR).
So, avg(1,2,3,2,5,7,1)
= 3 hours is the average time between two adjacent sessions.
Any one have an idea what query can be used so the date_diff function would be applien only for anjacent sessions?
Advertisement
Answer
The average hours between sessions for a given user is most simply calculated as:
select user_id, timestamp_diff(max(event_timestamp), min(event_timestamp), hour) * 1.0 / nullif(count(*) - 1, 0) from t group by user_id;
That is, the average time between sessions for a user is the maximum timestamp minus the minimum timestamp divided by one less than the number of sessions.