Skip to content
Advertisement

How to use date_diff for two adjacent sessions using BigQuery?

I’m trying to calculate average hours between two adjacent sessions using the data from the following table:

user_id event_timestamp session_num
A 2021-04-16 10:00:00.000 UTC 1
A 2021-04-16 11:00:00.000 UTC 2
A 2021-04-16 13:00:00.000 UTC 3
A 2021-04-16 16:00:00.000 UTC 4
B 2021-04-16 12:00:00.000 UTC 1
B 2021-04-16 14:00:00.000 UTC 2
B 2021-04-16 19:00:00.000 UTC 3
C 2021-04-16 10:00:00.000 UTC 1
C 2021-04-16 17:00:00.000 UTC 2
C 2021-04-16 18:00:00.000 UTC 3

So, for user A we have

1 hour  between session_num = 2 and session_num = 1,
2 hours between session_num = 3 and session_num = 2,
3 hours between session_num = 4 and session_num = 3.

Same for the other users:

2, 5 hours for user B;

7, 1 hours for user C.

The result I expect to get should be the arithmetic average of this date_diff(HOUR).

So, avg(1,2,3,2,5,7,1) = 3 hours is the average time between two adjacent sessions.

Any one have an idea what query can be used so the date_diff function would be applien only for anjacent sessions?

Advertisement

Answer

The average hours between sessions for a given user is most simply calculated as:

select user_id,
       timestamp_diff(max(event_timestamp), min(event_timestamp), hour) * 1.0 / nullif(count(*) - 1, 0)
from t
group by user_id;

That is, the average time between sessions for a user is the maximum timestamp minus the minimum timestamp divided by one less than the number of sessions.

User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement