I have these tables:
CREATE TABLE labels(user_id INT, session_id INT, start_time TIMESTAMP,mode TEXT); INSERT INTO labels (user_id,session_id,start_time,mode) VALUES (48,652,'2016-04-01 00:47:00+01','foot'), (9,656,'2016-04-01 00:03:39+01','car'),(9,657,'2016-04-01 00:26:51+01','car'), (9,658,'2016-04-01 00:45:19+01','car'),(46,663,'2016-04-01 00:13:12+01','car'); CREATE TABLE raw_data(user_id INT, session_id INT,timestamp TIMESTAMP) ; INSERT INTO raw_data(user_id, session_id,timestamp) VALUES (8,652,'2016-04-01 00:46:11.638+01'),(8,652,'2016-04-01 00:47:00.566+01'), (8,652,'2016-04-01 00:48:06.383+01'),(9,656,'2016-04-01 00:14:17.707+01'), (9,656,'2016-04-01 00:15:18.664+01'),(9,656,'2016-04-01 00:16:19.687+01'), (9,656,'2016-04-01 00:24:20.691+01'),(9,656,'2016-04-01 00:25:23.681+01'), (9,657,'2016-04-01 00:24:50.842+01'),(9,657,'2016-04-01 00:26:51.096+01'), (9,657,'2016-04-01 00:37:54.092+01');
I want to investigate sessions of a user having time time difference between successive rows greater than 5 minutes, reporting:
- session and session mode
- number of sessions with the gap-and-island problem.
My attempt:
select session_id ,timestamp ,user_id ,start_time ,count(diff) over()/2 as number_of_session_with_problem from ( select * ,case when timestamp-lag(timestamp) over(partition by session_id order by timestamp) > '00:05:00.000' then 1 when lead(timestamp) over(partition by session_id order by timestamp) - timestamp > '00:05:00.000' then 1 end as diff from raw_data join labels using(session_id) ) t
Error:
ERROR: column reference "user_id" is ambiguous LINE 3: ,user_id ^
Expected results:
session_id timestamp user_id start_time number_of_session_with_problem 656 2016-04-01 00:16:19.687 9 2016-04-01 00:03:39 1 657 2016-04-01 00:26:51.096 9 2016-04-01 00:26:51 2
Note: dbfiddle.
Advertisement
Answer
The problem is we have two user_id
and they’re not defined so you need to specifically choose them and give them aliases.
select session_id ,timestamp ,user_id ,start_time ,count(diff) over() as number_of_sessions_with_problem from ( select session_id ,timestamp ,labels.user_id ,start_time ,case when lead(timestamp) over(partition by session_id order by timestamp)-timestamp > '00:05:00.000' then 1 end as diff from raw_data join labels using(session_id) ) t where diff = 1
session_id | timestamp | user_id | start_time | number_of_sessions_with_problem |
---|---|---|---|---|
656 | 2016-04-01 00:16:19.687 | 9 | 2016-04-01 00:03:39 | 2 |
657 | 2016-04-01 00:26:51.096 | 9 | 2016-04-01 00:26:51 | 2 |