Skip to content
Advertisement

Gap-and-island for more than time threshold

I have these tables:

CREATE TABLE labels(user_id INT, session_id INT,
  start_time TIMESTAMP,mode TEXT);

INSERT INTO labels (user_id,session_id,start_time,mode)
VALUES  (48,652,'2016-04-01 00:47:00+01','foot'),
(9,656,'2016-04-01 00:03:39+01','car'),(9,657,'2016-04-01 00:26:51+01','car'),
(9,658,'2016-04-01 00:45:19+01','car'),(46,663,'2016-04-01 00:13:12+01','car');

CREATE TABLE raw_data(user_id INT, session_id INT,timestamp TIMESTAMP) ;

INSERT INTO raw_data(user_id, session_id,timestamp)          
VALUES (8,652,'2016-04-01 00:46:11.638+01'),(8,652,'2016-04-01 00:47:00.566+01'),
       (8,652,'2016-04-01 00:48:06.383+01'),(9,656,'2016-04-01 00:14:17.707+01'),
       (9,656,'2016-04-01 00:15:18.664+01'),(9,656,'2016-04-01 00:16:19.687+01'),
       (9,656,'2016-04-01 00:24:20.691+01'),(9,656,'2016-04-01 00:25:23.681+01'),
       (9,657,'2016-04-01 00:24:50.842+01'),(9,657,'2016-04-01 00:26:51.096+01'),
       (9,657,'2016-04-01 00:37:54.092+01');

I want to investigate sessions of a user having time time difference between successive rows greater than 5 minutes, reporting:

  • session and session mode
  • number of sessions with the gap-and-island problem.

My attempt:

select session_id     
      ,timestamp    
      ,user_id  
      ,start_time   
      ,count(diff) over()/2 as number_of_session_with_problem
from  (
       select *
              ,case when timestamp-lag(timestamp) over(partition by session_id order by timestamp)    > '00:05:00.000' then 1 when lead(timestamp) over(partition by session_id order by timestamp) - timestamp > '00:05:00.000' then 1 end as diff
       from   raw_data join labels using(session_id)
      ) t

Error:

ERROR:  column reference "user_id" is ambiguous
LINE 3:       ,user_id 
               ^

Expected results:

session_id  timestamp            user_id    start_time        number_of_session_with_problem
   656     2016-04-01 00:16:19.687  9      2016-04-01 00:03:39     1
   657     2016-04-01 00:26:51.096  9      2016-04-01 00:26:51     2 

Note: dbfiddle.

Advertisement

Answer

The problem is we have two user_id and they’re not defined so you need to specifically choose them and give them aliases.

select  session_id    
       ,timestamp   
       ,user_id 
       ,start_time  
       ,count(diff) over() as number_of_sessions_with_problem
from   (
       select session_id     
             ,timestamp    
             ,labels.user_id  
             ,start_time   
             ,case when lead(timestamp) over(partition by session_id order by timestamp)-timestamp > '00:05:00.000' then 1 end as diff
       from   raw_data join labels using(session_id)
       ) t
where  diff = 1
session_id timestamp user_id start_time number_of_sessions_with_problem
656 2016-04-01 00:16:19.687 9 2016-04-01 00:03:39 2
657 2016-04-01 00:26:51.096 9 2016-04-01 00:26:51 2

Fiddle

User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement