Edit – I am still digesting the great responses and learning how they work, awesome knowledge!
Using SQL Server 2016, I am trying to figure out if any users have done something at least once every hour, for 24 consecutive hours.
I have a table similar to this, in the example below, the only user who qualifies is ‘21482’. They actually have more than one event an hour (which is fine) but they have done at least one thing every hour for 24 straight hours. They also have events on 8-22 but they would not qualify as they are not consecutive (i.e. there’s a gap between the 8-22 events and 8-23 events).
The other user ‘45578’, does not qualify, even though they have more than 24 events in a single day, there was not activity during every hour for 24 straight hours. It doesn’t matter if the activity spans more than a single day, that is fine.
Ideas on how to accomplish something like this?
CREATE TABLE AuditData ([UserID] varchar(20), [Event_Timestamp] datetime) ; INSERT INTO AuditData ([UserID], [Event_Timestamp]) VALUES (21482, '2021-08-22 00:15:30.000'), (21482, '2021-08-22 00:25:30.000'), (21482, '2021-08-22 02:05:45.000'), (21482, '2021-08-22 04:15:15.000'), (21482, '2021-08-22 06:25:10.000'), (21482, '2021-08-22 08:01:05.000'), (21482, '2021-08-23 00:05:30.000'), (21482, '2021-08-23 01:05:45.000'), (21482, '2021-08-23 02:15:15.000'), (21482, '2021-08-23 03:25:10.000'), (21482, '2021-08-23 04:01:05.000'), (21482, '2021-08-23 05:55:55.000'), (21482, '2021-08-23 06:50:58.000'), (21482, '2021-08-23 07:30:02.000'), (21482, '2021-08-23 08:10:12.000'), (21482, '2021-08-23 08:14:12.000'), (21482, '2021-08-23 09:18:30.000'), (21482, '2021-08-23 10:17:32.000'), (21482, '2021-08-23 11:30:21.000'), (21482, '2021-08-23 11:32:21.000'), (21482, '2021-08-23 12:45:30.000'), (21482, '2021-08-23 13:01:30.000'), (21482, '2021-08-23 14:59:30.000'), (21482, '2021-08-23 15:44:30.000'), (21482, '2021-08-23 16:49:30.000'), (21482, '2021-08-23 17:50:30.000'), (21482, '2021-08-23 18:10:30.000'), (21482, '2021-08-23 19:02:45.000'), (21482, '2021-08-23 20:11:50.000'), (21482, '2021-08-23 21:55:15.000'), (21482, '2021-08-23 22:16:55.000'), (21482, '2021-08-23 23:58:10.000'), (45578, '2021-08-23 05:05:30.000'), (45578, '2021-08-23 05:05:45.000'), (45578, '2021-08-23 05:15:15.000'), (45578, '2021-08-23 05:25:10.000'), (45578, '2021-08-23 05:41:05.000'), (45578, '2021-08-23 05:55:55.000'), (45578, '2021-08-23 06:50:58.000'), (45578, '2021-08-23 07:30:02.000'), (45578, '2021-08-23 08:10:12.000'), (45578, '2021-08-23 09:18:30.000'), (45578, '2021-08-23 10:17:32.000'), (45578, '2021-08-23 11:30:21.000'), (45578, '2021-08-23 11:31:21.000'), (45578, '2021-08-23 12:32:21.000'), (45578, '2021-08-23 13:33:21.000'), (45578, '2021-08-23 13:33:21.000'), (45578, '2021-08-23 13:33:21.000'), (45578, '2021-08-23 14:33:21.000'), (45578, '2021-08-23 14:33:21.000'), (45578, '2021-08-23 14:33:21.000'), (45578, '2021-08-23 14:33:21.000'), (45578, '2021-08-23 15:33:21.000'), (45578, '2021-08-23 15:33:21.000'), (45578, '2021-08-23 15:33:21.000'), (45578, '2021-08-23 15:33:21.000'), (45578, '2021-08-23 15:33:21.000'), (45578, '2021-08-23 16:33:21.000'), (45578, '2021-08-23 16:33:21.000') ;
Thanks for any suggestions.
Advertisement
Answer
This is a type of gaps-and-islands problem.
There are many solutions.
- One of the classic ones is to use
LAG
to check if the current row is the start of a new group, assign group IDs using a runningCOUNT
, then group by this new ID
To truncate to the beginning of the hour, we use
DATEADD(hour, DATEDIFF(hour, 0, ad.Event_Timestamp), 0))
WITH StartPoints AS ( SELECT ad.UserID, v.Hour, IsStart = CASE WHEN LAG(v.Hour) OVER (PARTITION BY ad.UserID ORDER BY v.Hour) >= DATEADD(hour, -1, v.Hour) THEN NULL ELSE 1 END FROM AuditData ad CROSS APPLY(VALUES(DATEADD(hour, DATEDIFF(hour, 0, ad.Event_Timestamp), 0))) v(Hour) GROUP BY ad.UserID, v.Hour ), Groupings AS ( SELECT *, GroupId = COUNT(IsStart) OVER (PARTITION BY ad.UserID ORDER BY ad.Hour ROWS UNBOUNDED PRECEDING) FROM StartPoints ad ) SELECT DISTINCT ad.UserID FROM Groupings ad GROUP BY ad.UserID, ad.GroupId HAVING COUNT(*) >= 24;
- Another solution is, which works here because all we are looking for is the existence of an island, is to use
LAG
with an offset of 23 to check whether that row is exactly 23 hours before current. Then we simply group byUserID
WITH Hours AS ( SELECT ad.UserID, v.Hour, Hour24 = LAG(v.Hour, 23) OVER (PARTITION BY ad.UserID ORDER BY v.Hour) FROM AuditData ad CROSS APPLY(VALUES(DATEADD(hour, DATEDIFF(hour, 0, ad.Event_Timestamp), 0))) v(Hour) GROUP BY ad.UserID, v.Hour ) SELECT DISTINCT ad.UserID FROM Hours ad WHERE ad.Hour24 = DATEADD(hour, -23, ad.Hour);