I have a login table with columns UserID and login_date like below (but with thousands of entries). How would I find the longest streak of consecutive log-ins for each user and what the start and end days were for those streaks?
Edit: Using Postgres. Removed question about streaks only counting M-F and not weekends (Do I make a separate post about that?)
UserID login_date 112 2020-03-14 131 2020-07-26 112 2019-03-15 323 2019-02-28 639 2019-02-09
Advertisement
Answer
You can do this as a gap-and-islands problem. Assuming one login per day (as in the example data), you can subtract an enumerated sequence, and the value is constant on consecutive days.
In Standard SQL, the logic for all sequences look like:
select userid, min(login_date), max(login_date) from (select t.*, row_number() over (partition by userid order by login_date) as seqn from t ) t group by userid, login_date - seqnum * interval '1 day';
In Postgres, you can actually get the longest using distinct on
:
select distinct on (userid) userid, min(login_date), max(login_date) from (select t.*, row_number() over (partition by userid order by login_date) as seqn from t ) t group by userid, login_date - seqnum * interval '1 day' order by userid, count(*) desc;
Note: If thee are duplicates for user/day combinations, use dense_rank()
instead of row_number()
.
If you want to handle weekends and holidays, then a calendar table is recommended.