I have a table with two columns, userid and date. I want to query this table using a specific userid and date, and from that I want to return the row count of entries going backwards from the entered date. i.e. where userid = 1 AND date = '2020-07-09'
. It should stop counting if there is a gap between the next date.
So here’s my table:
userid | date ------------------- 1 | 2020-07-27 1 | 2020-07-28 2 | 2020-07-28 1 | 2020-07-29
The streak for userid 1
and date 2020-07-29
would be 3.
Now if I remove an entry:
userid | date ------------------- 1 | 2020-07-27 2 | 2020-07-28 1 | 2020-07-29
The streak for userid 1
and date 2020-07-29
would be 1. This is because the 2020-07-28 date is missing for the userid.
How could I do this with postgres sql? I have looked into the generate_series
function but this requires me to set a start and end date.
Advertisement
Answer
You could achieve this using the following:
Approach 1
Using window functions, you could achieve this eg
SELECT MAX("date") - MIN("date") + 1 as streak FROM ( SELECT *, SUM(date_cont) OVER (PARTITION BY "userid" ORDER BY "date" ASC) as gn FROM ( SELECT * , CASE WHEN "date"-LAG("date",1,("date"- interval '1 day')::date) OVER ( PARTITION BY "userid" ORDER BY "date" ) =1 THEN 0 ELSE 1 END as date_cont FROM t WHERE "userid"=1 AND "date" <= '2020-07-29' ) t1 ) t2 GROUP BY gn, "userid" ORDER BY gn DESC LIMIT 1
or replace MAX("date") - MIN("date") + 1
with COUNT(1)
,
or if you would like the entire row data
SELECT *, MAX("date") OVER (PARTITION BY gn,"userid") - MIN("date") OVER (PARTITION BY gn,"userid") + 1 as streak FROM ( SELECT *, SUM(date_cont) OVER (PARTITION BY "userid" ORDER BY "date" ASC) as gn FROM ( SELECT * , CASE WHEN "date"-LAG("date",1,("date"- interval '1 day')::date) OVER ( PARTITION BY "userid" ORDER BY "date" ) =1 THEN 0 ELSE 1 END as date_cont FROM t WHERE "userid"=1 AND "date" <= '2020-07-29' ) t1 ) t2 ORDER BY gn DESC LIMIT 1
or replace MAX("date") OVER (PARTITION BY gn,"userid") - MIN("date") OVER (PARTITION BY gn,"userid") + 1
with COUNT(1) OVER (PARTITION BY gn,"userid")
.
NB. Since we have filtered based on userid
we could simple partition by gn
only
Approach 2
Create a function to extract the streak. This function loops through the data and breaks when it determines that the streak has been broken.
CREATE OR REPLACE FUNCTION getStreak( user_id int, start_date DATE ) RETURNS int AS $BODY$ DECLARE r RECORD; BEGIN FOR r IN ( SELECT * , ROW_NUMBER() OVER ( PARTITION BY "userid" ORDER BY "date" DESC ) as streak, "date"-LAG("date",1,"date") OVER ( PARTITION BY "userid" ORDER BY "date" ) as date_diff FROM t WHERE "userid"=user_id AND "date" <= start_date ORDER BY "date" DESC ) LOOP IF r.date_diff > 1 THEN RETURN r.streak; END IF; END LOOP; RETURN COALESCE(r.streak,0); END; $BODY$ LANGUAGE plpgsql;
with example usage
SELECT getStreak(1,'2020-07-29');
Approach 3
This approach identifies the streak using the difference from the chosen date and row number
SELECT MAX(rn) as streak FROM ( SELECT * , MIN('2020-07-29'::date- "date") OVER (PARTITION BY "userid") as earliest_diff, ROW_NUMBER() OVER (PARTITION BY "userid" ORDER BY "date" DESC) as rn, ('2020-07-29'::date- "date") as diff FROM t WHERE "date" <='2020-07-29' AND "userid"=1 ) t1 WHERE rn = (diff-earliest_diff+1)
Demo
You may view a working demo with test cases here
Let me know if this works for you.