Return number of rows (streak count) from entries where each entry is the previous days date

Question

I have a table with two columns, userid and date. I want to query this table using a specific userid and date, and from that I want to return the row count of entries going backwards from the entered date. i.e. where userid = 1 AND date = &#8216;2020-07-09&#8217;. It should stop counting if there is a gap bet…

Accepted Answer

You could achieve this using the following:Approach 1Using window functions, you could achieve this egSELECT    MAX("date") - MIN("date")  + 1  as streakFROM (    SELECT        *,        SUM(date_cont) OVER (PARTITION BY "userid" ORDER BY "date" ASC) as gn    FROM (        SELECT             * ,            CASE                WHEN "date"-LAG("date",1,("date"- interval '1 day')::date) OVER (                       PARTITION BY  "userid"                       ORDER BY "date"                ) =1 THEN 0                ELSE 1             END as date_cont        FROM t         WHERE "userid"=1 AND "date" <= '2020-07-29'             ) t1) t2 GROUP BY gn, "userid"ORDER BY gn DESCLIMIT 1or replace MAX("date") - MIN("date")  + 1 with COUNT(1),or if you would like the entire row dataSELECT    *,    MAX("date") OVER (PARTITION BY gn,"userid") -     MIN("date") OVER (PARTITION BY gn,"userid") + 1  as streakFROM (    SELECT        *,        SUM(date_cont) OVER (PARTITION BY "userid" ORDER BY "date" ASC) as gn    FROM (        SELECT             * ,            CASE                WHEN "date"-LAG("date",1,("date"- interval '1 day')::date) OVER (                       PARTITION BY  "userid"                       ORDER BY "date"                ) =1 THEN 0                ELSE 1             END as date_cont        FROM t         WHERE "userid"=1 AND "date" <= '2020-07-29'             ) t1) t2 ORDER BY gn DESCLIMIT 1or replace MAX("date") OVER (PARTITION BY gn,"userid") - MIN("date") OVER (PARTITION BY gn,"userid") + 1 with COUNT(1) OVER (PARTITION BY gn,"userid").NB. Since we have filtered based on userid we could simple partition by gn onlyApproach 2Create a function to extract the streak. This function loops through the data and breaks when it determines that the streak has been broken.CREATE OR REPLACE FUNCTION getStreak(    user_id int,    start_date DATE) RETURNS int AS$BODY$DECLARE    r  RECORD; BEGIN    FOR r IN (      SELECT           * ,          ROW_NUMBER() OVER (              PARTITION BY "userid"              ORDER BY "date" DESC          ) as streak,          "date"-LAG("date",1,"date") OVER (              PARTITION BY  "userid"              ORDER BY "date"          ) as date_diff      FROM t       WHERE "userid"=user_id AND "date" <= start_date       ORDER BY "date" DESC    )    LOOP        IF r.date_diff > 1 THEN            RETURN r.streak;                END IF;    END LOOP;    RETURN COALESCE(r.streak,0);END;$BODY$LANGUAGE plpgsql;with example usageSELECT getStreak(1,'2020-07-29');Approach 3This approach identifies the streak using the difference from the chosen date and row numberSELECT     MAX(rn) as streakFROM (        SELECT             * ,            MIN('2020-07-29'::date- "date")  OVER (PARTITION BY "userid") as earliest_diff,            ROW_NUMBER() OVER (PARTITION BY "userid" ORDER BY "date" DESC) as rn,            ('2020-07-29'::date- "date") as diff        FROM t         WHERE "date" <='2020-07-29' AND "userid"=1) t1WHERE rn = (diff-earliest_diff+1) DemoYou may view a working demo with test cases hereLet me know if this works for you.

Advertisement

Answer

Approach 1

Approach 2

Approach 3

Demo