Skip to content
Advertisement

Return number of rows (streak count) from entries where each entry is the previous days date

I have a table with two columns, userid and date. I want to query this table using a specific userid and date, and from that I want to return the row count of entries going backwards from the entered date. i.e. where userid = 1 AND date = '2020-07-09'. It should stop counting if there is a gap between the next date.

So here’s my table:

userid | date
-------------------
1     | 2020-07-27
1     | 2020-07-28
2     | 2020-07-28
1     | 2020-07-29

The streak for userid 1 and date 2020-07-29 would be 3.

Now if I remove an entry:

userid | date
-------------------
1     | 2020-07-27
2     | 2020-07-28
1     | 2020-07-29

The streak for userid 1 and date 2020-07-29 would be 1. This is because the 2020-07-28 date is missing for the userid.

How could I do this with postgres sql? I have looked into the generate_series function but this requires me to set a start and end date.

Advertisement

Answer

You could achieve this using the following:

Approach 1

Using window functions, you could achieve this eg

SELECT
    MAX("date") - MIN("date")  + 1  as streak
FROM (
    SELECT
        *,
        SUM(date_cont) OVER (PARTITION BY "userid" ORDER BY "date" ASC) as gn
    FROM (
        SELECT 
            * ,
            CASE
                WHEN "date"-LAG("date",1,("date"- interval '1 day')::date) OVER (
                       PARTITION BY  "userid"
                       ORDER BY "date"
                ) =1 THEN 0
                ELSE 1
             END as date_cont
        FROM t 
        WHERE "userid"=1 AND "date" <= '2020-07-29' 
        
    ) t1
) t2 
GROUP BY gn, "userid"
ORDER BY gn DESC
LIMIT 1

or replace MAX("date") - MIN("date") + 1 with COUNT(1), or if you would like the entire row data

SELECT
    *,
    MAX("date") OVER (PARTITION BY gn,"userid") - 
    MIN("date") OVER (PARTITION BY gn,"userid") + 1  as streak
FROM (
    SELECT
        *,
        SUM(date_cont) OVER (PARTITION BY "userid" ORDER BY "date" ASC) as gn
    FROM (
        SELECT 
            * ,
            CASE
                WHEN "date"-LAG("date",1,("date"- interval '1 day')::date) OVER (
                       PARTITION BY  "userid"
                       ORDER BY "date"
                ) =1 THEN 0
                ELSE 1
             END as date_cont
        FROM t 
        WHERE "userid"=1 AND "date" <= '2020-07-29' 
        
    ) t1
) t2 
ORDER BY gn DESC
LIMIT 1

or replace MAX("date") OVER (PARTITION BY gn,"userid") - MIN("date") OVER (PARTITION BY gn,"userid") + 1 with COUNT(1) OVER (PARTITION BY gn,"userid").

NB. Since we have filtered based on userid we could simple partition by gn only

Approach 2

Create a function to extract the streak. This function loops through the data and breaks when it determines that the streak has been broken.

CREATE OR REPLACE FUNCTION getStreak(
    user_id int,
    start_date DATE
) RETURNS int AS
$BODY$
DECLARE
    r  RECORD; 
BEGIN
    FOR r IN (
      SELECT 
          * ,
          ROW_NUMBER() OVER (
              PARTITION BY "userid"
              ORDER BY "date" DESC
          ) as streak,
          "date"-LAG("date",1,"date") OVER (
              PARTITION BY  "userid"
              ORDER BY "date"
          ) as date_diff
      FROM t 
      WHERE "userid"=user_id AND "date" <= start_date 
      ORDER BY "date" DESC
    )
    LOOP
        IF r.date_diff > 1 THEN
            RETURN r.streak;        
        END IF;
    END LOOP;
    RETURN COALESCE(r.streak,0);
END;
$BODY$
LANGUAGE plpgsql;

with example usage

SELECT getStreak(1,'2020-07-29');

Approach 3

This approach identifies the streak using the difference from the chosen date and row number

SELECT 
    MAX(rn) as streak
FROM (
        SELECT 
            * ,
            MIN('2020-07-29'::date- "date")  OVER (PARTITION BY "userid") as earliest_diff,
            ROW_NUMBER() OVER (PARTITION BY "userid" ORDER BY "date" DESC) as rn,
            ('2020-07-29'::date- "date") as diff
        FROM t 
        WHERE "date" <='2020-07-29' AND "userid"=1
) t1
WHERE rn = (diff-earliest_diff+1) 

Demo

You may view a working demo with test cases here

Let me know if this works for you.

User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement