Grouped aggregate counts for each date in a series of dates

I am trying to get grouped task counts by state over a series of dates using the following tables:

tasks
-----
| id | title       | state_id | inserted_at         |
| -- | ----------- | -------- | ------------------- |
| 1  | First Task  | 1        | 2022-05-05 19:16:44 |
| 2  | Second Task | 1        | 2022-05-07 18:54:40 |
| 3  | Third Task  | 1        | 2022-05-07 19:18:28 |
| 4  | Fourth Task | 1        | 2022-05-10 15:28:57 |
task_states
-----
| id | label       |
| -- | ----------- |
| 1  | Assigns     |
| 2  | In Progress |
| 3  | Completed   |
task_logs
-----
| id | event   | target | value      | task_id | inserted_at        |
| -- | ------- | ------ | ---------- | ------- | -------------------|
| 1  | changed | state  | Assigns    | 1       | 2022-05-05 19:16:44|
| 2  | changed | state  | In Progress| 1       | 2022-05-06 11:43:14|
| 3  | changed | state  | Assigns    | 2       | 2022-05-07 18:54:40|
| 4  | changed | state  | Assigns    | 3       | 2022-05-07 19:18:28|
| 5  | changed | state  | Completed  | 1       | 2022-05-08 12:11:38|
| 6  | changed | state  | In Progress| 2       | 2022-05-09 09:22:53|
| 7  | changed | state  | Assigns    | 4       | 2022-05-10 15:28:57|
| 8  | changed | state  | Completed  | 2       | 2022-05-11 11:21:53|
| 9  | changed | state  | In Progress| 3       | 2022-05-11 17:42:02|

There isn’t a consistent daily “state” record for each task because task_logs only has entries for when a task changes state. This means I have to get the last “state change” log for each task on or before a specified date. I’ve got the following query working to get the task count in each state for a single date (2022-05-10 here):

SELECT date('2022-05-10'), state.id as state_id, state.label, count(sub.id)
FROM (
   SELECT DISTINCT ON (t.id) t.id, logs.value
   FROM tasks t
   INNER JOIN task_logs logs ON logs.task_id = t.id
   WHERE date(logs.inserted_at) <= date('2022-05-10') AND logs.target = 'state'
   ORDER BY t.id, logs.inserted_at DESC
) sub
RIGHT JOIN task_states state ON state.label = sub.value
GROUP BY state.id
ORDER BY state.id;
------------------
| date       | state_id | label       | count |
| ---------- | -------- | ----------- | ----- |
| 2022-05-10 | 1        | Assigns     | 2     |
| 2022-05-10 | 2        | In Progress | 1     |
| 2022-05-10 | 3        | Completed   | 1     |

My trouble comes from trying to combine the query above with generate_series to get the daily count over a series of dates, something like:

| date       | state_id | label       | count |
| ---------- | -------- | ----------- | ----- |
| 2022-05-05 | 1        | Assigns     | 1     |
| 2022-05-05 | 2        | In Progress | 0     |
| 2022-05-05 | 3        | Completed   | 0     |
| 2022-05-06 | 1        | Assigns     | 0     |
| 2022-05-06 | 2        | In Progress | 1     |
| 2022-05-06 | 3        | Completed   | 0     |
| 2022-05-07 | 1        | Assigns     | 2     |
| 2022-05-07 | 2        | In Progress | 1     |
| 2022-05-07 | 3        | Completed   | 0     |
| 2022-05-08 | 1        | Assigns     | 2     |
| 2022-05-08 | 2        | In Progress | 0     |
| 2022-05-08 | 3        | Completed   | 1     |
| 2022-05-09 | 1        | Assigns     | 1     |
| 2022-05-09 | 2        | In Progress | 1     |
| 2022-05-09 | 3        | Completed   | 1     |
| 2022-05-10 | 1        | Assigns     | 2     |
| 2022-05-10 | 2        | In Progress | 1     |
| 2022-05-10 | 3        | Completed   | 1     |
| 2022-05-11 | 1        | Assigns     | 1     |
| 2022-05-11 | 2        | In Progress | 1     |
| 2022-05-11 | 3        | Completed   | 2     |

Here’s a dbfiddle setup with the tables above. Any thoughts/ideas on how to perform the query above (or rewrite it) for each date in a series of dates (generate_series(current_date - interval '5 day', current_date, '1 day')) would be greatly appreciated!

Answer

Consider a stored function to loop through the generated series of dates and capture each daily aggregated snapshot:

CREATE OR REPLACE FUNCTION build_daily_log_agg(_interval_days TEXT)
  RETURNS TABLE ("date" TEXT,
                 state_id INTEGER, 
                 state_label TEXT,
                 "count" INTEGER)
  LANGUAGE plpgsql AS
$func$

DECLARE dt RECORD;
BEGIN 

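    -- Drop any copy left over from an earlier call in this session;
    -- without this, a second call fails with "relation already exists".
    DROP TABLE IF EXISTS daily_log_agg;

    -- Staging table that collects one snapshot per day.
    CREATE TEMPORARY TABLE daily_log_agg (
        "date" TEXT, 
        state_id INTEGER, 
        state_label TEXT, 
        "count" INTEGER
    );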
    

    FOR dt IN SELECT dates FROM generate_series( 
            current_date - _interval_days::interval,
            current_date, '1 day' 
        ) AS dates LOOP
        
        INSERT INTO daily_log_agg ("date", state_id, state_label, "count")
        SELECT dt.dates AS "date",
               state.id AS state_id, 
               state.label, 
               COUNT(sub.id) AS "count"
        FROM (
            SELECT DISTINCT ON (t.id) t.id, logs.value
            FROM tasks t
            INNER JOIN task_logs logs ON logs.task_id = t.id
            WHERE date(logs.inserted_at) <= dt.dates
              AND logs.target = 'state'
            ORDER BY t.id, logs.inserted_at DESC
        ) sub
        RIGHT JOIN task_states state ON state.label = sub.value
        GROUP BY state.id
        ORDER BY state.id;
        
   END LOOP; 
   
   RETURN QUERY
   SELECT * FROM daily_log_agg;
END
$func$;


SELECT * FROM build_daily_log_agg('12 days');
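
If a loop and a temporary table feel heavy, the same daily snapshot can probably be written as a single set-based query. The following is only a sketch, not the method used in the function above. It assumes the tables from the question, that task_logs.inserted_at is a timestamp, and that every task_logs.task_id refers to an existing task (which is why the join to tasks is left out). The aliases d, day and sub are made up for illustration.

```sql
-- Sketch: one row per (day, state) without a loop or temp table.
SELECT d.day::date        AS "date",
       state.id           AS state_id,
       state.label,
       count(sub.task_id) AS "count"
FROM generate_series(current_date - interval '5 day',
                     current_date,
                     interval '1 day') AS d(day)
CROSS JOIN task_states state
LEFT JOIN LATERAL (
    -- Latest state-change log per task on or before this day.
    SELECT DISTINCT ON (logs.task_id) logs.task_id, logs.value
    FROM task_logs logs
    WHERE logs.target = 'state'
      AND logs.inserted_at < d.day + interval '1 day'
    ORDER BY logs.task_id, logs.inserted_at DESC
) sub ON sub.value = state.label
GROUP BY d.day, state.id, state.label
ORDER BY d.day, state.id;
```

The CROSS JOIN with task_states keeps states that have no tasks on a given day, so they show up with a count of 0. The LATERAL subquery may be re-evaluated for every (day, state) pair, so on large log tables it is worth comparing its plan against the function.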

User contributions licensed under: CC BY-SA