I am trying to get grouped task
counts by state
over a series of dates using the following tables:
tasks ----- | id | title | state_id | inserted_at | | -- | ----------- | -------- | ------------------- | | 1 | First Task | 1 | 2022-05-05 19:16:44 | | 2 | Second Task | 1 | 2022-05-07 18:54:40 | | 3 | Third Task | 1 | 2022-05-07 19:18:28 | | 4 | Fourth Task | 1 | 2022-05-10 15:28:57 |
task_states ----- | id | label | | -- | ---------- | | 1 | Assigns | | 2 | In Process | | 3 | Completed |
task_logs ----- | id | event | target | value | task_id | inserted_at | | -- | ------- | ------ | ---------- | ------- | -------------------| | 1 | changed | state | Assigns | 1 | 2022-05-05 19:16:44| | 2 | changed | state | In Progress| 1 | 2022-05-06 11:43:14| | 3 | changed | state | Assigns | 2 | 2022-05-07 18:54:40| | 4 | changed | state | Assigns | 3 | 2022-05-07 19:18:28| | 5 | changed | state | Completed | 1 | 2022-05-08 12:11:38| | 6 | changed | state | In Progress| 2 | 2022-05-09 09:22:53| | 7 | changed | state | Assigns | 4 | 2022-05-10 15:28:57| | 8 | changed | state | Completed | 2 | 2022-05-11 11:21:53| | 9 | changed | state | In Progress| 3 | 2022-05-11 17:42:02|
There isn’t a consistent daily “state” record for each task because task_logs
only has entries for when a task changes state. This means I have to get the last “state change” log for each task prior to a specified date. I’ve got the following query working to get the task count in each state for one day ago:
SELECT date('2022-05-10'), state.id as state_id, state.label, count(sub.id) FROM ( SELECT DISTINCT ON (t.id) t.id, logs.value FROM tasks t INNER JOIN task_logs logs ON logs.task_id = t.id WHERE date(logs.inserted_at) <= date('2022-05-10') AND logs.target = 'state' ORDER BY t.id, logs.inserted_at DESC ) sub RIGHT JOIN task_states state ON state.label = sub.value GROUP BY state.id ORDER BY state.id; ------------------ | date | state_id | label | count | | ---------- | -------- | ---------- | ----- | | 2022-05-10 | 1 | Assigns | 2 | | 2022-05-10 | 2 | In Process | 1 | | 2022-05-10 | 3 | Completed | 1 |
My trouble comes from trying to combine the query above with generate_series
to get the daily count over a series of dates, something like:
| date | state_id | label | count | | ---------- | -------- | ----------- | ----- | | 2022-05-05 | 1 | Assigns | 1 | | 2022-05-05 | 2 | In Progress | 0 | | 2022-05-05 | 3 | Complete | 0 | | 2022-05-06 | 1 | Assigns | 0 | | 2022-05-06 | 2 | In Progress | 1 | | 2022-05-06 | 3 | Complete | 0 | | 2022-05-07 | 1 | Assigns | 2 | | 2022-05-07 | 2 | In Progress | 1 | | 2022-05-07 | 3 | Complete | 0 | | 2022-05-08 | 1 | Assigns | 2 | | 2022-05-08 | 2 | In Progress | 0 | | 2022-05-08 | 3 | Complete | 1 | | 2022-05-09 | 1 | Assigns | 1 | | 2022-05-09 | 2 | In Progress | 1 | | 2022-05-09 | 3 | Complete | 1 | | 2022-05-10 | 1 | Assigns | 2 | | 2022-05-10 | 2 | In Progress | 1 | | 2022-05-10 | 3 | Complete | 1 | | 2022-05-11 | 1 | Assigns | 1 | | 2022-05-11 | 2 | In Progress | 1 | | 2022-05-11 | 3 | Complete | 2 |
Here’s a dbfiddle setup with the tables above. Any thoughts/ideas on how to perform the query above (or rewrite it) for each date in a series of dates (generate_series(current_date - interval '5 day', current_date, '1 day')
) would be greatly appreciated!
Advertisement
Answer
Consider a stored function to loop through the generated series of dates and capture each daily aggregated snapshot:
CREATE OR REPLACE FUNCTION build_daily_log_agg(_interval_days TEXT) RETURNS TABLE ("date" TEXT, state_id INTEGER, state_label TEXT, "count" INTEGER) LANGUAGE plpgsql AS $func$ DECLARE dt RECORD; BEGIN CREATE TEMPORARY TABLE daily_log_agg ( "date" TEXT, state_id INTEGER, state_label TEXT, "count" INTEGER ); FOR dt IN SELECT dates FROM generate_series( current_date - _interval_days::interval, current_date, '1 day' ) AS dates LOOP INSERT INTO daily_log_agg ("date", state_id, state_label, "count") SELECT dt.dates AS "date", state.id AS state_id, state.label, COUNT(sub.id) AS "count" FROM ( SELECT DISTINCT ON (t.id) t.id, logs.value FROM tasks t INNER JOIN task_logs logs ON logs.task_id = t.id WHERE date(logs.inserted_at) <= dt.dates AND logs.target = 'state' ORDER BY t.id, logs.inserted_at DESC ) sub RIGHT JOIN task_states state ON state.label = sub.value GROUP BY state.id ORDER BY state.id; END LOOP; RETURN QUERY SELECT * FROM daily_log_agg; END $func$; SELECT * FROM build_daily_log_agg('12 days');