I am trying to get grouped task
counts by state
over a series of dates using the following tables:
tasks
-----
| id | title | state_id | inserted_at |
| -- | ----------- | -------- | ------------------- |
| 1 | First Task | 1 | 2022-05-05 19:16:44 |
| 2 | Second Task | 1 | 2022-05-07 18:54:40 |
| 3 | Third Task | 1 | 2022-05-07 19:18:28 |
| 4 | Fourth Task | 1 | 2022-05-10 15:28:57 |
task_states
-----
| id | label |
| -- | ---------- |
| 1 | Assigns |
| 2 | In Process |
| 3 | Completed |
task_logs
-----
| id | event | target | value | task_id | inserted_at |
| -- | ------- | ------ | ---------- | ------- | -------------------|
| 1 | changed | state | Assigns | 1 | 2022-05-05 19:16:44|
| 2 | changed | state | In Progress| 1 | 2022-05-06 11:43:14|
| 3 | changed | state | Assigns | 2 | 2022-05-07 18:54:40|
| 4 | changed | state | Assigns | 3 | 2022-05-07 19:18:28|
| 5 | changed | state | Completed | 1 | 2022-05-08 12:11:38|
| 6 | changed | state | In Progress| 2 | 2022-05-09 09:22:53|
| 7 | changed | state | Assigns | 4 | 2022-05-10 15:28:57|
| 8 | changed | state | Completed | 2 | 2022-05-11 11:21:53|
| 9 | changed | state | In Progress| 3 | 2022-05-11 17:42:02|
There isn’t a consistent daily “state” record for each task because task_logs
only has entries for when a task changes state. This means I have to get the last “state change” log for each task prior to a specified date. I’ve got the following query working to get the task count in each state for one day ago:
SELECT date('2022-05-10'), state.id as state_id, state.label, count(sub.id)
FROM (
SELECT DISTINCT ON (t.id) t.id, logs.value
FROM tasks t
INNER JOIN task_logs logs ON logs.task_id = t.id
WHERE date(logs.inserted_at) <= date('2022-05-10') AND logs.target = 'state'
ORDER BY t.id, logs.inserted_at DESC
) sub
RIGHT JOIN task_states state ON state.label = sub.value
GROUP BY state.id
ORDER BY state.id;
------------------
| date | state_id | label | count |
| ---------- | -------- | ---------- | ----- |
| 2022-05-10 | 1 | Assigns | 2 |
| 2022-05-10 | 2 | In Process | 1 |
| 2022-05-10 | 3 | Completed | 1 |
My trouble comes from trying to combine the query above with generate_series
to get the daily count over a series of dates, something like:
| date | state_id | label | count |
| ---------- | -------- | ----------- | ----- |
| 2022-05-05 | 1 | Assigns | 1 |
| 2022-05-05 | 2 | In Progress | 0 |
| 2022-05-05 | 3 | Complete | 0 |
| 2022-05-06 | 1 | Assigns | 0 |
| 2022-05-06 | 2 | In Progress | 1 |
| 2022-05-06 | 3 | Complete | 0 |
| 2022-05-07 | 1 | Assigns | 2 |
| 2022-05-07 | 2 | In Progress | 1 |
| 2022-05-07 | 3 | Complete | 0 |
| 2022-05-08 | 1 | Assigns | 2 |
| 2022-05-08 | 2 | In Progress | 0 |
| 2022-05-08 | 3 | Complete | 1 |
| 2022-05-09 | 1 | Assigns | 1 |
| 2022-05-09 | 2 | In Progress | 1 |
| 2022-05-09 | 3 | Complete | 1 |
| 2022-05-10 | 1 | Assigns | 2 |
| 2022-05-10 | 2 | In Progress | 1 |
| 2022-05-10 | 3 | Complete | 1 |
| 2022-05-11 | 1 | Assigns | 1 |
| 2022-05-11 | 2 | In Progress | 1 |
| 2022-05-11 | 3 | Complete | 2 |
Here’s a dbfiddle setup with the tables above. Any thoughts/ideas on how to perform the query above (or rewrite it) for each date in a series of dates (generate_series(current_date - interval '5 day', current_date, '1 day')
) would be greatly appreciated!
Advertisement
Answer
Consider a stored function to loop through the generated series of dates and capture each daily aggregated snapshot:
CREATE OR REPLACE FUNCTION build_daily_log_agg(_interval_days TEXT)
RETURNS TABLE ("date" TEXT,
state_id INTEGER,
state_label TEXT,
"count" INTEGER)
LANGUAGE plpgsql AS
$func$
DECLARE dt RECORD;
BEGIN
CREATE TEMPORARY TABLE daily_log_agg (
"date" TEXT,
state_id INTEGER,
state_label TEXT,
"count" INTEGER
);
FOR dt IN SELECT dates FROM generate_series(
current_date - _interval_days::interval,
current_date, '1 day'
) AS dates LOOP
INSERT INTO daily_log_agg ("date", state_id, state_label, "count")
SELECT dt.dates AS "date",
state.id AS state_id,
state.label,
COUNT(sub.id) AS "count"
FROM (
SELECT DISTINCT ON (t.id) t.id, logs.value
FROM tasks t
INNER JOIN task_logs logs ON logs.task_id = t.id
WHERE date(logs.inserted_at) <= dt.dates
AND logs.target = 'state'
ORDER BY t.id, logs.inserted_at DESC
) sub
RIGHT JOIN task_states state ON state.label = sub.value
GROUP BY state.id
ORDER BY state.id;
END LOOP;
RETURN QUERY
SELECT * FROM daily_log_agg;
END
$func$;
SELECT * FROM build_daily_log_agg('12 days');