I have data from issues in a table with the following schema.
CREATE TABLE if not EXISTS issues ( id int, created_at DATE, deleted_at DATE ); INSERT INTO issues VALUES (1, '2020-01-01', '2020-01-02'), (2, '2020-01-02', '2020-01-05'), (3, '2020-01-04', '2020-01-03'), (4, '2020-01-04', '2020-01-07'), (5, '2020-01-04', '2020-01-07');
I want to know how many issues are open every day using PostgreSQL. We consider an issue being open at date x if:
x >= created_at
x <= deleted_at
There may be days where no issues were created or deleted, as in the example. How can I do the query to obtain how many issues are open every day?
Advertisement
Answer
A simple method uses generate_series()
:
select gs.dte, count(*) from issues i cross join lateral generate_series(i.created_at, i.deleted_at, interval '1 day') gs(dte) group by gs.dte order by gs.dte;
Here is a db<>fiddle.
The above skips dates with no issues. One way to fix this would be to generate all dates and then:
select gs.dte, count(i.id) from (select min(created_at) as first_date, max(deleted_at) as last_date from issues i ) m cross join lateral generate_series(m.first_date, m.last_date, interval '1 day') gs(dte) left join issues i on gs.dte >= i.created_at and gs.dte <= i.deleted_at group by gs.dte order by gs.dte;
However, the first method should be more efficient if this is not necessary.