I need to retrieve aggregate information (by means of SUM) on the basis of two separate dates – created_at
and erased_at
.
The requirement is as follows:
Column | Description |
---|---|
Month | Month as YYYY-MM |
Active_users | Total number of unique users in the Month. The users have not been erased during or before the Month. |
An example set of data in a “users” table is as below:
id created_at erased_at 20 2017-08-17 08:04 2020-08-04 10:17 27 2017-09-08 13:21 2020-08-04 10:15 31 2017-09-10 11:03 2020-10-01 15:42 61 2017-09-19 10:51 2020-08-17 15:13 71 2017-09-20 06:44 2020-08-04 10:15 80 2017-09-20 10:52 2020-08-04 10:17 217 2017-10-10 06:24 2020-08-04 10:16 247 2017-10-11 14:22 2020-08-04 10:15 249 2017-10-11 22:14 2020-08-04 10:15 256 2017-10-12 11:31 2020-08-04 10:17 428 2017-11-02 13:13 2020-10-01 15:15 649 2017-12-11 11:21 2020-10-01 15:16 651 2017-12-11 11:56 2020-08-04 10:15 810 2018-02-06 09:09 2021-07-29 09:03 811 2018-02-06 09:10 2021-07-29 09:03 833 2018-02-09 14:25 2020-08-04 10:16 968 2018-03-17 04:55 2020-10-15 06:08
The particular monthly sums of users created_at
in a given month can be made by means of:
SELECT to_char(users.created_at, 'YYYY MM') AS Month, count(users.id) AS Created_users, FROM users GROUP BY Month ORDER BY Month DESC
(The same may be done for erased_at
, of course.)
However, how do I formulate a query to present the number of active users within any month, who have already been created (not necessarily within the particular month) and who have not been erased at the time?
I’ve tried various subqueries and joins and I’m obviously beyond my paygrade on it and I do understand this probably is a student-level kind of question.
Please, help.
(I’m on Postgresql 9.6 in case it needs to be that advanced.)
Advertisement
Answer
One method generates the months and then uses JOIN
and GROUP BY
:
select gs.yyyymm, count(u.id) from generate_series('2018-01-01'::date, '2018-12-01'::date, interval '1 month') gs(yyyymm) left join users u on u.created_at < gs.yyyymm + interval '1 month' and (u.erased_at >= gs.yyymm or u.erased_at is null) group by gs.yyyymm;
Note: This is users active at any point during the month, which is how I interpret your question. You can tweak the query if you really mean:
- Users active for the entire month.
- Users active on the first day of the month.
- Users active on the last day of the month.
EDIT:
You can use the dates in the data using:
select gs.yyyymm, count(u.id) from (select yyyymm from (select min(erased_at) as min_ea, max(erased_at) as max_ea from users ) u cross join lateral generate_series(date_trunc('month', min_ea), date_trunc('month', max_ea), interval '1 month' ) gs left join users u on u.created_at < gs.yyyymm + interval '1 month' and (u.erased_at >= gs.yyymm or u.erased_at is null) group by gs.yyyymm;