Skip to content
Advertisement

how to filter aggregate query ordered by date

I need to retrieve aggregate information (by means of SUM) on the basis of two separate dates – created_at and erased_at.

The requirement is as follows:

Column Description
Month Month as YYYY-MM
Active_users Total number of unique users in the Month. The users have not been erased during or before the Month.

An example set of data in a “users” table is as below:

id  created_at          erased_at
20  2017-08-17 08:04    2020-08-04 10:17
27  2017-09-08 13:21    2020-08-04 10:15
31  2017-09-10 11:03    2020-10-01 15:42
61  2017-09-19 10:51    2020-08-17 15:13
71  2017-09-20 06:44    2020-08-04 10:15
80  2017-09-20 10:52    2020-08-04 10:17
217 2017-10-10 06:24    2020-08-04 10:16
247 2017-10-11 14:22    2020-08-04 10:15
249 2017-10-11 22:14    2020-08-04 10:15
256 2017-10-12 11:31    2020-08-04 10:17
428 2017-11-02 13:13    2020-10-01 15:15
649 2017-12-11 11:21    2020-10-01 15:16
651 2017-12-11 11:56    2020-08-04 10:15
810 2018-02-06 09:09    2021-07-29 09:03
811 2018-02-06 09:10    2021-07-29 09:03
833 2018-02-09 14:25    2020-08-04 10:16
968 2018-03-17 04:55    2020-10-15 06:08

The particular monthly sums of users created_at in a given month can be made by means of:

SELECT
to_char(users.created_at, 'YYYY MM') AS Month,
count(users.id) AS Created_users,
FROM users

GROUP BY Month
ORDER BY Month DESC

(The same may be done for erased_at, of course.)

However, how do I formulate a query to present the number of active users within any month, who have already been created (not necessarily within the particular month) and who have not been erased at the time?

I’ve tried various subqueries and joins and I’m obviously beyond my paygrade on it and I do understand this probably is a student-level kind of question.

Please, help.

(I’m on Postgresql 9.6 in case it needs to be that advanced.)

Advertisement

Answer

One method generates the months and then uses JOIN and GROUP BY:

select gs.yyyymm, count(u.id)
from generate_series('2018-01-01'::date, '2018-12-01'::date, interval '1 month') gs(yyyymm) left join
     users u
     on u.created_at < gs.yyyymm + interval '1 month' and
        (u.erased_at >= gs.yyymm or u.erased_at is null)
group by gs.yyyymm;

Note: This is users active at any point during the month, which is how I interpret your question. You can tweak the query if you really mean:

  • Users active for the entire month.
  • Users active on the first day of the month.
  • Users active on the last day of the month.

EDIT:

You can use the dates in the data using:

select gs.yyyymm, count(u.id)
from (select yyyymm
      from (select min(erased_at) as min_ea, max(erased_at) as max_ea
            from users
           ) u cross join lateral
           generate_series(date_trunc('month', min_ea), date_trunc('month', max_ea), interval '1 month'
     ) gs left join
     users u
     on u.created_at < gs.yyyymm + interval '1 month' and
        (u.erased_at >= gs.yyymm or u.erased_at is null)
group by gs.yyyymm;

 
User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement