Skip to content
Advertisement

How can I calculate an “active users” aggregation from an activity log in SQL?

In PostgreSQL, I have a table that logs activity for all users, with an account ID and a timestamp field:

SELECT account_id, created FROM activity_log;

A single account_id can appear many times in a day, or not at all.

I would like a chart showing the number of “active users” each day, where “active users” means “users who have done any activity within the previous X days”.

If X is 1, then we can just truncate timestamp to ‘day’ and aggregate:

SELECT date_trunc('day', created) AS date, count(DISTINCT account_id) 
FROM activity_log
GROUP BY date_trunc('day', created) ORDER BY date;

If X is exactly 7, then we could truncate to ‘week’ and aggregate – although this gives me only one data point for a week, when I actually want one data point per day.

But I need to solve for the general case of different X, and give a distinct data point for each day.

Advertisement

Answer

One method is to generate the dates and then count using left join and group by or similar logic. The following uses a lateral join:

select gs.dte, al.num_accounts
from generate_series('2021-01-01'::date, '2021-01-31'::date, interval '1 day'
                    ) gs(dte) left join lateral
     (select count(distinct al.account_id) as num_accounts
      from activity_log al
      where al.created >= gs.dte - (<n - 1>) * interval '1 day' and
            al.created < gs.dte + interval '1 day'
     ) al
     on 1=1
order by gs.dte;

<n - 1> is one less than the number of days. So for one week, it would be 6.

User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement