In PostgreSQL, I have a table that logs activity for all users, with an account ID and a timestamp field:
SELECT account_id, created FROM activity_log;
A single account_id
can appear many times in a day, or not at all.
I would like a chart showing the number of “active users” each day, where “active users” means “users who have done any activity within the previous X days”.
If X is 1, then we can just truncate timestamp to ‘day’ and aggregate:
SELECT date_trunc('day', created) AS date, count(DISTINCT account_id) FROM activity_log GROUP BY date_trunc('day', created) ORDER BY date;
If X is exactly 7, then we could truncate to ‘week’ and aggregate – although this gives me only one data point for a week, when I actually want one data point per day.
But I need to solve for the general case of different X, and give a distinct data point for each day.
Advertisement
Answer
One method is to generate the dates and then count using left join
and group by
or similar logic. The following uses a lateral join:
select gs.dte, al.num_accounts from generate_series('2021-01-01'::date, '2021-01-31'::date, interval '1 day' ) gs(dte) left join lateral (select count(distinct al.account_id) as num_accounts from activity_log al where al.created >= gs.dte - (<n - 1>) * interval '1 day' and al.created < gs.dte + interval '1 day' ) al on 1=1 order by gs.dte;
<n - 1>
is one less than the number of days. So for one week, it would be 6
.