SQL 30 day active user query

Question

I have a table of users and how many events they fired on a given date:

DATE	USERID	EVENTS
2021-08-27	1	5
2021-07-25	1	7
2021-07-23	2	3
2021-07-20	3	9
2021-06-22	1	9
2021-05-05	1	4
2021-05-05	2	2
2021-05-05	3	6
2021-05-05	4	8
2021-05-05	5	1

I want to create a table showing number of active users

Accepted Answer

This is tricky to do with window functions, because count(distinct) is not permitted as a window function. You can use a self-join instead (in both queries, table stands in for your actual table name):

    select t1.date, count(distinct t2.userid) as active_users
    from table t1 join
         table t2
         on t2.date <= t1.date and
            t2.date > t1.date - interval '30 day'
    group by t1.date;

However, that can be expensive. One solution is to “unpivot” the data. That is, do an incremental count per user of going “in” and “out” of active states and then do a cumulative sum:

    with d as (  -- calculate the dates with "ins" and "outs"
          select userid, date, +1 as inc
          from table
          union all
          select userid, date + interval '30 day', -1 as inc
          from table
         ),
         d2 as (  -- accumulate to get the net actives per day
          select date, userid, sum(inc) as change_on_day,
                 sum(sum(inc)) over (partition by userid order by date) as running_inc
          from d
          group by date, userid
         ),
         d3 as (  -- summarize into active periods, one row per user and period
          select userid, min(date) as start_date, max(date) as end_date
          from (select d2.*,
                       -- number of periods closed before this row, so the row
                       -- that closes a period stays grouped with that period
                       sum(case when running_inc = 0 then 1 else 0 end)
                           over (partition by userid order by date)
                       - (case when running_inc = 0 then 1 else 0 end) as active_period
                from d2
               ) p
          group by userid, active_period
         )
    select d.date, count(d3.userid) as active_users
    from (select distinct date from table) d left join
         d3
         on d.date >= d3.start_date and d.date < d3.end_date
    group by d.date;
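To sanity-check both ideas against the sample data from the question, here is a minimal sketch in Python using the standard-library sqlite3 module. The table name events, the function names, and the use of SQLite's date() modifier in place of interval arithmetic are assumptions made for illustration; they are not part of the original answer. The first function runs the self-join, and the second mirrors the "in"/"out" approach in plain Python rather than SQL.

```python
import sqlite3
from collections import defaultdict
from datetime import date, timedelta

# Sample rows from the question: (date, userid, events)
ROWS = [
    ("2021-08-27", 1, 5), ("2021-07-25", 1, 7), ("2021-07-23", 2, 3),
    ("2021-07-20", 3, 9), ("2021-06-22", 1, 9), ("2021-05-05", 1, 4),
    ("2021-05-05", 2, 2), ("2021-05-05", 3, 6), ("2021-05-05", 4, 8),
    ("2021-05-05", 5, 1),
]


def active_users_self_join(rows):
    """Self-join version, run in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    # "events" is a hypothetical table name chosen for this sketch.
    conn.execute("CREATE TABLE events (date TEXT, userid INTEGER, events INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    query = """
        SELECT t1.date, COUNT(DISTINCT t2.userid) AS active_users
        FROM (SELECT DISTINCT date FROM events) AS t1
        JOIN events AS t2
          ON t2.date <= t1.date
         AND t2.date > date(t1.date, '-30 days')  -- SQLite date arithmetic
        GROUP BY t1.date
        ORDER BY t1.date
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


def active_users_in_out(rows, window_days=30):
    """'In'/'out' version: +1 on each event date, -1 window_days later."""
    changes = defaultdict(lambda: defaultdict(int))
    for day_text, userid, _ in rows:
        day = date.fromisoformat(day_text)
        changes[userid][day] += 1
        changes[userid][day + timedelta(days=window_days)] -= 1

    # Turn each user's running sum into half-open active periods [start, end).
    periods = []
    for per_day in changes.values():
        running, start = 0, None
        for day in sorted(per_day):
            previous = running
            running += per_day[day]
            if previous == 0 and running > 0:
                start = day                   # user becomes active
            elif previous > 0 and running == 0:
                periods.append((start, day))  # user becomes inactive

    # A user's periods never overlap, so counting the periods that cover
    # a day counts distinct active users on that day.
    report_days = sorted({date.fromisoformat(d) for d, _, _ in rows})
    return [
        (day.isoformat(), sum(1 for s, e in periods if s <= day < e))
        for day in report_days
    ]


if __name__ == "__main__":
    for day, active in active_users_self_join(ROWS):
        print(day, active)
```

Both functions report one row per date that appears in the data, which matches what the queries above do. Dates with no events are not reported unless you join against a calendar of dates instead.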

Answer