I have a problem where I’d like to get all user_id’s in the “active” state for each day. An event is recorded only when the state changes for a user. The state for each user_id should be preserved until an “inactive” event is fired (see example data and outcome). Different users can have their state changed during the same day.
How do I do this? I have tried working with ARRAY_AGG and also grouping the two different events and using lag based on this answer. I get stuck at the phase where I would then need to subtract the user_id’s that get the inactive event from the array.
SELECT DATE("2019-11-11") as date, 1 as user_id, "inactive" as state UNION ALL SELECT DATE("2019-11-12"), 2, "active" UNION ALL SELECT DATE("2019-11-13"), 1, "active" UNION ALL SELECT DATE("2019-11-14"), 1, "inactive" UNION ALL SELECT DATE("2019-11-14"), 3, "active" UNION ALL SELECT DATE("2019-11-15"), 2, "inactive"
Desired output:
date | active_users 2019-11-11| [] 2019-11-12| [2] 2019-11-13| [2,1] 2019-11-14| [2,3] 2019-11-15| [3]
Appreciate the all and any help I can get!
Advertisement
Answer
One method is to generate a series and aggregate. First, get the range of days for activity:
select t.*, date_add(next_inactive, interval -1 day) from (select t.*, min(case when status = 'inactive' then date end) over (partition by user_id order by date desc) as next_inactive, max(date) over () as max_date from t ) t where state = 'active'
Then generate the dates and aggregate:
select day, array_agg(user_id) from (select t.*, date_add(next_inactive, interval -1 day) s last_active_date from (select t.*, min(case when status = 'inactive' then date end) over (partition by user_id order by date desc) as next_inactive, max(date) over () as max_date from t ) t where state = 'active' ) t cross join unnest(generate_date_array(date, coalesce(last_active_date, max_date), interval 1 day) day group by day;