
How to get an array of user_ids that are active at a specific point in time, based on their latest event, in BigQuery?

I'd like to get all user_ids that are in the "active" state on each day. An event is recorded only when a user's state changes, so each user_id should stay in the "active" state until an "inactive" event is fired for it (see example data and outcome). Different users can have their state changed on the same day.

How do I do this? I have tried working with ARRAY_AGG, and also grouping the two different events and using LAG based on this answer. I get stuck at the step where I need to remove the user_ids that receive the "inactive" event from the array.

Desired output:

I appreciate any and all help I can get!


Answer

One method is to generate a series and aggregate. First, get the range of days for activity:
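A minimal sketch of this step, not a definitive query. The table name my_dataset.events and the columns user_id, state and event_ts are assumptions made for illustration, since the question's sample data is not shown here. Each state change is treated as the start of a range that lasts until the same user's next state change:

```sql
-- Assumed (hypothetical) table: my_dataset.events
--   user_id  INT64
--   state    STRING     -- 'active' or 'inactive'
--   event_ts TIMESTAMP  -- when the state change was recorded
SELECT
  user_id,
  state,
  DATE(event_ts) AS start_date,
  -- The same user's next state change ends this range; NULL means it is still open.
  LEAD(DATE(event_ts)) OVER (PARTITION BY user_id ORDER BY event_ts) AS next_date
FROM my_dataset.events;
```

Rows with state = 'active' then describe the periods during which a user should be counted.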

Then generate the dates and aggregate:
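Putting both steps together could look roughly like the following. Again, my_dataset.events and its columns (user_id, state, event_ts) are hypothetical names, and the query assumes a user stops counting as active on the day the "inactive" event is recorded. Open-ended ranges are extended to the current date:

```sql
-- Sketch only: table and column names are assumptions, not taken from the question.
WITH ranges AS (
  SELECT
    user_id,
    state,
    DATE(event_ts) AS start_date,
    -- Computed over all events, so an 'inactive' event closes the preceding 'active' range.
    LEAD(DATE(event_ts)) OVER (PARTITION BY user_id ORDER BY event_ts) AS next_date
  FROM my_dataset.events
)
SELECT
  day,
  ARRAY_AGG(DISTINCT user_id ORDER BY user_id) AS active_user_ids
FROM ranges
CROSS JOIN UNNEST(
  GENERATE_DATE_ARRAY(
    start_date,
    -- Closed ranges end the day before the next event; open ranges run through today.
    COALESCE(DATE_SUB(next_date, INTERVAL 1 DAY), CURRENT_DATE())
  )
) AS day
WHERE state = 'active'
GROUP BY day
ORDER BY day;
```

If the state flips back on the same day, the end date falls before the start date and GENERATE_DATE_ARRAY returns an empty array, so that range contributes no days. To count a user as active on the day of the "inactive" event as well, use next_date in place of the DATE_SUB expression.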

User contributions licensed under: CC BY-SA