Rolling 90 days active users in BigQuery, improving performance (DAU/MAU/WAU)

Question

I'm trying to get the number of unique events on a specific date, rolling 90/30/7 days back. I've got this working on a limited number of rows with the query below, but for large data sets I get …

Accepted Answer

Counting unique users requires a lot of resources, even more if you want results over a rolling window. For a scalable solution, look into approximate algorithms like HLL++: https://medium.freecodecamp.org/counting-uniques-faster-in-bigquery-with-hyperloglog-5d3764493a5a

For an exact count, this would work (but gets slower as the window gets larger):

    #standardSQL
    SELECT DATE_SUB(date, INTERVAL i DAY) date_grp
     , COUNT(DISTINCT owner_user_id) unique_90_day_users
     , COUNT(DISTINCT IF(i<31,owner_user_id,null)) unique_30_day_users
     , COUNT(DISTINCT IF(i<8,owner_user_id,null)) unique_7_day_users
    FROM (
      SELECT DATE(creation_date) date, owner_user_id
      FROM `bigquery-public-data.stackoverflow.posts_questions`
      WHERE EXTRACT(YEAR FROM creation_date)=2017
      GROUP BY 1, 2
    ), UNNEST(GENERATE_ARRAY(1, 90)) i
    GROUP BY 1
    ORDER BY date_grp

The approximate solution produces results way faster (14s vs 366s, but then the results are approximate):

    #standardSQL
    SELECT DATE_SUB(date, INTERVAL i DAY) date_grp
     , HLL_COUNT.MERGE(sketch) unique_90_day_users
     , HLL_COUNT.MERGE(DISTINCT IF(i<31,sketch,null)) unique_30_day_users
     , HLL_COUNT.MERGE(DISTINCT IF(i<8,sketch,null)) unique_7_day_users
    FROM (
      SELECT DATE(creation_date) date, HLL_COUNT.INIT(owner_user_id) sketch
      FROM `bigquery-public-data.stackoverflow.posts_questions`
      WHERE EXTRACT(YEAR FROM creation_date)=2017
      GROUP BY 1
    ), UNNEST(GENERATE_ARRAY(1, 90)) i
    GROUP BY 1
    ORDER BY date_grp

Updated query that gives correct results, removing rows with less than 90 days (works when no dates are missing):

    #standardSQL
    SELECT DATE_SUB(date, INTERVAL i DAY) date_grp
     , HLL_COUNT.MERGE(sketch) unique_90_day_users
     , HLL_COUNT.MERGE(DISTINCT IF(i<31,sketch,null)) unique_30_day_users
     , HLL_COUNT.MERGE(DISTINCT IF(i<8,sketch,null)) unique_7_day_users
     , COUNT(*) window_days
    FROM (
      SELECT DATE(creation_date) date, HLL_COUNT.INIT(owner_user_id) sketch
      FROM `bigquery-public-data.stackoverflow.posts_questions`
      WHERE EXTRACT(YEAR FROM creation_date)=2017
      GROUP BY 1
    ), UNNEST(GENERATE_ARRAY(1, 90)) i
    GROUP BY 1
    HAVING window_days=90
    ORDER BY date_grp
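
To make the fan-out in the exact-count query concrete, here is a small Python sketch of the same idea on in-memory data. Each (date, user) row is copied to the 90 group dates `date - i` for i in 1..90, mirroring the cross join with UNNEST(GENERATE_ARRAY(1, 90)), and distinct users are then counted per group date. The function name `rolling_uniques` and the sample rows are hypothetical and not part of the answer; this only illustrates what the SQL computes and is not a replacement for it.

```python
from collections import defaultdict
from datetime import date, timedelta


def rolling_uniques(rows, window=90, sub_windows=(30, 7)):
    """Mimic the fan-out query on an iterable of (date, user_id) rows.

    Returns {date_grp: {window_size: distinct_user_count}}, where date_grp
    is activity_date - i for i in 1..window, as in
    DATE_SUB(date, INTERVAL i DAY).
    """
    sizes = (window,) + tuple(sub_windows)
    users = defaultdict(lambda: {size: set() for size in sizes})
    # Deduplicate first, like the inner GROUP BY 1, 2.
    for day, user in set(rows):
        # Equivalent of the cross join with UNNEST(GENERATE_ARRAY(1, window)).
        for i in range(1, window + 1):
            date_grp = day - timedelta(days=i)
            for size in sizes:
                # Same filter as IF(i<31, ...) and IF(i<8, ...).
                if i <= size:
                    users[date_grp][size].add(user)
    return {
        grp: {size: len(ids) for size, ids in by_size.items()}
        for grp, by_size in sorted(users.items())
    }


# Hypothetical sample data: three users active on a handful of days.
sample = [
    (date(2017, 1, 10), "a"),
    (date(2017, 1, 10), "b"),
    (date(2017, 1, 12), "a"),
    (date(2017, 2, 20), "c"),
]
result = rolling_uniques(sample)
print(result[date(2017, 1, 9)])  # {90: 3, 30: 2, 7: 2}
```

One thing the sketch makes visible: because the group date is computed with DATE_SUB, each date_grp collects activity from the 90 days that follow it. If the window should look backward from date_grp instead, DATE_ADD in the same position would be the variant to try.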

Answer