DAU WAU MAU Error in Redshift: [Amazon](500310) Invalid operation: This type of correlated subquery pattern is not supported due to internal error;

Question

I am trying to compute DAU, WAU, and MAU ratios.

DAU: active users on the day
WAU: active users of the past 7 days
MAU: active users of the past 30 days

The DAU:WAU, DAU:MAU, and WAU:MAU ratios measure the stickiness of user engagement as a percentage. I have a table called my_table that contains:

datetime_field
user_id

This table lists all

Accepted Answer

Correlated subqueries are exceptionally expensive to compute, and Redshift limits support for them to situations where the optimizer can rewrite the logic to be processed efficiently in parallel. See this "Death by correlated subqueries" blog post for an explanation of why they are expensive.

When Redshift gives the "correlated subquery pattern is not supported" message, you can typically rewrite the query into something that will run and be much faster. The following rewrite does the comparison over calendar weeks and months rather than using rolling date windows.

    WITH data_set AS (
        SELECT DATE_TRUNC('day', datetime_field) AS dt
             , user_id
        FROM  my_table
        --May want to pin this range to calendar months
        WHERE datetime_field <= current_date - INTERVAL '1 month'
          AND datetime_field >  current_date - INTERVAL '7 months'
        --One row per user per day
        GROUP BY 1, 2
    --Per day
    ), daily_count AS (
        SELECT dt
             , DATE_TRUNC('week', dt)   AS wk
             , DATE_TRUNC('month', dt)  AS mth
             , COUNT(DISTINCT user_id)  AS dau
        FROM  data_set
        GROUP BY 1, 2, 3
    --Per calendar week (not rolling)
    ), weekly_count AS (
        SELECT DATE_TRUNC('week', dt)   AS wk
             , COUNT(DISTINCT user_id)  AS wau
        FROM  data_set
        GROUP BY 1
    --Per calendar month (not rolling)
    ), monthly_count AS (
        SELECT DATE_TRUNC('month', dt)  AS mth
             , COUNT(DISTINCT user_id)  AS mau
        FROM  data_set
        GROUP BY 1
    )
    SELECT dt
         , dau
         --Cast to FLOAT so the ratios are not truncated by integer division
         , dau::FLOAT / NULLIF(wau, 0) AS dau_wau
         , wau::FLOAT / NULLIF(mau, 0) AS wau_mau
    FROM daily_count
    JOIN weekly_count  USING (wk)
    JOIN monthly_count USING (mth)
    ORDER BY dt

Worth noting that the multiple COUNT(DISTINCT x) here are still quite expensive. If you intend to run this analysis frequently and/or "slice and dice" the distinct counts by many other facets, then I recommend using Redshift's HyperLogLog functions, which allow you to calculate approximate distinct counts very cheaply.
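The HyperLogLog suggestion at the end of the answer could look roughly like the sketch below. It is an illustration, not the answerer's own query: the CTE name daily_sketch and the column users_sketch are hypothetical, the table and column names are taken from the question, and it keeps the same calendar week/month grouping. It builds one sketch per day with HLL_CREATE_SKETCH, merges the daily sketches with HLL_COMBINE, and reads the approximate distinct counts with HLL_CARDINALITY.

```sql
-- Sketch only: approximate DAU/WAU/MAU using Redshift HyperLogLog functions.
-- daily_sketch and users_sketch are hypothetical names chosen for illustration.
WITH daily_sketch AS (
    -- One HLL sketch of the distinct user_id values seen on each day
    SELECT DATE_TRUNC('day', datetime_field) AS dt
         , HLL_CREATE_SKETCH(user_id)        AS users_sketch
    FROM  my_table
    GROUP BY 1
), weekly_count AS (
    -- Merge the daily sketches of each calendar week, then estimate the count
    SELECT DATE_TRUNC('week', dt)                     AS wk
         , HLL_CARDINALITY(HLL_COMBINE(users_sketch)) AS wau
    FROM  daily_sketch
    GROUP BY 1
), monthly_count AS (
    -- Same idea per calendar month
    SELECT DATE_TRUNC('month', dt)                    AS mth
         , HLL_CARDINALITY(HLL_COMBINE(users_sketch)) AS mau
    FROM  daily_sketch
    GROUP BY 1
)
SELECT d.dt
     , HLL_CARDINALITY(d.users_sketch)                          AS dau
     , HLL_CARDINALITY(d.users_sketch)::FLOAT / NULLIF(w.wau, 0) AS dau_wau
     , w.wau::FLOAT / NULLIF(m.mau, 0)                           AS wau_mau
FROM daily_sketch d
JOIN weekly_count  w ON w.wk  = DATE_TRUNC('week',  d.dt)
JOIN monthly_count m ON m.mth = DATE_TRUNC('month', d.dt)
ORDER BY d.dt;
```

Because the sketches can be merged, the daily sketches could presumably be stored in a table once and recombined for other groupings without rescanning my_table. The counts are estimates, so the ratios will differ slightly from the exact COUNT(DISTINCT) version.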

Answer