Skip to content
Advertisement

DAU WAU MAU Error in Redshift: [Amazon](500310) Invalid operation: This type of correlated subquery pattern is not supported due to internal error;

I am trying to compute DAU WAU MAU ratios.

  • DAU: active users on the day
  • WAU: active users of the past 7 days
  • MAU: active users of the past 30 days

The DAU:WAU, DAU:MAU, and WAU:MAU ratios compute stickiness of user engagement in terms of %.

I have a table called my_table that entails

  • datetime_field
  • user_id

This table lists all user_id that were active on a given datetime_field.

For the previous 6 months (excluding current month), I want to see the three above mentioned ratios (DAU:WAU, DAU:MAU, WAU:MAU) for every day. In other words for every day, I need the daily active user (a distinct count on users on that day), the active users of the past week (distinct count on users for the last 7 days of that day), and the active users of the past month (distinct count on users for the last 30 days of that day).

When running the following query (with variable names modified):

WITH s AS (
    SELECT
        DATE_TRUNC('day', datetime_field) AS dt,
        COUNT(DISTINCT user_id)::float AS dau
    FROM
        my_table
    WHERE
        datetime_field <= current_date - INTERVAL '1 month'
        AND datetime_field > current_date - INTERVAL '7 months'
    GROUP BY dt
)

SELECT
    dt,
    dau,
    (SELECT COUNT(DISTINCT user_id) FROM my_table WHERE datetime_field BETWEEN dt - 7 * INTERVAL '1 day' AND dt) AS wau,
    (SELECT COUNT(DISTINCT user_id) FROM my_table WHERE datetime_field BETWEEN dt - 30 * INTERVAL '1 day' AND dt) AS mau,
    dau / NULLIF(wau, 0) AS dau_wau,
    wau / NULLIF(mau, 0) AS wau_mau
FROM s
ORDER BY dt

I get the following error:

[Amazon](500310) Invalid operation: This type of correlated subquery pattern is not supported due to internal error;

When I run it with only the DAU:WAU or only the DAU:MAU computation it works. But I also need WAU:MAU, so I have to compute all three metrics in one query. Is there a way to make it work with all metrics?

Advertisement

Answer

Correlated subqueries are exceptionally expensive to compute and Redshift limits support for these to situations where the optimizer can rewrite the logic to be processed efficiently in parallel. See this “Death by correlated subqueries” blog post for an explanation of why they are expensive.

When Redshift gives the “correlated subquery pattern is not supported” message you can typically rewrite the query to something that will run and be much faster. The following rewrite does the comparison over calendar week/month rather than using rolling date windows.

WITH data_set AS (
    SELECT DATE_TRUNC('day', datetime_field) AS dt
         , user_id                           
    FROM  my_table
    --May want to pin this range to calendar months
    WHERE datetime_field <= current_date - INTERVAL '1 month'
      AND datetime_field > current_date - INTERVAL '7 months'
    GROUP BY dt
), daily_count AS (
    SELECT dt
         , DATE_TRUNC('week', dt)   AS wk
         , DATE_TRUNC('month', dt)  AS mth
         , COUNT(DISTINCT user_id)  AS dau
    FROM  data_set
    GROUP BY dt
--Per calendar week (not rolling)
), weekly_count AS (
    SELECT DATE_TRUNC('week', dt)   AS wk
         , COUNT(DISTINCT user_id)  AS wau
    FROM  data_set
    GROUP BY wk
--Per calendar month (not rolling)
), monthly_count AS (
    SELECT DATE_TRUNC('month', dt)  AS mth
         , COUNT(DISTINCT user_id)  AS mau
    FROM  data_set
    GROUP BY mth
)
SELECT dt
     , dau
     , dau / NULLIF(wau, 0) AS dau_wau
     , wau / NULLIF(mau, 0) AS wau_mau
FROM daily_count
JOIN weekly_count  USING (wk)
JOIN monthly_count USING (mth)
ORDER BY dt

Worth noting that the multiple COUNT(DISTINCT x) here are still quite expensive. If you intend to run this analysis frequently and/or “slice and dice” the distinct counts by many other facets then I recommend using of Redshift’s HyperLogLog functions which allow you to calculate approximate distinct counts very cheaply.

User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement