Cohort Analysis using SQL (Snowflake)

Question

I am doing a cohort analysis using the table TRANSACTIONS. Below is the table schema. Below is a quick query to see how USER_ID 12345 (an example) goes through the different cohorts based on the date filter provided. The result for this query with the time frame (two weeks) would be and this USER_ID would be …

Accepted Answer

For starters, your CTE could have the redundancy removed like so:

    WITH all_user_cohort AS (
        SELECT
            USER_ID,
            SUM(IFF(is_payment_added = TRUE, 1, 0)) AS payment_added_count
        FROM transactions
        GROUP BY user_id
    ), occasional_user_cohort AS (
        SELECT * FROM all_user_cohort
        WHERE PAYMENT_ADDED_COUNT BETWEEN 1 AND 10
    ), regular_user_cohort AS (
        SELECT * FROM all_user_cohort
        WHERE PAYMENT_ADDED_COUNT > 10
    )
    SELECT
        COUNT(DISTINCT ou.user_id) AS "OCCASIONAL USERS",
        COUNT(DISTINCT ru.user_id) AS "REGULAR USERS"
    FROM all_user_cohort AS au
    LEFT JOIN occasional_user_cohort ou ON au.user_id = ou.user_id
    LEFT JOIN regular_user_cohort ru ON au.user_id = ru.user_id
    LEFT JOIN transactions t ON au.user_id = t.user_id
    WHERE au.user_id = 12345
      AND TO_DATE(t.payment_date_utc) >= '2021-03-01'

But the reason you are getting this problem is that you are working out which cohort each user belongs in across all time: all_user_cohort has no date filter, so payment_added_count is computed over every transaction, and the date condition in the outer WHERE only filters the joined transaction rows afterwards.

What you want is to move the date filter into all_user_cohort and, rather than building a separate CTE for each cohort, just sum the number of rows that meet each condition:

    WITH all_user_cohort AS (
        SELECT
            USER_ID,
            SUM(IFF(is_payment_added = TRUE, 1, 0)) AS payment_added_count
        FROM transactions
        -- filter before aggregating, so the count only covers this window
        WHERE TO_DATE(payment_date_utc) >= '2021-03-01'
        GROUP BY user_id
    )
    SELECT
        SUM(IFF(payment_added_count BETWEEN 1 AND 10, 1, 0)) AS "OCCASIONAL USERS",
        SUM(IFF(payment_added_count > 10, 1, 0)) AS "REGULAR USERS"
    -- read the per-user counts from the CTE, not from the raw table
    FROM all_user_cohort AS au
    WHERE au.user_id = 12345

This can also be done differently, if that suits your needs better for other reasons:

    WITH all_user_cohort AS (
        SELECT
            USER_ID,
            SUM(IFF(is_payment_added = TRUE, 1, 0)) AS payment_added_count
        FROM transactions
        WHERE TO_DATE(payment_date_utc) >= '2021-03-01'
        GROUP BY user_id
    ), classify_users AS (
        -- one label per user, based on the windowed count
        SELECT user_id,
            CASE
                WHEN payment_added_count BETWEEN 1 AND 10 THEN 'OCCASIONAL USERS'
                WHEN payment_added_count > 10 THEN 'REGULAR USERS'
                ELSE 'users with zero payments'
            END AS classified
        FROM all_user_cohort
    )
    SELECT classified, COUNT(*)
    FROM classify_users
    WHERE user_id = 12345
    GROUP BY 1
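To see why moving the date filter into the CTE changes the cohort, here is a minimal, self-contained sketch that runs the same query shape with Python's built-in sqlite3 module as a stand-in for Snowflake. Assumptions, none of which come from the question: the TRANSACTIONS table is reduced to user_id, payment_date_utc and is_payment_added; the sample rows are made up; CASE WHEN replaces Snowflake's IFF and SQLite's date() replaces TO_DATE; and passing a very early start date (1900-01-01) approximates the "all time" behaviour of the unfiltered CTE. The helper names build_db, cohort_counts and classify are hypothetical.

```python
import sqlite3

# Hypothetical sample data (not from the question): user 12345 added a
# payment 12 times in February 2021 and 3 times in March 2021.
# Columns: (user_id, payment_date_utc, is_payment_added); SQLite stores
# booleans as 0/1.
SAMPLE_ROWS = (
    [(12345, "2021-02-10 08:00:00", 1)] * 12
    + [(12345, "2021-03-05 09:30:00", 1)] * 3
)

# Shared CTE: the date filter sits INSIDE the aggregation, so the count
# (and therefore the cohort) only reflects rows in the chosen window.
# CASE WHEN stands in for Snowflake's IFF, date() for TO_DATE.
COHORT_CTE = """
WITH all_user_cohort AS (
    SELECT user_id,
           SUM(CASE WHEN is_payment_added = 1 THEN 1 ELSE 0 END) AS payment_added_count
    FROM transactions
    WHERE date(payment_date_utc) >= ?
    GROUP BY user_id
)
"""

# Counting approach: sum the users meeting each cohort condition.
COUNTS_SQL = COHORT_CTE + """
SELECT
    SUM(CASE WHEN payment_added_count BETWEEN 1 AND 10 THEN 1 ELSE 0 END) AS occasional_users,
    SUM(CASE WHEN payment_added_count > 10 THEN 1 ELSE 0 END) AS regular_users
FROM all_user_cohort
WHERE user_id = ?
"""

# Classification approach: label each user, then group by the label.
CLASSIFY_SQL = COHORT_CTE + """
, classify_users AS (
    SELECT user_id,
           CASE
               WHEN payment_added_count BETWEEN 1 AND 10 THEN 'OCCASIONAL USERS'
               WHEN payment_added_count > 10 THEN 'REGULAR USERS'
               ELSE 'users with zero payments'
           END AS classified
    FROM all_user_cohort
)
SELECT classified, COUNT(*)
FROM classify_users
WHERE user_id = ?
GROUP BY 1
"""


def build_db():
    """Create an in-memory transactions table filled with SAMPLE_ROWS."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions "
        "(user_id INTEGER, payment_date_utc TEXT, is_payment_added INTEGER)"
    )
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", SAMPLE_ROWS)
    return conn


def cohort_counts(conn, start_date, user_id=12345):
    """Return (occasional_users, regular_users) for one user from start_date on."""
    return conn.execute(COUNTS_SQL, (start_date, user_id)).fetchone()


def classify(conn, start_date, user_id=12345):
    """Return [(cohort_label, count)] for one user from start_date on."""
    return conn.execute(CLASSIFY_SQL, (start_date, user_id)).fetchall()


if __name__ == "__main__":
    conn = build_db()
    # "All time": 15 payments, so the user lands in REGULAR USERS.
    print(cohort_counts(conn, "1900-01-01"))  # (0, 1)
    # Window starting 2021-03-01: only 3 payments, so OCCASIONAL USERS.
    print(cohort_counts(conn, "2021-03-01"))  # (1, 0)
    print(classify(conn, "2021-03-01"))       # [('OCCASIONAL USERS', 1)]
```

The same user moves from REGULAR USERS to OCCASIONAL USERS once the count is restricted to the window, which is exactly the difference between filtering after the CTE and filtering inside it.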


Answer