Skip to content
Advertisement

Cohort Analysis using SQL (Snowflake)

I am doing a cohort analysis using the table TRANSACTIONS. Below is the table schema,

USER_ID              NUMBER,
PAYMENT_DATE_UTC     DATE,
IS_PAYMENT_ADDED     BOOLEAN

Below is a quick query to see how USER_ID 12345 (an example) goes through the different cohorts based on the date filter provided,

WITH RESULT(
SELECT
USER_ID,
TO_DATE(PAYMENT_DATE_UTC) AS PAYMENT_DATE,
SUM(CASE WHEN IS_PAYMENT_ADDED=TRUE THEN 1 ELSE 0 END) AS PAYMENT_ADDED_COUNT
FROM TRANSACTIONS
GROUP BY 1,2
HAVING PAYMENT_ADDED_COUNT>=1
ORDER BY 2
)
SELECT
COUNT(DISTINCT r.USER_ID),
SUM(r.PAYMENT_ADDED_COUNT)
FROM RESULT r
WHERE r.USER_ID=12345
AND (r.PAYMENT_DATE>='2021-02-01' AND r.PAYMENT_DATE<'2021-02-15')

The result for this query with the time frame (two weeks) would be

| 1 | 55 |

and this USER_ID would be classified as a Regular User Cohort (one who has made more than 10 payments) for the provided date filter

If the same query is run with the time frame as just one day say '2021-02-07', the result would be

| 1 | 10 |

and this USER_ID would be classified as as Occasional User Cohort (one who has made between 1 and 10 payments) for the provided date filter

I have this below query to bucket the USER_ID’s into the two different cohorts based on the sum of the payments added,

WITH
ALL_USER_COHORT AS 
(SELECT
USER_ID,
SUM(CASE WHEN IS_PAYMENT_ADDED=TRUE THEN 1 ELSE 0 END ) AS PAYMENT_ADDED_COUNT
FROM TRANSACTIONS
GROUP BY USER_ID
),
OCASSIONAL_USER_COHORT AS 
(SELECT
USER_ID,
SUM(CASE WHEN IS_PAYMENT_ADDED=TRUE THEN 1 ELSE 0 END ) AS PAYMENT_ADDED_COUNT
FROM TRANSACTIONS
GROUP BY USER_ID
HAVING (PAYMENT_ADDED_COUNT>=1 AND PAYMENT_ADDED_COUNT<=10)
),
REGULAR_USER_COHORT AS 
(SELECT
USER_ID,
SUM(CASE WHEN IS_PAYMENT_ADDED=TRUE THEN 1 ELSE 0 END ) AS PAYMENT_ADDED_COUNT
FROM TRANSACTIONS
GROUP BY USER_ID
HAVING PAYMENT_ADDED_COUNT>10
)
SELECT
COUNT(DISTINCT ou.USER_ID) AS "OCCASIONAL USERS",
COUNT(DISTINCT ru.USER_ID) AS "REGULAR USERS"
FROM ALL_USER_COHORT au
LEFT JOIN OCASSIONAL_USER_COHORT ou ON au.USER_ID=ou.USER_ID
LEFT JOIN REGULAR_USER_COHORT ru ON au.USER_ID=ru.USER_ID
LEFT JOIN TRANSACTIONS t ON au.USER_ID=t.USER_ID
WHERE au.USER_ID=12345
AND TO_DATE(t.PAYMENT_DATE_UTC)>='2021-02-07'

Ideally the USER_ID 12345 should be bucketed as “OCCASIONAL USERS” as per the provided date filter but the query buckets it as “REGULAR USERS” instead.

Advertisement

Answer

For starters you CTE could have the redundancy removed like so:

WITH all_user_cohort AS (
    SELECT
        USER_ID,
        SUM(IFF(is_payment_added=TRUE, 1,0)) AS payment_added_count
    FROM transactions
    GROUP BY user_id
), ocassional_user_cohort AS (
    SELECT * FROM all_user_cohort
    WHERE PAYMENT_ADDED_COUNT between 1 AND 10
), regular_user_cohort AS (
    SELECT * FROM all_user_cohort
    WHERE PAYMENT_ADDED_COUNT > 10
)
SELECT
COUNT(DISTINCT ou.user_id) AS "OCCASIONAL USERS",
COUNT(DISTINCT ru.user_id) AS "REGULAR USERS"
FROM all_user_cohort AS au
LEFT JOIN ocassional_user_cohort ou ON au.user_id=ou.user_id
LEFT JOIN regular_user_cohort ru ON au.user_id=ru.user_id
LEFT JOIN transactions t ON au.user_id=t.user_id
WHERE au.user_id=12345
AND TO_DATE(t.payment_date_utc)>='2021-03-01'

But the reason you are getting this problem is you are doing the which do the belong in across all time.

What you are wanting is to move the date filter into all_user_cohort, and not making tables when you can just sum the number of rows meeting the need.

WITH all_user_cohort AS (
    SELECT
        USER_ID,
        SUM(IFF(is_payment_added=TRUE, 1,0)) AS payment_added_count
    FROM transactions
    WHERE TO_DATE(payment_date_utc)>='2021-03-01'
    GROUP BY user_id
)   
SELECT
    SUM(IFF(payment_added_count between 1 AND 10, 1,0)) AS "OCCASIONAL USERS"
    SUM(IFF(payment_added_count > 10, 1,0)) AS "REGULAR USERS"
FROM transactions 
WHERE au.user_id=12345

Which can also be done differently, if that is more what your looking for, for other reasons.

WITH all_user_cohort AS (
    SELECT
        USER_ID,
        SUM(IFF(is_payment_added=TRUE, 1,0)) AS payment_added_count
    FROM transactions
    WHERE TO_DATE(payment_date_utc)>='2021-03-01'
    GROUP BY user_id
), classify_users AS (
    SELECT user_id
        ,CASE 
            WHEN payment_added_count between 1 AND 10 THEN 'OCCASIONAL USERS'
            WHEN payment_added_count > 10 THEN 'REGULAR USERS'
            ELSE 'users with zero payments'
        END AS classified
    FROM all_user_cohort
)
SELECT classified
    ,count(*)
FROM classify_users
WHERE user_id=12345
GROUP BY 1 
User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement