I am doing a cohort analysis using the table TRANSACTIONS. Below is the table schema,
USER_ID NUMBER, PAYMENT_DATE_UTC DATE, IS_PAYMENT_ADDED BOOLEAN
Below is a quick query to see how USER_ID 12345 (an example) goes through the different cohorts based on the date filter provided,
WITH RESULT( SELECT USER_ID, TO_DATE(PAYMENT_DATE_UTC) AS PAYMENT_DATE, SUM(CASE WHEN IS_PAYMENT_ADDED=TRUE THEN 1 ELSE 0 END) AS PAYMENT_ADDED_COUNT FROM TRANSACTIONS GROUP BY 1,2 HAVING PAYMENT_ADDED_COUNT>=1 ORDER BY 2 ) SELECT COUNT(DISTINCT r.USER_ID), SUM(r.PAYMENT_ADDED_COUNT) FROM RESULT r WHERE r.USER_ID=12345 AND (r.PAYMENT_DATE>='2021-02-01' AND r.PAYMENT_DATE<'2021-02-15')
The result for this query with the time frame (two weeks) would be
| 1 | 55 |
and this USER_ID would be classified as a Regular User Cohort (one who has made more than 10 payments) for the provided date filter
If the same query is run with the time frame as just one day say '2021-02-07'
, the result would be
| 1 | 10 |
and this USER_ID would be classified as as Occasional User Cohort (one who has made between 1 and 10 payments) for the provided date filter
I have this below query to bucket the USER_ID’s into the two different cohorts based on the sum of the payments added,
WITH ALL_USER_COHORT AS (SELECT USER_ID, SUM(CASE WHEN IS_PAYMENT_ADDED=TRUE THEN 1 ELSE 0 END ) AS PAYMENT_ADDED_COUNT FROM TRANSACTIONS GROUP BY USER_ID ), OCASSIONAL_USER_COHORT AS (SELECT USER_ID, SUM(CASE WHEN IS_PAYMENT_ADDED=TRUE THEN 1 ELSE 0 END ) AS PAYMENT_ADDED_COUNT FROM TRANSACTIONS GROUP BY USER_ID HAVING (PAYMENT_ADDED_COUNT>=1 AND PAYMENT_ADDED_COUNT<=10) ), REGULAR_USER_COHORT AS (SELECT USER_ID, SUM(CASE WHEN IS_PAYMENT_ADDED=TRUE THEN 1 ELSE 0 END ) AS PAYMENT_ADDED_COUNT FROM TRANSACTIONS GROUP BY USER_ID HAVING PAYMENT_ADDED_COUNT>10 ) SELECT COUNT(DISTINCT ou.USER_ID) AS "OCCASIONAL USERS", COUNT(DISTINCT ru.USER_ID) AS "REGULAR USERS" FROM ALL_USER_COHORT au LEFT JOIN OCASSIONAL_USER_COHORT ou ON au.USER_ID=ou.USER_ID LEFT JOIN REGULAR_USER_COHORT ru ON au.USER_ID=ru.USER_ID LEFT JOIN TRANSACTIONS t ON au.USER_ID=t.USER_ID WHERE au.USER_ID=12345 AND TO_DATE(t.PAYMENT_DATE_UTC)>='2021-02-07'
Ideally the USER_ID 12345 should be bucketed as “OCCASIONAL USERS” as per the provided date filter but the query buckets it as “REGULAR USERS” instead.
Advertisement
Answer
For starters you CTE could have the redundancy removed like so:
WITH all_user_cohort AS ( SELECT USER_ID, SUM(IFF(is_payment_added=TRUE, 1,0)) AS payment_added_count FROM transactions GROUP BY user_id ), ocassional_user_cohort AS ( SELECT * FROM all_user_cohort WHERE PAYMENT_ADDED_COUNT between 1 AND 10 ), regular_user_cohort AS ( SELECT * FROM all_user_cohort WHERE PAYMENT_ADDED_COUNT > 10 ) SELECT COUNT(DISTINCT ou.user_id) AS "OCCASIONAL USERS", COUNT(DISTINCT ru.user_id) AS "REGULAR USERS" FROM all_user_cohort AS au LEFT JOIN ocassional_user_cohort ou ON au.user_id=ou.user_id LEFT JOIN regular_user_cohort ru ON au.user_id=ru.user_id LEFT JOIN transactions t ON au.user_id=t.user_id WHERE au.user_id=12345 AND TO_DATE(t.payment_date_utc)>='2021-03-01'
But the reason you are getting this problem is you are doing the which do the belong in across all time.
What you are wanting is to move the date filter into all_user_cohort
, and not making tables when you can just sum the number of rows meeting the need.
WITH all_user_cohort AS ( SELECT USER_ID, SUM(IFF(is_payment_added=TRUE, 1,0)) AS payment_added_count FROM transactions WHERE TO_DATE(payment_date_utc)>='2021-03-01' GROUP BY user_id ) SELECT SUM(IFF(payment_added_count between 1 AND 10, 1,0)) AS "OCCASIONAL USERS" SUM(IFF(payment_added_count > 10, 1,0)) AS "REGULAR USERS" FROM transactions WHERE au.user_id=12345
Which can also be done differently, if that is more what your looking for, for other reasons.
WITH all_user_cohort AS ( SELECT USER_ID, SUM(IFF(is_payment_added=TRUE, 1,0)) AS payment_added_count FROM transactions WHERE TO_DATE(payment_date_utc)>='2021-03-01' GROUP BY user_id ), classify_users AS ( SELECT user_id ,CASE WHEN payment_added_count between 1 AND 10 THEN 'OCCASIONAL USERS' WHEN payment_added_count > 10 THEN 'REGULAR USERS' ELSE 'users with zero payments' END AS classified FROM all_user_cohort ) SELECT classified ,count(*) FROM classify_users WHERE user_id=12345 GROUP BY 1