Cohort Analysis using SQL (Snowflake)

I am doing a cohort analysis using the table TRANSACTIONS. Below is the table schema,

Below is a quick query to see how USER_ID 12345 (an example) goes through the different cohorts based on the date filter provided,

The result for this query with the time frame (two weeks) would be

and this USER_ID would be classified as a Regular User Cohort (one who has made more than 10 payments) for the provided date filter

If the same query is run with the time frame as just one day say '2021-02-07', the result would be

and this USER_ID would be classified as an Occasional User Cohort (one who has made between 1 and 10 payments) for the provided date filter
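
The two cohort definitions above come down to a threshold on the number of payments inside the chosen window. A minimal sketch in Python: `classify_cohort` is a hypothetical helper name, and returning `None` for zero payments is an assumption, since the question does not say how such users are handled.

```python
from typing import Optional


def classify_cohort(payment_count: int) -> Optional[str]:
    """Map a payment count within the date window to a cohort label."""
    if payment_count > 10:
        # More than 10 payments in the window.
        return "REGULAR USERS"
    if payment_count >= 1:
        # Between 1 and 10 payments in the window.
        return "OCCASIONAL USERS"
    # Assumption: users with no payments in the window get no cohort.
    return None
```

The count passed in has to be the count for the same date filter, otherwise the label describes a different window.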

I use the query below to bucket the USER_IDs into the two cohorts based on the sum of the payments added,

Ideally the USER_ID 12345 should be bucketed as “OCCASIONAL USERS” as per the provided date filter but the query buckets it as “REGULAR USERS” instead.

Answer

For starters, your CTE could have the redundancy removed like so:

But the reason you are getting this problem is that you are working out which cohort each user belongs in across all time, not within the date filter.

What you want is to move the date filter into all_user_cohort, and rather than building intermediate tables, just sum the number of rows that meet the condition.
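
A rough sketch of that idea, not the original query: the cohort is computed from rows that have already been restricted to the date window. It uses Python's built-in `sqlite3` as a stand-in for Snowflake so it can be run locally. The two-column `transactions` layout, the `payment_date` column name, the sample rows, and the helper `cohorts_for_window` are all assumptions made for illustration.

```python
import sqlite3


def cohorts_for_window(conn, start, end):
    """Bucket each user by the number of payments inside [start, end]."""
    rows = conn.execute(
        """
        SELECT user_id,
               CASE WHEN COUNT(*) > 10 THEN 'REGULAR USERS'
                    ELSE 'OCCASIONAL USERS' END AS cohort
        FROM transactions
        WHERE payment_date BETWEEN ? AND ?  -- filter applied before counting
        GROUP BY user_id
        """,
        (start, end),
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per payment, ISO-formatted date as text.
conn.execute("CREATE TABLE transactions (user_id INTEGER, payment_date TEXT)")

# Made-up sample data: user 12345 pays once a day for two weeks.
dates = [f"2021-02-{day:02d}" for day in range(1, 15)]
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [(12345, d) for d in dates],
)

two_weeks = cohorts_for_window(conn, "2021-02-01", "2021-02-14")
one_day = cohorts_for_window(conn, "2021-02-07", "2021-02-07")
print(two_weeks)
print(one_day)
```

With this sample data the same user lands in a different cohort depending on the window, which is the behaviour the question asks for. Users with no payments in the window simply do not appear in the result.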

This can also be done differently, if that is closer to what you're looking for for other reasons.

User contributions licensed under: CC BY-SA