Skip to content
Advertisement

how to make cohort analysis in mysql

I have a table called order_star_member:

create table order_star_member(
   id INT UNSIGNED NOT NULL AUTO_INCREMENT,
   users_id INT(11) NOT NULL,
   createdAt datetime NOT NULL,
   total_price_star_member decimal(10,2) NOT NULL,
   PRIMARY KEY (id)
);

INSERT INTO order_star_member(users_id, createdAt, total_price_star_member)
VALUES
(15, '2021-01-01', 350000),
(15, '2021-01-02', 400000),
(16, '2021-01-02', 700000),
(15, '2021-02-01', 350000),
(16, '2021-02-02', 700000),
(15, '2021-03-01', 350000),
(16, '2021-03-01', 850000),
(17, '2021-03-03', 350000);

DB Fiddle

I want to find users in the month March with transaction >= 700000 and first transaction >= 700000. The user whose transaction is >= 700000 is called star member.

My query so far:

    SELECT COUNT(users_id) count_star_member,
           year_and_month DIV 100 `year`,
           year_and_month MOD 100 `month`
    FROM (SELECT users_id, 
                 MIN(year_and_month) year_and_month
          FROM ( SELECT users_id, 
                        DATE_FORMAT(createdAt, '%Y%m') year_and_month,
                        SUM(total_price_star_member) month_price
                 FROM order_star_member
                 GROUP BY users_id, 
                          DATE_FORMAT(createdAt, '%Y%m') 
                 HAVING month_price >= 350000 ) starrings
          GROUP BY users_id
          HAVING SUM(year_and_month = '202103') > 0 ) first_starrings
    GROUP BY year_and_month
    ORDER BY `year`, `month`;

    +-------------------+------+-------+
    | count_star_member | year | month |
    +-------------------+------+-------+
    |                 1 | 2021 |     1 |
    +-------------------+------+-------+

Explanation: in march 2021, there’s only one ‘star member’, which is users_id 16, whose first transaction is in january 2021, so ‘star member’ in march 2021 is as above.

But starting from March, the definition of ‘star member’ changes from 700,000 to 350,000.

I want to find the ‘star member’ in March, and his first transaction, but if the first transaction is in a month before March 2021, then the star member should be the user whose transaction >= 700,000 — but if the first transaction is in March 2021, as I sid, select a user whose transaction >= 350,000.

Thus my updated expectation:

        +-------------------+------+-------+
        | count_star_member | year | month |
        +-------------------+------+-------+
        |                 2 | 2021 |     1 |
        |                 1 | 2021 |     3 |
        +-------------------+------+-------+

Explanation : users 15, 16, and 17 are star member in march 2021. but users 15 and 16 are doing their first star member in January 2021 (because it is before March 2021, when the requirement to become star member is 700,000), while user 17 is also a star member because the first transaction is 350,000 in March 2021.

Advertisement

Answer

My understanding is that in determining the final output, you need 2 things:

  1. A user’s first transaction
  2. The users who are star members for the requested month using the condition that before March 2021 cumulative monthly transaction amounts >=700000 and after March >=350000

If correct, since you are using a version less than 8.0(where it could be done with one statement) your solution is as follows:

  1. You need a rules table or some configuration of rules (we’ll call it SMLimitDef) which would look like this entered directly in a table:
insert into SMLimitDef(sEffDate,eEffDate,priceLimit) 
VALUES('1980-01-01','2021-02-28',700000),
('2021-03-01','2999-12-31',350000);

  1. Next, you need a query or view that figures out your first transactions(called vFirstUserTransMatch) which would look something like this:
create view vFirstUserTransMatch as
SELECT *,month(osm.createdAt) as createMonth, year(osm.createdAt) as createYear 
FROM order_star_member osm
where createdAt=(select MIN(createdAt) from order_star_member b 
                  where b.users_id=osm.users_id
                 )
  1. Next you need a summary view or query that summarizes transactions per month per user
create view vOSMSummary as
SELECT users_id,month(osm.createdAt) as createMonth, year(osm.createdAt) as createYear, sum(total_price_star_member) as totalPrice 
FROM order_star_member osm
group by users_id,month(osm.createdAt), year(osm.createdAt);
  1. Next you need a query that puts it all together based on your criteria:
select osm.*,futm.createMonth as firstMonth, futm.createYear as firstYear 
from vOSMSummary osm 
inner join vFirstUserTransMatch futm
on osm.users_id=futm.users_id
where exists(select 'x' from SMLimitDef c 
             where osm.createMonth between Month(c.sEffDate) and Month(c.eEffDate)
             and osm.createYear between Year(c.sEffDate) and Year(c.eEffDate)
             and osm.totalPrice>=c.pricelimit
            )
and osm.CreateMonth=3 and osm.createYear=2021
  1. Lastly, you can do your summary
SELECT COUNT(users_id) count_star_member,
       firstYear `year`,
       firstMonth `month`
FROM (
select osm.*,futm.createMonth as firstMonth, futm.createYear as firstYear 
from vOSMSummary osm 
inner join vFirstUserTransMatch futm
on osm.users_id=futm.users_id
where exists(select 'x' from SMLimitDef c 
             where osm.createMonth between Month(c.sEffDate) and Month(c.eEffDate)
             and osm.createYear between Year(c.sEffDate) and Year(c.eEffDate)
             and osm.totalPrice>=c.pricelimit
            )
and osm.CreateMonth=3 and osm.createYear=2021
) d
group by firstYear, firstMonth

Like I said, if you were using mySQL 8, everything could be in one query using “With” statements but for your version, for readability and simplicity, you need views otherwise you can still embed the sql for those views into the final sql.

Fiddle looks like this

Contrast with version 8 which looks like this

User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement