I need some help to do it right in one query (if it possible). (this is a theoretical example and I assume the presence of events in event_name(like registration/action etc)
I have 3 colums:
-user_id -event_timestamp -event_name
From this 3 columns we need to create new table with 4 new columns:
-user year and month registration time -number of new user registration in this month -number of users who returned to the second calendar month after registration -return probability
Result must be looks like this:
2019-1 | 1 | 1 | 100% 2019-2 | 3 | 2 | 67% 2019-3 | 2 | 0 | 0%
What I’ve done now: I’m use this toy example of my possible main table:
CREATE TABLE `main` ( `event_timestamp` timestamp, `user_id` int(10), `event_name` char(12) ) DEFAULT CHARSET=utf8; INSERT INTO `main` (`event_timestamp`, `user_id`, `event_name`) VALUES ('2019-01-23 20:02:21.550', '1', 'registration'), ('2019-01-24 20:03:21.550', '2', 'action'), ('2019-02-21 20:04:21.550', '3', 'registration'), ('2019-02-22 20:05:21.550', '4', 'registration'), ('2019-02-23 20:06:21.550', '5', 'registration'), ('2019-02-23 20:06:21.550', '1', 'action'), ('2019-02-24 20:07:21.550', '6', 'action'), ('2019-03-20 20:08:21.550', '3', 'action'), ('2019-03-21 20:09:21.550', '4', 'action'), ('2019-03-22 20:10:21.550', '9', 'action'), ('2019-03-23 20:11:21.550', '10', 'registration'), ('2019-03-22 20:10:21.550', '4', 'action'), ('2019-03-22 20:10:21.550', '5', 'action'), ('2019-03-24 20:11:21.550', '11', 'registration');
I’m trying to test some queries to create 4 new columns:
This is for column #1, we select month and year from timestamp where action is registration (as I guess), but I need to sum it for month (like 2019-11, 2019-12)
SELECT DATE_FORMAT(event_timestamp, '%Y-%m') AS column_1 FROM main WHERE event_name='registration';
For column #2 we need to sum users with even_name registration in this month for every month, or.. we can trying for searching first time activity by user_id, but I don’t know how to do this.
Here is some thinks about it…
SELECT COUNT(DISTINCT user_id) AS user_count FROM main GROUP BY MONTH(event_timestamp);
SELECT COUNT(DISTINCT user_id) AS user_count FROM main WHERE event_name='registration';
For column #3 we need to compare user_id
with the event_name
registration
and last month event with any event of the second month so we get users who returned for the next month.
Any idea how to create this query?
This is how to calc column #4
SELECT *, ROUND ((column_3/column_2)*100) AS column_4 FROM main;
Advertisement
Answer
I hope you will find the following answer helpful.
The first column is the extraction of year and month. The new_users
column is the COUNT
of the unique user ids when the action is ‘registration’ since the user can be duplicated from the JOIN
as a result of taking multiple actions the following month. The returned_users
column is the number of users who have an action in the next month from the registration. The returned_users
column needs a DISTINCT
clause since a user can have multiple actions during one month. The final column is the probability that you asked from the two previous columns.
The JOIN
clause is a self-join to bring the users that had at least one action the next month of their registration.
SELECT CONCAT(YEAR(A.event_timestamp),'-',MONTH(A.event_timestamp)), COUNT(DISTINCT(CASE WHEN A.event_name LIKE 'registration' THEN A.user_id END)) AS new_users, COUNT(DISTINCT B.user_id) AS returned_users, CASE WHEN COUNT(DISTINCT(CASE WHEN A.event_name LIKE 'registration' THEN A.user_id END))=0 THEN 0 ELSE COUNT(DISTINCT B.user_id)/COUNT(DISTINCT(CASE WHEN A.event_name LIKE 'registration' THEN A.user_id END))*100 END AS My_Ratio FROM main AS A LEFT JOIN main AS B ON A.user_id=B.user_id AND MONTH(A.event_timestamp)+1=MONTH(B.event_timestamp) AND A.event_name='registration' AND B.event_name='action' GROUP BY CONCAT(YEAR(A.event_timestamp),'-',MONTH(A.event_timestamp))