I have an SQL query which counts user uptake (registrations) by day for a particular group
of users. It also does a cumulative (running total)
select Date, Cumulative, Up as Uptake from ( select Date, Up, @running_total:=@running_total + Up as Cumulative from ( select distinct(date(`audit`.`created_at`)) as Date, COALESCE(f.uptake, 0) as Up from `audit` left join ( select date(`users`.`created_at`) as day, count(`users`.`id`) as uptake from `users` where `users`.`group_uuid` = (select `groups`.`uuid` from `groups` where `groups`.`name` = "companyA") group by day ) f on f.day = date(`audit`.`created_at`) where `audit`.`created_at` between '2019-07-03' and CURDATE() ) c JOIN (SELECT @running_total:=0) r ) final order by Date desc
This query is using the audit
table purely to get me a list of dates reliable even if there are no users created on every day, i know the audit
table with have a record.
I could get all the groups easily with
select `groups`.`name` from `groups`
What i want is… An entry per day and group with the uptake (and cumulative if possible) Example:
Date | Cumulative | Uptake | group 2020-04-07 | 2 | 1 | comapnyA 2020-04-07 | 5 | 3 | comapnyB 2020-04-06 | 1 | 0 | comapnyA 2020-04-06 | 2 | 1 | comapnyB 2020-04-05 | 1 | 1 | comapnyA 2020-04-05 | 1 | 1 | comapnyB .... etc
Advertisement
Answer
For versions of MySQL before 8.0, we could make use of user-defined variables, as demonstrated in the OP query.
A few notes:
It looks like we started with a simple GROUP BY created_date, group_uuid
query against just the users
table to get non-zero counts. But that’s got missing rows, where there otherwise be a zero count.
So it looks like we need is a calendar source (a distinct list of dates in a given range, OP query is using query of audit table as calendar source) and we need that cross joined (Cartesian product) to a distinct list of uuid values from group, along with the name. For this exercise, we are going to assume that uuid
is unique in the groups
table, and that each uuid
value is associated with a distinct name
value. (If that’s not true, we’d need to make some adjustments.)
To get the running total separately by group, we’d need to process rows in group order, then within each group, by ascending date order. As we process rows, we need to reset the running total to zero when we start a new group i.e. when detect a change in the group uuid value.
NOTE: The MySQL Reference Manual gives specific warning about using the user-defined variables, both reading and modifying in the same statement, the order of operations is not guaranteed. Up through MySQL 5.7, with carefully crafted queries, we do observe predictable behavior in the order of operations.
So, we could do something like this:
SELECT q.date AS `Date` , q.running_total AS `Cumulative` , q.uptake AS `Uptake` , q.name AS `Group` FROM ( SELECT @rtot := IF(@prev_uuid = grp.uuid,@rtot,0) + IFNULL(cnt.uptake,0) AS `running_total` , IFNULL(cnt.uptake,0) AS `uptake` , @prev_uuid := grp.uuid AS `uuid` , grp.name AS `name` , cal.date AS `date` FROM ( -- initialize user-defined variables SELECT @prev_uuid := NULL , @rtot := 0 ) i CROSS JOIN ( -- calendar source for distinct date values SELECT DATE(a.created_at) AS `date` FROM `audit` a WHERE a.created_at >= '2019-07-03' AND a.created_at <= DATE(NOW()) GROUP BY DATE(a.created_at) ORDER BY DATE(a.created_at) ) cal CROSS JOIN ( -- distinct list of group uuid we want to return SELECT g.uuid AS `uuid` , MAX(g.name) AS `name` FROM `groups` g WHERE g.name IN ('CompanyA','CompanyB') GROUP BY g.uuid ) grp LEFT JOIN ( -- count by group and date SELECT u.group_uuid AS `group_uuid` , DATE(u.created_at) AS `date` , COUNT(u.id) AS `uptake` FROM `users` u WHERE u.created_at >= '2019-07-03' GROUP BY u.group_uuid , DATE(u.created_at) ) cnt ON grp.uuid = cnt.group_uuid AND cal.date = cnt.date ORDER BY grp.uuid , cal.date ) q ORDER BY q.date DESC , q.name ASC
N.B. We have to be careful with the order of operations in the SELECT list that’s using the user-defined variables, for example, we need to test the value of @prev_uuid (saved from the previous row) before we overwrite it with the value of the current row.
Also, this is not tested, there could be syntax errors, missing parens. I strongly recommend testing each of the inline view queries separately (cal
, grp
, cnt
) to verify expected results are being performed. (The WHERE clause in the grp
query is restricting groups to be returned, adjust or eliminate as required.)
Then we can move on to testing the next outer query q
, and confirm the results from that, before wrapping q
in the final outermost query to re-order rows, and display columns in desired order.
** FOLLOWUP **
to see if we can get MySQL to honor the ORDER BY uuid
, we can modify/replace q
Change from this:
SELECT q.date AS `Date` , q.running_total AS `Cumulative` , q.uptake AS `Uptake` , q.name AS `Group` FROM ( SELECT @rtot := IF(@prev_uuid = grp.uuid,@rtot,0) + IFNULL(cnt.uptake,0) AS `running_total` , IFNULL(cnt.uptake,0) AS `uptake` , @prev_uuid := grp.uuid AS `uuid` , grp.name AS `name` , cal.date AS `date` FROM ... ORDER BY grp.uuid , cal.date ) q ORDER BY q.date DESC , q.name ASC
to
SELECT r.date AS `Date` , r.running_total AS `Cumulative` , r.uptake AS `Uptake` , r.name AS `Group` FROM ( SELECT @rtot := IF(@prev_uuid = q.uuid,@rtot,0) + q.uptake AS `running_total` , q.uptake AS `uptake` , @prev_uuid := q.uuid AS `uuid` , q.name AS `name` , q.date AS `date` FROM ( SELECT grp.uuid AS `uuid` , grp.name AS `name` , cal.date AS `date` , IFNULL(cnt.uptake,0) AS `uptake` FROM ... ORDER BY grp.uuid , cal.date ) q ORDER BY q.uuid , q.date ) r ORDER BY r.date DESC , r.name ASC