Skip to content
Advertisement

Mysql query per date and per group

I have an SQL query which counts user uptake (registrations) by day for a particular group of users. It also does a cumulative (running total)

select Date, Cumulative, Up as Uptake
from (
    select Date, Up, @running_total:=@running_total + Up as Cumulative
    from (
        select distinct(date(`audit`.`created_at`)) as Date, COALESCE(f.uptake, 0) as Up
        from `audit`
        left join ( 
            select date(`users`.`created_at`) as day, count(`users`.`id`) as uptake
            from `users`
            where `users`.`group_uuid` = (select `groups`.`uuid` from `groups` where `groups`.`name` = "companyA")
            group by day 
        ) f on f.day = date(`audit`.`created_at`)
        where `audit`.`created_at` between '2019-07-03' and CURDATE()
    ) c
    JOIN (SELECT @running_total:=0) r
) final
order by Date desc

This query is using the audit table purely to get me a list of dates reliable even if there are no users created on every day, i know the audit table with have a record.

I could get all the groups easily with

select `groups`.`name` from `groups`

What i want is… An entry per day and group with the uptake (and cumulative if possible) Example:

Date         |  Cumulative  |      Uptake     |   group
2020-04-07   |    2         |        1        |   comapnyA  
2020-04-07   |    5         |        3        |   comapnyB  
2020-04-06   |    1         |        0        |   comapnyA  
2020-04-06   |    2         |        1        |   comapnyB  
2020-04-05   |    1         |        1        |   comapnyA  
2020-04-05   |    1         |        1        |   comapnyB  
.... etc

Advertisement

Answer

For versions of MySQL before 8.0, we could make use of user-defined variables, as demonstrated in the OP query.

A few notes:

It looks like we started with a simple GROUP BY created_date, group_uuid query against just the users table to get non-zero counts. But that’s got missing rows, where there otherwise be a zero count.

So it looks like we need is a calendar source (a distinct list of dates in a given range, OP query is using query of audit table as calendar source) and we need that cross joined (Cartesian product) to a distinct list of uuid values from group, along with the name. For this exercise, we are going to assume that uuid is unique in the groups table, and that each uuid value is associated with a distinct name value. (If that’s not true, we’d need to make some adjustments.)

To get the running total separately by group, we’d need to process rows in group order, then within each group, by ascending date order. As we process rows, we need to reset the running total to zero when we start a new group i.e. when detect a change in the group uuid value.

NOTE: The MySQL Reference Manual gives specific warning about using the user-defined variables, both reading and modifying in the same statement, the order of operations is not guaranteed. Up through MySQL 5.7, with carefully crafted queries, we do observe predictable behavior in the order of operations.

So, we could do something like this:

SELECT q.date           AS `Date`
     , q.running_total  AS `Cumulative`
     , q.uptake         AS `Uptake`
     , q.name           AS `Group`
  FROM ( SELECT @rtot := IF(@prev_uuid = grp.uuid,@rtot,0) + IFNULL(cnt.uptake,0) AS `running_total`
              , IFNULL(cnt.uptake,0)                                              AS `uptake` 
              , @prev_uuid := grp.uuid                                           AS `uuid`
              , grp.name                                                         AS `name`
              , cal.date                                                         AS `date`
           FROM ( -- initialize user-defined variables
                  SELECT @prev_uuid := NULL
                       , @rtot := 0
                ) i
          CROSS
           JOIN ( -- calendar source for distinct date values
                  SELECT DATE(a.created_at)  AS `date`
                    FROM `audit` a
                   WHERE a.created_at >= '2019-07-03'
                     AND a.created_at <= DATE(NOW())
                   GROUP BY DATE(a.created_at)
                   ORDER BY DATE(a.created_at)
                ) cal
          CROSS
           JOIN ( -- distinct list of group uuid we want to return
                  SELECT g.uuid       AS `uuid`
                       , MAX(g.name)  AS `name`
                    FROM `groups` g
                   WHERE g.name IN ('CompanyA','CompanyB')
                   GROUP BY g.uuid
                ) grp
           LEFT
           JOIN ( -- count by group and date
                  SELECT u.group_uuid        AS `group_uuid`
                       , DATE(u.created_at)  AS `date`
                       , COUNT(u.id)         AS `uptake`
                    FROM `users` u
                   WHERE u.created_at >= '2019-07-03'
                   GROUP
                      BY u.group_uuid
                       , DATE(u.created_at)
                ) cnt
             ON grp.uuid = cnt.group_uuid
            AND cal.date = cnt.date
          ORDER
             BY grp.uuid
              , cal.date
       ) q
 ORDER
    BY q.date DESC
     , q.name ASC

N.B. We have to be careful with the order of operations in the SELECT list that’s using the user-defined variables, for example, we need to test the value of @prev_uuid (saved from the previous row) before we overwrite it with the value of the current row.

Also, this is not tested, there could be syntax errors, missing parens. I strongly recommend testing each of the inline view queries separately (cal, grp, cnt) to verify expected results are being performed. (The WHERE clause in the grp query is restricting groups to be returned, adjust or eliminate as required.)

Then we can move on to testing the next outer query q, and confirm the results from that, before wrapping q in the final outermost query to re-order rows, and display columns in desired order.

** FOLLOWUP **

to see if we can get MySQL to honor the ORDER BY uuid , we can modify/replace q

Change from this:

SELECT q.date           AS `Date`
     , q.running_total  AS `Cumulative`
     , q.uptake         AS `Uptake`
     , q.name           AS `Group`
  FROM ( SELECT @rtot := IF(@prev_uuid = grp.uuid,@rtot,0) + IFNULL(cnt.uptake,0) AS `running_total`
              , IFNULL(cnt.uptake,0)                                              AS `uptake` 
              , @prev_uuid := grp.uuid                                           AS `uuid`
              , grp.name                                                         AS `name`
              , cal.date                                                         AS `date`
           FROM 
                ...
          ORDER
             BY grp.uuid
              , cal.date
       ) q
 ORDER
    BY q.date DESC
     , q.name ASC

to

SELECT r.date           AS `Date`
     , r.running_total  AS `Cumulative`
     , r.uptake         AS `Uptake`
     , r.name           AS `Group`
  FROM ( SELECT @rtot := IF(@prev_uuid = q.uuid,@rtot,0) + q.uptake AS `running_total`
              , q.uptake                                            AS `uptake`
              , @prev_uuid := q.uuid                                AS `uuid`
              , q.name                                              AS `name`
              , q.date                                              AS `date`
           FROM ( SELECT grp.uuid             AS `uuid`
                       , grp.name             AS `name`
                       , cal.date             AS `date`
                       , IFNULL(cnt.uptake,0) AS `uptake`
                    FROM
                         ...
                   ORDER
                      BY grp.uuid
                       , cal.date
                ) q
          ORDER
             BY q.uuid
              , q.date
       ) r
ORDER
    BY r.date DESC
     , r.name ASC
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement