Skip to content
Advertisement

Mysql query : how to calculate statistic growth rate between two period per category

From this table of dates and categories inputs :

enter image description here

I’d like to get these folowing table, showing number of rows per category for each first and second quarter, and the growth of number of rows between second and first quarter, in value, and in percentage – which is a simple statistic table type that could be met to get the number of item per period of time and its growth.

enter image description here

What would be the sql query to get this table ?

Here is the SQL code to create the sql table, in order you to reproduce the schema:

CREATE TABLE `table_a_test_table` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `date_row` timestamp NULL DEFAULT NULL,
  `category` varchar(255) NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB;

Here is the sql code to fill the table with the data :

INSERT INTO `table_a_test_table` (`date_row`, `category`) 
VALUES 
    ('2020-01-03 00:00:00', 'A'),
    ('2020-02-02 00:00:00', 'A'),
    ('2020-03-08 00:00:00', 'B'),
    ('2020-02-06 00:00:00', 'C'),
    ('2020-04-07 00:00:00', 'B'),
    ('2020-05-21 00:00:00', 'A'),
    ('2020-06-07 00:00:00', 'C'),
    ('2020-06-08 00:00:00', 'B')
;

I’ve tried the following sql code, to get the result table, but I do not know where I should, and how I should introduce the category field in order to get a group by output.

SELECT 
    nb_rows_q1.nb_of_rows AS `number of row in q1`,
    nb_rows_q2.nb_of_rows AS `number of row in q2`,
    ((nb_rows_q2.nb_of_rows - nb_rows_q1.nb_of_rows)/nb_rows_q1.nb_of_rows)*100 AS `growth nb of rows between q2 vs q1`
FROM 
    (
    SELECT  COUNT(id) AS nb_of_rows
    FROM    table_a_test_table
    WHERE 
        date_row >= '2020-01-01 00:00:00' 
        AND date_row < '2020-04-01 00:00:00'
    ) AS nb_rows_q1,
    (
    SELECT  COUNT(id) AS nb_of_rows
    FROM    table_a_test_table
    WHERE 
            date_row >= '2020-04-01 00:00:00' 
        AND date_row < '2020-07-01 00:00:00'
    ) AS nb_rows_q2 
;

The above code, returns the following :

enter image description here

So now, I’d like t place the category field into the code. But I do not knwo how to do it.

Any idea ?

Advertisement

Answer

You would need to aggregate by categories within your subqueries first so you wouldn’t lose the category details in the final projection. See a sample working fiddle and results below:

CREATE TABLE `table_a_test_table` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `date_row` timestamp NULL DEFAULT NULL,
  `category` varchar(255) NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB;

INSERT INTO `table_a_test_table` (`date_row`, `category`) 
VALUES 
    ('2020-01-03 00:00:00', 'A'),
    ('2020-02-02 00:00:00', 'A'),
    ('2020-03-08 00:00:00', 'B'),
    ('2020-02-06 00:00:00', 'C'),
    ('2020-04-07 00:00:00', 'B'),
    ('2020-05-21 00:00:00', 'A'),
    ('2020-06-07 00:00:00', 'C'),
    ('2020-06-08 00:00:00', 'B')
;

Query #1

SELECT 
    nb_rows_q1.category,
    nb_rows_q1.nb_of_rows AS `number of row in q1`,
    nb_rows_q2.nb_of_rows AS `number of row in q2`,
    (nb_rows_q2.nb_of_rows - nb_rows_q1.nb_of_rows) as `variation of nb of row q2 vs q1`,
    ((nb_rows_q2.nb_of_rows - nb_rows_q1.nb_of_rows)/nb_rows_q1.nb_of_rows)*100 AS `growth nb of rows between q2 vs q1`
FROM 
    (
    SELECT  category, COUNT(id) AS nb_of_rows
    FROM    table_a_test_table
    WHERE 
        date_row >= '2020-01-01 00:00:00' 
        AND date_row < '2020-04-01 00:00:00'
    GROUP BY category
    ) AS nb_rows_q1 
    INNER JOIN
    (
    SELECT  category,COUNT(id) AS nb_of_rows
    FROM    table_a_test_table
    WHERE 
            date_row >= '2020-04-01 00:00:00' 
        AND date_row < '2020-07-01 00:00:00'
    GROUP BY category
    ) AS nb_rows_q2  ON nb_rows_q1.category = nb_rows_q2.category
;
category number of row in q1 number of row in q2 variation of nb of row q2 vs q1 growth nb of rows between q2 vs q1
A 2 1 -1 -50.0000
B 1 2 1 100.0000
C 1 1 0 0.0000

View on DB Fiddle

Update 1

Another approach has been included below where the aggregation has been done with the help of a case expression. I have also added to your test data, D only in q2, E only in q1 and F neither in q1 or q2. This approach includes categories in either quarter. I also added a case expression for categories that are new or only occurring in quarter 2 that would have the growth rate as null. This is up to you whether you would like the growth rate returned as null or a default value, I included 100

CREATE TABLE `table_a_test_table` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `date_row` timestamp NULL DEFAULT NULL,
  `category` varchar(255) NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB;

INSERT INTO `table_a_test_table` (`date_row`, `category`) 
VALUES 
    ('2020-01-03 00:00:00', 'A'),
    ('2020-02-02 00:00:00', 'A'),
    ('2020-03-08 00:00:00', 'B'),
    ('2020-02-06 00:00:00', 'C'),
    ('2020-04-07 00:00:00', 'B'),
    ('2020-05-21 00:00:00', 'A'),
    ('2020-06-07 00:00:00', 'C'),
    ('2020-06-08 00:00:00', 'B'),
    ('2020-06-08 00:00:00', 'D'),
    ('2020-01-03 00:00:00', 'E'),
    ('2019-01-03 00:00:00', 'F')
;

Query #1

SELECT 
    category,
    q1 as `number of row in q1`,
    q2 as `number of row in q2`,
    (q2 - q1) as `variation of nb of row q2 vs q1`,
    CASE 
        WHEN q1=0 THEN 100.0 
        ELSE((q2-q1)/q1)*100
    END as `growth nb of rows between q2 vs q1`
FROM (
SELECT
    category,
    SUM(CASE WHEN date_row < '2020-04-01 00:00:00' THEN 1 ELSE 0 END ) as q1,
    SUM(CASE WHEN date_row >= '2020-04-01 00:00:00' THEN 1 ELSE 0 END ) as q2
FROM
    table_a_test_table
WHERE 
    date_row BETWEEN '2020-01-01 00:00:00' AND '2020-07-01 00:00:00'
GROUP BY
    category
) summary;
category number of row in q1 number of row in q2 variation of nb of row q2 vs q1 growth nb of rows between q2 vs q1
A 2 1 -1 -50.0000
B 1 2 1 100.0000
C 1 1 0 0.0000
D 0 1 1 100.0
E 1 0 -1 -100.0000

View on DB Fiddle

Feel free to experiment with other filtering options. For example, if you would like to filter out entries like D or E that exist in only one quarter you may consider adding a WHERE clause similar to WHERE q1 > 0 and q2 > 0; as shown below:

SELECT 
    category,
    q1 as `number of row in q1`,
    q2 as `number of row in q2`,
    (q2 - q1) as `variation of nb of row q2 vs q1`,
    CASE 
        WHEN q1=0 THEN 100.0 
        ELSE((q2-q1)/q1)*100
    END as `growth nb of rows between q2 vs q1`
FROM (
SELECT
    category,
    SUM(CASE WHEN date_row < '2020-04-01 00:00:00' THEN 1 ELSE 0 END ) as q1,
    SUM(CASE WHEN date_row >= '2020-04-01 00:00:00' THEN 1 ELSE 0 END ) as q2
FROM
    table_a_test_table
WHERE 
    date_row BETWEEN '2020-01-01 00:00:00' AND '2020-07-01 00:00:00'
GROUP BY
    category
) summary
WHERE q1 > 0 and q2 > 0;
category number of row in q1 number of row in q2 variation of nb of row q2 vs q1 growth nb of rows between q2 vs q1
A 2 1 -1 -50.0000
B 1 2 1 100.0000
C 1 1 0 0.0000

View on DB Fiddle

I hope this helps.

User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement