From this table of dates and categories inputs :
I’d like to get these folowing table, showing number of rows per category for each first and second quarter, and the growth of number of rows between second and first quarter, in value, and in percentage – which is a simple statistic table type that could be met to get the number of item per period of time and its growth.
What would be the sql query to get this table ?
Here is the SQL code to create the sql table, in order you to reproduce the schema:
CREATE TABLE `table_a_test_table` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `date_row` timestamp NULL DEFAULT NULL, `category` varchar(255) NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB;
Here is the sql code to fill the table with the data :
INSERT INTO `table_a_test_table` (`date_row`, `category`) VALUES ('2020-01-03 00:00:00', 'A'), ('2020-02-02 00:00:00', 'A'), ('2020-03-08 00:00:00', 'B'), ('2020-02-06 00:00:00', 'C'), ('2020-04-07 00:00:00', 'B'), ('2020-05-21 00:00:00', 'A'), ('2020-06-07 00:00:00', 'C'), ('2020-06-08 00:00:00', 'B') ;
I’ve tried the following sql code, to get the result table, but I do not know where I should, and how I should introduce the category
field in order to get a group by output.
SELECT nb_rows_q1.nb_of_rows AS `number of row in q1`, nb_rows_q2.nb_of_rows AS `number of row in q2`, ((nb_rows_q2.nb_of_rows - nb_rows_q1.nb_of_rows)/nb_rows_q1.nb_of_rows)*100 AS `growth nb of rows between q2 vs q1` FROM ( SELECT COUNT(id) AS nb_of_rows FROM table_a_test_table WHERE date_row >= '2020-01-01 00:00:00' AND date_row < '2020-04-01 00:00:00' ) AS nb_rows_q1, ( SELECT COUNT(id) AS nb_of_rows FROM table_a_test_table WHERE date_row >= '2020-04-01 00:00:00' AND date_row < '2020-07-01 00:00:00' ) AS nb_rows_q2 ;
The above code, returns the following :
So now, I’d like t place the category
field into the code. But I do not knwo how to do it.
Any idea ?
Advertisement
Answer
You would need to aggregate by categories within your subqueries first so you wouldn’t lose the category details in the final projection. See a sample working fiddle and results below:
CREATE TABLE `table_a_test_table` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `date_row` timestamp NULL DEFAULT NULL, `category` varchar(255) NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB; INSERT INTO `table_a_test_table` (`date_row`, `category`) VALUES ('2020-01-03 00:00:00', 'A'), ('2020-02-02 00:00:00', 'A'), ('2020-03-08 00:00:00', 'B'), ('2020-02-06 00:00:00', 'C'), ('2020-04-07 00:00:00', 'B'), ('2020-05-21 00:00:00', 'A'), ('2020-06-07 00:00:00', 'C'), ('2020-06-08 00:00:00', 'B') ;
Query #1
SELECT nb_rows_q1.category, nb_rows_q1.nb_of_rows AS `number of row in q1`, nb_rows_q2.nb_of_rows AS `number of row in q2`, (nb_rows_q2.nb_of_rows - nb_rows_q1.nb_of_rows) as `variation of nb of row q2 vs q1`, ((nb_rows_q2.nb_of_rows - nb_rows_q1.nb_of_rows)/nb_rows_q1.nb_of_rows)*100 AS `growth nb of rows between q2 vs q1` FROM ( SELECT category, COUNT(id) AS nb_of_rows FROM table_a_test_table WHERE date_row >= '2020-01-01 00:00:00' AND date_row < '2020-04-01 00:00:00' GROUP BY category ) AS nb_rows_q1 INNER JOIN ( SELECT category,COUNT(id) AS nb_of_rows FROM table_a_test_table WHERE date_row >= '2020-04-01 00:00:00' AND date_row < '2020-07-01 00:00:00' GROUP BY category ) AS nb_rows_q2 ON nb_rows_q1.category = nb_rows_q2.category ;
category | number of row in q1 | number of row in q2 | variation of nb of row q2 vs q1 | growth nb of rows between q2 vs q1 |
---|---|---|---|---|
A | 2 | 1 | -1 | -50.0000 |
B | 1 | 2 | 1 | 100.0000 |
C | 1 | 1 | 0 | 0.0000 |
Update 1
Another approach has been included below where the aggregation has been done with the help of a case expression. I have also added to your test data, D
only in q2
, E
only in q1
and F
neither in q1
or q2
. This approach includes categories in either quarter. I also added a case expression for categories that are new or only occurring in quarter 2 that would have the growth rate as null
. This is up to you whether you would like the growth rate returned as null
or a default value, I included 100
CREATE TABLE `table_a_test_table` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `date_row` timestamp NULL DEFAULT NULL, `category` varchar(255) NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB; INSERT INTO `table_a_test_table` (`date_row`, `category`) VALUES ('2020-01-03 00:00:00', 'A'), ('2020-02-02 00:00:00', 'A'), ('2020-03-08 00:00:00', 'B'), ('2020-02-06 00:00:00', 'C'), ('2020-04-07 00:00:00', 'B'), ('2020-05-21 00:00:00', 'A'), ('2020-06-07 00:00:00', 'C'), ('2020-06-08 00:00:00', 'B'), ('2020-06-08 00:00:00', 'D'), ('2020-01-03 00:00:00', 'E'), ('2019-01-03 00:00:00', 'F') ;
Query #1
SELECT category, q1 as `number of row in q1`, q2 as `number of row in q2`, (q2 - q1) as `variation of nb of row q2 vs q1`, CASE WHEN q1=0 THEN 100.0 ELSE((q2-q1)/q1)*100 END as `growth nb of rows between q2 vs q1` FROM ( SELECT category, SUM(CASE WHEN date_row < '2020-04-01 00:00:00' THEN 1 ELSE 0 END ) as q1, SUM(CASE WHEN date_row >= '2020-04-01 00:00:00' THEN 1 ELSE 0 END ) as q2 FROM table_a_test_table WHERE date_row BETWEEN '2020-01-01 00:00:00' AND '2020-07-01 00:00:00' GROUP BY category ) summary;
category | number of row in q1 | number of row in q2 | variation of nb of row q2 vs q1 | growth nb of rows between q2 vs q1 |
---|---|---|---|---|
A | 2 | 1 | -1 | -50.0000 |
B | 1 | 2 | 1 | 100.0000 |
C | 1 | 1 | 0 | 0.0000 |
D | 0 | 1 | 1 | 100.0 |
E | 1 | 0 | -1 | -100.0000 |
Feel free to experiment with other filtering options. For example, if you would like to filter out entries like D
or E
that exist in only one quarter you may consider adding a WHERE
clause similar to WHERE q1 > 0 and q2 > 0;
as shown below:
SELECT category, q1 as `number of row in q1`, q2 as `number of row in q2`, (q2 - q1) as `variation of nb of row q2 vs q1`, CASE WHEN q1=0 THEN 100.0 ELSE((q2-q1)/q1)*100 END as `growth nb of rows between q2 vs q1` FROM ( SELECT category, SUM(CASE WHEN date_row < '2020-04-01 00:00:00' THEN 1 ELSE 0 END ) as q1, SUM(CASE WHEN date_row >= '2020-04-01 00:00:00' THEN 1 ELSE 0 END ) as q2 FROM table_a_test_table WHERE date_row BETWEEN '2020-01-01 00:00:00' AND '2020-07-01 00:00:00' GROUP BY category ) summary WHERE q1 > 0 and q2 > 0;
category | number of row in q1 | number of row in q2 | variation of nb of row q2 vs q1 | growth nb of rows between q2 vs q1 |
---|---|---|---|---|
A | 2 | 1 | -1 | -50.0000 |
B | 1 | 2 | 1 | 100.0000 |
C | 1 | 1 | 0 | 0.0000 |
I hope this helps.