I have a table like this:
╔════╦══════════════╦═══════════╗ ║ id ║ product ║ date ║ ╠════╬══════════════╬═══════════╣ ║ 1 ║ P1 ║ 2020-08-01║ ║ 2 ║ P2 ║ 2020-08-01║ ║ 3 ║ P2 ║ 2020-08-03║ ║ 4 ║ P3 ║ 2020-08-04║ ╚════╩══════════════╩═══════════╝
the goal is to query how many products have been sold per day, so the query is:
SELECT `date`, COUNT(id) AS `totalProductsSoldOnDate` FROM products GROUP BY `date` ORDER BY `date` ASC
The output is:
date totalProductsSoldOnDate 2020-08-01 | 2 2020-08-03 | 1 2020-08-04 | 1
Ok. Now let us suppose we have a scenario with several records and this query has to be run several times per day, we need to care about performance now in order to retrive the results as fast as possible.
As far as I know, in a situation like this literature recommends to place an index on the column involved in the GROUP BY and ORDER BY, in this case date.
At this point the question is: is this the correct solution?
Moreover, assuming placing the index is fine:
- Which is the most efficient index for a scenario like this, and why?
- More important: how an index may increase performance? Even with an index on date I need to scan all the rows on a table, or I’m missing something?
Advertisement
Answer
Using a secondary INDEX(date, id)
would work faster than as PRIMARY KEY
. This is because the BTree that must be scanned is smaller.
To make the task significantly faster, summarize each day’s data that night: http://mysql.rjweb.org/doc.php/summarytables
In this case, let’s look at what you have:
WHERE -- not present GROUP BY -- just date ORDER BY -- matches the GROUP BY the whole query mentions only `date` and `id`
Let’s try to build an optimal index:
- Deal with
WHERE
— nothing to do. (OK to continue) - Add all the
GROUP BY
columns:INDEX(date,...)
(OK to continue) - Since the
ORDER BY
matches, nothing to do. (OK to continue) - Consider making a “covering index”. That is, add all the rest of the columns needed anywhere in the query. (But don’t make the index “too” big.)
INDEX(date, id)
By “OK to continue”, I mean that there was not a brick wall stopping us:
- 2 range tests, non-sargeable expressions, etc, in
WHERE
GROUP BY
list conflicting withWHERE
(not the case here)ORDER BY
missing ASC and DESC, or disagreeing withGROUP BY
.
In a JOIN
with, mentioning more than one table in WHERE
(etc) is a brick wall.