Skip to content
Advertisement

MySql improve GROUP BY ORDER BY query with index

I have a table like this:

╔════╦══════════════╦═══════════╗
║ id ║  product     ║ date      ║
╠════╬══════════════╬═══════════╣
║  1 ║ P1           ║ 2020-08-01║
║  2 ║ P2           ║ 2020-08-01║
║  3 ║ P2           ║ 2020-08-03║
║  4 ║ P3           ║ 2020-08-04║
╚════╩══════════════╩═══════════╝

the goal is to query how many products have been sold per day, so the query is:

SELECT `date`, COUNT(id) AS `totalProductsSoldOnDate` 
FROM products 
GROUP BY `date` 
ORDER BY `date` ASC

The output is:

   date        totalProductsSoldOnDate
2020-08-01 |         2
2020-08-03 |         1
2020-08-04 |         1

Ok. Now let us suppose we have a scenario with several records and this query has to be run several times per day, we need to care about performance now in order to retrive the results as fast as possible.

As far as I know, in a situation like this literature recommends to place an index on the column involved in the GROUP BY and ORDER BY, in this case date.

At this point the question is: is this the correct solution?

Moreover, assuming placing the index is fine:

  1. Which is the most efficient index for a scenario like this, and why?
  2. More important: how an index may increase performance? Even with an index on date I need to scan all the rows on a table, or I’m missing something?

Advertisement

Answer

Using a secondary INDEX(date, id) would work faster than as PRIMARY KEY. This is because the BTree that must be scanned is smaller.

To make the task significantly faster, summarize each day’s data that night: http://mysql.rjweb.org/doc.php/summarytables

In this case, let’s look at what you have:

WHERE -- not present
GROUP BY -- just date
ORDER BY -- matches the GROUP BY
the whole query mentions only `date` and `id`

Let’s try to build an optimal index:

  1. Deal with WHERE — nothing to do. (OK to continue)
  2. Add all the GROUP BY columns: INDEX(date,...) (OK to continue)
  3. Since the ORDER BY matches, nothing to do. (OK to continue)
  4. Consider making a “covering index”. That is, add all the rest of the columns needed anywhere in the query. (But don’t make the index “too” big.) INDEX(date, id)

By “OK to continue”, I mean that there was not a brick wall stopping us:

  • 2 range tests, non-sargeable expressions, etc, in WHERE
  • GROUP BY list conflicting with WHERE (not the case here)
  • ORDER BY missing ASC and DESC, or disagreeing with GROUP BY.

In a JOIN with, mentioning more than one table in WHERE (etc) is a brick wall.

User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement