Skip to content
Advertisement

SQL MAX on primary key, is filter condition unncessary if it is already indexed?

id is the primary key.

Is date(created_at) = '2021-11-05' and time(created_at) > TIME('04:00:00') filter condition unnecessary for Max function since studenthistory is already indexed on class_id and student_id?

The only reason I added that datetime filter is because this table will get huge over time. (historical data) And I wanted to reduce the number of rows the query has to search.

But for the case of Max function – I believe MAX would simply fetch the last value without checking the whole row, if it is indexed.

So can i safely remove the datetime filter and turn it into

And have the same performance? (or better since it does not need to filter further?)

Checking the query plan seems like the performance is similar, but the size of the table is rather small as of now..

First:

Second:

Many thanks in advance


UPDATE: applying @rick james’s suggestion:

Changed index to (class_id, student_id, id).

i.e. only class_id is used as an index, (as created_at is no longer in the index. rows_produced_per_join is lower due to filter: 2793,

Without datetime filter:

Runs on all 3 indexes (“class_id”, “student_id”, “id”), same 8381 number of rows slightly lower query cost (940 -> 854)

Applying the first query with original index (“class_id”, “student_id”, “created_at”) yields:

The cost this time is 858, rows “rows_examined_per_scan”: 8381, “rows_produced_per_join”: 2793. Only class_id was used as key however. (why.?) not the remaining student_id and created_at

Advertisement

Answer

Query 1

Don’t split up the date; change to

If you want to check rows that were ‘created’ on the day, use something

Or, if you want to check for “today”:

After 4am this morning:

Using date(created_at) makes the created_at part of the INDEX unusable. (cf “sargable”)

Is likely to return multiple rows — one per student. Perhaps you want to get rid of the group by? Or specify a particular student_id?

Query 2 may run faster:

But the optimal index is INDEX(class_id, student_id, id), (It is OK to include both composite indexes.)

It may return multiple rows, so perhaps you want

MAX

I believe MAX would simply fetch the last value without checking the whole row, if it is indexed.

Sometimes.

Your second query can do that. But the first query cannot — because of the range test (on created_at) being in the way.

EXPLAIN

query plan seems … similar

Alas, EXPLAIN leaves details out. You can get some more details with EXPLAIN FORMAT=JSON SELECT ..., but not necessarily enough details.

I think you will find that the second query will give a much smaller value for “Rows” after adding my suggested index.

A way to get an accurate measure of “rows (table or index) touched”:

Sensor data

For sensor data, consider multiple tables:

  • The raw data (“Fact” table, in Data Warehouse terminology). This has one row per reading per sensor.
  • The latest value for each sensor. This has one row for each of the 90K sensors. It will be a lot easier to maintain this table than to “find the latest” value for each sensor; that’s a “groupwise-maximum” problem.
  • Summary data. An example is to have high/low/average/etc values for each sensor. This has one row per hour (or day or whatever is useful) per sensor.
User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement