
SQL MAX on primary key, is filter condition unnecessary if it is already indexed?

select MAX(id) from studenthistory 
where class_id = 1 
   and date(created_at) = '2021-11-05' 
   and time(created_at) > TIME('04:00:00') 
group by student_id

Composite index: (class_id, student_id, created_at)

id is the primary key.

Is the date(created_at) = '2021-11-05' and time(created_at) > TIME('04:00:00') filter condition unnecessary for the MAX function, since studenthistory is already indexed on class_id and student_id?

The only reason I added that datetime filter is that this table will get huge over time (it holds historical data), and I wanted to reduce the number of rows the query has to scan.

But in the case of the MAX function, I believe MAX would simply fetch the last value without checking the whole row, as long as the column is indexed.

So can I safely remove the datetime filter and turn it into:

select MAX(id) from studenthistory 
where class_id = 1 
group by student_id

And get the same performance? (Or better, since it does not need to filter further?)

Checking the query plan, the performance seems similar, but the table is rather small as of now.

First:

| -> Group aggregate: max(id)  (cost=1466.30 rows=7254) (actual time=2.555..5.766 rows=3 loops=1)
    -> Filter: ((cast(studenthistory.created_at as date) = '2021-11-05') and (cast(studenthistory.created_at as time(6)) > <cache>(cast('04:00:00' as time))))  (cost=740.90 rows=7254) (actual time=0.762..5.384 rows=5349 loops=1)
        -> Index lookup on studenthistory using idx_studenthistory_class_id_931474 (class_id=1)  (cost=740.90 rows=7254) (actual time=0.029..3.589 rows=14638 loops=1)
 |

1 row in set (0.00 sec)

Second:

| -> Group aggregate: max(studenthistory.id)  (cost=1475.40 rows=7299) (actual time=0.545..5.271 rows=10 loops=1)
    -> Index lookup on studenthistory using idx_studenthistory_class_id_931474 (class_id=1)  (cost=745.50 rows=7299) (actual time=0.026..4.164 rows=14729 loops=1)
 |
1 row in set (0.01 sec)

Many thanks in advance


UPDATE: applying @Rick James's suggestion:

Changed index to (class_id, student_id, id).

FLUSH STATUS;
explain FORMAT=JSON SELECT MAX(`id`) `0` FROM `studenthistory`
    WHERE `class_id`=1 AND `created_at`>='2021-11-05T18:25:50.544850+00:00'
    GROUP BY `student_id`;


| {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "940.10"
    },
    "grouping_operation": {
      "using_filesort": false,
      "table": {
        "table_name": "studenthistory",
        "access_type": "ref",
        "possible_keys": [
          "fk_studenthist_student_e25b0310",
          "idx_studenthistory_class_id_931474"
        ],
        "key": "idx_studenthistory_class_id_931474",
        "used_key_parts": [
          "class_id"
        ],
        "key_length": "4",
        "ref": [
          "const"
        ],
        "rows_examined_per_scan": 8381,
        "rows_produced_per_join": 2793,
        "filtered": "33.33",
        "cost_info": {
          "read_cost": "102.00",
          "eval_cost": "279.34",
          "prefix_cost": "940.10",
          "data_read_per_join": "130K"
        },
        "used_columns": [
          "id",
          "created_at",
          "student_id",
          "class_id"
        ],
        "attached_condition": "(`test-table`.`studenthistory`.`created_at` >= TIMESTAMP'2021-11-05 18:25:50.54485')"
      }
    }
  }
} |

i.e. only class_id is used from the index (since created_at is no longer in it). rows_produced_per_join is lower (2793) due to the filter.

Without datetime filter:

FLUSH STATUS;
mysql> explain FORMAT=JSON SELECT MAX(`id`) `0` FROM `studenthistory`
    WHERE `class_id`=1  GROUP BY `student_id`;


| {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "854.75"
    },
    "grouping_operation": {
      "using_filesort": false,
      "table": {
        "table_name": "studenthistory",
        "access_type": "ref",
        "possible_keys": [
          "fk_studenthistory_student_e25b0310",
          "idx_studenthistory_class_id_931474"
        ],
        "key": "idx_studenthistory_class_id_931474",
        "used_key_parts": [
          "class_id"
        ],
        "key_length": "4",
        "ref": [
          "const"
        ],
        "rows_examined_per_scan": 8381,
        "rows_produced_per_join": 8381,
        "filtered": "100.00",
        "using_index": true,
        "cost_info": {
          "read_cost": "16.65",
          "eval_cost": "838.10",
          "prefix_cost": "854.75",
          "data_read_per_join": "392K"
        },
        "used_columns": [
          "id",
          "student_id",
          "class_id"
        ]
      }
    }
  }
} |

Runs entirely on the 3-column index ("class_id", "student_id", "id"): same 8381 rows, slightly lower query cost (940 -> 854).

Applying the first query with original index (“class_id”, “student_id”, “created_at”) yields:

FLUSH STATUS;
    explain FORMAT=JSON SELECT MAX(`id`) `0` FROM `studenthistory`
    WHERE `class_id`=1 AND `created_at`>='2021-11-05T18:25:50.544850+00:00'
    GROUP BY `student_id`;

| {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "858.94"
    },
    "grouping_operation": {
      "using_filesort": false,
      "table": {
        "table_name": "studenthistory",
        "access_type": "ref",
        "possible_keys": [
          "fk_studenthistory_student_e25b0310",
          "idx_studenthistory_class_id_931474"
        ],
        "key": "idx_studenthistory_class_id_931474",
        "used_key_parts": [
          "class_id"
        ],
        "key_length": "4",
        "ref": [
          "const"
        ],
        "rows_examined_per_scan": 8381,
        "rows_produced_per_join": 2793,
        "filtered": "33.33",
        "using_index": true,
        "cost_info": {
          "read_cost": "20.84",
          "eval_cost": "279.34",
          "prefix_cost": "858.94",
          "data_read_per_join": "130K"
        },
        "used_columns": [
          "id",
          "created_at",
          "student_id",
          "class_id"
        ],
        "attached_condition": "(`test-table`.`studenthistory`.`created_at` >= TIMESTAMP'2021-11-05 18:25:50.54485')"
      }
    }
  }
} |

The cost this time is 858, with "rows_examined_per_scan": 8381 and "rows_produced_per_join": 2793. Only class_id was used as the key, however (why?), and not the remaining student_id and created_at.


Answer

Query 1

select MAX(id) from studenthistory 
where class_id = 1 
   and date(created_at) = '2021-11-05' 
   and time(created_at) > TIME('04:00:00') 
group by student_id

Don’t split up the date; change it to

AND created_at > '2021-11-05 04:00:00'

If you want to check rows that were ‘created’ on that day, use something like

AND created_at >= '2021-11-05'
AND created_at  < '2021-11-05' + INTERVAL 1 DAY

Or, if you want to check for “today”:

AND created_at >= CURDATE()

After 4am this morning:

AND created_at >= CURDATE() + INTERVAL 4 HOUR

Using date(created_at) makes the created_at part of the INDEX unusable (cf. “sargable”).
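To see the sargability difference concretely, here is a minimal sketch using SQLite instead of MySQL (the plan format differs, but the principle is the same); the table and data are made up to mirror the question:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE studenthistory (
    id INTEGER PRIMARY KEY, class_id INT, student_id INT, created_at TEXT)""")
con.execute("""CREATE INDEX idx_class_created
               ON studenthistory (class_id, created_at)""")

def plan(sql):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail); join the details.
    return " | ".join(r[3] for r in con.execute("EXPLAIN QUERY PLAN " + sql))

# Wrapping the column in date() hides it from the index: only class_id is usable.
print(plan("SELECT MAX(id) FROM studenthistory "
           "WHERE class_id = 1 AND date(created_at) = '2021-11-05'"))

# A plain range on the raw column lets the optimizer use both key parts.
print(plan("SELECT MAX(id) FROM studenthistory "
           "WHERE class_id = 1 AND created_at >= '2021-11-05 04:00:00'"))
```

The first plan searches the index on class_id alone and must evaluate date(created_at) per row; the second shows the range condition applied inside the index lookup.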

select MAX(id) ... group by student_id

Is likely to return multiple rows — one per student. Perhaps you want to get rid of the group by? Or specify a particular student_id?

Query 2 may run faster:

select MAX(id) from studenthistory 
where class_id = 1 
group by student_id

But the optimal index is INDEX(class_id, student_id, id). (It is OK to have both composite indexes.)

It may return multiple rows, so perhaps you want

select student_id, MAX(id) from studenthistory 
where class_id = 1 
group by student_id
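A tiny illustration of why the GROUP BY produces one row per student (SQLite here as a stand-in for MySQL; the data is invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE studenthistory "
            "(id INTEGER PRIMARY KEY, class_id INT, student_id INT)")
con.executemany("INSERT INTO studenthistory VALUES (?, ?, ?)",
                [(1, 1, 10), (2, 1, 10), (3, 1, 20), (4, 2, 10), (5, 1, 20)])

# Groupwise maximum: the latest (highest) id per student within class 1.
rows = con.execute("""SELECT student_id, MAX(id)
                      FROM studenthistory
                      WHERE class_id = 1
                      GROUP BY student_id""").fetchall()
print(sorted(rows))  # → [(10, 2), (20, 5)]
```

Without student_id in the SELECT list, you would get the same number of rows but no way to tell which MAX(id) belongs to which student.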

MAX

I believe MAX would simply fetch the last value without checking the whole row, if it is indexed.

Sometimes.

Your second query can do that. But the first query cannot — because of the range test (on created_at) being in the way.

EXPLAIN

query plan seems … similar

Alas, EXPLAIN leaves details out. You can get some more details with EXPLAIN FORMAT=JSON SELECT ..., but not necessarily enough details.

I think you will find that the second query will give a much smaller value for “Rows” after adding my suggested index.

A way to get an accurate measure of “rows (table or index) touched”:

FLUSH STATUS;
SELECT ...;
SHOW SESSION STATUS LIKE 'Handler%';

Sensor data

For sensor data, consider multiple tables:

  • The raw data (“Fact” table, in Data Warehouse terminology). This has one row per reading per sensor.
  • The latest value for each sensor. This has one row for each of the 90K sensors. It will be a lot easier to maintain this table than to “find the latest” value for each sensor; that’s a “groupwise-maximum” problem.
  • Summary data. An example is to have high/low/average/etc values for each sensor. This has one row per hour (or day or whatever is useful) per sensor.
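The “latest value” table from the second bullet can be maintained with an upsert as each reading arrives, rather than recomputing a groupwise maximum over the Fact table. A sketch in SQLite (MySQL’s equivalent is INSERT ... ON DUPLICATE KEY UPDATE); the table and column names are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE latest_reading (
    sensor_id INTEGER PRIMARY KEY,  -- one row per sensor
    value     REAL,
    read_at   TEXT)""")

def record(sensor_id, value, read_at):
    # Upsert: insert a new sensor, or overwrite its previous latest reading.
    con.execute("""INSERT INTO latest_reading (sensor_id, value, read_at)
                   VALUES (?, ?, ?)
                   ON CONFLICT(sensor_id) DO UPDATE
                   SET value = excluded.value, read_at = excluded.read_at""",
                (sensor_id, value, read_at))

record(7, 21.5, "2021-11-05 04:00:00")
record(7, 22.1, "2021-11-05 04:05:00")  # replaces the earlier reading
print(con.execute("SELECT value FROM latest_reading "
                  "WHERE sensor_id = 7").fetchone())  # → (22.1,)
```

Reading the latest value is then a single-row primary-key lookup, no matter how large the Fact table grows.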