Skip to content
Advertisement

SQL query NOT EXIST very slow

I’m trying to optimize an SQL query as it is slow, and gets slower when the query result is high.

 SELECT * 
 FROM comments  
 WHERE 
     DATE(created_on) > DATE_SUB(CURDATE(), INTERVAL 1 DAY) 
     AND comments.group_id = " . $group_id . " 
     AND comments.user_id != " . $user_id . "  
     AND NOT EXISTS ( 
         SELECT *
         FROM reads
         WHERE 
             comments.post_id = reads.notification_id 
             AND comments.group_id = reads.group_id  
             AND reads.user_id = " . $user_id . " 
             AND comments.nature1 = reads.notification_type
             AND comments.created_on < reads.read_date
     )  
LIMIT 8  

There are indexes concerned fields and Tables are quite big.

Advertisement

Answer

As a starter, this condition:

DATE(created_on) > DATE_SUB(CURDATE(), INTERVAL 1 DAY) 

Should be rewritten as:

created_on >= current_date 

This is functionally equivalent, and not using date functions on the column being filtered gives the database a chance to use an index.

Then, consider the following indexes:

comments(group_id, created_on, user_id)
reads(notification_id, group_id, user_id, notification_type, created_on)

These are two multi-column indexes (called compound indexes), not individual indexes on each column. You’ll notice that they match the where predicates of the query and subquery. The ordering of columns in the index is important (especially in the index on comments): you want columns that have equality predicates first, then columns with inequality predicates.

Finally: do you really need select *? It is better to reduce the list to the columns you actually need; if there is just a few of them, you might want to try and add them to the index on comments.

Side notes:

  • limit without order by is usually not useful. This gives you an arbitrary set of rows out of those that match – and the results may not be consistent over consecutive executions of the same query over the same dataset

  • consider using prepared statements rather than concatenating variables in the query string; this allows MySQL to recognize the query, and reuse an already prepared execution plan (that’s a tiny gain, but always good to take) – and, more important, it prevents SQL injection.

User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement