Skip to content
Advertisement

How to optimize datetime comparisons in mysql in where clause

CONTEXT

I have a large table full of “documents” that are updated by outside sources. When I notice the updates are more recent than my last touchpoint I need to address these documents. I’m having some serious performance issues though.

EXAMPLE CODE

select count(*) from documents;

gets me back 212,494,397 documents in 1 min 15.24 sec.

select count(*) from documents where COALESCE( updated_at > last_indexed_at, TRUE);

which is apx the actual query gets me 55,988,860 in 14 min 36.23 sec.

select count(*) from documents where COALESCE( updated_at > last_indexed_at, TRUE) limit 1;

notably takes about 15 minutes as well. (this was surprising to me)

THE PROBLEM

How do I perform the updated_at > last_indexed_at in a more reasonable time?

DETAILS

I’m pretty certain that my query is, in some way, not sargable. Unfortunately, I can’t find what about this query prevents it from being executed on a row independent basis.

select count(*) 
from documents 
where last_indexed_at is null or updated_at > last_indexed_at; 

doesn’t do any better.

nor does

select count( distinct( id ) ) 
from documents 
where last_indexed_at is null or updated_at > last_indexed_at limit 1;

nor does

select count( distinct( id ) ) 
from documents limit 1;

EDIT: FOLLOW UP REQUESTED DATA

This question only involves one table (thankfully) in a rails project, so we conveniently have the rails definition for the table.

/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `documents` (
  `id` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
  `document_id` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
  `document_type` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
  `locale` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
  `allowed_ids` text COLLATE utf8mb4_unicode_ci NOT NULL,
  `fields` mediumtext COLLATE utf8mb4_unicode_ci,
  `created_at` datetime(6) NOT NULL,
  `updated_at` datetime(6) NOT NULL,
  `last_indexed_at` datetime(6) DEFAULT NULL,
  `deleted_at` datetime(6) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `index_documents_on_document_type` (`document_type`),
  KEY `index_documents_on_locale` (`locale`),
  KEY `index_documents_on_last_indexed_at` (`last_indexed_at`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

SELECT VERSION(); got me 5.7.27-30-log

And probably most import,

explain select count(*) from documents where COALESCE( updated_at > last_indexed_at, TRUE);

gets me exactly

+----+-------------+-----------+------------+------+---------------+------+---------+------+-----------+----------+-------------+
| id | select_type | table     | partitions | type | possible_keys | key  | key_len | ref  | rows      | filtered | Extra       |
+----+-------------+-----------+------------+------+---------------+------+---------+------+-----------+----------+-------------+
|  1 | SIMPLE      | documents | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 208793754 |   100.00 | Using where |
+----+-------------+-----------+------------+------+---------------+------+---------+------+-----------+----------+-------------+

Advertisement

Answer

Oh! MySQL 5.7 introduced Generated Columns — which gives us a way of indexing expressions! 🥳

If you do something like this:

ALTER TABLE documents
  ADD COLUMN dirty BOOL GENERATED ALWAYS AS (COALESCE(updated_at > last_indexed_at, TRUE)) STORED,
  ADD INDEX index_documents_on_dirty(dirty);

…and change the query to:

SELECT COUNT(*) FROM documents WHERE dirty;

…what results do you get?

Hopefully, we’re moving the work of evaluating COALESCE(updated_at > last_indexed_at, TRUE) from Read time to Write time.

User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement