I have a SQLite database and I perform the following query:
SELECT category, name, COUNT(side.id) as nb FROM main LEFT JOIN side ON main.id = side.id_main WHERE category != 3 GROUP BY category, name;
This query takes around 0.3 seconds and returns 8000 rows. Here is the associated query plan:
id parent notused detail 7 0 0 SCAN TABLE main USING COVERING INDEX idx_main 19 0 0 SEARCH TABLE side USING AUTOMATIC COVERING INDEX (id_main=?)
Now, I would like to select only one category. So, I rewrote my WHERE
clause like this:
SELECT category, name, COUNT(side.id) as nb FROM main LEFT JOIN side ON main.id = side.id_main WHERE category = 1 GROUP BY category, name;
This query takes now around 43 seconds and returns 5000 rows! It is 100 times longer! Here is the associated query plan:
id parent notused detail 7 0 0 SEARCH TABLE main USING COVERING INDEX idx_main (category=?) 11 0 0 SCAN TABLE side
Why is the equals operator so slow? Is there anything wrong in my query?
Reproduction steps:
- Create the following tables
CREATE TABLE main ( id INTEGER PRIMARY KEY AUTOINCREMENT, category INTEGER, name VARCHAR(64) ); CREATE TABLE side ( id INTEGER PRIMARY KEY AUTOINCREMENT, id_main INTEGER ); CREATE INDEX idx_side ON side (id, id_main); CREATE INDEX idx_main ON main (category, name);
- Use the following Perl script to populate the tables
use strict; use warnings FATAL => 'all'; use DBI; my $dbh = DBI->connect("dbi:SQLite:dbname=file.db","",""); my $v = 'A'; for my $i (1..9000) { my $category = 1; $category = 2 if $i > 5000; $category = 3 if $i > 8000; $dbh->do("INSERT INTO main(category, name) VALUES($category, '$v')"); $v++; } for my $i (1..100000) { my $r = int(rand() * 9000 + 1); $dbh->do("INSERT INTO side(id_main) VALUES($r)"); }
Advertisement
Answer
The difference in the first step is understandable : =
allow SQLite to look up the rows for the given value (SEARCH
) while !=
requires it to go through all the values (SCAN
)
The question is : Why does it create a temporary (AUTOMATIC
) index (which happen to be very effective) in the first case, not in the second ?
I think it’s because, without additional informations, the query planner assumes that an =
clause will yield fewer matches than a !=
, and the cost of building a temporary index is not justified.
The solution is to let SQLite gather information with ANALYZE. The query plan becomes :
QUERY PLAN |--SEARCH TABLE main USING COVERING INDEX idx_main (category=?) `--SEARCH TABLE side USING AUTOMATIC COVERING INDEX (id_main=?)
Another solution is create the required index, so that SQLite doesn’t have to build it.
CREATE INDEX idx_side2 ON side (id_main);
With the following query plan (even without analyse) :
QUERY PLAN |--SEARCH TABLE main USING COVERING INDEX idx_main (category=?) `--SEARCH TABLE side USING COVERING INDEX idx_side2 (id_main=?)