I have a SQLite database and I perform the following query:
SELECT
category,
name,
COUNT(side.id) as nb
FROM main
LEFT JOIN side
ON main.id = side.id_main
WHERE category != 3
GROUP BY category, name;
This query takes around 0.3 seconds and returns 8000 rows. Here is the associated query plan:
id parent notused detail 7 0 0 SCAN TABLE main USING COVERING INDEX idx_main 19 0 0 SEARCH TABLE side USING AUTOMATIC COVERING INDEX (id_main=?)
Now, I would like to select only one category. So, I rewrote my WHERE clause like this:
SELECT
category,
name,
COUNT(side.id) as nb
FROM main
LEFT JOIN side
ON main.id = side.id_main
WHERE category = 1
GROUP BY category, name;
This query takes now around 43 seconds and returns 5000 rows! It is 100 times longer! Here is the associated query plan:
id parent notused detail 7 0 0 SEARCH TABLE main USING COVERING INDEX idx_main (category=?) 11 0 0 SCAN TABLE side
Why is the equals operator so slow? Is there anything wrong in my query?
Reproduction steps:
- Create the following tables
CREATE TABLE main ( id INTEGER PRIMARY KEY AUTOINCREMENT, category INTEGER, name VARCHAR(64) ); CREATE TABLE side ( id INTEGER PRIMARY KEY AUTOINCREMENT, id_main INTEGER ); CREATE INDEX idx_side ON side (id, id_main); CREATE INDEX idx_main ON main (category, name);
- Use the following Perl script to populate the tables
use strict;
use warnings FATAL => 'all';
use DBI;
my $dbh = DBI->connect("dbi:SQLite:dbname=file.db","","");
my $v = 'A';
for my $i (1..9000) {
my $category = 1;
$category = 2 if $i > 5000;
$category = 3 if $i > 8000;
$dbh->do("INSERT INTO main(category, name) VALUES($category, '$v')");
$v++;
}
for my $i (1..100000) {
my $r = int(rand() * 9000 + 1);
$dbh->do("INSERT INTO side(id_main) VALUES($r)");
}
Advertisement
Answer
The difference in the first step is understandable : = allow SQLite to look up the rows for the given value (SEARCH) while != requires it to go through all the values (SCAN)
The question is : Why does it create a temporary (AUTOMATIC) index (which happen to be very effective) in the first case, not in the second ?
I think it’s because, without additional informations, the query planner assumes that an = clause will yield fewer matches than a !=, and the cost of building a temporary index is not justified.
The solution is to let SQLite gather information with ANALYZE. The query plan becomes :
QUERY PLAN |--SEARCH TABLE main USING COVERING INDEX idx_main (category=?) `--SEARCH TABLE side USING AUTOMATIC COVERING INDEX (id_main=?)
Another solution is create the required index, so that SQLite doesn’t have to build it.
CREATE INDEX idx_side2 ON side (id_main);
With the following query plan (even without analyse) :
QUERY PLAN |--SEARCH TABLE main USING COVERING INDEX idx_main (category=?) `--SEARCH TABLE side USING COVERING INDEX idx_side2 (id_main=?)