I have a SQLite database and I perform the following query:
SELECT
category,
name,
COUNT(side.id) as nb
FROM main
LEFT JOIN side
ON main.id = side.id_main
WHERE category != 3
GROUP BY category, name;
This query takes around 0.3 seconds and returns 8000 rows. Here is the associated query plan:
id parent notused detail
7 0 0 SCAN TABLE main USING COVERING INDEX idx_main
19 0 0 SEARCH TABLE side USING AUTOMATIC COVERING INDEX (id_main=?)
Now, I would like to select only one category. So, I rewrote my WHERE
clause like this:
SELECT
category,
name,
COUNT(side.id) as nb
FROM main
LEFT JOIN side
ON main.id = side.id_main
WHERE category = 1
GROUP BY category, name;
This query takes now around 43 seconds and returns 5000 rows! It is 100 times longer! Here is the associated query plan:
id parent notused detail
7 0 0 SEARCH TABLE main USING COVERING INDEX idx_main (category=?)
11 0 0 SCAN TABLE side
Why is the equals operator so slow? Is there anything wrong in my query?
Reproduction steps:
- Create the following tables
CREATE TABLE main (
id INTEGER PRIMARY KEY AUTOINCREMENT,
category INTEGER,
name VARCHAR(64)
);
CREATE TABLE side (
id INTEGER PRIMARY KEY AUTOINCREMENT,
id_main INTEGER
);
CREATE INDEX idx_side ON side (id, id_main);
CREATE INDEX idx_main ON main (category, name);
- Use the following Perl script to populate the tables
use strict;
use warnings FATAL => 'all';
use DBI;
my $dbh = DBI->connect("dbi:SQLite:dbname=file.db","","");
my $v = 'A';
for my $i (1..9000) {
my $category = 1;
$category = 2 if $i > 5000;
$category = 3 if $i > 8000;
$dbh->do("INSERT INTO main(category, name) VALUES($category, '$v')");
$v++;
}
for my $i (1..100000) {
my $r = int(rand() * 9000 + 1);
$dbh->do("INSERT INTO side(id_main) VALUES($r)");
}
Advertisement
Answer
The difference in the first step is understandable : =
allow SQLite to look up the rows for the given value (SEARCH
) while !=
requires it to go through all the values (SCAN
)
The question is : Why does it create a temporary (AUTOMATIC
) index (which happen to be very effective) in the first case, not in the second ?
I think it’s because, without additional informations, the query planner assumes that an =
clause will yield fewer matches than a !=
, and the cost of building a temporary index is not justified.
The solution is to let SQLite gather information with ANALYZE. The query plan becomes :
QUERY PLAN
|--SEARCH TABLE main USING COVERING INDEX idx_main (category=?)
`--SEARCH TABLE side USING AUTOMATIC COVERING INDEX (id_main=?)
Another solution is create the required index, so that SQLite doesn’t have to build it.
CREATE INDEX idx_side2 ON side (id_main);
With the following query plan (even without analyse) :
QUERY PLAN
|--SEARCH TABLE main USING COVERING INDEX idx_main (category=?)
`--SEARCH TABLE side USING COVERING INDEX idx_side2 (id_main=?)