Skip to content
Advertisement

Why is the != operator faster than the = operator?

I have a SQLite database and I perform the following query:

SELECT
    category,
    name,
    COUNT(side.id) as nb
FROM main
LEFT JOIN side
    ON main.id = side.id_main
WHERE category != 3
GROUP BY category, name;

This query takes around 0.3 seconds and returns 8000 rows. Here is the associated query plan:

id  parent  notused detail
7   0   0   SCAN TABLE main USING COVERING INDEX idx_main
19  0   0   SEARCH TABLE side USING AUTOMATIC COVERING INDEX (id_main=?)

Now, I would like to select only one category. So, I rewrote my WHERE clause like this:

SELECT
    category,
    name,
    COUNT(side.id) as nb
FROM main
LEFT JOIN side
    ON main.id = side.id_main
WHERE category = 1
GROUP BY category, name;

This query takes now around 43 seconds and returns 5000 rows! It is 100 times longer! Here is the associated query plan:

id  parent  notused detail
7   0   0   SEARCH TABLE main USING COVERING INDEX idx_main (category=?)
11  0   0   SCAN TABLE side

Why is the equals operator so slow? Is there anything wrong in my query?


Reproduction steps:

  1. Create the following tables
CREATE TABLE main (
id INTEGER PRIMARY KEY AUTOINCREMENT,
category INTEGER,
name VARCHAR(64)
);

CREATE TABLE side (
id INTEGER PRIMARY KEY AUTOINCREMENT,
id_main INTEGER
);

CREATE INDEX idx_side ON side (id, id_main);

CREATE INDEX idx_main ON main (category, name);
  1. Use the following Perl script to populate the tables
use strict;
use warnings FATAL => 'all';

use DBI;

my $dbh = DBI->connect("dbi:SQLite:dbname=file.db","","");

my $v = 'A';

for my $i (1..9000) {
    my $category = 1;
    $category = 2 if $i > 5000;
    $category = 3 if $i > 8000;
    $dbh->do("INSERT INTO main(category, name) VALUES($category, '$v')");
    $v++;
}

for my $i (1..100000) {
    my $r = int(rand() * 9000 + 1);
    $dbh->do("INSERT INTO side(id_main) VALUES($r)");
}

Advertisement

Answer

The difference in the first step is understandable : = allow SQLite to look up the rows for the given value (SEARCH) while != requires it to go through all the values (SCAN)

The question is : Why does it create a temporary (AUTOMATIC) index (which happen to be very effective) in the first case, not in the second ?

I think it’s because, without additional informations, the query planner assumes that an = clause will yield fewer matches than a !=, and the cost of building a temporary index is not justified.

The solution is to let SQLite gather information with ANALYZE. The query plan becomes :

QUERY PLAN
|--SEARCH TABLE main USING COVERING INDEX idx_main (category=?)
`--SEARCH TABLE side USING AUTOMATIC COVERING INDEX (id_main=?)

Another solution is create the required index, so that SQLite doesn’t have to build it.

CREATE INDEX idx_side2 ON side (id_main);

With the following query plan (even without analyse) :

QUERY PLAN
|--SEARCH TABLE main USING COVERING INDEX idx_main (category=?)
`--SEARCH TABLE side USING COVERING INDEX idx_side2 (id_main=?)
User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement