Skip to content
Advertisement

Why is GIN trigram index not being used with ILIKE ANY clause?

I have a table with the following definition:

CREATE TABLE clients
        (
            "id" SERIAL PRIMARY KEY,
            "email" TEXT NOT NULL,
            "first_name" TEXT NOT NULL,
            "last_name" TEXT NOT NULL,
            "telephone" TEXT,
            "city" TEXT NOT NULL,
            "street" TEXT NOT NULL,
            "house" TEXT NOT NULL,
            "apartment" TEXT,
            "created_at" TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
            "updated_at" TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
        );

I want to search by partial overlap of any words in a query with any of most of the fields in the table. I have an index which I intent to use to speed up my search:

create index "clients_text_search_idx" on clients
    using gin ((
        first_name || ' ' ||
        last_name || ' ' ||
        coalesce(telephone, '') || ' ' ||
        email || ' ' ||
        city || ' ' ||
        street || ' ' ||
        house || ' ' ||
        coalesce(apartment, '')
        ) gin_trgm_ops);

I fill my table up with realistic fake data, creating more than 100 000 rows. Then I want to see my index being used with a following query, using the exact same expression that I did when creating my index:

explain analyse select * from clients where (
        first_name || ' ' ||
        last_name || ' ' ||
        coalesce(telephone, '') || ' ' ||
        email || ' ' ||
        city || ' ' ||
        street || ' ' ||
        house || ' ' ||
        coalesce(apartment, '')
        ) ilike any (
    select '%' || word || '%'
    from regexp_split_to_table('+123 georg', E'\s+') AS word); -- same as: ilike any (values ('%+123%'), ('%georg%'))

However what I see is simple sequential scan (even if I use set enable_seqscan = false):

Gather  (cost=1000.00..4027885.59 rows=550 width=135) (actual time=2.083..4542.739 rows=166 loops=1)
  Workers Planned: 1
  Workers Launched: 1
  ->  Nested Loop Semi Join  (cost=0.00..4026830.59 rows=324 width=135) (actual time=14.319..4503.745 rows=83 loops=2)
"        Join Filter: (((((((((((((((clients.first_name || ' '::text) || clients.last_name) || ' '::text) || COALESCE(clients.telephone, ''::text)) || ' '::text) || clients.email) || ' '::text) || clients.city) || ' '::text) || clients.street) || ' '::text) || clients.house) || ' '::text) || COALESCE(clients.apartment, ''::text)) ~~* (('%'::text || word.word) || '%'::text))"
        Rows Removed by Join Filter: 109928
        ->  Parallel Seq Scan on clients  (cost=0.00..2540.12 rows=64712 width=135) (actual time=0.032..533.627 rows=55006 loops=2)
        ->  Function Scan on regexp_split_to_table word  (cost=0.00..10.00 rows=1000 width=32) (actual time=0.009..0.028 rows=2 loops=110011)
Planning Time: 3.167 ms
Execution Time: 8.750 ms

But if I replace the whole ilike any ... clause of the previous query with something simple like ilike '%george%', the index is getting used and the query gets executed super fast. So why is my index not being used with ilike any clause?

I use PostgreSQL 11.2 on MacOS Mojave.

Advertisement

Answer

I have to admit I don’t understand why this is happening, but I was able to get it working by making the right-hand side of the ANY an array instead of a subquery using the ARRAY(<subselect>) form.

testdb=# explain analyse select * from clients where (
        first_name || ' ' ||
        last_name || ' ' ||
        coalesce(telephone, '') || ' ' ||
        email || ' ' ||
        city || ' ' ||
        street || ' ' ||
        house || ' ' ||
        coalesce(apartment, '')
        ) ilike any (array(
    select ('%' || word || '%')::text
    from regexp_split_to_table('foo georg', E'\s+') AS word));
                                                                                                                                       QUERY PLAN                                                                                                                                        
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on clients  (cost=126.79..238.91 rows=489 width=62) (actual time=2.108..18.721 rows=10000 loops=1)
   Recheck Cond: (((((((((((((((first_name || ' '::text) || last_name) || ' '::text) || COALESCE(telephone, ''::text)) || ' '::text) || email) || ' '::text) || city) || ' '::text) || street) || ' '::text) || house) || ' '::text) || COALESCE(apartment, ''::text)) ~~* ANY ($0))
   Heap Blocks: exact=84
   InitPlan 1 (returns $0)
     ->  Function Scan on regexp_split_to_table word  (cost=0.00..15.00 rows=1000 width=32) (actual time=0.043..0.045 rows=2 loops=1)
   ->  Bitmap Index Scan on clients_text_search_idx  (cost=0.00..111.67 rows=489 width=0) (actual time=2.060..2.060 rows=10000 loops=1)
         Index Cond: (((((((((((((((first_name || ' '::text) || last_name) || ' '::text) || COALESCE(telephone, ''::text)) || ' '::text) || email) || ' '::text) || city) || ' '::text) || street) || ' '::text) || house) || ' '::text) || COALESCE(apartment, ''::text)) ~~* ANY ($0))
 Planning time: 0.297 ms
 Execution time: 19.254 ms
(9 rows)

This is with enable_seqscan = off on pg 10.12 (all that I have available right now).

User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement