I have a table with the following definition:
CREATE TABLE clients ( "id" SERIAL PRIMARY KEY, "email" TEXT NOT NULL, "first_name" TEXT NOT NULL, "last_name" TEXT NOT NULL, "telephone" TEXT, "city" TEXT NOT NULL, "street" TEXT NOT NULL, "house" TEXT NOT NULL, "apartment" TEXT, "created_at" TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(), "updated_at" TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW() );
I want to search by partial overlap of any words in a query with any of most of the fields in the table. I have an index which I intent to use to speed up my search:
create index "clients_text_search_idx" on clients using gin (( first_name || ' ' || last_name || ' ' || coalesce(telephone, '') || ' ' || email || ' ' || city || ' ' || street || ' ' || house || ' ' || coalesce(apartment, '') ) gin_trgm_ops);
I fill my table up with realistic fake data, creating more than 100 000 rows. Then I want to see my index being used with a following query, using the exact same expression that I did when creating my index:
explain analyse select * from clients where ( first_name || ' ' || last_name || ' ' || coalesce(telephone, '') || ' ' || email || ' ' || city || ' ' || street || ' ' || house || ' ' || coalesce(apartment, '') ) ilike any ( select '%' || word || '%' from regexp_split_to_table('+123 georg', E'\s+') AS word); -- same as: ilike any (values ('%+123%'), ('%georg%'))
However what I see is simple sequential scan (even if I use set enable_seqscan = false
):
Gather (cost=1000.00..4027885.59 rows=550 width=135) (actual time=2.083..4542.739 rows=166 loops=1) Workers Planned: 1 Workers Launched: 1 -> Nested Loop Semi Join (cost=0.00..4026830.59 rows=324 width=135) (actual time=14.319..4503.745 rows=83 loops=2) " Join Filter: (((((((((((((((clients.first_name || ' '::text) || clients.last_name) || ' '::text) || COALESCE(clients.telephone, ''::text)) || ' '::text) || clients.email) || ' '::text) || clients.city) || ' '::text) || clients.street) || ' '::text) || clients.house) || ' '::text) || COALESCE(clients.apartment, ''::text)) ~~* (('%'::text || word.word) || '%'::text))" Rows Removed by Join Filter: 109928 -> Parallel Seq Scan on clients (cost=0.00..2540.12 rows=64712 width=135) (actual time=0.032..533.627 rows=55006 loops=2) -> Function Scan on regexp_split_to_table word (cost=0.00..10.00 rows=1000 width=32) (actual time=0.009..0.028 rows=2 loops=110011) Planning Time: 3.167 ms Execution Time: 8.750 ms
But if I replace the whole ilike any ...
clause of the previous query with something simple like ilike '%george%'
, the index is getting used and the query gets executed super fast. So why is my index not being used with ilike any
clause?
I use PostgreSQL 11.2 on MacOS Mojave.
Advertisement
Answer
I have to admit I don’t understand why this is happening, but I was able to get it working by making the right-hand side of the ANY
an array instead of a subquery using the ARRAY(<subselect>)
form.
testdb=# explain analyse select * from clients where ( first_name || ' ' || last_name || ' ' || coalesce(telephone, '') || ' ' || email || ' ' || city || ' ' || street || ' ' || house || ' ' || coalesce(apartment, '') ) ilike any (array( select ('%' || word || '%')::text from regexp_split_to_table('foo georg', E'\s+') AS word)); QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on clients (cost=126.79..238.91 rows=489 width=62) (actual time=2.108..18.721 rows=10000 loops=1) Recheck Cond: (((((((((((((((first_name || ' '::text) || last_name) || ' '::text) || COALESCE(telephone, ''::text)) || ' '::text) || email) || ' '::text) || city) || ' '::text) || street) || ' '::text) || house) || ' '::text) || COALESCE(apartment, ''::text)) ~~* ANY ($0)) Heap Blocks: exact=84 InitPlan 1 (returns $0) -> Function Scan on regexp_split_to_table word (cost=0.00..15.00 rows=1000 width=32) (actual time=0.043..0.045 rows=2 loops=1) -> Bitmap Index Scan on clients_text_search_idx (cost=0.00..111.67 rows=489 width=0) (actual time=2.060..2.060 rows=10000 loops=1) Index Cond: (((((((((((((((first_name || ' '::text) || last_name) || ' '::text) || COALESCE(telephone, ''::text)) || ' '::text) || email) || ' '::text) || city) || ' '::text) || street) || ' '::text) || house) || ' '::text) || COALESCE(apartment, ''::text)) ~~* ANY ($0)) Planning time: 0.297 ms Execution time: 19.254 ms (9 rows)
This is with enable_seqscan = off
on pg 10.12 (all that I have available right now).