I am looking into PostgreSQL full text search and would like to know if it is possible to search for the second part of compound words.
It there a way to get ‘Cheesecake’ as a result when I search for ‘cake’?
-- Lets have a table like this: CREATE TABLE IF NOT EXISTS table1( id INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY, col1 TEXT, col1_tsv TSVECTOR ); CREATE INDEX IF NOT EXISTS col1_index ON table1 USING gin(col1_tsv);
-- Insert some data into it: INSERT INTO table1 (col1, col1_tsv) VALUES ('Blacklist', TO_TSVECTOR('Blacklist')),('Cheesecake', TO_TSVECTOR('Cheesecake'));
If I search for ‘cake’ or ‘list’ I get no results.
SELECT col1 FROM table1 WHERE col1_tsv @@ to_tsquery('english', 'list'); SELECT col1 FROM table1 WHERE col1_tsv @@ to_tsquery('english', 'cake');
Check it with ts_lexize:
select ts_lexize('english_stem','Blacklist'); select ts_lexize('english_stem','Cheesecake');
The output:
ts_lexize ------------- {blacklist} (1 row) ts_lexize ------------- {cheesecak} (1 row)
Works as designed but is there a way to get the Cheesecake just from searching for cake? (And I do not mean)
select * from table1 where col1 like '%cake%';
When I select the whole table the Cheesecake is also cut into Cheesecak.
select * from table1; id | col1 | col1_tsv ----+------------+--------------- 1 | Blacklist | 'blacklist':1 2 | Cheesecake | 'cheesecak':1
Advertisement
Answer
Full text search is capable of prefix matching. See:
But that only works for left-anchored searches. Your pattern is right-anchored.
You could build an index on reversed strings and search with reversed patterns:
CREATE INDEX table1_col1_rtsv_idx ON table1 USING gin (TO_TSVECTOR('simple', reverse(col1)));
Then this query with prefix search can use the new index:
SELECT col1 FROM table1 WHERE to_tsvector('simple', reverse(col1)) @@ to_tsquery('simple', reverse('cake') || ':*');
But I would consider a trigram index instead. See:
CREATE INDEX table1_col1_gin_trgm_idx ON table1 USING gin (col1 gin_trgm_ops);
Query:
SELECT col1 FROM table1 WHERE col1 LIKE '%cake';
Notably, the pattern is '%cake'
, not '%cake%'
, if “cake” shall be at the end of the string. But the trigram index supports this just as well:
SELECT col1 FROM table1 WHERE col1 LIKE '%cake%';
db<>fiddle here
Text search indexes are typically substantially smaller than trigram indexes – and therefore a bit faster. There are lots of other subtle differences …