I am looking into PostgreSQL full text search and would like to know if it is possible to search for the second part of compound words.
It there a way to get ‘Cheesecake’ as a result when I search for ‘cake’?
-- Lets have a table like this:CREATE TABLE IF NOT EXISTS table1( id INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY, col1 TEXT, col1_tsv TSVECTOR);CREATE INDEX IF NOT EXISTS col1_index ON table1 USING gin(col1_tsv);-- Insert some data into it:INSERT INTO table1 (col1, col1_tsv)VALUES ('Blacklist', TO_TSVECTOR('Blacklist')),('Cheesecake', TO_TSVECTOR('Cheesecake'));If I search for ‘cake’ or ‘list’ I get no results.
SELECT col1 FROM table1 WHERE col1_tsv @@ to_tsquery('english', 'list');SELECT col1 FROM table1 WHERE col1_tsv @@ to_tsquery('english', 'cake');Check it with ts_lexize:
select ts_lexize('english_stem','Blacklist');select ts_lexize('english_stem','Cheesecake');The output:
ts_lexize ------------- {blacklist}(1 row) ts_lexize ------------- {cheesecak}(1 row)Works as designed but is there a way to get the Cheesecake just from searching for cake? (And I do not mean)
select * from table1 where col1 like '%cake%';When I select the whole table the Cheesecake is also cut into Cheesecak.
select * from table1; id | col1 | col1_tsv ----+------------+--------------- 1 | Blacklist | 'blacklist':1 2 | Cheesecake | 'cheesecak':1Advertisement
Answer
Full text search is capable of prefix matching. See:
But that only works for left-anchored searches. Your pattern is right-anchored.
You could build an index on reversed strings and search with reversed patterns:
CREATE INDEX table1_col1_rtsv_idx ON table1 USING gin (TO_TSVECTOR('simple', reverse(col1)));Then this query with prefix search can use the new index:
SELECT col1 FROM table1WHERE to_tsvector('simple', reverse(col1)) @@ to_tsquery('simple', reverse('cake') || ':*');But I would consider a trigram index instead. See:
CREATE INDEX table1_col1_gin_trgm_idx ON table1 USING gin (col1 gin_trgm_ops);Query:
SELECT col1 FROM table1WHERE col1 LIKE '%cake';Notably, the pattern is '%cake', not '%cake%', if “cake” shall be at the end of the string. But the trigram index supports this just as well:
SELECT col1 FROM table1WHERE col1 LIKE '%cake%';db<>fiddle here
Text search indexes are typically substantially smaller than trigram indexes – and therefore a bit faster. There are lots of other subtle differences …