I’m trying to wrap my head around a problem but I’m hitting a blank. I know SQL quite well, but I’m not sure how to approach this.
My problem:
Given a string and a table of possible substrings, I need to find the number of occurrences.
The search table consists of a single colum:
searchtable
| pattern TEXT PRIMARY KEY| |-------------------------| | my | | quick | | Earth |
Given the string “Earth is my home planet and where my friends live”, the expected outcome is 3 (2x “my” and 1x “Earth”).
In my function, I have variable bodytext which is the string to examine.
I know I can do IN (SELECT pattern FROM searchtable) to get the list of substrings, and I could possibly use a LIKE ANY clause to get matches, but how can I count occurrences of the substrings in the table within the search string?
Advertisement
Answer
This is easily done without a custom function:
select count(*) from (values ('Earth is my home planet and where my friends live')) v(str) cross join lateral regexp_split_to_table(v.str, ' ') word join patterns p on word = p.pattern
Just break the original string into “words”. Then match on the words.
Another method uses regular expression matching:
select (select count(*) from regexp_matches(v.str, p.rpattern, 'g')) from (values ('Earth is my home planet and where my friends live')) v(str) cross join (select string_agg(pattern, '|') as rpattern from patterns ) p;
This stuffs all the patterns into a regular expression. Not that this version does not take word breaks into account.
Here is a db<>fiddle.