Skip to content
Advertisement

PostgreSQL SQL query to find number of occurrences of substring in string

I’m trying to wrap my head around a problem but I’m hitting a blank. I know SQL quite well, but I’m not sure how to approach this.

My problem:

Given a string and a table of possible substrings, I need to find the number of occurrences.

The search table consists of a single colum:

searchtable

| pattern TEXT PRIMARY KEY|
|-------------------------|
| my                      |
| quick                   |
| Earth                   |

Given the string “Earth is my home planet and where my friends live”, the expected outcome is 3 (2x “my” and 1x “Earth”).

In my function, I have variable bodytext which is the string to examine.

I know I can do IN (SELECT pattern FROM searchtable) to get the list of substrings, and I could possibly use a LIKE ANY clause to get matches, but how can I count occurrences of the substrings in the table within the search string?

Advertisement

Answer

This is easily done without a custom function:

select count(*)
from (values ('Earth is my home planet and where my friends live')) v(str) cross join lateral
     regexp_split_to_table(v.str, ' ') word join
     patterns p
     on word = p.pattern

Just break the original string into “words”. Then match on the words.

Another method uses regular expression matching:

select (select count(*) from regexp_matches(v.str, p.rpattern, 'g'))
from (values ('Earth is my home planet and where my friends live')) v(str) cross join
     (select string_agg(pattern, '|') as rpattern
      from patterns
     ) p;

This stuffs all the patterns into a regular expression. Not that this version does not take word breaks into account.

Here is a db<>fiddle.

User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement