Skip to content
Advertisement

Is there an elegant and efficient SQL way of listing all substring positions?

I have a text column in an sql database. My task is to find all occurrences of a given string in the texts, and list the character positions text by text (for backreference later in the application).

I have found examples like this that solve a similar issue by a while loop. However, I dislike the idea of writing a loop is there exists a slimmer way of doing the same.

I imagine this to work similarly to STRING_SPLIT in T-SQL, although I emphasize I am preferably looking for a MySQL solution. STRING_SPLIT returns a one column table filled with the substrings of the split string. An imaginary ALL_POSITIONS method could return a one column table filled with the starting positions of the matches in the text, or an empty table, if there’s no match. Or, for the sake of JOINing, there could be another column for a primary key reference.

So let’s take an example table for illustrating my point:

|Id|Text                      |
+--+--------------------------+
| 0|This is my teststring     |
| 1|A second teststring       |

My dream Pseudo-SQL:

SELECT ALL_POSITIONS('st', Text, Id) FROM Table;

Which would yield:

|Id|Position|
+--+--------+
| 0|      13|    <- the first 'st' where Id = 0
| 0|      15|    <- the second 'st' where Id = 0 etc.
| 1|      11|
| 1|      13|

Any ideas are welcome.

Advertisement

Answer

With a recursive CTE for SQL Server:

with cte as (
  select id, charindex('st', text) pos from tablename
  union all
  select t.id, charindex('st', t.text, c.pos + 1) 
  from tablename t inner join cte c
  on c.id = t.id
  where c.pos > 0 and c.pos < len(t.text)
)
select * from cte
where pos > 0
order by id, pos

See the demo.
For MySql 8.0+:

with recursive cte as (
  select id, locate('st', text) pos from tablename
  union all
  select t.id, locate('st', t.text, c.pos + 1) 
  from tablename t inner join cte c
  on c.id = t.id
  where c.pos > 0 and c.pos < length(t.text)
)
select * from cte
where pos > 0
order by id, pos

See the demo.
Results:

> id | pos
> -: | --:
>  0 |  14
>  0 |  16
>  1 |  12
>  1 |  14
User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement