Skip to content
Advertisement

How to match a string after a token (using regex)?

I’m trying to extract mail addresses after a token ‘eaddr:’. So it would match the all occurrences in line entries, first consecutive string without spaces after that token: I tried:

SELECT regexp_substr(tab.entry, 'eaddr:(.*?)',1,1,'e',1)
from (
select 'String, email@domain.com' as entry
union
select 'eaddr:mail1@domain.com eaddr:mail2@domain.com sometext     eaddr:   mail3@domain.com some4354% text' as entry
union
select 'eaddr:mail5@domain.org' as entry
union
select 'Just a string' as entry
) tab
;

but it does not work. The correct result set is:

null
mail1@domain.com mail2@domain.com mail3@domain.com
mail5@domain.org
null

Advertisement

Answer

First of all, I suggest using a better regex to verify the email format. I am inspired by Gordon’s SPLIT_TO_TABLE + LATERAL approach, and wrote some sample queries to fetch those emails from the entries.

If you want to get all the emails together, you can use this one:

with t as (
select 'String, email@domain.com' as entry
union
select 'eaddr:mail1@domain.com eaddr:mail2@domain.com sometext     eaddr:   mail3@domain.com some4354% text' as entry
union
select 'eaddr:mail5@domain.org' as entry
union
select 'Just a string' as entry
) 
Select LISTAGG( regexp_substr( s.value, '[A-Z0-9a-z._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,64}' ) ,' ' ) emails from t,
lateral SPLIT_TO_TABLE(t.entry, 'eaddr:') s
where s.seq > 1;

+---------------------------------------------------------------------+
|                               EMAILS                                |
+---------------------------------------------------------------------+
| mail1@domain.com mail2@domain.com mail3@domain.com mail5@domain.org |
+---------------------------------------------------------------------+

To get the exact result in your question, you can use the following query:

with t as (
select 'String, email@domain.com' as entry
union
select 'eaddr:mail1@domain.com eaddr:mail2@domain.com sometext     eaddr:   mail3@domain.com some4354% text' as entry
union
select 'eaddr:mail5@domain.org' as entry
union
select 'Just a string' as entry
) 
select emails from 
(
Select t.entry, s.*, 
LISTAGG( regexp_substr( IFF(s.seq = 1, '', s.value ), '[A-Z0-9a-z._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,64}' ) ,' ' ) 
OVER ( PARTITION BY s.seq ) emails
from t,
lateral SPLIT_TO_TABLE(t.entry, ' ') s ) 
where index = 1;

+----------------------------------------------------+
|                       EMAILS                       |
+----------------------------------------------------+
| NULL                                               |
| mail1@domain.com mail2@domain.com mail3@domain.com |
| NULL                                               |
| mail5@domain.org                                   |
+----------------------------------------------------+
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement