I’m trying to extract mail addresses after a token ‘eaddr:’. So it would match the all occurrences in line entries, first consecutive string without spaces after that token: I tried:
x
SELECT regexp_substr(tab.entry, 'eaddr:(.*?)',1,1,'e',1)
from (
select 'String, email@domain.com' as entry
union
select 'eaddr:mail1@domain.com eaddr:mail2@domain.com sometext eaddr: mail3@domain.com some4354% text' as entry
union
select 'eaddr:mail5@domain.org' as entry
union
select 'Just a string' as entry
) tab
;
but it does not work. The correct result set is:
null
mail1@domain.com mail2@domain.com mail3@domain.com
mail5@domain.org
null
Advertisement
Answer
First of all, I suggest using a better regex to verify the email format. I am inspired by Gordon’s SPLIT_TO_TABLE + LATERAL approach, and wrote some sample queries to fetch those emails from the entries.
If you want to get all the emails together, you can use this one:
with t as (
select 'String, email@domain.com' as entry
union
select 'eaddr:mail1@domain.com eaddr:mail2@domain.com sometext eaddr: mail3@domain.com some4354% text' as entry
union
select 'eaddr:mail5@domain.org' as entry
union
select 'Just a string' as entry
)
Select LISTAGG( regexp_substr( s.value, '[A-Z0-9a-z._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,64}' ) ,' ' ) emails from t,
lateral SPLIT_TO_TABLE(t.entry, 'eaddr:') s
where s.seq > 1;
+---------------------------------------------------------------------+
| EMAILS |
+---------------------------------------------------------------------+
| mail1@domain.com mail2@domain.com mail3@domain.com mail5@domain.org |
+---------------------------------------------------------------------+
To get the exact result in your question, you can use the following query:
with t as (
select 'String, email@domain.com' as entry
union
select 'eaddr:mail1@domain.com eaddr:mail2@domain.com sometext eaddr: mail3@domain.com some4354% text' as entry
union
select 'eaddr:mail5@domain.org' as entry
union
select 'Just a string' as entry
)
select emails from
(
Select t.entry, s.*,
LISTAGG( regexp_substr( IFF(s.seq = 1, '', s.value ), '[A-Z0-9a-z._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,64}' ) ,' ' )
OVER ( PARTITION BY s.seq ) emails
from t,
lateral SPLIT_TO_TABLE(t.entry, ' ') s )
where index = 1;
+----------------------------------------------------+
| EMAILS |
+----------------------------------------------------+
| NULL |
| mail1@domain.com mail2@domain.com mail3@domain.com |
| NULL |
| mail5@domain.org |
+----------------------------------------------------+