I’m trying to extract mail addresses after a token ‘eaddr:’. So it would match the all occurrences in line entries, first consecutive string without spaces after that token: I tried:
SELECT regexp_substr(tab.entry, 'eaddr:(.*?)',1,1,'e',1) from ( select 'String, email@domain.com' as entry union select 'eaddr:mail1@domain.com eaddr:mail2@domain.com sometext eaddr: mail3@domain.com some4354% text' as entry union select 'eaddr:mail5@domain.org' as entry union select 'Just a string' as entry ) tab ;
but it does not work. The correct result set is:
null mail1@domain.com mail2@domain.com mail3@domain.com mail5@domain.org null
Advertisement
Answer
First of all, I suggest using a better regex to verify the email format. I am inspired by Gordon’s SPLIT_TO_TABLE + LATERAL approach, and wrote some sample queries to fetch those emails from the entries.
If you want to get all the emails together, you can use this one:
with t as ( select 'String, email@domain.com' as entry union select 'eaddr:mail1@domain.com eaddr:mail2@domain.com sometext eaddr: mail3@domain.com some4354% text' as entry union select 'eaddr:mail5@domain.org' as entry union select 'Just a string' as entry ) Select LISTAGG( regexp_substr( s.value, '[A-Z0-9a-z._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,64}' ) ,' ' ) emails from t, lateral SPLIT_TO_TABLE(t.entry, 'eaddr:') s where s.seq > 1; +---------------------------------------------------------------------+ | EMAILS | +---------------------------------------------------------------------+ | mail1@domain.com mail2@domain.com mail3@domain.com mail5@domain.org | +---------------------------------------------------------------------+
To get the exact result in your question, you can use the following query:
with t as ( select 'String, email@domain.com' as entry union select 'eaddr:mail1@domain.com eaddr:mail2@domain.com sometext eaddr: mail3@domain.com some4354% text' as entry union select 'eaddr:mail5@domain.org' as entry union select 'Just a string' as entry ) select emails from ( Select t.entry, s.*, LISTAGG( regexp_substr( IFF(s.seq = 1, '', s.value ), '[A-Z0-9a-z._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,64}' ) ,' ' ) OVER ( PARTITION BY s.seq ) emails from t, lateral SPLIT_TO_TABLE(t.entry, ' ') s ) where index = 1; +----------------------------------------------------+ | EMAILS | +----------------------------------------------------+ | NULL | | mail1@domain.com mail2@domain.com mail3@domain.com | | NULL | | mail5@domain.org | +----------------------------------------------------+