Skip to content
Advertisement

Extracting original email and remove email alias in emails

I’m trying to get the primary email from aliases, for example, I have the following examples and I need to normalize them.

  1. from “test.me@email.com” to “testme@email.com”
  2. from “test+me11@email.com” to “test@email.com”
  3. from “test.me.123+22@email.com” to “testme123@email.com”
  4. from “test+123@email.com” to “test@email.com”

I wrote a SQL statement on bigquery but got an error that says “Third argument in SUBSTR() cannot be negative”

SELECT  TRANSLATE(SUBSTR('test+123@email.com', 1, INSTR('test+123@email.com', '+')-1),'+.', '') ||'@' || SUBSTR('test+123@email.com', Instr('test+123@email.com','@')+1)

Any ideas or suggestions for the above or ideas for useing regex_replace instead

Thanks

Advertisement

Answer

Consider below approach

select email, 
  regexp_replace(email, r'([^+@]+)+?([^@]*(.*))', r'13') original_email
from your_table             

if applied to sample data in your question – output is

enter image description here

In case if for some reason you want dots to be removed – use below

select email, original_email,
  format('%%s@%%s', 
    replace(split(original_email, '@')[offset(0)], '.', ''), 
    split(original_email, '@')[offset(1)]
  ) with_stripped_dots
from your_table, 
unnest([regexp_replace(email, r'([^+@]+)+?([^@]*(.*))', r'13')]) original_email             

with output

enter image description here

User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement