Skip to content
Advertisement

Regex to match duplicate/alias e-mails in MySQL

I am trying to come up with some regular expression to check whether an e-mail exists in a database. What is more specific here is that I want to find e-mails that are the same, but might be written differently. E.g. john.doe@example.com is the same as johndode+123@example.com and j.o.h.n.d.o.e@example.com.

Using another script I remove all dots and text after + so that john.doe+123@example.com becomes johndoe@example.com. How can I use this “stripped” address to match its aliases in a MySQL database?

I have little experience with RegEx and I haven’t really used it alongside SQL before. Anyway, I could only come up with the following code:

RegEx (matches all +<any text> and . before the @ symbol

(+.*(?=@)|.(?=.*@))

SQL

SELECT email FROM users WHERE email REGEXP '(+.*(?=@)|.(?=.*@))'
//or
SELECT * from users WHERE email REGEXP_LIKE('johndoe@example.com', '(+.*(?=@)|.(?=.*@))')

I get the following error for both:

#1064 – You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near REGEXP_LIKE('johndoe@example.com', '(+.*(?=@)|.(?=.*@))')

What I am trying to do is select email from users where email = 'johndoe@example.com' but disregard these characters (+.*(?=@)|.(?=.*@)
(the code is executed as a prepared statement)

Any tip or advice would be highly appreciated!

Advertisement

Answer

In MySQL, the simplest method might be breaking the problem into the username and domain:

where replace(substring_index(substring_index(email, '@', 1), '+', 1), '.', '') = substring_index('johndoe@example.com', '@', 1) and
      substring_index(email, '@', -1) = substring_index('johndoe@example.com', '@', -1) and
      email like '%@%' and
      email not like '%@%@%'
User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement