I am trying to come up with some regular expression to check whether an e-mail exists in a database. What is more specific here is that I want to find e-mails that are the same, but might be written differently.
E.g. john.doe@example.com
is the same as johndode+123@example.com
and j.o.h.n.d.o.e@example.com
.
Using another script I remove all dots and text after +
so that john.doe+123@example.com
becomes johndoe@example.com
. How can I use this “stripped” address to match its aliases in a MySQL database?
I have little experience with RegEx and I haven’t really used it alongside SQL before. Anyway, I could only come up with the following code:
RegEx (matches all +<any text>
and .
before the @ symbol
(+.*(?=@)|.(?=.*@))
SQL
SELECT email FROM users WHERE email REGEXP '(+.*(?=@)|.(?=.*@))' //or SELECT * from users WHERE email REGEXP_LIKE('johndoe@example.com', '(+.*(?=@)|.(?=.*@))')
I get the following error for both:
#1064 – You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near
REGEXP_LIKE('johndoe@example.com', '(+.*(?=@)|.(?=.*@))')
What I am trying to do is select email from users where email = 'johndoe@example.com' but disregard these characters (+.*(?=@)|.(?=.*@)
(the code is executed as a prepared statement)
Any tip or advice would be highly appreciated!
Advertisement
Answer
In MySQL, the simplest method might be breaking the problem into the username and domain:
where replace(substring_index(substring_index(email, '@', 1), '+', 1), '.', '') = substring_index('johndoe@example.com', '@', 1) and substring_index(email, '@', -1) = substring_index('johndoe@example.com', '@', -1) and email like '%@%' and email not like '%@%@%'