I want to delete all duplicate entries in my MySQL database.
I found a lot of solutions in the web but in every case the query worked only when you are looking up for 1 column.
In my case I need a query for more than 1 row which will be executed as far as possible – the table is 500MB big with more than 6 Million entries.
My Table (example)
id name status email ---------------------------- x Mark 1 Mark@trash-mail.com x Anna 1 Anna@trash-mail.com x Mark 0 Mark@trash-mail.com x Mark 1 Mark@trash-mail.com
What should now happen? I need a query like this:
DELETE * FROM my_table WHERE (name == name, status == status, email == email)
When executing this query the table has to look like this:
id name status email ---------------------------- x Mark 1 Mark@trash-mail.com x Anna 1 Anna@trash-mail.com x Mark 0 Mark@trash-mail.com
The last Mark-Entry was deleted because the name, status, and email field was equal to another. In some cases there are 20 or more with the same data which has to be deleted so that there is just one left.
Currently I have a bad solution with a PHP-Script. I am going to each row, looking for the same data and delete it. It works like it should but its too slow… I think maybe 5.000 Items per Hour…
You know any solution to solve this in a better way?
Advertisement
Answer
My suggestion would be to use the method where you truncate the table and reinsert the data. Something like:
create temporary table temp as select min(id) as id, name, status, email from mytable group by name, status, email; truncate table mytable; insert into mytable(id, name, status, email) select id, name, status, email from temp;
Doing the deleting in place can be quite expensive. If you decide to do so, I would create the same temporary table with an index on id
, and then use:
delete m from mytable m left join temp on m.id = temp.id where temp.id is null;
You can use a limit clause (such as limit 10000
) to run this in batches.