We have a table (call it originalTbl) containing duplicate entries that we want to delete. By duplicate I mean rows in which every value other than the AUTO_INCREMENT index field is the same. One way to do this is to create a new table with the same structure as the existing one (call it uniqueTbl) and then run a query like:
INSERT INTO uniqueTbl (non-Index-field_1, non-Index-field_2, …, non-Index-field_n) SELECT DISTINCT non-Index-field_1, non-Index-field_2, …, non-Index-field_n FROM originalTbl;
Later on we will drop originalTbl and rename uniqueTbl to originalTbl.
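As a rough sketch of the whole sequence in MySQL, assuming just two hypothetical non-index columns named col1 and col2 (placeholders, not the real column list), it would be something like:

```sql
-- Create an empty table with the same column and index definitions.
CREATE TABLE uniqueTbl LIKE originalTbl;

-- Copy one row per distinct combination of the non-index columns.
-- col1 and col2 are hypothetical stand-ins for the real columns;
-- the AUTO_INCREMENT field is left out so it is generated afresh.
INSERT INTO uniqueTbl (col1, col2)
SELECT DISTINCT col1, col2
FROM originalTbl;

-- Replace the original table with the de-duplicated copy.
DROP TABLE originalTbl;
RENAME TABLE uniqueTbl TO originalTbl;
```

Note that the surviving rows get new index values this way, so anything that refers to the old values would no longer match.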
However, I am looking for an alternative approach that deletes the duplicate entries from originalTbl directly, without the overhead of first creating uniqueTbl and then renaming it to originalTbl.
Answer
Unless you have very few duplicates, your method will be much, much faster. If you only have a few (say, fewer than 1% of the rows), then you can try:
-- Keep the row with the smallest id in each group; delete the rest.
delete o
from originalTbl o left join
     (select col1, col2, . . ., min(id) as min_id
      from originalTbl
      group by col1, col2, . . .
     ) oo
     on oo.min_id = o.id
where oo.min_id is null;
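To see what the delete does, here is a small worked example in MySQL. The table definition, the column names col1 and col2, and the sample rows are all made up for illustration; only the shape of the delete statement comes from the query above.

```sql
-- Hypothetical demo table; id plays the role of the AUTO_INCREMENT field.
CREATE TABLE originalTbl (
    id   INT AUTO_INCREMENT PRIMARY KEY,
    col1 VARCHAR(20),
    col2 INT
);

-- Explicit ids are used so the outcome is easy to follow.
INSERT INTO originalTbl (id, col1, col2) VALUES
    (1, 'a', 1),  -- first row of group ('a', 1)
    (2, 'a', 1),  -- duplicate of id 1
    (3, 'b', 2),  -- first row of group ('b', 2)
    (4, 'a', 1),  -- duplicate of id 1
    (5, 'b', 3);  -- its own group, because col2 differs from id 3

-- The derived table holds the smallest id per (col1, col2) group.
-- Rows whose id finds no match there are duplicates and get deleted.
DELETE o
FROM originalTbl o
LEFT JOIN (
    SELECT col1, col2, MIN(id) AS min_id
    FROM originalTbl
    GROUP BY col1, col2
) first_rows ON first_rows.min_id = o.id
WHERE first_rows.min_id IS NULL;

-- Rows with ids 1, 3 and 5 remain.
SELECT id, col1, col2 FROM originalTbl ORDER BY id;
```

Unlike the copy-and-rename method, the rows that survive keep their original id values.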