We have a table (let us call it originalTbl) that has duplicate entries that we want to delete. By duplicate I mean that all values other than an AUTO_INCREMENT index field are the same. One way to do this is to create a new table, like the existing table (let us call it uniqueTbl), and then run a query like:
INSERT INTO uniqueTbl (non-Index-field_1, non-Index-field_2, …, non-Index-field_n)
SELECT DISTINCT non-Index-field_1, non-Index-field_2, …, non-Index-field_n FROM originalTbl;
Later on we will drop originalTbl and rename uniqueTbl to originalTbl.
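For concreteness, the whole copy-and-swap workflow described above might look like the following in MySQL. This is only a sketch: the columns col1 and col2 are hypothetical stand-ins for the non-index fields, id is assumed to be the AUTO_INCREMENT column, and originalTbl_old is a made-up temporary name.

```sql
-- Assumed schema (hypothetical): originalTbl(id AUTO_INCREMENT PRIMARY KEY, col1, col2)

-- 1. Create an empty table with the same column and index definitions.
CREATE TABLE uniqueTbl LIKE originalTbl;

-- 2. Copy one row per distinct combination of the non-index fields.
--    id is left out, so AUTO_INCREMENT assigns fresh values.
INSERT INTO uniqueTbl (col1, col2)
SELECT DISTINCT col1, col2
FROM originalTbl;

-- 3. Swap the two tables in a single statement, then drop the old data.
RENAME TABLE originalTbl TO originalTbl_old,
             uniqueTbl   TO originalTbl;
DROP TABLE originalTbl_old;
```

Note that the surviving rows get new id values with this approach, and CREATE TABLE ... LIKE does not copy foreign key definitions, so it fits best when nothing else refers to the old ids.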
However, I am looking for an alternative approach that will delete the duplicate entries from originalTbl directly, without the overhead of first creating uniqueTbl and then renaming it to originalTbl.
Answer
Unless you have very few duplicates, your method will be much, much faster. If you only have a few (say, less than 1% of the rows), then you can try:
delete o
from originalTbl o left join
     (select col1, col2, . . ., min(id) as min_id
      from originalTbl o
      group by col1, col2, . . .
     ) oo
     on oo.min_id = o.id
where oo.min_id is null;
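To judge which side of the "few duplicates" threshold a table falls on, and to see the delete written out with real column names, something like the following could be used. It is a sketch under assumptions: col1 and col2 are hypothetical stand-ins for the full list of non-index fields, id is the AUTO_INCREMENT column, and keep_rows is just an illustrative alias.

```sql
-- How many rows would be removed: total rows minus distinct combinations.
SELECT (SELECT COUNT(*) FROM originalTbl)
     - (SELECT COUNT(*)
        FROM (SELECT DISTINCT col1, col2 FROM originalTbl) d) AS duplicate_rows;

-- Keep the row with the smallest id in each group and delete the rest.
-- The grouped subquery is materialized as a derived table, which is what
-- lets MySQL read from originalTbl while deleting from it.
DELETE o
FROM originalTbl o
LEFT JOIN (
    SELECT MIN(id) AS min_id
    FROM originalTbl
    GROUP BY col1, col2
) keep_rows ON keep_rows.min_id = o.id
WHERE keep_rows.min_id IS NULL;
```

Unlike the copy-and-rename approach, this keeps the original id of the surviving row in each group. Running the first query again afterwards should report 0.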