We have a table (let us call it originalTbl
) that has duplicate entries that we want to delete. By duplicate I mean all values other than an AUTO INCREMENT index field are the same. One way to do this is to create a new table, like the existing table (let us call it uniqueTbl
), and then have a query like:
INSERT INTO uniqueTbl (non-Index-field_1, non-Index-field_2, …, non-Index-field_n) SELECT DISTINCT non-Index-field_1, non-Index-field_2, …, non-Index-field_n FROM originalTbl;
Later on we will drop originalTbl
and rename uniqueTbl
to originalTbl
.
However, I am looking for an alternate approach, that will delete the duplicate entries from originalTbl
directly, without the overhead of first creating the uniqueTbl
and then renaming it to originalTbl
.
Advertisement
Answer
Unless you have very few duplicates, your method will be much, much faster. If you only have a few (say less than 1%), then you can try:
delete o from originalTbl o left join (select col1, col2, . . ., min(id) as min_id from originalTbl o group by col1, col2, . . . ) oo on oo.min_id = o.id where oo.min_id is null;