Optimizing MySQL-Query for removing duplicates and adjusting foreign key references

Question

i have following tables:address_table and customer_address_table The address table stores addresses and the customer_address table stores the relation between customer and address. A customer can have multiple addresses, therefore the second table. In the address table there are duplicate rows (different id b…

Accepted Answer

Don&#8217;t do it in a single pass.  Instead, write a loop (in client code) tofind the &#8216;next&#8217; few dupsfix them.step to next batchThis may still take day(s), but it won&#8217;t be impacting the rest of the system.Step 1SELECT a.id, b.id    FROM address AS a    JOIN address AS b  USING(city, street, house_number, state, zip_code, country)    WHERE a.id BETWEEN $left_off AND $left_off + 100Step 2With that short, possibly empty list, of pairs, fix the links.Step 3$left_off = $left_off + 100Exit if no more.When finished, you had better addUNIQUE(city, street, house_number, state, zip_code, country)to prevent further dups.  If adding the index fails, then there are more dups to clean up; continue where you left off.More on chunking for Delete, etc:  http://mysql.rjweb.org/doc.php/deletebig#deleting_in_chunks

Advertisement

Answer