Skip to content
Advertisement

MySQL use DELETE FROM to remove duplicates rows

I’m learning MySQL and today I tried to solve an MySQL question on leetcode: https://leetcode.com/problems/delete-duplicate-emails/solution/

The basic idea is to remove duplicate rows from the table and only keep the one with the smallest Id. Expect results:

The solution is

And I tried a second test case myself:

The output is:

I don’t understand why the condition p1.Id > p2.Id only keeps the row with the smallest id because I think it only rules out the smaller one, it wouldn’t apply to the case where there were more than 2 duplicates, right? But as the second test case shows, it indeed works. I think it’s because of the syntax of

How does this work exactly? Can someone gives me an explanation here, thanks!

Advertisement

Answer

First, this is a very bad way of implementing this code. But I guess you get what you pay for.

Second, simply run the query as a select:

(Note that I’ve rewritten the logic as a JOIN. You should always use proper, explicit, standard, readable JOIN syntax, but the two methods are functionally equivalent.)

On your second example, the results of this query are:

What is notable is that id = 1 is never in the second column — and that is the column that determines which ids are deleted. In other words, all but the smallest id for each email get deleted because there is a smaller id.

This also hints at why this is a really bad solution. MySQL has to deal with two rows for id = 3. Perhaps it attempts to delete both. Perhaps it has to just deal with extra data. Either way, there is extra work. And the more rows with the same email in the data the more extra duplicates are created.

An alternative method, such as:

Does not have this problem and, in my opinion, the intent is clearer.

User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement