It’s a common question: Write an SQL query to delete all the duplicate emails, keeping only one unique email with the smallest id.
Example 1:
Input: Person table:
+----+------------------+ | id | email | +----+------------------+ | 1 | john@example.com | | 2 | bob@example.com | | 3 | john@example.com | +----+------------------+
Output:
+----+------------------+ | id | email | +----+------------------+ | 1 | john@example.com | | 2 | bob@example.com | +----+------------------+
I know that to select what I don’t want, I have to do something like this:
SELECT p1.* FROM Person p1, Person p2 WHERE p1.Email = p2.Email AND p1.Id > p2.Id ;
After that portion I got stuck, so I looked at the answer:
DELETE p1 FROM Person p1, Person p2 WHERE p1.Email = p2.Email AND p1.Id > p2.Id
Where I get confused is the second part of the prompt, where you have to delete the selection (it doesn’t specify this in the prompt, however I am assuming you have to delete from the original table because, practically, someone would go looking for the table “Person” not table “p2” or “p1”). At least I think?
And so… I’m confused how the answer works. I thought that assigning a name to the table (for example “p1” and “p2”) creates a temporary copy table, that “p1” and “p2” are physically different tables than “Person”. But wouldn’t that mean that deleting p1 leaves you with table p2, and that you were able to get the result you needed with table “p2” however the original table “Person” was never modified?
Appreciate any help… the more I think about it the more confused I get, and so I feel like I am missing something very fundamental here. Thank you!
Advertisement
Answer
No, p1
and p2
are aliases. They are just shorthand ways of referring to the table. It’s to let you have to type less.
SELECT customer_orders.partnumber, customer_orders.custnumber FROM customer_orders
is more typing and reading than
SELECT co.partnumber, co.custnumber FROM co
In your case, there are two instances of the Person
table, so without p1
and p2
it would look like this:
DELETE Person FROM Person, Person WHERE Person.Email = Person.Email AND Person.Id > Person.Id
and that doesn’t make sense.