select * from table1 where ID in ( select min(a.ID) from (select * from table1) a group by id_x, id_y, col_z having count(*) > 1)
Above query ran in 2.2 seconds returning four result. Now when I change the select *
to delete
, it hangs up indefinitely.
delete from table1 where ID in ( select min(a.ID) from (select * from table1) a group by id_x, id_y, col_z having count(*) > 1)
If I move the position of group by
clause inside the alias select query, it will no longer hang.
delete from table1 where ID in ( select a.ID from (select min(ID) from table1 group by id_x, id_y, col_z having count(*) > 1) a)
Why does it hang? Even though (select * from table1)
pulls millions of records, the query doesn’t seem to stop executing for hours. Can anybody explain what huddles the query? It puzzles me because the select query works fine whereas the delete query hangs.
EDIT: My focus here is why it hangs. I already posted work-around that works fine. But in order to develop prevention system, I need to get to the root cause of this..
Advertisement
Answer
Use a JOIN
instead of WHERE ID IN (SELECT ...)
.
DELETE t1 FROM table1 AS t1 JOIN ( SELECT MIN(id) AS minId FROM table1 GROUP BY id_x, id_y, col_z HAVING COUNT(*) > 1) AS t2 ON t1.id = t2.minId
I think your query is not being optimized because it has to recalculate the subquery after each deletion, since deleting a row could change the MIN(id)
for that group. Using a JOIN
requires the grouping and aggregation to be done just once.