Best way to delete millions of rows by ID

Question

I need to delete about 2 million rows from my PG database. I have a list of IDs that I need to delete. However, any way I try to do this is taking days. I tried putting them in a table and doing it in batches of 100. 4 days later, this is still running with only 297268 rows deleted.

Accepted Answer

It all depends &#8230;Assuming no concurrent write access to involved tables or you may have to lock tables exclusively or this route may not be for you at all.Delete all indexes (possibly except the ones needed for the delete itself).Recreate them afterwards. That&#8217;s typically much faster than incremental updates to indexes.Check if you have triggers that can safely be deleted / disabled temporarily.Do foreign keys reference your table? Can they be deleted? Temporarily deleted?Depending on your autovacuum settings it may help to run VACUUM ANALYZE before the operation.Some of the points listed in the related chapter of the manual Populating a Database may also be of use, depending on your setup.If you delete large portions of the table and the rest fits into RAM, the fastest and easiest way may be this:BEGIN; -- typically faster and safer wrapped in a single transactionSET LOCAL temp_buffers = '1000MB'; -- enough to hold the temp tableCREATE TEMP TABLE tmp ASSELECT t.*FROM   tbl tLEFT   JOIN del_list d USING (id)WHERE  d.id IS NULL;      -- copy surviving rows into temporary table-- ORDER BY ?             -- optionally order favorably while being at itTRUNCATE tbl;             -- empty table - truncate is very fast for big tablesINSERT INTO tblTABLE tmp;        -- insert back surviving rows.COMMIT;This way you don&#8217;t have to recreate views, foreign keys or other depending objects. And you get a pristine (sorted) table without bloat.Read about the temp_buffers setting in the manual. This method is fast as long as the table fits into memory, or at least most of it. The transaction wrapper defends against losing data if your server crashes in the middle of this operation.Run VACUUM ANALYZE afterwards. Or (typically not necessary after going the TRUNCATE route) VACUUM FULL ANALYZE to bring it to minimum size (takes exclusive lock). For big tables consider the alternatives CLUSTER / pg_repack or similar:Optimize Postgres query on timestamp rangeFor small tables, a simple DELETE instead of TRUNCATE is often faster:DELETE FROM tbl tUSING  del_list dWHERE  t.id = d.id;Read the Notes section for TRUNCATE in the manual. In particular (as Pedro also pointed out in his comment):TRUNCATE cannot be used on a table that has foreign-key referencesfrom other tables, unless all such tables are also truncated in thesame command. [&#8230;]And:TRUNCATE will not fire any ON DELETE triggers that might exist forthe tables.

Advertisement

Answer