I inherited a production PostgreSQL database with one table that is around 250 GB in size. It has only around ten thousand live rows, which I estimate to total no more than 20 MB. The table grew to this size because autovacuum was turned off at some point. (I know why this was done; it will be reactivated, and the original issue has been fixed, so this is not part of the question.)
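A quick way to see the discrepancy is to compare the on-disk size with the live and dead tuple counts from the statistics collector. A minimal check (foo stands in for the real table name):

    SELECT pg_size_pretty(pg_total_relation_size('foo')) AS total_size,
           n_live_tup,
           n_dead_tup,
           last_autovacuum
    FROM pg_stat_user_tables
    WHERE relname = 'foo';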
Our problem is that many queries take a very long time. For example, a SELECT count(*) FROM foo; takes around 15 minutes.
Now, after considering other options, I'd like to run a VACUUM FULL on the table. I am trying to estimate how long this would take so that I can plan a maintenance window.
In my understanding, VACUUM FULL creates a new table, copies all live tuples into it, and replaces the original table with the copy.
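For reference, the command itself would look like this (with foo standing in for the table, as above; the VERBOSE option makes the server report what it is doing):

    VACUUM (FULL, VERBOSE) foo;

Note that VACUUM FULL holds an ACCESS EXCLUSIVE lock on the table for its entire duration, which is why a maintenance window is needed in the first place.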
My estimate would be that this process shouldn't take much longer than a simple query like the one above, since the live data is small in both size and row count.
Would you agree that my expectation of the run time of VACUUM FULL is somewhat realistic? If not, why not?
Are there best practices for estimating VACUUM FULL durations?
Answer
The only dependable estimate can be had by restoring a file system backup on a similar machine and testing it there. That's what I would recommend.
The duration will depend not only on the size, but also on the amount of bloat: the less live data there is, the faster it will be.
That said, I’d ask for a maintenance window of 2 hours, which should be ample on anything but very questionable hardware.
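On PostgreSQL 12 or later you can at least watch the rewrite from another session while it runs, since VACUUM FULL reports its progress through the same view as CLUSTER. A minimal sketch:

    SELECT pid, phase,
           heap_blks_scanned,
           heap_blks_total,
           heap_tuples_written
    FROM pg_stat_progress_cluster;

Comparing heap_blks_scanned with heap_blks_total gives a rough idea of how far the table scan has progressed.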