Vacuuming the free space map considered harmful?
We're tracking down an issue that we've seen in two separate installations so far, which is that, at the very end of a vacuum, the vacuum operation starts using *very* high levels of CPU and (sometimes) I/O, often to the point that the system becomes unable to service other requests. We've seen this on versions 15, 16, and 17 so far.
The common data points are:
1. The table being vacuumed is large (>250 million rows, often in the >10 billion row range).
2. The table has a relatively high churn rate.
3. The number of updated / deleted rows before that particular vacuum cycle is very high.
Everything seems to point to the vacuum free space map operation, since it would have a lot of work to do in that particular situation, it happens at just the right place in the vacuum cycle, and its resource consumption is not throttled the way the regular vacuum operation is.
Assuming this analysis is correct, our current proposal as a temporary fix is to increase the frequency of autovacuum on those tables, so that the free space map vacuum operation has less to do, and is less likely to consume the system. In the longer run, is it worth considering implementing a cost delay inside of the free space map update operations?
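(For illustration, the per-table autovacuum frequency can be raised with storage parameters along these lines; the table name and values are placeholders, not recommendations:)

```sql
-- Trigger autovacuum after fewer row changes on the high-churn table,
-- so each FSM vacuum pass has less accumulated work to do.
-- Thresholds here are illustrative only.
ALTER TABLE big_churny_table SET (
    autovacuum_vacuum_scale_factor = 0.01,  -- default is 0.2
    autovacuum_vacuum_threshold    = 100000
);
```

Note that autovacuum_vacuum_cost_delay throttles only the regular vacuum work, which is the asymmetry under discussion here.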
Hi,
On Wed, Mar 19, 2025 at 09:53:37AM +0100, Christophe Pettus wrote:
We're tracking down an issue that we've seen in two separate
installations so far, which is that, at the very end of a vacuum, the
vacuum operation starts using *very* high levels of CPU and
(sometimes) I/O, often to the point that the system becomes unable to
service other requests. We've seen this on versions 15, 16, and 17 so
far.
Ouch.
The common data points are:
1. The table being vacuumed is large (>250 million rows, often in the
>10 billion row range).
2. The table has a relatively high churn rate.
3. The number of updated / deleted rows before that particular vacuum
cycle is very high.
Everything seems to point to the vacuum free space map operation,
since it would have a lot of work to do in that particular situation,
it happens at just the right place in the vacuum cycle, and its
resource consumption is not throttled the way the regular vacuum
operation is.
Independent of throttling, if it turns out free space map vacuum is
indeed the culprit, I think it would make sense to add that one as a
dedicated phase so it can be more easily tracked in
pg_stat_progress_vacuum etc.
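(Until such a phase exists, the existing view can still help confirm the diagnosis: if CPU spikes while the last reported phase appears stalled, that is consistent with time being spent in untracked FSM work. A minimal query, using only columns present in 15 through 17:)

```sql
-- Watch the reported vacuum phase; a long, CPU-heavy stall with no
-- reported phase advancing suggests work outside the tracked phases,
-- such as FSM vacuuming.
SELECT pid, relid::regclass AS relname, phase,
       heap_blks_scanned, heap_blks_vacuumed, heap_blks_total
FROM pg_stat_progress_vacuum;
```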
Michael
On 2025-Mar-19, Christophe Pettus wrote:
Everything seems to point to the vacuum free space map operation,
since it would have a lot of work to do in that particular situation,
it happens at just the right place in the vacuum cycle, and its
resource consumption is not throttled the way the regular vacuum
operation is.
Assuming this analysis is correct, our current proposal as a temporary
fix is to increase the frequency of autovacuum on those tables, so
that the free space map vacuum operation has less to do, and is less
likely to consume the system. In the longer run, is it worth
considering implementing a cost delay inside of the free space map
update operations?
Assuming this analysis is correct, I agree that the FSM vacuuming should
also be throttled, as long as that can be done without blocking
concurrent operations (insertions) on the table.
Another idea to consider might be to split out the vacuuming of the FSM
from the vacuuming of the table itself; given that HOT pruning can free
significant amounts of space prior to actual vacuuming, it's possible
that there are wins to scheduling it separately. However, having a way
to track stats for it might be tricky.
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"If you want to have good ideas, you must have many ideas. Most of them
will be wrong, and what you have to learn is which ones to throw away."
(Linus Pauling)
On Wed, Mar 19, 2025 at 4:54 AM Christophe Pettus <xof@thebuild.com> wrote:
Everything seems to point to the vacuum free space map operation, since it would have a lot of work to do in that particular situation, it happens at just the right place in the vacuum cycle, and its resource consumption is not throttled the way the regular vacuum operation is.
Do you know how big the FSM was?
As others have said, it could be worth adding a phase to
pg_stat_progress_vacuum.
I know you said you saw this all the way back to 15, but it made me
wonder if this would actually get worse after 17 and the tidstore
optimization -- since a large table with indexes requiring multiple
vacuum passes would end up having its FSM vacuumed sooner.
- Melanie
On Mar 19, 2025, at 14:12, Melanie Plageman <melanieplageman@gmail.com> wrote:
Do you know how big the FSM was?
Unfortunately, no; both installations are locked-down hosted environments.
As others have said, it could be worth adding a phase to
pg_stat_progress_vacuum.
+1.
On Mar 19, 2025, at 12:12, Álvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Assuming this analysis is correct, I agree that the FSM vacuuming should
also be throttled, as long as that can be done without blocking
concurrent operations (insertions) on the table.
From an (admittedly somewhat naïve) look at the code, it appears that having it honor the cost delay wouldn't introduce excessive blocking, as long as the delay wasn't implemented at a really dumb place.
Hi,
On 2025-03-19 15:05:47 +0100, Christophe Pettus wrote:
On Mar 19, 2025, at 14:12, Melanie Plageman <melanieplageman@gmail.com> wrote:
Do you know how big the FSM was?
Unfortunately, no; both installations are locked-down hosted environments.
That can be checked with pg_relation_size(), e.g.
SELECT pg_relation_size('pg_class', 'fsm');
nothing special required.
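(For a large table that might look like the following; the table name is illustrative:)

```sql
-- FSM size for a given table, human-readable; the 'fsm' fork argument
-- selects the free space map rather than the main heap.
SELECT pg_size_pretty(pg_relation_size('big_churny_table', 'fsm'));
```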
Greetings,
Andres Freund
On Mar 19, 2025, at 15:06, Christophe Pettus <xof@thebuild.com> wrote:
From an (admittedly somewhat naïve) look at the code, it appears that having it honor the cost delay wouldn't introduce excessive blocking, as long as the delay wasn't implemented at a really dumb place.
Specifically, it looks like doing the cost-based wait immediately after backend/storage/freespace/freespace.c:898 (in HEAD) wouldn't be in a critical section, although I could be totally in the weeds on this.