allow changing autovacuum_max_workers without restarting
I frequently hear about scenarios where users with thousands upon thousands
of tables realize that autovacuum is struggling to keep up. When they
inevitably go to bump up autovacuum_max_workers, they discover that it
requires a server restart (i.e., downtime) to take effect, causing further
frustration. For this reason, I think $SUBJECT is a desirable improvement.
I spent some time looking for past discussions about this, and I was
surprised to not find any, so I thought I'd give it a try.
The attached proof-of-concept patch demonstrates what I have in mind.
Instead of trying to dynamically change the global process table, etc., I'm
proposing that we introduce a new GUC that sets the effective maximum
number of autovacuum workers that can be started at any time. This means
there would be two GUCs for the number of autovacuum workers: one for the
number of slots reserved for autovacuum workers, and another that restricts
the number of those slots that can be used. The former would continue to
require a restart to change its value, and users would typically want to
set it relatively high. The latter could be changed at any time and would
allow for raising or lowering the maximum number of active autovacuum
workers, up to the limit set by the other parameter.
The proof-of-concept patch keeps autovacuum_max_workers as the maximum
number of slots to reserve for workers, but I think we should instead
rename this parameter to something else and then reintroduce
autovacuum_max_workers as the new parameter that can be adjusted without
restarting. That way, autovacuum_max_workers continues to work much the
same way as in previous versions.
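For illustration, with the rename this might look something like the following
(the name of the restart-only parameter is a placeholder; nothing has been
decided yet):

    # postgresql.conf
    autovacuum_worker_slots = 32    # restart-only; slots reserved for autovacuum
    autovacuum_max_workers = 3      # PGC_SIGHUP; effective limit, <= the slot count

    -- later, without a restart:
    ALTER SYSTEM SET autovacuum_max_workers = 8;
    SELECT pg_reload_conf();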
There are a couple of weird cases with this approach. One is when the
restart-only limit is set lower than the PGC_SIGHUP limit. In that case, I
think we should just use the restart-only limit. The other is when there
are already N active autovacuum workers and the PGC_SIGHUP parameter is
changed to something less than N. For that case, I think we should just
block starting additional workers until the number of workers drops below
the new parameter's value. I don't think we should kill existing workers,
or anything else like that.
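To make those semantics concrete, the launch-time check could look roughly
like this (a sketch with placeholder names, not necessarily what the patch
does):

    /* Effective limit: the PGC_SIGHUP value, silently capped by the slot count. */
    static int
    av_effective_max_workers(void)
    {
        return Min(autovacuum_max_workers, autovacuum_worker_slots);
    }

    static bool
    av_can_start_worker(int active_workers)
    {
        /*
         * If the limit was lowered below the number of running workers, we
         * simply stop launching new ones; existing workers finish normally
         * and the count drains down to the new limit.
         */
        return active_workers < av_effective_max_workers();
    }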
TBH I've been sitting on this idea for a while now, only because I think it
has a slim chance of acceptance, but IMHO this is a simple change that
could help many users.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Attachments:
  autovac_max_workers_proof_of_concept.patch (text/x-diff, +39/-18)

> I frequently hear about scenarios where users with thousands upon thousands
> of tables realize that autovacuum is struggling to keep up. When they
> inevitably go to bump up autovacuum_max_workers, they discover that it
> requires a server restart (i.e., downtime) to take effect, causing further
> frustration. For this reason, I think $SUBJECT is a desirable improvement.
>
> I spent some time looking for past discussions about this, and I was
> surprised to not find any, so I thought I'd give it a try.
I did not review the patch in detail yet, but +1 to the idea.
It's not just thousands of tables that suffer from this.
If a user has a few large tables hogging the autovac workers, then other
tables don't get the autovac cycles they require. Users are then forced
to run manual vacuums, which adds complexity to their operations.
> The attached proof-of-concept patch demonstrates what I have in mind.
> Instead of trying to dynamically change the global process table, etc., I'm
> proposing that we introduce a new GUC that sets the effective maximum
> number of autovacuum workers that can be started at any time.
max_worker_processes defines a pool of max # of background workers allowed.
parallel workers and extensions that spin up background workers all utilize from
this pool.
Should autovacuum_max_workers be able to utilize from max_worker_processes also?
This will allow autovacuum_max_workers to be dynamic while the user only has
to deal with an already existing GUC. We may want to increase the default value
for max_worker_processes as part of this.
Regards,
Sami
Amazon Web Services (AWS)
On Thu, Apr 11, 2024 at 02:24:18PM +0000, Imseih (AWS), Sami wrote:
> max_worker_processes defines a pool of max # of background workers allowed.
> parallel workers and extensions that spin up background workers all utilize from
> this pool.
>
> Should autovacuum_max_workers be able to utilize from max_worker_processes also?
> This will allow autovacuum_max_workers to be dynamic while the user only has
> to deal with an already existing GUC. We may want to increase the default value
> for max_worker_processes as part of this.
My concern with this approach is that other background workers could use up
all the slots and prevent autovacuum workers from starting, unless of
course we reserve autovacuum_max_workers slots for _only_ autovacuum
workers. I'm not sure if we want to get these parameters tangled up like
this, though...
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Thu, Apr 11, 2024 at 09:42:40AM -0500, Nathan Bossart wrote:
> On Thu, Apr 11, 2024 at 02:24:18PM +0000, Imseih (AWS), Sami wrote:
>> max_worker_processes defines a pool of max # of background workers allowed.
>> parallel workers and extensions that spin up background workers all utilize from
>> this pool.
>>
>> Should autovacuum_max_workers be able to utilize from max_worker_processes also?
>> This will allow autovacuum_max_workers to be dynamic while the user only has
>> to deal with an already existing GUC. We may want to increase the default value
>> for max_worker_processes as part of this.
>
> My concern with this approach is that other background workers could use up
> all the slots and prevent autovacuum workers from starting, unless of
> course we reserve autovacuum_max_workers slots for _only_ autovacuum
> workers. I'm not sure if we want to get these parameters tangled up like
> this, though...
I see that the logical replication launcher process uses this pool, but we
take special care to make sure it gets a slot:
/*
* Register the apply launcher. It's probably a good idea to call this
* before any modules had a chance to take the background worker slots.
*/
ApplyLauncherRegister();
I'm not sure there's another way to effectively reserve slots that would
work for the autovacuum workers (which need to restart to connect to
different databases), so that would need to be invented. We'd probably
also want to fail startup if autovacuum_max_workers > max_worker_processes,
which seems like it has the potential to cause problems when folks first
upgrade to v18.
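Something along these lines, just to illustrate the shape of the check (not
part of any patch here):

    /* Hypothetical startup-time sanity check. */
    if (autovacuum_max_workers > max_worker_processes)
        ereport(FATAL,
                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                 errmsg("autovacuum_max_workers must not exceed max_worker_processes")));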
Furthermore, we might have to convert autovacuum workers to background
worker processes for this to work. I've admittedly wondered about whether
we should do that eventually, anyway, but it'd expand the scope of this
work quite a bit.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
> My concern with this approach is that other background workers could use up
> all the slots and prevent autovacuum workers from starting
That's a good point; the current settings do not guarantee that you
get a worker for a given purpose if none are available.
For example, with max_parallel_workers_per_gather you may have 2 workers
planned and 0 launched.
> unless of
> course we reserve autovacuum_max_workers slots for _only_ autovacuum
> workers. I'm not sure if we want to get these parameters tangled up like
> this, though...
This will be confusing to describe and we will be reserving autovac workers
implicitly, rather than explicitly with a new GUC.
Regards,
Sami
On Thu, Apr 11, 2024 at 03:37:23PM +0000, Imseih (AWS), Sami wrote:
>> My concern with this approach is that other background workers could use up
>> all the slots and prevent autovacuum workers from starting
>
> That's a good point, the current settings do not guarantee that you
> get a worker for the purpose if none are available,
> i.e. max_parallel_workers_per_gather, you may have 2 workers planned
> and 0 launched.
>
>> unless of
>> course we reserve autovacuum_max_workers slots for _only_ autovacuum
>> workers. I'm not sure if we want to get these parameters tangled up like
>> this, though...
>
> This will be confusing to describe and we will be reserving autovac workers
> implicitly, rather than explicitly with a new GUC.
Yeah, that's probably a good reason to give autovacuum its own worker pool.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
I spent some time reviewing/testing the POC. It is relatively simple with a lot
of obvious value.
I tested with 16 tables that constantly reach the autovac threshold, and the
patch did the right thing. I observed concurrent autovacuum workers matching
the setting as I was adjusting it dynamically.
As you mention above, if there are more autovacuums in progress than a newly
applied lower setting allows, we should not take any special action on those
autovacuums; eventually the number of active autovacuum workers will match the
setting.
I also tested by allowing user connections to reach max_connections, and
observed the expected number of autovacuum workers spinning up and adjusting
correctly.
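For anyone wanting to repeat this, the observation boils down to watching
pg_stat_activity while reloading the new setting, along these lines:

    -- count active autovacuum workers
    SELECT count(*) FROM pg_stat_activity
    WHERE backend_type = 'autovacuum worker';

    -- lower the limit without a restart
    ALTER SYSTEM SET autovacuum_max_workers = 2;
    SELECT pg_reload_conf();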
Having autovacuum tests like the above (unless I missed something and they
already exist) would be a good general improvement, but that should not be
tied to this patch.
A few comments on the POC patch:
1/ We should emit a log message when autovacuum_max_workers is set higher than the max.
2/ should the name of the restart limit be "reserved_autovacuum_workers"?
Regards,
Sami Imseih
AWS (Amazon Web Services)
On Fri, Apr 12, 2024 at 05:27:40PM +0000, Imseih (AWS), Sami wrote:
> A few comments on the POC patch:
Thanks for reviewing.
> 1/ We should emit a log message when autovacuum_max_workers is set higher than the max.
Hm. Maybe the autovacuum launcher could do that.
> 2/ should the name of the restart limit be "reserved_autovacuum_workers"?
That's kind-of what I had in mind, although I think we might want to avoid
the word "reserved" because it sounds a bit like reserved_connections and
superuser_reserved_connections. "autovacuum_max_slots" or
"autovacuum_max_worker_slots" might be worth considering, too.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>> 1/ We should emit a log message when autovacuum_max_workers is set higher than the max.
>
> Hm. Maybe the autovacuum launcher could do that.
Would it be better to use a GUC check_hook that compares the
new value with the maximum allowed value and emits a WARNING?
autovacuum_max_workers already has a check_autovacuum_max_workers
check_hook, which can be repurposed for this.
In the POC patch, this check_hook is kept as-is, which will no longer make sense.
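For illustration, a hook-based version might look roughly like this (a sketch
only, reusing the autovacuum_max_worker_slots name floated above):

    bool
    check_autovacuum_max_workers(int *newval, void **extra, GucSource source)
    {
        /*
         * Sketch only: this compares against whatever value the other GUC
         * happens to hold when this one is processed, which is one reason
         * such cross-parameter hooks can misbehave.
         */
        if (*newval > autovacuum_max_worker_slots)
            ereport(WARNING,
                    (errmsg("autovacuum_max_workers (%d) exceeds autovacuum_max_worker_slots (%d)",
                            *newval, autovacuum_max_worker_slots)));
        return true;
    }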
>> 2/ should the name of the restart limit be "reserved_autovacuum_workers"?
>
> That's kind-of what I had in mind, although I think we might want to avoid
> the word "reserved" because it sounds a bit like reserved_connections
> and superuser_reserved_connections
Yes, I agree. This can be confusing.
"autovacuum_max_slots" or
"autovacuum_max_worker_slots" might be worth considering, too.
"autovacuum_max_worker_slots" is probably the best option because
we should have "worker" in the name of the GUC.
Regards,
Sami
On Fri, Apr 12, 2024 at 10:17:44PM +0000, Imseih (AWS), Sami wrote:
>> Hm. Maybe the autovacuum launcher could do that.
>
> Would it be better to use a GUC check_hook that compares the
> new value with the maximum allowed value and emits a WARNING?
>
> autovacuum_max_workers already has a check_autovacuum_max_workers
> check_hook, which can be repurposed for this.
>
> In the POC patch, this check_hook is kept as-is, which will no longer make sense.
IIRC using GUC hooks to handle dependencies like this is generally frowned
upon because it tends to not work very well [0]. We could probably get it
to work for this particular case, but IMHO we should still try to avoid
this approach. I didn't find any similar warnings for other GUCs like
max_parallel_workers_per_gather, so it might not be crucial to emit a
WARNING here.
[0]: /messages/by-id/27574.1581015893@sss.pgh.pa.us
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
> IIRC using GUC hooks to handle dependencies like this is generally frowned
> upon because it tends to not work very well [0]. We could probably get it
> to work for this particular case, but IMHO we should still try to avoid
> this approach.
Thanks for pointing this out. I agree, this could lead to spurious log
messages being emitted.
> so it might not be crucial to emit a
> WARNING here.
As mentioned earlier in the thread, we can let the autovacuum launcher emit the
log, but it will need to be careful not to flood the logs when this condition
persists (i.e., log only the first time the condition is detected, or log every
once in a while). The additional complexity is not worth it.
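For reference, the kind of guard being weighed here would look something like
this (a sketch, using the placeholder slot-count name from above):

    /* Log only on the first detection; re-arm once the condition clears. */
    static bool logged_capped = false;

    if (autovacuum_max_workers > autovacuum_max_worker_slots)
    {
        if (!logged_capped)
            ereport(LOG,
                    (errmsg("autovacuum_max_workers (%d) is capped to autovacuum_max_worker_slots (%d)",
                            autovacuum_max_workers, autovacuum_max_worker_slots)));
        logged_capped = true;
    }
    else
        logged_capped = false;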
Regards,
Sami
Here is a first attempt at a proper patch set based on the discussion thus
far. I've split it up into several small patches for ease of review, which
is probably a bit excessive. If this ever makes it to commit, they could
likely be combined.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Attachments:
  v1-0001-Rename-autovacuum_max_workers-to-autovacuum_max_w.patch (text/x-diff, +36/-37)
  v1-0002-Convert-autovacuum-s-free-workers-list-to-a-dclis.patch (text/x-diff, +12/-13)
  v1-0003-Move-free-autovacuum-worker-checks-to-a-helper-fu.patch (text/x-diff, +15/-5)
  v1-0004-Reintroduce-autovacuum_max_workers-as-a-PGC_SIGHU.patch (text/x-diff, +46/-11)
On Wed, Apr 10, 2024 at 04:23:44PM -0500, Nathan Bossart wrote:
> The attached proof-of-concept patch demonstrates what I have in mind.
> Instead of trying to dynamically change the global process table, etc., I'm
> proposing that we introduce a new GUC that sets the effective maximum
> number of autovacuum workers that can be started at any time. This means
> there would be two GUCs for the number of autovacuum workers: one for the
> number of slots reserved for autovacuum workers, and another that restricts
> the number of those slots that can be used. The former would continue to
> require a restart to change its value, and users would typically want to
> set it relatively high. The latter could be changed at any time and would
> allow for raising or lowering the maximum number of active autovacuum
> workers, up to the limit set by the other parameter.
>
> The proof-of-concept patch keeps autovacuum_max_workers as the maximum
> number of slots to reserve for workers, but I think we should instead
> rename this parameter to something else and then reintroduce
> autovacuum_max_workers as the new parameter that can be adjusted without
> restarting. That way, autovacuum_max_workers continues to work much the
> same way as in previous versions.
When I thought about this, I considered proposing to add a new GUC for
"autovacuum_policy_workers".

autovacuum_max_workers would be the same as before, requiring a restart
to change. The policy GUC would be the soft limit, changeable at runtime
up to the hard limit of autovacuum_max_workers (or maybe any policy
value exceeding autovacuum_max_workers would be ignored).

We'd probably change autovacuum_max_workers to default to a higher value
(8, or 32 as in your patch), and have autovacuum_policy_workers default to
3, for consistency with historic behavior. Maybe
autovacuum_policy_workers=-1 would mean to use all workers.
There's the existing idea to change autovacuum thresholds during the
busy period of the day vs. off hours. This would allow something
similar with nworkers rather than thresholds: if the goal were to reduce
the resource use of vacuum, the admin could set max_workers=8, with
policy_workers=2 during the busy period.
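For example, from a scheduled job (using the proposed, hypothetical
autovacuum_policy_workers name):

    -- busy period: throttle autovacuum concurrency
    ALTER SYSTEM SET autovacuum_policy_workers = 2;
    SELECT pg_reload_conf();

    -- off hours: allow the full pool
    ALTER SYSTEM SET autovacuum_policy_workers = 8;
    SELECT pg_reload_conf();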
--
Justin
On Mon, Apr 15, 2024 at 08:33:33AM -0500, Justin Pryzby wrote:
> On Wed, Apr 10, 2024 at 04:23:44PM -0500, Nathan Bossart wrote:
>> The proof-of-concept patch keeps autovacuum_max_workers as the maximum
>> number of slots to reserve for workers, but I think we should instead
>> rename this parameter to something else and then reintroduce
>> autovacuum_max_workers as the new parameter that can be adjusted without
>> restarting. That way, autovacuum_max_workers continues to work much the
>> same way as in previous versions.
>
> When I thought about this, I considered proposing to add a new GUC for
> "autovacuum_policy_workers".
>
> autovacuum_max_workers would be the same as before, requiring a restart
> to change. The policy GUC would be the soft limit, changeable at runtime
> up to the hard limit of autovacuum_max_workers (or maybe any policy
> value exceeding autovacuum_max_workers would be ignored).
>
> We'd probably change autovacuum_max_workers to default to a higher value
> (8, or 32 as in your patch), and have autovacuum_policy_workers default to
> 3, for consistency with historic behavior. Maybe
> autovacuum_policy_workers=-1 would mean to use all workers.
This sounds like roughly the same idea, although it is backwards from what
I'm proposing in the v1 patch set. My thinking is that by making a new
restart-only GUC that would by default be set higher than the vast majority
of systems should ever need, we could simplify migrating to these
parameters. The autovacuum_max_workers parameter would effectively retain
its original meaning, and existing settings would continue to work
normally on v18, but users could now adjust it without restarting. If we
did it the other way, users would need to bump up autovacuum_max_workers
and restart prior to being able to raise autovacuum_policy_workers beyond
what they previously had set for autovacuum_max_workers. That being said,
I'm open to doing it this way if folks prefer this approach, as I think it
is still an improvement.
> There's the existing idea to change autovacuum thresholds during the
> busy period of the day vs. off hours. This would allow something
> similar with nworkers rather than thresholds: if the goal were to reduce
> the resource use of vacuum, the admin could set max_workers=8, with
> policy_workers=2 during the busy period.
Precisely.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Mon, Apr 15, 2024 at 11:28:33AM -0500, Nathan Bossart wrote:
> This sounds like roughly the same idea, although it is backwards from what
> I'm proposing in the v1 patch set. My thinking is that by making a new
> restart-only GUC that would by default be set higher than the vast majority
> of systems should ever need, we could simplify migrating to these
> parameters. The autovacuum_max_workers parameter would effectively retain
> its original meaning, and existing settings would continue to work
> normally on v18, but users could now adjust it without restarting. If we
> did it the other way, users would need to bump up autovacuum_max_workers
> and restart prior to being able to raise autovacuum_policy_workers beyond
> what they previously had set for autovacuum_max_workers. That being said,
> I'm open to doing it this way if folks prefer this approach, as I think it
> is still an improvement.
Another option could be to just remove the restart-only GUC and hard-code
the upper limit of autovacuum_max_workers to 64 or 128 or something. While
that would simplify matters, I suspect it would be hard to choose an
appropriate limit that won't quickly become outdated.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
> Another option could be to just remove the restart-only GUC and hard-code
> the upper limit of autovacuum_max_workers to 64 or 128 or something. While
> that would simplify matters, I suspect it would be hard to choose an
> appropriate limit that won't quickly become outdated.
Hardcoded values are usually hard to deal with because they are hidden,
either in code or in docs.
> When I thought about this, I considered proposing to add a new GUC for
> "autovacuum_policy_workers".
>
> autovacuum_max_workers would be the same as before, requiring a restart
> to change. The policy GUC would be the soft limit, changeable at runtime
I think autovacuum_max_workers should still be the GUC that controls
the number of concurrent autovacuums. This parameter is already well
established and changing the meaning now will be confusing.
I suspect most users will be glad it's now dynamic, but will probably
be annoyed if it's no longer doing what it's supposed to.
Regards,
Sami
Agree, +1. From a DBA perspective, I would prefer that this parameter can be
dynamically modified rather than adding a new parameter. What is more
difficult is how to smoothly reach the target value when the setting is
considered too large and needs to be lowered.

Regards
On Tue, 16 Apr 2024 at 01:41, Imseih (AWS), Sami <simseih@amazon.com> wrote:
> Here is a first attempt at a proper patch set based on the discussion thus
> far. I've split it up into several small patches for ease of review, which
> is probably a bit excessive. If this ever makes it to commit, they could
> likely be combined.
I looked at the patch set. With the help of DEBUG2 output, I tested to ensure
that the autovacuum_cost_limit balance adjusts correctly when the
autovacuum_max_workers value increases/decreases. I did not think the
patch would break this behavior, but it's important to verify this.
Some comments on the patch:
1. A nit. There should be a tab here.
- dlist_head av_freeWorkers;
+ dclist_head av_freeWorkers;
2. autovacuum_max_worker_slots documentation:
+ <para>
+ Note that the value of <xref linkend="guc-autovacuum-max-workers"/> is
+ silently capped to this value.
+ </para>
This comment looks redundant in the docs, since the entry
for autovacuum_max_workers that follows mentions the same thing.
3. The docs for autovacuum_max_workers should mention that when
the value changes, consider adjusting the autovacuum_cost_limit/cost_delay.
This is not something new. Even in the current state, users should think about
these settings. However, it seems even more important if this value can be
dynamically adjusted.
Regards,
Sami
On Thu, Apr 18, 2024 at 05:05:03AM +0000, Imseih (AWS), Sami wrote:
> I looked at the patch set. With the help of DEBUG2 output, I tested to ensure
> that the autovacuum_cost_limit balance adjusts correctly when the
> autovacuum_max_workers value increases/decreases. I did not think the
> patch would break this behavior, but it's important to verify this.
Great.
> 1. A nit. There should be a tab here.
>
> - dlist_head av_freeWorkers;
> + dclist_head av_freeWorkers;
I dare not argue with pgindent.
> 2. autovacuum_max_worker_slots documentation:
>
> + <para>
> +  Note that the value of <xref linkend="guc-autovacuum-max-workers"/> is
> +  silently capped to this value.
> + </para>
>
> This comment looks redundant in the docs, since the entry
> for autovacuum_max_workers that follows mentions the same thing.
Removed in v2. I also noticed that I forgot to update the part about when
autovacuum_max_workers can be changed. *facepalm*
> 3. The docs for autovacuum_max_workers should mention that when
> the value changes, consider adjusting the autovacuum_cost_limit/cost_delay.
>
> This is not something new. Even in the current state, users should think about
> these settings. However, it seems even more important if this value can be
> dynamically adjusted.
I don't necessarily disagree that it might be worth mentioning these
parameters, but I would argue that this should be proposed in a separate
thread.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Attachments:
  v2-0001-Rename-autovacuum_max_workers-to-autovacuum_max_w.patch (text/x-diff, +36/-37)
  v2-0002-Convert-autovacuum-s-free-workers-list-to-a-dclis.patch (text/x-diff, +12/-13)
  v2-0003-Move-free-autovacuum-worker-checks-to-a-helper-fu.patch (text/x-diff, +15/-5)
  v2-0004-Reintroduce-autovacuum_max_workers-as-a-PGC_SIGHU.patch (text/x-diff, +44/-12)
On Fri, Apr 19, 2024 at 11:43 AM Nathan Bossart
<nathandbossart@gmail.com> wrote:
> Removed in v2. I also noticed that I forgot to update the part about when
> autovacuum_max_workers can be changed. *facepalm*
I think this could help a bunch of users, but I'd still like to
complain, not so much with the desire to kill this patch as with the
desire to broaden the conversation.
Part of the underlying problem here is that, AFAIK, neither PostgreSQL
as a piece of software nor we as human beings who operate PostgreSQL
databases have much understanding of how autovacuum_max_workers should
be set. It's relatively easy to hose yourself by raising
autovacuum_max_workers to try to make things go faster, only to produce
the exact opposite effect due to how the cost balancing stuff works.
But, even if you have the correct use case for autovacuum_max_workers,
something like a few large tables that take a long time to vacuum plus
a bunch of smaller ones that can't get starved just because the big
tables are in the midst of being processed, you might well ask
yourself why it's your job to figure out the correct number of
workers.
Now, before this patch, there is a fairly good reason for that, which
is that we need to reserve shared memory resources for each autovacuum
worker that might potentially run, and the system can't know how much
shared memory you'd like to reserve for that purpose. But if that were
the only problem, then this patch would probably just be proposing to
crank up the default value of that parameter rather than introducing a
second one. I bet Nathan isn't proposing that because his intuition is
that it will work out badly, and I think he's right. I bet that
cranking up the number of allowed workers will often result in running
more workers than we really should. One possible negative consequence
is that we'll end up with multiple processes fighting over the disk in
a situation where they should just take turns. I suspect there are
also ways that we can be harmed - in broadly similar fashion - by cost
balancing.
So I feel like what this proposal reveals is that we know that our
algorithm for ramping up the number of running workers doesn't really
work. And maybe that's just a consequence of the general problem that
we have no global information about how much vacuuming work there is
to be done at any given time, and therefore we cannot take any kind of
sensible guess about whether 1 more worker will help or hurt. Or,
maybe there's some way to do better than what we do today without a
big rewrite. I'm not sure. I don't think this patch should be burdened
with solving the general problem here. But I do think the general
problem is worth some discussion.
--
Robert Haas
EDB: http://www.enterprisedb.com