BUG #12772: Unexpected autovacuum behavior

Started by Noname · about 11 years ago · 3 messages · pgsql-bugs
#1 Noname
jd@ods.org

The following bug has been logged on the website:

Bug reference: 12772
Logged by: Jason DiCioccio
Email address: jd@ods.org
PostgreSQL version: 9.3.6
Operating system: Ubuntu 14.04 (Linux 3.13.0-40-generic)
Description:

Here's the situation I ran into:

autovacuum was vacuuming a large table in database 'db1' (to prevent txid
wraparound). It turns out that one of the indexes of this table was
corrupted from the 9.3.4 WAL issue. So VACUUM failed repeatedly.
Nonetheless, autovacuum kept trying.

Meanwhile in database 'db2', there were a number of tables in serious need
of vacuuming. However, autovacuum would never touch these, even though
there were far fewer than autovacuum_max_workers running. Other tables in
database 'db1' WERE being vacuumed, however.

So it appears that the logic is that autovacuum operates solely on one
database at a time. Even if there is only one table that needs vacuuming in
that first database, it will not spawn any workers to vacuum tables that do
need vacuuming in a second database until it has completed vacuuming the
necessary tables from the first database.

This, to me, is unexpected behavior. I'd expect autovacuum to not act as if
there were a large barrier between databases and to vacuum any table that
needs it and that the configuration permits.

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#2 Jeff Janes
jeff.janes@gmail.com
In reply to: Noname (#1)
Fwd: BUG #12772: Unexpected autovacuum behavior

On Sat, Feb 14, 2015 at 10:14 AM, <jd@ods.org> wrote:

The following bug has been logged on the website:

Bug reference: 12772
Logged by: Jason DiCioccio
Email address: jd@ods.org
PostgreSQL version: 9.3.6
Operating system: Ubuntu 14.04 (Linux 3.13.0-40-generic)
Description:

Here's the situation I ran into:

autovacuum was vacuuming a large table in database 'db1' (to prevent txid
wraparound). It turns out that one of the indexes of this table was
corrupted from the 9.3.4 WAL issue. So VACUUM failed repeatedly.
Nonetheless, autovacuum kept trying.

Meanwhile in database 'db2', there were a number of tables in serious need
of vacuuming. However, autovacuum would never touch these, even though
there were far fewer than autovacuum_max_workers running. Other tables in
database 'db1' WERE being vacuumed, however.

So it appears that the logic is that autovacuum operates solely on one
database at a time. Even if there is only one table that needs vacuuming in
that first database, it will not spawn any workers to vacuum tables that do
need vacuuming in a second database until it has completed vacuuming the
necessary tables from the first database.

This, to me, is unexpected behavior. I'd expect autovacuum to not act as if
there were a large barrier between databases and to vacuum any table that
needs it and that the configuration permits.

(Sorry, accidentally did not include the list on my original reply, here is
a copy to the list)

This is a known issue, but so far there hasn't been a lot of urgency to fix
it, probably because of a lack of complaints from the field until now, and
because there are a few ways to address it and no one has forcefully
advocated for one particular solution.

Basically it treats a database in need of wraparound vacuum as an
emergency (even though it probably isn't one) and funnels all future
workers to it. But when only one table is in need of that wraparound
vacuum, all the other workers can't accomplish anything in that database,
and other databases get starved.
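That policy can be illustrated with a toy Python model (this is only an
illustration of the behavior described above, not the actual launcher code,
which lives in the server's C sources):

```python
# Toy model of the launcher policy described above: a database flagged for
# anti-wraparound vacuum absorbs every new worker, even when only one table
# there actually needs the work, so other databases are starved.

def pick_database(databases):
    """Return the database the launcher would send the next worker to.

    `databases` maps a database name to a dict with:
      - 'wraparound': True if any table exceeds autovacuum_freeze_max_age
      - 'tables_needing_vacuum': tables with ordinary (non-wraparound) work
    """
    # Wraparound danger is treated as an emergency: it always wins.
    at_risk = [name for name, db in databases.items() if db['wraparound']]
    if at_risk:
        return at_risk[0]
    # Otherwise pick any database with ordinary vacuum work pending.
    for name, db in databases.items():
        if db['tables_needing_vacuum'] > 0:
            return name
    return None

dbs = {
    'db1': {'wraparound': True,  'tables_needing_vacuum': 1},   # one stuck table
    'db2': {'wraparound': False, 'tables_needing_vacuum': 50},  # starved
}
# Every new worker goes to db1, no matter how many are already there.
print([pick_database(dbs) for _ in range(3)])  # ['db1', 'db1', 'db1']
```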

/messages/by-id/CAMkU=1yE4YyCC00W_GcNoOZ4X2qxF7x5DUAR_kMt-Ta=YPyFPQ@mail.gmail.com

Cheers,

Jeff

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Janes (#2)
Re: Fwd: BUG #12772: Unexpected autovacuum behavior

Jeff Janes <jeff.janes@gmail.com> writes:

On Sat, Feb 14, 2015 at 10:14 AM, <jd@ods.org> wrote:

So it appears that the logic is that autovacuum operates solely on one
database at a time. Even if there is only one table that needs vacuuming in
that first database, it will not spawn any workers to vacuum tables that do
need vacuuming in a second database until it has completed vacuuming the
necessary tables from the first database.

This is a known issue, but so far there hasn't been a lot of urgency to fix
it, probably because of a lack of complaints from the field until now, and
because there are a few ways to address it and no one has forcefully
advocated for one particular solution.

Basically it treats a database in need of wraparound vacuum as an
emergency (even though it probably isn't one) and funnels all future
workers to it. But when only one table is in need of that wraparound
vacuum, all the other workers can't accomplish anything in that database,
and other databases get starved.

Yeah. The thing that makes it difficult to behave in an ideal way is that
the autovacuum launcher has only database-level granularity of knowledge.
It knows that there's at least one table in that database that is at risk
of wraparound, but it doesn't know if there's more than one (and, btw,
it usually is a reasonable bet that there *is* more than one, because
probably most of the database last got an anti-wraparound vacuum at about
the same time). So it's hard to tell whether "send all the troops that
way" is good strategy or just a waste of resources.
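One way to see the granularity problem: the launcher's per-database view
comes from pg_database.datfrozenxid, which reflects only the oldest table
in each database. A toy sketch (illustration only; 200 million is the
default autovacuum_freeze_max_age):

```python
# The launcher sees one number per database: the age of its oldest table's
# relfrozenxid. One old table makes the whole database look at risk.

FREEZE_MAX_AGE = 200_000_000  # default autovacuum_freeze_max_age

def needs_wraparound_vacuum(table_xid_ages):
    """Database-level view: only the maximum (oldest) table age is visible."""
    return max(table_xid_ages) > FREEZE_MAX_AGE

# 999 freshly vacuumed tables plus one table stuck behind a corrupt index:
ages = [1_000] * 999 + [250_000_000]
print(needs_wraparound_vacuum(ages))  # True -- the whole database is flagged
```

The launcher cannot tell this case apart from a database where most tables
genuinely need anti-wraparound vacuuming.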

The previous thread you mentioned failed to come up with any ideas that
materially improved things, and I suspect that it's not possible to do
much without giving the launcher more information than it gets now.

One simple mechanism that perhaps could do that is having autovacuum
workers report back on whether they found anything useful to do or not.
If we know we have worker(s) active in a given database, but another
worker exits reporting there was nothing for it to do, we could conclude
that that database is saturated for the moment, and stop dispatching new
workers to it till the situation changes. I'm not entirely sure about
details though; in particular you'd have to be very sure that you didn't
miss any "situation changing" conditions or a database could become
permanently ignored.
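A rough sketch of that feedback loop, in Python rather than the server's C
(hypothetical, since this mechanism is only a proposal here):

```python
# Sketch of the proposed feedback: a worker that exits with nothing to do
# while siblings are still active marks its database "saturated", and the
# launcher stops dispatching there until the situation changes.

class Launcher:
    def __init__(self):
        self.active = {}       # database -> number of running workers
        self.saturated = set()

    def dispatch(self, db):
        if db in self.saturated:
            return False       # don't send more workers for now
        self.active[db] = self.active.get(db, 0) + 1
        return True

    def worker_exit(self, db, found_work):
        self.active[db] -= 1
        if not found_work and self.active.get(db, 0) > 0:
            # Siblings are still busy here but this worker was idle:
            # conclude the database is saturated for the moment.
            self.saturated.add(db)

    def situation_changed(self, db):
        # Must fire on *every* relevant condition (new stats, last worker
        # exiting, ...) or the database stays ignored forever.
        self.saturated.discard(db)

launcher = Launcher()
launcher.dispatch('db1')
launcher.dispatch('db1')
launcher.worker_exit('db1', found_work=False)  # second worker found nothing
print(launcher.dispatch('db1'))                # False: db1 is saturated
launcher.situation_changed('db1')
print(launcher.dispatch('db1'))                # True: dispatching resumes
```

The hard part, as noted above, is enumerating the "situation changing"
conditions completely.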

There might be other feedback mechanisms that would work better, but
I think we need to consider adding something.

regards, tom lane
