8.3.1 autovacuum stopped doing anything months ago

Started by Jeffrey Baker over 17 years ago, 4 messages
#1 Jeffrey Baker
jwbaker@gmail.com

I have an 8.3.1 instance on Linux and since June 29th the autovacuum process
has claimed to be working on the same three tables. That's OK, I am a very
patient man, and these are very large tables. Today I started to get
transaction wraparound warnings, so I went and checked it out. It turns out
the autovacuum processes are all just doing nothing. When I strace them,
all three are blocked on syscalls.

So I restart the database and run a vacuum. Of course, once the wraparound
warning is reached, there's no way to disable the autovac, so now my vacuum
maintenance job is competing with three invulnerable autovacuum processes.
I am thinking of sending them SIGSTOP.

Anyway, I have some issues. One, of course, is that the autovacuum should
not have been deadlocked or otherwise stalled like that. Perhaps it needs a
watchdog of some kind. Has anyone else experienced an issue like that in
8.3.1? The only thing I can see in the release notes that indicates this
problem may have been fixed is the following:

"Repair two places where SIGTERM exit of a backend could leave corrupted
state in shared memory (Tom)"

However I don't know who or what would have sent SIGTERM to the autovacuum
children.

Secondly, there really does need to be an autovacuum=off,really,thanks so
that my maintenance can proceed without competition for i/o resources. Is
there any way to make that happen? Is my SIGSTOP idea dangerous?

-jwb

#2 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Jeffrey Baker (#1)
Re: 8.3.1 autovacuum stopped doing anything months ago

Jeffrey Baker wrote:

> Secondly, there really does need to be an autovacuum=off,really,thanks so
> that my maintenance can proceed without competition for i/o resources. Is
> there any way to make that happen?

You could bump up autovacuum_freeze_max_age while you run the vacuums
manually, and then set it back. It will require a restart, though.
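
As a rough illustration of that suggestion (not part of the original message): autovacuum forces a vacuum on any table whose age(relfrozenxid) exceeds autovacuum_freeze_max_age, so a temporary value has to sit above the XID age of the oldest table. The Python sketch below picks such a value. The helper name, the 50 million margin, and the 2 billion ceiling on the setting are assumptions made for illustration; check the documented limit for your server version.

```python
# Assumed upper bound for autovacuum_freeze_max_age; verify for your version.
MAX_FREEZE_MAX_AGE = 2_000_000_000


def temporary_freeze_max_age(oldest_xid_age, margin=50_000_000):
    """Suggest a temporary autovacuum_freeze_max_age (hypothetical helper).

    oldest_xid_age: largest age(relfrozenxid) seen across the tables.
    margin: extra transactions of headroom (an arbitrary illustrative value).

    Returns the suggested setting, or None when no permitted value would be
    high enough to stop autovacuum from forcing a vacuum.
    """
    candidate = oldest_xid_age + margin
    if candidate > MAX_FREEZE_MAX_AGE:
        return None  # too late for this approach
    return candidate


if __name__ == "__main__":
    print(temporary_freeze_max_age(300_000_000))
    print(temporary_freeze_max_age(2_100_000_000))
```

A result of None corresponds to the case where the tables are already too old for this workaround to help.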

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#3 Robert Treat
xzilla@users.sourceforge.net
In reply to: Jeffrey Baker (#1)
Re: 8.3.1 autovacuum stopped doing anything months ago

On Friday 19 September 2008 00:23:34 Jeffrey Baker wrote:

> Anyway, I have some issues. One, of course, is that the autovacuum should
> not have been deadlocked or otherwise stalled like that. Perhaps it needs
> a watchdog of some kind. Has anyone else experienced an issue like that in
> 8.3.1? The only thing I can see in the release notes that indicates this
> problem may have been fixed is the following:

We have several checks in the check_postgres script which are in this area
(warnings for approaching autovacuum freeze max age, warnings when approaching
xid wrap, monitoring of table analyze/vacuum activity). Those can at least
alert you to problems before they become too big a hassle.
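
For illustration only, a check along these lines could be sketched in Python as follows. This is not check_postgres code: the function name, the 90% warning threshold, and the query are assumptions, and 200 million is the default autovacuum_freeze_max_age rather than a value taken from this thread.

```python
# Query an operator might run to feed the check (shown for context only;
# this sketch does not connect to a database).
FREEZE_AGE_QUERY = """
SELECT c.oid::regclass AS table_name, age(c.relfrozenxid) AS xid_age
FROM pg_class c
WHERE c.relkind = 'r'
ORDER BY xid_age DESC
"""


def classify_xid_age(rows, freeze_max_age=200_000_000, warn_fraction=0.9):
    """Label each (table_name, xid_age) row as ok, warning, or critical.

    A table is "critical" once its XID age reaches freeze_max_age, and
    "warning" once it reaches warn_fraction of that value.
    """
    report = []
    for table_name, xid_age in rows:
        fraction = xid_age / freeze_max_age
        if fraction >= 1.0:
            status = "critical"
        elif fraction >= warn_fraction:
            status = "warning"
        else:
            status = "ok"
        report.append((table_name, status))
    return report


if __name__ == "__main__":
    sample = [("small", 50_000_000), ("aging", 190_000_000), ("stuck", 250_000_000)]
    for table_name, status in classify_xid_age(sample):
        print(table_name, status)
```

The table names and ages in the sample are made up; real input would come from running the query above.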

> Secondly, there really does need to be an autovacuum=off,really,thanks so
> that my maintenance can proceed without competition for i/o resources. Is
> there any way to make that happen? Is my SIGSTOP idea dangerous?

If Heikki's solution applies, it's better (see also vacuum_freeze_min_age), but
if it's too late for that, you can go into single-user mode, which will
prevent autovacuum; it's a bit more heavy-handed, though.

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

#4 Jeffrey Baker
jwbaker@gmail.com
In reply to: Robert Treat (#3)
Re: 8.3.1 autovacuum stopped doing anything months ago

On Fri, Sep 19, 2008 at 11:42 AM, Robert Treat <xzilla@users.sourceforge.net> wrote:

> On Friday 19 September 2008 00:23:34 Jeffrey Baker wrote:
>
> > Anyway, I have some issues. One, of course, is that the autovacuum should
> > not have been deadlocked or otherwise stalled like that. Perhaps it needs
> > a watchdog of some kind. Has anyone else experienced an issue like that in
> > 8.3.1? The only thing I can see in the release notes that indicates this
> > problem may have been fixed is the following:
>
> We have several checks in the check_postgres script which are in this area

Are you referring to the nagios plugin? I already use it, and nagios didn't
make a peep. Perhaps I should check for a more recent revision.

-jwb