can't stop autovacuum by HUP'ing the server

Started by Dave Cramerover 17 years ago35 messageshackers
Jump to latest
#1Dave Cramer
pg@fastcrypt.com

I'd like to stop autovac by changing the conf file then sending the
server a HUP

This "appears" to work, the logs show autovac terminated by
administrative command. Then a few minutes later I see a vacuum
process spawned.

Is it possible that there are timers that aren't being properly
terminated here ?

Dave

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dave Cramer (#1)
Re: can't stop autovacuum by HUP'ing the server

Dave Cramer <pg@fastcrypt.com> writes:

I'd like to stop autovac by changing the conf file then sending the
server a HUP

Uh ... why should that stop an autovac already in progress? I'd
only expect it to affect future launches.

regards, tom lane

#3Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Dave Cramer (#1)
Re: can't stop autovacuum by HUP'ing the server

Hi Dave,

Dave Cramer wrote:

I'd like to stop autovac by changing the conf file then sending the
server a HUP

This "appears" to work, the logs show autovac terminated by
administrative command. Then a few minutes later I see a vacuum process
spawned.

Is it possible that there are timers that aren't being properly
terminated here ?

I wonder whether there are tables that need an emergency vacuum (i.e.
they have reached the freeze horizon). What version are you running
anyway?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#4Dave Cramer
pg@fastcrypt.com
In reply to: Tom Lane (#2)
Re: can't stop autovacuum by HUP'ing the server

On 24-Aug-08, at 10:12 PM, Tom Lane wrote:

Dave Cramer <pg@fastcrypt.com> writes:

I'd like to stop autovac by changing the conf file then sending the
server a HUP

Uh ... why should that stop an autovac already in progress? I'd
only expect it to affect future launches.

Well, I go the extra mile and kill any remaing autovac procs

Here are the logs

2008-08-25 04:00:01 EDT [32276] LOG: autovacuum launcher shutting down
2008-08-25 04:00:01 EDT [20526] LOG: autovacuum launcher started
2008-08-25 04:00:01 EDT [21404] FATAL: terminating autovacuum
process due to administrator command
2008-08-25 04:00:01 EDT [23418] FATAL: terminating autovacuum
process due to administrator command
2008-08-25 04:00:01 EDT [21703] FATAL: terminating autovacuum
process due to administrator command

So now that I've read this why would the launcher start again ?

Dave

Show quoted text

regards, tom lane

#5Dave Cramer
pg@fastcrypt.com
In reply to: Alvaro Herrera (#3)
Re: can't stop autovacuum by HUP'ing the server

On 24-Aug-08, at 10:17 PM, Alvaro Herrera wrote:

Hi Dave,

Dave Cramer wrote:

I'd like to stop autovac by changing the conf file then sending the
server a HUP

This "appears" to work, the logs show autovac terminated by
administrative command. Then a few minutes later I see a vacuum
process
spawned.

Is it possible that there are timers that aren't being properly
terminated here ?

I wonder whether there are tables that need an emergency vacuum (i.e.
they have reached the freeze horizon). What version are you running
anyway?

version 8.3 and doubtful.

Since a number of people actually do this it would seem like a better
way to temporarily suspend autovac should be on the todo ?

Dave

Show quoted text

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#6Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Dave Cramer (#5)
Re: can't stop autovacuum by HUP'ing the server

Dave Cramer wrote:

Since a number of people actually do this it would seem like a better
way to temporarily suspend autovac should be on the todo ?

No -- what you are doing is documented to work. However, it only stops
the autovac launcher, not a currently-running worker. If this isn't
working, I'd like to know why.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#7Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Dave Cramer (#4)
Re: can't stop autovacuum by HUP'ing the server

Dave Cramer wrote:

Well, I go the extra mile and kill any remaing autovac procs

Here are the logs

2008-08-25 04:00:01 EDT [32276] LOG: autovacuum launcher shutting down
2008-08-25 04:00:01 EDT [20526] LOG: autovacuum launcher started

What did you SIGHUP, the launcher or postmaster? You need the latter.
The launcher should exit automatically at that time.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#8Dave Cramer
pg@fastcrypt.com
In reply to: Alvaro Herrera (#7)
Re: can't stop autovacuum by HUP'ing the server

On 25-Aug-08, at 10:43 AM, Alvaro Herrera wrote:

Dave Cramer wrote:

Well, I go the extra mile and kill any remaing autovac procs

Here are the logs

2008-08-25 04:00:01 EDT [32276] LOG: autovacuum launcher shutting
down
2008-08-25 04:00:01 EDT [20526] LOG: autovacuum launcher started

What did you SIGHUP, the launcher or postmaster? You need the latter.
The launcher should exit automatically at that time.

No, I am HUP'ing the postmaster then subsequently killing any
autovacuums still around, which may be the problem.

I may be killing the launcher prematurely. I'll try again tonight.

Dave

Show quoted text

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#9Dave Cramer
pg@fastcrypt.com
In reply to: Dave Cramer (#8)
Re: can't stop autovacuum by HUP'ing the server

On Mon, Aug 25, 2008 at 1:07 PM, Dave Cramer <pg@fastcrypt.com> wrote:

On 25-Aug-08, at 10:43 AM, Alvaro Herrera wrote:

Dave Cramer wrote:

Well, I go the extra mile and kill any remaing autovac procs

Here are the logs

2008-08-25 04:00:01 EDT [32276] LOG: autovacuum launcher shutting down
2008-08-25 04:00:01 EDT [20526] LOG: autovacuum launcher started

What did you SIGHUP, the launcher or postmaster? You need the latter.
The launcher should exit automatically at that time.

No, I am HUP'ing the postmaster then subsequently killing any autovacuums
still around, which may be the problem.

I may be killing the launcher prematurely. I'll try again tonight.

Ok, here are the logs from last night

2008-08-26 04:00:02 EDT [25407] LOG: received SIGHUP, reloading
configuration files
2008-08-26 04:00:02 EDT [22649] LOG: autovacuum launcher shutting down
2008-08-26 04:00:02 EDT [30438] LOG: autovacuum launcher started

you can see the SIGHUP, the launcher being shut down, and starting right
back up again ???

is this expected behaviour ?

Dave

#10Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Dave Cramer (#9)
Re: can't stop autovacuum by HUP'ing the server

Dave Cramer wrote:

Ok, here are the logs from last night

2008-08-26 04:00:02 EDT [25407] LOG: received SIGHUP, reloading
configuration files
2008-08-26 04:00:02 EDT [22649] LOG: autovacuum launcher shutting down
2008-08-26 04:00:02 EDT [30438] LOG: autovacuum launcher started

you can see the SIGHUP, the launcher being shut down, and starting right
back up again ???

is this expected behaviour ?

Certainly not, and that's not what I see here either. I assume process
25407 is (was) the postmaster, yes?

If you "show autovacuum", is it on?

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#11Dave Cramer
pg@fastcrypt.com
In reply to: Alvaro Herrera (#10)
Re: can't stop autovacuum by HUP'ing the server

On Tue, Aug 26, 2008 at 9:37 AM, Alvaro Herrera
<alvherre@commandprompt.com>wrote:

Dave Cramer wrote:

Ok, here are the logs from last night

2008-08-26 04:00:02 EDT [25407] LOG: received SIGHUP, reloading
configuration files
2008-08-26 04:00:02 EDT [22649] LOG: autovacuum launcher shutting down
2008-08-26 04:00:02 EDT [30438] LOG: autovacuum launcher started

you can see the SIGHUP, the launcher being shut down, and starting right
back up again ???

is this expected behaviour ?

Certainly not, and that's not what I see here either. I assume process
25407 is (was) the postmaster, yes?

If you "show autovacuum", is it on?

Yes that was the postmaster, and I did check to see if autovacuum was on,
and it was not.

Dave

Show quoted text

--
Alvaro Herrera
http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#12Dave Cramer
pg@fastcrypt.com
In reply to: Dave Cramer (#11)
Re: can't stop autovacuum by HUP'ing the server

On Tue, Aug 26, 2008 at 9:59 AM, Dave Cramer <pg@fastcrypt.com> wrote:

On Tue, Aug 26, 2008 at 9:37 AM, Alvaro Herrera <
alvherre@commandprompt.com> wrote:

Dave Cramer wrote:

Ok, here are the logs from last night

2008-08-26 04:00:02 EDT [25407] LOG: received SIGHUP, reloading
configuration files
2008-08-26 04:00:02 EDT [22649] LOG: autovacuum launcher shutting down
2008-08-26 04:00:02 EDT [30438] LOG: autovacuum launcher started

you can see the SIGHUP, the launcher being shut down, and starting right
back up again ???

is this expected behaviour ?

Certainly not, and that's not what I see here either. I assume process
25407 is (was) the postmaster, yes?

If you "show autovacuum", is it on?

Yes that was the postmaster, and I did check to see if autovacuum was on,
and it was not.

Dave

So where do we go from here ?

#13Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Dave Cramer (#12)
Re: can't stop autovacuum by HUP'ing the server

Dave Cramer wrote:

On Tue, Aug 26, 2008 at 9:59 AM, Dave Cramer <pg@fastcrypt.com> wrote:

On Tue, Aug 26, 2008 at 9:37 AM, Alvaro Herrera <
alvherre@commandprompt.com> wrote:

Certainly not, and that's not what I see here either. I assume process
25407 is (was) the postmaster, yes?

If you "show autovacuum", is it on?

Yes that was the postmaster, and I did check to see if autovacuum was on,
and it was not.

So where do we go from here ?

The only possible explanation for this behavior is that somebody is
signalling the postmaster due to Xid wraparound issues. This is keyed
on some GUC vars -- Perhaps you have autovacuum_freeze_max_age set to an
insane value?

varsup.c line 246
/*
* We'll start trying to force autovacuums when oldest_datfrozenxid gets
* to be more than autovacuum_freeze_max_age transactions old.
*
* Note: guc.c ensures that autovacuum_freeze_max_age is in a sane range,
* so that xidVacLimit will be well before xidWarnLimit.
*
* [...]
*/
xidVacLimit = oldest_datfrozenxid + autovacuum_freeze_max_age;

...

if (TransactionIdFollowsOrEquals(curXid, xidVacLimit) &&
IsUnderPostmaster)
SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER);

However, I think that in allowed configurations you should also receive
these warnings:

/* Give an immediate warning if past the wrap warn point */
if (TransactionIdFollowsOrEquals(curXid, xidWarnLimit))
ereport(WARNING,
(errmsg("database \"%s\" must be vacuumed within %u transactions",
NameStr(*oldest_datname),
xidWrapLimit - curXid),
errhint("To avoid a database shutdown, execute a full-database VACUUM in \"%s\".",
NameStr(*oldest_datname))));

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#14Dave Cramer
pg@fastcrypt.com
In reply to: Alvaro Herrera (#13)
Re: can't stop autovacuum by HUP'ing the server

On Tue, Aug 26, 2008 at 10:56 AM, Alvaro Herrera <alvherre@commandprompt.com

wrote:

Dave Cramer wrote:

On Tue, Aug 26, 2008 at 9:59 AM, Dave Cramer <pg@fastcrypt.com> wrote:

On Tue, Aug 26, 2008 at 9:37 AM, Alvaro Herrera <
alvherre@commandprompt.com> wrote:

Certainly not, and that's not what I see here either. I assume

process

25407 is (was) the postmaster, yes?

If you "show autovacuum", is it on?

Yes that was the postmaster, and I did check to see if autovacuum was

on,

and it was not.

So where do we go from here ?

The only possible explanation for this behavior is that somebody is
signalling the postmaster due to Xid wraparound issues. This is keyed
on some GUC vars -- Perhaps you have autovacuum_freeze_max_age set to an
insane value?

Doesn't appear to be insane ?

#autovacuum_freeze_max_age = 200000000 # maximum XID age before forced
vacuum

#15Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Dave Cramer (#14)
Re: can't stop autovacuum by HUP'ing the server

Dave Cramer wrote:

On Tue, Aug 26, 2008 at 10:56 AM, Alvaro Herrera <alvherre@commandprompt.com

The only possible explanation for this behavior is that somebody is
signalling the postmaster due to Xid wraparound issues. This is keyed
on some GUC vars -- Perhaps you have autovacuum_freeze_max_age set to an
insane value?

Doesn't appear to be insane ?

#autovacuum_freeze_max_age = 200000000 # maximum XID age before forced
vacuum

Not only sane, but also the default ;-)

What's the max age(pg_database.datfrozenxid)?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#16Dave Cramer
pg@fastcrypt.com
In reply to: Alvaro Herrera (#15)
Re: can't stop autovacuum by HUP'ing the server

On Tue, Aug 26, 2008 at 11:41 AM, Alvaro Herrera <alvherre@commandprompt.com

wrote:

Dave Cramer wrote:

On Tue, Aug 26, 2008 at 10:56 AM, Alvaro Herrera <

alvherre@commandprompt.com

The only possible explanation for this behavior is that somebody is
signalling the postmaster due to Xid wraparound issues. This is keyed
on some GUC vars -- Perhaps you have autovacuum_freeze_max_age set to

an

insane value?

Doesn't appear to be insane ?

#autovacuum_freeze_max_age = 200000000 # maximum XID age before forced
vacuum

Not only sane, but also the default ;-)

What's the max age(pg_database.datfrozenxid)?

select datfrozenxid from pg_database ;
datfrozenxid
--------------
201850617
101850961
86039359
21522712

Show quoted text

--
Alvaro Herrera
http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#17Dave Cramer
pg@fastcrypt.com
In reply to: Dave Cramer (#16)
Re: can't stop autovacuum by HUP'ing the server

On Tue, Aug 26, 2008 at 11:51 AM, Dave Cramer <pg@fastcrypt.com> wrote:

On Tue, Aug 26, 2008 at 11:41 AM, Alvaro Herrera <
alvherre@commandprompt.com> wrote:

Dave Cramer wrote:

On Tue, Aug 26, 2008 at 10:56 AM, Alvaro Herrera <

alvherre@commandprompt.com

The only possible explanation for this behavior is that somebody is
signalling the postmaster due to Xid wraparound issues. This is keyed
on some GUC vars -- Perhaps you have autovacuum_freeze_max_age set to

an

insane value?

Doesn't appear to be insane ?

#autovacuum_freeze_max_age = 200000000 # maximum XID age before forced
vacuum

Not only sane, but also the default ;-)

What's the max age(pg_database.datfrozenxid)?

select datfrozenxid from pg_database ;
datfrozenxid
--------------
201850617
101850961
86039359
21522712

this code in autovacuum.c looks like it might be interesting

if (AutoVacuumShmem->av_signal[AutoVacForkFailed])
{
/*
* If the postmaster failed to start a new
worker, we sleep
* for a little while and resend the
signal. The new worker's
* state is still in memory, so this is
sufficient. After
* that, we restart the main loop.
*
* XXX should we put a limit to the number
of times we retry?
* I don't think it makes much sense,
because a future start
* of a worker will continue to fail in the
same way.
*/

AutoVacuumShmem->av_signal[AutoVacForkFailed] = false;
pg_usleep(100000L); /* 100ms */

SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER);
continue;

Do these signals get cleaned up on a reload ?

Dave

#18Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Dave Cramer (#16)
Re: can't stop autovacuum by HUP'ing the server

Dave Cramer wrote:

On Tue, Aug 26, 2008 at 11:41 AM, Alvaro Herrera <alvherre@commandprompt.com

What's the max age(pg_database.datfrozenxid)?

select datfrozenxid from pg_database ;
datfrozenxid
--------------
201850617
101850961
86039359
21522712

Well, the first one is over the limit, isn't it?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#19Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Dave Cramer (#17)
Re: can't stop autovacuum by HUP'ing the server

Dave Cramer wrote:

AutoVacuumShmem->av_signal[AutoVacForkFailed] = false;
pg_usleep(100000L); /* 100ms */

SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER);
continue;

Do these signals get cleaned up on a reload ?

Well, not on a reload specifically, but this signal is of prompt
response (i.e. the postmaster acts immediately on it). See
CheckPostmasterSignal. Also, note that this code starts a worker, not
the launcher which is what you're seeing.

The signal you're looking for is PMSIGNAL_START_AUTOVAC_LAUNCHER (see
the varsup.c code in the vicinity of what I posted earlier). The
postmaster response is to set start_autovac_launcher (see
sigusr1_handler in postmaster.c) and when this is seen set, the launcher
is started (see ServerLoop in postmaster.c). However the bit you're
interested in is *why* the signal is being sent, which is what the
freeze limits determine.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#20Dave Cramer
pg@fastcrypt.com
In reply to: Alvaro Herrera (#19)
Re: can't stop autovacuum by HUP'ing the server

On Tue, Aug 26, 2008 at 12:10 PM, Alvaro Herrera <alvherre@commandprompt.com

wrote:

Dave Cramer wrote:

AutoVacuumShmem->av_signal[AutoVacForkFailed] = false;
pg_usleep(100000L); /* 100ms

*/

SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER);
continue;

Do these signals get cleaned up on a reload ?

Well, not on a reload specifically, but this signal is of prompt
response (i.e. the postmaster acts immediately on it). See
CheckPostmasterSignal. Also, note that this code starts a worker, not
the launcher which is what you're seeing.

Ok, back to why

turns out template0 is the culprit, why is autovac not vacuuming this ?

Dave

#21Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Dave Cramer (#20)
#22Dave Cramer
pg@fastcrypt.com
In reply to: Alvaro Herrera (#21)
#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#21)
#24Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Dave Cramer (#22)
#25Dave Cramer
pg@fastcrypt.com
In reply to: Alvaro Herrera (#24)
#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#24)
#27Joshua D. Drake
jd@commandprompt.com
In reply to: Dave Cramer (#25)
#28Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Dave Cramer (#25)
#29Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#26)
#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#28)
#31Dave Cramer
pg@fastcrypt.com
In reply to: Tom Lane (#30)
#32Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Dave Cramer (#31)
#33Dave Cramer
pg@fastcrypt.com
In reply to: Alvaro Herrera (#32)
#34Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Dave Cramer (#33)
#35Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#29)