can't stop autovacuum by HUP'ing the server

Started by Dave Cramerover 17 years ago35 messages
#1Dave Cramer
pg@fastcrypt.com

I'd like to stop autovac by changing the conf file then sending the
server a HUP

This "appears" to work, the logs show autovac terminated by
administrative command. Then a few minutes later I see a vacuum
process spawned.

Is it possible that there are timers that aren't being properly
terminated here ?

Dave

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dave Cramer (#1)
Re: can't stop autovacuum by HUP'ing the server

Dave Cramer <pg@fastcrypt.com> writes:

I'd like to stop autovac by changing the conf file then sending the
server a HUP

Uh ... why should that stop an autovac already in progress? I'd
only expect it to affect future launches.

regards, tom lane

#3Alvaro Herrera
alvherre@commandprompt.com
In reply to: Dave Cramer (#1)
Re: can't stop autovacuum by HUP'ing the server

Hi Dave,

Dave Cramer wrote:

I'd like to stop autovac by changing the conf file then sending the
server a HUP

This "appears" to work, the logs show autovac terminated by
administrative command. Then a few minutes later I see a vacuum process
spawned.

Is it possible that there are timers that aren't being properly
terminated here ?

I wonder whether there are tables that need an emergency vacuum (i.e.
they have reached the freeze horizon). What version are you running
anyway?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#4Dave Cramer
pg@fastcrypt.com
In reply to: Tom Lane (#2)
Re: can't stop autovacuum by HUP'ing the server

On 24-Aug-08, at 10:12 PM, Tom Lane wrote:

Dave Cramer <pg@fastcrypt.com> writes:

I'd like to stop autovac by changing the conf file then sending the
server a HUP

Uh ... why should that stop an autovac already in progress? I'd
only expect it to affect future launches.

Well, I go the extra mile and kill any remaing autovac procs

Here are the logs

2008-08-25 04:00:01 EDT [32276] LOG: autovacuum launcher shutting down
2008-08-25 04:00:01 EDT [20526] LOG: autovacuum launcher started
2008-08-25 04:00:01 EDT [21404] FATAL: terminating autovacuum
process due to administrator command
2008-08-25 04:00:01 EDT [23418] FATAL: terminating autovacuum
process due to administrator command
2008-08-25 04:00:01 EDT [21703] FATAL: terminating autovacuum
process due to administrator command

So now that I've read this why would the launcher start again ?

Dave

Show quoted text

regards, tom lane

#5Dave Cramer
pg@fastcrypt.com
In reply to: Alvaro Herrera (#3)
Re: can't stop autovacuum by HUP'ing the server

On 24-Aug-08, at 10:17 PM, Alvaro Herrera wrote:

Hi Dave,

Dave Cramer wrote:

I'd like to stop autovac by changing the conf file then sending the
server a HUP

This "appears" to work, the logs show autovac terminated by
administrative command. Then a few minutes later I see a vacuum
process
spawned.

Is it possible that there are timers that aren't being properly
terminated here ?

I wonder whether there are tables that need an emergency vacuum (i.e.
they have reached the freeze horizon). What version are you running
anyway?

version 8.3 and doubtful.

Since a number of people actually do this it would seem like a better
way to temporarily suspend autovac should be on the todo ?

Dave

Show quoted text

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#6Alvaro Herrera
alvherre@commandprompt.com
In reply to: Dave Cramer (#5)
Re: can't stop autovacuum by HUP'ing the server

Dave Cramer wrote:

Since a number of people actually do this it would seem like a better
way to temporarily suspend autovac should be on the todo ?

No -- what you are doing is documented to work. However, it only stops
the autovac launcher, not a currently-running worker. If this isn't
working, I'd like to know why.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#7Alvaro Herrera
alvherre@commandprompt.com
In reply to: Dave Cramer (#4)
Re: can't stop autovacuum by HUP'ing the server

Dave Cramer wrote:

Well, I go the extra mile and kill any remaing autovac procs

Here are the logs

2008-08-25 04:00:01 EDT [32276] LOG: autovacuum launcher shutting down
2008-08-25 04:00:01 EDT [20526] LOG: autovacuum launcher started

What did you SIGHUP, the launcher or postmaster? You need the latter.
The launcher should exit automatically at that time.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#8Dave Cramer
pg@fastcrypt.com
In reply to: Alvaro Herrera (#7)
Re: can't stop autovacuum by HUP'ing the server

On 25-Aug-08, at 10:43 AM, Alvaro Herrera wrote:

Dave Cramer wrote:

Well, I go the extra mile and kill any remaing autovac procs

Here are the logs

2008-08-25 04:00:01 EDT [32276] LOG: autovacuum launcher shutting
down
2008-08-25 04:00:01 EDT [20526] LOG: autovacuum launcher started

What did you SIGHUP, the launcher or postmaster? You need the latter.
The launcher should exit automatically at that time.

No, I am HUP'ing the postmaster then subsequently killing any
autovacuums still around, which may be the problem.

I may be killing the launcher prematurely. I'll try again tonight.

Dave

Show quoted text

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#9Dave Cramer
pg@fastcrypt.com
In reply to: Dave Cramer (#8)
Re: can't stop autovacuum by HUP'ing the server

On Mon, Aug 25, 2008 at 1:07 PM, Dave Cramer <pg@fastcrypt.com> wrote:

On 25-Aug-08, at 10:43 AM, Alvaro Herrera wrote:

Dave Cramer wrote:

Well, I go the extra mile and kill any remaing autovac procs

Here are the logs

2008-08-25 04:00:01 EDT [32276] LOG: autovacuum launcher shutting down
2008-08-25 04:00:01 EDT [20526] LOG: autovacuum launcher started

What did you SIGHUP, the launcher or postmaster? You need the latter.
The launcher should exit automatically at that time.

No, I am HUP'ing the postmaster then subsequently killing any autovacuums
still around, which may be the problem.

I may be killing the launcher prematurely. I'll try again tonight.

Ok, here are the logs from last night

2008-08-26 04:00:02 EDT [25407] LOG: received SIGHUP, reloading
configuration files
2008-08-26 04:00:02 EDT [22649] LOG: autovacuum launcher shutting down
2008-08-26 04:00:02 EDT [30438] LOG: autovacuum launcher started

you can see the SIGHUP, the launcher being shut down, and starting right
back up again ???

is this expected behaviour ?

Dave

#10Alvaro Herrera
alvherre@commandprompt.com
In reply to: Dave Cramer (#9)
Re: can't stop autovacuum by HUP'ing the server

Dave Cramer wrote:

Ok, here are the logs from last night

2008-08-26 04:00:02 EDT [25407] LOG: received SIGHUP, reloading
configuration files
2008-08-26 04:00:02 EDT [22649] LOG: autovacuum launcher shutting down
2008-08-26 04:00:02 EDT [30438] LOG: autovacuum launcher started

you can see the SIGHUP, the launcher being shut down, and starting right
back up again ???

is this expected behaviour ?

Certainly not, and that's not what I see here either. I assume process
25407 is (was) the postmaster, yes?

If you "show autovacuum", is it on?

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#11Dave Cramer
pg@fastcrypt.com
In reply to: Alvaro Herrera (#10)
Re: can't stop autovacuum by HUP'ing the server

On Tue, Aug 26, 2008 at 9:37 AM, Alvaro Herrera
<alvherre@commandprompt.com>wrote:

Dave Cramer wrote:

Ok, here are the logs from last night

2008-08-26 04:00:02 EDT [25407] LOG: received SIGHUP, reloading
configuration files
2008-08-26 04:00:02 EDT [22649] LOG: autovacuum launcher shutting down
2008-08-26 04:00:02 EDT [30438] LOG: autovacuum launcher started

you can see the SIGHUP, the launcher being shut down, and starting right
back up again ???

is this expected behaviour ?

Certainly not, and that's not what I see here either. I assume process
25407 is (was) the postmaster, yes?

If you "show autovacuum", is it on?

Yes that was the postmaster, and I did check to see if autovacuum was on,
and it was not.

Dave

Show quoted text

--
Alvaro Herrera
http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#12Dave Cramer
pg@fastcrypt.com
In reply to: Dave Cramer (#11)
Re: can't stop autovacuum by HUP'ing the server

On Tue, Aug 26, 2008 at 9:59 AM, Dave Cramer <pg@fastcrypt.com> wrote:

On Tue, Aug 26, 2008 at 9:37 AM, Alvaro Herrera <
alvherre@commandprompt.com> wrote:

Dave Cramer wrote:

Ok, here are the logs from last night

2008-08-26 04:00:02 EDT [25407] LOG: received SIGHUP, reloading
configuration files
2008-08-26 04:00:02 EDT [22649] LOG: autovacuum launcher shutting down
2008-08-26 04:00:02 EDT [30438] LOG: autovacuum launcher started

you can see the SIGHUP, the launcher being shut down, and starting right
back up again ???

is this expected behaviour ?

Certainly not, and that's not what I see here either. I assume process
25407 is (was) the postmaster, yes?

If you "show autovacuum", is it on?

Yes that was the postmaster, and I did check to see if autovacuum was on,
and it was not.

Dave

So where do we go from here ?

#13Alvaro Herrera
alvherre@commandprompt.com
In reply to: Dave Cramer (#12)
Re: can't stop autovacuum by HUP'ing the server

Dave Cramer wrote:

On Tue, Aug 26, 2008 at 9:59 AM, Dave Cramer <pg@fastcrypt.com> wrote:

On Tue, Aug 26, 2008 at 9:37 AM, Alvaro Herrera <
alvherre@commandprompt.com> wrote:

Certainly not, and that's not what I see here either. I assume process
25407 is (was) the postmaster, yes?

If you "show autovacuum", is it on?

Yes that was the postmaster, and I did check to see if autovacuum was on,
and it was not.

So where do we go from here ?

The only possible explanation for this behavior is that somebody is
signalling the postmaster due to Xid wraparound issues. This is keyed
on some GUC vars -- Perhaps you have autovacuum_freeze_max_age set to an
insane value?

varsup.c line 246
/*
* We'll start trying to force autovacuums when oldest_datfrozenxid gets
* to be more than autovacuum_freeze_max_age transactions old.
*
* Note: guc.c ensures that autovacuum_freeze_max_age is in a sane range,
* so that xidVacLimit will be well before xidWarnLimit.
*
* [...]
*/
xidVacLimit = oldest_datfrozenxid + autovacuum_freeze_max_age;

...

if (TransactionIdFollowsOrEquals(curXid, xidVacLimit) &&
IsUnderPostmaster)
SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER);

However, I think that in allowed configurations you should also receive
these warnings:

/* Give an immediate warning if past the wrap warn point */
if (TransactionIdFollowsOrEquals(curXid, xidWarnLimit))
ereport(WARNING,
(errmsg("database \"%s\" must be vacuumed within %u transactions",
NameStr(*oldest_datname),
xidWrapLimit - curXid),
errhint("To avoid a database shutdown, execute a full-database VACUUM in \"%s\".",
NameStr(*oldest_datname))));

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#14Dave Cramer
pg@fastcrypt.com
In reply to: Alvaro Herrera (#13)
Re: can't stop autovacuum by HUP'ing the server

On Tue, Aug 26, 2008 at 10:56 AM, Alvaro Herrera <alvherre@commandprompt.com

wrote:

Dave Cramer wrote:

On Tue, Aug 26, 2008 at 9:59 AM, Dave Cramer <pg@fastcrypt.com> wrote:

On Tue, Aug 26, 2008 at 9:37 AM, Alvaro Herrera <
alvherre@commandprompt.com> wrote:

Certainly not, and that's not what I see here either. I assume

process

25407 is (was) the postmaster, yes?

If you "show autovacuum", is it on?

Yes that was the postmaster, and I did check to see if autovacuum was

on,

and it was not.

So where do we go from here ?

The only possible explanation for this behavior is that somebody is
signalling the postmaster due to Xid wraparound issues. This is keyed
on some GUC vars -- Perhaps you have autovacuum_freeze_max_age set to an
insane value?

Doesn't appear to be insane ?

#autovacuum_freeze_max_age = 200000000 # maximum XID age before forced
vacuum

#15Alvaro Herrera
alvherre@commandprompt.com
In reply to: Dave Cramer (#14)
Re: can't stop autovacuum by HUP'ing the server

Dave Cramer wrote:

On Tue, Aug 26, 2008 at 10:56 AM, Alvaro Herrera <alvherre@commandprompt.com

The only possible explanation for this behavior is that somebody is
signalling the postmaster due to Xid wraparound issues. This is keyed
on some GUC vars -- Perhaps you have autovacuum_freeze_max_age set to an
insane value?

Doesn't appear to be insane ?

#autovacuum_freeze_max_age = 200000000 # maximum XID age before forced
vacuum

Not only sane, but also the default ;-)

What's the max age(pg_database.datfrozenxid)?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#16Dave Cramer
pg@fastcrypt.com
In reply to: Alvaro Herrera (#15)
Re: can't stop autovacuum by HUP'ing the server

On Tue, Aug 26, 2008 at 11:41 AM, Alvaro Herrera <alvherre@commandprompt.com

wrote:

Dave Cramer wrote:

On Tue, Aug 26, 2008 at 10:56 AM, Alvaro Herrera <

alvherre@commandprompt.com

The only possible explanation for this behavior is that somebody is
signalling the postmaster due to Xid wraparound issues. This is keyed
on some GUC vars -- Perhaps you have autovacuum_freeze_max_age set to

an

insane value?

Doesn't appear to be insane ?

#autovacuum_freeze_max_age = 200000000 # maximum XID age before forced
vacuum

Not only sane, but also the default ;-)

What's the max age(pg_database.datfrozenxid)?

select datfrozenxid from pg_database ;
datfrozenxid
--------------
201850617
101850961
86039359
21522712

Show quoted text

--
Alvaro Herrera
http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#17Dave Cramer
pg@fastcrypt.com
In reply to: Dave Cramer (#16)
Re: can't stop autovacuum by HUP'ing the server

On Tue, Aug 26, 2008 at 11:51 AM, Dave Cramer <pg@fastcrypt.com> wrote:

On Tue, Aug 26, 2008 at 11:41 AM, Alvaro Herrera <
alvherre@commandprompt.com> wrote:

Dave Cramer wrote:

On Tue, Aug 26, 2008 at 10:56 AM, Alvaro Herrera <

alvherre@commandprompt.com

The only possible explanation for this behavior is that somebody is
signalling the postmaster due to Xid wraparound issues. This is keyed
on some GUC vars -- Perhaps you have autovacuum_freeze_max_age set to

an

insane value?

Doesn't appear to be insane ?

#autovacuum_freeze_max_age = 200000000 # maximum XID age before forced
vacuum

Not only sane, but also the default ;-)

What's the max age(pg_database.datfrozenxid)?

select datfrozenxid from pg_database ;
datfrozenxid
--------------
201850617
101850961
86039359
21522712

this code in autovacuum.c looks like it might be interesting

if (AutoVacuumShmem->av_signal[AutoVacForkFailed])
{
/*
* If the postmaster failed to start a new
worker, we sleep
* for a little while and resend the
signal. The new worker's
* state is still in memory, so this is
sufficient. After
* that, we restart the main loop.
*
* XXX should we put a limit to the number
of times we retry?
* I don't think it makes much sense,
because a future start
* of a worker will continue to fail in the
same way.
*/

AutoVacuumShmem->av_signal[AutoVacForkFailed] = false;
pg_usleep(100000L); /* 100ms */

SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER);
continue;

Do these signals get cleaned up on a reload ?

Dave

#18Alvaro Herrera
alvherre@commandprompt.com
In reply to: Dave Cramer (#16)
Re: can't stop autovacuum by HUP'ing the server

Dave Cramer wrote:

On Tue, Aug 26, 2008 at 11:41 AM, Alvaro Herrera <alvherre@commandprompt.com

What's the max age(pg_database.datfrozenxid)?

select datfrozenxid from pg_database ;
datfrozenxid
--------------
201850617
101850961
86039359
21522712

Well, the first one is over the limit, isn't it?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#19Alvaro Herrera
alvherre@commandprompt.com
In reply to: Dave Cramer (#17)
Re: can't stop autovacuum by HUP'ing the server

Dave Cramer wrote:

AutoVacuumShmem->av_signal[AutoVacForkFailed] = false;
pg_usleep(100000L); /* 100ms */

SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER);
continue;

Do these signals get cleaned up on a reload ?

Well, not on a reload specifically, but this signal is of prompt
response (i.e. the postmaster acts immediately on it). See
CheckPostmasterSignal. Also, note that this code starts a worker, not
the launcher which is what you're seeing.

The signal you're looking for is PMSIGNAL_START_AUTOVAC_LAUNCHER (see
the varsup.c code in the vicinity of what I posted earlier). The
postmaster response is to set start_autovac_launcher (see
sigusr1_handler in postmaster.c) and when this is seen set, the launcher
is started (see ServerLoop in postmaster.c). However the bit you're
interested in is *why* the signal is being sent, which is what the
freeze limits determine.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#20Dave Cramer
pg@fastcrypt.com
In reply to: Alvaro Herrera (#19)
Re: can't stop autovacuum by HUP'ing the server

On Tue, Aug 26, 2008 at 12:10 PM, Alvaro Herrera <alvherre@commandprompt.com

wrote:

Dave Cramer wrote:

AutoVacuumShmem->av_signal[AutoVacForkFailed] = false;
pg_usleep(100000L); /* 100ms

*/

SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER);
continue;

Do these signals get cleaned up on a reload ?

Well, not on a reload specifically, but this signal is of prompt
response (i.e. the postmaster acts immediately on it). See
CheckPostmasterSignal. Also, note that this code starts a worker, not
the launcher which is what you're seeing.

Ok, back to why

turns out template0 is the culprit, why is autovac not vacuuming this ?

Dave

#21Alvaro Herrera
alvherre@commandprompt.com
In reply to: Dave Cramer (#20)
Re: can't stop autovacuum by HUP'ing the server

Dave Cramer wrote:

Ok, back to why

turns out template0 is the culprit, why is autovac not vacuuming this ?

Hmm ... template0 is not supposed to need vacuuming, because it is
frozen ... is it marked with datallowconn=false?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#22Dave Cramer
pg@fastcrypt.com
In reply to: Alvaro Herrera (#21)
Re: can't stop autovacuum by HUP'ing the server

On Tue, Aug 26, 2008 at 12:21 PM, Alvaro Herrera <alvherre@commandprompt.com

wrote:

Dave Cramer wrote:

Ok, back to why

turns out template0 is the culprit, why is autovac not vacuuming this ?

Hmm ... template0 is not supposed to need vacuuming, because it is
frozen ... is it marked with datallowconn=false?

Yes

select * from pg_database where datname='template0';
datname | datdba | encoding | datistemplate | datallowconn | datconnlimit
| datlastsysoid | datfrozenxid | dattablespace | datconfig |
datacl
-----------+--------+----------+---------------+--------------+--------------+---------------+--------------+---------------+-----------+-------------------------------------
template0 | 10 | 6 | t | f | -1
| 11510 | 201850617 | 1663 | |
{=c/postgres,postgres=CTc/postgres}

So how to fix ?

Show quoted text

--
Alvaro Herrera
http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#21)
Re: can't stop autovacuum by HUP'ing the server

Alvaro Herrera <alvherre@commandprompt.com> writes:

Dave Cramer wrote:

turns out template0 is the culprit, why is autovac not vacuuming this ?

Hmm ... template0 is not supposed to need vacuuming, because it is
frozen ... is it marked with datallowconn=false?

8.3's autovac doesn't care about that, does it?

Seems like the next step is to enable logging of autovac's decision-making.

regards, tom lane

#24Alvaro Herrera
alvherre@commandprompt.com
In reply to: Dave Cramer (#22)
Re: can't stop autovacuum by HUP'ing the server

Dave Cramer wrote:

Yes

select * from pg_database where datname='template0';
datname | datdba | encoding | datistemplate | datallowconn | datconnlimit
| datlastsysoid | datfrozenxid | dattablespace | datconfig |
datacl
-----------+--------+----------+---------------+--------------+--------------+---------------+--------------+---------------+-----------+-------------------------------------
template0 | 10 | 6 | t | f | -1
| 11510 | 201850617 | 1663 | |
{=c/postgres,postgres=CTc/postgres}

So how to fix ?

I think I see the problem -- vac_truncate_clog is not ignoring these
databases when passing the new frozen value to SetTransactionIdLimit.

/*
* Scan pg_database to compute the minimum datfrozenxid
*
* Note: we need not worry about a race condition with new entries being
* inserted by CREATE DATABASE. Any such entry will have a copy of some
* existing DB's datfrozenxid, and that source DB cannot be ours because
* of the interlock against copying a DB containing an active backend.
* Hence the new entry will not reduce the minimum. Also, if two VACUUMs
* concurrently modify the datfrozenxid's of different databases, the
* worst possible outcome is that pg_clog is not truncated as aggressively
* as it could be.
*/
relation = heap_open(DatabaseRelationId, AccessShareLock);

scan = heap_beginscan(relation, SnapshotNow, 0, NULL);

while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
Form_pg_database dbform = (Form_pg_database) GETSTRUCT(tuple);

Assert(TransactionIdIsNormal(dbform->datfrozenxid));

if (TransactionIdPrecedes(myXID, dbform->datfrozenxid))
frozenAlreadyWrapped = true;
else if (TransactionIdPrecedes(dbform->datfrozenxid, frozenXID))
{
frozenXID = dbform->datfrozenxid;
namecpy(&oldest_datname, &dbform->datname);
}
}

...

/*
* Update the wrap limit for GetNewTransactionId. Note: this function
* will also signal the postmaster for an(other) autovac cycle if needed.
*/
SetTransactionIdLimit(frozenXID, &oldest_datname);

If it doesn't ignore them, then it should be properly vacuuming
template0 as any other database. We've changed autovac's behavior on
this area back and forth so I may be misremembering what's our rationale
du jour.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#25Dave Cramer
pg@fastcrypt.com
In reply to: Alvaro Herrera (#24)
Re: can't stop autovacuum by HUP'ing the server

On Tue, Aug 26, 2008 at 12:50 PM, Alvaro Herrera <alvherre@commandprompt.com

wrote:

Dave Cramer wrote:

Yes

select * from pg_database where datname='template0';
datname | datdba | encoding | datistemplate | datallowconn |

datconnlimit

| datlastsysoid | datfrozenxid | dattablespace | datconfig |
datacl

-----------+--------+----------+---------------+--------------+--------------+---------------+--------------+---------------+-----------+-------------------------------------

template0 | 10 | 6 | t | f |

-1

| 11510 | 201850617 | 1663 | |
{=c/postgres,postgres=CTc/postgres}

So how to fix ?

I think I see the problem -- vac_truncate_clog is not ignoring these
databases when passing the new frozen value to SetTransactionIdLimit.

/*
* Scan pg_database to compute the minimum datfrozenxid
*
* Note: we need not worry about a race condition with new entries being
* inserted by CREATE DATABASE. Any such entry will have a copy of some
* existing DB's datfrozenxid, and that source DB cannot be ours because
* of the interlock against copying a DB containing an active backend.
* Hence the new entry will not reduce the minimum. Also, if two
VACUUMs
* concurrently modify the datfrozenxid's of different databases, the
* worst possible outcome is that pg_clog is not truncated as
aggressively
* as it could be.
*/
relation = heap_open(DatabaseRelationId, AccessShareLock);

scan = heap_beginscan(relation, SnapshotNow, 0, NULL);

while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
Form_pg_database dbform = (Form_pg_database) GETSTRUCT(tuple);

Assert(TransactionIdIsNormal(dbform->datfrozenxid));

if (TransactionIdPrecedes(myXID, dbform->datfrozenxid))
frozenAlreadyWrapped = true;
else if (TransactionIdPrecedes(dbform->datfrozenxid, frozenXID))
{
frozenXID = dbform->datfrozenxid;
namecpy(&oldest_datname, &dbform->datname);
}
}

...

/*
* Update the wrap limit for GetNewTransactionId. Note: this function
* will also signal the postmaster for an(other) autovac cycle if
needed.
*/
SetTransactionIdLimit(frozenXID, &oldest_datname);

If it doesn't ignore them, then it should be properly vacuuming
template0 as any other database. We've changed autovac's behavior on
this area back and forth so I may be misremembering what's our rationale
du jour.

Well, I'm willing to help debug this, however this is a busy production
database and I need to be able to turn it off for a few hours a day. Would
changing autovacuum_freeze_max_age be a solution ?

Dave

Show quoted text

--
Alvaro Herrera
http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#24)
Re: can't stop autovacuum by HUP'ing the server

Alvaro Herrera <alvherre@commandprompt.com> writes:

If it doesn't ignore them, then it should be properly vacuuming
template0 as any other database. We've changed autovac's behavior on
this area back and forth so I may be misremembering what's our rationale
du jour.

AFAICS, the only way in which current autovac treats !datallowconn
databases specially is this test in do_autovacuum:

if (dbForm->datistemplate || !dbForm->datallowconn)
default_freeze_min_age = 0;
else
default_freeze_min_age = vacuum_freeze_min_age;

Perhaps there's something wrong with the idea of setting freeze_min_age
to zero?

regards, tom lane

#27Joshua Drake
jd@commandprompt.com
In reply to: Dave Cramer (#25)
Re: can't stop autovacuum by HUP'ing the server

On Tue, 26 Aug 2008 12:58:59 -0400
"Dave Cramer" <pg@fastcrypt.com> wrote:

Well, I'm willing to help debug this, however this is a busy
production database and I need to be able to turn it off for a few
hours a day. Would changing autovacuum_freeze_max_age be a solution ?

Populate the table pg_autovacuum with all your relations and the
defaults from the postgresql.conf. Then set enabled to FALSE on all the
tuples. When you are ready to turn autovacuum back on, set it to TRUE.

Joshua D. Drake

--
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate

#28Alvaro Herrera
alvherre@commandprompt.com
In reply to: Dave Cramer (#25)
Re: can't stop autovacuum by HUP'ing the server

Dave Cramer wrote:

Well, I'm willing to help debug this, however this is a busy production
database and I need to be able to turn it off for a few hours a day. Would
changing autovacuum_freeze_max_age be a solution ?

Yes.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#29Alvaro Herrera
alvherre@commandprompt.com
In reply to: Tom Lane (#26)
Re: can't stop autovacuum by HUP'ing the server

Tom Lane wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

If it doesn't ignore them, then it should be properly vacuuming
template0 as any other database. We've changed autovac's behavior on
this area back and forth so I may be misremembering what's our rationale
du jour.

AFAICS, the only way in which current autovac treats !datallowconn
databases specially is this test in do_autovacuum:

if (dbForm->datistemplate || !dbForm->datallowconn)
default_freeze_min_age = 0;
else
default_freeze_min_age = vacuum_freeze_min_age;

Perhaps there's something wrong with the idea of setting freeze_min_age
to zero?

Nope, AFAICS it's harmless; what it means is that on those databases,
all tuples will be frozen immediately.

I'll try to reproduce the problem here.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#28)
Re: can't stop autovacuum by HUP'ing the server

Alvaro Herrera <alvherre@commandprompt.com> writes:

Dave Cramer wrote:

Well, I'm willing to help debug this, however this is a busy production
database and I need to be able to turn it off for a few hours a day. Would
changing autovacuum_freeze_max_age be a solution ?

Yes.

Could we first see a cycle of autovac log output with
log_autovacuum_min_duration = 0?
Otherwise we're not going to get closer to understanding why it's
not cleaning up template0 for you.

regards, tom lane

#31Dave Cramer
pg@fastcrypt.com
In reply to: Tom Lane (#30)
Re: can't stop autovacuum by HUP'ing the server

On Tue, Aug 26, 2008 at 1:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

Dave Cramer wrote:

Well, I'm willing to help debug this, however this is a busy production
database and I need to be able to turn it off for a few hours a day.

Would

changing autovacuum_freeze_max_age be a solution ?

Yes.

Could we first see a cycle of autovac log output with
log_autovacuum_min_duration = 0?
Otherwise we're not going to get closer to understanding why it's
not cleaning up template0 for you.

I actually set that last time I restarted. There's nothing particularly
interesting there. Is there another log GUC that needs to be tweaked to get
more output ?

Dave

#32Alvaro Herrera
alvherre@commandprompt.com
In reply to: Dave Cramer (#31)
Re: can't stop autovacuum by HUP'ing the server

Dave Cramer wrote:

On Tue, Aug 26, 2008 at 1:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Could we first see a cycle of autovac log output with
log_autovacuum_min_duration = 0?
Otherwise we're not going to get closer to understanding why it's
not cleaning up template0 for you.

I actually set that last time I restarted. There's nothing particularly
interesting there. Is there another log GUC that needs to be tweaked to get
more output ?

My guess is that autovacuum is skipping the database for some reason, so
there's no log entry at all.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#33Dave Cramer
pg@fastcrypt.com
In reply to: Alvaro Herrera (#32)
Re: can't stop autovacuum by HUP'ing the server

On Tue, Aug 26, 2008 at 1:45 PM, Alvaro Herrera
<alvherre@commandprompt.com>wrote:

Dave Cramer wrote:

On Tue, Aug 26, 2008 at 1:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Could we first see a cycle of autovac log output with
log_autovacuum_min_duration = 0?
Otherwise we're not going to get closer to understanding why it's
not cleaning up template0 for you.

I actually set that last time I restarted. There's nothing particularly
interesting there. Is there another log GUC that needs to be tweaked to

get

more output ?

My guess is that autovacuum is skipping the database for some reason, so
there's no log entry at all.

Seems like a viable explanation, but doesn't advance us any further ?

Dave

#34Alvaro Herrera
alvherre@commandprompt.com
In reply to: Dave Cramer (#33)
Re: can't stop autovacuum by HUP'ing the server

Dave Cramer wrote:

On Tue, Aug 26, 2008 at 1:45 PM, Alvaro Herrera
<alvherre@commandprompt.com>wrote:

My guess is that autovacuum is skipping the database for some reason, so
there's no log entry at all.

Seems like a viable explanation, but doesn't advance us any further ?

Nope, it doesn't -- we need to understand what's the reason. I'm
checking the code.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#35Alvaro Herrera
alvherre@commandprompt.com
In reply to: Alvaro Herrera (#29)
Re: can't stop autovacuum by HUP'ing the server

Alvaro Herrera wrote:

Nope, AFAICS it's harmless; what it means is that on those databases,
all tuples will be frozen immediately.

I'll try to reproduce the problem here.

No luck :-( It works as expected for me.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.