Autovacuum launcher doesn't notice death of postmaster immediately

Started by Peter Eisentraut · almost 19 years ago · 25 messages · hackers
#1Peter Eisentraut
peter_e@gmx.net

I notice that in 8.3, when I kill the postmaster process with SIGKILL or
SIGSEGV, the child processes writer and stats collector go away
immediately, but the autovacuum launcher hangs around for up to a
minute. (I suppose this has to do with the periodic wakeups?) When
you try to restart the postmaster before that, it fails with a complaint
that someone is still attached to the shared memory segment.

These are obviously not normal modes of operation, but I fear that this
could cause some problems with people's control scripts of the
sort, "it crashed, let's try to restart it".

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

#2Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Peter Eisentraut (#1)
Re: Autovacuum launcher doesn't notice death of postmaster immediately

Peter Eisentraut wrote:

I notice that in 8.3, when I kill the postmaster process with SIGKILL or
SIGSEGV, the child processes writer and stats collector go away
immediately, but the autovacuum launcher hangs around for up to a
minute. (I suppose this has to do with the periodic wakeups?) When
you try to restart the postmaster before that, it fails with a complaint
that someone is still attached to the shared memory segment.

These are obviously not normal modes of operation, but I fear that this
could cause some problems with people's control scripts of the
sort, "it crashed, let's try to restart it".

The launcher is set up to wake up in autovacuum_naptime seconds at most.
So if the user configures a ridiculous time (for example 86400 seconds,
which I've seen) then the launcher would not detect the postmaster death
for a very long time, which is probably bad. (You measured a one minute
delay because that's the default naptime).

Maybe this is not such a hot idea, and we should wake the launcher up
every 10 seconds (or less?). I picked 10 seconds because that's the
time the bgwriter sleeps if there is no activity configured. Does this
sound acceptable? The only problem with waking it up too frequently is
that it would be waking the system up (for gettimeofday()) even if
nothing is happening.

I also just noticed that the launcher will check if postmaster is alive,
then sleep, and then possibly do some work. So if the postmaster died
in the sleep period, the launcher might try to do some work. Should we
add a check for postmaster liveness after the sleep?
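
The ordering question above (check, sleep, then work) can be illustrated with a minimal sketch. It is not the launcher's actual code: `launcher_loop` and its callable parameters are hypothetical names, injected so the behaviour can be exercised without real processes.

```python
def launcher_loop(is_postmaster_alive, sleep, do_work, nap_seconds, max_iterations):
    """Run up to max_iterations cycles and return why the loop ended.

    All parameters are hypothetical stand-ins for illustration only.
    """
    for _ in range(max_iterations):
        if not is_postmaster_alive():
            return "postmaster died before sleep"
        sleep(nap_seconds)
        # Re-check here: the postmaster may have died while we were asleep,
        # and in that case no work should be started.
        if not is_postmaster_alive():
            return "postmaster died during sleep"
        do_work()
    return "iteration limit reached"


# Fake postmaster that is alive before the nap and gone after it.
answers = iter([True, False])
work_done = []
result = launcher_loop(lambda: next(answers), lambda s: None,
                       lambda: work_done.append(1), 60, 5)
print(result)      # postmaster died during sleep
print(work_done)   # []
```

Without the second check, the fake `do_work` would have run once after the postmaster had already gone away.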

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#3Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Alvaro Herrera (#2)
Re: Autovacuum launcher doesn't notice death of postmaster immediately

On Mon, Jun 04, 2007 at 11:04:26AM -0400, Alvaro Herrera wrote:

The launcher is set up to wake up in autovacuum_naptime seconds at most.
So if the user configures a ridiculous time (for example 86400 seconds,
which I've seen) then the launcher would not detect the postmaster death

Yeah, I've seen people set that up with the intention of "now autovacuum
will only run during our slow time!". I'm thinking it'd be worth
mentioning in the docs that this won't work, and instead suggesting that
they run vacuumdb -a or equivalent at that time instead. Thoughts?
--
Jim Nasby decibel@decibel.org
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

#4Andrew Hammond
andrew.george.hammond@gmail.com
In reply to: Jim Nasby (#3)
Re: Autovacuum launcher doesn't notice death of postmaster immediately

On 6/7/07, Jim C. Nasby <decibel@decibel.org> wrote:

On Mon, Jun 04, 2007 at 11:04:26AM -0400, Alvaro Herrera wrote:

The launcher is set up to wake up in autovacuum_naptime seconds at most.
So if the user configures a ridiculous time (for example 86400 seconds,
which I've seen) then the launcher would not detect the postmaster death

Is there some threshold after which we should have PostgreSQL emit a
warning to the effect of "autovacuum_naptime is very large. Are you
sure you know what you're doing?"

Yeah, I've seen people set that up with the intention of "now autovacuum
will only run during our slow time!". I'm thinking it'd be worth
mentioning in the docs that this won't work, and instead suggesting that
they run vacuumdb -a or equivalent at that time instead. Thoughts?

Hmmm... it seems to me that points new users towards not using
autovacuum, which doesn't seem like the best idea. I think it'd be
better to say that setting the naptime really high is a Bad Idea.
Instead, if they want to shift maintenance to "off hours" they should
consider using a cron job that bonks around the
pg_autovacuum.vac_base_thresh or vac_scale_factor values for tables
they don't want vacuumed during "operational hours" (set them really
high at the start of operational hours, then to normal during off
hours). Tweaking the enable column would work too, but they presumably
don't want to disable ANALYZE, although it's entirely likely that new
users don't know what ANALYZE does, in which case they _really_ don't
want to disable it.

This should probably be very close to a section that says something
about how insufficient maintenance can be expected to lead to greater
performance issues than using autovacuum with default settings.
Assuming we believe that to be the case, which I think is reasonable
given that we are now defaulting to having autovacuum enabled.

Andrew

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Hammond (#4)
Re: Autovacuum launcher doesn't notice death of postmaster immediately

"Andrew Hammond" <andrew.george.hammond@gmail.com> writes:

Hmmm... it seems to me that points new users towards not using
autovacuum, which doesn't seem like the best idea. I think it'd be
better to say that setting the naptime really high is a Bad Idea.

It seems like we should have an upper limit on the GUC variable that's
less than INT_MAX ;-). Would an hour be sane? 10 minutes?

This is independent of the problem at hand, though, which is that we
probably want the launcher to notice postmaster death in less time
than autovacuum_naptime, for reasonable values of same.

regards, tom lane

#6Matthew T. O'Connor
matthew@zeut.net
In reply to: Tom Lane (#5)
Re: Autovacuum launcher doesn't notice death of postmaster immediately

Tom Lane wrote:

"Andrew Hammond" <andrew.george.hammond@gmail.com> writes:

Hmmm... it seems to me that points new users towards not using
autovacuum, which doesn't seem like the best idea. I think it'd be
better to say that setting the naptime really high is a Bad Idea.

It seems like we should have an upper limit on the GUC variable that's
less than INT_MAX ;-). Would an hour be sane? 10 minutes?

This is independent of the problem at hand, though, which is that we
probably want the launcher to notice postmaster death in less time
than autovacuum_naptime, for reasonable values of same.

Do we need a configurable autovacuum naptime at all? I know I put it in
the original contrib autovacuum because I had no idea what knobs might
be needed. I can't see a good reason to ever have a naptime longer than
the default 60 seconds, but I suppose one might want a smaller naptime
for a very active system?

#7Michael Paesold
mpaesold@gmx.at
In reply to: Matthew T. O'Connor (#6)
Re: Autovacuum launcher doesn't notice death of postmaster immediately

Matthew T. O'Connor wrote:

Tom Lane wrote:

"Andrew Hammond" <andrew.george.hammond@gmail.com> writes:

Hmmm... it seems to me that points new users towards not using
autovacuum, which doesn't seem like the best idea. I think it'd be
better to say that setting the naptime really high is a Bad Idea.

It seems like we should have an upper limit on the GUC variable that's
less than INT_MAX ;-). Would an hour be sane? 10 minutes?

This is independent of the problem at hand, though, which is that we
probably want the launcher to notice postmaster death in less time
than autovacuum_naptime, for reasonable values of same.

Do we need a configurable autovacuum naptime at all? I know I put it in
the original contrib autovacuum because I had no idea what knobs might
be needed. I can't see a good reason to ever have a naptime longer than
the default 60 seconds, but I suppose one might want a smaller naptime
for a very active system?

A PostgreSQL database on my laptop for testing. It should use as little
resources as possible while being idle. That would be a scenario for
naptime greater than 60 seconds, wouldn't it?

Best Regards
Michael Paesold

#8Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Andrew Hammond (#4)
Re: Autovacuum launcher doesn't notice death of postmaster immediately

The launcher is set up to wake up in autovacuum_naptime seconds at most.

Imho the fix is usually to have a sleep loop.

Andreas

#9Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Zeugswetter Andreas SB SD (#8)
Re: Autovacuum launcher doesn't notice death of postmaster immediately

Zeugswetter Andreas ADI SD wrote:

The launcher is set up to wake up in autovacuum_naptime seconds at
most.

Imho the fix is usually to have a sleep loop.

This is what we have. The sleep time depends on the schedule of next
vacuum for the closest database in time. If naptime is high, the sleep
time will be high (depending on number of databases needing attention).

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#10Matthew T. O'Connor
matthew@zeut.net
In reply to: Michael Paesold (#7)
Re: Autovacuum launcher doesn't notice death of postmaster immediately

Michael Paesold wrote:

Matthew T. O'Connor wrote:

Do we need a configurable autovacuum naptime at all? I know I put it
in the original contrib autovacuum because I had no idea what knobs
might be needed. I can't see a good reason to ever have a naptime
longer than the default 60 seconds, but I suppose one might want a
smaller naptime for a very active system?

A PostgreSQL database on my laptop for testing. It should use as little
resources as possible while being idle. That would be a scenario for
naptime greater than 60 seconds, wouldn't it?

Perhaps, but that isn't the use case PostgreSQL is being designed for.
If that is what you really need, then you should probably disable
autovacuum. Also, a very long naptime means that autovacuum will still
wake up at random times to do the work. At least with a short
naptime, it will do the work shortly after you updated your tables.

#11Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Alvaro Herrera (#9)
Re: Autovacuum launcher doesn't notice death of postmaster immediately

The launcher is set up to wake up in autovacuum_naptime seconds at most.

Imho the fix is usually to have a sleep loop.

This is what we have. The sleep time depends on the schedule
of next vacuum for the closest database in time. If naptime
is high, the sleep time will be high (depending on number of
databases needing attention).

No, I meant a "while (sleep 1(or 10) and counter < longtime) check for
exit" instead of "sleep longtime".
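
A minimal sketch of the split-sleep idea described above, under stated assumptions: `split_sleep`, `should_exit` and `chunk_seconds` are hypothetical names, and the `sleep` parameter exists only so the loop can be tried without actually waiting.

```python
import time


def split_sleep(total_seconds, should_exit, chunk_seconds=1.0, sleep=time.sleep):
    """Sleep for total_seconds in short chunks.

    should_exit() is polled before every chunk; the function returns True
    as soon as it reports that the process ought to exit, otherwise False
    once the whole period has elapsed.
    """
    remaining = float(total_seconds)
    while remaining > 0:
        if should_exit():
            return True
        step = min(chunk_seconds, remaining)
        sleep(step)
        remaining -= step
    return should_exit()


# Usage with a fake sleep that only records the requested durations.
slept = []
exited_early = split_sleep(25, lambda: False, chunk_seconds=10, sleep=slept.append)
print(exited_early, slept)
```

The trade-off is the one raised in the thread: a shorter chunk notices an exit condition sooner but wakes the process more often while nothing is happening.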

Andreas

#12Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Zeugswetter Andreas SB SD (#11)
Re: Autovacuum launcher doesn't notice death of postmaster immediately

Zeugswetter Andreas ADI SD wrote:

The launcher is set up to wake up in autovacuum_naptime seconds at most.

Imho the fix is usually to have a sleep loop.

This is what we have. The sleep time depends on the schedule
of next vacuum for the closest database in time. If naptime
is high, the sleep time will be high (depending on number of
databases needing attention).

No, I meant a "while (sleep 1(or 10) and counter < longtime) check for
exit" instead of "sleep longtime".

Ah; yes, what I was proposing (or thought about proposing, not sure if I
posted it or not) was putting an upper limit of 10 seconds in the sleep
(bgwriter sleeps 10 seconds if configured to not do anything). Though
10 seconds may seem like an eternity for systems like the ones Peter was
talking about, where there is a script trying to restart the server as
soon as the postmaster dies.

--
Alvaro Herrera Developer, http://www.PostgreSQL.org/
"Limítate a mirar... y algun día veras"

#13Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Matthew T. O'Connor (#10)
Re: Autovacuum launcher doesn't notice death of postmaster immediately

On Fri, Jun 08, 2007 at 09:49:56AM -0400, Matthew O'Connor wrote:

Michael Paesold wrote:

Matthew T. O'Connor wrote:

Do we need a configurable autovacuum naptime at all? I know I put it
in the original contrib autovacuum because I had no idea what knobs
might be needed. I can't see a good reason to ever have a naptime
longer than the default 60 seconds, but I suppose one might want a
smaller naptime for a very active system?

A PostgreSQL database on my laptop for testing. It should use as little
resources as possible while being idle. That would be a scenario for
naptime greater than 60 seconds, wouldn't it?

Perhaps, but that isn't the use case PostgreSQL is being designed for.
If that is what you really need, then you should probably disable
autovacuum. Also, a very long naptime means that autovacuum will still
wake up at random times to do the work. At least with a short
naptime, it will do the work shortly after you updated your tables.

Agreed. Maybe 10 minutes might make sense, but the overhead of checking
to see if anything needs vacuuming is pretty tiny.

There *is* reason to allow setting the naptime smaller, though (or at
least there was; perhaps Alvaro's recent changes negate this need):
clusters that have a large number of databases. I've worked with folks
who are in a hosted environment and give each customer their own
database; it's not hard to get a couple hundred databases that way.
Setting the naptime higher than a second in such an environment would
mean it could be hours before a database is checked for vacuuming.
--
Jim Nasby decibel@decibel.org
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

#14Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Andrew Hammond (#4)
Re: Autovacuum launcher doesn't notice death of postmaster immediately

On Thu, Jun 07, 2007 at 12:13:09PM -0700, Andrew Hammond wrote:

On 6/7/07, Jim C. Nasby <decibel@decibel.org> wrote:

On Mon, Jun 04, 2007 at 11:04:26AM -0400, Alvaro Herrera wrote:

The launcher is set up to wake up in autovacuum_naptime seconds at most.
So if the user configures a ridiculous time (for example 86400 seconds,
which I've seen) then the launcher would not detect the postmaster death

Is there some threshold after which we should have PostgreSQL emit a
warning to the effect of "autovacuum_naptime is very large. Are you
sure you know what you're doing?"

Yeah, I've seen people set that up with the intention of "now autovacuum
will only run during our slow time!". I'm thinking it'd be worth
mentioning in the docs that this won't work, and instead suggesting that
they run vacuumdb -a or equivalent at that time instead. Thoughts?

Hmmm... it seems to me that points new users towards not using
autovacuum, which doesn't seem like the best idea. I think it'd be

I think we could easily word it so that it's clear that just letting
autovacuum do its thing is preferred.

better to say that setting the naptime really high is a Bad Idea.
Instead, if they want to shift maintenance to "off hours" they should
consider using a cron job that bonks around the
pg_autovacuum.vac_base_thresh or vac_scale_factor values for tables
they don't want vacuumed during "operational hours" (set them really
high at the start of operational hours, then to normal during off
hours). Tweaking the enable column would work too, but they presumably
don't want to disable ANALYZE, although it's entirely likely that new
users don't know what ANALYZE does, in which case they _really_ don't
want to disable it.

That sounds like a rather ugly solution, and one that would be hard to
implement; not something to be putting in the docs.
--
Jim Nasby decibel@decibel.org
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

#15Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Jim Nasby (#13)
Re: Autovacuum launcher doesn't notice death of postmaster immediately

Jim C. Nasby wrote:

There *is* reason to allow setting the naptime smaller, though (or at
least there was; perhaps Alvaro's recent changes negate this need):
clusters that have a large number of databases. I've worked with folks
who are in a hosted environment and give each customer their own
database; it's not hard to get a couple hundred databases that way.
Setting the naptime higher than a second in such an environment would
mean it could be hours before a database is checked for vacuuming.

Yes, the code in HEAD is different -- each database will be considered
separately. So the huge database taking all day to vacuum will not stop
the tiny databases from being vacuumed in a timely manner.

And the very huge table in that database will not stop the other tables
in the database from being vacuumed either. There can be more than one
worker in a single database.

The limit is autovacuum_max_workers.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#16Matthew T. O'Connor
matthew@zeut.net
In reply to: Alvaro Herrera (#15)
Re: Autovacuum launcher doesn't notice death of postmaster immediately

Alvaro Herrera wrote:

Jim C. Nasby wrote:

There *is* reason to allow setting the naptime smaller, though (or at
least there was; perhaps Alvaro's recent changes negate this need):
clusters that have a large number of databases. I've worked with folks
who are in a hosted environment and give each customer their own
database; it's not hard to get a couple hundred databases that way.
Setting the naptime higher than a second in such an environment would
mean it could be hours before a database is checked for vacuuming.

Yes, the code in HEAD is different -- each database will be considered
separately. So the huge database taking all day to vacuum will not stop
the tiny databases from being vacuumed in a timely manner.

And the very huge table in that database will not stop the other tables
in the database from being vacuumed either. There can be more than one
worker in a single database.

Ok, but I think the question posed is that in say a virtual hosting
environment there might be say 1,000 databases in the cluster. Am I
still going to have to wait a long time for my database to get vacuumed?
I don't think this has changed much no?

(If default naptime is 1 minute, then autovacuum won't even look at a
given database but once every 1,000 minutes (16.67 hours) assuming that
there isn't enough work to keep all the workers busy.)

#17Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Matthew T. O'Connor (#16)
Re: Autovacuum launcher doesn't notice death of postmaster immediately

Matthew T. O'Connor wrote:

Ok, but I think the question posed is that in say a virtual hosting
environment there might be say 1,000 databases in the cluster. Am I
still going to have to wait a long time for my database to get vacuumed?
I don't think this has changed much no?

Depends on how much time it takes to vacuum the other 999 databases.
The default max workers is 3.

(If default naptime is 1 minute, then autovacuum won't even look at a
given database but once every 1,000 minutes (16.67 hours) assuming that
there isn't enough work to keep all the workers busy.)

The naptime is per database. Which means if you have 1000 databases and
a naptime of 60 seconds, the launcher is going to wake up every 100
milliseconds to check things up. (This results from 60000 / 1000 = 60
ms, but there is a minimum of 100 ms just to keep things sane).

If there are 3 workers and each of the 1000 databases on average takes
10 seconds to vacuum, there will be around 3000 seconds between autovac
runs of your database assuming my math is right.
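
The arithmetic in the two paragraphs above can be checked with a short sketch. The function names are hypothetical, and the model is deliberately crude: it assumes the workers simply cycle through every database once, with nothing else slowing them down.

```python
def launcher_wakeup_ms(naptime_s, n_databases, floor_ms=100):
    """Launcher wakeup interval: naptime spread across databases, with a floor."""
    return max(naptime_s * 1000 // n_databases, floor_ms)


def seconds_between_runs(n_databases, avg_vacuum_s, n_workers):
    """Rough time for n_workers to cycle through every database once."""
    return n_databases * avg_vacuum_s / n_workers


print(launcher_wakeup_ms(60, 1000))               # 100 (60 ms raised to the floor)
print(round(seconds_between_runs(1000, 10, 3)))   # 3333
```

With 3 workers and 10 seconds per database this gives roughly 3,333 seconds, a little under an hour, which is in line with the "around 3000 seconds" estimate.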

I hope those 1000 databases you put in your shared hosting are not very
big.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#18Joshua D. Drake
jd@commandprompt.com
In reply to: Alvaro Herrera (#17)
Re: Autovacuum launcher doesn't notice death of postmaster immediately

Alvaro Herrera wrote:

Matthew T. O'Connor wrote:

Ok, but I think the question posed is that in say a virtual hosting
environment there might be say 1,000 databases in the cluster.

That is uhmmm insane... 1000 databases?

Joshua D. Drake

Am I still going to have to wait a long time for my database to get vacuumed?
I don't think this has changed much no?

Depends on how much time it takes to vacuum the other 999 databases.
The default max workers is 3.

(If default naptime is 1 minute, then autovacuum won't even look at a
given database but once every 1,000 minutes (16.67 hours) assuming that
there isn't enough work to keep all the workers busy.)

The naptime is per database. Which means if you have 1000 databases and
a naptime of 60 seconds, the launcher is going to wake up every 100
milliseconds to check things up. (This results from 60000 / 1000 = 60
ms, but there is a minimum of 100 ms just to keep things sane).

If there are 3 workers and each of the 1000 databases on average takes
10 seconds to vacuum, there will be around 3000 seconds between autovac
runs of your database assuming my math is right.

I hope those 1000 databases you put in your shared hosting are not very
big.

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/

#19Dann Corbit
DCorbit@connx.com
In reply to: Joshua D. Drake (#18)
Re: Autovacuum launcher doesn't notice death of postmaster immediately

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Joshua D. Drake
Sent: Friday, June 08, 2007 10:49 PM
To: Alvaro Herrera
Cc: Matthew T. O'Connor; Jim C. Nasby; Michael Paesold; Tom Lane; Andrew
Hammond; Peter Eisentraut; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Autovacuum launcher doesn't notice death of
postmaster immediately

Alvaro Herrera wrote:

Matthew T. O'Connor wrote:

Ok, but I think the question posed is that in say a virtual hosting
environment there might be say 1,000 databases in the cluster.

That is uhmmm insane... 1000 databases?

Not in a test environment. We have several hundred databases here. Of course, only a few dozen (or at most ~100) are of any one type, but I can imagine that under certain circumstances 1000 databases would not be unreasonable.

[snip]

#20ITAGAKI Takahiro
itagaki.takahiro@oss.ntt.co.jp
In reply to: Alvaro Herrera (#12)
Re: Autovacuum launcher doesn't notice death of postmaster immediately

Alvaro Herrera <alvherre@commandprompt.com> wrote:

No, I meant a "while (sleep 1(or 10) and counter < longtime) check for
exit" instead of "sleep longtime".

Ah; yes, what I was proposing (or thought about proposing, not sure if I
posted it or not) was putting an upper limit of 10 seconds in the sleep
(bgwriter sleeps 10 seconds if configured to not do anything). Though
10 seconds may seem like an eternity for systems like the ones Peter was
talking about, where there is a script trying to restart the server as
soon as the postmaster dies.

Here is a patch for split-sleep of autovacuum_naptime.

There are some other issues in CVS HEAD: we use the calculation
{autovacuum_naptime * 1000000} in launcher_determine_sleep().
The result overflows a 32-bit int if autovacuum_naptime is set above 2147.

In another place, we use {autovacuum_naptime * 1000}, so we should
set the upper bound to INT_MAX/1000 instead of INT_MAX.
Incidentally, we've already had the same protections for
log_min_duration_statement and log_autovacuum.
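
The two limits mentioned above follow from the range of a signed 32-bit integer. The sketch below emulates the common two's-complement wrap-around in Python; `wrap_int32` and `naptime_usec_int32` are hypothetical helpers, and in C signed overflow is formally undefined behaviour, so the wrapped value is only the typical outcome.

```python
INT_MAX = 2**31 - 1  # 2147483647 for a 32-bit int


def wrap_int32(value):
    """Reduce value to the signed 32-bit range (two's-complement wrap)."""
    return (value + 2**31) % 2**32 - 2**31


def naptime_usec_int32(naptime_s):
    """Emulate naptime * 1000000 evaluated in 32-bit int arithmetic."""
    return wrap_int32(naptime_s * 1_000_000)


print(INT_MAX // 1_000_000)       # 2147, the largest naptime that is safe in microseconds
print(naptime_usec_int32(2147))   # 2147000000
print(naptime_usec_int32(2148))   # -2146967296, a negative sleep time
print(INT_MAX // 1000)            # 2147483, the bound implied by the millisecond multiply
```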

I hope this patch could fix those large-autovacuum_naptime problems.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

Attachments:

autovacuum_naptime_overflow.patch (application/octet-stream, +11 -3)
#21Zdenek Kotala
Zdenek.Kotala@Sun.COM
In reply to: Alvaro Herrera (#12)
#22Magnus Hagander
magnus@hagander.net
In reply to: Zdenek Kotala (#21)
#23Zdenek Kotala
Zdenek.Kotala@Sun.COM
In reply to: Magnus Hagander (#22)
#24Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: ITAGAKI Takahiro (#20)
#25Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#24)