Autovacuum vs statement_timeout
I seem to remember that we'd agreed that autovacuum should ignore any
globally set statement_timeout, on the grounds that a poorly chosen
setting could indefinitely prevent large tables from being vacuumed.
But I do not see anything in autovacuum.c that resets the variable.
Am I just being blind? (Quite possible, as I'm tired and under the
weather.)
The thing that brought this to mind was the idea that Mark
Shuttleworth's open problem might be triggered in part by a statement
timeout interrupting autovacuum at an inopportune point --- some logs
he sent me offlist show that he is using statement_timeout ...
regards, tom lane
Tom Lane wrote:
I seem to remember that we'd agreed that autovacuum should ignore any
globally set statement_timeout, on the grounds that a poorly chosen
setting could indefinitely prevent large tables from being vacuumed.
But I do not see anything in autovacuum.c that resets the variable.
Am I just being blind? (Quite possible, as I'm tired and under the
weather.)The thing that brought this to mind was the idea that Mark
Shuttleworth's open problem might be triggered in part by a statement
timeout interrupting autovacuum at an inopportune point --- some logs
he sent me offlist show that he is using statement_timeout ...
statement_timeout interrupts seem to go through the PG_CATCH-block and
clean up the entry from the vacuum cycle array as they should. But a
SIGINT leading to a "terminating connection due to administrator
command" error does not.
After the recent change in CVS HEAD, CREATE DATABASE tries to
kill(SIGINT) any autovacuum process in the template database. That seems
very dangerous now, it could easily leave stale entries in the cycle id
array. However, it doesn't explain the Mark Shuttleworth's problem
because the 8.2 behavior is to throw an "source database is being
accessed by other users" error instead of killing autovacuum. Maybe
there's something else killing autovacuum processes?
I think we need to add the xid of the vacuum transaction in the vacuum
cycle array, and clean up orphaned entries in _bt_start_vacuum. We're
going to have a hard time plugging every leak one-by-one otherwise.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki@enterprisedb.com> writes:
statement_timeout interrupts seem to go through the PG_CATCH-block and
clean up the entry from the vacuum cycle array as they should. But a
SIGINT leading to a "terminating connection due to administrator
command" error does not.
Hm, that's an interesting thought, but there are no "terminating
connection" messages in Shuttleworth's logs either. So we still lack
the right idea there. (BTW it would be SIGTERM not SIGINT.)
I think we need to add the xid of the vacuum transaction in the vacuum
cycle array, and clean up orphaned entries in _bt_start_vacuum. We're
going to have a hard time plugging every leak one-by-one otherwise.
You're thinking too small --- what this thought actually suggests is
that PG_CATCH can't be used to clean up shared memory at all, and I
don't think we want to accept that. (I see several other places already
where we assume we can do that. We could convert each one into an
on_proc_exit cleanup operation, maybe, but that seems messy and not very
scalable.) I'm thinking we may want to redesign elog(FATAL) processing
so that we escape out to the outer level before calling proc_exit,
thereby allowing CATCH blocks to run first.
Note for the archives: I've argued for some time that SIGTERM'ing
individual backends is an insufficiently tested code path to be exposed
as standard functionality. Looks like that's still true. This is not
a bug for database shutdown because we don't really care if we leave
perfectly clean shared memory behind --- it's only a bug if you try to
SIGTERM an individual vacuum process while leaving the system up.
regards, tom lane
I wrote:
Heikki Linnakangas <heikki@enterprisedb.com> writes:
statement_timeout interrupts seem to go through the PG_CATCH-block and
clean up the entry from the vacuum cycle array as they should. But a
SIGINT leading to a "terminating connection due to administrator
command" error does not.
Hm, that's an interesting thought, but there are no "terminating
connection" messages in Shuttleworth's logs either. So we still lack
the right idea there. (BTW it would be SIGTERM not SIGINT.)
Hold it ... stop the presses ... the reason we saw no "terminating
connection" messages was he was grepping his logs for lines containing
ERROR. Once we look for FATAL too, there are a pile of 'em. I'm not
100% convinced that any are from autovacuum processes, but clearly
*something* is throwing SIGTERM around with abandon in his test
environment. So at this point your theory above looks like a plausible
mechanism for the vacuum cycle array to slowly fill up and eventually
make _bt_start_vacuum fail (or, perhaps, fail sooner than that due to
a repeat vacuum attempt).
I think we need to add the xid of the vacuum transaction in the vacuum
cycle array, and clean up orphaned entries in _bt_start_vacuum. We're
going to have a hard time plugging every leak one-by-one otherwise.
You're thinking too small --- what this thought actually suggests is
that PG_CATCH can't be used to clean up shared memory at all, and I
don't think we want to accept that. (I see several other places already
where we assume we can do that. We could convert each one into an
on_proc_exit cleanup operation, maybe, but that seems messy and not very
scalable.) I'm thinking we may want to redesign elog(FATAL) processing
so that we escape out to the outer level before calling proc_exit,
thereby allowing CATCH blocks to run first.
I was hoping we could do that just as an 8.3 change, but it's now
starting to look like we might have to back-patch it, depending on how
much we care about surviving random SIGTERM attempts. I'd like to wait
for some report from Mark about what's causing all the SIGTERMs before
we evaluate that.
regards, tom lane
Tom Lane wrote:
I wrote:
Heikki Linnakangas <heikki@enterprisedb.com> writes:
statement_timeout interrupts seem to go through the PG_CATCH-block and
clean up the entry from the vacuum cycle array as they should. But a
SIGINT leading to a "terminating connection due to administrator
command" error does not.Hm, that's an interesting thought, but there are no "terminating
connection" messages in Shuttleworth's logs either. So we still lack
the right idea there. (BTW it would be SIGTERM not SIGINT.)Hold it ... stop the presses ... the reason we saw no "terminating
connection" messages was he was grepping his logs for lines containing
ERROR. Once we look for FATAL too, there are a pile of 'em. I'm not
100% convinced that any are from autovacuum processes, but clearly
*something* is throwing SIGTERM around with abandon in his test
environment. So at this point your theory above looks like a plausible
mechanism for the vacuum cycle array to slowly fill up and eventually
make _bt_start_vacuum fail (or, perhaps, fail sooner than that due to
a repeat vacuum attempt).
Hmmm, remember that DatabaseCancelAutovacuumActivity is called on CREATE
DATABASE; but what it does is send SIGINT, not SIGTERM. Also, it's not
in 8.2. SIGINT does terminate the autovac process however.
I haven't read the whole problem report completely, so I'm not sure this
has something to do or not.
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes:
Hmmm, remember that DatabaseCancelAutovacuumActivity is called on CREATE
DATABASE; but what it does is send SIGINT, not SIGTERM. Also, it's not
in 8.2. SIGINT does terminate the autovac process however.
I haven't read the whole problem report completely, so I'm not sure this
has something to do or not.
AFAICT, SIGINT should be okay, because it will lead to an ERROR not a
FATAL elog; so control should fall out through the CATCH block before
the autovacuum process quits. The problem is with FATAL elogs.
Mark reports that the only FATAL lines in his logs are instances of
FATAL: terminating connection due to administrator command
FATAL: database "launchpad_ftest" does not exist
and the latter presumably isn't coming out from within the btree vacuum
code, so I don't see any other explanation for a FATAL exit than SIGTERM.
regards, tom lane
Tom Lane wrote:
I seem to remember that we'd agreed that autovacuum should ignore any
globally set statement_timeout, on the grounds that a poorly chosen
setting could indefinitely prevent large tables from being vacuumed.
On a vaguely related matter, should programs such as pg_dump, vacuumdb,
and reindexdb disable statement_timeout?
--
Peter Eisentraut
http://developer.postgresql.org/~petere/
On Sun, Apr 01, 2007 at 12:36:01AM +0200, Peter Eisentraut wrote:
Tom Lane wrote:
I seem to remember that we'd agreed that autovacuum should ignore any
globally set statement_timeout, on the grounds that a poorly chosen
setting could indefinitely prevent large tables from being vacuumed.On a vaguely related matter, should programs such as pg_dump, vacuumdb,
and reindexdb disable statement_timeout?
Youch... yes, they should IMO. Add clusterdb, pg_dumpall and pg_restore
to that list as well (really, pg_dump(all) should output a command to
disable statement_timeout).
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Jim C. Nasby wrote:
On Sun, Apr 01, 2007 at 12:36:01AM +0200, Peter Eisentraut wrote:
Tom Lane wrote:
I seem to remember that we'd agreed that autovacuum should ignore any
globally set statement_timeout, on the grounds that a poorly chosen
setting could indefinitely prevent large tables from being vacuumed.On a vaguely related matter, should programs such as pg_dump, vacuumdb,
and reindexdb disable statement_timeout?Youch... yes, they should IMO. Add clusterdb, pg_dumpall and pg_restore
to that list as well (really, pg_dump(all) should output a command to
disable statement_timeout).
I don't know if that should be a default or not. It is certainly easy
enough to disable it should you want to.
Sincerely,
Joshua D. Drake
--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
On Tue, Apr 17, 2007 at 12:51:51PM -0700, Joshua D. Drake wrote:
Jim C. Nasby wrote:
On Sun, Apr 01, 2007 at 12:36:01AM +0200, Peter Eisentraut wrote:
Tom Lane wrote:
I seem to remember that we'd agreed that autovacuum should ignore any
globally set statement_timeout, on the grounds that a poorly chosen
setting could indefinitely prevent large tables from being vacuumed.On a vaguely related matter, should programs such as pg_dump, vacuumdb,
and reindexdb disable statement_timeout?Youch... yes, they should IMO. Add clusterdb, pg_dumpall and pg_restore
to that list as well (really, pg_dump(all) should output a command to
disable statement_timeout).I don't know if that should be a default or not. It is certainly easy
enough to disable it should you want to.
How would you disable it for those command-line utilities? Or are you
referring to disabling it via an ALTER ROLE SET ... for superusers?
ISTM current behavior is a bit of a foot-gun. These are administrative
shell commands that aren't going to be run by Joe-user.
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
On Tuesday 17 April 2007 18:38, Jim C. Nasby wrote:
On Tue, Apr 17, 2007 at 12:51:51PM -0700, Joshua D. Drake wrote:
Jim C. Nasby wrote:
On Sun, Apr 01, 2007 at 12:36:01AM +0200, Peter Eisentraut wrote:
Tom Lane wrote:
I seem to remember that we'd agreed that autovacuum should ignore any
globally set statement_timeout, on the grounds that a poorly chosen
setting could indefinitely prevent large tables from being vacuumed.On a vaguely related matter, should programs such as pg_dump, vacuumdb,
and reindexdb disable statement_timeout?Youch... yes, they should IMO. Add clusterdb, pg_dumpall and pg_restore
to that list as well (really, pg_dump(all) should output a command to
disable statement_timeout).I don't know if that should be a default or not. It is certainly easy
enough to disable it should you want to.How would you disable it for those command-line utilities? Or are you
referring to disabling it via an ALTER ROLE SET ... for superusers?ISTM current behavior is a bit of a foot-gun. These are administrative
shell commands that aren't going to be run by Joe-user.
I'm with Joshua on this one. Statement_timeout is often used as a means for
protection from long running statements due to server load and locking and
all of the above commands can certainly fall into that area. If people feel
strongly that the command line programs need a way to circumvent it, add
a --ignore-statement-timeout option or similar mechanism.
--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Robert Treat <xzilla@users.sourceforge.net> writes:
I'm with Joshua on this one. Statement_timeout is often used as a means for
protection from long running statements due to server load and locking and
all of the above commands can certainly fall into that area. If people feel
strongly that the command line programs need a way to circumvent it, add
a --ignore-statement-timeout option or similar mechanism.
The worst-case scenario here is that your server fails and you discover
that all your backups are corrupt because you didn't notice pg_dump was
failing due to statement_timeout. (Maybe it just recently started to
fail because your biggest table grew past the point at which the COPY
command exceeded statement_timeout.)
I'm not excited about the other ones but I can see the argument for
making pg_dump force the timeout to 0.
regards, tom lane
Tom Lane wrote:
Robert Treat <xzilla@users.sourceforge.net> writes:
I'm with Joshua on this one. Statement_timeout is often used as a means for
protection from long running statements due to server load and locking and
all of the above commands can certainly fall into that area. If people feel
strongly that the command line programs need a way to circumvent it, add
a --ignore-statement-timeout option or similar mechanism.The worst-case scenario here is that your server fails and you discover
that all your backups are corrupt because you didn't notice pg_dump was
failing due to statement_timeout. (Maybe it just recently started to
fail because your biggest table grew past the point at which the COPY
command exceeded statement_timeout.)I'm not excited about the other ones but I can see the argument for
making pg_dump force the timeout to 0.
I guess my point is, if you are knowledgeable enough to actually set a
statement_timeout, you are likely knowledgeable enough to know how to
turn it off for programs like pg_dump.
Sincerely,
Joshua D. Drake
regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?
--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
Joshua D. Drake wrote:
Tom Lane wrote:
Robert Treat <xzilla@users.sourceforge.net> writes:
I'm with Joshua on this one. Statement_timeout is often used as a means
for protection from long running statements due to server load and
locking and all of the above commands can certainly fall into that area.
If people feel strongly that the command line programs need a way to
circumvent it, add a --ignore-statement-timeout option or similar
mechanism.The worst-case scenario here is that your server fails and you discover
that all your backups are corrupt because you didn't notice pg_dump was
failing due to statement_timeout. (Maybe it just recently started to
fail because your biggest table grew past the point at which the COPY
command exceeded statement_timeout.)I'm not excited about the other ones but I can see the argument for
making pg_dump force the timeout to 0.I guess my point is, if you are knowledgeable enough to actually set a
statement_timeout, you are likely knowledgeable enough to know how to
turn it off for programs like pg_dump.
I think that is too strong an assumption, which is why I'm planning to
back-patch the change to reset statement_timeout to 0 on autovacuum till
8.0, as discussed. I think I should also backpatch the change to set
zero_damaged_pages as well (which is not on 8.0 AFAIR).
It's very very easy to change things in postgresql.conf. Actually
knowing what you are doing (i.e. thinking on the consequences on VACUUM
and such) is a whole another matter.
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera wrote:
I think that is too strong an assumption, which is why I'm planning to
back-patch the change to reset statement_timeout to 0 on autovacuum till
8.0, as discussed. I think I should also backpatch the change to set
zero_damaged_pages as well (which is not on 8.0 AFAIR).It's very very easy to change things in postgresql.conf. Actually
knowing what you are doing (i.e. thinking on the consequences on VACUUM
and such) is a whole another matter.
Frankly, setting statement_timeout in postgresql.conf seems so risky in
so many ways, perhaps we just need to document that the parameter
probably should not be set in postgresql.conf, and why.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
On Tue, Apr 17, 2007 at 10:33:21PM -0400, Bruce Momjian wrote:
Alvaro Herrera wrote:
I think that is too strong an assumption, which is why I'm planning to
back-patch the change to reset statement_timeout to 0 on autovacuum till
8.0, as discussed. I think I should also backpatch the change to set
zero_damaged_pages as well (which is not on 8.0 AFAIR).It's very very easy to change things in postgresql.conf. Actually
knowing what you are doing (i.e. thinking on the consequences on VACUUM
and such) is a whole another matter.Frankly, setting statement_timeout in postgresql.conf seems so risky in
so many ways, perhaps we just need to document that the parameter
probably should not be set in postgresql.conf, and why.
I'd suggest doing both. Tell people that it's dangerous (probably to the
point of a comment in the sample config file), but *also* force it in
pg_dump since you can't really expect people to read the documentation.
//MAgnus
On Tuesday 17 April 2007 21:25, Alvaro Herrera wrote:
I think that is too strong an assumption, which is why I'm planning to
back-patch the change to reset statement_timeout to 0 on autovacuum till
8.0, as discussed. I think I should also backpatch the change to set
zero_damaged_pages as well (which is not on 8.0 AFAIR).
<blinks> Um, can I get a pointer to that thread? I can't imagine why we
would actually want to automatically destroy our data without oversight from
a DBA... I must be reading that wrong.
--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
On Tuesday 17 April 2007 20:54, Tom Lane wrote:
Robert Treat <xzilla@users.sourceforge.net> writes:
I'm with Joshua on this one. Statement_timeout is often used as a means
for protection from long running statements due to server load and
locking and all of the above commands can certainly fall into that area.
If people feel strongly that the command line programs need a way to
circumvent it, add a --ignore-statement-timeout option or similar
mechanism.The worst-case scenario here is that your server fails and you discover
that all your backups are corrupt because you didn't notice pg_dump was
failing due to statement_timeout. (Maybe it just recently started to
fail because your biggest table grew past the point at which the COPY
command exceeded statement_timeout.)
I don't think I recall anyone ever complaining about this, and this scenario
has been plausible for *years*...
I'm not excited about the other ones but I can see the argument for
making pg_dump force the timeout to 0.
Allowing pg_dump to run un-checked could also lead to problems such as
exceeding maintenence windows causing performance issues, or causing trouble
due to lock contention with ongoing pg_dumps. I'll grant that the downsides
aren't as extreme, but the current functionality provides simple work arounds
(setting up specific dump users for example). If we force pg_dump to 0
timeout, what means will be provided for the DBA who doesn't want to let
pg_dump run unchecked?
--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Robert Treat wrote:
On Tuesday 17 April 2007 21:25, Alvaro Herrera wrote:
I think that is too strong an assumption, which is why I'm planning to
back-patch the change to reset statement_timeout to 0 on autovacuum till
8.0, as discussed. I think I should also backpatch the change to set
zero_damaged_pages as well (which is not on 8.0 AFAIR).<blinks> Um, can I get a pointer to that thread? I can't imagine why we
would actually want to automatically destroy our data without oversight from
a DBA... I must be reading that wrong.
You are -- I intend to set it to _off_ :-)
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Robert Treat wrote:
On Tuesday 17 April 2007 20:54, Tom Lane wrote:
I'm not excited about the other ones but I can see the argument for
making pg_dump force the timeout to 0.Allowing pg_dump to run un-checked could also lead to problems such as
exceeding maintenence windows causing performance issues, or causing trouble
due to lock contention with ongoing pg_dumps.
I have never ever seen a request to be able to control pg_dump and have
it stop dumping if the time taken to dump exceeded a threshold.
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support