autovacuum causing numerous regression-test failures

Started by Tom Lane over 19 years ago, 17 messages
#1 Tom Lane
tgl@sss.pgh.pa.us

I think we shall have to reconsider that patch to turn it on by default.
So far I've seen two categories of failure:

* manual ANALYZE issued by regression tests fails because autovac is
analyzing the same table concurrently.

* contrib tests fail in their repeated drop/create database operations
because autovac is connected to that database. (pl tests presumably
have same issue.)

There are probably more symptoms we have not seen yet.

In the long run it would be good to figure out fixes to make these
problems not happen, but I'm not putting that on the must-fix-for-8.2
list.

BTW, it would sure be nice to know what happened here:
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=wasp&dt=2006-08-28%2017:05:01

LOG: autovacuum process (PID 26315) was terminated by signal 11
LOG: terminating any other active server processes

but even if there was a core file, it got wiped out immediately by
the next "DROP DATABASE" command :-(. This one does look like a
must-fix, if we can find out what happened.

regards, tom lane

#2 Alon Goldshuv
agoldshuv@greenplum.com
In reply to: Tom Lane (#1)
Unnecessary rescan for non scrollable holdable cursors

Hi,

When persisting a holdable cursor at COMMIT time we currently choose to
rewind the executor and re-scan the whole result set into the tuplestore in
order to be able to scroll backwards later on. And then, we reposition the
cursor to the position we had been in. However, unless I am missing something,
this seems to be done always, even if the cursor is not scrollable. I
suppose adding a simple conditional or two in PersistHoldablePortal() in
portalcmds.c could save the rescan and filling up the tuplestore with tuples
that will never be looked at, in the case that we never want to scroll back.

Anyway, definitely not critical, but should save some time and space in
those specific situations.
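The conditional suggested here can be illustrated with a toy model. This is a minimal Python sketch under stated assumptions: ToyPortal, persist_holdable and fetch_next are hypothetical names, integers stand in for tuples, and nothing here reflects the real PersistHoldablePortal() code in portalcmds.c. A scrollable cursor still materializes the whole result and restores its position; a non-scrollable one keeps only the rows it can still return.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyPortal:
    """Hypothetical stand-in for a holdable cursor; not PostgreSQL's Portal."""
    result: List[int]   # rows the query would produce
    pos: int            # rows already fetched by the client
    scrollable: bool    # was the cursor declared SCROLL?
    store: List[int] = field(default_factory=list)  # toy "tuplestore"
    store_pos: int = 0  # read position inside the store


def persist_holdable(p: ToyPortal) -> None:
    """Materialize the cursor at COMMIT time."""
    if p.scrollable:
        # Rewind and rescan everything so backward fetches still work,
        # then skip past the rows the client has already seen.
        p.store = list(p.result)
        p.store_pos = p.pos
    else:
        # Only rows not yet fetched can ever be read again.
        p.store = list(p.result[p.pos:])
        p.store_pos = 0


def fetch_next(p: ToyPortal) -> Optional[int]:
    """Return the next row from the store, or None when exhausted."""
    if p.store_pos >= len(p.store):
        return None
    row = p.store[p.store_pos]
    p.store_pos += 1
    return row


rows = [10, 20, 30, 40, 50]
a = ToyPortal(rows, pos=2, scrollable=True)
b = ToyPortal(rows, pos=2, scrollable=False)
persist_holdable(a)
persist_holdable(b)
print(len(a.store), fetch_next(a))  # 5 30
print(len(b.store), fetch_next(b))  # 3 30
```

Both cursors hand back the same next row, but the non-scrollable one stores three tuples instead of five, which is the time and space saving in question.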

Regards,
Alon.

#3 Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#1)
Re: autovacuum causing numerous regression-test failures

Tom Lane wrote:

I think we shall have to reconsider that patch to turn it on by
default. So far I've seen two categories of failure:

So we turn autovacuum off for regression test instance.

* manual ANALYZE issued by regression tests fails because autovac is
analyzing the same table concurrently.

Or we put manual exceptions for the affected tables into pg_autovacuum.

* contrib tests fail in their repeated drop/create database
operations because autovac is connected to that database. (pl tests
presumably have same issue.)

I opine that when a database is to be dropped, the connections should be
cut.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#3)
Re: autovacuum causing numerous regression-test failures

Peter Eisentraut <peter_e@gmx.net> writes:

Tom Lane wrote:

I think we shall have to reconsider that patch to turn it on by
default. So far I've seen two categories of failure:

So we turn autovacuum off for regression test instance.

Not a solution for "make installcheck", unless you are proposing adding
the ability to suppress autovac per-database. Which would be a good
new feature ... for 8.3.

* manual ANALYZE issued by regression tests fails because autovac is
analyzing the same table concurrently.

Or we put manual exceptions for the affected tables into pg_autovacuum.

New feature? Or does that capability exist already?

* contrib tests fail in their repeated drop/create database
operations because autovac is connected to that database. (pl tests
presumably have same issue.)

I opine that when a database is to be dropped, the connections should be
cut.

Sure, but that's another thing that we're not going to start designing
and implementing four weeks after feature freeze.

I didn't complain about your proposing two weeks after feature freeze
that we turn autovac on by default, because I assumed (same as you no
doubt) that it would be a trivial one-liner change. It is becoming
clear that that is not the case, and I don't think it makes any sense
from a project-management standpoint to try to flush the problems out
at this time in the release cycle. We have more than enough problems
to fix for 8.2 already. Let's try to do this early in the 8.3 cycle
instead.

regards, tom lane

#5 Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#4)
Re: autovacuum causing numerous regression-test failures

Tom Lane wrote:

So we turn autovacuum off for regression test instance.

Not a solution for "make installcheck",

Well, for "make installcheck" we don't have any control over whether
autovacuum has been turned on or off manually anyway. If you are
concerned about build farm reliability, the build farm scripts can
surely be made to initialize or start the instance in a particular way.

Another option might be to turn off stats_row_level on the fly.

Or we put manual exceptions for the affected tables into
pg_autovacuum.

New feature? Or does that capability exist already?

I haven't ever used the pg_autovacuum table but the documentation
certainly makes one believe that this is possible.

I opine that when a database is to be dropped, the connections
should be cut.

Sure, but that's another thing that we're not going to start
designing and implementing four weeks after feature freeze.

Right.

clear that that is not the case, and I don't think it makes any sense
from a project-management standpoint to try to flush the problems out
at this time in the release cycle. We have more than enough problems
to fix for 8.2 already. Let's try to do this early in the 8.3 cycle
instead.

Let's just consider some of the options a bit more closely, and if they
don't work, we'll revert it.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

#6 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#1)
Re: autovacuum causing numerous regression-test failures

I wrote:

BTW, it would sure be nice to know what happened here:
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=wasp&dt=2006-08-28%2017:05:01
LOG: autovacuum process (PID 26315) was terminated by signal 11

I was able to cause autovac to crash by repeating contrib/intarray
regression test enough times in a row. The cause is not specific
to autovac, it's a generic bug created by my recent patch to add
"waiting" status to pg_stat_activity. If we block on a lock during
InitPostgres then the stats stuff isn't ready yet ... oops.
Patch committed.
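The failure mode described here, where status reporting runs before the backend's statistics entry exists, can be modeled in a few lines. This is a hypothetical Python sketch, not the committed patch: ToyBackend, StatusEntry and report_waiting are invented names, and the None check is just one plausible way to tolerate a lock wait that happens during startup, before the stats machinery is ready.

```python
from typing import Optional


class StatusEntry:
    """Toy per-backend row, loosely analogous to a pg_stat_activity entry."""

    def __init__(self) -> None:
        self.waiting = False


class ToyBackend:
    def __init__(self) -> None:
        # Hypothetical model: the entry is not set up until startup finishes.
        self.stats_entry: Optional[StatusEntry] = None

    def finish_startup(self) -> None:
        self.stats_entry = StatusEntry()

    def report_waiting(self, waiting: bool) -> None:
        # Guard: a lock wait can occur during startup, before the entry exists.
        # Without this check the toy would raise AttributeError, the Python
        # analogue of dereferencing a NULL pointer (signal 11) in C.
        if self.stats_entry is None:
            return
        self.stats_entry.waiting = waiting


b = ToyBackend()
b.report_waiting(True)   # lock wait during startup: silently ignored
b.finish_startup()
b.report_waiting(True)
print(b.stats_entry.waiting)  # True
```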

The other issues remain problems however.

regards, tom lane

#7 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#5)
Re: autovacuum causing numerous regression-test failures

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=osprey&dt=2006-08-28%2016:00:17
shows another autovac-induced failure mode:

! psql: FATAL: sorry, too many clients already

initdb is choosing max_connections = 20 on this machine, which is
sufficient to run the parallel regression tests by themselves,
but not regression tests plus autovac.

IIRC initdb will go down to 10 or so connections before deciding
it's hopeless. I don't really want to change that behavior because
it might make it impossible to initdb at all on a small machine.
But probably there needs to be a way for pg_regress to set a floor
on the acceptable max_connections setting while initializing the
test instance for "make check".
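A floor on max_connections for the test instance might look roughly like the following. This is a sketch under assumptions: the candidate list, the works() probe that stands in for initdb's trial startups, and the floor parameter are all hypothetical, not existing initdb or pg_regress options. Without a floor the function mirrors the fallback to whatever value the machine supports; with one, it fails early instead of starting a test instance that runs out of slots once autovac also connects.

```python
from typing import Callable, Optional, Sequence

# Assumed candidate list, for illustration; initdb's real probing may differ.
CANDIDATES: Sequence[int] = (100, 50, 40, 30, 20, 10)


def choose_max_connections(works: Callable[[int], bool],
                           floor: Optional[int] = None) -> int:
    """Return the largest candidate the machine can support.

    `works(n)` stands in for initdb's trial server start. If `floor` is
    given (a hypothetical pg_regress option), refuse to go below it
    instead of silently picking a value too small for the test schedule.
    """
    for n in CANDIDATES:
        if floor is not None and n < floor:
            break
        if works(n):
            return n
    raise RuntimeError(f"no acceptable max_connections value (floor={floor})")


def small_box(n: int) -> bool:
    """Toy machine that can only support 20 connections."""
    return n <= 20


print(choose_max_connections(small_box))  # 20
try:
    choose_max_connections(small_box, floor=25)
except RuntimeError as e:
    print(e)
```

The floor of 25 is illustrative; in practice it would have to cover the largest parallel group in the test schedule plus any autovacuum connections.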

This also ties into the recent discussions about whether autovac needs
its own reserved backend slots. Which, again, sounds to me like a fine
idea for 8.3 work.

regards, tom lane

#8 Neil Conway
neilc@samurai.com
In reply to: Tom Lane (#4)
Re: autovacuum causing numerous regression-test failures

On Mon, 2006-08-28 at 15:21 -0400, Tom Lane wrote:

We have more than enough problems to fix for 8.2 already. Let's
try to do this early in the 8.3 cycle instead.

I agree -- I think this is exactly the sort of change that is best made
at the beginning of a development cycle, so that there's a whole cycle's
worth of testing to ensure it plays nicely with the rest of the system.

-Neil

#9 Alvaro Herrera
alvherre@commandprompt.com
In reply to: Neil Conway (#8)
Re: autovacuum causing numerous regression-test failures

Neil Conway wrote:

On Mon, 2006-08-28 at 15:21 -0400, Tom Lane wrote:

We have more than enough problems to fix for 8.2 already. Let's
try to do this early in the 8.3 cycle instead.

I agree -- I think this is exactly the sort of change that is best made
at the beginning of a development cycle, so that there's a whole cycle's
worth of testing to ensure it plays nicely with the rest of the system.

On the other hand, the bug Tom found on DROP OWNED a couple of weeks ago
was introduced right at the start of this development cycle, which tells
us that our testing of the development branch is not very exhaustive.
But I agree anyway.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#10 Matthew T. O'Connor
matthew@zeut.net
In reply to: Peter Eisentraut (#5)
Re: autovacuum causing numerous regression-test failures

Peter Eisentraut wrote:

Tom Lane wrote:

Not a solution for "make installcheck",

Well, for "make installcheck" we don't have any control over whether
autovacuum has been turned on or off manually anyway. If you are
concerned about build farm reliability, the build farm scripts can
surely be made to initialize or start the instance in a particular way.

Another option might be to turn off stats_row_level on the fly.

I'm sure I'm missing some of the subtleties of make installcheck issues,
but autovacuum can be enabled / disabled on the fly just as easily as
stats_row_level, so I don't see the difference?

Or we put manual exceptions for the affected tables into
pg_autovacuum.

New feature? Or does that capability exist already?

I haven't ever used the pg_autovacuum table but the documentation
certainly makes one believe that this is possible.

Right, if it doesn't work, that would certainly be a bug. This feature
was included in the original integration into the backend during the
8.0 dev cycle.

Let's just consider some of the options a bit more closely, and if they
don't work, we'll revert it.

Agreed.

#11 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Matthew T. O'Connor (#10)
Re: autovacuum causing numerous regression-test failures

"Matthew T. O'Connor" <matthew@zeut.net> writes:

Tom Lane wrote:

Not a solution for "make installcheck",

I'm sure I'm missing some of the subtleties of make installcheck issues,
but autovacuum can be enabled / disabled on the fly just as easily as
stats_row_level, so I don't see the difference?

Well, "just as easily" means "edit postgresql.conf and SIGHUP", which is
not an option available to "make installcheck", even if we thought that
an invasive change of the server configuration would be acceptable for
it to do. It's conceivable that we could invent a per-database
autovac-off variable controlled by, say, ALTER DATABASE SET ... but we
haven't got one today.

My objection here is basically that this proposal passed on the
assumption that it would be very nearly zero effort to make it happen.
We are now finding out that we have a fair amount of work to do if we
want autovac to not mess up the regression tests, and I think that has
to mean that the proposal goes back on the shelf until 8.3 development
starts. We are already overcommitted in terms of the stuff that was
submitted *before* feature freeze.

regards, tom lane

#12 Andreas Pflug
pgadmin@pse-consulting.de
In reply to: Tom Lane (#11)
Re: autovacuum causing numerous regression-test failures

Tom Lane wrote:

My objection here is basically that this proposal passed on the
assumption that it would be very nearly zero effort to make it happen.
We are now finding out that we have a fair amount of work to do if we
want autovac to not mess up the regression tests, and I think that has
to mean that the proposal goes back on the shelf until 8.3 development
starts. We are already overcommitted in terms of the stuff that was
submitted *before* feature freeze.

Kicking out autovacuum as default is a disaster, it took far too long to
get in the backend already (wasn't it planned for 8.0?).
You discuss this on the basis of the regression tests, which obviously
run on installations that do _not_ represent standard recommended
installations. It has been required for ages now to have vacuum running
regularly, using cron or similar. The regression tests have to deal with that
default situation, one way or another (which might well mean "these
tables don't need vacuum" or "this instance doesn't need vacuum"). IMHO
blaming autovacuum for the test failures reverses cause and effect.

Missing vacuum was probably a reason for poor performance of many newbie
pgsql installations (and I must admit that I missed installing the cron
job myself from time to time, though I _knew_ it was needed). As Magnus
already pointed out, all win32 installations have it on by default, to
keep them on the safe side. Disabling it because of modules a "retail" user
will never launch seems like an overreaction.

I can positively acknowledge that disabling autovacuum with a
pg_autovacuum row does work, I'm using it in production.
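To illustrate the per-table exception mechanism, here is a toy Python model of how an autovacuum pass might skip tables that carry an override row with autovacuum disabled. Only the on/off switch of a pg_autovacuum row is modeled; TableOverride and tables_to_process are hypothetical names, the table names are merely illustrative, and the real catalog carries additional threshold and cost columns that are ignored here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TableOverride:
    """Toy analogue of one pg_autovacuum row; only the on/off switch is modeled."""
    enabled: bool


def tables_to_process(tables: List[str],
                      overrides: Dict[str, TableOverride]) -> List[str]:
    """Return the tables an autovacuum pass would consider.

    Tables without an override row fall back to the global settings,
    treated here simply as "enabled".
    """
    result = []
    for name in tables:
        row: Optional[TableOverride] = overrides.get(name)
        if row is not None and not row.enabled:
            continue  # per-table exception: leave this one to manual ANALYZE
        result.append(name)
    return result


tables = ["tenk1", "onek", "road"]
overrides = {"tenk1": TableOverride(enabled=False)}
print(tables_to_process(tables, overrides))  # ['onek', 'road']
```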

Regards,
Andreas

#13 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andreas Pflug (#12)
Re: autovacuum causing numerous regression-test failures

Andreas Pflug <pgadmin@pse-consulting.de> writes:

Tom Lane wrote:

My objection here is basically that this proposal passed on the
assumption that it would be very nearly zero effort to make it happen.

Kicking out autovacuum as default is a disaster, it took far too long to
get in the backend already (wasn't it planned for 8.0?).

If it's so "disastrous" to not have it, why wasn't it even proposed
until two weeks after feature freeze? Sorry, I'm not buying this
argument.

regards, tom lane

#14 Peter Eisentraut
peter_e@gmx.net
In reply to: Andreas Pflug (#12)
Re: autovacuum causing numerous regression-test failures

On Tuesday, 29 August 2006 11:14, Andreas Pflug wrote:

already pointed out, all win32 installations have it on by default, to
keep them on the safe side. Disabling it because of modules a "retail" user
will never launch seems like an overreaction.

Well, the really big problem is that autovacuum may be connected to a database
when you want to drop it. (There may be related problems like vacuuming a
template database at the wrong time. I'm not sure how that is handled.) I
think this is not only a problem that is specific to the regression testing
but a potential problem in deployment. I have opined earlier how I think
that should behave properly, but we're not going to change that in 8.2.

The other problems that were mentioned are pretty easy to work around by
setting stats_row_level to off on the fly, but that doesn't stop autovacuum
from connecting.

The good thing is that we have collected plenty of interesting data in the
last 24 hours which will make for plenty of development work next time
around. :)

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

#15 Andreas Pflug
pgadmin@pse-consulting.de
In reply to: Tom Lane (#13)
Re: autovacuum causing numerous regression-test failures

Tom Lane wrote:

Andreas Pflug <pgadmin@pse-consulting.de> writes:

Tom Lane wrote:

My objection here is basically that this proposal passed on the
assumption that it would be very nearly zero effort to make it happen.

Kicking out autovacuum as default is a disaster, it took far too long to
get in the backend already (wasn't it planned for 8.0?).

If it's so "disastrous" to not have it, why wasn't it even proposed
until two weeks after feature freeze?

To me, this proposal was just too obvious, for reasons discussed
earlier.

Regards,
Andreas

#16 Andreas Pflug
pgadmin@pse-consulting.de
In reply to: Peter Eisentraut (#14)
Re: autovacuum causing numerous regression-test failures

Peter Eisentraut wrote:

On Tuesday, 29 August 2006 11:14, Andreas Pflug wrote:

already pointed out, all win32 installations have it on by default, to
keep them on the safe side. Disabling it because of modules a "retail" user
will never launch seems like an overreaction.

Well, the really big problem is that autovacuum may be connected to a database
when you want to drop it. (There may be related problems like vacuuming a
template database at the wrong time. I'm not sure how that is handled.) I
think this is not only a problem that is specific to the regression testing
but a potential problem in deployment. I have opined earlier how I think
that should behave properly, but we're not going to change that in 8.2.

Don't these issues hit a cron scheduled vacuum as well?

Regards,
Andreas

#17 Josh Berkus
josh@agliodbs.com
In reply to: Andreas Pflug (#15)
Re: autovacuum causing numerous regression-test failures

Folks,

My vote is with Peter and Tom on not putting it in. We needed to discuss/test
this well before feature freeze if we really wanted to do it.

Here's what needs to be resolved:
a) make autovacuum play nice with the regression tests
b) come up with default threshold/multiplier values which are backed by test
data
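For item (b), autovacuum's documented trigger rule is that a table is vacuumed once its count of obsolete tuples exceeds a base threshold plus a scale factor times the table's tuple count (reltuples). The Python sketch below just evaluates that rule; the 500 and 0.2 values are illustrative inputs, not a claim about the shipped defaults or a proposal for new ones.

```python
def vacuum_threshold(base_thresh: int, scale_factor: float,
                     reltuples: float) -> float:
    """Autovacuum rule: base threshold + scale factor * reltuples."""
    return base_thresh + scale_factor * reltuples


def needs_vacuum(dead_tuples: int, base_thresh: int,
                 scale_factor: float, reltuples: float) -> bool:
    """True once obsolete tuples exceed the computed threshold."""
    return dead_tuples > vacuum_threshold(base_thresh, scale_factor, reltuples)


# Illustrative numbers only, not proposed defaults.
for reltuples in (1_000, 100_000, 10_000_000):
    t = vacuum_threshold(500, 0.2, reltuples)
    print(f"{reltuples:>10,} rows -> vacuum after {t:,.0f} dead tuples")
```

The output shows why both knobs matter: the base threshold dominates for small tables, while the multiplier dominates for large ones, which is what the test data would need to calibrate.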

--
Josh Berkus
PostgreSQL @ Sun
San Francisco