Intermittent buildfarm failures on wrasse
For the past five days or so, wrasse has been intermittently
failing due to unexpectedly not using an Index Only Scan plan
in the create_index test [1], eg
@@ -1910,11 +1910,15 @@
SELECT unique1 FROM tenk1
WHERE unique1 IN (1,42,7)
ORDER BY unique1;
- QUERY PLAN
--------------------------------------------------------
- Index Only Scan using tenk1_unique1 on tenk1
- Index Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
-(2 rows)
+ QUERY PLAN
+-------------------------------------------------------------------
+ Sort
+ Sort Key: unique1
+ -> Bitmap Heap Scan on tenk1
+ Recheck Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
+ -> Bitmap Index Scan on tenk1_unique1
+ Index Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
+(6 rows)
SELECT unique1 FROM tenk1
WHERE unique1 IN (1,42,7)
The most probable explanation for this seems to be that tenk1's
pg_class.relallvisible value hasn't been set high enough to make an IOS
look cheaper than the alternatives. Where that ought to be getting set
is the "VACUUM ANALYZE tenk1" step in test_setup.sql. It's plausible
I guess that a background autovacuum is preventing that command from
setting relallvisible as high as it ought to be --- but if so, why
are we only seeing two plans changing, on only one animal?
But what I'm really confused about is that this test arrangement has
been stable since early February. Why has wrasse suddenly started
showing a 25% failure rate when it never failed this way before that?
Somebody has to have recently committed a change that affects this.
Checking the commit log up to the onset of the failures on 8 April,
I only see two plausible candidates:
* shared-memory pgstats
* Peter's recent VACUUM changes
Any connection to pgstats is, um, pretty obscure. I'd finger the VACUUM
changes as a more likely trigger except that the last interesting-looking
one was f3c15cbe5 on 3 April, and wrasse got through "make check" 38 times
after that before its first failure of this kind. That doesn't square with
the 25% failure rate since then, so I'm kind of forced to the conclusion
that the pgstats work changed some behavior that it should not have.
Any ideas?
I'm tempted to add something like
SELECT relallvisible = relpages FROM pg_class WHERE relname = 'tenk1';
so that we can confirm or refute the theory that relallvisible is
the driving factor.
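For local experimentation, a slightly more verbose variant (hypothetical, not proposed for commit) would report the raw counters rather than just a boolean, so a failing run tells us how far off the value was:

```sql
-- Hypothetical debugging variant: show the raw numbers, not just
-- whether the table is fully all-visible.
SELECT relname, relpages, relallvisible,
       relallvisible = relpages AS all_visible
FROM pg_class
WHERE relname = 'tenk1';
```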
regards, tom lane
[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-04-08%2003%3A48%3A30
On Wed, Apr 13, 2022 at 3:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I'm tempted to add something like
SELECT relallvisible = relpages FROM pg_class WHERE relname = 'tenk1';
so that we can confirm or refute the theory that relallvisible is
the driving factor.
It would be fairly straightforward to commit a temporary debugging
patch that has the autovacuum logging stuff report directly on how
VACUUM set new_rel_allvisible in pg_class. We should probably be doing
that already, just because it's useful information that is already
close at hand.
Might be a bit trickier to make sure that wrasse reliably reported on
all relevant VACUUMs, since that would have to include manual VACUUMs
(which would really have to use VACUUM VERBOSE), as well as
autovacuums.
--
Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes:
On Wed, Apr 13, 2022 at 3:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I'm tempted to add something like
SELECT relallvisible = relpages FROM pg_class WHERE relname = 'tenk1';
so that we can confirm or refute the theory that relallvisible is
the driving factor.
It would be fairly straightforward to commit a temporary debugging
patch that has the autovacuum logging stuff report directly on how
VACUUM set new_rel_allvisible in pg_class. We should probably be doing
that already, just because it's useful information that is already
close at hand.
Doesn't look like wrasse has autovacuum logging enabled, though.
After a bit more navel-contemplation I see a way that the pgstats
work could have changed timing in this area. We used to have a
rate limit on how often stats reports would be sent to the
collector, which'd ensure half a second or so delay before a
transaction's change counts became visible to the autovac daemon.
I've not looked at the new code, but I'm betting that that's gone
and the autovac launcher might start a worker nearly immediately
after some foreground process finishes inserting some rows.
So that could result in autovac activity occurring concurrently
with test_setup where it didn't before.
As to what to do about it ... maybe apply the FREEZE and
DISABLE_PAGE_SKIPPING options in test_setup's vacuums?
It seems like DISABLE_PAGE_SKIPPING is necessary but perhaps
not sufficient.
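Concretely, the test_setup change being floated here would look something like this (an untested sketch of the idea, not a patch):

```sql
-- Untested sketch: force an aggressive VACUUM that neither skips pages
-- via the visibility map nor leaves recently-committed rows unfrozen,
-- then re-ANALYZE as test_setup.sql already does.
VACUUM (FREEZE, DISABLE_PAGE_SKIPPING) tenk1;
ANALYZE tenk1;
```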
regards, tom lane
On Thu, 14 Apr 2022 at 10:54, Tom Lane <tgl@sss.pgh.pa.us> wrote:
After a bit more navel-contemplation I see a way that the pgstats
work could have changed timing in this area. We used to have a
rate limit on how often stats reports would be sent to the
collector, which'd ensure half a second or so delay before a
transaction's change counts became visible to the autovac daemon.
I've not looked at the new code, but I'm betting that that's gone
and the autovac launcher might start a worker nearly immediately
after some foreground process finishes inserting some rows.
So that could result in autovac activity occurring concurrently
with test_setup where it didn't before.
It's not quite clear to me why the manual vacuum wouldn't just cancel
the autovacuum and complete the job. I can't quite see how there's
room for competing page locks here. Also, see [1]. One of the
reported failing tests there is the same as one of the failing tests
on wrasse. My investigation for the AIO branch found that
relallvisible was not equal to relpages. I don't recall the reason why
that was happening now.
As to what to do about it ... maybe apply the FREEZE and
DISABLE_PAGE_SKIPPING options in test_setup's vacuums?
It seems like DISABLE_PAGE_SKIPPING is necessary but perhaps
not sufficient.
We should likely try and confirm it's due to relallvisible first.
David
[1]: /messages/by-id/20220224153339.pqn64kseb5gpgl74@alap3.anarazel.de
On Wed, Apr 13, 2022 at 3:54 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
After a bit more navel-contemplation I see a way that the pgstats
work could have changed timing in this area. We used to have a
rate limit on how often stats reports would be sent to the
collector, which'd ensure half a second or so delay before a
transaction's change counts became visible to the autovac daemon.
I've not looked at the new code, but I'm betting that that's gone
and the autovac launcher might start a worker nearly immediately
after some foreground process finishes inserting some rows.
So that could result in autovac activity occurring concurrently
with test_setup where it didn't before.
But why should it matter? The test_setup.sql VACUUM of tenk1 should
leave relallvisible and relpages in the same state, either way (or
very close to it).
The only way that it seems like it could matter is if OldestXmin was
held back during test_setup.sql's execution of the VACUUM command.
As to what to do about it ... maybe apply the FREEZE and
DISABLE_PAGE_SKIPPING options in test_setup's vacuums?
It seems like DISABLE_PAGE_SKIPPING is necessary but perhaps
not sufficient.
BTW, the work on VACUUM for Postgres 15 probably makes VACUUM test
flappiness issues less of a problem -- unless they're issues involving
something holding back OldestXmin when it shouldn't (in which case it
won't have any effect on test stability). I would expect that to be
the case, at least, since VACUUM now does almost all of the same work
for any individual page that it cannot get a cleanup lock on. There is
surprisingly little difference between a page that gets processed by
lazy_scan_prune and a page that gets processed by lazy_scan_noprune.
--
Peter Geoghegan
Hi,
On April 13, 2022 7:06:33 PM EDT, David Rowley <dgrowleyml@gmail.com> wrote:
On Thu, 14 Apr 2022 at 10:54, Tom Lane <tgl@sss.pgh.pa.us> wrote:
After a bit more navel-contemplation I see a way that the pgstats
work could have changed timing in this area. We used to have a
rate limit on how often stats reports would be sent to the
collector, which'd ensure half a second or so delay before a
transaction's change counts became visible to the autovac daemon.
I've not looked at the new code, but I'm betting that that's gone
and the autovac launcher might start a worker nearly immediately
after some foreground process finishes inserting some rows.
So that could result in autovac activity occurring concurrently
with test_setup where it didn't before.
It's not quite clear to me why the manual vacuum wouldn't just cancel
the autovacuum and complete the job. I can't quite see how there's
room for competing page locks here. Also, see [1]. One of the
reported failing tests there is the same as one of the failing tests
on wrasse. My investigation for the AIO branch found that
relallvisible was not equal to relpages. I don't recall the reason why
that was happening now.
As to what to do about it ... maybe apply the FREEZE and
DISABLE_PAGE_SKIPPING options in test_setup's vacuums?
It seems like DISABLE_PAGE_SKIPPING is necessary but perhaps
not sufficient.
We should likely try and confirm it's due to relallvisible first.
We had this issue before, and not just on the aio branch. On my phone right now, so won't look up references.
IIRC the problem in question isn't skipped pages, but that the horizon simply isn't new enough to mark pages as all visible. An independent autovac worker starting is enough for that, for example. Previously the data load and vacuum were further apart, preventing this kind of issue.
Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
On Wed, Apr 13, 2022 at 4:13 PM Andres Freund <andres@anarazel.de> wrote:
IIRC the problem in question isn't new enough to mark pages as all visible.
Sometimes OldestXmin can go backwards in VACUUM operations that are
run in close succession against the same table, due to activity from
other databases in the same cluster (perhaps other factors are
involved at times).
That's why the following assertion that I recently added to
vacuumlazy.c will fail pretty quickly without the
"vacrel->NewRelfrozenXid == OldestXmin" part of its test:
    Assert(vacrel->NewRelfrozenXid == OldestXmin ||
           TransactionIdPrecedesOrEquals(aggressive ? FreezeLimit :
                                         vacrel->relfrozenxid,
                                         vacrel->NewRelfrozenXid));
If you remove "vacrel->NewRelfrozenXid == OldestXmin", and run the
regression tests, the remaining assertion will fail quite easily.
Though perhaps not with a serial "make check".
--
Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes:
On Wed, Apr 13, 2022 at 4:13 PM Andres Freund <andres@anarazel.de> wrote:
IIRC the problem in question isn't skipped pages, but that the horizon simply isn't new enough to mark pages as all visible.
Sometimes OldestXmin can go backwards in VACUUM operations that are
run in close succession against the same table, due to activity from
other databases in the same cluster (perhaps other factors are
involved at times).
I've been doing some testing locally by inserting commands to
manually set tenk1's relallvisible to zero. I first did that
in test_setup.sql ... and it had no effect whatsoever. Further
experimentation showed that the "CREATE INDEX ON tenk1" steps
in create_index.sql itself generally suffice to fix relallvisible;
although if you force it back to zero after the last such command,
you get the same plan diffs wrasse is showing. And you don't
get any others, which I thought curious until I realized that
sanity_check.sql's database-wide VACUUM offers yet another
opportunity to heal the incorrect value. If you force it back
to zero again after that, a bunch of later tests start to show
plan differences, which is what I'd been expecting.
So what seems to be happening on wrasse is that a background
autovacuum (or really autoanalyze?) is preventing pages from
being marked all-visible not only during test_setup.sql but
also create_index.sql; but it's gone by the time sanity_check.sql
runs. Which is odd in itself because not that much time elapses
between create_index and sanity_check, certainly less than the
time from test_setup to create_index.
It seems like a reliable fix might require test_setup to wait
for any background autovac to exit before it does its own
vacuums. Ick.
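If it came to that, the wait could at least be scripted in SQL; a rough sketch (assuming pg_stat_activity's backend_type labels are sufficient, and with a bounded retry loop so a stuck worker can't hang the run forever):

```sql
-- Rough sketch, not proposed as-is: poll until no autovacuum worker is
-- active on the current database, giving up after about ten seconds.
DO $$
DECLARE
    n int;
BEGIN
    FOR i IN 1 .. 100 LOOP
        SELECT count(*) INTO n
          FROM pg_stat_activity
         WHERE backend_type = 'autovacuum worker'
           AND datname = current_database();
        EXIT WHEN n = 0;
        PERFORM pg_sleep(0.1);
    END LOOP;
END
$$;
```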
And we still lack an explanation of why this only now broke.
I remain suspicious that pgstats is behaving unexpectedly.
regards, tom lane
On Wed, Apr 13, 2022 at 4:38 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
So what seems to be happening on wrasse is that a background
autovacuum (or really autoanalyze?) is preventing pages from
being marked all-visible not only during test_setup.sql but
also create_index.sql; but it's gone by the time sanity_check.sql
runs.
I agree that it would need to be an autoanalyze (due to the
PROC_IN_VACUUM optimization).
It seems like a reliable fix might require test_setup to wait
for any background autovac to exit before it does its own
vacuums. Ick.
This is hardly a new problem, really. I wonder if it's worth inventing
a comprehensive solution. Some kind of infrastructure that makes
VACUUM establish a next XID up-front (by calling
ReadNextTransactionId()), and then find a way to run with an
OldestXmin that's >= the earlier "next" XID value. If necessary by
waiting.
--
Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes:
On Wed, Apr 13, 2022 at 4:38 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
It seems like a reliable fix might require test_setup to wait
for any background autovac to exit before it does its own
vacuums. Ick.
This is hardly a new problem, really. I wonder if it's worth inventing
a comprehensive solution.
Yeah, we have band-aided around this type of problem repeatedly.
Making a fix that's readily accessible from any test script
seems like a good idea.
regards, tom lane
On Wed, Apr 13, 2022 at 4:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Yeah, we have band-aided around this type of problem repeatedly.
Making a fix that's readily accessible from any test script
seems like a good idea.
We might even be able to consistently rely on this new option, given
*any* problem involving test stability and VACUUM. Having a
one-size-fits-all solution to these kinds of stability problems would
be nice -- no more DISABLE_PAGE_SKIPPING bandaids.
That general approach will be possible provided an inability to
acquire a cleanup lock during VACUUM (which can more or less occur at
random in most environments) doesn't ever lead to unexpected test
results. There is good reason to think that it might work out that
way. Simulating problems with acquiring cleanup locks during VACUUM
left me with the impression that that could really work:
/messages/by-id/CAH2-WzkiB-qcsBmWrpzP0nxvrQExoUts1d7TYShg_DrkOHeg4Q@mail.gmail.com
--
Peter Geoghegan
Hi,
On 2022-04-13 16:45:44 -0700, Peter Geoghegan wrote:
On Wed, Apr 13, 2022 at 4:38 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
So what seems to be happening on wrasse is that a background
autovacuum (or really autoanalyze?) is preventing pages from
being marked all-visible not only during test_setup.sql but
also create_index.sql; but it's gone by the time sanity_check.sql
runs.
I agree that it would need to be an autoanalyze (due to the
PROC_IN_VACUUM optimization).
That's not a reliable protection - the snapshot is established normally
at first; only after a while do we set PROC_IN_VACUUM...
Greetings,
Andres Freund
Peter Geoghegan <pg@bowt.ie> writes:
On Wed, Apr 13, 2022 at 4:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Yeah, we have band-aided around this type of problem repeatedly.
Making a fix that's readily accessible from any test script
seems like a good idea.
We might even be able to consistently rely on this new option, given
*any* problem involving test stability and VACUUM. Having a
one-size-fits-all solution to these kinds of stability problems would
be nice -- no more DISABLE_PAGE_SKIPPING bandaids.
My guess is that you'd need both this new wait-for-horizon behavior
*and* DISABLE_PAGE_SKIPPING. But the two together ought to make
for pretty reproducible behavior. I noticed while scanning the
commit log that some patches have tried adding a FREEZE option,
which seems more like waving a dead chicken than anything that'd
improve stability.
We'd not necessarily have to embed wait-for-horizon into VACUUM
itself. It seems like a SQL-accessible function could be written
and then called before any problematic VACUUM. I like this better
for something we're thinking of jamming in post-feature-freeze;
we'd not be committing to the feature quite as much as if we
added a VACUUM option.
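For illustration, such a function could be as simple as the following sketch (the name and the exact wait condition are invented here; it waits for every other session's snapshot to go away entirely, which is stronger than strictly needed but avoids XID-wraparound arithmetic):

```sql
-- Sketch only: block until no other session holds a snapshot at all
-- (backend_xmin is null everywhere else), approximating "the horizon
-- has advanced past the point at which we were called".
CREATE OR REPLACE FUNCTION wait_for_horizon() RETURNS void AS $$
BEGIN
    LOOP
        PERFORM 1
          FROM pg_stat_activity
         WHERE backend_xmin IS NOT NULL
           AND pid <> pg_backend_pid();
        EXIT WHEN NOT FOUND;
        PERFORM pg_sleep(0.1);
    END LOOP;
END;
$$ LANGUAGE plpgsql;
```

A test script would then call it immediately before any VACUUM whose results it depends on.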
regards, tom lane
Hi,
On 2022-04-13 18:54:06 -0400, Tom Lane wrote:
We used to have a rate limit on how often stats reports would be sent
to the collector, which'd ensure half a second or so delay before a
transaction's change counts became visible to the autovac daemon.
Just for posterity: That's not actually gone. But what is gone is the
rate limiting in autovacuum about requesting recent stats for a table /
autovac seeing slightly older stats.
Greetings,
Andres Freund
On Wed, Apr 13, 2022 at 5:35 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
My guess is that you'd need both this new wait-for-horizon behavior
*and* DISABLE_PAGE_SKIPPING. But the two together ought to make
for pretty reproducible behavior. I noticed while scanning the
commit log that some patches have tried adding a FREEZE option,
which seems more like waving a dead chicken than anything that'd
improve stability.
I think that it's more likely that FREEZE will correct problems, out of the two:
* FREEZE forces an aggressive VACUUM whose FreezeLimit is as recent a
cutoff value as possible (FreezeLimit will be equal to OldestXmin).
* DISABLE_PAGE_SKIPPING also forces an aggressive VACUUM. But unlike
FREEZE it makes VACUUM not use the visibility map, even in the case of
all-frozen pages. And it changes nothing about FreezeLimit.
It's also a certainty that VACUUM(FREEZE, DISABLE_PAGE_SKIPPING) is
not a sensible remedy for any problem with test stability, but there
are still some examples of that combination in the regression tests.
The only way it could make sense is if the visibility map might be
corrupt, but surely we're not expecting that anyway (and if we were
we'd be testing it more directly).
I recently argued that DISABLE_PAGE_SKIPPING should have nothing to do
with aggressive vacuuming -- that should all be left up to VACUUM
FREEZE. It seems more logical to make DISABLE_PAGE_SKIPPING mean
"don't use the visibility map to skip anything", without bringing
aggressiveness into it at all. That would be less confusing.
We'd not necessarily have to embed wait-for-horizon into VACUUM
itself. It seems like a SQL-accessible function could be written
and then called before any problematic VACUUM. I like this better
for something we're thinking of jamming in post-feature-freeze;
we'd not be committing to the feature quite as much as if we
added a VACUUM option.
Hmm. I would say that the feature has zero appeal to users anyway.
Maybe it can and should be done through an SQL function for other
reasons, though. Users already think that there are several different
flavors of VACUUM, which isn't really true.
--
Peter Geoghegan
Hi,
On 2022-04-13 20:35:50 -0400, Tom Lane wrote:
Peter Geoghegan <pg@bowt.ie> writes:
On Wed, Apr 13, 2022 at 4:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Yeah, we have band-aided around this type of problem repeatedly.
Making a fix that's readily accessible from any test script
seems like a good idea.
We might even be able to consistently rely on this new option, given
*any* problem involving test stability and VACUUM. Having a
one-size-fits-all solution to these kinds of stability problems would
be nice -- no more DISABLE_PAGE_SKIPPING bandaids.
My guess is that you'd need both this new wait-for-horizon behavior
*and* DISABLE_PAGE_SKIPPING. But the two together ought to make
for pretty reproducible behavior. I noticed while scanning the
commit log that some patches have tried adding a FREEZE option,
which seems more like waving a dead chicken than anything that'd
improve stability.
I think most of those we've ended up replacing by using temp tables in
those tests instead, since they're not affected by the global horizon
anymore.
We'd not necessarily have to embed wait-for-horizon into VACUUM
itself.
I'm not sure it'd be quite reliable outside of vacuum though, due to the
horizon potentially going backwards (in otherwise harmless ways)?
It seems like a SQL-accessible function could be written
and then called before any problematic VACUUM. I like this better
for something we're thinking of jamming in post-feature-freeze;
we'd not be committing to the feature quite as much as if we
added a VACUUM option.
We could otherwise just disable IOS for that query, for now.
Greetings,
Andres Freund
On Wed, Apr 13, 2022 at 6:03 PM Peter Geoghegan <pg@bowt.ie> wrote:
I think that it's more likely that FREEZE will correct problems, out of the two:
* FREEZE forces an aggressive VACUUM whose FreezeLimit is as recent a
cutoff value as possible (FreezeLimit will be equal to OldestXmin).
The reason why that might have helped (at least in the past) is that
it's enough to force us to wait for a cleanup lock to prune and
freeze, if necessary. Which was never something that
DISABLE_PAGE_SKIPPING could do.
--
Peter Geoghegan
On Wed, Apr 13, 2022 at 6:05 PM Andres Freund <andres@anarazel.de> wrote:
I think most of those we've ended up replacing by using temp tables in
those tests instead, since they're not affected by the global horizon
anymore.
Maybe, but it's a pain to have to work that way. You can't do it in
cases like this, because a temp table is not workable. So that's not
an ideal long term solution.
We'd not necessarily have to embed wait-for-horizon into VACUUM
itself.
I'm not sure it'd be quite reliable outside of vacuum though, due to the
horizon potentially going backwards (in otherwise harmless ways)?
I agree, since vacuumlazy.c would need to either be given its own
OldestXmin, or knowledge of a wait-up-to XID. Either way we have to
make non-trivial changes to vacuumlazy.c.
--
Peter Geoghegan
Andres Freund <andres@anarazel.de> writes:
On 2022-04-13 20:35:50 -0400, Tom Lane wrote:
It seems like a SQL-accessible function could be written
and then called before any problematic VACUUM. I like this better
for something we're thinking of jamming in post-feature-freeze;
we'd not be committing to the feature quite as much as if we
added a VACUUM option.
We could otherwise just disable IOS for that query, for now.
The entire point of that test case is to verify the shape of the
IOS plan, so no that's not an acceptable answer. But if we're
looking for quick hacks, we could do
update pg_class set relallvisible = relpages where relname = 'tenk1';
just before that test.
I'm still suspicious of the pgstat changes, though. I checked into
things here by doing
initdb
edit postgresql.conf to set log_autovacuum_min_duration = 0
pg_ctl start && make installcheck-parallel
and what I see is that the first reported autovacuum activity begins
exactly one minute after the postmaster starts, which is what I'd
expect given the autovacuum naptime rules. On my machine, of course,
the installcheck-parallel run is long gone by then. But even on the
much slower wrasse, we should be well past create_index by the time any
autovac worker launches --- you can see from wrasse's reported test
runtimes that only about 10 seconds have elapsed when it get to the end
of create_index.
This suggests to me that what is holding the (presumed) conflicting
snapshot must be the autovac launcher, because what else could it be?
So I'm suspicious that we broke something in that area, though I am
baffled why only wrasse would be telling us so.
regards, tom lane
Hi,
Noah, any chance you could enable log_autovacuum_min_duration=0 on
wrasse?
On 2022-04-13 21:23:12 -0400, Tom Lane wrote:
I'm still suspicious of the pgstat changes, though. I checked into
things here by doing
initdb
edit postgresql.conf to set log_autovacuum_min_duration = 0
pg_ctl start && make installcheck-parallel
and what I see is that the first reported autovacuum activity begins
exactly one minute after the postmaster starts, which is what I'd
expect given the autovacuum naptime rules.
It'd not necessarily have to be autovacuum. A CREATE INDEX or VACUUM
using parallelism, could also cause this, I think. It'd be a narrow
window, of course...
Does sparc have wider alignment rules for some types? Perhaps that'd be
enough to put some tables to be sufficiently larger to trigger parallel
vacuum?
Greetings,
Andres Freund