Intermittent buildfarm failures on wrasse

Started by Tom Lane · about 4 years ago · 88 messages · pgsql-hackers
#1 Tom Lane
tgl@sss.pgh.pa.us

For the past five days or so, wrasse has been intermittently
failing due to unexpectedly not using an Index Only Scan plan
in the create_index test [1], eg

@@ -1910,11 +1910,15 @@
 SELECT unique1 FROM tenk1
 WHERE unique1 IN (1,42,7)
 ORDER BY unique1;
-                      QUERY PLAN                       
--------------------------------------------------------
- Index Only Scan using tenk1_unique1 on tenk1
-   Index Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
-(2 rows)
+                            QUERY PLAN                             
+-------------------------------------------------------------------
+ Sort
+   Sort Key: unique1
+   ->  Bitmap Heap Scan on tenk1
+         Recheck Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
+         ->  Bitmap Index Scan on tenk1_unique1
+               Index Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
+(6 rows)

SELECT unique1 FROM tenk1
WHERE unique1 IN (1,42,7)

The most probable explanation for this seems to be that tenk1's
pg_class.relallvisible value hasn't been set high enough to make an IOS
look cheaper than the alternatives. Where that ought to be getting set
is the "VACUUM ANALYZE tenk1" step in test_setup.sql. It's plausible
I guess that a background autovacuum is preventing that command from
setting relallvisible as high as it ought to be --- but if so, why
are we only seeing two plans changing, on only one animal?

But what I'm really confused about is that this test arrangement has
been stable since early February. Why has wrasse suddenly started
showing a 25% failure rate when it never failed this way before that?
Somebody has to have recently committed a change that affects this.
Checking the commit log up to the onset of the failures on 8 April,
I only see two plausible candidates:

* shared-memory pgstats
* Peter's recent VACUUM changes

Any connection to pgstats is, um, pretty obscure. I'd finger the VACUUM
changes as a more likely trigger except that the last interesting-looking
one was f3c15cbe5 on 3 April, and wrasse got through "make check" 38 times
after that before its first failure of this kind. That doesn't square with
the 25% failure rate since then, so I'm kind of forced to the conclusion
that the pgstats work changed some behavior that it should not have.

Any ideas?

I'm tempted to add something like

SELECT relallvisible = relpages FROM pg_class WHERE relname = 'tenk1';

so that we can confirm or refute the theory that relallvisible is
the driving factor.

regards, tom lane

[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-04-08%2003%3A48%3A30

#2 Peter Geoghegan
pg@bowt.ie
In reply to: Tom Lane (#1)
Re: Intermittent buildfarm failures on wrasse

On Wed, Apr 13, 2022 at 3:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

> I'm tempted to add something like
>
> SELECT relallvisible = relpages FROM pg_class WHERE relname = 'tenk1';
>
> so that we can confirm or refute the theory that relallvisible is
> the driving factor.

It would be fairly straightforward to commit a temporary debugging
patch that has the autovacuum logging stuff report directly on how
VACUUM set new_rel_allvisible in pg_class. We should probably be doing
that already, just because it's useful information that is already
close at hand.

Might be a bit trickier to make sure that wrasse reliably reported on
all relevant VACUUMs, since that would have to include manual VACUUMs
(which would really have to use VACUUM VERBOSE), as well as
autovacuums.
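
A sketch of what the manual side of that could look like (tenk1 is the regression table under discussion; this is just an illustration of the idea, not a committed change):

```sql
-- Run the manual vacuum with logging enabled, then check what
-- actually got stored in pg_class afterwards.
VACUUM (VERBOSE, ANALYZE) tenk1;

-- relallvisible should end up equal to relpages if every page
-- could be marked all-visible.
SELECT relpages, relallvisible,
       relallvisible = relpages AS fully_visible
FROM pg_class
WHERE relname = 'tenk1';
```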

--
Peter Geoghegan

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#2)
Re: Intermittent buildfarm failures on wrasse

Peter Geoghegan <pg@bowt.ie> writes:

> On Wed, Apr 13, 2022 at 3:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm tempted to add something like
>> SELECT relallvisible = relpages FROM pg_class WHERE relname = 'tenk1';
>> so that we can confirm or refute the theory that relallvisible is
>> the driving factor.
>
> It would be fairly straightforward to commit a temporary debugging
> patch that has the autovacuum logging stuff report directly on how
> VACUUM set new_rel_allvisible in pg_class. We should probably be doing
> that already, just because it's useful information that is already
> close at hand.

Doesn't look like wrasse has autovacuum logging enabled, though.

After a bit more navel-contemplation I see a way that the pgstats
work could have changed timing in this area. We used to have a
rate limit on how often stats reports would be sent to the
collector, which'd ensure half a second or so delay before a
transaction's change counts became visible to the autovac daemon.
I've not looked at the new code, but I'm betting that that's gone
and the autovac launcher might start a worker nearly immediately
after some foreground process finishes inserting some rows.
So that could result in autovac activity occurring concurrently
with test_setup where it didn't before.

As to what to do about it ... maybe apply the FREEZE and
DISABLE_PAGE_SKIPPING options in test_setup's vacuums?
It seems like DISABLE_PAGE_SKIPPING is necessary but perhaps
not sufficient.
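
In test_setup.sql terms, that suggestion would amount to something like (a sketch of the idea, not a committed change):

```sql
-- Force an aggressive vacuum that visits every heap page rather than
-- trusting the visibility map, and freezes whatever it can.
VACUUM (FREEZE, DISABLE_PAGE_SKIPPING, ANALYZE) tenk1;
```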

regards, tom lane

#4 David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#3)
Re: Intermittent buildfarm failures on wrasse

On Thu, 14 Apr 2022 at 10:54, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> After a bit more navel-contemplation I see a way that the pgstats
> work could have changed timing in this area. We used to have a
> rate limit on how often stats reports would be sent to the
> collector, which'd ensure half a second or so delay before a
> transaction's change counts became visible to the autovac daemon.
> I've not looked at the new code, but I'm betting that that's gone
> and the autovac launcher might start a worker nearly immediately
> after some foreground process finishes inserting some rows.
> So that could result in autovac activity occurring concurrently
> with test_setup where it didn't before.

It's not quite clear to me why the manual vacuum wouldn't just cancel
the autovacuum and complete the job. I can't quite see how there's
room for competing page locks here. Also, see [1]. One of the
reported failing tests there is the same as one of the failing tests
on wrasse. My investigation for the AIO branch found that
relallvisible was not equal to relpages. I don't recall the reason why
that was happening now.

> As to what to do about it ... maybe apply the FREEZE and
> DISABLE_PAGE_SKIPPING options in test_setup's vacuums?
> It seems like DISABLE_PAGE_SKIPPING is necessary but perhaps
> not sufficient.

We should likely try and confirm it's due to relallvisible first.

David

[1]: /messages/by-id/20220224153339.pqn64kseb5gpgl74@alap3.anarazel.de

#5 Peter Geoghegan
pg@bowt.ie
In reply to: Tom Lane (#3)
Re: Intermittent buildfarm failures on wrasse

On Wed, Apr 13, 2022 at 3:54 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

> After a bit more navel-contemplation I see a way that the pgstats
> work could have changed timing in this area. We used to have a
> rate limit on how often stats reports would be sent to the
> collector, which'd ensure half a second or so delay before a
> transaction's change counts became visible to the autovac daemon.
> I've not looked at the new code, but I'm betting that that's gone
> and the autovac launcher might start a worker nearly immediately
> after some foreground process finishes inserting some rows.
> So that could result in autovac activity occurring concurrently
> with test_setup where it didn't before.

But why should it matter? The test_setup.sql VACUUM of tenk1 should
leave relallvisible and relpages in the same state, either way (or
very close to it).

The only way that it seems like it could matter is if OldestXmin was
held back during test_setup.sql's execution of the VACUUM command.

> As to what to do about it ... maybe apply the FREEZE and
> DISABLE_PAGE_SKIPPING options in test_setup's vacuums?
> It seems like DISABLE_PAGE_SKIPPING is necessary but perhaps
> not sufficient.

BTW, the work on VACUUM for Postgres 15 probably makes VACUUM test
flappiness issues less of a problem -- unless they're issues involving
something holding back OldestXmin when it shouldn't (in which case it
won't have any effect on test stability). I would expect that to be
the case, at least, since VACUUM now does almost all of the same work
for any individual page that it cannot get a cleanup lock on. There is
surprisingly little difference between a page that gets processed by
lazy_scan_prune and a page that gets processed by lazy_scan_noprune.

--
Peter Geoghegan

#6 Andres Freund
andres@anarazel.de
In reply to: David Rowley (#4)
Re: Intermittent buildfarm failures on wrasse

Hi,

On April 13, 2022 7:06:33 PM EDT, David Rowley <dgrowleyml@gmail.com> wrote:

> On Thu, 14 Apr 2022 at 10:54, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> After a bit more navel-contemplation I see a way that the pgstats
>> work could have changed timing in this area. We used to have a
>> rate limit on how often stats reports would be sent to the
>> collector, which'd ensure half a second or so delay before a
>> transaction's change counts became visible to the autovac daemon.
>> I've not looked at the new code, but I'm betting that that's gone
>> and the autovac launcher might start a worker nearly immediately
>> after some foreground process finishes inserting some rows.
>> So that could result in autovac activity occurring concurrently
>> with test_setup where it didn't before.
>
> It's not quite clear to me why the manual vacuum wouldn't just cancel
> the autovacuum and complete the job. I can't quite see how there's
> room for competing page locks here. Also, see [1]. One of the
> reported failing tests there is the same as one of the failing tests
> on wrasse. My investigation for the AIO branch found that
> relallvisible was not equal to relpages. I don't recall the reason why
> that was happening now.
>
>> As to what to do about it ... maybe apply the FREEZE and
>> DISABLE_PAGE_SKIPPING options in test_setup's vacuums?
>> It seems like DISABLE_PAGE_SKIPPING is necessary but perhaps
>> not sufficient.
>
> We should likely try and confirm it's due to relallvisible first.

We had this issue before, and not just on the aio branch. On my phone right now, so won't look up references.

IIRC the problem in this matter isn't skipped pages, but that the horizon simply isn't new enough to mark pages as all visible. An independent autovac worker starting is enough for that, for example. Previously the data load and vacuum were further apart, preventing this kind of issue.

Andres

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

#7 Peter Geoghegan
pg@bowt.ie
In reply to: Andres Freund (#6)
Re: Intermittent buildfarm failures on wrasse

On Wed, Apr 13, 2022 at 4:13 PM Andres Freund <andres@anarazel.de> wrote:

> IIRC the problem in this matter isn't skipped pages, but that the horizon simply isn't new enough to mark pages as all visible.

Sometimes OldestXmin can go backwards in VACUUM operations that are
run in close succession against the same table, due to activity from
other databases in the same cluster (perhaps other factors are
involved at times).

That's why the following assertion that I recently added to
vacuumlazy.c will fail pretty quickly without the
"vacrel->NewRelfrozenXid == OldestXmin" part of its test:

    Assert(vacrel->NewRelfrozenXid == OldestXmin ||
           TransactionIdPrecedesOrEquals(aggressive ? FreezeLimit :
                                         vacrel->relfrozenxid,
                                         vacrel->NewRelfrozenXid));

If you remove "vacrel->NewRelfrozenXid == OldestXmin", and run the
regression tests, the remaining assertion will fail quite easily.
Though perhaps not with a serial "make check".

--
Peter Geoghegan

#8 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#7)
Re: Intermittent buildfarm failures on wrasse

Peter Geoghegan <pg@bowt.ie> writes:

> On Wed, Apr 13, 2022 at 4:13 PM Andres Freund <andres@anarazel.de> wrote:
>> IIRC the problem in this matter isn't skipped pages, but that the horizon simply isn't new enough to mark pages as all visible.
>
> Sometimes OldestXmin can go backwards in VACUUM operations that are
> run in close succession against the same table, due to activity from
> other databases in the same cluster (perhaps other factors are
> involved at times).

I've been doing some testing locally by inserting commands to
manually set tenk1's relallvisible to zero. I first did that
in test_setup.sql ... and it had no effect whatsoever. Further
experimentation showed that the "CREATE INDEX ON tenk1" steps
in create_index.sql themselves generally suffice to fix relallvisible;
although if you force it back to zero after the last such command,
you get the same plan diffs wrasse is showing. And you don't
get any others, which I thought curious until I realized that
sanity_check.sql's database-wide VACUUM offers yet another
opportunity to heal the incorrect value. If you force it back
to zero again after that, a bunch of later tests start to show
plan differences, which is what I'd been expecting.
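
For reference, the forcing step in that experiment amounts to something like this (a sketch of what I described, inserted at various points between the test scripts):

```sql
-- Pretend no pages are known all-visible, making an index-only scan
-- look unattractive; later CREATE INDEX and VACUUM steps can then
-- "heal" the value or not, depending on where this is injected.
UPDATE pg_class SET relallvisible = 0 WHERE relname = 'tenk1';
```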

So what seems to be happening on wrasse is that a background
autovacuum (or really autoanalyze?) is preventing pages from
being marked all-visible not only during test_setup.sql but
also create_index.sql; but it's gone by the time sanity_check.sql
runs. Which is odd in itself because not that much time elapses
between create_index and sanity_check, certainly less than the
time from test_setup to create_index.

It seems like a reliable fix might require test_setup to wait
for any background autovac to exit before it does its own
vacuums. Ick.
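
In script form the wait could look something along these lines (a hypothetical helper relying on pg_stat_activity's backend_type labeling, not a worked-out design):

```sql
-- Poll until no autovacuum worker is active, then run our own vacuums.
DO $$
BEGIN
  WHILE EXISTS (SELECT 1 FROM pg_stat_activity
                WHERE backend_type = 'autovacuum worker')
  LOOP
    PERFORM pg_sleep(0.1);
  END LOOP;
END
$$;
```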

And we still lack an explanation of why this only now broke.
I remain suspicious that pgstats is behaving unexpectedly.

regards, tom lane

#9 Peter Geoghegan
pg@bowt.ie
In reply to: Tom Lane (#8)
Re: Intermittent buildfarm failures on wrasse

On Wed, Apr 13, 2022 at 4:38 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

> So what seems to be happening on wrasse is that a background
> autovacuum (or really autoanalyze?) is preventing pages from
> being marked all-visible not only during test_setup.sql but
> also create_index.sql; but it's gone by the time sanity_check.sql
> runs.

I agree that it would need to be an autoanalyze (due to the
PROC_IN_VACUUM optimization).

> It seems like a reliable fix might require test_setup to wait
> for any background autovac to exit before it does its own
> vacuums. Ick.

This is hardly a new problem, really. I wonder if it's worth inventing
a comprehensive solution. Some kind of infrastructure that makes
VACUUM establish a next XID up-front (by calling
ReadNextTransactionId()), and then find a way to run with an
OldestXmin that's >= the earlier "next" XID value, if necessary by
waiting.

--
Peter Geoghegan

#10 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#9)
Re: Intermittent buildfarm failures on wrasse

Peter Geoghegan <pg@bowt.ie> writes:

> On Wed, Apr 13, 2022 at 4:38 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It seems like a reliable fix might require test_setup to wait
>> for any background autovac to exit before it does its own
>> vacuums. Ick.
>
> This is hardly a new problem, really. I wonder if it's worth inventing
> a comprehensive solution.

Yeah, we have band-aided around this type of problem repeatedly.
Making a fix that's readily accessible from any test script
seems like a good idea.

regards, tom lane

#11 Peter Geoghegan
pg@bowt.ie
In reply to: Tom Lane (#10)
Re: Intermittent buildfarm failures on wrasse

On Wed, Apr 13, 2022 at 4:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Yeah, we have band-aided around this type of problem repeatedly.
> Making a fix that's readily accessible from any test script
> seems like a good idea.

We might even be able to consistently rely on this new option, given
*any* problem involving test stability and VACUUM. Having a
one-size-fits-all solution to these kinds of stability problems would
be nice -- no more DISABLE_PAGE_SKIPPING bandaids.

That general approach will be possible provided an inability to
acquire a cleanup lock during VACUUM (which can more or less occur at
random in most environments) doesn't ever lead to unexpected test
results. There is good reason to think that it might work out that
way. Simulating problems with acquiring cleanup locks during VACUUM
left me with the impression that that could really work:

/messages/by-id/CAH2-WzkiB-qcsBmWrpzP0nxvrQExoUts1d7TYShg_DrkOHeg4Q@mail.gmail.com

--
Peter Geoghegan

#12 Andres Freund
andres@anarazel.de
In reply to: Peter Geoghegan (#9)
Re: Intermittent buildfarm failures on wrasse

Hi,

On 2022-04-13 16:45:44 -0700, Peter Geoghegan wrote:

> On Wed, Apr 13, 2022 at 4:38 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So what seems to be happening on wrasse is that a background
>> autovacuum (or really autoanalyze?) is preventing pages from
>> being marked all-visible not only during test_setup.sql but
>> also create_index.sql; but it's gone by the time sanity_check.sql
>> runs.
>
> I agree that it would need to be an autoanalyze (due to the
> PROC_IN_VACUUM optimization).

That's not a reliable protection --- the snapshot is established normally
at first; only after a while do we set PROC_IN_VACUUM...

Greetings,

Andres Freund

#13 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#11)
Re: Intermittent buildfarm failures on wrasse

Peter Geoghegan <pg@bowt.ie> writes:

> On Wed, Apr 13, 2022 at 4:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yeah, we have band-aided around this type of problem repeatedly.
>> Making a fix that's readily accessible from any test script
>> seems like a good idea.
>
> We might even be able to consistently rely on this new option, given
> *any* problem involving test stability and VACUUM. Having a
> one-size-fits-all solution to these kinds of stability problems would
> be nice -- no more DISABLE_PAGE_SKIPPING bandaids.

My guess is that you'd need both this new wait-for-horizon behavior
*and* DISABLE_PAGE_SKIPPING. But the two together ought to make
for pretty reproducible behavior. I noticed while scanning the
commit log that some patches have tried adding a FREEZE option,
which seems more like waving a dead chicken than anything that'd
improve stability.

We'd not necessarily have to embed wait-for-horizon into VACUUM
itself. It seems like a SQL-accessible function could be written
and then called before any problematic VACUUM. I like this better
for something we're thinking of jamming in post-feature-freeze;
we'd not be committing to the feature quite as much as if we
added a VACUUM option.

regards, tom lane

#14 Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#3)
Re: Intermittent buildfarm failures on wrasse

Hi,

On 2022-04-13 18:54:06 -0400, Tom Lane wrote:

> We used to have a rate limit on how often stats reports would be sent
> to the collector, which'd ensure half a second or so delay before a
> transaction's change counts became visible to the autovac daemon.

Just for posterity: That's not actually gone. But what is gone is the
rate limiting in autovacuum about requesting recent stats for a table /
autovac seeing slightly older stats.

Greetings,

Andres Freund

#15 Peter Geoghegan
pg@bowt.ie
In reply to: Tom Lane (#13)
Re: Intermittent buildfarm failures on wrasse

On Wed, Apr 13, 2022 at 5:35 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

> My guess is that you'd need both this new wait-for-horizon behavior
> *and* DISABLE_PAGE_SKIPPING. But the two together ought to make
> for pretty reproducible behavior. I noticed while scanning the
> commit log that some patches have tried adding a FREEZE option,
> which seems more like waving a dead chicken than anything that'd
> improve stability.

I think that it's more likely that FREEZE will correct problems, out of the two:

* FREEZE forces an aggressive VACUUM whose FreezeLimit is as recent a
cutoff value as possible (FreezeLimit will be equal to OldestXmin).

* DISABLE_PAGE_SKIPPING also forces an aggressive VACUUM. But unlike
FREEZE it makes VACUUM not use the visibility map, even in the case of
all-frozen pages. And it changes nothing about FreezeLimit.

It's also a certainty that VACUUM(FREEZE, DISABLE_PAGE_SKIPPING) is
not a sensible remedy for any problem with test stability, but there
are still some examples of that combination in the regression tests.
The only way it could make sense is if the visibility map might be
corrupt, but surely we're not expecting that anyway (and if we were
we'd be testing it more directly).

I recently argued that DISABLE_PAGE_SKIPPING should have nothing to do
with aggressive vacuuming -- that should all be left up to VACUUM
FREEZE. It seems more logical to make DISABLE_PAGE_SKIPPING mean
"don't use the visibility map to skip anything", without bringing
aggressiveness into it at all. That would be less confusing.

> We'd not necessarily have to embed wait-for-horizon into VACUUM
> itself. It seems like a SQL-accessible function could be written
> and then called before any problematic VACUUM. I like this better
> for something we're thinking of jamming in post-feature-freeze;
> we'd not be committing to the feature quite as much as if we
> added a VACUUM option.

Hmm. I would say that the feature has zero appeal to users anyway.
Maybe it can and should be done through an SQL function for other
reasons, though. Users already think that there are several different
flavors of VACUUM, which isn't really true.

--
Peter Geoghegan

#16 Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#13)
Re: Intermittent buildfarm failures on wrasse

Hi,

On 2022-04-13 20:35:50 -0400, Tom Lane wrote:

> Peter Geoghegan <pg@bowt.ie> writes:
>> On Wed, Apr 13, 2022 at 4:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Yeah, we have band-aided around this type of problem repeatedly.
>>> Making a fix that's readily accessible from any test script
>>> seems like a good idea.
>>
>> We might even be able to consistently rely on this new option, given
>> *any* problem involving test stability and VACUUM. Having a
>> one-size-fits-all solution to these kinds of stability problems would
>> be nice -- no more DISABLE_PAGE_SKIPPING bandaids.
>
> My guess is that you'd need both this new wait-for-horizon behavior
> *and* DISABLE_PAGE_SKIPPING. But the two together ought to make
> for pretty reproducible behavior. I noticed while scanning the
> commit log that some patches have tried adding a FREEZE option,
> which seems more like waving a dead chicken than anything that'd
> improve stability.

I think most of those we've ended up replacing by using temp tables in
those tests instead, since they're not affected by the global horizon
anymore.

> We'd not necessarily have to embed wait-for-horizon into VACUUM
> itself.

I'm not sure it'd be quite reliable outside of vacuum though, due to the
horizon potentially going backwards (in otherwise harmless ways)?

> It seems like a SQL-accessible function could be written
> and then called before any problematic VACUUM. I like this better
> for something we're thinking of jamming in post-feature-freeze;
> we'd not be committing to the feature quite as much as if we
> added a VACUUM option.

We could otherwise just disable IOS for that query, for now.

Greetings,

Andres Freund

#17 Peter Geoghegan
pg@bowt.ie
In reply to: Peter Geoghegan (#15)
Re: Intermittent buildfarm failures on wrasse

On Wed, Apr 13, 2022 at 6:03 PM Peter Geoghegan <pg@bowt.ie> wrote:

> I think that it's more likely that FREEZE will correct problems, out of the two:
>
> * FREEZE forces an aggressive VACUUM whose FreezeLimit is as recent a
> cutoff value as possible (FreezeLimit will be equal to OldestXmin).

The reason why that might have helped (at least in the past) is that
it's enough to force us to wait for a cleanup lock to prune and
freeze, if necessary. Which was never something that
DISABLE_PAGE_SKIPPING could do.

--
Peter Geoghegan

#18 Peter Geoghegan
pg@bowt.ie
In reply to: Andres Freund (#16)
Re: Intermittent buildfarm failures on wrasse

On Wed, Apr 13, 2022 at 6:05 PM Andres Freund <andres@anarazel.de> wrote:

> I think most of those we've ended up replacing by using temp tables in
> those tests instead, since they're not affected by the global horizon
> anymore.

Maybe, but it's a pain to have to work that way. You can't do it in
cases like this, because a temp table is not workable. So that's not
an ideal long term solution.

>> We'd not necessarily have to embed wait-for-horizon into VACUUM
>> itself.
>
> I'm not sure it'd be quite reliable outside of vacuum though, due to the
> horizon potentially going backwards (in otherwise harmless ways)?

I agree, since vacuumlazy.c would need to either be given its own
OldestXmin, or knowledge of a wait-up-to XID. Either way we have to
make non-trivial changes to vacuumlazy.c.

--
Peter Geoghegan

#19 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#16)
Re: Intermittent buildfarm failures on wrasse

Andres Freund <andres@anarazel.de> writes:

> On 2022-04-13 20:35:50 -0400, Tom Lane wrote:
>> It seems like a SQL-accessible function could be written
>> and then called before any problematic VACUUM. I like this better
>> for something we're thinking of jamming in post-feature-freeze;
>> we'd not be committing to the feature quite as much as if we
>> added a VACUUM option.
>
> We could otherwise just disable IOS for that query, for now.

The entire point of that test case is to verify the shape of the
IOS plan, so no that's not an acceptable answer. But if we're
looking for quick hacks, we could do

update pg_class set relallvisible = relpages where relname = 'tenk1';

just before that test.

I'm still suspicious of the pgstat changes, though. I checked into
things here by doing

initdb
edit postgresql.conf to set log_autovacuum_min_duration = 0
pg_ctl start && make installcheck-parallel

and what I see is that the first reported autovacuum activity begins
exactly one minute after the postmaster starts, which is what I'd
expect given the autovacuum naptime rules. On my machine, of course,
the installcheck-parallel run is long gone by then. But even on the
much slower wrasse, we should be well past create_index by the time any
autovac worker launches --- you can see from wrasse's reported test
runtimes that only about 10 seconds have elapsed when it get to the end
of create_index.

This suggests to me that what is holding the (presumed) conflicting
snapshot must be the autovac launcher, because what else could it be?
So I'm suspicious that we broke something in that area, though I am
baffled why only wrasse would be telling us so.

regards, tom lane

#20 Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#19)
Re: Intermittent buildfarm failures on wrasse

Hi,

Noah, any chance you could enable log_autovacuum_min_duration=0 on
wrasse?

On 2022-04-13 21:23:12 -0400, Tom Lane wrote:

> I'm still suspicious of the pgstat changes, though. I checked into
> things here by doing
>
> initdb
> edit postgresql.conf to set log_autovacuum_min_duration = 0
> pg_ctl start && make installcheck-parallel
>
> and what I see is that the first reported autovacuum activity begins
> exactly one minute after the postmaster starts, which is what I'd
> expect given the autovacuum naptime rules.

It'd not necessarily have to be autovacuum. A CREATE INDEX or VACUUM
using parallelism could also cause this, I think. It'd be a narrow
window, of course...

Does SPARC have wider alignment rules for some types? Perhaps that'd be
enough to put some tables to be sufficiently larger to trigger parallel
vacuum?

Greetings,

Andres Freund

[Messages #21 through #88 are collapsed in this capture; only their headers were preserved (participants: Tom Lane, Noah Misch, Andres Freund, Peter Geoghegan, Alvaro Herrera, and Masahiko Sawada), and their bodies are not shown.]