Unstable select_parallel regression output in 12rc1

Started by Christoph Berg over 6 years ago, 5 messages
#1 Christoph Berg
myon@debian.org

Building the 12rc1 package on Ubuntu eoan/amd64, I got this
regression diff:

12:06:27 diff -U3 /<<PKGBUILDDIR>>/build/../src/test/regress/expected/select_parallel.out /<<PKGBUILDDIR>>/build/src/bin/pg_upgrade/tmp_check/regress/results/select_parallel.out
12:06:27 --- /<<PKGBUILDDIR>>/build/../src/test/regress/expected/select_parallel.out 2019-09-23 20:24:42.000000000 +0000
12:06:27 +++ /<<PKGBUILDDIR>>/build/src/bin/pg_upgrade/tmp_check/regress/results/select_parallel.out 2019-09-26 10:06:21.171683801 +0000
12:06:27 @@ -21,8 +21,8 @@
12:06:27 Workers Planned: 3
12:06:27 -> Partial Aggregate
12:06:27 -> Parallel Append
12:06:27 - -> Parallel Seq Scan on d_star
12:06:27 -> Parallel Seq Scan on f_star
12:06:27 + -> Parallel Seq Scan on d_star
12:06:27 -> Parallel Seq Scan on e_star
12:06:27 -> Parallel Seq Scan on b_star
12:06:27 -> Parallel Seq Scan on c_star
12:06:27 @@ -75,8 +75,8 @@
12:06:27 Workers Planned: 3
12:06:27 -> Partial Aggregate
12:06:27 -> Parallel Append
12:06:27 - -> Seq Scan on d_star
12:06:27 -> Seq Scan on f_star
12:06:27 + -> Seq Scan on d_star
12:06:27 -> Seq Scan on e_star
12:06:27 -> Seq Scan on b_star
12:06:27 -> Seq Scan on c_star
12:06:27 @@ -103,7 +103,7 @@
12:06:27 -----------------------------------------------------
12:06:27 Finalize Aggregate
12:06:27 -> Gather
12:06:27 - Workers Planned: 1
12:06:27 + Workers Planned: 3
12:06:27 -> Partial Aggregate
12:06:27 -> Append
12:06:27 -> Parallel Seq Scan on a_star

Retriggering the build worked, though.

Christoph

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Christoph Berg (#1)
Re: Unstable select_parallel regression output in 12rc1

Christoph Berg <myon@debian.org> writes:

Building the 12rc1 package on Ubuntu eoan/amd64, I got this
regression diff:

The append-order differences have been seen before, per this thread:

/messages/by-id/CA+hUKG+0CxrKRWRMf5ymN3gm+BECHna2B-q1w8onKBep4HasUw@mail.gmail.com

We haven't seen it in quite some time in HEAD, though I fear that's
just due to bad luck or change of timing of unrelated tests. I've
been hoping to catch it in HEAD to validate the theory I posited in
<22315.1563378828@sss.pgh.pa.us>, but your report doesn't help because
the additional checking queries aren't there in the v12 branch :-(

12:06:27 @@ -103,7 +103,7 @@
12:06:27 -----------------------------------------------------
12:06:27 Finalize Aggregate
12:06:27 -> Gather
12:06:27 - Workers Planned: 1
12:06:27 + Workers Planned: 3
12:06:27 -> Partial Aggregate
12:06:27 -> Append
12:06:27 -> Parallel Seq Scan on a_star

We've also seen this on a semi-regular basis, and I've been intending
to bitch about it, though it didn't seem very useful to do so as long
as there were other instabilities in the regression tests. What we
could do, perhaps, is feed the plan output through a filter that
suppresses the exact number-of-workers value. There's precedent
for such plan-filtering elsewhere in the tests already.
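
To make the filtering idea concrete, here is a rough sketch in Python. It is
an illustration only: the regression tests would presumably do this inside
the test SQL itself, and the name mask_worker_counts is hypothetical, not
something that exists in the PostgreSQL tree. It replaces the numeric worker
count with a fixed placeholder so that two plans differing only in that
number compare equal.

```python
import re

# Hypothetical helper for illustration; not part of the PostgreSQL test suite.
# Matches the label and captures it so that only the number is replaced.
_WORKERS_RE = re.compile(r"(Workers (?:Planned|Launched): )\d+")


def mask_worker_counts(plan_text: str) -> str:
    """Replace the numeric worker count in EXPLAIN output with 'N'."""
    return _WORKERS_RE.sub(r"\g<1>N", plan_text)


# The two variants of the third hunk in the diff above (indentation assumed).
expected = "-> Gather\n   Workers Planned: 1\n   -> Partial Aggregate"
actual = "-> Gather\n   Workers Planned: 3\n   -> Partial Aggregate"

# After masking, the plans no longer differ.
print(mask_worker_counts(expected) == mask_worker_counts(actual))
```

Note that this only hides the worker-count instability; it does nothing for
the append-order differences in the first two hunks.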

regards, tom lane

#3 Christoph Berg
myon@debian.org
In reply to: Tom Lane (#2)
Re: Unstable select_parallel regression output in 12rc1

Re: Tom Lane 2019-09-26 <12685.1569510771@sss.pgh.pa.us>

We haven't seen it in quite some time in HEAD, though I fear that's
just due to bad luck or change of timing of unrelated tests.

The v13 package builds that are running every 6h here haven't seen a
problem yet either, so the probability of triggering it seems very
low. So it's not a pressing problem. (There's some extension modules
where the testsuite fails at a much higher rate, getting all targets
to pass at the same time is next to impossible there :(. )

Christoph

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Christoph Berg (#3)
Re: Unstable select_parallel regression output in 12rc1

Christoph Berg <myon@debian.org> writes:

Re: Tom Lane 2019-09-26 <12685.1569510771@sss.pgh.pa.us>

We haven't seen it in quite some time in HEAD, though I fear that's
just due to bad luck or change of timing of unrelated tests.

The v13 package builds that are running every 6h here haven't seen a
problem yet either, so the probability of triggering it seems very
low. So it's not a pressing problem.

I've pushed some changes to try to ameliorate the issue.

(There's some extension modules
where the testsuite fails at a much higher rate, getting all targets
to pass at the same time is next to impossible there :(. )

I feel your pain, believe me. Used to fight the same kind of problems
when I was at Red Hat. Are any of those extension modules part of
Postgres?

regards, tom lane

#5 Christoph Berg
myon@debian.org
In reply to: Tom Lane (#4)
Re: Unstable select_parallel regression output in 12rc1

Re: Tom Lane 2019-09-28 <24917.1569692191@sss.pgh.pa.us>

(There's some extension modules
where the testsuite fails at a much higher rate, getting all targets
to pass at the same time is next to impossible there :(. )

I feel your pain, believe me. Used to fight the same kind of problems
when I was at Red Hat. Are any of those extension modules part of
Postgres?

No, external ones. The main offenders at the moment are pglogical and
patroni (admittedly not an extension in the strict sense). Both have
extensive testsuites that exercise replication scenarios that are
prone to race conditions. (Maybe we should just run fewer tests for the
packaging.)

Christoph