Assertion failure with barriers in parallel hash join
Hi all,
prion, which uses -DRELCACHE_FORCE_RELEASE -DCATCACHE_FORCE_RELEASE,
has just failed in an interesting way:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2020-09-29%2005%3A24%3A11
The assertion failure happens in a parallel worker when attempting to
attach a barrier:
#2 0x00000000009027d2 in ExceptionalCondition
(conditionName=conditionName@entry=0xa80846 "!barrier->static_party",
errorType=errorType@entry=0x955e22 "FailedAssertion",
fileName=fileName@entry=0xa807a8
"/home/ec2-user/bf/root/REL_13_STABLE/pgsql.build/../pgsql/src/backend/storage/ipc/barrier.c",
lineNumber=lineNumber@entry=218) at
/home/ec2-user/bf/root/REL_13_STABLE/pgsql.build/../pgsql/src/backend/utils/error/assert.c:67
#3 0x00000000007b6b1f in BarrierAttach
(barrier=barrier@entry=0x7f73c9d76008) at
/home/ec2-user/bf/root/REL_13_STABLE/pgsql.build/../pgsql/src/backend/storage/ipc/barrier.c:218
#4 0x0000000000682ebf in ExecParallelHashJoinNewBatch
(hjstate=<optimized out>, hjstate=<optimized out>) at
/home/ec2-user/bf/root/REL_13_STABLE/pgsql.build/../pgsql/src/backend/executor/nodeHashjoin.c:1132
#5 ExecHashJoinImpl (parallel=true, pstate=0x1248d88) at
/home/ec2-user/bf/root/REL_13_STABLE/pgsql.build/../pgsql/src/backend/executor/nodeHashjoin.c:560
#6 ExecParallelHashJoin (pstate=0x1248d88) at
/home/ec2-user/bf/root/REL_13_STABLE/pgsql.build/../pgsql/src/backend/executor/nodeHashjoin.c:607
#7 0x0000000000666d48 in ExecProcNodeInstr (node=0x1248d88) at
/home/ec2-user/bf/root/REL_13_STABLE/pgsql.build/../pgsql/src/backend/executor/execProcnode.c:466
#8 0x000000000065f322 in ExecProcNode (node=0x1248d88) at
/home/ec2-user/bf/root/REL_13_STABLE/pgsql.build/../pgsql/src/include/executor/executor.h:245
Thanks,
--
Michael
On Tue, Sep 29, 2020 at 7:11 PM Michael Paquier <michael@paquier.xyz> wrote:
#2 0x00000000009027d2 in ExceptionalCondition
(conditionName=conditionName@entry=0xa80846 "!barrier->static_party",
#4 0x0000000000682ebf in ExecParallelHashJoinNewBatch
Thanks. Ohhh. I think I see how that condition was reached and what
to do about it, but I'll need to look more closely. I'm away on
vacation right now, and will update in a couple of days when I'm back
at a real computer.
On Tue, Sep 29, 2020 at 9:12 PM Thomas Munro <thomas.munro@gmail.com> wrote:
On Tue, Sep 29, 2020 at 7:11 PM Michael Paquier <michael@paquier.xyz> wrote:
#2 0x00000000009027d2 in ExceptionalCondition
(conditionName=conditionName@entry=0xa80846 "!barrier->static_party",
#4 0x0000000000682ebf in ExecParallelHashJoinNewBatch
Thanks. Ohhh. I think I see how that condition was reached and what
to do about it, but I'll need to look more closely. I'm away on
vacation right now, and will update in a couple of days when I'm back
at a real computer.
Here's a throw-away patch to add some sleeps that trigger the problem,
and a first draft fix. I'll do some more testing of this next week
and see if I can simplify it.
On Thu, Oct 1, 2020 at 8:08 PM Thomas Munro <thomas.munro@gmail.com> wrote:
On Tue, Sep 29, 2020 at 9:12 PM Thomas Munro <thomas.munro@gmail.com> wrote:
On Tue, Sep 29, 2020 at 7:11 PM Michael Paquier <michael@paquier.xyz> wrote:
#2 0x00000000009027d2 in ExceptionalCondition
(conditionName=conditionName@entry=0xa80846 "!barrier->static_party",
#4 0x0000000000682ebf in ExecParallelHashJoinNewBatch
Thanks. Ohhh. I think I see how that condition was reached and what
to do about it, but I'll need to look more closely. I'm away on
vacation right now, and will update in a couple of days when I'm back
at a real computer.
Here's a throw-away patch to add some sleeps that trigger the problem,
and a first draft fix. I'll do some more testing of this next week
and see if I can simplify it.
I was just taking a look at the patch and noticed the commit message
says:
With unlucky timing and parallel_leader_participation off...
Is parallel_leader_participation being off required to reproduce the
issue?
On Tue, Oct 13, 2020 at 12:15 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
On Thu, Oct 1, 2020 at 8:08 PM Thomas Munro <thomas.munro@gmail.com> wrote:
On Tue, Sep 29, 2020 at 9:12 PM Thomas Munro <thomas.munro@gmail.com> wrote:
Here's a throw-away patch to add some sleeps that trigger the problem,
and a first draft fix. I'll do some more testing of this next week
and see if I can simplify it.
I was just taking a look at the patch and noticed the commit message
says:
With unlucky timing and parallel_leader_participation off...
Is parallel_leader_participation being off required to reproduce the
issue?
Yeah, because otherwise the leader detaches last so the problem doesn't arise.
On Tue, Oct 13, 2020 at 12:18 PM Thomas Munro <thomas.munro@gmail.com> wrote:
On Tue, Oct 13, 2020 at 12:15 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
On Thu, Oct 1, 2020 at 8:08 PM Thomas Munro <thomas.munro@gmail.com> wrote:
On Tue, Sep 29, 2020 at 9:12 PM Thomas Munro <thomas.munro@gmail.com> wrote:
Here's a throw-away patch to add some sleeps that trigger the problem,
and a first draft fix. I'll do some more testing of this next week
and see if I can simplify it.
I was just taking a look at the patch and noticed the commit message
says:
With unlucky timing and parallel_leader_participation off...
Is parallel_leader_participation being off required to reproduce the
issue?
Yeah, because otherwise the leader detaches last so the problem doesn't arise.
While working on Melanie's Parallel Hash Full Join patch I remembered
that this (apparently extremely rare) race still needs fixing. Here
is a slightly tidied version, which I'm adding to the next CF for CI
coverage.
Here also is a picture from an unfinished description of this
algorithm that I've been trying to write, which might help explain
the change. It's a phase diagram, where you can see the phases "run"
(= all processes try to work on batches) and "done" (= one process is
freeing the shmem objects for tracking batches, anyone who attaches to
the barrier in this phase knows that it's not even safe to access
batch bookkeeping memory). Before this patch there is no "run", just
"done" (= process batches and then one process frees, which has a race
if someone else attaches really late, after the freeing has begun).
I'm currently wondering whether this can be further improved using
Melanie's new BarrierArriveAndDetachExceptLast() function.
(In the code the phase names have -ing on the end, I'll probably drop
those, because commit 3048898e73c did that to the corresponding wait
events.)
Attachments:
v2-0001-Inject-fault-timing.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Inject-fault-timing.patchDownload+11-1
v2-0002-Fix-race-condition-in-parallel-hash-join-batch-cl.patchtext/x-patch; charset=US-ASCII; name=v2-0002-Fix-race-condition-in-parallel-hash-join-batch-cl.patchDownload+58-33
phj-barriers.pngimage/png; name=phj-barriers.pngDownload+1-0
On Sat, Mar 6, 2021 at 9:56 AM Thomas Munro <thomas.munro@gmail.com> wrote:
While working on Melanie's Parallel Hash Full Join patch I remembered
that this (apparently extremely rare) race still needs fixing. Here
is a slightly tidied version, which I'm adding to the next CF for CI
coverage.
Pushed and back-patched.
On Wed, Mar 17, 2021 at 6:17 PM Thomas Munro <thomas.munro@gmail.com> wrote:
On Sat, Mar 6, 2021 at 9:56 AM Thomas Munro <thomas.munro@gmail.com> wrote:
While working on Melanie's Parallel Hash Full Join patch I remembered
that this (apparently extremely rare) race still needs fixing. Here
is a slightly tidied version, which I'm adding to the next CF for CI
coverage.
Pushed and back-patched.
According to BF animal elver there is something wrong with this
commit. Looking into it.
On Wed, Mar 17, 2021 at 6:58 PM Thomas Munro <thomas.munro@gmail.com> wrote:
According to BF animal elver there is something wrong with this
commit. Looking into it.
Assertion failure reproduced here and understood, but unfortunately
it'll take some more time to fix this. I've reverted the commit for
now to unbreak the ~5 machines that are currently showing red in the
build farm.
On Wed, Mar 17, 2021 at 8:18 AM Thomas Munro <thomas.munro@gmail.com> wrote:
On Wed, Mar 17, 2021 at 6:58 PM Thomas Munro <thomas.munro@gmail.com> wrote:
According to BF animal elver there is something wrong with this
commit. Looking into it.
Assertion failure reproduced here and understood, but unfortunately
it'll take some more time to fix this. I've reverted the commit for
now to unbreak the ~5 machines that are currently showing red in the
build farm.
So, I think the premise of the patch
v6-0001-Fix-race-condition-in-parallel-hash-join-batch-cl.patch makes
sense: freeing the hashtable memory should happen atomically with
updating the flag that indicates to all others that the memory is not to
be accessed any longer.
As the buildfarm likely reported, leading you to revert the patch:
when the inner side is empty and we bail out before advancing the
build barrier past PHJ_BUILD_HASHING_OUTER, we trip the new Asserts
you've added in ExecHashTableDetach().
Assert(!pstate ||
       BarrierPhase(&pstate->build_barrier) >= PHJ_BUILD_RUNNING);
Hmm.
Maybe if the inner side is empty, we can advance the build barrier to
the end?
We help batch 0 along like this in ExecParallelHashJoinSetUpBatches().
But, I'm not sure we can expect the process executing this code to be
attached to the build barrier, can we?
@@ -296,7 +304,19 @@ ExecHashJoinImpl(PlanState *pstate, bool parallel)
                  * outer relation.
                  */
                 if (hashtable->totalTuples == 0 &&
                     !HJ_FILL_OUTER(node))
+                {
+                    if (parallel)
+                    {
+                        Barrier    *build_barrier;
+
+                        build_barrier = &parallel_state->build_barrier;
+                        while (BarrierPhase(build_barrier) < PHJ_BUILD_DONE)
+                            BarrierArriveAndWait(build_barrier, 0);
+                        BarrierDetach(build_barrier);
+                    }
+
                     return NULL;
+                }
Also, silly question: why couldn't we just set the pstate->batches pointer to
InvalidDsaPointer before doing dsa_free() (of course saving the pointer
so that we can actually do the freeing with it)? Is there still a race?
On Wed, Mar 31, 2021 at 6:25 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
I've attached a new version of the patch which contains my suggested fix
for the problem with the empty inner optimization.
If you apply Thomas's original bug-repro patch with the injected
sleeps and use the following DDL and query, you can reproduce the
issue with the empty-inner optimization present in his v2 patch.
Then, if you apply my attached v3 patch, you can see that we no
longer trip the assert.
drop table if exists empty_simple;
create table empty_simple(id int, col2 text);
update pg_class set reltuples = 10000 where relname = 'empty_simple';
drop table if exists simple;
create table simple as
select generate_series(1, 20000) AS id, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa';
analyze simple;
set min_parallel_table_scan_size = 0;
set parallel_setup_cost = 0;
set enable_hashjoin = on;
set parallel_leader_participation to off;
set work_mem = '4MB';
set enable_parallel_hash = on;
select count(*) from empty_simple join simple using (id);
- Melanie
Attachments:
v3-0001-Fix-race-condition-in-parallel-hash-join-batch-cl.patchapplication/octet-stream; name=v3-0001-Fix-race-condition-in-parallel-hash-join-batch-cl.patchDownload+71-33
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: not tested
Hi all,
We recently encountered the same bug in the field. Oleksii Kozlov managed to come up with reproduction steps which reliably trigger it. Interestingly, the bug does not only manifest as a failing assertion, but also as a segmentation fault, in builds with assertions disabled as well as enabled (!). So it can crash production environments. We applied Melanie's proposed v3 patch to the REL_14_3 branch and can confirm that with the patch, neither the assertion failure nor the segmentation fault occurs any longer.
I have also glanced at the code and the implementation looks fine. However, I'm not an expert on the fairly involved hash join state machine.
There seems to be no need for additional documentation.
For completeness here is the stack trace of the segmentation fault.
Like the stack trace from the assertion failure initially shared by Michael and also encountered by us, the stack trace of the segmentation fault also contains ExecParallelHashJoinNewBatch().
#9 | Source "/opt/src/backend/executor/execMain.c", line 361, in standard_ExecutorRun
| Source "/opt/src/backend/executor/execMain.c", line 1551, in ExecutePlan
Source "/opt/src/include/executor/executor.h", line 257, in ExecProcNode [0x657e4d]
#8 | Source "/opt/src/backend/executor/nodeAgg.c", line 2179, in ExecAgg
Source "/opt/src/backend/executor/nodeAgg.c", line 2364, in agg_retrieve_direct [0x66ba60]
#7 | Source "/opt/src/backend/executor/nodeAgg.c", line 581, in fetch_input_tuple
Source "/opt/src/include/executor/executor.h", line 257, in ExecProcNode [0x66d585]
#6 | Source "/opt/src/backend/executor/nodeHashjoin.c", line 607, in ExecParallelHashJoin
| Source "/opt/src/backend/executor/nodeHashjoin.c", line 560, in ExecHashJoinImpl
Source "/opt/src/backend/executor/nodeHashjoin.c", line 1132, in ExecParallelHashJoinNewBatch [0x67a89b]
#5 | Source "/opt/src/backend/storage/ipc/barrier.c", line 242, in BarrierAttach
Source "/opt/src/include/storage/s_lock.h", line 228, in tas [0x7c2a1b]
#4 Object "/lib/x86_64-linux-gnu/libpthread.so.0", at 0x7f4db364841f, in __funlockfile
--
David Geier
(ServiceNow)
The new status of this patch is: Ready for Committer
On Thu, Jun 2, 2022 at 9:31 PM David Geier <geidav.pg@gmail.com> wrote:
We recently encountered the same bug in the field. Oleksii Kozlov managed to come up with reproduction steps, which reliably trigger it. Interestingly, the bug does not only manifest as failing assertion, but also as segmentation fault; in builds with disabled and with enabled (!) assertions. So it can crash production environments. We applied the proposed patch v3 from Melanie to the REL_14_3 branch and can confirm that with the patch neither the assertion nor the segmentation fault still occur.
Thanks for the report, testing, and for creating the CF entry.
I assume you are using parallel_leader_participation=off, and the
reason we haven't heard more about this is because few people do that.
By coincidence I was just about to restart a bunch of hash join
projects and have been paging the topic area back into my brain, so
I'll start with another round of testing/analysis of this bug/patch
next week.
Hi Thomas,
Correct. We're running with parallel leader participation disabled,
and we have to, because another custom plan node we built relies on that.
That would be great. Anything else I can help with to get this patch in?
Thanks!
--
David
(ServiceNow)
On Fri, 3 Jun 2022 at 00:06, Thomas Munro <thomas.munro@gmail.com> wrote:
Hi Thomas,
Can we make progress with this patch in the current commit fest, or
discuss what is still missing to bring this in?
Thanks!
--
David Geier
(ServiceNow)
On Thu, Nov 17, 2022 at 8:01 PM David Geier <geidav.pg@gmail.com> wrote:
Can we make progress with this patch in the current commit fest, or discuss what is still missing to bring this in?
Hi David,
Sorry for the delay. I'll aim to get this done in the next few days.
Thanks!
Please let me know if I can help out, e.g. with re-testing.
--
David Geier
(ServiceNow)