Leader backend hang on IPC/ParallelFinish when LWLock held at parallel query start
Leader backend process hangs forever executing certain queries in
parallel mode. `pg_stat_activity` reports the leader stuck on
IPC/ParallelFinish, with no trace of parallel workers. Nothing is
reported in frontend output or server logs. Ctrl-C does not terminate
the query; a SIGKILL is required.
We originally stumbled into this while developing a foreign data
wrapper; while using the SPI to run queries during foreign scan
operations, the leader hung whenever the SPI query employed a parallel plan.
We then found multiple reports of leader backends stuck in
IPC/ParallelFinish on the pgsql-hackers and pgsql-bugs mailing lists; see
references below. We managed to build a reliable reproducer that does not
require any extension to work.
Test system
===========
- Architecture: x86_64
- OS: Linux Mint 21.1 Vera
- Tested postgres version(s):
- latest 16 (16.4)
- latest 15 (15.8)
- master @ 4eada203a5a871f893afe3eb3e07eea5de1aa642
Steps to reproduce
==================
The repro sql script is attached as `parallel_hang_repro.sql`. If using
postgres versions before 16, please replace
SET debug_parallel_query=on;
with
SET force_parallel_mode=on;
Details
=======
Setup
-----
hang=# CREATE FUNCTION my_cmp (int4, int4)
RETURNS int LANGUAGE sql AS
$$
SELECT
CASE WHEN $1 < $2 THEN -1
WHEN $1 > $2 THEN 1
ELSE 0
END;
$$;
CREATE FUNCTION
hang=# CREATE TABLE parallel_hang (i int4);
CREATE TABLE
hang=# INSERT INTO parallel_hang
(SELECT * FROM generate_series(1, 400) gs);
INSERT 0 400
hang=# CREATE OPERATOR CLASS int4_custom_ops FOR TYPE int4 USING btree
AS
OPERATOR 1 < (int4, int4), OPERATOR 2 <= (int4, int4),
OPERATOR 3 = (int4, int4), OPERATOR 4 >= (int4, int4),
OPERATOR 5 > (int4, int4), FUNCTION 1 my_cmp(int4, int4);
CREATE OPERATOR CLASS
hang=# CREATE UNIQUE INDEX parallel_hang_idx
ON parallel_hang
USING btree (i int4_custom_ops);
CREATE INDEX
Hanging query
-------------
hang=# SET debug_parallel_query=on;
SET
hang=# DELETE FROM parallel_hang WHERE 380 <= i AND i <= 420;
The query never returns. Ctrl-C does not terminate the query.
Output from `pg_stat_activity`
------------------------------
The only non-system entry (apart from the backend used for querying
`pg_stat_activity`) is the following. No parallel worker backends are
present.
See attached `pg_stat_activity.out` for full output.
postgres=# select * from pg_stat_activity ;
-[ RECORD 4 ]----+-------------------------------------------------------------
datid | 5
datname | postgres
pid | 705564
leader_pid |
usesysid | 10
usename | fdegrassi
application_name | psql
client_addr | 127.0.0.1
client_hostname |
client_port | 46538
backend_start | 2024-09-16 12:10:46.282769+02
xact_start | 2024-09-16 12:12:19.93757+02
query_start | 2024-09-16 12:12:19.93757+02
state_change | 2024-09-16 12:12:19.93757+02
wait_event_type | IPC
wait_event | ParallelFinish
state | active
backend_xid |
backend_xmin | 747
query_id |
query | DELETE FROM parallel_hang WHERE 380 <= i AND i <= 420;
backend_type | client backend
gdb session
-----------
Leader backend is stuck in `WaitForParallelWorkersToFinish`. See more
details in the attached "backtrace.out"
(gdb) bt
#0 0x000071e4efd25dea in epoll_wait at ..../linux/epoll_wait.c:30
#1 0x00005ff2f2040ef6 in WaitEventSetWaitBlock at latch.c:1535
#2 0x00005ff2f2040e0d in WaitEventSetWait at latch.c:1481
#3 0x00005ff2f20401cb in WaitLatch at latch.c:513
#4 0x00005ff2f1bfd4cd in WaitForParallelWorkersToFinish at parallel.c:863
#5 0x00005ff2f1df6a9f in ExecParallelFinish at execParallel.c:1160
#6 0x00005ff2f1e198a5 in ExecShutdownGatherWorkers at nodeGather.c:404
#7 0x00005ff2f1e198f6 in ExecShutdownGather at nodeGather.c:421
#8 0x00005ff2f1dfbbcd in ExecShutdownNode_walker at execProcnode.c:804
#9 0x00005ff2f1dfbb1a in ExecShutdownNode at execProcnode.c:775
#10 0x00005ff2f1df22fc in ExecutePlan at execMain.c:1728
#11 0x00005ff2f1defe92 in standard_ExecutorRun at execMain.c:365
#12 0x00005ff2f1defd19 in ExecutorRun at execMain.c:309
#13 0x00005ff2f1e06af2 in postquel_getnext at functions.c:897
#14 0x00005ff2f1e0716d in fmgr_sql at functions.c:1198
#15 0x00005ff2f2255376 in FunctionCall2Coll at fmgr.c:1132
#16 0x00005ff2f1bc8259 in _bt_compare at nbtsearch.c:744
#17 0x00005ff2f1bc7c81 in _bt_binsrch at nbtsearch.c:403
#18 0x00005ff2f1bc7892 in _bt_search at nbtsearch.c:153
#19 0x00005ff2f1bc8e6a in _bt_first at nbtsearch.c:1377
#20 0x00005ff2f1bc542d in btgetbitmap at nbtree.c:310
#21 0x00005ff2f1bb892a in index_getbitmap at indexam.c:724
#22 0x00005ff2f1e15df6 in MultiExecBitmapIndexScan at nodeBitmapIndexscan.c:105
#23 0x00005ff2f1dfb6ee in MultiExecProcNode at execProcnode.c:524
#24 0x00005ff2f1e14626 in BitmapHeapNext at nodeBitmapHeapscan.c:113
#25 0x00005ff2f1dff7af in ExecScanFetch at execScan.c:132
#26 0x00005ff2f1dff854 in ExecScan at execScan.c:198
#27 0x00005ff2f1e152cd in ExecBitmapHeapScan at nodeBitmapHeapscan.c:592
#28 0x00005ff2f1dfb5da in ExecProcNodeFirst at execProcnode.c:464
#29 0x00005ff2f1e32023 in ExecProcNode at .../executor/executor.h:273
#30 0x00005ff2f1e379bd in ExecModifyTable at nodeModifyTable.c:3676
#31 0x00005ff2f1dfb5da in ExecProcNodeFirst at execProcnode.c:464
#32 0x00005ff2f1def8fe in ExecProcNode at .../executor/executor.h:273
#33 0x00005ff2f1df222b in ExecutePlan at execMain.c:1670
#34 0x00005ff2f1defe92 in standard_ExecutorRun at execMain.c:365
#35 0x00005ff2f1defd19 in ExecutorRun at execMain.c:309
#36 0x00005ff2f2078d98 in ProcessQuery at pquery.c:160
#37 0x00005ff2f207a732 in PortalRunMulti at pquery.c:1277
#38 0x00005ff2f2079cc8 in PortalRun at pquery.c:791
#39 0x00005ff2f2072cfd in exec_simple_query at postgres.c:1278
#40 0x00005ff2f2077d00 in PostgresMain at postgres.c:4701
#41 0x00005ff2f1faea08 in BackendRun at postmaster.c:4464
#42 0x00005ff2f1fae294 in BackendStartup at postmaster.c:4192
#43 0x00005ff2f1faa8c5 in ServerLoop at postmaster.c:1782
#44 0x00005ff2f1faa179 in PostmasterMain at postmaster.c:1466
#45 0x00005ff2f1e6f91e in main at main.c:198
Examining `InterruptHoldoffCount` shows that interrupts are held.
(gdb) p InterruptHoldoffCount
$1 = 1
Looking instead at the parallel worker, gdb debugging shows that it
completes successfully and terminates.
Analysis
========
The problem appears to manifest when a backend is holding an LWLock and
starting a query, and the planner chooses a parallel plan for the
latter.
Holding the LWLock means that interrupts are held
(`InterruptHoldoffCount` is > 0), causing `ProcessInterrupts()` to bail
out immediately, thus never calling `HandleParallelMessages`. The leader
then gets stuck forever inside `WaitForParallelWorkersToFinish`, even
though the parallel workers have successfully terminated, because their
messages are received but never handled.
To dig further, we re-ran the query with `trace_lwlocks=on` and manually
paired acquire/release messages; this allowed us to identify the
LWLock held by the leader at the time of the hang.
In the case of the repro above, the LWLock appears to be a BufferContent
lock acquired by `_bt_getroot()`; `_bt_first()` then calls the custom
compare function `my_cmp`, which in turn executes the SQL SELECT query
for which the planner chooses a parallel plan.
(In our original work, our custom foreign data wrapper directly held an
LWLock when running a query via SPI, which when run as parallel
triggered the same bug.)
Potential fixes
---------------
As an experiment, we modified the planner code to consider the state of
`InterruptHoldoffCount` when determining the value of
`glob->parallelOK`: if `InterruptHoldoffCount` > 0, then `parallelOK`
is set to false.
This ensures a sequential plan is executed if interrupts are being held
on the leader backend, and the query completes normally.
The patch is attached as `no_parallel_on_interrupts_held.patch`.
Related issues
==============
- Query stuck with wait event IPC / ParallelFinish
-
/messages/by-id/0f64b4c7fc200890f2055ce4d6650e9c2191fac2.camel%5C@j-davis.com
- BUG \#18586: Process (and transaction) is stuck in IPC when the DB
is under high load
-
/messages/by-id/18586-03e1535b1b34db81@postgresql.org
Attachments:
- no_parallel_on_interrupts_held.patch (application/octet-stream)
On Mon, Sep 16, 2024 at 09:35:13PM +0200, Francesco Degrassi wrote:
The problem appears to manifest when a backend is holding an LWLock and
starting a query, and the planner chooses a parallel plan for the
latter.
Thanks for the detailed report and for the fix.
Potential fixes
---------------
As an experiment, we modified the planner code to consider the state of
`InterruptHoldoffCount` when determining the value of
`glob->parallelOK`: if `InterruptHoldoffCount` > 0, then `parallelOK`
is set to false.
This ensures a sequential plan is executed if interrupts are being held
on the leader backend, and the query completes normally.
The patch is attached as `no_parallel_on_interrupts_held.patch`.
Looks good. An alternative would be something like the leader periodically
waking up to call HandleParallelMessages() outside of ProcessInterrupts(). I
like your patch better, though. Parallel query is a lot of infrastructure to
be running while immune to statement_timeout, pg_cancel_backend(), etc. I
opted to check INTERRUPTS_CAN_BE_PROCESSED(), since QueryCancelHoldoffCount!=0
doesn't cause the hang but still qualifies as a good reason to stay out of
parallel query. Pushed that way:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=ac04aa8
Related issues
==============
- Query stuck with wait event IPC / ParallelFinish
-
/messages/by-id/0f64b4c7fc200890f2055ce4d6650e9c2191fac2.camel%5C@j-davis.com
This one didn't reproduce for me. Like your test, it involves custom code
running inside an opclass. I'm comfortable assuming it's the same problem.
- BUG \#18586: Process (and transaction) is stuck in IPC when the DB
is under high load
-
/messages/by-id/18586-03e1535b1b34db81@postgresql.org
Here, I'm not seeing enough detail to judge if it's the same. That's okay.
Noah Misch <noah@leadboat.com> writes:
On Mon, Sep 16, 2024 at 09:35:13PM +0200, Francesco Degrassi wrote:
The problem appears to manifest when a backend is holding an LWLock and
starting a query, and the planner chooses a parallel plan for the
latter.
Thanks for the detailed report and for the fix.
I do not have much faith in this patch. It assumes that the
condition "interrupts can be processed" is the same at plan time and
execution time. For plans extracted from the plan cache, there seems
little reason to assume that must be true. The proposed test case
cannot trigger that (today anyway) because SQL-language functions
don't deal in cached plans, but I suspect it could be broken with a
test case using a plpgsql function instead.
I actually think that the problem is somewhere upstream of here:
what in the world are we doing invoking planning and execution of
arbitrary queries in a context where interrupts are held off?
That seems certain to fail in multiple ways besides parallel
workers not working. I think we need to fix whatever code
believes that that could be OK. And maybe then insert
"Assert(INTERRUPTS_CAN_BE_PROCESSED())" at planner start
and executor start.
regards, tom lane
On Wed, Sep 18, 2024 at 12:23:42AM -0400, Tom Lane wrote:
Noah Misch <noah@leadboat.com> writes:
On Mon, Sep 16, 2024 at 09:35:13PM +0200, Francesco Degrassi wrote:
The problem appears to manifest when a backend is holding an LWLock and
starting a query, and the planner chooses a parallel plan for the
latter.
Thanks for the detailed report and for the fix.
I do not have much faith in this patch. It assumes that the
condition "interrupts can be processed" is the same at plan time and
execution time. For plans extracted from the plan cache, there seems
little reason to assume that must be true. The proposed test case
cannot trigger that (today anyway) because SQL-language functions
don't deal in cached plans, but I suspect it could be broken with a
test case using a plpgsql function instead.
Good point. I missed that.
I actually think that the problem is somewhere upstream of here:
what in the world are we doing invoking planning and execution of
arbitrary queries in a context where interrupts are held off?
That seems certain to fail in multiple ways besides parallel
workers not working. I think we need to fix whatever code
believes that that could be OK. And maybe then insert
"Assert(INTERRUPTS_CAN_BE_PROCESSED())" at planner start
and executor start.
It would be nice to never run planning or execution with interrupts held. The
concrete examples so far involve btree operator classes with non-C support
functions. How practical would it be to release buffer content locks before
running index support functions? An alternative would be blocking non-C
support functions (and instructing C support function authors to not use
planner/executor). Non-C support functions have been handy for testing if
nothing else, though. Do non-bundled modules rely on non-C support functions?
Thanks,
nm
Noah Misch <noah@leadboat.com> writes:
On Wed, Sep 18, 2024 at 12:23:42AM -0400, Tom Lane wrote:
I actually think that the problem is somewhere upstream of here:
what in the world are we doing invoking planning and execution of
arbitrary queries in a context where interrupts are held off?
It would be nice to never run planning or execution with interrupts held. The
concrete examples so far involve btree operator classes with non-C support
functions. How practical would it be to release buffer content locks before
running index support functions?
Ugh. Probably not very. We have to be able to perform comparison
operations within the scanning of a page. Even if it could be made
to work correctly to release and re-acquire the buffer lock many
times per page, it sounds awful for performance.
An alternative would be blocking non-C
support functions (and instructing C support function authors to not use
planner/executor). Non-C support functions have been handy for testing if
nothing else, though. Do non-bundled modules rely on non-C support functions?
Dunno ... but the OP claimed this is a case that's seen in the
field, so maybe somebody is doing it. On the whole I don't see
how a btree support function can be considered not to be a low-level
thing, so maybe restricting what it can do is the best answer.
I fear though that the restriction can't simply be to forbid
parallel sub-queries.
On the third hand: you can't implement a btree comparison function
in SQL and simultaneously claim with a straight face that you
expect premium performance. Could we set things up so that
buffer release/reacquires happen only with non-C support functions?
This still requires that such behavior is safe in the face of
concurrent index activity, which I'm not sure is practical.
(Also, I'm not sure to what extent we'd still be testing what
we wish to with test comparison functions written in SQL.)
regards, tom lane
On Wed, Sep 18, 2024 at 01:18:48AM -0400, Tom Lane wrote:
Noah Misch <noah@leadboat.com> writes:
On Wed, Sep 18, 2024 at 12:23:42AM -0400, Tom Lane wrote:
I actually think that the problem is somewhere upstream of here:
what in the world are we doing invoking planning and execution of
arbitrary queries in a context where interrupts are held off?
It would be nice to never run planning or execution with interrupts held. The
concrete examples so far involve btree operator classes with non-C support
functions. How practical would it be to release buffer content locks before
running index support functions?
Ugh. Probably not very. We have to be able to perform comparison
operations within the scanning of a page. Even if it could be made
to work correctly to release and re-acquire the buffer lock many
times per page, it sounds awful for performance.
Sounds that way.
An alternative would be blocking non-C
support functions (and instructing C support function authors to not use
planner/executor). Non-C support functions have been handy for testing if
nothing else, though. Do non-bundled modules rely on non-C support functions?
Dunno ... but the OP claimed this is a case that's seen in the
field, so maybe somebody is doing it. On the whole I don't see
how a btree support function can be considered not to be a low-level
thing, so maybe restricting what it can do is the best answer.
I fear though that the restriction can't simply be to forbid
parallel sub-queries.
On the third hand: you can't implement a btree comparison function
in SQL and simultaneously claim with a straight face that you
expect premium performance. Could we set things up so that
buffer release/reacquires happen only with non-C support functions?
This still requires that such behavior is safe in the face of
concurrent index activity, which I'm not sure is practical.
(Also, I'm not sure to what extent we'd still be testing what
we wish to with test comparison functions written in SQL.)
Not to rule it out yet, but your point about reduced test fidelity is cause
for hesitation. Also, I'd regret even having that code bloating up a btree
hot path.
What if we just ignored the plancache when uninterruptible? The new planner
check would then suffice. If it slows anything folks care about, either find
a way to become interruptible or make a C function that bypasses planner &
executor.
Noah Misch <noah@leadboat.com> writes:
What if we just ignored the plancache when uninterruptible? The new planner
check would then suffice.
Only if you believe that parallel-query is the only problem,
which is something I seriously doubt. I fear that the
committed patch is just a band-aid over one symptom of the
general problem that we can't permit arbitrary operations
to be invoked from a btree comparison function. It's late
here so I'm not sufficiently awake to think of examples,
but I'm sure there are some.
However ... clearly a maliciously-coded btree support function can
do arbitrarily bad stuff. We restrict creation of opclasses to
superusers for exactly that reason. If our ambitions are only
to prevent support functions from *accidentally* causing problems,
is disabling parallel query enough? I'm still pretty uncomfortable
about it, but it's less obviously insufficient than in the general
case.
regards, tom lane
On Wed, Sep 18, 2024 at 1:18 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Dunno ... but the OP claimed this is a case that's seen in the
field, so maybe somebody is doing it. On the whole I don't see
how a btree support function can be considered not to be a low-level
thing, so maybe restricting what it can do is the best answer.
Making it possible to do arbitrarily complicated things from B-Tree
support functions seems out of the question. We're not going to
maintain parallel versions of the code that releases buffer locks
before calling (say) the opclass ORDER proc. Just for example, how
would such a scheme work with code like _bt_check_unique?
I fear though that the restriction can't simply be to forbid
parallel sub-queries.
Why not?
The test case provided was intended to be illustrative of a problem
that some foreign data wrapper ran into, when it used SPI. I don't
think that the true problem was in any way related to B-Tree indexes.
--
Peter Geoghegan
On Wed, Sep 18, 2024 at 01:45:31AM -0400, Tom Lane wrote:
Noah Misch <noah@leadboat.com> writes:
What if we just ignored the plancache when uninterruptible? The new planner
check would then suffice.
Only if you believe that parallel-query is the only problem,
which is something I seriously doubt. I fear that the
committed patch is just a band-aid over one symptom of the
general problem that we can't permit arbitrary operations
to be invoked from a btree comparison function. It's late
here so I'm not sufficiently awake to think of examples,
but I'm sure there are some.
I bet one can cause an (undetected) lwlock deadlock, but I don't know of a way
to do that "accidentally".
However ... clearly a maliciously-coded btree support function can
do arbitrarily bad stuff. We restrict creation of opclasses to
superusers for exactly that reason. If our ambitions are only
to prevent support functions from *accidentally* causing problems,
is disabling parallel query enough? I'm still pretty uncomfortable
about it, but it's less obviously insufficient than in the general
case.
I'm okay stopping at blocking some known accidents. I'm not aware of an
approach that would block 100% of unwanted outcomes here without also removing
a feature folks would miss. Nice things about blocking parallel query:
- Cheap to block
- Always hangs if attempted, so blocking it doesn't take away something of value
- Query still completes, without parallelism
- Writers of opclass functions probably aren't thinking of parallel query
For what it's worth, I tried making standard_ExecutorStart() warn if
!INTERRUPTS_CAN_BE_PROCESSED(). Only this thread's new test and
004_verify_nbtree_unique reached the warning. (That's not a surprise.)
On Wed, Sep 18, 2024 at 08:59:22AM -0400, Peter Geoghegan wrote:
The test case provided was intended to be illustrative of a problem
that some foreign data wrapper ran into, when it used SPI.
Ideally, we'd block those or at least warn under assertions so FDW authors
don't accidentally run the executor with an LWLock held. Unlike the opclass
case, we so far don't have a valid use case for holding an LWLock there. In
other words, if the opclass use case is the only known-valid one, it would be
nice to have checks so new use cases don't creep in.
On Thu, 19 Sept 2024 at 04:53, Noah Misch <noah@leadboat.com> wrote:
For what it's worth, I tried making standard_ExecutorStart() warn if
!INTERRUPTS_CAN_BE_PROCESSED(). Only this thread's new test and
004_verify_nbtree_unique reached the warning. (That's not a surprise.)
The reproducer I provided is actually a minimization of
004_verify_nbtree_unique, so it's really just the one case.
On Wed, Sep 18, 2024 at 08:59:22AM -0400, Peter Geoghegan wrote:
The test case provided was intended to be illustrative of a problem
that some foreign data wrapper ran into, when it used SPI.
Ideally, we'd block those or at least warn under assertions so FDW authors
don't accidentally run the executor with an LWLock held. Unlike the opclass
case, we so far don't have a valid use case for holding an LWLock there. In
other words, if the opclass use case is the only known-valid one, it would be
nice to have checks so new use cases don't creep in.
My 2c as an FDW developer: a warning when calling into the SPI with an
LWLock held would have allowed us to identify this issue months ago, well
before we stumbled into a parallel plan and a hang.
Best regards
--
Francesco
Noah Misch <noah@leadboat.com> writes:
On Wed, Sep 18, 2024 at 12:23:42AM -0400, Tom Lane wrote:
I do not have much faith in this patch. It assumes that the
condition "interrupts can be processed" is the same at plan time and
execution time. For plans extracted from the plan cache, there seems
little reason to assume that must be true. The proposed test case
cannot trigger that (today anyway) because SQL-language functions
don't deal in cached plans, but I suspect it could be broken with a
test case using a plpgsql function instead.
Good point. I missed that.
While working on the release notes, I noticed that nothing further
got done about this concern. What do you think of adding a test
somewhere early in executor startup, to the effect of
if (!INTERRUPTS_CAN_BE_PROCESSED())
ereport(ERROR,
(errmsg("cannot start a query with interrupts disabled")));
It's definitely a band-aid, but it seems better than leaving
things at the status quo.
regards, tom lane
On Thu, Nov 07, 2024 at 12:17:21PM -0500, Tom Lane wrote:
Noah Misch <noah@leadboat.com> writes:
On Wed, Sep 18, 2024 at 12:23:42AM -0400, Tom Lane wrote:
I do not have much faith in this patch. It assumes that the
condition "interrupts can be processed" is the same at plan time and
execution time. For plans extracted from the plan cache, there seems
little reason to assume that must be true. The proposed test case
cannot trigger that (today anyway) because SQL-language functions
don't deal in cached plans, but I suspect it could be broken with a
test case using a plpgsql function instead.
Good point. I missed that.
While working on the release notes, I noticed that nothing further
got done about this concern. What do you think of adding a test
somewhere early in executor startup, to the effect of
if (!INTERRUPTS_CAN_BE_PROCESSED())
ereport(ERROR,
(errmsg("cannot start a query with interrupts disabled")));
It's definitely a band-aid, but it seems better than leaving
things at the status quo.
That would fire in queries that don't error or hang today, so I think that
would be worse than the status quo.
This thread's previous commit just forced a serial plan. The executor
counterpart would look like having the executor do what it does when there are
no free worker slots.
Noah Misch <noah@leadboat.com> writes:
On Thu, Nov 07, 2024 at 12:17:21PM -0500, Tom Lane wrote:
While working on the release notes, I noticed that nothing further
got done about this concern. What do you think of adding a test
somewhere early in executor startup, to the effect of
if (!INTERRUPTS_CAN_BE_PROCESSED())
ereport(ERROR,
(errmsg("cannot start a query with interrupts disabled")));
It's definitely a band-aid, but it seems better than leaving
things at the status quo.
That would fire in queries that don't error or hang today, so I think that
would be worse than the status quo.
Good point.
This thread's previous commit just forced a serial plan.
My concern is that the previous commit forced new plans to be serial,
but did nothing about re-use of an existing plan.
The executor
counterpart would look like having the executor do what it does when there are
no free worker slots.
Ah, that could be a way out. Stick an INTERRUPTS_CAN_BE_PROCESSED()
call somewhere in there? That could even allow us to revert the
planner change, which would simplify testing of the executor change.
regards, tom lane
On Thu, Nov 07, 2024 at 02:29:19PM -0500, Tom Lane wrote:
Noah Misch <noah@leadboat.com> writes:
This thread's previous commit just forced a serial plan.
My concern is that the previous commit forced new plans to be serial,
but did nothing about re-use of an existing plan.
I agree. It solved one way of reaching the problem, leaving at least one
unsolved.
The executor
counterpart would look like having the executor do what it does when there are
no free worker slots.
Ah, that could be a way out. Stick an INTERRUPTS_CAN_BE_PROCESSED()
call somewhere in there?
Exactly. If !INTERRUPTS_CAN_BE_PROCESSED(), proceed as though no workers can
be launched.
That could even allow us to revert the
planner change, which would simplify testing of the executor change.
True.
Noah Misch <noah@leadboat.com> writes:
On Thu, Nov 07, 2024 at 02:29:19PM -0500, Tom Lane wrote:
Ah, that could be a way out. Stick an INTERRUPTS_CAN_BE_PROCESSED()
call somewhere in there?
Exactly. If !INTERRUPTS_CAN_BE_PROCESSED(), proceed as though no workers can
be launched.
That could even allow us to revert the
planner change, which would simplify testing of the executor change.
True.
Here's a proposed patch along that line. I left the test case from
ac04aa84a alone, since it works perfectly well to test this way too.
We could argue about whether or not to revert the planner change.
But I'd prefer to do so, because as things stand it creates a
hard-to-reason-about source of plan instability.
regards, tom lane
Attachments:
- better-fix-for-noninterruptible-lockup.patch (text/x-diff)
On Fri, Nov 08, 2024 at 11:46:36AM -0500, Tom Lane wrote:
Noah Misch <noah@leadboat.com> writes:
On Thu, Nov 07, 2024 at 02:29:19PM -0500, Tom Lane wrote:
Ah, that could be a way out. Stick an INTERRUPTS_CAN_BE_PROCESSED()
call somewhere in there?
Exactly. If !INTERRUPTS_CAN_BE_PROCESSED(), proceed as though no workers can
be launched.
That could even allow us to revert the
planner change, which would simplify testing of the executor change.
True.
Here's a proposed patch along that line. I left the test case from
ac04aa84a alone, since it works perfectly well to test this way too.
We could argue about whether or not to revert the planner change.
But I'd prefer to do so, because as things stand it creates a
hard-to-reason-about source of plan instability.
Looks perfect. Thank you.
I wrote:
Here's a proposed patch along that line. I left the test case from
ac04aa84a alone, since it works perfectly well to test this way too.
I'd modeled that on the existing recovery code for DSM segment creation
failure, just below. But a look at the code coverage report shows
(unsurprisingly) that that path is never exercised in our regression
tests, so I wondered if it actually works ... and it doesn't work
very well. To test, I lobotomized InitializeParallelDSM to always
force pcxt->nworkers = 0. That results in a bunch of unsurprising
regression test diffs, plus a couple of
+ERROR: could not find key 4 in shm TOC at 0x229f138
which turns out to be the fault of ExecHashJoinReInitializeDSM:
it's not accounting for the possibility that we didn't really
start a parallel hash join.
I'm also not happy about ReinitializeParallelWorkers'
Assert(pcxt->nworkers >= nworkers_to_launch);
The one existing caller manages not to trigger that because it's
careful to reduce its request based on pcxt->nworkers, but it
doesn't seem to me that callers should be expected to have to.
So I end with the attached. There might still be some more issues
that the regression tests don't reach, but I think this is the
best we can do for today.
regards, tom lane
Attachments:
- v2-better-fix-for-noninterruptible-lockup.patch (text/x-diff)
On Fri, Nov 08, 2024 at 12:56:55PM -0500, Tom Lane wrote:
I wrote:
Here's a proposed patch along that line. I left the test case from
ac04aa84a alone, since it works perfectly well to test this way too.
I'd modeled that on the existing recovery code for DSM segment creation
failure, just below. But a look at the code coverage report shows
(unsurprisingly) that that path is never exercised in our regression
tests, so I wondered if it actually works ... and it doesn't work
very well. To test, I lobotomized InitializeParallelDSM to always
force pcxt->nworkers = 0. That results in a bunch of unsurprising
regression test diffs, plus a couple of
+ERROR: could not find key 4 in shm TOC at 0x229f138
which turns out to be the fault of ExecHashJoinReInitializeDSM:
it's not accounting for the possibility that we didn't really
start a parallel hash join.
I'm also not happy about ReinitializeParallelWorkers'
Assert(pcxt->nworkers >= nworkers_to_launch);
The one existing caller manages not to trigger that because it's
careful to reduce its request based on pcxt->nworkers, but it
doesn't seem to me that callers should be expected to have to.
So I end with the attached. There might still be some more issues
that the regression tests don't reach, but I think this is the
best we can do for today.
Looks good.