pgsql: Don't enter parallel mode when holding interrupts.

Started by Noah Misch over 1 year ago, 4 messages
#1 Noah Misch
noah@leadboat.com

Don't enter parallel mode when holding interrupts.

Doing so caused the leader to hang in wait_event=ParallelFinish, which
required an immediate shutdown to resolve. Back-patch to v12 (all
supported versions).

Francesco Degrassi

Discussion: /messages/by-id/CAC-SaSzHUKT=vZJ8MPxYdC_URPfax+yoA1hKTcF4ROz_Q6z0_Q@mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/ac04aa84a7f06635748278e6ff4bd74751bb3e8e

Modified Files
--------------
src/backend/optimizer/plan/planner.c | 6 ++++++
src/test/regress/expected/select_parallel.out | 24 +++++++++++++++++++++
src/test/regress/sql/select_parallel.sql | 31 +++++++++++++++++++++++++++
3 files changed, 61 insertions(+)
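
As a rough illustration of the rule in the subject line, here is a toy C model of a planner-side guard. It is not the code from the commit, which this message does not reproduce. It assumes an interrupt holdoff counter loosely modeled on the backend's HOLD_INTERRUPTS()/RESUME_INTERRUPTS() pair; every toy_* name and parameter is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a per-backend interrupt holdoff counter (hypothetical). */
static int toy_holdoff_count = 0;

/* Begin a section in which interrupts must not be serviced. */
static void
toy_hold_interrupts(void)
{
    toy_holdoff_count++;
}

/* End such a section; holds nest, so only the last resume re-enables. */
static void
toy_resume_interrupts(void)
{
    assert(toy_holdoff_count > 0);  /* an unbalanced resume is a caller bug */
    toy_holdoff_count--;
}

/* True only when no caller is currently holding interrupts. */
static bool
toy_interrupts_can_be_processed(void)
{
    return toy_holdoff_count == 0;
}

/*
 * Hypothetical planner-side decision: a parallel plan is considered only
 * if the query is otherwise eligible AND interrupts can be serviced right
 * now.  Otherwise the planner falls back to a serial plan.
 */
static bool
toy_parallel_mode_ok(bool query_is_eligible, int max_workers)
{
    return query_is_eligible &&
           max_workers > 0 &&
           toy_interrupts_can_be_processed();
}
```

In this sketch, a call to toy_parallel_mode_ok(true, 2) made between toy_hold_interrupts() and toy_resume_interrupts() returns false, so the toy planner would choose a serial plan rather than one that could hang.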

#2 Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Noah Misch (#1)
Re: pgsql: Don't enter parallel mode when holding interrupts.

On Wed, 2024-09-18 at 02:58 +0000, Noah Misch wrote:

> Don't enter parallel mode when holding interrupts.
>
> Doing so caused the leader to hang in wait_event=ParallelFinish, which
> required an immediate shutdown to resolve. Back-patch to v12 (all
> supported versions).
>
> Francesco Degrassi
>
> Discussion: /messages/by-id/CAC-SaSzHUKT=vZJ8MPxYdC_URPfax+yoA1hKTcF4ROz_Q6z0_Q@mail.gmail.com

Does that warrant mention on this page?
https://www.postgresql.org/docs/current/when-can-parallel-query-be-used.html

Yours,
Laurenz Albe

#3 Robert Haas
robertmhaas@gmail.com
In reply to: Laurenz Albe (#2)
Re: pgsql: Don't enter parallel mode when holding interrupts.

On Wed, Sep 18, 2024 at 3:27 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

> On Wed, 2024-09-18 at 02:58 +0000, Noah Misch wrote:
>
> > Don't enter parallel mode when holding interrupts.
> >
> > Doing so caused the leader to hang in wait_event=ParallelFinish, which
> > required an immediate shutdown to resolve. Back-patch to v12 (all
> > supported versions).
> >
> > Francesco Degrassi
> >
> > Discussion: /messages/by-id/CAC-SaSzHUKT=vZJ8MPxYdC_URPfax+yoA1hKTcF4ROz_Q6z0_Q@mail.gmail.com
>
> Does that warrant mention on this page?
> https://www.postgresql.org/docs/current/when-can-parallel-query-be-used.html

IMHO, no. This seems too low-level and too odd to mention.

TBH, I'm kind of surprised to learn that it's possible to start
executing a query while holding an LWLock. I see Tom is expressing
some doubts on the original thread, too. I wonder if we should instead
be erroring out if an LWLock is held at the start of query execution
-- or even earlier, like when we try to call a plpgsql function while
holding one. Leaving parallel query aside, what would prevent us from
attempting to reacquire the exact same LWLock that we already hold and
self-deadlocking? Or attempting to acquire some other LWLock and
deadlocking that way? I don't really feel like this is a parallel
query problem. I don't think we should be trying to run any
user-defined code while holding an LWLock, unless that code is written
in C (or C++, Rust, etc.). Trying to run procedural code at that point
doesn't seem reasonable.

--
Robert Haas
EDB: http://www.enterprisedb.com
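
To make the self-deadlock concern above concrete, here is a minimal sketch in C, assuming a non-reentrant lock and a small per-process table of held locks. It is not PostgreSQL's LWLock implementation; the toy_* names, the fixed table size, and the LIFO release rule are all hypothetical. A real non-reentrant lock would simply block forever on the second acquire, whereas this toy reports the condition instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_HELD 8

/* Hypothetical non-reentrant lock; only its identity matters here. */
typedef struct ToyLock
{
    const char *name;
} ToyLock;

typedef enum
{
    TOY_ACQUIRED,
    TOY_WOULD_SELF_DEADLOCK,    /* caller already holds this lock */
    TOY_TOO_MANY_HELD           /* toy table of held locks is full */
} ToyAcquireResult;

/* Locks currently held by this (single-threaded) toy process. */
static const ToyLock *toy_held[TOY_MAX_HELD];
static size_t toy_num_held = 0;

static bool
toy_lock_is_held(const ToyLock *lock)
{
    for (size_t i = 0; i < toy_num_held; i++)
        if (toy_held[i] == lock)
            return true;
    return false;
}

/* Report, rather than block on, an attempt to re-acquire a held lock. */
static ToyAcquireResult
toy_lock_acquire(const ToyLock *lock)
{
    if (toy_lock_is_held(lock))
        return TOY_WOULD_SELF_DEADLOCK;
    if (toy_num_held == TOY_MAX_HELD)
        return TOY_TOO_MANY_HELD;
    toy_held[toy_num_held++] = lock;
    return TOY_ACQUIRED;
}

/* Release the most recently acquired lock (LIFO only, to keep the toy small). */
static void
toy_lock_release(const ToyLock *lock)
{
    assert(toy_num_held > 0 && toy_held[toy_num_held - 1] == lock);
    toy_num_held--;
}

/* The stricter rule floated above: no procedural code while any lock is held. */
static bool
toy_may_run_procedural_code(void)
{
    return toy_num_held == 0;
}
```

Under these assumptions, procedural code started while a lock is held could call back into a path that acquires the same lock, which is exactly the case toy_lock_acquire() flags as TOY_WOULD_SELF_DEADLOCK.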

#4 Noah Misch
noah@leadboat.com
In reply to: Robert Haas (#3)
Re: pgsql: Don't enter parallel mode when holding interrupts.

On Thu, Sep 19, 2024 at 09:25:05AM -0400, Robert Haas wrote:

> On Wed, Sep 18, 2024 at 3:27 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>
> > On Wed, 2024-09-18 at 02:58 +0000, Noah Misch wrote:
> >
> > > Don't enter parallel mode when holding interrupts.
> > >
> > > Doing so caused the leader to hang in wait_event=ParallelFinish, which
> > > required an immediate shutdown to resolve. Back-patch to v12 (all
> > > supported versions).
> > >
> > > Francesco Degrassi
> > >
> > > Discussion: /messages/by-id/CAC-SaSzHUKT=vZJ8MPxYdC_URPfax+yoA1hKTcF4ROz_Q6z0_Q@mail.gmail.com
> >
> > Does that warrant mention on this page?
> > https://www.postgresql.org/docs/current/when-can-parallel-query-be-used.html
>
> IMHO, no. This seems too low-level and too odd to mention.

Agreed. If I were documenting it, I would document it with the material for
writing opclasses. It's probably too esoteric to document even there.

> TBH, I'm kind of surprised to learn that it's possible to start
> executing a query while holding an LWLock. I see Tom is expressing
> some doubts on the original thread, too. I wonder if we should instead
> be erroring out if an LWLock is held at the start of query execution
> -- or even earlier, like when we try to call a plpgsql function while
> holding one. Leaving parallel query aside, what would prevent us from
> attempting to reacquire the exact same LWLock that we already hold and
> self-deadlocking? Or attempting to acquire some other LWLock and
> deadlocking that way? I don't really feel like this is a parallel
> query problem. I don't think we should be trying to run any
> user-defined code while holding an LWLock, unless that code is written
> in C (or C++, Rust, etc.). Trying to run procedural code at that point
> doesn't seem reasonable.

Nothing prevents those lwlock deadlocks. If you think it's worth breaking the
things folks use today (see original thread) in order to prevent that, please
do share that on the original thread. I'm fine either way. I think, given
infinite resources across both postgresql.org and all extension maintainers, I
would do what you're thinking in v18, while in back branches I would change
"erroring out" to "warn when assertions are enabled". I also think it's a
low-priority bug, given the only known ways to reach it are C code or a custom
opclass. Since resources aren't infinite, I'm inclined toward one of (a) stop
here or (b) all branches "warn when assertions are enabled" and maybe block
the plancache route discussed on the original thread.
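
The two policies weighed above ("erroring out" versus "warn when assertions are enabled") can be sketched as a small C decision function. This is a toy model under stated assumptions, not a proposed patch: the toy_* names are hypothetical, the number of held locks is passed in as a plain integer, and assertion-enabled builds are represented by a runtime flag instead of a compile-time option.

```c
#include <assert.h>
#include <stdbool.h>

/* Which rule applies when a query starts while locks are held (hypothetical). */
typedef enum
{
    TOY_POLICY_ERROR,           /* reject the query outright */
    TOY_POLICY_WARN_IF_ASSERTS  /* only warn, and only in assert-enabled builds */
} ToyPolicy;

typedef enum
{
    TOY_PROCEED,
    TOY_PROCEED_WITH_WARNING,
    TOY_REJECT
} ToyOutcome;

/*
 * Decide what happens at query start.  With no locks held, both policies
 * agree.  With locks held, the strict policy rejects, while the lenient
 * one lets the query run and warns only when assertions are enabled.
 */
static ToyOutcome
toy_query_start_check(int locks_held, ToyPolicy policy, bool assertions_enabled)
{
    assert(locks_held >= 0);

    if (locks_held == 0)
        return TOY_PROCEED;
    if (policy == TOY_POLICY_ERROR)
        return TOY_REJECT;
    return assertions_enabled ? TOY_PROCEED_WITH_WARNING : TOY_PROCEED;
}
```

The lenient policy leaves production behavior unchanged, which is why it is the less disruptive choice for code that relies on the current behavior.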