[PATCH] Fix ProcKill lock-group vs procLatch recycle race

Started by Vlad Lesinabout 1 month ago16 messageshackers
Jump to latest
#1Vlad Lesin
vladlesin@gmail.com

Hello all,

The following are the patches and demonstration material for a
concurrency bug in ProcKill() where a lock-group leader and a member can
exit in parallel.

------------------------------------------------------------------------
Problem
------------------------------------------------------------------------

If a leader detaches from the lock group under leader_lwlock but
has not yet reached DisownLatch(&MyProc->procLatch), a concurrent
last follower can still put the *leader* PGPROC on a free list, or
the leader and the follower can make inconsistent decisions about
*who* returns which PGPROC, so that a slot is linked into the free
list with procLatch still owned, or is pushed twice. A new backend
that recycles the slot can then hit:

PANIC: latch already owned by PID ...

A concrete interleaving (lock group leader vs last member)
is the following(PG15 code).

=== Lock group leader ===
L1: LWLockAcquire(leader_lwlock)
L2: dlist_delete(&MyProc->lockGroupLink)
The list contains follower.
L3: dlist_is_empty → false (the follower still in)
L4: else if (leader != MyProc) → false, do nothing
L5: LWLockRelease(leader_lwlock)
*lwlock released*

WINDOW OPEN HERE<<<

L6: SwitchBackToLocalLatch()
L7: pgstat_reset_wait_event_storage()
L8: proc = MyProc; MyProc = NULL;
L9: DisownLatch(&proc->procLatch)
*only here owner_pid = 0*
L10: SpinLockAcquire(ProcStructLock)
L11: if (proc->lockGroupLeader == NULL) → false, skip
L12: SpinLockRelease(ProcStructLock)
===========================

=== What the last lock group member does in that WINDOW ===
M1: LWLockAcquire(leader_lwlock)
Can proceed as soon as L5 releases it
M2: dlist_delete(&MyProc->lockGroupLink)
The list becomes EMPTY
M3: dlist_is_empty → true
M4: leader->lockGroupLeader = NULL
Flip leader's "don't free me" flag
M5: leader != MyProc → true
M6: SpinLockAcquire(ProcStructLock)
M7: push leader's PGPROC onto freelist
!!! while leader's latch is still owned !!!
M8: SpinLockRelease(ProcStructLock)
M9: LWLockRelease(leader_lwlock)
===========================================================

At M7, the leader's PGPROC is back on freeProcs with
procLatch.owner_pid == leader_pid (non-zero). The leader is still
sitting between L5 and L9.

=== The new backend picks it up in InitProcess() ===
B1: SpinLockAcquire(ProcStructLock)
B2: MyProc = *procgloballist
pops the leader's PGPROC
B3: *procgloballist = MyProc->links.next
B4: SpinLockRelease(ProcStructLock)
B5: ...
B6: OwnLatch(&MyProc->procLatch)
→ owner_pid != 0
→ elog(PANIC, "latch already owned by PID %d", owner_pid);
=====================================================

Note that all PG versions since 9.6 are affected.

The fix is to run SwitchBackToLocalLatch(),
pgstat_reset_wait_event_storage() and DisownLatch() before the
lock-group block, and to decide under leader_lwlock
(push_leader / push_self) with a single freelist update section at
the end. This matches what we implemented; detailed commentary
is in the commit messages and in the patch.

Mainline parallel query usually avoids the race: the leader is not
expected to reach ProcKill with another group member still in play
the way some extension-driven workloads can be.

------------------------------------------------------------------------
Testing
------------------------------------------------------------------------

Technically, a regression test should be included. However, developing
one is non-trivial given the current PG18 injection point
implementation. Because ProcKill() partially de-initializes data
necessary for injection points to function, a significant workaround
would be required. Given the complexity of such a workaround, I would
like to discuss whether a test is mandatory for this patch.

I have provided versions without the test (0001, 0002) as well as a test
that uses an injection point workaround to reproduce the bug
deterministically(0003). If a test for the bug fix is required, please
note that I can only provide it for PG17+, as earlier versions do not
support injection points.

------------------------------------------------------------------------
Attached patches
------------------------------------------------------------------------
1) 0001-ProcKill-REL15-lockgroup-freelist-race.patch
Core proc.c for REL_15_STABLE.
No TAP: PG15 has no suitable injection support for a core
reproducer; fix-only for that line.
2) 0002-ProcKill-PG18-core-lockgroup-freelist-race.patch
Same *core* fix for PG18: proc.c only.
3) 0003-PG18-unfixed-repro-tap-injection-harness.patch
*Demonstration only* Shows the PANIC; not the shape to land after
the fix without replacing the test.

--
Best regards,
Vlad

Attachments:

0001-ProcKill-REL15-lockgroup-freelist-race.patchtext/x-patch; charset=UTF-8; name=0001-ProcKill-REL15-lockgroup-freelist-race.patchDownload+97-42
0002-ProcKill-PG18-core-lockgroup-freelist-race.patchtext/x-patch; charset=UTF-8; name=0002-ProcKill-PG18-core-lockgroup-freelist-race.patchDownload+76-40
0003-PG18-unfixed-repro-tap-injection-harness.patchtext/x-patch; charset=UTF-8; name=0003-PG18-unfixed-repro-tap-injection-harness.patchDownload+654-26
#2Evgeny Voropaev
evgeny.voropaev@tantorlabs.com
In reply to: Vlad Lesin (#1)
Re: [PATCH] Fix ProcKill lock-group vs procLatch recycle race

Hi Vlad,

Concurrency problems are the most difficult to discover and
reproduce. Thanks for that work!

Since every patch is subject to testing, it is recommended to create a
CommitFest card for it. Once in the CF-card, the patch set is picked up
by CFBot and is thoroughly tested. You also will see whether the patch
successfully applies to the target branch or not.

You also might want to stick to a single PostgreSQL version in a thread,
as all attached files are applied to the upstream during a CFBot job,
which is impossible when a mail message includes patches for several
PG versions simultaneously. Adhering to a single version per a thread
and using a CommitFest card also make reviewers' task easier, for the
reason that they can see the version and the base commit to which the
patch should be applied. Patches for other PG's versions can probably
be moved to separated threads.

It is all by now. I am still looking into the code of the patch. I
hope, later I can give some useful comments in regard to the solution
itself.

P.s. I tried to apply the patch to 3b28dad70e2. After
0002-ProcKill-PG18-core-lockgroup-freelist-race had been applied
successfully, subsequent applying of
0003-PG18-unfixed-repro-tap-injection-harness failed.

Best regards,
Evgeny Voropaev,
Tantor Labs, LLC.

#3Andrey Borodin
amborodin@acm.org
In reply to: Vlad Lesin (#1)
Re: [PATCH] Fix ProcKill lock-group vs procLatch recycle race

On 27 Apr 2026, at 13:14, Vlad Lesin <vladlesin@gmail.com> wrote:

Problem
------------------------------------------------------------------------

If a leader detaches from the lock group under leader_lwlock but
has not yet reached DisownLatch(&MyProc->procLatch), a concurrent
last follower can still put the *leader* PGPROC on a free list, or
the leader and the follower can make inconsistent decisions about
*who* returns which PGPROC, so that a slot is linked into the free
list with procLatch still owned, or is pushed twice. A new backend
that recycles the slot can then hit:

PANIC: latch already owned by PID ...

A concrete interleaving (lock group leader vs last member)
is the following(PG15 code).

Yeah, the problem seems real to me. Moreover we had related buildfarm
failures [0]/messages/by-id/CA+hUKGJ_0RGcr7oUNzcHdn7zHqHSB_wLSd3JyS2YC_DYB+-V=g@mail.gmail.com and Deep from GP reported observing the problem there too.
Yugabyte folks also observed this [1]https://github.com/yugabyte/yugabyte-db/issues/20309.

The invariant that latch should not be on freelist until it is disowned seems
reasonable to me.

But the test and the fix both are very confusing here. They are not patch steps
as someone might expect given 0001,0002,0003 prefixes. They are not based on
PG 18 as filenames states.

To help resolve this confusion I'm posting following sequence:

1. vAB1-0001-Add-regression-test-for-ProcKill-lock-group-pro.patch
This is an original test that is expected to demonstrate problem.
It contains heavy injection points refactoring, I assume it's not intended for commit.
This test was taken from a file 0003-PG18-unfixed-repro-tap-injection-harness.patch

2. vAB1-0002-Canonicalize-test-with-infrastructure.patch
My changes needed to make test runnable.

3. vAB1-0003-Fix-ProcKill-lock-group-vs-procLatch-recycle-ra.patch
Fix for the problem, proposed by the thread starter, rebased on current HEAD
and test patch.
The test passes after this step.

I would like to recommend author to make the patch leaner and easier for review.

Best regards, Andrey Borodin.

[0]: /messages/by-id/CA+hUKGJ_0RGcr7oUNzcHdn7zHqHSB_wLSd3JyS2YC_DYB+-V=g@mail.gmail.com
[1]: https://github.com/yugabyte/yugabyte-db/issues/20309

Attachments:

vAB1-0001-Add-regression-test-for-ProcKill-lock-group-pro.patchapplication/octet-stream; name=vAB1-0001-Add-regression-test-for-ProcKill-lock-group-pro.patch; x-unix-mode=0644Download+654-26
vAB1-0002-Canonicalize-test-with-infrastructure.patchapplication/octet-stream; name=vAB1-0002-Canonicalize-test-with-infrastructure.patch; x-unix-mode=0644Download+15-36
vAB1-0003-Fix-ProcKill-lock-group-vs-procLatch-recycle-ra.patchapplication/octet-stream; name=vAB1-0003-Fix-ProcKill-lock-group-vs-procLatch-recycle-ra.patch; x-unix-mode=0644Download+83-36
#4Vlad Lesin
vladlesin@gmail.com
In reply to: Andrey Borodin (#3)
Re: [PATCH] Fix ProcKill lock-group vs procLatch recycle race

Andrey, thank you for your fixes.

On 5/5/26 12:07, Andrey Borodin wrote:

To help resolve this confusion I'm posting following sequence:

1. vAB1-0001-Add-regression-test-for-ProcKill-lock-group-pro.patch
This is an original test that is expected to demonstrate problem.
It contains heavy injection points refactoring, I assume it's not intended for commit.
This test was taken from a file 0003-PG18-unfixed-repro-tap-injection-harness.patch

2. vAB1-0002-Canonicalize-test-with-infrastructure.patch
My changes needed to make test runnable.

3. vAB1-0003-Fix-ProcKill-lock-group-vs-procLatch-recycle-ra.patch
Fix for the problem, proposed by the thread starter, rebased on current HEAD
and test patch.
The test passes after this step.

Deferring pgstat_reset_wait_event_storage() call in (3) enables the test
in (1) to work once (2) is applied. Without this change, the test hangs.
It might make sense to commit the test.

--
Best regards,
Vlad

#5Andrey Borodin
amborodin@acm.org
In reply to: Vlad Lesin (#4)
Re: [PATCH] Fix ProcKill lock-group vs procLatch recycle race

On 5 May 2026, at 21:31, Vlad Lesin <vladlesin@gmail.com> wrote:

It might make sense to commit the test.

Let's try to work on it, maybe we can bring it to good shape to consider for
a commit.

cc to Michael:

prockill_race needs to build the same InjectionPointCondition payload that
injection_wait consumes to know which PID to block. The struct is currently
private to injection_points.c, so the patch extracts it into a small header
that prockill_race.c includes via a relative "../injection_points/" path.
That works but feels non-idiomatic. Since injection_points grows organically
to support new bug reproducers anyway, making the condition type part of its
public header seems like a natural fit - but we are not sure the fix is
committable as-is, so we wanted to ask before doing any more cleanup: is
this refactor acceptable at all, and if so, would you prefer a proper
installed header (as contrib/pg_plan_advice does) over the relative include?

(attached step 0001, other steps are not about injection points infrastructure)

To Vlad:

I simplified the test as much as I could. The hand-rolled polling loop is
replaced with poll_query_until, big inline comments explaining the
controller-session trick and the PGPROC-scan rationale are moved into the
C helper functions where the mechanism lives, and outcome classification is
two plain ok() calls.

The fix is intact. I did not find a way to make it simpler - early
DisownLatch, decisions under leader_lwlock, deferred pushes under
freeProcsLock, and the leader-skips-self-push case each address a distinct
invariant the bug violated. What I have not fully reasoned through is
whether the new ordering affects any surrounding invariants I might have
missed, so a second further review of a bug fix would be good.

Also maybe consider alternative possible names to prockill_race that would be
idiomatic to stuff in test modules... It's kind of not relevant to current stage of the
patch, but anyway.

Thank you!

Best regards, Andrey Borodin.

Attachments:

vAB2-0001-Extract-InjectionPointCondition-to-injection_po.patchapplication/octet-stream; name=vAB2-0001-Extract-InjectionPointCondition-to-injection_po.patch; x-unix-mode=0644Download+27-25
vAB2-0003-Fix-ProcKill-lock-group-vs-procLatch-recycle-ra.patchapplication/octet-stream; name=vAB2-0003-Fix-ProcKill-lock-group-vs-procLatch-recycle-ra.patch; x-unix-mode=0644Download+83-36
vAB2-0002-Add-regression-test-for-ProcKill-lock-group-pro.patchapplication/octet-stream; name=vAB2-0002-Add-regression-test-for-ProcKill-lock-group-pro.patch; x-unix-mode=0644Download+327-2
#6Michael Paquier
michael@paquier.xyz
In reply to: Andrey Borodin (#5)
Re: [PATCH] Fix ProcKill lock-group vs procLatch recycle race

On Wed, May 06, 2026 at 02:51:00PM +0500, Andrey Borodin wrote:

cc to Michael:

prockill_race needs to build the same InjectionPointCondition payload that
injection_wait consumes to know which PID to block. The struct is currently
private to injection_points.c, so the patch extracts it into a small header
that prockill_race.c includes via a relative "../injection_points/" path.
That works but feels non-idiomatic. Since injection_points grows organically
to support new bug reproducers anyway, making the condition type part of its
public header seems like a natural fit - but we are not sure the fix is
committable as-is, so we wanted to ask before doing any more cleanup: is
this refactor acceptable at all, and if so, would you prefer a proper
installed header (as contrib/pg_plan_advice does) over the relative include?

I did not look at the bug fix in details, so this is a comment about
the structure of the test.

+#include "../injection_points/injection_point_condition.h"

Hmm. I would not see a problem in just moving all that to the module
injection_points instead, and keep it there, including your TAP test.
Noah has done something similar for its removable_cutoff() business,
and we are living well with it. One issue with the structure you are
proposing is that I suspect that it makes some installcheck scenarios
more iffy to deal with. More callbacks in the test module is fine.
--
Michael

#7Andrey Borodin
amborodin@acm.org
In reply to: Michael Paquier (#6)
Re: [PATCH] Fix ProcKill lock-group vs procLatch recycle race

On 7 May 2026, at 03:54, Michael Paquier <michael@paquier.xyz> wrote:

+#include "../injection_points/injection_point_condition.h"

Hmm. I would not see a problem in just moving all that to the module
injection_points instead, and keep it there, including your TAP test.
Noah has done something similar for its removable_cutoff() business,
and we are living well with it. One issue with the structure you are
proposing is that I suspect that it makes some installcheck scenarios
more iffy to deal with. More callbacks in the test module is fine.

Thanks, Michael!

Done so. I still keep Vlad's injection_point_condition.h though.
It seems useful. But no more Meson\Makefile changes, and new sql stuff lives
now near removable_cutoff() SQL.

Vlad, there are 2 patches in the patchset again :)

Now we need an expert in ProcKill(). I also will review the patchset in more detail,
but, perhaps, after pgconf.dev.

Best regards, Andrey Borodin.

Attachments:

vAB3-0001-Add-regression-test-for-ProcKill-lock-group-pro.patchapplication/octet-stream; name=vAB3-0001-Add-regression-test-for-ProcKill-lock-group-pro.patch; x-unix-mode=0644Download+282-26
vAB3-0002-Fix-ProcKill-lock-group-vs-procLatch-recycle-ra.patchapplication/octet-stream; name=vAB3-0002-Fix-ProcKill-lock-group-vs-procLatch-recycle-ra.patch; x-unix-mode=0644Download+90-36
#8Michael Paquier
michael@paquier.xyz
In reply to: Andrey Borodin (#7)
Re: [PATCH] Fix ProcKill lock-group vs procLatch recycle race

On Thu, May 07, 2026 at 04:57:28PM +0500, Andrey Borodin wrote:

Done so. I still keep Vlad's injection_point_condition.h though.
It seems useful. But no more Meson\Makefile changes, and new sql stuff lives
now near removable_cutoff() SQL.

Because you want to have your callbacks in a new regress_prockill.c
while being able to reuse the structures. I don't see why not on
clarity ground. The creation of injection_point_condition.h could be
done in its own commit, independent of the rest, but that's just me
being pedantic about such matters. I think that this should be
backpatched anyway, that could make the introduction of new tests
easier.

Now we need an expert in ProcKill(). I also will review the patchset
in more detail, but, perhaps, after pgconf.dev.

I have spent a few hours with my head down looking at this patch,
because the issue is quite bad.

So.. The root idea of the patch is that we need to make sure that
no PGPROC appears on the freelist until its procLatch has been
disowned, preventing any attempts from other backends to grab an entry
that could be thought with a latch already owned, which is what the
PANIC is about. The key to all that is just to make sure that
SwitchBackToLocalLatch() and DisownLatch() happen earlier so as
recycled entries cannot be seen by other backends. And that's true
since 9.6. Oops.

+ * Must be called from a controller session that is not one of the victims.
+ * Using injection_points_attach() from the victim itself would register a
+ * before_shmem_exit(injection_points_cleanup) hook that fires before ProcKill
+ * (which is on_shmem_exit) and would detach the point before the victim ever
+ * reaches it.  Calling InjectionPointAttach() directly from a separate
+ * controller session leaves no such hook on the victims.

The cleanup hook is good enough in some cases, but yeah it would not
work here. I am OK to live with the tweak that
prockill_attach_injection_wait relies on (aka my line is that folks
are free to push all the stuff they want in the injection_points
module, just don't pollute the backend APIs and keep them simple).
So, sorry, I guess? :D

That's just to say that I like the amount of hackery you have put here
in terms of lock grouping manipulations. That't nice. I want more of
this.

+# Two backends form a lock group, then are terminated concurrently. The test
+# uses injection points placed inside ProcKill() to pause both victims there
+# and verify that a freshly forked backend can claim the recycled PGPROC slot
+# without hitting "latch already owned by PID ..." PANIC.

This claim is slightly incorrect here. If I undo the fix, place the
two INJECTION_POINT macros and re-run the test, the test does not
produce a PANIC. In order to reproduce a PANIC, it seems that we
would need something more complicated with an extra point after a
DisownLatch(). That is not required to me, still this is misleading
and could be improved, I guess..

FWIW, I was first confused by the addition of push_leader and
push_self as presented in the patch, which are here to control how the
free lists are manipulated so as we do not rely on the "proc" lookup,
which is a reference we kept around because MyProc is reset. This
felt as an unnecessary refactoring piece that aims at simplifying the
way to think through the process, but I happen to like the way it
does, so no objections here in terms on including this kind of
refactoring piece into the fix itself. A huge advantage of the new
code is that it documents clearly which state is expected where,
particularly around the lockGroupLeader manipulations. Or is there a
subtle thing I am missing from the patch that justifies this code
change? I think that we should refactor the patch into two pieces,
for clarity:
- Patch 1 refactors the freelist manipulations and introduces the two
new boolean flags, documenting more precily how the freelists are
manipulated and why things are done this way (leader, self, etc.).
- Patch 2, which is about the earlier DisownLatch(), then becomes a
fix of its own. That's the primary fix we care about, to avoid the
entries to be recycled due to a too-early PGPROC entry marked as
available.

If there is a stricter relationship between patches 1 and 2, please
state so, but from what I can see it is not a good idea to mix two
separate concepts within a single patch. In short, the actual bug fix
is about calling DisownLatch() earlier. And there may be a second
fact hidden in the refactoring proposed, which may appear after doing
the first thing. The commit message seems to states that the second
fact is required after fixing the DisownLatch() issue. If this
statement is true, for me it is an argument in favor of doing the
refactoring around the manipulation of the free lists first, then fix
the actual bug. I don't object to doing the refactoring in the
back-branches if required and if absolutely necessary. I have read
the code, and it is an improvement, but please let's do things cleanly
and in an ordered fashion. If we don't need the refactoring pieces in
the back-branches, let's keep the fix as simple as possible.

Also, I suspect that the test is racy? For one, I strongly suspect
that you are going to need tweaks similar 011_lock_stats.pl in
test_misc where we rely on some query_until() with psql banners, to
make sure that the waits actually happen.. I got the point about
pg_stat_activity not being something we can rely on here. That's
something that would be easier to see on an overloaded machine.
--
Michael

#9Vlad Lesin
vladlesin@gmail.com
In reply to: Michael Paquier (#8)
Re: [PATCH] Fix ProcKill lock-group vs procLatch recycle race

Hello Michael, thank you for your review.

I am sending a new series of patches that incorporate your code review
notes. The version prefix is "vVL2-" to distinguish Andrey's patches
from mine.

On 5/11/26 04:58, Michael Paquier wrote:

On Thu, May 07, 2026 at 04:57:28PM +0500, Andrey Borodin wrote:

Done so. I still keep Vlad's injection_point_condition.h though.
It seems useful. But no more Meson\Makefile changes, and new sql stuff lives
now near removable_cutoff() SQL.

Because you want to have your callbacks in a new regress_prockill.c
while being able to reuse the structures. I don't see why not on
clarity ground. The creation of injection_point_condition.h could be
done in its own commit, independent of the rest, but that's just me
being pedantic about such matters. I think that this should be
backpatched anyway, that could make the introduction of new tests
easier.

I moved the creation of injection_point_condition.h into a separate
commit in 0001 patch. Additionally, I noticed that if several processes
are waiting on the same injection point, only one of them will be
awakened by a single injection_points_wakeup() call. I am not sure if
this behavior is intentional; please let me know.

For context, MySQL has a similar feature called debug sync points [0]https://dev.mysql.com/doc/dev/mysql-server/9.6.0/PAGE_DEBUG_SYNC.html,
which allows waking up several threads waiting on them. I suppose this
same semantics would be useful for injection points as well to simplify
tests. I implemented this behavior in the 0002 patch and utilized it in
0003. If you disagree with this approach, I can rewrite the test to use
two separate injection points for the lock group leader and follower.

+# Two backends form a lock group, then are terminated concurrently. The test
+# uses injection points placed inside ProcKill() to pause both victims there
+# and verify that a freshly forked backend can claim the recycled PGPROC slot
+# without hitting "latch already owned by PID ..." PANIC.

This claim is slightly incorrect here. If I undo the fix, place the
two INJECTION_POINT macros and re-run the test, the test does not
produce a PANIC. In order to reproduce a PANIC, it seems that we
would need something more complicated with an extra point after a
DisownLatch(). That is not required to me, still this is misleading
and could be improved, I guess..

Yes, you are right, I fixed the race in 0003 patch with additional
injection point.

I think that we should refactor the patch into two pieces,
for clarity:
- Patch 1 refactors the freelist manipulations and introduces the two
new boolean flags, documenting more precily how the freelists are
manipulated and why things are done this way (leader, self, etc.).
- Patch 2, which is about the earlier DisownLatch(), then becomes a
fix of its own. That's the primary fix we care about, to avoid the
entries to be recycled due to a too-early PGPROC entry marked as
available.

Done. I have split this into two separate patches, 0004 and 0005. The
test in 0003 fails on patch 0004 and passes on 0005.

Also, I suspect that the test is racy? For one, I strongly suspect
that you are going to need tweaks similar 011_lock_stats.pl in
test_misc where we rely on some query_until() with psql banners, to
make sure that the waits actually happen.. I got the point about
pg_stat_activity not being something we can rely on here. That's
something that would be easier to see on an overloaded machine.

The test should no longer be racy following the changes in 0003.

--
Best regards,
Vlad

[0]: https://dev.mysql.com/doc/dev/mysql-server/9.6.0/PAGE_DEBUG_SYNC.html

Attachments:

vVL2-0001-injection_points-extract-condition-header.patchtext/x-patch; charset=UTF-8; name=vVL2-0001-injection_points-extract-condition-header.patchDownload+39-25
vVL2-0002-injection_points-wake-every-matching-waiter.patchtext/x-patch; charset=UTF-8; name=vVL2-0002-injection_points-wake-every-matching-waiter.patchDownload+13-11
vVL2-0003-add-prockill-lockgroup-regression-test.patchtext/x-patch; charset=UTF-8; name=vVL2-0003-add-prockill-lockgroup-regression-test.patchDownload+350-2
vVL2-0004-fix-lockgroup-double-push-and-leak.patchtext/x-patch; charset=UTF-8; name=vVL2-0004-fix-lockgroup-double-push-and-leak.patchDownload+66-27
vVL2-0005-fix-prockill-lockgroup-procLatch-recycle-race.patchtext/x-patch; charset=UTF-8; name=vVL2-0005-fix-prockill-lockgroup-procLatch-recycle-race.patchDownload+23-16
#10Michael Paquier
michael@paquier.xyz
In reply to: Vlad Lesin (#9)
Re: [PATCH] Fix ProcKill lock-group vs procLatch recycle race

On Fri, May 15, 2026 at 03:29:30PM +0300, Vlad Lesin wrote:

I moved the creation of injection_point_condition.h into a separate commit
in 0001 patch.

Applied this one for now. I still need to look at the rest in
details.

Additionally, I noticed that if several processes are waiting
on the same injection point, only one of them will be awakened by a single
injection_points_wakeup() call. I am not sure if this behavior is
intentional; please let me know.

Yep, I recall that as being intentional, hence I don't feel that 0002
is a good thing to do, even worse doing so in the back-branches.
--
Michael

#11Vlad Lesin
vladlesin@gmail.com
In reply to: Michael Paquier (#10)
Re: [PATCH] Fix ProcKill lock-group vs procLatch recycle race

On 5/18/26 06:15, Michael Paquier wrote:

Additionally, I noticed that if several processes are waiting
on the same injection point, only one of them will be awakened by a single
injection_points_wakeup() call. I am not sure if this behavior is
intentional; please let me know.

Yep, I recall that as being intentional, hence I don't feel that 0002
is a good thing to do, even worse doing so in the back-branches.

Ok, I fixed the test to use distinct injection points for the lock group
leader and follower.

--
Best regards,
Vlad

Attachments:

vVL3-0001-add-prockill-lockgroup-regression-test.patchtext/x-patch; charset=UTF-8; name=vVL3-0001-add-prockill-lockgroup-regression-test.patchDownload+345-2
vVL3-0002-fix-lockgroup-double-push-and-leak.patchtext/x-patch; charset=UTF-8; name=vVL3-0002-fix-lockgroup-double-push-and-leak.patchDownload+66-27
vVL3-0003-fix-prockill-lockgroup-procLatch-recycle-race.patchtext/x-patch; charset=UTF-8; name=vVL3-0003-fix-prockill-lockgroup-procLatch-recycle-race.patchDownload+23-16
#12Michael Paquier
michael@paquier.xyz
In reply to: Vlad Lesin (#11)
Re: [PATCH] Fix ProcKill lock-group vs procLatch recycle race

On Mon, May 18, 2026 at 01:11:56PM +0300, Vlad Lesin wrote:

Ok, I fixed the test to use distinct injection points for the lock group
leader and follower.

I have spent a good portion of the day looking at 0002 and 0003,
simplified quite a few things around the comments and the reasons why
we are doing so (the code, these comments and the commit message had a
strong AI smell). And after convincing myself that the new logic is
sound I have applied and backpatched both patches.

The last part is the test, that I have not be able to get at yet. We
don't have injection points all the way down, but let's have coverage
on the branches where we can.

If you could send a rebase of the test on top of the current HEAD,
that should save me a few minutes..
--
Michael

#13Vlad Lesin
vladlesin@gmail.com
In reply to: Michael Paquier (#12)
Re: [PATCH] Fix ProcKill lock-group vs procLatch recycle race

On 5/27/26 11:28, Michael Paquier wrote:

If you could send a rebase of the test on top of the current HEAD,
that should save me a few minutes..

Attached is an updated patch with the test rebased onto the latest master.

--
Best regards,
Vlad

Attachments:

vVL4-0001-add-prockill-lockgroup-regression-test.patchtext/x-patch; charset=UTF-8; name=vVL4-0001-add-prockill-lockgroup-regression-test.patchDownload+344-3
#14Michael Paquier
michael@paquier.xyz
In reply to: Vlad Lesin (#13)
Re: [PATCH] Fix ProcKill lock-group vs procLatch recycle race

On Wed, May 27, 2026 at 02:03:37PM +0300, Vlad Lesin wrote:

Attached is an updated patch with the test rebased onto the latest master.

I have been looking at that, and after putting my HEAD on it I have
concluded that this test is an anti-pattern based on its current
shape. The test relies on a wait, but the wait mechanism in
injection_points will not be able to work as the latch of the backends
get detached earlier after the fix done at 84b9d6bceab6 to prevent the
recycle race. What your proposed test was relying on here is a
statement timeout to make sure that the leader stays long enough in
the second point so as the followers are awaken and can finish their
termination business first. But we cannot really rely on that, and it
makes the test much longer than necessary on fast machines, dependent
on statement_timeout.

Another thing that we could do is redesign the wait mechanism in
injection_points to not rely on latches, using a shared memory flag
where we don't depend on latches and conditional variables. I cannot
get excited enough about that for this specific case, but perhaps
somebody would like to give it a shot? The use of latches has been
mentioned more than once as an annoying limitation, but I've always
found that less efficient on fast machines as it would require an
internal polling with a sleep or equivalent. This thread would give
an extra reason to do so, perhaps not in the stable branches but HEAD
once v20 opens up?

prockill_attach_injection_wait_pid() is not really necessary. We
could just reuse the existing attach() function and add an optional
PID argument, defaulting to 0.

Using two different points, one for the leader and one for the
follower is indeed the correct way to do things.

Another thing that one may try is to switch the test to use NOTICEs
and server log lookups. That may catch the PANIC, but one really
needs to be lucky so it would be mostly a waste of cycles in the
buildfarm due to the low reproducibility rate.

The test was still a useful exercise to prove the bug, though. If one
reverts 84b9d6bceab6 and runs the test, we are able to reproduce the
original bug.

Anyway, if we are to move ahead with this test, I'd suggest a couple
of patch pieces first:
- Switch the wait mechanism to use something more primitive, with a
shmem state.
- Extend injection_points_attach() with an optional PID.
- Add the test.
All that would be quite invasise, especially for the 1st point, so
introducing that only in v20 may be better. I am not sure that the
case of this thread is good enough to justify these changes, TBH.
--
Michael

#15Vlad Lesin
vladlesin@gmail.com
In reply to: Michael Paquier (#14)
Re: [PATCH] Fix ProcKill lock-group vs procLatch recycle race

On 5/28/26 04:08, Michael Paquier wrote:

The test relies on a wait, but the wait mechanism in
injection_points will not be able to work as the latch of the backends
get detached earlier after the fix done at 84b9d6bceab6 to prevent the
recycle race.

...

The test was still a useful exercise to prove the bug, though. If one
reverts 84b9d6bceab6 and runs the test, we are able to reproduce the
original bug.

...

I am not sure that the
case of this thread is good enough to justify these changes, TBH.

Yes, I agree with that arguments. Initially, I attached the test to
prove the bug, and I mentioned my doubts about the necessity of pushing
it in my first message. The test is supposed to fail on unfixed code,
and then commit 84b9d6bceab6 is supposed to make it pass. After the
simplifications proposed by Andrey, I thought that having the test even
in its current shape is better than having no test at all, as its
purpose is to catch regressions. The intention to have a separate module
with proc_kill_attach_injection_wait_pid() and the other test helpers
was driven by the wish to minimize changes to the injection point code
and prevent the test from becoming more complex than the fix itself.

--
Best regards,
Vlad

#16Vlad Lesin
vladlesin@gmail.com
In reply to: Michael Paquier (#14)
Re: [PATCH] Fix ProcKill lock-group vs procLatch recycle race

On 5/28/26 04:08, Michael Paquier wrote:

Using two different points, one for the leader and one for the
follower is indeed the correct way to do things.

I am not sure about that. It is quite confusing to have several waiters
when only one of them is awakened when calling the "wake up" function.
If this logic is forbidden, it should be documented in the code
comments, or an error should be thrown when attempting to add more than
one waiter to the same injection point.

--
Best regards,
Vlad