injection_points: Switch wait/wakeup to use atomics rather than latches
Hi all,
(Adding Andrey in CC, as I'm sure he is interested in that.)
While looking at the test proposed on the thread about the ProcKill(),
I have been reminded about the fact that relying on latches and a
condition variable for the wait and the wakeups has its limits:
/messages/by-id/aheVjCHmcbXBtiy0@paquier.xyz
In this case, we are trying to synchronize backends once they don't
have latch assigned anymore, which defeats the purpose of wait/wakeup
because the condition variable used in injection_points while waiting
expects a Latch to be set for the processes we are waiting on.
Folks have complained about this limitation a couple of times in the
past, and I never got around to do something about it. While looking
at that I have finished with the patch attached, which was
surprisingly simpler than what I thought was needed. This replaces
the condition variable with a set of atomic counters. The counters
are incremented at wakeup, and the wait checks them on a periodic
basis. The wait loop uses a delay that increases over time, maxed at
100ms so as we can get a good responsiveness on fast machines, without
burning CPU for nothing in tests that require more wait time due to a
tight loop with the counter checks.
One thing worth noticing is the CHECK_FOR_INTERRUPTS() in the wait
loop, which is something we need for the autovacuum test in test_misc
that requires some signaling and interrupt processing.
It may make sense to be conservative and limit ourselves to do this
change on HEAD, but I'd like to suggest a backpatch down to v17 so as
future tests that rely on a such change can be backpatched. I would
need this change for the other test, still consistency in the facility
primes for me here.
Note: The CI seems happy with the patch.
Thoughts or comments?
--
Michael
Attachments:
0001-injection_points-Switch-wait-wakeup-to-rely-on-atomi.patchtext/plain; charset=us-asciiDownload+27-27
On Wed, May 27, 2026 at 10:43 PM Michael Paquier <michael@paquier.xyz> wrote:
While looking at the test proposed on the thread about the ProcKill(),
I have been reminded about the fact that relying on latches and a
condition variable for the wait and the wakeups has its limits:
/messages/by-id/aheVjCHmcbXBtiy0@paquier.xyz
After reading this email, the linked-to email, and the commit message
for the patch, I still don't have a clear understanding of what this
is intended to fix. It seems like it's going to make the
responsiveness worse. In general, we want to replace escalating wait
loops with things that wake up instantly at the right time, and this
is going in the opposite direction.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, May 28, 2026 at 08:40:39AM -0400, Robert Haas wrote:
After reading this email, the linked-to email, and the commit message
for the patch, I still don't have a clear understanding of what this
is intended to fix. It seems like it's going to make the
responsiveness worse. In general, we want to replace escalating wait
loops with things that wake up instantly at the right time, and this
is going in the opposite direction.
This is an exchange between responsiveness of the system and
flexibility. I have had two complaints in the past about the fact
that the waits and wakeups were not doable due to the fact that we
rely on condition variables and latches:
- Postmaster context (lack of dsm access as one). Heikki has
mentioned that to me once as annoying when hacking on tests there at
protocol level, at least.
- Second case as shown on the previous thread, which was a tricky
scenario involving the termination of backends.
One limitation is also related to wait event visibility, which may not
be visible in pg_stat_activity. We could simply add a LOG entry in
injection_wait() once the old count is read, and rely on a server log
lookup in the TAP tests where we cannot use pg_stat_activity.
Compared to redesigning all the facilities that injection_points
relies on, this patch was striking me as having a good balance in
terms of responsiveness (min 10us, max 100ms) vs portability. The
minimum threshold does not really matter much in terms of runtime on
fast machines.
Does this explanation make sense?
--
Michael
On Thu, May 28, 2026 at 7:19 PM Michael Paquier <michael@paquier.xyz> wrote:
On Thu, May 28, 2026 at 08:40:39AM -0400, Robert Haas wrote:
After reading this email, the linked-to email, and the commit message
for the patch, I still don't have a clear understanding of what this
is intended to fix. It seems like it's going to make the
responsiveness worse. In general, we want to replace escalating wait
loops with things that wake up instantly at the right time, and this
is going in the opposite direction.This is an exchange between responsiveness of the system and
flexibility. I have had two complaints in the past about the fact
that the waits and wakeups were not doable due to the fact that we
rely on condition variables and latches:
I'm still struggling to understand. Condition variables and latches
are both designed to allow for nice waits and wakeups.
--
Robert Haas
EDB: http://www.enterprisedb.com
On 29/05/2026 15:48, Robert Haas wrote:
On Thu, May 28, 2026 at 7:19 PM Michael Paquier <michael@paquier.xyz> wrote:
On Thu, May 28, 2026 at 08:40:39AM -0400, Robert Haas wrote:
After reading this email, the linked-to email, and the commit message
for the patch, I still don't have a clear understanding of what this
is intended to fix. It seems like it's going to make the
responsiveness worse. In general, we want to replace escalating wait
loops with things that wake up instantly at the right time, and this
is going in the opposite direction.This is an exchange between responsiveness of the system and
flexibility. I have had two complaints in the past about the fact
that the waits and wakeups were not doable due to the fact that we
rely on condition variables and latches:I'm still struggling to understand. Condition variables and latches
are both designed to allow for nice waits and wakeups.
They only work after you have a PGPROC slot. If you want to inject code
to authentication, or into postmaster, you cannot use them.
- Heikki
On Fri, May 29, 2026 at 9:31 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
They only work after you have a PGPROC slot. If you want to inject code
to authentication, or into postmaster, you cannot use them.
OK, got it now.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Fri, May 29, 2026 at 12:00:46PM -0400, Robert Haas wrote:
On Fri, May 29, 2026 at 9:31 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
They only work after you have a PGPROC slot. If you want to inject code
to authentication, or into postmaster, you cannot use them.OK, got it now.
It seems like Heikki's comment was better worded than mine.
Also mentioned upthread, but the lack of PGPROC also means a lack of
monitoring as wait events cannot be tracked. Currently, we rely on
that in the TAP tests. For cases where the procs are not available, I
don't have a better idea than generating a LOG entry after the wait
counts have been generated (with a PID) and couple that with a poll of
the server logs to let a script understand that a process is in
waiting mode.
As a whole, I don't think that we should try to be fancy with the
implementation, which is why I have used primitives that should work
in any context, and I'm not convinced that this is worth its own
facility if that just means more responsiveness (in most cases the
wait should not take more than a couple ms to notice a wakeup). I'm
open to more fancy ideas, of course.
--
Michael
On 28 May 2026, at 07:43, Michael Paquier <michael@paquier.xyz> wrote:
Andrey in CC, as I'm sure he is interested in that.
Thanks! That's exactly what I need for my tests.
On 29 May 2026, at 18:31, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
I'm still struggling to understand. Condition variables and latches
are both designed to allow for nice waits and wakeups.They only work after you have a PGPROC slot. If you want to inject code to authentication, or into postmaster, you cannot use them.
I have another reason: postmaster death behavior. When we wait on
ConVar and postmaster is kill-9-ed, we release all LWLocks. Which causes
corruption [0]/messages/by-id/B3C69B86-7F82-4111-B97F-0005497BB745@yandex-team.ru, because checkpointer can flush something that's not in WAL.
So I'm trying to build corruption-seeking tests using tool that can induce corruption
in tests.
About the patch:
- inj_state->wait_counts[index]++;
SpinLockRelease(&inj_state->lock);
- /* And broadcast the change to the waiters */
- ConditionVariableBroadcast(&inj_state->wait_point);
+ pg_atomic_fetch_add_u32(&inj_state->wait_counts[index], 1);
Can we move pg_atomic_fetch_add_u32() back under the lock?
We determine slot index under lock, then wakeup slot outside the lock.
In a correctly written test meaning this is not a problem.
However, technically, identity of a slot can change between releasing the lock
and incrementing wait_counts[index].
I'll do another pass tomorrow, maybe something else will catch my eye.
Best regards, Andrey Borodin.
[0]: /messages/by-id/B3C69B86-7F82-4111-B97F-0005497BB745@yandex-team.ru