From a85b2827f4500bc2e7c533feace474bc47086dfa Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Sat, 12 Aug 2023 07:06:08 +1200
Subject: [PATCH] De-pessimize ConditionVariableCancelSleep().

Commit b91dd9de was concerned with a theoretical problem with our
non-atomic condition variable operations.  If you stop sleeping, and
then cancel the sleep in a separate step, you might be signaled in
between, and that could be lost.  That doesn't matter for callers of
ConditionVariableBroadcast(), but callers of ConditionVariableSignal()
might be upset if a signal went missing like this.

In the past I imagine that the main case would be that this sort of race
would be rare and would come up if you used
ConditionVariableTimedSleep(), but then commit bc971f4025c invented a
new way to multiplex a CV with other events using latch waits directly.
Since it doesn't even use ConditionVariableSleep() (which normally puts
us back in the wait list), ConditionVariableCancelSleep() is confused
and forwards a signal when we *have* been woken up by the condition.
That produces a nasty storm of wakeups.

New idea: ConditionVariableCancelSleep() can just return true if you've
been signaled.  Hypothetical users of ConditionVariableSignal() would
then still have a way to deal with rare lost signals if they are
concerned about that problem.

Reported-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/2840876b-4cfe-240f-0a7e-29ffd66711e7%40enterprisedb.com

diff --git a/src/backend/storage/lmgr/condition_variable.c b/src/backend/storage/lmgr/condition_variable.c
index 7e2bbf46d9..910a768206 100644
--- a/src/backend/storage/lmgr/condition_variable.c
+++ b/src/backend/storage/lmgr/condition_variable.c
@@ -223,15 +223,17 @@ ConditionVariableTimedSleep(ConditionVariable *cv, long timeout,
  *
  * Do nothing if nothing is pending; this allows this function to be called
  * during transaction abort to clean up any unfinished CV sleep.
+ *
+ * Return true if we've been signaled.
  */
-void
+bool
 ConditionVariableCancelSleep(void)
 {
 	ConditionVariable *cv = cv_sleep_target;
 	bool		signaled = false;
 
 	if (cv == NULL)
-		return;
+		return false;
 
 	SpinLockAcquire(&cv->mutex);
 	if (proclist_contains(&cv->wakeup, MyProc->pgprocno, cvWaitLink))
@@ -240,15 +242,9 @@ ConditionVariableCancelSleep(void)
 		signaled = true;
 	SpinLockRelease(&cv->mutex);
 
-	/*
-	 * If we've received a signal, pass it on to another waiting process, if
-	 * there is one.  Otherwise a call to ConditionVariableSignal() might get
-	 * lost, despite there being another process ready to handle it.
-	 */
-	if (signaled)
-		ConditionVariableSignal(cv);
-
 	cv_sleep_target = NULL;
+
+	return signaled;
 }
 
 /*
diff --git a/src/include/storage/condition_variable.h b/src/include/storage/condition_variable.h
index 589bdd323c..e218cb2c49 100644
--- a/src/include/storage/condition_variable.h
+++ b/src/include/storage/condition_variable.h
@@ -56,7 +56,7 @@ extern void ConditionVariableInit(ConditionVariable *cv);
 extern void ConditionVariableSleep(ConditionVariable *cv, uint32 wait_event_info);
 extern bool ConditionVariableTimedSleep(ConditionVariable *cv, long timeout,
 										uint32 wait_event_info);
-extern void ConditionVariableCancelSleep(void);
+extern bool ConditionVariableCancelSleep(void);
 
 /*
  * Optionally, ConditionVariablePrepareToSleep can be called before entering
-- 
2.39.2

