Latch reset ordering bug in condition_variable.c
Hi,
ConditionVariablePrepareToSleep() has a race that can leave you
hanging, introduced by me in the v4 patch. The problem is that that
function resets our latch after adding our process to the wakeup list.
With the right timing, the following sequence can happen:
1. ConditionVariablePrepareToSleep() adds us to the wakeup list.
2. Some other process calls ConditionVariableSignal(). It removes us
from the wakeup list and sets our latch.
3. ConditionVariablePrepareToSleep() resets our latch.
4. We enter (or continue) our predicate loop. Our exit condition
happens not to be true yet, so we call ConditionVariableSleep().
5. ConditionVariableSleep() never returns because WaitEventSet()
blocks. Our latch is not set, yet we are no longer in the wakeup list
so ConditionalVariableSignal() will never set it.
We should reset the latch first. Then there is no way to reach
ConditionVariableSleep() with neither a set latch nor an entry in the
wakeup queue.
See attached. Thoughts?
--
Thomas Munro
http://www.enterprisedb.com
Attachments:
fix-condition-variable-race.patchapplication/octet-stream; name=fix-condition-variable-race.patchDownload
diff --git a/src/backend/storage/lmgr/condition_variable.c b/src/backend/storage/lmgr/condition_variable.c
index 67fe177..6f1ef0b 100644
--- a/src/backend/storage/lmgr/condition_variable.c
+++ b/src/backend/storage/lmgr/condition_variable.c
@@ -71,14 +71,17 @@ ConditionVariablePrepareToSleep(ConditionVariable *cv)
&MyProc->procLatch, NULL);
}
+ /*
+ * Reset my latch before adding myself to the queue and before entering
+ * the caller's predicate loop.
+ */
+ ResetLatch(&MyProc->procLatch);
+
/* Add myself to the wait queue. */
SpinLockAcquire(&cv->mutex);
if (!proclist_contains(&cv->wakeup, pgprocno, cvWaitLink))
proclist_push_tail(&cv->wakeup, pgprocno, cvWaitLink);
SpinLockRelease(&cv->mutex);
-
- /* Reset my latch before entering the caller's predicate loop. */
- ResetLatch(&MyProc->procLatch);
}
/*--------------------------------------------------------------------------
On Thu, Feb 9, 2017 at 6:01 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
ConditionVariablePrepareToSleep() has a race that can leave you
hanging, introduced by me in the v4 patch. The problem is that that
function resets our latch after adding our process to the wakeup list.
With the right timing, the following sequence can happen:1. ConditionVariablePrepareToSleep() adds us to the wakeup list.
2. Some other process calls ConditionVariableSignal(). It removes us
from the wakeup list and sets our latch.
3. ConditionVariablePrepareToSleep() resets our latch.
4. We enter (or continue) our predicate loop. Our exit condition
happens not to be true yet, so we call ConditionVariableSleep().
5. ConditionVariableSleep() never returns because WaitEventSet()
blocks. Our latch is not set, yet we are no longer in the wakeup list
so ConditionalVariableSignal() will never set it.We should reset the latch first. Then there is no way to reach
ConditionVariableSleep() with neither a set latch nor an entry in the
wakeup queue.See attached. Thoughts?
Oops. Committed.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers