Upgrades in row-level locks can deadlock
Hello,
I have recently observed a deadlock on one of our production servers related
to locking a single row in a job table. Two functions were involved in the
deadlock. The first one acquires a "for key share" lock on the row that
represents the job it works on and subsequently updates it with the job's end
time (we need multiple jobs to be operating on a single row concurrently;
that's why there is a "for key share" lock). The other function starts by
acquiring the "for update" lock on the job row and then performs actions that
should not run in parallel with other jobs.
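A rough sketch of the two patterns (the function bodies, names, and the
last_run column are made up for illustration; only the locking order reflects
the real code):

-- worker function: many of these run concurrently against the same job row
begin;
select id from job where name = 'a' for key share;  -- held while the job runs
-- ... do the job's work ...
update job set last_run = now() where id = 1;       -- records the end time; upgrades the row lock
commit;

-- exclusive function: must not overlap with other work on the job
begin;
select id from job where name = 'a' for update;
-- ... actions that must not run in parallel with other jobs ...
commit;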
The deadlock can be easily reproduced with the following statements. The
queries run against a table job (id integer primary key, name text) with a
single row of (1, 'a').
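For completeness, a setup matching that description:

create table job (id integer primary key, name text);
insert into job values (1, 'a');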
X1: select id from job where name = 'a' for key share;
Y: select id from job where name = 'a' for update; -- starts waiting for X1
X2: select id from job where name = 'a' for key share;
X1: update job set name = 'b' where id = 1;
X2: update job set name = 'c' where id = 1; -- starts waiting for X1
X1: rollback;
At this point, Y is terminated by the deadlock detector:
"deadlock detected",
Process 53937 waits for ShareLock on transaction 488; blocked by process 53953.
Process 53953 waits for ExclusiveLock on tuple (0,1) of relation 16386 of database 12931;
blocked by process 53937.
Process 53937: select id from job where name = 'a' for update;
Process 53953: update job set name = 'c' where id = 1;",
The deadlock is between X2 and Y. Y waits for X2 to finish, as X2 holds a
"key share" lock that is incompatible with the "for update" lock Y attempts
to acquire. On the other hand, X2 needs to acquire the row lock to perform
its update, and that is a two-phase process: first take the tuple lock, then
wait for conflicting transactions to finish, releasing the tuple lock
afterward. X2 tries to acquire the tuple lock, but it is owned by Y.
PostgreSQL detects the deadlock and terminates Y.
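In code terms, the two-phase process corresponds to this sequence in
heapam.c (a condensed sketch of the unpatched heap_delete path, visible as
context in the diff below; buffer handling and retry loops omitted):

if (infomask & HEAP_XMAX_IS_MULTI)
{
    /* some multixact member holds a row lock conflicting with ours */
    if (DoesMultiXactIdConflict((MultiXactId) xwait, infomask,
                                LockTupleExclusive))
    {
        LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
        /* phase one: queue up on the tuple-level lock */
        heap_acquire_tuplock(relation, &(tp.t_self), LockTupleExclusive,
                             LockWaitBlock, &have_tuple_lock);
        /* phase two: sleep until the conflicting members are gone */
        MultiXactIdWait((MultiXactId) xwait, MultiXactStatusUpdate, infomask,
                        relation, &(tp.t_self), XLTW_Delete, NULL);
    }
}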
Such a deadlock occurs only when three or more sessions lock the same row
and at least one of them upgrades its lock. With only two sessions the
upgrade does not go through the lock manager, as there are no conflicts with
the locks stored in the tuple.
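For comparison, the two-session variant of the sequence above completes
without deadlocking (as far as I can tell, Y simply proceeds once X1
finishes):

X1: select id from job where name = 'a' for key share;
Y: select id from job where name = 'a' for update; -- starts waiting for X1
X1: update job set name = 'b' where id = 1; -- upgrade bypasses the lock manager
X1: rollback; -- Y acquires the lock and proceeds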
That gave me an idea on how to change PostgreSQL to avoid deadlocking under
the condition above. When we detect a lock upgrade (the current transaction
is already a member of the multixact), we can avoid acquiring the tuple lock;
however, we should still wait for the multixact members that hold conflicting
locks, to avoid acquiring incompatible ones.
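In outline, the change makes the tuple-lock step conditional (condensed from
the attached diff's heap_delete hunk; heap_update is analogous, and
heap_lock_tuple uses a skip_tuple_lock flag for the same purpose):

bool    has_current_xid = false;

if (DoesMultiXactIdConflict((MultiXactId) xwait, infomask,
                            LockTupleExclusive, &has_current_xid))
{
    LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
    /* on a lock upgrade, skip the tuple lock to avoid the deadlock... */
    if (!has_current_xid)
        heap_acquire_tuplock(relation, &(tp.t_self), LockTupleExclusive,
                             LockWaitBlock, &have_tuple_lock);
    /* ...but still wait for the conflicting multixact members */
    MultiXactIdWait((MultiXactId) xwait, MultiXactStatusUpdate, infomask,
                    relation, &(tp.t_self), XLTW_Delete, NULL);
}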
The patch is attached. I had to tweak heap_update and heap_delete alongside
heap_lock_tuple, as they acquire row locks as well. I am not very happy with
overloading DoesMultiXactIdConflict with a second task, checking whether the
current transaction id is among the multixact members; perhaps it is worth
having a separate function for this. We can figure this out if we agree that
this is the problem that needs to be solved, and on the solution. The other
possible objection relates to the statement in README.tuplock that we need to
go through the lock manager to avoid starving waiting exclusive-lockers.
Since this patch omits the tuple lock only when a lock upgrade happens, it
limits the starvation window to the cases where a lock compatible with the
one the waiting process asks for is acquired first and then upgraded to an
incompatible one. Since under the present behavior the same sequence would
likely deadlock and cancel the exclusive waiter altogether, I don't see this
as a strong objection.
Cheers,
Oleksii
Attachments:
deadlock_on_row_lock_upgrades_v1.diff
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 19d2c529d8..0454896151 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -95,7 +95,7 @@ static void GetMultiXactIdHintBits(MultiXactId multi, uint16 *new_infomask,
static TransactionId MultiXactIdGetUpdateXid(TransactionId xmax,
uint16 t_infomask);
static bool DoesMultiXactIdConflict(MultiXactId multi, uint16 infomask,
- LockTupleMode lockmode);
+ LockTupleMode lockmode, bool *has_current_xid);
static void MultiXactIdWait(MultiXactId multi, MultiXactStatus status, uint16 infomask,
Relation rel, ItemPointer ctid, XLTW_Oper oper,
int *remaining);
@@ -2556,15 +2556,22 @@ l1:
*/
if (infomask & HEAP_XMAX_IS_MULTI)
{
+ bool has_current_xid = false;
/* wait for multixact */
if (DoesMultiXactIdConflict((MultiXactId) xwait, infomask,
- LockTupleExclusive))
+ LockTupleExclusive, &has_current_xid))
{
LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
- /* acquire tuple lock, if necessary */
- heap_acquire_tuplock(relation, &(tp.t_self), LockTupleExclusive,
- LockWaitBlock, &have_tuple_lock);
+ /*
+ * Acquire the lock, if necessary. Avoid acquiring it when
+ * getting the stronger lock while holding a weaker one to
+ * prevent deadlocks with other lockers that already have
+ * a tuple lock and are waiting for our transaction to finish.
+ */
+ if (!has_current_xid)
+ heap_acquire_tuplock(relation, &(tp.t_self), LockTupleExclusive,
+ LockWaitBlock, &have_tuple_lock);
/* wait for multixact */
MultiXactIdWait((MultiXactId) xwait, MultiXactStatusUpdate, infomask,
@@ -3135,15 +3142,23 @@ l2:
{
TransactionId update_xact;
int remain;
+ bool has_current_xid = false;
if (DoesMultiXactIdConflict((MultiXactId) xwait, infomask,
- *lockmode))
+ *lockmode, &has_current_xid))
{
LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
- /* acquire tuple lock, if necessary */
- heap_acquire_tuplock(relation, &(oldtup.t_self), *lockmode,
- LockWaitBlock, &have_tuple_lock);
+ /*
+ * Acquire the lock, if necessary. Avoid acquiring it when
+ * getting the stronger lock while holding a weaker one to
+ * prevent deadlocks with other lockers that already have
+ * a tuple lock and are waiting for our transaction to finish.
+ */
+
+ if (!has_current_xid)
+ heap_acquire_tuplock(relation, &(oldtup.t_self), *lockmode,
+ LockWaitBlock, &have_tuple_lock);
/* wait for multixact */
MultiXactIdWait((MultiXactId) xwait, mxact_status, infomask,
@@ -4037,6 +4052,7 @@ l3:
uint16 infomask;
uint16 infomask2;
bool require_sleep;
+ bool skip_tuple_lock;
ItemPointerData t_ctid;
/* must copy state data before unlocking buffer */
@@ -4062,6 +4078,7 @@ l3:
if (first_time)
{
first_time = false;
+ skip_tuple_lock = false;
if (infomask & HEAP_XMAX_IS_MULTI)
{
@@ -4090,6 +4107,21 @@ l3:
result = TM_Ok;
goto out_unlocked;
}
+ else
+ {
+ /*
+ * Avoid acquiring the tuple level lock.
+ *
+ * Otherwise, when promoting a weaker lock, we may
+ * deadlock with another potential locker that has
+ * acquired a conflicting row lock and waits for our
+ * transaction to finish.
+ *
+ * Note that we will wait for the multixact
+ * if required to avoid acquiring conflicting locks.
+ */
+ skip_tuple_lock = true;
+ }
}
if (members)
@@ -4244,7 +4276,7 @@ l3:
if (infomask & HEAP_XMAX_IS_MULTI)
{
if (!DoesMultiXactIdConflict((MultiXactId) xwait, infomask,
- mode))
+ mode, NULL))
{
/*
* No conflict, but if the xmax changed under us in the
@@ -4327,7 +4359,8 @@ l3:
* this arranges that we stay at the head of the line while
* rechecking tuple state.
*/
- if (!heap_acquire_tuplock(relation, tid, mode, wait_policy,
+ if (!skip_tuple_lock &&
+ !heap_acquire_tuplock(relation, tid, mode, wait_policy,
&have_tuple_lock))
{
/*
@@ -6525,16 +6558,21 @@ HeapTupleGetUpdateXid(HeapTupleHeader tuple)
* tuple lock of the given strength?
*
* The passed infomask pairs up with the given multixact in the tuple header.
+ * If has_current_xid is not NULL, it is set to true when the given multixact
+ * contains the current transaction.
*/
static bool
DoesMultiXactIdConflict(MultiXactId multi, uint16 infomask,
- LockTupleMode lockmode)
+ LockTupleMode lockmode, bool *has_current_xid)
{
int nmembers;
MultiXactMember *members;
bool result = false;
LOCKMODE wanted = tupleLockExtraInfo[lockmode].hwlock;
+ if (has_current_xid != NULL)
+ *has_current_xid = false;
+
if (HEAP_LOCKED_UPGRADED(infomask))
return false;
@@ -6549,15 +6587,24 @@ DoesMultiXactIdConflict(MultiXactId multi, uint16 infomask,
TransactionId memxid;
LOCKMODE memlockmode;
- memlockmode = LOCKMODE_from_mxstatus(members[i].status);
+ if (result && (has_current_xid == NULL || *has_current_xid))
+ break;
- /* ignore members that don't conflict with the lock we want */
- if (!DoLockModesConflict(memlockmode, wanted))
- continue;
+ memlockmode = LOCKMODE_from_mxstatus(members[i].status);
/* ignore members from current xact */
memxid = members[i].xid;
if (TransactionIdIsCurrentTransactionId(memxid))
+ {
+ if (has_current_xid != NULL)
+ *has_current_xid = true;
+ continue;
+ }
+ else if (result)
+ continue;
+
+ /* ignore members that don't conflict with the lock we want */
+ if (!DoLockModesConflict(memlockmode, wanted))
continue;
if (ISUPDATE_from_mxstatus(members[i].status))
@@ -6576,10 +6623,11 @@ DoesMultiXactIdConflict(MultiXactId multi, uint16 infomask,
/*
* Whatever remains are either live lockers that conflict with our
* wanted lock, and updaters that are not aborted. Those conflict
- * with what we want, so return true.
+ * with what we want. Set up to return true, but keep going to
+ * look for the current transaction among the multixact members,
+ * if needed.
*/
result = true;
- break;
}
pfree(members);
}
Hello,
On 2019-May-22, Oleksii Kliukin wrote:
I have recently observed a deadlock on one of our production servers related
to locking a single row in a job table. Two functions were involved in the
deadlock. The first one acquires a "for key share" lock on the row that
represents the job it works on and subsequently updates it with the job's end
time (we need multiple jobs to be operating on a single row concurrently;
that's why there is a "for key share" lock). The other function starts by
acquiring the "for update" lock on the job row and then performs actions that
should not run in parallel with other jobs.

The deadlock can be easily reproduced with the following statements. The
queries run against a table job (id integer primary key, name text) with a
single row of (1, 'a').
Hmm, great find.
X1: select id from job where name = 'a' for key share;
Y: select id from job where name = 'a' for update; -- starts waiting for X1
X2: select id from job where name = 'a' for key share;
X1: update job set name = 'b' where id = 1;
X2: update job set name = 'c' where id = 1; -- starts waiting for X1
X1: rollback;

At this point, Y is terminated by the deadlock detector:
Yeah, this seems undesirable in general terms. Here's a quick
isolationtester spec that reproduces the problem:
setup {
drop table if exists tlu_job;
create table tlu_job (id integer primary key, name text);
insert into tlu_job values (1, 'a');
}
teardown {
drop table tlu_job;
}
session "s1"
setup { begin; }
step "s1_keyshare" { select id from tlu_job where name = 'a' for key share; }
step "s1_update" { update tlu_job set name = 'b' where id = 1; }
step "s1_rollback" { rollback; }
session "s2"
setup { begin; }
step "s2_keyshare" { select id from tlu_job where name = 'a' for key share; }
step "s2_update" { update tlu_job set name = 'c' where id = 1; }
step "s2_commit" { commit; }
session "s3"
setup { begin; }
step "s3_forupd" { select id from tlu_job where name = 'a' for update; }
teardown { commit; }
# Alexey's permutation
permutation "s1_keyshare" "s3_forupd" "s2_keyshare" "s1_update" "s2_update" "s1_rollback" "s2_commit"
(X1 is s1, X2 is s2, Y is s3).
Permutations such as that one report a deadlock with the original code, and
do not report a deadlock after your proposed patch.
permutation "s1_keyshare" "s1_update" "s2_keyshare" "s3_forupd" "s2_update" "s1_rollback" "s2_commit"
But semantically, I wonder if your transactions are correct. If you intend
to modify the row in s1 and s2, shouldn't you be acquiring FOR NO KEY UPDATE
lock instead? I don't see how s1 and s2 can coexist peacefully. Also, can
your Y transaction use FOR NO KEY UPDATE instead ... unless you intend to
delete the tuple in that transaction?
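For reference, the row-level lock conflict matrix from the PostgreSQL
documentation (FOR NO KEY UPDATE conflicts with FOR SHARE and with itself,
but not with FOR KEY SHARE):

                    KEY SHARE   SHARE   NO KEY UPDATE   UPDATE
FOR KEY SHARE           -         -           -            X
FOR SHARE               -         -           X            X
FOR NO KEY UPDATE       -         X           X            X
FOR UPDATE              X         X           X            X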
I'm mulling over your proposed fix. I don't much like the idea that
DoesMultiXactIdConflict() returns that new boolean -- seems pretty
ad-hoc -- but I don't see any way to do better than that ... (If we get
down to details, DoesMultiXactIdConflict needn't initialize that
boolean: better to let the callers do it.)
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Oleksii Kliukin <alexk@hintbits.com> wrote:
Hi,
Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2019-May-22, Oleksii Kliukin wrote:
X1: select id from job where name = 'a' for key share;
Y: select id from job where name = 'a' for update; -- starts waiting for X1
X2: select id from job where name = 'a' for key share;
X1: update job set name = 'b' where id = 1;
X2: update job set name = 'c' where id = 1; -- starts waiting for X1
X1: rollback;

At this point, Y is terminated by the deadlock detector:
Yeah, this seems undesirable in general terms. Here's a quick
isolationtester spec that reproduces the problem:

setup {
drop table if exists tlu_job;
create table tlu_job (id integer primary key, name text);
insert into tlu_job values (1, 'a');
}

teardown {
drop table tlu_job;
}

session "s1"
setup { begin; }
step "s1_keyshare" { select id from tlu_job where name = 'a' for key share; }
step "s1_update" { update tlu_job set name = 'b' where id = 1; }
step "s1_rollback" { rollback; }session "s2"
setup { begin; }
step "s2_keyshare" { select id from tlu_job where name = 'a' for key share; }
step "s2_update" { update tlu_job set name = 'c' where id = 1; }
step "s2_commit" { commit; }session "s3"
setup { begin; }
step "s3_forupd" { select id from tlu_job where name = 'a' for update; }
teardown { commit; }

Thank you! I can make it even simpler: s1 just acquires a for share lock, s3
asks for a for update one, and s2 takes a for share lock first and then tries
to acquire a for update one; once s1 finishes, s3 deadlocks.

But semantically, I wonder if your transactions are correct. If you
intend to modify the row in s1 and s2, shouldn't you be acquiring FOR NO
KEY UPDATE lock instead? I don't see how s1 and s2 can coexist
peacefully. Also, can your Y transaction use FOR NO KEY UPDATE instead
... unless you intend to delete the tuple in that transaction?

It is correct. I wanted to make sure jobs that acquire the for key share lock
can run concurrently most of the time; they only execute one update at the
end of the job, and it is just to update the last run timestamp.

I'm mulling over your proposed fix. I don't much like the idea that
DoesMultiXactIdConflict() returns that new boolean -- seems pretty
ad-hoc -- but I don't see any way to do better than that ... (If we get
down to details, DoesMultiXactIdConflict needn't initialize that
boolean: better to let the callers do it.)

I am also not happy about the new parameter to DoesMultiXactIdConflict, but
calling a separate function to fetch the presence of the current transaction
in the multixact would mean doing the job of DoesMultiXactIdConflict twice.
I am open to suggestions on how to make it nicer.

Attached is a slightly modified patch that avoids initializing
has_current_xid inside DoesMultiXactIdConflict and should apply cleanly to
the current master.
And here is the v3 that also includes the isolation test I described above
(quoting my previous message in full as I accidentally sent it off-list,
sorry about that).
Cheers,
Oleksii
Attachments:
deadlock_on_row_lock_upgrades_v3.diff
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 8ac0f8a513..45100bab1d 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -95,7 +95,7 @@ static void GetMultiXactIdHintBits(MultiXactId multi, uint16 *new_infomask,
static TransactionId MultiXactIdGetUpdateXid(TransactionId xmax,
uint16 t_infomask);
static bool DoesMultiXactIdConflict(MultiXactId multi, uint16 infomask,
- LockTupleMode lockmode);
+ LockTupleMode lockmode, bool *has_current_xid);
static void MultiXactIdWait(MultiXactId multi, MultiXactStatus status, uint16 infomask,
Relation rel, ItemPointer ctid, XLTW_Oper oper,
int *remaining);
@@ -2547,15 +2547,22 @@ l1:
*/
if (infomask & HEAP_XMAX_IS_MULTI)
{
+ bool has_current_xid = false;
/* wait for multixact */
if (DoesMultiXactIdConflict((MultiXactId) xwait, infomask,
- LockTupleExclusive))
+ LockTupleExclusive, &has_current_xid))
{
LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
- /* acquire tuple lock, if necessary */
- heap_acquire_tuplock(relation, &(tp.t_self), LockTupleExclusive,
- LockWaitBlock, &have_tuple_lock);
+ /*
+ * Acquire the lock, if necessary. Avoid acquiring it when
+ * getting the stronger lock while holding a weaker one to
+ * prevent deadlocks with other lockers that already have
+ * a tuple lock and are waiting for our transaction to finish.
+ */
+ if (!has_current_xid)
+ heap_acquire_tuplock(relation, &(tp.t_self), LockTupleExclusive,
+ LockWaitBlock, &have_tuple_lock);
/* wait for multixact */
MultiXactIdWait((MultiXactId) xwait, MultiXactStatusUpdate, infomask,
@@ -3126,15 +3133,23 @@ l2:
{
TransactionId update_xact;
int remain;
+ bool has_current_xid = false;
if (DoesMultiXactIdConflict((MultiXactId) xwait, infomask,
- *lockmode))
+ *lockmode, &has_current_xid))
{
LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
- /* acquire tuple lock, if necessary */
- heap_acquire_tuplock(relation, &(oldtup.t_self), *lockmode,
- LockWaitBlock, &have_tuple_lock);
+ /*
+ * Acquire the lock, if necessary. Avoid acquiring it when
+ * getting the stronger lock while holding a weaker one to
+ * prevent deadlocks with other lockers that already have
+ * a tuple lock and are waiting for our transaction to finish.
+ */
+
+ if (!has_current_xid)
+ heap_acquire_tuplock(relation, &(oldtup.t_self), *lockmode,
+ LockWaitBlock, &have_tuple_lock);
/* wait for multixact */
MultiXactIdWait((MultiXactId) xwait, mxact_status, infomask,
@@ -4028,6 +4043,7 @@ l3:
uint16 infomask;
uint16 infomask2;
bool require_sleep;
+ bool skip_tuple_lock;
ItemPointerData t_ctid;
/* must copy state data before unlocking buffer */
@@ -4053,6 +4069,7 @@ l3:
if (first_time)
{
first_time = false;
+ skip_tuple_lock = false;
if (infomask & HEAP_XMAX_IS_MULTI)
{
@@ -4081,6 +4098,21 @@ l3:
result = TM_Ok;
goto out_unlocked;
}
+ else
+ {
+ /*
+ * Avoid acquiring the tuple level lock.
+ *
+ * Otherwise, when promoting a weaker lock, we may
+ * deadlock with another potential locker that has
+ * acquired a conflicting row lock and waits for our
+ * transaction to finish.
+ *
+ * Note that we will wait for the multixact
+ * if required to avoid acquiring conflicting locks.
+ */
+ skip_tuple_lock = true;
+ }
}
if (members)
@@ -4235,7 +4267,7 @@ l3:
if (infomask & HEAP_XMAX_IS_MULTI)
{
if (!DoesMultiXactIdConflict((MultiXactId) xwait, infomask,
- mode))
+ mode, NULL))
{
/*
* No conflict, but if the xmax changed under us in the
@@ -4318,7 +4350,8 @@ l3:
* this arranges that we stay at the head of the line while
* rechecking tuple state.
*/
- if (!heap_acquire_tuplock(relation, tid, mode, wait_policy,
+ if (!skip_tuple_lock &&
+ !heap_acquire_tuplock(relation, tid, mode, wait_policy,
&have_tuple_lock))
{
/*
@@ -6516,10 +6549,12 @@ HeapTupleGetUpdateXid(HeapTupleHeader tuple)
* tuple lock of the given strength?
*
* The passed infomask pairs up with the given multixact in the tuple header.
+ * If has_current_xid is not NULL, it is set to true when the given multixact
+ * contains the current transaction.
*/
static bool
DoesMultiXactIdConflict(MultiXactId multi, uint16 infomask,
- LockTupleMode lockmode)
+ LockTupleMode lockmode, bool *has_current_xid)
{
int nmembers;
MultiXactMember *members;
@@ -6540,15 +6575,24 @@ DoesMultiXactIdConflict(MultiXactId multi, uint16 infomask,
TransactionId memxid;
LOCKMODE memlockmode;
- memlockmode = LOCKMODE_from_mxstatus(members[i].status);
+ if (result && (has_current_xid == NULL || *has_current_xid))
+ break;
- /* ignore members that don't conflict with the lock we want */
- if (!DoLockModesConflict(memlockmode, wanted))
- continue;
+ memlockmode = LOCKMODE_from_mxstatus(members[i].status);
/* ignore members from current xact */
memxid = members[i].xid;
if (TransactionIdIsCurrentTransactionId(memxid))
+ {
+ if (has_current_xid != NULL)
+ *has_current_xid = true;
+ continue;
+ }
+ else if (result)
+ continue;
+
+ /* ignore members that don't conflict with the lock we want */
+ if (!DoLockModesConflict(memlockmode, wanted))
continue;
if (ISUPDATE_from_mxstatus(members[i].status))
@@ -6567,10 +6611,11 @@ DoesMultiXactIdConflict(MultiXactId multi, uint16 infomask,
/*
* Whatever remains are either live lockers that conflict with our
* wanted lock, and updaters that are not aborted. Those conflict
- * with what we want, so return true.
+ * with what we want. Set up to return true, but keep going to
+ * look for the current transaction among the multixact members,
+ * if needed.
*/
result = true;
- break;
}
pfree(members);
}
diff --git a/src/test/isolation/expected/rowlock-upgrade-deadlock.out b/src/test/isolation/expected/rowlock-upgrade-deadlock.out
new file mode 100644
index 0000000000..537c522535
--- /dev/null
+++ b/src/test/isolation/expected/rowlock-upgrade-deadlock.out
@@ -0,0 +1,24 @@
+Parsed test spec with 3 sessions
+
+starting permutation: s1l s2l s3sl s3ul s1r s3r s2r
+step s1l: select id from t where id = 1 for share;
+id
+
+1
+step s2l: select id from t where id = 1 for update; <waiting ...>
+step s3sl: select id from t where id = 1 for share;
+id
+
+1
+step s3ul: select id from t where id = 1 for update; <waiting ...>
+step s1r: rollback;
+step s3ul: <... completed>
+id
+
+1
+step s3r: rollback;
+step s2l: <... completed>
+id
+
+1
+step s2r: rollback;
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 889b4d827a..8e8c9a919f 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -80,6 +80,7 @@ test: partition-key-update-1
test: partition-key-update-2
test: partition-key-update-3
test: partition-key-update-4
+test: rowlock-upgrade-deadlock
test: plpgsql-toast
test: truncate-conflict
test: serializable-parallel
diff --git a/src/test/isolation/specs/rowlock-upgrade-deadlock.spec b/src/test/isolation/specs/rowlock-upgrade-deadlock.spec
new file mode 100644
index 0000000000..0c031ef07f
--- /dev/null
+++ b/src/test/isolation/specs/rowlock-upgrade-deadlock.spec
@@ -0,0 +1,35 @@
+# This test checks that multiple sessions locking a single row in a table
+# do not deadlock with each other, particularly when one of them upgrades its
+# lock while the others are waiting.
+
+setup
+{
+ drop table if exists t;
+ create table t (id integer);
+
+ insert into t values(1);
+}
+
+teardown
+{
+ drop table t;
+}
+
+session "s1"
+setup { begin; }
+step "s1l" { select id from t where id = 1 for share; }
+step "s1r" { rollback; }
+
+session "s2"
+setup { begin; }
+step "s2l" { select id from t where id = 1 for update; }
+step "s2r" { rollback; }
+
+session "s3"
+setup { begin; }
+step "s3sl" { select id from t where id = 1 for share; }
+step "s3ul" { select id from t where id = 1 for update; }
+step "s3r" { rollback; }
+
+# test that s2 will not deadlock with s3 when s1 is rolled back
+permutation "s1l" "s2l" "s3sl" "s3ul" "s1r" "s3r" "s2r"
On 2019-Jun-12, Oleksii Kliukin wrote:
Thank you! I can make it even simpler; s1 just acquires for share lock, s3
gets for update one and s2 takes for share lock first, and then tries to
acquire for update one; once s1 finishes, s3 deadlocks.
Cool. I think it would be worthwhile to include a number of reasonable
permutations instead of just one, and make sure they all work correctly.
I don't think we need to include all possible permutations, just a few.
I think we need at least enough permutations to cover the two places of
the code that are modified by the patch, for both values of
have_current_xid (so there should be four permutations, I think).
Please don't simplify the table name to just "t" -- the reason I used
another name is that we want these tests to be able to run concurrently
at some point; ref.
/messages/by-id/20180124231006.z7spaz5gkzbdvob5@alvherre.pgsql
But semantically, I wonder if your transactions are correct. If you
intend to modify the row in s1 and s2, shouldn't you be acquiring FOR NO
KEY UPDATE lock instead? I don't see how s1 and s2 can coexist
peacefully. Also, can your Y transaction use FOR NO KEY UPDATE instead
... unless you intend to delete the tuple in that transaction?

It is correct. I wanted to make sure jobs that acquire the for key share lock
can run concurrently most of the time; they only execute one update at the
end of the job, and it is just to update the last run timestamp.
I see. Under READ COMMITTED it works okay, I suppose.
I'm mulling over your proposed fix. I don't much like the idea that
DoesMultiXactIdConflict() returns that new boolean -- seems pretty
ad-hoc -- but I don't see any way to do better than that ... (If we get
down to details, DoesMultiXactIdConflict needn't initialize that
boolean: better to let the callers do it.)

I am also not happy about the new parameter to DoesMultiXactIdConflict, but
calling a separate function to fetch the presence of the current transaction
in the multixact would mean doing the job of DoesMultiXactIdConflict twice.
I am open to suggestions on how to make it nicer.
Yeah, I didn't find anything better either. We could make things more
complex so that we resolve the multixact once and then extract the two
separate bits of information that we need from that ... but it ends up
being uglier and messier for no real gain. So let's go with your
original idea.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello,
Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2019-Jun-12, Oleksii Kliukin wrote:
Thank you! I can make it even simpler: s1 just acquires a for share lock, s3
asks for a for update one, and s2 takes a for share lock first and then tries
to acquire a for update one; once s1 finishes, s3 deadlocks.

Cool. I think it would be worthwhile to include a number of reasonable
permutations instead of just one, and make sure they all work correctly.
I don't think we need to include all possible permutations, just a few.
I think we need at least enough permutations to cover the two places of
the code that are modified by the patch, for both values of
have_current_xid (so there should be four permutations, I think).
Makes sense. For symmetry, I have included permutations that perform lock
upgrades in one session and ones that don't, while the other sessions acquire
locks, do updates, or do deletes. For those that don't upgrade locks, the
test checks that the locks are acquired in the correct order.
Please don't simplify the table name to just "t" -- the reason I used
another name is that we want these tests to be able to run concurrently
at some point; ref.
/messages/by-id/20180124231006.z7spaz5gkzbdvob5@alvherre.pgsql
Alright, thanks.
But semantically, I wonder if your transactions are correct. If you
intend to modify the row in s1 and s2, shouldn't you be acquiring FOR NO
KEY UPDATE lock instead? I don't see how s1 and s2 can coexist
peacefully. Also, can your Y transaction use FOR NO KEY UPDATE instead
... unless you intend to delete the tuple in that transaction?

It is correct. I wanted to make sure jobs that acquire the for key share lock
can run concurrently most of the time; they only execute one update at the
end of the job, and it is just to update the last run timestamp.

I see. Under READ COMMITTED it works okay, I suppose.
I'm mulling over your proposed fix. I don't much like the idea that
DoesMultiXactIdConflict() returns that new boolean -- seems pretty
ad-hoc -- but I don't see any way to do better than that ... (If we get
down to details, DoesMultiXactIdConflict needn't initialize that
boolean: better to let the callers do it.)

I am also not happy about the new parameter to DoesMultiXactIdConflict, but
calling a separate function to fetch the presence of the current transaction
in the multixact would mean doing the job of DoesMultiXactIdConflict twice.
I am open to suggestions on how to make it nicer.

Yeah, I didn't find anything better either. We could make things more
complex so that we resolve the multixact once and then extract the two
separate bits of information that we need from that ... but it ends up
being uglier and messier for no real gain. So let's go with your
original idea.
Ok, the v4 is attached. I have addressed your suggestion for the isolation
tests and added a paragraph to README.tuplock explaining why we skip
LockTuple to avoid a deadlock in the session that upgrades its lock.
Cheers,
Oleksii
Attachments:
deadlock_on_row_lock_upgrades_v4.diff
diff --git a/src/backend/access/heap/README.tuplock b/src/backend/access/heap/README.tuplock
index b2f3a4ce90..d7a085366a 100644
--- a/src/backend/access/heap/README.tuplock
+++ b/src/backend/access/heap/README.tuplock
@@ -36,6 +36,17 @@ do LockTuple as well, if there is any conflict, to ensure that they don't
starve out waiting exclusive-lockers. However, if there is not any active
conflict for a tuple, we don't incur any extra overhead.
+We make an exception from the rule above for those lockers that already hold
+some lock on a tuple and attempt to acquire a stronger one on it. We have to
+let them skip the LockTuple() call even when there are conflicts, provided
+that the target tuple is being locked, updated or deleted by multiple sessions
+concurrently. If we didn't make such an exception we could have a deadlock,
+e.g. between a session that has recorded its weaker lock in the tuple header
+and is waiting on a LockTuple() call to upgrade to the stronger lock level and
+another one that has already acquired the tuple-level lock from the
+LockTuple() but is at the same time waiting for the first session's
+transaction to finish due to a conflicting lock level.
+
We provide four levels of tuple locking strength: SELECT FOR UPDATE obtains an
exclusive lock which prevents any kind of modification of the tuple. This is
the lock level that is implicitly taken by DELETE operations, and also by
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 8ac0f8a513..45100bab1d 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -95,7 +95,7 @@ static void GetMultiXactIdHintBits(MultiXactId multi, uint16 *new_infomask,
static TransactionId MultiXactIdGetUpdateXid(TransactionId xmax,
uint16 t_infomask);
static bool DoesMultiXactIdConflict(MultiXactId multi, uint16 infomask,
- LockTupleMode lockmode);
+ LockTupleMode lockmode, bool *has_current_xid);
static void MultiXactIdWait(MultiXactId multi, MultiXactStatus status, uint16 infomask,
Relation rel, ItemPointer ctid, XLTW_Oper oper,
int *remaining);
@@ -2547,15 +2547,22 @@ l1:
*/
if (infomask & HEAP_XMAX_IS_MULTI)
{
+ bool has_current_xid = false;
/* wait for multixact */
if (DoesMultiXactIdConflict((MultiXactId) xwait, infomask,
- LockTupleExclusive))
+ LockTupleExclusive, &has_current_xid))
{
LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
- /* acquire tuple lock, if necessary */
- heap_acquire_tuplock(relation, &(tp.t_self), LockTupleExclusive,
- LockWaitBlock, &have_tuple_lock);
+ /*
+ * Acquire the lock, if necessary. Avoid acquiring it when
+ * getting the stronger lock while holding a weaker one to
+ * prevent deadlocks with other lockers that already have
+ * a tuple lock and are waiting for our transaction to finish.
+ */
+ if (!has_current_xid)
+ heap_acquire_tuplock(relation, &(tp.t_self), LockTupleExclusive,
+ LockWaitBlock, &have_tuple_lock);
/* wait for multixact */
MultiXactIdWait((MultiXactId) xwait, MultiXactStatusUpdate, infomask,
@@ -3126,15 +3133,23 @@ l2:
{
TransactionId update_xact;
int remain;
+ bool has_current_xid = false;
if (DoesMultiXactIdConflict((MultiXactId) xwait, infomask,
- *lockmode))
+ *lockmode, &has_current_xid))
{
LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
- /* acquire tuple lock, if necessary */
- heap_acquire_tuplock(relation, &(oldtup.t_self), *lockmode,
- LockWaitBlock, &have_tuple_lock);
+ /*
+ * Acquire the lock, if necessary. Avoid acquiring it when
+ * getting the stronger lock while holding a weaker one to
+ * prevent deadlocks with other lockers that already have
+ * a tuple lock and are waiting for our transaction to finish.
+ */
+
+ if (!has_current_xid)
+ heap_acquire_tuplock(relation, &(oldtup.t_self), *lockmode,
+ LockWaitBlock, &have_tuple_lock);
/* wait for multixact */
MultiXactIdWait((MultiXactId) xwait, mxact_status, infomask,
@@ -4028,6 +4043,7 @@ l3:
uint16 infomask;
uint16 infomask2;
bool require_sleep;
+ bool skip_tuple_lock;
ItemPointerData t_ctid;
/* must copy state data before unlocking buffer */
@@ -4053,6 +4069,7 @@ l3:
if (first_time)
{
first_time = false;
+ skip_tuple_lock = false;
if (infomask & HEAP_XMAX_IS_MULTI)
{
@@ -4081,6 +4098,21 @@ l3:
result = TM_Ok;
goto out_unlocked;
}
+ else
+ {
+ /*
+ * Avoid acquiring the tuple level lock.
+ *
+ * Otherwise, when promoting a weaker lock, we may
+ * deadlock with another potential locker that has
+ * acquired a conflicting row lock and waits for our
+ * transaction to finish.
+ *
+ * Note that we will wait for the multixact
+ * if required to avoid acquiring conflicting locks.
+ */
+ skip_tuple_lock = true;
+ }
}
if (members)
@@ -4235,7 +4267,7 @@ l3:
if (infomask & HEAP_XMAX_IS_MULTI)
{
if (!DoesMultiXactIdConflict((MultiXactId) xwait, infomask,
- mode))
+ mode, NULL))
{
/*
* No conflict, but if the xmax changed under us in the
@@ -4318,7 +4350,8 @@ l3:
* this arranges that we stay at the head of the line while
* rechecking tuple state.
*/
- if (!heap_acquire_tuplock(relation, tid, mode, wait_policy,
+ if (!skip_tuple_lock &&
+ !heap_acquire_tuplock(relation, tid, mode, wait_policy,
&have_tuple_lock))
{
/*
@@ -6516,10 +6549,12 @@ HeapTupleGetUpdateXid(HeapTupleHeader tuple)
* tuple lock of the given strength?
*
* The passed infomask pairs up with the given multixact in the tuple header.
+ * If has_current_xid is not NULL, it is set to true when the given multixact
+ * contains the current transaction.
*/
static bool
DoesMultiXactIdConflict(MultiXactId multi, uint16 infomask,
- LockTupleMode lockmode)
+ LockTupleMode lockmode, bool *has_current_xid)
{
int nmembers;
MultiXactMember *members;
@@ -6540,15 +6575,24 @@ DoesMultiXactIdConflict(MultiXactId multi, uint16 infomask,
TransactionId memxid;
LOCKMODE memlockmode;
- memlockmode = LOCKMODE_from_mxstatus(members[i].status);
+ if (result && (has_current_xid == NULL || *has_current_xid))
+ break;
- /* ignore members that don't conflict with the lock we want */
- if (!DoLockModesConflict(memlockmode, wanted))
- continue;
+ memlockmode = LOCKMODE_from_mxstatus(members[i].status);
/* ignore members from current xact */
memxid = members[i].xid;
if (TransactionIdIsCurrentTransactionId(memxid))
+ {
+ if (has_current_xid != NULL)
+ *has_current_xid = true;
+ continue;
+ }
+ else if (result)
+ continue;
+
+ /* ignore members that don't conflict with the lock we want */
+ if (!DoLockModesConflict(memlockmode, wanted))
continue;
if (ISUPDATE_from_mxstatus(members[i].status))
@@ -6567,10 +6611,11 @@ DoesMultiXactIdConflict(MultiXactId multi, uint16 infomask,
/*
* Whatever remains are either live lockers that conflict with our
* wanted lock, and updaters that are not aborted. Those conflict
- * with what we want, so return true.
+ * with what we want. Set up to return true, but keep going to
+ * look for the current transaction among the multixact members,
+ * if needed.
*/
result = true;
- break;
}
pfree(members);
}
diff --git a/src/test/isolation/expected/rowlock-upgrade-deadlock.out b/src/test/isolation/expected/rowlock-upgrade-deadlock.out
new file mode 100644
index 0000000000..99d12b125b
--- /dev/null
+++ b/src/test/isolation/expected/rowlock-upgrade-deadlock.out
@@ -0,0 +1,110 @@
+Parsed test spec with 3 sessions
+
+starting permutation: s1_share s2_for_update s3_share s3_for_update s1_rollback s3_rollback s2_rollback
+step s1_share: select id from tlu_job where id = 1 for share;
+id
+
+1
+step s2_for_update: select id from tlu_job where id = 1 for update; <waiting ...>
+step s3_share: select id from tlu_job where id = 1 for share;
+id
+
+1
+step s3_for_update: select id from tlu_job where id = 1 for update; <waiting ...>
+step s1_rollback: rollback;
+step s3_for_update: <... completed>
+id
+
+1
+step s3_rollback: rollback;
+step s2_for_update: <... completed>
+id
+
+1
+step s2_rollback: rollback;
+
+starting permutation: s1_keyshare s2_for_update s3_keyshare s1_update s3_update s1_rollback s3_rollback s2_rollback
+step s1_keyshare: select id from tlu_job where id = 1 for key share;
+id
+
+1
+step s2_for_update: select id from tlu_job where id = 1 for update; <waiting ...>
+step s3_keyshare: select id from tlu_job where id = 1 for key share;
+id
+
+1
+step s1_update: update tlu_job set name = 'b' where id = 1;
+step s3_update: update tlu_job set name = 'b' where id = 1; <waiting ...>
+step s1_rollback: rollback;
+step s3_update: <... completed>
+step s3_rollback: rollback;
+step s2_for_update: <... completed>
+id
+
+1
+step s2_rollback: rollback;
+
+starting permutation: s1_keyshare s2_for_update s3_keyshare s3_delete s1_rollback s3_rollback s2_rollback
+step s1_keyshare: select id from tlu_job where id = 1 for key share;
+id
+
+1
+step s2_for_update: select id from tlu_job where id = 1 for update; <waiting ...>
+step s3_keyshare: select id from tlu_job where id = 1 for key share;
+id
+
+1
+step s3_delete: delete from tlu_job where id = 1; <waiting ...>
+step s1_rollback: rollback;
+step s3_delete: <... completed>
+step s3_rollback: rollback;
+step s2_for_update: <... completed>
+id
+
+1
+step s2_rollback: rollback;
+
+starting permutation: s1_share s2_for_update s3_for_update s1_rollback s2_rollback s3_rollback
+step s1_share: select id from tlu_job where id = 1 for share;
+id
+
+1
+step s2_for_update: select id from tlu_job where id = 1 for update; <waiting ...>
+step s3_for_update: select id from tlu_job where id = 1 for update; <waiting ...>
+step s1_rollback: rollback;
+step s2_for_update: <... completed>
+id
+
+1
+step s2_rollback: rollback;
+step s3_for_update: <... completed>
+id
+
+1
+step s3_rollback: rollback;
+
+starting permutation: s1_share s2_update s3_update s1_rollback s2_rollback s3_rollback
+step s1_share: select id from tlu_job where id = 1 for share;
+id
+
+1
+step s2_update: update tlu_job set name = 'b' where id = 1; <waiting ...>
+step s3_update: update tlu_job set name = 'b' where id = 1; <waiting ...>
+step s1_rollback: rollback;
+step s2_update: <... completed>
+step s2_rollback: rollback;
+step s3_update: <... completed>
+step s3_rollback: rollback;
+
+starting permutation: s1_share s2_delete s3_delete s1_rollback s2_rollback s3_rollback
+step s1_share: select id from tlu_job where id = 1 for share;
+id
+
+1
+step s2_delete: delete from tlu_job where id = 1; <waiting ...>
+step s3_delete: delete from tlu_job where id = 1; <waiting ...>
+step s1_rollback: rollback;
+step s2_delete: <... completed>
+step s2_rollback: rollback;
+step s3_delete: <... completed>
+step s3_rollback: rollback;
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 889b4d827a..3b3bc10a65 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -42,6 +42,7 @@ test: delete-abort-savept-2
test: aborted-keyrevoke
test: multixact-no-deadlock
test: multixact-no-forget
+test: rowlock-upgrade-deadlock
test: lock-committed-update
test: lock-committed-keyupdate
test: update-locked-tuple
diff --git a/src/test/isolation/specs/rowlock-upgrade-deadlock.spec b/src/test/isolation/specs/rowlock-upgrade-deadlock.spec
new file mode 100644
index 0000000000..7e0230c7f7
--- /dev/null
+++ b/src/test/isolation/specs/rowlock-upgrade-deadlock.spec
@@ -0,0 +1,52 @@
+# This test checks that multiple sessions locking a single row in a table
+# do not deadlock with each other, particularly when one of them upgrades its
+# lock while the others are waiting.
+
+setup
+{
+ drop table if exists tlu_job;
+ create table tlu_job (id integer primary key, name text);
+
+ insert into tlu_job values(1, 'a');
+}
+
+
+teardown
+{
+ drop table tlu_job;
+}
+
+session "s1"
+setup { begin; }
+step "s1_keyshare" { select id from tlu_job where id = 1 for key share;}
+step "s1_share" { select id from tlu_job where id = 1 for share; }
+step "s1_update" { update tlu_job set name = 'b' where id = 1; }
+step "s1_delete" { delete from tlu_job where id = 1; }
+step "s1_rollback" { rollback; }
+
+session "s2"
+setup { begin; }
+step "s2_for_update" { select id from tlu_job where id = 1 for update; }
+step "s2_update" { update tlu_job set name = 'b' where id = 1; }
+step "s2_delete" { delete from tlu_job where id = 1; }
+step "s2_rollback" { rollback; }
+
+session "s3"
+setup { begin; }
+step "s3_keyshare" { select id from tlu_job where id = 1 for key share; }
+step "s3_share" { select id from tlu_job where id = 1 for share; }
+step "s3_for_update" { select id from tlu_job where id = 1 for update; }
+step "s3_update" { update tlu_job set name = 'b' where id = 1; }
+step "s3_delete" { delete from tlu_job where id = 1; }
+step "s3_rollback" { rollback; }
+
+# test that s2 will not deadlock with s3 when s1 is rolled back
+permutation "s1_share" "s2_for_update" "s3_share" "s3_for_update" "s1_rollback" "s3_rollback" "s2_rollback"
+# test that update does not cause deadlocks if it can proceed
+permutation "s1_keyshare" "s2_for_update" "s3_keyshare" "s1_update" "s3_update" "s1_rollback" "s3_rollback" "s2_rollback"
+# test that delete does not cause deadlocks if it can proceed
+permutation "s1_keyshare" "s2_for_update" "s3_keyshare" "s3_delete" "s1_rollback" "s3_rollback" "s2_rollback"
+# test that sessions that don't upgrade locks acquire them in order
+permutation "s1_share" "s2_for_update" "s3_for_update" "s1_rollback" "s2_rollback" "s3_rollback"
+permutation "s1_share" "s2_update" "s3_update" "s1_rollback" "s2_rollback" "s3_rollback"
+permutation "s1_share" "s2_delete" "s3_delete" "s1_rollback" "s2_rollback" "s3_rollback"
On Wed, Jun 12, 2019 at 12:47 PM Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
Please don't simplify the table name to just "t" -- the reason I used
another name is that we want these tests to be able to run concurrently
at some point; ref.
/messages/by-id/20180124231006.z7spaz5gkzbdvob5@alvherre.pgsql
Not only that, but 't' is completely ungreppable. If you name the
table 'walrus' and five years from now somebody sees an error about it
in some buildfarm log or whatever, they can type 'git grep walrus' to
find the test, and they'll probably only get that one hit. If you
name it 't', well...
[rhaas pgsql]$ git grep t | wc -l
1653468
Not very helpful.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2019-Jun-13, Oleksii Kliukin wrote:
Makes sense. For symmetry, I have included permutations that perform lock
upgrades in one session and ones that don't, while the other sessions acquire
locks, do updates, or do deletes. For those that don't upgrade locks, the
test checks that the locks are acquired in the correct order.
Thanks for the updated patch! I'm about to push to branches 9.6-master.
It applies semi-cleanly (only pgindent-maturity whitespace conflicts).
The [pg11 version of the] patch does apply to 9.5 cleanly ... but the
isolation test doesn't work, because isolationtester was not smart
enough back then. Since there have been no previous reports of this
problem, and to avoid pushing untested code, I'm going to refrain from
back-patching there. My guess is that it should work ...
In 9.4 there are some quite severe conflicts, because 27846f02c176 was
not back-patched there. (The bug number "#8470" still floats in my
memory from time to time. Shudder)
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2019-Jun-13, Alvaro Herrera wrote:
On 2019-Jun-13, Oleksii Kliukin wrote:
Makes sense. For symmetry, I have included permutations that perform lock
upgrades in one session and ones that don't, while the other sessions acquire
locks, do updates, or do deletes. For those that don't upgrade locks, the
test checks that the locks are acquired in the correct order.

Thanks for the updated patch! I'm about to push to branches 9.6-master.
It applies semi-cleanly (only pgindent-maturity whitespace conflicts).
Done, thanks for the report and patch!
I tried hard to find a scenario that this patch breaks, but couldn't
find anything.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2019-Jun-13, Alvaro Herrera wrote:
On 2019-Jun-13, Oleksii Kliukin wrote:
Makes sense. For symmetry, I have included permutations that perform lock
upgrades in one session and ones that don't, while the other sessions acquire
locks, do updates, or do deletes. For those that don't upgrade locks, the
test checks that the locks are acquired in the correct order.

Thanks for the updated patch! I'm about to push to branches 9.6-master.
It applies semi-cleanly (only pgindent-maturity whitespace conflicts).

Done, thanks for the report and patch!
I tried hard to find a scenario that this patch breaks, but couldn't
find anything.
Thank you very much for reviewing and committing it!
Cheers,
Oleksii