pgsql: Avoid spurious deadlocks when upgrading a tuple lock

Started by Alvaro Herrera over 6 years ago · 23 messages
#1 Alvaro Herrera
alvherre@alvh.no-ip.org

Avoid spurious deadlocks when upgrading a tuple lock

When two (or more) transactions are waiting for transaction T1 to release a
tuple-level lock, and transaction T1 upgrades its lock to a higher level, a
spurious deadlock can be reported among the waiting transactions when T1
finishes. The simplest example case seems to be:

T1: select id from job where name = 'a' for key share;
Y: select id from job where name = 'a' for update; -- starts waiting for T1
Z: select id from job where name = 'a' for key share;
T1: update job set name = 'b' where id = 1;
Z: update job set name = 'c' where id = 1; -- starts waiting for T1
T1: rollback;

At this point, transaction Y is rolled back on account of a deadlock: Y
holds the heavyweight tuple lock and is waiting for the Xmax to be released,
while Z holds part of the multixact and tries to acquire the heavyweight
lock (per protocol) and goes to sleep; once T1 releases its part of the
multixact, Z is awakened only to be put back to sleep on the heavyweight
lock that Y is holding while sleeping. Kaboom.

This can be avoided by having Z skip the heavyweight lock acquisition. As
far as I can see, the biggest downside is that if there are multiple Z
transactions, the order in which they resume after T1 finishes is not
guaranteed.

Backpatch to 9.6. The patch applies cleanly on 9.5, but the new tests don't
work there (because isolationtester is not smart enough), so I'm not going
to risk it.

Author: Oleksii Kliukin
Discussion: /messages/by-id/B9C9D7CD-EB94-4635-91B6-E558ACEC0EC3@hintbits.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/de87a084c0a5ac927017cd0834b33a932651cfc9

Modified Files
--------------
src/backend/access/heap/README.tuplock | 10 ++
src/backend/access/heap/heapam.c | 84 +++++++++---
.../expected/tuplelock-upgrade-no-deadlock.out | 150 +++++++++++++++++++++
src/test/isolation/isolation_schedule | 1 +
.../specs/tuplelock-upgrade-no-deadlock.spec | 57 ++++++++
5 files changed, 281 insertions(+), 21 deletions(-)

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#1)
Re: pgsql: Avoid spurious deadlocks when upgrading a tuple lock

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Avoid spurious deadlocks when upgrading a tuple lock

I'm now getting

heapam.c: In function 'heap_lock_tuple':
heapam.c:4041: warning: 'skip_tuple_lock' may be used uninitialized in this function

Please fix.

regards, tom lane

#3 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#2)
Re: pgsql: Avoid spurious deadlocks when upgrading a tuple lock

On 2019-Jun-14, Tom Lane wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Avoid spurious deadlocks when upgrading a tuple lock

I'm now getting

heapam.c: In function 'heap_lock_tuple':
heapam.c:4041: warning: 'skip_tuple_lock' may be used uninitialized in this function

Hm, I don't get that warning. Does this patch silence it, please?

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#3)
Re: pgsql: Avoid spurious deadlocks when upgrading a tuple lock

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

On 2019-Jun-14, Tom Lane wrote:

I'm now getting
heapam.c: In function 'heap_lock_tuple':
heapam.c:4041: warning: 'skip_tuple_lock' may be used uninitialized in this function

Hm, I don't get that warning. Does this patch silence it, please?

Uh, no patch attached? But initializing the variable where it's
declared would certainly silence it.

regards, tom lane

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#4)
Re: pgsql: Avoid spurious deadlocks when upgrading a tuple lock

I wrote:

Hm, I don't get that warning. Does this patch silence it, please?

Uh, no patch attached? But initializing the variable where it's
declared would certainly silence it.

BTW, after looking around a bit I wonder if this complaint isn't
exposing an actual logic bug. Shouldn't skip_tuple_lock have
a lifetime similar to first_time?

regards, tom lane

#6 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#5)
Re: pgsql: Avoid spurious deadlocks when upgrading a tuple lock

On 2019-Jun-14, Tom Lane wrote:

I wrote:

Hm, I don't get that warning. Does this patch silence it, please?

Uh, no patch attached? But initializing the variable where it's
declared would certainly silence it.

BTW, after looking around a bit I wonder if this complaint isn't
exposing an actual logic bug. Shouldn't skip_tuple_lock have
a lifetime similar to first_time?

I think you're right. I should come up with a test case that exercises
that case.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#7 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#5)
1 attachment(s)
Re: pgsql: Avoid spurious deadlocks when upgrading a tuple lock

On 2019-Jun-14, Tom Lane wrote:

I wrote:

Hm, I don't get that warning. Does this patch silence it, please?

Uh, no patch attached? But initializing the variable where it's
declared would certainly silence it.

BTW, after looking around a bit I wonder if this complaint isn't
exposing an actual logic bug. Shouldn't skip_tuple_lock have
a lifetime similar to first_time?

I think there are worse problems here. I tried the attached isolation
spec. Note that the only difference in the two permutations is that s0
finishes earlier in one than the other; yet the first one works fine and
the second one hangs until killed by the 180s timeout. (s3 isn't
released for a reason I'm not sure I understand.)

I don't think I'm going to have time to investigate this deeply over the
weekend, so I think the safest course of action is to revert this for
next week's set.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

with_s0.spec (text/plain; charset=us-ascii)

#8 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#7)
Re: pgsql: Avoid spurious deadlocks when upgrading a tuple lock

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

On 2019-Jun-14, Tom Lane wrote:

BTW, after looking around a bit I wonder if this complaint isn't
exposing an actual logic bug. Shouldn't skip_tuple_lock have
a lifetime similar to first_time?

I think there are worse problems here. I tried the attached isolation
spec. Note that the only difference in the two permutations is that s0
finishes earlier in one than the other; yet the first one works fine and
the second one hangs until killed by the 180s timeout. (s3 isn't
released for a reason I'm not sure I understand.)

Ugh.

I don't think I'm going to have time to investigate this deeply over the
weekend, so I think the safest course of action is to revert this for
next week's set.

+1. This is an old bug, we don't have to improve it for this release.

regards, tom lane

#9 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#7)
Re: pgsql: Avoid spurious deadlocks when upgrading a tuple lock

On 2019-Jun-14, Alvaro Herrera wrote:

I think there are worse problems here. I tried the attached isolation
spec. Note that the only difference in the two permutations is that s0
finishes earlier in one than the other; yet the first one works fine and
the second one hangs until killed by the 180s timeout. (s3 isn't
released for a reason I'm not sure I understand.)

Actually, those behaviors both seem correct to me now that I look
closer. So this was a false alarm. In the code before de87a084c0, the
first permutation deadlocks, and the second permutation hangs. The only
behavior change is that the first one no longer deadlocks, which is the
desired change.

I'm still trying to form a test case that exercises skip_tuple_lock
having the wrong lifetime.

The fact that both permutations behave differently, even though the
only difference is where s0 commits relative to the s3_share step, is an
artifact of our unusual tuple locking implementation.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#10 Oleksii Kliukin
alexk@hintbits.com
In reply to: Alvaro Herrera (#9)
Re: pgsql: Avoid spurious deadlocks when upgrading a tuple lock

Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

On 2019-Jun-14, Alvaro Herrera wrote:

I think there are worse problems here. I tried the attached isolation
spec. Note that the only difference in the two permutations is that s0
finishes earlier in one than the other; yet the first one works fine and
the second one hangs until killed by the 180s timeout. (s3 isn't
released for a reason I'm not sure I understand.)

Actually, those behaviors both seem correct to me now that I look
closer. So this was a false alarm. In the code before de87a084c0, the
first permutation deadlocks, and the second permutation hangs. The only
behavior change is that the first one no longer deadlocks, which is the
desired change.

I'm still trying to form a test case that exercises skip_tuple_lock
having the wrong lifetime.

Hm… I think it was an oversight on my part not to give skip_tuple_lock the
same lifetime as have_tuple_lock or first_time (also initializing it to
false at the same time). Even if it might not break anything in an
obvious way now, a backward jump to the l3 label will leave skip_tuple_lock
uninitialized, making it very dangerous for any future code that relies
on its value.

The fact that both permutations behave differently, even though the
only difference is where s0 commits relative to the s3_share step, is an
artifact of our unusual tuple locking implementation.

Cheers,
Oleksii

#11 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Oleksii Kliukin (#10)
Re: pgsql: Avoid spurious deadlocks when upgrading a tuple lock

On 2019-Jun-16, Oleksii Kliukin wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

On 2019-Jun-14, Alvaro Herrera wrote:

I think there are worse problems here. I tried the attached isolation
spec. Note that the only difference in the two permutations is that s0
finishes earlier in one than the other; yet the first one works fine and
the second one hangs until killed by the 180s timeout. (s3 isn't
released for a reason I'm not sure I understand.)

Actually, those behaviors both seem correct to me now that I look
closer. So this was a false alarm. In the code before de87a084c0, the
first permutation deadlocks, and the second permutation hangs. The only
behavior change is that the first one no longer deadlocks, which is the
desired change.

I'm still trying to form a test case that exercises skip_tuple_lock
having the wrong lifetime.

Hm… I think it was an oversight on my part not to give skip_tuple_lock the
same lifetime as have_tuple_lock or first_time (also initializing it to
false at the same time). Even if it might not break anything in an
obvious way now, a backward jump to the l3 label will leave skip_tuple_lock
uninitialized, making it very dangerous for any future code that relies
on its value.

But that's not the danger ... with the current coding, it's initialized
to false every time through that block; that means the tuple lock will
never be skipped if we jump back to l3. So the danger is that the first
iteration sets the variable, then jumps back; second iteration
initializes the variable again, so instead of skipping the lock, it
takes it, causing a spurious deadlock.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#12 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#11)
Re: pgsql: Avoid spurious deadlocks when upgrading a tuple lock

On 2019-Jun-15, Alvaro Herrera wrote:

But that's not the danger ... with the current coding, it's initialized
to false every time through that block; that means the tuple lock will
never be skipped if we jump back to l3. So the danger is that the first
iteration sets the variable, then jumps back; second iteration
initializes the variable again, so instead of skipping the lock, it
takes it, causing a spurious deadlock.

So, I'm too lazy today to generate a case that fully reproduces the
deadlock, because you need to stall 's2' a little bit using the
well-known advisory lock trick, but this one hits the code that would
re-initialize the variable.

I'm going to push the change of lifetime of the variable for now.

setup
{
drop table if exists tlu_job;
create table tlu_job (id integer primary key, name text);

insert into tlu_job values(1, 'a');
}

teardown
{
drop table tlu_job;
}

session "s0"
setup { begin; set deadlock_timeout=1}
step "s0_fornokeyupdate" { select id from tlu_job where id = 1 for no key update; }
step "s0_update" { update tlu_job set name = 's0' where id = 1; }
step "s0_commit" { commit; }

session "s1"
setup { begin; set deadlock_timeout=1}
step "s1_for_key_share" { select id from tlu_job where id = 1 for key share; }
step "s1_for_update" { select id from tlu_job where id = 1 for update; }
step "s1_rollback" { rollback; }

session "s2"
setup { begin; set deadlock_timeout=1}
step "s2_for_key_share" { select id from tlu_job where id = 1 for key share; }
step "s2_for_share" { select id from tlu_job where id = 1 for share; }
step "s2_rollback" { rollback; }

session "s3"
setup { begin; set deadlock_timeout=1}
step "s3_update" { update tlu_job set name = 'c' where id = 1; }
step "s3_rollback" { rollback; }

permutation "s1_for_key_share" "s2_for_key_share" "s0_fornokeyupdate" "s2_for_share" "s0_update" "s0_commit" "s1_rollback" "s2_rollback" "s3_rollback"

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#13 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#12)
Re: pgsql: Avoid spurious deadlocks when upgrading a tuple lock

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

I'm going to push the change of lifetime of the variable for now.

If you're going to push anything before tomorrow's wrap, please do it
*now* not later. We're running out of time to get a full sample of
buildfarm results.

regards, tom lane

#14 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#13)
Re: pgsql: Avoid spurious deadlocks when upgrading a tuple lock

On 2019-Jun-16, Tom Lane wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

I'm going to push the change of lifetime of the variable for now.

If you're going to push anything before tomorrow's wrap, please do it
*now* not later. We're running out of time to get a full sample of
buildfarm results.

Yeah, I had to bail out earlier today, so the only thing I'm confident
pushing now is a revert.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#15 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#14)
Re: pgsql: Avoid spurious deadlocks when upgrading a tuple lock

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

On 2019-Jun-16, Tom Lane wrote:

If you're going to push anything before tomorrow's wrap, please do it
*now* not later. We're running out of time to get a full sample of
buildfarm results.

Yeah, I had to bail out earlier today, so the only thing I'm confident
pushing now is a revert.

Yeah, let's do that. I don't want to risk shipping broken code.
We can try again for the next updates.

regards, tom lane

#16 Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#15)
Re: pgsql: Avoid spurious deadlocks when upgrading a tuple lock

On Sun, Jun 16, 2019 at 09:10:13PM -0400, Tom Lane wrote:

Yeah, let's do that. I don't want to risk shipping broken code.
We can try again for the next updates.

Could you revert asap please then?
--
Michael

#17 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Michael Paquier (#16)
Re: pgsql: Avoid spurious deadlocks when upgrading a tuple lock

On 2019-Jun-17, Michael Paquier wrote:

On Sun, Jun 16, 2019 at 09:10:13PM -0400, Tom Lane wrote:

Yeah, let's do that. I don't want to risk shipping broken code.
We can try again for the next updates.

Could you revert asap please then?

Done.

I initially thought to keep the test in place, but then realized it
might be unstable, so I removed that too.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#18 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#17)
Re: pgsql: Avoid spurious deadlocks when upgrading a tuple lock

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

On 2019-Jun-17, Michael Paquier wrote:

Could you revert asap please then?

Done.

Thanks.

regards, tom lane

#19 Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#18)
Re: pgsql: Avoid spurious deadlocks when upgrading a tuple lock

On Sun, Jun 16, 2019 at 10:27:25PM -0400, Tom Lane wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

On 2019-Jun-17, Michael Paquier wrote:

Could you revert asap please then?

Done.

Thanks.

Thanks, Alvaro.
--
Michael

#20 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#12)
Re: pgsql: Avoid spurious deadlocks when upgrading a tuple lock

On 2019-Jun-16, Alvaro Herrera wrote:

So, I'm too lazy today to generate a case that fully reproduces the
deadlock, because you need to stall 's2' a little bit using the
well-known advisory lock trick, but this one hits the code that would
re-initialize the variable.

Here's such a case. I was unable to reproduce the condition with a
smaller sequence of commands. This one does hit the deadlock when used
with the previous code, as expected; with the fixed code (i.e.
skip_tuple_lock in the outer scope, with the same lifetime as "first_time")
it works fine, no deadlock.

I'm going to push the fixed commit this afternoon, including this as an
additional permutation in the spec file.

setup
{
drop table if exists tlu_job;
create table tlu_job (id integer primary key, name text);

insert into tlu_job values(1, 'a');
}

teardown
{
drop table tlu_job;
}

session "s0"
setup { begin; }
step "s0_keyshare" { select id from tlu_job where id = 1 for key share; }
step "s0_share" { select id from tlu_job where id = 1 for share; }
step "s0_rollback" { rollback; }

session "s1"
setup { begin; }
step "s1_keyshare" { select id from tlu_job where id = 1 for key share; }
step "s1_savept_e" { savepoint s1_e; }
step "s1_share" { select id from tlu_job where id = 1 for share; }
step "s1_savept_f" { savepoint s1_f; }
step "s1_fornokeyupd" { select id from tlu_job where id = 1 for no key update; }
step "s1_rollback_f" { rollback to s1_f; }
step "s1_rollback_e" { rollback to s1_e; }
step "s1_rollback" { rollback; }

session "s2"
setup { begin; }
step "s2_keyshare" { select id from tlu_job where id = 1 for key share; }
step "s2_fornokeyupd" { select id from tlu_job where id = 1 for no key update; }
step "s2_rollback" { rollback; }

session "s3"
setup { begin; }
step "s3_for_update" { select id from tlu_job where id = 1 for update; }
step "s3_rollback" { rollback; }

permutation "s1_keyshare" "s3_for_update" "s2_keyshare" "s1_savept_e" "s1_share" "s1_savept_f" "s1_fornokeyupd" "s2_fornokeyupd" "s0_keyshare" "s1_rollback_f" "s0_share" "s1_rollback_e" "s1_rollback" "s2_rollback" "s0_rollback" "s3_rollback"

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#21 Oleksii Kliukin
alexk@hintbits.com
In reply to: Alvaro Herrera (#11)
Re: pgsql: Avoid spurious deadlocks when upgrading a tuple lock

Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

On 2019-Jun-16, Oleksii Kliukin wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

On 2019-Jun-14, Alvaro Herrera wrote:

I think there are worse problems here. I tried the attached isolation
spec. Note that the only difference in the two permutations is that s0
finishes earlier in one than the other; yet the first one works fine and
the second one hangs until killed by the 180s timeout. (s3 isn't
released for a reason I'm not sure I understand.)

Actually, those behaviors both seem correct to me now that I look
closer. So this was a false alarm. In the code before de87a084c0, the
first permutation deadlocks, and the second permutation hangs. The only
behavior change is that the first one no longer deadlocks, which is the
desired change.

I'm still trying to form a test case that exercises skip_tuple_lock
having the wrong lifetime.

Hm… I think it was an oversight on my part not to give skip_tuple_lock the
same lifetime as have_tuple_lock or first_time (also initializing it to
false at the same time). Even if it might not break anything in an
obvious way now, a backward jump to the l3 label will leave skip_tuple_lock
uninitialized, making it very dangerous for any future code that relies
on its value.

But that's not the danger ... with the current coding, it's initialized
to false every time through that block; that means the tuple lock will
never be skipped if we jump back to l3. So the danger is that the first
iteration sets the variable, then jumps back; second iteration
initializes the variable again, so instead of skipping the lock, it
takes it, causing a spurious deadlock.

Sorry, I was confused, as I was looking only at
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=de87a084c0a5ac927017cd0834b33a932651cfc9

without taking your subsequent commit that silences compiler warnings at
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=3da73d6839dc47f1f47ca57974bf28e5abd9b572
into consideration. With that commit, the danger is indeed in resetting the
skip mechanism on each jump and potentially causing deadlocks.

Cheers,
Oleksii

#22 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Oleksii Kliukin (#21)
Re: pgsql: Avoid spurious deadlocks when upgrading a tuple lock

On 2019-Jun-18, Oleksii Kliukin wrote:

Sorry, I was confused, as I was looking only at
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=de87a084c0a5ac927017cd0834b33a932651cfc9

without taking your subsequent commit that silences compiler warnings at
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=3da73d6839dc47f1f47ca57974bf28e5abd9b572
into consideration. With that commit, the danger is indeed in resetting the
skip mechanism on each jump and potentially causing deadlocks.

Yeah, I understand the confusion.

Anyway, as bugs go, this one seems pretty benign. It would result in an
unexplained deadlock, very rarely, and only for people who use a very
strange locking pattern that includes (row-level) lock upgrades. I
think it also requires aborted savepoints, though I don't rule out
the possibility that there might be a way to reproduce this without
that.

I pushed the patch again just now, with the new permutation.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#23 Oleksii Kliukin
alexk@hintbits.com
In reply to: Alvaro Herrera (#22)
Re: pgsql: Avoid spurious deadlocks when upgrading a tuple lock

Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

On 2019-Jun-18, Oleksii Kliukin wrote:

Sorry, I was confused, as I was looking only at
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=de87a084c0a5ac927017cd0834b33a932651cfc9

without taking your subsequent commit that silences compiler warnings at
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=3da73d6839dc47f1f47ca57974bf28e5abd9b572
into consideration. With that commit, the danger is indeed in resetting the
skip mechanism on each jump and potentially causing deadlocks.

Yeah, I understand the confusion.

Anyway, as bugs go, this one seems pretty benign. It would result in a
unexplained deadlock, very rarely, and only for people who use a very
strange locking pattern that includes (row-level) lock upgrades. I
think it also requires aborted savepoints too, though I don't rule out
the possibility that there might be a way to reproduce this without
that.

I pushed the patch again just now, with the new permutation.

Thank you very much for working on it and committing the fix!

Cheers,
Oleksii