BUG #17686: SELECT pg_advisory_lock(...) with low lock_timeout sometimes times out but the lock is acquired
The following bug has been logged on the website:
Bug reference: 17686
Logged by: Mike Adelson
Email address: mike.adelson314@gmail.com
PostgreSQL version: 12.2
Operating system: Windows 10
Description:
I am trying to use the pg_advisory_lock function in combination with setting
lock_timeout to wait on the lock with a timeout.
Here is my query:
```
SET LOCAL statement_timeout = 0;
SET LOCAL lock_timeout = 200;
SELECT pg_catalog.pg_advisory_lock(12345)
```
I'm finding that with relatively small values of lock_timeout and when the
system is under load (e.g. 8 connections acquiring concurrently), I will
encounter a case where the query exits with state 55P03 (lock_not_available)
and yet the lock was actually acquired (I can tell it has been acquired by
querying pg_locks and because other connections' calls to pg_advisory_lock
block).
Here's a standalone repro application which reliably recreates the behavior
for me: https://github.com/Tzachi009/distributed-locks-issue
More discussion here: https://github.com/madelson/DistributedLock/issues/147
PG Bug reporting form <noreply@postgresql.org> writes:
I'm finding that with relatively small values of lock_timeout and when the
system is under load (e.g. 8 connections acquiring concurrently), I will
encounter a case where the query exits with state 55P03 (lock_not_available)
and yet the lock was actually acquired (I can tell it has been acquired by
querying pg_locks and because other connections' calls to pg_advisory_lock
block).
Yeah, there's a fairly basic race condition there, which is that the
lock might get granted to us immediately after the timeout fires.
In ordinary usage this isn't very problematic because the lock would
get released anyway during the transaction abort resulting from the
timeout error. However, when you're asking for a session-level
advisory lock, that doesn't happen.
I spent a little time studying whether there's a way to close the race,
but if it's possible at all it'd take major restructuring of what's
already quite complex and delicate code. I can't get excited about
putting such effort into it, because related problems will exist
no matter what: if you get an error from the pg_advisory_lock command,
was the lock granted before that error occurred? I don't think there's
any bulletproof way to deal with that except to check the lock status
afterwards (e.g., via pg_locks). It's very much like the inherent issue
with errors late in COMMIT --- there's an atomic point where the commit
is logged, but the error report isn't going to be very clear about
whether we got past that, if indeed the error report gets to the client
at all.
regards, tom lane
<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type content="text/html; charset=utf-8"><meta name=Generator content="Microsoft Word 15 (filtered medium)"><style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
.MsoChpDefault
{mso-style-type:export-only;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style></head><body lang=EN-US link=blue vlink="#954F72" style='word-wrap:break-word'><div class=WordSection1><p class=MsoNormal>Thanks for looking into this Tom,<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>Glad to have the race condition confirmed. In my view the “real” fix here would be to offer an advisory locking function that accepts a timeout. FWIW, similar functionality on other RDBMS systems (SQL Server, MySQL, Oracle) offers this. That way, users wouldn’t have to rely in SET LOCAL lock_timeout to get this functionality. Thoughts? Is there a backlog item for this or has the idea been considered previously?<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>-Mike</p><p class=MsoNormal><o:p> </o:p></p><div style='mso-element:para-border-div;border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in'><p class=MsoNormal style='border:none;padding:0in'><b>From: </b><a href="mailto:tgl@sss.pgh.pa.us">Tom Lane</a><br><b>Sent: </b>Saturday, November 12, 2022 11:47 AM<br><b>To: </b><a href="mailto:mike.adelson314@gmail.com">mike.adelson314@gmail.com</a><br><b>Cc: </b><a href="mailto:pgsql-bugs@lists.postgresql.org">pgsql-bugs@lists.postgresql.org</a><br><b>Subject: </b>Re: BUG #17686: SELECT pg_advisory_lock(...) with low lock_timeout sometimes times out but the lock is acquired</p></div><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>PG Bug reporting form <noreply@postgresql.org> writes:</p><p class=MsoNormal>> I'm finding that with relatively small values of lock_timeout and when the</p><p class=MsoNormal>> system is under load (e.g. 8 connections acquiring concurrently), I will</p><p class=MsoNormal>> encounter a case where the query exits with state 55P03 (lock_not_available)</p><p class=MsoNormal>> and yet the lock was actually acquired (I can tell it has been acquired by</p><p class=MsoNormal>> querying pg_locks and because other connections' calls to pg_advisory_lock</p><p class=MsoNormal>> block).</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>Yeah, there's a fairly basic race condition there, which is that the</p><p class=MsoNormal>lock might get granted to us immediately after the timeout fires.</p><p class=MsoNormal>In ordinary usage this isn't very problematic because the lock would</p><p class=MsoNormal>get released anyway during the transaction abort resulting from the</p><p class=MsoNormal>timeout error. However, when you're asking for a session-level</p><p class=MsoNormal>advisory lock, that doesn't happen.</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>I spent a little time studying whether there's a way to close the race,</p><p class=MsoNormal>but if it's possible at all it'd take major restructuring of what's</p><p class=MsoNormal>already quite complex and delicate code. I can't get excited about</p><p class=MsoNormal>putting such effort into it, because related problems will exist</p><p class=MsoNormal>no matter what: if you get an error from the pg_advisory_lock command,</p><p class=MsoNormal>was the lock granted before that error occurred? I don't think there's</p><p class=MsoNormal>any bulletproof way to deal with that except to check the lock status</p><p class=MsoNormal>afterwards (e.g., via pg_locks). It's very much like the inherent issue</p><p class=MsoNormal>with errors late in COMMIT --- there's an atomic point where the commit</p><p class=MsoNormal>is logged, but the error report isn't going to be very clear about</p><p class=MsoNormal>whether we got past that, if indeed the error report gets to the client</p><p class=MsoNormal>at all.</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal> regards, tom lane</p><p class=MsoNormal><o:p> </o:p></p></div></body></html>
Mike Adelson <mike.adelson314@gmail.com> writes:
Glad to have the race condition confirmed. In my view the “real” fix
here would be to offer an advisory locking function that accepts a
timeout.
How would that remove the low-level race condition?
regards, tom lane
<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type content="text/html; charset=utf-8"><meta name=Generator content="Microsoft Word 15 (filtered medium)"><style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
.MsoChpDefault
{mso-style-type:export-only;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style></head><body lang=EN-US link=blue vlink="#954F72" style='word-wrap:break-word'><div class=WordSection1><p class=MsoNormal>Hi Tom,</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>Sorry to clarify it wouldn’t fix the race condition, but it would allow me to achieve my goal of “wait on an advisory lock until a timeout has elapsed” without requiring the use of a non-infinite lock_timeout, thereby avoiding the race condition altogether.</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>-Mike</p><p class=MsoNormal><o:p> </o:p></p><div style='mso-element:para-border-div;border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in'><p class=MsoNormal style='border:none;padding:0in'><b>From: </b><a href="mailto:tgl@sss.pgh.pa.us">Tom Lane</a><br><b>Sent: </b>Saturday, November 12, 2022 1:51 PM<br><b>To: </b><a href="mailto:mike.adelson314@gmail.com">Mike Adelson</a><br><b>Cc: </b><a href="mailto:pgsql-bugs@lists.postgresql.org">pgsql-bugs@lists.postgresql.org</a><br><b>Subject: </b>Re: BUG #17686: SELECT pg_advisory_lock(...) with low lock_timeout sometimes times out but the lock is acquired</p></div><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>Mike Adelson <mike.adelson314@gmail.com> writes:</p><p class=MsoNormal>> Glad to have the race condition confirmed. In my view the “real” fix</p><p class=MsoNormal>> here would be to offer an advisory locking function that accepts a</p><p class=MsoNormal>> timeout.</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>How would that remove the low-level race condition?</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal> regards, tom lane</p><p class=MsoNormal><o:p> </o:p></p></div></body></html>