the s_lock_stuck on perform_spin_delay

Started by Andy Fan · about 2 years ago · 50 messages
#1 Andy Fan
zhihui.fan1213@gmail.com

Hi,

from src/backend/storage/lmgr/README:

"""
Spinlocks. These are intended for *very* short-term locks. If a lock
is to be held more than a few dozen instructions, or across any sort of
kernel call (or even a call to a nontrivial subroutine), don't use a
spinlock. Spinlocks are primarily used as infrastructure for lightweight
locks.
"""

I totally agree with this, and IIUC spin locks are usually used with the
following functions.

#define init_local_spin_delay(status) ..
void perform_spin_delay(SpinDelayStatus *status);
void finish_spin_delay(SpinDelayStatus *status);

In perform_spin_delay, we have the following code:

void
perform_spin_delay(SpinDelayStatus *status)
{
	...
	/* Block the process every spins_per_delay tries */
	if (++(status->spins) >= spins_per_delay)
	{
		if (++(status->delays) > NUM_DELAYS)
			s_lock_stuck(status->file, status->line, status->func);
	...

and s_lock_stuck will PANIC the entire system.

My question is: if someone doesn't obey the rule by mistake (everyone
can make mistakes), shall we PANIC on a production environment? IMO it
could be a WARNING on a production environment and only report the stuck
lock hard when 'ifdef USE_ASSERT_CHECKING'.

People may think a spin lock would consume too much CPU, but that is not
true in the discussed scene, since perform_spin_delay has pg_usleep in it,
and MAX_DELAY_USEC is 1 second while MIN_DELAY_USEC is 0.001s.
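To put numbers on that, here is a small standalone model of the backoff
schedule (my own sketch, not s_lock.c itself: it replaces the random 1x-2x
growth with its deterministic 1.5x midpoint, and assumes the wrap-back-to-
minimum behavior when the delay exceeds the maximum; check s_lock.c for the
authoritative logic):

```c
/* Constants as documented in src/backend/storage/lmgr/s_lock.c */
#define NUM_DELAYS      1000
#define MIN_DELAY_USEC  1000L      /* 0.001 s */
#define MAX_DELAY_USEC  1000000L   /* 1 s */

/*
 * Model of the sleep schedule in perform_spin_delay(): each delay grows
 * by a random factor of 1x-2x; here we use the deterministic 1.5x
 * midpoint.  When the delay would exceed MAX_DELAY_USEC, it wraps back
 * to MIN_DELAY_USEC (an assumption based on my reading of s_lock.c).
 */
static long long
simulated_stuck_timeout_usec(void)
{
    long        cur = MIN_DELAY_USEC;
    long long   total = 0;
    int         i;

    for (i = 0; i < NUM_DELAYS; i++)
    {
        total += cur;           /* stands in for pg_usleep(cur) */
        cur += cur / 2;         /* midpoint of the 1x-2x growth */
        if (cur > MAX_DELAY_USEC)
            cur = MIN_DELAY_USEC;
    }
    return total;
}
```

Under these assumptions the backend spends a couple of minutes sleeping
before s_lock_stuck() would ever fire, so it burns almost no CPU while it
waits.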

I noticed this issue actually because of the patch "Cache relation
sizes?" from Thomas Munro [1], where the latest patch [2] still has the
following code:
+		sr = smgr_alloc_sr();  <-- HERE a spin lock is held
+
+		/* Upgrade to exclusive lock so we can create a mapping. */
+		LWLockAcquire(mapping_lock, LW_EXCLUSIVE); <-- HERE a complex
  operation is needed. It may take a long time.

Our internal testing system found more misuses on our own PG version.

If an experienced engineer like Thomas can make this mistake, and the
patch was reviewed by 3 people yet the bug is still there, it is not easy
to just say "don't do it".

The attached patch shows the prototype I have in mind. Any feedback is welcome.

[1]: /messages/by-id/CA+hUKGJg+gqCs0dgo94L=1J9pDp5hKkotji9A05k2nhYQhF4+w@mail.gmail.com
[2]: /messages/by-id/attachment/123659/v5-0001-WIP-Track-relation-sizes-in-shared-memory.patch

--
Best Regards
Andy Fan

Attachments:

v1-0001-Don-t-call-s_lock_stuck-in-production-environment.patch (text/x-diff, +9 −1)
#2 Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Andy Fan (#1)
Re: the s_lock_stuck on perform_spin_delay

On Thu, 4 Jan 2024 at 08:09, Andy Fan <zhihuifan1213@163.com> wrote:

My question is if someone doesn't obey the rule by mistake (everyone
can make mistake), shall we PANIC on a production environment? IMO I
think it can be a WARNING on a production environment and be a stuck
when 'ifdef USE_ASSERT_CHECKING'.
[...]
I think a experienced engineer like Thomas can make this mistake and the
patch was reviewed by 3 peoples, the bug is still there. It is not easy
to say just don't do it.

the attached code show the prototype in my mind. Any feedback is welcome.

While I understand your point and could maybe agree with the change
itself (a crash isn't great), I don't think it's an appropriate fix
for the problem of holding a spinlock while waiting for a LwLock, as
spin.h specifically mentions the following (and you quoted the same):

"""
Keep in mind the coding rule that spinlocks must not be held for more
than a few instructions.
"""

I suspect that we'd be better off with stronger protections against
waiting for LwLocks while we hold any spin lock. More specifically,
I'm thinking about something like tracking how many spin locks we
hold, and Assert()-ing that we don't hold any such locks when we start
to wait for an LwLock or run CHECK_FOR_INTERRUPTS-related code (with
potential manual contextual overrides in specific areas of code where
specific care has been taken to make it safe to hold spin locks while
doing those operations - although I consider their existence unlikely
I can't rule them out as I've yet to go through all lock-touching
code). This would probably work in a similar manner as
START_CRIT_SECTION/END_CRIT_SECTION.
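A minimal sketch of what such tracking could look like (all names here are
hypothetical; nothing like this exists in core, and real infrastructure
would presumably live behind USE_ASSERT_CHECKING):

```c
#include <assert.h>

/* Hypothetical per-backend counter of currently-held spinlocks. */
static int nspinlocks_held = 0;

/* These would be invoked from SpinLockAcquire()/SpinLockRelease(). */
#define SPINLOCK_ACQUIRED()  ((void) nspinlocks_held++)
#define SPINLOCK_RELEASED()  (assert(nspinlocks_held > 0), (void) nspinlocks_held--)

/*
 * This would be called at the top of LWLockAcquire() and from
 * CHECK_FOR_INTERRUPTS(), in the same spirit as the existing
 * CritSectionCount bookkeeping for critical sections.
 */
static void
VerifyNoSpinLocksHeld(void)
{
    assert(nspinlocks_held == 0);
}
```

The counter makes the check cheap enough to be assert-only, and manual
overrides could simply decrement/increment around a vetted region.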

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

#3 Robert Haas
robertmhaas@gmail.com
In reply to: Andy Fan (#1)
Re: the s_lock_stuck on perform_spin_delay

On Thu, Jan 4, 2024 at 2:09 AM Andy Fan <zhihuifan1213@163.com> wrote:

My question is if someone doesn't obey the rule by mistake (everyone
can make mistake), shall we PANIC on a production environment? IMO I
think it can be a WARNING on a production environment and be a stuck
when 'ifdef USE_ASSERT_CHECKING'.

People may think spin lock may consume too much CPU, but it is not true
in the discussed scene since perform_spin_delay have pg_usleep in it,
and the MAX_DELAY_USEC is 1 second and MIN_DELAY_USEC is 0.001s.

I notice this issue actually because of the patch "Cache relation
sizes?" from Thomas Munro [1], where the latest patch[2] still have the
following code.
+               sr = smgr_alloc_sr();  <-- HERE a spin lock is hold
+
+               /* Upgrade to exclusive lock so we can create a mapping. */
+               LWLockAcquire(mapping_lock, LW_EXCLUSIVE); <-- HERE a complex
operation is needed. it may take a long time.

I'm not sure that the approach this patch takes is correct in detail,
but I kind of agree with you about the overall point. I mean, the idea
of the PANIC is to avoid having the system just sit there in a state
from which it will never recover ... but it can also have the effect
of killing a system that wasn't really dead. I'm not sure what the
best thing to do here is, but it's worth talking about, IMHO.

--
Robert Haas
EDB: http://www.enterprisedb.com

#4 Andy Fan
zhihui.fan1213@gmail.com
In reply to: Matthias van de Meent (#2)
Re: the s_lock_stuck on perform_spin_delay

Hi Matthias and Robert,

Matthias van de Meent <boekewurm+postgres@gmail.com> writes:

On Thu, 4 Jan 2024 at 08:09, Andy Fan <zhihuifan1213@163.com> wrote:

My question is if someone doesn't obey the rule by mistake (everyone
can make mistake), shall we PANIC on a production environment? IMO I
think it can be a WARNING on a production environment and be a stuck
when 'ifdef USE_ASSERT_CHECKING'.
[...]
I think a experienced engineer like Thomas can make this mistake and the
patch was reviewed by 3 peoples, the bug is still there. It is not easy
to say just don't do it.

the attached code show the prototype in my mind. Any feedback is welcome.

While I understand your point and could maybe agree with the change
itself (a crash isn't great),

It's great that both of you agree that the crash is not great.

I don't think it's an appropriate fix
for the problem of holding a spinlock while waiting for a LwLock, as
spin.h specifically mentions the following (and you quoted the same):

"""
Keep in mind the coding rule that spinlocks must not be held for more
than a few instructions.
"""

Yes, I agree that the known [LW]LockAcquire-after-holding-a-spinlock case
should be fixed at the first chance rather than papered over by my
previous patch. My previous patch just takes care of the *unknown*
cases (and I cc'ed Thomas in the hope that he knows about the bug). I also
agree that the special case of [LW]LockAcquire should be detected
more effectively, as you suggested below. So v2 comes, and commit 2 is for
this suggestion.

I suspect that we'd be better off with stronger protections against
waiting for LwLocks while we hold any spin lock. More specifically,
I'm thinking about something like tracking how many spin locks we
hold, and Assert()-ing that we don't hold any such locks when we start
to wait for an LwLock or run CHECK_FOR_INTERRUPTS-related code (with
potential manual contextual overrides in specific areas of code where
specific care has been taken to make it safe to hold spin locks while
doing those operations - although I consider their existence unlikely
I can't rule them out as I've yet to go through all lock-touching
code). This would probably work in a similar manner as
START_CRIT_SECTION/END_CRIT_SECTION.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

--
Best Regards
Andy Fan

Attachments:

v2-0001-Don-t-call-s_lock_stuck-in-production-environment.patch (text/x-diff, +9 −1)
v2-0002-Disallow-LW-LockAcquire-when-holding-a-spin-lock-.patch (text/x-diff, +29 −3)
#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#3)
Re: the s_lock_stuck on perform_spin_delay

Robert Haas <robertmhaas@gmail.com> writes:

I'm not sure that the approach this patch takes is correct in detail,
but I kind of agree with you about the overall point. I mean, the idea
of the PANIC is to avoid having the system just sit there in a state
from which it will never recover ... but it can also have the effect
of killing a system that wasn't really dead. I'm not sure what the
best thing to do here is, but it's worth talking about, IMHO.

I'm not a fan of adding overhead to such a performance-critical
thing as spinlocks in order to catch coding errors that are easily
detectable statically. IMV the correct usage of spinlocks is that
they should only be held across *short, straight line* code segments.
We should be making an effort to ban coding patterns like
"return with spinlock still held", because they're just too prone
to errors similar to this one. Note that trying to take another
lock is far from the only bad thing that can happen if you're
not very conservative about what code can execute with a spinlock
held.

regards, tom lane

#6 Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#5)
Re: the s_lock_stuck on perform_spin_delay

On Thu, Jan 4, 2024 at 10:22 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm not a fan of adding overhead to such a performance-critical
thing as spinlocks in order to catch coding errors that are easily
detectable statically. IMV the correct usage of spinlocks is that
they should only be held across *short, straight line* code segments.
We should be making an effort to ban coding patterns like
"return with spinlock still held", because they're just too prone
to errors similar to this one. Note that trying to take another
lock is far from the only bad thing that can happen if you're
not very conservative about what code can execute with a spinlock
held.

I agree that we don't want to add overhead, and also about how
spinlocks should be used, but I dispute "easily detectable
statically." I mean, if you or I look at some code that uses a
spinlock, we'll know whether the pattern that you mention is being
followed or not, modulo differences of opinion in debatable cases. But
you and I cannot be there to look at all the code all the time. If we
had a static checking tool that was run as part of every build, or in
the buildfarm, or by cfbot, or somewhere else that raised the alarm if
this rule was violated, then we could claim to be effectively
enforcing this rule. But with 20-30 active committers and ~100 active
developers at any given point in time, any system that relies on every
relevant person knowing all the rules and remembering to enforce them
on every commit is bound to be less than 100% effective. Some people
won't know what the rule is, some people will decide that their
particular situation is Very Special, some people will just forget to
check for violations, and some people will check for violations but
miss something.

I think the question we should be asking here is what the purpose of
the PANIC is. I can think of two possible purposes. It could be either
(a) an attempt to prevent real-world harm by turning database hangs
into database panics, so that at least the system will restart and get
moving again instead of sitting there stuck for all eternity or (b) an
attempt to punish people for writing bad code by turning coding rule
violations into panics on production systems. If it's (a), that's
defensible, though we can still ask whether it does more harm than
good. If it's (b), that's not a good way of handling that problem,
because (b1) it affects production builds and not just development
builds, (b2) many coding rule violations are vanishingly unlikely to
trigger that PANIC in practice, and (b3) if the PANIC does fire, it
gives you basically zero help in figuring out where the actual problem
is. The PostgreSQL code base is way too big for "ERROR: you screwed
up" to be an acceptable diagnostic.

--
Robert Haas
EDB: http://www.enterprisedb.com

#7 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#6)
Re: the s_lock_stuck on perform_spin_delay

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Jan 4, 2024 at 10:22 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

We should be making an effort to ban coding patterns like
"return with spinlock still held", because they're just too prone
to errors similar to this one.

I agree that we don't want to add overhead, and also about how
spinlocks should be used, but I dispute "easily detectable
statically." I mean, if you or I look at some code that uses a
spinlock, we'll know whether the pattern that you mention is being
followed or not, modulo differences of opinion in debatable cases. But
you and I cannot be there to look at all the code all the time. If we
had a static checking tool that was run as part of every build, or in
the buildfarm, or by cfbot, or somewhere else that raised the alarm if
this rule was violated, then we could claim to be effectively
enforcing this rule.

I was indeed suggesting that maybe we could find a way to detect
such things automatically. While I've not been paying close
attention, I recall there's been some discussions of using LLVM/clang
infrastructure for customized static analysis, so maybe it'd be
possible without an undue amount of effort.

I think the question we should be asking here is what the purpose of
the PANIC is. I can think of two possible purposes. It could be either
(a) an attempt to prevent real-world harm by turning database hangs
into database panics, so that at least the system will restart and get
moving again instead of sitting there stuck for all eternity or (b) an
attempt to punish people for writing bad code by turning coding rule
violations into panics on production systems.

I believe it's (a). No matter what the reason for a stuck spinlock
is, the only reliable method of getting out of the problem is to
blow things up and start over. The patch proposed at the top of this
thread would leave the system unable to recover on its own, with the
only recourse being for the DBA to manually force a crash/restart ...
once she figured out that that was necessary, which might take a long
while if the only external evidence is an occasional WARNING that
might not even be making it to the postmaster log. How's that better?

... (b3) if the PANIC does fire, it
gives you basically zero help in figuring out where the actual problem
is. The PostgreSQL code base is way too big for "ERROR: you screwed
up" to be an acceptable diagnostic.

Ideally I agree with the latter, but that doesn't mean that doing
better is easy or even possible. (The proposed patch certainly does
nothing to help diagnose such issues.) As for the former point,
panicking here at least offers the chance of getting a stack trace,
which might help a developer find the problem.

regards, tom lane

#8 Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#7)
Re: the s_lock_stuck on perform_spin_delay

On Thu, Jan 4, 2024 at 11:33 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I believe it's (a). No matter what the reason for a stuck spinlock
is, the only reliable method of getting out of the problem is to
blow things up and start over. The patch proposed at the top of this
thread would leave the system unable to recover on its own, with the
only recourse being for the DBA to manually force a crash/restart ...
once she figured out that that was necessary, which might take a long
while if the only external evidence is an occasional WARNING that
might not even be making it to the postmaster log. How's that better?

It's a fair question. I think you're correct if we assume that
everyone's following the coding rule ... at least assuming that the
target system isn't too insanely slow, and I've seen some pretty
crazily overloaded machines. But if the coding rule is not being
followed, then "the only reliable method of getting out of the problem
is to blow things up and start over" becomes a dubious conclusion.

Also, I wonder if many or even all uses of spinlocks ought to be
replaced with either LWLocks or atomics. An LWLock might be slightly
slower when contention is low, but it scales better when contention is
high, displays a wait event so that you can see that you have
contention if you do, and doesn't PANIC the system if the contention
gets too bad. And an atomic will just be faster, in the cases where
it's adequate.

The trouble with trying to do a replacement is that some of the
spinlock-using code is ancient and quite hairy. info_lck in particular
looks like a hot mess -- it's used in complex ways and in performance
critical paths, with terrifying comments like this:

* To read XLogCtl->LogwrtResult, you must hold either info_lck or
* WALWriteLock. To update it, you need to hold both locks. The point of
* this arrangement is that the value can be examined by code that already
* holds WALWriteLock without needing to grab info_lck as well. In addition
* to the shared variable, each backend has a private copy of LogwrtResult,
* which is updated when convenient.
*
* The request bookkeeping is simpler: there is a shared XLogCtl->LogwrtRqst
* (protected by info_lck), but we don't need to cache any copies of it.
*
* info_lck is only held long enough to read/update the protected variables,
* so it's a plain spinlock. The other locks are held longer (potentially
* over I/O operations), so we use LWLocks for them. These locks are:

But info_lck was introduced in 1999 and this scheme was introduced in
2012, and a lot has changed since then. Whatever benchmarking was done
to validate this locking regime is probably obsolete at this point.
Back then, LWLocks were built on top of spinlocks, and were, I
believe, a lot slower than they are now. Plus CPU performance
characteristics have changed a lot. So who really knows if the way
we're doing things here makes any sense at all these days? But one
doesn't want to do a naive replacement and pessimize things, either.
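As a toy model of the documented protocol quoted above (single-process,
with plain booleans standing in for the locks; all names are illustrative,
not the real XLogCtl code), the rule can be written as assertions:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for info_lck and WALWriteLock within one process. */
static bool hold_info_lck = false;
static bool hold_walwritelock = false;
static long shared_logwrt_result = 0;

/* Readers may hold either lock ... */
static long
read_logwrt_result(void)
{
    assert(hold_info_lck || hold_walwritelock);
    return shared_logwrt_result;
}

/*
 * ... because writers must hold both, so holding either one is enough
 * to exclude any concurrent update.
 */
static void
update_logwrt_result(long v)
{
    assert(hold_info_lck && hold_walwritelock);
    shared_logwrt_result = v;
}
```

That asymmetry is exactly why code already holding WALWriteLock can examine
the value without also grabbing info_lck.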

--
Robert Haas
EDB: http://www.enterprisedb.com

#9 Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#8)
Re: the s_lock_stuck on perform_spin_delay

Hi,

On 2024-01-04 12:04:07 -0500, Robert Haas wrote:

On Thu, Jan 4, 2024 at 11:33 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I believe it's (a). No matter what the reason for a stuck spinlock
is, the only reliable method of getting out of the problem is to
blow things up and start over. The patch proposed at the top of this
thread would leave the system unable to recover on its own, with the
only recourse being for the DBA to manually force a crash/restart ...
once she figured out that that was necessary, which might take a long
while if the only external evidence is an occasional WARNING that
might not even be making it to the postmaster log. How's that better?

It's a fair question. I think you're correct if we assume that
everyone's following the coding rule ... at least assuming that the
target system isn't too insanely slow, and I've seen some pretty
crazily overloaded machines. But if the coding rule is not being
followed, then "the only reliable method of getting out of the problem
is to blow things up and start over" becomes a dubious conclusion.

If the coding rule isn't being followed, a crash-restart is the least of one's
problems... But that doesn't mean we shouldn't add infrastructure to make it
easier to detect violations of the spinlock rules - we've had lots of buglets
around this over the years ourselves, so we can hardly expect extension
authors to get this right. Particularly because we don't even document the
rules well, afair.

I think we should add cassert-only infrastructure tracking whether we
currently hold spinlocks, are in a signal handler and perhaps a few other
states. That'd allow us to add assertions like:

- no memory allocations when holding spinlocks or in signal handlers
- no lwlocks while holding spinlocks
- no lwlocks or spinlocks while in signal handlers

Also, I wonder if many or even all uses of spinlocks uses ought to be
replaced with either LWLocks or atomics. An LWLock might be slightly
slower when contention is low, but it scales better when contention is
high, displays a wait event so that you can see that you have
contention if you do, and doesn't PANIC the system if the contention
gets too bad. And an atomic will just be faster, in the cases where
it's adequate.

I tried to replace all of them - unfortunately the results were not great. The
problem isn't primarily the lack of spinning (although it might be worth adding
that to lwlocks) or the cost of error recovery; the problem is that a
reader-writer lock is inherently more expensive than simpler locks that don't
have multiple levels.

One example of such increased overhead is that on x86 an lwlock unlock has to
be an atomic operation (to maintain the lock count), whereas a spinlock
unlock can just be a write + compiler barrier. Unfortunately the added atomic
operation turns out to matter in some performance-critical cases like
insertpos_lck.

I think we ought to split lwlocks into a reader/writer lock and a simpler
mutex. The simpler mutex will still be slower than s_lock in some relevant
cases, e.g. due to the error recovery handling, but it'd be "local" overhead
rather than something affecting scalability.
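The unlock asymmetry described above can be sketched with C11 atomics (the
types and operations are simplified stand-ins, not the real slock_t/LWLock
code, and the reasoning is x86-specific):

```c
#include <stdatomic.h>

/* Simplified stand-ins for the two lock types. */
typedef volatile int toy_slock_t;
typedef struct { atomic_uint state; } toy_lwlock_t;  /* holds a lock count */

/*
 * On x86, releasing a spinlock needs only a compiler barrier plus a plain
 * store; the hardware's TSO ordering already provides release semantics.
 */
static inline void
toy_spin_unlock(toy_slock_t *lock)
{
    __asm__ __volatile__("" ::: "memory");
    *lock = 0;
}

/*
 * Releasing a shared lwlock must decrement the shared-holder count, which
 * on x86 compiles to a locked read-modify-write instruction.
 */
static inline void
toy_lwlock_release_shared(toy_lwlock_t *lock)
{
    atomic_fetch_sub_explicit(&lock->state, 1, memory_order_release);
}
```

The locked instruction in the second function is the "added atomic
operation" that shows up in cases like insertpos_lck.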

FWIW, these days spinlocks do report a wait event when in perform_spin_delay()
- albeit without detail which lock is being held.

Greetings,

Andres Freund

#10 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Tom Lane (#7)
Re: the s_lock_stuck on perform_spin_delay

On 1/4/24 10:33 AM, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Jan 4, 2024 at 10:22 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

We should be making an effort to ban coding patterns like
"return with spinlock still held", because they're just too prone
to errors similar to this one.

I agree that we don't want to add overhead, and also about how
spinlocks should be used, but I dispute "easily detectable
statically." I mean, if you or I look at some code that uses a
spinlock, we'll know whether the pattern that you mention is being
followed or not, modulo differences of opinion in debatable cases. But
you and I cannot be there to look at all the code all the time. If we
had a static checking tool that was run as part of every build, or in
the buildfarm, or by cfbot, or somewhere else that raised the alarm if
this rule was violated, then we could claim to be effectively
enforcing this rule.

I was indeed suggesting that maybe we could find a way to detect
such things automatically. While I've not been paying close
attention, I recall there's been some discussions of using LLVM/clang
infrastructure for customized static analysis, so maybe it'd be
possible without an undue amount of effort.

FWIW, the lackey[1] tool in Valgrind is able to do some kinds of
instruction counting, so it might be possible to measure how many
instructions are actually being executed while holding a spinlock.
Might be easier than code analysis.

Another possibility might be using the CPU's timestamp counter.

1: https://valgrind.org/docs/manual/lk-manual.html

--
Jim Nasby, Data Architect, Austin TX

#11 Andres Freund
andres@anarazel.de
In reply to: Andy Fan (#1)
Re: the s_lock_stuck on perform_spin_delay

Hi,

On 2024-01-04 14:59:06 +0800, Andy Fan wrote:

My question is if someone doesn't obey the rule by mistake (everyone
can make mistake), shall we PANIC on a production environment? IMO I
think it can be a WARNING on a production environment and be a stuck
when 'ifdef USE_ASSERT_CHECKING'.

[...]

I notice this issue actually because of the patch "Cache relation
sizes?" from Thomas Munro [1], where the latest patch[2] still have the
following code.
+		sr = smgr_alloc_sr();  <-- HERE a spin lock is hold
+
+		/* Upgrade to exclusive lock so we can create a mapping. */
+		LWLockAcquire(mapping_lock, LW_EXCLUSIVE); <-- HERE a complex
operation is needed. it may take a long time.

Our internal testing system found more misuses on our own PG version.

I think a experienced engineer like Thomas can make this mistake and the
patch was reviewed by 3 peoples, the bug is still there. It is not easy
to say just don't do it.

I don't follow this argument - just ignoring the problem, which emitting a
WARNING basically is, doesn't reduce the impact of the bug, it *increases* the
impact, because now the system will not recover from the bug without explicit
operator intervention. During that time the system might be essentially
unresponsive, because all backends end up contending for some spinlock, which
makes investigating such issues very hard.

I think we should add infrastructure to detect bugs like this during
development, but not PANICing when this happens in production seems completely
non-viable.

Greetings,

Andres Freund

#12 Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andres Freund (#11)
Re: the s_lock_stuck on perform_spin_delay

Hi,

Andres Freund <andres@anarazel.de> writes:

On 2024-01-04 14:59:06 +0800, Andy Fan wrote:

My question is if someone doesn't obey the rule by mistake (everyone
can make mistake), shall we PANIC on a production environment? IMO I
think it can be a WARNING on a production environment and be a stuck
when 'ifdef USE_ASSERT_CHECKING'.

[...]

I notice this issue actually because of the patch "Cache relation
sizes?" from Thomas Munro [1], where the latest patch[2] still have the
following code.
+		sr = smgr_alloc_sr();  <-- HERE a spin lock is hold
+
+		/* Upgrade to exclusive lock so we can create a mapping. */
+		LWLockAcquire(mapping_lock, LW_EXCLUSIVE); <-- HERE a complex
operation is needed. it may take a long time.

Our internal testing system found more misuses on our own PG version.

I think a experienced engineer like Thomas can make this mistake and the
patch was reviewed by 3 peoples, the bug is still there. It is not easy
to say just don't do it.

I don't follow this argument - just ignoring the problem,

I agree with you, but I feel you ignored my post at [1], where I
said that the known issue should be fixed at the first chance.

which emitting a
WARNING basically is, doesn't reduce the impact of the bug, it *increases* the
impact, because now the system will not recover from the bug without explicit
operator intervention. During that time the system might be essentially
unresponsive, because all backends end up contending for some spinlock, which
makes investigating such issues very hard.

Actually they are doing pg_usleep most of the time.

Besides what Robert said, one more reason to question PANIC is that PANIC
can't always make the system recover faster, because: a) on most systems,
PANIC produces a core dump, which takes time and space; b) after the
restart, all the caches like relcache, plancache and fdcache need to be
rebuilt; c) customers need to handle the failure, or else they will be
hurt *more often*. All of these cause slowness as well.

I think we should add infrastructure to detect bugs like this during
development,

Commit 2 in [1] does something like this. As for the details, I missed the
check for the memory allocation case as you suggested at [2], but checked
heavyweight locks as well. The others should be the same, IIUC.

but not PANICing when this happens in production seems completely
non-viable.

I'm not sure what *this* exactly means. If it means the bug in Thomas's
patch, I absolutely agree with you (since it is a known bug and it
should be fixed). If it means the general *unknown* case, it's something
we talked about above.

I also agree that an LLVM static checker could be pretty good
ideally; it just requires more of a knowledge base and future maintenance
effort. I am willing to have a try shortly.

[1]: /messages/by-id/871qaxp3ly.fsf@163.com
[2]: /messages/by-id/20240104225403.dgmbbfffmm3srpgq@awork3.anarazel.de

--
Best Regards
Andy Fan

#13 Andres Freund
andres@anarazel.de
In reply to: Jim Nasby (#10)
Re: the s_lock_stuck on perform_spin_delay

Hi,

On 2024-01-04 17:03:18 -0600, Jim Nasby wrote:

On 1/4/24 10:33 AM, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Jan 4, 2024 at 10:22 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

We should be making an effort to ban coding patterns like
"return with spinlock still held", because they're just too prone
to errors similar to this one.

I agree that we don't want to add overhead, and also about how
spinlocks should be used, but I dispute "easily detectable
statically." I mean, if you or I look at some code that uses a
spinlock, we'll know whether the pattern that you mention is being
followed or not, modulo differences of opinion in debatable cases. But
you and I cannot be there to look at all the code all the time. If we
had a static checking tool that was run as part of every build, or in
the buildfarm, or by cfbot, or somewhere else that raised the alarm if
this rule was violated, then we could claim to be effectively
enforcing this rule.

I was indeed suggesting that maybe we could find a way to detect
such things automatically. While I've not been paying close
attention, I recall there's been some discussions of using LLVM/clang
infrastructure for customized static analysis, so maybe it'd be
possible without an undue amount of effort.

I played around with this a while back. One reference with a link to a
playground to experiment with attributes:
/messages/by-id/20200616233105.sm5bvodo6unigno7@alap3.anarazel.de

Unfortunately clang's thread safety analysis doesn't handle conditionally
acquired locks, which made it far less promising than I initially thought.
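For reference, a minimal example of the annotation style involved (the
macro wrappers are a common convention, not a PostgreSQL API, and the
attributes only do anything under clang's -Wthread-safety):

```c
/* Degrade gracefully on compilers without the attributes. */
#if defined(__clang__)
#define CAPABILITY(x)   __attribute__((capability(x)))
#define ACQUIRE(x)      __attribute__((acquire_capability(x)))
#define RELEASE(x)      __attribute__((release_capability(x)))
#else
#define CAPABILITY(x)
#define ACQUIRE(x)
#define RELEASE(x)
#endif

typedef struct CAPABILITY("mutex") toy_lock
{
    int held;
} toy_lock;

/* Straight-line acquire/release pairs are what the analysis checks well. */
static void toy_lock_acquire(toy_lock *l) ACQUIRE(l);
static void toy_lock_release(toy_lock *l) RELEASE(l);

static void toy_lock_acquire(toy_lock *l) { l->held = 1; }
static void toy_lock_release(toy_lock *l) { l->held = 0; }

/*
 * But a function that acquires the lock only on some paths - the shape of
 * LWLockConditionalAcquire, or of code that returns with a spinlock held
 * only on success - is where the analysis falls down, as noted above.
 */
static int
toy_lock_try_acquire(toy_lock *l)
{
    if (l->held)
        return 0;
    l->held = 1;
    return 1;
}
```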

I think there might be some other approaches, but they will all suffer from
not understanding "danger" encountered indirectly, via function calls doing
dangerous things. Which we would like to exclude, but I don't think that's
trivial either.

FWIW, the lackey[1] tool in Valgrind is able to do some kinds of instruction
counting, so it might be possible to measure how many instructions are actually
being executed while holding a spinlock. Might be easier than code analysis.

I don't think that's particularly promising. Lackey is *slow*. And it requires
actually reaching problematic states. Consider e.g. the case that was reported
upthread, an lwlock acquired within a spinlock protected section - most of the
time that's not going to result in a lot of cycles, because the lwlock is
free.

Greetings,

Andres Freund

#14Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#11)
Re: the s_lock_stuck on perform_spin_delay

On Thu, Jan 4, 2024 at 6:06 PM Andres Freund <andres@anarazel.de> wrote:

I think we should add infrastructure to detect bugs like this during
development, but not PANICing when this happens in production seems completely
non-viable.

I mean +1 for the infrastructure, but "completely non-viable"? Why?

I've only very rarely seen this PANIC occur, and in the few cases
where I've seen it, it was entirely unclear that the problem was due
to a bug where somebody failed to release a spinlock. It seemed more
likely that the machine was just not really functioning, and the PANIC
was a symptom of processes not getting scheduled rather than a PG bug.
And every time I tell a user that they might need to use a debugger
to, say, set VacuumCostActive = false, or to get a backtrace, or any
other reason, I have to tell them to make sure to detach the debugger
in under 60 seconds, because in the unlikely event that they attach
while the process is holding a spinlock, failure to detach in under 60
seconds will take their production system down for no reason. Now, if
you're about to say that people shouldn't need to use a debugger on
their production instance, I entirely agree ... but in the world I
inhabit, that's often the only way to solve a customer problem, and it
probably will continue to be until we have much better ways of getting
backtraces without using a debugger than is currently the case.

Have you seen real cases where this PANIC prevents a hang? If yes, did
that PANIC trace back to a bug in PostgreSQL? And why didn't the user
just keep hitting the same bug over and over, PANICing in an endless loop?

I feel like this is one of those things that has just been this way
forever and we don't question it because it's become an article of
faith that it's something we have to have. But I have a very hard time
explaining why it's even a net positive, let alone the unquestionable
good that you seem to think it is.

--
Robert Haas
EDB: http://www.enterprisedb.com

#15Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#14)
Re: the s_lock_stuck on perform_spin_delay

Hi,

On 2024-01-05 08:51:53 -0500, Robert Haas wrote:

On Thu, Jan 4, 2024 at 6:06 PM Andres Freund <andres@anarazel.de> wrote:

I think we should add infrastructure to detect bugs like this during
development, but not PANICing when this happens in production seems completely
non-viable.

I mean +1 for the infrastructure, but "completely non-viable"? Why?

I've only very rarely seen this PANIC occur, and in the few cases
where I've seen it, it was entirely unclear that the problem was due
to a bug where somebody failed to release a spinlock.

I see it fairly regularly. Including finding several related bugs that led to
stuck systems last year (signal handlers are a menace).

It seemed more likely that the machine was just not really functioning, and
the PANIC was a symptom of processes not getting scheduled rather than a PG
bug.

If processes don't get scheduled for that long a crash-restart doesn't seem
that bad anymore :)

And every time I tell a user that they might need to use a debugger to, say,
set VacuumCostActive = false, or to get a backtrace, or any other reason, I
have to tell them to make sure to detach the debugger in under 60 seconds,
because in the unlikely event that they attach while the process is holding
a spinlock, failure to detach in under 60 seconds will take their production
system down for no reason.

Hm - isn't the stuck lock timeout more like 900s (MAX_DELAY_USEC * NUM_DELAYS
= 1000s, but we start at a lower delay)? One issue with the code as-is is
that interrupted sleeps count toward the timeout, despite possibly sleeping
for much less time. We should probably fix that, and also report the time
the lock was stuck for in s_lock_stuck().
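The worst-case time can be sketched from the constants quoted earlier in the thread (MIN_DELAY_USEC = 1 ms, MAX_DELAY_USEC = 1 s, NUM_DELAYS = 1000, as in src/backend/storage/lmgr/s_lock.c). The deterministic 1.5x growth below is an assumption standing in for the randomized 0-100% increase in the real code:

```c
/* Constants as in src/backend/storage/lmgr/s_lock.c. */
#define TOY_MIN_DELAY_USEC  1000L       /* 1 ms */
#define TOY_MAX_DELAY_USEC  1000000L    /* 1 s  */
#define TOY_NUM_DELAYS      1000

/* Approximate time until s_lock_stuck() PANICs, using the average
 * growth factor (the real code adds a random 0-100% increase per
 * iteration, i.e. 1.5x on average). */
long total_stuck_usec(void)
{
    long cur = TOY_MIN_DELAY_USEC;
    long total = 0;

    for (int i = 0; i < TOY_NUM_DELAYS; i++)
    {
        total += cur;               /* time spent in this pg_usleep */
        cur += cur / 2;             /* average-case backoff increase */
        if (cur > TOY_MAX_DELAY_USEC)
            cur = TOY_MAX_DELAY_USEC;
    }
    return total;
}
```

Under these assumptions the total comes out a bit under 1000 seconds: the ramp-up from 1 ms reaches the 1 s cap after fewer than 20 iterations, so nearly all of the NUM_DELAYS sleeps are at the cap, consistent with the ~900s ballpark above rather than the 60 seconds mentioned earlier.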

Now, if you're about to say that people shouldn't need to use a debugger on
their production instance, I entirely agree ... but in the world I inhabit,
that's often the only way to solve a customer problem, and it probably will
continue to be until we have much better ways of getting backtraces without
using a debugger than is currently the case.

Have you seen real cases where this PANIC prevents a hang? If yes, did
that PANIC trace back to a bug in PostgreSQL? And why didn't the user
just keep hitting the same bug over and over, PANICing in an endless loop?

Many, as hinted above. Some bugs in postgres, more bugs in extensions. IME
these bugs aren't hit commonly, so a crash-restart at least allows the system
to hobble along. The big issue with not crash-restarting is that often the system ends
up inaccessible, which makes it very hard to investigate the issue.

I feel like this is one of those things that has just been this way
forever and we don't question it because it's become an article of
faith that it's something we have to have. But I have a very hard time
explaining why it's even a net positive, let alone the unquestionable
good that you seem to think it is.

I don't think it's an unquestionable good, I just think the alternative of
just endlessly spinning is way worse.

Greetings,

Andres Freund

#16Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#15)
Re: the s_lock_stuck on perform_spin_delay

On Fri, Jan 5, 2024 at 2:11 PM Andres Freund <andres@anarazel.de> wrote:

I see it fairly regularly. Including finding several related bugs that led to
stuck systems last year (signal handlers are a menace).

In that case, I think this proposal is dead. I can't personally
testify to this code being a force for good, but it sounds like you
can. So be it!

--
Robert Haas
EDB: http://www.enterprisedb.com

#17Andres Freund
andres@anarazel.de
In reply to: Andy Fan (#12)
Re: the s_lock_stuck on perform_spin_delay

Hi,

On 2024-01-05 10:20:39 +0800, Andy Fan wrote:

Andres Freund <andres@anarazel.de> writes:

On 2024-01-04 14:59:06 +0800, Andy Fan wrote:

My question is: if someone doesn't obey the rule by mistake (everyone
can make a mistake), should we PANIC in a production environment? IMO it
could be a WARNING in a production environment and a hard stop only
under 'ifdef USE_ASSERT_CHECKING'.

[...]

I notice this issue actually because of the patch "Cache relation
sizes?" from Thomas Munro [1], where the latest patch[2] still have the
following code.
+		sr = smgr_alloc_sr();  <-- HERE a spin lock is held
+
+		/* Upgrade to exclusive lock so we can create a mapping. */
+		LWLockAcquire(mapping_lock, LW_EXCLUSIVE); <-- HERE a complex
operation is needed; it may take a long time.

Our internal testing system found more misuses on our own PG version.

I think if an experienced engineer like Thomas can make this mistake, and
the bug survived review by 3 people, it is not easy to just say "don't
do it".

I don't follow this argument - just ignoring the problem,

I agree with you, but I feel you overlooked my post at [1], where I
said that known issues should be fixed at the first opportunity.

With "ignoring the problem" I was referencing emitting a WARNING instead of
crash-restart.

IME stuck spinlocks are caused by issues like not releasing a spinlock,
possibly due to returning early because of an error or such, having lock-nesting
issues leading to deadlocks, acquiring spinlocks or lwlocks in signal
handlers, blocking in signal handlers while holding a spinlock outside of the
signal handlers, and many variations of those. The common theme of these causes
is that they don't resolve after some time. The only way out of the situation
is to crash-restart, either "automatically" or by operator intervention.

which emitting a
WARNING basically is, doesn't reduce the impact of the bug, it *increases* the
impact, because now the system will not recover from the bug without explicit
operator intervention. During that time the system might be essentially
unresponsive, because all backends end up contending for some spinlock, which
makes investigating such issues very hard.
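The "early return with the lock still held" failure mode mentioned above can be sketched with a toy spinlock (hypothetical names, not PostgreSQL's API):

```c
#include <stdbool.h>

/* Toy stand-in for a spinlock; just a flag, no atomics needed here. */
typedef struct { volatile int locked; } toy_slock;

static void toy_acquire(toy_slock *l) { l->locked = 1; }
static void toy_release(toy_slock *l) { l->locked = 0; }

/* BUGGY: the early return leaves the lock held.  Every later acquirer
 * then spins until perform_spin_delay() gives up and s_lock_stuck()
 * PANICs. */
bool lookup_buggy(toy_slock *l, int key)
{
    toy_acquire(l);
    if (key < 0)
        return false;               /* oops: lock never released */
    toy_release(l);
    return true;
}

/* FIXED: a single exit path releases the lock on every branch. */
bool lookup_fixed(toy_slock *l, int key)
{
    bool found;

    toy_acquire(l);
    found = (key >= 0);
    toy_release(l);
    return found;
}
```

This is the kind of pattern that is obvious in review when you look for it, but easy to introduce when an error path is added later.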

Actually, they spend most of that time in pg_usleep.

Sure - but that changes nothing about the problem. The concern isn't CPU
usage, the concern is that there's often no possible forward progress. To take
a recent-ish production issue I looked at, a buggy signal handler led to a
backend sleeping while holding a spinlock. Soon after, the entire system got
stuck, because other backends also acquired the spinlock. The person initially
investigating the issue at first contacted me because they couldn't even log
into the system, because connection establishment also acquired the spinlock
(I'm not sure anymore which spinlock it was, possibly xlog.c's info_lck?).

but not PANICing when this happens in production seems completely
non-viable.

Not sure what *this* means exactly. If it means the bug in Thomas's
patch, I absolutely agree with you (since it is a known bug and it
should be fixed). If it means the general *unknown* case, it's something
we talked about above.

I mean that I think that not PANICing anymore would be a seriously bad idea
and cause way more problems than the PANIC.

Greetings,

Andres Freund

#18Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#16)
Re: the s_lock_stuck on perform_spin_delay

Hi,

On 2024-01-05 14:19:23 -0500, Robert Haas wrote:

On Fri, Jan 5, 2024 at 2:11 PM Andres Freund <andres@anarazel.de> wrote:

I see it fairly regularly. Including finding several related bugs that led to
stuck systems last year (signal handlers are a menace).

In that case, I think this proposal is dead. I can't personally
testify to this code being a force for good, but it sounds like you
can. So be it!

I think the proposal to make it a WARNING shouldn't go anywhere, but I think
there are several improvements that could come out of this discussion:

- assertion checks against doing dangerous stuff
- compile time help for detecting bad stuff without hitting it at runtime
- make the stuck lock message more informative, e.g. by including the length
of time the lock was stuck for
- make sure that interrupts can't trigger the stuck lock much quicker, which
afaict can happen today

Greetings,

Andres Freund

#19Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andres Freund (#18)
Re: the s_lock_stuck on perform_spin_delay

Hi,

On 2024-01-05 14:19:23 -0500, Robert Haas wrote:

On Fri, Jan 5, 2024 at 2:11 PM Andres Freund <andres@anarazel.de> wrote:

I see it fairly regularly. Including finding several related bugs that led to
stuck systems last year (signal handlers are a menace).

In that case, I think this proposal is dead. I can't personally
testify to this code being a force for good, but it sounds like you
can. So be it!

I think the proposal to make it a WARNING shouldn't go anywhere,

OK, I'll give up on the WARNING approach as well.

but I think
there are several improvements that could come out of this discussion:

- assertion checks against doing dangerous stuff
- make the stuck lock message more informative, e.g. by including the length
of time the lock was stuck for

Could you check the attached to see if it matches what you had in
mind?

commit e264da3050285cffd4885637ee97b2326d2f3938 SHOULD **NOT** BE COMMITTED.
Author: yizhi.fzh <yizhi.fzh@alibaba-inc.com>
Date: Sun Jan 7 15:06:14 2024 +0800

simple code to prove previously commit works.

commit 80cf987d1abe2cdae195bd5eea520e28142885b4
Author: yizhi.fzh <yizhi.fzh@alibaba-inc.com>
Date: Thu Jan 4 22:19:50 2024 +0800

Detect more misuse of spin lock automatically

Spin locks are intended for *very* short-term locks, but they can be
misused in many ways, e.g. acquiring another LWLock or regular lock,
or allocating memory, while holding one. In this patch, all such cases
are automatically detected in a USE_ASSERT_CHECKING build.

Signal handlers should be avoided while holding a spin lock because it
is nearly impossible to release the spin lock correctly if that happens.

Luckily, after applying the patch there are no failures when running
'make check-world'.
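One possible shape for such assert-only detection, as a toy sketch (all names here are hypothetical; the actual patch's identifiers may differ):

```c
#include <assert.h>

/* Assert-only bookkeeping: every spinlock acquire/release updates a
 * per-backend counter, and "dangerous" entry points check that it is
 * zero.  In a real build this would all be compiled out unless
 * USE_ASSERT_CHECKING is defined. */
static int toy_spins_held = 0;

#define TOY_SPIN_ACQUIRE()   (toy_spins_held++)
#define TOY_SPIN_RELEASE()   (assert(toy_spins_held > 0), toy_spins_held--)

/* Called at the top of operations forbidden under a spinlock:
 * LWLockAcquire, palloc, elog, ... */
static void toy_verify_no_spinlocks_held(void)
{
    assert(toy_spins_held == 0);
}

/* A hypothetical LWLockAcquire wrapper with the check in place;
 * this is what would have tripped on the smgr_alloc_sr() case. */
static void toy_lwlock_acquire(void)
{
    toy_verify_no_spinlocks_held();
    /* ... real acquisition ... */
}
```

The point of the counter (rather than a boolean) is that nested spinlock acquisition, itself dubious, is still tracked correctly.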

- make sure that interrupts can't trigger the stuck lock much quicker, which
afaict can happen today

I can't follow this; would you mind explaining a bit more?

--
Best Regards
Andy Fan

Attachments:

v3-0001-Detect-more-misuse-of-spin-lock-automatically.patch (text/x-diff, +53 -11)
v3-0002-simple-code-to-prove-previously-commit-works.patch (text/x-diff, +9 -1)
#20Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andres Freund (#9)
Re: the s_lock_stuck on perform_spin_delay

Hi!

I think we should add cassert-only infrastructure tracking whether we
currently hold spinlocks, are in a signal handler and perhaps a few other
states. That'd allow us to add assertions like:

..

- no lwlocks or ... while in signal handlers

I *wished* that LWLocks were *not* held while in signal handlers, since
that suggested a direction for a low-frequency internal bug where a
backend acquires an LWLock it has already acquired. However, after
reading more of the documentation and code, I now think this should not
be a problem.

per: src/backend/storage/lmgr/README

"""
LWLock manager will automatically release held LWLocks during elog()
recovery, so it is safe to raise an error while holding LWLocks.
"""

The code shows that after we acquire an LWLock, that information is
added to a global array named held_lwlocks, and whenever we want to
release all of them, we can just call LWLockReleaseAll.
LWLockReleaseAll is called in AbortTransaction, AbortSubTransaction,
ProcKill, AuxiliaryProcKill and so on, which matches what the README
says. So suppose we have some code like this:

LWLockAcquire(...);
CHECK_FOR_INTERRUPTS();
LWLockRelease();

Even if we get an ERROR/FATAL in CHECK_FOR_INTERRUPTS, I think the
LWLock is supposed to be released because of the above mechanism. Am I
missing anything?
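The held_lwlocks bookkeeping described above can be sketched like this (heavily simplified from src/backend/storage/lmgr/lwlock.c; the real array entries also record the lock pointer and acquisition mode, and releasing actually unlocks each lock):

```c
/* Simplified stand-in for MAX_SIMUL_LWLOCKS and the held_lwlocks array. */
#define TOY_MAX_SIMUL_LWLOCKS 200

static int toy_held_lwlocks[TOY_MAX_SIMUL_LWLOCKS];
static int toy_num_held_lwlocks = 0;

/* Acquiring records the lock in the per-backend array. */
static void toy_LWLockAcquire(int lockid)
{
    toy_held_lwlocks[toy_num_held_lwlocks++] = lockid;
}

/* Normal release drops one entry (simplified to LIFO here). */
static void toy_LWLockRelease(int lockid)
{
    (void) lockid;
    toy_num_held_lwlocks--;
}

/* What AbortTransaction, ProcKill, etc. call during error recovery:
 * release and forget everything we hold, so an elog() while holding
 * LWLocks is safe. */
static void toy_LWLockReleaseAll(void)
{
    toy_num_held_lwlocks = 0;
}
```

This is also why the same trick cannot work for spinlocks: there is deliberately no tracking of held spinlocks in non-assert builds, so error recovery has nothing to release.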

--
Best Regards
Andy Fan
