[PATCH] Identify LWLocks in tracepoints
Hi all
The attached patch set follows on from the discussion in [1] "Add LWLock
blocker(s) information" by adding the actual LWLock* and the numeric
tranche ID to each LWLock related TRACE_POSTGRESQL_foo tracepoint.
This does not provide complete information on blockers, because it's not
necessarily valid to compare any two LWLock* pointers between two process
address spaces. The locks could be in DSM segments, and those DSM segments
could be mapped at different addresses.
I wasn't able to work out a sensible way to map a LWLock* to any sort of
(tranche-id, lock-index) because there's no requirement that locks in a
tranche be contiguous or known individually to the lmgr.
Despite that, the patches improve the information available for LWLock
analysis significantly.
Patch 1 fixes a bogus tracepoint where an lwlock__acquire event would be
fired from LWLockWaitForVar, despite that function never actually acquiring
the lock.
Patch 2 adds the tranche id and lock pointer for each trace hit. This makes
it possible to differentiate between individual locks within a tranche, and
(so long as they aren't tranches in a DSM segment) compare locks between
processes. That means you can do lock-order analysis etc, which was not
previously especially feasible. Traces also don't have to do userspace
reads for the tranche name all the time, so the trace can run with lower
overhead.
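
To give a sense of the shape of the change, a call site goes from the current

    TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), mode);

to something like the following (a sketch only; see the patch for the exact
probe signatures):

    TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), mode, lock, lock->tranche);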
Patch 3 adds a single-path tracepoint for all lock acquires and releases,
so you only have to probe the lwlock__acquired and lwlock__release events
to see all acquires/releases, whether conditional or otherwise. It also
adds start markers that can be used for timing wallclock duration of LWLock
acquires/releases.
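
For illustration, the success path of LWLockConditionalAcquire would emit the
same probe as plain LWLockAcquire (an illustrative sketch, not the exact patch
hunk):

    /* alongside the existing lwlock__condacquire probe, on success: */
    TRACE_POSTGRESQL_LWLOCK_ACQUIRED(T_NAME(lock), mode, lock, lock->tranche);

A tool can then pair lwlock__acquire__start with lwlock__acquired per backend
to measure wallclock acquire time, no matter which acquire function was used.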
Patch 4 adds some comments on LWLock tranches to try to address some points
I found confusing and hard to understand when investigating this topic.
[1] /messages/by-id/CAGRY4nz=SEs3qc1R6xD3max7sg3kS-L81eJk2aLUWSQAeAFJTA@mail.gmail.com
Attachments:
* v1-0001-Remove-bogus-lwlock__acquire-tracepoint-from-LWLo.patch (+0/-3)
* v1-0002-Pass-the-target-LWLock-and-tranche-ID-to-LWLock-t.patch (+32/-22)
* v1-0003-Add-to-the-tracepoints-in-LWLock-routines.patch (+32/-1)
* v1-0004-Comments-on-LWLock-tranches.patch (+46/-4)
Hi Craig,
On Sat, Dec 19, 2020 at 2:00 PM Craig Ringer
<craig.ringer@enterprisedb.com> wrote:
> Hi all
>
> The attached patch set follows on from the discussion in [1] "Add LWLock
> blocker(s) information" by adding the actual LWLock* and the numeric
> tranche ID to each LWLock related TRACE_POSTGRESQL_foo tracepoint.
You sent in your patch to pgsql-hackers on Dec 19, but you did not
post it to the next CommitFest[1]. If this was intentional, then you
need to take no action. However, if you want your patch to be
reviewed as part of the upcoming CommitFest, then you need to add it
yourself before 2021-01-01 AoE[2]. Thanks for your contributions.
Regards,
[1]: https://commitfest.postgresql.org/31/
[2]: https://en.wikipedia.org/wiki/Anywhere_on_Earth
--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/
On Mon, 28 Dec 2020 at 20:09, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Hi Craig,
>
> You sent in your patch to pgsql-hackers on Dec 19, but you did not
> post it to the next CommitFest[1]. If this was intentional, then you
> need to take no action. However, if you want your patch to be
> reviewed as part of the upcoming CommitFest, then you need to add it
> yourself before 2021-01-01 AoE[2]. Thanks for your contributions.
>
> [1] https://commitfest.postgresql.org/31/
> [2] https://en.wikipedia.org/wiki/Anywhere_on_Earth
Thanks.

CF entry created at https://commitfest.postgresql.org/32/2927/ . I don't
think it's urgent, and I will have limited review time, so I didn't try to
wedge it into the current CF.
On Sat, 19 Dec 2020 at 13:00, Craig Ringer <craig.ringer@enterprisedb.com>
wrote:
> Hi all
>
> The attached patch set follows on from the discussion in [1] "Add LWLock
> blocker(s) information" by adding the actual LWLock* and the numeric
> tranche ID to each LWLock related TRACE_POSTGRESQL_foo tracepoint.
I've attached a systemtap script that makes use of the information exported
by the enhanced LWLock tracepoints. It offers something akin to dynamic
-DLWLOCK_STATS with automatic statistical aggregation and some selective
LWLOCK_DEBUG output.
The script isn't super pretty. I didn't try to separate event-data
collection from results output, and there's some duplication in places. But
it gives you an idea what's possible when we report lock pointers and
tranche IDs to tracepoints and add entry/exit tracepoints.
Key features:
* Collect statistical aggregates on lwlock hold and wait durations across
all processes. Stats are grouped by lockmode (shared or exclusive) and by
tranche name, as well as rollup stats across all tranches.
* Report lock wait and hold durations for each process when that process
exits. Again, reported by mode and tranche name.
* For long lock waits, print the waiter pid and the blocker pid, along with
each process's backend type and application_name if known, the acquire
mode, and the acquire function.
The output is intended to be human-readable, but it'd be quite simple to
convert it into raw tsv-style output suitable for ingestion into
statistical postprocessing or graphing tools.
It should be fairly easy to break down the stats by acquire function if
desired, so LWLockAcquire(), LWLockWaitForVar(), and LWLockAcquireOrWait()
are reported separately. They're combined in the current output.
Capturing the current query string is pretty simple if needed, but I didn't
think it was likely to be especially useful.
Sample output for a pg_regress run attached. Abridged version follows. Here
the !!W!! lines are "waited a long time", the !!H!! lines are "held a long
time". Then [pid]:MyBackendType tranche_name wait_time_us (wait_time) in
wait_func (application_name) => [blocker_pid] (blocker_application_name).
If the blocker pid wasn't identified it won't be reported - I know how to
fix that and will do so soon.
!!W!! [ 93030]:3 BufferContent  12993 (0m0.012993s) in lwlock__acquire__start (pg_regress/text)
!!W!! [ 93036]:3 LockManager    14540 (0m0.014540s) in lwlock__acquire__start (pg_regress/float8) => [ 93045] (pg_regress/regproc)
!!W!! [ 93035]:3 BufferContent  12608 (0m0.012608s) in lwlock__acquire__start (pg_regress/float4) => [ 93034] (pg_regress/oid)
!!W!! [ 93036]:3 LockManager    10301 (0m0.010301s) in lwlock__acquire__start (pg_regress/float8)
!!W!! [ 93043]:3 LockManager    10356 (0m0.010356s) in lwlock__acquire__start (pg_regress/pg_lsn)
!!H!! [ 93033]:3 BufferContent  20579 (0m0.020579s) (pg_regress/int8)
!!W!! [ 93027]:3 BufferContent  10766 (0m0.010766s) in lwlock__acquire__start (pg_regress/char) => [ 93037] (pg_regress/bit)
!!W!! [ 93036]:3 OidGen         12876 (0m0.012876s) in lwlock__acquire__start (pg_regress/float8)
...
Then the summary rollup at the end of the run. This can also be output
periodically during the run. Abbreviated for highlights:
wait locks: all procs    tranche        mode     count      total  avg  variance  min     max
W LW_EXCLUSIVE           (all)          E        54185   14062734  259   1850265    1   44177
W LW_SHARED              (all)          S         3668    1116022  304   1527261    2   18642

held locks: all procs    tranche        mode     count      total  avg  variance  min     max
H LW_EXCLUSIVE           (all)          E     10438060  153077259   14     37035    1  195043
H LW_SHARED              (all)          S     14199902   65466934    4      5318    1   44030

all procs by tranche     tranche        mode     count      total  avg  variance  min     max
W tranche                (all)          S         3668    1116022  304   1527261    2   18642
W tranche                (all)          E        54185   14062734  259   1850265    1   44177
W tranche                WALInsert      E         9839    2393229  243   1180294    2   14209
W tranche                BufferContent  E         3012    1726543  573   3869409    2   28186
W tranche                BufferContent  S         1664     657855  395   2185694    2   18642
W tranche                LockFastPath   E        28314    6327801  223   1278053    1   26133
W tranche                LockFastPath   S           87      59401  682   3703217   19    9454
W tranche                LockManager    E         7223    2764392  382   2514863    2   44177
Hope this is interesting to someone.
On Sat, Dec 19, 2020 at 01:00:01PM +0800, Craig Ringer wrote:
> The attached patch set follows on from the discussion in [1] "Add LWLock
> blocker(s) information" by adding the actual LWLock* and the numeric
> tranche ID to each LWLock related TRACE_POSTGRESQL_foo tracepoint.
Thanks for the patches, this could indeed be useful. I've looked through
and haven't noticed any issues with either the tracepoint extensions or
the comments, except that it's not clear to me how tranche_id indicates
a re-initialization here:

    /* Re-initialization of individual LWLocks is not permitted */
    Assert(tranche_id >= NUM_INDIVIDUAL_LWLOCKS || !IsUnderPostmaster);
> Patch 2 adds the tranche id and lock pointer for each trace hit. This makes
> it possible to differentiate between individual locks within a tranche, and
> (so long as they aren't tranches in a DSM segment) compare locks between
> processes. That means you can do lock-order analysis etc, which was not
> previously especially feasible.
I'm curious in which kind of situations lock-order analysis could be
helpful?
> Traces also don't have to do userspace reads for the tranche name all
> the time, so the trace can run with lower overhead.
This one is also interesting. Just to clarify, wouldn't there be
a bit of overhead anyway (due to switching from kernel context to user
space when a tracepoint is hit) that would mask the name read overhead? Or
are there any available numbers about it?
On 2020-12-19 06:00, Craig Ringer wrote:
> Patch 1 fixes a bogus tracepoint where an lwlock__acquire event would be
> fired from LWLockWaitForVar, despite that function never actually
> acquiring the lock.
This was added in 68a2e52bbaf when LWLockWaitForVar() was first
introduced. It looks like a mistake to me too, but maybe Heikki wants
to comment.
On Wed, 13 Jan 2021 at 19:19, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> Thanks for the patches, this could indeed be useful. I've looked through
> and haven't noticed any issues with either the tracepoint extensions or
> the comments, except that it's not clear to me how tranche_id indicates
> a re-initialization here:
>
>     /* Re-initialization of individual LWLocks is not permitted */
>     Assert(tranche_id >= NUM_INDIVIDUAL_LWLOCKS || !IsUnderPostmaster);
There should be no reason for anything to call LWLockInitialize(...) on an
individual LWLock, since they are all initialized during postmaster startup.
Doing so must be a bug.
But that's a trivial change that can be done separately.
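
Spelling the reasoning out as comments (a sketch; this assumes the usual
lwlock.c layout where the first NUM_INDIVIDUAL_LWLOCKS entries of
MainLWLockArray are the named individual locks):

    /*
     * Individual (named) LWLocks are initialized exactly once, by the
     * postmaster, so an LWLockInitialize() call with a tranche_id below
     * NUM_INDIVIDUAL_LWLOCKS coming from a child process
     * (IsUnderPostmaster) could only be a bogus re-initialization.
     */
    Assert(tranche_id >= NUM_INDIVIDUAL_LWLOCKS || !IsUnderPostmaster);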
> I'm curious in which kind of situations lock-order analysis could be
> helpful?
If code-path 1 does:

    LWLockAcquire(LockA, LW_EXCLUSIVE);
    ...
    LWLockAcquire(LockB, LW_EXCLUSIVE);

and code-path 2 does:

    LWLockAcquire(LockB, LW_EXCLUSIVE);
    ...
    LWLockAcquire(LockA, LW_EXCLUSIVE);
then they're subject to deadlock. But you might not actually hit that often
in test workloads if the timing required for the deadlock to occur is tight
and/or the paths are only hit by infrequent operations.
It's not always easy to reason about or prove things about lock order when
the acquisitions are potentially nested deep within many layers of other
calls and callbacks. Obviously something we try to avoid with LWLocks, but
not impossible.
If you trace a workload and derive all possible nestings of lock acquire
order, you can then prove things about whether there are any possible
ordering conflicts and where they might arise.
A PoC to do so is on my TODO.
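
The core of such a PoC would be small. A minimal sketch (a hypothetical
standalone post-processing tool over traced acquire events, keyed by tranche
here for brevity; with the lock pointers you could key by individual lock
instead):

    #include <stdbool.h>
    #include <stdio.h>

    #define NTRANCHES 256

    /* edge[a][b]: the trace showed a lock in tranche a held while a lock
     * in tranche b was being acquired */
    static bool edge[NTRANCHES][NTRANCHES];

    /* depth-first search: is there an acquire-order path from -> ... -> to? */
    static bool
    reachable(int from, int to, bool *seen)
    {
        if (edge[from][to])
            return true;
        seen[from] = true;
        for (int next = 0; next < NTRANCHES; next++)
            if (edge[from][next] && !seen[next] && reachable(next, to, seen))
                return true;
        return false;
    }

    int
    main(void)
    {
        /* In reality the edges come from the traced events; hardcode the
         * example above: code-path 1 takes A then B, code-path 2 B then A. */
        edge[0][1] = true;
        edge[1][0] = true;

        for (int a = 0; a < NTRANCHES; a++)
            for (int b = a + 1; b < NTRANCHES; b++)
            {
                bool seen[NTRANCHES] = {false};

                if (edge[a][b] && reachable(b, a, seen))
                    printf("possible deadlock: tranches %d and %d are taken in both orders\n",
                           a, b);
            }
        return 0;
    }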
> Traces also don't have to do userspace reads for the tranche name all
> the time, so the trace can run with lower overhead.
> This one is also interesting. Just to clarify, wouldn't there be
> a bit of overhead anyway (due to switching from kernel context to user
> space when a tracepoint is hit) that would mask the name read overhead? Or
> are there any available numbers about it?
I don't have numbers on that. Whether it matters will depend way too much
on how you're using the probe points and collecting/consuming the data
anyway.
It's a bit unfortunate (IMO) that we make a function call for each
tracepoint invocation to get the tranche names. Ideally I'd prefer to be
able to omit the tranche names lookups for these probes entirely for
something as hot as LWLocks. But it's a bit of a pain to look up the
tranche names from an external trace tool, so instead I'm inclined to see
if we can enable systemtap's semaphores and only compute the tranche name
if the target probe is actually enabled. But that'd be separate to this
patch and require a build change in how systemtap support is compiled and
linked.
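
The guard pattern itself is simple; something like the following (a sketch:
the TRACE_*_ENABLED() macros are what the dtrace script generates when
semaphore support is built in, and the probe arguments here are illustrative):

    if (unlikely(TRACE_POSTGRESQL_LWLOCK_RELEASE_ENABLED()))
        TRACE_POSTGRESQL_LWLOCK_RELEASE(T_NAME(lock));

so the GetLWTrancheName() call hidden behind T_NAME() only runs when a tool
is actually attached to the probe.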
BTW, a user->kernel->user context switch only occurs when the trace tool's
probes use kernel space - such as for perf based probes, or for systemtap's
kernel-runtime probes. The same markers can be used by e.g. systemtap's
"dyninst" runtime that runs entirely in userspace.
On Thu, 14 Jan 2021 at 15:56, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
> On 2020-12-19 06:00, Craig Ringer wrote:
>> Patch 1 fixes a bogus tracepoint where an lwlock__acquire event would be
>> fired from LWLockWaitForVar, despite that function never actually
>> acquiring the lock.
>
> This was added in 68a2e52bbaf when LWLockWaitForVar() was first
> introduced. It looks like a mistake to me too, but maybe Heikki wants
> to comment.
I'm certain it's a copy/paste bug.
On 2021-01-14 09:39, Craig Ringer wrote:
> On Thu, 14 Jan 2021 at 15:56, Peter Eisentraut
> <peter.eisentraut@enterprisedb.com> wrote:
>> This was added in 68a2e52bbaf when LWLockWaitForVar() was first
>> introduced. It looks like a mistake to me too, but maybe Heikki wants
>> to comment.
>
> I'm certain it's a copy/paste bug.
I have committed that patch.
On 1/22/21 6:02 AM, Peter Eisentraut wrote:
> On 2021-01-14 09:39, Craig Ringer wrote:
>> I'm certain it's a copy/paste bug.
>
> I have committed that patch.
This patch set no longer applies:
http://cfbot.cputube.org/patch_32_2927.log.
Can we get a rebase? Also marked Waiting on Author.
Regards,
--
-David
david@pgmasters.net
On Wed, 3 Mar 2021 at 20:50, David Steele <david@pgmasters.net> wrote:
> This patch set no longer applies:
> http://cfbot.cputube.org/patch_32_2927.log
>
> Can we get a rebase? Also marked Waiting on Author.
Rebased as requested.
I'm still interested in whether Andres will be able to do anything about
identifying LWLocks in a cross-backend manner. But this work doesn't really
depend on that; it'd benefit from it, but would be easily adapted to it
later if needed.
Attachments:
* 0001-Pass-the-target-LWLock-and-tranche-ID-to-LWLock-trac.patch (+32/-22)
* 0002-Add-to-the-tracepoints-in-LWLock-routines.patch (+32/-1)
* 0003-Comments-on-LWLock-tranches.patch (+46/-4)
* 0004-SDT-tracepoints-for-backend-type-and-postmaster-pid-.patch (+22/-1)
On 10.03.21 06:38, Craig Ringer wrote:
> On Wed, 3 Mar 2021 at 20:50, David Steele <david@pgmasters.net> wrote:
>> Can we get a rebase? Also marked Waiting on Author.
>
> Rebased as requested.
In patch 0001, why was the TRACE_POSTGRESQL_LWLOCK_RELEASE() call moved?
Is there some correctness issue? If so, we should explain that (at
least in the commit message, or as a separate patch).
On Thu, 11 Mar 2021 at 15:57, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
> In patch 0001, why was the TRACE_POSTGRESQL_LWLOCK_RELEASE() call moved?
> Is there some correctness issue? If so, we should explain that (at
> least in the commit message, or as a separate patch).
If you want I can split it out, or drop that change. I thought it was
sufficiently inconsequential, but you're right to check.
The current tracepoint TRACE_POSTGRESQL_LWLOCK_RELEASE really means
"releaseD". It's appropriate to emit this as soon as the lock could be
acquired by anything else. By deferring it until we'd processed the
waitlist and woken other backends, the window during which the lock was
reported as "held" was longer than it truly was, and it was easy to see one
backend acquire the lock while another still appeared to hold it.
It'd possibly make more sense to have a separate
TRACE_POSTGRESQL_LWLOCK_RELEASING just before the `pg_atomic_sub_fetch_u32`
call. But I didn't want to spam the tracepoints too hard, and there's
always going to be some degree of overlap because tracing tools cannot
intercept and act during the atomic swap, so they'll always see a slightly
premature or slightly delayed release. This window should be as short as
possible though, hence moving the tracepoint.
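
In LWLockRelease terms, the move is roughly this (an abridged sketch of the
function, not the real diff; see the patch for that):

    /* after this atomic op the lock is genuinely free to other backends */
    oldstate = pg_atomic_sub_fetch_u32(&lock->state,
                                       mode == LW_EXCLUSIVE ?
                                       LW_VAL_EXCLUSIVE : LW_VAL_SHARED);
    TRACE_POSTGRESQL_LWLOCK_RELEASE(T_NAME(lock));  /* moved up to here */

    /* ... */
    if (check_waiters)
        LWLockWakeup(lock);
    /* previously the probe fired only down here, after wakeups */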
Side note:
The main reason I didn't want to add more tracepoints than were strictly
necessary is that Pg doesn't enable the systemtap semaphores feature, so
right now we do a T_NAME(lock) evaluation each time we pass a tracepoint if
--enable-dtrace is compiled in, whether or not anything is tracing. This
was fine on pg11 where it was just:
    #define T_NAME(lock) \
        (LWLockTrancheArray[(lock)->tranche])

but since pg13 it instead expands to

    GetLWTrancheName((lock)->tranche)
where GetLWTrancheName isn't especially trivial. We'll run that function
every single time we pass any of these tracepoints and then discard the
result, which is ... not ideal. That applies so long as Pg is compiled with
--enable-dtrace. I've been meaning to look at enabling the systemtap
semaphores feature in our build so these can be wrapped in
unlikely(TRACE_POSTGRESQL_LWLOCK_RELEASE_ENABLED()) guards, but I wanted to
wrap this patch set up first as there are some complexities around enabling
the semaphores feature.
On 18.03.21 07:34, Craig Ringer wrote:
> The current tracepoint TRACE_POSTGRESQL_LWLOCK_RELEASE really means
> "releaseD". It's appropriate to emit this as soon as the lock could be
> acquired by anything else. By deferring it until we'd processed the
> waitlist and woken other backends, the window during which the lock was
> reported as "held" was longer than it truly was, and it was easy to see
> one backend acquire the lock while another still appeared to hold it.
From the archeology department: The TRACE_POSTGRESQL_LWLOCK_RELEASE
probe was in the right place until PG 9.4, but was then moved by
ab5194e6f617a9a9e7aadb3dd1cee948a42d0755, which was a major rewrite, so
it seems the move might have been accidental. The documentation
specifically states that the probe is triggered before waiters are woken
up, which it specifically does not do at the moment. So this looks like
a straight bug fix to me.
On 18.03.21 07:34, Craig Ringer wrote:
> I've been meaning to look at enabling the
> systemtap semaphores feature in our build so these can be wrapped in
> unlikely(TRACE_POSTGRESQL_LWLOCK_RELEASE_ENABLED()) guards, but I wanted
> to wrap this patch set up first as there are some complexities around
> enabling the semaphores feature.
There is already support for that. See the documentation at the end of
this page:
https://www.postgresql.org/docs/devel/dynamic-trace.html#DEFINING-TRACE-POINTS
On Sat, 20 Mar 2021, 04:21 Peter Eisentraut, <peter.eisentraut@enterprisedb.com> wrote:
> There is already support for that. See the documentation at the end of
> this page:
>
> https://www.postgresql.org/docs/devel/dynamic-trace.html#DEFINING-TRACE-POINTS
Pretty sure it won't work right now.
To use systemtap semaphores (the _ENABLED macros) you need to run dtrace -g
to generate a probes.o then link that into postgres.
I don't think we do that. I'll double check soon.
On 19.03.21 21:06, Peter Eisentraut wrote:
> From the archeology department: The TRACE_POSTGRESQL_LWLOCK_RELEASE
> probe was in the right place until PG 9.4, but was then moved by
> ab5194e6f617a9a9e7aadb3dd1cee948a42d0755, which was a major rewrite, so
> it seems the move might have been accidental. The documentation
> specifically states that the probe is triggered before waiters are woken
> up, which it specifically does not do at the moment. So this looks like
> a straight bug fix to me.
Committed a fix for that.
On 10.03.21 06:38, Craig Ringer wrote:
> Rebased as requested.
>
> I'm still interested in whether Andres will be able to do anything about
> identifying LWLocks in a cross-backend manner. But this work doesn't
> really depend on that; it'd benefit from it, but would be easily adapted
> to it later if needed.
First, a problem: 0002 doesn't build on macOS, because uint64 has been
used in the probe definitions. That needs to be handled like the other
nonnative types in that file.
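
For reference, probes.d deals with such types via plain defines at the top
of the file, so presumably the fix is one more line in the same style (the
uint64 mapping below is an assumption on my part, not the committed
spelling):

    #define LocalTransactionId unsigned int
    #define LWLockMode int
    #define BlockNumber unsigned int
    /* assumed addition for the new 64-bit probe arguments: */
    #define uint64 unsigned long long int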
All the probe changes and additions should be accompanied by
documentation changes.
The probes used to have an argument to identify the lock, which was
removed by 3761fe3c20bb040b15f0e8da58d824631da00caa. The 0001 patch is
essentially trying to reinstate that, which seems sensible. Perhaps we
should also use the argument order that used to be there. It used to be
    probe lwlock__acquire(const char *, int, LWLockMode);
and now it would be
    probe lwlock__acquire(const char *, LWLockMode, LWLock*, int);
Also, do we need both the tranche name and the tranche id? Or maybe we
don't need the name, or can record it differently, which might also
address your other concern that it's too expensive to compute. In any
case, I think an argument order like
    probe lwlock__acquire(const char *, int, LWLock*, LWLockMode);
would make more sense.
In 0004, you add a probe to record the application_name setting? Would
there be any value in making that a generic probe that can record any
GUC change?
On 20.03.21 01:29, Craig Ringer wrote:
> Pretty sure it won't work right now.
>
> To use systemtap semaphores (the _ENABLED macros) you need to run dtrace
> -g to generate a probes.o then link that into postgres.
>
> I don't think we do that. I'll double check soon.
We do that. (It's -G.)
On Mon, 22 Mar 2021 at 17:00, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
> We do that. (It's -G.)
Huh. I could've sworn we didn't. My mistake, it's there in
src/backend/Makefile.

In that case I'll amend the patch to use semaphore guards.
(On a side note, systemtap's semaphore support is actually a massive pain.
The way it's implemented in <sys/sdt.h> means that a single compilation
unit may not use both probes.d style markers produced by the dtrace script
and regular DTRACE_PROBE(providername,probename) preprocessor macros.
If it attempts to do so, the DTRACE_PROBE macros will emit inline asm that
tries to reference probename_semaphore symbols that will not exist,
resulting in linker errors or runtime link errors. But that's really a
systemtap problem. Core PostgreSQL doesn't use any explicit
DTRACE_PROBE(...), STAP_PROBE(...) etc.)