Add LWLock blocker(s) information
Hi hackers,
I've attached a patch to add blocker(s) information for LWLocks.
The goal is to be able to get some blocker(s) information (if any) when a
backend is waiting on an LWLock.
_Motivation:_
We have seen some cases with heavy contention on some LW Locks (large
number of backends waiting on the same LW Lock).
Adding some blocker information would make investigations easier; it
could help answer questions like:
* How many PIDs are holding the LWLock (could be more than one in case
of LW_SHARED)?
* Is the blocking PID changing?
* Is the number of blocking PIDs changing?
* What is the blocking PID doing?
* Is the blocking PID waiting?
* In which mode is the blocked PID requesting the lock?
* In which mode is the blocker PID holding the lock?
_Technical context and proposal:_
There are two parts to this patch:
* Add the instrumentation:
* the patch adds into the LWLock struct:
      last_holding_pid: last pid owner of the lock
      last_mode: holding mode of the last pid owner of the lock
      nholders: number of holders (could be >1 in case of LW_SHARED)
* the patch adds into the PGPROC struct:
      lwLastHoldingPid: last holder of the LW lock the PID is waiting for
      lwHolderMode: LW lock mode of the last holder of the LW lock the
      PID is waiting for
      lwNbHolders: number of holders of the LW lock the PID is waiting for
and what is necessary to update this new information (a condensed sketch
follows below).
* Provide a way to display the information: the patch also adds a
function pg_lwlock_blocking_pid to display this new information.
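For a quick read before the full diff, here is a condensed sketch of the
bookkeeping the patch performs (simplified from the attached hunks; all
surrounding code is elided):

    /* In LWLockAttemptLock(), right after the lock has been taken: */
    if (MyProc)
    {
        lock->last_holding_pid = MyProc->pid;
        lock->last_mode = mode;
        pg_atomic_fetch_add_u32(&lock->nholders, 1);
    }

    /* In LWLockAcquire(), LWLockAcquireOrWait() and LWLockWaitForVar(),
     * just before the backend goes to sleep on the lock: */
    MyProc->lwLastHoldingPid = lock->last_holding_pid;
    MyProc->lwHolderMode = lock->last_mode;
    MyProc->lwNbHolders = pg_atomic_read_u32(&lock->nholders);

    /* In LWLockRelease(): */
    pg_atomic_fetch_sub_u32(&lock->nholders, 1);

pg_lwlock_blocking_pid() then simply reads the three PGPROC fields of the
given backend (if it is currently in lwWaiting state) and returns them as
a single row.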
_Outcome Example:_
# select * from pg_lwlock_blocking_pid(10259);
requested_mode | last_holder_pid | last_holder_mode | nb_holders
----------------+-----------------+------------------+------------
LW_EXCLUSIVE | 10232 | LW_EXCLUSIVE | 1
(1 row)
# select query,pid,state,wait_event,wait_event_type,pg_lwlock_blocking_pid(pid),pg_blocking_pids(pid) from pg_stat_activity where state='active' and pid != pg_backend_pid();
query | pid | state | wait_event | wait_event_type | pg_lwlock_blocking_pid | pg_blocking_pids
--------------------------------+-------+--------+---------------+-----------------+-------------------------------------------+------------------
insert into bdtlwa values (1); | 10232 | active | | | (,,,) | {}
insert into bdtlwb values (1); | 10254 | active | WALInsert | LWLock | (LW_WAIT_UNTIL_FREE,10232,LW_EXCLUSIVE,1) | {}
create table bdtwt (a int); | 10256 | active | WALInsert | LWLock | (LW_WAIT_UNTIL_FREE,10232,LW_EXCLUSIVE,1) | {}
insert into bdtlwa values (2); | 10259 | active | BufferContent | LWLock | (LW_EXCLUSIVE,10232,LW_EXCLUSIVE,1) | {}
drop table bdtlwd; | 10261 | active | WALInsert | LWLock | (LW_WAIT_UNTIL_FREE,10232,LW_EXCLUSIVE,1) | {}
(5 rows)
So, should a PID be blocked on an LWLock, we could see:
* in which mode it is requesting the lock
* the last pid holding the lock
* the mode of the last PID holding the lock
* the number of PID(s) holding the lock
_Remarks:_
I have run a few benchmarks so far and did not observe any notable
performance degradation (I can share more details if needed).
I made some quick attempts to get an exhaustive list of blockers (in case
of LW_SHARED holders), but I think that would be challenging because:
* There are about 40,000 calls to LWLockInitialize, and all my attempts
to initialize a list here produced "FATAL: out of shared memory" or similar.
* One way to avoid embedding a list in LWLock could be to use a
proclist (with proclist_head in LWLock and proclist_node in
PGPROC); this is the current implementation for the "waiters" list.
But this would not work for the blockers, as one PGPROC can hold
multiple LWLocks, so it could mean having about 40K proclist_node
entries per PGPROC.
* I also have concerns about the possible performance impact of using
such a huge list in this context.
Those are the reasons why this patch does not provide an exhaustive list
of blockers. Still, even without an exhaustive list (in case of LW_SHARED
holders), the information the patch delivers could already be useful to
get insights during LWLock contention scenarios.
I will add this patch to the next commitfest. I look forward to your
feedback about the idea and/or implementation.
Regards,
Bertrand
Attachments:
v1-0001-Add-LWLock-blockers-info.patch
doc/src/sgml/func.sgml | 20 ++++++++++
src/backend/access/transam/twophase.c | 1 +
src/backend/storage/lmgr/lwlock.c | 19 +++++++++
src/backend/storage/lmgr/proc.c | 2 +
src/backend/utils/adt/lockfuncs.c | 74 +++++++++++++++++++++++++++++++++++
src/include/catalog/pg_proc.dat | 8 ++++
src/include/storage/lwlock.h | 24 ++++++------
src/include/storage/proc.h | 3 ++
8 files changed, 140 insertions(+), 11 deletions(-)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 7c06afd3ea..091e6d805c 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -20961,6 +20961,26 @@ SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS t(ls,n);
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_lwlock_blocking_pid</primary>
+ </indexterm>
+ <function>pg_lwlock_blocking_pid</function> ( <type>integer</type> )
+ <returnvalue>integer[]</returnvalue>
+ </para>
+ <para>
+ Returns the waiting mode of the specified process ID,
+ the last process ID (and its holding mode) holding the LWLock, the number
+ of processes holding the LWLock, or empty values if there is no such server process
+ or it is not blocked.
+ </para>
+ <para>
+ One server process blocks another if it holds a lwlock that
+ conflicts with the blocked process's lwlock request.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index e1904877fa..d9be3b06cc 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -475,6 +475,7 @@ MarkAsPreparingGuts(GlobalTransaction gxact, TransactionId xid, const char *gid,
proc->isBackgroundWorker = false;
proc->lwWaiting = false;
proc->lwWaitMode = 0;
+ proc->lwLastHoldingPid = 0;
proc->waitLock = NULL;
proc->waitProcLock = NULL;
for (i = 0; i < NUM_LOCK_PARTITIONS; i++)
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 2fa90cc095..b617c18916 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -749,6 +749,8 @@ LWLockInitialize(LWLock *lock, int tranche_id)
pg_atomic_init_u32(&lock->nwaiters, 0);
#endif
lock->tranche = tranche_id;
+ lock->last_holding_pid = 0;
+ pg_atomic_init_u32(&lock->nholders, 0);
proclist_init(&lock->waiters);
}
@@ -872,6 +874,11 @@ LWLockAttemptLock(LWLock *lock, LWLockMode mode)
if (lock_free)
{
/* Great! Got the lock. */
+ if (MyProc) {
+ lock->last_holding_pid = MyProc->pid;
+ lock->last_mode = mode;
+ pg_atomic_fetch_add_u32(&lock->nholders, 1);
+ }
#ifdef LOCK_DEBUG
if (mode == LW_EXCLUSIVE)
lock->owner = MyProc;
@@ -1056,6 +1063,7 @@ LWLockWakeup(LWLock *lock)
*/
pg_write_barrier();
waiter->lwWaiting = false;
+ waiter->lwLastHoldingPid = 0;
PGSemaphoreUnlock(waiter->sem);
}
}
@@ -1322,6 +1330,9 @@ LWLockAcquire(LWLock *lock, LWLockMode mode)
lwstats->block_count++;
#endif
+ MyProc->lwLastHoldingPid = lock->last_holding_pid;
+ MyProc->lwHolderMode = lock->last_mode;
+ MyProc->lwNbHolders = pg_atomic_read_u32(&lock->nholders);
LWLockReportWaitStart(lock);
TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode);
@@ -1483,6 +1494,9 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
lwstats->block_count++;
#endif
+ MyProc->lwLastHoldingPid = lock->last_holding_pid;
+ MyProc->lwHolderMode = lock->last_mode;
+ MyProc->lwNbHolders = pg_atomic_read_u32(&lock->nholders);
LWLockReportWaitStart(lock);
TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode);
@@ -1699,6 +1713,9 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
lwstats->block_count++;
#endif
+ MyProc->lwLastHoldingPid = lock->last_holding_pid;
+ MyProc->lwHolderMode = lock->last_mode;
+ MyProc->lwNbHolders = pg_atomic_read_u32(&lock->nholders);
LWLockReportWaitStart(lock);
TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), LW_EXCLUSIVE);
@@ -1800,6 +1817,7 @@ LWLockUpdateVar(LWLock *lock, uint64 *valptr, uint64 val)
/* check comment in LWLockWakeup() about this barrier */
pg_write_barrier();
waiter->lwWaiting = false;
+ waiter->lwLastHoldingPid = 0;
PGSemaphoreUnlock(waiter->sem);
}
}
@@ -1844,6 +1862,7 @@ LWLockRelease(LWLock *lock)
else
oldstate = pg_atomic_sub_fetch_u32(&lock->state, LW_VAL_SHARED);
+ pg_atomic_fetch_sub_u32(&lock->nholders, 1);
/* nobody else can have that kind of lock */
Assert(!(oldstate & LW_VAL_EXCLUSIVE));
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index f5eef6fa4e..9ec7dab9a1 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -402,6 +402,7 @@ InitProcess(void)
if (IsAutoVacuumWorkerProcess())
MyPgXact->vacuumFlags |= PROC_IS_AUTOVACUUM;
MyProc->lwWaiting = false;
+ MyProc->lwLastHoldingPid = 0;
MyProc->lwWaitMode = 0;
MyProc->waitLock = NULL;
MyProc->waitProcLock = NULL;
@@ -581,6 +582,7 @@ InitAuxiliaryProcess(void)
MyProc->delayChkpt = false;
MyPgXact->vacuumFlags = 0;
MyProc->lwWaiting = false;
+ MyProc->lwLastHoldingPid = 0;
MyProc->lwWaitMode = 0;
MyProc->waitLock = NULL;
MyProc->waitProcLock = NULL;
diff --git a/src/backend/utils/adt/lockfuncs.c b/src/backend/utils/adt/lockfuncs.c
index e992d1bbfc..3282988205 100644
--- a/src/backend/utils/adt/lockfuncs.c
+++ b/src/backend/utils/adt/lockfuncs.c
@@ -20,6 +20,8 @@
#include "storage/predicate_internals.h"
#include "utils/array.h"
#include "utils/builtins.h"
+#include "storage/procarray.h"
+#include "storage/proc.h"
/*
@@ -525,6 +527,78 @@ pg_blocking_pids(PG_FUNCTION_ARGS)
sizeof(int32), true, TYPALIGN_INT));
}
+/*
+ * pg_lwlock_blocking_pid
+ *
+ * returns the waiting mode of the specified process ID,
+ * the last process ID (and its holding mode) holding the LWLock, the number
+ * of processes holding the LWLock, or empty values if there is no such server process
+ * or it is not blocked.
+ */
+Datum
+pg_lwlock_blocking_pid(PG_FUNCTION_ARGS)
+{
+ int blocked_pid = PG_GETARG_INT32(0);
+ PGPROC *proc = NULL;
+ Datum values[4];
+ bool nulls[4];
+ TupleDesc tupdesc;
+ HeapTuple htup;
+
+ /*
+ * Construct a tuple descriptor for the result row.
+ */
+ tupdesc = CreateTemplateTupleDesc(4);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 1, "requested_mode",
+ TEXTOID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 2, "last_holder_pid",
+ INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 3, "last_holder_mode",
+ TEXTOID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 4, "nb_holders",
+ INT2OID, -1, 0);
+
+ tupdesc = BlessTupleDesc(tupdesc);
+ /*
+ * Form tuple with appropriate data.
+ */
+ MemSet(values, 0, sizeof(values));
+ MemSet(nulls, false, sizeof(nulls));
+
+ proc = BackendPidGetProc(blocked_pid);
+
+ if (proc != NULL && proc->lwWaiting) {
+ if (proc->lwWaitMode == 0)
+ values[0] = CStringGetTextDatum("LW_EXCLUSIVE");
+ else if (proc->lwWaitMode == 1)
+ values[0] = CStringGetTextDatum("LW_SHARED");
+ else if (proc->lwWaitMode == 2)
+ values[0] = CStringGetTextDatum("LW_WAIT_UNTIL_FREE");
+ else
+ values[0] = CStringGetTextDatum("N/A");
+
+ values[1] = Int32GetDatum(proc->lwLastHoldingPid);
+
+ if (proc->lwHolderMode == 0)
+ values[2] = CStringGetTextDatum("LW_EXCLUSIVE");
+ else if (proc->lwHolderMode == 1)
+ values[2] = CStringGetTextDatum("LW_SHARED");
+ else if (proc->lwHolderMode == 2)
+ values[2] = CStringGetTextDatum("LW_WAIT_UNTIL_FREE");
+ else
+ values[2] = CStringGetTextDatum("N/A");
+ values[3] = Int16GetDatum(proc->lwNbHolders);
+ } else {
+ nulls[0] = true;
+ nulls[1] = true;
+ nulls[2] = true;
+ nulls[3] = true;
+ }
+
+ htup = heap_form_tuple(tupdesc, values, nulls);
+
+ PG_RETURN_DATUM(HeapTupleGetDatum(htup));
+}
/*
* pg_safe_snapshot_blocking_pids - produce an array of the PIDs blocking
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 61f2c2f5b4..ce0848d979 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5916,6 +5916,14 @@
descr => 'get array of PIDs of sessions blocking specified backend PID from acquiring a heavyweight lock',
proname => 'pg_blocking_pids', provolatile => 'v', prorettype => '_int4',
proargtypes => 'int4', prosrc => 'pg_blocking_pids' },
+{ oid => '8279',
+ descr => 'get information about the session(s) blocking specified backend PID from acquiring a lightweight lock',
+ proname => 'pg_lwlock_blocking_pid', provolatile => 'v', prorettype => 'record',
+ proargtypes => 'int4',
+ proallargtypes => '{text,int4,text,int2}',
+ proargmodes => '{o,o,o,o}',
+ proargnames => '{requested_mode,last_holder_pid,last_holder_mode,nb_holders}',
+ prosrc => 'pg_lwlock_blocking_pid' },
{ oid => '3376',
descr => 'get array of PIDs of sessions blocking specified backend PID from acquiring a safe snapshot',
proname => 'pg_safe_snapshot_blocking_pids', provolatile => 'v',
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index c04ae97148..a352ca1efb 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -24,6 +24,15 @@
struct PGPROC;
+typedef enum LWLockMode
+{
+ LW_EXCLUSIVE,
+ LW_SHARED,
+ LW_WAIT_UNTIL_FREE /* A special mode used in PGPROC->lwWaitMode,
+ * when waiting for lock to become free. Not
+ * to be used as LWLockAcquire argument */
+} LWLockMode;
+
/*
* Code outside of lwlock.c should not manipulate the contents of this
* structure directly, but we have to declare it here to allow LWLocks to be
@@ -31,9 +40,12 @@ struct PGPROC;
*/
typedef struct LWLock
{
- uint16 tranche; /* tranche ID */
+ uint16 tranche; /* tranche ID */
pg_atomic_uint32 state; /* state of exclusive/nonexclusive lockers */
proclist_head waiters; /* list of waiting PGPROCs */
+ int last_holding_pid; /* last pid owner of the lock */
+ LWLockMode last_mode; /* last holding mode of the last pid owner of the lock */
+ pg_atomic_uint32 nholders; /* number of holders (could be >1 in case of LW_SHARED) */
#ifdef LOCK_DEBUG
pg_atomic_uint32 nwaiters; /* number of waiters */
struct PGPROC *owner; /* last exclusive owner of the lock */
@@ -128,16 +140,6 @@ extern PGDLLIMPORT int NamedLWLockTrancheRequests;
#define NUM_FIXED_LWLOCKS \
(PREDICATELOCK_MANAGER_LWLOCK_OFFSET + NUM_PREDICATELOCK_PARTITIONS)
-typedef enum LWLockMode
-{
- LW_EXCLUSIVE,
- LW_SHARED,
- LW_WAIT_UNTIL_FREE /* A special mode used in PGPROC->lwWaitMode,
- * when waiting for lock to become free. Not
- * to be used as LWLockAcquire argument */
-} LWLockMode;
-
-
#ifdef LOCK_DEBUG
extern bool Trace_lwlocks;
#endif
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 1ee9000b2b..829a8a4421 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -130,6 +130,9 @@ struct PGPROC
bool lwWaiting; /* true if waiting for an LW lock */
uint8 lwWaitMode; /* lwlock mode being waited for */
proclist_node lwWaitLink; /* position in LW lock wait list */
+ int lwLastHoldingPid; /* last holder of the LW lock */
+ LWLockMode lwHolderMode; /* LW lock mode of last holder of the LW lock */
+ uint32 lwNbHolders; /* number of holders of the LW lock */
/* Support for condition variables. */
proclist_node cvWaitLink; /* position in CV wait list */
Hi hackers,
On 6/2/20 2:24 PM, Drouvot, Bertrand wrote:
Attaching a new version of the patch with a tiny change to make it pass
the regression tests (opr_sanity was failing due to the new function
that is part of the patch).
Regards,
Bertrand
Attachments:
v1-0002-Add-LWLock-blockers-info.patch
doc/src/sgml/func.sgml | 20 ++++++++++
src/backend/access/transam/twophase.c | 1 +
src/backend/storage/lmgr/lwlock.c | 19 +++++++++
src/backend/storage/lmgr/proc.c | 2 +
src/backend/utils/adt/lockfuncs.c | 74 +++++++++++++++++++++++++++++++++++
src/include/catalog/pg_proc.dat | 8 ++++
src/include/storage/lwlock.h | 24 ++++++------
src/include/storage/proc.h | 3 ++
8 files changed, 140 insertions(+), 11 deletions(-)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index b682154f63..529e722b28 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -20958,6 +20958,26 @@ SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS t(ls,n);
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_lwlock_blocking_pid</primary>
+ </indexterm>
+ <function>pg_lwlock_blocking_pid</function> ( <type>integer</type> )
+ <returnvalue>integer[]</returnvalue>
+ </para>
+ <para>
+ Returns the waiting mode of the specified process ID,
+ the last process ID (and its holding mode) holding the LWLock, the number
+ of processes holding the LWLock, or empty values if there is no such server process
+ or it is not blocked.
+ </para>
+ <para>
+ One server process blocks another if it holds a lwlock that
+ conflicts with the blocked process's lwlock request.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index e1904877fa..d9be3b06cc 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -475,6 +475,7 @@ MarkAsPreparingGuts(GlobalTransaction gxact, TransactionId xid, const char *gid,
proc->isBackgroundWorker = false;
proc->lwWaiting = false;
proc->lwWaitMode = 0;
+ proc->lwLastHoldingPid = 0;
proc->waitLock = NULL;
proc->waitProcLock = NULL;
for (i = 0; i < NUM_LOCK_PARTITIONS; i++)
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 2fa90cc095..b617c18916 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -749,6 +749,8 @@ LWLockInitialize(LWLock *lock, int tranche_id)
pg_atomic_init_u32(&lock->nwaiters, 0);
#endif
lock->tranche = tranche_id;
+ lock->last_holding_pid = 0;
+ pg_atomic_init_u32(&lock->nholders, 0);
proclist_init(&lock->waiters);
}
@@ -872,6 +874,11 @@ LWLockAttemptLock(LWLock *lock, LWLockMode mode)
if (lock_free)
{
/* Great! Got the lock. */
+ if (MyProc) {
+ lock->last_holding_pid = MyProc->pid;
+ lock->last_mode = mode;
+ pg_atomic_fetch_add_u32(&lock->nholders, 1);
+ }
#ifdef LOCK_DEBUG
if (mode == LW_EXCLUSIVE)
lock->owner = MyProc;
@@ -1056,6 +1063,7 @@ LWLockWakeup(LWLock *lock)
*/
pg_write_barrier();
waiter->lwWaiting = false;
+ waiter->lwLastHoldingPid = 0;
PGSemaphoreUnlock(waiter->sem);
}
}
@@ -1322,6 +1330,9 @@ LWLockAcquire(LWLock *lock, LWLockMode mode)
lwstats->block_count++;
#endif
+ MyProc->lwLastHoldingPid = lock->last_holding_pid;
+ MyProc->lwHolderMode = lock->last_mode;
+ MyProc->lwNbHolders = pg_atomic_read_u32(&lock->nholders);
LWLockReportWaitStart(lock);
TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode);
@@ -1483,6 +1494,9 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
lwstats->block_count++;
#endif
+ MyProc->lwLastHoldingPid = lock->last_holding_pid;
+ MyProc->lwHolderMode = lock->last_mode;
+ MyProc->lwNbHolders = pg_atomic_read_u32(&lock->nholders);
LWLockReportWaitStart(lock);
TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode);
@@ -1699,6 +1713,9 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
lwstats->block_count++;
#endif
+ MyProc->lwLastHoldingPid = lock->last_holding_pid;
+ MyProc->lwHolderMode = lock->last_mode;
+ MyProc->lwNbHolders = pg_atomic_read_u32(&lock->nholders);
LWLockReportWaitStart(lock);
TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), LW_EXCLUSIVE);
@@ -1800,6 +1817,7 @@ LWLockUpdateVar(LWLock *lock, uint64 *valptr, uint64 val)
/* check comment in LWLockWakeup() about this barrier */
pg_write_barrier();
waiter->lwWaiting = false;
+ waiter->lwLastHoldingPid = 0;
PGSemaphoreUnlock(waiter->sem);
}
}
@@ -1844,6 +1862,7 @@ LWLockRelease(LWLock *lock)
else
oldstate = pg_atomic_sub_fetch_u32(&lock->state, LW_VAL_SHARED);
+ pg_atomic_fetch_sub_u32(&lock->nholders, 1);
/* nobody else can have that kind of lock */
Assert(!(oldstate & LW_VAL_EXCLUSIVE));
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index f5eef6fa4e..9ec7dab9a1 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -402,6 +402,7 @@ InitProcess(void)
if (IsAutoVacuumWorkerProcess())
MyPgXact->vacuumFlags |= PROC_IS_AUTOVACUUM;
MyProc->lwWaiting = false;
+ MyProc->lwLastHoldingPid = 0;
MyProc->lwWaitMode = 0;
MyProc->waitLock = NULL;
MyProc->waitProcLock = NULL;
@@ -581,6 +582,7 @@ InitAuxiliaryProcess(void)
MyProc->delayChkpt = false;
MyPgXact->vacuumFlags = 0;
MyProc->lwWaiting = false;
+ MyProc->lwLastHoldingPid = 0;
MyProc->lwWaitMode = 0;
MyProc->waitLock = NULL;
MyProc->waitProcLock = NULL;
diff --git a/src/backend/utils/adt/lockfuncs.c b/src/backend/utils/adt/lockfuncs.c
index e992d1bbfc..3282988205 100644
--- a/src/backend/utils/adt/lockfuncs.c
+++ b/src/backend/utils/adt/lockfuncs.c
@@ -20,6 +20,8 @@
#include "storage/predicate_internals.h"
#include "utils/array.h"
#include "utils/builtins.h"
+#include "storage/procarray.h"
+#include "storage/proc.h"
/*
@@ -525,6 +527,78 @@ pg_blocking_pids(PG_FUNCTION_ARGS)
sizeof(int32), true, TYPALIGN_INT));
}
+/*
+ * pg_lwlock_blocking_pid
+ *
+ * returns the waiting mode of the specified process ID,
+ * the last process ID (and its holding mode) holding the LWLock, the number
+ * of processes holding the LWLock, or empty values if there is no such server process
+ * or it is not blocked.
+ */
+Datum
+pg_lwlock_blocking_pid(PG_FUNCTION_ARGS)
+{
+ int blocked_pid = PG_GETARG_INT32(0);
+ PGPROC *proc = NULL;
+ Datum values[4];
+ bool nulls[4];
+ TupleDesc tupdesc;
+ HeapTuple htup;
+
+ /*
+ * Construct a tuple descriptor for the result row.
+ */
+ tupdesc = CreateTemplateTupleDesc(4);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 1, "requested_mode",
+ TEXTOID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 2, "last_holder_pid",
+ INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 3, "last_holder_mode",
+ TEXTOID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 4, "nb_holders",
+ INT2OID, -1, 0);
+
+ tupdesc = BlessTupleDesc(tupdesc);
+ /*
+ * Form tuple with appropriate data.
+ */
+ MemSet(values, 0, sizeof(values));
+ MemSet(nulls, false, sizeof(nulls));
+
+ proc = BackendPidGetProc(blocked_pid);
+
+ if (proc != NULL && proc->lwWaiting) {
+ if (proc->lwWaitMode == 0)
+ values[0] = CStringGetTextDatum("LW_EXCLUSIVE");
+ else if (proc->lwWaitMode == 1)
+ values[0] = CStringGetTextDatum("LW_SHARED");
+ else if (proc->lwWaitMode == 2)
+ values[0] = CStringGetTextDatum("LW_WAIT_UNTIL_FREE");
+ else
+ values[0] = CStringGetTextDatum("N/A");
+
+ values[1] = Int32GetDatum(proc->lwLastHoldingPid);
+
+ if (proc->lwHolderMode == 0)
+ values[2] = CStringGetTextDatum("LW_EXCLUSIVE");
+ else if (proc->lwHolderMode == 1)
+ values[2] = CStringGetTextDatum("LW_SHARED");
+ else if (proc->lwHolderMode == 2)
+ values[2] = CStringGetTextDatum("LW_WAIT_UNTIL_FREE");
+ else
+ values[2] = CStringGetTextDatum("N/A");
+ values[3] = Int16GetDatum(proc->lwNbHolders);
+ } else {
+ nulls[0] = true;
+ nulls[1] = true;
+ nulls[2] = true;
+ nulls[3] = true;
+ }
+
+ htup = heap_form_tuple(tupdesc, values, nulls);
+
+ PG_RETURN_DATUM(HeapTupleGetDatum(htup));
+}
/*
* pg_safe_snapshot_blocking_pids - produce an array of the PIDs blocking
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 61f2c2f5b4..d06ec798de 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5916,6 +5916,14 @@
descr => 'get array of PIDs of sessions blocking specified backend PID from acquiring a heavyweight lock',
proname => 'pg_blocking_pids', provolatile => 'v', prorettype => '_int4',
proargtypes => 'int4', prosrc => 'pg_blocking_pids' },
+{ oid => '8279',
+ descr => 'get information about the session(s) blocking specified backend PID from acquiring a lightweight lock',
+ proname => 'pg_lwlock_blocking_pid', provolatile => 'v', prorettype => 'record',
+ proargtypes => 'int4',
+ proallargtypes => '{int4,text,int4,text,int2}',
+ proargmodes => '{i,o,o,o,o}',
+ proargnames => '{pid,requested_mode,last_holder_pid,last_holder_mode,nb_holders}',
+ prosrc => 'pg_lwlock_blocking_pid' },
{ oid => '3376',
descr => 'get array of PIDs of sessions blocking specified backend PID from acquiring a safe snapshot',
proname => 'pg_safe_snapshot_blocking_pids', provolatile => 'v',
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index c04ae97148..a352ca1efb 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -24,6 +24,15 @@
struct PGPROC;
+typedef enum LWLockMode
+{
+ LW_EXCLUSIVE,
+ LW_SHARED,
+ LW_WAIT_UNTIL_FREE /* A special mode used in PGPROC->lwWaitMode,
+ * when waiting for lock to become free. Not
+ * to be used as LWLockAcquire argument */
+} LWLockMode;
+
/*
* Code outside of lwlock.c should not manipulate the contents of this
* structure directly, but we have to declare it here to allow LWLocks to be
@@ -31,9 +40,12 @@ struct PGPROC;
*/
typedef struct LWLock
{
- uint16 tranche; /* tranche ID */
+ uint16 tranche; /* tranche ID */
pg_atomic_uint32 state; /* state of exclusive/nonexclusive lockers */
proclist_head waiters; /* list of waiting PGPROCs */
+ int last_holding_pid; /* last pid owner of the lock */
+ LWLockMode last_mode; /* last holding mode of the last pid owner of the lock */
+ pg_atomic_uint32 nholders; /* number of holders (could be >1 in case of LW_SHARED) */
#ifdef LOCK_DEBUG
pg_atomic_uint32 nwaiters; /* number of waiters */
struct PGPROC *owner; /* last exclusive owner of the lock */
@@ -128,16 +140,6 @@ extern PGDLLIMPORT int NamedLWLockTrancheRequests;
#define NUM_FIXED_LWLOCKS \
(PREDICATELOCK_MANAGER_LWLOCK_OFFSET + NUM_PREDICATELOCK_PARTITIONS)
-typedef enum LWLockMode
-{
- LW_EXCLUSIVE,
- LW_SHARED,
- LW_WAIT_UNTIL_FREE /* A special mode used in PGPROC->lwWaitMode,
- * when waiting for lock to become free. Not
- * to be used as LWLockAcquire argument */
-} LWLockMode;
-
-
#ifdef LOCK_DEBUG
extern bool Trace_lwlocks;
#endif
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 1ee9000b2b..829a8a4421 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -130,6 +130,9 @@ struct PGPROC
bool lwWaiting; /* true if waiting for an LW lock */
uint8 lwWaitMode; /* lwlock mode being waited for */
proclist_node lwWaitLink; /* position in LW lock wait list */
+ int lwLastHoldingPid; /* last holder of the LW lock */
+ LWLockMode lwHolderMode; /* LW lock mode of last holder of the LW lock */
+ uint32 lwNbHolders; /* number of holders of the LW lock */
/* Support for condition variables. */
proclist_node cvWaitLink; /* position in CV wait list */
Hi,
This is a very interesting topic. I applied the 2nd patch to the master branch and performed a quick test, and I can observe the information below:
postgres=# select * from pg_lwlock_blocking_pid(26925);
requested_mode | last_holder_pid | last_holder_mode | nb_holders
----------------+-----------------+------------------+------------
LW_EXCLUSIVE | 26844 | LW_EXCLUSIVE | 1
(1 row)
postgres=# select query,pid,state,wait_event,wait_event_type,pg_lwlock_blocking_pid(pid),pg_blocking_pids(pid) from pg_stat_activity where state='active' and pid != pg_backend_pid();
query | pid | state | wait_event | wait_event_type | pg_lwlock_blocking_pid | pg_blocking_pids
--------------------------------------------------------------+-------+--------+------------+-----------------+-------------------------------------+------------------
INSERT INTO orders SELECT FROM generate_series(1, 10000000); | 26925 | active | WALWrite | LWLock | (LW_EXCLUSIVE,26844,LW_EXCLUSIVE,1) | {}
(1 row)
At some points I had to keep repeating the query in order to capture the "lock info". I think this is probably part of the design, but I was wondering:
If a query is in a deadlock and a developer is expected to take a look using the methods above, will the process be killed before the developer gets a chance to run one of these queries?
If some statistics information could be added, it might help developers get an overall idea about the lock status; and if developers could specify some filters, such as the number of times a query entered a deadlock, or the queries holding a lock for more than some number of ms, it might help troubleshoot "lock" issues even better. Moreover, if this feature could become an independent extension, similar to "pg_buffercache", that would be great.
Best regards,
David
On Tue, Jun 2, 2020 at 8:25 AM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
the patch adds into the LWLock struct:
last_holding_pid: last pid owner of the lock
last_mode: last holding mode of the last pid owner of the lock
nholders: number of holders (could be >1 in case of LW_SHARED)
There's been significant work done over the years to get the size of
an LWLock down; I'm not very enthusiastic about making it bigger
again. See for example commit 6150a1b08a9fe7ead2b25240be46dddeae9d98e1
which embeds one of the LWLocks associated with a BufferDesc into the
structure to reduce the number of cache lines associated with common
buffer operations. I'm not sure whether this patch would increase the
space usage of a BufferDesc to more than one cache line again, but at
the very least it would make it a lot tighter, since it looks like it
adds 12 bytes to the size of each one.
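To make the size arithmetic concrete, here is a small standalone C model of
that growth; the layouts below are simplified stand-ins (pg_atomic_uint32
and proclist_head are modeled as plain integers), not the real PostgreSQL
definitions, so it only illustrates where the 12 bytes come from:

    #include <stdint.h>
    #include <stdio.h>

    /* stand-in for proclist_head: two uint32 links */
    typedef struct
    {
        uint32_t    head;
        uint32_t    tail;
    } proclist_head_sim;

    /* roughly what a non-LOCK_DEBUG LWLock looks like today */
    typedef struct
    {
        uint16_t    tranche;            /* tranche ID */
        uint32_t    state;              /* exclusive/shared locker state */
        proclist_head_sim waiters;      /* list of waiting PGPROCs */
    } lwlock_before;

    /* the same struct with the three proposed fields added */
    typedef struct
    {
        uint16_t    tranche;
        uint32_t    state;
        proclist_head_sim waiters;
        int32_t     last_holding_pid;   /* proposed */
        int32_t     last_mode;          /* proposed (enum, 4 bytes) */
        uint32_t    nholders;           /* proposed */
    } lwlock_after;

    int
    main(void)
    {
        printf("before: %zu  after: %zu  growth: %zu bytes\n",
               sizeof(lwlock_before), sizeof(lwlock_after),
               sizeof(lwlock_after) - sizeof(lwlock_before));
        return 0;
    }

On a typical 64-bit build this prints a growth of 12 bytes (16 -> 28).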
It's also a little hard to believe that this doesn't hurt performance
on workloads with a lot of LWLock contention, although maybe not; it
doesn't seem crazy expensive, just possibly enough to matter.
I thought a little bit about what this might buy as compared with just
sampling wait events. That by itself is enough to tell you which
LWLocks are heavily contended. It doesn't tell you what they are
contending against, so this would be superior in that regard. However,
I wonder how much of a problem that actually is. Typically, LWLocks
aren't being taken for long periods, so all the things that are
accessing the lock spend some time waiting (which you will see via
wait events in pg_stat_activity) and some time holding the lock
(making you see other things in pg_stat_activity). It's possible to
have cases where this isn't true; e.g. a relatively small number of
backends committing transactions could be slowing down a much larger
number of backends taking snapshots, and you'd mostly only see the
latter waiting for ProcArrayLock. However, those kinds of cases don't
seem super-common or super-difficult to figure out.
What kinds of scenarios motivate you to propose this?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
On 2020-08-10 18:27:17 -0400, Robert Haas wrote:
On Tue, Jun 2, 2020 at 8:25 AM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
the patch adds into the LWLock struct:
last_holding_pid: last pid owner of the lock
last_mode: last holding mode of the last pid owner of the lock
nholders: number of holders (could be >1 in case of LW_SHARED)
There's been significant work done over the years to get the size of
an LWLock down; I'm not very enthusiastic about making it bigger
again. See for example commit 6150a1b08a9fe7ead2b25240be46dddeae9d98e1
which embeds one of the LWLocks associated with a BufferDesc into the
structure to reduce the number of cache lines associated with common
buffer operations. I'm not sure whether this patch would increase the
space usage of a BufferDesc to more than one cache line again, but at
the very least it would make it a lot tighter, since it looks like it
adds 12 bytes to the size of each one.
+many. If anything I would like to make them *smaller*. We should strive
to make locking more and more granular, and that requires the space
overhead to be small. I'm unhappy enough about the tranche being in
there, and requiring padding etc.
I spent a *LOT* of sweat getting where we are, I'd be unhappy to regress
on size or efficiency.
It's also a little hard to believe that this doesn't hurt performance
on workloads with a lot of LWLock contention, although maybe not; it
doesn't seem crazy expensive, just possibly enough to matter.
Yea.
I thought a little bit about what this might buy as compared with just
sampling wait events. That by itself is enough to tell you which
LWLocks are heavily contended. It doesn't tell you what they are
contending against, so this would be superior in that regard. However,
I wonder how much of a problem that actually is. Typically, LWLocks
aren't being taken for long periods, so all the things that are
accessing the lock spend some time waiting (which you will see via
wait events in pg_stat_activity) and some time holding the lock
(making you see other things in pg_stat_activity). It's possible to
have cases where this isn't true; e.g. a relatively small number of
backends committing transactions could be slowing down a much larger
number of backends taking snapshots, and you'd mostly only see the
latter waiting for ProcArrayLock. However, those kinds of cases don't
seem super-common or super-difficult to figure out.
Most of the cases where this kind of information really is interesting
seem to benefit a lot from having stack information available. That
obviously has overhead, so we don't want the cost all the
time. The script at
/messages/by-id/20170622210845.d2hsbqv6rxu2tiye@alap3.anarazel.de
can give you results like e.g.
https://anarazel.de/t/2017-06-22/pgsemwait_64_async.svg
Greetings,
Andres Freund
On Mon, Aug 10, 2020 at 5:41 PM Andres Freund <andres@anarazel.de> wrote:
Most of the cases where this kind of information really is interesting
seem to benefit a lot from having stack information available. That
obviously has overhead, so we don't want the cost all the
time. The script at
/messages/by-id/20170622210845.d2hsbqv6rxu2tiye@alap3.anarazel.de
can give you results like e.g.
https://anarazel.de/t/2017-06-22/pgsemwait_64_async.svg
It seems to have bitrot. Do you have a more recent version of the script?
--
Peter Geoghegan
Hi,
On 2020-08-12 16:47:13 -0700, Peter Geoghegan wrote:
On Mon, Aug 10, 2020 at 5:41 PM Andres Freund <andres@anarazel.de> wrote:
Most of the cases where this kind of information really is interesting
seem to benefit a lot from having stack information available. That
obviously has overhead, so we don't want the cost all the
time. The script at
/messages/by-id/20170622210845.d2hsbqv6rxu2tiye@alap3.anarazel.de
can give you results like e.g.
https://anarazel.de/t/2017-06-22/pgsemwait_64_async.svg
It seems to have bitrot. Do you have a more recent version of the script?
Attached. It needed one python3 fix, and had to be adapted so it works with
futex based semaphores. It seems to work for both sysv and posix semaphores
now, based on a very short test.
sudo python3 ./pgsemwait.py -x /home/andres/build/postgres/dev-optimize/vpath/src/backend/postgres -f 3|~/src/flamegraph/flamegraph.pl
Will add a note to the other thread.
Greetings,
Andres Freund
On Wed, Aug 12, 2020 at 5:39 PM Andres Freund <andres@anarazel.de> wrote:
Attached. Needed one python3 fix, and to be adapted so it works with
futex based semaphores. Seems to work for both sysv and posix semaphores
now, based a very short test.
Great, thanks!
--
Peter Geoghegan
On 11/08/2020 03:41, Andres Freund wrote:
On 2020-08-10 18:27:17 -0400, Robert Haas wrote:
On Tue, Jun 2, 2020 at 8:25 AM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
the patch adds into the LWLock struct:
last_holding_pid: last pid owner of the lock
last_mode: last holding mode of the last pid owner of the lock
nholders: number of holders (could be >1 in case of LW_SHARED)
There's been significant work done over the years to get the size of
an LWLock down; I'm not very enthusiastic about making it bigger
again. See for example commit 6150a1b08a9fe7ead2b25240be46dddeae9d98e1
which embeds one of the LWLocks associated with a BufferDesc into the
structure to reduce the number of cache lines associated with common
buffer operations. I'm not sure whether this patch would increase the
space usage of a BufferDesc to more than one cache line again, but at
the very least it would make it a lot tighter, since it looks like it
adds 12 bytes to the size of each one.
+many. If anything I would like to make them *smaller*. We should strive
to make locking more and more granular, and that requires the space
overhead to be small. I'm unhappy enough about the tranche being in
there, and requiring padding etc.
I spent a *LOT* of sweat getting where we are, I'd be unhappy to regress
on size or efficiency.
That seems to be the consensus, so I'm marking this as Returned with
Feedback in the commitfest.
- Heikki
On Wed, Nov 18, 2020 at 5:25 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 11/08/2020 03:41, Andres Freund wrote:
On 2020-08-10 18:27:17 -0400, Robert Haas wrote:
On Tue, Jun 2, 2020 at 8:25 AM Drouvot, Bertrand <bdrouvot@amazon.com>
wrote:
the patch adds into the LWLock struct:
last_holding_pid: last pid owner of the lock
last_mode: last holding mode of the last pid owner of the lock
nholders: number of holders (could be >1 in case
of LW_SHARED)
There's been significant work done over the years to get the size of
an LWLock down; I'm not very enthusiastic about making it bigger
again. See for example commit 6150a1b08a9fe7ead2b25240be46dddeae9d98e1
which embeds one of the LWLocks associated with a BufferDesc into the
structure to reduce the number of cache lines associated with common
buffer operations. I'm not sure whether this patch would increase the
space usage of a BufferDesc to more than one cache line again, but at
the very least it would make it a lot tighter, since it looks like it
adds 12 bytes to the size of each one.
+many. If anything I would like to make them *smaller*. We should strive
to make locking more and more granular, and that requires the space
overhead to be small. I'm unhappy enough about the tranche being in
there, and requiring padding etc.
I spent a *LOT* of sweat getting where we are, I'd be unhappy to regress
on size or efficiency.
That seems to be the consensus, so I'm marking this as Returned with
Feedback in the commitfest.
For what it's worth, I think that things like this are where we can really
benefit from external tracing and observation tools.
Instead of tracking the information persistently in the LWLock struct, we
can emit TRACE_POSTGRESQL_LWLOCK_BLOCKED_ON(...) in a context where we have
the information available to us, then forget all about it. We don't spend
anything unless someone's collecting the info.
If someone wants to track LWLock blocking relationships during debugging
and performance work, they can use systemtap, dtrace, bpftrace, or a
similar tool to observe the LWLock tracepoints and generate stats on LWLock
blocking frequencies/durations. Doing so with systemtap should be rather
simple.
I actually already had a look at this before. I found that the tracepoints
that are in the LWLock code right now don't supply enough information in
their arguments, so you have to use DWARF debuginfo-based probes, which is a
pain. The tranche name alone doesn't let you identify which lock within a
tranche is the current target.
I've attached a patch that adds the actual LWLock* to each tracepoint in
the LWLock subsystem. That permits different locks to be tracked when
handling tracepoint events within a single process.
Another patch adds tracepoints that were missing from LWLockUpdateVar and
LWLockWaitForVar. And another removes a stray
TRACE_POSTGRESQL_LWLOCK_ACQUIRE() in LWLockWaitForVar() which should not
have been there, since the lock is not actually acquired by
LWLockWaitForVar().
I'd hoped to add some sort of "index within the tranche" to tracepoints,
but it looks like it's not feasible. It turns out to be basically
impossible to get a stable identifier for an individual LWLock that is
valid across different backends. A LWLock inside a DSM segment might have a
different address in different backends depending on where the DSM segment
got mapped. The LWLock subsystem doesn't keep track of them and doesn't
have any way to map a LWLock pointer to any sort of cross-process-valid
identifier. So that's a bit of a pain when tracing. To provide something
stable I think it'd be necessary to add some kind of counter tracked
per-tranche and set by LWLockInitialize in the LWLock struct itself, which
we sure don't want to do. If this ever becomes important for some reason we
can probably look up whether the address is within a DSM segment or static
shmem and compute some kind of relative address to report. For now you
can't identify and compare individual locks within a tranche, except for
the individual-lock tranches and named tranches.
By the way, the LWLock tracepoints currently fire T_NAME(lock) which calls
GetLWTrancheName() for each tracepoint hit, so long as Pg is built with
--enable-dtrace, even when nothing is actually tracing them. We might want
to consider guarding them in systemtap tracepoint semaphore tests so they
just become a predicted-away branch when not active. Doing so requires a
small change to how we compile probes.d and the related makefile, but
shouldn't be too hard. I haven't done that in this patch set.
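Assuming probes.d gets processed with semaphore support so that the
generated probes.h also emits the usual SDT *_ENABLED() test macros (that
is the "small change" mentioned above, so the macro name below is
hypothetical), the guard could look roughly like this, so that
T_NAME()/GetLWTrancheName() is only evaluated when a tracer is attached:

    /* sketch only: *_ENABLED() tests the probe's semaphore and is a cheap,
     * well-predicted branch when nothing is tracing */
    if (TRACE_POSTGRESQL_LWLOCK_WAIT_START_ENABLED())
        TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode, lock,
                                           lock->tranche);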
Attachments:
v1-0002-Pass-the-target-LWLock-and-tranche-ID-to-LWLock-t.patch
From c8a0d97545a6d58d5ca1646146c63a687cb33079 Mon Sep 17 00:00:00 2001
From: Craig Ringer <craig.ringer@2ndquadrant.com>
Date: Thu, 19 Nov 2020 17:38:45 +0800
Subject: [PATCH v1 2/5] Pass the target LWLock* and tranche ID to LWLock
tracepoints
Previously the TRACE_POSTGRESQL_LWLOCK_ tracepoints only received a
pointer to the LWLock tranche name. This made it impossible to identify
individual locks.
Passing the lock pointer itself isn't perfect. If the lock is allocated inside
a DSM segment then it might be mapped at a different address in different
backends. It's safe to compare lock pointers between backends (assuming
!EXEC_BACKEND) if they're in the individual lock tranches or an
extension-requested named tranche, but not necessarily for tranches in
BuiltinTrancheIds or tranches >= LWTRANCHE_FIRST_USER_DEFINED that were
directly assigned with LWLockNewTrancheId(). Still, it's better than nothing;
the pointer is stable within a backend, and usually between backends.
---
src/backend/storage/lmgr/lwlock.c | 35 +++++++++++++++++++------------
src/backend/utils/probes.d | 18 +++++++++-------
2 files changed, 32 insertions(+), 21 deletions(-)
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 0a147bddaf..92de34a399 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -1323,7 +1323,8 @@ LWLockAcquire(LWLock *lock, LWLockMode mode)
#endif
LWLockReportWaitStart(lock);
- TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode);
+ TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode, lock,
+ lock->tranche);
for (;;)
{
@@ -1345,7 +1346,8 @@ LWLockAcquire(LWLock *lock, LWLockMode mode)
}
#endif
- TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode);
+ TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode, lock,
+ lock->tranche);
LWLockReportWaitEnd();
LOG_LWDEBUG("LWLockAcquire", lock, "awakened");
@@ -1354,7 +1356,7 @@ LWLockAcquire(LWLock *lock, LWLockMode mode)
result = false;
}
- TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), mode);
+ TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), mode, lock, lock->tranche);
/* Add lock to list of locks held by this backend */
held_lwlocks[num_held_lwlocks].lock = lock;
@@ -1405,14 +1407,16 @@ LWLockConditionalAcquire(LWLock *lock, LWLockMode mode)
RESUME_INTERRUPTS();
LOG_LWDEBUG("LWLockConditionalAcquire", lock, "failed");
- TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_FAIL(T_NAME(lock), mode);
+ TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_FAIL(T_NAME(lock), mode, lock,
+ lock->tranche);
}
else
{
/* Add lock to list of locks held by this backend */
held_lwlocks[num_held_lwlocks].lock = lock;
held_lwlocks[num_held_lwlocks++].mode = mode;
- TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE(T_NAME(lock), mode);
+ TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE(T_NAME(lock), mode, lock,
+ lock->tranche);
}
return !mustwait;
}
@@ -1484,7 +1488,8 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
#endif
LWLockReportWaitStart(lock);
- TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode);
+ TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode, lock,
+ lock->tranche);
for (;;)
{
@@ -1502,7 +1507,8 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
Assert(nwaiters < MAX_BACKENDS);
}
#endif
- TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode);
+ TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode, lock,
+ lock->tranche);
LWLockReportWaitEnd();
LOG_LWDEBUG("LWLockAcquireOrWait", lock, "awakened");
@@ -1532,7 +1538,8 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
/* Failed to get lock, so release interrupt holdoff */
RESUME_INTERRUPTS();
LOG_LWDEBUG("LWLockAcquireOrWait", lock, "failed");
- TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT_FAIL(T_NAME(lock), mode);
+ TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT_FAIL(T_NAME(lock), mode, lock,
+ lock->tranche);
}
else
{
@@ -1540,7 +1547,8 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
/* Add lock to list of locks held by this backend */
held_lwlocks[num_held_lwlocks].lock = lock;
held_lwlocks[num_held_lwlocks++].mode = mode;
- TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT(T_NAME(lock), mode);
+ TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT(T_NAME(lock), mode, lock,
+ lock->tranche);
}
return !mustwait;
@@ -1700,7 +1708,8 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
#endif
LWLockReportWaitStart(lock);
- TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), LW_EXCLUSIVE);
+ TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), LW_EXCLUSIVE, lock,
+ lock->tranche);
for (;;)
{
@@ -1719,7 +1728,7 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
}
#endif
- TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), LW_EXCLUSIVE);
+ TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), LW_EXCLUSIVE, lock, lock->tranche);
LWLockReportWaitEnd();
LOG_LWDEBUG("LWLockWaitForVar", lock, "awakened");
@@ -1845,6 +1854,8 @@ LWLockRelease(LWLock *lock)
/* nobody else can have that kind of lock */
Assert(!(oldstate & LW_VAL_EXCLUSIVE));
+ /* Released, though not woken yet. All releases must fire this. */
+ TRACE_POSTGRESQL_LWLOCK_RELEASE(T_NAME(lock), mode, lock, lock->tranche);
/*
* We're still waiting for backends to get scheduled, don't wake them up
@@ -1868,8 +1879,6 @@ LWLockRelease(LWLock *lock)
LWLockWakeup(lock);
}
- TRACE_POSTGRESQL_LWLOCK_RELEASE(T_NAME(lock));
-
/*
* Now okay to allow cancel/die interrupts.
*/
diff --git a/src/backend/utils/probes.d b/src/backend/utils/probes.d
index a0b0458108..89805c3a89 100644
--- a/src/backend/utils/probes.d
+++ b/src/backend/utils/probes.d
@@ -17,6 +17,7 @@
#define LocalTransactionId unsigned int
#define LWLockMode int
#define LOCKMODE int
+#define LWLock void
#define BlockNumber unsigned int
#define Oid unsigned int
#define ForkNumber int
@@ -28,14 +29,15 @@ provider postgresql {
probe transaction__commit(LocalTransactionId);
probe transaction__abort(LocalTransactionId);
- probe lwlock__acquire(const char *, LWLockMode);
- probe lwlock__release(const char *);
- probe lwlock__wait__start(const char *, LWLockMode);
- probe lwlock__wait__done(const char *, LWLockMode);
- probe lwlock__condacquire(const char *, LWLockMode);
- probe lwlock__condacquire__fail(const char *, LWLockMode);
- probe lwlock__acquire__or__wait(const char *, LWLockMode);
- probe lwlock__acquire__or__wait__fail(const char *, LWLockMode);
+ probe lwlock__acquire(const char *, LWLockMode, LWLock*, int);
+ probe lwlock__release(const char *, LWLockMode, LWLock*, int);
+ probe lwlock__wait__start(const char *, LWLockMode, LWLock*, int);
+ probe lwlock__wait__done(const char *, LWLockMode, LWLock*, int);
+ probe lwlock__condacquire(const char *, LWLockMode, LWLock*, int);
+ probe lwlock__condacquire__fail(const char *, LWLockMode, LWLock*, int);
+ probe lwlock__acquire__or__wait(const char *, LWLockMode, LWLock*, int);
+ probe lwlock__acquire__or__wait__fail(const char *, LWLockMode, LWLock*, int);
+
probe lock__wait__start(unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, LOCKMODE);
probe lock__wait__done(unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, LOCKMODE);
--
2.26.2
v1-0001-Remove-bogus-lwlock__acquire-tracepoint-from-LWLo.patch
From 9969eb12b5a1fdd39cdc7b3b7acb28370e1f95aa Mon Sep 17 00:00:00 2001
From: Craig Ringer <craig.ringer@2ndquadrant.com>
Date: Thu, 19 Nov 2020 18:15:34 +0800
Subject: [PATCH v1 1/5] Remove bogus lwlock__acquire tracepoint from
LWLockWaitForVar
Calls to LWLockWaitForVar fired the TRACE_POSTGRESQL_LWLOCK_ACQUIRE
tracepoint, but LWLockWaitForVar() never actually acquires the
LWLock. Remove it.
---
src/backend/storage/lmgr/lwlock.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 2fa90cc095..0a147bddaf 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -1727,8 +1727,6 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
/* Now loop back and check the status of the lock again. */
}
- TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), LW_EXCLUSIVE);
-
/*
* Fix the process wait semaphore's count for any absorbed wakeups.
*/
--
2.26.2
v1-0003-Add-to-the-tracepoints-in-LWLock-routines.patch
From 7c697ae06dff02bcd306c55aa6a79eabd58284fc Mon Sep 17 00:00:00 2001
From: Craig Ringer <craig.ringer@2ndquadrant.com>
Date: Thu, 19 Nov 2020 18:05:39 +0800
Subject: [PATCH v1 3/5] Add to the tracepoints in LWLock routines
The existing tracepoints in lwlock.c didn't mark the start of LWLock
acquisition, so timing of the full LWLock acquire cycle wasn't
possible without relying on debuginfo. Since this can be quite
relevant for production performance issues, emit tracepoints at the
start of LWLock acquire.
Also add a tracepoint that's fired for all LWLock acquisitions at the
moment the shared memory state changes, whether done by LWLockAcquire
or LWLockConditionalAcquire. This lets tools reliably track which
backends hold which LWLocks even if we add new functions that acquire
LWLocks in future.
Add tracepoints in LWLockWaitForVar and LWLockUpdateVar so process
interaction around LWLock variable waits can be observed from trace
tooling. They can cause long waits and/or deadlocks, so it's worth
being able to time and track them.
---
src/backend/storage/lmgr/lwlock.c | 24 ++++++++++++++++++++++++
src/backend/utils/probes.d | 8 ++++++++
2 files changed, 32 insertions(+)
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 92de34a399..d63119931b 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -876,6 +876,9 @@ LWLockAttemptLock(LWLock *lock, LWLockMode mode)
if (mode == LW_EXCLUSIVE)
lock->owner = MyProc;
#endif
+ /* All LWLock acquires must hit this tracepoint */
+ TRACE_POSTGRESQL_LWLOCK_ACQUIRED(T_NAME(lock), mode, lock,
+ lock->tranche);
return false;
}
else
@@ -1239,6 +1242,9 @@ LWLockAcquire(LWLock *lock, LWLockMode mode)
if (num_held_lwlocks >= MAX_SIMUL_LWLOCKS)
elog(ERROR, "too many LWLocks taken");
+ TRACE_POSTGRESQL_LWLOCK_ACQUIRE_START(T_NAME(lock), mode, lock,
+ lock->tranche);
+
/*
* Lock out cancel/die interrupts until we exit the code section protected
* by the LWLock. This ensures that interrupts will not interfere with
@@ -1391,6 +1397,9 @@ LWLockConditionalAcquire(LWLock *lock, LWLockMode mode)
if (num_held_lwlocks >= MAX_SIMUL_LWLOCKS)
elog(ERROR, "too many LWLocks taken");
+ TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_START(T_NAME(lock), mode, lock,
+ lock->tranche);
+
/*
* Lock out cancel/die interrupts until we exit the code section protected
* by the LWLock. This ensures that interrupts will not interfere with
@@ -1455,6 +1464,9 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
if (num_held_lwlocks >= MAX_SIMUL_LWLOCKS)
elog(ERROR, "too many LWLocks taken");
+ TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT_START(T_NAME(lock), mode, lock,
+ lock->tranche);
+
/*
* Lock out cancel/die interrupts until we exit the code section protected
* by the LWLock. This ensures that interrupts will not interfere with
@@ -1637,6 +1649,9 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
PRINT_LWDEBUG("LWLockWaitForVar", lock, LW_WAIT_UNTIL_FREE);
+ TRACE_POSTGRESQL_LWLOCK_WAITFORVAR_START(T_NAME(lock), lock,
+ lock->tranche, valptr, oldval, *valptr);
+
/*
* Lock out cancel/die interrupts while we sleep on the lock. There is no
* cleanup mechanism to remove us from the wait queue if we got
@@ -1747,6 +1762,9 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
*/
RESUME_INTERRUPTS();
+ TRACE_POSTGRESQL_LWLOCK_WAITFORVAR_DONE(T_NAME(lock), lock, lock->tranche,
+ valptr, oldval, *newval, result);
+
return result;
}
@@ -1769,6 +1787,9 @@ LWLockUpdateVar(LWLock *lock, uint64 *valptr, uint64 val)
PRINT_LWDEBUG("LWLockUpdateVar", lock, LW_EXCLUSIVE);
+ TRACE_POSTGRESQL_LWLOCK_UPDATEVAR_START(T_NAME(lock), lock, lock->tranche, valptr,
+ val);
+
proclist_init(&wakeup);
LWLockWaitListLock(lock);
@@ -1809,6 +1830,9 @@ LWLockUpdateVar(LWLock *lock, uint64 *valptr, uint64 val)
waiter->lwWaiting = false;
PGSemaphoreUnlock(waiter->sem);
}
+
+ TRACE_POSTGRESQL_LWLOCK_UPDATEVAR_DONE(T_NAME(lock), lock, lock->tranche,
+ valptr, val);
}
diff --git a/src/backend/utils/probes.d b/src/backend/utils/probes.d
index 89805c3a89..a62fdf61df 100644
--- a/src/backend/utils/probes.d
+++ b/src/backend/utils/probes.d
@@ -29,14 +29,22 @@ provider postgresql {
probe transaction__commit(LocalTransactionId);
probe transaction__abort(LocalTransactionId);
+ probe lwlock__acquired(const char *, LWLockMode, LWLock*, int);
+ probe lwlock__acquire__start(const char *, LWLockMode, LWLock*, int);
probe lwlock__acquire(const char *, LWLockMode, LWLock*, int);
probe lwlock__release(const char *, LWLockMode, LWLock*, int);
probe lwlock__wait__start(const char *, LWLockMode, LWLock*, int);
probe lwlock__wait__done(const char *, LWLockMode, LWLock*, int);
+ probe lwlock__condacquire__start(const char *, LWLockMode, LWLock*, int);
probe lwlock__condacquire(const char *, LWLockMode, LWLock*, int);
probe lwlock__condacquire__fail(const char *, LWLockMode, LWLock*, int);
+ probe lwlock__acquire__or__wait__start(const char *, LWLockMode, LWLock*, int);
probe lwlock__acquire__or__wait(const char *, LWLockMode, LWLock*, int);
probe lwlock__acquire__or__wait__fail(const char *, LWLockMode, LWLock*, int);
+ probe lwlock__waitforvar__start(const char *, LWLock*, int, uint64, uint64, uint64);
+ probe lwlock__waitforvar__done(const char *, LWLock*, int, uint64, uint64, uint64, bool);
+ probe lwlock__updatevar__start(const char *, LWLock*, int, uint64, uint64);
+ probe lwlock__updatevar__done(const char *, LWLock*, int, uint64, uint64);
probe lock__wait__start(unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, LOCKMODE);
--
2.26.2