[PATCH] Identify LWLocks in tracepoints

Started by Craig Ringer about 5 years ago, 52 messages
#1 Craig Ringer
craig.ringer@enterprisedb.com
4 attachment(s)

Hi all

The attached patch set follows on from the discussion in [1] "Add LWLock
blocker(s) information" by adding the actual LWLock* and the numeric
tranche ID to each LWLock related TRACE_POSTGRESQL_foo tracepoint.

This does not provide complete information on blockers, because it's not
necessarily valid to compare any two LWLock* pointers between two process
address spaces. The locks could be in DSM segments, and those DSM segments
could be mapped at different addresses.

I wasn't able to work out a sensible way to map a LWLock* to any sort of
(tranche-id, lock-index) because there's no requirement that locks in a
tranche be contiguous or known individually to the lmgr.

Despite that, the patches improve the information available for LWLock
analysis significantly.

Patch 1 fixes a bogus tracepoint where an lwlock__acquire event would be
fired from LWLockWaitForVar, despite that function never actually acquiring
the lock.

Patch 2 adds the tranche id and lock pointer for each trace hit. This makes
it possible to differentiate between individual locks within a tranche, and
(so long as they aren't tranches in a DSM segment) compare locks between
processes. That means you can do lock-order analysis etc, which was not
previously especially feasible. Traces also don't have to do userspace
reads for the tranche name all the time, so the trace can run with lower
overhead.
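
To illustrate, here's a minimal SystemTap sketch of what that enables. The
binary path is only an example, and the argument positions (tranche name,
mode, LWLock*, tranche ID) are the ones added by this patch set. It counts
waits per individual lock without ever reading the tranche name string:

global waits

probe process("/usr/local/pgsql/bin/postgres").mark("lwlock__wait__start")
{
    /* $arg3 is the LWLock pointer, $arg4 the tranche ID; $arg1 (the
       tranche name string) is deliberately never read here */
    waits[$arg4, $arg3]++
}

probe end
{
    /* print the 20 most-waited-on individual locks */
    foreach ([tranche, lock] in waits- limit 20)
        printf("tranche %d lock 0x%x: %d waits\n", tranche, lock,
               waits[tranche, lock])
}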

Patch 3 adds a single-path tracepoint for all lock acquires and releases,
so you only have to probe the lwlock__acquired and lwlock__release events
to see all acquires/releases, whether conditional or otherwise. It also
adds start markers that can be used for timing wallclock duration of LWLock
acquires/releases.
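
As an example of using the start markers, a rough SystemTap sketch (again,
the binary path is an example and the arguments are those added by these
patches) that pairs lwlock__acquire__start with lwlock__acquired to measure
wallclock acquire latency per tranche:

global acquire_start, acquire_us

probe process("/usr/local/pgsql/bin/postgres").mark("lwlock__acquire__start")
{
    /* key on (pid, LWLock*) so different backends and locks don't collide */
    acquire_start[pid(), $arg3] = gettimeofday_us()
}

probe process("/usr/local/pgsql/bin/postgres").mark("lwlock__acquired")
{
    t = acquire_start[pid(), $arg3]
    if (t) {
        /* aggregate elapsed microseconds per tranche ID ($arg4) */
        acquire_us[$arg4] <<< gettimeofday_us() - t
        delete acquire_start[pid(), $arg3]
    }
}

probe end
{
    foreach (tranche in acquire_us)
        printf("tranche %d: n=%d avg=%dus max=%dus\n", tranche,
               @count(acquire_us[tranche]), @avg(acquire_us[tranche]),
               @max(acquire_us[tranche]))
}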

Patch 4 adds some comments on LWLock tranches to try to address some points
I found confusing and hard to understand when investigating this topic.

[1] /messages/by-id/CAGRY4nz=SEs3qc1R6xD3max7sg3kS-L81eJk2aLUWSQAeAFJTA@mail.gmail.com

Attachments:

v1-0004-Comments-on-LWLock-tranches.patch (application/x-patch)
From 583c818e3121c0f7c6506b434497c81ae94ee9cb Mon Sep 17 00:00:00 2001
From: Craig Ringer <craig.ringer@2ndquadrant.com>
Date: Thu, 19 Nov 2020 17:30:47 +0800
Subject: [PATCH v1 4/4] Comments on LWLock tranches

---
 src/backend/storage/lmgr/lwlock.c | 49 +++++++++++++++++++++++++++++--
 1 file changed, 46 insertions(+), 3 deletions(-)

diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index cfdfa7f328..123bcc463e 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -112,11 +112,14 @@ extern slock_t *ShmemLock;
  *
  * 1. The individually-named locks defined in lwlocknames.h each have their
  * own tranche.  The names of these tranches appear in IndividualLWLockNames[]
- * in lwlocknames.c.
+ * in lwlocknames.c. The LWLock structs are allocated in MainLWLockArray.
  *
  * 2. There are some predefined tranches for built-in groups of locks.
  * These are listed in enum BuiltinTrancheIds in lwlock.h, and their names
- * appear in BuiltinTrancheNames[] below.
+ * appear in BuiltinTrancheNames[] below. The LWLock structs are allocated
+ * elsewhere under the control of the subsystem that manages the tranche. The
+ * LWLock code does not know or care where in shared memory they are allocated
+ * or how many there are in a tranche.
  *
  * 3. Extensions can create new tranches, via either RequestNamedLWLockTranche
  * or LWLockRegisterTranche.  The names of these that are known in the current
@@ -196,6 +199,13 @@ static int	LWLockTrancheNamesAllocated = 0;
  * This points to the main array of LWLocks in shared memory.  Backends inherit
  * the pointer by fork from the postmaster (except in the EXEC_BACKEND case,
  * where we have special measures to pass it down).
+ *
+ * This array holds individual LWLocks and LWLocks allocated in named tranches.
+ *
+ * It does not hold locks for any LWLock that's separately initialized with
+ * LWLockInitialize(). Locks in tranches listed in BuiltinTrancheIds or
+ * allocated with LWLockNewTrancheId() can be embedded in other structs
+ * anywhere in shared memory.
  */
 LWLockPadded *MainLWLockArray = NULL;
 
@@ -593,6 +603,12 @@ InitLWLockAccess(void)
  * Caller needs to retrieve the requested number of LWLocks starting from
  * the base lock address returned by this API.  This can be used for
  * tranches that are requested by using RequestNamedLWLockTranche() API.
+ *
+ * The locks are already initialized.
+ *
+ * This function can not be used for locks in builtin tranches or tranches
+ * registered with LWLockRegisterTranche(). There is no way to look those locks
+ * up by name.
  */
 LWLockPadded *
 GetNamedLWLockTranche(const char *tranche_name)
@@ -647,6 +663,14 @@ LWLockNewTrancheId(void)
  *
  * The tranche name will be user-visible as a wait event name, so try to
  * use a name that fits the style for those.
+ *
+ * The tranche ID should be a user-defined tranche ID acquired from
+ * LWLockNewTrancheId(). It is not necessary to call this for tranches
+ * allocated by RequestNamedLWLockTranche().
+ *
+ * The LWLock subsystem does not know where LWLock(s) that will be assigned to
+ * this tranche are stored, or how many of them there are. The caller allocates
+ * suitable shared memory storage and initializes locks with LWLockInitialize().
  */
 void
 LWLockRegisterTranche(int tranche_id, const char *tranche_name)
@@ -699,6 +723,10 @@ LWLockRegisterTranche(int tranche_id, const char *tranche_name)
  *
  * The tranche name will be user-visible as a wait event name, so try to
  * use a name that fits the style for those.
+ *
+ * The LWLocks allocated here are retrieved after shmem startup using
+ * GetNamedLWLockTranche(). They are initialized during shared memory startup so
+ * it is not necessary to call LWLockInitialize() on them.
  */
 void
 RequestNamedLWLockTranche(const char *tranche_name, int num_lwlocks)
@@ -739,10 +767,17 @@ RequestNamedLWLockTranche(const char *tranche_name, int num_lwlocks)
 
 /*
  * LWLockInitialize - initialize a new lwlock; it's initially unlocked
+ *
+ * For callers outside the LWLock subsystem itself, the tranche ID must either
+ * be a BuiltinTrancheIds entry for the calling subsystem or a tranche ID
+ * assigned with LWLockNewTrancheId().
  */
 void
 LWLockInitialize(LWLock *lock, int tranche_id)
 {
+	/* Re-initialization of individual LWLocks is not permitted */
+	Assert(tranche_id >= NUM_INDIVIDUAL_LWLOCKS || !IsUnderPostmaster);
+
 	pg_atomic_init_u32(&lock->state, LW_FLAG_RELEASE_OK);
 #ifdef LOCK_DEBUG
 	pg_atomic_init_u32(&lock->nwaiters, 0);
@@ -803,6 +838,11 @@ GetLWTrancheName(uint16 trancheId)
 
 /*
  * Return an identifier for an LWLock based on the wait class and event.
+ *
+ * Note that there's no way to identify an individual LWLock within a tranche by
+ * anything except its address. The LWLock subsystem doesn't know how many
+ * locks there are in all tranches and there's no requirement that they be
+ * stored in contiguous arrays.
  */
 const char *
 GetLWLockIdentifier(uint32 classId, uint16 eventId)
@@ -1010,7 +1050,7 @@ LWLockWakeup(LWLock *lock)
 
 	Assert(proclist_is_empty(&wakeup) || pg_atomic_read_u32(&lock->state) & LW_FLAG_HAS_WAITERS);
 
-	/* unset required flags, and release lock, in one fell swoop */
+	/* unset required flags, and release waitlist lock, in one fell swoop */
 	{
 		uint32		old_state;
 		uint32		desired_state;
@@ -1837,6 +1877,9 @@ LWLockUpdateVar(LWLock *lock, uint64 *valptr, uint64 val)
 
 /*
  * LWLockRelease - release a previously acquired lock
+ *
+ * The actual lock acquire corresponding to this release happens in
+ * LWLockAttemptLock().
  */
 void
 LWLockRelease(LWLock *lock)
-- 
2.29.2

v1-0001-Remove-bogus-lwlock__acquire-tracepoint-from-LWLo.patch (application/x-patch)
From e5f7ba0ba4c72db7f59c3d22818f532f6b0be90a Mon Sep 17 00:00:00 2001
From: Craig Ringer <craig.ringer@2ndquadrant.com>
Date: Thu, 19 Nov 2020 18:15:34 +0800
Subject: [PATCH v1 1/4] Remove bogus lwlock__acquire tracepoint from
 LWLockWaitForVar

Calls to LWLockWaitForVar fired the TRACE_POSTGRESQL_LWLOCK_ACQUIRE
tracepoint, but LWLockWaitForVar() never actually acquires the
LWLock. Remove it.
---
 src/backend/storage/lmgr/lwlock.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 108e652179..29e29707d7 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -1726,8 +1726,6 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
 		/* Now loop back and check the status of the lock again. */
 	}
 
-	TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), LW_EXCLUSIVE);
-
 	/*
 	 * Fix the process wait semaphore's count for any absorbed wakeups.
 	 */
-- 
2.29.2

v1-0002-Pass-the-target-LWLock-and-tranche-ID-to-LWLock-t.patch (application/x-patch)
From b9b93919d51710cbb3427f9d99764e013657bb3a Mon Sep 17 00:00:00 2001
From: Craig Ringer <craig.ringer@2ndquadrant.com>
Date: Thu, 19 Nov 2020 17:38:45 +0800
Subject: [PATCH v1 2/4] Pass the target LWLock* and tranche ID to LWLock
 tracepoints

Previously the TRACE_POSTGRESQL_LWLOCK_ tracepoints only received a
pointer to the LWLock tranche name. This made it impossible to identify
individual locks.

Passing the lock pointer itself isn't perfect. If the lock is allocated inside
a DSM segment then it might be mapped at a different address in different
backends. It's safe to compare lock pointers between backends (assuming
!EXEC_BACKEND) if they're in the individual lock tranches or an
extension-requested named tranche, but not necessarily for tranches in
BuiltinTrancheIds or tranches >= LWTRANCHE_FIRST_USER_DEFINED that were
directly assigned with LWLockNewTrancheId(). Still, it's better than nothing;
the pointer is stable within a backend, and usually between backends.
---
 src/backend/storage/lmgr/lwlock.c | 35 +++++++++++++++++++------------
 src/backend/utils/probes.d        | 18 +++++++++-------
 2 files changed, 32 insertions(+), 21 deletions(-)

diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 29e29707d7..63f1a619b0 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -1322,7 +1322,8 @@ LWLockAcquire(LWLock *lock, LWLockMode mode)
 #endif
 
 		LWLockReportWaitStart(lock);
-		TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode);
+		TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode, lock,
+				lock->tranche);
 
 		for (;;)
 		{
@@ -1344,7 +1345,8 @@ LWLockAcquire(LWLock *lock, LWLockMode mode)
 		}
 #endif
 
-		TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode);
+		TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode, lock,
+				lock->tranche);
 		LWLockReportWaitEnd();
 
 		LOG_LWDEBUG("LWLockAcquire", lock, "awakened");
@@ -1353,7 +1355,7 @@ LWLockAcquire(LWLock *lock, LWLockMode mode)
 		result = false;
 	}
 
-	TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), mode);
+	TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), mode, lock, lock->tranche);
 
 	/* Add lock to list of locks held by this backend */
 	held_lwlocks[num_held_lwlocks].lock = lock;
@@ -1404,14 +1406,16 @@ LWLockConditionalAcquire(LWLock *lock, LWLockMode mode)
 		RESUME_INTERRUPTS();
 
 		LOG_LWDEBUG("LWLockConditionalAcquire", lock, "failed");
-		TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_FAIL(T_NAME(lock), mode);
+		TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_FAIL(T_NAME(lock), mode, lock,
+				lock->tranche);
 	}
 	else
 	{
 		/* Add lock to list of locks held by this backend */
 		held_lwlocks[num_held_lwlocks].lock = lock;
 		held_lwlocks[num_held_lwlocks++].mode = mode;
-		TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE(T_NAME(lock), mode);
+		TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE(T_NAME(lock), mode, lock,
+				lock->tranche);
 	}
 	return !mustwait;
 }
@@ -1483,7 +1487,8 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
 #endif
 
 			LWLockReportWaitStart(lock);
-			TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode);
+			TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode, lock,
+					lock->tranche);
 
 			for (;;)
 			{
@@ -1501,7 +1506,8 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
 				Assert(nwaiters < MAX_BACKENDS);
 			}
 #endif
-			TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode);
+			TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode, lock,
+					lock->tranche);
 			LWLockReportWaitEnd();
 
 			LOG_LWDEBUG("LWLockAcquireOrWait", lock, "awakened");
@@ -1531,7 +1537,8 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
 		/* Failed to get lock, so release interrupt holdoff */
 		RESUME_INTERRUPTS();
 		LOG_LWDEBUG("LWLockAcquireOrWait", lock, "failed");
-		TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT_FAIL(T_NAME(lock), mode);
+		TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT_FAIL(T_NAME(lock), mode, lock,
+				lock->tranche);
 	}
 	else
 	{
@@ -1539,7 +1546,8 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
 		/* Add lock to list of locks held by this backend */
 		held_lwlocks[num_held_lwlocks].lock = lock;
 		held_lwlocks[num_held_lwlocks++].mode = mode;
-		TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT(T_NAME(lock), mode);
+		TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT(T_NAME(lock), mode, lock,
+				lock->tranche);
 	}
 
 	return !mustwait;
@@ -1699,7 +1707,8 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
 #endif
 
 		LWLockReportWaitStart(lock);
-		TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), LW_EXCLUSIVE);
+		TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), LW_EXCLUSIVE, lock,
+				lock->tranche);
 
 		for (;;)
 		{
@@ -1718,7 +1727,7 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
 		}
 #endif
 
-		TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), LW_EXCLUSIVE);
+		TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), LW_EXCLUSIVE, lock, lock->tranche);
 		LWLockReportWaitEnd();
 
 		LOG_LWDEBUG("LWLockWaitForVar", lock, "awakened");
@@ -1844,6 +1853,8 @@ LWLockRelease(LWLock *lock)
 	/* nobody else can have that kind of lock */
 	Assert(!(oldstate & LW_VAL_EXCLUSIVE));
 
+	/* Released, though not woken yet. All releases must fire this. */
+	TRACE_POSTGRESQL_LWLOCK_RELEASE(T_NAME(lock), mode, lock, lock->tranche);
 
 	/*
 	 * We're still waiting for backends to get scheduled, don't wake them up
@@ -1867,8 +1878,6 @@ LWLockRelease(LWLock *lock)
 		LWLockWakeup(lock);
 	}
 
-	TRACE_POSTGRESQL_LWLOCK_RELEASE(T_NAME(lock));
-
 	/*
 	 * Now okay to allow cancel/die interrupts.
 	 */
diff --git a/src/backend/utils/probes.d b/src/backend/utils/probes.d
index a0b0458108..89805c3a89 100644
--- a/src/backend/utils/probes.d
+++ b/src/backend/utils/probes.d
@@ -17,6 +17,7 @@
 #define LocalTransactionId unsigned int
 #define LWLockMode int
 #define LOCKMODE int
+#define LWLock void
 #define BlockNumber unsigned int
 #define Oid unsigned int
 #define ForkNumber int
@@ -28,14 +29,15 @@ provider postgresql {
 	probe transaction__commit(LocalTransactionId);
 	probe transaction__abort(LocalTransactionId);
 
-	probe lwlock__acquire(const char *, LWLockMode);
-	probe lwlock__release(const char *);
-	probe lwlock__wait__start(const char *, LWLockMode);
-	probe lwlock__wait__done(const char *, LWLockMode);
-	probe lwlock__condacquire(const char *, LWLockMode);
-	probe lwlock__condacquire__fail(const char *, LWLockMode);
-	probe lwlock__acquire__or__wait(const char *, LWLockMode);
-	probe lwlock__acquire__or__wait__fail(const char *, LWLockMode);
+	probe lwlock__acquire(const char *, LWLockMode, LWLock*, int);
+	probe lwlock__release(const char *, LWLockMode, LWLock*, int);
+	probe lwlock__wait__start(const char *, LWLockMode, LWLock*, int);
+	probe lwlock__wait__done(const char *, LWLockMode, LWLock*, int);
+	probe lwlock__condacquire(const char *, LWLockMode, LWLock*, int);
+	probe lwlock__condacquire__fail(const char *, LWLockMode, LWLock*, int);
+	probe lwlock__acquire__or__wait(const char *, LWLockMode, LWLock*, int);
+	probe lwlock__acquire__or__wait__fail(const char *, LWLockMode, LWLock*, int);
+
 
 	probe lock__wait__start(unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, LOCKMODE);
 	probe lock__wait__done(unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, LOCKMODE);
-- 
2.29.2

v1-0003-Add-to-the-tracepoints-in-LWLock-routines.patch (application/x-patch)
From 942e36fb368352f2a3c9c93f7f191ec5d5ef5bf7 Mon Sep 17 00:00:00 2001
From: Craig Ringer <craig.ringer@2ndquadrant.com>
Date: Thu, 19 Nov 2020 18:05:39 +0800
Subject: [PATCH v1 3/4] Add to the tracepoints in LWLock routines

The existing tracepoints in lwlock.c didn't mark the start of LWLock
acquisition, so timing of the full LWLock acquire cycle wasn't
possible without relying on debuginfo. Since this can be quite
relevant for production performance issues, emit tracepoints at the
start of LWLock acquire.

Also add a tracepoint that's fired for all LWLock acquisitions at the
moment the shared memory state changes, whether done by LWLockAcquire
or LWLockConditionalAcquire. This lets tools reliably track which
backends hold which LWLocks even if we add new functions that acquire
LWLocks in future.

Add tracepoints in LWLockWaitForVar and LWLockUpdateVar so process
interaction around LWLock variable waits can be observed from trace
tooling. They can cause long waits and/or deadlocks, so it's worth
being able to time and track them.
---
 src/backend/storage/lmgr/lwlock.c | 24 ++++++++++++++++++++++++
 src/backend/utils/probes.d        |  8 ++++++++
 2 files changed, 32 insertions(+)

diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 63f1a619b0..cfdfa7f328 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -875,6 +875,9 @@ LWLockAttemptLock(LWLock *lock, LWLockMode mode)
 				if (mode == LW_EXCLUSIVE)
 					lock->owner = MyProc;
 #endif
+				/* All LWLock acquires must hit this tracepoint */
+				TRACE_POSTGRESQL_LWLOCK_ACQUIRED(T_NAME(lock), mode, lock,
+						lock->tranche);
 				return false;
 			}
 			else
@@ -1238,6 +1241,9 @@ LWLockAcquire(LWLock *lock, LWLockMode mode)
 	if (num_held_lwlocks >= MAX_SIMUL_LWLOCKS)
 		elog(ERROR, "too many LWLocks taken");
 
+	TRACE_POSTGRESQL_LWLOCK_ACQUIRE_START(T_NAME(lock), mode, lock,
+			lock->tranche);
+
 	/*
 	 * Lock out cancel/die interrupts until we exit the code section protected
 	 * by the LWLock.  This ensures that interrupts will not interfere with
@@ -1390,6 +1396,9 @@ LWLockConditionalAcquire(LWLock *lock, LWLockMode mode)
 	if (num_held_lwlocks >= MAX_SIMUL_LWLOCKS)
 		elog(ERROR, "too many LWLocks taken");
 
+	TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_START(T_NAME(lock), mode, lock,
+			lock->tranche);
+
 	/*
 	 * Lock out cancel/die interrupts until we exit the code section protected
 	 * by the LWLock.  This ensures that interrupts will not interfere with
@@ -1454,6 +1463,9 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
 	if (num_held_lwlocks >= MAX_SIMUL_LWLOCKS)
 		elog(ERROR, "too many LWLocks taken");
 
+	TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT_START(T_NAME(lock), mode, lock,
+			lock->tranche);
+
 	/*
 	 * Lock out cancel/die interrupts until we exit the code section protected
 	 * by the LWLock.  This ensures that interrupts will not interfere with
@@ -1636,6 +1648,9 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
 
 	PRINT_LWDEBUG("LWLockWaitForVar", lock, LW_WAIT_UNTIL_FREE);
 
+	TRACE_POSTGRESQL_LWLOCK_WAITFORVAR_START(T_NAME(lock), lock,
+			lock->tranche, valptr, oldval, *valptr);
+
 	/*
 	 * Lock out cancel/die interrupts while we sleep on the lock.  There is no
 	 * cleanup mechanism to remove us from the wait queue if we got
@@ -1746,6 +1761,9 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
 	 */
 	RESUME_INTERRUPTS();
 
+	TRACE_POSTGRESQL_LWLOCK_WAITFORVAR_DONE(T_NAME(lock), lock, lock->tranche,
+			valptr, oldval, *newval, result);
+
 	return result;
 }
 
@@ -1768,6 +1786,9 @@ LWLockUpdateVar(LWLock *lock, uint64 *valptr, uint64 val)
 
 	PRINT_LWDEBUG("LWLockUpdateVar", lock, LW_EXCLUSIVE);
 
+	TRACE_POSTGRESQL_LWLOCK_UPDATEVAR_START(T_NAME(lock), lock, lock->tranche, valptr,
+			val);
+
 	proclist_init(&wakeup);
 
 	LWLockWaitListLock(lock);
@@ -1808,6 +1829,9 @@ LWLockUpdateVar(LWLock *lock, uint64 *valptr, uint64 val)
 		waiter->lwWaiting = false;
 		PGSemaphoreUnlock(waiter->sem);
 	}
+
+	TRACE_POSTGRESQL_LWLOCK_UPDATEVAR_DONE(T_NAME(lock), lock, lock->tranche,
+			valptr, val);
 }
 
 
diff --git a/src/backend/utils/probes.d b/src/backend/utils/probes.d
index 89805c3a89..a62fdf61df 100644
--- a/src/backend/utils/probes.d
+++ b/src/backend/utils/probes.d
@@ -29,14 +29,22 @@ provider postgresql {
 	probe transaction__commit(LocalTransactionId);
 	probe transaction__abort(LocalTransactionId);
 
+	probe lwlock__acquired(const char *, LWLockMode, LWLock*, int);
+	probe lwlock__acquire__start(const char *, LWLockMode, LWLock*, int);
 	probe lwlock__acquire(const char *, LWLockMode, LWLock*, int);
 	probe lwlock__release(const char *, LWLockMode, LWLock*, int);
 	probe lwlock__wait__start(const char *, LWLockMode, LWLock*, int);
 	probe lwlock__wait__done(const char *, LWLockMode, LWLock*, int);
+	probe lwlock__condacquire__start(const char *, LWLockMode, LWLock*, int);
 	probe lwlock__condacquire(const char *, LWLockMode, LWLock*, int);
 	probe lwlock__condacquire__fail(const char *, LWLockMode, LWLock*, int);
+	probe lwlock__acquire__or__wait__start(const char *, LWLockMode, LWLock*, int);
 	probe lwlock__acquire__or__wait(const char *, LWLockMode, LWLock*, int);
 	probe lwlock__acquire__or__wait__fail(const char *, LWLockMode, LWLock*, int);
+	probe lwlock__waitforvar__start(const char *, LWLock*, int, uint64, uint64, uint64);
+	probe lwlock__waitforvar__done(const char *, LWLock*, int, uint64, uint64, uint64, bool);
+	probe lwlock__updatevar__start(const char *, LWLock*, int, uint64, uint64);
+	probe lwlock__updatevar__done(const char *, LWLock*, int, uint64, uint64);
 
 
 	probe lock__wait__start(unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, LOCKMODE);
-- 
2.29.2

#2 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Craig Ringer (#1)
Re: [PATCH] Identify LWLocks in tracepoints

Hi Craig,

On Sat, Dec 19, 2020 at 2:00 PM Craig Ringer
<craig.ringer@enterprisedb.com> wrote:

Hi all

The attached patch set follows on from the discussion in [1] "Add LWLock blocker(s) information" by adding the actual LWLock* and the numeric tranche ID to each LWLock related TRACE_POSTGRESQL_foo tracepoint.

This does not provide complete information on blockers, because it's not necessarily valid to compare any two LWLock* pointers between two process address spaces. The locks could be in DSM segments, and those DSM segments could be mapped at different addresses.

I wasn't able to work out a sensible way to map a LWLock* to any sort of (tranche-id, lock-index) because there's no requirement that locks in a tranche be contiguous or known individually to the lmgr.

Despite that, the patches improve the information available for LWLock analysis significantly.

Patch 1 fixes a bogus tracepoint where an lwlock__acquire event would be fired from LWLockWaitForVar, despite that function never actually acquiring the lock.

Patch 2 adds the tranche id and lock pointer for each trace hit. This makes it possible to differentiate between individual locks within a tranche, and (so long as they aren't tranches in a DSM segment) compare locks between processes. That means you can do lock-order analysis etc, which was not previously especially feasible. Traces also don't have to do userspace reads for the tranche name all the time, so the trace can run with lower overhead.

Patch 3 adds a single-path tracepoint for all lock acquires and releases, so you only have to probe the lwlock__acquired and lwlock__release events to see all acquires/releases, whether conditional or otherwise. It also adds start markers that can be used for timing wallclock duration of LWLock acquires/releases.

Patch 4 adds some comments on LWLock tranches to try to address some points I found confusing and hard to understand when investigating this topic.

You sent in your patch to pgsql-hackers on Dec 19, but you did not
post it to the next CommitFest[1]. If this was intentional, then you
need to take no action. However, if you want your patch to be
reviewed as part of the upcoming CommitFest, then you need to add it
yourself before 2021-01-01 AoE[2]. Thanks for your contributions.

Regards,

[1]: https://commitfest.postgresql.org/31/
[2]: https://en.wikipedia.org/wiki/Anywhere_on_Earth

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/

#3 Craig Ringer
craig.ringer@enterprisedb.com
In reply to: Masahiko Sawada (#2)
Re: [PATCH] Identify LWLocks in tracepoints

On Mon, 28 Dec 2020 at 20:09, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Hi Craig,

On Sat, Dec 19, 2020 at 2:00 PM Craig Ringer
<craig.ringer@enterprisedb.com> wrote:

Hi all

The attached patch set follows on from the discussion in [1] "Add LWLock
blocker(s) information" by adding the actual LWLock* and the numeric
tranche ID to each LWLock related TRACE_POSTGRESQL_foo tracepoint.

This does not provide complete information on blockers, because it's not
necessarily valid to compare any two LWLock* pointers between two process
address spaces. The locks could be in DSM segments, and those DSM segments
could be mapped at different addresses.

I wasn't able to work out a sensible way to map a LWLock* to any sort of
(tranche-id, lock-index) because there's no requirement that locks in a
tranche be contiguous or known individually to the lmgr.

Despite that, the patches improve the information available for LWLock
analysis significantly.

Patch 1 fixes a bogus tracepoint where an lwlock__acquire event would be
fired from LWLockWaitForVar, despite that function never actually acquiring
the lock.

Patch 2 adds the tranche id and lock pointer for each trace hit. This makes
it possible to differentiate between individual locks within a tranche, and
(so long as they aren't tranches in a DSM segment) compare locks between
processes. That means you can do lock-order analysis etc, which was not
previously especially feasible. Traces also don't have to do userspace
reads for the tranche name all the time, so the trace can run with lower
overhead.

Patch 3 adds a single-path tracepoint for all lock acquires and releases,
so you only have to probe the lwlock__acquired and lwlock__release events
to see all acquires/releases, whether conditional or otherwise. It also
adds start markers that can be used for timing wallclock duration of LWLock
acquires/releases.

Patch 4 adds some comments on LWLock tranches to try to address some points
I found confusing and hard to understand when investigating this topic.

You sent in your patch to pgsql-hackers on Dec 19, but you did not
post it to the next CommitFest[1]. If this was intentional, then you
need to take no action. However, if you want your patch to be
reviewed as part of the upcoming CommitFest, then you need to add it
yourself before 2021-01-01 AoE[2]. Thanks for your contributions.

Regards,

[1] https://commitfest.postgresql.org/31/
[2] https://en.wikipedia.org/wiki/Anywhere_on_Earth

Thanks.

CF entry created at https://commitfest.postgresql.org/32/2927/. I don't
think it's urgent, and I will have limited review time, so I didn't try to
wedge it into the current CF.

#4 Craig Ringer
craig.ringer@enterprisedb.com
In reply to: Craig Ringer (#1)
2 attachment(s)
Re: [PATCH] Identify LWLocks in tracepoints

On Sat, 19 Dec 2020 at 13:00, Craig Ringer <craig.ringer@enterprisedb.com>
wrote:

Hi all

The attached patch set follows on from the discussion in [1] "Add LWLock
blocker(s) information" by adding the actual LWLock* and the numeric
tranche ID to each LWLock related TRACE_POSTGRESQL_foo tracepoint.

I've attached a systemtap script that makes use of the information exported
by the enhanced LWLock tracepoints. It offers something akin to dynamic
-DLWLOCK_STATS with automatic statistical aggregation and some selective
LWLOCK_DEBUG output.

The script isn't super pretty. I didn't try to separate event-data
collection from results output, and there's some duplication in places. But
it gives you an idea of what's possible when we report lock pointers and
tranche IDs to tracepoints and add entry/exit tracepoints.

Key features:

* Collect statistical aggregates on lwlock hold and wait durations across
all processes. Stats are grouped by lockmode (shared or exclusive) and by
tranche name, as well as rollup stats across all tranches.
* Report lock wait and hold durations for each process when that process
exits. Again, reported by mode and tranche name.
* For long lock waits, print the waiter pid and the blocking pid, along with
each process's backend type and application_name if known, the acquire
mode, and the acquire function.

The output is intended to be human-readable, but it'd be quite simple to
convert it into raw tsv-style output suitable for ingestion into
statistical postprocessing or graphing tools.

It should be fairly easy to break down the stats by acquire function if
desired, so LWLockAcquire(), LWLockWaitForVar(), and LWLockAcquireOrWait()
are reported separately. They're combined in the current output.

Capturing the current query string is pretty simple if needed, but I didn't
think it was likely to be especially useful.
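
The core of the hold-duration tracking is roughly the following (simplified
from the attached script; the binary path is only an example, and the E/S
labels rely on LW_EXCLUSIVE being 0 in lwlock.h's LWLockMode enum):

global hold_start, hold_us

probe process("/usr/local/pgsql/bin/postgres").mark("lwlock__acquired")
{
    hold_start[pid(), $arg3] = gettimeofday_us()
}

probe process("/usr/local/pgsql/bin/postgres").mark("lwlock__release")
{
    t = hold_start[pid(), $arg3]
    if (t) {
        /* group hold times by tranche name ($arg1) and mode ($arg2) */
        hold_us[user_string($arg1), $arg2] <<< gettimeofday_us() - t
        delete hold_start[pid(), $arg3]
    }
}

probe end
{
    foreach ([tranche, mode] in hold_us)
        printf("%-20s %s n=%d avg=%dus max=%dus\n", tranche,
               mode == 0 ? "E" : "S",   /* assumes LW_EXCLUSIVE == 0 */
               @count(hold_us[tranche, mode]), @avg(hold_us[tranche, mode]),
               @max(hold_us[tranche, mode]))
}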

Sample output for a pg_regress run is attached. An abridged version follows.
Here the !!W!! lines are "waited a long time" and the !!H!! lines are "held a
long time". The format is [pid]:MyBackendType tranche_name wait_time_us
(wait_time) in wait_func (application_name) => [blocker_pid]
(blocker_application_name). If the blocker pid wasn't identified it won't be
reported - I know how to fix that and will do so soon.

!!W!! [ 93030]:3 BufferContent 12993 (0m0.012993s) in lwlock__acquire__start (pg_regress/text)
!!W!! [ 93036]:3 LockManager 14540 (0m0.014540s) in lwlock__acquire__start (pg_regress/float8) => [ 93045] (pg_regress/regproc)
!!W!! [ 93035]:3 BufferContent 12608 (0m0.012608s) in lwlock__acquire__start (pg_regress/float4) => [ 93034] (pg_regress/oid)
!!W!! [ 93036]:3 LockManager 10301 (0m0.010301s) in lwlock__acquire__start (pg_regress/float8)
!!W!! [ 93043]:3 LockManager 10356 (0m0.010356s) in lwlock__acquire__start (pg_regress/pg_lsn)
!!H!! [ 93033]:3 BufferContent 20579 (0m0.020579s) (pg_regress/int8)
!!W!! [ 93027]:3 BufferContent 10766 (0m0.010766s) in lwlock__acquire__start (pg_regress/char) => [ 93037] (pg_regress/bit)
!!W!! [ 93036]:3 OidGen 12876 (0m0.012876s) in lwlock__acquire__start (pg_regress/float8)
...

Then the summary rollup at the end of the run. This can also be output
periodically during the run. Abbreviated for highlights:

wait locks: all procs    tranche        mode  count     total      avg  variance  min  max
W LW_EXCLUSIVE           (all)          E     54185     14062734   259  1850265   1    44177
W LW_SHARED              (all)          S     3668      1116022    304  1527261   2    18642

held locks: all procs    tranche        mode  count     total      avg  variance  min  max
H LW_EXCLUSIVE           (all)          E     10438060  153077259  14   37035     1    195043
H LW_SHARED              (all)          S     14199902  65466934   4    5318      1    44030

all procs by tranche     tranche        mode  count     total      avg  variance  min  max
W tranche                (all)          S     3668      1116022    304  1527261   2    18642
W tranche                (all)          E     54185     14062734   259  1850265   1    44177
W tranche                WALInsert      E     9839      2393229    243  1180294   2    14209
W tranche                BufferContent  E     3012      1726543    573  3869409   2    28186
W tranche                BufferContent  S     1664      657855     395  2185694   2    18642
W tranche                LockFastPath   E     28314     6327801    223  1278053   1    26133
W tranche                LockFastPath   S     87        59401      682  3703217   19   9454
W tranche                LockManager    E     7223      2764392    382  2514863   2    44177

Hope this is interesting to someone.

Attachments:

sample_output.txt (text/plain)
lwlock_hold_durations.stp (application/octet-stream)
#5 Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Craig Ringer (#1)
Re: [PATCH] Identify LWLocks in tracepoints

On Sat, Dec 19, 2020 at 01:00:01PM +0800, Craig Ringer wrote:

The attached patch set follows on from the discussion in [1] "Add LWLock
blocker(s) information" by adding the actual LWLock* and the numeric
tranche ID to each LWLock related TRACE_POSTGRESQL_foo tracepoint.

This does not provide complete information on blockers, because it's not
necessarily valid to compare any two LWLock* pointers between two process
address spaces. The locks could be in DSM segments, and those DSM segments
could be mapped at different addresses.

I wasn't able to work out a sensible way to map a LWLock* to any sort of
(tranche-id, lock-index) because there's no requirement that locks in a
tranche be contiguous or known individually to the lmgr.

Despite that, the patches improve the information available for LWLock
analysis significantly.

Thanks for the patches, this could indeed be useful. I've looked through
and haven't noticed any issues with either the tracepoint extensions or
the comments, except that I find it is not that clear how tranche_id
indicates a re-initialization here?

/* Re-initialization of individual LWLocks is not permitted */
Assert(tranche_id >= NUM_INDIVIDUAL_LWLOCKS || !IsUnderPostmaster);

Patch 2 adds the tranche id and lock pointer for each trace hit. This makes
it possible to differentiate between individual locks within a tranche, and
(so long as they aren't tranches in a DSM segment) compare locks between
processes. That means you can do lock-order analysis etc, which was not
previously especially feasible.

I'm curious: in which kinds of situations could lock-order analysis be
helpful?

Traces also don't have to do userspace reads for the tranche name all
the time, so the trace can run with lower overhead.

This one is also interesting. Just for me to clarify, wouldn't there be
a bit of overhead anyway (due to switching from kernel context to user
space when a tracepoint was hit) that will mask name read overhead? Or
are there any available numbers about it?

#6 Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Craig Ringer (#1)
Re: [PATCH] Identify LWLocks in tracepoints

On 2020-12-19 06:00, Craig Ringer wrote:

Patch 1 fixes a bogus tracepoint where an lwlock__acquire event would be
fired from LWLockWaitForVar, despite that function never actually
acquiring the lock.

This was added in 68a2e52bbaf when LWLockWaitForVar() was first
introduced. It looks like a mistake to me too, but maybe Heikki wants
to comment.

#7 Craig Ringer
craig.ringer@enterprisedb.com
In reply to: Dmitry Dolgov (#5)
Re: [PATCH] Identify LWLocks in tracepoints

On Wed, 13 Jan 2021 at 19:19, Dmitry Dolgov <9erthalion6@gmail.com> wrote:

On Sat, Dec 19, 2020 at 01:00:01PM +0800, Craig Ringer wrote:

The attached patch set follows on from the discussion in [1] "Add LWLock
blocker(s) information" by adding the actual LWLock* and the numeric
tranche ID to each LWLock related TRACE_POSTGRESQL_foo tracepoint.

This does not provide complete information on blockers, because it's not
necessarily valid to compare any two LWLock* pointers between two process
address spaces. The locks could be in DSM segments, and those DSM segments
could be mapped at different addresses.

I wasn't able to work out a sensible way to map a LWLock* to any sort of
(tranche-id, lock-index) because there's no requirement that locks in a
tranche be contiguous or known individually to the lmgr.

Despite that, the patches improve the information available for LWLock
analysis significantly.

Thanks for the patches, this could indeed be useful. I've looked through
and haven't noticed any issues with either the tracepoint extensions or
the comments, except that I find it is not that clear how tranche_id
indicates a re-initialization here?

/* Re-initialization of individual LWLocks is not permitted */
Assert(tranche_id >= NUM_INDIVIDUAL_LWLOCKS || !IsUnderPostmaster);

There should be no reason for anything to call LWLockInitialize(...) on an
individual LWLock, since they are all initialized during postmaster startup.

Doing so must be a bug.

But that's a trivial change that can be done separately.

Patch 2 adds the tranche id and lock pointer for each trace hit. This makes
it possible to differentiate between individual locks within a tranche, and
(so long as they aren't tranches in a DSM segment) compare locks between
processes. That means you can do lock-order analysis etc, which was not
previously especially feasible.

I'm curious: in which kinds of situations could lock-order analysis be
helpful?

If code-path 1 does

LWLockAcquire(LockA, LW_EXCLUSIVE);
...
LWLockAcquire(LockB, LW_EXCLUSIVE);

and code-path 2 does:

LWLockAcquire(LockB, LW_EXCLUSIVE);
...
LWLockAcquire(LockA, LW_EXCLUSIVE);

then they're subject to deadlock. But you might not actually hit that often
in test workloads if the timing required for the deadlock to occur is tight
and/or occurs on infrequent operations.

It's not always easy to reason about or prove things about lock order when
they're potentially nested deep within many layers of other calls and
callbacks. Obviously something we try to avoid with LWLocks, but not
impossible.

If you trace a workload and derive all possible nestings of lock acquire
order, you can then prove things about whether there are any possible
ordering conflicts and where they might arise.

A PoC to do so is on my TODO.
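
As a very rough sketch of how the capture side could look with these
tracepoints (SystemTap; the binary path is only an example, and since the
keys are lock pointers this is only meaningful for non-DSM locks, as
discussed above):

global held, acquired_after

probe process("/usr/local/pgsql/bin/postgres").mark("lwlock__acquired")
{
    /* every lock this backend already holds was acquired before $arg3 */
    foreach ([p, prev] in held) {
        if (p == pid())
            acquired_after[prev, $arg3]++
    }
    held[pid(), $arg3] = 1
}

probe process("/usr/local/pgsql/bin/postgres").mark("lwlock__release")
{
    delete held[pid(), $arg3]
}

probe end
{
    /* a pair observed in both orders is a candidate ordering conflict */
    foreach ([a, b] in acquired_after) {
        if ([b, a] in acquired_after)
            printf("conflicting acquire order: 0x%x <-> 0x%x\n", a, b)
    }
}

A real tool would probably dump the raw acquire orderings and do the cycle
detection offline, but the tracepoint data is sufficient for either
approach.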

Traces also don't have to do userspace reads for the tranche name all

the time, so the trace can run with lower overhead.

This one is also interesting. Just for me to clarify, wouldn't there be
a bit of overhead anyway (due to switching from kernel context to user
space when a tracepoint was hit) that will mask name read overhead? Or
are there any available numbers about it?

I don't have numbers on that. Whether it matters will depend way too much
on how you're using the probe points and collecting/consuming the data
anyway.

It's a bit unfortunate (IMO) that we make a function call for each
tracepoint invocation to get the tranche names. Ideally I'd prefer to be
able to omit the tranche names lookups for these probes entirely for
something as hot as LWLocks. But it's a bit of a pain to look up the
tranche names from an external trace tool, so instead I'm inclined to see
if we can enable systemtap's semaphores and only compute the tranche name
if the target probe is actually enabled. But that'd be separate to this
patch and require a build change in how systemtap support is compiled and
linked.

BTW, a user->kernel->user context switch only occurs when the trace tool's
probes use kernel space - such as for perf based probes, or for systemtap's
kernel-runtime probes. The same markers can be used by e.g. systemtap's
"dyninst" runtime that runs entirely in userspace.

#8 Craig Ringer
craig.ringer@enterprisedb.com
In reply to: Peter Eisentraut (#6)
Re: [PATCH] Identify LWLocks in tracepoints

On Thu, 14 Jan 2021 at 15:56, Peter Eisentraut <
peter.eisentraut@enterprisedb.com> wrote:

On 2020-12-19 06:00, Craig Ringer wrote:

Patch 1 fixes a bogus tracepoint where an lwlock__acquire event would be
fired from LWLockWaitForVar, despite that function never actually
acquiring the lock.

This was added in 68a2e52bbaf when LWLockWaitForVar() was first
introduced. It looks like a mistake to me too, but maybe Heikki wants
to comment.

I'm certain it's a copy/paste bug.

#9 Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Craig Ringer (#8)
Re: [PATCH] Identify LWLocks in tracepoints

On 2021-01-14 09:39, Craig Ringer wrote:

On Thu, 14 Jan 2021 at 15:56, Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 2020-12-19 06:00, Craig Ringer wrote:

Patch 1 fixes a bogus tracepoint where an lwlock__acquire event would be
fired from LWLockWaitForVar, despite that function never actually
acquiring the lock.

This was added in 68a2e52bbaf when LWLockWaitForVar() was first
introduced.  It looks like a mistake to me too, but maybe Heikki wants
to comment.

I'm certain it's a copy/paste bug.

I have committed that patch.

#10 David Steele
david@pgmasters.net
In reply to: Peter Eisentraut (#9)
Re: [PATCH] Identify LWLocks in tracepoints

On 1/22/21 6:02 AM, Peter Eisentraut wrote:

On 2021-01-14 09:39, Craig Ringer wrote:

On Thu, 14 Jan 2021 at 15:56, Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

    On 2020-12-19 06:00, Craig Ringer wrote:
     > Patch 1 fixes a bogus tracepoint where an lwlock__acquire event
     > would be fired from LWLockWaitForVar, despite that function never
     > actually acquiring the lock.

    This was added in 68a2e52bbaf when LWLockWaitForVar() was first
    introduced.  It looks like a mistake to me too, but maybe Heikki wants
    to comment.

I'm certain it's a copy/paste bug.

I have committed that patch.

This patch set no longer applies:
http://cfbot.cputube.org/patch_32_2927.log.

Can we get a rebase? Also marked Waiting on Author.

Regards,
--
-David
david@pgmasters.net

#11 Craig Ringer
craig.ringer@enterprisedb.com
In reply to: David Steele (#10)
4 attachment(s)
Re: [PATCH] Identify LWLocks in tracepoints

On Wed, 3 Mar 2021 at 20:50, David Steele <david@pgmasters.net> wrote:

On 1/22/21 6:02 AM, Peter Eisentraut wrote:

This patch set no longer applies:
http://cfbot.cputube.org/patch_32_2927.log.

Can we get a rebase? Also marked Waiting on Author.

Rebased as requested.

I'm still interested in whether Andres will be able to do anything about
identifying LWLocks in a cross-backend manner. But this work doesn't really
depend on that; it'd benefit from it, but would be easily adapted to it
later if needed.

Attachments:

0001-Pass-the-target-LWLock-and-tranche-ID-to-LWLock-trac.patch (text/x-patch)
From 36c7ddcbca2dbbcb2967f01cb92aa1f61620c838 Mon Sep 17 00:00:00 2001
From: Craig Ringer <craig.ringer@2ndquadrant.com>
Date: Thu, 19 Nov 2020 17:38:45 +0800
Subject: [PATCH 1/4] Pass the target LWLock* and tranche ID to LWLock
 tracepoints

Previously the TRACE_POSTGRESQL_LWLOCK_ tracepoints only received a
pointer to the LWLock tranche name. This made it impossible to identify
individual locks.

Passing the lock pointer itself isn't perfect. If the lock is allocated inside
a DSM segment then it might be mapped at a different address in different
backends. It's safe to compare lock pointers between backends (assuming
!EXEC_BACKEND) if they're in the individual lock tranches or an
extension-requested named tranche, but not necessarily for tranches in
BuiltinTrancheIds or tranches >= LWTRANCHE_FIRST_USER_DEFINED that were
directly assigned with LWLockNewTrancheId(). Still, it's better than nothing;
the pointer is stable within a backend, and usually between backends.
---
 src/backend/storage/lmgr/lwlock.c | 35 +++++++++++++++++++------------
 src/backend/utils/probes.d        | 18 +++++++++-------
 2 files changed, 32 insertions(+), 21 deletions(-)

diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 8cb6a6f042..5c8744d316 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -1321,7 +1321,8 @@ LWLockAcquire(LWLock *lock, LWLockMode mode)
 #endif
 
 		LWLockReportWaitStart(lock);
-		TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode);
+		TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode, lock,
+				lock->tranche);
 
 		for (;;)
 		{
@@ -1343,7 +1344,8 @@ LWLockAcquire(LWLock *lock, LWLockMode mode)
 		}
 #endif
 
-		TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode);
+		TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode, lock,
+				lock->tranche);
 		LWLockReportWaitEnd();
 
 		LOG_LWDEBUG("LWLockAcquire", lock, "awakened");
@@ -1352,7 +1354,7 @@ LWLockAcquire(LWLock *lock, LWLockMode mode)
 		result = false;
 	}
 
-	TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), mode);
+	TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), mode, lock, lock->tranche);
 
 	/* Add lock to list of locks held by this backend */
 	held_lwlocks[num_held_lwlocks].lock = lock;
@@ -1403,14 +1405,16 @@ LWLockConditionalAcquire(LWLock *lock, LWLockMode mode)
 		RESUME_INTERRUPTS();
 
 		LOG_LWDEBUG("LWLockConditionalAcquire", lock, "failed");
-		TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_FAIL(T_NAME(lock), mode);
+		TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_FAIL(T_NAME(lock), mode, lock,
+				lock->tranche);
 	}
 	else
 	{
 		/* Add lock to list of locks held by this backend */
 		held_lwlocks[num_held_lwlocks].lock = lock;
 		held_lwlocks[num_held_lwlocks++].mode = mode;
-		TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE(T_NAME(lock), mode);
+		TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE(T_NAME(lock), mode, lock,
+				lock->tranche);
 	}
 	return !mustwait;
 }
@@ -1482,7 +1486,8 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
 #endif
 
 			LWLockReportWaitStart(lock);
-			TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode);
+			TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode, lock,
+					lock->tranche);
 
 			for (;;)
 			{
@@ -1500,7 +1505,8 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
 				Assert(nwaiters < MAX_BACKENDS);
 			}
 #endif
-			TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode);
+			TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode, lock,
+					lock->tranche);
 			LWLockReportWaitEnd();
 
 			LOG_LWDEBUG("LWLockAcquireOrWait", lock, "awakened");
@@ -1530,7 +1536,8 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
 		/* Failed to get lock, so release interrupt holdoff */
 		RESUME_INTERRUPTS();
 		LOG_LWDEBUG("LWLockAcquireOrWait", lock, "failed");
-		TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT_FAIL(T_NAME(lock), mode);
+		TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT_FAIL(T_NAME(lock), mode, lock,
+				lock->tranche);
 	}
 	else
 	{
@@ -1538,7 +1545,8 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
 		/* Add lock to list of locks held by this backend */
 		held_lwlocks[num_held_lwlocks].lock = lock;
 		held_lwlocks[num_held_lwlocks++].mode = mode;
-		TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT(T_NAME(lock), mode);
+		TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT(T_NAME(lock), mode, lock,
+				lock->tranche);
 	}
 
 	return !mustwait;
@@ -1698,7 +1706,8 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
 #endif
 
 		LWLockReportWaitStart(lock);
-		TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), LW_EXCLUSIVE);
+		TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), LW_EXCLUSIVE, lock,
+				lock->tranche);
 
 		for (;;)
 		{
@@ -1717,7 +1726,7 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
 		}
 #endif
 
-		TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), LW_EXCLUSIVE);
+		TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), LW_EXCLUSIVE, lock, lock->tranche);
 		LWLockReportWaitEnd();
 
 		LOG_LWDEBUG("LWLockWaitForVar", lock, "awakened");
@@ -1843,6 +1852,8 @@ LWLockRelease(LWLock *lock)
 	/* nobody else can have that kind of lock */
 	Assert(!(oldstate & LW_VAL_EXCLUSIVE));
 
+	/* Released, though not woken yet. All releases must fire this. */
+	TRACE_POSTGRESQL_LWLOCK_RELEASE(T_NAME(lock), mode, lock, lock->tranche);
 
 	/*
 	 * We're still waiting for backends to get scheduled, don't wake them up
@@ -1866,8 +1877,6 @@ LWLockRelease(LWLock *lock)
 		LWLockWakeup(lock);
 	}
 
-	TRACE_POSTGRESQL_LWLOCK_RELEASE(T_NAME(lock));
-
 	/*
 	 * Now okay to allow cancel/die interrupts.
 	 */
diff --git a/src/backend/utils/probes.d b/src/backend/utils/probes.d
index b0c50a3c7f..363839f85d 100644
--- a/src/backend/utils/probes.d
+++ b/src/backend/utils/probes.d
@@ -17,6 +17,7 @@
 #define LocalTransactionId unsigned int
 #define LWLockMode int
 #define LOCKMODE int
+#define LWLock void
 #define BlockNumber unsigned int
 #define Oid unsigned int
 #define ForkNumber int
@@ -28,14 +29,15 @@ provider postgresql {
 	probe transaction__commit(LocalTransactionId);
 	probe transaction__abort(LocalTransactionId);
 
-	probe lwlock__acquire(const char *, LWLockMode);
-	probe lwlock__release(const char *);
-	probe lwlock__wait__start(const char *, LWLockMode);
-	probe lwlock__wait__done(const char *, LWLockMode);
-	probe lwlock__condacquire(const char *, LWLockMode);
-	probe lwlock__condacquire__fail(const char *, LWLockMode);
-	probe lwlock__acquire__or__wait(const char *, LWLockMode);
-	probe lwlock__acquire__or__wait__fail(const char *, LWLockMode);
+	probe lwlock__acquire(const char *, LWLockMode, LWLock*, int);
+	probe lwlock__release(const char *, LWLockMode, LWLock*, int);
+	probe lwlock__wait__start(const char *, LWLockMode, LWLock*, int);
+	probe lwlock__wait__done(const char *, LWLockMode, LWLock*, int);
+	probe lwlock__condacquire(const char *, LWLockMode, LWLock*, int);
+	probe lwlock__condacquire__fail(const char *, LWLockMode, LWLock*, int);
+	probe lwlock__acquire__or__wait(const char *, LWLockMode, LWLock*, int);
+	probe lwlock__acquire__or__wait__fail(const char *, LWLockMode, LWLock*, int);
+
 
 	probe lock__wait__start(unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, LOCKMODE);
 	probe lock__wait__done(unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, LOCKMODE);
-- 
2.29.2

0004-SDT-tracepoints-for-backend-type-and-postmaster-pid-.patch (text/x-patch)
From 16a9da60cdf7b8eaf2f5fed82e4ab79cfdf33a6a Mon Sep 17 00:00:00 2001
From: Craig Ringer <craig.ringer@2ndquadrant.com>
Date: Mon, 11 Jan 2021 10:33:10 +0800
Subject: [PATCH 4/4] SDT tracepoints for backend type and postmaster pid on
 startup

For collecting stats on things like LWLock activity it's good to be able
to identify a backend reliably, even without debuginfo available.
---
 src/backend/utils/init/miscinit.c  | 5 +++++
 src/backend/utils/misc/guc.c       | 3 +++
 src/backend/utils/misc/ps_status.c | 9 +++++++++
 src/backend/utils/probes.d         | 5 +++++
 4 files changed, 22 insertions(+)

diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 8b73850d0d..01b5cccf85 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -233,6 +233,11 @@ GetBackendTypeDesc(BackendType backendType)
 {
 	const char *backendDesc = "unknown process type";
 
+	if (MyProcPid == PostmasterPid)
+		return "postmaster";
+	else if (!IsUnderPostmaster)
+		return "standalone";
+
 	switch (backendType)
 	{
 		case B_INVALID:
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index e337df42cb..b34cfcefdb 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -95,6 +95,7 @@
 #include "utils/pg_lsn.h"
 #include "utils/plancache.h"
 #include "utils/portal.h"
+#include "utils/probes.h"
 #include "utils/ps_status.h"
 #include "utils/rls.h"
 #include "utils/snapmgr.h"
@@ -11834,6 +11835,8 @@ assign_application_name(const char *newval, void *extra)
 {
 	/* Update the pg_stat_activity view */
 	pgstat_report_appname(newval);
+
+	TRACE_POSTGRESQL_GUC_APPLICATION_NAME_ASSIGNED(newval);
 }
 
 static bool
diff --git a/src/backend/utils/misc/ps_status.c b/src/backend/utils/misc/ps_status.c
index 5819faaf2d..f6ea64a233 100644
--- a/src/backend/utils/misc/ps_status.c
+++ b/src/backend/utils/misc/ps_status.c
@@ -31,6 +31,7 @@
 #include "pgstat.h"
 #include "utils/guc.h"
 #include "utils/ps_status.h"
+#include "utils/probes.h"
 
 extern char **environ;
 bool		update_process_title = true;
@@ -265,6 +266,14 @@ init_ps_display(const char *fixed_part)
 	if (!fixed_part)
 		fixed_part = GetBackendTypeDesc(MyBackendType);
 
+	/*
+	 * init_ps_display() is a useful time to report the backend type to
+	 * tracing tools, since otherwise the probe would have to appear in
+	 * many different main loops for different backend types.
+	 */
+	TRACE_POSTGRESQL_BACKEND_TYPE(MyBackendType, fixed_part);
+	TRACE_POSTGRESQL_POSTMASTER_PID(PostmasterPid);
+
 #ifndef PS_USE_NONE
 	/* no ps display for stand-alone backend */
 	if (!IsUnderPostmaster)
diff --git a/src/backend/utils/probes.d b/src/backend/utils/probes.d
index 608a89af91..0157c3432f 100644
--- a/src/backend/utils/probes.d
+++ b/src/backend/utils/probes.d
@@ -101,4 +101,9 @@ provider postgresql {
 	probe wal__switch();
 	probe wal__buffer__write__dirty__start();
 	probe wal__buffer__write__dirty__done();
+
+	/* Useful probes for startup and process info, for when tracing a group of processes */
+	probe backend__type(int backend_type, const char * backend_type_description);
+	probe postmaster__pid(int postmaster_pid);
+	probe guc__application__name__assigned(const char *newval);
 };
-- 
2.29.2

0003-Comments-on-LWLock-tranches.patch (text/x-patch)
From c358e8fee0c01d2d459365b93a04e57d945c9cfe Mon Sep 17 00:00:00 2001
From: Craig Ringer <craig.ringer@2ndquadrant.com>
Date: Thu, 19 Nov 2020 17:30:47 +0800
Subject: [PATCH 3/4] Comments on LWLock tranches

---
 src/backend/storage/lmgr/lwlock.c | 49 +++++++++++++++++++++++++++++--
 1 file changed, 46 insertions(+), 3 deletions(-)

diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 4c65a30c76..26ea3e64a9 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -112,11 +112,14 @@ extern slock_t *ShmemLock;
  *
  * 1. The individually-named locks defined in lwlocknames.h each have their
  * own tranche.  The names of these tranches appear in IndividualLWLockNames[]
- * in lwlocknames.c.
+ * in lwlocknames.c. The LWLock structs are allocated in MainLWLockArray.
  *
  * 2. There are some predefined tranches for built-in groups of locks.
  * These are listed in enum BuiltinTrancheIds in lwlock.h, and their names
- * appear in BuiltinTrancheNames[] below.
+ * appear in BuiltinTrancheNames[] below. The LWLock structs are allocated
+ * elsewhere under the control of the subsystem that manages the tranche. The
+ * LWLock code does not know or care where in shared memory they are allocated
+ * or how many there are in a tranche.
  *
  * 3. Extensions can create new tranches, via either RequestNamedLWLockTranche
  * or LWLockRegisterTranche.  The names of these that are known in the current
@@ -196,6 +199,13 @@ static int	LWLockTrancheNamesAllocated = 0;
  * This points to the main array of LWLocks in shared memory.  Backends inherit
  * the pointer by fork from the postmaster (except in the EXEC_BACKEND case,
  * where we have special measures to pass it down).
+ *
+ * This array holds individual LWLocks and LWLocks allocated in named tranches.
+ *
+ * It does not hold locks for any LWLock that's separately initialized with
+ * LWLockInitialize(). Locks in tranches listed in BuiltinTrancheIds or
+ * allocated with LWLockNewTrancheId() can be embedded in other structs
+ * anywhere in shared memory.
  */
 LWLockPadded *MainLWLockArray = NULL;
 
@@ -592,6 +602,12 @@ InitLWLockAccess(void)
  * Caller needs to retrieve the requested number of LWLocks starting from
  * the base lock address returned by this API.  This can be used for
  * tranches that are requested by using RequestNamedLWLockTranche() API.
+ *
+ * The locks are already initialized.
+ *
+ * This function can not be used for locks in builtin tranches or tranches
+ * registered with LWLockRegisterTranche(). There is no way to look those locks
+ * up by name.
  */
 LWLockPadded *
 GetNamedLWLockTranche(const char *tranche_name)
@@ -646,6 +662,14 @@ LWLockNewTrancheId(void)
  *
  * The tranche name will be user-visible as a wait event name, so try to
  * use a name that fits the style for those.
+ *
+ * The tranche ID should be a user-defined tranche ID acquired from
+ * LWLockNewTrancheId(). It is not necessary to call this for tranches
+ * allocated by RequestNamedLWLockTranche().
+ *
+ * The LWLock subsystem does not know where LWLock(s) that will be assigned to
+ * this tranche are stored, or how many of them there are. The caller allocates
+ * suitable shared memory storage and initializes locks with LWLockInitialize().
  */
 void
 LWLockRegisterTranche(int tranche_id, const char *tranche_name)
@@ -698,6 +722,10 @@ LWLockRegisterTranche(int tranche_id, const char *tranche_name)
  *
  * The tranche name will be user-visible as a wait event name, so try to
  * use a name that fits the style for those.
+ *
+ * The LWLocks allocated here are retrieved after shmem startup using
+ * GetNamedLWLockTranche(). They are initialized during shared memory startup so
+ * it is not necessary to call LWLockInitialize() on them.
  */
 void
 RequestNamedLWLockTranche(const char *tranche_name, int num_lwlocks)
@@ -738,10 +766,17 @@ RequestNamedLWLockTranche(const char *tranche_name, int num_lwlocks)
 
 /*
  * LWLockInitialize - initialize a new lwlock; it's initially unlocked
+ *
+ * For callers outside the LWLock subsystem itself, the tranche ID must either
+ * be a BuiltinTrancheIds entry for the calling subsystem or a tranche ID
+ * assigned with LWLockNewTrancheId().
  */
 void
 LWLockInitialize(LWLock *lock, int tranche_id)
 {
+	/* Re-initialization of individual LWLocks is not permitted */
+	Assert(tranche_id >= NUM_INDIVIDUAL_LWLOCKS || !IsUnderPostmaster);
+
 	pg_atomic_init_u32(&lock->state, LW_FLAG_RELEASE_OK);
 #ifdef LOCK_DEBUG
 	pg_atomic_init_u32(&lock->nwaiters, 0);
@@ -802,6 +837,11 @@ GetLWTrancheName(uint16 trancheId)
 
 /*
  * Return an identifier for an LWLock based on the wait class and event.
+ *
+ * Note that there's no way to identify an individual LWLock within a tranche by
+ * anything except its address. The LWLock subsystem doesn't know how many
+ * locks there are in all tranches and there's no requirement that they be
+ * stored in contiguous arrays.
  */
 const char *
 GetLWLockIdentifier(uint32 classId, uint16 eventId)
@@ -1009,7 +1049,7 @@ LWLockWakeup(LWLock *lock)
 
 	Assert(proclist_is_empty(&wakeup) || pg_atomic_read_u32(&lock->state) & LW_FLAG_HAS_WAITERS);
 
-	/* unset required flags, and release lock, in one fell swoop */
+	/* unset required flags, and release waitlist lock, in one fell swoop */
 	{
 		uint32		old_state;
 		uint32		desired_state;
@@ -1836,6 +1876,9 @@ LWLockUpdateVar(LWLock *lock, uint64 *valptr, uint64 val)
 
 /*
  * LWLockRelease - release a previously acquired lock
+ *
+ * The actual lock acquire corresponding to this release happens in
+ * LWLockAttemptLock().
  */
 void
 LWLockRelease(LWLock *lock)
-- 
2.29.2

0002-Add-to-the-tracepoints-in-LWLock-routines.patchtext/x-patch; charset=US-ASCII; name=0002-Add-to-the-tracepoints-in-LWLock-routines.patchDownload
From af1b0ebcce96c2a81b4a7d88e71ac719477440a8 Mon Sep 17 00:00:00 2001
From: Craig Ringer <craig.ringer@2ndquadrant.com>
Date: Thu, 19 Nov 2020 18:05:39 +0800
Subject: [PATCH 2/4] Add to the tracepoints in LWLock routines

The existing tracepoints in lwlock.c didn't mark the start of LWLock
acquisition, so timing of the full LWLock acquire cycle wasn't
possible without relying on debuginfo. Since this can be quite
relevant for production performance issues, emit tracepoints at the
start of LWLock acquire.

Also add a tracepoint that's fired for all LWLock acquisitions at the
moment the shared memory state changes, whether done by LWLockAcquire
or LWLockConditionalAcquire. This lets tools reliably track which
backends hold which LWLocks even if we add new functions that acquire
LWLocks in future.

Add tracepoints in LWLockWaitForVar and LWLockUpdateVar so process
interaction around LWLock variable waits can be observed from trace
tooling. They can cause long waits and/or deadlocks, so it's worth
being able to time and track them.
---
 src/backend/storage/lmgr/lwlock.c | 24 ++++++++++++++++++++++++
 src/backend/utils/probes.d        |  8 ++++++++
 2 files changed, 32 insertions(+)

diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 5c8744d316..4c65a30c76 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -874,6 +874,9 @@ LWLockAttemptLock(LWLock *lock, LWLockMode mode)
 				if (mode == LW_EXCLUSIVE)
 					lock->owner = MyProc;
 #endif
+				/* All LWLock acquires must hit this tracepoint */
+				TRACE_POSTGRESQL_LWLOCK_ACQUIRED(T_NAME(lock), mode, lock,
+						lock->tranche);
 				return false;
 			}
 			else
@@ -1237,6 +1240,9 @@ LWLockAcquire(LWLock *lock, LWLockMode mode)
 	if (num_held_lwlocks >= MAX_SIMUL_LWLOCKS)
 		elog(ERROR, "too many LWLocks taken");
 
+	TRACE_POSTGRESQL_LWLOCK_ACQUIRE_START(T_NAME(lock), mode, lock,
+			lock->tranche);
+
 	/*
 	 * Lock out cancel/die interrupts until we exit the code section protected
 	 * by the LWLock.  This ensures that interrupts will not interfere with
@@ -1389,6 +1395,9 @@ LWLockConditionalAcquire(LWLock *lock, LWLockMode mode)
 	if (num_held_lwlocks >= MAX_SIMUL_LWLOCKS)
 		elog(ERROR, "too many LWLocks taken");
 
+	TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_START(T_NAME(lock), mode, lock,
+			lock->tranche);
+
 	/*
 	 * Lock out cancel/die interrupts until we exit the code section protected
 	 * by the LWLock.  This ensures that interrupts will not interfere with
@@ -1453,6 +1462,9 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
 	if (num_held_lwlocks >= MAX_SIMUL_LWLOCKS)
 		elog(ERROR, "too many LWLocks taken");
 
+	TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT_START(T_NAME(lock), mode, lock,
+			lock->tranche);
+
 	/*
 	 * Lock out cancel/die interrupts until we exit the code section protected
 	 * by the LWLock.  This ensures that interrupts will not interfere with
@@ -1635,6 +1647,9 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
 
 	PRINT_LWDEBUG("LWLockWaitForVar", lock, LW_WAIT_UNTIL_FREE);
 
+	TRACE_POSTGRESQL_LWLOCK_WAITFORVAR_START(T_NAME(lock), lock,
+			lock->tranche, valptr, oldval, *valptr);
+
 	/*
 	 * Lock out cancel/die interrupts while we sleep on the lock.  There is no
 	 * cleanup mechanism to remove us from the wait queue if we got
@@ -1745,6 +1760,9 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
 	 */
 	RESUME_INTERRUPTS();
 
+	TRACE_POSTGRESQL_LWLOCK_WAITFORVAR_DONE(T_NAME(lock), lock, lock->tranche,
+			valptr, oldval, *newval, result);
+
 	return result;
 }
 
@@ -1767,6 +1785,9 @@ LWLockUpdateVar(LWLock *lock, uint64 *valptr, uint64 val)
 
 	PRINT_LWDEBUG("LWLockUpdateVar", lock, LW_EXCLUSIVE);
 
+	TRACE_POSTGRESQL_LWLOCK_UPDATEVAR_START(T_NAME(lock), lock, lock->tranche, valptr,
+			val);
+
 	proclist_init(&wakeup);
 
 	LWLockWaitListLock(lock);
@@ -1807,6 +1828,9 @@ LWLockUpdateVar(LWLock *lock, uint64 *valptr, uint64 val)
 		waiter->lwWaiting = false;
 		PGSemaphoreUnlock(waiter->sem);
 	}
+
+	TRACE_POSTGRESQL_LWLOCK_UPDATEVAR_DONE(T_NAME(lock), lock, lock->tranche,
+			valptr, val);
 }
 
 
diff --git a/src/backend/utils/probes.d b/src/backend/utils/probes.d
index 363839f85d..608a89af91 100644
--- a/src/backend/utils/probes.d
+++ b/src/backend/utils/probes.d
@@ -29,14 +29,22 @@ provider postgresql {
 	probe transaction__commit(LocalTransactionId);
 	probe transaction__abort(LocalTransactionId);
 
+	probe lwlock__acquired(const char *, LWLockMode, LWLock*, int);
+	probe lwlock__acquire__start(const char *, LWLockMode, LWLock*, int);
 	probe lwlock__acquire(const char *, LWLockMode, LWLock*, int);
 	probe lwlock__release(const char *, LWLockMode, LWLock*, int);
 	probe lwlock__wait__start(const char *, LWLockMode, LWLock*, int);
 	probe lwlock__wait__done(const char *, LWLockMode, LWLock*, int);
+	probe lwlock__condacquire__start(const char *, LWLockMode, LWLock*, int);
 	probe lwlock__condacquire(const char *, LWLockMode, LWLock*, int);
 	probe lwlock__condacquire__fail(const char *, LWLockMode, LWLock*, int);
+	probe lwlock__acquire__or__wait__start(const char *, LWLockMode, LWLock*, int);
 	probe lwlock__acquire__or__wait(const char *, LWLockMode, LWLock*, int);
 	probe lwlock__acquire__or__wait__fail(const char *, LWLockMode, LWLock*, int);
+	probe lwlock__waitforvar__start(const char *, LWLock*, int, uint64, uint64, uint64);
+	probe lwlock__waitforvar__done(const char *, LWLock*, int, uint64, uint64, uint64, bool);
+	probe lwlock__updatevar__start(const char *, LWLock*, int, uint64, uint64);
+	probe lwlock__updatevar__done(const char *, LWLock*, int, uint64, uint64);
 
 
 	probe lock__wait__start(unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, LOCKMODE);
-- 
2.29.2

#12Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Craig Ringer (#11)
Re: [PATCH] Identify LWLocks in tracepoints

On 10.03.21 06:38, Craig Ringer wrote:

On Wed, 3 Mar 2021 at 20:50, David Steele <david@pgmasters.net> wrote:

On 1/22/21 6:02 AM, Peter Eisentraut wrote:

This patch set no longer applies:
http://cfbot.cputube.org/patch_32_2927.log

Can we get a rebase? Also marked Waiting on Author.

Rebased as requested.

In patch 0001, why was the TRACE_POSTGRESQL_LWLOCK_RELEASE() call moved?
Is there some correctness issue? If so, we should explain that (at
least in the commit message, or as a separate patch).

#13Craig Ringer
craig.ringer@enterprisedb.com
In reply to: Peter Eisentraut (#12)
Re: [PATCH] Identify LWLocks in tracepoints

On Thu, 11 Mar 2021 at 15:57, Peter Eisentraut <
peter.eisentraut@enterprisedb.com> wrote:

On 10.03.21 06:38, Craig Ringer wrote:

On Wed, 3 Mar 2021 at 20:50, David Steele <david@pgmasters.net> wrote:

On 1/22/21 6:02 AM, Peter Eisentraut wrote:

This patch set no longer applies:
http://cfbot.cputube.org/patch_32_2927.log

Can we get a rebase? Also marked Waiting on Author.

Rebased as requested.

In patch 0001, why was the TRACE_POSTGRESQL_LWLOCK_RELEASE() call moved?
Is there some correctness issue? If so, we should explain that (at
least in the commit message, or as a separate patch).

If you want I can split it out, or drop that change. I thought it was
sufficiently inconsequential, but you're right to check.

The current tracepoint TRACE_POSTGRESQL_LWLOCK_RELEASE really means
"releaseD". It's appropriate to emit this as soon as the lock could be
acquired by anything else. By deferring it until we'd processed the
waitlist and woken other backends the window during which the lock was
reported as "held" was longer than it truly was, and it was easy to see one
backend acquire the lock while another still appeared to hold it.

It'd possibly make more sense to have a separate
TRACE_POSTGRESQL_LWLOCK_RELEASING just before the `pg_atomic_sub_fetch_u32`
call. But I didn't want to spam the tracepoints too hard, and there's
always going to be some degree of overlap because tracing tools cannot
intercept and act during the atomic swap, so they'll always see a slightly
premature or slightly delayed release. This window should be as short as
possible though, hence moving the tracepoint.
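
To illustrate the placement (a schematic sketch only, not the literal patch
hunk, and with the probe's argument list elided):

    /* the lock is available to other backends as soon as this completes */
    oldstate = pg_atomic_sub_fetch_u32(&lock->state,
                mode == LW_EXCLUSIVE ? LW_VAL_EXCLUSIVE : LW_VAL_SHARED);

    /* so report "released" here ... */
    TRACE_POSTGRESQL_LWLOCK_RELEASE(T_NAME(lock));

    /* ... and only then process the waitlist and wake anyone up */
    if (check_waiters)
        LWLockWakeup(lock);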

Side note:

The main reason I didn't want to add more tracepoints than were strictly
necessary is that Pg doesn't enable the systemtap semaphores feature, so
right now we do a T_NAME(lock) evaluation each time we pass a tracepoint if
--enable-dtrace is compiled in, whether or not anything is tracing. This
was fine on pg11 where it was just:

#define T_NAME(lock) \
(LWLockTrancheArray[(lock)->tranche])

but since pg13 it instead expands to

GetLWTrancheName((lock)->tranche)

where GetLWTrancheName isn't especially trivial. We'll run that function
every single time we pass any of these tracepoints and then discard the
result, which is ... not ideal. That applies so long as Pg is compiled with
--enable-dtrace. I've been meaning to look at enabling the systemtap
semaphores feature in our build so these can be wrapped in
unlikely(TRACE_POSTGRESQL_LWLOCK_RELEASE_ENABLED()) guards, but I wanted to
wrap this patch set up first as there are some complexities around enabling
the semaphores feature.
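
For example, a guarded tracepoint would look roughly like this (a sketch
only; it relies on the <probe>_ENABLED() macros that the semaphore support
generates):

    /* T_NAME() - and so GetLWTrancheName() - only runs when a tracer is
     * actually attached to this probe */
    if (TRACE_POSTGRESQL_LWLOCK_RELEASE_ENABLED())
        TRACE_POSTGRESQL_LWLOCK_RELEASE(T_NAME(lock));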

#14Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Craig Ringer (#13)
Re: [PATCH] Identify LWLocks in tracepoints

On 18.03.21 07:34, Craig Ringer wrote:

In patch 0001, why was the TRACE_POSTGRESQL_LWLOCK_RELEASE() call
moved?
  Is there some correctness issue?  If so, we should explain that (at
least in the commit message, or as a separate patch).

If you want I can split it out, or drop that change. I thought it was
sufficiently inconsequential, but you're right to check.

The current tracepoint TRACE_POSTGRESQL_LWLOCK_RELEASE really means
"releaseD". It's appropriate to emit this as soon as the lock could be
acquired by anything else. By deferring it until we'd processed the
waitlist and woken other backends the window during which the lock was
reported as "held" was longer than it truly was, and it was easy to see
one backend acquire the lock while another still appeared to hold it.

From the archeology department: The TRACE_POSTGRESQL_LWLOCK_RELEASE
probe was in the right place until PG 9.4, but was then moved by
ab5194e6f617a9a9e7aadb3dd1cee948a42d0755, which was a major rewrite, so
it seems the move might have been accidental. The documentation
specifically states that the probe is triggered before waiters are woken
up, which it specifically does not do at the moment. So this looks like
a straight bug fix to me.

#15Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Craig Ringer (#13)
Re: [PATCH] Identify LWLocks in tracepoints

On 18.03.21 07:34, Craig Ringer wrote:

The main reason I didn't want to add more tracepoints than were strictly
necessary is that Pg doesn't enable the systemtap semaphores feature, so
right now we do a T_NAME(lock) evaluation each time we pass a tracepoint
if --enable-dtrace is compiled in, whether or not anything is tracing.
This was fine on pg11 where it was just:

#define T_NAME(lock) \
        (LWLockTrancheArray[(lock)->tranche])

but since pg13 it instead expands to

        GetLWTrancheName((lock)->tranche)

where GetLWTrancheName isn't especially trivial. We'll run that function
every single time we pass any of these tracepoints and then discard the
result, which is ... not ideal. That applies so long as Pg is compiled
with --enable-dtrace. I've been meaning to look at enabling the
systemtap semaphores feature in our build so these can be wrapped in
unlikely(TRACE_POSTGRESQL_LWLOCK_RELEASE_ENABLED()) guards, but I wanted
to wrap this patch set up first as there are some complexities around
enabling the semaphores feature.

There is already support for that. See the documentation at the end of
this page:
https://www.postgresql.org/docs/devel/dynamic-trace.html#DEFINING-TRACE-POINTS

#16Craig Ringer
craig.ringer@enterprisedb.com
In reply to: Peter Eisentraut (#15)
Re: [PATCH] Identify LWLocks in tracepoints

On Sat, 20 Mar 2021, 04:21 Peter Eisentraut, <
peter.eisentraut@enterprisedb.com> wrote:

On 18.03.21 07:34, Craig Ringer wrote:

The main reason I didn't want to add more tracepoints than were strictly
necessary is that Pg doesn't enable the systemtap semaphores feature, so
right now we do a T_NAME(lock) evaluation each time we pass a tracepoint
if --enable-dtrace is compiled in, whether or not anything is tracing.
This was fine on pg11 where it was just:

#define T_NAME(lock) \
(LWLockTrancheArray[(lock)->tranche])

but since pg13 it instead expands to

GetLWTrancheName((lock)->tranche)

where GetLWTrancheName isn't especially trivial. We'll run that function
every single time we pass any of these tracepoints and then discard the
result, which is ... not ideal. That applies so long as Pg is compiled
with --enable-dtrace. I've been meaning to look at enabling the
systemtap semaphores feature in our build so these can be wrapped in
unlikely(TRACE_POSTGRESQL_LWLOCK_RELEASE_ENABLED()) guards, but I wanted
to wrap this patch set up first as there are some complexities around
enabling the semaphores feature.

There is already support for that. See the documentation at the end of
this page:

https://www.postgresql.org/docs/devel/dynamic-trace.html#DEFINING-TRACE-POINTS

Pretty sure it won't work right now.

To use systemtap semaphores (the _ENABLED macros) you need to run dtrace -g
to generate a probes.o then link that into postgres.

I don't think we do that. I'll double check soon.

#17Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Peter Eisentraut (#14)
Re: [PATCH] Identify LWLocks in tracepoints

On 19.03.21 21:06, Peter Eisentraut wrote:

On 18.03.21 07:34, Craig Ringer wrote:

    In patch 0001, why was the TRACE_POSTGRESQL_LWLOCK_RELEASE() call
    moved?
       Is there some correctness issue?  If so, we should explain that
(at
    least in the commit message, or as a separate patch).

If you want I can split it out, or drop that change. I thought it was
sufficiently inconsequential, but you're right to check.

The current tracepoint TRACE_POSTGRESQL_LWLOCK_RELEASE really means
"releaseD". It's appropriate to emit this as soon as the lock could be
acquired by anything else. By deferring it until we'd processed the
waitlist and woken other backends the window during which the lock was
reported as "held" was longer than it truly was, and it was easy to
see one backend acquire the lock while another still appeared to hold it.

From the archeology department: The TRACE_POSTGRESQL_LWLOCK_RELEASE
probe was in the right place until PG 9.4, but was then moved by
ab5194e6f617a9a9e7aadb3dd1cee948a42d0755, which was a major rewrite, so
it seems the move might have been accidental.  The documentation
specifically states that the probe is triggered before waiters are woken
up, which it specifically does not do at the moment.  So this looks like
a straight bug fix to me.

committed a fix for that

#18Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Craig Ringer (#11)
Re: [PATCH] Identify LWLocks in tracepoints

On 10.03.21 06:38, Craig Ringer wrote:

On Wed, 3 Mar 2021 at 20:50, David Steele <david@pgmasters.net> wrote:

On 1/22/21 6:02 AM, Peter Eisentraut wrote:

This patch set no longer applies:
http://cfbot.cputube.org/patch_32_2927.log

Can we get a rebase? Also marked Waiting on Author.

Rebased as requested.

I'm still interested in whether Andres will be able to do anything about
identifying LWLocks in a cross-backend manner. But this work doesn't
really depend on that; it'd benefit from it, but would be easily adapted
to it later if needed.

First, a problem: 0002 doesn't build on macOS, because uint64 has been
used in the probe definitions. That needs to be handled like the other
nonnative types in that file.

All the probe changes and additions should be accompanied by
documentation changes.

The probes used to have an argument to identify the lock, which was
removed by 3761fe3c20bb040b15f0e8da58d824631da00caa. The 0001 patch is
essentially trying to reinstate that, which seems sensible. Perhaps we
should also use the argument order that used to be there. It used to be

probe lwlock__acquire(const char *, int, LWLockMode);

and now it would be

probe lwlock__acquire(const char *, LWLockMode, LWLock*, int);

Also, do we need both the tranche name and the tranche id? Or maybe we
don't need the name, or can record it differently, which might also
address your other concern that it's too expensive to compute. In any
case, I think an argument order like

probe lwlock__acquire(const char *, int, LWLock*, LWLockMode);

would make more sense.

In 0004, you add a probe to record the application_name setting? Would
there be any value in making that a generic probe that can record any
GUC change?

#19Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Craig Ringer (#16)
Re: [PATCH] Identify LWLocks in tracepoints

On 20.03.21 01:29, Craig Ringer wrote:

There is already support for that.  See the documentation at the end of
this page:
https://www.postgresql.org/docs/devel/dynamic-trace.html#DEFINING-TRACE-POINTS

Pretty sure it won't work right now.

To use systemtap semaphores (the _ENABLED macros) you need to run dtrace
-g to generate a probes.o then link that into postgres.

I don't think we do that. I'll double check soon.

We do that. (It's -G.)

#20Craig Ringer
craig.ringer@enterprisedb.com
In reply to: Peter Eisentraut (#19)
Re: [PATCH] Identify LWLocks in tracepoints

On Mon, 22 Mar 2021 at 17:00, Peter Eisentraut <
peter.eisentraut@enterprisedb.com> wrote:

On 20.03.21 01:29, Craig Ringer wrote:

There is already support for that. See the documentation at the end of
this page:

https://www.postgresql.org/docs/devel/dynamic-trace.html#DEFINING-TRACE-POINTS

Pretty sure it won't work right now.

To use systemtap semaphores (the _ENABLED macros) you need to run dtrace
-g to generate a probes.o then link that into postgres.

I don't think we do that. I'll double check soon.

We do that. (It's -G.)

Huh. I could've sworn we didn't. My mistake, it's there in
src/backend/Makefile .

In that case I'll amend the patch to use semaphore guards.

(On a side note, systemtap's semaphore support is actually a massive pain.
The way it's implemented in <sys/sdt.h> means that a single compilation
unit may not use both probes.d style markers produced by the dtrace script
and use regular DTRACE_PROBE(providername,probename) preprocessor macros.
If it attempts to do so, the DTRACE_PROBE macros will emit inline asm that
tries to reference probename_semaphore symbols that will not exist,
resulting in linker errors or runtime link errors. But that's really a
systemtap problem. Core PostgreSQL doesn't use any explicit
DTRACE_PROBE(...), STAP_PROBE(...) etc.)

#21Craig Ringer
craig.ringer@enterprisedb.com
In reply to: Peter Eisentraut (#18)
Re: [PATCH] Identify LWLocks in tracepoints

On Mon, 22 Mar 2021 at 16:38, Peter Eisentraut <
peter.eisentraut@enterprisedb.com> wrote:

First, a problem: 0002 doesn't build on macOS, because uint64 has been
used in the probe definitions. That needs to be handled like the other
nonnative types in that file.

Will fix.

All the probe changes and additions should be accompanied by

documentation changes.

Agreed, will fix.

The probes used to have an argument to identify the lock, which was

removed by 3761fe3c20bb040b15f0e8da58d824631da00caa.

Huh. That's exactly the functionality I was looking for. Damn. I understand
why Robert removed it, but its removal makes it much harder to identify an
LWLock since it might fall in a DSM segment that could be mapped at
different base addresses in different backends.

Robert's patch didn't replace the offset within tranche with anything else
to identify the lock. A LWLock* is imperfect due to ASLR and DSM but it's
better than nothing. In theory we could even remap them in trace tools if
we had tracepoints on DSM attach and detach that showed their size and base
address too.

CC'ing Andres, as he expressed interest in being able to globally identify
LWLocks too.

The 0001 patch is
essentially trying to reinstate that, which seems sensible. Perhaps we
should also use the argument order that used to be there. It used to be

probe lwlock__acquire(const char *, int, LWLockMode);

and now it would be

probe lwlock__acquire(const char *, LWLockMode, LWLock*, int);

Also, do we need both the tranche name and the tranche id?

Reasons to have the name:

* There is no easy way to look up the tranche name by ID from outside the
backend
* A tranche ID by itself is pretty much meaningless especially for dynamic
tranches
* Any existing scripts will rely on the tranche name

So the tranche name is really required to generate useful output for any
dynamic tranches, or simple and readable output from things like perf.

Reasons to have the tranche ID:

* The tranche name is not guaranteed to have the same address for a given
value across backends in the presence of ASLR, even for built-in tranches.
So tools need to read tranche names as user-space strings, which is much
more expensive than consuming an int argument from the trace args. Storing
and reporting maps of events by tranche name (string) in tools is also more
expensive than having a tranche id.
* When the trace tool or script wants to filter for only one particular
tranche, particularly when it's a built-in tranche where the tranche ID is
known, having the ID is much more useful and efficient.
* If we can avoid computing the tranche name, emitting just the tranche ID
would be much faster.

It's annoying that we have to pay the cost of computing the tranche name
though. It never used to matter, but now that T_NAME() expands to
GetLWTrancheName() calls as of 29c3e2dd5a6 it's going to cost a little more
on such a hot path. I might see if I can do a little comparison and see how
much.

I could add TRACE_POSTGRESQL_<<tracepointname>>_ENABLED() guards since we
do in fact build with SDT semaphore support. That adds a branch for each
tracepoint, but they're already marshalling arguments and making a function
call that does lots more than a single branch, so that seems pretty
sensible. The main downside of using _ENABLED() USDT semaphore guards is
that not all tools are guaranteed to understand or support them. So an
older perf, for example, might simply fail to fire events on guarded
probes. That seems OK to me, the onus should be on the probe tool to pay
any costs, not on PostgreSQL. Despite that I don't want to mark the
_ENABLED() guards unlikely(), since that'd increase the observer effect
where probing LWLocks changes their timing and behaviour. Branch prediction
should do a very good job in such cases without being forced.

I wonder a little about the possible cache costs of the _ENABLED() macros
though. Their data is in a separate ELF segment and separate .o, with no
locality to the traced code. It might be worth checking that before
proceeding; I guess it's even possible that the GetLWTrancheName() calls
could be cheaper. Will see if I can run some checks and report back.

BTW, if you want some of the details on how userspace SDTs work,
https://leezhenghui.github.io/linux/2019/03/05/exploring-usdt-on-linux.html
is interesting and useful. It helps explain uprobes, ftrace, bcc, etc.

Or maybe we

don't need the name, or can record it differently, which might also
address your other concern that it's too expensive to compute. In any
case, I think an argument order like

probe lwlock__acquire(const char *, int, LWLock*, LWLockMode);

would make more sense.

OK.

In 0004, you add a probe to record the application_name setting? Would

there be any value in making that a generic probe that can record any
GUC change?

Yes, there would, but I didn't want to go and do that in the same patch,
and a named probe on application_name is useful separately to having probes
on any GUC.

There's value in having a probe with an easily targeted name that probes
the application_name since it's of obvious interest and utility to probing
and tracing tools. A probe specifically on application_name means a probing
script doesn't have to fire an event for every GUC, copy the GUC name
string, strcmp() it to see if it's the GUC of interest, etc. So specific
probes on "major" GUCs like this are IMO very useful.

(It'd be possible to instead generate probes for each GUC at compile-time
using the preprocessor and the DTRACE_ macros. But as noted above, that
doesn't currently work properly in the same compilation unit that a dtrace
script-generated probes.h is included in. I think it's probably nicer to
have specific probes for GUCs of high interest, then generic probes that
capture all GUCs anyway.)

There are a TON of probes I want to add, and I have a tree full of them
waiting to submit progressively. Yes, ability to probe all GUCs is in
there. So is detail on walsender, reorder buffer, and snapshot builder
activity. Heavyweight lock SDTs. A probe that identifies the backend type
at startup. SDT probe events emitted for every wait-event. Probes in elog.c
to let probes observe error unwinding, capture error messages, etc. (Those
can also be used with systemtap guru mode scripts to do things like turn a
particular elog(DEBUG) into a PANIC at runtime for diagnostic purposes).
Probes in shm_mq to observe message passing and blocking. A probe that
fires whenever debug_query_string changes. Lots. But I can't submit them
all at once, especially without some supporting use cases and scripts that
other people can use so they can understand why these probes are useful.

So I figured I'd start here...

#22Andres Freund
andres@anarazel.de
In reply to: Craig Ringer (#21)
Re: [PATCH] Identify LWLocks in tracepoints

Hi,

On 2021-04-12 14:31:32 +0800, Craig Ringer wrote:

* There is no easy way to look up the tranche name by ID from outside the
backend

But it's near trivial to add that.

It's annoying that we have to pay the cost of computing the tranche name
though. It never used to matter, but now that T_NAME() expands to
GetLWTrancheName() calls as of 29c3e2dd5a6 it's going to cost a little more
on such a hot path. I might see if I can do a little comparison and see how
much. I could add TRACE_POSTGRESQL_<<tracepointname>>_ENABLED() guards since we
do in fact build with SDT semaphore support. That adds a branch for each
tracepoint, but they're already marshalling arguments and making a function
call that does lots more than a single branch, so that seems pretty
sensible.

I am against adding any overhead for this feature. I honestly think the
probes we have right now in postgres do not provide a commensurate
benefit.

(It'd be possible to instead generate probes for each GUC at compile-time
using the preprocessor and the DTRACE_ macros. But as noted above, that
doesn't currently work properly in the same compilation unit that a dtrace
script-generated probes.h is included in. I think it's probably nicer to
have specific probes for GUCs of high interest, then generic probes that
capture all GUCs anyway.)

There are a TON of probes I want to add, and I have a tree full of them
waiting to submit progressively. Yes, ability to probe all GUCs is in
there. So is detail on walsender, reorder buffer, and snapshot builder
activity. Heavyweight lock SDTs. A probe that identifies the backend type
at startup. SDT probe events emitted for every wait-event. Probes in elog.c
to let probes observe error unwinding, capture error messages,
etc. [...] A probe that fires whenever debug_query_string
changes. Lots. But I can't submit them all at once, especially without
some supporting use cases and scripts that other people can use so
they can understand why these probes are useful.

-1. This is not scalable. Adding static probes all over has both a
runtime (L1I, branches, code optimization) and maintenance overhead.

(Those can also be used with systemtap guru mode scripts to do things
like turn a particular elog(DEBUG) into a PANIC at runtime for
diagnostic purposes).

Yikes.

Greetings,

Andres Freund

#23Craig Ringer
craig.ringer@enterprisedb.com
In reply to: Andres Freund (#22)
Re: [PATCH] Identify LWLocks in tracepoints

On Tue, 13 Apr 2021 at 02:23, Andres Freund <andres@anarazel.de> wrote:

[I've changed the order of the quoted sections a little to prioritize
the key stuff]

On 2021-04-12 14:31:32 +0800, Craig Ringer wrote:

It's annoying that we have to pay the cost of computing the tranche name
though. It never used to matter, but now that T_NAME() expands to
GetLWTrancheName() calls as of 29c3e2dd5a6 it's going to cost a little more
on such a hot path. I might see if I can do a little comparison and see how
much. I could add TRACE_POSTGRESQL_<<tracepointname>>_ENABLED() guards since we
do in fact build with SDT semaphore support. That adds a branch for each
tracepoint, but they're already marshalling arguments and making a function
call that does lots more than a single branch, so that seems pretty
sensible.

I am against adding any overhead for this feature. I honestly think the
probes we have right now in postgres do not provide a commensurate
benefit.

I agree that the probes we have now are nearly useless, if not
entirely useless. The transaction management ones are misplaced and
utterly worthless. The LWLock ones don't carry enough info to be much
use and are incomplete. I doubt anybody uses any of them at all, or
would even notice their absence.

In terms of overhead, what is in place right now is not free. It used
to be very cheap, but since 29c3e2dd5a6 it's not. I'd like to reduce
the current cost and improve functionality at the same time, so it's
actually useful.

* There is no easy way to look up the tranche name by ID from outside the
backend

But it's near trivial to add that.

Really?

We can expose a pg_catalog.lwlock_tranches view that lets you observe
the current mappings for any given user backend I guess.

But if I'm looking for performance issues caused by excessive LWLock
contention or waits, LWLocks held too long, LWLock lock-ordering
deadlocks, or the like, it's something I want to capture across the
whole postgres instance. Each backend can have different tranche IDs
(right?) and there's no way to know what a given non-built-in tranche
ID means for any given backend without accessing backend-specific
in-memory state. Including for non-user-accessible backends like
bgworkers and auxprocs, where it's not possible to just query the
state from a view directly.

So we'd be looking at some kind of shm based monstrosity. That doesn't
sound appealing. Worse, there's no way to solve races with it - is a
given tranche ID already allocated when you see it? If not, can you
look it up from the backend before the backend exits/dies? For that
matter, how do you do that, since the connection to the backend is
likely under the control of an application, not your monitoring and
diagnostic tooling.

Some trace tools can poke backend memory directly, but it generally
requires debuginfo, is fragile and Pg version specific, slow, and a
real pain to use. If we don't attach the LWLock names to the
tracepoints in some way they're pretty worthless.

Again, I don't plan to add new costs here. I'm actually proposing to
reduce an existing cost.

And you can always build without `--enable-dtrace` and ... just not care.

Anyway - I'll do some `perf` runs shortly to quantify this:

* With/without tracepoints at all
* With/without names in tracepoints
* With/without tracepoint refcounting (_ENABLED() semaphores)

so as to rely less on handwaving.

(Those can also be used with systemtap guru mode scripts to do things
like turn a particular elog(DEBUG) into a PANIC at runtime for
diagnostic purposes).

Yikes.

Well, it's not like it can happen by accident. You have to
deliberately write a script that twiddles process memory, using a tool
that requires special privileges and an explicit guru-mode opt-in.

I recently had to prepare a custom build for a customer that converted
an elog(DEBUG) into an elog(PANIC) in order to capture a core with
much better diagnostic info for a complex, hard to reproduce and
intermittent memory management issue. It would've been rather nice to
be able to do so with a trace marker instead of a custom build.

There are a TON of probes I want to add, and I have a tree full of them
waiting to submit progressively. Yes, ability to probe all GUCs is in
there. So is detail on walsender, reorder buffer, and snapshot builder
activity. Heavyweight lock SDTs. A probe that identifies the backend type
at startup. SDT probe events emitted for every wait-event. Probes in elog.c
to let probes observe error unwinding, capture error messages,
etc. [...] A probe that fires whenever debug_query_string
changes. Lots. But I can't submit them all at once, especially without
some supporting use cases and scripts that other people can use so
they can understand why these probes are useful.

-1. This is not scalable. Adding static probes all over has both a
runtime (L1I, branches, code optimization) and maintenance overhead.

Take a look at "sudo perf list".

sched:sched_kthread_work_execute_end [Tracepoint event]
sched:sched_kthread_work_execute_start [Tracepoint event]
...
sched:sched_migrate_task [Tracepoint event]
...
sched:sched_process_exec [Tracepoint event]
...
sched:sched_process_fork [Tracepoint event]
...
sched:sched_stat_iowait [Tracepoint event]
...
sched:sched_stat_sleep [Tracepoint event]
sched:sched_stat_wait [Tracepoint event]
...
sched:sched_switch [Tracepoint event]
...
sched:sched_wakeup [Tracepoint event]

The kernel is packed with extremely useful trace events, and for very
good reasons. Some on very hot paths.

I do _not_ want to randomly add probes everywhere. I propose that they be added:

* Where they will meaningfully aid production diagnosis, complex
testing, and/or development activity. Expose high level activity of
key subsystems via trace markers, especially at the boundaries of IPCs
or where logic otherwise passes between processes.
* Where it's not feasible to instead adjust code structure to make
DWARF debuginfo based probing sufficient.
* Where there's no other sensible way to get useful information
without excessive complexity and/or runtime cost, but it could be very
important for understanding intermittent production issues or
performance problems at scale in live systems.
* Where the execution path is not extremely hot - e.g. no static
tracepoints in spinlocks or atomics.
* Where a DWARF debuginfo based probe cannot easily replace them, i.e.
generally not placed on entry and exit of stable and well-known
functions.

Re the code structure point above, we have lots of places where we
return in multiple places, or where a single function can do many
different things with different effects on system state. For example
right now it's quite complex to place probes to definitively confirm
the outcome of a given transaction and capture its commit record lsn.
Functions with many branches that each fiddle with system state,
functions that test for the validity of some global and short-circuit
return if invalid, etc. Functions that do long loops over big chunks
of logic are hard too, e.g. ReorderBufferCommit.

I want to place probes where they will greatly simplify observation of
important global system state that's not easily observed using
traditional tools like gdb or logging.

When applied sensibly and moderately, trace markers are absolutely
amazing for diagnostic and performance work. You can attach to them in
production builds even without debuginfo and observe behaviour that
would otherwise be impossible without complex fiddling around with
multi-process gdb. This sort of capability is going to become more and
more important as we become more parallel and can rely less on
single-process gdb-style tracing. Diagnostics using logging is a blunt
hammer that does not scale and is rarely viable for intermittent or
hard to reproduce production issues.

I will always favour "native postgres" solutions where feasible - for
example, I want to add some basic reorder buffer state to struct
WalSnd and the pg_stat_replication views, and I want to expose some
means to get a walsender to report details of its ReorderBuffer state.

But some things are not very amenable to that. Either the runtime
costs of having the facility available are too high (we're never going
to have a pg_catalog.pg_lwlocks for good reasons) or it's too
complicated to write and maintain. Especially where info is needed
from many processes.

That's where trace markers become valuable. But right now what we have
in Pg is worthless, and it seems almost nobody knows how to use the
tools. I want to change that, but it's a bit of a catch-22. Making
tooling easy to use benefits enormously from some more stable
interfaces that don't break so much version-to-version, don't require
deep code knowledge to understand, and work without debuginfo on
production builds. But without some "oh, wow" tools, it's hard to
convince anyone we should invest any effort in improving the
infrastructure...

It's possible I'm beating a dead horse here. I find these tools
amazingly useful, but they're currently made 10x harder than they need
to be by the complexities of directly poking at postgres's complex and
version-specific internal structure using debuginfo based probing.
Things that should be simple, like determining the timings of a txn
from xid assignment -> 2pc prepare -> 2pc commit prepared .... really
aren't. Markers that report xid assignment, commit, rollback, etc,
with the associated topxid would help immensely.

#24Andres Freund
andres@anarazel.de
In reply to: Craig Ringer (#23)
Re: [PATCH] Identify LWLocks in tracepoints

Hi,

On 2021-04-13 10:34:18 +0800, Craig Ringer wrote:

But it's near trivial to add that.

Really?

Yes.

Each backend can have different tranche IDs (right?)

No, they have to be the same in each. Note how the tranche ID is part of
struct LWLock. Which is why LWLockNewTrancheId() has to acquire a lock
etc.

But if I'm looking for performance issues caused by excessive LWLock
contention or waits, LWLocks held too long, [...] or the like, it's
something I want to capture across the whole postgres instance.

Sure.

Although I still don't really buy that static tracepoints are the best
way to measure this kind of thing, given the delay introducing them and
the cost of having them around. I think I pointed out
/messages/by-id/20200813004233.hdsdfvufqrbdwzgr@alap3.anarazel.de
before.

LWLock lock-ordering deadlocks

This seems unrelated to tracepoints to me.

and there's no way to know what a given non-built-in tranche ID means
for any given backend without accessing backend-specific in-memory
state. Including for non-user-accessible backends like bgworkers and
auxprocs, where it's not possible to just query the state from a view
directly.

The only per-backend part is that some backends might not know the
tranche name for dynamically registered tranches where the
LWLockRegisterTranche() hasn't been executed in a backend. Which should
pretty much never be an aux process or such. And even for bgworkers it
seems like a pretty rare thing, because those need to be started by
something...

It might be worth proposing a shared hashtable with tranche names and
just reserving enough space for ~hundred entries...

And you can always build without `--enable-dtrace` and ... just not care.

Practically speaking, distributions enable it, which then incurs the
cost for everyone.

Take a look at "sudo perf list".

sched:sched_kthread_work_execute_end [Tracepoint event]
sched:sched_kthread_work_execute_start [Tracepoint event]
...
sched:sched_migrate_task [Tracepoint event]
...
sched:sched_process_exec [Tracepoint event]
...
sched:sched_process_fork [Tracepoint event]
...
sched:sched_stat_iowait [Tracepoint event]
...
sched:sched_stat_sleep [Tracepoint event]
sched:sched_stat_wait [Tracepoint event]
...
sched:sched_switch [Tracepoint event]
...
sched:sched_wakeup [Tracepoint event]

The kernel is packed with extremely useful trace events, and for very
good reasons. Some on very hot paths.

IIRC those aren't really comparable - the kernel actually does modify
the executable code to replace the tracepoints with nops.

Greetings,

Andres Freund

#25Craig Ringer
craig.ringer@enterprisedb.com
In reply to: Andres Freund (#24)
Re: [PATCH] Identify LWLocks in tracepoints

On Tue, 13 Apr 2021 at 11:06, Andres Freund <andres@anarazel.de> wrote:

Each backend can have different tranche IDs (right?)

No, they have to be the same in each. Note how the tranche ID is part of
struct LWLock. Which is why LWLockNewTrancheId() has to acquire a lock
etc.

Ah. I misunderstood that at some point.

That makes it potentially more sensible to skip reporting tranche
names. Not great, because it's much less convenient to work with trace
data full of internal ordinals that must be re-mapped in
post-processing. But I'm generally OK with deferring runtime costs to
tooling rather than the db itself so long as doing so is moderately
practical.

In this case, I think we could likely get away with removing the
tranche names from the tracepoints if we instead emit a trace event on
each dynamic tranche registration that reports the tranche id -> name
mapping. It still sucks for tools, since they have to scrape up the
static tranche registrations from somewhere else, but ... it'd be
tolerable.
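
For instance (a sketch only, with a hypothetical probe name):

    /* fired when a dynamic tranche id -> name mapping is registered, so the
     * per-lock probes need only carry the numeric tranche id */
    probe lwlock__register__tranche(int tranche_id, const char *tranche_name);

with a matching TRACE_POSTGRESQL_LWLOCK_REGISTER_TRANCHE(tranche_id,
tranche_name) call at the end of LWLockRegisterTranche().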

The kernel is packed with extremely useful trace events, and for very
good reasons. Some on very hot paths.

IIRC those aren't really comparable - the kernel actually does modify
the executable code to replace the tracepoints with nops.

Same with userspace static trace markers (USDTs).

A followup mail will contain a testcase and samples to demonstrate this.

Although I still don't really buy that static tracepoints are the best
way to measure this kind of thing, given the delay introducing them and
the cost of having them around. I think I pointed out
/messages/by-id/20200813004233.hdsdfvufqrbdwzgr@alap3.anarazel.de
before.

Yeah. Semaphores are something hot enough that I'd hesitate to touch them.

LWLock lock-ordering deadlocks

This seems unrelated to tracepoints to me.

If I can observe which locks are acquired in which order by each proc,
I can then detect excessive waits and report the stack of held locks
of both procs and their order of acquisition.

Since LWLocks shmem state doesn't AFAICS track any information on the
lock holder(s) I don't see a way to do this in-process.

It's not vital, it's just one of the use cases I have in mind. I
suspect that any case where such deadlocks are possible represents a
misuse of LWLocks anyway.

and there's no way to know what a given non-built-in tranche ID means
for any given backend without accessing backend-specific in-memory
state. Including for non-user-accessible backends like bgworkers and
auxprocs, where it's not possible to just query the state from a view
directly.

The only per-backend part is that some backends might not know the
tranche name for dynamically registered tranches where the
LWLockRegisterTranche() hasn't been executed in a backend. Which should
pretty much never be an aux process or such. And even for bgworkers it
seems like a pretty rare thing, because those need to be started by
something...

It might be worth proposing a shared hashtable with tranche names and
just reserving enough space for ~hundred entries...

Yeah, that'd probably work and be cheap enough not to really matter.
Might even save us a chunk of memory by not turning CoW pages into
private mappings for each backend during registration.
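
Something like this, perhaps (a minimal sketch assuming the usual
ShmemInitHash() machinery; the function name, sizes and entry layout are
all illustrative):

    typedef struct TrancheNameEntry
    {
        int     tranche_id;                     /* hash key */
        char    tranche_name[NAMEDATALEN];
    } TrancheNameEntry;

    static HTAB *tranche_names;

    static void
    tranche_names_shmem_init(void)
    {
        HASHCTL     info;

        memset(&info, 0, sizeof(info));
        info.keysize = sizeof(int);
        info.entrysize = sizeof(TrancheNameEntry);

        /* ~a hundred entries is plenty for dynamically registered tranches */
        tranche_names = ShmemInitHash("LWLock tranche names",
                                      128, 128,
                                      &info, HASH_ELEM | HASH_BLOBS);
    }

LWLockRegisterTranche() would then copy the name into the table, and any
backend could resolve a tranche id it hasn't registered locally.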

And you can always build without `--enable-dtrace` and ... just not care.

Practically speaking, distributions enable it, which then incurs the
cost for everyone.

Yep. That's part of why I was so surprised to notice the
GetLWTrancheName() function call in LWLock tracepoints. Nearly
anywhere else it wouldn't matter at all, but LWLocks are hot enough
that it just might matter for the no-wait fastpath.

#26Craig Ringer
craig.ringer@enterprisedb.com
In reply to: Craig Ringer (#25)
3 attachment(s)
Re: [PATCH] Identify LWLocks in tracepoints

On Tue, 13 Apr 2021 at 21:05, Craig Ringer
<craig.ringer@enterprisedb.com> wrote:

On Tue, 13 Apr 2021 at 11:06, Andres Freund <andres@anarazel.de> wrote:

IIRC those aren't really comparable - the kernel actually does modify
the executable code to replace the tracepoints with nops.

Same with userspace static trace markers (USDTs).

A followup mail will contain a testcase and samples to demonstrate this.

Demo follows, with source attached too. gcc 10.2 compiling with -O2,
using dtrace and <sys/sdt.h> from systemtap 4.4 .

Trivial empty function definition:

__attribute__((noinline))
void
no_args(void)
{
SDT_NOOP_NO_ARGS();
}

Disassembly when SDT_NOOP_NO_ARGS is defined as

#define SDT_NOOP_NO_ARGS()

is:

<no_args>:
retq

When built with a probes.d definition processed by the dtrace script
instead, the disassembly becomes:

<no_args>:
nop
retq

So ... yup, it's a nop.

Now, if we introduce semaphores that changes.

__attribute__((noinline))
void
no_args(void)
{
if (SDT_NOOP_NO_ARGS_ENABLED())
SDT_NOOP_NO_ARGS();
}

disassembles to:

<no_args>:
cmpw $0x0,0x2ec4(%rip) # <sdt_noop_no_args_semaphore>
jne <no_args+0x10>
retq
nopl 0x0(%rax,%rax,1)
nop
retq

so the semaphore test is actually quite harmful and wasteful in this
case. That's not surprising since this SDT is a simple marker point.
But what if we supply arguments to it? It turns out that the
disassembly is the same if args are passed, whether locals or globals,
including globals assigned based on program input that can't be
determined at compile time. Still just a nop.

If I pass a function call as an argument expression to a probe, e.g.

__attribute__((noinline)) static int
compute_probe_argument(void)
{
return 100;
}

void
with_computed_arg(void)
{
SDT_NOOP_WITH_COMPUTED_ARG(compute_probe_argument());
}

then the disassembly with SDTs is:

<with_computed_arg>:
callq <compute_probe_argument>
nop
retq

so the function call isn't elided even if it's unused. That's somewhat
expected. The same will be true if the arguments to a probe require
pointer chasing or non-trivial marshalling.

If a semaphore guard is added this becomes:

<with_computed_arg>:
cmpw $0x0,0x2e2e(%rip) # <sdt_noop_with_computed_arg_semaphore>
jne <with_computed_arg+0x10>
retq
nopl 0x0(%rax,%rax,1)
callq <compute_probe_argument>
nop
retq

so now the call to compute_probe_argument() is skipped unless the
probe is enabled, but the function is longer and requires a test and
jump.

If I dummy up a function that does some pointer chasing, without
semaphores I get

<with_pointer_chasing>:
mov (%rdi),%rax
mov (%rax),%rax
mov (%rax),%rax
nop
retq

so the arguments are marshalled then ignored.

with semaphores I get:

<with_pointer_chasing>:
cmpw $0x0,0x2d90(%rip) # <sdt_noop_with_pointer_chasing_semaphore>
jne <with_pointer_chasing+0x10>
retq
nopl 0x0(%rax,%rax,1)
mov (%rdi),%rax
mov (%rax),%rax
mov (%rax),%rax
nop
retq

so again the probe's argument marshalling is inline in the function
body, but at the end, and skipped over.

Findings:

* A probe without arguments or with simple arguments is just a 'nop' instruction
* Probes that require function calls, pointer chasing, other
expression evaluation etc may impose a fixed cost to collect up
arguments even if the probe is disabled.
* SDT semaphores can avoid that cost but add a branch, so should
probably be avoided unless preparing probe arguments is likely to be
expensive.

Hideous but effective demo code attached.

Attachments:

sdt_noop_probes.dtext/x-dsrc; charset=US-ASCII; name=sdt_noop_probes.dDownload
Makefileapplication/octet-stream; name=MakefileDownload
sdt_noop.ctext/x-csrc; charset=US-ASCII; name=sdt_noop.cDownload
#27Craig Ringer
craig.ringer@enterprisedb.com
In reply to: Craig Ringer (#26)
Re: [PATCH] Identify LWLocks in tracepoints

On Tue, 13 Apr 2021 at 21:40, Craig Ringer
<craig.ringer@enterprisedb.com> wrote:

Findings:

* A probe without arguments or with simple arguments is just a 'nop' instruction
* Probes that require function calls, pointer chasing, other
expression evaluation etc may impose a fixed cost to collect up
arguments even if the probe is disabled.
* SDT semaphores can avoid that cost but add a branch, so should
probably be avoided unless preparing probe arguments is likely to be
expensive.

Back to the topic directly at hand.

Attached a disassembly of what LWLockAcquire looks like now on my
current build of git master @ 5fe83adad9efd5e3929f0465b44e786dc23c7b55
. This is an --enable-debug --enable-cassert --enable-dtrace build
with -Og -ggdb3.

The three tracepoints in it are all of the form:

movzwl 0x0(%r13),%edi
call 0x801c49 <GetLWTrancheName>
nop

so it's clear we're doing redundant calls to GetLWTrancheName(), as
expected. Not ideal.

Now if I patch it to add the _ENABLED() guards on all the tracepoints,
the probes look like this:

0x0000000000803176 <+200>: cmpw $0x0,0x462da8(%rip) # 0xc65f26 <postgresql_lwlock__acquire_semaphore>
0x000000000080317e <+208>: jne 0x80328b <LWLockAcquire+477>
.... other interleaved code ...
0x000000000080328b <+477>: movzwl 0x0(%r13),%edi
0x0000000000803290 <+482>: call 0x801c49 <GetLWTrancheName>
0x0000000000803295 <+487>: nop
0x0000000000803296 <+488>: jmp 0x803184 <LWLockAcquire+214>

so we avoid the GetLWTrancheName() call at the cost of a test and
possible branch, and a small expansion in total function size. Without
the semaphores, LWLockAcquire is 463 bytes. With them, it's 524 bytes,
which is nothing to sneeze at for code that doesn't do anything
99.999% of the time, but we avoid a bunch of GetLWTrancheName() calls.

If I instead replace T_NAME() with NULL for all tracepoints in
LWLockAcquire, the disassembly shows that the tracepoints now become a
simple

0x0000000000803176 <+200>: nop

which is pretty hard to be concerned about.

So at the very least we should be calling GetLWTrancheName() once at
the start of the function if built with dtrace support and remembering
the value, instead of calling it for each tracepoint.

And omitting the tranche name looks like it might be sensible for the
LWLock code. In most places it won't matter, but LWLocks are hot
enough that it possibly might. A simple pg_regress run hits
LWLockAcquire 25 million times, so that's 75 million calls to
GetLWTrancheName().
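
For the record, the "compute it once" variant might look something like this
(sketch only, using the probe names from the 0002 patch):

    bool
    LWLockAcquire(LWLock *lock, LWLockMode mode)
    {
        /* evaluated once per call instead of once per tracepoint */
        const char *tranche_name = GetLWTrancheName(lock->tranche);
        ...
        TRACE_POSTGRESQL_LWLOCK_ACQUIRE_START(tranche_name, mode, lock,
                lock->tranche);
        ...
        TRACE_POSTGRESQL_LWLOCK_ACQUIRE(tranche_name, mode, lock,
                lock->tranche);
        ...
    }

though dropping the name argument from the probes entirely avoids even that
one call.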

#28Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#24)
Re: [PATCH] Identify LWLocks in tracepoints

On Mon, Apr 12, 2021 at 11:06 PM Andres Freund <andres@anarazel.de> wrote:

No, they have to be the same in each. Note how the tranche ID is part of
struct LWLock. Which is why LWLockNewTrancheId() has to acquire a lock
etc.

More precisely, if a tranche ID is defined in multiple backends, it
needs to be defined the same way in all of them. But it is possible to
have an extension loaded into some backends and not others and have it
define a tranche ID that other backends know nothing about.

Another point to note is that, originally, I had an idea that each
tranche of lwlocks was situated in a single array somewhere in
memory. Perhaps that was an array of something else, like buffer
descriptors, and the lwlocks were just one element of the struct, or
maybe it was an array specifically of LWLocks, but one way or the
other, there was definitely one array that had all the LWLocks from
that tranche in it. So before the commit in question --
3761fe3c20bb040b15f0e8da58d824631da00caa -- T_ID() used to compute an
offset for a lock within the tranche that was supposed to uniquely
identify the lock. However, the whole idea of an array per tranche
turns out to be broken by design.

Consider parallel query. You could, perhaps, arrange for all the
LWLocks that a particular query needs to be in one tranche. And that's
all fine. But what if there are multiple parallel contexts in
existence at the same time? I think right now that may be impossible
as a practical matter, since for example an SQL function that is
called by a parallel query is supposed to run any SQL statements
inside of it without parallelism. But, that's basically a policy
decision. There's nothing in the parallel context machinery itself
which prevents multiple parallel contexts from being active at the
same time. And if that happens, then you'd have multiple arrays with
the same tranche ID, so how do you identify the locks then? The
pre-3761fe3c20bb040b15f0e8da58d824631da00caa data structure doesn't
work because it has only one place to store an array base, but having
multiple places to store an array base doesn't fix it either because
now you've just given the same identifier to multiple locks.

You could maybe fix it by putting a limit on how many parallel
contexts can be open at the same time, and then having N copies of
each parallelism-related tranche. But that seems ugly and messy and a
burden on extension authors and not really what anybody wants.

You could try to identify locks by pointer addresses, but that's got
security hazards and the addresses aren't portable across all the
backends involved in the parallel query because of how DSM works, so
it's not really that helpful in terms of matching stuff up.

You could identify every lock by a tranche ID + an array offset + a
"tranche instance ID". But where would you store the tranche instance
ID to make it readily accessible, other than in the lock itself?
Andres wasn't thrilled about using even 2 bytes to identify the
LWLock, so he'll probably like having more bytes in there for that
kind of thing even less. And to be honest I wouldn't blame him. We
only need 12 bytes to implement the lock itself -- we can't justify
having more than a couple of additional bytes for debugging purposes.

On a broader level, I agree that being able to find out what the
system is doing is really important. But I'm also not entirely
convinced that having really fine-grained information here to
distinguish between one lock and another is the way to get there.
Personally, I've never run into a problem where I really needed to
know anything more than the tranche name. For example, we can see that
there's a lot of contention on SubtransSLRULock, or that there are
problems with WALInsertLock. But I can't
really see why I'd need to know which WALInsertLock was experiencing
contention. If we were speaking of buffer content locks, I suppose I
can imagine wanting more details, but it's not really the buffer
number I'd want to know. I'd want to know the database OID, the
relfilenode, the fork number, and the block number. You can argue that
we should just expose the buffer number and let the user sort out the
rest with dtrace/systemtap magic, but that makes it useless in
practice to an awful lot of people, including me.
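
(If someone did want to surface that, core already has BufferGetTag() to
resolve a buffer into exactly that identity; a hypothetical sketch, not a
proposal - the reporting site itself is made up:)

#include "postgres.h"
#include "storage/bufmgr.h"
#include "storage/relfilenode.h"

static void
report_buffer_content_lock_wait(Buffer buf)
{
	RelFileNode rnode;
	ForkNumber	forknum;
	BlockNumber blknum;

	BufferGetTag(buf, &rnode, &forknum, &blknum);

	/* a real version would fire a (hypothetical) tracepoint here instead */
	elog(DEBUG5, "buffer content lock wait: db=%u rel=%u fork=%d blk=%u",
		 rnode.dbNode, rnode.relNode, (int) forknum, blknum);
}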

--
Robert Haas
EDB: http://www.enterprisedb.com

#29Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#28)
Re: [PATCH] Identify LWLocks in tracepoints

Hi,

On 2021-04-13 14:25:23 -0400, Robert Haas wrote:

On Mon, Apr 12, 2021 at 11:06 PM Andres Freund <andres@anarazel.de> wrote:
You could identify every lock by a tranche ID + an array offset + a
"tranche instance ID". But where would you store the tranche instance
ID to make it readily accessible, other than in the lock itself?
Andres wasn't thrilled about using even 2 bytes to identify the
LWLock, so he'll probably like having more bytes in there for that
kind of thing even less.

I still don't like the two bytes, fwiw ;). Especially because it's 4
bytes due to padding right now.

I'd like to move the LWLock->waiters list to outside of the lwlock
itself - at most TotalProcs LWLocks can be waited for, so we don't need
millions of empty proclist_heads. That way we could also remove the
proclist indirection - which shows up a fair bit in contended workloads.

And if we had a separate "lwlocks being waited for" structure, we could
also add more information to it if we wanted to...

The difficulty of course is having space to indicate which of these
"waiting for" lists are being used - there's not enough space in ->state
right now to represent that. Two possible approaches:

- We could make it work if we restricted MAX_BACKENDS to be 2**14 - but
while I personally think that's a sane upper limit, I already had a
hard time getting consensus to lower the limit to 2^18-1.

- Use a 64bit integer. Then we can easily fit MAX_BACKENDS lockers, as
well as an offset to one of MAX_BACKENDS "wait lists" into LWLock.
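
(Purely to show the arithmetic works out - an illustrative bit budget,
nothing more; none of these names exist anywhere:)

/* illustration only: one possible carving-up of a 64-bit state word.
 * MAX_BACKENDS is 2^18-1, so the shared-holder count and a wait-list index
 * each fit in 18 bits, leaving room for flags and even a 16-bit tranche ID. */
#define LW_ST_EXCLUSIVE_BIT		0	/* 1 bit: held exclusively */
#define LW_ST_HAS_WAITERS_BIT	1	/* 1 bit: someone is queued */
#define LW_ST_RELEASE_OK_BIT	2	/* 1 bit: wakeup permitted */
#define LW_ST_SHARED_SHIFT		3	/* 18 bits: count of shared holders */
#define LW_ST_WAITLIST_SHIFT	21	/* 18 bits: index of the wait-list slot */
#define LW_ST_TRANCHE_SHIFT		39	/* 16 bits: tranche ID */
/* bits 55..63 spare */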

It's not so much that I want to lower the overall memory usage (although
it doesn't hurt). It's more about being able to fit more data into one
cacheline together with the lwlock. E.g. being able to fit more into
BufferDesc would be very useful.

A secondary benefit of such an approach would be that it makes it a
lot easier to add efficient adaptive spinning on contended locks. I did
experiment with that, and there's some considerable potential for
performance benefits there. But for it to scale well we need something
similar to "mcs locks", to avoid causing too much contention. And that
pretty much requires some separate space to store wait information
anyway.

With an 8-byte state we probably could also stash the tranche inside
that...

On a broader level, I agree that being able to find out what the
system is doing is really important. But I'm also not entirely
convinced that having really fine-grained information here to
distinguish between one lock and another is the way to get there.
Personally, I've never run into a problem where I really needed to
know anything more than the tranche name.

I think it's quite useful for relatively simple things like analyzing
the total amount of time spent in individual locks, without incurring
much overhead when not doing so (for which you need to identify
individual locks, otherwise your end - start time is going to be
meaningless). And, slightly more advanced, for analyzing what the stack
was when the lock was released - which then allows you to see what work
you're blocked on, something I found hard to figure out otherwise.

I found that that's mostly quite doable with dynamic probes though.

For example, we can see that there's a lot of contention on
SubtransSLRULock, or that there are problems with
WALInsertLock. But I can't really see why I'd need to know which
WALInsertLock was experiencing contention.

Well, but you might want to know what the task blocking you was
doing. What to optimize might differ if the other task is e.g. a log
switch (which acquires all insert locks), than if it's WAL writes by
VACUUM.

If we were speaking of buffer content locks, I suppose I can imagine
wanting more details, but it's not really the buffer number I'd want
to know. I'd want to know the database OID, the relfilenode, the fork
number, and the block number. You can argue that we should just expose
the buffer number and let the user sort out the rest with
dtrace/systemtap magic, but that makes it useless in practice to an
awful lot of people, including me.

I have wondered if we ought to put some utilities for that in contrib or
such. It's a lot easier to address something new with a decent starting
point...

Greetings,

Andres Freund

#30Craig Ringer
craig.ringer@enterprisedb.com
In reply to: Andres Freund (#29)
Re: [PATCH] Identify LWLocks in tracepoints

On Wed, 14 Apr 2021 at 04:46, Andres Freund <andres@anarazel.de> wrote:

On 2021-04-13 14:25:23 -0400, Robert Haas wrote:

On Mon, Apr 12, 2021 at 11:06 PM Andres Freund <andres@anarazel.de> wrote:
You could identify every lock by a tranche ID + an array offset + a
"tranche instance ID". But where would you store the tranche instance
ID to make it readily accessible, other than in the lock itself?
Andres wasn't thrilled about using even 2 bytes to identify the
LWLock, so he'll probably like having more bytes in there for that
kind of thing even less.

I still don't like the two bytes, fwiw ;). Especially because it's 4
bytes due to padding right now.

Aha, did I hear you say "there are two free bytes for me to shove
something marginally useful and irrelevant into"?

(*grin*)

I'd like to move the LWLock->waiters list to outside of the lwlock
itself - at most TotalProcs LWLocks can be waited for, so we don't need
millions of empty proclist_heads. That way we could also remove the
proclist indirection - which shows up a fair bit in contended workloads.

And if we had a separate "lwlocks being waited for" structure, we could
also add more information to it if we wanted to...

Having the ability to observe LWLock waiters would be nice. But you're
right to constantly emphasise that LWLocks need to be very slim. We
don't want to turn them into near-heavyweight locks by saddling them
with overhead that's not strictly necessary. A simple pg_regress run
(with cassert, admittedly) takes 25,000,000 LWLocks and does 24,000
LWLock waits and 130,000 condvar waits. All in less than a minute.
OTOH, once someone's waiting we don't care nearly as much about
bookkeeping cost, it's the un-contested fast paths that're most
critical.

- We could make it work if we restricted MAX_BACKENDS to be 2**14 - but
while I personally think that's a sane upper limit, I already had a
hard time getting consensus to lower the limit to 2^18-1.

16384 backends is totally fine in sane real world deployments. But
it'll probably upset marketing people when OtherDatabaseVendor starts
shouting that they support 14 million connections, and postgres only
has 16k. Sigh.

The real answer here in the long term probably needs to be decoupling
of executors from connection state inside postgres. But while we're on
that topic, how about we convert the entire codebase to Rust while
riding on a flying rainbow unicorn? We're stuck with the 1:1
connection to executor mapping for the foreseeable future.

- Use a 64bit integer. Then we can easily fit MAX_BACKENDS lockers, as
well as an offset to one of MAX_BACKENDS "wait lists" into LWLock.

You know much more than me about the possible impacts of that on
layout and caching, but I gather that it's probably undesirable to
make LWLocks any bigger.

I think it's quite useful for relatively simple things like analyzing
the total amount of time spent in individual locks, without incurring
much overhead when not doing so (for which you need to identify
individual locks, otherwise your end - start time is going to be
meaningless).

Yep.

That's why the removal of the lock offset is a bit frustrating,
because you can't identify an LWLock instance-wide by LWLock* due to
the possibility of different per-backend DSM base-address mappings.
Well, and ASLR on EXEC_BACKEND systems, but who cares about those?

The removal was for good reasons though. And it only affects LWLocks
in DSM, for everything else the LWLock* is good enough. If identifying
LWLocks in DSM ever matters enough to bother to solve that problem, we
can emit trace events on DSM mapping attach in each backend, and trace
tools can do the work to track which LWLocks are in DSM and convert
their addresses to a reference base address. Pg shouldn't have to pay
the price for that unless it's something a lot of people need.

And, slightly more advanced, for analyzing what the stack
was when the lock was released - which then allows you to see what work
you're blocked on, something I found hard to figure out otherwise.

I found that that's mostly quite doable with dynamic probes though.

Yeah, it is.

That's part of why my patchset here doesn't try to do a lot to LWLock
tracepoints - I didn't think it was necessary to add a lot.
The LWLock code is fairly stable, not usually something you have to
worry about in production unless you're debugging badly behaved
extensions, and usually somewhat probe-able with DWARF based dynamic
probes. However, the way the wait-loop and fast-path are in the same
function is a serious pain for dynamic probing; you can't tell the
difference between a fast-path acquire and an acquire after a wait
without using probes on function+offset or probing by source line.
Both those are fine for dev work but useless in tool/library scripts.

I almost wonder if we should test out moving the LWLock wait-loops out
of LWLockAcquire(), LWLockAcquireOrWait() and LWLockWaitForVar()
anyway, so the hot parts of the function are smaller. That'd make
dynamic probing more convenient as a pleasant side effect. I imagine
you must've tried this, benchmarked and profiled it, though, and found
it to be a net loss, otherwise you surely would've done it as part of
your various (awesome) performance work.

Anyway, there are some other areas of postgres that are ridiculously
painful to instrument with dynamic probes, especially in a somewhat
version- and build-independent way. Tracking txn commit and abort
(including 2PC and normal xacts, with capture of commit LSNs) is just
painful with dynamic probing for example, and is one of my top
priority areas to get some decent tracepoints for - the current txn
management tracepoints are utterly worthless. But LWLocks are mostly
fine, the only really big piece missing is a tracepoint fired exactly
once when a lock is released by any release path.

For example, we can see that there's a lot of contention on
SubtransSLRULock, or that there are problems with
WALInsertLock. But I can't really see why I'd need to know which
WALInsertLock was experiencing contention.

Well, but you might want to know what the task blocking you was
doing. What to optimize might differ if the other task is e.g. a log
switch (which acquires all insert locks), than if it's WAL writes by
VACUUM.

That sort of thing is why I've been interested in IDing the LWLock.
That, and I work with extension code that probably abuses LWLocks a
bit, but that's not a problem core postgres should have to care about.

If we were speaking of buffer content locks, I suppose I can imagine
wanting more details, but it's not really the buffer number I'd want
to know. I'd want to know the database OID, the relfilenode, the fork
number, and the block number. You can argue that we should just expose
the buffer number and let the user sort out the rest with
dtrace/systemtap magic, but that makes it useless in practice to an
awful lot of people, including me.

I have wondered if we ought to put some utilities for that in contrib or
such. It's a lot easier to address something new with a decent starting
point...

Long term that's exactly what I want to do.

I wrote some with systemtap, but it's since become clear to me that
systemtap isn't going to get enough people on board. Setup is a
hassle. So I'm trying to pivot over to bpf tools now, with the
intention of getting together a library of canned probes and example
scripts to help people get started.

I've written systemtap scripts that can track a transaction from
localxid allocation through real xid allocation, commit (or 2pc
prepare and commit prepared), logical decoding, reorder buffering,
output plugin processing, receipt by a logical downstream, downstream
xid assignment, downstream commit and replorigin advance,
downstream->upstream feedback, upstream slot advance, and upstream
confirmed flush/catalog_xmin advance. Whole-lifecycle tracking with
timing of each phase, across multiple processes and two postgres
instances. For now the two postgres instances must be on the same
host, but that could be dealt with too. The script reports the
application name and pid of the upstream session, the upstream
localxid, the upstream xid, upstream commit lsn, the downstream xid
and the downstream commit lsn as it goes, and follows associations to
track the transaction through its lifecycle. (The current script is
written for BDR and pglogical, not in-core logical decoding, but the
principles are the same).

The problem is that scripts like this are just too fragile right now.
Changes across Pg versions break the dynamic function probes they use,
though that can be adapted to somewhat. The bigger problem is the
number of places I have to insert statement (function+offset) probes,
which are just too fragile to make these sorts of scripts generally
useful. I have to fix them whenever I want to use them, so there's not
much point trying to get people to use them.

But it's hard to convince people of the value of static tracepoints
that would make this sort of thing so much easier to do in a more
stable manner when they can't easily see hands-on examples of what's
possible. There's no "wow" factor. So I need to address the worst of
the difficult-to-probe sites and start sharing some examples that use
them.

I thought this would be low-hanging fruit to start with. Whoops!

#31Craig Ringer
craig.ringer@enterprisedb.com
In reply to: Robert Haas (#28)
Re: [PATCH] Identify LWLocks in tracepoints

On Wed, 14 Apr 2021 at 02:25, Robert Haas <robertmhaas@gmail.com> wrote:

So before the commit in question --
3761fe3c20bb040b15f0e8da58d824631da00caa -- T_ID() used to compute an
offset for a lock within the tranche that was supposed to uniquely
identify the lock. However, the whole idea of an array per tranche
turns out to be broken by design.

Yeah, I understand that.

I'd really love it if a committer could add an explanatory comment or
two in the area though. I'm happy to draft a comment patch if anyone
agrees my suggestion is sensible. The key things I needed to know when
studying the code were:

* A LWLock* is always part of a tranche, but locks within a given
tranche are not necessarily co-located in memory, cross referenced or
associated in any way.
* A LWLock tranche does not track how many LWLocks are in the tranche
or where they are in memory. It only groups up LWLocks into categories
and maps the tranche ID to a name.
* Not all LWLocks are part of the main LWLock array; others can be
embedded in shmem structs elsewhere, including in DSM segments.
* LWLocks in DSM segments may not have the same address between
different backends, because the DSM base address can vary, so a
LWLock* cannot be reliably compared between backends unless you know
it's in the main LWLock array or in static shmem.

Having that info in lwlock.c near the tranche management code or the
tranche and main lwlock arrays would've been very handy.

You could try to identify locks by pointer addresses, but that's got
security hazards and the addresses aren't portable across all the
backends involved in the parallel query because of how DSM works, so
it's not really that helpful in terms of matching stuff up.

What I'm doing now is identifying them by LWLock* across backends. I
keep track of DSM segment mappings in each backend inside the trace
script and I relocate LWLock* pointers known to be inside DSM segments
relative to a dummy base address so they're equal across backends.

It's a real pain, but it works. The main downside is that the trace
script has to observe the DSM attach; if it's started once a backend
already has the DSM segment attached, it has no idea the LWLock is in
a DSM segment or how to remap it. But that's not a serious issue.
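
Roughly, the tool-side bookkeeping looks like the sketch below (C-ish
pseudocode of what the trace script does; every name here is invented and
none of this is postgres code):

#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct DsmMapping
{
	int			pid;		/* backend that attached the segment */
	uint32_t	handle;		/* dsm_handle of the segment */
	uintptr_t	mapped_at;	/* per-backend base address */
	size_t		size;
} DsmMapping;

typedef struct NormalizedLock
{
	bool		in_dsm;
	uint32_t	handle;		/* 0 when not in a DSM segment */
	uintptr_t	value;		/* offset within segment, or raw pointer */
} NormalizedLock;

/* Rewrite an LWLock* seen in backend "pid" so that locks in DSM compare
 * equal across backends regardless of where the segment was mapped. */
static NormalizedLock
normalize_lwlock(const DsmMapping *maps, int nmaps, int pid, uintptr_t lock)
{
	NormalizedLock n = { false, 0, lock };

	for (int i = 0; i < nmaps; i++)
	{
		if (maps[i].pid == pid &&
			lock >= maps[i].mapped_at &&
			lock < maps[i].mapped_at + maps[i].size)
		{
			n.in_dsm = true;
			n.handle = maps[i].handle;
			n.value = lock - maps[i].mapped_at;		/* backend-independent */
			break;
		}
	}
	return n;
}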

On a broader level, I agree that being able to find out what the
system is doing is really important. But I'm also not entirely
convinced that having really fine-grained information here to
distinguish between one lock and another is the way to get there.

At the start of this thread I would've disagreed with you.

But yeah, you and Andres are right, because the costs outweigh the
benefits here.

I'm actually inclined to revise the patch I sent in order to *remove*
the LWLock name from the tracepoint argument. At least for the
fast-path tracepoints on start-of-acquire and end-of-acquire. I think
it's probably OK to report it in the lock wait tracepoints, which is
where it's most important to have anyway. So the tracepoint will
always report the LWLock* and tranche ID, but it won't report the
tranche name for the fast-path. I'll add trace events for tranche ID
registration, so trace tools can either remember the tranche ID->name
mappings from there, or capture them from lock wait events and
remember them.

Reasonable? That way we retain the most important trace functionality,
but we reduce the overheads.

#32Craig Ringer
craig.ringer@enterprisedb.com
In reply to: Craig Ringer (#31)
Re: [PATCH] Identify LWLocks in tracepoints

On Wed, 14 Apr 2021 at 10:41, Craig Ringer
<craig.ringer@enterprisedb.com> wrote:

On Wed, 14 Apr 2021 at 02:25, Robert Haas <robertmhaas@gmail.com> wrote:

You could try to identify locks by pointer addresses, but that's got
security hazards and the addreses aren't portable across all the
backends involved in the parallel query because of how DSM works, so
it's not really that helpful in terms of matching stuff up.

What I'm doing now is identifying them by LWLock* across backends. I
keep track of DSM segment mappings in each backend inside the trace
script and I relocate LWLock* pointers known to be inside DSM segments
relative to a dummy base address so they're equal across backends.

BTW, one of the reasons I did this was to try to identify BDR and
pglogical code that blocks or sleeps while holding a LWLock. I got
stuck on that for other reasons, so it didn't go anywhere, but those
issues are now resolved so I should probably return to it at some
point.

It'd be a nice thing to be able to run on postgres itself too.

#33Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Craig Ringer (#20)
Re: [PATCH] Identify LWLocks in tracepoints

On 12.04.21 07:46, Craig Ringer wrote:

To use systemtap semaphores (the _ENABLED macros) you need to run dtrace
-g to generate a probes.o then link that into postgres.

I don't think we do that. I'll double check soon.

We do that.  (It's -G.)

Huh. I could've sworn we didn't. My mistake, it's there in
src/backend/Makefile .

In that case I'll amend the patch to use semaphore guards.

This whole thread is now obviously moved to consideration for PG15, but
I did add an open item about this particular issue
(https://wiki.postgresql.org/wiki/PostgreSQL_14_Open_Items, search for
"dtrace"). So if you could produce a separate patch that adds the
_ENABLED guards targeting PG14 (and PG13), that would be helpful.

#34Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#29)
Re: [PATCH] Identify LWLocks in tracepoints

On Tue, Apr 13, 2021 at 4:46 PM Andres Freund <andres@anarazel.de> wrote:

I still don't like the two bytes, fwiw ;). Especially because it's 4
bytes due to padding right now.

I'm not surprised by that disclosure. But I think it's entirely worth
it. Making wait states visible in pg_stat_activity isn't the most
useful thing I've ever done to PostgreSQL, but it's far from the least
useful. If we can get the same benefit with less overhead, that's
great.

I'd like to move the LWLock->waiters list to outside of the lwlock
itself - at most TotalProcs LWLocks can be waited for, so we don't need
millions of empty proclist_heads. That way we could also remove the
proclist indirection - which shows up a fair bit in contended workloads.

And if we had a separate "lwlocks being waited for" structure, we could
also add more information to it if we wanted to...

The difficulty of course is having space to indicate which of these
"waiting for" lists are being used - there's not enough space in ->state
right now to represent that. Two possible approaches:

- We could make it work if we restricted MAX_BACKENDS to be 2**14 - but
while I personally think that's a sane upper limit, I already had a
hard time getting consensus to lower the limit to 2^18-1.

- Use a 64bit integer. Then we can easily fit MAX_BACKENDS lockers, as
well as an offset to one of MAX_BACKENDS "wait lists" into LWLock.

I'd rather not further reduce MAX_BACKENDS. I still think some day
we're going to want to make that bigger again. Maybe not for a while,
admittedly. But, do you need to fit this into "state"? If you just
replaced "waiters" with a 32-bit integer, you'd save 4 bytes and have
bits left over (and maybe restrict the tranche ID to 2^14 and squeeze
that in too, as you mention).

--
Robert Haas
EDB: http://www.enterprisedb.com

#35Robert Haas
robertmhaas@gmail.com
In reply to: Craig Ringer (#31)
Re: [PATCH] Identify LWLocks in tracepoints

On Tue, Apr 13, 2021 at 10:42 PM Craig Ringer
<craig.ringer@enterprisedb.com> wrote:

I'd really love it if a committer could add an explanatory comment or
two in the area though. I'm happy to draft a comment patch if anyone
agrees my suggestion is sensible. The key things I needed to know when
studying the code were:

* A LWLock* is always part of a tranche, but locks within a given
tranche are not necessarily co-located in memory, cross referenced or
associated in any way.
* A LWLock tranche does not track how many LWLocks are in the tranche
or where they are in memory. It only groups up LWLocks into categories
and maps the tranche ID to a name.
* Not all LWLocks are part of the main LWLock array; others can be
embedded in shmem structs elsewhere, including in DSM segments.
* LWLocks in DSM segments may not have the same address between
different backends, because the DSM base address can vary, so a
LWLock* cannot be reliably compared between backends unless you know
it's in the main LWLock array or in static shmem.

Having that info in lwlock.c near the tranche management code or the
tranche and main lwlock arrays would've been very handy.

I'm willing to review a comment patch along those lines.

I'm actually inclined to revise the patch I sent in order to *remove*
the LWLock name from the tracepoint argument. At least for the
fast-path tracepoints on start-of-acquire and end-of-acquire. I think
it's probably OK to report it in the lock wait tracepoints, which is
where it's most important to have anyway. So the tracepoint will
always report the LWLock* and tranche ID, but it won't report the
tranche name for the fast-path. I'll add trace events for tranche ID
registration, so trace tools can either remember the tranche ID->name
mappings from there, or capture them from lock wait events and
remember them.

Reasonable? That way we retain the most important trace functionality,
but we reduce the overheads.

Reducing the overheads is good, but I have no opinion on what's
important for people doing tracing, because I am not one of those
people.

--
Robert Haas
EDB: http://www.enterprisedb.com

#36Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Peter Eisentraut (#33)
1 attachment(s)
Re: [PATCH] Identify LWLocks in tracepoints

On 14.04.21 15:20, Peter Eisentraut wrote:

On 12.04.21 07:46, Craig Ringer wrote:

     > To use systemtap semaphores (the _ENABLED macros) you need to run dtrace
     > -g to generate a probes.o then link that into postgres.
     >
     > I don't think we do that. I'll double check soon.

    We do that.  (It's -G.)

Huh. I could've sworn we didn't. My mistake, it's there in
src/backend/Makefile .

In that case I'll amend the patch to use semaphore guards.

This whole thread is now obviously moved to consideration for PG15, but
I did add an open item about this particular issue
(https://wiki.postgresql.org/wiki/PostgreSQL_14_Open_Items, search for
"dtrace").  So if you could produce a separate patch that adds the
_ENABLED guards targeting PG14 (and PG13), that would be helpful.

Here is a proposed patch for this.

Attachments:

0001-Prevent-lwlock-dtrace-probes-from-unnecessary-work.patchtext/plain; charset=UTF-8; name=0001-Prevent-lwlock-dtrace-probes-from-unnecessary-work.patch; x-mac-creator=0; x-mac-type=0Download
From cae610f0203ba485770e4d274378f3ab198dcc27 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Thu, 29 Apr 2021 09:26:40 +0200
Subject: [PATCH] Prevent lwlock dtrace probes from unnecessary work

If dtrace is compiled in but disabled, the lwlock dtrace probes still
evaluate their arguments.  Since PostgreSQL 13, T_NAME(lock) does
nontrivial work, so it should be avoided if not needed.  To fix, make
these calls conditional on the *_ENABLED() macro corresponding to each
probe.

Reported-by: Craig Ringer <craig.ringer@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/CAGRY4nwxKUS_RvXFW-ugrZBYxPFFM5kjwKT5O+0+Stuga5b4+Q@mail.gmail.com
---
 src/backend/storage/lmgr/lwlock.c | 36 ++++++++++++++++++++-----------
 1 file changed, 24 insertions(+), 12 deletions(-)

diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 975d547f34..55b9d7970e 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -1318,7 +1318,8 @@ LWLockAcquire(LWLock *lock, LWLockMode mode)
 #endif
 
 		LWLockReportWaitStart(lock);
-		TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode);
+		if (TRACE_POSTGRESQL_LWLOCK_WAIT_START_ENABLED())
+			TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode);
 
 		for (;;)
 		{
@@ -1340,7 +1341,8 @@ LWLockAcquire(LWLock *lock, LWLockMode mode)
 		}
 #endif
 
-		TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode);
+		if (TRACE_POSTGRESQL_LWLOCK_WAIT_DONE_ENABLED())
+			TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode);
 		LWLockReportWaitEnd();
 
 		LOG_LWDEBUG("LWLockAcquire", lock, "awakened");
@@ -1349,7 +1351,8 @@ LWLockAcquire(LWLock *lock, LWLockMode mode)
 		result = false;
 	}
 
-	TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), mode);
+	if (TRACE_POSTGRESQL_LWLOCK_ACQUIRE_ENABLED())
+		TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), mode);
 
 	/* Add lock to list of locks held by this backend */
 	held_lwlocks[num_held_lwlocks].lock = lock;
@@ -1400,14 +1403,16 @@ LWLockConditionalAcquire(LWLock *lock, LWLockMode mode)
 		RESUME_INTERRUPTS();
 
 		LOG_LWDEBUG("LWLockConditionalAcquire", lock, "failed");
-		TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_FAIL(T_NAME(lock), mode);
+		if (TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_FAIL_ENABLED())
+			TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_FAIL(T_NAME(lock), mode);
 	}
 	else
 	{
 		/* Add lock to list of locks held by this backend */
 		held_lwlocks[num_held_lwlocks].lock = lock;
 		held_lwlocks[num_held_lwlocks++].mode = mode;
-		TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE(T_NAME(lock), mode);
+		if (TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_ENABLED())
+			TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE(T_NAME(lock), mode);
 	}
 	return !mustwait;
 }
@@ -1479,7 +1484,8 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
 #endif
 
 			LWLockReportWaitStart(lock);
-			TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode);
+			if (TRACE_POSTGRESQL_LWLOCK_WAIT_START_ENABLED())
+				TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode);
 
 			for (;;)
 			{
@@ -1497,7 +1503,8 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
 				Assert(nwaiters < MAX_BACKENDS);
 			}
 #endif
-			TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode);
+			if (TRACE_POSTGRESQL_LWLOCK_WAIT_DONE_ENABLED())
+				TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode);
 			LWLockReportWaitEnd();
 
 			LOG_LWDEBUG("LWLockAcquireOrWait", lock, "awakened");
@@ -1527,7 +1534,8 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
 		/* Failed to get lock, so release interrupt holdoff */
 		RESUME_INTERRUPTS();
 		LOG_LWDEBUG("LWLockAcquireOrWait", lock, "failed");
-		TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT_FAIL(T_NAME(lock), mode);
+		if (TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT_FAIL_ENABLED())
+			TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT_FAIL(T_NAME(lock), mode);
 	}
 	else
 	{
@@ -1535,7 +1543,8 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
 		/* Add lock to list of locks held by this backend */
 		held_lwlocks[num_held_lwlocks].lock = lock;
 		held_lwlocks[num_held_lwlocks++].mode = mode;
-		TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT(T_NAME(lock), mode);
+		if (TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT_ENABLED())
+			TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT(T_NAME(lock), mode);
 	}
 
 	return !mustwait;
@@ -1695,7 +1704,8 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
 #endif
 
 		LWLockReportWaitStart(lock);
-		TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), LW_EXCLUSIVE);
+		if (TRACE_POSTGRESQL_LWLOCK_WAIT_START_ENABLED())
+			TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), LW_EXCLUSIVE);
 
 		for (;;)
 		{
@@ -1714,7 +1724,8 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
 		}
 #endif
 
-		TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), LW_EXCLUSIVE);
+		if (TRACE_POSTGRESQL_LWLOCK_WAIT_DONE_ENABLED())
+			TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), LW_EXCLUSIVE);
 		LWLockReportWaitEnd();
 
 		LOG_LWDEBUG("LWLockWaitForVar", lock, "awakened");
@@ -1840,7 +1851,8 @@ LWLockRelease(LWLock *lock)
 	/* nobody else can have that kind of lock */
 	Assert(!(oldstate & LW_VAL_EXCLUSIVE));
 
-	TRACE_POSTGRESQL_LWLOCK_RELEASE(T_NAME(lock));
+	if (TRACE_POSTGRESQL_LWLOCK_RELEASE_ENABLED())
+		TRACE_POSTGRESQL_LWLOCK_RELEASE(T_NAME(lock));
 
 	/*
 	 * We're still waiting for backends to get scheduled, don't wake them up

base-commit: 3a948ea0a2ced719f26e725b030558f2e4ab1d8e
-- 
2.31.1

#37Craig Ringer
craig.ringer@enterprisedb.com
In reply to: Peter Eisentraut (#36)
Re: [PATCH] Identify LWLocks in tracepoints

On Thu, 29 Apr 2021 at 15:31, Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

So if you could produce a separate patch that adds the
_ENABLED guards targeting PG14 (and PG13), that would be helpful.

Here is a proposed patch for this.

LGTM.

Applies and builds fine on master and (with default fuzz) on
REL_13_STABLE. Works as expected.

This does increase the size of LWLockAcquire() etc. slightly, but since
it lets us skip the T_NAME() calls entirely and the semaphores are easily
predicted, I don't have any doubt it's a net win. +1 for merge.
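
(For reference, with systemtap's sdt.h the _ENABLED() guard boils down to
roughly the following - from memory, the generated names may differ - so the
fast-path cost is one load of a process-global counter plus a well-predicted
branch:)

/* rough shape of what the generated probes.h provides per probe */
extern unsigned short postgresql_lwlock__acquire_semaphore;

#define TRACE_POSTGRESQL_LWLOCK_ACQUIRE_ENABLED() \
	__builtin_expect(postgresql_lwlock__acquire_semaphore, 0)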

#38Craig Ringer
craig.ringer@enterprisedb.com
In reply to: Robert Haas (#35)
Re: [PATCH] Identify LWLocks in tracepoints

On Wed, 14 Apr 2021, 22:29 Robert Haas, <robertmhaas@gmail.com> wrote:

On Tue, Apr 13, 2021 at 10:42 PM Craig Ringer
<craig.ringer@enterprisedb.com> wrote:

I'd really love it if a committer could add an explanatory comment or
two in the area though. I'm happy to draft a comment patch if anyone
agrees my suggestion is sensible. The key things I needed to know when
studying the code were:
[...]

I'm willing to review a comment patch along those lines.

Cool. I'll draft soon.

I since noticed that some of the info is present, but it's in lwlock.h
whereas in Pg comment detail is more often than not in the .c file.

I prefer it in headers myself anyway, since it's more available to tools
like doxygen. I'll add a few "see lwlock.h" hints, a short para about
appropriate lwlock use in the .c into comment etc and post on a separate
thread soon.

I'm actually inclined to revise the patch I sent in order to *remove*
the LWLock name from the tracepoint argument.

Reducing the overheads is good, but I have no opinion on what's
important for people doing tracing, because I am not one of those
people.

Truthfully I'm not convinced anyone is "those people" right now. I don't
think anyone is likely to be making serious use of them due to their
limitations.

Certainly that'll be the case for the txn ones, which are almost totally
useless. They only track the localxid lifecycle; they don't track real txid
allocation, WAL writing, commit (WAL or shmem), etc.

#39Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Craig Ringer (#38)
Re: [PATCH] Identify LWLocks in tracepoints

On Fri, Apr 30, 2021 at 11:23:56AM +0800, Craig Ringer wrote:
On Wed, 14 Apr 2021, 22:29 Robert Haas, <robertmhaas@gmail.com> wrote:

I'm actually inclined to revise the patch I sent in order to *remove*
the LWLock name from the tracepoint argument.

Reducing the overheads is good, but I have no opinion on what's
important for people doing tracing, because I am not one of those
people.

Truthfully I'm not convinced anyone is "those people" right now. I don't
think anyone is likely to be making serious use of them due to their
limitations.

I would like to mention that tracepoints can be useful not only directly;
they also:

* convey which points the developers who wrote the code considered important
enough to trace.

* declare more stable tracing points within the code, which are somewhat more
reliable across versions.

E.g. when writing bcc scripts one is also somewhat limited in the use of those
tracepoints by the requirement to provide a specific pid, but one can still get
a better understanding of what to look at (maybe using other methods).

#40Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Craig Ringer (#37)
Re: [PATCH] Identify LWLocks in tracepoints

On 30.04.21 05:22, Craig Ringer wrote:

On Thu, 29 Apr 2021 at 15:31, Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

So if you could produce a separate patch that adds the
_ENABLED guards targeting PG14 (and PG13), that would be helpful.

Here is a proposed patch for this.

LGTM.

Applies and builds fine on master and (with default fuzz) on
REL_13_STABLE. Works as expected.

committed

#41Andres Freund
andres@anarazel.de
In reply to: Peter Eisentraut (#40)
Re: [PATCH] Identify LWLocks in tracepoints

Hi,

On 2021-05-03 21:06:30 +0200, Peter Eisentraut wrote:

On 30.04.21 05:22, Craig Ringer wrote:

On Thu, 29 Apr 2021 at 15:31, Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

So if you could produce a separate patch that adds the
_ENABLED guards targeting PG14 (and PG13), that would be helpful.

Here is a proposed patch for this.

LGTM.

Applies and builds fine on master and (with default fuzz) on
REL_13_STABLE. Works as expected.

committed

I'm now getting

/home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c: In function ‘LWLockAcquire’:
/home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c:1322:58: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
1322 | TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode);
| ^
/home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c:1345:57: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
1345 | TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode);
| ^
/home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c:1355:54: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
1355 | TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), mode);
| ^
/home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c: In function ‘LWLockConditionalAcquire’:
/home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c:1407:64: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
1407 | TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_FAIL(T_NAME(lock), mode);
| ^
/home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c:1415:59: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
1415 | TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE(T_NAME(lock), mode);
| ^
/home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c: In function ‘LWLockAcquireOrWait’:
/home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c:1488:59: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
1488 | TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode);
| ^
/home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c:1507:58: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
1507 | TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode);
| ^
/home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c:1538:68: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
1538 | TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT_FAIL(T_NAME(lock), mode);
| ^
/home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c:1547:63: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
1547 | TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT(T_NAME(lock), mode);
| ^
/home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c: In function ‘LWLockWaitForVar’:
/home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c:1708:66: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
1708 | TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), LW_EXCLUSIVE);
| ^
/home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c:1728:65: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
1728 | TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), LW_EXCLUSIVE);
| ^
/home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c: In function ‘LWLockRelease’:
/home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c:1855:48: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
1855 | TRACE_POSTGRESQL_LWLOCK_RELEASE(T_NAME(lock));

In a build without the trace stuff enabled.

Greetings,

Andres Freund

#42Craig Ringer
craig.ringer@enterprisedb.com
In reply to: Andres Freund (#41)
Re: [PATCH] Identify LWLocks in tracepoints

On Wed, 5 May 2021, 06:15 Andres Freund, <andres@anarazel.de> wrote:

Hi,

warning: suggest braces around empty body in an ‘if’ statement
[-Wempty-body]
1322 | TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode);
| ^

Odd that I didn't get that.

I'll send a patch to revise shortly.

#43Craig Ringer
craig.ringer@enterprisedb.com
In reply to: Craig Ringer (#42)
1 attachment(s)
Re: [PATCH] Identify LWLocks in tracepoints

On Wed, 5 May 2021 at 09:15, Craig Ringer <craig.ringer@enterprisedb.com> wrote:

warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
1322 | TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode);
| ^

Odd that I didn't get that.

This compiler complaint is not due to the _ENABLED() test as such.
It's due to the fact that we completely define out the
TRACE_POSTGRESQL_ macros with src/backend/utils/Gen_dummy_probes.sed .

While explicit braces could be added around each test, I suggest
fixing Gen_dummy_probes.sed to emit the usual dummy statement instead.
Patch attached.
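
To illustrate, the generated dummy stubs then come out roughly like this
(abridged; the real thing is the generated utils/probes.h when dtrace is
disabled):

#define TRACE_POSTGRESQL_LWLOCK_ACQUIRE(INT1, INT2) do {} while (0)
#define TRACE_POSTGRESQL_LWLOCK_ACQUIRE_ENABLED() (0)

/*
 * so a guarded call site like
 *     if (TRACE_POSTGRESQL_LWLOCK_ACQUIRE_ENABLED())
 *         TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), mode);
 * expands to a well-formed "if (0) do {} while (0);" instead of an
 * if with an empty body.
 */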

Attachments:

v1-0001-Emit-dummy-statements-for-probes.d-probes-when-di.patchtext/x-patch; charset=UTF-8; name=v1-0001-Emit-dummy-statements-for-probes.d-probes-when-di.patchDownload
From a1ca7d752b56717854cc47b31cd299e651de0430 Mon Sep 17 00:00:00 2001
From: Craig Ringer <craig.ringer@2ndquadrant.com>
Date: Wed, 5 May 2021 12:06:10 +0800
Subject: [PATCH v1] Emit dummy statements for probes.d probes when disabled
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

When building without --enable-dtrace, emit dummy

    do {} while (0)

statements for the stubbed-out TRACE_POSTGRESQL_foo() macros
instead of empty macros that totally elide the original probe
statement.

This fixes the

    warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]

introduced by b94409a02f.
---
 src/backend/utils/Gen_dummy_probes.sed | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/backend/utils/Gen_dummy_probes.sed b/src/backend/utils/Gen_dummy_probes.sed
index aa3db59cce..6e29d86afa 100644
--- a/src/backend/utils/Gen_dummy_probes.sed
+++ b/src/backend/utils/Gen_dummy_probes.sed
@@ -19,5 +19,6 @@ s/([^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\})/(INT1, INT2,
 s/([^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\})/(INT1, INT2, INT3, INT4, INT5, INT6)/
 s/([^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\})/(INT1, INT2, INT3, INT4, INT5, INT6, INT7)/
 s/([^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\})/(INT1, INT2, INT3, INT4, INT5, INT6, INT7, INT8)/
+s/$/ do {} while (0)/
 P
 s/(.*$/_ENABLED() (0)/
-- 
2.30.2

#44Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Andres Freund (#41)
Re: [PATCH] Identify LWLocks in tracepoints

On 05.05.21 00:15, Andres Freund wrote:

I'm now getting

/home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c: In function ‘LWLockAcquire’:
/home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c:1322:58: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
1322 | TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode);
| ^

For clarification, -Wempty-body is not part of the default warnings, right?

And even if I turn it on explicitly, I don't get this warning. I read
somewhere that newer compilers don't warn in cases of macro expansion.

What compiler are you using in this situation?

#45Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#44)
Re: [PATCH] Identify LWLocks in tracepoints

Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:

On 05.05.21 00:15, Andres Freund wrote:

I'm now getting
/home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c: In function ‘LWLockAcquire’:
/home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c:1322:58: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
1322 | TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode);
| ^

What compiler are you using in this situation?

All of these buildfarm members are now showing this warning:

calliphoridae gcc (Debian 10.1.0-6) 10.1.0
culicidae gcc (Debian 10.1.0-6) 10.1.0
flaviventris gcc (Debian 20200124-1) 10.0.1 20200124 (experimental)
francolin gcc (Debian 10.1.0-6) 10.1.0
piculet gcc (Debian 10.1.0-6) 10.1.0
rorqual gcc (Debian 10.1.0-6) 10.1.0
serinus gcc (Debian 20200124-1) 10.0.1 20200124 (experimental)
skink gcc (Debian 10.1.0-6) 10.1.0

so there's your answer.

(I wonder why flaviventris and serinus are still using an "experimental"
compiler version that is now behind mainstream.)

regards, tom lane

#46Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#45)
Re: [PATCH] Identify LWLocks in tracepoints

Hi,

On 2021-05-08 13:13:47 -0400, Tom Lane wrote:

Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:

On 05.05.21 00:15, Andres Freund wrote:

I'm now getting
/home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c: In function ‘LWLockAcquire’:
/home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c:1322:58: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
1322 | TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode);
| ^

What compiler are you using in this situation?

gcc - I think the warning is pulled in via -Wextra. I think it's
something sensible to warn about; it's too easy to end up with misleading
behaviour when statement-like macros are defined empty.

All of these buildfarm members are now showing this warning:

calliphoridae gcc (Debian 10.1.0-6) 10.1.0
culicidae gcc (Debian 10.1.0-6) 10.1.0
flaviventris gcc (Debian 20200124-1) 10.0.1 20200124 (experimental)
francolin gcc (Debian 10.1.0-6) 10.1.0
piculet gcc (Debian 10.1.0-6) 10.1.0
rorqual gcc (Debian 10.1.0-6) 10.1.0
serinus gcc (Debian 20200124-1) 10.0.1 20200124 (experimental)
skink gcc (Debian 10.1.0-6) 10.1.0

I think those likely are all mine, so it's not too surprising. They all
use something like
CFLAGS => '-Og -ggdb -g3 -Wall -Wextra -Wno-unused-parameter -Wno-sign-compare -Wno-missing-field-initializers -fno-omit-frame-pointer',

(I wonder why flaviventris and serinus are still using an "experimental"
compiler version that is now behind mainstream.)

The upgrade script didn't install the newer version it because it had to
remove some conflicting packages... Should be fixed for runs starting
now.

Greetings,

Andres Freund

#47Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#46)
Re: [PATCH] Identify LWLocks in tracepoints

Andres Freund <andres@anarazel.de> writes:

On 2021-05-08 13:13:47 -0400, Tom Lane wrote:

(I wonder why flaviventris and serinus are still using an "experimental"
compiler version that is now behind mainstream.)

The upgrade script didn't install the newer version because it had to
remove some conflicting packages... Should be fixed for runs starting
now.

Looks like that didn't work ...

regards, tom lane

#48Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Craig Ringer (#43)
Re: [PATCH] Identify LWLocks in tracepoints

On 05.05.21 06:20, Craig Ringer wrote:

On Wed, 5 May 2021 at 09:15, Craig Ringer <craig.ringer@enterprisedb.com> wrote:

warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
1322 | TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode);
| ^

Odd that I didn't get that.

This compiler complaint is not due to the _ENABLED() test as such.
It's due to the fact that we completely define out the
TRACE_POSTGRESQL_ macros with src/backend/utils/Gen_dummy_probes.sed .

While explicit braces could be added around each test, I suggest
fixing Gen_dummy_probes.sed to emit the usual dummy statement instead.
Patch attached.

Committed, with the Gen_dummy_probes.pl change added.

#49Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#47)
Re: [PATCH] Identify LWLocks in tracepoints

On 2021-05-09 19:51:13 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2021-05-08 13:13:47 -0400, Tom Lane wrote:

(I wonder why flaviventris and serinus are still using an "experimental"
compiler version that is now behind mainstream.)

The upgrade script didn't install the newer version because it had to
remove some conflicting packages... Should be fixed for runs starting
now.

Looks like that didn't work ...

Looks like it did, but turned out to have some unintended side-effects
:(.

The snapshot builds are now new:
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=flaviventris&dt=2021-05-10%2015%3A43%3A56&stg=configure
configure:3966: ccache /usr/lib/gcc-snapshot/bin/gcc --version >&5
gcc (Debian 20210421-1) 11.0.1 20210421 (prerelease) [gcc-11 revision fbb7739892e:d13ce34bd01:3756d99dab6a268d0d8a17583980a86f23f0595a]

But the aforementioned dependencies that it needed to remove broke the
installed old versions of gcc/clang.

I started to build the old versions of llvm manually, but that then hits
the issue that at least 3.9 doesn't build with halfway modern versions
of gcc/clang. So I gotta do it stepwise (i.e. go backwards, build llvm
n-2 with n-1), will take a bit of time.

Greetings,

Andres Freund

#50Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#49)
Re: [PATCH] Identify LWLocks in tracepoints

Andres Freund <andres@anarazel.de> writes:

Looks like it did, but turned out to have some unintended side-effects
:(.
The snapshot builds are now new:
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=flaviventris&dt=2021-05-10%2015%3A43%3A56&stg=configure
configure:3966: ccache /usr/lib/gcc-snapshot/bin/gcc --version >&5
gcc (Debian 20210421-1) 11.0.1 20210421 (prerelease) [gcc-11 revision fbb7739892e:d13ce34bd01:3756d99dab6a268d0d8a17583980a86f23f0595a]
But the aforementioned dependencies that it needed to remove broke the
installed old versions of gcc/clang.
I started to build the old versions of llvm manually, but that then hits
the issue that at least 3.9 doesn't build with halfway modern versions
of gcc/clang. So I gotta do it stepwise (i.e. go backwards, build llvm
n-2 with n-1), will take a bit of time.

Ugh. Memo to self: don't rag on other people's buildfarm configurations
right before a release deadline :-(. Sorry to cause you trouble.

regards, tom lane

#51Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#50)
Re: [PATCH] Identify LWLocks in tracepoints

Hi,

On 2021-05-10 12:14:46 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

Looks like it did, but turned out to have some unintended side-effects
:(.
The snapshot builds are now new:
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=flaviventris&dt=2021-05-10%2015%3A43%3A56&stg=configure
configure:3966: ccache /usr/lib/gcc-snapshot/bin/gcc --version >&5
gcc (Debian 20210421-1) 11.0.1 20210421 (prerelease) [gcc-11 revision fbb7739892e:d13ce34bd01:3756d99dab6a268d0d8a17583980a86f23f0595a]
But the aforementioned dependencies that needed to remove broke the
installed old versions of gcc/clang.
I started to build the old versions of llvm manually, but that then hits
the issue that at least 3.9 doesn't build with halfway modern versions
of gcc/clang. So I gotta do it stepwise (i.e. go backwards, build llvm
n-2 with n-1), will take a bit of time.

Ugh. Memo to self: don't rag on other people's buildfarm configurations
right before a release deadline :-(. Sorry to cause you trouble.

No worries - I knew that I'd have to do this at some point, even though
I hadn't planned to do that today... I should have all of them green
before end of today.

I found that I actually can build LLVM 3.9 directly, as clang-6 can
still build it directly (whereas the oldest gcc still installed can't
build it directly). So it's a bit less painful than I thought at first.

Tests for the 3.9 instances (phycodurus, dragonet) are running right now,
and I'm fairly sure they'll pass (most of a --noreport --nostatus run
passed). Moving on to building LLVM 4, 5 and 6 now - the later versions take
longer...

Greetings,

Andres Freund

#52Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#51)
Re: [PATCH] Identify LWLocks in tracepoints

Hi,

On 2021-05-10 09:46:02 -0700, Andres Freund wrote:

No worries - I knew that I'd have to do this at some point, even though
I hadn't planned to do that today... I should have all of them green
before end of today.

I found that I actually can build LLVM 3.9 directly, as clang-6 can
still build it directly (whereas the oldest gcc still installed can't
build it directly). So it's a bit less painful than I thought at first.

Tests for the 3.9 instances (phycodurus, dragonet) are running right now,
and I'm fairly sure they'll pass (most of a --noreport --nostatus run
passed). Moving on to building LLVM 4, 5 and 6 now - the later versions take
longer...

Looks like it's all clear now. All but the results for 11 had cleared up
by yesterday evening, and the rest came in ok overnight.

Greetings,

Andres Freund