Creating a DSA area to provide work space for parallel execution
Hi hackers,
A couple of months ago I proposed dynamic shared areas[1]. DSA areas
are dynamically sized shared memory heaps that backends can use to
share data, building on top of the existing DSM infrastructure.
One target use case for DSA areas is to provide work space for
parallel query execution. To that end, here is a patch to create a
DSA area for use by executor code. The area is automatically attached
to the leader and all worker processes for the duration of the
parallel query, and is available as estate->es_query_area.
Backends already have access to shared memory through a single DSM
segment managed with a table-of-contents. The TOC provides a way to
carve out some shared storage space for individual executor nodes and
look it up later by plan node ID. That works for things like
ParallelHeapScanDescData whose size is known up front, but not so well
if you need something more like a heap in which to build shared data
structures. Through estate->es_query_area, a parallel-aware executor
node can use and recycle arbitrary amounts of shared memory with an
allocate/free interface.
Motivating use cases include shared bitmaps and shared hash tables
(patches to follow).
This doesn't remove the need for the existing DSM segment, though.
In order to share data structures in the DSA area, you need a
way to exchange pointers to find them, and the existing segment + TOC
mechanism is ideal for that.
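For illustration, here's roughly what that looks like from a node's
point of view; PARALLEL_KEY_MY_NODE and MySharedState are invented for
the example, but the dsa_* and shm_toc_* calls are the real interfaces:

dsa_pointer p;
dsa_pointer *slot;
MySharedState *shared;          /* hypothetical node-specific struct */

/* Leader: build something in the area... */
p = dsa_allocate(estate->es_query_area, sizeof(MySharedState));
shared = dsa_get_address(estate->es_query_area, p);
/* ... initialize *shared ... */

/* ...and publish its dsa_pointer through the node's TOC entry. */
slot = shm_toc_allocate(pcxt->toc, sizeof(dsa_pointer));
*slot = p;
shm_toc_insert(pcxt->toc, PARALLEL_KEY_MY_NODE, slot);

/* Worker: fetch the dsa_pointer and translate it to a local address. */
slot = shm_toc_lookup(toc, PARALLEL_KEY_MY_NODE);
shared = dsa_get_address(estate->es_query_area, *slot);

/* Any attached backend can later give the memory back. */
dsa_free(estate->es_query_area, p);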
One obvious problem is that this patch results in at least *two* DSM
segments being created for every parallel query execution: the main
segment used for parallel execution, and then the initial segment
managed by the DSA area. One thought is that DSA areas are the more
general mechanism, so perhaps we should figure out how to store
contents of the existing segment in it. The TOC interface would need
a few tweaks to be able to live in memory allocated with dsa_allocate,
and then we'd need to share that address with other backends so that
they could find it (cf the current approach of finding the TOC at the
start of the segment). I haven't prototyped that yet. That'd involve
changing the wording "InitializeDSM" that appears in various places
including the FDW API, which has been putting me off...
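Roughly, I imagine that'd look something like this untested sketch
(toc_size is hand-waved, and getting toc_p to the workers is the
unsolved part):

dsa_pointer toc_p;
shm_toc *toc;

/* Build the TOC in area memory rather than at a segment's start. */
toc_p = dsa_allocate(area, toc_size);
toc = shm_toc_create(PARALLEL_MAGIC, dsa_get_address(area, toc_p),
					 toc_size);

/* Workers would need to receive toc_p through some fixed channel
 * (e.g. the bgworker startup argument), since there's no longer a
 * well-known location to find the TOC at. */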
This patch depends on dsa-v2.patch[1].
[1]: /messages/by-id/CAEepm=1z5WLuNoJ80PaCvz6EtG9dN0j-KuHcHtU6QEfcPP5-qA@mail.gmail.com
--
Thomas Munro
http://www.enterprisedb.com
Attachments:
dsa-area-for-executor-v1.patch (application/octet-stream)
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 5aa6f02..72bacd5 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -32,6 +32,7 @@
#include "nodes/nodeFuncs.h"
#include "optimizer/planmain.h"
#include "optimizer/planner.h"
+#include "storage/dsa.h"
#include "storage/spin.h"
#include "tcop/tcopprot.h"
#include "utils/memutils.h"
@@ -47,6 +48,7 @@
#define PARALLEL_KEY_BUFFER_USAGE UINT64CONST(0xE000000000000003)
#define PARALLEL_KEY_TUPLE_QUEUE UINT64CONST(0xE000000000000004)
#define PARALLEL_KEY_INSTRUMENTATION UINT64CONST(0xE000000000000005)
+#define PARALLEL_KEY_AREA_HANDLE UINT64CONST(0xE000000000000006)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -345,6 +347,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate, int nworkers)
int param_len;
int instrumentation_len = 0;
int instrument_offset = 0;
+ dsa_handle *area_handle;
/* Allocate object for return value. */
pei = palloc0(sizeof(ParallelExecutorInfo));
@@ -354,6 +357,16 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate, int nworkers)
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ /* Create a DSA area that can be used by all workers. */
+ pei->area = dsa_create_dynamic(LWTRANCHE_PARALLEL_EXEC_AREA,
+ "parallel query memory area");
+
+ /*
+ * Make the area available to executor nodes running in the leader. See
+ * also ParallelQueryMain which makes it available to workers.
+ */
+ estate->es_query_area = pei->area;
+
/* Create a parallel context. */
pcxt = CreateParallelContext(ParallelQueryMain, nworkers);
pei->pcxt = pcxt;
@@ -413,6 +426,10 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate, int nworkers)
shm_toc_estimate_keys(&pcxt->estimator, 1);
}
+ /* Estimate space for DSA area handle. */
+ shm_toc_estimate_chunk(&pcxt->estimator, sizeof(dsa_handle));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Everyone's had a chance to ask for space, so now create the DSM. */
InitializeParallelDSM(pcxt);
@@ -483,6 +500,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate, int nworkers)
if (e.nnodes != d.nnodes)
elog(ERROR, "inconsistent count of PlanState nodes");
+ /* Store the DSA area handle so that worker backends can attach. */
+ area_handle = shm_toc_allocate(pcxt->toc, sizeof(dsa_handle));
+ *area_handle = dsa_get_handle(pei->area);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_AREA_HANDLE, area_handle);
+
/* OK, we're ready to rock and roll. */
return pei;
}
@@ -571,6 +593,11 @@ ExecParallelFinish(ParallelExecutorInfo *pei)
void
ExecParallelCleanup(ParallelExecutorInfo *pei)
{
+ if (pei->area != NULL)
+ {
+ dsa_detach(pei->area);
+ pei->area = NULL;
+ }
if (pei->pcxt != NULL)
{
DestroyParallelContext(pei->pcxt);
@@ -728,6 +755,8 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
QueryDesc *queryDesc;
SharedExecutorInstrumentation *instrumentation;
int instrument_options = 0;
+ dsa_handle *area_handle;
+ dsa_area *area;
/* Set up DestReceiver, SharedExecutorInstrumentation, and QueryDesc. */
receiver = ExecParallelGetReceiver(seg, toc);
@@ -739,8 +768,14 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Prepare to track buffer usage during query execution. */
InstrStartParallelQuery();
+ /* Attach to the dynamic shared memory area. */
+ area_handle = shm_toc_lookup(toc, PARALLEL_KEY_AREA_HANDLE);
+ Assert(area_handle != NULL);
+ area = dsa_attach_dynamic(*area_handle);
+
/* Start up the executor, have it run the plan, and then shut it down. */
ExecutorStart(queryDesc, 0);
+ queryDesc->planstate->state->es_query_area = area;
ExecParallelInitializeWorker(queryDesc->planstate, toc);
ExecutorRun(queryDesc, ForwardScanDirection, 0L);
ExecutorFinish(queryDesc);
@@ -758,6 +793,7 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
ExecutorEnd(queryDesc);
/* Cleanup. */
+ dsa_detach(area);
FreeQueryDesc(queryDesc);
(*receiver->rDestroy) (receiver);
}
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index f4c6d37..2afd3d1 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -17,6 +17,7 @@
#include "nodes/execnodes.h"
#include "nodes/parsenodes.h"
#include "nodes/plannodes.h"
+#include "storage/dsa.h"
typedef struct SharedExecutorInstrumentation SharedExecutorInstrumentation;
@@ -27,6 +28,7 @@ typedef struct ParallelExecutorInfo
BufferUsage *buffer_usage;
SharedExecutorInstrumentation *instrumentation;
shm_mq_handle **tqueue;
+ dsa_area *area;
bool finished;
} ParallelExecutorInfo;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 4fa3661..bb1f56a 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -20,6 +20,7 @@
#include "lib/pairingheap.h"
#include "nodes/params.h"
#include "nodes/plannodes.h"
+#include "storage/dsa.h"
#include "utils/hsearch.h"
#include "utils/reltrigger.h"
#include "utils/sortsupport.h"
@@ -422,6 +423,9 @@ typedef struct EState
HeapTuple *es_epqTuple; /* array of EPQ substitute tuples */
bool *es_epqTupleSet; /* true if EPQ tuple is provided */
bool *es_epqScanDone; /* true if EPQ tuple has been fetched */
+
+ /* The per-query shared memory area to use for parallel execution. */
+ dsa_area *es_query_area;
} EState;
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 9a2d869..951e421 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -235,6 +235,7 @@ typedef enum BuiltinTrancheIds
LWTRANCHE_BUFFER_MAPPING,
LWTRANCHE_LOCK_MANAGER,
LWTRANCHE_PREDICATE_LOCK_MANAGER,
+ LWTRANCHE_PARALLEL_EXEC_AREA,
LWTRANCHE_FIRST_USER_DEFINED
} BuiltinTrancheIds;
On Wed, Oct 5, 2016 at 10:32 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
One obvious problem is that this patch results in at least *two* DSM
segments being created for every parallel query execution: the main
segment used for parallel execution, and then the initial segment
managed by the DSA area. One thought is that DSA areas are the more
general mechanism, so perhaps we should figure out how to store
contents of the existing segment in it. The TOC interface would need
a few tweaks to be able to live in memory allocated with dsa_allocate,
and then we'd need to share that address with other backends so that
they could find it (cf the current approach of finding the TOC at the
start of the segment). I haven't prototyped that yet. That'd involve
changing the wording "InitializeDSM" that appears in various places
including the FDW API, which has been putting me off...
... or we could allow DSA areas to be constructed inside existing
shmem, as in the attached patch which requires dsa_create_in_place,
from the patch at
/messages/by-id/CAEepm=0-pbokaQdCXhtYn=w64MmdJz4hYW7qcSU235ar276x7w@mail.gmail.com
.
--
Thomas Munro
http://www.enterprisedb.com
Attachments:
dsa-area-for-executor-v2.patch (application/octet-stream)
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 5aa6f02..c1bb9f9 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -34,6 +34,7 @@
#include "optimizer/planner.h"
#include "storage/spin.h"
#include "tcop/tcopprot.h"
+#include "utils/dsa.h"
#include "utils/memutils.h"
#include "utils/snapmgr.h"
@@ -47,8 +48,10 @@
#define PARALLEL_KEY_BUFFER_USAGE UINT64CONST(0xE000000000000003)
#define PARALLEL_KEY_TUPLE_QUEUE UINT64CONST(0xE000000000000004)
#define PARALLEL_KEY_INSTRUMENTATION UINT64CONST(0xE000000000000005)
+#define PARALLEL_KEY_AREA UINT64CONST(0xE000000000000006)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
+#define PARALLEL_AREA_SIZE 65536
/*
* DSM structure for accumulating per-PlanState instrumentation.
@@ -345,6 +348,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate, int nworkers)
int param_len;
int instrumentation_len = 0;
int instrument_offset = 0;
+ char *area_space;
/* Allocate object for return value. */
pei = palloc0(sizeof(ParallelExecutorInfo));
@@ -354,6 +358,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate, int nworkers)
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ /*
+ * Make the area available to executor nodes running in the leader. See
+ * also ParallelQueryMain which makes it available to workers.
+ */
+ estate->es_query_area = pei->area;
+
/* Create a parallel context. */
pcxt = CreateParallelContext(ParallelQueryMain, nworkers);
pei->pcxt = pcxt;
@@ -413,6 +423,10 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate, int nworkers)
shm_toc_estimate_keys(&pcxt->estimator, 1);
}
+ /* Estimate space for DSA area. */
+ shm_toc_estimate_chunk(&pcxt->estimator, PARALLEL_AREA_SIZE);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Everyone's had a chance to ask for space, so now create the DSM. */
InitializeParallelDSM(pcxt);
@@ -466,6 +480,14 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate, int nworkers)
pei->instrumentation = instrumentation;
}
+ /* Create a DSA area that can be used by the leader and all workers. */
+ area_space = shm_toc_allocate(pcxt->toc, PARALLEL_AREA_SIZE);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_AREA, area_space);
+ pei->area = dsa_create_in_place(area_space, PARALLEL_AREA_SIZE,
+ LWTRANCHE_PARALLEL_EXEC_AREA,
+ "parallel query memory area");
+ on_dsm_detach(pcxt->seg, dsa_on_dsm_detach, PointerGetDatum(area_space));
+
/*
* Give parallel-aware nodes a chance to initialize their shared data.
* This also initializes the elements of instrumentation->ps_instrument,
@@ -728,6 +750,8 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
QueryDesc *queryDesc;
SharedExecutorInstrumentation *instrumentation;
int instrument_options = 0;
+ void *area_space;
+ dsa_area *area;
/* Set up DestReceiver, SharedExecutorInstrumentation, and QueryDesc. */
receiver = ExecParallelGetReceiver(seg, toc);
@@ -739,8 +763,14 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Prepare to track buffer usage during query execution. */
InstrStartParallelQuery();
+ /* Attach to the dynamic shared memory area. */
+ area_space = shm_toc_lookup(toc, PARALLEL_KEY_AREA);
+ area = dsa_attach_in_place(area_space);
+ on_dsm_detach(seg, dsa_on_dsm_detach, PointerGetDatum(area_space));
+
/* Start up the executor, have it run the plan, and then shut it down. */
ExecutorStart(queryDesc, 0);
+ queryDesc->planstate->state->es_query_area = area;
ExecParallelInitializeWorker(queryDesc->planstate, toc);
ExecutorRun(queryDesc, ForwardScanDirection, 0L);
ExecutorFinish(queryDesc);
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index f4c6d37..4b57474 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -17,6 +17,7 @@
#include "nodes/execnodes.h"
#include "nodes/parsenodes.h"
#include "nodes/plannodes.h"
+#include "utils/dsa.h"
typedef struct SharedExecutorInstrumentation SharedExecutorInstrumentation;
@@ -27,6 +28,7 @@ typedef struct ParallelExecutorInfo
BufferUsage *buffer_usage;
SharedExecutorInstrumentation *instrumentation;
shm_mq_handle **tqueue;
+ dsa_area *area;
bool finished;
} ParallelExecutorInfo;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index f6f73f3..7967383 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -20,6 +20,7 @@
#include "lib/pairingheap.h"
#include "nodes/params.h"
#include "nodes/plannodes.h"
+#include "utils/dsa.h"
#include "utils/hsearch.h"
#include "utils/reltrigger.h"
#include "utils/sortsupport.h"
@@ -422,6 +423,9 @@ typedef struct EState
HeapTuple *es_epqTuple; /* array of EPQ substitute tuples */
bool *es_epqTupleSet; /* true if EPQ tuple is provided */
bool *es_epqScanDone; /* true if EPQ tuple has been fetched */
+
+ /* The per-query shared memory area to use for parallel execution. */
+ dsa_area *es_query_area;
} EState;
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 9a2d869..951e421 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -235,6 +235,7 @@ typedef enum BuiltinTrancheIds
LWTRANCHE_BUFFER_MAPPING,
LWTRANCHE_LOCK_MANAGER,
LWTRANCHE_PREDICATE_LOCK_MANAGER,
+ LWTRANCHE_PARALLEL_EXEC_AREA,
LWTRANCHE_FIRST_USER_DEFINED
} BuiltinTrancheIds;
On Wed, Nov 23, 2016 at 5:42 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
... or we could allow DSA areas to be constructed inside existing
shmem, as in the attached patch which requires dsa_create_in_place,
from the patch at
/messages/by-id/CAEepm=0-pbokaQdCXhtYn=w64MmdJz4hYW7qcSU235ar276x7w@mail.gmail.com
.
Seems like there is a problem in this patch.
In the code below, pei->area is not yet allocated at this point, so
estate->es_query_area will always be NULL?
+ estate->es_query_area = pei->area;
+
/* Create a parallel context. */
pcxt = CreateParallelContext(ParallelQueryMain, nworkers);
pei->pcxt = pcxt;
@@ -413,6 +423,10 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate, int nworkers)
shm_toc_estimate_keys(&pcxt->estimator, 1);
}
+ /* Estimate space for DSA area. */
+ shm_toc_estimate_chunk(&pcxt->estimator, PARALLEL_AREA_SIZE);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Everyone's had a chance to ask for space, so now create the DSM. */
InitializeParallelDSM(pcxt);
@@ -466,6 +480,14 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate, int nworkers)
pei->instrumentation = instrumentation;
}
+ /* Create a DSA area that can be used by the leader and all workers. */
+ area_space = shm_toc_allocate(pcxt->toc, PARALLEL_AREA_SIZE);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_AREA, area_space);
+ pei->area = dsa_create_in_place(area_space, PARALLEL_AREA_SIZE,
+ LWTRANCHE_PARALLEL_EXEC_AREA,
+ "parallel query memory area");
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
I have one more question.
In V1 we were calling dsa_detach in ExecParallelCleanup and in
ParallelQueryMain, but it's removed in v2.
Any specific reason?
Does this need to be used differently?
ExecParallelCleanup(ParallelExecutorInfo *pei)
{
+ if (pei->area != NULL)
+ {
+ dsa_detach(pei->area);
+ pei->area = NULL;
+ }
After this change, I am getting a DSM segment leak warning.
I am calling dsa_allocate and dsa_free.
On Thu, Nov 24, 2016 at 8:09 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Wed, Nov 23, 2016 at 5:42 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
... or we could allow DSA areas to be constructed inside existing
shmem, as in the attached patch which requires dsa_create_in_place,
from the patch at
/messages/by-id/CAEepm=0-pbokaQdCXhtYn=w64MmdJz4hYW7qcSU235ar276x7w@mail.gmail.com
.
Seems like there is a problem in this patch.
In the code below, pei->area is not yet allocated at this point, so
estate->es_query_area will always be NULL?

+ estate->es_query_area = pei->area;
+
 /* Create a parallel context. */
 pcxt = CreateParallelContext(ParallelQueryMain, nworkers);
 pei->pcxt = pcxt;

@@ -413,6 +423,10 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate, int nworkers)
 shm_toc_estimate_keys(&pcxt->estimator, 1);
 }
+ /* Estimate space for DSA area. */
+ shm_toc_estimate_chunk(&pcxt->estimator, PARALLEL_AREA_SIZE);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
 /* Everyone's had a chance to ask for space, so now create the DSM. */
 InitializeParallelDSM(pcxt);

@@ -466,6 +480,14 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate, int nworkers)
 pei->instrumentation = instrumentation;
 }
+ /* Create a DSA area that can be used by the leader and all workers. */
+ area_space = shm_toc_allocate(pcxt->toc, PARALLEL_AREA_SIZE);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_AREA, area_space);
+ pei->area = dsa_create_in_place(area_space, PARALLEL_AREA_SIZE,
+                                 LWTRANCHE_PARALLEL_EXEC_AREA,
+                                 "parallel query memory area");
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Fri, Nov 25, 2016 at 4:32 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
I have one more question.
In V1 we were calling dsa_detach in ExecParallelCleanup and in
ParallelQueryMain, but it's removed in v2. Any specific reason?
Does this need to be used differently?

ExecParallelCleanup(ParallelExecutorInfo *pei)
{
+ if (pei->area != NULL)
+ {
+ dsa_detach(pei->area);
+ pei->area = NULL;
+ }

After this change, I am getting a DSM segment leak warning.
Thanks! I had some chicken-vs-egg problems dealing with cleanup of
DSM segments belonging to DSA areas created inside DSM segments.
Here's a new version to apply on top of dsa-v7.patch.
--
Thomas Munro
http://www.enterprisedb.com
Attachments:
dsa-area-for-executor-v3.patch (application/octet-stream)
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 5aa6f02..b9f18dc 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -34,6 +34,7 @@
#include "optimizer/planner.h"
#include "storage/spin.h"
#include "tcop/tcopprot.h"
+#include "utils/dsa.h"
#include "utils/memutils.h"
#include "utils/snapmgr.h"
@@ -47,8 +48,10 @@
#define PARALLEL_KEY_BUFFER_USAGE UINT64CONST(0xE000000000000003)
#define PARALLEL_KEY_TUPLE_QUEUE UINT64CONST(0xE000000000000004)
#define PARALLEL_KEY_INSTRUMENTATION UINT64CONST(0xE000000000000005)
+#define PARALLEL_KEY_AREA UINT64CONST(0xE000000000000006)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
+#define PARALLEL_AREA_SIZE 65536
/*
* DSM structure for accumulating per-PlanState instrumentation.
@@ -345,6 +348,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate, int nworkers)
int param_len;
int instrumentation_len = 0;
int instrument_offset = 0;
+ char *area_space;
/* Allocate object for return value. */
pei = palloc0(sizeof(ParallelExecutorInfo));
@@ -413,6 +417,10 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate, int nworkers)
shm_toc_estimate_keys(&pcxt->estimator, 1);
}
+ /* Estimate space for DSA area. */
+ shm_toc_estimate_chunk(&pcxt->estimator, PARALLEL_AREA_SIZE);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Everyone's had a chance to ask for space, so now create the DSM. */
InitializeParallelDSM(pcxt);
@@ -466,6 +474,21 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate, int nworkers)
pei->instrumentation = instrumentation;
}
+ /* Create a DSA area that can be used by the leader and all workers. */
+ area_space = shm_toc_allocate(pcxt->toc, PARALLEL_AREA_SIZE);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_AREA, area_space);
+ pei->area = dsa_create_in_place(area_space, PARALLEL_AREA_SIZE,
+ LWTRANCHE_PARALLEL_EXEC_AREA,
+ "parallel query memory area");
+ on_dsm_detach(pcxt->seg, dsa_on_dsm_detach_release_in_place,
+ PointerGetDatum(area_space));
+
+ /*
+ * Make the area available to executor nodes running in the leader. See
+ * also ParallelQueryMain which makes it available to workers.
+ */
+ estate->es_query_area = pei->area;
+
/*
* Give parallel-aware nodes a chance to initialize their shared data.
* This also initializes the elements of instrumentation->ps_instrument,
@@ -571,6 +594,11 @@ ExecParallelFinish(ParallelExecutorInfo *pei)
void
ExecParallelCleanup(ParallelExecutorInfo *pei)
{
+ if (pei->area != NULL)
+ {
+ dsa_detach(pei->area);
+ pei->area = NULL;
+ }
if (pei->pcxt != NULL)
{
DestroyParallelContext(pei->pcxt);
@@ -728,6 +756,8 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
QueryDesc *queryDesc;
SharedExecutorInstrumentation *instrumentation;
int instrument_options = 0;
+ void *area_space;
+ dsa_area *area;
/* Set up DestReceiver, SharedExecutorInstrumentation, and QueryDesc. */
receiver = ExecParallelGetReceiver(seg, toc);
@@ -739,8 +769,15 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Prepare to track buffer usage during query execution. */
InstrStartParallelQuery();
+ /* Attach to the dynamic shared memory area. */
+ area_space = shm_toc_lookup(toc, PARALLEL_KEY_AREA);
+ area = dsa_attach_in_place(area_space);
+ on_dsm_detach(seg, dsa_on_dsm_detach_release_in_place,
+ PointerGetDatum(area_space));
+
/* Start up the executor, have it run the plan, and then shut it down. */
ExecutorStart(queryDesc, 0);
+ queryDesc->planstate->state->es_query_area = area;
ExecParallelInitializeWorker(queryDesc->planstate, toc);
ExecutorRun(queryDesc, ForwardScanDirection, 0L);
ExecutorFinish(queryDesc);
@@ -758,6 +795,7 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
ExecutorEnd(queryDesc);
/* Cleanup. */
+ dsa_detach(area);
FreeQueryDesc(queryDesc);
(*receiver->rDestroy) (receiver);
}
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index f4c6d37..4b57474 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -17,6 +17,7 @@
#include "nodes/execnodes.h"
#include "nodes/parsenodes.h"
#include "nodes/plannodes.h"
+#include "utils/dsa.h"
typedef struct SharedExecutorInstrumentation SharedExecutorInstrumentation;
@@ -27,6 +28,7 @@ typedef struct ParallelExecutorInfo
BufferUsage *buffer_usage;
SharedExecutorInstrumentation *instrumentation;
shm_mq_handle **tqueue;
+ dsa_area *area;
bool finished;
} ParallelExecutorInfo;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index f6f73f3..7967383 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -20,6 +20,7 @@
#include "lib/pairingheap.h"
#include "nodes/params.h"
#include "nodes/plannodes.h"
+#include "utils/dsa.h"
#include "utils/hsearch.h"
#include "utils/reltrigger.h"
#include "utils/sortsupport.h"
@@ -422,6 +423,9 @@ typedef struct EState
HeapTuple *es_epqTuple; /* array of EPQ substitute tuples */
bool *es_epqTupleSet; /* true if EPQ tuple is provided */
bool *es_epqScanDone; /* true if EPQ tuple has been fetched */
+
+ /* The per-query shared memory area to use for parallel execution. */
+ dsa_area *es_query_area;
} EState;
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 9a2d869..951e421 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -235,6 +235,7 @@ typedef enum BuiltinTrancheIds
LWTRANCHE_BUFFER_MAPPING,
LWTRANCHE_LOCK_MANAGER,
LWTRANCHE_PREDICATE_LOCK_MANAGER,
+ LWTRANCHE_PARALLEL_EXEC_AREA,
LWTRANCHE_FIRST_USER_DEFINED
} BuiltinTrancheIds;
On Sat, Nov 26, 2016 at 1:55 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
Here's a new version to apply on top of dsa-v7.patch.
Here's a version to go with dsa-v8.patch.
--
Thomas Munro
http://www.enterprisedb.com
Attachments:
dsa-area-for-executor-v4.patch (application/octet-stream)
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index f9c8598..7d88489 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -34,6 +34,7 @@
#include "optimizer/planner.h"
#include "storage/spin.h"
#include "tcop/tcopprot.h"
+#include "utils/dsa.h"
#include "utils/memutils.h"
#include "utils/snapmgr.h"
@@ -47,6 +48,7 @@
#define PARALLEL_KEY_BUFFER_USAGE UINT64CONST(0xE000000000000003)
#define PARALLEL_KEY_TUPLE_QUEUE UINT64CONST(0xE000000000000004)
#define PARALLEL_KEY_INSTRUMENTATION UINT64CONST(0xE000000000000005)
+#define PARALLEL_KEY_AREA UINT64CONST(0xE000000000000006)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -345,6 +347,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate, int nworkers)
int param_len;
int instrumentation_len = 0;
int instrument_offset = 0;
+ char *area_space;
/* Allocate object for return value. */
pei = palloc0(sizeof(ParallelExecutorInfo));
@@ -413,6 +416,10 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate, int nworkers)
shm_toc_estimate_keys(&pcxt->estimator, 1);
}
+ /* Estimate space for DSA area. */
+ shm_toc_estimate_chunk(&pcxt->estimator, dsa_minimum_size());
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Everyone's had a chance to ask for space, so now create the DSM. */
InitializeParallelDSM(pcxt);
@@ -466,6 +473,20 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate, int nworkers)
pei->instrumentation = instrumentation;
}
+ /* Create a DSA area that can be used by the leader and all workers. */
+ area_space = shm_toc_allocate(pcxt->toc, dsa_minimum_size());
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_AREA, area_space);
+ pei->area = dsa_create_in_place(area_space, dsa_minimum_size(),
+ LWTRANCHE_PARALLEL_EXEC_AREA,
+ "parallel query memory area",
+ pcxt->seg);
+
+ /*
+ * Make the area available to executor nodes running in the leader. See
+ * also ParallelQueryMain which makes it available to workers.
+ */
+ estate->es_query_area = pei->area;
+
/*
* Give parallel-aware nodes a chance to initialize their shared data.
* This also initializes the elements of instrumentation->ps_instrument,
@@ -571,6 +592,11 @@ ExecParallelFinish(ParallelExecutorInfo *pei)
void
ExecParallelCleanup(ParallelExecutorInfo *pei)
{
+ if (pei->area != NULL)
+ {
+ dsa_detach(pei->area);
+ pei->area = NULL;
+ }
if (pei->pcxt != NULL)
{
DestroyParallelContext(pei->pcxt);
@@ -728,6 +754,8 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
QueryDesc *queryDesc;
SharedExecutorInstrumentation *instrumentation;
int instrument_options = 0;
+ void *area_space;
+ dsa_area *area;
/* Set up DestReceiver, SharedExecutorInstrumentation, and QueryDesc. */
receiver = ExecParallelGetReceiver(seg, toc);
@@ -739,8 +767,13 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Prepare to track buffer usage during query execution. */
InstrStartParallelQuery();
+ /* Attach to the dynamic shared memory area. */
+ area_space = shm_toc_lookup(toc, PARALLEL_KEY_AREA);
+ area = dsa_attach_in_place(area_space, seg);
+
/* Start up the executor, have it run the plan, and then shut it down. */
ExecutorStart(queryDesc, 0);
+ queryDesc->planstate->state->es_query_area = area;
ExecParallelInitializeWorker(queryDesc->planstate, toc);
ExecutorRun(queryDesc, ForwardScanDirection, 0L);
ExecutorFinish(queryDesc);
@@ -758,6 +791,7 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
ExecutorEnd(queryDesc);
/* Cleanup. */
+ dsa_detach(area);
FreeQueryDesc(queryDesc);
(*receiver->rDestroy) (receiver);
}
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index f4c6d37..4b57474 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -17,6 +17,7 @@
#include "nodes/execnodes.h"
#include "nodes/parsenodes.h"
#include "nodes/plannodes.h"
+#include "utils/dsa.h"
typedef struct SharedExecutorInstrumentation SharedExecutorInstrumentation;
@@ -27,6 +28,7 @@ typedef struct ParallelExecutorInfo
BufferUsage *buffer_usage;
SharedExecutorInstrumentation *instrumentation;
shm_mq_handle **tqueue;
+ dsa_area *area;
bool finished;
} ParallelExecutorInfo;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 8004d85..0837c42 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -20,6 +20,7 @@
#include "lib/pairingheap.h"
#include "nodes/params.h"
#include "nodes/plannodes.h"
+#include "utils/dsa.h"
#include "utils/hsearch.h"
#include "utils/reltrigger.h"
#include "utils/sortsupport.h"
@@ -422,6 +423,9 @@ typedef struct EState
HeapTuple *es_epqTuple; /* array of EPQ substitute tuples */
bool *es_epqTupleSet; /* true if EPQ tuple is provided */
bool *es_epqScanDone; /* true if EPQ tuple has been fetched */
+
+ /* The per-query shared memory area to use for parallel execution. */
+ dsa_area *es_query_area;
} EState;
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 9a2d869..951e421 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -235,6 +235,7 @@ typedef enum BuiltinTrancheIds
LWTRANCHE_BUFFER_MAPPING,
LWTRANCHE_LOCK_MANAGER,
LWTRANCHE_PREDICATE_LOCK_MANAGER,
+ LWTRANCHE_PARALLEL_EXEC_AREA,
LWTRANCHE_FIRST_USER_DEFINED
} BuiltinTrancheIds;
On Thu, Dec 1, 2016 at 10:35 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
On Sat, Nov 26, 2016 at 1:55 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
Here's a new version to apply on top of dsa-v7.patch.
Here's a version to go with dsa-v8.patch.
Moved to next CF with "needs review" status.
Regards,
Hari Babu
Fujitsu Australia
On Thu, Dec 1, 2016 at 6:35 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
On Sat, Nov 26, 2016 at 1:55 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
Here's a new version to apply on top of dsa-v7.patch.
Here's a version to go with dsa-v8.patch.
Thomas has spent a fair amount of time beating me up off-list about
the fact that we have no way to recycle LWLock tranche IDs, and I've
spent a fair amount of time trying to defend that decision, but it's
really a problem here. This is all well and good the way it's written
provided that there's only one parallel context in existence at a
time, but should that number ever exceed 1, which it can, then the
tranche's array_base will be incorrect for LWLocks in one or the other
tranche.
That *almost* doesn't matter. If you're not compiling with dtrace or
LOCK_DEBUG or LWLOCK_STATS, you'll be fine. And if you are compiling
with one of those things, I believe the consequences will be no worse
than an occasional nonsensical lock ID. It's halfway tempting to just
accept that as a known wart, but, uggh, that sucks.
It's a bit hard to come up with a better alternative. We could set
aside a certain number of tranche IDs for parallel contexts, say 16.
Then as long as the same backend doesn't try to do create more than 16
parallel contexts at the same time, we'll be fine. And a backend
really shouldn't have more than 16 Gather nodes active at the same
time. But it could. In fact, even if the planner never created more
than one Gather node in the same plan (which it sometimes does), the
leader can have multiple parallel contexts active for different
queries at the same time. It could fire off a Gather and then later
some part of that plan that's running in the master could call a
function that tries to execute some OTHER query which also happens to
involve a Gather and then while the master is executing its part of
THAT query the same thing could happen all over again, so any
arbitrary limit we install here can be exceeded in, admittedly,
extremely contrived scenarios.
Moreover, the LWLock tranche system itself is limited to a 16-bit
integer for tranche IDs, so there can't be more than 65536 of those
over the lifetime of the cluster no matter what. Since
008608b9d51061b1f598c197477b3dc7be9c4a64, that limit is rather
pointless; as things stand today, we could change the tranche ID to a
32-bit integer without losing anything. But changing that might eat
up bit space that somebody wants to use later to solve some other
problem, and anyway by itself it doesn't fix anything. Also, it would
allow LWLockTrancheArray to get regrettably large.
Another alternative is to have the DSA system fix up the tranche base
address every time we enter a DSA function that might need to take a
lock. The DSA code never returns with any relevant locks held, so the
base address can't get obsoleted by some other DSA while a lock using
the old base address is still held. That's sort of ugly too, and it
adds a little overhead to fix a problem that will bite almost nobody,
but it's formally correct and seems fairly watertight. Basically each
DSA function that needs to take a lock would need to first do this:
area->lwlock_tranche.array_base = &area->control->pools[0];
Maybe that's not too bad... thoughts?
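In other words, a one-line fix-up at the top of each such function,
perhaps wrapped in a helper; a sketch against the dsa-v8 area layout,
with the helper name invented here:

/* Sketch: repoint this backend's copy of the tranche at this area's
 * locks before acquiring any of them.  Cheap, and safe because no
 * relevant lock is held across dsa.c calls. */
static inline void
refresh_tranche_base(dsa_area *area)
{
	area->lwlock_tranche.array_base = &area->control->pools[0];
}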
BTW, I just noticed that the dsa_area_control member called "lock" is
going to get a fairly random lock ID, based on the number of bytes by
which "lock" follows "pools". Wouldn't it be better to put all of the
LWLocks - both the general locks and the per-pool locks - in one
array, probably with the general lock first, so that T_ID() will do
the right thing for such locks?
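Something like this hypothetical rearrangement, say, with
DSA_NUM_POOLS standing in for the real pool count:

/* All LWLocks in one contiguous array, general lock first, so that
 * T_ID()'s base/stride arithmetic gives every lock a sensible ID. */
typedef struct dsa_area_control
{
	LWLock		locks[1 + DSA_NUM_POOLS];	/* locks[0] = general lock */
	/* ... other members unchanged ... */
} dsa_area_control;

#define DSA_AREA_LOCK(area)		(&(area)->control->locks[0])
#define DSA_POOL_LOCK(area, i)	(&(area)->control->locks[(i) + 1])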
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Dec 5, 2016 at 3:12 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Dec 1, 2016 at 6:35 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
On Sat, Nov 26, 2016 at 1:55 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
Here's a new version to apply on top of dsa-v7.patch.
Here's a version to go with dsa-v8.patch.
Thomas has spent a fair amount of time beating me up off-list about
the fact that we have no way to recycle LWLock tranche IDs, and I've
spent a fair amount of time trying to defend that decision, but it's
really a problem here. This is all well and good the way it's written
provided that there's only one parallel context in existence at a
time, but should that number ever exceed 1, which it can, then the
tranche's array_base will be incorrect for LWLocks in one or the other
tranche.
That *almost* doesn't matter. If you're not compiling with dtrace or
LOCK_DEBUG or LWLOCK_STATS, you'll be fine. And if you are compiling
with one of those things, I believe the consequences will be no worse
than an occasional nonsensical lock ID. It's halfway tempting to just
accept that as a known wart, but, uggh, that sucks.
It's a bit hard to come up with a better alternative.
After thinking about this some more, I realized that the problem here
boils down to T_ID() not being smart enough to cope with multiple
instances of the same tranche at different base addresses. We can
either make it more complicated so that it can do that, or (drum roll,
please!) get rid of it altogether. I don't think making it more
complicated is very desirable, because I think that we end up
computing T_ID() for every lock acquisition and release whenever
--enable-dtrace is used, even if dtrace is not actually in use. And
the usefulness of T_ID() for debugging is pretty marginal, with one
important exception, which is that currently T_ID() is used to
distinguish between individual LWLocks in the main tranche. So here's
my proposal:
1. Give every LWLock in the main tranche a separate tranche ID. This
has been previously proposed, so it's not a revolutionary concept.
2. Always identify LWLocks in pg_stat_activity only by tranche ID,
never offset within the tranche, not even for the main tranche. This
results in pg_stat_activity saying "LWLock" rather than either
"LWLockNamed" or "LWLockTranche", which is a user-visible behavior
change but not AFAICS a very upsetting one.
3. Change the dtrace probes that currently pass both T_NAME() and
T_ID() to pass only T_NAME(). This is a minor loss of information for
dtrace, but in view of (1) it's not a very significant loss.
4. Change LOCK_DEBUG and LWLOCK_STATS output that identifies locks
using T_NAME() and T_ID() to instead identify them by T_NAME() and the
pointer address. Since these are only developer debugging facilities
not intended for production builds, I think it's OK to expose the
pointer address, and it's arguably MORE useful to do so than to expose
an offset into an array with an unknown base address.
5. Remove T_ID() from the can't-happen elog() in LWLockRelease().
6. Remove T_ID() itself. And then, since that's the only client of
array_base/array_stride, remove those too. And then, since there's
nothing left in LWLockTranche except for the tranche name, get rid of
the whole structure, simplifying a whole bunch of code all over the
system.
Patch implementing all of this attached. There's further
simplification that could be done here -- with array_stride gone, we
could do more at compile time rather than run-time -- but I think that
can be left for another day.
The overall result of this is a considerable savings of code:
doc/src/sgml/monitoring.sgml | 52 ++++-----
src/backend/access/transam/slru.c | 6 +-
src/backend/access/transam/xlog.c | 9 +-
src/backend/postmaster/pgstat.c | 10 +-
src/backend/replication/logical/origin.c | 8 +-
src/backend/replication/slot.c | 8 +-
src/backend/storage/buffer/buf_init.c | 16 +--
src/backend/storage/ipc/procarray.c | 9 +-
src/backend/storage/lmgr/lwlock.c | 175 ++++++++++---------------------
src/backend/utils/mmgr/dsa.c | 15 +--
src/backend/utils/probes.d | 16 +--
src/include/access/slru.h | 1 -
src/include/pgstat.h | 3 +-
src/include/storage/lwlock.h | 45 ++------
14 files changed, 112 insertions(+), 261 deletions(-)
It also noticeably reduces the number of bytes of machine code
generated for lwlock.c:
[rhaas pgsql]$ size src/backend/storage/lmgr/lwlock.o # echo master
__TEXT __DATA __OBJC others dec hex
11815 3360 0 46430 61605 f0a5
[rhaas pgsql]$ size src/backend/storage/lmgr/lwlock.o # echo lwlock
__TEXT __DATA __OBJC others dec hex
11199 3264 0 45487 59950 ea2e
That's better than a 5% reduction in the code size of a very hot
module just by removing something that almost nobody uses or cares
about. That's with --enable-dtrace but without --enable-cassert.
Thoughts?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
remove_lwlock_t_id-v1.patch (text/x-patch)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 128ee13..5b58d2e 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -646,18 +646,11 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<itemizedlist>
<listitem>
<para>
- <literal>LWLockNamed</>: The backend is waiting for a specific named
- lightweight lock. Each such lock protects a particular data
- structure in shared memory. <literal>wait_event</> will contain
- the name of the lightweight lock.
- </para>
- </listitem>
- <listitem>
- <para>
- <literal>LWLockTranche</>: The backend is waiting for one of a
- group of related lightweight locks. All locks in the group perform
- a similar function; <literal>wait_event</> will identify the general
- purpose of locks in that group.
+ <literal>LWLock</>: The backend is waiting for a lightweight lock.
+ Each such lock protects a particular data structure in shared memory.
+ <literal>wait_event</> will contain a name identifying the purpose
+ of the lightweight lock. (Some locks have specific names; others
+ are part of a group of locks each with a similar purpose.)
</para>
</listitem>
<listitem>
@@ -825,7 +818,7 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<tbody>
<row>
- <entry morerows="41"><literal>LWLockNamed</></entry>
+ <entry morerows="57"><literal>LWLock</></entry>
<entry><literal>ShmemIndexLock</></entry>
<entry>Waiting to find or allocate space in shared memory.</entry>
</row>
@@ -1011,7 +1004,6 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<entry>Waiting to read or update old snapshot control information.</entry>
</row>
<row>
- <entry morerows="15"><literal>LWLockTranche</></entry>
<entry><literal>clog</></entry>
<entry>Waiting for I/O on a clog (transaction status) buffer.</entry>
</row>
@@ -1279,7 +1271,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
pid | wait_event_type | wait_event
------+-----------------+---------------
2540 | Lock | relation
- 6644 | LWLockNamed | ProcArrayLock
+ 6644 | LWLock | ProcArrayLock
(2 rows)
</programlisting>
</para>
@@ -3347,55 +3339,49 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
</row>
<row>
<entry><literal>lwlock-acquire</literal></entry>
- <entry><literal>(char *, int, LWLockMode)</literal></entry>
+ <entry><literal>(char *, LWLockMode)</literal></entry>
<entry>Probe that fires when an LWLock has been acquired.
arg0 is the LWLock's tranche.
- arg1 is the LWLock's offset within its tranche.
- arg2 is the requested lock mode, either exclusive or shared.</entry>
+ arg1 is the requested lock mode, either exclusive or shared.</entry>
</row>
<row>
<entry><literal>lwlock-release</literal></entry>
- <entry><literal>(char *, int)</literal></entry>
+ <entry><literal>(char *)</literal></entry>
<entry>Probe that fires when an LWLock has been released (but note
that any released waiters have not yet been awakened).
- arg0 is the LWLock's tranche.
- arg1 is the LWLock's offset within its tranche.</entry>
+ arg0 is the LWLock's tranche.</entry>
</row>
<row>
<entry><literal>lwlock-wait-start</literal></entry>
- <entry><literal>(char *, int, LWLockMode)</literal></entry>
+ <entry><literal>(char *, LWLockMode)</literal></entry>
<entry>Probe that fires when an LWLock was not immediately available and
a server process has begun to wait for the lock to become available.
arg0 is the LWLock's tranche.
- arg1 is the LWLock's offset within its tranche.
- arg2 is the requested lock mode, either exclusive or shared.</entry>
+ arg1 is the requested lock mode, either exclusive or shared.</entry>
</row>
<row>
<entry><literal>lwlock-wait-done</literal></entry>
- <entry><literal>(char *, int, LWLockMode)</literal></entry>
+ <entry><literal>(char *, LWLockMode)</literal></entry>
<entry>Probe that fires when a server process has been released from its
wait for an LWLock (it does not actually have the lock yet).
arg0 is the LWLock's tranche.
- arg1 is the LWLock's offset within its tranche.
- arg2 is the requested lock mode, either exclusive or shared.</entry>
+ arg1 is the requested lock mode, either exclusive or shared.</entry>
</row>
<row>
<entry><literal>lwlock-condacquire</literal></entry>
- <entry><literal>(char *, int, LWLockMode)</literal></entry>
+ <entry><literal>(char *, LWLockMode)</literal></entry>
<entry>Probe that fires when an LWLock was successfully acquired when the
caller specified no waiting.
arg0 is the LWLock's tranche.
- arg1 is the LWLock's offset within its tranche.
- arg2 is the requested lock mode, either exclusive or shared.</entry>
+ arg1 is the requested lock mode, either exclusive or shared.</entry>
</row>
<row>
<entry><literal>lwlock-condacquire-fail</literal></entry>
- <entry><literal>(char *, int, LWLockMode)</literal></entry>
+ <entry><literal>(char *, LWLockMode)</literal></entry>
<entry>Probe that fires when an LWLock was not successfully acquired when
the caller specified no waiting.
arg0 is the LWLock's tranche.
- arg1 is the LWLock's offset within its tranche.
- arg2 is the requested lock mode, either exclusive or shared.</entry>
+ arg1 is the requested lock mode, either exclusive or shared.</entry>
</row>
<row>
<entry><literal>lock-wait-start</literal></entry>
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index bbae584..8b95f35 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -216,9 +216,6 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
Assert(strlen(name) + 1 < SLRU_MAX_NAME_LENGTH);
strlcpy(shared->lwlock_tranche_name, name, SLRU_MAX_NAME_LENGTH);
shared->lwlock_tranche_id = tranche_id;
- shared->lwlock_tranche.name = shared->lwlock_tranche_name;
- shared->lwlock_tranche.array_base = shared->buffer_locks;
- shared->lwlock_tranche.array_stride = sizeof(LWLockPadded);
ptr += BUFFERALIGN(offset);
for (slotno = 0; slotno < nslots; slotno++)
@@ -237,7 +234,8 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
Assert(found);
/* Register SLRU tranche in the main tranches array */
- LWLockRegisterTranche(shared->lwlock_tranche_id, &shared->lwlock_tranche);
+ LWLockRegisterTranche(shared->lwlock_tranche_id,
+ shared->lwlock_tranche_name);
/*
* Initialize the unshared control struct, including directory path. We
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 084401d..aa9ee5a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -517,7 +517,6 @@ typedef struct XLogCtlInsert
* WAL insertion locks.
*/
WALInsertLockPadded *WALInsertLocks;
- LWLockTranche WALInsertLockTranche;
} XLogCtlInsert;
/*
@@ -4688,7 +4687,7 @@ XLOGShmemInit(void)
/* Initialize local copy of WALInsertLocks and register the tranche */
WALInsertLocks = XLogCtl->Insert.WALInsertLocks;
LWLockRegisterTranche(LWTRANCHE_WAL_INSERT,
- &XLogCtl->Insert.WALInsertLockTranche);
+ "wal_insert");
return;
}
memset(XLogCtl, 0, sizeof(XLogCtlData));
@@ -4711,11 +4710,7 @@ XLOGShmemInit(void)
(WALInsertLockPadded *) allocptr;
allocptr += sizeof(WALInsertLockPadded) * NUM_XLOGINSERT_LOCKS;
- XLogCtl->Insert.WALInsertLockTranche.name = "wal_insert";
- XLogCtl->Insert.WALInsertLockTranche.array_base = WALInsertLocks;
- XLogCtl->Insert.WALInsertLockTranche.array_stride = sizeof(WALInsertLockPadded);
-
- LWLockRegisterTranche(LWTRANCHE_WAL_INSERT, &XLogCtl->Insert.WALInsertLockTranche);
+ LWLockRegisterTranche(LWTRANCHE_WAL_INSERT, "wal_insert");
for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
{
LWLockInitialize(&WALInsertLocks[i].l.lock, LWTRANCHE_WAL_INSERT);
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index c7584cb..61e6a2c 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3152,11 +3152,8 @@ pgstat_get_wait_event_type(uint32 wait_event_info)
switch (classId)
{
- case PG_WAIT_LWLOCK_NAMED:
- event_type = "LWLockNamed";
- break;
- case PG_WAIT_LWLOCK_TRANCHE:
- event_type = "LWLockTranche";
+ case PG_WAIT_LWLOCK:
+ event_type = "LWLock";
break;
case PG_WAIT_LOCK:
event_type = "Lock";
@@ -3209,8 +3206,7 @@ pgstat_get_wait_event(uint32 wait_event_info)
switch (classId)
{
- case PG_WAIT_LWLOCK_NAMED:
- case PG_WAIT_LWLOCK_TRANCHE:
+ case PG_WAIT_LWLOCK:
event_name = GetLWLockIdentifier(classId, eventId);
break;
case PG_WAIT_LOCK:
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index cc2b513..0deec75 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -143,7 +143,6 @@ typedef struct ReplicationStateOnDisk
typedef struct ReplicationStateCtl
{
int tranche_id;
- LWLockTranche tranche;
ReplicationState states[FLEXIBLE_ARRAY_MEMBER];
} ReplicationStateCtl;
@@ -474,11 +473,6 @@ ReplicationOriginShmemInit(void)
int i;
replication_states_ctl->tranche_id = LWTRANCHE_REPLICATION_ORIGIN;
- replication_states_ctl->tranche.name = "replication_origin";
- replication_states_ctl->tranche.array_base =
- &replication_states[0].lock;
- replication_states_ctl->tranche.array_stride =
- sizeof(ReplicationState);
MemSet(replication_states, 0, ReplicationOriginShmemSize());
@@ -488,7 +482,7 @@ ReplicationOriginShmemInit(void)
}
LWLockRegisterTranche(replication_states_ctl->tranche_id,
- &replication_states_ctl->tranche);
+ "replication_origin");
}
/* ---------------------------------------------------------------------------
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index d8ed005..cf814d1 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -98,8 +98,6 @@ ReplicationSlot *MyReplicationSlot = NULL;
int max_replication_slots = 0; /* the maximum number of replication
* slots */
-static LWLockTranche ReplSlotIOLWLockTranche;
-
static void ReplicationSlotDropAcquired(void);
static void ReplicationSlotDropPtr(ReplicationSlot *slot);
@@ -141,12 +139,8 @@ ReplicationSlotsShmemInit(void)
ShmemInitStruct("ReplicationSlot Ctl", ReplicationSlotsShmemSize(),
&found);
- ReplSlotIOLWLockTranche.name = "replication_slot_io";
- ReplSlotIOLWLockTranche.array_base =
- ((char *) ReplicationSlotCtl) + offsetof(ReplicationSlotCtlData, replication_slots) +offsetof(ReplicationSlot, io_in_progress_lock);
- ReplSlotIOLWLockTranche.array_stride = sizeof(ReplicationSlot);
LWLockRegisterTranche(LWTRANCHE_REPLICATION_SLOT_IO_IN_PROGRESS,
- &ReplSlotIOLWLockTranche);
+ "replication_slot_io");
if (!found)
{
diff --git a/src/backend/storage/buffer/buf_init.c b/src/backend/storage/buffer/buf_init.c
index a4163cf..c507389 100644
--- a/src/backend/storage/buffer/buf_init.c
+++ b/src/backend/storage/buffer/buf_init.c
@@ -21,8 +21,6 @@
BufferDescPadded *BufferDescriptors;
char *BufferBlocks;
LWLockMinimallyPadded *BufferIOLWLockArray = NULL;
-LWLockTranche BufferIOLWLockTranche;
-LWLockTranche BufferContentLWLockTranche;
WritebackContext BackendWritebackContext;
CkptSortItem *CkptBufferIds;
@@ -90,18 +88,8 @@ InitBufferPool(void)
NBuffers * (Size) sizeof(LWLockMinimallyPadded),
&foundIOLocks);
- BufferIOLWLockTranche.name = "buffer_io";
- BufferIOLWLockTranche.array_base = BufferIOLWLockArray;
- BufferIOLWLockTranche.array_stride = sizeof(LWLockMinimallyPadded);
- LWLockRegisterTranche(LWTRANCHE_BUFFER_IO_IN_PROGRESS,
- &BufferIOLWLockTranche);
-
- BufferContentLWLockTranche.name = "buffer_content";
- BufferContentLWLockTranche.array_base =
- ((char *) BufferDescriptors) + offsetof(BufferDesc, content_lock);
- BufferContentLWLockTranche.array_stride = sizeof(BufferDescPadded);
- LWLockRegisterTranche(LWTRANCHE_BUFFER_CONTENT,
- &BufferContentLWLockTranche);
+ LWLockRegisterTranche(LWTRANCHE_BUFFER_IO_IN_PROGRESS, "buffer_io");
+ LWLockRegisterTranche(LWTRANCHE_BUFFER_CONTENT, "buffer_content");
/*
* The array used to sort to-be-checkpointed buffer ids is located in
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index bf38470..0f63755 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -106,9 +106,6 @@ static TransactionId *KnownAssignedXids;
static bool *KnownAssignedXidsValid;
static TransactionId latestObservedXid = InvalidTransactionId;
-/* LWLock tranche for backend locks */
-static LWLockTranche ProcLWLockTranche;
-
/*
* If we're in STANDBY_SNAPSHOT_PENDING state, standbySnapshotPendingXmin is
* the highest xid that might still be running that we don't have in
@@ -266,11 +263,7 @@ CreateSharedProcArray(void)
}
/* Register and initialize fields of ProcLWLockTranche */
- ProcLWLockTranche.name = "proc";
- ProcLWLockTranche.array_base = (char *) (ProcGlobal->allProcs) +
- offsetof(PGPROC, backendLock);
- ProcLWLockTranche.array_stride = sizeof(PGPROC);
- LWLockRegisterTranche(LWTRANCHE_PROC, &ProcLWLockTranche);
+ LWLockRegisterTranche(LWTRANCHE_PROC, "proc");
}
/*
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 03c4c78..4b381e4 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -108,18 +108,14 @@ extern slock_t *ShmemLock;
#define LW_SHARED_MASK ((uint32) ((1 << 24)-1))
/*
- * This is indexed by tranche ID and stores metadata for all tranches known
+ * This is indexed by tranche ID and stores the names of all tranches known
* to the current backend.
*/
-static LWLockTranche **LWLockTrancheArray = NULL;
+static char **LWLockTrancheArray = NULL;
static int LWLockTranchesAllocated = 0;
#define T_NAME(lock) \
- (LWLockTrancheArray[(lock)->tranche]->name)
-#define T_ID(lock) \
- ((int) ((((char *) lock) - \
- ((char *) LWLockTrancheArray[(lock)->tranche]->array_base)) / \
- LWLockTrancheArray[(lock)->tranche]->array_stride))
+ (LWLockTrancheArray[(lock)->tranche])
/*
* This points to the main array of LWLocks in shared memory. Backends inherit
@@ -127,10 +123,6 @@ static int LWLockTranchesAllocated = 0;
* where we have special measures to pass it down).
*/
LWLockPadded *MainLWLockArray = NULL;
-static LWLockTranche MainLWLockTranche;
-static LWLockTranche BufMappingLWLockTranche;
-static LWLockTranche LockManagerLWLockTranche;
-static LWLockTranche PredicateLockManagerLWLockTranche;
/*
* We use this structure to keep track of locked LWLocks for release
@@ -175,7 +167,7 @@ static inline void LWLockReportWaitEnd(void);
typedef struct lwlock_stats_key
{
int tranche;
- int instance;
+ void *instance;
} lwlock_stats_key;
typedef struct lwlock_stats
@@ -202,32 +194,18 @@ PRINT_LWDEBUG(const char *where, LWLock *lock, LWLockMode mode)
if (Trace_lwlocks)
{
uint32 state = pg_atomic_read_u32(&lock->state);
- int id = T_ID(lock);
-
- if (lock->tranche == 0 && id < NUM_INDIVIDUAL_LWLOCKS)
- ereport(LOG,
- (errhidestmt(true),
- errhidecontext(true),
- errmsg_internal("%d: %s(%s): excl %u shared %u haswaiters %u waiters %u rOK %d",
- MyProcPid,
- where, MainLWLockNames[id],
- (state & LW_VAL_EXCLUSIVE) != 0,
- state & LW_SHARED_MASK,
- (state & LW_FLAG_HAS_WAITERS) != 0,
- pg_atomic_read_u32(&lock->nwaiters),
- (state & LW_FLAG_RELEASE_OK) != 0)));
- else
- ereport(LOG,
- (errhidestmt(true),
- errhidecontext(true),
- errmsg_internal("%d: %s(%s %d): excl %u shared %u haswaiters %u waiters %u rOK %d",
- MyProcPid,
- where, T_NAME(lock), id,
- (state & LW_VAL_EXCLUSIVE) != 0,
- state & LW_SHARED_MASK,
- (state & LW_FLAG_HAS_WAITERS) != 0,
- pg_atomic_read_u32(&lock->nwaiters),
- (state & LW_FLAG_RELEASE_OK) != 0)));
+
+ ereport(LOG,
+ (errhidestmt(true),
+ errhidecontext(true),
+ errmsg_internal("%d: %s(%s %p): excl %u shared %u haswaiters %u waiters %u rOK %d",
+ MyProcPid,
+ where, T_NAME(lock), lock,
+ (state & LW_VAL_EXCLUSIVE) != 0,
+ state & LW_SHARED_MASK,
+ (state & LW_FLAG_HAS_WAITERS) != 0,
+ pg_atomic_read_u32(&lock->nwaiters),
+ (state & LW_FLAG_RELEASE_OK) != 0)));
}
}
@@ -237,20 +215,11 @@ LOG_LWDEBUG(const char *where, LWLock *lock, const char *msg)
/* hide statement & context here, otherwise the log is just too verbose */
if (Trace_lwlocks)
{
- int id = T_ID(lock);
-
- if (lock->tranche == 0 && id < NUM_INDIVIDUAL_LWLOCKS)
- ereport(LOG,
- (errhidestmt(true),
- errhidecontext(true),
- errmsg_internal("%s(%s): %s", where,
- MainLWLockNames[id], msg)));
- else
- ereport(LOG,
- (errhidestmt(true),
- errhidecontext(true),
- errmsg_internal("%s(%s %d): %s", where,
- T_NAME(lock), id, msg)));
+ ereport(LOG,
+ (errhidestmt(true),
+ errhidecontext(true),
+ errmsg_internal("%s(%s %p): %s", where,
+ T_NAME(lock), lock, msg)));
}
}
@@ -315,8 +284,8 @@ print_lwlock_stats(int code, Datum arg)
while ((lwstats = (lwlock_stats *) hash_seq_search(&scan)) != NULL)
{
fprintf(stderr,
- "PID %d lwlock %s %d: shacq %u exacq %u blk %u spindelay %u dequeue self %u\n",
- MyProcPid, LWLockTrancheArray[lwstats->key.tranche]->name,
+ "PID %d lwlock %s %p: shacq %u exacq %u blk %u spindelay %u dequeue self %u\n",
+ MyProcPid, LWLockTrancheArray[lwstats->key.tranche],
lwstats->key.instance, lwstats->sh_acquire_count,
lwstats->ex_acquire_count, lwstats->block_count,
lwstats->spin_delay_count, lwstats->dequeue_self_count);
@@ -342,7 +311,7 @@ get_lwlock_stats_entry(LWLock *lock)
/* Fetch or create the entry. */
key.tranche = lock->tranche;
- key.instance = T_ID(lock);
+ key.instance = lock;
lwstats = hash_search(lwlock_stats_htab, &key, HASH_ENTER, &found);
if (!found)
{
@@ -464,7 +433,7 @@ InitializeLWLocks(void)
/* Initialize all individual LWLocks in main array */
for (id = 0, lock = MainLWLockArray; id < NUM_INDIVIDUAL_LWLOCKS; id++, lock++)
- LWLockInitialize(&lock->lock, LWTRANCHE_MAIN);
+ LWLockInitialize(&lock->lock, id);
/* Initialize buffer mapping LWLocks in main array */
lock = MainLWLockArray + NUM_INDIVIDUAL_LWLOCKS;
@@ -506,10 +475,8 @@ InitializeLWLocks(void)
name = trancheNames;
trancheNames += strlen(request->tranche_name) + 1;
strcpy(name, request->tranche_name);
- tranche->lwLockTranche.name = name;
tranche->trancheId = LWLockNewTrancheId();
- tranche->lwLockTranche.array_base = lock;
- tranche->lwLockTranche.array_stride = sizeof(LWLockPadded);
+ tranche->trancheName = name;
for (j = 0; j < request->num_lwlocks; j++, lock++)
LWLockInitialize(&lock->lock, tranche->trancheId);
@@ -527,39 +494,25 @@ RegisterLWLockTranches(void)
if (LWLockTrancheArray == NULL)
{
- LWLockTranchesAllocated = 32;
- LWLockTrancheArray = (LWLockTranche **)
+ LWLockTranchesAllocated = 64;
+ LWLockTrancheArray = (char **)
MemoryContextAllocZero(TopMemoryContext,
- LWLockTranchesAllocated * sizeof(LWLockTranche *));
+ LWLockTranchesAllocated * sizeof(char *));
Assert(LWLockTranchesAllocated >= LWTRANCHE_FIRST_USER_DEFINED);
}
- MainLWLockTranche.name = "main";
- MainLWLockTranche.array_base = MainLWLockArray;
- MainLWLockTranche.array_stride = sizeof(LWLockPadded);
- LWLockRegisterTranche(LWTRANCHE_MAIN, &MainLWLockTranche);
+ for (i = 0; i < NUM_INDIVIDUAL_LWLOCKS; ++i)
+ LWLockRegisterTranche(i, MainLWLockNames[i]);
- BufMappingLWLockTranche.name = "buffer_mapping";
- BufMappingLWLockTranche.array_base = MainLWLockArray + NUM_INDIVIDUAL_LWLOCKS;
- BufMappingLWLockTranche.array_stride = sizeof(LWLockPadded);
- LWLockRegisterTranche(LWTRANCHE_BUFFER_MAPPING, &BufMappingLWLockTranche);
-
- LockManagerLWLockTranche.name = "lock_manager";
- LockManagerLWLockTranche.array_base = MainLWLockArray + NUM_INDIVIDUAL_LWLOCKS +
- NUM_BUFFER_PARTITIONS;
- LockManagerLWLockTranche.array_stride = sizeof(LWLockPadded);
- LWLockRegisterTranche(LWTRANCHE_LOCK_MANAGER, &LockManagerLWLockTranche);
-
- PredicateLockManagerLWLockTranche.name = "predicate_lock_manager";
- PredicateLockManagerLWLockTranche.array_base = MainLWLockArray + NUM_INDIVIDUAL_LWLOCKS +
- NUM_BUFFER_PARTITIONS + NUM_LOCK_PARTITIONS;
- PredicateLockManagerLWLockTranche.array_stride = sizeof(LWLockPadded);
- LWLockRegisterTranche(LWTRANCHE_PREDICATE_LOCK_MANAGER, &PredicateLockManagerLWLockTranche);
+ LWLockRegisterTranche(LWTRANCHE_BUFFER_MAPPING, "buffer_mapping");
+ LWLockRegisterTranche(LWTRANCHE_LOCK_MANAGER, "lock_manager");
+ LWLockRegisterTranche(LWTRANCHE_PREDICATE_LOCK_MANAGER,
+ "predicate_lock_manager");
/* Register named tranches. */
for (i = 0; i < NamedLWLockTrancheRequests; i++)
LWLockRegisterTranche(NamedLWLockTrancheArray[i].trancheId,
- &NamedLWLockTrancheArray[i].lwLockTranche);
+ NamedLWLockTrancheArray[i].trancheName);
}
/*
@@ -633,7 +586,7 @@ LWLockNewTrancheId(void)
* (TopMemoryContext, static variable, or similar).
*/
void
-LWLockRegisterTranche(int tranche_id, LWLockTranche *tranche)
+LWLockRegisterTranche(int tranche_id, char *tranche_name)
{
Assert(LWLockTrancheArray != NULL);
@@ -645,15 +598,14 @@ LWLockRegisterTranche(int tranche_id, LWLockTranche *tranche)
while (i <= tranche_id)
i *= 2;
- LWLockTrancheArray = (LWLockTranche **)
- repalloc(LWLockTrancheArray,
- i * sizeof(LWLockTranche *));
+ LWLockTrancheArray = (char **)
+ repalloc(LWLockTrancheArray, i * sizeof(char *));
LWLockTranchesAllocated = i;
while (j < LWLockTranchesAllocated)
LWLockTrancheArray[j++] = NULL;
}
- LWLockTrancheArray[tranche_id] = tranche;
+ LWLockTrancheArray[tranche_id] = tranche_name;
}
/*
@@ -729,12 +681,7 @@ LWLockInitialize(LWLock *lock, int tranche_id)
static inline void
LWLockReportWaitStart(LWLock *lock)
{
- int lockId = T_ID(lock);
-
- if (lock->tranche == 0)
- pgstat_report_wait_start(PG_WAIT_LWLOCK_NAMED | (uint16) lockId);
- else
- pgstat_report_wait_start(PG_WAIT_LWLOCK_TRANCHE | lock->tranche);
+ pgstat_report_wait_start(PG_WAIT_LWLOCK | lock->tranche);
}
/*
@@ -752,10 +699,7 @@ LWLockReportWaitEnd(void)
const char *
GetLWLockIdentifier(uint32 classId, uint16 eventId)
{
- if (classId == PG_WAIT_LWLOCK_NAMED)
- return MainLWLockNames[eventId];
-
- Assert(classId == PG_WAIT_LWLOCK_TRANCHE);
+ Assert(classId == PG_WAIT_LWLOCK);
/*
* It is quite possible that user has registered tranche in one of the
@@ -763,10 +707,10 @@ GetLWLockIdentifier(uint32 classId, uint16 eventId)
* all of them, so we can't assume the tranche is registered here.
*/
if (eventId >= LWLockTranchesAllocated ||
- LWLockTrancheArray[eventId]->name == NULL)
+ LWLockTrancheArray[eventId] == NULL)
return "extension";
- return LWLockTrancheArray[eventId]->name;
+ return LWLockTrancheArray[eventId];
}
/*
@@ -1279,7 +1223,7 @@ LWLockAcquire(LWLock *lock, LWLockMode mode)
#endif
LWLockReportWaitStart(lock);
- TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), T_ID(lock), mode);
+ TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode);
for (;;)
{
@@ -1301,7 +1245,7 @@ LWLockAcquire(LWLock *lock, LWLockMode mode)
}
#endif
- TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), T_ID(lock), mode);
+ TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode);
LWLockReportWaitEnd();
LOG_LWDEBUG("LWLockAcquire", lock, "awakened");
@@ -1310,7 +1254,7 @@ LWLockAcquire(LWLock *lock, LWLockMode mode)
result = false;
}
- TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), T_ID(lock), mode);
+ TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), mode);
/* Add lock to list of locks held by this backend */
held_lwlocks[num_held_lwlocks].lock = lock;
@@ -1361,14 +1305,14 @@ LWLockConditionalAcquire(LWLock *lock, LWLockMode mode)
RESUME_INTERRUPTS();
LOG_LWDEBUG("LWLockConditionalAcquire", lock, "failed");
- TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_FAIL(T_NAME(lock), T_ID(lock), mode);
+ TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_FAIL(T_NAME(lock), mode);
}
else
{
/* Add lock to list of locks held by this backend */
held_lwlocks[num_held_lwlocks].lock = lock;
held_lwlocks[num_held_lwlocks++].mode = mode;
- TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE(T_NAME(lock), T_ID(lock), mode);
+ TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE(T_NAME(lock), mode);
}
return !mustwait;
}
@@ -1440,7 +1384,7 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
#endif
LWLockReportWaitStart(lock);
- TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), T_ID(lock), mode);
+ TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode);
for (;;)
{
@@ -1458,7 +1402,7 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
Assert(nwaiters < MAX_BACKENDS);
}
#endif
- TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), T_ID(lock), mode);
+ TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode);
LWLockReportWaitEnd();
LOG_LWDEBUG("LWLockAcquireOrWait", lock, "awakened");
@@ -1488,8 +1432,7 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
/* Failed to get lock, so release interrupt holdoff */
RESUME_INTERRUPTS();
LOG_LWDEBUG("LWLockAcquireOrWait", lock, "failed");
- TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT_FAIL(T_NAME(lock), T_ID(lock),
- mode);
+ TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT_FAIL(T_NAME(lock), mode);
}
else
{
@@ -1497,7 +1440,7 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
/* Add lock to list of locks held by this backend */
held_lwlocks[num_held_lwlocks].lock = lock;
held_lwlocks[num_held_lwlocks++].mode = mode;
- TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT(T_NAME(lock), T_ID(lock), mode);
+ TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT(T_NAME(lock), mode);
}
return !mustwait;
@@ -1657,8 +1600,7 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
#endif
LWLockReportWaitStart(lock);
- TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), T_ID(lock),
- LW_EXCLUSIVE);
+ TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), LW_EXCLUSIVE);
for (;;)
{
@@ -1677,8 +1619,7 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
}
#endif
- TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), T_ID(lock),
- LW_EXCLUSIVE);
+ TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), LW_EXCLUSIVE);
LWLockReportWaitEnd();
LOG_LWDEBUG("LWLockWaitForVar", lock, "awakened");
@@ -1686,7 +1627,7 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
/* Now loop back and check the status of the lock again. */
}
- TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), T_ID(lock), LW_EXCLUSIVE);
+ TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), LW_EXCLUSIVE);
/*
* Fix the process wait semaphore's count for any absorbed wakeups.
@@ -1784,7 +1725,7 @@ LWLockRelease(LWLock *lock)
break;
if (i < 0)
- elog(ERROR, "lock %s %d is not held", T_NAME(lock), T_ID(lock));
+ elog(ERROR, "lock %s is not held", T_NAME(lock));
mode = held_lwlocks[i].mode;
@@ -1829,7 +1770,7 @@ LWLockRelease(LWLock *lock)
LWLockWakeup(lock);
}
- TRACE_POSTGRESQL_LWLOCK_RELEASE(T_NAME(lock), T_ID(lock));
+ TRACE_POSTGRESQL_LWLOCK_RELEASE(T_NAME(lock));
/*
* Now okay to allow cancel/die interrupts.
diff --git a/src/backend/utils/mmgr/dsa.c b/src/backend/utils/mmgr/dsa.c
index abe3f1a..988a297 100644
--- a/src/backend/utils/mmgr/dsa.c
+++ b/src/backend/utils/mmgr/dsa.c
@@ -362,9 +362,6 @@ struct dsa_area
/* Pointer to the control object in shared memory. */
dsa_area_control *control;
- /* The lock tranche for this process. */
- LWLockTranche lwlock_tranche;
-
/* Has the mapping been pinned? */
bool mapping_pinned;
@@ -1207,10 +1204,8 @@ create_internal(void *place, size_t size,
area->mapping_pinned = false;
memset(area->segment_maps, 0, sizeof(dsa_segment_map) * DSA_MAX_SEGMENTS);
area->high_segment_index = 0;
- area->lwlock_tranche.array_base = &area->control->pools[0];
- area->lwlock_tranche.array_stride = sizeof(dsa_area_pool);
- area->lwlock_tranche.name = control->lwlock_tranche_name;
- LWLockRegisterTranche(control->lwlock_tranche_id, &area->lwlock_tranche);
+ LWLockRegisterTranche(control->lwlock_tranche_id,
+ control->lwlock_tranche_name);
LWLockInitialize(&control->lock, control->lwlock_tranche_id);
for (i = 0; i < DSA_NUM_SIZE_CLASSES; ++i)
LWLockInitialize(DSA_SCLASS_LOCK(area, i),
@@ -1267,10 +1262,8 @@ attach_internal(void *place, dsm_segment *segment, dsa_handle handle)
memset(&area->segment_maps[0], 0,
sizeof(dsa_segment_map) * DSA_MAX_SEGMENTS);
area->high_segment_index = 0;
- area->lwlock_tranche.array_base = &area->control->pools[0];
- area->lwlock_tranche.array_stride = sizeof(dsa_area_pool);
- area->lwlock_tranche.name = control->lwlock_tranche_name;
- LWLockRegisterTranche(control->lwlock_tranche_id, &area->lwlock_tranche);
+ LWLockRegisterTranche(control->lwlock_tranche_id,
+ control->lwlock_tranche_name);
/* Set up the segment map for this process's mapping. */
segment_map = &area->segment_maps[0];
diff --git a/src/backend/utils/probes.d b/src/backend/utils/probes.d
index 2f92dfa..adcebe2 100644
--- a/src/backend/utils/probes.d
+++ b/src/backend/utils/probes.d
@@ -28,14 +28,14 @@ provider postgresql {
probe transaction__commit(LocalTransactionId);
probe transaction__abort(LocalTransactionId);
- probe lwlock__acquire(const char *, int, LWLockMode);
- probe lwlock__release(const char *, int);
- probe lwlock__wait__start(const char *, int, LWLockMode);
- probe lwlock__wait__done(const char *, int, LWLockMode);
- probe lwlock__condacquire(const char *, int, LWLockMode);
- probe lwlock__condacquire__fail(const char *, int, LWLockMode);
- probe lwlock__acquire__or__wait(const char *, int, LWLockMode);
- probe lwlock__acquire__or__wait__fail(const char *, int, LWLockMode);
+ probe lwlock__acquire(const char *, LWLockMode);
+ probe lwlock__release(const char *);
+ probe lwlock__wait__start(const char *, LWLockMode);
+ probe lwlock__wait__done(const char *, LWLockMode);
+ probe lwlock__condacquire(const char *, LWLockMode);
+ probe lwlock__condacquire__fail(const char *, LWLockMode);
+ probe lwlock__acquire__or__wait(const char *, LWLockMode);
+ probe lwlock__acquire__or__wait__fail(const char *, LWLockMode);
probe lock__wait__start(unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, LOCKMODE);
probe lock__wait__done(unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, LOCKMODE);
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index 5fcebc5..ca8ab7b 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -104,7 +104,6 @@ typedef struct SlruSharedData
/* LWLocks */
int lwlock_tranche_id;
- LWLockTranche lwlock_tranche;
char lwlock_tranche_name[SLRU_MAX_NAME_LENGTH];
LWLockPadded *buffer_locks;
} SlruSharedData;
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 152ff06..282f8ae 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -715,8 +715,7 @@ typedef enum BackendState
* Wait Classes
* ----------
*/
-#define PG_WAIT_LWLOCK_NAMED 0x01000000U
-#define PG_WAIT_LWLOCK_TRANCHE 0x02000000U
+#define PG_WAIT_LWLOCK 0x01000000U
#define PG_WAIT_LOCK 0x03000000U
#define PG_WAIT_BUFFER_PIN 0x04000000U
#define PG_WAIT_ACTIVITY 0x05000000U
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 9a2d869..db1c687 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -25,32 +25,6 @@
struct PGPROC;
/*
- * Prior to PostgreSQL 9.4, every lightweight lock in the system was stored
- * in a single array. For convenience and for compatibility with past
- * releases, we still have a main array, but it's now also permissible to
- * store LWLocks elsewhere in the main shared memory segment or in a dynamic
- * shared memory segment. Each array of lwlocks forms a separate "tranche".
- *
- * It's occasionally necessary to identify a particular LWLock "by name"; e.g.
- * because we wish to report the lock to dtrace. We could store a name or
- * other identifying information in the lock itself, but since it's common
- * to have many nearly-identical locks (e.g. one per buffer) this would end
- * up wasting significant amounts of memory. Instead, each lwlock stores a
- * tranche ID which tells us which array it's part of. Based on that, we can
- * figure out where the lwlock lies within the array using the data structure
- * shown below; the lock is then identified based on the tranche name and
- * computed array index. We need the array stride because the array might not
- * be an array of lwlocks, but rather some larger data structure that includes
- * one or more lwlocks per element.
- */
-typedef struct LWLockTranche
-{
- const char *name;
- void *array_base;
- Size array_stride;
-} LWLockTranche;
-
-/*
* Code outside of lwlock.c should not manipulate the contents of this
* structure directly, but we have to declare it here to allow LWLocks to be
* incorporated into other data structures.
@@ -118,8 +92,8 @@ extern char *MainLWLockNames[];
/* struct for storing named tranche information */
typedef struct NamedLWLockTranche
{
- LWLockTranche lwLockTranche;
int trancheId;
+ char *trancheName;
} NamedLWLockTranche;
extern PGDLLIMPORT NamedLWLockTranche *NamedLWLockTrancheArray;
@@ -199,9 +173,9 @@ extern LWLockPadded *GetNamedLWLockTranche(const char *tranche_name);
* There is another, more flexible method of obtaining lwlocks. First, call
* LWLockNewTrancheId just once to obtain a tranche ID; this allocates from
* a shared counter. Next, each individual process using the tranche should
- * call LWLockRegisterTranche() to associate that tranche ID with appropriate
- * metadata. Finally, LWLockInitialize should be called just once per lwlock,
- * passing the tranche ID as an argument.
+ * call LWLockRegisterTranche() to associate that tranche ID with a name.
+ * Finally, LWLockInitialize should be called just once per lwlock, passing
+ * the tranche ID as an argument.
*
* It may seem strange that each process using the tranche must register it
* separately, but dynamic shared memory segments aren't guaranteed to be
@@ -209,17 +183,18 @@ extern LWLockPadded *GetNamedLWLockTranche(const char *tranche_name);
* registration in the main shared memory segment wouldn't work for that case.
*/
extern int LWLockNewTrancheId(void);
-extern void LWLockRegisterTranche(int tranche_id, LWLockTranche *tranche);
+extern void LWLockRegisterTranche(int tranche_id, char *tranche_name);
extern void LWLockInitialize(LWLock *lock, int tranche_id);
/*
- * We reserve a few predefined tranche IDs. A call to LWLockNewTrancheId
- * will never return a value less than LWTRANCHE_FIRST_USER_DEFINED.
+ * Every tranche ID less than NUM_INDIVIDUAL_LWLOCKS is reserved; also,
+ * we reserve additional tranche IDs for builtin tranches not included in
+ * the set of individual LWLocks. A call to LWLockNewTrancheId will never
+ * return a value less than LWTRANCHE_FIRST_USER_DEFINED.
*/
typedef enum BuiltinTrancheIds
{
- LWTRANCHE_MAIN,
- LWTRANCHE_CLOG_BUFFERS,
+ LWTRANCHE_CLOG_BUFFERS = NUM_INDIVIDUAL_LWLOCKS,
LWTRANCHE_COMMITTS_BUFFERS,
LWTRANCHE_SUBTRANS_BUFFERS,
LWTRANCHE_MXACTOFFSET_BUFFERS,
On Wed, Dec 14, 2016 at 3:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:
Thoughts?
Hearing no objections, I've gone ahead and committed this. If that
makes somebody really unhappy I can revert it, but I am betting that
the real story is that nobody cares about preserving T_ID().
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2016-12-16 11:41:49 -0500, Robert Haas wrote:
On Wed, Dec 14, 2016 at 3:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:
Thoughts?
Hearing no objections, I've gone ahead and committed this. If that
makes somebody really unhappy I can revert it, but I am betting that
the real story is that nobody cares about preserving T_ID().
I don't care about T_ID, but I do care about breaking extensions using
lwlocks like for the 3rd release in a row or such. This is getting a
bit ridiculous.
On Fri, Dec 16, 2016 at 12:28 PM, Andres Freund <andres@anarazel.de> wrote:
On 2016-12-16 11:41:49 -0500, Robert Haas wrote:
On Wed, Dec 14, 2016 at 3:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:
Thoughts?
Hearing no objections, I've gone ahead and committed this. If that
makes somebody really unhappy I can revert it, but I am betting that
the real story is that nobody cares about preserving T_ID().
I don't care about T_ID, but I do care about breaking extensions using
lwlocks like for the 3rd release in a row or such. This is getting a
bit ridiculous.
Hmm, I hadn't thought about that. :-)
I guess we could put back array_base/array_stride and just ignore
them, but that hardly seems better. Then we're stuck with that wart
forever.
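
Concretely, such a shim would amount to something like this sketch (not
proposed; it assumes postgres.h and storage/lwlock.h as in backend code,
the old struct fields are kept only to be ignored, and the wrapper name
is invented):

typedef struct LWLockTranche
{
    const char *name;
    void       *array_base;     /* ignored */
    Size        array_stride;   /* ignored */
} LWLockTranche;

static inline void
LWLockRegisterTrancheCompat(int tranche_id, LWLockTranche *tranche)
{
    /* Forward only the name to the new name-based entry point. */
    LWLockRegisterTranche(tranche_id, (char *) tranche->name);
}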
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Dec 16, 2016 at 12:32 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Dec 16, 2016 at 12:28 PM, Andres Freund <andres@anarazel.de> wrote:
On 2016-12-16 11:41:49 -0500, Robert Haas wrote:
On Wed, Dec 14, 2016 at 3:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:
Thoughts?
Hearing no objections, I've gone ahead and committed this. If that
makes somebody really unhappy I can revert it, but I am betting that
the real story is that nobody cares about preserving T_ID().I don't care about T_ID, but I do care about breaking extensions using
lwlocks like for the 3rd release in a row or such. This is getting a
bit ridiculous.
Hmm, I hadn't thought about that. :-)
Err, that was supposed to be :-( As in sad, not happy.
I guess we could put back array_base/array_stride and just ignore
them, but that hardly seems better. Then we're stuck with that wart
forever.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas wrote:
On Wed, Dec 14, 2016 at 3:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:
Thoughts?
Hearing no objections, I've gone ahead and committed this. If that
makes somebody really unhappy I can revert it, but I am betting that
the real story is that nobody cares about preserving T_ID().
AFAICT the comment on LWLockRegisterTranche is confused; it talks about
an allocated object being passed, but there isn't any.
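A plausible rewording (a sketch only; the committed text may differ) would
move the lifetime requirement from the tranche object to the name string:

/*
 * Register the association between a tranche ID and a name. The name
 * must point to storage that will outlive all uses of the tranche
 * (TopMemoryContext, a static variable, or similar), since lwlock.c
 * stores only the pointer and does not copy the string.
 */
void LWLockRegisterTranche(int tranche_id, char *tranche_name);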
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2016-12-16 12:32:49 -0500, Robert Haas wrote:
On Fri, Dec 16, 2016 at 12:28 PM, Andres Freund <andres@anarazel.de> wrote:
On 2016-12-16 11:41:49 -0500, Robert Haas wrote:
On Wed, Dec 14, 2016 at 3:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:
Thoughts?
Hearing no objections, I've gone ahead and committed this. If that
makes somebody really unhappy I can revert it, but I am betting that
the real story is that nobody cares about preserving T_ID().
I don't care about T_ID, but I do care about breaking extensions using
lwlocks like for the 3rd release in a row or such. This is getting a
bit ridiculous.
Hmm, I hadn't thought about that. :-)
I guess we could put back array_base/array_stride and just ignore
them, but that hardly seems better. Then we're stuck with that wart
forever.
Yea, I don't think that's good either. I'm all for evolving APIs when
necessary, but constantly breaking the same API release after release
seems indicative of needing to spend a bit more time on it in the first
round. I've a few extensions (one of them citus) that work across
versions, and the ifdef-ery is significant.
Andres
On 2016-12-16 12:33:11 -0500, Robert Haas wrote:
On Fri, Dec 16, 2016 at 12:32 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Dec 16, 2016 at 12:28 PM, Andres Freund <andres@anarazel.de> wrote:
On 2016-12-16 11:41:49 -0500, Robert Haas wrote:
On Wed, Dec 14, 2016 at 3:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:
Thoughts?
Hearing no objections, I've gone ahead and committed this. If that
makes somebody really unhappy I can revert it, but I am betting that
the real story is that nobody cares about preserving T_ID().
I don't care about T_ID, but I do care about breaking extensions using
lwlocks like for the 3rd release in a row or such. This is getting a
bit ridiculous.
Hmm, I hadn't thought about that. :-)
Err, that was supposed to be :-( As in sad, not happy.
Both work for me ;)
On Fri, Dec 16, 2016 at 12:37 PM, Andres Freund <andres@anarazel.de> wrote:
Yea, I don't think that's good either. I'm all for evolving APIs when
necessary, but constantly breaking the same API release after release
seems indicative of needing to spend a bit more time on it in the first
round.
I am not sure the issue was time so much as the ability to foresee all
the problems we'd want to solve. 9.4 added tranches and converted
everything to LWLock * instead of LWLockId, but I think all of the old
APIs still worked. At that point, we didn't have parallel query and
we weren't that close to having it, so I was loath to do anything too
invasive. 9.5 removed LWLockAcquireWithVar() and added
LWLockReleaseClearVar(), but most of the API was still fine. 9.6
moved almost everything to tranches and removed RequestAddinLWLocks()
and LWLockAssign(), which was a big break for extensions -- but that
wasn't because of parallel query, but rather because we wanted to use
tranches to support the wait_event stuff and we also wanted to be able
to pad different tranches differently. This latest change is inspired
by the fact that the 9.4-era changes to support parallel query weren't
quite smart enough to be able to cope with the possibility of multiple
tranches with the same tranche ID in a reasonable way. That last one
is indeed an oversight, but in January of 2014 it wasn't very clear
that we would end up tranche-ifying every LWLock in the system,
without which this change wouldn't be possible. Quite a lot of work
by at least 3 or 4 different people went into that tranche-ification
effort.
I think it's quite surprising how fast the LWLock system has evolved
over the last few years. When I first started working on PostgreSQL
in 2008, there was no LWLockAcquireOrWait, none of the Var stuff, the
padding was much less sophisticated, no tranches, no atomics, volatile
qualifiers all over the place... and all of that has changed in the
last 5 years. Pretty amazing, actually, IMHO. If our LWLocks improve
as much between now and 2021 as they have between 2011 and now, it'll
be worth almost any amount of API breakage to get there. I don't
personally have any plans or ideas that would involve breaking things
for extensions again any time soon, but I won't be very surprised if
somebody else comes up with one.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Dec 16, 2016 at 12:36 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
Robert Haas wrote:
On Wed, Dec 14, 2016 at 3:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:
Thoughts?
Hearing no objections, I've gone ahead and committed this. If that
makes somebody really unhappy I can revert it, but I am betting that
the real story is that nobody cares about preserving T_ID().
AFAICT the comment on LWLockRegisterTranche is confused; it talks about
an allocated object being passed, but there isn't any.
Oops. Thanks, will fix.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas wrote:
I am not sure the issue was time so much as the ability to foresee all
the problems we'd want to solve.
I think all that movement is okay. It's not like we're breaking things
to no purpose. The amount of effort that has to go into making
extensions compile with changed APIs is not *that* bad, surely; it's
pretty clear that we need to keep moving forward. All the changes you
listed that required lwlock changes have clearly been worth the
breakage, IMO.
I think it's quite surprising how fast the LWLock system has evolved
over the last few years. When I first started working on PostgreSQL
in 2008, there was no LWLockAcquireOrWait, none of the Var stuff, the
padding was much less sophisticated, no tranches, no atomics, volatile
qualifiers all over the place... and all of that has changed in the
last 5 years. Pretty amazing, actually, IMHO.
Yes, I agree.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Dec 17, 2016 at 5:41 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Dec 14, 2016 at 3:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:
Thoughts?
Hearing no objections, I've gone ahead and committed this. If that
makes somebody really unhappy I can revert it, but I am betting that
the real story is that nobody cares about preserving T_ID().
I suppose LWLock could include a uint16 member 'id' without making
LWLock any larger, since it would replace the padding between
'tranche' and 'state'. But I think a better solution, if anyone
really wants these T_ID numbers from a dtrace script, would be to add
lock address to the existing lwlock probes, and then figure out a way
to add some new probes to report the base and stride in the right
places so you can do the bookkeeping in dtrace variables.
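
To visualize the layout point, here is a sketch of the struct with the
speculative field added ('tranche' and 'state' are the real fields; 'id'
is the hypothetical addition, and the remaining members are elided):

typedef struct LWLock
{
    uint16      tranche;        /* tranche ID */
    uint16      id;             /* hypothetical: fills the existing padding */
    pg_atomic_uint32 state;     /* lock state, unchanged */
    /* ... remaining fields unchanged ... */
} LWLock;

On typical platforms the uint16 would slot into the alignment padding
before the 32-bit atomic, so sizeof(LWLock) would stay the same.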
--
Thomas Munro
http://www.enterprisedb.com
On Sun, Dec 18, 2016 at 10:33 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
On Sat, Dec 17, 2016 at 5:41 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Dec 14, 2016 at 3:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:
Thoughts?
Hearing no objections, I've gone ahead and committed this. If that
makes somebody really unhappy I can revert it, but I am betting that
the real story is that nobody cares about preserving T_ID().
I suppose LWLock could include a uint16 member 'id' without making
LWLock any larger, since it would replace the padding between
'tranche' and 'state'. But I think a better solution, if anyone
really wants these T_ID numbers from a dtrace script, would be to add
lock address to the existing lwlock probes, and then figure out a way
to add some new probes to report the base and stride in the right
places so you can do the bookkeeping in dtrace variables.
Interesting idea. My bet is that nobody cares about dtrace very much.
probes.d was first added in 2006, and continued to gradually get new
probes (all from submissions by Robert Lor) until 2009. Since then,
nothing has happened except for perfunctory maintenance by various
committers trying to solve other problems who had to maintain the work
that had already been done whether they cared about it or not. (I
notice that the probes lwlock-acquire-or-wait and
lwlock-acquire-or-wait-fail have never been documented.) I would be
tempted to rip the whole thing out as an attractive nuisance, except
that I bet somebody would complain about that...
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Dec 1, 2016 at 6:35 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
On Sat, Nov 26, 2016 at 1:55 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
Here's a new version to apply on top of dsa-v7.patch.
Here's a version to go with dsa-v8.patch.
All right, so I've committed this, but not before (1) renaming some
things that you added, (2) adding documentation for the new LWLock
tranche, (3) not creating the DSA in the "simulated DSM using
backend-private memory" case, and (4) some cosmetic tweaks.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Dec 20, 2016 at 11:12 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Dec 1, 2016 at 6:35 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
On Sat, Nov 26, 2016 at 1:55 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
Here's a new version to apply on top of dsa-v7.patch.
Here's a version to go with dsa-v8.patch.
All right, so I've committed this, but not before (1) renaming some
things that you added, (2) adding documentation for the new LWLock
tranche, (3) not creating the DSA in the "simulated DSM using
backend-private memory" case, and (4) some cosmetic tweaks.
Thanks! Hmm, so now in rare circumstances we can finish up running
ExecXXXInitializeDSM, but later have estate->es_query_dsa == NULL.
This is something to be careful about.
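A defensive sketch of what node code might need (the 'estate' and 'size'
variables are illustrative; dsa_allocate and dsa_get_address are the real
DSA calls):

if (estate->es_query_dsa != NULL)
{
    /* Normal parallel case: build the structure in the shared area. */
    dsa_pointer p = dsa_allocate(estate->es_query_dsa, size);
    void       *addr = dsa_get_address(estate->es_query_dsa, p);

    /* ... initialize the shared data structure at addr ... */
}
else
{
    /* Simulated-DSM case: no area exists, fall back to local memory. */
    void       *addr = palloc(size);

    /* ... initialize a backend-private copy instead ... */
}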
--
Thomas Munro
http://www.enterprisedb.com
On Mon, Dec 19, 2016 at 6:35 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
On Tue, Dec 20, 2016 at 11:12 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Dec 1, 2016 at 6:35 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
On Sat, Nov 26, 2016 at 1:55 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
Here's a new version to apply on top of dsa-v7.patch.
Here's a version to go with dsa-v8.patch.
All right, so I've committed this, but not before (1) renaming some
things that you added, (2) adding documentation for the new LWLock
tranche, (3) not creating the DSA in the "simulated DSM using
backend-private memory" case, and (4) some cosmetic tweaks.Thanks! Hmm, so now in rare circumstances we can finish up running
ExecXXXInitializeDSM, but later have estate->es_query_dsa == NULL.
This is something to be careful about.
Yes.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
At Mon, 19 Dec 2016 12:24:38 -0500, Robert Haas <robertmhaas@gmail.com> wrote in <CA+TgmoY7cHfGQRjYRixEsvBp63Ud9AgBuksc4oWS-Lu7tX5=TA@mail.gmail.com>
Interesting idea. My bet is that nobody cares about dtrace very much.
FWIW, I just had an inquiry about SystemTap for PostgreSQL, but the
person was probing using function names. I mean, at least a few people
care about it, not nobody, but...
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center