Creating a DSA area to provide work space for parallel execution

Started by Thomas Munro over 9 years ago, 25 messages, hackers
#1Thomas Munro
thomas.munro@gmail.com

Hi hackers,

A couple of months ago I proposed dynamic shared areas[1]. DSA areas
are dynamically sized shared memory heaps that backends can use to
share data, building on top of the existing DSM infrastructure.

One target use case for DSA areas is to provide work space for
parallel query execution. To that end, here is a patch to create a
DSA area for use by executor code. The area is automatically attached
to the leader and all worker processes for the duration of the
parallel query, and is available as estate->es_query_area.

Backends already have access to shared memory through a single DSM
segment managed with a table-of-contents. The TOC provides a way to
carve out some shared storage space for individual executor nodes and
look it up later by plan node ID. That works for things like
ParallelHeapScanDescData whose size is known up front, but not so well
if you need something more like a heap in which to build shared data
structures. Through estate->es_query_area, a parallel-aware executor
node can use and recycle arbitrary amounts of shared memory with an
allocate/free interface.
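
To make the allocate/free idea concrete, here is a minimal sketch of a shared area addressed by offsets rather than raw pointers. It is a toy model under stated assumptions, not the DSA implementation: every name (toy_area, toy_pointer, toy_allocate, toy_get_address, toy_free) is hypothetical, all allocations are one fixed-size block, and there is no locking. It only shows why processes would exchange offsets and convert them to local addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: these names are hypothetical, not PostgreSQL's DSA API. */
typedef uint32_t toy_pointer;       /* offset into the area; 0 means invalid */

#define TOY_INVALID 0u
#define TOY_BLOCK   64u             /* every allocation is one fixed-size block */
#define TOY_NBLOCKS 16u

typedef struct toy_area
{
    toy_pointer   free_head;        /* offset of first free block, or TOY_INVALID */
    unsigned char pad[TOY_BLOCK - sizeof(toy_pointer)];
    unsigned char blocks[TOY_NBLOCKS][TOY_BLOCK];
} toy_area;

/* Convert an offset into an address that is valid in this mapping. */
static void *
toy_get_address(toy_area *area, toy_pointer p)
{
    return p == TOY_INVALID ? NULL : (unsigned char *) area + p;
}

/* Thread all blocks onto the free list; links are stored as offsets. */
static void
toy_init(toy_area *area)
{
    area->free_head = TOY_INVALID;
    for (unsigned i = TOY_NBLOCKS; i-- > 0;)
    {
        toy_pointer p = (toy_pointer) (offsetof(toy_area, blocks) + i * TOY_BLOCK);

        memcpy(toy_get_address(area, p), &area->free_head, sizeof(toy_pointer));
        area->free_head = p;
    }
}

/* Pop a block from the free list; returns TOY_INVALID when exhausted. */
static toy_pointer
toy_allocate(toy_area *area)
{
    toy_pointer p = area->free_head;

    if (p != TOY_INVALID)
        memcpy(&area->free_head, toy_get_address(area, p), sizeof(toy_pointer));
    return p;
}

/* Push a block back so that a later allocation can recycle it. */
static void
toy_free(toy_area *area, toy_pointer p)
{
    memcpy(toy_get_address(area, p), &area->free_head, sizeof(toy_pointer));
    area->free_head = p;
}
```

Because the free list and the returned handles are offsets, a copy of the area mapped at a different address still resolves them correctly, which is the property a shared heap needs.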

Motivating use cases include shared bitmaps and shared hash tables
(patches to follow).

For now, this does not remove the need for the existing DSM
segment. In order to share data structures in the DSA area, you need a
way to exchange pointers to find them, and the existing segment + TOC
mechanism is ideal for that.
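
The role of a table of contents in exchanging such pointers can be sketched as a small key-to-offset map that lives at a known place in the segment. This is a toy illustration only: toy_toc, toy_toc_insert and toy_toc_lookup are hypothetical names and do not reproduce the shm_toc API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: hypothetical names, not PostgreSQL's shm_toc API. */
#define TOY_TOC_MAX 8

typedef struct toy_toc_entry
{
    uint64_t key;       /* agreed-upon key, e.g. derived from a plan node ID */
    size_t   offset;    /* where the object lives, relative to segment start */
} toy_toc_entry;

typedef struct toy_toc
{
    int           nentries;
    toy_toc_entry entries[TOY_TOC_MAX];
} toy_toc;

/* Record that the object for 'key' lives at 'offset'. Returns 0 on success. */
static int
toy_toc_insert(toy_toc *toc, uint64_t key, size_t offset)
{
    if (toc->nentries >= TOY_TOC_MAX)
        return -1;
    toc->entries[toc->nentries].key = key;
    toc->entries[toc->nentries].offset = offset;
    toc->nentries++;
    return 0;
}

/* Find the offset registered for 'key'. Returns 0 and sets *offset if found. */
static int
toy_toc_lookup(const toy_toc *toc, uint64_t key, size_t *offset)
{
    for (int i = 0; i < toc->nentries; i++)
    {
        if (toc->entries[i].key == key)
        {
            *offset = toc->entries[i].offset;
            return 0;
        }
    }
    return -1;
}
```

In this picture the leader would insert an entry after building a shared structure, and each worker would look the same key up to find it.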

One obvious problem is that this patch results in at least *two* DSM
segments being created for every parallel query execution: the main
segment used for parallel execution, and then the initial segment
managed by the DSA area. One thought is that DSA areas are the more
general mechanism, so perhaps we should figure out how to store
contents of the existing segment in it. The TOC interface would need
a few tweaks to be able to live in memory allocated with dsa_allocate,
and then we'd need to share that address with other backends so that
they could find it (cf the current approach of finding the TOC at the
start of the segment). I haven't prototyped that yet. That'd involve
changing the wording "InitializeDSM" that appears in various places
including the FDW API, which has been putting me off...

This patch depends on dsa-v2.patch[1].

[1]: /messages/by-id/CAEepm=1z5WLuNoJ80PaCvz6EtG9dN0j-KuHcHtU6QEfcPP5-qA@mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

dsa-area-for-executor-v1.patch (application/octet-stream, +43 -0)
#2Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#1)
Re: Creating a DSA area to provide work space for parallel execution

On Wed, Oct 5, 2016 at 10:32 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

One obvious problem is that this patch results in at least *two* DSM
segments being created for every parallel query execution: the main
segment used for parallel execution, and then the initial segment
managed by the DSA area. One thought is that DSA areas are the more
general mechanism, so perhaps we should figure out how to store
contents of the existing segment in it. The TOC interface would need
a few tweaks to be able to live in memory allocated with dsa_allocate,
and then we'd need to share that address with other backends so that
they could find it (cf the current approach of finding the TOC at the
start of the segment). I haven't prototyped that yet. That'd involve
changing the wording "InitializeDSM" that appears in various places
including the FDW API, which has been putting me off...

... or we could allow DSA areas to be constructed inside existing
shmem, as in the attached patch which requires dsa_create_in_place,
from the patch at
/messages/by-id/CAEepm=0-pbokaQdCXhtYn=w64MmdJz4hYW7qcSU235ar276x7w@mail.gmail.com
.

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

dsa-area-for-executor-v2.patch (application/octet-stream, +37 -0)
#3Dilip Kumar
dilipbalaut@gmail.com
In reply to: Thomas Munro (#2)
Re: Creating a DSA area to provide work space for parallel execution

On Wed, Nov 23, 2016 at 5:42 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

... or we could allow DSA areas to be constructed inside existing
shmem, as in the attached patch which requires dsa_create_in_place,
from the patch at
/messages/by-id/CAEepm=0-pbokaQdCXhtYn=w64MmdJz4hYW7qcSU235ar276x7w@mail.gmail.com
.

Seems like there is a problem in this patch.

In the code below, pei->area is not yet allocated at this point, so
estate->es_query_area will always be NULL?

+ estate->es_query_area = pei->area;
+
/* Create a parallel context. */
pcxt = CreateParallelContext(ParallelQueryMain, nworkers);
pei->pcxt = pcxt;
@@ -413,6 +423,10 @@ ExecInitParallelPlan(PlanState *planstate, EState
*estate, int nworkers)
shm_toc_estimate_keys(&pcxt->estimator, 1);
}

+ /* Estimate space for DSA area. */
+ shm_toc_estimate_chunk(&pcxt->estimator, PARALLEL_AREA_SIZE);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
  /* Everyone's had a chance to ask for space, so now create the DSM. */
  InitializeParallelDSM(pcxt);

@@ -466,6 +480,14 @@ ExecInitParallelPlan(PlanState *planstate, EState
*estate, int nworkers)
pei->instrumentation = instrumentation;
}

+ /* Create a DSA area that can be used by the leader and all workers. */
+ area_space = shm_toc_allocate(pcxt->toc, PARALLEL_AREA_SIZE);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_AREA, area_space);
+ pei->area = dsa_create_in_place(area_space, PARALLEL_AREA_SIZE,
+ LWTRANCHE_PARALLEL_EXEC_AREA,
+   "parallel query memory area");
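
The ordering concern raised here can be reduced to a small sketch: a pointer copied before the object is created stays NULL in the copy. The types and functions below are hypothetical stand-ins, not the real executor structures, and the second function shows only one possible fix, not necessarily what a later patch version does.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins: hypothetical types, not the real executor structures. */
typedef struct toy_area { int dummy; } toy_area;
typedef struct toy_pei { toy_area *area; } toy_pei;
typedef struct toy_estate { toy_area *query_area; } toy_estate;

static toy_area the_area;

/* Order as in the quoted hunks: copy the pointer first, create later. */
static void
toy_init_copy_too_early(toy_pei *pei, toy_estate *estate)
{
    pei->area = NULL;                   /* nothing has been created yet */
    estate->query_area = pei->area;     /* captures NULL */
    pei->area = &the_area;              /* creation happens only now */
}

/* One possible fix: publish the pointer only after the area exists. */
static void
toy_init_copy_after_create(toy_pei *pei, toy_estate *estate)
{
    pei->area = &the_area;
    estate->query_area = pei->area;
}
```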

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#3)
Re: Creating a DSA area to provide work space for parallel execution

I have one more question,

In V1 we were calling dsa_detach in ExecParallelCleanup and in
ParallelQueryMain, but it's removed in v2.

Any specific reason?
Does this need to be used differently?

 ExecParallelCleanup(ParallelExecutorInfo *pei)
 {
+ if (pei->area != NULL)
+ {
+ dsa_detach(pei->area);
+ pei->area = NULL;
+ }

After this change, I am getting a DSM segment leak warning.

I am calling dsa_allocate and dsa_free.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#5Thomas Munro
thomas.munro@gmail.com
In reply to: Dilip Kumar (#4)
Re: Creating a DSA area to provide work space for parallel execution

On Fri, Nov 25, 2016 at 4:32 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:

I have one more question,

In V1 we were calling dsa_detach in ExecParallelCleanup and in
ParallelQueryMain, but it's removed in v2.

Any specific reason?
Does this need to be used differently?

ExecParallelCleanup(ParallelExecutorInfo *pei)
{
+ if (pei->area != NULL)
+ {
+ dsa_detach(pei->area);
+ pei->area = NULL;
+ }

After this change, I am getting a DSM segment leak warning.

Thanks! I had some chicken-vs-egg problems dealing with cleanup of
DSM segments belonging to DSA areas created inside DSM segments.
Here's a new version to apply on top of dsa-v7.patch.

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

dsa-area-for-executor-v3.patch (application/octet-stream, +45 -0)
#6Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#5)
Re: Creating a DSA area to provide work space for parallel execution

On Sat, Nov 26, 2016 at 1:55 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

Here's a new version to apply on top of dsa-v7.patch.

Here's a version to go with dsa-v8.patch.

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

dsa-area-for-executor-v4.patch (application/octet-stream, +41 -0)
#7Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Thomas Munro (#6)
Re: Creating a DSA area to provide work space for parallel execution

On Thu, Dec 1, 2016 at 10:35 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:

On Sat, Nov 26, 2016 at 1:55 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

Here's a new version to apply on top of dsa-v7.patch.

Here's a version to go with dsa-v8.patch.

Moved to next CF with "needs review" status.

Regards,
Hari Babu
Fujitsu Australia

#8Robert Haas
robertmhaas@gmail.com
In reply to: Thomas Munro (#6)
Re: Creating a DSA area to provide work space for parallel execution

On Thu, Dec 1, 2016 at 6:35 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Sat, Nov 26, 2016 at 1:55 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

Here's a new version to apply on top of dsa-v7.patch.

Here's a version to go with dsa-v8.patch.

Thomas has spent a fair amount of time beating me up off-list about
the fact that we have no way to recycle LWLock tranche IDs, and I've
spent a fair amount of time trying to defend that decision, but it's
really a problem here. This is all well and good the way it's written
provided that there's only one parallel context in existence at a
time, but should that number ever exceed 1, which it can, then the
tranche's array_base will be incorrect for LWLocks in one or the other
tranche.

That *almost* doesn't matter. If you're not compiling with dtrace or
LOCK_DEBUG or LWLOCK_STATS, you'll be fine. And if you are compiling
with one of those things, I believe the consequences will be no worse
than an occasional nonsensical lock ID. It's halfway tempting to just
accept that as a known wart, but, uggh, that sucks.

It's a bit hard to come up with a better alternative. We could set
aside a certain number of tranche IDs for parallel contexts, say 16.
Then as long as the same backend doesn't try to create more than 16
parallel contexts at the same time, we'll be fine. And a backend
really shouldn't have more than 16 Gather nodes active at the same
time. But it could. In fact, even if the planner never created more
than one Gather node in the same plan (which it sometimes does), the
leader can have multiple parallel contexts active for different
queries at the same time. It could fire off a Gather and then later
some part of that plan that's running in the master could call a
function that tries to execute some OTHER query which also happens to
involve a Gather and then while the master is executing its part of
THAT query the same thing could happen all over again, so any
arbitrary limit we install here can be exceeded in, admittedly,
extremely contrived scenarios.

Moreover, the LWLock tranche system itself is limited to a 16-bit
integer for tranche IDs, so there can't be more than 65536 of those
over the lifetime of the cluster no matter what. Since
008608b9d51061b1f598c197477b3dc7be9c4a64, that limit is rather
pointless; as things stand today, we could change the tranche ID to a
32-bit integer without losing anything. But changing that might eat
up bit space that somebody wants to use later to solve some other
problem, and anyway by itself it doesn't fix anything. Also, it would
allow LWLockTrancheArray to get regrettably large.

Another alternative is to have the DSA system fix up the tranche base
address every time we enter a DSA function that might need to take a
lock. The DSA code never returns with any relevant locks held, so the
base address can't get obsoleted by some other DSA while a lock using
the old base address is still held. That's sort of ugly too, and it
adds a little overhead to fix a problem that will bite almost nobody,
but it's formally correct and seems fairly watertight. Basically each
DSA function that needs to take a lock would need to first do this:

area->lwlock_tranche.array_base = &area->control->pools[0];

Maybe that's not too bad... thoughts?

BTW, I just noticed that the dsa_area_control member called "lock" is
going to get a fairly random lock ID, based on the number of bytes by
which "lock" follows "pools". Wouldn't it be better to put all of the
LWLocks - both the general locks and the per-pool locks - in one
array, probably with the general lock first, so that T_ID() will do
the right thing for such locks?
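
The array_base problem discussed above can be sketched with a toy lock ID computed as an index from a registered base address. The names (toy_lock, toy_tranche, toy_t_id, toy_fix_base) are hypothetical and only loosely follow the discussion. Two "areas" sharing one tranche entry are modelled as two halves of one array so that the pointer arithmetic stays well defined.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical names loosely following the discussion. */
typedef struct toy_lock
{
    int tranche;
    int state;
} toy_lock;

typedef struct toy_tranche
{
    const char *name;
    void       *array_base;     /* first lock of "the" array for this tranche */
    size_t      array_stride;   /* bytes between consecutive locks */
} toy_tranche;

/* Lock ID as an index computed from the registered base address. */
static long
toy_t_id(const toy_tranche *t, const toy_lock *lock)
{
    return (long) (((const char *) lock - (const char *) t->array_base)
                   / (ptrdiff_t) t->array_stride);
}

/* The fix-up idea: re-point the tranche at this area's locks before use. */
static void
toy_fix_base(toy_tranche *t, toy_lock *locks)
{
    t->array_base = locks;
}
```

With the base registered for the first area, the second area's lock 1 reports an index that is meaningless for that area; after the fix-up it reports 1 again. A lock that is not part of the array at all, like the "lock" member mentioned above, would get whatever index its byte distance from the base happens to produce.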

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#9Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#8)
Re: Creating a DSA area to provide work space for parallel execution

On Mon, Dec 5, 2016 at 3:12 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Dec 1, 2016 at 6:35 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Sat, Nov 26, 2016 at 1:55 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

Here's a new version to apply on top of dsa-v7.patch.

Here's a version to go with dsa-v8.patch.

Thomas has spent a fair amount of time beating me up off-list about
the fact that we have no way to recycle LWLock tranche IDs, and I've
spent a fair amount of time trying to defend that decision, but it's
really a problem here. This is all well and good the way it's written
provided that there's only one parallel context in existence at a
time, but should that number ever exceed 1, which it can, then the
tranche's array_base will be incorrect for LWLocks in one or the other
tranche.

That *almost* doesn't matter. If you're not compiling with dtrace or
LOCK_DEBUG or LWLOCK_STATS, you'll be fine. And if you are compiling
with one of those things, I believe the consequences will be no worse
than an occasional nonsensical lock ID. It's halfway tempting to just
accept that as a known wart, but, uggh, that sucks.

It's a bit hard to come up with a better alternative.

After thinking about this some more, I realized that the problem here
boils down to T_ID() not being smart enough to cope with multiple
instances of the same tranche at different base addresses. We can
either make it more complicated so that it can do that, or (drum roll,
please!) get rid of it altogether. I don't think making it more
complicated is very desirable, because I think that we end up
computing T_ID() for every lock acquisition and release whenever
--enable-dtrace is used, even if dtrace is not actually in use. And
the usefulness of T_ID() for debugging is pretty marginal, with one
important exception, which is that currently T_ID() is used to
distinguish between individual LWLocks in the main tranche. So here's
my proposal:

1. Give every LWLock in the main tranche a separate tranche ID. This
has been previously proposed, so it's not a revolutionary concept.

2. Always identify LWLocks in pg_stat_activity only by tranche ID,
never offset within the tranche, not even for the main tranche. This
results in pg_stat_activity saying "LWLock" rather than either
"LWLockNamed" or "LWLockTranche", which is a user-visible behavior
change but not AFAICS a very upsetting one.

3. Change the dtrace probes that currently pass both T_NAME() and
T_ID() to pass only T_NAME(). This is a minor loss of information for
dtrace, but in view of (1) it's not a very significant loss.

4. Change LOCK_DEBUG and LWLOCK_STATS output that identifies locks
using T_NAME() and T_ID() to instead identify them by T_NAME() and the
pointer address. Since these are only developer debugging facilities
not intended for production builds, I think it's OK to expose the
pointer address, and it's arguably MORE useful to do so than to expose
an offset into an array with an unknown base address.

5. Remove T_ID() from the can't-happen elog() in LWLockRelease().

6. Remove T_ID() itself. And then, since that's the only client of
array_base/array_stride, remove those too. And then, since there's
nothing left in LWLockTranche except for the tranche name, get rid of
the whole structure, simplifying a whole bunch of code all over the
system.
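
A rough sketch of where points 4 and 6 lead: once base and stride are gone, a tranche is nothing but a name looked up by ID, and debugging output can identify a lock by that name plus its address. The code below is a toy under stated assumptions, with hypothetical names (toy_register_tranche, toy_describe_lock); it is not the code in the attached patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy model only: hypothetical names, not the patched lwlock.c code. */
#define TOY_MAX_TRANCHES 16

typedef struct toy_lock
{
    unsigned short tranche;     /* tranche ID stored in the lock itself */
} toy_lock;

static const char *toy_tranche_names[TOY_MAX_TRANCHES];

/* With base and stride gone, registering a tranche only records its name. */
static int
toy_register_tranche(int tranche_id, const char *name)
{
    if (tranche_id < 0 || tranche_id >= TOY_MAX_TRANCHES)
        return -1;
    toy_tranche_names[tranche_id] = name;
    return 0;
}

/* Describe a lock by tranche name and address instead of an array offset. */
static int
toy_describe_lock(const toy_lock *lock, char *buf, size_t buflen)
{
    const char *name = NULL;

    if (lock->tranche < TOY_MAX_TRANCHES)
        name = toy_tranche_names[lock->tranche];
    return snprintf(buf, buflen, "%s %p",
                    name ? name : "unknown", (void *) lock);
}
```

Nothing here depends on where the lock lives, so several areas can share one tranche ID without confusing the output.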

Patch implementing all of this attached. There's further
simplification that could be done here -- with array_stride gone, we
could do more at compile time rather than run-time -- but I think that
can be left for another day.

The overall result of this is a considerable savings of code:

 doc/src/sgml/monitoring.sgml             |  52 ++++-----
 src/backend/access/transam/slru.c        |   6 +-
 src/backend/access/transam/xlog.c        |   9 +-
 src/backend/postmaster/pgstat.c          |  10 +-
 src/backend/replication/logical/origin.c |   8 +-
 src/backend/replication/slot.c           |   8 +-
 src/backend/storage/buffer/buf_init.c    |  16 +--
 src/backend/storage/ipc/procarray.c      |   9 +-
 src/backend/storage/lmgr/lwlock.c        | 175 ++++++++++---------------------
 src/backend/utils/mmgr/dsa.c             |  15 +--
 src/backend/utils/probes.d               |  16 +--
 src/include/access/slru.h                |   1 -
 src/include/pgstat.h                     |   3 +-
 src/include/storage/lwlock.h             |  45 ++------
 14 files changed, 112 insertions(+), 261 deletions(-)

It also noticeably reduces the number of bytes of machine code
generated for lwlock.c:

[rhaas pgsql]$ size src/backend/storage/lmgr/lwlock.o # echo master
__TEXT  __DATA  __OBJC  others  dec    hex
11815   3360    0       46430   61605  f0a5
[rhaas pgsql]$ size src/backend/storage/lmgr/lwlock.o # echo lwlock
__TEXT  __DATA  __OBJC  others  dec    hex
11199   3264    0       45487   59950  ea2e

That's better than a 5% reduction in the code size of a very hot
module just by removing something that almost nobody uses or cares
about. That's with --enable-dtrace but without --enable-cassert.
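
As a quick check of the arithmetic, the figure of about 5% matches the __TEXT column of the size output above; the total ("dec") column shrinks by less. A small helper, with a hypothetical name, makes the calculation explicit:

```c
#include <assert.h>

/* Percentage by which 'after' is smaller than 'before'. */
static double
percent_reduction(double before, double after)
{
    return 100.0 * (before - after) / before;
}
```

For __TEXT, percent_reduction(11815, 11199) comes to roughly 5.2; for the total, percent_reduction(61605, 59950) comes to roughly 2.7.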

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

remove_lwlock_t_id-v1.patch (text/x-patch; charset=US-ASCII, +112 -261)
#10Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#9)
Re: Creating a DSA area to provide work space for parallel execution

On Wed, Dec 14, 2016 at 3:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Thoughts?

Hearing no objections, I've gone ahead and committed this. If that
makes somebody really unhappy I can revert it, but I am betting that
the real story is that nobody cares about preserving T_ID().

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#11Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#10)
Re: Creating a DSA area to provide work space for parallel execution

On 2016-12-16 11:41:49 -0500, Robert Haas wrote:

On Wed, Dec 14, 2016 at 3:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Thoughts?

Hearing no objections, I've gone ahead and committed this. If that
makes somebody really unhappy I can revert it, but I am betting that
the real story is that nobody cares about preserving T_ID().

I don't care about T_ID, but I do care about breaking extensions using
lwlocks like for the 3rd release in a row or such. This is getting a
bit ridiculous.

#12Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#11)
Re: Creating a DSA area to provide work space for parallel execution

On Fri, Dec 16, 2016 at 12:28 PM, Andres Freund <andres@anarazel.de> wrote:

On 2016-12-16 11:41:49 -0500, Robert Haas wrote:

On Wed, Dec 14, 2016 at 3:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Thoughts?

Hearing no objections, I've gone ahead and committed this. If that
makes somebody really unhappy I can revert it, but I am betting that
the real story is that nobody cares about preserving T_ID().

I don't care about T_ID, but I do care about breaking extensions using
lwlocks like for the 3rd release in a row or such. This is getting a
bit ridiculous.

Hmm, I hadn't thought about that. :-)

I guess we could put back array_base/array_stride and just ignore
them, but that hardly seems better. Then we're stuck with that wart
forever.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#13Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#12)
Re: Creating a DSA area to provide work space for parallel execution

On Fri, Dec 16, 2016 at 12:32 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Dec 16, 2016 at 12:28 PM, Andres Freund <andres@anarazel.de> wrote:

On 2016-12-16 11:41:49 -0500, Robert Haas wrote:

On Wed, Dec 14, 2016 at 3:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Thoughts?

Hearing no objections, I've gone ahead and committed this. If that
makes somebody really unhappy I can revert it, but I am betting that
the real story is that nobody cares about preserving T_ID().

I don't care about T_ID, but I do care about breaking extensions using
lwlocks like for the 3rd release in a row or such. This is getting a
bit ridiculous.

Hmm, I hadn't thought about that. :-)

Err, that was supposed to be :-( As in sad, not happy.

I guess we could put back array_base/array_stride and just ignore
them, but that hardly seems better. Then we're stuck with that wart
forever.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#14Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#10)
Re: Creating a DSA area to provide work space for parallel execution

Robert Haas wrote:

On Wed, Dec 14, 2016 at 3:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Thoughts?

Hearing no objections, I've gone ahead and committed this. If that
makes somebody really unhappy I can revert it, but I am betting that
the real story is that nobody cares about preserving T_ID().

AFAICT the comment on LWLockRegisterTranche is confused; it talks about
an allocated object being passed, but there isn't any.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#15Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#12)
Re: Creating a DSA area to provide work space for parallel execution

On 2016-12-16 12:32:49 -0500, Robert Haas wrote:

On Fri, Dec 16, 2016 at 12:28 PM, Andres Freund <andres@anarazel.de> wrote:

On 2016-12-16 11:41:49 -0500, Robert Haas wrote:

On Wed, Dec 14, 2016 at 3:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Thoughts?

Hearing no objections, I've gone ahead and committed this. If that
makes somebody really unhappy I can revert it, but I am betting that
the real story is that nobody cares about preserving T_ID().

I don't care about T_ID, but I do care about breaking extensions using
lwlocks like for the 3rd release in a row or such. This is getting a
bit ridiculous.

Hmm, I hadn't thought about that. :-)

I guess we could put back array_base/array_stride and just ignore
them, but that hardly seems better. Then we're stuck with that wart
forever.

Yea, I don't think that's good either. I'm all for evolving APIs when
necessary, but constantly breaking the same API release after release
seems indicative of needing to spend a bit more time on it in the first
round. I've a few extensions (one of them citus) that work across
versions, and the ifdef-ery is significant.

Andres

#16Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#13)
Re: Creating a DSA area to provide work space for parallel execution

On 2016-12-16 12:33:11 -0500, Robert Haas wrote:

On Fri, Dec 16, 2016 at 12:32 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Dec 16, 2016 at 12:28 PM, Andres Freund <andres@anarazel.de> wrote:

On 2016-12-16 11:41:49 -0500, Robert Haas wrote:

On Wed, Dec 14, 2016 at 3:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Thoughts?

Hearing no objections, I've gone ahead and committed this. If that
makes somebody really unhappy I can revert it, but I am betting that
the real story is that nobody cares about preserving T_ID().

I don't care about T_ID, but I do care about breaking extensions using
lwlocks like for the 3rd release in a row or such. This is getting a
bit ridiculous.

Hmm, I hadn't thought about that. :-)

Err, that was supposed to be :-( As in sad, not happy.

Both work for me ;)

#17Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#15)
Re: Creating a DSA area to provide work space for parallel execution

On Fri, Dec 16, 2016 at 12:37 PM, Andres Freund <andres@anarazel.de> wrote:

Yea, I don't think that's good either. I'm all for evolving APIs when
necessary, but constantly breaking the same API release after release
seems indicative of needing to spend a bit more time on it in the first
round.

I am not sure the issue was time so much as the ability to foresee all
the problems we'd want to solve. 9.4 added tranches and converted
everything to LWLock * instead of LWLockId, but I think all of the old
APIs still worked. At that point, we didn't have parallel query and
we weren't that close to having it, so I was loath to do anything too
invasive. 9.5 removed LWLockAcquireWithVar() and added
LWLockReleaseClearVar(), but most of the API was still fine. 9.6
moved almost everything to tranches and removed RequestAddinLWLocks()
and LWLockAssign(), which was a big break for extensions -- but that
wasn't because of parallel query, but rather because we wanted to use
tranches to support the wait_event stuff and we also wanted to be able
to pad different tranches differently. This latest change is inspired
by the fact that the 9.4-era changes to support parallel query weren't
quite smart enough to be able to cope with the possibility of multiple
tranches with the same tranche ID in a reasonable way. That last one
is indeed an oversight but in January of 2014 it wasn't very clear
that we were going to have tranche-ified every LWLock in the system,
without which this change wouldn't be possible. Quite a lot of work
by at least 3 or 4 different people went into that tranche-ification
effort.

I think it's quite surprising how fast the LWLock system has evolved
over the last few years. When I first started working on PostgreSQL
in 2008, there was no LWLockAcquireOrWait, none of the Var stuff, the
padding was much less sophisticated, no tranches, no atomics, volatile
qualifiers all over the place... and all of that has changed in the
last 5 years. Pretty amazing, actually, IMHO. If our LWLocks improve
as much between now and 2021 as they have between 2011 and now, it'll
be worth almost any amount of API breakage to get there. I don't
personally have any plans or ideas that would involve breaking things
for extensions again any time soon, but I won't be very surprised if
somebody else comes up with one.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#18Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#14)
Re: Creating a DSA area to provide work space for parallel execution

On Fri, Dec 16, 2016 at 12:36 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Robert Haas wrote:

On Wed, Dec 14, 2016 at 3:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Thoughts?

Hearing no objections, I've gone ahead and committed this. If that
makes somebody really unhappy I can revert it, but I am betting that
the real story is that nobody cares about preserving T_ID().

AFAICT the comment on LWLockRegisterTranche is confused; it talks about
an allocated object being passed, but there isn't any.

Oops. Thanks, will fix.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#19Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#17)
Re: Creating a DSA area to provide work space for parallel execution

Robert Haas wrote:

I am not sure the issue was time so much as the ability to foresee all
the problems we'd want to solve.

I think all that movement is okay. It's not like we're breaking things
to no purpose. The amount of effort that has to go into making
extensions compile with changed APIs is not *that* bad, surely; it's
pretty clear that we need to keep moving forward. All the changes you
listed that required lwlock changes have clearly been worth the
breakage, IMO.

I think it's quite surprising how fast the LWLock system has evolved
over the last few years. When I first started working on PostgreSQL
in 2008, there was no LWLockAcquireOrWait, none of the Var stuff, the
padding was much less sophisticated, no tranches, no atomics, volatile
qualifiers all over the place... and all of that has changed in the
last 5 years. Pretty amazing, actually, IMHO.

Yes, I agree.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#20Thomas Munro
thomas.munro@gmail.com
In reply to: Robert Haas (#10)
Re: Creating a DSA area to provide work space for parallel execution

On Sat, Dec 17, 2016 at 5:41 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Dec 14, 2016 at 3:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Thoughts?

Hearing no objections, I've gone ahead and committed this. If that
makes somebody really unhappy I can revert it, but I am betting that
the real story is that nobody cares about preserving T_ID().

I suppose LWLock could include a uint16 member 'id' without making
LWLock any larger, since it would replace the padding between
'tranche' and 'state'. But I think a better solution, if anyone
really wants these T_ID numbers from a dtrace script, would be to add
lock address to the existing lwlock probes, and then figure out a way
to add some new probes to report the base and stride in the right
places so you can do the bookkeeping in dtrace variables.
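
The padding argument can be illustrated with two hypothetical layouts. These are not the real LWLock definition, which has further members; they only model a 16-bit tranche followed by a 32-bit state word, and the conclusion assumes an ABI where uint32_t is 4-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts for illustration; the real LWLock has other members. */
typedef struct toy_lwlock_without_id
{
    uint16_t tranche;   /* tranche ID */
    /* typically 2 bytes of padding here so that 'state' is aligned */
    uint32_t state;
} toy_lwlock_without_id;

typedef struct toy_lwlock_with_id
{
    uint16_t tranche;
    uint16_t id;        /* occupies what was padding */
    uint32_t state;
} toy_lwlock_with_id;

/* Returns 1 if adding 'id' leaves the struct size unchanged on this ABI. */
static int
toy_id_fits_in_padding(void)
{
    return sizeof(toy_lwlock_with_id) == sizeof(toy_lwlock_without_id);
}
```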

--
Thomas Munro
http://www.enterprisedb.com

#21Robert Haas
robertmhaas@gmail.com
In reply to: Thomas Munro (#20)
#22Robert Haas
robertmhaas@gmail.com
In reply to: Thomas Munro (#6)
#23Thomas Munro
thomas.munro@gmail.com
In reply to: Robert Haas (#22)
#24Robert Haas
robertmhaas@gmail.com
In reply to: Thomas Munro (#23)
#25Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Robert Haas (#21)