es_query_dsa is broken
Hi hackers,
While reviewing commit c6755e23 I realised that es_query_dsa is
broken. It might have made some kind of sense as a name and a concept
in an earlier version of the proposal to add a DSA area for parallel
query's use, when the DSA area was going to be independent of parallel
query DSM segments and could have been used for the whole query. But
as it stands, each Gather [Merge] node has its own DSA area in its own
DSM segment, and that needs to be the one available to the nodes of
the child plan when executed in the leader process, and it must not be
available to any other nodes in the tree. It's OK for the workers
to have just one es_query_dsa set up at the beginning and used for the
whole lifetime of the executor, but it's not OK for the leader to
install it and forget about it as it does now.
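To spell out the failure mode in the leader (a simplified sketch, not real
code; imagine a plan containing two Gather nodes A and B):

/* first execution of A: ExecInitParallelPlan() sets es_query_dsa = A's area */
ExecGather(A);
/* first execution of B: ExecInitParallelPlan() overwrites it with B's area */
ExecGather(B);
/* A again, with need_to_scan_locally: the leader runs A's child plan, but
 * es_query_dsa still points at B's area, so any
 * dsa_get_address(estate->es_query_dsa, ptr) on a pointer allocated in A's
 * area consults the wrong area. */
ExecGather(A);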
The attached draft patch (for discussion only) shows one way to fix
that: only install it for the duration of Gather[Merge]'s
ExecProcNode(child), instead of doing it in ExecInitParallelPlan().
Also for the [Re]InitializeDSM invocations.
This type of install-call-clear coding isn't ideal, but unfortunately
the area doesn't exist until the first time ExecGather runs,
and we don't seem to have another suitable scope (i.e. some object
easily accessible to all children of a given Gather) to put it in
right now.
Better ideas?
--
Thomas Munro
http://www.enterprisedb.com
Attachments:
fix-es_query_dsa-v1.patch
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 53c5254be13..25dd82ece74 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -319,7 +319,7 @@ EstimateParamExecSpace(EState *estate, Bitmapset *params)
* parameter array) and then the datum as serialized by datumSerialize().
*/
static dsa_pointer
-SerializeParamExecParams(EState *estate, Bitmapset *params)
+SerializeParamExecParams(EState *estate, Bitmapset *params, dsa_area *area)
{
Size size;
int nparams;
@@ -330,8 +330,8 @@ SerializeParamExecParams(EState *estate, Bitmapset *params)
/* Allocate enough space for the current parameter values. */
size = EstimateParamExecSpace(estate, params);
- handle = dsa_allocate(estate->es_query_dsa, size);
- start_address = dsa_get_address(estate->es_query_dsa, handle);
+ handle = dsa_allocate(area, size);
+ start_address = dsa_get_address(area, handle);
/* First write the number of parameters as a 4-byte integer. */
nparams = bms_num_members(params);
@@ -716,12 +716,6 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
LWTRANCHE_PARALLEL_QUERY_DSA,
pcxt->seg);
- /*
- * Make the area available to executor nodes running in the leader.
- * See also ParallelQueryMain which makes it available to workers.
- */
- estate->es_query_dsa = pei->area;
-
/*
* Serialize parameters, if any, using DSA storage. We don't dare use
* the main parallel query DSM for this because we might relaunch
@@ -730,7 +724,8 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
*/
if (!bms_is_empty(sendParams))
{
- pei->param_exec = SerializeParamExecParams(estate, sendParams);
+ pei->param_exec = SerializeParamExecParams(estate, sendParams,
+ pei->area);
fpes->param_exec = pei->param_exec;
}
}
@@ -743,7 +738,9 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
d.pcxt = pcxt;
d.instrumentation = instrumentation;
d.nnodes = 0;
+ estate->es_query_dsa = pei->area;
ExecParallelInitializeDSM(planstate, &d);
+ estate->es_query_dsa = NULL;
/*
* Make sure that the world hasn't shifted under our feet. This could
@@ -812,19 +809,22 @@ ExecParallelReinitialize(PlanState *planstate,
/* Free any serialized parameters from the last round. */
if (DsaPointerIsValid(fpes->param_exec))
{
- dsa_free(estate->es_query_dsa, fpes->param_exec);
+ dsa_free(pei->area, fpes->param_exec);
fpes->param_exec = InvalidDsaPointer;
}
/* Serialize current parameter values if required. */
if (!bms_is_empty(sendParams))
{
- pei->param_exec = SerializeParamExecParams(estate, sendParams);
+ pei->param_exec = SerializeParamExecParams(estate, sendParams,
+ pei->area);
fpes->param_exec = pei->param_exec;
}
/* Traverse plan tree and let each child node reset associated state. */
+ estate->es_query_dsa = pei->area;
ExecParallelReInitializeDSM(planstate, pei->pcxt);
+ estate->es_query_dsa = NULL;
}
/*
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 212612b5351..42315d727ac 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -287,7 +287,11 @@ gather_getnext(GatherState *gatherstate)
if (gatherstate->need_to_scan_locally)
{
+ EState *estate = gatherstate->ps.state;
+
+ estate->es_query_dsa = gatherstate->pei->area;
outerTupleSlot = ExecProcNode(outerPlan);
+ estate->es_query_dsa = NULL;
if (!TupIsNull(outerTupleSlot))
return outerTupleSlot;
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 166f2064ff7..a40635ffe14 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -637,8 +637,11 @@ gather_merge_readnext(GatherMergeState *gm_state, int reader, bool nowait)
{
PlanState *outerPlan = outerPlanState(gm_state);
TupleTableSlot *outerTupleSlot;
+ EState *estate = gm_state->ps.state;
+ estate->es_query_dsa = gm_state->pei->area;
outerTupleSlot = ExecProcNode(outerPlan);
+ estate->es_query_dsa = NULL;
if (!TupIsNull(outerTupleSlot))
{
On Wed, Nov 29, 2017 at 7:30 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
While reviewing commit c6755e23 I realised that es_query_dsa is
broken. It might have made some kind of sense as a name and a concept
in an earlier version of the proposal to add a DSA area for parallel
query's use, when the DSA area was going to be independent of parallel
query DSM segments and could have been used for the whole query. But
as it stands, each Gather [Merge] node has its own DSA area in its own
DSM segment, and that needs to be the one available to the nodes of
the child plan when executed in the leader process, and it must not be
available to any other nodes in the tree. It's OK for the workers
to have just one es_query_dsa set up at the beginning and used for the
whole lifetime of the executor, but it's not OK for the leader to
install it and forget about it as it does now.
Ugh.
Better ideas?
How about this:
1. Remove es_query_dsa altogether.
2. Add a dsa_area * to ExecParallelInitializeDSMContext.
3. In ExecParallelInitializeDSM, pass the dsa_area * as a separate argument to
the per-node-type function.
4. If the per-node-type function cares about the dsa_area *, it is
responsible for saving a pointer to it in the PlanState node.
5. Also pass it down via the ParallelWorkerContext.
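For illustration, roughly like this (untested sketch; exact names and
signatures to be worked out):

typedef struct ExecParallelInitializeDSMContext
{
    ParallelContext *pcxt;
    SharedExecutorInstrumentation *instrumentation;
    dsa_area   *area;        /* new: carried down to per-node-type functions */
    int         nnodes;
} ExecParallelInitializeDSMContext;

/* per-node-type function receives the area and keeps it if it cares */
void
ExecXXXInitializeDSM(XXXState *node, ParallelContext *pcxt, dsa_area *area)
{
    node->area = area;       /* hypothetical field, replacing es_query_dsa use */
}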
In v10 we might need to go with a solution like what you've sketched
here, because Tom will complain about breaking binary compatibility
with EState (and maybe other PlanState nodes) in a released branch.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Nov 30, 2017 at 4:01 AM, Robert Haas <robertmhaas@gmail.com> wrote:
Better ideas?
How about this:
1. Remove es_query_dsa altogether.
2. Add a dsa_area * to ExecParallelInitializeDSMContext.
3. In ExecParallelInitializeDSM, pass the dsa_area * as a separate argument to
the per-node-type function.
4. If the per-node-type function cares about the dsa_area *, it is
responsible for saving a pointer to it in the PlanState node.
5. Also pass it down via the ParallelWorkerContext.
In v10 we might need to go with a solution like what you've sketched
here, because Tom will complain about breaking binary compatibility
with EState (and maybe other PlanState nodes) in a released branch.
I will post both versions. I've been stuck for a while now trying to
come up with a query that actually breaks, so I can show that it's
fixed...
--
Thomas Munro
http://www.enterprisedb.com
On Thu, Nov 30, 2017 at 11:19 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
On Thu, Nov 30, 2017 at 4:01 AM, Robert Haas <robertmhaas@gmail.com> wrote:
Better ideas?
How about this:
1. Remove es_query_dsa altogether.
2. Add a dsa_area * to ExecParallelInitializeDSMContext.
3. In ExecParallelInitializeDSM, pass the dsa_area * as a separate argument to
the per-node-type function.
4. If the per-node-type function cares about the dsa_area *, it is
responsible for saving a pointer to it in the PlanState node.
5. Also pass it down via the ParallelWorkerContext.
In v10 we might need to go with a solution like what you've sketched
here, because Tom will complain about breaking binary compatibility
with EState (and maybe other PlanState nodes) in a released branch.
Here's a pair of versions of that patch tidied up a bit, for
REL_10_STABLE and master. Unfortunately I've been unable to come up
with a reproducer that shows misbehaviour (something like what was
reported on -bugs[1] which appears to be a case of calling
dsa_get_address(wrong_area, some_dsa_pointer)).
I don't yet have a patch to remove es_query_dsa. Instead of adding a
dsa_area * parameter to a ton of per-node functions as you suggested,
I'm wondering if we shouldn't just pass the whole ParallelExecutorInfo
object into all ExecXXXInitializeDSM() and ExecXXXInitializeWorker()
functions so they can hold onto it if they want to. In the worker
case it'd be only partially filled out: just area and pcxt, and pcxt
would also be only partially filled out. Alternatively I could create
a new struct ParallelExecutorWorkerInfo which has just the bit needed
for workers (that'd replace ParallelWorkerContext, which I now realise
was put in the wrong place -- it doesn't belong in access/parallel.h,
it belongs in an executor header with an executor-sounding name). I'd
like to get a couple of other things done before I come back to this.
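Concretely, that could look something like this (sketch only, hypothetical
signatures):

void ExecXXXInitializeDSM(XXXState *node, ParallelExecutorInfo *pei);
void ExecXXXInitializeWorker(XXXState *node, ParallelExecutorInfo *pei);

/* In a worker, ParallelQueryMain would fill out only part of it: */
ParallelExecutorInfo pei = {0};
pei.area = area;        /* DSA area attached to this Gather's DSM segment */
pei.pcxt = pcxt;        /* itself only partially filled out in a worker */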
[1]: /messages/by-id/CAEepm=1V-ZjAAyG-xj6s7ESXFhzLsRBpyX=BLz-w2DS=ObNu4A@mail.gmail.com
--
Thomas Munro
http://www.enterprisedb.com
Attachments:
fix-es_query_dsa-pg10-v2.patch
From a73660edd7ed91c59c08ff7c5278cda9344f2068 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@enterprisedb.com>
Date: Mon, 4 Dec 2017 21:32:27 +1300
Subject: [PATCH] Make sure es_query_dsa is set correctly in the leader
process.
Commit e13029a5ce353574516c64fd1ec9c50201e705fd added es_query_dsa as a member
of EState to provide a DSA area for use by executor nodes. That works for
worker processes where all executor nodes use the same shared memory, but
doesn't work correctly in the leader process where there may be more than one
Gather or Gather Merge node each with its own DSA area. Repair by installing
each Gather or Gather Merge node's DSA area into es_query_dsa only for the
duration of appropriate calls into the query tree below. A bigger change
adopting better scoping will be studied for version 11.
Author: Thomas Munro
Discussion: https://postgr.es/m/CAEepm=1U6as=brnVvMNixEV2tpi8NuyQoTmO8Qef0-VV+=7MDA@mail.gmail.com
---
src/backend/executor/execParallel.c | 14 ++++++++------
src/backend/executor/nodeGather.c | 5 +++++
src/backend/executor/nodeGatherMerge.c | 4 ++++
3 files changed, 17 insertions(+), 6 deletions(-)
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 7dda399daf3..989cf5b80b1 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -543,12 +543,6 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate, int nworkers)
pcxt->seg);
}
- /*
- * Make the area available to executor nodes running in the leader. See
- * also ParallelQueryMain which makes it available to workers.
- */
- estate->es_query_dsa = pei->area;
-
/*
* Give parallel-aware nodes a chance to initialize their shared data.
* This also initializes the elements of instrumentation->ps_instrument,
@@ -557,7 +551,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate, int nworkers)
d.pcxt = pcxt;
d.instrumentation = instrumentation;
d.nnodes = 0;
+
+ /* Install our DSA area while initializing the plan. */
+ estate->es_query_dsa = pei->area;
ExecParallelInitializeDSM(planstate, &d);
+ estate->es_query_dsa = NULL;
/*
* Make sure that the world hasn't shifted under our feet. This could
@@ -609,6 +607,8 @@ void
ExecParallelReinitialize(PlanState *planstate,
ParallelExecutorInfo *pei)
{
+ EState *estate = planstate->state;
+
/* Old workers must already be shut down */
Assert(pei->finished);
@@ -618,7 +618,9 @@ ExecParallelReinitialize(PlanState *planstate,
pei->finished = false;
/* Traverse plan tree and let each child node reset associated state. */
+ estate->es_query_dsa = pei->area;
ExecParallelReInitializeDSM(planstate, pei->pcxt);
+ estate->es_query_dsa = NULL;
}
/*
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 89f592828c1..7be2a568521 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -278,7 +278,12 @@ gather_getnext(GatherState *gatherstate)
if (gatherstate->need_to_scan_locally)
{
+ EState *estate = gatherstate->ps.state;
+
+ /* Install our DSA area while executing the plan. */
+ estate->es_query_dsa = gatherstate->pei->area;
outerTupleSlot = ExecProcNode(outerPlan);
+ estate->es_query_dsa = NULL;
if (!TupIsNull(outerTupleSlot))
return outerTupleSlot;
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 6b173543564..4a3f2ab5a6c 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -627,8 +627,12 @@ gather_merge_readnext(GatherMergeState *gm_state, int reader, bool nowait)
{
PlanState *outerPlan = outerPlanState(gm_state);
TupleTableSlot *outerTupleSlot;
+ EState *estate = gm_state->ps.state;
+ /* Install our DSA area while executing the plan. */
+ estate->es_query_dsa = gm_state->pei->area;
outerTupleSlot = ExecProcNode(outerPlan);
+ estate->es_query_dsa = NULL;
if (!TupIsNull(outerTupleSlot))
{
--
2.15.0
fix-es_query_dsa-pg11-v2.patch
From 070659cc7052ec4cc4f98fda088af6575cad8e00 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@enterprisedb.com>
Date: Thu, 30 Nov 2017 01:39:05 +1300
Subject: [PATCH] Make sure es_query_dsa is set correctly in the leader
process.
Commit e13029a5ce353574516c64fd1ec9c50201e705fd added es_query_dsa as a member
of EState to provide a DSA area for use by executor nodes. That works for
worker processes where all executor nodes use the same shared memory, but
doesn't work correctly in the leader process where there may be more than one
Gather or Gather Merge node each with its own DSA area. Repair by installing
each Gather or Gather Merge node's DSA area into es_query_dsa only for the
duration of appropriate calls into the query tree below. A bigger change
adopting better scoping will be studied for version 11.
Author: Thomas Munro
Discussion: https://postgr.es/m/CAEepm=1U6as=brnVvMNixEV2tpi8NuyQoTmO8Qef0-VV+=7MDA@mail.gmail.com
---
src/backend/executor/execParallel.c | 26 ++++++++++++++------------
src/backend/executor/nodeGather.c | 5 +++++
src/backend/executor/nodeGatherMerge.c | 4 ++++
3 files changed, 23 insertions(+), 12 deletions(-)
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 53c5254be13..57cbb5b31de 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -319,7 +319,7 @@ EstimateParamExecSpace(EState *estate, Bitmapset *params)
* parameter array) and then the datum as serialized by datumSerialize().
*/
static dsa_pointer
-SerializeParamExecParams(EState *estate, Bitmapset *params)
+SerializeParamExecParams(EState *estate, Bitmapset *params, dsa_area *area)
{
Size size;
int nparams;
@@ -330,8 +330,8 @@ SerializeParamExecParams(EState *estate, Bitmapset *params)
/* Allocate enough space for the current parameter values. */
size = EstimateParamExecSpace(estate, params);
- handle = dsa_allocate(estate->es_query_dsa, size);
- start_address = dsa_get_address(estate->es_query_dsa, handle);
+ handle = dsa_allocate(area, size);
+ start_address = dsa_get_address(area, handle);
/* First write the number of parameters as a 4-byte integer. */
nparams = bms_num_members(params);
@@ -716,12 +716,6 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
LWTRANCHE_PARALLEL_QUERY_DSA,
pcxt->seg);
- /*
- * Make the area available to executor nodes running in the leader.
- * See also ParallelQueryMain which makes it available to workers.
- */
- estate->es_query_dsa = pei->area;
-
/*
* Serialize parameters, if any, using DSA storage. We don't dare use
* the main parallel query DSM for this because we might relaunch
@@ -730,7 +724,8 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
*/
if (!bms_is_empty(sendParams))
{
- pei->param_exec = SerializeParamExecParams(estate, sendParams);
+ pei->param_exec = SerializeParamExecParams(estate, sendParams,
+ pei->area);
fpes->param_exec = pei->param_exec;
}
}
@@ -743,7 +738,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
d.pcxt = pcxt;
d.instrumentation = instrumentation;
d.nnodes = 0;
+
+ /* Install our DSA area while initializing the plan. */
+ estate->es_query_dsa = pei->area;
ExecParallelInitializeDSM(planstate, &d);
+ estate->es_query_dsa = NULL;
/*
* Make sure that the world hasn't shifted under our feet. This could
@@ -812,19 +811,22 @@ ExecParallelReinitialize(PlanState *planstate,
/* Free any serialized parameters from the last round. */
if (DsaPointerIsValid(fpes->param_exec))
{
- dsa_free(estate->es_query_dsa, fpes->param_exec);
+ dsa_free(pei->area, fpes->param_exec);
fpes->param_exec = InvalidDsaPointer;
}
/* Serialize current parameter values if required. */
if (!bms_is_empty(sendParams))
{
- pei->param_exec = SerializeParamExecParams(estate, sendParams);
+ pei->param_exec = SerializeParamExecParams(estate, sendParams,
+ pei->area);
fpes->param_exec = pei->param_exec;
}
/* Traverse plan tree and let each child node reset associated state. */
+ estate->es_query_dsa = pei->area;
ExecParallelReInitializeDSM(planstate, pei->pcxt);
+ estate->es_query_dsa = NULL;
}
/*
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 212612b5351..ab71e936d0b 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -287,7 +287,12 @@ gather_getnext(GatherState *gatherstate)
if (gatherstate->need_to_scan_locally)
{
+ EState *estate = gatherstate->ps.state;
+
+ /* Install our DSA area while executing the plan. */
+ estate->es_query_dsa = gatherstate->pei->area;
outerTupleSlot = ExecProcNode(outerPlan);
+ estate->es_query_dsa = NULL;
if (!TupIsNull(outerTupleSlot))
return outerTupleSlot;
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 166f2064ff7..b68b0a9297b 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -637,8 +637,12 @@ gather_merge_readnext(GatherMergeState *gm_state, int reader, bool nowait)
{
PlanState *outerPlan = outerPlanState(gm_state);
TupleTableSlot *outerTupleSlot;
+ EState *estate = gm_state->ps.state;
+ /* Install our DSA area while executing the plan. */
+ estate->es_query_dsa = gm_state->pei->area;
outerTupleSlot = ExecProcNode(outerPlan);
+ estate->es_query_dsa = NULL;
if (!TupIsNull(outerTupleSlot))
{
--
2.15.0
On Tue, Dec 5, 2017 at 6:45 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
On Thu, Nov 30, 2017 at 11:19 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
On Thu, Nov 30, 2017 at 4:01 AM, Robert Haas <robertmhaas@gmail.com> wrote:
In v10 we might need to go with a solution like what you've sketched
here, because Tom will complain about breaking binary compatibility
with EState (and maybe other PlanState nodes) in a released branch.
Here's a pair of versions of that patch tidied up a bit, for
REL_10_STABLE and master. Unfortunately I've been unable to come up
with a reproducer that shows misbehaviour (something like what was
reported on -bugs[1] which appears to be a case of calling
dsa_get_address(wrong_area, some_dsa_pointer)).
+ EState *estate = gatherstate->ps.state;
+
+ /* Install our DSA area while executing the plan. */
+ estate->es_query_dsa = gatherstate->pei->area;
outerTupleSlot = ExecProcNode(outerPlan);
+ estate->es_query_dsa = NULL;
Won't the above coding pattern create a problem, if ExecProcNode
throws an error and outer block catches it and continues execution
(consider the case of execution inside PL blocks)?
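If so, the usual fix would be to clear it in a PG_CATCH() block, roughly
(sketch only):

estate->es_query_dsa = gatherstate->pei->area;
PG_TRY();
{
    outerTupleSlot = ExecProcNode(outerPlan);
}
PG_CATCH();
{
    estate->es_query_dsa = NULL;
    PG_RE_THROW();
}
PG_END_TRY();
estate->es_query_dsa = NULL;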
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Dec 5, 2017 at 7:24 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
+ EState *estate = gatherstate->ps.state;
+
+ /* Install our DSA area while executing the plan. */
+ estate->es_query_dsa = gatherstate->pei->area;
outerTupleSlot = ExecProcNode(outerPlan);
+ estate->es_query_dsa = NULL;
Won't the above coding pattern create a problem, if ExecProcNode
throws an error and outer block catches it and continues execution
(consider the case of execution inside PL blocks)?
I don't see what the problem is. The query that got aborted by the
error wouldn't be sharing an EState with one that didn't. Control
would have to return to someplace inside the same ExecProcNode and it
would have to return normally.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Dec 6, 2017 at 1:14 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Dec 5, 2017 at 7:24 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
+ EState *estate = gatherstate->ps.state;
+
+ /* Install our DSA area while executing the plan. */
+ estate->es_query_dsa = gatherstate->pei->area;
outerTupleSlot = ExecProcNode(outerPlan);
+ estate->es_query_dsa = NULL;
Won't the above coding pattern create a problem, if ExecProcNode
throws an error and outer block catches it and continues execution
(consider the case of execution inside PL blocks)?
I don't see what the problem is. The query that got aborted by the
error wouldn't be sharing an EState with one that didn't.
That's right. Ignore my comment, I got confused. Other than that, I
don't see any problem with the code as such apart from that it looks
slightly hacky. I think Thomas or someone needs to develop a patch on
the lines you have mentioned or what Thomas was trying to describe in
his email and see how it comes out.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Wed, Dec 6, 2017 at 10:16 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Dec 6, 2017 at 1:14 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Dec 5, 2017 at 7:24 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
+ EState *estate = gatherstate->ps.state;
+
+ /* Install our DSA area while executing the plan. */
+ estate->es_query_dsa = gatherstate->pei->area;
outerTupleSlot = ExecProcNode(outerPlan);
+ estate->es_query_dsa = NULL;
Won't the above coding pattern create a problem, if ExecProcNode
throws an error and outer block catches it and continues execution
(consider the case of execution inside PL blocks)?
I don't see what the problem is. The query that got aborted by the
error wouldn't be sharing an EState with one that didn't.
That's right. Ignore my comment, I got confused. Other than that, I
don't see any problem with the code as such apart from that it looks
slightly hacky. I think Thomas or someone needs to develop a patch on
the lines you have mentioned or what Thomas was trying to describe in
his email and see how it comes out.
Yeah, it is a bit hacky, but I can't see another way to fix it
without changing released APIs, and it's only for one release and will
certainly be unhackified in v11. For v11 I think we need to decide
between:
1. Removing es_query_dsa and injecting the right context into the
executor tree as discussed.
2. Another idea mentioned by Robert in an off-list chat: We could
consolidate all DSM segments in a multi-gather plan into one. See the
nearby thread where someone had over a hundred Gather nodes and had to
crank up max_connections to reserve enough DSM slots. Of course,
optimising for that case doesn't make too much sense (I suspect
multi-gather plans will become less likely with Parallel Append and
Parallel Hash in the picture anyway), but it would reduce a bunch of
duplicated work we do when it happens as well as scarce slot
consumption. If we did that, then all of a sudden es_query_dsa would
make sense again (ie it'd be whole-query scoped), and we could revert
that hacky change.
Or we could do both things anyway.
--
Thomas Munro
http://www.enterprisedb.com
On Thu, Dec 7, 2017 at 12:51 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
On Wed, Dec 6, 2017 at 10:16 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Dec 6, 2017 at 1:14 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Dec 5, 2017 at 7:24 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
+ EState *estate = gatherstate->ps.state;
+
+ /* Install our DSA area while executing the plan. */
+ estate->es_query_dsa = gatherstate->pei->area;
outerTupleSlot = ExecProcNode(outerPlan);
+ estate->es_query_dsa = NULL;
Won't the above coding pattern create a problem, if ExecProcNode
throws an error and outer block catches it and continues execution
(consider the case of execution inside PL blocks)?
I don't see what the problem is. The query that got aborted by the
error wouldn't be sharing an EState with one that didn't.
That's right. Ignore my comment, I got confused. Other than that, I
don't see any problem with the code as such apart from that it looks
slightly hacky. I think Thomas or someone needs to develop a patch on
the lines you have mentioned or what Thomas was trying to describe in
his email and see how it comes out.
Andreas Seltenreich found a query[1] that suffers from this bug
(thanks!), and Amit confirmed that this patch fixes it, but further
sqlsmith fuzz testing with the patch applied also revealed that it
failed to handle the case where pei is NULL because parallelism was
inhibited. Here's a new version to fix that. Someone might prefer a
coding with "if" rather than the ternary operator, but I didn't have a
strong preference.
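For reference, the equivalent "if" form would be:

if (gatherstate->pei != NULL)
    estate->es_query_dsa = gatherstate->pei->area;
else
    estate->es_query_dsa = NULL;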
[1]: /messages/by-id/CAEepm=2tfz1XNDcyU_a5ZiEaopz6+bBo9vQY+iJVLTtUVNztcQ@mail.gmail.com
--
Thomas Munro
http://www.enterprisedb.com
Attachments:
fix-es_query_dsa-pg10-v3.patch
From 03eb2f2b8777f8b919e64a7df10be79c0329fecf Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@enterprisedb.com>
Date: Mon, 4 Dec 2017 21:32:27 +1300
Subject: [PATCH] Make sure es_query_dsa is set correctly in the leader
process.
Commit e13029a5ce353574516c64fd1ec9c50201e705fd added es_query_dsa as a member
of EState to provide a DSA area for use by executor nodes. That works for
worker processes where all executor nodes use the same shared memory, but
doesn't work correctly in the leader process where there may be more than one
Gather or Gather Merge node each with its own DSA area. Repair by installing
each Gather or Gather Merge node's DSA area into es_query_dsa only for the
duration of appropriate calls into the query tree below. A bigger change
adopting better scoping will be studied for version 11.
Author: Thomas Munro
Reviewed-By: Amit Kapila, Robert Haas
Tested-By: Alexander Voytsekhovskyy, Amit Kapila, Andreas Seltenreich
Discussion: https://postgr.es/m/CAEepm=1U6as=brnVvMNixEV2tpi8NuyQoTmO8Qef0-VV+=7MDA@mail.gmail.com
---
src/backend/executor/execParallel.c | 14 ++++++++------
src/backend/executor/nodeGather.c | 6 ++++++
src/backend/executor/nodeGatherMerge.c | 4 ++++
3 files changed, 18 insertions(+), 6 deletions(-)
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 7dda399daf3..989cf5b80b1 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -543,12 +543,6 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate, int nworkers)
pcxt->seg);
}
- /*
- * Make the area available to executor nodes running in the leader. See
- * also ParallelQueryMain which makes it available to workers.
- */
- estate->es_query_dsa = pei->area;
-
/*
* Give parallel-aware nodes a chance to initialize their shared data.
* This also initializes the elements of instrumentation->ps_instrument,
@@ -557,7 +551,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate, int nworkers)
d.pcxt = pcxt;
d.instrumentation = instrumentation;
d.nnodes = 0;
+
+ /* Install our DSA area while initializing the plan. */
+ estate->es_query_dsa = pei->area;
ExecParallelInitializeDSM(planstate, &d);
+ estate->es_query_dsa = NULL;
/*
* Make sure that the world hasn't shifted under our feet. This could
@@ -609,6 +607,8 @@ void
ExecParallelReinitialize(PlanState *planstate,
ParallelExecutorInfo *pei)
{
+ EState *estate = planstate->state;
+
/* Old workers must already be shut down */
Assert(pei->finished);
@@ -618,7 +618,9 @@ ExecParallelReinitialize(PlanState *planstate,
pei->finished = false;
/* Traverse plan tree and let each child node reset associated state. */
+ estate->es_query_dsa = pei->area;
ExecParallelReInitializeDSM(planstate, pei->pcxt);
+ estate->es_query_dsa = NULL;
}
/*
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 89f592828c1..597cbfaa16d 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -278,7 +278,13 @@ gather_getnext(GatherState *gatherstate)
if (gatherstate->need_to_scan_locally)
{
+ EState *estate = gatherstate->ps.state;
+
+ /* Install our DSA area while executing the plan. */
+ estate->es_query_dsa =
+ gatherstate->pei ? gatherstate->pei->area : NULL;
outerTupleSlot = ExecProcNode(outerPlan);
+ estate->es_query_dsa = NULL;
if (!TupIsNull(outerTupleSlot))
return outerTupleSlot;
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 6b173543564..ee98f4cf30c 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -627,8 +627,12 @@ gather_merge_readnext(GatherMergeState *gm_state, int reader, bool nowait)
{
PlanState *outerPlan = outerPlanState(gm_state);
TupleTableSlot *outerTupleSlot;
+ EState *estate = gm_state->ps.state;
+ /* Install our DSA area while executing the plan. */
+ estate->es_query_dsa = gm_state->pei ? gm_state->pei->area : NULL;
outerTupleSlot = ExecProcNode(outerPlan);
+ estate->es_query_dsa = NULL;
if (!TupIsNull(outerTupleSlot))
{
--
2.15.0
fix-es_query_dsa-pg11-v3.patch
From fb643168973e3bbb2fb12a220c93931cde794bb0 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@enterprisedb.com>
Date: Thu, 30 Nov 2017 01:39:05 +1300
Subject: [PATCH] Make sure es_query_dsa is set correctly in the leader
process.
Commit e13029a5ce353574516c64fd1ec9c50201e705fd added es_query_dsa as a member
of EState to provide a DSA area for use by executor nodes. That works for
worker processes where all executor nodes use the same shared memory, but
doesn't work correctly in the leader process where there may be more than one
Gather or Gather Merge node each with its own DSA area. Repair by installing
each Gather or Gather Merge node's DSA area into es_query_dsa only for the
duration of appropriate calls into the query tree below. A bigger change
adopting better scoping will be studied for version 11.
Author: Thomas Munro
Reviewed-By: Amit Kapila, Robert Haas
Tested-By: Alexander Voytsekhovskyy, Amit Kapila, Andreas Seltenreich
Discussion: https://postgr.es/m/CAEepm=1U6as=brnVvMNixEV2tpi8NuyQoTmO8Qef0-VV+=7MDA@mail.gmail.com
---
src/backend/executor/execParallel.c | 26 ++++++++++++++------------
src/backend/executor/nodeGather.c | 6 ++++++
src/backend/executor/nodeGatherMerge.c | 4 ++++
3 files changed, 24 insertions(+), 12 deletions(-)
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index d57cdbd4e15..6b6064637b8 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -330,7 +330,7 @@ EstimateParamExecSpace(EState *estate, Bitmapset *params)
* parameter array) and then the datum as serialized by datumSerialize().
*/
static dsa_pointer
-SerializeParamExecParams(EState *estate, Bitmapset *params)
+SerializeParamExecParams(EState *estate, Bitmapset *params, dsa_area *area)
{
Size size;
int nparams;
@@ -341,8 +341,8 @@ SerializeParamExecParams(EState *estate, Bitmapset *params)
/* Allocate enough space for the current parameter values. */
size = EstimateParamExecSpace(estate, params);
- handle = dsa_allocate(estate->es_query_dsa, size);
- start_address = dsa_get_address(estate->es_query_dsa, handle);
+ handle = dsa_allocate(area, size);
+ start_address = dsa_get_address(area, handle);
/* First write the number of parameters as a 4-byte integer. */
nparams = bms_num_members(params);
@@ -736,12 +736,6 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
LWTRANCHE_PARALLEL_QUERY_DSA,
pcxt->seg);
- /*
- * Make the area available to executor nodes running in the leader.
- * See also ParallelQueryMain which makes it available to workers.
- */
- estate->es_query_dsa = pei->area;
-
/*
* Serialize parameters, if any, using DSA storage. We don't dare use
* the main parallel query DSM for this because we might relaunch
@@ -750,7 +744,8 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
*/
if (!bms_is_empty(sendParams))
{
- pei->param_exec = SerializeParamExecParams(estate, sendParams);
+ pei->param_exec = SerializeParamExecParams(estate, sendParams,
+ pei->area);
fpes->param_exec = pei->param_exec;
}
}
@@ -763,7 +758,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
d.pcxt = pcxt;
d.instrumentation = instrumentation;
d.nnodes = 0;
+
+ /* Install our DSA area while initializing the plan. */
+ estate->es_query_dsa = pei->area;
ExecParallelInitializeDSM(planstate, &d);
+ estate->es_query_dsa = NULL;
/*
* Make sure that the world hasn't shifted under our feet. This could
@@ -832,19 +831,22 @@ ExecParallelReinitialize(PlanState *planstate,
/* Free any serialized parameters from the last round. */
if (DsaPointerIsValid(fpes->param_exec))
{
- dsa_free(estate->es_query_dsa, fpes->param_exec);
+ dsa_free(pei->area, fpes->param_exec);
fpes->param_exec = InvalidDsaPointer;
}
/* Serialize current parameter values if required. */
if (!bms_is_empty(sendParams))
{
- pei->param_exec = SerializeParamExecParams(estate, sendParams);
+ pei->param_exec = SerializeParamExecParams(estate, sendParams,
+ pei->area);
fpes->param_exec = pei->param_exec;
}
/* Traverse plan tree and let each child node reset associated state. */
+ estate->es_query_dsa = pei->area;
ExecParallelReInitializeDSM(planstate, pei->pcxt);
+ estate->es_query_dsa = NULL;
}
/*
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index a44cf8409af..1697ae650d7 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -277,7 +277,13 @@ gather_getnext(GatherState *gatherstate)
if (gatherstate->need_to_scan_locally)
{
+ EState *estate = gatherstate->ps.state;
+
+ /* Install our DSA area while executing the plan. */
+ estate->es_query_dsa =
+ gatherstate->pei ? gatherstate->pei->area : NULL;
outerTupleSlot = ExecProcNode(outerPlan);
+ estate->es_query_dsa = NULL;
if (!TupIsNull(outerTupleSlot))
return outerTupleSlot;
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 4a8a59eabf1..a69777aa951 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -637,8 +637,12 @@ gather_merge_readnext(GatherMergeState *gm_state, int reader, bool nowait)
{
PlanState *outerPlan = outerPlanState(gm_state);
TupleTableSlot *outerTupleSlot;
+ EState *estate = gm_state->ps.state;
+ /* Install our DSA area while executing the plan. */
+ estate->es_query_dsa = gm_state->pei ? gm_state->pei->area : NULL;
outerTupleSlot = ExecProcNode(outerPlan);
+ estate->es_query_dsa = NULL;
if (!TupIsNull(outerTupleSlot))
{
--
2.15.0
On Sun, Dec 17, 2017 at 7:35 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
Andreas Seltenreich found a query[1] that suffers from this bug
(thanks!), and Amit confirmed that this patch fixes it, but further
sqlsmith fuzz testing with the patch applied also revealed that it
failed to handle the case where pei is NULL because parallelism was
inhibited. Here's a new version to fix that. Someone might prefer a
coding with "if" rather than the ternary operator, but I didn't have a
strong preference.
Looks OK to me. Committed and back-patched to v10.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
On 2017-12-07 12:51:56 +1300, Thomas Munro wrote:
1. Removing es_query_dsa and injecting the right context into the
executor tree as discussed.
2. Another idea mentioned by Robert in an off-list chat: We could
consolidate all DSM segments in a multi-gather plan into one. See the
nearby thread where someone had over a hundred Gather nodes and had to
crank up max_connections to reserve enough DSM slots. Of course,
optimising for that case doesn't make too much sense (I suspect
multi-gather plans will become less likely with Parallel Append and
Parallel Hash in the picture anyway), but it would reduce a bunch of
duplicated work we do when it happens as well as scarce slot
consumption. If we did that, then all of a sudden es_query_dsa would
make sense again (ie it'd be whole-query scoped), and we could revert
that hacky change.
Or we could do both things anyway.
This is an open item for v11:
Tidy up es_query_dsa and possibly ParallelWorkerContext?
Original commit: e13029a5ce353574516c64fd1ec9c50201e705fd (principal author: Thomas Munro; owner: Robert Haas)
Bug fix: fd7c0fa732d97a4b4ebb58730e6244ea30d0a618
While the bug was fixed with something back-patchable, we should consider improving this situation. As discussed in the above-linked thread, options might include (1) getting rid of es_query_dsa entirely and injecting dependencies into nodes, (2) making all Gather nodes share the same DSM segment so there truly could be a per-query DSA segment.
Do we want to make any changes here for v11? If not, are we ok with just
closing the entry and waiting till it bugs anybody for some reason?
Greetings,
Andres Freund
On Thu, Apr 12, 2018 at 4:04 AM, Andres Freund <andres@anarazel.de> wrote:
This is an open item for v11:
Tidy up es_query_dsa and possibly ParallelWorkerContext?
Original commit: e13029a5ce353574516c64fd1ec9c50201e705fd (principal author: Thomas Munro; owner: Robert Haas)
Bug fix: fd7c0fa732d97a4b4ebb58730e6244ea30d0a618
While the bug was fixed with something back-patchable, we should consider improving this situation. As discussed in the above-linked thread, options might include (1) getting rid of es_query_dsa entirely and injecting dependencies into nodes, (2) making all Gather nodes share the same DSM segment so there truly could be a per-query DSA segment.
Do we want to make any changes here for v11? If not, are we ok with just
closing the entry and waiting till it bugs anybody for some reason?
I think we should probably do both of the things listed above, but
given where we are and given that it's refactoring work and not a
stop-ship issue, I propose to do that in the v12 cycle. I'll go
remove that from the open items if no one objects soon.
--
Thomas Munro
http://www.enterprisedb.com