generic plans and "initial" pruning
Executing generic plans involving partitions is known to become
slower as the partition count grows, due to a number of bottlenecks,
with AcquireExecutorLocks() showing up at the top of profiles.
A previous attempt at solving that problem was made by David Rowley
[1], where he proposed delaying the locking of *all* partitions
appearing under an Append/MergeAppend until "initial" pruning is done
during the executor initialization phase. A problem with that
approach, which he described in [2], is that leaving partitions
unlocked can lead to race conditions where the Plan node belonging to
a partition can be invalidated when a concurrent session successfully
alters the partition between AcquireExecutorLocks() deciding that the
plan is okay to execute and the plan actually being executed.
However, using an idea that Robert suggested to me off-list a little
while back, it seems possible to determine the set of partitions whose
locking we can safely skip. The idea is to look at the "initial", or
"pre-execution", pruning instructions contained in a given Append or
MergeAppend node while AcquireExecutorLocks() is collecting the
relations to lock, and to consider relations from only those sub-nodes
that survive performing those instructions. I've attempted to
implement that idea in the attached patch; a condensed sketch of the
resulting locking logic follows.
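Here is a minimal sketch of what the locking loop ends up doing. The
wrapper function is hypothetical and made up for illustration;
GetLockableRelations() and the PlannedStmt fields are from the
attached patch, and the CMD_UTILITY and unlock paths of the real
AcquireExecutorLocks() are omitted:

    /*
     * Hypothetical wrapper, for illustration only.
     * GetLockableRelations() returns plannedstmt->relationRTIs as-is
     * when the plan contains no pre-execution prunable nodes;
     * otherwise it runs the "initial" pruning steps, using boundParams
     * to evaluate any EXTERN parameters, and returns only the RT
     * indexes of the surviving parents and leaf partitions.
     */
    static void
    LockUnprunedRelations(PlannedStmt *plannedstmt, ParamListInfo boundParams)
    {
        Bitmapset  *relations = GetLockableRelations(plannedstmt, boundParams);
        int         rti = -1;

        while ((rti = bms_next_member(relations, rti)) >= 0)
        {
            RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);

            if (rte->rtekind == RTE_RELATION)
                LockRelationOid(rte->relid, rte->rellockmode);
        }
    }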
Note that "initial" pruning steps are now performed twice when
executing generic plans: once in AcquireExecutorLocks() to find
partitions to be locked, and a 2nd time in ExecInit[Merge]Append() to
determine the set of partition sub-nodes to be initialized for
execution, though I wasn't able to come up with a good idea to avoid
this duplication.
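For reference, a condensed excerpt (not the exact hunk) of the second
call site in the patched ExecInitAppend(); ExecInitMergeAppend() is
analogous:

    /* Repeat the initial pruning to find the subplans to initialize. */
    if (node->part_prune_info &&
        node->part_prune_info->contains_init_steps)
        validsubplans =
            ExecFindInitialMatchingSubPlans(node->part_prune_info,
                                            estate,
                                            estate->es_range_table,
                                            estate->es_param_list_info,
                                            NULL);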
Using the following benchmark setup:
pgbench testdb -i --partitions=$nparts > /dev/null 2>&1
pgbench -n testdb -S -T 30 -Mprepared
and with plan_cache_mode = force_generic_plan, I get the following
numbers (the first column is the partition count, $nparts):
HEAD:
32 tps = 20561.776403 (without initial connection time)
64 tps = 12553.131423 (without initial connection time)
128 tps = 13330.365696 (without initial connection time)
256 tps = 8605.723120 (without initial connection time)
512 tps = 4435.951139 (without initial connection time)
1024 tps = 2346.902973 (without initial connection time)
2048 tps = 1334.680971 (without initial connection time)
Patched:
32 tps = 27554.156077 (without initial connection time)
64 tps = 27531.161310 (without initial connection time)
128 tps = 27138.305677 (without initial connection time)
256 tps = 25825.467724 (without initial connection time)
512 tps = 19864.386305 (without initial connection time)
1024 tps = 18742.668944 (without initial connection time)
2048 tps = 16312.412704 (without initial connection time)
--
Amit Langote
EDB: http://www.enterprisedb.com
[1]: /messages/by-id/CAKJS1f_kfRQ3ZpjQyHC7=PK9vrhxiHBQFZ+hc0JCwwnRKkF3hg@mail.gmail.com
[2]: /messages/by-id/CAKJS1f99JNe+sw5E3qWmS+HeLMFaAhehKO67J1Ym3pXv0XBsxw@mail.gmail.com
Attachment: v1-0001-Teach-AcquireExecutorLocks-to-acquire-fewer-locks.patch
From ed4de69e7ae180eca380ae581152b6650175661f Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v1] Teach AcquireExecutorLocks() to acquire fewer locks in
some cases
Currently, AcquireExecutorLocks() loops over the range table of a
given PlannedStmt and locks all relations found therein, even those
that won't actually be scanned during execution because they are
eliminated by "initial" pruning, which is applied during the
initialization of their owning Append or MergeAppend node. To avoid
that, teach AcquireExecutorLocks() to itself perform the "initial"
pruning on nodes that support it and to lock only those relations
contained in the subnodes that survive the pruning.
To that end, AcquireExecutorLocks() now loops over a bitmapset of
RT indexes, those of the RTEs of "lockable" relations, instead of
the whole range table to find such entries. When pruning is possible,
the bitmapset is constructed by walking the plan tree to locate
nodes that allow "initial" (or "pre-execution") pruning and
disregarding relations from subnodes that don't survive the pruning
instructions.
PlannedStmt gets a bitmapset field to store the RT indexes of
lockable relations, which is populated when constructing the flat
range table in setrefs.c. It is used as-is in the absence of any
prunable nodes.
PlannedStmt also gets a new field that indicates whether any of the
nodes of the plan tree contain "initial" (or "pre-execution") pruning
steps, which saves the trouble of walking the plan tree only to find
whether that's the case.
ExecFindInitialMatchingSubPlans() is refactored to allow being
called outside a full-fledged executor context.
---
src/backend/executor/execParallel.c | 2 +
src/backend/executor/execPartition.c | 534 ++++++++++++++++++-------
src/backend/executor/nodeAppend.c | 39 +-
src/backend/executor/nodeMergeAppend.c | 39 +-
src/backend/nodes/copyfuncs.c | 4 +
src/backend/nodes/nodeFuncs.c | 121 +++++-
src/backend/nodes/outfuncs.c | 5 +
src/backend/nodes/readfuncs.c | 4 +
src/backend/optimizer/plan/planner.c | 2 +
src/backend/optimizer/plan/setrefs.c | 10 +
src/backend/partitioning/partprune.c | 57 ++-
src/backend/utils/cache/plancache.c | 217 +++++++++-
src/include/executor/execPartition.h | 13 +-
src/include/nodes/nodeFuncs.h | 3 +
src/include/nodes/pathnodes.h | 6 +
src/include/nodes/plannodes.h | 15 +
src/include/partitioning/partprune.h | 3 +
17 files changed, 866 insertions(+), 208 deletions(-)
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index f8a4a40e7b..d14e60724b 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -182,8 +182,10 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->usesPreExecPruning = false;
pstmt->planTree = plan;
pstmt->rtable = estate->es_range_table;
+ pstmt->relationRTIs = NULL;
pstmt->resultRelations = NIL;
pstmt->appendRelations = NIL;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 5c723bc54e..8c63272398 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -24,6 +24,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -186,7 +187,8 @@ static void ExecInitPruningContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
PartitionKey partkey,
- PlanState *planstate);
+ PlanState *planstate,
+ ExprContext *econtext);
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
@@ -1511,8 +1513,7 @@ adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri)
/*
* ExecCreatePartitionPruneState
- * Build the data structure required for calling
- * ExecFindInitialMatchingSubPlans and ExecFindMatchingSubPlans.
+ * Build the data structure for run-time pruning
*
* 'planstate' is the parent plan node's execution state.
*
@@ -1526,10 +1527,20 @@ adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri)
* as children. The data stored in each PartitionedRelPruningData can be
* re-used each time we re-evaluate which partitions match the pruning steps
* provided in each PartitionedRelPruneInfo.
+ *
+ * This does not consider initial_pruning_steps because they must already have
+ * been performed by the caller and the subplans remaining after doing so are
+ * given as 'initially_valid_subplans'. The translation data to be put into
+ * PartitionPruneState that allows conversion of partition indexes into subplan
+ * indexes is updated here to account for the unneeded subplans having been
+ * removed by initial pruning. 'nsubplans' gives the number of subplans that
+ * were present before initial pruning.
*/
PartitionPruneState *
ExecCreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *partitionpruneinfo)
+ PartitionPruneInfo *partitionpruneinfo,
+ Bitmapset *initially_valid_subplans,
+ int nsubplans)
{
EState *estate = planstate->state;
PartitionPruneState *prunestate;
@@ -1537,6 +1548,15 @@ ExecCreatePartitionPruneState(PlanState *planstate,
ListCell *lc;
int i;
+ /*
+ * Only create a PartitionPruneState if pruning needs to be performed
+ * during the execution of the owning plan. Note that this means the
+ * initial pruning steps, which are used to determine the set of subplans
+ * that are valid for actual execution, are performed without creating a
+ * PartitionPruneState; see ExecFindInitialMatchingSubPlans().
+ */
+ Assert(partitionpruneinfo->contains_exec_steps);
+
/* For data reading, executor always omits detached partitions */
if (estate->es_partition_directory == NULL)
estate->es_partition_directory =
@@ -1555,7 +1575,6 @@ ExecCreatePartitionPruneState(PlanState *planstate,
prunestate->execparamids = NULL;
/* other_subplans can change at runtime, so we need our own copy */
prunestate->other_subplans = bms_copy(partitionpruneinfo->other_subplans);
- prunestate->do_initial_prune = false; /* may be set below */
prunestate->do_exec_prune = false; /* may be set below */
prunestate->num_partprunedata = n_part_hierarchies;
@@ -1702,23 +1721,17 @@ ExecCreatePartitionPruneState(PlanState *planstate,
pprune->present_parts = bms_copy(pinfo->present_parts);
/*
- * Initialize pruning contexts as needed.
+ * Initialize pruning contexts as needed, ignoring any
+ * initial_pruning_steps because they must already have been
+ * performed.
*/
- pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
- if (pinfo->initial_pruning_steps)
- {
- ExecInitPruningContext(&pprune->initial_context,
- pinfo->initial_pruning_steps,
- partdesc, partkey, planstate);
- /* Record whether initial pruning is needed at any level */
- prunestate->do_initial_prune = true;
- }
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
if (pinfo->exec_pruning_steps)
{
ExecInitPruningContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
- partdesc, partkey, planstate);
+ partdesc, partkey, planstate,
+ planstate->ps_ExprContext);
/* Record whether exec pruning is needed at any level */
prunestate->do_exec_prune = true;
}
@@ -1735,18 +1748,136 @@ ExecCreatePartitionPruneState(PlanState *planstate,
i++;
}
+ /*
+ * If exec-time pruning is required and subplans appear to have been
+ * pruned by initial pruning steps, then we must re-sequence the subplan
+ * indexes so that ExecFindMatchingSubPlans() properly returns the indexes
+ * of the subplans that have remained after initial pruning, that is,
+ * initially_valid_subplans.
+ *
+ * We can safely skip this when !do_exec_prune, even though that leaves
+ * invalid data in pruneinfo, because that data won't be consulted again
+ * (cf initial Assert in ExecFindMatchingSubPlans).
+ */
+ if (prunestate->do_exec_prune &&
+ bms_num_members(initially_valid_subplans) < nsubplans)
+ {
+ int *new_subplan_indexes;
+ Bitmapset *new_other_subplans;
+ int i;
+ int newidx;
+
+ /*
+ * First we must build a temporary array which maps old subplan
+ * indexes to new ones. For convenience of initialization, we use
+ * 1-based indexes in this array and leave pruned items as 0.
+ */
+ new_subplan_indexes = (int *) palloc0(sizeof(int) * nsubplans);
+ newidx = 1;
+ i = -1;
+ while ((i = bms_next_member(initially_valid_subplans, i)) >= 0)
+ {
+ Assert(i < nsubplans);
+ new_subplan_indexes[i] = newidx++;
+ }
+
+ /*
+ * Now we can update each PartitionedRelPruneInfo's subplan_map with
+ * new subplan indexes. We must also recompute its present_parts
+ * bitmap.
+ */
+ for (i = 0; i < prunestate->num_partprunedata; i++)
+ {
+ PartitionPruningData *prunedata = prunestate->partprunedata[i];
+ int j;
+
+ /*
+ * Within each hierarchy, we perform this loop in back-to-front
+ * order so that we determine present_parts for the lowest-level
+ * partitioned tables first. This way we can tell whether a
+ * sub-partitioned table's partitions were entirely pruned so we
+ * can exclude it from the current level's present_parts.
+ */
+ for (j = prunedata->num_partrelprunedata - 1; j >= 0; j--)
+ {
+ PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
+ int nparts = pprune->nparts;
+ int k;
+
+ /* We just rebuild present_parts from scratch */
+ bms_free(pprune->present_parts);
+ pprune->present_parts = NULL;
+
+ for (k = 0; k < nparts; k++)
+ {
+ int oldidx = pprune->subplan_map[k];
+ int subidx;
+
+ /*
+ * If this partition existed as a subplan then change the
+ * old subplan index to the new subplan index. The new
+ * index may become -1 if the partition was pruned above,
+ * or it may just come earlier in the subplan list due to
+ * some subplans being removed earlier in the list. If
+ * it's a subpartition, add it to present_parts unless
+ * it's entirely pruned.
+ */
+ if (oldidx >= 0)
+ {
+ Assert(oldidx < nsubplans);
+ pprune->subplan_map[k] = new_subplan_indexes[oldidx] - 1;
+
+ if (new_subplan_indexes[oldidx] > 0)
+ pprune->present_parts =
+ bms_add_member(pprune->present_parts, k);
+ }
+ else if ((subidx = pprune->subpart_map[k]) >= 0)
+ {
+ PartitionedRelPruningData *subprune;
+
+ subprune = &prunedata->partrelprunedata[subidx];
+
+ if (!bms_is_empty(subprune->present_parts))
+ pprune->present_parts =
+ bms_add_member(pprune->present_parts, k);
+ }
+ }
+ }
+ }
+
+ /*
+ * We must also recompute the other_subplans set, since indexes in it
+ * may change.
+ */
+ new_other_subplans = NULL;
+ i = -1;
+ while ((i = bms_next_member(prunestate->other_subplans, i)) >= 0)
+ new_other_subplans = bms_add_member(new_other_subplans,
+ new_subplan_indexes[i] - 1);
+
+ bms_free(prunestate->other_subplans);
+ prunestate->other_subplans = new_other_subplans;
+
+ pfree(new_subplan_indexes);
+ }
+
return prunestate;
}
/*
* Initialize a PartitionPruneContext for the given list of pruning steps.
+ *
+ * At least one of 'planstate' or 'econtext' must be passed to be able to
+ * successfully evaluate any non-Const expressions contained in the
+ * steps.
*/
static void
ExecInitPruningContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
PartitionKey partkey,
- PlanState *planstate)
+ PlanState *planstate,
+ ExprContext *econtext)
{
int n_steps;
int partnatts;
@@ -1767,6 +1898,7 @@ ExecInitPruningContext(PartitionPruneContext *context,
context->ppccontext = CurrentMemoryContext;
context->planstate = planstate;
+ context->exprcontext = econtext;
/* Initialize expression state for each expression we need */
context->exprstates = (ExprState **)
@@ -1795,8 +1927,13 @@ ExecInitPruningContext(PartitionPruneContext *context,
step->step.step_id,
keyno);
- context->exprstates[stateidx] =
- ExecInitExpr(expr, context->planstate);
+ if (planstate == NULL)
+ context->exprstates[stateidx] =
+ ExecInitExprWithParams(expr,
+ econtext->ecxt_param_list_info);
+ else
+ context->exprstates[stateidx] =
+ ExecInitExpr(expr, context->planstate);
}
keyno++;
}
@@ -1809,171 +1946,283 @@ ExecInitPruningContext(PartitionPruneContext *context,
* pruning, disregarding any pruning constraints involving PARAM_EXEC
* Params.
*
- * If additional pruning passes will be required (because of PARAM_EXEC
- * Params), we must also update the translation data that allows conversion
- * of partition indexes into subplan indexes to account for the unneeded
- * subplans having been removed.
+ * Must only be called once per 'pruneinfo', and only if initial pruning is
+ * required.
*
- * Must only be called once per 'prunestate', and only if initial pruning
- * is required.
+ * 'params' contains information about any EXTERN parameters that might be
+ * present in the initial pruning steps.
*
- * 'nsubplans' must be passed as the total number of unpruned subplans.
+ * The RT indexes of unpruned parents are returned in *parentrelids if asked
+ * for by the caller.
*/
Bitmapset *
-ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
+ExecFindInitialMatchingSubPlans(PartitionPruneInfo *pruneinfo,
+ EState *estate, List *rtable,
+ ParamListInfo params,
+ Bitmapset **parentrelids)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
+ MemoryContext tmpcontext;
int i;
+ ListCell *lc;
+ int n_part_hierarchies;
+ bool free_estate = false;
+ ExprContext *econtext;
+ PartitionPruningData **partprunedata;
+ PartitionDirectory pdir;
- /* Caller error if we get here without do_initial_prune */
- Assert(prunestate->do_initial_prune);
-
- /*
- * Switch to a temp context to avoid leaking memory in the executor's
- * query-lifespan memory context.
- */
- oldcontext = MemoryContextSwitchTo(prunestate->prune_context);
-
- /*
- * For each hierarchy, do the pruning tests, and add nondeletable
- * subplans' indexes to "result".
- */
- for (i = 0; i < prunestate->num_partprunedata; i++)
- {
- PartitionPruningData *prunedata;
- PartitionedRelPruningData *pprune;
+ /* Caller error if we get here without contains_init_steps */
+ Assert(pruneinfo->contains_init_steps);
- prunedata = prunestate->partprunedata[i];
- pprune = &prunedata->partrelprunedata[0];
- /* Perform pruning without using PARAM_EXEC Params */
- find_matching_subplans_recurse(prunedata, pprune, true, &result);
+ if (parentrelids)
+ *parentrelids = NULL;
- /* Expression eval may have used space in node's ps_ExprContext too */
- if (pprune->initial_pruning_steps)
- ResetExprContext(pprune->initial_context.planstate->ps_ExprContext);
+ /* Set up EState if not in the executor proper. */
+ if (estate == NULL)
+ {
+ estate = CreateExecutorState();
+ estate->es_param_list_info = params;
+ free_estate = true;
}
- /* Add in any subplans that partition pruning didn't account for */
- result = bms_add_members(result, prunestate->other_subplans);
-
- MemoryContextSwitchTo(oldcontext);
+ /* An ExprContext to evaluate expressions. */
+ econtext = CreateExprContext(estate);
- /* Copy result out of the temp context before we reset it */
- result = bms_copy(result);
+ /* PartitionDirectory, creating one if not there already. */
+ pdir = estate->es_partition_directory;
+ if (pdir == NULL)
+ {
+ /* Omits detached partitions, just like in the executor proper. */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+ estate->es_partition_directory = pdir;
+ }
- MemoryContextReset(prunestate->prune_context);
+ /* A temporary context to allocate stuff needed to run pruning steps. */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "initial pruning working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
/*
- * If exec-time pruning is required and we pruned subplans above, then we
- * must re-sequence the subplan indexes so that ExecFindMatchingSubPlans
- * properly returns the indexes from the subplans which will remain after
- * execution of this function.
+ * The code that follows matches exactly what ExecCreatePartitionPruneState()
+ * does, except we don't need a PartitionPruneState here, so don't call
+ * that function.
*
- * We can safely skip this when !do_exec_prune, even though that leaves
- * invalid data in prunestate, because that data won't be consulted again
- * (cf initial Assert in ExecFindMatchingSubPlans).
+ * XXX some refactoring might be good.
*/
- if (prunestate->do_exec_prune && bms_num_members(result) < nsubplans)
+
+ /* PartitionPruningData for each partition hierarchy. */
+ n_part_hierarchies = list_length(pruneinfo->prune_infos);
+ Assert(n_part_hierarchies > 0);
+ partprunedata = (PartitionPruningData **)
+ palloc(sizeof(PartitionPruningData *) * n_part_hierarchies);
+ i = 0;
+ foreach(lc, pruneinfo->prune_infos)
{
- int *new_subplan_indexes;
- Bitmapset *new_other_subplans;
- int i;
- int newidx;
+ PartitionPruningData *prunedata;
+ List *partrelpruneinfos = lfirst_node(List, lc);
+ int npartrelpruneinfos = list_length(partrelpruneinfos);
+ ListCell *lc2;
+ int j;
- /*
- * First we must build a temporary array which maps old subplan
- * indexes to new ones. For convenience of initialization, we use
- * 1-based indexes in this array and leave pruned items as 0.
- */
- new_subplan_indexes = (int *) palloc0(sizeof(int) * nsubplans);
- newidx = 1;
- i = -1;
- while ((i = bms_next_member(result, i)) >= 0)
- {
- Assert(i < nsubplans);
- new_subplan_indexes[i] = newidx++;
- }
+ /* PartitionedRelPruningData per parent in the hierarchy. */
+ prunedata = (PartitionPruningData *)
+ palloc(offsetof(PartitionPruningData, partrelprunedata) +
+ npartrelpruneinfos * sizeof(PartitionedRelPruningData));
+ partprunedata[i] = prunedata;
+ prunedata->num_partrelprunedata = npartrelpruneinfos;
- /*
- * Now we can update each PartitionedRelPruneInfo's subplan_map with
- * new subplan indexes. We must also recompute its present_parts
- * bitmap.
- */
- for (i = 0; i < prunestate->num_partprunedata; i++)
+ j = 0;
+ foreach(lc2, partrelpruneinfos)
{
- PartitionPruningData *prunedata = prunestate->partprunedata[i];
- int j;
+ PartitionedRelPruneInfo *pinfo = lfirst_node(PartitionedRelPruneInfo, lc2);
+ PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
+ RangeTblEntry *partrte = rt_fetch(pinfo->rtindex, rtable);
+ Relation partrel;
+ PartitionDesc partdesc;
+ PartitionKey partkey;
/*
- * Within each hierarchy, we perform this loop in back-to-front
- * order so that we determine present_parts for the lowest-level
- * partitioned tables first. This way we can tell whether a
- * sub-partitioned table's partitions were entirely pruned so we
- * can exclude it from the current level's present_parts.
+ * We can rely on the copies of the partitioned table's partition
+ * key and partition descriptor appearing in its relcache entry,
+ * because that entry will be held open and locked while the
+ * PartitionedRelPruningData is in use.
*/
- for (j = prunedata->num_partrelprunedata - 1; j >= 0; j--)
+ partrel = table_open(partrte->relid, partrte->rellockmode);
+ partkey = RelationGetPartitionKey(partrel);
+ partdesc = PartitionDirectoryLookup(pdir, partrel);
+
+ /*
+ * Initialize the subplan_map and subpart_map.
+ *
+ * Because we request detached partitions to be included, and
+ * detaching waits for old transactions, it is safe to assume that
+ * no partitions have disappeared since this query was planned.
+ *
+ * However, new partitions may have been added.
+ */
+ Assert(partdesc->nparts >= pinfo->nparts);
+ pprune->nparts = partdesc->nparts;
+ pprune->subplan_map = palloc(sizeof(int) * partdesc->nparts);
+ if (partdesc->nparts == pinfo->nparts)
{
- PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
- int nparts = pprune->nparts;
- int k;
+ /*
+ * There are no new partitions, so this is simple. We can
+ * simply point to the subpart_map from the plan, but we must
+ * copy the subplan_map since we may change it later.
+ */
+ pprune->subpart_map = pinfo->subpart_map;
+ memcpy(pprune->subplan_map, pinfo->subplan_map,
+ sizeof(int) * pinfo->nparts);
- /* We just rebuild present_parts from scratch */
- bms_free(pprune->present_parts);
- pprune->present_parts = NULL;
+ /*
+ * Double-check that the list of unpruned relations has not
+ * changed. (Pruned partitions are not in relid_map[].)
+ */
+#ifdef USE_ASSERT_CHECKING
+ for (int k = 0; k < pinfo->nparts; k++)
+ {
+ Assert(partdesc->oids[k] == pinfo->relid_map[k] ||
+ pinfo->subplan_map[k] == -1);
+ }
+#endif
+ }
+ else
+ {
+ int pd_idx = 0;
+ int pp_idx;
- for (k = 0; k < nparts; k++)
+ /*
+ * Some new partitions have appeared since plan time, and
+ * those are reflected in our PartitionDesc but were not
+ * present in the one used to construct subplan_map and
+ * subpart_map. So we must construct new and longer arrays
+ * where the partitions that were originally present map to
+ * the same sub-structures, and any added partitions map to
+ * -1, as if the new partitions had been pruned.
+ *
+ * Note: pinfo->relid_map[] may contain InvalidOid entries for
+ * partitions pruned by the planner. We cannot tell exactly
+ * which of the partdesc entries these correspond to, but we
+ * don't have to; just skip over them. The non-pruned
+ * relid_map entries, however, had better be a subset of the
+ * partdesc entries and in the same order.
+ */
+ pprune->subpart_map = palloc(sizeof(int) * partdesc->nparts);
+ for (pp_idx = 0; pp_idx < partdesc->nparts; pp_idx++)
{
- int oldidx = pprune->subplan_map[k];
- int subidx;
+ /* Skip any InvalidOid relid_map entries */
+ while (pd_idx < pinfo->nparts &&
+ !OidIsValid(pinfo->relid_map[pd_idx]))
+ pd_idx++;
- /*
- * If this partition existed as a subplan then change the
- * old subplan index to the new subplan index. The new
- * index may become -1 if the partition was pruned above,
- * or it may just come earlier in the subplan list due to
- * some subplans being removed earlier in the list. If
- * it's a subpartition, add it to present_parts unless
- * it's entirely pruned.
- */
- if (oldidx >= 0)
+ if (pd_idx < pinfo->nparts &&
+ pinfo->relid_map[pd_idx] == partdesc->oids[pp_idx])
{
- Assert(oldidx < nsubplans);
- pprune->subplan_map[k] = new_subplan_indexes[oldidx] - 1;
-
- if (new_subplan_indexes[oldidx] > 0)
- pprune->present_parts =
- bms_add_member(pprune->present_parts, k);
+ /* match... */
+ pprune->subplan_map[pp_idx] =
+ pinfo->subplan_map[pd_idx];
+ pprune->subpart_map[pp_idx] =
+ pinfo->subpart_map[pd_idx];
+ pd_idx++;
}
- else if ((subidx = pprune->subpart_map[k]) >= 0)
+ else
{
- PartitionedRelPruningData *subprune;
-
- subprune = &prunedata->partrelprunedata[subidx];
-
- if (!bms_is_empty(subprune->present_parts))
- pprune->present_parts =
- bms_add_member(pprune->present_parts, k);
+ /* this partdesc entry is not in the plan */
+ pprune->subplan_map[pp_idx] = -1;
+ pprune->subpart_map[pp_idx] = -1;
}
}
+
+ /*
+ * It might seem that we need to skip any trailing InvalidOid
+ * entries in pinfo->relid_map before checking that we scanned
+ * all of the relid_map. But we will have skipped them above,
+ * because they must correspond to some partdesc->oids
+ * entries; we just couldn't tell which.
+ */
+ if (pd_idx != pinfo->nparts)
+ elog(ERROR, "could not match partition child tables to plan elements");
}
+
+ /* present_parts is also subject to later modification */
+ pprune->present_parts = bms_copy(pinfo->present_parts);
+ pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
+ if (pprune->initial_pruning_steps)
+ ExecInitPruningContext(&pprune->initial_context,
+ pprune->initial_pruning_steps,
+ partdesc, partkey, NULL, econtext);
+
+ table_close(partrel, NoLock);
+ j++;
}
+ i++;
+ }
+
+ /*
+ * For each hierarchy, do the pruning tests, and add nondeletable
+ * subplans' indexes to result.
+ */
+ for (i = 0; i < n_part_hierarchies; i++)
+ {
+ PartitionPruningData *prunedata = partprunedata[i];
+ PartitionedRelPruningData *pprune;
/*
- * We must also recompute the other_subplans set, since indexes in it
- * may change.
+ * We pass the 1st item belonging to the root table of the hierarchy
+ * and find_matching_subplans_recurse() takes care of recursing to
+ * other (lower-level) parents as needed.
*/
- new_other_subplans = NULL;
- i = -1;
- while ((i = bms_next_member(prunestate->other_subplans, i)) >= 0)
- new_other_subplans = bms_add_member(new_other_subplans,
- new_subplan_indexes[i] - 1);
+ pprune = &prunedata->partrelprunedata[0];
+ find_matching_subplans_recurse(prunedata, pprune, true, &result);
- bms_free(prunestate->other_subplans);
- prunestate->other_subplans = new_other_subplans;
+ /*
+ * Collect the RT indexes of surviving parents if the caller asked
+ * to see them.
+ */
+ if (parentrelids)
+ {
+ int j;
+ List *partrelpruneinfos = list_nth_node(List,
+ pruneinfo->prune_infos,
+ i);
- pfree(new_subplan_indexes);
+ for (j = 0; j < prunedata->num_partrelprunedata; j++)
+ {
+ PartitionedRelPruneInfo *pinfo = list_nth_node(PartitionedRelPruneInfo,
+ partrelpruneinfos, j);
+
+ pprune = &prunedata->partrelprunedata[j];
+ if (!bms_is_empty(pprune->present_parts))
+ *parentrelids = bms_add_member(*parentrelids, pinfo->rtindex);
+ }
+ }
+
+ /* Release space used up in our ExprContext. */
+ ResetExprContext(econtext);
+ }
+
+ /* Add in any subplans that partition pruning didn't account for. */
+ result = bms_add_members(result, pruneinfo->other_subplans);
+
+ MemoryContextSwitchTo(oldcontext);
+
+ /* Copy result out of the temp context before we reset it */
+ result = bms_copy(result);
+ if (parentrelids)
+ *parentrelids = bms_copy(*parentrelids);
+
+ /* Safe to drop the temporary context */
+ MemoryContextDelete(tmpcontext);
+
+ /* Free the ExprState, and EState if needed. */
+ FreeExprContext(econtext, true);
+ if (free_estate)
+ {
+ FreeExecutorState(estate);
+ estate = NULL;
}
return result;
@@ -2018,6 +2267,11 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate)
prunedata = prunestate->partprunedata[i];
pprune = &prunedata->partrelprunedata[0];
+ /*
+ * We pass the 1st item belonging to the root table of the hierarchy
+ * and find_matching_subplans_recurse() takes care of recursing to
+ * other (lower-level) parents as needed.
+ */
find_matching_subplans_recurse(prunedata, pprune, false, &result);
/* Expression eval may have used space in node's ps_ExprContext too */
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 6a2daa6e76..7f813476ab 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -136,24 +136,15 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
/* If run-time partition pruning is enabled, then set that up now */
if (node->part_prune_info != NULL)
{
- PartitionPruneState *prunestate;
-
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, &appendstate->ps);
-
- /* Create the working data structure for pruning. */
- prunestate = ExecCreatePartitionPruneState(&appendstate->ps,
- node->part_prune_info);
- appendstate->as_prune_state = prunestate;
-
- /* Perform an initial partition prune, if required. */
- if (prunestate->do_initial_prune)
+ if (node->part_prune_info->contains_init_steps)
{
- /* Determine which subplans survive initial pruning */
- validsubplans = ExecFindInitialMatchingSubPlans(prunestate,
- list_length(node->appendplans));
-
+ validsubplans =
+ ExecFindInitialMatchingSubPlans(node->part_prune_info,
+ estate, estate->es_range_table,
+ estate->es_param_list_info,
+ NULL);
nplans = bms_num_members(validsubplans);
+ Assert(nplans >= 0);
}
else
{
@@ -163,12 +154,26 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
validsubplans = bms_add_range(NULL, 0, nplans - 1);
}
+ /* Create the working data structure for run-time pruning. */
+ if (node->part_prune_info->contains_exec_steps)
+ {
+ PartitionPruneState *prunestate;
+
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, &appendstate->ps);
+ prunestate = ExecCreatePartitionPruneState(&appendstate->ps,
+ node->part_prune_info,
+ validsubplans,
+ list_length(node->appendplans));
+
+ appendstate->as_prune_state = prunestate;
+ }
/*
* When no run-time pruning is required and there's at least one
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ else
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 617bffb206..51c5c3433d 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -84,23 +84,15 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
/* If run-time partition pruning is enabled, then set that up now */
if (node->part_prune_info != NULL)
{
- PartitionPruneState *prunestate;
-
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, &mergestate->ps);
-
- prunestate = ExecCreatePartitionPruneState(&mergestate->ps,
- node->part_prune_info);
- mergestate->ms_prune_state = prunestate;
-
- /* Perform an initial partition prune, if required. */
- if (prunestate->do_initial_prune)
+ if (node->part_prune_info->contains_init_steps)
{
- /* Determine which subplans survive initial pruning */
- validsubplans = ExecFindInitialMatchingSubPlans(prunestate,
- list_length(node->mergeplans));
-
+ validsubplans =
+ ExecFindInitialMatchingSubPlans(node->part_prune_info,
+ estate, estate->es_range_table,
+ estate->es_param_list_info,
+ NULL);
nplans = bms_num_members(validsubplans);
+ Assert(nplans >= 0);
}
else
{
@@ -110,13 +102,28 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
validsubplans = bms_add_range(NULL, 0, nplans - 1);
}
+ /* Create the working data structure for run-time pruning. */
+ if (node->part_prune_info->contains_exec_steps)
+ {
+ PartitionPruneState *prunestate;
+
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, &mergestate->ps);
+ prunestate = ExecCreatePartitionPruneState(&mergestate->ps,
+ node->part_prune_info,
+ validsubplans,
+ list_length(node->mergeplans));
+
+ mergestate->ms_prune_state = prunestate;
+ }
/*
* When no run-time pruning is required and there's at least one
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ else
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
+
}
else
{
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index df0b747883..57f2fce3d4 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -94,9 +94,11 @@ _copyPlannedStmt(const PlannedStmt *from)
COPY_SCALAR_FIELD(transientPlan);
COPY_SCALAR_FIELD(dependsOnRole);
COPY_SCALAR_FIELD(parallelModeNeeded);
+ COPY_SCALAR_FIELD(usesPreExecPruning);
COPY_SCALAR_FIELD(jitFlags);
COPY_NODE_FIELD(planTree);
COPY_NODE_FIELD(rtable);
+ COPY_BITMAPSET_FIELD(relationRTIs);
COPY_NODE_FIELD(resultRelations);
COPY_NODE_FIELD(appendRelations);
COPY_NODE_FIELD(subplans);
@@ -1277,6 +1279,8 @@ _copyPartitionPruneInfo(const PartitionPruneInfo *from)
PartitionPruneInfo *newnode = makeNode(PartitionPruneInfo);
COPY_NODE_FIELD(prune_infos);
+ COPY_SCALAR_FIELD(contains_init_steps);
+ COPY_SCALAR_FIELD(contains_exec_steps);
COPY_BITMAPSET_FIELD(other_subplans);
return newnode;
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index e276264882..a13ee087a8 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -31,7 +31,10 @@ static bool planstate_walk_subplans(List *plans, bool (*walker) (),
void *context);
static bool planstate_walk_members(PlanState **planstates, int nplans,
bool (*walker) (), void *context);
-
+static bool plan_walk_subplans(List *plans,
+ bool (*walker) (),
+ void *context);
+static bool plan_walk_members(List *plans, bool (*walker) (), void *context);
/*
* exprType -
@@ -4105,3 +4108,119 @@ planstate_walk_members(PlanState **planstates, int nplans,
return false;
}
+
+/*
+ * plan_tree_walker --- walk plantrees
+ *
+ * The walker has already visited the current node, and so we need only
+ * recurse into any sub-nodes it has.
+ */
+bool
+plan_tree_walker(Plan *plan,
+ bool (*walker) (),
+ void *context)
+{
+ ListCell *lc;
+
+ /* Guard against stack overflow due to overly complex plan trees */
+ check_stack_depth();
+
+ /* initPlan-s */
+ if (plan_walk_subplans(plan->initPlan, walker, context))
+ return true;
+
+ /* lefttree */
+ if (outerPlan(plan))
+ {
+ if (walker(outerPlan(plan), context))
+ return true;
+ }
+
+ /* righttree */
+ if (innerPlan(plan))
+ {
+ if (walker(innerPlan(plan), context))
+ return true;
+ }
+
+ /* special child plans */
+ switch (nodeTag(plan))
+ {
+ case T_Append:
+ if (plan_walk_members(((Append *) plan)->appendplans,
+ walker, context))
+ return true;
+ break;
+ case T_MergeAppend:
+ if (plan_walk_members(((MergeAppend *) plan)->mergeplans,
+ walker, context))
+ return true;
+ break;
+ case T_BitmapAnd:
+ if (plan_walk_members(((BitmapAnd *) plan)->bitmapplans,
+ walker, context))
+ return true;
+ break;
+ case T_BitmapOr:
+ if (plan_walk_members(((BitmapOr *) plan)->bitmapplans,
+ walker, context))
+ return true;
+ break;
+ case T_SubqueryScan:
+ if (walker(((SubqueryScan *) plan)->subplan, context))
+ return true;
+ break;
+ case T_CustomScan:
+ foreach(lc, ((CustomScan *) plan)->custom_plans)
+ {
+ if (walker((Plan *) lfirst(lc), context))
+ return true;
+ }
+ break;
+ default:
+ break;
+ }
+
+ return false;
+}
+
+/*
+ * Walk a list of SubPlans (or initPlans, which also use SubPlan nodes).
+ */
+static bool
+plan_walk_subplans(List *plans,
+ bool (*walker) (),
+ void *context)
+{
+ ListCell *lc;
+ PlannedStmt *plannedstmt = (PlannedStmt *) context;
+
+ foreach(lc, plans)
+ {
+ SubPlan *sp = lfirst_node(SubPlan, lc);
+ Plan *p = list_nth(plannedstmt->subplans, sp->plan_id - 1);
+
+ if (walker(p, context))
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * Walk the constituent plans of a ModifyTable, Append, MergeAppend,
+ * BitmapAnd, or BitmapOr node.
+ */
+static bool
+plan_walk_members(List *plans, bool (*walker) (), void *context)
+{
+ ListCell *lc;
+
+ foreach(lc, plans)
+ {
+ if (walker(lfirst(lc), context))
+ return true;
+ }
+
+ return false;
+}
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 91a89b6d51..8364633d2e 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -312,9 +312,11 @@ _outPlannedStmt(StringInfo str, const PlannedStmt *node)
WRITE_BOOL_FIELD(transientPlan);
WRITE_BOOL_FIELD(dependsOnRole);
WRITE_BOOL_FIELD(parallelModeNeeded);
+ WRITE_BOOL_FIELD(usesPreExecPruning);
WRITE_INT_FIELD(jitFlags);
WRITE_NODE_FIELD(planTree);
WRITE_NODE_FIELD(rtable);
+ WRITE_BITMAPSET_FIELD(relationRTIs);
WRITE_NODE_FIELD(resultRelations);
WRITE_NODE_FIELD(appendRelations);
WRITE_NODE_FIELD(subplans);
@@ -1003,6 +1005,8 @@ _outPartitionPruneInfo(StringInfo str, const PartitionPruneInfo *node)
WRITE_NODE_TYPE("PARTITIONPRUNEINFO");
WRITE_NODE_FIELD(prune_infos);
+ WRITE_BOOL_FIELD(contains_init_steps);
+ WRITE_BOOL_FIELD(contains_exec_steps);
WRITE_BITMAPSET_FIELD(other_subplans);
}
@@ -2273,6 +2277,7 @@ _outPlannerGlobal(StringInfo str, const PlannerGlobal *node)
WRITE_NODE_FIELD(subplans);
WRITE_BITMAPSET_FIELD(rewindPlanIDs);
WRITE_NODE_FIELD(finalrtable);
+ WRITE_BITMAPSET_FIELD(relationRTIs);
WRITE_NODE_FIELD(finalrowmarks);
WRITE_NODE_FIELD(resultRelations);
WRITE_NODE_FIELD(appendRelations);
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index d79af6e56e..df06782c3c 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1585,9 +1585,11 @@ _readPlannedStmt(void)
READ_BOOL_FIELD(transientPlan);
READ_BOOL_FIELD(dependsOnRole);
READ_BOOL_FIELD(parallelModeNeeded);
+ READ_BOOL_FIELD(usesPreExecPruning);
READ_INT_FIELD(jitFlags);
READ_NODE_FIELD(planTree);
READ_NODE_FIELD(rtable);
+ READ_BITMAPSET_FIELD(relationRTIs);
READ_NODE_FIELD(resultRelations);
READ_NODE_FIELD(appendRelations);
READ_NODE_FIELD(subplans);
@@ -2533,6 +2535,8 @@ _readPartitionPruneInfo(void)
READ_LOCALS(PartitionPruneInfo);
READ_NODE_FIELD(prune_infos);
+ READ_BOOL_FIELD(contains_init_steps);
+ READ_BOOL_FIELD(contains_exec_steps);
READ_BITMAPSET_FIELD(other_subplans);
READ_DONE();
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index bd01ec0526..37a07cb258 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -517,8 +517,10 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->transientPlan = glob->transientPlan;
result->dependsOnRole = glob->dependsOnRole;
result->parallelModeNeeded = glob->parallelModeNeeded;
+ result->usesPreExecPruning = glob->usesPreExecPruning;
result->planTree = top_plan;
result->rtable = glob->finalrtable;
+ result->relationRTIs = glob->relationRTIs;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 6ccec759bd..4616dc675d 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -483,6 +483,7 @@ static void
add_rte_to_flat_rtable(PlannerGlobal *glob, RangeTblEntry *rte)
{
RangeTblEntry *newrte;
+ Index rti = list_length(glob->finalrtable) + 1;
/* flat copy to duplicate all the scalar fields */
newrte = (RangeTblEntry *) palloc(sizeof(RangeTblEntry));
@@ -517,7 +518,10 @@ add_rte_to_flat_rtable(PlannerGlobal *glob, RangeTblEntry *rte)
* but it would probably cost more cycles than it would save.
*/
if (newrte->rtekind == RTE_RELATION)
+ {
+ glob->relationRTIs = bms_add_member(glob->relationRTIs, rti);
glob->relationOids = lappend_oid(glob->relationOids, newrte->relid);
+ }
}
/*
@@ -1515,6 +1519,9 @@ set_append_references(PlannerInfo *root,
pinfo->rtindex += rtoffset;
}
}
+
+ if (aplan->part_prune_info->contains_init_steps)
+ root->glob->usesPreExecPruning = true;
}
/* We don't need to recurse to lefttree or righttree ... */
@@ -1579,6 +1586,9 @@ set_mergeappend_references(PlannerInfo *root,
pinfo->rtindex += rtoffset;
}
}
+
+ if (mplan->part_prune_info->contains_init_steps)
+ root->glob->usesPreExecPruning = true;
}
/* We don't need to recurse to lefttree or righttree ... */
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index e00edbe5c8..d2874f716e 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,9 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *contains_init_steps,
+ bool *contains_exec_steps);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -230,6 +232,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool contains_init_steps = false;
+ bool contains_exec_steps = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -309,12 +313,16 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_contains_init_steps,
+ partrel_contains_exec_steps;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_contains_init_steps,
+ &partrel_contains_exec_steps);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -323,6 +331,10 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+ if (!contains_init_steps)
+ contains_init_steps = partrel_contains_init_steps;
+ if (!contains_exec_steps)
+ contains_exec_steps = partrel_contains_exec_steps;
}
pfree(relid_subplan_map);
@@ -337,6 +349,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->contains_init_steps = contains_init_steps;
+ pruneinfo->contains_exec_steps = contains_exec_steps;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -435,13 +449,18 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *contains_init_steps and *contains_exec_steps are set to indicate
+ * that the returned PartitionedRelPruneInfos contain pruning steps
+ * that can be performed before and during execution, respectively.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *contains_init_steps,
+ bool *contains_exec_steps)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -452,6 +471,10 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will find out below. */
+ *contains_init_steps = false;
+ *contains_exec_steps = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -539,6 +562,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+ * In the first pass, we also determine whether the 2nd pass will be
+ * necessary by noting the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -613,6 +639,11 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ if (!*contains_init_steps)
+ *contains_init_steps = (initial_pruning_steps != NIL);
+ if (!*contains_exec_steps)
+ *contains_exec_steps = (exec_pruning_steps != NIL);
+
pinfolist = lappend(pinfolist, pinfo);
}
@@ -798,6 +829,7 @@ prune_append_rel_partitions(RelOptInfo *rel)
/* These are not valid when being called from the planner */
context.planstate = NULL;
+ context.exprcontext = NULL;
context.exprstates = NULL;
/* Actual pruning happens here. */
@@ -808,8 +840,8 @@ prune_append_rel_partitions(RelOptInfo *rel)
* get_matching_partitions
* Determine partitions that survive partition pruning
*
- * Note: context->planstate must be set to a valid PlanState when the
- * pruning_steps were generated with a target other than PARTTARGET_PLANNER.
+ * Note: context->exprcontext must be valid when the pruning_steps were
+ * generated with a target other than PARTTARGET_PLANNER.
*
* Returns a Bitmapset of the RelOptInfo->part_rels indexes of the surviving
* partitions.
@@ -3654,7 +3686,7 @@ match_boolean_partition_clause(Oid partopfamily, Expr *clause, Expr *partkey,
* exprstate array.
*
* Note that the evaluated result may be in the per-tuple memory context of
- * context->planstate->ps_ExprContext, and we may have leaked other memory
+ * context->exprcontext, and we may have leaked other memory
* there too. This memory must be recovered by resetting that ExprContext
* after we're done with the pruning operation (see execPartition.c).
*/
@@ -3677,13 +3709,18 @@ partkey_datum_from_expr(PartitionPruneContext *context,
ExprContext *ectx;
/*
- * We should never see a non-Const in a step unless we're running in
- * the executor.
+ * We should never see a non-Const in a step unless the caller has
+ * passed a valid ExprContext.
+ *
+ * When context->planstate is valid, context->exprcontext is the same
+ * as context->planstate->ps_ExprContext.
*/
- Assert(context->planstate != NULL);
+ Assert(context->planstate != NULL || context->exprcontext != NULL);
+ Assert(context->planstate == NULL ||
+ (context->exprcontext == context->planstate->ps_ExprContext));
exprstate = context->exprstates[stateidx];
- ectx = context->planstate->ps_ExprContext;
+ ectx = context->exprcontext;
*value = ExecEvalExprSwitchContext(exprstate, ectx, isnull);
}
}
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 6767eae8f2..6161907ace 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -58,6 +58,7 @@
#include "access/transam.h"
#include "catalog/namespace.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
@@ -99,14 +100,26 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocks(List *stmt_list, bool acquire,
+ ParamListInfo boundParams);
+struct GetLockableRelations_context
+{
+ PlannedStmt *plannedstmt;
+ Bitmapset *relations;
+ ParamListInfo params;
+};
+static Bitmapset *GetLockableRelations(PlannedStmt *plannedstmt,
+ ParamListInfo boundParams);
+static bool GetLockableRelations_worker(Plan *plan,
+ struct GetLockableRelations_context *context);
+static Bitmapset *get_plan_scanrelids(Plan *plan);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -792,7 +805,7 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* (We must do this for the "true" result to be race-condition-free.)
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams)
{
CachedPlan *plan = plansource->gplan;
@@ -826,7 +839,7 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ AcquireExecutorLocks(plan->stmt_list, true, boundParams);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -848,7 +861,7 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ AcquireExecutorLocks(plan->stmt_list, false, boundParams);
}
/*
@@ -1160,7 +1173,7 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (CheckCachedPlan(plansource, boundParams))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1366,7 +1379,6 @@ CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
foreach(lc, plan->stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc);
- ListCell *lc2;
if (plannedstmt->commandType == CMD_UTILITY)
return false;
@@ -1375,13 +1387,8 @@ CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
* We have to grovel through the rtable because it's likely to contain
* an RTE_RESULT relation, rather than being totally empty.
*/
- foreach(lc2, plannedstmt->rtable)
- {
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
-
- if (rte->rtekind == RTE_RELATION)
- return false;
- }
+ if (!bms_is_empty(plannedstmt->relationRTIs))
+ return false;
}
/*
@@ -1740,14 +1747,15 @@ QueryListGetPrimaryStmt(List *stmts)
* or release them if acquire is false.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocks(List *stmt_list, bool acquire, ParamListInfo boundParams)
{
ListCell *lc1;
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ Bitmapset *relations;
+ int rti;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1765,9 +1773,22 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ /*
+ * Fetch the RT indexes of only the relations that will actually be
+ * scanned when the plan is executed. This skips over scan nodes
+ * appearing as child subnodes of any Append/MergeAppend nodes present
+ * in the plan tree. It does so by calling
+ * ExecFindInitialMatchingSubPlans() to run any pruning steps
+ * contained in those nodes that can be safely run at this point, using
+ * 'boundParams' to evaluate any EXTERN parameters contained in the
+ * steps.
+ */
+ relations = GetLockableRelations(plannedstmt, boundParams);
+
+ rti = -1;
+ while ((rti = bms_next_member(relations, rti)) >= 0)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
if (rte->rtekind != RTE_RELATION)
continue;
@@ -1786,6 +1807,166 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
}
}
+/*
+ * GetLockableRelations
+ * Returns set of RT indexes of relations that must be locked by
+ * AcquireExecutorLocks()
+ */
+static Bitmapset *
+GetLockableRelations(PlannedStmt *plannedstmt, ParamListInfo boundParams)
+{
+ ListCell *lc;
+ struct GetLockableRelations_context context;
+
+ /* None of the relation scanning nodes are prunable here. */
+ if (!plannedstmt->usesPreExecPruning)
+ return plannedstmt->relationRTIs;
+
+ /*
+ * Look for prunable nodes in the main plan tree, followed by those in
+ * subplans.
+ */
+ context.plannedstmt = plannedstmt;
+ context.params = boundParams;
+ context.relations = NULL;
+
+ (void) GetLockableRelations_worker(plannedstmt->planTree, &context);
+
+ foreach(lc, plannedstmt->subplans)
+ {
+ Plan *subplan = lfirst(lc);
+
+ (void) GetLockableRelations_worker(subplan, &context);
+ }
+
+ return context.relations;
+}
+
+/*
+ * GetLockableRelations_worker
+ * Adds RT indexes of relations to be scanned by plan to
+ * context->relations
+ *
+ * For plan node types that support pruning, this only adds child plan
+ * subnodes that satisfy the "initial" pruning steps.
+ */
+static bool
+GetLockableRelations_worker(Plan *plan,
+ struct GetLockableRelations_context *context)
+{
+ if (plan == NULL)
+ return false;
+
+ switch (nodeTag(plan))
+ {
+ /* Nodes scanning a relation or relations. */
+ case T_SeqScan:
+ case T_SampleScan:
+ case T_IndexScan:
+ case T_IndexOnlyScan:
+ case T_BitmapHeapScan:
+ case T_TidScan:
+ case T_TidRangeScan:
+ context->relations = bms_add_member(context->relations,
+ ((Scan *) plan)->scanrelid);
+ return false;
+ case T_ForeignScan:
+ context->relations = bms_add_members(context->relations,
+ ((ForeignScan *) plan)->fs_relids);
+ return false;
+ case T_CustomScan:
+ context->relations = bms_add_members(context->relations,
+ ((CustomScan *) plan)->custom_relids);
+ return false;
+
+ /* Nodes containing prunable subnodes. */
+ case T_Append:
+ case T_MergeAppend:
+ {
+ PlannedStmt *plannedstmt = context->plannedstmt;
+ List *rtable = plannedstmt->rtable;
+ ParamListInfo params = context->params;
+ PartitionPruneInfo *pruneinfo;
+ Bitmapset *validsubplans;
+ Bitmapset *parentrelids;
+
+ pruneinfo = IsA(plan, Append) ?
+ ((Append *) plan)->part_prune_info :
+ ((MergeAppend *) plan)->part_prune_info;
+
+ if (pruneinfo && pruneinfo->contains_init_steps)
+ {
+ int i;
+ List *subplans = IsA(plan, Append) ?
+ ((Append *) plan)->appendplans :
+ ((MergeAppend *) plan)->mergeplans;
+
+ validsubplans =
+ ExecFindInitialMatchingSubPlans(pruneinfo,
+ NULL, rtable,
+ params,
+ &parentrelids);
+
+ /* All relevant parents must be locked. */
+ Assert(bms_num_members(parentrelids) > 0);
+ context->relations = bms_add_members(context->relations,
+ parentrelids);
+
+ /* And all leaf partitions that will be scanned. */
+ i = -1;
+ while ((i = bms_next_member(validsubplans, i)) >= 0)
+ {
+ Plan *subplan = list_nth(subplans, i);
+
+ context->relations =
+ bms_add_members(context->relations,
+ get_plan_scanrelids(subplan));
+ }
+
+ return false;
+ }
+ }
+ break;
+
+ default:
+ break;
+ }
+
+ return plan_tree_walker(plan, GetLockableRelations_worker,
+ (void *) context);
+}
+
+/*
+ * get_plan_scanrelids
+ * Returns RT indexes of the relation(s) scanned by plan
+ */
+static Bitmapset *
+get_plan_scanrelids(Plan *plan)
+{
+ if (plan == NULL)
+ return NULL;
+
+ switch (nodeTag(plan))
+ {
+ case T_SeqScan:
+ case T_SampleScan:
+ case T_IndexScan:
+ case T_IndexOnlyScan:
+ case T_BitmapHeapScan:
+ case T_TidScan:
+ case T_TidRangeScan:
+ return bms_make_singleton(((Scan *) plan)->scanrelid);
+ case T_ForeignScan:
+ return ((ForeignScan *) plan)->fs_relids;
+ case T_CustomScan:
+ return ((CustomScan *) plan)->custom_relids;
+ default:
+ break;
+ }
+
+ return NULL;
+}
+
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 694e38b7dd..0eeaf3e79d 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -90,8 +90,6 @@ typedef struct PartitionPruningData
* These must not be pruned.
* prune_context A short-lived memory context in which to execute the
* partition pruning functions.
- * do_initial_prune true if pruning should be performed during executor
- * startup (at any hierarchy level).
* do_exec_prune true if pruning should be performed during
* executor run (at any hierarchy level).
* num_partprunedata Number of items in "partprunedata" array.
@@ -104,7 +102,6 @@ typedef struct PartitionPruneState
Bitmapset *execparamids;
Bitmapset *other_subplans;
MemoryContext prune_context;
- bool do_initial_prune;
bool do_exec_prune;
int num_partprunedata;
PartitionPruningData *partprunedata[FLEXIBLE_ARRAY_MEMBER];
@@ -120,9 +117,13 @@ extern ResultRelInfo *ExecFindPartition(ModifyTableState *mtstate,
extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
PartitionTupleRouting *proute);
extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *partitionpruneinfo);
+ PartitionPruneInfo *partitionpruneinfo,
+ Bitmapset *initially_valid_subplans,
+ int nsubplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
-extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
- int nsubplans);
+extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneInfo *pruneinfo,
+ EState *estate, List *rtable,
+ ParamListInfo params,
+ Bitmapset **parentrelids);
#endif /* EXECPARTITION_H */
diff --git a/src/include/nodes/nodeFuncs.h b/src/include/nodes/nodeFuncs.h
index 03a346c01d..8b985a4706 100644
--- a/src/include/nodes/nodeFuncs.h
+++ b/src/include/nodes/nodeFuncs.h
@@ -158,5 +158,8 @@ extern bool raw_expression_tree_walker(Node *node, bool (*walker) (),
struct PlanState;
extern bool planstate_tree_walker(struct PlanState *planstate, bool (*walker) (),
void *context);
+struct Plan;
+extern bool plan_tree_walker(struct Plan *plan, bool (*walker) (),
+ void *context);
#endif /* NODEFUNCS_H */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 324d92880b..d041b4d924 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -101,6 +101,9 @@ typedef struct PlannerGlobal
List *finalrtable; /* "flat" rangetable for executor */
+ Bitmapset *relationRTIs; /* Indexes of RTE_RELATION entries in range
+ * table */
+
List *finalrowmarks; /* "flat" list of PlanRowMarks */
List *resultRelations; /* "flat" list of integer RT indexes */
@@ -129,6 +132,9 @@ typedef struct PlannerGlobal
char maxParallelHazard; /* worst PROPARALLEL hazard level */
+ bool usesPreExecPruning; /* Do some Plan nodes use pre-execution
+ * partition pruning? */
+
PartitionDirectory partition_directory; /* partition descriptors */
} PlannerGlobal;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index be3c30704a..23bf04578b 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -59,12 +59,18 @@ typedef struct PlannedStmt
bool parallelModeNeeded; /* parallel mode required to execute? */
+ bool usesPreExecPruning; /* Do some Plan nodes use pre-execution
+ * partition pruning? */
+
int jitFlags; /* which forms of JIT should be performed */
struct Plan *planTree; /* tree of Plan nodes */
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *relationRTIs; /* Indexes of RTE_RELATION entries in range
+ * table */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
List *resultRelations; /* integer list of RT indexes, or NIL */
@@ -1157,6 +1163,13 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * contains_init_steps Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its initial_pruning_steps set?
+ *
+ * contains_exec_steps Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its exec_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1165,6 +1178,8 @@ typedef struct PartitionPruneInfo
{
NodeTag type;
List *prune_infos;
+ bool contains_init_steps;
+ bool contains_exec_steps;
Bitmapset *other_subplans;
} PartitionPruneInfo;
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 5f51e73a4d..1c9c408f00 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -41,6 +41,8 @@ struct RelOptInfo;
* subsidiary data, such as the FmgrInfos.
* planstate Points to the parent plan node's PlanState when called
* during execution; NULL when called from the planner.
+ * exprcontext ExprContext to use during pre-execution pruning; planstate
+ * would be NULL in that case.
* exprstates Array of ExprStates, indexed as per PruneCxtStateIdx; one
* for each partition key in each pruning step. Allocated if
* planstate is non-NULL, otherwise NULL.
@@ -56,6 +58,7 @@ typedef struct PartitionPruneContext
FmgrInfo *stepcmpfuncs;
MemoryContext ppccontext;
PlanState *planstate;
+ ExprContext *exprcontext;
ExprState **exprstates;
} PartitionPruneContext;
--
2.24.1
On Sat, Dec 25, 2021 at 9:06 AM Amit Langote <amitlangote09@gmail.com> wrote:
Executing generic plans involving partitions is known to become slower
as partition count grows due to a number of bottlenecks, with
AcquireExecutorLocks() showing at the top in profiles.
Previous attempt at solving that problem was by David Rowley [1],
where he proposed delaying locking of *all* partitions appearing under
an Append/MergeAppend until "initial" pruning is done during the
executor initialization phase. A problem with that approach that he
has described in [2] is that leaving partitions unlocked can lead to
race conditions where the Plan node belonging to a partition can be
invalidated when a concurrent session successfully alters the
partition between AcquireExecutorLocks() saying the plan is okay to
execute and then actually executing it.
However, using an idea that Robert suggested to me off-list a little
while back, it seems possible to determine the set of partitions that
we can safely skip locking. The idea is to look at the "initial" or
"pre-execution" pruning instructions contained in a given Append or
MergeAppend node when AcquireExecutorLocks() is collecting the
relations to lock and consider relations from only those sub-nodes
that survive performing those instructions. I've attempted
implementing that idea in the attached patch.
In which cases will we have "pre-execution" pruning instructions that
can be used to skip locking partitions? Can you please give a few
examples where this approach will be useful?
The benchmark is showing good results, indeed.
--
Best Wishes,
Ashutosh Bapat
On Tue, Dec 28, 2021 at 22:12 Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
wrote:
On Sat, Dec 25, 2021 at 9:06 AM Amit Langote <amitlangote09@gmail.com>
wrote:
Executing generic plans involving partitions is known to become slower
as partition count grows due to a number of bottlenecks, with
AcquireExecutorLocks() showing at the top in profiles.
Previous attempt at solving that problem was by David Rowley [1],
where he proposed delaying locking of *all* partitions appearing under
an Append/MergeAppend until "initial" pruning is done during the
executor initialization phase. A problem with that approach that he
has described in [2] is that leaving partitions unlocked can lead to
race conditions where the Plan node belonging to a partition can be
invalidated when a concurrent session successfully alters the
partition between AcquireExecutorLocks() saying the plan is okay to
execute and then actually executing it.
However, using an idea that Robert suggested to me off-list a little
while back, it seems possible to determine the set of partitions that
we can safely skip locking. The idea is to look at the "initial" or
"pre-execution" pruning instructions contained in a given Append or
MergeAppend node when AcquireExecutorLocks() is collecting the
relations to lock and consider relations from only those sub-nodes
that survive performing those instructions. I've attempted
implementing that idea in the attached patch.
In which cases will we have "pre-execution" pruning instructions that
can be used to skip locking partitions? Can you please give a few
examples where this approach will be useful?
This is mainly to be useful for prepared queries, so something like:
prepare q as select * from partitioned_table where key = $1;
And that too when execute q(…) uses a generic plan. Generic plans are
problematic because they must contain nodes for all partitions (without any
plan-time pruning), which means CheckCachedPlan() has to spend time
proportional to the number of partitions to determine that the plan is
still usable / has not been invalidated; most of that is
AcquireExecutorLocks().
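For reference, the loop whose cost is being described is the
per-statement loop in plancache.c's AcquireExecutorLocks(); roughly, and
trimmed (a sketch of the unpatched logic, not a verbatim quote):

/* Unpatched AcquireExecutorLocks(): one lock per relation RTE. */
foreach(lc, plannedstmt->rtable)
{
    RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);

    if (rte->rtekind != RTE_RELATION)
        continue;

    /* O(length of rtable), i.e., O(partition count) for a generic plan */
    if (acquire)
        LockRelationOid(rte->relid, rte->rellockmode);
    else
        UnlockRelationOid(rte->relid, rte->rellockmode);
}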
Other bottlenecks, not addressed in this patch, pertain to some executor
startup/shutdown subroutines that process the range table of a PlannedStmt
in its entirety, whose length is also proportional to the number of
partitions when the plan is generic.
The benchmark is showing good results, indeed.
Thanks.
--
Amit Langote
EDB: http://www.enterprisedb.com
On Fri, Dec 31, 2021 at 7:56 AM Amit Langote <amitlangote09@gmail.com> wrote:
On Tue, Dec 28, 2021 at 22:12 Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
On Sat, Dec 25, 2021 at 9:06 AM Amit Langote <amitlangote09@gmail.com> wrote:
Executing generic plans involving partitions is known to become slower
as partition count grows due to a number of bottlenecks, with
AcquireExecutorLocks() showing at the top in profiles.
Previous attempt at solving that problem was by David Rowley [1],
where he proposed delaying locking of *all* partitions appearing under
an Append/MergeAppend until "initial" pruning is done during the
executor initialization phase. A problem with that approach that he
has described in [2] is that leaving partitions unlocked can lead to
race conditions where the Plan node belonging to a partition can be
invalidated when a concurrent session successfully alters the
partition between AcquireExecutorLocks() saying the plan is okay to
execute and then actually executing it.
However, using an idea that Robert suggested to me off-list a little
while back, it seems possible to determine the set of partitions that
we can safely skip locking. The idea is to look at the "initial" or
"pre-execution" pruning instructions contained in a given Append or
MergeAppend node when AcquireExecutorLocks() is collecting the
relations to lock and consider relations from only those sub-nodes
that survive performing those instructions. I've attempted
implementing that idea in the attached patch.
In which cases will we have "pre-execution" pruning instructions that
can be used to skip locking partitions? Can you please give a few
examples where this approach will be useful?
This is mainly to be useful for prepared queries, so something like:
prepare q as select * from partitioned_table where key = $1;
And that too when execute q(…) uses a generic plan. Generic plans are
problematic because they must contain nodes for all partitions (without
any plan-time pruning), which means CheckCachedPlan() has to spend time
proportional to the number of partitions to determine that the plan is
still usable / has not been invalidated; most of that is
AcquireExecutorLocks().
Other bottlenecks, not addressed in this patch, pertain to some executor startup/shutdown subroutines that process the range table of a PlannedStmt in its entirety, whose length is also proportional to the number of partitions when the plan is generic.
The benchmark is showing good results, indeed.
Indeed.
Here are a few comments for the v1 patch:
+ /* Caller error if we get here without contains_init_steps */
+ Assert(pruneinfo->contains_init_steps);
- prunedata = prunestate->partprunedata[i];
- pprune = &prunedata->partrelprunedata[0];
- /* Perform pruning without using PARAM_EXEC Params */
- find_matching_subplans_recurse(prunedata, pprune, true, &result);
+ if (parentrelids)
+ *parentrelids = NULL;
You got two blank lines after Assert.
--
+ /* Set up EState if not in the executor proper. */
+ if (estate == NULL)
+ {
+ estate = CreateExecutorState();
+ estate->es_param_list_info = params;
+ free_estate = true;
+ }
... [Skip]
+ if (free_estate)
+ {
+ FreeExecutorState(estate);
+ estate = NULL;
+ }
I think this work should be left to the caller.
--
/*
* Stuff that follows matches exactly what ExecCreatePartitionPruneState()
* does, except we don't need a PartitionPruneState here, so don't call
* that function.
*
* XXX some refactoring might be good.
*/
+1; while doing that, it would be nice if foreach_current_index() were used
instead of the i & j sequences in the respective foreach() blocks, IMO.
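For illustration, a minimal sketch of the suggested style (variable
names are borrowed from the surrounding patch context and used
hypothetically here):

/* foreach_current_index() replaces a manually maintained counter. */
foreach(lc, partrelpruneinfos)
{
    PartitionedRelPruneInfo *pinfo = lfirst_node(PartitionedRelPruneInfo, lc);
    PartitionedRelPruningData *pprune =
        &prunedata->partrelprunedata[foreach_current_index(lc)];

    /* ... set up pprune from pinfo; no j++ bookkeeping needed ... */
}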
--
+ while ((i = bms_next_member(validsubplans, i)) >= 0)
+ {
+ Plan *subplan = list_nth(subplans, i);
+
+ context->relations =
+ bms_add_members(context->relations,
+ get_plan_scanrelids(subplan));
+ }
I think GetLockableRelations_worker() can be used instead of
get_plan_scanrelids(); if so, there is no need to add the
get_plan_scanrelids() function.
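In other words, something like this (a sketch of the suggestion, not
code taken from the patch):

/*
 * Recurse with the walker itself; the scan nodes under each surviving
 * subplan then add their own RT indexes to context->relations.
 */
while ((i = bms_next_member(validsubplans, i)) >= 0)
{
    Plan *subplan = list_nth(subplans, i);

    (void) GetLockableRelations_worker(subplan, context);
}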
--
/* Nodes containing prunable subnodes. */
+ case T_MergeAppend:
+ {
+ PlannedStmt *plannedstmt = context->plannedstmt;
+ List *rtable = plannedstmt->rtable;
+ ParamListInfo params = context->params;
+ PartitionPruneInfo *pruneinfo;
+ Bitmapset *validsubplans;
+ Bitmapset *parentrelids;
...
if (pruneinfo && pruneinfo->contains_init_steps)
{
int i;
...
return false;
}
}
break;
Most of the declarations need to be moved inside the if-block.
Also, initially, I was a bit concerned about this code block inside
GetLockableRelations_worker(): what if (pruneinfo &&
pruneinfo->contains_init_steps) evaluates to false? After debugging, I
realized that plan_tree_walker() will do the needful -- a bit of a
comment would have helped.
--
+ case T_CustomScan:
+ foreach(lc, ((CustomScan *) plan)->custom_plans)
+ {
+ if (walker((Plan *) lfirst(lc), context))
+ return true;
+ }
+ break;
Why not use plan_walk_members() like for the other nodes?
Regards,
Amul
On Fri, Dec 24, 2021 at 10:36 PM Amit Langote <amitlangote09@gmail.com> wrote:
However, using an idea that Robert suggested to me off-list a little
while back, it seems possible to determine the set of partitions that
we can safely skip locking. The idea is to look at the "initial" or
"pre-execution" pruning instructions contained in a given Append or
MergeAppend node when AcquireExecutorLocks() is collecting the
relations to lock and consider relations from only those sub-nodes
that survive performing those instructions. I've attempted
implementing that idea in the attached patch.
Hmm. The first question that occurs to me is whether this is fully safe.
Currently, AcquireExecutorLocks calls LockRelationOid for every
relation involved in the query. That means we will probably lock at
least one relation on which we previously had no lock and thus
AcceptInvalidationMessages(). That will end up marking the query as no
longer valid and CheckCachedPlan() will realize this and tell the
caller to replan. In the corner case where we already hold all the
required locks, we will not accept invalidation messages at this
point, but must have done so after acquiring the last of the locks
required, and if that didn't mark the plan invalid, it can't be
invalid now either. Either way, everything is fine.
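(For reference, the interlock described here lives in lmgr.c's
LockRelationOid(); roughly, with declarations and details elided:)

res = LockAcquireExtended(&tag, lockmode, false, false, true, &locallock);

/*
 * Acquiring a lock we didn't already hold forces us to process any
 * pending invalidation messages; that is what lets CheckCachedPlan()
 * notice that the plan has been invalidated.
 */
if (res != LOCKACQUIRE_ALREADY_CLEAR)
{
    AcceptInvalidationMessages();
    MarkLockClear(locallock);
}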
With the proposed patch, we might never lock some of the relations
involved in the query. Therefore, if one of those relations has been
modified in some way that would invalidate the plan, we will
potentially fail to discover this, and will use the plan anyway. For
instance, suppose there's one particular partition that has an extra
index and the plan involves an Index Scan using that index. Now
suppose that the scan of the partition in question is pruned, but
meanwhile, the index has been dropped. Now we're running a plan that
scans a nonexistent index. Admittedly, we're not running that part of
the plan. But is that enough for this to be safe? There are things
(like EXPLAIN or auto_explain) that we might try to do even on a part
of the plan tree that we don't try to run. Those things might break,
because for example we won't be able to look up the name of an index
in the catalogs for EXPLAIN output if the index is gone.
This is just a relatively simple example and I think there are
probably a bunch of others. There are a lot of kinds of DDL that could
be performed on a partition that gets pruned away: DROP INDEX is just
one example. The point is that to my knowledge we have no existing
case where we try to use a plan that might be only partly valid, so if
we introduce one, there's some risk there. I thought for a while, too,
about whether changes to some object in a part of the plan that we're
not executing could break things for the rest of the plan even if we
never do anything with the plan but execute it. I can't quite see any
actual hazard. For example, I thought about whether we might try to
get the tuple descriptor for the pruned-away object and get a
different tuple descriptor than we were expecting. I think we can't,
because (1) the pruned object has to be a partition, and tuple
descriptors have to match throughout the partitioning hierarchy,
except for column ordering, which currently can't be changed
after-the-fact and (2) IIRC, the tuple descriptor is stored in the
plan and not reconstructed at runtime and (3) if we don't end up
opening the relation because it's pruned, then we certainly can't do
anything with its tuple descriptor. But it might be worth giving more
thought to the question of whether there's any other way we could be
depending on the details of an object that ended up getting pruned.
Note that "initial" pruning steps are now performed twice when
executing generic plans: once in AcquireExecutorLocks() to find
partitions to be locked, and a 2nd time in ExecInit[Merge]Append() to
determine the set of partition sub-nodes to be initialized for
execution, though I wasn't able to come up with a good idea to avoid
this duplication.
I think this is something that will need to be fixed somehow. Apart
from the CPU cost, it's scary to imagine that the set of nodes on
which we acquired locks might be different from the set of nodes that
we initialize. If we do the same computation twice, there must be some
non-zero probability of getting a different answer the second time,
even if the circumstances under which it would actually happen are
remote. Consider, for example, a function that is labeled IMMUTABLE
but is really VOLATILE. Now maybe you can get the system to lock one
set of partitions and then initialize a different set of partitions. I
don't think we want to try to reason about what consequences that
might have and prove that somehow it's going to be OK; I think we want
to nail the door shut very tightly to make sure that it can't.
--
Robert Haas
EDB: http://www.enterprisedb.com
Thanks for taking the time to look at this.
On Wed, Jan 12, 2022 at 1:22 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Dec 24, 2021 at 10:36 PM Amit Langote <amitlangote09@gmail.com> wrote:
However, using an idea that Robert suggested to me off-list a little
while back, it seems possible to determine the set of partitions that
we can safely skip locking. The idea is to look at the "initial" or
"pre-execution" pruning instructions contained in a given Append or
MergeAppend node when AcquireExecutorLocks() is collecting the
relations to lock and consider relations from only those sub-nodes
that survive performing those instructions. I've attempted
implementing that idea in the attached patch.
Hmm. The first question that occurs to me is whether this is fully safe.
Currently, AcquireExecutorLocks calls LockRelationOid for every
relation involved in the query. That means we will probably lock at
least one relation on which we previously had no lock and thus
AcceptInvalidationMessages(). That will end up marking the query as no
longer valid and CheckCachedPlan() will realize this and tell the
caller to replan. In the corner case where we already hold all the
required locks, we will not accept invalidation messages at this
point, but must have done so after acquiring the last of the locks
required, and if that didn't mark the plan invalid, it can't be
invalid now either. Either way, everything is fine.
With the proposed patch, we might never lock some of the relations
involved in the query. Therefore, if one of those relations has been
modified in some way that would invalidate the plan, we will
potentially fail to discover this, and will use the plan anyway. For
instance, suppose there's one particular partition that has an extra
index and the plan involves an Index Scan using that index. Now
suppose that the scan of the partition in question is pruned, but
meanwhile, the index has been dropped. Now we're running a plan that
scans a nonexistent index. Admittedly, we're not running that part of
the plan. But is that enough for this to be safe? There are things
(like EXPLAIN or auto_explain) that we might try to do even on a part
of the plan tree that we don't try to run. Those things might break,
because for example we won't be able to look up the name of an index
in the catalogs for EXPLAIN output if the index is gone.
This is just a relatively simple example and I think there are
probably a bunch of others. There are a lot of kinds of DDL that could
be performed on a partition that gets pruned away: DROP INDEX is just
one example. The point is that to my knowledge we have no existing
case where we try to use a plan that might be only partly valid, so if
we introduce one, there's some risk there. I thought for a while, too,
about whether changes to some object in a part of the plan that we're
not executing could break things for the rest of the plan even if we
never do anything with the plan but execute it. I can't quite see any
actual hazard. For example, I thought about whether we might try to
get the tuple descriptor for the pruned-away object and get a
different tuple descriptor than we were expecting. I think we can't,
because (1) the pruned object has to be a partition, and tuple
descriptors have to match throughout the partitioning hierarchy,
except for column ordering, which currently can't be changed
after-the-fact and (2) IIRC, the tuple descriptor is stored in the
plan and not reconstructed at runtime and (3) if we don't end up
opening the relation because it's pruned, then we certainly can't do
anything with its tuple descriptor. But it might be worth giving more
thought to the question of whether there's any other way we could be
depending on the details of an object that ended up getting pruned.
I have pondered on the possible hazards before writing the patch,
mainly because the concerns about a previously discussed proposal were
along similar lines [1].
IIUC, you're saying the plan tree is subject to inspection by non-core
code before ExecutorStart() has initialized a PlanState tree, which
must have discarded pruned portions of the plan tree. I wouldn't
claim to have scanned *all* of the core code that could possibly
access the invalidated portions of the plan tree, but from what I have
seen, I couldn't find any site that does. An ExecutorStart_hook()
gets to do that, but from what I can see it is expected to call
standard_ExecutorStart() before doing its thing and supposedly only
looks at the PlanState tree, which must be valid. Actually, EXPLAIN
also does ExecutorStart() before starting to look at the plan (the
PlanState tree), so must not run into pruned plan tree nodes. All
that said, it does sound like wishful thinking to say that no problems
can possibly occur.
At first, I had tried to implement this such that the
Append/MergeAppend nodes are edited to record the result of initial
pruning, but it felt wrong to be munging the plan tree in plancache.c.
Or, maybe this won't be a concern if performing ExecutorStart() is
made a part of CheckCachedPlan() somehow, which would then take locks
on the relation as the PlanState tree is built capturing any plan
invalidations, instead of AcquireExecutorLocks(). That does sound like
an ambitious undertaking though.
Note that "initial" pruning steps are now performed twice when
executing generic plans: once in AcquireExecutorLocks() to find
partitions to be locked, and a 2nd time in ExecInit[Merge]Append() to
determine the set of partition sub-nodes to be initialized for
execution, though I wasn't able to come up with a good idea to avoid
this duplication.
I think this is something that will need to be fixed somehow. Apart
from the CPU cost, it's scary to imagine that the set of nodes on
which we acquired locks might be different from the set of nodes that
we initialize. If we do the same computation twice, there must be some
non-zero probability of getting a different answer the second time,
even if the circumstances under which it would actually happen are
remote. Consider, for example, a function that is labeled IMMUTABLE
but is really VOLATILE. Now maybe you can get the system to lock one
set of partitions and then initialize a different set of partitions. I
don't think we want to try to reason about what consequences that
might have and prove that somehow it's going to be OK; I think we want
to nail the door shut very tightly to make sure that it can't.
Yeah, the premise of the patch is that "initial" pruning steps produce
the same result both times. I assumed that would be true because the
pruning steps are not allowed to contain any VOLATILE expressions.
Regarding the possibility that IMMUTABLE labeling of functions may be
incorrect, I haven't considered if the runtime pruning code can cope
or whether it should try to. If such a case does occur in practice,
the bad outcome would be an Assert failure in
ExecGetRangeTableRelation() or using a partition unlocked in the
non-assert builds, the latter of which feels especially bad.
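(The check being referred to is, roughly, this bit of
ExecGetRangeTableRelation() in execUtils.c:)

if (!IsParallelWorker())
{
    /*
     * In a normal query we should already hold the lock; assert-enabled
     * builds verify that, while non-assert builds would silently use
     * the relation unlocked -- the bad outcome described above.
     */
    rel = table_open(rte->relid, NoLock);
    Assert(CheckRelationLockedByMe(rel, rte->rellockmode, false));
}
else
    rel = table_open(rte->relid, rte->rellockmode);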
--
Amit Langote
EDB: http://www.enterprisedb.com
[1]: /messages/by-id/CA+TgmoZN-80143F8OhN8Cn5-uDae5miLYVwMapAuc+7+Z7pyNg@mail.gmail.com
On Wed, Jan 12, 2022 at 9:32 AM Amit Langote <amitlangote09@gmail.com> wrote:
I have pondered on the possible hazards before writing the patch,
mainly because the concerns about a previously discussed proposal were
along similar lines [1].
True. I think that the hazards are narrower with this proposal,
because if you *delay* locking a partition that you eventually need,
then you might end up trying to actually execute a portion of the plan
that's no longer valid. That seems like hopelessly bad news. On the
other hand, with this proposal, you skip locking altogether, but only
for parts of the plan that you don't plan to execute. That's still
kind of scary, but not to nearly the same degree.
IIUC, you're saying the plan tree is subject to inspection by non-core
code before ExecutorStart() has initialized a PlanState tree, which
must have discarded pruned portions of the plan tree. I wouldn't
claim to have scanned *all* of the core code that could possibly
access the invalidated portions of the plan tree, but from what I have
seen, I couldn't find any site that does. An ExecutorStart_hook()
gets to do that, but from what I can see it is expected to call
standard_ExecutorStart() before doing its thing and supposedly only
looks at the PlanState tree, which must be valid. Actually, EXPLAIN
also does ExecutorStart() before starting to look at the plan (the
PlanState tree), so must not run into pruned plan tree nodes. All
that said, it does sound like wishful thinking to say that no problems
can possibly occur.
Yeah. I don't think it's only non-core code we need to worry about
either. What if I just do EXPLAIN ANALYZE on a prepared query that
ends up pruning away some stuff? IIRC, the pruned subplans are not
shown, so we might escape disaster here, but FWIW if I'd committed
that code I would have pushed hard for showing those and saying "(not
executed)" .... so it's not too crazy to imagine a world in which
things work that way.
At first, I had tried to implement this such that the
Append/MergeAppend nodes are edited to record the result of initial
pruning, but it felt wrong to be munging the plan tree in plancache.c.
It is. You can't munge the plan tree: it's required to be strictly
read-only once generated. It can be serialized and deserialized for
transmission to workers, and it can be shared across executions.
Or, maybe this won't be a concern if performing ExecutorStart() is
made a part of CheckCachedPlan() somehow, which would then take locks
on the relation as the PlanState tree is built capturing any plan
invalidations, instead of AcquireExecutorLocks(). That does sound like
an ambitious undertaking though.
On the surface that would seem to involve abstraction violations, but
maybe that could be finessed somehow. The plancache shouldn't know too
much about what the executor is going to do with the plan, but it
could ask the executor to perform a step that has been designed for
use by the plancache. I guess the core problem here is how to pass
around information that is node-specific before we've stood up the
executor state tree. Maybe the executor could have a function that
does the pruning and returns some kind of array of results that can be
used both to decide what to lock and also what to consider as pruned
at the start of execution. (I'm hand-waving about the details because
I don't know.)
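To make that slightly more concrete, one possible shape (every name
below is hypothetical; nothing like this exists in the patch or in
PostgreSQL):

/*
 * Hypothetical sketch only: a plancache-facing executor entry point
 * that performs initial pruning once and returns results usable both
 * for deciding what to lock and, later, by ExecInit[Merge]Append().
 */
typedef struct ExecPrepOutput
{
    Bitmapset  *relationRTIs;    /* relations that must be locked */
    List       *pruning_results; /* per-node sets of surviving subplans */
} ExecPrepOutput;

extern ExecPrepOutput *ExecutorPrep(PlannedStmt *stmt,
                                    ParamListInfo boundParams);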
Yeah, the premise of the patch is that "initial" pruning steps produce
the same result both times. I assumed that would be true because the
pruning steps are not allowed to contain any VOLATILE expressions.
Regarding the possibility that IMMUTABLE labeling of functions may be
incorrect, I haven't considered if the runtime pruning code can cope
or whether it should try to. If such a case does occur in practice,
the bad outcome would be an Assert failure in
ExecGetRangeTableRelation() or using a partition unlocked in the
non-assert builds, the latter of which feels especially bad.
Right. I think it's OK for a query to produce wrong answers under
those kinds of conditions - the user has broken everything and gets to
keep all the pieces - but doing stuff that might violate fundamental
assumptions of the system like "relations can only be accessed when
holding a lock on them" feels quite bad. It's not a stretch to imagine
that failing to follow those invariants could take the whole system
down, which is clearly too severe a consequence for the user's failure
to label things properly.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Jan 6, 2022 at 3:45 PM Amul Sul <sulamul@gmail.com> wrote:
Here are a few comments for the v1 patch:
Thanks Amul. I'm thinking about Robert's latest comments addressing
which may need some rethinking of this whole design, but I decided to
post a v2 taking care of your comments.
+ /* Caller error if we get here without contains_init_steps */
+ Assert(pruneinfo->contains_init_steps);
- prunedata = prunestate->partprunedata[i];
- pprune = &prunedata->partrelprunedata[0];
- /* Perform pruning without using PARAM_EXEC Params */
- find_matching_subplans_recurse(prunedata, pprune, true, &result);
+ if (parentrelids)
+ *parentrelids = NULL;

You got two blank lines after Assert.
Fixed.
--
+ /* Set up EState if not in the executor proper. */
+ if (estate == NULL)
+ {
+ estate = CreateExecutorState();
+ estate->es_param_list_info = params;
+ free_estate = true;
+ }
... [Skip]
+ if (free_estate)
+ {
+ FreeExecutorState(estate);
+ estate = NULL;
+ }

I think this work should be left to the caller.
Done. Also see below...
/*
* Stuff that follows matches exactly what ExecCreatePartitionPruneState()
* does, except we don't need a PartitionPruneState here, so don't call
* that function.
*
* XXX some refactoring might be good.
 */

+1; while doing that, it would be nice if foreach_current_index() were used
instead of the i & j sequences in the respective foreach() blocks, IMO.
Actually, I rewrote this part quite significantly so that most of the
code remains in its existing place. I decided to let
GetLockableRelations_walker() create a PartitionPruneState and pass
that to ExecFindInitialMatchingSubPlans(), which is now left more or
less as is. Instead, ExecCreatePartitionPruneState() is changed to be
callable from outside the executor.
The temporary EState is no longer necessary. ExprContext,
PartitionDirectory, etc. are now managed in the caller,
GetLockableRelations_walker().
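For concreteness, the caller side might now look roughly like this (a
sketch inferred from the v2 function signatures in the attached patch,
not a quote of the plancache.c hunk, which isn't shown in this message):

ExprContext *econtext = CreateStandaloneExprContext();
PartitionDirectory partdir = CreatePartitionDirectory(CurrentMemoryContext,
                                                      false);
PartitionPruneState *prunestate;
Bitmapset  *validsubplans;
Bitmapset  *parentrelids;

econtext->ecxt_param_list_info = context->params;
prunestate = ExecCreatePartitionPruneState(NULL,  /* no PlanState */
                                           pruneinfo,
                                           false, /* skip exec-time steps */
                                           plannedstmt->rtable,
                                           econtext, partdir);
validsubplans = ExecFindInitialMatchingSubPlans(prunestate,
                                                list_length(subplans),
                                                pruneinfo,
                                                &parentrelids);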
--
+ while ((i = bms_next_member(validsubplans, i)) >= 0)
+ {
+ Plan *subplan = list_nth(subplans, i);
+
+ context->relations =
+ bms_add_members(context->relations,
+ get_plan_scanrelids(subplan));
+ }

I think GetLockableRelations_worker() can be used instead of
get_plan_scanrelids(); if so, there is no need to add the
get_plan_scanrelids() function.
You're right, done.
--
/* Nodes containing prunable subnodes. */
+ case T_MergeAppend:
+ {
+ PlannedStmt *plannedstmt = context->plannedstmt;
+ List *rtable = plannedstmt->rtable;
+ ParamListInfo params = context->params;
+ PartitionPruneInfo *pruneinfo;
+ Bitmapset *validsubplans;
+ Bitmapset *parentrelids;
...
if (pruneinfo && pruneinfo->contains_init_steps)
{
int i;
...
return false;
}
}
break;

Most of the declarations need to be moved inside the if-block.
Done.
Also, initially, I was a bit concerned about this code block inside
GetLockableRelations_worker(): what if (pruneinfo &&
pruneinfo->contains_init_steps) evaluates to false? After debugging, I
realized that plan_tree_walker() will do the needful -- a bit of a
comment would have helped.
You're right. Added a dummy else {} block with just the comment saying so.
+ case T_CustomScan:
+ foreach(lc, ((CustomScan *) plan)->custom_plans)
+ {
+ if (walker((Plan *) lfirst(lc), context))
+ return true;
+ }
+ break;

Why not use plan_walk_members() like for the other nodes?
Makes sense, done.
Again, most/all of this patch might need to be thrown away, but here
it is anyway.
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v2-0001-Teach-AcquireExecutorLocks-to-acquire-fewer-locks.patch
From 5a900d2415bc17ca10607140c6faf502bf7b803c Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v2] Teach AcquireExecutorLocks() to acquire fewer locks in
some cases
Currently, AcquireExecutorLocks() loops over the range table of a
given PlannedStmt and locks all relations found therein, even those
that won't actually be scanned during execution due to being
eliminated by "initial" pruning that is applied during the
initialization of their owning Append or MergeAppend node. This makes
AcquireExecutorLocks() itself do the "initial" pruning on nodes that
support it and lock only those relations that are contained in the
subnodes that survive the pruning.
To that end, AcquireExecutorLocks() now loops over a bitmapset of
RT indexes, those of the RTEs of "lockable" relations, instead of
the whole range table to find such entries. When pruning is possible,
the bitmapset is constructed by walking the plan tree to locate
nodes that allow "initial" (or "pre-execution") pruning and
disregarding relations from subnodes that don't survive the pruning
instructions.
PlannedStmt gets a bitmapset field to store the RT indexes of
lockable relations that is populated when constructing the flat range
table in setrefs.c. It is used as is in the absence of any prunable
nodes.
PlannedStmt also gets a new field that indicates whether any of the
nodes of the plan tree contain "initial" (or "pre-execution") pruning
steps, which saves the trouble of walking the plan tree only to find
whether that's the case.
ExecFindInitialMatchingSubPlans() is refactored to allow being
called outside a full-fledged executor context.
---
src/backend/executor/execParallel.c | 2 +
src/backend/executor/execPartition.c | 159 ++++++++++++-----
src/backend/executor/nodeAppend.c | 14 +-
src/backend/executor/nodeMergeAppend.c | 15 +-
src/backend/nodes/copyfuncs.c | 3 +
src/backend/nodes/nodeFuncs.c | 118 +++++++++++++
src/backend/nodes/outfuncs.c | 4 +
src/backend/nodes/readfuncs.c | 3 +
src/backend/optimizer/plan/planner.c | 2 +
src/backend/optimizer/plan/setrefs.c | 10 ++
src/backend/partitioning/partprune.c | 46 +++--
src/backend/utils/cache/plancache.c | 229 +++++++++++++++++++++++--
src/include/executor/execPartition.h | 9 +-
src/include/nodes/nodeFuncs.h | 3 +
src/include/nodes/pathnodes.h | 6 +
src/include/nodes/plannodes.h | 11 ++
src/include/partitioning/partprune.h | 2 +
17 files changed, 563 insertions(+), 73 deletions(-)
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 5dd8ab7db2..a2979d7602 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -182,8 +182,10 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->usesPreExecPruning = false;
pstmt->planTree = plan;
pstmt->rtable = estate->es_range_table;
+ pstmt->relationRTIs = NULL;
pstmt->resultRelations = NIL;
pstmt->appendRelations = NIL;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 90ed1485d1..1a0a5814e4 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -24,6 +24,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -186,7 +187,8 @@ static void ExecInitPruningContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
PartitionKey partkey,
- PlanState *planstate);
+ PlanState *planstate,
+ ExprContext *econtext);
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
@@ -1514,7 +1516,9 @@ adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri)
* Build the data structure required for calling
* ExecFindInitialMatchingSubPlans and ExecFindMatchingSubPlans.
*
- * 'planstate' is the parent plan node's execution state.
+ * 'planstate', if not NULL, is the parent plan node's execution state. It
+ * can be NULL if being called from outside of the executor, in which case
+ * 'rtable', 'econtext', and 'partdir' must have been provided.
*
* 'partitionpruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
@@ -1529,18 +1533,19 @@ adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri)
*/
PartitionPruneState *
ExecCreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *partitionpruneinfo)
+ PartitionPruneInfo *partitionpruneinfo,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir)
{
- EState *estate = planstate->state;
+ EState *estate = planstate ? planstate->state : NULL;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- /* For data reading, executor always omits detached partitions */
- if (estate->es_partition_directory == NULL)
- estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
+ Assert(partdir != NULL && econtext != NULL &&
+ (estate != NULL || rtable != NIL));
n_part_hierarchies = list_length(partitionpruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
@@ -1591,19 +1596,34 @@ ExecCreatePartitionPruneState(PlanState *planstate,
PartitionedRelPruneInfo *pinfo = lfirst_node(PartitionedRelPruneInfo, lc2);
PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
Relation partrel;
+ bool close_partrel = false;
PartitionDesc partdesc;
PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * We can rely on the copy of the partitioned table's partition
+ * key in its relcache entry, because it can't change (or
+ * get destroyed) as long as the relation is locked. Partition
+ * descriptor is taken from the PartitionDirectory associated with
+ * the table that is held open long enough for the descriptor to
+ * remain valid while it's used to perform the pruning steps.
*/
- partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+ if (estate == NULL)
+ {
+ RangeTblEntry *rte = rt_fetch(pinfo->rtindex, rtable);
+
+ partrel = table_open(rte->relid, rte->rellockmode);
+ close_partrel = true;
+ }
+ else
+ partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+
partkey = RelationGetPartitionKey(partrel);
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partdesc = PartitionDirectoryLookup(partdir, partrel);
+
+ /* Safe to close partrel, if we opened it above; the lock is kept. */
+ if (close_partrel)
+ table_close(partrel, NoLock);
/*
* Initialize the subplan_map and subpart_map.
@@ -1709,26 +1729,31 @@ ExecCreatePartitionPruneState(PlanState *planstate,
{
ExecInitPruningContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
- partdesc, partkey, planstate);
+ partdesc, partkey, planstate,
+ econtext);
/* Record whether initial pruning is needed at any level */
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps)
+ if (consider_exec_steps)
{
- ExecInitPruningContext(&pprune->exec_context,
- pinfo->exec_pruning_steps,
- partdesc, partkey, planstate);
- /* Record whether exec pruning is needed at any level */
- prunestate->do_exec_prune = true;
- }
+ if (pinfo->exec_pruning_steps)
+ {
+ ExecInitPruningContext(&pprune->exec_context,
+ pinfo->exec_pruning_steps,
+ partdesc, partkey, planstate,
+ econtext);
+ /* Record whether exec pruning is needed at any level */
+ prunestate->do_exec_prune = true;
+ }
- /*
- * Accumulate the IDs of all PARAM_EXEC Params affecting the
- * partitioning decisions at this plan node.
- */
- prunestate->execparamids = bms_add_members(prunestate->execparamids,
- pinfo->execparamids);
+ /*
+ * Accumulate the IDs of all PARAM_EXEC Params affecting the
+ * partitioning decisions at this plan node.
+ */
+ prunestate->execparamids = bms_add_members(prunestate->execparamids,
+ pinfo->execparamids);
+ }
j++;
}
@@ -1740,13 +1765,18 @@ ExecCreatePartitionPruneState(PlanState *planstate,
/*
* Initialize a PartitionPruneContext for the given list of pruning steps.
+ *
+ * At least one of 'planstate' or 'econtext' must be passed to be able to
+ * successfully evaluate any non-Const expressions contained in the
+ * steps.
*/
static void
ExecInitPruningContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
PartitionKey partkey,
- PlanState *planstate)
+ PlanState *planstate,
+ ExprContext *econtext)
{
int n_steps;
int partnatts;
@@ -1767,6 +1797,7 @@ ExecInitPruningContext(PartitionPruneContext *context,
context->ppccontext = CurrentMemoryContext;
context->planstate = planstate;
+ context->exprcontext = econtext;
/* Initialize expression state for each expression we need */
context->exprstates = (ExprState **)
@@ -1795,8 +1826,13 @@ ExecInitPruningContext(PartitionPruneContext *context,
step->step.step_id,
keyno);
- context->exprstates[stateidx] =
- ExecInitExpr(expr, context->planstate);
+ if (planstate == NULL)
+ context->exprstates[stateidx] =
+ ExecInitExprWithParams(expr,
+ econtext->ecxt_param_list_info);
+ else
+ context->exprstates[stateidx] =
+ ExecInitExpr(expr, context->planstate);
}
keyno++;
}
@@ -1818,9 +1854,15 @@ ExecInitPruningContext(PartitionPruneContext *context,
* is required.
*
* 'nsubplans' must be passed as the total number of unpruned subplans.
+ *
+ * The RT indexes of unpruned parents are returned in *parentrelids if asked
+ * for by the caller, in which case 'pruneinfo' must also be passed because
+ * that is where the RT indexes are to be found.
*/
Bitmapset *
-ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
+ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **parentrelids)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -1830,11 +1872,14 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
Assert(prunestate->do_initial_prune);
/*
- * Switch to a temp context to avoid leaking memory in the executor's
- * query-lifespan memory context.
+ * Switch to a temp context to avoid leaking memory in the longer-term
+ * memory context.
*/
oldcontext = MemoryContextSwitchTo(prunestate->prune_context);
+ if (parentrelids)
+ *parentrelids = NULL;
+
/*
* For each hierarchy, do the pruning tests, and add nondeletable
* subplans' indexes to "result".
@@ -1845,14 +1890,42 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
PartitionedRelPruningData *pprune;
prunedata = prunestate->partprunedata[i];
+
+ /*
+ * We pass the 1st item belonging to the root table of the hierarchy
+ * and find_matching_subplans_recurse() takes care of recursing to
+ * other (lower-level) parents as needed.
+ */
pprune = &prunedata->partrelprunedata[0];
/* Perform pruning without using PARAM_EXEC Params */
find_matching_subplans_recurse(prunedata, pprune, true, &result);
- /* Expression eval may have used space in node's ps_ExprContext too */
+ /*
+ * Collect the RT indexes of surviving parents if the caller asked
+ * to see them.
+ */
+ if (parentrelids)
+ {
+ int j;
+ List *partrelpruneinfos = list_nth_node(List,
+ pruneinfo->prune_infos,
+ i);
+
+ for (j = 0; j < prunedata->num_partrelprunedata; j++)
+ {
+ PartitionedRelPruneInfo *pinfo = list_nth_node(PartitionedRelPruneInfo,
+ partrelpruneinfos, j);
+
+ pprune = &prunedata->partrelprunedata[j];
+ if (!bms_is_empty(pprune->present_parts))
+ *parentrelids = bms_add_member(*parentrelids, pinfo->rtindex);
+ }
+ }
+
+ /* Expression eval may have used space in ExprContext too */
if (pprune->initial_pruning_steps)
- ResetExprContext(pprune->initial_context.planstate->ps_ExprContext);
+ ResetExprContext(pprune->initial_context.exprcontext);
}
/* Add in any subplans that partition pruning didn't account for */
@@ -1862,9 +1935,12 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (parentrelids)
+ *parentrelids = bms_copy(*parentrelids);
MemoryContextReset(prunestate->prune_context);
+
/*
* If exec-time pruning is required and we pruned subplans above, then we
* must re-sequence the subplan indexes so that ExecFindMatchingSubPlans
@@ -2018,11 +2094,16 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate)
prunedata = prunestate->partprunedata[i];
pprune = &prunedata->partrelprunedata[0];
+ /*
+ * We pass the 1st item belonging to the root table of the hierarchy
+ * and find_matching_subplans_recurse() takes care of recursing to
+ * other (lower-level) parents as needed.
+ */
find_matching_subplans_recurse(prunedata, pprune, false, &result);
- /* Expression eval may have used space in node's ps_ExprContext too */
+ /* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
- ResetExprContext(pprune->exec_context.planstate->ps_ExprContext);
+ ResetExprContext(pprune->exec_context.exprcontext);
}
/* Add in any subplans that partition pruning didn't account for */
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 7937f1c88f..51aac946fa 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -62,6 +62,7 @@
#include "executor/execPartition.h"
#include "executor/nodeAppend.h"
#include "miscadmin.h"
+#include "partitioning/partdesc.h"
#include "pgstat.h"
#include "storage/latch.h"
@@ -141,9 +142,16 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
/* We may need an expression context to evaluate partition exprs */
ExecAssignExprContext(estate, &appendstate->ps);
+ /* For data reading, executor always omits detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
/* Create the working data structure for pruning. */
prunestate = ExecCreatePartitionPruneState(&appendstate->ps,
- node->part_prune_info);
+ node->part_prune_info, true,
+ NIL, appendstate->ps.ps_ExprContext,
+ estate->es_partition_directory);
appendstate->as_prune_state = prunestate;
/* Perform an initial partition prune, if required. */
@@ -151,7 +159,9 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
{
/* Determine which subplans survive initial pruning */
validsubplans = ExecFindInitialMatchingSubPlans(prunestate,
- list_length(node->appendplans));
+ list_length(node->appendplans),
+ node->part_prune_info,
+ NULL);
nplans = bms_num_members(validsubplans);
}
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 418f89dea8..7d1185ec9d 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -43,6 +43,7 @@
#include "executor/nodeMergeAppend.h"
#include "lib/binaryheap.h"
#include "miscadmin.h"
+#include "partitioning/partdesc.h"
/*
* We have one slot for each item in the heap array. We use SlotNumber
@@ -89,8 +90,16 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
/* We may need an expression context to evaluate partition exprs */
ExecAssignExprContext(estate, &mergestate->ps);
+ /* For data reading, executor always omits detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
+ /* Create the working data structure for pruning. */
prunestate = ExecCreatePartitionPruneState(&mergestate->ps,
- node->part_prune_info);
+ node->part_prune_info, true,
+ NIL, mergestate->ps.ps_ExprContext,
+ estate->es_partition_directory);
mergestate->ms_prune_state = prunestate;
/* Perform an initial partition prune, if required. */
@@ -98,7 +107,9 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
{
/* Determine which subplans survive initial pruning */
validsubplans = ExecFindInitialMatchingSubPlans(prunestate,
- list_length(node->mergeplans));
+ list_length(node->mergeplans),
+ node->part_prune_info,
+ NULL);
nplans = bms_num_members(validsubplans);
}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 456d563f34..7eb474bc31 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -94,9 +94,11 @@ _copyPlannedStmt(const PlannedStmt *from)
COPY_SCALAR_FIELD(transientPlan);
COPY_SCALAR_FIELD(dependsOnRole);
COPY_SCALAR_FIELD(parallelModeNeeded);
+ COPY_SCALAR_FIELD(usesPreExecPruning);
COPY_SCALAR_FIELD(jitFlags);
COPY_NODE_FIELD(planTree);
COPY_NODE_FIELD(rtable);
+ COPY_BITMAPSET_FIELD(relationRTIs);
COPY_NODE_FIELD(resultRelations);
COPY_NODE_FIELD(appendRelations);
COPY_NODE_FIELD(subplans);
@@ -1278,6 +1280,7 @@ _copyPartitionPruneInfo(const PartitionPruneInfo *from)
PartitionPruneInfo *newnode = makeNode(PartitionPruneInfo);
COPY_NODE_FIELD(prune_infos);
+ COPY_SCALAR_FIELD(contains_init_steps);
COPY_BITMAPSET_FIELD(other_subplans);
return newnode;
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index acc17da717..37313fb31e 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -31,6 +31,10 @@ static bool planstate_walk_subplans(List *plans, bool (*walker) (),
void *context);
static bool planstate_walk_members(PlanState **planstates, int nplans,
bool (*walker) (), void *context);
+static bool plan_walk_subplans(List *plans,
+ bool (*walker) (),
+ void *context);
+static bool plan_walk_members(List *plans, bool (*walker) (), void *context);
/*
@@ -4147,3 +4151,117 @@ planstate_walk_members(PlanState **planstates, int nplans,
return false;
}
+
+/*
+ * plan_tree_walker --- walk plan trees
+ *
+ * The walker has already visited the current node, and so we need only
+ * recurse into any sub-nodes it has.
+ */
+bool
+plan_tree_walker(Plan *plan,
+ bool (*walker) (),
+ void *context)
+{
+ /* Guard against stack overflow due to overly complex plan trees */
+ check_stack_depth();
+
+ /* initPlan-s */
+ if (plan_walk_subplans(plan->initPlan, walker, context))
+ return true;
+
+ /* lefttree */
+ if (outerPlan(plan))
+ {
+ if (walker(outerPlan(plan), context))
+ return true;
+ }
+
+ /* righttree */
+ if (innerPlan(plan))
+ {
+ if (walker(innerPlan(plan), context))
+ return true;
+ }
+
+ /* special child plans */
+ switch (nodeTag(plan))
+ {
+ case T_Append:
+ if (plan_walk_members(((Append *) plan)->appendplans,
+ walker, context))
+ return true;
+ break;
+ case T_MergeAppend:
+ if (plan_walk_members(((MergeAppend *) plan)->mergeplans,
+ walker, context))
+ return true;
+ break;
+ case T_BitmapAnd:
+ if (plan_walk_members(((BitmapAnd *) plan)->bitmapplans,
+ walker, context))
+ return true;
+ break;
+ case T_BitmapOr:
+ if (plan_walk_members(((BitmapOr *) plan)->bitmapplans,
+ walker, context))
+ return true;
+ break;
+ case T_CustomScan:
+ if (plan_walk_members(((CustomScan *) plan)->custom_plans,
+ walker, context))
+ return true;
+ break;
+ case T_SubqueryScan:
+ if (walker(((SubqueryScan *) plan)->subplan, context))
+ return true;
+ break;
+ default:
+ break;
+ }
+
+ return false;
+}
+
+/*
+ * Walk a list of SubPlans (or initPlans, which also use SubPlan nodes).
+ */
+static bool
+plan_walk_subplans(List *plans,
+ bool (*walker) (),
+ void *context)
+{
+ ListCell *lc;
+ PlannedStmt *plannedstmt = (PlannedStmt *) context;
+
+ foreach(lc, plans)
+ {
+ SubPlan *sp = lfirst_node(SubPlan, lc);
+ Plan *p = list_nth(plannedstmt->subplans, sp->plan_id - 1);
+
+ if (walker(p, context))
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * Walk the constituent plans of a ModifyTable, Append, MergeAppend,
+ * BitmapAnd, or BitmapOr node.
+ */
+static bool
+plan_walk_members(List *plans, bool (*walker) (), void *context)
+{
+ ListCell *lc;
+
+ foreach(lc, plans)
+ {
+ if (walker(lfirst(lc), context))
+ return true;
+ }
+
+ return false;
+}
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index c0bf27d28b..e23d6b2c15 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -312,9 +312,11 @@ _outPlannedStmt(StringInfo str, const PlannedStmt *node)
WRITE_BOOL_FIELD(transientPlan);
WRITE_BOOL_FIELD(dependsOnRole);
WRITE_BOOL_FIELD(parallelModeNeeded);
+ WRITE_BOOL_FIELD(usesPreExecPruning);
WRITE_INT_FIELD(jitFlags);
WRITE_NODE_FIELD(planTree);
WRITE_NODE_FIELD(rtable);
+ WRITE_BITMAPSET_FIELD(relationRTIs);
WRITE_NODE_FIELD(resultRelations);
WRITE_NODE_FIELD(appendRelations);
WRITE_NODE_FIELD(subplans);
@@ -1004,6 +1006,7 @@ _outPartitionPruneInfo(StringInfo str, const PartitionPruneInfo *node)
WRITE_NODE_TYPE("PARTITIONPRUNEINFO");
WRITE_NODE_FIELD(prune_infos);
+ WRITE_BOOL_FIELD(contains_init_steps);
WRITE_BITMAPSET_FIELD(other_subplans);
}
@@ -2274,6 +2277,7 @@ _outPlannerGlobal(StringInfo str, const PlannerGlobal *node)
WRITE_NODE_FIELD(subplans);
WRITE_BITMAPSET_FIELD(rewindPlanIDs);
WRITE_NODE_FIELD(finalrtable);
+ WRITE_BITMAPSET_FIELD(relationRTIs);
WRITE_NODE_FIELD(finalrowmarks);
WRITE_NODE_FIELD(resultRelations);
WRITE_NODE_FIELD(appendRelations);
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 3f68f7c18d..8b3caeef03 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1585,9 +1585,11 @@ _readPlannedStmt(void)
READ_BOOL_FIELD(transientPlan);
READ_BOOL_FIELD(dependsOnRole);
READ_BOOL_FIELD(parallelModeNeeded);
+ READ_BOOL_FIELD(usesPreExecPruning);
READ_INT_FIELD(jitFlags);
READ_NODE_FIELD(planTree);
READ_NODE_FIELD(rtable);
+ READ_BITMAPSET_FIELD(relationRTIs);
READ_NODE_FIELD(resultRelations);
READ_NODE_FIELD(appendRelations);
READ_NODE_FIELD(subplans);
@@ -2534,6 +2536,7 @@ _readPartitionPruneInfo(void)
READ_LOCALS(PartitionPruneInfo);
READ_NODE_FIELD(prune_infos);
+ READ_BOOL_FIELD(contains_init_steps);
READ_BITMAPSET_FIELD(other_subplans);
READ_DONE();
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index bd09f85aea..3f35f8f892 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -517,8 +517,10 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->transientPlan = glob->transientPlan;
result->dependsOnRole = glob->dependsOnRole;
result->parallelModeNeeded = glob->parallelModeNeeded;
+ result->usesPreExecPruning = glob->usesPreExecPruning;
result->planTree = top_plan;
result->rtable = glob->finalrtable;
+ result->relationRTIs = glob->relationRTIs;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index e44ae971b4..d34a7eb621 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -483,6 +483,7 @@ static void
add_rte_to_flat_rtable(PlannerGlobal *glob, RangeTblEntry *rte)
{
RangeTblEntry *newrte;
+ Index rti = list_length(glob->finalrtable) + 1;
/* flat copy to duplicate all the scalar fields */
newrte = (RangeTblEntry *) palloc(sizeof(RangeTblEntry));
@@ -517,7 +518,10 @@ add_rte_to_flat_rtable(PlannerGlobal *glob, RangeTblEntry *rte)
* but it would probably cost more cycles than it would save.
*/
if (newrte->rtekind == RTE_RELATION)
+ {
+ glob->relationRTIs = bms_add_member(glob->relationRTIs, rti);
glob->relationOids = lappend_oid(glob->relationOids, newrte->relid);
+ }
}
/*
@@ -1540,6 +1544,9 @@ set_append_references(PlannerInfo *root,
pinfo->rtindex += rtoffset;
}
}
+
+ if (aplan->part_prune_info->contains_init_steps)
+ root->glob->usesPreExecPruning = true;
}
/* We don't need to recurse to lefttree or righttree ... */
@@ -1604,6 +1611,9 @@ set_mergeappend_references(PlannerInfo *root,
pinfo->rtindex += rtoffset;
}
}
+
+ if (mplan->part_prune_info->contains_init_steps)
+ root->glob->usesPreExecPruning = true;
}
/* We don't need to recurse to lefttree or righttree ... */
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 1bc00826c1..3e3c6c78df 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,8 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *contains_init_steps);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -230,6 +231,7 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool contains_init_steps = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -309,12 +311,14 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_contains_init_steps;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_contains_init_steps);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -323,6 +327,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+ if (!contains_init_steps)
+ contains_init_steps = partrel_contains_init_steps;
}
pfree(relid_subplan_map);
@@ -337,6 +343,7 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->contains_init_steps = contains_init_steps;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -435,13 +442,17 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *contains_init_steps is set to indicate whether the returned
+ * PartitionedRelPruneInfos contain pruning steps that can be performed
+ * before execution begins.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *contains_init_steps)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -452,6 +463,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will be determined below. */
+ *contains_init_steps = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -539,6 +553,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+ * In the first pass, we note whether the 2nd pass is necessary by
+ * checking for the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -613,6 +630,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ if (!*contains_init_steps)
+ *contains_init_steps = (initial_pruning_steps != NIL);
+
pinfolist = lappend(pinfolist, pinfo);
}
@@ -798,6 +818,7 @@ prune_append_rel_partitions(RelOptInfo *rel)
/* These are not valid when being called from the planner */
context.planstate = NULL;
+ context.exprcontext = NULL;
context.exprstates = NULL;
/* Actual pruning happens here. */
@@ -808,8 +829,8 @@ prune_append_rel_partitions(RelOptInfo *rel)
* get_matching_partitions
* Determine partitions that survive partition pruning
*
- * Note: context->planstate must be set to a valid PlanState when the
- * pruning_steps were generated with a target other than PARTTARGET_PLANNER.
+ * Note: context->exprcontext must be valid when the pruning_steps were
+ * generated with a target other than PARTTARGET_PLANNER.
*
* Returns a Bitmapset of the RelOptInfo->part_rels indexes of the surviving
* partitions.
@@ -3654,7 +3675,7 @@ match_boolean_partition_clause(Oid partopfamily, Expr *clause, Expr *partkey,
* exprstate array.
*
* Note that the evaluated result may be in the per-tuple memory context of
- * context->planstate->ps_ExprContext, and we may have leaked other memory
+ * context->exprcontext, and we may have leaked other memory
* there too. This memory must be recovered by resetting that ExprContext
* after we're done with the pruning operation (see execPartition.c).
*/
@@ -3677,13 +3698,18 @@ partkey_datum_from_expr(PartitionPruneContext *context,
ExprContext *ectx;
/*
- * We should never see a non-Const in a step unless we're running in
- * the executor.
+ * We should never see a non-Const in a step unless the caller has
+ * passed a valid ExprContext.
+ *
+ * When context->planstate is valid, context->exprcontext is the same
+ * as context->planstate->ps_ExprContext.
*/
- Assert(context->planstate != NULL);
+ Assert(context->planstate != NULL || context->exprcontext != NULL);
+ Assert(context->planstate == NULL ||
+ (context->exprcontext == context->planstate->ps_ExprContext));
exprstate = context->exprstates[stateidx];
- ectx = context->planstate->ps_ExprContext;
+ ectx = context->exprcontext;
*value = ExecEvalExprSwitchContext(exprstate, ectx, isnull);
}
}
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 4a9055e6bb..6c4c6f0d95 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -58,12 +58,14 @@
#include "access/transam.h"
#include "catalog/namespace.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/optimizer.h"
#include "parser/analyze.h"
#include "parser/parsetree.h"
+#include "partitioning/partdesc.h"
#include "storage/lmgr.h"
#include "tcop/pquery.h"
#include "tcop/utility.h"
@@ -99,14 +101,25 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocks(List *stmt_list, bool acquire,
+ ParamListInfo boundParams);
+struct GetLockableRelations_context
+{
+ PlannedStmt *plannedstmt;
+ Bitmapset *relations;
+ ParamListInfo params;
+};
+static Bitmapset *GetLockableRelations(PlannedStmt *plannedstmt,
+ ParamListInfo boundParams);
+static bool GetLockableRelations_worker(Plan *plan,
+ struct GetLockableRelations_context *context);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -792,7 +805,7 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* (We must do this for the "true" result to be race-condition-free.)
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams)
{
CachedPlan *plan = plansource->gplan;
@@ -826,7 +839,7 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ AcquireExecutorLocks(plan->stmt_list, true, boundParams);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -848,7 +861,7 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ AcquireExecutorLocks(plan->stmt_list, false, boundParams);
}
/*
@@ -1160,7 +1173,7 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (CheckCachedPlan(plansource, boundParams))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1366,7 +1379,6 @@ CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
foreach(lc, plan->stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc);
- ListCell *lc2;
if (plannedstmt->commandType == CMD_UTILITY)
return false;
@@ -1375,13 +1387,8 @@ CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
* We have to grovel through the rtable because it's likely to contain
* an RTE_RESULT relation, rather than being totally empty.
*/
- foreach(lc2, plannedstmt->rtable)
- {
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
-
- if (rte->rtekind == RTE_RELATION)
- return false;
- }
+ if (!bms_is_empty(plannedstmt->relationRTIs))
+ return false;
}
/*
@@ -1740,14 +1747,15 @@ QueryListGetPrimaryStmt(List *stmts)
* or release them if acquire is false.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocks(List *stmt_list, bool acquire, ParamListInfo boundParams)
{
ListCell *lc1;
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ Bitmapset *relations;
+ int rti;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1765,9 +1773,22 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ /*
+ * Fetch the RT indexes of only the relations that will actually be
+ * scanned when the plan is executed. This skips over scan nodes
+ * appearing as child subnodes of any Append/MergeAppend nodes present
+ * in the plan tree. It does so by calling
+ * ExecFindInitialMatchingSubPlans() to run any pruning steps
+ * contained in those nodes that can safely be run at this point, using
+ * 'boundParams' to evaluate any EXTERN parameters contained in the
+ * steps.
+ */
+ relations = GetLockableRelations(plannedstmt, boundParams);
+
+ rti = -1;
+ while ((rti = bms_next_member(relations, rti)) >= 0)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
if (rte->rtekind != RTE_RELATION)
continue;
@@ -1786,6 +1807,178 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
}
}
+/*
+ * GetLockableRelations
+ * Returns set of RT indexes of relations that must be locked by
+ * AcquireExecutorLocks()
+ */
+static Bitmapset *
+GetLockableRelations(PlannedStmt *plannedstmt, ParamListInfo boundParams)
+{
+ ListCell *lc;
+ struct GetLockableRelations_context context;
+
+ /* None of the relation scanning nodes are prunable here. */
+ if (!plannedstmt->usesPreExecPruning)
+ return plannedstmt->relationRTIs;
+
+ /*
+ * Look for prunable nodes in the main plan tree, followed by those in
+ * subplans.
+ */
+ context.plannedstmt = plannedstmt;
+ context.params = boundParams;
+ context.relations = NULL;
+
+ (void) GetLockableRelations_worker(plannedstmt->planTree, &context);
+
+ foreach(lc, plannedstmt->subplans)
+ {
+ Plan *subplan = lfirst(lc);
+
+ (void) GetLockableRelations_worker(subplan, &context);
+ }
+
+ return context.relations;
+}
+
+/*
+ * GetLockableRelations_worker
+ * Adds RT indexes of relations to be scanned by plan to
+ * context->relations
+ *
+ * For plan node types that support pruning, this only adds child plan
+ * subnodes that satisfy the "initial" pruning steps.
+ */
+static bool
+GetLockableRelations_worker(Plan *plan,
+ struct GetLockableRelations_context *context)
+{
+ if (plan == NULL)
+ return false;
+
+ switch (nodeTag(plan))
+ {
+ /* Nodes scanning a relation or relations. */
+ case T_SeqScan:
+ case T_SampleScan:
+ case T_IndexScan:
+ case T_IndexOnlyScan:
+ case T_BitmapHeapScan:
+ case T_TidScan:
+ case T_TidRangeScan:
+ context->relations = bms_add_member(context->relations,
+ ((Scan *) plan)->scanrelid);
+ return false;
+ case T_ForeignScan:
+ context->relations = bms_add_members(context->relations,
+ ((ForeignScan *) plan)->fs_relids);
+ return false;
+ case T_CustomScan:
+ context->relations = bms_add_members(context->relations,
+ ((CustomScan *) plan)->custom_relids);
+ return false;
+
+ /* Nodes containing prunable subnodes. */
+ case T_Append:
+ case T_MergeAppend:
+ {
+ PartitionPruneInfo *pruneinfo;
+
+ if (IsA(plan, Append))
+ pruneinfo = ((Append *) plan)->part_prune_info;
+ else
+ pruneinfo = ((MergeAppend *) plan)->part_prune_info;
+
+ if (pruneinfo && pruneinfo->contains_init_steps)
+ {
+ List *rtable = context->plannedstmt->rtable;
+ ParamListInfo params = context->params;
+ List *subplans;
+ Bitmapset *validsubplans;
+ Bitmapset *parentrelids;
+ int i;
+ ExprContext *econtext;
+ PartitionDirectory pdir;
+ MemoryContext oldcontext,
+ tmpcontext;
+ PartitionPruneState *prunestate;
+
+ if (IsA(plan, Append))
+ subplans = ((Append *) plan)->appendplans;
+ else
+ subplans = ((MergeAppend *) plan)->mergeplans;
+
+ /*
+ * A temporary context to allocate stuff needed to run
+ * the pruning steps.
+ */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "initial pruning working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
+
+ /* An ExprContext to evaluate expressions. */
+ econtext = CreateStandaloneExprContext();
+ econtext->ecxt_param_list_info = params;
+
+ /*
+ * PartitionDirectory, to look up partition descriptors
+ * Omits detached partitions, just like in the executor
+ * proper.
+ */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+ prunestate = ExecCreatePartitionPruneState(NULL,
+ pruneinfo, false,
+ rtable, econtext,
+ pdir);
+ MemoryContextSwitchTo(oldcontext);
+
+ /* Do the "initial" pruning. */
+ validsubplans =
+ ExecFindInitialMatchingSubPlans(prunestate,
+ list_length(subplans),
+ pruneinfo,
+ &parentrelids);
+
+ FreeExprContext(econtext, true);
+ DestroyPartitionDirectory(pdir);
+ MemoryContextDelete(tmpcontext);
+
+ /* All relevant parents must be locked. */
+ Assert(bms_num_members(parentrelids) > 0);
+ context->relations = bms_add_members(context->relations,
+ parentrelids);
+
+ /* And all leaf partitions that will be scanned. */
+ i = -1;
+ while ((i = bms_next_member(validsubplans, i)) >= 0)
+ {
+ Plan *subplan = list_nth(subplans, i);
+
+ GetLockableRelations_worker(subplan, context);
+ }
+
+ return false;
+ }
+ else
+ {
+ /*
+ * plan_tree_walker() will take care of walking *all* of
+ * the node's child subplans to collect their relids.
+ */
+ }
+ }
+ break;
+
+ default:
+ break;
+ }
+
+ return plan_tree_walker(plan, GetLockableRelations_worker,
+ (void *) context);
+}
+
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 603d8becc4..7b77c8d20e 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -120,9 +120,14 @@ extern ResultRelInfo *ExecFindPartition(ModifyTableState *mtstate,
extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
PartitionTupleRouting *proute);
extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *partitionpruneinfo);
+ PartitionPruneInfo *partitionpruneinfo,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
- int nsubplans);
+ int nsubplans,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **parentrelids);
#endif /* EXECPARTITION_H */
diff --git a/src/include/nodes/nodeFuncs.h b/src/include/nodes/nodeFuncs.h
index 93c60bde66..fca107ad65 100644
--- a/src/include/nodes/nodeFuncs.h
+++ b/src/include/nodes/nodeFuncs.h
@@ -158,5 +158,8 @@ extern bool raw_expression_tree_walker(Node *node, bool (*walker) (),
struct PlanState;
extern bool planstate_tree_walker(struct PlanState *planstate, bool (*walker) (),
void *context);
+struct Plan;
+extern bool plan_tree_walker(struct Plan *plan, bool (*walker) (),
+ void *context);
#endif /* NODEFUNCS_H */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 1f33fe13c1..c1a38bfbdc 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -101,6 +101,9 @@ typedef struct PlannerGlobal
List *finalrtable; /* "flat" rangetable for executor */
+ Bitmapset *relationRTIs; /* Indexes of RTE_RELATION entries in range
+ * table */
+
List *finalrowmarks; /* "flat" list of PlanRowMarks */
List *resultRelations; /* "flat" list of integer RT indexes */
@@ -129,6 +132,9 @@ typedef struct PlannerGlobal
char maxParallelHazard; /* worst PROPARALLEL hazard level */
+ bool usesPreExecPruning; /* Do some Plan nodes use pre-execution
+ * partition pruning? */
+
PartitionDirectory partition_directory; /* partition descriptors */
} PlannerGlobal;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 0b518ce6b2..bdb72f7cbf 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -59,12 +59,18 @@ typedef struct PlannedStmt
bool parallelModeNeeded; /* parallel mode required to execute? */
+ bool usesPreExecPruning; /* Do some Plan nodes use pre-execution
+ * partition pruning? */
+
int jitFlags; /* which forms of JIT should be performed */
struct Plan *planTree; /* tree of Plan nodes */
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *relationRTIs; /* Indexes of RTE_RELATION entries in range
+ * table */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
List *resultRelations; /* integer list of RT indexes, or NIL */
@@ -1172,6 +1178,10 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * contains_init_steps Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its initial_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1180,6 +1190,7 @@ typedef struct PartitionPruneInfo
{
NodeTag type;
List *prune_infos;
+ bool contains_init_steps;
Bitmapset *other_subplans;
} PartitionPruneInfo;
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index ee11b6feae..90684efa25 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -41,6 +41,7 @@ struct RelOptInfo;
* subsidiary data, such as the FmgrInfos.
* planstate Points to the parent plan node's PlanState when called
* during execution; NULL when called from the planner.
+ * exprcontext ExprContext to use when evaluating pruning expressions
* exprstates Array of ExprStates, indexed as per PruneCxtStateIdx; one
* for each partition key in each pruning step. Allocated if
* planstate is non-NULL, otherwise NULL.
@@ -56,6 +57,7 @@ typedef struct PartitionPruneContext
FmgrInfo *stepcmpfuncs;
MemoryContext ppccontext;
PlanState *planstate;
+ ExprContext *exprcontext;
ExprState **exprstates;
} PartitionPruneContext;
--
2.24.1
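To make the targeted scenario concrete, here is a minimal sketch of a
prepared-statement workload whose "initial" pruning steps depend only
on EXTERN (bind) parameters; the table and statement names here are
hypothetical, not taken from the patch:

SET plan_cache_mode = force_generic_plan;

CREATE TABLE measurements (id int, val text) PARTITION BY RANGE (id);
CREATE TABLE measurements_1 PARTITION OF measurements
  FOR VALUES FROM (1) TO (1000);
CREATE TABLE measurements_2 PARTITION OF measurements
  FOR VALUES FROM (1000) TO (2000);

PREPARE q(int) AS SELECT * FROM measurements WHERE id = $1;

-- With the patch, AcquireExecutorLocks() runs the initial pruning
-- steps using the bind value, so it locks only "measurements" and
-- the one partition that survives pruning, instead of every
-- partition in the range table.
EXECUTE q(42);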
On Fri, Jan 14, 2022 at 11:10 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Thu, Jan 6, 2022 at 3:45 PM Amul Sul <sulamul@gmail.com> wrote:
Here are a few comments for the v1 patch:
Thanks Amul. I'm still thinking about how to address Robert's latest
comments, which may require some rethinking of this whole design, but I
decided to post a v2 taking care of your comments in the meantime.
cfbot tells me there is an unused variable warning, which is fixed in
the attached v3.
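One way to observe the patch's effect, assuming the hypothetical setup
sketched after the v1 patch above, is to compare the relation locks
taken for a generic-plan execution with and without the patch applied:

BEGIN;
EXECUTE q(42);
-- Unpatched, every partition of "measurements" shows up here with an
-- AccessShareLock; patched, only the parent and the partition(s) that
-- survive initial pruning should.
SELECT relation::regclass AS rel, mode
FROM pg_locks
WHERE pid = pg_backend_pid() AND locktype = 'relation';
COMMIT;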
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v3-0001-Teach-AcquireExecutorLocks-to-acquire-fewer-locks.patch (application/octet-stream)
From cb413a3129be3f8be32bbb93f592186bceb416d1 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v3] Teach AcquireExecutorLocks() to acquire fewer locks in
some cases
Currently, AcquireExecutorLocks() loops over the range table of a
given PlannedStmt and locks all relations found therein, even those
that won't actually be scanned during execution due to being
eliminated by "initial" pruning that is applied during the
initialization of their owning Append or MergeAppend node. This commit
instead makes AcquireExecutorLocks() itself do the "initial" pruning on
nodes that support it and lock only those relations that are contained
in the subnodes that survive the pruning.
To that end, AcquireExecutorLocks() now loops over a bitmapset of
RT indexes, those of the RTEs of "lockable" relations, instead of
the whole range table to find such entries. When pruning is possible,
the bitmapset is constructed by walking the plan tree to locate
nodes that allow "initial" (or "pre-execution") pruning and
disregarding relations from subnodes that don't survive the pruning
instructions.
PlannedStmt gets a bitmapset field to store the RT indexes of
lockable relations that is populated when constructing the flat range
table in setrefs.c. It is used as is in the absence of any prunable
nodes.
PlannedStmt also gets a new field that indicates whether any of the
nodes of the plan tree contain "initial" (or "pre-execution") pruning
steps, which saves the trouble of walking the plan tree only to find
whether that's the case.
ExecFindInitialMatchingSubPlans() is refactored to allow being
called outside a full-fledged executor context.
---
src/backend/executor/execParallel.c | 2 +
src/backend/executor/execPartition.c | 159 ++++++++++++-----
src/backend/executor/nodeAppend.c | 14 +-
src/backend/executor/nodeMergeAppend.c | 15 +-
src/backend/nodes/copyfuncs.c | 3 +
src/backend/nodes/nodeFuncs.c | 116 +++++++++++++
src/backend/nodes/outfuncs.c | 4 +
src/backend/nodes/readfuncs.c | 3 +
src/backend/optimizer/plan/planner.c | 2 +
src/backend/optimizer/plan/setrefs.c | 10 ++
src/backend/partitioning/partprune.c | 46 +++--
src/backend/utils/cache/plancache.c | 229 +++++++++++++++++++++++--
src/include/executor/execPartition.h | 9 +-
src/include/nodes/nodeFuncs.h | 3 +
src/include/nodes/pathnodes.h | 6 +
src/include/nodes/plannodes.h | 11 ++
src/include/partitioning/partprune.h | 2 +
17 files changed, 561 insertions(+), 73 deletions(-)
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 5dd8ab7db2..a2979d7602 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -182,8 +182,10 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->usesPreExecPruning = false;
pstmt->planTree = plan;
pstmt->rtable = estate->es_range_table;
+ pstmt->relationRTIs = NULL;
pstmt->resultRelations = NIL;
pstmt->appendRelations = NIL;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 90ed1485d1..1a0a5814e4 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -24,6 +24,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -186,7 +187,8 @@ static void ExecInitPruningContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
PartitionKey partkey,
- PlanState *planstate);
+ PlanState *planstate,
+ ExprContext *econtext);
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
@@ -1514,7 +1516,9 @@ adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri)
* Build the data structure required for calling
* ExecFindInitialMatchingSubPlans and ExecFindMatchingSubPlans.
*
- * 'planstate' is the parent plan node's execution state.
+ * 'planstate', if not NULL, is the parent plan node's execution state. It
+ * can be NULL when called from outside the executor, in which case
+ * 'rtable', 'econtext', and 'partdir' must have been provided.
*
* 'partitionpruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
@@ -1529,18 +1533,19 @@ adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri)
*/
PartitionPruneState *
ExecCreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *partitionpruneinfo)
+ PartitionPruneInfo *partitionpruneinfo,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir)
{
- EState *estate = planstate->state;
+ EState *estate = planstate ? planstate->state : NULL;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- /* For data reading, executor always omits detached partitions */
- if (estate->es_partition_directory == NULL)
- estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
+ Assert(partdir != NULL && econtext != NULL &&
+ (estate != NULL || rtable != NIL));
n_part_hierarchies = list_length(partitionpruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
@@ -1591,19 +1596,34 @@ ExecCreatePartitionPruneState(PlanState *planstate,
PartitionedRelPruneInfo *pinfo = lfirst_node(PartitionedRelPruneInfo, lc2);
PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
Relation partrel;
+ bool close_partrel = false;
PartitionDesc partdesc;
PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * We can rely on the copy of the partitioned table's partition
+ * key in its relcache entry, because it can't change (or
+ * get destroyed) as long as the relation is locked. The partition
+ * descriptor is taken from the PartitionDirectory associated with
+ * the table that is held open long enough for the descriptor to
+ * remain valid while it's used to perform the pruning steps.
*/
- partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+ if (estate == NULL)
+ {
+ RangeTblEntry *rte = rt_fetch(pinfo->rtindex, rtable);
+
+ partrel = table_open(rte->relid, rte->rellockmode);
+ close_partrel = true;
+ }
+ else
+ partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+
partkey = RelationGetPartitionKey(partrel);
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partdesc = PartitionDirectoryLookup(partdir, partrel);
+
+ /* Safe to close partrel, if necessary, keeping the lock. */
+ if (close_partrel)
+ table_close(partrel, NoLock);
/*
* Initialize the subplan_map and subpart_map.
@@ -1709,26 +1729,31 @@ ExecCreatePartitionPruneState(PlanState *planstate,
{
ExecInitPruningContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
- partdesc, partkey, planstate);
+ partdesc, partkey, planstate,
+ econtext);
/* Record whether initial pruning is needed at any level */
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps)
+ if (consider_exec_steps)
{
- ExecInitPruningContext(&pprune->exec_context,
- pinfo->exec_pruning_steps,
- partdesc, partkey, planstate);
- /* Record whether exec pruning is needed at any level */
- prunestate->do_exec_prune = true;
- }
+ if (pinfo->exec_pruning_steps)
+ {
+ ExecInitPruningContext(&pprune->exec_context,
+ pinfo->exec_pruning_steps,
+ partdesc, partkey, planstate,
+ econtext);
+ /* Record whether exec pruning is needed at any level */
+ prunestate->do_exec_prune = true;
+ }
- /*
- * Accumulate the IDs of all PARAM_EXEC Params affecting the
- * partitioning decisions at this plan node.
- */
- prunestate->execparamids = bms_add_members(prunestate->execparamids,
- pinfo->execparamids);
+ /*
+ * Accumulate the IDs of all PARAM_EXEC Params affecting the
+ * partitioning decisions at this plan node.
+ */
+ prunestate->execparamids = bms_add_members(prunestate->execparamids,
+ pinfo->execparamids);
+ }
j++;
}
@@ -1740,13 +1765,18 @@ ExecCreatePartitionPruneState(PlanState *planstate,
/*
* Initialize a PartitionPruneContext for the given list of pruning steps.
+ *
+ * At least one of 'planstate' or 'econtext' must be passed to be able to
+ * successfully evaluate any non-Const expressions contained in the
+ * steps.
*/
static void
ExecInitPruningContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
PartitionKey partkey,
- PlanState *planstate)
+ PlanState *planstate,
+ ExprContext *econtext)
{
int n_steps;
int partnatts;
@@ -1767,6 +1797,7 @@ ExecInitPruningContext(PartitionPruneContext *context,
context->ppccontext = CurrentMemoryContext;
context->planstate = planstate;
+ context->exprcontext = econtext;
/* Initialize expression state for each expression we need */
context->exprstates = (ExprState **)
@@ -1795,8 +1826,13 @@ ExecInitPruningContext(PartitionPruneContext *context,
step->step.step_id,
keyno);
- context->exprstates[stateidx] =
- ExecInitExpr(expr, context->planstate);
+ if (planstate == NULL)
+ context->exprstates[stateidx] =
+ ExecInitExprWithParams(expr,
+ econtext->ecxt_param_list_info);
+ else
+ context->exprstates[stateidx] =
+ ExecInitExpr(expr, context->planstate);
}
keyno++;
}
@@ -1818,9 +1854,15 @@ ExecInitPruningContext(PartitionPruneContext *context,
* is required.
*
* 'nsubplans' must be passed as the total number of unpruned subplans.
+ *
+ * The RT indexes of unpruned parents are returned in *parentrelids if asked
+ * for by the caller, in which case 'pruneinfo' must also be passed because
+ * that is where the RT indexes are to be found.
*/
Bitmapset *
-ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
+ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **parentrelids)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -1830,11 +1872,14 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
Assert(prunestate->do_initial_prune);
/*
- * Switch to a temp context to avoid leaking memory in the executor's
- * query-lifespan memory context.
+ * Switch to a temp context to avoid leaking memory in the longer-term
+ * memory context.
*/
oldcontext = MemoryContextSwitchTo(prunestate->prune_context);
+ if (parentrelids)
+ *parentrelids = NULL;
+
/*
* For each hierarchy, do the pruning tests, and add nondeletable
* subplans' indexes to "result".
@@ -1845,14 +1890,42 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
PartitionedRelPruningData *pprune;
prunedata = prunestate->partprunedata[i];
+
+ /*
+ * We pass the 1st item belonging to the root table of the hierarchy
+ * and find_matching_subplans_recurse() takes care of recursing to
+ * other (lower-level) parents as needed.
+ */
pprune = &prunedata->partrelprunedata[0];
/* Perform pruning without using PARAM_EXEC Params */
find_matching_subplans_recurse(prunedata, pprune, true, &result);
- /* Expression eval may have used space in node's ps_ExprContext too */
+ /*
+ * Collect the RT indexes of surviving parents if the caller asked
+ * to see them.
+ */
+ if (parentrelids)
+ {
+ int j;
+ List *partrelpruneinfos = list_nth_node(List,
+ pruneinfo->prune_infos,
+ i);
+
+ for (j = 0; j < prunedata->num_partrelprunedata; j++)
+ {
+ PartitionedRelPruneInfo *pinfo = list_nth_node(PartitionedRelPruneInfo,
+ partrelpruneinfos, j);
+
+ pprune = &prunedata->partrelprunedata[j];
+ if (!bms_is_empty(pprune->present_parts))
+ *parentrelids = bms_add_member(*parentrelids, pinfo->rtindex);
+ }
+ }
+
+ /* Expression eval may have used space in ExprContext too */
if (pprune->initial_pruning_steps)
- ResetExprContext(pprune->initial_context.planstate->ps_ExprContext);
+ ResetExprContext(pprune->initial_context.exprcontext);
}
/* Add in any subplans that partition pruning didn't account for */
@@ -1862,9 +1935,12 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (parentrelids)
+ *parentrelids = bms_copy(*parentrelids);
MemoryContextReset(prunestate->prune_context);
+
/*
* If exec-time pruning is required and we pruned subplans above, then we
* must re-sequence the subplan indexes so that ExecFindMatchingSubPlans
@@ -2018,11 +2094,16 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate)
prunedata = prunestate->partprunedata[i];
pprune = &prunedata->partrelprunedata[0];
+ /*
+ * We pass the 1st item belonging to the root table of the hierarchy
+ * and find_matching_subplans_recurse() takes care of recursing to
+ * other (lower-level) parents as needed.
+ */
find_matching_subplans_recurse(prunedata, pprune, false, &result);
- /* Expression eval may have used space in node's ps_ExprContext too */
+ /* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
- ResetExprContext(pprune->exec_context.planstate->ps_ExprContext);
+ ResetExprContext(pprune->exec_context.exprcontext);
}
/* Add in any subplans that partition pruning didn't account for */
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 7937f1c88f..51aac946fa 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -62,6 +62,7 @@
#include "executor/execPartition.h"
#include "executor/nodeAppend.h"
#include "miscadmin.h"
+#include "partitioning/partdesc.h"
#include "pgstat.h"
#include "storage/latch.h"
@@ -141,9 +142,16 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
/* We may need an expression context to evaluate partition exprs */
ExecAssignExprContext(estate, &appendstate->ps);
+ /* For data reading, executor always omits detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
/* Create the working data structure for pruning. */
prunestate = ExecCreatePartitionPruneState(&appendstate->ps,
- node->part_prune_info);
+ node->part_prune_info, true,
+ NIL, appendstate->ps.ps_ExprContext,
+ estate->es_partition_directory);
appendstate->as_prune_state = prunestate;
/* Perform an initial partition prune, if required. */
@@ -151,7 +159,9 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
{
/* Determine which subplans survive initial pruning */
validsubplans = ExecFindInitialMatchingSubPlans(prunestate,
- list_length(node->appendplans));
+ list_length(node->appendplans),
+ node->part_prune_info,
+ NULL);
nplans = bms_num_members(validsubplans);
}
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 418f89dea8..7d1185ec9d 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -43,6 +43,7 @@
#include "executor/nodeMergeAppend.h"
#include "lib/binaryheap.h"
#include "miscadmin.h"
+#include "partitioning/partdesc.h"
/*
* We have one slot for each item in the heap array. We use SlotNumber
@@ -89,8 +90,16 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
/* We may need an expression context to evaluate partition exprs */
ExecAssignExprContext(estate, &mergestate->ps);
+ /* For data reading, executor always omits detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
+ /* Create the working data structure for pruning. */
prunestate = ExecCreatePartitionPruneState(&mergestate->ps,
- node->part_prune_info);
+ node->part_prune_info, true,
+ NIL, mergestate->ps.ps_ExprContext,
+ estate->es_partition_directory);
mergestate->ms_prune_state = prunestate;
/* Perform an initial partition prune, if required. */
@@ -98,7 +107,9 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
{
/* Determine which subplans survive initial pruning */
validsubplans = ExecFindInitialMatchingSubPlans(prunestate,
- list_length(node->mergeplans));
+ list_length(node->mergeplans),
+ node->part_prune_info,
+ NULL);
nplans = bms_num_members(validsubplans);
}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index b105c26381..4b539f792b 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -94,9 +94,11 @@ _copyPlannedStmt(const PlannedStmt *from)
COPY_SCALAR_FIELD(transientPlan);
COPY_SCALAR_FIELD(dependsOnRole);
COPY_SCALAR_FIELD(parallelModeNeeded);
+ COPY_SCALAR_FIELD(usesPreExecPruning);
COPY_SCALAR_FIELD(jitFlags);
COPY_NODE_FIELD(planTree);
COPY_NODE_FIELD(rtable);
+ COPY_BITMAPSET_FIELD(relationRTIs);
COPY_NODE_FIELD(resultRelations);
COPY_NODE_FIELD(appendRelations);
COPY_NODE_FIELD(subplans);
@@ -1278,6 +1280,7 @@ _copyPartitionPruneInfo(const PartitionPruneInfo *from)
PartitionPruneInfo *newnode = makeNode(PartitionPruneInfo);
COPY_NODE_FIELD(prune_infos);
+ COPY_SCALAR_FIELD(contains_init_steps);
COPY_BITMAPSET_FIELD(other_subplans);
return newnode;
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index acc17da717..d2de60711d 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -31,6 +31,10 @@ static bool planstate_walk_subplans(List *plans, bool (*walker) (),
void *context);
static bool planstate_walk_members(PlanState **planstates, int nplans,
bool (*walker) (), void *context);
+static bool plan_walk_subplans(List *plans,
+ bool (*walker) (),
+ void *context);
+static bool plan_walk_members(List *plans, bool (*walker) (), void *context);
/*
@@ -4147,3 +4151,115 @@ planstate_walk_members(PlanState **planstates, int nplans,
return false;
}
+
+/*
+ * plan_tree_walker --- walk plantrees
+ *
+ * The walker has already visited the current node, and so we need only
+ * recurse into any sub-nodes it has.
+ */
+bool
+plan_tree_walker(Plan *plan,
+ bool (*walker) (),
+ void *context)
+{
+ /* Guard against stack overflow due to overly complex plan trees */
+ check_stack_depth();
+
+ /* initPlan-s */
+ if (plan_walk_subplans(plan->initPlan, walker, context))
+ return true;
+
+ /* lefttree */
+ if (outerPlan(plan))
+ {
+ if (walker(outerPlan(plan), context))
+ return true;
+ }
+
+ /* righttree */
+ if (innerPlan(plan))
+ {
+ if (walker(innerPlan(plan), context))
+ return true;
+ }
+
+ /* special child plans */
+ switch (nodeTag(plan))
+ {
+ case T_Append:
+ if (plan_walk_members(((Append *) plan)->appendplans,
+ walker, context))
+ return true;
+ break;
+ case T_MergeAppend:
+ if (plan_walk_members(((MergeAppend *) plan)->mergeplans,
+ walker, context))
+ return true;
+ break;
+ case T_BitmapAnd:
+ if (plan_walk_members(((BitmapAnd *) plan)->bitmapplans,
+ walker, context))
+ return true;
+ break;
+ case T_BitmapOr:
+ if (plan_walk_members(((BitmapOr *) plan)->bitmapplans,
+ walker, context))
+ return true;
+ break;
+ case T_CustomScan:
+ if (plan_walk_members(((CustomScan *) plan)->custom_plans,
+ walker, context))
+ return true;
+ break;
+ case T_SubqueryScan:
+ if (walker(((SubqueryScan *) plan)->subplan, context))
+ return true;
+ break;
+ default:
+ break;
+ }
+
+ return false;
+}
+
+/*
+ * Walk a list of SubPlans (or initPlans, which also use SubPlan nodes).
+ */
+static bool
+plan_walk_subplans(List *plans,
+ bool (*walker) (),
+ void *context)
+{
+ ListCell *lc;
+ PlannedStmt *plannedstmt = (PlannedStmt *) context;
+
+ foreach(lc, plans)
+ {
+ SubPlan *sp = lfirst_node(SubPlan, lc);
+ Plan *p = list_nth(plannedstmt->subplans, sp->plan_id - 1);
+
+ if (walker(p, context))
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * Walk the constituent plans of an Append, MergeAppend, BitmapAnd,
+ * BitmapOr, or CustomScan node.
+ */
+static bool
+plan_walk_members(List *plans, bool (*walker) (), void *context)
+{
+ ListCell *lc;
+
+ foreach(lc, plans)
+ {
+ if (walker(lfirst(lc), context))
+ return true;
+ }
+
+ return false;
+}
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index d28cea1567..994971d0cb 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -312,9 +312,11 @@ _outPlannedStmt(StringInfo str, const PlannedStmt *node)
WRITE_BOOL_FIELD(transientPlan);
WRITE_BOOL_FIELD(dependsOnRole);
WRITE_BOOL_FIELD(parallelModeNeeded);
+ WRITE_BOOL_FIELD(usesPreExecPruning);
WRITE_INT_FIELD(jitFlags);
WRITE_NODE_FIELD(planTree);
WRITE_NODE_FIELD(rtable);
+ WRITE_BITMAPSET_FIELD(relationRTIs);
WRITE_NODE_FIELD(resultRelations);
WRITE_NODE_FIELD(appendRelations);
WRITE_NODE_FIELD(subplans);
@@ -1004,6 +1006,7 @@ _outPartitionPruneInfo(StringInfo str, const PartitionPruneInfo *node)
WRITE_NODE_TYPE("PARTITIONPRUNEINFO");
WRITE_NODE_FIELD(prune_infos);
+ WRITE_BOOL_FIELD(contains_init_steps);
WRITE_BITMAPSET_FIELD(other_subplans);
}
@@ -2274,6 +2277,7 @@ _outPlannerGlobal(StringInfo str, const PlannerGlobal *node)
WRITE_NODE_FIELD(subplans);
WRITE_BITMAPSET_FIELD(rewindPlanIDs);
WRITE_NODE_FIELD(finalrtable);
+ WRITE_BITMAPSET_FIELD(relationRTIs);
WRITE_NODE_FIELD(finalrowmarks);
WRITE_NODE_FIELD(resultRelations);
WRITE_NODE_FIELD(appendRelations);
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 3f68f7c18d..8b3caeef03 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1585,9 +1585,11 @@ _readPlannedStmt(void)
READ_BOOL_FIELD(transientPlan);
READ_BOOL_FIELD(dependsOnRole);
READ_BOOL_FIELD(parallelModeNeeded);
+ READ_BOOL_FIELD(usesPreExecPruning);
READ_INT_FIELD(jitFlags);
READ_NODE_FIELD(planTree);
READ_NODE_FIELD(rtable);
+ READ_BITMAPSET_FIELD(relationRTIs);
READ_NODE_FIELD(resultRelations);
READ_NODE_FIELD(appendRelations);
READ_NODE_FIELD(subplans);
@@ -2534,6 +2536,7 @@ _readPartitionPruneInfo(void)
READ_LOCALS(PartitionPruneInfo);
READ_NODE_FIELD(prune_infos);
+ READ_BOOL_FIELD(contains_init_steps);
READ_BITMAPSET_FIELD(other_subplans);
READ_DONE();
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index bd09f85aea..3f35f8f892 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -517,8 +517,10 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->transientPlan = glob->transientPlan;
result->dependsOnRole = glob->dependsOnRole;
result->parallelModeNeeded = glob->parallelModeNeeded;
+ result->usesPreExecPruning = glob->usesPreExecPruning;
result->planTree = top_plan;
result->rtable = glob->finalrtable;
+ result->relationRTIs = glob->relationRTIs;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index e44ae971b4..d34a7eb621 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -483,6 +483,7 @@ static void
add_rte_to_flat_rtable(PlannerGlobal *glob, RangeTblEntry *rte)
{
RangeTblEntry *newrte;
+ Index rti = list_length(glob->finalrtable) + 1;
/* flat copy to duplicate all the scalar fields */
newrte = (RangeTblEntry *) palloc(sizeof(RangeTblEntry));
@@ -517,7 +518,10 @@ add_rte_to_flat_rtable(PlannerGlobal *glob, RangeTblEntry *rte)
* but it would probably cost more cycles than it would save.
*/
if (newrte->rtekind == RTE_RELATION)
+ {
+ glob->relationRTIs = bms_add_member(glob->relationRTIs, rti);
glob->relationOids = lappend_oid(glob->relationOids, newrte->relid);
+ }
}
/*
@@ -1540,6 +1544,9 @@ set_append_references(PlannerInfo *root,
pinfo->rtindex += rtoffset;
}
}
+
+ if (aplan->part_prune_info->contains_init_steps)
+ root->glob->usesPreExecPruning = true;
}
/* We don't need to recurse to lefttree or righttree ... */
@@ -1604,6 +1611,9 @@ set_mergeappend_references(PlannerInfo *root,
pinfo->rtindex += rtoffset;
}
}
+
+ if (mplan->part_prune_info->contains_init_steps)
+ root->glob->usesPreExecPruning = true;
}
/* We don't need to recurse to lefttree or righttree ... */
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 1bc00826c1..3e3c6c78df 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,8 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *contains_init_steps);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -230,6 +231,7 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool contains_init_steps = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -309,12 +311,14 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_contains_init_steps;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_contains_init_steps);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -323,6 +327,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+ if (!contains_init_steps)
+ contains_init_steps = partrel_contains_init_steps;
}
pfree(relid_subplan_map);
@@ -337,6 +343,7 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->contains_init_steps = contains_init_steps;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -435,13 +442,17 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *contains_init_steps is set to indicate whether the returned
+ * PartitionedRelPruneInfos contain pruning steps that can be performed
+ * before execution begins.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *contains_init_steps)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -452,6 +463,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will be determined below. */
+ *contains_init_steps = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -539,6 +553,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+ * In the first pass, we note whether the 2nd pass is necessary by
+ * checking for the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -613,6 +630,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ if (!*contains_init_steps)
+ *contains_init_steps = (initial_pruning_steps != NIL);
+
pinfolist = lappend(pinfolist, pinfo);
}
@@ -798,6 +818,7 @@ prune_append_rel_partitions(RelOptInfo *rel)
/* These are not valid when being called from the planner */
context.planstate = NULL;
+ context.exprcontext = NULL;
context.exprstates = NULL;
/* Actual pruning happens here. */
@@ -808,8 +829,8 @@ prune_append_rel_partitions(RelOptInfo *rel)
* get_matching_partitions
* Determine partitions that survive partition pruning
*
- * Note: context->planstate must be set to a valid PlanState when the
- * pruning_steps were generated with a target other than PARTTARGET_PLANNER.
+ * Note: context->exprcontext must be valid when the pruning_steps were
+ * generated with a target other than PARTTARGET_PLANNER.
*
* Returns a Bitmapset of the RelOptInfo->part_rels indexes of the surviving
* partitions.
@@ -3654,7 +3675,7 @@ match_boolean_partition_clause(Oid partopfamily, Expr *clause, Expr *partkey,
* exprstate array.
*
* Note that the evaluated result may be in the per-tuple memory context of
- * context->planstate->ps_ExprContext, and we may have leaked other memory
+ * context->exprcontext, and we may have leaked other memory
* there too. This memory must be recovered by resetting that ExprContext
* after we're done with the pruning operation (see execPartition.c).
*/
@@ -3677,13 +3698,18 @@ partkey_datum_from_expr(PartitionPruneContext *context,
ExprContext *ectx;
/*
- * We should never see a non-Const in a step unless we're running in
- * the executor.
+ * We should never see a non-Const in a step unless the caller has
+ * passed a valid ExprContext.
+ *
+ * When context->planstate is valid, context->exprcontext is the same
+ * as context->planstate->ps_ExprContext.
*/
- Assert(context->planstate != NULL);
+ Assert(context->planstate != NULL || context->exprcontext != NULL);
+ Assert(context->planstate == NULL ||
+ (context->exprcontext == context->planstate->ps_ExprContext));
exprstate = context->exprstates[stateidx];
- ectx = context->planstate->ps_ExprContext;
+ ectx = context->exprcontext;
*value = ExecEvalExprSwitchContext(exprstate, ectx, isnull);
}
}
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 4a9055e6bb..6c4c6f0d95 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -58,12 +58,14 @@
#include "access/transam.h"
#include "catalog/namespace.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/optimizer.h"
#include "parser/analyze.h"
#include "parser/parsetree.h"
+#include "partitioning/partdesc.h"
#include "storage/lmgr.h"
#include "tcop/pquery.h"
#include "tcop/utility.h"
@@ -99,14 +101,25 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocks(List *stmt_list, bool acquire,
+ ParamListInfo boundParams);
+struct GetLockableRelations_context
+{
+ PlannedStmt *plannedstmt;
+ Bitmapset *relations;
+ ParamListInfo params;
+};
+static Bitmapset *GetLockableRelations(PlannedStmt *plannedstmt,
+ ParamListInfo boundParams);
+static bool GetLockableRelations_worker(Plan *plan,
+ struct GetLockableRelations_context *context);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -792,7 +805,7 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* (We must do this for the "true" result to be race-condition-free.)
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams)
{
CachedPlan *plan = plansource->gplan;
@@ -826,7 +839,7 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ AcquireExecutorLocks(plan->stmt_list, true, boundParams);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -848,7 +861,7 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ AcquireExecutorLocks(plan->stmt_list, false, boundParams);
}
/*
@@ -1160,7 +1173,7 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (CheckCachedPlan(plansource, boundParams))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1366,7 +1379,6 @@ CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
foreach(lc, plan->stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc);
- ListCell *lc2;
if (plannedstmt->commandType == CMD_UTILITY)
return false;
@@ -1375,13 +1387,8 @@ CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
* We have to grovel through the rtable because it's likely to contain
* an RTE_RESULT relation, rather than being totally empty.
*/
- foreach(lc2, plannedstmt->rtable)
- {
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
-
- if (rte->rtekind == RTE_RELATION)
- return false;
- }
+ if (!bms_is_empty(plannedstmt->relationRTIs))
+ return false;
}
/*
@@ -1740,14 +1747,15 @@ QueryListGetPrimaryStmt(List *stmts)
* or release them if acquire is false.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocks(List *stmt_list, bool acquire, ParamListInfo boundParams)
{
ListCell *lc1;
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ Bitmapset *relations;
+ int rti;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1765,9 +1773,22 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ /*
+ * Fetch the RT indexes of only those relations that will actually be
+ * scanned when the plan is executed. This skips over scan nodes
+ * appearing as child subnodes of any Append/MergeAppend nodes present
+ * in the plan tree. It does so by calling
+ * ExecFindInitialMatchingSubPlans() to perform any pruning steps
+ * contained in those nodes that can safely be run at this point, using
+ * 'boundParams' to evaluate any EXTERN parameters contained in the
+ * steps.
+ */
+ relations = GetLockableRelations(plannedstmt, boundParams);
+
+ rti = -1;
+ while ((rti = bms_next_member(relations, rti)) >= 0)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
if (rte->rtekind != RTE_RELATION)
continue;
@@ -1786,6 +1807,178 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
}
}
+/*
+ * GetLockableRelations
+ * Returns set of RT indexes of relations that must be locked by
+ * AcquireExecutorLocks()
+ */
+static Bitmapset *
+GetLockableRelations(PlannedStmt *plannedstmt, ParamListInfo boundParams)
+{
+ ListCell *lc;
+ struct GetLockableRelations_context context;
+
+ /* None of the relation scanning nodes are prunable here. */
+ if (!plannedstmt->usesPreExecPruning)
+ return plannedstmt->relationRTIs;
+
+ /*
+ * Look for prunable nodes in the main plan tree, followed by those in
+ * subplans.
+ */
+ context.plannedstmt = plannedstmt;
+ context.params = boundParams;
+ context.relations = NULL;
+
+ (void) GetLockableRelations_worker(plannedstmt->planTree, &context);
+
+ foreach(lc, plannedstmt->subplans)
+ {
+ Plan *subplan = lfirst(lc);
+
+ (void) GetLockableRelations_worker(subplan, &context);
+ }
+
+ return context.relations;
+}
+
+/*
+ * GetLockableRelations_worker
+ * Adds RT indexes of relations to be scanned by plan to
+ * context->relations
+ *
+ * For plan node types that support pruning, this only visits the child
+ * subnodes that survive the "initial" pruning steps.
+ */
+static bool
+GetLockableRelations_worker(Plan *plan,
+ struct GetLockableRelations_context *context)
+{
+ if (plan == NULL)
+ return false;
+
+ switch (nodeTag(plan))
+ {
+ /* Nodes scanning a relation or relations. */
+ case T_SeqScan:
+ case T_SampleScan:
+ case T_IndexScan:
+ case T_IndexOnlyScan:
+ case T_BitmapHeapScan:
+ case T_TidScan:
+ case T_TidRangeScan:
+ context->relations = bms_add_member(context->relations,
+ ((Scan *) plan)->scanrelid);
+ return false;
+ case T_ForeignScan:
+ context->relations = bms_add_members(context->relations,
+ ((ForeignScan *) plan)->fs_relids);
+ return false;
+ case T_CustomScan:
+ context->relations = bms_add_members(context->relations,
+ ((CustomScan *) plan)->custom_relids);
+ return false;
+
+ /* Nodes containing prunable subnodes. */
+ case T_Append:
+ case T_MergeAppend:
+ {
+ PartitionPruneInfo *pruneinfo;
+
+ if (IsA(plan, Append))
+ pruneinfo = ((Append *) plan)->part_prune_info;
+ else
+ pruneinfo = ((MergeAppend *) plan)->part_prune_info;
+
+ if (pruneinfo && pruneinfo->contains_init_steps)
+ {
+ List *rtable = context->plannedstmt->rtable;
+ ParamListInfo params = context->params;
+ List *subplans;
+ Bitmapset *validsubplans;
+ Bitmapset *parentrelids;
+ int i;
+ ExprContext *econtext;
+ PartitionDirectory pdir;
+ MemoryContext oldcontext,
+ tmpcontext;
+ PartitionPruneState *prunestate;
+
+ if (IsA(plan, Append))
+ subplans = ((Append *) plan)->appendplans;
+ else
+ subplans = ((MergeAppend *) plan)->mergeplans;
+
+ /*
+ * A temporary context to allocate stuff needed to run
+ * the pruning steps.
+ */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "initial pruning working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
+
+ /* An ExprContext to evaluate expressions. */
+ econtext = CreateStandaloneExprContext();
+ econtext->ecxt_param_list_info = params;
+
+ /*
+ * PartitionDirectory, to look up partition descriptors.
+ * Omits detached partitions, just like in the executor
+ * proper.
+ */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+ prunestate = ExecCreatePartitionPruneState(NULL,
+ pruneinfo, false,
+ rtable, econtext,
+ pdir);
+ MemoryContextSwitchTo(oldcontext);
+
+ /* Do the "initial" pruning. */
+ validsubplans =
+ ExecFindInitialMatchingSubPlans(prunestate,
+ list_length(subplans),
+ pruneinfo,
+ &parentrelids);
+
+ FreeExprContext(econtext, true);
+ DestroyPartitionDirectory(pdir);
+ MemoryContextDelete(tmpcontext);
+
+ /* All relevant parents must be locked. */
+ Assert(bms_num_members(parentrelids) > 0);
+ context->relations = bms_add_members(context->relations,
+ parentrelids);
+
+ /* And all leaf partitions that will be scanned. */
+ i = -1;
+ while ((i = bms_next_member(validsubplans, i)) >= 0)
+ {
+ Plan *subplan = list_nth(subplans, i);
+
+ GetLockableRelations_worker(subplan, context);
+ }
+
+ return false;
+ }
+ else
+ {
+ /*
+ * plan_tree_walker() will take care of walking *all* of
+ * the node's child subplans to collect their relids.
+ */
+ }
+ }
+ break;
+
+ default:
+ break;
+ }
+
+ return plan_tree_walker(plan, GetLockableRelations_worker,
+ (void *) context);
+}
+
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 603d8becc4..7b77c8d20e 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -120,9 +120,14 @@ extern ResultRelInfo *ExecFindPartition(ModifyTableState *mtstate,
extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
PartitionTupleRouting *proute);
extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *partitionpruneinfo);
+ PartitionPruneInfo *partitionpruneinfo,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
- int nsubplans);
+ int nsubplans,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **parentrelids);
#endif /* EXECPARTITION_H */
diff --git a/src/include/nodes/nodeFuncs.h b/src/include/nodes/nodeFuncs.h
index 93c60bde66..fca107ad65 100644
--- a/src/include/nodes/nodeFuncs.h
+++ b/src/include/nodes/nodeFuncs.h
@@ -158,5 +158,8 @@ extern bool raw_expression_tree_walker(Node *node, bool (*walker) (),
struct PlanState;
extern bool planstate_tree_walker(struct PlanState *planstate, bool (*walker) (),
void *context);
+struct Plan;
+extern bool plan_tree_walker(struct Plan *plan, bool (*walker) (),
+ void *context);
#endif /* NODEFUNCS_H */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 1f3845b3fe..ffde93ef13 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -101,6 +101,9 @@ typedef struct PlannerGlobal
List *finalrtable; /* "flat" rangetable for executor */
+ Bitmapset *relationRTIs; /* Indexes of RTE_RELATION entries in range
+ * table */
+
List *finalrowmarks; /* "flat" list of PlanRowMarks */
List *resultRelations; /* "flat" list of integer RT indexes */
@@ -129,6 +132,9 @@ typedef struct PlannerGlobal
char maxParallelHazard; /* worst PROPARALLEL hazard level */
+ bool usesPreExecPruning; /* Do some Plan nodes use pre-execution
+ * partition pruning? */
+
PartitionDirectory partition_directory; /* partition descriptors */
} PlannerGlobal;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 0b518ce6b2..bdb72f7cbf 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -59,12 +59,18 @@ typedef struct PlannedStmt
bool parallelModeNeeded; /* parallel mode required to execute? */
+ bool usesPreExecPruning; /* Do some Plan nodes use pre-execution
+ * partition pruning? */
+
int jitFlags; /* which forms of JIT should be performed */
struct Plan *planTree; /* tree of Plan nodes */
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *relationRTIs; /* Indexes of RTE_RELATION entries in range
+ * table */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
List *resultRelations; /* integer list of RT indexes, or NIL */
@@ -1172,6 +1178,10 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * contains_init_steps Do any of the PartitionedRelPruneInfos in
+ * prune_infos have their initial_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1180,6 +1190,7 @@ typedef struct PartitionPruneInfo
{
NodeTag type;
List *prune_infos;
+ bool contains_init_steps;
Bitmapset *other_subplans;
} PartitionPruneInfo;
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index ee11b6feae..90684efa25 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -41,6 +41,7 @@ struct RelOptInfo;
* subsidiary data, such as the FmgrInfos.
* planstate Points to the parent plan node's PlanState when called
* during execution; NULL when called from the planner.
+ * exprcontext ExprContext to use when evaluating pruning expressions
* exprstates Array of ExprStates, indexed as per PruneCxtStateIdx; one
* for each partition key in each pruning step. Allocated if
* planstate is non-NULL, otherwise NULL.
@@ -56,6 +57,7 @@ typedef struct PartitionPruneContext
FmgrInfo *stepcmpfuncs;
MemoryContext ppccontext;
PlanState *planstate;
+ ExprContext *exprcontext;
ExprState **exprstates;
} PartitionPruneContext;
--
2.24.1
On Tue, 11 Jan 2022 at 16:22, Robert Haas <robertmhaas@gmail.com> wrote:
This is just a relatively simple example and I think there are
probably a bunch of others. There are a lot of kinds of DDL that could
be performed on a partition that gets pruned away: DROP INDEX is just
one example.
I haven't followed this in any detail, but this patch and its goal of
reducing the O(N) drag effect on partition execution time is very
important. Locking a long list of objects that then get pruned is very
wasteful, as the results show.
Ideally, we want an O(1) algorithm for single partition access and DDL
is rare. So perhaps that is the starting point for a safe design -
invent a single lock or cache that allows us to check if the partition
hierarchy has changed in any way, and if so, replan, if not, skip
locks.
Please excuse me if this idea falls short, if so, please just note my
comment about how important this is. Thanks.
--
Simon Riggs http://www.EnterpriseDB.com/
Hi Simon,
On Tue, Jan 18, 2022 at 4:44 PM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
On Tue, 11 Jan 2022 at 16:22, Robert Haas <robertmhaas@gmail.com> wrote:
This is just a relatively simple example and I think there are
probably a bunch of others. There are a lot of kinds of DDL that could
be performed on a partition that gets pruned away: DROP INDEX is just
one example.
I haven't followed this in any detail, but this patch and its goal of
reducing the O(N) drag effect on partition execution time is very
important. Locking a long list of objects that then get pruned is very
wasteful, as the results show.
Ideally, we want an O(1) algorithm for single partition access and DDL
is rare. So perhaps that is the starting point for a safe design -
invent a single lock or cache that allows us to check if the partition
hierarchy has changed in any way, and if so, replan, if not, skip
locks.
Rearchitecting partition locking to be O(1) seems like a project of
non-trivial complexity, as Robert mentioned in a related email thread
a couple of years ago:
/messages/by-id/CA+TgmoYbtm1uuDne3rRp_uNA2RFiBwXX1ngj3RSLxOfc3oS7cQ@mail.gmail.com
Pursuing that kind of a project would perhaps have been more
worthwhile if the locking issue had affected more than just this
particular case, that is, the case of running prepared statements over
partitioned tables using generic plans. Addressing this by
rearchitecting run-time pruning (and plancache to some degree) seemed
like it might lead to this getting fixed in a bounded timeframe. I
admit that the concerns that Robert has raised about the patch make me
want to reconsider that position, though maybe it's too soon to
conclude.
--
Amit Langote
EDB: http://www.enterprisedb.com
On Tue, 18 Jan 2022 at 08:10, Amit Langote <amitlangote09@gmail.com> wrote:
Hi Simon,
On Tue, Jan 18, 2022 at 4:44 PM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
On Tue, 11 Jan 2022 at 16:22, Robert Haas <robertmhaas@gmail.com> wrote:
This is just a relatively simple example and I think there are
probably a bunch of others. There are a lot of kinds of DDL that could
be performed on a partition that gets pruned away: DROP INDEX is just
one example.
I haven't followed this in any detail, but this patch and its goal of
reducing the O(N) drag effect on partition execution time is very
important. Locking a long list of objects that then get pruned is very
wasteful, as the results show.
Ideally, we want an O(1) algorithm for single partition access and DDL
is rare. So perhaps that is the starting point for a safe design -
invent a single lock or cache that allows us to check if the partition
hierarchy has changed in any way, and if so, replan, if not, skip
locks.
Rearchitecting partition locking to be O(1) seems like a project of
non-trivial complexity, as Robert mentioned in a related email thread
a couple of years ago:
/messages/by-id/CA+TgmoYbtm1uuDne3rRp_uNA2RFiBwXX1ngj3RSLxOfc3oS7cQ@mail.gmail.com
I agree, completely redesigning locking is a major project. But that
isn't what I suggested, which was to find an O(1) algorithm to solve
the safety issue. I'm sure there is an easy way to check one lock,
maybe a new one/new kind, rather than N.
Why does the safety issue exist? Why is it important to be able to
concurrently access parts of the hierarchy with DDL? Those are not
critical points.
If we asked them, most users would trade a 10x performance gain for
some restrictions on DDL. If anyone cares, make it an option, but most
people will use it.
Maybe force all DDL, or just DDL that would cause safety issues, to
update a hierarchy version number, so queries can tell whether they
need to replan. Don't know, just looking for an O(1) solution.
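To illustrate the shape of that check (purely hypothetical: neither
the counter nor the function below exists today):

    /* Hypothetical: DDL on any member of the tree bumps this counter. */
    uint64      current = GetPartitionTreeVersion(root_relid);

    if (plansource->parttree_version != current)
    {
        /* Hierarchy changed since the plan was built; replan. */
    }
    else
    {
        /* Plan still valid; skip the per-partition locks. */
    }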
--
Simon Riggs http://www.EnterpriseDB.com/
On Tue, Jan 18, 2022 at 3:10 AM Amit Langote <amitlangote09@gmail.com> wrote:
Pursuing that kind of a project would perhaps have been more
worthwhile if the locking issue had affected more than just this
particular case, that is, the case of running prepared statements over
partitioned tables using generic plans. Addressing this by
rearchitecting run-time pruning (and plancache to some degree) seemed
like it might lead to this getting fixed in a bounded timeframe. I
admit that the concerns that Robert has raised about the patch make me
want to reconsider that position, though maybe it's too soon to
conclude.
I wasn't trying to say that your approach was dead in the water. It
does create a situation that can't happen today, and such things are
scary and need careful thought. But redesigning the locking mechanism
would need careful thought, too ... maybe even more of it than sorting
this out.
I do also agree with Simon that this is an important problem to which
we need to find some solution.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Jan 18, 2022 at 7:28 PM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
On Tue, 18 Jan 2022 at 08:10, Amit Langote <amitlangote09@gmail.com> wrote:
On Tue, Jan 18, 2022 at 4:44 PM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
I haven't followed this in any detail, but this patch and its goal of
reducing the O(N) drag effect on partition execution time is very
important. Locking a long list of objects that then get pruned is very
wasteful, as the results show.
Ideally, we want an O(1) algorithm for single partition access and DDL
is rare. So perhaps that is the starting point for a safe design -
invent a single lock or cache that allows us to check if the partition
hierarchy has changed in any way, and if so, replan, if not, skip
locks.
Rearchitecting partition locking to be O(1) seems like a project of
non-trivial complexity, as Robert mentioned in a related email thread
a couple of years ago:
/messages/by-id/CA+TgmoYbtm1uuDne3rRp_uNA2RFiBwXX1ngj3RSLxOfc3oS7cQ@mail.gmail.com
I agree, completely redesigning locking is a major project. But that
isn't what I suggested, which was to find an O(1) algorithm to solve
the safety issue. I'm sure there is an easy way to check one lock,
maybe a new one/new kind, rather than N.
I misread your email then, sorry.
Why does the safety issue exist? Why is it important to be able to
concurrently access parts of the hierarchy with DDL? Those are not
critical points.
If we asked them, most users would trade a 10x performance gain for
some restrictions on DDL. If anyone cares, make it an option, but most
people will use it.
Maybe force all DDL, or just DDL that would cause safety issues, to
update a hierarchy version number, so queries can tell whether they
need to replan. Don't know, just looking for an O(1) solution.
Yeah, it would be great if it would suffice to take a single lock on
the partitioned table mentioned in the query, rather than on all
elements of the partition tree added to the plan. AFAICS, ways to get
that are 1) Prevent modifying non-root partition tree elements, 2)
Make it so that locking a partitioned table becomes a proxy for having
locked all of its descendants, 3) Invent a Plan representation for
scanning partitioned tables such that adding the descendent tables
that survive plan-time pruning to the plan doesn't require locking
them too. IIUC, you've mentioned 1 and 2. I think I've seen 3
mentioned in the past discussions on this topic, but I guess the
research on whether that's doable has never been done.
--
Amit Langote
EDB: http://www.enterprisedb.com
On Tue, Jan 18, 2022 at 11:53 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Jan 18, 2022 at 3:10 AM Amit Langote <amitlangote09@gmail.com> wrote:
Pursuing that kind of a project would perhaps have been more
worthwhile if the locking issue had affected more than just this
particular case, that is, the case of running prepared statements over
partitioned tables using generic plans. Addressing this by
rearchitecting run-time pruning (and plancache to some degree) seemed
like it might lead to this getting fixed in a bounded timeframe. I
admit that the concerns that Robert has raised about the patch make me
want to reconsider that position, though maybe it's too soon to
conclude.
I wasn't trying to say that your approach was dead in the water. It
does create a situation that can't happen today, and such things are
scary and need careful thought. But redesigning the locking mechanism
would need careful thought, too ... maybe even more of it than sorting
this out.
Yes, agreed.
--
Amit Langote
EDB: http://www.enterprisedb.com
On Wed, 19 Jan 2022 at 08:31, Amit Langote <amitlangote09@gmail.com> wrote:
Maybe force all DDL, or just DDL that would cause safety issues, to
update a hierarchy version number, so queries can tell whether they
need to replan. Don't know, just looking for an O(1) solution.
Yeah, it would be great if it would suffice to take a single lock on
the partitioned table mentioned in the query, rather than on all
elements of the partition tree added to the plan. AFAICS, ways to get
that are 1) Prevent modifying non-root partition tree elements,
Can we reuse the concept of Strong/Weak locking here?
When a DDL request is in progress (for that partitioned table), take
all required locks for safety. When a DDL request is not in progress,
take minimal locks knowing it is safe.
We can take a single PartitionTreeModificationLock, nowait to prove
that we do not need all locks. DDL would request the lock in exclusive
mode. (Other mechanisms possible).
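As a rough sketch (hypothetical: the lock tag below does not exist
today, though LockAcquire()'s dontWait mode does):

    LOCKTAG     tag;

    /* Hypothetical: one lock tag covering the whole partition tree. */
    SET_LOCKTAG_PARTTREE(tag, MyDatabaseId, root_relid);

    /* Readers try the lock in shared mode without waiting. */
    if (LockAcquire(&tag, AccessShareLock, false, true) !=
        LOCKACQUIRE_NOT_AVAIL)
    {
        /* No DDL in progress anywhere in the tree; skip the N locks. */
    }
    else
    {
        /* DDL holds it exclusively; fall back to locking everything. */
    }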
--
Simon Riggs http://www.EnterpriseDB.com/
On Thu, Jan 13, 2022 at 3:20 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Jan 12, 2022 at 9:32 AM Amit Langote <amitlangote09@gmail.com> wrote:
Or, maybe this won't be a concern if performing ExecutorStart() is
made a part of CheckCachedPlan() somehow, which would then take locks
on the relation as the PlanState tree is built capturing any plan
invalidations, instead of AcquireExecutorLocks(). That does sound like
an ambitious undertaking though.
On the surface that would seem to involve abstraction violations, but
maybe that could be finessed somehow. The plancache shouldn't know too
much about what the executor is going to do with the plan, but it
could ask the executor to perform a step that has been designed for
use by the plancache. I guess the core problem here is how to pass
around information that is node-specific before we've stood up the
executor state tree. Maybe the executor could have a function that
does the pruning and returns some kind of array of results that can be
used both to decide what to lock and also what to consider as pruned
at the start of execution. (I'm hand-waving about the details because
I don't know.)
The attached patch implements this idea. Sorry for the delay in
getting this out and thanks to Robert for the off-list discussions on
this.
So the new executor "step" you mention is the function ExecutorPrep in
the patch, which calls a recursive function ExecPrepNode on the plan
tree's top node, much as ExecutorStart calls (via InitPlan)
ExecInitNode to construct a PlanState tree for actual execution
paralleling the plan tree.
For now, ExecutorPrep() / ExecPrepNode() does mainly two things as
it walks the plan tree: 1) Extract the RT indexes of RTE_RELATION
entries and add them to a bitmapset in the result struct, 2) If the
node contains a PartitionPruneInfo, perform its "initial pruning
steps" and store the result of doing so in a per-plan-node node called
PlanPrepOutput. The bitmapset and the array containing per-plan-node
PlanPrepOutput nodes are returned in a node called ExecPrepOutput,
which is the result of ExecutorPrep, to its calling module (say,
plancache.c), which, after it's done using that information, must pass
it forward to subsequent execution steps. That is done by passing it,
via the module's callers, to CreateQueryDesc() which remembers the
ExecPrepOutput in QueryDesc that is eventually passed to
ExecutorStart().
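In rough pseudocode, the flow in plancache.c becomes something like
the following (a sketch based on the patch; the ExecPrepContext setup
is simplified here):

    ExecPrepContext context;
    ExecPrepOutput *execprep;

    context.stmt = plannedstmt;     /* the cached generic plan */
    /* ... bound Params are also made available for initial pruning ... */

    execprep = ExecutorPrep(&context);

    /* Lock only the relations in execprep->relationRTIs. */
    AcquireExecutorLocks(...);

    /*
     * Pass the prep results forward so that ExecutorStart() need not
     * repeat the initial pruning.
     */
    queryDesc = CreateQueryDesc(plannedstmt, execprep, query_string, ...);
    ExecutorStart(queryDesc, 0);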
A bunch of other details are mentioned in the patch's commit message,
which I'm pasting below for anyone reading to spot any obvious flaws
(no-go's) of this approach:
Invent a new executor "prep" phase
The new phase, implemented by execMain.c:ExecutorPrep() and its
recursive underling execProcnode.c:ExecPrepNode(), takes a query's
PlannedStmt and processes the plan tree contained in it to produce
an ExecPrepOutput node as its result.
As the plan tree is walked, each node must add the RT index(es) of
any relation(s) that it directly manipulates to a bitmapset member of
ExecPrepOutput (for example, an IndexScan node must add the Scan's
scanrelid). Also, each node may want to make a PlanPrepOutput node
containing additional information that may be of interest to the
calling module or to the later execution phases, if the node can
provide one (for example, an Append node may perform initial pruning
and add a set of "initially valid subplans" to the PlanPrepOutput).
The PlanPrepOutput nodes of all the plan nodes are added to an array
in the ExecPrepOutput, which is indexed using the individual nodes'
plan_node_id; a NULL is stored in the array slots of nodes that
don't have anything interesting to add to the PlanPrepOutput.
The ExecPrepOutput thus produced is passed to CreateQueryDesc()
and subsequently to ExecutorStart() via QueryDesc, which then makes
it available to the executor routines via the query's EState.
The main goal of adding this new phase is, for now, to allow cached
generic plans containing scans of partitioned tables using
Append/MergeAppend to be executed more efficiently by the prep phase
doing any initial pruning, instead of deferring that to
ExecutorStart(). That may allow AcquireExecutorLocks() on the plan
to lock only the minimal set of relations/partitions, that is
those whose subplans survive the initial pruning.
Implementation notes:
* To allow initial pruning to be done as part of the pre-execution
prep phase rather than as part of ExecutorStart(), this refactors
ExecCreatePartitionPruneState() and ExecFindInitialMatchingSubPlans()
to pass the information needed to do initial pruning directly as
parameters instead of getting that from the EState and the PlanState
of the parent Append/MergeAppend, both of which would not be
available in ExecutorPrep(). Another refactoring this does, though
not essential to this goal, is moving the partition pruning
initialization stanzas in ExecInitAppend() and ExecInitMergeAppend(),
both of which contain the same code, into a new function
ExecInitPartitionPruning().
* To pass the ExecPrepOutput(s) created by the plancache module's
invocation of ExecutorPrep() to the callers of the module, which in
turn would pass them down to ExecutorStart(), CachedPlan gets a new
List field that stores those ExecPrepOutputs, containing one element
for each PlannedStmt also contained in the CachedPlan. The new list
is stored in a child context of the context containing the
PlannedStmts, though unlike the latter, it is reset on every
invocation of CheckCachedPlan(), which in turn calls ExecutorPrep()
with a new set of bound Params.
* AcquireExecutorLocks() is now made to loop over a bitmapset of RT
indexes, those of relations returned in ExecPrepOutput, instead of
over the whole range table. With initial pruning that is also done
as part of ExecutorPrep(), only relations from non-pruned nodes of
the plan tree would get locked as a result of this new arrangement.
* PlannedStmt gets a new field usesPreExecPruning that indicates
whether any of the nodes of the plan tree contain "initial" (or
"pre-execution") pruning steps, which saves ExecutorPrep() the
trouble of walking the plan tree only to find out whether that's
the case.
* PartitionPruneInfo nodes now explicitly store whether the steps
contained in any of the individual PartitionedRelPruneInfos embedded
in it contain initial pruning steps (those that can be performed
during ExecutorPrep) and execution pruning steps (those that can only
be performed during ExecutorRun), as flags contains_init_steps and
contains_exec_steps, respectively. In fact, the aforementioned
PlannedStmt field's value is a logical OR of the values of the former
across all PartitionPruneInfo nodes embedded in the plan tree.
* PlannedStmt also gets a bitmapset field to store the RT indexes of
all relation RTEs referenced in the query that is populated when
constructing the flat range table in setrefs.c, which effectively
contains all the relations that the planner must have locked. In the
case of a cached plan, AcquireExecutorLocks() must lock all of those
relations, except those whose subnodes get pruned as a result of
ExecutorPrep().
* PlannedStmt gets yet another field numPlanNodes that records the
highest plan_node_id assigned to any of the nodes contained in the
tree, which serves as the size to use when allocating the
PlanPrepOutput array.
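To illustrate the plan_node_id indexing mentioned above: fetching a
node's prep output during the later execution phases is a simple array
lookup, which is essentially what the patch's
ExecPrepFetchPlanPrepOutput() does (sketch, not the patch text):

    /* Returns NULL if the node produced nothing during prep. */
    PlanPrepOutput *
    ExecPrepFetchPlanPrepOutput(ExecPrepOutput *execprep, Plan *plan)
    {
        Assert(plan->plan_node_id < execprep->numPlanNodes);
        return execprep->planPrepResults[plan->plan_node_id];
    }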
Maybe this should be more than one patch? Say:
0001 to add ExecutorPrep and the boilerplate,
0002 to teach plancache.c to use the new facility
Thoughts?
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v4-0001-Invent-a-new-executor-prep-phase.patch (application/octet-stream)
From 7d29fea0fcf8e6aec2877804555dd0239fdaf1be Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v4] Invent a new executor "prep" phase
The new phase, implemented by execMain.c:ExecutorPrep() and its
recursive underling execProcnode.c:ExecPrepNode(), takes a query's
PlannedStmt and processes the plan tree contained in it to produce
an ExecPrepOutput node as its result.
As the plan tree is walked, each node must add the RT index(es) of
any relation(s) that it directly manipulates to a bitmapset member of
ExecPrepOutput (for example, an IndexScan node must add the Scan's
scanrelid). Also, each node may want to make a PlanPrepOutput node
containing additional information that may be of interest to the
calling module or to the later execution phases, if the node can
provide one (for example, an Append node may perform initial pruning
and add a set of "initially valid subplans" to the PlanPrepOutput).
The PlanPrepOutput nodes of all the plan nodes are added to an array
in the ExecPrepOutput, which is indexed using the individual nodes'
plan_node_id; a NULL is stored in the array slots of nodes that
don't have anything interesting to add to the PlanPrepOutput.
The ExecPrepOutput thus produced is passed to CreateQueryDesc()
and subsequently to ExecutorStart() via QueryDesc, which then makes
it available to the executor routines via the query's EState.
The main goal of adding this new phase is, for now, to allow cached
generic plans containing scans of partitioned tables using
Append/MergeAppend to be executed more efficiently by the prep phase
doing any initial pruning, instead of deferring that to
ExecutorStart(). That may allow AcquireExecutorLocks() on the plan
to lock only the minimal set of relations/partitions, that is
those whose subplans survive the initial pruning.
Implementation notes:
* To allow initial pruning to be done as part of the pre-execution
prep phase rather than as part of ExecutorStart(), this refactors
ExecCreatePartitionPruneState() and ExecFindInitialMatchingSubPlans()
to pass the information needed to do initial pruning directly as
parameters instead of getting that from the EState and the PlanState
of the parent Append/MergeAppend, both of which would not be
available in ExecutorPrep(). Another refactoring this does, though
not essential to this goal, is moving the partition pruning
initialization stanzas in ExecInitAppend() and ExecInitMergeAppend(),
both of which contain the same code, into a new function
ExecInitPartitionPruning().
* To pass the ExecPrepOutput(s) created by the plancache module's
invocation of ExecutorPrep() to the callers of the module, which in
turn would pass them down to ExecutorStart(), CachedPlan gets a new
List field that stores those ExecPrepOutputs, containing one element
for each PlannedStmt also contained in the CachedPlan. The new list
is stored in a child context of the context containing the
PlannedStmts, though unlike the latter, it is reset on every
invocation of CheckCachedPlan(), which in turn calls ExecutorPrep()
with a new set of bound Params.
* AcquireExecutorLocks() is now made to loop over a bitmapset of RT
indexes, those of relations returned in ExecPrepOutput, instead of
over the whole range table. With initial pruning that is also done
as part of ExecutorPrep(), only relations from non-pruned nodes of
the plan tree would get locked as a result of this new arrangement.
* PlannedStmt gets a new field usesPreExecPruning that indicates
whether any of the nodes of the plan tree contain "initial" (or
"pre-execution") pruning steps, which saves ExecutorPrep() the
trouble of walking the plan tree only to find out whether that's
the case.
* PartitionPruneInfo nodes now explicitly store whether the steps
contained in any of the individual PartitionedRelPruneInfos embedded
in it contain initial pruning steps (those that can be performed
during ExecutorPrep) and execution pruning steps (those that can only
be performed during ExecutorRun), as flags contains_init_steps and
contains_exec_steps, respectively. In fact, the aforementioned
PlannedStmt field's value is a logical OR of the values of the former
across all PartitionPruneInfo nodes embedded in the plan tree.
* PlannedStmt also gets a bitmapset field to store the RT indexes of
all relation RTEs referenced in the query that is populated when
constructing the flat range table in setrefs.c, which effectively
contains all the relations that the planner must have locked. In the
case of a cached plan, AcquireExecutorLocks() must lock all of those
relations, except those whose subnodes get pruned as a result of
ExecutorPrep().
* PlannedStmt gets yet another field numPlanNodes that records the
highest plan_node_id assigned to any of the nodes contained in the
tree, which serves as the size to use when allocating the
PlanPrepOutput array.
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 13 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 17 +-
src/backend/executor/README | 18 +
src/backend/executor/execMain.c | 48 ++
src/backend/executor/execParallel.c | 4 +-
src/backend/executor/execPartition.c | 538 +++++++++++++-----
src/backend/executor/execProcnode.c | 206 +++++++
src/backend/executor/execUtils.c | 8 +
src/backend/executor/functions.c | 2 +-
src/backend/executor/nodeAgg.c | 13 +
src/backend/executor/nodeAppend.c | 91 ++-
src/backend/executor/nodeBitmapAnd.c | 18 +
src/backend/executor/nodeBitmapHeapscan.c | 14 +
src/backend/executor/nodeBitmapIndexscan.c | 14 +
src/backend/executor/nodeBitmapOr.c | 18 +
src/backend/executor/nodeCtescan.c | 12 +
src/backend/executor/nodeCustom.c | 18 +
src/backend/executor/nodeForeignscan.c | 12 +
src/backend/executor/nodeFunctionscan.c | 13 +
src/backend/executor/nodeGather.c | 13 +
src/backend/executor/nodeGatherMerge.c | 13 +
src/backend/executor/nodeGroup.c | 13 +
src/backend/executor/nodeHash.c | 13 +
src/backend/executor/nodeHashjoin.c | 14 +
src/backend/executor/nodeIncrementalSort.c | 14 +
src/backend/executor/nodeIndexonlyscan.c | 14 +
src/backend/executor/nodeIndexscan.c | 14 +
src/backend/executor/nodeLimit.c | 13 +
src/backend/executor/nodeLockRows.c | 13 +
src/backend/executor/nodeMaterial.c | 13 +
src/backend/executor/nodeMemoize.c | 13 +
src/backend/executor/nodeMergeAppend.c | 90 ++-
src/backend/executor/nodeMergejoin.c | 14 +
src/backend/executor/nodeModifyTable.c | 26 +
.../executor/nodeNamedtuplestorescan.c | 13 +
src/backend/executor/nodeNestloop.c | 14 +
src/backend/executor/nodeProjectSet.c | 13 +
src/backend/executor/nodeRecursiveunion.c | 14 +
src/backend/executor/nodeResult.c | 13 +
src/backend/executor/nodeSamplescan.c | 14 +
src/backend/executor/nodeSeqscan.c | 13 +
src/backend/executor/nodeSetOp.c | 13 +
src/backend/executor/nodeSort.c | 13 +
src/backend/executor/nodeSubplan.c | 12 +
src/backend/executor/nodeSubqueryscan.c | 14 +
src/backend/executor/nodeTableFuncscan.c | 13 +
src/backend/executor/nodeTidrangescan.c | 14 +
src/backend/executor/nodeTidscan.c | 15 +-
src/backend/executor/nodeUnique.c | 13 +
src/backend/executor/nodeValuesscan.c | 13 +
src/backend/executor/nodeWindowAgg.c | 13 +
src/backend/executor/nodeWorktablescan.c | 12 +
src/backend/executor/spi.c | 14 +-
src/backend/nodes/copyfuncs.c | 49 ++
src/backend/nodes/outfuncs.c | 6 +
src/backend/nodes/readfuncs.c | 5 +
src/backend/optimizer/plan/planner.c | 3 +
src/backend/optimizer/plan/setrefs.c | 10 +
src/backend/partitioning/partprune.c | 57 +-
src/backend/tcop/postgres.c | 15 +-
src/backend/tcop/pquery.c | 21 +-
src/backend/utils/cache/plancache.c | 155 +++--
src/backend/utils/mmgr/portalmem.c | 2 +
src/include/commands/explain.h | 3 +-
src/include/executor/execPartition.h | 19 +-
src/include/executor/execdesc.h | 2 +
src/include/executor/executor.h | 3 +
src/include/executor/nodeAgg.h | 1 +
src/include/executor/nodeAppend.h | 1 +
src/include/executor/nodeBitmapAnd.h | 1 +
src/include/executor/nodeBitmapHeapscan.h | 1 +
src/include/executor/nodeBitmapIndexscan.h | 1 +
src/include/executor/nodeBitmapOr.h | 1 +
src/include/executor/nodeCtescan.h | 1 +
src/include/executor/nodeCustom.h | 1 +
src/include/executor/nodeForeignscan.h | 1 +
src/include/executor/nodeFunctionscan.h | 1 +
src/include/executor/nodeGather.h | 1 +
src/include/executor/nodeGatherMerge.h | 1 +
src/include/executor/nodeGroup.h | 1 +
src/include/executor/nodeHash.h | 1 +
src/include/executor/nodeHashjoin.h | 1 +
src/include/executor/nodeIncrementalSort.h | 1 +
src/include/executor/nodeIndexonlyscan.h | 1 +
src/include/executor/nodeIndexscan.h | 1 +
src/include/executor/nodeLimit.h | 1 +
src/include/executor/nodeLockRows.h | 1 +
src/include/executor/nodeMaterial.h | 1 +
src/include/executor/nodeMemoize.h | 1 +
src/include/executor/nodeMergeAppend.h | 1 +
src/include/executor/nodeMergejoin.h | 1 +
src/include/executor/nodeModifyTable.h | 1 +
.../executor/nodeNamedtuplestorescan.h | 1 +
src/include/executor/nodeNestloop.h | 1 +
src/include/executor/nodeProjectSet.h | 1 +
src/include/executor/nodeRecursiveunion.h | 1 +
src/include/executor/nodeResult.h | 2 +
src/include/executor/nodeSamplescan.h | 1 +
src/include/executor/nodeSeqscan.h | 1 +
src/include/executor/nodeSetOp.h | 1 +
src/include/executor/nodeSort.h | 1 +
src/include/executor/nodeSubplan.h | 1 +
src/include/executor/nodeSubqueryscan.h | 1 +
src/include/executor/nodeTableFuncscan.h | 1 +
src/include/executor/nodeTidrangescan.h | 1 +
src/include/executor/nodeTidscan.h | 1 +
src/include/executor/nodeUnique.h | 1 +
src/include/executor/nodeValuesscan.h | 1 +
src/include/executor/nodeWindowAgg.h | 1 +
src/include/executor/nodeWorktablescan.h | 1 +
src/include/nodes/execnodes.h | 78 +++
src/include/nodes/nodeFuncs.h | 3 +
src/include/nodes/nodes.h | 5 +
src/include/nodes/pathnodes.h | 6 +
src/include/nodes/plannodes.h | 17 +
src/include/partitioning/partprune.h | 2 +
src/include/tcop/tcopprot.h | 2 +-
src/include/utils/plancache.h | 5 +
src/include/utils/portal.h | 5 +
124 files changed, 1866 insertions(+), 285 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 3283ef50d0..bb7d5e65ea 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -542,7 +542,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 9abbb6b555..f6607f2454 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index b970997c34..9ee82824a1 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -407,7 +407,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
@@ -515,7 +515,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, ExecPrepOutput *execprep,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
@@ -563,7 +564,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, execprep, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index a2e77c418a..214a345aa2 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -741,8 +741,10 @@ execute_sql_string(const char *sql)
RawStmt *parsetree = lfirst_node(RawStmt, lc1);
MemoryContext per_parsetree_context,
oldcontext;
- List *stmt_list;
- ListCell *lc2;
+ List *stmt_list,
+ *stmt_execprep_list;
+ ListCell *lc2,
+ *lc3;
/*
* We do the work for each parsetree in a short-lived context, to
@@ -762,11 +764,13 @@ execute_sql_string(const char *sql)
NULL,
0,
NULL);
- stmt_list = pg_plan_queries(stmt_list, sql, CURSOR_OPT_PARALLEL_OK, NULL);
+ stmt_list = pg_plan_queries(stmt_list, sql, CURSOR_OPT_PARALLEL_OK, NULL,
+ &stmt_execprep_list);
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, stmt_execprep_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ ExecPrepOutput *execprep = lfirst_node(ExecPrepOutput, lc3);
CommandCounterIncrement();
@@ -777,6 +781,7 @@ execute_sql_string(const char *sql)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ execprep,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 05e7b60059..4ef44aaf23 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -416,7 +416,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 9902c5c566..0bea2dd18f 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -107,6 +107,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ list_make1(NULL), /* no ExecPrepOutput to pass */
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 206d2bbbf9..ac188a7347 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -189,6 +189,7 @@ ExecuteQuery(ParseState *pstate,
PreparedStatement *entry;
CachedPlan *cplan;
List *plan_list;
+ List *plan_execprep_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
Portal portal;
@@ -229,6 +230,7 @@ ExecuteQuery(ParseState *pstate,
/* Replan if needed, and increment plan refcount for portal */
cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
plan_list = cplan->stmt_list;
+ plan_execprep_list = cplan->stmt_execprep_list;
/*
* DO NOT add any logic that could possibly throw an error between
@@ -238,7 +240,7 @@ ExecuteQuery(ParseState *pstate,
NULL,
query_string,
entry->plansource->commandTag,
- plan_list,
+ plan_list, plan_execprep_list,
cplan);
/*
@@ -610,7 +612,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
- ListCell *p;
+ List *plan_execprep_list;
+ ListCell *p,
+ *pe;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -666,15 +670,18 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
plan_list = cplan->stmt_list;
+ plan_execprep_list = cplan->stmt_execprep_list;
/* Explain each query */
- foreach(p, plan_list)
+ forboth(p, plan_list, pe, plan_execprep_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ ExecPrepOutput *execprep = lfirst_node(ExecPrepOutput, pe);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ ExplainOnePlan(pstmt, execprep, into, es, query_string, paramLI,
+ queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/README b/src/backend/executor/README
index bf5e70860d..c25db66ff0 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -65,6 +65,21 @@ found there. This currently only occurs for Append and MergeAppend nodes. In
this case the non-required subplans are ignored and the executor state's
subnode array will become out of sequence to the plan's subplan list.
+A plan tree may also be made to go through ExecutorPrep() to collect some
+information about the individual plan nodes that may help optimize the
+actual execution of the plan. Such information about each plan node is put
+into a PlanPrepOutput node if the plan node type supports producing one and
+stored in an array in ExecPrepOutput that in turn represents the output of
+an ExecutorPrep() run. The PlanPrepOutput array is indexed with plan_node_id
+of the individual plan nodes. An example of what such information may look
+like is in the "prep" routine of the Append node (ExecPrepAppend), which does
+partition pruning using "initial steps", that is, pruning with expressions
+that can be evaluated even before the actual execution has started. That produces
+a set of "initially valid subplans" that is put into the PlanPrepOutput
+belonging to Append that can be used as-is by the initializer routine of the
+Append node (nodeAppend.c: ExecInitAppend) to only initialize the plan state
+trees of those subplans.
+
Each Plan node may have expression trees associated with it, to represent
its target list, qualification conditions, etc. These trees are also
read-only to the executor, but the executor state for expression evaluation
@@ -247,6 +262,9 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ [ ExecutorPrep ] --- an optional step to walk over the plan tree to produce
+ an ExecPrepOutput to be passed to CreateQueryDesc
+
CreateQueryDesc
ExecutorStart
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 549d9eb696..e38966295e 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -103,6 +103,52 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
/* end of local decls */
+/* ----------------------------------------------------------------
+ * ExecutorPrep
+ *
+ * This optional executor routine must be called if the PlannedStmt
+ * indicates that some nodes in the planTree can perform preparatory
+ * actions, such as pre-execution/initial pruning.
+ *
+ * Returned information includes the set of RT indexes of relations referenced
+ * in the plan, and a PlanPrepOutput node for each node in the planTree if the
+ * node type supports producing one.
+ *
+ * This may lock relations whose information is needed to produce the
+ * PlanPrepOutput nodes. For example, a partitioned table must be locked
+ * before perusing the PartitionPruneInfo in an Append node to do the pruning
+ * whose result is used to populate the Append node's PlanPrepOutput.
+ */
+ExecPrepOutput *
+ExecutorPrep(ExecPrepContext *context)
+{
+ ExecPrepOutput *result = makeNode(ExecPrepOutput);
+
+ result->numPlanNodes = context->stmt->numPlanNodes;
+ result->planPrepResults = palloc0(sizeof(PlanPrepOutput *) *
+ result->numPlanNodes);
+ if (!context->stmt->usesPreExecPruning)
+ {
+ /* Shortcut */
+ result->relationRTIs = bms_copy(context->stmt->relationRTIs);
+ }
+ else
+ {
+ /* Go find the nodes that need any "prep" work done. */
+ ListCell *lc;
+
+ foreach(lc, context->stmt->subplans)
+ {
+ Plan *subplan = lfirst(lc);
+
+ ExecPrepNode(subplan, context, result);
+ }
+
+ ExecPrepNode(context->stmt->planTree, context, result);
+ }
+
+ return result;
+}
/* ----------------------------------------------------------------
* ExecutorStart
@@ -804,6 +850,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ ExecPrepOutput *execprep = queryDesc->execprep;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -823,6 +870,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable);
estate->es_plannedstmt = plannedstmt;
+ estate->es_execprep = execprep;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 5dd8ab7db2..0567534358 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -182,8 +182,10 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->usesPreExecPruning = false;
pstmt->planTree = plan;
pstmt->rtable = estate->es_range_table;
+ pstmt->relationRTIs = NULL;
pstmt->resultRelations = NIL;
pstmt->appendRelations = NIL;
@@ -1248,7 +1250,7 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramLI = RestoreParamList(&paramspace);
/* Create a QueryDesc for the query. */
- return CreateQueryDesc(pstmt,
+ return CreateQueryDesc(pstmt, NULL, /* XXX pass ExecPrepOutput too? */
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 90ed1485d1..75292fbd21 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -24,6 +24,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -186,7 +187,11 @@ static void ExecInitPruningContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
PartitionKey partkey,
- PlanState *planstate);
+ PlanState *planstate,
+ ExprContext *econtext);
+static void ExecPartitionPruneFixSubPlanIndexes(PartitionPruneState *prunestate,
+ Bitmapset *initially_valid_subplans,
+ int n_total_subplans);
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
@@ -1476,8 +1481,9 @@ adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri)
* considered to be a stable expression, it can change value from one plan
* node scan to the next during query execution. Stable comparison
* expressions that don't involve such Params allow partition pruning to be
- * done once during executor startup. Expressions that do involve such Params
- * require us to prune separately for each scan of the parent plan node.
+ * done once during executor startup, or even earlier during ExecutorPrep().
+ * Expressions that do involve such Params require us to prune separately for
+ * each scan of the parent plan node.
*
* Note that pruning away unneeded subplans during executor startup has the
* added benefit of not having to initialize the unneeded subplans at all.
@@ -1485,10 +1491,28 @@ adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri)
*
* Functions:
*
+ * ExecInitPartitionPruning:
+ * This determines the initially valid subplans by doing pruning with
+ * only pre-execution pruning expressions, that is, expressions in the
+ * query that were matched to the partition key(s), whose values are
+ * known at executor startup (excluding expressions containing
+ * PARAM_EXEC Params); see ExecFindInitialMatchingSubPlans(). The
+ * PartitionPruneState thus created, which stores the details about
+ * mapping the partition indexes returned by the partition pruning code
+ * into subplan indexes, is also returned for use during subsequent
+ * pruning. Pruned subplans must be removed from the parent plan's list
+ * of subplans to be executed, so this also remaps the partition indexes
+ * in the PartitionPruneState to the new indexes of surviving subplans.
+ *
+ * ExecPrepDoInitialPruning:
+ * Do ExecFindInitialMatchingSubPlans as part of ExecPrepNode() on the
+ * parent plan node
+ *
* ExecCreatePartitionPruneState:
* Creates the PartitionPruneState required by each of the two pruning
* functions. Details stored include how to map the partition index
* returned by the partition pruning code into subplan indexes.
+ * (Note: Use ExecInitPartitionPruning() rather than calling this directly.)
*
* ExecFindInitialMatchingSubPlans:
* Returns indexes of matching subplans. Partition pruning is attempted
@@ -1500,6 +1524,7 @@ adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri)
* remap of the partition index to subplan index map and the newly
* created map provides indexes only for subplans which remain after
* calling this function.
+ * (Note: Use ExecInitPartitionPruning() rather than calling this directly.)
*
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating all available
@@ -1514,7 +1539,9 @@ adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri)
* Build the data structure required for calling
* ExecFindInitialMatchingSubPlans and ExecFindMatchingSubPlans.
*
- * 'planstate' is the parent plan node's execution state.
+ * 'planstate', if not NULL, is the parent plan node's execution state. It
+ * can be NULL when called before ExecutorStart(), in which case
+ * 'rtable', 'econtext', and 'partdir' must be provided.
*
* 'partitionpruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
@@ -1529,18 +1556,20 @@ adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri)
*/
PartitionPruneState *
ExecCreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *partitionpruneinfo)
+ PartitionPruneInfo *partitionpruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir)
{
- EState *estate = planstate->state;
+ EState *estate = planstate ? planstate->state : NULL;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- /* For data reading, executor always omits detached partitions */
- if (estate->es_partition_directory == NULL)
- estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
+ Assert(partdir != NULL && econtext != NULL &&
+ (estate != NULL || rtable != NIL));
n_part_hierarchies = list_length(partitionpruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
@@ -1591,19 +1620,34 @@ ExecCreatePartitionPruneState(PlanState *planstate,
PartitionedRelPruneInfo *pinfo = lfirst_node(PartitionedRelPruneInfo, lc2);
PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
Relation partrel;
+ bool close_partrel = false;
PartitionDesc partdesc;
PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * We can rely on the copy of the partitioned table's partition
+ * key in its relcache entry, because it can't change (or get
+ * destroyed) as long as the relation is locked. The partition
+ * descriptor is taken from the PartitionDirectory associated with
+ * the table, which is held open long enough for the descriptor to
+ * remain valid while it's used to perform the pruning steps.
*/
- partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+ if (estate == NULL)
+ {
+ RangeTblEntry *rte = rt_fetch(pinfo->rtindex, rtable);
+
+ partrel = table_open(rte->relid, rte->rellockmode);
+ close_partrel = true;
+ }
+ else
+ partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+
partkey = RelationGetPartitionKey(partrel);
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partdesc = PartitionDirectoryLookup(partdir, partrel);
+
+			/* If we opened partrel above, close it now, but keep the lock. */
+ if (close_partrel)
+ table_close(partrel, NoLock);
/*
* Initialize the subplan_map and subpart_map.
@@ -1705,30 +1749,32 @@ ExecCreatePartitionPruneState(PlanState *planstate,
* Initialize pruning contexts as needed.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
- if (pinfo->initial_pruning_steps)
+ if (consider_initial_steps && pinfo->initial_pruning_steps)
{
ExecInitPruningContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
- partdesc, partkey, planstate);
+ partdesc, partkey, planstate,
+ econtext);
/* Record whether initial pruning is needed at any level */
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps)
+ if (consider_exec_steps && pinfo->exec_pruning_steps)
{
ExecInitPruningContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
- partdesc, partkey, planstate);
+ partdesc, partkey, planstate,
+ econtext);
/* Record whether exec pruning is needed at any level */
prunestate->do_exec_prune = true;
- }
- /*
- * Accumulate the IDs of all PARAM_EXEC Params affecting the
- * partitioning decisions at this plan node.
- */
- prunestate->execparamids = bms_add_members(prunestate->execparamids,
- pinfo->execparamids);
+ /*
+ * Accumulate the IDs of all PARAM_EXEC Params affecting the
+ * partitioning decisions at this plan node.
+ */
+ prunestate->execparamids = bms_add_members(prunestate->execparamids,
+ pinfo->execparamids);
+ }
j++;
}
@@ -1740,13 +1786,18 @@ ExecCreatePartitionPruneState(PlanState *planstate,
/*
* Initialize a PartitionPruneContext for the given list of pruning steps.
+ *
+ * At least one of 'planstate' or 'econtext' must be passed so that any
+ * non-Const expressions contained in the steps can be evaluated.
*/
static void
ExecInitPruningContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
PartitionKey partkey,
- PlanState *planstate)
+ PlanState *planstate,
+ ExprContext *econtext)
{
int n_steps;
int partnatts;
@@ -1767,6 +1818,7 @@ ExecInitPruningContext(PartitionPruneContext *context,
context->ppccontext = CurrentMemoryContext;
context->planstate = planstate;
+ context->exprcontext = econtext;
/* Initialize expression state for each expression we need */
context->exprstates = (ExprState **)
@@ -1795,14 +1847,269 @@ ExecInitPruningContext(PartitionPruneContext *context,
step->step.step_id,
keyno);
- context->exprstates[stateidx] =
- ExecInitExpr(expr, context->planstate);
+ if (planstate == NULL)
+ context->exprstates[stateidx] =
+ ExecInitExprWithParams(expr,
+ econtext->ecxt_param_list_info);
+ else
+ context->exprstates[stateidx] =
+ ExecInitExpr(expr, context->planstate);
}
keyno++;
}
}
}
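+/*
+ * ExecInitPartitionPruning
+ *		Initialize the data structures needed for run-time partition pruning
+ *		and return the set of subplans that survive initial pruning
+ *
+ * The return value is a Bitmapset of the 0-based indexes of the initially
+ * valid subplans, out of the node's n_total_subplans total.  *prunestate is
+ * set to the PartitionPruneState created for the node's pruning steps, or
+ * to NULL if no pruning steps need to be performed here.
+ *
+ * If ExecutorPrep() has already run for this plan, the initial pruning
+ * result is taken from the node's PlanPrepOutput instead of performing the
+ * initial pruning steps a second time.
+ */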
+Bitmapset *
+ExecInitPartitionPruning(PlanState *planstate, int n_total_subplans,
+ PartitionPruneInfo *pruneinfo,
+ PartitionPruneState **prunestate)
+{
+ Bitmapset *validsubplans;
+ Plan *plan = planstate->plan;
+ EState *estate = planstate->state;
+ PlanPrepOutput *planPrepResult = NULL;
+ bool do_pruning = (pruneinfo->contains_init_steps ||
+ pruneinfo->contains_exec_steps);
+
+ *prunestate = NULL;
+ if (estate->es_execprep)
+ {
+ planPrepResult = ExecPrepFetchPlanPrepOutput(estate->es_execprep,
+ plan);
+
+ Assert(planPrepResult != NULL);
+ /* No need to do initial pruning again, only exec pruning. */
+ do_pruning = pruneinfo->contains_exec_steps;
+ }
+
+ if (do_pruning)
+ {
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
+
+ /* For data reading, executor always omits detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
+ /*
+ * Create the working data structure for pruning. No need to consider
+ * initial pruning steps if we have a PlanPrepOutput.
+ */
+ *prunestate = ExecCreatePartitionPruneState(planstate, pruneinfo,
+ planPrepResult == NULL, true,
+ NIL, planstate->ps_ExprContext,
+ estate->es_partition_directory);
+ }
+
+ /*
+ * Perform an initial partition prune, if required.
+ */
+ if (planPrepResult)
+ {
+ /* ExecutorPrep() already did it for us! */
+ validsubplans = planPrepResult->initially_valid_subnodes;
+ }
+ else if (*prunestate && (*prunestate)->do_initial_prune)
+ {
+ /* Determine which subplans survive initial pruning */
+ validsubplans = ExecFindInitialMatchingSubPlans(*prunestate, pruneinfo,
+ NULL);
+ }
+ else
+ {
+ /* We'll need to initialize all subplans */
+ Assert(n_total_subplans > 0);
+ validsubplans = bms_add_range(NULL, 0, n_total_subplans - 1);
+ }
+
+ /*
+ * If exec-time pruning is required and subplans are pruned by initial
+ * pruning, then we must re-sequence the subplan indexes so that
+ * ExecFindMatchingSubPlans properly returns the indexes from the
+ * subplans which will remain after initial pruning.
+ *
+ * We can safely skip this when !do_exec_prune, even though that leaves
+ * invalid data in prunestate, because that data won't be consulted again
+ * (cf initial Assert in ExecFindMatchingSubPlans).
+ */
+ if (*prunestate && (*prunestate)->do_exec_prune &&
+ bms_num_members(validsubplans) < n_total_subplans)
+ ExecPartitionPruneFixSubPlanIndexes(*prunestate, validsubplans,
+ n_total_subplans);
+
+ return validsubplans;
+}
+
+/*
+ * ExecPrepDoInitialPruning
+ * Perform initial pruning as part of doing ExecPrepNode() on the parent
+ * plan node
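+ *
+ * This can run before ExecutorStart(), using only the plan tree's range
+ * table and the caller-supplied EXTERN parameters; exec-time pruning steps
+ * are not considered here, because PARAM_EXEC Params cannot be computed at
+ * this point.  A caller might use it as in the following sketch, where
+ * 'stmt' and 'params' stand for the PlannedStmt and ParamListInfo at hand:
+ *
+ *		Bitmapset  *parentrelids;
+ *		Bitmapset  *validsubplans;
+ *
+ *		validsubplans = ExecPrepDoInitialPruning(pruneinfo, stmt->rtable,
+ *												 params, &parentrelids);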
+ */
+Bitmapset *
+ExecPrepDoInitialPruning(PartitionPruneInfo *pruneinfo,
+ List *rtable, ParamListInfo params,
+ Bitmapset **parentrelids)
+{
+ ExprContext *econtext;
+ PartitionDirectory pdir;
+ MemoryContext oldcontext,
+ tmpcontext;
+ PartitionPruneState *prunestate;
+ Bitmapset *validsubplans;
+
+ /*
+	 * A temporary context in which to allocate the data needed to run the
+	 * pruning steps.
+ */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "initial pruning working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
+
+ /* An ExprContext to evaluate expressions. */
+ econtext = CreateStandaloneExprContext();
+ econtext->ecxt_param_list_info = params;
+
+ /*
+	 * A PartitionDirectory to look up partition descriptors with.  Omits
+	 * detached partitions, just like in the executor proper.
+ */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+ prunestate = ExecCreatePartitionPruneState(NULL, pruneinfo,
+ true, false,
+ rtable, econtext,
+ pdir);
+ MemoryContextSwitchTo(oldcontext);
+
+ /* Do the "initial" pruning. */
+ validsubplans =
+ ExecFindInitialMatchingSubPlans(prunestate,
+ pruneinfo,
+ parentrelids);
+
+ FreeExprContext(econtext, true);
+ DestroyPartitionDirectory(pdir);
+ MemoryContextDelete(tmpcontext);
+
+ return validsubplans;
+}
+
+/*
+ * ExecPartitionPruneFixSubPlanIndexes
+ * Fix mapping of partition indexes to subplan indexes contained in
+ * prunestate by considering the new list of subplans that survived
+ * initial pruning
+ *
+ * Subplans were previously indexed 0..(n_total_subplans - 1), but must now
+ * be re-indexed 0..(bms_num_members(initially_valid_subplans) - 1) to
+ * account for the subplans removed by initial pruning.
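+ *
+ * For example (purely illustrative): with n_total_subplans = 4 and
+ * initially_valid_subplans = {1, 3}, the temporary 1-based map built below
+ * is {0, 1, 0, 2} (0 meaning "pruned"), so subplan_map entries are remapped
+ * as 1 -> 0 and 3 -> 1, while entries for pruned subplans become -1.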
+ */
+static void
+ExecPartitionPruneFixSubPlanIndexes(PartitionPruneState *prunestate,
+ Bitmapset *initially_valid_subplans,
+ int n_total_subplans)
+{
+ int *new_subplan_indexes;
+ Bitmapset *new_other_subplans;
+ int i;
+ int newidx;
+
+ /*
+ * First we must build a temporary array which maps old subplan
+ * indexes to new ones. For convenience of initialization, we use
+ * 1-based indexes in this array and leave pruned items as 0.
+ */
+ new_subplan_indexes = (int *) palloc0(sizeof(int) * n_total_subplans);
+ newidx = 1;
+ i = -1;
+ while ((i = bms_next_member(initially_valid_subplans, i)) >= 0)
+ {
+ Assert(i < n_total_subplans);
+ new_subplan_indexes[i] = newidx++;
+ }
+
+ /*
+ * Now we can update each PartitionedRelPruneInfo's subplan_map with
+ * new subplan indexes. We must also recompute its present_parts
+ * bitmap.
+ */
+ for (i = 0; i < prunestate->num_partprunedata; i++)
+ {
+ PartitionPruningData *prunedata = prunestate->partprunedata[i];
+ int j;
+
+ /*
+ * Within each hierarchy, we perform this loop in back-to-front
+ * order so that we determine present_parts for the lowest-level
+ * partitioned tables first. This way we can tell whether a
+ * sub-partitioned table's partitions were entirely pruned so we
+ * can exclude it from the current level's present_parts.
+ */
+ for (j = prunedata->num_partrelprunedata - 1; j >= 0; j--)
+ {
+ PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
+ int nparts = pprune->nparts;
+ int k;
+
+ /* We just rebuild present_parts from scratch */
+ bms_free(pprune->present_parts);
+ pprune->present_parts = NULL;
+
+ for (k = 0; k < nparts; k++)
+ {
+ int oldidx = pprune->subplan_map[k];
+ int subidx;
+
+ /*
+ * If this partition existed as a subplan then change the
+ * old subplan index to the new subplan index. The new
+ * index may become -1 if the partition was pruned above,
+ * or it may just come earlier in the subplan list due to
+ * some subplans being removed earlier in the list. If
+ * it's a subpartition, add it to present_parts unless
+ * it's entirely pruned.
+ */
+ if (oldidx >= 0)
+ {
+ Assert(oldidx < n_total_subplans);
+ pprune->subplan_map[k] = new_subplan_indexes[oldidx] - 1;
+
+ if (new_subplan_indexes[oldidx] > 0)
+ pprune->present_parts =
+ bms_add_member(pprune->present_parts, k);
+ }
+ else if ((subidx = pprune->subpart_map[k]) >= 0)
+ {
+ PartitionedRelPruningData *subprune;
+
+ subprune = &prunedata->partrelprunedata[subidx];
+
+ if (!bms_is_empty(subprune->present_parts))
+ pprune->present_parts =
+ bms_add_member(pprune->present_parts, k);
+ }
+ }
+ }
+ }
+
+ /*
+ * We must also recompute the other_subplans set, since indexes in it
+ * may change.
+ */
+ new_other_subplans = NULL;
+ i = -1;
+ while ((i = bms_next_member(prunestate->other_subplans, i)) >= 0)
+ new_other_subplans = bms_add_member(new_other_subplans,
+ new_subplan_indexes[i] - 1);
+
+ bms_free(prunestate->other_subplans);
+ prunestate->other_subplans = new_other_subplans;
+
+ pfree(new_subplan_indexes);
+}
+
/*
* ExecFindInitialMatchingSubPlans
* Identify the set of subplans that cannot be eliminated by initial
@@ -1817,10 +2124,14 @@ ExecInitPruningContext(PartitionPruneContext *context,
* Must only be called once per 'prunestate', and only if initial pruning
* is required.
*
- * 'nsubplans' must be passed as the total number of unpruned subplans.
+ * The RT indexes of unpruned parents are returned in *parentrelids if asked
+ * for by the caller, in which case 'pruneinfo' must also be passed because
+ * that is where the RT indexes are to be found.
*/
Bitmapset *
-ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
+ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **parentrelids)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -1830,11 +2141,14 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
Assert(prunestate->do_initial_prune);
/*
- * Switch to a temp context to avoid leaking memory in the executor's
- * query-lifespan memory context.
+ * Switch to a temp context to avoid leaking memory in the longer-term
+ * memory context.
*/
oldcontext = MemoryContextSwitchTo(prunestate->prune_context);
+ if (parentrelids)
+ *parentrelids = NULL;
+
/*
* For each hierarchy, do the pruning tests, and add nondeletable
* subplans' indexes to "result".
@@ -1845,14 +2159,42 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
PartitionedRelPruningData *pprune;
prunedata = prunestate->partprunedata[i];
+
+ /*
+		 * We pass the first item, which belongs to the root table of the
+		 * hierarchy; find_matching_subplans_recurse() takes care of
+		 * recursing to the other (lower-level) parents as needed.
+ */
pprune = &prunedata->partrelprunedata[0];
/* Perform pruning without using PARAM_EXEC Params */
find_matching_subplans_recurse(prunedata, pprune, true, &result);
- /* Expression eval may have used space in node's ps_ExprContext too */
+ /*
+		 * Collect the RT indexes of the surviving parents if the caller
+		 * asked for them.
+ */
+ if (parentrelids)
+ {
+ int j;
+ List *partrelpruneinfos = list_nth_node(List,
+ pruneinfo->prune_infos,
+ i);
+
+ for (j = 0; j < prunedata->num_partrelprunedata; j++)
+ {
+ PartitionedRelPruneInfo *pinfo = list_nth_node(PartitionedRelPruneInfo,
+ partrelpruneinfos, j);
+
+ pprune = &prunedata->partrelprunedata[j];
+ if (!bms_is_empty(pprune->present_parts))
+ *parentrelids = bms_add_member(*parentrelids, pinfo->rtindex);
+ }
+ }
+
+ /* Expression eval may have used space in ExprContext too */
if (pprune->initial_pruning_steps)
- ResetExprContext(pprune->initial_context.planstate->ps_ExprContext);
+ ResetExprContext(pprune->initial_context.exprcontext);
}
/* Add in any subplans that partition pruning didn't account for */
@@ -1862,120 +2204,11 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (parentrelids)
+ *parentrelids = bms_copy(*parentrelids);
MemoryContextReset(prunestate->prune_context);
- /*
- * If exec-time pruning is required and we pruned subplans above, then we
- * must re-sequence the subplan indexes so that ExecFindMatchingSubPlans
- * properly returns the indexes from the subplans which will remain after
- * execution of this function.
- *
- * We can safely skip this when !do_exec_prune, even though that leaves
- * invalid data in prunestate, because that data won't be consulted again
- * (cf initial Assert in ExecFindMatchingSubPlans).
- */
- if (prunestate->do_exec_prune && bms_num_members(result) < nsubplans)
- {
- int *new_subplan_indexes;
- Bitmapset *new_other_subplans;
- int i;
- int newidx;
-
- /*
- * First we must build a temporary array which maps old subplan
- * indexes to new ones. For convenience of initialization, we use
- * 1-based indexes in this array and leave pruned items as 0.
- */
- new_subplan_indexes = (int *) palloc0(sizeof(int) * nsubplans);
- newidx = 1;
- i = -1;
- while ((i = bms_next_member(result, i)) >= 0)
- {
- Assert(i < nsubplans);
- new_subplan_indexes[i] = newidx++;
- }
-
- /*
- * Now we can update each PartitionedRelPruneInfo's subplan_map with
- * new subplan indexes. We must also recompute its present_parts
- * bitmap.
- */
- for (i = 0; i < prunestate->num_partprunedata; i++)
- {
- PartitionPruningData *prunedata = prunestate->partprunedata[i];
- int j;
-
- /*
- * Within each hierarchy, we perform this loop in back-to-front
- * order so that we determine present_parts for the lowest-level
- * partitioned tables first. This way we can tell whether a
- * sub-partitioned table's partitions were entirely pruned so we
- * can exclude it from the current level's present_parts.
- */
- for (j = prunedata->num_partrelprunedata - 1; j >= 0; j--)
- {
- PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
- int nparts = pprune->nparts;
- int k;
-
- /* We just rebuild present_parts from scratch */
- bms_free(pprune->present_parts);
- pprune->present_parts = NULL;
-
- for (k = 0; k < nparts; k++)
- {
- int oldidx = pprune->subplan_map[k];
- int subidx;
-
- /*
- * If this partition existed as a subplan then change the
- * old subplan index to the new subplan index. The new
- * index may become -1 if the partition was pruned above,
- * or it may just come earlier in the subplan list due to
- * some subplans being removed earlier in the list. If
- * it's a subpartition, add it to present_parts unless
- * it's entirely pruned.
- */
- if (oldidx >= 0)
- {
- Assert(oldidx < nsubplans);
- pprune->subplan_map[k] = new_subplan_indexes[oldidx] - 1;
-
- if (new_subplan_indexes[oldidx] > 0)
- pprune->present_parts =
- bms_add_member(pprune->present_parts, k);
- }
- else if ((subidx = pprune->subpart_map[k]) >= 0)
- {
- PartitionedRelPruningData *subprune;
-
- subprune = &prunedata->partrelprunedata[subidx];
-
- if (!bms_is_empty(subprune->present_parts))
- pprune->present_parts =
- bms_add_member(pprune->present_parts, k);
- }
- }
- }
- }
-
- /*
- * We must also recompute the other_subplans set, since indexes in it
- * may change.
- */
- new_other_subplans = NULL;
- i = -1;
- while ((i = bms_next_member(prunestate->other_subplans, i)) >= 0)
- new_other_subplans = bms_add_member(new_other_subplans,
- new_subplan_indexes[i] - 1);
-
- bms_free(prunestate->other_subplans);
- prunestate->other_subplans = new_other_subplans;
-
- pfree(new_subplan_indexes);
- }
-
return result;
}
@@ -2018,11 +2251,16 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate)
prunedata = prunestate->partprunedata[i];
pprune = &prunedata->partrelprunedata[0];
+ /*
+	 * We pass the first item, which belongs to the root table of the
+	 * hierarchy; find_matching_subplans_recurse() takes care of recursing
+	 * to the other (lower-level) parents as needed.
+ */
find_matching_subplans_recurse(prunedata, pprune, false, &result);
- /* Expression eval may have used space in node's ps_ExprContext too */
+ /* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
- ResetExprContext(pprune->exec_context.planstate->ps_ExprContext);
+ ResetExprContext(pprune->exec_context.exprcontext);
}
/* Add in any subplans that partition pruning didn't account for */
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index b5667e53e5..d5e10756ac 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -123,6 +123,209 @@ static TupleTableSlot *ExecProcNodeFirst(PlanState *node);
static TupleTableSlot *ExecProcNodeInstr(PlanState *node);
+/* ------------------------------------------------------------------------
+ * ExecPrepNode
+ * Recursively "prep" all the nodes in the plan tree rooted
+ * at 'node'.
+ *
+ * 'node' is the current node of the plan produced by the query planner
+ * 'context' is the information that may be necessary to do the prep
+ * work, (such as any EXTERN parameters in the query to do partition
+ * pruning with)
+ * 'result' is the output variable to add the result into
+ *
+ * NOTE: The ExecPrepNode subroutine for a given node must add the RT indexes
+ * of any relations that it manipulates to result->relationRTIs.  Optionally,
+ * it can produce a PlanPrepOutput node containing information that may be of
+ * interest to later execution steps or to any intervening modules that have
+ * access to the ExecPrepOutput, storing it in
+ * result->planPrepResults[plan->plan_node_id].  For example, nodes that
+ * support partition pruning can perform the "initial" pruning steps to
+ * produce the set of "initially valid" subnodes, which the node's ExecInit*
+ * routine can then use as-is to initialize only those subnodes.
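+ *
+ * For instance, a minimal subroutine for a plain scan node need only do
+ * something like:
+ *
+ *		result->relationRTIs = bms_add_member(result->relationRTIs,
+ *											  node->scan.scanrelid);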
+ * ------------------------------------------------------------------------
+ */
+void
+ExecPrepNode(Plan *node, ExecPrepContext *context, ExecPrepOutput *result)
+{
+ ListCell *l;
+
+ /* Do nothing when we get to the end of a leaf on tree. */
+ if (node == NULL)
+ return;
+
+ /* Make sure there's enough stack available. */
+ check_stack_depth();
+
+ /*
+	 * Store NULL for the node's PlanPrepOutput, which the node's Prep
+	 * routine may overwrite below.
+ */
+ ExecPrepStorePlanPrepOutput(result, NULL, node);
+
+ switch (nodeTag(node))
+ {
+ /*
+ * control nodes
+ */
+ case T_Result:
+ ExecPrepResult((Result *) node, context, result);
+ break;
+ case T_ProjectSet:
+ ExecPrepProjectSet((ProjectSet *) node, context, result);
+ break;
+ case T_RecursiveUnion:
+ ExecPrepRecursiveUnion((RecursiveUnion *) node, context, result);
+ break;
+ case T_BitmapAnd:
+ ExecPrepBitmapAnd((BitmapAnd *) node, context, result);
+ break;
+ case T_BitmapOr:
+ ExecPrepBitmapOr((BitmapOr *) node, context, result);
+ break;
+ case T_ModifyTable:
+ ExecPrepModifyTable((ModifyTable *) node, context, result);
+ break;
+ case T_Append:
+ ExecPrepAppend((Append *) node, context, result);
+ break;
+ case T_MergeAppend:
+ ExecPrepMergeAppend((MergeAppend *) node, context, result);
+ break;
+
+ /*
+ * scan nodes
+ */
+ case T_SeqScan:
+ ExecPrepSeqScan((SeqScan *) node, context, result);
+ break;
+ case T_SampleScan:
+ ExecPrepSampleScan((SampleScan *) node, context, result);
+ break;
+ case T_IndexScan:
+ ExecPrepIndexScan((IndexScan *) node, context, result);
+ break;
+ case T_IndexOnlyScan:
+ ExecPrepIndexOnlyScan((IndexOnlyScan *) node, context, result);
+ break;
+ case T_BitmapIndexScan:
+ ExecPrepBitmapIndexScan((BitmapIndexScan *) node, context, result);
+ break;
+ case T_BitmapHeapScan:
+ ExecPrepBitmapHeapScan((BitmapHeapScan *) node, context, result);
+ break;
+ case T_TidScan:
+ ExecPrepTidScan((TidScan *) node, context, result);
+ break;
+ case T_TidRangeScan:
+ ExecPrepTidRangeScan((TidRangeScan *) node, context, result);
+ break;
+ case T_SubqueryScan:
+ ExecPrepSubqueryScan((SubqueryScan *) node, context, result);
+ break;
+ case T_FunctionScan:
+ ExecPrepFunctionScan((FunctionScan *) node, context, result);
+ break;
+ case T_TableFuncScan:
+ ExecPrepTableFuncScan((TableFuncScan *) node, context, result);
+ break;
+ case T_ValuesScan:
+ ExecPrepValuesScan((ValuesScan *) node, context, result);
+ break;
+ case T_CteScan:
+ ExecPrepCteScan((CteScan *) node, context, result);
+ break;
+ case T_NamedTuplestoreScan:
+ ExecPrepNamedTuplestoreScan((NamedTuplestoreScan *) node, context, result);
+ break;
+ case T_WorkTableScan:
+ ExecPrepWorkTableScan((WorkTableScan *) node, context, result);
+ break;
+ case T_ForeignScan:
+ ExecPrepForeignScan((ForeignScan *) node, context, result);
+ break;
+ case T_CustomScan:
+ ExecPrepCustomScan((CustomScan *) node, context, result);
+ break;
+
+ /*
+ * join nodes: subnodes handled below
+ */
+ case T_NestLoop:
+ ExecPrepNestLoop((NestLoop *) node, context, result);
+ break;
+ case T_MergeJoin:
+ ExecPrepMergeJoin((MergeJoin *) node, context, result);
+ break;
+ case T_HashJoin:
+ ExecPrepHashJoin((HashJoin *) node, context, result);
+ break;
+
+ /*
+ * materialization nodes: subnodes handled below
+ */
+ case T_Material:
+ ExecPrepMaterial((Material *) node, context, result);
+ break;
+ case T_Sort:
+ ExecPrepSort((Sort *) node, context, result);
+ break;
+ case T_IncrementalSort:
+ ExecPrepIncrementalSort((IncrementalSort *) node, context, result);
+ break;
+ case T_Memoize:
+ ExecPrepMemoize((Memoize *) node, context, result);
+ break;
+ case T_Group:
+ ExecPrepGroup((Group *) node, context, result);
+ break;
+ case T_Agg:
+ ExecPrepAgg((Agg *) node, context, result);
+ break;
+ case T_WindowAgg:
+ ExecPrepWindowAgg((WindowAgg *) node, context, result);
+ break;
+ case T_Unique:
+ ExecPrepUnique((Unique *) node, context, result);
+ break;
+ case T_Gather:
+ ExecPrepGather((Gather *) node, context, result);
+ break;
+ case T_GatherMerge:
+ ExecPrepGatherMerge((GatherMerge *) node, context, result);
+ break;
+ case T_Hash:
+ ExecPrepHash((Hash *) node, context, result);
+ break;
+ case T_SetOp:
+ ExecPrepSetOp((SetOp *) node, context, result);
+ break;
+ case T_LockRows:
+ ExecPrepLockRows((LockRows *) node, context, result);
+ break;
+ case T_Limit:
+ ExecPrepLimit((Limit *) node, context, result);
+ break;
+
+ default:
+ elog(ERROR, "unrecognized node type: %d", (int) nodeTag(node));
+ result = NULL; /* keep compiler quiet */
+ break;
+ }
+
+ /*
+ * Prep any initPlans present in this node. The planner put them in
+ * a separate list for us.
+ */
+ foreach(l, node->initPlan)
+ {
+ SubPlan *subplan = (SubPlan *) lfirst(l);
+
+ Assert(IsA(subplan, SubPlan));
+ ExecPrepSubPlan(subplan, context, result);
+ }
+}
+
/* ------------------------------------------------------------------------
* ExecInitNode
*
@@ -157,6 +360,9 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
*/
check_stack_depth();
+	/* Check that the node's PlanPrepOutput, if any, looks sane. */
+ EXEC_PREP_OUTPUT_SANITY(node, estate);
+
switch (nodeTag(node))
{
/*
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9df1f81ea8..5c85148b37 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -119,6 +119,7 @@ CreateExecutorState(void)
estate->es_relations = NULL;
estate->es_rowmarks = NULL;
estate->es_plannedstmt = NULL;
+ estate->es_execprep = NULL;
estate->es_junkFilter = NULL;
@@ -785,6 +786,13 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rti > 0 && rti <= estate->es_range_table_size);
+ /*
+	 * A cross-check that AcquireExecutorLocks() hasn't missed locking any
+	 * relation that the executor actually needs.
+ */
+ Assert(estate->es_execprep == NULL ||
+ bms_is_member(rti, estate->es_execprep->relationRTIs));
+
rel = estate->es_relations[rti - 1];
if (rel == NULL)
{
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 29a68879ee..5f0ff2df2a 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -844,7 +844,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
- es->qd = CreateQueryDesc(es->stmt,
+ es->qd = CreateQueryDesc(es->stmt, NULL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 08cf569d8f..f3b0ec75d3 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -3142,6 +3142,19 @@ hashagg_reset_spill_state(AggState *aggstate)
}
}
+/* ----------------------------------------------------------------
+ * ExecPrepAgg
+ *
+ * This "preps" the Agg node and the node's subplan.
+ * ----------------------------------------------------------------
+ */
+void
+ExecPrepAgg(Agg *node, ExecPrepContext *context, ExecPrepOutput *result)
+{
+	/* Nothing to do besides recursing to the subplan. */
+ ExecPrepNode(outerPlan(node), context, result);
+}
+
/* -----------------
* ExecInitAgg
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 7937f1c88f..a44c8079bd 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -62,6 +62,7 @@
#include "executor/execPartition.h"
#include "executor/nodeAppend.h"
#include "miscadmin.h"
+#include "partitioning/partdesc.h"
#include "pgstat.h"
#include "storage/latch.h"
@@ -94,6 +95,62 @@ static bool ExecAppendAsyncRequest(AppendState *node, TupleTableSlot **result);
static void ExecAppendAsyncEventWait(AppendState *node);
static void classify_matching_subplans(AppendState *node);
+/* ----------------------------------------------------------------
+ * ExecPrepAppend
+ *
+ *		Prep an Append node
+ * ----------------------------------------------------------------
+ */
+void
+ExecPrepAppend(Append *node, ExecPrepContext *context,
+ ExecPrepOutput *result)
+{
+ PartitionPruneInfo *pruneinfo = node->part_prune_info;
+
+ if (pruneinfo && pruneinfo->contains_init_steps)
+ {
+ List *rtable = context->stmt->rtable;
+ List *subplans = node->appendplans;
+ ParamListInfo params = context->params;
+ Bitmapset *parentrelids;
+ int i;
+ PlanPrepOutput *planPrepResult = makeNode(PlanPrepOutput);
+
+ planPrepResult->plan_node_id = node->plan.plan_node_id;
+ planPrepResult->initially_valid_subnodes =
+ ExecPrepDoInitialPruning(pruneinfo, rtable, params, &parentrelids);
+		/* Replace the NULL that ExecPrepNode() stored earlier. */
+ ExecPrepStorePlanPrepOutput(result, planPrepResult, &node->plan);
+
+ /* All relevant parents must be reported too. */
+ Assert(bms_num_members(parentrelids) > 0);
+ result->relationRTIs = bms_add_members(result->relationRTIs,
+ parentrelids);
+
+ /* And all leaf partitions that will be scanned. */
+ i = -1;
+ while ((i = bms_next_member(planPrepResult->initially_valid_subnodes, i)) >= 0)
+ {
+ Plan *subplan = list_nth(subplans, i);
+
+ ExecPrepNode(subplan, context, result);
+ }
+ }
+ else
+ {
+ List *subplans = node->appendplans;
+ ListCell *lc;
+
+ /* Recurse to prep *all* of the node's child subplans. */
+ foreach(lc, subplans)
+ {
+ Plan *subplan = (Plan *) lfirst(lc);
+
+ ExecPrepNode(subplan, context, result);
+ }
+ }
+}
+
/* ----------------------------------------------------------------
* ExecInitAppend
*
@@ -136,39 +193,19 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
/* If run-time partition pruning is enabled, then set that up now */
if (node->part_prune_info != NULL)
{
- PartitionPruneState *prunestate;
-
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, &appendstate->ps);
-
- /* Create the working data structure for pruning. */
- prunestate = ExecCreatePartitionPruneState(&appendstate->ps,
- node->part_prune_info);
- appendstate->as_prune_state = prunestate;
-
- /* Perform an initial partition prune, if required. */
- if (prunestate->do_initial_prune)
- {
- /* Determine which subplans survive initial pruning */
- validsubplans = ExecFindInitialMatchingSubPlans(prunestate,
- list_length(node->appendplans));
-
- nplans = bms_num_members(validsubplans);
- }
- else
- {
- /* We'll need to initialize all subplans */
- nplans = list_length(node->appendplans);
- Assert(nplans > 0);
- validsubplans = bms_add_range(NULL, 0, nplans - 1);
- }
+ validsubplans = ExecInitPartitionPruning(&appendstate->ps,
+ list_length(node->appendplans),
+ node->part_prune_info,
+ &appendstate->as_prune_state);
+ nplans = bms_num_members(validsubplans);
/*
* When no run-time pruning is required and there's at least one
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (appendstate->as_prune_state == NULL ||
+ (!appendstate->as_prune_state->do_exec_prune && nplans > 0))
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
diff --git a/src/backend/executor/nodeBitmapAnd.c b/src/backend/executor/nodeBitmapAnd.c
index b54c79f853..4ad3e5ff81 100644
--- a/src/backend/executor/nodeBitmapAnd.c
+++ b/src/backend/executor/nodeBitmapAnd.c
@@ -45,6 +45,24 @@ ExecBitmapAnd(PlanState *pstate)
return NULL;
}
+/* ----------------------------------------------------------------
+ * ExecPrepBitmapAnd
+ *
+ * This "preps" the BitmapAnd node and the subplans.
+ * ----------------------------------------------------------------
+ */
+void
+ExecPrepBitmapAnd(BitmapAnd *node, ExecPrepContext *context,
+ ExecPrepOutput *result)
+{
+ ListCell *lc;
+
+ foreach(lc, node->bitmapplans)
+ {
+ ExecPrepNode((Plan *) lfirst(lc), context, result);
+ }
+}
+
/* ----------------------------------------------------------------
* ExecInitBitmapAnd
*
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index f6fe07ad70..aaf215a4cc 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -696,6 +696,20 @@ ExecEndBitmapHeapScan(BitmapHeapScanState *node)
table_endscan(scanDesc);
}
+/* ----------------------------------------------------------------
+ * ExecPrepBitmapHeapScan
+ *
+ * This "preps" the BitmapHeapScan node.
+ * ----------------------------------------------------------------
+ */
+void
+ExecPrepBitmapHeapScan(BitmapHeapScan *node, ExecPrepContext *context,
+ ExecPrepOutput *result)
+{
+ result->relationRTIs = bms_add_member(result->relationRTIs,
+ node->scan.scanrelid);
+}
+
/* ----------------------------------------------------------------
* ExecInitBitmapHeapScan
*
diff --git a/src/backend/executor/nodeBitmapIndexscan.c b/src/backend/executor/nodeBitmapIndexscan.c
index 551e47630d..bb766f71a2 100644
--- a/src/backend/executor/nodeBitmapIndexscan.c
+++ b/src/backend/executor/nodeBitmapIndexscan.c
@@ -201,6 +201,20 @@ ExecEndBitmapIndexScan(BitmapIndexScanState *node)
index_close(indexRelationDesc, NoLock);
}
+/* ----------------------------------------------------------------
+ * ExecPrepBitmapIndexScan
+ *
+ * This "preps" the BitmapIndexScan node.
+ * ----------------------------------------------------------------
+ */
+void
+ExecPrepBitmapIndexScan(BitmapIndexScan *node, ExecPrepContext *context,
+ ExecPrepOutput *result)
+{
+ result->relationRTIs = bms_add_member(result->relationRTIs,
+ node->scan.scanrelid);
+}
+
/* ----------------------------------------------------------------
* ExecInitBitmapIndexScan
*
diff --git a/src/backend/executor/nodeBitmapOr.c b/src/backend/executor/nodeBitmapOr.c
index 2d57f11fe7..feb3e4a8d6 100644
--- a/src/backend/executor/nodeBitmapOr.c
+++ b/src/backend/executor/nodeBitmapOr.c
@@ -46,6 +46,24 @@ ExecBitmapOr(PlanState *pstate)
return NULL;
}
+/* ----------------------------------------------------------------
+ * ExecPrepBitmapOr
+ *
+ * This "preps" the BitmapOr node and the subplans.
+ * ----------------------------------------------------------------
+ */
+void
+ExecPrepBitmapOr(BitmapOr *node, ExecPrepContext *context,
+ ExecPrepOutput *result)
+{
+ ListCell *lc;
+
+ foreach(lc, node->bitmapplans)
+ {
+ ExecPrepNode((Plan *) lfirst(lc), context, result);
+ }
+}
+
/* ----------------------------------------------------------------
* ExecInitBitmapOr
*
diff --git a/src/backend/executor/nodeCtescan.c b/src/backend/executor/nodeCtescan.c
index b9d7dec8a2..533cfb7874 100644
--- a/src/backend/executor/nodeCtescan.c
+++ b/src/backend/executor/nodeCtescan.c
@@ -166,6 +166,18 @@ ExecCteScan(PlanState *pstate)
(ExecScanRecheckMtd) CteScanRecheck);
}
+/* ----------------------------------------------------------------
+ * ExecPrepCteScan
+ *
+ * This "preps" the CteScan node.
+ * ----------------------------------------------------------------
+ */
+void
+ExecPrepCteScan(CteScan *node, ExecPrepContext *context,
+ ExecPrepOutput *result)
+{
+ /* nothing to do */
+}
/* ----------------------------------------------------------------
* ExecInitCteScan
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
index 8f56bd8a23..0bf1636326 100644
--- a/src/backend/executor/nodeCustom.c
+++ b/src/backend/executor/nodeCustom.c
@@ -24,6 +24,24 @@
static TupleTableSlot *ExecCustomScan(PlanState *pstate);
+/* ----------------------------------------------------------------
+ * ExecPrepCustomScan
+ *
+ * This "preps" the CustomScan node.
+ * ----------------------------------------------------------------
+ */
+void
+ExecPrepCustomScan(CustomScan *node, ExecPrepContext *context, ExecPrepOutput *result)
+{
+ ListCell *lc;
+
+ result->relationRTIs = bms_add_members(result->relationRTIs,
+ node->custom_relids);
+ foreach(lc, node->custom_plans)
+ {
+ ExecPrepNode((Plan *) lfirst(lc), context, result);
+ }
+}
CustomScanState *
ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 5b9737c2ab..ffe17ec6d5 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -134,6 +134,18 @@ ExecForeignScan(PlanState *pstate)
(ExecScanRecheckMtd) ForeignRecheck);
}
+/* ----------------------------------------------------------------
+ * ExecPrepForeignScan
+ *
+ * This "preps" the ForeignScan node.
+ * ----------------------------------------------------------------
+ */
+void
+ExecPrepForeignScan(ForeignScan *node, ExecPrepContext *context, ExecPrepOutput *result)
+{
+ result->relationRTIs = bms_add_members(result->relationRTIs,
+ node->fs_relids);
+}
/* ----------------------------------------------------------------
* ExecInitForeignScan
diff --git a/src/backend/executor/nodeFunctionscan.c b/src/backend/executor/nodeFunctionscan.c
index 434379a5aa..df055ce01f 100644
--- a/src/backend/executor/nodeFunctionscan.c
+++ b/src/backend/executor/nodeFunctionscan.c
@@ -272,6 +272,19 @@ ExecFunctionScan(PlanState *pstate)
(ExecScanRecheckMtd) FunctionRecheck);
}
+/* ----------------------------------------------------------------
+ * ExecPrepFunctionScan
+ *
+ * This "preps" the FunctionScan node.
+ * ----------------------------------------------------------------
+ */
+void
+ExecPrepFunctionScan(FunctionScan *node, ExecPrepContext *context,
+ ExecPrepOutput *result)
+{
+	/* nothing to do */
+}
+
/* ----------------------------------------------------------------
* ExecInitFunctionScan
* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 4f8a17df7d..0edb0ae13a 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -49,6 +49,19 @@ static TupleTableSlot *gather_getnext(GatherState *gatherstate);
static MinimalTuple gather_readnext(GatherState *gatherstate);
static void ExecShutdownGatherWorkers(GatherState *node);
+/* ----------------------------------------------------------------
+ * ExecPrepGather
+ *
+ * This "preps" the Gather node and the node's subplan.
+ * ----------------------------------------------------------------
+ */
+void
+ExecPrepGather(Gather *node, ExecPrepContext *context, ExecPrepOutput *result)
+{
+	/* Nothing to do besides recursing to the subplan. */
+ ExecPrepNode(outerPlan(node), context, result);
+}
+
/* ----------------------------------------------------------------
* ExecInitGather
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index a488cc6d8b..c564d4ac25 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -64,6 +64,19 @@ static bool gather_merge_readnext(GatherMergeState *gm_state, int reader,
bool nowait);
static void load_tuple_array(GatherMergeState *gm_state, int reader);
+/* ----------------------------------------------------------------
+ * ExecPrepGatherMerge
+ *
+ * This "preps" the GatherMerge node and the node's subplan.
+ * ----------------------------------------------------------------
+ */
+void
+ExecPrepGatherMerge(GatherMerge *node, ExecPrepContext *context, ExecPrepOutput *result)
+{
+	/* Nothing to do besides recursing to the subplan. */
+ ExecPrepNode(outerPlan(node), context, result);
+}
+
/* ----------------------------------------------------------------
* ExecInitGather
* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeGroup.c b/src/backend/executor/nodeGroup.c
index 666d02b58f..0e5bcf89bf 100644
--- a/src/backend/executor/nodeGroup.c
+++ b/src/backend/executor/nodeGroup.c
@@ -151,6 +151,19 @@ ExecGroup(PlanState *pstate)
}
}
+/* ----------------------------------------------------------------
+ * ExecPrepGroup
+ *
+ * This "preps" the Group node and the node's subplan.
+ * ----------------------------------------------------------------
+ */
+void
+ExecPrepGroup(Group *node, ExecPrepContext *context, ExecPrepOutput *result)
+{
+	/* Nothing to do besides recursing to the subplan. */
+ ExecPrepNode(outerPlan(node), context, result);
+}
+
/* -----------------
* ExecInitGroup
*
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 4d68a8b97b..d20e14c7fc 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -344,6 +344,19 @@ MultiExecParallelHash(HashState *node)
BarrierPhase(build_barrier) == PHJ_BUILD_DONE);
}
+/* ----------------------------------------------------------------
+ * ExecPrepHash
+ *
+ * This "preps" the hash node and the node's subplan.
+ * ----------------------------------------------------------------
+ */
+void
+ExecPrepHash(Hash *node, ExecPrepContext *context, ExecPrepOutput *result)
+{
+	/* Nothing to do besides recursing to the subplan. */
+ ExecPrepNode(outerPlan(node), context, result);
+}
+
/* ----------------------------------------------------------------
* ExecInitHash
*
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 88b870655e..5665c31873 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -607,6 +607,20 @@ ExecParallelHashJoin(PlanState *pstate)
return ExecHashJoinImpl(pstate, true);
}
+/* ----------------------------------------------------------------
+ * ExecPrepHashJoin
+ *
+ * This "preps" the HashJoin node and the node's children.
+ * ----------------------------------------------------------------
+ */
+void
+ExecPrepHashJoin(HashJoin *node, ExecPrepContext *context, ExecPrepOutput *result)
+{
+	/* Nothing to do besides recursing to the children. */
+ ExecPrepNode(outerPlan(node), context, result);
+ ExecPrepNode(innerPlan(node), context, result);
+}
+
/* ----------------------------------------------------------------
* ExecInitHashJoin
*
diff --git a/src/backend/executor/nodeIncrementalSort.c b/src/backend/executor/nodeIncrementalSort.c
index d6fb56dec7..c1c8fe2af6 100644
--- a/src/backend/executor/nodeIncrementalSort.c
+++ b/src/backend/executor/nodeIncrementalSort.c
@@ -964,6 +964,20 @@ ExecIncrementalSort(PlanState *pstate)
return slot;
}
+/* ----------------------------------------------------------------
+ * ExecPrepIncrementalSort
+ *
+ * This "preps" the IncrementalSort node and the node's subplan.
+ * ----------------------------------------------------------------
+ */
+void
+ExecPrepIncrementalSort(IncrementalSort *node, ExecPrepContext *context,
+ ExecPrepOutput *result)
+{
+	/* Nothing to do besides recursing to the subplan. */
+ ExecPrepNode(outerPlan(node), context, result);
+}
+
/* ----------------------------------------------------------------
* ExecInitIncrementalSort
*
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index eb3ddd2943..ccc60c38f5 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -476,6 +476,20 @@ ExecIndexOnlyRestrPos(IndexOnlyScanState *node)
index_restrpos(node->ioss_ScanDesc);
}
+/* ----------------------------------------------------------------
+ * ExecPrepIndexOnlyScan
+ *
+ * This "preps" the IndexOnlyScan node.
+ * ----------------------------------------------------------------
+ */
+void
+ExecPrepIndexOnlyScan(IndexOnlyScan *node, ExecPrepContext *context,
+ ExecPrepOutput *result)
+{
+ result->relationRTIs = bms_add_member(result->relationRTIs,
+ node->scan.scanrelid);
+}
+
/* ----------------------------------------------------------------
* ExecInitIndexOnlyScan
*
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index a91f135be7..5080abdd9d 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -885,6 +885,20 @@ ExecIndexRestrPos(IndexScanState *node)
index_restrpos(node->iss_ScanDesc);
}
+/* ----------------------------------------------------------------
+ * ExecPrepIndexScan
+ *
+ * This "preps" the IndexScan node.
+ * ----------------------------------------------------------------
+ */
+void
+ExecPrepIndexScan(IndexScan *node, ExecPrepContext *context,
+ ExecPrepOutput *result)
+{
+ result->relationRTIs = bms_add_member(result->relationRTIs,
+ node->scan.scanrelid);
+}
+
/* ----------------------------------------------------------------
* ExecInitIndexScan
*
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index 1b91b123fa..00aa5dd577 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -437,6 +437,19 @@ compute_tuples_needed(LimitState *node)
return node->count + node->offset;
}
+/* ----------------------------------------------------------------
+ * ExecPrepLimit
+ *
+ * This "preps" the limit node and the node's subplan.
+ * ----------------------------------------------------------------
+ */
+void
+ExecPrepLimit(Limit *node, ExecPrepContext *context, ExecPrepOutput *result)
+{
+	/* Nothing to do besides recursing to the subplan. */
+ ExecPrepNode(outerPlan(node), context, result);
+}
+
/* ----------------------------------------------------------------
* ExecInitLimit
*
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 1a9dab25dd..9a3d2c5583 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -281,6 +281,19 @@ lnext:
return slot;
}
+/* ----------------------------------------------------------------
+ * ExecPrepLockRows
+ *
+ * This "preps" the LockRows node and the node's subplan.
+ * ----------------------------------------------------------------
+ */
+void
+ExecPrepLockRows(LockRows *node, ExecPrepContext *context, ExecPrepOutput *result)
+{
+	/* Nothing to do besides recursing to the subplan. */
+ ExecPrepNode(outerPlan(node), context, result);
+}
+
/* ----------------------------------------------------------------
* ExecInitLockRows
*
diff --git a/src/backend/executor/nodeMaterial.c b/src/backend/executor/nodeMaterial.c
index 2cb27e0e9a..802bf37ff1 100644
--- a/src/backend/executor/nodeMaterial.c
+++ b/src/backend/executor/nodeMaterial.c
@@ -156,6 +156,19 @@ ExecMaterial(PlanState *pstate)
return ExecClearTuple(slot);
}
+/* ----------------------------------------------------------------
+ * ExecPrepMaterial
+ *
+ * This "preps" the Material node and the node's subplan.
+ * ----------------------------------------------------------------
+ */
+void
+ExecPrepMaterial(Material *node, ExecPrepContext *context, ExecPrepOutput *result)
+{
+	/* Nothing to do besides recursing to the subplan. */
+ ExecPrepNode(outerPlan(node), context, result);
+}
+
/* ----------------------------------------------------------------
* ExecInitMaterial
* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeMemoize.c b/src/backend/executor/nodeMemoize.c
index 55cdd5c4d9..eacfd5f3cb 100644
--- a/src/backend/executor/nodeMemoize.c
+++ b/src/backend/executor/nodeMemoize.c
@@ -902,6 +902,19 @@ ExecMemoize(PlanState *pstate)
} /* switch */
}
+/* ----------------------------------------------------------------
+ * ExecPrepMemoize
+ *
+ * This "preps" the Memoize node and the node's subplan.
+ * ----------------------------------------------------------------
+ */
+void
+ExecPrepMemoize(Memoize *node, ExecPrepContext *context, ExecPrepOutput *result)
+{
+	/* Nothing to do besides recursing to the subplan. */
+ ExecPrepNode(outerPlan(node), context, result);
+}
+
MemoizeState *
ExecInitMemoize(Memoize *node, EState *estate, int eflags)
{
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 418f89dea8..50f6429533 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -43,6 +43,7 @@
#include "executor/nodeMergeAppend.h"
#include "lib/binaryheap.h"
#include "miscadmin.h"
+#include "partitioning/partdesc.h"
/*
* We have one slot for each item in the heap array. We use SlotNumber
@@ -54,6 +55,62 @@ typedef int32 SlotNumber;
static TupleTableSlot *ExecMergeAppend(PlanState *pstate);
static int heap_compare_slots(Datum a, Datum b, void *arg);
+/* ----------------------------------------------------------------
+ * ExecPrepMergeAppend
+ *
+ *		Prep a MergeAppend node
+ * ----------------------------------------------------------------
+ */
+void
+ExecPrepMergeAppend(MergeAppend *node, ExecPrepContext *context,
+ ExecPrepOutput *result)
+{
+ PartitionPruneInfo *pruneinfo = node->part_prune_info;
+
+ if (pruneinfo && pruneinfo->contains_init_steps)
+ {
+ List *rtable = context->stmt->rtable;
+ List *subplans = node->mergeplans;
+ ParamListInfo params = context->params;
+ Bitmapset *parentrelids;
+ int i;
+ PlanPrepOutput *planPrepResult = makeNode(PlanPrepOutput);
+
+ planPrepResult->plan_node_id = node->plan.plan_node_id;
+ planPrepResult->initially_valid_subnodes =
+ ExecPrepDoInitialPruning(pruneinfo, rtable, params, &parentrelids);
+		/* Replace the NULL that ExecPrepNode() stored earlier. */
+ ExecPrepStorePlanPrepOutput(result, planPrepResult, &node->plan);
+
+ /* All relevant parents must be reported too. */
+ Assert(bms_num_members(parentrelids) > 0);
+ result->relationRTIs = bms_add_members(result->relationRTIs,
+ parentrelids);
+
+ /* And all leaf partitions that will be scanned. */
+ i = -1;
+ while ((i = bms_next_member(planPrepResult->initially_valid_subnodes, i)) >= 0)
+ {
+ Plan *subplan = list_nth(subplans, i);
+
+ ExecPrepNode(subplan, context, result);
+ }
+ }
+ else
+ {
+ List *subplans = node->mergeplans;
+ ListCell *lc;
+
+ /* Recurse to prep *all* of the node's child subplans. */
+ foreach(lc, subplans)
+ {
+ Plan *subplan = (Plan *) lfirst(lc);
+
+ ExecPrepNode(subplan, context, result);
+ }
+ }
+}
+
/* ----------------------------------------------------------------
* ExecInitMergeAppend
@@ -84,38 +141,19 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
/* If run-time partition pruning is enabled, then set that up now */
if (node->part_prune_info != NULL)
{
- PartitionPruneState *prunestate;
-
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, &mergestate->ps);
-
- prunestate = ExecCreatePartitionPruneState(&mergestate->ps,
- node->part_prune_info);
- mergestate->ms_prune_state = prunestate;
-
- /* Perform an initial partition prune, if required. */
- if (prunestate->do_initial_prune)
- {
- /* Determine which subplans survive initial pruning */
- validsubplans = ExecFindInitialMatchingSubPlans(prunestate,
- list_length(node->mergeplans));
-
- nplans = bms_num_members(validsubplans);
- }
- else
- {
- /* We'll need to initialize all subplans */
- nplans = list_length(node->mergeplans);
- Assert(nplans > 0);
- validsubplans = bms_add_range(NULL, 0, nplans - 1);
- }
+ validsubplans = ExecInitPartitionPruning(&mergestate->ps,
+ list_length(node->mergeplans),
+ node->part_prune_info,
+ &mergestate->ms_prune_state);
+ nplans = bms_num_members(validsubplans);
/*
* When no run-time pruning is required and there's at least one
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (mergestate->ms_prune_state == NULL ||
+ (!mergestate->ms_prune_state->do_exec_prune && nplans > 0))
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index a049bc4ae0..12b1790c8a 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -1428,6 +1428,20 @@ ExecMergeJoin(PlanState *pstate)
}
}
+/* ----------------------------------------------------------------
+ * ExecPrepMergeJoin
+ *
+ * This "preps" the MergeJoin node and the node's children.
+ * ----------------------------------------------------------------
+ */
+void
+ExecPrepMergeJoin(MergeJoin *node, ExecPrepContext *context, ExecPrepOutput *result)
+{
+	/* Nothing to do besides recursing to the children. */
+ ExecPrepNode(outerPlan(node), context, result);
+ ExecPrepNode(innerPlan(node), context, result);
+}
+
/* ----------------------------------------------------------------
* ExecInitMergeJoin
* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 5ec699a9bd..93a6ac062f 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2700,6 +2700,32 @@ ExecLookupResultRelByOid(ModifyTableState *node, Oid resultoid,
return NULL;
}
+/* ----------------------------------------------------------------
+ * ExecPrepModifyTable
+ *
+ * This "preps" the ModifyTable node and the subplan.
+ * ----------------------------------------------------------------
+ */
+void
+ExecPrepModifyTable(ModifyTable *node, ExecPrepContext *context,
+ ExecPrepOutput *result)
+{
+ ListCell *lc;
+
+ if (node->rootRelation > 0)
+ result->relationRTIs = bms_add_member(result->relationRTIs,
+ node->rootRelation);
+ result->relationRTIs = bms_add_member(result->relationRTIs,
+ node->nominalRelation);
+ foreach(lc, node->resultRelations)
+ {
+ result->relationRTIs = bms_add_member(result->relationRTIs,
+ lfirst_int(lc));
+ }
+
+ ExecPrepNode(outerPlan(node), context, result);
+}
+
/* ----------------------------------------------------------------
* ExecInitModifyTable
* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeNamedtuplestorescan.c b/src/backend/executor/nodeNamedtuplestorescan.c
index ca637b1b0e..5db23af93c 100644
--- a/src/backend/executor/nodeNamedtuplestorescan.c
+++ b/src/backend/executor/nodeNamedtuplestorescan.c
@@ -74,6 +74,19 @@ ExecNamedTuplestoreScan(PlanState *pstate)
(ExecScanRecheckMtd) NamedTuplestoreScanRecheck);
}
+/* ----------------------------------------------------------------
+ * ExecPrepNamedTuplestoreScan
+ *
+ * This "preps" the NamedTuplestoreScan node.
+ * ----------------------------------------------------------------
+ */
+void
+ExecPrepNamedTuplestoreScan(NamedTuplestoreScan *node,
+ ExecPrepContext *context,
+ ExecPrepOutput *result)
+{
+ /* nothing to do */
+}
/* ----------------------------------------------------------------
* ExecInitNamedTuplestoreScan
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index 06767c3133..ffb3a94f07 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -255,6 +255,20 @@ ExecNestLoop(PlanState *pstate)
}
}
+/* ----------------------------------------------------------------
+ * ExecPrepNestLoop
+ *
+ * This "preps" the NestLoop node and the node's children.
+ * ----------------------------------------------------------------
+ */
+void
+ExecPrepNestLoop(NestLoop *node, ExecPrepContext *context, ExecPrepOutput *result)
+{
+	/* Nothing to do besides recursing to the children. */
+ ExecPrepNode(outerPlan(node), context, result);
+ ExecPrepNode(innerPlan(node), context, result);
+}
+
/* ----------------------------------------------------------------
* ExecInitNestLoop
* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeProjectSet.c b/src/backend/executor/nodeProjectSet.c
index ea40d61b0b..1d6085a3b4 100644
--- a/src/backend/executor/nodeProjectSet.c
+++ b/src/backend/executor/nodeProjectSet.c
@@ -208,6 +208,19 @@ ExecProjectSRF(ProjectSetState *node, bool continuing)
return NULL;
}
+/* ----------------------------------------------------------------
+ * ExecPrepProjectSet
+ *
+ * This "preps" the ProjectSet node and the subplan.
+ * ----------------------------------------------------------------
+ */
+void
+ExecPrepProjectSet(ProjectSet *node, ExecPrepContext *context,
+ ExecPrepOutput *result)
+{
+ ExecPrepNode(outerPlan(node), context, result);
+}
+
/* ----------------------------------------------------------------
* ExecInitProjectSet
*
diff --git a/src/backend/executor/nodeRecursiveunion.c b/src/backend/executor/nodeRecursiveunion.c
index 2d01ed7711..806c653c56 100644
--- a/src/backend/executor/nodeRecursiveunion.c
+++ b/src/backend/executor/nodeRecursiveunion.c
@@ -159,6 +159,20 @@ ExecRecursiveUnion(PlanState *pstate)
return NULL;
}
+/* ----------------------------------------------------------------
+ * ExecPrepRecursiveUnion
+ *
+ * This "preps" the RecursiveUnion node and the children.
+ * ----------------------------------------------------------------
+ */
+void
+ExecPrepRecursiveUnion(RecursiveUnion *node, ExecPrepContext *context,
+ ExecPrepOutput *result)
+{
+ ExecPrepNode(outerPlan(node), context, result);
+ ExecPrepNode(innerPlan(node), context, result);
+}
+
/* ----------------------------------------------------------------
* ExecInitRecursiveUnion
* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeResult.c b/src/backend/executor/nodeResult.c
index d0413e05de..14883b6764 100644
--- a/src/backend/executor/nodeResult.c
+++ b/src/backend/executor/nodeResult.c
@@ -169,6 +169,19 @@ ExecResultRestrPos(ResultState *node)
elog(ERROR, "Result nodes do not support mark/restore");
}
+/* ----------------------------------------------------------------
+ * ExecPrepResult
+ *
+ * This "preps" the Result node and the subplan.
+ * ----------------------------------------------------------------
+ */
+void
+ExecPrepResult(Result *node, ExecPrepContext *context,
+ ExecPrepOutput *result)
+{
+ ExecPrepNode(outerPlan(node), context, result);
+}
+
/* ----------------------------------------------------------------
* ExecInitResult
*
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index a03ae120f8..ef4c0775f7 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -89,6 +89,20 @@ ExecSampleScan(PlanState *pstate)
(ExecScanRecheckMtd) SampleRecheck);
}
+/* ----------------------------------------------------------------
+ * ExecPrepSampleScan
+ *
+ * This "preps" the SampleScan node.
+ * ----------------------------------------------------------------
+ */
+void
+ExecPrepSampleScan(SampleScan *node, ExecPrepContext *context,
+ ExecPrepOutput *result)
+{
+ result->relationRTIs = bms_add_member(result->relationRTIs,
+ node->scan.scanrelid);
+}
+
/* ----------------------------------------------------------------
* ExecInitSampleScan
* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 7b58cd9162..8964c1e9b2 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -114,6 +114,19 @@ ExecSeqScan(PlanState *pstate)
(ExecScanRecheckMtd) SeqRecheck);
}
+/* ----------------------------------------------------------------
+ *		ExecPrepSeqScan
+ *
+ * This "preps" the SeqScan node.
+ * ----------------------------------------------------------------
+ */
+void
+ExecPrepSeqScan(SeqScan *node, ExecPrepContext *context,
+ ExecPrepOutput *result)
+{
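+	/* Record the scanned relation's RT index so that it gets locked. */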
+ result->relationRTIs = bms_add_member(result->relationRTIs,
+ node->scan.scanrelid);
+}
+
/* ----------------------------------------------------------------
* ExecInitSeqScan
diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index 4b428cfa39..312aa8511f 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -470,6 +470,19 @@ setop_retrieve_hash_table(SetOpState *setopstate)
return NULL;
}
+/* ----------------------------------------------------------------
+ * ExecPrepSetOp
+ *
+ * This "preps" the setop node and the node's subplan.
+ * ----------------------------------------------------------------
+ */
+void
+ExecPrepSetOp(SetOp *node, ExecPrepContext *context, ExecPrepOutput *result)
+{
+	/* Nothing to do besides recursing to the subplan. */
+ ExecPrepNode(outerPlan(node), context, result);
+}
+
/* ----------------------------------------------------------------
* ExecInitSetOp
*
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index 9481a622bf..c31f2634e8 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -203,6 +203,19 @@ ExecSort(PlanState *pstate)
return slot;
}
+/* ----------------------------------------------------------------
+ * ExecPrepSort
+ *
+ * This "preps" the Sort node and the node's subplan.
+ * ----------------------------------------------------------------
+ */
+void
+ExecPrepSort(Sort *node, ExecPrepContext *context, ExecPrepOutput *result)
+{
+	/* Nothing to do besides recursing to the subplan. */
+ ExecPrepNode(outerPlan(node), context, result);
+}
+
/* ----------------------------------------------------------------
* ExecInitSort
*
diff --git a/src/backend/executor/nodeSubplan.c b/src/backend/executor/nodeSubplan.c
index 60d2290030..b95084ddb2 100644
--- a/src/backend/executor/nodeSubplan.c
+++ b/src/backend/executor/nodeSubplan.c
@@ -775,6 +775,18 @@ slotNoNulls(TupleTableSlot *slot)
return true;
}
+/* ----------------------------------------------------------------
+ * ExecPrepSubPlan
+ *
+ *		This "preps" the SubPlan node.
+ * ----------------------------------------------------------------
+ */
+void
+ExecPrepSubPlan(SubPlan *node, ExecPrepContext *context, ExecPrepOutput *result)
+{
+ /* nothing to do */
+}
+
/* ----------------------------------------------------------------
* ExecInitSubPlan
*
diff --git a/src/backend/executor/nodeSubqueryscan.c b/src/backend/executor/nodeSubqueryscan.c
index 242c9cd4b9..cc0d62ca85 100644
--- a/src/backend/executor/nodeSubqueryscan.c
+++ b/src/backend/executor/nodeSubqueryscan.c
@@ -89,6 +89,20 @@ ExecSubqueryScan(PlanState *pstate)
(ExecScanRecheckMtd) SubqueryRecheck);
}
+/* ----------------------------------------------------------------
+ * ExecPrepSubqueryScan
+ *
+ * This "preps" the SubqueryScan node and the subplan.
+ * ----------------------------------------------------------------
+ */
+void
+ExecPrepSubqueryScan(SubqueryScan *node, ExecPrepContext *context,
+ ExecPrepOutput *result)
+{
+	/* Nothing to do besides recursing to the subplan. */
+ ExecPrepNode((Plan *) node->subplan, context, result);
+}
+
/* ----------------------------------------------------------------
* ExecInitSubqueryScan
* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeTableFuncscan.c b/src/backend/executor/nodeTableFuncscan.c
index 0db4ed0c2f..dccecb3916 100644
--- a/src/backend/executor/nodeTableFuncscan.c
+++ b/src/backend/executor/nodeTableFuncscan.c
@@ -83,6 +83,19 @@ TableFuncRecheck(TableFuncScanState *node, TupleTableSlot *slot)
return true;
}
+/* ----------------------------------------------------------------
+ * ExecPrepTableFuncScan
+ *
+ * This "preps" the TableFuncScan node.
+ * ----------------------------------------------------------------
+ */
+void
+ExecPrepTableFuncScan(TableFuncScan *node, ExecPrepContext *context,
+ ExecPrepOutput *result)
+{
+	/* nothing to do */
+}
+
/* ----------------------------------------------------------------
* ExecTableFuncScan(node)
*
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index d5bf1be787..1c05ce8035 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -340,6 +340,20 @@ ExecEndTidRangeScan(TidRangeScanState *node)
ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
+/* ----------------------------------------------------------------
+ * ExecPrepTidRangeScan
+ *
+ * This "preps" the TidRangeScan node.
+ * ----------------------------------------------------------------
+ */
+void
+ExecPrepTidRangeScan(TidRangeScan *node, ExecPrepContext *context,
+ ExecPrepOutput *result)
+{
+ result->relationRTIs = bms_add_member(result->relationRTIs,
+ node->scan.scanrelid);
+}
+
/* ----------------------------------------------------------------
* ExecInitTidRangeScan
*
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index 4116d1f3b5..6031ab52b6 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -408,7 +408,6 @@ TidRecheck(TidScanState *node, TupleTableSlot *slot)
return true;
}
-
/* ----------------------------------------------------------------
* ExecTidScan(node)
*
@@ -483,6 +482,20 @@ ExecEndTidScan(TidScanState *node)
ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
+/* ----------------------------------------------------------------
+ * ExecPrepTidScan
+ *
+ * This "preps" the TidScan node.
+ * ----------------------------------------------------------------
+ */
+void
+ExecPrepTidScan(TidScan *node, ExecPrepContext *context,
+ ExecPrepOutput *result)
+{
+ result->relationRTIs = bms_add_member(result->relationRTIs,
+ node->scan.scanrelid);
+}
+
/* ----------------------------------------------------------------
* ExecInitTidScan
*
diff --git a/src/backend/executor/nodeUnique.c b/src/backend/executor/nodeUnique.c
index 6c99d13a39..87c1b53515 100644
--- a/src/backend/executor/nodeUnique.c
+++ b/src/backend/executor/nodeUnique.c
@@ -104,6 +104,19 @@ ExecUnique(PlanState *pstate)
return ExecCopySlot(resultTupleSlot, slot);
}
+/* ----------------------------------------------------------------
+ * ExecPrepUnique
+ *
+ * This "preps" the unique node and the node's subplan.
+ * ----------------------------------------------------------------
+ */
+void
+ExecPrepUnique(Unique *node, ExecPrepContext *context, ExecPrepOutput *result)
+{
+	/* Nothing to do besides recursing to the subplan. */
+ ExecPrepNode(outerPlan(node), context, result);
+}
+
/* ----------------------------------------------------------------
* ExecInitUnique
*
diff --git a/src/backend/executor/nodeValuesscan.c b/src/backend/executor/nodeValuesscan.c
index dda1c59b23..6cf7fd77d6 100644
--- a/src/backend/executor/nodeValuesscan.c
+++ b/src/backend/executor/nodeValuesscan.c
@@ -203,6 +203,19 @@ ExecValuesScan(PlanState *pstate)
(ExecScanRecheckMtd) ValuesRecheck);
}
+/* ----------------------------------------------------------------
+ * ExecPrepValuesScan
+ *
+ * This "preps" the ValuesScan node.
+ * ----------------------------------------------------------------
+ */
+void
+ExecPrepValuesScan(ValuesScan *node, ExecPrepContext *context,
+ ExecPrepOutput *result)
+{
+ /* nothing to do */
+}
+
/* ----------------------------------------------------------------
* ExecInitValuesScan
* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 08ce05ca5a..90b7494bee 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -2238,6 +2238,19 @@ ExecWindowAgg(PlanState *pstate)
return ExecProject(winstate->ss.ps.ps_ProjInfo);
}
+/* ----------------------------------------------------------------
+ * ExecPrepWindowAgg
+ *
+ * This "preps" the WindowAgg node and the node's subplan.
+ * ----------------------------------------------------------------
+ */
+void
+ExecPrepWindowAgg(WindowAgg *node, ExecPrepContext *context, ExecPrepOutput *result)
+{
+	/* Nothing to do besides recursing to the subplan. */
+ ExecPrepNode(outerPlan(node), context, result);
+}
+
/* -----------------
* ExecInitWindowAgg
*
diff --git a/src/backend/executor/nodeWorktablescan.c b/src/backend/executor/nodeWorktablescan.c
index 15fd71fb32..71a2ac7e40 100644
--- a/src/backend/executor/nodeWorktablescan.c
+++ b/src/backend/executor/nodeWorktablescan.c
@@ -121,6 +121,18 @@ ExecWorkTableScan(PlanState *pstate)
(ExecScanRecheckMtd) WorkTableScanRecheck);
}
+/* ----------------------------------------------------------------
+ * ExecPrepWorkTableScan
+ *
+ * This "preps" the WorkTableScan node.
+ * ----------------------------------------------------------------
+ */
+void
+ExecPrepWorkTableScan(WorkTableScan *node, ExecPrepContext *context,
+ ExecPrepOutput *result)
+{
+ /* nothing to do */
+}
+
/* ----------------------------------------------------------------
* ExecInitWorkTableScan
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index c93f90de9b..84c1b22ccb 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1485,6 +1485,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
CachedPlanSource *plansource;
CachedPlan *cplan;
List *stmt_list;
+ List *stmt_execprep_list;
char *query_string;
Snapshot snapshot;
MemoryContext oldcontext;
@@ -1566,6 +1567,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
/* Replan if needed, and increment plan refcount for portal */
cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
stmt_list = cplan->stmt_list;
+ stmt_execprep_list = cplan->stmt_execprep_list;
if (!plan->saved)
{
@@ -1577,6 +1579,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
oldcontext = MemoryContextSwitchTo(portal->portalContext);
stmt_list = copyObject(stmt_list);
+ stmt_execprep_list = copyObject(stmt_execprep_list);
MemoryContextSwitchTo(oldcontext);
ReleaseCachedPlan(cplan, NULL);
cplan = NULL; /* portal shouldn't depend on cplan */
@@ -1590,6 +1593,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
+ stmt_execprep_list,
cplan);
/*
@@ -2380,7 +2384,9 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
- ListCell *lc2;
+ List *stmt_execprep_list;
+ ListCell *lc2,
+ *lc3;
spicallbackarg.query = plansource->query_string;
@@ -2459,6 +2465,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
plan_owner, _SPI_current->queryEnv);
stmt_list = cplan->stmt_list;
+ stmt_execprep_list = cplan->stmt_execprep_list;
/*
* If we weren't given a specific snapshot to use, and the statement
@@ -2496,9 +2503,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, stmt_execprep_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ ExecPrepOutput *execprep = lfirst_node(ExecPrepOutput, lc3);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2570,7 +2578,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
else
snap = InvalidSnapshot;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, execprep,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 6bd95bbce2..89101256cf 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -68,6 +68,18 @@
} \
} while (0)
+/* Copy a field that is an array of numElem Node objects */
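+/* Note: numElem may be evaluated more than once, so avoid side effects. */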
+#define COPY_NODE_ARRAY(fldname, numElem) \
+ do { \
+ int i; \
+		newnode->fldname = (numElem) > 0 ? \
+			palloc((numElem) * sizeof(from->fldname[0])) : NULL; \
+		for (i = 0; i < (numElem); i++) \
+ { \
+ newnode->fldname[i] = copyObject(from->fldname[i]); \
+ } \
+ } while (0)
+
/* Copy a parse location field (for Copy, this is same as scalar case) */
#define COPY_LOCATION_FIELD(fldname) \
(newnode->fldname = from->fldname)
@@ -94,9 +106,12 @@ _copyPlannedStmt(const PlannedStmt *from)
COPY_SCALAR_FIELD(transientPlan);
COPY_SCALAR_FIELD(dependsOnRole);
COPY_SCALAR_FIELD(parallelModeNeeded);
+ COPY_SCALAR_FIELD(usesPreExecPruning);
COPY_SCALAR_FIELD(jitFlags);
COPY_NODE_FIELD(planTree);
+ COPY_SCALAR_FIELD(numPlanNodes);
COPY_NODE_FIELD(rtable);
+ COPY_BITMAPSET_FIELD(relationRTIs);
COPY_NODE_FIELD(resultRelations);
COPY_NODE_FIELD(appendRelations);
COPY_NODE_FIELD(subplans);
@@ -1278,6 +1293,8 @@ _copyPartitionPruneInfo(const PartitionPruneInfo *from)
PartitionPruneInfo *newnode = makeNode(PartitionPruneInfo);
COPY_NODE_FIELD(prune_infos);
+ COPY_SCALAR_FIELD(contains_init_steps);
+ COPY_SCALAR_FIELD(contains_exec_steps);
COPY_BITMAPSET_FIELD(other_subplans);
return newnode;
@@ -4984,6 +5001,28 @@ _copyBitString(const BitString *from)
return newnode;
}
+static ExecPrepOutput *
+_copyExecPrepOutput(const ExecPrepOutput *from)
+{
+ ExecPrepOutput *newnode = makeNode(ExecPrepOutput);
+
+ COPY_BITMAPSET_FIELD(relationRTIs);
+ COPY_SCALAR_FIELD(numPlanNodes);
+ COPY_NODE_ARRAY(planPrepResults, from->numPlanNodes);
+
+ return newnode;
+}
+
+static PlanPrepOutput *
+_copyPlanPrepOutput(const PlanPrepOutput *from)
+{
+ PlanPrepOutput *newnode = makeNode(PlanPrepOutput);
+
+ COPY_SCALAR_FIELD(plan_node_id);
+ COPY_BITMAPSET_FIELD(initially_valid_subnodes);
+
+ return newnode;
+}
+
static ForeignKeyCacheInfo *
_copyForeignKeyCacheInfo(const ForeignKeyCacheInfo *from)
@@ -5930,6 +5969,16 @@ copyObjectImpl(const void *from)
retval = _copyPublicationTable(from);
break;
+ /*
+ * EXECUTION NODES
+ */
+ case T_ExecPrepOutput:
+ retval = _copyExecPrepOutput(from);
+ break;
+ case T_PlanPrepOutput:
+ retval = _copyPlanPrepOutput(from);
+ break;
+
/*
* MISCELLANEOUS NODES
*/
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 6bdad462c7..9fe247d505 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -312,9 +312,12 @@ _outPlannedStmt(StringInfo str, const PlannedStmt *node)
WRITE_BOOL_FIELD(transientPlan);
WRITE_BOOL_FIELD(dependsOnRole);
WRITE_BOOL_FIELD(parallelModeNeeded);
+ WRITE_BOOL_FIELD(usesPreExecPruning);
WRITE_INT_FIELD(jitFlags);
WRITE_NODE_FIELD(planTree);
+ WRITE_INT_FIELD(numPlanNodes);
WRITE_NODE_FIELD(rtable);
+ WRITE_BITMAPSET_FIELD(relationRTIs);
WRITE_NODE_FIELD(resultRelations);
WRITE_NODE_FIELD(appendRelations);
WRITE_NODE_FIELD(subplans);
@@ -1004,6 +1007,8 @@ _outPartitionPruneInfo(StringInfo str, const PartitionPruneInfo *node)
WRITE_NODE_TYPE("PARTITIONPRUNEINFO");
WRITE_NODE_FIELD(prune_infos);
+ WRITE_BOOL_FIELD(contains_init_steps);
+ WRITE_BOOL_FIELD(contains_exec_steps);
WRITE_BITMAPSET_FIELD(other_subplans);
}
@@ -2274,6 +2279,7 @@ _outPlannerGlobal(StringInfo str, const PlannerGlobal *node)
WRITE_NODE_FIELD(subplans);
WRITE_BITMAPSET_FIELD(rewindPlanIDs);
WRITE_NODE_FIELD(finalrtable);
+ WRITE_BITMAPSET_FIELD(relationRTIs);
WRITE_NODE_FIELD(finalrowmarks);
WRITE_NODE_FIELD(resultRelations);
WRITE_NODE_FIELD(appendRelations);
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 3f68f7c18d..7ecb9ad73c 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1585,9 +1585,12 @@ _readPlannedStmt(void)
READ_BOOL_FIELD(transientPlan);
READ_BOOL_FIELD(dependsOnRole);
READ_BOOL_FIELD(parallelModeNeeded);
+ READ_BOOL_FIELD(usesPreExecPruning);
READ_INT_FIELD(jitFlags);
READ_NODE_FIELD(planTree);
+ READ_INT_FIELD(numPlanNodes);
READ_NODE_FIELD(rtable);
+ READ_BITMAPSET_FIELD(relationRTIs);
READ_NODE_FIELD(resultRelations);
READ_NODE_FIELD(appendRelations);
READ_NODE_FIELD(subplans);
@@ -2534,6 +2537,8 @@ _readPartitionPruneInfo(void)
READ_LOCALS(PartitionPruneInfo);
READ_NODE_FIELD(prune_infos);
+ READ_BOOL_FIELD(contains_init_steps);
+ READ_BOOL_FIELD(contains_exec_steps);
READ_BITMAPSET_FIELD(other_subplans);
READ_DONE();
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index bd09f85aea..70c5b9d88b 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -517,8 +517,11 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->transientPlan = glob->transientPlan;
result->dependsOnRole = glob->dependsOnRole;
result->parallelModeNeeded = glob->parallelModeNeeded;
+ result->usesPreExecPruning = glob->usesPreExecPruning;
result->planTree = top_plan;
+ result->numPlanNodes = glob->lastPlanNodeId;
result->rtable = glob->finalrtable;
+ result->relationRTIs = glob->relationRTIs;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index a7b11b7f03..c1b1cf503d 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -483,6 +483,7 @@ static void
add_rte_to_flat_rtable(PlannerGlobal *glob, RangeTblEntry *rte)
{
RangeTblEntry *newrte;
+ Index rti = list_length(glob->finalrtable) + 1;
/* flat copy to duplicate all the scalar fields */
newrte = (RangeTblEntry *) palloc(sizeof(RangeTblEntry));
@@ -517,7 +518,10 @@ add_rte_to_flat_rtable(PlannerGlobal *glob, RangeTblEntry *rte)
* but it would probably cost more cycles than it would save.
*/
if (newrte->rtekind == RTE_RELATION)
+ {
+ glob->relationRTIs = bms_add_member(glob->relationRTIs, rti);
glob->relationOids = lappend_oid(glob->relationOids, newrte->relid);
+ }
}
/*
@@ -1548,6 +1552,9 @@ set_append_references(PlannerInfo *root,
pinfo->rtindex += rtoffset;
}
}
+
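+		/*
+		 * Record that this plan contains pruning steps that can be
+		 * performed before execution begins.
+		 */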
+ if (aplan->part_prune_info->contains_init_steps)
+ root->glob->usesPreExecPruning = true;
}
/* We don't need to recurse to lefttree or righttree ... */
@@ -1620,6 +1627,9 @@ set_mergeappend_references(PlannerInfo *root,
pinfo->rtindex += rtoffset;
}
}
+
+ if (mplan->part_prune_info->contains_init_steps)
+ root->glob->usesPreExecPruning = true;
}
/* We don't need to recurse to lefttree or righttree ... */
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 1bc00826c1..390d4e4c06 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,9 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *contains_init_steps,
+ bool *contains_exec_steps);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -230,6 +232,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool contains_init_steps = false;
+ bool contains_exec_steps = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -309,12 +313,16 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_contains_init_steps;
+ bool partrel_contains_exec_steps;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_contains_init_steps,
+ &partrel_contains_exec_steps);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -323,6 +331,10 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+ if (!contains_init_steps)
+ contains_init_steps = partrel_contains_init_steps;
+ if (!contains_exec_steps)
+ contains_exec_steps = partrel_contains_exec_steps;
}
pfree(relid_subplan_map);
@@ -337,6 +349,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->contains_init_steps = contains_init_steps;
+ pruneinfo->contains_exec_steps = contains_exec_steps;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -435,13 +449,18 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *contains_init_steps and *contains_exec_steps are set to indicate whether
+ * the returned PartitionedRelPruneInfos contain pruning steps that can be
+ * performed before execution begins and during execution, respectively.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *contains_init_steps,
+ bool *contains_exec_steps)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -452,6 +471,10 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will find out below. */
+ *contains_init_steps = false;
+ *contains_exec_steps = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -539,6 +562,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+	 * In the first pass, we detect whether the 2nd pass is necessary
+	 * by noting the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -613,6 +639,11 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ if (!*contains_init_steps)
+ *contains_init_steps = (initial_pruning_steps != NIL);
+ if (!*contains_exec_steps)
+ *contains_exec_steps = (exec_pruning_steps != NIL);
+
pinfolist = lappend(pinfolist, pinfo);
}
@@ -798,6 +829,7 @@ prune_append_rel_partitions(RelOptInfo *rel)
/* These are not valid when being called from the planner */
context.planstate = NULL;
+ context.exprcontext = NULL;
context.exprstates = NULL;
/* Actual pruning happens here. */
@@ -808,8 +840,8 @@ prune_append_rel_partitions(RelOptInfo *rel)
* get_matching_partitions
* Determine partitions that survive partition pruning
*
- * Note: context->planstate must be set to a valid PlanState when the
- * pruning_steps were generated with a target other than PARTTARGET_PLANNER.
+ * Note: context->exprcontext must be valid when the pruning_steps were
+ * generated with a target other than PARTTARGET_PLANNER.
*
* Returns a Bitmapset of the RelOptInfo->part_rels indexes of the surviving
* partitions.
@@ -3654,7 +3686,7 @@ match_boolean_partition_clause(Oid partopfamily, Expr *clause, Expr *partkey,
* exprstate array.
*
* Note that the evaluated result may be in the per-tuple memory context of
- * context->planstate->ps_ExprContext, and we may have leaked other memory
+ * context->exprcontext, and we may have leaked other memory
* there too. This memory must be recovered by resetting that ExprContext
* after we're done with the pruning operation (see execPartition.c).
*/
@@ -3677,13 +3709,18 @@ partkey_datum_from_expr(PartitionPruneContext *context,
ExprContext *ectx;
/*
- * We should never see a non-Const in a step unless we're running in
- * the executor.
+ * We should never see a non-Const in a step unless the caller has
+ * passed a valid ExprContext.
+ *
+	 * When context->planstate is valid, context->exprcontext is the same
+ * as context->planstate->ps_ExprContext.
*/
- Assert(context->planstate != NULL);
+ Assert(context->planstate != NULL || context->exprcontext != NULL);
+ Assert(context->planstate == NULL ||
+ (context->exprcontext == context->planstate->ps_ExprContext));
exprstate = context->exprstates[stateidx];
- ectx = context->planstate->ps_ExprContext;
+ ectx = context->exprcontext;
*value = ExecEvalExprSwitchContext(exprstate, ectx, isnull);
}
}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index fda2e9360e..5d8f3fc3cb 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -910,15 +910,17 @@ pg_plan_query(Query *querytree, const char *query_string, int cursorOptions,
* For normal optimizable statements, invoke the planner. For utility
* statements, just make a wrapper PlannedStmt node.
*
- * The result is a list of PlannedStmt nodes.
+ * The result is a list of PlannedStmt nodes. Also, a NULL is appended to
+ * *stmt_execprep_list for each PlannedStmt added to the returned list.
*/
List *
pg_plan_queries(List *querytrees, const char *query_string, int cursorOptions,
- ParamListInfo boundParams)
+ ParamListInfo boundParams, List **stmt_execprep_list)
{
List *stmt_list = NIL;
ListCell *query_list;
+ *stmt_execprep_list = NIL;
foreach(query_list, querytrees)
{
Query *query = lfirst_node(Query, query_list);
@@ -942,6 +944,7 @@ pg_plan_queries(List *querytrees, const char *query_string, int cursorOptions,
}
stmt_list = lappend(stmt_list, stmt);
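+		/* No prep output is computed here; append NULL to keep the lists parallel. */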
+ *stmt_execprep_list = lappend(*stmt_execprep_list, NULL);
}
return stmt_list;
@@ -1045,7 +1048,8 @@ exec_simple_query(const char *query_string)
QueryCompletion qc;
MemoryContext per_parsetree_context = NULL;
List *querytree_list,
- *plantree_list;
+ *plantree_list,
+ *plantree_execprep_list;
Portal portal;
DestReceiver *receiver;
int16 format;
@@ -1132,7 +1136,8 @@ exec_simple_query(const char *query_string)
NULL, 0, NULL);
plantree_list = pg_plan_queries(querytree_list, query_string,
- CURSOR_OPT_PARALLEL_OK, NULL);
+ CURSOR_OPT_PARALLEL_OK, NULL,
+ &plantree_execprep_list);
/*
* Done with the snapshot used for parsing/planning.
@@ -1168,6 +1173,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ plantree_execprep_list,
NULL);
/*
@@ -1978,6 +1984,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
+ cplan->stmt_execprep_list,
cplan);
/* Done with the snapshot used for parameter I/O and parsing/planning */
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5f907831a3..b76aa3ef3b 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -35,7 +35,7 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
+static void ProcessQuery(PlannedStmt *plan, ExecPrepOutput *execprep,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +65,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ ExecPrepOutput *execprep,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +78,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->execprep = execprep; /* ExecutorPrep() output for plan */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +124,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * execprep: ExecutorPrep() output for the plan tree
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +137,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ ExecPrepOutput *execprep,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +149,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, execprep, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -490,6 +494,7 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ linitial_node(ExecPrepOutput, portal->stmt_execpreps),
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1190,7 +1195,8 @@ PortalRunMulti(Portal portal,
QueryCompletion *qc)
{
bool active_snapshot_set = false;
- ListCell *stmtlist_item;
+ ListCell *stmtlist_item,
+ *execpreplist_item;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1211,9 +1217,12 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
- foreach(stmtlist_item, portal->stmts)
+ forboth(stmtlist_item, portal->stmts,
+ execpreplist_item, portal->stmt_execpreps)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ ExecPrepOutput *execprep = lfirst_node(ExecPrepOutput,
+ execpreplist_item);
/*
* If we got a cancel signal in prior command, quit
@@ -1271,7 +1280,7 @@ PortalRunMulti(Portal portal,
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, execprep,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1280,7 +1289,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, execprep,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 4a9055e6bb..221738dddc 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -58,12 +58,14 @@
#include "access/transam.h"
#include "catalog/namespace.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/optimizer.h"
#include "parser/analyze.h"
#include "parser/parsetree.h"
+#include "partitioning/partdesc.h"
#include "storage/lmgr.h"
#include "tcop/pquery.h"
#include "tcop/utility.h"
@@ -99,14 +101,15 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static List *AcquireExecutorLocks(List *stmt_list, bool acquire,
+ ParamListInfo boundParams);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -782,6 +785,47 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
return tlist;
}
+/*
+ * CachedPlanSaveExecPrepOutputs
+ * Save the list containing ExecPrepOutput nodes in the given CachedPlan
+ *
+ * The provided list is copied into a dedicated context that is a child of
+ * plan->context.
+ */
+static void
+CachedPlanSaveExecPrepOutputs(CachedPlan *plan, List *execprep_list)
+{
+ MemoryContext execprep_context = plan->execprep_context,
+ oldcontext = CurrentMemoryContext;
+ List *execprep_list_copy;
+
+ /*
+ * Set up the dedicated context if not already done, saving it as a child
+ * of the CachedPlan's context.
+ */
+ if (execprep_context == NULL)
+ {
+ execprep_context = AllocSetContextCreate(CurrentMemoryContext,
+ "CachedPlan execprep list",
+ ALLOCSET_START_SMALL_SIZES);
+ MemoryContextSetParent(execprep_context, plan->context);
+ MemoryContextSetIdentifier(execprep_context, plan->context->ident);
+ plan->execprep_context = execprep_context;
+ }
+ else
+ {
+		/* Just clear the existing contents by resetting the context. */
+ Assert(MemoryContextIsValid(execprep_context));
+ MemoryContextReset(execprep_context);
+ }
+
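+	/* Copy the list, since the caller's copy may be in a short-lived context. */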
+ MemoryContextSwitchTo(execprep_context);
+ execprep_list_copy = copyObject(execprep_list);
+ MemoryContextSwitchTo(oldcontext);
+
+ plan->stmt_execprep_list = execprep_list_copy;
+}
+
/*
* CheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
*
@@ -790,9 +834,16 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* On a "true" return, we have acquired the locks needed to run the plan.
* (We must do this for the "true" result to be race-condition-free.)
+ *
+ * If the CachedPlan is valid, this prepares the PlannedStmts contained in it
+ * for execution by invoking ExecutorPrep() on each. The resulting
+ * ExecPrepOutput nodes, allocated in a child context of the plan's own
+ * context, are stored in plan->stmt_execprep_list, replacing any nodes
+ * left over from the previous invocation of CheckCachedPlan() on the same
+ * CachedPlan.
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams)
{
CachedPlan *plan = plansource->gplan;
@@ -820,13 +871,22 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ List *execprep_list;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ /*
+		 * Take executor locks on the plan tree and perform other
+		 * preparatory actions on it by invoking ExecutorPrep(). The list of
+		 * ExecPrepOutput nodes generated as a result is saved in the
+		 * CachedPlan.
+ */
+ execprep_list = AcquireExecutorLocks(plan->stmt_list, true, boundParams);
+ CachedPlanSaveExecPrepOutputs(plan, execprep_list);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -848,7 +908,7 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ (void) AcquireExecutorLocks(plan->stmt_list, false, boundParams);
}
/*
@@ -880,7 +940,8 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv)
{
CachedPlan *plan;
- List *plist;
+ List *plist,
+ *execprep_list;
bool snapshot_set;
bool is_transient;
MemoryContext plan_context;
@@ -933,7 +994,8 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
* Generate the plan.
*/
plist = pg_plan_queries(qlist, plansource->query_string,
- plansource->cursor_options, boundParams);
+ plansource->cursor_options, boundParams,
+ &execprep_list);
/* Release snapshot if we got one */
if (snapshot_set)
@@ -1002,6 +1064,11 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
plan->is_saved = false;
plan->is_valid = true;
+ /* Save the dummy ExecPrepOutput list. */
+ plan->execprep_context = NULL;
+ CachedPlanSaveExecPrepOutputs(plan, execprep_list);
+ Assert(MemoryContextIsValid(plan->execprep_context));
+
/* assign generation number to new plan */
plan->generation = ++(plansource->generation);
@@ -1160,7 +1227,7 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (CheckCachedPlan(plansource, boundParams))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1366,7 +1433,6 @@ CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
foreach(lc, plan->stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc);
- ListCell *lc2;
if (plannedstmt->commandType == CMD_UTILITY)
return false;
@@ -1375,13 +1441,8 @@ CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
* We have to grovel through the rtable because it's likely to contain
* an RTE_RESULT relation, rather than being totally empty.
*/
- foreach(lc2, plannedstmt->rtable)
- {
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
-
- if (rte->rtekind == RTE_RELATION)
- return false;
- }
+ if (!bms_is_empty(plannedstmt->relationRTIs))
+ return false;
}
/*
@@ -1738,16 +1799,22 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
* or release them if acquire is false.
+ *
+ * Returns a list of ExecPrepOutput nodes, with one element for each
+ * PlannedStmt in stmt_list; the element is NULL for utility statements.
*/
-static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+static List *
+AcquireExecutorLocks(List *stmt_list, bool acquire, ParamListInfo boundParams)
{
ListCell *lc1;
+ List *stmt_execprep_list = NIL;
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ ExecPrepContext *context;
+ ExecPrepOutput *execprep = NULL;
+ int rti;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1762,28 +1829,46 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
if (query)
ScanQueryForLocks(query, acquire);
- continue;
}
-
- foreach(lc2, plannedstmt->rtable)
+ else
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
-
- if (rte->rtekind != RTE_RELATION)
- continue;
-
/*
- * Acquire the appropriate type of lock on each relation OID. Note
- * that we don't actually try to open the rel, and hence will not
- * fail if it's been dropped entirely --- we'll just transiently
- * acquire a non-conflicting lock.
+ * Prep the plan tree for execution.
*/
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
+ context = makeNode(ExecPrepContext);
+ context->stmt = plannedstmt;
+ context->params = boundParams;
+ execprep = ExecutorPrep(context);
+
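+			/*
+			 * execprep->relationRTIs contains only the relations that remain
+			 * after performing any "initial" pruning steps, so we may lock
+			 * fewer relations than appear in the range table.
+			 */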
+ rti = -1;
+ while ((rti = bms_next_member(execprep->relationRTIs, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ if (rte->rtekind != RTE_RELATION)
+ continue;
+
+ /*
+ * Acquire the appropriate type of lock on each relation OID.
+ * Note that we don't actually try to open the rel, and hence
+ * will not fail if it's been dropped entirely --- we'll just
+ * transiently acquire a non-conflicting lock.
+ */
+ if (acquire)
+ LockRelationOid(rte->relid, rte->rellockmode);
+ else
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
}
+
+ /*
+	 * Keep the invariant that stmt_execprep_list is the same length as
+ * stmt_list.
+ */
+ stmt_execprep_list = lappend(stmt_execprep_list, execprep);
}
+
+ return stmt_execprep_list;
}
/*
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 236f450a2b..5cf1339ffd 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -284,6 +284,7 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *stmt_execpreps,
CachedPlan *cplan)
{
AssertArg(PortalIsValid(portal));
@@ -298,6 +299,7 @@ PortalDefineQuery(Portal portal,
portal->qc.nprocessed = 0;
portal->commandTag = commandTag;
portal->stmts = stmts;
+ portal->stmt_execpreps = stmt_execpreps;
portal->cplan = cplan;
portal->status = PORTAL_DEFINED;
}
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 666977fb1f..f553649a5d 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,8 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt, ExecPrepOutput *execprep,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 603d8becc4..785a09f15f 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -119,10 +119,21 @@ extern ResultRelInfo *ExecFindPartition(ModifyTableState *mtstate,
EState *estate);
extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
PartitionTupleRouting *proute);
+
extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *partitionpruneinfo);
-extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
+ PartitionPruneInfo *partitionpruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir);
extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
- int nsubplans);
-
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **parentrelids);
+extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
+extern Bitmapset *ExecInitPartitionPruning(PlanState *planstate, int n_total_subplans,
+ PartitionPruneInfo *pruneinfo,
+ PartitionPruneState **prunestate);
+extern Bitmapset *ExecPrepDoInitialPruning(PartitionPruneInfo *pruneinfo,
+ List *rtable, ParamListInfo params,
+ Bitmapset **parentrelids);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index e79e2c001f..491ceef401 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,7 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ ExecPrepOutput *execprep; /* ExecutorPrep()'s output given plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +58,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ ExecPrepOutput *execprep,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 344399f6a8..627cb19a4c 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -185,6 +185,7 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
+extern ExecPrepOutput *ExecutorPrep(ExecPrepContext *context);
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
@@ -233,6 +234,8 @@ extern void EvalPlanQualEnd(EPQState *epqstate);
/*
* functions in execProcnode.c
*/
+extern void ExecPrepNode(Plan *node, ExecPrepContext *context,
+ ExecPrepOutput *result);
extern PlanState *ExecInitNode(Plan *node, EState *estate, int eflags);
extern void ExecSetExecProcNode(PlanState *node, ExecProcNodeMtd function);
extern Node *MultiExecProcNode(PlanState *node);
diff --git a/src/include/executor/nodeAgg.h b/src/include/executor/nodeAgg.h
index 4d1bd92999..2dd7570067 100644
--- a/src/include/executor/nodeAgg.h
+++ b/src/include/executor/nodeAgg.h
@@ -314,6 +314,7 @@ typedef struct AggStatePerHashData
} AggStatePerHashData;
+extern void ExecPrepAgg(Agg *node, ExecPrepContext *context, ExecPrepOutput *result);
extern AggState *ExecInitAgg(Agg *node, EState *estate, int eflags);
extern void ExecEndAgg(AggState *node);
extern void ExecReScanAgg(AggState *node);
diff --git a/src/include/executor/nodeAppend.h b/src/include/executor/nodeAppend.h
index 4cb78ee5b6..85bc9d30a6 100644
--- a/src/include/executor/nodeAppend.h
+++ b/src/include/executor/nodeAppend.h
@@ -17,6 +17,7 @@
#include "access/parallel.h"
#include "nodes/execnodes.h"
+extern void ExecPrepAppend(Append *node, ExecPrepContext *context, ExecPrepOutput *execprep);
extern AppendState *ExecInitAppend(Append *node, EState *estate, int eflags);
extern void ExecEndAppend(AppendState *node);
extern void ExecReScanAppend(AppendState *node);
diff --git a/src/include/executor/nodeBitmapAnd.h b/src/include/executor/nodeBitmapAnd.h
index bae6a83826..aafb10a2aa 100644
--- a/src/include/executor/nodeBitmapAnd.h
+++ b/src/include/executor/nodeBitmapAnd.h
@@ -16,6 +16,7 @@
#include "nodes/execnodes.h"
+extern void ExecPrepBitmapAnd(BitmapAnd *node, ExecPrepContext *context, ExecPrepOutput *result);
extern BitmapAndState *ExecInitBitmapAnd(BitmapAnd *node, EState *estate, int eflags);
extern Node *MultiExecBitmapAnd(BitmapAndState *node);
extern void ExecEndBitmapAnd(BitmapAndState *node);
diff --git a/src/include/executor/nodeBitmapHeapscan.h b/src/include/executor/nodeBitmapHeapscan.h
index 789522cb8d..7240d9fa93 100644
--- a/src/include/executor/nodeBitmapHeapscan.h
+++ b/src/include/executor/nodeBitmapHeapscan.h
@@ -17,6 +17,7 @@
#include "access/parallel.h"
#include "nodes/execnodes.h"
+extern void ExecPrepBitmapHeapScan(BitmapHeapScan *node, ExecPrepContext *context, ExecPrepOutput *result);
extern BitmapHeapScanState *ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags);
extern void ExecEndBitmapHeapScan(BitmapHeapScanState *node);
extern void ExecReScanBitmapHeapScan(BitmapHeapScanState *node);
diff --git a/src/include/executor/nodeBitmapIndexscan.h b/src/include/executor/nodeBitmapIndexscan.h
index 01fb6ef536..6759724c2e 100644
--- a/src/include/executor/nodeBitmapIndexscan.h
+++ b/src/include/executor/nodeBitmapIndexscan.h
@@ -16,6 +16,7 @@
#include "nodes/execnodes.h"
+extern void ExecPrepBitmapIndexScan(BitmapIndexScan *node, ExecPrepContext *context, ExecPrepOutput *result);
extern BitmapIndexScanState *ExecInitBitmapIndexScan(BitmapIndexScan *node, EState *estate, int eflags);
extern Node *MultiExecBitmapIndexScan(BitmapIndexScanState *node);
extern void ExecEndBitmapIndexScan(BitmapIndexScanState *node);
diff --git a/src/include/executor/nodeBitmapOr.h b/src/include/executor/nodeBitmapOr.h
index ad90812cc1..66ddc18f63 100644
--- a/src/include/executor/nodeBitmapOr.h
+++ b/src/include/executor/nodeBitmapOr.h
@@ -16,6 +16,7 @@
#include "nodes/execnodes.h"
+extern void ExecPrepBitmapOr(BitmapOr *node, ExecPrepContext *context, ExecPrepOutput *result);
extern BitmapOrState *ExecInitBitmapOr(BitmapOr *node, EState *estate, int eflags);
extern Node *MultiExecBitmapOr(BitmapOrState *node);
extern void ExecEndBitmapOr(BitmapOrState *node);
diff --git a/src/include/executor/nodeCtescan.h b/src/include/executor/nodeCtescan.h
index 317d142b16..7908ae51df 100644
--- a/src/include/executor/nodeCtescan.h
+++ b/src/include/executor/nodeCtescan.h
@@ -16,6 +16,7 @@
#include "nodes/execnodes.h"
+extern void ExecPrepCteScan(CteScan *node, ExecPrepContext *context, ExecPrepOutput *result);
extern CteScanState *ExecInitCteScan(CteScan *node, EState *estate, int eflags);
extern void ExecEndCteScan(CteScanState *node);
extern void ExecReScanCteScan(CteScanState *node);
diff --git a/src/include/executor/nodeCustom.h b/src/include/executor/nodeCustom.h
index 5ef890144f..8c1d05f64b 100644
--- a/src/include/executor/nodeCustom.h
+++ b/src/include/executor/nodeCustom.h
@@ -18,6 +18,7 @@
/*
* General executor code
*/
+extern void ExecPrepCustomScan(CustomScan *node, ExecPrepContext *context, ExecPrepOutput *result);
extern CustomScanState *ExecInitCustomScan(CustomScan *cscan,
EState *estate, int eflags);
extern void ExecEndCustomScan(CustomScanState *node);
diff --git a/src/include/executor/nodeForeignscan.h b/src/include/executor/nodeForeignscan.h
index c9fbaed79c..a2d6667011 100644
--- a/src/include/executor/nodeForeignscan.h
+++ b/src/include/executor/nodeForeignscan.h
@@ -17,6 +17,7 @@
#include "access/parallel.h"
#include "nodes/execnodes.h"
+extern void ExecPrepForeignScan(ForeignScan *node, ExecPrepContext *context, ExecPrepOutput *result);
extern ForeignScanState *ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags);
extern void ExecEndForeignScan(ForeignScanState *node);
extern void ExecReScanForeignScan(ForeignScanState *node);
diff --git a/src/include/executor/nodeFunctionscan.h b/src/include/executor/nodeFunctionscan.h
index 7a598a1d46..8686bb5c09 100644
--- a/src/include/executor/nodeFunctionscan.h
+++ b/src/include/executor/nodeFunctionscan.h
@@ -16,6 +16,7 @@
#include "nodes/execnodes.h"
+extern void ExecPrepFunctionScan(FunctionScan *node, ExecPrepContext *context, ExecPrepOutput *result);
extern FunctionScanState *ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags);
extern void ExecEndFunctionScan(FunctionScanState *node);
extern void ExecReScanFunctionScan(FunctionScanState *node);
diff --git a/src/include/executor/nodeGather.h b/src/include/executor/nodeGather.h
index 29829ffe9a..206185ffbc 100644
--- a/src/include/executor/nodeGather.h
+++ b/src/include/executor/nodeGather.h
@@ -16,6 +16,7 @@
#include "nodes/execnodes.h"
+extern void ExecPrepGather(Gather *node, ExecPrepContext *context, ExecPrepOutput *result);
extern GatherState *ExecInitGather(Gather *node, EState *estate, int eflags);
extern void ExecEndGather(GatherState *node);
extern void ExecShutdownGather(GatherState *node);
diff --git a/src/include/executor/nodeGatherMerge.h b/src/include/executor/nodeGatherMerge.h
index d724d5fea4..b124a3fe99 100644
--- a/src/include/executor/nodeGatherMerge.h
+++ b/src/include/executor/nodeGatherMerge.h
@@ -16,6 +16,7 @@
#include "nodes/execnodes.h"
+extern void ExecPrepGatherMerge(GatherMerge *node, ExecPrepContext *context, ExecPrepOutput *result);
extern GatherMergeState *ExecInitGatherMerge(GatherMerge *node,
EState *estate,
int eflags);
diff --git a/src/include/executor/nodeGroup.h b/src/include/executor/nodeGroup.h
index 816ed2c099..7e86abab01 100644
--- a/src/include/executor/nodeGroup.h
+++ b/src/include/executor/nodeGroup.h
@@ -16,6 +16,7 @@
#include "nodes/execnodes.h"
+extern void ExecPrepGroup(Group *node, ExecPrepContext *context, ExecPrepOutput *result);
extern GroupState *ExecInitGroup(Group *node, EState *estate, int eflags);
extern void ExecEndGroup(GroupState *node);
extern void ExecReScanGroup(GroupState *node);
diff --git a/src/include/executor/nodeHash.h b/src/include/executor/nodeHash.h
index e1e0dec24b..1426a6e9a1 100644
--- a/src/include/executor/nodeHash.h
+++ b/src/include/executor/nodeHash.h
@@ -19,6 +19,7 @@
struct SharedHashJoinBatch;
+extern void ExecPrepHash(Hash *node, ExecPrepContext *context, ExecPrepOutput *result);
extern HashState *ExecInitHash(Hash *node, EState *estate, int eflags);
extern Node *MultiExecHash(HashState *node);
extern void ExecEndHash(HashState *node);
diff --git a/src/include/executor/nodeHashjoin.h b/src/include/executor/nodeHashjoin.h
index b3b5a2c3f2..6dc88282d4 100644
--- a/src/include/executor/nodeHashjoin.h
+++ b/src/include/executor/nodeHashjoin.h
@@ -18,6 +18,7 @@
#include "nodes/execnodes.h"
#include "storage/buffile.h"
+extern void ExecPrepHashJoin(HashJoin *node, ExecPrepContext *context, ExecPrepOutput *result);
extern HashJoinState *ExecInitHashJoin(HashJoin *node, EState *estate, int eflags);
extern void ExecEndHashJoin(HashJoinState *node);
extern void ExecReScanHashJoin(HashJoinState *node);
diff --git a/src/include/executor/nodeIncrementalSort.h b/src/include/executor/nodeIncrementalSort.h
index 84cfd96b13..e909cb784b 100644
--- a/src/include/executor/nodeIncrementalSort.h
+++ b/src/include/executor/nodeIncrementalSort.h
@@ -15,6 +15,7 @@
#include "access/parallel.h"
#include "nodes/execnodes.h"
+extern void ExecPrepIncrementalSort(IncrementalSort *node, ExecPrepContext *context, ExecPrepOutput *result);
extern IncrementalSortState *ExecInitIncrementalSort(IncrementalSort *node, EState *estate, int eflags);
extern void ExecEndIncrementalSort(IncrementalSortState *node);
extern void ExecReScanIncrementalSort(IncrementalSortState *node);
diff --git a/src/include/executor/nodeIndexonlyscan.h b/src/include/executor/nodeIndexonlyscan.h
index 47b03950ea..d0aca7a303 100644
--- a/src/include/executor/nodeIndexonlyscan.h
+++ b/src/include/executor/nodeIndexonlyscan.h
@@ -17,6 +17,7 @@
#include "access/parallel.h"
#include "nodes/execnodes.h"
+extern void ExecPrepIndexOnlyScan(IndexOnlyScan *node, ExecPrepContext *context, ExecPrepOutput *result);
extern IndexOnlyScanState *ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags);
extern void ExecEndIndexOnlyScan(IndexOnlyScanState *node);
extern void ExecIndexOnlyMarkPos(IndexOnlyScanState *node);
diff --git a/src/include/executor/nodeIndexscan.h b/src/include/executor/nodeIndexscan.h
index 0a075f9aea..d57c370466 100644
--- a/src/include/executor/nodeIndexscan.h
+++ b/src/include/executor/nodeIndexscan.h
@@ -18,6 +18,7 @@
#include "access/parallel.h"
#include "nodes/execnodes.h"
+extern void ExecPrepIndexScan(IndexScan *node, ExecPrepContext *context, ExecPrepOutput *result);
extern IndexScanState *ExecInitIndexScan(IndexScan *node, EState *estate, int eflags);
extern void ExecEndIndexScan(IndexScanState *node);
extern void ExecIndexMarkPos(IndexScanState *node);
diff --git a/src/include/executor/nodeLimit.h b/src/include/executor/nodeLimit.h
index 6da0c4026c..05d7e4797b 100644
--- a/src/include/executor/nodeLimit.h
+++ b/src/include/executor/nodeLimit.h
@@ -16,6 +16,7 @@
#include "nodes/execnodes.h"
+extern void ExecPrepLimit(Limit *node, ExecPrepContext *context, ExecPrepOutput *result);
extern LimitState *ExecInitLimit(Limit *node, EState *estate, int eflags);
extern void ExecEndLimit(LimitState *node);
extern void ExecReScanLimit(LimitState *node);
diff --git a/src/include/executor/nodeLockRows.h b/src/include/executor/nodeLockRows.h
index 125a32b608..157d4a7f0e 100644
--- a/src/include/executor/nodeLockRows.h
+++ b/src/include/executor/nodeLockRows.h
@@ -16,6 +16,7 @@
#include "nodes/execnodes.h"
+extern void ExecPrepLockRows(LockRows *node, ExecPrepContext *context, ExecPrepOutput *result);
extern LockRowsState *ExecInitLockRows(LockRows *node, EState *estate, int eflags);
extern void ExecEndLockRows(LockRowsState *node);
extern void ExecReScanLockRows(LockRowsState *node);
diff --git a/src/include/executor/nodeMaterial.h b/src/include/executor/nodeMaterial.h
index 21a6860a1a..9b70d6e97b 100644
--- a/src/include/executor/nodeMaterial.h
+++ b/src/include/executor/nodeMaterial.h
@@ -16,6 +16,7 @@
#include "nodes/execnodes.h"
+extern void ExecPrepMaterial(Material *node, ExecPrepContext *context, ExecPrepOutput *result);
extern MaterialState *ExecInitMaterial(Material *node, EState *estate, int eflags);
extern void ExecEndMaterial(MaterialState *node);
extern void ExecMaterialMarkPos(MaterialState *node);
diff --git a/src/include/executor/nodeMemoize.h b/src/include/executor/nodeMemoize.h
index 4643163dc7..53a784f012 100644
--- a/src/include/executor/nodeMemoize.h
+++ b/src/include/executor/nodeMemoize.h
@@ -17,6 +17,7 @@
#include "access/parallel.h"
#include "nodes/execnodes.h"
+extern void ExecPrepMemoize(Memoize *node, ExecPrepContext *context, ExecPrepOutput *result);
extern MemoizeState *ExecInitMemoize(Memoize *node, EState *estate, int eflags);
extern void ExecEndMemoize(MemoizeState *node);
extern void ExecReScanMemoize(MemoizeState *node);
diff --git a/src/include/executor/nodeMergeAppend.h b/src/include/executor/nodeMergeAppend.h
index 97fe3b0665..60a9136de6 100644
--- a/src/include/executor/nodeMergeAppend.h
+++ b/src/include/executor/nodeMergeAppend.h
@@ -16,6 +16,7 @@
#include "nodes/execnodes.h"
+extern void ExecPrepMergeAppend(MergeAppend *node, ExecPrepContext *context, ExecPrepOutput *result);
extern MergeAppendState *ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags);
extern void ExecEndMergeAppend(MergeAppendState *node);
extern void ExecReScanMergeAppend(MergeAppendState *node);
diff --git a/src/include/executor/nodeMergejoin.h b/src/include/executor/nodeMergejoin.h
index 26ab517508..29553d5dd0 100644
--- a/src/include/executor/nodeMergejoin.h
+++ b/src/include/executor/nodeMergejoin.h
@@ -16,6 +16,7 @@
#include "nodes/execnodes.h"
+extern void ExecPrepMergeJoin(MergeJoin *node, ExecPrepContext *context, ExecPrepOutput *result);
extern MergeJoinState *ExecInitMergeJoin(MergeJoin *node, EState *estate, int eflags);
extern void ExecEndMergeJoin(MergeJoinState *node);
extern void ExecReScanMergeJoin(MergeJoinState *node);
diff --git a/src/include/executor/nodeModifyTable.h b/src/include/executor/nodeModifyTable.h
index 1d225bc88d..4b1846f8ff 100644
--- a/src/include/executor/nodeModifyTable.h
+++ b/src/include/executor/nodeModifyTable.h
@@ -19,6 +19,7 @@ extern void ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
EState *estate, TupleTableSlot *slot,
CmdType cmdtype);
+extern void ExecPrepModifyTable(ModifyTable *node, ExecPrepContext *context, ExecPrepOutput *result);
extern ModifyTableState *ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags);
extern void ExecEndModifyTable(ModifyTableState *node);
extern void ExecReScanModifyTable(ModifyTableState *node);
diff --git a/src/include/executor/nodeNamedtuplestorescan.h b/src/include/executor/nodeNamedtuplestorescan.h
index d595124e54..964afcd816 100644
--- a/src/include/executor/nodeNamedtuplestorescan.h
+++ b/src/include/executor/nodeNamedtuplestorescan.h
@@ -16,6 +16,7 @@
#include "nodes/execnodes.h"
+extern void ExecPrepNamedTuplestoreScan(NamedTuplestoreScan *node, ExecPrepContext *context, ExecPrepOutput *result);
extern NamedTuplestoreScanState *ExecInitNamedTuplestoreScan(NamedTuplestoreScan *node, EState *estate, int eflags);
extern void ExecEndNamedTuplestoreScan(NamedTuplestoreScanState *node);
extern void ExecReScanNamedTuplestoreScan(NamedTuplestoreScanState *node);
diff --git a/src/include/executor/nodeNestloop.h b/src/include/executor/nodeNestloop.h
index b1411faf57..13ea4cc870 100644
--- a/src/include/executor/nodeNestloop.h
+++ b/src/include/executor/nodeNestloop.h
@@ -16,6 +16,7 @@
#include "nodes/execnodes.h"
+extern void ExecPrepNestLoop(NestLoop *node, ExecPrepContext *context, ExecPrepOutput *result);
extern NestLoopState *ExecInitNestLoop(NestLoop *node, EState *estate, int eflags);
extern void ExecEndNestLoop(NestLoopState *node);
extern void ExecReScanNestLoop(NestLoopState *node);
diff --git a/src/include/executor/nodeProjectSet.h b/src/include/executor/nodeProjectSet.h
index 2c2b58282c..c9b44356ba 100644
--- a/src/include/executor/nodeProjectSet.h
+++ b/src/include/executor/nodeProjectSet.h
@@ -16,6 +16,7 @@
#include "nodes/execnodes.h"
+extern void ExecPrepProjectSet(ProjectSet *node, ExecPrepContext *context, ExecPrepOutput *result);
extern ProjectSetState *ExecInitProjectSet(ProjectSet *node, EState *estate, int eflags);
extern void ExecEndProjectSet(ProjectSetState *node);
extern void ExecReScanProjectSet(ProjectSetState *node);
diff --git a/src/include/executor/nodeRecursiveunion.h b/src/include/executor/nodeRecursiveunion.h
index 2d20470da2..7b7585d594 100644
--- a/src/include/executor/nodeRecursiveunion.h
+++ b/src/include/executor/nodeRecursiveunion.h
@@ -16,6 +16,7 @@
#include "nodes/execnodes.h"
+extern void ExecPrepRecursiveUnion(RecursiveUnion *node, ExecPrepContext *context, ExecPrepOutput *result);
extern RecursiveUnionState *ExecInitRecursiveUnion(RecursiveUnion *node, EState *estate, int eflags);
extern void ExecEndRecursiveUnion(RecursiveUnionState *node);
extern void ExecReScanRecursiveUnion(RecursiveUnionState *node);
diff --git a/src/include/executor/nodeResult.h b/src/include/executor/nodeResult.h
index ebb131d265..998a50ae27 100644
--- a/src/include/executor/nodeResult.h
+++ b/src/include/executor/nodeResult.h
@@ -16,6 +16,7 @@
#include "nodes/execnodes.h"
+extern void ExecPrepResult(Result *node, ExecPrepContext *context, ExecPrepOutput *result);
extern ResultState *ExecInitResult(Result *node, EState *estate, int eflags);
extern void ExecEndResult(ResultState *node);
extern void ExecResultMarkPos(ResultState *node);
diff --git a/src/include/executor/nodeSamplescan.h b/src/include/executor/nodeSamplescan.h
index 340b41a427..c0dd45b8bc 100644
--- a/src/include/executor/nodeSamplescan.h
+++ b/src/include/executor/nodeSamplescan.h
@@ -16,6 +16,7 @@
#include "nodes/execnodes.h"
+extern void ExecPrepSampleScan(SampleScan *node, ExecPrepContext *context, ExecPrepOutput *result);
extern SampleScanState *ExecInitSampleScan(SampleScan *node, EState *estate, int eflags);
extern void ExecEndSampleScan(SampleScanState *node);
extern void ExecReScanSampleScan(SampleScanState *node);
diff --git a/src/include/executor/nodeSeqscan.h b/src/include/executor/nodeSeqscan.h
index c225ba6e04..5452742622 100644
--- a/src/include/executor/nodeSeqscan.h
+++ b/src/include/executor/nodeSeqscan.h
@@ -17,6 +17,7 @@
#include "access/parallel.h"
#include "nodes/execnodes.h"
+extern void ExecPrepSeqScan(SeqScan *node, ExecPrepContext *context, ExecPrepOutput *result);
extern SeqScanState *ExecInitSeqScan(SeqScan *node, EState *estate, int eflags);
extern void ExecEndSeqScan(SeqScanState *node);
extern void ExecReScanSeqScan(SeqScanState *node);
diff --git a/src/include/executor/nodeSetOp.h b/src/include/executor/nodeSetOp.h
index a504cf8613..bc80011513 100644
--- a/src/include/executor/nodeSetOp.h
+++ b/src/include/executor/nodeSetOp.h
@@ -16,6 +16,7 @@
#include "nodes/execnodes.h"
+extern void ExecPrepSetOp(SetOp *node, ExecPrepContext *context, ExecPrepOutput *result);
extern SetOpState *ExecInitSetOp(SetOp *node, EState *estate, int eflags);
extern void ExecEndSetOp(SetOpState *node);
extern void ExecReScanSetOp(SetOpState *node);
diff --git a/src/include/executor/nodeSort.h b/src/include/executor/nodeSort.h
index 008e6a6bc6..def930a8bc 100644
--- a/src/include/executor/nodeSort.h
+++ b/src/include/executor/nodeSort.h
@@ -17,6 +17,7 @@
#include "access/parallel.h"
#include "nodes/execnodes.h"
+extern void ExecPrepSort(Sort *node, ExecPrepContext *context, ExecPrepOutput *result);
extern SortState *ExecInitSort(Sort *node, EState *estate, int eflags);
extern void ExecEndSort(SortState *node);
extern void ExecSortMarkPos(SortState *node);
diff --git a/src/include/executor/nodeSubplan.h b/src/include/executor/nodeSubplan.h
index 75cc6d5104..f6e21007fa 100644
--- a/src/include/executor/nodeSubplan.h
+++ b/src/include/executor/nodeSubplan.h
@@ -16,6 +16,7 @@
#include "nodes/execnodes.h"
+extern void ExecPrepSubPlan(SubPlan *node, ExecPrepContext *context, ExecPrepOutput *result);
extern SubPlanState *ExecInitSubPlan(SubPlan *subplan, PlanState *parent);
extern Datum ExecSubPlan(SubPlanState *node, ExprContext *econtext, bool *isNull);
diff --git a/src/include/executor/nodeSubqueryscan.h b/src/include/executor/nodeSubqueryscan.h
index a09e2be423..3fbf053e04 100644
--- a/src/include/executor/nodeSubqueryscan.h
+++ b/src/include/executor/nodeSubqueryscan.h
@@ -16,6 +16,7 @@
#include "nodes/execnodes.h"
+extern void ExecPrepSubqueryScan(SubqueryScan *node, ExecPrepContext *context, ExecPrepOutput *result);
extern SubqueryScanState *ExecInitSubqueryScan(SubqueryScan *node, EState *estate, int eflags);
extern void ExecEndSubqueryScan(SubqueryScanState *node);
extern void ExecReScanSubqueryScan(SubqueryScanState *node);
diff --git a/src/include/executor/nodeTableFuncscan.h b/src/include/executor/nodeTableFuncscan.h
index 2b82e7d7ed..ba2e7774f1 100644
--- a/src/include/executor/nodeTableFuncscan.h
+++ b/src/include/executor/nodeTableFuncscan.h
@@ -16,6 +16,7 @@
#include "nodes/execnodes.h"
+extern void ExecPrepTableFuncScan(TableFuncScan *node, ExecPrepContext *context, ExecPrepOutput *result);
extern TableFuncScanState *ExecInitTableFuncScan(TableFuncScan *node, EState *estate, int eflags);
extern void ExecEndTableFuncScan(TableFuncScanState *node);
extern void ExecReScanTableFuncScan(TableFuncScanState *node);
diff --git a/src/include/executor/nodeTidrangescan.h b/src/include/executor/nodeTidrangescan.h
index f122e09583..333cfbb5c6 100644
--- a/src/include/executor/nodeTidrangescan.h
+++ b/src/include/executor/nodeTidrangescan.h
@@ -16,6 +16,7 @@
#include "nodes/execnodes.h"
+extern void ExecPrepTidRangeScan(TidRangeScan *node, ExecPrepContext *context, ExecPrepOutput *result);
extern TidRangeScanState *ExecInitTidRangeScan(TidRangeScan *node,
EState *estate, int eflags);
extern void ExecEndTidRangeScan(TidRangeScanState *node);
diff --git a/src/include/executor/nodeTidscan.h b/src/include/executor/nodeTidscan.h
index 91a5f89f42..188f3f3f97 100644
--- a/src/include/executor/nodeTidscan.h
+++ b/src/include/executor/nodeTidscan.h
@@ -16,6 +16,7 @@
#include "nodes/execnodes.h"
+extern void ExecPrepTidScan(TidScan *node, ExecPrepContext *context, ExecPrepOutput *result);
extern TidScanState *ExecInitTidScan(TidScan *node, EState *estate, int eflags);
extern void ExecEndTidScan(TidScanState *node);
extern void ExecReScanTidScan(TidScanState *node);
diff --git a/src/include/executor/nodeUnique.h b/src/include/executor/nodeUnique.h
index 61f09d9853..970e894681 100644
--- a/src/include/executor/nodeUnique.h
+++ b/src/include/executor/nodeUnique.h
@@ -16,6 +16,7 @@
#include "nodes/execnodes.h"
+extern void ExecPrepUnique(Unique *node, ExecPrepContext *context, ExecPrepOutput *result);
extern UniqueState *ExecInitUnique(Unique *node, EState *estate, int eflags);
extern void ExecEndUnique(UniqueState *node);
extern void ExecReScanUnique(UniqueState *node);
diff --git a/src/include/executor/nodeValuesscan.h b/src/include/executor/nodeValuesscan.h
index 07c13ef123..f08bb080eb 100644
--- a/src/include/executor/nodeValuesscan.h
+++ b/src/include/executor/nodeValuesscan.h
@@ -16,6 +16,7 @@
#include "nodes/execnodes.h"
+extern void ExecPrepValuesScan(ValuesScan *node, ExecPrepContext *context, ExecPrepOutput *result);
extern ValuesScanState *ExecInitValuesScan(ValuesScan *node, EState *estate, int eflags);
extern void ExecEndValuesScan(ValuesScanState *node);
extern void ExecReScanValuesScan(ValuesScanState *node);
diff --git a/src/include/executor/nodeWindowAgg.h b/src/include/executor/nodeWindowAgg.h
index 4e62c8936d..a4d8487aba 100644
--- a/src/include/executor/nodeWindowAgg.h
+++ b/src/include/executor/nodeWindowAgg.h
@@ -16,6 +16,7 @@
#include "nodes/execnodes.h"
+extern void ExecPrepWindowAgg(WindowAgg *node, ExecPrepContext *context, ExecPrepOutput *result);
extern WindowAggState *ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags);
extern void ExecEndWindowAgg(WindowAggState *node);
extern void ExecReScanWindowAgg(WindowAggState *node);
diff --git a/src/include/executor/nodeWorktablescan.h b/src/include/executor/nodeWorktablescan.h
index 17842de576..5f7f76ec85 100644
--- a/src/include/executor/nodeWorktablescan.h
+++ b/src/include/executor/nodeWorktablescan.h
@@ -16,6 +16,7 @@
#include "nodes/execnodes.h"
+extern void ExecPrepWorkTableScan(WorkTableScan *node, ExecPrepContext *context, ExecPrepOutput *result);
extern WorkTableScanState *ExecInitWorkTableScan(WorkTableScan *node, EState *estate, int eflags);
extern void ExecEndWorkTableScan(WorkTableScanState *node);
extern void ExecReScanWorkTableScan(WorkTableScanState *node);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index dd95dc40c7..7b03f46966 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -570,6 +570,8 @@ typedef struct EState
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ struct ExecPrepOutput *es_execprep; /* link to ExecPrepOutput, if one was
+ * passed to ExecutorStart() */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
@@ -958,6 +960,82 @@ typedef struct DomainConstraintState
*/
typedef TupleTableSlot *(*ExecProcNodeMtd) (struct PlanState *pstate);
+/*----------------
+ * ExecPrepContext
+ *
+ * Context information for performing ExecutorPrep() on a given plan
+ */
+typedef struct ExecPrepContext
+{
+ NodeTag type;
+
+ PlannedStmt *stmt; /* target plan */
+ ParamListInfo params; /* EXTERN parameters to prune with */
+} ExecPrepContext;
+
+/*----------------
+ * ExecPrepOutput
+ *
+ * Result of performing ExecutorPrep() for a given PlannedStmt
+ */
+typedef struct ExecPrepOutput
+{
+ NodeTag type;
+
+ Bitmapset *relationRTIs; /* RT indexes of RTE_RELATIONs */
+ int numPlanNodes; /* PlannedStmt.numPlanNodes */
+
+ /*
+ * Array of 'numPlanNodes' elements containing PlanPrepOutput nodes
+ * for each node in the plan tree, indexed using the node's plan_node_id.
+ * A NULL value means that the corresponding plan node does not have a
+ * PlanPrepOutput associated with it.
+ */
+ struct PlanPrepOutput **planPrepResults;
+} ExecPrepOutput;
+
+#define ExecPrepStorePlanPrepOutput(execprep, planPrepResult, plannode) \
+ (execprep)->planPrepResults[(plannode)->plan_node_id] = (planPrepResult)
+
+#define ExecPrepFetchPlanPrepOutput(execprep, plannode) \
+ ((execprep) != NULL ? \
+ (execprep)->planPrepResults[(plannode)->plan_node_id] : NULL)
+
+#ifdef USE_ASSERT_CHECKING
+#define EXEC_PREP_OUTPUT_SANITY(plannode, estate) \
+ do { \
+ PlanPrepOutput *planPrepOutput = \
+ ExecPrepFetchPlanPrepOutput((estate)->es_execprep, plannode); \
+ Assert(planPrepOutput == NULL || \
+ (IsA(planPrepOutput, PlanPrepOutput) && \
+ planPrepOutput->plan_node_id == (plannode)->plan_node_id)); \
+ } while (0)
+#else
+#define EXEC_PREP_OUTPUT_SANITY(plannode, estate)
+#endif
+
+/* ---------------
+ * PlanPrepOutput
+ *
+ * ExecutorPrep() creates a node of this type for every node in the Plan tree
+ * that does some "prep" work.
+ */
+typedef struct PlanPrepOutput
+{
+ NodeTag type;
+
+ int plan_node_id; /* associated Plan node */
+
+ /* Information collected by ExecPrepNode subroutine for the node */
+
+ /*
+ * For nodes that contain a list of prunable subnodes, the following
+ * contains offsets into that list, of the subnodes that survive initial
+ * partition pruning.
+ */
+ Bitmapset *initially_valid_subnodes;
+} PlanPrepOutput;
+
/* ----------------
* PlanState node
*
diff --git a/src/include/nodes/nodeFuncs.h b/src/include/nodes/nodeFuncs.h
index 93c60bde66..fca107ad65 100644
--- a/src/include/nodes/nodeFuncs.h
+++ b/src/include/nodes/nodeFuncs.h
@@ -158,5 +158,8 @@ extern bool raw_expression_tree_walker(Node *node, bool (*walker) (),
struct PlanState;
extern bool planstate_tree_walker(struct PlanState *planstate, bool (*walker) (),
void *context);
+struct Plan;
+extern bool plan_tree_walker(struct Plan *plan, bool (*walker) (),
+ void *context);
#endif /* NODEFUNCS_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index da35f2c272..8db017a138 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -96,6 +96,11 @@ typedef enum NodeTag
T_PartitionPruneStepCombine,
T_PlanInvalItem,
+ /* TAGS FOR EXECUTOR PREP NODES (execnodes.h) */
+ T_ExecPrepContext,
+ T_ExecPrepOutput,
+ T_PlanPrepOutput,
+
/*
* TAGS FOR PLAN STATE NODES (execnodes.h)
*
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 1f3845b3fe..ffde93ef13 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -101,6 +101,9 @@ typedef struct PlannerGlobal
List *finalrtable; /* "flat" rangetable for executor */
+ Bitmapset *relationRTIs; /* Indexes of RTE_RELATION entries in range
+ * table */
+
List *finalrowmarks; /* "flat" list of PlanRowMarks */
List *resultRelations; /* "flat" list of integer RT indexes */
@@ -129,6 +132,9 @@ typedef struct PlannerGlobal
char maxParallelHazard; /* worst PROPARALLEL hazard level */
+ bool usesPreExecPruning; /* Do some Plan nodes use pre-execution
+ * partition pruning */
+
PartitionDirectory partition_directory; /* partition descriptors */
} PlannerGlobal;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 0b518ce6b2..69bc5f918c 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -59,12 +59,20 @@ typedef struct PlannedStmt
bool parallelModeNeeded; /* parallel mode required to execute? */
+ bool usesPreExecPruning; /* Do some Plan nodes use pre-execution
+ * partition pruning */
+
int jitFlags; /* which forms of JIT should be performed */
struct Plan *planTree; /* tree of Plan nodes */
+ int numPlanNodes; /* number of nodes in planTree */
+
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *relationRTIs; /* Indexes of RTE_RELATION entries in range
+ * table */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
List *resultRelations; /* integer list of RT indexes, or NIL */
@@ -1172,6 +1180,13 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * contains_init_steps Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its initial_pruning_steps set?
+ *
+ * contains_exec_steps Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its exec_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1180,6 +1195,8 @@ typedef struct PartitionPruneInfo
{
NodeTag type;
List *prune_infos;
+ bool contains_init_steps;
+ bool contains_exec_steps;
Bitmapset *other_subplans;
} PartitionPruneInfo;
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index ee11b6feae..90684efa25 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -41,6 +41,7 @@ struct RelOptInfo;
* subsidiary data, such as the FmgrInfos.
* planstate Points to the parent plan node's PlanState when called
* during execution; NULL when called from the planner.
+ * exprcontext ExprContext to use when evaluating pruning expressions
* exprstates Array of ExprStates, indexed as per PruneCxtStateIdx; one
* for each partition key in each pruning step. Allocated if
* planstate is non-NULL, otherwise NULL.
@@ -56,6 +57,7 @@ typedef struct PartitionPruneContext
FmgrInfo *stepcmpfuncs;
MemoryContext ppccontext;
PlanState *planstate;
+ ExprContext *exprcontext;
ExprState **exprstates;
} PartitionPruneContext;
diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h
index 15a11bc3ff..02124af4ed 100644
--- a/src/include/tcop/tcopprot.h
+++ b/src/include/tcop/tcopprot.h
@@ -59,7 +59,7 @@ extern PlannedStmt *pg_plan_query(Query *querytree, const char *query_string,
ParamListInfo boundParams);
extern List *pg_plan_queries(List *querytrees, const char *query_string,
int cursorOptions,
- ParamListInfo boundParams);
+ ParamListInfo boundParams, List **stmt_execprep_list);
extern bool check_max_stack_depth(int *newval, void **extra, GucSource source);
extern void assign_max_stack_depth(int newval, void *extra);
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 95b99e3d25..14794972a0 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -148,6 +148,9 @@ typedef struct CachedPlan
{
int magic; /* should equal CACHEDPLAN_MAGIC */
List *stmt_list; /* list of PlannedStmts */
+ List *stmt_execprep_list; /* list of ExecPrepOutput nodes with one
+ * element for each of stmt_list; NIL
+ * if not a generic plan */
bool is_oneshot; /* is it a "oneshot" plan? */
bool is_saved; /* is CachedPlan in a long-lived context? */
bool is_valid; /* is the stmt_list currently valid? */
@@ -158,6 +161,8 @@ typedef struct CachedPlan
int generation; /* parent's generation number for this plan */
int refcount; /* count of live references to this struct */
MemoryContext context; /* context containing this CachedPlan */
+ MemoryContext execprep_context; /* context containing stmt_execprep_list,
+ * a child of the above context */
} CachedPlan;
/*
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aeddbdafe5..03c39ff97a 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -137,6 +137,10 @@ typedef struct PortalData
CommandTag commandTag; /* command tag for original query */
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
+ List *stmt_execpreps; /* list of ExecPrepOutput nodes with one element
+ * for each of 'stmts'; same as
+ * cplan->stmt_execprep_list if cplan is
+ * not NULL */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
ParamListInfo portalParams; /* params to pass to query */
@@ -241,6 +245,7 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *stmt_execpreps,
CachedPlan *cplan);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
--
2.24.1
On Thu, Feb 10, 2022 at 3:14 AM Amit Langote <amitlangote09@gmail.com> wrote:
Maybe this should be more than one patch? Say:
0001 to add ExecutorPrep and the boilerplate,
0002 to teach plancache.c to use the new facility
Could be, not sure. I agree that if it's possible to split this in a
meaningful way, it would facilitate review. I notice that there is
some straight code movement e.g. the creation of
ExecPartitionPruneFixSubPlanIndexes. It would be best, I think, to do
pure code movement in a preparatory patch so that the main patch is
just adding the new stuff we need and not moving stuff around.
David Rowley recently proposed a patch for some parallel-safety
debugging cross checks which added a plan tree walker. I'm not sure
whether he's going to press that patch forward to commit, but I think
we should get something like that into the tree and start using it,
rather than adding more bespoke code. Maybe you/we should steal that
part of his patch and commit it separately. What I'm imagining is that
plan_tree_walker() would know which nodes have subnodes and how to
recurse over the tree structure, and you'd have a walker function to
use with it that would know which executor nodes have ExecPrep
functions and call them, and just do nothing for the others. That
would spare you adding stub functions for nodes that don't need to do
anything, or don't need to do anything other than recurse. Admittedly
it would look a bit different from the existing executor phases, but
I'd argue that it's a better coding model.
Actually, you might've had this in the patch at some point, because
you have a declaration for plan_tree_walker but no implementation. I
guess one thing that's a bit awkward about this idea is that in some
cases you want to recurse to some subnodes but not other subnodes. But
maybe it would work to put the recursion in the walker function in
that case, and then just return true; but if you want to walk all
children, return false.
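To make that concrete, here's a rough sketch of such a walker. The wrapper struct and all of the names below are illustrative, not taken from any posted patch; I'm only assuming the ExecPrep* signatures shown in the headers upthread:

	/*
	 * Hypothetical bundle of the two arguments the ExecPrep* functions
	 * take, so both can travel through the walker's single context
	 * pointer.
	 */
	typedef struct ExecPrepWalkerContext
	{
		ExecPrepContext *context;	/* per-plan prep context */
		ExecPrepOutput *output;		/* where prep results accumulate */
	} ExecPrepWalkerContext;

	static bool
	exec_prep_walker(Plan *plan, void *context)
	{
		ExecPrepWalkerContext *pcxt = (ExecPrepWalkerContext *) context;

		switch (nodeTag(plan))
		{
			case T_Append:
				/* does its own selective recursion into surviving subplans */
				ExecPrepAppend((Append *) plan, pcxt->context, pcxt->output);
				return false;
			case T_MergeAppend:
				ExecPrepMergeAppend((MergeAppend *) plan, pcxt->context,
									pcxt->output);
				return false;
			default:
				/* nothing to prep for this node type; walk all children */
				return plan_tree_walker(plan, exec_prep_walker, context);
		}
	}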
+ bool contains_init_steps;
+ bool contains_exec_steps;
s/steps/pruning/? maybe with contains -> needs or performs or requires as well?
+ * Returned information includes the set of RT indexes of relations referenced
+ * in the plan, and a PlanPrepOutput node for each node in the planTree if the
+ * node type supports producing one.
Aren't all RT indexes referenced in the plan?
+ * This may lock relations whose information may be used to produce the
+ * PlanPrepOutput nodes. For example, a partitioned table before perusing its
+ * PartitionPruneInfo contained in an Append node to do the pruning the result
+ * of which is used to populate the Append node's PlanPrepOutput.
"may lock" feels awfully fuzzy to me. How am I supposed to rely on
something that "may" happen? And don't we need to have tight logic
around locking, with specific guarantees about what is locked at which
points in the code and what is not?
+ * At least one of 'planstate' or 'econtext' must be passed to be able to
+ * successfully evaluate any non-Const expressions contained in the
+ * steps.
This also seems fuzzy. If I'm thinking of calling this function, I
don't know how I'd know whether this criterion is met.
I don't love PlanPrepOutput the way you have it. I think one of the
basic design issues for this patch is: should we think of the prep
phase as specifically pruning, or is it general prep and pruning is
the first thing for which we're going to use it? If it's really a
pre-pruning phase, we could name it that way instead of calling it
"prep". If it's really a general prep phase, then why does
PlanPrepOutput contain initially_valid_subnodes as a field? One could
imagine letting each prep function decide what kind of prep node it
would like to return, with partition pruning being just one of the
options. But is that a useful generalization of the basic concept, or
just pretending that a special-purpose mechanism is more general than
it really is?
+ return CreateQueryDesc(pstmt, NULL, /* XXX pass ExecPrepOutput too? */
It seems to me that we should do what the XXX suggests. It doesn't
seem nice if the parallel workers could theoretically decide to prune
a different set of nodes than the leader.
+ * known at executor startup (excludeing expressions containing
Extra e.
+ * into subplan indexes, is also returned for use during subsquent
Missing e.
Somewhere, we're going to need to document the idea that this may
permit us to execute a plan that isn't actually fully valid, but that
we expect to survive because we'll never do anything with the parts of
it that aren't. Maybe that should be added to the executor README, or
maybe there's some better place, but I don't think that should remain
something that's just implicit.
This is not a full review, just some initial thoughts looking through this.
--
Robert Haas
EDB: http://www.enterprisedb.com
Hi,
On 2022-02-10 17:13:52 +0900, Amit Langote wrote:
The attached patch implements this idea. Sorry for the delay in
getting this out and thanks to Robert for the off-list discussions on
this.
I did not follow this thread at all. And I only skimmed the patch. So I'm
probably wrong.
I'm wary of this increasing executor overhead even in cases it won't
help. Without this patch, for simple queries, I see small allocations
noticeably in profiles. This adds a bunch more, even if
!context->stmt->usesPreExecPruning:
- makeNode(ExecPrepContext)
- makeNode(ExecPrepOutput)
- palloc0(sizeof(PlanPrepOutput *) * result->numPlanNodes)
- stmt_execprep_list = lappend(stmt_execprep_list, execprep);
- AllocSetContextCreate(CurrentMemoryContext,
"CachedPlan execprep list", ...
- ...
That's a lot of extra for something that's already a bottleneck.
Greetings,
Andres Freund
(just catching up on this thread)
On Thu, 13 Jan 2022 at 07:20, Robert Haas <robertmhaas@gmail.com> wrote:
Yeah. I don't think it's only non-core code we need to worry about
either. What if I just do EXPLAIN ANALYZE on a prepared query that
ends up pruning away some stuff? IIRC, the pruned subplans are not
shown, so we might escape disaster here, but FWIW if I'd committed
that code I would have pushed hard for showing those and saying "(not
executed)" .... so it's not too crazy to imagine a world in which
things work that way.
FWIW, that would remove the whole point in init run-time pruning. The
reason I made two phases of run-time pruning was so that we could get
away from having the init plan overhead of nodes we'll never need to
scan. If we wanted to show the (never executed) scans in EXPLAIN then
we'd need to do the init plan part and allocate all that memory
needlessly.
Imagine a hash partitioned table on "id" with 1000 partitions. The user does:
PREPARE q1 (INT) AS SELECT * FROM parttab WHERE id = $1;
EXECUTE q1(123);
Assuming a generic plan, if we didn't have init pruning then we have
to build a plan containing the scans for all 1000 partitions. There's
significant overhead to that compared to just locking the partitions,
and initialising 1 scan.
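That's visible in how ExecInitAppend consumes the pruning result today -- roughly, simplified from nodeAppend.c, where validsubplans is the bitmapset of subplans that survived initial pruning:

	/* Initialize only the subplans that survived initial pruning. */
	j = 0;
	i = -1;
	while ((i = bms_next_member(validsubplans, i)) >= 0)
	{
		Plan	   *initNode = (Plan *) list_nth(node->appendplans, i);

		appendplanstates[j++] = ExecInitNode(initNode, estate, eflags);
	}

With 1000 hash partitions and id = $1, that loop runs once instead of 1000 times.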
If it worked this way then we'd be even further from Amit's goal of
reducing the overhead of starting plan with run-time pruning nodes.
I understood at the time it was just the EXPLAIN output that you had
concerns with. I thought that was just around the lack of any display
of the condition we used for pruning.
David
On Sun, Feb 13, 2022 at 4:55 PM David Rowley <dgrowleyml@gmail.com> wrote:
FWIW, that would remove the whole point in init run-time pruning. The
reason I made two phases of run-time pruning was so that we could get
away from having the init plan overhead of nodes we'll never need to
scan. If we wanted to show the (never executed) scans in EXPLAIN then
we'd need to do the init plan part and allocate all that memory
needlessly.
Interesting. I didn't realize that was why it had ended up like this.
I understood at the time it was just the EXPLAIN output that you had
concerns with. I thought that was just around the lack of any display
of the condition we used for pruning.
That was part of it, but I did think it was surprising that we didn't
print anything at all about the nodes we pruned, too. Although we're
technically iterating over the PlanState, from the user perspective it
feels like you're asking PostgreSQL to print out the plan - so it
seems weird to have nodes in the Plan tree that are quietly omitted
from the output. That said, perhaps in retrospect it's good that it
ended up as it did, since we'd have a lot of trouble printing anything
sensible for a scan of a table that's since been dropped.
--
Robert Haas
EDB: http://www.enterprisedb.com
Hi Andres,
On Fri, Feb 11, 2022 at 10:29 AM Andres Freund <andres@anarazel.de> wrote:
On 2022-02-10 17:13:52 +0900, Amit Langote wrote:
The attached patch implements this idea. Sorry for the delay in
getting this out and thanks to Robert for the off-list discussions on
this.
I did not follow this thread at all. And I only skimmed the patch. So I'm
probably wrong.
Thanks for your interest in this and sorry about the delay in replying
(have been away due to illness).
I'm wary of this increasing executor overhead even in cases it won't
help. Without this patch, for simple queries, I see small allocations
noticeably in profiles. This adds a bunch more, even if
!context->stmt->usesPreExecPruning:
Ah, if any new stuff added by the patch runs in
!context->stmt->usesPreExecPruning paths, then it's just poor coding
on my part, which I'm now looking to fix. Maybe not all of it is
avoidable, but I think whatever isn't should be trivial...
- makeNode(ExecPrepContext)
- makeNode(ExecPrepOutput)
- palloc0(sizeof(PlanPrepOutput *) * result->numPlanNodes)
- stmt_execprep_list = lappend(stmt_execprep_list, execprep);
- AllocSetContextCreate(CurrentMemoryContext,
"CachedPlan execprep list", ...
- ...
That's a lot of extra for something that's already a bottleneck.
If all these allocations are limited to the usesPreExecPruning path,
IMO, they would amount to trivial overhead compared to what is going
to be avoided -- locking say 1000 partitions when only 1 will be
scanned. Although, maybe there's a way to code this to have even less
overhead than what's in the patch now.
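To illustrate the kind of early exit I have in mind -- only usesPreExecPruning, numPlanNodes, and planPrepResults come from the patch; the function shape itself is just a sketch:

	ExecPrepOutput *
	ExecutorPrep(PlannedStmt *stmt, ParamListInfo params)
	{
		ExecPrepContext *context;
		ExecPrepOutput *result;

		/* No initial pruning anywhere in the tree: allocate nothing. */
		if (!stmt->usesPreExecPruning)
			return NULL;

		context = makeNode(ExecPrepContext);
		context->stmt = stmt;
		context->params = params;

		result = makeNode(ExecPrepOutput);
		result->numPlanNodes = stmt->numPlanNodes;
		result->planPrepResults = (PlanPrepOutput **)
			palloc0(sizeof(PlanPrepOutput *) * result->numPlanNodes);

		/* ... walk the plan tree performing the initial pruning steps ... */

		return result;
	}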
--
Amit Langote
EDB: http://www.enterprisedb.com
On Fri, Feb 11, 2022 at 7:02 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Feb 10, 2022 at 3:14 AM Amit Langote <amitlangote09@gmail.com> wrote:
Maybe this should be more than one patch? Say:
0001 to add ExecutorPrep and the boilerplate,
0002 to teach plancache.c to use the new facility
Thanks for taking a look and sorry about the delay.
Could be, not sure. I agree that if it's possible to split this in a
meaningful way, it would facilitate review. I notice that there is
some straight code movement e.g. the creation of
ExecPartitionPruneFixSubPlanIndexes. It would be best, I think, to do
pure code movement in a preparatory patch so that the main patch is
just adding the new stuff we need and not moving stuff around.
Okay, created 0001 for moving around the execution pruning code.
David Rowley recently proposed a patch for some parallel-safety
debugging cross checks which added a plan tree walker. I'm not sure
whether he's going to press that patch forward to commit, but I think
we should get something like that into the tree and start using it,
rather than adding more bespoke code. Maybe you/we should steal that
part of his patch and commit it separately.
I looked at the thread you mentioned (I guess [1]/messages/by-id/b59605fecb20ba9ea94e70ab60098c237c870628.camel@postgrespro.ru), though it seems
David's proposing a path_tree_walker(), so I guess only useful within
the planner and not here.
What I'm imagining is that
plan_tree_walker() would know which nodes have subnodes and how to
recurse over the tree structure, and you'd have a walker function to
use with it that would know which executor nodes have ExecPrep
functions and call them, and just do nothing for the others. That
would spare you adding stub functions for nodes that don't need to do
anything, or don't need to do anything other than recurse. Admittedly
it would look a bit different from the existing executor phases, but
I'd argue that it's a better coding model.
Actually, you might've had this in the patch at some point, because
you have a declaration for plan_tree_walker but no implementation.
Right, the previous patch indeed used a plan_tree_walker() for this,
and I think in the way you seem to think it should work.
I do agree that plan_tree_walker() allows for a better implementation
of the idea of this patch and may also be generally useful, so I've
created a separate patch that adds it to nodeFuncs.c.
I guess one thing that's a bit awkward about this idea is that in some
cases you want to recurse to some subnodes but not other subnodes. But
maybe it would work to put the recursion in the walker function in
that case, and then just return true; but if you want to walk all
children, return false.
Right, that's how I've made ExecPrepAppend() etc. do it.
+ bool contains_init_steps;
+ bool contains_exec_steps;
s/steps/pruning/? maybe with contains -> needs or performs or requires as well?
Went with: needs_{init|exec}_pruning
+ * Returned information includes the set of RT indexes of relations referenced
+ * in the plan, and a PlanPrepOutput node for each node in the planTree if the
+ * node type supports producing one.
Aren't all RT indexes referenced in the plan?
Ah yes. How about:
* Returned information includes the set of RT indexes of relations that must
* be locked to safely execute the plan,
+ * This may lock relations whose information may be used to produce the
+ * PlanPrepOutput nodes. For example, a partitioned table before perusing its
+ * PartitionPruneInfo contained in an Append node to do the pruning the result
+ * of which is used to populate the Append node's PlanPrepOutput.
"may lock" feels awfully fuzzy to me. How am I supposed to rely on
something that "may" happen? And don't we need to have tight logic
around locking, with specific guarantees about what is locked at which
points in the code and what is not?
Agree the wording was fuzzy. I've rewrote as:
* This locks relations whose information is needed to produce the
* PlanPrepOutput nodes. For example, a partitioned table before perusing its
* PartitionedRelPruneInfo contained in an Append node to do the pruning, the
* result of which is used to populate the Append node's PlanPrepOutput.
BTW, I've added an Assert in ExecGetRangeTableRelation():
/*
* A cross-check that AcquireExecutorLocks() hasn't missed any relations
* it must not have.
*/
Assert(estate->es_execprep == NULL ||
bms_is_member(rti, estate->es_execprep->relationRTIs));
which IOW ensures that the actual execution of a plan only sees
relations that ExecutorPrep() would've told AcquireExecutorLocks() to
take a lock on.
+ * At least one of 'planstate' or 'econtext' must be passed to be able to
+ * successfully evaluate any non-Const expressions contained in the
+ * steps.
This also seems fuzzy. If I'm thinking of calling this function, I
don't know how I'd know whether this criterion is met.
OK, I have removed this comment (which was on top of a static local
function) in favor of adding some commentary on this in places where
it belongs. For example, in ExecPrepDoInitialPruning():
/*
* We don't yet have a PlanState for the parent plan node, so must create
* a standalone ExprContext to evaluate pruning expressions, equipped with
* the information about the EXTERN parameters that the caller passed us.
* Note that that's okay because the initial pruning steps do not
* involve anything that requires the execution to have started.
*/
econtext = CreateStandaloneExprContext();
econtext->ecxt_param_list_info = params;
prunestate = ExecCreatePartitionPruneState(NULL, pruneinfo,
true, false,
rtable, econtext,
pdir, parentrelids);
I don't love PlanPrepOutput the way you have it. I think one of the
basic design issues for this patch is: should we think of the prep
phase as specifically pruning, or is it general prep and pruning is
the first thing for which we're going to use it? If it's really a
pre-pruning phase, we could name it that way instead of calling it
"prep". If it's really a general prep phase, then why does
PlanPrepOutput contain initially_valid_subnodes as a field? One could
imagine letting each prep function decide what kind of prep node it
would like to return, with partition pruning being just one of the
options. But is that a useful generalization of the basic concept, or
just pretending that a special-purpose mechanism is more general than
it really is?
While it can feel like the latter TBH, I'm inclined to keep
ExecutorPrep generalized. What bothers me about the
alternative of calling the new phase something less generalized like
ExecutorDoInitPruning() is that it makes the somewhat elaborate API
changes needed for the phase's output to be put into QueryDesc, through
which it ultimately reaches the main executor, seem less worthwhile.
I agree that PlanPrepOutput design needs to be likewise generalized,
maybe like you suggest -- using PlanInitPruningOutput, a child class
of PlanPrepOutput, to return the prep output for plan nodes that
support pruning.
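Roughly, something like the following; only plan_node_id and initially_valid_subnodes come from the current patch, the rest is a guess at the layout, following the usual convention of embedding the parent node type as the first field:

	/* Common header; concrete prep output types embed this as first field. */
	typedef struct PlanPrepOutput
	{
		NodeTag		type;
		int			plan_node_id;	/* associated Plan node */
	} PlanPrepOutput;

	/* Prep output for plan nodes that perform initial pruning. */
	typedef struct PlanInitPruningOutput
	{
		PlanPrepOutput pout;		/* common fields */

		/* offsets of the subnodes surviving initial pruning */
		Bitmapset  *initially_valid_subnodes;
	} PlanInitPruningOutput;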
Thoughts?
+ return CreateQueryDesc(pstmt, NULL, /* XXX pass ExecPrepOutput too? */
It seems to me that we should do what the XXX suggests. It doesn't
seem nice if the parallel workers could theoretically decide to prune
a different set of nodes than the leader.
OK, will fix.
+ * known at executor startup (excludeing expressions containing
Extra e.
+ * into subplan indexes, is also returned for use during subsquent
Missing e.
Will fix.
Somewhere, we're going to need to document the idea that this may
permit us to execute a plan that isn't actually fully valid, but that
we expect to survive because we'll never do anything with the parts of
it that aren't. Maybe that should be added to the executor README, or
maybe there's some better place, but I don't think that should remain
something that's just implicit.
Agreed. I'd added a description of the new prep phase to executor
README, though the text didn't mention this particular bit. Will fix
to mention it.
This is not a full review, just some initial thoughts looking through this.
Thanks again. Will post a new version soon after a bit more polishing.
--
Amit Langote
EDB: http://www.enterprisedb.com
[1]: /messages/by-id/b59605fecb20ba9ea94e70ab60098c237c870628.camel@postgrespro.ru
On Mon, Mar 7, 2022 at 11:18 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Fri, Feb 11, 2022 at 7:02 AM Robert Haas <robertmhaas@gmail.com> wrote:
I don't love PlanPrepOutput the way you have it. I think one of the
basic design issues for this patch is: should we think of the prep
phase as specifically pruning, or is it general prep and pruning is
the first thing for which we're going to use it? If it's really a
pre-pruning phase, we could name it that way instead of calling it
"prep". If it's really a general prep phase, then why does
PlanPrepOutput contain initially_valid_subnodes as a field? One could
imagine letting each prep function decide what kind of prep node it
would like to return, with partition pruning being just one of the
options. But is that a useful generalization of the basic concept, or
just pretending that a special-purpose mechanism is more general than
it really is?
While it can feel like the latter TBH, I'm inclined to keep
ExecutorPrep generalized. What bothers me about the
alternative of calling the new phase something less generalized like
ExecutorDoInitPruning() is that it makes the somewhat elaborate API
changes needed for the phase's output to be put into QueryDesc, through
which it ultimately reaches the main executor, seem less worthwhile.
I agree that PlanPrepOutput design needs to be likewise generalized,
maybe like you suggest -- using PlanInitPruningOutput, a child class
of PlanPrepOutput, to return the prep output for plan nodes that
support pruning.
Thoughts?
So I decided to agree with you after all about limiting the scope of
this new executor interface, or IOW call it what it is.
I have named it ExecutorGetLockRels() to go with the only use case we
know for it -- get the set of relations for AcquireExecutorLocks() to
lock to validate a plan tree. Its result is returned in a node named
ExecLockRelsInfo, which contains the set of relations scanned in the
plan tree (lockrels) and a list of PlanInitPruningOutput nodes for all
nodes that undergo pruning.
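Roughly, the node looks like this -- a simplified sketch where field names other than lockrels are approximations of what's in the patch:

	/* Result of ExecutorGetLockRels() for one PlannedStmt. */
	typedef struct ExecLockRelsInfo
	{
		NodeTag		type;

		/* RT indexes of relations that AcquireExecutorLocks() must lock */
		Bitmapset  *lockrels;

		/* PlanInitPruningOutput nodes, one per plan node that was pruned */
		List	   *initPruningOutputs;
	} ExecLockRelsInfo;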
+ return CreateQueryDesc(pstmt, NULL, /* XXX pass ExecPrepOutput too? */
It seems to me that we should do what the XXX suggests. It doesn't
seem nice if the parallel workers could theoretically decide to prune
a different set of nodes than the leader.
OK, will fix.
Done. This required adding nodeToString() and stringToNode() support
for the nodes produced by the new executor function that wasn't there
before.
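That follows the pattern execParallel.c already uses for shipping the PlannedStmt itself to workers -- roughly, and with a made-up TOC key name:

	/* Leader: flatten the node tree and stash it in the parallel DSM. */
	char	   *str = nodeToString(execlockrelsinfo);
	char	   *space = shm_toc_allocate(pcxt->toc, strlen(str) + 1);

	memcpy(space, str, strlen(str) + 1);
	shm_toc_insert(pcxt->toc, PARALLEL_KEY_EXECLOCKRELSINFO, space);

	/* Worker: look it up and rebuild the node tree. */
	char	   *serialized = shm_toc_lookup(toc, PARALLEL_KEY_EXECLOCKRELSINFO,
											false);
	ExecLockRelsInfo *info = (ExecLockRelsInfo *) stringToNode(serialized);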
Somewhere, we're going to need to document the idea that this may
permit us to execute a plan that isn't actually fully valid, but that
we expect to survive because we'll never do anything with the parts of
it that aren't. Maybe that should be added to the executor README, or
maybe there's some better place, but I don't think that should remain
something that's just implicit.
Agreed. I'd added a description of the new prep phase to executor
README, though the text didn't mention this particular bit. Will fix
to mention it.
Rewrote the comments above ExecutorGetLockRels() (previously
ExecutorPrep()) and the executor README text to be explicit about the
fact that not locking some relations effectively invalidates pruned
parts of the plan tree.
This is not a full review, just some initial thoughts looking through this.
Thanks again. Will post a new version soon after a bit more polishing.
Attached is v5, now broken into 3 patches:
0001: Some refactoring of runtime pruning code
0002: Add a plan_tree_walker
0003: Teach AcquireExecutorLocks to skip locking pruned relations
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v5-0002-Add-a-plan_tree_walker.patchapplication/octet-stream; name=v5-0002-Add-a-plan_tree_walker.patchDownload
From 22ff31c7b052eabb32f4a529c48fe48180332156 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Thu, 3 Mar 2022 16:04:13 +0900
Subject: [PATCH v5 2/3] Add a plan_tree_walker()
Like planstate_tree_walker() but for uninitialized plan trees.
---
src/backend/nodes/nodeFuncs.c | 116 ++++++++++++++++++++++++++++++++++
src/include/nodes/nodeFuncs.h | 3 +
2 files changed, 119 insertions(+)
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 47d0564fa2..cdf937f127 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -31,6 +31,10 @@ static bool planstate_walk_subplans(List *plans, bool (*walker) (),
void *context);
static bool planstate_walk_members(PlanState **planstates, int nplans,
bool (*walker) (), void *context);
+static bool plan_walk_subplans(List *plans,
+ bool (*walker) (),
+ void *context);
+static bool plan_walk_members(List *plans, bool (*walker) (), void *context);
/*
@@ -4148,3 +4152,115 @@ planstate_walk_members(PlanState **planstates, int nplans,
return false;
}
+
+/*
+ * plan_tree_walker --- walk plantrees
+ *
+ * The walker has already visited the current node, and so we need only
+ * recurse into any sub-nodes it has.
+ */
+bool
+plan_tree_walker(Plan *plan,
+ bool (*walker) (),
+ void *context)
+{
+ /* Guard against stack overflow due to overly complex plan trees */
+ check_stack_depth();
+
+ /* initPlan-s */
+ if (plan_walk_subplans(plan->initPlan, walker, context))
+ return true;
+
+ /* lefttree */
+ if (outerPlan(plan))
+ {
+ if (walker(outerPlan(plan), context))
+ return true;
+ }
+
+ /* righttree */
+ if (innerPlan(plan))
+ {
+ if (walker(innerPlan(plan), context))
+ return true;
+ }
+
+ /* special child plans */
+ switch (nodeTag(plan))
+ {
+ case T_Append:
+ if (plan_walk_members(((Append *) plan)->appendplans,
+ walker, context))
+ return true;
+ break;
+ case T_MergeAppend:
+ if (plan_walk_members(((MergeAppend *) plan)->mergeplans,
+ walker, context))
+ return true;
+ break;
+ case T_BitmapAnd:
+ if (plan_walk_members(((BitmapAnd *) plan)->bitmapplans,
+ walker, context))
+ return true;
+ break;
+ case T_BitmapOr:
+ if (plan_walk_members(((BitmapOr *) plan)->bitmapplans,
+ walker, context))
+ return true;
+ break;
+ case T_CustomScan:
+ if (plan_walk_members(((CustomScan *) plan)->custom_plans,
+ walker, context))
+ return true;
+ break;
+ case T_SubqueryScan:
+ if (walker(((SubqueryScan *) plan)->subplan, context))
+ return true;
+ break;
+ default:
+ break;
+ }
+
+ return false;
+}
+
+/*
+ * Walk a list of SubPlans (or initPlans, which also use SubPlan nodes).
+ */
+static bool
+plan_walk_subplans(List *plans,
+ bool (*walker) (),
+ void *context)
+{
+ ListCell *lc;
+ PlannedStmt *plannedstmt = (PlannedStmt *) context;
+
+ foreach(lc, plans)
+ {
+ SubPlan *sp = lfirst_node(SubPlan, lc);
+ Plan *p = list_nth(plannedstmt->subplans, sp->plan_id - 1);
+
+ if (walker(p, context))
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * Walk the constituent plans of an Append, MergeAppend, BitmapAnd,
+ * BitmapOr, or CustomScan node.
+ */
+static bool
+plan_walk_members(List *plans, bool (*walker) (), void *context)
+{
+ ListCell *lc;
+
+ foreach(lc, plans)
+ {
+ if (walker(lfirst(lc), context))
+ return true;
+ }
+
+ return false;
+}
diff --git a/src/include/nodes/nodeFuncs.h b/src/include/nodes/nodeFuncs.h
index 93c60bde66..fca107ad65 100644
--- a/src/include/nodes/nodeFuncs.h
+++ b/src/include/nodes/nodeFuncs.h
@@ -158,5 +158,8 @@ extern bool raw_expression_tree_walker(Node *node, bool (*walker) (),
struct PlanState;
extern bool planstate_tree_walker(struct PlanState *planstate, bool (*walker) (),
void *context);
+struct Plan;
+extern bool plan_tree_walker(struct Plan *plan, bool (*walker) (),
+ void *context);
#endif /* NODEFUNCS_H */
--
2.24.1
v5-0003-Teach-AcquireExecutorLocks-to-skip-locking-pruned.patchapplication/octet-stream; name=v5-0003-Teach-AcquireExecutorLocks-to-skip-locking-pruned.patchDownload
From 62fd8ca887f62dcd89010bf4475529eb16f07d52 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v5 3/3] Teach AcquireExecutorLocks() to skip locking pruned
partitions
Instead of locking all relations listed in the range table, this
asks the new executor function ExecutorGetLockRels() to return a set
of relations (their RT indexes) to lock, or simply uses the set
given by PlannedStmt.lockrels. To wit, ExecutorGetLockRels() must be
called if some nodes in the plan tree contain initial pruning steps
(pruning steps containing expressions that can be computed before
the executor proper has started), which results in the lockrels
set being computed such that any subplans that are pruned as a result
of doing initial pruning do not contribute any relations to the set.
That can result in a much smaller lockrels set when the plan contains
thousands of child subplans, of which only a small number remain
after pruning.
The result of doing the initial pruning during ExecutorGetLockRels()
is preserved for use later during actual execution by creating a
new node called PlanInitPruningOutput for each plan node that
undergoes pruning; the set of those for the whole plan tree is
put into another new node, ExecLockRelsInfo, that represents the output
of a given ExecutorGetLockRels() invocation. ExecLockRelsInfos are
passed down to the executor alongside the PlannedStmts. This
arrangement ensures that the set of plan tree nodes that
AcquireExecutorLocks() has acquired locks to protect and the one
that the executor will initialize and execute are one and the same.
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 13 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 17 +-
src/backend/executor/README | 22 ++-
src/backend/executor/execMain.c | 181 +++++++++++++++++++
src/backend/executor/execParallel.c | 27 ++-
src/backend/executor/execPartition.c | 233 +++++++++++++++++++++----
src/backend/executor/execUtils.c | 8 +
src/backend/executor/functions.c | 2 +-
src/backend/executor/nodeAppend.c | 42 ++++-
src/backend/executor/nodeMergeAppend.c | 42 ++++-
src/backend/executor/nodeModifyTable.c | 24 +++
src/backend/executor/spi.c | 14 +-
src/backend/nodes/copyfuncs.c | 50 +++++-
src/backend/nodes/outfuncs.c | 41 +++++
src/backend/nodes/readfuncs.c | 38 ++++
src/backend/optimizer/plan/planner.c | 3 +
src/backend/optimizer/plan/setrefs.c | 10 ++
src/backend/partitioning/partprune.c | 37 +++-
src/backend/tcop/postgres.c | 15 +-
src/backend/tcop/pquery.c | 21 ++-
src/backend/utils/cache/plancache.c | 220 +++++++++++++++++++----
src/backend/utils/mmgr/portalmem.c | 2 +
src/include/commands/explain.h | 3 +-
src/include/executor/execPartition.h | 2 +
src/include/executor/execdesc.h | 2 +
src/include/executor/executor.h | 2 +
src/include/executor/nodeAppend.h | 1 +
src/include/executor/nodeMergeAppend.h | 1 +
src/include/executor/nodeModifyTable.h | 1 +
src/include/nodes/execnodes.h | 87 +++++++++
src/include/nodes/nodes.h | 5 +
src/include/nodes/pathnodes.h | 7 +
src/include/nodes/plannodes.h | 18 ++
src/include/tcop/tcopprot.h | 2 +-
src/include/utils/plancache.h | 5 +
src/include/utils/portal.h | 5 +
41 files changed, 1108 insertions(+), 109 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 55c38b04c4..d403eb2309 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -542,7 +542,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 9abbb6b555..f6607f2454 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index de81379da3..a9dc6d1755 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -407,7 +407,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
@@ -515,7 +515,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, ExecLockRelsInfo *execlockrelsinfo,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
@@ -563,7 +564,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, execlockrelsinfo, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 1013790dbb..008b8ce0e9 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -741,8 +741,10 @@ execute_sql_string(const char *sql)
RawStmt *parsetree = lfirst_node(RawStmt, lc1);
MemoryContext per_parsetree_context,
oldcontext;
- List *stmt_list;
- ListCell *lc2;
+ List *stmt_list,
+ *execlockrelsinfo_list;
+ ListCell *lc2,
+ *lc3;
/*
* We do the work for each parsetree in a short-lived context, to
@@ -762,11 +764,13 @@ execute_sql_string(const char *sql)
NULL,
0,
NULL);
- stmt_list = pg_plan_queries(stmt_list, sql, CURSOR_OPT_PARALLEL_OK, NULL);
+ stmt_list = pg_plan_queries(stmt_list, sql, CURSOR_OPT_PARALLEL_OK, NULL,
+ &execlockrelsinfo_list);
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, execlockrelsinfo_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ ExecLockRelsInfo *execlockrelsinfo = lfirst_node(ExecLockRelsInfo, lc3);
CommandCounterIncrement();
@@ -777,6 +781,7 @@ execute_sql_string(const char *sql)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ execlockrelsinfo,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 05e7b60059..4ef44aaf23 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -416,7 +416,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 9902c5c566..85e73ddded 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -107,6 +107,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ list_make1(NULL), /* no ExecLockRelsInfo to pass */
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 80738547ed..bbbf8bbcbd 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -155,6 +155,7 @@ ExecuteQuery(ParseState *pstate,
PreparedStatement *entry;
CachedPlan *cplan;
List *plan_list;
+ List *plan_execlockrelsinfo_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
Portal portal;
@@ -195,6 +196,7 @@ ExecuteQuery(ParseState *pstate,
/* Replan if needed, and increment plan refcount for portal */
cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
plan_list = cplan->stmt_list;
+ plan_execlockrelsinfo_list = cplan->execlockrelsinfo_list;
/*
* DO NOT add any logic that could possibly throw an error between
@@ -204,7 +206,7 @@ ExecuteQuery(ParseState *pstate,
NULL,
query_string,
entry->plansource->commandTag,
- plan_list,
+ plan_list, plan_execlockrelsinfo_list,
cplan);
/*
@@ -576,7 +578,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
- ListCell *p;
+ List *plan_execlockrelsinfo_list;
+ ListCell *p,
+ *pe;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -632,15 +636,18 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
plan_list = cplan->stmt_list;
+ plan_execlockrelsinfo_list = cplan->execlockrelsinfo_list;
/* Explain each query */
- foreach(p, plan_list)
+ forboth(p, plan_list, pe, plan_execlockrelsinfo_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ ExecLockRelsInfo *execlockrelsinfo = lfirst_node(ExecLockRelsInfo, pe);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ ExplainOnePlan(pstmt, execlockrelsinfo, into, es, query_string,
+ paramLI, queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/README b/src/backend/executor/README
index bf5e70860d..27341a2818 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -59,11 +59,20 @@ state tree. Read-only plan trees make life much simpler for plan caching and
reuse.
A corresponding executor state node may not be created during executor startup
-if the executor determines that an entire subplan is not required due to
-execution time partition pruning determining that no matching records will be
-found there. This currently only occurs for Append and MergeAppend nodes. In
-this case the non-required subplans are ignored and the executor state's
-subnode array will become out of sequence to the plan's subplan list.
+if ExecutorGetLockRels() determines, by performing initial partition pruning,
+that an entire subplan is not required because no matching records will be
+found there; the locking of relation(s) that the subplan would have scanned
+is likewise skipped. This currently only occurs for
+Append and MergeAppend nodes (see ExecGet[Merge]AppendLockRels()). In this
+case, the non-required subplans are ignored and the executor state's subnode
+array will become out of sequence to the plan's subplan list.
+ExecutorGetLockRels() typically runs before execution starts, for example as
+part of checking whether a cached generic plan is still valid, and the result
+it produces (ExecLockRelsInfo) is made available to ExecutorStart() via the
+QueryDesc. ExecInitNode() on plan nodes whose child subplans may have been
+pruned during ExecutorGetLockRels() must look up the surviving set of subplans
+to initialize in that ExecLockRelsInfo, instead of repeating the initial
+pruning computation.
Each Plan node may have expression trees associated with it, to represent
its target list, qualification conditions, etc. These trees are also
@@ -247,6 +256,9 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ [ ExecutorGetLockRels ] --- an optional step to walk over the plan tree
+ to produce an ExecLockRelsInfo to be passed to CreateQueryDesc
+
CreateQueryDesc
ExecutorStart
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 549d9eb696..3b1f588321 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -48,11 +48,15 @@
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/execdebug.h"
+#include "executor/nodeAppend.h"
+#include "executor/nodeMergeAppend.h"
+#include "executor/nodeModifyTable.h"
#include "executor/nodeSubplan.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/nodeFuncs.h"
#include "parser/parsetree.h"
#include "storage/bufmgr.h"
#include "storage/lmgr.h"
@@ -100,9 +104,184 @@ static char *ExecBuildSlotValueDescription(Oid reloid,
Bitmapset *modifiedCols,
int maxfieldlen);
static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
+static bool ExecGetScanLockRels(Scan *scan, ExecGetLockRelsContext *context);
/* end of local decls */
+/* ----------------------------------------------------------------
+ * ExecutorGetLockRels
+ *
+ * Figure out the set of relations to lock to be able to execute a given
+ * plan, after taking into account the result of performing any initial
+ * pruning steps present in the plan. Performing those pruning steps
+ * would effectively invalidate the pruned subplans (that is, will not
+ * be looked at during the actual execution of the parent plan), so the
+ * relations that those subplans scan need not be locked.
+ *
+ * Along with the set of RT indexes of relations that must be locked, the
+ * returned struct also contains the information needed to look up the
+ * PlanInitPruningOutput node, containing the result of performing initial
+ * pruning (identities of surviving partition subnodes), for each plan node
+ * that undergoes pruning.
+ *
+ * The caller must arrange to pass the returned struct down to the executor,
+ * so that the latter can reuse the result of initial pruning to initialize
+ * the same set of surviving subplans, instead of doing the pruning again by
+ * itself.
+ *
+ * This also locks any relation whose information must be perused to do the
+ * pruning; for example, a partitioned table is locked before the
+ * PartitionedRelPruneInfo contained in an Append node is used to do pruning
+ * in ExecGetAppendLockRels().
+ */
+ExecLockRelsInfo *
+ExecutorGetLockRels(PlannedStmt *plannedstmt, ParamListInfo params)
+{
+ int numPlanNodes = plannedstmt->numPlanNodes;
+ ExecGetLockRelsContext context;
+ ExecLockRelsInfo *result;
+ ListCell *lc;
+
+ /* Only get here if there is any pruning to do. */
+ Assert(plannedstmt->containsInitialPruning);
+
+ context.stmt = plannedstmt;
+ context.params = params;
+
+ /* Go do init pruning and fill lockrels. */
+ context.lockrels = NULL;
+ context.initPruningOutputs = NIL;
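+
+ /*
+ * ipoIndexes[] has one entry per plan node ID; the
+ * ExecStorePlanInitPruningOutput() macro records in it where each plan
+ * node's PlanInitPruningOutput sits in the initPruningOutputs list.
+ */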
+ context.ipoIndexes = palloc0(sizeof(int) * numPlanNodes);
+ foreach(lc, plannedstmt->subplans)
+ {
+ Plan *subplan = lfirst(lc);
+
+ (void) ExecGetLockRels(subplan, &context);
+ }
+
+ (void) ExecGetLockRels(plannedstmt->planTree, &context);
+
+ result = makeNode(ExecLockRelsInfo);
+ result->lockrels = context.lockrels;
+ result->numPlanNodes = numPlanNodes;
+ result->initPruningOutputs = context.initPruningOutputs;
+ result->ipoIndexes = context.ipoIndexes;
+
+ return result;
+}
+
+/* ------------------------------------------------------------------------
+ * ExecGetLockRels
+ * Recursively find relations to lock in the plan tree rooted at 'node',
+ * performing initial pruning if the node contains the information to
+ * do so
+ *
+ * 'node' is the current node of the plan produced by the query planner
+ * 'context' contains the PlannedStmt and the information about EXTERN
+ * parameters to use for partition pruning, and is also where the results --
+ * lockrels and PlanInitPruningOutput nodes -- are accumulated
+ *
+ * NOTE: the ExecGetLockRels subroutine for a given node type must add the RT
+ * indexes of any relations that it manipulates to context->lockrels. If the
+ * node needs initial pruning, it must add the resulting PlanInitPruningOutput
+ * node to the context using the ExecStorePlanInitPruningOutput() macro.
+ * ------------------------------------------------------------------------
+ */
+bool
+ExecGetLockRels(Plan *node, ExecGetLockRelsContext *context)
+{
+ /* Do nothing when we get to the end of a leaf on tree. */
+ if (node == NULL)
+ return true;
+
+ /* Make sure there's enough stack available. */
+ check_stack_depth();
+
+ switch (nodeTag(node))
+ {
+ case T_Append:
+ if (ExecGetAppendLockRels((Append *) node, context))
+ return true;
+ break;
+ case T_MergeAppend:
+ if (ExecGetMergeAppendLockRels((MergeAppend *) node, context))
+ return true;
+ break;
+
+ case T_SeqScan:
+ case T_SampleScan:
+ case T_IndexScan:
+ case T_IndexOnlyScan:
+ case T_BitmapIndexScan:
+ case T_BitmapHeapScan:
+ case T_TidScan:
+ case T_TidRangeScan:
+ case T_ForeignScan:
+ case T_SubqueryScan:
+ case T_CustomScan:
+ if (ExecGetScanLockRels((Scan *) node, context))
+ return true;
+ break;
+
+ case T_ModifyTable:
+ if (ExecGetModifyTableLockRels((ModifyTable *) node, context))
+ return true;
+ /* plan_tree_walker() will visit the subplan (outerNode) */
+ break;
+
+ default:
+ break;
+ }
+
+ return plan_tree_walker(node, ExecGetLockRels, (void *) context);
+}
+
+/*
+ * ExecGetScanLockRels
+ * Do ExecGetLockRels()'s work for a Scan plan
+ */
+static bool
+ExecGetScanLockRels(Scan *scan, ExecGetLockRelsContext *context)
+{
+ switch (nodeTag(scan))
+ {
+ case T_ForeignScan:
+ {
+ ForeignScan *fscan = (ForeignScan *) scan;
+
+ context->lockrels = bms_add_members(context->lockrels,
+ fscan->fs_relids);
+ }
+ break;
+
+ case T_SubqueryScan:
+ {
+ SubqueryScan *sscan = (SubqueryScan *) scan;
+
+ (void) ExecGetLockRels((Plan *) sscan->subplan, context);
+ }
+ break;
+
+ case T_CustomScan:
+ {
+ CustomScan *cscan = (CustomScan *) scan;
+ ListCell *lc;
+
+ context->lockrels = bms_add_members(context->lockrels,
+ cscan->custom_relids);
+ foreach(lc, cscan->custom_plans)
+ {
+ (void) ExecGetLockRels((Plan *) lfirst(lc), context);
+ }
+ }
+ break;
+
+ default:
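+ /* All other scan types scan just the one relation given by scanrelid. */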
+ context->lockrels = bms_add_member(context->lockrels,
+ scan->scanrelid);
+ break;
+ }
+
+ return true;
+}
/* ----------------------------------------------------------------
* ExecutorStart
@@ -804,6 +983,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ ExecLockRelsInfo *execlockrelsinfo = queryDesc->execlockrelsinfo;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -823,6 +1003,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable);
estate->es_plannedstmt = plannedstmt;
+ estate->es_execlockrelsinfo = execlockrelsinfo;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 5dd8ab7db2..f27f85ab4f 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -66,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_EXECLOCKRELSINFO UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -182,8 +183,10 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->containsInitialPruning = false;
pstmt->planTree = plan;
pstmt->rtable = estate->es_range_table;
+ pstmt->lockrels = NULL;
pstmt->resultRelations = NIL;
pstmt->appendRelations = NIL;
@@ -596,12 +599,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *execlockrelsinfo_data;
+ char *execlockrelsinfo_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int execlockrelsinfo_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -630,6 +636,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
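+
+ /*
+ * Serialize the ExecLockRelsInfo too, so that workers initialize the
+ * same set of surviving subplans as the leader.
+ */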
+ execlockrelsinfo_data = nodeToString(estate->es_execlockrelsinfo);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -656,6 +663,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized ExecLockRelsInfo. */
+ execlockrelsinfo_len = strlen(execlockrelsinfo_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, execlockrelsinfo_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -750,6 +762,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized ExecLockRelsInfo */
+ execlockrelsinfo_space = shm_toc_allocate(pcxt->toc, execlockrelsinfo_len);
+ memcpy(execlockrelsinfo_space, execlockrelsinfo_data, execlockrelsinfo_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_EXECLOCKRELSINFO,
+ execlockrelsinfo_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1231,8 +1249,10 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *execlockrelsinfospace;
char *paramspace;
PlannedStmt *pstmt;
+ ExecLockRelsInfo *execlockrelsinfo;
ParamListInfo paramLI;
char *queryString;
@@ -1243,12 +1263,17 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
pstmtspace = shm_toc_lookup(toc, PARALLEL_KEY_PLANNEDSTMT, false);
pstmt = (PlannedStmt *) stringToNode(pstmtspace);
+ /* Reconstruct leader-supplied ExecLockRelsInfo. */
+ execlockrelsinfospace = shm_toc_lookup(toc, PARALLEL_KEY_EXECLOCKRELSINFO,
+ false);
+ execlockrelsinfo = (ExecLockRelsInfo *) stringToNode(execlockrelsinfospace);
+
/* Reconstruct ParamListInfo. */
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
/* Create a QueryDesc for the query. */
- return CreateQueryDesc(pstmt,
+ return CreateQueryDesc(pstmt, execlockrelsinfo,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 21953f253b..db8c4cd719 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -24,6 +24,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -183,8 +184,14 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
int maxfieldlen);
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *partitionpruneinfo);
-static Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate);
+ PartitionPruneInfo *partitionpruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir,
+ Bitmapset **parentrelids);
+static Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
+ PartitionPruneInfo *pruneinfo);
static void ExecInitPruningContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -1483,8 +1490,9 @@ adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri)
* considered to be a stable expression, it can change value from one plan
* node scan to the next during query execution. Stable comparison
* expressions that don't involve such Params allow partition pruning to be
- * done once during executor startup. Expressions that do involve such Params
- * require us to prune separately for each scan of the parent plan node.
+ * done once during executor startup, or even earlier, during
+ * ExecutorGetLockRels().
+ * Expressions that do involve such Params require us to prune separately for
+ * each scan of the parent plan node.
*
* Note that pruning away unneeded subplans during executor startup has the
* added benefit of not having to initialize the unneeded subplans at all.
@@ -1503,6 +1511,10 @@ adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri)
* updated to account for initial pruning having eliminated some of the
* subplans, if any.
*
+ * ExecGetLockRelsDoInitialPruning:
+ * Do initial pruning as part of ExecGetLockRels() on the parent plan
+ * node
+ *
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating all available
* expressions, that is, using execution pruning steps. This function can
@@ -1531,22 +1543,57 @@ ExecInitPartitionPruning(PlanState *planstate,
{
PartitionPruneState *prunestate;
EState *estate = planstate->state;
+ Plan *plan = planstate->plan;
+ PlanInitPruningOutput *initPruningOutput = NULL;
+ bool do_pruning = (pruneinfo->needs_init_pruning ||
+ pruneinfo->needs_exec_pruning);
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, planstate);
+ if (estate->es_execlockrelsinfo)
+ {
+ initPruningOutput = (PlanInitPruningOutput *)
+ ExecFetchPlanInitPruningOutput(estate->es_execlockrelsinfo, plan);
- /*
- * Create the working data structure for pruning.
- */
- prunestate = ExecCreatePartitionPruneState(planstate, pruneinfo);
+ Assert(initPruningOutput != NULL &&
+ IsA(initPruningOutput, PlanInitPruningOutput));
+ /* No need to do initial pruning again, only exec pruning. */
+ do_pruning = pruneinfo->needs_exec_pruning;
+ }
+
+ prunestate = NULL;
+ if (do_pruning)
+ {
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
+
+ /* For data reading, executor always omits detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
+ /*
+ * Create the working data structure for pruning. No need to consider
+ * initial pruning steps if we have a PlanInitPruningOutput.
+ */
+ prunestate = ExecCreatePartitionPruneState(planstate, pruneinfo,
+ initPruningOutput == NULL, true,
+ NIL, planstate->ps_ExprContext,
+ estate->es_partition_directory,
+ NULL);
+ }
/*
* Perform an initial partition prune, if required.
*/
- if (prunestate->do_initial_prune)
+ if (initPruningOutput)
+ {
+ /* ExecutorGetLockRels() already did it for us! */
+ *initially_valid_subplans = initPruningOutput->initially_valid_subplans;
+ }
+ else if (prunestate && prunestate->do_initial_prune)
{
/* Determine which subplans survive initial pruning */
- *initially_valid_subplans = ExecFindInitialMatchingSubPlans(prunestate);
+ *initially_valid_subplans = ExecFindInitialMatchingSubPlans(prunestate,
+ pruneinfo);
}
else
{
@@ -1564,7 +1611,7 @@ ExecInitPartitionPruning(PlanState *planstate,
* invalid data in prunestate, because that data won't be consulted again
* (cf initial Assert in ExecFindMatchingSubPlans).
*/
- if (prunestate->do_exec_prune &&
+ if (prunestate && prunestate->do_exec_prune &&
bms_num_members(*initially_valid_subplans) < n_total_subplans)
ExecPartitionPruneFixSubPlanIndexes(prunestate,
*initially_valid_subplans,
@@ -1573,12 +1620,83 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecGetLockRelsDoInitialPruning
+ * Perform initial pruning as part of doing ExecGetLockRels() on the parent
+ * plan node
+ */
+Bitmapset *
+ExecGetLockRelsDoInitialPruning(Plan *plan, ExecGetLockRelsContext *context,
+ PartitionPruneInfo *pruneinfo)
+{
+ List *rtable = context->stmt->rtable;
+ ParamListInfo params = context->params;
+ ExprContext *econtext;
+ PartitionDirectory pdir;
+ MemoryContext oldcontext,
+ tmpcontext;
+ Bitmapset *parentrelids;
+ PartitionPruneState *prunestate;
+ PlanInitPruningOutput *initPruningOutput;
+
+ /*
+ * A temporary context to allocate the stuff needed to run the pruning steps.
+ */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "initial pruning working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
+
+ /*
+ * PartitionDirectory to look up partition descriptors, which omits
+ * detached partitions, just like in the executor proper.
+ */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+
+ /*
+ * We don't yet have a PlanState for the parent plan node, so we must
+ * create a standalone ExprContext to evaluate pruning expressions,
+ * equipped with the information about the EXTERN parameters that the
+ * caller passed us. That is okay, because the initial pruning steps do
+ * not contain anything that requires execution to have started.
+ */
+ econtext = CreateStandaloneExprContext();
+ econtext->ecxt_param_list_info = params;
+ prunestate = ExecCreatePartitionPruneState(NULL, pruneinfo,
+ true, false,
+ rtable, econtext,
+ pdir, &parentrelids);
+ MemoryContextSwitchTo(oldcontext);
+
+ /* Do the pruning and populate a PlanInitPruningOutput for this node. */
+ initPruningOutput = makeNode(PlanInitPruningOutput);
+ initPruningOutput->initially_valid_subplans =
+ ExecFindInitialMatchingSubPlans(prunestate, pruneinfo);
+ ExecStorePlanInitPruningOutput(context, initPruningOutput, plan);
+
+ /*
+ * Report parent partitioned tables as locking targets, though they
+ * would already be locked by ExecCreatePartitionPruneState().
+ */
+ Assert(bms_num_members(parentrelids) > 0);
+ context->lockrels = bms_add_members(context->lockrels, parentrelids);
+
+ FreeExprContext(econtext, true);
+ DestroyPartitionDirectory(pdir);
+ MemoryContextDelete(tmpcontext);
+
+ return initPruningOutput->initially_valid_subplans;
+}
+
/*
* ExecCreatePartitionPruneState
* Build the data structure required for calling
* ExecFindInitialMatchingSubPlans and ExecFindMatchingSubPlans.
*
- * 'planstate' is the parent plan node's execution state.
+ * 'planstate', if not NULL, is the parent plan node's execution state. It
+ * can be NULL if being called before ExecutorStart(), in which case,
+ * 'rtable' (range table), 'econtext', and 'partdir' must be explicitly
+ * provided.
*
* 'partitionpruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
@@ -1590,26 +1708,35 @@ ExecInitPartitionPruning(PlanState *planstate,
* as children. The data stored in each PartitionedRelPruningData can be
* re-used each time we re-evaluate which partitions match the pruning steps
* provided in each PartitionedRelPruneInfo.
+ *
+ * The RT indexes of the parent partitioned tables that are locked here to
+ * peruse their PartitionedRelPruneInfos are returned in *parentrelids, if
+ * the caller asks for them.
*/
static PartitionPruneState *
ExecCreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *partitionpruneinfo)
+ PartitionPruneInfo *partitionpruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir,
+ Bitmapset **parentrelids)
{
- EState *estate = planstate->state;
+ EState *estate = planstate ? planstate->state : NULL;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
- /* For data reading, executor always omits detached partitions */
- if (estate->es_partition_directory == NULL)
- estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
+ Assert((estate != NULL) ||
+ (partdir != NULL && econtext != NULL && rtable != NIL));
n_part_hierarchies = list_length(partitionpruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
+ if (parentrelids)
+ *parentrelids = NULL;
+
/*
* Allocate the data structure
*/
@@ -1656,19 +1783,58 @@ ExecCreatePartitionPruneState(PlanState *planstate,
PartitionedRelPruneInfo *pinfo = lfirst_node(PartitionedRelPruneInfo, lc2);
PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
Relation partrel;
+ bool close_partrel = false;
PartitionDesc partdesc;
PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * Must open the relation ourselves when called before execution has
+ * started, such as when called during ExecutorGetLockRels() on a
+ * cached plan. In that case, sub-partitions must be locked, because
+ * AcquirePlannerLocks() would not have seen them. (The first
+ * relation in a partrelpruneinfos list is always the root
+ * partitioned table appearing in the query, which
+ * AcquirePlannerLocks() would have locked; the Assert in
+ * relation_open() guards that assumption.)
+ */
+ if (estate == NULL)
+ {
+ RangeTblEntry *rte = rt_fetch(pinfo->rtindex, rtable);
+ int lockmode = (j == 0) ? NoLock : rte->rellockmode;
+
+ partrel = table_open(rte->relid, lockmode);
+ close_partrel = true;
+
+ /*
+ * Also report the partitioned table as having been locked.
+ * XXX - actually, the *parentrelids set is later merged by
+ * the caller into the set of relations to be locked by
+ * AcquireExecutorLocks(), thus causing the lock on this
+ * table to be requested again.
+ */
+ Assert(parentrelids != NULL);
+ *parentrelids = bms_add_member(*parentrelids, pinfo->rtindex);
+ }
+ else
+ partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+
+ /*
+ * We can rely on the copy of the partitioned table's partition
+ * key in its relcache entry, because it can't change (or get
+ * destroyed) as long as the relation is locked. The partition
+ * descriptor is taken from the PartitionDirectory, which holds
+ * onto it long enough for it to remain valid while it's used to
+ * perform the pruning steps.
*/
- partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
partkey = RelationGetPartitionKey(partrel);
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partdesc = PartitionDirectoryLookup(partdir, partrel);
+
+ /*
+ * Must close partrel, while keeping the lock we took, if we're not
+ * using the EState's entry.
+ */
+ if (close_partrel)
+ table_close(partrel, NoLock);
/*
* Initialize the subplan_map and subpart_map.
@@ -1770,7 +1936,7 @@ ExecCreatePartitionPruneState(PlanState *planstate,
* Initialize pruning contexts as needed.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
- if (pinfo->initial_pruning_steps)
+ if (consider_initial_steps && pinfo->initial_pruning_steps)
{
ExecInitPruningContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
@@ -1780,7 +1946,7 @@ ExecCreatePartitionPruneState(PlanState *planstate,
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps)
+ if (consider_exec_steps && pinfo->exec_pruning_steps)
{
ExecInitPruningContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
@@ -1899,7 +2065,8 @@ ExecInitPruningContext(PartitionPruneContext *context,
* is required.
*/
static Bitmapset *
-ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate)
+ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
+ PartitionPruneInfo *pruneinfo)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -1909,8 +2076,8 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate)
Assert(prunestate->do_initial_prune);
/*
- * Switch to a temp context to avoid leaking memory in the executor's
- * query-lifespan memory context.
+ * Switch to a temp context to avoid leaking memory in the longer-term
+ * memory context.
*/
oldcontext = MemoryContextSwitchTo(prunestate->prune_context);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9df1f81ea8..7246f9175f 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -119,6 +119,7 @@ CreateExecutorState(void)
estate->es_relations = NULL;
estate->es_rowmarks = NULL;
estate->es_plannedstmt = NULL;
+ estate->es_execlockrelsinfo = NULL;
estate->es_junkFilter = NULL;
@@ -785,6 +786,13 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rti > 0 && rti <= estate->es_range_table_size);
+ /*
+ * A cross-check that AcquireExecutorLocks() hasn't skipped locking any
+ * relation that the executor actually accesses.
+ */
+ Assert(estate->es_execlockrelsinfo == NULL ||
+ bms_is_member(rti, estate->es_execlockrelsinfo->lockrels));
+
rel = estate->es_relations[rti - 1];
if (rel == NULL)
{
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index f9460ae506..a2182a6b1f 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -844,7 +844,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
- es->qd = CreateQueryDesc(es->stmt,
+ es->qd = CreateQueryDesc(es->stmt, NULL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 5b6d3eb23b..966615f670 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -94,6 +94,45 @@ static bool ExecAppendAsyncRequest(AppendState *node, TupleTableSlot **result);
static void ExecAppendAsyncEventWait(AppendState *node);
static void classify_matching_subplans(AppendState *node);
+/* ----------------------------------------------------------------
+ * ExecGetAppendLockRels
+ * Do ExecGetLockRels()'s work for an Append plan
+ * ----------------------------------------------------------------
+ */
+bool
+ExecGetAppendLockRels(Append *node, ExecGetLockRelsContext *context)
+{
+ PartitionPruneInfo *pruneinfo = node->part_prune_info;
+
+ if (pruneinfo && pruneinfo->needs_init_pruning)
+ {
+ List *subplans = node->appendplans;
+ Bitmapset *validsubplans;
+ int i;
+
+ validsubplans = ExecGetLockRelsDoInitialPruning((Plan *) node,
+ context, pruneinfo);
+
+ /* Prep the surviving subplans. */
+ i = -1;
+ while ((i = bms_next_member(validsubplans, i)) >= 0)
+ {
+ Plan *subplan = list_nth(subplans, i);
+
+ (void) ExecGetLockRels(subplan, context);
+ }
+
+ /* done with this node */
+ return true;
+ }
+
+ /*
+ * Look at all subplans, which the caller would do by calling
+ * plan_tree_walker() on the node.
+ */
+ return false;
+}
+
/* ----------------------------------------------------------------
* ExecInitAppend
*
@@ -155,7 +194,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
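+ /*
+ * as_prune_state may be NULL if initial pruning was already done
+ * during ExecutorGetLockRels() and no exec-time pruning steps
+ * remain.
+ */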
+ if (appendstate->as_prune_state == NULL ||
+ (!appendstate->as_prune_state->do_exec_prune && nplans > 0))
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 9a9f29e845..869b836a14 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -54,6 +54,45 @@ typedef int32 SlotNumber;
static TupleTableSlot *ExecMergeAppend(PlanState *pstate);
static int heap_compare_slots(Datum a, Datum b, void *arg);
+/* ----------------------------------------------------------------
+ * ExecGetMergeAppendLockRels
+ * Do ExecGetLockRels()'s work for a MergeAppend plan
+ * ----------------------------------------------------------------
+ */
+bool
+ExecGetMergeAppendLockRels(MergeAppend *node, ExecGetLockRelsContext *context)
+{
+ PartitionPruneInfo *pruneinfo = node->part_prune_info;
+
+ if (pruneinfo && pruneinfo->needs_init_pruning)
+ {
+ List *subplans = node->mergeplans;
+ Bitmapset *validsubplans;
+ int i;
+
+ validsubplans = ExecGetLockRelsDoInitialPruning((Plan *) node,
+ context, pruneinfo);
+
+ /* Prep the surviving subplans. */
+ i = -1;
+ while ((i = bms_next_member(validsubplans, i)) >= 0)
+ {
+ Plan *subplan = list_nth(subplans, i);
+
+ (void) ExecGetLockRels(subplan, context);
+ }
+
+ /* done with this node */
+ return true;
+ }
+
+ /*
+ * Look at all subplans, which the caller would do by calling
+ * plan_tree_walker() on the node.
+ */
+ return false;
+}
+
/* ----------------------------------------------------------------
* ExecInitMergeAppend
@@ -103,7 +142,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
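+ /*
+ * ms_prune_state may be NULL if initial pruning was already done
+ * during ExecutorGetLockRels() and no exec-time pruning steps
+ * remain.
+ */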
+ if (mergestate->ms_prune_state == NULL ||
+ (!mergestate->ms_prune_state->do_exec_prune && nplans > 0))
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 5ec699a9bd..c860045fcb 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2700,6 +2700,30 @@ ExecLookupResultRelByOid(ModifyTableState *node, Oid resultoid,
return NULL;
}
+/*
+ * ExecGetModifyTableLockRels
+ * Do ExecGetLockRels()'s work for a ModifyTable plan
+ */
+bool
+ExecGetModifyTableLockRels(ModifyTable *plan, ExecGetLockRelsContext *context)
+{
+ ListCell *lc;
+
+ if (plan->rootRelation > 0)
+ context->lockrels = bms_add_member(context->lockrels,
+ plan->rootRelation);
+ context->lockrels = bms_add_member(context->lockrels,
+ plan->nominalRelation);
+ foreach(lc, plan->resultRelations)
+ {
+ context->lockrels = bms_add_member(context->lockrels,
+ lfirst_int(lc));
+ }
+
+ /* caller will look at the source subplan */
+ return false;
+}
+
/* ----------------------------------------------------------------
* ExecInitModifyTable
* ----------------------------------------------------------------
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index a82e986667..2107009591 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1578,6 +1578,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
CachedPlanSource *plansource;
CachedPlan *cplan;
List *stmt_list;
+ List *execlockrelsinfo_list;
char *query_string;
Snapshot snapshot;
MemoryContext oldcontext;
@@ -1659,6 +1660,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
/* Replan if needed, and increment plan refcount for portal */
cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
stmt_list = cplan->stmt_list;
+ execlockrelsinfo_list = cplan->execlockrelsinfo_list;
if (!plan->saved)
{
@@ -1670,6 +1672,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
oldcontext = MemoryContextSwitchTo(portal->portalContext);
stmt_list = copyObject(stmt_list);
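+ /* Also copy the ExecLockRelsInfos, which must live as long as the portal. */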
+ execlockrelsinfo_list = copyObject(execlockrelsinfo_list);
MemoryContextSwitchTo(oldcontext);
ReleaseCachedPlan(cplan, NULL);
cplan = NULL; /* portal shouldn't depend on cplan */
@@ -1683,6 +1686,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
+ execlockrelsinfo_list,
cplan);
/*
@@ -2473,7 +2477,9 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
- ListCell *lc2;
+ List *execlockrelsinfo_list;
+ ListCell *lc2,
+ *lc3;
spicallbackarg.query = plansource->query_string;
@@ -2552,6 +2558,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
plan_owner, _SPI_current->queryEnv);
stmt_list = cplan->stmt_list;
+ execlockrelsinfo_list = cplan->execlockrelsinfo_list;
/*
* If we weren't given a specific snapshot to use, and the statement
@@ -2589,9 +2596,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, execlockrelsinfo_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ ExecLockRelsInfo *execlockrelsinfo = lfirst_node(ExecLockRelsInfo, lc3);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2663,7 +2671,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
else
snap = InvalidSnapshot;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, execlockrelsinfo,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index d4f8455a2b..68c664070c 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -68,6 +68,13 @@
} \
} while (0)
+/* Copy a field that is an array with numElem ints */
+#define COPY_INT_ARRAY(fldname, numElem) \
+ do { \
+ if ((numElem) > 0) \
+ { \
+ newnode->fldname = palloc((numElem) * sizeof(int)); \
+ memcpy(newnode->fldname, from->fldname, (numElem) * sizeof(int)); \
+ } \
+ else \
+ newnode->fldname = NULL; \
+ } while (0)
+
/* Copy a parse location field (for Copy, this is same as scalar case) */
#define COPY_LOCATION_FIELD(fldname) \
(newnode->fldname = from->fldname)
@@ -94,9 +101,12 @@ _copyPlannedStmt(const PlannedStmt *from)
COPY_SCALAR_FIELD(transientPlan);
COPY_SCALAR_FIELD(dependsOnRole);
COPY_SCALAR_FIELD(parallelModeNeeded);
+ COPY_SCALAR_FIELD(containsInitialPruning);
COPY_SCALAR_FIELD(jitFlags);
COPY_NODE_FIELD(planTree);
+ COPY_SCALAR_FIELD(numPlanNodes);
COPY_NODE_FIELD(rtable);
+ COPY_BITMAPSET_FIELD(lockrels);
COPY_NODE_FIELD(resultRelations);
COPY_NODE_FIELD(appendRelations);
COPY_NODE_FIELD(subplans);
@@ -1278,6 +1288,8 @@ _copyPartitionPruneInfo(const PartitionPruneInfo *from)
PartitionPruneInfo *newnode = makeNode(PartitionPruneInfo);
COPY_NODE_FIELD(prune_infos);
+ COPY_SCALAR_FIELD(needs_init_pruning);
+ COPY_SCALAR_FIELD(needs_exec_pruning);
COPY_BITMAPSET_FIELD(other_subplans);
return newnode;
@@ -4941,6 +4953,33 @@ _copyExtensibleNode(const ExtensibleNode *from)
return newnode;
}
+/* ****************************************************************
+ * execnodes.h copy functions
+ * ****************************************************************
+ */
+static ExecLockRelsInfo *
+_copyExecLockRelsInfo(const ExecLockRelsInfo *from)
+{
+ ExecLockRelsInfo *newnode = makeNode(ExecLockRelsInfo);
+
+ COPY_BITMAPSET_FIELD(lockrels);
+ COPY_SCALAR_FIELD(numPlanNodes);
+ COPY_NODE_FIELD(initPruningOutputs);
+ COPY_INT_ARRAY(ipoIndexes, from->numPlanNodes);
+
+ return newnode;
+}
+
+static PlanInitPruningOutput *
+_copyPlanInitPruningOutput(const PlanInitPruningOutput *from)
+{
+ PlanInitPruningOutput *newnode = makeNode(PlanInitPruningOutput);
+
+ COPY_BITMAPSET_FIELD(initially_valid_subplans);
+
+ return newnode;
+}
+
/* ****************************************************************
* value.h copy functions
* ****************************************************************
@@ -4995,7 +5034,6 @@ _copyBitString(const BitString *from)
return newnode;
}
-
static ForeignKeyCacheInfo *
_copyForeignKeyCacheInfo(const ForeignKeyCacheInfo *from)
{
@@ -5944,6 +5982,16 @@ copyObjectImpl(const void *from)
retval = _copyPublicationTable(from);
break;
+ /*
+ * EXECUTION NODES
+ */
+ case T_ExecLockRelsInfo:
+ retval = _copyExecLockRelsInfo(from);
+ break;
+ case T_PlanInitPruningOutput:
+ retval = _copyPlanInitPruningOutput(from);
+ break;
+
/*
* MISCELLANEOUS NODES
*/
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 6bdad462c7..e0e09d7abd 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -312,9 +312,12 @@ _outPlannedStmt(StringInfo str, const PlannedStmt *node)
WRITE_BOOL_FIELD(transientPlan);
WRITE_BOOL_FIELD(dependsOnRole);
WRITE_BOOL_FIELD(parallelModeNeeded);
+ WRITE_BOOL_FIELD(containsInitialPruning);
WRITE_INT_FIELD(jitFlags);
WRITE_NODE_FIELD(planTree);
+ WRITE_INT_FIELD(numPlanNodes);
WRITE_NODE_FIELD(rtable);
+ WRITE_BITMAPSET_FIELD(lockrels);
WRITE_NODE_FIELD(resultRelations);
WRITE_NODE_FIELD(appendRelations);
WRITE_NODE_FIELD(subplans);
@@ -1004,6 +1007,8 @@ _outPartitionPruneInfo(StringInfo str, const PartitionPruneInfo *node)
WRITE_NODE_TYPE("PARTITIONPRUNEINFO");
WRITE_NODE_FIELD(prune_infos);
+ WRITE_BOOL_FIELD(needs_init_pruning);
+ WRITE_BOOL_FIELD(needs_exec_pruning);
WRITE_BITMAPSET_FIELD(other_subplans);
}
@@ -2274,6 +2279,7 @@ _outPlannerGlobal(StringInfo str, const PlannerGlobal *node)
WRITE_NODE_FIELD(subplans);
WRITE_BITMAPSET_FIELD(rewindPlanIDs);
WRITE_NODE_FIELD(finalrtable);
+ WRITE_BITMAPSET_FIELD(lockrels);
WRITE_NODE_FIELD(finalrowmarks);
WRITE_NODE_FIELD(resultRelations);
WRITE_NODE_FIELD(appendRelations);
@@ -2697,6 +2703,31 @@ _outExtensibleNode(StringInfo str, const ExtensibleNode *node)
methods->nodeOut(str, node);
}
+/*****************************************************************************
+ *
+ * Stuff from execnodes.h
+ *
+ *****************************************************************************/
+
+static void
+_outExecLockRelsInfo(StringInfo str, const ExecLockRelsInfo *node)
+{
+ WRITE_NODE_TYPE("EXECLOCKRELSINFO");
+
+ WRITE_BITMAPSET_FIELD(lockrels);
+ WRITE_INT_FIELD(numPlanNodes);
+ WRITE_NODE_FIELD(initPruningOutputs);
+ WRITE_INT_ARRAY(ipoIndexes, node->numPlanNodes);
+}
+
+static void
+_outPlanInitPruningOutput(StringInfo str, const PlanInitPruningOutput *node)
+{
+ WRITE_NODE_TYPE("PARTITIONINITPRUNINGOUTPUT");
+
+ WRITE_BITMAPSET_FIELD(initially_valid_subplans);
+}
+
/*****************************************************************************
*
* Stuff from parsenodes.h.
@@ -4538,6 +4569,16 @@ outNode(StringInfo str, const void *obj)
_outPartitionRangeDatum(str, obj);
break;
+ /*
+ * EXECUTION NODES
+ */
+ case T_ExecLockRelsInfo:
+ _outExecLockRelsInfo(str, obj);
+ break;
+ case T_PlanInitPruningOutput:
+ _outPlanInitPruningOutput(str, obj);
+ break;
+
default:
/*
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 3f68f7c18d..41ded72c4c 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1585,9 +1585,12 @@ _readPlannedStmt(void)
READ_BOOL_FIELD(transientPlan);
READ_BOOL_FIELD(dependsOnRole);
READ_BOOL_FIELD(parallelModeNeeded);
+ READ_BOOL_FIELD(containsInitialPruning);
READ_INT_FIELD(jitFlags);
READ_NODE_FIELD(planTree);
+ READ_INT_FIELD(numPlanNodes);
READ_NODE_FIELD(rtable);
+ READ_BITMAPSET_FIELD(lockrels);
READ_NODE_FIELD(resultRelations);
READ_NODE_FIELD(appendRelations);
READ_NODE_FIELD(subplans);
@@ -2534,6 +2537,8 @@ _readPartitionPruneInfo(void)
READ_LOCALS(PartitionPruneInfo);
READ_NODE_FIELD(prune_infos);
+ READ_BOOL_FIELD(needs_init_pruning);
+ READ_BOOL_FIELD(needs_exec_pruning);
READ_BITMAPSET_FIELD(other_subplans);
READ_DONE();
@@ -2703,6 +2708,35 @@ _readPartitionRangeDatum(void)
READ_DONE();
}
+/*
+ * _readExecLockRelsInfo
+ */
+static ExecLockRelsInfo *
+_readExecLockRelsInfo(void)
+{
+ READ_LOCALS(ExecLockRelsInfo);
+
+ READ_BITMAPSET_FIELD(lockrels);
+ READ_INT_FIELD(numPlanNodes);
+ READ_NODE_FIELD(initPruningOutputs);
+ READ_INT_ARRAY(ipoIndexes, local_node->numPlanNodes);
+
+ READ_DONE();
+}
+
+/*
+ * _readPlanInitPruningOutput
+ */
+static PlanInitPruningOutput *
+_readPlanInitPruningOutput(void)
+{
+ READ_LOCALS(PlanInitPruningOutput);
+
+ READ_BITMAPSET_FIELD(initially_valid_subplans);
+
+ READ_DONE();
+}
+
/*
* parseNodeString
*
@@ -2974,6 +3008,10 @@ parseNodeString(void)
return_value = _readPartitionBoundSpec();
else if (MATCH("PARTITIONRANGEDATUM", 19))
return_value = _readPartitionRangeDatum();
+ else if (MATCH("EXECLOCKRELSINFO", 16))
+ return_value = _readExecLockRelsInfo();
+ else if (MATCH("PARTITIONINITPRUNINGOUTPUT", 26))
+ return_value = _readPlanInitPruningOutput();
else
{
elog(ERROR, "badly formatted node string \"%.32s\"...", token);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index bd09f85aea..9e41bbd228 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -517,8 +517,11 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->transientPlan = glob->transientPlan;
result->dependsOnRole = glob->dependsOnRole;
result->parallelModeNeeded = glob->parallelModeNeeded;
+ result->containsInitialPruning = glob->containsInitialPruning;
result->planTree = top_plan;
+ result->numPlanNodes = glob->lastPlanNodeId;
result->rtable = glob->finalrtable;
+ result->lockrels = glob->lockrels;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index a7b11b7f03..cee8c570fd 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -483,6 +483,7 @@ static void
add_rte_to_flat_rtable(PlannerGlobal *glob, RangeTblEntry *rte)
{
RangeTblEntry *newrte;
+ Index rti = list_length(glob->finalrtable) + 1;
/* flat copy to duplicate all the scalar fields */
newrte = (RangeTblEntry *) palloc(sizeof(RangeTblEntry));
@@ -517,7 +518,10 @@ add_rte_to_flat_rtable(PlannerGlobal *glob, RangeTblEntry *rte)
* but it would probably cost more cycles than it would save.
*/
if (newrte->rtekind == RTE_RELATION)
+ {
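+ /*
+ * Record the relation's RT index; the final set becomes
+ * PlannedStmt.lockrels.
+ */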
+ glob->lockrels = bms_add_member(glob->lockrels, rti);
glob->relationOids = lappend_oid(glob->relationOids, newrte->relid);
+ }
}
/*
@@ -1548,6 +1552,9 @@ set_append_references(PlannerInfo *root,
pinfo->rtindex += rtoffset;
}
}
+
+ if (aplan->part_prune_info->needs_init_pruning)
+ root->glob->containsInitialPruning = true;
}
/* We don't need to recurse to lefttree or righttree ... */
@@ -1620,6 +1627,9 @@ set_mergeappend_references(PlannerInfo *root,
pinfo->rtindex += rtoffset;
}
}
+
+ if (mplan->part_prune_info->needs_init_pruning)
+ root->glob->containsInitialPruning = true;
}
/* We don't need to recurse to lefttree or righttree ... */
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 7080cb25d9..3322dc79f2 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,9 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -230,6 +232,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool needs_init_pruning = false;
+ bool needs_exec_pruning = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -309,12 +313,16 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_needs_init_pruning;
+ bool partrel_needs_exec_pruning;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_needs_init_pruning,
+ &partrel_needs_exec_pruning);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -323,6 +331,10 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
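+ /* OR the per-hierarchy flags into the overall ones. */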
+ if (!needs_init_pruning)
+ needs_init_pruning = partrel_needs_init_pruning;
+ if (!needs_exec_pruning)
+ needs_exec_pruning = partrel_needs_exec_pruning;
}
pfree(relid_subplan_map);
@@ -337,6 +349,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->needs_init_pruning = needs_init_pruning;
+ pruneinfo->needs_exec_pruning = needs_exec_pruning;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -435,13 +449,18 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *needs_init_pruning and *needs_exec_pruning are set to indicate whether
+ * the returned PartitionedRelPruneInfos contain pruning steps that can be
+ * performed before and after execution begins, respectively.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -452,6 +471,10 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will find out below. */
+ *needs_init_pruning = false;
+ *needs_exec_pruning = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -539,6 +562,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+ * In the first pass, we also note whether the second pass will be
+ * necessary, by checking for the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -613,6 +639,11 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ if (!*needs_init_pruning)
+ *needs_init_pruning = (initial_pruning_steps != NIL);
+ if (!*needs_exec_pruning)
+ *needs_exec_pruning = (exec_pruning_steps != NIL);
+
pinfolist = lappend(pinfolist, pinfo);
}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index ba2fcfeb4a..085eb3f209 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -945,15 +945,17 @@ pg_plan_query(Query *querytree, const char *query_string, int cursorOptions,
* For normal optimizable statements, invoke the planner. For utility
* statements, just make a wrapper PlannedStmt node.
*
- * The result is a list of PlannedStmt nodes.
+ * The result is a list of PlannedStmt nodes. Also, a NULL is appended to
+ * *execlockrelsinfo_list for each PlannedStmt added to the returned list.
*/
List *
pg_plan_queries(List *querytrees, const char *query_string, int cursorOptions,
- ParamListInfo boundParams)
+ ParamListInfo boundParams, List **execlockrelsinfo_list)
{
List *stmt_list = NIL;
ListCell *query_list;
+ *execlockrelsinfo_list = NIL;
foreach(query_list, querytrees)
{
Query *query = lfirst_node(Query, query_list);
@@ -977,6 +979,7 @@ pg_plan_queries(List *querytrees, const char *query_string, int cursorOptions,
}
stmt_list = lappend(stmt_list, stmt);
+ *execlockrelsinfo_list = lappend(*execlockrelsinfo_list, NULL);
}
return stmt_list;
@@ -1080,7 +1083,8 @@ exec_simple_query(const char *query_string)
QueryCompletion qc;
MemoryContext per_parsetree_context = NULL;
List *querytree_list,
- *plantree_list;
+ *plantree_list,
+ *plantree_execlockrelsinfo_list;
Portal portal;
DestReceiver *receiver;
int16 format;
@@ -1167,7 +1171,8 @@ exec_simple_query(const char *query_string)
NULL, 0, NULL);
plantree_list = pg_plan_queries(querytree_list, query_string,
- CURSOR_OPT_PARALLEL_OK, NULL);
+ CURSOR_OPT_PARALLEL_OK, NULL,
+ &plantree_execlockrelsinfo_list);
/*
* Done with the snapshot used for parsing/planning.
@@ -1203,6 +1208,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ plantree_execlockrelsinfo_list,
NULL);
/*
@@ -1991,6 +1997,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
+ cplan->execlockrelsinfo_list,
cplan);
/* Done with the snapshot used for parameter I/O and parsing/planning */
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5f907831a3..972ddc014e 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -35,7 +35,7 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
+static void ProcessQuery(PlannedStmt *plan, ExecLockRelsInfo *execlockrelsinfo,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +65,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ ExecLockRelsInfo *execlockrelsinfo,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +78,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->execlockrelsinfo = execlockrelsinfo; /* ExecutorGetLockRels() output for plan */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +124,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * execlockrelsinfo: ExecutorGetLockRels() output for the plan tree
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +137,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ ExecLockRelsInfo *execlockrelsinfo,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +149,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, execlockrelsinfo, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -490,6 +494,7 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ linitial_node(ExecLockRelsInfo, portal->execlockrelsinfos),
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1190,7 +1195,8 @@ PortalRunMulti(Portal portal,
QueryCompletion *qc)
{
bool active_snapshot_set = false;
- ListCell *stmtlist_item;
+ ListCell *stmtlist_item,
+ *execlockrelsinfolist_item;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1211,9 +1217,12 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
- foreach(stmtlist_item, portal->stmts)
+ forboth(stmtlist_item, portal->stmts,
+ execlockrelsinfolist_item, portal->execlockrelsinfos)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ ExecLockRelsInfo *execlockrelsinfo = lfirst_node(ExecLockRelsInfo,
+ execlockrelsinfolist_item);
/*
* If we got a cancel signal in prior command, quit
@@ -1271,7 +1280,7 @@ PortalRunMulti(Portal portal,
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, execlockrelsinfo,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1280,7 +1289,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, execlockrelsinfo,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 4cf6db504f..c40a6f19d6 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -99,14 +99,15 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static List *AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams);
+static void ReleaseExecutorLocks(List *stmt_list, List *execlockrelsinfo_list);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -782,6 +783,47 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
return tlist;
}
+/*
+ * CachedPlanSaveExecLockRelsInfos
+ * Save the list containing ExecLockRelsInfo nodes in the given CachedPlan
+ *
+ * The provided list is copied into a dedicated context that is a child of
+ * plan->context.
+ */
+static void
+CachedPlanSaveExecLockRelsInfos(CachedPlan *plan, List *execlockrelsinfo_list)
+{
+ MemoryContext execlockrelsinfo_context = plan->execlockrelsinfo_context,
+ oldcontext = CurrentMemoryContext;
+ List *execlockrelsinfo_list_copy;
+
+ /*
+ * Set up the dedicated context if not already done, saving it as a child
+ * of the CachedPlan's context.
+ */
+ if (execlockrelsinfo_context == NULL)
+ {
+ execlockrelsinfo_context = AllocSetContextCreate(CurrentMemoryContext,
+ "CachedPlan execlockrelsinfo list",
+ ALLOCSET_START_SMALL_SIZES);
+ MemoryContextSetParent(execlockrelsinfo_context, plan->context);
+ MemoryContextSetIdentifier(execlockrelsinfo_context, plan->context->ident);
+ plan->execlockrelsinfo_context = execlockrelsinfo_context;
+ }
+ else
+ {
+ /* Just clear existing contents by resetting the context. */
+ Assert(MemoryContextIsValid(execlockrelsinfo_context));
+ MemoryContextReset(execlockrelsinfo_context);
+ }
+
+ MemoryContextSwitchTo(execlockrelsinfo_context);
+ execlockrelsinfo_list_copy = copyObject(execlockrelsinfo_list);
+ MemoryContextSwitchTo(oldcontext);
+
+ plan->execlockrelsinfo_list = execlockrelsinfo_list_copy;
+}
+
/*
* CheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
*
@@ -790,9 +832,17 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* On a "true" return, we have acquired the locks needed to run the plan.
* (We must do this for the "true" result to be race-condition-free.)
+ *
+ * If the CachedPlan is valid, this calls ExecutorGetLockRels on each
+ * PlannedStmt contained in it to determine the set of relations that
+ * AcquireExecutorLocks() must lock. The resulting ExecLockRelsInfo nodes,
+ * allocated in a child context of the context containing the plan itself,
+ * are added to plan->execlockrelsinfo_list. Any ExecLockRelsInfo nodes
+ * left over from the last invocation of CheckCachedPlan() on the same
+ * CachedPlan are deleted.
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams)
{
CachedPlan *plan = plansource->gplan;
@@ -820,13 +870,22 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ List *execlockrelsinfo_list;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ /*
+ * Lock relations scanned by the plan. This also invokes
+ * ExecutorGetLockRels() to do initial partition pruning on the plan
+ * tree iff some nodes in it are marked as needing it. Relations whose
+ * scan nodes are pruned as a result of that are not locked here.
+ */
+ execlockrelsinfo_list = AcquireExecutorLocks(plan->stmt_list,
+ boundParams);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -844,11 +903,14 @@ CheckCachedPlan(CachedPlanSource *plansource)
if (plan->is_valid)
{
/* Successfully revalidated and locked the query. */
+
+ /* Remember ExecLockRelsInfos in the CachedPlan. */
+ CachedPlanSaveExecLockRelsInfos(plan, execlockrelsinfo_list);
return true;
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ ReleaseExecutorLocks(plan->stmt_list, execlockrelsinfo_list);
}
/*
@@ -880,7 +942,8 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv)
{
CachedPlan *plan;
- List *plist;
+ List *plist,
+ *execlockrelsinfo_list;
bool snapshot_set;
bool is_transient;
MemoryContext plan_context;
@@ -933,7 +996,8 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
* Generate the plan.
*/
plist = pg_plan_queries(qlist, plansource->query_string,
- plansource->cursor_options, boundParams);
+ plansource->cursor_options, boundParams,
+ &execlockrelsinfo_list);
/* Release snapshot if we got one */
if (snapshot_set)
@@ -1002,6 +1066,11 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
plan->is_saved = false;
plan->is_valid = true;
+ /* Save the dummy ExecLockRelsInfo list. */
+ plan->execlockrelsinfo_context = NULL;
+ CachedPlanSaveExecLockRelsInfos(plan, execlockrelsinfo_list);
+ Assert(MemoryContextIsValid(plan->execlockrelsinfo_context));
+
/* assign generation number to new plan */
plan->generation = ++(plansource->generation);
@@ -1160,7 +1229,7 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (CheckCachedPlan(plansource, boundParams))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1366,7 +1435,6 @@ CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
foreach(lc, plan->stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc);
- ListCell *lc2;
if (plannedstmt->commandType == CMD_UTILITY)
return false;
@@ -1375,13 +1443,8 @@ CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
* We have to grovel through the rtable because it's likely to contain
* an RTE_RESULT relation, rather than being totally empty.
*/
- foreach(lc2, plannedstmt->rtable)
- {
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
-
- if (rte->rtekind == RTE_RELATION)
- return false;
- }
+ if (!bms_is_empty(plannedstmt->lockrels))
+ return false;
}
/*
@@ -1737,17 +1800,22 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ *
+ * Returns a list of ExecLockRelsInfo nodes containing one element for each
+ * PlannedStmt in stmt_list; an element is NULL when the corresponding
+ * statement is a utility statement or its containsInitialPruning is false.
*/
-static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+static List *
+AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams)
{
ListCell *lc1;
+ List *execlockrelsinfo_list = NIL;
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ ExecLockRelsInfo *execlockrelsinfo = NULL;
+ int rti;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1761,27 +1829,113 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
if (query)
- ScanQueryForLocks(query, acquire);
- continue;
+ ScanQueryForLocks(query, true);
}
-
- foreach(lc2, plannedstmt->rtable)
+ else
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
-
- if (rte->rtekind != RTE_RELATION)
- continue;
+ Bitmapset *lockrels;
/*
- * Acquire the appropriate type of lock on each relation OID. Note
- * that we don't actually try to open the rel, and hence will not
- * fail if it's been dropped entirely --- we'll just transiently
- * acquire a non-conflicting lock.
+ * Figure out the set of relations that would need to be locked
+ * before executing the plan.
*/
- if (acquire)
+ if (!plannedstmt->containsInitialPruning)
+ {
+ /*
+ * If the plan contains no initial pruning steps, the executor
+ * would just need to lock whatever relations the planner would
+ * have locked to make the plan.
+ */
+ lockrels = plannedstmt->lockrels;
+ }
+ else
+ {
+ /*
+ * Ask the executor to perform initial pruning steps to skip
+ * relations that are pruned away.
+ */
+ execlockrelsinfo = ExecutorGetLockRels(plannedstmt, boundParams);
+ lockrels = execlockrelsinfo->lockrels;
+ }
+
+ rti = -1;
+ while ((rti = bms_next_member(lockrels, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ Assert(rte->rtekind == RTE_RELATION);
+
+ /*
+ * Acquire the appropriate type of lock on each relation OID.
+ * Note that we don't actually try to open the rel, and hence
+ * will not fail if it's been dropped entirely --- we'll just
+ * transiently acquire a non-conflicting lock.
+ */
LockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
+
+ /*
+ * Remember the ExecLockRelsInfo so that it can later be added to the
+ * QueryDesc passed to the executor when executing this plan. It may be
+ * NULL, but the list must stay the same length as stmt_list.
+ */
+ execlockrelsinfo_list = lappend(execlockrelsinfo_list,
+ execlockrelsinfo);
+ }
+
+ return execlockrelsinfo_list;
+}
+
+/*
+ * ReleaseExecutorLocks
+ * Release locks that would've been acquired by an earlier call to
+ * AcquireExecutorLocks()
+ */
+static void
+ReleaseExecutorLocks(List *stmt_list, List *execlockrelsinfo_list)
+{
+ ListCell *lc1,
+ *lc2;
+
+ forboth(lc1, stmt_list, lc2, execlockrelsinfo_list)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ ExecLockRelsInfo *execlockrelsinfo = lfirst_node(ExecLockRelsInfo, lc2);
+ int rti;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /*
+ * Ignore utility statements, except those (such as EXPLAIN) that
+ * contain a parsed-but-not-planned query. Note: it's okay to use
+ * ScanQueryForLocks, even though the query hasn't been through
+ * rule rewriting, because rewriting doesn't change the query
+ * representation.
+ */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ if (query)
+ ScanQueryForLocks(query, false);
+ }
+ else
+ {
+ Bitmapset *lockrels;
+
+ if (execlockrelsinfo == NULL)
+ lockrels = plannedstmt->lockrels;
else
+ lockrels = execlockrelsinfo->lockrels;
+
+ rti = -1;
+ while ((rti = bms_next_member(lockrels, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ Assert(rte->rtekind == RTE_RELATION);
+
UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
}
}
}
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index d549f66d4a..896f51be08 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -285,6 +285,7 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *execlockrelsinfos,
CachedPlan *cplan)
{
AssertArg(PortalIsValid(portal));
@@ -299,6 +300,7 @@ PortalDefineQuery(Portal portal,
portal->qc.nprocessed = 0;
portal->commandTag = commandTag;
portal->stmts = stmts;
+ portal->execlockrelsinfos = execlockrelsinfos;
portal->cplan = cplan;
portal->status = PORTAL_DEFINED;
}
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 666977fb1f..fef75ba147 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,8 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt, ExecLockRelsInfo *execlockrelsinfo,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index fd5735a946..ded19b8cbb 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -124,4 +124,6 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
PartitionPruneInfo *pruneinfo,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
+extern Bitmapset *ExecGetLockRelsDoInitialPruning(Plan *plan, ExecGetLockRelsContext *context,
+ PartitionPruneInfo *pruneinfo);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index e79e2c001f..4338463479 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,7 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ ExecLockRelsInfo *execlockrelsinfo; /* ExecutorGetLockRels()'s output given plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +58,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ ExecLockRelsInfo *execlockrelsinfo,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 344399f6a8..5959d67221 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -185,6 +185,8 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
+extern ExecLockRelsInfo *ExecutorGetLockRels(PlannedStmt *plannedstmt, ParamListInfo params);
+extern bool ExecGetLockRels(Plan *node, ExecGetLockRelsContext *context);
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/executor/nodeAppend.h b/src/include/executor/nodeAppend.h
index 4cb78ee5b6..b53535c2a4 100644
--- a/src/include/executor/nodeAppend.h
+++ b/src/include/executor/nodeAppend.h
@@ -17,6 +17,7 @@
#include "access/parallel.h"
#include "nodes/execnodes.h"
+extern bool ExecGetAppendLockRels(Append *node, ExecGetLockRelsContext *context);
extern AppendState *ExecInitAppend(Append *node, EState *estate, int eflags);
extern void ExecEndAppend(AppendState *node);
extern void ExecReScanAppend(AppendState *node);
diff --git a/src/include/executor/nodeMergeAppend.h b/src/include/executor/nodeMergeAppend.h
index 97fe3b0665..8eb4e9df93 100644
--- a/src/include/executor/nodeMergeAppend.h
+++ b/src/include/executor/nodeMergeAppend.h
@@ -16,6 +16,7 @@
#include "nodes/execnodes.h"
+extern bool ExecGetMergeAppendLockRels(MergeAppend *node, ExecGetLockRelsContext *context);
extern MergeAppendState *ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags);
extern void ExecEndMergeAppend(MergeAppendState *node);
extern void ExecReScanMergeAppend(MergeAppendState *node);
diff --git a/src/include/executor/nodeModifyTable.h b/src/include/executor/nodeModifyTable.h
index 1d225bc88d..5006499088 100644
--- a/src/include/executor/nodeModifyTable.h
+++ b/src/include/executor/nodeModifyTable.h
@@ -19,6 +19,7 @@ extern void ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
EState *estate, TupleTableSlot *slot,
CmdType cmdtype);
+extern bool ExecGetModifyTableLockRels(ModifyTable *plan, ExecGetLockRelsContext *context);
extern ModifyTableState *ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags);
extern void ExecEndModifyTable(ModifyTableState *node);
extern void ExecReScanModifyTable(ModifyTableState *node);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index dd95dc40c7..718603d400 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -570,6 +570,7 @@ typedef struct EState
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ struct ExecLockRelsInfo *es_execlockrelsinfo; /* QueryDesc.execlockrelsinfo */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
@@ -958,6 +959,92 @@ typedef struct DomainConstraintState
*/
typedef TupleTableSlot *(*ExecProcNodeMtd) (struct PlanState *pstate);
+/*----------------
+ * ExecLockRelsInfo
+ *
+ * Result of performing ExecutorGetLockRels() for a given PlannedStmt
+ */
+typedef struct ExecLockRelsInfo
+{
+ NodeTag type;
+
+ /*
+ * Relations that must be locked to execute the plan tree contained in
+ * the PlannedStmt.
+ */
+ Bitmapset *lockrels;
+
+ /* PlannedStmt.numPlanNodes */
+ int numPlanNodes;
+
+ /*
+ * List of PlanInitPruningOutput, each representing the output of
+ * performing initial pruning on a given plan node, for all nodes in the
+ * plan tree that have been marked as needing initial pruning.
+ *
+ * 'ipoIndexes' is an array of 'numPlanNodes' elements, indexed with
+ * plan_node_id of the individual nodes in the plan tree, each a 1-based
+ * index into the 'initPruningOutputs' list for a given plan node. 0 means
+ * that the plan node has no entry in the list because it needs no
+ * initial pruning.
+ */
+ List *initPruningOutputs;
+ int *ipoIndexes;
+} ExecLockRelsInfo;
+
+/*----------------
+ * ExecGetLockRelsContext
+ *
+ * Context information for performing ExecutorGetLockRels() on a given plan
+ */
+typedef struct ExecGetLockRelsContext
+{
+ NodeTag type;
+
+ PlannedStmt *stmt; /* target plan */
+ ParamListInfo params; /* EXTERN parameters to prune with */
+
+ /* Output parameters for ExecGetLockRels and its subroutines. */
+ Bitmapset *lockrels;
+
+ /* See above comment. */
+ List *initPruningOutputs;
+ int *ipoIndexes;
+} ExecGetLockRelsContext;
+
+#define ExecStorePlanInitPruningOutput(prepcxt, initPruningOutput, plannode) \
+ do { \
+ (prepcxt)->initPruningOutputs = lappend((prepcxt)->initPruningOutputs, initPruningOutput); \
+ (prepcxt)->ipoIndexes[(plannode)->plan_node_id] = list_length((prepcxt)->initPruningOutputs); \
+ } while (0)
+
+#define ExecFetchPlanInitPruningOutput(prepres, plannode) \
+ (((prepres) != NULL && (prepres)->initPruningOutputs != NIL && \
+ (prepres)->ipoIndexes[(plannode)->plan_node_id] > 0) ? \
+ list_nth((prepres)->initPruningOutputs, \
+ (prepres)->ipoIndexes[(plannode)->plan_node_id] - 1) : NULL)
+
+/* ---------------
+ * PlanInitPruningOutput
+ *
+ * Node to remember the result of performing initial partition pruning steps
+ * during ExecutorGetLockRels() on nodes that support pruning.
+ *
+ * ExecGetLockRelsDoInitialPruning(), which runs during ExecutorGetLockRels(),
+ * creates it and stores it in the corresponding ExecLockRelsInfo.
+ *
+ * ExecInitPartitionPruning(), which runs during ExecutorStart(), fetches it
+ * from the EState's ExecLockRelsInfo (if any) and uses the value of
+ * initially_valid_subplans contained in it as-is to select the subplans to be
+ * initialized for execution, instead of re-evaluating that by performing
+ * initial pruning again.
+ */
+typedef struct PlanInitPruningOutput
+{
+ NodeTag type;
+
+ Bitmapset *initially_valid_subplans;
+} PlanInitPruningOutput;
+
/* ----------------
* PlanState node
*
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 5d075f0c34..d365fc4402 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -96,6 +96,11 @@ typedef enum NodeTag
T_PartitionPruneStepCombine,
T_PlanInvalItem,
+ /* TAGS FOR EXECUTOR PREP NODES (execnodes.h) */
+ T_ExecGetLockRelsContext,
+ T_ExecLockRelsInfo,
+ T_PlanInitPruningOutput,
+
/*
* TAGS FOR PLAN STATE NODES (execnodes.h)
*
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 1f3845b3fe..96c652ebaf 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -101,6 +101,9 @@ typedef struct PlannerGlobal
List *finalrtable; /* "flat" rangetable for executor */
+ Bitmapset *lockrels; /* Indexes of RTE_RELATION entries in range
+ * table */
+
List *finalrowmarks; /* "flat" list of PlanRowMarks */
List *resultRelations; /* "flat" list of integer RT indexes */
@@ -129,6 +132,10 @@ typedef struct PlannerGlobal
char maxParallelHazard; /* worst PROPARALLEL hazard level */
+ bool containsInitialPruning; /* Do some Plan nodes in the tree
+ * have initial (pre-exec) pruning
+ * steps? */
+
PartitionDirectory partition_directory; /* partition descriptors */
} PlannerGlobal;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 0b518ce6b2..5a8c34bdf6 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -59,12 +59,21 @@ typedef struct PlannedStmt
bool parallelModeNeeded; /* parallel mode required to execute? */
+ bool containsInitialPruning; /* Do some Plan nodes in the tree
+ * have initial (pre-exec) pruning
+ * steps? */
+
int jitFlags; /* which forms of JIT should be performed */
struct Plan *planTree; /* tree of Plan nodes */
+ int numPlanNodes; /* number of nodes in planTree */
+
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *lockrels; /* Indexes of RTE_RELATION entries in range
+ * table */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
List *resultRelations; /* integer list of RT indexes, or NIL */
@@ -1172,6 +1181,13 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * needs_init_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its initial_pruning_steps set?
+ *
+ * needs_exec_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its exec_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1180,6 +1196,8 @@ typedef struct PartitionPruneInfo
{
NodeTag type;
List *prune_infos;
+ bool needs_init_pruning;
+ bool needs_exec_pruning;
Bitmapset *other_subplans;
} PartitionPruneInfo;
diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h
index 92291a750d..bf80c53bed 100644
--- a/src/include/tcop/tcopprot.h
+++ b/src/include/tcop/tcopprot.h
@@ -64,7 +64,7 @@ extern PlannedStmt *pg_plan_query(Query *querytree, const char *query_string,
ParamListInfo boundParams);
extern List *pg_plan_queries(List *querytrees, const char *query_string,
int cursorOptions,
- ParamListInfo boundParams);
+ ParamListInfo boundParams, List **execlockrelsinfo_list);
extern bool check_max_stack_depth(int *newval, void **extra, GucSource source);
extern void assign_max_stack_depth(int newval, void *extra);
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 95b99e3d25..2a847f54da 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -148,6 +148,9 @@ typedef struct CachedPlan
{
int magic; /* should equal CACHEDPLAN_MAGIC */
List *stmt_list; /* list of PlannedStmts */
+ List *execlockrelsinfo_list; /* list of ExecLockRelsInfo nodes with one
+ * element for each of stmt_list; NIL
+ * if not a generic plan */
bool is_oneshot; /* is it a "oneshot" plan? */
bool is_saved; /* is CachedPlan in a long-lived context? */
bool is_valid; /* is the stmt_list currently valid? */
@@ -158,6 +161,8 @@ typedef struct CachedPlan
int generation; /* parent's generation number for this plan */
int refcount; /* count of live references to this struct */
MemoryContext context; /* context containing this CachedPlan */
+ MemoryContext execlockrelsinfo_context; /* context containing execlockrelsinfo_list,
+ * a child of the above context */
} CachedPlan;
/*
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aeddbdafe5..9abace6734 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -137,6 +137,10 @@ typedef struct PortalData
CommandTag commandTag; /* command tag for original query */
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
+ List *execlockrelsinfos; /* list of ExecLockRelsInfo nodes with one element
+ * for each of 'stmts'; same as
+ * cplan->execlockrelsinfo_list if cplan is
+ * not NULL */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
ParamListInfo portalParams; /* params to pass to query */
@@ -241,6 +245,7 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *execlockrelsinfos,
CachedPlan *cplan);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
--
2.24.1
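To make the intended use of the ExecLockRelsInfo bookkeeping added by the
patch above concrete, here is a minimal two-phase sketch; it is not taken
from the patch, and the locals (context, estate, append_plan,
valid_subplans) are assumed to be in scope:

/* Phase 1: during ExecutorGetLockRels(), record the initial-pruning
 * result for a pruning-capable plan node in the context. */
PlanInitPruningOutput *ipo = makeNode(PlanInitPruningOutput);

ipo->initially_valid_subplans = valid_subplans;
ExecStorePlanInitPruningOutput(context, ipo, (Plan *) append_plan);

/* Phase 2: during ExecutorStart(), fetch the saved result via the
 * EState's ExecLockRelsInfo and reuse it instead of pruning again. */
PlanInitPruningOutput *saved =
    ExecFetchPlanInitPruningOutput(estate->es_execlockrelsinfo,
                                   (Plan *) append_plan);

if (saved != NULL)
    valid_subplans = saved->initially_valid_subplans;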
Attachment: v5-0001-Some-refactoring-of-runtime-pruning-code.patch
From 1164015d8561151d1fb5d861b236961e237102ff Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Wed, 2 Mar 2022 15:17:55 +0900
Subject: [PATCH v5 1/3] Some refactoring of runtime pruning code
This does two things mainly:
* Move the execution pruning initialization steps that are common
between both ExecInitAppend() and ExecInitMergeAppend() into a new
function ExecInitPartitionPruning() defined in execPartition.c.
Thus, ExecFindInitialMatchingSubPlans() need not be exported.
* Add an ExprContext field to PartitionPruneContext to remove the
implicit assumption in the runtime pruning code that the ExprContext
to use to compute pruning expressions that need one can always rely
on the PlanState providing it. A future patch will allow runtime
pruning (at least the initial pruning steps) to be performed without
the corresponding PlanState yet having been created, so this will
help.
---
src/backend/executor/execPartition.c | 340 ++++++++++++++++---------
src/backend/executor/nodeAppend.c | 33 +--
src/backend/executor/nodeMergeAppend.c | 32 +--
src/backend/partitioning/partprune.c | 20 +-
src/include/executor/execPartition.h | 9 +-
src/include/partitioning/partprune.h | 2 +
6 files changed, 255 insertions(+), 181 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 90ed1485d1..21953f253b 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -182,11 +182,18 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
bool *isnull,
int maxfieldlen);
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
+static PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
+ PartitionPruneInfo *partitionpruneinfo);
+static Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate);
static void ExecInitPruningContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
PartitionKey partkey,
- PlanState *planstate);
+ PlanState *planstate,
+ ExprContext *econtext);
+static void ExecPartitionPruneFixSubPlanIndexes(PartitionPruneState *prunestate,
+ Bitmapset *initially_valid_subplans,
+ int n_total_subplans);
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
@@ -1485,30 +1492,87 @@ adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri)
*
* Functions:
*
- * ExecCreatePartitionPruneState:
- * Creates the PartitionPruneState required by each of the two pruning
- * functions. Details stored include how to map the partition index
- * returned by the partition pruning code into subplan indexes.
- *
- * ExecFindInitialMatchingSubPlans:
- * Returns indexes of matching subplans. Partition pruning is attempted
- * without any evaluation of expressions containing PARAM_EXEC Params.
- * This function must be called during executor startup for the parent
- * plan before the subplans themselves are initialized. Subplans which
- * are found not to match by this function must be removed from the
- * plan's list of subplans during execution, as this function performs a
- * remap of the partition index to subplan index map and the newly
- * created map provides indexes only for subplans which remain after
- * calling this function.
+ * ExecInitPartitionPruning:
+ * Sets up run-time pruning data structure (PartitionPruneState) that is
+ * needed by each of the two pruning functions. Also determines the set
+ * of initially valid subplans by performing initial pruning steps,
+ * telling the caller (such as ExecInitAppend) to initialize only those
+ * for execution. The maps in PartitionPruneState that translate
+ * partition indexes returned by partprune.c functions into indexes of
+ * the partitions' subplans in the parent node (such as Append) are
+ * updated to account for any subplans eliminated by initial pruning.
*
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating all available
- * expressions. This function can only be called during execution and
- * must be called again each time the value of a Param listed in
- * PartitionPruneState's 'execparamids' changes.
+ * expressions, that is, using execution pruning steps. This function
+ * can only be called during execution and must be called again each time
+ * the value of a Param listed in PartitionPruneState's 'execparamids'
+ * changes.
*-------------------------------------------------------------------------
*/
+/*
+ * ExecInitPartitionPruning
+ * Initialize data structure needed for run-time partition pruning
+ *
+ * Initial pruning can be done immediately, so it is done here if needed and
+ * the set of surviving partition subplans' indexes are added to the output
+ * parameter *initially_valid_subplans. If subplans are indeed pruned,
+ * subplan_map arrays contained in the returned PartitionPruneState are
+ * re-sequenced to not count those, though only if the maps will be needed
+ * for subsequent execution pruning passes.
+ */
+PartitionPruneState *
+ExecInitPartitionPruning(PlanState *planstate,
+ int n_total_subplans,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **initially_valid_subplans)
+{
+ PartitionPruneState *prunestate;
+ EState *estate = planstate->state;
+
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
+
+ /*
+ * Create the working data structure for pruning.
+ */
+ prunestate = ExecCreatePartitionPruneState(planstate, pruneinfo);
+
+ /*
+ * Perform an initial partition prune, if required.
+ */
+ if (prunestate->do_initial_prune)
+ {
+ /* Determine which subplans survive initial pruning */
+ *initially_valid_subplans = ExecFindInitialMatchingSubPlans(prunestate);
+ }
+ else
+ {
+ /* We'll need to initialize all subplans */
+ Assert(n_total_subplans > 0);
+ *initially_valid_subplans = bms_add_range(NULL, 0,
+ n_total_subplans - 1);
+ }
+
+ /*
+ * Re-sequence subplan indexes contained in prunestate to account for any
+ * that were removed above due to initial pruning.
+ *
+ * We can safely skip this when !do_exec_prune, even though that leaves
+ * invalid data in prunestate, because that data won't be consulted again
+ * (cf initial Assert in ExecFindMatchingSubPlans).
+ */
+ if (prunestate->do_exec_prune &&
+ bms_num_members(*initially_valid_subplans) < n_total_subplans)
+ ExecPartitionPruneFixSubPlanIndexes(prunestate,
+ *initially_valid_subplans,
+ n_total_subplans);
+
+ return prunestate;
+}
+
/*
* ExecCreatePartitionPruneState
* Build the data structure required for calling
@@ -1527,7 +1591,7 @@ adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri)
* re-used each time we re-evaluate which partitions match the pruning steps
* provided in each PartitionedRelPruneInfo.
*/
-PartitionPruneState *
+static PartitionPruneState *
ExecCreatePartitionPruneState(PlanState *planstate,
PartitionPruneInfo *partitionpruneinfo)
{
@@ -1536,6 +1600,7 @@ ExecCreatePartitionPruneState(PlanState *planstate,
int n_part_hierarchies;
ListCell *lc;
int i;
+ ExprContext *econtext = planstate->ps_ExprContext;
/* For data reading, executor always omits detached partitions */
if (estate->es_partition_directory == NULL)
@@ -1709,7 +1774,8 @@ ExecCreatePartitionPruneState(PlanState *planstate,
{
ExecInitPruningContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
- partdesc, partkey, planstate);
+ partdesc, partkey, planstate,
+ econtext);
/* Record whether initial pruning is needed at any level */
prunestate->do_initial_prune = true;
}
@@ -1718,7 +1784,8 @@ ExecCreatePartitionPruneState(PlanState *planstate,
{
ExecInitPruningContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
- partdesc, partkey, planstate);
+ partdesc, partkey, planstate,
+ econtext);
/* Record whether exec pruning is needed at any level */
prunestate->do_exec_prune = true;
}
@@ -1746,7 +1813,8 @@ ExecInitPruningContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
PartitionKey partkey,
- PlanState *planstate)
+ PlanState *planstate,
+ ExprContext *econtext)
{
int n_steps;
int partnatts;
@@ -1767,6 +1835,7 @@ ExecInitPruningContext(PartitionPruneContext *context,
context->ppccontext = CurrentMemoryContext;
context->planstate = planstate;
+ context->exprcontext = econtext;
/* Initialize expression state for each expression we need */
context->exprstates = (ExprState **)
@@ -1795,8 +1864,20 @@ ExecInitPruningContext(PartitionPruneContext *context,
step->step.step_id,
keyno);
- context->exprstates[stateidx] =
- ExecInitExpr(expr, context->planstate);
+ /*
+ * When planstate is NULL, pruning_steps is known not to
+ * contain any expressions that depend on the parent plan.
+ * Information of any available EXTERN parameters must be
+ * passed explicitly in that case, which the caller must
+ * have made available via econtext.
+ */
+ if (planstate == NULL)
+ context->exprstates[stateidx] =
+ ExecInitExprWithParams(expr,
+ econtext->ecxt_param_list_info);
+ else
+ context->exprstates[stateidx] =
+ ExecInitExpr(expr, context->planstate);
}
keyno++;
}
@@ -1816,11 +1897,9 @@ ExecInitPruningContext(PartitionPruneContext *context,
*
* Must only be called once per 'prunestate', and only if initial pruning
* is required.
- *
- * 'nsubplans' must be passed as the total number of unpruned subplans.
*/
-Bitmapset *
-ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
+static Bitmapset *
+ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -1845,14 +1924,20 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
PartitionedRelPruningData *pprune;
prunedata = prunestate->partprunedata[i];
+
+ /*
+ * We pass the 1st item belonging to the root table of the hierarchy
+ * and find_matching_subplans_recurse() takes care of recursing to
+ * other (lower-level) parents as needed.
+ */
pprune = &prunedata->partrelprunedata[0];
/* Perform pruning without using PARAM_EXEC Params */
find_matching_subplans_recurse(prunedata, pprune, true, &result);
- /* Expression eval may have used space in node's ps_ExprContext too */
+ /* Expression eval may have used space in ExprContext too */
if (pprune->initial_pruning_steps)
- ResetExprContext(pprune->initial_context.planstate->ps_ExprContext);
+ ResetExprContext(pprune->initial_context.exprcontext);
}
/* Add in any subplans that partition pruning didn't account for */
@@ -1865,118 +1950,120 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
MemoryContextReset(prunestate->prune_context);
+ return result;
+}
+
+/*
+ * ExecPartitionPruneFixSubPlanIndexes
+ * Fix mapping of partition indexes to subplan indexes contained in
+ * prunestate by considering the new list of subplans that survived
+ * initial pruning
+ *
+ * Subplans were previously indexed 0..(n_total_subplans - 1); they must
+ * now be re-indexed 0..(bms_num_members(initially_valid_subplans) - 1).
+ */
+static void
+ExecPartitionPruneFixSubPlanIndexes(PartitionPruneState *prunestate,
+ Bitmapset *initially_valid_subplans,
+ int n_total_subplans)
+{
+ int *new_subplan_indexes;
+ Bitmapset *new_other_subplans;
+ int i;
+ int newidx;
+
/*
- * If exec-time pruning is required and we pruned subplans above, then we
- * must re-sequence the subplan indexes so that ExecFindMatchingSubPlans
- * properly returns the indexes from the subplans which will remain after
- * execution of this function.
- *
- * We can safely skip this when !do_exec_prune, even though that leaves
- * invalid data in prunestate, because that data won't be consulted again
- * (cf initial Assert in ExecFindMatchingSubPlans).
+ * First we must build a temporary array which maps old subplan
+ * indexes to new ones. For convenience of initialization, we use
+ * 1-based indexes in this array and leave pruned items as 0.
*/
- if (prunestate->do_exec_prune && bms_num_members(result) < nsubplans)
+ new_subplan_indexes = (int *) palloc0(sizeof(int) * n_total_subplans);
+ newidx = 1;
+ i = -1;
+ while ((i = bms_next_member(initially_valid_subplans, i)) >= 0)
{
- int *new_subplan_indexes;
- Bitmapset *new_other_subplans;
- int i;
- int newidx;
+ Assert(i < n_total_subplans);
+ new_subplan_indexes[i] = newidx++;
+ }
- /*
- * First we must build a temporary array which maps old subplan
- * indexes to new ones. For convenience of initialization, we use
- * 1-based indexes in this array and leave pruned items as 0.
- */
- new_subplan_indexes = (int *) palloc0(sizeof(int) * nsubplans);
- newidx = 1;
- i = -1;
- while ((i = bms_next_member(result, i)) >= 0)
- {
- Assert(i < nsubplans);
- new_subplan_indexes[i] = newidx++;
- }
+ /*
+ * Now we can update each PartitionedRelPruneInfo's subplan_map with
+ * new subplan indexes. We must also recompute its present_parts
+ * bitmap.
+ */
+ for (i = 0; i < prunestate->num_partprunedata; i++)
+ {
+ PartitionPruningData *prunedata = prunestate->partprunedata[i];
+ int j;
/*
- * Now we can update each PartitionedRelPruneInfo's subplan_map with
- * new subplan indexes. We must also recompute its present_parts
- * bitmap.
+ * Within each hierarchy, we perform this loop in back-to-front
+ * order so that we determine present_parts for the lowest-level
+ * partitioned tables first. This way we can tell whether a
+ * sub-partitioned table's partitions were entirely pruned so we
+ * can exclude it from the current level's present_parts.
*/
- for (i = 0; i < prunestate->num_partprunedata; i++)
+ for (j = prunedata->num_partrelprunedata - 1; j >= 0; j--)
{
- PartitionPruningData *prunedata = prunestate->partprunedata[i];
- int j;
+ PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
+ int nparts = pprune->nparts;
+ int k;
- /*
- * Within each hierarchy, we perform this loop in back-to-front
- * order so that we determine present_parts for the lowest-level
- * partitioned tables first. This way we can tell whether a
- * sub-partitioned table's partitions were entirely pruned so we
- * can exclude it from the current level's present_parts.
- */
- for (j = prunedata->num_partrelprunedata - 1; j >= 0; j--)
- {
- PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
- int nparts = pprune->nparts;
- int k;
+ /* We just rebuild present_parts from scratch */
+ bms_free(pprune->present_parts);
+ pprune->present_parts = NULL;
- /* We just rebuild present_parts from scratch */
- bms_free(pprune->present_parts);
- pprune->present_parts = NULL;
+ for (k = 0; k < nparts; k++)
+ {
+ int oldidx = pprune->subplan_map[k];
+ int subidx;
- for (k = 0; k < nparts; k++)
+ /*
+ * If this partition existed as a subplan then change the
+ * old subplan index to the new subplan index. The new
+ * index may become -1 if the partition was pruned above,
+ * or it may just come earlier in the subplan list due to
+ * some subplans being removed earlier in the list. If
+ * it's a subpartition, add it to present_parts unless
+ * it's entirely pruned.
+ */
+ if (oldidx >= 0)
{
- int oldidx = pprune->subplan_map[k];
- int subidx;
-
- /*
- * If this partition existed as a subplan then change the
- * old subplan index to the new subplan index. The new
- * index may become -1 if the partition was pruned above,
- * or it may just come earlier in the subplan list due to
- * some subplans being removed earlier in the list. If
- * it's a subpartition, add it to present_parts unless
- * it's entirely pruned.
- */
- if (oldidx >= 0)
- {
- Assert(oldidx < nsubplans);
- pprune->subplan_map[k] = new_subplan_indexes[oldidx] - 1;
+ Assert(oldidx < n_total_subplans);
+ pprune->subplan_map[k] = new_subplan_indexes[oldidx] - 1;
- if (new_subplan_indexes[oldidx] > 0)
- pprune->present_parts =
- bms_add_member(pprune->present_parts, k);
- }
- else if ((subidx = pprune->subpart_map[k]) >= 0)
- {
- PartitionedRelPruningData *subprune;
+ if (new_subplan_indexes[oldidx] > 0)
+ pprune->present_parts =
+ bms_add_member(pprune->present_parts, k);
+ }
+ else if ((subidx = pprune->subpart_map[k]) >= 0)
+ {
+ PartitionedRelPruningData *subprune;
- subprune = &prunedata->partrelprunedata[subidx];
+ subprune = &prunedata->partrelprunedata[subidx];
- if (!bms_is_empty(subprune->present_parts))
- pprune->present_parts =
- bms_add_member(pprune->present_parts, k);
- }
+ if (!bms_is_empty(subprune->present_parts))
+ pprune->present_parts =
+ bms_add_member(pprune->present_parts, k);
}
}
}
+ }
- /*
- * We must also recompute the other_subplans set, since indexes in it
- * may change.
- */
- new_other_subplans = NULL;
- i = -1;
- while ((i = bms_next_member(prunestate->other_subplans, i)) >= 0)
- new_other_subplans = bms_add_member(new_other_subplans,
- new_subplan_indexes[i] - 1);
-
- bms_free(prunestate->other_subplans);
- prunestate->other_subplans = new_other_subplans;
+ /*
+ * We must also recompute the other_subplans set, since indexes in it
+ * may change.
+ */
+ new_other_subplans = NULL;
+ i = -1;
+ while ((i = bms_next_member(prunestate->other_subplans, i)) >= 0)
+ new_other_subplans = bms_add_member(new_other_subplans,
+ new_subplan_indexes[i] - 1);
- pfree(new_subplan_indexes);
- }
+ bms_free(prunestate->other_subplans);
+ prunestate->other_subplans = new_other_subplans;
- return result;
+ pfree(new_subplan_indexes);
}
/*
@@ -2018,11 +2105,16 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate)
prunedata = prunestate->partprunedata[i];
pprune = &prunedata->partrelprunedata[0];
+ /*
+ * We pass the 1st item belonging to the root table of the hierarchy
+ * and find_matching_subplans_recurse() takes care of recursing to
+ * other (lower-level) parents as needed.
+ */
find_matching_subplans_recurse(prunedata, pprune, false, &result);
- /* Expression eval may have used space in node's ps_ExprContext too */
+ /* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
- ResetExprContext(pprune->exec_context.planstate->ps_ExprContext);
+ ResetExprContext(pprune->exec_context.exprcontext);
}
/* Add in any subplans that partition pruning didn't account for */
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 7937f1c88f..5b6d3eb23b 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -138,30 +138,17 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
{
PartitionPruneState *prunestate;
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, &appendstate->ps);
-
- /* Create the working data structure for pruning. */
- prunestate = ExecCreatePartitionPruneState(&appendstate->ps,
- node->part_prune_info);
+ /*
+ * Set up pruning data structure. Initial pruning steps, if any, are
+ * performed as part of the setup, adding the set of indexes of
+ * surviving subplans to 'validsubplans'.
+ */
+ prunestate = ExecInitPartitionPruning(&appendstate->ps,
+ list_length(node->appendplans),
+ node->part_prune_info,
+ &validsubplans);
appendstate->as_prune_state = prunestate;
-
- /* Perform an initial partition prune, if required. */
- if (prunestate->do_initial_prune)
- {
- /* Determine which subplans survive initial pruning */
- validsubplans = ExecFindInitialMatchingSubPlans(prunestate,
- list_length(node->appendplans));
-
- nplans = bms_num_members(validsubplans);
- }
- else
- {
- /* We'll need to initialize all subplans */
- nplans = list_length(node->appendplans);
- Assert(nplans > 0);
- validsubplans = bms_add_range(NULL, 0, nplans - 1);
- }
+ nplans = bms_num_members(validsubplans);
/*
* When no run-time pruning is required and there's at least one
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 418f89dea8..9a9f29e845 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -86,29 +86,17 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
{
PartitionPruneState *prunestate;
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, &mergestate->ps);
-
- prunestate = ExecCreatePartitionPruneState(&mergestate->ps,
- node->part_prune_info);
+ /*
+ * Set up pruning data structure. Initial pruning steps, if any, are
+ * performed as part of the setup, adding the set of indexes of
+ * surviving subplans to 'validsubplans'.
+ */
+ prunestate = ExecInitPartitionPruning(&mergestate->ps,
+ list_length(node->mergeplans),
+ node->part_prune_info,
+ &validsubplans);
mergestate->ms_prune_state = prunestate;
-
- /* Perform an initial partition prune, if required. */
- if (prunestate->do_initial_prune)
- {
- /* Determine which subplans survive initial pruning */
- validsubplans = ExecFindInitialMatchingSubPlans(prunestate,
- list_length(node->mergeplans));
-
- nplans = bms_num_members(validsubplans);
- }
- else
- {
- /* We'll need to initialize all subplans */
- nplans = list_length(node->mergeplans);
- Assert(nplans > 0);
- validsubplans = bms_add_range(NULL, 0, nplans - 1);
- }
+ nplans = bms_num_members(validsubplans);
/*
* When no run-time pruning is required and there's at least one
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 1bc00826c1..7080cb25d9 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -798,6 +798,7 @@ prune_append_rel_partitions(RelOptInfo *rel)
/* These are not valid when being called from the planner */
context.planstate = NULL;
+ context.exprcontext = NULL;
context.exprstates = NULL;
/* Actual pruning happens here. */
@@ -808,8 +809,8 @@ prune_append_rel_partitions(RelOptInfo *rel)
* get_matching_partitions
* Determine partitions that survive partition pruning
*
- * Note: context->planstate must be set to a valid PlanState when the
- * pruning_steps were generated with a target other than PARTTARGET_PLANNER.
+ * Note: context->exprcontext must be valid when the pruning_steps were
+ * generated with a target other than PARTTARGET_PLANNER.
*
* Returns a Bitmapset of the RelOptInfo->part_rels indexes of the surviving
* partitions.
@@ -3654,7 +3655,7 @@ match_boolean_partition_clause(Oid partopfamily, Expr *clause, Expr *partkey,
* exprstate array.
*
* Note that the evaluated result may be in the per-tuple memory context of
- * context->planstate->ps_ExprContext, and we may have leaked other memory
+ * context->exprcontext, and we may have leaked other memory
* there too. This memory must be recovered by resetting that ExprContext
* after we're done with the pruning operation (see execPartition.c).
*/
@@ -3677,13 +3678,18 @@ partkey_datum_from_expr(PartitionPruneContext *context,
ExprContext *ectx;
/*
- * We should never see a non-Const in a step unless we're running in
- * the executor.
+ * We should never see a non-Const in a step unless the caller has
+ * passed a valid ExprContext.
+ *
+ * When context->planstate is valid, context->exprcontext is the same
+ * as context->planstate->ps_ExprContext.
*/
- Assert(context->planstate != NULL);
+ Assert(context->planstate != NULL || context->exprcontext != NULL);
+ Assert(context->planstate == NULL ||
+ (context->exprcontext == context->planstate->ps_ExprContext));
exprstate = context->exprstates[stateidx];
- ectx = context->planstate->ps_ExprContext;
+ ectx = context->exprcontext;
*value = ExecEvalExprSwitchContext(exprstate, ectx, isnull);
}
}
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 603d8becc4..fd5735a946 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -119,10 +119,9 @@ extern ResultRelInfo *ExecFindPartition(ModifyTableState *mtstate,
EState *estate);
extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
PartitionTupleRouting *proute);
-extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *partitionpruneinfo);
+extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
+ int n_total_subplans,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
-extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
- int nsubplans);
-
#endif /* EXECPARTITION_H */
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index ee11b6feae..90684efa25 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -41,6 +41,7 @@ struct RelOptInfo;
* subsidiary data, such as the FmgrInfos.
* planstate Points to the parent plan node's PlanState when called
* during execution; NULL when called from the planner.
+ * exprcontext ExprContext to use when evaluating pruning expressions
* exprstates Array of ExprStates, indexed as per PruneCxtStateIdx; one
* for each partition key in each pruning step. Allocated if
* planstate is non-NULL, otherwise NULL.
@@ -56,6 +57,7 @@ typedef struct PartitionPruneContext
FmgrInfo *stepcmpfuncs;
MemoryContext ppccontext;
PlanState *planstate;
+ ExprContext *exprcontext;
ExprState **exprstates;
} PartitionPruneContext;
--
2.24.1
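Regarding 0001's new exprcontext field, here is a minimal sketch of the
planstate == NULL case it enables; this is illustrative only, not code
from the patch:

/* Evaluate a pruning expression with no parent PlanState, using only
 * an ExprContext that carries the EXTERN parameters. */
static Datum
eval_pruning_expr(Expr *expr, ExprContext *econtext, bool *isnull)
{
    /* planstate == NULL case: initialize with explicit parameters */
    ExprState *exprstate =
        ExecInitExprWithParams(expr, econtext->ecxt_param_list_info);

    /* The result may live in econtext's per-tuple memory; it must be
     * used or copied before the ExprContext is reset. */
    return ExecEvalExprSwitchContext(exprstate, econtext, isnull);
}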
On Fri, Mar 11, 2022 at 11:35 PM Amit Langote <amitlangote09@gmail.com> wrote:
Attached is v5, now broken into 3 patches:
0001: Some refactoring of runtime pruning code
0002: Add a plan_tree_walker
0003: Teach AcquireExecutorLocks to skip locking pruned relations
Repeated the performance tests described in the 1st email of this thread:
HEAD: (copied from the 1st email)
32 tps = 20561.776403 (without initial connection time)
64 tps = 12553.131423 (without initial connection time)
128 tps = 13330.365696 (without initial connection time)
256 tps = 8605.723120 (without initial connection time)
512 tps = 4435.951139 (without initial connection time)
1024 tps = 2346.902973 (without initial connection time)
2048 tps = 1334.680971 (without initial connection time)
Patched v1: (copied from the 1st email)
32 tps = 27554.156077 (without initial connection time)
64 tps = 27531.161310 (without initial connection time)
128 tps = 27138.305677 (without initial connection time)
256 tps = 25825.467724 (without initial connection time)
512 tps = 19864.386305 (without initial connection time)
1024 tps = 18742.668944 (without initial connection time)
2048 tps = 16312.412704 (without initial connection time)
Patched v5:
32 tps = 28204.197738 (without initial connection time)
64 tps = 26795.385318 (without initial connection time)
128 tps = 26387.920550 (without initial connection time)
256 tps = 25601.141556 (without initial connection time)
512 tps = 19911.947502 (without initial connection time)
1024 tps = 20158.692952 (without initial connection time)
2048 tps = 16180.195463 (without initial connection time)
Good to see that these rewrites haven't really hurt the numbers much,
which makes sense because the rewrites have really been about putting
the code in the right place.
BTW, these are the numbers for the same benchmark repeated with
plan_cache_mode = auto, which causes a custom plan to be chosen for
every execution and is thus unaffected by this patch.
32 tps = 13359.225082 (without initial connection time)
64 tps = 15760.533280 (without initial connection time)
128 tps = 15825.734482 (without initial connection time)
256 tps = 15017.693905 (without initial connection time)
512 tps = 13479.973395 (without initial connection time)
1024 tps = 13200.444397 (without initial connection time)
2048 tps = 12884.645475 (without initial connection time)
Comparing them to numbers when using force_generic_plan shows that
making the generic plans faster is indeed worthwhile.
--
Amit Langote
EDB: http://www.enterprisedb.com
Hi,
w.r.t. v5-0003-Teach-AcquireExecutorLocks-to-skip-locking-pruned.patch :
(pruning steps containing expressions that can be computed before
before the executor proper has started)
the word 'before' was repeated.
For ExecInitParallelPlan():
+ char *execlockrelsinfo_data;
+ char *execlockrelsinfo_space;
The content of execlockrelsinfo_data is copied into execlockrelsinfo_space.
I wonder whether having just one of execlockrelsinfo_data and
execlockrelsinfo_space would suffice.
Cheers
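The two variables follow the usual serialize-into-DSM pattern (presumably
what ExecInitParallelPlan() does here), in which both pointers are live at
different steps; a rough sketch, with a made-up toc key name:

/* Flatten the node into backend-local memory first ... */
char *execlockrelsinfo_data = nodeToString(execlockrelsinfo);
Size  len = strlen(execlockrelsinfo_data) + 1;

/* ... then copy it into the just-allocated shared memory. */
char *execlockrelsinfo_space = shm_toc_allocate(pcxt->toc, len);

memcpy(execlockrelsinfo_space, execlockrelsinfo_data, len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_EXECLOCKRELSINFO,
               execlockrelsinfo_space);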
On Fri, Mar 11, 2022 at 9:35 AM Amit Langote <amitlangote09@gmail.com> wrote:
Attached is v5, now broken into 3 patches:
0001: Some refactoring of runtime pruning code
0002: Add a plan_tree_walker
0003: Teach AcquireExecutorLocks to skip locking pruned relations
So is any other committer planning to look at this? Tom, perhaps?
David? This strikes me as important work, and I don't mind going
through and trying to do some detailed review, but (A) I am not the
person most familiar with the code being modified here and (B) there
are some important theoretical questions about the approach that we
might want to try to cover before we get down into the details.
In my opinion, the most important theoretical issue here is around
reuse of plans that are no longer entirely valid, but the parts that
are no longer valid are certain to be pruned. If, because we know that
some parameter has some particular value, we skip locking a bunch of
partitions, then when we're executing the plan, those partitions need
not exist any more -- or they could have different indexes, be
detached from the partitioning hierarchy and subsequently altered,
whatever. That seems fine to me provided that all of our code (and any
third-party code) is careful not to rely on the portion of the plan
that we've pruned away, and doesn't assume that (for example) we can
still fetch the name of an index whose OID appears in there someplace.
I cannot think of a hazard where the fact that part of a plan is
no longer valid because some DDL has been executed "infects" the
remainder of the plan. As long as we lock the partitioned tables named
in the plan and their descendants down to the level just above the one
at which something is pruned, and are careful, I think we should be
OK. It would be nice to know if someone has a fundamentally different
view of the hazards here, though.
Just to state my position here clearly, I would be more than happy if
somebody else plans to pick this up and try to get some or all of it
committed, and will cheerfully defer to such person in the event that
they have that plan. If, however, no such person exists, I may try my
hand at that myself.
Thanks,
--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
In my opinion, the most important theoretical issue here is around
reuse of plans that are no longer entirely valid, but the parts that
are no longer valid are certain to be pruned. If, because we know that
some parameter has some particular value, we skip locking a bunch of
partitions, then when we're executing the plan, those partitions need
not exist any more -- or they could have different indexes, be
detached from the partitioning hierarchy and subsequently altered,
whatever.
Check.
That seems fine to me provided that all of our code (and any
third-party code) is careful not to rely on the portion of the plan
that we've pruned away, and doesn't assume that (for example) we can
still fetch the name of an index whose OID appears in there someplace.
... like EXPLAIN, for example?
If "pruning" means physical removal from the plan tree, then it's
probably all right. However, it looks to me like that doesn't
actually happen, or at least doesn't happen till much later, so
there's room for worry about a disconnect between what plancache.c
has verified and what executor startup will try to touch. As you
say, in the absence of any bugs, that's not a problem ... but if
there are such bugs, tracking them down would be really hard.
What I am skeptical about is that this work actually accomplishes
anything under real-world conditions. That's because if pruning would
save enough to make skipping the lock-acquisition phase worth the
trouble, the plan cache is almost certainly going to decide it should
be using a custom plan not a generic plan. Now if we had a better
cost model (or, indeed, any model at all) for run-time pruning effects
then maybe that situation could be improved. I think we'd be better
served to worry about that end of it before we spend more time making
the executor even less predictable.
Also, while I've not spent much time at all reading this patch,
it seems rather desperately undercommented, and a lot of the
new names are unintelligible. In particular, I suspect that the
patch is significantly redesigning when/where run-time pruning
happens (unless it's just letting that be run twice); but I don't
see any documentation or name changes suggesting where that
responsibility is now.
regards, tom lane
On Mon, Mar 14, 2022 at 3:38 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
... like EXPLAIN, for example?
Exactly! I think that's the foremost example, but extension modules
like auto_explain or even third-party extensions are also a risk. I
think there was some discussion of this previously.
If "pruning" means physical removal from the plan tree, then it's
probably all right. However, it looks to me like that doesn't
actually happen, or at least doesn't happen till much later, so
there's room for worry about a disconnect between what plancache.c
has verified and what executor startup will try to touch. As you
say, in the absence of any bugs, that's not a problem ... but if
there are such bugs, tracking them down would be really hard.
Surgery on the plan would violate the general principle that plans are
read-only once constructed. I think the idea ought to be to pass a
secondary data structure around with the plan that defines which parts
you must ignore. Any code that fails to use that other data structure
in the appropriate manner gets defined to be buggy and has to be fixed
by making it follow the new rules.
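To make that concrete, here is a minimal sketch of what such a side
data structure and its use could look like; the names here
(PlanPruneResult, subplan_is_valid) are invented for illustration, not
taken from the patch:

typedef struct PlanPruneResult
{
    int         num_nodes;      /* number of nodes in the plan tree */

    /*
     * Surviving child-subplan indexes for each plan node, indexed by
     * plan_node_id; NULL means the node did no early pruning and all
     * of its subplans remain valid.
     */
    Bitmapset **validsubplans;
} PlanPruneResult;

/*
 * Code walking the plan tree, EXPLAIN included, would consult the
 * structure instead of assuming every listed subplan is valid.
 */
static bool
subplan_is_valid(PlanPruneResult *result, Plan *parent, int subplan_index)
{
    Bitmapset  *valid = result->validsubplans[parent->plan_node_id];

    return valid == NULL || bms_is_member(subplan_index, valid);
}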
What I am skeptical about is that this work actually accomplishes
anything under real-world conditions. That's because if pruning would
save enough to make skipping the lock-acquisition phase worth the
trouble, the plan cache is almost certainly going to decide it should
be using a custom plan not a generic plan. Now if we had a better
cost model (or, indeed, any model at all) for run-time pruning effects
then maybe that situation could be improved. I think we'd be better
served to worry about that end of it before we spend more time making
the executor even less predictable.
I don't agree with that analysis, because setting plan_cache_mode is
not uncommon. Even if that GUC didn't exist, I'm pretty sure there are
cases where the planner naturally falls into a generic plan anyway,
even though pruning is happening. But as it is, the GUC does exist,
and people use it. Consequently, while I'd love to see something done
about the costing side of things, I do not accept that all other
improvements should wait for that to happen.
Also, while I've not spent much time at all reading this patch,
it seems rather desperately undercommented, and a lot of the
new names are unintelligible. In particular, I suspect that the
patch is significantly redesigning when/where run-time pruning
happens (unless it's just letting that be run twice); but I don't
see any documentation or name changes suggesting where that
responsibility is now.
I am sympathetic to that concern. I spent a while staring at a
baffling comment in 0001 only to discover it had just been moved from
elsewhere. I really don't feel that things in this are as clear as
they could be -- although I hasten to add that I respect the people
who have done work in this area previously and am grateful for what
they did. It's been a huge benefit to the project in spite of the
bumps in the road. Moreover, this isn't the only code in PostgreSQL
that needs improvement, or the worst. That said, I do think there are
problems. I don't yet have a position on whether this patch is making
that better or worse.
That said, I believe that the core idea of the patch is to optionally
perform pruning before we acquire locks or spin up the main executor
and then remember the decisions we made. If, by the time the main
executor is spun up, we have already made those decisions, then we must
stick with what we decided. If not, we make those pruning decisions at
the same point
we do currently - more or less on demand, at the point when we'd need
to know whether to descend that branch of the plan tree or not. I
think this scheme comes about because there are a couple of different
interfaces to the parameterized query stuff, and in some code paths we
have the values early enough to use them for pre-pruning, and in
others we don't.
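As a rough sketch of that flow, again with invented helper names
(PlanPruneResult is the structure sketched above; only PlannedStmt and
ParamListInfo are existing types here):

static PlanPruneResult *
prepare_plan_for_execution(PlannedStmt *plannedstmt, ParamListInfo params)
{
    PlanPruneResult *prune_result = NULL;

    if (plan_contains_initial_pruning(plannedstmt))
    {
        /* We have the parameter values early: prune now and remember. */
        prune_result = perform_initial_pruning(plannedstmt, params);
    }

    /* Lock only the relations that survive, per prune_result. */
    lock_surviving_relations(plannedstmt, prune_result);

    /*
     * Executor startup must later consult prune_result, if set, and
     * stick with its decisions; if NULL, it prunes on demand as today.
     */
    return prune_result;
}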
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Mar 15, 2022 at 5:06 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Mar 14, 2022 at 3:38 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
What I am skeptical about is that this work actually accomplishes
anything under real-world conditions. That's because if pruning would
save enough to make skipping the lock-acquisition phase worth the
trouble, the plan cache is almost certainly going to decide it should
be using a custom plan not a generic plan. Now if we had a better
cost model (or, indeed, any model at all) for run-time pruning effects
then maybe that situation could be improved. I think we'd be better
served to worry about that end of it before we spend more time making
the executor even less predictable.
I don't agree with that analysis, because setting plan_cache_mode is
not uncommon. Even if that GUC didn't exist, I'm pretty sure there are
cases where the planner naturally falls into a generic plan anyway,
even though pruning is happening. But as it is, the GUC does exist,
and people use it. Consequently, while I'd love to see something done
about the costing side of things, I do not accept that all other
improvements should wait for that to happen.
I agree that making generic plans execute faster has merit even before
we make the costing changes to allow plancache.c to prefer generic plans
over custom ones in these cases. As the numbers in my previous email
show, simply executing a generic plan with the proposed improvements
applied is significantly cheaper than having the planner do the
pruning on every execution:
nparts auto/custom (tps) generic (tps)
====== ================= =============
32 13359 28204
64 15760 26795
128 15825 26387
256 15017 25601
512 13479 19911
1024 13200 20158
2048 12884 16180
Also, while I've not spent much time at all reading this patch,
it seems rather desperately undercommented, and a lot of the
new names are unintelligible. In particular, I suspect that the
patch is significantly redesigning when/where run-time pruning
happens (unless it's just letting that be run twice); but I don't
see any documentation or name changes suggesting where that
responsibility is now.
I am sympathetic to that concern. I spent a while staring at a
baffling comment in 0001 only to discover it had just been moved from
elsewhere. I really don't feel that things in this are as clear as
they could be -- although I hasten to add that I respect the people
who have done work in this area previously and am grateful for what
they did. It's been a huge benefit to the project in spite of the
bumps in the road. Moreover, this isn't the only code in PostgreSQL
that needs improvement, or the worst. That said, I do think there are
problems. I don't yet have a position on whether this patch is making
that better or worse.
Okay, I'd like to post a new version with the comments edited to make
them a bit more intelligible. I understand that the comments around
the new invocation mode(s) of runtime pruning are not as clear as they
should be, especially as the changes that this patch wants to make to
how things work are not very localized.
That said, I believe that the core idea of the patch is to optionally
perform pruning before we acquire locks or spin up the main executor
and then remember the decisions we made. If, by the time the main
executor is spun up, we have already made those decisions, then we must
stick with what we decided. If not, we make those pruning decisions at
the same point
we do currently
Right. The "initial" pruning, that this patch wants to make occur at
an earlier point (plancache.c), is currently performed in
ExecInit[Merge]Append().
If it does occur early due to the plan being a cached one,
ExecInit[Merge]Append() simply refers to its result that would be made
available via a new data structure that plancache.c has been made to
pass down to the executor alongside the plan tree.
If it does not, ExecInit[Merge]Append() does the pruning in the same
way it does now. Such cases include initial pruning using only STABLE
expressions that the planner doesn't bother to compute by itself lest
the resulting plan may be cached, but no EXTERN parameters.
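To illustrate, the subplan selection in ExecInitAppend() might then
look roughly like this; get_plan_initial_pruning_output() is an
invented name for however the early result gets looked up, while the
other two functions are the executor's existing pruning entry points:

    /* Sketch of subplan selection in ExecInitAppend() */
    if (node->part_prune_info != NULL)
    {
        /* Did plancache.c already perform the initial pruning? */
        validsubplans = get_plan_initial_pruning_output(estate,
                                                        node->plan.plan_node_id);
        if (validsubplans == NULL)
        {
            /* No; perform it here, the same way as before. */
            prunestate = ExecCreatePartitionPruneState(&appendstate->ps,
                                                       node->part_prune_info);
            validsubplans = ExecFindInitialMatchingSubPlans(prunestate,
                                                            list_length(node->appendplans));
        }
    }
    else
    {
        /* No pruning; every subplan remains valid. */
        validsubplans = bms_add_range(NULL, 0,
                                      list_length(node->appendplans) - 1);
    }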
--
Amit Langote
EDB: http://www.enterprisedb.com
On Tue, Mar 15, 2022 at 3:19 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Tue, Mar 15, 2022 at 5:06 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Mar 14, 2022 at 3:38 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Also, while I've not spent much time at all reading this patch,
it seems rather desperately undercommented, and a lot of the
new names are unintelligible. In particular, I suspect that the
patch is significantly redesigning when/where run-time pruning
happens (unless it's just letting that be run twice); but I don't
see any documentation or name changes suggesting where that
responsibility is now.
I am sympathetic to that concern. I spent a while staring at a
baffling comment in 0001 only to discover it had just been moved from
elsewhere. I really don't feel that things in this are as clear as
they could be -- although I hasten to add that I respect the people
who have done work in this area previously and am grateful for what
they did. It's been a huge benefit to the project in spite of the
bumps in the road. Moreover, this isn't the only code in PostgreSQL
that needs improvement, or the worst. That said, I do think there are
problems. I don't yet have a position on whether this patch is making
that better or worse.
Okay, I'd like to post a new version with the comments edited to make
them a bit more intelligible. I understand that the comments around
the new invocation mode(s) of runtime pruning are not as clear as they
should be, especially as the changes that this patch wants to make to
how things work are not very localized.
Actually, another area where the comments may not be as clear as they
should have been is the changes that the patch makes to the
AcquireExecutorLocks() logic that decides which relations are locked
to safeguard the plan tree for execution, which are those given by
RTE_RELATION entries in the range table.
Without the patch, they are found by actually scanning the range table.
With the patch, it's the same set of RTEs if the plan doesn't contain
any pruning nodes, though instead of the range table, what is scanned
is a bitmapset of their RT indexes that is made available by the
planner in the form of PlannedStmt.lockrels. When the plan does
contain a pruning node (PlannedStmt.containsInitialPruning), the
bitmapset is constructed by calling ExecutorGetLockRels() on the plan
tree, which walks it to add RT indexes of relations mentioned in the
Scan nodes, while skipping any nodes that are pruned after performing
initial pruning steps that may be present in their containing parent
node's PartitionPruneInfo. The RT indexes of partitioned tables that
are present in the PartitionPruneInfo itself are also added to the set.
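In rough outline, the handling of an Append node during that walk goes
something like the following sketch; all names here are invented
stand-ins for the idea (ExecutorGetLockRels() would drive this
recursion), not the patch's actual code:

static bool
collect_append_lock_rels(Append *aplan, LockRelsContext *context)
{
    Bitmapset  *validsubplans;
    int         i;

    if (aplan->part_prune_info != NULL)
    {
        /*
         * Lock the partitioned tables that drive the pruning, then run
         * the initial pruning steps against the EXTERN parameters.
         */
        validsubplans = perform_initial_pruning_steps(aplan->part_prune_info,
                                                      context);
    }
    else
        validsubplans = bms_add_range(NULL, 0,
                                      list_length(aplan->appendplans) - 1);

    /* Recurse only into the surviving subplans. */
    i = -1;
    while ((i = bms_next_member(validsubplans, i)) >= 0)
        (void) collect_plan_lock_rels((Plan *) list_nth(aplan->appendplans, i),
                                      context);

    return true;
}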
While expanding comments added by the patch to make this clear, I
realized that there are two problems, one of them quite glaring:
* The planner's construction of this bitmapset, and the copying of it
along with the PlannedStmt, is pure overhead in cases that this patch
has nothing to do with, which is the kind of thing that Andres
cautioned against upthread.
* Not all partitioned tables that would have been locked without the
patch to come up with an Append/MergeAppend plan may be returned by
ExecutorGetLockRels(). For example, none of the query's
runtime-prunable quals may have been found to match the partition key
of an intermediate partitioned table, in which case that table is not
included in the PartitionPruneInfo. Or an Append/MergeAppend covering
a partition tree may not contain any PartitionPruneInfo to begin with,
in which case only the leaf partitions, and none of the partitioned
parents, would be accounted for by the ExecutorGetLockRels() logic.
The first one seems easy to fix by not inventing PlannedStmt.lockrels
and just doing what's being done now: scan the range table if
(!PlannedStmt.containsInitialPruning).
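For instance, with that fix, the relevant part of
AcquireExecutorLocks() could look roughly like the following sketch,
which assumes ExecutorGetLockRels() returns a struct with a lockrels
bitmapset of RT indexes to lock, and that the EXTERN parameter values
are available in boundParams:

if (!plannedstmt->containsInitialPruning)
{
    ListCell   *lc;

    /* No initial pruning anywhere in the plan: lock the whole range table. */
    foreach(lc, plannedstmt->rtable)
    {
        RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);

        if (rte->rtekind == RTE_RELATION)
            LockRelationOid(rte->relid, rte->rellockmode);
    }
}
else
{
    /* Walk the plan, pruning as we go, and lock only the survivors. */
    ExecLockRelsInfo *info = ExecutorGetLockRels(plannedstmt, boundParams);
    int         rtindex = -1;

    while ((rtindex = bms_next_member(info->lockrels, rtindex)) >= 0)
    {
        RangeTblEntry *rte = rt_fetch(rtindex, plannedstmt->rtable);

        LockRelationOid(rte->relid, rte->rellockmode);
    }
}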
Perhaps the only way to fix the second one is to reconsider the
decision we made in the following commit:
commit 52ed730d511b7b1147f2851a7295ef1fb5273776
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sun Oct 7 14:33:17 2018 -0400
Remove some unnecessary fields from Plan trees.
In the wake of commit f2343653f, we no longer need some fields that
were used before to control executor lock acquisitions:
* PlannedStmt.nonleafResultRelations can go away entirely.
* partitioned_rels can go away from Append, MergeAppend, and ModifyTable.
However, ModifyTable still needs to know the RT index of the partition
root table if any, which was formerly kept in the first entry of that
list. Add a new field "rootRelation" to remember that. rootRelation is
partly redundant with nominalRelation, in that if it's set it will have
the same value as nominalRelation. However, the latter field has a
different purpose so it seems best to keep them distinct.
That is, add back the partitioned_rels field, at least to Append and
MergeAppend, to store the RT indexes of partitioned tables whose
children's paths are present in Append/MergeAppend.subpaths. (For
example, for a two-level tree where root p has child p1 and p1 has the
leaf partitions, an Append over the leaves would record the RT indexes
of both p and p1.)
Thoughts?
--
Amit Langote
EDB: http://www.enterprisedb.com
On Tue, Mar 22, 2022 at 9:44 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Tue, Mar 15, 2022 at 3:19 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Tue, Mar 15, 2022 at 5:06 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Mar 14, 2022 at 3:38 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Also, while I've not spent much time at all reading this patch,
it seems rather desperately undercommented, and a lot of the
new names are unintelligible. In particular, I suspect that the
patch is significantly redesigning when/where run-time pruning
happens (unless it's just letting that be run twice); but I don't
see any documentation or name changes suggesting where that
responsibility is now.
I am sympathetic to that concern. I spent a while staring at a
baffling comment in 0001 only to discover it had just been moved from
elsewhere. I really don't feel that things in this are as clear as
they could be -- although I hasten to add that I respect the people
who have done work in this area previously and am grateful for what
they did. It's been a huge benefit to the project in spite of the
bumps in the road. Moreover, this isn't the only code in PostgreSQL
that needs improvement, or the worst. That said, I do think there are
problems. I don't yet have a position on whether this patch is making
that better or worse.
Okay, I'd like to post a new version with the comments edited to make
them a bit more intelligible. I understand that the comments around
the new invocation mode(s) of runtime pruning are not as clear as they
should be, especially as the changes that this patch wants to make to
how things work are not very localized.
Actually, another area where the comments may not be as clear as they
should have been is the changes that the patch makes to the
AcquireExecutorLocks() logic that decides which relations are locked
to safeguard the plan tree for execution, which are those given by
RTE_RELATION entries in the range table.
Without the patch, they are found by actually scanning the range table.
With the patch, it's the same set of RTEs if the plan doesn't contain
any pruning nodes, though instead of the range table, what is scanned
is a bitmapset of their RT indexes that is made available by the
planner in the form of PlannedStmt.lockrels. When the plan does
contain a pruning node (PlannedStmt.containsInitialPruning), the
bitmapset is constructed by calling ExecutorGetLockRels() on the plan
tree, which walks it to add RT indexes of relations mentioned in the
Scan nodes, while skipping any nodes that are pruned after performing
initial pruning steps that may be present in their containing parent
node's PartitionPruneInfo. The RT indexes of partitioned tables that
are present in the PartitionPruneInfo itself are also added to the set.
While expanding comments added by the patch to make this clear, I
realized that there are two problems, one of them quite glaring:
* The planner's construction of this bitmapset, and the copying of it
along with the PlannedStmt, is pure overhead in cases that this patch
has nothing to do with, which is the kind of thing that Andres
cautioned against upthread.
* Not all partitioned tables that would have been locked without the
patch to come up with an Append/MergeAppend plan may be returned by
ExecutorGetLockRels(). For example, none of the query's
runtime-prunable quals may have been found to match the partition key
of an intermediate partitioned table, in which case that table is not
included in the PartitionPruneInfo. Or an Append/MergeAppend covering
a partition tree may not contain any PartitionPruneInfo to begin with,
in which case only the leaf partitions, and none of the partitioned
parents, would be accounted for by the ExecutorGetLockRels() logic.
The first one seems easy to fix by not inventing PlannedStmt.lockrels
and just doing what's being done now: scan the range table if
(!PlannedStmt.containsInitialPruning).
The attached updated patch does it like this.
Perhaps the only way to fix the second one is to reconsider the
decision we made in the following commit:
commit 52ed730d511b7b1147f2851a7295ef1fb5273776
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sun Oct 7 14:33:17 2018 -0400
Remove some unnecessary fields from Plan trees.
In the wake of commit f2343653f, we no longer need some fields that
were used before to control executor lock acquisitions:
* PlannedStmt.nonleafResultRelations can go away entirely.
* partitioned_rels can go away from Append, MergeAppend, and ModifyTable.
However, ModifyTable still needs to know the RT index of the partition
root table if any, which was formerly kept in the first entry of that
list. Add a new field "rootRelation" to remember that. rootRelation is
partly redundant with nominalRelation, in that if it's set it will have
the same value as nominalRelation. However, the latter field has a
different purpose so it seems best to keep them distinct.
That is, add back the partitioned_rels field, at least to Append and
MergeAppend, to store the RT indexes of partitioned tables whose
children's paths are present in Append/MergeAppend.subpaths.
And I have implemented this in the attached 0002, which reintroduces
partitioned_rels in Append/MergeAppend nodes as a bitmapset of RT
indexes. The set contains the RT indexes of the partitioned ancestors
whose expansion produced the leaf partitions that a given
Append/MergeAppend node scans. This project needs a way of knowing
the partitioned tables involved in producing an Append/MergeAppend
node, because we'd like to give plancache.c the ability to glean the
set of relations to be locked, when making a plan tree ready for
execution, by scanning the plan tree itself rather than the range
table; the only relations currently missing from the tree are the
partitioned tables.
One fly in the ointment I faced when doing that is the fact that
setrefs.c, in most situations, removes an Append/MergeAppend from the
final plan if it contains only one child subplan. I got around that
by inventing a PlannerGlobal/PlannedStmt.elidedAppendPartedRels set,
which is the union of the partitioned_rels of all the
Append/MergeAppend nodes that were removed from the plan tree as
described.
Other than the changes mentioned above, the updated patch now contains
a bit more commentary than earlier versions, mostly around
AcquireExecutorLocks()'s new way of determining the set of relations
to lock and the significantly redesigned working of the "initial"
execution pruning.
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v6-0003-Add-a-plan_tree_walker.patchapplication/x-patch; name=v6-0003-Add-a-plan_tree_walker.patchDownload
From 47a00a6b8cf695e5890fc6555e2df2980eb2115b Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Thu, 3 Mar 2022 16:04:13 +0900
Subject: [PATCH v6 3/4] Add a plan_tree_walker()
Like planstate_tree_walker() but for uninitialized plan trees.
---
src/backend/nodes/nodeFuncs.c | 116 ++++++++++++++++++++++++++++++++++
src/include/nodes/nodeFuncs.h | 3 +
2 files changed, 119 insertions(+)
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index ec25aae6e3..c16f9c6b40 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -31,6 +31,10 @@ static bool planstate_walk_subplans(List *plans, bool (*walker) (),
void *context);
static bool planstate_walk_members(PlanState **planstates, int nplans,
bool (*walker) (), void *context);
+static bool plan_walk_subplans(List *plans,
+ bool (*walker) (),
+ void *context);
+static bool plan_walk_members(List *plans, bool (*walker) (), void *context);
/*
@@ -4150,3 +4154,115 @@ planstate_walk_members(PlanState **planstates, int nplans,
return false;
}
+
+/*
+ * plan_tree_walker --- walk plantrees
+ *
+ * The walker has already visited the current node, and so we need only
+ * recurse into any sub-nodes it has.
+ */
+bool
+plan_tree_walker(Plan *plan,
+ bool (*walker) (),
+ void *context)
+{
+ /* Guard against stack overflow due to overly complex plan trees */
+ check_stack_depth();
+
+ /* initPlan-s */
+ if (plan_walk_subplans(plan->initPlan, walker, context))
+ return true;
+
+ /* lefttree */
+ if (outerPlan(plan))
+ {
+ if (walker(outerPlan(plan), context))
+ return true;
+ }
+
+ /* righttree */
+ if (innerPlan(plan))
+ {
+ if (walker(innerPlan(plan), context))
+ return true;
+ }
+
+ /* special child plans */
+ switch (nodeTag(plan))
+ {
+ case T_Append:
+ if (plan_walk_members(((Append *) plan)->appendplans,
+ walker, context))
+ return true;
+ break;
+ case T_MergeAppend:
+ if (plan_walk_members(((MergeAppend *) plan)->mergeplans,
+ walker, context))
+ return true;
+ break;
+ case T_BitmapAnd:
+ if (plan_walk_members(((BitmapAnd *) plan)->bitmapplans,
+ walker, context))
+ return true;
+ break;
+ case T_BitmapOr:
+ if (plan_walk_members(((BitmapOr *) plan)->bitmapplans,
+ walker, context))
+ return true;
+ break;
+ case T_CustomScan:
+ if (plan_walk_members(((CustomScan *) plan)->custom_plans,
+ walker, context))
+ return true;
+ break;
+ case T_SubqueryScan:
+ if (walker(((SubqueryScan *) plan)->subplan, context))
+ return true;
+ break;
+ default:
+ break;
+ }
+
+ return false;
+}
+
+/*
+ * Walk a list of SubPlans (or initPlans, which also use SubPlan nodes).
+ */
+static bool
+plan_walk_subplans(List *plans,
+ bool (*walker) (),
+ void *context)
+{
+ ListCell *lc;
+ PlannedStmt *plannedstmt = (PlannedStmt *) context;
+
+ foreach(lc, plans)
+ {
+ SubPlan *sp = lfirst_node(SubPlan, lc);
+ Plan *p = list_nth(plannedstmt->subplans, sp->plan_id - 1);
+
+ if (walker(p, context))
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * Walk the constituent plans of a ModifyTable, Append, MergeAppend,
+ * BitmapAnd, or BitmapOr node.
+ */
+static bool
+plan_walk_members(List *plans, bool (*walker) (), void *context)
+{
+ ListCell *lc;
+
+ foreach(lc, plans)
+ {
+ if (walker(lfirst(lc), context))
+ return true;
+ }
+
+ return false;
+}
diff --git a/src/include/nodes/nodeFuncs.h b/src/include/nodes/nodeFuncs.h
index 93c60bde66..fca107ad65 100644
--- a/src/include/nodes/nodeFuncs.h
+++ b/src/include/nodes/nodeFuncs.h
@@ -158,5 +158,8 @@ extern bool raw_expression_tree_walker(Node *node, bool (*walker) (),
struct PlanState;
extern bool planstate_tree_walker(struct PlanState *planstate, bool (*walker) (),
void *context);
+struct Plan;
+extern bool plan_tree_walker(struct Plan *plan, bool (*walker) (),
+ void *context);
#endif /* NODEFUNCS_H */
--
2.24.1
v6-0002-Add-Merge-Append.partitioned_rels.patchapplication/x-patch; name=v6-0002-Add-Merge-Append.partitioned_rels.patchDownload
From 8c81237402922ebf82786f3ff34972a6a3cb8c03 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Thu, 24 Mar 2022 22:47:03 +0900
Subject: [PATCH v6 2/4] Add [Merge]Append.partitioned_rels
To record the RT indexes of all partitioned ancestors leading up to
leaf partitions that are appended by the node.
If a given [Merge]Append node is left out from the plan due to there
being only one element in its list of child subplans, then its
partitioned_rels set is added to PlannerGlobal.elidedAppendPartedRels
that is passed down to the executor through PlannedStmt.
There are no users for partitioned_rels and elidedAppendPartedRels
as of this commit, though a later commit will require the ability
to extract the set of relations that must be locked to make a plan
tree safe for execution by walking the plan tree itself, so having
the partitioned tables be also present in the plan tree will be
helpful. Note that currently the executor relies on the fact that
the set of relations to be locked can be obtained by simply scanning
the range table that's made available in PlannedStmt along with the
plan tree.
---
src/backend/nodes/copyfuncs.c | 3 +++
src/backend/nodes/outfuncs.c | 5 +++++
src/backend/nodes/readfuncs.c | 3 +++
src/backend/optimizer/path/joinrels.c | 9 ++++++++
src/backend/optimizer/plan/createplan.c | 18 +++++++++++++++-
src/backend/optimizer/plan/planner.c | 8 +++++++
src/backend/optimizer/plan/setrefs.c | 28 +++++++++++++++++++++++++
src/backend/optimizer/util/inherit.c | 16 ++++++++++++++
src/backend/optimizer/util/relnode.c | 20 ++++++++++++++++++
src/include/nodes/pathnodes.h | 22 +++++++++++++++++++
src/include/nodes/plannodes.h | 17 +++++++++++++++
11 files changed, 148 insertions(+), 1 deletion(-)
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 55f720a88f..dc68a12486 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -106,6 +106,7 @@ _copyPlannedStmt(const PlannedStmt *from)
COPY_NODE_FIELD(invalItems);
COPY_NODE_FIELD(paramExecTypes);
COPY_NODE_FIELD(utilityStmt);
+ COPY_BITMAPSET_FIELD(elidedAppendPartedRels);
COPY_LOCATION_FIELD(stmt_location);
COPY_SCALAR_FIELD(stmt_len);
@@ -253,6 +254,7 @@ _copyAppend(const Append *from)
COPY_SCALAR_FIELD(nasyncplans);
COPY_SCALAR_FIELD(first_partial_plan);
COPY_NODE_FIELD(part_prune_info);
+ COPY_BITMAPSET_FIELD(partitioned_rels);
return newnode;
}
@@ -281,6 +283,7 @@ _copyMergeAppend(const MergeAppend *from)
COPY_POINTER_FIELD(collations, from->numCols * sizeof(Oid));
COPY_POINTER_FIELD(nullsFirst, from->numCols * sizeof(bool));
COPY_NODE_FIELD(part_prune_info);
+ COPY_BITMAPSET_FIELD(partitioned_rels);
return newnode;
}
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 6bdad462c7..bc178d53bf 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -324,6 +324,7 @@ _outPlannedStmt(StringInfo str, const PlannedStmt *node)
WRITE_NODE_FIELD(invalItems);
WRITE_NODE_FIELD(paramExecTypes);
WRITE_NODE_FIELD(utilityStmt);
+ WRITE_BITMAPSET_FIELD(elidedAppendPartedRels);
WRITE_LOCATION_FIELD(stmt_location);
WRITE_INT_FIELD(stmt_len);
}
@@ -443,6 +444,7 @@ _outAppend(StringInfo str, const Append *node)
WRITE_INT_FIELD(nasyncplans);
WRITE_INT_FIELD(first_partial_plan);
WRITE_NODE_FIELD(part_prune_info);
+ WRITE_BITMAPSET_FIELD(partitioned_rels);
}
static void
@@ -460,6 +462,7 @@ _outMergeAppend(StringInfo str, const MergeAppend *node)
WRITE_OID_ARRAY(collations, node->numCols);
WRITE_BOOL_ARRAY(nullsFirst, node->numCols);
WRITE_NODE_FIELD(part_prune_info);
+ WRITE_BITMAPSET_FIELD(partitioned_rels);
}
static void
@@ -2288,6 +2291,7 @@ _outPlannerGlobal(StringInfo str, const PlannerGlobal *node)
WRITE_BOOL_FIELD(parallelModeOK);
WRITE_BOOL_FIELD(parallelModeNeeded);
WRITE_CHAR_FIELD(maxParallelHazard);
+ WRITE_BITMAPSET_FIELD(elidedAppendPartedRels);
}
static void
@@ -2399,6 +2403,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_BOOL_FIELD(partbounds_merged);
WRITE_BITMAPSET_FIELD(live_parts);
WRITE_BITMAPSET_FIELD(all_partrels);
+ WRITE_BITMAPSET_FIELD(partitioned_rels);
}
static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 3f68f7c18d..3c673c42d5 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1597,6 +1597,7 @@ _readPlannedStmt(void)
READ_NODE_FIELD(invalItems);
READ_NODE_FIELD(paramExecTypes);
READ_NODE_FIELD(utilityStmt);
+ READ_BITMAPSET_FIELD(elidedAppendPartedRels);
READ_LOCATION_FIELD(stmt_location);
READ_INT_FIELD(stmt_len);
@@ -1719,6 +1720,7 @@ _readAppend(void)
READ_INT_FIELD(nasyncplans);
READ_INT_FIELD(first_partial_plan);
READ_NODE_FIELD(part_prune_info);
+ READ_BITMAPSET_FIELD(partitioned_rels);
READ_DONE();
}
@@ -1741,6 +1743,7 @@ _readMergeAppend(void)
READ_OID_ARRAY(collations, local_node->numCols);
READ_BOOL_ARRAY(nullsFirst, local_node->numCols);
READ_NODE_FIELD(part_prune_info);
+ READ_BITMAPSET_FIELD(partitioned_rels);
READ_DONE();
}
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 9da3ff2f9a..e74d40fee3 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1549,6 +1549,15 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
populate_joinrel_with_paths(root, child_rel1, child_rel2,
child_joinrel, child_sjinfo,
child_restrictlist);
+
+ /*
+ * A parent relation's partitioned_rels must be a superset of the sets
+ * of all its children, direct or indirect, so bubble up the child
+ * joinrel's set.
+ */
+ joinrel->partitioned_rels =
+ bms_add_members(joinrel->partitioned_rels,
+ child_joinrel->partitioned_rels);
}
}
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index fa069a217c..0026086591 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -26,10 +26,12 @@
#include "nodes/extensible.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
+#include "optimizer/appendinfo.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/optimizer.h"
#include "optimizer/paramassign.h"
+#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
#include "optimizer/plancat.h"
@@ -1331,11 +1333,11 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
best_path->subpaths,
prunequal);
}
-
plan->appendplans = subplans;
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
plan->part_prune_info = partpruneinfo;
+ plan->partitioned_rels = bms_copy(rel->partitioned_rels);
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -1499,6 +1501,20 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
node->mergeplans = subplans;
node->part_prune_info = partpruneinfo;
+ /*
+ * We need to explicitly add to the plan node the RT indexes of any
+ * partitioned tables whose partitions will be scanned by the nodes in
+ * 'subplans'. There can be multiple RT indexes in the set due to the
+ * partition tree being multi-level and/or this being a plan for UNION ALL
+ * over multiple partition trees. Along with scanrelids of leaf-level Scan
+ * nodes, this allows the executor to lock the full set of relations being
+ * scanned by this node.
+ *
+ * Note that 'apprelids' only contains the top-level base relation(s), so
+ * is not sufficient for the purpose.
+ */
+ node->partitioned_rels = bms_copy(rel->partitioned_rels);
+
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
* produce either the exact tlist or a narrow tlist, we should get rid of
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index bd09f85aea..374a9d9753 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -529,6 +529,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->paramExecTypes = glob->paramExecTypes;
/* utilityStmt should be null, but we might as well copy it */
result->utilityStmt = parse->utilityStmt;
+ result->elidedAppendPartedRels = glob->elidedAppendPartedRels;
result->stmt_location = parse->stmt_location;
result->stmt_len = parse->stmt_len;
@@ -7365,6 +7366,13 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
add_paths_to_append_rel(root, grouped_rel, grouped_live_children);
}
+
+ /*
+ * Input rel might be a partitioned appendrel, though grouped_rel has at
+ * this point taken over its role as the appendrel owning the former's
+ * children, so copy the former's partitioned_rels set into the latter.
+ */
+ grouped_rel->partitioned_rels = bms_copy(input_rel->partitioned_rels);
}
/*
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index a7b11b7f03..dbdeb8ec9d 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1512,6 +1512,10 @@ set_append_references(PlannerInfo *root,
lfirst(l) = set_plan_refs(root, (Plan *) lfirst(l), rtoffset);
}
+ /* Fix up partitioned_rels before possibly removing the Append below. */
+ aplan->partitioned_rels = offset_relid_set(aplan->partitioned_rels,
+ rtoffset);
+
/*
* See if it's safe to get rid of the Append entirely. For this to be
* safe, there must be only one child plan and that child plan's parallel
@@ -1522,8 +1526,17 @@ set_append_references(PlannerInfo *root,
*/
if (list_length(aplan->appendplans) == 1 &&
((Plan *) linitial(aplan->appendplans))->parallel_aware == aplan->plan.parallel_aware)
+ {
+ /*
+ * Partitioned tables involved, if any, must be made known to the
+ * executor.
+ */
+ root->glob->elidedAppendPartedRels =
+ bms_add_members(root->glob->elidedAppendPartedRels,
+ aplan->partitioned_rels);
return clean_up_removed_plan_level((Plan *) aplan,
(Plan *) linitial(aplan->appendplans));
+ }
/*
* Otherwise, clean up the Append as needed. It's okay to do this after
@@ -1584,6 +1597,12 @@ set_mergeappend_references(PlannerInfo *root,
lfirst(l) = set_plan_refs(root, (Plan *) lfirst(l), rtoffset);
}
+ /*
+ * Fix up partitioned_rels before possibly removing the MergeAppend below.
+ */
+ mplan->partitioned_rels = offset_relid_set(mplan->partitioned_rels,
+ rtoffset);
+
/*
* See if it's safe to get rid of the MergeAppend entirely. For this to
* be safe, there must be only one child plan and that child plan's
@@ -1594,8 +1613,17 @@ set_mergeappend_references(PlannerInfo *root,
*/
if (list_length(mplan->mergeplans) == 1 &&
((Plan *) linitial(mplan->mergeplans))->parallel_aware == mplan->plan.parallel_aware)
+ {
+ /*
+ * Partitioned tables involved, if any, must be made known to the
+ * executor.
+ */
+ root->glob->elidedAppendPartedRels =
+ bms_add_members(root->glob->elidedAppendPartedRels,
+ mplan->partitioned_rels);
return clean_up_removed_plan_level((Plan *) mplan,
(Plan *) linitial(mplan->mergeplans));
+ }
/*
* Otherwise, clean up the MergeAppend as needed. It's okay to do this
diff --git a/src/backend/optimizer/util/inherit.c b/src/backend/optimizer/util/inherit.c
index 7e134822f3..56912e4101 100644
--- a/src/backend/optimizer/util/inherit.c
+++ b/src/backend/optimizer/util/inherit.c
@@ -406,6 +406,14 @@ expand_partitioned_rtentry(PlannerInfo *root, RelOptInfo *relinfo,
childrte, childRTindex,
childrel, top_parentrc, lockmode);
+ /*
+ * A parent relation's partitioned_rels must be a superset of the sets
+ * of all its children, direct or indirect, so bubble up the child
+ * rel's set.
+ */
+ relinfo->partitioned_rels = bms_add_members(relinfo->partitioned_rels,
+ childrelinfo->partitioned_rels);
+
/* Close child relation, but keep locks */
table_close(childrel, NoLock);
}
@@ -737,6 +745,14 @@ expand_appendrel_subquery(PlannerInfo *root, RelOptInfo *rel,
/* Child may itself be an inherited rel, either table or subquery. */
if (childrte->inh)
expand_inherited_rtentry(root, childrel, childrte, childRTindex);
+
+ /*
+ * A parent relation's partitioned_rels must be a superset of the sets
+ * of all its children, direct or indirect, so bubble up the child
+ * rel's set.
+ */
+ rel->partitioned_rels = bms_add_members(rel->partitioned_rels,
+ childrel->partitioned_rels);
}
}
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 520409f4ba..1d082a8fdd 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -361,6 +361,10 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
}
}
+ /* A partitioned appendrel. */
+ if (rel->part_scheme != NULL)
+ rel->partitioned_rels = bms_copy(rel->relids);
+
/* Save the finished struct in the query's simple_rel_array */
root->simple_rel_array[relid] = rel;
@@ -729,6 +733,14 @@ build_join_rel(PlannerInfo *root,
set_joinrel_size_estimates(root, joinrel, outer_rel, inner_rel,
sjinfo, restrictlist);
+ /*
+ * The joinrel may get processed as an appendrel via partitionwise join
+ * if both outer and inner rels are partitioned, so set partitioned_rels
+ * appropriately.
+ */
+ joinrel->partitioned_rels = bms_union(outer_rel->partitioned_rels,
+ inner_rel->partitioned_rels);
+
/*
* Set the consider_parallel flag if this joinrel could potentially be
* scanned within a parallel worker. If this flag is false for either
@@ -897,6 +909,14 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
set_joinrel_size_estimates(root, joinrel, outer_rel, inner_rel,
sjinfo, restrictlist);
+ /*
+ * The joinrel may get processed as an appendrel via partitionwise join
+ * if both outer and inner rels are partitioned, so set partitioned_rels
+ * appropriately.
+ */
+ joinrel->partitioned_rels = bms_union(outer_rel->partitioned_rels,
+ inner_rel->partitioned_rels);
+
/* We build the join only once. */
Assert(!find_join_rel(root, joinrel->relids));
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 1f3845b3fe..5327d9ba8b 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -130,6 +130,11 @@ typedef struct PlannerGlobal
char maxParallelHazard; /* worst PROPARALLEL hazard level */
PartitionDirectory partition_directory; /* partition descriptors */
+
+ Bitmapset *elidedAppendPartedRels; /* Combined partitioned_rels of all
+ * single-subplan [Merge]Append nodes
+ * that have been removed from the
+ * various plan trees. */
} PlannerGlobal;
/* macro for fetching the Plan associated with a SubPlan node */
@@ -773,6 +778,23 @@ typedef struct RelOptInfo
Relids all_partrels; /* Relids set of all partition relids */
List **partexprs; /* Non-nullable partition key expressions */
List **nullable_partexprs; /* Nullable partition key expressions */
+
+ /*
+ * For an appendrel parent relation (base, join, or upper) that is
+ * partitioned, this stores the RT indexes of all the partitioned ancestors
+ * including itself that lead up to the individual leaf partitions that
+ * will be scanned to produce this relation's output rows. The relid set
+ * is copied into the resulting Append or MergeAppend plan node for
+ * allowing the executor to take appropriate locks on those relations,
+ * unless the node is deemed useless in setrefs.c due to having a single
+ * leaf subplan and thus elided from the final plan, in which case, the set
+ * is added into PlannerGlobal.elidedAppendPartedRels.
+ *
+ * Note that 'apprelids' of those nodes only contains the top-level base
+ * relation(s), so is not sufficient for said purpose.
+ */
+
+ Bitmapset *partitioned_rels;
} RelOptInfo;
/*
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 0b518ce6b2..bd87c35d6c 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -85,6 +85,11 @@ typedef struct PlannedStmt
Node *utilityStmt; /* non-null if this is utility stmt */
+ Bitmapset *elidedAppendPartedRels; /* Combined partitioned_rels of all
+ * single-subplan [Merge]Append nodes
+ * that have been removed from the
+ * various plan trees. */
+
/* statement location in source string (copied from Query) */
int stmt_location; /* start location, or -1 if unknown */
int stmt_len; /* length in bytes; 0 means "rest of string" */
@@ -261,6 +266,12 @@ typedef struct Append
/* Info for run-time subplan pruning; NULL if we're not doing that */
struct PartitionPruneInfo *part_prune_info;
+
+ /*
+ * RT indexes of all partitioned parents whose partitions' plans are
+ * present in appendplans.
+ */
+ Bitmapset *partitioned_rels;
} Append;
/* ----------------
@@ -281,6 +292,12 @@ typedef struct MergeAppend
bool *nullsFirst; /* NULLS FIRST/LAST directions */
/* Info for run-time subplan pruning; NULL if we're not doing that */
struct PartitionPruneInfo *part_prune_info;
+
+ /*
+ * RT indexes of all partitioned parents whose partitions' plans are
+ * present in appendplans.
+ */
+ Bitmapset *partitioned_rels;
} MergeAppend;
/* ----------------
--
2.24.1
v6-0004-Optimize-AcquireExecutorLocks-to-skip-pruned-part.patchapplication/x-patch; name=v6-0004-Optimize-AcquireExecutorLocks-to-skip-pruned-part.patchDownload
From 5e076f58274f6cd05afc8533af130e165c9b862e Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v6 4/4] Optimize AcquireExecutorLocks() to skip pruned
partitions
Instead of locking all relations listed in the range table in the
cases where the PlannedStmt indicates that some nodes in the plan
tree can do partition pruning without depending on execution having
started (so-called "initial" pruning), AcquireExecutorLocks() now
calls the new executor function ExecutorGetLockRels(), which returns
the set of relations (their RT indexes) to be locked, not including
those scanned by the subplans that get pruned.
The result of pruning done this way must be remembered and reused
during actual execution of the plan, which is done by creating a
PlanInitPruningOutput node for each plan node that undergoes
pruning; the set of those for the whole plan tree is added to
ExecLockRelsInfo, which also stores the bitmapset of RT indexes of
relations that are actually locked by AcquireExecutorLocks().
ExecLockRelsInfos are passed down to the executor alongside the
PlannedStmts. This arrangement ensures that the executor doesn't
accidentally try to process plan tree subnodes that have been
deemed pruned by AcquireExecutorLocks().
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 13 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 17 +-
src/backend/executor/README | 24 +++
src/backend/executor/execMain.c | 202 ++++++++++++++++++++
src/backend/executor/execParallel.c | 26 ++-
src/backend/executor/execPartition.c | 224 ++++++++++++++++++----
src/backend/executor/execUtils.c | 8 +
src/backend/executor/functions.c | 2 +-
src/backend/executor/nodeAppend.c | 52 ++++-
src/backend/executor/nodeMergeAppend.c | 52 ++++-
src/backend/executor/nodeModifyTable.c | 25 +++
src/backend/executor/spi.c | 14 +-
src/backend/nodes/copyfuncs.c | 49 ++++-
src/backend/nodes/outfuncs.c | 39 ++++
src/backend/nodes/readfuncs.c | 37 ++++
src/backend/optimizer/plan/planner.c | 2 +
src/backend/optimizer/plan/setrefs.c | 6 +
src/backend/partitioning/partprune.c | 37 +++-
src/backend/tcop/postgres.c | 15 +-
src/backend/tcop/pquery.c | 21 ++-
src/backend/utils/cache/plancache.c | 252 ++++++++++++++++++++++---
src/backend/utils/mmgr/portalmem.c | 2 +
src/include/commands/explain.h | 3 +-
src/include/executor/execPartition.h | 2 +
src/include/executor/execdesc.h | 2 +
src/include/executor/executor.h | 2 +
src/include/executor/nodeAppend.h | 1 +
src/include/executor/nodeMergeAppend.h | 1 +
src/include/executor/nodeModifyTable.h | 1 +
src/include/nodes/execnodes.h | 96 ++++++++++
src/include/nodes/nodes.h | 5 +
src/include/nodes/pathnodes.h | 4 +
src/include/nodes/plannodes.h | 15 ++
src/include/tcop/tcopprot.h | 2 +-
src/include/utils/plancache.h | 6 +
src/include/utils/portal.h | 5 +
41 files changed, 1174 insertions(+), 104 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 55c38b04c4..d403eb2309 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -542,7 +542,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 9abbb6b555..f6607f2454 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 9f632285b6..1f1a44b9bb 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -407,7 +407,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
@@ -515,7 +515,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, ExecLockRelsInfo *execlockrelsinfo,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
@@ -563,7 +564,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, execlockrelsinfo, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 1013790dbb..008b8ce0e9 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -741,8 +741,10 @@ execute_sql_string(const char *sql)
RawStmt *parsetree = lfirst_node(RawStmt, lc1);
MemoryContext per_parsetree_context,
oldcontext;
- List *stmt_list;
- ListCell *lc2;
+ List *stmt_list,
+ *execlockrelsinfo_list;
+ ListCell *lc2,
+ *lc3;
/*
* We do the work for each parsetree in a short-lived context, to
@@ -762,11 +764,13 @@ execute_sql_string(const char *sql)
NULL,
0,
NULL);
- stmt_list = pg_plan_queries(stmt_list, sql, CURSOR_OPT_PARALLEL_OK, NULL);
+ stmt_list = pg_plan_queries(stmt_list, sql, CURSOR_OPT_PARALLEL_OK, NULL,
+ &execlockrelsinfo_list);
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, execlockrelsinfo_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ ExecLockRelsInfo *execlockrelsinfo = lfirst_node(ExecLockRelsInfo, lc3);
CommandCounterIncrement();
@@ -777,6 +781,7 @@ execute_sql_string(const char *sql)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ execlockrelsinfo,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 05e7b60059..4ef44aaf23 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -416,7 +416,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 9902c5c566..85e73ddded 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -107,6 +107,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ list_make1(NULL), /* no ExecLockRelsInfo to pass */
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 80738547ed..bbbf8bbcbd 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -155,6 +155,7 @@ ExecuteQuery(ParseState *pstate,
PreparedStatement *entry;
CachedPlan *cplan;
List *plan_list;
+ List *plan_execlockrelsinfo_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
Portal portal;
@@ -195,6 +196,7 @@ ExecuteQuery(ParseState *pstate,
/* Replan if needed, and increment plan refcount for portal */
cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
plan_list = cplan->stmt_list;
+ plan_execlockrelsinfo_list = cplan->execlockrelsinfo_list;
/*
* DO NOT add any logic that could possibly throw an error between
@@ -204,7 +206,7 @@ ExecuteQuery(ParseState *pstate,
NULL,
query_string,
entry->plansource->commandTag,
- plan_list,
+ plan_list, plan_execlockrelsinfo_list,
cplan);
/*
@@ -576,7 +578,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
- ListCell *p;
+ List *plan_execlockrelsinfo_list;
+ ListCell *p,
+ *pe;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -632,15 +636,18 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
plan_list = cplan->stmt_list;
+ plan_execlockrelsinfo_list = cplan->execlockrelsinfo_list;
/* Explain each query */
- foreach(p, plan_list)
+ forboth(p, plan_list, pe, plan_execlockrelsinfo_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ ExecLockRelsInfo *execlockrelsinfo = lfirst_node(ExecLockRelsInfo, pe);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ ExplainOnePlan(pstmt, execlockrelsinfo, into, es, query_string,
+ paramLI, queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/README b/src/backend/executor/README
index bf5e70860d..9720d0ac2c 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -65,6 +65,27 @@ found there. This currently only occurs for Append and MergeAppend nodes. In
this case the non-required subplans are ignored and the executor state's
subnode array will become out of sequence to the plan's subplan list.
+Actually, the so-called execution-time pruning may also occur even before
+execution has started. One case where that occurs is when a cached generic
+plan is being validated for execution by plancache.c: GetCachedPlan(), which
+proceeds by locking all the relations that will be scanned by that plan. If
+the generic plan has nodes that contain so-called initial pruning steps (a
+subset of execution pruning steps that do not depend on full-fledged execution
+having started), they are performed at this point to figure out the minimal
+set of child subplans that satisfy those pruning instructions; the result
+of performing that pruning is saved in a data structure that gets passed to
+the executor alongside the plan tree. Only relations scanned by those
+surviving subplans are then locked, while those scanned by the pruned subplans
+are not, even though the pruned subplans themselves are not removed from the
+plan tree. So, it is imperative that the executor and any third party code
+invoked by it that gets passed the plan tree look at the initial pruning result
+made available via the aforementioned data structure to determine whether or
+not a particular subplan is valid. (The data structure basically consists of
+an array of PlanInitPruningOutput nodes, one element for each node of
+the plan tree, indexable using the plan_node_id of the individual plan
+nodes; each element contains a bitmapset of the indexes of the unpruned
+child subplans of the given node.)
+
Each Plan node may have expression trees associated with it, to represent
its target list, qualification conditions, etc. These trees are also
read-only to the executor, but the executor state for expression evaluation
@@ -247,6 +268,9 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ [ ExecutorGetLockRels ] --- an optional step to walk over the plan tree
+ to produce an ExecLockRelsInfo to be passed to CreateQueryDesc
+
CreateQueryDesc
ExecutorStart
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 473d2e00a2..1ddd1dfb83 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -49,11 +49,15 @@
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/execdebug.h"
+#include "executor/nodeAppend.h"
+#include "executor/nodeMergeAppend.h"
+#include "executor/nodeModifyTable.h"
#include "executor/nodeSubplan.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/nodeFuncs.h"
#include "parser/parsetree.h"
#include "storage/bufmgr.h"
#include "storage/lmgr.h"
@@ -101,9 +105,205 @@ static char *ExecBuildSlotValueDescription(Oid reloid,
Bitmapset *modifiedCols,
int maxfieldlen);
static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
+static bool ExecGetScanLockRels(Scan *scan, ExecGetLockRelsContext *context);
/* end of local decls */
+/* ----------------------------------------------------------------
+ * ExecutorGetLockRels
+ *
+ * Figure out the minimal set of relations to lock to be able to safely
+ * execute a given plan
+ *
+ * This ignores the relations scanned by child subplans that are pruned away
+ * after performing initial pruning steps present in the plan using the
+ * provided set of EXTERN parameters.
+ *
+ * Along with the set of RT indexes of relations that must be locked, the
+ * returned struct also contains an array of PlanInitPruningOutput nodes, each
+ * of which contains the result of initial pruning for a given plan node, which
+ * is basically a bitmapset of the indexes of surviving child subplans. Each
+ * plan node in the tree that undergoes pruning will have an element in the
+ * array.
+ *
+ * Note that while relations scanned by the subplans that are pruned will not
+ * be locked, the subplans themselves are left as-is in the plan tree, assuming
+ * anything that reads the plan tree during execution knows to ignore them by
+ * looking at the PlanInitPruningOutput's list of valid subplans.
+ *
+ * Partitioned tables mentioned in PartitionedRelPruneInfo nodes that drive
+ * the pruning will be locked before doing the pruning and also added to
+ * the returned set.
+ */
+ExecLockRelsInfo *
+ExecutorGetLockRels(PlannedStmt *plannedstmt, ParamListInfo params)
+{
+ int numPlanNodes = plannedstmt->numPlanNodes;
+ ExecGetLockRelsContext context;
+ ExecLockRelsInfo *result;
+ ListCell *lc;
+
+ /* Only get here if there is any pruning to do. */
+ Assert(plannedstmt->containsInitialPruning);
+
+ context.stmt = plannedstmt;
+ context.params = params;
+
+ /*
+ * Go walk all the plan tree(s) present in the PlannedStmt, filling
+ * context.lockrels with only the relations from plan nodes that
+ * survive initial pruning and also the tables mentioned in
+ * partitioned_rels sets found in the plan.
+ */
+ context.lockrels = NULL;
+ context.initPruningOutputs = NIL;
+ context.ipoIndexes = palloc0(sizeof(int) * numPlanNodes);
+
+ /* All the subplans. */
+ foreach(lc, plannedstmt->subplans)
+ {
+ Plan *subplan = lfirst(lc);
+
+ (void) ExecGetLockRels(subplan, &context);
+ }
+
+ /* And the main tree. */
+ (void) ExecGetLockRels(plannedstmt->planTree, &context);
+
+ /*
+ * Also be sure to lock partitioned relations from any [Merge]Append nodes
+ * that were originally present but were ultimately left out from the plan
+ * due to being deemed no-op nodes.
+ */
+ context.lockrels = bms_add_members(context.lockrels,
+ plannedstmt->elidedAppendPartedRels);
+
+ result = makeNode(ExecLockRelsInfo);
+ result->lockrels = context.lockrels;
+ result->numPlanNodes = numPlanNodes;
+ result->initPruningOutputs = context.initPruningOutputs;
+ result->ipoIndexes = context.ipoIndexes;
+
+ return result;
+}
+
+/* ------------------------------------------------------------------------
+ * ExecGetLockRels
+ * Adds all the relations that will be scanned by 'node' and its child
+ * plans to context->lockrels after taking into account the effect
+ * of performing initial pruning if any
+ *
+ * context->stmt gives the PlannedStmt being inspected to access the plan's
+ * range table if needed and context->params the set of EXTERN parameters
+ * available to evaluate pruning parameters.
+ *
+ * If initial pruning is done, a PlanInitPruningOutput node containing the
+ * result of pruning will be stored in context->initPruningOutputs that will
+ * be made available to the executor to reuse.
+ * ------------------------------------------------------------------------
+ */
+bool
+ExecGetLockRels(Plan *node, ExecGetLockRelsContext *context)
+{
+ /* Do nothing when we reach the end of a branch of the tree. */
+ if (node == NULL)
+ return true;
+
+ /* Make sure there's enough stack available. */
+ check_stack_depth();
+
+ switch (nodeTag(node))
+ {
+ /* Currently, only these two nodes have prunable child subplans. */
+ case T_Append:
+ if (ExecGetAppendLockRels((Append *) node, context))
+ return true;
+ break;
+ case T_MergeAppend:
+ if (ExecGetMergeAppendLockRels((MergeAppend *) node,
+ context))
+ return true;
+ break;
+
+ /*
+ * And these reference relations that must be added to context->lockrels.
+ */
+ case T_SeqScan:
+ case T_SampleScan:
+ case T_IndexScan:
+ case T_IndexOnlyScan:
+ case T_BitmapIndexScan:
+ case T_BitmapHeapScan:
+ case T_TidScan:
+ case T_TidRangeScan:
+ case T_ForeignScan:
+ case T_SubqueryScan:
+ case T_CustomScan:
+ if (ExecGetScanLockRels((Scan *) node, context))
+ return true;
+ break;
+ case T_ModifyTable:
+ if (ExecGetModifyTableLockRels((ModifyTable *) node, context))
+ return true;
+ /* plan_tree_walker() will visit the subplan (outerPlan) */
+ break;
+
+ default:
+ break;
+ }
+
+ /* Recurse to subnodes. */
+ return plan_tree_walker(node, ExecGetLockRels, (void *) context);
+}
+
+/*
+ * ExecGetScanLockRels
+ * Do ExecGetLockRels()'s work for a Scan node
+ */
+static bool
+ExecGetScanLockRels(Scan *scan, ExecGetLockRelsContext *context)
+{
+ switch (nodeTag(scan))
+ {
+ case T_ForeignScan:
+ {
+ ForeignScan *fscan = (ForeignScan *) scan;
+
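+ /*
+ * fs_relids includes all base relations represented by this
+ * ForeignScan, which may be more than one if a foreign join has
+ * been pushed down.
+ */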
+ context->lockrels = bms_add_members(context->lockrels,
+ fscan->fs_relids);
+ }
+ break;
+
+ case T_SubqueryScan:
+ {
+ SubqueryScan *sscan = (SubqueryScan *) scan;
+
+ (void) ExecGetLockRels((Plan *) sscan->subplan, context);
+ }
+ break;
+
+ case T_CustomScan:
+ {
+ CustomScan *cscan = (CustomScan *) scan;
+ ListCell *lc;
+
+ context->lockrels = bms_add_members(context->lockrels,
+ cscan->custom_relids);
+ foreach(lc, cscan->custom_plans)
+ {
+ (void) ExecGetLockRels((Plan *) lfirst(lc), context);
+ }
+ }
+ break;
+
+ default:
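+ /* All other scan types reference exactly one relation. */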
+ context->lockrels = bms_add_member(context->lockrels,
+ scan->scanrelid);
+ break;
+ }
+
+ return true;
+}
/* ----------------------------------------------------------------
* ExecutorStart
@@ -805,6 +1005,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ ExecLockRelsInfo *execlockrelsinfo = queryDesc->execlockrelsinfo;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -824,6 +1025,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable);
estate->es_plannedstmt = plannedstmt;
+ estate->es_execlockrelsinfo = execlockrelsinfo;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 5dd8ab7db2..02f2c27fdf 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -66,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_EXECLOCKRELSINFO UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -182,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->containsInitialPruning = false;
pstmt->planTree = plan;
pstmt->rtable = estate->es_range_table;
pstmt->resultRelations = NIL;
@@ -596,12 +598,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *execlockrelsinfo_data;
+ char *execlockrelsinfo_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int execlockrelsinfo_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -630,6 +635,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
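+
+ /*
+ * Also serialize the ExecLockRelsInfo so that workers can reuse the
+ * leader's initial-pruning results instead of repeating the pruning.
+ */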
+ execlockrelsinfo_data = nodeToString(estate->es_execlockrelsinfo);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -656,6 +662,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized ExecLockRelsInfo. */
+ execlockrelsinfo_len = strlen(execlockrelsinfo_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, execlockrelsinfo_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -750,6 +761,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized ExecLockRelsInfo */
+ execlockrelsinfo_space = shm_toc_allocate(pcxt->toc, execlockrelsinfo_len);
+ memcpy(execlockrelsinfo_space, execlockrelsinfo_data, execlockrelsinfo_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_EXECLOCKRELSINFO,
+ execlockrelsinfo_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1231,8 +1248,10 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *execlockrelsinfospace;
char *paramspace;
PlannedStmt *pstmt;
+ ExecLockRelsInfo *execlockrelsinfo;
ParamListInfo paramLI;
char *queryString;
@@ -1243,12 +1262,17 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
pstmtspace = shm_toc_lookup(toc, PARALLEL_KEY_PLANNEDSTMT, false);
pstmt = (PlannedStmt *) stringToNode(pstmtspace);
+ /* Reconstruct leader-supplied ExecLockRelsInfo. */
+ execlockrelsinfospace = shm_toc_lookup(toc, PARALLEL_KEY_EXECLOCKRELSINFO,
+ false);
+ execlockrelsinfo = (ExecLockRelsInfo *) stringToNode(execlockrelsinfospace);
+
/* Reconstruct ParamListInfo. */
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(¶mspace);
/* Create a QueryDesc for the query. */
- return CreateQueryDesc(pstmt,
+ return CreateQueryDesc(pstmt, execlockrelsinfo,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 7ff5a95f05..fddc97280e 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -24,6 +24,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -183,8 +184,13 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
int maxfieldlen);
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *partitionpruneinfo);
-static Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate);
+ PartitionPruneInfo *partitionpruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir);
+static Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
+ PartitionPruneInfo *pruneinfo);
static void ExecInitPruningContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -1483,8 +1489,9 @@ adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri)
* considered to be a stable expression, it can change value from one plan
* node scan to the next during query execution. Stable comparison
* expressions that don't involve such Params allow partition pruning to be
- * done once during executor startup. Expressions that do involve such Params
- * require us to prune separately for each scan of the parent plan node.
+ * done once during executor startup, or even earlier, during
+ * ExecutorGetLockRels(). Expressions that do involve such Params require
+ * us to prune separately for each scan of the parent plan node.
*
* Note that pruning away unneeded subplans during executor startup has the
* added benefit of not having to initialize the unneeded subplans at all.
@@ -1496,10 +1503,17 @@ adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri)
* Creates the PartitionPruneState required by each of the two pruning
* functions. Details stored include how to map the partition index
* returned by the partition pruning code into subplan indexes. Also
- * determines the set of initially valid subplans by performing initial
- * pruning steps, only which need be initialized by the caller such as
- * ExecInitAppend. Maps in PartitionPruneState are updated to account
- * for initial pruning having eliminated some of the subplans, if any.
+ * determines the set of initially valid subplans, either by looking it
+ * up in the plan node's PlanInitPruningOutput, if one is found in
+ * EState.es_execlockrelsinfo, or by performing the initial pruning steps.
+ * Only the subplans included in that set need be initialized by the
+ * caller, such as ExecInitAppend. Maps in PartitionPruneState are
+ * updated to account for initial pruning having eliminated some of the
+ * subplans, if any.
+ *
+ * ExecGetLockRelsDoInitialPruning:
+ * Do initial pruning as part of ExecGetLockRels() on the parent plan
+ * node
*
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating all available
@@ -1514,9 +1528,10 @@ adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri)
* ExecInitPartitionPruning
* Initialize data structure needed for run-time partition pruning
*
- * Initial pruning can be done immediately, so it is done here if needed and
- * the set of surviving partition subplans' indexes are added to the output
- * parameter *initially_valid_subplans.
+ * Initial pruning can be done immediately, so it is done here unless it has
+ * already been done by ExecGetLockRelsDoInitialPruning(), and the set of
+ * surviving partition subplans' indexes is added to the output parameter
+ * *initially_valid_subplans.
*
* If subplans are indeed pruned, subplan_map arrays contained in the returned
* PartitionPruneState are re-sequenced to not count those, though only if the
@@ -1530,22 +1545,57 @@ ExecInitPartitionPruning(PlanState *planstate,
{
PartitionPruneState *prunestate;
EState *estate = planstate->state;
+ Plan *plan = planstate->plan;
+ PlanInitPruningOutput *initPruningOutput = NULL;
+ bool do_pruning = (pruneinfo->needs_init_pruning ||
+ pruneinfo->needs_exec_pruning);
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, planstate);
+ /* Retrieve the parent plan's PlanInitPruningOutput, if any. */
+ if (estate->es_execlockrelsinfo)
+ {
+ initPruningOutput = (PlanInitPruningOutput *)
+ ExecFetchPlanInitPruningOutput(estate->es_execlockrelsinfo, plan);
- /*
- * Create the working data structure for pruning.
- */
- prunestate = ExecCreatePartitionPruneState(planstate, pruneinfo);
+ Assert(initPruningOutput != NULL &&
+ IsA(initPruningOutput, PlanInitPruningOutput));
+ /* No need to do initial pruning again, only exec pruning. */
+ do_pruning = pruneinfo->needs_exec_pruning;
+ }
+
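+ /*
+ * Create a PartitionPruneState only if some pruning steps remain to be
+ * performed here; callers such as ExecInitAppend know to expect a NULL
+ * when none do.
+ */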
+ prunestate = NULL;
+ if (do_pruning)
+ {
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
+
+ /* For data reading, executor always omits detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
+ /*
+ * Create the working data structure for pruning. No need to consider
+ * initial pruning steps if we have a PlanInitPruningOutput.
+ */
+ prunestate = ExecCreatePartitionPruneState(planstate, pruneinfo,
+ initPruningOutput == NULL, true,
+ NIL, planstate->ps_ExprContext,
+ estate->es_partition_directory);
+ }
/*
* Perform an initial partition prune, if required.
*/
- if (prunestate->do_initial_prune)
+ if (initPruningOutput)
+ {
+ /* ExecGetLockRelsDoInitialPruning() already did it for us! */
+ *initially_valid_subplans = initPruningOutput->initially_valid_subplans;
+ }
+ else if (prunestate && prunestate->do_initial_prune)
{
/* Determine which subplans survive initial pruning */
- *initially_valid_subplans = ExecFindInitialMatchingSubPlans(prunestate);
+ *initially_valid_subplans = ExecFindInitialMatchingSubPlans(prunestate,
+ pruneinfo);
}
else
{
@@ -1563,7 +1613,7 @@ ExecInitPartitionPruning(PlanState *planstate,
* invalid data in prunestate, because that data won't be consulted again
* (cf initial Assert in ExecFindMatchingSubPlans).
*/
- if (prunestate->do_exec_prune &&
+ if (prunestate && prunestate->do_exec_prune &&
bms_num_members(*initially_valid_subplans) < n_total_subplans)
PartitionPruneStateFixSubPlanMap(prunestate,
*initially_valid_subplans,
@@ -1572,12 +1622,75 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecGetLockRelsDoInitialPruning
+ * Perform initial pruning as part of doing ExecGetLockRels() on the parent
+ * plan node
+ */
+Bitmapset *
+ExecGetLockRelsDoInitialPruning(Plan *plan, ExecGetLockRelsContext *context,
+ PartitionPruneInfo *pruneinfo)
+{
+ List *rtable = context->stmt->rtable;
+ ParamListInfo params = context->params;
+ ExprContext *econtext;
+ PartitionDirectory pdir;
+ MemoryContext oldcontext,
+ tmpcontext;
+ PartitionPruneState *prunestate;
+ PlanInitPruningOutput *initPruningOutput;
+
+ /*
+ * A temporary context to allocate the data needed to run the pruning steps.
+ */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "initial pruning working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
+
+ /*
+ * PartitionDirectory to look up partition descriptors, which omits
+ * detached partitions, just like in the executor proper.
+ */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+
+ /*
+ * We don't yet have a PlanState for the parent plan node, so must create
+ * a standalone ExprContext to evaluate pruning expressions, equipped with
+ * the information about the EXTERN parameters that the caller passed us.
+ * That is okay because the initial pruning steps do not contain
+ * anything that requires execution to have started.
+ */
+ econtext = CreateStandaloneExprContext();
+ econtext->ecxt_param_list_info = params;
+ prunestate = ExecCreatePartitionPruneState(NULL, pruneinfo,
+ true, false,
+ rtable, econtext,
+ pdir);
+ MemoryContextSwitchTo(oldcontext);
+
+ /* Do the pruning and populate a PlanInitPruningOutput for this node. */
+ initPruningOutput = makeNode(PlanInitPruningOutput);
+ initPruningOutput->initially_valid_subplans =
+ ExecFindInitialMatchingSubPlans(prunestate, pruneinfo);
+ ExecStorePlanInitPruningOutput(context, initPruningOutput, plan);
+
+ FreeExprContext(econtext, true);
+ DestroyPartitionDirectory(pdir);
+ MemoryContextDelete(tmpcontext);
+
+ return initPruningOutput->initially_valid_subplans;
+}
+
/*
* ExecCreatePartitionPruneState
* Build the data structure required for calling
* ExecFindInitialMatchingSubPlans and ExecFindMatchingSubPlans.
*
- * 'planstate' is the parent plan node's execution state.
+ * 'planstate', if not NULL, is the parent plan node's execution state. It
+ * can be NULL when called before ExecutorStart(), in which case 'rtable'
+ * (range table), 'econtext', and 'partdir' must be provided explicitly.
*
* 'partitionpruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
@@ -1592,19 +1705,20 @@ ExecInitPartitionPruning(PlanState *planstate,
*/
static PartitionPruneState *
ExecCreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *partitionpruneinfo)
+ PartitionPruneInfo *partitionpruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir)
{
- EState *estate = planstate->state;
+ EState *estate = planstate ? planstate->state : NULL;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
- /* For data reading, executor always omits detached partitions */
- if (estate->es_partition_directory == NULL)
- estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
+ Assert((estate != NULL) ||
+ (partdir != NULL && econtext != NULL && rtable != NIL));
n_part_hierarchies = list_length(partitionpruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
@@ -1655,19 +1769,48 @@ ExecCreatePartitionPruneState(PlanState *planstate,
PartitionedRelPruneInfo *pinfo = lfirst_node(PartitionedRelPruneInfo, lc2);
PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
Relation partrel;
+ bool close_partrel = false;
PartitionDesc partdesc;
PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * Must open the relation ourselves when called before execution
+ * has started, such as during ExecutorGetLockRels() on a cached
+ * plan. In that case, sub-partitions must be locked, because
+ * AcquirePlannerLocks() would not have seen them. (The 1st
+ * relation in a partrelpruneinfos list is always the root
+ * partitioned table appearing in the query, which
+ * AcquirePlannerLocks() would have locked already; the Assert in
+ * relation_open() guards that assumption.)
+ */
+ if (estate == NULL)
+ {
+ RangeTblEntry *rte = rt_fetch(pinfo->rtindex, rtable);
+ int lockmode = (j == 0) ? NoLock : rte->rellockmode;
+
+ partrel = table_open(rte->relid, lockmode);
+ close_partrel = true;
+ }
+ else
+ partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+
+ /*
+ * We can rely on the copy of the partitioned table's partition
+ * key in its relcache entry, because it can't change (or get
+ * destroyed) as long as the relation is locked. The partition
+ * descriptor is taken from the PartitionDirectory, which holds
+ * onto it long enough for it to remain valid while it's used to
+ * perform the pruning steps.
*/
- partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
partkey = RelationGetPartitionKey(partrel);
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partdesc = PartitionDirectoryLookup(partdir, partrel);
+
+ /*
+ * Must close partrel, while keeping the lock, if we opened it
+ * ourselves above instead of fetching it from the EState.
+ */
+ if (close_partrel)
+ table_close(partrel, NoLock);
/*
* Initialize the subplan_map and subpart_map.
@@ -1769,7 +1912,7 @@ ExecCreatePartitionPruneState(PlanState *planstate,
* Initialize pruning contexts as needed.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
- if (pinfo->initial_pruning_steps)
+ if (consider_initial_steps && pinfo->initial_pruning_steps)
{
ExecInitPruningContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
@@ -1779,7 +1922,7 @@ ExecCreatePartitionPruneState(PlanState *planstate,
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps)
+ if (consider_exec_steps && pinfo->exec_pruning_steps)
{
ExecInitPruningContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
@@ -1893,7 +2036,8 @@ ExecInitPruningContext(PartitionPruneContext *context,
* is required.
*/
static Bitmapset *
-ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate)
+ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
+ PartitionPruneInfo *pruneinfo)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -1903,8 +2047,8 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate)
Assert(prunestate->do_initial_prune);
/*
- * Switch to a temp context to avoid leaking memory in the executor's
- * query-lifespan memory context.
+ * Switch to a temp context to avoid leaking memory in the longer-term
+ * memory context.
*/
oldcontext = MemoryContextSwitchTo(prunestate->prune_context);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9df1f81ea8..7246f9175f 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -119,6 +119,7 @@ CreateExecutorState(void)
estate->es_relations = NULL;
estate->es_rowmarks = NULL;
estate->es_plannedstmt = NULL;
+ estate->es_execlockrelsinfo = NULL;
estate->es_junkFilter = NULL;
@@ -785,6 +786,13 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rti > 0 && rti <= estate->es_range_table_size);
+ /*
+ * Cross-check that AcquireExecutorLocks() hasn't missed any relation
+ * that it should have locked.
+ */
+ Assert(estate->es_execlockrelsinfo == NULL ||
+ bms_is_member(rti, estate->es_execlockrelsinfo->lockrels));
+
rel = estate->es_relations[rti - 1];
if (rel == NULL)
{
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index f9460ae506..a2182a6b1f 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -844,7 +844,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
- es->qd = CreateQueryDesc(es->stmt,
+ es->qd = CreateQueryDesc(es->stmt, NULL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 5b6d3eb23b..9c6f907687 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -94,6 +94,55 @@ static bool ExecAppendAsyncRequest(AppendState *node, TupleTableSlot **result);
static void ExecAppendAsyncEventWait(AppendState *node);
static void classify_matching_subplans(AppendState *node);
+/* ----------------------------------------------------------------
+ * ExecGetAppendLockRels
+ * Do ExecGetLockRels()'s work for an Append plan
+ * ----------------------------------------------------------------
+ */
+bool
+ExecGetAppendLockRels(Append *node, ExecGetLockRelsContext *context)
+{
+ PartitionPruneInfo *pruneinfo = node->part_prune_info;
+
+ /*
+ * Must always lock all the partitioned tables whose direct and indirect
+ * partitions will be scanned by this Append.
+ */
+ context->lockrels = bms_add_members(context->lockrels,
+ node->partitioned_rels);
+
+ /*
+ * Now recurse to subplans to add relations scanned therein.
+ *
+ * If initial pruning can be done, do that now and only recurse to the
+ * surviving subplans.
+ */
+ if (pruneinfo && pruneinfo->needs_init_pruning)
+ {
+ List *subplans = node->appendplans;
+ Bitmapset *validsubplans;
+ int i;
+
+ validsubplans = ExecGetLockRelsDoInitialPruning((Plan *) node,
+ context, pruneinfo);
+
+ /* Recurse to surviving subplans. */
+ i = -1;
+ while ((i = bms_next_member(validsubplans, i)) >= 0)
+ {
+ Plan *subplan = list_nth(subplans, i);
+
+ (void) ExecGetLockRels(subplan, context);
+ }
+
+ /* done with this node */
+ return true;
+ }
+
+ /* Tell the caller to recurse to *all* the subplans. */
+ return false;
+}
+
/* ----------------------------------------------------------------
* ExecInitAppend
*
@@ -155,7 +204,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (appendstate->as_prune_state == NULL ||
+ (!appendstate->as_prune_state->do_exec_prune && nplans > 0))
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 9a9f29e845..4b04fcdbc2 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -54,6 +54,55 @@ typedef int32 SlotNumber;
static TupleTableSlot *ExecMergeAppend(PlanState *pstate);
static int heap_compare_slots(Datum a, Datum b, void *arg);
+/* ----------------------------------------------------------------
+ * ExecGetMergeAppendLockRels
+ * Do ExecGetLockRels()'s work for a MergeAppend plan
+ * ----------------------------------------------------------------
+ */
+bool
+ExecGetMergeAppendLockRels(MergeAppend *node, ExecGetLockRelsContext *context)
+{
+ PartitionPruneInfo *pruneinfo = node->part_prune_info;
+
+ /*
+ * Must always lock all the partitioned tables whose direct and indirect
+ * partitions will be scanned by this MergeAppend.
+ */
+ context->lockrels = bms_add_members(context->lockrels,
+ node->partitioned_rels);
+
+ /*
+ * Now recurse to subplans to add relations scanned therein.
+ *
+ * If initial pruning can be done, do that now and only recurse to the
+ * surviving subplans.
+ */
+ if (pruneinfo && pruneinfo->needs_init_pruning)
+ {
+ List *subplans = node->mergeplans;
+ Bitmapset *validsubplans;
+ int i;
+
+ validsubplans = ExecGetLockRelsDoInitialPruning((Plan *) node,
+ context, pruneinfo);
+
+ /* Recurse to surviving subplans. */
+ i = -1;
+ while ((i = bms_next_member(validsubplans, i)) >= 0)
+ {
+ Plan *subplan = list_nth(subplans, i);
+
+ (void) ExecGetLockRels(subplan, context);
+ }
+
+ /* done with this node */
+ return true;
+ }
+
+ /* Tell the caller to recurse to *all* the subplans. */
+ return false;
+}
+
/* ----------------------------------------------------------------
* ExecInitMergeAppend
@@ -103,7 +152,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (mergestate->ms_prune_state == NULL ||
+ (!mergestate->ms_prune_state->do_exec_prune && nplans > 0))
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 701fe05296..23df3efef0 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -3008,6 +3008,31 @@ ExecLookupResultRelByOid(ModifyTableState *node, Oid resultoid,
return NULL;
}
+/*
+ * ExecGetModifyTableLockRels
+ * Do ExecGetLockRels()'s work for a ModifyTable plan
+ */
+bool
+ExecGetModifyTableLockRels(ModifyTable *plan, ExecGetLockRelsContext *context)
+{
+ ListCell *lc;
+
+ /* First add the result relation RTIs mentioned in the node. */
+ if (plan->rootRelation > 0)
+ context->lockrels = bms_add_member(context->lockrels,
+ plan->rootRelation);
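+
+ /*
+ * nominalRelation may be the same RTI as rootRelation, in which case
+ * adding it to the set a second time is harmless.
+ */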
+ context->lockrels = bms_add_member(context->lockrels,
+ plan->nominalRelation);
+ foreach(lc, plan->resultRelations)
+ {
+ context->lockrels = bms_add_member(context->lockrels,
+ lfirst_int(lc));
+ }
+
+ /* Tell the caller to recurse to the subplan (outerPlan(plan)). */
+ return false;
+}
+
/* ----------------------------------------------------------------
* ExecInitModifyTable
* ----------------------------------------------------------------
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index a82e986667..2107009591 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1578,6 +1578,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
CachedPlanSource *plansource;
CachedPlan *cplan;
List *stmt_list;
+ List *execlockrelsinfo_list;
char *query_string;
Snapshot snapshot;
MemoryContext oldcontext;
@@ -1659,6 +1660,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
/* Replan if needed, and increment plan refcount for portal */
cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
stmt_list = cplan->stmt_list;
+ execlockrelsinfo_list = cplan->execlockrelsinfo_list;
if (!plan->saved)
{
@@ -1670,6 +1672,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
oldcontext = MemoryContextSwitchTo(portal->portalContext);
stmt_list = copyObject(stmt_list);
+ execlockrelsinfo_list = copyObject(execlockrelsinfo_list);
MemoryContextSwitchTo(oldcontext);
ReleaseCachedPlan(cplan, NULL);
cplan = NULL; /* portal shouldn't depend on cplan */
@@ -1683,6 +1686,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
+ execlockrelsinfo_list,
cplan);
/*
@@ -2473,7 +2477,9 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
- ListCell *lc2;
+ List *execlockrelsinfo_list;
+ ListCell *lc2,
+ *lc3;
spicallbackarg.query = plansource->query_string;
@@ -2552,6 +2558,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
plan_owner, _SPI_current->queryEnv);
stmt_list = cplan->stmt_list;
+ execlockrelsinfo_list = cplan->execlockrelsinfo_list;
/*
* If we weren't given a specific snapshot to use, and the statement
@@ -2589,9 +2596,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, execlockrelsinfo_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ ExecLockRelsInfo *execlockrelsinfo = lfirst_node(ExecLockRelsInfo, lc3);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2663,7 +2671,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
else
snap = InvalidSnapshot;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, execlockrelsinfo,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index dc68a12486..1b94d7c881 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -68,6 +68,13 @@
} \
} while (0)
+/* Copy a field that is an array of numElem ints */
+#define COPY_INT_ARRAY(fldname, numElem) \
+ do { \
+ newnode->fldname = NULL; \
+ if ((numElem) > 0) \
+ { \
+ newnode->fldname = palloc((numElem) * sizeof(int)); \
+ memcpy(newnode->fldname, from->fldname, (numElem) * sizeof(int)); \
+ } \
+ } while (0)
+
/* Copy a parse location field (for Copy, this is same as scalar case) */
#define COPY_LOCATION_FIELD(fldname) \
(newnode->fldname = from->fldname)
@@ -94,8 +101,10 @@ _copyPlannedStmt(const PlannedStmt *from)
COPY_SCALAR_FIELD(transientPlan);
COPY_SCALAR_FIELD(dependsOnRole);
COPY_SCALAR_FIELD(parallelModeNeeded);
+ COPY_SCALAR_FIELD(containsInitialPruning);
COPY_SCALAR_FIELD(jitFlags);
COPY_NODE_FIELD(planTree);
+ COPY_SCALAR_FIELD(numPlanNodes);
COPY_NODE_FIELD(rtable);
COPY_NODE_FIELD(resultRelations);
COPY_NODE_FIELD(appendRelations);
@@ -1281,6 +1290,8 @@ _copyPartitionPruneInfo(const PartitionPruneInfo *from)
PartitionPruneInfo *newnode = makeNode(PartitionPruneInfo);
COPY_NODE_FIELD(prune_infos);
+ COPY_SCALAR_FIELD(needs_init_pruning);
+ COPY_SCALAR_FIELD(needs_exec_pruning);
COPY_BITMAPSET_FIELD(other_subplans);
return newnode;
@@ -4944,6 +4955,33 @@ _copyExtensibleNode(const ExtensibleNode *from)
return newnode;
}
+/* ****************************************************************
+ * execnodes.h copy functions
+ * ****************************************************************
+ */
+static ExecLockRelsInfo *
+_copyExecLockRelsInfo(const ExecLockRelsInfo *from)
+{
+ ExecLockRelsInfo *newnode = makeNode(ExecLockRelsInfo);
+
+ COPY_BITMAPSET_FIELD(lockrels);
+ COPY_SCALAR_FIELD(numPlanNodes);
+ COPY_NODE_FIELD(initPruningOutputs);
+ COPY_INT_ARRAY(ipoIndexes, from->numPlanNodes);
+
+ return newnode;
+}
+
+static PlanInitPruningOutput *
+_copyPlanInitPruningOutput(const PlanInitPruningOutput *from)
+{
+ PlanInitPruningOutput *newnode = makeNode(PlanInitPruningOutput);
+
+ COPY_BITMAPSET_FIELD(initially_valid_subplans);
+
+ return newnode;
+}
+
/* ****************************************************************
* value.h copy functions
* ****************************************************************
@@ -4998,7 +5036,6 @@ _copyBitString(const BitString *from)
return newnode;
}
-
static ForeignKeyCacheInfo *
_copyForeignKeyCacheInfo(const ForeignKeyCacheInfo *from)
{
@@ -5947,6 +5984,16 @@ copyObjectImpl(const void *from)
retval = _copyPublicationTable(from);
break;
+ /*
+ * EXECUTION NODES
+ */
+ case T_ExecLockRelsInfo:
+ retval = _copyExecLockRelsInfo(from);
+ break;
+ case T_PlanInitPruningOutput:
+ retval = _copyPlanInitPruningOutput(from);
+ break;
+
/*
* MISCELLANEOUS NODES
*/
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index bc178d53bf..6c404c8664 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -312,8 +312,10 @@ _outPlannedStmt(StringInfo str, const PlannedStmt *node)
WRITE_BOOL_FIELD(transientPlan);
WRITE_BOOL_FIELD(dependsOnRole);
WRITE_BOOL_FIELD(parallelModeNeeded);
+ WRITE_BOOL_FIELD(containsInitialPruning);
WRITE_INT_FIELD(jitFlags);
WRITE_NODE_FIELD(planTree);
+ WRITE_INT_FIELD(numPlanNodes);
WRITE_NODE_FIELD(rtable);
WRITE_NODE_FIELD(resultRelations);
WRITE_NODE_FIELD(appendRelations);
@@ -1007,6 +1009,8 @@ _outPartitionPruneInfo(StringInfo str, const PartitionPruneInfo *node)
WRITE_NODE_TYPE("PARTITIONPRUNEINFO");
WRITE_NODE_FIELD(prune_infos);
+ WRITE_BOOL_FIELD(needs_init_pruning);
+ WRITE_BOOL_FIELD(needs_exec_pruning);
WRITE_BITMAPSET_FIELD(other_subplans);
}
@@ -2702,6 +2706,31 @@ _outExtensibleNode(StringInfo str, const ExtensibleNode *node)
methods->nodeOut(str, node);
}
+/*****************************************************************************
+ *
+ * Stuff from execnodes.h
+ *
+ *****************************************************************************/
+
+static void
+_outExecLockRelsInfo(StringInfo str, const ExecLockRelsInfo *node)
+{
+ WRITE_NODE_TYPE("EXECLOCKRELSINFO");
+
+ WRITE_BITMAPSET_FIELD(lockrels);
+ WRITE_INT_FIELD(numPlanNodes);
+ WRITE_NODE_FIELD(initPruningOutputs);
+ WRITE_INT_ARRAY(ipoIndexes, node->numPlanNodes);
+}
+
+static void
+_outPlanInitPruningOutput(StringInfo str, const PlanInitPruningOutput *node)
+{
+ WRITE_NODE_TYPE("PARTITIONINITPRUNINGOUTPUT");
+
+ WRITE_BITMAPSET_FIELD(initially_valid_subplans);
+}
+
/*****************************************************************************
*
* Stuff from parsenodes.h.
@@ -4543,6 +4572,16 @@ outNode(StringInfo str, const void *obj)
_outPartitionRangeDatum(str, obj);
break;
+ /*
+ * EXECUTION NODES
+ */
+ case T_ExecLockRelsInfo:
+ _outExecLockRelsInfo(str, obj);
+ break;
+ case T_PlanInitPruningOutput:
+ _outPlanInitPruningOutput(str, obj);
+ break;
+
default:
/*
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 3c673c42d5..863f082729 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1585,8 +1585,10 @@ _readPlannedStmt(void)
READ_BOOL_FIELD(transientPlan);
READ_BOOL_FIELD(dependsOnRole);
READ_BOOL_FIELD(parallelModeNeeded);
+ READ_BOOL_FIELD(containsInitialPruning);
READ_INT_FIELD(jitFlags);
READ_NODE_FIELD(planTree);
+ READ_INT_FIELD(numPlanNodes);
READ_NODE_FIELD(rtable);
READ_NODE_FIELD(resultRelations);
READ_NODE_FIELD(appendRelations);
@@ -2537,6 +2539,8 @@ _readPartitionPruneInfo(void)
READ_LOCALS(PartitionPruneInfo);
READ_NODE_FIELD(prune_infos);
+ READ_BOOL_FIELD(needs_init_pruning);
+ READ_BOOL_FIELD(needs_exec_pruning);
READ_BITMAPSET_FIELD(other_subplans);
READ_DONE();
@@ -2706,6 +2710,35 @@ _readPartitionRangeDatum(void)
READ_DONE();
}
+/*
+ * _readExecLockRelsInfo
+ */
+static ExecLockRelsInfo *
+_readExecLockRelsInfo(void)
+{
+ READ_LOCALS(ExecLockRelsInfo);
+
+ READ_BITMAPSET_FIELD(lockrels);
+ READ_INT_FIELD(numPlanNodes);
+ READ_NODE_FIELD(initPruningOutputs);
+ READ_INT_ARRAY(ipoIndexes, local_node->numPlanNodes);
+
+ READ_DONE();
+}
+
+/*
+ * _readPlanInitPruningOutput
+ */
+static PlanInitPruningOutput *
+_readPlanInitPruningOutput(void)
+{
+ READ_LOCALS(PlanInitPruningOutput);
+
+ READ_BITMAPSET_FIELD(initially_valid_subplans);
+
+ READ_DONE();
+}
+
/*
* parseNodeString
*
@@ -2977,6 +3010,10 @@ parseNodeString(void)
return_value = _readPartitionBoundSpec();
else if (MATCH("PARTITIONRANGEDATUM", 19))
return_value = _readPartitionRangeDatum();
+ else if (MATCH("EXECLOCKRELSINFO", 16))
+ return_value = _readExecLockRelsInfo();
+ else if (MATCH("PARTITIONINITPRUNINGOUTPUT", 26))
+ return_value = _readPlanInitPruningOutput();
else
{
elog(ERROR, "badly formatted node string \"%.32s\"...", token);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 374a9d9753..329fb9d6e7 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -517,7 +517,9 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->transientPlan = glob->transientPlan;
result->dependsOnRole = glob->dependsOnRole;
result->parallelModeNeeded = glob->parallelModeNeeded;
+ result->containsInitialPruning = glob->containsInitialPruning;
result->planTree = top_plan;
+ result->numPlanNodes = glob->lastPlanNodeId;
result->rtable = glob->finalrtable;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index dbdeb8ec9d..ac795ae9d9 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1561,6 +1561,9 @@ set_append_references(PlannerInfo *root,
pinfo->rtindex += rtoffset;
}
}
+
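+ /*
+ * Let AcquireExecutorLocks() know, via the PlannedStmt, that it is
+ * worth walking this plan tree to perform initial pruning.
+ */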
+ if (aplan->part_prune_info->needs_init_pruning)
+ root->glob->containsInitialPruning = true;
}
/* We don't need to recurse to lefttree or righttree ... */
@@ -1648,6 +1651,9 @@ set_mergeappend_references(PlannerInfo *root,
pinfo->rtindex += rtoffset;
}
}
+
+ if (mplan->part_prune_info->needs_init_pruning)
+ root->glob->containsInitialPruning = true;
}
/* We don't need to recurse to lefttree or righttree ... */
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 7080cb25d9..3322dc79f2 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,9 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -230,6 +232,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool needs_init_pruning = false;
+ bool needs_exec_pruning = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -309,12 +313,16 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_needs_init_pruning;
+ bool partrel_needs_exec_pruning;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_needs_init_pruning,
+ &partrel_needs_exec_pruning);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -323,6 +331,10 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+ if (!needs_init_pruning)
+ needs_init_pruning = partrel_needs_init_pruning;
+ if (!needs_exec_pruning)
+ needs_exec_pruning = partrel_needs_exec_pruning;
}
pfree(relid_subplan_map);
@@ -337,6 +349,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->needs_init_pruning = needs_init_pruning;
+ pruneinfo->needs_exec_pruning = needs_exec_pruning;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -435,13 +449,18 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *needs_init_pruning and *needs_exec_pruning are set to indicate whether
+ * the returned PartitionedRelPruneInfos contain pruning steps that can be
+ * performed before and after execution begins, respectively.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -452,6 +471,10 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will find out below. */
+ *needs_init_pruning = false;
+ *needs_exec_pruning = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -539,6 +562,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+ * While at it, the first pass determines whether the 2nd pass will be
+ * necessary, by noting the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -613,6 +639,11 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ if (!*needs_init_pruning)
+ *needs_init_pruning = (initial_pruning_steps != NIL);
+ if (!*needs_exec_pruning)
+ *needs_exec_pruning = (exec_pruning_steps != NIL);
+
pinfolist = lappend(pinfolist, pinfo);
}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index ba2fcfeb4a..085eb3f209 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -945,15 +945,17 @@ pg_plan_query(Query *querytree, const char *query_string, int cursorOptions,
* For normal optimizable statements, invoke the planner. For utility
* statements, just make a wrapper PlannedStmt node.
*
- * The result is a list of PlannedStmt nodes.
+ * The result is a list of PlannedStmt nodes. Also, a NULL is appended to
+ * *execlockrelsinfo_list for each PlannedStmt added to the returned list.
*/
List *
pg_plan_queries(List *querytrees, const char *query_string, int cursorOptions,
- ParamListInfo boundParams)
+ ParamListInfo boundParams, List **execlockrelsinfo_list)
{
List *stmt_list = NIL;
ListCell *query_list;
+ *execlockrelsinfo_list = NIL;
foreach(query_list, querytrees)
{
Query *query = lfirst_node(Query, query_list);
@@ -977,6 +979,7 @@ pg_plan_queries(List *querytrees, const char *query_string, int cursorOptions,
}
stmt_list = lappend(stmt_list, stmt);
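+
+ /* Freshly planned statements have no ExecLockRelsInfo as yet. */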
+ *execlockrelsinfo_list = lappend(*execlockrelsinfo_list, NULL);
}
return stmt_list;
@@ -1080,7 +1083,8 @@ exec_simple_query(const char *query_string)
QueryCompletion qc;
MemoryContext per_parsetree_context = NULL;
List *querytree_list,
- *plantree_list;
+ *plantree_list,
+ *plantree_execlockrelsinfo_list;
Portal portal;
DestReceiver *receiver;
int16 format;
@@ -1167,7 +1171,8 @@ exec_simple_query(const char *query_string)
NULL, 0, NULL);
plantree_list = pg_plan_queries(querytree_list, query_string,
- CURSOR_OPT_PARALLEL_OK, NULL);
+ CURSOR_OPT_PARALLEL_OK, NULL,
+ &plantree_execlockrelsinfo_list);
/*
* Done with the snapshot used for parsing/planning.
@@ -1203,6 +1208,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ plantree_execlockrelsinfo_list,
NULL);
/*
@@ -1991,6 +1997,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
+ cplan->execlockrelsinfo_list,
cplan);
/* Done with the snapshot used for parameter I/O and parsing/planning */
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5f907831a3..972ddc014e 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -35,7 +35,7 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
+static void ProcessQuery(PlannedStmt *plan, ExecLockRelsInfo *execlockrelsinfo,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +65,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ ExecLockRelsInfo *execlockrelsinfo,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +78,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->execlockrelsinfo = execlockrelsinfo; /* ExecutorGetLockRels() output for plan */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +124,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * execlockrelsinfo: ExecutorGetLockRels() output for the plan tree
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +137,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ ExecLockRelsInfo *execlockrelsinfo,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +149,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, execlockrelsinfo, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -490,6 +494,7 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ linitial_node(ExecLockRelsInfo, portal->execlockrelsinfos),
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1190,7 +1195,8 @@ PortalRunMulti(Portal portal,
QueryCompletion *qc)
{
bool active_snapshot_set = false;
- ListCell *stmtlist_item;
+ ListCell *stmtlist_item,
+ *execlockrelsinfolist_item;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1211,9 +1217,12 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
- foreach(stmtlist_item, portal->stmts)
+ forboth(stmtlist_item, portal->stmts,
+ execlockrelsinfolist_item, portal->execlockrelsinfos)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ ExecLockRelsInfo *execlockrelsinfo = lfirst_node(ExecLockRelsInfo,
+ execlockrelsinfolist_item);
/*
* If we got a cancel signal in prior command, quit
@@ -1271,7 +1280,7 @@ PortalRunMulti(Portal portal,
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, execlockrelsinfo,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1280,7 +1289,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, execlockrelsinfo,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 4cf6db504f..9f5a40a0a6 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -99,14 +99,16 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
+static void CachedPlanSaveExecLockRelsInfos(CachedPlan *plan, List *execlockrelsinfo_list);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static List *AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams);
+static void ReleaseExecutorLocks(List *stmt_list, List *execlockrelsinfo_list);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -790,9 +792,21 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* On a "true" return, we have acquired the locks needed to run the plan.
* (We must do this for the "true" result to be race-condition-free.)
+ *
+ * If the CachedPlan is valid, this may in some cases call
+ * ExecutorGetLockRels() on each PlannedStmt contained in it, instead of
+ * just scanning the range table, to determine the set of relations to be
+ * locked by AcquireExecutorLocks(); that is done so that relations scanned
+ * only by plan nodes that initial partition pruning shows need not be
+ * executed can be skipped. The resulting ExecLockRelsInfo nodes containing
+ * the pruning results, allocated in a child context of the context
+ * containing the plan itself, are added to plan->execlockrelsinfo_list.
+ * The previous contents of that list, from the last invocation on the same
+ * CachedPlan, are deleted, because they would no longer be valid given the
+ * fresh set of parameter values that may be used for pruning.
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams)
{
CachedPlan *plan = plansource->gplan;
@@ -820,13 +834,25 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ List *execlockrelsinfo_list;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ /*
+ * Lock the relations scanned by the plan. If ExecutorGetLockRels()
+ * omitted some relations because the plan nodes scanning them were
+ * found to be pruned, the executor is informed of the omitted plan
+ * nodes too, so that it doesn't accidentally try to execute them;
+ * that is done via the ExecLockRelsInfo nodes collected in the
+ * returned list, which is passed to the executor along with the
+ * list of PlannedStmts.
+ */
+ execlockrelsinfo_list = AcquireExecutorLocks(plan->stmt_list,
+ boundParams);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -844,11 +870,14 @@ CheckCachedPlan(CachedPlanSource *plansource)
if (plan->is_valid)
{
/* Successfully revalidated and locked the query. */
+
+ /* Remember ExecLockRelsInfos in the CachedPlan. */
+ CachedPlanSaveExecLockRelsInfos(plan, execlockrelsinfo_list);
return true;
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ ReleaseExecutorLocks(plan->stmt_list, execlockrelsinfo_list);
}
/*
@@ -880,7 +909,8 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv)
{
CachedPlan *plan;
- List *plist;
+ List *plist,
+ *execlockrelsinfo_list;
bool snapshot_set;
bool is_transient;
MemoryContext plan_context;
@@ -933,7 +963,8 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
* Generate the plan.
*/
plist = pg_plan_queries(qlist, plansource->query_string,
- plansource->cursor_options, boundParams);
+ plansource->cursor_options, boundParams,
+ &execlockrelsinfo_list);
/* Release snapshot if we got one */
if (snapshot_set)
@@ -1002,6 +1033,16 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
plan->is_saved = false;
plan->is_valid = true;
+ /*
+ * Save the dummy ExecLockRelsInfo list, that is, a list containing only
+ * NULL elements. We must do this, because users of the CachedPlan expect
+ * one to go with the list of PlannedStmts.
+ * XXX maybe get rid of that contract.
+ */
+ plan->execlockrelsinfo_context = NULL;
+ CachedPlanSaveExecLockRelsInfos(plan, execlockrelsinfo_list);
+ Assert(MemoryContextIsValid(plan->execlockrelsinfo_context));
+
/* assign generation number to new plan */
plan->generation = ++(plansource->generation);
@@ -1160,7 +1201,7 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (CheckCachedPlan(plansource, boundParams))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1586,6 +1627,49 @@ CopyCachedPlan(CachedPlanSource *plansource)
return newsource;
}
+/*
+ * CachedPlanSaveExecLockRelsInfos
+ * Save the list containing ExecLockRelsInfo nodes into the given
+ * CachedPlan
+ *
+ * The provided list is copied into a dedicated context that is a child of
+ * plan->context. If the child context already exists, it is emptied, because
+ * any ExecLockRelsInfo contained therein would no longer be useful.
+ */
+static void
+CachedPlanSaveExecLockRelsInfos(CachedPlan *plan, List *execlockrelsinfo_list)
+{
+ MemoryContext execlockrelsinfo_context = plan->execlockrelsinfo_context,
+ oldcontext = CurrentMemoryContext;
+ List *execlockrelsinfo_list_copy;
+
+ /*
+ * Set up the dedicated context if not already done, saving it as a child
+ * of the CachedPlan's context.
+ */
+ if (execlockrelsinfo_context == NULL)
+ {
+ execlockrelsinfo_context = AllocSetContextCreate(CurrentMemoryContext,
+ "CachedPlan execlockrelsinfo list",
+ ALLOCSET_START_SMALL_SIZES);
+ MemoryContextSetParent(execlockrelsinfo_context, plan->context);
+ MemoryContextSetIdentifier(execlockrelsinfo_context, plan->context->ident);
+ plan->execlockrelsinfo_context = execlockrelsinfo_context;
+ }
+ else
+ {
+ /* Just clear existing contents by resetting the context. */
+ Assert(MemoryContextIsValid(execlockrelsinfo_context));
+ MemoryContextReset(execlockrelsinfo_context);
+ }
+
+ MemoryContextSwitchTo(execlockrelsinfo_context);
+ execlockrelsinfo_list_copy = copyObject(execlockrelsinfo_list);
+ MemoryContextSwitchTo(oldcontext);
+
+ plan->execlockrelsinfo_list = execlockrelsinfo_list_copy;
+}
+
/*
* CachedPlanIsValid: test whether the rewritten querytree within a
* CachedPlanSource is currently valid (that is, not marked as being in need
@@ -1737,17 +1821,21 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ *
+ * Returns a list of ExecLockRelsInfo nodes with one element for each
+ * PlannedStmt in stmt_list; an element is NULL if the corresponding
+ * statement is a utility statement or its containsInitialPruning is false.
*/
-static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+static List *
+AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams)
{
ListCell *lc1;
+ List *execlockrelsinfo_list = NIL;
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ ExecLockRelsInfo *execlockrelsinfo = NULL;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1761,27 +1849,139 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
if (query)
- ScanQueryForLocks(query, acquire);
- continue;
+ ScanQueryForLocks(query, true);
}
-
- foreach(lc2, plannedstmt->rtable)
+ else
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ /*
+ * Figure out the set of relations that would need to be locked
+ * before executing the plan.
+ */
+ if (!plannedstmt->containsInitialPruning)
+ {
+ /*
+ * If the plan contains no initial pruning steps, just lock
+ * all the relations found in the range table.
+ */
+ ListCell *lc;
- if (rte->rtekind != RTE_RELATION)
- continue;
+ foreach(lc, plannedstmt->rtable)
+ {
+ RangeTblEntry *rte = lfirst(lc);
+
+ if (rte->rtekind != RTE_RELATION)
+ continue;
+
+ /*
+ * Acquire the appropriate type of lock on each relation
+ * OID. Note that we don't actually try to open the rel,
+ * and hence will not fail if it's been dropped entirely
+ * --- we'll just transiently acquire a non-conflicting
+ * lock.
+ */
+ LockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
+ else
+ {
+ int rti;
+ Bitmapset *lockrels;
+
+ /*
+ * Walk the plan tree to find only the minimal set of
+ * relations to be locked, considering the effect of performing
+ * initial partition pruning.
+ */
+ execlockrelsinfo = ExecutorGetLockRels(plannedstmt, boundParams);
+ lockrels = execlockrelsinfo->lockrels;
+
+ rti = -1;
+ while ((rti = bms_next_member(lockrels, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+ Assert(rte->rtekind == RTE_RELATION);
+
+ /* See the comment above. */
+ LockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
+ }
+
+ /*
+ * Remember ExecLockRelsInfo for later adding to the QueryDesc that
+ * will be passed to the executor when executing this plan. May be
+ * NULL, but must keep the list the same length as stmt_list.
+ */
+ execlockrelsinfo_list = lappend(execlockrelsinfo_list,
+ execlockrelsinfo);
+ }
+
+ return execlockrelsinfo_list;
+}
+
+/*
+ * ReleaseExecutorLocks
+ * Release locks that would've been acquired by an earlier call to
+ * AcquireExecutorLocks()
+ */
+static void
+ReleaseExecutorLocks(List *stmt_list, List *execlockrelsinfo_list)
+{
+ ListCell *lc1,
+ *lc2;
+
+ forboth(lc1, stmt_list, lc2, execlockrelsinfo_list)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ ExecLockRelsInfo *execlockrelsinfo = lfirst_node(ExecLockRelsInfo, lc2);
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
/*
- * Acquire the appropriate type of lock on each relation OID. Note
- * that we don't actually try to open the rel, and hence will not
- * fail if it's been dropped entirely --- we'll just transiently
- * acquire a non-conflicting lock.
+ * Ignore utility statements, except those (such as EXPLAIN) that
+ * contain a parsed-but-not-planned query. Note: it's okay to use
+ * ScanQueryForLocks, even though the query hasn't been through
+ * rule rewriting, because rewriting doesn't change the query
+ * representation.
*/
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ if (query)
+ ScanQueryForLocks(query, false);
+ }
+ else
+ {
+ if (execlockrelsinfo == NULL)
+ {
+ ListCell *lc;
+
+ foreach(lc, plannedstmt->rtable)
+ {
+ RangeTblEntry *rte = lfirst(lc);
+
+ if (rte->rtekind != RTE_RELATION)
+ continue;
+
+ LockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
else
- UnlockRelationOid(rte->relid, rte->rellockmode);
+ {
+ int rti;
+ Bitmapset *lockrels;
+
+ lockrels = execlockrelsinfo->lockrels;
+ rti = -1;
+ while ((rti = bms_next_member(lockrels, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ Assert(rte->rtekind == RTE_RELATION);
+
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
}
}
}
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index d549f66d4a..896f51be08 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -285,6 +285,7 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *execlockrelsinfos,
CachedPlan *cplan)
{
AssertArg(PortalIsValid(portal));
@@ -299,6 +300,7 @@ PortalDefineQuery(Portal portal,
portal->qc.nprocessed = 0;
portal->commandTag = commandTag;
portal->stmts = stmts;
+ portal->execlockrelsinfos = execlockrelsinfos;
portal->cplan = cplan;
portal->status = PORTAL_DEFINED;
}
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 666977fb1f..fef75ba147 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,8 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt, ExecLockRelsInfo *execlockrelsinfo,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index fd5735a946..ded19b8cbb 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -124,4 +124,6 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
PartitionPruneInfo *pruneinfo,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
+extern Bitmapset *ExecGetLockRelsDoInitialPruning(Plan *plan, ExecGetLockRelsContext *context,
+ PartitionPruneInfo *pruneinfo);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index e79e2c001f..4338463479 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,7 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ ExecLockRelsInfo *execlockrelsinfo; /* ExecutorGetLockRels()'s output given plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +58,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ ExecLockRelsInfo *execlockrelsinfo,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 82925b4b63..5cf414cc11 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -185,6 +185,8 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
+extern ExecLockRelsInfo *ExecutorGetLockRels(PlannedStmt *plannedstmt, ParamListInfo params);
+extern bool ExecGetLockRels(Plan *node, ExecGetLockRelsContext *context);
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/executor/nodeAppend.h b/src/include/executor/nodeAppend.h
index 4cb78ee5b6..b53535c2a4 100644
--- a/src/include/executor/nodeAppend.h
+++ b/src/include/executor/nodeAppend.h
@@ -17,6 +17,7 @@
#include "access/parallel.h"
#include "nodes/execnodes.h"
+extern bool ExecGetAppendLockRels(Append *node, ExecGetLockRelsContext *context);
extern AppendState *ExecInitAppend(Append *node, EState *estate, int eflags);
extern void ExecEndAppend(AppendState *node);
extern void ExecReScanAppend(AppendState *node);
diff --git a/src/include/executor/nodeMergeAppend.h b/src/include/executor/nodeMergeAppend.h
index 97fe3b0665..8eb4e9df93 100644
--- a/src/include/executor/nodeMergeAppend.h
+++ b/src/include/executor/nodeMergeAppend.h
@@ -16,6 +16,7 @@
#include "nodes/execnodes.h"
+extern bool ExecGetMergeAppendLockRels(MergeAppend *node, ExecGetLockRelsContext *context);
extern MergeAppendState *ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags);
extern void ExecEndMergeAppend(MergeAppendState *node);
extern void ExecReScanMergeAppend(MergeAppendState *node);
diff --git a/src/include/executor/nodeModifyTable.h b/src/include/executor/nodeModifyTable.h
index 1d225bc88d..5006499088 100644
--- a/src/include/executor/nodeModifyTable.h
+++ b/src/include/executor/nodeModifyTable.h
@@ -19,6 +19,7 @@ extern void ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
EState *estate, TupleTableSlot *slot,
CmdType cmdtype);
+extern bool ExecGetModifyTableLockRels(ModifyTable *plan, ExecGetLockRelsContext *context);
extern ModifyTableState *ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags);
extern void ExecEndModifyTable(ModifyTableState *node);
extern void ExecReScanModifyTable(ModifyTableState *node);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 44dd73fc80..1253fdb0ed 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -576,6 +576,7 @@ typedef struct EState
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ struct ExecLockRelsInfo *es_execlockrelsinfo; /* QueryDesc.execlockrelsinfo */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
@@ -964,6 +965,101 @@ typedef struct DomainConstraintState
*/
typedef TupleTableSlot *(*ExecProcNodeMtd) (struct PlanState *pstate);
+/*----------------
+ * ExecLockRelsInfo
+ *
+ * Result of performing ExecutorGetLockRels() for a given PlannedStmt
+ */
+typedef struct ExecLockRelsInfo
+{
+ NodeTag type;
+
+ /*
+ * Relations that must be locked to execute the plan tree contained in
+ * the PlannedStmt.
+ */
+ Bitmapset *lockrels;
+
+ /* PlannedStmt.numPlanNodes */
+ int numPlanNodes;
+
+ /*
+ * List of PlanInitPruningOutput, each representing the output of
+ * performing initial pruning on a given plan node, for all nodes in the
+ * plan tree that have been marked as needing initial pruning.
+ *
+ * 'ipoIndexes' is an array of 'numPlanNodes' elements, indexed with
+ * plan_node_id of the individual nodes in the plan tree, each a 1-based
+ * index into 'initPruningOutputs' list for a given plan node. 0 means
+ * that a given plan node has no entry in the list because of not needing
+ * any initial pruning done on it.
+ */
+ List *initPruningOutputs;
+ int *ipoIndexes;
+} ExecLockRelsInfo;
+
+/*----------------
+ * ExecGetLockRelsContext
+ *
+ * Information pertaining to an ExecutorGetLockRels() invocation for a given
+ * plan.
+ */
+typedef struct ExecGetLockRelsContext
+{
+ NodeTag type;
+
+ PlannedStmt *stmt; /* target plan */
+ ParamListInfo params; /* EXTERN parameters available for pruning */
+
+ /* Output parameters for ExecGetLockRels and its subroutines. */
+ Bitmapset *lockrels;
+
+ /* See the comment in the definition of ExecLockRelsInfo struct. */
+ List *initPruningOutputs;
+ int *ipoIndexes;
+} ExecGetLockRelsContext;
+
+/*
+ * Appends the provided PlanInitPruningOutput to
+ * ExecGetLockRelsContext.initPruningOutputs
+ */
+#define ExecStorePlanInitPruningOutput(cxt, initPruningOutput, plannode) \
+ do { \
+ (cxt)->initPruningOutputs = lappend((cxt)->initPruningOutputs, initPruningOutput); \
+ (cxt)->ipoIndexes[(plannode)->plan_node_id] = list_length((cxt)->initPruningOutputs); \
+ } while (0)
+
+/*
+ * Finds the PlanInitPruningOutput for a given Plan node in
+ * ExecLockRelsInfo.initPruningOutputs.
+ */
+#define ExecFetchPlanInitPruningOutput(execlockrelsinfo, plannode) \
+ (((execlockrelsinfo) != NULL && (execlockrelsinfo)->initPruningOutputs != NIL) ? \
+ list_nth((execlockrelsinfo)->initPruningOutputs, \
+ (execlockrelsinfo)->ipoIndexes[(plannode)->plan_node_id] - 1) : NULL)
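+
+/*
+ * Illustrative usage of the two macros above (assumed flow, comment only):
+ * during ExecutorGetLockRels(), a node that has performed initial pruning
+ * stores its result with
+ *
+ *     ExecStorePlanInitPruningOutput(context, ipo, (Plan *) node);
+ *
+ * and ExecInitPartitionPruning() can later fetch it with
+ *
+ *     ipo = ExecFetchPlanInitPruningOutput(estate->es_execlockrelsinfo,
+ *                                          planstate->plan);
+ */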
+
+/* ---------------
+ * PlanInitPruningOutput
+ *
+ * Node to remember the result of performing initial partition pruning steps
+ * during ExecutorGetLockRels() on nodes that support pruning.
+ *
+ * ExecGetLockRelsDoInitialPruning(), which runs during ExecutorGetLockRels(),
+ * creates it and stores it in the corresponding ExecLockRelsInfo.
+ *
+ * ExecInitPartitionPruning(), which runs during ExecutorStart(), fetches it
+ * from the EState's ExecLockRelsInfo (if any) and uses the value of
+ * initially_valid_subplans contained in it as-is to select the subplans to be
+ * initialized for execution, instead of re-evaluating that by performing
+ * initial pruning again.
+ */
+typedef struct PlanInitPruningOutput
+{
+ NodeTag type;
+
+ Bitmapset *initially_valid_subplans;
+} PlanInitPruningOutput;
+
/* ----------------
* PlanState node
*
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 5d075f0c34..d365fc4402 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -96,6 +96,11 @@ typedef enum NodeTag
T_PartitionPruneStepCombine,
T_PlanInvalItem,
+ /* TAGS FOR EXECUTOR PREP NODES (execnodes.h) */
+ T_ExecGetLockRelsContext,
+ T_ExecLockRelsInfo,
+ T_PlanInitPruningOutput,
+
/*
* TAGS FOR PLAN STATE NODES (execnodes.h)
*
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 5327d9ba8b..019719c1a4 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -129,6 +129,10 @@ typedef struct PlannerGlobal
char maxParallelHazard; /* worst PROPARALLEL hazard level */
+ bool containsInitialPruning; /* Do some Plan nodes in the tree
+ * have initial (pre-exec) pruning
+ * steps? */
+
PartitionDirectory partition_directory; /* partition descriptors */
Bitmapset *elidedAppendPartedRels; /* Combined partitioned_rels of all
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index bd87c35d6c..bfdb5bbf28 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -59,10 +59,16 @@ typedef struct PlannedStmt
bool parallelModeNeeded; /* parallel mode required to execute? */
+ bool containsInitialPruning; /* Do some Plan nodes in the tree
+ * have initial (pre-exec) pruning
+ * steps? */
+
int jitFlags; /* which forms of JIT should be performed */
struct Plan *planTree; /* tree of Plan nodes */
+ int numPlanNodes; /* number of nodes in planTree */
+
List *rtable; /* list of RangeTblEntry nodes */
/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
@@ -1189,6 +1195,13 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * needs_init_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its initial_pruning_steps set?
+ *
+ * needs_exec_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its exec_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1197,6 +1210,8 @@ typedef struct PartitionPruneInfo
{
NodeTag type;
List *prune_infos;
+ bool needs_init_pruning;
+ bool needs_exec_pruning;
Bitmapset *other_subplans;
} PartitionPruneInfo;
diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h
index 92291a750d..bf80c53bed 100644
--- a/src/include/tcop/tcopprot.h
+++ b/src/include/tcop/tcopprot.h
@@ -64,7 +64,7 @@ extern PlannedStmt *pg_plan_query(Query *querytree, const char *query_string,
ParamListInfo boundParams);
extern List *pg_plan_queries(List *querytrees, const char *query_string,
int cursorOptions,
- ParamListInfo boundParams);
+ ParamListInfo boundParams, List **execlockrelsinfo_list);
extern bool check_max_stack_depth(int *newval, void **extra, GucSource source);
extern void assign_max_stack_depth(int newval, void *extra);
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 95b99e3d25..56b0dcc6bd 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -148,6 +148,9 @@ typedef struct CachedPlan
{
int magic; /* should equal CACHEDPLAN_MAGIC */
List *stmt_list; /* list of PlannedStmts */
+ List *execlockrelsinfo_list; /* list of ExecLockRelsInfo nodes with one
+ * element for each of stmt_list; NIL
+ * if not a generic plan */
bool is_oneshot; /* is it a "oneshot" plan? */
bool is_saved; /* is CachedPlan in a long-lived context? */
bool is_valid; /* is the stmt_list currently valid? */
@@ -158,6 +161,9 @@ typedef struct CachedPlan
int generation; /* parent's generation number for this plan */
int refcount; /* count of live references to this struct */
MemoryContext context; /* context containing this CachedPlan */
+ MemoryContext execlockrelsinfo_context; /* context containing
+ * execlockrelsinfo_list,
+ * a child of the above context */
} CachedPlan;
/*
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aeddbdafe5..9abace6734 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -137,6 +137,10 @@ typedef struct PortalData
CommandTag commandTag; /* command tag for original query */
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
+ List *execlockrelsinfos; /* list of ExecLockRelsInfo nodes with one element
+ * for each of 'stmts'; same as
+ * cplan->execlockrelsinfo_list if cplan is
+ * not NULL */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
ParamListInfo portalParams; /* params to pass to query */
@@ -241,6 +245,7 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *execlockrelsinfos,
CachedPlan *cplan);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
--
2.24.1
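To make the new locking contract concrete, here is a caller-side sketch of
how the two functions above are meant to be paired (illustrative only, not
part of the patch; plan_is_still_valid is a hypothetical stand-in for the
plancache's actual validity check):

    /*
     * AcquireExecutorLocks() returns a list parallel to stmt_list, with
     * one ExecLockRelsInfo per PlannedStmt (possibly NULL).  Keeping the
     * lists parallel lets ReleaseExecutorLocks() unlock exactly the
     * relations that were locked and lets each ExecLockRelsInfo later be
     * attached to the QueryDesc built for its PlannedStmt.
     */
    List *execlockrelsinfo_list =
        AcquireExecutorLocks(plan->stmt_list, boundParams);

    if (!plan_is_still_valid(plansource))   /* hypothetical check */
    {
        /* Undo only what AcquireExecutorLocks() actually locked. */
        ReleaseExecutorLocks(plan->stmt_list, execlockrelsinfo_list);
        /* ... fall back to building a new plan ... */
    }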
v6-0001-Some-refactoring-of-runtime-pruning-code.patch (application/x-patch)
From df8186c0e4a76f31c1f803a953f2c98ac88f9dc8 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Wed, 2 Mar 2022 15:17:55 +0900
Subject: [PATCH v6 1/4] Some refactoring of runtime pruning code
This does two things mainly:
* Move the execution pruning initialization steps that are common
between both ExecInitAppend() and ExecInitMergeAppend() into a new
function ExecInitPartitionPruning() defined in execPartition.c.
Thus, ExecCreatePartitionPruneState() and
ExecFindInitialMatchingSubPlans() need not be exported.
* Add an ExprContext field to PartitionPruneContext to remove the
implicit assumption in the runtime pruning code that the ExprContext
needed to compute pruning expressions can always be obtained from the
parent node's PlanState. A future patch will allow runtime
pruning (at least the initial pruning steps) to be performed without
the corresponding PlanState yet having been created, so this will
help.
---
src/backend/executor/execPartition.c | 340 ++++++++++++++++---------
src/backend/executor/nodeAppend.c | 33 +--
src/backend/executor/nodeMergeAppend.c | 32 +--
src/backend/partitioning/partprune.c | 20 +-
src/include/executor/execPartition.h | 9 +-
src/include/partitioning/partprune.h | 2 +
6 files changed, 252 insertions(+), 184 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 90ed1485d1..7ff5a95f05 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -182,11 +182,18 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
bool *isnull,
int maxfieldlen);
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
+static PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
+ PartitionPruneInfo *partitionpruneinfo);
+static Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate);
static void ExecInitPruningContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
PartitionKey partkey,
- PlanState *planstate);
+ PlanState *planstate,
+ ExprContext *econtext);
+static void PartitionPruneStateFixSubPlanMap(PartitionPruneState *prunestate,
+ Bitmapset *initially_valid_subplans,
+ int n_total_subplans);
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
@@ -1485,30 +1492,86 @@ adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri)
*
* Functions:
*
- * ExecCreatePartitionPruneState:
+ * ExecInitPartitionPruning:
* Creates the PartitionPruneState required by each of the two pruning
* functions. Details stored include how to map the partition index
- * returned by the partition pruning code into subplan indexes.
- *
- * ExecFindInitialMatchingSubPlans:
- * Returns indexes of matching subplans. Partition pruning is attempted
- * without any evaluation of expressions containing PARAM_EXEC Params.
- * This function must be called during executor startup for the parent
- * plan before the subplans themselves are initialized. Subplans which
- * are found not to match by this function must be removed from the
- * plan's list of subplans during execution, as this function performs a
- * remap of the partition index to subplan index map and the newly
- * created map provides indexes only for subplans which remain after
- * calling this function.
+ * returned by the partition pruning code into subplan indexes. Also
+ * determines the set of initially valid subplans by performing any initial
+ * pruning steps; only those subplans need be initialized by callers such
+ * as ExecInitAppend. Maps in PartitionPruneState are updated to account
+ * for initial pruning having eliminated some of the subplans, if any.
*
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating all available
- * expressions. This function can only be called during execution and
- * must be called again each time the value of a Param listed in
- * PartitionPruneState's 'execparamids' changes.
+ * expressions, that is, using execution pruning steps. This function
+ * can only be called during execution and must be called again each time
+ * the value of a Param listed in PartitionPruneState's 'execparamids'
+ * changes.
*-------------------------------------------------------------------------
*/
+/*
+ * ExecInitPartitionPruning
+ * Initialize data structure needed for run-time partition pruning
+ *
+ * Initial pruning can be done immediately, so it is done here if needed and
+ * the set of surviving partition subplans' indexes is added to the output
+ * parameter *initially_valid_subplans.
+ *
+ * If subplans are indeed pruned, subplan_map arrays contained in the returned
+ * PartitionPruneState are re-sequenced to not count those, though only if the
+ * maps will be needed for subsequent execution pruning passes.
+ */
+PartitionPruneState *
+ExecInitPartitionPruning(PlanState *planstate,
+ int n_total_subplans,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **initially_valid_subplans)
+{
+ PartitionPruneState *prunestate;
+ EState *estate = planstate->state;
+
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
+
+ /*
+ * Create the working data structure for pruning.
+ */
+ prunestate = ExecCreatePartitionPruneState(planstate, pruneinfo);
+
+ /*
+ * Perform an initial partition prune, if required.
+ */
+ if (prunestate->do_initial_prune)
+ {
+ /* Determine which subplans survive initial pruning */
+ *initially_valid_subplans = ExecFindInitialMatchingSubPlans(prunestate);
+ }
+ else
+ {
+ /* We'll need to initialize all subplans */
+ Assert(n_total_subplans > 0);
+ *initially_valid_subplans = bms_add_range(NULL, 0,
+ n_total_subplans - 1);
+ }
+
+ /*
+ * Re-sequence subplan indexes contained in prunestate to account for any
+ * that were removed above due to initial pruning.
+ *
+ * We can safely skip this when !do_exec_prune, even though that leaves
+ * invalid data in prunestate, because that data won't be consulted again
+ * (cf initial Assert in ExecFindMatchingSubPlans).
+ */
+ if (prunestate->do_exec_prune &&
+ bms_num_members(*initially_valid_subplans) < n_total_subplans)
+ PartitionPruneStateFixSubPlanMap(prunestate,
+ *initially_valid_subplans,
+ n_total_subplans);
+
+ return prunestate;
+}
+
/*
* ExecCreatePartitionPruneState
* Build the data structure required for calling
@@ -1527,7 +1590,7 @@ adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri)
* re-used each time we re-evaluate which partitions match the pruning steps
* provided in each PartitionedRelPruneInfo.
*/
-PartitionPruneState *
+static PartitionPruneState *
ExecCreatePartitionPruneState(PlanState *planstate,
PartitionPruneInfo *partitionpruneinfo)
{
@@ -1536,6 +1599,7 @@ ExecCreatePartitionPruneState(PlanState *planstate,
int n_part_hierarchies;
ListCell *lc;
int i;
+ ExprContext *econtext = planstate->ps_ExprContext;
/* For data reading, executor always omits detached partitions */
if (estate->es_partition_directory == NULL)
@@ -1709,7 +1773,8 @@ ExecCreatePartitionPruneState(PlanState *planstate,
{
ExecInitPruningContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
- partdesc, partkey, planstate);
+ partdesc, partkey, planstate,
+ econtext);
/* Record whether initial pruning is needed at any level */
prunestate->do_initial_prune = true;
}
@@ -1718,7 +1783,8 @@ ExecCreatePartitionPruneState(PlanState *planstate,
{
ExecInitPruningContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
- partdesc, partkey, planstate);
+ partdesc, partkey, planstate,
+ econtext);
/* Record whether exec pruning is needed at any level */
prunestate->do_exec_prune = true;
}
@@ -1746,7 +1812,8 @@ ExecInitPruningContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
PartitionKey partkey,
- PlanState *planstate)
+ PlanState *planstate,
+ ExprContext *econtext)
{
int n_steps;
int partnatts;
@@ -1767,6 +1834,7 @@ ExecInitPruningContext(PartitionPruneContext *context,
context->ppccontext = CurrentMemoryContext;
context->planstate = planstate;
+ context->exprcontext = econtext;
/* Initialize expression state for each expression we need */
context->exprstates = (ExprState **)
@@ -1795,8 +1863,20 @@ ExecInitPruningContext(PartitionPruneContext *context,
step->step.step_id,
keyno);
- context->exprstates[stateidx] =
- ExecInitExpr(expr, context->planstate);
+ /*
+ * When planstate is NULL, pruning_steps is known not to
+ * contain any expressions that depend on the parent plan.
+ * In that case, any available EXTERN parameters must be
+ * passed explicitly, which the caller does by making them
+ * available via econtext.
+ */
+ if (planstate == NULL)
+ context->exprstates[stateidx] =
+ ExecInitExprWithParams(expr,
+ econtext->ecxt_param_list_info);
+ else
+ context->exprstates[stateidx] =
+ ExecInitExpr(expr, context->planstate);
}
keyno++;
}
@@ -1809,18 +1889,11 @@ ExecInitPruningContext(PartitionPruneContext *context,
* pruning, disregarding any pruning constraints involving PARAM_EXEC
* Params.
*
- * If additional pruning passes will be required (because of PARAM_EXEC
- * Params), we must also update the translation data that allows conversion
- * of partition indexes into subplan indexes to account for the unneeded
- * subplans having been removed.
- *
* Must only be called once per 'prunestate', and only if initial pruning
* is required.
- *
- * 'nsubplans' must be passed as the total number of unpruned subplans.
*/
-Bitmapset *
-ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
+static Bitmapset *
+ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -1845,14 +1918,20 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
PartitionedRelPruningData *pprune;
prunedata = prunestate->partprunedata[i];
+
+ /*
+ * We pass the 1st item belonging to the root table of the hierarchy
+ * and find_matching_subplans_recurse() takes care of recursing to
+ * other (lower-level) parents as needed.
+ */
pprune = &prunedata->partrelprunedata[0];
/* Perform pruning without using PARAM_EXEC Params */
find_matching_subplans_recurse(prunedata, pprune, true, &result);
- /* Expression eval may have used space in node's ps_ExprContext too */
+ /* Expression eval may have used space in ExprContext too */
if (pprune->initial_pruning_steps)
- ResetExprContext(pprune->initial_context.planstate->ps_ExprContext);
+ ResetExprContext(pprune->initial_context.exprcontext);
}
/* Add in any subplans that partition pruning didn't account for */
@@ -1865,118 +1944,120 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
MemoryContextReset(prunestate->prune_context);
+ return result;
+}
+
+/*
+ * PartitionPruneStateFixSubPlanMap
+ * Fix mapping of partition indexes to subplan indexes contained in
+ * prunestate by considering the new list of subplans that survived
+ * initial pruning
+ *
+ * Subplans that were previously indexed 0..(n_total_subplans - 1) are
+ * re-sequenced into the range 0..(bms_num_members(initially_valid_subplans) - 1).
+ */
+static void
+PartitionPruneStateFixSubPlanMap(PartitionPruneState *prunestate,
+ Bitmapset *initially_valid_subplans,
+ int n_total_subplans)
+{
+ int *new_subplan_indexes;
+ Bitmapset *new_other_subplans;
+ int i;
+ int newidx;
+
/*
- * If exec-time pruning is required and we pruned subplans above, then we
- * must re-sequence the subplan indexes so that ExecFindMatchingSubPlans
- * properly returns the indexes from the subplans which will remain after
- * execution of this function.
- *
- * We can safely skip this when !do_exec_prune, even though that leaves
- * invalid data in prunestate, because that data won't be consulted again
- * (cf initial Assert in ExecFindMatchingSubPlans).
+ * First we must build a temporary array which maps old subplan
+ * indexes to new ones. For convenience of initialization, we use
+ * 1-based indexes in this array and leave pruned items as 0.
*/
- if (prunestate->do_exec_prune && bms_num_members(result) < nsubplans)
+ new_subplan_indexes = (int *) palloc0(sizeof(int) * n_total_subplans);
+ newidx = 1;
+ i = -1;
+ while ((i = bms_next_member(initially_valid_subplans, i)) >= 0)
{
- int *new_subplan_indexes;
- Bitmapset *new_other_subplans;
- int i;
- int newidx;
+ Assert(i < n_total_subplans);
+ new_subplan_indexes[i] = newidx++;
+ }
- /*
- * First we must build a temporary array which maps old subplan
- * indexes to new ones. For convenience of initialization, we use
- * 1-based indexes in this array and leave pruned items as 0.
- */
- new_subplan_indexes = (int *) palloc0(sizeof(int) * nsubplans);
- newidx = 1;
- i = -1;
- while ((i = bms_next_member(result, i)) >= 0)
- {
- Assert(i < nsubplans);
- new_subplan_indexes[i] = newidx++;
- }
+ /*
+ * Now we can update each PartitionedRelPruneInfo's subplan_map with
+ * new subplan indexes. We must also recompute its present_parts
+ * bitmap.
+ */
+ for (i = 0; i < prunestate->num_partprunedata; i++)
+ {
+ PartitionPruningData *prunedata = prunestate->partprunedata[i];
+ int j;
/*
- * Now we can update each PartitionedRelPruneInfo's subplan_map with
- * new subplan indexes. We must also recompute its present_parts
- * bitmap.
+ * Within each hierarchy, we perform this loop in back-to-front
+ * order so that we determine present_parts for the lowest-level
+ * partitioned tables first. This way we can tell whether a
+ * sub-partitioned table's partitions were entirely pruned so we
+ * can exclude it from the current level's present_parts.
*/
- for (i = 0; i < prunestate->num_partprunedata; i++)
+ for (j = prunedata->num_partrelprunedata - 1; j >= 0; j--)
{
- PartitionPruningData *prunedata = prunestate->partprunedata[i];
- int j;
+ PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
+ int nparts = pprune->nparts;
+ int k;
- /*
- * Within each hierarchy, we perform this loop in back-to-front
- * order so that we determine present_parts for the lowest-level
- * partitioned tables first. This way we can tell whether a
- * sub-partitioned table's partitions were entirely pruned so we
- * can exclude it from the current level's present_parts.
- */
- for (j = prunedata->num_partrelprunedata - 1; j >= 0; j--)
- {
- PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
- int nparts = pprune->nparts;
- int k;
+ /* We just rebuild present_parts from scratch */
+ bms_free(pprune->present_parts);
+ pprune->present_parts = NULL;
- /* We just rebuild present_parts from scratch */
- bms_free(pprune->present_parts);
- pprune->present_parts = NULL;
+ for (k = 0; k < nparts; k++)
+ {
+ int oldidx = pprune->subplan_map[k];
+ int subidx;
- for (k = 0; k < nparts; k++)
+ /*
+ * If this partition existed as a subplan then change the
+ * old subplan index to the new subplan index. The new
+ * index may become -1 if the partition was pruned above,
+ * or it may just come earlier in the subplan list due to
+ * some subplans being removed earlier in the list. If
+ * it's a subpartition, add it to present_parts unless
+ * it's entirely pruned.
+ */
+ if (oldidx >= 0)
{
- int oldidx = pprune->subplan_map[k];
- int subidx;
-
- /*
- * If this partition existed as a subplan then change the
- * old subplan index to the new subplan index. The new
- * index may become -1 if the partition was pruned above,
- * or it may just come earlier in the subplan list due to
- * some subplans being removed earlier in the list. If
- * it's a subpartition, add it to present_parts unless
- * it's entirely pruned.
- */
- if (oldidx >= 0)
- {
- Assert(oldidx < nsubplans);
- pprune->subplan_map[k] = new_subplan_indexes[oldidx] - 1;
+ Assert(oldidx < n_total_subplans);
+ pprune->subplan_map[k] = new_subplan_indexes[oldidx] - 1;
- if (new_subplan_indexes[oldidx] > 0)
- pprune->present_parts =
- bms_add_member(pprune->present_parts, k);
- }
- else if ((subidx = pprune->subpart_map[k]) >= 0)
- {
- PartitionedRelPruningData *subprune;
+ if (new_subplan_indexes[oldidx] > 0)
+ pprune->present_parts =
+ bms_add_member(pprune->present_parts, k);
+ }
+ else if ((subidx = pprune->subpart_map[k]) >= 0)
+ {
+ PartitionedRelPruningData *subprune;
- subprune = &prunedata->partrelprunedata[subidx];
+ subprune = &prunedata->partrelprunedata[subidx];
- if (!bms_is_empty(subprune->present_parts))
- pprune->present_parts =
- bms_add_member(pprune->present_parts, k);
- }
+ if (!bms_is_empty(subprune->present_parts))
+ pprune->present_parts =
+ bms_add_member(pprune->present_parts, k);
}
}
}
+ }
- /*
- * We must also recompute the other_subplans set, since indexes in it
- * may change.
- */
- new_other_subplans = NULL;
- i = -1;
- while ((i = bms_next_member(prunestate->other_subplans, i)) >= 0)
- new_other_subplans = bms_add_member(new_other_subplans,
- new_subplan_indexes[i] - 1);
-
- bms_free(prunestate->other_subplans);
- prunestate->other_subplans = new_other_subplans;
+ /*
+ * We must also recompute the other_subplans set, since indexes in it
+ * may change.
+ */
+ new_other_subplans = NULL;
+ i = -1;
+ while ((i = bms_next_member(prunestate->other_subplans, i)) >= 0)
+ new_other_subplans = bms_add_member(new_other_subplans,
+ new_subplan_indexes[i] - 1);
- pfree(new_subplan_indexes);
- }
+ bms_free(prunestate->other_subplans);
+ prunestate->other_subplans = new_other_subplans;
- return result;
+ pfree(new_subplan_indexes);
}
/*
@@ -2018,11 +2099,16 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate)
prunedata = prunestate->partprunedata[i];
pprune = &prunedata->partrelprunedata[0];
+ /*
+ * We pass the 1st item belonging to the root table of the hierarchy
+ * and find_matching_subplans_recurse() takes care of recursing to
+ * other (lower-level) parents as needed.
+ */
find_matching_subplans_recurse(prunedata, pprune, false, &result);
- /* Expression eval may have used space in node's ps_ExprContext too */
+ /* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
- ResetExprContext(pprune->exec_context.planstate->ps_ExprContext);
+ ResetExprContext(pprune->exec_context.exprcontext);
}
/* Add in any subplans that partition pruning didn't account for */
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 7937f1c88f..5b6d3eb23b 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -138,30 +138,17 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
{
PartitionPruneState *prunestate;
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, &appendstate->ps);
-
- /* Create the working data structure for pruning. */
- prunestate = ExecCreatePartitionPruneState(&appendstate->ps,
- node->part_prune_info);
+ /*
+ * Set up pruning data structure. Initial pruning steps, if any, are
+ * performed as part of the setup, adding the set of indexes of
+ * surviving subplans to 'validsubplans'.
+ */
+ prunestate = ExecInitPartitionPruning(&appendstate->ps,
+ list_length(node->appendplans),
+ node->part_prune_info,
+ &validsubplans);
appendstate->as_prune_state = prunestate;
-
- /* Perform an initial partition prune, if required. */
- if (prunestate->do_initial_prune)
- {
- /* Determine which subplans survive initial pruning */
- validsubplans = ExecFindInitialMatchingSubPlans(prunestate,
- list_length(node->appendplans));
-
- nplans = bms_num_members(validsubplans);
- }
- else
- {
- /* We'll need to initialize all subplans */
- nplans = list_length(node->appendplans);
- Assert(nplans > 0);
- validsubplans = bms_add_range(NULL, 0, nplans - 1);
- }
+ nplans = bms_num_members(validsubplans);
/*
* When no run-time pruning is required and there's at least one
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 418f89dea8..9a9f29e845 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -86,29 +86,17 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
{
PartitionPruneState *prunestate;
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, &mergestate->ps);
-
- prunestate = ExecCreatePartitionPruneState(&mergestate->ps,
- node->part_prune_info);
+ /*
+ * Set up pruning data structure. Initial pruning steps, if any, are
+ * performed as part of the setup, adding the set of indexes of
+ * surviving subplans to 'validsubplans'.
+ */
+ prunestate = ExecInitPartitionPruning(&mergestate->ps,
+ list_length(node->mergeplans),
+ node->part_prune_info,
+ &validsubplans);
mergestate->ms_prune_state = prunestate;
-
- /* Perform an initial partition prune, if required. */
- if (prunestate->do_initial_prune)
- {
- /* Determine which subplans survive initial pruning */
- validsubplans = ExecFindInitialMatchingSubPlans(prunestate,
- list_length(node->mergeplans));
-
- nplans = bms_num_members(validsubplans);
- }
- else
- {
- /* We'll need to initialize all subplans */
- nplans = list_length(node->mergeplans);
- Assert(nplans > 0);
- validsubplans = bms_add_range(NULL, 0, nplans - 1);
- }
+ nplans = bms_num_members(validsubplans);
/*
* When no run-time pruning is required and there's at least one
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 1bc00826c1..7080cb25d9 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -798,6 +798,7 @@ prune_append_rel_partitions(RelOptInfo *rel)
/* These are not valid when being called from the planner */
context.planstate = NULL;
+ context.exprcontext = NULL;
context.exprstates = NULL;
/* Actual pruning happens here. */
@@ -808,8 +809,8 @@ prune_append_rel_partitions(RelOptInfo *rel)
* get_matching_partitions
* Determine partitions that survive partition pruning
*
- * Note: context->planstate must be set to a valid PlanState when the
- * pruning_steps were generated with a target other than PARTTARGET_PLANNER.
+ * Note: context->exprcontext must be valid when the pruning_steps were
+ * generated with a target other than PARTTARGET_PLANNER.
*
* Returns a Bitmapset of the RelOptInfo->part_rels indexes of the surviving
* partitions.
@@ -3654,7 +3655,7 @@ match_boolean_partition_clause(Oid partopfamily, Expr *clause, Expr *partkey,
* exprstate array.
*
* Note that the evaluated result may be in the per-tuple memory context of
- * context->planstate->ps_ExprContext, and we may have leaked other memory
+ * context->exprcontext, and we may have leaked other memory
* there too. This memory must be recovered by resetting that ExprContext
* after we're done with the pruning operation (see execPartition.c).
*/
@@ -3677,13 +3678,18 @@ partkey_datum_from_expr(PartitionPruneContext *context,
ExprContext *ectx;
/*
- * We should never see a non-Const in a step unless we're running in
- * the executor.
+ * We should never see a non-Const in a step unless the caller has
+ * passed a valid ExprContext.
+ *
+ * When context->planstate is valid, context->exprcontext is the
+ * same as context->planstate->ps_ExprContext.
*/
- Assert(context->planstate != NULL);
+ Assert(context->planstate != NULL || context->exprcontext != NULL);
+ Assert(context->planstate == NULL ||
+ (context->exprcontext == context->planstate->ps_ExprContext));
exprstate = context->exprstates[stateidx];
- ectx = context->planstate->ps_ExprContext;
+ ectx = context->exprcontext;
*value = ExecEvalExprSwitchContext(exprstate, ectx, isnull);
}
}
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 603d8becc4..fd5735a946 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -119,10 +119,9 @@ extern ResultRelInfo *ExecFindPartition(ModifyTableState *mtstate,
EState *estate);
extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
PartitionTupleRouting *proute);
-extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *partitionpruneinfo);
+extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
+ int n_total_subplans,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
-extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
- int nsubplans);
-
#endif /* EXECPARTITION_H */
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index ee11b6feae..90684efa25 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -41,6 +41,7 @@ struct RelOptInfo;
* subsidiary data, such as the FmgrInfos.
* planstate Points to the parent plan node's PlanState when called
* during execution; NULL when called from the planner.
+ * exprcontext ExprContext to use when evaluating pruning expressions
* exprstates Array of ExprStates, indexed as per PruneCxtStateIdx; one
* for each partition key in each pruning step. Allocated if
* planstate is non-NULL, otherwise NULL.
@@ -56,6 +57,7 @@ typedef struct PartitionPruneContext
FmgrInfo *stepcmpfuncs;
MemoryContext ppccontext;
PlanState *planstate;
+ ExprContext *exprcontext;
ExprState **exprstates;
} PartitionPruneContext;
--
2.24.1
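The point of decoupling the pruning machinery from PlanState above is that
initial pruning steps can then be evaluated before any PlanState exists,
which the later patches in this series rely on. A minimal sketch of the
usage this enables, assuming boundParams and initial_pruning_steps are in
scope and eliding the exprstates setup that ExecInitPruningContext()
performs (CreateStandaloneExprContext() is the existing helper from
execUtils.c):

    PartitionPruneContext context;
    ExprContext *econtext = CreateStandaloneExprContext();
    Bitmapset  *survived;

    /* Make EXTERN parameters available to pruning expressions. */
    econtext->ecxt_param_list_info = boundParams;

    /* ... fill in partnatts, partsupfunc, etc., as usual ... */
    context.planstate = NULL;       /* no PlanState yet */
    context.exprcontext = econtext; /* used by partkey_datum_from_expr() */

    /* Initial pruning steps contain no PARAM_EXEC Params, so this is safe. */
    survived = get_matching_partitions(&context, initial_pruning_steps);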
On Mon, Mar 28, 2022 at 4:17 PM Amit Langote <amitlangote09@gmail.com> wrote:
Other than the changes mentioned above, the updated patch now contains
a bit more commentary than earlier versions, mostly around
AcquireExecutorLocks()'s new way of determining the set of relations
to lock and the significantly redesigned working of the "initial"
execution pruning.
Forgot to rebase over the latest HEAD, so here's v7. Also fixed that
_out and _read functions for PlanInitPruningOutput were using an
obsolete node label.
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v7-0002-Add-Merge-Append.partitioned_rels.patch (application/octet-stream)
From b43aac217ba51854c5a22636f94f14e81bae3991 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Thu, 24 Mar 2022 22:47:03 +0900
Subject: [PATCH v7 2/4] Add [Merge]Append.partitioned_rels
To record the RT indexes of all partitioned ancestors leading up to
leaf partitions that are appended by the node.
If a given [Merge]Append node is left out from the plan due to there
being only one element in its list of child subplans, then its
partitioned_rels set is added to PlannerGlobal.elidedAppendPartedRels
that is passed down to the executor through PlannedStmt.
There are no users for partitioned_rels and elidedAppendPartedRels
as of this commit, though a later commit will require the ability
to extract the set of relations that must be locked to make a plan
tree safe for execution by walking the plan tree itself, so having
the partitioned tables also be present in the plan tree will be
helpful. Note that currently the executor relies on the fact that
the set of relations to be locked can be obtained by simply scanning
the range table that's made available in PlannedStmt along with the
plan tree.
---
src/backend/nodes/copyfuncs.c | 3 +++
src/backend/nodes/outfuncs.c | 5 +++++
src/backend/nodes/readfuncs.c | 3 +++
src/backend/optimizer/path/joinrels.c | 9 ++++++++
src/backend/optimizer/plan/createplan.c | 18 +++++++++++++++-
src/backend/optimizer/plan/planner.c | 8 +++++++
src/backend/optimizer/plan/setrefs.c | 28 +++++++++++++++++++++++++
src/backend/optimizer/util/inherit.c | 16 ++++++++++++++
src/backend/optimizer/util/relnode.c | 20 ++++++++++++++++++
src/include/nodes/pathnodes.h | 22 +++++++++++++++++++
src/include/nodes/plannodes.h | 17 +++++++++++++++
11 files changed, 148 insertions(+), 1 deletion(-)
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 2cbd8aa0df..d4b5cc7e59 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -106,6 +106,7 @@ _copyPlannedStmt(const PlannedStmt *from)
COPY_NODE_FIELD(invalItems);
COPY_NODE_FIELD(paramExecTypes);
COPY_NODE_FIELD(utilityStmt);
+ COPY_BITMAPSET_FIELD(elidedAppendPartedRels);
COPY_LOCATION_FIELD(stmt_location);
COPY_SCALAR_FIELD(stmt_len);
@@ -253,6 +254,7 @@ _copyAppend(const Append *from)
COPY_SCALAR_FIELD(nasyncplans);
COPY_SCALAR_FIELD(first_partial_plan);
COPY_NODE_FIELD(part_prune_info);
+ COPY_BITMAPSET_FIELD(partitioned_rels);
return newnode;
}
@@ -281,6 +283,7 @@ _copyMergeAppend(const MergeAppend *from)
COPY_POINTER_FIELD(collations, from->numCols * sizeof(Oid));
COPY_POINTER_FIELD(nullsFirst, from->numCols * sizeof(bool));
COPY_NODE_FIELD(part_prune_info);
+ COPY_BITMAPSET_FIELD(partitioned_rels);
return newnode;
}
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index c25f0bd684..99056272f3 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -324,6 +324,7 @@ _outPlannedStmt(StringInfo str, const PlannedStmt *node)
WRITE_NODE_FIELD(invalItems);
WRITE_NODE_FIELD(paramExecTypes);
WRITE_NODE_FIELD(utilityStmt);
+ WRITE_BITMAPSET_FIELD(elidedAppendPartedRels);
WRITE_LOCATION_FIELD(stmt_location);
WRITE_INT_FIELD(stmt_len);
}
@@ -443,6 +444,7 @@ _outAppend(StringInfo str, const Append *node)
WRITE_INT_FIELD(nasyncplans);
WRITE_INT_FIELD(first_partial_plan);
WRITE_NODE_FIELD(part_prune_info);
+ WRITE_BITMAPSET_FIELD(partitioned_rels);
}
static void
@@ -460,6 +462,7 @@ _outMergeAppend(StringInfo str, const MergeAppend *node)
WRITE_OID_ARRAY(collations, node->numCols);
WRITE_BOOL_ARRAY(nullsFirst, node->numCols);
WRITE_NODE_FIELD(part_prune_info);
+ WRITE_BITMAPSET_FIELD(partitioned_rels);
}
static void
@@ -2333,6 +2336,7 @@ _outPlannerGlobal(StringInfo str, const PlannerGlobal *node)
WRITE_BOOL_FIELD(parallelModeOK);
WRITE_BOOL_FIELD(parallelModeNeeded);
WRITE_CHAR_FIELD(maxParallelHazard);
+ WRITE_BITMAPSET_FIELD(elidedAppendPartedRels);
}
static void
@@ -2444,6 +2448,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_BOOL_FIELD(partbounds_merged);
WRITE_BITMAPSET_FIELD(live_parts);
WRITE_BITMAPSET_FIELD(all_partrels);
+ WRITE_BITMAPSET_FIELD(partitioned_rels);
}
static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index e0b3ad1ed2..7536f216bd 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1662,6 +1662,7 @@ _readPlannedStmt(void)
READ_NODE_FIELD(invalItems);
READ_NODE_FIELD(paramExecTypes);
READ_NODE_FIELD(utilityStmt);
+ READ_BITMAPSET_FIELD(elidedAppendPartedRels);
READ_LOCATION_FIELD(stmt_location);
READ_INT_FIELD(stmt_len);
@@ -1784,6 +1785,7 @@ _readAppend(void)
READ_INT_FIELD(nasyncplans);
READ_INT_FIELD(first_partial_plan);
READ_NODE_FIELD(part_prune_info);
+ READ_BITMAPSET_FIELD(partitioned_rels);
READ_DONE();
}
@@ -1806,6 +1808,7 @@ _readMergeAppend(void)
READ_OID_ARRAY(collations, local_node->numCols);
READ_BOOL_ARRAY(nullsFirst, local_node->numCols);
READ_NODE_FIELD(part_prune_info);
+ READ_BITMAPSET_FIELD(partitioned_rels);
READ_DONE();
}
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 9da3ff2f9a..e74d40fee3 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1549,6 +1549,15 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
populate_joinrel_with_paths(root, child_rel1, child_rel2,
child_joinrel, child_sjinfo,
child_restrictlist);
+
+ /*
+ * A parent relation's partitioned_rels must be a superset of the sets
+ * of all its children, direct or indirect, so bubble up the child
+ * joinrel's set.
+ */
+ joinrel->partitioned_rels =
+ bms_add_members(joinrel->partitioned_rels,
+ child_joinrel->partitioned_rels);
}
}
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index fa069a217c..0026086591 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -26,10 +26,12 @@
#include "nodes/extensible.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
+#include "optimizer/appendinfo.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/optimizer.h"
#include "optimizer/paramassign.h"
+#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
#include "optimizer/plancat.h"
@@ -1331,11 +1333,11 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
best_path->subpaths,
prunequal);
}
-
plan->appendplans = subplans;
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
plan->part_prune_info = partpruneinfo;
+ plan->partitioned_rels = bms_copy(rel->partitioned_rels);
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -1499,6 +1501,20 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
node->mergeplans = subplans;
node->part_prune_info = partpruneinfo;
+ /*
+ * We need to explicitly add to the plan node the RT indexes of any
+ * partitioned tables whose partitions will be scanned by the nodes in
+ * 'subplans'. There can be multiple RT indexes in the set due to the
+ * partition tree being multi-level and/or this being a plan for UNION ALL
+ * over multiple partition trees. Along with scanrelids of leaf-level Scan
+ * nodes, this allows the executor to lock the full set of relations being
+ * scanned by this node.
+ *
+ * Note that 'apprelids' only contains the top-level base relation(s), so
+ * is not sufficient for the purpose.
+ */
+ node->partitioned_rels = bms_copy(rel->partitioned_rels);
+
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
* produce either the exact tlist or a narrow tlist, we should get rid of
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index bd09f85aea..374a9d9753 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -529,6 +529,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->paramExecTypes = glob->paramExecTypes;
/* utilityStmt should be null, but we might as well copy it */
result->utilityStmt = parse->utilityStmt;
+ result->elidedAppendPartedRels = glob->elidedAppendPartedRels;
result->stmt_location = parse->stmt_location;
result->stmt_len = parse->stmt_len;
@@ -7365,6 +7366,13 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
add_paths_to_append_rel(root, grouped_rel, grouped_live_children);
}
+
+ /*
+ * Input rel might be a partitioned appendrel, though grouped_rel has at
+ * this point taken its role as the an appendrel owning the former's
+ * children, so copy the former's partitioned_rels set into the latter.
+ */
+ grouped_rel->partitioned_rels = bms_copy(input_rel->partitioned_rels);
}
/*
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index a7b11b7f03..dbdeb8ec9d 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1512,6 +1512,10 @@ set_append_references(PlannerInfo *root,
lfirst(l) = set_plan_refs(root, (Plan *) lfirst(l), rtoffset);
}
+ /* Fix up partitioned_rels before possibly removing the Append below. */
+ aplan->partitioned_rels = offset_relid_set(aplan->partitioned_rels,
+ rtoffset);
+
/*
* See if it's safe to get rid of the Append entirely. For this to be
* safe, there must be only one child plan and that child plan's parallel
@@ -1522,8 +1526,17 @@ set_append_references(PlannerInfo *root,
*/
if (list_length(aplan->appendplans) == 1 &&
((Plan *) linitial(aplan->appendplans))->parallel_aware == aplan->plan.parallel_aware)
+ {
+ /*
+ * Partitioned tables involved, if any, must be made known to the
+ * executor.
+ */
+ root->glob->elidedAppendPartedRels =
+ bms_add_members(root->glob->elidedAppendPartedRels,
+ aplan->partitioned_rels);
return clean_up_removed_plan_level((Plan *) aplan,
(Plan *) linitial(aplan->appendplans));
+ }
/*
* Otherwise, clean up the Append as needed. It's okay to do this after
@@ -1584,6 +1597,12 @@ set_mergeappend_references(PlannerInfo *root,
lfirst(l) = set_plan_refs(root, (Plan *) lfirst(l), rtoffset);
}
+ /*
+ * Fix up partitioned_rels before possibly removing the MergeAppend below.
+ */
+ mplan->partitioned_rels = offset_relid_set(mplan->partitioned_rels,
+ rtoffset);
+
/*
* See if it's safe to get rid of the MergeAppend entirely. For this to
* be safe, there must be only one child plan and that child plan's
@@ -1594,8 +1613,17 @@ set_mergeappend_references(PlannerInfo *root,
*/
if (list_length(mplan->mergeplans) == 1 &&
((Plan *) linitial(mplan->mergeplans))->parallel_aware == mplan->plan.parallel_aware)
+ {
+ /*
+ * Partitioned tables involved, if any, must be made known to the
+ * executor.
+ */
+ root->glob->elidedAppendPartedRels =
+ bms_add_members(root->glob->elidedAppendPartedRels,
+ mplan->partitioned_rels);
return clean_up_removed_plan_level((Plan *) mplan,
(Plan *) linitial(mplan->mergeplans));
+ }
/*
* Otherwise, clean up the MergeAppend as needed. It's okay to do this
diff --git a/src/backend/optimizer/util/inherit.c b/src/backend/optimizer/util/inherit.c
index 7e134822f3..56912e4101 100644
--- a/src/backend/optimizer/util/inherit.c
+++ b/src/backend/optimizer/util/inherit.c
@@ -406,6 +406,14 @@ expand_partitioned_rtentry(PlannerInfo *root, RelOptInfo *relinfo,
childrte, childRTindex,
childrel, top_parentrc, lockmode);
+ /*
+ * A parent relation's partitioned_rels must be a superset of the sets
+ * of all its children, direct or indirect, so bubble up the child
+ * rel's set.
+ */
+ relinfo->partitioned_rels = bms_add_members(relinfo->partitioned_rels,
+ childrelinfo->partitioned_rels);
+
/* Close child relation, but keep locks */
table_close(childrel, NoLock);
}
@@ -737,6 +745,14 @@ expand_appendrel_subquery(PlannerInfo *root, RelOptInfo *rel,
/* Child may itself be an inherited rel, either table or subquery. */
if (childrte->inh)
expand_inherited_rtentry(root, childrel, childrte, childRTindex);
+
+ /*
+ * A parent relation's partitioned_rels must be a superset of the sets
+ * of all its children, direct or indirect, so bubble up the child
+ * rel's set.
+ */
+ rel->partitioned_rels = bms_add_members(rel->partitioned_rels,
+ childrel->partitioned_rels);
}
}
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 520409f4ba..1d082a8fdd 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -361,6 +361,10 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
}
}
+ /* A partitioned appendrel. */
+ if (rel->part_scheme != NULL)
+ rel->partitioned_rels = bms_copy(rel->relids);
+
/* Save the finished struct in the query's simple_rel_array */
root->simple_rel_array[relid] = rel;
@@ -729,6 +733,14 @@ build_join_rel(PlannerInfo *root,
set_joinrel_size_estimates(root, joinrel, outer_rel, inner_rel,
sjinfo, restrictlist);
+ /*
+ * The joinrel may get processed as an appendrel via partitionwise join
+ * if both outer and inner rels are partitioned, so set partitioned_rels
+ * appropriately.
+ */
+ joinrel->partitioned_rels = bms_union(outer_rel->partitioned_rels,
+ inner_rel->partitioned_rels);
+
/*
* Set the consider_parallel flag if this joinrel could potentially be
* scanned within a parallel worker. If this flag is false for either
@@ -897,6 +909,14 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
set_joinrel_size_estimates(root, joinrel, outer_rel, inner_rel,
sjinfo, restrictlist);
+ /*
+ * The joinrel may get processed as an appendrel via partitionwise join
+ * if both outer and inner rels are partitioned, so set partitioned_rels
+ * appropriately.
+ */
+ joinrel->partitioned_rels = bms_union(outer_rel->partitioned_rels,
+ inner_rel->partitioned_rels);
+
/* We build the join only once. */
Assert(!find_join_rel(root, joinrel->relids));
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 1f3845b3fe..5327d9ba8b 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -130,6 +130,11 @@ typedef struct PlannerGlobal
char maxParallelHazard; /* worst PROPARALLEL hazard level */
PartitionDirectory partition_directory; /* partition descriptors */
+
+ Bitmapset *elidedAppendPartedRels; /* Combined partitioned_rels of all
+ * single-subplan [Merge]Append nodes
+ * that have been removed from the
+ * various plan trees. */
} PlannerGlobal;
/* macro for fetching the Plan associated with a SubPlan node */
@@ -773,6 +778,23 @@ typedef struct RelOptInfo
Relids all_partrels; /* Relids set of all partition relids */
List **partexprs; /* Non-nullable partition key expressions */
List **nullable_partexprs; /* Nullable partition key expressions */
+
+ /*
+ * For an appendrel parent relation (base, join, or upper) that is
+ * partitioned, this stores the RT indexes of all the partitioned ancestors
+ * including itself that lead up to the individual leaf partitions that
+ * will be scanned to produce this relation's output rows. The relid set
+ * is copied into the resulting Append or MergeAppend plan node to allow
+ * the executor to take appropriate locks on those relations, unless the
+ * node is deemed useless in setrefs.c due to having a single leaf subplan
+ * and is thus elided from the final plan, in which case the set is added
+ * to PlannerGlobal.elidedAppendPartedRels.
+ *
+ * Note that 'apprelids' of those nodes only contains the top-level base
+ * relation(s), and so is not sufficient for that purpose.
+ */
+
+ Bitmapset *partitioned_rels;
} RelOptInfo;
/*
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 0b518ce6b2..bd87c35d6c 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -85,6 +85,11 @@ typedef struct PlannedStmt
Node *utilityStmt; /* non-null if this is utility stmt */
+ Bitmapset *elidedAppendPartedRels; /* Combined partitioned_rels of all
+ * single-subplan [Merge]Append nodes
+ * that have been removed from the
+ * various plan trees. */
+
/* statement location in source string (copied from Query) */
int stmt_location; /* start location, or -1 if unknown */
int stmt_len; /* length in bytes; 0 means "rest of string" */
@@ -261,6 +266,12 @@ typedef struct Append
/* Info for run-time subplan pruning; NULL if we're not doing that */
struct PartitionPruneInfo *part_prune_info;
+
+ /*
+ * RT indexes of all partitioned parents whose partitions' plans are
+ * present in appendplans.
+ */
+ Bitmapset *partitioned_rels;
} Append;
/* ----------------
@@ -281,6 +292,12 @@ typedef struct MergeAppend
bool *nullsFirst; /* NULLS FIRST/LAST directions */
/* Info for run-time subplan pruning; NULL if we're not doing that */
struct PartitionPruneInfo *part_prune_info;
+
+ /*
+ * RT indexes of all partitioned parents whose partitions' plans are
+ * present in mergeplans.
+ */
+ Bitmapset *partitioned_rels;
} MergeAppend;
/* ----------------
--
2.24.1
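
To make the intended consumer of these new fields concrete, here is a rough
sketch (hypothetical code, not part of the patch; the helper name is made up)
of how a lock-collecting routine could fold both partitioned_rels and
elidedAppendPartedRels into the set of RT indexes to lock:

    /*
     * Hypothetical helper: add to 'lockrels' the partitioned ancestors
     * recorded in an Append node, plus those of any [Merge]Append nodes
     * that setrefs.c elided from the finished plan.
     */
    static Bitmapset *
    add_partitioned_lockrels(PlannedStmt *stmt, Append *aplan,
                             Bitmapset *lockrels)
    {
        lockrels = bms_add_members(lockrels, aplan->partitioned_rels);
        lockrels = bms_add_members(lockrels, stmt->elidedAppendPartedRels);
        return lockrels;
    }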
Attachment: v7-0003-Add-a-plan_tree_walker.patch (application/octet-stream)
From 761e6c2583b37eb9d45d64de954d65d953277040 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Thu, 3 Mar 2022 16:04:13 +0900
Subject: [PATCH v7 3/4] Add a plan_tree_walker()
Like planstate_tree_walker() but for uninitialized plan trees.
---
src/backend/nodes/nodeFuncs.c | 116 ++++++++++++++++++++++++++++++++++
src/include/nodes/nodeFuncs.h | 3 +
2 files changed, 119 insertions(+)
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 25cf282aab..5e5158ea0e 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -31,6 +31,10 @@ static bool planstate_walk_subplans(List *plans, bool (*walker) (),
void *context);
static bool planstate_walk_members(PlanState **planstates, int nplans,
bool (*walker) (), void *context);
+static bool plan_walk_subplans(List *plans,
+ bool (*walker) (),
+ void *context);
+static bool plan_walk_members(List *plans, bool (*walker) (), void *context);
/*
@@ -4368,3 +4372,115 @@ planstate_walk_members(PlanState **planstates, int nplans,
return false;
}
+
+/*
+ * plan_tree_walker --- walk plan trees
+ *
+ * The walker has already visited the current node, and so we need only
+ * recurse into any sub-nodes it has.
+ */
+bool
+plan_tree_walker(Plan *plan,
+ bool (*walker) (),
+ void *context)
+{
+ /* Guard against stack overflow due to overly complex plan trees */
+ check_stack_depth();
+
+ /* initPlan-s */
+ if (plan_walk_subplans(plan->initPlan, walker, context))
+ return true;
+
+ /* lefttree */
+ if (outerPlan(plan))
+ {
+ if (walker(outerPlan(plan), context))
+ return true;
+ }
+
+ /* righttree */
+ if (innerPlan(plan))
+ {
+ if (walker(innerPlan(plan), context))
+ return true;
+ }
+
+ /* special child plans */
+ switch (nodeTag(plan))
+ {
+ case T_Append:
+ if (plan_walk_members(((Append *) plan)->appendplans,
+ walker, context))
+ return true;
+ break;
+ case T_MergeAppend:
+ if (plan_walk_members(((MergeAppend *) plan)->mergeplans,
+ walker, context))
+ return true;
+ break;
+ case T_BitmapAnd:
+ if (plan_walk_members(((BitmapAnd *) plan)->bitmapplans,
+ walker, context))
+ return true;
+ break;
+ case T_BitmapOr:
+ if (plan_walk_members(((BitmapOr *) plan)->bitmapplans,
+ walker, context))
+ return true;
+ break;
+ case T_CustomScan:
+ if (plan_walk_members(((CustomScan *) plan)->custom_plans,
+ walker, context))
+ return true;
+ break;
+ case T_SubqueryScan:
+ if (walker(((SubqueryScan *) plan)->subplan, context))
+ return true;
+ break;
+ default:
+ break;
+ }
+
+ return false;
+}
+
+/*
+ * Walk a list of SubPlans (or initPlans, which also use SubPlan nodes); the walker context must be the containing PlannedStmt so that plan_id can be resolved.
+ */
+static bool
+plan_walk_subplans(List *plans,
+ bool (*walker) (),
+ void *context)
+{
+ ListCell *lc;
+ PlannedStmt *plannedstmt = (PlannedStmt *) context;
+
+ foreach(lc, plans)
+ {
+ SubPlan *sp = lfirst_node(SubPlan, lc);
+ Plan *p = list_nth(plannedstmt->subplans, sp->plan_id - 1);
+
+ if (walker(p, context))
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * Walk the constituent plans of an Append, MergeAppend, BitmapAnd,
+ * BitmapOr, or CustomScan node.
+ */
+static bool
+plan_walk_members(List *plans, bool (*walker) (), void *context)
+{
+ ListCell *lc;
+
+ foreach(lc, plans)
+ {
+ if (walker(lfirst(lc), context))
+ return true;
+ }
+
+ return false;
+}
diff --git a/src/include/nodes/nodeFuncs.h b/src/include/nodes/nodeFuncs.h
index 93c60bde66..fca107ad65 100644
--- a/src/include/nodes/nodeFuncs.h
+++ b/src/include/nodes/nodeFuncs.h
@@ -158,5 +158,8 @@ extern bool raw_expression_tree_walker(Node *node, bool (*walker) (),
struct PlanState;
extern bool planstate_tree_walker(struct PlanState *planstate, bool (*walker) (),
void *context);
+struct Plan;
+extern bool plan_tree_walker(struct Plan *plan, bool (*walker) (),
+ void *context);
#endif /* NODEFUNCS_H */
--
2.24.1
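
As a usage sketch (hypothetical, not from the patch; the walker name is made
up), a caller could check whether an uninitialized plan tree contains any
[Merge]Append node like this. Note that plan_walk_subplans() casts the walker
context to PlannedStmt, so the PlannedStmt itself should be passed as the
context whenever the tree may reference SubPlans:

    static bool
    contains_append_walker(Plan *plan, void *context)
    {
        if (plan == NULL)
            return false;
        /* This node is visited first; report a match immediately. */
        if (IsA(plan, Append) || IsA(plan, MergeAppend))
            return true;
        /* Otherwise recurse into the child plan nodes. */
        return plan_tree_walker(plan, contains_append_walker, context);
    }

    /* e.g.: contains_append_walker(plannedstmt->planTree, plannedstmt) */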
Attachment: v7-0001-Some-refactoring-of-runtime-pruning-code.patch (application/octet-stream)
From 60ec0ebb911a2c7c8cc13ea9f96e1fb2038842a0 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Wed, 2 Mar 2022 15:17:55 +0900
Subject: [PATCH v7 1/4] Some refactoring of runtime pruning code
This mainly does two things:
* Move the execution pruning initialization steps that are common to
ExecInitAppend() and ExecInitMergeAppend() into a new function
ExecInitPartitionPruning() defined in execPartition.c. As a result,
ExecCreatePartitionPruneState() and
ExecFindInitialMatchingSubPlans() need not be exported.
* Add an ExprContext field to PartitionPruneContext, removing the
runtime pruning code's implicit assumption that the ExprContext
needed to compute pruning expressions is always supplied by the
parent PlanState. A future patch will allow runtime pruning (at
least the initial pruning steps) to be performed before the
corresponding PlanState has been created, so this will help.
---
src/backend/executor/execPartition.c | 340 ++++++++++++++++---------
src/backend/executor/nodeAppend.c | 33 +--
src/backend/executor/nodeMergeAppend.c | 32 +--
src/backend/partitioning/partprune.c | 20 +-
src/include/executor/execPartition.h | 9 +-
src/include/partitioning/partprune.h | 2 +
6 files changed, 252 insertions(+), 184 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 90ed1485d1..7ff5a95f05 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -182,11 +182,18 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
bool *isnull,
int maxfieldlen);
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
+static PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
+ PartitionPruneInfo *partitionpruneinfo);
+static Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate);
static void ExecInitPruningContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
PartitionKey partkey,
- PlanState *planstate);
+ PlanState *planstate,
+ ExprContext *econtext);
+static void PartitionPruneStateFixSubPlanMap(PartitionPruneState *prunestate,
+ Bitmapset *initially_valid_subplans,
+ int n_total_subplans);
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
@@ -1485,30 +1492,86 @@ adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri)
*
* Functions:
*
- * ExecCreatePartitionPruneState:
+ * ExecInitPartitionPruning:
* Creates the PartitionPruneState required by each of the two pruning
* functions. Details stored include how to map the partition index
- * returned by the partition pruning code into subplan indexes.
- *
- * ExecFindInitialMatchingSubPlans:
- * Returns indexes of matching subplans. Partition pruning is attempted
- * without any evaluation of expressions containing PARAM_EXEC Params.
- * This function must be called during executor startup for the parent
- * plan before the subplans themselves are initialized. Subplans which
- * are found not to match by this function must be removed from the
- * plan's list of subplans during execution, as this function performs a
- * remap of the partition index to subplan index map and the newly
- * created map provides indexes only for subplans which remain after
- * calling this function.
+ * returned by the partition pruning code into subplan indexes. Also
+ * determines the set of initially valid subplans by performing initial
+ * pruning steps; only those subplans need be initialized by a caller such as
+ * ExecInitAppend. Maps in PartitionPruneState are updated to account
+ * for initial pruning having eliminated some of the subplans, if any.
*
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating all available
- * expressions. This function can only be called during execution and
- * must be called again each time the value of a Param listed in
- * PartitionPruneState's 'execparamids' changes.
+ * expressions, that is, using execution pruning steps. This function
+ * can only be called during execution and must be called again each time
+ * the value of a Param listed in PartitionPruneState's 'execparamids'
+ * changes.
*-------------------------------------------------------------------------
*/
+/*
+ * ExecInitPartitionPruning
+ * Initialize data structure needed for run-time partition pruning
+ *
+ * Initial pruning can be done immediately, so it is done here if needed and
+ * the set of surviving partition subplans' indexes are added to the output
+ * parameter *initially_valid_subplans.
+ *
+ * If subplans are indeed pruned, subplan_map arrays contained in the returned
+ * PartitionPruneState are re-sequenced to not count those, though only if the
+ * maps will be needed for subsequent execution pruning passes.
+ */
+PartitionPruneState *
+ExecInitPartitionPruning(PlanState *planstate,
+ int n_total_subplans,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **initially_valid_subplans)
+{
+ PartitionPruneState *prunestate;
+ EState *estate = planstate->state;
+
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
+
+ /*
+ * Create the working data structure for pruning.
+ */
+ prunestate = ExecCreatePartitionPruneState(planstate, pruneinfo);
+
+ /*
+ * Perform an initial partition prune, if required.
+ */
+ if (prunestate->do_initial_prune)
+ {
+ /* Determine which subplans survive initial pruning */
+ *initially_valid_subplans = ExecFindInitialMatchingSubPlans(prunestate);
+ }
+ else
+ {
+ /* We'll need to initialize all subplans */
+ Assert(n_total_subplans > 0);
+ *initially_valid_subplans = bms_add_range(NULL, 0,
+ n_total_subplans - 1);
+ }
+
+ /*
+ * Re-sequence subplan indexes contained in prunestate to account for any
+ * that were removed above due to initial pruning.
+ *
+ * We can safely skip this when !do_exec_prune, even though that leaves
+ * invalid data in prunestate, because that data won't be consulted again
+ * (cf initial Assert in ExecFindMatchingSubPlans).
+ */
+ if (prunestate->do_exec_prune &&
+ bms_num_members(*initially_valid_subplans) < n_total_subplans)
+ PartitionPruneStateFixSubPlanMap(prunestate,
+ *initially_valid_subplans,
+ n_total_subplans);
+
+ return prunestate;
+}
+
/*
* ExecCreatePartitionPruneState
* Build the data structure required for calling
@@ -1527,7 +1590,7 @@ adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri)
* re-used each time we re-evaluate which partitions match the pruning steps
* provided in each PartitionedRelPruneInfo.
*/
-PartitionPruneState *
+static PartitionPruneState *
ExecCreatePartitionPruneState(PlanState *planstate,
PartitionPruneInfo *partitionpruneinfo)
{
@@ -1536,6 +1599,7 @@ ExecCreatePartitionPruneState(PlanState *planstate,
int n_part_hierarchies;
ListCell *lc;
int i;
+ ExprContext *econtext = planstate->ps_ExprContext;
/* For data reading, executor always omits detached partitions */
if (estate->es_partition_directory == NULL)
@@ -1709,7 +1773,8 @@ ExecCreatePartitionPruneState(PlanState *planstate,
{
ExecInitPruningContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
- partdesc, partkey, planstate);
+ partdesc, partkey, planstate,
+ econtext);
/* Record whether initial pruning is needed at any level */
prunestate->do_initial_prune = true;
}
@@ -1718,7 +1783,8 @@ ExecCreatePartitionPruneState(PlanState *planstate,
{
ExecInitPruningContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
- partdesc, partkey, planstate);
+ partdesc, partkey, planstate,
+ econtext);
/* Record whether exec pruning is needed at any level */
prunestate->do_exec_prune = true;
}
@@ -1746,7 +1812,8 @@ ExecInitPruningContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
PartitionKey partkey,
- PlanState *planstate)
+ PlanState *planstate,
+ ExprContext *econtext)
{
int n_steps;
int partnatts;
@@ -1767,6 +1834,7 @@ ExecInitPruningContext(PartitionPruneContext *context,
context->ppccontext = CurrentMemoryContext;
context->planstate = planstate;
+ context->exprcontext = econtext;
/* Initialize expression state for each expression we need */
context->exprstates = (ExprState **)
@@ -1795,8 +1863,20 @@ ExecInitPruningContext(PartitionPruneContext *context,
step->step.step_id,
keyno);
- context->exprstates[stateidx] =
- ExecInitExpr(expr, context->planstate);
+ /*
+ * When planstate is NULL, pruning_steps is known not to
+ * contain any expressions that depend on the parent plan.
+ * Information of any available EXTERN parameters must be
+ * passed explicitly in that case, which the caller must
+ * have made available via econtext.
+ */
+ if (planstate == NULL)
+ context->exprstates[stateidx] =
+ ExecInitExprWithParams(expr,
+ econtext->ecxt_param_list_info);
+ else
+ context->exprstates[stateidx] =
+ ExecInitExpr(expr, context->planstate);
}
keyno++;
}
@@ -1809,18 +1889,11 @@ ExecInitPruningContext(PartitionPruneContext *context,
* pruning, disregarding any pruning constraints involving PARAM_EXEC
* Params.
*
- * If additional pruning passes will be required (because of PARAM_EXEC
- * Params), we must also update the translation data that allows conversion
- * of partition indexes into subplan indexes to account for the unneeded
- * subplans having been removed.
- *
* Must only be called once per 'prunestate', and only if initial pruning
* is required.
- *
- * 'nsubplans' must be passed as the total number of unpruned subplans.
*/
-Bitmapset *
-ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
+static Bitmapset *
+ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -1845,14 +1918,20 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
PartitionedRelPruningData *pprune;
prunedata = prunestate->partprunedata[i];
+
+ /*
+ * We pass the 1st item belonging to the root table of the hierarchy
+ * and find_matching_subplans_recurse() takes care of recursing to
+ * other (lower-level) parents as needed.
+ */
pprune = &prunedata->partrelprunedata[0];
/* Perform pruning without using PARAM_EXEC Params */
find_matching_subplans_recurse(prunedata, pprune, true, &result);
- /* Expression eval may have used space in node's ps_ExprContext too */
+ /* Expression eval may have used space in ExprContext too */
if (pprune->initial_pruning_steps)
- ResetExprContext(pprune->initial_context.planstate->ps_ExprContext);
+ ResetExprContext(pprune->initial_context.exprcontext);
}
/* Add in any subplans that partition pruning didn't account for */
@@ -1865,118 +1944,120 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
MemoryContextReset(prunestate->prune_context);
+ return result;
+}
+
+/*
+ * PartitionPruneStateFixSubPlanMap
+ * Fix mapping of partition indexes to subplan indexes contained in
+ * prunestate by considering the new list of subplans that survived
+ * initial pruning
+ *
+ * Subplans that were previously indexed 0..(n_total_subplans - 1) are
+ * re-indexed 0..(bms_num_members(initially_valid_subplans) - 1).
+ */
+static void
+PartitionPruneStateFixSubPlanMap(PartitionPruneState *prunestate,
+ Bitmapset *initially_valid_subplans,
+ int n_total_subplans)
+{
+ int *new_subplan_indexes;
+ Bitmapset *new_other_subplans;
+ int i;
+ int newidx;
+
/*
- * If exec-time pruning is required and we pruned subplans above, then we
- * must re-sequence the subplan indexes so that ExecFindMatchingSubPlans
- * properly returns the indexes from the subplans which will remain after
- * execution of this function.
- *
- * We can safely skip this when !do_exec_prune, even though that leaves
- * invalid data in prunestate, because that data won't be consulted again
- * (cf initial Assert in ExecFindMatchingSubPlans).
+ * First we must build a temporary array which maps old subplan
+ * indexes to new ones. For convenience of initialization, we use
+ * 1-based indexes in this array and leave pruned items as 0.
*/
- if (prunestate->do_exec_prune && bms_num_members(result) < nsubplans)
+ new_subplan_indexes = (int *) palloc0(sizeof(int) * n_total_subplans);
+ newidx = 1;
+ i = -1;
+ while ((i = bms_next_member(initially_valid_subplans, i)) >= 0)
{
- int *new_subplan_indexes;
- Bitmapset *new_other_subplans;
- int i;
- int newidx;
+ Assert(i < n_total_subplans);
+ new_subplan_indexes[i] = newidx++;
+ }
- /*
- * First we must build a temporary array which maps old subplan
- * indexes to new ones. For convenience of initialization, we use
- * 1-based indexes in this array and leave pruned items as 0.
- */
- new_subplan_indexes = (int *) palloc0(sizeof(int) * nsubplans);
- newidx = 1;
- i = -1;
- while ((i = bms_next_member(result, i)) >= 0)
- {
- Assert(i < nsubplans);
- new_subplan_indexes[i] = newidx++;
- }
+ /*
+ * Now we can update each PartitionedRelPruneInfo's subplan_map with
+ * new subplan indexes. We must also recompute its present_parts
+ * bitmap.
+ */
+ for (i = 0; i < prunestate->num_partprunedata; i++)
+ {
+ PartitionPruningData *prunedata = prunestate->partprunedata[i];
+ int j;
/*
- * Now we can update each PartitionedRelPruneInfo's subplan_map with
- * new subplan indexes. We must also recompute its present_parts
- * bitmap.
+ * Within each hierarchy, we perform this loop in back-to-front
+ * order so that we determine present_parts for the lowest-level
+ * partitioned tables first. This way we can tell whether a
+ * sub-partitioned table's partitions were entirely pruned so we
+ * can exclude it from the current level's present_parts.
*/
- for (i = 0; i < prunestate->num_partprunedata; i++)
+ for (j = prunedata->num_partrelprunedata - 1; j >= 0; j--)
{
- PartitionPruningData *prunedata = prunestate->partprunedata[i];
- int j;
+ PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
+ int nparts = pprune->nparts;
+ int k;
- /*
- * Within each hierarchy, we perform this loop in back-to-front
- * order so that we determine present_parts for the lowest-level
- * partitioned tables first. This way we can tell whether a
- * sub-partitioned table's partitions were entirely pruned so we
- * can exclude it from the current level's present_parts.
- */
- for (j = prunedata->num_partrelprunedata - 1; j >= 0; j--)
- {
- PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
- int nparts = pprune->nparts;
- int k;
+ /* We just rebuild present_parts from scratch */
+ bms_free(pprune->present_parts);
+ pprune->present_parts = NULL;
- /* We just rebuild present_parts from scratch */
- bms_free(pprune->present_parts);
- pprune->present_parts = NULL;
+ for (k = 0; k < nparts; k++)
+ {
+ int oldidx = pprune->subplan_map[k];
+ int subidx;
- for (k = 0; k < nparts; k++)
+ /*
+ * If this partition existed as a subplan then change the
+ * old subplan index to the new subplan index. The new
+ * index may become -1 if the partition was pruned above,
+ * or it may just come earlier in the subplan list due to
+ * some subplans being removed earlier in the list. If
+ * it's a subpartition, add it to present_parts unless
+ * it's entirely pruned.
+ */
+ if (oldidx >= 0)
{
- int oldidx = pprune->subplan_map[k];
- int subidx;
-
- /*
- * If this partition existed as a subplan then change the
- * old subplan index to the new subplan index. The new
- * index may become -1 if the partition was pruned above,
- * or it may just come earlier in the subplan list due to
- * some subplans being removed earlier in the list. If
- * it's a subpartition, add it to present_parts unless
- * it's entirely pruned.
- */
- if (oldidx >= 0)
- {
- Assert(oldidx < nsubplans);
- pprune->subplan_map[k] = new_subplan_indexes[oldidx] - 1;
+ Assert(oldidx < n_total_subplans);
+ pprune->subplan_map[k] = new_subplan_indexes[oldidx] - 1;
- if (new_subplan_indexes[oldidx] > 0)
- pprune->present_parts =
- bms_add_member(pprune->present_parts, k);
- }
- else if ((subidx = pprune->subpart_map[k]) >= 0)
- {
- PartitionedRelPruningData *subprune;
+ if (new_subplan_indexes[oldidx] > 0)
+ pprune->present_parts =
+ bms_add_member(pprune->present_parts, k);
+ }
+ else if ((subidx = pprune->subpart_map[k]) >= 0)
+ {
+ PartitionedRelPruningData *subprune;
- subprune = &prunedata->partrelprunedata[subidx];
+ subprune = &prunedata->partrelprunedata[subidx];
- if (!bms_is_empty(subprune->present_parts))
- pprune->present_parts =
- bms_add_member(pprune->present_parts, k);
- }
+ if (!bms_is_empty(subprune->present_parts))
+ pprune->present_parts =
+ bms_add_member(pprune->present_parts, k);
}
}
}
+ }
- /*
- * We must also recompute the other_subplans set, since indexes in it
- * may change.
- */
- new_other_subplans = NULL;
- i = -1;
- while ((i = bms_next_member(prunestate->other_subplans, i)) >= 0)
- new_other_subplans = bms_add_member(new_other_subplans,
- new_subplan_indexes[i] - 1);
-
- bms_free(prunestate->other_subplans);
- prunestate->other_subplans = new_other_subplans;
+ /*
+ * We must also recompute the other_subplans set, since indexes in it
+ * may change.
+ */
+ new_other_subplans = NULL;
+ i = -1;
+ while ((i = bms_next_member(prunestate->other_subplans, i)) >= 0)
+ new_other_subplans = bms_add_member(new_other_subplans,
+ new_subplan_indexes[i] - 1);
- pfree(new_subplan_indexes);
- }
+ bms_free(prunestate->other_subplans);
+ prunestate->other_subplans = new_other_subplans;
- return result;
+ pfree(new_subplan_indexes);
}
/*
@@ -2018,11 +2099,16 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate)
prunedata = prunestate->partprunedata[i];
pprune = &prunedata->partrelprunedata[0];
+ /*
+ * We pass the 1st item belonging to the root table of the hierarchy
+ * and find_matching_subplans_recurse() takes care of recursing to
+ * other (lower-level) parents as needed.
+ */
find_matching_subplans_recurse(prunedata, pprune, false, &result);
- /* Expression eval may have used space in node's ps_ExprContext too */
+ /* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
- ResetExprContext(pprune->exec_context.planstate->ps_ExprContext);
+ ResetExprContext(pprune->exec_context.exprcontext);
}
/* Add in any subplans that partition pruning didn't account for */
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 7937f1c88f..5b6d3eb23b 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -138,30 +138,17 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
{
PartitionPruneState *prunestate;
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, &appendstate->ps);
-
- /* Create the working data structure for pruning. */
- prunestate = ExecCreatePartitionPruneState(&appendstate->ps,
- node->part_prune_info);
+ /*
+ * Set up pruning data structure. Initial pruning steps, if any, are
+ * performed as part of the setup, adding the set of indexes of
+ * surviving subplans to 'validsubplans'.
+ */
+ prunestate = ExecInitPartitionPruning(&appendstate->ps,
+ list_length(node->appendplans),
+ node->part_prune_info,
+ &validsubplans);
appendstate->as_prune_state = prunestate;
-
- /* Perform an initial partition prune, if required. */
- if (prunestate->do_initial_prune)
- {
- /* Determine which subplans survive initial pruning */
- validsubplans = ExecFindInitialMatchingSubPlans(prunestate,
- list_length(node->appendplans));
-
- nplans = bms_num_members(validsubplans);
- }
- else
- {
- /* We'll need to initialize all subplans */
- nplans = list_length(node->appendplans);
- Assert(nplans > 0);
- validsubplans = bms_add_range(NULL, 0, nplans - 1);
- }
+ nplans = bms_num_members(validsubplans);
/*
* When no run-time pruning is required and there's at least one
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 418f89dea8..9a9f29e845 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -86,29 +86,17 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
{
PartitionPruneState *prunestate;
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, &mergestate->ps);
-
- prunestate = ExecCreatePartitionPruneState(&mergestate->ps,
- node->part_prune_info);
+ /*
+ * Set up pruning data structure. Initial pruning steps, if any, are
+ * performed as part of the setup, adding the set of indexes of
+ * surviving subplans to 'validsubplans'.
+ */
+ prunestate = ExecInitPartitionPruning(&mergestate->ps,
+ list_length(node->mergeplans),
+ node->part_prune_info,
+ &validsubplans);
mergestate->ms_prune_state = prunestate;
-
- /* Perform an initial partition prune, if required. */
- if (prunestate->do_initial_prune)
- {
- /* Determine which subplans survive initial pruning */
- validsubplans = ExecFindInitialMatchingSubPlans(prunestate,
- list_length(node->mergeplans));
-
- nplans = bms_num_members(validsubplans);
- }
- else
- {
- /* We'll need to initialize all subplans */
- nplans = list_length(node->mergeplans);
- Assert(nplans > 0);
- validsubplans = bms_add_range(NULL, 0, nplans - 1);
- }
+ nplans = bms_num_members(validsubplans);
/*
* When no run-time pruning is required and there's at least one
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 1bc00826c1..7080cb25d9 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -798,6 +798,7 @@ prune_append_rel_partitions(RelOptInfo *rel)
/* These are not valid when being called from the planner */
context.planstate = NULL;
+ context.exprcontext = NULL;
context.exprstates = NULL;
/* Actual pruning happens here. */
@@ -808,8 +809,8 @@ prune_append_rel_partitions(RelOptInfo *rel)
* get_matching_partitions
* Determine partitions that survive partition pruning
*
- * Note: context->planstate must be set to a valid PlanState when the
- * pruning_steps were generated with a target other than PARTTARGET_PLANNER.
+ * Note: context->exprcontext must be valid when the pruning_steps were
+ * generated with a target other than PARTTARGET_PLANNER.
*
* Returns a Bitmapset of the RelOptInfo->part_rels indexes of the surviving
* partitions.
@@ -3654,7 +3655,7 @@ match_boolean_partition_clause(Oid partopfamily, Expr *clause, Expr *partkey,
* exprstate array.
*
* Note that the evaluated result may be in the per-tuple memory context of
- * context->planstate->ps_ExprContext, and we may have leaked other memory
+ * context->exprcontext, and we may have leaked other memory
* there too. This memory must be recovered by resetting that ExprContext
* after we're done with the pruning operation (see execPartition.c).
*/
@@ -3677,13 +3678,18 @@ partkey_datum_from_expr(PartitionPruneContext *context,
ExprContext *ectx;
/*
- * We should never see a non-Const in a step unless we're running in
- * the executor.
+ * We should never see a non-Const in a step unless the caller has
+ * passed a valid ExprContext.
+ *
+ * When context->planstate is valid, context->exprcontext is same
+ * as context->planstate->ps_ExprContext.
*/
- Assert(context->planstate != NULL);
+ Assert(context->planstate != NULL || context->exprcontext != NULL);
+ Assert(context->planstate == NULL ||
+ (context->exprcontext == context->planstate->ps_ExprContext));
exprstate = context->exprstates[stateidx];
- ectx = context->planstate->ps_ExprContext;
+ ectx = context->exprcontext;
*value = ExecEvalExprSwitchContext(exprstate, ectx, isnull);
}
}
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 603d8becc4..fd5735a946 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -119,10 +119,9 @@ extern ResultRelInfo *ExecFindPartition(ModifyTableState *mtstate,
EState *estate);
extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
PartitionTupleRouting *proute);
-extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *partitionpruneinfo);
+extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
+ int n_total_subplans,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
-extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
- int nsubplans);
-
#endif /* EXECPARTITION_H */
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index ee11b6feae..90684efa25 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -41,6 +41,7 @@ struct RelOptInfo;
* subsidiary data, such as the FmgrInfos.
* planstate Points to the parent plan node's PlanState when called
* during execution; NULL when called from the planner.
+ * exprcontext ExprContext to use when evaluating pruning expressions
* exprstates Array of ExprStates, indexed as per PruneCxtStateIdx; one
* for each partition key in each pruning step. Allocated if
* planstate is non-NULL, otherwise NULL.
@@ -56,6 +57,7 @@ typedef struct PartitionPruneContext
FmgrInfo *stepcmpfuncs;
MemoryContext ppccontext;
PlanState *planstate;
+ ExprContext *exprcontext;
ExprState **exprstates;
} PartitionPruneContext;
--
2.24.1
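
The point of the new exprcontext field is easiest to see from the
planstate-less side. Here is a minimal sketch, under my own assumptions about
the eventual caller (variable names such as 'boundParams' are made up;
CreateStandaloneExprContext() and ecxt_param_list_info are existing executor
facilities), of how pruning might be set up before any PlanState exists:

    /* Hypothetical setup for pruning without a parent PlanState. */
    PartitionPruneContext context;
    ExprContext *econtext = CreateStandaloneExprContext();

    econtext->ecxt_param_list_info = boundParams;   /* EXTERN params only */

    /* ... other PartitionPruneContext fields set up as usual ... */
    context.planstate = NULL;        /* no parent PlanState yet */
    context.exprcontext = econtext;  /* used by partkey_datum_from_expr() */
    /* ExecInitPruningContext() then compiles the pruning expressions with
     * ExecInitExprWithParams() instead of ExecInitExpr(). */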
Attachment: v7-0004-Optimize-AcquireExecutorLocks-to-skip-pruned-part.patch (application/octet-stream)
From 14d951ca644860eec6d72ac03e3a95b12373938b Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v7 4/4] Optimize AcquireExecutorLocks() to skip pruned
partitions
In cases where the PlannedStmt indicates that some nodes in the plan
tree can do partition pruning without depending on execution having
started (so-called "initial" pruning), AcquireExecutorLocks() no
longer locks all relations listed in the range table. Instead, it
calls the new executor function ExecutorGetLockRels(), which returns
the set of relations (their RT indexes) to be locked, excluding
those scanned only by subplans that were pruned.
The result of pruning done this way must be remembered and reused
during actual execution of the plan. That is done by creating a
PlanInitPruningOutput node for each plan node that undergoes
pruning; the set of those for the whole plan tree is added to an
ExecLockRelsInfo, which also stores the bitmapset of RT indexes of
the relations actually locked by AcquireExecutorLocks().
ExecLockRelsInfos are passed down to the executor alongside the
PlannedStmts. This arrangement ensures that the executor doesn't
accidentally try to process plan tree subnodes that have been
deemed pruned by AcquireExecutorLocks().
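
In other words, a caller validating a cached generic plan would do roughly
the following (a sketch of the intended flow, with made-up variable names;
the actual code is in the plancache.c changes below):

    if (plannedstmt->containsInitialPruning)
    {
        /* Prune using EXTERN params; lock only the surviving relations. */
        ExecLockRelsInfo *info = ExecutorGetLockRels(plannedstmt,
                                                     boundParams);

        /* ... acquire locks on info->lockrels only, then hand 'info'
         * to CreateQueryDesc() so execution reuses the pruning result. */
    }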
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 13 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 17 +-
src/backend/executor/README | 24 +++
src/backend/executor/execMain.c | 202 ++++++++++++++++++++
src/backend/executor/execParallel.c | 26 ++-
src/backend/executor/execPartition.c | 224 ++++++++++++++++++----
src/backend/executor/execUtils.c | 8 +
src/backend/executor/functions.c | 2 +-
src/backend/executor/nodeAppend.c | 52 ++++-
src/backend/executor/nodeMergeAppend.c | 52 ++++-
src/backend/executor/nodeModifyTable.c | 25 +++
src/backend/executor/spi.c | 14 +-
src/backend/nodes/copyfuncs.c | 49 ++++-
src/backend/nodes/outfuncs.c | 39 ++++
src/backend/nodes/readfuncs.c | 37 ++++
src/backend/optimizer/plan/planner.c | 2 +
src/backend/optimizer/plan/setrefs.c | 6 +
src/backend/partitioning/partprune.c | 37 +++-
src/backend/tcop/postgres.c | 15 +-
src/backend/tcop/pquery.c | 21 ++-
src/backend/utils/cache/plancache.c | 252 ++++++++++++++++++++++---
src/backend/utils/mmgr/portalmem.c | 2 +
src/include/commands/explain.h | 3 +-
src/include/executor/execPartition.h | 2 +
src/include/executor/execdesc.h | 2 +
src/include/executor/executor.h | 2 +
src/include/executor/nodeAppend.h | 1 +
src/include/executor/nodeMergeAppend.h | 1 +
src/include/executor/nodeModifyTable.h | 1 +
src/include/nodes/execnodes.h | 96 ++++++++++
src/include/nodes/nodes.h | 5 +
src/include/nodes/pathnodes.h | 4 +
src/include/nodes/plannodes.h | 15 ++
src/include/tcop/tcopprot.h | 2 +-
src/include/utils/plancache.h | 6 +
src/include/utils/portal.h | 5 +
41 files changed, 1174 insertions(+), 104 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 55c38b04c4..d403eb2309 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -542,7 +542,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 9abbb6b555..f6607f2454 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 9f632285b6..1f1a44b9bb 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -407,7 +407,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
@@ -515,7 +515,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, ExecLockRelsInfo *execlockrelsinfo,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
@@ -563,7 +564,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, execlockrelsinfo, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 1013790dbb..008b8ce0e9 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -741,8 +741,10 @@ execute_sql_string(const char *sql)
RawStmt *parsetree = lfirst_node(RawStmt, lc1);
MemoryContext per_parsetree_context,
oldcontext;
- List *stmt_list;
- ListCell *lc2;
+ List *stmt_list,
+ *execlockrelsinfo_list;
+ ListCell *lc2,
+ *lc3;
/*
* We do the work for each parsetree in a short-lived context, to
@@ -762,11 +764,13 @@ execute_sql_string(const char *sql)
NULL,
0,
NULL);
- stmt_list = pg_plan_queries(stmt_list, sql, CURSOR_OPT_PARALLEL_OK, NULL);
+ stmt_list = pg_plan_queries(stmt_list, sql, CURSOR_OPT_PARALLEL_OK, NULL,
+ &execlockrelsinfo_list);
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, execlockrelsinfo_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ ExecLockRelsInfo *execlockrelsinfo = lfirst_node(ExecLockRelsInfo, lc3);
CommandCounterIncrement();
@@ -777,6 +781,7 @@ execute_sql_string(const char *sql)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ execlockrelsinfo,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 05e7b60059..4ef44aaf23 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -416,7 +416,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 9902c5c566..85e73ddded 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -107,6 +107,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ list_make1(NULL), /* no ExecLockRelsInfo to pass */
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 80738547ed..bbbf8bbcbd 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -155,6 +155,7 @@ ExecuteQuery(ParseState *pstate,
PreparedStatement *entry;
CachedPlan *cplan;
List *plan_list;
+ List *plan_execlockrelsinfo_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
Portal portal;
@@ -195,6 +196,7 @@ ExecuteQuery(ParseState *pstate,
/* Replan if needed, and increment plan refcount for portal */
cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
plan_list = cplan->stmt_list;
+ plan_execlockrelsinfo_list = cplan->execlockrelsinfo_list;
/*
* DO NOT add any logic that could possibly throw an error between
@@ -204,7 +206,7 @@ ExecuteQuery(ParseState *pstate,
NULL,
query_string,
entry->plansource->commandTag,
- plan_list,
+ plan_list, plan_execlockrelsinfo_list,
cplan);
/*
@@ -576,7 +578,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
- ListCell *p;
+ List *plan_execlockrelsinfo_list;
+ ListCell *p,
+ *pe;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -632,15 +636,18 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
plan_list = cplan->stmt_list;
+ plan_execlockrelsinfo_list = cplan->execlockrelsinfo_list;
/* Explain each query */
- foreach(p, plan_list)
+ forboth(p, plan_list, pe, plan_execlockrelsinfo_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ ExecLockRelsInfo *execlockrelsinfo = lfirst_node(ExecLockRelsInfo, pe);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ ExplainOnePlan(pstmt, execlockrelsinfo, into, es, query_string,
+ paramLI, queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/README b/src/backend/executor/README
index bf5e70860d..9720d0ac2c 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -65,6 +65,27 @@ found there. This currently only occurs for Append and MergeAppend nodes. In
this case the non-required subplans are ignored and the executor state's
subnode array will become out of sequence to the plan's subplan list.
+Actually, the so-called execution-time pruning may also occur even before
+execution has started. One case where that occurs is when a cached generic
+plan is being validated for execution by plancache.c: GetCachedPlan(), which
+proceeds by locking all the relations that will be scanned by that plan. If
+the generic plan has nodes that contain so-called initial pruning steps (a
+subset of execution pruning steps that do not depend on full-fledged execution
+having started), they are performed at this point to figure out the minimal
+set of child subplans that satisfy those pruning instructions; the result
+of that pruning is saved in a data structure that gets passed to
+the executor alongside the plan tree. Relations scanned by only those
+surviving subplans are then locked while those scanned by the pruned subplans
+are not, even though the pruned subplans themselves are not removed from the
+plan tree. So, it is imperative that the executor and any third party code
+invoked by it that gets passed the plan tree look at the initial pruning result
+made available via the aforementioned data structure to determine whether or
+not a particular subplan is valid. (The data structure basically consists of
+an array of PlanInitPruningOutput nodes containing one element for each node
+of the plan tree indexable using plan_node_id of the individual plan nodes,
+where each node contains a bitmapset of indexes of unpruned child subplans of
+a given node.)
+
Each Plan node may have expression trees associated with it, to represent
its target list, qualification conditions, etc. These trees are also
read-only to the executor, but the executor state for expression evaluation
@@ -247,6 +268,9 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ [ ExecutorGetLockRels ] --- an optional step to walk over the plan tree
+ to produce an ExecLockRelsInfo to be passed to CreateQueryDesc
+
CreateQueryDesc
ExecutorStart
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 473d2e00a2..1ddd1dfb83 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -49,11 +49,15 @@
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/execdebug.h"
+#include "executor/nodeAppend.h"
+#include "executor/nodeMergeAppend.h"
+#include "executor/nodeModifyTable.h"
#include "executor/nodeSubplan.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/nodeFuncs.h"
#include "parser/parsetree.h"
#include "storage/bufmgr.h"
#include "storage/lmgr.h"
@@ -101,9 +105,205 @@ static char *ExecBuildSlotValueDescription(Oid reloid,
Bitmapset *modifiedCols,
int maxfieldlen);
static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
+static bool ExecGetScanLockRels(Scan *scan, ExecGetLockRelsContext *context);
/* end of local decls */
+/* ----------------------------------------------------------------
+ * ExecutorGetLockRels
+ *
+ * Figure out the minimal set of relations to lock to be able to safely
+ * execute a given plan
+ *
+ * This ignores the relations scanned by child subplans that are pruned away
+ * after performing initial pruning steps present in the plan using the
+ * provided set of EXTERN parameters.
+ *
+ * Along with the set of RT indexes of relations that must be locked, the
+ * returned struct also contains an array of PlanInitPruningOutput nodes each
+ * of which contains the result of initial pruning for a given plan node, which
+ * is basically a bitmapset of the indexes of surviving child subplans. Each
+ * plan node in the tree that undergoes pruning will have an element in the
+ * array.
+ *
+ * Note that while relations scanned by the subplans that are pruned will not
+ * be locked, the subplans themselves are left as-is in the plan tree, assuming
+ * anything that reads the plan tree during execution knows to ignore them by
+ * looking at the PlanInitPruningOutput's list of valid subplans.
+ *
+ * Partitioned tables mentioned in PartitionedRelPruneInfo nodes that drive
+ * the pruning will be locked before doing the pruning and also added to the
+ * returned set.
+ */
+ExecLockRelsInfo *
+ExecutorGetLockRels(PlannedStmt *plannedstmt, ParamListInfo params)
+{
+ int numPlanNodes = plannedstmt->numPlanNodes;
+ ExecGetLockRelsContext context;
+ ExecLockRelsInfo *result;
+ ListCell *lc;
+
+ /* Only get here if there is any pruning to do. */
+ Assert(plannedstmt->containsInitialPruning);
+
+ context.stmt = plannedstmt;
+ context.params = params;
+
+ /*
+ * Go walk all the plan tree(s) present in the PlannedStmt, filling
+ * context.lockrels with only the relations from plan nodes that
+ * survive initial pruning and also the tables mentioned in
+ * partitioned_rels sets found in the plan.
+ */
+ context.lockrels = NULL;
+ context.initPruningOutputs = NIL;
+ context.ipoIndexes = palloc0(sizeof(int) * numPlanNodes);
+
+ /* All the subplans. */
+ foreach(lc, plannedstmt->subplans)
+ {
+ Plan *subplan = lfirst(lc);
+
+ (void) ExecGetLockRels(subplan, &context);
+ }
+
+ /* And the main tree. */
+ (void) ExecGetLockRels(plannedstmt->planTree, &context);
+
+ /*
+ * Also be sure to lock partitioned relations from any [Merge]Append nodes
+ * that were originally present but were ultimately left out from the plan
+ * due to being deemed no-op nodes.
+ */
+ context.lockrels = bms_add_members(context.lockrels,
+ plannedstmt->elidedAppendPartedRels);
+
+ result = makeNode(ExecLockRelsInfo);
+ result->lockrels = context.lockrels;
+ result->numPlanNodes = numPlanNodes;
+ result->initPruningOutputs = context.initPruningOutputs;
+ result->ipoIndexes = context.ipoIndexes;
+
+ return result;
+}
+
+/* ------------------------------------------------------------------------
+ * ExecGetLockRels
+ * Adds all the relations that will be scanned by 'node' and its child
+ * plans to context->lockrels after taking into account the effect
+ * of performing initial pruning, if any
+ *
+ * context->stmt gives the PlannedStmt being inspected to access the plan's
+ * range table if needed and context->params the set of EXTERN parameters
+ * available to evaluate pruning parameters.
+ *
+ * If initial pruning is done, a PlanInitPruningOutput node containing the
+ * result of pruning will be stored in context->initPruningOutputs that will
+ * be made available to the executor to reuse.
+ * ------------------------------------------------------------------------
+ */
+bool
+ExecGetLockRels(Plan *node, ExecGetLockRelsContext *context)
+{
+ /* Do nothing when we get to the end of a leaf of the tree. */
+ if (node == NULL)
+ return true;
+
+ /* Make sure there's enough stack available. */
+ check_stack_depth();
+
+ switch (nodeTag(node))
+ {
+ /* Currently, only these two nodes have prunable child subplans. */
+ case T_Append:
+ if (ExecGetAppendLockRels((Append *) node, context))
+ return true;
+ break;
+ case T_MergeAppend:
+ if (ExecGetMergeAppendLockRels((MergeAppend *) node,
+ context))
+ return true;
+ break;
+
+ /*
+ * And these manipulate relations that must be added to context->lockrels.
+ */
+ case T_SeqScan:
+ case T_SampleScan:
+ case T_IndexScan:
+ case T_IndexOnlyScan:
+ case T_BitmapIndexScan:
+ case T_BitmapHeapScan:
+ case T_TidScan:
+ case T_TidRangeScan:
+ case T_ForeignScan:
+ case T_SubqueryScan:
+ case T_CustomScan:
+ if (ExecGetScanLockRels((Scan *) node, context))
+ return true;
+ break;
+ case T_ModifyTable:
+ if (ExecGetModifyTableLockRels((ModifyTable *) node, context))
+ return true;
+ /* plan_tree_walker() will visit the subplan (outerNode) */
+ break;
+
+ default:
+ break;
+ }
+
+ /* Recurse to subnodes. */
+ return plan_tree_walker(node, ExecGetLockRels, (void *) context);
+}
+
+/*
+ * ExecGetScanLockRels
+ * Do ExecGetLockRels()'s work for a leaf Scan node
+ */
+static bool
+ExecGetScanLockRels(Scan *scan, ExecGetLockRelsContext *context)
+{
+ switch (nodeTag(scan))
+ {
+ case T_ForeignScan:
+ {
+ ForeignScan *fscan = (ForeignScan *) scan;
+
+ context->lockrels = bms_add_members(context->lockrels,
+ fscan->fs_relids);
+ }
+ break;
+
+ case T_SubqueryScan:
+ {
+ SubqueryScan *sscan = (SubqueryScan *) scan;
+
+ (void) ExecGetLockRels((Plan *) sscan->subplan, context);
+ }
+ break;
+
+ case T_CustomScan:
+ {
+ CustomScan *cscan = (CustomScan *) scan;
+ ListCell *lc;
+
+ context->lockrels = bms_add_members(context->lockrels,
+ cscan->custom_relids);
+ foreach(lc, cscan->custom_plans)
+ {
+ (void) ExecGetLockRels((Plan *) lfirst(lc), context);
+ }
+ }
+ break;
+
+ default:
+ context->lockrels = bms_add_member(context->lockrels,
+ scan->scanrelid);
+ break;
+ }
+
+ return true;
+}
/* ----------------------------------------------------------------
* ExecutorStart
@@ -805,6 +1005,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ ExecLockRelsInfo *execlockrelsinfo = queryDesc->execlockrelsinfo;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -824,6 +1025,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable);
estate->es_plannedstmt = plannedstmt;
+ estate->es_execlockrelsinfo = execlockrelsinfo;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 9a0d5d59ef..fb6dbd298a 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -66,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_EXECLOCKRELSINFO UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -182,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->containsInitialPruning = false;
pstmt->planTree = plan;
pstmt->rtable = estate->es_range_table;
pstmt->resultRelations = NIL;
@@ -596,12 +598,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *execlockrelsinfo_data;
+ char *execlockrelsinfo_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int execlockrelsinfo_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -630,6 +635,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ execlockrelsinfo_data = nodeToString(estate->es_execlockrelsinfo);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -656,6 +662,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized ExecLockRelsInfo. */
+ execlockrelsinfo_len = strlen(execlockrelsinfo_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, execlockrelsinfo_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -750,6 +761,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized ExecLockRelsInfo */
+ execlockrelsinfo_space = shm_toc_allocate(pcxt->toc, execlockrelsinfo_len);
+ memcpy(execlockrelsinfo_space, execlockrelsinfo_data, execlockrelsinfo_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_EXECLOCKRELSINFO,
+ execlockrelsinfo_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1231,8 +1248,10 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *execlockrelsinfospace;
char *paramspace;
PlannedStmt *pstmt;
+ ExecLockRelsInfo *execlockrelsinfo;
ParamListInfo paramLI;
char *queryString;
@@ -1243,12 +1262,17 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
pstmtspace = shm_toc_lookup(toc, PARALLEL_KEY_PLANNEDSTMT, false);
pstmt = (PlannedStmt *) stringToNode(pstmtspace);
+ /* Reconstruct leader-supplied ExecLockRelsInfo. */
+ execlockrelsinfospace = shm_toc_lookup(toc, PARALLEL_KEY_EXECLOCKRELSINFO,
+ false);
+ execlockrelsinfo = (ExecLockRelsInfo *) stringToNode(execlockrelsinfospace);
+
/* Reconstruct ParamListInfo. */
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(¶mspace);
/* Create a QueryDesc for the query. */
- return CreateQueryDesc(pstmt,
+ return CreateQueryDesc(pstmt, execlockrelsinfo,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 7ff5a95f05..fddc97280e 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -24,6 +24,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -183,8 +184,13 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
int maxfieldlen);
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *partitionpruneinfo);
-static Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate);
+ PartitionPruneInfo *partitionpruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir);
+static Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
+ PartitionPruneInfo *pruneinfo);
static void ExecInitPruningContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -1483,8 +1489,9 @@ adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri)
* considered to be a stable expression, it can change value from one plan
* node scan to the next during query execution. Stable comparison
* expressions that don't involve such Params allow partition pruning to be
- * done once during executor startup. Expressions that do involve such Params
- * require us to prune separately for each scan of the parent plan node.
+ * done once during executor startup, or even earlier, during ExecutorGetLockRels().
+ * Expressions that do involve such Params require us to prune separately for
+ * each scan of the parent plan node.
*
* Note that pruning away unneeded subplans during executor startup has the
* added benefit of not having to initialize the unneeded subplans at all.
@@ -1496,10 +1503,17 @@ adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri)
* Creates the PartitionPruneState required by each of the two pruning
* functions. Details stored include how to map the partition index
* returned by the partition pruning code into subplan indexes. Also
- * determines the set of initially valid subplans by performing initial
- * pruning steps, only which need be initialized by the caller such as
- * ExecInitAppend. Maps in PartitionPruneState are updated to account
- * for initial pruning having eliminated some of the subplans, if any.
+ * determines the set of initially valid subplans, either by looking that
+ * up in the plan node's PlanInitPruningOutput, if one is found in
+ * EState.es_execlockrelsinfo, or by performing the initial pruning steps.
+ * Only the subplans included in that set need be initialized by the caller,
+ * such as ExecInitAppend. Maps in PartitionPruneState are updated to
+ * account for initial pruning having eliminated some of the subplans,
+ * if any.
+ *
+ * ExecGetLockRelsDoInitialPruning:
+ * Do initial pruning as part of ExecGetLockRels() on the parent plan
+ * node
*
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating all available
@@ -1514,9 +1528,10 @@ adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri)
* ExecInitPartitionPruning
* Initialize data structure needed for run-time partition pruning
*
- * Initial pruning can be done immediately, so it is done here if needed and
- * the set of surviving partition subplans' indexes are added to the output
- * parameter *initially_valid_subplans.
+ * Initial pruning can be done immediately, so it is done here unless it has
+ * already been done by ExecGetLockRelsDoInitialPruning(), and the set of
+ * surviving partition subplans' indexes is added to the output parameter
+ * *initially_valid_subplans.
*
* If subplans are indeed pruned, subplan_map arrays contained in the returned
* PartitionPruneState are re-sequenced to not count those, though only if the
@@ -1530,22 +1545,57 @@ ExecInitPartitionPruning(PlanState *planstate,
{
PartitionPruneState *prunestate;
EState *estate = planstate->state;
+ Plan *plan = planstate->plan;
+ PlanInitPruningOutput *initPruningOutput = NULL;
+ bool do_pruning = (pruneinfo->needs_init_pruning ||
+ pruneinfo->needs_exec_pruning);
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, planstate);
+ /* Retrieve the parent plan's PlanInitPruningOutput, if any. */
+ if (estate->es_execlockrelsinfo)
+ {
+ initPruningOutput = (PlanInitPruningOutput *)
+ ExecFetchPlanInitPruningOutput(estate->es_execlockrelsinfo, plan);
- /*
- * Create the working data structure for pruning.
- */
- prunestate = ExecCreatePartitionPruneState(planstate, pruneinfo);
+ Assert(initPruningOutput != NULL &&
+ IsA(initPruningOutput, PlanInitPruningOutput));
+ /* No need to do initial pruning again, only exec pruning. */
+ do_pruning = pruneinfo->needs_exec_pruning;
+ }
+
+ prunestate = NULL;
+ if (do_pruning)
+ {
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
+
+ /* For data reading, executor always omits detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
+ /*
+ * Create the working data structure for pruning. No need to consider
+ * initial pruning steps if we have a PlanInitPruningOutput.
+ */
+ prunestate = ExecCreatePartitionPruneState(planstate, pruneinfo,
+ initPruningOutput == NULL, true,
+ NIL, planstate->ps_ExprContext,
+ estate->es_partition_directory);
+ }
/*
* Perform an initial partition prune, if required.
*/
- if (prunestate->do_initial_prune)
+ if (initPruningOutput)
+ {
+ /* ExecGetLockRelsDoInitialPruning() already did it for us! */
+ *initially_valid_subplans = initPruningOutput->initially_valid_subplans;
+ }
+ else if (prunestate && prunestate->do_initial_prune)
{
/* Determine which subplans survive initial pruning */
- *initially_valid_subplans = ExecFindInitialMatchingSubPlans(prunestate);
+ *initially_valid_subplans = ExecFindInitialMatchingSubPlans(prunestate,
+ pruneinfo);
}
else
{
@@ -1563,7 +1613,7 @@ ExecInitPartitionPruning(PlanState *planstate,
* invalid data in prunestate, because that data won't be consulted again
* (cf initial Assert in ExecFindMatchingSubPlans).
*/
- if (prunestate->do_exec_prune &&
+ if (prunestate && prunestate->do_exec_prune &&
bms_num_members(*initially_valid_subplans) < n_total_subplans)
PartitionPruneStateFixSubPlanMap(prunestate,
*initially_valid_subplans,
@@ -1572,12 +1622,75 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecGetLockRelsDoInitialPruning
+ * Perform initial pruning as part of doing ExecGetLockRels() on the parent
+ * plan node
+ */
+Bitmapset *
+ExecGetLockRelsDoInitialPruning(Plan *plan, ExecGetLockRelsContext *context,
+ PartitionPruneInfo *pruneinfo)
+{
+ List *rtable = context->stmt->rtable;
+ ParamListInfo params = context->params;
+ ExprContext *econtext;
+ PartitionDirectory pdir;
+ MemoryContext oldcontext,
+ tmpcontext;
+ PartitionPruneState *prunestate;
+ PlanInitPruningOutput *initPruningOutput;
+
+ /*
+ * A temporary context to allocate the stuff needed to run the pruning steps.
+ */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "initial pruning working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
+
+ /*
+ * PartitionDirectory to look up partition descriptors, which omits
+ * detached partitions, just like in the executor proper.
+ */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+
+ /*
+ * We don't yet have a PlanState for the parent plan node, so must create
+ * a standalone ExprContext to evaluate pruning expressions, equipped with
+ * the information about the EXTERN parameters that the caller passed us.
+ * That is okay, because the initial pruning steps do not contain
+ * anything that requires execution to have started.
+ */
+ econtext = CreateStandaloneExprContext();
+ econtext->ecxt_param_list_info = params;
+ prunestate = ExecCreatePartitionPruneState(NULL, pruneinfo,
+ true, false,
+ rtable, econtext,
+ pdir);
+ MemoryContextSwitchTo(oldcontext);
+
+ /* Do the pruning and populate a PlanInitPruningOutput for this node. */
+ initPruningOutput = makeNode(PlanInitPruningOutput);
+ initPruningOutput->initially_valid_subplans =
+ ExecFindInitialMatchingSubPlans(prunestate, pruneinfo);
+ ExecStorePlanInitPruningOutput(context, initPruningOutput, plan);
+
+ FreeExprContext(econtext, true);
+ DestroyPartitionDirectory(pdir);
+ MemoryContextDelete(tmpcontext);
+
+ return initPruningOutput->initially_valid_subplans;
+}
+
/*
* ExecCreatePartitionPruneState
* Build the data structure required for calling
* ExecFindInitialMatchingSubPlans and ExecFindMatchingSubPlans.
*
- * 'planstate' is the parent plan node's execution state.
+ * 'planstate', if not NULL, is the parent plan node's execution state. It
+ * can be NULL when called before ExecutorStart(), in which case
+ * 'rtable' (range table), 'econtext', and 'partdir' must be explicitly
+ * provided.
*
* 'partitionpruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
@@ -1592,19 +1705,20 @@ ExecInitPartitionPruning(PlanState *planstate,
*/
static PartitionPruneState *
ExecCreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *partitionpruneinfo)
+ PartitionPruneInfo *partitionpruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir)
{
- EState *estate = planstate->state;
+ EState *estate = planstate ? planstate->state : NULL;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
- /* For data reading, executor always omits detached partitions */
- if (estate->es_partition_directory == NULL)
- estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
+ Assert((estate != NULL) ||
+ (partdir != NULL && econtext != NULL && rtable != NIL));
n_part_hierarchies = list_length(partitionpruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
@@ -1655,19 +1769,48 @@ ExecCreatePartitionPruneState(PlanState *planstate,
PartitionedRelPruneInfo *pinfo = lfirst_node(PartitionedRelPruneInfo, lc2);
PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
Relation partrel;
+ bool close_partrel = false;
PartitionDesc partdesc;
PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * Must open the relation ourselves when called before execution
+ * has started, such as when called during ExecutorGetLockRels()
+ * on a cached plan. In that case, sub-partitions must be locked,
+ * because AcquirePlannerLocks() would not have seen them. (The
+ * first relation in a partrelpruneinfos list is always the root
+ * partitioned table appearing in the query, which
+ * AcquirePlannerLocks() would have locked; the Assert in
+ * relation_open() guards that assumption.)
+ */
+ if (estate == NULL)
+ {
+ RangeTblEntry *rte = rt_fetch(pinfo->rtindex, rtable);
+ int lockmode = (j == 0) ? NoLock : rte->rellockmode;
+
+ partrel = table_open(rte->relid, lockmode);
+ close_partrel = true;
+ }
+ else
+ partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+
+ /*
+ * We can rely on the copy of the partitioned table's partition
+ * key in its relcache entry, because it can't change (or get
+ * destroyed) as long as the relation is locked. The partition
+ * descriptor is taken from the PartitionDirectory, which holds
+ * the table open long enough for the descriptor to remain valid
+ * while it's used to perform the pruning steps.
*/
- partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
partkey = RelationGetPartitionKey(partrel);
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partdesc = PartitionDirectoryLookup(partdir, partrel);
+
+ /*
+ * Must close partrel, keeping the lock taken, if we're not using
+ * EState's entry.
+ */
+ if (close_partrel)
+ table_close(partrel, NoLock);
/*
* Initialize the subplan_map and subpart_map.
@@ -1769,7 +1912,7 @@ ExecCreatePartitionPruneState(PlanState *planstate,
* Initialize pruning contexts as needed.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
- if (pinfo->initial_pruning_steps)
+ if (consider_initial_steps && pinfo->initial_pruning_steps)
{
ExecInitPruningContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
@@ -1779,7 +1922,7 @@ ExecCreatePartitionPruneState(PlanState *planstate,
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps)
+ if (consider_exec_steps && pinfo->exec_pruning_steps)
{
ExecInitPruningContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
@@ -1893,7 +2036,8 @@ ExecInitPruningContext(PartitionPruneContext *context,
* is required.
*/
static Bitmapset *
-ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate)
+ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
+ PartitionPruneInfo *pruneinfo)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -1903,8 +2047,8 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate)
Assert(prunestate->do_initial_prune);
/*
- * Switch to a temp context to avoid leaking memory in the executor's
- * query-lifespan memory context.
+ * Switch to a temp context to avoid leaking memory in the longer-term
+ * memory context.
*/
oldcontext = MemoryContextSwitchTo(prunestate->prune_context);
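(Aside: ExecGetLockRels(), the plan tree walker that drives the per-node
handlers added elsewhere in this patch, lives in execMain.c and is not
included in this excerpt. A minimal sketch of what such a walker could look
like; everything beyond the declarations added to executor.h below is an
assumption, not the patch's actual code:

    /*
     * Sketch only: collect into context->lockrels the RT indexes of the
     * relations that executing 'node' would access.  Per-node handlers
     * return true if they have already recursed to the children they
     * want visited.
     */
    bool
    ExecGetLockRels(Plan *node, ExecGetLockRelsContext *context)
    {
        bool    done = false;

        if (node == NULL)
            return false;

        switch (nodeTag(node))
        {
            case T_Append:
                done = ExecGetAppendLockRels((Append *) node, context);
                break;
            case T_MergeAppend:
                done = ExecGetMergeAppendLockRels((MergeAppend *) node,
                                                  context);
                break;
            case T_ModifyTable:
                done = ExecGetModifyTableLockRels((ModifyTable *) node,
                                                  context);
                break;
            case T_SeqScan:
            case T_IndexScan:   /* ...and other scan node types */
                context->lockrels = bms_add_member(context->lockrels,
                                                   ((Scan *) node)->scanrelid);
                break;
            default:
                break;
        }

        /* Handler didn't recurse by itself, so visit both children. */
        if (!done)
        {
            (void) ExecGetLockRels(outerPlan(node), context);
            (void) ExecGetLockRels(innerPlan(node), context);
        }

        return true;
    }
)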
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9df1f81ea8..7246f9175f 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -119,6 +119,7 @@ CreateExecutorState(void)
estate->es_relations = NULL;
estate->es_rowmarks = NULL;
estate->es_plannedstmt = NULL;
+ estate->es_execlockrelsinfo = NULL;
estate->es_junkFilter = NULL;
@@ -785,6 +786,13 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rti > 0 && rti <= estate->es_range_table_size);
+ /*
+ * A cross-check that AcquireExecutorLocks() hasn't skipped locking any
+ * relation that the executor actually accesses.
+ */
+ Assert(estate->es_execlockrelsinfo == NULL ||
+ bms_is_member(rti, estate->es_execlockrelsinfo->lockrels));
+
rel = estate->es_relations[rti - 1];
if (rel == NULL)
{
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index f9460ae506..a2182a6b1f 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -844,7 +844,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
- es->qd = CreateQueryDesc(es->stmt,
+ es->qd = CreateQueryDesc(es->stmt, NULL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 5b6d3eb23b..9c6f907687 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -94,6 +94,55 @@ static bool ExecAppendAsyncRequest(AppendState *node, TupleTableSlot **result);
static void ExecAppendAsyncEventWait(AppendState *node);
static void classify_matching_subplans(AppendState *node);
+/* ----------------------------------------------------------------
+ * ExecGetAppendLockRels
+ * Do ExecGetLockRels()'s work for an Append plan
+ * ----------------------------------------------------------------
+ */
+bool
+ExecGetAppendLockRels(Append *node, ExecGetLockRelsContext *context)
+{
+ PartitionPruneInfo *pruneinfo = node->part_prune_info;
+
+ /*
+ * Must always lock all the partitioned tables whose direct and indirect
+ * partitions will be scanned by this Append.
+ */
+ context->lockrels = bms_add_members(context->lockrels,
+ node->partitioned_rels);
+
+ /*
+ * Now recurse to subplans to add relations scanned therein.
+ *
+ * If initial pruning can be done, do that now and only recurse to the
+ * surviving subplans.
+ */
+ if (pruneinfo && pruneinfo->needs_init_pruning)
+ {
+ List *subplans = node->appendplans;
+ Bitmapset *validsubplans;
+ int i;
+
+ validsubplans = ExecGetLockRelsDoInitialPruning((Plan *) node,
+ context, pruneinfo);
+
+ /* Recurse to surviving subplans. */
+ i = -1;
+ while ((i = bms_next_member(validsubplans, i)) >= 0)
+ {
+ Plan *subplan = list_nth(subplans, i);
+
+ (void) ExecGetLockRels(subplan, context);
+ }
+
+ /* done with this node */
+ return true;
+ }
+
+ /* Tell the caller to recurse to *all* the subplans. */
+ return false;
+}
+
/* ----------------------------------------------------------------
* ExecInitAppend
*
@@ -155,7 +204,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (appendstate->as_prune_state == NULL ||
+ (!appendstate->as_prune_state->do_exec_prune && nplans > 0))
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
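(Note what ExecGetAppendLockRels()'s early return achieves: when initial
pruning applies, only the surviving subplans are walked, so relations scanned
by pruned subplans never enter context->lockrels; those are exactly the locks
that AcquireExecutorLocks() then gets to skip.)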
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 9a9f29e845..4b04fcdbc2 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -54,6 +54,55 @@ typedef int32 SlotNumber;
static TupleTableSlot *ExecMergeAppend(PlanState *pstate);
static int heap_compare_slots(Datum a, Datum b, void *arg);
+/* ----------------------------------------------------------------
+ * ExecGetMergeAppendLockRels
+ * Do ExecGetLockRels()'s work for a MergeAppend plan
+ * ----------------------------------------------------------------
+ */
+bool
+ExecGetMergeAppendLockRels(MergeAppend *node, ExecGetLockRelsContext *context)
+{
+ PartitionPruneInfo *pruneinfo = node->part_prune_info;
+
+ /*
+ * Must always lock all the partitioned tables whose direct and indirect
+ * partitions will be scanned by this MergeAppend.
+ */
+ context->lockrels = bms_add_members(context->lockrels,
+ node->partitioned_rels);
+
+ /*
+ * Now recurse to subplans to add relations scanned therein.
+ *
+ * If initial pruning can be done, do that now and only recurse to the
+ * surviving subplans.
+ */
+ if (pruneinfo && pruneinfo->needs_init_pruning)
+ {
+ List *subplans = node->mergeplans;
+ Bitmapset *validsubplans;
+ int i;
+
+ validsubplans = ExecGetLockRelsDoInitialPruning((Plan *) node,
+ context, pruneinfo);
+
+ /* Recurse to surviving subplans. */
+ i = -1;
+ while ((i = bms_next_member(validsubplans, i)) >= 0)
+ {
+ Plan *subplan = list_nth(subplans, i);
+
+ (void) ExecGetLockRels(subplan, context);
+ }
+
+ /* done with this node */
+ return true;
+ }
+
+ /* Tell the caller to recurse to *all* the subplans. */
+ return false;
+}
+
/* ----------------------------------------------------------------
* ExecInitMergeAppend
@@ -103,7 +152,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (mergestate->ms_prune_state == NULL ||
+ (!mergestate->ms_prune_state->do_exec_prune && nplans > 0))
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 701fe05296..23df3efef0 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -3008,6 +3008,31 @@ ExecLookupResultRelByOid(ModifyTableState *node, Oid resultoid,
return NULL;
}
+/*
+ * ExecGetModifyTableLockRels
+ * Do ExecGetLockRels()'s work for a ModifyTable plan
+ */
+bool
+ExecGetModifyTableLockRels(ModifyTable *plan, ExecGetLockRelsContext *context)
+{
+ ListCell *lc;
+
+ /* First add the result relation RTIs mentioned in the node. */
+ if (plan->rootRelation > 0)
+ context->lockrels = bms_add_member(context->lockrels,
+ plan->rootRelation);
+ context->lockrels = bms_add_member(context->lockrels,
+ plan->nominalRelation);
+ foreach(lc, plan->resultRelations)
+ {
+ context->lockrels = bms_add_member(context->lockrels,
+ lfirst_int(lc));
+ }
+
+ /* Tell the caller to recurse to the subplan (outerPlan(plan)). */
+ return false;
+}
+
/* ----------------------------------------------------------------
* ExecInitModifyTable
* ----------------------------------------------------------------
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index a82e986667..2107009591 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1578,6 +1578,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
CachedPlanSource *plansource;
CachedPlan *cplan;
List *stmt_list;
+ List *execlockrelsinfo_list;
char *query_string;
Snapshot snapshot;
MemoryContext oldcontext;
@@ -1659,6 +1660,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
/* Replan if needed, and increment plan refcount for portal */
cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
stmt_list = cplan->stmt_list;
+ execlockrelsinfo_list = cplan->execlockrelsinfo_list;
if (!plan->saved)
{
@@ -1670,6 +1672,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
oldcontext = MemoryContextSwitchTo(portal->portalContext);
stmt_list = copyObject(stmt_list);
+ execlockrelsinfo_list = copyObject(execlockrelsinfo_list);
MemoryContextSwitchTo(oldcontext);
ReleaseCachedPlan(cplan, NULL);
cplan = NULL; /* portal shouldn't depend on cplan */
@@ -1683,6 +1686,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
+ execlockrelsinfo_list,
cplan);
/*
@@ -2473,7 +2477,9 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
- ListCell *lc2;
+ List *execlockrelsinfo_list;
+ ListCell *lc2,
+ *lc3;
spicallbackarg.query = plansource->query_string;
@@ -2552,6 +2558,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
plan_owner, _SPI_current->queryEnv);
stmt_list = cplan->stmt_list;
+ execlockrelsinfo_list = cplan->execlockrelsinfo_list;
/*
* If we weren't given a specific snapshot to use, and the statement
@@ -2589,9 +2596,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, execlockrelsinfo_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ ExecLockRelsInfo *execlockrelsinfo = lfirst_node(ExecLockRelsInfo, lc3);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2663,7 +2671,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
else
snap = InvalidSnapshot;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, execlockrelsinfo,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index d4b5cc7e59..631727d310 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -68,6 +68,13 @@
} \
} while (0)
+/* Copy a field that is an array of numElem ints */
+#define COPY_INT_ARRAY(fldname, numElem) \
+ do { \
+ if ((numElem) > 0) \
+ { \
+ newnode->fldname = palloc((numElem) * sizeof(int)); \
+ memcpy(newnode->fldname, from->fldname, (numElem) * sizeof(int)); \
+ } \
+ else \
+ newnode->fldname = NULL; \
+ } while (0)
+
/* Copy a parse location field (for Copy, this is same as scalar case) */
#define COPY_LOCATION_FIELD(fldname) \
(newnode->fldname = from->fldname)
@@ -94,8 +101,10 @@ _copyPlannedStmt(const PlannedStmt *from)
COPY_SCALAR_FIELD(transientPlan);
COPY_SCALAR_FIELD(dependsOnRole);
COPY_SCALAR_FIELD(parallelModeNeeded);
+ COPY_SCALAR_FIELD(containsInitialPruning);
COPY_SCALAR_FIELD(jitFlags);
COPY_NODE_FIELD(planTree);
+ COPY_SCALAR_FIELD(numPlanNodes);
COPY_NODE_FIELD(rtable);
COPY_NODE_FIELD(resultRelations);
COPY_NODE_FIELD(appendRelations);
@@ -1281,6 +1290,8 @@ _copyPartitionPruneInfo(const PartitionPruneInfo *from)
PartitionPruneInfo *newnode = makeNode(PartitionPruneInfo);
COPY_NODE_FIELD(prune_infos);
+ COPY_SCALAR_FIELD(needs_init_pruning);
+ COPY_SCALAR_FIELD(needs_exec_pruning);
COPY_BITMAPSET_FIELD(other_subplans);
return newnode;
@@ -5137,6 +5148,33 @@ _copyExtensibleNode(const ExtensibleNode *from)
return newnode;
}
+/* ****************************************************************
+ * execnodes.h copy functions
+ * ****************************************************************
+ */
+static ExecLockRelsInfo *
+_copyExecLockRelsInfo(const ExecLockRelsInfo *from)
+{
+ ExecLockRelsInfo *newnode = makeNode(ExecLockRelsInfo);
+
+ COPY_BITMAPSET_FIELD(lockrels);
+ COPY_SCALAR_FIELD(numPlanNodes);
+ COPY_NODE_FIELD(initPruningOutputs);
+ COPY_INT_ARRAY(ipoIndexes, from->numPlanNodes);
+
+ return newnode;
+}
+
+static PlanInitPruningOutput *
+_copyPlanInitPruningOutput(const PlanInitPruningOutput *from)
+{
+ PlanInitPruningOutput *newnode = makeNode(PlanInitPruningOutput);
+
+ COPY_BITMAPSET_FIELD(initially_valid_subplans);
+
+ return newnode;
+}
+
/* ****************************************************************
* value.h copy functions
* ****************************************************************
@@ -5191,7 +5229,6 @@ _copyBitString(const BitString *from)
return newnode;
}
-
static ForeignKeyCacheInfo *
_copyForeignKeyCacheInfo(const ForeignKeyCacheInfo *from)
{
@@ -6176,6 +6213,16 @@ copyObjectImpl(const void *from)
retval = _copyPublicationTable(from);
break;
+ /*
+ * EXECUTION NODES
+ */
+ case T_ExecLockRelsInfo:
+ retval = _copyExecLockRelsInfo(from);
+ break;
+ case T_PlanInitPruningOutput:
+ retval = _copyPlanInitPruningOutput(from);
+ break;
+
/*
* MISCELLANEOUS NODES
*/
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 99056272f3..f361d2e2bc 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -312,8 +312,10 @@ _outPlannedStmt(StringInfo str, const PlannedStmt *node)
WRITE_BOOL_FIELD(transientPlan);
WRITE_BOOL_FIELD(dependsOnRole);
WRITE_BOOL_FIELD(parallelModeNeeded);
+ WRITE_BOOL_FIELD(containsInitialPruning);
WRITE_INT_FIELD(jitFlags);
WRITE_NODE_FIELD(planTree);
+ WRITE_INT_FIELD(numPlanNodes);
WRITE_NODE_FIELD(rtable);
WRITE_NODE_FIELD(resultRelations);
WRITE_NODE_FIELD(appendRelations);
@@ -1007,6 +1009,8 @@ _outPartitionPruneInfo(StringInfo str, const PartitionPruneInfo *node)
WRITE_NODE_TYPE("PARTITIONPRUNEINFO");
WRITE_NODE_FIELD(prune_infos);
+ WRITE_BOOL_FIELD(needs_init_pruning);
+ WRITE_BOOL_FIELD(needs_exec_pruning);
WRITE_BITMAPSET_FIELD(other_subplans);
}
@@ -2747,6 +2751,31 @@ _outExtensibleNode(StringInfo str, const ExtensibleNode *node)
methods->nodeOut(str, node);
}
+/*****************************************************************************
+ *
+ * Stuff from execnodes.h
+ *
+ *****************************************************************************/
+
+static void
+_outExecLockRelsInfo(StringInfo str, const ExecLockRelsInfo *node)
+{
+ WRITE_NODE_TYPE("EXECLOCKRELSINFO");
+
+ WRITE_BITMAPSET_FIELD(lockrels);
+ WRITE_INT_FIELD(numPlanNodes);
+ WRITE_NODE_FIELD(initPruningOutputs);
+ WRITE_INT_ARRAY(ipoIndexes, node->numPlanNodes);
+}
+
+static void
+_outPlanInitPruningOutput(StringInfo str, const PlanInitPruningOutput *node)
+{
+ WRITE_NODE_TYPE("PLANINITPRUNINGOUTPUT");
+
+ WRITE_BITMAPSET_FIELD(initially_valid_subplans);
+}
+
/*****************************************************************************
*
* Stuff from parsenodes.h.
@@ -4600,6 +4629,16 @@ outNode(StringInfo str, const void *obj)
_outJsonConstructorExpr(str, obj);
break;
+ /*
+ * EXECUTION NODES
+ */
+ case T_ExecLockRelsInfo:
+ _outExecLockRelsInfo(str, obj);
+ break;
+ case T_PlanInitPruningOutput:
+ _outPlanInitPruningOutput(str, obj);
+ break;
+
default:
/*
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 7536f216bd..41fc710999 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1650,8 +1650,10 @@ _readPlannedStmt(void)
READ_BOOL_FIELD(transientPlan);
READ_BOOL_FIELD(dependsOnRole);
READ_BOOL_FIELD(parallelModeNeeded);
+ READ_BOOL_FIELD(containsInitialPruning);
READ_INT_FIELD(jitFlags);
READ_NODE_FIELD(planTree);
+ READ_INT_FIELD(numPlanNodes);
READ_NODE_FIELD(rtable);
READ_NODE_FIELD(resultRelations);
READ_NODE_FIELD(appendRelations);
@@ -2602,6 +2604,8 @@ _readPartitionPruneInfo(void)
READ_LOCALS(PartitionPruneInfo);
READ_NODE_FIELD(prune_infos);
+ READ_BOOL_FIELD(needs_init_pruning);
+ READ_BOOL_FIELD(needs_exec_pruning);
READ_BITMAPSET_FIELD(other_subplans);
READ_DONE();
@@ -2771,6 +2775,35 @@ _readPartitionRangeDatum(void)
READ_DONE();
}
+/*
+ * _readExecLockRelsInfo
+ */
+static ExecLockRelsInfo *
+_readExecLockRelsInfo(void)
+{
+ READ_LOCALS(ExecLockRelsInfo);
+
+ READ_BITMAPSET_FIELD(lockrels);
+ READ_INT_FIELD(numPlanNodes);
+ READ_NODE_FIELD(initPruningOutputs);
+ READ_INT_ARRAY(ipoIndexes, local_node->numPlanNodes);
+
+ READ_DONE();
+}
+
+/*
+ * _readPlanInitPruningOutput
+ */
+static PlanInitPruningOutput *
+_readPlanInitPruningOutput(void)
+{
+ READ_LOCALS(PlanInitPruningOutput);
+
+ READ_BITMAPSET_FIELD(initially_valid_subplans);
+
+ READ_DONE();
+}
+
/*
* parseNodeString
*
@@ -3050,6 +3083,10 @@ parseNodeString(void)
return_value = _readJsonValueExpr();
else if (MATCH("JSONCTOREXPR", 12))
return_value = _readJsonConstructorExpr();
+ else if (MATCH("EXECLOCKRELSINFO", 16))
+ return_value = _readExecLockRelsInfo();
+ else if (MATCH("PLANINITPRUNINGOUTPUT", 21))
+ return_value = _readPlanInitPruningOutput();
else
{
elog(ERROR, "badly formatted node string \"%.32s\"...", token);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 374a9d9753..329fb9d6e7 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -517,7 +517,9 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->transientPlan = glob->transientPlan;
result->dependsOnRole = glob->dependsOnRole;
result->parallelModeNeeded = glob->parallelModeNeeded;
+ result->containsInitialPruning = glob->containsInitialPruning;
result->planTree = top_plan;
+ result->numPlanNodes = glob->lastPlanNodeId;
result->rtable = glob->finalrtable;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index dbdeb8ec9d..ac795ae9d9 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1561,6 +1561,9 @@ set_append_references(PlannerInfo *root,
pinfo->rtindex += rtoffset;
}
}
+
+ if (aplan->part_prune_info->needs_init_pruning)
+ root->glob->containsInitialPruning = true;
}
/* We don't need to recurse to lefttree or righttree ... */
@@ -1648,6 +1651,9 @@ set_mergeappend_references(PlannerInfo *root,
pinfo->rtindex += rtoffset;
}
}
+
+ if (mplan->part_prune_info->needs_init_pruning)
+ root->glob->containsInitialPruning = true;
}
/* We don't need to recurse to lefttree or righttree ... */
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 7080cb25d9..3322dc79f2 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,9 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -230,6 +232,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool needs_init_pruning = false;
+ bool needs_exec_pruning = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -309,12 +313,16 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_needs_init_pruning;
+ bool partrel_needs_exec_pruning;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_needs_init_pruning,
+ &partrel_needs_exec_pruning);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -323,6 +331,10 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+ if (!needs_init_pruning)
+ needs_init_pruning = partrel_needs_init_pruning;
+ if (!needs_exec_pruning)
+ needs_exec_pruning = partrel_needs_exec_pruning;
}
pfree(relid_subplan_map);
@@ -337,6 +349,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->needs_init_pruning = needs_init_pruning;
+ pruneinfo->needs_exec_pruning = needs_exec_pruning;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -435,13 +449,18 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *needs_init_pruning and *needs_exec_pruning are set to indicate whether
+ * the returned PartitionedRelPruneInfos contain pruning steps that can be
+ * performed before execution begins and during execution, respectively.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -452,6 +471,10 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will find out below. */
+ *needs_init_pruning = false;
+ *needs_exec_pruning = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -539,6 +562,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+ * In the first pass, we also determine whether the second pass will be
+ * necessary, by noting the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -613,6 +639,11 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ if (!*needs_init_pruning)
+ *needs_init_pruning = (initial_pruning_steps != NIL);
+ if (!*needs_exec_pruning)
+ *needs_exec_pruning = (exec_pruning_steps != NIL);
+
pinfolist = lappend(pinfolist, pinfo);
}
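(For example, a pruning qual like keycol = $1, which references an EXTERN
parameter, yields initial_pruning_steps and therefore sets
needs_init_pruning, whereas a qual whose comparison value comes from an EXEC
parameter supplied at runtime, say by a nestloop, yields exec_pruning_steps
and sets needs_exec_pruning.)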
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index ba2fcfeb4a..085eb3f209 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -945,15 +945,17 @@ pg_plan_query(Query *querytree, const char *query_string, int cursorOptions,
* For normal optimizable statements, invoke the planner. For utility
* statements, just make a wrapper PlannedStmt node.
*
- * The result is a list of PlannedStmt nodes.
+ * The result is a list of PlannedStmt nodes. Also, a NULL is appended to
+ * *execlockrelsinfo_list for each PlannedStmt added to the returned list.
*/
List *
pg_plan_queries(List *querytrees, const char *query_string, int cursorOptions,
- ParamListInfo boundParams)
+ ParamListInfo boundParams, List **execlockrelsinfo_list)
{
List *stmt_list = NIL;
ListCell *query_list;
+ *execlockrelsinfo_list = NIL;
foreach(query_list, querytrees)
{
Query *query = lfirst_node(Query, query_list);
@@ -977,6 +979,7 @@ pg_plan_queries(List *querytrees, const char *query_string, int cursorOptions,
}
stmt_list = lappend(stmt_list, stmt);
+ *execlockrelsinfo_list = lappend(*execlockrelsinfo_list, NULL);
}
return stmt_list;
@@ -1080,7 +1083,8 @@ exec_simple_query(const char *query_string)
QueryCompletion qc;
MemoryContext per_parsetree_context = NULL;
List *querytree_list,
- *plantree_list;
+ *plantree_list,
+ *plantree_execlockrelsinfo_list;
Portal portal;
DestReceiver *receiver;
int16 format;
@@ -1167,7 +1171,8 @@ exec_simple_query(const char *query_string)
NULL, 0, NULL);
plantree_list = pg_plan_queries(querytree_list, query_string,
- CURSOR_OPT_PARALLEL_OK, NULL);
+ CURSOR_OPT_PARALLEL_OK, NULL,
+ &plantree_execlockrelsinfo_list);
/*
* Done with the snapshot used for parsing/planning.
@@ -1203,6 +1208,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ plantree_execlockrelsinfo_list,
NULL);
/*
@@ -1991,6 +1997,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
+ cplan->execlockrelsinfo_list,
cplan);
/* Done with the snapshot used for parameter I/O and parsing/planning */
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5f907831a3..972ddc014e 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -35,7 +35,7 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
+static void ProcessQuery(PlannedStmt *plan, ExecLockRelsInfo *execlockrelsinfo,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +65,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ ExecLockRelsInfo *execlockrelsinfo,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +78,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->execlockrelsinfo = execlockrelsinfo; /* ExecutorGetLockRels() output for plan */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +124,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * execlockrelsinfo: ExecutorGetLockRels() output for the plan tree
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +137,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ ExecLockRelsInfo *execlockrelsinfo,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +149,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, execlockrelsinfo, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -490,6 +494,7 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ linitial_node(ExecLockRelsInfo, portal->execlockrelsinfos),
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1190,7 +1195,8 @@ PortalRunMulti(Portal portal,
QueryCompletion *qc)
{
bool active_snapshot_set = false;
- ListCell *stmtlist_item;
+ ListCell *stmtlist_item,
+ *execlockrelsinfolist_item;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1211,9 +1217,12 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
- foreach(stmtlist_item, portal->stmts)
+ forboth(stmtlist_item, portal->stmts,
+ execlockrelsinfolist_item, portal->execlockrelsinfos)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ ExecLockRelsInfo *execlockrelsinfo = lfirst_node(ExecLockRelsInfo,
+ execlockrelsinfolist_item);
/*
* If we got a cancel signal in prior command, quit
@@ -1271,7 +1280,7 @@ PortalRunMulti(Portal portal,
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, execlockrelsinfo,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1280,7 +1289,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, execlockrelsinfo,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 4cf6db504f..9f5a40a0a6 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -99,14 +99,16 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
+static void CachedPlanSaveExecLockRelsInfos(CachedPlan *plan, List *execlockrelsinfo_list);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static List *AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams);
+static void ReleaseExecutorLocks(List *stmt_list, List *execlockrelsinfo_list);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -790,9 +792,21 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* On a "true" return, we have acquired the locks needed to run the plan.
* (We must do this for the "true" result to be race-condition-free.)
+ *
+ * If the CachedPlan is valid, this may in some cases call
+ * ExecutorGetLockRels() on each PlannedStmt contained in it, instead of just
+ * scanning its range table, to determine the set of relations that
+ * AcquireExecutorLocks() must lock; doing so prunes away any nodes in the
+ * tree that need not be executed, based on the result of initial partition
+ * pruning. The resulting ExecLockRelsInfo nodes containing the output of
+ * such pruning, allocated in a child context of the context containing the
+ * plan itself, are added to plan->execlockrelsinfo_list. The previous
+ * contents of that list, left over from the last invocation on the same
+ * CachedPlan, are deleted, because they would no longer be valid given the
+ * fresh set of parameter values that may be used as pruning parameters.
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams)
{
CachedPlan *plan = plansource->gplan;
@@ -820,13 +834,25 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ List *execlockrelsinfo_list;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ /*
+ * Lock the relations scanned by the plan. If ExecutorGetLockRels()
+ * asked to omit some relations because the plan nodes that scan them
+ * were found to be pruned, the executor is informed of the omitted
+ * plan nodes themselves via the ExecLockRelsInfo nodes collected in
+ * the returned list, which is passed to it along with the list of
+ * PlannedStmts, so that it doesn't accidentally try to execute those
+ * nodes.
+ */
+ execlockrelsinfo_list = AcquireExecutorLocks(plan->stmt_list,
+ boundParams);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -844,11 +870,14 @@ CheckCachedPlan(CachedPlanSource *plansource)
if (plan->is_valid)
{
/* Successfully revalidated and locked the query. */
+
+ /* Remember ExecLockRelsInfos in the CachedPlan. */
+ CachedPlanSaveExecLockRelsInfos(plan, execlockrelsinfo_list);
return true;
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ ReleaseExecutorLocks(plan->stmt_list, execlockrelsinfo_list);
}
/*
@@ -880,7 +909,8 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv)
{
CachedPlan *plan;
- List *plist;
+ List *plist,
+ *execlockrelsinfo_list;
bool snapshot_set;
bool is_transient;
MemoryContext plan_context;
@@ -933,7 +963,8 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
* Generate the plan.
*/
plist = pg_plan_queries(qlist, plansource->query_string,
- plansource->cursor_options, boundParams);
+ plansource->cursor_options, boundParams,
+ &execlockrelsinfo_list);
/* Release snapshot if we got one */
if (snapshot_set)
@@ -1002,6 +1033,16 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
plan->is_saved = false;
plan->is_valid = true;
+ /*
+ * Save the dummy ExecLockRelsInfo list, that is, a list containing only
+ * NULL elements. We must do this because users of the CachedPlan expect
+ * one to go with the list of PlannedStmts.
+ * XXX maybe get rid of that contract.
+ */
+ plan->execlockrelsinfo_context = NULL;
+ CachedPlanSaveExecLockRelsInfos(plan, execlockrelsinfo_list);
+ Assert(MemoryContextIsValid(plan->execlockrelsinfo_context));
+
/* assign generation number to new plan */
plan->generation = ++(plansource->generation);
@@ -1160,7 +1201,7 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (CheckCachedPlan(plansource, boundParams))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1586,6 +1627,49 @@ CopyCachedPlan(CachedPlanSource *plansource)
return newsource;
}
+/*
+ * CachedPlanSaveExecLockRelsInfos
+ * Save the list containing ExecLockRelsInfo nodes into the given
+ * CachedPlan
+ *
+ * The provided list is copied into a dedicated context that is a child of
+ * plan->context. If the child context already exists, it is emptied, because
+ * any ExecLockRelsInfo contained therein would no longer be useful.
+ */
+static void
+CachedPlanSaveExecLockRelsInfos(CachedPlan *plan, List *execlockrelsinfo_list)
+{
+ MemoryContext execlockrelsinfo_context = plan->execlockrelsinfo_context,
+ oldcontext = CurrentMemoryContext;
+ List *execlockrelsinfo_list_copy;
+
+ /*
+ * Set up the dedicated context if not already done, saving it as a child
+ * of the CachedPlan's context.
+ */
+ if (execlockrelsinfo_context == NULL)
+ {
+ execlockrelsinfo_context = AllocSetContextCreate(CurrentMemoryContext,
+ "CachedPlan execlockrelsinfo list",
+ ALLOCSET_START_SMALL_SIZES);
+ MemoryContextSetParent(execlockrelsinfo_context, plan->context);
+ MemoryContextSetIdentifier(execlockrelsinfo_context, plan->context->ident);
+ plan->execlockrelsinfo_context = execlockrelsinfo_context;
+ }
+ else
+ {
+ /* Just clear existing contents by resetting the context. */
+ Assert(MemoryContextIsValid(execlockrelsinfo_context));
+ MemoryContextReset(execlockrelsinfo_context);
+ }
+
+ MemoryContextSwitchTo(execlockrelsinfo_context);
+ execlockrelsinfo_list_copy = copyObject(execlockrelsinfo_list);
+ MemoryContextSwitchTo(oldcontext);
+
+ plan->execlockrelsinfo_list = execlockrelsinfo_list_copy;
+}
+
/*
* CachedPlanIsValid: test whether the rewritten querytree within a
* CachedPlanSource is currently valid (that is, not marked as being in need
@@ -1737,17 +1821,21 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ *
+ * Returns a list containing one ExecLockRelsInfo node for each PlannedStmt
+ * in stmt_list; a given element is NULL if the corresponding statement is a
+ * utility statement or its containsInitialPruning is false.
*/
-static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+static List *
+AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams)
{
ListCell *lc1;
+ List *execlockrelsinfo_list = NIL;
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ ExecLockRelsInfo *execlockrelsinfo = NULL;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1761,27 +1849,139 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
if (query)
- ScanQueryForLocks(query, acquire);
- continue;
+ ScanQueryForLocks(query, true);
}
-
- foreach(lc2, plannedstmt->rtable)
+ else
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ /*
+ * Figure out the set of relations that would need to be locked
+ * before executing the plan.
+ */
+ if (!plannedstmt->containsInitialPruning)
+ {
+ /*
+ * If the plan contains no initial pruning steps, just lock
+ * all the relations found in the range table.
+ */
+ ListCell *lc;
- if (rte->rtekind != RTE_RELATION)
- continue;
+ foreach(lc, plannedstmt->rtable)
+ {
+ RangeTblEntry *rte = lfirst(lc);
+
+ if (rte->rtekind != RTE_RELATION)
+ continue;
+
+ /*
+ * Acquire the appropriate type of lock on each relation
+ * OID. Note that we don't actually try to open the rel,
+ * and hence will not fail if it's been dropped entirely
+ * --- we'll just transiently acquire a non-conflicting
+ * lock.
+ */
+ LockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
+ else
+ {
+ int rti;
+ Bitmapset *lockrels;
+
+ /*
+ * Walk the plan tree to find the minimal set of relations
+ * that must be locked, considering the effect of performing
+ * initial partition pruning.
+ */
+ execlockrelsinfo = ExecutorGetLockRels(plannedstmt, boundParams);
+ lockrels = execlockrelsinfo->lockrels;
+
+ rti = -1;
+ while ((rti = bms_next_member(lockrels, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+ Assert(rte->rtekind == RTE_RELATION);
+
+ /* See the comment above. */
+ LockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
+ }
+
+ /*
+ * Remember the ExecLockRelsInfo so that it can later be added to the
+ * QueryDesc that will be passed to the executor when executing this
+ * plan. It may be NULL, but we must keep the list the same length as
+ * stmt_list.
+ */
+ execlockrelsinfo_list = lappend(execlockrelsinfo_list,
+ execlockrelsinfo);
+ }
+
+ return execlockrelsinfo_list;
+}
+
+/*
+ * ReleaseExecutorLocks
+ * Release locks that would've been acquired by an earlier call to
+ * AcquireExecutorLocks()
+ */
+static void
+ReleaseExecutorLocks(List *stmt_list, List *execlockrelsinfo_list)
+{
+ ListCell *lc1,
+ *lc2;
+
+ forboth(lc1, stmt_list, lc2, execlockrelsinfo_list)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ ExecLockRelsInfo *execlockrelsinfo = lfirst_node(ExecLockRelsInfo, lc2);
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
/*
- * Acquire the appropriate type of lock on each relation OID. Note
- * that we don't actually try to open the rel, and hence will not
- * fail if it's been dropped entirely --- we'll just transiently
- * acquire a non-conflicting lock.
+ * Ignore utility statements, except those (such as EXPLAIN) that
+ * contain a parsed-but-not-planned query. Note: it's okay to use
+ * ScanQueryForLocks, even though the query hasn't been through
+ * rule rewriting, because rewriting doesn't change the query
+ * representation.
*/
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ if (query)
+ ScanQueryForLocks(query, false);
+ }
+ else
+ {
+ if (execlockrelsinfo == NULL)
+ {
+ ListCell *lc;
+
+ foreach(lc, plannedstmt->rtable)
+ {
+ RangeTblEntry *rte = lfirst(lc);
+
+ if (rte->rtekind != RTE_RELATION)
+ continue;
+
+ LockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
else
- UnlockRelationOid(rte->relid, rte->rellockmode);
+ {
+ int rti;
+ Bitmapset *lockrels;
+
+ lockrels = execlockrelsinfo->lockrels;
+ rti = -1;
+ while ((rti = bms_next_member(lockrels, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ Assert(rte->rtekind == RTE_RELATION);
+
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
}
}
}
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index d549f66d4a..896f51be08 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -285,6 +285,7 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *execlockrelsinfos,
CachedPlan *cplan)
{
AssertArg(PortalIsValid(portal));
@@ -299,6 +300,7 @@ PortalDefineQuery(Portal portal,
portal->qc.nprocessed = 0;
portal->commandTag = commandTag;
portal->stmts = stmts;
+ portal->execlockrelsinfos = execlockrelsinfos;
portal->cplan = cplan;
portal->status = PORTAL_DEFINED;
}
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 666977fb1f..fef75ba147 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,8 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt, ExecLockRelsInfo *execlockrelsinfo,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index fd5735a946..ded19b8cbb 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -124,4 +124,6 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
PartitionPruneInfo *pruneinfo,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
+extern Bitmapset *ExecGetLockRelsDoInitialPruning(Plan *plan, ExecGetLockRelsContext *context,
+ PartitionPruneInfo *pruneinfo);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index e79e2c001f..4338463479 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,7 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ ExecLockRelsInfo *execlockrelsinfo; /* ExecutorGetLockRels()'s output given plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +58,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ ExecLockRelsInfo *execlockrelsinfo,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 82925b4b63..5cf414cc11 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -185,6 +185,8 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
+extern ExecLockRelsInfo *ExecutorGetLockRels(PlannedStmt *plannedstmt, ParamListInfo params);
+extern bool ExecGetLockRels(Plan *node, ExecGetLockRelsContext *context);
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
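(Similarly not shown in this excerpt is ExecutorGetLockRels() itself,
declared just above. A rough sketch of what it might do, ignoring details
such as the statement's unprunable result relations; the body is an
assumption:

    ExecLockRelsInfo *
    ExecutorGetLockRels(PlannedStmt *plannedstmt, ParamListInfo params)
    {
        ExecGetLockRelsContext context;
        ExecLockRelsInfo *result = makeNode(ExecLockRelsInfo);

        context.stmt = plannedstmt;
        context.params = params;
        context.lockrels = NULL;
        context.initPruningOutputs = NIL;
        context.ipoIndexes = (int *)
            palloc0(plannedstmt->numPlanNodes * sizeof(int));

        /* Walk the tree, pruning and collecting lockable RT indexes. */
        (void) ExecGetLockRels(plannedstmt->planTree, &context);

        result->lockrels = context.lockrels;
        result->numPlanNodes = plannedstmt->numPlanNodes;
        result->initPruningOutputs = context.initPruningOutputs;
        result->ipoIndexes = context.ipoIndexes;

        return result;
    }
)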
diff --git a/src/include/executor/nodeAppend.h b/src/include/executor/nodeAppend.h
index 4cb78ee5b6..b53535c2a4 100644
--- a/src/include/executor/nodeAppend.h
+++ b/src/include/executor/nodeAppend.h
@@ -17,6 +17,7 @@
#include "access/parallel.h"
#include "nodes/execnodes.h"
+extern bool ExecGetAppendLockRels(Append *node, ExecGetLockRelsContext *context);
extern AppendState *ExecInitAppend(Append *node, EState *estate, int eflags);
extern void ExecEndAppend(AppendState *node);
extern void ExecReScanAppend(AppendState *node);
diff --git a/src/include/executor/nodeMergeAppend.h b/src/include/executor/nodeMergeAppend.h
index 97fe3b0665..8eb4e9df93 100644
--- a/src/include/executor/nodeMergeAppend.h
+++ b/src/include/executor/nodeMergeAppend.h
@@ -16,6 +16,7 @@
#include "nodes/execnodes.h"
+extern bool ExecGetMergeAppendLockRels(MergeAppend *node, ExecGetLockRelsContext *context);
extern MergeAppendState *ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags);
extern void ExecEndMergeAppend(MergeAppendState *node);
extern void ExecReScanMergeAppend(MergeAppendState *node);
diff --git a/src/include/executor/nodeModifyTable.h b/src/include/executor/nodeModifyTable.h
index 1d225bc88d..5006499088 100644
--- a/src/include/executor/nodeModifyTable.h
+++ b/src/include/executor/nodeModifyTable.h
@@ -19,6 +19,7 @@ extern void ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
EState *estate, TupleTableSlot *slot,
CmdType cmdtype);
+extern bool ExecGetModifyTableLockRels(ModifyTable *plan, ExecGetLockRelsContext *context);
extern ModifyTableState *ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags);
extern void ExecEndModifyTable(ModifyTableState *node);
extern void ExecReScanModifyTable(ModifyTableState *node);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 44dd73fc80..1253fdb0ed 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -576,6 +576,7 @@ typedef struct EState
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ struct ExecLockRelsInfo *es_execlockrelsinfo; /* QueryDesc.execlockrelsinfo */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
@@ -964,6 +965,101 @@ typedef struct DomainConstraintState
*/
typedef TupleTableSlot *(*ExecProcNodeMtd) (struct PlanState *pstate);
+/*----------------
+ * ExecLockRelsInfo
+ *
+ * Result of performing ExecutorGetLockRels() for a given PlannedStmt
+ */
+typedef struct ExecLockRelsInfo
+{
+ NodeTag type;
+
+ /*
+ * Relations that must be locked to execute the plan tree contained in
+ * the PlannedStmt.
+ */
+ Bitmapset *lockrels;
+
+ /* PlannedStmt.numPlanNodes */
+ int numPlanNodes;
+
+ /*
+ * List of PlanInitPruningOutput nodes, each representing the output of
+ * performing initial pruning on a given plan node, for all nodes in the
+ * plan tree that have been marked as needing initial pruning.
+ *
+ * 'ipoIndexes' is an array of 'numPlanNodes' elements, indexed with
+ * plan_node_id of the individual nodes in the plan tree, each a 1-based
+ * index into the 'initPruningOutputs' list for a given plan node. 0 means
+ * that a given plan node has no entry in the list because it needs no
+ * initial pruning.
+ */
+ List *initPruningOutputs;
+ int *ipoIndexes;
+} ExecLockRelsInfo;
+
+/*----------------
+ * ExecGetLockRelsContext
+ *
+ * Information pertaining to ExecutorGetLockRels() invocation for a given
+ * plan.
+ */
+typedef struct ExecGetLockRelsContext
+{
+ NodeTag type;
+
+ PlannedStmt *stmt; /* target plan */
+ ParamListInfo params; /* EXTERN parameters available for pruning */
+
+ /* Output parameters for ExecGetLockRels and its subroutines. */
+ Bitmapset *lockrels;
+
+ /* See the comment in the definition of the ExecLockRelsInfo struct. */
+ List *initPruningOutputs;
+ int *ipoIndexes;
+} ExecGetLockRelsContext;
+
+/*
+ * Appends the provided PlanInitPruningOutput to
+ * ExecGetLockRelsContext.initPruningOutputs
+ */
+#define ExecStorePlanInitPruningOutput(cxt, initPruningOutput, plannode) \
+ do { \
+ (cxt)->initPruningOutputs = lappend((cxt)->initPruningOutputs, initPruningOutput); \
+ (cxt)->ipoIndexes[(plannode)->plan_node_id] = list_length((cxt)->initPruningOutputs); \
+ } while (0)
+
+/*
+ * Finds the PlanInitPruningOutput for a given Plan node in
+ * ExecLockRelsInfo.initPruningOutputs.
+ */
+#define ExecFetchPlanInitPruningOutput(execlockrelsinfo, plannode) \
+ (((execlockrelsinfo) != NULL && (execlockrelsinfo)->initPruningOutputs != NIL) ? \
+ list_nth((execlockrelsinfo)->initPruningOutputs, \
+ (execlockrelsinfo)->ipoIndexes[(plannode)->plan_node_id] - 1) : NULL)
+
+/* ---------------
+ * PlanInitPruningOutput
+ *
+ * Node to remember the result of performing initial partition pruning steps
+ * during ExecutorGetLockRels() on nodes that support pruning.
+ *
+ * ExecLockRelsDoInitPruning(), which runs during ExecutorGetLockRels(),
+ * creates it and stores it in the corresponding ExecLockRelsInfo.
+ *
+ * ExecInitPartitionPruning(), which runs during ExecutorStart(), fetches it
+ * from the EState's ExecLockRelsInfo (if any) and uses the value of
+ * initially_valid_subplans contained in it as-is to select the subplans to be
+ * initialized for execution, instead of re-evaluating that by performing
+ * initial pruning again.
+ */
+typedef struct PlanInitPruningOutput
+{
+ NodeTag type;
+
+ Bitmapset *initially_valid_subplans;
+} PlanInitPruningOutput;
+
/* ----------------
* PlanState node
*
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 05f0b79e82..00c4d8293e 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -96,6 +96,11 @@ typedef enum NodeTag
T_PartitionPruneStepCombine,
T_PlanInvalItem,
+ /* TAGS FOR EXECUTOR PREP NODES (execnodes.h) */
+ T_ExecGetLockRelsContext,
+ T_ExecLockRelsInfo,
+ T_PlanInitPruningOutput,
+
/*
* TAGS FOR PLAN STATE NODES (execnodes.h)
*
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 5327d9ba8b..019719c1a4 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -129,6 +129,10 @@ typedef struct PlannerGlobal
char maxParallelHazard; /* worst PROPARALLEL hazard level */
+ bool containsInitialPruning; /* Do some Plan nodes in the tree
+ * have initial (pre-exec) pruning
+ * steps? */
+
PartitionDirectory partition_directory; /* partition descriptors */
Bitmapset *elidedAppendPartedRels; /* Combined partitioned_rels of all
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index bd87c35d6c..bfdb5bbf28 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -59,10 +59,16 @@ typedef struct PlannedStmt
bool parallelModeNeeded; /* parallel mode required to execute? */
+ bool containsInitialPruning; /* Do some Plan nodes in the tree
+ * have initial (pre-exec) pruning
+ * steps? */
+
int jitFlags; /* which forms of JIT should be performed */
struct Plan *planTree; /* tree of Plan nodes */
+ int numPlanNodes; /* number of nodes in planTree */
+
List *rtable; /* list of RangeTblEntry nodes */
/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
@@ -1189,6 +1195,13 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * needs_init_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its initial_pruning_steps set?
+ *
+ * needs_exec_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its exec_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1197,6 +1210,8 @@ typedef struct PartitionPruneInfo
{
NodeTag type;
List *prune_infos;
+ bool needs_init_pruning;
+ bool needs_exec_pruning;
Bitmapset *other_subplans;
} PartitionPruneInfo;
diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h
index 92291a750d..bf80c53bed 100644
--- a/src/include/tcop/tcopprot.h
+++ b/src/include/tcop/tcopprot.h
@@ -64,7 +64,7 @@ extern PlannedStmt *pg_plan_query(Query *querytree, const char *query_string,
ParamListInfo boundParams);
extern List *pg_plan_queries(List *querytrees, const char *query_string,
int cursorOptions,
- ParamListInfo boundParams);
+ ParamListInfo boundParams, List **execlockrelsinfo_list);
extern bool check_max_stack_depth(int *newval, void **extra, GucSource source);
extern void assign_max_stack_depth(int newval, void *extra);
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 95b99e3d25..56b0dcc6bd 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -148,6 +148,9 @@ typedef struct CachedPlan
{
int magic; /* should equal CACHEDPLAN_MAGIC */
List *stmt_list; /* list of PlannedStmts */
+ List *execlockrelsinfo_list; /* list of ExecLockRelsInfo nodes with one
+ * element for each of stmt_list; NIL
+ * if not a generic plan */
bool is_oneshot; /* is it a "oneshot" plan? */
bool is_saved; /* is CachedPlan in a long-lived context? */
bool is_valid; /* is the stmt_list currently valid? */
@@ -158,6 +161,9 @@ typedef struct CachedPlan
int generation; /* parent's generation number for this plan */
int refcount; /* count of live references to this struct */
MemoryContext context; /* context containing this CachedPlan */
+ MemoryContext execlockrelsinfo_context; /* context containing
+ * execlockrelsinfo_list,
+ * a child of the above context */
} CachedPlan;
/*
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aeddbdafe5..9abace6734 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -137,6 +137,10 @@ typedef struct PortalData
CommandTag commandTag; /* command tag for original query */
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
+ List *execlockrelsinfos; /* list of ExecLockRelsInfo nodes with one element
+ * for each of 'stmts'; same as
+ * cplan->execlockrelsinfo_list if cplan is
+ * not NULL */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
ParamListInfo portalParams; /* params to pass to query */
@@ -241,6 +245,7 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *execlockrelsinfos,
CachedPlan *cplan);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
--
2.24.1
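To illustrate the ExecLockRelsInfo bookkeeping added to execnodes.h above: a
plan node's plan_node_id selects an entry in 'ipoIndexes', which in turn is a
1-based index into the 'initPruningOutputs' list, with 0 meaning that no
initial pruning was done for that node. Here is a minimal sketch, where
lookup_init_pruning_output() is a hypothetical function form of the
ExecFetchPlanInitPruningOutput macro:

static PlanInitPruningOutput *
lookup_init_pruning_output(ExecLockRelsInfo *info, Plan *plannode)
{
	int			idx;

	if (info == NULL || info->initPruningOutputs == NIL)
		return NULL;

	idx = info->ipoIndexes[plannode->plan_node_id];
	if (idx == 0)
		return NULL;			/* node needed no initial pruning */

	return (PlanInitPruningOutput *)
		list_nth(info->initPruningOutputs, idx - 1);
}

Note that, unlike this sketch, the macro form leaves the zero-index case to
its callers, which only fetch outputs for nodes known to have one.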
On Mon, Mar 28, 2022 at 4:28 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Mon, Mar 28, 2022 at 4:17 PM Amit Langote <amitlangote09@gmail.com> wrote:
Other than the changes mentioned above, the updated patch now contains
a bit more commentary than earlier versions, mostly around
AcquireExecutorLocks()'s new way of determining the set of relations
to lock and the significantly redesigned working of the "initial"
execution pruning.
Forgot to rebase over the latest HEAD, so here's v7. Also fixed the
_out and _read functions for PlanInitPruningOutput, which were using an
obsolete node label.
Rebased.
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v8-0004-Optimize-AcquireExecutorLocks-to-skip-pruned-part.patchapplication/octet-stream; name=v8-0004-Optimize-AcquireExecutorLocks-to-skip-pruned-part.patchDownload
From 9e0ae8887a9f3d75feb4df969dde504a21d3700d Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v8 4/4] Optimize AcquireExecutorLocks() to skip pruned
partitions
Instead of locking all relations listed in the range table in the
cases where the PlannedStmt indicates that some nodes in the plan
tree can do partition pruning without depending on execution having
started (so-called "initial" pruning), AcquireExecutorLocks() now
calls the new executor function ExecutorGetLockRels(), which returns
the set of relations (their RT indexes) to be locked, not including
those scanned by subplans that were pruned.
The result of pruning done this way must be remembered and reused
during actual execution of the plan, which is done by creating a
PlanInitPruningOutput node for each plan node that undergoes
pruning; the set of those for the whole plan tree is added to an
ExecLockRelsInfo, which also stores the bitmapset of RT indexes of
relations that are actually locked by AcquireExecutorLocks().
ExecLockRelsInfos are passed down the executor alongside the
PlannedStmts. This arrangement ensures that the executor doesn't
accidentally try to process plan tree subnodes that have been
deemed pruned by AcquireExecutorLocks().
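To make the intended control flow concrete, here is a simplified sketch of
how AcquireExecutorLocks() can use ExecutorGetLockRels(); it is not the
patch's actual plancache.c code (which loops over a whole stmt_list), and
lock_all_rangetable_rels() is a hypothetical stand-in for the old
lock-everything behavior:

static ExecLockRelsInfo *
acquire_executor_locks_sketch(PlannedStmt *plannedstmt, ParamListInfo params)
{
	ExecLockRelsInfo *info = NULL;

	if (plannedstmt->containsInitialPruning)
	{
		int			rti = -1;

		/* Perform initial pruning to get the minimal set of RT indexes. */
		info = ExecutorGetLockRels(plannedstmt, params);

		/* Lock only the relations that survived initial pruning. */
		while ((rti = bms_next_member(info->lockrels, rti)) >= 0)
		{
			RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);

			if (rte->rtekind == RTE_RELATION)
				LockRelationOid(rte->relid, rte->rellockmode);
		}
	}
	else
		lock_all_rangetable_rels(plannedstmt);	/* hypothetical helper */

	/*
	 * The result is handed to CreateQueryDesc() so that the executor can
	 * reuse the recorded pruning results instead of redoing them.
	 */
	return info;
}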
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 13 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 17 +-
src/backend/executor/README | 24 +++
src/backend/executor/execMain.c | 202 ++++++++++++++++++++
src/backend/executor/execParallel.c | 26 ++-
src/backend/executor/execPartition.c | 224 ++++++++++++++++++----
src/backend/executor/execUtils.c | 8 +
src/backend/executor/functions.c | 2 +-
src/backend/executor/nodeAppend.c | 52 ++++-
src/backend/executor/nodeMergeAppend.c | 52 ++++-
src/backend/executor/nodeModifyTable.c | 25 +++
src/backend/executor/spi.c | 14 +-
src/backend/nodes/copyfuncs.c | 49 ++++-
src/backend/nodes/outfuncs.c | 39 ++++
src/backend/nodes/readfuncs.c | 37 ++++
src/backend/optimizer/plan/planner.c | 2 +
src/backend/optimizer/plan/setrefs.c | 6 +
src/backend/partitioning/partprune.c | 37 +++-
src/backend/tcop/postgres.c | 15 +-
src/backend/tcop/pquery.c | 21 ++-
src/backend/utils/cache/plancache.c | 252 ++++++++++++++++++++++---
src/backend/utils/mmgr/portalmem.c | 2 +
src/include/commands/explain.h | 3 +-
src/include/executor/execPartition.h | 2 +
src/include/executor/execdesc.h | 2 +
src/include/executor/executor.h | 2 +
src/include/executor/nodeAppend.h | 1 +
src/include/executor/nodeMergeAppend.h | 1 +
src/include/executor/nodeModifyTable.h | 1 +
src/include/nodes/execnodes.h | 96 ++++++++++
src/include/nodes/nodes.h | 5 +
src/include/nodes/pathnodes.h | 4 +
src/include/nodes/plannodes.h | 15 ++
src/include/tcop/tcopprot.h | 2 +-
src/include/utils/plancache.h | 6 +
src/include/utils/portal.h | 5 +
41 files changed, 1174 insertions(+), 104 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 55c38b04c4..d403eb2309 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -542,7 +542,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 9abbb6b555..f6607f2454 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index cb13227db1..e5dff2bc25 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -407,7 +407,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
@@ -515,7 +515,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, ExecLockRelsInfo *execlockrelsinfo,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
@@ -563,7 +564,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, execlockrelsinfo, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 1013790dbb..008b8ce0e9 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -741,8 +741,10 @@ execute_sql_string(const char *sql)
RawStmt *parsetree = lfirst_node(RawStmt, lc1);
MemoryContext per_parsetree_context,
oldcontext;
- List *stmt_list;
- ListCell *lc2;
+ List *stmt_list,
+ *execlockrelsinfo_list;
+ ListCell *lc2,
+ *lc3;
/*
* We do the work for each parsetree in a short-lived context, to
@@ -762,11 +764,13 @@ execute_sql_string(const char *sql)
NULL,
0,
NULL);
- stmt_list = pg_plan_queries(stmt_list, sql, CURSOR_OPT_PARALLEL_OK, NULL);
+ stmt_list = pg_plan_queries(stmt_list, sql, CURSOR_OPT_PARALLEL_OK, NULL,
+ &execlockrelsinfo_list);
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, execlockrelsinfo_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ ExecLockRelsInfo *execlockrelsinfo = lfirst_node(ExecLockRelsInfo, lc3);
CommandCounterIncrement();
@@ -777,6 +781,7 @@ execute_sql_string(const char *sql)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ execlockrelsinfo,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 05e7b60059..4ef44aaf23 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -416,7 +416,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 9902c5c566..85e73ddded 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -107,6 +107,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ list_make1(NULL), /* no ExecLockRelsInfo to pass */
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 80738547ed..bbbf8bbcbd 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -155,6 +155,7 @@ ExecuteQuery(ParseState *pstate,
PreparedStatement *entry;
CachedPlan *cplan;
List *plan_list;
+ List *plan_execlockrelsinfo_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
Portal portal;
@@ -195,6 +196,7 @@ ExecuteQuery(ParseState *pstate,
/* Replan if needed, and increment plan refcount for portal */
cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
plan_list = cplan->stmt_list;
+ plan_execlockrelsinfo_list = cplan->execlockrelsinfo_list;
/*
* DO NOT add any logic that could possibly throw an error between
@@ -204,7 +206,7 @@ ExecuteQuery(ParseState *pstate,
NULL,
query_string,
entry->plansource->commandTag,
- plan_list,
+ plan_list, plan_execlockrelsinfo_list,
cplan);
/*
@@ -576,7 +578,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
- ListCell *p;
+ List *plan_execlockrelsinfo_list;
+ ListCell *p,
+ *pe;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -632,15 +636,18 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
plan_list = cplan->stmt_list;
+ plan_execlockrelsinfo_list = cplan->execlockrelsinfo_list;
/* Explain each query */
- foreach(p, plan_list)
+ forboth(p, plan_list, pe, plan_execlockrelsinfo_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ ExecLockRelsInfo *execlockrelsinfo = lfirst_node(ExecLockRelsInfo, pe);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ ExplainOnePlan(pstmt, execlockrelsinfo, into, es, query_string,
+ paramLI, queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 0b5183fc4a..b45ca508a8 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -65,6 +65,27 @@ found there. This currently only occurs for Append and MergeAppend nodes. In
this case the non-required subplans are ignored and the executor state's
subnode array will become out of sequence to the plan's subplan list.
+Actually, so-called execution-time pruning may also occur even before
+execution has started. One case where that occurs is when a cached generic
+plan is being validated for execution by plancache.c: GetCachedPlan(), which
+proceeds by locking all the relations that will be scanned by that plan. If
+the generic plan has nodes that contain so-called initial pruning steps (a
+subset of execution pruning steps that do not depend on full-fledged execution
+having started), they are performed at this point to figure out the minimal
+set of child subplans that satisfy those pruning instructions, and the result
+of performing that pruning is saved in a data structure that gets passed to
+the executor alongside the plan tree. Only relations scanned by those
+surviving subplans are then locked, while those scanned by the pruned subplans
+are not, even though the pruned subplans themselves are not removed from the
+plan tree. So, it is imperative that the executor and any third party code
+invoked by it that gets passed the plan tree look at the initial pruning result
+made available via the aforementioned data structure to determine whether or
+not a particular subplan is valid. (The data structure basically consists of
+an array of PlanInitPruningOutput nodes, one element for each node of the
+plan tree, indexable using the plan_node_id of the individual plan nodes;
+each PlanInitPruningOutput contains a bitmapset of the indexes of the
+unpruned child subplans of the corresponding node.)
+
Each Plan node may have expression trees associated with it, to represent
its target list, qualification conditions, etc. These trees are also
read-only to the executor, but the executor state for expression evaluation
@@ -286,6 +307,9 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ [ ExecutorGetLockRels ] --- an optional step to walk over the plan tree
+ to produce an ExecLockRelsInfo to be passed to CreateQueryDesc
+
CreateQueryDesc
ExecutorStart
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ef2fd46092..56946c12dd 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -49,11 +49,15 @@
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/execdebug.h"
+#include "executor/nodeAppend.h"
+#include "executor/nodeMergeAppend.h"
+#include "executor/nodeModifyTable.h"
#include "executor/nodeSubplan.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/nodeFuncs.h"
#include "parser/parsetree.h"
#include "storage/bufmgr.h"
#include "storage/lmgr.h"
@@ -101,9 +105,205 @@ static char *ExecBuildSlotValueDescription(Oid reloid,
Bitmapset *modifiedCols,
int maxfieldlen);
static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
+static bool ExecGetScanLockRels(Scan *scan, ExecGetLockRelsContext *context);
/* end of local decls */
+/* ----------------------------------------------------------------
+ * ExecutorGetLockRels
+ *
+ * Figure out the minimal set of relations to lock to be able to safely
+ * execute a given plan
+ *
+ * This ignores the relations scanned by child subplans that are pruned away
+ * after performing initial pruning steps present in the plan using the
+ * provided set of EXTERN parameters.
+ *
+ * Along with the set of RT indexes of relations that must be locked, the
+ * returned struct also contains an array of PlanInitPruningOutput nodes, each
+ * of which contains the result of initial pruning for a given plan node, which
+ * is basically a bitmapset of the indexes of surviving child subplans. Each
+ * plan node in the tree that undergoes pruning will have an element in the
+ * array.
+ *
+ * Note that while relations scanned by the subplans that are pruned will not
+ * be locked, the subplans themselves are left as-is in the plan tree, assuming
+ * anything that reads the plan tree during execution knows to ignore them by
+ * looking at the PlanInitPruningOutput's list of valid subplans.
+ *
+ * Partitioned tables mentioned in PartitionedRelPruneInfo nodes that drive
+ * the pruning will be locked before doing the pruning and also added to
+ * the returned set.
+ */
+ExecLockRelsInfo *
+ExecutorGetLockRels(PlannedStmt *plannedstmt, ParamListInfo params)
+{
+ int numPlanNodes = plannedstmt->numPlanNodes;
+ ExecGetLockRelsContext context;
+ ExecLockRelsInfo *result;
+ ListCell *lc;
+
+ /* Only get here if there is any pruning to do. */
+ Assert(plannedstmt->containsInitialPruning);
+
+ context.stmt = plannedstmt;
+ context.params = params;
+
+ /*
+ * Go walk all the plan tree(s) present in the PlannedStmt, filling
+ * context.lockrels with only the relations from plan nodes that
+ * survive initial pruning and also the tables mentioned in
+ * partitioned_rels sets found in the plan.
+ */
+ context.lockrels = NULL;
+ context.initPruningOutputs = NIL;
+ context.ipoIndexes = palloc0(sizeof(int) * numPlanNodes);
+
+ /* All the subplans. */
+ foreach(lc, plannedstmt->subplans)
+ {
+ Plan *subplan = lfirst(lc);
+
+ (void) ExecGetLockRels(subplan, &context);
+ }
+
+ /* And the main tree. */
+ (void) ExecGetLockRels(plannedstmt->planTree, &context);
+
+ /*
+ * Also be sure to lock partitioned relations from any [Merge]Append nodes
+ * that were originally present but were ultimately left out from the plan
+ * due to being deemed no-op nodes.
+ */
+ context.lockrels = bms_add_members(context.lockrels,
+ plannedstmt->elidedAppendPartedRels);
+
+ result = makeNode(ExecLockRelsInfo);
+ result->lockrels = context.lockrels;
+ result->numPlanNodes = numPlanNodes;
+ result->initPruningOutputs = context.initPruningOutputs;
+ result->ipoIndexes = context.ipoIndexes;
+
+ return result;
+}
+
+/* ------------------------------------------------------------------------
+ * ExecGetLockRels
+ * Adds all the relations that will be scanned by 'node' and its child
+ * plans to context->lockrels after taking into account the effect
+ * of performing initial pruning if any
+ *
+ * context->stmt gives the PlannedStmt being inspected to access the plan's
+ * range table if needed and context->params the set of EXTERN parameters
+ * available to evaluate pruning parameters.
+ *
+ * If initial pruning is done, a PlanInitPruningOutput node containing the
+ * result of pruning will be stored in context->initPruningOutputs that will
+ * be made available to the executor to reuse.
+ * ------------------------------------------------------------------------
+ */
+bool
+ExecGetLockRels(Plan *node, ExecGetLockRelsContext *context)
+{
+ /* Do nothing when we get to the end of a leaf of the tree. */
+ if (node == NULL)
+ return true;
+
+ /* Make sure there's enough stack available. */
+ check_stack_depth();
+
+ switch (nodeTag(node))
+ {
+ /* Currently, only these two nodes have prunable child subplans. */
+ case T_Append:
+ if (ExecGetAppendLockRels((Append *) node, context))
+ return true;
+ break;
+ case T_MergeAppend:
+ if (ExecGetMergeAppendLockRels((MergeAppend *) node,
+ context))
+ return true;
+ break;
+
+ /*
+ * And these manipulate relations that must be added to context->lockrels.
+ */
+ case T_SeqScan:
+ case T_SampleScan:
+ case T_IndexScan:
+ case T_IndexOnlyScan:
+ case T_BitmapIndexScan:
+ case T_BitmapHeapScan:
+ case T_TidScan:
+ case T_TidRangeScan:
+ case T_ForeignScan:
+ case T_SubqueryScan:
+ case T_CustomScan:
+ if (ExecGetScanLockRels((Scan *) node, context))
+ return true;
+ break;
+ case T_ModifyTable:
+ if (ExecGetModifyTableLockRels((ModifyTable *) node, context))
+ return true;
+ /* plan_tree_walker() will visit the subplan (outerNode) */
+ break;
+
+ default:
+ break;
+ }
+
+ /* Recurse to subnodes. */
+ return plan_tree_walker(node, ExecGetLockRels, (void *) context);
+}
+
+/*
+ * ExecGetScanLockRels
+ * Do ExecGetLockRels()'s work for a leaf Scan node
+ */
+static bool
+ExecGetScanLockRels(Scan *scan, ExecGetLockRelsContext *context)
+{
+ switch (nodeTag(scan))
+ {
+ case T_ForeignScan:
+ {
+ ForeignScan *fscan = (ForeignScan *) scan;
+
+ context->lockrels = bms_add_members(context->lockrels,
+ fscan->fs_relids);
+ }
+ break;
+
+ case T_SubqueryScan:
+ {
+ SubqueryScan *sscan = (SubqueryScan *) scan;
+
+ (void) ExecGetLockRels((Plan *) sscan->subplan, context);
+ }
+ break;
+
+ case T_CustomScan:
+ {
+ CustomScan *cscan = (CustomScan *) scan;
+ ListCell *lc;
+
+ context->lockrels = bms_add_members(context->lockrels,
+ cscan->custom_relids);
+ foreach(lc, cscan->custom_plans)
+ {
+ (void) ExecGetLockRels((Plan *) lfirst(lc), context);
+ }
+ }
+ break;
+
+ default:
+ context->lockrels = bms_add_member(context->lockrels,
+ scan->scanrelid);
+ break;
+ }
+
+ return true;
+}
/* ----------------------------------------------------------------
* ExecutorStart
@@ -806,6 +1006,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ ExecLockRelsInfo *execlockrelsinfo = queryDesc->execlockrelsinfo;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -825,6 +1026,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable);
estate->es_plannedstmt = plannedstmt;
+ estate->es_execlockrelsinfo = execlockrelsinfo;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 9a0d5d59ef..fb6dbd298a 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -66,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_EXECLOCKRELSINFO UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -182,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->containsInitialPruning = false;
pstmt->planTree = plan;
pstmt->rtable = estate->es_range_table;
pstmt->resultRelations = NIL;
@@ -596,12 +598,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *execlockrelsinfo_data;
+ char *execlockrelsinfo_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int execlockrelsinfo_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -630,6 +635,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ execlockrelsinfo_data = nodeToString(estate->es_execlockrelsinfo);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -656,6 +662,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized ExecLockRelsInfo. */
+ execlockrelsinfo_len = strlen(execlockrelsinfo_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, execlockrelsinfo_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -750,6 +761,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized ExecLockRelsInfo */
+ execlockrelsinfo_space = shm_toc_allocate(pcxt->toc, execlockrelsinfo_len);
+ memcpy(execlockrelsinfo_space, execlockrelsinfo_data, execlockrelsinfo_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_EXECLOCKRELSINFO,
+ execlockrelsinfo_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1231,8 +1248,10 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *execlockrelsinfospace;
char *paramspace;
PlannedStmt *pstmt;
+ ExecLockRelsInfo *execlockrelsinfo;
ParamListInfo paramLI;
char *queryString;
@@ -1243,12 +1262,17 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
pstmtspace = shm_toc_lookup(toc, PARALLEL_KEY_PLANNEDSTMT, false);
pstmt = (PlannedStmt *) stringToNode(pstmtspace);
+ /* Reconstruct leader-supplied ExecLockRelsInfo. */
+ execlockrelsinfospace = shm_toc_lookup(toc, PARALLEL_KEY_EXECLOCKRELSINFO,
+ false);
+ execlockrelsinfo = (ExecLockRelsInfo *) stringToNode(execlockrelsinfospace);
+
/* Reconstruct ParamListInfo. */
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
/* Create a QueryDesc for the query. */
- return CreateQueryDesc(pstmt,
+ return CreateQueryDesc(pstmt, execlockrelsinfo,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 84b4e4b3d6..e79ada16f0 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -25,6 +25,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -185,8 +186,13 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *partitionpruneinfo);
-static Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate);
+ PartitionPruneInfo *partitionpruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir);
+static Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
+ PartitionPruneInfo *pruneinfo);
static void ExecInitPruningContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -1588,8 +1594,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* considered to be a stable expression, it can change value from one plan
* node scan to the next during query execution. Stable comparison
* expressions that don't involve such Params allow partition pruning to be
- * done once during executor startup. Expressions that do involve such Params
- * require us to prune separately for each scan of the parent plan node.
+ * done once during executor startup, or even before that during ExecutorGetLockRels().
+ * Expressions that do involve such Params require us to prune separately for
+ * each scan of the parent plan node.
*
* Note that pruning away unneeded subplans during executor startup has the
* added benefit of not having to initialize the unneeded subplans at all.
@@ -1601,10 +1608,17 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* Creates the PartitionPruneState required by each of the two pruning
* functions. Details stored include how to map the partition index
* returned by the partition pruning code into subplan indexes. Also
- * determines the set of initially valid subplans by performing initial
- * pruning steps, only which need be initialized by the caller such as
- * ExecInitAppend. Maps in PartitionPruneState are updated to account
- * for initial pruning having eliminated some of the subplans, if any.
+ * determines the set of initially valid subplans by either looking it
+ * up in the plan node's PlanInitPruningOutput, if one is found in
+ * EState.es_execlockrelsinfo, or by performing initial pruning steps.
+ * Only the subplans included in that set need be initialized by the caller
+ * such as ExecInitAppend. Maps in PartitionPruneState are updated to
+ * account for initial pruning having eliminated some of the subplans,
+ * if any.
+ *
+ * ExecGetLockRelsDoInitialPruning:
+ * Do initial pruning as part of ExecGetLockRels() on the parent plan
+ * node
*
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating all available
@@ -1619,9 +1633,10 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* ExecInitPartitionPruning
* Initialize data structure needed for run-time partition pruning
*
- * Initial pruning can be done immediately, so it is done here if needed and
- * the set of surviving partition subplans' indexes are added to the output
- * parameter *initially_valid_subplans.
+ * Initial pruning can be done immediately, so it is done here unless it has
+ * already been done by ExecGetLockRelsDoInitialPruning(), and the set of
+ * surviving partition subplans' indexes are added to the output parameter
+ * *initially_valid_subplans.
*
* If subplans are indeed pruned, subplan_map arrays contained in the returned
* PartitionPruneState are re-sequenced to not count those, though only if the
@@ -1635,22 +1650,57 @@ ExecInitPartitionPruning(PlanState *planstate,
{
PartitionPruneState *prunestate;
EState *estate = planstate->state;
+ Plan *plan = planstate->plan;
+ PlanInitPruningOutput *initPruningOutput = NULL;
+ bool do_pruning = (pruneinfo->needs_init_pruning ||
+ pruneinfo->needs_exec_pruning);
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, planstate);
+ /* Retrieve the parent plan's PlanInitPruningOutput, if any. */
+ if (estate->es_execlockrelsinfo)
+ {
+ initPruningOutput = (PlanInitPruningOutput *)
+ ExecFetchPlanInitPruningOutput(estate->es_execlockrelsinfo, plan);
- /*
- * Create the working data structure for pruning.
- */
- prunestate = ExecCreatePartitionPruneState(planstate, pruneinfo);
+ Assert(initPruningOutput != NULL &&
+ IsA(initPruningOutput, PlanInitPruningOutput));
+ /* No need to do initial pruning again, only exec pruning. */
+ do_pruning = pruneinfo->needs_exec_pruning;
+ }
+
+ prunestate = NULL;
+ if (do_pruning)
+ {
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
+
+ /* For data reading, executor always omits detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
+ /*
+ * Create the working data structure for pruning. No need to consider
+ * initial pruning steps if we have a PlanInitPruningOutput.
+ */
+ prunestate = ExecCreatePartitionPruneState(planstate, pruneinfo,
+ initPruningOutput == NULL, true,
+ NIL, planstate->ps_ExprContext,
+ estate->es_partition_directory);
+ }
/*
* Perform an initial partition prune, if required.
*/
- if (prunestate->do_initial_prune)
+ if (initPruningOutput)
+ {
+ /* ExecGetLockRelsDoInitialPruning() already did it for us! */
+ *initially_valid_subplans = initPruningOutput->initially_valid_subplans;
+ }
+ else if (prunestate && prunestate->do_initial_prune)
{
/* Determine which subplans survive initial pruning */
- *initially_valid_subplans = ExecFindInitialMatchingSubPlans(prunestate);
+ *initially_valid_subplans = ExecFindInitialMatchingSubPlans(prunestate,
+ pruneinfo);
}
else
{
@@ -1668,7 +1718,7 @@ ExecInitPartitionPruning(PlanState *planstate,
* invalid data in prunestate, because that data won't be consulted again
* (cf initial Assert in ExecFindMatchingSubPlans).
*/
- if (prunestate->do_exec_prune &&
+ if (prunestate && prunestate->do_exec_prune &&
bms_num_members(*initially_valid_subplans) < n_total_subplans)
PartitionPruneStateFixSubPlanMap(prunestate,
*initially_valid_subplans,
@@ -1677,12 +1727,75 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecGetLockRelsDoInitialPruning
+ * Perform initial pruning as part of doing ExecGetLockRels() on the parent
+ * plan node
+ */
+Bitmapset *
+ExecGetLockRelsDoInitialPruning(Plan *plan, ExecGetLockRelsContext *context,
+ PartitionPruneInfo *pruneinfo)
+{
+ List *rtable = context->stmt->rtable;
+ ParamListInfo params = context->params;
+ ExprContext *econtext;
+ PartitionDirectory pdir;
+ MemoryContext oldcontext,
+ tmpcontext;
+ PartitionPruneState *prunestate;
+ PlanInitPruningOutput *initPruningOutput;
+
+ /*
+ * A temporary context to allocate stuff needed to run the pruning steps.
+ */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "initial pruning working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
+
+ /*
+ * PartitionDirectory to look up partition descriptors, which omits
+ * detached partitions, just like in the executor proper.
+ */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+
+ /*
+ * We don't yet have a PlanState for the parent plan node, so must create
+ * a standalone ExprContext to evaluate pruning expressions, equipped with
+ * the information about the EXTERN parameters that the caller passed us.
+ * Note that that's okay because the initial pruning steps do not contain
+ * anything that requires the execution to have started.
+ */
+ econtext = CreateStandaloneExprContext();
+ econtext->ecxt_param_list_info = params;
+ prunestate = ExecCreatePartitionPruneState(NULL, pruneinfo,
+ true, false,
+ rtable, econtext,
+ pdir);
+ MemoryContextSwitchTo(oldcontext);
+
+ /* Do the pruning and populate a PlanInitPruningOutput for this node. */
+ initPruningOutput = makeNode(PlanInitPruningOutput);
+ initPruningOutput->initially_valid_subplans =
+ ExecFindInitialMatchingSubPlans(prunestate, pruneinfo);
+ ExecStorePlanInitPruningOutput(context, initPruningOutput, plan);
+
+ FreeExprContext(econtext, true);
+ DestroyPartitionDirectory(pdir);
+ MemoryContextDelete(tmpcontext);
+
+ return initPruningOutput->initially_valid_subplans;
+}
+
/*
* ExecCreatePartitionPruneState
* Build the data structure required for calling
* ExecFindInitialMatchingSubPlans and ExecFindMatchingSubPlans.
*
- * 'planstate' is the parent plan node's execution state.
+ * 'planstate', if not NULL, is the parent plan node's execution state. It
+ * can be NULL if being called before ExecutorStart(), in which case,
+ * 'rtable' (range table), 'econtext', and 'partdir' must be explicitly
+ * provided.
*
* 'partitionpruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
@@ -1697,19 +1810,20 @@ ExecInitPartitionPruning(PlanState *planstate,
*/
static PartitionPruneState *
ExecCreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *partitionpruneinfo)
+ PartitionPruneInfo *partitionpruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir)
{
- EState *estate = planstate->state;
+ EState *estate = planstate ? planstate->state : NULL;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
- /* For data reading, executor always omits detached partitions */
- if (estate->es_partition_directory == NULL)
- estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
+ Assert((estate != NULL) ||
+ (partdir != NULL && econtext != NULL && rtable != NIL));
n_part_hierarchies = list_length(partitionpruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
@@ -1760,19 +1874,48 @@ ExecCreatePartitionPruneState(PlanState *planstate,
PartitionedRelPruneInfo *pinfo = lfirst_node(PartitionedRelPruneInfo, lc2);
PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
Relation partrel;
+ bool close_partrel = false;
PartitionDesc partdesc;
PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * Must open the relation by ourselves when called before the
+ * execution has started, such as when called during
+ * ExecutorGetLockRels() on a cached plan. In that case,
+ * sub-partitions must be locked, because AcquirePlannerLocks()
+ * would not have seen them. (1st relation in a partrelpruneinfos
+ * list is always the root partitioned table appearing in the
+ * query, which AcquirePlannerLocks() would have locked; the
+ * Assert in relation_open() guards that assumption.)
+ */
+ if (estate == NULL)
+ {
+ RangeTblEntry *rte = rt_fetch(pinfo->rtindex, rtable);
+ int lockmode = (j == 0) ? NoLock : rte->rellockmode;
+
+ partrel = table_open(rte->relid, lockmode);
+ close_partrel = true;
+ }
+ else
+ partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+
+ /*
+ * We can rely on the copy of the partitioned table's partition
+ * key in its relcache entry, because it can't change (or
+ * get destroyed) as long as the relation is locked. The partition
+ * descriptor is taken from the PartitionDirectory associated with
+ * the table, which is kept open long enough for the descriptor to
+ * remain valid while it's used to perform the pruning steps.
*/
- partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
partkey = RelationGetPartitionKey(partrel);
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partdesc = PartitionDirectoryLookup(partdir, partrel);
+
+ /*
+ * Must close partrel, keeping the lock taken, if we're not using
+ * EState's entry.
+ */
+ if (close_partrel)
+ table_close(partrel, NoLock);
/*
* Initialize the subplan_map and subpart_map.
@@ -1874,7 +2017,7 @@ ExecCreatePartitionPruneState(PlanState *planstate,
* Initialize pruning contexts as needed.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
- if (pinfo->initial_pruning_steps)
+ if (consider_initial_steps && pinfo->initial_pruning_steps)
{
ExecInitPruningContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
@@ -1884,7 +2027,7 @@ ExecCreatePartitionPruneState(PlanState *planstate,
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps)
+ if (consider_exec_steps && pinfo->exec_pruning_steps)
{
ExecInitPruningContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
@@ -1998,7 +2141,8 @@ ExecInitPruningContext(PartitionPruneContext *context,
* is required.
*/
static Bitmapset *
-ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate)
+ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
+ PartitionPruneInfo *pruneinfo)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2008,8 +2152,8 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate)
Assert(prunestate->do_initial_prune);
/*
- * Switch to a temp context to avoid leaking memory in the executor's
- * query-lifespan memory context.
+ * Switch to a temp context to avoid leaking memory in the longer-term
+ * memory context.
*/
oldcontext = MemoryContextSwitchTo(prunestate->prune_context);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9df1f81ea8..7246f9175f 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -119,6 +119,7 @@ CreateExecutorState(void)
estate->es_relations = NULL;
estate->es_rowmarks = NULL;
estate->es_plannedstmt = NULL;
+ estate->es_execlockrelsinfo = NULL;
estate->es_junkFilter = NULL;
@@ -785,6 +786,13 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rti > 0 && rti <= estate->es_range_table_size);
+ /*
+ * A cross-check that AcquireExecutorLocks() hasn't skipped locking any
+ * relation that the executor actually needs to touch.
+ */
+ Assert(estate->es_execlockrelsinfo == NULL ||
+ bms_is_member(rti, estate->es_execlockrelsinfo->lockrels));
+
rel = estate->es_relations[rti - 1];
if (rel == NULL)
{
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index f9460ae506..a2182a6b1f 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -844,7 +844,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
- es->qd = CreateQueryDesc(es->stmt,
+ es->qd = CreateQueryDesc(es->stmt, NULL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 5b6d3eb23b..9c6f907687 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -94,6 +94,55 @@ static bool ExecAppendAsyncRequest(AppendState *node, TupleTableSlot **result);
static void ExecAppendAsyncEventWait(AppendState *node);
static void classify_matching_subplans(AppendState *node);
+/* ----------------------------------------------------------------
+ * ExecGetAppendLockRels
+ * Do ExecGetLockRels()'s work for an Append plan
+ * ----------------------------------------------------------------
+ */
+bool
+ExecGetAppendLockRels(Append *node, ExecGetLockRelsContext *context)
+{
+ PartitionPruneInfo *pruneinfo = node->part_prune_info;
+
+ /*
+ * Must always lock all the partitioned tables whose direct and indirect
+ * partitions will be scanned by this Append.
+ */
+ context->lockrels = bms_add_members(context->lockrels,
+ node->partitioned_rels);
+
+ /*
+ * Now recurse to subplans to add relations scanned therein.
+ *
+ * If initial pruning can be done, do that now and only recurse to the
+ * surviving subplans.
+ */
+ if (pruneinfo && pruneinfo->needs_init_pruning)
+ {
+ List *subplans = node->appendplans;
+ Bitmapset *validsubplans;
+ int i;
+
+ validsubplans = ExecGetLockRelsDoInitialPruning((Plan *) node,
+ context, pruneinfo);
+
+ /* Recurse to surviving subplans. */
+ i = -1;
+ while ((i = bms_next_member(validsubplans, i)) >= 0)
+ {
+ Plan *subplan = list_nth(subplans, i);
+
+ (void) ExecGetLockRels(subplan, context);
+ }
+
+ /* done with this node */
+ return true;
+ }
+
+ /* Tell the caller to recurse to *all* the subplans. */
+ return false;
+}
+
/* ----------------------------------------------------------------
* ExecInitAppend
*
@@ -155,7 +204,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (appendstate->as_prune_state == NULL ||
+ (!appendstate->as_prune_state->do_exec_prune && nplans > 0))
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 9a9f29e845..4b04fcdbc2 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -54,6 +54,55 @@ typedef int32 SlotNumber;
static TupleTableSlot *ExecMergeAppend(PlanState *pstate);
static int heap_compare_slots(Datum a, Datum b, void *arg);
+/* ----------------------------------------------------------------
+ * ExecGetMergeAppendLockRels
+ * Do ExecGetLockRels()'s work for a MergeAppend plan
+ * ----------------------------------------------------------------
+ */
+bool
+ExecGetMergeAppendLockRels(MergeAppend *node, ExecGetLockRelsContext *context)
+{
+ PartitionPruneInfo *pruneinfo = node->part_prune_info;
+
+ /*
+ * Must always lock all the partitioned tables whose direct and indirect
+ * partitions will be scanned by this MergeAppend.
+ */
+ context->lockrels = bms_add_members(context->lockrels,
+ node->partitioned_rels);
+
+ /*
+ * Now recurse to subplans to add relations scanned therein.
+ *
+ * If initial pruning can be done, do that now and only recurse to the
+ * surviving subplans.
+ */
+ if (pruneinfo && pruneinfo->needs_init_pruning)
+ {
+ List *subplans = node->mergeplans;
+ Bitmapset *validsubplans;
+ int i;
+
+ validsubplans = ExecGetLockRelsDoInitialPruning((Plan *) node,
+ context, pruneinfo);
+
+ /* Recurse to surviving subplans. */
+ i = -1;
+ while ((i = bms_next_member(validsubplans, i)) >= 0)
+ {
+ Plan *subplan = list_nth(subplans, i);
+
+ (void) ExecGetLockRels(subplan, context);
+ }
+
+ /* done with this node */
+ return true;
+ }
+
+ /* Tell the caller to recurse to *all* the subplans. */
+ return false;
+}
+
/* ----------------------------------------------------------------
* ExecInitMergeAppend
@@ -103,7 +152,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (mergestate->ms_prune_state == NULL ||
+ (!mergestate->ms_prune_state->do_exec_prune && nplans > 0))
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 171575cd73..f17bede367 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -3853,6 +3853,31 @@ ExecLookupResultRelByOid(ModifyTableState *node, Oid resultoid,
return NULL;
}
+/*
+ * ExecGetModifyTableLockRels
+ * Do ExecGetLockRels()'s work for a ModifyTable plan
+ */
+bool
+ExecGetModifyTableLockRels(ModifyTable *plan, ExecGetLockRelsContext *context)
+{
+ ListCell *lc;
+
+ /* First add the result relation RTIs mentioned in the node. */
+ if (plan->rootRelation > 0)
+ context->lockrels = bms_add_member(context->lockrels,
+ plan->rootRelation);
+ context->lockrels = bms_add_member(context->lockrels,
+ plan->nominalRelation);
+ foreach(lc, plan->resultRelations)
+ {
+ context->lockrels = bms_add_member(context->lockrels,
+ lfirst_int(lc));
+ }
+
+ /* Tell the caller to recurse to the subplan (outerPlan(plan)). */
+ return false;
+}
+
/* ----------------------------------------------------------------
* ExecInitModifyTable
* ----------------------------------------------------------------
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 042a5f8b0a..64ebbfb31e 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1578,6 +1578,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
CachedPlanSource *plansource;
CachedPlan *cplan;
List *stmt_list;
+ List *execlockrelsinfo_list;
char *query_string;
Snapshot snapshot;
MemoryContext oldcontext;
@@ -1659,6 +1660,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
/* Replan if needed, and increment plan refcount for portal */
cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
stmt_list = cplan->stmt_list;
+ execlockrelsinfo_list = cplan->execlockrelsinfo_list;
if (!plan->saved)
{
@@ -1670,6 +1672,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
oldcontext = MemoryContextSwitchTo(portal->portalContext);
stmt_list = copyObject(stmt_list);
+ execlockrelsinfo_list = copyObject(execlockrelsinfo_list);
MemoryContextSwitchTo(oldcontext);
ReleaseCachedPlan(cplan, NULL);
cplan = NULL; /* portal shouldn't depend on cplan */
@@ -1683,6 +1686,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
+ execlockrelsinfo_list,
cplan);
/*
@@ -2473,7 +2477,9 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
- ListCell *lc2;
+ List *execlockrelsinfo_list;
+ ListCell *lc2,
+ *lc3;
spicallbackarg.query = plansource->query_string;
@@ -2552,6 +2558,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
plan_owner, _SPI_current->queryEnv);
stmt_list = cplan->stmt_list;
+ execlockrelsinfo_list = cplan->execlockrelsinfo_list;
/*
* If we weren't given a specific snapshot to use, and the statement
@@ -2589,9 +2596,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, execlockrelsinfo_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ ExecLockRelsInfo *execlockrelsinfo = lfirst_node(ExecLockRelsInfo, lc3);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2663,7 +2671,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
else
snap = InvalidSnapshot;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, execlockrelsinfo,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 29c515d7db..afffabbea0 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -68,6 +68,13 @@
} \
} while (0)
+/* Copy a field that is an array with numElem ints */
+#define COPY_INT_ARRAY(fldname, numElem) \
+ do { \
+		newnode->fldname = (numElem) > 0 ? palloc((numElem) * sizeof(int)) : NULL; \
+		if ((numElem) > 0) \
+			memcpy(newnode->fldname, from->fldname, sizeof(int) * (numElem)); \
+ } while (0)
+
/* Copy a parse location field (for Copy, this is same as scalar case) */
#define COPY_LOCATION_FIELD(fldname) \
(newnode->fldname = from->fldname)
@@ -94,8 +101,10 @@ _copyPlannedStmt(const PlannedStmt *from)
COPY_SCALAR_FIELD(transientPlan);
COPY_SCALAR_FIELD(dependsOnRole);
COPY_SCALAR_FIELD(parallelModeNeeded);
+ COPY_SCALAR_FIELD(containsInitialPruning);
COPY_SCALAR_FIELD(jitFlags);
COPY_NODE_FIELD(planTree);
+ COPY_SCALAR_FIELD(numPlanNodes);
COPY_NODE_FIELD(rtable);
COPY_NODE_FIELD(resultRelations);
COPY_NODE_FIELD(appendRelations);
@@ -1282,6 +1291,8 @@ _copyPartitionPruneInfo(const PartitionPruneInfo *from)
PartitionPruneInfo *newnode = makeNode(PartitionPruneInfo);
COPY_NODE_FIELD(prune_infos);
+ COPY_SCALAR_FIELD(needs_init_pruning);
+ COPY_SCALAR_FIELD(needs_exec_pruning);
COPY_BITMAPSET_FIELD(other_subplans);
return newnode;
@@ -5373,6 +5384,33 @@ _copyExtensibleNode(const ExtensibleNode *from)
return newnode;
}
+/* ****************************************************************
+ * execnodes.h copy functions
+ * ****************************************************************
+ */
+static ExecLockRelsInfo *
+_copyExecLockRelsInfo(const ExecLockRelsInfo *from)
+{
+ ExecLockRelsInfo *newnode = makeNode(ExecLockRelsInfo);
+
+ COPY_BITMAPSET_FIELD(lockrels);
+ COPY_SCALAR_FIELD(numPlanNodes);
+ COPY_NODE_FIELD(initPruningOutputs);
+ COPY_INT_ARRAY(ipoIndexes, from->numPlanNodes);
+
+ return newnode;
+}
+
+static PlanInitPruningOutput *
+_copyPlanInitPruningOutput(const PlanInitPruningOutput *from)
+{
+ PlanInitPruningOutput *newnode = makeNode(PlanInitPruningOutput);
+
+ COPY_BITMAPSET_FIELD(initially_valid_subplans);
+
+ return newnode;
+}
+
/* ****************************************************************
* value.h copy functions
* ****************************************************************
@@ -5427,7 +5465,6 @@ _copyBitString(const BitString *from)
return newnode;
}
-
static ForeignKeyCacheInfo *
_copyForeignKeyCacheInfo(const ForeignKeyCacheInfo *from)
{
@@ -6454,6 +6491,16 @@ copyObjectImpl(const void *from)
retval = _copyPublicationTable(from);
break;
+ /*
+ * EXECUTION NODES
+ */
+ case T_ExecLockRelsInfo:
+ retval = _copyExecLockRelsInfo(from);
+ break;
+ case T_PlanInitPruningOutput:
+ retval = _copyPlanInitPruningOutput(from);
+ break;
+
/*
* MISCELLANEOUS NODES
*/
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 108ede9af9..e2d7e6bcac 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -312,8 +312,10 @@ _outPlannedStmt(StringInfo str, const PlannedStmt *node)
WRITE_BOOL_FIELD(transientPlan);
WRITE_BOOL_FIELD(dependsOnRole);
WRITE_BOOL_FIELD(parallelModeNeeded);
+ WRITE_BOOL_FIELD(containsInitialPruning);
WRITE_INT_FIELD(jitFlags);
WRITE_NODE_FIELD(planTree);
+ WRITE_INT_FIELD(numPlanNodes);
WRITE_NODE_FIELD(rtable);
WRITE_NODE_FIELD(resultRelations);
WRITE_NODE_FIELD(appendRelations);
@@ -1008,6 +1010,8 @@ _outPartitionPruneInfo(StringInfo str, const PartitionPruneInfo *node)
WRITE_NODE_TYPE("PARTITIONPRUNEINFO");
WRITE_NODE_FIELD(prune_infos);
+ WRITE_BOOL_FIELD(needs_init_pruning);
+ WRITE_BOOL_FIELD(needs_exec_pruning);
WRITE_BITMAPSET_FIELD(other_subplans);
}
@@ -2818,6 +2822,31 @@ _outExtensibleNode(StringInfo str, const ExtensibleNode *node)
methods->nodeOut(str, node);
}
+/*****************************************************************************
+ *
+ * Stuff from execnodes.h
+ *
+ *****************************************************************************/
+
+static void
+_outExecLockRelsInfo(StringInfo str, const ExecLockRelsInfo *node)
+{
+ WRITE_NODE_TYPE("EXECLOCKRELSINFO");
+
+ WRITE_BITMAPSET_FIELD(lockrels);
+ WRITE_INT_FIELD(numPlanNodes);
+ WRITE_NODE_FIELD(initPruningOutputs);
+ WRITE_INT_ARRAY(ipoIndexes, node->numPlanNodes);
+}
+
+static void
+_outPlanInitPruningOutput(StringInfo str, const PlanInitPruningOutput *node)
+{
+ WRITE_NODE_TYPE("PLANINITPRUNINGOUTPUT");
+
+ WRITE_BITMAPSET_FIELD(initially_valid_subplans);
+}
+
/*****************************************************************************
*
* Stuff from parsenodes.h.
@@ -4720,6 +4749,16 @@ outNode(StringInfo str, const void *obj)
_outJsonItemCoercions(str, obj);
break;
+ /*
+ * EXECUTION NODES
+ */
+ case T_ExecLockRelsInfo:
+ _outExecLockRelsInfo(str, obj);
+ break;
+ case T_PlanInitPruningOutput:
+ _outPlanInitPruningOutput(str, obj);
+ break;
+
default:
/*
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index ce146dd45e..88173f70a1 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1782,8 +1782,10 @@ _readPlannedStmt(void)
READ_BOOL_FIELD(transientPlan);
READ_BOOL_FIELD(dependsOnRole);
READ_BOOL_FIELD(parallelModeNeeded);
+ READ_BOOL_FIELD(containsInitialPruning);
READ_INT_FIELD(jitFlags);
READ_NODE_FIELD(planTree);
+ READ_INT_FIELD(numPlanNodes);
READ_NODE_FIELD(rtable);
READ_NODE_FIELD(resultRelations);
READ_NODE_FIELD(appendRelations);
@@ -2735,6 +2737,8 @@ _readPartitionPruneInfo(void)
READ_LOCALS(PartitionPruneInfo);
READ_NODE_FIELD(prune_infos);
+ READ_BOOL_FIELD(needs_init_pruning);
+ READ_BOOL_FIELD(needs_exec_pruning);
READ_BITMAPSET_FIELD(other_subplans);
READ_DONE();
@@ -2904,6 +2908,35 @@ _readPartitionRangeDatum(void)
READ_DONE();
}
+/*
+ * _readExecLockRelsInfo
+ */
+static ExecLockRelsInfo *
+_readExecLockRelsInfo(void)
+{
+ READ_LOCALS(ExecLockRelsInfo);
+
+ READ_BITMAPSET_FIELD(lockrels);
+ READ_INT_FIELD(numPlanNodes);
+ READ_NODE_FIELD(initPruningOutputs);
+ READ_INT_ARRAY(ipoIndexes, local_node->numPlanNodes);
+
+ READ_DONE();
+}
+
+/*
+ * _readPlanInitPruningOutput
+ */
+static PlanInitPruningOutput *
+_readPlanInitPruningOutput(void)
+{
+ READ_LOCALS(PlanInitPruningOutput);
+
+ READ_BITMAPSET_FIELD(initially_valid_subplans);
+
+ READ_DONE();
+}
+
/*
* parseNodeString
*
@@ -3197,6 +3230,10 @@ parseNodeString(void)
return_value = _readJsonCoercion();
else if (MATCH("JSONITEMCOERCIONS", 17))
return_value = _readJsonItemCoercions();
+ else if (MATCH("EXECLOCKRELSINFO", 16))
+ return_value = _readExecLockRelsInfo();
+ else if (MATCH("PLANINITPRUNINGOUTPUT", 21))
+ return_value = _readPlanInitPruningOutput();
else
{
elog(ERROR, "badly formatted node string \"%.32s\"...", token);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index c769b4b4b9..4c586ac1ec 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -517,7 +517,9 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->transientPlan = glob->transientPlan;
result->dependsOnRole = glob->dependsOnRole;
result->parallelModeNeeded = glob->parallelModeNeeded;
+ result->containsInitialPruning = glob->containsInitialPruning;
result->planTree = top_plan;
+ result->numPlanNodes = glob->lastPlanNodeId;
result->rtable = glob->finalrtable;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 8214edec54..a1c6c3caa2 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1623,6 +1623,9 @@ set_append_references(PlannerInfo *root,
pinfo->rtindex += rtoffset;
}
}
+
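+		/* Record in PlannerGlobal that the plan tree contains initial pruning steps. */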
+ if (aplan->part_prune_info->needs_init_pruning)
+ root->glob->containsInitialPruning = true;
}
/* We don't need to recurse to lefttree or righttree ... */
@@ -1710,6 +1713,9 @@ set_mergeappend_references(PlannerInfo *root,
pinfo->rtindex += rtoffset;
}
}
+
+ if (mplan->part_prune_info->needs_init_pruning)
+ root->glob->containsInitialPruning = true;
}
/* We don't need to recurse to lefttree or righttree ... */
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 7080cb25d9..3322dc79f2 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,9 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -230,6 +232,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool needs_init_pruning = false;
+ bool needs_exec_pruning = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -309,12 +313,16 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_needs_init_pruning;
+ bool partrel_needs_exec_pruning;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_needs_init_pruning,
+ &partrel_needs_exec_pruning);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -323,6 +331,10 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+ if (!needs_init_pruning)
+ needs_init_pruning = partrel_needs_init_pruning;
+ if (!needs_exec_pruning)
+ needs_exec_pruning = partrel_needs_exec_pruning;
}
pfree(relid_subplan_map);
@@ -337,6 +349,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->needs_init_pruning = needs_init_pruning;
+ pruneinfo->needs_exec_pruning = needs_exec_pruning;
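+	/* setrefs.c consults needs_init_pruning to set PlannedStmt.containsInitialPruning. */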
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -435,13 +449,18 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *needs_init_pruning and *needs_exec_pruning are set to indicate whether the
+ * returned PartitionedRelPruneInfos contain pruning steps that can be
+ * performed before and after execution begins, respectively.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -452,6 +471,10 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+	/* These are determined below. */
+ *needs_init_pruning = false;
+ *needs_exec_pruning = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -539,6 +562,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+	 * In the first pass, we note whether the 2nd pass is necessary by
+	 * checking for the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -613,6 +639,11 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
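+		/* Don't let this iteration reset flags set by an earlier one. */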
+ if (!*needs_init_pruning)
+ *needs_init_pruning = (initial_pruning_steps != NIL);
+ if (!*needs_exec_pruning)
+ *needs_exec_pruning = (exec_pruning_steps != NIL);
+
pinfolist = lappend(pinfolist, pinfo);
}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index ba2fcfeb4a..085eb3f209 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -945,15 +945,17 @@ pg_plan_query(Query *querytree, const char *query_string, int cursorOptions,
* For normal optimizable statements, invoke the planner. For utility
* statements, just make a wrapper PlannedStmt node.
*
- * The result is a list of PlannedStmt nodes.
+ * The result is a list of PlannedStmt nodes. Also, a NULL is appended to
+ * *execlockrelsinfo_list for each PlannedStmt added to the returned list.
*/
List *
pg_plan_queries(List *querytrees, const char *query_string, int cursorOptions,
- ParamListInfo boundParams)
+ ParamListInfo boundParams, List **execlockrelsinfo_list)
{
List *stmt_list = NIL;
ListCell *query_list;
+ *execlockrelsinfo_list = NIL;
foreach(query_list, querytrees)
{
Query *query = lfirst_node(Query, query_list);
@@ -977,6 +979,7 @@ pg_plan_queries(List *querytrees, const char *query_string, int cursorOptions,
}
stmt_list = lappend(stmt_list, stmt);
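+		/* No ExecLockRelsInfo exists yet; append NULL to keep the lists aligned. */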
+ *execlockrelsinfo_list = lappend(*execlockrelsinfo_list, NULL);
}
return stmt_list;
@@ -1080,7 +1083,8 @@ exec_simple_query(const char *query_string)
QueryCompletion qc;
MemoryContext per_parsetree_context = NULL;
List *querytree_list,
- *plantree_list;
+ *plantree_list,
+ *plantree_execlockrelsinfo_list;
Portal portal;
DestReceiver *receiver;
int16 format;
@@ -1167,7 +1171,8 @@ exec_simple_query(const char *query_string)
NULL, 0, NULL);
plantree_list = pg_plan_queries(querytree_list, query_string,
- CURSOR_OPT_PARALLEL_OK, NULL);
+ CURSOR_OPT_PARALLEL_OK, NULL,
+ &plantree_execlockrelsinfo_list);
/*
* Done with the snapshot used for parsing/planning.
@@ -1203,6 +1208,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ plantree_execlockrelsinfo_list,
NULL);
/*
@@ -1991,6 +1997,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
+ cplan->execlockrelsinfo_list,
cplan);
/* Done with the snapshot used for parameter I/O and parsing/planning */
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5aa5a350f3..0fd8c65de7 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -35,7 +35,7 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
+static void ProcessQuery(PlannedStmt *plan, ExecLockRelsInfo *execlockrelsinfo,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +65,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ ExecLockRelsInfo *execlockrelsinfo,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +78,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->execlockrelsinfo = execlockrelsinfo; /* ExecutorGetLockRels() output for plan */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +124,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * execlockrelsinfo: ExecutorGetLockRels() output for the plan tree
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +137,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ ExecLockRelsInfo *execlockrelsinfo,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +149,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, execlockrelsinfo, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -493,6 +497,7 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ linitial_node(ExecLockRelsInfo, portal->execlockrelsinfos),
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1193,7 +1198,8 @@ PortalRunMulti(Portal portal,
QueryCompletion *qc)
{
bool active_snapshot_set = false;
- ListCell *stmtlist_item;
+ ListCell *stmtlist_item,
+ *execlockrelsinfolist_item;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1214,9 +1220,12 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
- foreach(stmtlist_item, portal->stmts)
+ forboth(stmtlist_item, portal->stmts,
+ execlockrelsinfolist_item, portal->execlockrelsinfos)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ ExecLockRelsInfo *execlockrelsinfo = lfirst_node(ExecLockRelsInfo,
+ execlockrelsinfolist_item);
/*
* If we got a cancel signal in prior command, quit
@@ -1274,7 +1283,7 @@ PortalRunMulti(Portal portal,
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, execlockrelsinfo,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1283,7 +1292,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, execlockrelsinfo,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 4cf6db504f..9f5a40a0a6 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -99,14 +99,16 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
+static void CachedPlanSaveExecLockRelsInfos(CachedPlan *plan, List *execlockrelsinfo_list);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static List *AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams);
+static void ReleaseExecutorLocks(List *stmt_list, List *execlockrelsinfo_list);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -790,9 +792,21 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* On a "true" return, we have acquired the locks needed to run the plan.
* (We must do this for the "true" result to be race-condition-free.)
+ *
+ * If the CachedPlan is valid, this may in some cases call ExecutorGetLockRels
+ * on each PlannedStmt contained in it, instead of just scanning its range
+ * table, to determine the set of relations to be locked by
+ * AcquireExecutorLocks(); that allows pruning away any nodes in the tree that
+ * need not be executed based on the result of initial partition pruning.
+ * The resulting ExecLockRelsInfo nodes containing the result of such pruning,
+ * allocated in a child context of the context containing the plan itself, are
+ * added to plan->execlockrelsinfo_list. The previous contents of the list
+ * from the last invocation on the same CachedPlan are deleted, because they
+ * would no longer be valid given the fresh set of parameter values that may
+ * be used as pruning parameters.
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams)
{
CachedPlan *plan = plansource->gplan;
@@ -820,13 +834,25 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ List *execlockrelsinfo_list;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ /*
+		 * Lock the relations scanned by the plan. If ExecutorGetLockRels()
+		 * asked to omit some relations because the plan nodes that scan them
+		 * were found to be pruned, the executor is informed of the omitted
+		 * plan nodes via the ExecLockRelsInfo nodes collected in the returned
+		 * list, which is passed to it along with the list of PlannedStmts, so
+		 * that it doesn't accidentally try to execute those nodes.
+ */
+ execlockrelsinfo_list = AcquireExecutorLocks(plan->stmt_list,
+ boundParams);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -844,11 +870,14 @@ CheckCachedPlan(CachedPlanSource *plansource)
if (plan->is_valid)
{
/* Successfully revalidated and locked the query. */
+
+ /* Remember ExecLockRelsInfos in the CachedPlan. */
+ CachedPlanSaveExecLockRelsInfos(plan, execlockrelsinfo_list);
return true;
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ ReleaseExecutorLocks(plan->stmt_list, execlockrelsinfo_list);
}
/*
@@ -880,7 +909,8 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv)
{
CachedPlan *plan;
- List *plist;
+ List *plist,
+ *execlockrelsinfo_list;
bool snapshot_set;
bool is_transient;
MemoryContext plan_context;
@@ -933,7 +963,8 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
* Generate the plan.
*/
plist = pg_plan_queries(qlist, plansource->query_string,
- plansource->cursor_options, boundParams);
+ plansource->cursor_options, boundParams,
+ &execlockrelsinfo_list);
/* Release snapshot if we got one */
if (snapshot_set)
@@ -1002,6 +1033,16 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
plan->is_saved = false;
plan->is_valid = true;
+ /*
+	 * Save the dummy ExecLockRelsInfo list, that is, a list containing NULLs
+	 * as elements. We must do this because users of the CachedPlan expect
+	 * one to go with the list of PlannedStmts.
+ * XXX maybe get rid of that contract.
+ */
+ plan->execlockrelsinfo_context = NULL;
+ CachedPlanSaveExecLockRelsInfos(plan, execlockrelsinfo_list);
+ Assert(MemoryContextIsValid(plan->execlockrelsinfo_context));
+
/* assign generation number to new plan */
plan->generation = ++(plansource->generation);
@@ -1160,7 +1201,7 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (CheckCachedPlan(plansource, boundParams))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1586,6 +1627,49 @@ CopyCachedPlan(CachedPlanSource *plansource)
return newsource;
}
+/*
+ * CachedPlanSaveExecLockRelsInfos
+ * Save the list containing ExecLockRelsInfo nodes into the given
+ * CachedPlan
+ *
+ * The provided list is copied into a dedicated context that is a child of
+ * plan->context. If the child context already exists, it is emptied, because
+ * any ExecLockRelsInfo contained therein would no longer be useful.
+ */
+static void
+CachedPlanSaveExecLockRelsInfos(CachedPlan *plan, List *execlockrelsinfo_list)
+{
+ MemoryContext execlockrelsinfo_context = plan->execlockrelsinfo_context,
+ oldcontext = CurrentMemoryContext;
+ List *execlockrelsinfo_list_copy;
+
+ /*
+ * Set up the dedicated context if not already done, saving it as a child
+ * of the CachedPlan's context.
+ */
+ if (execlockrelsinfo_context == NULL)
+ {
+ execlockrelsinfo_context = AllocSetContextCreate(CurrentMemoryContext,
+ "CachedPlan execlockrelsinfo list",
+ ALLOCSET_START_SMALL_SIZES);
+ MemoryContextSetParent(execlockrelsinfo_context, plan->context);
+ MemoryContextSetIdentifier(execlockrelsinfo_context, plan->context->ident);
+ plan->execlockrelsinfo_context = execlockrelsinfo_context;
+ }
+ else
+ {
+ /* Just clear existing contents by resetting the context. */
+ Assert(MemoryContextIsValid(execlockrelsinfo_context));
+ MemoryContextReset(execlockrelsinfo_context);
+ }
+
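+	/* Copy the list into the dedicated context so it lives as long as the plan. */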
+ MemoryContextSwitchTo(execlockrelsinfo_context);
+ execlockrelsinfo_list_copy = copyObject(execlockrelsinfo_list);
+ MemoryContextSwitchTo(oldcontext);
+
+ plan->execlockrelsinfo_list = execlockrelsinfo_list_copy;
+}
+
/*
* CachedPlanIsValid: test whether the rewritten querytree within a
* CachedPlanSource is currently valid (that is, not marked as being in need
@@ -1737,17 +1821,21 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ *
+ * Returns a list of ExecLockRelsInfo nodes with one element for each
+ * PlannedStmt in stmt_list; an element is NULL if the corresponding statement
+ * is a utility statement or its containsInitialPruning is false.
*/
-static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+static List *
+AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams)
{
ListCell *lc1;
+ List *execlockrelsinfo_list = NIL;
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ ExecLockRelsInfo *execlockrelsinfo = NULL;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1761,27 +1849,139 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
if (query)
- ScanQueryForLocks(query, acquire);
- continue;
+ ScanQueryForLocks(query, true);
}
-
- foreach(lc2, plannedstmt->rtable)
+ else
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ /*
+ * Figure out the set of relations that would need to be locked
+ * before executing the plan.
+ */
+ if (!plannedstmt->containsInitialPruning)
+ {
+ /*
+ * If the plan contains no initial pruning steps, just lock
+ * all the relations found in the range table.
+ */
+ ListCell *lc;
- if (rte->rtekind != RTE_RELATION)
- continue;
+ foreach(lc, plannedstmt->rtable)
+ {
+ RangeTblEntry *rte = lfirst(lc);
+
+ if (rte->rtekind != RTE_RELATION)
+ continue;
+
+ /*
+ * Acquire the appropriate type of lock on each relation
+ * OID. Note that we don't actually try to open the rel,
+ * and hence will not fail if it's been dropped entirely
+ * --- we'll just transiently acquire a non-conflicting
+ * lock.
+ */
+ LockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
+ else
+ {
+ int rti;
+ Bitmapset *lockrels;
+
+ /*
+ * Walk the plan tree to find only the minimal set of
+ * relations to be locked, considering the effect of performing
+ * initial partition pruning.
+ */
+ execlockrelsinfo = ExecutorGetLockRels(plannedstmt, boundParams);
+ lockrels = execlockrelsinfo->lockrels;
+
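+				/* Lock only the relations that survived initial pruning. */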
+ rti = -1;
+ while ((rti = bms_next_member(lockrels, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+ Assert(rte->rtekind == RTE_RELATION);
+
+ /* See the comment above. */
+ LockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
+ }
+
+ /*
+		 * Remember the ExecLockRelsInfo so that it can later be added to the
+		 * QueryDesc passed to the executor when executing this plan. It may
+		 * be NULL, but the list must be kept the same length as stmt_list.
+ */
+ execlockrelsinfo_list = lappend(execlockrelsinfo_list,
+ execlockrelsinfo);
+ }
+
+ return execlockrelsinfo_list;
+}
+
+/*
+ * ReleaseExecutorLocks
+ * Release locks that would've been acquired by an earlier call to
+ * AcquireExecutorLocks()
+ */
+static void
+ReleaseExecutorLocks(List *stmt_list, List *execlockrelsinfo_list)
+{
+ ListCell *lc1,
+ *lc2;
+
+ forboth(lc1, stmt_list, lc2, execlockrelsinfo_list)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ ExecLockRelsInfo *execlockrelsinfo = lfirst_node(ExecLockRelsInfo, lc2);
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
/*
- * Acquire the appropriate type of lock on each relation OID. Note
- * that we don't actually try to open the rel, and hence will not
- * fail if it's been dropped entirely --- we'll just transiently
- * acquire a non-conflicting lock.
+ * Ignore utility statements, except those (such as EXPLAIN) that
+ * contain a parsed-but-not-planned query. Note: it's okay to use
+ * ScanQueryForLocks, even though the query hasn't been through
+ * rule rewriting, because rewriting doesn't change the query
+ * representation.
*/
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ if (query)
+ ScanQueryForLocks(query, false);
+ }
+ else
+ {
+ if (execlockrelsinfo == NULL)
+ {
+ ListCell *lc;
+
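+				/* No pruning info, so every relation in the range table was locked. */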
+ foreach(lc, plannedstmt->rtable)
+ {
+ RangeTblEntry *rte = lfirst(lc);
+
+ if (rte->rtekind != RTE_RELATION)
+ continue;
+
+ LockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
else
- UnlockRelationOid(rte->relid, rte->rellockmode);
+ {
+ int rti;
+ Bitmapset *lockrels;
+
+ lockrels = execlockrelsinfo->lockrels;
+ rti = -1;
+ while ((rti = bms_next_member(lockrels, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ Assert(rte->rtekind == RTE_RELATION);
+
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
}
}
}
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index d549f66d4a..896f51be08 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -285,6 +285,7 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *execlockrelsinfos,
CachedPlan *cplan)
{
AssertArg(PortalIsValid(portal));
@@ -299,6 +300,7 @@ PortalDefineQuery(Portal portal,
portal->qc.nprocessed = 0;
portal->commandTag = commandTag;
portal->stmts = stmts;
+ portal->execlockrelsinfos = execlockrelsinfos;
portal->cplan = cplan;
portal->status = PORTAL_DEFINED;
}
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 666977fb1f..fef75ba147 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,8 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt, ExecLockRelsInfo *execlockrelsinfo,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index fd5735a946..ded19b8cbb 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -124,4 +124,6 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
PartitionPruneInfo *pruneinfo,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
+extern Bitmapset *ExecGetLockRelsDoInitialPruning(Plan *plan, ExecGetLockRelsContext *context,
+ PartitionPruneInfo *pruneinfo);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index e79e2c001f..4338463479 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,7 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ ExecLockRelsInfo *execlockrelsinfo; /* ExecutorGetLockRels()'s output given plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +58,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ ExecLockRelsInfo *execlockrelsinfo,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 873772f188..d03bd5a026 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -185,6 +185,8 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
+extern ExecLockRelsInfo *ExecutorGetLockRels(PlannedStmt *plannedstmt, ParamListInfo params);
+extern bool ExecGetLockRels(Plan *node, ExecGetLockRelsContext *context);
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/executor/nodeAppend.h b/src/include/executor/nodeAppend.h
index 4cb78ee5b6..b53535c2a4 100644
--- a/src/include/executor/nodeAppend.h
+++ b/src/include/executor/nodeAppend.h
@@ -17,6 +17,7 @@
#include "access/parallel.h"
#include "nodes/execnodes.h"
+extern bool ExecGetAppendLockRels(Append *node, ExecGetLockRelsContext *context);
extern AppendState *ExecInitAppend(Append *node, EState *estate, int eflags);
extern void ExecEndAppend(AppendState *node);
extern void ExecReScanAppend(AppendState *node);
diff --git a/src/include/executor/nodeMergeAppend.h b/src/include/executor/nodeMergeAppend.h
index 97fe3b0665..8eb4e9df93 100644
--- a/src/include/executor/nodeMergeAppend.h
+++ b/src/include/executor/nodeMergeAppend.h
@@ -16,6 +16,7 @@
#include "nodes/execnodes.h"
+extern bool ExecGetMergeAppendLockRels(MergeAppend *node, ExecGetLockRelsContext *context);
extern MergeAppendState *ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags);
extern void ExecEndMergeAppend(MergeAppendState *node);
extern void ExecReScanMergeAppend(MergeAppendState *node);
diff --git a/src/include/executor/nodeModifyTable.h b/src/include/executor/nodeModifyTable.h
index c318681b9a..287baf6257 100644
--- a/src/include/executor/nodeModifyTable.h
+++ b/src/include/executor/nodeModifyTable.h
@@ -19,6 +19,7 @@ extern void ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
EState *estate, TupleTableSlot *slot,
CmdType cmdtype);
+extern bool ExecGetModifyTableLockRels(ModifyTable *plan, ExecGetLockRelsContext *context);
extern ModifyTableState *ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags);
extern void ExecEndModifyTable(ModifyTableState *node);
extern void ExecReScanModifyTable(ModifyTableState *node);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index cbbcff81d2..ee0c73e9a4 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -596,6 +596,7 @@ typedef struct EState
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ struct ExecLockRelsInfo *es_execlockrelsinfo; /* QueryDesc.execlockrelsinfo */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
@@ -984,6 +985,101 @@ typedef struct DomainConstraintState
*/
typedef TupleTableSlot *(*ExecProcNodeMtd) (struct PlanState *pstate);
+/*----------------
+ * ExecLockRelsInfo
+ *
+ * Result of performing ExecutorGetLockRels() for a given PlannedStmt
+ */
+typedef struct ExecLockRelsInfo
+{
+ NodeTag type;
+
+ /*
+ * Relations that must be locked to execute the plan tree contained in
+ * the PlannedStmt.
+ */
+ Bitmapset *lockrels;
+
+ /* PlannedStmt.numPlanNodes */
+ int numPlanNodes;
+
+ /*
+ * List of PlanInitPruningOutput, each representing the output of
+ * performing initial pruning on a given plan node, for all nodes in the
+ * plan tree that have been marked as needing initial pruning.
+ *
+	 * 'ipoIndexes' is an array of 'numPlanNodes' elements, indexed by the
+	 * plan_node_id of the individual nodes in the plan tree, each element a
+	 * 1-based index into the 'initPruningOutputs' list for the given plan
+	 * node. 0 means that the plan node has no entry in the list because it
+	 * does not need any initial pruning.
+ */
+ List *initPruningOutputs;
+ int *ipoIndexes;
+} ExecLockRelsInfo;
+
+/*----------------
+ * ExecGetLockRelsContext
+ *
+ * Information pertaining to an ExecutorGetLockRels() invocation for a given
+ * plan.
+ */
+typedef struct ExecGetLockRelsContext
+{
+ NodeTag type;
+
+ PlannedStmt *stmt; /* target plan */
+ ParamListInfo params; /* EXTERN parameters available for pruning */
+
+ /* Output parameters for ExecGetLockRels and its subroutines. */
+ Bitmapset *lockrels;
+
+	/* See the comment in the definition of the ExecLockRelsInfo struct. */
+ List *initPruningOutputs;
+ int *ipoIndexes;
+} ExecGetLockRelsContext;
+
+/*
+ * Appends the provided PlanInitPruningOutput to
+ * ExecGetLockRelsContext.initPruningOutputs
+ */
+#define ExecStorePlanInitPruningOutput(cxt, initPruningOutput, plannode) \
+ do { \
+ (cxt)->initPruningOutputs = lappend((cxt)->initPruningOutputs, initPruningOutput); \
+ (cxt)->ipoIndexes[(plannode)->plan_node_id] = list_length((cxt)->initPruningOutputs); \
+ } while (0)
+
+/*
+ * Finds the PlanInitPruningOutput for a given Plan node in
+ * ExecLockRelsInfo.initPruningOutputs, or NULL if the node has no entry.
+ */
+#define ExecFetchPlanInitPruningOutput(execlockrelsinfo, plannode) \
+	(((execlockrelsinfo) != NULL && (execlockrelsinfo)->initPruningOutputs != NIL && \
+	  (execlockrelsinfo)->ipoIndexes[(plannode)->plan_node_id] > 0) ? \
+	 list_nth((execlockrelsinfo)->initPruningOutputs, \
+			  (execlockrelsinfo)->ipoIndexes[(plannode)->plan_node_id] - 1) : NULL)
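+
+/*
+ * For example, with numPlanNodes = 3 where only the node with plan_node_id
+ * 1 underwent initial pruning, ipoIndexes would be {0, 1, 0} and
+ * initPruningOutputs would contain exactly one PlanInitPruningOutput,
+ * fetched with list_nth(initPruningOutputs, 1 - 1).
+ */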
+
+/* ---------------
+ * PlanInitPruningOutput
+ *
+ * Node to remember the result of performing initial partition pruning steps
+ * during ExecutorGetLockRels() on nodes that support pruning.
+ *
+ * ExecGetLockRelsDoInitialPruning(), which runs during ExecutorGetLockRels(),
+ * creates it and stores it in the corresponding ExecLockRelsInfo.
+ *
+ * ExecInitPartitionPruning(), which runs during ExecutorStart(), fetches it
+ * from the EState's ExecLockRelsInfo (if any) and uses the value of
+ * initially_valid_subplans contained in it as-is to select the subplans to be
+ * initialized for execution, instead of re-evaluating that by performing
+ * initial pruning again.
+ */
+typedef struct PlanInitPruningOutput
+{
+ NodeTag type;
+
+ Bitmapset *initially_valid_subplans;
+} PlanInitPruningOutput;
+
/* ----------------
* PlanState node
*
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 53f6b05a3f..928a30c7c6 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -97,6 +97,11 @@ typedef enum NodeTag
T_PartitionPruneStepCombine,
T_PlanInvalItem,
+ /* TAGS FOR EXECUTOR PREP NODES (execnodes.h) */
+ T_ExecGetLockRelsContext,
+ T_ExecLockRelsInfo,
+ T_PlanInitPruningOutput,
+
/*
* TAGS FOR PLAN STATE NODES (execnodes.h)
*
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index ef9b54739a..0ed171d3f5 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -129,6 +129,10 @@ typedef struct PlannerGlobal
char maxParallelHazard; /* worst PROPARALLEL hazard level */
+ bool containsInitialPruning; /* Do some Plan nodes in the tree
+ * have initial (pre-exec) pruning
+ * steps? */
+
PartitionDirectory partition_directory; /* partition descriptors */
Bitmapset *elidedAppendPartedRels; /* Combined partitioned_rels of all
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index a823c7c20d..4fcba0e55c 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -60,10 +60,16 @@ typedef struct PlannedStmt
bool parallelModeNeeded; /* parallel mode required to execute? */
+ bool containsInitialPruning; /* Do some Plan nodes in the tree
+ * have initial (pre-exec) pruning
+ * steps? */
+
int jitFlags; /* which forms of JIT should be performed */
struct Plan *planTree; /* tree of Plan nodes */
+ int numPlanNodes; /* number of nodes in planTree */
+
List *rtable; /* list of RangeTblEntry nodes */
/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
@@ -1192,6 +1198,13 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * needs_init_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its initial_pruning_steps set?
+ *
+ * needs_exec_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its exec_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1200,6 +1213,8 @@ typedef struct PartitionPruneInfo
{
NodeTag type;
List *prune_infos;
+ bool needs_init_pruning;
+ bool needs_exec_pruning;
Bitmapset *other_subplans;
} PartitionPruneInfo;
diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h
index 92291a750d..bf80c53bed 100644
--- a/src/include/tcop/tcopprot.h
+++ b/src/include/tcop/tcopprot.h
@@ -64,7 +64,7 @@ extern PlannedStmt *pg_plan_query(Query *querytree, const char *query_string,
ParamListInfo boundParams);
extern List *pg_plan_queries(List *querytrees, const char *query_string,
int cursorOptions,
- ParamListInfo boundParams);
+ ParamListInfo boundParams, List **execlockrelsinfo_list);
extern bool check_max_stack_depth(int *newval, void **extra, GucSource source);
extern void assign_max_stack_depth(int newval, void *extra);
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 95b99e3d25..56b0dcc6bd 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -148,6 +148,9 @@ typedef struct CachedPlan
{
int magic; /* should equal CACHEDPLAN_MAGIC */
List *stmt_list; /* list of PlannedStmts */
+	List	   *execlockrelsinfo_list;	/* list of ExecLockRelsInfo nodes with
+										 * one element for each of stmt_list;
+										 * NIL if not a generic plan */
bool is_oneshot; /* is it a "oneshot" plan? */
bool is_saved; /* is CachedPlan in a long-lived context? */
bool is_valid; /* is the stmt_list currently valid? */
@@ -158,6 +161,9 @@ typedef struct CachedPlan
int generation; /* parent's generation number for this plan */
int refcount; /* count of live references to this struct */
MemoryContext context; /* context containing this CachedPlan */
+ MemoryContext execlockrelsinfo_context; /* context containing
+ * execlockrelsinfo_list,
+ * a child of the above context */
} CachedPlan;
/*
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aeddbdafe5..9abace6734 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -137,6 +137,10 @@ typedef struct PortalData
CommandTag commandTag; /* command tag for original query */
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
+	List	   *execlockrelsinfos;	/* list of ExecLockRelsInfo nodes with one
+									 * element for each of 'stmts'; same as
+									 * cplan->execlockrelsinfo_list if cplan is
+									 * not NULL */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
ParamListInfo portalParams; /* params to pass to query */
@@ -241,6 +245,7 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *execlockrelsinfos,
CachedPlan *cplan);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
--
2.24.1
v8-0001-Some-refactoring-of-runtime-pruning-code.patch
From ce2041b254a7fee3097012f11685b635d58fb9b2 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Wed, 2 Mar 2022 15:17:55 +0900
Subject: [PATCH v8 1/4] Some refactoring of runtime pruning code
This does two things mainly:
* Move the execution pruning initialization steps that are common
between both ExecInitAppend() and ExecInitMergeAppend() into a new
function ExecInitPartitionPruning() defined in execPartition.c.
Thus, ExecCreatePartitionPruneState() and
ExecFindInitialMatchingSubPlans() need not be exported.
* Add an ExprContext field to PartitionPruneContext to remove the
runtime pruning code's implicit assumption that the ExprContext
needed to compute pruning expressions can always be obtained from
the PlanState. A future patch will allow runtime
pruning (at least the initial pruning steps) to be performed without
the corresponding PlanState yet having been created, so this will
help.
---
src/backend/executor/execPartition.c | 340 ++++++++++++++++---------
src/backend/executor/nodeAppend.c | 33 +--
src/backend/executor/nodeMergeAppend.c | 32 +--
src/backend/partitioning/partprune.c | 20 +-
src/include/executor/execPartition.h | 9 +-
src/include/partitioning/partprune.h | 2 +
6 files changed, 252 insertions(+), 184 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index aca42ca5b8..84b4e4b3d6 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -184,11 +184,18 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
int maxfieldlen);
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
+static PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
+ PartitionPruneInfo *partitionpruneinfo);
+static Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate);
static void ExecInitPruningContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
PartitionKey partkey,
- PlanState *planstate);
+ PlanState *planstate,
+ ExprContext *econtext);
+static void PartitionPruneStateFixSubPlanMap(PartitionPruneState *prunestate,
+ Bitmapset *initially_valid_subplans,
+ int n_total_subplans);
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
@@ -1590,30 +1597,86 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* Functions:
*
- * ExecCreatePartitionPruneState:
+ * ExecInitPartitionPruning:
* Creates the PartitionPruneState required by each of the two pruning
* functions. Details stored include how to map the partition index
- * returned by the partition pruning code into subplan indexes.
- *
- * ExecFindInitialMatchingSubPlans:
- * Returns indexes of matching subplans. Partition pruning is attempted
- * without any evaluation of expressions containing PARAM_EXEC Params.
- * This function must be called during executor startup for the parent
- * plan before the subplans themselves are initialized. Subplans which
- * are found not to match by this function must be removed from the
- * plan's list of subplans during execution, as this function performs a
- * remap of the partition index to subplan index map and the newly
- * created map provides indexes only for subplans which remain after
- * calling this function.
+ * returned by the partition pruning code into subplan indexes. Also
+ * determines the set of initially valid subplans by performing initial
+ *	  pruning steps; only those need be initialized by callers such as
+ * ExecInitAppend. Maps in PartitionPruneState are updated to account
+ * for initial pruning having eliminated some of the subplans, if any.
*
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating all available
- * expressions. This function can only be called during execution and
- * must be called again each time the value of a Param listed in
- * PartitionPruneState's 'execparamids' changes.
+ *	  expressions, that is, using execution pruning steps. This function
+ *	  can only be called during execution and must be called again each time
+ * the value of a Param listed in PartitionPruneState's 'execparamids'
+ * changes.
*-------------------------------------------------------------------------
*/
+/*
+ * ExecInitPartitionPruning
+ * Initialize data structure needed for run-time partition pruning
+ *
+ * Initial pruning can be done immediately, so it is done here if needed and
+ * the set of surviving partition subplans' indexes are added to the output
+ * parameter *initially_valid_subplans.
+ *
+ * If subplans are indeed pruned, subplan_map arrays contained in the returned
+ * PartitionPruneState are re-sequenced to not count those, though only if the
+ * maps will be needed for subsequent execution pruning passes.
+ */
+PartitionPruneState *
+ExecInitPartitionPruning(PlanState *planstate,
+ int n_total_subplans,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **initially_valid_subplans)
+{
+ PartitionPruneState *prunestate;
+ EState *estate = planstate->state;
+
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
+
+ /*
+ * Create the working data structure for pruning.
+ */
+ prunestate = ExecCreatePartitionPruneState(planstate, pruneinfo);
+
+ /*
+ * Perform an initial partition prune, if required.
+ */
+ if (prunestate->do_initial_prune)
+ {
+ /* Determine which subplans survive initial pruning */
+ *initially_valid_subplans = ExecFindInitialMatchingSubPlans(prunestate);
+ }
+ else
+ {
+ /* We'll need to initialize all subplans */
+ Assert(n_total_subplans > 0);
+ *initially_valid_subplans = bms_add_range(NULL, 0,
+ n_total_subplans - 1);
+ }
+
+ /*
+ * Re-sequence subplan indexes contained in prunestate to account for any
+ * that were removed above due to initial pruning.
+ *
+ * We can safely skip this when !do_exec_prune, even though that leaves
+ * invalid data in prunestate, because that data won't be consulted again
+ * (cf initial Assert in ExecFindMatchingSubPlans).
+ */
+ if (prunestate->do_exec_prune &&
+ bms_num_members(*initially_valid_subplans) < n_total_subplans)
+ PartitionPruneStateFixSubPlanMap(prunestate,
+ *initially_valid_subplans,
+ n_total_subplans);
+
+ return prunestate;
+}
+
/*
* ExecCreatePartitionPruneState
* Build the data structure required for calling
@@ -1632,7 +1695,7 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* re-used each time we re-evaluate which partitions match the pruning steps
* provided in each PartitionedRelPruneInfo.
*/
-PartitionPruneState *
+static PartitionPruneState *
ExecCreatePartitionPruneState(PlanState *planstate,
PartitionPruneInfo *partitionpruneinfo)
{
@@ -1641,6 +1704,7 @@ ExecCreatePartitionPruneState(PlanState *planstate,
int n_part_hierarchies;
ListCell *lc;
int i;
+ ExprContext *econtext = planstate->ps_ExprContext;
/* For data reading, executor always omits detached partitions */
if (estate->es_partition_directory == NULL)
@@ -1814,7 +1878,8 @@ ExecCreatePartitionPruneState(PlanState *planstate,
{
ExecInitPruningContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
- partdesc, partkey, planstate);
+ partdesc, partkey, planstate,
+ econtext);
/* Record whether initial pruning is needed at any level */
prunestate->do_initial_prune = true;
}
@@ -1823,7 +1888,8 @@ ExecCreatePartitionPruneState(PlanState *planstate,
{
ExecInitPruningContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
- partdesc, partkey, planstate);
+ partdesc, partkey, planstate,
+ econtext);
/* Record whether exec pruning is needed at any level */
prunestate->do_exec_prune = true;
}
@@ -1851,7 +1917,8 @@ ExecInitPruningContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
PartitionKey partkey,
- PlanState *planstate)
+ PlanState *planstate,
+ ExprContext *econtext)
{
int n_steps;
int partnatts;
@@ -1872,6 +1939,7 @@ ExecInitPruningContext(PartitionPruneContext *context,
context->ppccontext = CurrentMemoryContext;
context->planstate = planstate;
+ context->exprcontext = econtext;
/* Initialize expression state for each expression we need */
context->exprstates = (ExprState **)
@@ -1900,8 +1968,20 @@ ExecInitPruningContext(PartitionPruneContext *context,
step->step.step_id,
keyno);
- context->exprstates[stateidx] =
- ExecInitExpr(expr, context->planstate);
+ /*
+ * When planstate is NULL, pruning_steps is known not to
+ * contain any expressions that depend on the parent plan.
+				 * In that case, any available EXTERN parameters must be
+				 * passed in explicitly; the caller must have made them
+				 * available via econtext.
+ */
+ if (planstate == NULL)
+ context->exprstates[stateidx] =
+ ExecInitExprWithParams(expr,
+ econtext->ecxt_param_list_info);
+ else
+ context->exprstates[stateidx] =
+ ExecInitExpr(expr, context->planstate);
}
keyno++;
}
@@ -1914,18 +1994,11 @@ ExecInitPruningContext(PartitionPruneContext *context,
* pruning, disregarding any pruning constraints involving PARAM_EXEC
* Params.
*
- * If additional pruning passes will be required (because of PARAM_EXEC
- * Params), we must also update the translation data that allows conversion
- * of partition indexes into subplan indexes to account for the unneeded
- * subplans having been removed.
- *
* Must only be called once per 'prunestate', and only if initial pruning
* is required.
- *
- * 'nsubplans' must be passed as the total number of unpruned subplans.
*/
-Bitmapset *
-ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
+static Bitmapset *
+ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -1950,14 +2023,20 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
PartitionedRelPruningData *pprune;
prunedata = prunestate->partprunedata[i];
+
+ /*
+		 * We pass the 1st item, which belongs to the root table of the
+		 * hierarchy; find_matching_subplans_recurse() takes care of recursing
+		 * to other (lower-level) parents as needed.
+ */
pprune = &prunedata->partrelprunedata[0];
/* Perform pruning without using PARAM_EXEC Params */
find_matching_subplans_recurse(prunedata, pprune, true, &result);
- /* Expression eval may have used space in node's ps_ExprContext too */
+ /* Expression eval may have used space in ExprContext too */
if (pprune->initial_pruning_steps)
- ResetExprContext(pprune->initial_context.planstate->ps_ExprContext);
+ ResetExprContext(pprune->initial_context.exprcontext);
}
/* Add in any subplans that partition pruning didn't account for */
@@ -1970,118 +2049,120 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
MemoryContextReset(prunestate->prune_context);
+ return result;
+}
+
+/*
+ * PartitionPruneStateFixSubPlanMap
+ * Fix mapping of partition indexes to subplan indexes contained in
+ * prunestate by considering the new list of subplans that survived
+ * initial pruning
+ *
+ * Subplans that were previously indexed 0..(n_total_subplans - 1) are
+ * re-indexed into the range 0..(num(initially_valid_subplans) - 1).
+ */
+static void
+PartitionPruneStateFixSubPlanMap(PartitionPruneState *prunestate,
+ Bitmapset *initially_valid_subplans,
+ int n_total_subplans)
+{
+ int *new_subplan_indexes;
+ Bitmapset *new_other_subplans;
+ int i;
+ int newidx;
+
/*
- * If exec-time pruning is required and we pruned subplans above, then we
- * must re-sequence the subplan indexes so that ExecFindMatchingSubPlans
- * properly returns the indexes from the subplans which will remain after
- * execution of this function.
- *
- * We can safely skip this when !do_exec_prune, even though that leaves
- * invalid data in prunestate, because that data won't be consulted again
- * (cf initial Assert in ExecFindMatchingSubPlans).
+ * First we must build a temporary array which maps old subplan
+ * indexes to new ones. For convenience of initialization, we use
+ * 1-based indexes in this array and leave pruned items as 0.
*/
- if (prunestate->do_exec_prune && bms_num_members(result) < nsubplans)
+ new_subplan_indexes = (int *) palloc0(sizeof(int) * n_total_subplans);
+ newidx = 1;
+ i = -1;
+ while ((i = bms_next_member(initially_valid_subplans, i)) >= 0)
{
- int *new_subplan_indexes;
- Bitmapset *new_other_subplans;
- int i;
- int newidx;
+ Assert(i < n_total_subplans);
+ new_subplan_indexes[i] = newidx++;
+ }
- /*
- * First we must build a temporary array which maps old subplan
- * indexes to new ones. For convenience of initialization, we use
- * 1-based indexes in this array and leave pruned items as 0.
- */
- new_subplan_indexes = (int *) palloc0(sizeof(int) * nsubplans);
- newidx = 1;
- i = -1;
- while ((i = bms_next_member(result, i)) >= 0)
- {
- Assert(i < nsubplans);
- new_subplan_indexes[i] = newidx++;
- }
+ /*
+ * Now we can update each PartitionedRelPruneInfo's subplan_map with
+ * new subplan indexes. We must also recompute its present_parts
+ * bitmap.
+ */
+ for (i = 0; i < prunestate->num_partprunedata; i++)
+ {
+ PartitionPruningData *prunedata = prunestate->partprunedata[i];
+ int j;
/*
- * Now we can update each PartitionedRelPruneInfo's subplan_map with
- * new subplan indexes. We must also recompute its present_parts
- * bitmap.
+ * Within each hierarchy, we perform this loop in back-to-front
+ * order so that we determine present_parts for the lowest-level
+ * partitioned tables first. This way we can tell whether a
+ * sub-partitioned table's partitions were entirely pruned so we
+ * can exclude it from the current level's present_parts.
*/
- for (i = 0; i < prunestate->num_partprunedata; i++)
+ for (j = prunedata->num_partrelprunedata - 1; j >= 0; j--)
{
- PartitionPruningData *prunedata = prunestate->partprunedata[i];
- int j;
+ PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
+ int nparts = pprune->nparts;
+ int k;
- /*
- * Within each hierarchy, we perform this loop in back-to-front
- * order so that we determine present_parts for the lowest-level
- * partitioned tables first. This way we can tell whether a
- * sub-partitioned table's partitions were entirely pruned so we
- * can exclude it from the current level's present_parts.
- */
- for (j = prunedata->num_partrelprunedata - 1; j >= 0; j--)
- {
- PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
- int nparts = pprune->nparts;
- int k;
+ /* We just rebuild present_parts from scratch */
+ bms_free(pprune->present_parts);
+ pprune->present_parts = NULL;
- /* We just rebuild present_parts from scratch */
- bms_free(pprune->present_parts);
- pprune->present_parts = NULL;
+ for (k = 0; k < nparts; k++)
+ {
+ int oldidx = pprune->subplan_map[k];
+ int subidx;
- for (k = 0; k < nparts; k++)
+ /*
+ * If this partition existed as a subplan then change the
+ * old subplan index to the new subplan index. The new
+ * index may become -1 if the partition was pruned above,
+ * or it may just come earlier in the subplan list due to
+ * some subplans being removed earlier in the list. If
+ * it's a subpartition, add it to present_parts unless
+ * it's entirely pruned.
+ */
+ if (oldidx >= 0)
{
- int oldidx = pprune->subplan_map[k];
- int subidx;
-
- /*
- * If this partition existed as a subplan then change the
- * old subplan index to the new subplan index. The new
- * index may become -1 if the partition was pruned above,
- * or it may just come earlier in the subplan list due to
- * some subplans being removed earlier in the list. If
- * it's a subpartition, add it to present_parts unless
- * it's entirely pruned.
- */
- if (oldidx >= 0)
- {
- Assert(oldidx < nsubplans);
- pprune->subplan_map[k] = new_subplan_indexes[oldidx] - 1;
+ Assert(oldidx < n_total_subplans);
+ pprune->subplan_map[k] = new_subplan_indexes[oldidx] - 1;
- if (new_subplan_indexes[oldidx] > 0)
- pprune->present_parts =
- bms_add_member(pprune->present_parts, k);
- }
- else if ((subidx = pprune->subpart_map[k]) >= 0)
- {
- PartitionedRelPruningData *subprune;
+ if (new_subplan_indexes[oldidx] > 0)
+ pprune->present_parts =
+ bms_add_member(pprune->present_parts, k);
+ }
+ else if ((subidx = pprune->subpart_map[k]) >= 0)
+ {
+ PartitionedRelPruningData *subprune;
- subprune = &prunedata->partrelprunedata[subidx];
+ subprune = &prunedata->partrelprunedata[subidx];
- if (!bms_is_empty(subprune->present_parts))
- pprune->present_parts =
- bms_add_member(pprune->present_parts, k);
- }
+ if (!bms_is_empty(subprune->present_parts))
+ pprune->present_parts =
+ bms_add_member(pprune->present_parts, k);
}
}
}
+ }
- /*
- * We must also recompute the other_subplans set, since indexes in it
- * may change.
- */
- new_other_subplans = NULL;
- i = -1;
- while ((i = bms_next_member(prunestate->other_subplans, i)) >= 0)
- new_other_subplans = bms_add_member(new_other_subplans,
- new_subplan_indexes[i] - 1);
-
- bms_free(prunestate->other_subplans);
- prunestate->other_subplans = new_other_subplans;
+ /*
+ * We must also recompute the other_subplans set, since indexes in it
+ * may change.
+ */
+ new_other_subplans = NULL;
+ i = -1;
+ while ((i = bms_next_member(prunestate->other_subplans, i)) >= 0)
+ new_other_subplans = bms_add_member(new_other_subplans,
+ new_subplan_indexes[i] - 1);
- pfree(new_subplan_indexes);
- }
+ bms_free(prunestate->other_subplans);
+ prunestate->other_subplans = new_other_subplans;
- return result;
+ pfree(new_subplan_indexes);
}
/*
@@ -2123,11 +2204,16 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate)
prunedata = prunestate->partprunedata[i];
pprune = &prunedata->partrelprunedata[0];
+ /*
+ * We pass the 1st item belonging to the root table of the hierarchy
+ * and find_matching_subplans_recurse() takes care of recursing to
+ * other (lower-level) parents as needed.
+ */
find_matching_subplans_recurse(prunedata, pprune, false, &result);
- /* Expression eval may have used space in node's ps_ExprContext too */
+ /* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
- ResetExprContext(pprune->exec_context.planstate->ps_ExprContext);
+ ResetExprContext(pprune->exec_context.exprcontext);
}
/* Add in any subplans that partition pruning didn't account for */
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 7937f1c88f..5b6d3eb23b 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -138,30 +138,17 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
{
PartitionPruneState *prunestate;
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, &appendstate->ps);
-
- /* Create the working data structure for pruning. */
- prunestate = ExecCreatePartitionPruneState(&appendstate->ps,
- node->part_prune_info);
+ /*
+ * Set up pruning data structure. Initial pruning steps, if any, are
+ * performed as part of the setup, adding the set of indexes of
+ * surviving subplans to 'validsubplans'.
+ */
+ prunestate = ExecInitPartitionPruning(&appendstate->ps,
+ list_length(node->appendplans),
+ node->part_prune_info,
+ &validsubplans);
appendstate->as_prune_state = prunestate;
-
- /* Perform an initial partition prune, if required. */
- if (prunestate->do_initial_prune)
- {
- /* Determine which subplans survive initial pruning */
- validsubplans = ExecFindInitialMatchingSubPlans(prunestate,
- list_length(node->appendplans));
-
- nplans = bms_num_members(validsubplans);
- }
- else
- {
- /* We'll need to initialize all subplans */
- nplans = list_length(node->appendplans);
- Assert(nplans > 0);
- validsubplans = bms_add_range(NULL, 0, nplans - 1);
- }
+ nplans = bms_num_members(validsubplans);
/*
* When no run-time pruning is required and there's at least one
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 418f89dea8..9a9f29e845 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -86,29 +86,17 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
{
PartitionPruneState *prunestate;
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, &mergestate->ps);
-
- prunestate = ExecCreatePartitionPruneState(&mergestate->ps,
- node->part_prune_info);
+ /*
+ * Set up pruning data structure. Initial pruning steps, if any, are
+ * performed as part of the setup, adding the set of indexes of
+ * surviving subplans to 'validsubplans'.
+ */
+ prunestate = ExecInitPartitionPruning(&mergestate->ps,
+ list_length(node->mergeplans),
+ node->part_prune_info,
+ &validsubplans);
mergestate->ms_prune_state = prunestate;
-
- /* Perform an initial partition prune, if required. */
- if (prunestate->do_initial_prune)
- {
- /* Determine which subplans survive initial pruning */
- validsubplans = ExecFindInitialMatchingSubPlans(prunestate,
- list_length(node->mergeplans));
-
- nplans = bms_num_members(validsubplans);
- }
- else
- {
- /* We'll need to initialize all subplans */
- nplans = list_length(node->mergeplans);
- Assert(nplans > 0);
- validsubplans = bms_add_range(NULL, 0, nplans - 1);
- }
+ nplans = bms_num_members(validsubplans);
/*
* When no run-time pruning is required and there's at least one
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 1bc00826c1..7080cb25d9 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -798,6 +798,7 @@ prune_append_rel_partitions(RelOptInfo *rel)
/* These are not valid when being called from the planner */
context.planstate = NULL;
+ context.exprcontext = NULL;
context.exprstates = NULL;
/* Actual pruning happens here. */
@@ -808,8 +809,8 @@ prune_append_rel_partitions(RelOptInfo *rel)
* get_matching_partitions
* Determine partitions that survive partition pruning
*
- * Note: context->planstate must be set to a valid PlanState when the
- * pruning_steps were generated with a target other than PARTTARGET_PLANNER.
+ * Note: context->exprcontext must be valid when the pruning_steps were
+ * generated with a target other than PARTTARGET_PLANNER.
*
* Returns a Bitmapset of the RelOptInfo->part_rels indexes of the surviving
* partitions.
@@ -3654,7 +3655,7 @@ match_boolean_partition_clause(Oid partopfamily, Expr *clause, Expr *partkey,
* exprstate array.
*
* Note that the evaluated result may be in the per-tuple memory context of
- * context->planstate->ps_ExprContext, and we may have leaked other memory
+ * context->exprcontext, and we may have leaked other memory
* there too. This memory must be recovered by resetting that ExprContext
* after we're done with the pruning operation (see execPartition.c).
*/
@@ -3677,13 +3678,18 @@ partkey_datum_from_expr(PartitionPruneContext *context,
ExprContext *ectx;
/*
- * We should never see a non-Const in a step unless we're running in
- * the executor.
+ * We should never see a non-Const in a step unless the caller has
+ * passed a valid ExprContext.
+ *
+ * When context->planstate is valid, context->exprcontext is the same
+ * as context->planstate->ps_ExprContext.
*/
- Assert(context->planstate != NULL);
+ Assert(context->planstate != NULL || context->exprcontext != NULL);
+ Assert(context->planstate == NULL ||
+ (context->exprcontext == context->planstate->ps_ExprContext));
exprstate = context->exprstates[stateidx];
- ectx = context->planstate->ps_ExprContext;
+ ectx = context->exprcontext;
*value = ExecEvalExprSwitchContext(exprstate, ectx, isnull);
}
}
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 603d8becc4..fd5735a946 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -119,10 +119,9 @@ extern ResultRelInfo *ExecFindPartition(ModifyTableState *mtstate,
EState *estate);
extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
PartitionTupleRouting *proute);
-extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *partitionpruneinfo);
+extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
+ int n_total_subplans,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
-extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
- int nsubplans);
-
#endif /* EXECPARTITION_H */
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index ee11b6feae..90684efa25 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -41,6 +41,7 @@ struct RelOptInfo;
* subsidiary data, such as the FmgrInfos.
* planstate Points to the parent plan node's PlanState when called
* during execution; NULL when called from the planner.
+ * exprcontext ExprContext to use when evaluating pruning expressions
* exprstates Array of ExprStates, indexed as per PruneCxtStateIdx; one
* for each partition key in each pruning step. Allocated if
* planstate is non-NULL, otherwise NULL.
@@ -56,6 +57,7 @@ typedef struct PartitionPruneContext
FmgrInfo *stepcmpfuncs;
MemoryContext ppccontext;
PlanState *planstate;
+ ExprContext *exprcontext;
ExprState **exprstates;
} PartitionPruneContext;
--
2.24.1
Attachment: v8-0003-Add-a-plan_tree_walker.patch (application/octet-stream)
From 3f3bfe578401c43e578196f46f2bad7d3071411a Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Thu, 3 Mar 2022 16:04:13 +0900
Subject: [PATCH v8 3/4] Add a plan_tree_walker()
Like planstate_tree_walker() but for uninitialized plan trees.
---
src/backend/nodes/nodeFuncs.c | 116 ++++++++++++++++++++++++++++++++++
src/include/nodes/nodeFuncs.h | 3 +
2 files changed, 119 insertions(+)
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 4789ba6911..51cac40a3e 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -31,6 +31,10 @@ static bool planstate_walk_subplans(List *plans, bool (*walker) (),
void *context);
static bool planstate_walk_members(PlanState **planstates, int nplans,
bool (*walker) (), void *context);
+static bool plan_walk_subplans(List *plans,
+ bool (*walker) (),
+ void *context);
+static bool plan_walk_members(List *plans, bool (*walker) (), void *context);
/*
@@ -4645,3 +4649,115 @@ planstate_walk_members(PlanState **planstates, int nplans,
return false;
}
+
+/*
+ * plan_tree_walker --- walk plantrees
+ *
+ * The walker has already visited the current node, and so we need only
+ * recurse into any sub-nodes it has.
+ */
+bool
+plan_tree_walker(Plan *plan,
+ bool (*walker) (),
+ void *context)
+{
+ /* Guard against stack overflow due to overly complex plan trees */
+ check_stack_depth();
+
+ /* initPlan-s */
+ if (plan_walk_subplans(plan->initPlan, walker, context))
+ return true;
+
+ /* lefttree */
+ if (outerPlan(plan))
+ {
+ if (walker(outerPlan(plan), context))
+ return true;
+ }
+
+ /* righttree */
+ if (innerPlan(plan))
+ {
+ if (walker(innerPlan(plan), context))
+ return true;
+ }
+
+ /* special child plans */
+ switch (nodeTag(plan))
+ {
+ case T_Append:
+ if (plan_walk_members(((Append *) plan)->appendplans,
+ walker, context))
+ return true;
+ break;
+ case T_MergeAppend:
+ if (plan_walk_members(((MergeAppend *) plan)->mergeplans,
+ walker, context))
+ return true;
+ break;
+ case T_BitmapAnd:
+ if (plan_walk_members(((BitmapAnd *) plan)->bitmapplans,
+ walker, context))
+ return true;
+ break;
+ case T_BitmapOr:
+ if (plan_walk_members(((BitmapOr *) plan)->bitmapplans,
+ walker, context))
+ return true;
+ break;
+ case T_CustomScan:
+ if (plan_walk_members(((CustomScan *) plan)->custom_plans,
+ walker, context))
+ return true;
+ break;
+ case T_SubqueryScan:
+ if (walker(((SubqueryScan *) plan)->subplan, context))
+ return true;
+ break;
+ default:
+ break;
+ }
+
+ return false;
+}
+
+/*
+ * Walk a list of SubPlans (or initPlans, which also use SubPlan nodes).
+ */
+static bool
+plan_walk_subplans(List *plans,
+ bool (*walker) (),
+ void *context)
+{
+ ListCell *lc;
+ PlannedStmt *plannedstmt = (PlannedStmt *) context;
+
+ foreach(lc, plans)
+ {
+ SubPlan *sp = lfirst_node(SubPlan, lc);
+ Plan *p = list_nth(plannedstmt->subplans, sp->plan_id - 1);
+
+ if (walker(p, context))
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * Walk the constituent plans of a ModifyTable, Append, MergeAppend,
+ * BitmapAnd, or BitmapOr node.
+ */
+static bool
+plan_walk_members(List *plans, bool (*walker) (), void *context)
+{
+ ListCell *lc;
+
+ foreach(lc, plans)
+ {
+ if (walker(lfirst(lc), context))
+ return true;
+ }
+
+ return false;
+}
diff --git a/src/include/nodes/nodeFuncs.h b/src/include/nodes/nodeFuncs.h
index 93c60bde66..fca107ad65 100644
--- a/src/include/nodes/nodeFuncs.h
+++ b/src/include/nodes/nodeFuncs.h
@@ -158,5 +158,8 @@ extern bool raw_expression_tree_walker(Node *node, bool (*walker) (),
struct PlanState;
extern bool planstate_tree_walker(struct PlanState *planstate, bool (*walker) (),
void *context);
+struct Plan;
+extern bool plan_tree_walker(struct Plan *plan, bool (*walker) (),
+ void *context);
#endif /* NODEFUNCS_H */
--
2.24.1
Attachment: v8-0002-Add-Merge-Append.partitioned_rels.patch (application/octet-stream)
From 8b99146c9b8c4826e1434d3f006597681c24cd45 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Thu, 24 Mar 2022 22:47:03 +0900
Subject: [PATCH v8 2/4] Add [Merge]Append.partitioned_rels
To record the RT indexes of all partitioned ancestors leading up to
leaf partitions that are appended by the node.
If a given [Merge]Append node is left out from the plan due to there
being only one element in its list of child subplans, then its
partitioned_rels set is added to PlannerGlobal.elidedAppendPartedRels
that is passed down to the executor through PlannedStmt.
There are no users for partitioned_rels and elidedAppendPartedRels
as of this commit, though a later commit will require the ability
to extract the set of relations that must be locked to make a plan
tree safe for execution by walking the plan tree itself, so having
the partitioned tables also present in the plan tree will be
helpful. Note that currently the executor relies on the fact that
the set of relations to be locked can be obtained by simply scanning
the range table that's made available in PlannedStmt along with the
plan tree.
---
src/backend/nodes/copyfuncs.c | 3 +++
src/backend/nodes/outfuncs.c | 5 +++++
src/backend/nodes/readfuncs.c | 3 +++
src/backend/optimizer/path/joinrels.c | 9 ++++++++
src/backend/optimizer/plan/createplan.c | 18 +++++++++++++++-
src/backend/optimizer/plan/planner.c | 8 +++++++
src/backend/optimizer/plan/setrefs.c | 28 +++++++++++++++++++++++++
src/backend/optimizer/util/inherit.c | 16 ++++++++++++++
src/backend/optimizer/util/relnode.c | 20 ++++++++++++++++++
src/include/nodes/pathnodes.h | 22 +++++++++++++++++++
src/include/nodes/plannodes.h | 17 +++++++++++++++
11 files changed, 148 insertions(+), 1 deletion(-)
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 56505557bf..29c515d7db 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -106,6 +106,7 @@ _copyPlannedStmt(const PlannedStmt *from)
COPY_NODE_FIELD(invalItems);
COPY_NODE_FIELD(paramExecTypes);
COPY_NODE_FIELD(utilityStmt);
+ COPY_BITMAPSET_FIELD(elidedAppendPartedRels);
COPY_LOCATION_FIELD(stmt_location);
COPY_SCALAR_FIELD(stmt_len);
@@ -254,6 +255,7 @@ _copyAppend(const Append *from)
COPY_SCALAR_FIELD(nasyncplans);
COPY_SCALAR_FIELD(first_partial_plan);
COPY_NODE_FIELD(part_prune_info);
+ COPY_BITMAPSET_FIELD(partitioned_rels);
return newnode;
}
@@ -282,6 +284,7 @@ _copyMergeAppend(const MergeAppend *from)
COPY_POINTER_FIELD(collations, from->numCols * sizeof(Oid));
COPY_POINTER_FIELD(nullsFirst, from->numCols * sizeof(bool));
COPY_NODE_FIELD(part_prune_info);
+ COPY_BITMAPSET_FIELD(partitioned_rels);
return newnode;
}
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 6e39590730..108ede9af9 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -324,6 +324,7 @@ _outPlannedStmt(StringInfo str, const PlannedStmt *node)
WRITE_NODE_FIELD(invalItems);
WRITE_NODE_FIELD(paramExecTypes);
WRITE_NODE_FIELD(utilityStmt);
+ WRITE_BITMAPSET_FIELD(elidedAppendPartedRels);
WRITE_LOCATION_FIELD(stmt_location);
WRITE_INT_FIELD(stmt_len);
}
@@ -444,6 +445,7 @@ _outAppend(StringInfo str, const Append *node)
WRITE_INT_FIELD(nasyncplans);
WRITE_INT_FIELD(first_partial_plan);
WRITE_NODE_FIELD(part_prune_info);
+ WRITE_BITMAPSET_FIELD(partitioned_rels);
}
static void
@@ -461,6 +463,7 @@ _outMergeAppend(StringInfo str, const MergeAppend *node)
WRITE_OID_ARRAY(collations, node->numCols);
WRITE_BOOL_ARRAY(nullsFirst, node->numCols);
WRITE_NODE_FIELD(part_prune_info);
+ WRITE_BITMAPSET_FIELD(partitioned_rels);
}
static void
@@ -2404,6 +2407,7 @@ _outPlannerGlobal(StringInfo str, const PlannerGlobal *node)
WRITE_BOOL_FIELD(parallelModeOK);
WRITE_BOOL_FIELD(parallelModeNeeded);
WRITE_CHAR_FIELD(maxParallelHazard);
+ WRITE_BITMAPSET_FIELD(elidedAppendPartedRels);
}
static void
@@ -2515,6 +2519,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_BOOL_FIELD(partbounds_merged);
WRITE_BITMAPSET_FIELD(live_parts);
WRITE_BITMAPSET_FIELD(all_partrels);
+ WRITE_BITMAPSET_FIELD(partitioned_rels);
}
static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index c94b2561f0..ce146dd45e 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1794,6 +1794,7 @@ _readPlannedStmt(void)
READ_NODE_FIELD(invalItems);
READ_NODE_FIELD(paramExecTypes);
READ_NODE_FIELD(utilityStmt);
+ READ_BITMAPSET_FIELD(elidedAppendPartedRels);
READ_LOCATION_FIELD(stmt_location);
READ_INT_FIELD(stmt_len);
@@ -1917,6 +1918,7 @@ _readAppend(void)
READ_INT_FIELD(nasyncplans);
READ_INT_FIELD(first_partial_plan);
READ_NODE_FIELD(part_prune_info);
+ READ_BITMAPSET_FIELD(partitioned_rels);
READ_DONE();
}
@@ -1939,6 +1941,7 @@ _readMergeAppend(void)
READ_OID_ARRAY(collations, local_node->numCols);
READ_BOOL_ARRAY(nullsFirst, local_node->numCols);
READ_NODE_FIELD(part_prune_info);
+ READ_BITMAPSET_FIELD(partitioned_rels);
READ_DONE();
}
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 9da3ff2f9a..e74d40fee3 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1549,6 +1549,15 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
populate_joinrel_with_paths(root, child_rel1, child_rel2,
child_joinrel, child_sjinfo,
child_restrictlist);
+
+ /*
+ * A parent relation's partitioned_rels must be a superset of the sets
+ * of all its children, direct or indirect, so bubble up the child
+ * joinrel's set.
+ */
+ joinrel->partitioned_rels =
+ bms_add_members(joinrel->partitioned_rels,
+ child_joinrel->partitioned_rels);
}
}
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 179c87c671..99868a1a79 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -26,10 +26,12 @@
#include "nodes/extensible.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
+#include "optimizer/appendinfo.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/optimizer.h"
#include "optimizer/paramassign.h"
+#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
#include "optimizer/plancat.h"
@@ -1332,11 +1334,11 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
best_path->subpaths,
prunequal);
}
-
plan->appendplans = subplans;
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
plan->part_prune_info = partpruneinfo;
+ plan->partitioned_rels = bms_copy(rel->partitioned_rels);
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -1500,6 +1502,20 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
node->mergeplans = subplans;
node->part_prune_info = partpruneinfo;
+ /*
+ * We need to explicitly add to the plan node the RT indexes of any
+ * partitioned tables whose partitions will be scanned by the nodes in
+ * 'subplans'. There can be multiple RT indexes in the set due to the
+ * partition tree being multi-level and/or this being a plan for UNION ALL
+ * over multiple partition trees. Along with scanrelids of leaf-level Scan
+ * nodes, this allows the executor to lock the full set of relations being
+ * scanned by this node.
+ *
+ * Note that 'apprelids' only contains the top-level base relation(s), so
+ * is not sufficient for the purpose.
+ */
+ node->partitioned_rels = bms_copy(rel->partitioned_rels);
+
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
* produce either the exact tlist or a narrow tlist, we should get rid of
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index b2569c5d0c..c769b4b4b9 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -529,6 +529,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->paramExecTypes = glob->paramExecTypes;
/* utilityStmt should be null, but we might as well copy it */
result->utilityStmt = parse->utilityStmt;
+ result->elidedAppendPartedRels = glob->elidedAppendPartedRels;
result->stmt_location = parse->stmt_location;
result->stmt_len = parse->stmt_len;
@@ -7534,6 +7535,13 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
add_paths_to_append_rel(root, grouped_rel, grouped_live_children);
}
+
+ /*
+ * Input rel might be a partitioned appendrel, though grouped_rel has at
+ * this point taken its role as the appendrel owning the former's
+ * children, so copy the former's partitioned_rels set into the latter.
+ */
+ grouped_rel->partitioned_rels = bms_copy(input_rel->partitioned_rels);
}
/*
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index bf4c722c02..8214edec54 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1574,6 +1574,10 @@ set_append_references(PlannerInfo *root,
lfirst(l) = set_plan_refs(root, (Plan *) lfirst(l), rtoffset);
}
+ /* Fix up partitioned_rels before possibly removing the Append below. */
+ aplan->partitioned_rels = offset_relid_set(aplan->partitioned_rels,
+ rtoffset);
+
/*
* See if it's safe to get rid of the Append entirely. For this to be
* safe, there must be only one child plan and that child plan's parallel
@@ -1584,8 +1588,17 @@ set_append_references(PlannerInfo *root,
*/
if (list_length(aplan->appendplans) == 1 &&
((Plan *) linitial(aplan->appendplans))->parallel_aware == aplan->plan.parallel_aware)
+ {
+ /*
+ * Partitioned table involved, if any, must be made known to the
+ * executor.
+ */
+ root->glob->elidedAppendPartedRels =
+ bms_add_members(root->glob->elidedAppendPartedRels,
+ aplan->partitioned_rels);
return clean_up_removed_plan_level((Plan *) aplan,
(Plan *) linitial(aplan->appendplans));
+ }
/*
* Otherwise, clean up the Append as needed. It's okay to do this after
@@ -1646,6 +1659,12 @@ set_mergeappend_references(PlannerInfo *root,
lfirst(l) = set_plan_refs(root, (Plan *) lfirst(l), rtoffset);
}
+ /*
+ * Fix up partitioned_rels before possibly removing the MergeAppend below.
+ */
+ mplan->partitioned_rels = offset_relid_set(mplan->partitioned_rels,
+ rtoffset);
+
/*
* See if it's safe to get rid of the MergeAppend entirely. For this to
* be safe, there must be only one child plan and that child plan's
@@ -1656,8 +1675,17 @@ set_mergeappend_references(PlannerInfo *root,
*/
if (list_length(mplan->mergeplans) == 1 &&
((Plan *) linitial(mplan->mergeplans))->parallel_aware == mplan->plan.parallel_aware)
+ {
+ /*
+ * Partitioned tables involved, if any, must be made known to the
+ * executor.
+ */
+ root->glob->elidedAppendPartedRels =
+ bms_add_members(root->glob->elidedAppendPartedRels,
+ mplan->partitioned_rels);
return clean_up_removed_plan_level((Plan *) mplan,
(Plan *) linitial(mplan->mergeplans));
+ }
/*
* Otherwise, clean up the MergeAppend as needed. It's okay to do this
diff --git a/src/backend/optimizer/util/inherit.c b/src/backend/optimizer/util/inherit.c
index 7e134822f3..56912e4101 100644
--- a/src/backend/optimizer/util/inherit.c
+++ b/src/backend/optimizer/util/inherit.c
@@ -406,6 +406,14 @@ expand_partitioned_rtentry(PlannerInfo *root, RelOptInfo *relinfo,
childrte, childRTindex,
childrel, top_parentrc, lockmode);
+ /*
+ * A parent relation's partitioned_rels must be a superset of the sets
+ * of all its children, direct or indirect, so bubble up the child
+ * rel's set.
+ */
+ relinfo->partitioned_rels = bms_add_members(relinfo->partitioned_rels,
+ childrelinfo->partitioned_rels);
+
/* Close child relation, but keep locks */
table_close(childrel, NoLock);
}
@@ -737,6 +745,14 @@ expand_appendrel_subquery(PlannerInfo *root, RelOptInfo *rel,
/* Child may itself be an inherited rel, either table or subquery. */
if (childrte->inh)
expand_inherited_rtentry(root, childrel, childrte, childRTindex);
+
+ /*
+ * A parent relation's partitioned_rels must be a superset of the sets
+ * of all its children, direct or indirect, so bubble up the child
+ * rel's set.
+ */
+ rel->partitioned_rels = bms_add_members(rel->partitioned_rels,
+ childrel->partitioned_rels);
}
}
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 520409f4ba..1d082a8fdd 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -361,6 +361,10 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
}
}
+ /* A partitioned appendrel. */
+ if (rel->part_scheme != NULL)
+ rel->partitioned_rels = bms_copy(rel->relids);
+
/* Save the finished struct in the query's simple_rel_array */
root->simple_rel_array[relid] = rel;
@@ -729,6 +733,14 @@ build_join_rel(PlannerInfo *root,
set_joinrel_size_estimates(root, joinrel, outer_rel, inner_rel,
sjinfo, restrictlist);
+ /*
+ * The joinrel may get processed as an appendrel via partitionwise join
+ * if both outer and inner rels are partitioned, so set partitioned_rels
+ * appropriately.
+ */
+ joinrel->partitioned_rels = bms_union(outer_rel->partitioned_rels,
+ inner_rel->partitioned_rels);
+
/*
* Set the consider_parallel flag if this joinrel could potentially be
* scanned within a parallel worker. If this flag is false for either
@@ -897,6 +909,14 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
set_joinrel_size_estimates(root, joinrel, outer_rel, inner_rel,
sjinfo, restrictlist);
+ /*
+ * The joinrel may get processed as an appendrel via partitionwise join
+ * if both outer and inner rels are partitioned, so set partitioned_rels
+ * appropriately.
+ */
+ joinrel->partitioned_rels = bms_union(outer_rel->partitioned_rels,
+ inner_rel->partitioned_rels);
+
/* We build the join only once. */
Assert(!find_join_rel(root, joinrel->relids));
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 6cbcb67bdf..ef9b54739a 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -130,6 +130,11 @@ typedef struct PlannerGlobal
char maxParallelHazard; /* worst PROPARALLEL hazard level */
PartitionDirectory partition_directory; /* partition descriptors */
+
+ Bitmapset *elidedAppendPartedRels; /* Combined partitioned_rels of all
+ * single-subplan [Merge]Append nodes
+ * that have been removed from the
+ * various plan trees. */
} PlannerGlobal;
/* macro for fetching the Plan associated with a SubPlan node */
@@ -773,6 +778,23 @@ typedef struct RelOptInfo
Relids all_partrels; /* Relids set of all partition relids */
List **partexprs; /* Non-nullable partition key expressions */
List **nullable_partexprs; /* Nullable partition key expressions */
+
+ /*
+ * For an appendrel parent relation (base, join, or upper) that is
+ * partitioned, this stores the RT indexes of all the partitioned ancestors
+ * including itself that lead up to the individual leaf partitions that
+ * will be scanned to produce this relation's output rows. The relid set
+ * is copied into the resulting Append or MergeAppend plan node for
+ * allowing the executor to take appropriate locks on those relations,
+ * unless the node is deemed useless in setrefs.c due to having a single
+ * leaf subplan and thus elided from the final plan, in which case, the set
+ * is added into PlannerGlobal.elidedAppendPartedRels.
+ *
+ * Note that 'apprelids' of those nodes only contains the top-level base
+ * relation(s), so is not sufficient for said purpose.
+ */
+
+ Bitmapset *partitioned_rels;
} RelOptInfo;
/*
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 50ef3dda05..a823c7c20d 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -86,6 +86,11 @@ typedef struct PlannedStmt
Node *utilityStmt; /* non-null if this is utility stmt */
+ Bitmapset *elidedAppendPartedRels; /* Combined partitioned_rels of all
+ * single-subplan [Merge]Append nodes
+ * that have been removed from the
+ * various plan trees. */
+
/* statement location in source string (copied from Query) */
int stmt_location; /* start location, or -1 if unknown */
int stmt_len; /* length in bytes; 0 means "rest of string" */
@@ -264,6 +269,12 @@ typedef struct Append
/* Info for run-time subplan pruning; NULL if we're not doing that */
struct PartitionPruneInfo *part_prune_info;
+
+ /*
+ * RT indexes of all partitioned parents whose partitions' plans are
+ * present in appendplans.
+ */
+ Bitmapset *partitioned_rels;
} Append;
/* ----------------
@@ -284,6 +295,12 @@ typedef struct MergeAppend
bool *nullsFirst; /* NULLS FIRST/LAST directions */
/* Info for run-time subplan pruning; NULL if we're not doing that */
struct PartitionPruneInfo *part_prune_info;
+
+ /*
+ * RT indexes of all partitioned parents whose partitions' plans are
+ * present in appendplans.
+ */
+ Bitmapset *partitioned_rels;
} MergeAppend;
/* ----------------
--
2.24.1
I'm looking at 0001 here with intention to commit later. I see that
there is some resistance to 0004, but I think a final verdict on that
one doesn't materially affect 0001.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"El destino baraja y nosotros jugamos" (A. Schopenhauer)
On Thu, Mar 31, 2022 at 6:55 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
I'm looking at 0001 here with intention to commit later. I see that
there is some resistance to 0004, but I think a final verdict on that
one doesn't materially affect 0001.
Thanks.
While the main goal of the refactoring patch is to make it easier to
review the more complex changes that 0004 makes to execPartition.c, I
agree it has merit on its own. Although, one may say that the bit
about providing a PlanState-independent ExprContext is more closely
tied with 0004's requirements...
--
Amit Langote
EDB: http://www.enterprisedb.com
On Thu, 31 Mar 2022 at 16:25, Amit Langote <amitlangote09@gmail.com> wrote:
Rebased.
I've been looking over the v8 patch and I'd like to propose semi-baked
ideas to improve things. I'd need to go and write them myself to
fully know if they'd actually work ok.
1. You've changed the signature of various functions by adding
ExecLockRelsInfo *execlockrelsinfo. I'm wondering why you didn't just
put the ExecLockRelsInfo as a new field in PlannedStmt?
I think the above gets around messing the signatures of
CreateQueryDesc(), ExplainOnePlan(), pg_plan_queries(),
PortalDefineQuery(), ProcessQuery(). It would get rid of your change of
foreach to forboth in execute_sql_string() / PortalRunMulti() and gets
rid of a number of places where you're carrying around a variable named
execlockrelsinfo_list. It would also make the patch significantly
easier to review as you'd be touching far fewer files.
2. I don't really like the way you've gone about most of the patch...
The way I imagine this working is that during create_plan() we visit
all nodes that have run-time pruning, then inside create_append_plan()
and create_merge_append_plan() we'd tag those onto a new field in
PlannerGlobal. That way you can store the PartitionPruneInfos in the
new PlannedStmt field in standard_planner() after the
makeNode(PlannedStmt).
Instead of storing the PartitionPruneInfo in the Append / MergeAppend
struct, you'd just add a new index field to those structs. The index
would start with 0 for the 0th PartitionPruneInfo. You'd basically
just know the index by assigning
list_length(root->glob->partitionpruneinfos).
You'd then assign the root->glob->partitionpruneinfos to
PlannedStmt.partitionpruneinfos and anytime you needed to do run-time
pruning during execution, you'd need to use the Append / MergeAppend's
partition_prune_info_idx to lookup the PartitionPruneInfo in some new
field you add to EState to store those. You'd leave that index as -1
if there's no PartitionPruneInfo for the Append / MergeAppend node.
When you do AcquireExecutorLocks(), you'd iterate over the
PlannedStmt's PartitionPruneInfo to figure out which subplans to
prune. You'd then have an array sized
list_length(plannedstmt->runtimepruneinfos) where you'd store the
result. When the Append/MergeAppend node starts up you just check if
the part_prune_info_idx >= 0 and if there's a non-NULL result stored
then use that result. That's how you'd ensure you always got the same
run-time prune result between locking and plan startup.
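In rough code, the bookkeeping would look something like this (a sketch
only; part_prune_index and partPruneInfos are placeholder names, not
something I've written or tested):
/* In create_append_plan(), when a PartitionPruneInfo has been built: */
plan->part_prune_index = list_length(root->glob->partPruneInfos);
root->glob->partPruneInfos = lappend(root->glob->partPruneInfos,
                                     partpruneinfo);
/* ...or, when there is no run-time pruning: */
plan->part_prune_index = -1;
/* standard_planner() then just copies the list across: */
result->partPruneInfos = glob->partPruneInfos;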
3. Also, looking at ExecGetLockRels(), shouldn't it be the planner's
job to determine the minimum set of relations which must be locked? I
think the plan tree traversal during execution not great. Seems the
whole point of this patch is to reduce overhead during execution. A
full additional plan traversal aside from the 3 that we already do for
start/run/end of execution seems not great.
I think this means that during AcquireExecutorLocks() you'd start with
the minimum set of RTEs that need to be locked, as determined during
create_plan() and stored in some Bitmapset field in PlannedStmt. This
minimal set would also only exclude RTIs that would only possibly be
used due to a PartitionPruneInfo with initial pruning steps, i.e.
include RTIs from PartitionPruneInfo with no init pruning steps (you
can't skip any locks for those). All you need to do to determine the
RTEs to lock is to take the minimal set and execute each
PartitionPruneInfo in the PlannedStmt that has init steps.
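Very roughly, on the AcquireExecutorLocks() side (again just a sketch;
minLockRelids and ExecPartitionPruneInfoInitSteps are made-up names for
whatever ends up storing the minimal set and running the init steps):
Bitmapset  *lockrtis = bms_copy(plannedstmt->minLockRelids);
ListCell   *lc;
foreach(lc, plannedstmt->partPruneInfos)
{
    PartitionPruneInfo *pinfo = lfirst_node(PartitionPruneInfo, lc);
    /* Run just the initial steps and lock whatever survives. */
    lockrtis = bms_add_members(lockrtis,
                               ExecPartitionPruneInfoInitSteps(pinfo));
}
/* ...then lock each RTE whose RT index is in lockrtis, as now. */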
4. It's a bit disappointing to see RelOptInfo.partitioned_rels getting
revived here. Why don't you just add a partitioned_relids to
PartitionPruneInfo and just have make_partitionedrel_pruneinfo build
you a Relids of them. PartitionedRelPruneInfo already has an rtindex
field, so you just need to bms_add_member whatever that rtindex is.
It's a fairly high-level review at this stage. I can look in more
detail if the above points get looked at. You may find or know of
some reason why it can't be done like I mention above.
David
Thanks a lot for looking into this.
On Fri, Apr 1, 2022 at 10:32 AM David Rowley <dgrowleyml@gmail.com> wrote:
I've been looking over the v8 patch and I'd like to propose semi-baked
ideas to improve things. I'd need to go and write them myself to
fully know if they'd actually work ok.
1. You've changed the signature of various functions by adding
ExecLockRelsInfo *execlockrelsinfo. I'm wondering why you didn't just
put the ExecLockRelsInfo as a new field in PlannedStmt?
I think the above gets around messing the signatures of
CreateQueryDesc(), ExplainOnePlan(), pg_plan_queries(),
PortalDefineQuery(), ProcessQuery(). It would get rid of your change of
foreach to forboth in execute_sql_string() / PortalRunMulti() and gets
rid of a number of places where you're carrying around a variable named
execlockrelsinfo_list. It would also make the patch significantly
easier to review as you'd be touching far fewer files.
I'm worried about that churn myself and did consider this idea, though
I couldn't shake the feeling that it's maybe wrong to put something in
PlannedStmt that the planner itself doesn't produce. I mean the
definition of PlannedStmt says this:
/* ----------------
* PlannedStmt node
*
* The output of the planner
With the ideas that you've outlined below, perhaps we can frame most
of the things that the patch wants to do as the planner and the
plancache changes. If we twist the above definition a bit to say what
the plancache does in this regard is part of planning, maybe it makes
sense to add the initial pruning related fields (nodes, outputs) into
PlannedStmt.
2. I don't really like the way you've gone about most of the patch...
The way I imagine this working is that during create_plan() we visit
all nodes that have run-time pruning, then inside create_append_plan()
and create_merge_append_plan() we'd tag those onto a new field in
PlannerGlobal. That way you can store the PartitionPruneInfos in the
new PlannedStmt field in standard_planner() after the
makeNode(PlannedStmt).
Instead of storing the PartitionPruneInfo in the Append / MergeAppend
struct, you'd just add a new index field to those structs. The index
would start with 0 for the 0th PartitionPruneInfo. You'd basically
just know the index by assigning
list_length(root->glob->partitionpruneinfos).
You'd then assign the root->glob->partitionpruneinfos to
PlannedStmt.partitionpruneinfos and anytime you needed to do run-time
pruning during execution, you'd need to use the Append / MergeAppend's
partition_prune_info_idx to lookup the PartitionPruneInfo in some new
field you add to EState to store those. You'd leave that index as -1
if there's no PartitionPruneInfo for the Append / MergeAppend node.
When you do AcquireExecutorLocks(), you'd iterate over the
PlannedStmt's PartitionPruneInfo to figure out which subplans to
prune. You'd then have an array sized
list_length(plannedstmt->runtimepruneinfos) where you'd store the
result. When the Append/MergeAppend node starts up you just check if
the part_prune_info_idx >= 0 and if there's a non-NULL result stored
then use that result. That's how you'd ensure you always got the same
run-time prune result between locking and plan startup.
Actually, Robert too suggested such an idea to me off-list and I think
it's worth trying. I was not sure about the implementation, because
then we'd be passing around lists of initial pruning nodes/results
across many function/module boundaries that you mentioned in your
comment 1, but if we agree that PlannedStmt is an acceptable place for
those things to be stored, then I agree it's an attractive idea.
3. Also, looking at ExecGetLockRels(), shouldn't it be the planner's
job to determine the minimum set of relations which must be locked? I
think the plan tree traversal during execution is not great. Seems the
whole point of this patch is to reduce overhead during execution. A
full additional plan traversal aside from the 3 that we already do for
start/run/end of execution seems not great.
I think this means that during AcquireExecutorLocks() you'd start with
the minimum set of RTEs that need to be locked, as determined during
create_plan() and stored in some Bitmapset field in PlannedStmt.
The patch did have a PlannedStmt.lockrels till v6. Though, it wasn't
the same thing as what you are describing...
This
minimal set would also only exclude RTIs that would only possibly be
used due to a PartitionPruneInfo with initial pruning steps, i.e.
include RTIs from PartitionPruneInfo with no init pruning steps (you
can't skip any locks for those). All you need to do to determine the
RTEs to lock is to take the minimal set and execute each
PartitionPruneInfo in the PlannedStmt that has init steps.
So just thinking about an Append/MergeAppend, the minimum set must
include the RT indexes of all the partitioned tables whose direct and
indirect children's plans will be in 'subplans' and also of the
children if the PartitionPruneInfo doesn't contain initial steps or if
there is no PartitionPruneInfo to begin with.
One question is whether the planner should always pay the overhead of
initializing this bitmapset? I mean it's only worthwhile if
AcquireExecutorLocks() is going to be involved, that is, the plan will
be cached and reused.
4. It's a bit disappointing to see RelOptInfo.partitioned_rels getting
revived here. Why don't you just add a partitioned_relids to
PartitionPruneInfo and just have make_partitionedrel_pruneinfo build
you a Relids of them. PartitionedRelPruneInfo already has an rtindex
field, so you just need to bms_add_member whatever that rtindex is.
Hmm, not all Append/MergeAppend nodes in the plan tree may have
make_partition_pruneinfo() called on them though.
If not the proposed RelOptInfo.partitioned_rels that is populated in
the early planning stages, the only reliable way to get all the
partitioned tables involved in Appends/MergeAppends at create_plan()
stage seems to be to make a function out the stanza at the top of
make_partition_pruneinfo() that collects them by scanning the leaf
paths and tracing each path's relation's parents up to the root
partitioned parent and call it from create_{merge_}append_plan() if
make_partition_pruneinfo() was not. I did try to implement that and
found it a bit complex and expensive (the scanning the leaf paths
part).
It's a fairly high-level review at this stage. I can look in more
detail if the above points get looked at. You may find or know of
some reason why it can't be done like I mention above.
I'll try to write a version with the above points addressed, while
keeping RelOptInfo.partitioned_rels around for now.
--
Amit Langote
EDB: http://www.enterprisedb.com
Amit Langote <amitlangote09@gmail.com> writes:
On Fri, Apr 1, 2022 at 10:32 AM David Rowley <dgrowleyml@gmail.com> wrote:
1. You've changed the signature of various functions by adding
ExecLockRelsInfo *execlockrelsinfo. I'm wondering why you didn't just
put the ExecLockRelsInfo as a new field in PlannedStmt?
I'm worried about that churn myself and did consider this idea, though
I couldn't shake the feeling that it's maybe wrong to put something in
PlannedStmt that the planner itself doesn't produce.
PlannedStmt is part of the plan tree, which MUST be read-only to
the executor. This is not negotiable. However, there's other
places that this data could be put, such as QueryDesc.
Or for that matter, couldn't the data structure be created by
the planner? (It looks like David is proposing exactly that
further down.)
regards, tom lane
On Fri, 1 Apr 2022 at 16:09, Amit Langote <amitlangote09@gmail.com> wrote:
definition of PlannedStmt says this:
/* ----------------
* PlannedStmt node
*
* The output of the planner
With the ideas that you've outlined below, perhaps we can frame most
of the things that the patch wants to do as the planner and the
plancache changes. If we twist the above definition a bit to say what
the plancache does in this regard is part of planning, maybe it makes
sense to add the initial pruning related fields (nodes, outputs) into
PlannedStmt.
How about the PartitionPruneInfos go into PlannedStmt as a List
indexed in the way I mentioned and the cache of the results of pruning
in EState?
I think that leaves you adding List *partpruneinfos, Bitmapset
*minimumlockrtis to PlannedStmt and the thing you have to cache the
pruning results into EState. I'm not very clear on where you should
stash the results of run-time pruning in the meantime before you can
put them in EState. You might need to invent some intermediate struct
that gets passed around that you can scribble down some details you're
going to need during execution.
One question is whether the planner should always pay the overhead of
initializing this bitmapset? I mean it's only worthwhile if
AcquireExecutorLocks() is going to be involved, that is, the plan will
be cached and reused.
Maybe the Bitmapset for the minimal locks needs to be built with
bms_add_range(NULL, 0, list_length(rtable)); then do
bms_del_members() on the relevant RTIs you find in the listed
PartitionPruneInfos. That way it's very simple and cheap to do when
there are no PartitionPruneInfos.
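In rough code (prune_relids and has_init_steps being made-up names for
whatever ends up recording the prunable children and whether init steps
exist):
Bitmapset  *minlockrtis = bms_add_range(NULL, 0, list_length(rtable));
ListCell   *lc;
foreach(lc, root->glob->partPruneInfos)
{
    PartitionPruneInfo *pinfo = lfirst_node(PartitionPruneInfo, lc);
    /* Only relations subject to initial pruning can skip locking. */
    if (pinfo->has_init_steps)
        minlockrtis = bms_del_members(minlockrtis, pinfo->prune_relids);
}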
4. It's a bit disappointing to see RelOptInfo.partitioned_rels getting
revived here. Why don't you just add a partitioned_relids to
PartitionPruneInfo and just have make_partitionedrel_pruneinfo build
you a Relids of them. PartitionedRelPruneInfo already has an rtindex
field, so you just need to bms_add_member whatever that rtindex is.Hmm, not all Append/MergeAppend nodes in the plan tree may have
make_partition_pruneinfo() called on them though.
For Append/MergeAppends without run-time pruning you'll want to add
the RTIs to the minimal locking set of RTIs to go into PlannedStmt.
The only things you want to leave out of that are RTIs for the RTEs
that you might run-time prune away during AcquireExecutorLocks().
David
On Fri, Apr 1, 2022 at 1:08 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Fri, 1 Apr 2022 at 16:09, Amit Langote <amitlangote09@gmail.com> wrote:
definition of PlannedStmt says this:
/* ----------------
* PlannedStmt node
*
* The output of the planner
With the ideas that you've outlined below, perhaps we can frame most
of the things that the patch wants to do as the planner and the
plancache changes. If we twist the above definition a bit to say what
the plancache does in this regard is part of planning, maybe it makes
sense to add the initial pruning related fields (nodes, outputs) into
PlannedStmt.
How about the PartitionPruneInfos go into PlannedStmt as a List
indexed in the way I mentioned and the cache of the results of pruning
in EState?
I think that leaves you adding List *partpruneinfos, Bitmapset
*minimumlockrtis to PlannedStmt and the thing you have to cache the
pruning results into EState. I'm not very clear on where you should
stash the results of run-time pruning in the meantime before you can
put them in EState. You might need to invent some intermediate struct
that gets passed around that you can scribble down some details you're
going to need during execution.
Yes, the ExecLockRelsInfo node in the current patch, that first gets
added to the QueryDesc and subsequently to the EState of the query,
serves as that stashing place. Not sure if you've looked at
ExecLockRelInfo in detail in your review of the patch so far, but it
carries the initial pruning result in what are called
PlanInitPruningOutput nodes, which are stored in a list in
ExecLockRelsInfo and their offsets in the list are in turn stored in
an adjacent array that contains an element for every plan node in the
tree. If we go with a PlannedStmt.partpruneinfos list, then maybe we
don't need to have that array, because the Append/MergeAppend nodes
would be carrying those offsets by themselves.
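Schematically, that arrangement is something like this (a simplified
sketch from memory, not the exact definitions in the patch):
typedef struct ExecLockRelsInfo
{
    NodeTag     type;
    List       *initPruningOutputs; /* PlanInitPruningOutput nodes */
    int        *outputOffsets;      /* one entry per plan node ID, giving
                                     * an offset into the list above, or
                                     * -1 if the node did no pruning */
} ExecLockRelsInfo;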
Maybe a different name for ExecLockRelsInfo would be better?
Also, given Tom's apparent dislike for carrying that in PlannedStmt,
maybe the way I have it now is fine?
One question is whether the planner should always pay the overhead of
initializing this bitmapset? I mean it's only worthwhile if
AcquireExecutorLocks() is going to be involved, that is, the plan will
be cached and reused.
Maybe the Bitmapset for the minimal locks needs to be built with
bms_add_range(NULL, 0, list_length(rtable)); then do
bms_del_members() on the relevant RTIs you find in the listed
PartitionPruneInfos. That way it's very simple and cheap to do when
there are no PartitionPruneInfos.
Ah, okay. Looking at make_partition_pruneinfo(), I think I see a way
to delete the RTIs of prunable relations -- construct an
all_matched_leaf_part_relids in parallel to allmatchedsubplans and
delete those from the initial set.
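Roughly (a sketch only; the variable names are just what I'd add):
/* In make_partition_pruneinfo(), alongside allmatchedsubplans: */
all_matched_leaf_part_relids =
    bms_add_member(all_matched_leaf_part_relids, rti);
/* ...then, when building the minimal lock set: */
minlockrtis = bms_del_members(minlockrtis, all_matched_leaf_part_relids);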
4. It's a bit disappointing to see RelOptInfo.partitioned_rels getting
revived here. Why don't you just add a partitioned_relids to
PartitionPruneInfo and just have make_partitionedrel_pruneinfo build
you a Relids of them. PartitionedRelPruneInfo already has an rtindex
field, so you just need to bms_add_member whatever that rtindex is.
Hmm, not all Append/MergeAppend nodes in the plan tree may have
make_partition_pruneinfo() called on them though.
For Append/MergeAppends without run-time pruning you'll want to add
the RTIs to the minimal locking set of RTIs to go into PlannedStmt.
The only things you want to leave out of that are RTIs for the RTEs
that you might run-time prune away during AcquireExecutorLocks().
Yeah, I see it now.
Thanks.
--
Amit Langote
EDB: http://www.enterprisedb.com
On Fri, Apr 1, 2022 at 12:45 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Amit Langote <amitlangote09@gmail.com> writes:
On Fri, Apr 1, 2022 at 10:32 AM David Rowley <dgrowleyml@gmail.com> wrote:
1. You've changed the signature of various functions by adding
ExecLockRelsInfo *execlockrelsinfo. I'm wondering why you didn't just
put the ExecLockRelsInfo as a new field in PlannedStmt?
I'm worried about that churn myself and did consider this idea, though
I couldn't shake the feeling that it's maybe wrong to put something in
PlannedStmt that the planner itself doesn't produce.
PlannedStmt is part of the plan tree, which MUST be read-only to
the executor. This is not negotiable. However, there's other
places that this data could be put, such as QueryDesc.
Or for that matter, couldn't the data structure be created by
the planner? (It looks like David is proposing exactly that
further down.)
The data structure in question is for storing the results of
performing initial partition pruning on a generic plan, which the
proposes to do in plancache.c -- inside the body of
AcquireExecutorLocks()'s loop over PlannedStmts -- so, it's hard to
see it as a product of the planner. :-(
--
Amit Langote
EDB: http://www.enterprisedb.com
On Fri, 1 Apr 2022 at 19:58, Amit Langote <amitlangote09@gmail.com> wrote:
Yes, the ExecLockRelsInfo node in the current patch, that first gets
added to the QueryDesc and subsequently to the EState of the query,
serves as that stashing place. Not sure if you've looked at
ExecLockRelInfo in detail in your review of the patch so far, but it
carries the initial pruning result in what are called
PlanInitPruningOutput nodes, which are stored in a list in
ExecLockRelsInfo and their offsets in the list are in turn stored in
an adjacent array that contains an element for every plan node in the
tree. If we go with a PlannedStmt.partpruneinfos list, then maybe we
don't need to have that array, because the Append/MergeAppend nodes
would be carrying those offsets by themselves.
I saw it, just not in great detail. I saw that you had an array that
was indexed by the plan node's ID. I thought that wouldn't be so good
with large complex plans that we often get with partitioning
workloads. That's why I mentioned using another index that you store
in Append/MergeAppend that starts at 0 and increments by 1 for each
node that has a PartitionPruneInfo made for it during create_plan.
Maybe a different name for ExecLockRelsInfo would be better?
Also, given Tom's apparent dislike for carrying that in PlannedStmt,
maybe the way I have it now is fine?
I think if you change how it's indexed and the other stuff then we can
have another look. I think the patch will be much easier to review
once the PartitionPruneInfos are moved into PlannedStmt.
David
On Fri, Apr 1, 2022 at 5:20 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Fri, 1 Apr 2022 at 19:58, Amit Langote <amitlangote09@gmail.com> wrote:
Yes, the ExecLockRelsInfo node in the current patch, that first gets
added to the QueryDesc and subsequently to the EState of the query,
serves as that stashing place. Not sure if you've looked at
ExecLockRelInfo in detail in your review of the patch so far, but it
carries the initial pruning result in what are called
PlanInitPruningOutput nodes, which are stored in a list in
ExecLockRelsInfo and their offsets in the list are in turn stored in
an adjacent array that contains an element for every plan node in the
tree. If we go with a PlannedStmt.partpruneinfos list, then maybe we
don't need to have that array, because the Append/MergeAppend nodes
would be carrying those offsets by themselves.
I saw it, just not in great detail. I saw that you had an array that
was indexed by the plan node's ID. I thought that wouldn't be so good
with large complex plans that we often get with partitioning
workloads. That's why I mentioned using another index that you store
in Append/MergeAppend that starts at 0 and increments by 1 for each
node that has a PartitionPruneInfo made for it during create_plan.Maybe a different name for ExecLockRelsInfo would be better?
Also, given Tom's apparent dislike for carrying that in PlannedStmt,
maybe the way I have it now is fine?I think if you change how it's indexed and the other stuff then we can
have another look. I think the patch will be much easier to review
once the ParitionPruneInfos are moved into PlannedStmt.
Will do, thanks.
--
Amit Langote
EDB: http://www.enterprisedb.com
I noticed a definitional problem in 0001 that's also a bug in some
conditions -- namely that the bitmapset "validplans" is never explicitly
initialized to NULL. In the original coding, the BMS was always returned
from somewhere; in the new code, it is passed from an uninitialized
stack variable into the new ExecInitPartitionPruning function, which
then proceeds to add new members to it without initializing it first.
Indeed that function's header comment explicitly indicates that it is
not initialized:
+ * Initial pruning can be done immediately, so it is done here if needed and
+ * the set of surviving partition subplans' indexes are added to the output
+ * parameter *initially_valid_subplans.
even though this is not fully correct, because when prunestate->do_initial_prune
is false, then the BMS *is* initialized.
I have no opinion on where to initialize it, but it needs to be done
somewhere and the comment needs to agree.
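In miniature, the pattern being flagged looks like this (a sketch based
on the ExecInitAppend() changes in the patch):

    Bitmapset  *validsubplans;  /* stack variable, not set to NULL */

    prunestate = ExecInitPartitionPruning(&appendstate->ps,
                                          list_length(node->appendplans),
                                          node->part_prune_info,
                                          &validsubplans);
    /*
     * Safe only if ExecInitPartitionPruning() assigns *validsubplans
     * on every path, rather than merely bms_add_member()'ing into
     * whatever garbage the caller's variable happens to contain.
     */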
I think the names ExecCreatePartitionPruneState and
ExecInitPartitionPruning are too confusingly similar. Maybe the former
should be renamed to somehow make it clear that it is a subroutine of
the latter.
At the top of the file, there's a new comment that reads:
* ExecInitPartitionPruning:
* Creates the PartitionPruneState required by each of the two pruning
* functions.
What are "the two pruning functions"? I think here you mean "Append"
and "MergeAppend". Maybe spell that out explicitly.
I think this comment needs to be reworded:
+ * Subplans would previously be indexed 0..(n_total_subplans - 1) should be
+ * changed to index range 0..num(initially_valid_subplans).
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
Thanks for the review.
On Sun, Apr 3, 2022 at 8:33 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
I noticed a definitional problem in 0001 that's also a bug in some
conditions -- namely that the bitmapset "validplans" is never explicitly
initialized to NULL. In the original coding, the BMS was always returned
from somewhere; in the new code, it is passed from an uninitialized
stack variable into the new ExecInitPartitionPruning function, which
then proceeds to add new members to it without initializing it first.
Hmm, the following blocks in ExecInitPartitionPruning() define
*initially_valid_subplans:
/*
* Perform an initial partition prune pass, if required.
*/
if (prunestate->do_initial_prune)
{
/* Determine which subplans survive initial pruning */
*initially_valid_subplans = ExecFindInitialMatchingSubPlans(prunestate);
}
else
{
/* We'll need to initialize all subplans */
Assert(n_total_subplans > 0);
*initially_valid_subplans = bms_add_range(NULL, 0,
n_total_subplans - 1);
}
AFAICS, both assign *initially_valid_subplans a value whose
computation is not dependent on reading it first, so I don't see a
problem.
Am I missing something?
Indeed that function's header comment explicitly indicates that it is
not initialized:

+ * Initial pruning can be done immediately, so it is done here if needed and
+ * the set of surviving partition subplans' indexes are added to the output
+ * parameter *initially_valid_subplans.

even though this is not fully correct, because when prunestate->do_initial_prune
is false, then the BMS *is* initialized.

I have no opinion on where to initialize it, but it needs to be done
somewhere and the comment needs to agree.
I can see that the comment is insufficient, so I've expanded it as follows:
- * Initial pruning can be done immediately, so it is done here if needed and
- * the set of surviving partition subplans' indexes are added to the output
- * parameter *initially_valid_subplans.
+ * On return, *initially_valid_subplans is assigned the set of indexes of
+ * child subplans that must be initialized along with the parent plan node.
+ * Initial pruning is performed here if needed and in that case only the
+ * surviving subplans' indexes are added.
I think the names ExecCreatePartitionPruneState and
ExecInitPartitionPruning are too confusingly similar. Maybe the former
should be renamed to somehow make it clear that it is a subroutine of
the latter.
Ah, yes. I've taken out the "Exec" from the former.
At the top of the file, there's a new comment that reads:
* ExecInitPartitionPruning:
* Creates the PartitionPruneState required by each of the two pruning
* functions.

What are "the two pruning functions"? I think here you mean "Append"
and "MergeAppend". Maybe spell that out explicitly.
Actually it meant: ExecFindInitialMatchingSubPlans() and
ExecFindMatchingSubPlans(). They perform the "initial" and "exec" sets of
pruning steps, respectively.
I realized that both functions have identical bodies at this point,
except that they pass 'true' and 'false', respectively, for
the initial_prune argument of the subroutine
find_matching_subplans_recurse(), which is where the pruning using the
appropriate set of steps contained in PartitionPruneState
(initial_pruning_steps or exec_pruning_steps) actually occurs. So,
I've updated the patch to just retain the latter, adding an
initial_prune parameter to pass through to the aforementioned
find_matching_subplans_recurse().
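In terms of call sites, the consolidation amounts to this (taken from
the before/after states of the patch):

    /* before */
    validsubplans = ExecFindInitialMatchingSubPlans(prunestate, nsubplans);  /* startup */
    validsubplans = ExecFindMatchingSubPlans(prunestate);                    /* per Param change */

    /* after */
    validsubplans = ExecFindMatchingSubPlans(prunestate, true);   /* initial pruning */
    validsubplans = ExecFindMatchingSubPlans(prunestate, false);  /* exec pruning */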
I've also updated the run-time pruning module comment to describe this change:
* ExecFindMatchingSubPlans:
- * Returns indexes of matching subplans after evaluating all available
- * expressions, that is, using execution pruning steps. This function
- * can only be called during execution and must be called again each time
- * the value of a Param listed in PartitionPruneState's 'execparamids'
- * changes.
+ * Returns indexes of matching subplans after evaluating the expressions
+ * that are safe to evaluate at a given point. This function is first
+ * called during ExecInitPartitionPruning() to find the initially
+ * matching subplans based on performing the initial pruning steps and
+ * then must be called again each time the value of a Param listed in
+ * PartitionPruneState's 'execparamids' changes.
I think this comment needs to be reworded:
+ * Subplans would previously be indexed 0..(n_total_subplans - 1) should be
+ * changed to index range 0..num(initially_valid_subplans).
Assuming you meant to ask me to rewrite this without the odd notation, I've
expanded the comment as follows:
- * Subplans would previously be indexed 0..(n_total_subplans - 1) should be
- * changed to index range 0..num(initially_valid_subplans).
+ * Current values of the indexes present in PartitionPruneState count all the
+ * subplans that would be present before initial pruning was done. If initial
+ * pruning got rid of some of the subplans, any subsequent pruning passes
+ * will be looking at a different set of target subplans to choose from than
+ * those in the pre-initial-pruning set, so the maps in PartitionPruneState
+ * containing those indexes must be updated to reflect the new indexes of
+ * subplans in the post-initial-pruning set.
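A worked example with made-up numbers, matching the remapping logic in
PartitionPruneStateFixSubPlanMap():

    /*
     * Say there were 4 subplans (0..3) and initial pruning removed
     * subplan 1, so initially_valid_subplans = {0, 2, 3}.  The
     * temporary 1-based map then is
     *
     *     new_subplan_indexes[] = {1, 0, 2, 3}     (0 means "pruned")
     *
     * and each subplan_map entry is remapped as
     * new_subplan_indexes[oldidx] - 1:
     *
     *     oldidx 0 -> 0,  oldidx 1 -> -1 (pruned),
     *     oldidx 2 -> 1,  oldidx 3 -> 2
     */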
I've attached only the updated 0001, though I'm still working on the
others to address David's comments.
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v9-0001-Some-refactoring-of-runtime-pruning-code.patch
From d28e281935e97065d14c70b34f903c385c539a66 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Wed, 2 Mar 2022 15:17:55 +0900
Subject: [PATCH v9] Some refactoring of runtime pruning code
* Move the execution pruning initialization steps that are common
between both ExecInitAppend() and ExecInitMergeAppend() into a new
function ExecInitPartitionPruning() defined in execPartition.c.
Those steps include creation of a PartitionPruneState to be used for
all instances of pruning and determining the minimal set of child
subplans that need to be initialized by performing initial pruning if
needed, and finally adjusting the subplan_map arrays in the
PartitionPruneState to reflect the new set of subplans remaining
after initial pruning if it was indeed performed.
ExecCreatePartitionPruneState() is no longer exported out of
execPartition.c and has been renamed to CreatePartitionPruneState()
as a local subroutine of ExecInitPartitionPruning().
* Likewise, ExecFindInitialMatchingSubPlans() that was in charge
of performing initial pruning no longer needs to be exported. In
fact, since it would now have the same body as the more generally
named ExecFindMatchingSubPlans(), except differing in the value of
the initial_prune passed to the common subroutine
find_matching_subplans_recurse(), it seems better to just have
ExecFindMatchingSubPlans() with an initial_prune argument.
* Add an ExprContext field to PartitionPruneContext to remove the
implicit assumption in the runtime pruning code that the ExprContext
needed to compute pruning expressions can always be obtained from the
parent node's PlanState. A future patch will allow runtime
pruning (at least the initial pruning steps) to be performed without
the corresponding PlanState yet having been created, so this will
help.
---
src/backend/executor/execPartition.c | 396 +++++++++++++------------
src/backend/executor/nodeAppend.c | 41 +--
src/backend/executor/nodeMergeAppend.c | 34 +--
src/backend/partitioning/partprune.c | 20 +-
src/include/executor/execPartition.h | 12 +-
src/include/partitioning/partprune.h | 2 +
6 files changed, 260 insertions(+), 245 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index aca42ca5b8..83451cf654 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -184,11 +184,17 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
int maxfieldlen);
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
+static PartitionPruneState *CreatePartitionPruneState(PlanState *planstate,
+ PartitionPruneInfo *partitionpruneinfo);
static void ExecInitPruningContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
PartitionKey partkey,
- PlanState *planstate);
+ PlanState *planstate,
+ ExprContext *econtext);
+static void PartitionPruneStateFixSubPlanMap(PartitionPruneState *prunestate,
+ Bitmapset *initially_valid_subplans,
+ int n_total_subplans);
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
@@ -1590,34 +1596,91 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* Functions:
*
- * ExecCreatePartitionPruneState:
- * Creates the PartitionPruneState required by each of the two pruning
- * functions. Details stored include how to map the partition index
- * returned by the partition pruning code into subplan indexes.
- *
- * ExecFindInitialMatchingSubPlans:
- * Returns indexes of matching subplans. Partition pruning is attempted
- * without any evaluation of expressions containing PARAM_EXEC Params.
- * This function must be called during executor startup for the parent
- * plan before the subplans themselves are initialized. Subplans which
- * are found not to match by this function must be removed from the
- * plan's list of subplans during execution, as this function performs a
- * remap of the partition index to subplan index map and the newly
- * created map provides indexes only for subplans which remain after
- * calling this function.
+ * ExecInitPartitionPruning:
+ * Creates the PartitionPruneState required by ExecFindMatchingSubPlans.
+ * Details stored include how to map the partition index returned by the
+ * partition pruning code into subplan indexes. Also determines the set
+ * of initially valid subplans by performing initial pruning steps; only
+ * those need be initialized by the caller, such as ExecInitAppend. Maps
+ * in PartitionPruneState are updated to account for initial pruning
+ * having eliminated some of the subplans, if any.
*
* ExecFindMatchingSubPlans:
- * Returns indexes of matching subplans after evaluating all available
- * expressions. This function can only be called during execution and
- * must be called again each time the value of a Param listed in
+ * Returns indexes of matching subplans after evaluating the expressions
+ * that are safe to evaluate at a given point. This function is first
+ * called during ExecInitPartitionPruning() to find the initially
+ * matching subplans based on performing the initial pruning steps and
+ * then must be called again each time the value of a Param listed in
* PartitionPruneState's 'execparamids' changes.
*-------------------------------------------------------------------------
*/
/*
- * ExecCreatePartitionPruneState
- * Build the data structure required for calling
- * ExecFindInitialMatchingSubPlans and ExecFindMatchingSubPlans.
+ * ExecInitPartitionPruning
+ * Initialize data structure needed for run-time partition pruning and
+ * do initial pruning if needed
+ *
+ * On return, *initially_valid_subplans is assigned the set of indexes of
+ * child subplans that must be initialized along with the parent plan node.
+ * Initial pruning is performed here if needed and in that case only the
+ * surviving subplans' indexes are added.
+ *
+ * If subplans are indeed pruned, subplan_map arrays contained in the returned
+ * PartitionPruneState are re-sequenced to not count those, though only if the
+ * maps will be needed for subsequent execution pruning passes.
+ */
+PartitionPruneState *
+ExecInitPartitionPruning(PlanState *planstate,
+ int n_total_subplans,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **initially_valid_subplans)
+{
+ PartitionPruneState *prunestate;
+ EState *estate = planstate->state;
+
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
+
+ /*
+ * Create the working data structure for pruning.
+ */
+ prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+
+ /*
+ * Perform an initial partition prune pass, if required.
+ */
+ if (prunestate->do_initial_prune)
+ {
+ *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ }
+ else
+ {
+ /* No pruning, so we'll need to initialize all subplans */
+ Assert(n_total_subplans > 0);
+ *initially_valid_subplans = bms_add_range(NULL, 0,
+ n_total_subplans - 1);
+ }
+
+ /*
+ * Re-sequence subplan indexes contained in prunestate to account for any
+ * that were removed above due to initial pruning.
+ *
+ * We can safely skip this when !do_exec_prune, even though that leaves
+ * invalid data in prunestate, because that data won't be consulted again
+ * (cf initial Assert in ExecFindMatchingSubPlans).
+ */
+ if (prunestate->do_exec_prune &&
+ bms_num_members(*initially_valid_subplans) < n_total_subplans)
+ PartitionPruneStateFixSubPlanMap(prunestate,
+ *initially_valid_subplans,
+ n_total_subplans);
+
+ return prunestate;
+}
+
+/*
+ * CreatePartitionPruneState
+ * Build the data structure required for calling ExecFindMatchingSubPlans
*
* 'planstate' is the parent plan node's execution state.
*
@@ -1632,8 +1695,8 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* re-used each time we re-evaluate which partitions match the pruning steps
* provided in each PartitionedRelPruneInfo.
*/
-PartitionPruneState *
-ExecCreatePartitionPruneState(PlanState *planstate,
+static PartitionPruneState *
+CreatePartitionPruneState(PlanState *planstate,
PartitionPruneInfo *partitionpruneinfo)
{
EState *estate = planstate->state;
@@ -1641,6 +1704,7 @@ ExecCreatePartitionPruneState(PlanState *planstate,
int n_part_hierarchies;
ListCell *lc;
int i;
+ ExprContext *econtext = planstate->ps_ExprContext;
/* For data reading, executor always omits detached partitions */
if (estate->es_partition_directory == NULL)
@@ -1814,7 +1878,8 @@ ExecCreatePartitionPruneState(PlanState *planstate,
{
ExecInitPruningContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
- partdesc, partkey, planstate);
+ partdesc, partkey, planstate,
+ econtext);
/* Record whether initial pruning is needed at any level */
prunestate->do_initial_prune = true;
}
@@ -1823,7 +1888,8 @@ ExecCreatePartitionPruneState(PlanState *planstate,
{
ExecInitPruningContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
- partdesc, partkey, planstate);
+ partdesc, partkey, planstate,
+ econtext);
/* Record whether exec pruning is needed at any level */
prunestate->do_exec_prune = true;
}
@@ -1851,7 +1917,8 @@ ExecInitPruningContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
PartitionKey partkey,
- PlanState *planstate)
+ PlanState *planstate,
+ ExprContext *econtext)
{
int n_steps;
int partnatts;
@@ -1872,6 +1939,7 @@ ExecInitPruningContext(PartitionPruneContext *context,
context->ppccontext = CurrentMemoryContext;
context->planstate = planstate;
+ context->exprcontext = econtext;
/* Initialize expression state for each expression we need */
context->exprstates = (ExprState **)
@@ -1900,8 +1968,20 @@ ExecInitPruningContext(PartitionPruneContext *context,
step->step.step_id,
keyno);
- context->exprstates[stateidx] =
- ExecInitExpr(expr, context->planstate);
+ /*
+ * When planstate is NULL, pruning_steps is known not to
+ * contain any expressions that depend on the parent plan.
+ * Information of any available EXTERN parameters must be
+ * passed explicitly in that case, which the caller must
+ * have made available via econtext.
+ */
+ if (planstate == NULL)
+ context->exprstates[stateidx] =
+ ExecInitExprWithParams(expr,
+ econtext->ecxt_param_list_info);
+ else
+ context->exprstates[stateidx] =
+ ExecInitExpr(expr, context->planstate);
}
keyno++;
}
@@ -1909,179 +1989,121 @@ ExecInitPruningContext(PartitionPruneContext *context,
}
/*
- * ExecFindInitialMatchingSubPlans
- * Identify the set of subplans that cannot be eliminated by initial
- * pruning, disregarding any pruning constraints involving PARAM_EXEC
- * Params.
- *
- * If additional pruning passes will be required (because of PARAM_EXEC
- * Params), we must also update the translation data that allows conversion
- * of partition indexes into subplan indexes to account for the unneeded
- * subplans having been removed.
- *
- * Must only be called once per 'prunestate', and only if initial pruning
- * is required.
+ * PartitionPruneStateFixSubPlanMap
+ * Fix mapping of partition indexes to subplan indexes contained in
+ * prunestate by considering the new list of subplans that survived
+ * initial pruning
*
- * 'nsubplans' must be passed as the total number of unpruned subplans.
+ * Current values of the indexes present in PartitionPruneState count all the
+ * subplans that would be present before initial pruning was done. If initial
+ * pruning got rid of some of the subplans, any subsequent pruning passes
+ * will be looking at a different set of target subplans to choose from than
+ * those in the pre-initial-pruning set, so the maps in PartitionPruneState
+ * containing those indexes must be updated to reflect the new indexes of
+ * subplans in the post-initial-pruning set.
*/
-Bitmapset *
-ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
+static void
+PartitionPruneStateFixSubPlanMap(PartitionPruneState *prunestate,
+ Bitmapset *initially_valid_subplans,
+ int n_total_subplans)
{
- Bitmapset *result = NULL;
- MemoryContext oldcontext;
+ int *new_subplan_indexes;
+ Bitmapset *new_other_subplans;
int i;
-
- /* Caller error if we get here without do_initial_prune */
- Assert(prunestate->do_initial_prune);
+ int newidx;
/*
- * Switch to a temp context to avoid leaking memory in the executor's
- * query-lifespan memory context.
+ * First we must build a temporary array which maps old subplan
+ * indexes to new ones. For convenience of initialization, we use
+ * 1-based indexes in this array and leave pruned items as 0.
*/
- oldcontext = MemoryContextSwitchTo(prunestate->prune_context);
-
- /*
- * For each hierarchy, do the pruning tests, and add nondeletable
- * subplans' indexes to "result".
- */
- for (i = 0; i < prunestate->num_partprunedata; i++)
+ new_subplan_indexes = (int *) palloc0(sizeof(int) * n_total_subplans);
+ newidx = 1;
+ i = -1;
+ while ((i = bms_next_member(initially_valid_subplans, i)) >= 0)
{
- PartitionPruningData *prunedata;
- PartitionedRelPruningData *pprune;
-
- prunedata = prunestate->partprunedata[i];
- pprune = &prunedata->partrelprunedata[0];
-
- /* Perform pruning without using PARAM_EXEC Params */
- find_matching_subplans_recurse(prunedata, pprune, true, &result);
-
- /* Expression eval may have used space in node's ps_ExprContext too */
- if (pprune->initial_pruning_steps)
- ResetExprContext(pprune->initial_context.planstate->ps_ExprContext);
+ Assert(i < n_total_subplans);
+ new_subplan_indexes[i] = newidx++;
}
- /* Add in any subplans that partition pruning didn't account for */
- result = bms_add_members(result, prunestate->other_subplans);
-
- MemoryContextSwitchTo(oldcontext);
-
- /* Copy result out of the temp context before we reset it */
- result = bms_copy(result);
-
- MemoryContextReset(prunestate->prune_context);
-
/*
- * If exec-time pruning is required and we pruned subplans above, then we
- * must re-sequence the subplan indexes so that ExecFindMatchingSubPlans
- * properly returns the indexes from the subplans which will remain after
- * execution of this function.
- *
- * We can safely skip this when !do_exec_prune, even though that leaves
- * invalid data in prunestate, because that data won't be consulted again
- * (cf initial Assert in ExecFindMatchingSubPlans).
+ * Now we can update each PartitionedRelPruneInfo's subplan_map with
+ * new subplan indexes. We must also recompute its present_parts
+ * bitmap.
*/
- if (prunestate->do_exec_prune && bms_num_members(result) < nsubplans)
+ for (i = 0; i < prunestate->num_partprunedata; i++)
{
- int *new_subplan_indexes;
- Bitmapset *new_other_subplans;
- int i;
- int newidx;
+ PartitionPruningData *prunedata = prunestate->partprunedata[i];
+ int j;
/*
- * First we must build a temporary array which maps old subplan
- * indexes to new ones. For convenience of initialization, we use
- * 1-based indexes in this array and leave pruned items as 0.
+ * Within each hierarchy, we perform this loop in back-to-front
+ * order so that we determine present_parts for the lowest-level
+ * partitioned tables first. This way we can tell whether a
+ * sub-partitioned table's partitions were entirely pruned so we
+ * can exclude it from the current level's present_parts.
*/
- new_subplan_indexes = (int *) palloc0(sizeof(int) * nsubplans);
- newidx = 1;
- i = -1;
- while ((i = bms_next_member(result, i)) >= 0)
+ for (j = prunedata->num_partrelprunedata - 1; j >= 0; j--)
{
- Assert(i < nsubplans);
- new_subplan_indexes[i] = newidx++;
- }
+ PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
+ int nparts = pprune->nparts;
+ int k;
- /*
- * Now we can update each PartitionedRelPruneInfo's subplan_map with
- * new subplan indexes. We must also recompute its present_parts
- * bitmap.
- */
- for (i = 0; i < prunestate->num_partprunedata; i++)
- {
- PartitionPruningData *prunedata = prunestate->partprunedata[i];
- int j;
+ /* We just rebuild present_parts from scratch */
+ bms_free(pprune->present_parts);
+ pprune->present_parts = NULL;
- /*
- * Within each hierarchy, we perform this loop in back-to-front
- * order so that we determine present_parts for the lowest-level
- * partitioned tables first. This way we can tell whether a
- * sub-partitioned table's partitions were entirely pruned so we
- * can exclude it from the current level's present_parts.
- */
- for (j = prunedata->num_partrelprunedata - 1; j >= 0; j--)
+ for (k = 0; k < nparts; k++)
{
- PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
- int nparts = pprune->nparts;
- int k;
-
- /* We just rebuild present_parts from scratch */
- bms_free(pprune->present_parts);
- pprune->present_parts = NULL;
+ int oldidx = pprune->subplan_map[k];
+ int subidx;
- for (k = 0; k < nparts; k++)
+ /*
+ * If this partition existed as a subplan then change the
+ * old subplan index to the new subplan index. The new
+ * index may become -1 if the partition was pruned above,
+ * or it may just come earlier in the subplan list due to
+ * some subplans being removed earlier in the list. If
+ * it's a subpartition, add it to present_parts unless
+ * it's entirely pruned.
+ */
+ if (oldidx >= 0)
{
- int oldidx = pprune->subplan_map[k];
- int subidx;
+ Assert(oldidx < n_total_subplans);
+ pprune->subplan_map[k] = new_subplan_indexes[oldidx] - 1;
- /*
- * If this partition existed as a subplan then change the
- * old subplan index to the new subplan index. The new
- * index may become -1 if the partition was pruned above,
- * or it may just come earlier in the subplan list due to
- * some subplans being removed earlier in the list. If
- * it's a subpartition, add it to present_parts unless
- * it's entirely pruned.
- */
- if (oldidx >= 0)
- {
- Assert(oldidx < nsubplans);
- pprune->subplan_map[k] = new_subplan_indexes[oldidx] - 1;
-
- if (new_subplan_indexes[oldidx] > 0)
- pprune->present_parts =
- bms_add_member(pprune->present_parts, k);
- }
- else if ((subidx = pprune->subpart_map[k]) >= 0)
- {
- PartitionedRelPruningData *subprune;
+ if (new_subplan_indexes[oldidx] > 0)
+ pprune->present_parts =
+ bms_add_member(pprune->present_parts, k);
+ }
+ else if ((subidx = pprune->subpart_map[k]) >= 0)
+ {
+ PartitionedRelPruningData *subprune;
- subprune = &prunedata->partrelprunedata[subidx];
+ subprune = &prunedata->partrelprunedata[subidx];
- if (!bms_is_empty(subprune->present_parts))
- pprune->present_parts =
- bms_add_member(pprune->present_parts, k);
- }
+ if (!bms_is_empty(subprune->present_parts))
+ pprune->present_parts =
+ bms_add_member(pprune->present_parts, k);
}
}
}
+ }
- /*
- * We must also recompute the other_subplans set, since indexes in it
- * may change.
- */
- new_other_subplans = NULL;
- i = -1;
- while ((i = bms_next_member(prunestate->other_subplans, i)) >= 0)
- new_other_subplans = bms_add_member(new_other_subplans,
- new_subplan_indexes[i] - 1);
-
- bms_free(prunestate->other_subplans);
- prunestate->other_subplans = new_other_subplans;
+ /*
+ * We must also recompute the other_subplans set, since indexes in it
+ * may change.
+ */
+ new_other_subplans = NULL;
+ i = -1;
+ while ((i = bms_next_member(prunestate->other_subplans, i)) >= 0)
+ new_other_subplans = bms_add_member(new_other_subplans,
+ new_subplan_indexes[i] - 1);
- pfree(new_subplan_indexes);
- }
+ bms_free(prunestate->other_subplans);
+ prunestate->other_subplans = new_other_subplans;
- return result;
+ pfree(new_subplan_indexes);
}
/*
@@ -2089,21 +2111,26 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
* Determine which subplans match the pruning steps detailed in
* 'prunestate' for the current comparison expression values.
*
- * Here we assume we may evaluate PARAM_EXEC Params.
+ * If initial_prune is true, the caller is telling us that only those
+ * pruning steps that are known to not contain any expressions involving
+ * PARAM_EXEC Params are safe to evaluate at this point. Whereas when it is
+ * false, it is telling us that PARAM_EXEC Params can be safely evaluated,
+ * and so also the pruning steps that contain them.
*/
Bitmapset *
-ExecFindMatchingSubPlans(PartitionPruneState *prunestate)
+ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
+ bool initial_prune)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
int i;
/*
- * If !do_exec_prune, we've got problems because
- * ExecFindInitialMatchingSubPlans will not have bothered to update
- * prunestate for whatever pruning it did.
+ * If initial_prune is false, we must only get here if
+ * prunestate->do_exec_prune, because otherwise ExecInitPartitionPruning()
+ * would not have bothered to update prunestate to account for the
+ * subplans removed by initial pruning.
*/
- Assert(prunestate->do_exec_prune);
+ Assert(prunestate->do_exec_prune || initial_prune);
/*
* Switch to a temp context to avoid leaking memory in the executor's
@@ -2123,11 +2150,17 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate)
prunedata = prunestate->partprunedata[i];
pprune = &prunedata->partrelprunedata[0];
- find_matching_subplans_recurse(prunedata, pprune, false, &result);
+ /*
+ * We pass the 1st item belonging to the root table of the hierarchy
+ * and find_matching_subplans_recurse() takes care of recursing to
+ * other (lower-level) parents as needed.
+ */
+ find_matching_subplans_recurse(prunedata, pprune, initial_prune,
+ &result);
- /* Expression eval may have used space in node's ps_ExprContext too */
+ /* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
- ResetExprContext(pprune->exec_context.planstate->ps_ExprContext);
+ ResetExprContext(pprune->exec_context.exprcontext);
}
/* Add in any subplans that partition pruning didn't account for */
@@ -2145,8 +2178,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate)
/*
* find_matching_subplans_recurse
- * Recursive worker function for ExecFindMatchingSubPlans and
- * ExecFindInitialMatchingSubPlans
+ * Recursive worker function for ExecFindMatchingSubPlans
*
* Adds valid (non-prunable) subplan IDs to *validsubplans
*/
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 7937f1c88f..357e10a1d7 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -138,30 +138,17 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
{
PartitionPruneState *prunestate;
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, &appendstate->ps);
-
- /* Create the working data structure for pruning. */
- prunestate = ExecCreatePartitionPruneState(&appendstate->ps,
- node->part_prune_info);
+ /*
+ * Set up pruning data structure. This also initializes the set of
+ * subplans to initialize (validsubplans) by taking into account the
+ * result of performing initial pruning if any.
+ */
+ prunestate = ExecInitPartitionPruning(&appendstate->ps,
+ list_length(node->appendplans),
+ node->part_prune_info,
+ &validsubplans);
appendstate->as_prune_state = prunestate;
-
- /* Perform an initial partition prune, if required. */
- if (prunestate->do_initial_prune)
- {
- /* Determine which subplans survive initial pruning */
- validsubplans = ExecFindInitialMatchingSubPlans(prunestate,
- list_length(node->appendplans));
-
- nplans = bms_num_members(validsubplans);
- }
- else
- {
- /* We'll need to initialize all subplans */
- nplans = list_length(node->appendplans);
- Assert(nplans > 0);
- validsubplans = bms_add_range(NULL, 0, nplans - 1);
- }
+ nplans = bms_num_members(validsubplans);
/*
* When no run-time pruning is required and there's at least one
@@ -590,7 +577,7 @@ choose_next_subplan_locally(AppendState *node)
}
else if (node->as_valid_subplans == NULL)
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state);
+ ExecFindMatchingSubPlans(node->as_prune_state, false);
whichplan = -1;
}
@@ -655,7 +642,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state);
+ ExecFindMatchingSubPlans(node->as_prune_state, false);
/*
* Mark each invalid plan as finished to allow the loop below to
@@ -730,7 +717,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state);
+ ExecFindMatchingSubPlans(node->as_prune_state, false);
mark_invalid_subplans_as_finished(node);
}
@@ -881,7 +868,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state);
+ ExecFindMatchingSubPlans(node->as_prune_state, false);
classify_matching_subplans(node);
}
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 418f89dea8..ecf9052e03 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -86,29 +86,17 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
{
PartitionPruneState *prunestate;
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, &mergestate->ps);
-
- prunestate = ExecCreatePartitionPruneState(&mergestate->ps,
- node->part_prune_info);
+ /*
+ * Set up pruning data structure. This also initializes the set of
+ * subplans to initialize (validsubplans) by taking into account the
+ * result of performing initial pruning if any.
+ */
+ prunestate = ExecInitPartitionPruning(&mergestate->ps,
+ list_length(node->mergeplans),
+ node->part_prune_info,
+ &validsubplans);
mergestate->ms_prune_state = prunestate;
-
- /* Perform an initial partition prune, if required. */
- if (prunestate->do_initial_prune)
- {
- /* Determine which subplans survive initial pruning */
- validsubplans = ExecFindInitialMatchingSubPlans(prunestate,
- list_length(node->mergeplans));
-
- nplans = bms_num_members(validsubplans);
- }
- else
- {
- /* We'll need to initialize all subplans */
- nplans = list_length(node->mergeplans);
- Assert(nplans > 0);
- validsubplans = bms_add_range(NULL, 0, nplans - 1);
- }
+ nplans = bms_num_members(validsubplans);
/*
* When no run-time pruning is required and there's at least one
@@ -230,7 +218,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 1bc00826c1..7080cb25d9 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -798,6 +798,7 @@ prune_append_rel_partitions(RelOptInfo *rel)
/* These are not valid when being called from the planner */
context.planstate = NULL;
+ context.exprcontext = NULL;
context.exprstates = NULL;
/* Actual pruning happens here. */
@@ -808,8 +809,8 @@ prune_append_rel_partitions(RelOptInfo *rel)
* get_matching_partitions
* Determine partitions that survive partition pruning
*
- * Note: context->planstate must be set to a valid PlanState when the
- * pruning_steps were generated with a target other than PARTTARGET_PLANNER.
+ * Note: context->exprcontext must be valid when the pruning_steps were
+ * generated with a target other than PARTTARGET_PLANNER.
*
* Returns a Bitmapset of the RelOptInfo->part_rels indexes of the surviving
* partitions.
@@ -3654,7 +3655,7 @@ match_boolean_partition_clause(Oid partopfamily, Expr *clause, Expr *partkey,
* exprstate array.
*
* Note that the evaluated result may be in the per-tuple memory context of
- * context->planstate->ps_ExprContext, and we may have leaked other memory
+ * context->exprcontext, and we may have leaked other memory
* there too. This memory must be recovered by resetting that ExprContext
* after we're done with the pruning operation (see execPartition.c).
*/
@@ -3677,13 +3678,18 @@ partkey_datum_from_expr(PartitionPruneContext *context,
ExprContext *ectx;
/*
- * We should never see a non-Const in a step unless we're running in
- * the executor.
+ * We should never see a non-Const in a step unless the caller has
+ * passed a valid ExprContext.
+ *
+ * When context->planstate is valid, context->exprcontext is same
+ * as context->planstate->ps_ExprContext.
*/
- Assert(context->planstate != NULL);
+ Assert(context->planstate != NULL || context->exprcontext != NULL);
+ Assert(context->planstate == NULL ||
+ (context->exprcontext == context->planstate->ps_ExprContext));
exprstate = context->exprstates[stateidx];
- ectx = context->planstate->ps_ExprContext;
+ ectx = context->exprcontext;
*value = ExecEvalExprSwitchContext(exprstate, ectx, isnull);
}
}
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 603d8becc4..4c706c11b9 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -119,10 +119,10 @@ extern ResultRelInfo *ExecFindPartition(ModifyTableState *mtstate,
EState *estate);
extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
PartitionTupleRouting *proute);
-extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *partitionpruneinfo);
-extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
-extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
- int nsubplans);
-
+extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
+ int n_total_subplans,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **initially_valid_subplans);
+extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
+ bool initial_prune);
#endif /* EXECPARTITION_H */
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index ee11b6feae..90684efa25 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -41,6 +41,7 @@ struct RelOptInfo;
* subsidiary data, such as the FmgrInfos.
* planstate Points to the parent plan node's PlanState when called
* during execution; NULL when called from the planner.
+ * exprcontext ExprContext to use when evaluating pruning expressions
* exprstates Array of ExprStates, indexed as per PruneCxtStateIdx; one
* for each partition key in each pruning step. Allocated if
* planstate is non-NULL, otherwise NULL.
@@ -56,6 +57,7 @@ typedef struct PartitionPruneContext
FmgrInfo *stepcmpfuncs;
MemoryContext ppccontext;
PlanState *planstate;
+ ExprContext *exprcontext;
ExprState **exprstates;
} PartitionPruneContext;
--
2.24.1
On Mon, Apr 4, 2022 at 9:55 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Sun, Apr 3, 2022 at 8:33 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
I think the names ExecCreatePartitionPruneState and
ExecInitPartitionPruning are too confusingly similar. Maybe the former
should be renamed to somehow make it clear that it is a subroutine of
the latter.

Ah, yes. I've taken out the "Exec" from the former.
While at it, maybe it's better to rename ExecInitPruningContext() to
InitPartitionPruneContext(), which I've done in the attached updated
patch.
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v10-0001-Some-refactoring-of-runtime-pruning-code.patch
From 7568b90570f27dc5efd8a6923854cb7aa6b4045f Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Wed, 2 Mar 2022 15:17:55 +0900
Subject: [PATCH v10] Some refactoring of runtime pruning code
* Move the execution pruning initialization steps that are common
between both ExecInitAppend() and ExecInitMergeAppend() into a new
function ExecInitPartitionPruning() defined in execPartition.c.
Those steps include creation of a PartitionPruneState to be used for
all instances of pruning and determining the minimal set of child
subplans that need to be initialized by performing initial pruning if
needed, and finally adjusting the subplan_map arrays in the
PartitionPruneState to reflect the new set of subplans remaining
after initial pruning if it was indeed performed.
* ExecCreatePartitionPruneState() is no longer exported out of
execPartition.c and has been renamed to CreatePartitionPruneState()
as a local subroutine of ExecInitPartitionPruning(). Also, its
subroutine ExecInitPruningContext() has been renamed to
InitPartitionPruneContext() for consistency.
* ExecFindInitialMatchingSubPlans() that was in charge
of performing initial pruning no longer needs to be exported. In
fact, since it would now have the same body as the more generally
named ExecFindMatchingSubPlans(), except differing in the value of
the initial_prune passed to the common subroutine
find_matching_subplans_recurse(), it seems better to just have
ExecFindMatchingSubPlans() with an initial_prune argument.
* Add an ExprContext field to PartitionPruneContext to remove the
implicit assumption in the runtime pruning code that the ExprContext
needed to compute pruning expressions can always be obtained from the
parent node's PlanState. A future patch will allow runtime
pruning (at least the initial pruning steps) to be performed without
the corresponding PlanState yet having been created, so this will
help.
---
src/backend/executor/execPartition.c | 420 +++++++++++++------------
src/backend/executor/nodeAppend.c | 41 +--
src/backend/executor/nodeMergeAppend.c | 34 +-
src/backend/partitioning/partprune.c | 20 +-
src/include/executor/execPartition.h | 12 +-
src/include/partitioning/partprune.h | 2 +
6 files changed, 272 insertions(+), 257 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index aca42ca5b8..27ca869d7c 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -184,11 +184,17 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
int maxfieldlen);
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
-static void ExecInitPruningContext(PartitionPruneContext *context,
- List *pruning_steps,
- PartitionDesc partdesc,
- PartitionKey partkey,
- PlanState *planstate);
+static PartitionPruneState *CreatePartitionPruneState(PlanState *planstate,
+ PartitionPruneInfo *partitionpruneinfo);
+static void InitPartitionPruneContext(PartitionPruneContext *context,
+ List *pruning_steps,
+ PartitionDesc partdesc,
+ PartitionKey partkey,
+ PlanState *planstate,
+ ExprContext *econtext);
+static void PartitionPruneStateFixSubPlanMap(PartitionPruneState *prunestate,
+ Bitmapset *initially_valid_subplans,
+ int n_total_subplans);
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
@@ -1590,34 +1596,91 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* Functions:
*
- * ExecCreatePartitionPruneState:
- * Creates the PartitionPruneState required by each of the two pruning
- * functions. Details stored include how to map the partition index
- * returned by the partition pruning code into subplan indexes.
- *
- * ExecFindInitialMatchingSubPlans:
- * Returns indexes of matching subplans. Partition pruning is attempted
- * without any evaluation of expressions containing PARAM_EXEC Params.
- * This function must be called during executor startup for the parent
- * plan before the subplans themselves are initialized. Subplans which
- * are found not to match by this function must be removed from the
- * plan's list of subplans during execution, as this function performs a
- * remap of the partition index to subplan index map and the newly
- * created map provides indexes only for subplans which remain after
- * calling this function.
+ * ExecInitPartitionPruning:
+ * Creates the PartitionPruneState required by ExecFindMatchingSubPlans.
+ * Details stored include how to map the partition index returned by the
+ * partition pruning code into subplan indexes. Also determines the set
+ * of subplans to initialize considering the result of performing initial
+ * pruning steps if any. Maps in PartitionPruneState are updated to
+ * account for initial pruning possibly having eliminated some of the
+ * subplans.
*
* ExecFindMatchingSubPlans:
- * Returns indexes of matching subplans after evaluating all available
- * expressions. This function can only be called during execution and
- * must be called again each time the value of a Param listed in
+ * Returns indexes of matching subplans after evaluating the expressions
+ * that are safe to evaluate at a given point. This function is first
+ * called during ExecInitPartitionPruning() to find the initially
+ * matching subplans based on performing the initial pruning steps and
+ * then must be called again each time the value of a Param listed in
* PartitionPruneState's 'execparamids' changes.
*-------------------------------------------------------------------------
*/
/*
- * ExecCreatePartitionPruneState
- * Build the data structure required for calling
- * ExecFindInitialMatchingSubPlans and ExecFindMatchingSubPlans.
+ * ExecInitPartitionPruning
+ * Initialize data structure needed for run-time partition pruning and
+ * do initial pruning if needed
+ *
+ * On return, *initially_valid_subplans is assigned the set of indexes of
+ * child subplans that must be initialized along with the parent plan node.
+ * Initial pruning is performed here if needed and in that case only the
+ * surviving subplans' indexes are added.
+ *
+ * If subplans are indeed pruned, subplan_map arrays contained in the returned
+ * PartitionPruneState are re-sequenced to not count those, though only if the
+ * maps will be needed for subsequent execution pruning passes.
+ */
+PartitionPruneState *
+ExecInitPartitionPruning(PlanState *planstate,
+ int n_total_subplans,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **initially_valid_subplans)
+{
+ PartitionPruneState *prunestate;
+ EState *estate = planstate->state;
+
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
+
+ /*
+ * Create the working data structure for pruning.
+ */
+ prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+
+ /*
+ * Perform an initial partition prune pass, if required.
+ */
+ if (prunestate->do_initial_prune)
+ {
+ *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ }
+ else
+ {
+ /* No pruning, so we'll need to initialize all subplans */
+ Assert(n_total_subplans > 0);
+ *initially_valid_subplans = bms_add_range(NULL, 0,
+ n_total_subplans - 1);
+ }
+
+ /*
+ * Re-sequence subplan indexes contained in prunestate to account for any
+ * that were removed above due to initial pruning.
+ *
+ * We can safely skip this when !do_exec_prune, even though that leaves
+ * invalid data in prunestate, because that data won't be consulted again
+ * (cf initial Assert in ExecFindMatchingSubPlans).
+ */
+ if (prunestate->do_exec_prune &&
+ bms_num_members(*initially_valid_subplans) < n_total_subplans)
+ PartitionPruneStateFixSubPlanMap(prunestate,
+ *initially_valid_subplans,
+ n_total_subplans);
+
+ return prunestate;
+}
+
+/*
+ * CreatePartitionPruneState
+ * Build the data structure required for calling ExecFindMatchingSubPlans
*
* 'planstate' is the parent plan node's execution state.
*
@@ -1632,8 +1695,8 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* re-used each time we re-evaluate which partitions match the pruning steps
* provided in each PartitionedRelPruneInfo.
*/
-PartitionPruneState *
-ExecCreatePartitionPruneState(PlanState *planstate,
+static PartitionPruneState *
+CreatePartitionPruneState(PlanState *planstate,
PartitionPruneInfo *partitionpruneinfo)
{
EState *estate = planstate->state;
@@ -1641,6 +1704,7 @@ ExecCreatePartitionPruneState(PlanState *planstate,
int n_part_hierarchies;
ListCell *lc;
int i;
+ ExprContext *econtext = planstate->ps_ExprContext;
/* For data reading, executor always omits detached partitions */
if (estate->es_partition_directory == NULL)
@@ -1812,18 +1876,20 @@ ExecCreatePartitionPruneState(PlanState *planstate,
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
if (pinfo->initial_pruning_steps)
{
- ExecInitPruningContext(&pprune->initial_context,
- pinfo->initial_pruning_steps,
- partdesc, partkey, planstate);
+ InitPartitionPruneContext(&pprune->initial_context,
+ pinfo->initial_pruning_steps,
+ partdesc, partkey, planstate,
+ econtext);
/* Record whether initial pruning is needed at any level */
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
if (pinfo->exec_pruning_steps)
{
- ExecInitPruningContext(&pprune->exec_context,
- pinfo->exec_pruning_steps,
- partdesc, partkey, planstate);
+ InitPartitionPruneContext(&pprune->exec_context,
+ pinfo->exec_pruning_steps,
+ partdesc, partkey, planstate,
+ econtext);
/* Record whether exec pruning is needed at any level */
prunestate->do_exec_prune = true;
}
@@ -1847,11 +1913,12 @@ ExecCreatePartitionPruneState(PlanState *planstate,
* Initialize a PartitionPruneContext for the given list of pruning steps.
*/
static void
-ExecInitPruningContext(PartitionPruneContext *context,
- List *pruning_steps,
- PartitionDesc partdesc,
- PartitionKey partkey,
- PlanState *planstate)
+InitPartitionPruneContext(PartitionPruneContext *context,
+ List *pruning_steps,
+ PartitionDesc partdesc,
+ PartitionKey partkey,
+ PlanState *planstate,
+ ExprContext *econtext)
{
int n_steps;
int partnatts;
@@ -1872,6 +1939,7 @@ ExecInitPruningContext(PartitionPruneContext *context,
context->ppccontext = CurrentMemoryContext;
context->planstate = planstate;
+ context->exprcontext = econtext;
/* Initialize expression state for each expression we need */
context->exprstates = (ExprState **)
@@ -1900,8 +1968,20 @@ ExecInitPruningContext(PartitionPruneContext *context,
step->step.step_id,
keyno);
- context->exprstates[stateidx] =
- ExecInitExpr(expr, context->planstate);
+ /*
+ * When planstate is NULL, pruning_steps is known not to
+ * contain any expressions that depend on the parent plan.
+ * Information of any available EXTERN parameters must be
+ * passed explicitly in that case, which the caller must
+ * have made available via econtext.
+ */
+ if (planstate == NULL)
+ context->exprstates[stateidx] =
+ ExecInitExprWithParams(expr,
+ econtext->ecxt_param_list_info);
+ else
+ context->exprstates[stateidx] =
+ ExecInitExpr(expr, context->planstate);
}
keyno++;
}
@@ -1909,179 +1989,121 @@ ExecInitPruningContext(PartitionPruneContext *context,
}
/*
- * ExecFindInitialMatchingSubPlans
- * Identify the set of subplans that cannot be eliminated by initial
- * pruning, disregarding any pruning constraints involving PARAM_EXEC
- * Params.
- *
- * If additional pruning passes will be required (because of PARAM_EXEC
- * Params), we must also update the translation data that allows conversion
- * of partition indexes into subplan indexes to account for the unneeded
- * subplans having been removed.
- *
- * Must only be called once per 'prunestate', and only if initial pruning
- * is required.
+ * PartitionPruneStateFixSubPlanMap
+ * Fix mapping of partition indexes to subplan indexes contained in
+ * prunestate by considering the new list of subplans that survived
+ * initial pruning
*
- * 'nsubplans' must be passed as the total number of unpruned subplans.
+ * Current values of the indexes present in PartitionPruneState count all the
+ * subplans that would be present before initial pruning was done. If initial
+ * pruning got rid of some of the subplans, any subsequent pruning passes
+ * will be looking at a different set of target subplans to choose from than
+ * those in the pre-initial-pruning set, so the maps in PartitionPruneState
+ * containing those indexes must be updated to reflect the new indexes of
+ * subplans in the post-initial-pruning set.
*/
-Bitmapset *
-ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
+static void
+PartitionPruneStateFixSubPlanMap(PartitionPruneState *prunestate,
+ Bitmapset *initially_valid_subplans,
+ int n_total_subplans)
{
- Bitmapset *result = NULL;
- MemoryContext oldcontext;
+ int *new_subplan_indexes;
+ Bitmapset *new_other_subplans;
int i;
-
- /* Caller error if we get here without do_initial_prune */
- Assert(prunestate->do_initial_prune);
+ int newidx;
/*
- * Switch to a temp context to avoid leaking memory in the executor's
- * query-lifespan memory context.
+ * First we must build a temporary array which maps old subplan
+ * indexes to new ones. For convenience of initialization, we use
+ * 1-based indexes in this array and leave pruned items as 0.
*/
- oldcontext = MemoryContextSwitchTo(prunestate->prune_context);
-
- /*
- * For each hierarchy, do the pruning tests, and add nondeletable
- * subplans' indexes to "result".
- */
- for (i = 0; i < prunestate->num_partprunedata; i++)
+ new_subplan_indexes = (int *) palloc0(sizeof(int) * n_total_subplans);
+ newidx = 1;
+ i = -1;
+ while ((i = bms_next_member(initially_valid_subplans, i)) >= 0)
{
- PartitionPruningData *prunedata;
- PartitionedRelPruningData *pprune;
-
- prunedata = prunestate->partprunedata[i];
- pprune = &prunedata->partrelprunedata[0];
-
- /* Perform pruning without using PARAM_EXEC Params */
- find_matching_subplans_recurse(prunedata, pprune, true, &result);
-
- /* Expression eval may have used space in node's ps_ExprContext too */
- if (pprune->initial_pruning_steps)
- ResetExprContext(pprune->initial_context.planstate->ps_ExprContext);
+ Assert(i < n_total_subplans);
+ new_subplan_indexes[i] = newidx++;
}
- /* Add in any subplans that partition pruning didn't account for */
- result = bms_add_members(result, prunestate->other_subplans);
-
- MemoryContextSwitchTo(oldcontext);
-
- /* Copy result out of the temp context before we reset it */
- result = bms_copy(result);
-
- MemoryContextReset(prunestate->prune_context);
-
/*
- * If exec-time pruning is required and we pruned subplans above, then we
- * must re-sequence the subplan indexes so that ExecFindMatchingSubPlans
- * properly returns the indexes from the subplans which will remain after
- * execution of this function.
- *
- * We can safely skip this when !do_exec_prune, even though that leaves
- * invalid data in prunestate, because that data won't be consulted again
- * (cf initial Assert in ExecFindMatchingSubPlans).
+ * Now we can update each PartitionedRelPruneInfo's subplan_map with
+ * new subplan indexes. We must also recompute its present_parts
+ * bitmap.
*/
- if (prunestate->do_exec_prune && bms_num_members(result) < nsubplans)
+ for (i = 0; i < prunestate->num_partprunedata; i++)
{
- int *new_subplan_indexes;
- Bitmapset *new_other_subplans;
- int i;
- int newidx;
+ PartitionPruningData *prunedata = prunestate->partprunedata[i];
+ int j;
/*
- * First we must build a temporary array which maps old subplan
- * indexes to new ones. For convenience of initialization, we use
- * 1-based indexes in this array and leave pruned items as 0.
+ * Within each hierarchy, we perform this loop in back-to-front
+ * order so that we determine present_parts for the lowest-level
+ * partitioned tables first. This way we can tell whether a
+ * sub-partitioned table's partitions were entirely pruned so we
+ * can exclude it from the current level's present_parts.
*/
- new_subplan_indexes = (int *) palloc0(sizeof(int) * nsubplans);
- newidx = 1;
- i = -1;
- while ((i = bms_next_member(result, i)) >= 0)
+ for (j = prunedata->num_partrelprunedata - 1; j >= 0; j--)
{
- Assert(i < nsubplans);
- new_subplan_indexes[i] = newidx++;
- }
+ PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
+ int nparts = pprune->nparts;
+ int k;
- /*
- * Now we can update each PartitionedRelPruneInfo's subplan_map with
- * new subplan indexes. We must also recompute its present_parts
- * bitmap.
- */
- for (i = 0; i < prunestate->num_partprunedata; i++)
- {
- PartitionPruningData *prunedata = prunestate->partprunedata[i];
- int j;
+ /* We just rebuild present_parts from scratch */
+ bms_free(pprune->present_parts);
+ pprune->present_parts = NULL;
- /*
- * Within each hierarchy, we perform this loop in back-to-front
- * order so that we determine present_parts for the lowest-level
- * partitioned tables first. This way we can tell whether a
- * sub-partitioned table's partitions were entirely pruned so we
- * can exclude it from the current level's present_parts.
- */
- for (j = prunedata->num_partrelprunedata - 1; j >= 0; j--)
+ for (k = 0; k < nparts; k++)
{
- PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
- int nparts = pprune->nparts;
- int k;
-
- /* We just rebuild present_parts from scratch */
- bms_free(pprune->present_parts);
- pprune->present_parts = NULL;
+ int oldidx = pprune->subplan_map[k];
+ int subidx;
- for (k = 0; k < nparts; k++)
+ /*
+ * If this partition existed as a subplan then change the
+ * old subplan index to the new subplan index. The new
+ * index may become -1 if the partition was pruned above,
+ * or it may just come earlier in the subplan list due to
+ * some subplans being removed earlier in the list. If
+ * it's a subpartition, add it to present_parts unless
+ * it's entirely pruned.
+ */
+ if (oldidx >= 0)
{
- int oldidx = pprune->subplan_map[k];
- int subidx;
+ Assert(oldidx < n_total_subplans);
+ pprune->subplan_map[k] = new_subplan_indexes[oldidx] - 1;
- /*
- * If this partition existed as a subplan then change the
- * old subplan index to the new subplan index. The new
- * index may become -1 if the partition was pruned above,
- * or it may just come earlier in the subplan list due to
- * some subplans being removed earlier in the list. If
- * it's a subpartition, add it to present_parts unless
- * it's entirely pruned.
- */
- if (oldidx >= 0)
- {
- Assert(oldidx < nsubplans);
- pprune->subplan_map[k] = new_subplan_indexes[oldidx] - 1;
-
- if (new_subplan_indexes[oldidx] > 0)
- pprune->present_parts =
- bms_add_member(pprune->present_parts, k);
- }
- else if ((subidx = pprune->subpart_map[k]) >= 0)
- {
- PartitionedRelPruningData *subprune;
+ if (new_subplan_indexes[oldidx] > 0)
+ pprune->present_parts =
+ bms_add_member(pprune->present_parts, k);
+ }
+ else if ((subidx = pprune->subpart_map[k]) >= 0)
+ {
+ PartitionedRelPruningData *subprune;
- subprune = &prunedata->partrelprunedata[subidx];
+ subprune = &prunedata->partrelprunedata[subidx];
- if (!bms_is_empty(subprune->present_parts))
- pprune->present_parts =
- bms_add_member(pprune->present_parts, k);
- }
+ if (!bms_is_empty(subprune->present_parts))
+ pprune->present_parts =
+ bms_add_member(pprune->present_parts, k);
}
}
}
+ }
- /*
- * We must also recompute the other_subplans set, since indexes in it
- * may change.
- */
- new_other_subplans = NULL;
- i = -1;
- while ((i = bms_next_member(prunestate->other_subplans, i)) >= 0)
- new_other_subplans = bms_add_member(new_other_subplans,
- new_subplan_indexes[i] - 1);
-
- bms_free(prunestate->other_subplans);
- prunestate->other_subplans = new_other_subplans;
+ /*
+ * We must also recompute the other_subplans set, since indexes in it
+ * may change.
+ */
+ new_other_subplans = NULL;
+ i = -1;
+ while ((i = bms_next_member(prunestate->other_subplans, i)) >= 0)
+ new_other_subplans = bms_add_member(new_other_subplans,
+ new_subplan_indexes[i] - 1);
- pfree(new_subplan_indexes);
- }
+ bms_free(prunestate->other_subplans);
+ prunestate->other_subplans = new_other_subplans;
- return result;
+ pfree(new_subplan_indexes);
}
/*
@@ -2089,21 +2111,26 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
* Determine which subplans match the pruning steps detailed in
* 'prunestate' for the current comparison expression values.
*
- * Here we assume we may evaluate PARAM_EXEC Params.
+ * If initial_prune is true, the caller is telling us that only those pruning
+ * steps that are known to not contain any expressions involving PARAM_EXEC
+ * Params are safe to evaluate at this point. Whereas when it's false, it is
+ * telling us that PARAM_EXEC Params can be safely evaluated, and so also the
+ * pruning steps that contain them.
*/
Bitmapset *
-ExecFindMatchingSubPlans(PartitionPruneState *prunestate)
+ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
+ bool initial_prune)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
int i;
/*
- * If !do_exec_prune, we've got problems because
- * ExecFindInitialMatchingSubPlans will not have bothered to update
- * prunestate for whatever pruning it did.
+ * Only get here if initial_prune is true or prunestate->do_exec_prune is
+ * set, because otherwise ExecInitPartitionPruning() would not have bothered
+ * to update prunestate to account for the subplans removed by initial
+ * pruning.
*/
- Assert(prunestate->do_exec_prune);
+ Assert(prunestate->do_exec_prune || initial_prune);
/*
* Switch to a temp context to avoid leaking memory in the executor's
@@ -2123,11 +2150,17 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate)
prunedata = prunestate->partprunedata[i];
pprune = &prunedata->partrelprunedata[0];
- find_matching_subplans_recurse(prunedata, pprune, false, &result);
+ /*
+ * We pass the 1st item belonging to the root table of the hierarchy
+ * and find_matching_subplans_recurse() takes care of recursing to
+ * other (lower-level) parents as needed.
+ */
+ find_matching_subplans_recurse(prunedata, pprune, initial_prune,
+ &result);
- /* Expression eval may have used space in node's ps_ExprContext too */
+ /* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
- ResetExprContext(pprune->exec_context.planstate->ps_ExprContext);
+ ResetExprContext(pprune->exec_context.exprcontext);
}
/* Add in any subplans that partition pruning didn't account for */
@@ -2145,8 +2178,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate)
/*
* find_matching_subplans_recurse
- * Recursive worker function for ExecFindMatchingSubPlans and
- * ExecFindInitialMatchingSubPlans
+ * Recursive worker function for ExecFindMatchingSubPlans
*
* Adds valid (non-prunable) subplan IDs to *validsubplans
*/
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 7937f1c88f..357e10a1d7 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -138,30 +138,17 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
{
PartitionPruneState *prunestate;
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, &appendstate->ps);
-
- /* Create the working data structure for pruning. */
- prunestate = ExecCreatePartitionPruneState(&appendstate->ps,
- node->part_prune_info);
+ /*
+ * Set up pruning data structure. This also initializes the set of
+ * subplans to initialize (validsubplans) by taking into account the
+ * result of performing initial pruning if any.
+ */
+ prunestate = ExecInitPartitionPruning(&appendstate->ps,
+ list_length(node->appendplans),
+ node->part_prune_info,
+ &validsubplans);
appendstate->as_prune_state = prunestate;
-
- /* Perform an initial partition prune, if required. */
- if (prunestate->do_initial_prune)
- {
- /* Determine which subplans survive initial pruning */
- validsubplans = ExecFindInitialMatchingSubPlans(prunestate,
- list_length(node->appendplans));
-
- nplans = bms_num_members(validsubplans);
- }
- else
- {
- /* We'll need to initialize all subplans */
- nplans = list_length(node->appendplans);
- Assert(nplans > 0);
- validsubplans = bms_add_range(NULL, 0, nplans - 1);
- }
+ nplans = bms_num_members(validsubplans);
/*
* When no run-time pruning is required and there's at least one
@@ -590,7 +577,7 @@ choose_next_subplan_locally(AppendState *node)
}
else if (node->as_valid_subplans == NULL)
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state);
+ ExecFindMatchingSubPlans(node->as_prune_state, false);
whichplan = -1;
}
@@ -655,7 +642,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state);
+ ExecFindMatchingSubPlans(node->as_prune_state, false);
/*
* Mark each invalid plan as finished to allow the loop below to
@@ -730,7 +717,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state);
+ ExecFindMatchingSubPlans(node->as_prune_state, false);
mark_invalid_subplans_as_finished(node);
}
@@ -881,7 +868,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state);
+ ExecFindMatchingSubPlans(node->as_prune_state, false);
classify_matching_subplans(node);
}
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 418f89dea8..ecf9052e03 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -86,29 +86,17 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
{
PartitionPruneState *prunestate;
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, &mergestate->ps);
-
- prunestate = ExecCreatePartitionPruneState(&mergestate->ps,
- node->part_prune_info);
+ /*
+ * Set up pruning data structure. This also initializes the set of
+ * subplans to initialize (validsubplans) by taking into account the
+ * result of performing initial pruning if any.
+ */
+ prunestate = ExecInitPartitionPruning(&mergestate->ps,
+ list_length(node->mergeplans),
+ node->part_prune_info,
+ &validsubplans);
mergestate->ms_prune_state = prunestate;
-
- /* Perform an initial partition prune, if required. */
- if (prunestate->do_initial_prune)
- {
- /* Determine which subplans survive initial pruning */
- validsubplans = ExecFindInitialMatchingSubPlans(prunestate,
- list_length(node->mergeplans));
-
- nplans = bms_num_members(validsubplans);
- }
- else
- {
- /* We'll need to initialize all subplans */
- nplans = list_length(node->mergeplans);
- Assert(nplans > 0);
- validsubplans = bms_add_range(NULL, 0, nplans - 1);
- }
+ nplans = bms_num_members(validsubplans);
/*
* When no run-time pruning is required and there's at least one
@@ -230,7 +218,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 1bc00826c1..7080cb25d9 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -798,6 +798,7 @@ prune_append_rel_partitions(RelOptInfo *rel)
/* These are not valid when being called from the planner */
context.planstate = NULL;
+ context.exprcontext = NULL;
context.exprstates = NULL;
/* Actual pruning happens here. */
@@ -808,8 +809,8 @@ prune_append_rel_partitions(RelOptInfo *rel)
* get_matching_partitions
* Determine partitions that survive partition pruning
*
- * Note: context->planstate must be set to a valid PlanState when the
- * pruning_steps were generated with a target other than PARTTARGET_PLANNER.
+ * Note: context->exprcontext must be valid when the pruning_steps were
+ * generated with a target other than PARTTARGET_PLANNER.
*
* Returns a Bitmapset of the RelOptInfo->part_rels indexes of the surviving
* partitions.
@@ -3654,7 +3655,7 @@ match_boolean_partition_clause(Oid partopfamily, Expr *clause, Expr *partkey,
* exprstate array.
*
* Note that the evaluated result may be in the per-tuple memory context of
- * context->planstate->ps_ExprContext, and we may have leaked other memory
+ * context->exprcontext, and we may have leaked other memory
* there too. This memory must be recovered by resetting that ExprContext
* after we're done with the pruning operation (see execPartition.c).
*/
@@ -3677,13 +3678,18 @@ partkey_datum_from_expr(PartitionPruneContext *context,
ExprContext *ectx;
/*
- * We should never see a non-Const in a step unless we're running in
- * the executor.
+ * We should never see a non-Const in a step unless the caller has
+ * passed a valid ExprContext.
+ *
+ * When context->planstate is valid, context->exprcontext is the same
+ * as context->planstate->ps_ExprContext.
*/
- Assert(context->planstate != NULL);
+ Assert(context->planstate != NULL || context->exprcontext != NULL);
+ Assert(context->planstate == NULL ||
+ (context->exprcontext == context->planstate->ps_ExprContext));
exprstate = context->exprstates[stateidx];
- ectx = context->planstate->ps_ExprContext;
+ ectx = context->exprcontext;
*value = ExecEvalExprSwitchContext(exprstate, ectx, isnull);
}
}
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 603d8becc4..4c706c11b9 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -119,10 +119,10 @@ extern ResultRelInfo *ExecFindPartition(ModifyTableState *mtstate,
EState *estate);
extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
PartitionTupleRouting *proute);
-extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *partitionpruneinfo);
-extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
-extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
- int nsubplans);
-
+extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
+ int n_total_subplans,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **initially_valid_subplans);
+extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
+ bool initial_prune);
#endif /* EXECPARTITION_H */
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index ee11b6feae..90684efa25 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -41,6 +41,7 @@ struct RelOptInfo;
* subsidiary data, such as the FmgrInfos.
* planstate Points to the parent plan node's PlanState when called
* during execution; NULL when called from the planner.
+ * exprcontext ExprContext to use when evaluating pruning expressions
* exprstates Array of ExprStates, indexed as per PruneCxtStateIdx; one
* for each partition key in each pruning step. Allocated if
* planstate is non-NULL, otherwise NULL.
@@ -56,6 +57,7 @@ typedef struct PartitionPruneContext
FmgrInfo *stepcmpfuncs;
MemoryContext ppccontext;
PlanState *planstate;
+ ExprContext *exprcontext;
ExprState **exprstates;
} PartitionPruneContext;
--
2.24.1
On 2022-Apr-05, Amit Langote wrote:
While at it, maybe it's better to rename ExecInitPruningContext() to
InitPartitionPruneContext(), which I've done in the attached updated
patch.
Good call. I had changed that name too, but yours seems a better
choice.
I made a few other cosmetic changes and pushed. I'm afraid this will
cause a few conflicts with your 0004 -- hopefully these should mostly be
minor.
One change that's not completely cosmetic is a change in the test on
whether to call PartitionPruneFixSubPlanMap or not. Originally it was:
if (partprune->do_exec_prune &&
bms_num_members( ... ))
do_stuff();
which meant that bms_num_members() is only evaluated if do_exec_prune.
However, the do_exec_prune bit is an optimization (we can skip doing
that stuff if it's not going to be used), but the other test is more
strict: the stuff is completely irrelevant if no plans have been
removed, since the data structure does not need fixing. So I changed it
to be like this
if (bms_num_members( .. ))
{
/* can skip if it's pointless */
if (do_exec_prune)
do_stuff();
}
I think that it is clearer to the human reader this way; and I think a
smart compiler may realize that the test can be reversed and avoid
counting bits when it's pointless.
So your 0004 patch should add the new condition to the outer if(), since
it's a critical consideration rather than an optimization:
if (partprune && bms_num_members())
{
/* can skip if pointless */
if (do_exec_prune)
do_stuff()
}
Now, if we disagree and think that counting bits in the BMS when it's
going to be discarded by do_exec_prune being false is too wasteful, then
we can flip that back to the original arrangement and add a more explicit
comment. With no evidence, I doubt it matters.
Thanks for the patch! I think the new coding is indeed a bit easier to
follow.
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
<inflex> really, I see PHP as like a strange amalgamation of C, Perl, Shell
<crab> inflex: you know that "amalgam" means "mixture with mercury",
more or less, right?
<crab> i.e., "deadly poison"
On Tue, Apr 5, 2022 at 7:00 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
I made a few other cosmetic changes and pushed.

Thanks!

Now, if we disagree and think that counting bits in the BMS when it's
going to be discarded by do_exec_prune being false is too wasteful, then
we can flip that back to the original arrangement and add a more explicit
comment. With no evidence, I doubt it matters.
I agree that counting bits in the outer condition makes this easier to
read, so I see no problem with keeping it that way.
Will post the rebased main patch soon, whose rewrite I'm close to
being done with.
--
Amit Langote
EDB: http://www.enterprisedb.com
On Fri, Apr 1, 2022 at 5:36 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Fri, Apr 1, 2022 at 5:20 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Fri, 1 Apr 2022 at 19:58, Amit Langote <amitlangote09@gmail.com> wrote:
Yes, the ExecLockRelsInfo node in the current patch, that first gets
added to the QueryDesc and subsequently to the EState of the query,
serves as that stashing place. Not sure if you've looked at
ExecLockRelsInfo in detail in your review of the patch so far, but it
carries the initial pruning result in what are called
PlanInitPruningOutput nodes, which are stored in a list in
ExecLockRelsInfo and their offsets in the list are in turn stored in
an adjacent array that contains an element for every plan node in the
tree. If we go with a PlannedStmt.partpruneinfos list, then maybe we
don't need to have that array, because the Append/MergeAppend nodes
would be carrying those offsets by themselves.

I saw it, just not in great detail. I saw that you had an array that
was indexed by the plan node's ID. I thought that wouldn't be so good
with large complex plans that we often get with partitioning
workloads. That's why I mentioned using another index that you store
in Append/MergeAppend that starts at 0 and increments by 1 for each
node that has a PartitionPruneInfo made for it during create_plan.

Maybe a different name for ExecLockRelsInfo would be better? Also,
given Tom's apparent dislike for carrying that in PlannedStmt, maybe
the way I have it now is fine?

I think if you change how it's indexed and the other stuff then we can
have another look. I think the patch will be much easier to review
once the PartitionPruneInfos are moved into PlannedStmt.

Will do, thanks.
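The per-node index being described here surfaces as part_prune_index in
the v11 patch below. A rough sketch of the create_plan() side, which is
not visible in the hunks attached below (the root->glob->partPruneInfos
list name is an assumption):

	/* In create_append_plan(), once a PartitionPruneInfo has been built */
	if (pruneinfo != NULL)
	{
		/* the node's slot in the statement-wide list, counting from 0 */
		plan->part_prune_index = list_length(root->glob->partPruneInfos);
		root->glob->partPruneInfos = lappend(root->glob->partPruneInfos,
											 pruneinfo);
	}
	else
		plan->part_prune_index = -1;	/* no run-time pruning here */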
And here is a version like that that passes make check-world. Maybe
still a WIP as I think comments could use more editing.
Here's how the new implementation works:
AcquireExecutorLocks() calls ExecutorDoInitialPruning(), which in turn
iterates over a list of PartitionPruneInfos in a given PlannedStmt
coming from a CachedPlan. For each PartitionPruneInfo,
ExecPartitionDoInitialPruning() is called, which sets up
PartitionPruneState and performs initial pruning steps present in the
PartitionPruneInfo. The resulting bitmapsets of valid subplans, one
for each PartitionPruneInfo, are collected in a list and added to a
result node called PartitionPruneResult. It represents the result of
performing initial pruning on all PartitionPruneInfos found in a plan.
A list of PartitionPruneResults is passed along with the PlannedStmt
to the executor, which is referenced when initializing
Append/MergeAppend nodes.
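For reference, the PartitionPruneResult node implied by the hunks below
would look about like this (the execnodes.h hunk is not shown here, so
this is reconstructed from its uses in the patch):

	typedef struct PartitionPruneResult
	{
		NodeTag		type;

		/* one bitmapset of surviving subplan offsets per PartitionPruneInfo */
		List	   *valid_subplan_offs_list;

		/* RT indexes of the leaf partitions scanned by surviving subplans */
		Bitmapset  *scan_leafpart_rtis;
	} PartitionPruneResult;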
PlannedStmt.minLockRelids defined by the planner contains the RT
indexes of all the entries in the range table minus those of the leaf
partitions whose subplans are subject to removal due to initial
pruning. AcquireExecutorLocks() adds back the RT indexes of only those
leaf partitions whose subplans survive ExecutorDoInitialPruning(). To
get the leaf partition RT indexes from the PartitionPruneInfo, a new
rti_map array is added to PartitionedRelPruneInfo.
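To make the locking side concrete, the AcquireExecutorLocks() loop would
become something like the sketch below; the plancache.c hunks are not
shown here, so the local variable names are illustrative:

	Bitmapset  *lockrelids = plannedstmt->minLockRelids;
	int			rti;

	if (plannedstmt->containsInitialPruning)
	{
		/* Locks partitioned tables and performs the initial pruning steps */
		PartitionPruneResult *pruneresult =
			ExecutorDoInitialPruning(plannedstmt, boundParams);

		/* Add back only the leaf partitions that survived pruning. */
		lockrelids = bms_union(lockrelids, pruneresult->scan_leafpart_rtis);
	}

	rti = -1;
	while ((rti = bms_next_member(lockrelids, rti)) >= 0)
	{
		RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);

		if (rte->rtekind == RTE_RELATION)
			LockRelationOid(rte->relid, rte->rellockmode);
	}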
There's only one patch this time. Patches that added partitioned_rels
and plan_tree_walker() are no longer necessary.
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v11-0001-Optimize-AcquireExecutorLocks-to-skip-pruned-par.patch (application/octet-stream)
From b0c8f18835ea2f455ea503a7c1702195be989df8 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v11] Optimize AcquireExecutorLocks() to skip pruned partitions
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 13 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 17 +-
src/backend/executor/README | 28 +++
src/backend/executor/execMain.c | 46 +++++
src/backend/executor/execParallel.c | 28 ++-
src/backend/executor/execPartition.c | 238 ++++++++++++++++++++----
src/backend/executor/execUtils.c | 1 +
src/backend/executor/functions.c | 2 +-
src/backend/executor/nodeAppend.c | 16 +-
src/backend/executor/nodeMergeAppend.c | 9 +-
src/backend/executor/spi.c | 14 +-
src/backend/nodes/copyfuncs.c | 33 +++-
src/backend/nodes/outfuncs.c | 36 +++-
src/backend/nodes/readfuncs.c | 56 +++++-
src/backend/optimizer/plan/createplan.c | 20 +-
src/backend/optimizer/plan/planner.c | 3 +
src/backend/optimizer/plan/setrefs.c | 104 ++++++++---
src/backend/partitioning/partprune.c | 41 +++-
src/backend/tcop/postgres.c | 15 +-
src/backend/tcop/pquery.c | 22 ++-
src/backend/utils/cache/plancache.c | 232 ++++++++++++++++++++---
src/backend/utils/mmgr/portalmem.c | 2 +
src/include/commands/explain.h | 3 +-
src/include/executor/execPartition.h | 12 +-
src/include/executor/execdesc.h | 3 +
src/include/executor/executor.h | 2 +
src/include/nodes/execnodes.h | 15 ++
src/include/nodes/nodes.h | 4 +
src/include/nodes/pathnodes.h | 15 ++
src/include/nodes/plannodes.h | 39 +++-
src/include/tcop/tcopprot.h | 2 +-
src/include/utils/plancache.h | 7 +
src/include/utils/portal.h | 5 +
38 files changed, 942 insertions(+), 155 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 55c38b04c4..d403eb2309 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -542,7 +542,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 9abbb6b555..f6607f2454 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 1e5701b8eb..7ba9852e51 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -407,7 +407,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
@@ -515,7 +515,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, PartitionPruneResult *part_prune_result,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
@@ -563,7 +564,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, part_prune_result, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 1013790dbb..1151d95e1f 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -741,8 +741,10 @@ execute_sql_string(const char *sql)
RawStmt *parsetree = lfirst_node(RawStmt, lc1);
MemoryContext per_parsetree_context,
oldcontext;
- List *stmt_list;
- ListCell *lc2;
+ List *stmt_list,
+ *part_prune_result_list;
+ ListCell *lc2,
+ *lc3;
/*
* We do the work for each parsetree in a short-lived context, to
@@ -762,11 +764,13 @@ execute_sql_string(const char *sql)
NULL,
0,
NULL);
- stmt_list = pg_plan_queries(stmt_list, sql, CURSOR_OPT_PARALLEL_OK, NULL);
+ stmt_list = pg_plan_queries(stmt_list, sql, CURSOR_OPT_PARALLEL_OK, NULL,
+ &part_prune_result_list);
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, part_prune_result_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ PartitionPruneResult *part_prune_result = lfirst_node(PartitionPruneResult, lc3);
CommandCounterIncrement();
@@ -777,6 +781,7 @@ execute_sql_string(const char *sql)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ part_prune_result,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 05e7b60059..4ef44aaf23 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -416,7 +416,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 9902c5c566..cac653f535 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -107,6 +107,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ list_make1(NULL), /* no PartitionPruneResult to pass */
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 80738547ed..8b15159374 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -155,6 +155,7 @@ ExecuteQuery(ParseState *pstate,
PreparedStatement *entry;
CachedPlan *cplan;
List *plan_list;
+ List *plan_part_prune_result_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
Portal portal;
@@ -195,6 +196,7 @@ ExecuteQuery(ParseState *pstate,
/* Replan if needed, and increment plan refcount for portal */
cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
plan_list = cplan->stmt_list;
+ plan_part_prune_result_list = cplan->part_prune_result_list;
/*
* DO NOT add any logic that could possibly throw an error between
@@ -204,7 +206,7 @@ ExecuteQuery(ParseState *pstate,
NULL,
query_string,
entry->plansource->commandTag,
- plan_list,
+ plan_list, plan_part_prune_result_list,
cplan);
/*
@@ -576,7 +578,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
- ListCell *p;
+ List *plan_part_prune_result_list;
+ ListCell *p,
+ *pp;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -632,15 +636,18 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
plan_list = cplan->stmt_list;
+ plan_part_prune_result_list = cplan->part_prune_result_list;
/* Explain each query */
- foreach(p, plan_list)
+ forboth(p, plan_list, pp, plan_part_prune_result_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ PartitionPruneResult *part_prune_result = lfirst_node(PartitionPruneResult, pp);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ ExplainOnePlan(pstmt, part_prune_result, into, es, query_string,
+ paramLI, queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 0b5183fc4a..8418e758da 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -65,6 +65,30 @@ found there. This currently only occurs for Append and MergeAppend nodes. In
this case the non-required subplans are ignored and the executor state's
subnode array will become out of sequence to the plan's subplan list.
+Actually, the so-called execution time pruning may also occur even before the
+execution has started. One case where that occurs is when a cached generic
+plan is being validated for execution by plancache.c: GetCachedPlan(), which
+proceeds by locking all the relations that will be scanned by that plan. If
+the generic plan has nodes that contain so-called initial pruning steps (a
+subset of execution pruning steps that do not depend on full-fledged execution
+having started), they are performed at this point to figure out the minimal
+set of child subplans that satisfy those pruning instructions, and the result
+of performing that pruning is saved in a data structure that gets passed to
+the executor alongside the plan tree. Relations scanned by only those
+surviving subplans are then locked while those scanned by the pruned subplans
+are not, even though the pruned subplans themselves are not removed from the
+plan tree. So, it is imperative that the executor and any third party code
+invoked by it that gets passed the plan tree look at the initial pruning result
+made available via the aforementioned data structure to determine whether or
+not a particular subplan is valid. The data structure basically consists of
+a PartitionPruneResult node passed through the QueryDesc (subsequently added
+to EState) containing a list of bitmapsets with one element for every
+PartitionPruneInfo found in PlannedStmt.partPruneInfos. The list is indexed
+with the part_prune_index of the individual PartitionPruneInfos, which is
+stored in the parent plan node to which a given PartitionPruneInfo belongs.
+Each bitmapset contains the indexes of the child subplans of the given
+parent plan node that survive initial partition pruning.
+
Each Plan node may have expression trees associated with it, to represent
its target list, qualification conditions, etc. These trees are also
read-only to the executor, but the executor state for expression evaluation
@@ -286,6 +310,10 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ [ ExecutorDoInitialPruning ] --- an optional step to perform initial
+ partition pruning on the plan tree, the result of which is passed
+ to the executor via QueryDesc
+
CreateQueryDesc
ExecutorStart
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ef2fd46092..05cc99df8f 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -49,11 +49,13 @@
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/execdebug.h"
+#include "executor/execPartition.h"
#include "executor/nodeSubplan.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/nodeFuncs.h"
#include "parser/parsetree.h"
#include "storage/bufmgr.h"
#include "storage/lmgr.h"
@@ -104,6 +106,47 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
/* end of local decls */
+/* ----------------------------------------------------------------
+ * ExecutorDoInitialPruning
+ *
+ * Performs initial partition pruning to figure out the minimal set of
+ * subplans to be executed and the set of RT indexes of the corresponding
+ * leaf partitions
+ *
+ * Returned PartitionPruneResult must be subsequently passed to the executor
+ * so that it can reuse the result of pruning. It's important that the
+ * executor has the same view of which partitions are initially pruned (by
+ * not doing the pruning again itself), or otherwise it risks initializing
+ * subplans whose partitions would not have been locked.
+ *
+ * Note: Partitioned tables mentioned in PartitionedRelPruneInfo nodes that
+ * drive the pruning will be locked before doing the pruning.
+ */
+PartitionPruneResult *
+ExecutorDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params)
+{
+ PartitionPruneResult *result;
+ ListCell *lc;
+
+ /* Only get here if there is any pruning to do. */
+ Assert(plannedstmt->containsInitialPruning);
+
+ result = makeNode(PartitionPruneResult);
+ foreach(lc, plannedstmt->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ Bitmapset *valid_subplan_offs;
+
+ valid_subplan_offs =
+ ExecPartitionDoInitialPruning(plannedstmt, params, pruneinfo,
+ &result->scan_leafpart_rtis);
+ result->valid_subplan_offs_list =
+ lappend(result->valid_subplan_offs_list,
+ valid_subplan_offs);
+ }
+
+ return result;
+}
/* ----------------------------------------------------------------
* ExecutorStart
@@ -806,6 +849,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ PartitionPruneResult *part_prune_result = queryDesc->part_prune_result;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -825,6 +869,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable);
estate->es_plannedstmt = plannedstmt;
+ estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ estate->es_part_prune_result = part_prune_result;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 9a0d5d59ef..805f86c503 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -66,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITIONPRUNERESULT UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -182,7 +183,9 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->containsInitialPruning = false;
pstmt->planTree = plan;
+ pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
pstmt->resultRelations = NIL;
pstmt->appendRelations = NIL;
@@ -596,12 +599,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_result_data;
+ char *part_prune_result_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_result_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -630,6 +636,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_result_data = nodeToString(estate->es_part_prune_result);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -656,6 +663,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized PartitionPruneResult. */
+ part_prune_result_len = strlen(part_prune_result_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_result_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -750,6 +762,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized PartitionPruneResult */
+ part_prune_result_space = shm_toc_allocate(pcxt->toc, part_prune_result_len);
+ memcpy(part_prune_result_space, part_prune_result_data, part_prune_result_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITIONPRUNERESULT,
+ part_prune_result_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1231,8 +1249,10 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_result_space;
char *paramspace;
PlannedStmt *pstmt;
+ PartitionPruneResult *part_prune_result;
ParamListInfo paramLI;
char *queryString;
@@ -1243,12 +1263,18 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
pstmtspace = shm_toc_lookup(toc, PARALLEL_KEY_PLANNEDSTMT, false);
pstmt = (PlannedStmt *) stringToNode(pstmtspace);
+ /* Reconstruct leader-supplied PartitionPruneResult. */
+ part_prune_result_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITIONPRUNERESULT, false);
+ part_prune_result = (PartitionPruneResult *)
+ stringToNode(part_prune_result_space);
+
/* Reconstruct ParamListInfo. */
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
/* Create a QueryDesc for the query. */
- return CreateQueryDesc(pstmt,
+ return CreateQueryDesc(pstmt, part_prune_result,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 615bd80973..3037742b8d 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -25,6 +25,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -185,7 +186,11 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *pruneinfo);
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -198,7 +203,8 @@ static void PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans);
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis);
/*
@@ -1587,8 +1593,10 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* considered to be a stable expression, it can change value from one plan
* node scan to the next during query execution. Stable comparison
* expressions that don't involve such Params allow partition pruning to be
- * done once during executor startup. Expressions that do involve such Params
- * require us to prune separately for each scan of the parent plan node.
+ * done once during executor startup or during ExecutorDoInitialPruning() that
+ * runs as part of performing AcquireExecutorLocks() on a given plan tree.
+ * Expressions that do involve such Params require us to prune separately for
+ * each scan of the parent plan node.
*
* Note that pruning away unneeded subplans during executor startup has the
* added benefit of not having to initialize the unneeded subplans at all.
@@ -1605,6 +1613,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* account for initial pruning possibly having eliminated some of the
* subplans.
*
+ * ExecPartitionDoInitialPruning:
+ * Do initial pruning with the information contained in a given
+ * PartitionPruneInfo to determine the minimal set of child subplans
+ * of the parent plan node to which the PartitionPruneInfo belongs that
+ * must be executed, and also the set of RT indexes of the leaf partitions
+ * that will be scanned by those subplans.
+ *
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating the expressions
* that are safe to evaluate at a given point. This function is first
@@ -1622,8 +1637,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* On return, *initially_valid_subplans is assigned the set of indexes of
* child subplans that must be initialized along with the parent plan node.
- * Initial pruning is performed here if needed and in that case only the
- * surviving subplans' indexes are added.
+ * Initial pruning is performed here if needed (unless it has already been done
+ * by ExecutorDoInitialPruning()), and in that case only the surviving subplans'
+ * indexes are added.
*
* If subplans are indeed pruned, subplan_map arrays contained in the returned
* PartitionPruneState are re-sequenced to not count those, though only if the
@@ -1632,23 +1648,59 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
PartitionPruneState *
ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
Bitmapset **initially_valid_subplans)
{
- PartitionPruneState *prunestate;
EState *estate = planstate->state;
+ PartitionPruneInfo *pruneinfo = list_nth(estate->es_part_prune_infos,
+ part_prune_index);
+ PartitionPruneResult *pruneresult = estate->es_part_prune_result;
+ PartitionPruneState *prunestate;
+ bool do_pruning = (pruneinfo->needs_init_pruning ||
+ pruneinfo->needs_exec_pruning);
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, planstate);
+ /*
+ * No need to do initial pruning if it was done already by
+ * ExecutorDoInitialPruning(), which it would be if es_part_prune_result
+ * has been set.
+ */
+ if (pruneresult)
+ do_pruning = pruneinfo->needs_exec_pruning;
+
+ prunestate = NULL;
+ if (do_pruning)
+ {
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
- /* Create the working data structure for pruning */
- prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ /* For data reading, executor always omits detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
+ /*
+ * Create the working data structure for pruning. No need to consider
+ * initial pruning steps if we have a PartitionPruneResult.
+ */
+ prunestate = CreatePartitionPruneState(planstate, pruneinfo,
+ pruneresult == NULL, true,
+ NIL, planstate->ps_ExprContext,
+ estate->es_partition_directory);
+ }
/*
* Perform an initial partition prune pass, if required.
*/
- if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ if (pruneresult)
+ {
+ *initially_valid_subplans =
+ list_nth(pruneresult->valid_subplan_offs_list, part_prune_index);
+ }
+ else if (prunestate && prunestate->do_initial_prune)
+ {
+ *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true,
+ NULL);
+ }
else
{
/* No pruning, so we'll need to initialize all subplans */
@@ -1669,7 +1721,7 @@ ExecInitPartitionPruning(PlanState *planstate,
* leaves invalid data in prunestate, because that data won't be
* consulted again (cf initial Assert in ExecFindMatchingSubPlans).
*/
- if (prunestate->do_exec_prune)
+ if (prunestate && prunestate->do_exec_prune)
PartitionPruneFixSubPlanMap(prunestate,
*initially_valid_subplans,
n_total_subplans);
@@ -1678,11 +1730,72 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecPartitionDoInitialPruning
+ * Perform initial pruning using the given PartitionPruneInfo to determine
+ * the minimal set of child subplans of the parent plan node to which the
+ * PartitionPruneInfo belongs that must be executed, and also the set of RT
+ * indexes of the leaf partitions that will be scanned by those subplans.
+ */
+Bitmapset *
+ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *rtable = plannedstmt->rtable;
+ ExprContext *econtext;
+ PartitionDirectory pdir;
+ MemoryContext oldcontext,
+ tmpcontext;
+ PartitionPruneState *prunestate;
+ Bitmapset *valid_subplan_offs;
+
+ /*
+ * A temporary context to allocate stuff needed to run the pruning steps.
+ */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "initial pruning working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
+
+ /*
+ * PartitionDirectory to look up partition descriptors, which omits
+ * detached partitions, just like in the executor proper.
+ */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+
+ /*
+ * We don't yet have a PlanState for the parent plan node, so must create
+ * a standalone ExprContext to evaluate pruning expressions, equipped with
+ * the information about the EXTERN parameters that the caller passed us.
+ * Note that that's okay because the initial pruning steps do not contain
+ * anything that requires the execution to have started.
+ */
+ econtext = CreateStandaloneExprContext();
+ econtext->ecxt_param_list_info = params;
+ prunestate = CreatePartitionPruneState(NULL, pruneinfo, true, false,
+ rtable, econtext, pdir);
+ MemoryContextSwitchTo(oldcontext);
+
+ /* Do the initial pruning. */
+ valid_subplan_offs = ExecFindMatchingSubPlans(prunestate, true,
+ scan_leafpart_rtis);
+
+ FreeExprContext(econtext, true);
+ DestroyPartitionDirectory(pdir);
+ MemoryContextDelete(tmpcontext);
+
+ return valid_subplan_offs;
+}
+
/*
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
*
- * 'planstate' is the parent plan node's execution state.
+ * 'planstate', if not NULL, is the parent plan node's execution state. It
+ * can be NULL if being called before ExecutorStart(), in which case,
+ * 'rtable' (range table), 'econtext', and 'partdir' must be explicitly
+ * provided.
*
* 'pruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
@@ -1696,19 +1809,21 @@ ExecInitPartitionPruning(PlanState *planstate,
* PartitionedRelPruneInfo.
*/
static PartitionPruneState *
-CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
+CreatePartitionPruneState(PlanState *planstate,
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir)
{
- EState *estate = planstate->state;
+ EState *estate = planstate ? planstate->state : NULL;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
- /* For data reading, executor always omits detached partitions */
- if (estate->es_partition_directory == NULL)
- estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
+ Assert((estate != NULL) ||
+ (partdir != NULL && econtext != NULL && rtable != NIL));
n_part_hierarchies = list_length(pruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
@@ -1759,19 +1874,48 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
PartitionedRelPruneInfo *pinfo = lfirst_node(PartitionedRelPruneInfo, lc2);
PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
Relation partrel;
+ bool close_partrel = false;
PartitionDesc partdesc;
PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * Must open the relation by ourselves when called before the
+ * execution has started, such as, when called during
+ * ExecutorDoInitialPruning() on a cached plan. In that case,
+ * sub-partitions must be locked, because AcquirePlannerLocks()
+ * would not have seen them. (1st relation in a partrelpruneinfos
+ * list is always the root partitioned table appearing in the
+ * query, which AcquirePlannerLocks() would have locked; the
+ * Assert in relation_open() guards that assumption.)
+ */
+ if (estate == NULL)
+ {
+ RangeTblEntry *rte = rt_fetch(pinfo->rtindex, rtable);
+ int lockmode = (j == 0) ? NoLock : rte->rellockmode;
+
+ partrel = table_open(rte->relid, lockmode);
+ close_partrel = true;
+ }
+ else
+ partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+
+ /*
+ * We can rely on the copy of the partitioned table's partition
+ * key in its relcache entry, because it can't change (or get
+ * destroyed) as long as the relation is locked. The partition
+ * descriptor is taken from the PartitionDirectory associated with
+ * the table, which is held open long enough for the descriptor to
+ * remain valid while it's used to perform the pruning steps.
*/
- partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
partkey = RelationGetPartitionKey(partrel);
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partdesc = PartitionDirectoryLookup(partdir, partrel);
+
+ /*
+ * Must close partrel, keeping the lock taken, if we're not using
+ * EState's entry.
+ */
+ if (close_partrel)
+ table_close(partrel, NoLock);
/*
* Initialize the subplan_map and subpart_map.
@@ -1785,6 +1929,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
Assert(partdesc->nparts >= pinfo->nparts);
pprune->nparts = partdesc->nparts;
pprune->subplan_map = palloc(sizeof(int) * partdesc->nparts);
+ pprune->rti_map = palloc(sizeof(Index) * partdesc->nparts);
if (partdesc->nparts == pinfo->nparts)
{
/*
@@ -1795,6 +1940,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pprune->subpart_map = pinfo->subpart_map;
memcpy(pprune->subplan_map, pinfo->subplan_map,
sizeof(int) * pinfo->nparts);
+ memcpy(pprune->rti_map, pinfo->rti_map,
+ sizeof(int) * pinfo->nparts);
/*
* Double-check that the list of unpruned relations has not
@@ -1845,6 +1992,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pinfo->subplan_map[pd_idx];
pprune->subpart_map[pp_idx] =
pinfo->subpart_map[pd_idx];
+ pprune->rti_map[pp_idx] =
+ pinfo->rti_map[pd_idx];
pd_idx++;
}
else
@@ -1852,6 +2001,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
/* this partdesc entry is not in the plan */
pprune->subplan_map[pp_idx] = -1;
pprune->subpart_map[pp_idx] = -1;
+ pprune->rti_map[pp_idx] = 0;
}
}
@@ -1873,7 +2023,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* Initialize pruning contexts as needed.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
- if (pinfo->initial_pruning_steps)
+ if (consider_initial_steps && pinfo->initial_pruning_steps)
{
InitPartitionPruneContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
@@ -1883,7 +2033,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps)
+ if (consider_exec_steps && pinfo->exec_pruning_steps)
{
InitPartitionPruneContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
@@ -2111,10 +2261,14 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
* runtime pruning.
+ *
+ * RT indexes of leaf partitions scanned by the chosen subplans are added to
+ * *scan_leafpart_rtis if the pointer is non-NULL.
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2149,7 +2303,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
*/
pprune = &prunedata->partrelprunedata[0];
find_matching_subplans_recurse(prunedata, pprune, initial_prune,
- &result);
+ &result, scan_leafpart_rtis);
/* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
@@ -2163,6 +2317,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_copy(*scan_leafpart_rtis);
MemoryContextReset(prunestate->prune_context);
@@ -2173,13 +2329,15 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* find_matching_subplans_recurse
* Recursive worker function for ExecFindMatchingSubPlans
*
- * Adds valid (non-prunable) subplan IDs to *validsubplans
+ * Adds valid (non-prunable) subplan IDs to *validsubplans and RT indexes of
+ * the corresponding leaf partitions to *scan_leafpart_rtis (if asked for).
*/
static void
find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans)
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *partset;
int i;
@@ -2206,8 +2364,13 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
while ((i = bms_next_member(partset, i)) >= 0)
{
if (pprune->subplan_map[i] >= 0)
+ {
*validsubplans = bms_add_member(*validsubplans,
pprune->subplan_map[i]);
+ if (scan_leafpart_rtis && pprune->rti_map[i] > 0)
+ *scan_leafpart_rtis = bms_add_member(*scan_leafpart_rtis,
+ pprune->rti_map[i]);
+ }
else
{
int partidx = pprune->subpart_map[i];
@@ -2215,7 +2378,8 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
if (partidx >= 0)
find_matching_subplans_recurse(prunedata,
&prunedata->partrelprunedata[partidx],
- initial_prune, validsubplans);
+ initial_prune, validsubplans,
+ scan_leafpart_rtis);
else
{
/*
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9df1f81ea8..639145abe9 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -119,6 +119,7 @@ CreateExecutorState(void)
estate->es_relations = NULL;
estate->es_rowmarks = NULL;
estate->es_plannedstmt = NULL;
+ estate->es_part_prune_result = NULL;
estate->es_junkFilter = NULL;
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index f9460ae506..a2182a6b1f 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -844,7 +844,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
- es->qd = CreateQueryDesc(es->stmt,
+ es->qd = CreateQueryDesc(es->stmt, NULL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 357e10a1d7..09f26658e2 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -94,6 +94,7 @@ static bool ExecAppendAsyncRequest(AppendState *node, TupleTableSlot **result);
static void ExecAppendAsyncEventWait(AppendState *node);
static void classify_matching_subplans(AppendState *node);
+
/* ----------------------------------------------------------------
* ExecInitAppend
*
@@ -134,7 +135,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_begun = false;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -145,7 +146,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&appendstate->ps,
list_length(node->appendplans),
- node->part_prune_info,
+ node->part_prune_index,
&validsubplans);
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
@@ -155,7 +156,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (appendstate->as_prune_state == NULL ||
+ (!appendstate->as_prune_state->do_exec_prune && nplans > 0))
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -577,7 +579,7 @@ choose_next_subplan_locally(AppendState *node)
}
else if (node->as_valid_subplans == NULL)
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
whichplan = -1;
}
@@ -642,7 +644,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
/*
* Mark each invalid plan as finished to allow the loop below to
@@ -717,7 +719,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
mark_invalid_subplans_as_finished(node);
}
@@ -868,7 +870,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
classify_matching_subplans(node);
}
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index ecf9052e03..7708cfffda 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -82,7 +82,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.ExecProcNode = ExecMergeAppend;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -93,7 +93,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&mergestate->ps,
list_length(node->mergeplans),
- node->part_prune_info,
+ node->part_prune_index,
&validsubplans);
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
@@ -103,7 +103,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (mergestate->ms_prune_state == NULL ||
+ (!mergestate->ms_prune_state->do_exec_prune && nplans > 0))
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -218,7 +219,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 042a5f8b0a..d2ea2a8914 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1578,6 +1578,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
CachedPlanSource *plansource;
CachedPlan *cplan;
List *stmt_list;
+ List *part_prune_result_list;
char *query_string;
Snapshot snapshot;
MemoryContext oldcontext;
@@ -1659,6 +1660,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
/* Replan if needed, and increment plan refcount for portal */
cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
stmt_list = cplan->stmt_list;
+ part_prune_result_list = cplan->part_prune_result_list;
if (!plan->saved)
{
@@ -1670,6 +1672,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
oldcontext = MemoryContextSwitchTo(portal->portalContext);
stmt_list = copyObject(stmt_list);
+ part_prune_result_list = copyObject(part_prune_result_list);
MemoryContextSwitchTo(oldcontext);
ReleaseCachedPlan(cplan, NULL);
cplan = NULL; /* portal shouldn't depend on cplan */
@@ -1683,6 +1686,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
+ part_prune_result_list,
cplan);
/*
@@ -2473,7 +2477,9 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
- ListCell *lc2;
+ List *part_prune_result_list;
+ ListCell *lc2,
+ *lc3;
spicallbackarg.query = plansource->query_string;
@@ -2552,6 +2558,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
plan_owner, _SPI_current->queryEnv);
stmt_list = cplan->stmt_list;
+ part_prune_result_list = cplan->part_prune_result_list;
/*
* If we weren't given a specific snapshot to use, and the statement
@@ -2589,9 +2596,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, part_prune_result_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ PartitionPruneResult *part_prune_result = lfirst_node(PartitionPruneResult, lc3);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2663,7 +2671,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
else
snap = InvalidSnapshot;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, part_prune_result,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index d5760b1006..d2d86c9841 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -96,7 +96,10 @@ _copyPlannedStmt(const PlannedStmt *from)
COPY_SCALAR_FIELD(parallelModeNeeded);
COPY_SCALAR_FIELD(jitFlags);
COPY_NODE_FIELD(planTree);
+ COPY_NODE_FIELD(partPruneInfos);
+ COPY_SCALAR_FIELD(containsInitialPruning);
COPY_NODE_FIELD(rtable);
+ COPY_BITMAPSET_FIELD(minLockRelids);
COPY_NODE_FIELD(resultRelations);
COPY_NODE_FIELD(appendRelations);
COPY_NODE_FIELD(subplans);
@@ -253,7 +256,7 @@ _copyAppend(const Append *from)
COPY_NODE_FIELD(appendplans);
COPY_SCALAR_FIELD(nasyncplans);
COPY_SCALAR_FIELD(first_partial_plan);
- COPY_NODE_FIELD(part_prune_info);
+ COPY_SCALAR_FIELD(part_prune_index);
return newnode;
}
@@ -281,7 +284,7 @@ _copyMergeAppend(const MergeAppend *from)
COPY_POINTER_FIELD(sortOperators, from->numCols * sizeof(Oid));
COPY_POINTER_FIELD(collations, from->numCols * sizeof(Oid));
COPY_POINTER_FIELD(nullsFirst, from->numCols * sizeof(bool));
- COPY_NODE_FIELD(part_prune_info);
+ COPY_SCALAR_FIELD(part_prune_index);
return newnode;
}
@@ -1279,6 +1282,8 @@ _copyPartitionPruneInfo(const PartitionPruneInfo *from)
PartitionPruneInfo *newnode = makeNode(PartitionPruneInfo);
COPY_NODE_FIELD(prune_infos);
+ COPY_SCALAR_FIELD(needs_init_pruning);
+ COPY_SCALAR_FIELD(needs_exec_pruning);
COPY_BITMAPSET_FIELD(other_subplans);
return newnode;
@@ -1295,6 +1300,7 @@ _copyPartitionedRelPruneInfo(const PartitionedRelPruneInfo *from)
COPY_POINTER_FIELD(subplan_map, from->nparts * sizeof(int));
COPY_POINTER_FIELD(subpart_map, from->nparts * sizeof(int));
COPY_POINTER_FIELD(relid_map, from->nparts * sizeof(Oid));
+ COPY_POINTER_FIELD(rti_map, from->nparts * sizeof(Index));
COPY_NODE_FIELD(initial_pruning_steps);
COPY_NODE_FIELD(exec_pruning_steps);
COPY_BITMAPSET_FIELD(execparamids);
@@ -5468,6 +5474,21 @@ _copyExtensibleNode(const ExtensibleNode *from)
return newnode;
}
+/* ****************************************************************
+ * execnodes.h copy functions
+ * ****************************************************************
+ */
+static PartitionPruneResult *
+_copyPartitionPruneResult(const PartitionPruneResult *from)
+{
+ PartitionPruneResult *newnode = makeNode(PartitionPruneResult);
+
+ COPY_BITMAPSET_FIELD(scan_leafpart_rtis);
+ COPY_NODE_FIELD(valid_subplan_offs_list);
+
+ return newnode;
+}
+
/* ****************************************************************
* value.h copy functions
* ****************************************************************
@@ -5522,7 +5543,6 @@ _copyBitString(const BitString *from)
return newnode;
}
-
static ForeignKeyCacheInfo *
_copyForeignKeyCacheInfo(const ForeignKeyCacheInfo *from)
{
@@ -6564,6 +6584,13 @@ copyObjectImpl(const void *from)
retval = _copyPublicationTable(from);
break;
+ /*
+ * EXECUTION NODES
+ */
+ case T_PartitionPruneResult:
+ retval = _copyPartitionPruneResult(from);
+ break;
+
/*
* MISCELLANEOUS NODES
*/
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index abb1f787ef..96d305102d 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -314,7 +314,10 @@ _outPlannedStmt(StringInfo str, const PlannedStmt *node)
WRITE_BOOL_FIELD(parallelModeNeeded);
WRITE_INT_FIELD(jitFlags);
WRITE_NODE_FIELD(planTree);
+ WRITE_NODE_FIELD(partPruneInfos);
+ WRITE_BOOL_FIELD(containsInitialPruning);
WRITE_NODE_FIELD(rtable);
+ WRITE_BITMAPSET_FIELD(minLockRelids);
WRITE_NODE_FIELD(resultRelations);
WRITE_NODE_FIELD(appendRelations);
WRITE_NODE_FIELD(subplans);
@@ -443,7 +446,7 @@ _outAppend(StringInfo str, const Append *node)
WRITE_NODE_FIELD(appendplans);
WRITE_INT_FIELD(nasyncplans);
WRITE_INT_FIELD(first_partial_plan);
- WRITE_NODE_FIELD(part_prune_info);
+ WRITE_INT_FIELD(part_prune_index);
}
static void
@@ -460,7 +463,7 @@ _outMergeAppend(StringInfo str, const MergeAppend *node)
WRITE_OID_ARRAY(sortOperators, node->numCols);
WRITE_OID_ARRAY(collations, node->numCols);
WRITE_BOOL_ARRAY(nullsFirst, node->numCols);
- WRITE_NODE_FIELD(part_prune_info);
+ WRITE_INT_FIELD(part_prune_index);
}
static void
@@ -1005,6 +1008,8 @@ _outPartitionPruneInfo(StringInfo str, const PartitionPruneInfo *node)
WRITE_NODE_TYPE("PARTITIONPRUNEINFO");
WRITE_NODE_FIELD(prune_infos);
+ WRITE_BOOL_FIELD(needs_init_pruning);
+ WRITE_BOOL_FIELD(needs_exec_pruning);
WRITE_BITMAPSET_FIELD(other_subplans);
}
@@ -1019,6 +1024,7 @@ _outPartitionedRelPruneInfo(StringInfo str, const PartitionedRelPruneInfo *node)
WRITE_INT_ARRAY(subplan_map, node->nparts);
WRITE_INT_ARRAY(subpart_map, node->nparts);
WRITE_OID_ARRAY(relid_map, node->nparts);
+ WRITE_INDEX_ARRAY(rti_map, node->nparts);
WRITE_NODE_FIELD(initial_pruning_steps);
WRITE_NODE_FIELD(exec_pruning_steps);
WRITE_BITMAPSET_FIELD(execparamids);
@@ -2419,6 +2425,9 @@ _outPlannerGlobal(StringInfo str, const PlannerGlobal *node)
WRITE_NODE_FIELD(finalrowmarks);
WRITE_NODE_FIELD(resultRelations);
WRITE_NODE_FIELD(appendRelations);
+ WRITE_NODE_FIELD(partPruneInfos);
+ WRITE_BOOL_FIELD(containsInitialPruning);
+ WRITE_BITMAPSET_FIELD(minLockRelids);
WRITE_NODE_FIELD(relationOids);
WRITE_NODE_FIELD(invalItems);
WRITE_NODE_FIELD(paramExecTypes);
@@ -2486,6 +2495,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_BITMAPSET_FIELD(curOuterRels);
WRITE_NODE_FIELD(curOuterParams);
WRITE_BOOL_FIELD(partColsUpdated);
+ WRITE_NODE_FIELD(partPruneInfos);
}
static void
@@ -2839,6 +2849,21 @@ _outExtensibleNode(StringInfo str, const ExtensibleNode *node)
methods->nodeOut(str, node);
}
+/*****************************************************************************
+ *
+ * Stuff from execnodes.h
+ *
+ *****************************************************************************/
+
+static void
+_outPartitionPruneResult(StringInfo str, const PartitionPruneResult *node)
+{
+ WRITE_NODE_TYPE("PARTITIONPRUNERESULT");
+
+ WRITE_BITMAPSET_FIELD(scan_leafpart_rtis);
+ WRITE_NODE_FIELD(valid_subplan_offs_list);
+}
+
/*****************************************************************************
*
* Stuff from parsenodes.h.
@@ -4747,6 +4772,13 @@ outNode(StringInfo str, const void *obj)
_outJsonTableSibling(str, obj);
break;
+ /*
+ * EXECUTION NODES
+ */
+ case T_PartitionPruneResult:
+ _outPartitionPruneResult(str, obj);
+ break;
+
default:
/*
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index e7d008b2c5..677ec055d6 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -164,6 +164,11 @@
token = pg_strtok(&length); /* skip :fldname */ \
local_node->fldname = readIntCols(len)
+/* Read an Index array */
+#define READ_INDEX_ARRAY(fldname, len) \
+ token = pg_strtok(&length); /* skip :fldname */ \
+ local_node->fldname = readIndexCols(len)
+
/* Read a bool array */
#define READ_BOOL_ARRAY(fldname, len) \
token = pg_strtok(&length); /* skip :fldname */ \
@@ -1814,7 +1819,10 @@ _readPlannedStmt(void)
READ_BOOL_FIELD(parallelModeNeeded);
READ_INT_FIELD(jitFlags);
READ_NODE_FIELD(planTree);
+ READ_NODE_FIELD(partPruneInfos);
+ READ_BOOL_FIELD(containsInitialPruning);
READ_NODE_FIELD(rtable);
+ READ_BITMAPSET_FIELD(minLockRelids);
READ_NODE_FIELD(resultRelations);
READ_NODE_FIELD(appendRelations);
READ_NODE_FIELD(subplans);
@@ -1946,7 +1954,7 @@ _readAppend(void)
READ_NODE_FIELD(appendplans);
READ_INT_FIELD(nasyncplans);
READ_INT_FIELD(first_partial_plan);
- READ_NODE_FIELD(part_prune_info);
+ READ_INT_FIELD(part_prune_index);
READ_DONE();
}
@@ -1968,7 +1976,7 @@ _readMergeAppend(void)
READ_OID_ARRAY(sortOperators, local_node->numCols);
READ_OID_ARRAY(collations, local_node->numCols);
READ_BOOL_ARRAY(nullsFirst, local_node->numCols);
- READ_NODE_FIELD(part_prune_info);
+ READ_INT_FIELD(part_prune_index);
READ_DONE();
}
@@ -2762,6 +2770,8 @@ _readPartitionPruneInfo(void)
READ_LOCALS(PartitionPruneInfo);
READ_NODE_FIELD(prune_infos);
+ READ_BOOL_FIELD(needs_init_pruning);
+ READ_BOOL_FIELD(needs_exec_pruning);
READ_BITMAPSET_FIELD(other_subplans);
READ_DONE();
@@ -2778,6 +2788,7 @@ _readPartitionedRelPruneInfo(void)
READ_INT_ARRAY(subplan_map, local_node->nparts);
READ_INT_ARRAY(subpart_map, local_node->nparts);
READ_OID_ARRAY(relid_map, local_node->nparts);
+ READ_INDEX_ARRAY(rti_map, local_node->nparts);
READ_NODE_FIELD(initial_pruning_steps);
READ_NODE_FIELD(exec_pruning_steps);
READ_BITMAPSET_FIELD(execparamids);
@@ -2931,6 +2942,21 @@ _readPartitionRangeDatum(void)
READ_DONE();
}
+
+/*
+ * _readPartitionPruneResult
+ */
+static PartitionPruneResult *
+_readPartitionPruneResult(void)
+{
+ READ_LOCALS(PartitionPruneResult);
+
+ READ_BITMAPSET_FIELD(scan_leafpart_rtis);
+ READ_NODE_FIELD(valid_subplan_offs_list);
+
+ READ_DONE();
+}
+
/*
* parseNodeString
*
@@ -3228,6 +3254,8 @@ parseNodeString(void)
return_value = _readJsonTableParent();
else if (MATCH("JSONTABSNODE", 12))
return_value = _readJsonTableSibling();
+ else if (MATCH("PARTITIONPRUNERESULT", 20))
+ return_value = _readPartitionPruneResult();
else
{
elog(ERROR, "badly formatted node string \"%.32s\"...", token);
@@ -3371,6 +3399,30 @@ readIntCols(int numCols)
return int_vals;
}
+/*
+ * readIndexCols
+ */
+Index *
+readIndexCols(int numCols)
+{
+ int tokenLength,
+ i;
+ const char *token;
+ Index *index_vals;
+
+ if (numCols <= 0)
+ return NULL;
+
+ index_vals = (Index *) palloc(numCols * sizeof(Index));
+ for (i = 0; i < numCols; i++)
+ {
+ token = pg_strtok(&tokenLength);
+ index_vals[i] = atoui(token);
+ }
+
+ return index_vals;
+}
+
/*
* readBoolCols
*/
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 179c87c671..2f9260abed 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1336,7 +1336,15 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
plan->appendplans = subplans;
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
- plan->part_prune_info = partpruneinfo;
+
+ if (partpruneinfo)
+ {
+ root->partPruneInfos = lappend(root->partPruneInfos, partpruneinfo);
+ /* Will be updated later in set_plan_references(). */
+ plan->part_prune_index = list_length(root->partPruneInfos) - 1;
+ }
+ else
+ plan->part_prune_index = -1;
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -1498,7 +1506,15 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
}
node->mergeplans = subplans;
- node->part_prune_info = partpruneinfo;
+
+ if (partpruneinfo)
+ {
+ root->partPruneInfos = lappend(root->partPruneInfos, partpruneinfo);
+ /* Will be updated later in set_plan_references(). */
+ node->part_prune_index = list_length(root->partPruneInfos) - 1;
+ }
+ else
+ node->part_prune_index = -1;
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index b2569c5d0c..2aa051d862 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -518,7 +518,10 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->dependsOnRole = glob->dependsOnRole;
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
+ result->partPruneInfos = glob->partPruneInfos;
+ result->containsInitialPruning = glob->containsInitialPruning;
result->rtable = glob->finalrtable;
+ result->minLockRelids = glob->minLockRelids;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index bf4c722c02..8d9ab2c74d 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -252,7 +252,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
Plan *result;
PlannerGlobal *glob = root->glob;
int rtoffset = list_length(glob->finalrtable);
- ListCell *lc;
+ ListCell *lc;
/*
* Add all the query's RTEs to the flattened rangetable. The live ones
@@ -261,6 +261,16 @@ set_plan_references(PlannerInfo *root, Plan *plan)
*/
add_rtes_to_flat_rtable(root, false);
+ /*
+ * Add the query's adjusted range of RT indexes to glob->minLockRelids.
+ * The adjusted RT indexes of prunable relations will be deleted from the
+ * set below where PartitionPruneInfos are processed.
+ */
+ glob->minLockRelids =
+ bms_add_range(glob->minLockRelids,
+ rtoffset + 1,
+ rtoffset + list_length(root->parse->rtable));
+
/*
* Adjust RT indexes of PlanRowMarks and add to final rowmarks list
*/
@@ -339,6 +349,56 @@ set_plan_references(PlannerInfo *root, Plan *plan)
}
}
+ /* Also fix up the information in PartitionPruneInfos. */
+ foreach (lc, root->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ Bitmapset *leafpart_rtis = NULL;
+ ListCell *l;
+
+ foreach(l, pruneinfo->prune_infos)
+ {
+ List *prune_infos = lfirst(l);
+ ListCell *l2;
+
+ foreach(l2, prune_infos)
+ {
+ PartitionedRelPruneInfo *pinfo = lfirst(l2);
+ int i;
+
+ /* RT index of the partitioned table. */
+ pinfo->rtindex += rtoffset;
+
+ /* And also those of the leaf partitions. */
+ for (i = 0; i < pinfo->nparts; i++)
+ {
+ if (pinfo->rti_map[i] > 0)
+ {
+ pinfo->rti_map[i] += rtoffset;
+ leafpart_rtis = bms_add_member(leafpart_rtis,
+ pinfo->rti_map[i]);
+ }
+ }
+ }
+ }
+
+ if (pruneinfo->needs_init_pruning)
+ {
+ glob->containsInitialPruning = true;
+
+ /*
+ * Delete the leaf partition RTIs from the global set of relations
+ * to be locked before executing the plan. AcquireExecutorLocks()
+ * will find the ones to add to the set after performing initial
+ * pruning.
+ */
+ glob->minLockRelids = bms_del_members(glob->minLockRelids,
+ leafpart_rtis);
+ }
+
+ glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
+ }
+
return result;
}
@@ -1596,21 +1656,12 @@ set_append_references(PlannerInfo *root,
aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
- if (aplan->part_prune_info)
- {
- foreach(l, aplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * PartitionPruneInfos will be added to a list in PlannerGlobal, so update
+ * the index.
+ */
+ if (aplan->part_prune_index >= 0)
+ aplan->part_prune_index += list_length(root->glob->partPruneInfos);
/* We don't need to recurse to lefttree or righttree ... */
Assert(aplan->plan.lefttree == NULL);
@@ -1668,21 +1719,12 @@ set_mergeappend_references(PlannerInfo *root,
mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
- if (mplan->part_prune_info)
- {
- foreach(l, mplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * PartitionPruneInfos will be added to a list in PlannerGlobal, so update
+ * the index.
+ */
+ if (mplan->part_prune_index >= 0)
+ mplan->part_prune_index += list_length(root->glob->partPruneInfos);
/* We don't need to recurse to lefttree or righttree ... */
Assert(mplan->plan.lefttree == NULL);
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 9d3c05aed3..0eaff15ed0 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,9 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -230,6 +232,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool needs_init_pruning = false;
+ bool needs_exec_pruning = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -309,12 +313,16 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_needs_init_pruning;
+ bool partrel_needs_exec_pruning;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_needs_init_pruning,
+ &partrel_needs_exec_pruning);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -323,6 +331,10 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+ if (!needs_init_pruning)
+ needs_init_pruning = partrel_needs_init_pruning;
+ if (!needs_exec_pruning)
+ needs_exec_pruning = partrel_needs_exec_pruning;
}
pfree(relid_subplan_map);
@@ -337,6 +349,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->needs_init_pruning = needs_init_pruning;
+ pruneinfo->needs_exec_pruning = needs_exec_pruning;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -435,13 +449,18 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *needs_init_pruning and *needs_exec_pruning are set to indicate whether
+ * the returned PartitionedRelPruneInfos contain pruning steps to be
+ * performed before and after execution begins, respectively.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -452,6 +471,10 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will find out below. */
+ *needs_init_pruning = false;
+ *needs_exec_pruning = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -539,6 +562,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+ * In the first pass, we note whether the 2nd pass is necessary by
+ * checking for the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -613,6 +639,11 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ if (!*needs_init_pruning)
+ *needs_init_pruning = (initial_pruning_steps != NIL);
+ if (!*needs_exec_pruning)
+ *needs_exec_pruning = (exec_pruning_steps != NIL);
+
pinfolist = lappend(pinfolist, pinfo);
}
@@ -640,6 +671,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *subplan_map;
int *subpart_map;
Oid *relid_map;
+ Index *rti_map;
/*
* Construct the subplan and subpart maps for this partitioning level.
@@ -652,6 +684,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
+ rti_map = (Index *) palloc0(nparts * sizeof(Index));
present_parts = NULL;
i = -1;
@@ -666,6 +699,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
+ rti_map[i] = partrel->relid;
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
@@ -690,6 +724,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
pinfo->relid_map = relid_map;
+ pinfo->rti_map = rti_map;
}
pfree(relid_subpart_map);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index ba2fcfeb4a..fecffdba65 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -945,15 +945,17 @@ pg_plan_query(Query *querytree, const char *query_string, int cursorOptions,
* For normal optimizable statements, invoke the planner. For utility
* statements, just make a wrapper PlannedStmt node.
*
- * The result is a list of PlannedStmt nodes.
+ * The result is a list of PlannedStmt nodes. Also, a NULL is appended to
+ * *part_prune_result_list for each PlannedStmt added to the returned list.
*/
List *
pg_plan_queries(List *querytrees, const char *query_string, int cursorOptions,
- ParamListInfo boundParams)
+ ParamListInfo boundParams, List **part_prune_result_list)
{
List *stmt_list = NIL;
ListCell *query_list;
+ *part_prune_result_list = NIL;
foreach(query_list, querytrees)
{
Query *query = lfirst_node(Query, query_list);
@@ -977,6 +979,7 @@ pg_plan_queries(List *querytrees, const char *query_string, int cursorOptions,
}
stmt_list = lappend(stmt_list, stmt);
+ *part_prune_result_list = lappend(*part_prune_result_list, NULL);
}
return stmt_list;
@@ -1080,7 +1083,8 @@ exec_simple_query(const char *query_string)
QueryCompletion qc;
MemoryContext per_parsetree_context = NULL;
List *querytree_list,
- *plantree_list;
+ *plantree_list,
+ *plantree_part_prune_result_list;
Portal portal;
DestReceiver *receiver;
int16 format;
@@ -1167,7 +1171,8 @@ exec_simple_query(const char *query_string)
NULL, 0, NULL);
plantree_list = pg_plan_queries(querytree_list, query_string,
- CURSOR_OPT_PARALLEL_OK, NULL);
+ CURSOR_OPT_PARALLEL_OK, NULL,
+ &plantree_part_prune_result_list);
/*
* Done with the snapshot used for parsing/planning.
@@ -1203,6 +1208,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ plantree_part_prune_result_list,
NULL);
/*
@@ -1991,6 +1997,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
+ cplan->part_prune_result_list,
cplan);
/* Done with the snapshot used for parameter I/O and parsing/planning */
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5aa5a350f3..fcba303b53 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -35,7 +35,7 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
+static void ProcessQuery(PlannedStmt *plan, PartitionPruneResult *part_prune_result,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +65,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +78,8 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->part_prune_result = part_prune_result; /* ExecutorDoInitialPruning()
+ * output for plan */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +125,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * part_prune_result: ExecutorDoInitialPruning() output for the plan tree
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +138,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +150,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, part_prune_result, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -493,6 +498,7 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ linitial_node(PartitionPruneResult, portal->part_prune_results),
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1193,7 +1199,8 @@ PortalRunMulti(Portal portal,
QueryCompletion *qc)
{
bool active_snapshot_set = false;
- ListCell *stmtlist_item;
+ ListCell *stmtlist_item,
+ *part_prune_results_item;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1214,9 +1221,12 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
- foreach(stmtlist_item, portal->stmts)
+ forboth(stmtlist_item, portal->stmts,
+ part_prune_results_item, portal->part_prune_results)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ PartitionPruneResult *part_prune_result = lfirst_node(PartitionPruneResult,
+ part_prune_results_item);
/*
* If we got a cancel signal in prior command, quit
@@ -1274,7 +1284,7 @@ PortalRunMulti(Portal portal,
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_result,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1283,7 +1293,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_result,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 4cf6db504f..80564dd874 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -99,14 +99,16 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
+static void CachedPlanSavePartitionPruneResults(CachedPlan *plan, List *part_prune_result_list);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static List *AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams);
+static void ReleaseExecutorLocks(List *stmt_list, List *part_prune_result_list);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -790,9 +792,21 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* On a "true" return, we have acquired the locks needed to run the plan.
* (We must do this for the "true" result to be race-condition-free.)
+ *
+ * If the CachedPlan is valid, this may in some cases call
+ * ExecutorDoInitialPruning() on each PlannedStmt contained in it to determine
+ * the set of relations to be locked by AcquireExecutorLocks(), instead of
+ * just scanning its range table; doing so prunes away any nodes in the tree
+ * that need not be executed based on the result of initial partition pruning.
+ * The result of pruning, which consists of a List of Lists of bitmapsets of
+ * child subplan indexes allocated in a child context of the context
+ * containing the plan itself, is added to plan->part_prune_result_list. The
+ * previous contents of that list, from the last invocation on the same
+ * CachedPlan, are deleted because they would no longer be valid given the
+ * fresh set of parameter values that may be used as pruning parameters.
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams)
{
CachedPlan *plan = plansource->gplan;
@@ -820,13 +834,24 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ List *part_prune_result_list;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ /*
+ * Lock relations scanned by the plan. If ExecutorDoInitialPruning()
+ * asked to omit some relations because the plan nodes that scan them
+ * were found to be pruned, the executor will be informed of the
+ * omission of the plan nodes themselves via part_prune_result_list
+ * that is passed to it along with the list of PlannedStmts, so that
+ * it doesn't accidentally try to execute those nodes.
+ */
+ part_prune_result_list = AcquireExecutorLocks(plan->stmt_list,
+ boundParams);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -844,11 +869,14 @@ CheckCachedPlan(CachedPlanSource *plansource)
if (plan->is_valid)
{
/* Successfully revalidated and locked the query. */
+
+ /* Remember pruning results in the CachedPlan. */
+ CachedPlanSavePartitionPruneResults(plan, part_prune_result_list);
return true;
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ ReleaseExecutorLocks(plan->stmt_list, part_prune_result_list);
}
/*
@@ -880,7 +908,8 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv)
{
CachedPlan *plan;
- List *plist;
+ List *plist,
+ *part_prune_result_list;
bool snapshot_set;
bool is_transient;
MemoryContext plan_context;
@@ -933,7 +962,8 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
* Generate the plan.
*/
plist = pg_plan_queries(qlist, plansource->query_string,
- plansource->cursor_options, boundParams);
+ plansource->cursor_options, boundParams,
+ &part_prune_result_list);
/* Release snapshot if we got one */
if (snapshot_set)
@@ -1002,6 +1032,16 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
plan->is_saved = false;
plan->is_valid = true;
+ /*
+ * Save a dummy part_prune_result_list, that is, a list containing only
+ * NULL elements. We must do this because users of the CachedPlan expect
+ * one to go with the list of PlannedStmts.
+ * XXX maybe get rid of that contract.
+ */
+ plan->part_prune_result_list_context = NULL;
+ CachedPlanSavePartitionPruneResults(plan, part_prune_result_list);
+ Assert(MemoryContextIsValid(plan->part_prune_result_list_context));
+
/* assign generation number to new plan */
plan->generation = ++(plansource->generation);
@@ -1160,7 +1200,7 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (CheckCachedPlan(plansource, boundParams))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1586,6 +1626,49 @@ CopyCachedPlan(CachedPlanSource *plansource)
return newsource;
}
+/*
+ * CachedPlanSavePartitionPruneResults
+ * Save the list containing PartitionPruneResult nodes into the given
+ * CachedPlan
+ *
+ * The provided list is copied into a dedicated context that is a child of
+ * plan->context. If the child context already exists, it is emptied, because
+ * any PartitionPruneResult contained therein would no longer be useful.
+ */
+static void
+CachedPlanSavePartitionPruneResults(CachedPlan *plan, List *part_prune_result_list)
+{
+ MemoryContext part_prune_result_list_context = plan->part_prune_result_list_context,
+ oldcontext = CurrentMemoryContext;
+ List *part_prune_result_list_copy;
+
+ /*
+ * Set up the dedicated context if not already done, saving it as a child
+ * of the CachedPlan's context.
+ */
+ if (part_prune_result_list_context == NULL)
+ {
+ part_prune_result_list_context = AllocSetContextCreate(CurrentMemoryContext,
+ "CachedPlan part_prune_results list",
+ ALLOCSET_START_SMALL_SIZES);
+ MemoryContextSetParent(part_prune_result_list_context, plan->context);
+ MemoryContextSetIdentifier(part_prune_result_list_context, plan->context->ident);
+ plan->part_prune_result_list_context = part_prune_result_list_context;
+ }
+ else
+ {
+ /* Just clear existing contents by resetting the context. */
+ Assert(MemoryContextIsValid(part_prune_result_list_context));
+ MemoryContextReset(part_prune_result_list_context);
+ }
+
+ MemoryContextSwitchTo(part_prune_result_list_context);
+ part_prune_result_list_copy = copyObject(part_prune_result_list);
+ MemoryContextSwitchTo(oldcontext);
+
+ plan->part_prune_result_list = part_prune_result_list_copy;
+}
+
/*
* CachedPlanIsValid: test whether the rewritten querytree within a
* CachedPlanSource is currently valid (that is, not marked as being in need
@@ -1737,17 +1820,21 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ *
+ * Returns a list of PartitionPruneResult nodes containing one element for
+ * each PlannedStmt in stmt_list; an element is NULL if the corresponding
+ * statement is a utility statement or its containsInitialPruning is false.
*/
-static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+static List *
+AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams)
{
ListCell *lc1;
+ List *part_prune_result_list = NIL;
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ PartitionPruneResult *part_prune_result = NULL;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1761,27 +1848,122 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
if (query)
- ScanQueryForLocks(query, acquire);
- continue;
+ ScanQueryForLocks(query, true);
}
-
- foreach(lc2, plannedstmt->rtable)
+ else
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
-
- if (rte->rtekind != RTE_RELATION)
- continue;
+ Bitmapset *lockRelids;
+ int rti;
/*
- * Acquire the appropriate type of lock on each relation OID. Note
- * that we don't actually try to open the rel, and hence will not
- * fail if it's been dropped entirely --- we'll just transiently
- * acquire a non-conflicting lock.
+ * Figure out the set of relations that would need to be locked
+ * before executing the plan.
*/
- if (acquire)
+ if (plannedstmt->containsInitialPruning)
+ {
+ /*
+ * Obtain the set of partitions to be locked from the
+ * PartitionPruneInfos by considering the result of performing
+ * initial partition pruning.
+ */
+ part_prune_result =
+ ExecutorDoInitialPruning(plannedstmt, boundParams);
+
+ lockRelids = bms_union(plannedstmt->minLockRelids,
+ part_prune_result->scan_leafpart_rtis);
+ }
+ else
+ lockRelids = plannedstmt->minLockRelids;
+
+ rti = -1;
+ while ((rti = bms_next_member(lockRelids, rti)) > 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ if (rte->rtekind != RTE_RELATION)
+ continue;
+
+ /*
+ * Acquire the appropriate type of lock on each relation OID.
+ * Note that we don't actually try to open the rel, and hence
+ * will not fail if it's been dropped entirely --- we'll just
+ * transiently acquire a non-conflicting lock.
+ */
LockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
+
+ /*
+ * Remember the PartitionPruneResult so that it can later be added to
+ * the QueryDesc passed to the executor when executing this plan. It
+ * may be NULL, but the list must stay the same length as stmt_list.
+ */
+ part_prune_result_list = lappend(part_prune_result_list,
+ part_prune_result);
+ }
+
+ return part_prune_result_list;
+}
+
+/*
+ * ReleaseExecutorLocks
+ * Release locks that would've been acquired by an earlier call to
+ * AcquireExecutorLocks()
+ */
+static void
+ReleaseExecutorLocks(List *stmt_list, List *part_prune_result_list)
+{
+ ListCell *lc1,
+ *lc2;
+
+ forboth(lc1, stmt_list, lc2, part_prune_result_list)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ PartitionPruneResult *part_prune_result = lfirst_node(PartitionPruneResult, lc2);
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /*
+ * Ignore utility statements, except those (such as EXPLAIN) that
+ * contain a parsed-but-not-planned query. Note: it's okay to use
+ * ScanQueryForLocks, even though the query hasn't been through
+ * rule rewriting, because rewriting doesn't change the query
+ * representation.
+ */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ if (query)
+ ScanQueryForLocks(query, false);
+ }
+ else
+ {
+ Bitmapset *lockRelids;
+ int rti;
+
+ if (part_prune_result == NULL)
+ {
+ Assert(!plannedstmt->containsInitialPruning);
+ lockRelids = plannedstmt->minLockRelids;
+ }
else
+ {
+ Assert(plannedstmt->containsInitialPruning);
+ lockRelids = bms_union(plannedstmt->minLockRelids,
+ part_prune_result->scan_leafpart_rtis);
+ }
+
+ rti = -1;
+ while ((rti = bms_next_member(lockRelids, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ if (rte->rtekind != RTE_RELATION)
+ continue;
+
+ /* See the comment in AcquireExecutorLocks(). */
UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+
}
}
}
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index d549f66d4a..4705dc4097 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -285,6 +285,7 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *part_prune_results,
CachedPlan *cplan)
{
AssertArg(PortalIsValid(portal));
@@ -299,6 +300,7 @@ PortalDefineQuery(Portal portal,
portal->qc.nprocessed = 0;
portal->commandTag = commandTag;
portal->stmts = stmts;
+ portal->part_prune_results = part_prune_results;
portal->cplan = cplan;
portal->status = PORTAL_DEFINED;
}
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 666977fb1f..34975c69ee 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,8 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt, PartitionPruneResult *part_prune_result,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 708435e952..bd8776402e 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -45,6 +45,7 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
+ * rti_map Range table index by partition index, or 0.
* present_parts A Bitmapset of the partition indexes that we
* have subplans or subparts for.
* initial_pruning_steps List of PartitionPruneSteps used to
@@ -61,6 +62,7 @@ typedef struct PartitionedRelPruningData
int nparts;
int *subplan_map;
int *subpart_map;
+ Index *rti_map;
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
@@ -123,9 +125,13 @@ typedef struct PartitionPruneState
extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
-
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis);
+extern Bitmapset *ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index e79e2c001f..60d5644908 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,8 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ PartitionPruneResult *part_prune_result; /* ExecutorDoInitialPruning()'s
+ * output for plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +59,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 873772f188..57dc0e8077 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -185,6 +185,8 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
+extern PartitionPruneResult *ExecutorDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params);
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index cbbcff81d2..b5a7fd7e16 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -596,6 +596,8 @@ typedef struct EState
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
+ struct PartitionPruneResult *es_part_prune_result; /* QueryDesc.part_prune_result */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
@@ -984,6 +986,19 @@ typedef struct DomainConstraintState
*/
typedef TupleTableSlot *(*ExecProcNodeMtd) (struct PlanState *pstate);
+/*----------------
+ * PartitionPruneResult
+ *
+ * Result of ExecutorDoInitialPruning() invocation on a given plan.
+ */
+typedef struct PartitionPruneResult
+{
+ NodeTag type;
+
+ Bitmapset *scan_leafpart_rtis;
+ List *valid_subplan_offs_list;
+} PartitionPruneResult;
+
/* ----------------
* PlanState node
*
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 300824258e..de312b9215 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -97,6 +97,9 @@ typedef enum NodeTag
T_PartitionPruneStepCombine,
T_PlanInvalItem,
+ /* TAGS FOR EXECUTOR PREP NODES (execnodes.h) */
+ T_PartitionPruneResult,
+
/*
* TAGS FOR PLAN STATE NODES (execnodes.h)
*
@@ -673,6 +676,7 @@ extern struct Bitmapset *readBitmapset(void);
extern uintptr_t readDatum(bool typbyval);
extern bool *readBoolCols(int numCols);
extern int *readIntCols(int numCols);
+extern Index *readIndexCols(int numCols);
extern Oid *readOidCols(int numCols);
extern int16 *readAttrNumberCols(int numCols);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 6cbcb67bdf..f2039071c9 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -107,6 +107,18 @@ typedef struct PlannerGlobal
List *appendRelations; /* "flat" list of AppendRelInfos */
+ List *partPruneInfos; /* List of PartitionPruneInfo contained in
+ * the plan */
+
+ bool containsInitialPruning; /* Do any of those PartitionPruneInfos
+ * have initial (pre-exec) pruning
+ * steps in them? */
+
+ Bitmapset *minLockRelids; /* RT indexes of RTE_RELATION entries that
+ * must always be locked to execute the plan;
+ * those scanned by initial-prunable plan
+ * nodes are not included */
+
List *relationOids; /* OIDs of relations the plan depends on */
List *invalItems; /* other dependencies, as PlanInvalItems */
@@ -377,6 +389,9 @@ struct PlannerInfo
/* Does this query modify any partition key columns? */
bool partColsUpdated;
+
+ /* PartitionPruneInfos added in this query's plan. */
+ List *partPruneInfos;
};
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 50ef3dda05..0a144a1e92 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -64,8 +64,19 @@ typedef struct PlannedStmt
struct Plan *planTree; /* tree of Plan nodes */
+ List *partPruneInfos; /* List of PartitionPruneInfo contained in
+ * the plan */
+
+ bool containsInitialPruning; /* Do any of those PartitionPruneInfos
+ * have initial (pre-exec) pruning
+ * steps in them? */
+
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *minLockRelids; /* RT indexes of RTE_RELATION entries that
+ * must be locked, except those scanned by
+ * initial-prunable plan nodes */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
List *resultRelations; /* integer list of RT indexes, or NIL */
@@ -262,8 +273,12 @@ typedef struct Append
*/
int first_partial_plan;
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /*
+ * Index of this plan's PartitionPruneInfo in PlannedStmt.part_prune_infos
+ * to be used for run-time subplan pruning; -1 if run-time pruning is
+ * not needed.
+ */
+ int part_prune_index;
} Append;
/* ----------------
@@ -282,8 +297,13 @@ typedef struct MergeAppend
Oid *sortOperators; /* OIDs of operators to sort them by */
Oid *collations; /* OIDs of collations */
bool *nullsFirst; /* NULLS FIRST/LAST directions */
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+
+ /*
+ * Index of this plan's PartitionPruneInfo in PlannedStmt.part_prune_infos
+ * to be used for run-time subplan pruning; -1 if run-time pruning is
+ * not needed.
+ */
+ int part_prune_index;
} MergeAppend;
/* ----------------
@@ -1175,6 +1195,13 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * needs_init_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its initial_pruning_steps set?
+ *
+ * needs_exec_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its exec_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1183,6 +1210,9 @@ typedef struct PartitionPruneInfo
{
NodeTag type;
List *prune_infos;
+ Bitmapset *leafpart_rtis;
+ bool needs_init_pruning;
+ bool needs_exec_pruning;
Bitmapset *other_subplans;
} PartitionPruneInfo;
@@ -1213,6 +1243,7 @@ typedef struct PartitionedRelPruneInfo
int *subplan_map; /* subplan index by partition index, or -1 */
int *subpart_map; /* subpart index by partition index, or -1 */
Oid *relid_map; /* relation OID by partition index, or 0 */
+ Index *rti_map; /* range table index by partition index, or 0 */
/*
* initial_pruning_steps shows how to prune during executor startup (i.e.,
diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h
index 92291a750d..119d4a1d10 100644
--- a/src/include/tcop/tcopprot.h
+++ b/src/include/tcop/tcopprot.h
@@ -64,7 +64,7 @@ extern PlannedStmt *pg_plan_query(Query *querytree, const char *query_string,
ParamListInfo boundParams);
extern List *pg_plan_queries(List *querytrees, const char *query_string,
int cursorOptions,
- ParamListInfo boundParams);
+ ParamListInfo boundParams, List **part_prune_result_list);
extern bool check_max_stack_depth(int *newval, void **extra, GucSource source);
extern void assign_max_stack_depth(int newval, void *extra);
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 95b99e3d25..f591b9df9c 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -148,6 +148,9 @@ typedef struct CachedPlan
{
int magic; /* should equal CACHEDPLAN_MAGIC */
List *stmt_list; /* list of PlannedStmts */
+ List *part_prune_result_list; /* list of PartitionPruneResult with
+ * one element for each of stmt_list; NIL
+ * if not a generic plan */
bool is_oneshot; /* is it a "oneshot" plan? */
bool is_saved; /* is CachedPlan in a long-lived context? */
bool is_valid; /* is the stmt_list currently valid? */
@@ -158,6 +161,10 @@ typedef struct CachedPlan
int generation; /* parent's generation number for this plan */
int refcount; /* count of live references to this struct */
MemoryContext context; /* context containing this CachedPlan */
+ MemoryContext part_prune_result_list_context; /* context containing
+ * part_prune_result_list,
+ * a child of the above
+ * context */
} CachedPlan;
/*
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aeddbdafe5..c1e304f9d7 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -137,6 +137,10 @@ typedef struct PortalData
CommandTag commandTag; /* command tag for original query */
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
+ List *part_prune_results; /* list of PartitionPruneResults with one element
+ * for each of 'stmts'; same as
+ * cplan->part_prune_result_list if cplan is
+ * not NULL */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
ParamListInfo portalParams; /* params to pass to query */
@@ -241,6 +245,7 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *part_prune_results,
CachedPlan *cplan);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
--
2.24.1
On Wed, Apr 6, 2022 at 4:20 PM Amit Langote <amitlangote09@gmail.com> wrote:
And here is a version like that, which passes make check-world. Maybe
still a WIP, as I think the comments could use more editing.

Here's how the new implementation works:
AcquireExecutorLocks() calls ExecutorDoInitialPruning(), which in turn
iterates over a list of PartitionPruneInfos in a given PlannedStmt
coming from a CachedPlan. For each PartitionPruneInfo,
ExecPartitionDoInitialPruning() is called, which sets up
PartitionPruneState and performs initial pruning steps present in the
PartitionPruneInfo. The resulting bitmapsets of valid subplans, one
for each PartitionPruneInfo, are collected in a list and added to a
result node called PartitionPruneResult. It represents the result of
performing initial pruning on all PartitionPruneInfos found in a plan.
A list of PartitionPruneResults is passed along with the PlannedStmt
to the executor, which consults it when initializing Append/MergeAppend
nodes.
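To illustrate (a simplified sketch, not an actual hunk from the patch;
"append" stands for the Append node being initialized and
"part_prune_result" for the result passed through the QueryDesc):

/*
 * Both lists are indexed by the part_prune_index stored in the
 * Append/MergeAppend node, which lets the executor pair a plan node
 * with the pruning result computed at lock time.
 */
PartitionPruneInfo *pruneinfo =
    list_nth_node(PartitionPruneInfo,
                  plannedstmt->partPruneInfos,
                  append->part_prune_index);
Bitmapset  *valid_subplans = (Bitmapset *)
    list_nth(part_prune_result->valid_subplan_offs_list,
             append->part_prune_index);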
PlannedStmt.minLockRelids, defined by the planner, contains the RT
indexes of all the entries in the range table minus those of the leaf
partitions whose subplans are subject to removal due to initial
pruning. AcquireExecutorLocks() adds back the RT indexes of only those
leaf partitions whose subplans survive ExecutorDoInitialPruning(). To
get the leaf partition RT indexes from the PartitionPruneInfo, a new
rti_map array is added to PartitionedRelPruneInfo.
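In rough code, the locking side then works something like below (a
sketch only, with error handling and details simplified relative to the
actual plancache.c changes; "boundParams" is whatever EXTERN parameters
the caller supplies):

Bitmapset  *lockrelids = plannedstmt->minLockRelids;
int         rti = -1;

if (plannedstmt->containsInitialPruning)
{
    PartitionPruneResult *pruneresult =
        ExecutorDoInitialPruning(plannedstmt, boundParams);

    /* Add back the RT indexes of only the surviving leaf partitions. */
    lockrelids = bms_union(lockrelids, pruneresult->scan_leafpart_rtis);
}

while ((rti = bms_next_member(lockrelids, rti)) >= 0)
{
    RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);

    if (rte->rtekind == RTE_RELATION)
        LockRelationOid(rte->relid, rte->rellockmode);
}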
There's only one patch this time. Patches that added partitioned_rels
and plan_tree_walker() are no longer necessary.
Here's an updated version. In particular, I removed the
part_prune_results list from PortalData, in favor of having anything
that needs to look at the list get it from the CachedPlan
(PortalData.cplan). This makes things better in two ways:
* All the changes that were needed to produce the list to be passed to
PortalDefineQuery() are now unnecessary (the especially ugly ones were
those made to pg_plan_queries()'s interface)
* The cases in which the PartitionPruneResult being added to a
QueryDesc can be assumed to be valid are more clearly defined now:
they are the cases where the portal's CachedPlan is also valid, that
is, where the accompanying PlannedStmt is a cached one.
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v12-0001-Optimize-AcquireExecutorLocks-to-skip-pruned-par.patchapplication/octet-stream; name=v12-0001-Optimize-AcquireExecutorLocks-to-skip-pruned-par.patchDownload
From f55a622383c90c3f300dede0d04247f7cf2d9e77 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v12] Optimize AcquireExecutorLocks() to skip pruned partitions
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 2 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/prepare.c | 13 +-
src/backend/executor/README | 28 +++
src/backend/executor/execMain.c | 46 +++++
src/backend/executor/execParallel.c | 28 ++-
src/backend/executor/execPartition.c | 238 ++++++++++++++++++++----
src/backend/executor/execUtils.c | 1 +
src/backend/executor/functions.c | 2 +-
src/backend/executor/nodeAppend.c | 16 +-
src/backend/executor/nodeMergeAppend.c | 9 +-
src/backend/executor/spi.c | 10 +-
src/backend/nodes/copyfuncs.c | 33 +++-
src/backend/nodes/outfuncs.c | 36 +++-
src/backend/nodes/readfuncs.c | 56 +++++-
src/backend/optimizer/plan/createplan.c | 20 +-
src/backend/optimizer/plan/planner.c | 3 +
src/backend/optimizer/plan/setrefs.c | 104 ++++++++---
src/backend/partitioning/partprune.c | 41 +++-
src/backend/tcop/pquery.c | 28 ++-
src/backend/utils/cache/plancache.c | 236 ++++++++++++++++++++---
src/include/commands/explain.h | 3 +-
src/include/executor/execPartition.h | 12 +-
src/include/executor/execdesc.h | 3 +
src/include/executor/executor.h | 2 +
src/include/nodes/execnodes.h | 15 ++
src/include/nodes/nodes.h | 4 +
src/include/nodes/pathnodes.h | 15 ++
src/include/nodes/plannodes.h | 39 +++-
src/include/utils/plancache.h | 7 +
33 files changed, 919 insertions(+), 144 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 55c38b04c4..d403eb2309 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -542,7 +542,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 9abbb6b555..f6607f2454 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 1e5701b8eb..7ba9852e51 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -407,7 +407,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
@@ -515,7 +515,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, PartitionPruneResult *part_prune_result,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
@@ -563,7 +564,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, part_prune_result, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 1013790dbb..54734a3a93 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -776,7 +776,7 @@ execute_sql_string(const char *sql)
{
QueryDesc *qdesc;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 9ab248d25e..2be1782bc4 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -416,7 +416,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 80738547ed..45039e64be 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -576,7 +576,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
- ListCell *p;
+ List *plan_part_prune_result_list;
+ ListCell *p,
+ *pp;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -632,15 +634,18 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
plan_list = cplan->stmt_list;
+ plan_part_prune_result_list = cplan->part_prune_result_list;
/* Explain each query */
- foreach(p, plan_list)
+ forboth(p, plan_list, pp, plan_part_prune_result_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ PartitionPruneResult *part_prune_result = lfirst_node(PartitionPruneResult, pp);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ ExplainOnePlan(pstmt, part_prune_result, into, es, query_string,
+ paramLI, queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 0b5183fc4a..8418e758da 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -65,6 +65,30 @@ found there. This currently only occurs for Append and MergeAppend nodes. In
this case the non-required subplans are ignored and the executor state's
subnode array will become out of sequence to the plan's subplan list.
+Actually, the so-called execution time pruning may also occur even before the
+execution has started. One case where that occurs is when a cached generic
+plan is being validated for execution by plancache.c: GetCachedPlan(), which
+proceeds by locking all the relations that will be scanned by that plan. If
+the generic plan has nodes that contain so-called initial pruning steps (a
+subset of execution pruning steps that do not depend on full-fledged execution
+having started), they are performed at this point to figure out the minimal
+set of child subplans that satisfy those pruning instructions and the result
+of performing that pruning is saved in a data structure that gets passed to
+the executor alongside the plan tree. Relations scanned by only those
+surviving subplans are then locked while those scanned by the pruned subplans
+are not, even though the pruned subplans themselves are not removed from the
+plan tree. So, it is imperative that the executor and any third party code
+invoked by it that gets passed the plan tree look at the initial pruning result
+made available via the aforementioned data structure to determine whether or
+not a particular subplan is valid. The data structure basically consists of
+a PartitionPruneResult node passed through the QueryDesc (subsequently added
+to EState) containing a list of bitmapsets with one element for every
+PartitionPruneInfo found in PlannedStmt.partPruneInfos. The list is indexed
+by the part_prune_index of the individual PartitionPruneInfos, which is
+stored in the parent plan node to which a given PartitionPruneInfo belongs.
+Each bitmapset contains the indexes of the child subplans of the given
+parent plan node that survive initial partition pruning.
+
Each Plan node may have expression trees associated with it, to represent
its target list, qualification conditions, etc. These trees are also
read-only to the executor, but the executor state for expression evaluation
@@ -286,6 +310,10 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ [ ExecutorDoInitialPruning ] --- an optional step to perform initial
+ partition pruning on the plan tree, the result of which is passed
+ to the executor via QueryDesc
+
CreateQueryDesc
ExecutorStart
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ef2fd46092..05cc99df8f 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -49,11 +49,13 @@
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/execdebug.h"
+#include "executor/execPartition.h"
#include "executor/nodeSubplan.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/nodeFuncs.h"
#include "parser/parsetree.h"
#include "storage/bufmgr.h"
#include "storage/lmgr.h"
@@ -104,6 +106,47 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
/* end of local decls */
+/* ----------------------------------------------------------------
+ * ExecutorDoInitialPruning
+ *
+ * Performs initial partition pruning to figure out the minimal set of
+ * subplans to be executed and the set of RT indexes of the corresponding
+ * leaf partitions
+ *
+ * Returned PartitionPruneResult must be subsequently passed to the executor
+ * so that it can reuse the result of pruning. It's important that the executor
+ * has the same view of which partitions are initially pruned (by not doing
+ * the pruning again itself), as otherwise it risks initializing subplans whose
+ * partitions would not have been locked.
+ *
+ * Note: Partitioned tables mentioned in PartitionedRelPruneInfo nodes that
+ * drive the pruning will be locked before doing the pruning.
+ */
+PartitionPruneResult *
+ExecutorDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params)
+{
+ PartitionPruneResult *result;
+ ListCell *lc;
+
+ /* Only get here if there is any pruning to do. */
+ Assert(plannedstmt->containsInitialPruning);
+
+ result = makeNode(PartitionPruneResult);
+ foreach(lc, plannedstmt->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ Bitmapset *valid_subplan_offs;
+
+ valid_subplan_offs =
+ ExecPartitionDoInitialPruning(plannedstmt, params, pruneinfo,
+ &result->scan_leafpart_rtis);
+ result->valid_subplan_offs_list =
+ lappend(result->valid_subplan_offs_list,
+ valid_subplan_offs);
+ }
+
+ return result;
+}
/* ----------------------------------------------------------------
* ExecutorStart
@@ -806,6 +849,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ PartitionPruneResult *part_prune_result = queryDesc->part_prune_result;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -825,6 +869,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable);
estate->es_plannedstmt = plannedstmt;
+ estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ estate->es_part_prune_result = part_prune_result;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 9a0d5d59ef..805f86c503 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -66,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITIONPRUNERESULT UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -182,7 +183,9 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->containsInitialPruning = false;
pstmt->planTree = plan;
+ pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
pstmt->resultRelations = NIL;
pstmt->appendRelations = NIL;
@@ -596,12 +599,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_result_data;
+ char *part_prune_result_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_result_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -630,6 +636,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_result_data = nodeToString(estate->es_part_prune_result);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -656,6 +663,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized PartitionPruneResult. */
+ part_prune_result_len = strlen(part_prune_result_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_result_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -750,6 +762,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized PartitionPruneResult */
+ part_prune_result_space = shm_toc_allocate(pcxt->toc, part_prune_result_len);
+ memcpy(part_prune_result_space, part_prune_result_data, part_prune_result_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITIONPRUNERESULT,
+ part_prune_result_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1231,8 +1249,10 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_result_space;
char *paramspace;
PlannedStmt *pstmt;
+ PartitionPruneResult *part_prune_result;
ParamListInfo paramLI;
char *queryString;
@@ -1243,12 +1263,18 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
pstmtspace = shm_toc_lookup(toc, PARALLEL_KEY_PLANNEDSTMT, false);
pstmt = (PlannedStmt *) stringToNode(pstmtspace);
+ /* Reconstruct leader-supplied PartitionPruneResult. */
+ part_prune_result_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITIONPRUNERESULT, false);
+ part_prune_result = (PartitionPruneResult *)
+ stringToNode(part_prune_result_space);
+
/* Reconstruct ParamListInfo. */
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(¶mspace);
/* Create a QueryDesc for the query. */
- return CreateQueryDesc(pstmt,
+ return CreateQueryDesc(pstmt, part_prune_result,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 615bd80973..3037742b8d 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -25,6 +25,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -185,7 +186,11 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *pruneinfo);
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -198,7 +203,8 @@ static void PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans);
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis);
/*
@@ -1587,8 +1593,10 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* considered to be a stable expression, it can change value from one plan
* node scan to the next during query execution. Stable comparison
* expressions that don't involve such Params allow partition pruning to be
- * done once during executor startup. Expressions that do involve such Params
- * require us to prune separately for each scan of the parent plan node.
+ * done once during executor startup or during ExecutorDoInitialPruning(), which
+ * runs as part of performing AcquireExecutorLocks() on a given plan tree.
+ * Expressions that do involve such Params require us to prune separately for
+ * each scan of the parent plan node.
*
* Note that pruning away unneeded subplans during executor startup has the
* added benefit of not having to initialize the unneeded subplans at all.
@@ -1605,6 +1613,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* account for initial pruning possibly having eliminated some of the
* subplans.
*
+ * ExecPartitionDoInitialPruning:
+ * Do initial pruning with the information contained in a given
+ * PartitionPruneInfo to determine the minimal set of child subplans of
+ * the parent plan node to which the PartitionPruneInfo belongs that must
+ * be executed, and also the set of RT indexes of the leaf partitions
+ * that will be scanned by those subplans.
+ *
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating the expressions
* that are safe to evaluate at a given point. This function is first
@@ -1622,8 +1637,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* On return, *initially_valid_subplans is assigned the set of indexes of
* child subplans that must be initialized along with the parent plan node.
- * Initial pruning is performed here if needed and in that case only the
- * surviving subplans' indexes are added.
+ * Initial pruning is performed here if needed (unless it has already been done
+ * by ExecutorDoInitialPruning()), and in that case only the surviving subplans'
+ * indexes are added.
*
* If subplans are indeed pruned, subplan_map arrays contained in the returned
* PartitionPruneState are re-sequenced to not count those, though only if the
@@ -1632,23 +1648,59 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
PartitionPruneState *
ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
Bitmapset **initially_valid_subplans)
{
- PartitionPruneState *prunestate;
EState *estate = planstate->state;
+ PartitionPruneInfo *pruneinfo = list_nth(estate->es_part_prune_infos,
+ part_prune_index);
+ PartitionPruneResult *pruneresult = estate->es_part_prune_result;
+ PartitionPruneState *prunestate;
+ bool do_pruning = (pruneinfo->needs_init_pruning ||
+ pruneinfo->needs_exec_pruning);
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, planstate);
+ /*
+ * No need to do initial pruning if it was done already by
+ * ExecutorDoInitialPruning(), which it would be if es_part_prune_result
+ * has been set.
+ */
+ if (pruneresult)
+ do_pruning = pruneinfo->needs_exec_pruning;
+
+ prunestate = NULL;
+ if (do_pruning)
+ {
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
- /* Create the working data structure for pruning */
- prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ /* For data reading, executor always omits detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
+ /*
+ * Create the working data structure for pruning. No need to consider
+ * initial pruning steps if we have a PartitionPruneResult.
+ */
+ prunestate = CreatePartitionPruneState(planstate, pruneinfo,
+ pruneresult == NULL, true,
+ NIL, planstate->ps_ExprContext,
+ estate->es_partition_directory);
+ }
/*
* Perform an initial partition prune pass, if required.
*/
- if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ if (pruneresult)
+ {
+ *initially_valid_subplans =
+ list_nth(pruneresult->valid_subplan_offs_list, part_prune_index);
+ }
+ else if (prunestate && prunestate->do_initial_prune)
+ {
+ *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true,
+ NULL);
+ }
else
{
/* No pruning, so we'll need to initialize all subplans */
@@ -1669,7 +1721,7 @@ ExecInitPartitionPruning(PlanState *planstate,
* leaves invalid data in prunestate, because that data won't be
* consulted again (cf initial Assert in ExecFindMatchingSubPlans).
*/
- if (prunestate->do_exec_prune)
+ if (prunestate && prunestate->do_exec_prune)
PartitionPruneFixSubPlanMap(prunestate,
*initially_valid_subplans,
n_total_subplans);
@@ -1678,11 +1730,72 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecPartitionDoInitialPruning
+ * Perform initial pruning using the given PartitionPruneInfo to determine
+ * the minimal set of child subplans of the parent plan node to which the
+ * PartitionPruneInfo belongs that must be executed, as well as the set of
+ * RT indexes of the leaf partitions that will be scanned by those subplans.
+ */
+Bitmapset *
+ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *rtable = plannedstmt->rtable;
+ ExprContext *econtext;
+ PartitionDirectory pdir;
+ MemoryContext oldcontext,
+ tmpcontext;
+ PartitionPruneState *prunestate;
+ Bitmapset *valid_subplan_offs;
+
+ /*
+ * A temporary context to allocate stuff needed to run the pruning steps.
+ */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "initial pruning working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
+
+ /*
+ * PartitionDirectory to look up partition descriptors, which omits
+ * detached partitions, just like in the executor proper.
+ */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+
+ /*
+ * We don't yet have a PlanState for the parent plan node, so must create
+ * a standalone ExprContext to evaluate pruning expressions, equipped with
+ * the information about the EXTERN parameters that the caller passed us.
+ * Note that that's okay because the initial pruning steps do not contain
+ * anything that requires the execution to have started.
+ */
+ econtext = CreateStandaloneExprContext();
+ econtext->ecxt_param_list_info = params;
+ prunestate = CreatePartitionPruneState(NULL, pruneinfo, true, false,
+ rtable, econtext, pdir);
+ MemoryContextSwitchTo(oldcontext);
+
+ /* Do the initial pruning. */
+ valid_subplan_offs = ExecFindMatchingSubPlans(prunestate, true,
+ scan_leafpart_rtis);
+
+ FreeExprContext(econtext, true);
+ DestroyPartitionDirectory(pdir);
+ MemoryContextDelete(tmpcontext);
+
+ return valid_subplan_offs;
+}
+
/*
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
*
- * 'planstate' is the parent plan node's execution state.
+ * 'planstate', if not NULL, is the parent plan node's execution state. It
+ * can be NULL if being called before ExecutorStart(), in which case,
+ * 'rtable' (range table), 'econtext', and 'partdir' must be explicitly
+ * provided.
*
* 'pruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
@@ -1696,19 +1809,21 @@ ExecInitPartitionPruning(PlanState *planstate,
* PartitionedRelPruneInfo.
*/
static PartitionPruneState *
-CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
+CreatePartitionPruneState(PlanState *planstate,
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir)
{
- EState *estate = planstate->state;
+ EState *estate = planstate ? planstate->state : NULL;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
- /* For data reading, executor always omits detached partitions */
- if (estate->es_partition_directory == NULL)
- estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
+ Assert((estate != NULL) ||
+ (partdir != NULL && econtext != NULL && rtable != NIL));
n_part_hierarchies = list_length(pruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
@@ -1759,19 +1874,48 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
PartitionedRelPruneInfo *pinfo = lfirst_node(PartitionedRelPruneInfo, lc2);
PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
Relation partrel;
+ bool close_partrel = false;
PartitionDesc partdesc;
PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * Must open the relation by ourselves when called before the
+ * execution has started, such as when called during
+ * ExecutorDoInitialPruning() on a cached plan. In that case,
+ * sub-partitions must be locked, because AcquirePlannerLocks()
+ * would not have seen them. (1st relation in a partrelpruneinfos
+ * list is always the root partitioned table appearing in the
+ * query, which AcquirePlannerLocks() would have locked; the
+ * Assert in relation_open() guards that assumption.)
+ */
+ if (estate == NULL)
+ {
+ RangeTblEntry *rte = rt_fetch(pinfo->rtindex, rtable);
+ int lockmode = (j == 0) ? NoLock : rte->rellockmode;
+
+ partrel = table_open(rte->relid, lockmode);
+ close_partrel = true;
+ }
+ else
+ partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+
+ /*
+ * We can rely on the copy of the partitioned table's partition
+ * key in its relcache entry, because it can't change (or
+ * get destroyed) as long as the relation is locked. Partition
+ * descriptor is taken from the PartitionDirectory associated with
+ * the table that is held open long enough for the descriptor to
+ * remain valid while it's used to perform the pruning steps.
*/
- partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
partkey = RelationGetPartitionKey(partrel);
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partdesc = PartitionDirectoryLookup(partdir, partrel);
+
+ /*
+ * Must close partrel, keeping the lock taken, if we're not using
+ * EState's entry.
+ */
+ if (close_partrel)
+ table_close(partrel, NoLock);
/*
* Initialize the subplan_map and subpart_map.
@@ -1785,6 +1929,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
Assert(partdesc->nparts >= pinfo->nparts);
pprune->nparts = partdesc->nparts;
pprune->subplan_map = palloc(sizeof(int) * partdesc->nparts);
+ pprune->rti_map = palloc(sizeof(Index) * partdesc->nparts);
if (partdesc->nparts == pinfo->nparts)
{
/*
@@ -1795,6 +1940,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pprune->subpart_map = pinfo->subpart_map;
memcpy(pprune->subplan_map, pinfo->subplan_map,
sizeof(int) * pinfo->nparts);
+ memcpy(pprune->rti_map, pinfo->rti_map,
+ sizeof(Index) * pinfo->nparts);
/*
* Double-check that the list of unpruned relations has not
@@ -1845,6 +1992,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pinfo->subplan_map[pd_idx];
pprune->subpart_map[pp_idx] =
pinfo->subpart_map[pd_idx];
+ pprune->rti_map[pp_idx] =
+ pinfo->rti_map[pd_idx];
pd_idx++;
}
else
@@ -1852,6 +2001,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
/* this partdesc entry is not in the plan */
pprune->subplan_map[pp_idx] = -1;
pprune->subpart_map[pp_idx] = -1;
+ pprune->rti_map[pp_idx] = 0;
}
}
@@ -1873,7 +2023,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* Initialize pruning contexts as needed.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
- if (pinfo->initial_pruning_steps)
+ if (consider_initial_steps && pinfo->initial_pruning_steps)
{
InitPartitionPruneContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
@@ -1883,7 +2033,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps)
+ if (consider_exec_steps && pinfo->exec_pruning_steps)
{
InitPartitionPruneContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
@@ -2111,10 +2261,14 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
* runtime pruning.
+ *
+ * RT indexes of leaf partitions scanned by the chosen subplans are added to
+ * *scan_leafpart_rtis if the pointer is non-NULL.
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2149,7 +2303,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
*/
pprune = &prunedata->partrelprunedata[0];
find_matching_subplans_recurse(prunedata, pprune, initial_prune,
- &result);
+ &result, scan_leafpart_rtis);
/* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
@@ -2163,6 +2317,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_copy(*scan_leafpart_rtis);
MemoryContextReset(prunestate->prune_context);
@@ -2173,13 +2329,15 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* find_matching_subplans_recurse
* Recursive worker function for ExecFindMatchingSubPlans
*
- * Adds valid (non-prunable) subplan IDs to *validsubplans
+ * Adds valid (non-prunable) subplan IDs to *validsubplans and RT indexes of
+ * the corresponding leaf partitions to *scan_leafpart_rtis (if asked for).
*/
static void
find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans)
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *partset;
int i;
@@ -2206,8 +2364,13 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
while ((i = bms_next_member(partset, i)) >= 0)
{
if (pprune->subplan_map[i] >= 0)
+ {
*validsubplans = bms_add_member(*validsubplans,
pprune->subplan_map[i]);
+ if (scan_leafpart_rtis && pprune->rti_map[i] > 0)
+ *scan_leafpart_rtis = bms_add_member(*scan_leafpart_rtis,
+ pprune->rti_map[i]);
+ }
else
{
int partidx = pprune->subpart_map[i];
@@ -2215,7 +2378,8 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
if (partidx >= 0)
find_matching_subplans_recurse(prunedata,
&prunedata->partrelprunedata[partidx],
- initial_prune, validsubplans);
+ initial_prune, validsubplans,
+ scan_leafpart_rtis);
else
{
/*
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9df1f81ea8..639145abe9 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -119,6 +119,7 @@ CreateExecutorState(void)
estate->es_relations = NULL;
estate->es_rowmarks = NULL;
estate->es_plannedstmt = NULL;
+ estate->es_part_prune_result = NULL;
estate->es_junkFilter = NULL;
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index f9460ae506..a2182a6b1f 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -844,7 +844,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
- es->qd = CreateQueryDesc(es->stmt,
+ es->qd = CreateQueryDesc(es->stmt, NULL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 357e10a1d7..09f26658e2 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -94,6 +94,7 @@ static bool ExecAppendAsyncRequest(AppendState *node, TupleTableSlot **result);
static void ExecAppendAsyncEventWait(AppendState *node);
static void classify_matching_subplans(AppendState *node);
+
/* ----------------------------------------------------------------
* ExecInitAppend
*
@@ -134,7 +135,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_begun = false;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -145,7 +146,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&appendstate->ps,
list_length(node->appendplans),
- node->part_prune_info,
+ node->part_prune_index,
&validsubplans);
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
@@ -155,7 +156,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (appendstate->as_prune_state == NULL ||
+ (!appendstate->as_prune_state->do_exec_prune && nplans > 0))
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -577,7 +579,7 @@ choose_next_subplan_locally(AppendState *node)
}
else if (node->as_valid_subplans == NULL)
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
whichplan = -1;
}
@@ -642,7 +644,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
/*
* Mark each invalid plan as finished to allow the loop below to
@@ -717,7 +719,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
mark_invalid_subplans_as_finished(node);
}
@@ -868,7 +870,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
classify_matching_subplans(node);
}
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index ecf9052e03..7708cfffda 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -82,7 +82,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.ExecProcNode = ExecMergeAppend;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -93,7 +93,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&mergestate->ps,
list_length(node->mergeplans),
- node->part_prune_info,
+ node->part_prune_index,
&validsubplans);
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
@@ -103,7 +103,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (mergestate->ms_prune_state == NULL ||
+ (!mergestate->ms_prune_state->do_exec_prune && nplans > 0))
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -218,7 +219,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 042a5f8b0a..05db2e9de1 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -2473,7 +2473,9 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
- ListCell *lc2;
+ List *part_prune_result_list;
+ ListCell *lc2,
+ *lc3;
spicallbackarg.query = plansource->query_string;
@@ -2552,6 +2554,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
plan_owner, _SPI_current->queryEnv);
stmt_list = cplan->stmt_list;
+ part_prune_result_list = cplan->part_prune_result_list;
/*
* If we weren't given a specific snapshot to use, and the statement
@@ -2589,9 +2592,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, part_prune_result_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ PartitionPruneResult *part_prune_result = lfirst_node(PartitionPruneResult, lc3);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2663,7 +2667,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
else
snap = InvalidSnapshot;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, part_prune_result,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 46a1943d97..c5c70593de 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -96,7 +96,10 @@ _copyPlannedStmt(const PlannedStmt *from)
COPY_SCALAR_FIELD(parallelModeNeeded);
COPY_SCALAR_FIELD(jitFlags);
COPY_NODE_FIELD(planTree);
+ COPY_NODE_FIELD(partPruneInfos);
+ COPY_SCALAR_FIELD(containsInitialPruning);
COPY_NODE_FIELD(rtable);
+ COPY_BITMAPSET_FIELD(minLockRelids);
COPY_NODE_FIELD(resultRelations);
COPY_NODE_FIELD(appendRelations);
COPY_NODE_FIELD(subplans);
@@ -253,7 +256,7 @@ _copyAppend(const Append *from)
COPY_NODE_FIELD(appendplans);
COPY_SCALAR_FIELD(nasyncplans);
COPY_SCALAR_FIELD(first_partial_plan);
- COPY_NODE_FIELD(part_prune_info);
+ COPY_SCALAR_FIELD(part_prune_index);
return newnode;
}
@@ -281,7 +284,7 @@ _copyMergeAppend(const MergeAppend *from)
COPY_POINTER_FIELD(sortOperators, from->numCols * sizeof(Oid));
COPY_POINTER_FIELD(collations, from->numCols * sizeof(Oid));
COPY_POINTER_FIELD(nullsFirst, from->numCols * sizeof(bool));
- COPY_NODE_FIELD(part_prune_info);
+ COPY_SCALAR_FIELD(part_prune_index);
return newnode;
}
@@ -1280,6 +1283,8 @@ _copyPartitionPruneInfo(const PartitionPruneInfo *from)
PartitionPruneInfo *newnode = makeNode(PartitionPruneInfo);
COPY_NODE_FIELD(prune_infos);
+ COPY_SCALAR_FIELD(needs_init_pruning);
+ COPY_SCALAR_FIELD(needs_exec_pruning);
COPY_BITMAPSET_FIELD(other_subplans);
return newnode;
@@ -1296,6 +1301,7 @@ _copyPartitionedRelPruneInfo(const PartitionedRelPruneInfo *from)
COPY_POINTER_FIELD(subplan_map, from->nparts * sizeof(int));
COPY_POINTER_FIELD(subpart_map, from->nparts * sizeof(int));
COPY_POINTER_FIELD(relid_map, from->nparts * sizeof(Oid));
+ COPY_POINTER_FIELD(rti_map, from->nparts * sizeof(Index));
COPY_NODE_FIELD(initial_pruning_steps);
COPY_NODE_FIELD(exec_pruning_steps);
COPY_BITMAPSET_FIELD(execparamids);
@@ -5469,6 +5475,21 @@ _copyExtensibleNode(const ExtensibleNode *from)
return newnode;
}
+/* ****************************************************************
+ * execnodes.h copy functions
+ * ****************************************************************
+ */
+static PartitionPruneResult *
+_copyPartitionPruneResult(const PartitionPruneResult *from)
+{
+ PartitionPruneResult *newnode = makeNode(PartitionPruneResult);
+
+ COPY_BITMAPSET_FIELD(scan_leafpart_rtis);
+ COPY_NODE_FIELD(valid_subplan_offs_list);
+
+ return newnode;
+}
+
/* ****************************************************************
* value.h copy functions
* ****************************************************************
@@ -5523,7 +5544,6 @@ _copyBitString(const BitString *from)
return newnode;
}
-
static ForeignKeyCacheInfo *
_copyForeignKeyCacheInfo(const ForeignKeyCacheInfo *from)
{
@@ -6565,6 +6585,13 @@ copyObjectImpl(const void *from)
retval = _copyPublicationTable(from);
break;
+ /*
+ * EXECUTION NODES
+ */
+ case T_PartitionPruneResult:
+ retval = _copyPartitionPruneResult(from);
+ break;
+
/*
* MISCELLANEOUS NODES
*/
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 13e1643530..ca54022fee 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -314,7 +314,10 @@ _outPlannedStmt(StringInfo str, const PlannedStmt *node)
WRITE_BOOL_FIELD(parallelModeNeeded);
WRITE_INT_FIELD(jitFlags);
WRITE_NODE_FIELD(planTree);
+ WRITE_NODE_FIELD(partPruneInfos);
+ WRITE_BOOL_FIELD(containsInitialPruning);
WRITE_NODE_FIELD(rtable);
+ WRITE_BITMAPSET_FIELD(minLockRelids);
WRITE_NODE_FIELD(resultRelations);
WRITE_NODE_FIELD(appendRelations);
WRITE_NODE_FIELD(subplans);
@@ -443,7 +446,7 @@ _outAppend(StringInfo str, const Append *node)
WRITE_NODE_FIELD(appendplans);
WRITE_INT_FIELD(nasyncplans);
WRITE_INT_FIELD(first_partial_plan);
- WRITE_NODE_FIELD(part_prune_info);
+ WRITE_INT_FIELD(part_prune_index);
}
static void
@@ -460,7 +463,7 @@ _outMergeAppend(StringInfo str, const MergeAppend *node)
WRITE_OID_ARRAY(sortOperators, node->numCols);
WRITE_OID_ARRAY(collations, node->numCols);
WRITE_BOOL_ARRAY(nullsFirst, node->numCols);
- WRITE_NODE_FIELD(part_prune_info);
+ WRITE_INT_FIELD(part_prune_index);
}
static void
@@ -1006,6 +1009,8 @@ _outPartitionPruneInfo(StringInfo str, const PartitionPruneInfo *node)
WRITE_NODE_TYPE("PARTITIONPRUNEINFO");
WRITE_NODE_FIELD(prune_infos);
+ WRITE_BOOL_FIELD(needs_init_pruning);
+ WRITE_BOOL_FIELD(needs_exec_pruning);
WRITE_BITMAPSET_FIELD(other_subplans);
}
@@ -1020,6 +1025,7 @@ _outPartitionedRelPruneInfo(StringInfo str, const PartitionedRelPruneInfo *node)
WRITE_INT_ARRAY(subplan_map, node->nparts);
WRITE_INT_ARRAY(subpart_map, node->nparts);
WRITE_OID_ARRAY(relid_map, node->nparts);
+ WRITE_INDEX_ARRAY(rti_map, node->nparts);
WRITE_NODE_FIELD(initial_pruning_steps);
WRITE_NODE_FIELD(exec_pruning_steps);
WRITE_BITMAPSET_FIELD(execparamids);
@@ -2420,6 +2426,9 @@ _outPlannerGlobal(StringInfo str, const PlannerGlobal *node)
WRITE_NODE_FIELD(finalrowmarks);
WRITE_NODE_FIELD(resultRelations);
WRITE_NODE_FIELD(appendRelations);
+ WRITE_NODE_FIELD(partPruneInfos);
+ WRITE_BOOL_FIELD(containsInitialPruning);
+ WRITE_BITMAPSET_FIELD(minLockRelids);
WRITE_NODE_FIELD(relationOids);
WRITE_NODE_FIELD(invalItems);
WRITE_NODE_FIELD(paramExecTypes);
@@ -2487,6 +2496,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_BITMAPSET_FIELD(curOuterRels);
WRITE_NODE_FIELD(curOuterParams);
WRITE_BOOL_FIELD(partColsUpdated);
+ WRITE_NODE_FIELD(partPruneInfos);
}
static void
@@ -2840,6 +2850,21 @@ _outExtensibleNode(StringInfo str, const ExtensibleNode *node)
methods->nodeOut(str, node);
}
+/*****************************************************************************
+ *
+ * Stuff from execnodes.h
+ *
+ *****************************************************************************/
+
+static void
+_outPartitionPruneResult(StringInfo str, const PartitionPruneResult *node)
+{
+ WRITE_NODE_TYPE("PARTITIONPRUNERESULT");
+
+ WRITE_BITMAPSET_FIELD(scan_leafpart_rtis);
+ WRITE_NODE_FIELD(valid_subplan_offs_list);
+}
+
/*****************************************************************************
*
* Stuff from parsenodes.h.
@@ -4748,6 +4773,13 @@ outNode(StringInfo str, const void *obj)
_outJsonTableSibling(str, obj);
break;
+ /*
+ * EXECUTION NODES
+ */
+ case T_PartitionPruneResult:
+ _outPartitionPruneResult(str, obj);
+ break;
+
default:
/*
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 48f7216c9e..acce5e29cc 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -164,6 +164,11 @@
token = pg_strtok(&length); /* skip :fldname */ \
local_node->fldname = readIntCols(len)
+/* Read an Index array */
+#define READ_INDEX_ARRAY(fldname, len) \
+ token = pg_strtok(&length); /* skip :fldname */ \
+ local_node->fldname = readIndexCols(len)
+
/* Read a bool array */
#define READ_BOOL_ARRAY(fldname, len) \
token = pg_strtok(&length); /* skip :fldname */ \
@@ -1814,7 +1819,10 @@ _readPlannedStmt(void)
READ_BOOL_FIELD(parallelModeNeeded);
READ_INT_FIELD(jitFlags);
READ_NODE_FIELD(planTree);
+ READ_NODE_FIELD(partPruneInfos);
+ READ_BOOL_FIELD(containsInitialPruning);
READ_NODE_FIELD(rtable);
+ READ_BITMAPSET_FIELD(minLockRelids);
READ_NODE_FIELD(resultRelations);
READ_NODE_FIELD(appendRelations);
READ_NODE_FIELD(subplans);
@@ -1946,7 +1954,7 @@ _readAppend(void)
READ_NODE_FIELD(appendplans);
READ_INT_FIELD(nasyncplans);
READ_INT_FIELD(first_partial_plan);
- READ_NODE_FIELD(part_prune_info);
+ READ_INT_FIELD(part_prune_index);
READ_DONE();
}
@@ -1968,7 +1976,7 @@ _readMergeAppend(void)
READ_OID_ARRAY(sortOperators, local_node->numCols);
READ_OID_ARRAY(collations, local_node->numCols);
READ_BOOL_ARRAY(nullsFirst, local_node->numCols);
- READ_NODE_FIELD(part_prune_info);
+ READ_INT_FIELD(part_prune_index);
READ_DONE();
}
@@ -2763,6 +2771,8 @@ _readPartitionPruneInfo(void)
READ_LOCALS(PartitionPruneInfo);
READ_NODE_FIELD(prune_infos);
+ READ_BOOL_FIELD(needs_init_pruning);
+ READ_BOOL_FIELD(needs_exec_pruning);
READ_BITMAPSET_FIELD(other_subplans);
READ_DONE();
@@ -2779,6 +2789,7 @@ _readPartitionedRelPruneInfo(void)
READ_INT_ARRAY(subplan_map, local_node->nparts);
READ_INT_ARRAY(subpart_map, local_node->nparts);
READ_OID_ARRAY(relid_map, local_node->nparts);
+ READ_INDEX_ARRAY(rti_map, local_node->nparts);
READ_NODE_FIELD(initial_pruning_steps);
READ_NODE_FIELD(exec_pruning_steps);
READ_BITMAPSET_FIELD(execparamids);
@@ -2932,6 +2943,21 @@ _readPartitionRangeDatum(void)
READ_DONE();
}
+
+/*
+ * _readPartitionPruneResult
+ */
+static PartitionPruneResult *
+_readPartitionPruneResult(void)
+{
+ READ_LOCALS(PartitionPruneResult);
+
+ READ_BITMAPSET_FIELD(scan_leafpart_rtis);
+ READ_NODE_FIELD(valid_subplan_offs_list);
+
+ READ_DONE();
+}
+
/*
* parseNodeString
*
@@ -3229,6 +3255,8 @@ parseNodeString(void)
return_value = _readJsonTableParent();
else if (MATCH("JSONTABSNODE", 12))
return_value = _readJsonTableSibling();
+ else if (MATCH("PARTITIONPRUNERESULT", 20))
+ return_value = _readPartitionPruneResult();
else
{
elog(ERROR, "badly formatted node string \"%.32s\"...", token);
@@ -3372,6 +3400,30 @@ readIntCols(int numCols)
return int_vals;
}
+/*
+ * readIndexCols
+ */
+Index *
+readIndexCols(int numCols)
+{
+ int tokenLength,
+ i;
+ const char *token;
+ Index *index_vals;
+
+ if (numCols <= 0)
+ return NULL;
+
+ index_vals = (Index *) palloc(numCols * sizeof(Index));
+ for (i = 0; i < numCols; i++)
+ {
+ token = pg_strtok(&tokenLength);
+ index_vals[i] = atoui(token);
+ }
+
+ return index_vals;
+}
+
/*
* readBoolCols
*/
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 51591bb812..453f720759 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1366,7 +1366,15 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
plan->appendplans = subplans;
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
- plan->part_prune_info = partpruneinfo;
+
+ if (partpruneinfo)
+ {
+ root->partPruneInfos = lappend(root->partPruneInfos, partpruneinfo);
+ /* Will be updated later in set_plan_references(). */
+ plan->part_prune_index = list_length(root->partPruneInfos) - 1;
+ }
+ else
+ plan->part_prune_index = -1;
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -1528,7 +1536,15 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
}
node->mergeplans = subplans;
- node->part_prune_info = partpruneinfo;
+
+ if (partpruneinfo)
+ {
+ root->partPruneInfos = lappend(root->partPruneInfos, partpruneinfo);
+ /* Will be updated later in set_plan_references(). */
+ node->part_prune_index = list_length(root->partPruneInfos) - 1;
+ }
+ else
+ node->part_prune_index = -1;
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index b2569c5d0c..2aa051d862 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -518,7 +518,10 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->dependsOnRole = glob->dependsOnRole;
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
+ result->partPruneInfos = glob->partPruneInfos;
+ result->containsInitialPruning = glob->containsInitialPruning;
result->rtable = glob->finalrtable;
+ result->minLockRelids = glob->minLockRelids;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 7519723081..fc66986e1c 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -251,7 +251,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
Plan *result;
PlannerGlobal *glob = root->glob;
int rtoffset = list_length(glob->finalrtable);
- ListCell *lc;
+ ListCell *lc;
/*
* Add all the query's RTEs to the flattened rangetable. The live ones
@@ -260,6 +260,16 @@ set_plan_references(PlannerInfo *root, Plan *plan)
*/
add_rtes_to_flat_rtable(root, false);
+ /*
+ * Add the query's adjusted range of RT indexes to glob->minLockRelids.
+ * The adjusted RT indexes of prunable relations will be deleted from the
+ * set below where PartitionPruneInfos are processed.
+ */
+ glob->minLockRelids =
+ bms_add_range(glob->minLockRelids,
+ rtoffset + 1,
+ rtoffset + list_length(root->parse->rtable));
+
/*
* Adjust RT indexes of PlanRowMarks and add to final rowmarks list
*/
@@ -338,6 +348,56 @@ set_plan_references(PlannerInfo *root, Plan *plan)
}
}
+ /* Also fix up the information in PartitionPruneInfos. */
+ foreach (lc, root->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ Bitmapset *leafpart_rtis = NULL;
+ ListCell *l;
+
+ foreach(l, pruneinfo->prune_infos)
+ {
+ List *prune_infos = lfirst(l);
+ ListCell *l2;
+
+ foreach(l2, prune_infos)
+ {
+ PartitionedRelPruneInfo *pinfo = lfirst(l2);
+ int i;
+
+ /* RT index of the partitioned table. */
+ pinfo->rtindex += rtoffset;
+
+ /* And also those of the leaf partitions. */
+ for (i = 0; i < pinfo->nparts; i++)
+ {
+ if (pinfo->rti_map[i] > 0)
+ {
+ pinfo->rti_map[i] += rtoffset;
+ leafpart_rtis = bms_add_member(leafpart_rtis,
+ pinfo->rti_map[i]);
+ }
+ }
+ }
+ }
+
+ if (pruneinfo->needs_init_pruning)
+ {
+ glob->containsInitialPruning = true;
+
+ /*
+ * Delete the leaf partition RTIs from the global set of relations
+ * to be locked before executing the plan. AcquireExecutorLocks()
+ * will find the ones to add to the set after performing initial
+ * pruning.
+ */
+ glob->minLockRelids = bms_del_members(glob->minLockRelids,
+ leafpart_rtis);
+ }
+
+ glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
+ }
+
return result;
}
@@ -1610,21 +1670,12 @@ set_append_references(PlannerInfo *root,
aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
- if (aplan->part_prune_info)
- {
- foreach(l, aplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * PartitionPruneInfos will be added to a list in PlannerGlobal, so update
+ * the index.
+ */
+ if (aplan->part_prune_index >= 0)
+ aplan->part_prune_index += list_length(root->glob->partPruneInfos);
/* We don't need to recurse to lefttree or righttree ... */
Assert(aplan->plan.lefttree == NULL);
@@ -1682,21 +1733,12 @@ set_mergeappend_references(PlannerInfo *root,
mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
- if (mplan->part_prune_info)
- {
- foreach(l, mplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * PartitionPruneInfos will be added to a list in PlannerGlobal, so update
+ * the index.
+ */
+ if (mplan->part_prune_index >= 0)
+ mplan->part_prune_index += list_length(root->glob->partPruneInfos);
/* We don't need to recurse to lefttree or righttree ... */
Assert(mplan->plan.lefttree == NULL);
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 9d3c05aed3..0eaff15ed0 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,9 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -230,6 +232,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool needs_init_pruning = false;
+ bool needs_exec_pruning = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -309,12 +313,16 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_needs_init_pruning;
+ bool partrel_needs_exec_pruning;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_needs_init_pruning,
+ &partrel_needs_exec_pruning);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -323,6 +331,10 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+ if (!needs_init_pruning)
+ needs_init_pruning = partrel_needs_init_pruning;
+ if (!needs_exec_pruning)
+ needs_exec_pruning = partrel_needs_exec_pruning;
}
pfree(relid_subplan_map);
@@ -337,6 +349,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->needs_init_pruning = needs_init_pruning;
+ pruneinfo->needs_exec_pruning = needs_exec_pruning;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -435,13 +449,18 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *needs_init_pruning and *needs_exec_pruning are set to indicate whether the
+ * returned PartitionedRelPruneInfos contain pruning steps that can be
+ * performed before execution begins and during execution, respectively.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -452,6 +471,10 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will find out below. */
+ *needs_init_pruning = false;
+ *needs_exec_pruning = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -539,6 +562,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+	 * In the first pass, we also note whether the 2nd pass will be
+	 * necessary, by noting the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -613,6 +639,11 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ if (!*needs_init_pruning)
+ *needs_init_pruning = (initial_pruning_steps != NIL);
+ if (!*needs_exec_pruning)
+ *needs_exec_pruning = (exec_pruning_steps != NIL);
+
pinfolist = lappend(pinfolist, pinfo);
}
@@ -640,6 +671,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *subplan_map;
int *subpart_map;
Oid *relid_map;
+ Index *rti_map;
/*
* Construct the subplan and subpart maps for this partitioning level.
@@ -652,6 +684,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
+ rti_map = (Index *) palloc0(nparts * sizeof(Index));
present_parts = NULL;
i = -1;
@@ -666,6 +699,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
+ rti_map[i] = partrel->relid;
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
@@ -690,6 +724,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
pinfo->relid_map = relid_map;
+ pinfo->rti_map = rti_map;
}
pfree(relid_subpart_map);
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5aa5a350f3..163ba956c4 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -35,7 +35,7 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
+static void ProcessQuery(PlannedStmt *plan, PartitionPruneResult *part_prune_result,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +65,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +78,8 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->part_prune_result = part_prune_result; /* ExecutorDoInitialPruning()
+ * output for plan */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +125,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * part_prune_result: ExecutorDoInitialPruning() output for the plan tree
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +138,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +150,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, part_prune_result, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -491,8 +496,14 @@ PortalStart(Portal portal, ParamListInfo params,
/*
* Create QueryDesc in portal's context; for the moment, set
* the destination to DestNone.
+ *
+ * There is no PartitionPruneResult unless the PlannedStmt is
+ * from a CachedPlan.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->cplan == NULL ? NULL :
+ linitial_node(PartitionPruneResult,
+ portal->cplan->part_prune_result_list),
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1194,6 +1205,9 @@ PortalRunMulti(Portal portal,
{
bool active_snapshot_set = false;
ListCell *stmtlist_item;
+ int i;
+ List *part_prune_results = portal->cplan == NULL ? NIL:
+ portal->cplan->part_prune_result_list;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1214,9 +1228,15 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
+ i = 0;
foreach(stmtlist_item, portal->stmts)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ PartitionPruneResult *part_prune_result = part_prune_results ?
+ list_nth(part_prune_results, i) :
+ NULL;
+
+ i++;
/*
* If we got a cancel signal in prior command, quit
@@ -1274,7 +1294,7 @@ PortalRunMulti(Portal portal,
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_result,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1283,7 +1303,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_result,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 4cf6db504f..216401bcfb 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -99,14 +99,16 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
+static void CachedPlanSavePartitionPruneResults(CachedPlan *plan, List *part_prune_result_list);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static List *AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams);
+static void ReleaseExecutorLocks(List *stmt_list, List *part_prune_result_list);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -790,9 +792,21 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* On a "true" return, we have acquired the locks needed to run the plan.
* (We must do this for the "true" result to be race-condition-free.)
+ *
+ * If the CachedPlan is valid, this may in some cases call
+ * ExecutorDoInitialPruning() on each PlannedStmt contained in it to determine
+ * the set of relations to be locked by AcquireExecutorLocks(), instead of just
+ * scanning its range table, which is done to prune away any nodes in the tree
+ * that need not be executed based on the result of initial partition pruning.
+ * The result of pruning, which consists of a List of Lists of Bitmapsets of
+ * child subplan indexes, is allocated in a child context of the context
+ * containing the plan itself and added to plan->part_prune_result_list. The
+ * previous contents of that list from the last invocation on the same
+ * CachedPlan are deleted, because they would no longer be valid given the
+ * fresh set of parameter values that may be used as pruning parameters.
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams)
{
CachedPlan *plan = plansource->gplan;
@@ -820,13 +834,24 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ List *part_prune_result_list;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ /*
+ * Lock relations scanned by the plan. If ExecutorDoInitialPruning()
+ * asked to omit some relations because the plan nodes that scan them
+ * were found to be pruned, the executor will be informed of the
+ * omission of the plan nodes themselves via part_prune_result_list
+ * that is passed to it along with the list of PlannedStmts, so that
+ * it doesn't accidentally try to execute those nodes.
+ */
+ part_prune_result_list = AcquireExecutorLocks(plan->stmt_list,
+ boundParams);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -844,11 +869,14 @@ CheckCachedPlan(CachedPlanSource *plansource)
if (plan->is_valid)
{
/* Successfully revalidated and locked the query. */
+
+ /* Remember pruning results in the CachedPlan. */
+ CachedPlanSavePartitionPruneResults(plan, part_prune_result_list);
return true;
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ ReleaseExecutorLocks(plan->stmt_list, part_prune_result_list);
}
/*
@@ -880,10 +908,12 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv)
{
CachedPlan *plan;
- List *plist;
+ List *plist,
+ *dummy_part_prune_result_list;
bool snapshot_set;
bool is_transient;
- MemoryContext plan_context;
+ MemoryContext plan_context,
+ part_prune_result_context;
MemoryContext oldcxt = CurrentMemoryContext;
ListCell *lc;
@@ -962,6 +992,16 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
else
plan_context = CurrentMemoryContext;
+ /*
+ * Also create a dedicated context for part_prune_result_list, making it
+ * a child of plan_context.
+ */
+ part_prune_result_context = AllocSetContextCreate(CurrentMemoryContext,
+ "CachedPlan part_prune_results list",
+ ALLOCSET_START_SMALL_SIZES);
+ MemoryContextSetParent(part_prune_result_context, plan_context);
+ MemoryContextSetIdentifier(part_prune_result_context, plan_context->ident);
+
/*
* Create and fill the CachedPlan struct within the new context.
*/
@@ -977,10 +1017,20 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
plan->planRoleId = GetUserId();
plan->dependsOnRole = plansource->dependsOnRLS;
is_transient = false;
+ dummy_part_prune_result_list = NIL;
foreach(lc, plist)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc);
+ /*
+ * Real values will be added during subsequent CheckCachedPlan() calls
+		 * on this plan, but we must add "something" for now, because users of
+ * CachedPlan expect stmt_list and part_prune_result_list to have
+ * the same number of elements.
+ */
+ dummy_part_prune_result_list = lappend(dummy_part_prune_result_list,
+ NULL);
+
if (plannedstmt->commandType == CMD_UTILITY)
continue; /* Ignore utility statements */
@@ -1002,6 +1052,13 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
plan->is_saved = false;
plan->is_valid = true;
+ /*
+ * While still dummy, save the list so that it is discarded on next use of
+ * the CachedPlan.
+ */
+ plan->part_prune_result_context = part_prune_result_context;
+ CachedPlanSavePartitionPruneResults(plan, dummy_part_prune_result_list);
+
/* assign generation number to new plan */
plan->generation = ++(plansource->generation);
@@ -1160,7 +1217,7 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (CheckCachedPlan(plansource, boundParams))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1586,6 +1643,36 @@ CopyCachedPlan(CachedPlanSource *plansource)
return newsource;
}
+/*
+ * CachedPlanSavePartitionPruneResults
+ * Save the list containing PartitionPruneResult nodes into the given
+ * CachedPlan
+ *
+ * They must be held on to for the duration of a given execution of the
+ * CachedPlan. The provided list is copied into a dedicated context that is
+ * a child of plan->context after dropping the existing contents of the list,
+ * because any PartitionPruneResult contained therein would no longer be
+ * valid for the current execution.
+ */
+static void
+CachedPlanSavePartitionPruneResults(CachedPlan *plan,
+ List *part_prune_result_list)
+{
+ MemoryContext part_prune_result_context = plan->part_prune_result_context,
+ oldcontext = CurrentMemoryContext;
+ List *part_prune_result_list_copy;
+
+ /* First clear the existing contents of the list. */
+ Assert(MemoryContextIsValid(part_prune_result_context));
+ MemoryContextReset(part_prune_result_context);
+
+ MemoryContextSwitchTo(part_prune_result_context);
+ part_prune_result_list_copy = copyObject(part_prune_result_list);
+ MemoryContextSwitchTo(oldcontext);
+
+ plan->part_prune_result_list = part_prune_result_list_copy;
+}
+
/*
* CachedPlanIsValid: test whether the rewritten querytree within a
* CachedPlanSource is currently valid (that is, not marked as being in need
@@ -1737,17 +1824,21 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ *
+ * Returns a list of PartitionPruneResult nodes containing one element for each
+ * PlannedStmt in stmt_list; an element is NULL if the corresponding statement
+ * is a utility statement or its containsInitialPruning flag is false.
*/
-static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+static List *
+AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams)
{
ListCell *lc1;
+ List *part_prune_result_list = NIL;
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ PartitionPruneResult *part_prune_result = NULL;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1761,27 +1852,122 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
if (query)
- ScanQueryForLocks(query, acquire);
- continue;
+ ScanQueryForLocks(query, true);
}
-
- foreach(lc2, plannedstmt->rtable)
+ else
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
-
- if (rte->rtekind != RTE_RELATION)
- continue;
+ Bitmapset *lockRelids;
+ int rti;
/*
- * Acquire the appropriate type of lock on each relation OID. Note
- * that we don't actually try to open the rel, and hence will not
- * fail if it's been dropped entirely --- we'll just transiently
- * acquire a non-conflicting lock.
+ * Figure out the set of relations that would need to be locked
+ * before executing the plan.
*/
- if (acquire)
+ if (plannedstmt->containsInitialPruning)
+ {
+ /*
+ * Obtain the set of partitions to be locked from the
+ * PartitionPruneInfos by considering the result of performing
+ * initial partition pruning.
+ */
+				part_prune_result = ExecutorDoInitialPruning(plannedstmt,
+															 boundParams);
+
+ lockRelids = bms_union(plannedstmt->minLockRelids,
+ part_prune_result->scan_leafpart_rtis);
+ }
+ else
+ lockRelids = plannedstmt->minLockRelids;
+
+ rti = -1;
+ while ((rti = bms_next_member(lockRelids, rti)) > 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ if (rte->rtekind != RTE_RELATION)
+ continue;
+
+ /*
+ * Acquire the appropriate type of lock on each relation OID.
+ * Note that we don't actually try to open the rel, and hence
+ * will not fail if it's been dropped entirely --- we'll just
+ * transiently acquire a non-conflicting lock.
+ */
LockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
+
+ /*
+		 * Remember the PartitionPruneResult so that it can later be added to
+		 * the QueryDesc that will be passed to the executor when executing
+		 * this plan.  It may be NULL, but we must keep the list the same
+		 * length as stmt_list.
+ */
+ part_prune_result_list = lappend(part_prune_result_list,
+ part_prune_result);
+ }
+
+ return part_prune_result_list;
+}
+
+/*
+ * ReleaseExecutorLocks
+ * Release locks that would've been acquired by an earlier call to
+ * AcquireExecutorLocks()
+ */
+static void
+ReleaseExecutorLocks(List *stmt_list, List *part_prune_result_list)
+{
+ ListCell *lc1,
+ *lc2;
+
+ forboth(lc1, stmt_list, lc2, part_prune_result_list)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ PartitionPruneResult *part_prune_result = lfirst_node(PartitionPruneResult, lc2);
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /*
+ * Ignore utility statements, except those (such as EXPLAIN) that
+ * contain a parsed-but-not-planned query. Note: it's okay to use
+ * ScanQueryForLocks, even though the query hasn't been through
+ * rule rewriting, because rewriting doesn't change the query
+ * representation.
+ */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ if (query)
+ ScanQueryForLocks(query, false);
+ }
+ else
+ {
+ Bitmapset *lockRelids;
+ int rti;
+
+ if (part_prune_result == NULL)
+ {
+ Assert(!plannedstmt->containsInitialPruning);
+ lockRelids = plannedstmt->minLockRelids;
+ }
else
+ {
+ Assert(plannedstmt->containsInitialPruning);
+ lockRelids = bms_union(plannedstmt->minLockRelids,
+ part_prune_result->scan_leafpart_rtis);
+ }
+
+ rti = -1;
+ while ((rti = bms_next_member(lockRelids, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ if (rte->rtekind != RTE_RELATION)
+ continue;
+
+ /* See the comment in AcquireExecutorLocks(). */
UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+
}
}
}
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 666977fb1f..34975c69ee 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,8 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt, PartitionPruneResult *part_prune_result,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 708435e952..bd8776402e 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -45,6 +45,7 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
+ * rti_map Range table index by partition index, or 0.
* present_parts A Bitmapset of the partition indexes that we
* have subplans or subparts for.
* initial_pruning_steps List of PartitionPruneSteps used to
@@ -61,6 +62,7 @@ typedef struct PartitionedRelPruningData
int nparts;
int *subplan_map;
int *subpart_map;
+ Index *rti_map;
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
@@ -123,9 +125,13 @@ typedef struct PartitionPruneState
extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
-
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis);
+extern Bitmapset *ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index e79e2c001f..60d5644908 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,8 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ PartitionPruneResult *part_prune_result; /* ExecutorDoInitialPruning()'s
+ * output for plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +59,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 873772f188..57dc0e8077 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -185,6 +185,8 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
+extern PartitionPruneResult *ExecutorDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params);
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index cbbcff81d2..b5a7fd7e16 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -596,6 +596,8 @@ typedef struct EState
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
+ struct PartitionPruneResult *es_part_prune_result; /* QueryDesc.part_prune_result */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
@@ -984,6 +986,19 @@ typedef struct DomainConstraintState
*/
typedef TupleTableSlot *(*ExecProcNodeMtd) (struct PlanState *pstate);
+/*----------------
+ * PartitionPruneResult
+ *
+ * Result of ExecutorDoInitialPruning() invocation on a given plan.
+ */
+typedef struct PartitionPruneResult
+{
+ NodeTag type;
+
+ Bitmapset *scan_leafpart_rtis;
+ List *valid_subplan_offs_list;
+} PartitionPruneResult;
+
/* ----------------
* PlanState node
*
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 300824258e..de312b9215 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -97,6 +97,9 @@ typedef enum NodeTag
T_PartitionPruneStepCombine,
T_PlanInvalItem,
+ /* TAGS FOR EXECUTOR PREP NODES (execnodes.h) */
+ T_PartitionPruneResult,
+
/*
* TAGS FOR PLAN STATE NODES (execnodes.h)
*
@@ -673,6 +676,7 @@ extern struct Bitmapset *readBitmapset(void);
extern uintptr_t readDatum(bool typbyval);
extern bool *readBoolCols(int numCols);
extern int *readIntCols(int numCols);
+extern Index *readIndexCols(int numCols);
extern Oid *readOidCols(int numCols);
extern int16 *readAttrNumberCols(int numCols);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 6cbcb67bdf..f2039071c9 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -107,6 +107,18 @@ typedef struct PlannerGlobal
List *appendRelations; /* "flat" list of AppendRelInfos */
+ List *partPruneInfos; /* List of PartitionPruneInfo contained in
+ * the plan */
+
+ bool containsInitialPruning; /* Do any of those PartitionPruneInfos
+ * have initial (pre-exec) pruning
+ * steps in them? */
+
+ Bitmapset *minLockRelids; /* RT indexes of RTE_RELATION entries that
+ * must always be locked to execute the plan;
+ * those scanned by initial-prunable plan
+ * nodes are not included */
+
List *relationOids; /* OIDs of relations the plan depends on */
List *invalItems; /* other dependencies, as PlanInvalItems */
@@ -377,6 +389,9 @@ struct PlannerInfo
/* Does this query modify any partition key columns? */
bool partColsUpdated;
+
+ /* PartitionPruneInfos added in this query's plan. */
+ List *partPruneInfos;
};
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 10dd35f011..ecdc950fde 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -64,8 +64,19 @@ typedef struct PlannedStmt
struct Plan *planTree; /* tree of Plan nodes */
+ List *partPruneInfos; /* List of PartitionPruneInfo contained in
+ * the plan */
+
+ bool containsInitialPruning; /* Do any of those PartitionPruneInfos
+ * have initial (pre-exec) pruning
+ * steps in them? */
+
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *minLockRelids; /* RT indexes of RTE_RELATION entries that
+ * must be locked, except those scanned by
+ * initial-prunable plan nodes */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
List *resultRelations; /* integer list of RT indexes, or NIL */
@@ -262,8 +273,12 @@ typedef struct Append
*/
int first_partial_plan;
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /*
+ * Index of this plan's PartitionPruneInfo in PlannedStmt.part_prune_infos
+ * to be used for run-time subplan pruning; -1 if run-time pruning is
+ * not needed.
+ */
+ int part_prune_index;
} Append;
/* ----------------
@@ -282,8 +297,13 @@ typedef struct MergeAppend
Oid *sortOperators; /* OIDs of operators to sort them by */
Oid *collations; /* OIDs of collations */
bool *nullsFirst; /* NULLS FIRST/LAST directions */
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+
+ /*
+ * Index of this plan's PartitionPruneInfo in PlannedStmt.part_prune_infos
+ * to be used for run-time subplan pruning; -1 if run-time pruning is
+ * not needed.
+ */
+ int part_prune_index;
} MergeAppend;
/* ----------------
@@ -1187,6 +1207,13 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * needs_init_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its initial_pruning_steps set?
+ *
+ * needs_exec_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its exec_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1195,6 +1222,9 @@ typedef struct PartitionPruneInfo
{
NodeTag type;
List *prune_infos;
+ Bitmapset *leafpart_rtis;
+ bool needs_init_pruning;
+ bool needs_exec_pruning;
Bitmapset *other_subplans;
} PartitionPruneInfo;
@@ -1225,6 +1255,7 @@ typedef struct PartitionedRelPruneInfo
int *subplan_map; /* subplan index by partition index, or -1 */
int *subpart_map; /* subpart index by partition index, or -1 */
Oid *relid_map; /* relation OID by partition index, or 0 */
+	Index	   *rti_map;		/* range table index by partition index, or 0 */
/*
* initial_pruning_steps shows how to prune during executor startup (i.e.,
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 95b99e3d25..fd7f129aea 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -148,6 +148,9 @@ typedef struct CachedPlan
{
int magic; /* should equal CACHEDPLAN_MAGIC */
List *stmt_list; /* list of PlannedStmts */
+ List *part_prune_result_list; /* list of PartitionPruneResult with
+										 * one element for each stmt_list entry;
+ * NIL if not a generic plan */
bool is_oneshot; /* is it a "oneshot" plan? */
bool is_saved; /* is CachedPlan in a long-lived context? */
bool is_valid; /* is the stmt_list currently valid? */
@@ -158,6 +161,10 @@ typedef struct CachedPlan
int generation; /* parent's generation number for this plan */
int refcount; /* count of live references to this struct */
MemoryContext context; /* context containing this CachedPlan */
+ MemoryContext part_prune_result_context; /* context containing
+ * part_prune_result_list,
+ * a child of the above
+ * context */
} CachedPlan;
/*
--
2.24.1
On Thu, 7 Apr 2022 at 20:28, Amit Langote <amitlangote09@gmail.com> wrote:
Here's an updated version. In particular, I removed the
part_prune_results list from PortalData, in favor of having anything
that needs to look at the list instead get it from the CachedPlan
(PortalData.cplan). This makes things better in 2 ways:
Thanks for making those changes.
I'm not overly familiar with the data structures we use for passing
plans around between the planner and executor, but storing the pruning
results in CachedPlan seems pretty bad. I see you've stashed it in
there and invented a new memory context to stop leaks into the cache
memory.
Since I'm not overly familiar with these structures, I'm trying to
imagine why you made that choice and the best I can come up with was
that it was the most convenient thing you had to hand inside
CheckCachedPlan().
I don't really have any great ideas right now on how to make this
better. I wonder if GetCachedPlan() should be changed to return some
struct that wraps up the CachedPlan with some sort of executor prep
info struct that we can stash the list of PartitionPruneResults in,
and perhaps something else one day.
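To illustrate the shape I mean, something roughly like this (all of
these names are invented here, nothing the patch defines):

typedef struct CachedPlanExecPrep
{
	/* one PartitionPruneResult per statement in the CachedPlan */
	List	   *part_prune_result_list;

	/* ... room for whatever else we need to stash one day ... */
} CachedPlanExecPrep;

typedef struct CachedPlanHolder
{
	CachedPlan *cplan;				/* the refcounted plan itself */
	CachedPlanExecPrep exec_prep;	/* per-execution prep info */
} CachedPlanHolder;

GetCachedPlan() would then hand back a CachedPlanHolder, leaving the
CachedPlan struct itself untouched.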
Some less important stuff that I think could be done better.
* Are you also able to put meaningful comments on the
PartitionPruneResult struct in execnodes.h?
* In create_append_plan() and create_merge_append_plan() you have the
same code to set the part_prune_index. Why not just move all that code
into make_partition_pruneinfo() and have make_partition_pruneinfo()
return the index and append to the PlannerInfo.partPruneInfos List?
* Why not forboth() here?
i = 0;
foreach(stmtlist_item, portal->stmts)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
PartitionPruneResult *part_prune_result = part_prune_results ?
list_nth(part_prune_results, i) :
NULL;
i++;
* It would be good if ReleaseExecutorLocks() already knew the RTIs
that were locked. Maybe the executor prep info struct I mentioned
above could also store the RTIs that have been locked already and
allow ReleaseExecutorLocks() to just iterate over those to release the
locks.
David
On Thu, Apr 7, 2022 at 9:41 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Thu, 7 Apr 2022 at 20:28, Amit Langote <amitlangote09@gmail.com> wrote:
Here's an updated version. In particular, I removed the
part_prune_results list from PortalData, in favor of having anything
that needs to look at the list instead get it from the CachedPlan
(PortalData.cplan). This makes things better in 2 ways:
Thanks for making those changes.
I'm not overly familiar with the data structures we use for passing
plans around between the planner and executor, but storing the pruning
results in CachedPlan seems pretty bad. I see you've stashed it in
there and invented a new memory context to stop leaks into the cache
memory.
Since I'm not overly familiar with these structures, I'm trying to
imagine why you made that choice and the best I can come up with was
that it was the most convenient thing you had to hand inside
CheckCachedPlan().
Yeah, it's that way because it felt convenient, though I have wondered
if a simpler scheme that doesn't require any changes to the CachedPlan
data structure might be better after all. Your pointing it out has
made me think a bit harder on that.
I don't really have any great ideas right now on how to make this
better. I wonder if GetCachedPlan() should be changed to return some
struct that wraps up the CachedPlan with some sort of executor prep
info struct that we can stash the list of PartitionPruneResults in,
and perhaps something else one day.
I think what might be better to do now is just add an output List
parameter to GetCachedPlan() to add the PartitionPruneResult node to
instead of stashing them into CachedPlan as now. IMHO, we should
leave inventing a new generic struct to the next project that will
make it necessary to return more information from GetCachedPlan() to
its users. I find it hard to convincingly describe what the new
generic struct really is if we invent it *now*, when it's going to
carry a single list whose purpose is pretty narrow.
So, I've implemented this by making the callers of GetCachedPlan()
pass a list to add the PartitionPruneResults that may be produced.
Most callers can put that into the Portal for passing that to other
modules, so I have reinstated PortalData.part_prune_results. As for
its memory management, the list and the PartitionPruneResults therein
will be allocated in a context that holds the Portal itself.
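Concretely, taking ExecuteQuery() as an example caller, the calling
convention in the attached patch looks like this:

	List	   *part_prune_result_list = NIL;

	/* Replan if needed, and increment plan refcount for portal */
	cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL,
						  &part_prune_result_list);
	Assert(list_length(cplan->stmt_list) ==
		   list_length(part_prune_result_list));

	/* Copy the PartitionPruneResults into the portal's own context. */
	PortalStorePartitionPruneResults(portal, part_prune_result_list);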
Some less important stuff that I think could be done better.
* Are you also able to put meaningful comments on the
PartitionPruneResult struct in execnodes.h?
* In create_append_plan() and create_merge_append_plan() you have the
same code to set the part_prune_index. Why not just move all that code
into make_partition_pruneinfo() and have make_partition_pruneinfo()
return the index and append to the PlannerInfo.partPruneInfos List?
That sounds better, so done.
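So make_partition_pruneinfo() now appends the node to
root->partPruneInfos itself and returns its index; the interface is
roughly this (simplified here, see the partprune.h hunk in the
attached patch for the exact parameter list):

/*
 * Returns the index of the PartitionPruneInfo added to
 * root->partPruneInfos, or -1 if run-time pruning is not possible.
 */
extern int make_partition_pruneinfo(PlannerInfo *root,
									RelOptInfo *parentrel,
									List *subpaths,
									List *prunequal);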
* Why not forboth() here?
i = 0;
foreach(stmtlist_item, portal->stmts)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
PartitionPruneResult *part_prune_result = part_prune_results ?
list_nth(part_prune_results, i) :
NULL;
i++;
Because the PartitionPruneResult list may not always be available. To
wit, it's only available when it is GetCachedPlan() that gave the
portal its plan. I know this is a bit ugly, but it seems better than
making every user of Portal build a dummy list, though the ugliness is
not totally avoidable even in the current implementation.
* It would be good if ReleaseExecutorLocks() already knew the RTIs
that were locked. Maybe the executor prep info struct I mentioned
above could also store the RTIs that have been locked already and
allow ReleaseExecutorLocks() to just iterate over those to release the
locks.
Rewrote this such that ReleaseExecutorLocks() just receives a list of
per-PlannedStmt bitmapsets containing the RT indexes of only the
locked entries in that plan.
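Releasing then becomes a simple walk over each statement's bitmapset,
along these lines (a sketch in the patch's style; lockedRelidsList here
stands for that per-PlannedStmt list, whatever it ends up being
called):

	ListCell   *lc1,
			   *lc2;

	forboth(lc1, stmt_list, lc2, lockedRelidsList)
	{
		PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
		Bitmapset   *lockedRelids = (Bitmapset *) lfirst(lc2);
		int			rti = -1;

		while ((rti = bms_next_member(lockedRelids, rti)) >= 0)
		{
			RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);

			/* Every member was locked, so no rtekind check is needed. */
			UnlockRelationOid(rte->relid, rte->rellockmode);
		}
	}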
Attached updated patch with these changes.
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v13-0001-Optimize-AcquireExecutorLocks-to-skip-pruned-par.patch
From 3c0c7f9f5f8bdf89c6afd06e26ba6d5490af9221 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v13] Optimize AcquireExecutorLocks() to skip pruned partitions
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 2 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/prepare.c | 26 ++-
src/backend/executor/README | 27 +++
src/backend/executor/execMain.c | 46 +++++
src/backend/executor/execParallel.c | 28 ++-
src/backend/executor/execPartition.c | 238 ++++++++++++++++++++----
src/backend/executor/execUtils.c | 1 +
src/backend/executor/functions.c | 2 +-
src/backend/executor/nodeAppend.c | 16 +-
src/backend/executor/nodeMergeAppend.c | 9 +-
src/backend/executor/spi.c | 27 ++-
src/backend/nodes/copyfuncs.c | 33 +++-
src/backend/nodes/outfuncs.c | 36 +++-
src/backend/nodes/readfuncs.c | 56 +++++-
src/backend/optimizer/plan/createplan.c | 25 +--
src/backend/optimizer/plan/planner.c | 3 +
src/backend/optimizer/plan/setrefs.c | 104 ++++++++---
src/backend/partitioning/partprune.c | 59 +++++-
src/backend/tcop/postgres.c | 8 +-
src/backend/tcop/pquery.c | 25 ++-
src/backend/utils/cache/plancache.c | 184 +++++++++++++++---
src/backend/utils/mmgr/portalmem.c | 19 ++
src/include/commands/explain.h | 3 +-
src/include/executor/execPartition.h | 12 +-
src/include/executor/execdesc.h | 3 +
src/include/executor/executor.h | 2 +
src/include/nodes/execnodes.h | 30 +++
src/include/nodes/nodes.h | 4 +
src/include/nodes/pathnodes.h | 15 ++
src/include/nodes/plannodes.h | 39 +++-
src/include/partitioning/partprune.h | 8 +-
src/include/utils/plancache.h | 3 +-
src/include/utils/portal.h | 3 +
37 files changed, 942 insertions(+), 167 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 55c38b04c4..d403eb2309 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -542,7 +542,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 9abbb6b555..f6607f2454 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 1e5701b8eb..7ba9852e51 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -407,7 +407,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
@@ -515,7 +515,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, PartitionPruneResult *part_prune_result,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
@@ -563,7 +564,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, part_prune_result, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 1013790dbb..54734a3a93 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -776,7 +776,7 @@ execute_sql_string(const char *sql)
{
QueryDesc *qdesc;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 9ab248d25e..2be1782bc4 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -416,7 +416,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 80738547ed..c7360712b1 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -155,6 +155,7 @@ ExecuteQuery(ParseState *pstate,
PreparedStatement *entry;
CachedPlan *cplan;
List *plan_list;
+ List *part_prune_result_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
Portal portal;
@@ -193,7 +194,10 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
plan_list = cplan->stmt_list;
/*
@@ -207,6 +211,9 @@ ExecuteQuery(ParseState *pstate,
plan_list,
cplan);
+ /* Copy PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_result_list);
+
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
* statement is one that produces tuples. Currently we insist that it be
@@ -576,7 +583,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
- ListCell *p;
+ List *part_prune_result_list;
+ ListCell *p,
+ *pp;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -619,7 +628,10 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Replan if needed, and acquire a transient refcount */
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, queryEnv);
+ CurrentResourceOwner, queryEnv,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -634,13 +646,15 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
plan_list = cplan->stmt_list;
/* Explain each query */
- foreach(p, plan_list)
+ forboth(p, plan_list, pp, part_prune_result_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ PartitionPruneResult *part_prune_result = lfirst_node(PartitionPruneResult, pp);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ ExplainOnePlan(pstmt, part_prune_result, into, es, query_string,
+ paramLI, queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 0b5183fc4a..e0802be723 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -65,6 +65,29 @@ found there. This currently only occurs for Append and MergeAppend nodes. In
this case the non-required subplans are ignored and the executor state's
subnode array will become out of sequence to the plan's subplan list.
+The execution-time pruning described above may in fact occur even before
+execution has started. One case where that occurs is when a cached generic
+plan is being validated for execution by plancache.c: GetCachedPlan(), which
+proceeds by locking all the relations that will be scanned by that plan. If
+the generic plan contains nodes that can perform execution time partition
+pruning (that is, contain a PartitionPruneInfo), the subset of the pruning
+steps contained in the PartitionPruneInfos that do not depend on execution
+actually having started (the "initial" pruning steps) is performed at this point
+to figure out the minimal set of child subplans that satisfy those pruning
+instructions. AcquireExecutorLocks() looking at a particular plan will then
+lock only the relations scanned by those surviving subplans (along with those
+present in PlannedStmt.minLockRelids), and ignore those scanned by the pruned
+subplans, even though the pruned subplans themselves are not removed from the
+plan tree. The result of pruning (that is, the set of indexes of surviving
+subplans in their parent's list of child subplans) is saved as a list of
+bitmapsets, with one element for every PartitionPruneInfo referenced in the
+plan (PlannedStmt.partPruneInfos). The list is packaged into a
+PartitionPruneResult node, which is passed along with the PlannedStmt to the
+executor via the QueryDesc. It is imperative that the executor, and any
+third-party code invoked by it that gets passed the plan tree, look at the
+PartitionPruneResult to determine whether a particular child subplan of a
+parent node that supports pruning is valid for a given execution.
+
Each Plan node may have expression trees associated with it, to represent
its target list, qualification conditions, etc. These trees are also
read-only to the executor, but the executor state for expression evaluation
@@ -286,6 +309,10 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ [ ExecutorDoInitialPruning ] --- an optional step to perform initial
+	  partition pruning on the plan tree, the result of which is passed
+ to the executor via QueryDesc
+
CreateQueryDesc
ExecutorStart
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ef2fd46092..05cc99df8f 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -49,11 +49,13 @@
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/execdebug.h"
+#include "executor/execPartition.h"
#include "executor/nodeSubplan.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/nodeFuncs.h"
#include "parser/parsetree.h"
#include "storage/bufmgr.h"
#include "storage/lmgr.h"
@@ -104,6 +106,47 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
/* end of local decls */
+/* ----------------------------------------------------------------
+ * ExecutorDoInitialPruning
+ *
+ * Performs initial partition pruning to figure out the minimal set of
+ * subplans to be executed and the set of RT indexes of the corresponding
+ * leaf partitions
+ *
+ * The returned PartitionPruneResult must be subsequently passed to the
+ * executor so that it can reuse the result of pruning.  It's important that
+ * the executor has the same view of which partitions are initially pruned (by
+ * not doing the pruning again itself), or otherwise it risks initializing
+ * subplans whose partitions would not have been locked.
+ *
+ * Note: Partitioned tables mentioned in PartitionedRelPruneInfo nodes that
+ * drive the pruning will be locked before doing the pruning.
+ */
+PartitionPruneResult *
+ExecutorDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params)
+{
+ PartitionPruneResult *result;
+ ListCell *lc;
+
+ /* Only get here if there is any pruning to do. */
+ Assert(plannedstmt->containsInitialPruning);
+
+ result = makeNode(PartitionPruneResult);
+ foreach(lc, plannedstmt->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ Bitmapset *valid_subplan_offs;
+
+ valid_subplan_offs =
+ ExecPartitionDoInitialPruning(plannedstmt, params, pruneinfo,
+ &result->scan_leafpart_rtis);
+ result->valid_subplan_offs_list =
+ lappend(result->valid_subplan_offs_list,
+ valid_subplan_offs);
+ }
+
+ return result;
+}
/* ----------------------------------------------------------------
* ExecutorStart
@@ -806,6 +849,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ PartitionPruneResult *part_prune_result = queryDesc->part_prune_result;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -825,6 +869,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable);
estate->es_plannedstmt = plannedstmt;
+ estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ estate->es_part_prune_result = part_prune_result;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 9a0d5d59ef..805f86c503 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -66,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITIONPRUNERESULT UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -182,7 +183,9 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->containsInitialPruning = false;
pstmt->planTree = plan;
+ pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
pstmt->resultRelations = NIL;
pstmt->appendRelations = NIL;
@@ -596,12 +599,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_result_data;
+ char *part_prune_result_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_result_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -630,6 +636,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_result_data = nodeToString(estate->es_part_prune_result);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -656,6 +663,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized PartitionPruneResult. */
+ part_prune_result_len = strlen(part_prune_result_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_result_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -750,6 +762,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized PartitionPruneResult */
+ part_prune_result_space = shm_toc_allocate(pcxt->toc, part_prune_result_len);
+ memcpy(part_prune_result_space, part_prune_result_data, part_prune_result_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITIONPRUNERESULT,
+ part_prune_result_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1231,8 +1249,10 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_result_space;
char *paramspace;
PlannedStmt *pstmt;
+ PartitionPruneResult *part_prune_result;
ParamListInfo paramLI;
char *queryString;
@@ -1243,12 +1263,18 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
pstmtspace = shm_toc_lookup(toc, PARALLEL_KEY_PLANNEDSTMT, false);
pstmt = (PlannedStmt *) stringToNode(pstmtspace);
+ /* Reconstruct leader-supplied PartitionPruneResult. */
+ part_prune_result_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITIONPRUNERESULT, false);
+ part_prune_result = (PartitionPruneResult *)
+ stringToNode(part_prune_result_space);
+
/* Reconstruct ParamListInfo. */
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(¶mspace);
/* Create a QueryDesc for the query. */
- return CreateQueryDesc(pstmt,
+ return CreateQueryDesc(pstmt, part_prune_result,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 615bd80973..3037742b8d 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -25,6 +25,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -185,7 +186,11 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *pruneinfo);
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -198,7 +203,8 @@ static void PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans);
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis);
/*
@@ -1587,8 +1593,10 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* considered to be a stable expression, it can change value from one plan
* node scan to the next during query execution. Stable comparison
* expressions that don't involve such Params allow partition pruning to be
- * done once during executor startup. Expressions that do involve such Params
- * require us to prune separately for each scan of the parent plan node.
+ * done once during executor startup or during ExecutorDoInitialPruning(),
+ * which runs as part of performing AcquireExecutorLocks() on a given plan tree.
+ * Expressions that do involve such Params require us to prune separately for
+ * each scan of the parent plan node.
*
* Note that pruning away unneeded subplans during executor startup has the
* added benefit of not having to initialize the unneeded subplans at all.
@@ -1605,6 +1613,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* account for initial pruning possibly having eliminated some of the
* subplans.
*
+ * ExecPartitionDoInitialPruning:
+ * Do initial pruning with the information contained in a given
+ * PartitionPruneInfo to determine the minimal set of child subplans
+ * of the parent plan node to which the PartitionPruneInfo belongs that
+ * must be executed, along with the set of RT indexes of the leaf
+ * partitions that will be scanned by those subplans.
+ *
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating the expressions
* that are safe to evaluate at a given point. This function is first
@@ -1622,8 +1637,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* On return, *initially_valid_subplans is assigned the set of indexes of
* child subplans that must be initialized along with the parent plan node.
- * Initial pruning is performed here if needed and in that case only the
- * surviving subplans' indexes are added.
+ * Initial pruning is performed here if needed (unless it has already been
+ * done by ExecutorDoInitialPruning()), and in that case only the surviving
+ * subplans' indexes are added.
*
* If subplans are indeed pruned, subplan_map arrays contained in the returned
* PartitionPruneState are re-sequenced to not count those, though only if the
@@ -1632,23 +1648,59 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
PartitionPruneState *
ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
Bitmapset **initially_valid_subplans)
{
- PartitionPruneState *prunestate;
EState *estate = planstate->state;
+ PartitionPruneInfo *pruneinfo = list_nth(estate->es_part_prune_infos,
+ part_prune_index);
+ PartitionPruneResult *pruneresult = estate->es_part_prune_result;
+ PartitionPruneState *prunestate;
+ bool do_pruning = (pruneinfo->needs_init_pruning ||
+ pruneinfo->needs_exec_pruning);
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, planstate);
+ /*
+ * No need to do initial pruning here if it was already done by
+ * ExecutorDoInitialPruning(), which is the case if es_part_prune_result
+ * has been set.
+ */
+ if (pruneresult)
+ do_pruning = pruneinfo->needs_exec_pruning;
+
+ prunestate = NULL;
+ if (do_pruning)
+ {
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
- /* Create the working data structure for pruning */
- prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ /* For data reading, executor always omits detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
+ /*
+ * Create the working data structure for pruning. No need to consider
+ * initial pruning steps if we have a PartitionPruneResult.
+ */
+ prunestate = CreatePartitionPruneState(planstate, pruneinfo,
+ pruneresult == NULL, true,
+ NIL, planstate->ps_ExprContext,
+ estate->es_partition_directory);
+ }
/*
* Perform an initial partition prune pass, if required.
*/
- if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
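+ /*
+ * Use the result of initial pruning done by ExecutorDoInitialPruning(),
+ * if available, rather than performing the initial pruning steps again.
+ */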
+ if (pruneresult)
+ {
+ *initially_valid_subplans =
+ list_nth(pruneresult->valid_subplan_offs_list, part_prune_index);
+ }
+ else if (prunestate && prunestate->do_initial_prune)
+ {
+ *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true,
+ NULL);
+ }
else
{
/* No pruning, so we'll need to initialize all subplans */
@@ -1669,7 +1721,7 @@ ExecInitPartitionPruning(PlanState *planstate,
* leaves invalid data in prunestate, because that data won't be
* consulted again (cf initial Assert in ExecFindMatchingSubPlans).
*/
- if (prunestate->do_exec_prune)
+ if (prunestate && prunestate->do_exec_prune)
PartitionPruneFixSubPlanMap(prunestate,
*initially_valid_subplans,
n_total_subplans);
@@ -1678,11 +1730,72 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecPartitionDoInitialPruning
+ * Perform initial pruning using the given PartitionPruneInfo to determine
+ * the minimal set of child subplans of the parent plan node to which the
+ * PartitionPruneInfo belongs that must be executed, along with the set of
+ * RT indexes of the leaf partitions that will be scanned by those subplans.
+ */
+Bitmapset *
+ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *rtable = plannedstmt->rtable;
+ ExprContext *econtext;
+ PartitionDirectory pdir;
+ MemoryContext oldcontext,
+ tmpcontext;
+ PartitionPruneState *prunestate;
+ Bitmapset *valid_subplan_offs;
+
+ /*
+ * A temporary context to allocate the stuff needed to run the pruning steps.
+ */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "initial pruning working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
+
+ /*
+ * PartitionDirectory to look up partition descriptors, which omits
+ * detached partitions, just like in the executor proper.
+ */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+
+ /*
+ * We don't yet have a PlanState for the parent plan node, so we must
+ * create a standalone ExprContext to evaluate pruning expressions,
+ * equipped with the information about the EXTERN parameters that the
+ * caller passed us. That's okay because the initial pruning steps do
+ * not contain anything that requires execution to have started.
+ */
+ econtext = CreateStandaloneExprContext();
+ econtext->ecxt_param_list_info = params;
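+
+ /*
+ * Only the initial pruning steps are of interest here, so ask
+ * CreatePartitionPruneState() to set up for performing those but not
+ * the exec-time ones.
+ */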
+ prunestate = CreatePartitionPruneState(NULL, pruneinfo, true, false,
+ rtable, econtext, pdir);
+ MemoryContextSwitchTo(oldcontext);
+
+ /* Do the initial pruning. */
+ valid_subplan_offs = ExecFindMatchingSubPlans(prunestate, true,
+ scan_leafpart_rtis);
+
+ FreeExprContext(econtext, true);
+ DestroyPartitionDirectory(pdir);
+ MemoryContextDelete(tmpcontext);
+
+ return valid_subplan_offs;
+}
+
/*
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
*
- * 'planstate' is the parent plan node's execution state.
+ * 'planstate', if not NULL, is the parent plan node's execution state. It
+ * can be NULL if being called before ExecutorStart(), in which case,
+ * 'rtable' (range table), 'econtext', and 'partdir' must be explicitly
+ * provided.
*
* 'pruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
@@ -1696,19 +1809,21 @@ ExecInitPartitionPruning(PlanState *planstate,
* PartitionedRelPruneInfo.
*/
static PartitionPruneState *
-CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
+CreatePartitionPruneState(PlanState *planstate,
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir)
{
- EState *estate = planstate->state;
+ EState *estate = planstate ? planstate->state : NULL;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
- /* For data reading, executor always omits detached partitions */
- if (estate->es_partition_directory == NULL)
- estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
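+ /* Without an EState, the caller must have provided these explicitly. */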
+ Assert((estate != NULL) ||
+ (partdir != NULL && econtext != NULL && rtable != NIL));
n_part_hierarchies = list_length(pruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
@@ -1759,19 +1874,48 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
PartitionedRelPruneInfo *pinfo = lfirst_node(PartitionedRelPruneInfo, lc2);
PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
Relation partrel;
+ bool close_partrel = false;
PartitionDesc partdesc;
PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * Must open the relation ourselves when called before execution
+ * has started, such as during ExecutorDoInitialPruning() on a
+ * cached plan. In that case, sub-partitions must be locked,
+ * because AcquirePlannerLocks() would not have seen them. (The
+ * 1st relation in a partrelpruneinfos list is always the root
+ * partitioned table appearing in the query, which
+ * AcquirePlannerLocks() will have locked already; the Assert in
+ * relation_open() guards that assumption.)
+ */
+ if (estate == NULL)
+ {
+ RangeTblEntry *rte = rt_fetch(pinfo->rtindex, rtable);
+ int lockmode = (j == 0) ? NoLock : rte->rellockmode;
+
+ partrel = table_open(rte->relid, lockmode);
+ close_partrel = true;
+ }
+ else
+ partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+
+ /*
+ * We can rely on the copy of the partitioned table's partition
+ * key in its relcache entry, because it can't change (or get
+ * destroyed) as long as the relation is locked. The partition
+ * descriptor is taken from the PartitionDirectory, which keeps
+ * the table open long enough for the descriptor to remain valid
+ * while it's used to perform the pruning steps.
*/
- partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
partkey = RelationGetPartitionKey(partrel);
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partdesc = PartitionDirectoryLookup(partdir, partrel);
+
+ /*
+ * Must close partrel, keeping the lock taken, if we're not using
+ * EState's entry.
+ */
+ if (close_partrel)
+ table_close(partrel, NoLock);
/*
* Initialize the subplan_map and subpart_map.
@@ -1785,6 +1929,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
Assert(partdesc->nparts >= pinfo->nparts);
pprune->nparts = partdesc->nparts;
pprune->subplan_map = palloc(sizeof(int) * partdesc->nparts);
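+ /*
+ * rti_map[i] will hold the RT index of the partition at offset i in
+ * the partition descriptor, or 0 if that partition is not in the plan.
+ */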
+ pprune->rti_map = palloc(sizeof(Index) * partdesc->nparts);
if (partdesc->nparts == pinfo->nparts)
{
/*
@@ -1795,6 +1940,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pprune->subpart_map = pinfo->subpart_map;
memcpy(pprune->subplan_map, pinfo->subplan_map,
sizeof(int) * pinfo->nparts);
+ memcpy(pprune->rti_map, pinfo->rti_map,
+ sizeof(int) * pinfo->nparts);
/*
* Double-check that the list of unpruned relations has not
@@ -1845,6 +1992,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pinfo->subplan_map[pd_idx];
pprune->subpart_map[pp_idx] =
pinfo->subpart_map[pd_idx];
+ pprune->rti_map[pp_idx] =
+ pinfo->rti_map[pd_idx];
pd_idx++;
}
else
@@ -1852,6 +2001,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
/* this partdesc entry is not in the plan */
pprune->subplan_map[pp_idx] = -1;
pprune->subpart_map[pp_idx] = -1;
+ pprune->rti_map[pp_idx] = 0;
}
}
@@ -1873,7 +2023,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* Initialize pruning contexts as needed.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
- if (pinfo->initial_pruning_steps)
+ if (consider_initial_steps && pinfo->initial_pruning_steps)
{
InitPartitionPruneContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
@@ -1883,7 +2033,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps)
+ if (consider_exec_steps && pinfo->exec_pruning_steps)
{
InitPartitionPruneContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
@@ -2111,10 +2261,14 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
* runtime pruning.
+ *
+ * RT indexes of leaf partitions scanned by the chosen subplans are added to
+ * *scan_leafpart_rtis if the pointer is non-NULL.
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2149,7 +2303,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
*/
pprune = &prunedata->partrelprunedata[0];
find_matching_subplans_recurse(prunedata, pprune, initial_prune,
- &result);
+ &result, scan_leafpart_rtis);
/* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
@@ -2163,6 +2317,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_copy(*scan_leafpart_rtis);
MemoryContextReset(prunestate->prune_context);
@@ -2173,13 +2329,15 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* find_matching_subplans_recurse
* Recursive worker function for ExecFindMatchingSubPlans
*
- * Adds valid (non-prunable) subplan IDs to *validsubplans
+ * Adds valid (non-prunable) subplan IDs to *validsubplans and the RT indexes
+ * of the corresponding leaf partitions to *scan_leafpart_rtis (if asked for).
*/
static void
find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans)
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *partset;
int i;
@@ -2206,8 +2364,13 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
while ((i = bms_next_member(partset, i)) >= 0)
{
if (pprune->subplan_map[i] >= 0)
+ {
*validsubplans = bms_add_member(*validsubplans,
pprune->subplan_map[i]);
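+ /* Also record the leaf partition's RT index, if the caller wants it. */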
+ if (scan_leafpart_rtis && pprune->rti_map[i] > 0)
+ *scan_leafpart_rtis = bms_add_member(*scan_leafpart_rtis,
+ pprune->rti_map[i]);
+ }
else
{
int partidx = pprune->subpart_map[i];
@@ -2215,7 +2378,8 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
if (partidx >= 0)
find_matching_subplans_recurse(prunedata,
&prunedata->partrelprunedata[partidx],
- initial_prune, validsubplans);
+ initial_prune, validsubplans,
+ scan_leafpart_rtis);
else
{
/*
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9df1f81ea8..639145abe9 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -119,6 +119,7 @@ CreateExecutorState(void)
estate->es_relations = NULL;
estate->es_rowmarks = NULL;
estate->es_plannedstmt = NULL;
+ estate->es_part_prune_result = NULL;
estate->es_junkFilter = NULL;
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index f9460ae506..a2182a6b1f 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -844,7 +844,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
- es->qd = CreateQueryDesc(es->stmt,
+ es->qd = CreateQueryDesc(es->stmt, NULL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 357e10a1d7..09f26658e2 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -94,6 +94,7 @@ static bool ExecAppendAsyncRequest(AppendState *node, TupleTableSlot **result);
static void ExecAppendAsyncEventWait(AppendState *node);
static void classify_matching_subplans(AppendState *node);
+
/* ----------------------------------------------------------------
* ExecInitAppend
*
@@ -134,7 +135,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_begun = false;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -145,7 +146,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&appendstate->ps,
list_length(node->appendplans),
- node->part_prune_info,
+ node->part_prune_index,
&validsubplans);
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
@@ -155,7 +156,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
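+ /*
+ * Note that as_prune_state may be NULL if ExecInitPartitionPruning()
+ * found that no pruning remains to be done in this node.
+ */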
+ if (appendstate->as_prune_state == NULL ||
+ (!appendstate->as_prune_state->do_exec_prune && nplans > 0))
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -577,7 +579,7 @@ choose_next_subplan_locally(AppendState *node)
}
else if (node->as_valid_subplans == NULL)
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
whichplan = -1;
}
@@ -642,7 +644,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
/*
* Mark each invalid plan as finished to allow the loop below to
@@ -717,7 +719,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
mark_invalid_subplans_as_finished(node);
}
@@ -868,7 +870,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
classify_matching_subplans(node);
}
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index ecf9052e03..7708cfffda 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -82,7 +82,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.ExecProcNode = ExecMergeAppend;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -93,7 +93,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&mergestate->ps,
list_length(node->mergeplans),
- node->part_prune_info,
+ node->part_prune_index,
&validsubplans);
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
@@ -103,7 +103,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
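+ /*
+ * Note that ms_prune_state may be NULL if ExecInitPartitionPruning()
+ * found that no pruning remains to be done in this node.
+ */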
+ if (mergestate->ms_prune_state == NULL ||
+ (!mergestate->ms_prune_state->do_exec_prune && nplans > 0))
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -218,7 +219,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 042a5f8b0a..729e2fd7b2 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1578,6 +1578,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
CachedPlanSource *plansource;
CachedPlan *cplan;
List *stmt_list;
+ List *part_prune_result_list;
char *query_string;
Snapshot snapshot;
MemoryContext oldcontext;
@@ -1657,7 +1658,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1685,6 +1689,9 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
stmt_list,
cplan);
+ /* Copy PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_result_list);
+
/*
* Set up options for portal. Default SCROLL type is chosen the same way
* as PerformCursorOpen does it.
@@ -2092,7 +2099,8 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
/* Get the generic plan for the query */
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ NULL /* Not interested in PartitionPruneResults */);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2473,7 +2481,9 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
- ListCell *lc2;
+ List *part_prune_result_list;
+ ListCell *lc2,
+ *lc3;
spicallbackarg.query = plansource->query_string;
@@ -2549,8 +2559,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* plan, the refcount must be backed by the plan_owner.
*/
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
-
+ plan_owner, _SPI_current->queryEnv,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
stmt_list = cplan->stmt_list;
/*
@@ -2589,9 +2601,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
- foreach(lc2, stmt_list)
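+ /*
+ * Walk the PlannedStmts and their PartitionPruneResults (elements of
+ * the latter list may be NULL) in lockstep.
+ */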
+ forboth(lc2, stmt_list, lc3, part_prune_result_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ PartitionPruneResult *part_prune_result = lfirst_node(PartitionPruneResult, lc3);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2663,7 +2676,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
else
snap = InvalidSnapshot;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, part_prune_result,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 46a1943d97..9642e74ef1 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -96,7 +96,10 @@ _copyPlannedStmt(const PlannedStmt *from)
COPY_SCALAR_FIELD(parallelModeNeeded);
COPY_SCALAR_FIELD(jitFlags);
COPY_NODE_FIELD(planTree);
+ COPY_NODE_FIELD(partPruneInfos);
+ COPY_SCALAR_FIELD(containsInitialPruning);
COPY_NODE_FIELD(rtable);
+ COPY_BITMAPSET_FIELD(minLockRelids);
COPY_NODE_FIELD(resultRelations);
COPY_NODE_FIELD(appendRelations);
COPY_NODE_FIELD(subplans);
@@ -253,7 +256,7 @@ _copyAppend(const Append *from)
COPY_NODE_FIELD(appendplans);
COPY_SCALAR_FIELD(nasyncplans);
COPY_SCALAR_FIELD(first_partial_plan);
- COPY_NODE_FIELD(part_prune_info);
+ COPY_SCALAR_FIELD(part_prune_index);
return newnode;
}
@@ -281,7 +284,7 @@ _copyMergeAppend(const MergeAppend *from)
COPY_POINTER_FIELD(sortOperators, from->numCols * sizeof(Oid));
COPY_POINTER_FIELD(collations, from->numCols * sizeof(Oid));
COPY_POINTER_FIELD(nullsFirst, from->numCols * sizeof(bool));
- COPY_NODE_FIELD(part_prune_info);
+ COPY_SCALAR_FIELD(part_prune_index);
return newnode;
}
@@ -1280,6 +1283,8 @@ _copyPartitionPruneInfo(const PartitionPruneInfo *from)
PartitionPruneInfo *newnode = makeNode(PartitionPruneInfo);
COPY_NODE_FIELD(prune_infos);
+ COPY_SCALAR_FIELD(needs_init_pruning);
+ COPY_SCALAR_FIELD(needs_exec_pruning);
COPY_BITMAPSET_FIELD(other_subplans);
return newnode;
@@ -1296,6 +1301,7 @@ _copyPartitionedRelPruneInfo(const PartitionedRelPruneInfo *from)
COPY_POINTER_FIELD(subplan_map, from->nparts * sizeof(int));
COPY_POINTER_FIELD(subpart_map, from->nparts * sizeof(int));
COPY_POINTER_FIELD(relid_map, from->nparts * sizeof(Oid));
+ COPY_POINTER_FIELD(rti_map, from->nparts * sizeof(Index));
COPY_NODE_FIELD(initial_pruning_steps);
COPY_NODE_FIELD(exec_pruning_steps);
COPY_BITMAPSET_FIELD(execparamids);
@@ -5469,6 +5475,21 @@ _copyExtensibleNode(const ExtensibleNode *from)
return newnode;
}
+/* ****************************************************************
+ * execnodes.h copy functions
+ * ****************************************************************
+ */
+static PartitionPruneResult *
+_copyPartitionPruneResult(const PartitionPruneResult *from)
+{
+ PartitionPruneResult *newnode = makeNode(PartitionPruneResult);
+
+ COPY_NODE_FIELD(valid_subplan_offs_list);
+ COPY_BITMAPSET_FIELD(scan_leafpart_rtis);
+
+ return newnode;
+}
+
/* ****************************************************************
* value.h copy functions
* ****************************************************************
@@ -5523,7 +5544,6 @@ _copyBitString(const BitString *from)
return newnode;
}
-
static ForeignKeyCacheInfo *
_copyForeignKeyCacheInfo(const ForeignKeyCacheInfo *from)
{
@@ -6565,6 +6585,13 @@ copyObjectImpl(const void *from)
retval = _copyPublicationTable(from);
break;
+ /*
+ * EXECUTION NODES
+ */
+ case T_PartitionPruneResult:
+ retval = _copyPartitionPruneResult(from);
+ break;
+
/*
* MISCELLANEOUS NODES
*/
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 13e1643530..0cbcbc8ed4 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -314,7 +314,10 @@ _outPlannedStmt(StringInfo str, const PlannedStmt *node)
WRITE_BOOL_FIELD(parallelModeNeeded);
WRITE_INT_FIELD(jitFlags);
WRITE_NODE_FIELD(planTree);
+ WRITE_NODE_FIELD(partPruneInfos);
+ WRITE_BOOL_FIELD(containsInitialPruning);
WRITE_NODE_FIELD(rtable);
+ WRITE_BITMAPSET_FIELD(minLockRelids);
WRITE_NODE_FIELD(resultRelations);
WRITE_NODE_FIELD(appendRelations);
WRITE_NODE_FIELD(subplans);
@@ -443,7 +446,7 @@ _outAppend(StringInfo str, const Append *node)
WRITE_NODE_FIELD(appendplans);
WRITE_INT_FIELD(nasyncplans);
WRITE_INT_FIELD(first_partial_plan);
- WRITE_NODE_FIELD(part_prune_info);
+ WRITE_INT_FIELD(part_prune_index);
}
static void
@@ -460,7 +463,7 @@ _outMergeAppend(StringInfo str, const MergeAppend *node)
WRITE_OID_ARRAY(sortOperators, node->numCols);
WRITE_OID_ARRAY(collations, node->numCols);
WRITE_BOOL_ARRAY(nullsFirst, node->numCols);
- WRITE_NODE_FIELD(part_prune_info);
+ WRITE_INT_FIELD(part_prune_index);
}
static void
@@ -1006,6 +1009,8 @@ _outPartitionPruneInfo(StringInfo str, const PartitionPruneInfo *node)
WRITE_NODE_TYPE("PARTITIONPRUNEINFO");
WRITE_NODE_FIELD(prune_infos);
+ WRITE_BOOL_FIELD(needs_init_pruning);
+ WRITE_BOOL_FIELD(needs_exec_pruning);
WRITE_BITMAPSET_FIELD(other_subplans);
}
@@ -1020,6 +1025,7 @@ _outPartitionedRelPruneInfo(StringInfo str, const PartitionedRelPruneInfo *node)
WRITE_INT_ARRAY(subplan_map, node->nparts);
WRITE_INT_ARRAY(subpart_map, node->nparts);
WRITE_OID_ARRAY(relid_map, node->nparts);
+ WRITE_INDEX_ARRAY(rti_map, node->nparts);
WRITE_NODE_FIELD(initial_pruning_steps);
WRITE_NODE_FIELD(exec_pruning_steps);
WRITE_BITMAPSET_FIELD(execparamids);
@@ -2420,6 +2426,9 @@ _outPlannerGlobal(StringInfo str, const PlannerGlobal *node)
WRITE_NODE_FIELD(finalrowmarks);
WRITE_NODE_FIELD(resultRelations);
WRITE_NODE_FIELD(appendRelations);
+ WRITE_NODE_FIELD(partPruneInfos);
+ WRITE_BOOL_FIELD(containsInitialPruning);
+ WRITE_BITMAPSET_FIELD(minLockRelids);
WRITE_NODE_FIELD(relationOids);
WRITE_NODE_FIELD(invalItems);
WRITE_NODE_FIELD(paramExecTypes);
@@ -2487,6 +2496,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_BITMAPSET_FIELD(curOuterRels);
WRITE_NODE_FIELD(curOuterParams);
WRITE_BOOL_FIELD(partColsUpdated);
+ WRITE_NODE_FIELD(partPruneInfos);
}
static void
@@ -2840,6 +2850,21 @@ _outExtensibleNode(StringInfo str, const ExtensibleNode *node)
methods->nodeOut(str, node);
}
+/*****************************************************************************
+ *
+ * Stuff from execnodes.h
+ *
+ *****************************************************************************/
+
+static void
+_outPartitionPruneResult(StringInfo str, const PartitionPruneResult *node)
+{
+ WRITE_NODE_TYPE("PARTITIONPRUNERESULT");
+
+ WRITE_NODE_FIELD(valid_subplan_offs_list);
+ WRITE_BITMAPSET_FIELD(scan_leafpart_rtis);
+}
+
/*****************************************************************************
*
* Stuff from parsenodes.h.
@@ -4748,6 +4773,13 @@ outNode(StringInfo str, const void *obj)
_outJsonTableSibling(str, obj);
break;
+ /*
+ * EXECUTION NODES
+ */
+ case T_PartitionPruneResult:
+ _outPartitionPruneResult(str, obj);
+ break;
+
default:
/*
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 48f7216c9e..25e1df7068 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -164,6 +164,11 @@
token = pg_strtok(&length); /* skip :fldname */ \
local_node->fldname = readIntCols(len)
+/* Read an Index array */
+#define READ_INDEX_ARRAY(fldname, len) \
+ token = pg_strtok(&length); /* skip :fldname */ \
+ local_node->fldname = readIndexCols(len)
+
/* Read a bool array */
#define READ_BOOL_ARRAY(fldname, len) \
token = pg_strtok(&length); /* skip :fldname */ \
@@ -1814,7 +1819,10 @@ _readPlannedStmt(void)
READ_BOOL_FIELD(parallelModeNeeded);
READ_INT_FIELD(jitFlags);
READ_NODE_FIELD(planTree);
+ READ_NODE_FIELD(partPruneInfos);
+ READ_BOOL_FIELD(containsInitialPruning);
READ_NODE_FIELD(rtable);
+ READ_BITMAPSET_FIELD(minLockRelids);
READ_NODE_FIELD(resultRelations);
READ_NODE_FIELD(appendRelations);
READ_NODE_FIELD(subplans);
@@ -1946,7 +1954,7 @@ _readAppend(void)
READ_NODE_FIELD(appendplans);
READ_INT_FIELD(nasyncplans);
READ_INT_FIELD(first_partial_plan);
- READ_NODE_FIELD(part_prune_info);
+ READ_INT_FIELD(part_prune_index);
READ_DONE();
}
@@ -1968,7 +1976,7 @@ _readMergeAppend(void)
READ_OID_ARRAY(sortOperators, local_node->numCols);
READ_OID_ARRAY(collations, local_node->numCols);
READ_BOOL_ARRAY(nullsFirst, local_node->numCols);
- READ_NODE_FIELD(part_prune_info);
+ READ_INT_FIELD(part_prune_index);
READ_DONE();
}
@@ -2763,6 +2771,8 @@ _readPartitionPruneInfo(void)
READ_LOCALS(PartitionPruneInfo);
READ_NODE_FIELD(prune_infos);
+ READ_BOOL_FIELD(needs_init_pruning);
+ READ_BOOL_FIELD(needs_exec_pruning);
READ_BITMAPSET_FIELD(other_subplans);
READ_DONE();
@@ -2779,6 +2789,7 @@ _readPartitionedRelPruneInfo(void)
READ_INT_ARRAY(subplan_map, local_node->nparts);
READ_INT_ARRAY(subpart_map, local_node->nparts);
READ_OID_ARRAY(relid_map, local_node->nparts);
+ READ_INDEX_ARRAY(rti_map, local_node->nparts);
READ_NODE_FIELD(initial_pruning_steps);
READ_NODE_FIELD(exec_pruning_steps);
READ_BITMAPSET_FIELD(execparamids);
@@ -2932,6 +2943,21 @@ _readPartitionRangeDatum(void)
READ_DONE();
}
+
+/*
+ * _readPartitionPruneResult
+ */
+static PartitionPruneResult *
+_readPartitionPruneResult(void)
+{
+ READ_LOCALS(PartitionPruneResult);
+
+ READ_NODE_FIELD(valid_subplan_offs_list);
+ READ_BITMAPSET_FIELD(scan_leafpart_rtis);
+
+ READ_DONE();
+}
+
/*
* parseNodeString
*
@@ -3229,6 +3255,8 @@ parseNodeString(void)
return_value = _readJsonTableParent();
else if (MATCH("JSONTABSNODE", 12))
return_value = _readJsonTableSibling();
+ else if (MATCH("PARTITIONPRUNERESULT", 20))
+ return_value = _readPartitionPruneResult();
else
{
elog(ERROR, "badly formatted node string \"%.32s\"...", token);
@@ -3372,6 +3400,30 @@ readIntCols(int numCols)
return int_vals;
}
+/*
+ * readIndexCols
+ */
+Index *
+readIndexCols(int numCols)
+{
+ int tokenLength,
+ i;
+ const char *token;
+ Index *index_vals;
+
+ if (numCols <= 0)
+ return NULL;
+
+ index_vals = (Index *) palloc(numCols * sizeof(Index));
+ for (i = 0; i < numCols; i++)
+ {
+ token = pg_strtok(&tokenLength);
+ index_vals[i] = atoui(token);
+ }
+
+ return index_vals;
+}
+
/*
* readBoolCols
*/
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 51591bb812..e7f977fb96 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1183,7 +1183,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
ListCell *subpaths;
int nasyncplans = 0;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
+ int part_prune_index = -1;
int nodenumsortkeys = 0;
AttrNumber *nodeSortColIdx = NULL;
Oid *nodeSortOperators = NULL;
@@ -1357,16 +1357,17 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
}
if (prunequal != NIL)
- partpruneinfo =
- make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
plan->appendplans = subplans;
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
- plan->part_prune_info = partpruneinfo;
+
+ /* Will be updated later in set_plan_references(). */
+ plan->part_prune_index = part_prune_index;
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -1406,7 +1407,7 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
List *subplans = NIL;
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
+ int part_prune_index = -1;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1522,13 +1523,15 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
}
if (prunequal != NIL)
- partpruneinfo = make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
node->mergeplans = subplans;
- node->part_prune_info = partpruneinfo;
+
+ /* Will be updated later in set_plan_references(). */
+ node->part_prune_index = part_prune_index;
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index b2569c5d0c..2aa051d862 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -518,7 +518,10 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->dependsOnRole = glob->dependsOnRole;
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
+ result->partPruneInfos = glob->partPruneInfos;
+ result->containsInitialPruning = glob->containsInitialPruning;
result->rtable = glob->finalrtable;
+ result->minLockRelids = glob->minLockRelids;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 7519723081..fc66986e1c 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -251,7 +251,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
Plan *result;
PlannerGlobal *glob = root->glob;
int rtoffset = list_length(glob->finalrtable);
- ListCell *lc;
+ ListCell *lc;
/*
* Add all the query's RTEs to the flattened rangetable. The live ones
@@ -260,6 +260,16 @@ set_plan_references(PlannerInfo *root, Plan *plan)
*/
add_rtes_to_flat_rtable(root, false);
+ /*
+ * Add the query's adjusted range of RT indexes to glob->minLockRelids.
+ * The adjusted RT indexes of prunable relations will be deleted from the
+ * set below where PartitionPruneInfos are processed.
+ */
+ glob->minLockRelids =
+ bms_add_range(glob->minLockRelids,
+ rtoffset + 1,
+ rtoffset + list_length(root->parse->rtable));
+
/*
* Adjust RT indexes of PlanRowMarks and add to final rowmarks list
*/
@@ -338,6 +348,56 @@ set_plan_references(PlannerInfo *root, Plan *plan)
}
}
+ /* Also fix up the information in PartitionPruneInfos. */
+ foreach(lc, root->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ Bitmapset *leafpart_rtis = NULL;
+ ListCell *l;
+
+ foreach(l, pruneinfo->prune_infos)
+ {
+ List *prune_infos = lfirst(l);
+ ListCell *l2;
+
+ foreach(l2, prune_infos)
+ {
+ PartitionedRelPruneInfo *pinfo = lfirst(l2);
+ int i;
+
+ /* RT index of the partitioned table. */
+ pinfo->rtindex += rtoffset;
+
+ /* And also those of the leaf partitions. */
+ for (i = 0; i < pinfo->nparts; i++)
+ {
+ if (pinfo->rti_map[i] > 0)
+ {
+ pinfo->rti_map[i] += rtoffset;
+ leafpart_rtis = bms_add_member(leafpart_rtis,
+ pinfo->rti_map[i]);
+ }
+ }
+ }
+ }
+
+ if (pruneinfo->needs_init_pruning)
+ {
+ glob->containsInitialPruning = true;
+
+ /*
+ * Delete the leaf partition RTIs from the global set of relations
+ * to be locked before executing the plan. AcquireExecutorLocks()
+ * will find the ones to add to the set after performing initial
+ * pruning.
+ */
+ glob->minLockRelids = bms_del_members(glob->minLockRelids,
+ leafpart_rtis);
+ }
+
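+ /*
+ * Add it to the global list; the part_prune_index of the plan's
+ * Append and MergeAppend nodes is an offset into this list.
+ */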
+ glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
+ }
+
return result;
}
@@ -1610,21 +1670,12 @@ set_append_references(PlannerInfo *root,
aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
- if (aplan->part_prune_info)
- {
- foreach(l, aplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * PartitionPruneInfos will be added to a list in PlannerGlobal, so update
+ * the index.
+ */
+ if (aplan->part_prune_index >= 0)
+ aplan->part_prune_index += list_length(root->glob->partPruneInfos);
/* We don't need to recurse to lefttree or righttree ... */
Assert(aplan->plan.lefttree == NULL);
@@ -1682,21 +1733,12 @@ set_mergeappend_references(PlannerInfo *root,
mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
- if (mplan->part_prune_info)
- {
- foreach(l, mplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * PartitionPruneInfos will be added to a list in PlannerGlobal, so update
+ * the index.
+ */
+ if (mplan->part_prune_index >= 0)
+ mplan->part_prune_index += list_length(root->glob->partPruneInfos);
/* We don't need to recurse to lefttree or righttree ... */
Assert(mplan->plan.lefttree == NULL);
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 9d3c05aed3..5a5f5dee46 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,9 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -209,16 +211,20 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
/*
* make_partition_pruneinfo
- * Builds a PartitionPruneInfo which can be used in the executor to allow
- * additional partition pruning to take place. Returns NULL when
- * partition pruning would be useless.
+ * Checks if the given set of quals can be used to build pruning steps
+ * that the executor can use to prune useless ones from the given set of
+ * child paths, and if so, builds a PartitionPruneInfo that allows the
+ * executor to do so and appends it to root->partPruneInfos.
+ *
+ * The return value is the 0-based index of the added PartitionPruneInfo,
+ * or -1 if none was built after all.
*
* 'parentrel' is the RelOptInfo for an appendrel, and 'subpaths' is the list
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
*/
-PartitionPruneInfo *
+int
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
List *prunequal)
@@ -230,6 +236,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool needs_init_pruning = false;
+ bool needs_exec_pruning = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -309,12 +317,16 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_needs_init_pruning;
+ bool partrel_needs_exec_pruning;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_needs_init_pruning,
+ &partrel_needs_exec_pruning);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -323,6 +335,10 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+ if (!needs_init_pruning)
+ needs_init_pruning = partrel_needs_init_pruning;
+ if (!needs_exec_pruning)
+ needs_exec_pruning = partrel_needs_exec_pruning;
}
pfree(relid_subplan_map);
@@ -332,11 +348,13 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* quals, then we can just not bother with run-time pruning.
*/
if (prunerelinfos == NIL)
- return NULL;
+ return -1;
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->needs_init_pruning = needs_init_pruning;
+ pruneinfo->needs_exec_pruning = needs_exec_pruning;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -358,7 +376,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
else
pruneinfo->other_subplans = NULL;
- return pruneinfo;
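+ /* Record the PartitionPruneInfo and return its index in the list. */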
+ root->partPruneInfos = lappend(root->partPruneInfos, pruneinfo);
+
+ return list_length(root->partPruneInfos) - 1;
}
/*
@@ -435,13 +455,18 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *needs_init_pruning and *needs_exec_pruning are set to indicate whether
+ * the returned PartitionedRelPruneInfos contain pruning steps that can be
+ * performed before and after execution begins, respectively.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -452,6 +477,10 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will find out below. */
+ *needs_init_pruning = false;
+ *needs_exec_pruning = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -539,6 +568,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+ * In the first pass, we also determine whether the 2nd pass will be
+ * necessary by checking for the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -613,6 +645,11 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ if (!*needs_init_pruning)
+ *needs_init_pruning = (initial_pruning_steps != NIL);
+ if (!*needs_exec_pruning)
+ *needs_exec_pruning = (exec_pruning_steps != NIL);
+
pinfolist = lappend(pinfolist, pinfo);
}
@@ -640,6 +677,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *subplan_map;
int *subpart_map;
Oid *relid_map;
+ Index *rti_map;
/*
* Construct the subplan and subpart maps for this partitioning level.
@@ -652,6 +690,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
+ rti_map = (Index *) palloc0(nparts * sizeof(Index));
present_parts = NULL;
i = -1;
@@ -666,6 +705,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
+ rti_map[i] = partrel->relid;
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
@@ -690,6 +730,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
pinfo->relid_map = relid_map;
+ pinfo->rti_map = rti_map;
}
pfree(relid_subpart_map);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 95dc2e2c83..8dc52a158f 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1603,6 +1603,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ List *part_prune_result_list;
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -1978,7 +1979,9 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
/*
* Now we can define the portal.
@@ -1993,6 +1996,9 @@ exec_bind_message(StringInfo input_message)
cplan->stmt_list,
cplan);
+ /* Copy PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_result_list);
+
/* Done with the snapshot used for parameter I/O and parsing/planning */
if (snapshot_set)
PopActiveSnapshot();
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5aa5a350f3..a627448a5a 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -35,7 +35,7 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
+static void ProcessQuery(PlannedStmt *plan, PartitionPruneResult *part_prune_result,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +65,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +78,8 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->part_prune_result = part_prune_result; /* ExecutorDoInitialPruning()
+ * output for plan */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +125,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * part_prune_result: ExecutorDoInitialPruning() output for the plan tree
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +138,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +150,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, part_prune_result, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -491,8 +496,13 @@ PortalStart(Portal portal, ParamListInfo params,
/*
* Create QueryDesc in portal's context; for the moment, set
* the destination to DestNone.
+ *
+ * There is no PartitionPruneResult unless the PlannedStmt is
+ * from a CachedPlan.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->part_prune_results == NIL ? NULL :
+ linitial(portal->part_prune_results),
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1194,6 +1204,7 @@ PortalRunMulti(Portal portal,
{
bool active_snapshot_set = false;
ListCell *stmtlist_item;
+ int i;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1214,9 +1225,15 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
+ i = 0;
foreach(stmtlist_item, portal->stmts)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ PartitionPruneResult *part_prune_result = portal->part_prune_results ?
+ list_nth(portal->part_prune_results, i) :
+ NULL;
+
+ i++;
/*
* If we got a cancel signal in prior command, quit
@@ -1274,7 +1291,7 @@ PortalRunMulti(Portal portal,
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_result,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1283,7 +1300,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_result,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 4cf6db504f..6cb473f2f4 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -99,14 +99,19 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_result_list);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv);
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_result_list);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_result_list,
+ List **lockedRelids_per_stmt);
+static void ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -790,15 +795,20 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* On a "true" return, we have acquired the locks needed to run the plan.
* (We must do this for the "true" result to be race-condition-free.)
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_result_list.
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_result_list)
{
CachedPlan *plan = plansource->gplan;
/* Assert that caller checked the querytree */
Assert(plansource->is_valid);
+ *part_prune_result_list = NIL;
+
/* If there's no generic plan, just say "false" */
if (!plan)
return false;
@@ -820,13 +830,21 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ List *lockedRelids_per_stmt;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ /*
+ * Lock the relations scanned by the plan. This is also where the
+ * initial pruning happens, if needed.
+ */
+ AcquireExecutorLocks(plan->stmt_list, boundParams,
+ part_prune_result_list,
+ &lockedRelids_per_stmt);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -848,7 +866,14 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ ReleaseExecutorLocks(plan->stmt_list, lockedRelids_per_stmt);
+
+ /*
+ * The output list and any objects therein were allocated in the
+ * caller's hopefully short-lived context, so they will not remain
+ * leaked for long; still, reset the pointer so that the stale list
+ * cannot accidentally be looked at.
+ */
+ *part_prune_result_list = NIL;
}
/*
@@ -874,10 +899,15 @@ CheckCachedPlan(CachedPlanSource *plansource)
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
* (unless this is a one-shot plan, in which case we don't copy the plan).
+ *
+ * A list of NULLs is returned in *part_prune_result_list, meaning that no
+ * PartitionPruneResult nodes have yet been created for the plans in
+ * stmt_list.
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv)
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_result_list)
{
CachedPlan *plan;
List *plist;
@@ -1007,6 +1037,17 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
MemoryContextSwitchTo(oldcxt);
+ /*
+ * No actual PartitionPruneResults to add yet, but we must initialize
+ * the list to have the same number of elements as the list of
+ * PlannedStmts.
+ */
+ *part_prune_result_list = NIL;
+ foreach(lc, plist)
+ {
+ *part_prune_result_list = lappend(*part_prune_result_list, NULL);
+ }
+
return plan;
}
@@ -1126,6 +1167,17 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
+ * If part_prune_result_list is non-NULL, an element that is either a
+ * PartitionPruneResult or NULL is added to *part_prune_result_list for
+ * every PlannedStmt found in the returned CachedPlan. It is the former
+ * if the PlannedStmt comes from an existing, otherwise valid CachedPlan
+ * that contains at least one PartitionPruneInfo with "initial" pruning
+ * steps. Those steps are performed by calling ExecutorDoInitialPruning(),
+ * which prunes away subplans that don't match the pruning conditions, so
+ * that AcquireExecutorLocks() needs to lock only the leaf partitions that
+ * remain. The PartitionPruneResult contains a list of bitmapsets of the
+ * indexes of matching subplans, one for each PartitionPruneInfo.
+ *
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
@@ -1139,11 +1191,13 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ List **part_prune_result_list)
{
CachedPlan *plan = NULL;
List *qlist;
bool customplan;
+ List *my_part_prune_result_list;
/* Assert caller is doing things in a sane order */
Assert(plansource->magic == CACHEDPLANSOURCE_MAGIC);
@@ -1160,7 +1214,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (CheckCachedPlan(plansource, boundParams,
+ &my_part_prune_result_list))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1169,7 +1224,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
else
{
/* Build a new generic plan */
- plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv,
+ &my_part_prune_result_list);
/* Just make real sure plansource->gplan is clear */
ReleaseGenericPlan(plansource);
/* Link the new generic plan into the plansource */
@@ -1214,7 +1270,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (customplan)
{
/* Build a custom plan */
- plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv,
+ &my_part_prune_result_list);
/* Accumulate total costs of custom plans */
plansource->total_custom_cost += cached_plan_cost(plan, true);
@@ -1246,6 +1303,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
plan->is_saved = true;
}
+ if (part_prune_result_list)
+ *part_prune_result_list = my_part_prune_result_list;
+
return plan;
}
@@ -1737,17 +1797,29 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_result_list.
+ *
+ * On return, *lockedRelids_per_stmt will contain a bitmapset for every
+ * PlannedStmt in stmt_list, containing the RT indexes of relation entries
+ * in its range table that were actually locked, or NULL if the PlannedStmt
+ * contains a utility statement.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_result_list,
+ List **lockedRelids_per_stmt)
{
ListCell *lc1;
+ *part_prune_result_list = *lockedRelids_per_stmt = NIL;
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ PartitionPruneResult *part_prune_result = NULL;
+ Bitmapset *allLockRelids;
+ Bitmapset *lockedRelids = NULL;
+ int rti;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1761,13 +1833,35 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
if (query)
- ScanQueryForLocks(query, acquire);
+ ScanQueryForLocks(query, true);
+ *part_prune_result_list = lappend(*part_prune_result_list, NULL);
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ /*
+ * Figure out the set of relations that would need to be locked
+ * before executing the plan.
+ */
+ if (plannedstmt->containsInitialPruning)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ /*
+ * Obtain the set of partitions to be locked from the
+ * PartitionPruneInfos by considering the result of performing
+ * initial partition pruning.
+ */
+ part_prune_result = ExecutorDoInitialPruning(plannedstmt, boundParams);
+
+ allLockRelids = bms_union(plannedstmt->minLockRelids,
+ part_prune_result->scan_leafpart_rtis);
+ }
+ else
+ allLockRelids = plannedstmt->minLockRelids;
+
+ rti = -1;
+ while ((rti = bms_next_member(allLockRelids, rti)) > 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
if (rte->rtekind != RTE_RELATION)
continue;
@@ -1778,10 +1872,58 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
* fail if it's been dropped entirely --- we'll just transiently
* acquire a non-conflicting lock.
*/
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
+ LockRelationOid(rte->relid, rte->rellockmode);
+ lockedRelids = bms_add_member(lockedRelids, rti);
+ }
+
+ *part_prune_result_list = lappend(*part_prune_result_list,
+ part_prune_result);
+ *lockedRelids_per_stmt = lappend(*lockedRelids_per_stmt, lockedRelids);
+ }
+}
+
+/*
+ * ReleaseExecutorLocks
+ * Release locks that would've been acquired by an earlier call to
+ * AcquireExecutorLocks()
+ */
+static void
+ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt)
+{
+ ListCell *lc1,
+ *lc2;
+
+ forboth(lc1, stmt_list, lc2, lockedRelids_per_stmt)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ Bitmapset *lockedRelids = lfirst(lc2);
+ int rti;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /*
+ * Ignore utility statements, except those (such as EXPLAIN) that
+ * contain a parsed-but-not-planned query. Note: it's okay to use
+ * ScanQueryForLocks, even though the query hasn't been through
+ * rule rewriting, because rewriting doesn't change the query
+ * representation.
+ */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ if (query)
+ ScanQueryForLocks(query, false);
+ continue;
+ }
+
+ rti = -1;
+ while ((rti = bms_next_member(lockedRelids, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ Assert(rte->rtekind == RTE_RELATION);
+
+ /* See the comment in AcquireExecutorLocks(). */
+ UnlockRelationOid(rte->relid, rte->rellockmode);
}
}
}
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index d549f66d4a..1bbe6b704b 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -303,6 +303,25 @@ PortalDefineQuery(Portal portal,
portal->status = PORTAL_DEFINED;
}
+/*
+ * PortalStorePartitionPruneResults
+ * Copy the given list of PartitionPruneResults into the portal's
+ * context
+ *
+ * This allows the caller to ensure that the list exists as long as the portal
+ * does.
+ */
+void
+PortalStorePartitionPruneResults(Portal portal, List *part_prune_results)
+{
+ MemoryContext oldcxt;
+
+ AssertArg(PortalIsValid(portal));
+ oldcxt = MemoryContextSwitchTo(portal->portalContext);
+ portal->part_prune_results = copyObject(part_prune_results);
+ MemoryContextSwitchTo(oldcxt);
+}
+
/*
* PortalReleaseCachedPlan
* Release a portal's reference to its cached plan, if any.
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 666977fb1f..34975c69ee 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,8 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt, PartitionPruneResult *part_prune_resul,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 708435e952..bd8776402e 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -45,6 +45,7 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
+ * rti_map Range table index by partition index, or 0.
* present_parts A Bitmapset of the partition indexes that we
* have subplans or subparts for.
* initial_pruning_steps List of PartitionPruneSteps used to
@@ -61,6 +62,7 @@ typedef struct PartitionedRelPruningData
int nparts;
int *subplan_map;
int *subpart_map;
+ Index *rti_map;
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
@@ -123,9 +125,13 @@ typedef struct PartitionPruneState
extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
-
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis);
+extern Bitmapset *ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index e79e2c001f..60d5644908 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,8 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ PartitionPruneResult *part_prune_result; /* ExecutorDoInitialPruning()'s
+ * output for plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +59,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 873772f188..57dc0e8077 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -185,6 +185,8 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
+extern PartitionPruneResult *ExecutorDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params);
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index cbbcff81d2..3de4df1b05 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -596,6 +596,8 @@ typedef struct EState
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
+ struct PartitionPruneResult *es_part_prune_result; /* QueryDesc.part_prune_result */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
@@ -984,6 +986,34 @@ typedef struct DomainConstraintState
*/
typedef TupleTableSlot *(*ExecProcNodeMtd) (struct PlanState *pstate);
+/*----------------
+ * PartitionPruneResult
+ *
+ * The result of performing ExecutorDoInitialPruning() invocation on a given
+ * PlannedStmt.
+ *
+ * Contains a list of Bitmapsets of the indexes of the subplans remaining
+ * after performing initial pruning by calling ExecFindMatchingSubPlans() for
+ * every PartitionPruneInfo found in PlannedStmt.partPruneInfos.  RT indexes
+ * of the leaf partitions scanned by those subplans across all
+ * PartitionPruneInfos are added into scan_leafpart_rtis.
+ *
+ * This is used by GetCachedPlan() to inform its callers of the pruning
+ * decisions made when performing AcquireExecutorLocks() on a given cached
+ * PlannedStmt, which the callers then pass that on to the executor. The
+ * executor refers to this node when made available when initializing the plan
+ * nodes to which those PartitionPruneInfos apply so that the same set of
+ * qualifying subplans are initialized, rather than deriving that set again by
+ * redoing initial pruning.
+ */
+typedef struct PartitionPruneResult
+{
+ NodeTag type;
+
+ List *valid_subplan_offs_list;
+ Bitmapset *scan_leafpart_rtis;
+} PartitionPruneResult;
+
/* ----------------
* PlanState node
*
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 300824258e..de312b9215 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -97,6 +97,9 @@ typedef enum NodeTag
T_PartitionPruneStepCombine,
T_PlanInvalItem,
+ /* TAGS FOR EXECUTOR PREP NODES (execnodes.h) */
+ T_PartitionPruneResult,
+
/*
* TAGS FOR PLAN STATE NODES (execnodes.h)
*
@@ -673,6 +676,7 @@ extern struct Bitmapset *readBitmapset(void);
extern uintptr_t readDatum(bool typbyval);
extern bool *readBoolCols(int numCols);
extern int *readIntCols(int numCols);
+extern Index *readIndexCols(int numCols);
extern Oid *readOidCols(int numCols);
extern int16 *readAttrNumberCols(int numCols);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 6cbcb67bdf..d9c482e08b 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -107,6 +107,18 @@ typedef struct PlannerGlobal
List *appendRelations; /* "flat" list of AppendRelInfos */
+ List *partPruneInfos; /* List of PartitionPruneInfo contained in
+ * the plan */
+
+ bool containsInitialPruning; /* Do any of those PartitionPruneInfos
+ * have initial (pre-exec) pruning
+ * steps in them? */
+
+ Bitmapset *minLockRelids; /* Indexes of all range table entries minus
+ * indexes of range table entries of the leaf
+ * partitions scanned by prunable subplans;
+ * see AcquireExecutorLocks() */
+
List *relationOids; /* OIDs of relations the plan depends on */
List *invalItems; /* other dependencies, as PlanInvalItems */
@@ -377,6 +389,9 @@ struct PlannerInfo
/* Does this query modify any partition key columns? */
bool partColsUpdated;
+
+ /* PartitionPruneInfos added in this query's plan. */
+ List *partPruneInfos;
};
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 10dd35f011..44997d595d 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -64,8 +64,20 @@ typedef struct PlannedStmt
struct Plan *planTree; /* tree of Plan nodes */
+ List *partPruneInfos; /* List of PartitionPruneInfo contained in
+ * the plan */
+
+ bool containsInitialPruning; /* Do any of those PartitionPruneInfos
+ * have initial (pre-exec) pruning
+ * steps in them? */
+
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *minLockRelids; /* Indexes of all range table entries minus
+ * indexes of range table entries of the leaf
+ * partitions scanned by prunable subplans;
+ * see AcquireExecutorLocks() */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
List *resultRelations; /* integer list of RT indexes, or NIL */
@@ -262,8 +274,12 @@ typedef struct Append
*/
int first_partial_plan;
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /*
+ * Index of this plan's PartitionPruneInfo in PlannedStmt.partPruneInfos
+ * to be used for run-time subplan pruning; -1 if run-time pruning is
+ * not needed.
+ */
+ int part_prune_index;
} Append;
/* ----------------
@@ -282,8 +298,13 @@ typedef struct MergeAppend
Oid *sortOperators; /* OIDs of operators to sort them by */
Oid *collations; /* OIDs of collations */
bool *nullsFirst; /* NULLS FIRST/LAST directions */
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+
+ /*
+ * Index of this plan's PartitionPruneInfo in PlannedStmt.partPruneInfos
+ * to be used for run-time subplan pruning; -1 if run-time pruning is
+ * not needed.
+ */
+ int part_prune_index;
} MergeAppend;
/* ----------------
@@ -1187,6 +1208,13 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * needs_init_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its initial_pruning_steps set?
+ *
+ * needs_exec_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its exec_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1195,6 +1223,8 @@ typedef struct PartitionPruneInfo
{
NodeTag type;
List *prune_infos;
+ bool needs_init_pruning;
+ bool needs_exec_pruning;
Bitmapset *other_subplans;
} PartitionPruneInfo;
@@ -1225,6 +1255,7 @@ typedef struct PartitionedRelPruneInfo
int *subplan_map; /* subplan index by partition index, or -1 */
int *subpart_map; /* subpart index by partition index, or -1 */
Oid *relid_map; /* relation OID by partition index, or 0 */
+ Index *rti_map; /* Range table index by partition index, or 0 */
/*
* initial_pruning_steps shows how to prune during executor startup (i.e.,
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 90684efa25..ebf0dcff8c 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -70,10 +70,10 @@ typedef struct PartitionPruneContext
#define PruneCxtStateIdx(partnatts, step_id, keyno) \
((partnatts) * (step_id) + (keyno))
-extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
- struct RelOptInfo *parentrel,
- List *subpaths,
- List *prunequal);
+extern int make_partition_pruneinfo(struct PlannerInfo *root,
+ struct RelOptInfo *parentrel,
+ List *subpaths,
+ List *prunequal);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 95b99e3d25..449200b949 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -220,7 +220,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ List **part_prune_result_list);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aeddbdafe5..9f7727a837 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,7 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *part_prune_results; /* list of PartitionPruneResults */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -242,6 +243,8 @@ extern void PortalDefineQuery(Portal portal,
CommandTag commandTag,
List *stmts,
CachedPlan *cplan);
+extern void PortalStorePartitionPruneResults(Portal portal,
+ List *part_prune_result_list);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
--
2.24.1
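In outline, the per-statement decision the patch adds to
AcquireExecutorLocks() is: start from minLockRelids, union in the RT
indexes of the leaf partitions whose subplans survive initial pruning,
and lock only the members of that union. The sketch below models just
that decision in standalone C; it is illustrative only, with a 64-bit
mask standing in for Bitmapset and a stub lock_relation() in place of
LockRelationOid(), so none of these names come from the patch:

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

/*
 * Illustrative only: a 64-bit mask stands in for Bitmapset, RT indexes
 * are bit positions 1..63, and lock_relation() is a stub standing in
 * for LockRelationOid().
 */
typedef uint64_t rti_set;

static void
lock_relation(int rti)
{
	printf("locking relation at RT index %d\n", rti);
}

/* Model of the per-PlannedStmt decision in the patched AcquireExecutorLocks() */
static void
acquire_locks(rti_set min_lock_relids,
			  bool contains_initial_pruning,
			  rti_set surviving_leaf_rtis)
{
	rti_set		all_lock_relids = min_lock_relids;

	/* the bms_union() of minLockRelids and the pruning survivors */
	if (contains_initial_pruning)
		all_lock_relids |= surviving_leaf_rtis;

	/* the bms_next_member() loop: lock only members of the union */
	for (int rti = 1; rti < 64; rti++)
	{
		if (all_lock_relids & ((rti_set) 1 << rti))
			lock_relation(rti);
	}
}

int
main(void)
{
	/*
	 * RT index 1 is the partitioned parent and always in minLockRelids;
	 * leaf partitions sit at RT indexes 2..5, of which initial pruning
	 * kept only the one at RT index 4.
	 */
	rti_set		min_lock_relids = (rti_set) 1 << 1;
	rti_set		surviving_leaves = (rti_set) 1 << 4;

	acquire_locks(min_lock_relids, true, surviving_leaves);
	return 0;
}

Run as-is, this requests locks only for RT indexes 1 and 4; the pruned
leaves at 2, 3 and 5 are never touched, which is where the tps gains
come from as the partition count grows.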
On Fri, 8 Apr 2022 at 17:49, Amit Langote <amitlangote09@gmail.com> wrote:
Attached updated patch with these changes.
Thanks for making the changes. I started looking over this patch but
really feel like it needs quite a few more iterations of what we've
just been doing to get it into proper committable shape. There seems
to be only about 40 mins to go before the freeze, so it seems very
unrealistic that it could be made to work.
I started trying to take a serious look at it this evening, but I feel
like I just failed to get into it deep enough to make any meaningful
improvements. I'd need more time to study the problem before I could
build up a proper opinion on how exactly I think it should work.
Anyway. I've attached a small patch that's just a few things I
adjusted or questions while reading over your v13 patch. Some of
these are just me questioning your code (See XXX comments) and some I
think are improvements. Feel free to take the hunks that you see fit
and drop anything you don't.
David
Attachments:
misc_fixes.patch.txt (text/plain; charset=US-ASCII)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 05cc99df8f..5ee978937d 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -121,6 +121,8 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
*
* Note: Partitioned tables mentioned in PartitionedRelPruneInfo nodes that
* drive the pruning will be locked before doing the pruning.
+ *
+ * ----------------------------------------------------------------
*/
PartitionPruneResult *
ExecutorDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 3037742b8d..e9ca6bc55f 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1707,6 +1707,7 @@ ExecInitPartitionPruning(PlanState *planstate,
Assert(n_total_subplans > 0);
*initially_valid_subplans = bms_add_range(NULL, 0,
n_total_subplans - 1);
+ return prunestate;
}
/*
@@ -1714,14 +1715,15 @@ ExecInitPartitionPruning(PlanState *planstate,
* that were removed above due to initial pruning. No need to do this if
* no steps were removed.
*/
- if (bms_num_members(*initially_valid_subplans) < n_total_subplans)
+ if (prunestate &&
+ bms_num_members(*initially_valid_subplans) < n_total_subplans)
{
/*
* We can safely skip this when !do_exec_prune, even though that
* leaves invalid data in prunestate, because that data won't be
* consulted again (cf initial Assert in ExecFindMatchingSubPlans).
*/
- if (prunestate && prunestate->do_exec_prune)
+ if (prunestate->do_exec_prune)
PartitionPruneFixSubPlanMap(prunestate,
*initially_valid_subplans,
n_total_subplans);
@@ -1751,7 +1753,8 @@ ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
Bitmapset *valid_subplan_offs;
/*
- * A temporary context to allocate stuff needded to run the pruning steps.
+ * A temporary context for memory allocations required while executing
+ * partition pruning steps.
*/
tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
"initial pruning working data",
@@ -1765,11 +1768,12 @@ ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
/*
- * We don't yet have a PlanState for the parent plan node, so must create
- * a standalone ExprContext to evaluate pruning expressions, equipped with
- * the information about the EXTERN parameters that the caller passed us.
- * Note that that's okay because the initial pruning steps do not contain
- * anything that requires the execution to have started.
+ * We don't yet have a PlanState for the parent plan node, so we must
+ * create a standalone ExprContext to evaluate pruning expressions,
+ * equipped with the information about the EXTERN parameters that the
+ * caller passed us. Note that that's okay because the initial pruning
+ * steps do not contain anything that requires the execution to have
+ * started.
*/
econtext = CreateStandaloneExprContext();
econtext->ecxt_param_list_info = params;
@@ -1874,7 +1878,6 @@ CreatePartitionPruneState(PlanState *planstate,
PartitionedRelPruneInfo *pinfo = lfirst_node(PartitionedRelPruneInfo, lc2);
PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
Relation partrel;
- bool close_partrel = false;
PartitionDesc partdesc;
PartitionKey partkey;
@@ -1894,7 +1897,6 @@ CreatePartitionPruneState(PlanState *planstate,
int lockmode = (j == 0) ? NoLock : rte->rellockmode;
partrel = table_open(rte->relid, lockmode);
- close_partrel = true;
}
else
partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
@@ -1914,7 +1916,7 @@ CreatePartitionPruneState(PlanState *planstate,
* Must close partrel, keeping the lock taken, if we're not using
* EState's entry.
*/
- if (close_partrel)
+ if (estate == NULL)
table_close(partrel, NoLock);
/*
@@ -2367,6 +2369,8 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
{
*validsubplans = bms_add_member(*validsubplans,
pprune->subplan_map[i]);
+ /* XXX why would pprune->rti_map[i] ever be zero here??? */
+ Assert(pprune->rti_map[i] > 0);
if (scan_leafpart_rtis && pprune->rti_map[i] > 0)
*scan_leafpart_rtis = bms_add_member(*scan_leafpart_rtis,
pprune->rti_map[i]);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 639145abe9..f9c7976ff2 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -119,6 +119,7 @@ CreateExecutorState(void)
estate->es_relations = NULL;
estate->es_rowmarks = NULL;
estate->es_plannedstmt = NULL;
+ estate->es_part_prune_infos = NIL;
estate->es_part_prune_result = NULL;
estate->es_junkFilter = NULL;
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 09f26658e2..96880e122a 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -94,7 +94,6 @@ static bool ExecAppendAsyncRequest(AppendState *node, TupleTableSlot **result);
static void ExecAppendAsyncEventWait(AppendState *node);
static void classify_matching_subplans(AppendState *node);
-
/* ----------------------------------------------------------------
* ExecInitAppend
*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index ec6b1f1fc0..fe0df2f1d1 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1184,7 +1184,6 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
ListCell *subpaths;
int nasyncplans = 0;
RelOptInfo *rel = best_path->path.parent;
- int part_prune_index = -1;
int nodenumsortkeys = 0;
AttrNumber *nodeSortColIdx = NULL;
Oid *nodeSortOperators = NULL;
@@ -1335,6 +1334,9 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ plan->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1358,18 +1360,15 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
}
if (prunequal != NIL)
- part_prune_index= make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ plan->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
plan->appendplans = subplans;
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
- /* Will be updated later in set_plan_references(). */
- plan->part_prune_index = part_prune_index;
-
copy_generic_path_info(&plan->plan, (Path *) best_path);
/*
@@ -1408,7 +1407,6 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
List *subplans = NIL;
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
- int part_prune_index = -1;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1501,6 +1499,9 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ node->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1524,15 +1525,13 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
}
if (prunequal != NIL)
- part_prune_index= make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ node->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
node->mergeplans = subplans;
- /* Will be updated later in set_plan_references(). */
- node->part_prune_index = part_prune_index;
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index c88e5bacac..63ec8a98fc 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -408,6 +408,13 @@ set_plan_references(PlannerInfo *root, Plan *plan)
glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
}
+ /*
+ * XXX is it worth doing a bms_copy() on glob->minLockRelids if
+ * glob->containsInitialPruning is true?  I'm slightly worried that the
+ * Bitmapset could have a very long empty tail resulting in excessive
+ * looping during AcquireExecutorLocks().
+ */
+
return result;
}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 5a5f5dee46..952c5b8327 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -212,12 +212,12 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
/*
* make_partition_pruneinfo
* Checks if the given set of quals can be used to build pruning steps
- * that the executor will use to prune useless ones from given set of
- * child paths, and if so builds a PartitionPruneInfo that will allow the
- * executor to do do and append it to root->partPruneInfos.
+ * that the executor can use to prune away unneeded partitions. If
+ * suitable quals are found then a PartitionPruneInfo is built and tagged
+ * onto the PlannerInfo's partPruneInfos list.
*
- * Return value is 0-based index of the added PartitionPruneInfo or -1 if one
- * was not built after all.
+ * The return value is the 0-based index of the item added to the
+ * partPruneInfos list or -1 if nothing was added.
*
* 'parentrel' is the RelOptInfo for an appendrel, and 'subpaths' is the list
* of scan paths for its child rels.
@@ -335,10 +335,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
- if (!needs_init_pruning)
- needs_init_pruning = partrel_needs_init_pruning;
- if (!needs_exec_pruning)
- needs_exec_pruning = partrel_needs_exec_pruning;
+
+ needs_init_pruning |= partrel_needs_init_pruning;
+ needs_exec_pruning |= partrel_needs_exec_pruning;
}
pfree(relid_subplan_map);
@@ -570,7 +569,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* that would require per-scan pruning.
*
* In the first pass, we note whether the 2nd pass is necessary by
- * by noting the presence of EXEC parameters.
+ * noting the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -645,10 +644,11 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
- if (!*needs_init_pruning)
- *needs_init_pruning = (initial_pruning_steps != NIL);
- if (!*needs_exec_pruning)
- *needs_exec_pruning = (exec_pruning_steps != NIL);
+ /* record which types of pruning steps we've seen so far */
+ if (initial_pruning_steps != NIL)
+ *needs_init_pruning = true;
+ if (exec_pruning_steps != NIL)
+ *needs_exec_pruning = true;
pinfolist = lappend(pinfolist, pinfo);
}
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index a627448a5a..8cc2e2162d 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -1204,7 +1204,6 @@ PortalRunMulti(Portal portal,
{
bool active_snapshot_set = false;
ListCell *stmtlist_item;
- int i;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1225,15 +1224,9 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
- i = 0;
foreach(stmtlist_item, portal->stmts)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
- PartitionPruneResult *part_prune_result = portal->part_prune_results ?
- list_nth(portal->part_prune_results, i) :
- NULL;
-
- i++;
/*
* If we got a cancel signal in prior command, quit
@@ -1242,6 +1235,8 @@ PortalRunMulti(Portal portal,
if (pstmt->utilityStmt == NULL)
{
+ PartitionPruneResult *part_prune_result = NULL;
+
/*
* process a plannable query.
*/
@@ -1288,6 +1283,14 @@ PortalRunMulti(Portal portal,
else
UpdateActiveSnapshotCommandId();
+ /*
+ * Determine if there's a corresponding PartitionPruneResult for
+ * this PlannedStmt.
+ */
+ if (portal->part_prune_results != NIL)
+ part_prune_result = list_nth(portal->part_prune_results,
+ foreach_current_index(stmtlist_item));
+
if (pstmt->canSetTag)
{
/* statement can set tag string */
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 34975c69ee..bbc8c42d88 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,8 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, PartitionPruneResult *part_prune_resul,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 43bd293433..a8bf908d63 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1000,11 +1000,11 @@ typedef TupleTableSlot *(*ExecProcNodeMtd) (struct PlanState *pstate);
*
* This is used by GetCachedPlan() to inform its callers of the pruning
* decisions made when performing AcquireExecutorLocks() on a given cached
- * PlannedStmt, which the callers then pass that on to the executor. The
- * executor refers to this node when made available when initializing the plan
- * nodes to which those PartitionPruneInfos apply so that the same set of
- * qualifying subplans are initialized, rather than deriving that set again by
- * redoing initial pruning.
+ * PlannedStmt, which the callers then pass onto the executor. The executor
+ * refers to this node when made available when initializing the plan nodes to
+ * which those PartitionPruneInfos apply so that the same set of qualifying
+ * subplans are initialized, rather than deriving that set again by redoing
+ * initial pruning.
*/
typedef struct PartitionPruneResult
{
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 550308147d..f8f3971f44 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -274,11 +274,7 @@ typedef struct Append
*/
int first_partial_plan;
- /*
- * Index of this plan's PartitionPruneInfo in PlannedStmt.partPruneInfos
- * to be used for run-time subplan pruning; -1 if run-time pruning is
- * not needed.
- */
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
int part_prune_index;
} Append;
@@ -299,11 +295,7 @@ typedef struct MergeAppend
Oid *collations; /* OIDs of collations */
bool *nullsFirst; /* NULLS FIRST/LAST directions */
- /*
- * Index of this plan's PartitionPruneInfo in PlannedStmt.partPruneInfos
- * to be used for run-time subplan pruning; -1 if run-time pruning is
- * not needed.
- */
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
int part_prune_index;
} MergeAppend;
Hi David,
On Fri, Apr 8, 2022 at 8:16 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Fri, 8 Apr 2022 at 17:49, Amit Langote <amitlangote09@gmail.com> wrote:
Attached updated patch with these changes.
Thanks for making the changes. I started looking over this patch but
really feel like it needs quite a few more iterations of what we've
just been doing to get it into proper committable shape. There seems
to be only about 40 mins to go before the freeze, so it seems very
unrealistic that it could be made to work.
Yeah, totally understandable.
I started trying to take a serious look at it this evening, but I feel
like I just failed to get into it deep enough to make any meaningful
improvements. I'd need more time to study the problem before I could
build up a proper opinion on how exactly I think it should work.
Anyway. I've attached a small patch that's just a few things I
adjusted or questions while reading over your v13 patch. Some of
these are just me questioning your code (See XXX comments) and some I
think are improvements. Feel free to take the hunks that you see fit
and drop anything you don't.
Thanks a lot for compiling those.
Most looked fine changes to me except a couple of typos, so I've
adopted those into the attached new version, even though I know it's
too late to try to apply it. Re the XXX comments:
+ /* XXX why would pprune->rti_map[i] ever be zero here??? */
Yeah, no, it can't be; I was perhaps being overly paranoid.
+ * XXX is it worth doing a bms_copy() on glob->minLockRelids if
+ * glob->containsInitialPruning is true?  I'm slightly worried that the
+ * Bitmapset could have a very long empty tail resulting in excessive
+ * looping during AcquireExecutorLocks().
+ */
I guess I trust your instincts about bitmapset operation efficiency
and what you've written here makes sense. It's typical for leaf
partitions to have been appended toward the tail end of rtable and I'd
imagine their indexes would be in the tail words of minLockRelids. If
copying the bitmapset removes those useless words, I don't see why we
shouldn't do that. So added:
+ /*
+ * It seems worth doing a bms_copy() on glob->minLockRelids if we deleted
+ * bits from it just above, to prevent empty tail bits resulting in
+ * inefficient looping during AcquireExecutorLocks().
+ */
+ if (glob->containsInitialPruning)
+ glob->minLockRelids = bms_copy(glob->minLockRelids);
Not 100% sure about the comment I wrote.
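For reference, the effect being discussed is easy to reproduce with a
toy model. The sketch below is standalone C that mimics the
nwords/words[] layout of a bitmapset but is not PostgreSQL's Bitmapset
code, and all names in it are made up: a set whose live bits all sit in
the first word, but whose word array keeps a long all-zero tail, still
pays for a walk of the whole tail on every membership scan, whereas a
copy that drops the trailing zero words makes each later scan
proportionally cheaper.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

/*
 * Toy model only, not PostgreSQL's Bitmapset code: a word array where
 * set members are bit positions, so a membership scan has to visit
 * every word up to nwords whether or not it contains any bits.
 */
typedef struct
{
	int			nwords;
	uint64_t	words[];
} bitset;

static bitset *
bitset_alloc(int nwords)
{
	bitset	   *s = calloc(1, sizeof(bitset) + nwords * sizeof(uint64_t));

	s->nwords = nwords;
	return s;
}

/* Count the words a full scan (like a bms_next_member() loop) touches. */
static long
scan_cost(const bitset *s)
{
	long		words_touched = 0;

	for (int w = 0; w < s->nwords; w++)
		words_touched++;		/* a real scan would inspect s->words[w] */
	return words_touched;
}

/* A copy that drops trailing all-zero words, shortening later scans. */
static bitset *
copy_trimmed(const bitset *s)
{
	int			last = 0;
	bitset	   *copy;

	for (int w = 0; w < s->nwords; w++)
		if (s->words[w] != 0)
			last = w;
	copy = bitset_alloc(last + 1);
	memcpy(copy->words, s->words, (last + 1) * sizeof(uint64_t));
	return copy;
}

int
main(void)
{
	/*
	 * Only word 0 has members (the non-leaf RT indexes); the long tail
	 * is where the pruned leaf partitions' bits used to be.
	 */
	bitset	   *min_lock_relids = bitset_alloc(1024);
	bitset	   *trimmed;

	min_lock_relids->words[0] = 0x1e;	/* RT indexes 1..4 */

	printf("untrimmed: %ld words per scan\n", scan_cost(min_lock_relids));
	trimmed = copy_trimmed(min_lock_relids);
	printf("trimmed:   %ld words per scan\n", scan_cost(trimmed));

	free(min_lock_relids);
	free(trimmed);
	return 0;
}

With these numbers the untrimmed set costs 1024 words per scan against
1 for the trimmed copy, which is the per-execution saving the
bms_copy() above is hoping to buy; whether bms_copy() itself trims the
tail is a separate question, the model just shows why a shorter words[]
array helps.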
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v14-0001-Optimize-AcquireExecutorLocks-to-skip-pruned-par.patch (application/octet-stream)
From 552da9453f0c4896bcc8748719960db52b3ccad1 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v14] Optimize AcquireExecutorLocks() to skip pruned partitions
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 2 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/prepare.c | 26 ++-
src/backend/executor/README | 27 +++
src/backend/executor/execMain.c | 48 +++++
src/backend/executor/execParallel.c | 28 ++-
src/backend/executor/execPartition.c | 241 ++++++++++++++++++++----
src/backend/executor/execUtils.c | 2 +
src/backend/executor/functions.c | 2 +-
src/backend/executor/nodeAppend.c | 15 +-
src/backend/executor/nodeMergeAppend.c | 9 +-
src/backend/executor/spi.c | 27 ++-
src/backend/nodes/copyfuncs.c | 33 +++-
src/backend/nodes/outfuncs.c | 36 +++-
src/backend/nodes/readfuncs.c | 56 +++++-
src/backend/optimizer/plan/createplan.c | 24 +--
src/backend/optimizer/plan/planner.c | 3 +
src/backend/optimizer/plan/setrefs.c | 112 ++++++++---
src/backend/partitioning/partprune.c | 59 +++++-
src/backend/tcop/postgres.c | 8 +-
src/backend/tcop/pquery.c | 28 ++-
src/backend/utils/cache/plancache.c | 184 +++++++++++++++---
src/backend/utils/mmgr/portalmem.c | 19 ++
src/include/commands/explain.h | 4 +-
src/include/executor/execPartition.h | 12 +-
src/include/executor/execdesc.h | 3 +
src/include/executor/executor.h | 2 +
src/include/nodes/execnodes.h | 30 +++
src/include/nodes/nodes.h | 4 +
src/include/nodes/pathnodes.h | 15 ++
src/include/nodes/plannodes.h | 31 ++-
src/include/partitioning/partprune.h | 8 +-
src/include/utils/plancache.h | 3 +-
src/include/utils/portal.h | 3 +
37 files changed, 950 insertions(+), 167 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 55c38b04c4..d403eb2309 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -542,7 +542,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 9abbb6b555..f6607f2454 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index d2a2479822..35dd24adf8 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -407,7 +407,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
@@ -515,7 +515,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, PartitionPruneResult *part_prune_result,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
@@ -563,7 +564,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, part_prune_result, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 1013790dbb..54734a3a93 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -776,7 +776,7 @@ execute_sql_string(const char *sql)
{
QueryDesc *qdesc;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 9ab248d25e..2be1782bc4 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -416,7 +416,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 80738547ed..c7360712b1 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -155,6 +155,7 @@ ExecuteQuery(ParseState *pstate,
PreparedStatement *entry;
CachedPlan *cplan;
List *plan_list;
+ List *part_prune_result_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
Portal portal;
@@ -193,7 +194,10 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
plan_list = cplan->stmt_list;
/*
@@ -207,6 +211,9 @@ ExecuteQuery(ParseState *pstate,
plan_list,
cplan);
+ /* Copy PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_result_list);
+
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
* statement is one that produces tuples. Currently we insist that it be
@@ -576,7 +583,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
- ListCell *p;
+ List *part_prune_result_list;
+ ListCell *p,
+ *pp;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -619,7 +628,10 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Replan if needed, and acquire a transient refcount */
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, queryEnv);
+ CurrentResourceOwner, queryEnv,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -634,13 +646,15 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
plan_list = cplan->stmt_list;
/* Explain each query */
- foreach(p, plan_list)
+ forboth(p, plan_list, pp, part_prune_result_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ PartitionPruneResult *part_prune_result = lfirst_node(PartitionPruneResult, pp);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ ExplainOnePlan(pstmt, part_prune_result, into, es, query_string,
+ paramLI, queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 0b5183fc4a..e0802be723 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -65,6 +65,29 @@ found there. This currently only occurs for Append and MergeAppend nodes. In
this case the non-required subplans are ignored and the executor state's
subnode array will become out of sequence to the plan's subplan list.
+Actually, the so-called execution time pruning may also occur even before the
+execution has started. One case where that occurs is when a cached generic
+plan is being validated for execution by plancache.c: GetCachedPlan(), which
+proceeds by locking all the relations that will be scanned by that plan. If
+the generic plan contains nodes that can perform execution time partition
+pruning (that is, contain a PartitionPruneInfo), a subset of pruning steps
+contained in the PartitionPruneInfos that do not depend on execution actually
+having started (called "initial" pruning steps) are performed at this point
+to figure out the minimal set of child subplans that satisfy those pruning
+instructions. AcquireExecutorLocks() looking at a particular plan will then
+lock only the relations scanned by those surviving subplans (along with those
+present in PlannedStmt.minLockRelids), and ignore those scanned by the pruned
+subplans, even though the pruned subplans themselves are not removed from the
+plan tree. The result of pruning (that is, the set of indexes of surviving
+subplans in their parent's list of child subplans) is saved as a list of
+bitmapsets, with one element for every PartitionPruneInfo referenced in the
+plan (PlannedStmt.partPruneInfos). The list is packaged into a
+PartitionPruneResult node, which is passed along with the PlannedStmt to the
+executor via the QueryDesc. It is imperative that the executor and any third
+party code invoked by it that gets passed the plan tree look at the plan's
+PartitionPruneResult to determine whether a particular child subplan of a
+parent node that supports pruning is valid for a given execution.
+
Each Plan node may have expression trees associated with it, to represent
its target list, qualification conditions, etc. These trees are also
read-only to the executor, but the executor state for expression evaluation
@@ -286,6 +309,10 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ [ ExecutorDoInitialPruning ] --- an optional step to perform initial
+ partition pruning on the plan tree the result of which is passed
+ to the executor via QueryDesc
+
CreateQueryDesc
ExecutorStart
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ef2fd46092..5ee978937d 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -49,11 +49,13 @@
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/execdebug.h"
+#include "executor/execPartition.h"
#include "executor/nodeSubplan.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/nodeFuncs.h"
#include "parser/parsetree.h"
#include "storage/bufmgr.h"
#include "storage/lmgr.h"
@@ -104,6 +106,49 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
/* end of local decls */
+/* ----------------------------------------------------------------
+ * ExecutorDoInitialPruning
+ *
+ * Performs initial partition pruning to figure out the minimal set of
+ * subplans to be executed and the set of RT indexes of the corresponding
+ * leaf partitions
+ *
+ * The returned PartitionPruneResult must be subsequently passed to the
+ * executor so that it can reuse the result of pruning.  It's important
+ * that the executor has the same view of which partitions are initially
+ * pruned (by not doing the pruning again itself), or otherwise it risks
+ * initializing subplans whose partitions would not have been locked.
+ *
+ * Note: Partitioned tables mentioned in PartitionedRelPruneInfo nodes that
+ * drive the pruning will be locked before doing the pruning.
+ *
+ * ----------------------------------------------------------------
+ */
+PartitionPruneResult *
+ExecutorDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params)
+{
+ PartitionPruneResult *result;
+ ListCell *lc;
+
+ /* Only get here if there is any pruning to do. */
+ Assert(plannedstmt->containsInitialPruning);
+
+ result = makeNode(PartitionPruneResult);
+ foreach(lc, plannedstmt->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ Bitmapset *valid_subplan_offs;
+
+ valid_subplan_offs =
+ ExecPartitionDoInitialPruning(plannedstmt, params, pruneinfo,
+ &result->scan_leafpart_rtis);
+ result->valid_subplan_offs_list =
+ lappend(result->valid_subplan_offs_list,
+ valid_subplan_offs);
+ }
+
+ return result;
+}
/* ----------------------------------------------------------------
* ExecutorStart
@@ -806,6 +851,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ PartitionPruneResult *part_prune_result = queryDesc->part_prune_result;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -825,6 +871,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable);
estate->es_plannedstmt = plannedstmt;
+ estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ estate->es_part_prune_result = part_prune_result;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 9a0d5d59ef..805f86c503 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -66,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITIONPRUNERESULT UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -182,7 +183,9 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->containsInitialPruning = false;
pstmt->planTree = plan;
+ pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
pstmt->resultRelations = NIL;
pstmt->appendRelations = NIL;
@@ -596,12 +599,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_result_data;
+ char *part_prune_result_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_result_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -630,6 +636,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_result_data = nodeToString(estate->es_part_prune_result);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -656,6 +663,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized PartitionPruneResult. */
+ part_prune_result_len = strlen(part_prune_result_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_result_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -750,6 +762,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized PartitionPruneResult */
+ part_prune_result_space = shm_toc_allocate(pcxt->toc, part_prune_result_len);
+ memcpy(part_prune_result_space, part_prune_result_data, part_prune_result_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITIONPRUNERESULT,
+ part_prune_result_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1231,8 +1249,10 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_result_space;
char *paramspace;
PlannedStmt *pstmt;
+ PartitionPruneResult *part_prune_result;
ParamListInfo paramLI;
char *queryString;
@@ -1243,12 +1263,18 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
pstmtspace = shm_toc_lookup(toc, PARALLEL_KEY_PLANNEDSTMT, false);
pstmt = (PlannedStmt *) stringToNode(pstmtspace);
+ /* Reconstruct leader-supplied PartitionPruneResult. */
+ part_prune_result_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITIONPRUNERESULT, false);
+ part_prune_result = (PartitionPruneResult *)
+ stringToNode(part_prune_result_space);
+
/* Reconstruct ParamListInfo. */
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(¶mspace);
/* Create a QueryDesc for the query. */
- return CreateQueryDesc(pstmt,
+ return CreateQueryDesc(pstmt, part_prune_result,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 615bd80973..af87b9197f 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -25,6 +25,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -185,7 +186,11 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *pruneinfo);
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -198,7 +203,8 @@ static void PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans);
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis);
/*
@@ -1587,8 +1593,10 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* considered to be a stable expression, it can change value from one plan
* node scan to the next during query execution. Stable comparison
* expressions that don't involve such Params allow partition pruning to be
- * done once during executor startup. Expressions that do involve such Params
- * require us to prune separately for each scan of the parent plan node.
+ * done once during executor startup, or in ExecutorDoInitialPruning(), which
+ * runs as part of performing AcquireExecutorLocks() on a given plan tree.
+ * Expressions that do involve such Params require us to prune separately for
+ * each scan of the parent plan node.
*
* Note that pruning away unneeded subplans during executor startup has the
* added benefit of not having to initialize the unneeded subplans at all.
@@ -1605,6 +1613,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* account for initial pruning possibly having eliminated some of the
* subplans.
*
+ * ExecPartitionDoInitialPruning:
+ * Do initial pruning using the information contained in a given
+ * PartitionPruneInfo to determine the minimal set of child subplans of
+ * the parent plan node to which the PartitionPruneInfo belongs that must
+ * be executed, along with the set of RT indexes of the leaf partitions
+ * that will be scanned by those subplans.
+ *
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating the expressions
* that are safe to evaluate at a given point. This function is first
@@ -1622,8 +1637,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* On return, *initially_valid_subplans is assigned the set of indexes of
* child subplans that must be initialized along with the parent plan node.
- * Initial pruning is performed here if needed and in that case only the
- * surviving subplans' indexes are added.
+ * Initial pruning is performed here if needed (unless it has already been done
+ * by ExecutorDoInitialPruning()), and in that case only the surviving subplans'
+ * indexes are added.
*
* If subplans are indeed pruned, subplan_map arrays contained in the returned
* PartitionPruneState are re-sequenced to not count those, though only if the
@@ -1632,29 +1648,66 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
PartitionPruneState *
ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
Bitmapset **initially_valid_subplans)
{
- PartitionPruneState *prunestate;
EState *estate = planstate->state;
+ PartitionPruneInfo *pruneinfo = list_nth(estate->es_part_prune_infos,
+ part_prune_index);
+ PartitionPruneResult *pruneresult = estate->es_part_prune_result;
+ PartitionPruneState *prunestate;
+ bool do_pruning = (pruneinfo->needs_init_pruning ||
+ pruneinfo->needs_exec_pruning);
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, planstate);
+ /*
+ * No need to do initial pruning if it was done already by
+ * ExecutorDoInitialPruning(), which it would be if es_part_prune_result
+ * has been set.
+ */
+ if (pruneresult)
+ do_pruning = pruneinfo->needs_exec_pruning;
+
+ prunestate = NULL;
+ if (do_pruning)
+ {
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
- /* Create the working data structure for pruning */
- prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ /* For data reading, executor always omits detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
+ /*
+ * Create the working data structure for pruning. No need to consider
+ * initial pruning steps if we have a PartitionPruneResult.
+ */
+ prunestate = CreatePartitionPruneState(planstate, pruneinfo,
+ pruneresult == NULL, true,
+ NIL, planstate->ps_ExprContext,
+ estate->es_partition_directory);
+ }
/*
* Perform an initial partition prune pass, if required.
*/
- if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ if (pruneresult)
+ {
+ *initially_valid_subplans =
+ list_nth(pruneresult->valid_subplan_offs_list, part_prune_index);
+ }
+ else if (prunestate && prunestate->do_initial_prune)
+ {
+ *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true,
+ NULL);
+ }
else
{
/* No pruning, so we'll need to initialize all subplans */
Assert(n_total_subplans > 0);
*initially_valid_subplans = bms_add_range(NULL, 0,
n_total_subplans - 1);
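+
+ /*
+ * Nothing was pruned, so skip the subplan_map re-sequencing done
+ * below for the pruned case.
+ */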
+ return prunestate;
}
/*
@@ -1662,7 +1715,8 @@ ExecInitPartitionPruning(PlanState *planstate,
* that were removed above due to initial pruning. No need to do this if
* no steps were removed.
*/
- if (bms_num_members(*initially_valid_subplans) < n_total_subplans)
+ if (prunestate &&
+ bms_num_members(*initially_valid_subplans) < n_total_subplans)
{
/*
* We can safely skip this when !do_exec_prune, even though that
@@ -1678,11 +1732,74 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecPartitionDoInitialPruning
+ * Perform initial pruning using the given PartitionPruneInfo to
+ * determine the minimal set of child subplans of the parent plan node
+ * to which the PartitionPruneInfo belongs that must be executed, along
+ * with the set of RT indexes of the leaf partitions that will be
+ * scanned by those subplans.
+ */
+Bitmapset *
+ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *rtable = plannedstmt->rtable;
+ ExprContext *econtext;
+ PartitionDirectory pdir;
+ MemoryContext oldcontext,
+ tmpcontext;
+ PartitionPruneState *prunestate;
+ Bitmapset *valid_subplan_offs;
+
+ /*
+ * A temporary context for memory allocations required while executing
+ * partition pruning steps.
+ */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "initial pruning working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
+
+ /*
+ * PartitionDirectory to look up partition descriptors, which omits
+ * detached partitions, just like in the executor proper.
+ */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+
+ /*
+ * We don't yet have a PlanState for the parent plan node, so we must
+ * create a standalone ExprContext to evaluate pruning expressions,
+ * equipped with the information about the EXTERN parameters that the
+ * caller passed us. Using a standalone context is okay here because
+ * the initial pruning steps do not contain anything that requires
+ * execution to have started.
+ */
+ econtext = CreateStandaloneExprContext();
+ econtext->ecxt_param_list_info = params;
+ prunestate = CreatePartitionPruneState(NULL, pruneinfo, true, false,
+ rtable, econtext, pdir);
+ MemoryContextSwitchTo(oldcontext);
+
+ /* Do the initial pruning. */
+ valid_subplan_offs = ExecFindMatchingSubPlans(prunestate, true,
+ scan_leafpart_rtis);
+
+ FreeExprContext(econtext, true);
+ DestroyPartitionDirectory(pdir);
+ MemoryContextDelete(tmpcontext);
+
+ return valid_subplan_offs;
+}
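+
+/*
+ * A rough sketch of the expected call pattern (AcquireExecutorLocks() in
+ * plancache.c is the intended caller; details there may differ):
+ *
+ * Bitmapset *rtis = NULL;
+ * Bitmapset *valid_subplan_offs;
+ *
+ * valid_subplan_offs = ExecPartitionDoInitialPruning(plannedstmt, params,
+ * pruneinfo, &rtis);
+ *
+ * after which, of the leaf partitions, only those whose RT indexes appear
+ * in 'rtis' need to be locked.
+ */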
+
/*
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
*
- * 'planstate' is the parent plan node's execution state.
+ * 'planstate', if not NULL, is the parent plan node's execution state. It
+ * can be NULL when called before ExecutorStart(), in which case 'rtable'
+ * (range table), 'econtext', and 'partdir' must be explicitly provided.
*
* 'pruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
@@ -1696,19 +1813,21 @@ ExecInitPartitionPruning(PlanState *planstate,
* PartitionedRelPruneInfo.
*/
static PartitionPruneState *
-CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
+CreatePartitionPruneState(PlanState *planstate,
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir)
{
- EState *estate = planstate->state;
+ EState *estate = planstate ? planstate->state : NULL;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
- /* For data reading, executor always omits detached partitions */
- if (estate->es_partition_directory == NULL)
- estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
+ Assert((estate != NULL) ||
+ (partdir != NULL && econtext != NULL && rtable != NIL));
n_part_hierarchies = list_length(pruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
@@ -1763,15 +1882,42 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * Must open the relation ourselves when called before execution
+ * has started, such as when called during
+ * ExecutorDoInitialPruning() on a cached plan. In that case,
+ * sub-partitions must be locked, because AcquirePlannerLocks()
+ * would not have seen them. (The first relation in a
+ * partrelpruneinfos list is always the root partitioned table
+ * appearing in the query, which AcquirePlannerLocks() would have
+ * locked; the Assert in relation_open() guards that assumption.)
+ */
+ if (estate == NULL)
+ {
+ RangeTblEntry *rte = rt_fetch(pinfo->rtindex, rtable);
+ int lockmode = (j == 0) ? NoLock : rte->rellockmode;
+
+ partrel = table_open(rte->relid, lockmode);
+ }
+ else
+ partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+
+ /*
+ * We can rely on the copy of the partitioned table's partition
+ * key in its relcache entry, because it can't change (or get
+ * destroyed) as long as the relation is locked. The partition
+ * descriptor is taken from the PartitionDirectory, which holds
+ * the table open long enough for the descriptor to remain valid
+ * while it's used to perform the pruning steps.
*/
- partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
partkey = RelationGetPartitionKey(partrel);
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partdesc = PartitionDirectoryLookup(partdir, partrel);
+
+ /*
+ * Must close partrel, keeping the lock taken, if we're not using
+ * EState's entry.
+ */
+ if (estate == NULL)
+ table_close(partrel, NoLock);
/*
* Initialize the subplan_map and subpart_map.
@@ -1785,6 +1931,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
Assert(partdesc->nparts >= pinfo->nparts);
pprune->nparts = partdesc->nparts;
pprune->subplan_map = palloc(sizeof(int) * partdesc->nparts);
+ pprune->rti_map = palloc(sizeof(Index) * partdesc->nparts);
if (partdesc->nparts == pinfo->nparts)
{
/*
@@ -1795,6 +1942,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pprune->subpart_map = pinfo->subpart_map;
memcpy(pprune->subplan_map, pinfo->subplan_map,
sizeof(int) * pinfo->nparts);
+ memcpy(pprune->rti_map, pinfo->rti_map,
+ sizeof(Index) * pinfo->nparts);
/*
* Double-check that the list of unpruned relations has not
@@ -1845,6 +1994,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pinfo->subplan_map[pd_idx];
pprune->subpart_map[pp_idx] =
pinfo->subpart_map[pd_idx];
+ pprune->rti_map[pp_idx] =
+ pinfo->rti_map[pd_idx];
pd_idx++;
}
else
@@ -1852,6 +2003,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
/* this partdesc entry is not in the plan */
pprune->subplan_map[pp_idx] = -1;
pprune->subpart_map[pp_idx] = -1;
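+ /* 0 is never a valid RT index, so it marks an absent partition */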
+ pprune->rti_map[pp_idx] = 0;
}
}
@@ -1873,7 +2025,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* Initialize pruning contexts as needed.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
- if (pinfo->initial_pruning_steps)
+ if (consider_initial_steps && pinfo->initial_pruning_steps)
{
InitPartitionPruneContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
@@ -1883,7 +2035,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps)
+ if (consider_exec_steps && pinfo->exec_pruning_steps)
{
InitPartitionPruneContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
@@ -2111,10 +2263,14 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
* runtime pruning.
+ *
+ * RT indexes of leaf partitions scanned by the chosen subplans are added to
+ * *scan_leafpart_rtis if the pointer is non-NULL.
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2149,7 +2305,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
*/
pprune = &prunedata->partrelprunedata[0];
find_matching_subplans_recurse(prunedata, pprune, initial_prune,
- &result);
+ &result, scan_leafpart_rtis);
/* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
@@ -2163,6 +2319,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_copy(*scan_leafpart_rtis);
MemoryContextReset(prunestate->prune_context);
@@ -2173,13 +2331,15 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* find_matching_subplans_recurse
* Recursive worker function for ExecFindMatchingSubPlans
*
- * Adds valid (non-prunable) subplan IDs to *validsubplans
+ * Adds valid (non-prunable) subplan IDs to *validsubplans and RT indexes
+ * of the corresponding leaf partitions to *scan_leafpart_rtis (if asked for).
*/
static void
find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans)
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *partset;
int i;
@@ -2206,8 +2366,14 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
while ((i = bms_next_member(partset, i)) >= 0)
{
if (pprune->subplan_map[i] >= 0)
+ {
*validsubplans = bms_add_member(*validsubplans,
pprune->subplan_map[i]);
+ Assert(pprune->rti_map[i] > 0);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_add_member(*scan_leafpart_rtis,
+ pprune->rti_map[i]);
+ }
else
{
int partidx = pprune->subpart_map[i];
@@ -2215,7 +2381,8 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
if (partidx >= 0)
find_matching_subplans_recurse(prunedata,
&prunedata->partrelprunedata[partidx],
- initial_prune, validsubplans);
+ initial_prune, validsubplans,
+ scan_leafpart_rtis);
else
{
/*
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9df1f81ea8..f9c7976ff2 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -119,6 +119,8 @@ CreateExecutorState(void)
estate->es_relations = NULL;
estate->es_rowmarks = NULL;
estate->es_plannedstmt = NULL;
+ estate->es_part_prune_infos = NIL;
+ estate->es_part_prune_result = NULL;
estate->es_junkFilter = NULL;
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index f9460ae506..a2182a6b1f 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -844,7 +844,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
- es->qd = CreateQueryDesc(es->stmt,
+ es->qd = CreateQueryDesc(es->stmt, NULL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 357e10a1d7..96880e122a 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -134,7 +134,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_begun = false;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -145,7 +145,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&appendstate->ps,
list_length(node->appendplans),
- node->part_prune_info,
+ node->part_prune_index,
&validsubplans);
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
@@ -155,7 +155,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (appendstate->as_prune_state == NULL ||
+ (!appendstate->as_prune_state->do_exec_prune && nplans > 0))
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -577,7 +578,7 @@ choose_next_subplan_locally(AppendState *node)
}
else if (node->as_valid_subplans == NULL)
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
whichplan = -1;
}
@@ -642,7 +643,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
/*
* Mark each invalid plan as finished to allow the loop below to
@@ -717,7 +718,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
mark_invalid_subplans_as_finished(node);
}
@@ -868,7 +869,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
classify_matching_subplans(node);
}
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index ecf9052e03..7708cfffda 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -82,7 +82,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.ExecProcNode = ExecMergeAppend;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -93,7 +93,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&mergestate->ps,
list_length(node->mergeplans),
- node->part_prune_info,
+ node->part_prune_index,
&validsubplans);
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
@@ -103,7 +103,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (mergestate->ms_prune_state == NULL ||
+ (!mergestate->ms_prune_state->do_exec_prune && nplans > 0))
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -218,7 +219,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 042a5f8b0a..729e2fd7b2 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1578,6 +1578,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
CachedPlanSource *plansource;
CachedPlan *cplan;
List *stmt_list;
+ List *part_prune_result_list;
char *query_string;
Snapshot snapshot;
MemoryContext oldcontext;
@@ -1657,7 +1658,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1685,6 +1689,9 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
stmt_list,
cplan);
+ /* Copy PartitionPruneResults into the portal's context. */
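+ /* (They live in a transient context; the portal needs its own copies.) */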
+ PortalStorePartitionPruneResults(portal, part_prune_result_list);
+
/*
* Set up options for portal. Default SCROLL type is chosen the same way
* as PerformCursorOpen does it.
@@ -2092,7 +2099,8 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
/* Get the generic plan for the query */
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ NULL /* Not interested in PartitionPruneResults */);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2473,7 +2481,9 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
- ListCell *lc2;
+ List *part_prune_result_list;
+ ListCell *lc2,
+ *lc3;
spicallbackarg.query = plansource->query_string;
@@ -2549,8 +2559,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* plan, the refcount must be backed by the plan_owner.
*/
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
-
+ plan_owner, _SPI_current->queryEnv,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
stmt_list = cplan->stmt_list;
/*
@@ -2589,9 +2601,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, part_prune_result_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ PartitionPruneResult *part_prune_result = lfirst_node(PartitionPruneResult, lc3);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2663,7 +2676,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
else
snap = InvalidSnapshot;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, part_prune_result,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 836f427ea8..59a7054011 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -96,7 +96,10 @@ _copyPlannedStmt(const PlannedStmt *from)
COPY_SCALAR_FIELD(parallelModeNeeded);
COPY_SCALAR_FIELD(jitFlags);
COPY_NODE_FIELD(planTree);
+ COPY_NODE_FIELD(partPruneInfos);
+ COPY_SCALAR_FIELD(containsInitialPruning);
COPY_NODE_FIELD(rtable);
+ COPY_BITMAPSET_FIELD(minLockRelids);
COPY_NODE_FIELD(resultRelations);
COPY_NODE_FIELD(appendRelations);
COPY_NODE_FIELD(subplans);
@@ -253,7 +256,7 @@ _copyAppend(const Append *from)
COPY_NODE_FIELD(appendplans);
COPY_SCALAR_FIELD(nasyncplans);
COPY_SCALAR_FIELD(first_partial_plan);
- COPY_NODE_FIELD(part_prune_info);
+ COPY_SCALAR_FIELD(part_prune_index);
return newnode;
}
@@ -281,7 +284,7 @@ _copyMergeAppend(const MergeAppend *from)
COPY_POINTER_FIELD(sortOperators, from->numCols * sizeof(Oid));
COPY_POINTER_FIELD(collations, from->numCols * sizeof(Oid));
COPY_POINTER_FIELD(nullsFirst, from->numCols * sizeof(bool));
- COPY_NODE_FIELD(part_prune_info);
+ COPY_SCALAR_FIELD(part_prune_index);
return newnode;
}
@@ -1283,6 +1286,8 @@ _copyPartitionPruneInfo(const PartitionPruneInfo *from)
PartitionPruneInfo *newnode = makeNode(PartitionPruneInfo);
COPY_NODE_FIELD(prune_infos);
+ COPY_SCALAR_FIELD(needs_init_pruning);
+ COPY_SCALAR_FIELD(needs_exec_pruning);
COPY_BITMAPSET_FIELD(other_subplans);
return newnode;
@@ -1299,6 +1304,7 @@ _copyPartitionedRelPruneInfo(const PartitionedRelPruneInfo *from)
COPY_POINTER_FIELD(subplan_map, from->nparts * sizeof(int));
COPY_POINTER_FIELD(subpart_map, from->nparts * sizeof(int));
COPY_POINTER_FIELD(relid_map, from->nparts * sizeof(Oid));
+ COPY_POINTER_FIELD(rti_map, from->nparts * sizeof(Index));
COPY_NODE_FIELD(initial_pruning_steps);
COPY_NODE_FIELD(exec_pruning_steps);
COPY_BITMAPSET_FIELD(execparamids);
@@ -5473,6 +5479,21 @@ _copyExtensibleNode(const ExtensibleNode *from)
return newnode;
}
+/* ****************************************************************
+ * execnodes.h copy functions
+ * ****************************************************************
+ */
+static PartitionPruneResult *
+_copyPartitionPruneResult(const PartitionPruneResult *from)
+{
+ PartitionPruneResult *newnode = makeNode(PartitionPruneResult);
+
+ COPY_NODE_FIELD(valid_subplan_offs_list);
+ COPY_BITMAPSET_FIELD(scan_leafpart_rtis);
+
+ return newnode;
+}
+
/* ****************************************************************
* value.h copy functions
* ****************************************************************
@@ -5527,7 +5548,6 @@ _copyBitString(const BitString *from)
return newnode;
}
-
static ForeignKeyCacheInfo *
_copyForeignKeyCacheInfo(const ForeignKeyCacheInfo *from)
{
@@ -6569,6 +6589,13 @@ copyObjectImpl(const void *from)
retval = _copyPublicationTable(from);
break;
+ /*
+ * EXECUTION NODES
+ */
+ case T_PartitionPruneResult:
+ retval = _copyPartitionPruneResult(from);
+ break;
+
/*
* MISCELLANEOUS NODES
*/
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index d5f5e76c55..3dada68291 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -314,7 +314,10 @@ _outPlannedStmt(StringInfo str, const PlannedStmt *node)
WRITE_BOOL_FIELD(parallelModeNeeded);
WRITE_INT_FIELD(jitFlags);
WRITE_NODE_FIELD(planTree);
+ WRITE_NODE_FIELD(partPruneInfos);
+ WRITE_BOOL_FIELD(containsInitialPruning);
WRITE_NODE_FIELD(rtable);
+ WRITE_BITMAPSET_FIELD(minLockRelids);
WRITE_NODE_FIELD(resultRelations);
WRITE_NODE_FIELD(appendRelations);
WRITE_NODE_FIELD(subplans);
@@ -443,7 +446,7 @@ _outAppend(StringInfo str, const Append *node)
WRITE_NODE_FIELD(appendplans);
WRITE_INT_FIELD(nasyncplans);
WRITE_INT_FIELD(first_partial_plan);
- WRITE_NODE_FIELD(part_prune_info);
+ WRITE_INT_FIELD(part_prune_index);
}
static void
@@ -460,7 +463,7 @@ _outMergeAppend(StringInfo str, const MergeAppend *node)
WRITE_OID_ARRAY(sortOperators, node->numCols);
WRITE_OID_ARRAY(collations, node->numCols);
WRITE_BOOL_ARRAY(nullsFirst, node->numCols);
- WRITE_NODE_FIELD(part_prune_info);
+ WRITE_INT_FIELD(part_prune_index);
}
static void
@@ -1009,6 +1012,8 @@ _outPartitionPruneInfo(StringInfo str, const PartitionPruneInfo *node)
WRITE_NODE_TYPE("PARTITIONPRUNEINFO");
WRITE_NODE_FIELD(prune_infos);
+ WRITE_BOOL_FIELD(needs_init_pruning);
+ WRITE_BOOL_FIELD(needs_exec_pruning);
WRITE_BITMAPSET_FIELD(other_subplans);
}
@@ -1023,6 +1028,7 @@ _outPartitionedRelPruneInfo(StringInfo str, const PartitionedRelPruneInfo *node)
WRITE_INT_ARRAY(subplan_map, node->nparts);
WRITE_INT_ARRAY(subpart_map, node->nparts);
WRITE_OID_ARRAY(relid_map, node->nparts);
+ WRITE_INDEX_ARRAY(rti_map, node->nparts);
WRITE_NODE_FIELD(initial_pruning_steps);
WRITE_NODE_FIELD(exec_pruning_steps);
WRITE_BITMAPSET_FIELD(execparamids);
@@ -2425,6 +2431,9 @@ _outPlannerGlobal(StringInfo str, const PlannerGlobal *node)
WRITE_NODE_FIELD(finalrowmarks);
WRITE_NODE_FIELD(resultRelations);
WRITE_NODE_FIELD(appendRelations);
+ WRITE_NODE_FIELD(partPruneInfos);
+ WRITE_BOOL_FIELD(containsInitialPruning);
+ WRITE_BITMAPSET_FIELD(minLockRelids);
WRITE_NODE_FIELD(relationOids);
WRITE_NODE_FIELD(invalItems);
WRITE_NODE_FIELD(paramExecTypes);
@@ -2492,6 +2501,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_BITMAPSET_FIELD(curOuterRels);
WRITE_NODE_FIELD(curOuterParams);
WRITE_BOOL_FIELD(partColsUpdated);
+ WRITE_NODE_FIELD(partPruneInfos);
}
static void
@@ -2845,6 +2855,21 @@ _outExtensibleNode(StringInfo str, const ExtensibleNode *node)
methods->nodeOut(str, node);
}
+/*****************************************************************************
+ *
+ * Stuff from execnodes.h
+ *
+ *****************************************************************************/
+
+static void
+_outPartitionPruneResult(StringInfo str, const PartitionPruneResult *node)
+{
+ WRITE_NODE_TYPE("PARTITIONPRUNERESULT");
+
+ WRITE_NODE_FIELD(valid_subplan_offs_list);
+ WRITE_BITMAPSET_FIELD(scan_leafpart_rtis);
+}
+
/*****************************************************************************
*
* Stuff from parsenodes.h.
@@ -4754,6 +4779,13 @@ outNode(StringInfo str, const void *obj)
_outJsonTableSibling(str, obj);
break;
+ /*
+ * EXECUTION NODES
+ */
+ case T_PartitionPruneResult:
+ _outPartitionPruneResult(str, obj);
+ break;
+
default:
/*
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 3d150cb25d..6a6fcec03b 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -164,6 +164,11 @@
token = pg_strtok(&length); /* skip :fldname */ \
local_node->fldname = readIntCols(len)
+/* Read an Index array */
+#define READ_INDEX_ARRAY(fldname, len) \
+ token = pg_strtok(&length); /* skip :fldname */ \
+ local_node->fldname = readIndexCols(len)
+
/* Read a bool array */
#define READ_BOOL_ARRAY(fldname, len) \
token = pg_strtok(&length); /* skip :fldname */ \
@@ -1815,7 +1820,10 @@ _readPlannedStmt(void)
READ_BOOL_FIELD(parallelModeNeeded);
READ_INT_FIELD(jitFlags);
READ_NODE_FIELD(planTree);
+ READ_NODE_FIELD(partPruneInfos);
+ READ_BOOL_FIELD(containsInitialPruning);
READ_NODE_FIELD(rtable);
+ READ_BITMAPSET_FIELD(minLockRelids);
READ_NODE_FIELD(resultRelations);
READ_NODE_FIELD(appendRelations);
READ_NODE_FIELD(subplans);
@@ -1947,7 +1955,7 @@ _readAppend(void)
READ_NODE_FIELD(appendplans);
READ_INT_FIELD(nasyncplans);
READ_INT_FIELD(first_partial_plan);
- READ_NODE_FIELD(part_prune_info);
+ READ_INT_FIELD(part_prune_index);
READ_DONE();
}
@@ -1969,7 +1977,7 @@ _readMergeAppend(void)
READ_OID_ARRAY(sortOperators, local_node->numCols);
READ_OID_ARRAY(collations, local_node->numCols);
READ_BOOL_ARRAY(nullsFirst, local_node->numCols);
- READ_NODE_FIELD(part_prune_info);
+ READ_INT_FIELD(part_prune_index);
READ_DONE();
}
@@ -2767,6 +2775,8 @@ _readPartitionPruneInfo(void)
READ_LOCALS(PartitionPruneInfo);
READ_NODE_FIELD(prune_infos);
+ READ_BOOL_FIELD(needs_init_pruning);
+ READ_BOOL_FIELD(needs_exec_pruning);
READ_BITMAPSET_FIELD(other_subplans);
READ_DONE();
@@ -2783,6 +2793,7 @@ _readPartitionedRelPruneInfo(void)
READ_INT_ARRAY(subplan_map, local_node->nparts);
READ_INT_ARRAY(subpart_map, local_node->nparts);
READ_OID_ARRAY(relid_map, local_node->nparts);
+ READ_INDEX_ARRAY(rti_map, local_node->nparts);
READ_NODE_FIELD(initial_pruning_steps);
READ_NODE_FIELD(exec_pruning_steps);
READ_BITMAPSET_FIELD(execparamids);
@@ -2936,6 +2947,21 @@ _readPartitionRangeDatum(void)
READ_DONE();
}
+
+/*
+ * _readPartitionPruneResult
+ */
+static PartitionPruneResult *
+_readPartitionPruneResult(void)
+{
+ READ_LOCALS(PartitionPruneResult);
+
+ READ_NODE_FIELD(valid_subplan_offs_list);
+ READ_BITMAPSET_FIELD(scan_leafpart_rtis);
+
+ READ_DONE();
+}
+
/*
* parseNodeString
*
@@ -3233,6 +3259,8 @@ parseNodeString(void)
return_value = _readJsonTableParent();
else if (MATCH("JSONTABSNODE", 12))
return_value = _readJsonTableSibling();
+ else if (MATCH("PARTITIONPRUNERESULT", 20))
+ return_value = _readPartitionPruneResult();
else
{
elog(ERROR, "badly formatted node string \"%.32s\"...", token);
@@ -3376,6 +3404,30 @@ readIntCols(int numCols)
return int_vals;
}
+/*
+ * readIndexCols
+ */
+Index *
+readIndexCols(int numCols)
+{
+ int tokenLength,
+ i;
+ const char *token;
+ Index *index_vals;
+
+ if (numCols <= 0)
+ return NULL;
+
+ index_vals = (Index *) palloc(numCols * sizeof(Index));
+ for (i = 0; i < numCols; i++)
+ {
+ token = pg_strtok(&tokenLength);
+ index_vals[i] = atoui(token);
+ }
+
+ return index_vals;
+}
+
/*
* readBoolCols
*/
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 95476ada0b..fe0df2f1d1 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1184,7 +1184,6 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
ListCell *subpaths;
int nasyncplans = 0;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
int nodenumsortkeys = 0;
AttrNumber *nodeSortColIdx = NULL;
Oid *nodeSortOperators = NULL;
@@ -1335,6 +1334,9 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ plan->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1358,16 +1360,14 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
}
if (prunequal != NIL)
- partpruneinfo =
- make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ plan->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
plan->appendplans = subplans;
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
- plan->part_prune_info = partpruneinfo;
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -1407,7 +1407,6 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
List *subplans = NIL;
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1500,6 +1499,9 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ node->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1523,13 +1525,13 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
}
if (prunequal != NIL)
- partpruneinfo = make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ node->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
node->mergeplans = subplans;
- node->part_prune_info = partpruneinfo;
+
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index b090b087e9..f425362491 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -518,7 +518,10 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->dependsOnRole = glob->dependsOnRole;
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
+ result->partPruneInfos = glob->partPruneInfos;
+ result->containsInitialPruning = glob->containsInitialPruning;
result->rtable = glob->finalrtable;
+ result->minLockRelids = glob->minLockRelids;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 6ea3505646..94d4ff0b9d 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -261,7 +261,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
Plan *result;
PlannerGlobal *glob = root->glob;
int rtoffset = list_length(glob->finalrtable);
- ListCell *lc;
+ ListCell *lc;
/*
* Add all the query's RTEs to the flattened rangetable. The live ones
@@ -270,6 +270,16 @@ set_plan_references(PlannerInfo *root, Plan *plan)
*/
add_rtes_to_flat_rtable(root, false);
+ /*
+ * Add the query's adjusted range of RT indexes to glob->minLockRelids.
+ * The adjusted RT indexes of prunable relations will be deleted from the
+ * set below where PartitionPruneInfos are processed.
+ */
+ glob->minLockRelids =
+ bms_add_range(glob->minLockRelids,
+ rtoffset + 1,
+ rtoffset + list_length(root->parse->rtable));
+
/*
* Adjust RT indexes of PlanRowMarks and add to final rowmarks list
*/
@@ -348,6 +358,64 @@ set_plan_references(PlannerInfo *root, Plan *plan)
}
}
+ /* Also fix up the information in PartitionPruneInfos. */
+ foreach (lc, root->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ Bitmapset *leafpart_rtis = NULL;
+ ListCell *l;
+
+ foreach(l, pruneinfo->prune_infos)
+ {
+ List *prune_infos = lfirst(l);
+ ListCell *l2;
+
+ foreach(l2, prune_infos)
+ {
+ PartitionedRelPruneInfo *pinfo = lfirst(l2);
+ int i;
+
+ /* RT index of the partitioned table. */
+ pinfo->rtindex += rtoffset;
+
+ /* And also those of the leaf partitions. */
+ for (i = 0; i < pinfo->nparts; i++)
+ {
+ if (pinfo->rti_map[i] > 0)
+ {
+ pinfo->rti_map[i] += rtoffset;
+ leafpart_rtis = bms_add_member(leafpart_rtis,
+ pinfo->rti_map[i]);
+ }
+ }
+ }
+ }
+
+ if (pruneinfo->needs_init_pruning)
+ {
+ glob->containsInitialPruning = true;
+
+ /*
+ * Delete the leaf partition RTIs from the global set of relations
+ * to be locked before executing the plan. AcquireExecutorLocks()
+ * will find the ones to add to the set after performing initial
+ * pruning.
+ */
+ glob->minLockRelids = bms_del_members(glob->minLockRelids,
+ leafpart_rtis);
+ }
+
+ glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
+ }
+
+ /*
+ * It seems worth doing a bms_copy() on glob->minLockRelids if we deleted
+ * bits from it just above, to prevent empty tail bits from causing
+ * inefficient looping during AcquireExecutorLocks().
+ */
+ if (glob->containsInitialPruning)
+ glob->minLockRelids = bms_copy(glob->minLockRelids);
+
return result;
}
@@ -1640,21 +1708,12 @@ set_append_references(PlannerInfo *root,
aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
- if (aplan->part_prune_info)
- {
- foreach(l, aplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * PartitionPruneInfos will be added to a list in PlannerGlobal, so update
+ * the index.
+ */
+ if (aplan->part_prune_index >= 0)
+ aplan->part_prune_index += list_length(root->glob->partPruneInfos);
/* We don't need to recurse to lefttree or righttree ... */
Assert(aplan->plan.lefttree == NULL);
@@ -1712,21 +1771,12 @@ set_mergeappend_references(PlannerInfo *root,
mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
- if (mplan->part_prune_info)
- {
- foreach(l, mplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * PartitionPruneInfos will be added to a list in PlannerGlobal, so update
+ * the index.
+ */
+ if (mplan->part_prune_index >= 0)
+ mplan->part_prune_index += list_length(root->glob->partPruneInfos);
/* We don't need to recurse to lefttree or righttree ... */
Assert(mplan->plan.lefttree == NULL);
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 9d3c05aed3..952c5b8327 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,9 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -209,16 +211,20 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
/*
* make_partition_pruneinfo
- * Builds a PartitionPruneInfo which can be used in the executor to allow
- * additional partition pruning to take place. Returns NULL when
- * partition pruning would be useless.
+ * Checks if the given set of quals can be used to build pruning steps
+ * that the executor can use to prune away unneeded partitions. If
+ * suitable quals are found, a PartitionPruneInfo is built and appended
+ * to the PlannerInfo's partPruneInfos list.
+ *
+ * The return value is the 0-based index of the item added to the
+ * partPruneInfos list or -1 if nothing was added.
*
* 'parentrel' is the RelOptInfo for an appendrel, and 'subpaths' is the list
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
*/
-PartitionPruneInfo *
+int
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
List *prunequal)
@@ -230,6 +236,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool needs_init_pruning = false;
+ bool needs_exec_pruning = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -309,12 +317,16 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_needs_init_pruning;
+ bool partrel_needs_exec_pruning;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_needs_init_pruning,
+ &partrel_needs_exec_pruning);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -323,6 +335,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+
+ needs_init_pruning |= partrel_needs_init_pruning;
+ needs_exec_pruning |= partrel_needs_exec_pruning;
}
pfree(relid_subplan_map);
@@ -332,11 +347,13 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* quals, then we can just not bother with run-time pruning.
*/
if (prunerelinfos == NIL)
- return NULL;
+ return -1;
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->needs_init_pruning = needs_init_pruning;
+ pruneinfo->needs_exec_pruning = needs_exec_pruning;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -358,7 +375,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
else
pruneinfo->other_subplans = NULL;
- return pruneinfo;
+ root->partPruneInfos = lappend(root->partPruneInfos, pruneinfo);
+
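+ /* Callers store this index in their Append/MergeAppend node. */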
+ return list_length(root->partPruneInfos) - 1;
}
/*
@@ -435,13 +454,18 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *needs_init_pruning and *needs_exec_pruning are set to indicate whether
+ * the returned PartitionedRelPruneInfos contain pruning steps that can be
+ * performed before execution begins and during execution, respectively.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -452,6 +476,10 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will find out below. */
+ *needs_init_pruning = false;
+ *needs_exec_pruning = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -539,6 +567,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+ * In the first pass, we also determine whether the second pass will be
+ * necessary, by checking for the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -613,6 +644,12 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ /* record which types of pruning steps we've seen so far */
+ if (initial_pruning_steps != NIL)
+ *needs_init_pruning = true;
+ if (exec_pruning_steps != NIL)
+ *needs_exec_pruning = true;
+
pinfolist = lappend(pinfolist, pinfo);
}
@@ -640,6 +677,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *subplan_map;
int *subpart_map;
Oid *relid_map;
+ Index *rti_map;
/*
* Construct the subplan and subpart maps for this partitioning level.
@@ -652,6 +690,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
+ rti_map = (Index *) palloc0(nparts * sizeof(Index));
present_parts = NULL;
i = -1;
@@ -666,6 +705,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
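+ /* Despite the name, RelOptInfo.relid is the relation's RT index. */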
+ rti_map[i] = partrel->relid;
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
@@ -690,6 +730,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
pinfo->relid_map = relid_map;
+ pinfo->rti_map = rti_map;
}
pfree(relid_subpart_map);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 95dc2e2c83..8dc52a158f 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1603,6 +1603,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ List *part_prune_result_list;
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -1978,7 +1979,9 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
/*
* Now we can define the portal.
@@ -1993,6 +1996,9 @@ exec_bind_message(StringInfo input_message)
cplan->stmt_list,
cplan);
+ /* Copy PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_result_list);
+
/* Done with the snapshot used for parameter I/O and parsing/planning */
if (snapshot_set)
PopActiveSnapshot();
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5aa5a350f3..8cc2e2162d 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -35,7 +35,7 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
+static void ProcessQuery(PlannedStmt *plan, PartitionPruneResult *part_prune_result,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +65,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +78,8 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->part_prune_result = part_prune_result; /* ExecutorDoInitialPruning()
+ * output for plan */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +125,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * part_prune_result: ExecutorDoInitialPruning() output for the plan tree
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +138,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +150,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, part_prune_result, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -491,8 +496,13 @@ PortalStart(Portal portal, ParamListInfo params,
/*
* Create QueryDesc in portal's context; for the moment, set
* the destination to DestNone.
+ *
+ * There is no PartitionPruneResult unless the PlannedStmt is
+ * from a CachedPlan.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->part_prune_results == NIL ? NULL :
+ linitial(portal->part_prune_results),
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1225,6 +1235,8 @@ PortalRunMulti(Portal portal,
if (pstmt->utilityStmt == NULL)
{
+ PartitionPruneResult *part_prune_result = NULL;
+
/*
* process a plannable query.
*/
@@ -1271,10 +1283,18 @@ PortalRunMulti(Portal portal,
else
UpdateActiveSnapshotCommandId();
+ /*
+ * Determine if there's a corresponding PartitionPruneResult for
+ * this PlannedStmt.
+ */
+ if (portal->part_prune_results != NIL)
+ part_prune_result = list_nth(portal->part_prune_results,
+ foreach_current_index(stmtlist_item));
+
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_result,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1283,7 +1303,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_result,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 4cf6db504f..6cb473f2f4 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -99,14 +99,19 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_result_list);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv);
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_result_list);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_result_list,
+ List **lockedRelids_per_stmt);
+static void ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -790,15 +795,20 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* On a "true" return, we have acquired the locks needed to run the plan.
* (We must do this for the "true" result to be race-condition-free.)
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_result_list.
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_result_list)
{
CachedPlan *plan = plansource->gplan;
/* Assert that caller checked the querytree */
Assert(plansource->is_valid);
+ *part_prune_result_list = NIL;
+
/* If there's no generic plan, just say "false" */
if (!plan)
return false;
@@ -820,13 +830,21 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ List *lockedRelids_per_stmt;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ /*
+ * Lock relations scanned by the plan. This is where the pruning
+ * happens if needed.
+ */
+ AcquireExecutorLocks(plan->stmt_list, boundParams,
+ part_prune_result_list,
+ &lockedRelids_per_stmt);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -848,7 +866,14 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ ReleaseExecutorLocks(plan->stmt_list, lockedRelids_per_stmt);
+
+ /*
+ * The output list and any objects therein have been allocated in the
+ * caller's hopefully short-lived context, so they will not remain
+ * leaked for long; reset the list anyway so that it isn't accidentally
+ * looked at.
+ */
+ *part_prune_result_list = NIL;
}
/*
@@ -874,10 +899,15 @@ CheckCachedPlan(CachedPlanSource *plansource)
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
* (unless this is a one-shot plan, in which case we don't copy the plan).
+ *
+ * A list of NULLs is returned in *part_prune_result_list, meaning that no
+ * PartitionPruneResult nodes have yet been created for the plans in
+ * stmt_list.
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv)
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_result_list)
{
CachedPlan *plan;
List *plist;
@@ -1007,6 +1037,17 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
MemoryContextSwitchTo(oldcxt);
+ /*
+ * No actual PartitionPruneResults to add yet, but we must initialize
+ * the list to have the same number of elements as the list of
+ * PlannedStmts.
+ */
+ *part_prune_result_list = NIL;
+ foreach(lc, plist)
+ {
+ *part_prune_result_list = lappend(*part_prune_result_list, NULL);
+ }
+
return plan;
}
@@ -1126,6 +1167,17 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
+ * For every PlannedStmt found in the returned CachedPlan, an element that
+ * is either a PartitionPruneResult or NULL is added to
+ * *part_prune_result_list, if the caller passed one. It is a
+ * PartitionPruneResult if the PlannedStmt comes from an existing, otherwise
+ * valid CachedPlan and contains at least one PartitionPruneInfo that has
+ * "initial" pruning steps. Those steps are performed by calling
+ * ExecutorDoInitialPruning(), which prunes away subplans that don't match
+ * the pruning conditions, so that AcquireExecutorLocks() locks only the
+ * leaf partitions scanned by the surviving subplans. The
+ * PartitionPruneResult contains a list of bitmapsets of the indexes of
+ * matching subplans, one for each PartitionPruneInfo.
+ *
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
@@ -1139,11 +1191,13 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ List **part_prune_result_list)
{
CachedPlan *plan = NULL;
List *qlist;
bool customplan;
+ List *my_part_prune_result_list;
/* Assert caller is doing things in a sane order */
Assert(plansource->magic == CACHEDPLANSOURCE_MAGIC);
@@ -1160,7 +1214,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (CheckCachedPlan(plansource, boundParams,
+ &my_part_prune_result_list))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1169,7 +1224,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
else
{
/* Build a new generic plan */
- plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv,
+ &my_part_prune_result_list);
/* Just make real sure plansource->gplan is clear */
ReleaseGenericPlan(plansource);
/* Link the new generic plan into the plansource */
@@ -1214,7 +1270,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (customplan)
{
/* Build a custom plan */
- plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv,
+ &my_part_prune_result_list);
/* Accumulate total costs of custom plans */
plansource->total_custom_cost += cached_plan_cost(plan, true);
@@ -1246,6 +1303,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
plan->is_saved = true;
}
+ if (part_prune_result_list)
+ *part_prune_result_list = my_part_prune_result_list;
+
return plan;
}
@@ -1737,17 +1797,29 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_result_list.
+ *
+ * On return, *lockedRelids_per_stmt will contain a bitmapset for every
+ * PlannedStmt in stmt_list, containing the RT indexes of relation entries
+ * in its range table that were actually locked, or NULL if the PlannedStmt
+ * contains a utility statement.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_result_list,
+ List **lockedRelids_per_stmt)
{
ListCell *lc1;
+ *part_prune_result_list = *lockedRelids_per_stmt = NIL;
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ PartitionPruneResult *part_prune_result = NULL;
+ Bitmapset *allLockRelids;
+ Bitmapset *lockedRelids = NULL;
+ int rti;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1761,13 +1833,35 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
if (query)
- ScanQueryForLocks(query, acquire);
+ ScanQueryForLocks(query, true);
+ *part_prune_result_list = lappend(*part_prune_result_list, NULL);
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ /*
+ * Figure out the set of relations that would need to be locked
+ * before executing the plan.
+ */
+ if (plannedstmt->containsInitialPruning)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ /*
+ * Obtain the set of partitions to be locked from the
+ * PartitionPruneInfos by considering the result of performing
+ * initial partition pruning.
+ */
+			part_prune_result =
+ ExecutorDoInitialPruning(plannedstmt, boundParams);
+
+ allLockRelids = bms_union(plannedstmt->minLockRelids,
+ part_prune_result->scan_leafpart_rtis);
+ }
+ else
+ allLockRelids = plannedstmt->minLockRelids;
+
+ rti = -1;
+ while ((rti = bms_next_member(allLockRelids, rti)) > 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
if (rte->rtekind != RTE_RELATION)
continue;
@@ -1778,10 +1872,58 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
* fail if it's been dropped entirely --- we'll just transiently
* acquire a non-conflicting lock.
*/
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
+ LockRelationOid(rte->relid, rte->rellockmode);
+ lockedRelids = bms_add_member(lockedRelids, rti);
+ }
+
+ *part_prune_result_list = lappend(*part_prune_result_list,
+ part_prune_result);
+ *lockedRelids_per_stmt = lappend(*lockedRelids_per_stmt, lockedRelids);
+ }
+}
+
+/*
+ * ReleaseExecutorLocks
+ * Release locks that would've been acquired by an earlier call to
+ * AcquireExecutorLocks()
+ */
+static void
+ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt)
+{
+ ListCell *lc1,
+ *lc2;
+
+ forboth(lc1, stmt_list, lc2, lockedRelids_per_stmt)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ Bitmapset *lockedRelids = lfirst(lc2);
+ int rti;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /*
+ * Ignore utility statements, except those (such as EXPLAIN) that
+ * contain a parsed-but-not-planned query. Note: it's okay to use
+ * ScanQueryForLocks, even though the query hasn't been through
+ * rule rewriting, because rewriting doesn't change the query
+ * representation.
+ */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ if (query)
+ ScanQueryForLocks(query, false);
+ continue;
+ }
+
+ rti = -1;
+ while ((rti = bms_next_member(lockedRelids, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ Assert(rte->rtekind == RTE_RELATION);
+
+ /* See the comment in AcquireExecutorLocks(). */
+ UnlockRelationOid(rte->relid, rte->rellockmode);
}
}
}
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index d549f66d4a..1bbe6b704b 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -303,6 +303,25 @@ PortalDefineQuery(Portal portal,
portal->status = PORTAL_DEFINED;
}
+/*
+ * PortalStorePartitionPruneResults
+ * Copy the given list of PartitionPruneResults into the portal's
+ * context
+ *
+ * This allows the caller to ensure that the list exists as long as the portal
+ * does.
+ */
+void
+PortalStorePartitionPruneResults(Portal portal, List *part_prune_results)
+{
+ MemoryContext oldcxt;
+
+ AssertArg(PortalIsValid(portal));
+ oldcxt = MemoryContextSwitchTo(portal->portalContext);
+ portal->part_prune_results = copyObject(part_prune_results);
+ MemoryContextSwitchTo(oldcxt);
+}
+
/*
* PortalReleaseCachedPlan
* Release a portal's reference to its cached plan, if any.
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 666977fb1f..bbc8c42d88 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,9 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 708435e952..bd8776402e 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -45,6 +45,7 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
+ * rti_map Range table index by partition index, or 0.
* present_parts A Bitmapset of the partition indexes that we
* have subplans or subparts for.
* initial_pruning_steps List of PartitionPruneSteps used to
@@ -61,6 +62,7 @@ typedef struct PartitionedRelPruningData
int nparts;
int *subplan_map;
int *subpart_map;
+ Index *rti_map;
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
@@ -123,9 +125,13 @@ typedef struct PartitionPruneState
extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
-
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis);
+extern Bitmapset *ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index e79e2c001f..60d5644908 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,8 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ PartitionPruneResult *part_prune_result; /* ExecutorDoInitialPruning()'s
+ * output for plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +59,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 873772f188..57dc0e8077 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -185,6 +185,8 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
+extern PartitionPruneResult *ExecutorDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params);
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 94b191f8ae..a8bf908d63 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -596,6 +596,8 @@ typedef struct EState
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
+ struct PartitionPruneResult *es_part_prune_result; /* QueryDesc.part_prune_result */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
@@ -984,6 +986,34 @@ typedef struct DomainConstraintState
*/
typedef TupleTableSlot *(*ExecProcNodeMtd) (struct PlanState *pstate);
+/*----------------
+ * PartitionPruneResult
+ *
+ * The result of an ExecutorDoInitialPruning() invocation on a given
+ * PlannedStmt.
+ *
+ * Contains a list of Bitmapsets of the indexes of the subplans remaining
+ * after performing initial pruning, by calling ExecFindMatchingSubPlans()
+ * for every PartitionPruneInfo found in PlannedStmt.partPruneInfos. RT
+ * indexes of the leaf partitions scanned by those subplans across all
+ * PartitionPruneInfos are added into scan_leafpart_rtis.
+ *
+ * This is used by GetCachedPlan() to inform its callers of the pruning
+ * decisions made when performing AcquireExecutorLocks() on a given cached
+ * PlannedStmt, which the callers then pass on to the executor. When made
+ * available, the executor refers to this node while initializing the plan
+ * nodes to which those PartitionPruneInfos apply, so that the same set of
+ * qualifying subplans is initialized rather than derived again by redoing
+ * initial pruning.
+ */
+typedef struct PartitionPruneResult
+{
+ NodeTag type;
+
+ List *valid_subplan_offs_list;
+ Bitmapset *scan_leafpart_rtis;
+} PartitionPruneResult;
+
/* ----------------
* PlanState node
*
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 340d28f4e1..66416bce97 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -97,6 +97,9 @@ typedef enum NodeTag
T_PartitionPruneStepCombine,
T_PlanInvalItem,
+ /* TAGS FOR EXECUTOR PREP NODES (execnodes.h) */
+ T_PartitionPruneResult,
+
/*
* TAGS FOR PLAN STATE NODES (execnodes.h)
*
@@ -674,6 +677,7 @@ extern struct Bitmapset *readBitmapset(void);
extern uintptr_t readDatum(bool typbyval);
extern bool *readBoolCols(int numCols);
extern int *readIntCols(int numCols);
+extern Index *readIndexCols(int numCols);
extern Oid *readOidCols(int numCols);
extern int16 *readAttrNumberCols(int numCols);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index c5ab53e05c..11007cda25 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -107,6 +107,18 @@ typedef struct PlannerGlobal
List *appendRelations; /* "flat" list of AppendRelInfos */
+ List *partPruneInfos; /* List of PartitionPruneInfo contained in
+ * the plan */
+
+ bool containsInitialPruning; /* Do any of those PartitionPruneInfos
+ * have initial (pre-exec) pruning
+ * steps in them? */
+
+ Bitmapset *minLockRelids; /* Indexes of all range table entries minus
+ * indexes of range table entries of the leaf
+ * partitions scanned by prunable subplans;
+ * see AcquireExecutorLocks() */
+
List *relationOids; /* OIDs of relations the plan depends on */
List *invalItems; /* other dependencies, as PlanInvalItems */
@@ -377,6 +389,9 @@ struct PlannerInfo
/* Does this query modify any partition key columns? */
bool partColsUpdated;
+
+ /* PartitionPruneInfos added in this query's plan. */
+ List *partPruneInfos;
};
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index e43e360d9b..f8f3971f44 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -64,8 +64,20 @@ typedef struct PlannedStmt
struct Plan *planTree; /* tree of Plan nodes */
+ List *partPruneInfos; /* List of PartitionPruneInfo contained in
+ * the plan */
+
+ bool containsInitialPruning; /* Do any of those PartitionPruneInfos
+ * have initial (pre-exec) pruning
+ * steps in them? */
+
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *minLockRelids; /* Indexes of all range table entries minus
+ * indexes of range table entries of the leaf
+ * partitions scanned by prunable subplans;
+ * see AcquireExecutorLocks() */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
List *resultRelations; /* integer list of RT indexes, or NIL */
@@ -262,8 +274,8 @@ typedef struct Append
*/
int first_partial_plan;
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} Append;
/* ----------------
@@ -282,8 +294,9 @@ typedef struct MergeAppend
Oid *sortOperators; /* OIDs of operators to sort them by */
Oid *collations; /* OIDs of collations */
bool *nullsFirst; /* NULLS FIRST/LAST directions */
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} MergeAppend;
/* ----------------
@@ -1191,6 +1204,13 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * needs_init_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its initial_pruning_steps set?
+ *
+ * needs_exec_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its exec_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1199,6 +1219,8 @@ typedef struct PartitionPruneInfo
{
NodeTag type;
List *prune_infos;
+ bool needs_init_pruning;
+ bool needs_exec_pruning;
Bitmapset *other_subplans;
} PartitionPruneInfo;
@@ -1229,6 +1251,7 @@ typedef struct PartitionedRelPruneInfo
int *subplan_map; /* subplan index by partition index, or -1 */
int *subpart_map; /* subpart index by partition index, or -1 */
Oid *relid_map; /* relation OID by partition index, or 0 */
+	Index	   *rti_map;		/* range table index by partition index, or 0 */
/*
* initial_pruning_steps shows how to prune during executor startup (i.e.,
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 90684efa25..ebf0dcff8c 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -70,10 +70,10 @@ typedef struct PartitionPruneContext
#define PruneCxtStateIdx(partnatts, step_id, keyno) \
((partnatts) * (step_id) + (keyno))
-extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
- struct RelOptInfo *parentrel,
- List *subpaths,
- List *prunequal);
+extern int make_partition_pruneinfo(struct PlannerInfo *root,
+ struct RelOptInfo *parentrel,
+ List *subpaths,
+ List *prunequal);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 95b99e3d25..449200b949 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -220,7 +220,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ List **part_prune_result_list);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aeddbdafe5..9f7727a837 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,7 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *part_prune_results; /* list of PartitionPruneResults */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -242,6 +243,8 @@ extern void PortalDefineQuery(Portal portal,
CommandTag commandTag,
List *stmts,
CachedPlan *cplan);
+extern void PortalStorePartitionPruneResults(Portal portal,
+ List *part_prune_result_list);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
--
2.24.1
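
Pieced together from the plancache.c, prepare.c, and portalmem.c hunks
above, the intended caller flow is roughly the fragment below. This is a
condensed sketch, not verbatim patch code: portal, paramLI, queryEnv, dest,
query_string, and queryDesc stand in for the caller's local state, and error
handling is elided.

	List	   *part_prune_result_list;
	CachedPlan *cplan;
	ListCell   *lc1,
			   *lc2;

	/* Replan if needed; initial pruning runs inside AcquireExecutorLocks() */
	cplan = GetCachedPlan(plansource, paramLI, NULL, queryEnv,
						  &part_prune_result_list);
	Assert(list_length(cplan->stmt_list) ==
		   list_length(part_prune_result_list));

	/* Copy the results into the portal's context so they live as long as it */
	PortalStorePartitionPruneResults(portal, part_prune_result_list);

	forboth(lc1, cplan->stmt_list, lc2, portal->part_prune_results)
	{
		PlannedStmt *pstmt = lfirst_node(PlannedStmt, lc1);
		PartitionPruneResult *ppr = lfirst(lc2);	/* NULL if no initial
													 * pruning was done */

		/* The executor must see the same pruning decisions via the QueryDesc */
		if (pstmt->commandType != CMD_UTILITY)
			queryDesc = CreateQueryDesc(pstmt, ppr, query_string,
										GetActiveSnapshot(), InvalidSnapshot,
										dest, paramLI, queryEnv, 0);
	}

The key contract is that the list returned alongside the CachedPlan travels
with the PlannedStmts all the way into each QueryDesc, so the executor
initializes exactly the subplans whose partitions were locked.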
On Fri, Apr 8, 2022 at 8:45 PM Amit Langote <amitlangote09@gmail.com> wrote:
Most of the changes looked fine to me except a couple of typos, so I've
adopted those into the attached new version, even though I know it's
too late to try to apply it.

+ * XXX is it worth doing a bms_copy() on glob->minLockRelids if
+ * glob->containsInitialPruning is true? I'm slightly worried that the
+ * Bitmapset could have a very long empty tail resulting in excessive
+ * looping during AcquireExecutorLocks().
+ */

I guess I trust your instincts about bitmapset operation efficiency
and what you've written here makes sense. It's typical for leaf
partitions to have been appended toward the tail end of rtable and I'd
imagine their indexes would be in the tail words of minLockRelids. If
copying the bitmapset removes those useless words, I don't see why we
shouldn't do that. So I added:

+	/*
+	 * It seems worth doing a bms_copy() on glob->minLockRelids if we deleted
+	 * bits from it just above, to prevent empty tail bits resulting in
+	 * inefficient looping during AcquireExecutorLocks().
+	 */
+	if (glob->containsInitialPruning)
+		glob->minLockRelids = bms_copy(glob->minLockRelids)

Not 100% sure about the comment I wrote.
And the quoted code change missed a semicolon in the v14 that I
hurriedly sent on Friday. (Had apparently forgotten to `git add` the
hunk to fix that).
Sending v15 that fixes that to keep the cfbot green for now.
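
To make the bitmapset looping concern discussed above concrete: a
bms_next_member()-style scan visits one word per loop iteration up to
nwords, so trailing all-zero words left behind after deleting the prunable
leaf partitions' bits get paid for on every scan. The toy program below is
a simplified standalone model, not PostgreSQL's bitmapset.c; toy_bms and
toy_next_member are invented for illustration, though the real
bms_next_member() skips through words in the same spirit.

	#include <stdint.h>
	#include <stdio.h>
	#include <stdlib.h>

	/* Toy stand-in for a Bitmapset; nwords may include all-zero tail words. */
	typedef struct
	{
		int			nwords;
		uint64_t	words[];
	} toy_bms;

	/* Smallest member greater than prevbit, or -1 if none. */
	static int
	toy_next_member(const toy_bms *set, int prevbit)
	{
		prevbit++;
		for (int w = prevbit / 64; w < set->nwords; w++)
		{
			uint64_t	word = set->words[w];

			/* clear bits below the starting bit within its own word */
			if (w == prevbit / 64)
				word &= ~UINT64_C(0) << (prevbit % 64);
			if (word != 0)
				return w * 64 + __builtin_ctzll(word);	/* GCC/Clang builtin */
			/* every all-zero tail word still costs one loop iteration */
		}
		return -1;
	}

	int
	main(void)
	{
		enum {NWORDS = 1024};	/* one live word plus a long empty tail */
		toy_bms    *set = calloc(1, sizeof(toy_bms) + NWORDS * sizeof(uint64_t));

		set->nwords = NWORDS;
		set->words[0] = UINT64_C(1) << 3;

		/* the terminating probe walks all 1024 words before returning -1 */
		for (int b = -1; (b = toy_next_member(set, b)) >= 0;)
			printf("member %d\n", b);

		free(set);
		return 0;
	}

Under this model, sizing the set's word array to its last nonzero word,
which is what the bms_copy() above hopes to achieve, bounds each scan by the
highest surviving member rather than by the original rtable length.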
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v15-0001-Optimize-AcquireExecutorLocks-to-skip-pruned-par.patch (application/octet-stream)
From e974c27abda9c53744b93f2c6e0f1083ddeedbba Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v15] Optimize AcquireExecutorLocks() to skip pruned partitions
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 2 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/prepare.c | 26 ++-
src/backend/executor/README | 27 +++
src/backend/executor/execMain.c | 48 +++++
src/backend/executor/execParallel.c | 28 ++-
src/backend/executor/execPartition.c | 241 ++++++++++++++++++++----
src/backend/executor/execUtils.c | 2 +
src/backend/executor/functions.c | 2 +-
src/backend/executor/nodeAppend.c | 15 +-
src/backend/executor/nodeMergeAppend.c | 9 +-
src/backend/executor/spi.c | 27 ++-
src/backend/nodes/copyfuncs.c | 33 +++-
src/backend/nodes/outfuncs.c | 36 +++-
src/backend/nodes/readfuncs.c | 56 +++++-
src/backend/optimizer/plan/createplan.c | 24 +--
src/backend/optimizer/plan/planner.c | 3 +
src/backend/optimizer/plan/setrefs.c | 112 ++++++++---
src/backend/partitioning/partprune.c | 59 +++++-
src/backend/tcop/postgres.c | 8 +-
src/backend/tcop/pquery.c | 28 ++-
src/backend/utils/cache/plancache.c | 184 +++++++++++++++---
src/backend/utils/mmgr/portalmem.c | 19 ++
src/include/commands/explain.h | 4 +-
src/include/executor/execPartition.h | 12 +-
src/include/executor/execdesc.h | 3 +
src/include/executor/executor.h | 2 +
src/include/nodes/execnodes.h | 30 +++
src/include/nodes/nodes.h | 4 +
src/include/nodes/pathnodes.h | 15 ++
src/include/nodes/plannodes.h | 31 ++-
src/include/partitioning/partprune.h | 8 +-
src/include/utils/plancache.h | 3 +-
src/include/utils/portal.h | 3 +
37 files changed, 950 insertions(+), 167 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 55c38b04c4..d403eb2309 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -542,7 +542,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 9abbb6b555..f6607f2454 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index d2a2479822..35dd24adf8 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -407,7 +407,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
@@ -515,7 +515,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, PartitionPruneResult *part_prune_result,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
@@ -563,7 +564,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, part_prune_result, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 1013790dbb..54734a3a93 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -776,7 +776,7 @@ execute_sql_string(const char *sql)
{
QueryDesc *qdesc;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 9ab248d25e..2be1782bc4 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -416,7 +416,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 80738547ed..c7360712b1 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -155,6 +155,7 @@ ExecuteQuery(ParseState *pstate,
PreparedStatement *entry;
CachedPlan *cplan;
List *plan_list;
+ List *part_prune_result_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
Portal portal;
@@ -193,7 +194,10 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
plan_list = cplan->stmt_list;
/*
@@ -207,6 +211,9 @@ ExecuteQuery(ParseState *pstate,
plan_list,
cplan);
+ /* Copy PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_result_list);
+
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
* statement is one that produces tuples. Currently we insist that it be
@@ -576,7 +583,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
- ListCell *p;
+ List *part_prune_result_list;
+ ListCell *p,
+ *pp;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -619,7 +628,10 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Replan if needed, and acquire a transient refcount */
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, queryEnv);
+ CurrentResourceOwner, queryEnv,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -634,13 +646,15 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
plan_list = cplan->stmt_list;
/* Explain each query */
- foreach(p, plan_list)
+ forboth(p, plan_list, pp, part_prune_result_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ PartitionPruneResult *part_prune_result = lfirst_node(PartitionPruneResult, pp);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ ExplainOnePlan(pstmt, part_prune_result, into, es, query_string,
+ paramLI, queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 0b5183fc4a..e0802be723 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -65,6 +65,29 @@ found there. This currently only occurs for Append and MergeAppend nodes. In
this case the non-required subplans are ignored and the executor state's
subnode array will become out of sequence to the plan's subplan list.
+Actually, the so-called execution time pruning may also occur even before the
+execution has started. One case where that occurs is when a cached generic
+plan is being validated for execution by plancache.c: GetCachedPlan(), which
+proceeds by locking all the relations that will be scanned by that plan. If
+the generic plan contains nodes that can perform execution time partition
+pruning (that is, contain a PartitionPruneInfo), a subset of pruning steps
+contained in the PartitionPruneInfos that do not depend on execution actually
+having started (called "initial" pruning steps) are performed at this point
+to figure out the minimal set of child subplans that satisfy those pruning
+instructions. AcquireExecutorLocks() looking at a particular plan will then
+lock only the relations scanned by those surviving subplans (along with those
+present in PlannedStmt.minLockRelids), and ignore those scanned by the pruned
+subplans, even though the pruned subplans themselves are not removed from the
+plan tree. The result of pruning (that is, the set of indexes of surviving
+subplans in their parent's list of child subplans) is saved as a list of
+bitmapsets, with one element for every PartitionPruneInfo referenced in the
+plan (PlannedStmt.partPruneInfos). The list is packaged into a
+PartitionPruneResult node, which is passed along with the PlannedStmt to the
+executor via the QueryDesc. It is imperative that the executor and any third
+party code invoked by it that gets passed the plan tree look at the plan's
+PartitionPruneResult to determine whether a particular child subplan of a
+parent node that supports pruning is valid for a given execution.
+
Each Plan node may have expression trees associated with it, to represent
its target list, qualification conditions, etc. These trees are also
read-only to the executor, but the executor state for expression evaluation
@@ -286,6 +309,10 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ [ ExecutorDoInitialPruning ] --- an optional step to perform initial
+	  partition pruning on the plan tree, the result of which is passed
+	  to the executor via the QueryDesc
+
CreateQueryDesc
ExecutorStart
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ef2fd46092..5ee978937d 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -49,11 +49,13 @@
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/execdebug.h"
+#include "executor/execPartition.h"
#include "executor/nodeSubplan.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/nodeFuncs.h"
#include "parser/parsetree.h"
#include "storage/bufmgr.h"
#include "storage/lmgr.h"
@@ -104,6 +106,49 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
/* end of local decls */
+/* ----------------------------------------------------------------
+ * ExecutorDoInitialPruning
+ *
+ * Performs initial partition pruning to figure out the minimal set of
+ * subplans to be executed and the set of RT indexes of the corresponding
+ * leaf partitions
+ *
+ * The returned PartitionPruneResult must subsequently be passed to the
+ * executor so that it can reuse the result of pruning. It's important that
+ * the executor has the same view of which partitions are initially pruned
+ * (by not doing the pruning again itself); otherwise it risks initializing
+ * subplans whose partitions would not have been locked.
+ *
+ * Note: Partitioned tables mentioned in PartitionedRelPruneInfo nodes that
+ * drive the pruning will be locked before doing the pruning.
+ *
+ * ----------------------------------------------------------------
+ */
+PartitionPruneResult *
+ExecutorDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params)
+{
+ PartitionPruneResult *result;
+ ListCell *lc;
+
+ /* Only get here if there is any pruning to do. */
+ Assert(plannedstmt->containsInitialPruning);
+
+ result = makeNode(PartitionPruneResult);
+ foreach(lc, plannedstmt->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ Bitmapset *valid_subplan_offs;
+
+ valid_subplan_offs =
+ ExecPartitionDoInitialPruning(plannedstmt, params, pruneinfo,
+ &result->scan_leafpart_rtis);
+ result->valid_subplan_offs_list =
+ lappend(result->valid_subplan_offs_list,
+ valid_subplan_offs);
+ }
+
+ return result;
+}
/* ----------------------------------------------------------------
* ExecutorStart
@@ -806,6 +851,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ PartitionPruneResult *part_prune_result = queryDesc->part_prune_result;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -825,6 +871,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable);
estate->es_plannedstmt = plannedstmt;
+ estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ estate->es_part_prune_result = part_prune_result;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 9a0d5d59ef..805f86c503 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -66,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITIONPRUNERESULT UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -182,7 +183,9 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->containsInitialPruning = false;
pstmt->planTree = plan;
+ pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
pstmt->resultRelations = NIL;
pstmt->appendRelations = NIL;
@@ -596,12 +599,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_result_data;
+ char *part_prune_result_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_result_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -630,6 +636,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_result_data = nodeToString(estate->es_part_prune_result);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -656,6 +663,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized PartitionPruneResult. */
+ part_prune_result_len = strlen(part_prune_result_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_result_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -750,6 +762,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized PartitionPruneResult */
+ part_prune_result_space = shm_toc_allocate(pcxt->toc, part_prune_result_len);
+ memcpy(part_prune_result_space, part_prune_result_data, part_prune_result_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITIONPRUNERESULT,
+ part_prune_result_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1231,8 +1249,10 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_result_space;
char *paramspace;
PlannedStmt *pstmt;
+ PartitionPruneResult *part_prune_result;
ParamListInfo paramLI;
char *queryString;
@@ -1243,12 +1263,18 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
pstmtspace = shm_toc_lookup(toc, PARALLEL_KEY_PLANNEDSTMT, false);
pstmt = (PlannedStmt *) stringToNode(pstmtspace);
+ /* Reconstruct leader-supplied PartitionPruneResult. */
+ part_prune_result_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITIONPRUNERESULT, false);
+ part_prune_result = (PartitionPruneResult *)
+ stringToNode(part_prune_result_space);
+
/* Reconstruct ParamListInfo. */
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(¶mspace);
/* Create a QueryDesc for the query. */
- return CreateQueryDesc(pstmt,
+ return CreateQueryDesc(pstmt, part_prune_result,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 615bd80973..af87b9197f 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -25,6 +25,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -185,7 +186,11 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *pruneinfo);
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -198,7 +203,8 @@ static void PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans);
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis);
/*
@@ -1587,8 +1593,10 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* considered to be a stable expression, it can change value from one plan
* node scan to the next during query execution. Stable comparison
* expressions that don't involve such Params allow partition pruning to be
- * done once during executor startup. Expressions that do involve such Params
- * require us to prune separately for each scan of the parent plan node.
+ * done once during executor startup, or during ExecutorDoInitialPruning(),
+ * which runs as part of performing AcquireExecutorLocks() on a given plan
+ * tree.
+ * Expressions that do involve such Params require us to prune separately for
+ * each scan of the parent plan node.
*
* Note that pruning away unneeded subplans during executor startup has the
* added benefit of not having to initialize the unneeded subplans at all.
@@ -1605,6 +1613,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* account for initial pruning possibly having eliminated some of the
* subplans.
*
+ * ExecPartitionDoInitialPruning:
+ *		Do initial pruning with the information contained in a given
+ *		PartitionPruneInfo to determine the minimal set of child subplans
+ *		of the parent plan node (to which the PartitionPruneInfo belongs)
+ *		that must be executed, along with the set of RT indexes of the
+ *		leaf partitions that will be scanned by those subplans.
+ *
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating the expressions
* that are safe to evaluate at a given point. This function is first
@@ -1622,8 +1637,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* On return, *initially_valid_subplans is assigned the set of indexes of
* child subplans that must be initialized along with the parent plan node.
- * Initial pruning is performed here if needed and in that case only the
- * surviving subplans' indexes are added.
+ * Initial pruning is performed here if needed (unless it has already been
+ * done by ExecutorDoInitialPruning()), and in that case only the surviving
+ * subplans' indexes are added.
*
* If subplans are indeed pruned, subplan_map arrays contained in the returned
* PartitionPruneState are re-sequenced to not count those, though only if the
@@ -1632,29 +1648,66 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
PartitionPruneState *
ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
Bitmapset **initially_valid_subplans)
{
- PartitionPruneState *prunestate;
EState *estate = planstate->state;
+ PartitionPruneInfo *pruneinfo = list_nth(estate->es_part_prune_infos,
+ part_prune_index);
+ PartitionPruneResult *pruneresult = estate->es_part_prune_result;
+ PartitionPruneState *prunestate;
+ bool do_pruning = (pruneinfo->needs_init_pruning ||
+ pruneinfo->needs_exec_pruning);
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, planstate);
+ /*
+ * No need to do initial pruning if it was done already by
+ * ExecutorDoInitialPruning(), which it would be if es_part_prune_result
+ * has been set.
+ */
+ if (pruneresult)
+ do_pruning = pruneinfo->needs_exec_pruning;
+
+ prunestate = NULL;
+ if (do_pruning)
+ {
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
- /* Create the working data structure for pruning */
- prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ /* For data reading, executor always omits detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
+ /*
+ * Create the working data structure for pruning. No need to consider
+ * initial pruning steps if we have a PartitionPruneResult.
+ */
+ prunestate = CreatePartitionPruneState(planstate, pruneinfo,
+ pruneresult == NULL, true,
+ NIL, planstate->ps_ExprContext,
+ estate->es_partition_directory);
+ }
/*
* Perform an initial partition prune pass, if required.
*/
- if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ if (pruneresult)
+ {
+ *initially_valid_subplans =
+ list_nth(pruneresult->valid_subplan_offs_list, part_prune_index);
+ }
+ else if (prunestate && prunestate->do_initial_prune)
+ {
+ *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true,
+ NULL);
+ }
else
{
/* No pruning, so we'll need to initialize all subplans */
Assert(n_total_subplans > 0);
*initially_valid_subplans = bms_add_range(NULL, 0,
n_total_subplans - 1);
+ return prunestate;
}
/*
@@ -1662,7 +1715,8 @@ ExecInitPartitionPruning(PlanState *planstate,
* that were removed above due to initial pruning. No need to do this if
* no steps were removed.
*/
- if (bms_num_members(*initially_valid_subplans) < n_total_subplans)
+ if (prunestate &&
+ bms_num_members(*initially_valid_subplans) < n_total_subplans)
{
/*
* We can safely skip this when !do_exec_prune, even though that
@@ -1678,11 +1732,74 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecPartitionDoInitialPruning
+ *		Perform initial pruning using the given PartitionPruneInfo to
+ *		determine the minimal set of child subplans of the parent plan node
+ *		(to which the PartitionPruneInfo belongs) that must be executed,
+ *		along with the set of RT indexes of the leaf partitions that will
+ *		be scanned by those subplans.
+ */
+Bitmapset *
+ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *rtable = plannedstmt->rtable;
+ ExprContext *econtext;
+ PartitionDirectory pdir;
+ MemoryContext oldcontext,
+ tmpcontext;
+ PartitionPruneState *prunestate;
+ Bitmapset *valid_subplan_offs;
+
+ /*
+ * A temporary context for memory allocations required while executing
+ * partition pruning steps.
+ */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "initial pruning working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
+
+ /*
+ * PartitionDirectory to look up partition descriptors, which omits
+ * detached partitions, just like in the executor proper.
+ */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+
+ /*
+ * We don't yet have a PlanState for the parent plan node, so we must
+ * create a standalone ExprContext to evaluate pruning expressions,
+ * equipped with the information about the EXTERN parameters that the
+ * caller passed us. Note that that's okay because the initial pruning
+ * steps do not contain anything that requires the execution to have
+ * started.
+ */
+ econtext = CreateStandaloneExprContext();
+ econtext->ecxt_param_list_info = params;
+ prunestate = CreatePartitionPruneState(NULL, pruneinfo, true, false,
+ rtable, econtext, pdir);
+ MemoryContextSwitchTo(oldcontext);
+
+ /* Do the initial pruning. */
+ valid_subplan_offs = ExecFindMatchingSubPlans(prunestate, true,
+ scan_leafpart_rtis);
+
+ FreeExprContext(econtext, true);
+ DestroyPartitionDirectory(pdir);
+ MemoryContextDelete(tmpcontext);
+
+ return valid_subplan_offs;
+}
+
/*
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
*
- * 'planstate' is the parent plan node's execution state.
+ * 'planstate', if not NULL, is the parent plan node's execution state. It
+ * can be NULL if being called before ExecutorStart(), in which case,
+ * 'rtable' (range table), 'econtext', and 'partdir' must be explicitly
+ * provided.
*
* 'pruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
@@ -1696,19 +1813,21 @@ ExecInitPartitionPruning(PlanState *planstate,
* PartitionedRelPruneInfo.
*/
static PartitionPruneState *
-CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
+CreatePartitionPruneState(PlanState *planstate,
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir)
{
- EState *estate = planstate->state;
+ EState *estate = planstate ? planstate->state : NULL;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
- /* For data reading, executor always omits detached partitions */
- if (estate->es_partition_directory == NULL)
- estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
+ Assert((estate != NULL) ||
+ (partdir != NULL && econtext != NULL && rtable != NIL));
n_part_hierarchies = list_length(pruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
@@ -1763,15 +1882,42 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+			 * Must open the relation ourselves when called before
+			 * execution has started, such as when called during
+ * ExecutorDoInitialPruning() on a cached plan. In that case,
+ * sub-partitions must be locked, because AcquirePlannerLocks()
+ * would not have seen them. (1st relation in a partrelpruneinfos
+ * list is always the root partitioned table appearing in the
+ * query, which AcquirePlannerLocks() would have locked; the
+ * Assert in relation_open() guards that assumption.)
+ */
+ if (estate == NULL)
+ {
+ RangeTblEntry *rte = rt_fetch(pinfo->rtindex, rtable);
+ int lockmode = (j == 0) ? NoLock : rte->rellockmode;
+
+ partrel = table_open(rte->relid, lockmode);
+ }
+ else
+ partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+
+ /*
+			 * We can rely on the copy of the partitioned table's partition
+			 * key in its relcache entry, because it can't change (or get
+			 * destroyed) as long as the relation is locked. The partition
+			 * descriptor is taken from the PartitionDirectory associated
+			 * with the table, which is held open long enough for the
+			 * descriptor to remain valid while it's used to perform the
+			 * pruning steps.
*/
- partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
partkey = RelationGetPartitionKey(partrel);
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partdesc = PartitionDirectoryLookup(partdir, partrel);
+
+ /*
+ * Must close partrel, keeping the lock taken, if we're not using
+ * EState's entry.
+ */
+ if (estate == NULL)
+ table_close(partrel, NoLock);
/*
* Initialize the subplan_map and subpart_map.
@@ -1785,6 +1931,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
Assert(partdesc->nparts >= pinfo->nparts);
pprune->nparts = partdesc->nparts;
pprune->subplan_map = palloc(sizeof(int) * partdesc->nparts);
+ pprune->rti_map = palloc(sizeof(Index) * partdesc->nparts);
if (partdesc->nparts == pinfo->nparts)
{
/*
@@ -1795,6 +1942,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pprune->subpart_map = pinfo->subpart_map;
memcpy(pprune->subplan_map, pinfo->subplan_map,
sizeof(int) * pinfo->nparts);
+ memcpy(pprune->rti_map, pinfo->rti_map,
+				   sizeof(Index) * pinfo->nparts);
/*
* Double-check that the list of unpruned relations has not
@@ -1845,6 +1994,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pinfo->subplan_map[pd_idx];
pprune->subpart_map[pp_idx] =
pinfo->subpart_map[pd_idx];
+ pprune->rti_map[pp_idx] =
+ pinfo->rti_map[pd_idx];
pd_idx++;
}
else
@@ -1852,6 +2003,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
/* this partdesc entry is not in the plan */
pprune->subplan_map[pp_idx] = -1;
pprune->subpart_map[pp_idx] = -1;
+ pprune->rti_map[pp_idx] = 0;
}
}
@@ -1873,7 +2025,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* Initialize pruning contexts as needed.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
- if (pinfo->initial_pruning_steps)
+ if (consider_initial_steps && pinfo->initial_pruning_steps)
{
InitPartitionPruneContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
@@ -1883,7 +2035,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps)
+ if (consider_exec_steps && pinfo->exec_pruning_steps)
{
InitPartitionPruneContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
@@ -2111,10 +2263,14 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
* runtime pruning.
+ *
+ * RT indexes of leaf partitions scanned by the chosen subplans are added to
+ * *scan_leafpart_rtis if the pointer is non-NULL.
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2149,7 +2305,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
*/
pprune = &prunedata->partrelprunedata[0];
find_matching_subplans_recurse(prunedata, pprune, initial_prune,
- &result);
+ &result, scan_leafpart_rtis);
/* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
@@ -2163,6 +2319,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_copy(*scan_leafpart_rtis);
MemoryContextReset(prunestate->prune_context);
@@ -2173,13 +2331,15 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* find_matching_subplans_recurse
* Recursive worker function for ExecFindMatchingSubPlans
*
- * Adds valid (non-prunable) subplan IDs to *validsubplans
+ * Adds valid (non-prunable) subplan IDs to *validsubplans and RT indexes of
+ * the corresponding leaf partitions to *scan_leafpart_rtis (if asked for).
*/
static void
find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans)
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *partset;
int i;
@@ -2206,8 +2366,14 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
while ((i = bms_next_member(partset, i)) >= 0)
{
if (pprune->subplan_map[i] >= 0)
+ {
*validsubplans = bms_add_member(*validsubplans,
pprune->subplan_map[i]);
+ Assert(pprune->rti_map[i] > 0);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_add_member(*scan_leafpart_rtis,
+ pprune->rti_map[i]);
+ }
else
{
int partidx = pprune->subpart_map[i];
@@ -2215,7 +2381,8 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
if (partidx >= 0)
find_matching_subplans_recurse(prunedata,
&prunedata->partrelprunedata[partidx],
- initial_prune, validsubplans);
+ initial_prune, validsubplans,
+ scan_leafpart_rtis);
else
{
/*
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9df1f81ea8..f9c7976ff2 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -119,6 +119,8 @@ CreateExecutorState(void)
estate->es_relations = NULL;
estate->es_rowmarks = NULL;
estate->es_plannedstmt = NULL;
+ estate->es_part_prune_infos = NIL;
+ estate->es_part_prune_result = NULL;
estate->es_junkFilter = NULL;
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index f9460ae506..a2182a6b1f 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -844,7 +844,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
- es->qd = CreateQueryDesc(es->stmt,
+ es->qd = CreateQueryDesc(es->stmt, NULL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 357e10a1d7..96880e122a 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -134,7 +134,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_begun = false;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -145,7 +145,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&appendstate->ps,
list_length(node->appendplans),
- node->part_prune_info,
+ node->part_prune_index,
&validsubplans);
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
@@ -155,7 +155,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (appendstate->as_prune_state == NULL ||
+ (!appendstate->as_prune_state->do_exec_prune && nplans > 0))
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -577,7 +578,7 @@ choose_next_subplan_locally(AppendState *node)
}
else if (node->as_valid_subplans == NULL)
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
whichplan = -1;
}
@@ -642,7 +643,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
/*
* Mark each invalid plan as finished to allow the loop below to
@@ -717,7 +718,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
mark_invalid_subplans_as_finished(node);
}
@@ -868,7 +869,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
classify_matching_subplans(node);
}
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index ecf9052e03..7708cfffda 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -82,7 +82,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.ExecProcNode = ExecMergeAppend;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -93,7 +93,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&mergestate->ps,
list_length(node->mergeplans),
- node->part_prune_info,
+ node->part_prune_index,
&validsubplans);
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
@@ -103,7 +103,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (mergestate->ms_prune_state == NULL ||
+ (!mergestate->ms_prune_state->do_exec_prune && nplans > 0))
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -218,7 +219,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 042a5f8b0a..729e2fd7b2 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1578,6 +1578,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
CachedPlanSource *plansource;
CachedPlan *cplan;
List *stmt_list;
+ List *part_prune_result_list;
char *query_string;
Snapshot snapshot;
MemoryContext oldcontext;
@@ -1657,7 +1658,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1685,6 +1689,9 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
stmt_list,
cplan);
+ /* Copy PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_result_list);
+
/*
* Set up options for portal. Default SCROLL type is chosen the same way
* as PerformCursorOpen does it.
@@ -2092,7 +2099,8 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
/* Get the generic plan for the query */
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ NULL /* Not interested in PartitionPruneResults */);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2473,7 +2481,9 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
- ListCell *lc2;
+ List *part_prune_result_list;
+ ListCell *lc2,
+ *lc3;
spicallbackarg.query = plansource->query_string;
@@ -2549,8 +2559,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* plan, the refcount must be backed by the plan_owner.
*/
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
-
+ plan_owner, _SPI_current->queryEnv,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
stmt_list = cplan->stmt_list;
/*
@@ -2589,9 +2601,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, part_prune_result_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ PartitionPruneResult *part_prune_result = lfirst_node(PartitionPruneResult, lc3);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2663,7 +2676,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
else
snap = InvalidSnapshot;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, part_prune_result,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 836f427ea8..59a7054011 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -96,7 +96,10 @@ _copyPlannedStmt(const PlannedStmt *from)
COPY_SCALAR_FIELD(parallelModeNeeded);
COPY_SCALAR_FIELD(jitFlags);
COPY_NODE_FIELD(planTree);
+ COPY_NODE_FIELD(partPruneInfos);
+ COPY_SCALAR_FIELD(containsInitialPruning);
COPY_NODE_FIELD(rtable);
+ COPY_BITMAPSET_FIELD(minLockRelids);
COPY_NODE_FIELD(resultRelations);
COPY_NODE_FIELD(appendRelations);
COPY_NODE_FIELD(subplans);
@@ -253,7 +256,7 @@ _copyAppend(const Append *from)
COPY_NODE_FIELD(appendplans);
COPY_SCALAR_FIELD(nasyncplans);
COPY_SCALAR_FIELD(first_partial_plan);
- COPY_NODE_FIELD(part_prune_info);
+ COPY_SCALAR_FIELD(part_prune_index);
return newnode;
}
@@ -281,7 +284,7 @@ _copyMergeAppend(const MergeAppend *from)
COPY_POINTER_FIELD(sortOperators, from->numCols * sizeof(Oid));
COPY_POINTER_FIELD(collations, from->numCols * sizeof(Oid));
COPY_POINTER_FIELD(nullsFirst, from->numCols * sizeof(bool));
- COPY_NODE_FIELD(part_prune_info);
+ COPY_SCALAR_FIELD(part_prune_index);
return newnode;
}
@@ -1283,6 +1286,8 @@ _copyPartitionPruneInfo(const PartitionPruneInfo *from)
PartitionPruneInfo *newnode = makeNode(PartitionPruneInfo);
COPY_NODE_FIELD(prune_infos);
+ COPY_SCALAR_FIELD(needs_init_pruning);
+ COPY_SCALAR_FIELD(needs_exec_pruning);
COPY_BITMAPSET_FIELD(other_subplans);
return newnode;
@@ -1299,6 +1304,7 @@ _copyPartitionedRelPruneInfo(const PartitionedRelPruneInfo *from)
COPY_POINTER_FIELD(subplan_map, from->nparts * sizeof(int));
COPY_POINTER_FIELD(subpart_map, from->nparts * sizeof(int));
COPY_POINTER_FIELD(relid_map, from->nparts * sizeof(Oid));
+ COPY_POINTER_FIELD(rti_map, from->nparts * sizeof(Index));
COPY_NODE_FIELD(initial_pruning_steps);
COPY_NODE_FIELD(exec_pruning_steps);
COPY_BITMAPSET_FIELD(execparamids);
@@ -5473,6 +5479,21 @@ _copyExtensibleNode(const ExtensibleNode *from)
return newnode;
}
+/* ****************************************************************
+ * execnodes.h copy functions
+ * ****************************************************************
+ */
+static PartitionPruneResult *
+_copyPartitionPruneResult(const PartitionPruneResult *from)
+{
+ PartitionPruneResult *newnode = makeNode(PartitionPruneResult);
+
+ COPY_NODE_FIELD(valid_subplan_offs_list);
+ COPY_BITMAPSET_FIELD(scan_leafpart_rtis);
+
+ return newnode;
+}
+
/* ****************************************************************
* value.h copy functions
* ****************************************************************
@@ -5527,7 +5548,6 @@ _copyBitString(const BitString *from)
return newnode;
}
-
static ForeignKeyCacheInfo *
_copyForeignKeyCacheInfo(const ForeignKeyCacheInfo *from)
{
@@ -6569,6 +6589,13 @@ copyObjectImpl(const void *from)
retval = _copyPublicationTable(from);
break;
+ /*
+ * EXECUTION NODES
+ */
+ case T_PartitionPruneResult:
+ retval = _copyPartitionPruneResult(from);
+ break;
+
/*
* MISCELLANEOUS NODES
*/
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index d5f5e76c55..3dada68291 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -314,7 +314,10 @@ _outPlannedStmt(StringInfo str, const PlannedStmt *node)
WRITE_BOOL_FIELD(parallelModeNeeded);
WRITE_INT_FIELD(jitFlags);
WRITE_NODE_FIELD(planTree);
+ WRITE_NODE_FIELD(partPruneInfos);
+ WRITE_BOOL_FIELD(containsInitialPruning);
WRITE_NODE_FIELD(rtable);
+ WRITE_BITMAPSET_FIELD(minLockRelids);
WRITE_NODE_FIELD(resultRelations);
WRITE_NODE_FIELD(appendRelations);
WRITE_NODE_FIELD(subplans);
@@ -443,7 +446,7 @@ _outAppend(StringInfo str, const Append *node)
WRITE_NODE_FIELD(appendplans);
WRITE_INT_FIELD(nasyncplans);
WRITE_INT_FIELD(first_partial_plan);
- WRITE_NODE_FIELD(part_prune_info);
+ WRITE_INT_FIELD(part_prune_index);
}
static void
@@ -460,7 +463,7 @@ _outMergeAppend(StringInfo str, const MergeAppend *node)
WRITE_OID_ARRAY(sortOperators, node->numCols);
WRITE_OID_ARRAY(collations, node->numCols);
WRITE_BOOL_ARRAY(nullsFirst, node->numCols);
- WRITE_NODE_FIELD(part_prune_info);
+ WRITE_INT_FIELD(part_prune_index);
}
static void
@@ -1009,6 +1012,8 @@ _outPartitionPruneInfo(StringInfo str, const PartitionPruneInfo *node)
WRITE_NODE_TYPE("PARTITIONPRUNEINFO");
WRITE_NODE_FIELD(prune_infos);
+ WRITE_BOOL_FIELD(needs_init_pruning);
+ WRITE_BOOL_FIELD(needs_exec_pruning);
WRITE_BITMAPSET_FIELD(other_subplans);
}
@@ -1023,6 +1028,7 @@ _outPartitionedRelPruneInfo(StringInfo str, const PartitionedRelPruneInfo *node)
WRITE_INT_ARRAY(subplan_map, node->nparts);
WRITE_INT_ARRAY(subpart_map, node->nparts);
WRITE_OID_ARRAY(relid_map, node->nparts);
+ WRITE_INDEX_ARRAY(rti_map, node->nparts);
WRITE_NODE_FIELD(initial_pruning_steps);
WRITE_NODE_FIELD(exec_pruning_steps);
WRITE_BITMAPSET_FIELD(execparamids);
@@ -2425,6 +2431,9 @@ _outPlannerGlobal(StringInfo str, const PlannerGlobal *node)
WRITE_NODE_FIELD(finalrowmarks);
WRITE_NODE_FIELD(resultRelations);
WRITE_NODE_FIELD(appendRelations);
+ WRITE_NODE_FIELD(partPruneInfos);
+ WRITE_BOOL_FIELD(containsInitialPruning);
+ WRITE_BITMAPSET_FIELD(minLockRelids);
WRITE_NODE_FIELD(relationOids);
WRITE_NODE_FIELD(invalItems);
WRITE_NODE_FIELD(paramExecTypes);
@@ -2492,6 +2501,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_BITMAPSET_FIELD(curOuterRels);
WRITE_NODE_FIELD(curOuterParams);
WRITE_BOOL_FIELD(partColsUpdated);
+ WRITE_NODE_FIELD(partPruneInfos);
}
static void
@@ -2845,6 +2855,21 @@ _outExtensibleNode(StringInfo str, const ExtensibleNode *node)
methods->nodeOut(str, node);
}
+/*****************************************************************************
+ *
+ * Stuff from execnodes.h
+ *
+ *****************************************************************************/
+
+static void
+_outPartitionPruneResult(StringInfo str, const PartitionPruneResult *node)
+{
+ WRITE_NODE_TYPE("PARTITIONPRUNERESULT");
+
+ WRITE_NODE_FIELD(valid_subplan_offs_list);
+ WRITE_BITMAPSET_FIELD(scan_leafpart_rtis);
+}
+
/*****************************************************************************
*
* Stuff from parsenodes.h.
@@ -4754,6 +4779,13 @@ outNode(StringInfo str, const void *obj)
_outJsonTableSibling(str, obj);
break;
+ /*
+ * EXECUTION NODES
+ */
+ case T_PartitionPruneResult:
+ _outPartitionPruneResult(str, obj);
+ break;
+
default:
/*
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 3d150cb25d..6a6fcec03b 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -164,6 +164,11 @@
token = pg_strtok(&length); /* skip :fldname */ \
local_node->fldname = readIntCols(len)
+/* Read an Index array */
+#define READ_INDEX_ARRAY(fldname, len) \
+ token = pg_strtok(&length); /* skip :fldname */ \
+ local_node->fldname = readIndexCols(len)
+
/* Read a bool array */
#define READ_BOOL_ARRAY(fldname, len) \
token = pg_strtok(&length); /* skip :fldname */ \
@@ -1815,7 +1820,10 @@ _readPlannedStmt(void)
READ_BOOL_FIELD(parallelModeNeeded);
READ_INT_FIELD(jitFlags);
READ_NODE_FIELD(planTree);
+ READ_NODE_FIELD(partPruneInfos);
+ READ_BOOL_FIELD(containsInitialPruning);
READ_NODE_FIELD(rtable);
+ READ_BITMAPSET_FIELD(minLockRelids);
READ_NODE_FIELD(resultRelations);
READ_NODE_FIELD(appendRelations);
READ_NODE_FIELD(subplans);
@@ -1947,7 +1955,7 @@ _readAppend(void)
READ_NODE_FIELD(appendplans);
READ_INT_FIELD(nasyncplans);
READ_INT_FIELD(first_partial_plan);
- READ_NODE_FIELD(part_prune_info);
+ READ_INT_FIELD(part_prune_index);
READ_DONE();
}
@@ -1969,7 +1977,7 @@ _readMergeAppend(void)
READ_OID_ARRAY(sortOperators, local_node->numCols);
READ_OID_ARRAY(collations, local_node->numCols);
READ_BOOL_ARRAY(nullsFirst, local_node->numCols);
- READ_NODE_FIELD(part_prune_info);
+ READ_INT_FIELD(part_prune_index);
READ_DONE();
}
@@ -2767,6 +2775,8 @@ _readPartitionPruneInfo(void)
READ_LOCALS(PartitionPruneInfo);
READ_NODE_FIELD(prune_infos);
+ READ_BOOL_FIELD(needs_init_pruning);
+ READ_BOOL_FIELD(needs_exec_pruning);
READ_BITMAPSET_FIELD(other_subplans);
READ_DONE();
@@ -2783,6 +2793,7 @@ _readPartitionedRelPruneInfo(void)
READ_INT_ARRAY(subplan_map, local_node->nparts);
READ_INT_ARRAY(subpart_map, local_node->nparts);
READ_OID_ARRAY(relid_map, local_node->nparts);
+ READ_INDEX_ARRAY(rti_map, local_node->nparts);
READ_NODE_FIELD(initial_pruning_steps);
READ_NODE_FIELD(exec_pruning_steps);
READ_BITMAPSET_FIELD(execparamids);
@@ -2936,6 +2947,21 @@ _readPartitionRangeDatum(void)
READ_DONE();
}
+
+/*
+ * _readPartitionPruneResult
+ */
+static PartitionPruneResult *
+_readPartitionPruneResult(void)
+{
+ READ_LOCALS(PartitionPruneResult);
+
+ READ_NODE_FIELD(valid_subplan_offs_list);
+ READ_BITMAPSET_FIELD(scan_leafpart_rtis);
+
+ READ_DONE();
+}
+
/*
* parseNodeString
*
@@ -3233,6 +3259,8 @@ parseNodeString(void)
return_value = _readJsonTableParent();
else if (MATCH("JSONTABSNODE", 12))
return_value = _readJsonTableSibling();
+ else if (MATCH("PARTITIONPRUNERESULT", 20))
+ return_value = _readPartitionPruneResult();
else
{
elog(ERROR, "badly formatted node string \"%.32s\"...", token);
@@ -3376,6 +3404,30 @@ readIntCols(int numCols)
return int_vals;
}
+/*
+ * readIndexCols
+ */
+Index *
+readIndexCols(int numCols)
+{
+ int tokenLength,
+ i;
+ const char *token;
+ Index *index_vals;
+
+ if (numCols <= 0)
+ return NULL;
+
+ index_vals = (Index *) palloc(numCols * sizeof(Index));
+ for (i = 0; i < numCols; i++)
+ {
+ token = pg_strtok(&tokenLength);
+ index_vals[i] = atoui(token);
+ }
+
+ return index_vals;
+}
+
/*
* readBoolCols
*/
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 95476ada0b..fe0df2f1d1 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1184,7 +1184,6 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
ListCell *subpaths;
int nasyncplans = 0;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
int nodenumsortkeys = 0;
AttrNumber *nodeSortColIdx = NULL;
Oid *nodeSortOperators = NULL;
@@ -1335,6 +1334,9 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that can be used for run-time pruning */
+ plan->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1358,16 +1360,14 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
}
if (prunequal != NIL)
- partpruneinfo =
- make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ plan->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
plan->appendplans = subplans;
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
- plan->part_prune_info = partpruneinfo;
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -1407,7 +1407,6 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
List *subplans = NIL;
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1500,6 +1499,9 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that can be used for run-time pruning */
+ node->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1523,13 +1525,13 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
}
if (prunequal != NIL)
- partpruneinfo = make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ node->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
node->mergeplans = subplans;
- node->part_prune_info = partpruneinfo;
+
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index b090b087e9..f425362491 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -518,7 +518,10 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->dependsOnRole = glob->dependsOnRole;
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
+ result->partPruneInfos = glob->partPruneInfos;
+ result->containsInitialPruning = glob->containsInitialPruning;
result->rtable = glob->finalrtable;
+ result->minLockRelids = glob->minLockRelids;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 6ea3505646..c5549a19b4 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -261,7 +261,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
Plan *result;
PlannerGlobal *glob = root->glob;
int rtoffset = list_length(glob->finalrtable);
- ListCell *lc;
+ ListCell *lc;
/*
* Add all the query's RTEs to the flattened rangetable. The live ones
@@ -270,6 +270,16 @@ set_plan_references(PlannerInfo *root, Plan *plan)
*/
add_rtes_to_flat_rtable(root, false);
+ /*
+ * Add the query's adjusted range of RT indexes to glob->minLockRelids.
+ * The adjusted RT indexes of prunable relations will be deleted from the
+ * set below where PartitionPruneInfos are processed.
+ */
+ glob->minLockRelids =
+ bms_add_range(glob->minLockRelids,
+ rtoffset + 1,
+ rtoffset + list_length(root->parse->rtable));
+
/*
* Adjust RT indexes of PlanRowMarks and add to final rowmarks list
*/
@@ -348,6 +358,64 @@ set_plan_references(PlannerInfo *root, Plan *plan)
}
}
+ /* Also fix up the information in PartitionPruneInfos. */
+ foreach (lc, root->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ Bitmapset *leafpart_rtis = NULL;
+ ListCell *l;
+
+ foreach(l, pruneinfo->prune_infos)
+ {
+ List *prune_infos = lfirst(l);
+ ListCell *l2;
+
+ foreach(l2, prune_infos)
+ {
+ PartitionedRelPruneInfo *pinfo = lfirst(l2);
+ int i;
+
+ /* RT index of the partitioned table. */
+ pinfo->rtindex += rtoffset;
+
+ /* And also those of the leaf partitions. */
+ for (i = 0; i < pinfo->nparts; i++)
+ {
+ if (pinfo->rti_map[i] > 0)
+ {
+ pinfo->rti_map[i] += rtoffset;
+ leafpart_rtis = bms_add_member(leafpart_rtis,
+ pinfo->rti_map[i]);
+ }
+ }
+ }
+ }
+
+ if (pruneinfo->needs_init_pruning)
+ {
+ glob->containsInitialPruning = true;
+
+ /*
+ * Delete the leaf partition RTIs from the global set of relations
+ * to be locked before executing the plan. AcquireExecutorLocks()
+ * will find the ones to add to the set after performing initial
+ * pruning.
+ */
+ glob->minLockRelids = bms_del_members(glob->minLockRelids,
+ leafpart_rtis);
+ }
+
+ glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
+ }
+
+ /*
+ * It seems worth doing a bms_copy() on glob->minLockRelids if we deleted
+ * bits from it just above, to prevent a long empty tail from causing
+ * inefficient looping during AcquireExecutorLocks().
+ */
+ if (glob->containsInitialPruning)
+ glob->minLockRelids = bms_copy(glob->minLockRelids);
+
return result;
}
@@ -1640,21 +1708,12 @@ set_append_references(PlannerInfo *root,
aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
- if (aplan->part_prune_info)
- {
- foreach(l, aplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * PartitionPruneInfos will be added to a list in PlannerGlobal, so update
+ * the index.
+ */
+ if (aplan->part_prune_index >= 0)
+ aplan->part_prune_index += list_length(root->glob->partPruneInfos);
/* We don't need to recurse to lefttree or righttree ... */
Assert(aplan->plan.lefttree == NULL);
@@ -1712,21 +1771,12 @@ set_mergeappend_references(PlannerInfo *root,
mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
- if (mplan->part_prune_info)
- {
- foreach(l, mplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * PartitionPruneInfos will be added to a list in PlannerGlobal, so update
+ * the index.
+ */
+ if (mplan->part_prune_index >= 0)
+ mplan->part_prune_index += list_length(root->glob->partPruneInfos);
/* We don't need to recurse to lefttree or righttree ... */
Assert(mplan->plan.lefttree == NULL);
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 9d3c05aed3..952c5b8327 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,9 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -209,16 +211,20 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
/*
* make_partition_pruneinfo
- * Builds a PartitionPruneInfo which can be used in the executor to allow
- * additional partition pruning to take place. Returns NULL when
- * partition pruning would be useless.
+ * Checks if the given set of quals can be used to build pruning steps
+ * that the executor can use to prune away unneeded partitions. If
+ * suitable quals are found then a PartitionPruneInfo is built and added
+ * to the PlannerInfo's partPruneInfos list.
+ *
+ * The return value is the 0-based index of the item added to the
+ * partPruneInfos list or -1 if nothing was added.
*
* 'parentrel' is the RelOptInfo for an appendrel, and 'subpaths' is the list
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
*/
-PartitionPruneInfo *
+int
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
List *prunequal)
@@ -230,6 +236,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool needs_init_pruning = false;
+ bool needs_exec_pruning = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -309,12 +317,16 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_needs_init_pruning;
+ bool partrel_needs_exec_pruning;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_needs_init_pruning,
+ &partrel_needs_exec_pruning);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -323,6 +335,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+
+ needs_init_pruning |= partrel_needs_init_pruning;
+ needs_exec_pruning |= partrel_needs_exec_pruning;
}
pfree(relid_subplan_map);
@@ -332,11 +347,13 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* quals, then we can just not bother with run-time pruning.
*/
if (prunerelinfos == NIL)
- return NULL;
+ return -1;
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->needs_init_pruning = needs_init_pruning;
+ pruneinfo->needs_exec_pruning = needs_exec_pruning;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -358,7 +375,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
else
pruneinfo->other_subplans = NULL;
- return pruneinfo;
+ root->partPruneInfos = lappend(root->partPruneInfos, pruneinfo);
+
+ return list_length(root->partPruneInfos) - 1;
}
/*
@@ -435,13 +454,18 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *needs_init_pruning and *needs_exec_pruning are set to indicate whether
+ * the returned PartitionedRelPruneInfos contain pruning steps that can be
+ * performed before and after execution begins, respectively.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -452,6 +476,10 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will find out below. */
+ *needs_init_pruning = false;
+ *needs_exec_pruning = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -539,6 +567,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+ * In the first pass, we also determine whether the second pass will be
+ * necessary, by checking for the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -613,6 +644,12 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ /* record which types of pruning steps we've seen so far */
+ if (initial_pruning_steps != NIL)
+ *needs_init_pruning = true;
+ if (exec_pruning_steps != NIL)
+ *needs_exec_pruning = true;
+
pinfolist = lappend(pinfolist, pinfo);
}
@@ -640,6 +677,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *subplan_map;
int *subpart_map;
Oid *relid_map;
+ Index *rti_map;
/*
* Construct the subplan and subpart maps for this partitioning level.
@@ -652,6 +690,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
+ rti_map = (Index *) palloc0(nparts * sizeof(Index));
present_parts = NULL;
i = -1;
@@ -666,6 +705,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
+ rti_map[i] = partrel->relid;
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
@@ -690,6 +730,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
pinfo->relid_map = relid_map;
+ pinfo->rti_map = rti_map;
}
pfree(relid_subpart_map);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 95dc2e2c83..8dc52a158f 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1603,6 +1603,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ List *part_prune_result_list;
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -1978,7 +1979,9 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
/*
* Now we can define the portal.
@@ -1993,6 +1996,9 @@ exec_bind_message(StringInfo input_message)
cplan->stmt_list,
cplan);
+ /* Copy PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_result_list);
+
/* Done with the snapshot used for parameter I/O and parsing/planning */
if (snapshot_set)
PopActiveSnapshot();
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5aa5a350f3..8cc2e2162d 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -35,7 +35,7 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
+static void ProcessQuery(PlannedStmt *plan, PartitionPruneResult *part_prune_result,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +65,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +78,8 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->part_prune_result = part_prune_result; /* ExecutorDoInitialPruning()
+ * output for plan */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +125,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * part_prune_result: ExecutorDoInitialPruning() output for the plan tree
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +138,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +150,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, part_prune_result, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -491,8 +496,13 @@ PortalStart(Portal portal, ParamListInfo params,
/*
* Create QueryDesc in portal's context; for the moment, set
* the destination to DestNone.
+ *
+ * There is no PartitionPruneResult unless the PlannedStmt is
+ * from a CachedPlan.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->part_prune_results == NIL ? NULL :
+ linitial(portal->part_prune_results),
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1225,6 +1235,8 @@ PortalRunMulti(Portal portal,
if (pstmt->utilityStmt == NULL)
{
+ PartitionPruneResult *part_prune_result = NULL;
+
/*
* process a plannable query.
*/
@@ -1271,10 +1283,18 @@ PortalRunMulti(Portal portal,
else
UpdateActiveSnapshotCommandId();
+ /*
+ * Determine if there's a corresponding PartitionPruneResult for
+ * this PlannedStmt.
+ */
+ if (portal->part_prune_results != NIL)
+ part_prune_result = list_nth(portal->part_prune_results,
+ foreach_current_index(stmtlist_item));
+
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_result,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1283,7 +1303,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_result,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 4cf6db504f..6cb473f2f4 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -99,14 +99,19 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_result_list);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv);
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_result_list);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_result_list,
+ List **lockedRelids_per_stmt);
+static void ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -790,15 +795,20 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* On a "true" return, we have acquired the locks needed to run the plan.
* (We must do this for the "true" result to be race-condition-free.)
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_result_list.
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_result_list)
{
CachedPlan *plan = plansource->gplan;
/* Assert that caller checked the querytree */
Assert(plansource->is_valid);
+ *part_prune_result_list = NIL;
+
/* If there's no generic plan, just say "false" */
if (!plan)
return false;
@@ -820,13 +830,21 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ List *lockedRelids_per_stmt;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ /*
+ * Lock relations scanned by the plan. This is where the pruning
+ * happens if needed.
+ */
+ AcquireExecutorLocks(plan->stmt_list, boundParams,
+ part_prune_result_list,
+ &lockedRelids_per_stmt);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -848,7 +866,14 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ ReleaseExecutorLocks(plan->stmt_list, lockedRelids_per_stmt);
+
+ /*
+ * The output list and any objects therein were allocated in the
+ * caller's hopefully short-lived context, so won't remain leaked for
+ * long; reset the pointer anyway to avoid its being looked at later.
+ */
+ *part_prune_result_list = NIL;
}
/*
@@ -874,10 +899,15 @@ CheckCachedPlan(CachedPlanSource *plansource)
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
* (unless this is a one-shot plan, in which case we don't copy the plan).
+ *
+ * A list of NULLs is returned in *part_prune_result_list, meaning that no
+ * PartitionPruneResult nodes have yet been created for the plans in
+ * stmt_list.
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv)
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_result_list)
{
CachedPlan *plan;
List *plist;
@@ -1007,6 +1037,17 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
MemoryContextSwitchTo(oldcxt);
+ /*
+ * There are no actual PartitionPruneResults to add yet, but the list
+ * must be initialized to have the same number of elements as the list
+ * of PlannedStmts.
+ */
+ *part_prune_result_list = NIL;
+ foreach(lc, plist)
+ {
+ *part_prune_result_list = lappend(*part_prune_result_list, NULL);
+ }
+
return plan;
}
@@ -1126,6 +1167,17 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
+ * For every PlannedStmt found in the returned CachedPlan, an element that
+ * is either a PartitionPruneResult or NULL is added to
+ * *part_prune_result_list.  A PartitionPruneResult is added if the
+ * PlannedStmt comes from an existing, otherwise valid CachedPlan and
+ * contains at least one PartitionPruneInfo with "initial" pruning steps.
+ * Those steps are performed by calling ExecutorDoInitialPruning(), pruning
+ * away subplans that don't match the pruning conditions so that
+ * AcquireExecutorLocks() needs to lock only the surviving leaf partitions.
+ * The PartitionPruneResult contains a list of bitmapsets of the indexes of
+ * matching subplans, one for each PartitionPruneInfo.
+ *
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
@@ -1139,11 +1191,13 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ List **part_prune_result_list)
{
CachedPlan *plan = NULL;
List *qlist;
bool customplan;
+ List *my_part_prune_result_list;
/* Assert caller is doing things in a sane order */
Assert(plansource->magic == CACHEDPLANSOURCE_MAGIC);
@@ -1160,7 +1214,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (CheckCachedPlan(plansource, boundParams,
+ &my_part_prune_result_list))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1169,7 +1224,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
else
{
/* Build a new generic plan */
- plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv,
+ &my_part_prune_result_list);
/* Just make real sure plansource->gplan is clear */
ReleaseGenericPlan(plansource);
/* Link the new generic plan into the plansource */
@@ -1214,7 +1270,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (customplan)
{
/* Build a custom plan */
- plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv,
+ &my_part_prune_result_list);
/* Accumulate total costs of custom plans */
plansource->total_custom_cost += cached_plan_cost(plan, true);
@@ -1246,6 +1303,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
plan->is_saved = true;
}
+ if (part_prune_result_list)
+ *part_prune_result_list = my_part_prune_result_list;
+
return plan;
}
@@ -1737,17 +1797,29 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_result_list.
+ *
+ * On return, *lockedRelids_per_stmt will contain a bitmapset for every
+ * PlannedStmt in stmt_list, containing the RT indexes of relation entries
+ * in its range table that were actually locked, or NULL if the PlannedStmt
+ * contains a utility statement.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_result_list,
+ List **lockedRelids_per_stmt)
{
ListCell *lc1;
+ *part_prune_result_list = *lockedRelids_per_stmt = NIL;
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ PartitionPruneResult *part_prune_result = NULL;
+ Bitmapset *allLockRelids;
+ Bitmapset *lockedRelids = NULL;
+ int rti;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1761,13 +1833,35 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
if (query)
- ScanQueryForLocks(query, acquire);
+ ScanQueryForLocks(query, true);
+ *part_prune_result_list = lappend(*part_prune_result_list, NULL);
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ /*
+ * Figure out the set of relations that would need to be locked
+ * before executing the plan.
+ */
+ if (plannedstmt->containsInitialPruning)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ /*
+ * Obtain the set of partitions to be locked from the
+ * PartitionPruneInfos by considering the result of performing
+ * initial partition pruning.
+ */
+ part_prune_result = ExecutorDoInitialPruning(plannedstmt,
+ boundParams);
+
+ allLockRelids = bms_union(plannedstmt->minLockRelids,
+ part_prune_result->scan_leafpart_rtis);
+ }
+ else
+ allLockRelids = plannedstmt->minLockRelids;
+
+ rti = -1;
+ while ((rti = bms_next_member(allLockRelids, rti)) > 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
if (rte->rtekind != RTE_RELATION)
continue;
@@ -1778,10 +1872,58 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
* fail if it's been dropped entirely --- we'll just transiently
* acquire a non-conflicting lock.
*/
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
+ LockRelationOid(rte->relid, rte->rellockmode);
+ lockedRelids = bms_add_member(lockedRelids, rti);
+ }
+
+ *part_prune_result_list = lappend(*part_prune_result_list,
+ part_prune_result);
+ *lockedRelids_per_stmt = lappend(*lockedRelids_per_stmt, lockedRelids);
+ }
+}
+
+/*
+ * ReleaseExecutorLocks
+ * Release locks that would've been acquired by an earlier call to
+ * AcquireExecutorLocks()
+ */
+static void
+ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt)
+{
+ ListCell *lc1,
+ *lc2;
+
+ forboth(lc1, stmt_list, lc2, lockedRelids_per_stmt)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ Bitmapset *lockedRelids = lfirst(lc2);
+ int rti;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /*
+ * Ignore utility statements, except those (such as EXPLAIN) that
+ * contain a parsed-but-not-planned query. Note: it's okay to use
+ * ScanQueryForLocks, even though the query hasn't been through
+ * rule rewriting, because rewriting doesn't change the query
+ * representation.
+ */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ if (query)
+ ScanQueryForLocks(query, false);
+ continue;
+ }
+
+ rti = -1;
+ while ((rti = bms_next_member(lockedRelids, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ Assert(rte->rtekind == RTE_RELATION);
+
+ /* See the comment in AcquireExecutorLocks(). */
+ UnlockRelationOid(rte->relid, rte->rellockmode);
}
}
}
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index d549f66d4a..1bbe6b704b 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -303,6 +303,25 @@ PortalDefineQuery(Portal portal,
portal->status = PORTAL_DEFINED;
}
+/*
+ * PortalStorePartitionPruneResults
+ * Copy the given list of PartitionPruneResults into the portal's
+ * context
+ *
+ * This allows the caller to ensure that the list exists as long as the portal
+ * does.
+ */
+void
+PortalStorePartitionPruneResults(Portal portal, List *part_prune_results)
+{
+ MemoryContext oldcxt;
+
+ AssertArg(PortalIsValid(portal));
+ oldcxt = MemoryContextSwitchTo(portal->portalContext);
+ portal->part_prune_results = copyObject(part_prune_results);
+ MemoryContextSwitchTo(oldcxt);
+}
+
/*
* PortalReleaseCachedPlan
* Release a portal's reference to its cached plan, if any.
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 666977fb1f..bbc8c42d88 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,9 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 708435e952..bd8776402e 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -45,6 +45,7 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
+ * rti_map Range table index by partition index, or 0.
* present_parts A Bitmapset of the partition indexes that we
* have subplans or subparts for.
* initial_pruning_steps List of PartitionPruneSteps used to
@@ -61,6 +62,7 @@ typedef struct PartitionedRelPruningData
int nparts;
int *subplan_map;
int *subpart_map;
+ Index *rti_map;
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
@@ -123,9 +125,13 @@ typedef struct PartitionPruneState
extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
-
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis);
+extern Bitmapset *ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index e79e2c001f..60d5644908 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,8 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ PartitionPruneResult *part_prune_result; /* ExecutorDoInitialPruning()'s
+ * output for plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +59,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 873772f188..57dc0e8077 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -185,6 +185,8 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
+extern PartitionPruneResult *ExecutorDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params);
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 94b191f8ae..a8bf908d63 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -596,6 +596,8 @@ typedef struct EState
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
+ struct PartitionPruneResult *es_part_prune_result; /* QueryDesc.part_prune_result */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
@@ -984,6 +986,34 @@ typedef struct DomainConstraintState
*/
typedef TupleTableSlot *(*ExecProcNodeMtd) (struct PlanState *pstate);
+/*----------------
+ * PartitionPruneResult
+ *
+ * The result of an ExecutorDoInitialPruning() invocation on a given
+ * PlannedStmt.
+ *
+ * Contains a list of Bitmapsets of the indexes of the subplans remaining
+ * after performing initial pruning by calling ExecFindMatchingSubPlans()
+ * for every PartitionPruneInfo found in PlannedStmt.partPruneInfos.  RT
+ * indexes of the leaf partitions scanned by those subplans, across all
+ * PartitionPruneInfos, are added to scan_leafpart_rtis.
+ *
+ * This is used by GetCachedPlan() to inform its callers of the pruning
+ * decisions made while performing AcquireExecutorLocks() on a given cached
+ * PlannedStmt, which the callers then pass on to the executor.  When
+ * available, the executor refers to this node while initializing the plan
+ * nodes to which those PartitionPruneInfos apply, so that the same set of
+ * qualifying subplans is initialized rather than being derived again by
+ * redoing initial pruning.
+ */
+typedef struct PartitionPruneResult
+{
+ NodeTag type;
+
+ List *valid_subplan_offs_list;
+ Bitmapset *scan_leafpart_rtis;
+} PartitionPruneResult;
+
/* ----------------
* PlanState node
*
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 340d28f4e1..66416bce97 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -97,6 +97,9 @@ typedef enum NodeTag
T_PartitionPruneStepCombine,
T_PlanInvalItem,
+ /* TAGS FOR EXECUTOR PREP NODES (execnodes.h) */
+ T_PartitionPruneResult,
+
/*
* TAGS FOR PLAN STATE NODES (execnodes.h)
*
@@ -674,6 +677,7 @@ extern struct Bitmapset *readBitmapset(void);
extern uintptr_t readDatum(bool typbyval);
extern bool *readBoolCols(int numCols);
extern int *readIntCols(int numCols);
+extern Index *readIndexCols(int numCols);
extern Oid *readOidCols(int numCols);
extern int16 *readAttrNumberCols(int numCols);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index c5ab53e05c..11007cda25 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -107,6 +107,18 @@ typedef struct PlannerGlobal
List *appendRelations; /* "flat" list of AppendRelInfos */
+ List *partPruneInfos; /* List of PartitionPruneInfo contained in
+ * the plan */
+
+ bool containsInitialPruning; /* Do any of those PartitionPruneInfos
+ * have initial (pre-exec) pruning
+ * steps in them? */
+
+ Bitmapset *minLockRelids; /* Indexes of all range table entries minus
+ * indexes of range table entries of the leaf
+ * partitions scanned by prunable subplans;
+ * see AcquireExecutorLocks() */
+
List *relationOids; /* OIDs of relations the plan depends on */
List *invalItems; /* other dependencies, as PlanInvalItems */
@@ -377,6 +389,9 @@ struct PlannerInfo
/* Does this query modify any partition key columns? */
bool partColsUpdated;
+
+ /* PartitionPruneInfos added in this query's plan. */
+ List *partPruneInfos;
};
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index e43e360d9b..f8f3971f44 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -64,8 +64,20 @@ typedef struct PlannedStmt
struct Plan *planTree; /* tree of Plan nodes */
+ List *partPruneInfos; /* List of PartitionPruneInfo contained in
+ * the plan */
+
+ bool containsInitialPruning; /* Do any of those PartitionPruneInfos
+ * have initial (pre-exec) pruning
+ * steps in them? */
+
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *minLockRelids; /* Indexes of all range table entries minus
+ * indexes of range table entries of the leaf
+ * partitions scanned by prunable subplans;
+ * see AcquireExecutorLocks() */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
List *resultRelations; /* integer list of RT indexes, or NIL */
@@ -262,8 +274,8 @@ typedef struct Append
*/
int first_partial_plan;
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} Append;
/* ----------------
@@ -282,8 +294,9 @@ typedef struct MergeAppend
Oid *sortOperators; /* OIDs of operators to sort them by */
Oid *collations; /* OIDs of collations */
bool *nullsFirst; /* NULLS FIRST/LAST directions */
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} MergeAppend;
/* ----------------
@@ -1191,6 +1204,13 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * needs_init_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its initial_pruning_steps set?
+ *
+ * needs_exec_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its exec_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1199,6 +1219,8 @@ typedef struct PartitionPruneInfo
{
NodeTag type;
List *prune_infos;
+ bool needs_init_pruning;
+ bool needs_exec_pruning;
Bitmapset *other_subplans;
} PartitionPruneInfo;
@@ -1229,6 +1251,7 @@ typedef struct PartitionedRelPruneInfo
int *subplan_map; /* subplan index by partition index, or -1 */
int *subpart_map; /* subpart index by partition index, or -1 */
Oid *relid_map; /* relation OID by partition index, or 0 */
+ Index *rti_map; /* Range table index by partition index, or 0 */
/*
* initial_pruning_steps shows how to prune during executor startup (i.e.,
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 90684efa25..ebf0dcff8c 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -70,10 +70,10 @@ typedef struct PartitionPruneContext
#define PruneCxtStateIdx(partnatts, step_id, keyno) \
((partnatts) * (step_id) + (keyno))
-extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
- struct RelOptInfo *parentrel,
- List *subpaths,
- List *prunequal);
+extern int make_partition_pruneinfo(struct PlannerInfo *root,
+ struct RelOptInfo *parentrel,
+ List *subpaths,
+ List *prunequal);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 95b99e3d25..449200b949 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -220,7 +220,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ List **part_prune_result_list);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aeddbdafe5..9f7727a837 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,7 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *part_prune_results; /* list of PartitionPruneResults */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -242,6 +243,8 @@ extern void PortalDefineQuery(Portal portal,
CommandTag commandTag,
List *stmts,
CachedPlan *cplan);
+extern void PortalStorePartitionPruneResults(Portal portal,
+ List *part_prune_result_list);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
--
2.24.1
On Sun, Apr 10, 2022 at 8:05 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Fri, Apr 8, 2022 at 8:45 PM Amit Langote <amitlangote09@gmail.com> wrote:
Most of the changes looked fine to me except a couple of typos, so I've
adopted those into the attached new version, even though I know it's
too late to try to apply it.

+ * XXX is it worth doing a bms_copy() on glob->minLockRelids if
+ * glob->containsInitialPruning is true? I'm slightly worried that the
+ * Bitmapset could have a very long empty tail resulting in excessive
+ * looping during AcquireExecutorLocks().
+ */

I guess I trust your instincts about bitmapset operation efficiency,
and what you've written here makes sense. It's typical for leaf
partitions to have been appended toward the tail end of rtable, and I'd
imagine their indexes would be in the tail words of minLockRelids. If
copying the bitmapset removes those useless words, I don't see why we
shouldn't do that. So added:

+ /*
+ * It seems worth doing a bms_copy() on glob->minLockRelids if we
+ * deleted a bit from it just above, to prevent empty tail bits resulting
+ * in inefficient looping during AcquireExecutorLocks().
+ */
+ if (glob->containsInitialPruning)
+ glob->minLockRelids = bms_copy(glob->minLockRelids)

Not 100% sure about the comment I wrote.

And the quoted code change missed a semicolon in the v14 that I
hurriedly sent on Friday. (Had apparently forgotten to `git add` the
hunk to fix that.)

Sending v15 that fixes that to keep the cfbot green for now.
--
Amit Langote
EDB: http://www.enterprisedb.com
Hi,
+ /* RT index of the partitione table. */
partitione -> partitioned
Cheers
On Mon, Apr 11, 2022 at 12:53 PM Zhihong Yu <zyu@yugabyte.com> wrote:
On Sun, Apr 10, 2022 at 8:05 PM Amit Langote <amitlangote09@gmail.com> wrote:
Sending v15 that fixes that to keep the cfbot green for now.
Hi,
+ /* RT index of the partitione table. */
partitione -> partitioned
Thanks, fixed.
Also, I broke this into patches:
0001 contains the mechanical changes of moving PartitionPruneInfo out
of Append/MergeAppend into a list in PlannedStmt.
0002 is the main patch to "Optimize AcquireExecutorLocks() by locking
only unpruned partitions".
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v16-0001-Move-PartitioPruneInfo-out-of-plan-nodes-into-Pl.patchapplication/octet-stream; name=v16-0001-Move-PartitioPruneInfo-out-of-plan-nodes-into-Pl.patchDownload
From 16fd07b7c8ffde7632ffa7b03e4595e1e08d7e06 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Fri, 27 May 2022 16:00:28 +0900
Subject: [PATCH v16 1/2] Move PartitionPruneInfo out of plan nodes into
PlannedStmt
The planner will now add a given PartitionPruneInfo to
PlannedStmt.partPruneInfos instead of attaching it to the
Append/MergeAppend plan node as done until now, and will set an index
field in the plan node that points to the list element.
A later commit will make AcquireExecutorLocks() do the initial
partition pruning to determine a minimal set of partitions to be
locked to validate a plan tree, and it will need to consult the
PartitionPruneInfos referenced therein to do so. It is better for the
PartitionPruneInfos to be directly accessible, by simply iterating
over PlannedStmt.partPruneInfos, than to have to find them
individually by walking the plan tree.
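To illustrate the new convention, here is a minimal sketch (not part of
this patch; get_prune_info_for_node is a hypothetical helper, condensed
from the executor changes below) of how a plan node's pruning info is
now looked up, assuming the executor has copied
PlannedStmt.partPruneInfos into EState.es_part_prune_infos as done in
InitPlan():

    /*
     * Hypothetical helper: resolve an Append/MergeAppend's
     * part_prune_index against the flat list kept in the PlannedStmt,
     * instead of following a pointer stored in the plan node itself.
     */
    static PartitionPruneInfo *
    get_prune_info_for_node(EState *estate, int part_prune_index)
    {
        /* -1 means the plan node does no run-time pruning at all */
        if (part_prune_index < 0)
            return NULL;

        return (PartitionPruneInfo *) list_nth(estate->es_part_prune_infos,
                                               part_prune_index);
    }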
---
src/backend/executor/execMain.c | 1 +
src/backend/executor/execParallel.c | 1 +
src/backend/executor/execPartition.c | 4 +-
src/backend/executor/execUtils.c | 2 +
src/backend/executor/nodeAppend.c | 4 +-
src/backend/executor/nodeMergeAppend.c | 4 +-
src/backend/nodes/copyfuncs.c | 5 +-
src/backend/nodes/outfuncs.c | 7 ++-
src/backend/nodes/readfuncs.c | 5 +-
src/backend/optimizer/plan/createplan.c | 24 ++++-----
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 65 +++++++++++++------------
src/backend/partitioning/partprune.c | 18 ++++---
src/include/executor/execPartition.h | 3 +-
src/include/nodes/execnodes.h | 2 +
src/include/nodes/pathnodes.h | 6 +++
src/include/nodes/plannodes.h | 12 +++--
src/include/partitioning/partprune.h | 8 +--
18 files changed, 104 insertions(+), 68 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ef2fd46092..72fc273524 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -825,6 +825,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable);
estate->es_plannedstmt = plannedstmt;
+ estate->es_part_prune_infos = plannedstmt->partPruneInfos;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index f1fd7f7e8b..f73b8c2607 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -183,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
pstmt->planTree = plan;
+ pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
pstmt->resultRelations = NIL;
pstmt->appendRelations = NIL;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index e03ea27299..b55cdd2580 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1638,11 +1638,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
PartitionPruneState *
ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
Bitmapset **initially_valid_subplans)
{
PartitionPruneState *prunestate;
EState *estate = planstate->state;
+ PartitionPruneInfo *pruneinfo = list_nth(estate->es_part_prune_infos,
+ part_prune_index);
/* We may need an expression context to evaluate partition exprs */
ExecAssignExprContext(estate, planstate);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9df1f81ea8..f9c7976ff2 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -119,6 +119,8 @@ CreateExecutorState(void)
estate->es_relations = NULL;
estate->es_rowmarks = NULL;
estate->es_plannedstmt = NULL;
+ estate->es_part_prune_infos = NIL;
+ estate->es_part_prune_result = NULL;
estate->es_junkFilter = NULL;
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 357e10a1d7..c6f86a6510 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -134,7 +134,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_begun = false;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -145,7 +145,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&appendstate->ps,
list_length(node->appendplans),
- node->part_prune_info,
+ node->part_prune_index,
&validsubplans);
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index c5c62fa5c7..8d35860c30 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -82,7 +82,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.ExecProcNode = ExecMergeAppend;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -93,7 +93,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&mergestate->ps,
list_length(node->mergeplans),
- node->part_prune_info,
+ node->part_prune_index,
&validsubplans);
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 51d630fa89..8fbeaa4f36 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -96,6 +96,7 @@ _copyPlannedStmt(const PlannedStmt *from)
COPY_SCALAR_FIELD(parallelModeNeeded);
COPY_SCALAR_FIELD(jitFlags);
COPY_NODE_FIELD(planTree);
+ COPY_NODE_FIELD(partPruneInfos);
COPY_NODE_FIELD(rtable);
COPY_NODE_FIELD(resultRelations);
COPY_NODE_FIELD(appendRelations);
@@ -253,7 +254,7 @@ _copyAppend(const Append *from)
COPY_NODE_FIELD(appendplans);
COPY_SCALAR_FIELD(nasyncplans);
COPY_SCALAR_FIELD(first_partial_plan);
- COPY_NODE_FIELD(part_prune_info);
+ COPY_SCALAR_FIELD(part_prune_index);
return newnode;
}
@@ -281,7 +282,7 @@ _copyMergeAppend(const MergeAppend *from)
COPY_POINTER_FIELD(sortOperators, from->numCols * sizeof(Oid));
COPY_POINTER_FIELD(collations, from->numCols * sizeof(Oid));
COPY_POINTER_FIELD(nullsFirst, from->numCols * sizeof(bool));
- COPY_NODE_FIELD(part_prune_info);
+ COPY_SCALAR_FIELD(part_prune_index);
return newnode;
}
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index ce12915592..72fcd8a6ee 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -321,6 +321,7 @@ _outPlannedStmt(StringInfo str, const PlannedStmt *node)
WRITE_BOOL_FIELD(parallelModeNeeded);
WRITE_INT_FIELD(jitFlags);
WRITE_NODE_FIELD(planTree);
+ WRITE_NODE_FIELD(partPruneInfos);
WRITE_NODE_FIELD(rtable);
WRITE_NODE_FIELD(resultRelations);
WRITE_NODE_FIELD(appendRelations);
@@ -450,7 +451,7 @@ _outAppend(StringInfo str, const Append *node)
WRITE_NODE_FIELD(appendplans);
WRITE_INT_FIELD(nasyncplans);
WRITE_INT_FIELD(first_partial_plan);
- WRITE_NODE_FIELD(part_prune_info);
+ WRITE_INT_FIELD(part_prune_index);
}
static void
@@ -467,7 +468,7 @@ _outMergeAppend(StringInfo str, const MergeAppend *node)
WRITE_OID_ARRAY(sortOperators, node->numCols);
WRITE_OID_ARRAY(collations, node->numCols);
WRITE_BOOL_ARRAY(nullsFirst, node->numCols);
- WRITE_NODE_FIELD(part_prune_info);
+ WRITE_INT_FIELD(part_prune_index);
}
static void
@@ -2434,6 +2435,7 @@ _outPlannerGlobal(StringInfo str, const PlannerGlobal *node)
WRITE_NODE_FIELD(finalrowmarks);
WRITE_NODE_FIELD(resultRelations);
WRITE_NODE_FIELD(appendRelations);
+ WRITE_NODE_FIELD(partPruneInfos);
WRITE_NODE_FIELD(relationOids);
WRITE_NODE_FIELD(invalItems);
WRITE_NODE_FIELD(paramExecTypes);
@@ -2501,6 +2503,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_BITMAPSET_FIELD(curOuterRels);
WRITE_NODE_FIELD(curOuterParams);
WRITE_BOOL_FIELD(partColsUpdated);
+ WRITE_NODE_FIELD(partPruneInfos);
}
static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 6a05b69415..bf602ff93e 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1817,6 +1817,7 @@ _readPlannedStmt(void)
READ_BOOL_FIELD(parallelModeNeeded);
READ_INT_FIELD(jitFlags);
READ_NODE_FIELD(planTree);
+ READ_NODE_FIELD(partPruneInfos);
READ_NODE_FIELD(rtable);
READ_NODE_FIELD(resultRelations);
READ_NODE_FIELD(appendRelations);
@@ -1949,7 +1950,7 @@ _readAppend(void)
READ_NODE_FIELD(appendplans);
READ_INT_FIELD(nasyncplans);
READ_INT_FIELD(first_partial_plan);
- READ_NODE_FIELD(part_prune_info);
+ READ_INT_FIELD(part_prune_index);
READ_DONE();
}
@@ -1971,7 +1972,7 @@ _readMergeAppend(void)
READ_OID_ARRAY(sortOperators, local_node->numCols);
READ_OID_ARRAY(collations, local_node->numCols);
READ_BOOL_ARRAY(nullsFirst, local_node->numCols);
- READ_NODE_FIELD(part_prune_info);
+ READ_INT_FIELD(part_prune_index);
READ_DONE();
}
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 76606faa3e..58a05cf673 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1203,7 +1203,6 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
ListCell *subpaths;
int nasyncplans = 0;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
int nodenumsortkeys = 0;
AttrNumber *nodeSortColIdx = NULL;
Oid *nodeSortOperators = NULL;
@@ -1354,6 +1353,9 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ plan->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1377,16 +1379,14 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
}
if (prunequal != NIL)
- partpruneinfo =
- make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ plan->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
plan->appendplans = subplans;
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
- plan->part_prune_info = partpruneinfo;
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -1426,7 +1426,6 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
List *subplans = NIL;
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1519,6 +1518,9 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ node->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1542,13 +1544,13 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
}
if (prunequal != NIL)
- partpruneinfo = make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ node->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
node->mergeplans = subplans;
- node->part_prune_info = partpruneinfo;
+
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index a0f2390334..32e658b5d6 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -518,6 +518,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->dependsOnRole = glob->dependsOnRole;
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
+ result->partPruneInfos = glob->partPruneInfos;
result->rtable = glob->finalrtable;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index d95fd89807..aafe1c149d 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -348,6 +348,29 @@ set_plan_references(PlannerInfo *root, Plan *plan)
}
}
+ /* Also fix up the information in PartitionPruneInfos. */
+ foreach (lc, root->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ ListCell *l;
+
+ foreach(l, pruneinfo->prune_infos)
+ {
+ List *prune_infos = lfirst(l);
+ ListCell *l2;
+
+ foreach(l2, prune_infos)
+ {
+ PartitionedRelPruneInfo *pinfo = lfirst(l2);
+
+ /* RT index of the table to which the pinfo belongs. */
+ pinfo->rtindex += rtoffset;
+ }
+ }
+
+ glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
+ }
+
return result;
}
@@ -1640,21 +1663,12 @@ set_append_references(PlannerInfo *root,
aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
- if (aplan->part_prune_info)
- {
- foreach(l, aplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * PartitionPruneInfos will be added to a list in PlannerGlobal, so update
+ * the index.
+ */
+ if (aplan->part_prune_index >= 0)
+ aplan->part_prune_index += list_length(root->glob->partPruneInfos);
/* We don't need to recurse to lefttree or righttree ... */
Assert(aplan->plan.lefttree == NULL);
@@ -1712,21 +1726,12 @@ set_mergeappend_references(PlannerInfo *root,
mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
- if (mplan->part_prune_info)
- {
- foreach(l, mplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * PartitionPruneInfos will be added to a list in PlannerGlobal, so update
+ * the index.
+ */
+ if (mplan->part_prune_index >= 0)
+ mplan->part_prune_index += list_length(root->glob->partPruneInfos);
/* We don't need to recurse to lefttree or righttree ... */
Assert(mplan->plan.lefttree == NULL);
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 9d3c05aed3..d77f7d3aef 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -209,16 +209,20 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
/*
* make_partition_pruneinfo
- * Builds a PartitionPruneInfo which can be used in the executor to allow
- * additional partition pruning to take place. Returns NULL when
- * partition pruning would be useless.
+ * Checks if the given set of quals can be used to build pruning steps
+ * that the executor can use to prune away unneeded partitions. If
+ * suitable quals are found then a PartitionPruneInfo is built and tagged
+ * onto the PlannerInfo's partPruneInfos list.
+ *
+ * The return value is the 0-based index of the item added to the
+ * partPruneInfos list or -1 if nothing was added.
*
* 'parentrel' is the RelOptInfo for an appendrel, and 'subpaths' is the list
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
*/
-PartitionPruneInfo *
+int
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
List *prunequal)
@@ -332,7 +336,7 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* quals, then we can just not bother with run-time pruning.
*/
if (prunerelinfos == NIL)
- return NULL;
+ return -1;
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
@@ -358,7 +362,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
else
pruneinfo->other_subplans = NULL;
- return pruneinfo;
+ root->partPruneInfos = lappend(root->partPruneInfos, pruneinfo);
+
+ return list_length(root->partPruneInfos) - 1;
}
/*
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 708435e952..bf962af7af 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -123,9 +123,8 @@ typedef struct PartitionPruneState
extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
bool initial_prune);
-
#endif /* EXECPARTITION_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 5728801379..25e0bb976e 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -596,6 +596,8 @@ typedef struct EState
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
+ struct PartitionPruneResult *es_part_prune_result; /* QueryDesc.part_prune_result */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index a6e5db4eec..6995b0ecec 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -107,6 +107,9 @@ typedef struct PlannerGlobal
List *appendRelations; /* "flat" list of AppendRelInfos */
+ List *partPruneInfos; /* List of PartitionPruneInfo contained in
+ * the plan */
+
List *relationOids; /* OIDs of relations the plan depends on */
List *invalItems; /* other dependencies, as PlanInvalItems */
@@ -378,6 +381,9 @@ struct PlannerInfo
/* Does this query modify any partition key columns? */
bool partColsUpdated;
+
+ /* PartitionPruneInfos added in this query's plan. */
+ List *partPruneInfos;
};
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 0ea9a22dfb..297cacfb5b 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -64,6 +64,9 @@ typedef struct PlannedStmt
struct Plan *planTree; /* tree of Plan nodes */
+ List *partPruneInfos; /* List of PartitionPruneInfo contained in
+ * the plan */
+
List *rtable; /* list of RangeTblEntry nodes */
/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
@@ -262,8 +265,8 @@ typedef struct Append
*/
int first_partial_plan;
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} Append;
/* ----------------
@@ -282,8 +285,9 @@ typedef struct MergeAppend
Oid *sortOperators; /* OIDs of operators to sort them by */
Oid *collations; /* OIDs of collations */
bool *nullsFirst; /* NULLS FIRST/LAST directions */
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} MergeAppend;
/* ----------------
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 90684efa25..ebf0dcff8c 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -70,10 +70,10 @@ typedef struct PartitionPruneContext
#define PruneCxtStateIdx(partnatts, step_id, keyno) \
((partnatts) * (step_id) + (keyno))
-extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
- struct RelOptInfo *parentrel,
- List *subpaths,
- List *prunequal);
+extern int make_partition_pruneinfo(struct PlannerInfo *root,
+ struct RelOptInfo *parentrel,
+ List *subpaths,
+ List *prunequal);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
--
2.35.3
v16-0002-Optimize-AcquireExecutorLocks-by-locking-only-un.patchapplication/octet-stream; name=v16-0002-Optimize-AcquireExecutorLocks-by-locking-only-un.patchDownload
From 6654d7c2b5c54d69d3f8a0136cfaf5593a3b7aae Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v16 2/2] Optimize AcquireExecutorLocks() by locking only
unpruned partitions
This commit teaches AcquireExecutorLocks() to perform initial
partition pruning to notionally eliminate the subnodes of a generic
cached plan that need not be initialized during the plan's actual
execution, and to skip locking the partitions scanned by those
subnodes.
The result of performing initial partition pruning this way, before
the actual execution has started, is made available to the execution
proper via a PartitionPruneResult, supplied along with the PlannedStmt
by those callers of the executor that obtained the plan through
plancache.c. It is NULL when the plan is obtained by calling the
planner directly, or when the plan obtained from plancache.c is not a
generic one.
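To illustrate, here is a minimal sketch (not part of this patch;
execute_cached_plan_sketch is a hypothetical helper, condensed from the
prepare.c and explain.c changes below) of how a plancache.c caller is
expected to thread the pruning results through to the executor:

    static void
    execute_cached_plan_sketch(CachedPlanSource *plansource,
                               ParamListInfo params,
                               const char *query_string,
                               DestReceiver *dest)
    {
        List       *part_prune_result_list;
        CachedPlan *cplan;
        ListCell   *p,
                   *pp;

        /* Replan if needed; also get one PartitionPruneResult per stmt */
        cplan = GetCachedPlan(plansource, params, NULL, NULL,
                              &part_prune_result_list);
        Assert(list_length(cplan->stmt_list) ==
               list_length(part_prune_result_list));

        forboth(p, cplan->stmt_list, pp, part_prune_result_list)
        {
            PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
            PartitionPruneResult *ppr = lfirst_node(PartitionPruneResult, pp);
            QueryDesc  *queryDesc;

            /* ppr is NULL unless pstmt is a generic plan that was pruned */
            queryDesc = CreateQueryDesc(pstmt, ppr, query_string,
                                        GetActiveSnapshot(), InvalidSnapshot,
                                        dest, params, NULL, 0);
            /* ... ExecutorStart/Run/Finish/End as usual ... */
        }
    }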
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 2 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/prepare.c | 26 ++-
src/backend/executor/README | 27 +++
src/backend/executor/execMain.c | 53 ++++++
src/backend/executor/execParallel.c | 27 ++-
src/backend/executor/execPartition.c | 234 +++++++++++++++++++++----
src/backend/executor/functions.c | 2 +-
src/backend/executor/nodeAppend.c | 11 +-
src/backend/executor/nodeMergeAppend.c | 5 +-
src/backend/executor/spi.c | 27 ++-
src/backend/nodes/copyfuncs.c | 27 +++
src/backend/nodes/outfuncs.c | 29 +++
src/backend/nodes/readfuncs.c | 51 ++++++
src/backend/optimizer/plan/planner.c | 2 +
src/backend/optimizer/plan/setrefs.c | 45 +++++
src/backend/partitioning/partprune.c | 41 ++++-
src/backend/tcop/postgres.c | 8 +-
src/backend/tcop/pquery.c | 28 ++-
src/backend/utils/cache/plancache.c | 184 ++++++++++++++++---
src/backend/utils/mmgr/portalmem.c | 19 ++
src/include/commands/explain.h | 4 +-
src/include/executor/execPartition.h | 9 +-
src/include/executor/execdesc.h | 3 +
src/include/executor/executor.h | 2 +
src/include/nodes/execnodes.h | 28 +++
src/include/nodes/nodes.h | 4 +
src/include/nodes/pathnodes.h | 9 +
src/include/nodes/plannodes.h | 19 ++
src/include/utils/plancache.h | 3 +-
src/include/utils/portal.h | 3 +
34 files changed, 849 insertions(+), 96 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index fca29a9a10..d839517693 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -541,7 +541,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 9abbb6b555..f6607f2454 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 5d1f7089da..111d384982 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -407,7 +407,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
@@ -515,7 +515,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, PartitionPruneResult *part_prune_result,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
@@ -563,7 +564,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, part_prune_result, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 767d9b9619..1d55a23ded 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -776,7 +776,7 @@ execute_sql_string(const char *sql)
{
QueryDesc *qdesc;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index d1ee106465..e878209674 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -408,7 +408,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 80738547ed..c7360712b1 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -155,6 +155,7 @@ ExecuteQuery(ParseState *pstate,
PreparedStatement *entry;
CachedPlan *cplan;
List *plan_list;
+ List *part_prune_result_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
Portal portal;
@@ -193,7 +194,10 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
plan_list = cplan->stmt_list;
/*
@@ -207,6 +211,9 @@ ExecuteQuery(ParseState *pstate,
plan_list,
cplan);
+ /* Copy PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_result_list);
+
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
* statement is one that produces tuples. Currently we insist that it be
@@ -576,7 +583,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
- ListCell *p;
+ List *part_prune_result_list;
+ ListCell *p,
+ *pp;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -619,7 +628,10 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Replan if needed, and acquire a transient refcount */
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, queryEnv);
+ CurrentResourceOwner, queryEnv,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -634,13 +646,15 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
plan_list = cplan->stmt_list;
/* Explain each query */
- foreach(p, plan_list)
+ forboth(p, plan_list, pp, part_prune_result_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ PartitionPruneResult *part_prune_result = lfirst_node(PartitionPruneResult, pp);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ ExplainOnePlan(pstmt, part_prune_result, into, es, query_string,
+ paramLI, queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 0b5183fc4a..e0802be723 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -65,6 +65,29 @@ found there. This currently only occurs for Append and MergeAppend nodes. In
this case the non-required subplans are ignored and the executor state's
subnode array will become out of sequence to the plan's subplan list.
+Actually, this so-called execution-time pruning may also occur even before
+execution has started. One case where that occurs is when a cached generic
+plan is being validated for execution by plancache.c's GetCachedPlan(), which
+proceeds by locking all the relations that will be scanned by that plan. If
+the generic plan contains nodes that can perform execution-time partition
+pruning (that is, contain a PartitionPruneInfo), the subset of pruning steps
+in the PartitionPruneInfos that do not depend on execution actually having
+started (called "initial" pruning steps) is performed at this point to
+figure out the minimal set of child subplans that satisfy those pruning
+instructions. AcquireExecutorLocks(), looking at a particular plan, will then
+lock only the relations scanned by those surviving subplans (along with those
+present in PlannedStmt.minLockRelids), and ignore those scanned by the pruned
+subplans, even though the pruned subplans themselves are not removed from the
+plan tree. The result of pruning (that is, the set of indexes of surviving
+subplans in their parent's list of child subplans) is saved as a list of
+bitmapsets, with one element for every PartitionPruneInfo referenced in the
+plan (PlannedStmt.partPruneInfos). The list is packaged into a
+PartitionPruneResult node, which is passed along with the PlannedStmt to the
+executor via the QueryDesc. It is imperative that the executor, and any
+third-party code invoked by it that gets passed the plan tree, look at the
+plan's PartitionPruneResult to determine whether a particular child subplan
+of a parent node that supports pruning is valid for a given execution.
+
Each Plan node may have expression trees associated with it, to represent
its target list, qualification conditions, etc. These trees are also
read-only to the executor, but the executor state for expression evaluation
@@ -286,6 +309,10 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ [ ExecutorDoInitialPruning ] --- an optional step to perform initial
+ partition pruning on the plan tree, the result of which is passed
+ to the executor via QueryDesc
+
CreateQueryDesc
ExecutorStart
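To make the README text above concrete, here is a minimal sketch of the
resulting locking rule; acquire_executor_locks_sketch and lock_rtes_in
are hypothetical stand-ins for the actual plancache.c code, which is
not shown in this hunk:

    static void
    acquire_executor_locks_sketch(PlannedStmt *plannedstmt,
                                  ParamListInfo params)
    {
        Bitmapset  *lockrelids = plannedstmt->minLockRelids;

        if (plannedstmt->containsInitialPruning)
        {
            PartitionPruneResult *ppr;

            /*
             * Performs the initial pruning steps, locking the partitioned
             * tables they consult as it goes.
             */
            ppr = ExecutorDoInitialPruning(plannedstmt, params);

            /* Add the leaf partitions the surviving subplans will scan */
            lockrelids = bms_union(lockrelids, ppr->scan_leafpart_rtis);
        }

        /* Hypothetical: lock every RTE whose index is in lockrelids */
        lock_rtes_in(plannedstmt->rtable, lockrelids);
    }

The pruned leaf partitions' RT indexes appear in neither minLockRelids
nor scan_leafpart_rtis, so they are never locked.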
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 72fc273524..45824624f8 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -49,6 +49,7 @@
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/execdebug.h"
+#include "executor/execPartition.h"
#include "executor/nodeSubplan.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
@@ -104,6 +105,56 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
/* end of local decls */
+/* ----------------------------------------------------------------
+ * ExecutorDoInitialPruning
+ *
+ * For each plan tree node that has been assigned a PartitionPruneInfo,
+ * this performs initial partition pruning using the information contained
+ * therein to determine the set of child subplans that satisfy the initial
+ * pruning steps, to be returned as a bitmapset of their indexes in the
+ * node's list of child subplans (for example, an Append's appendplans).
+ *
+ * Return value is a PartitionPruneResult node that contains a list of those
+ * bitmapsets, with one element for every PartitionPruneInfo, and a bitmapset
+ * of the RT indexes of all the leaf partitions scanned by those chosen
+ * subplans. Note that the latter is shared across all PartitionPruneInfos.
+ *
+ * The executor must see exactly the same set of subplans as valid for
+ * execution when doing ExecInitNode() on the plan nodes whose
+ * PartitionPruneInfos are processed here. So, it must get the set from the
+ * aforementioned PartitionPruneResult, instead of computing it all over
+ * again by redoing the initial pruning. It's the caller's job to pass the
+ * PartitionPruneResult to the executor.
+ *
+ * Note: Partitioned tables mentioned in PartitionedRelPruneInfo nodes that
+ * drive the pruning will be locked before doing the pruning.
+ * ----------------------------------------------------------------
+ */
+PartitionPruneResult *
+ExecutorDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params)
+{
+ PartitionPruneResult *result;
+ ListCell *lc;
+
+ /* Only get here if there is any pruning to do. */
+ Assert(plannedstmt->containsInitialPruning);
+
+ result = makeNode(PartitionPruneResult);
+ foreach(lc, plannedstmt->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ Bitmapset *valid_subplan_offs;
+
+ valid_subplan_offs =
+ ExecPartitionDoInitialPruning(plannedstmt, params, pruneinfo,
+ &result->scan_leafpart_rtis);
+ result->valid_subplan_offs_list =
+ lappend(result->valid_subplan_offs_list,
+ valid_subplan_offs);
+ }
+
+ return result;
+}
/* ----------------------------------------------------------------
* ExecutorStart
@@ -806,6 +857,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ PartitionPruneResult *part_prune_result = queryDesc->part_prune_result;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -826,6 +878,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_plannedstmt = plannedstmt;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ estate->es_part_prune_result = part_prune_result;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index f73b8c2607..7e6dab5623 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -66,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITIONPRUNERESULT UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -182,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->containsInitialPruning = false;
pstmt->planTree = plan;
pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
@@ -597,12 +599,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_result_data;
+ char *part_prune_result_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_result_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -631,6 +636,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_result_data = nodeToString(estate->es_part_prune_result);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -657,6 +663,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized PartitionPruneResult. */
+ part_prune_result_len = strlen(part_prune_result_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_result_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -751,6 +762,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized PartitionPruneResult */
+ part_prune_result_space = shm_toc_allocate(pcxt->toc, part_prune_result_len);
+ memcpy(part_prune_result_space, part_prune_result_data, part_prune_result_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITIONPRUNERESULT,
+ part_prune_result_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1232,8 +1249,10 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_result_space;
char *paramspace;
PlannedStmt *pstmt;
+ PartitionPruneResult *part_prune_result;
ParamListInfo paramLI;
char *queryString;
@@ -1244,12 +1263,18 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
pstmtspace = shm_toc_lookup(toc, PARALLEL_KEY_PLANNEDSTMT, false);
pstmt = (PlannedStmt *) stringToNode(pstmtspace);
+ /* Reconstruct leader-supplied PartitionPruneResult. */
+ part_prune_result_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITIONPRUNERESULT, false);
+ part_prune_result = (PartitionPruneResult *)
+ stringToNode(part_prune_result_space);
+
/* Reconstruct ParamListInfo. */
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(¶mspace);
/* Create a QueryDesc for the query. */
- return CreateQueryDesc(pstmt,
+ return CreateQueryDesc(pstmt, part_prune_result,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index b55cdd2580..86227301e9 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -25,6 +25,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -185,7 +186,11 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *pruneinfo);
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -198,7 +203,8 @@ static void PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans);
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis);
/*
@@ -1593,8 +1599,10 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* considered to be a stable expression, it can change value from one plan
* node scan to the next during query execution. Stable comparison
* expressions that don't involve such Params allow partition pruning to be
- * done once during executor startup. Expressions that do involve such Params
- * require us to prune separately for each scan of the parent plan node.
+ * done once during executor startup or during ExecutorDoInitialPruning() that
+ * runs as part of performing AcquireExecutorLocks() on a given plan tree.
+ * Expressions that do involve such Params require us to prune separately for
+ * each scan of the parent plan node.
*
* Note that pruning away unneeded subplans during executor startup has the
* added benefit of not having to initialize the unneeded subplans at all.
@@ -1611,6 +1619,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* account for initial pruning possibly having eliminated some of the
* subplans.
*
+ * ExecPartitionDoInitialPruning:
+ * Do initial pruning with the information contained in a given
+ * PartitionPruneInfo to determine the minimal set of child subplans
+ * of the parent plan node to which the PartitionPruneInfo belongs
+ * that must be executed, and also the set of RT indexes of the leaf
+ * partitions that will be scanned by those subplans.
+ *
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating the expressions
* that are safe to evaluate at a given point. This function is first
@@ -1628,8 +1643,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* On return, *initially_valid_subplans is assigned the set of indexes of
* child subplans that must be initialized along with the parent plan node.
- * Initial pruning is performed here if needed and in that case only the
- * surviving subplans' indexes are added.
+ * Initial pruning is performed here if needed (unless it has already been done
+ * by ExecutorDoInitialPruning()), and in that case only the surviving subplans'
+ * indexes are added.
*
* If subplans are indeed pruned, subplan_map arrays contained in the returned
* PartitionPruneState are re-sequenced to not count those, though only if the
@@ -1645,24 +1661,59 @@ ExecInitPartitionPruning(PlanState *planstate,
EState *estate = planstate->state;
PartitionPruneInfo *pruneinfo = list_nth(estate->es_part_prune_infos,
part_prune_index);
+ PartitionPruneResult *pruneresult = estate->es_part_prune_result;
+ bool do_pruning = (pruneinfo->needs_init_pruning ||
+ pruneinfo->needs_exec_pruning);
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, planstate);
+ /*
+ * No need to do initial pruning if it was done already by
+ * ExecutorDoInitialPruning(), which it would be if es_part_prune_result
+ * has been set.
+ */
+ if (pruneresult)
+ do_pruning = pruneinfo->needs_exec_pruning;
- /* Create the working data structure for pruning */
- prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ prunestate = NULL;
+ if (do_pruning)
+ {
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
+
+ /* For data reading, executor always omits detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
+ /*
+ * Create the working data structure for pruning. No need to consider
+ * initial pruning steps if we have a PartitionPruneResult.
+ */
+ prunestate = CreatePartitionPruneState(planstate, pruneinfo,
+ pruneresult == NULL, true,
+ NIL, planstate->ps_ExprContext,
+ estate->es_partition_directory);
+ }
/*
* Perform an initial partition prune pass, if required.
*/
- if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ if (pruneresult)
+ {
+ *initially_valid_subplans =
+ list_nth(pruneresult->valid_subplan_offs_list, part_prune_index);
+ }
+ else if (prunestate && prunestate->do_initial_prune)
+ {
+ *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true,
+ NULL);
+ }
else
{
/* No pruning, so we'll need to initialize all subplans */
Assert(n_total_subplans > 0);
*initially_valid_subplans = bms_add_range(NULL, 0,
n_total_subplans - 1);
+ return prunestate;
}
/*
@@ -1670,7 +1721,8 @@ ExecInitPartitionPruning(PlanState *planstate,
* that were removed above due to initial pruning. No need to do this if
* no steps were removed.
*/
- if (bms_num_members(*initially_valid_subplans) < n_total_subplans)
+ if (prunestate &&
+ bms_num_members(*initially_valid_subplans) < n_total_subplans)
{
/*
* We can safely skip this when !do_exec_prune, even though that
@@ -1686,11 +1738,73 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecPartitionDoInitialPruning
+ * Perform initial pruning using given PartitionPruneInfo to determine
+ * the minimal set of child subplans that will be executed and also the
+ * set of RT indexes of the leaf partitions scanned by those subplans.
+ */
+Bitmapset *
+ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *rtable = plannedstmt->rtable;
+ ExprContext *econtext;
+ PartitionDirectory pdir;
+ MemoryContext oldcontext,
+ tmpcontext;
+ PartitionPruneState *prunestate;
+ Bitmapset *valid_subplan_offs;
+
+ /*
+ * A temporary context for memory allocations required while executing
+ * partition pruning steps.
+ */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "initial pruning working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
+
+ /*
+ * PartitionDirectory to look up partition descriptors, which omits
+ * detached partitions, just like in the executor proper.
+ */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+
+ /*
+ * We don't yet have a PlanState for the parent plan node, so we must
+ * create a standalone ExprContext to evaluate pruning expressions,
+ * equipped with the information about the EXTERN parameters that the
+ * caller passed us. Note that that's okay because the initial pruning
+ * steps do not contain anything that requires the execution to have
+ * started.
+ */
+ econtext = CreateStandaloneExprContext();
+ econtext->ecxt_param_list_info = params;
+ prunestate = CreatePartitionPruneState(NULL, pruneinfo, true, false,
+ rtable, econtext, pdir);
+ MemoryContextSwitchTo(oldcontext);
+
+ /* Do the initial pruning. */
+ valid_subplan_offs = ExecFindMatchingSubPlans(prunestate, true,
+ scan_leafpart_rtis);
+
+ FreeExprContext(econtext, true);
+ DestroyPartitionDirectory(pdir);
+ MemoryContextDelete(tmpcontext);
+
+ return valid_subplan_offs;
+}
+
/*
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
*
- * 'planstate' is the parent plan node's execution state.
+ * 'planstate', if not NULL, is the parent plan node's execution state. It
+ * can be NULL if being called before ExecutorStart(), in which case,
+ * 'rtable' (range table), 'econtext', and 'partdir' must be explicitly
+ * provided.
*
* 'pruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
@@ -1704,19 +1818,21 @@ ExecInitPartitionPruning(PlanState *planstate,
* PartitionedRelPruneInfo.
*/
static PartitionPruneState *
-CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
+CreatePartitionPruneState(PlanState *planstate,
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir)
{
- EState *estate = planstate->state;
+ EState *estate = planstate ? planstate->state : NULL;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
- /* For data reading, executor always omits detached partitions */
- if (estate->es_partition_directory == NULL)
- estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
+ Assert((estate != NULL) ||
+ (partdir != NULL && econtext != NULL && rtable != NIL));
n_part_hierarchies = list_length(pruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
@@ -1771,15 +1887,42 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * Must open the relation ourselves when called before execution
+ * has started, such as when called during
+ * ExecutorDoInitialPruning() on a cached plan. In that case,
+ * sub-partitions must be locked, because AcquirePlannerLocks()
+ * would not have seen them. (The first relation in a partrelpruneinfos
+ * list is always the root partitioned table appearing in the
+ * query, which AcquirePlannerLocks() would have locked; the
+ * Assert in relation_open() guards that assumption.)
+ */
+ if (estate == NULL)
+ {
+ RangeTblEntry *rte = rt_fetch(pinfo->rtindex, rtable);
+ int lockmode = (j == 0) ? NoLock : rte->rellockmode;
+
+ partrel = table_open(rte->relid, lockmode);
+ }
+ else
+ partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+
+ /*
+ * We can rely on the copy of the partitioned table's partition
+ * key in its relcache entry, because it can't change (or get
+ * destroyed) as long as the relation is locked. The partition
+ * descriptor is taken from the PartitionDirectory associated with
+ * the table, which keeps the descriptor valid for as long as it
+ * is needed to perform the pruning steps.
*/
- partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
partkey = RelationGetPartitionKey(partrel);
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partdesc = PartitionDirectoryLookup(partdir, partrel);
+
+ /*
+ * Must close partrel, keeping the lock taken, if we're not using
+ * EState's entry.
+ */
+ if (estate == NULL)
+ table_close(partrel, NoLock);
/*
* Initialize the subplan_map and subpart_map.
@@ -1793,6 +1936,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
Assert(partdesc->nparts >= pinfo->nparts);
pprune->nparts = partdesc->nparts;
pprune->subplan_map = palloc(sizeof(int) * partdesc->nparts);
+ pprune->rti_map = palloc(sizeof(Index) * partdesc->nparts);
if (partdesc->nparts == pinfo->nparts)
{
/*
@@ -1803,6 +1947,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pprune->subpart_map = pinfo->subpart_map;
memcpy(pprune->subplan_map, pinfo->subplan_map,
sizeof(int) * pinfo->nparts);
+ memcpy(pprune->rti_map, pinfo->rti_map,
+ sizeof(int) * pinfo->nparts);
/*
* Double-check that the list of unpruned relations has not
@@ -1853,6 +1999,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pinfo->subplan_map[pd_idx];
pprune->subpart_map[pp_idx] =
pinfo->subpart_map[pd_idx];
+ pprune->rti_map[pp_idx] =
+ pinfo->rti_map[pd_idx];
pd_idx++;
}
else
@@ -1860,6 +2008,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
/* this partdesc entry is not in the plan */
pprune->subplan_map[pp_idx] = -1;
pprune->subpart_map[pp_idx] = -1;
+ pprune->rti_map[pp_idx] = 0;
}
}
@@ -1881,7 +2030,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* Initialize pruning contexts as needed.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
- if (pinfo->initial_pruning_steps)
+ if (consider_initial_steps && pinfo->initial_pruning_steps)
{
InitPartitionPruneContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
@@ -1891,7 +2040,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps)
+ if (consider_exec_steps && pinfo->exec_pruning_steps)
{
InitPartitionPruneContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
@@ -2119,10 +2268,14 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
* runtime pruning.
+ *
+ * RT indexes of leaf partitions scanned by the chosen subplans are added to
+ * *scan_leafpart_rtis if the pointer is non-NULL.
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2157,7 +2310,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
*/
pprune = &prunedata->partrelprunedata[0];
find_matching_subplans_recurse(prunedata, pprune, initial_prune,
- &result);
+ &result, scan_leafpart_rtis);
/* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
@@ -2171,6 +2324,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_copy(*scan_leafpart_rtis);
MemoryContextReset(prunestate->prune_context);
@@ -2181,13 +2336,15 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* find_matching_subplans_recurse
* Recursive worker function for ExecFindMatchingSubPlans
*
- * Adds valid (non-prunable) subplan IDs to *validsubplans
+ * Adds valid (non-prunable) subplan IDs to *validsubplans and RT indexes
+ * of the corresponding leaf partitions to *scan_leafpart_rtis (if asked for).
*/
static void
find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans)
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *partset;
int i;
@@ -2214,8 +2371,14 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
while ((i = bms_next_member(partset, i)) >= 0)
{
if (pprune->subplan_map[i] >= 0)
+ {
*validsubplans = bms_add_member(*validsubplans,
pprune->subplan_map[i]);
+ Assert(pprune->rti_map[i] > 0);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_add_member(*scan_leafpart_rtis,
+ pprune->rti_map[i]);
+ }
else
{
int partidx = pprune->subpart_map[i];
@@ -2223,7 +2386,8 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
if (partidx >= 0)
find_matching_subplans_recurse(prunedata,
&prunedata->partrelprunedata[partidx],
- initial_prune, validsubplans);
+ initial_prune, validsubplans,
+ scan_leafpart_rtis);
else
{
/*
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index f9460ae506..a2182a6b1f 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -844,7 +844,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
- es->qd = CreateQueryDesc(es->stmt,
+ es->qd = CreateQueryDesc(es->stmt, NULL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index c6f86a6510..96880e122a 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -155,7 +155,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (appendstate->as_prune_state == NULL ||
+ (!appendstate->as_prune_state->do_exec_prune && nplans > 0))
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -577,7 +578,7 @@ choose_next_subplan_locally(AppendState *node)
}
else if (node->as_valid_subplans == NULL)
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
whichplan = -1;
}
@@ -642,7 +643,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
/*
* Mark each invalid plan as finished to allow the loop below to
@@ -717,7 +718,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
mark_invalid_subplans_as_finished(node);
}
@@ -868,7 +869,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
classify_matching_subplans(node);
}
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 8d35860c30..2312e5a633 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -103,7 +103,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplan, we can fill ms_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (mergestate->ms_prune_state == NULL ||
+ (!mergestate->ms_prune_state->do_exec_prune && nplans > 0))
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -218,7 +219,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 29bc26669b..303a572c02 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1578,6 +1578,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
CachedPlanSource *plansource;
CachedPlan *cplan;
List *stmt_list;
+ List *part_prune_result_list;
char *query_string;
Snapshot snapshot;
MemoryContext oldcontext;
@@ -1657,7 +1658,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1685,6 +1689,9 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
stmt_list,
cplan);
+ /* Copy PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_result_list);
+
/*
* Set up options for portal. Default SCROLL type is chosen the same way
* as PerformCursorOpen does it.
@@ -2092,7 +2099,8 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
/* Get the generic plan for the query */
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ NULL /* Not interested in PartitionPruneResults */);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2473,7 +2481,9 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
- ListCell *lc2;
+ List *part_prune_result_list;
+ ListCell *lc2,
+ *lc3;
spicallbackarg.query = plansource->query_string;
@@ -2549,8 +2559,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* plan, the refcount must be backed by the plan_owner.
*/
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
-
+ plan_owner, _SPI_current->queryEnv,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
stmt_list = cplan->stmt_list;
/*
@@ -2589,9 +2601,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, part_prune_result_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ PartitionPruneResult *part_prune_result = lfirst_node(PartitionPruneResult, lc3);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2663,7 +2676,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
else
snap = InvalidSnapshot;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, part_prune_result,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 8fbeaa4f36..ca139797a8 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -97,7 +97,9 @@ _copyPlannedStmt(const PlannedStmt *from)
COPY_SCALAR_FIELD(jitFlags);
COPY_NODE_FIELD(planTree);
COPY_NODE_FIELD(partPruneInfos);
+ COPY_SCALAR_FIELD(containsInitialPruning);
COPY_NODE_FIELD(rtable);
+ COPY_BITMAPSET_FIELD(minLockRelids);
COPY_NODE_FIELD(resultRelations);
COPY_NODE_FIELD(appendRelations);
COPY_NODE_FIELD(subplans);
@@ -1284,6 +1286,8 @@ _copyPartitionPruneInfo(const PartitionPruneInfo *from)
PartitionPruneInfo *newnode = makeNode(PartitionPruneInfo);
COPY_NODE_FIELD(prune_infos);
+ COPY_SCALAR_FIELD(needs_init_pruning);
+ COPY_SCALAR_FIELD(needs_exec_pruning);
COPY_BITMAPSET_FIELD(other_subplans);
return newnode;
@@ -1300,6 +1304,7 @@ _copyPartitionedRelPruneInfo(const PartitionedRelPruneInfo *from)
COPY_POINTER_FIELD(subplan_map, from->nparts * sizeof(int));
COPY_POINTER_FIELD(subpart_map, from->nparts * sizeof(int));
COPY_POINTER_FIELD(relid_map, from->nparts * sizeof(Oid));
+ COPY_POINTER_FIELD(rti_map, from->nparts * sizeof(Index));
COPY_NODE_FIELD(initial_pruning_steps);
COPY_NODE_FIELD(exec_pruning_steps);
COPY_BITMAPSET_FIELD(execparamids);
@@ -5475,6 +5480,21 @@ _copyExtensibleNode(const ExtensibleNode *from)
return newnode;
}
+/* ****************************************************************
+ * execnodes.h copy functions
+ * ****************************************************************
+ */
+static PartitionPruneResult *
+_copyPartitionPruneResult(const PartitionPruneResult *from)
+{
+ PartitionPruneResult *newnode = makeNode(PartitionPruneResult);
+
+ COPY_NODE_FIELD(valid_subplan_offs_list);
+ COPY_BITMAPSET_FIELD(scan_leafpart_rtis);
+
+ return newnode;
+}
+
/* ****************************************************************
* value.h copy functions
* ****************************************************************
@@ -6571,6 +6591,13 @@ copyObjectImpl(const void *from)
retval = _copyPublicationTable(from);
break;
+ /*
+ * EXECUTION NODES
+ */
+ case T_PartitionPruneResult:
+ retval = _copyPartitionPruneResult(from);
+ break;
+
/*
* MISCELLANEOUS NODES
*/
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 72fcd8a6ee..53010bf059 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -322,7 +322,9 @@ _outPlannedStmt(StringInfo str, const PlannedStmt *node)
WRITE_INT_FIELD(jitFlags);
WRITE_NODE_FIELD(planTree);
WRITE_NODE_FIELD(partPruneInfos);
+ WRITE_BOOL_FIELD(containsInitialPruning);
WRITE_NODE_FIELD(rtable);
+ WRITE_BITMAPSET_FIELD(minLockRelids);
WRITE_NODE_FIELD(resultRelations);
WRITE_NODE_FIELD(appendRelations);
WRITE_NODE_FIELD(subplans);
@@ -1017,6 +1019,8 @@ _outPartitionPruneInfo(StringInfo str, const PartitionPruneInfo *node)
WRITE_NODE_TYPE("PARTITIONPRUNEINFO");
WRITE_NODE_FIELD(prune_infos);
+ WRITE_BOOL_FIELD(needs_init_pruning);
+ WRITE_BOOL_FIELD(needs_exec_pruning);
WRITE_BITMAPSET_FIELD(other_subplans);
}
@@ -1031,6 +1035,7 @@ _outPartitionedRelPruneInfo(StringInfo str, const PartitionedRelPruneInfo *node)
WRITE_INT_ARRAY(subplan_map, node->nparts);
WRITE_INT_ARRAY(subpart_map, node->nparts);
WRITE_OID_ARRAY(relid_map, node->nparts);
+ WRITE_INDEX_ARRAY(rti_map, node->nparts);
WRITE_NODE_FIELD(initial_pruning_steps);
WRITE_NODE_FIELD(exec_pruning_steps);
WRITE_BITMAPSET_FIELD(execparamids);
@@ -2436,6 +2441,8 @@ _outPlannerGlobal(StringInfo str, const PlannerGlobal *node)
WRITE_NODE_FIELD(resultRelations);
WRITE_NODE_FIELD(appendRelations);
WRITE_NODE_FIELD(partPruneInfos);
+ WRITE_BOOL_FIELD(containsInitialPruning);
+ WRITE_BITMAPSET_FIELD(minLockRelids);
WRITE_NODE_FIELD(relationOids);
WRITE_NODE_FIELD(invalItems);
WRITE_NODE_FIELD(paramExecTypes);
@@ -2857,6 +2864,21 @@ _outExtensibleNode(StringInfo str, const ExtensibleNode *node)
methods->nodeOut(str, node);
}
+/*****************************************************************************
+ *
+ * Stuff from execnodes.h
+ *
+ *****************************************************************************/
+
+static void
+_outPartitionPruneResult(StringInfo str, const PartitionPruneResult *node)
+{
+ WRITE_NODE_TYPE("PARTITIONPRUNERESULT");
+
+ WRITE_NODE_FIELD(valid_subplan_offs_list);
+ WRITE_BITMAPSET_FIELD(scan_leafpart_rtis);
+}
+
/*****************************************************************************
*
* Stuff from parsenodes.h.
@@ -4766,6 +4788,13 @@ outNode(StringInfo str, const void *obj)
_outJsonTableSibling(str, obj);
break;
+ /*
+ * EXECUTION NODES
+ */
+ case T_PartitionPruneResult:
+ _outPartitionPruneResult(str, obj);
+ break;
+
default:
/*
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index bf602ff93e..c1d131aa99 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -164,6 +164,11 @@
token = pg_strtok(&length); /* skip :fldname */ \
local_node->fldname = readIntCols(len)
+/* Read an Index array */
+#define READ_INDEX_ARRAY(fldname, len) \
+ token = pg_strtok(&length); /* skip :fldname */ \
+ local_node->fldname = readIndexCols(len)
+
/* Read a bool array */
#define READ_BOOL_ARRAY(fldname, len) \
token = pg_strtok(&length); /* skip :fldname */ \
@@ -1818,7 +1823,9 @@ _readPlannedStmt(void)
READ_INT_FIELD(jitFlags);
READ_NODE_FIELD(planTree);
READ_NODE_FIELD(partPruneInfos);
+ READ_BOOL_FIELD(containsInitialPruning);
READ_NODE_FIELD(rtable);
+ READ_BITMAPSET_FIELD(minLockRelids);
READ_NODE_FIELD(resultRelations);
READ_NODE_FIELD(appendRelations);
READ_NODE_FIELD(subplans);
@@ -2770,6 +2777,8 @@ _readPartitionPruneInfo(void)
READ_LOCALS(PartitionPruneInfo);
READ_NODE_FIELD(prune_infos);
+ READ_BOOL_FIELD(needs_init_pruning);
+ READ_BOOL_FIELD(needs_exec_pruning);
READ_BITMAPSET_FIELD(other_subplans);
READ_DONE();
@@ -2786,6 +2795,7 @@ _readPartitionedRelPruneInfo(void)
READ_INT_ARRAY(subplan_map, local_node->nparts);
READ_INT_ARRAY(subpart_map, local_node->nparts);
READ_OID_ARRAY(relid_map, local_node->nparts);
+ READ_INDEX_ARRAY(rti_map, local_node->nparts);
READ_NODE_FIELD(initial_pruning_steps);
READ_NODE_FIELD(exec_pruning_steps);
READ_BITMAPSET_FIELD(execparamids);
@@ -2939,6 +2949,21 @@ _readPartitionRangeDatum(void)
READ_DONE();
}
+
+/*
+ * _readPartitionPruneResult
+ */
+static PartitionPruneResult *
+_readPartitionPruneResult(void)
+{
+ READ_LOCALS(PartitionPruneResult);
+
+ READ_NODE_FIELD(valid_subplan_offs_list);
+ READ_BITMAPSET_FIELD(scan_leafpart_rtis);
+
+ READ_DONE();
+}
+
/*
* parseNodeString
*
@@ -3236,6 +3261,8 @@ parseNodeString(void)
return_value = _readJsonTableParent();
else if (MATCH("JSONTABLESIBLING", 16))
return_value = _readJsonTableSibling();
+ else if (MATCH("PARTITIONPRUNERESULT", 20))
+ return_value = _readPartitionPruneResult();
else
{
elog(ERROR, "badly formatted node string \"%.32s\"...", token);
@@ -3379,6 +3406,30 @@ readIntCols(int numCols)
return int_vals;
}
+/*
+ * readIndexCols
+ */
+Index *
+readIndexCols(int numCols)
+{
+ int tokenLength,
+ i;
+ const char *token;
+ Index *index_vals;
+
+ if (numCols <= 0)
+ return NULL;
+
+ index_vals = (Index *) palloc(numCols * sizeof(Index));
+ for (i = 0; i < numCols; i++)
+ {
+ token = pg_strtok(&tokenLength);
+ index_vals[i] = atoui(token);
+ }
+
+ return index_vals;
+}
+
/*
* readBoolCols
*/
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 32e658b5d6..edbf19716e 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -519,7 +519,9 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
result->partPruneInfos = glob->partPruneInfos;
+ result->containsInitialPruning = glob->containsInitialPruning;
result->rtable = glob->finalrtable;
+ result->minLockRelids = glob->minLockRelids;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index aafe1c149d..a32fc70785 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -270,6 +270,16 @@ set_plan_references(PlannerInfo *root, Plan *plan)
*/
add_rtes_to_flat_rtable(root, false);
+ /*
+ * Add the query's adjusted range of RT indexes to glob->minLockRelids.
+ * The adjusted RT indexes of prunable relations will be deleted from the
+ * set below where PartitionPruneInfos are processed.
+ */
+ glob->minLockRelids =
+ bms_add_range(glob->minLockRelids,
+ rtoffset + 1,
+ rtoffset + list_length(root->parse->rtable));
+
/*
* Adjust RT indexes of PlanRowMarks and add to final rowmarks list
*/
@@ -352,6 +362,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach (lc, root->partPruneInfos)
{
PartitionPruneInfo *pruneinfo = lfirst(lc);
+ Bitmapset *leafpart_rtis = NULL;
ListCell *l;
foreach(l, pruneinfo->prune_infos)
@@ -362,15 +373,49 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach(l2, prune_infos)
{
PartitionedRelPruneInfo *pinfo = lfirst(l2);
+ int i;
/* RT index of the table to which the pinfo belongs. */
pinfo->rtindex += rtoffset;
+
+ /* Also of the leaf partitions that might be scanned. */
+ for (i = 0; i < pinfo->nparts; i++)
+ {
+ if (pinfo->rti_map[i] > 0 && pinfo->subplan_map[i] >= 0)
+ {
+ pinfo->rti_map[i] += rtoffset;
+ leafpart_rtis = bms_add_member(leafpart_rtis,
+ pinfo->rti_map[i]);
+ }
+ }
}
}
+ if (pruneinfo->needs_init_pruning)
+ {
+ glob->containsInitialPruning = true;
+
+ /*
+ * Delete the leaf partition RTIs from the global set of relations
+ * to be locked before executing the plan. AcquireExecutorLocks()
+ * will find the ones to add to the set after performing initial
+ * pruning.
+ */
+ glob->minLockRelids = bms_del_members(glob->minLockRelids,
+ leafpart_rtis);
+ }
+
glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
}
+ /*
+ * It seems worth doing a bms_copy() on glob->minLockRelids if we deleted
+ * bits from it just above to prevent empty tail bits resulting in
+ * inefficient looping during AcquireExecutorLocks().
+ */
+ if (glob->containsInitialPruning)
+ glob->minLockRelids = bms_copy(glob->minLockRelids);
+
return result;
}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index d77f7d3aef..952c5b8327 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,9 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -234,6 +236,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool needs_init_pruning = false;
+ bool needs_exec_pruning = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -313,12 +317,16 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_needs_init_pruning;
+ bool partrel_needs_exec_pruning;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_needs_init_pruning,
+ &partrel_needs_exec_pruning);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -327,6 +335,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+
+ needs_init_pruning |= partrel_needs_init_pruning;
+ needs_exec_pruning |= partrel_needs_exec_pruning;
}
pfree(relid_subplan_map);
@@ -341,6 +352,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->needs_init_pruning = needs_init_pruning;
+ pruneinfo->needs_exec_pruning = needs_exec_pruning;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -441,13 +454,18 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *needs_init_pruning and *needs_exec_pruning are set to indicate whether
+ * the returned PartitionedRelPruneInfos contain pruning steps that can be
+ * performed before and after execution begins, respectively.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -458,6 +476,10 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will find out below. */
+ *needs_init_pruning = false;
+ *needs_exec_pruning = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -545,6 +567,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+ * In the first pass, we also record whether the 2nd pass will be
+ * necessary by checking for the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -619,6 +644,12 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ /* record which types of pruning steps we've seen so far */
+ if (initial_pruning_steps != NIL)
+ *needs_init_pruning = true;
+ if (exec_pruning_steps != NIL)
+ *needs_exec_pruning = true;
+
pinfolist = lappend(pinfolist, pinfo);
}
@@ -646,6 +677,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *subplan_map;
int *subpart_map;
Oid *relid_map;
+ Index *rti_map;
/*
* Construct the subplan and subpart maps for this partitioning level.
@@ -658,6 +690,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
+ rti_map = (Index *) palloc0(nparts * sizeof(Index));
present_parts = NULL;
i = -1;
@@ -672,6 +705,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
+ rti_map[i] = partrel->relid;
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
@@ -696,6 +730,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
pinfo->relid_map = relid_map;
+ pinfo->rti_map = rti_map;
}
pfree(relid_subpart_map);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8b6b5bbaaa..7f0eda48a4 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1603,6 +1603,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ List *part_prune_result_list;
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -1978,7 +1979,9 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
/*
* Now we can define the portal.
@@ -1993,6 +1996,9 @@ exec_bind_message(StringInfo input_message)
cplan->stmt_list,
cplan);
+ /* Copy PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_result_list);
+
/* Done with the snapshot used for parameter I/O and parsing/planning */
if (snapshot_set)
PopActiveSnapshot();
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5aa5a350f3..8cc2e2162d 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -35,7 +35,7 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
+static void ProcessQuery(PlannedStmt *plan, PartitionPruneResult *part_prune_result,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +65,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +78,8 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->part_prune_result = part_prune_result; /* ExecutorDoInitialPruning()
+ * output for plan */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +125,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * part_prune_result: ExecutorDoInitialPruning() output for the plan tree
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +138,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +150,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, part_prune_result, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -491,8 +496,13 @@ PortalStart(Portal portal, ParamListInfo params,
/*
* Create QueryDesc in portal's context; for the moment, set
* the destination to DestNone.
+ *
+ * There is no PartitionPruneResult unless the PlannedStmt is
+ * from a CachedPlan.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->part_prune_results == NIL ? NULL :
+ linitial(portal->part_prune_results),
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1225,6 +1235,8 @@ PortalRunMulti(Portal portal,
if (pstmt->utilityStmt == NULL)
{
+ PartitionPruneResult *part_prune_result = NULL;
+
/*
* process a plannable query.
*/
@@ -1271,10 +1283,18 @@ PortalRunMulti(Portal portal,
else
UpdateActiveSnapshotCommandId();
+ /*
+ * Determine if there's a corresponding PartitionPruneResult for
+ * this PlannedStmt.
+ */
+ if (portal->part_prune_results != NIL)
+ part_prune_result = list_nth(portal->part_prune_results,
+ foreach_current_index(stmtlist_item));
+
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_result,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1283,7 +1303,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_result,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 0d6a295674..8c164741f7 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -99,14 +99,19 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_result_list);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv);
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_result_list);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_result_list,
+ List **lockedRelids_per_stmt);
+static void ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -790,15 +795,20 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* On a "true" return, we have acquired the locks needed to run the plan.
* (We must do this for the "true" result to be race-condition-free.)
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_result_list.
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_result_list)
{
CachedPlan *plan = plansource->gplan;
/* Assert that caller checked the querytree */
Assert(plansource->is_valid);
+ *part_prune_result_list = NIL;
+
/* If there's no generic plan, just say "false" */
if (!plan)
return false;
@@ -820,13 +830,21 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ List *lockedRelids_per_stmt;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ /*
+ * Lock the relations scanned by the plan. This is also where initial
+ * pruning is performed, if needed.
+ */
+ AcquireExecutorLocks(plan->stmt_list, boundParams,
+ part_prune_result_list,
+ &lockedRelids_per_stmt);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -848,7 +866,14 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ ReleaseExecutorLocks(plan->stmt_list, lockedRelids_per_stmt);
+
+ /*
+ * The output list and any objects therein have been allocated in the
+ * caller's hopefully short-lived context, so they will not remain
+ * leaked for long; still, reset the output variable to prevent it
+ * from being accidentally looked at.
+ *part_prune_result_list = NIL;
}
/*
@@ -874,10 +899,15 @@ CheckCachedPlan(CachedPlanSource *plansource)
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
* (unless this is a one-shot plan, in which case we don't copy the plan).
+ *
+ * A list of NULLs is returned in *part_prune_result_list, meaning that no
+ * PartitionPruneResult nodes have yet been created for the plans in
+ * stmt_list.
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv)
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_result_list)
{
CachedPlan *plan;
List *plist;
@@ -1007,6 +1037,17 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
MemoryContextSwitchTo(oldcxt);
+ /*
+ * No actual PartitionPruneResults to add yet, though we must initialize
+ * the list to have the same number of elements as the list of
+ * PlannedStmts.
+ */
+ *part_prune_result_list = NIL;
+ foreach(lc, plist)
+ {
+ *part_prune_result_list = lappend(*part_prune_result_list, NULL);
+ }
+
return plan;
}
@@ -1126,6 +1167,17 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
+ * For every PlannedStmt found in the returned CachedPlan, an element that
+ * is either a PartitionPruneResult or NULL is added to
+ * *part_prune_result_list. It is a PartitionPruneResult if the PlannedStmt
+ * comes from an existing, otherwise valid CachedPlan and contains at least
+ * one PartitionPruneInfo with "initial" pruning steps. Those steps are
+ * performed by calling ExecutorDoInitialPruning(), which prunes away
+ * subplans that don't match the pruning conditions, so that
+ * AcquireExecutorLocks() need lock only the leaf partitions that remain.
+ * The PartitionPruneResult contains a list of bitmapsets of the indexes of
+ * matching subplans, one for each PartitionPruneInfo.
+ *
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
@@ -1139,11 +1191,13 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ List **part_prune_result_list)
{
CachedPlan *plan = NULL;
List *qlist;
bool customplan;
+ List *my_part_prune_result_list;
/* Assert caller is doing things in a sane order */
Assert(plansource->magic == CACHEDPLANSOURCE_MAGIC);
@@ -1160,7 +1214,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (CheckCachedPlan(plansource, boundParams,
+ &my_part_prune_result_list))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1169,7 +1224,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
else
{
/* Build a new generic plan */
- plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv,
+ &my_part_prune_result_list);
/* Just make real sure plansource->gplan is clear */
ReleaseGenericPlan(plansource);
/* Link the new generic plan into the plansource */
@@ -1214,7 +1270,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (customplan)
{
/* Build a custom plan */
- plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv,
+ &my_part_prune_result_list);
/* Accumulate total costs of custom plans */
plansource->total_custom_cost += cached_plan_cost(plan, true);
@@ -1246,6 +1303,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
plan->is_saved = true;
}
+ if (part_prune_result_list)
+ *part_prune_result_list = my_part_prune_result_list;
+
return plan;
}
@@ -1737,17 +1797,29 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_result_list.
+ *
+ * On return, *lockedRelids_per_stmt will contain a bitmapset for every
+ * PlannedStmt in stmt_list, containing the RT indexes of relation entries
+ * in its range table that were actually locked, or NULL if the PlannedStmt
+ * contains a utility statement.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_result_list,
+ List **lockedRelids_per_stmt)
{
ListCell *lc1;
+ *part_prune_result_list = *lockedRelids_per_stmt = NIL;
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ PartitionPruneResult *part_prune_result = NULL;
+ Bitmapset *allLockRelids;
+ Bitmapset *lockedRelids = NULL;
+ int rti;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1761,13 +1833,35 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
if (query)
- ScanQueryForLocks(query, acquire);
+ ScanQueryForLocks(query, true);
+ *part_prune_result_list = lappend(*part_prune_result_list, NULL);
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ /*
+ * Figure out the set of relations that would need to be locked
+ * before executing the plan.
+ */
+ if (plannedstmt->containsInitialPruning)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ /*
+ * Obtain the set of partitions to be locked from the
+ * PartitionPruneInfos by considering the result of performing
+ * initial partition pruning.
+ */
+ part_prune_result = ExecutorDoInitialPruning(plannedstmt,
+ boundParams);
+
+ allLockRelids = bms_union(plannedstmt->minLockRelids,
+ part_prune_result->scan_leafpart_rtis);
+ }
+ else
+ allLockRelids = plannedstmt->minLockRelids;
+
+ rti = -1;
+ while ((rti = bms_next_member(allLockRelids, rti)) > 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
if (rte->rtekind != RTE_RELATION)
continue;
@@ -1778,10 +1872,58 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
* fail if it's been dropped entirely --- we'll just transiently
* acquire a non-conflicting lock.
*/
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
+ LockRelationOid(rte->relid, rte->rellockmode);
+ lockedRelids = bms_add_member(lockedRelids, rti);
+ }
+
+ *part_prune_result_list = lappend(*part_prune_result_list,
+ part_prune_result);
+ *lockedRelids_per_stmt = lappend(*lockedRelids_per_stmt, lockedRelids);
+ }
+}
+
+/*
+ * ReleaseExecutorLocks
+ * Release locks that would've been acquired by an earlier call to
+ * AcquireExecutorLocks()
+ */
+static void
+ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt)
+{
+ ListCell *lc1,
+ *lc2;
+
+ forboth(lc1, stmt_list, lc2, lockedRelids_per_stmt)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ Bitmapset *lockedRelids = lfirst(lc2);
+ int rti;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /*
+ * Ignore utility statements, except those (such as EXPLAIN) that
+ * contain a parsed-but-not-planned query. Note: it's okay to use
+ * ScanQueryForLocks, even though the query hasn't been through
+ * rule rewriting, because rewriting doesn't change the query
+ * representation.
+ */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ if (query)
+ ScanQueryForLocks(query, false);
+ continue;
+ }
+
+ rti = -1;
+ while ((rti = bms_next_member(lockedRelids, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ Assert(rte->rtekind == RTE_RELATION);
+
+ /* See the comment in AcquireExecutorLocks(). */
+ UnlockRelationOid(rte->relid, rte->rellockmode);
}
}
}
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index d549f66d4a..1bbe6b704b 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -303,6 +303,25 @@ PortalDefineQuery(Portal portal,
portal->status = PORTAL_DEFINED;
}
+/*
+ * PortalStorePartitionPruneResults
+ * Copy the given list of PartitionPruneResults into the portal's
+ * context
+ *
+ * This allows the caller to ensure that the list exists as long as the portal
+ * does.
+ */
+void
+PortalStorePartitionPruneResults(Portal portal, List *part_prune_results)
+{
+ MemoryContext oldcxt;
+
+ AssertArg(PortalIsValid(portal));
+ oldcxt = MemoryContextSwitchTo(portal->portalContext);
+ portal->part_prune_results = copyObject(part_prune_results);
+ MemoryContextSwitchTo(oldcxt);
+}
+
/*
* PortalReleaseCachedPlan
* Release a portal's reference to its cached plan, if any.
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 666977fb1f..bbc8c42d88 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,9 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index bf962af7af..bd8776402e 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -45,6 +45,7 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
+ * rti_map Range table index by partition index, or 0.
* present_parts A Bitmapset of the partition indexes that we
* have subplans or subparts for.
* initial_pruning_steps List of PartitionPruneSteps used to
@@ -61,6 +62,7 @@ typedef struct PartitionedRelPruningData
int nparts;
int *subplan_map;
int *subpart_map;
+ Index *rti_map;
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
@@ -126,5 +128,10 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int part_prune_index,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis);
+extern Bitmapset *ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index e79e2c001f..60d5644908 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,8 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ PartitionPruneResult *part_prune_result; /* ExecutorDoInitialPruning()'s
+ * output for plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +59,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index d68a6b9d28..5c4a282be0 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -185,6 +185,8 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
+extern PartitionPruneResult *ExecutorDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params);
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 25e0bb976e..d3ae0fa52d 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -986,6 +986,34 @@ typedef struct DomainConstraintState
*/
typedef TupleTableSlot *(*ExecProcNodeMtd) (struct PlanState *pstate);
+/*----------------
+ * PartitionPruneResult
+ *
+ * The result of an ExecutorDoInitialPruning() invocation on a given
+ * PlannedStmt.
+ *
+ * Contains a list of Bitmapsets of the indexes of the subplans remaining
+ * after performing initial pruning by calling ExecFindMatchingSubPlans()
+ * for every PartitionPruneInfo found in PlannedStmt.partPruneInfos. RT
+ * indexes of the leaf partitions scanned by those subplans across all
+ * PartitionPruneInfos are added to scan_leafpart_rtis.
+ *
+ * This is used by GetCachedPlan() to inform its callers of the pruning
+ * decisions made when performing AcquireExecutorLocks() on a given cached
+ * PlannedStmt, which the callers then pass on to the executor. The
+ * executor refers to this node, when available, while initializing the
+ * plan nodes to which those PartitionPruneInfos apply, so that the same
+ * set of qualifying subplans is initialized rather than derived again by
+ * redoing initial pruning.
+ */
+typedef struct PartitionPruneResult
+{
+ NodeTag type;
+
+ List *valid_subplan_offs_list;
+ Bitmapset *scan_leafpart_rtis;
+} PartitionPruneResult;
+
/* ----------------
* PlanState node
*
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index b3b407579b..84d67d5dcf 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -97,6 +97,9 @@ typedef enum NodeTag
T_PartitionPruneStepCombine,
T_PlanInvalItem,
+ /* TAGS FOR EXECUTOR PREP NODES (execnodes.h) */
+ T_PartitionPruneResult,
+
/*
* TAGS FOR PLAN STATE NODES (execnodes.h)
*
@@ -674,6 +677,7 @@ extern struct Bitmapset *readBitmapset(void);
extern uintptr_t readDatum(bool typbyval);
extern bool *readBoolCols(int numCols);
extern int *readIntCols(int numCols);
+extern Index *readIndexCols(int numCols);
extern Oid *readOidCols(int numCols);
extern int16 *readAttrNumberCols(int numCols);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 6995b0ecec..c47ce6c09b 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -110,6 +110,15 @@ typedef struct PlannerGlobal
List *partPruneInfos; /* List of PartitionPruneInfo contained in
* the plan */
+ bool containsInitialPruning; /* Do any of those PartitionPruneInfos
+ * have initial (pre-exec) pruning
+ * steps in them? */
+
+ Bitmapset *minLockRelids; /* Indexes of all range table entries minus
+ * indexes of range table entries of the leaf
+ * partitions scanned by prunable subplans;
+ * see AcquireExecutorLocks() */
+
List *relationOids; /* OIDs of relations the plan depends on */
List *invalItems; /* other dependencies, as PlanInvalItems */
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 297cacfb5b..ffb52e2ac2 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -67,8 +67,17 @@ typedef struct PlannedStmt
List *partPruneInfos; /* List of PartitionPruneInfo contained in
* the plan */
+ bool containsInitialPruning; /* Do any of those PartitionPruneInfos
+ * have initial (pre-exec) pruning
+ * steps in them? */
+
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *minLockRelids; /* Indexes of all range table entries minus
+ * indexes of range table entries of the leaf
+ * partitions scanned by prunable subplans;
+ * see AcquireExecutorLocks() */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
List *resultRelations; /* integer list of RT indexes, or NIL */
@@ -1196,6 +1205,13 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * needs_init_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its initial_pruning_steps set?
+ *
+ * needs_exec_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its exec_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1204,6 +1220,8 @@ typedef struct PartitionPruneInfo
{
NodeTag type;
List *prune_infos;
+ bool needs_init_pruning;
+ bool needs_exec_pruning;
Bitmapset *other_subplans;
} PartitionPruneInfo;
@@ -1234,6 +1252,7 @@ typedef struct PartitionedRelPruneInfo
int *subplan_map; /* subplan index by partition index, or -1 */
int *subpart_map; /* subpart index by partition index, or -1 */
Oid *relid_map; /* relation OID by partition index, or 0 */
+ Index *rti_map; /* range table index by partition index, or 0 */
/*
* initial_pruning_steps shows how to prune during executor startup (i.e.,
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 0499635f59..1c5bb5ece1 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -220,7 +220,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ List **part_prune_result_list);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aeddbdafe5..9f7727a837 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,7 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *part_prune_results; /* list of PartitionPruneResults */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -242,6 +243,8 @@ extern void PortalDefineQuery(Portal portal,
CommandTag commandTag,
List *stmts,
CachedPlan *cplan);
+extern void PortalStorePartitionPruneResults(Portal portal,
+ List *part_prune_result_list);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
--
2.35.3
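
To make the intended control flow concrete, here is a minimal caller
sketch modeled on the _SPI_execute_plan() changes above. It is only an
illustration: plansource, params, queryEnv, dest, and query_string stand
in for whatever the real call site provides, and error handling and
snapshot management are elided.

    List       *part_prune_result_list;
    CachedPlan *cplan;
    ListCell   *lc1,
               *lc2;

    /* AcquireExecutorLocks() runs in here, doing initial pruning if needed */
    cplan = GetCachedPlan(plansource, params, NULL, queryEnv,
                          &part_prune_result_list);

    /* One PartitionPruneResult (or NULL) per PlannedStmt, in lockstep */
    forboth(lc1, cplan->stmt_list, lc2, part_prune_result_list)
    {
        PlannedStmt *stmt = lfirst_node(PlannedStmt, lc1);
        PartitionPruneResult *ppr = (PartitionPruneResult *) lfirst(lc2);
        QueryDesc  *qdesc;

        /* The executor reuses ppr instead of redoing initial pruning */
        qdesc = CreateQueryDesc(stmt, ppr, query_string,
                                GetActiveSnapshot(), InvalidSnapshot,
                                dest, params, queryEnv, 0);

        ExecutorStart(qdesc, 0);
        ExecutorRun(qdesc, ForwardScanDirection, 0, true);
        ExecutorFinish(qdesc);
        ExecutorEnd(qdesc);
        FreeQueryDesc(qdesc);
    }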
On Fri, May 27, 2022 at 1:10 AM Amit Langote <amitlangote09@gmail.com>
wrote:
On Mon, Apr 11, 2022 at 12:53 PM Zhihong Yu <zyu@yugabyte.com> wrote:
On Sun, Apr 10, 2022 at 8:05 PM Amit Langote <amitlangote09@gmail.com>
wrote:
Sending v15 that fixes that to keep the cfbot green for now.
Hi,
+ /* RT index of the partitione table. */
partitione -> partitioned
Thanks, fixed.
Also, I broke this into patches:
0001 contains the mechanical changes of moving PartitionPruneInfo out
of Append/MergeAppend into a list in PlannedStmt.

0002 is the main patch to "Optimize AcquireExecutorLocks() by locking
only unpruned partitions".

--
Thanks,
Amit Langote
EDB: http://www.enterprisedb.com
Hi,
In the description:
is made available to the actual execution via
PartitionPruneResult, made available along with the PlannedStmt by the
I think the second `made available` is redundant (can be omitted).
+ * Initial pruning is performed here if needed (unless it has already been done
+ * by ExecDoInitialPruning()), and in that case only the surviving subplans'
I wonder if there is a typo above - I don't find ExecDoInitialPruning
either in the PG codebase or in the patches (except for this in the comment).
I think it should be ExecutorDoInitialPruning.
+ * bits from it just above to prevent empty tail bits resulting in
I searched in the code base but didn't find any mention of `empty tail bits`.
Do you mind explaining a bit about it?
Cheers
On Fri, May 27, 2022 at 1:09 AM Amit Langote <amitlangote09@gmail.com> wrote:
0001 contains the mechanical changes of moving PartitionPruneInfo out
of Append/MergeAppend into a list in PlannedStmt.

0002 is the main patch to "Optimize AcquireExecutorLocks() by locking
only unpruned partitions".
This patchset will need to be rebased over 835d476fd21; looks like
just a cosmetic change.
--Jacob
On Wed, Jul 6, 2022 at 2:43 AM Jacob Champion <jchampion@timescale.com> wrote:
On Fri, May 27, 2022 at 1:09 AM Amit Langote <amitlangote09@gmail.com> wrote:
0001 contains the mechanical changes of moving PartitionPruneInfo out
of Append/MergeAppend into a list in PlannedStmt.

0002 is the main patch to "Optimize AcquireExecutorLocks() by locking
only unpruned partitions".

This patchset will need to be rebased over 835d476fd21; looks like
just a cosmetic change.
Thanks for the heads up.
Rebased and also fixed per comments given by Zhihong Yu on May 28.
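
To recap the shape of the 0001 change in a plan node's terms, a rough
sketch (field and function names as in the patch, everything else
abbreviated):

    /* Before: each Append/MergeAppend embedded its own PruneInfo */
    struct PartitionPruneInfo *part_prune_info;

    /* After: the node stores an index into PlannedStmt.partPruneInfos */
    int         part_prune_index;   /* -1 if no run-time pruning */

    /* Lookup at executor startup, cf. ExecInitPartitionPruning() */
    PartitionPruneInfo *pruneinfo = list_nth(estate->es_part_prune_infos,
                                             part_prune_index);

This keeps the PruneInfos reachable by simply iterating over
PlannedStmt.partPruneInfos, without walking the plan tree.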
--
Thanks,
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v17-0001-Move-PartitioPruneInfo-out-of-plan-nodes-into-Pl.patch (application/octet-stream)
From 665055be44caaec9dcc2a3251f20ceb3c678fa3d Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Fri, 27 May 2022 16:00:28 +0900
Subject: [PATCH v17 1/2] Move PartitionPruneInfo out of plan nodes into
PlannedStmt
The planner will now add a given PartitionPruneInfo to
PlannedStmt.partPruneInfos instead of directly to the
Append/MergeAppend plan node. The latter instead gets an index
field that points to the list element of PlannedStmt.partPruneInfos
containing the PartitionPruneInfo belonging to the plan node.
A later commit will make AcquireExecutorLocks() do the initial
partition pruning to determine a minimal set of partitions to be
locked when validating a plan tree, and it will need to consult the
PartitionPruneInfos referenced therein to do so. It is better for
the PartitionPruneInfos to be directly accessible at that point than
to require a walk of the plan tree to find them, which is now
possible by simply iterating over PlannedStmt.partPruneInfos.
---
src/backend/executor/execMain.c | 1 +
src/backend/executor/execParallel.c | 1 +
src/backend/executor/execPartition.c | 4 +-
src/backend/executor/execUtils.c | 2 +
src/backend/executor/nodeAppend.c | 4 +-
src/backend/executor/nodeMergeAppend.c | 4 +-
src/backend/nodes/copyfuncs.c | 5 +-
src/backend/nodes/outfuncs.c | 7 ++-
src/backend/nodes/readfuncs.c | 5 +-
src/backend/optimizer/plan/createplan.c | 24 ++++-----
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 65 +++++++++++++------------
src/backend/partitioning/partprune.c | 18 ++++---
src/include/executor/execPartition.h | 3 +-
src/include/nodes/execnodes.h | 2 +
src/include/nodes/pathnodes.h | 6 +++
src/include/nodes/plannodes.h | 11 +++--
src/include/partitioning/partprune.h | 8 +--
18 files changed, 103 insertions(+), 68 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ef2fd46092..72fc273524 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -825,6 +825,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable);
estate->es_plannedstmt = plannedstmt;
+ estate->es_part_prune_infos = plannedstmt->partPruneInfos;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index f1fd7f7e8b..f73b8c2607 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -183,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
pstmt->planTree = plan;
+ pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
pstmt->resultRelations = NIL;
pstmt->appendRelations = NIL;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index e03ea27299..b55cdd2580 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1638,11 +1638,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
PartitionPruneState *
ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
Bitmapset **initially_valid_subplans)
{
PartitionPruneState *prunestate;
EState *estate = planstate->state;
+ PartitionPruneInfo *pruneinfo = list_nth(estate->es_part_prune_infos,
+ part_prune_index);
/* We may need an expression context to evaluate partition exprs */
ExecAssignExprContext(estate, planstate);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9df1f81ea8..f9c7976ff2 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -119,6 +119,8 @@ CreateExecutorState(void)
estate->es_relations = NULL;
estate->es_rowmarks = NULL;
estate->es_plannedstmt = NULL;
+ estate->es_part_prune_infos = NIL;
+ estate->es_part_prune_result = NULL;
estate->es_junkFilter = NULL;
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 357e10a1d7..c6f86a6510 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -134,7 +134,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_begun = false;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -145,7 +145,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&appendstate->ps,
list_length(node->appendplans),
- node->part_prune_info,
+ node->part_prune_index,
&validsubplans);
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index c5c62fa5c7..8d35860c30 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -82,7 +82,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.ExecProcNode = ExecMergeAppend;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -93,7 +93,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&mergestate->ps,
list_length(node->mergeplans),
- node->part_prune_info,
+ node->part_prune_index,
&validsubplans);
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 706d283a92..b02b4a641c 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -96,6 +96,7 @@ _copyPlannedStmt(const PlannedStmt *from)
COPY_SCALAR_FIELD(parallelModeNeeded);
COPY_SCALAR_FIELD(jitFlags);
COPY_NODE_FIELD(planTree);
+ COPY_NODE_FIELD(partPruneInfos);
COPY_NODE_FIELD(rtable);
COPY_NODE_FIELD(resultRelations);
COPY_NODE_FIELD(appendRelations);
@@ -253,7 +254,7 @@ _copyAppend(const Append *from)
COPY_NODE_FIELD(appendplans);
COPY_SCALAR_FIELD(nasyncplans);
COPY_SCALAR_FIELD(first_partial_plan);
- COPY_NODE_FIELD(part_prune_info);
+ COPY_SCALAR_FIELD(part_prune_index);
return newnode;
}
@@ -281,7 +282,7 @@ _copyMergeAppend(const MergeAppend *from)
COPY_POINTER_FIELD(sortOperators, from->numCols * sizeof(Oid));
COPY_POINTER_FIELD(collations, from->numCols * sizeof(Oid));
COPY_POINTER_FIELD(nullsFirst, from->numCols * sizeof(bool));
- COPY_NODE_FIELD(part_prune_info);
+ COPY_SCALAR_FIELD(part_prune_index);
return newnode;
}
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 4315c53080..7618444b4d 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -325,6 +325,7 @@ _outPlannedStmt(StringInfo str, const PlannedStmt *node)
WRITE_BOOL_FIELD(parallelModeNeeded);
WRITE_INT_FIELD(jitFlags);
WRITE_NODE_FIELD(planTree);
+ WRITE_NODE_FIELD(partPruneInfos);
WRITE_NODE_FIELD(rtable);
WRITE_NODE_FIELD(resultRelations);
WRITE_NODE_FIELD(appendRelations);
@@ -454,7 +455,7 @@ _outAppend(StringInfo str, const Append *node)
WRITE_NODE_FIELD(appendplans);
WRITE_INT_FIELD(nasyncplans);
WRITE_INT_FIELD(first_partial_plan);
- WRITE_NODE_FIELD(part_prune_info);
+ WRITE_INT_FIELD(part_prune_index);
}
static void
@@ -471,7 +472,7 @@ _outMergeAppend(StringInfo str, const MergeAppend *node)
WRITE_OID_ARRAY(sortOperators, node->numCols);
WRITE_OID_ARRAY(collations, node->numCols);
WRITE_BOOL_ARRAY(nullsFirst, node->numCols);
- WRITE_NODE_FIELD(part_prune_info);
+ WRITE_INT_FIELD(part_prune_index);
}
static void
@@ -2438,6 +2439,7 @@ _outPlannerGlobal(StringInfo str, const PlannerGlobal *node)
WRITE_NODE_FIELD(finalrowmarks);
WRITE_NODE_FIELD(resultRelations);
WRITE_NODE_FIELD(appendRelations);
+ WRITE_NODE_FIELD(partPruneInfos);
WRITE_NODE_FIELD(relationOids);
WRITE_NODE_FIELD(invalItems);
WRITE_NODE_FIELD(paramExecTypes);
@@ -2505,6 +2507,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_BITMAPSET_FIELD(curOuterRels);
WRITE_NODE_FIELD(curOuterParams);
WRITE_BOOL_FIELD(partColsUpdated);
+ WRITE_NODE_FIELD(partPruneInfos);
}
static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 6a05b69415..bf602ff93e 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1817,6 +1817,7 @@ _readPlannedStmt(void)
READ_BOOL_FIELD(parallelModeNeeded);
READ_INT_FIELD(jitFlags);
READ_NODE_FIELD(planTree);
+ READ_NODE_FIELD(partPruneInfos);
READ_NODE_FIELD(rtable);
READ_NODE_FIELD(resultRelations);
READ_NODE_FIELD(appendRelations);
@@ -1949,7 +1950,7 @@ _readAppend(void)
READ_NODE_FIELD(appendplans);
READ_INT_FIELD(nasyncplans);
READ_INT_FIELD(first_partial_plan);
- READ_NODE_FIELD(part_prune_info);
+ READ_INT_FIELD(part_prune_index);
READ_DONE();
}
@@ -1971,7 +1972,7 @@ _readMergeAppend(void)
READ_OID_ARRAY(sortOperators, local_node->numCols);
READ_OID_ARRAY(collations, local_node->numCols);
READ_BOOL_ARRAY(nullsFirst, local_node->numCols);
- READ_NODE_FIELD(part_prune_info);
+ READ_INT_FIELD(part_prune_index);
READ_DONE();
}
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 76606faa3e..58a05cf673 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1203,7 +1203,6 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
ListCell *subpaths;
int nasyncplans = 0;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
int nodenumsortkeys = 0;
AttrNumber *nodeSortColIdx = NULL;
Oid *nodeSortOperators = NULL;
@@ -1354,6 +1353,9 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ plan->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1377,16 +1379,14 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
}
if (prunequal != NIL)
- partpruneinfo =
- make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ plan->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
plan->appendplans = subplans;
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
- plan->part_prune_info = partpruneinfo;
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -1426,7 +1426,6 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
List *subplans = NIL;
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1519,6 +1518,9 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ node->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1542,13 +1544,13 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
}
if (prunequal != NIL)
- partpruneinfo = make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ node->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
node->mergeplans = subplans;
- node->part_prune_info = partpruneinfo;
+
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 06ad856eac..b11249ed8f 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -518,6 +518,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->dependsOnRole = glob->dependsOnRole;
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
+ result->partPruneInfos = glob->partPruneInfos;
result->rtable = glob->finalrtable;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 9cef92cab2..b8d5610593 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -348,6 +348,29 @@ set_plan_references(PlannerInfo *root, Plan *plan)
}
}
+ /* Also fix up the information in PartitionPruneInfos. */
+ foreach (lc, root->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ ListCell *l;
+
+ foreach(l, pruneinfo->prune_infos)
+ {
+ List *prune_infos = lfirst(l);
+ ListCell *l2;
+
+ foreach(l2, prune_infos)
+ {
+ PartitionedRelPruneInfo *pinfo = lfirst(l2);
+
+ /* RT index of the table to which the pinfo belongs. */
+ pinfo->rtindex += rtoffset;
+ }
+ }
+
+ glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
+ }
+
return result;
}
@@ -1655,21 +1678,12 @@ set_append_references(PlannerInfo *root,
aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
- if (aplan->part_prune_info)
- {
- foreach(l, aplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * PartitionPruneInfos will be added to a list in PlannerGlobal, so update
+ * the index.
+ */
+ if (aplan->part_prune_index >= 0)
+ aplan->part_prune_index += list_length(root->glob->partPruneInfos);
/* We don't need to recurse to lefttree or righttree ... */
Assert(aplan->plan.lefttree == NULL);
@@ -1727,21 +1741,12 @@ set_mergeappend_references(PlannerInfo *root,
mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
- if (mplan->part_prune_info)
- {
- foreach(l, mplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * PartitionPruneInfos will be added to a list in PlannerGlobal, so update
+ * the index.
+ */
+ if (mplan->part_prune_index >= 0)
+ mplan->part_prune_index += list_length(root->glob->partPruneInfos);
/* We don't need to recurse to lefttree or righttree ... */
Assert(mplan->plan.lefttree == NULL);
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 9d3c05aed3..d77f7d3aef 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -209,16 +209,20 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
/*
* make_partition_pruneinfo
- * Builds a PartitionPruneInfo which can be used in the executor to allow
- * additional partition pruning to take place. Returns NULL when
- * partition pruning would be useless.
+ * Checks if the given set of quals can be used to build pruning steps
+ * that the executor can use to prune away unneeded partitions. If
+ * suitable quals are found then a PartitionPruneInfo is built and tagged
+ * onto the PlannerInfo's partPruneInfos list.
+ *
+ * The return value is the 0-based index of the item added to the
+ * partPruneInfos list or -1 if nothing was added.
*
* 'parentrel' is the RelOptInfo for an appendrel, and 'subpaths' is the list
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
*/
-PartitionPruneInfo *
+int
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
List *prunequal)
@@ -332,7 +336,7 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* quals, then we can just not bother with run-time pruning.
*/
if (prunerelinfos == NIL)
- return NULL;
+ return -1;
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
@@ -358,7 +362,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
else
pruneinfo->other_subplans = NULL;
- return pruneinfo;
+ root->partPruneInfos = lappend(root->partPruneInfos, pruneinfo);
+
+ return list_length(root->partPruneInfos) - 1;
}
/*
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 708435e952..bf962af7af 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -123,9 +123,8 @@ typedef struct PartitionPruneState
extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
bool initial_prune);
-
#endif /* EXECPARTITION_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 5728801379..25e0bb976e 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -596,6 +596,8 @@ typedef struct EState
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
+ struct PartitionPruneResult *es_part_prune_result; /* QueryDesc.part_prune_result */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index b88cfb8dc0..a0f3a46334 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -107,6 +107,9 @@ typedef struct PlannerGlobal
List *appendRelations; /* "flat" list of AppendRelInfos */
+ List *partPruneInfos; /* List of PartitionPruneInfo contained in
+ * the plan */
+
List *relationOids; /* OIDs of relations the plan depends on */
List *invalItems; /* other dependencies, as PlanInvalItems */
@@ -386,6 +389,9 @@ struct PlannerInfo
/* Does this query modify any partition key columns? */
bool partColsUpdated;
+
+ /* PartitionPruneInfos added in this query's plan. */
+ List *partPruneInfos;
};
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index d5c0ebe859..c3f4a39657 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -64,6 +64,9 @@ typedef struct PlannedStmt
struct Plan *planTree; /* tree of Plan nodes */
+ List *partPruneInfos; /* List of PartitionPruneInfo contained in
+ * the plan */
+
List *rtable; /* list of RangeTblEntry nodes */
/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
@@ -262,8 +265,8 @@ typedef struct Append
*/
int first_partial_plan;
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} Append;
/* ----------------
@@ -297,8 +300,8 @@ typedef struct MergeAppend
/* NULLS FIRST/LAST directions */
bool *nullsFirst;
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} MergeAppend;
/* ----------------
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 90684efa25..ebf0dcff8c 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -70,10 +70,10 @@ typedef struct PartitionPruneContext
#define PruneCxtStateIdx(partnatts, step_id, keyno) \
((partnatts) * (step_id) + (keyno))
-extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
- struct RelOptInfo *parentrel,
- List *subpaths,
- List *prunequal);
+extern int make_partition_pruneinfo(struct PlannerInfo *root,
+ struct RelOptInfo *parentrel,
+ List *subpaths,
+ List *prunequal);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
--
2.35.3
v17-0002-Optimize-AcquireExecutorLocks-by-locking-only-un.patch
From e5d0283732311fb068ad75ee4ff282ebe5306266 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v17 2/2] Optimize AcquireExecutorLocks() by locking only
unpruned partitions
This commit teaches AcquireExecutorLocks() to perform initial
partition pruning to notionally eliminate the subnodes contained in a
generic cached plan that need not be initialized during the actual
execution of the plan, and to skip locking the partitions scanned by
those subnodes.

The result of performing initial partition pruning this way, before
the actual execution has started, is made available to the executor
via a PartitionPruneResult, passed along with the PlannedStmt by the
callers of the executor that used plancache.c to get the plan. It is
NULL in the cases in which the plan is obtained by calling the planner
directly or if the plan obtained from plancache.c is not a generic
one.
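To make the intended call flow concrete, here is a rough sketch
distilled from the prepare.c and spi.c hunks below (not a complete
excerpt; the usual portal/snapshot setup around it is assumed):

    List       *part_prune_result_list;
    CachedPlan *cplan;
    QueryDesc  *qdesc;
    ListCell   *p,
               *pp;

    /* Replan if needed; also get one PartitionPruneResult per statement */
    cplan = GetCachedPlan(plansource, paramLI, NULL, queryEnv,
                          &part_prune_result_list);
    Assert(list_length(cplan->stmt_list) ==
           list_length(part_prune_result_list));

    forboth(p, cplan->stmt_list, pp, part_prune_result_list)
    {
        PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
        PartitionPruneResult *ppr = lfirst_node(PartitionPruneResult, pp);

        /* The executor sees the pruning result via the QueryDesc */
        qdesc = CreateQueryDesc(pstmt, ppr, query_string,
                                GetActiveSnapshot(), InvalidSnapshot,
                                dest, paramLI, queryEnv, 0);
        /* ... ExecutorStart()/ExecutorRun() etc. as usual ... */
    }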
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 2 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/prepare.c | 26 ++-
src/backend/executor/README | 32 ++++
src/backend/executor/execMain.c | 53 ++++++
src/backend/executor/execParallel.c | 27 ++-
src/backend/executor/execPartition.c | 234 +++++++++++++++++++++----
src/backend/executor/functions.c | 2 +-
src/backend/executor/nodeAppend.c | 11 +-
src/backend/executor/nodeMergeAppend.c | 5 +-
src/backend/executor/spi.c | 27 ++-
src/backend/nodes/copyfuncs.c | 27 +++
src/backend/nodes/outfuncs.c | 29 +++
src/backend/nodes/readfuncs.c | 51 ++++++
src/backend/optimizer/plan/planner.c | 2 +
src/backend/optimizer/plan/setrefs.c | 46 +++++
src/backend/partitioning/partprune.c | 41 ++++-
src/backend/tcop/postgres.c | 8 +-
src/backend/tcop/pquery.c | 28 ++-
src/backend/utils/cache/plancache.c | 184 ++++++++++++++++---
src/backend/utils/mmgr/portalmem.c | 19 ++
src/include/commands/explain.h | 4 +-
src/include/executor/execPartition.h | 9 +-
src/include/executor/execdesc.h | 3 +
src/include/executor/executor.h | 2 +
src/include/nodes/execnodes.h | 27 +++
src/include/nodes/nodes.h | 4 +
src/include/nodes/pathnodes.h | 9 +
src/include/nodes/plannodes.h | 21 +++
src/include/utils/plancache.h | 3 +-
src/include/utils/portal.h | 3 +
34 files changed, 856 insertions(+), 96 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index fca29a9a10..d839517693 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -541,7 +541,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 9abbb6b555..f6607f2454 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index e29c2ae206..e41b13a3ea 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -407,7 +407,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
@@ -515,7 +515,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, PartitionPruneResult *part_prune_result,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
@@ -563,7 +564,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, part_prune_result, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 3db859c3ea..631cc07217 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -776,7 +776,7 @@ execute_sql_string(const char *sql)
{
QueryDesc *qdesc;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index d1ee106465..e878209674 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -408,7 +408,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 2333aae467..83465e40f8 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -155,6 +155,7 @@ ExecuteQuery(ParseState *pstate,
PreparedStatement *entry;
CachedPlan *cplan;
List *plan_list;
+ List *part_prune_result_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
Portal portal;
@@ -193,7 +194,10 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
plan_list = cplan->stmt_list;
/*
@@ -207,6 +211,9 @@ ExecuteQuery(ParseState *pstate,
plan_list,
cplan);
+ /* Copy PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_result_list);
+
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
* statement is one that produces tuples. Currently we insist that it be
@@ -576,7 +583,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
- ListCell *p;
+ List *part_prune_result_list;
+ ListCell *p,
+ *pp;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -619,7 +628,10 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Replan if needed, and acquire a transient refcount */
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, queryEnv);
+ CurrentResourceOwner, queryEnv,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -634,13 +646,15 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
plan_list = cplan->stmt_list;
/* Explain each query */
- foreach(p, plan_list)
+ forboth(p, plan_list, pp, part_prune_result_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ PartitionPruneResult *part_prune_result = lfirst_node(PartitionPruneResult, pp);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ ExplainOnePlan(pstmt, part_prune_result, into, es, query_string,
+ paramLI, queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 0b5183fc4a..953a476ea5 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -65,6 +65,34 @@ found there. This currently only occurs for Append and MergeAppend nodes. In
this case the non-required subplans are ignored and the executor state's
subnode array will become out of sequence to the plan's subplan list.
+Actually, the so-called execution time pruning may also occur even before the
+execution has started. One case where that occurs is when a cached generic
+plan is being validated for execution by plancache.c: GetCachedPlan(), which
+works by locking all the relations that will be scanned by that plan. If the
+generic plan contains nodes that can perform execution time partition pruning
+(that is, contain a PartitionPruneInfo), a subset of pruning steps contained
+in a given node's PartitionPruneInfo that do not depend on the execution
+actually having started (called "initial" pruning steps) are performed at this
+point to figure out the minimal set of child subplans that satisfy those
+pruning steps. AcquireExecutorLocks() looking at a given plan tree will then
+lock only the relations scanned by the child subplans that survived such
+pruning, along with those present in PlannedStmt.minLockRelids. Note that the
+subplans are only notionally pruned in that they are not removed from the plan
+tree as such.
+
+To prevent the executor and any third party execution code that can look at
+the plan tree from trying to execute the subplans that were pruned as
+described above, the result of pruning is passed to the executor as a
+PartitionPruneResult node via the QueryDesc. It consists of the set of
+indexes of surviving subplans in their respective parent plan node's list of
+child subplans, saved as a list of bitmapsets, with one element for every
+parent plan node whose PartitionPruneInfo is present in
+PlannedStmt.partPruneInfos. In other words, the executor should not
+re-evaluate the set of initially valid subplans by redoing the initial pruning
+if it was already done by AcquireExecutorLocks(), because the re-evaluation may
+very well end up resulting in a different set of subplans, containing some
+whose relations were not locked by AcquireExecutorLocks().
+
Each Plan node may have expression trees associated with it, to represent
its target list, qualification conditions, etc. These trees are also
read-only to the executor, but the executor state for expression evaluation
@@ -286,6 +314,10 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ [ ExecutorDoInitialPruning ] --- an optional step to perform initial
+ partition pruning on the plan tree, the result of which is passed
+ to the executor via QueryDesc
+
CreateQueryDesc
ExecutorStart
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 72fc273524..45824624f8 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -49,6 +49,7 @@
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/execdebug.h"
+#include "executor/execPartition.h"
#include "executor/nodeSubplan.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
@@ -104,6 +105,56 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
/* end of local decls */
+/* ----------------------------------------------------------------
+ * ExecutorDoInitialPruning
+ *
+ * For each plan tree node that has been assigned a PartitionPruneInfo,
+ * this performs initial partition pruning using the information contained
+ * therein to determine the set of child subplans that satisfy the initial
+ * pruning steps, to be returned as a bitmapset of their indexes in the
+ * node's list of child subplans (for example, an Append's appendplans).
+ *
+ * Return value is a PartitionPruneResult node that contains a list of those
+ * bitmapsets, with one element for every PartitionPruneInfo, and a bitmapset
+ * of the RT indexes of all the leaf partitions scanned by those chosen
+ * subplans. Note that the latter is shared across all PartitionPruneInfos.
+ *
+ * The executor must see exactly the same set of subplans as valid for
+ * execution when doing ExecInitNode() on the plan nodes whose
+ * PartitionPruneInfos are processed here. So, it must get the set from the
+ * aforementioned PartitionPruneResult, instead of computing it all over
+ * again by redoing the initial pruning. It's the caller's job to pass the
+ * PartitionPruneResult to the executor.
+ *
+ * Note: Partitioned tables mentioned in PartitionedRelPruneInfo nodes that
+ * drive the pruning will be locked before doing the pruning.
+ * ----------------------------------------------------------------
+ */
+PartitionPruneResult *
+ExecutorDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params)
+{
+ PartitionPruneResult *result;
+ ListCell *lc;
+
+ /* Only get here if there is any pruning to do. */
+ Assert(plannedstmt->containsInitialPruning);
+
+ result = makeNode(PartitionPruneResult);
+ foreach(lc, plannedstmt->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ Bitmapset *valid_subplan_offs;
+
+ valid_subplan_offs =
+ ExecPartitionDoInitialPruning(plannedstmt, params, pruneinfo,
+ &result->scan_leafpart_rtis);
+ result->valid_subplan_offs_list =
+ lappend(result->valid_subplan_offs_list,
+ valid_subplan_offs);
+ }
+
+ return result;
+}
/* ----------------------------------------------------------------
* ExecutorStart
@@ -806,6 +857,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ PartitionPruneResult *part_prune_result = queryDesc->part_prune_result;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -826,6 +878,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_plannedstmt = plannedstmt;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ estate->es_part_prune_result = part_prune_result;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index f73b8c2607..7e6dab5623 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -66,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITIONPRUNERESULT UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -182,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->containsInitialPruning = false;
pstmt->planTree = plan;
pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
@@ -597,12 +599,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_result_data;
+ char *part_prune_result_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_result_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -631,6 +636,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_result_data = nodeToString(estate->es_part_prune_result);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -657,6 +663,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized PartitionPruneResult. */
+ part_prune_result_len = strlen(part_prune_result_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_result_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -751,6 +762,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized PartitionPruneResult */
+ part_prune_result_space = shm_toc_allocate(pcxt->toc, part_prune_result_len);
+ memcpy(part_prune_result_space, part_prune_result_data, part_prune_result_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITIONPRUNERESULT,
+ part_prune_result_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1232,8 +1249,10 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_result_space;
char *paramspace;
PlannedStmt *pstmt;
+ PartitionPruneResult *part_prune_result;
ParamListInfo paramLI;
char *queryString;
@@ -1244,12 +1263,18 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
pstmtspace = shm_toc_lookup(toc, PARALLEL_KEY_PLANNEDSTMT, false);
pstmt = (PlannedStmt *) stringToNode(pstmtspace);
+ /* Reconstruct leader-supplied PartitionPruneResult. */
+ part_prune_result_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITIONPRUNERESULT, false);
+ part_prune_result = (PartitionPruneResult *)
+ stringToNode(part_prune_result_space);
+
/* Reconstruct ParamListInfo. */
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
/* Create a QueryDesc for the query. */
- return CreateQueryDesc(pstmt,
+ return CreateQueryDesc(pstmt, part_prune_result,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index b55cdd2580..24e6f6e988 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -25,6 +25,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -185,7 +186,11 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *pruneinfo);
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -198,7 +203,8 @@ static void PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans);
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis);
/*
@@ -1593,8 +1599,10 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* considered to be a stable expression, it can change value from one plan
* node scan to the next during query execution. Stable comparison
* expressions that don't involve such Params allow partition pruning to be
- * done once during executor startup. Expressions that do involve such Params
- * require us to prune separately for each scan of the parent plan node.
+ * done once during executor startup or during ExecutorDoInitialPruning() that
+ * runs as part of performing AcquireExecutorLocks() on a given plan tree.
+ * Expressions that do involve such Params require us to prune separately for
+ * each scan of the parent plan node.
*
* Note that pruning away unneeded subplans during executor startup has the
* added benefit of not having to initialize the unneeded subplans at all.
@@ -1611,6 +1619,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* account for initial pruning possibly having eliminated some of the
* subplans.
*
+ * ExecPartitionDoInitialPruning:
+ * Do initial pruning with the information contained in a given
+ * PartitionPruneInfo to determine the minimal set of child subplans
+ * that must be executed by the parent plan node to which the
+ * PartitionPruneInfo belongs, and also the set of RT indexes of the
+ * leaf partitions that will be scanned by those subplans.
+ *
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating the expressions
* that are safe to evaluate at a given point. This function is first
@@ -1628,8 +1643,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* On return, *initially_valid_subplans is assigned the set of indexes of
* child subplans that must be initialized along with the parent plan node.
- * Initial pruning is performed here if needed and in that case only the
- * surviving subplans' indexes are added.
+ * Initial pruning is performed here if needed (unless it has already been done
+ * by ExecutorDoInitialPruning()), and in that case only the surviving
+ * subplans' indexes are added.
*
* If subplans are indeed pruned, subplan_map arrays contained in the returned
* PartitionPruneState are re-sequenced to not count those, though only if the
@@ -1645,24 +1661,59 @@ ExecInitPartitionPruning(PlanState *planstate,
EState *estate = planstate->state;
PartitionPruneInfo *pruneinfo = list_nth(estate->es_part_prune_infos,
part_prune_index);
+ PartitionPruneResult *pruneresult = estate->es_part_prune_result;
+ bool do_pruning = (pruneinfo->needs_init_pruning ||
+ pruneinfo->needs_exec_pruning);
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, planstate);
+ /*
+ * No need to do initial pruning if it was done already by
+ * ExecutorDoInitialPruning(), which it would be if es_part_prune_result
+ * has been set.
+ */
+ if (pruneresult)
+ do_pruning = pruneinfo->needs_exec_pruning;
- /* Create the working data structure for pruning */
- prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ prunestate = NULL;
+ if (do_pruning)
+ {
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
+
+ /* For data reading, executor always omits detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
+ /*
+ * Create the working data structure for pruning. No need to consider
+ * initial pruning steps if we have a PartitionPruneResult.
+ */
+ prunestate = CreatePartitionPruneState(planstate, pruneinfo,
+ pruneresult == NULL, true,
+ NIL, planstate->ps_ExprContext,
+ estate->es_partition_directory);
+ }
/*
* Perform an initial partition prune pass, if required.
*/
- if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ if (pruneresult)
+ {
+ *initially_valid_subplans =
+ list_nth(pruneresult->valid_subplan_offs_list, part_prune_index);
+ }
+ else if (prunestate && prunestate->do_initial_prune)
+ {
+ *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true,
+ NULL);
+ }
else
{
/* No pruning, so we'll need to initialize all subplans */
Assert(n_total_subplans > 0);
*initially_valid_subplans = bms_add_range(NULL, 0,
n_total_subplans - 1);
+ return prunestate;
}
/*
@@ -1670,7 +1721,8 @@ ExecInitPartitionPruning(PlanState *planstate,
* that were removed above due to initial pruning. No need to do this if
* no steps were removed.
*/
- if (bms_num_members(*initially_valid_subplans) < n_total_subplans)
+ if (prunestate &&
+ bms_num_members(*initially_valid_subplans) < n_total_subplans)
{
/*
* We can safely skip this when !do_exec_prune, even though that
@@ -1686,11 +1738,73 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecPartitionDoInitialPruning
+ * Perform initial pruning using the given PartitionPruneInfo to determine
+ * the minimal set of child subplans that will be executed and also the
+ * set of RT indexes of the leaf partitions scanned by those subplans.
+ */
+Bitmapset *
+ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *rtable = plannedstmt->rtable;
+ ExprContext *econtext;
+ PartitionDirectory pdir;
+ MemoryContext oldcontext,
+ tmpcontext;
+ PartitionPruneState *prunestate;
+ Bitmapset *valid_subplan_offs;
+
+ /*
+ * A temporary context for memory allocations required while executing
+ * partition pruning steps.
+ */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "initial pruning working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
+
+ /*
+ * PartitionDirectory to look up partition descriptors, which omits
+ * detached partitions, just like in the executor proper.
+ */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+
+ /*
+ * We don't yet have a PlanState for the parent plan node, so we must
+ * create a standalone ExprContext to evaluate pruning expressions,
+ * equipped with the information about the EXTERN parameters that the
+ * caller passed us. Note that that's okay because the initial pruning
+ * steps do not contain anything that requires the execution to have
+ * started.
+ */
+ econtext = CreateStandaloneExprContext();
+ econtext->ecxt_param_list_info = params;
+ prunestate = CreatePartitionPruneState(NULL, pruneinfo, true, false,
+ rtable, econtext, pdir);
+ MemoryContextSwitchTo(oldcontext);
+
+ /* Do the initial pruning. */
+ valid_subplan_offs = ExecFindMatchingSubPlans(prunestate, true,
+ scan_leafpart_rtis);
+
+ FreeExprContext(econtext, true);
+ DestroyPartitionDirectory(pdir);
+ MemoryContextDelete(tmpcontext);
+
+ return valid_subplan_offs;
+}
+
/*
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
*
- * 'planstate' is the parent plan node's execution state.
+ * 'planstate', if not NULL, is the parent plan node's execution state. It
+ * can be NULL if being called before ExecutorStart(), in which case,
+ * 'rtable' (range table), 'econtext', and 'partdir' must be explicitly
+ * provided.
*
* 'pruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
@@ -1704,19 +1818,21 @@ ExecInitPartitionPruning(PlanState *planstate,
* PartitionedRelPruneInfo.
*/
static PartitionPruneState *
-CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
+CreatePartitionPruneState(PlanState *planstate,
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir)
{
- EState *estate = planstate->state;
+ EState *estate = planstate ? planstate->state : NULL;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
- /* For data reading, executor always omits detached partitions */
- if (estate->es_partition_directory == NULL)
- estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
+ Assert((estate != NULL) ||
+ (partdir != NULL && econtext != NULL && rtable != NIL));
n_part_hierarchies = list_length(pruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
@@ -1771,15 +1887,42 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * Must open the relation by ourselves when called before the
+ * execution has started, such as, when called during
+ * ExecutorDoInitialPruning() on a cached plan. In that case,
+ * sub-partitions must be locked, because AcquirePlannerLocks()
+ * would not have seen them. (1st relation in a partrelpruneinfos
+ * list is always the root partitioned table appearing in the
+ * query, which AcquirePlannerLocks() would have locked; the
+ * Assert in relation_open() guards that assumption.)
+ */
+ if (estate == NULL)
+ {
+ RangeTblEntry *rte = rt_fetch(pinfo->rtindex, rtable);
+ int lockmode = (j == 0) ? NoLock : rte->rellockmode;
+
+ partrel = table_open(rte->relid, lockmode);
+ }
+ else
+ partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+
+ /*
+ * We can rely on the copy of the partitioned table's partition
+ * key in its relcache entry, because it can't change (or
+ * get destroyed) as long as the relation is locked. Partition
+ * descriptor is taken from the PartitionDirectory associated with
+ * the table that is held open long enough for the descriptor to
+ * remain valid while it's used to perform the pruning steps.
*/
- partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
partkey = RelationGetPartitionKey(partrel);
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partdesc = PartitionDirectoryLookup(partdir, partrel);
+
+ /*
+ * Must close partrel, keeping the lock taken, if we're not using
+ * EState's entry.
+ */
+ if (estate == NULL)
+ table_close(partrel, NoLock);
/*
* Initialize the subplan_map and subpart_map.
@@ -1793,6 +1936,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
Assert(partdesc->nparts >= pinfo->nparts);
pprune->nparts = partdesc->nparts;
pprune->subplan_map = palloc(sizeof(int) * partdesc->nparts);
+ pprune->rti_map = palloc(sizeof(Index) * partdesc->nparts);
if (partdesc->nparts == pinfo->nparts)
{
/*
@@ -1803,6 +1947,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pprune->subpart_map = pinfo->subpart_map;
memcpy(pprune->subplan_map, pinfo->subplan_map,
sizeof(int) * pinfo->nparts);
+ memcpy(pprune->rti_map, pinfo->rti_map,
+ sizeof(int) * pinfo->nparts);
/*
* Double-check that the list of unpruned relations has not
@@ -1853,6 +1999,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pinfo->subplan_map[pd_idx];
pprune->subpart_map[pp_idx] =
pinfo->subpart_map[pd_idx];
+ pprune->rti_map[pp_idx] =
+ pinfo->rti_map[pd_idx];
pd_idx++;
}
else
@@ -1860,6 +2008,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
/* this partdesc entry is not in the plan */
pprune->subplan_map[pp_idx] = -1;
pprune->subpart_map[pp_idx] = -1;
+ pprune->rti_map[pp_idx] = 0;
}
}
@@ -1881,7 +2030,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* Initialize pruning contexts as needed.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
- if (pinfo->initial_pruning_steps)
+ if (consider_initial_steps && pinfo->initial_pruning_steps)
{
InitPartitionPruneContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
@@ -1891,7 +2040,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps)
+ if (consider_exec_steps && pinfo->exec_pruning_steps)
{
InitPartitionPruneContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
@@ -2119,10 +2268,14 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
* runtime pruning.
+ *
+ * RT indexes of leaf partitions scanned by the chosen subplans are added to
+ * *scan_leafpart_rtis if the pointer is non-NULL.
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2157,7 +2310,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
*/
pprune = &prunedata->partrelprunedata[0];
find_matching_subplans_recurse(prunedata, pprune, initial_prune,
- &result);
+ &result, scan_leafpart_rtis);
/* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
@@ -2171,6 +2324,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_copy(*scan_leafpart_rtis);
MemoryContextReset(prunestate->prune_context);
@@ -2181,13 +2336,15 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* find_matching_subplans_recurse
* Recursive worker function for ExecFindMatchingSubPlans
*
- * Adds valid (non-prunable) subplan IDs to *validsubplans
+ * Adds valid (non-prunable) subplan IDs to *validsubplans and RT indexes of
+ * the corresponding leaf partitions to *scan_leafpart_rtis (if asked for).
*/
static void
find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans)
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *partset;
int i;
@@ -2214,8 +2371,14 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
while ((i = bms_next_member(partset, i)) >= 0)
{
if (pprune->subplan_map[i] >= 0)
+ {
*validsubplans = bms_add_member(*validsubplans,
pprune->subplan_map[i]);
+ Assert(pprune->rti_map[i] > 0);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_add_member(*scan_leafpart_rtis,
+ pprune->rti_map[i]);
+ }
else
{
int partidx = pprune->subpart_map[i];
@@ -2223,7 +2386,8 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
if (partidx >= 0)
find_matching_subplans_recurse(prunedata,
&prunedata->partrelprunedata[partidx],
- initial_prune, validsubplans);
+ initial_prune, validsubplans,
+ scan_leafpart_rtis);
else
{
/*
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index f9460ae506..a2182a6b1f 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -844,7 +844,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
- es->qd = CreateQueryDesc(es->stmt,
+ es->qd = CreateQueryDesc(es->stmt, NULL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index c6f86a6510..96880e122a 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -155,7 +155,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (appendstate->as_prune_state == NULL ||
+ (!appendstate->as_prune_state->do_exec_prune && nplans > 0))
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -577,7 +578,7 @@ choose_next_subplan_locally(AppendState *node)
}
else if (node->as_valid_subplans == NULL)
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
whichplan = -1;
}
@@ -642,7 +643,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
/*
* Mark each invalid plan as finished to allow the loop below to
@@ -717,7 +718,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
mark_invalid_subplans_as_finished(node);
}
@@ -868,7 +869,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
classify_matching_subplans(node);
}
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 8d35860c30..2312e5a633 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -103,7 +103,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplan, we can fill ms_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (mergestate->ms_prune_state == NULL ||
+ (!mergestate->ms_prune_state->do_exec_prune && nplans > 0))
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -218,7 +219,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 29bc26669b..303a572c02 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1578,6 +1578,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
CachedPlanSource *plansource;
CachedPlan *cplan;
List *stmt_list;
+ List *part_prune_result_list;
char *query_string;
Snapshot snapshot;
MemoryContext oldcontext;
@@ -1657,7 +1658,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1685,6 +1689,9 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
stmt_list,
cplan);
+ /* Copy PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_result_list);
+
/*
* Set up options for portal. Default SCROLL type is chosen the same way
* as PerformCursorOpen does it.
@@ -2092,7 +2099,8 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
/* Get the generic plan for the query */
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ NULL /* Not interested in PartitionPruneResults */);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2473,7 +2481,9 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
- ListCell *lc2;
+ List *part_prune_result_list;
+ ListCell *lc2,
+ *lc3;
spicallbackarg.query = plansource->query_string;
@@ -2549,8 +2559,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* plan, the refcount must be backed by the plan_owner.
*/
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
-
+ plan_owner, _SPI_current->queryEnv,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
stmt_list = cplan->stmt_list;
/*
@@ -2589,9 +2601,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, part_prune_result_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ PartitionPruneResult *part_prune_result = lfirst_node(PartitionPruneResult, lc3);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2663,7 +2676,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
else
snap = InvalidSnapshot;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, part_prune_result,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index b02b4a641c..332d58381b 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -97,7 +97,9 @@ _copyPlannedStmt(const PlannedStmt *from)
COPY_SCALAR_FIELD(jitFlags);
COPY_NODE_FIELD(planTree);
COPY_NODE_FIELD(partPruneInfos);
+ COPY_SCALAR_FIELD(containsInitialPruning);
COPY_NODE_FIELD(rtable);
+ COPY_BITMAPSET_FIELD(minLockRelids);
COPY_NODE_FIELD(resultRelations);
COPY_NODE_FIELD(appendRelations);
COPY_NODE_FIELD(subplans);
@@ -1284,6 +1286,8 @@ _copyPartitionPruneInfo(const PartitionPruneInfo *from)
PartitionPruneInfo *newnode = makeNode(PartitionPruneInfo);
COPY_NODE_FIELD(prune_infos);
+ COPY_SCALAR_FIELD(needs_init_pruning);
+ COPY_SCALAR_FIELD(needs_exec_pruning);
COPY_BITMAPSET_FIELD(other_subplans);
return newnode;
@@ -1300,6 +1304,7 @@ _copyPartitionedRelPruneInfo(const PartitionedRelPruneInfo *from)
COPY_POINTER_FIELD(subplan_map, from->nparts * sizeof(int));
COPY_POINTER_FIELD(subpart_map, from->nparts * sizeof(int));
COPY_POINTER_FIELD(relid_map, from->nparts * sizeof(Oid));
+ COPY_POINTER_FIELD(rti_map, from->nparts * sizeof(Index));
COPY_NODE_FIELD(initial_pruning_steps);
COPY_NODE_FIELD(exec_pruning_steps);
COPY_BITMAPSET_FIELD(execparamids);
@@ -5476,6 +5481,21 @@ _copyExtensibleNode(const ExtensibleNode *from)
return newnode;
}
+/* ****************************************************************
+ * execnodes.h copy functions
+ * ****************************************************************
+ */
+static PartitionPruneResult *
+_copyPartitionPruneResult(const PartitionPruneResult *from)
+{
+ PartitionPruneResult *newnode = makeNode(PartitionPruneResult);
+
+ COPY_NODE_FIELD(valid_subplan_offs_list);
+ COPY_BITMAPSET_FIELD(scan_leafpart_rtis);
+
+ return newnode;
+}
+
/* ****************************************************************
* value.h copy functions
* ****************************************************************
@@ -6572,6 +6592,13 @@ copyObjectImpl(const void *from)
retval = _copyPublicationTable(from);
break;
+ /*
+ * EXECUTION NODES
+ */
+ case T_PartitionPruneResult:
+ retval = _copyPartitionPruneResult(from);
+ break;
+
/*
* MISCELLANEOUS NODES
*/
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 7618444b4d..7346820eee 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -326,7 +326,9 @@ _outPlannedStmt(StringInfo str, const PlannedStmt *node)
WRITE_INT_FIELD(jitFlags);
WRITE_NODE_FIELD(planTree);
WRITE_NODE_FIELD(partPruneInfos);
+ WRITE_BOOL_FIELD(containsInitialPruning);
WRITE_NODE_FIELD(rtable);
+ WRITE_BITMAPSET_FIELD(minLockRelids);
WRITE_NODE_FIELD(resultRelations);
WRITE_NODE_FIELD(appendRelations);
WRITE_NODE_FIELD(subplans);
@@ -1021,6 +1023,8 @@ _outPartitionPruneInfo(StringInfo str, const PartitionPruneInfo *node)
WRITE_NODE_TYPE("PARTITIONPRUNEINFO");
WRITE_NODE_FIELD(prune_infos);
+ WRITE_BOOL_FIELD(needs_init_pruning);
+ WRITE_BOOL_FIELD(needs_exec_pruning);
WRITE_BITMAPSET_FIELD(other_subplans);
}
@@ -1035,6 +1039,7 @@ _outPartitionedRelPruneInfo(StringInfo str, const PartitionedRelPruneInfo *node)
WRITE_INT_ARRAY(subplan_map, node->nparts);
WRITE_INT_ARRAY(subpart_map, node->nparts);
WRITE_OID_ARRAY(relid_map, node->nparts);
+ WRITE_INDEX_ARRAY(rti_map, node->nparts);
WRITE_NODE_FIELD(initial_pruning_steps);
WRITE_NODE_FIELD(exec_pruning_steps);
WRITE_BITMAPSET_FIELD(execparamids);
@@ -2440,6 +2445,8 @@ _outPlannerGlobal(StringInfo str, const PlannerGlobal *node)
WRITE_NODE_FIELD(resultRelations);
WRITE_NODE_FIELD(appendRelations);
WRITE_NODE_FIELD(partPruneInfos);
+ WRITE_BOOL_FIELD(containsInitialPruning);
+ WRITE_BITMAPSET_FIELD(minLockRelids);
WRITE_NODE_FIELD(relationOids);
WRITE_NODE_FIELD(invalItems);
WRITE_NODE_FIELD(paramExecTypes);
@@ -2861,6 +2868,21 @@ _outExtensibleNode(StringInfo str, const ExtensibleNode *node)
methods->nodeOut(str, node);
}
+/*****************************************************************************
+ *
+ * Stuff from execnodes.h
+ *
+ *****************************************************************************/
+
+static void
+_outPartitionPruneResult(StringInfo str, const PartitionPruneResult *node)
+{
+ WRITE_NODE_TYPE("PARTITIONPRUNERESULT");
+
+ WRITE_NODE_FIELD(valid_subplan_offs_list);
+ WRITE_BITMAPSET_FIELD(scan_leafpart_rtis);
+}
+
/*****************************************************************************
*
* Stuff from parsenodes.h.
@@ -4770,6 +4792,13 @@ outNode(StringInfo str, const void *obj)
_outJsonTableSibling(str, obj);
break;
+ /*
+ * EXECUTION NODES
+ */
+ case T_PartitionPruneResult:
+ _outPartitionPruneResult(str, obj);
+ break;
+
default:
/*
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index bf602ff93e..c1d131aa99 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -164,6 +164,11 @@
token = pg_strtok(&length); /* skip :fldname */ \
local_node->fldname = readIntCols(len)
+/* Read an Index array */
+#define READ_INDEX_ARRAY(fldname, len) \
+ token = pg_strtok(&length); /* skip :fldname */ \
+ local_node->fldname = readIndexCols(len)
+
/* Read a bool array */
#define READ_BOOL_ARRAY(fldname, len) \
token = pg_strtok(&length); /* skip :fldname */ \
@@ -1818,7 +1823,9 @@ _readPlannedStmt(void)
READ_INT_FIELD(jitFlags);
READ_NODE_FIELD(planTree);
READ_NODE_FIELD(partPruneInfos);
+ READ_BOOL_FIELD(containsInitialPruning);
READ_NODE_FIELD(rtable);
+ READ_BITMAPSET_FIELD(minLockRelids);
READ_NODE_FIELD(resultRelations);
READ_NODE_FIELD(appendRelations);
READ_NODE_FIELD(subplans);
@@ -2770,6 +2777,8 @@ _readPartitionPruneInfo(void)
READ_LOCALS(PartitionPruneInfo);
READ_NODE_FIELD(prune_infos);
+ READ_BOOL_FIELD(needs_init_pruning);
+ READ_BOOL_FIELD(needs_exec_pruning);
READ_BITMAPSET_FIELD(other_subplans);
READ_DONE();
@@ -2786,6 +2795,7 @@ _readPartitionedRelPruneInfo(void)
READ_INT_ARRAY(subplan_map, local_node->nparts);
READ_INT_ARRAY(subpart_map, local_node->nparts);
READ_OID_ARRAY(relid_map, local_node->nparts);
+ READ_INDEX_ARRAY(rti_map, local_node->nparts);
READ_NODE_FIELD(initial_pruning_steps);
READ_NODE_FIELD(exec_pruning_steps);
READ_BITMAPSET_FIELD(execparamids);
@@ -2939,6 +2949,21 @@ _readPartitionRangeDatum(void)
READ_DONE();
}
+
+/*
+ * _readPartitionPruneResult
+ */
+static PartitionPruneResult *
+_readPartitionPruneResult(void)
+{
+ READ_LOCALS(PartitionPruneResult);
+
+ READ_NODE_FIELD(valid_subplan_offs_list);
+ READ_BITMAPSET_FIELD(scan_leafpart_rtis);
+
+ READ_DONE();
+}
+
/*
* parseNodeString
*
@@ -3236,6 +3261,8 @@ parseNodeString(void)
return_value = _readJsonTableParent();
else if (MATCH("JSONTABLESIBLING", 16))
return_value = _readJsonTableSibling();
+ else if (MATCH("PARTITIONPRUNERESULT", 20))
+ return_value = _readPartitionPruneResult();
else
{
elog(ERROR, "badly formatted node string \"%.32s\"...", token);
@@ -3379,6 +3406,30 @@ readIntCols(int numCols)
return int_vals;
}
+/*
+ * readIndexCols
+ */
+Index *
+readIndexCols(int numCols)
+{
+ int tokenLength,
+ i;
+ const char *token;
+ Index *index_vals;
+
+ if (numCols <= 0)
+ return NULL;
+
+ index_vals = (Index *) palloc(numCols * sizeof(Index));
+ for (i = 0; i < numCols; i++)
+ {
+ token = pg_strtok(&tokenLength);
+ index_vals[i] = atoui(token);
+ }
+
+ return index_vals;
+}
+
/*
* readBoolCols
*/
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index b11249ed8f..7141035cc4 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -519,7 +519,9 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
result->partPruneInfos = glob->partPruneInfos;
+ result->containsInitialPruning = glob->containsInitialPruning;
result->rtable = glob->finalrtable;
+ result->minLockRelids = glob->minLockRelids;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index b8d5610593..da749e331e 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -270,6 +270,16 @@ set_plan_references(PlannerInfo *root, Plan *plan)
*/
add_rtes_to_flat_rtable(root, false);
+ /*
+ * Add the query's adjusted range of RT indexes to glob->minLockRelids.
+ * The adjusted RT indexes of prunable relations will be deleted from the
+ * set below where PartitionPruneInfos are processed.
+ */
+ glob->minLockRelids =
+ bms_add_range(glob->minLockRelids,
+ rtoffset + 1,
+ rtoffset + list_length(root->parse->rtable));
+
/*
* Adjust RT indexes of PlanRowMarks and add to final rowmarks list
*/
@@ -352,6 +362,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach (lc, root->partPruneInfos)
{
PartitionPruneInfo *pruneinfo = lfirst(lc);
+ Bitmapset *leafpart_rtis = NULL;
ListCell *l;
foreach(l, pruneinfo->prune_infos)
@@ -362,15 +373,50 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach(l2, prune_infos)
{
PartitionedRelPruneInfo *pinfo = lfirst(l2);
+ int i;
/* RT index of the table to which the pinfo belongs. */
pinfo->rtindex += rtoffset;
+
+ /* Also adjust the RT indexes of leaf partitions that might be scanned. */
+ for (i = 0; i < pinfo->nparts; i++)
+ {
+ if (pinfo->rti_map[i] > 0 && pinfo->subplan_map[i] >= 0)
+ {
+ pinfo->rti_map[i] += rtoffset;
+ leafpart_rtis = bms_add_member(leafpart_rtis,
+ pinfo->rti_map[i]);
+ }
+ }
}
}
+ if (pruneinfo->needs_init_pruning)
+ {
+ glob->containsInitialPruning = true;
+
+ /*
+ * Delete the leaf partition RTIs from the global set of relations
+ * to be locked before executing the plan. AcquireExecutorLocks()
+ * will find the ones to add to the set after performing initial
+ * pruning.
+ */
+ glob->minLockRelids = bms_del_members(glob->minLockRelids,
+ leafpart_rtis);
+ }
+
glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
}
+ /*
+ * If we deleted any bits from glob->minLockRelids above, bms_copy() it to
+ * trim off the now-empty trailing words, so that the loop over this set
+ * in AcquireExecutorLocks() need not wade through useless bit words.
+ */
+ if (glob->containsInitialPruning)
+ glob->minLockRelids = bms_copy(glob->minLockRelids);
+
return result;
}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index d77f7d3aef..952c5b8327 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,9 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -234,6 +236,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool needs_init_pruning = false;
+ bool needs_exec_pruning = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -313,12 +317,16 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_needs_init_pruning;
+ bool partrel_needs_exec_pruning;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_needs_init_pruning,
+ &partrel_needs_exec_pruning);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -327,6 +335,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+
+ needs_init_pruning |= partrel_needs_init_pruning;
+ needs_exec_pruning |= partrel_needs_exec_pruning;
}
pfree(relid_subplan_map);
@@ -341,6 +352,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->needs_init_pruning = needs_init_pruning;
+ pruneinfo->needs_exec_pruning = needs_exec_pruning;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -441,13 +454,18 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *needs_init_pruning and *needs_exec_pruning are set to indicate whether
+ * the returned PartitionedRelPruneInfos contain pruning steps that can be
+ * performed before and after execution begins, respectively.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -458,6 +476,10 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will find out below. */
+ *needs_init_pruning = false;
+ *needs_exec_pruning = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -545,6 +567,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+ * In the first pass, we also determine whether the second pass will be
+ * necessary, by checking for the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -619,6 +644,12 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ /* record which types of pruning steps we've seen so far */
+ if (initial_pruning_steps != NIL)
+ *needs_init_pruning = true;
+ if (exec_pruning_steps != NIL)
+ *needs_exec_pruning = true;
+
pinfolist = lappend(pinfolist, pinfo);
}
@@ -646,6 +677,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *subplan_map;
int *subpart_map;
Oid *relid_map;
+ Index *rti_map;
/*
* Construct the subplan and subpart maps for this partitioning level.
@@ -658,6 +690,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
+ rti_map = (Index *) palloc0(nparts * sizeof(Index));
present_parts = NULL;
i = -1;
@@ -672,6 +705,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
+ rti_map[i] = partrel->relid;
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
@@ -696,6 +730,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
pinfo->relid_map = relid_map;
+ pinfo->rti_map = rti_map;
}
pfree(relid_subpart_map);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 5ab91c2c58..5ae967608d 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1603,6 +1603,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ List *part_prune_result_list;
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -1978,7 +1979,9 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
/*
* Now we can define the portal.
@@ -1993,6 +1996,9 @@ exec_bind_message(StringInfo input_message)
cplan->stmt_list,
cplan);
+ /* Copy PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_result_list);
+
/* Done with the snapshot used for parameter I/O and parsing/planning */
if (snapshot_set)
PopActiveSnapshot();
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5aa5a350f3..8cc2e2162d 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -35,7 +35,7 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
+static void ProcessQuery(PlannedStmt *plan, PartitionPruneResult *part_prune_result,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +65,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +78,8 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->part_prune_result = part_prune_result; /* ExecutorDoInitialPruning()
+ * output for plan */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +125,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * part_prune_result: ExecutorDoInitialPruning() output for the plan tree
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +138,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +150,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, part_prune_result, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -491,8 +496,13 @@ PortalStart(Portal portal, ParamListInfo params,
/*
* Create QueryDesc in portal's context; for the moment, set
* the destination to DestNone.
+ *
+ * There is no PartitionPruneResult unless the PlannedStmt is
+ * from a CachedPlan.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->part_prune_results == NIL ? NULL :
+ linitial(portal->part_prune_results),
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1225,6 +1235,8 @@ PortalRunMulti(Portal portal,
if (pstmt->utilityStmt == NULL)
{
+ PartitionPruneResult *part_prune_result = NULL;
+
/*
* process a plannable query.
*/
@@ -1271,10 +1283,18 @@ PortalRunMulti(Portal portal,
else
UpdateActiveSnapshotCommandId();
+ /*
+ * Determine if there's a corresponding PartitionPruneResult for
+ * this PlannedStmt.
+ */
+ if (portal->part_prune_results != NIL)
+ part_prune_result = list_nth(portal->part_prune_results,
+ foreach_current_index(stmtlist_item));
+
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_result,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1283,7 +1303,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_result,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 0d6a295674..8c164741f7 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -99,14 +99,19 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_result_list);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv);
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_result_list);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_result_list,
+ List **lockedRelids_per_stmt);
+static void ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -790,15 +795,20 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* On a "true" return, we have acquired the locks needed to run the plan.
* (We must do this for the "true" result to be race-condition-free.)
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_result_list.
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_result_list)
{
CachedPlan *plan = plansource->gplan;
/* Assert that caller checked the querytree */
Assert(plansource->is_valid);
+ *part_prune_result_list = NIL;
+
/* If there's no generic plan, just say "false" */
if (!plan)
return false;
@@ -820,13 +830,21 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ List *lockedRelids_per_stmt;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ /*
+ * Lock relations scanned by the plan. This is where the pruning
+ * happens if needed.
+ */
+ AcquireExecutorLocks(plan->stmt_list, boundParams,
+ part_prune_result_list,
+ &lockedRelids_per_stmt);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -848,7 +866,14 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ ReleaseExecutorLocks(plan->stmt_list, lockedRelids_per_stmt);
+
+ /*
+ * The output list and any objects therein were allocated in the caller's
+ * hopefully short-lived context, so they will not remain leaked for long;
+ * still, reset the pointer to keep the stale list from being looked at.
+ */
+ *part_prune_result_list = NIL;
}
/*
@@ -874,10 +899,15 @@ CheckCachedPlan(CachedPlanSource *plansource)
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
* (unless this is a one-shot plan, in which case we don't copy the plan).
+ *
+ * A list of NULLs is returned in *part_prune_result_list, meaning that no
+ * PartitionPruneResult nodes have yet been created for the plans in
+ * stmt_list.
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv)
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_result_list)
{
CachedPlan *plan;
List *plist;
@@ -1007,6 +1037,17 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
MemoryContextSwitchTo(oldcxt);
+ /*
+ * No actual PartitionPruneResults to add yet, though we must initialize
+ * the list to have the same number of elements as the list of
+ * PlannedStmts.
+ */
+ *part_prune_result_list = NIL;
+ foreach(lc, plist)
+ {
+ *part_prune_result_list = lappend(*part_prune_result_list, NULL);
+ }
+
return plan;
}
@@ -1126,6 +1167,17 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
+ * For every PlannedStmt found in the returned CachedPlan, an element that
+ * is either a PartitionPruneResult or NULL is added to
+ * *part_prune_result_list.  It is a PartitionPruneResult if the PlannedStmt
+ * comes from an existing, otherwise-valid CachedPlan and contains at least
+ * one PartitionPruneInfo with "initial" pruning steps.  Those steps are
+ * performed by calling ExecutorDoInitialPruning(), which prunes away the
+ * subplans that do not match the pruning conditions, so that
+ * AcquireExecutorLocks() need lock only the leaf partitions scanned by the
+ * surviving subplans.  The PartitionPruneResult contains a list of
+ * bitmapsets of the indexes of the matching subplans, one for each
+ * PartitionPruneInfo.
+ *
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
@@ -1139,11 +1191,13 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ List **part_prune_result_list)
{
CachedPlan *plan = NULL;
List *qlist;
bool customplan;
+ List *my_part_prune_result_list;
/* Assert caller is doing things in a sane order */
Assert(plansource->magic == CACHEDPLANSOURCE_MAGIC);
@@ -1160,7 +1214,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (CheckCachedPlan(plansource, boundParams,
+ &my_part_prune_result_list))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1169,7 +1224,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
else
{
/* Build a new generic plan */
- plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv,
+ &my_part_prune_result_list);
/* Just make real sure plansource->gplan is clear */
ReleaseGenericPlan(plansource);
/* Link the new generic plan into the plansource */
@@ -1214,7 +1270,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (customplan)
{
/* Build a custom plan */
- plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv,
+ &my_part_prune_result_list);
/* Accumulate total costs of custom plans */
plansource->total_custom_cost += cached_plan_cost(plan, true);
@@ -1246,6 +1303,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
plan->is_saved = true;
}
+ if (part_prune_result_list)
+ *part_prune_result_list = my_part_prune_result_list;
+
return plan;
}
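(For illustration only, not part of the patch: a minimal sketch of how a
plancache.c client would consume the new out-parameter, modeled on the
_SPI_execute_plan() changes above. plansource, params, queryEnv,
query_string, and dest are assumed to be in scope; error handling and
portal bookkeeping are omitted.)

    List       *part_prune_result_list;
    CachedPlan *cplan;
    ListCell   *lc1,
               *lc2;

    /* Acquires locks; may perform initial pruning under the hood. */
    cplan = GetCachedPlan(plansource, params, NULL, queryEnv,
                          &part_prune_result_list);

    /* One (possibly NULL) PartitionPruneResult per PlannedStmt. */
    forboth(lc1, cplan->stmt_list, lc2, part_prune_result_list)
    {
        PlannedStmt *stmt = lfirst_node(PlannedStmt, lc1);
        PartitionPruneResult *ppr = lfirst(lc2);    /* NULL if none */
        QueryDesc  *qdesc;

        qdesc = CreateQueryDesc(stmt, ppr, query_string,
                                GetActiveSnapshot(), InvalidSnapshot,
                                dest, params, queryEnv, 0);
        /* ... ExecutorStart(), ExecutorRun(), etc., as usual ... */
    }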
@@ -1737,17 +1797,29 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_result_list.
+ *
+ * On return, *lockedRelids_per_stmt will contain a bitmapset for every
+ * PlannedStmt in stmt_list, containing the RT indexes of relation entries
+ * in its range table that were actually locked, or NULL if the PlannedStmt
+ * contains a utility statement.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_result_list,
+ List **lockedRelids_per_stmt)
{
ListCell *lc1;
+ *part_prune_result_list = *lockedRelids_per_stmt = NIL;
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ PartitionPruneResult *part_prune_result = NULL;
+ Bitmapset *allLockRelids;
+ Bitmapset *lockedRelids = NULL;
+ int rti;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1761,13 +1833,35 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
if (query)
- ScanQueryForLocks(query, acquire);
+ ScanQueryForLocks(query, true);
+ *part_prune_result_list = lappend(*part_prune_result_list, NULL);
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ /*
+ * Figure out the set of relations that would need to be locked
+ * before executing the plan.
+ */
+ if (plannedstmt->containsInitialPruning)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ /*
+ * Obtain the set of partitions to be locked from the
+ * PartitionPruneInfos by considering the result of performing
+ * initial partition pruning.
+ */
+ part_prune_result = ExecutorDoInitialPruning(plannedstmt,
+ boundParams);
+
+ allLockRelids = bms_union(plannedstmt->minLockRelids,
+ part_prune_result->scan_leafpart_rtis);
+ }
+ else
+ allLockRelids = plannedstmt->minLockRelids;
+
+ rti = -1;
+ while ((rti = bms_next_member(allLockRelids, rti)) > 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
if (rte->rtekind != RTE_RELATION)
continue;
@@ -1778,10 +1872,58 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
* fail if it's been dropped entirely --- we'll just transiently
* acquire a non-conflicting lock.
*/
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
+ LockRelationOid(rte->relid, rte->rellockmode);
+ lockedRelids = bms_add_member(lockedRelids, rti);
+ }
+
+ *part_prune_result_list = lappend(*part_prune_result_list,
+ part_prune_result);
+ *lockedRelids_per_stmt = lappend(*lockedRelids_per_stmt, lockedRelids);
+ }
+}
+
+/*
+ * ReleaseExecutorLocks
+ * Release locks that would've been acquired by an earlier call to
+ * AcquireExecutorLocks()
+ */
+static void
+ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt)
+{
+ ListCell *lc1,
+ *lc2;
+
+ forboth(lc1, stmt_list, lc2, lockedRelids_per_stmt)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ Bitmapset *lockedRelids = lfirst(lc2);
+ int rti;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /*
+ * Ignore utility statements, except those (such as EXPLAIN) that
+ * contain a parsed-but-not-planned query. Note: it's okay to use
+ * ScanQueryForLocks, even though the query hasn't been through
+ * rule rewriting, because rewriting doesn't change the query
+ * representation.
+ */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ if (query)
+ ScanQueryForLocks(query, false);
+ continue;
+ }
+
+ rti = -1;
+ while ((rti = bms_next_member(lockedRelids, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ Assert(rte->rtekind == RTE_RELATION);
+
+ /* See the comment in AcquireExecutorLocks(). */
+ UnlockRelationOid(rte->relid, rte->rellockmode);
}
}
}
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index d549f66d4a..1bbe6b704b 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -303,6 +303,25 @@ PortalDefineQuery(Portal portal,
portal->status = PORTAL_DEFINED;
}
+/*
+ * PortalStorePartitionPruneResults
+ * Copy the given list of PartitionPruneResults into the portal's
+ * context
+ *
+ * This allows the caller to ensure that the list exists as long as the portal
+ * does.
+ */
+void
+PortalStorePartitionPruneResults(Portal portal, List *part_prune_results)
+{
+ MemoryContext oldcxt;
+
+ AssertArg(PortalIsValid(portal));
+ oldcxt = MemoryContextSwitchTo(portal->portalContext);
+ portal->part_prune_results = copyObject(part_prune_results);
+ MemoryContextSwitchTo(oldcxt);
+}
+
/*
* PortalReleaseCachedPlan
* Release a portal's reference to its cached plan, if any.
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 9ebde089ae..e57e133f0e 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,9 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index bf962af7af..bd8776402e 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -45,6 +45,7 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
+ * rti_map Range table index by partition index, or 0.
* present_parts A Bitmapset of the partition indexes that we
* have subplans or subparts for.
* initial_pruning_steps List of PartitionPruneSteps used to
@@ -61,6 +62,7 @@ typedef struct PartitionedRelPruningData
int nparts;
int *subplan_map;
int *subpart_map;
+ Index *rti_map;
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
@@ -126,5 +128,10 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int part_prune_index,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis);
+extern Bitmapset *ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index e79e2c001f..60d5644908 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,8 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ PartitionPruneResult *part_prune_result; /* ExecutorDoInitialPruning()'s
+ * output for plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +59,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index d68a6b9d28..5c4a282be0 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -185,6 +185,8 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
+extern PartitionPruneResult *ExecutorDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params);
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 25e0bb976e..4d4bb3fc3c 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -986,6 +986,33 @@ typedef struct DomainConstraintState
*/
typedef TupleTableSlot *(*ExecProcNodeMtd) (struct PlanState *pstate);
+/*----------------
+ * PartitionPruneResult
+ *
+ * The result of an ExecutorDoInitialPruning() invocation on a given
+ * PlannedStmt.
+ *
+ * Contains a list of Bitmapsets of the indexes of the subplans remaining after
+ * performing initial pruning by calling ExecFindMatchingSubPlans() for every
+ * PartitionPruneInfo found in PlannedStmt.partPruneInfos. RT indexes of the
+ * leaf partitions scanned by those subplans across all PartitionPruneInfos
+ * are added into scan_leafpart_rtis.
+ *
+ * This is used by GetCachedPlan() to inform its callers of the pruning
+ * decisions made when performing AcquireExecutorLocks() on a given cached
+ * PlannedStmt, which the callers then pass on to the executor. The executor
+ * refers to this node when initializing the plan nodes which contain subplans
+ * that may have been pruned by ExecutorDoInitialPruning(), rather than
+ * redoing initial pruning.
+ */
+typedef struct PartitionPruneResult
+{
+ NodeTag type;
+
+ List *valid_subplan_offs_list;
+ Bitmapset *scan_leafpart_rtis;
+} PartitionPruneResult;
+
/* ----------------
* PlanState node
*
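(A rough sketch, not verbatim from the patch, of how an executor node such
as Append is expected to consume this node during its ExecInitNode() phase
instead of redoing initial pruning. Here part_prune_index is assumed to be
the node's index into PlannedStmt.partPruneInfos, and prunestate its
already-built PartitionPruneState.)

    PartitionPruneResult *ppr = estate->es_part_prune_result;
    Bitmapset  *validsubplans;

    if (ppr != NULL)
    {
        /* Reuse the set computed by ExecutorDoInitialPruning(). */
        validsubplans = (Bitmapset *)
            list_nth(ppr->valid_subplan_offs_list, part_prune_index);
    }
    else
    {
        /* Plan didn't come through plancache.c; prune here instead. */
        validsubplans = ExecFindMatchingSubPlans(prunestate, true, NULL);
    }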
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 7ce1fc4deb..c7f256028e 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -97,6 +97,9 @@ typedef enum NodeTag
T_PartitionPruneStepCombine,
T_PlanInvalItem,
+ /* TAGS FOR EXECUTOR PREP NODES (execnodes.h) */
+ T_PartitionPruneResult,
+
/*
* TAGS FOR PLAN STATE NODES (execnodes.h)
*
@@ -675,6 +678,7 @@ extern struct Bitmapset *readBitmapset(void);
extern uintptr_t readDatum(bool typbyval);
extern bool *readBoolCols(int numCols);
extern int *readIntCols(int numCols);
+extern Index *readIndexCols(int numCols);
extern Oid *readOidCols(int numCols);
extern int16 *readAttrNumberCols(int numCols);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index a0f3a46334..c2d91bb12f 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -110,6 +110,15 @@ typedef struct PlannerGlobal
List *partPruneInfos; /* List of PartitionPruneInfo contained in
* the plan */
+ bool containsInitialPruning; /* Do any of those PartitionPruneInfos
+ * have initial (pre-exec) pruning
+ * steps in them? */
+
+ Bitmapset *minLockRelids; /* Indexes of all range table entries minus
+ * indexes of range table entries of the leaf
+ * partitions scanned by prunable subplans;
+ * see AcquireExecutorLocks() */
+
List *relationOids; /* OIDs of relations the plan depends on */
List *invalItems; /* other dependencies, as PlanInvalItems */
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index c3f4a39657..869bf535bc 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -67,8 +67,17 @@ typedef struct PlannedStmt
List *partPruneInfos; /* List of PartitionPruneInfo contained in
* the plan */
+ bool containsInitialPruning; /* Do any of those PartitionPruneInfos
+ * have initial (pre-exec) pruning
+ * steps in them? */
+
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *minLockRelids; /* Indexes of all range table entries minus
+ * indexes of range table entries of the leaf
+ * partitions scanned by prunable subplans;
+ * see AcquireExecutorLocks() */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
List *resultRelations; /* integer list of RT indexes, or NIL */
@@ -1386,6 +1395,13 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * needs_init_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its initial_pruning_steps set?
+ *
+ * needs_exec_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its exec_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1394,6 +1410,8 @@ typedef struct PartitionPruneInfo
{
NodeTag type;
List *prune_infos;
+ bool needs_init_pruning;
+ bool needs_exec_pruning;
Bitmapset *other_subplans;
} PartitionPruneInfo;
@@ -1436,6 +1454,9 @@ typedef struct PartitionedRelPruneInfo
/* relation OID by partition index, or 0 */
Oid *relid_map;
+ /* Range table index by partition index, or 0. */
+ Index *rti_map;
+
/*
* initial_pruning_steps shows how to prune during executor startup (i.e.,
* without use of any PARAM_EXEC Params); it is NIL if no startup pruning
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 0499635f59..1c5bb5ece1 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -220,7 +220,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ List **part_prune_result_list);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aeddbdafe5..9f7727a837 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,7 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *part_prune_results; /* list of PartitionPruneResults */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -242,6 +243,8 @@ extern void PortalDefineQuery(Portal portal,
CommandTag commandTag,
List *stmts,
CachedPlan *cplan);
+extern void PortalStorePartitionPruneResults(Portal portal,
+ List *part_prune_result_list);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
--
2.35.3
Rebased over 964d01ae90c.
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v18-0002-Optimize-AcquireExecutorLocks-by-locking-only-un.patch
From 567059057ee35bcd8ca066f46d4c6b23641af090 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v18 2/2] Optimize AcquireExecutorLocks() by locking only
unpruned partitions
This commit teaches AcquireExecutorLocks() to perform initial
partition pruning, notionally eliminating those subnodes of a generic
cached plan that need not be initialized during the plan's actual
execution, and to skip locking the partitions scanned by those
subnodes.
The result of performing initial partition pruning this way, before the
actual execution has started, is made available to the executor via a
PartitionPruneResult, passed along with the PlannedStmt by those callers
of the executor that used plancache.c to get the plan.  It is NULL when
the plan was obtained by calling the planner directly, or when the plan
obtained from plancache.c is not a generic one.
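
A condensed sketch of the resulting locking logic (illustrative only,
simplified from the AcquireExecutorLocks() changes in this patch):

    PartitionPruneResult *part_prune_result = NULL;
    Bitmapset  *lockrelids;
    int         rti;

    if (plannedstmt->containsInitialPruning)
    {
        /* Prune first, then lock only the surviving leaf partitions. */
        part_prune_result = ExecutorDoInitialPruning(plannedstmt, boundParams);
        lockrelids = bms_union(plannedstmt->minLockRelids,
                               part_prune_result->scan_leafpart_rtis);
    }
    else
        lockrelids = plannedstmt->minLockRelids;

    rti = -1;
    while ((rti = bms_next_member(lockrelids, rti)) > 0)
    {
        RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);

        if (rte->rtekind == RTE_RELATION)
            LockRelationOid(rte->relid, rte->rellockmode);
    }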
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 2 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/prepare.c | 26 ++-
src/backend/executor/README | 32 ++++
src/backend/executor/execMain.c | 53 ++++++
src/backend/executor/execParallel.c | 27 ++-
src/backend/executor/execPartition.c | 234 +++++++++++++++++++++----
src/backend/executor/functions.c | 2 +-
src/backend/executor/nodeAppend.c | 11 +-
src/backend/executor/nodeMergeAppend.c | 5 +-
src/backend/executor/spi.c | 27 ++-
src/backend/nodes/copyfuncs.c | 1 -
src/backend/nodes/outfuncs.c | 1 -
src/backend/nodes/readfuncs.c | 29 +++
src/backend/optimizer/plan/planner.c | 2 +
src/backend/optimizer/plan/setrefs.c | 46 +++++
src/backend/partitioning/partprune.c | 41 ++++-
src/backend/tcop/postgres.c | 8 +-
src/backend/tcop/pquery.c | 28 ++-
src/backend/utils/cache/plancache.c | 187 +++++++++++++++++---
src/backend/utils/mmgr/portalmem.c | 19 ++
src/include/commands/explain.h | 4 +-
src/include/executor/execPartition.h | 9 +-
src/include/executor/execdesc.h | 3 +
src/include/executor/executor.h | 2 +
src/include/nodes/execnodes.h | 27 +++
src/include/nodes/nodes.h | 1 +
src/include/nodes/pathnodes.h | 13 ++
src/include/nodes/plannodes.h | 21 +++
src/include/utils/plancache.h | 3 +-
src/include/utils/portal.h | 3 +
34 files changed, 782 insertions(+), 98 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index fca29a9a10..d839517693 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -541,7 +541,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 9abbb6b555..f6607f2454 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index e29c2ae206..e41b13a3ea 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -407,7 +407,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
@@ -515,7 +515,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, PartitionPruneResult *part_prune_result,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
@@ -563,7 +564,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, part_prune_result, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 3db859c3ea..631cc07217 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -776,7 +776,7 @@ execute_sql_string(const char *sql)
{
QueryDesc *qdesc;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 9ac0383459..b0ed96e56c 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -408,7 +408,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 2333aae467..83465e40f8 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -155,6 +155,7 @@ ExecuteQuery(ParseState *pstate,
PreparedStatement *entry;
CachedPlan *cplan;
List *plan_list;
+ List *part_prune_result_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
Portal portal;
@@ -193,7 +194,10 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
plan_list = cplan->stmt_list;
/*
@@ -207,6 +211,9 @@ ExecuteQuery(ParseState *pstate,
plan_list,
cplan);
+ /* Copy PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_result_list);
+
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
* statement is one that produces tuples. Currently we insist that it be
@@ -576,7 +583,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
- ListCell *p;
+ List *part_prune_result_list;
+ ListCell *p,
+ *pp;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -619,7 +628,10 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Replan if needed, and acquire a transient refcount */
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, queryEnv);
+ CurrentResourceOwner, queryEnv,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -634,13 +646,15 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
plan_list = cplan->stmt_list;
/* Explain each query */
- foreach(p, plan_list)
+ forboth(p, plan_list, pp, part_prune_result_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ PartitionPruneResult *part_prune_result = lfirst_node(PartitionPruneResult, pp);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ ExplainOnePlan(pstmt, part_prune_result, into, es, query_string,
+ paramLI, queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 0b5183fc4a..953a476ea5 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -65,6 +65,34 @@ found there. This currently only occurs for Append and MergeAppend nodes. In
this case the non-required subplans are ignored and the executor state's
subnode array will become out of sequence to the plan's subplan list.
+Execution-time pruning may in fact occur even before execution has started.
+One case is when plancache.c's GetCachedPlan() validates a cached generic
+plan for execution, which it does by locking all the relations that the
+plan will scan.  If the generic plan contains nodes that can perform
+execution-time partition pruning (that is, nodes containing a
+PartitionPruneInfo), the subset of a given node's pruning steps that do not
+depend on execution actually having started (the "initial" pruning steps)
+is performed at this point to figure out the minimal set of child subplans
+that satisfy those steps.  AcquireExecutorLocks(), when looking at a given
+plan tree, will then lock only the relations scanned by the child subplans
+that survived such pruning, along with those present in
+PlannedStmt.minLockRelids.  Note that the subplans are pruned only
+notionally; they are not actually removed from the plan tree.
+
+To prevent the executor, and any third-party execution code that may look
+at the plan tree, from trying to execute the subplans that were pruned as
+described above, the result of pruning is passed to the executor as a
+PartitionPruneResult node via the QueryDesc.  It consists of the sets of
+indexes of the surviving subplans in their respective parent plan node's
+list of child subplans, saved as a list of bitmapsets, with one element for
+every parent plan node whose PartitionPruneInfo is present in
+PlannedStmt.partPruneInfos.  In other words, the executor must not
+re-evaluate the set of initially valid subplans by redoing the initial
+pruning when AcquireExecutorLocks() has already done it, because the
+re-evaluation may very well produce a different set of subplans, including
+some whose relations AcquireExecutorLocks() did not lock.
+
Each Plan node may have expression trees associated with it, to represent
its target list, qualification conditions, etc. These trees are also
read-only to the executor, but the executor state for expression evaluation
@@ -286,6 +314,10 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ [ ExecutorDoInitialPruning ] --- an optional step to perform initial
+ partition pruning on the plan tree, the result of which is passed
+ to the executor via QueryDesc
+
CreateQueryDesc
ExecutorStart
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 72fc273524..45824624f8 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -49,6 +49,7 @@
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/execdebug.h"
+#include "executor/execPartition.h"
#include "executor/nodeSubplan.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
@@ -104,6 +105,56 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
/* end of local decls */
+/* ----------------------------------------------------------------
+ * ExecutorDoInitialPruning
+ *
+ * For each plan tree node that has been assigned a PartitionPruneInfo,
+ * this performs initial partition pruning using the information contained
+ * therein to determine the set of child subplans that satisfy the initial
+ * pruning steps, to be returned as a bitmapset of their indexes in the
+ * node's list of child subplans (for example, an Append's appendplans).
+ *
+ * Return value is a PartitionPruneResult node that contains a list of those
+ * bitmapsets, with one element for every PartitionPruneInfo, and a bitmapset
+ * of the RT indexes of all the leaf partitions scanned by those chosen
+ * subplans. Note that the latter is shared across all PartitionPruneInfos.
+ *
+ * The executor must see the exactly same set of subplans as valid for
+ * execution when doing ExecInitNode() on the plan nodes whose
+ * PartitionPruneInfos are processed here. So, it must get the set from the
+ * aforementioned PartitionPruneResult, instead of computing it all over
+ * again by redoing the initial pruning. It's the caller's job to pass the
+ * PartitionPruneResult to the executor.
+ *
+ * Note: Partitioned tables mentioned in PartitionedRelPruneInfo nodes that
+ * drive the pruning will be locked before doing the pruning.
+ * ----------------------------------------------------------------
+ */
+PartitionPruneResult *
+ExecutorDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params)
+{
+ PartitionPruneResult *result;
+ ListCell *lc;
+
+ /* We should only get here if there is initial pruning to do. */
+ Assert(plannedstmt->containsInitialPruning);
+
+ result = makeNode(PartitionPruneResult);
+ foreach(lc, plannedstmt->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ Bitmapset *valid_subplan_offs;
+
+ valid_subplan_offs =
+ ExecPartitionDoInitialPruning(plannedstmt, params, pruneinfo,
+ &result->scan_leafpart_rtis);
+ result->valid_subplan_offs_list =
+ lappend(result->valid_subplan_offs_list,
+ valid_subplan_offs);
+ }
+
+ return result;
+}
/* ----------------------------------------------------------------
* ExecutorStart
@@ -806,6 +857,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ PartitionPruneResult *part_prune_result = queryDesc->part_prune_result;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -826,6 +878,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_plannedstmt = plannedstmt;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ estate->es_part_prune_result = part_prune_result;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index f73b8c2607..7e6dab5623 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -66,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITIONPRUNERESULT UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -182,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->containsInitialPruning = false;
pstmt->planTree = plan;
pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
@@ -597,12 +599,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_result_data;
+ char *part_prune_result_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_result_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -631,6 +636,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_result_data = nodeToString(estate->es_part_prune_result);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -657,6 +663,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized PartitionPruneResult. */
+ part_prune_result_len = strlen(part_prune_result_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_result_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -751,6 +762,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized PartitionPruneResult */
+ part_prune_result_space = shm_toc_allocate(pcxt->toc, part_prune_result_len);
+ memcpy(part_prune_result_space, part_prune_result_data, part_prune_result_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITIONPRUNERESULT,
+ part_prune_result_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1232,8 +1249,10 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_result_space;
char *paramspace;
PlannedStmt *pstmt;
+ PartitionPruneResult *part_prune_result;
ParamListInfo paramLI;
char *queryString;
@@ -1244,12 +1263,18 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
pstmtspace = shm_toc_lookup(toc, PARALLEL_KEY_PLANNEDSTMT, false);
pstmt = (PlannedStmt *) stringToNode(pstmtspace);
+ /* Reconstruct leader-supplied PartitionPruneResult. */
+ part_prune_result_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITIONPRUNERESULT, false);
+ part_prune_result = (PartitionPruneResult *)
+ stringToNode(part_prune_result_space);
+
/* Reconstruct ParamListInfo. */
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
/* Create a QueryDesc for the query. */
- return CreateQueryDesc(pstmt,
+ return CreateQueryDesc(pstmt, part_prune_result,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index b55cdd2580..24e6f6e988 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -25,6 +25,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -185,7 +186,11 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *pruneinfo);
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -198,7 +203,8 @@ static void PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans);
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis);
/*
@@ -1593,8 +1599,10 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* considered to be a stable expression, it can change value from one plan
* node scan to the next during query execution. Stable comparison
* expressions that don't involve such Params allow partition pruning to be
- * done once during executor startup. Expressions that do involve such Params
- * require us to prune separately for each scan of the parent plan node.
+ * done once during executor startup or during ExecutorDoInitialPruning(),
+ * which runs as part of performing AcquireExecutorLocks() on a given plan
+ * tree. Expressions that do involve such Params require us to prune
+ * separately for each scan of the parent plan node.
*
* Note that pruning away unneeded subplans during executor startup has the
* added benefit of not having to initialize the unneeded subplans at all.
@@ -1611,6 +1619,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* account for initial pruning possibly having eliminated some of the
* subplans.
*
+ * ExecPartitionDoInitialPruning:
+ * Do initial pruning using the information contained in a given
+ * PartitionPruneInfo to determine the minimal set of child subplans
+ * of the parent plan node to which the PartitionPruneInfo belongs
+ * that must be executed, and also the set of RT indexes of the leaf
+ * partitions that will be scanned by those subplans.
+ *
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating the expressions
* that are safe to evaluate at a given point. This function is first
@@ -1628,8 +1643,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* On return, *initially_valid_subplans is assigned the set of indexes of
* child subplans that must be initialized along with the parent plan node.
- * Initial pruning is performed here if needed and in that case only the
- * surviving subplans' indexes are added.
+ * Initial pruning is performed here if needed (unless it has already been done
+ * by ExecutorDoInitialPruning()), and in that case only the surviving
+ * subplans' indexes are added.
*
* If subplans are indeed pruned, subplan_map arrays contained in the returned
* PartitionPruneState are re-sequenced to not count those, though only if the
@@ -1645,24 +1661,59 @@ ExecInitPartitionPruning(PlanState *planstate,
EState *estate = planstate->state;
PartitionPruneInfo *pruneinfo = list_nth(estate->es_part_prune_infos,
part_prune_index);
+ PartitionPruneResult *pruneresult = estate->es_part_prune_result;
+ bool do_pruning = (pruneinfo->needs_init_pruning ||
+ pruneinfo->needs_exec_pruning);
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, planstate);
+ /*
+ * No need to do initial pruning here if it was already done by
+ * ExecutorDoInitialPruning(), which is the case if
+ * es_part_prune_result has been set.
+ */
+ if (pruneresult)
+ do_pruning = pruneinfo->needs_exec_pruning;
- /* Create the working data structure for pruning */
- prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ prunestate = NULL;
+ if (do_pruning)
+ {
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
+
+ /* For data reading, executor always omits detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
+ /*
+ * Create the working data structure for pruning. No need to consider
+ * initial pruning steps if we have a PartitionPruneResult.
+ */
+ prunestate = CreatePartitionPruneState(planstate, pruneinfo,
+ pruneresult == NULL, true,
+ NIL, planstate->ps_ExprContext,
+ estate->es_partition_directory);
+ }
/*
* Perform an initial partition prune pass, if required.
*/
- if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ if (pruneresult)
+ {
+ *initially_valid_subplans =
+ list_nth(pruneresult->valid_subplan_offs_list, part_prune_index);
+ }
+ else if (prunestate && prunestate->do_initial_prune)
+ {
+ *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true,
+ NULL);
+ }
else
{
/* No pruning, so we'll need to initialize all subplans */
Assert(n_total_subplans > 0);
*initially_valid_subplans = bms_add_range(NULL, 0,
n_total_subplans - 1);
+ return prunestate;
}
/*
@@ -1670,7 +1721,8 @@ ExecInitPartitionPruning(PlanState *planstate,
* that were removed above due to initial pruning. No need to do this if
* no steps were removed.
*/
- if (bms_num_members(*initially_valid_subplans) < n_total_subplans)
+ if (prunestate &&
+ bms_num_members(*initially_valid_subplans) < n_total_subplans)
{
/*
* We can safely skip this when !do_exec_prune, even though that
@@ -1686,11 +1738,73 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecPartitionDoInitialPruning
+ * Perform initial pruning using given PartitionPruneInfo to determine
+ * the minimal set of child subplans that will be executed and also the
+ * set of RT indexes of the leaf partitions scanned by those subplans.
+ */
+Bitmapset *
+ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *rtable = plannedstmt->rtable;
+ ExprContext *econtext;
+ PartitionDirectory pdir;
+ MemoryContext oldcontext,
+ tmpcontext;
+ PartitionPruneState *prunestate;
+ Bitmapset *valid_subplan_offs;
+
+ /*
+ * A temporary context for memory allocations required while executing
+ * partition pruning steps.
+ */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "initial pruning working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
+
+ /*
+ * PartitionDirectory to look up partition descriptors, which omits
+ * detached partitions, just like in the executor proper.
+ */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+
+ /*
+ * We don't yet have a PlanState for the parent plan node, so we must
+ * create a standalone ExprContext to evaluate pruning expressions,
+ * equipped with the information about the EXTERN parameters that the
+ * caller passed us. Note that that's okay because the initial pruning
+ * steps do not contain anything that requires the execution to have
+ * started.
+ */
+ econtext = CreateStandaloneExprContext();
+ econtext->ecxt_param_list_info = params;
+ prunestate = CreatePartitionPruneState(NULL, pruneinfo, true, false,
+ rtable, econtext, pdir);
+ MemoryContextSwitchTo(oldcontext);
+
+ /* Do the initial pruning. */
+ valid_subplan_offs = ExecFindMatchingSubPlans(prunestate, true,
+ scan_leafpart_rtis);
+
+ FreeExprContext(econtext, true);
+ DestroyPartitionDirectory(pdir);
+ MemoryContextDelete(tmpcontext);
+
+ return valid_subplan_offs;
+}
+
/*
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
*
- * 'planstate' is the parent plan node's execution state.
+ * 'planstate', if not NULL, is the parent plan node's execution state. It
+ * can be NULL when called before ExecutorStart(), in which case
+ * 'rtable' (range table), 'econtext', and 'partdir' must be explicitly
+ * provided.
*
* 'pruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
@@ -1704,19 +1818,21 @@ ExecInitPartitionPruning(PlanState *planstate,
* PartitionedRelPruneInfo.
*/
static PartitionPruneState *
-CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
+CreatePartitionPruneState(PlanState *planstate,
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir)
{
- EState *estate = planstate->state;
+ EState *estate = planstate ? planstate->state : NULL;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
- /* For data reading, executor always omits detached partitions */
- if (estate->es_partition_directory == NULL)
- estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
+ Assert((estate != NULL) ||
+ (partdir != NULL && econtext != NULL && rtable != NIL));
n_part_hierarchies = list_length(pruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
@@ -1771,15 +1887,42 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * Must open the relation ourselves when called before execution
+ * has started, for example during ExecutorDoInitialPruning() on a
+ * cached plan. In that case, sub-partitions must be locked,
+ * because AcquirePlannerLocks() would not have seen them. (The
+ * first relation in a partrelpruneinfos list is always the root
+ * partitioned table appearing in the query, which
+ * AcquirePlannerLocks() would have locked; the Assert in
+ * relation_open() guards that assumption.)
+ */
+ if (estate == NULL)
+ {
+ RangeTblEntry *rte = rt_fetch(pinfo->rtindex, rtable);
+ int lockmode = (j == 0) ? NoLock : rte->rellockmode;
+
+ partrel = table_open(rte->relid, lockmode);
+ }
+ else
+ partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+
+ /*
+ * We can rely on the copy of the partitioned table's partition
+ * key in its relcache entry, because it can't change (or get
+ * destroyed) as long as the relation is locked. The partition
+ * descriptor is taken from the PartitionDirectory, which holds
+ * the table open long enough for the descriptor to remain valid
+ * while it's used to perform the pruning steps.
*/
- partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
partkey = RelationGetPartitionKey(partrel);
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partdesc = PartitionDirectoryLookup(partdir, partrel);
+
+ /*
+ * Must close partrel, keeping the lock taken, if we're not using
+ * EState's entry.
+ */
+ if (estate == NULL)
+ table_close(partrel, NoLock);
/*
* Initialize the subplan_map and subpart_map.
@@ -1793,6 +1936,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
Assert(partdesc->nparts >= pinfo->nparts);
pprune->nparts = partdesc->nparts;
pprune->subplan_map = palloc(sizeof(int) * partdesc->nparts);
+ pprune->rti_map = palloc(sizeof(Index) * partdesc->nparts);
if (partdesc->nparts == pinfo->nparts)
{
/*
@@ -1803,6 +1947,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pprune->subpart_map = pinfo->subpart_map;
memcpy(pprune->subplan_map, pinfo->subplan_map,
sizeof(int) * pinfo->nparts);
+ memcpy(pprune->rti_map, pinfo->rti_map,
+ sizeof(int) * pinfo->nparts);
/*
* Double-check that the list of unpruned relations has not
@@ -1853,6 +1999,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pinfo->subplan_map[pd_idx];
pprune->subpart_map[pp_idx] =
pinfo->subpart_map[pd_idx];
+ pprune->rti_map[pp_idx] =
+ pinfo->rti_map[pd_idx];
pd_idx++;
}
else
@@ -1860,6 +2008,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
/* this partdesc entry is not in the plan */
pprune->subplan_map[pp_idx] = -1;
pprune->subpart_map[pp_idx] = -1;
+ pprune->rti_map[pp_idx] = 0;
}
}
@@ -1881,7 +2030,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* Initialize pruning contexts as needed.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
- if (pinfo->initial_pruning_steps)
+ if (consider_initial_steps && pinfo->initial_pruning_steps)
{
InitPartitionPruneContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
@@ -1891,7 +2040,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps)
+ if (consider_exec_steps && pinfo->exec_pruning_steps)
{
InitPartitionPruneContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
@@ -2119,10 +2268,14 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
* runtime pruning.
+ *
+ * RT indexes of leaf partitions scanned by the chosen subplans are added to
+ * *scan_leafpart_rtis if the pointer is non-NULL.
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2157,7 +2310,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
*/
pprune = &prunedata->partrelprunedata[0];
find_matching_subplans_recurse(prunedata, pprune, initial_prune,
- &result);
+ &result, scan_leafpart_rtis);
/* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
@@ -2171,6 +2324,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_copy(*scan_leafpart_rtis);
MemoryContextReset(prunestate->prune_context);
@@ -2181,13 +2336,15 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* find_matching_subplans_recurse
* Recursive worker function for ExecFindMatchingSubPlans
*
- * Adds valid (non-prunable) subplan IDs to *validsubplans
+ * Adds valid (non-prunable) subplan IDs to *validsubplans and RT indexes of
+ * of the corresponding leaf partitions to *scan_leafpart_rtis (if asked for).
*/
static void
find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans)
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *partset;
int i;
@@ -2214,8 +2371,14 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
while ((i = bms_next_member(partset, i)) >= 0)
{
if (pprune->subplan_map[i] >= 0)
+ {
*validsubplans = bms_add_member(*validsubplans,
pprune->subplan_map[i]);
+ Assert(pprune->rti_map[i] > 0);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_add_member(*scan_leafpart_rtis,
+ pprune->rti_map[i]);
+ }
else
{
int partidx = pprune->subpart_map[i];
@@ -2223,7 +2386,8 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
if (partidx >= 0)
find_matching_subplans_recurse(prunedata,
&prunedata->partrelprunedata[partidx],
- initial_prune, validsubplans);
+ initial_prune, validsubplans,
+ scan_leafpart_rtis);
else
{
/*
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index f9460ae506..a2182a6b1f 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -844,7 +844,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
- es->qd = CreateQueryDesc(es->stmt,
+ es->qd = CreateQueryDesc(es->stmt, NULL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index c6f86a6510..96880e122a 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -155,7 +155,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (appendstate->as_prune_state == NULL ||
+ (!appendstate->as_prune_state->do_exec_prune && nplans > 0))
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -577,7 +578,7 @@ choose_next_subplan_locally(AppendState *node)
}
else if (node->as_valid_subplans == NULL)
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
whichplan = -1;
}
@@ -642,7 +643,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
/*
* Mark each invalid plan as finished to allow the loop below to
@@ -717,7 +718,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
mark_invalid_subplans_as_finished(node);
}
@@ -868,7 +869,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
classify_matching_subplans(node);
}
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 8d35860c30..2312e5a633 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -103,7 +103,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplan, we can fill ms_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (mergestate->ms_prune_state == NULL ||
+ (!mergestate->ms_prune_state->do_exec_prune && nplans > 0))
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -218,7 +219,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 29bc26669b..303a572c02 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1578,6 +1578,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
CachedPlanSource *plansource;
CachedPlan *cplan;
List *stmt_list;
+ List *part_prune_result_list;
char *query_string;
Snapshot snapshot;
MemoryContext oldcontext;
@@ -1657,7 +1658,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1685,6 +1689,9 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
stmt_list,
cplan);
+ /* Copy PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_result_list);
+
/*
* Set up options for portal. Default SCROLL type is chosen the same way
* as PerformCursorOpen does it.
@@ -2092,7 +2099,8 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
/* Get the generic plan for the query */
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ NULL /* Not interested in PartitionPruneResults */);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2473,7 +2481,9 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
- ListCell *lc2;
+ List *part_prune_result_list;
+ ListCell *lc2,
+ *lc3;
spicallbackarg.query = plansource->query_string;
@@ -2549,8 +2559,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* plan, the refcount must be backed by the plan_owner.
*/
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
-
+ plan_owner, _SPI_current->queryEnv,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
stmt_list = cplan->stmt_list;
/*
@@ -2589,9 +2601,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, part_prune_result_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ PartitionPruneResult *part_prune_result = lfirst_node(PartitionPruneResult, lc3);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2663,7 +2676,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
else
snap = InvalidSnapshot;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, part_prune_result,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index e76fda8eba..afd0332ddd 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -160,7 +160,6 @@ _copyExtensibleNode(const ExtensibleNode *from)
return newnode;
}
-
/*
* copyObjectImpl -- implementation of copyObject(); see nodes/nodes.h
*
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 81f6a9093c..84a195adca 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -294,7 +294,6 @@ outDatum(StringInfo str, Datum value, int typlen, bool typbyval)
#include "outfuncs.funcs.c"
-
/*
* Support functions for nodes with custom_read_write attribute or
* special_read_write attribute
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 1421686938..d57478bde9 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -158,6 +158,11 @@
token = pg_strtok(&length); /* skip :fldname */ \
local_node->fldname = readIntCols(len)
+/* Read an Index array */
+#define READ_INDEX_ARRAY(fldname, len) \
+ token = pg_strtok(&length); /* skip :fldname */ \
+ local_node->fldname = readIndexCols(len)
+
/* Read a bool array */
#define READ_BOOL_ARRAY(fldname, len) \
token = pg_strtok(&length); /* skip :fldname */ \
@@ -623,6 +628,30 @@ readIntCols(int numCols)
return int_vals;
}
+/*
+ * readIndexCols
+ */
+Index *
+readIndexCols(int numCols)
+{
+ int tokenLength,
+ i;
+ const char *token;
+ Index *index_vals;
+
+ if (numCols <= 0)
+ return NULL;
+
+ index_vals = (Index *) palloc(numCols * sizeof(Index));
+ for (i = 0; i < numCols; i++)
+ {
+ token = pg_strtok(&tokenLength);
+ index_vals[i] = atoui(token);
+ }
+
+ return index_vals;
+}
+
/*
* readBoolCols
*/
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index b11249ed8f..7141035cc4 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -519,7 +519,9 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
result->partPruneInfos = glob->partPruneInfos;
+ result->containsInitialPruning = glob->containsInitialPruning;
result->rtable = glob->finalrtable;
+ result->minLockRelids = glob->minLockRelids;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index b8d5610593..da749e331e 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -270,6 +270,16 @@ set_plan_references(PlannerInfo *root, Plan *plan)
*/
add_rtes_to_flat_rtable(root, false);
+ /*
+ * Add the query's adjusted range of RT indexes to glob->minLockRelids.
+ * The adjusted RT indexes of prunable relations will be deleted from the
+ * set below where PartitionPruneInfos are processed.
+ */
+ glob->minLockRelids =
+ bms_add_range(glob->minLockRelids,
+ rtoffset + 1,
+ rtoffset + list_length(root->parse->rtable));
+
/*
* Adjust RT indexes of PlanRowMarks and add to final rowmarks list
*/
@@ -352,6 +362,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach (lc, root->partPruneInfos)
{
PartitionPruneInfo *pruneinfo = lfirst(lc);
+ Bitmapset *leafpart_rtis = NULL;
ListCell *l;
foreach(l, pruneinfo->prune_infos)
@@ -362,15 +373,50 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach(l2, prune_infos)
{
PartitionedRelPruneInfo *pinfo = lfirst(l2);
+ int i;
/* RT index of the table to which the pinfo belongs. */
pinfo->rtindex += rtoffset;
+
+ /* Also of the leaf partitions that might be scanned. */
+ for (i = 0; i < pinfo->nparts; i++)
+ {
+ if (pinfo->rti_map[i] > 0 && pinfo->subplan_map[i] >= 0)
+ {
+ pinfo->rti_map[i] += rtoffset;
+ leafpart_rtis = bms_add_member(leafpart_rtis,
+ pinfo->rti_map[i]);
+ }
+ }
}
}
+ if (pruneinfo->needs_init_pruning)
+ {
+ glob->containsInitialPruning = true;
+
+ /*
+ * Delete the leaf partition RTIs from the global set of relations
+ * to be locked before executing the plan. AcquireExecutorLocks()
+ * will find the ones to add to the set after performing initial
+ * pruning.
+ */
+ glob->minLockRelids = bms_del_members(glob->minLockRelids,
+ leafpart_rtis);
+ }
+
glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
}
+ /*
+ * If we deleted any bits above, it seems worth doing a bms_copy() of
+ * glob->minLockRelids to shed any empty tail words, so that the loop
+ * over this set in AcquireExecutorLocks() does not have to wade through
+ * useless bit words.
+ */
+ if (glob->containsInitialPruning)
+ glob->minLockRelids = bms_copy(glob->minLockRelids);
+
return result;
}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index d77f7d3aef..952c5b8327 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,9 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -234,6 +236,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool needs_init_pruning = false;
+ bool needs_exec_pruning = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -313,12 +317,16 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_needs_init_pruning;
+ bool partrel_needs_exec_pruning;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_needs_init_pruning,
+ &partrel_needs_exec_pruning);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -327,6 +335,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+
+ needs_init_pruning |= partrel_needs_init_pruning;
+ needs_exec_pruning |= partrel_needs_exec_pruning;
}
pfree(relid_subplan_map);
@@ -341,6 +352,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->needs_init_pruning = needs_init_pruning;
+ pruneinfo->needs_exec_pruning = needs_exec_pruning;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -441,13 +454,18 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *needs_init_pruning and *needs_exec_pruning are set to indicate whether
+ * the returned PartitionedRelPruneInfos contain pruning steps that can be
+ * performed before execution begins and during execution, respectively.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -458,6 +476,10 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will find out below. */
+ *needs_init_pruning = false;
+ *needs_exec_pruning = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -545,6 +567,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+ * In the first pass, we also detect whether the second pass will be
+ * necessary, by checking for the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -619,6 +644,12 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ /* record which types of pruning steps we've seen so far */
+ if (initial_pruning_steps != NIL)
+ *needs_init_pruning = true;
+ if (exec_pruning_steps != NIL)
+ *needs_exec_pruning = true;
+
pinfolist = lappend(pinfolist, pinfo);
}
@@ -646,6 +677,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *subplan_map;
int *subpart_map;
Oid *relid_map;
+ Index *rti_map;
/*
* Construct the subplan and subpart maps for this partitioning level.
@@ -658,6 +690,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
+ rti_map = (Index *) palloc0(nparts * sizeof(Index));
present_parts = NULL;
i = -1;
@@ -672,6 +705,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
+ rti_map[i] = partrel->relid;
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
@@ -696,6 +730,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
pinfo->relid_map = relid_map;
+ pinfo->rti_map = rti_map;
}
pfree(relid_subpart_map);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 6f18b68856..16bda42f11 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1596,6 +1596,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ List *part_prune_result_list;
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -1971,7 +1972,9 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
/*
* Now we can define the portal.
@@ -1986,6 +1989,9 @@ exec_bind_message(StringInfo input_message)
cplan->stmt_list,
cplan);
+ /* Copy PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_result_list);
+
/* Done with the snapshot used for parameter I/O and parsing/planning */
if (snapshot_set)
PopActiveSnapshot();
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5aa5a350f3..8cc2e2162d 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -35,7 +35,7 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
+static void ProcessQuery(PlannedStmt *plan, PartitionPruneResult *part_prune_result,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +65,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +78,8 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->part_prune_result = part_prune_result; /* ExecutorDoInitialPruning()
+ * output for plan */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +125,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * part_prune_result: ExecutorDoInitialPruning() output for the plan tree
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +138,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +150,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, part_prune_result, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -491,8 +496,13 @@ PortalStart(Portal portal, ParamListInfo params,
/*
* Create QueryDesc in portal's context; for the moment, set
* the destination to DestNone.
+ *
+ * There is no PartitionPruneResult unless the PlannedStmt is
+ * from a CachedPlan.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->part_prune_results == NIL ? NULL :
+ linitial(portal->part_prune_results),
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1225,6 +1235,8 @@ PortalRunMulti(Portal portal,
if (pstmt->utilityStmt == NULL)
{
+ PartitionPruneResult *part_prune_result = NULL;
+
/*
* process a plannable query.
*/
@@ -1271,10 +1283,18 @@ PortalRunMulti(Portal portal,
else
UpdateActiveSnapshotCommandId();
+ /*
+ * Determine whether there's a corresponding PartitionPruneResult for
+ * this PlannedStmt.
+ */
+ if (portal->part_prune_results != NIL)
+ part_prune_result = list_nth(portal->part_prune_results,
+ foreach_current_index(stmtlist_item));
+
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_result,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1283,7 +1303,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_result,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 0d6a295674..d1c9605979 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -99,14 +99,19 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_result_list);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv);
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_result_list);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_result_list,
+ List **lockedRelids_per_stmt);
+static void ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -790,15 +795,20 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* On a "true" return, we have acquired the locks needed to run the plan.
* (We must do this for the "true" result to be race-condition-free.)
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_result_list.
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_result_list)
{
CachedPlan *plan = plansource->gplan;
/* Assert that caller checked the querytree */
Assert(plansource->is_valid);
+ *part_prune_result_list = NIL;
+
/* If there's no generic plan, just say "false" */
if (!plan)
return false;
@@ -820,13 +830,21 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ List *lockedRelids_per_stmt;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ /*
+ * Lock relations scanned by the plan. This is where the pruning
+ * happens if needed.
+ */
+ AcquireExecutorLocks(plan->stmt_list, boundParams,
+ part_prune_result_list,
+ &lockedRelids_per_stmt);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -848,7 +866,14 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ ReleaseExecutorLocks(plan->stmt_list, lockedRelids_per_stmt);
+
+ /*
+ * The output list and any objects therein were allocated in the
+ * caller's (hopefully short-lived) context, so they will not stay
+ * leaked for long; still, reset it so it cannot accidentally be looked at.
+ */
+ *part_prune_result_list = NIL;
}
/*
@@ -874,10 +899,15 @@ CheckCachedPlan(CachedPlanSource *plansource)
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
* (unless this is a one-shot plan, in which case we don't copy the plan).
+ *
+ * A list of NULLs is returned in *part_prune_result_list, meaning that no
+ * PartitionPruneResult nodes have yet been created for the plans in
+ * stmt_list.
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv)
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_result_list)
{
CachedPlan *plan;
List *plist;
@@ -1007,6 +1037,17 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
MemoryContextSwitchTo(oldcxt);
+ /*
+ * No actual PartitionPruneResults yet to add, though must initialize
+ * the list to have the same number of elements as the list of
+ * PlannedStmts.
+ */
+ *part_prune_result_list = NIL;
+ foreach(lc, plist)
+ {
+ *part_prune_result_list = lappend(*part_prune_result_list, NULL);
+ }
+
return plan;
}
@@ -1126,6 +1167,17 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
+ * For every PlannedStmt found in the returned CachedPlan, an element that
+ * is either a PartitionPruneResult or NULL is added to
+ * *part_prune_result_list, if the caller asked for one. The element is a
+ * PartitionPruneResult if the PlannedStmt comes from an existing, otherwise
+ * valid CachedPlan that contains at least one PartitionPruneInfo with
+ * "initial" pruning steps. Those steps are performed by calling
+ * ExecutorDoInitialPruning(), which prunes away subplans whose pruning
+ * conditions are not satisfied, so that AcquireExecutorLocks() locks only
+ * the surviving leaf partitions. The PartitionPruneResult contains a list
+ * of bitmapsets of the indexes of matching subplans, one per PartitionPruneInfo.
+ *
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
@@ -1139,11 +1191,13 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ List **part_prune_result_list)
{
CachedPlan *plan = NULL;
List *qlist;
bool customplan;
+ List *my_part_prune_result_list;
/* Assert caller is doing things in a sane order */
Assert(plansource->magic == CACHEDPLANSOURCE_MAGIC);
@@ -1160,7 +1214,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (CheckCachedPlan(plansource, boundParams,
+ &my_part_prune_result_list))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1169,7 +1224,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
else
{
/* Build a new generic plan */
- plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv,
+ &my_part_prune_result_list);
/* Just make real sure plansource->gplan is clear */
ReleaseGenericPlan(plansource);
/* Link the new generic plan into the plansource */
@@ -1214,7 +1270,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (customplan)
{
/* Build a custom plan */
- plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv,
+ &my_part_prune_result_list);
/* Accumulate total costs of custom plans */
plansource->total_custom_cost += cached_plan_cost(plan, true);
@@ -1246,6 +1303,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
plan->is_saved = true;
}
+ if (part_prune_result_list)
+ *part_prune_result_list = my_part_prune_result_list;
+
return plan;
}
@@ -1737,17 +1797,29 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_result_list.
+ *
+ * On return, *lockedRelids_per_stmt will contain a bitmapset for every
+ * PlannedStmt in stmt_list, containing the RT indexes of relation entries
+ * in its range table that were actually locked, or NULL if the PlannedStmt
+ * contains a utility statement.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_result_list,
+ List **lockedRelids_per_stmt)
{
ListCell *lc1;
+ *part_prune_result_list = *lockedRelids_per_stmt = NIL;
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ PartitionPruneResult *part_prune_result = NULL;
+ Bitmapset *allLockRelids;
+ Bitmapset *lockedRelids = NULL;
+ int rti;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1761,13 +1833,38 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
if (query)
- ScanQueryForLocks(query, acquire);
+ ScanQueryForLocks(query, true);
+ *part_prune_result_list = lappend(*part_prune_result_list, NULL);
+ *lockedRelids_per_stmt = lappend(*lockedRelids_per_stmt, NULL);
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ /*
+ * Figure out the set of relations that would need to be locked
+ * before executing the plan.
+ */
+ if (plannedstmt->containsInitialPruning)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ /*
+ * Obtain the set of leaf partitions to be locked.
+ *
+ * The following does initial partition pruning using the
+ * PartitionPruneInfos found in plannedstmt->partPruneInfos and
+ * finds leaf partitions that survive that pruning across all the
+ * nodes in the plan tree.
+ */
+ part_prune_result = ExecutorDoInitialPruning(plannedstmt,
+ boundParams);
+
+ allLockRelids = bms_union(plannedstmt->minLockRelids,
+ part_prune_result->scan_leafpart_rtis);
+ }
+ else
+ allLockRelids = plannedstmt->minLockRelids;
+
+ rti = -1;
+ while ((rti = bms_next_member(allLockRelids, rti)) > 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
if (rte->rtekind != RTE_RELATION)
continue;
@@ -1778,10 +1875,58 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
* fail if it's been dropped entirely --- we'll just transiently
* acquire a non-conflicting lock.
*/
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
+ LockRelationOid(rte->relid, rte->rellockmode);
+ lockedRelids = bms_add_member(lockedRelids, rti);
+ }
+
+ *part_prune_result_list = lappend(*part_prune_result_list,
+ part_prune_result);
+ *lockedRelids_per_stmt = lappend(*lockedRelids_per_stmt, lockedRelids);
+ }
+}
+
+/*
+ * ReleaseExecutorLocks
+ * Release locks that would've been acquired by an earlier call to
+ * AcquireExecutorLocks()
+ */
+static void
+ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt)
+{
+ ListCell *lc1,
+ *lc2;
+
+ forboth(lc1, stmt_list, lc2, lockedRelids_per_stmt)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ Bitmapset *lockedRelids = lfirst(lc2);
+ int rti;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /*
+ * Ignore utility statements, except those (such as EXPLAIN) that
+ * contain a parsed-but-not-planned query. Note: it's okay to use
+ * ScanQueryForLocks, even though the query hasn't been through
+ * rule rewriting, because rewriting doesn't change the query
+ * representation.
+ */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ if (query)
+ ScanQueryForLocks(query, false);
+ continue;
+ }
+
+ rti = -1;
+ while ((rti = bms_next_member(lockedRelids, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ Assert(rte->rtekind == RTE_RELATION);
+
+ /* See the comment in AcquireExecutorLocks(). */
+ UnlockRelationOid(rte->relid, rte->rellockmode);
}
}
}
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index d549f66d4a..1bbe6b704b 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -303,6 +303,25 @@ PortalDefineQuery(Portal portal,
portal->status = PORTAL_DEFINED;
}
+/*
+ * PortalStorePartitionPruneResults
+ * Copy the given list of PartitionPruneResults into the portal's
+ * context
+ *
+ * This allows the caller to ensure that the list exists as long as the portal
+ * does.
+ */
+void
+PortalStorePartitionPruneResults(Portal portal, List *part_prune_results)
+{
+ MemoryContext oldcxt;
+
+ AssertArg(PortalIsValid(portal));
+ oldcxt = MemoryContextSwitchTo(portal->portalContext);
+ portal->part_prune_results = copyObject(part_prune_results);
+ MemoryContextSwitchTo(oldcxt);
+}
+
/*
* PortalReleaseCachedPlan
* Release a portal's reference to its cached plan, if any.
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 9ebde089ae..e57e133f0e 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,9 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index bf962af7af..bd8776402e 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -45,6 +45,7 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
+ * rti_map Range table index by partition index, or 0.
* present_parts A Bitmapset of the partition indexes that we
* have subplans or subparts for.
* initial_pruning_steps List of PartitionPruneSteps used to
@@ -61,6 +62,7 @@ typedef struct PartitionedRelPruningData
int nparts;
int *subplan_map;
int *subpart_map;
+ Index *rti_map;
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
@@ -126,5 +128,10 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int part_prune_index,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis);
+extern Bitmapset *ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index e79e2c001f..60d5644908 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,8 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ PartitionPruneResult *part_prune_result; /* ExecutorDoInitialPruning()'s
+ * output for plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +59,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index d68a6b9d28..5c4a282be0 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -185,6 +185,8 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
+extern PartitionPruneResult *ExecutorDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params);
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 63a89474db..12ea06c2f6 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1001,6 +1001,33 @@ typedef struct DomainConstraintState
*/
typedef TupleTableSlot *(*ExecProcNodeMtd) (struct PlanState *pstate);
+/*----------------
+ * PartitionPruneResult
+ *
+ * The result of an ExecutorDoInitialPruning() invocation on a given
+ * PlannedStmt.
+ *
+ * Contains a list of Bitmapsets of the indexes of the subplans remaining after
+ * performing initial pruning by calling ExecFindMatchingSubPlans() for every
+ * PartitionPruneInfo found in PlannedStmt.partPruneInfos. RT indexes of the
+ * leaf partitions scanned by those subplans across all PartitionPruneInfos
+ * are added into scan_leafpart_rtis.
+ *
+ * This is used by GetCachedPlan() to inform its callers of the pruning
+ * decisions made when performing AcquireExecutorLocks() on a given cached
+ * PlannedStmt, which the callers then pass on to the executor. The executor
+ * refers to this node when initializing the plan nodes which contain subplans
+ * that may have been pruned by ExecutorDoInitialPruning(), rather than
+ * redoing initial pruning.
+ */
+typedef struct PartitionPruneResult
+{
+ NodeTag type;
+
+ List *valid_subplan_offs_list;
+ Bitmapset *scan_leafpart_rtis;
+} PartitionPruneResult;
+
/* ----------------
* PlanState node
*
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index cdd6debfa0..b33d9e426d 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -212,6 +212,7 @@ extern struct Bitmapset *readBitmapset(void);
extern uintptr_t readDatum(bool typbyval);
extern bool *readBoolCols(int numCols);
extern int *readIntCols(int numCols);
+extern Index *readIndexCols(int numCols);
extern Oid *readOidCols(int numCols);
extern int16 *readAttrNumberCols(int numCols);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index d87957ff6c..7957aeb6d7 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -125,6 +125,19 @@ typedef struct PlannerGlobal
/* List of PartitionPruneInfo contained in the plan */
List *partPruneInfos;
+ /*
+ * Do any of those PartitionPruneInfos have initial (pre-exec) pruning
+ * steps in them?
+ */
+ bool containsInitialPruning;
+
+ /*
+ * Indexes of all range table entries minus indexes of range table entries
+ * of the leaf partitions scanned by prunable subplans; see
+ * AcquireExecutorLocks()
+ */
+ Bitmapset *minLockRelids;
+
/* OIDs of relations the plan depends on */
List *relationOids;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index f2daabb3b7..1d2c0d9bdf 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -72,8 +72,17 @@ typedef struct PlannedStmt
List *partPruneInfos; /* List of PartitionPruneInfo contained in
* the plan */
+ bool containsInitialPruning; /* Do any of those PartitionPruneInfos
+ * have initial (pre-exec) pruning
+ * steps in them? */
+
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *minLockRelids; /* Indexes of all range table entries minus
+ * indexes of range table entries of the leaf
+ * partitions scanned by prunable subplans;
+ * see AcquireExecutorLocks() */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
List *resultRelations; /* integer list of RT indexes, or NIL */
@@ -1409,6 +1418,13 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * needs_init_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its initial_pruning_steps set?
+ *
+ * needs_exec_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its exec_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1419,6 +1435,8 @@ typedef struct PartitionPruneInfo
NodeTag type;
List *prune_infos;
+ bool needs_init_pruning;
+ bool needs_exec_pruning;
Bitmapset *other_subplans;
} PartitionPruneInfo;
@@ -1463,6 +1481,9 @@ typedef struct PartitionedRelPruneInfo
/* relation OID by partition index, or 0 */
Oid *relid_map pg_node_attr(array_size(nparts));
+ /* Range table index by partition index, or 0. */
+ Index *rti_map pg_node_attr(array_size(nparts));
+
/*
* initial_pruning_steps shows how to prune during executor startup (i.e.,
* without use of any PARAM_EXEC Params); it is NIL if no startup pruning
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 0499635f59..1c5bb5ece1 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -220,7 +220,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ List **part_prune_result_list);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aeddbdafe5..9f7727a837 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,7 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *part_prune_results; /* list of PartitionPruneResults */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -242,6 +243,8 @@ extern void PortalDefineQuery(Portal portal,
CommandTag commandTag,
List *stmts,
CachedPlan *cplan);
+extern void PortalStorePartitionPruneResults(Portal portal,
+ List *part_prune_result_list);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
--
2.35.3
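The net effect of the AcquireExecutorLocks() changes in the patch
above is a simple set computation: always lock the relations in
minLockRelids and, if the plan contains initial pruning steps, also
those whose RT indexes survive that pruning. Here is a minimal
standalone sketch of just that computation; it uses a plain uint64
as a stand-in for Bitmapset and hypothetical names throughout, so it
models only the set arithmetic and is not backend code:

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

/* Toy stand-in for Bitmapset: bit i represents range table index i. */
typedef uint64_t relid_set;

/*
 * Models the lock-set computation in AcquireExecutorLocks():
 * min_lock_relids covers everything except prunable leaf partitions,
 * scan_leafpart_rtis is what "initial" pruning says will be scanned.
 */
static relid_set
compute_lock_set(bool contains_initial_pruning,
                 relid_set min_lock_relids,
                 relid_set scan_leafpart_rtis)
{
    if (contains_initial_pruning)
        return min_lock_relids | scan_leafpart_rtis;
    return min_lock_relids;
}

int
main(void)
{
    /* RTIs 1-3 are non-prunable; pruning kept only leaf RTI 5 of 4..7. */
    relid_set   minlock = 0x0E;     /* bits 1, 2, 3 */
    relid_set   surviving = 0x20;   /* bit 5 */
    relid_set   locks = compute_lock_set(true, minlock, surviving);

    for (int rti = 0; rti < 64; rti++)
        if (locks & ((relid_set) 1 << rti))
            printf("lock relation at RTI %d\n", rti);
    return 0;
}

This prints RTIs 1, 2, 3 and 5, mirroring a plan whose leaf
partitions occupy RTIs 4..7 of which only one survives pruning.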
Attachment: v18-0001-Move-PartitioPruneInfo-out-of-plan-nodes-into-Pl.patch (application/octet-stream)
From 571424d7f1d5cb8b3ee59853649d35731b033b03 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Fri, 27 May 2022 16:00:28 +0900
Subject: [PATCH v18 1/2] Move PartitionPruneInfo out of plan nodes into
PlannedStmt
The planner will now add a given PartitionPruneInfo to
PlannedStmt.partPruneInfos instead of directly to the
Append/MergeAppend plan node. What gets set in the latter instead
is an index field that points to the element of
PlannedStmt.partPruneInfos holding the PartitionPruneInfo that
belongs to the plan node.
A later commit will make AcquireExecutorLocks() do the initial
partition pruning to determine a minimal set of partitions to be
locked when validating a plan tree, and it will need to consult the
PartitionPruneInfos referenced therein to do so. It is better for
the PartitionPruneInfos to be directly accessible than to require a
walk of the plan tree to find them; with this change, they can be
found by simply iterating over PlannedStmt.partPruneInfos.
---
src/backend/executor/execMain.c | 1 +
src/backend/executor/execParallel.c | 1 +
src/backend/executor/execPartition.c | 4 +-
src/backend/executor/execUtils.c | 2 +
src/backend/executor/nodeAppend.c | 4 +-
src/backend/executor/nodeMergeAppend.c | 4 +-
src/backend/nodes/outfuncs.c | 1 -
src/backend/optimizer/plan/createplan.c | 24 ++++-----
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 65 +++++++++++++------------
src/backend/partitioning/partprune.c | 18 ++++---
src/include/executor/execPartition.h | 3 +-
src/include/nodes/execnodes.h | 2 +
src/include/nodes/pathnodes.h | 6 +++
src/include/nodes/plannodes.h | 11 +++--
src/include/partitioning/partprune.h | 8 +--
16 files changed, 92 insertions(+), 63 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ef2fd46092..72fc273524 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -825,6 +825,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable);
estate->es_plannedstmt = plannedstmt;
+ estate->es_part_prune_infos = plannedstmt->partPruneInfos;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index f1fd7f7e8b..f73b8c2607 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -183,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
pstmt->planTree = plan;
+ pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
pstmt->resultRelations = NIL;
pstmt->appendRelations = NIL;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index e03ea27299..b55cdd2580 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1638,11 +1638,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
PartitionPruneState *
ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
Bitmapset **initially_valid_subplans)
{
PartitionPruneState *prunestate;
EState *estate = planstate->state;
+ PartitionPruneInfo *pruneinfo = list_nth(estate->es_part_prune_infos,
+ part_prune_index);
/* We may need an expression context to evaluate partition exprs */
ExecAssignExprContext(estate, planstate);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9df1f81ea8..f9c7976ff2 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -119,6 +119,8 @@ CreateExecutorState(void)
estate->es_relations = NULL;
estate->es_rowmarks = NULL;
estate->es_plannedstmt = NULL;
+ estate->es_part_prune_infos = NIL;
+ estate->es_part_prune_result = NULL;
estate->es_junkFilter = NULL;
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 357e10a1d7..c6f86a6510 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -134,7 +134,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_begun = false;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -145,7 +145,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&appendstate->ps,
list_length(node->appendplans),
- node->part_prune_info,
+ node->part_prune_index,
&validsubplans);
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index c5c62fa5c7..8d35860c30 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -82,7 +82,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.ExecProcNode = ExecMergeAppend;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -93,7 +93,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&mergestate->ps,
list_length(node->mergeplans),
- node->part_prune_info,
+ node->part_prune_index,
&validsubplans);
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 4d776e7b51..81f6a9093c 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -299,7 +299,6 @@ outDatum(StringInfo str, Datum value, int typlen, bool typbyval)
* Support functions for nodes with custom_read_write attribute or
* special_read_write attribute
*/
-
static void
_outConst(StringInfo str, const Const *node)
{
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 76606faa3e..58a05cf673 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1203,7 +1203,6 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
ListCell *subpaths;
int nasyncplans = 0;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
int nodenumsortkeys = 0;
AttrNumber *nodeSortColIdx = NULL;
Oid *nodeSortOperators = NULL;
@@ -1354,6 +1353,9 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ plan->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1377,16 +1379,14 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
}
if (prunequal != NIL)
- partpruneinfo =
- make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ plan->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
plan->appendplans = subplans;
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
- plan->part_prune_info = partpruneinfo;
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -1426,7 +1426,6 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
List *subplans = NIL;
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1519,6 +1518,9 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ node->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1542,13 +1544,13 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
}
if (prunequal != NIL)
- partpruneinfo = make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ node->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
node->mergeplans = subplans;
- node->part_prune_info = partpruneinfo;
+
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 06ad856eac..b11249ed8f 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -518,6 +518,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->dependsOnRole = glob->dependsOnRole;
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
+ result->partPruneInfos = glob->partPruneInfos;
result->rtable = glob->finalrtable;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 9cef92cab2..b8d5610593 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -348,6 +348,29 @@ set_plan_references(PlannerInfo *root, Plan *plan)
}
}
+ /* Also fix up the information in PartitionPruneInfos. */
+ foreach (lc, root->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ ListCell *l;
+
+ foreach(l, pruneinfo->prune_infos)
+ {
+ List *prune_infos = lfirst(l);
+ ListCell *l2;
+
+ foreach(l2, prune_infos)
+ {
+ PartitionedRelPruneInfo *pinfo = lfirst(l2);
+
+ /* RT index of the table to which the pinfo belongs. */
+ pinfo->rtindex += rtoffset;
+ }
+ }
+
+ glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
+ }
+
return result;
}
@@ -1655,21 +1678,12 @@ set_append_references(PlannerInfo *root,
aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
- if (aplan->part_prune_info)
- {
- foreach(l, aplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * PartitionPruneInfos will be added to a list in PlannerGlobal, so update
+ * the index.
+ */
+ if (aplan->part_prune_index >= 0)
+ aplan->part_prune_index += list_length(root->glob->partPruneInfos);
/* We don't need to recurse to lefttree or righttree ... */
Assert(aplan->plan.lefttree == NULL);
@@ -1727,21 +1741,12 @@ set_mergeappend_references(PlannerInfo *root,
mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
- if (mplan->part_prune_info)
- {
- foreach(l, mplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * PartitionPruneInfos will be added to a list in PlannerGlobal, so update
+ * the index.
+ */
+ if (mplan->part_prune_index >= 0)
+ mplan->part_prune_index += list_length(root->glob->partPruneInfos);
/* We don't need to recurse to lefttree or righttree ... */
Assert(mplan->plan.lefttree == NULL);
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 9d3c05aed3..d77f7d3aef 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -209,16 +209,20 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
/*
* make_partition_pruneinfo
- * Builds a PartitionPruneInfo which can be used in the executor to allow
- * additional partition pruning to take place. Returns NULL when
- * partition pruning would be useless.
+ * Checks if the given set of quals can be used to build pruning steps
+ * that the executor can use to prune away unneeded partitions. If
+ * suitable quals are found then a PartitionPruneInfo is built and tagged
+ * onto the PlannerInfo's partPruneInfos list.
+ *
+ * The return value is the 0-based index of the item added to the
+ * partPruneInfos list or -1 if nothing was added.
*
* 'parentrel' is the RelOptInfo for an appendrel, and 'subpaths' is the list
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
*/
-PartitionPruneInfo *
+int
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
List *prunequal)
@@ -332,7 +336,7 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* quals, then we can just not bother with run-time pruning.
*/
if (prunerelinfos == NIL)
- return NULL;
+ return -1;
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
@@ -358,7 +362,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
else
pruneinfo->other_subplans = NULL;
- return pruneinfo;
+ root->partPruneInfos = lappend(root->partPruneInfos, pruneinfo);
+
+ return list_length(root->partPruneInfos) - 1;
}
/*
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 708435e952..bf962af7af 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -123,9 +123,8 @@ typedef struct PartitionPruneState
extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
bool initial_prune);
-
#endif /* EXECPARTITION_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 01b1727fc0..63a89474db 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -611,6 +611,8 @@ typedef struct EState
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
+ struct PartitionPruneResult *es_part_prune_result; /* QueryDesc.part_prune_result */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 44ffc73f15..d87957ff6c 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -122,6 +122,9 @@ typedef struct PlannerGlobal
/* "flat" list of AppendRelInfos */
List *appendRelations;
+ /* List of PartitionPruneInfo contained in the plan */
+ List *partPruneInfos;
+
/* OIDs of relations the plan depends on */
List *relationOids;
@@ -480,6 +483,9 @@ struct PlannerInfo
/* Does this query modify any partition key columns? */
bool partColsUpdated;
+
+ /* PartitionPruneInfos added in this query's plan. */
+ List *partPruneInfos;
};
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index dca2a21e7a..f2daabb3b7 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -69,6 +69,9 @@ typedef struct PlannedStmt
struct Plan *planTree; /* tree of Plan nodes */
+ List *partPruneInfos; /* List of PartitionPruneInfo contained in
+ * the plan */
+
List *rtable; /* list of RangeTblEntry nodes */
/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
@@ -269,8 +272,8 @@ typedef struct Append
*/
int first_partial_plan;
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} Append;
/* ----------------
@@ -304,8 +307,8 @@ typedef struct MergeAppend
/* NULLS FIRST/LAST directions */
bool *nullsFirst pg_node_attr(array_size(numCols));
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} MergeAppend;
/* ----------------
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 90684efa25..ebf0dcff8c 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -70,10 +70,10 @@ typedef struct PartitionPruneContext
#define PruneCxtStateIdx(partnatts, step_id, keyno) \
((partnatts) * (step_id) + (keyno))
-extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
- struct RelOptInfo *parentrel,
- List *subpaths,
- List *prunequal);
+extern int make_partition_pruneinfo(struct PlannerInfo *root,
+ struct RelOptInfo *parentrel,
+ List *subpaths,
+ List *prunequal);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
--
2.35.3
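To see the indirection this 0001 patch introduces in isolation:
Append/MergeAppend nodes stop carrying a PartitionPruneInfo pointer
and instead store an index into a flat list kept on the top-level
statement, so code that needs every pruning info (such as the later
AcquireExecutorLocks() change) can scan that list rather than walk
the plan tree. A toy sketch of the idea, using hypothetical stand-in
types rather than the real PlannedStmt/Append definitions:

#include <stdio.h>

#define MAX_INFOS 8

/* Hypothetical stand-in for PartitionPruneInfo. */
typedef struct PruneInfo
{
    const char *parent_rel;
} PruneInfo;

/* Flat list on the "statement", like PlannedStmt.partPruneInfos. */
typedef struct Stmt
{
    PruneInfo  *infos[MAX_INFOS];
    int         ninfos;
} Stmt;

/* An Append-like node holds an index; -1 means no run-time pruning. */
typedef struct AppendNode
{
    int         part_prune_index;
} AppendNode;

/* Mimics make_partition_pruneinfo(): append, return the 0-based index. */
static int
add_prune_info(Stmt *stmt, PruneInfo *info)
{
    stmt->infos[stmt->ninfos] = info;
    return stmt->ninfos++;
}

int
main(void)
{
    Stmt        stmt = {0};
    PruneInfo   pi = {"measurement"};
    AppendNode  append = {.part_prune_index = add_prune_info(&stmt, &pi)};

    /* No plan tree walk is needed to find all pruning infos: */
    for (int i = 0; i < stmt.ninfos; i++)
        printf("prune info %d is for appendrel %s\n",
               i, stmt.infos[i]->parent_rel);

    /* The node resolves its index, as ExecInitPartitionPruning() does: */
    if (append.part_prune_index >= 0)
        printf("Append uses prune info %d\n", append.part_prune_index);
    return 0;
}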
On Wed, Jul 13, 2022 at 3:40 PM Amit Langote <amitlangote09@gmail.com> wrote:
> Rebased over 964d01ae90c.

Sorry, left some pointless hunks in there while rebasing. Fixed in
the attached.
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
Attachment: v19-0001-Move-PartitioPruneInfo-out-of-plan-nodes-into-Pl.patch (application/octet-stream)
From 9fa5cd5f4256b7249ab6f560edca9d3609a126ef Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Fri, 27 May 2022 16:00:28 +0900
Subject: [PATCH v19 1/2] Move PartitionPruneInfo out of plan nodes into
PlannedStmt
The planner will now add a given PartitionPruneInfo to
PlannedStmt.partPruneInfos instead of directly to the
Append/MergeAppend plan node. What gets set in the latter instead
is an index field that points to the element of
PlannedStmt.partPruneInfos holding the PartitionPruneInfo that
belongs to the plan node.
A later commit will make AcquireExecutorLocks() do the initial
partition pruning to determine a minimal set of partitions to be
locked when validating a plan tree, and it will need to consult the
PartitionPruneInfos referenced therein to do so. It is better for
the PartitionPruneInfos to be directly accessible than to require a
walk of the plan tree to find them; with this change, they can be
found by simply iterating over PlannedStmt.partPruneInfos.
---
src/backend/executor/execMain.c | 1 +
src/backend/executor/execParallel.c | 1 +
src/backend/executor/execPartition.c | 4 +-
src/backend/executor/execUtils.c | 2 +
src/backend/executor/nodeAppend.c | 4 +-
src/backend/executor/nodeMergeAppend.c | 4 +-
src/backend/optimizer/plan/createplan.c | 24 ++++-----
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 65 +++++++++++++------------
src/backend/partitioning/partprune.c | 18 ++++---
src/include/executor/execPartition.h | 3 +-
src/include/nodes/execnodes.h | 2 +
src/include/nodes/pathnodes.h | 6 +++
src/include/nodes/plannodes.h | 11 +++--
src/include/partitioning/partprune.h | 8 +--
15 files changed, 92 insertions(+), 62 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ef2fd46092..72fc273524 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -825,6 +825,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable);
estate->es_plannedstmt = plannedstmt;
+ estate->es_part_prune_infos = plannedstmt->partPruneInfos;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index f1fd7f7e8b..f73b8c2607 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -183,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
pstmt->planTree = plan;
+ pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
pstmt->resultRelations = NIL;
pstmt->appendRelations = NIL;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index e03ea27299..b55cdd2580 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1638,11 +1638,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
PartitionPruneState *
ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
Bitmapset **initially_valid_subplans)
{
PartitionPruneState *prunestate;
EState *estate = planstate->state;
+ PartitionPruneInfo *pruneinfo = list_nth(estate->es_part_prune_infos,
+ part_prune_index);
/* We may need an expression context to evaluate partition exprs */
ExecAssignExprContext(estate, planstate);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9df1f81ea8..f9c7976ff2 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -119,6 +119,8 @@ CreateExecutorState(void)
estate->es_relations = NULL;
estate->es_rowmarks = NULL;
estate->es_plannedstmt = NULL;
+ estate->es_part_prune_infos = NIL;
+ estate->es_part_prune_result = NULL;
estate->es_junkFilter = NULL;
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 357e10a1d7..c6f86a6510 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -134,7 +134,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_begun = false;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -145,7 +145,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&appendstate->ps,
list_length(node->appendplans),
- node->part_prune_info,
+ node->part_prune_index,
&validsubplans);
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index c5c62fa5c7..8d35860c30 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -82,7 +82,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.ExecProcNode = ExecMergeAppend;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -93,7 +93,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&mergestate->ps,
list_length(node->mergeplans),
- node->part_prune_info,
+ node->part_prune_index,
&validsubplans);
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index e37f2933eb..fd8ab4a167 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1203,7 +1203,6 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
ListCell *subpaths;
int nasyncplans = 0;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
int nodenumsortkeys = 0;
AttrNumber *nodeSortColIdx = NULL;
Oid *nodeSortOperators = NULL;
@@ -1354,6 +1353,9 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ plan->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1377,16 +1379,14 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
}
if (prunequal != NIL)
- partpruneinfo =
- make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ plan->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
plan->appendplans = subplans;
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
- plan->part_prune_info = partpruneinfo;
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -1425,7 +1425,6 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
List *subplans = NIL;
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1518,6 +1517,9 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ node->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1541,13 +1543,13 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
}
if (prunequal != NIL)
- partpruneinfo = make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ node->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
node->mergeplans = subplans;
- node->part_prune_info = partpruneinfo;
+
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 06ad856eac..b11249ed8f 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -518,6 +518,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->dependsOnRole = glob->dependsOnRole;
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
+ result->partPruneInfos = glob->partPruneInfos;
result->rtable = glob->finalrtable;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 9cef92cab2..b8d5610593 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -348,6 +348,29 @@ set_plan_references(PlannerInfo *root, Plan *plan)
}
}
+ /* Also fix up the information in PartitionPruneInfos. */
+ foreach (lc, root->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ ListCell *l;
+
+ foreach(l, pruneinfo->prune_infos)
+ {
+ List *prune_infos = lfirst(l);
+ ListCell *l2;
+
+ foreach(l2, prune_infos)
+ {
+ PartitionedRelPruneInfo *pinfo = lfirst(l2);
+
+ /* RT index of the table to which the pinfo belongs. */
+ pinfo->rtindex += rtoffset;
+ }
+ }
+
+ glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
+ }
+
return result;
}
@@ -1655,21 +1678,12 @@ set_append_references(PlannerInfo *root,
aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
- if (aplan->part_prune_info)
- {
- foreach(l, aplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * PartitionPruneInfos will be added to a list in PlannerGlobal, so update
+ * the index.
+ */
+ if (aplan->part_prune_index >= 0)
+ aplan->part_prune_index += list_length(root->glob->partPruneInfos);
/* We don't need to recurse to lefttree or righttree ... */
Assert(aplan->plan.lefttree == NULL);
@@ -1727,21 +1741,12 @@ set_mergeappend_references(PlannerInfo *root,
mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
- if (mplan->part_prune_info)
- {
- foreach(l, mplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * PartitionPruneInfos will be added to a list in PlannerGlobal, so update
+ * the index.
+ */
+ if (mplan->part_prune_index >= 0)
+ mplan->part_prune_index += list_length(root->glob->partPruneInfos);
/* We don't need to recurse to lefttree or righttree ... */
Assert(mplan->plan.lefttree == NULL);
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 9d3c05aed3..d77f7d3aef 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -209,16 +209,20 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
/*
* make_partition_pruneinfo
- * Builds a PartitionPruneInfo which can be used in the executor to allow
- * additional partition pruning to take place. Returns NULL when
- * partition pruning would be useless.
+ * Checks if the given set of quals can be used to build pruning steps
+ * that the executor can use to prune away unneeded partitions. If
+ * suitable quals are found then a PartitionPruneInfo is built and tagged
+ * onto the PlannerInfo's partPruneInfos list.
+ *
+ * The return value is the 0-based index of the item added to the
+ * partPruneInfos list or -1 if nothing was added.
*
* 'parentrel' is the RelOptInfo for an appendrel, and 'subpaths' is the list
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
*/
-PartitionPruneInfo *
+int
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
List *prunequal)
@@ -332,7 +336,7 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* quals, then we can just not bother with run-time pruning.
*/
if (prunerelinfos == NIL)
- return NULL;
+ return -1;
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
@@ -358,7 +362,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
else
pruneinfo->other_subplans = NULL;
- return pruneinfo;
+ root->partPruneInfos = lappend(root->partPruneInfos, pruneinfo);
+
+ return list_length(root->partPruneInfos) - 1;
}
/*
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 708435e952..bf962af7af 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -123,9 +123,8 @@ typedef struct PartitionPruneState
extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
bool initial_prune);
-
#endif /* EXECPARTITION_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 01b1727fc0..63a89474db 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -611,6 +611,8 @@ typedef struct EState
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
+ struct PartitionPruneResult *es_part_prune_result; /* QueryDesc.part_prune_result */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 44ffc73f15..d87957ff6c 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -122,6 +122,9 @@ typedef struct PlannerGlobal
/* "flat" list of AppendRelInfos */
List *appendRelations;
+ /* List of PartitionPruneInfo contained in the plan */
+ List *partPruneInfos;
+
/* OIDs of relations the plan depends on */
List *relationOids;
@@ -480,6 +483,9 @@ struct PlannerInfo
/* Does this query modify any partition key columns? */
bool partColsUpdated;
+
+ /* PartitionPruneInfos added in this query's plan. */
+ List *partPruneInfos;
};
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index dca2a21e7a..f2daabb3b7 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -69,6 +69,9 @@ typedef struct PlannedStmt
struct Plan *planTree; /* tree of Plan nodes */
+ List *partPruneInfos; /* List of PartitionPruneInfo contained in
+ * the plan */
+
List *rtable; /* list of RangeTblEntry nodes */
/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
@@ -269,8 +272,8 @@ typedef struct Append
*/
int first_partial_plan;
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} Append;
/* ----------------
@@ -304,8 +307,8 @@ typedef struct MergeAppend
/* NULLS FIRST/LAST directions */
bool *nullsFirst pg_node_attr(array_size(numCols));
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} MergeAppend;
/* ----------------
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 90684efa25..ebf0dcff8c 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -70,10 +70,10 @@ typedef struct PartitionPruneContext
#define PruneCxtStateIdx(partnatts, step_id, keyno) \
((partnatts) * (step_id) + (keyno))
-extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
- struct RelOptInfo *parentrel,
- List *subpaths,
- List *prunequal);
+extern int make_partition_pruneinfo(struct PlannerInfo *root,
+ struct RelOptInfo *parentrel,
+ List *subpaths,
+ List *prunequal);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
--
2.35.3
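One subtlety in the setrefs.c hunks above: a node's
part_prune_index starts out 0-based within its own subquery's list,
so when that list is about to be appended to the global one in
PlannerGlobal, the index must be shifted by the number of entries
already accumulated there. A tiny model of that rebasing, with a
hypothetical function name but the same arithmetic:

#include <stdio.h>

/*
 * Models the index rebasing in set_append_references() and
 * set_mergeappend_references(): shift a subquery-local 0-based index
 * by the global list's current length; -1 ("no run-time pruning")
 * stays -1.
 */
static int
rebase_part_prune_index(int local_index, int global_list_len)
{
    return (local_index >= 0) ? local_index + global_list_len : -1;
}

int
main(void)
{
    int         global_len = 0;

    /* Subquery 1 contributes two prune infos at global indexes 0, 1. */
    printf("subquery 1, node A: %d\n", rebase_part_prune_index(0, global_len));
    printf("subquery 1, node B: %d\n", rebase_part_prune_index(1, global_len));
    global_len += 2;

    /* Subquery 2's local index 0 now lands at global index 2. */
    printf("subquery 2, node C: %d\n", rebase_part_prune_index(0, global_len));
    printf("no pruning:         %d\n", rebase_part_prune_index(-1, global_len));
    return 0;
}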
Attachment: v19-0002-Optimize-AcquireExecutorLocks-by-locking-only-un.patch (application/octet-stream)
From b67911f2ae182f7158501e7ce4b1799ff2e1efb4 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v19 2/2] Optimize AcquireExecutorLocks() by locking only
unpruned partitions
This commit teaches AcquireExecutorLocks() to perform initial
partition pruning to notionally eliminate the subnodes of a generic
cached plan that need not be initialized for the plan's actual
execution, and to skip locking the partitions scanned by those
subnodes.
The result of performing initial partition pruning this way, before
the actual execution has started, is passed on to the execution
proper as a PartitionPruneResult, supplied along with the PlannedStmt
by those callers of the executor that obtained the plan through
plancache.c. It is NULL when the plan is obtained by calling the
planner directly, or when the plan obtained from plancache.c is not
a generic one.
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 2 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/prepare.c | 26 ++-
src/backend/executor/README | 32 ++++
src/backend/executor/execMain.c | 53 ++++++
src/backend/executor/execParallel.c | 27 ++-
src/backend/executor/execPartition.c | 234 +++++++++++++++++++++----
src/backend/executor/functions.c | 2 +-
src/backend/executor/nodeAppend.c | 11 +-
src/backend/executor/nodeMergeAppend.c | 5 +-
src/backend/executor/spi.c | 27 ++-
src/backend/nodes/readfuncs.c | 29 +++
src/backend/optimizer/plan/planner.c | 2 +
src/backend/optimizer/plan/setrefs.c | 46 +++++
src/backend/partitioning/partprune.c | 41 ++++-
src/backend/tcop/postgres.c | 8 +-
src/backend/tcop/pquery.c | 28 ++-
src/backend/utils/cache/plancache.c | 187 +++++++++++++++++---
src/backend/utils/mmgr/portalmem.c | 19 ++
src/include/commands/explain.h | 4 +-
src/include/executor/execPartition.h | 9 +-
src/include/executor/execdesc.h | 3 +
src/include/executor/executor.h | 2 +
src/include/nodes/execnodes.h | 27 +++
src/include/nodes/nodes.h | 1 +
src/include/nodes/pathnodes.h | 13 ++
src/include/nodes/plannodes.h | 21 +++
src/include/utils/plancache.h | 3 +-
src/include/utils/portal.h | 3 +
32 files changed, 782 insertions(+), 96 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index fca29a9a10..d839517693 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -541,7 +541,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 9abbb6b555..f6607f2454 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index e29c2ae206..e41b13a3ea 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -407,7 +407,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
@@ -515,7 +515,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, PartitionPruneResult *part_prune_result,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
@@ -563,7 +564,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, part_prune_result, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 3db859c3ea..631cc07217 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -776,7 +776,7 @@ execute_sql_string(const char *sql)
{
QueryDesc *qdesc;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 9ac0383459..b0ed96e56c 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -408,7 +408,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 2333aae467..83465e40f8 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -155,6 +155,7 @@ ExecuteQuery(ParseState *pstate,
PreparedStatement *entry;
CachedPlan *cplan;
List *plan_list;
+ List *part_prune_result_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
Portal portal;
@@ -193,7 +194,10 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
plan_list = cplan->stmt_list;
/*
@@ -207,6 +211,9 @@ ExecuteQuery(ParseState *pstate,
plan_list,
cplan);
+ /* Copy PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_result_list);
+
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
* statement is one that produces tuples. Currently we insist that it be
@@ -576,7 +583,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
- ListCell *p;
+ List *part_prune_result_list;
+ ListCell *p,
+ *pp;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -619,7 +628,10 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Replan if needed, and acquire a transient refcount */
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, queryEnv);
+ CurrentResourceOwner, queryEnv,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -634,13 +646,15 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
plan_list = cplan->stmt_list;
/* Explain each query */
- foreach(p, plan_list)
+ forboth(p, plan_list, pp, part_prune_result_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ PartitionPruneResult *part_prune_result = lfirst_node(PartitionPruneResult, pp);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ ExplainOnePlan(pstmt, part_prune_result, into, es, query_string,
+ paramLI, queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 0b5183fc4a..953a476ea5 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -65,6 +65,34 @@ found there. This currently only occurs for Append and MergeAppend nodes. In
this case the non-required subplans are ignored and the executor state's
subnode array will become out of sequence to the plan's subplan list.
+Note that execution-time pruning may also occur even before the
+execution has started. One case where that occurs is when a cached generic
+plan is being validated for execution by plancache.c: GetCachedPlan(), which
+works by locking all the relations that will be scanned by that plan. If the
+generic plan contains nodes that can perform execution time partition pruning
+(that is, contain a PartitionPruneInfo), a subset of pruning steps contained
+in a given node's PartitionPruneInfo that do not depend on the execution
+actually having started (called "initial" pruning steps) are performed at this
+point to figure out the minimal set of child subplans that satisfy those
+pruning steps. AcquireExecutorLocks() looking at a given plan tree will then
+lock only the relations scanned by the child subplans that survived such
+pruning, along with those present in PlannedStmt.minLockRelids. Note that the
+subplans are only notionally pruned in that they are not removed from the plan
+tree as such.
+
+To prevent the executor and any third party execution code that can look at
+the plan tree from trying to execute the subplans that were pruned as
+described above, the result of pruning is passed to the executor as a
+PartitionPruneResult node via the QueryDesc. It consists of the set of
+indexes of surviving subplans in their respective parent plan node's list of
+child subplans, saved as a list of bitmapsets, with one element for every
+parent plan node whose PartitionPruneInfo is present in
+PlannedStmt.partPruneInfos. In other words, the executor should not
+re-evaluate the set of initially valid subplans by redoing the initial pruning
+if it was already done by AcquireExecutorLocks(), because the re-evaluation may
+very well end up resulting in a different set of subplans, containing some
+whose relations were not locked by AcquireExecutorLocks().
+
Each Plan node may have expression trees associated with it, to represent
its target list, qualification conditions, etc. These trees are also
read-only to the executor, but the executor state for expression evaluation
@@ -286,6 +314,10 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ [ ExecutorDoInitialPruning ] --- an optional step to perform initial
+ partition pruning on the plan tree the result of which is passed
+ to the executor via QueryDesc
+
CreateQueryDesc
ExecutorStart
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 72fc273524..45824624f8 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -49,6 +49,7 @@
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/execdebug.h"
+#include "executor/execPartition.h"
#include "executor/nodeSubplan.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
@@ -104,6 +105,56 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
/* end of local decls */
+/* ----------------------------------------------------------------
+ * ExecutorDoInitialPruning
+ *
+ * For each plan tree node that has been assigned a PartitionPruneInfo,
+ * this performs initial partition pruning using the information contained
+ * therein to determine the set of child subplans that satisfy the initial
+ * pruning steps, to be returned as a bitmapset of their indexes in the
+ * node's list of child subplans (for example, an Append's appendplans).
+ *
+ * Return value is a PartitionPruneResult node that contains a list of those
+ * bitmapsets, with one element for every PartitionPruneInfo, and a bitmapset
+ * of the RT indexes of all the leaf partitions scanned by those chosen
+ * subplans. Note that the latter is shared across all PartitionPruneInfos.
+ *
+ * The executor must see exactly the same set of subplans as valid for
+ * execution when doing ExecInitNode() on the plan nodes whose
+ * PartitionPruneInfos are processed here. So, it must get the set from the
+ * aforementioned PartitionPruneResult, instead of computing it all over
+ * again by redoing the initial pruning. It's the caller's job to pass the
+ * PartitionPruneResult to the executor.
+ *
+ * Note: Partitioned tables mentioned in PartitionedRelPruneInfo nodes that
+ * drive the pruning will be locked before doing the pruning.
+ * ----------------------------------------------------------------
+ */
+PartitionPruneResult *
+ExecutorDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params)
+{
+ PartitionPruneResult *result;
+ ListCell *lc;
+
+ /* Only get here if there is any pruning to do. */
+ Assert(plannedstmt->containsInitialPruning);
+
+ result = makeNode(PartitionPruneResult);
+ foreach(lc, plannedstmt->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ Bitmapset *valid_subplan_offs;
+
+ valid_subplan_offs =
+ ExecPartitionDoInitialPruning(plannedstmt, params, pruneinfo,
+ &result->scan_leafpart_rtis);
+ result->valid_subplan_offs_list =
+ lappend(result->valid_subplan_offs_list,
+ valid_subplan_offs);
+ }
+
+ return result;
+}
/* ----------------------------------------------------------------
* ExecutorStart
@@ -806,6 +857,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ PartitionPruneResult *part_prune_result = queryDesc->part_prune_result;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -826,6 +878,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_plannedstmt = plannedstmt;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ estate->es_part_prune_result = part_prune_result;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
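
For concreteness, the shape of ExecutorDoInitialPruning()'s output can be
illustrated as follows (hypothetical consumer code, not part of the patch
itself):

    PartitionPruneResult *ppr = ExecutorDoInitialPruning(plannedstmt, params);
    ListCell   *lc;
    int         rti = -1;

    /* One bitmapset per PartitionPruneInfo in plannedstmt->partPruneInfos. */
    foreach(lc, ppr->valid_subplan_offs_list)
    {
        Bitmapset  *valid = (Bitmapset *) lfirst(lc);
        int         i = -1;

        while ((i = bms_next_member(valid, i)) >= 0)
            elog(DEBUG1, "subplan %d survived initial pruning", i);
    }

    /* RT indexes of all leaf partitions scanned by surviving subplans. */
    while ((rti = bms_next_member(ppr->scan_leafpart_rtis, rti)) > 0)
        elog(DEBUG1, "will lock leaf partition with RT index %d", rti);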
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index f73b8c2607..7e6dab5623 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -66,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITIONPRUNERESULT UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -182,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->containsInitialPruning = false;
pstmt->planTree = plan;
pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
@@ -597,12 +599,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_result_data;
+ char *part_prune_result_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_result_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -631,6 +636,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_result_data = nodeToString(estate->es_part_prune_result);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -657,6 +663,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized PartitionPruneResult. */
+ part_prune_result_len = strlen(part_prune_result_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_result_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -751,6 +762,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized PartitionPruneResult */
+ part_prune_result_space = shm_toc_allocate(pcxt->toc, part_prune_result_len);
+ memcpy(part_prune_result_space, part_prune_result_data, part_prune_result_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITIONPRUNERESULT,
+ part_prune_result_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1232,8 +1249,10 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_result_space;
char *paramspace;
PlannedStmt *pstmt;
+ PartitionPruneResult *part_prune_result;
ParamListInfo paramLI;
char *queryString;
@@ -1244,12 +1263,18 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
pstmtspace = shm_toc_lookup(toc, PARALLEL_KEY_PLANNEDSTMT, false);
pstmt = (PlannedStmt *) stringToNode(pstmtspace);
+ /* Reconstruct leader-supplied PartitionPruneResult. */
+ part_prune_result_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITIONPRUNERESULT, false);
+ part_prune_result = (PartitionPruneResult *)
+ stringToNode(part_prune_result_space);
+
/* Reconstruct ParamListInfo. */
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
/* Create a QueryDesc for the query. */
- return CreateQueryDesc(pstmt,
+ return CreateQueryDesc(pstmt, part_prune_result,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
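
Since PartitionPruneResult is an ordinary Node, the leader-to-worker handoff
above is just a nodeToString()/stringToNode() round trip; in outline (a
simplified sketch of the code above):

    /* Leader: flatten the pruning result into the shared memory TOC. */
    char   *buf = nodeToString(estate->es_part_prune_result);
    char   *space = shm_toc_allocate(pcxt->toc, strlen(buf) + 1);

    memcpy(space, buf, strlen(buf) + 1);
    shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITIONPRUNERESULT, space);

    /* Worker: rebuild the node and hand it to CreateQueryDesc(). */
    PartitionPruneResult *ppr;

    space = shm_toc_lookup(toc, PARALLEL_KEY_PARTITIONPRUNERESULT, false);
    ppr = (PartitionPruneResult *) stringToNode(space);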
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index b55cdd2580..24e6f6e988 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -25,6 +25,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -185,7 +186,11 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *pruneinfo);
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -198,7 +203,8 @@ static void PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans);
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis);
/*
@@ -1593,8 +1599,10 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* considered to be a stable expression, it can change value from one plan
* node scan to the next during query execution. Stable comparison
* expressions that don't involve such Params allow partition pruning to be
- * done once during executor startup. Expressions that do involve such Params
- * require us to prune separately for each scan of the parent plan node.
+ * done once during executor startup, or during ExecutorDoInitialPruning(),
+ * which runs as part of performing AcquireExecutorLocks() on a given plan tree.
+ * Expressions that do involve such Params require us to prune separately for
+ * each scan of the parent plan node.
*
* Note that pruning away unneeded subplans during executor startup has the
* added benefit of not having to initialize the unneeded subplans at all.
@@ -1611,6 +1619,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* account for initial pruning possibly having eliminated some of the
* subplans.
*
+ * ExecPartitionDoInitialPruning:
+ * Do initial pruning using the information contained in a given
+ * PartitionPruneInfo to determine the minimal set of child subplans, of
+ * the parent plan node to which the PartitionPruneInfo belongs, that must
+ * be executed, along with the set of RT indexes of the leaf partitions
+ * that will be scanned by those subplans.
+ *
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating the expressions
* that are safe to evaluate at a given point. This function is first
@@ -1628,8 +1643,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* On return, *initially_valid_subplans is assigned the set of indexes of
* child subplans that must be initialized along with the parent plan node.
- * Initial pruning is performed here if needed and in that case only the
- * surviving subplans' indexes are added.
+ * Initial pruning is performed here if needed (unless it has already been done
+ * by ExecutorDoInitialPruning()), and in that case only the surviving
+ * subplans' indexes are added.
*
* If subplans are indeed pruned, subplan_map arrays contained in the returned
* PartitionPruneState are re-sequenced to not count those, though only if the
@@ -1645,24 +1661,59 @@ ExecInitPartitionPruning(PlanState *planstate,
EState *estate = planstate->state;
PartitionPruneInfo *pruneinfo = list_nth(estate->es_part_prune_infos,
part_prune_index);
+ PartitionPruneResult *pruneresult = estate->es_part_prune_result;
+ bool do_pruning = (pruneinfo->needs_init_pruning ||
+ pruneinfo->needs_exec_pruning);
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, planstate);
+ /*
+ * No need to do initial pruning if it was done already by
+ * ExecutorDoInitialPruning(), which is the case if es_part_prune_result
+ * has been set.
+ */
+ if (pruneresult)
+ do_pruning = pruneinfo->needs_exec_pruning;
- /* Create the working data structure for pruning */
- prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ prunestate = NULL;
+ if (do_pruning)
+ {
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
+
+ /* For data reading, executor always omits detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
+ /*
+ * Create the working data structure for pruning. No need to consider
+ * initial pruning steps if we have a PartitionPruneResult.
+ */
+ prunestate = CreatePartitionPruneState(planstate, pruneinfo,
+ pruneresult == NULL, true,
+ NIL, planstate->ps_ExprContext,
+ estate->es_partition_directory);
+ }
/*
* Perform an initial partition prune pass, if required.
*/
- if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ if (pruneresult)
+ {
+ *initially_valid_subplans =
+ list_nth(pruneresult->valid_subplan_offs_list, part_prune_index);
+ }
+ else if (prunestate && prunestate->do_initial_prune)
+ {
+ *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true,
+ NULL);
+ }
else
{
/* No pruning, so we'll need to initialize all subplans */
Assert(n_total_subplans > 0);
*initially_valid_subplans = bms_add_range(NULL, 0,
n_total_subplans - 1);
+ return prunestate;
}
/*
@@ -1670,7 +1721,8 @@ ExecInitPartitionPruning(PlanState *planstate,
* that were removed above due to initial pruning. No need to do this if
* no steps were removed.
*/
- if (bms_num_members(*initially_valid_subplans) < n_total_subplans)
+ if (prunestate &&
+ bms_num_members(*initially_valid_subplans) < n_total_subplans)
{
/*
* We can safely skip this when !do_exec_prune, even though that
@@ -1686,11 +1738,73 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecPartitionDoInitialPruning
+ * Perform initial pruning using given PartitionPruneInfo to determine
+ * the minimal set of child subplans that will be executed and also the
+ * set of RT indexes of the leaf partitions scanned by those subplans.
+ */
+Bitmapset *
+ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *rtable = plannedstmt->rtable;
+ ExprContext *econtext;
+ PartitionDirectory pdir;
+ MemoryContext oldcontext,
+ tmpcontext;
+ PartitionPruneState *prunestate;
+ Bitmapset *valid_subplan_offs;
+
+ /*
+ * A temporary context for memory allocations required while executing
+ * partition pruning steps.
+ */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "initial pruning working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
+
+ /*
+ * Create a PartitionDirectory to look up partition descriptors; like
+ * the executor proper, it omits detached partitions.
+ */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+
+ /*
+ * We don't yet have a PlanState for the parent plan node, so we must
+ * create a standalone ExprContext to evaluate pruning expressions,
+ * equipped with the information about the EXTERN parameters that the
+ * caller passed us. Note that that's okay because the initial pruning
+ * steps do not contain anything that requires the execution to have
+ * started.
+ */
+ econtext = CreateStandaloneExprContext();
+ econtext->ecxt_param_list_info = params;
+ prunestate = CreatePartitionPruneState(NULL, pruneinfo, true, false,
+ rtable, econtext, pdir);
+ MemoryContextSwitchTo(oldcontext);
+
+ /* Do the initial pruning. */
+ valid_subplan_offs = ExecFindMatchingSubPlans(prunestate, true,
+ scan_leafpart_rtis);
+
+ FreeExprContext(econtext, true);
+ DestroyPartitionDirectory(pdir);
+ MemoryContextDelete(tmpcontext);
+
+ return valid_subplan_offs;
+}
+
/*
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
*
- * 'planstate' is the parent plan node's execution state.
+ * 'planstate', if not NULL, is the parent plan node's execution state.  It
+ * can be NULL when called before ExecutorStart(), in which case 'rtable'
+ * (the range table), 'econtext', and 'partdir' must be explicitly provided.
*
* 'pruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
@@ -1704,19 +1818,21 @@ ExecInitPartitionPruning(PlanState *planstate,
* PartitionedRelPruneInfo.
*/
static PartitionPruneState *
-CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
+CreatePartitionPruneState(PlanState *planstate,
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir)
{
- EState *estate = planstate->state;
+ EState *estate = planstate ? planstate->state : NULL;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
- /* For data reading, executor always omits detached partitions */
- if (estate->es_partition_directory == NULL)
- estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
+ Assert((estate != NULL) ||
+ (partdir != NULL && econtext != NULL && rtable != NIL));
n_part_hierarchies = list_length(pruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
@@ -1771,15 +1887,42 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * Must open the relation ourselves when called before execution has
+ * started, such as when called during ExecutorDoInitialPruning() on a
+ * cached plan.  In that case, sub-partitions must be locked, because
+ * AcquirePlannerLocks() would not have seen them.  (The first relation
+ * in a partrelpruneinfos list is always the root partitioned table
+ * appearing in the query, which AcquirePlannerLocks() would have locked
+ * already; the Assert in relation_open() guards that assumption.)
+ */
+ if (estate == NULL)
+ {
+ RangeTblEntry *rte = rt_fetch(pinfo->rtindex, rtable);
+ int lockmode = (j == 0) ? NoLock : rte->rellockmode;
+
+ partrel = table_open(rte->relid, lockmode);
+ }
+ else
+ partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+
+ /*
+ * We can rely on the copy of the partitioned table's partition
+ * key in its relcache entry, because it can't change (or get
+ * destroyed) as long as the relation is locked.  The partition
+ * descriptor is taken from the PartitionDirectory, which holds the
+ * table open long enough for the descriptor to remain valid while
+ * it's used to perform the pruning steps.
*/
- partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
partkey = RelationGetPartitionKey(partrel);
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partdesc = PartitionDirectoryLookup(partdir, partrel);
+
+ /*
+ * Must close partrel, keeping the lock taken, if we're not using
+ * EState's entry.
+ */
+ if (estate == NULL)
+ table_close(partrel, NoLock);
/*
* Initialize the subplan_map and subpart_map.
@@ -1793,6 +1936,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
Assert(partdesc->nparts >= pinfo->nparts);
pprune->nparts = partdesc->nparts;
pprune->subplan_map = palloc(sizeof(int) * partdesc->nparts);
+ pprune->rti_map = palloc(sizeof(Index) * partdesc->nparts);
if (partdesc->nparts == pinfo->nparts)
{
/*
@@ -1803,6 +1947,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pprune->subpart_map = pinfo->subpart_map;
memcpy(pprune->subplan_map, pinfo->subplan_map,
sizeof(int) * pinfo->nparts);
+ memcpy(pprune->rti_map, pinfo->rti_map,
+ sizeof(int) * pinfo->nparts);
/*
* Double-check that the list of unpruned relations has not
@@ -1853,6 +1999,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pinfo->subplan_map[pd_idx];
pprune->subpart_map[pp_idx] =
pinfo->subpart_map[pd_idx];
+ pprune->rti_map[pp_idx] =
+ pinfo->rti_map[pd_idx];
pd_idx++;
}
else
@@ -1860,6 +2008,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
/* this partdesc entry is not in the plan */
pprune->subplan_map[pp_idx] = -1;
pprune->subpart_map[pp_idx] = -1;
+ pprune->rti_map[pp_idx] = 0;
}
}
@@ -1881,7 +2030,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* Initialize pruning contexts as needed.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
- if (pinfo->initial_pruning_steps)
+ if (consider_initial_steps && pinfo->initial_pruning_steps)
{
InitPartitionPruneContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
@@ -1891,7 +2040,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps)
+ if (consider_exec_steps && pinfo->exec_pruning_steps)
{
InitPartitionPruneContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
@@ -2119,10 +2268,14 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
* runtime pruning.
+ *
+ * RT indexes of leaf partitions scanned by the chosen subplans are added to
+ * *scan_leafpart_rtis if the pointer is non-NULL.
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2157,7 +2310,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
*/
pprune = &prunedata->partrelprunedata[0];
find_matching_subplans_recurse(prunedata, pprune, initial_prune,
- &result);
+ &result, scan_leafpart_rtis);
/* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
@@ -2171,6 +2324,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_copy(*scan_leafpart_rtis);
MemoryContextReset(prunestate->prune_context);
@@ -2181,13 +2336,15 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* find_matching_subplans_recurse
* Recursive worker function for ExecFindMatchingSubPlans
*
- * Adds valid (non-prunable) subplan IDs to *validsubplans
+ * Adds valid (non-prunable) subplan IDs to *validsubplans and RT indexes
+ * of the corresponding leaf partitions to *scan_leafpart_rtis (if asked for).
*/
static void
find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans)
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *partset;
int i;
@@ -2214,8 +2371,14 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
while ((i = bms_next_member(partset, i)) >= 0)
{
if (pprune->subplan_map[i] >= 0)
+ {
*validsubplans = bms_add_member(*validsubplans,
pprune->subplan_map[i]);
+ Assert(pprune->rti_map[i] > 0);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_add_member(*scan_leafpart_rtis,
+ pprune->rti_map[i]);
+ }
else
{
int partidx = pprune->subpart_map[i];
@@ -2223,7 +2386,8 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
if (partidx >= 0)
find_matching_subplans_recurse(prunedata,
&prunedata->partrelprunedata[partidx],
- initial_prune, validsubplans);
+ initial_prune, validsubplans,
+ scan_leafpart_rtis);
else
{
/*
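
The standalone entry point above can be exercised roughly like this (a
sketch assuming the caller has only a PlannedStmt and bound parameters, as
AcquireExecutorLocks() does):

    Bitmapset  *scan_rtis = NULL;
    Bitmapset  *valid;
    PartitionPruneInfo *pruneinfo =
        linitial_node(PartitionPruneInfo, plannedstmt->partPruneInfos);

    /* No EState or PlanState needed; only EXTERN params may be used. */
    valid = ExecPartitionDoInitialPruning(plannedstmt, boundParams,
                                          pruneinfo, &scan_rtis);

    /*
     * 'valid' indexes the parent node's list of child subplans;
     * 'scan_rtis' holds the RT indexes of the leaf partitions that
     * those subplans will scan.
     */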
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index f9460ae506..a2182a6b1f 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -844,7 +844,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
- es->qd = CreateQueryDesc(es->stmt,
+ es->qd = CreateQueryDesc(es->stmt, NULL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index c6f86a6510..96880e122a 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -155,7 +155,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (appendstate->as_prune_state == NULL ||
+ (!appendstate->as_prune_state->do_exec_prune && nplans > 0))
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -577,7 +578,7 @@ choose_next_subplan_locally(AppendState *node)
}
else if (node->as_valid_subplans == NULL)
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
whichplan = -1;
}
@@ -642,7 +643,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
/*
* Mark each invalid plan as finished to allow the loop below to
@@ -717,7 +718,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
mark_invalid_subplans_as_finished(node);
}
@@ -868,7 +869,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
classify_matching_subplans(node);
}
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 8d35860c30..2312e5a633 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -103,7 +103,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplan, we can fill ms_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (mergestate->ms_prune_state == NULL ||
+ (!mergestate->ms_prune_state->do_exec_prune && nplans > 0))
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -218,7 +219,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 29bc26669b..303a572c02 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1578,6 +1578,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
CachedPlanSource *plansource;
CachedPlan *cplan;
List *stmt_list;
+ List *part_prune_result_list;
char *query_string;
Snapshot snapshot;
MemoryContext oldcontext;
@@ -1657,7 +1658,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1685,6 +1689,9 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
stmt_list,
cplan);
+ /* Copy PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_result_list);
+
/*
* Set up options for portal. Default SCROLL type is chosen the same way
* as PerformCursorOpen does it.
@@ -2092,7 +2099,8 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
/* Get the generic plan for the query */
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ NULL /* Not interested in PartitionPruneResults */);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2473,7 +2481,9 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
- ListCell *lc2;
+ List *part_prune_result_list;
+ ListCell *lc2,
+ *lc3;
spicallbackarg.query = plansource->query_string;
@@ -2549,8 +2559,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* plan, the refcount must be backed by the plan_owner.
*/
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
-
+ plan_owner, _SPI_current->queryEnv,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
stmt_list = cplan->stmt_list;
/*
@@ -2589,9 +2601,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, part_prune_result_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ PartitionPruneResult *part_prune_result = lfirst_node(PartitionPruneResult, lc3);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2663,7 +2676,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
else
snap = InvalidSnapshot;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, part_prune_result,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 1421686938..d57478bde9 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -158,6 +158,11 @@
token = pg_strtok(&length); /* skip :fldname */ \
local_node->fldname = readIntCols(len)
+/* Read an Index array */
+#define READ_INDEX_ARRAY(fldname, len) \
+ token = pg_strtok(&length); /* skip :fldname */ \
+ local_node->fldname = readIndexCols(len)
+
/* Read a bool array */
#define READ_BOOL_ARRAY(fldname, len) \
token = pg_strtok(&length); /* skip :fldname */ \
@@ -623,6 +628,30 @@ readIntCols(int numCols)
return int_vals;
}
+/*
+ * readIndexCols
+ */
+Index *
+readIndexCols(int numCols)
+{
+ int tokenLength,
+ i;
+ const char *token;
+ Index *index_vals;
+
+ if (numCols <= 0)
+ return NULL;
+
+ index_vals = (Index *) palloc(numCols * sizeof(Index));
+ for (i = 0; i < numCols; i++)
+ {
+ token = pg_strtok(&tokenLength);
+ index_vals[i] = atoui(token);
+ }
+
+ return index_vals;
+}
+
/*
* readBoolCols
*/
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index b11249ed8f..7141035cc4 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -519,7 +519,9 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
result->partPruneInfos = glob->partPruneInfos;
+ result->containsInitialPruning = glob->containsInitialPruning;
result->rtable = glob->finalrtable;
+ result->minLockRelids = glob->minLockRelids;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index b8d5610593..da749e331e 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -270,6 +270,16 @@ set_plan_references(PlannerInfo *root, Plan *plan)
*/
add_rtes_to_flat_rtable(root, false);
+ /*
+ * Add the query's adjusted range of RT indexes to glob->minLockRelids.
+ * The adjusted RT indexes of prunable relations will be deleted from the
+ * set below where PartitionPruneInfos are processed.
+ */
+ glob->minLockRelids =
+ bms_add_range(glob->minLockRelids,
+ rtoffset + 1,
+ rtoffset + list_length(root->parse->rtable));
+
/*
* Adjust RT indexes of PlanRowMarks and add to final rowmarks list
*/
@@ -352,6 +362,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach (lc, root->partPruneInfos)
{
PartitionPruneInfo *pruneinfo = lfirst(lc);
+ Bitmapset *leafpart_rtis = NULL;
ListCell *l;
foreach(l, pruneinfo->prune_infos)
@@ -362,15 +373,50 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach(l2, prune_infos)
{
PartitionedRelPruneInfo *pinfo = lfirst(l2);
+ int i;
/* RT index of the table to which the pinfo belongs. */
pinfo->rtindex += rtoffset;
+
+ /* Also adjust RT indexes of the leaf partitions that might be scanned. */
+ for (i = 0; i < pinfo->nparts; i++)
+ {
+ if (pinfo->rti_map[i] > 0 && pinfo->subplan_map[i] >= 0)
+ {
+ pinfo->rti_map[i] += rtoffset;
+ leafpart_rtis = bms_add_member(leafpart_rtis,
+ pinfo->rti_map[i]);
+ }
+ }
}
}
+ if (pruneinfo->needs_init_pruning)
+ {
+ glob->containsInitialPruning = true;
+
+ /*
+ * Delete the leaf partition RTIs from the global set of relations
+ * to be locked before executing the plan. AcquireExecutorLocks()
+ * will find the ones to add to the set after performing initial
+ * pruning.
+ */
+ glob->minLockRelids = bms_del_members(glob->minLockRelids,
+ leafpart_rtis);
+ }
+
glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
}
+ /*
+ * If we deleted bits from glob->minLockRelids above, make a bms_copy()
+ * of it to trim off any now-empty trailing words, so that the loop over
+ * this set in AcquireExecutorLocks() need not scan useless bit words.
+ */
+ if (glob->containsInitialPruning)
+ glob->minLockRelids = bms_copy(glob->minLockRelids);
+
return result;
}
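
The minLockRelids bookkeeping above amounts to simple bitmapset arithmetic;
a toy illustration with made-up RT indexes:

    /*
     * Suppose the flattened range table has RT indexes 1..6, of which
     * 4..6 are prunable leaf partitions under a node that has initial
     * pruning steps.
     */
    Bitmapset  *minLockRelids = bms_add_range(NULL, 1, 6);
    Bitmapset  *leafpart_rtis = bms_add_range(NULL, 4, 6);

    minLockRelids = bms_del_members(minLockRelids, leafpart_rtis);

    /*
     * AcquireExecutorLocks() now locks only RT indexes 1..3 up front and
     * adds back whichever of 4..6 survive initial pruning.
     */
    minLockRelids = bms_copy(minLockRelids); /* trim empty trailing words */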
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index d77f7d3aef..952c5b8327 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,9 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -234,6 +236,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool needs_init_pruning = false;
+ bool needs_exec_pruning = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -313,12 +317,16 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_needs_init_pruning;
+ bool partrel_needs_exec_pruning;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_needs_init_pruning,
+ &partrel_needs_exec_pruning);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -327,6 +335,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+
+ needs_init_pruning |= partrel_needs_init_pruning;
+ needs_exec_pruning |= partrel_needs_exec_pruning;
}
pfree(relid_subplan_map);
@@ -341,6 +352,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->needs_init_pruning = needs_init_pruning;
+ pruneinfo->needs_exec_pruning = needs_exec_pruning;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -441,13 +454,18 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *needs_init_pruning and *needs_exec_pruning are set to indicate whether
+ * the returned PartitionedRelPruneInfos contain pruning steps that can be
+ * performed before execution begins and ones that must be performed during
+ * execution, respectively.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -458,6 +476,10 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will find out below. */
+ *needs_init_pruning = false;
+ *needs_exec_pruning = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -545,6 +567,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+ * The first pass also determines whether the second pass will be
+ * necessary, by noting the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -619,6 +644,12 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ /* record which types of pruning steps we've seen so far */
+ if (initial_pruning_steps != NIL)
+ *needs_init_pruning = true;
+ if (exec_pruning_steps != NIL)
+ *needs_exec_pruning = true;
+
pinfolist = lappend(pinfolist, pinfo);
}
@@ -646,6 +677,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *subplan_map;
int *subpart_map;
Oid *relid_map;
+ Index *rti_map;
/*
* Construct the subplan and subpart maps for this partitioning level.
@@ -658,6 +690,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
+ rti_map = (Index *) palloc0(nparts * sizeof(Index));
present_parts = NULL;
i = -1;
@@ -672,6 +705,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
+ rti_map[i] = partrel->relid;
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
@@ -696,6 +730,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
pinfo->relid_map = relid_map;
+ pinfo->rti_map = rti_map;
}
pfree(relid_subpart_map);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 6f18b68856..16bda42f11 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1596,6 +1596,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ List *part_prune_result_list;
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -1971,7 +1972,9 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
/*
* Now we can define the portal.
@@ -1986,6 +1989,9 @@ exec_bind_message(StringInfo input_message)
cplan->stmt_list,
cplan);
+ /* Copy PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_result_list);
+
/* Done with the snapshot used for parameter I/O and parsing/planning */
if (snapshot_set)
PopActiveSnapshot();
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5aa5a350f3..8cc2e2162d 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -35,7 +35,7 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
+static void ProcessQuery(PlannedStmt *plan, PartitionPruneResult *part_prune_result,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +65,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +78,8 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->part_prune_result = part_prune_result; /* ExecutorDoInitialPruning()
+ * output for plan */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +125,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * part_prune_result: ExecutorDoInitialPruning() output for the plan tree
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +138,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +150,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, part_prune_result, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -491,8 +496,13 @@ PortalStart(Portal portal, ParamListInfo params,
/*
* Create QueryDesc in portal's context; for the moment, set
* the destination to DestNone.
+ *
+ * There is no PartitionPruneResult unless the PlannedStmt is
+ * from a CachedPlan.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->part_prune_results == NIL ? NULL :
+ linitial(portal->part_prune_results),
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1225,6 +1235,8 @@ PortalRunMulti(Portal portal,
if (pstmt->utilityStmt == NULL)
{
+ PartitionPruneResult *part_prune_result = NULL;
+
/*
* process a plannable query.
*/
@@ -1271,10 +1283,18 @@ PortalRunMulti(Portal portal,
else
UpdateActiveSnapshotCommandId();
+ /*
+ * Determine if there's a corresponding PartitionPruneResult for
+ * this PlannedStmt.
+ */
+ if (portal->part_prune_results != NIL)
+ part_prune_result = list_nth(portal->part_prune_results,
+ foreach_current_index(stmtlist_item));
+
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_result,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1283,7 +1303,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_result,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 0d6a295674..d1c9605979 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -99,14 +99,19 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_result_list);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv);
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_result_list);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_result_list,
+ List **lockedRelids_per_stmt);
+static void ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -790,15 +795,20 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* On a "true" return, we have acquired the locks needed to run the plan.
* (We must do this for the "true" result to be race-condition-free.)
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_result_list.
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_result_list)
{
CachedPlan *plan = plansource->gplan;
/* Assert that caller checked the querytree */
Assert(plansource->is_valid);
+ *part_prune_result_list = NIL;
+
/* If there's no generic plan, just say "false" */
if (!plan)
return false;
@@ -820,13 +830,21 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ List *lockedRelids_per_stmt;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ /*
+ * Lock relations scanned by the plan. This is where the pruning
+ * happens if needed.
+ */
+ AcquireExecutorLocks(plan->stmt_list, boundParams,
+ part_prune_result_list,
+ &lockedRelids_per_stmt);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -848,7 +866,14 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ ReleaseExecutorLocks(plan->stmt_list, lockedRelids_per_stmt);
+
+ /*
+ * The output list and any objects therein were allocated in the
+ * caller's (hopefully short-lived) context, so they will not stay
+ * leaked for long; still, reset the list so that it is not looked
+ * at accidentally.
+ */
+ *part_prune_result_list = NIL;
}
/*
@@ -874,10 +899,15 @@ CheckCachedPlan(CachedPlanSource *plansource)
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
* (unless this is a one-shot plan, in which case we don't copy the plan).
+ *
+ * A list of NULLs is returned in *part_prune_result_list, meaning that no
+ * PartitionPruneResult nodes have yet been created for the plans in
+ * stmt_list.
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv)
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_result_list)
{
CachedPlan *plan;
List *plist;
@@ -1007,6 +1037,17 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
MemoryContextSwitchTo(oldcxt);
+ /*
+ * There are no actual PartitionPruneResults to add yet, but we must
+ * initialize the list to have the same number of elements as the list
+ * of PlannedStmts.
+ */
+ *part_prune_result_list = NIL;
+ foreach(lc, plist)
+ {
+ *part_prune_result_list = lappend(*part_prune_result_list, NULL);
+ }
+
return plan;
}
@@ -1126,6 +1167,17 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
+ * If part_prune_result_list is non-NULL, an element that is either a
+ * PartitionPruneResult or NULL is added to *part_prune_result_list for
+ * every PlannedStmt found in the returned CachedPlan.  The element is a
+ * PartitionPruneResult if the PlannedStmt came from an existing, otherwise
+ * valid CachedPlan and contains at least one PartitionPruneInfo with
+ * "initial" pruning steps.  Those steps are performed by calling
+ * ExecutorDoInitialPruning(), which prunes away subplans that don't match
+ * the pruning conditions, so that AcquireExecutorLocks() locks only the
+ * surviving leaf partitions.  The PartitionPruneResult contains a list of
+ * bitmapsets of the indexes of the matching subplans, one for each
+ * PartitionPruneInfo.
+ *
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
@@ -1139,11 +1191,13 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ List **part_prune_result_list)
{
CachedPlan *plan = NULL;
List *qlist;
bool customplan;
+ List *my_part_prune_result_list;
/* Assert caller is doing things in a sane order */
Assert(plansource->magic == CACHEDPLANSOURCE_MAGIC);
@@ -1160,7 +1214,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (CheckCachedPlan(plansource, boundParams,
+ &my_part_prune_result_list))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1169,7 +1224,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
else
{
/* Build a new generic plan */
- plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv,
+ &my_part_prune_result_list);
/* Just make real sure plansource->gplan is clear */
ReleaseGenericPlan(plansource);
/* Link the new generic plan into the plansource */
@@ -1214,7 +1270,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (customplan)
{
/* Build a custom plan */
- plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv,
+ &my_part_prune_result_list);
/* Accumulate total costs of custom plans */
plansource->total_custom_cost += cached_plan_cost(plan, true);
@@ -1246,6 +1303,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
plan->is_saved = true;
}
+ if (part_prune_result_list)
+ *part_prune_result_list = my_part_prune_result_list;
+
return plan;
}
@@ -1737,17 +1797,29 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_result_list.
+ *
+ * On return, *lockedRelids_per_stmt will contain a bitmapset for every
+ * PlannedStmt in stmt_list, containing the RT indexes of relation entries
+ * in its range table that were actually locked, or NULL if the PlannedStmt
+ * contains a utility statement.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_result_list,
+ List **lockedRelids_per_stmt)
{
ListCell *lc1;
+ *part_prune_result_list = *lockedRelids_per_stmt = NIL;
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ PartitionPruneResult *part_prune_result = NULL;
+ Bitmapset *allLockRelids;
+ Bitmapset *lockedRelids = NULL;
+ int rti;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1761,13 +1833,38 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
if (query)
- ScanQueryForLocks(query, acquire);
+ ScanQueryForLocks(query, true);
+ *part_prune_result_list = lappend(*part_prune_result_list, NULL);
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ /*
+ * Figure out the set of relations that would need to be locked
+ * before executing the plan.
+ */
+ if (plannedstmt->containsInitialPruning)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ /*
+ * Obtain the set of leaf partitions to be locked.
+ *
+ * The following does initial partition pruning using the
+ * PartitionPruneInfos found in plannedstmt->partPruneInfos and
+ * finds leaf partitions that survive that pruning across all the
+ * nodes in the plan tree.
+ */
+ part_prune_result =
+ ExecutorDoInitialPruning(plannedstmt, boundParams);
+
+ allLockRelids = bms_union(plannedstmt->minLockRelids,
+ part_prune_result->scan_leafpart_rtis);
+ }
+ else
+ allLockRelids = plannedstmt->minLockRelids;
+
+ rti = -1;
+ while ((rti = bms_next_member(allLockRelids, rti)) > 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
if (rte->rtekind != RTE_RELATION)
continue;
@@ -1778,10 +1875,58 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
* fail if it's been dropped entirely --- we'll just transiently
* acquire a non-conflicting lock.
*/
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
+ LockRelationOid(rte->relid, rte->rellockmode);
+ lockedRelids = bms_add_member(lockedRelids, rti);
+ }
+
+ *part_prune_result_list = lappend(*part_prune_result_list,
+ part_prune_result);
+ *lockedRelids_per_stmt = lappend(*lockedRelids_per_stmt, lockedRelids);
+ }
+}
+
+/*
+ * ReleaseExecutorLocks
+ * Release locks that would've been acquired by an earlier call to
+ * AcquireExecutorLocks()
+ */
+static void
+ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt)
+{
+ ListCell *lc1,
+ *lc2;
+
+ forboth(lc1, stmt_list, lc2, lockedRelids_per_stmt)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ Bitmapset *lockedRelids = lfirst(lc2);
+ int rti;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /*
+ * Ignore utility statements, except those (such as EXPLAIN) that
+ * contain a parsed-but-not-planned query. Note: it's okay to use
+ * ScanQueryForLocks, even though the query hasn't been through
+ * rule rewriting, because rewriting doesn't change the query
+ * representation.
+ */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ if (query)
+ ScanQueryForLocks(query, false);
+ continue;
+ }
+
+ rti = -1;
+ while ((rti = bms_next_member(lockedRelids, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ Assert(rte->rtekind == RTE_RELATION);
+
+ /* See the comment in AcquireExecutorLocks(). */
+ UnlockRelationOid(rte->relid, rte->rellockmode);
}
}
}
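
In outline, CheckCachedPlan()'s locking dance now looks like this (a
simplified sketch of the code above, with the transient-plan check and
error paths omitted):

    List   *pprs;
    List   *locked;

    AcquireExecutorLocks(plan->stmt_list, boundParams, &pprs, &locked);
    if (plan->is_valid)
    {
        *part_prune_result_list = pprs;     /* keep the locks */
        return true;
    }

    /* Race: some relation changed underneath us; undo what we took. */
    ReleaseExecutorLocks(plan->stmt_list, locked);
    *part_prune_result_list = NIL;          /* pruning results now stale */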
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index d549f66d4a..1bbe6b704b 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -303,6 +303,25 @@ PortalDefineQuery(Portal portal,
portal->status = PORTAL_DEFINED;
}
+/*
+ * PortalStorePartitionPruneResults
+ * Copy the given list of PartitionPruneResults into the portal's
+ * context
+ *
+ * This allows the caller to ensure that the list exists as long as the portal
+ * does.
+ */
+void
+PortalStorePartitionPruneResults(Portal portal, List *part_prune_results)
+{
+ MemoryContext oldcxt;
+
+ AssertArg(PortalIsValid(portal));
+ oldcxt = MemoryContextSwitchTo(portal->portalContext);
+ portal->part_prune_results = copyObject(part_prune_results);
+ MemoryContextSwitchTo(oldcxt);
+}
+
/*
* PortalReleaseCachedPlan
* Release a portal's reference to its cached plan, if any.
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 9ebde089ae..e57e133f0e 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,9 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index bf962af7af..bd8776402e 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -45,6 +45,7 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
+ * rti_map Range table index by partition index, or 0.
* present_parts A Bitmapset of the partition indexes that we
* have subplans or subparts for.
* initial_pruning_steps List of PartitionPruneSteps used to
@@ -61,6 +62,7 @@ typedef struct PartitionedRelPruningData
int nparts;
int *subplan_map;
int *subpart_map;
+ Index *rti_map;
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
@@ -126,5 +128,10 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int part_prune_index,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis);
+extern Bitmapset *ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index e79e2c001f..60d5644908 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,8 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ PartitionPruneResult *part_prune_result; /* ExecutorDoInitialPruning()'s
+ * output for plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +59,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index d68a6b9d28..5c4a282be0 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -185,6 +185,8 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
+extern PartitionPruneResult *ExecutorDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params);
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 63a89474db..12ea06c2f6 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1001,6 +1001,33 @@ typedef struct DomainConstraintState
*/
typedef TupleTableSlot *(*ExecProcNodeMtd) (struct PlanState *pstate);
+/*----------------
+ * PartitionPruneResult
+ *
+ * The result of an ExecutorDoInitialPruning() invocation on a given
+ * PlannedStmt.
+ *
+ * Contains a list of Bitmapsets of the indexes of the subplans remaining after
+ * performing initial pruning by calling ExecFindMatchingSubPlans() for every
+ * PartitionPruneInfo found in PlannedStmt.partPruneInfos. RT indexes of the
+ * leaf partitions scanned by those subplans across all PartitionPruneInfos
+ * are added into scan_leafpart_rtis.
+ *
+ * This is used by GetCachedPlan() to inform its callers of the pruning
+ * decisions made when performing AcquireExecutorLocks() on a given cached
+ * PlannedStmt, which the callers then pass on to the executor. The executor
+ * refers to this node when initializing the plan nodes which contain subplans
+ * that may have been pruned by ExecutorDoInitialPruning(), rather than
+ * redoing initial pruning.
+ */
+typedef struct PartitionPruneResult
+{
+ NodeTag type;
+
+ List *valid_subplan_offs_list;
+ Bitmapset *scan_leafpart_rtis;
+} PartitionPruneResult;
+
/* ----------------
* PlanState node
*
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index cdd6debfa0..b33d9e426d 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -212,6 +212,7 @@ extern struct Bitmapset *readBitmapset(void);
extern uintptr_t readDatum(bool typbyval);
extern bool *readBoolCols(int numCols);
extern int *readIntCols(int numCols);
+extern Index *readIndexCols(int numCols);
extern Oid *readOidCols(int numCols);
extern int16 *readAttrNumberCols(int numCols);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index d87957ff6c..7957aeb6d7 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -125,6 +125,19 @@ typedef struct PlannerGlobal
/* List of PartitionPruneInfo contained in the plan */
List *partPruneInfos;
+ /*
+ * Do any of those PartitionPruneInfos have initial (pre-exec) pruning
+ * steps in them?
+ */
+ bool containsInitialPruning;
+
+ /*
+ * Indexes of all range table entries minus indexes of range table entries
+ * of the leaf partitions scanned by prunable subplans; see
+ * AcquireExecutorLocks()
+ */
+ Bitmapset *minLockRelids;
+
/* OIDs of relations the plan depends on */
List *relationOids;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index f2daabb3b7..1d2c0d9bdf 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -72,8 +72,17 @@ typedef struct PlannedStmt
List *partPruneInfos; /* List of PartitionPruneInfo contained in
* the plan */
+ bool containsInitialPruning; /* Do any of those PartitionPruneInfos
+ * have initial (pre-exec) pruning
+ * steps in them? */
+
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *minLockRelids; /* Indexes of all range table entries minus
+ * indexes of range table entries of the leaf
+ * partitions scanned by prunable subplans;
+ * see AcquireExecutorLocks() */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
List *resultRelations; /* integer list of RT indexes, or NIL */
@@ -1409,6 +1418,13 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * needs_init_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its initial_pruning_steps set?
+ *
+ * needs_exec_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its exec_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1419,6 +1435,8 @@ typedef struct PartitionPruneInfo
NodeTag type;
List *prune_infos;
+ bool needs_init_pruning;
+ bool needs_exec_pruning;
Bitmapset *other_subplans;
} PartitionPruneInfo;
@@ -1463,6 +1481,9 @@ typedef struct PartitionedRelPruneInfo
/* relation OID by partition index, or 0 */
Oid *relid_map pg_node_attr(array_size(nparts));
+ /* Range table index by partition index, or 0. */
+ Index *rti_map pg_node_attr(array_size(nparts));
+
/*
* initial_pruning_steps shows how to prune during executor startup (i.e.,
* without use of any PARAM_EXEC Params); it is NIL if no startup pruning
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 0499635f59..1c5bb5ece1 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -220,7 +220,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ List **part_prune_result_list);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aeddbdafe5..9f7727a837 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,7 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *part_prune_results; /* list of PartitionPruneResults */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -242,6 +243,8 @@ extern void PortalDefineQuery(Portal portal,
CommandTag commandTag,
List *stmts,
CachedPlan *cplan);
+extern void PortalStorePartitionPruneResults(Portal portal,
+ List *part_prune_result_list);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
--
2.35.3
On Wed, Jul 13, 2022 at 4:03 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Wed, Jul 13, 2022 at 3:40 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > Rebased over 964d01ae90c.
>
> Sorry, left some pointless hunks in there while rebasing. Fixed in
> the attached.

Needed to be rebased again, over 2d04277121f this time.
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v20-0001-Move-PartitioPruneInfo-out-of-plan-nodes-into-Pl.patch (application/octet-stream)
From 8de25528e8f388beffdab3d7c9905712e2f8eeef Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Fri, 27 May 2022 16:00:28 +0900
Subject: [PATCH v20 1/2] Move PartitionPruneInfo out of plan nodes into
PlannedStmt
The planner will now add a given PartitionPruneInfo to
PlannedStmt.partPruneInfos instead of directly to the
Append/MergeAppend plan node. What gets set in the latter
instead is an index field that points to the element of
PlannedStmt.partPruneInfos containing the PartitionPruneInfo
belonging to the plan node.
A later commit will make AcquireExecutorLocks() do the initial
partition pruning to determine a minimal set of partitions to be
locked when validating a plan tree, and it will need to consult the
PartitionPruneInfos referenced therein to do so. It is better for
the PartitionPruneInfos to be directly accessible, by simply
iterating over PlannedStmt.partPruneInfos, than to require a walk
of the plan tree to find them.
---
src/backend/executor/execMain.c | 1 +
src/backend/executor/execParallel.c | 1 +
src/backend/executor/execPartition.c | 4 +-
src/backend/executor/execUtils.c | 2 +
src/backend/executor/nodeAppend.c | 4 +-
src/backend/executor/nodeMergeAppend.c | 4 +-
src/backend/optimizer/plan/createplan.c | 24 ++++-----
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 65 +++++++++++++------------
src/backend/partitioning/partprune.c | 18 ++++---
src/include/executor/execPartition.h | 3 +-
src/include/nodes/execnodes.h | 2 +
src/include/nodes/pathnodes.h | 6 +++
src/include/nodes/plannodes.h | 11 +++--
src/include/partitioning/partprune.h | 8 +--
15 files changed, 92 insertions(+), 62 deletions(-)
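To make the new indirection concrete, here is a minimal sketch of the lookup it enables; this is not patch text, just a simplification of the ExecInitAppend() and ExecInitPartitionPruning() hunks below, assuming node is the Append being initialized and estate is its EState:

    /* Sketch only: the node now stores an index, not the pruneinfo itself. */
    if (node->part_prune_index >= 0)
    {
        /* InitPlan() copied PlannedStmt.partPruneInfos into the EState. */
        PartitionPruneInfo *pruneinfo =
            list_nth(estate->es_part_prune_infos, node->part_prune_index);

        /* ... build the PartitionPruneState from pruneinfo as before ... */
    }
    else
    {
        /* part_prune_index == -1 means no run-time pruning for this node */
    }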
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ef2fd46092..72fc273524 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -825,6 +825,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable);
estate->es_plannedstmt = plannedstmt;
+ estate->es_part_prune_infos = plannedstmt->partPruneInfos;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index f1fd7f7e8b..f73b8c2607 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -183,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
pstmt->planTree = plan;
+ pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
pstmt->resultRelations = NIL;
pstmt->appendRelations = NIL;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index e03ea27299..b55cdd2580 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1638,11 +1638,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
PartitionPruneState *
ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
Bitmapset **initially_valid_subplans)
{
PartitionPruneState *prunestate;
EState *estate = planstate->state;
+ PartitionPruneInfo *pruneinfo = list_nth(estate->es_part_prune_infos,
+ part_prune_index);
/* We may need an expression context to evaluate partition exprs */
ExecAssignExprContext(estate, planstate);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9df1f81ea8..f9c7976ff2 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -119,6 +119,8 @@ CreateExecutorState(void)
estate->es_relations = NULL;
estate->es_rowmarks = NULL;
estate->es_plannedstmt = NULL;
+ estate->es_part_prune_infos = NIL;
+ estate->es_part_prune_result = NULL;
estate->es_junkFilter = NULL;
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 357e10a1d7..c6f86a6510 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -134,7 +134,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_begun = false;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -145,7 +145,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&appendstate->ps,
list_length(node->appendplans),
- node->part_prune_info,
+ node->part_prune_index,
&validsubplans);
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index c5c62fa5c7..8d35860c30 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -82,7 +82,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.ExecProcNode = ExecMergeAppend;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -93,7 +93,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&mergestate->ps,
list_length(node->mergeplans),
- node->part_prune_info,
+ node->part_prune_index,
&validsubplans);
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index e37f2933eb..fd8ab4a167 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1203,7 +1203,6 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
ListCell *subpaths;
int nasyncplans = 0;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
int nodenumsortkeys = 0;
AttrNumber *nodeSortColIdx = NULL;
Oid *nodeSortOperators = NULL;
@@ -1354,6 +1353,9 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ plan->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1377,16 +1379,14 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
}
if (prunequal != NIL)
- partpruneinfo =
- make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ plan->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
plan->appendplans = subplans;
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
- plan->part_prune_info = partpruneinfo;
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -1425,7 +1425,6 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
List *subplans = NIL;
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1518,6 +1517,9 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ node->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1541,13 +1543,13 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
}
if (prunequal != NIL)
- partpruneinfo = make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ node->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
node->mergeplans = subplans;
- node->part_prune_info = partpruneinfo;
+
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 06ad856eac..b11249ed8f 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -518,6 +518,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->dependsOnRole = glob->dependsOnRole;
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
+ result->partPruneInfos = glob->partPruneInfos;
result->rtable = glob->finalrtable;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 1cb0abdbc1..720f20f563 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -348,6 +348,29 @@ set_plan_references(PlannerInfo *root, Plan *plan)
}
}
+ /* Also fix up the information in PartitionPruneInfos. */
+ foreach (lc, root->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ ListCell *l;
+
+ foreach(l, pruneinfo->prune_infos)
+ {
+ List *prune_infos = lfirst(l);
+ ListCell *l2;
+
+ foreach(l2, prune_infos)
+ {
+ PartitionedRelPruneInfo *pinfo = lfirst(l2);
+
+ /* RT index of the table to which the pinfo belongs. */
+ pinfo->rtindex += rtoffset;
+ }
+ }
+
+ glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
+ }
+
return result;
}
@@ -1658,21 +1681,12 @@ set_append_references(PlannerInfo *root,
aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
- if (aplan->part_prune_info)
- {
- foreach(l, aplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * PartitionPruneInfos will be added to a list in PlannerGlobal, so update
+ * the index.
+ */
+ if (aplan->part_prune_index >= 0)
+ aplan->part_prune_index += list_length(root->glob->partPruneInfos);
/* We don't need to recurse to lefttree or righttree ... */
Assert(aplan->plan.lefttree == NULL);
@@ -1734,21 +1748,12 @@ set_mergeappend_references(PlannerInfo *root,
mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
- if (mplan->part_prune_info)
- {
- foreach(l, mplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * PartitionPruneInfos will be added to a list in PlannerGlobal, so update
+ * the index.
+ */
+ if (mplan->part_prune_index >= 0)
+ mplan->part_prune_index += list_length(root->glob->partPruneInfos);
/* We don't need to recurse to lefttree or righttree ... */
Assert(mplan->plan.lefttree == NULL);
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 9d3c05aed3..d77f7d3aef 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -209,16 +209,20 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
/*
* make_partition_pruneinfo
- * Builds a PartitionPruneInfo which can be used in the executor to allow
- * additional partition pruning to take place. Returns NULL when
- * partition pruning would be useless.
+ * Checks if the given set of quals can be used to build pruning steps
+ * that the executor can use to prune away unneeded partitions. If
+ * suitable quals are found then a PartitionPruneInfo is built and appended
+ * to the PlannerInfo's partPruneInfos list.
+ *
+ * The return value is the 0-based index of the item added to the
+ * partPruneInfos list or -1 if nothing was added.
*
* 'parentrel' is the RelOptInfo for an appendrel, and 'subpaths' is the list
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
*/
-PartitionPruneInfo *
+int
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
List *prunequal)
@@ -332,7 +336,7 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* quals, then we can just not bother with run-time pruning.
*/
if (prunerelinfos == NIL)
- return NULL;
+ return -1;
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
@@ -358,7 +362,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
else
pruneinfo->other_subplans = NULL;
- return pruneinfo;
+ root->partPruneInfos = lappend(root->partPruneInfos, pruneinfo);
+
+ return list_length(root->partPruneInfos) - 1;
}
/*
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 708435e952..bf962af7af 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -123,9 +123,8 @@ typedef struct PartitionPruneState
extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
bool initial_prune);
-
#endif /* EXECPARTITION_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 01b1727fc0..63a89474db 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -611,6 +611,8 @@ typedef struct EState
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
+ struct PartitionPruneResult *es_part_prune_result; /* QueryDesc.part_prune_result */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index e2081db4ed..a4e6b4db92 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -122,6 +122,9 @@ typedef struct PlannerGlobal
/* "flat" list of AppendRelInfos */
List *appendRelations;
+ /* List of PartitionPruneInfo contained in the plan */
+ List *partPruneInfos;
+
/* OIDs of relations the plan depends on */
List *relationOids;
@@ -488,6 +491,9 @@ struct PlannerInfo
/* Does this query modify any partition key columns? */
bool partColsUpdated;
+
+ /* PartitionPruneInfos added in this query's plan. */
+ List *partPruneInfos;
};
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index dca2a21e7a..f2daabb3b7 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -69,6 +69,9 @@ typedef struct PlannedStmt
struct Plan *planTree; /* tree of Plan nodes */
+ List *partPruneInfos; /* List of PartitionPruneInfo contained in
+ * the plan */
+
List *rtable; /* list of RangeTblEntry nodes */
/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
@@ -269,8 +272,8 @@ typedef struct Append
*/
int first_partial_plan;
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} Append;
/* ----------------
@@ -304,8 +307,8 @@ typedef struct MergeAppend
/* NULLS FIRST/LAST directions */
bool *nullsFirst pg_node_attr(array_size(numCols));
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} MergeAppend;
/* ----------------
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 90684efa25..ebf0dcff8c 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -70,10 +70,10 @@ typedef struct PartitionPruneContext
#define PruneCxtStateIdx(partnatts, step_id, keyno) \
((partnatts) * (step_id) + (keyno))
-extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
- struct RelOptInfo *parentrel,
- List *subpaths,
- List *prunequal);
+extern int make_partition_pruneinfo(struct PlannerInfo *root,
+ struct RelOptInfo *parentrel,
+ List *subpaths,
+ List *prunequal);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
--
2.35.3
v20-0002-Optimize-AcquireExecutorLocks-by-locking-only-un.patch (application/octet-stream)
From 7a1454c6a1ecde5c871bec5a4d646da4e41a62c3 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v20 2/2] Optimize AcquireExecutorLocks() by locking only
unpruned partitions
This commit teaches AcquireExecutorLocks() to perform initial
partition pruning, notionally eliminating the subnodes of a generic
cached plan that need not be initialized during the plan's actual
execution, and to skip locking the partitions scanned by those
subnodes.
The result of performing initial partition pruning this way, before
the actual execution has started, is passed to the executor as a
PartitionPruneResult, supplied along with the PlannedStmt by those
callers of the executor that used plancache.c to get the plan. It is
NULL when the plan was obtained by calling the planner directly, or
when the plan obtained from plancache.c is not a generic one.
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 2 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/prepare.c | 26 ++-
src/backend/executor/README | 32 ++++
src/backend/executor/execMain.c | 53 ++++++
src/backend/executor/execParallel.c | 27 ++-
src/backend/executor/execPartition.c | 234 +++++++++++++++++++++----
src/backend/executor/functions.c | 2 +-
src/backend/executor/nodeAppend.c | 11 +-
src/backend/executor/nodeMergeAppend.c | 5 +-
src/backend/executor/spi.c | 27 ++-
src/backend/nodes/readfuncs.c | 8 +-
src/backend/optimizer/plan/planner.c | 2 +
src/backend/optimizer/plan/setrefs.c | 46 +++++
src/backend/partitioning/partprune.c | 41 ++++-
src/backend/tcop/postgres.c | 8 +-
src/backend/tcop/pquery.c | 28 ++-
src/backend/utils/cache/plancache.c | 187 +++++++++++++++++---
src/backend/utils/mmgr/portalmem.c | 19 ++
src/include/commands/explain.h | 4 +-
src/include/executor/execPartition.h | 9 +-
src/include/executor/execdesc.h | 3 +
src/include/executor/executor.h | 2 +
src/include/nodes/execnodes.h | 27 +++
src/include/nodes/nodes.h | 1 +
src/include/nodes/pathnodes.h | 13 ++
src/include/nodes/plannodes.h | 21 +++
src/include/utils/plancache.h | 3 +-
src/include/utils/portal.h | 3 +
32 files changed, 759 insertions(+), 98 deletions(-)
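Pieced together from the prepare.c and pquery.c hunks below, the intended caller-side flow looks roughly like this; a simplified sketch only, assuming plansource, paramLI, query_string, and dest are in scope as in ExecuteQuery(), with portal setup and error handling omitted:

    List *part_prune_result_list;
    CachedPlan *cplan;
    ListCell *lc1, *lc2;

    /*
     * GetCachedPlan() now also returns one PartitionPruneResult (possibly
     * NULL) per PlannedStmt in cplan->stmt_list; for a generic plan it is
     * the pruning outcome already computed by AcquireExecutorLocks().
     */
    cplan = GetCachedPlan(plansource, paramLI, NULL, NULL,
                          &part_prune_result_list);

    forboth(lc1, cplan->stmt_list, lc2, part_prune_result_list)
    {
        PlannedStmt *pstmt = lfirst_node(PlannedStmt, lc1);
        PartitionPruneResult *ppr = lfirst_node(PartitionPruneResult, lc2);

        /*
         * The result rides along in the QueryDesc so that the executor
         * reuses it instead of redoing initial pruning.
         */
        QueryDesc *qdesc = CreateQueryDesc(pstmt, ppr, query_string,
                                           GetActiveSnapshot(), InvalidSnapshot,
                                           dest, paramLI, NULL, 0);
        /* ... ExecutorStart(qdesc, 0), ExecutorRun(), ExecutorEnd() ... */
    }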
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index fca29a9a10..d839517693 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -541,7 +541,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 9abbb6b555..f6607f2454 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index e29c2ae206..e41b13a3ea 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -407,7 +407,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
@@ -515,7 +515,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, PartitionPruneResult *part_prune_result,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
@@ -563,7 +564,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, part_prune_result, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 6b6720c690..374c0ff807 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -776,7 +776,7 @@ execute_sql_string(const char *sql)
{
QueryDesc *qdesc;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 9ac0383459..b0ed96e56c 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -408,7 +408,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 579825c159..b6285958bc 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -155,6 +155,7 @@ ExecuteQuery(ParseState *pstate,
PreparedStatement *entry;
CachedPlan *cplan;
List *plan_list;
+ List *part_prune_result_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
Portal portal;
@@ -193,7 +194,10 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
plan_list = cplan->stmt_list;
/*
@@ -207,6 +211,9 @@ ExecuteQuery(ParseState *pstate,
plan_list,
cplan);
+ /* Copy PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_result_list);
+
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
* statement is one that produces tuples. Currently we insist that it be
@@ -576,7 +583,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
- ListCell *p;
+ List *part_prune_result_list;
+ ListCell *p,
+ *pp;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -619,7 +628,10 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Replan if needed, and acquire a transient refcount */
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, queryEnv);
+ CurrentResourceOwner, queryEnv,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -634,13 +646,15 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
plan_list = cplan->stmt_list;
/* Explain each query */
- foreach(p, plan_list)
+ forboth(p, plan_list, pp, part_prune_result_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ PartitionPruneResult *part_prune_result = lfirst_node(PartitionPruneResult, pp);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ ExplainOnePlan(pstmt, part_prune_result, into, es, query_string,
+ paramLI, queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 0b5183fc4a..953a476ea5 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -65,6 +65,34 @@ found there. This currently only occurs for Append and MergeAppend nodes. In
this case the non-required subplans are ignored and the executor state's
subnode array will become out of sequence to the plan's subplan list.
+Execution-time pruning may in fact also occur before execution has started.
+One case where that happens is when a cached generic plan is validated for
+execution by plancache.c's GetCachedPlan(), which works by locking all the
+relations that the plan will scan. If the generic plan contains nodes that
+can perform execution-time partition pruning (that is, nodes containing a
+PartitionPruneInfo), the subset of pruning steps in a given node's
+PartitionPruneInfo that do not depend on execution actually having started
+(the "initial" pruning steps) are performed at this point to figure out the
+minimal set of child subplans that satisfy those steps.
+AcquireExecutorLocks(), looking at a given plan tree, will then lock only
+the relations scanned by the child subplans that survived such pruning,
+along with those present in PlannedStmt.minLockRelids. Note that the
+subplans are only notionally pruned, in that they are not removed from the
+plan tree as such.
+
+To prevent the executor and any third party execution code that can look at
+the plan tree from trying to execute the subplans that were pruned as
+described above, the result of pruning is passed to the executor as a
+PartitionPruneResult node via the QueryDesc. It consists of the set of
+indexes of surviving subplans in their respective parent plan node's list of
+child subplans, saved as a list of bitmapsets, with one element for every
+parent plan node whose PartitionPruneInfo is present in
+PlannedStmt.partPruneInfos. This means that the executor must not
+re-evaluate the set of initially valid subplans by redoing the initial pruning
+if it was already done by AcquireExecutorLocks(), because the re-evaluation may
+very well end up resulting in a different set of subplans, containing some
+whose relations were not locked by AcquireExecutorLocks().
+
Each Plan node may have expression trees associated with it, to represent
its target list, qualification conditions, etc. These trees are also
read-only to the executor, but the executor state for expression evaluation
@@ -286,6 +314,10 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ [ ExecutorDoInitialPruning ] --- an optional step to perform initial
+ partition pruning on the plan tree, the result of which is passed
+ to the executor via the QueryDesc
+
CreateQueryDesc
ExecutorStart
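Spelled out as code, the optional step added to the control flow above amounts to something like the following; a sketch under the names this patch introduces, not an excerpt, assuming plannedstmt, params, query_string, and dest are in scope:

    QueryDesc *queryDesc;
    PartitionPruneResult *ppr = NULL;

    /*
     * Worth doing only if the plan has initial pruning steps at all; for
     * a cached generic plan this happens inside AcquireExecutorLocks().
     */
    if (plannedstmt->containsInitialPruning)
        ppr = ExecutorDoInitialPruning(plannedstmt, params);

    /*
     * ppr->valid_subplan_offs_list has one bitmapset per entry in
     * plannedstmt->partPruneInfos; ExecInitPartitionPruning() later
     * fetches its node's set with list_nth(ppr->valid_subplan_offs_list,
     * part_prune_index) rather than re-running the initial pruning steps.
     */
    queryDesc = CreateQueryDesc(plannedstmt, ppr, query_string,
                                GetActiveSnapshot(), InvalidSnapshot,
                                dest, params, NULL, 0);
    ExecutorStart(queryDesc, 0);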
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 72fc273524..45824624f8 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -49,6 +49,7 @@
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/execdebug.h"
+#include "executor/execPartition.h"
#include "executor/nodeSubplan.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
@@ -104,6 +105,56 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
/* end of local decls */
+/* ----------------------------------------------------------------
+ * ExecutorDoInitialPruning
+ *
+ * For each plan tree node that has been assigned a PartitionPruneInfo,
+ * this performs initial partition pruning using the information contained
+ * therein to determine the set of child subplans that satisfy the initial
+ * pruning steps, to be returned as a bitmapset of their indexes in the
+ * node's list of child subplans (for example, an Append's appendplans).
+ *
+ * Return value is a PartitionPruneResult node that contains a list of those
+ * bitmapsets, with one element for every PartitionPruneInfo, and a bitmapset
+ * of the RT indexes of all the leaf partitions scanned by those chosen
+ * subplans. Note that the latter is shared across all PartitionPruneInfos.
+ *
+ * The executor must see exactly the same set of subplans as valid for
+ * execution when doing ExecInitNode() on the plan nodes whose
+ * PartitionPruneInfos are processed here. So, it must get the set from the
+ * aforementioned PartitionPruneResult, instead of computing it all over
+ * again by redoing the initial pruning. It's the caller's job to pass the
+ * PartitionPruneResult to the executor.
+ *
+ * Note: Partitioned tables mentioned in PartitionedRelPruneInfo nodes that
+ * drive the pruning will be locked before doing the pruning.
+ * ----------------------------------------------------------------
+ */
+PartitionPruneResult *
+ExecutorDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params)
+{
+ PartitionPruneResult *result;
+ ListCell *lc;
+
+ /* Only get here if there is any pruning to do. */
+ Assert(plannedstmt->containsInitialPruning);
+
+ result = makeNode(PartitionPruneResult);
+ foreach(lc, plannedstmt->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ Bitmapset *valid_subplan_offs;
+
+ valid_subplan_offs =
+ ExecPartitionDoInitialPruning(plannedstmt, params, pruneinfo,
+ &result->scan_leafpart_rtis);
+ result->valid_subplan_offs_list =
+ lappend(result->valid_subplan_offs_list,
+ valid_subplan_offs);
+ }
+
+ return result;
+}
/* ----------------------------------------------------------------
* ExecutorStart
@@ -806,6 +857,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ PartitionPruneResult *part_prune_result = queryDesc->part_prune_result;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -826,6 +878,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_plannedstmt = plannedstmt;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ estate->es_part_prune_result = part_prune_result;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index f73b8c2607..7e6dab5623 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -66,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITIONPRUNERESULT UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -182,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->containsInitialPruning = false;
pstmt->planTree = plan;
pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
@@ -597,12 +599,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_result_data;
+ char *part_prune_result_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_result_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -631,6 +636,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_result_data = nodeToString(estate->es_part_prune_result);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -657,6 +663,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized PartitionPruneResult. */
+ part_prune_result_len = strlen(part_prune_result_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_result_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -751,6 +762,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized PartitionPruneResult */
+ part_prune_result_space = shm_toc_allocate(pcxt->toc, part_prune_result_len);
+ memcpy(part_prune_result_space, part_prune_result_data, part_prune_result_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITIONPRUNERESULT,
+ part_prune_result_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1232,8 +1249,10 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_result_space;
char *paramspace;
PlannedStmt *pstmt;
+ PartitionPruneResult *part_prune_result;
ParamListInfo paramLI;
char *queryString;
@@ -1244,12 +1263,18 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
pstmtspace = shm_toc_lookup(toc, PARALLEL_KEY_PLANNEDSTMT, false);
pstmt = (PlannedStmt *) stringToNode(pstmtspace);
+ /* Reconstruct leader-supplied PartitionPruneResult. */
+ part_prune_result_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITIONPRUNERESULT, false);
+ part_prune_result = (PartitionPruneResult *)
+ stringToNode(part_prune_result_space);
+
/* Reconstruct ParamListInfo. */
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
/* Create a QueryDesc for the query. */
- return CreateQueryDesc(pstmt,
+ return CreateQueryDesc(pstmt, part_prune_result,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index b55cdd2580..24e6f6e988 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -25,6 +25,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -185,7 +186,11 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *pruneinfo);
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -198,7 +203,8 @@ static void PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans);
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis);
/*
@@ -1593,8 +1599,10 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* considered to be a stable expression, it can change value from one plan
* node scan to the next during query execution. Stable comparison
* expressions that don't involve such Params allow partition pruning to be
- * done once during executor startup. Expressions that do involve such Params
- * require us to prune separately for each scan of the parent plan node.
+ * done once during executor startup, or during ExecutorDoInitialPruning(),
+ * which runs as part of AcquireExecutorLocks() on a given plan tree.
+ * Expressions that do involve such Params require us to prune separately for
+ * each scan of the parent plan node.
*
* Note that pruning away unneeded subplans during executor startup has the
* added benefit of not having to initialize the unneeded subplans at all.
@@ -1611,6 +1619,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* account for initial pruning possibly having eliminated some of the
* subplans.
*
+ * ExecPartitionDoInitialPruning:
+ * Do initial pruning with the information contained in a given
+ * PartitionPruneInfo to determine the minimal set of child subplans
+ * of the owning parent plan node that must be executed, and also the
+ * set of RT indexes of the leaf partitions that those subplans will
+ * scan.
+ *
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating the expressions
* that are safe to evaluate at a given point. This function is first
@@ -1628,8 +1643,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* On return, *initially_valid_subplans is assigned the set of indexes of
* child subplans that must be initialized along with the parent plan node.
- * Initial pruning is performed here if needed and in that case only the
- * surviving subplans' indexes are added.
+ * Initial pruning is performed here if needed (unless it has already been done
+ * by ExecutorDoInitialPruning()), and in that case only the surviving
+ * subplans' indexes are added.
*
* If subplans are indeed pruned, subplan_map arrays contained in the returned
* PartitionPruneState are re-sequenced to not count those, though only if the
@@ -1645,24 +1661,59 @@ ExecInitPartitionPruning(PlanState *planstate,
EState *estate = planstate->state;
PartitionPruneInfo *pruneinfo = list_nth(estate->es_part_prune_infos,
part_prune_index);
+ PartitionPruneResult *pruneresult = estate->es_part_prune_result;
+ bool do_pruning = (pruneinfo->needs_init_pruning ||
+ pruneinfo->needs_exec_pruning);
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, planstate);
+ /*
+ * No need to do initial pruning if it was already done by
+ * ExecutorDoInitialPruning(), which is the case if es_part_prune_result
+ * has been set.
+ */
+ if (pruneresult)
+ do_pruning = pruneinfo->needs_exec_pruning;
- /* Create the working data structure for pruning */
- prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ prunestate = NULL;
+ if (do_pruning)
+ {
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
+
+ /* For data reading, executor always omits detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
+ /*
+ * Create the working data structure for pruning. No need to consider
+ * initial pruning steps if we have a PartitionPruneResult.
+ */
+ prunestate = CreatePartitionPruneState(planstate, pruneinfo,
+ pruneresult == NULL, true,
+ NIL, planstate->ps_ExprContext,
+ estate->es_partition_directory);
+ }
/*
* Perform an initial partition prune pass, if required.
*/
- if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ if (pruneresult)
+ {
+ *initially_valid_subplans =
+ list_nth(pruneresult->valid_subplan_offs_list, part_prune_index);
+ }
+ else if (prunestate && prunestate->do_initial_prune)
+ {
+ *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true,
+ NULL);
+ }
else
{
/* No pruning, so we'll need to initialize all subplans */
Assert(n_total_subplans > 0);
*initially_valid_subplans = bms_add_range(NULL, 0,
n_total_subplans - 1);
+ return prunestate;
}
/*
@@ -1670,7 +1721,8 @@ ExecInitPartitionPruning(PlanState *planstate,
* that were removed above due to initial pruning. No need to do this if
* no steps were removed.
*/
- if (bms_num_members(*initially_valid_subplans) < n_total_subplans)
+ if (prunestate &&
+ bms_num_members(*initially_valid_subplans) < n_total_subplans)
{
/*
* We can safely skip this when !do_exec_prune, even though that
@@ -1686,11 +1738,73 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecPartitionDoInitialPruning
+ * Perform initial pruning using the given PartitionPruneInfo to determine
+ * the minimal set of child subplans that will be executed and also the
+ * set of RT indexes of the leaf partitions scanned by those subplans.
+ */
+Bitmapset *
+ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *rtable = plannedstmt->rtable;
+ ExprContext *econtext;
+ PartitionDirectory pdir;
+ MemoryContext oldcontext,
+ tmpcontext;
+ PartitionPruneState *prunestate;
+ Bitmapset *valid_subplan_offs;
+
+ /*
+ * A temporary context for memory allocations required while executing
+ * partition pruning steps.
+ */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "initial pruning working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
+
+ /*
+ * PartitionDirectory for looking up partition descriptors; the directory
+ * omits detached partitions, just like in the executor proper.
+ */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+
+ /*
+ * We don't yet have a PlanState for the parent plan node, so we must
+ * create a standalone ExprContext to evaluate pruning expressions,
+ * equipped with the information about the EXTERN parameters that the
+ * caller passed us. Note that that's okay because the initial pruning
+ * steps do not contain anything that requires the execution to have
+ * started.
+ */
+ econtext = CreateStandaloneExprContext();
+ econtext->ecxt_param_list_info = params;
+ prunestate = CreatePartitionPruneState(NULL, pruneinfo, true, false,
+ rtable, econtext, pdir);
+ MemoryContextSwitchTo(oldcontext);
+
+ /* Do the initial pruning. */
+ valid_subplan_offs = ExecFindMatchingSubPlans(prunestate, true,
+ scan_leafpart_rtis);
+
+ FreeExprContext(econtext, true);
+ DestroyPartitionDirectory(pdir);
+ MemoryContextDelete(tmpcontext);
+
+ return valid_subplan_offs;
+}
+
/*
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
*
- * 'planstate' is the parent plan node's execution state.
+ * 'planstate', if not NULL, is the parent plan node's execution state.
+ * It can be NULL when called before ExecutorStart(), in which case
+ * 'rtable' (range table), 'econtext', and 'partdir' must be explicitly
+ * provided.
*
* 'pruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
@@ -1704,19 +1818,21 @@ ExecInitPartitionPruning(PlanState *planstate,
* PartitionedRelPruneInfo.
*/
static PartitionPruneState *
-CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
+CreatePartitionPruneState(PlanState *planstate,
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir)
{
- EState *estate = planstate->state;
+ EState *estate = planstate ? planstate->state : NULL;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
- /* For data reading, executor always omits detached partitions */
- if (estate->es_partition_directory == NULL)
- estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
+ Assert((estate != NULL) ||
+ (partdir != NULL && econtext != NULL && rtable != NIL));
n_part_hierarchies = list_length(pruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
@@ -1771,15 +1887,42 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * Must open the relation ourselves when called before execution
+ * has started, such as when called during
+ * ExecutorDoInitialPruning() on a cached plan. In that case,
+ * sub-partitions must be locked, because AcquirePlannerLocks()
+ * would not have seen them. (The first relation in a
+ * partrelpruneinfos list is always the root partitioned table
+ * appearing in the query, which AcquirePlannerLocks() would have
+ * locked; the Assert in relation_open() guards that assumption.)
+ */
+ if (estate == NULL)
+ {
+ RangeTblEntry *rte = rt_fetch(pinfo->rtindex, rtable);
+ int lockmode = (j == 0) ? NoLock : rte->rellockmode;
+
+ partrel = table_open(rte->relid, lockmode);
+ }
+ else
+ partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+
+ /*
+ * We can rely on the copy of the partitioned table's partition
+ * key in its relcache entry, because it can't change (or get
+ * destroyed) as long as the relation is locked. The partition
+ * descriptor is taken from the PartitionDirectory, which keeps
+ * the table open long enough for the descriptor to remain valid
+ * while it's used to perform the pruning steps.
*/
- partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
partkey = RelationGetPartitionKey(partrel);
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partdesc = PartitionDirectoryLookup(partdir, partrel);
+
+ /*
+ * Must close partrel, keeping the lock we took, if we're not using
+ * the EState's entry.
+ */
+ if (estate == NULL)
+ table_close(partrel, NoLock);
/*
* Initialize the subplan_map and subpart_map.
@@ -1793,6 +1936,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
Assert(partdesc->nparts >= pinfo->nparts);
pprune->nparts = partdesc->nparts;
pprune->subplan_map = palloc(sizeof(int) * partdesc->nparts);
+ pprune->rti_map = palloc(sizeof(Index) * partdesc->nparts);
if (partdesc->nparts == pinfo->nparts)
{
/*
@@ -1803,6 +1947,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pprune->subpart_map = pinfo->subpart_map;
memcpy(pprune->subplan_map, pinfo->subplan_map,
sizeof(int) * pinfo->nparts);
+ memcpy(pprune->rti_map, pinfo->rti_map,
+ sizeof(Index) * pinfo->nparts);
/*
* Double-check that the list of unpruned relations has not
@@ -1853,6 +1999,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pinfo->subplan_map[pd_idx];
pprune->subpart_map[pp_idx] =
pinfo->subpart_map[pd_idx];
+ pprune->rti_map[pp_idx] =
+ pinfo->rti_map[pd_idx];
pd_idx++;
}
else
@@ -1860,6 +2008,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
/* this partdesc entry is not in the plan */
pprune->subplan_map[pp_idx] = -1;
pprune->subpart_map[pp_idx] = -1;
+ pprune->rti_map[pp_idx] = 0;
}
}
@@ -1881,7 +2030,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* Initialize pruning contexts as needed.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
- if (pinfo->initial_pruning_steps)
+ if (consider_initial_steps && pinfo->initial_pruning_steps)
{
InitPartitionPruneContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
@@ -1891,7 +2040,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps)
+ if (consider_exec_steps && pinfo->exec_pruning_steps)
{
InitPartitionPruneContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
@@ -2119,10 +2268,14 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
* runtime pruning.
+ *
+ * RT indexes of leaf partitions scanned by the chosen subplans are added to
+ * *scan_leafpart_rtis if the pointer is non-NULL.
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2157,7 +2310,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
*/
pprune = &prunedata->partrelprunedata[0];
find_matching_subplans_recurse(prunedata, pprune, initial_prune,
- &result);
+ &result, scan_leafpart_rtis);
/* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
@@ -2171,6 +2324,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_copy(*scan_leafpart_rtis);
MemoryContextReset(prunestate->prune_context);
@@ -2181,13 +2336,15 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* find_matching_subplans_recurse
* Recursive worker function for ExecFindMatchingSubPlans
*
- * Adds valid (non-prunable) subplan IDs to *validsubplans
+ * Adds valid (non-prunable) subplan IDs to *validsubplans and RT indexes of
+ * the corresponding leaf partitions to *scan_leafpart_rtis (if asked for).
*/
static void
find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans)
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *partset;
int i;
@@ -2214,8 +2371,14 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
while ((i = bms_next_member(partset, i)) >= 0)
{
if (pprune->subplan_map[i] >= 0)
+ {
*validsubplans = bms_add_member(*validsubplans,
pprune->subplan_map[i]);
+ Assert(pprune->rti_map[i] > 0);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_add_member(*scan_leafpart_rtis,
+ pprune->rti_map[i]);
+ }
else
{
int partidx = pprune->subpart_map[i];
@@ -2223,7 +2386,8 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
if (partidx >= 0)
find_matching_subplans_recurse(prunedata,
&prunedata->partrelprunedata[partidx],
- initial_prune, validsubplans);
+ initial_prune, validsubplans,
+ scan_leafpart_rtis);
else
{
/*
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 076226868f..ed359b5153 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -842,7 +842,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
- es->qd = CreateQueryDesc(es->stmt,
+ es->qd = CreateQueryDesc(es->stmt, NULL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index c6f86a6510..96880e122a 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -155,7 +155,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (appendstate->as_prune_state == NULL ||
+ (!appendstate->as_prune_state->do_exec_prune && nplans > 0))
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -577,7 +578,7 @@ choose_next_subplan_locally(AppendState *node)
}
else if (node->as_valid_subplans == NULL)
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
whichplan = -1;
}
@@ -642,7 +643,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
/*
* Mark each invalid plan as finished to allow the loop below to
@@ -717,7 +718,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
mark_invalid_subplans_as_finished(node);
}
@@ -868,7 +869,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
classify_matching_subplans(node);
}
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 8d35860c30..2312e5a633 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -103,7 +103,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplan, we can fill ms_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (mergestate->ms_prune_state == NULL ||
+ (!mergestate->ms_prune_state->do_exec_prune && nplans > 0))
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -218,7 +219,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 29bc26669b..303a572c02 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1578,6 +1578,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
CachedPlanSource *plansource;
CachedPlan *cplan;
List *stmt_list;
+ List *part_prune_result_list;
char *query_string;
Snapshot snapshot;
MemoryContext oldcontext;
@@ -1657,7 +1658,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1685,6 +1689,9 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
stmt_list,
cplan);
+ /* Copy PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_result_list);
+
/*
* Set up options for portal. Default SCROLL type is chosen the same way
* as PerformCursorOpen does it.
@@ -2092,7 +2099,8 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
/* Get the generic plan for the query */
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ NULL /* Not interested in PartitionPruneResults */);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2473,7 +2481,9 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
- ListCell *lc2;
+ List *part_prune_result_list;
+ ListCell *lc2,
+ *lc3;
spicallbackarg.query = plansource->query_string;
@@ -2549,8 +2559,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* plan, the refcount must be backed by the plan_owner.
*/
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
-
+ plan_owner, _SPI_current->queryEnv,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
stmt_list = cplan->stmt_list;
/*
@@ -2589,9 +2601,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, part_prune_result_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ PartitionPruneResult *part_prune_result = lfirst_node(PartitionPruneResult, lc3);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2663,7 +2676,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
else
snap = InvalidSnapshot;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, part_prune_result,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index bee62fc15c..e7886afa35 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -158,6 +158,11 @@
token = pg_strtok(&length); /* skip :fldname */ \
local_node->fldname = readIntCols(len)
+/* Read an Index array */
+#define READ_INDEX_ARRAY(fldname, len) \
+ token = pg_strtok(&length); /* skip :fldname */ \
+ local_node->fldname = readIndexCols(len)
+
/* Read a bool array */
#define READ_BOOL_ARRAY(fldname, len) \
token = pg_strtok(&length); /* skip :fldname */ \
@@ -542,7 +547,6 @@ fnname(int numCols) \
*/
READ_SCALAR_ARRAY(readAttrNumberCols, int16, atoi)
READ_SCALAR_ARRAY(readOidCols, Oid, atooid)
-/* outfuncs.c has writeIndexCols, but we don't yet need that here */
-/* READ_SCALAR_ARRAY(readIndexCols, Index, atoui) */
+READ_SCALAR_ARRAY(readIndexCols, Index, atoui)
READ_SCALAR_ARRAY(readIntCols, int, atoi)
READ_SCALAR_ARRAY(readBoolCols, bool, strtobool)
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index b11249ed8f..7141035cc4 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -519,7 +519,9 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
result->partPruneInfos = glob->partPruneInfos;
+ result->containsInitialPruning = glob->containsInitialPruning;
result->rtable = glob->finalrtable;
+ result->minLockRelids = glob->minLockRelids;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 720f20f563..61d6934978 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -270,6 +270,16 @@ set_plan_references(PlannerInfo *root, Plan *plan)
*/
add_rtes_to_flat_rtable(root, false);
+ /*
+ * Add the query's adjusted range of RT indexes to glob->minLockRelids.
+ * The adjusted RT indexes of prunable relations will be deleted from the
+ * set below where PartitionPruneInfos are processed.
+ */
+ glob->minLockRelids =
+ bms_add_range(glob->minLockRelids,
+ rtoffset + 1,
+ rtoffset + list_length(root->parse->rtable));
+
/*
* Adjust RT indexes of PlanRowMarks and add to final rowmarks list
*/
@@ -352,6 +362,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach (lc, root->partPruneInfos)
{
PartitionPruneInfo *pruneinfo = lfirst(lc);
+ Bitmapset *leafpart_rtis = NULL;
ListCell *l;
foreach(l, pruneinfo->prune_infos)
@@ -362,15 +373,50 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach(l2, prune_infos)
{
PartitionedRelPruneInfo *pinfo = lfirst(l2);
+ int i;
/* RT index of the table to which the pinfo belongs. */
pinfo->rtindex += rtoffset;
+
+ /* Also of the leaf partitions that might be scanned. */
+ for (i = 0; i < pinfo->nparts; i++)
+ {
+ if (pinfo->rti_map[i] > 0 && pinfo->subplan_map[i] >= 0)
+ {
+ pinfo->rti_map[i] += rtoffset;
+ leafpart_rtis = bms_add_member(leafpart_rtis,
+ pinfo->rti_map[i]);
+ }
+ }
}
}
+ if (pruneinfo->needs_init_pruning)
+ {
+ glob->containsInitialPruning = true;
+
+ /*
+ * Delete the leaf partition RTIs from the global set of relations
+ * to be locked before executing the plan. AcquireExecutorLocks()
+ * will find the ones to add to the set after performing initial
+ * pruning.
+ */
+ glob->minLockRelids = bms_del_members(glob->minLockRelids,
+ leafpart_rtis);
+ }
+
glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
}
+ /*
+ * If we deleted bits from glob->minLockRelids above, it seems worth
+ * doing a bms_copy() on it to get rid of any empty tail words, so
+ * that the loop over this set in AcquireExecutorLocks() need not
+ * scan those useless words.
+ */
+ if (glob->containsInitialPruning)
+ glob->minLockRelids = bms_copy(glob->minLockRelids);
+
return result;
}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index d77f7d3aef..952c5b8327 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,9 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -234,6 +236,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool needs_init_pruning = false;
+ bool needs_exec_pruning = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -313,12 +317,16 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_needs_init_pruning;
+ bool partrel_needs_exec_pruning;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_needs_init_pruning,
+ &partrel_needs_exec_pruning);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -327,6 +335,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+
+ needs_init_pruning |= partrel_needs_init_pruning;
+ needs_exec_pruning |= partrel_needs_exec_pruning;
}
pfree(relid_subplan_map);
@@ -341,6 +352,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->needs_init_pruning = needs_init_pruning;
+ pruneinfo->needs_exec_pruning = needs_exec_pruning;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -441,13 +454,18 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *needs_init_pruning and *needs_exec_pruning are set to indicate whether
+ * the returned PartitionedRelPruneInfos contain pruning steps that can be
+ * performed before and after execution begins, respectively.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -458,6 +476,10 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will find out below. */
+ *needs_init_pruning = false;
+ *needs_exec_pruning = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -545,6 +567,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+ * The first pass also records whether a second pass will be needed,
+ * by checking for the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -619,6 +644,12 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ /* record which types of pruning steps we've seen so far */
+ if (initial_pruning_steps != NIL)
+ *needs_init_pruning = true;
+ if (exec_pruning_steps != NIL)
+ *needs_exec_pruning = true;
+
pinfolist = lappend(pinfolist, pinfo);
}
@@ -646,6 +677,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *subplan_map;
int *subpart_map;
Oid *relid_map;
+ Index *rti_map;
/*
* Construct the subplan and subpart maps for this partitioning level.
@@ -658,6 +690,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
+ rti_map = (Index *) palloc0(nparts * sizeof(Index));
present_parts = NULL;
i = -1;
@@ -672,6 +705,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
+ rti_map[i] = partrel->relid;
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
@@ -696,6 +730,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
pinfo->relid_map = relid_map;
+ pinfo->rti_map = rti_map;
}
pfree(relid_subpart_map);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 078fbdb5a0..02fc5a011b 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1603,6 +1603,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ List *part_prune_result_list;
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -1978,7 +1979,9 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
/*
* Now we can define the portal.
@@ -1993,6 +1996,9 @@ exec_bind_message(StringInfo input_message)
cplan->stmt_list,
cplan);
+ /* Copy PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_result_list);
+
/* Done with the snapshot used for parameter I/O and parsing/planning */
if (snapshot_set)
PopActiveSnapshot();
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5aa5a350f3..8cc2e2162d 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -35,7 +35,7 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
+static void ProcessQuery(PlannedStmt *plan, PartitionPruneResult *part_prune_result,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +65,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +78,8 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->part_prune_result = part_prune_result; /* ExecutorDoInitialPruning()
+ * output for plan */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +125,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * part_prune_result: ExecutorDoInitialPruning() output for the plan tree
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +138,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +150,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, part_prune_result, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -491,8 +496,13 @@ PortalStart(Portal portal, ParamListInfo params,
/*
* Create QueryDesc in portal's context; for the moment, set
* the destination to DestNone.
+ *
+ * There is no PartitionPruneResult unless the PlannedStmt is
+ * from a CachedPlan.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->part_prune_results == NIL ? NULL :
+ linitial(portal->part_prune_results),
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1225,6 +1235,8 @@ PortalRunMulti(Portal portal,
if (pstmt->utilityStmt == NULL)
{
+ PartitionPruneResult *part_prune_result = NULL;
+
/*
* process a plannable query.
*/
@@ -1271,10 +1283,18 @@ PortalRunMulti(Portal portal,
else
UpdateActiveSnapshotCommandId();
+ /*
+ * Determine if there's a corresponding PartitionPruneResult for
+ * this PlannedStmt.
+ */
+ if (portal->part_prune_results != NIL)
+ part_prune_result = list_nth(portal->part_prune_results,
+ foreach_current_index(stmtlist_item));
+
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_result,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1283,7 +1303,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_result,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 0d6a295674..d1c9605979 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -99,14 +99,19 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_result_list);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv);
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_result_list);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_result_list,
+ List **lockedRelids_per_stmt);
+static void ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -790,15 +795,20 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* On a "true" return, we have acquired the locks needed to run the plan.
* (We must do this for the "true" result to be race-condition-free.)
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_result_list.
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_result_list)
{
CachedPlan *plan = plansource->gplan;
/* Assert that caller checked the querytree */
Assert(plansource->is_valid);
+ *part_prune_result_list = NIL;
+
/* If there's no generic plan, just say "false" */
if (!plan)
return false;
@@ -820,13 +830,21 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ List *lockedRelids_per_stmt;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ /*
+ * Lock relations scanned by the plan. This is where the pruning
+ * happens if needed.
+ */
+ AcquireExecutorLocks(plan->stmt_list, boundParams,
+ part_prune_result_list,
+ &lockedRelids_per_stmt);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -848,7 +866,14 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ ReleaseExecutorLocks(plan->stmt_list, lockedRelids_per_stmt);
+
+ /*
+ * The output list and any objects therein were allocated in the
+ * caller's hopefully short-lived context, so they will not remain
+ * leaked for long; still, reset the pointer so the stale list isn't
+ * accidentally looked at.
+ */
+ *part_prune_result_list = NIL;
}
/*
@@ -874,10 +899,15 @@ CheckCachedPlan(CachedPlanSource *plansource)
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
* (unless this is a one-shot plan, in which case we don't copy the plan).
+ *
+ * A list of NULLs is returned in *part_prune_result_list, meaning that no
+ * PartitionPruneResult nodes have yet been created for the plans in
+ * stmt_list.
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv)
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_result_list)
{
CachedPlan *plan;
List *plist;
@@ -1007,6 +1037,17 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
MemoryContextSwitchTo(oldcxt);
+ /*
+ * There are no actual PartitionPruneResults to add yet, but the list
+ * must be initialized to have the same number of elements as the
+ * list of PlannedStmts.
+ */
+ *part_prune_result_list = NIL;
+ foreach(lc, plist)
+ {
+ *part_prune_result_list = lappend(*part_prune_result_list, NULL);
+ }
+
return plan;
}
@@ -1126,6 +1167,17 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
+ * For every PlannedStmt found in the returned CachedPlan, an element that
+ * is either a PartitionPruneResult or NULL is added to
+ * *part_prune_result_list. It is a PartitionPruneResult if the PlannedStmt
+ * comes from an existing, otherwise-valid CachedPlan and contains at least
+ * one PartitionPruneInfo with "initial" pruning steps. Those steps are
+ * performed by calling ExecutorDoInitialPruning(), which prunes away
+ * subplans that don't match the pruning conditions, so that
+ * AcquireExecutorLocks() need lock only the leaf partitions that remain.
+ * The PartitionPruneResult contains a list of bitmapsets of the indexes of
+ * matching subplans, one for each PartitionPruneInfo.
+ *
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
@@ -1139,11 +1191,13 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ List **part_prune_result_list)
{
CachedPlan *plan = NULL;
List *qlist;
bool customplan;
+ List *my_part_prune_result_list;
/* Assert caller is doing things in a sane order */
Assert(plansource->magic == CACHEDPLANSOURCE_MAGIC);
@@ -1160,7 +1214,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (CheckCachedPlan(plansource, boundParams,
+ &my_part_prune_result_list))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1169,7 +1224,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
else
{
/* Build a new generic plan */
- plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv,
+ &my_part_prune_result_list);
/* Just make real sure plansource->gplan is clear */
ReleaseGenericPlan(plansource);
/* Link the new generic plan into the plansource */
@@ -1214,7 +1270,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (customplan)
{
/* Build a custom plan */
- plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv,
+ &my_part_prune_result_list);
/* Accumulate total costs of custom plans */
plansource->total_custom_cost += cached_plan_cost(plan, true);
@@ -1246,6 +1303,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
plan->is_saved = true;
}
+ if (part_prune_result_list)
+ *part_prune_result_list = my_part_prune_result_list;
+
return plan;
}
@@ -1737,17 +1797,29 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_result_list.
+ *
+ * On return, *lockedRelids_per_stmt will contain a bitmapset for every
+ * PlannedStmt in stmt_list, containing the RT indexes of relation entries
+ * in its range table that were actually locked, or NULL if the PlannedStmt
+ * contains a utility statement.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_result_list,
+ List **lockedRelids_per_stmt)
{
ListCell *lc1;
+ *part_prune_result_list = *lockedRelids_per_stmt = NIL;
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ PartitionPruneResult *part_prune_result = NULL;
+ Bitmapset *allLockRelids;
+ Bitmapset *lockedRelids = NULL;
+ int rti;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1761,13 +1833,38 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
if (query)
- ScanQueryForLocks(query, acquire);
+ ScanQueryForLocks(query, true);
+ *part_prune_result_list = lappend(*part_prune_result_list, NULL);
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ /*
+ * Figure out the set of relations that would need to be locked
+ * before executing the plan.
+ */
+ if (plannedstmt->containsInitialPruning)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ /*
+ * Obtain the set of leaf partitions to be locked.
+ *
+ * The following does initial partition pruning using the
+ * PartitionPruneInfos found in plannedstmt->partPruneInfos and
+ * finds leaf partitions that survive that pruning across all the
+ * nodes in the plan tree.
+ */
+ part_prune_result = ExecutorDoInitialPruning(plannedstmt,
+ boundParams);
+
+ allLockRelids = bms_union(plannedstmt->minLockRelids,
+ part_prune_result->scan_leafpart_rtis);
+ }
+ else
+ allLockRelids = plannedstmt->minLockRelids;
+
+ rti = -1;
+ while ((rti = bms_next_member(allLockRelids, rti)) > 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
if (rte->rtekind != RTE_RELATION)
continue;
@@ -1778,10 +1875,58 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
* fail if it's been dropped entirely --- we'll just transiently
* acquire a non-conflicting lock.
*/
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
+ LockRelationOid(rte->relid, rte->rellockmode);
+ lockedRelids = bms_add_member(lockedRelids, rti);
+ }
+
+ *part_prune_result_list = lappend(*part_prune_result_list,
+ part_prune_result);
+ *lockedRelids_per_stmt = lappend(*lockedRelids_per_stmt, lockedRelids);
+ }
+}
+
+/*
+ * ReleaseExecutorLocks
+ * Release locks that would've been acquired by an earlier call to
+ * AcquireExecutorLocks()
+ */
+static void
+ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt)
+{
+ ListCell *lc1,
+ *lc2;
+
+ forboth(lc1, stmt_list, lc2, lockedRelids_per_stmt)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ Bitmapset *lockedRelids = lfirst(lc2);
+ int rti;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /*
+ * Ignore utility statements, except those (such as EXPLAIN) that
+ * contain a parsed-but-not-planned query. Note: it's okay to use
+ * ScanQueryForLocks, even though the query hasn't been through
+ * rule rewriting, because rewriting doesn't change the query
+ * representation.
+ */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ if (query)
+ ScanQueryForLocks(query, false);
+ continue;
+ }
+
+ rti = -1;
+ while ((rti = bms_next_member(lockedRelids, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ Assert(rte->rtekind == RTE_RELATION);
+
+ /* See the comment in AcquireExecutorLocks(). */
+ UnlockRelationOid(rte->relid, rte->rellockmode);
}
}
}
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 3a161bdb88..27407a7f0f 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -303,6 +303,25 @@ PortalDefineQuery(Portal portal,
portal->status = PORTAL_DEFINED;
}
+/*
+ * PortalStorePartitionPruneResults
+ * Copy the given list of PartitionPruneResults into the portal's
+ * context
+ *
+ * This allows the caller to ensure that the list exists as long as the portal
+ * does.
+ */
+void
+PortalStorePartitionPruneResults(Portal portal, List *part_prune_results)
+{
+ MemoryContext oldcxt;
+
+ AssertArg(PortalIsValid(portal));
+ oldcxt = MemoryContextSwitchTo(portal->portalContext);
+ portal->part_prune_results = copyObject(part_prune_results);
+ MemoryContextSwitchTo(oldcxt);
+}
+
/*
* PortalReleaseCachedPlan
* Release a portal's reference to its cached plan, if any.
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 9ebde089ae..e57e133f0e 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,9 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index bf962af7af..bd8776402e 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -45,6 +45,7 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
+ * rti_map Range table index by partition index, or 0.
* present_parts A Bitmapset of the partition indexes that we
* have subplans or subparts for.
* initial_pruning_steps List of PartitionPruneSteps used to
@@ -61,6 +62,7 @@ typedef struct PartitionedRelPruningData
int nparts;
int *subplan_map;
int *subpart_map;
+ Index *rti_map;
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
@@ -126,5 +128,10 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int part_prune_index,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis);
+extern Bitmapset *ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index e79e2c001f..60d5644908 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,8 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ PartitionPruneResult *part_prune_result; /* ExecutorDoInitialPruning()'s
+ * output for plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +59,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index d68a6b9d28..5c4a282be0 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -185,6 +185,8 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
+extern PartitionPruneResult *ExecutorDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params);
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 63a89474db..12ea06c2f6 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1001,6 +1001,33 @@ typedef struct DomainConstraintState
*/
typedef TupleTableSlot *(*ExecProcNodeMtd) (struct PlanState *pstate);
+/*----------------
+ * PartitionPruneResult
+ *
+ * The result of an ExecutorDoInitialPruning() invocation on a given
+ * PlannedStmt.
+ *
+ * Contains a list of Bitmapsets of the indexes of the subplans remaining
+ * after performing initial pruning by calling ExecFindMatchingSubPlans()
+ * for every PartitionPruneInfo found in PlannedStmt.partPruneInfos. RT
+ * indexes of the leaf partitions scanned by those subplans across all
+ * PartitionPruneInfos are added to scan_leafpart_rtis.
+ *
+ * This is used by GetCachedPlan() to inform its callers of the pruning
+ * decisions made when AcquireExecutorLocks() is applied to a given cached
+ * PlannedStmt; the callers then pass the node on to the executor. The
+ * executor refers to this node when initializing plan nodes whose subplans
+ * may have been pruned by ExecutorDoInitialPruning(), rather than redoing
+ * initial pruning.
+ */
+typedef struct PartitionPruneResult
+{
+ NodeTag type;
+
+ List *valid_subplan_offs_list;
+ Bitmapset *scan_leafpart_rtis;
+} PartitionPruneResult;
+
/* ----------------
* PlanState node
*
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index cdd6debfa0..b33d9e426d 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -212,6 +212,7 @@ extern struct Bitmapset *readBitmapset(void);
extern uintptr_t readDatum(bool typbyval);
extern bool *readBoolCols(int numCols);
extern int *readIntCols(int numCols);
+extern Index *readIndexCols(int numCols);
extern Oid *readOidCols(int numCols);
extern int16 *readAttrNumberCols(int numCols);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index a4e6b4db92..86eda6c7c3 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -125,6 +125,19 @@ typedef struct PlannerGlobal
/* List of PartitionPruneInfo contained in the plan */
List *partPruneInfos;
+ /*
+ * Do any of those PartitionPruneInfos have initial (pre-exec) pruning
+ * steps in them?
+ */
+ bool containsInitialPruning;
+
+ /*
+ * Indexes of all range table entries minus indexes of range table entries
+ * of the leaf partitions scanned by prunable subplans; see
+ * AcquireExecutorLocks()
+ */
+ Bitmapset *minLockRelids;
+
/* OIDs of relations the plan depends on */
List *relationOids;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index f2daabb3b7..1d2c0d9bdf 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -72,8 +72,17 @@ typedef struct PlannedStmt
List *partPruneInfos; /* List of PartitionPruneInfo contained in
* the plan */
+ bool containsInitialPruning; /* Do any of those PartitionPruneInfos
+ * have initial (pre-exec) pruning
+ * steps in them? */
+
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *minLockRelids; /* Indexes of all range table entries minus
+ * indexes of range table entries of the leaf
+ * partitions scanned by prunable subplans;
+ * see AcquireExecutorLocks() */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
List *resultRelations; /* integer list of RT indexes, or NIL */
@@ -1409,6 +1418,13 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * needs_init_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its initial_pruning_steps set?
+ *
+ * needs_exec_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its exec_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1419,6 +1435,8 @@ typedef struct PartitionPruneInfo
NodeTag type;
List *prune_infos;
+ bool needs_init_pruning;
+ bool needs_exec_pruning;
Bitmapset *other_subplans;
} PartitionPruneInfo;
@@ -1463,6 +1481,9 @@ typedef struct PartitionedRelPruneInfo
/* relation OID by partition index, or 0 */
Oid *relid_map pg_node_attr(array_size(nparts));
+ /* Range table index by partition index, or 0. */
+ Index *rti_map pg_node_attr(array_size(nparts));
+
/*
* initial_pruning_steps shows how to prune during executor startup (i.e.,
* without use of any PARAM_EXEC Params); it is NIL if no startup pruning
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 0499635f59..1c5bb5ece1 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -220,7 +220,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ List **part_prune_result_list);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aeddbdafe5..9f7727a837 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,7 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *part_prune_results; /* list of PartitionPruneResults */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -242,6 +243,8 @@ extern void PortalDefineQuery(Portal portal,
CommandTag commandTag,
List *stmts,
CachedPlan *cplan);
+extern void PortalStorePartitionPruneResults(Portal portal,
+ List *part_prune_result_list);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
--
2.35.3
On Tue, Jul 26, 2022 at 11:01 PM Amit Langote <amitlangote09@gmail.com> wrote:
Needed to be rebased again, over 2d04277121f this time.
0001 adds es_part_prune_result but does not use it, so maybe the
introduction of that field should be deferred until it's needed for
something.
I wonder whether it's really necessary to add the PartitionPruneInfo
objects to a list in PlannerInfo first and then roll them up into
PlannerGlobal later. I know we do that for range table entries, but
I've never quite understood why we do it that way instead of creating
a flat range table in PlannerGlobal from the start. And so by
extension I wonder whether this table couldn't be flat from the start
also.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Jul 28, 2022 at 1:27 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Jul 26, 2022 at 11:01 PM Amit Langote <amitlangote09@gmail.com> wrote:
Needed to be rebased again, over 2d04277121f this time.
Thanks for looking.
0001 adds es_part_prune_result but does not use it, so maybe the
introduction of that field should be deferred until it's needed for
something.
Oops, looks like a mistake when breaking the patch. Will move that bit to 0002.
I wonder whether it's really necessary to add the PartitionPruneInfo
objects to a list in PlannerInfo first and then roll them up into
PlannerGlobal later. I know we do that for range table entries, but
I've never quite understood why we do it that way instead of creating
a flat range table in PlannerGlobal from the start. And so by
extension I wonder whether this table couldn't be flat from the start
also.
Tom may want to correct me, but my understanding of why the planner
waits till the end of planning to start populating the PlannerGlobal
range table is that it is not until then that we know which subqueries
will be scanned by the final plan tree, and hence whose range table
entries will be included in the range table passed to the executor. I
can see that subquery pull-up causes a pulled-up subquery's range
table entries to be added to the parent query's, and all of its nodes
to be changed with OffsetVarNodes() to refer to the new RT indexes. But
for subqueries that are not pulled up, their subplans' nodes (present
in PlannerGlobal.subplans) would still refer to the original RT
indexes (per the range table in the corresponding PlannerGlobal.subroot),
which must be fixed, and the end of planning is the time to do so. Or
maybe that could be done when build_subplan() creates a subplan and
adds it to PlannerGlobal.subplans, but for some reason it's not?
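To make the kind of adjustment being discussed concrete, here is a
minimal sketch of what the renumbering amounts to; it only illustrates
the idea and is not the actual OffsetVarNodes() code in
src/backend/rewrite/rewriteManip.c, which handles many more node types
and the sublevels_up bookkeeping:
/*
 * Sketch: after appending a pulled-up subquery's range table entries
 * to the parent's, walk the subquery's expressions and shift each Var
 * belonging to this query level so that varno points at the appended
 * RTEs.  rtoffset is the length of the parent's range table before
 * the append.
 */
static bool
offset_var_walker(Node *node, int *rtoffset)
{
	if (node == NULL)
		return false;
	if (IsA(node, Var))
	{
		Var		   *var = (Var *) node;

		if (var->varlevelsup == 0)
			var->varno += *rtoffset;
		return false;
	}
	return expression_tree_walker(node, offset_var_walker,
								  (void *) rtoffset);
}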
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Amit Langote <amitlangote09@gmail.com> writes:
On Thu, Jul 28, 2022 at 1:27 AM Robert Haas <robertmhaas@gmail.com> wrote:
I wonder whether it's really necessary to add the PartitionPruneInfo
objects to a list in PlannerInfo first and then roll them up into
PlannerGlobal later. I know we do that for range table entries, but
I've never quite understood why we do it that way instead of creating
a flat range table in PlannerGlobal from the start. And so by
extension I wonder whether this table couldn't be flat from the start
also.
Tom may want to correct me but my understanding of why the planner
waits till the end of planning to start populating the PlannerGlobal
range table is that it is not until then that we know which subqueries
will be scanned by the final plan tree, and hence whose range table
entries will be included in the range table passed to the executor.
It would not be profitable to flatten the range table before we've
done remove_useless_joins. We'd end up with useless entries from
subqueries that ultimately aren't there. We could perhaps do it
after we finish that phase, but I don't really see the point: it
wouldn't be better than what we do now, just the same work at a
different time.
regards, tom lane
On Fri, Jul 29, 2022 at 12:55 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
It would not be profitable to flatten the range table before we've
done remove_useless_joins. We'd end up with useless entries from
subqueries that ultimately aren't there. We could perhaps do it
after we finish that phase, but I don't really see the point: it
wouldn't be better than what we do now, just the same work at a
different time.
That's not quite my question, though. Why do we ever build a non-flat
range table in the first place? Like, instead of assigning indexes
relative to the current subquery level, why not just assign them
relative to the whole query from the start? It can't really be that
we've done it this way because of remove_useless_joins(), because
we've been building separate range tables and later flattening them
for longer than join removal has existed as a feature.
What bugs me is that it's very much not free. By building a bunch of
separate range tables and combining them later, we generate extra
work: we have to go back and adjust RT indexes after-the-fact. We pay
that overhead for every query, not just the ones that end up with some
unused entries in the range table. And why would it matter if we did
end up with some useless entries in the range table, anyway? If
there's some semantic difference, we could add a flag to mark those
entries as needing to be ignored, which seems way better than crawling
all over the whole tree adjusting RTIs everywhere.
I don't really expect that we're ever going to change this -- and
certainly not on this thread. The idea of running around and replacing
RT indexes all over the tree is deeply embedded in the system. But are
we really sure we want to add a second kind of index that we have to
run around and adjust at the same time?
If we are, so be it, I guess. It just looks really ugly and unnecessary to me.
--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
That's not quite my question, though. Why do we ever build a non-flat
range table in the first place? Like, instead of assigning indexes
relative to the current subquery level, why not just assign them
relative to the whole query from the start?
We could probably make that work, but I'm skeptical that it would
really be an improvement overall, for a couple of reasons.
(1) The need for merge-rangetables-and-renumber-Vars logic doesn't
go away. It just moves from setrefs.c to the rewriter, which would
have to do it when expanding views. This would be a net loss
performance-wise, I think, because setrefs.c can do it as part of a
parsetree scan that it has to perform anyway for other housekeeping
reasons; but the rewriter would need a brand new pass over the tree.
Admittedly that pass would only happen for view replacement, but
it's still not open-and-shut that there'd be a performance win.
(2) The need for varlevelsup and similar fields doesn't go away,
I think, because we need those for semantic purposes such as
discovering the query level that aggregates are associated with.
That means that subquery flattening still has to make a pass over
the tree to touch every Var's varlevelsup; so not having to adjust
varno at the same time would save little.
I'm not sure whether I think it's a net plus or net minus that
varno would become effectively independent of varlevelsup.
It'd be different from the way we think of them now, for sure,
and I think it'd take awhile to flush out bugs arising from such
a redefinition.
I don't really expect that we're ever going to change this -- and
certainly not on this thread. The idea of running around and replacing
RT indexes all over the tree is deeply embedded in the system. But are
we really sure we want to add a second kind of index that we have to
run around and adjust at the same time?
You probably want to avert your eyes from [1], then ;-). Although
I'm far from convinced that the cross-list index fields currently
proposed there are actually necessary; the cost to adjust them
during rangetable merging could outweigh any benefit.
regards, tom lane
[1]: /messages/by-id/CA+HiwqGjJDmUhDSfv-U2qhKJjt9ST7Xh9JXC_irsAQ1TAUsJYg@mail.gmail.com
On Fri, Jul 29, 2022 at 11:04 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
We could probably make that work, but I'm skeptical that it would
really be an improvement overall, for a couple of reasons.
(1) The need for merge-rangetables-and-renumber-Vars logic doesn't
go away. It just moves from setrefs.c to the rewriter, which would
have to do it when expanding views. This would be a net loss
performance-wise, I think, because setrefs.c can do it as part of a
parsetree scan that it has to perform anyway for other housekeeping
reasons; but the rewriter would need a brand new pass over the tree.
Admittedly that pass would only happen for view replacement, but
it's still not open-and-shut that there'd be a performance win.
(2) The need for varlevelsup and similar fields doesn't go away,
I think, because we need those for semantic purposes such as
discovering the query level that aggregates are associated with.
That means that subquery flattening still has to make a pass over
the tree to touch every Var's varlevelsup; so not having to adjust
varno at the same time would save little.
I'm not sure whether I think it's a net plus or net minus that
varno would become effectively independent of varlevelsup.
It'd be different from the way we think of them now, for sure,
and I think it'd take awhile to flush out bugs arising from such
a redefinition.
Interesting. Thanks for your thoughts. I guess it's not as clear-cut
as I thought, but I still can't help feeling like we're doing an awful
lot of expensive rearrangement at the end of query planning.
I kind of wonder whether varlevelsup is the wrong idea. Like, suppose
we instead handed out subquery identifiers serially, sort of like what
we do with SubTransactionId values. Then instead of testing whether
varlevelsup>0 you test whether varsubqueryid==mysubqueryid. If you
flatten a query into its parent, you still need to adjust every var
that refers to the dead subquery, but you don't need to adjust vars
that refer to subqueries underneath it. Their level changes, but their
identity doesn't. Maybe that doesn't really help that much, but it's
always struck me as a little unfortunate that we basically test
whether a var is equal by testing whether the varno and varlevelsup
are equal. That only works if you assume that you can never end up
comparing two vars from thoroughly unrelated parts of the tree, such
that the subquery one level up from one might be different from the
subquery one level up from the other.
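To sketch that a bit more concretely (the field and macro names here
are hypothetical; nothing like this exists in the tree today): the
parser would hand out a serial id per subquery, each Var would store
the id of the query level it belongs to, and Var identity would no
longer depend on relative nesting depth:
/* Hypothetical Var carrying the serial id of its owning subquery. */
typedef struct Var
{
	Expr		xpr;
	uint32		varsubqueryid;	/* serial id assigned at parse time */
	int			varno;			/* rangetable index within that subquery */
	AttrNumber	varattno;		/* column number, or 0 for whole row */
	/* ... type information and other fields elided ... */
} Var;

/*
 * Two Vars refer to the same column iff they name the same subquery,
 * rangetable entry, and attribute -- no varlevelsup comparison needed,
 * even for Vars from unrelated parts of the tree.
 */
#define VARS_MATCH(a, b) \
	((a)->varsubqueryid == (b)->varsubqueryid && \
	 (a)->varno == (b)->varno && \
	 (a)->varattno == (b)->varattno)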
I don't really expect that we're ever going to change this -- and
certainly not on this thread. The idea of running around and replacing
RT indexes all over the tree is deeply embedded in the system. But are
we really sure we want to add a second kind of index that we have to
run around and adjust at the same time?
You probably want to avert your eyes from [1], then ;-). Although
I'm far from convinced that the cross-list index fields currently
proposed there are actually necessary; the cost to adjust them
during rangetable merging could outweigh any benefit.
I really like the idea of that patch overall, actually; I think
permissions checking is a good example of something that shouldn't
require walking the whole query tree but currently does. And actually,
I think the same thing is true here: we shouldn't need to walk the
whole query tree to find the pruning information, but right now we do.
I'm just uncertain whether what Amit has implemented is the
least-annoying way to go about it... any thoughts on that,
specifically as it pertains to this patch?
--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
... it's
always struck me as a little unfortunate that we basically test
whether a var is equal by testing whether the varno and varlevelsup
are equal. That only works if you assume that you can never end up
comparing two vars from thoroughly unrelated parts of the tree, such
that the subquery one level up from one might be different from the
subquery one level up from the other.
Yeah, that's always bothered me a little as well. I've yet to see a
case where it causes a problem in practice. But I think that if, say,
we were to try to do any sort of cross-query-level optimization, then
the ambiguity could rise up to bite us. That might be a situation
where a flat rangetable would be worth the trouble.
I'm just uncertain whether what Amit has implemented is the
least-annoying way to go about it... any thoughts on that,
specifically as it pertains to this patch?
I haven't looked at this patch at all. I'll try to make some
time for it, but probably not today.
regards, tom lane
On Fri, Jul 29, 2022 at 12:47 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I'm just uncertain whether what Amit has implemented is the
least-annoying way to go about it... any thoughts on that,
specifically as it pertains to this patch?
I haven't looked at this patch at all. I'll try to make some
time for it, but probably not today.
OK, thanks. The preliminary patch I'm talking about here is pretty
short, so you could probably look at that part of it, at least, in
some relatively small amount of time. And I think it's also in pretty
reasonable shape apart from this issue. But, as usual, there's the
question of how well one can evaluate a preliminary patch without
reviewing the full patch in detail.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Fri, Jul 29, 2022 at 1:20 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Thu, Jul 28, 2022 at 1:27 AM Robert Haas <robertmhaas@gmail.com> wrote:
0001 adds es_part_prune_result but does not use it, so maybe the
introduction of that field should be deferred until it's needed for
something.
Oops, looks like a mistake when breaking the patch. Will move that bit to 0002.
Fixed that and also noticed that I had defined PartitionPruneResult in
the wrong header (execnodes.h). That meant PartitionPruneResult nodes
could not be written and read, because
src/backend/nodes/gen_node_support.pl doesn't create _out* and _read*
routines for the nodes defined in execnodes.h. I moved its definition
to plannodes.h, even though it is not actually the planner that
instantiates those; no other include/nodes header sounds better.
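For reference, the node ends up looking roughly like this in
plannodes.h (a sketch; the field types are inferred from how the
attached patches use them):

    typedef struct PartitionPruneResult
    {
        NodeTag     type;

        /* one Bitmapset of surviving subplan indexes for each
         * PartitionPruneInfo in PlannedStmt.partPruneInfos */
        List       *valid_subplan_offs_list;

        /* RT indexes of all leaf partitions scanned by those subplans */
        Bitmapset  *scan_leafpart_rtis;
    } PartitionPruneResult;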
One more thing I realized is that Bitmapsets added to the List
PartitionPruneResult.valid_subplan_offs_list are not actually
read/write-able. That's a problem that I also faced in [1], so I
proposed a patch there to make Bitmapset a read/write-able Node and
mark (only) the Bitmapsets that are added into read/write-able node
trees with the corresponding NodeTag. I'm including that patch here
as well (0002), since the main patch needs it to pass the
-DWRITE_READ_PARSE_PLAN_TREES build tests, though it might make sense
to discuss it in its own thread?
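To illustrate the marking, here is how 0003's
ExecutorDoInitialPruning() tags each Bitmapset it puts into the
read/write-able PartitionPruneResult tree (excerpted from the attached
patch, with an explanatory comment added):

    valid_subplan_offs =
        ExecPartitionDoInitialPruning(plannedstmt, params, pruneinfo,
                                      &result->scan_leafpart_rtis);
    /* Tag the Bitmapset (possible because 0002 gives it a NodeTag
     * field) so that the node write/read machinery treats it as a
     * Node inside this tree. */
    if (valid_subplan_offs)
        valid_subplan_offs->type = T_Bitmapset;
    result->valid_subplan_offs_list =
        lappend(result->valid_subplan_offs_list, valid_subplan_offs);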
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
[1]: /messages/by-id/CA+HiwqH80qX1ZLx3HyHmBrOzLQeuKuGx6FzGep0F_9zw9L4PAA@mail.gmail.com
Attachments:
v21-0001-Move-PartitioPruneInfo-out-of-plan-nodes-into-Pl.patch (application/octet-stream)
From 06cda14113c3572440a716a4aacb250b2ed52f52 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Fri, 27 May 2022 16:00:28 +0900
Subject: [PATCH v21 1/3] Move PartitioPruneInfo out of plan nodes into
PlannedStmt
The planner will now add a given PartitionPruneInfo to
PlannedStmt.partPruneInfos instead of directly to the
Append/MergeAppend plan node. What gets set in the latter
instead is an index field that points to the element of
PlannedStmt.partPruneInfos containing the PartitionPruneInfo
belonging to the plan node.
A later commit will make AcquireExecutorLocks() do the initial
partition pruning to determine a minimal set of partitions to be
locked when validating a plan tree, and it will need to consult the
PartitionPruneInfos referenced therein to do so. It is better for
the PartitionPruneInfos to be directly accessible than to require a
walk of the plan tree to find them; with this change, that becomes a
simple iteration over PlannedStmt.partPruneInfos.
---
src/backend/executor/execMain.c | 1 +
src/backend/executor/execParallel.c | 1 +
src/backend/executor/execPartition.c | 4 +-
src/backend/executor/execUtils.c | 1 +
src/backend/executor/nodeAppend.c | 4 +-
src/backend/executor/nodeMergeAppend.c | 4 +-
src/backend/optimizer/plan/createplan.c | 24 ++++-----
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 65 +++++++++++++------------
src/backend/partitioning/partprune.c | 18 ++++---
src/include/executor/execPartition.h | 3 +-
src/include/nodes/execnodes.h | 1 +
src/include/nodes/pathnodes.h | 6 +++
src/include/nodes/plannodes.h | 11 +++--
src/include/partitioning/partprune.h | 8 +--
15 files changed, 90 insertions(+), 62 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index d78862e660..32475e33ff 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -825,6 +825,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable);
estate->es_plannedstmt = plannedstmt;
+ estate->es_part_prune_infos = plannedstmt->partPruneInfos;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 99512826c5..aca0c6f323 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -183,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
pstmt->planTree = plan;
+ pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
pstmt->resultRelations = NIL;
pstmt->appendRelations = NIL;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 40e3c07693..80197d5141 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1791,11 +1791,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
PartitionPruneState *
ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
Bitmapset **initially_valid_subplans)
{
PartitionPruneState *prunestate;
EState *estate = planstate->state;
+ PartitionPruneInfo *pruneinfo = list_nth(estate->es_part_prune_infos,
+ part_prune_index);
/* We may need an expression context to evaluate partition exprs */
ExecAssignExprContext(estate, planstate);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9df1f81ea8..21f4c10937 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -119,6 +119,7 @@ CreateExecutorState(void)
estate->es_relations = NULL;
estate->es_rowmarks = NULL;
estate->es_plannedstmt = NULL;
+ estate->es_part_prune_infos = NIL;
estate->es_junkFilter = NULL;
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 357e10a1d7..c6f86a6510 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -134,7 +134,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_begun = false;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -145,7 +145,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&appendstate->ps,
list_length(node->appendplans),
- node->part_prune_info,
+ node->part_prune_index,
&validsubplans);
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index c5c62fa5c7..8d35860c30 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -82,7 +82,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.ExecProcNode = ExecMergeAppend;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -93,7 +93,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&mergestate->ps,
list_length(node->mergeplans),
- node->part_prune_info,
+ node->part_prune_index,
&validsubplans);
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index ab4d8e201d..2bfb817d75 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1203,7 +1203,6 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
ListCell *subpaths;
int nasyncplans = 0;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
int nodenumsortkeys = 0;
AttrNumber *nodeSortColIdx = NULL;
Oid *nodeSortOperators = NULL;
@@ -1354,6 +1353,9 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ plan->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1377,16 +1379,14 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
}
if (prunequal != NIL)
- partpruneinfo =
- make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ plan->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
plan->appendplans = subplans;
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
- plan->part_prune_info = partpruneinfo;
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -1425,7 +1425,6 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
List *subplans = NIL;
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1518,6 +1517,9 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ node->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1541,13 +1543,13 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
}
if (prunequal != NIL)
- partpruneinfo = make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ node->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
node->mergeplans = subplans;
- node->part_prune_info = partpruneinfo;
+
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 5d0fd6e072..31fff597a7 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -519,6 +519,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->dependsOnRole = glob->dependsOnRole;
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
+ result->partPruneInfos = glob->partPruneInfos;
result->rtable = glob->finalrtable;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 1cb0abdbc1..720f20f563 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -348,6 +348,29 @@ set_plan_references(PlannerInfo *root, Plan *plan)
}
}
+ /* Also fix up the information in PartitionPruneInfos. */
+ foreach (lc, root->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ ListCell *l;
+
+ foreach(l, pruneinfo->prune_infos)
+ {
+ List *prune_infos = lfirst(l);
+ ListCell *l2;
+
+ foreach(l2, prune_infos)
+ {
+ PartitionedRelPruneInfo *pinfo = lfirst(l2);
+
+ /* RT index of the table to which the pinfo belongs. */
+ pinfo->rtindex += rtoffset;
+ }
+ }
+
+ glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
+ }
+
return result;
}
@@ -1658,21 +1681,12 @@ set_append_references(PlannerInfo *root,
aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
- if (aplan->part_prune_info)
- {
- foreach(l, aplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * PartitionPruneInfos will be added to a list in PlannerGlobal, so update
+ * the index.
+ */
+ if (aplan->part_prune_index >= 0)
+ aplan->part_prune_index += list_length(root->glob->partPruneInfos);
/* We don't need to recurse to lefttree or righttree ... */
Assert(aplan->plan.lefttree == NULL);
@@ -1734,21 +1748,12 @@ set_mergeappend_references(PlannerInfo *root,
mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
- if (mplan->part_prune_info)
- {
- foreach(l, mplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * PartitionPruneInfos will be added to a list in PlannerGlobal, so update
+ * the index.
+ */
+ if (mplan->part_prune_index >= 0)
+ mplan->part_prune_index += list_length(root->glob->partPruneInfos);
/* We don't need to recurse to lefttree or righttree ... */
Assert(mplan->plan.lefttree == NULL);
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 6188bf69cb..6565b6ed01 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -209,16 +209,20 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
/*
* make_partition_pruneinfo
- * Builds a PartitionPruneInfo which can be used in the executor to allow
- * additional partition pruning to take place. Returns NULL when
- * partition pruning would be useless.
+ * Checks if the given set of quals can be used to build pruning steps
+ * that the executor can use to prune away unneeded partitions. If
+ * suitable quals are found then a PartitionPruneInfo is built and tagged
+ * onto the PlannerInfo's partPruneInfos list.
+ *
+ * The return value is the 0-based index of the item added to the
+ * partPruneInfos list or -1 if nothing was added.
*
* 'parentrel' is the RelOptInfo for an appendrel, and 'subpaths' is the list
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
*/
-PartitionPruneInfo *
+int
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
List *prunequal)
@@ -332,7 +336,7 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* quals, then we can just not bother with run-time pruning.
*/
if (prunerelinfos == NIL)
- return NULL;
+ return -1;
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
@@ -358,7 +362,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
else
pruneinfo->other_subplans = NULL;
- return pruneinfo;
+ root->partPruneInfos = lappend(root->partPruneInfos, pruneinfo);
+
+ return list_length(root->partPruneInfos) - 1;
}
/*
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 708435e952..bf962af7af 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -123,9 +123,8 @@ typedef struct PartitionPruneState
extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
bool initial_prune);
-
#endif /* EXECPARTITION_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 01b1727fc0..4a741b053f 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -611,6 +611,7 @@ typedef struct EState
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 6bda383bea..e392fb6fc0 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -122,6 +122,9 @@ typedef struct PlannerGlobal
/* "flat" list of AppendRelInfos */
List *appendRelations;
+ /* List of PartitionPruneInfo contained in the plan */
+ List *partPruneInfos;
+
/* OIDs of relations the plan depends on */
List *relationOids;
@@ -503,6 +506,9 @@ struct PlannerInfo
/* Does this query modify any partition key columns? */
bool partColsUpdated;
+
+ /* PartitionPruneInfos added in this query's plan. */
+ List *partPruneInfos;
};
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 21e642a64c..3eb3e6e527 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -70,6 +70,9 @@ typedef struct PlannedStmt
struct Plan *planTree; /* tree of Plan nodes */
+ List *partPruneInfos; /* List of PartitionPruneInfo contained in
+ * the plan */
+
List *rtable; /* list of RangeTblEntry nodes */
/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
@@ -270,8 +273,8 @@ typedef struct Append
*/
int first_partial_plan;
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} Append;
/* ----------------
@@ -305,8 +308,8 @@ typedef struct MergeAppend
/* NULLS FIRST/LAST directions */
bool *nullsFirst pg_node_attr(array_size(numCols));
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} MergeAppend;
/* ----------------
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 90684efa25..ebf0dcff8c 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -70,10 +70,10 @@ typedef struct PartitionPruneContext
#define PruneCxtStateIdx(partnatts, step_id, keyno) \
((partnatts) * (step_id) + (keyno))
-extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
- struct RelOptInfo *parentrel,
- List *subpaths,
- List *prunequal);
+extern int make_partition_pruneinfo(struct PlannerInfo *root,
+ struct RelOptInfo *parentrel,
+ List *subpaths,
+ List *prunequal);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
--
2.35.3
v21-0003-Optimize-AcquireExecutorLocks-by-locking-only-un.patch (application/octet-stream)
From ce28c4cfe8bc69e313ba7f59b048fe96f73139a6 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v21 3/3] Optimize AcquireExecutorLocks() by locking only
unpruned partitions
This commit teaches AcquireExecutorLocks() to perform initial
partition pruning, notionally eliminating the subnodes of a generic
cached plan that need not be initialized during the plan's actual
execution, and to skip locking the partitions scanned by those
subnodes.
The result of performing initial partition pruning this way, before
the actual execution has started, is passed to the execution proper
via a PartitionPruneResult, provided along with the PlannedStmt by the
callers of the executor that used plancache.c to get the plan. It is
NULL in the cases in which the plan is obtained by calling the planner
directly, or in which the plan obtained from plancache.c is not a
generic one.
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 2 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/prepare.c | 26 ++-
src/backend/executor/README | 32 ++++
src/backend/executor/execMain.c | 55 ++++++
src/backend/executor/execParallel.c | 27 ++-
src/backend/executor/execPartition.c | 238 +++++++++++++++++++++----
src/backend/executor/execUtils.c | 1 +
src/backend/executor/functions.c | 2 +-
src/backend/executor/nodeAppend.c | 11 +-
src/backend/executor/nodeMergeAppend.c | 5 +-
src/backend/executor/spi.c | 27 ++-
src/backend/nodes/readfuncs.c | 8 +-
src/backend/optimizer/plan/planner.c | 2 +
src/backend/optimizer/plan/setrefs.c | 46 +++++
src/backend/partitioning/partprune.c | 41 ++++-
src/backend/tcop/postgres.c | 8 +-
src/backend/tcop/pquery.c | 28 ++-
src/backend/utils/cache/plancache.c | 187 ++++++++++++++++---
src/backend/utils/mmgr/portalmem.c | 19 ++
src/include/commands/explain.h | 4 +-
src/include/executor/execPartition.h | 9 +-
src/include/executor/execdesc.h | 3 +
src/include/executor/executor.h | 2 +
src/include/nodes/execnodes.h | 1 +
src/include/nodes/nodes.h | 1 +
src/include/nodes/pathnodes.h | 12 ++
src/include/nodes/plannodes.h | 47 +++++
src/include/utils/plancache.h | 3 +-
src/include/utils/portal.h | 3 +
33 files changed, 763 insertions(+), 100 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 2527e66059..df4b0dcf0e 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -558,7 +558,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 152c29b551..462651910a 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index f86983c660..219c63fa81 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -407,7 +407,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
@@ -515,7 +515,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, PartitionPruneResult *part_prune_result,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
@@ -563,7 +564,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, part_prune_result, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 6b6720c690..374c0ff807 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -776,7 +776,7 @@ execute_sql_string(const char *sql)
{
QueryDesc *qdesc;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 9ac0383459..b0ed96e56c 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -408,7 +408,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index c4b54d0547..69e02e0346 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -155,6 +155,7 @@ ExecuteQuery(ParseState *pstate,
PreparedStatement *entry;
CachedPlan *cplan;
List *plan_list;
+ List *part_prune_result_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
Portal portal;
@@ -193,7 +194,10 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
plan_list = cplan->stmt_list;
/*
@@ -207,6 +211,9 @@ ExecuteQuery(ParseState *pstate,
plan_list,
cplan);
+ /* Copy PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_result_list);
+
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
* statement is one that produces tuples. Currently we insist that it be
@@ -576,7 +583,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
- ListCell *p;
+ List *part_prune_result_list;
+ ListCell *p,
+ *pp;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -619,7 +628,10 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Replan if needed, and acquire a transient refcount */
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, queryEnv);
+ CurrentResourceOwner, queryEnv,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -634,13 +646,15 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
plan_list = cplan->stmt_list;
/* Explain each query */
- foreach(p, plan_list)
+ forboth(p, plan_list, pp, part_prune_result_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ PartitionPruneResult *part_prune_result = lfirst_node(PartitionPruneResult, pp);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ ExplainOnePlan(pstmt, part_prune_result, into, es, query_string,
+ paramLI, queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 0b5183fc4a..953a476ea5 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -65,6 +65,34 @@ found there. This currently only occurs for Append and MergeAppend nodes. In
this case the non-required subplans are ignored and the executor state's
subnode array will become out of sequence to the plan's subplan list.
+Actually, so-called execution time pruning may also occur even before
+execution has started, e.g., when plancache.c's GetCachedPlan() validates a
+cached generic plan by locking all the relations it will scan. If the
+works by locking all the relations that will be scanned by that plan. If the
+generic plan contains nodes that can perform execution time partition pruning
+(that is, contain a PartitionPruneInfo), the subset of pruning steps in a
+given node's PartitionPruneInfo that do not depend on execution actually
+having started (called "initial" pruning steps) is performed at this point
+to figure out the minimal set of child subplans that satisfy those
+pruning steps. AcquireExecutorLocks() looking at a given plan tree will then
+lock only the relations scanned by the child subplans that survived such
+pruning, along with those present in PlannedStmt.minLockRelids. Note that the
+subplans are only notionally pruned in that they are not removed from the plan
+tree as such.
+
+To prevent the executor and any third party execution code that can look at
+the plan tree from trying to execute the subplans that were pruned as
+described above, the result of pruning is passed to the executor as a
+PartitionPruneResult node via the QueryDesc. It consists of the set of
+indexes of surviving subplans in their respective parent plan node's list of
+child subplans, saved as a list of bitmapsets, with one element for every
+parent plan node whose PartitionPruneInfo is present in
+PlannedStmt.partPruneInfos. In other words, the executor should not
+re-evaluate the set of initially valid subplans by redoing the initial pruning
+if it was already done by AcquireExecutorLocks(), because the re-evaluation may
+very well end up resulting in a different set of subplans, containing some
+whose relations were not locked by AcquireExecutorLocks().
+
Each Plan node may have expression trees associated with it, to represent
its target list, qualification conditions, etc. These trees are also
read-only to the executor, but the executor state for expression evaluation
@@ -286,6 +314,10 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ [ ExecutorDoInitialPruning ] --- an optional step to perform initial
+ partition pruning on the plan tree, the result of which is passed
+ to the executor via QueryDesc
+
CreateQueryDesc
ExecutorStart
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 32475e33ff..6e2cd1596f 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -49,6 +49,7 @@
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/execdebug.h"
+#include "executor/execPartition.h"
#include "executor/nodeSubplan.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
@@ -104,6 +105,58 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
/* end of local decls */
+/* ----------------------------------------------------------------
+ * ExecutorDoInitialPruning
+ *
+ * For each plan tree node that has been assigned a PartitionPruneInfo,
+ * this performs initial partition pruning using the information contained
+ * therein to determine the set of child subplans that satisfy the initial
+ * pruning steps, to be returned as a bitmapset of their indexes in the
+ * node's list of child subplans (for example, an Append's appendplans).
+ *
+ * Return value is a PartitionPruneResult node that contains a list of those
+ * bitmapsets, with one element for every PartitionPruneInfo, and a bitmapset
+ * of the RT indexes of all the leaf partitions scanned by those chosen
+ * subplans. Note that the latter is shared across all PartitionPruneInfos.
+ *
+ * The executor must see exactly the same set of subplans as valid for
+ * execution when doing ExecInitNode() on the plan nodes whose
+ * PartitionPruneInfos are processed here. So, it must get the set from the
+ * aforementioned PartitionPruneResult, instead of computing it all over
+ * again by redoing the initial pruning. It's the caller's job to pass the
+ * PartitionPruneResult to the executor.
+ *
+ * Note: Partitioned tables mentioned in PartitionedRelPruneInfo nodes that
+ * drive the pruning will be locked before doing the pruning.
+ * ----------------------------------------------------------------
+ */
+PartitionPruneResult *
+ExecutorDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params)
+{
+ PartitionPruneResult *result;
+ ListCell *lc;
+
+ /* Only get here if there is any pruning to do. */
+ Assert(plannedstmt->containsInitialPruning);
+
+ result = makeNode(PartitionPruneResult);
+ foreach(lc, plannedstmt->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ Bitmapset *valid_subplan_offs;
+
+ valid_subplan_offs =
+ ExecPartitionDoInitialPruning(plannedstmt, params, pruneinfo,
+ &result->scan_leafpart_rtis);
+ if (valid_subplan_offs)
+ valid_subplan_offs->type = T_Bitmapset;
+ result->valid_subplan_offs_list =
+ lappend(result->valid_subplan_offs_list,
+ valid_subplan_offs);
+ }
+
+ return result;
+}
/* ----------------------------------------------------------------
* ExecutorStart
@@ -806,6 +859,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ PartitionPruneResult *part_prune_result = queryDesc->part_prune_result;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -826,6 +880,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_plannedstmt = plannedstmt;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ estate->es_part_prune_result = part_prune_result;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index aca0c6f323..abae5b8623 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -66,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITIONPRUNERESULT UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -182,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->containsInitialPruning = false;
pstmt->planTree = plan;
pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
@@ -597,12 +599,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_result_data;
+ char *part_prune_result_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_result_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -631,6 +636,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_result_data = nodeToString(estate->es_part_prune_result);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -657,6 +663,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized PartitionPruneResult. */
+ part_prune_result_len = strlen(part_prune_result_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_result_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -751,6 +762,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized PartitionPruneResult */
+ part_prune_result_space = shm_toc_allocate(pcxt->toc, part_prune_result_len);
+ memcpy(part_prune_result_space, part_prune_result_data, part_prune_result_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITIONPRUNERESULT,
+ part_prune_result_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1232,8 +1249,10 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_result_space;
char *paramspace;
PlannedStmt *pstmt;
+ PartitionPruneResult *part_prune_result;
ParamListInfo paramLI;
char *queryString;
@@ -1244,12 +1263,18 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
pstmtspace = shm_toc_lookup(toc, PARALLEL_KEY_PLANNEDSTMT, false);
pstmt = (PlannedStmt *) stringToNode(pstmtspace);
+ /* Reconstruct leader-supplied PartitionPruneResult. */
+ part_prune_result_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITIONPRUNERESULT, false);
+ part_prune_result = (PartitionPruneResult *)
+ stringToNode(part_prune_result_space);
+
/* Reconstruct ParamListInfo. */
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(¶mspace);
/* Create a QueryDesc for the query. */
- return CreateQueryDesc(pstmt,
+ return CreateQueryDesc(pstmt, part_prune_result,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 80197d5141..b612c24d62 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -25,6 +25,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -185,7 +186,11 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *pruneinfo);
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -198,7 +203,8 @@ static void PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans);
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis);
/*
@@ -1746,8 +1752,10 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* considered to be a stable expression, it can change value from one plan
* node scan to the next during query execution. Stable comparison
* expressions that don't involve such Params allow partition pruning to be
- * done once during executor startup. Expressions that do involve such Params
- * require us to prune separately for each scan of the parent plan node.
+ * done once during executor startup or during ExecutorDoInitialPruning() that
+ * runs as part of performing AcquireExecutorLocks() on a given plan tree.
+ * Expressions that do involve such Params require us to prune separately for
+ * each scan of the parent plan node.
*
* Note that pruning away unneeded subplans during executor startup has the
* added benefit of not having to initialize the unneeded subplans at all.
@@ -1764,6 +1772,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* account for initial pruning possibly having eliminated some of the
* subplans.
*
+ * ExecPartitionDoInitialPruning:
+ * Do initial pruning with the information contained in a given
+ * PartitionPruneInfo to determine the minimal set of child subplans of
+ * the parent plan node (to which the PartitionPruneInfo belongs) that must
+ * be executed, and also the set of RT indexes of leaf partitions that will
+ * be scanned with those subplans.
+ *
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating the expressions
* that are safe to evaluate at a given point. This function is first
@@ -1781,8 +1796,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* On return, *initially_valid_subplans is assigned the set of indexes of
* child subplans that must be initialized along with the parent plan node.
- * Initial pruning is performed here if needed and in that case only the
- * surviving subplans' indexes are added.
+ * Initial pruning is performed here if needed (unless it has already been done
+ * by ExecutorDoInitialPruning()), and in that case only the surviving
+ * subplans' indexes are added.
*
* If subplans are indeed pruned, subplan_map arrays contained in the returned
* PartitionPruneState are re-sequenced to not count those, though only if the
@@ -1794,28 +1810,62 @@ ExecInitPartitionPruning(PlanState *planstate,
int part_prune_index,
Bitmapset **initially_valid_subplans)
{
- PartitionPruneState *prunestate;
+ PartitionPruneState *prunestate = NULL;
EState *estate = planstate->state;
PartitionPruneInfo *pruneinfo = list_nth(estate->es_part_prune_infos,
part_prune_index);
+ PartitionPruneResult *pruneresult = estate->es_part_prune_result;
+ bool do_pruning = (pruneinfo->needs_init_pruning ||
+ pruneinfo->needs_exec_pruning);
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, planstate);
+ /*
+ * No need to do initial pruning if it was done already by
+ * ExecutorDoInitialPruning(), which it would be if es_part_prune_result
+ * has been set.
+ */
+ if (pruneresult)
+ do_pruning = pruneinfo->needs_exec_pruning;
- /* Create the working data structure for pruning */
- prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ if (do_pruning)
+ {
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
+
+ /* For data reading, executor always omits detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
+ /*
+ * Create the working data structure for pruning. No need to consider
+ * initial pruning steps if we have a PartitionPruneResult.
+ */
+ prunestate = CreatePartitionPruneState(planstate, pruneinfo,
+ pruneresult == NULL, true,
+ NIL, planstate->ps_ExprContext,
+ estate->es_partition_directory);
+ }
/*
* Perform an initial partition prune pass, if required.
*/
- if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ if (pruneresult)
+ {
+ *initially_valid_subplans =
+ list_nth(pruneresult->valid_subplan_offs_list, part_prune_index);
+ }
+ else if (prunestate && prunestate->do_initial_prune)
+ {
+ *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true,
+ NULL);
+ }
else
{
- /* No pruning, so we'll need to initialize all subplans */
+ /* No initial pruning, so we'll need to initialize all subplans */
Assert(n_total_subplans > 0);
*initially_valid_subplans = bms_add_range(NULL, 0,
n_total_subplans - 1);
+ return prunestate;
}
/*
@@ -1823,7 +1873,8 @@ ExecInitPartitionPruning(PlanState *planstate,
* that were removed above due to initial pruning. No need to do this if
* no steps were removed.
*/
- if (bms_num_members(*initially_valid_subplans) < n_total_subplans)
+ if (prunestate &&
+ bms_num_members(*initially_valid_subplans) < n_total_subplans)
{
/*
* We can safely skip this when !do_exec_prune, even though that
@@ -1839,11 +1890,74 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecPartitionDoInitialPruning
+ * Perform initial pruning using given PartitionPruneInfo to determine
+ * the minimal set of child subplans that will be executed and also the
+ * set of RT indexes of the leaf partitions scanned by those subplans.
+ */
+Bitmapset *
+ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *rtable = plannedstmt->rtable;
+ ExprContext *econtext;
+ PartitionDirectory pdir;
+ MemoryContext oldcontext,
+ tmpcontext;
+ PartitionPruneState *prunestate;
+ Bitmapset *valid_subplan_offs;
+
+ /*
+ * A temporary context for memory allocations required while executing
+ * partition pruning steps.
+ */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "initial pruning working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
+
+ /*
+ * PartitionDirectory to look up partition descriptors.
+ * Note that we don't omit detached partitions, just like during
+ * execution proper.
+ */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+
+ /*
+ * We don't yet have a PlanState for the parent plan node, so we must
+ * create a standalone ExprContext to evaluate pruning expressions,
+ * equipped with the information about the EXTERN parameters that the
+ * caller passed us. Note that that's okay because the initial pruning
+ * steps do not contain anything that requires the execution to have
+ * started and thus needs the information contained in a PlanState.
+ */
+ econtext = CreateStandaloneExprContext();
+ econtext->ecxt_param_list_info = params;
+ prunestate = CreatePartitionPruneState(NULL, pruneinfo, true, false,
+ rtable, econtext, pdir);
+ MemoryContextSwitchTo(oldcontext);
+
+ /* Do the initial pruning. */
+ valid_subplan_offs = ExecFindMatchingSubPlans(prunestate, true,
+ scan_leafpart_rtis);
+
+ FreeExprContext(econtext, true);
+ DestroyPartitionDirectory(pdir);
+ MemoryContextDelete(tmpcontext);
+
+ return valid_subplan_offs;
+}
+
/*
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
*
- * 'planstate' is the parent plan node's execution state.
+ * 'planstate', if not NULL, is the parent plan node's execution state. It
+ * can be NULL if being called before ExecutorStart(), in which case,
+ * 'rtable' (range table), 'econtext', and 'partdir' must be explicitly
+ * provided.
*
* 'pruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
@@ -1857,19 +1971,21 @@ ExecInitPartitionPruning(PlanState *planstate,
* PartitionedRelPruneInfo.
*/
static PartitionPruneState *
-CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
+CreatePartitionPruneState(PlanState *planstate,
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir)
{
- EState *estate = planstate->state;
+ EState *estate = planstate ? planstate->state : NULL;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
- /* For data reading, executor always omits detached partitions */
- if (estate->es_partition_directory == NULL)
- estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
+ Assert((estate != NULL) ||
+ (partdir != NULL && econtext != NULL && rtable != NIL));
n_part_hierarchies = list_length(pruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
@@ -1924,15 +2040,42 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * Must open the relation by ourselves when called before the
+ * execution has started, such as when called during
+ * ExecutorDoInitialPruning() on a cached plan. In that case,
+ * sub-partitions must be locked, because AcquirePlannerLocks()
+ * would not have seen them. (1st relation in a partrelpruneinfos
+ * list is always the root partitioned table appearing in the
+ * query, which AcquirePlannerLocks() would have locked; the
+ * Assert in relation_open() guards that assumption.)
+ */
+ if (estate == NULL)
+ {
+ RangeTblEntry *rte = rt_fetch(pinfo->rtindex, rtable);
+ int lockmode = (j == 0) ? NoLock : rte->rellockmode;
+
+ partrel = table_open(rte->relid, lockmode);
+ }
+ else
+ partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+
+ /*
+ * We can rely on the copy of the partitioned table's partition
+ * key in its relcache entry, because it can't change (or
+ * get destroyed) as long as the relation is locked. Partition
+ * descriptor is taken from the PartitionDirectory associated with
+ * the table that is held open long enough for the descriptor to
+ * remain valid while it's used to perform the pruning steps.
*/
- partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
partkey = RelationGetPartitionKey(partrel);
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partdesc = PartitionDirectoryLookup(partdir, partrel);
+
+ /*
+ * Must close partrel, keeping the lock taken, if we're not using
+ * EState's entry.
+ */
+ if (estate == NULL)
+ table_close(partrel, NoLock);
/*
* Initialize the subplan_map and subpart_map.
@@ -1946,6 +2089,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
Assert(partdesc->nparts >= pinfo->nparts);
pprune->nparts = partdesc->nparts;
pprune->subplan_map = palloc(sizeof(int) * partdesc->nparts);
+ pprune->rti_map = palloc(sizeof(Index) * partdesc->nparts);
if (partdesc->nparts == pinfo->nparts)
{
/*
@@ -1956,6 +2100,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pprune->subpart_map = pinfo->subpart_map;
memcpy(pprune->subplan_map, pinfo->subplan_map,
sizeof(int) * pinfo->nparts);
+ memcpy(pprune->rti_map, pinfo->rti_map,
+ sizeof(int) * pinfo->nparts);
/*
* Double-check that the list of unpruned relations has not
@@ -2006,6 +2152,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pinfo->subplan_map[pd_idx];
pprune->subpart_map[pp_idx] =
pinfo->subpart_map[pd_idx];
+ pprune->rti_map[pp_idx] =
+ pinfo->rti_map[pd_idx];
pd_idx++;
}
else
@@ -2013,6 +2161,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
/* this partdesc entry is not in the plan */
pprune->subplan_map[pp_idx] = -1;
pprune->subpart_map[pp_idx] = -1;
+ pprune->rti_map[pp_idx] = 0;
}
}
@@ -2034,7 +2183,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* Initialize pruning contexts as needed.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
- if (pinfo->initial_pruning_steps)
+ if (consider_initial_steps && pinfo->initial_pruning_steps)
{
InitPartitionPruneContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
@@ -2044,7 +2193,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps)
+ if (consider_exec_steps && pinfo->exec_pruning_steps)
{
InitPartitionPruneContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
@@ -2272,10 +2421,14 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
* runtime pruning.
+ *
+ * RT indexes of leaf partitions scanned by the chosen subplans are added to
+ * *scan_leafpart_rtis if the pointer is non-NULL.
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2310,7 +2463,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
*/
pprune = &prunedata->partrelprunedata[0];
find_matching_subplans_recurse(prunedata, pprune, initial_prune,
- &result);
+ &result, scan_leafpart_rtis);
/* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
@@ -2324,6 +2477,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_copy(*scan_leafpart_rtis);
MemoryContextReset(prunestate->prune_context);
@@ -2334,13 +2489,15 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* find_matching_subplans_recurse
* Recursive worker function for ExecFindMatchingSubPlans
*
- * Adds valid (non-prunable) subplan IDs to *validsubplans
+ * Adds valid (non-prunable) subplan IDs to *validsubplans and RT indexes of
+ * the corresponding leaf partitions to *scan_leafpart_rtis (if asked for).
*/
static void
find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans)
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *partset;
int i;
@@ -2367,8 +2524,14 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
while ((i = bms_next_member(partset, i)) >= 0)
{
if (pprune->subplan_map[i] >= 0)
+ {
*validsubplans = bms_add_member(*validsubplans,
pprune->subplan_map[i]);
+ Assert(pprune->rti_map[i] > 0);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_add_member(*scan_leafpart_rtis,
+ pprune->rti_map[i]);
+ }
else
{
int partidx = pprune->subpart_map[i];
@@ -2376,7 +2539,8 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
if (partidx >= 0)
find_matching_subplans_recurse(prunedata,
&prunedata->partrelprunedata[partidx],
- initial_prune, validsubplans);
+ initial_prune, validsubplans,
+ scan_leafpart_rtis);
else
{
/*
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 21f4c10937..bb7d028463 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -134,6 +134,7 @@ CreateExecutorState(void)
estate->es_param_exec_vals = NULL;
estate->es_queryEnv = NULL;
+ estate->es_part_prune_result = NULL;
estate->es_query_cxt = qcontext;
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index e134a82ff7..901768cc34 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -842,7 +842,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
- es->qd = CreateQueryDesc(es->stmt,
+ es->qd = CreateQueryDesc(es->stmt, NULL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index c6f86a6510..96880e122a 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -155,7 +155,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (appendstate->as_prune_state == NULL ||
+ (!appendstate->as_prune_state->do_exec_prune && nplans > 0))
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -577,7 +578,7 @@ choose_next_subplan_locally(AppendState *node)
}
else if (node->as_valid_subplans == NULL)
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
whichplan = -1;
}
@@ -642,7 +643,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
/*
* Mark each invalid plan as finished to allow the loop below to
@@ -717,7 +718,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
mark_invalid_subplans_as_finished(node);
}
@@ -868,7 +869,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
classify_matching_subplans(node);
}
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 8d35860c30..2312e5a633 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -103,7 +103,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplan, we can fill ms_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (mergestate->ms_prune_state == NULL ||
+ (!mergestate->ms_prune_state->do_exec_prune && nplans > 0))
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -218,7 +219,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index fd5796f1b9..b3faeae2af 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1578,6 +1578,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
CachedPlanSource *plansource;
CachedPlan *cplan;
List *stmt_list;
+ List *part_prune_result_list;
char *query_string;
Snapshot snapshot;
MemoryContext oldcontext;
@@ -1657,7 +1658,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1685,6 +1689,9 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
stmt_list,
cplan);
+ /* Copy PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_result_list);
+
/*
* Set up options for portal. Default SCROLL type is chosen the same way
* as PerformCursorOpen does it.
@@ -2092,7 +2099,8 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
/* Get the generic plan for the query */
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ NULL /* Not interested in PartitionPruneResults */);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2473,7 +2481,9 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
- ListCell *lc2;
+ List *part_prune_result_list;
+ ListCell *lc2,
+ *lc3;
spicallbackarg.query = plansource->query_string;
@@ -2549,8 +2559,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* plan, the refcount must be backed by the plan_owner.
*/
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
-
+ plan_owner, _SPI_current->queryEnv,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
stmt_list = cplan->stmt_list;
/*
@@ -2589,9 +2601,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, part_prune_result_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ PartitionPruneResult *part_prune_result = lfirst_node(PartitionPruneResult, lc3);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2663,7 +2676,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
else
snap = InvalidSnapshot;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, part_prune_result,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 4d6902d3ac..c34226a83b 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -158,6 +158,11 @@
token = pg_strtok(&length); /* skip :fldname */ \
local_node->fldname = readIntCols(len)
+/* Read an Index array */
+#define READ_INDEX_ARRAY(fldname, len) \
+ token = pg_strtok(&length); /* skip :fldname */ \
+ local_node->fldname = readIndexCols(len)
+
/* Read a bool array */
#define READ_BOOL_ARRAY(fldname, len) \
token = pg_strtok(&length); /* skip :fldname */ \
@@ -799,7 +804,6 @@ fnname(int numCols) \
*/
READ_SCALAR_ARRAY(readAttrNumberCols, int16, atoi)
READ_SCALAR_ARRAY(readOidCols, Oid, atooid)
-/* outfuncs.c has writeIndexCols, but we don't yet need that here */
-/* READ_SCALAR_ARRAY(readIndexCols, Index, atoui) */
+READ_SCALAR_ARRAY(readIndexCols, Index, atoui)
READ_SCALAR_ARRAY(readIntCols, int, atoi)
READ_SCALAR_ARRAY(readBoolCols, bool, strtobool)
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 31fff597a7..4097cf7164 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -520,7 +520,9 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
result->partPruneInfos = glob->partPruneInfos;
+ result->containsInitialPruning = glob->containsInitialPruning;
result->rtable = glob->finalrtable;
+ result->minLockRelids = glob->minLockRelids;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 720f20f563..61d6934978 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -270,6 +270,16 @@ set_plan_references(PlannerInfo *root, Plan *plan)
*/
add_rtes_to_flat_rtable(root, false);
+ /*
+ * Add the query's adjusted range of RT indexes to glob->minLockRelids.
+ * The adjusted RT indexes of prunable relations will be deleted from the
+ * set below where PartitionPruneInfos are processed.
+ */
+ glob->minLockRelids =
+ bms_add_range(glob->minLockRelids,
+ rtoffset + 1,
+ rtoffset + list_length(root->parse->rtable));
+
/*
* Adjust RT indexes of PlanRowMarks and add to final rowmarks list
*/
@@ -352,6 +362,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach (lc, root->partPruneInfos)
{
PartitionPruneInfo *pruneinfo = lfirst(lc);
+ Bitmapset *leafpart_rtis = NULL;
ListCell *l;
foreach(l, pruneinfo->prune_infos)
@@ -362,15 +373,50 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach(l2, prune_infos)
{
PartitionedRelPruneInfo *pinfo = lfirst(l2);
+ int i;
/* RT index of the table to which the pinfo belongs. */
pinfo->rtindex += rtoffset;
+
+ /* Also of the leaf partitions that might be scanned. */
+ for (i = 0; i < pinfo->nparts; i++)
+ {
+ if (pinfo->rti_map[i] > 0 && pinfo->subplan_map[i] >= 0)
+ {
+ pinfo->rti_map[i] += rtoffset;
+ leafpart_rtis = bms_add_member(leafpart_rtis,
+ pinfo->rti_map[i]);
+ }
+ }
}
}
+ if (pruneinfo->needs_init_pruning)
+ {
+ glob->containsInitialPruning = true;
+
+ /*
+ * Delete the leaf partition RTIs from the global set of relations
+ * to be locked before executing the plan. AcquireExecutorLocks()
+ * will find the ones to add to the set after performing initial
+ * pruning.
+ */
+ glob->minLockRelids = bms_del_members(glob->minLockRelids,
+ leafpart_rtis);
+ }
+
glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
}
+ /*
+ * It seems worth doing a bms_copy() on glob->minLockRelids if we deleted
+ * bits from it above, to get rid of any empty tail bits. That way, the
+ * loop over this set in AcquireExecutorLocks() need not step through
+ * those useless bit words.
+ */
+ if (glob->containsInitialPruning)
+ glob->minLockRelids = bms_copy(glob->minLockRelids);
+
return result;
}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 6565b6ed01..37f3e6af61 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,9 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -234,6 +236,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool needs_init_pruning = false;
+ bool needs_exec_pruning = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -313,12 +317,16 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_needs_init_pruning;
+ bool partrel_needs_exec_pruning;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_needs_init_pruning,
+ &partrel_needs_exec_pruning);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -327,6 +335,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+
+ needs_init_pruning |= partrel_needs_init_pruning;
+ needs_exec_pruning |= partrel_needs_exec_pruning;
}
pfree(relid_subplan_map);
@@ -341,6 +352,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->needs_init_pruning = needs_init_pruning;
+ pruneinfo->needs_exec_pruning = needs_exec_pruning;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -441,13 +454,18 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *needs_init_pruning and *needs_exec_pruning are set to indicate that the
+ * returned PartitionedRelPruneInfos contain pruning steps that can be
+ * performed before and after execution begins, respectively.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -458,6 +476,10 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will find out below. */
+ *needs_init_pruning = false;
+ *needs_exec_pruning = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -545,6 +567,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+ * In the first pass, we also determine whether the 2nd pass will be
+ * necessary, by noting the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -619,6 +644,12 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ /* record which types of pruning steps we've seen so far */
+ if (initial_pruning_steps != NIL)
+ *needs_init_pruning = true;
+ if (exec_pruning_steps != NIL)
+ *needs_exec_pruning = true;
+
pinfolist = lappend(pinfolist, pinfo);
}
@@ -646,6 +677,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *subplan_map;
int *subpart_map;
Oid *relid_map;
+ Index *rti_map;
/*
* Construct the subplan and subpart maps for this partitioning level.
@@ -658,6 +690,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
+ rti_map = (Index *) palloc0(nparts * sizeof(Index));
present_parts = NULL;
i = -1;
@@ -672,6 +705,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
+ rti_map[i] = partrel->relid;
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
@@ -696,6 +730,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
pinfo->relid_map = relid_map;
+ pinfo->rti_map = rti_map;
}
pfree(relid_subpart_map);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 27dee29f42..5a37c4160b 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1598,6 +1598,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ List *part_prune_result_list;
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -1972,7 +1973,9 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
/*
* Now we can define the portal.
@@ -1987,6 +1990,9 @@ exec_bind_message(StringInfo input_message)
cplan->stmt_list,
cplan);
+ /* Copy PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_result_list);
+
/* Done with the snapshot used for parameter I/O and parsing/planning */
if (snapshot_set)
PopActiveSnapshot();
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5aa5a350f3..8cc2e2162d 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -35,7 +35,7 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
+static void ProcessQuery(PlannedStmt *plan, PartitionPruneResult *part_prune_result,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +65,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +78,8 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->part_prune_result = part_prune_result; /* ExecutorDoInitialPruning()
+ * output for plan */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +125,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * part_prune_result: ExecutorDoInitialPruning() output for the plan tree
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +138,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +150,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, part_prune_result, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -491,8 +496,13 @@ PortalStart(Portal portal, ParamListInfo params,
/*
* Create QueryDesc in portal's context; for the moment, set
* the destination to DestNone.
+ *
+ * There is no PartitionPruneResult unless the PlannedStmt is
+ * from a CachedPlan.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->part_prune_results == NIL ? NULL :
+ linitial(portal->part_prune_results),
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1225,6 +1235,8 @@ PortalRunMulti(Portal portal,
if (pstmt->utilityStmt == NULL)
{
+ PartitionPruneResult *part_prune_result = NULL;
+
/*
* process a plannable query.
*/
@@ -1271,10 +1283,18 @@ PortalRunMulti(Portal portal,
else
UpdateActiveSnapshotCommandId();
+ /*
+ * Determine if there's a corresponding PartitionPruneResult for
+ * this PlannedStmt.
+ */
+ if (portal->part_prune_results != NIL)
+ part_prune_result = list_nth(portal->part_prune_results,
+ foreach_current_index(stmtlist_item));
+
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_result,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1283,7 +1303,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_result,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 0d6a295674..c8281e7201 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -99,14 +99,19 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_result_list);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv);
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_result_list);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_result_list,
+ List **lockedRelids_per_stmt);
+static void ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -790,15 +795,20 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* On a "true" return, we have acquired the locks needed to run the plan.
* (We must do this for the "true" result to be race-condition-free.)
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_result_list.
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_result_list)
{
CachedPlan *plan = plansource->gplan;
/* Assert that caller checked the querytree */
Assert(plansource->is_valid);
+ *part_prune_result_list = NIL;
+
/* If there's no generic plan, just say "false" */
if (!plan)
return false;
@@ -820,13 +830,21 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ List *lockedRelids_per_stmt;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ /*
+ * Lock relations scanned by the plan. This is where the pruning
+ * happens if needed.
+ */
+ AcquireExecutorLocks(plan->stmt_list, boundParams,
+ part_prune_result_list,
+ &lockedRelids_per_stmt);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -848,7 +866,14 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ ReleaseExecutorLocks(plan->stmt_list, lockedRelids_per_stmt);
+
+ /*
+ * The output list and any objects therein have been allocated in the
+ * caller's hopefully short-lived context, so they will not remain
+ * leaked for long; still, reset the pointer so that the stale list
+ * cannot accidentally be looked at.
+ */
+ *part_prune_result_list = NIL;
}
/*
@@ -874,10 +899,15 @@ CheckCachedPlan(CachedPlanSource *plansource)
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
* (unless this is a one-shot plan, in which case we don't copy the plan).
+ *
+ * A list of NULLs is returned in *part_prune_result_list, meaning that no
+ * PartitionPruneResult nodes have yet been created for the plans in
+ * stmt_list.
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv)
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_result_list)
{
CachedPlan *plan;
List *plist;
@@ -1007,6 +1037,17 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
MemoryContextSwitchTo(oldcxt);
+ /*
+ * No actual PartitionPruneResults to add yet, though we must initialize
+ * the list to have the same number of elements as the list of
+ * PlannedStmts.
+ */
+ *part_prune_result_list = NIL;
+ foreach(lc, plist)
+ {
+ *part_prune_result_list = lappend(*part_prune_result_list, NULL);
+ }
+
return plan;
}
@@ -1126,6 +1167,17 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
+ * For every PlannedStmt found in the returned CachedPlan, an element that
+ * is either a PartitionPruneResult or NULL is added to
+ * *part_prune_result_list. It is a PartitionPruneResult if the PlannedStmt
+ * comes from an existing, otherwise valid CachedPlan and contains at least
+ * one PartitionPruneInfo with "initial" pruning steps. Those steps are
+ * performed by calling ExecutorDoInitialPruning(), which prunes away
+ * subplans that don't match the pruning conditions, so that
+ * AcquireExecutorLocks() need lock only the leaf partitions that remain.
+ * The PartitionPruneResult contains a list of bitmapsets of the indexes of
+ * matching subplans, one for each PartitionPruneInfo.
+ *
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
@@ -1139,11 +1191,13 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ List **part_prune_result_list)
{
CachedPlan *plan = NULL;
List *qlist;
bool customplan;
+ List *my_part_prune_result_list;
/* Assert caller is doing things in a sane order */
Assert(plansource->magic == CACHEDPLANSOURCE_MAGIC);
@@ -1160,7 +1214,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (CheckCachedPlan(plansource, boundParams,
+ &my_part_prune_result_list))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1169,7 +1224,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
else
{
/* Build a new generic plan */
- plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv,
+ &my_part_prune_result_list);
/* Just make real sure plansource->gplan is clear */
ReleaseGenericPlan(plansource);
/* Link the new generic plan into the plansource */
@@ -1214,7 +1270,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (customplan)
{
/* Build a custom plan */
- plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv,
+ &my_part_prune_result_list);
/* Accumulate total costs of custom plans */
plansource->total_custom_cost += cached_plan_cost(plan, true);
@@ -1246,6 +1303,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
plan->is_saved = true;
}
+ if (part_prune_result_list)
+ *part_prune_result_list = my_part_prune_result_list;
+
return plan;
}
@@ -1737,17 +1797,29 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_result_list.
+ *
+ * On return, *lockedRelids_per_stmt will contain a bitmapset for every
+ * PlannedStmt in stmt_list, containing the RT indexes of relation entries
+ * in its range table that were actually locked, or NULL if the PlannedStmt
+ * contains a utility statement.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_result_list,
+ List **lockedRelids_per_stmt)
{
ListCell *lc1;
+ *part_prune_result_list = *lockedRelids_per_stmt = NIL;
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ PartitionPruneResult *part_prune_result = NULL;
+ Bitmapset *allLockRelids;
+ Bitmapset *lockedRelids = NULL;
+ int rti;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1761,13 +1833,37 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
if (query)
- ScanQueryForLocks(query, acquire);
+ ScanQueryForLocks(query, true);
+ *part_prune_result_list = lappend(*part_prune_result_list, NULL);
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ /*
+ * Figure out the set of relations that would need to be locked
+ * before executing the plan.
+ */
+ if (plannedstmt->containsInitialPruning)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ /*
+ * Obtain the set of leaf partitions to be locked.
+ *
+ * The following does initial partition pruning using the
+ * PartitionPruneInfos found in plannedstmt->partPruneInfos and
+ * finds leaf partitions that survive that pruning across all the
+ * nodes in the plan tree.
+ */
+ part_prune_result = ExecutorDoInitialPruning(plannedstmt,
+ boundParams);
+ allLockRelids = bms_union(plannedstmt->minLockRelids,
+ part_prune_result->scan_leafpart_rtis);
+ }
+ else
+ allLockRelids = plannedstmt->minLockRelids;
+
+ rti = -1;
+ while ((rti = bms_next_member(allLockRelids, rti)) > 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
if (rte->rtekind != RTE_RELATION)
continue;
@@ -1778,10 +1874,59 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
* fail if it's been dropped entirely --- we'll just transiently
* acquire a non-conflicting lock.
*/
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
+ LockRelationOid(rte->relid, rte->rellockmode);
+ lockedRelids = bms_add_member(lockedRelids, rti);
+ }
+
+ *part_prune_result_list = lappend(*part_prune_result_list,
+ part_prune_result);
+ *lockedRelids_per_stmt = lappend(*lockedRelids_per_stmt, lockedRelids);
+ }
+}
+
+/*
+ * ReleaseExecutorLocks
+ * Release locks that would've been acquired by an earlier call to
+ * AcquireExecutorLocks()
+ */
+static void
+ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt)
+{
+ ListCell *lc1,
+ *lc2;
+
+ forboth(lc1, stmt_list, lc2, lockedRelids_per_stmt)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ Bitmapset *lockedRelids = lfirst(lc2);
+ int rti;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /*
+ * Ignore utility statements, except those (such as EXPLAIN) that
+ * contain a parsed-but-not-planned query. Note: it's okay to use
+ * ScanQueryForLocks, even though the query hasn't been through
+ * rule rewriting, because rewriting doesn't change the query
+ * representation.
+ */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ Assert(lockedRelids == NULL);
+ if (query)
+ ScanQueryForLocks(query, false);
+ continue;
+ }
+
+ rti = -1;
+ while ((rti = bms_next_member(lockedRelids, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ Assert(rte->rtekind == RTE_RELATION);
+
+ /* See the comment in AcquireExecutorLocks(). */
+ UnlockRelationOid(rte->relid, rte->rellockmode);
}
}
}
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 3a161bdb88..27407a7f0f 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -303,6 +303,25 @@ PortalDefineQuery(Portal portal,
portal->status = PORTAL_DEFINED;
}
+/*
+ * PortalStorePartitionPruneResults
+ * Copy the given list of PartitionPruneResults into the portal's
+ * context
+ *
+ * This allows the caller to ensure that the list exists as long as the portal
+ * does.
+ */
+void
+PortalStorePartitionPruneResults(Portal portal, List *part_prune_results)
+{
+ MemoryContext oldcxt;
+
+ AssertArg(PortalIsValid(portal));
+ oldcxt = MemoryContextSwitchTo(portal->portalContext);
+ portal->part_prune_results = copyObject(part_prune_results);
+ MemoryContextSwitchTo(oldcxt);
+}
+
/*
* PortalReleaseCachedPlan
* Release a portal's reference to its cached plan, if any.
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 9ebde089ae..e57e133f0e 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,9 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index bf962af7af..bd8776402e 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -45,6 +45,7 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
+ * rti_map Range table index by partition index, or 0.
* present_parts A Bitmapset of the partition indexes that we
* have subplans or subparts for.
* initial_pruning_steps List of PartitionPruneSteps used to
@@ -61,6 +62,7 @@ typedef struct PartitionedRelPruningData
int nparts;
int *subplan_map;
int *subpart_map;
+ Index *rti_map;
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
@@ -126,5 +128,10 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int part_prune_index,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis);
+extern Bitmapset *ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index e79e2c001f..60d5644908 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,8 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ PartitionPruneResult *part_prune_result; /* ExecutorDoInitialPruning()'s
+ * output for plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +59,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index ed95ed1176..6ae897d5d1 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -185,6 +185,8 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
+extern PartitionPruneResult *ExecutorDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params);
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 4a741b053f..63a89474db 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -612,6 +612,7 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
+ struct PartitionPruneResult *es_part_prune_result; /* QueryDesc.part_prune_result */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index a80f43e540..937cc4629d 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -212,6 +212,7 @@ extern struct Bitmapset *readBitmapset(void);
extern uintptr_t readDatum(bool typbyval);
extern bool *readBoolCols(int numCols);
extern int *readIntCols(int numCols);
+extern Index *readIndexCols(int numCols);
extern Oid *readOidCols(int numCols);
extern int16 *readAttrNumberCols(int numCols);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index e392fb6fc0..494ae461be 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -125,6 +125,18 @@ typedef struct PlannerGlobal
/* List of PartitionPruneInfo contained in the plan */
List *partPruneInfos;
+ /*
+ * Do any of those PartitionPruneInfos have initial pruning steps in them?
+ */
+ bool containsInitialPruning;
+
+ /*
+ * Indexes of all range table entries minus indexes of range table entries
+ * of the leaf partitions scanned by prunable subplans; see
+ * AcquireExecutorLocks()
+ */
+ Bitmapset *minLockRelids;
+
/* OIDs of relations the plan depends on */
List *relationOids;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 3eb3e6e527..a1e06719e6 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -73,8 +73,17 @@ typedef struct PlannedStmt
List *partPruneInfos; /* List of PartitionPruneInfo contained in
* the plan */
+ bool containsInitialPruning; /* Do any of those PartitionPruneInfos
+ * have initial pruning steps in them?
+ */
+
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *minLockRelids; /* Indexes of all range table entries minus
+ * indexes of range table entries of the leaf
+ * partitions scanned by prunable subplans;
+ * see AcquireExecutorLocks() */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
List *resultRelations; /* integer list of RT indexes, or NIL */
@@ -1410,6 +1419,13 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * needs_init_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its initial_pruning_steps set?
+ *
+ * needs_exec_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its exec_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1420,6 +1436,8 @@ typedef struct PartitionPruneInfo
NodeTag type;
List *prune_infos;
+ bool needs_init_pruning;
+ bool needs_exec_pruning;
Bitmapset *other_subplans;
} PartitionPruneInfo;
@@ -1464,6 +1482,9 @@ typedef struct PartitionedRelPruneInfo
/* relation OID by partition index, or 0 */
Oid *relid_map pg_node_attr(array_size(nparts));
+ /* Range table index by partition index, or 0. */
+ Index *rti_map pg_node_attr(array_size(nparts));
+
/*
* initial_pruning_steps shows how to prune during executor startup (i.e.,
* without use of any PARAM_EXEC Params); it is NIL if no startup pruning
@@ -1548,6 +1569,32 @@ typedef struct PartitionPruneStepCombine
List *source_stepids;
} PartitionPruneStepCombine;
+/*----------------
+ * PartitionPruneResult
+ *
+ * The result of an ExecutorDoInitialPruning() invocation on a given
+ * PlannedStmt.
+ *
+ * Contains a list of Bitmapsets of the indexes of the subplans remaining after
+ * performing initial pruning by calling ExecFindMatchingSubPlans() for every
+ * PartitionPruneInfo found in PlannedStmt.partPruneInfos. RT indexes of the
+ * leaf partitions scanned by those subplans across all PartitionPruneInfos
+ * are added into scan_leafpart_rtis.
+ *
+ * This is used by GetCachedPlan() to inform its callers of the pruning
+ * decisions made when performing AcquireExecutorLocks() on a given cached
+ * PlannedStmt; the callers then pass the node on to the executor. The executor
+ * refers to this node when initializing the plan nodes which contain subplans
+ * that may have been pruned by ExecutorDoInitialPruning(), rather than
+ * redoing initial pruning.
+ */
+typedef struct PartitionPruneResult
+{
+ NodeTag type;
+
+ List *valid_subplan_offs_list;
+ Bitmapset *scan_leafpart_rtis;
+} PartitionPruneResult;
/*
* Plan invalidation info
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 0499635f59..1c5bb5ece1 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -220,7 +220,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ List **part_prune_result_list);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aeddbdafe5..9f7727a837 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,7 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *part_prune_results; /* list of PartitionPruneResults */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -242,6 +243,8 @@ extern void PortalDefineQuery(Portal portal,
CommandTag commandTag,
List *stmts,
CachedPlan *cplan);
+extern void PortalStorePartitionPruneResults(Portal portal,
+ List *part_prune_result_list);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
--
2.35.3
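As an aside, to make the locking rule in the AcquireExecutorLocks() hunk
above concrete: the plan carries a precomputed minLockRelids set of RT
indexes that must always be locked, and when the plan contains initial
pruning steps, only the leaf-partition RTIs that survive the pruning are
added back. Below is a minimal standalone C model of that set arithmetic,
using a toy bitmask type; it is only a sketch, the real code operates on
Bitmapsets with bms_union().

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Toy stand-in for a Bitmapset: one bit per range table index. */
typedef uint64_t relidset;

static relidset
relids_to_lock(relidset min_lock_relids,
               relidset surviving_leafpart_rtis,
               bool contains_initial_pruning)
{
    /* No initial pruning steps: the minimal set is already complete. */
    if (!contains_initial_pruning)
        return min_lock_relids;

    /* Otherwise, add back just the leaf partitions that pruning kept. */
    return min_lock_relids | surviving_leafpart_rtis;
}

int
main(void)
{
    relidset    minlock = (1ULL << 1) | (1ULL << 2);    /* RTIs 1 and 2 */
    relidset    survivors = 1ULL << 5;  /* pruning kept only RTI 5 */

    printf("RT indexes to lock: 0x%llx\n",
           (unsigned long long) relids_to_lock(minlock, survivors, true));
    return 0;
}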
v21-0002-Allow-adding-Bitmapsets-as-Nodes-into-plan-trees.patch
From 41465f94e426a0b22b070ab8034de19cfdb6daa4 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Thu, 6 Oct 2022 17:31:37 +0900
Subject: [PATCH v21 2/3] Allow adding Bitmapsets as Nodes into plan trees
Note that this only adds some infrastructure bits; none of the
existing bitmapsets that are added to plan trees has been changed
to use the Node version. So the plan trees, or really the
bitmapsets contained in them, look the same as before as far as
Node write/read functionality is concerned.
This is needed because it is not currently possible to write and
then read back Bitmapsets that are not direct members of write/read
capable Nodes; for example, if one needs to add a List of Bitmapsets
to a plan tree. The most straightforward way to do that is to make
Bitmapsets be written with outNode() and read with nodeRead().
---
src/backend/nodes/Makefile | 3 ++-
src/backend/nodes/copyfuncs.c | 11 +++++++++++
src/backend/nodes/equalfuncs.c | 6 ++++++
src/backend/nodes/gen_node_support.pl | 1 +
src/backend/nodes/outfuncs.c | 11 +++++++++++
src/backend/nodes/readfuncs.c | 4 ++++
src/backend/optimizer/prep/preptlist.c | 1 -
src/include/nodes/bitmapset.h | 5 +++++
src/include/nodes/meson.build | 1 +
9 files changed, 41 insertions(+), 2 deletions(-)
diff --git a/src/backend/nodes/Makefile b/src/backend/nodes/Makefile
index 7450e191ee..da5307771b 100644
--- a/src/backend/nodes/Makefile
+++ b/src/backend/nodes/Makefile
@@ -57,7 +57,8 @@ node_headers = \
nodes/replnodes.h \
nodes/supportnodes.h \
nodes/value.h \
- utils/rel.h
+ utils/rel.h \
+ nodes/bitmapset.h
# see also catalog/Makefile for an explanation of these make rules
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index e76fda8eba..1482019327 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -160,6 +160,17 @@ _copyExtensibleNode(const ExtensibleNode *from)
return newnode;
}
+/* Custom copy routine for Node bitmapsets */
+static Bitmapset *
+_copyBitmapset(const Bitmapset *from)
+{
+ Bitmapset *newnode = bms_copy(from);
+
+ newnode->type = T_Bitmapset;
+
+ return newnode;
+}
+
/*
* copyObjectImpl -- implementation of copyObject(); see nodes/nodes.h
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 0373aa30fe..e8706c461a 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -210,6 +210,12 @@ _equalList(const List *a, const List *b)
return true;
}
+/* Custom equal routine for Node bitmapsets */
+static bool
+_equalBitmapset(const Bitmapset *a, const Bitmapset *b)
+{
+ return bms_equal(a, b);
+}
/*
* equal
diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl
index 81b8c184a9..ccb5aff874 100644
--- a/src/backend/nodes/gen_node_support.pl
+++ b/src/backend/nodes/gen_node_support.pl
@@ -71,6 +71,7 @@ my @all_input_files = qw(
nodes/supportnodes.h
nodes/value.h
utils/rel.h
+ nodes/bitmapset.h
);
# Nodes from these input files are automatically treated as nodetag_only.
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 64c65f060b..b3ffd8cec2 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -328,6 +328,17 @@ outBitmapset(StringInfo str, const Bitmapset *bms)
appendStringInfoChar(str, ')');
}
+/* Custom write routine for Node bitmapsets */
+static void
+_outBitmapset(StringInfo str, const Bitmapset *bms)
+{
+ Assert(IsA(bms, Bitmapset));
+ WRITE_NODE_TYPE("BITMAPSET");
+
+ outBitmapset(str, bms);
+}
+
+
/*
* Print the value of a Datum given its type.
*/
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index b4ff855f7c..4d6902d3ac 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -230,6 +230,10 @@ _readBitmapset(void)
result = bms_add_member(result, val);
}
+ /* XXX maybe do `result = makeNode(Bitmapset);` at the top? */
+ if (result)
+ result->type = T_Bitmapset;
+
return result;
}
diff --git a/src/backend/optimizer/prep/preptlist.c b/src/backend/optimizer/prep/preptlist.c
index 137b28323d..e5c1103316 100644
--- a/src/backend/optimizer/prep/preptlist.c
+++ b/src/backend/optimizer/prep/preptlist.c
@@ -337,7 +337,6 @@ extract_update_targetlist_colnos(List *tlist)
return update_colnos;
}
-
/*****************************************************************************
*
* TARGETLIST EXPANSION
diff --git a/src/include/nodes/bitmapset.h b/src/include/nodes/bitmapset.h
index 75b5ce1a8e..9046ca177f 100644
--- a/src/include/nodes/bitmapset.h
+++ b/src/include/nodes/bitmapset.h
@@ -20,6 +20,8 @@
#ifndef BITMAPSET_H
#define BITMAPSET_H
+#include "nodes/nodes.h"
+
/*
* Forward decl to save including pg_list.h
*/
@@ -48,6 +50,9 @@ typedef int32 signedbitmapword; /* must be the matching signed type */
typedef struct Bitmapset
{
+ pg_node_attr(custom_copy_equal, custom_read_write)
+
+ NodeTag type;
int nwords; /* number of words in array */
bitmapword words[FLEXIBLE_ARRAY_MEMBER]; /* really [nwords] */
} Bitmapset;
diff --git a/src/include/nodes/meson.build b/src/include/nodes/meson.build
index b7df232081..94701af8e1 100644
--- a/src/include/nodes/meson.build
+++ b/src/include/nodes/meson.build
@@ -19,6 +19,7 @@ node_support_input_i = [
'nodes/supportnodes.h',
'nodes/value.h',
'utils/rel.h',
+ 'nodes/bitmapset.h',
]
node_support_input = []
--
2.35.3
On Wed, Oct 12, 2022 at 4:36 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Fri, Jul 29, 2022 at 1:20 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Thu, Jul 28, 2022 at 1:27 AM Robert Haas <robertmhaas@gmail.com> wrote:
0001 adds es_part_prune_result but does not use it, so maybe the
introduction of that field should be deferred until it's needed for
something.

Oops, looks like a mistake when breaking the patch. Will move that bit
to 0002.
Fixed that and also noticed that I had defined PartitionPruneResult in
the wrong header (execnodes.h). That led to PartitionPruneResult
nodes not being able to be written and read, because
src/backend/nodes/gen_node_support.pl doesn't create _out* and _read*
routines for the nodes defined in execnodes.h. I moved its definition
to plannodes.h, even though it is not actually the planner that
instantiates those; no other include/nodes header sounds better.One more thing I realized is that Bitmapsets added to the List
PartitionPruneResult.valid_subplan_offs_list are not actually
read/write-able. That's a problem that I also faced in [1], so I
proposed a patch there to make Bitmapset a read/write-able Node and
mark (only) the Bitmapsets that are added into read/write-able node
trees with the corresponding NodeTag. I'm including that patch here
as well (0002) for the main patch to work (pass
-DWRITE_READ_PARSE_PLAN_TREES build tests), though it might make sense
to discuss it in its own thread?
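To illustrate the problem being described here, below is a toy model,
not PostgreSQL code, of the tag dispatch that node serialization relies
on: the writer looks at the NodeTag stored at the start of every node,
so a bare Bitmapset placed inside a List gives it nothing to dispatch
on. Reserving a T_Bitmapset tag and stamping it on sets that go into
node trees, as the 0002 patch does, is what closes that gap.

#include <stdio.h>

/* Toy subset of the node machinery; tags and layout are illustrative. */
typedef enum NodeTag { T_Invalid = 0, T_Bitmapset } NodeTag;

typedef struct Bitmapset
{
    NodeTag     type;       /* the tag that the 0002 patch adds */
    int         nwords;
    unsigned    words[1];
} Bitmapset;

/* A writer can only dispatch on the tag found at the start of the node. */
static void
outNode(const void *node)
{
    switch (*(const NodeTag *) node)
    {
        case T_Bitmapset:
            printf("(b %u)", ((const Bitmapset *) node)->words[0]);
            break;
        default:
            printf("<no recognizable tag, cannot serialize>");
            break;
    }
}

int
main(void)
{
    Bitmapset   b = {T_Bitmapset, 1, {0x29}};   /* members 0, 3 and 5 */

    outNode(&b);            /* prints "(b 41)" */
    putchar('\n');
    return 0;
}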
I've had second thoughts on the use of a List of Bitmapsets for this,
with the result that the make-Bitmapset-Nodes patch is no longer needed.
I had defined PartitionPruneResult such that it stood for the results
of pruning for all PartitionPruneInfos contained in
PlannedStmt.partPruneInfos (covering all Append/MergeAppend nodes that
can use partition pruning in a given plan). So, it had a List of
Bitmapsets. I think it's perhaps better for PartitionPruneResult to
cover only one PartitionPruneInfo and thus need only a Bitmapset and
not a List thereof, which I have implemented in the attached updated
patch 0002. So, instead of needing to pass around a
PartitionPruneResult with each PlannedStmt, this now passes a List of
PartitionPruneResult with one entry for each element of
PlannedStmt.partPruneInfos.
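For concreteness, the reworked node would presumably look something like
the sketch below; the field name is an assumption extrapolated from
v21's valid_subplan_offs_list rather than quoted from the v22 patch:

typedef struct PartitionPruneResult
{
    NodeTag     type;

    /*
     * 0-based indexes of the subplans that survive the initial pruning
     * steps of the single PartitionPruneInfo this result corresponds to;
     * a plain Bitmapset now suffices where a List of them was needed.
     * (Hypothetical field name, not taken from the actual patch.)
     */
    Bitmapset  *valid_subplan_offs;
} PartitionPruneResult;

A plan with N entries in PlannedStmt.partPruneInfos would then travel
with a List of N such nodes.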
Thanks,

--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v22-0001-Move-PartitioPruneInfo-out-of-plan-nodes-into-Pl.patch
From 27db8ab066dace77953d71a6446788190b66ce60 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Fri, 27 May 2022 16:00:28 +0900
Subject: [PATCH v22 1/2] Move PartitionPruneInfo out of plan nodes into
PlannedStmt
The planner will now add a given PartitionPruneInfo to
PlannedStmt.partPruneInfos instead of directly to the
Append/MergeAppend plan node. What gets set instead in the
latter is an index field which points to the list element
of PlannedStmt.partPruneInfos containing the PartitionPruneInfo
belonging to the plan node.
A later commit will make AcquireExecutorLocks() do the initial
partition pruning to determine a minimal set of partitions to be
locked when validating a plan tree and it will need to consult the
PartitionPruneInfos referenced therein to do so. It would be better
for the PartitionPruneInfos to be accessible directly than to require
a walk of the plan tree to find them; simply iterating over
PlannedStmt.partPruneInfos makes that easy.
---
src/backend/executor/execMain.c | 1 +
src/backend/executor/execParallel.c | 1 +
src/backend/executor/execPartition.c | 4 +-
src/backend/executor/execUtils.c | 1 +
src/backend/executor/nodeAppend.c | 4 +-
src/backend/executor/nodeMergeAppend.c | 4 +-
src/backend/optimizer/plan/createplan.c | 24 ++++-----
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 65 +++++++++++++------------
src/backend/partitioning/partprune.c | 18 ++++---
src/include/executor/execPartition.h | 3 +-
src/include/nodes/execnodes.h | 1 +
src/include/nodes/pathnodes.h | 6 +++
src/include/nodes/plannodes.h | 11 +++--
src/include/partitioning/partprune.h | 8 +--
15 files changed, 90 insertions(+), 62 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index d78862e660..32475e33ff 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -825,6 +825,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable);
estate->es_plannedstmt = plannedstmt;
+ estate->es_part_prune_infos = plannedstmt->partPruneInfos;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 99512826c5..aca0c6f323 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -183,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
pstmt->planTree = plan;
+ pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
pstmt->resultRelations = NIL;
pstmt->appendRelations = NIL;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 40e3c07693..80197d5141 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1791,11 +1791,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
PartitionPruneState *
ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
Bitmapset **initially_valid_subplans)
{
PartitionPruneState *prunestate;
EState *estate = planstate->state;
+ PartitionPruneInfo *pruneinfo = list_nth(estate->es_part_prune_infos,
+ part_prune_index);
/* We may need an expression context to evaluate partition exprs */
ExecAssignExprContext(estate, planstate);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9df1f81ea8..21f4c10937 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -119,6 +119,7 @@ CreateExecutorState(void)
estate->es_relations = NULL;
estate->es_rowmarks = NULL;
estate->es_plannedstmt = NULL;
+ estate->es_part_prune_infos = NIL;
estate->es_junkFilter = NULL;
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 357e10a1d7..c6f86a6510 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -134,7 +134,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_begun = false;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -145,7 +145,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&appendstate->ps,
list_length(node->appendplans),
- node->part_prune_info,
+ node->part_prune_index,
&validsubplans);
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index c5c62fa5c7..8d35860c30 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -82,7 +82,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.ExecProcNode = ExecMergeAppend;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -93,7 +93,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&mergestate->ps,
list_length(node->mergeplans),
- node->part_prune_info,
+ node->part_prune_index,
&validsubplans);
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index ac86ce9003..50a5719ac6 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1203,7 +1203,6 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
ListCell *subpaths;
int nasyncplans = 0;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
int nodenumsortkeys = 0;
AttrNumber *nodeSortColIdx = NULL;
Oid *nodeSortOperators = NULL;
@@ -1354,6 +1353,9 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ plan->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1377,16 +1379,14 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
}
if (prunequal != NIL)
- partpruneinfo =
- make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ plan->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
plan->appendplans = subplans;
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
- plan->part_prune_info = partpruneinfo;
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -1425,7 +1425,6 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
List *subplans = NIL;
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1518,6 +1517,9 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use for run-time pruning */
+ node->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1541,13 +1543,13 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
}
if (prunequal != NIL)
- partpruneinfo = make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ node->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
node->mergeplans = subplans;
- node->part_prune_info = partpruneinfo;
+
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 5d0fd6e072..31fff597a7 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -519,6 +519,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->dependsOnRole = glob->dependsOnRole;
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
+ result->partPruneInfos = glob->partPruneInfos;
result->rtable = glob->finalrtable;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 1cb0abdbc1..720f20f563 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -348,6 +348,29 @@ set_plan_references(PlannerInfo *root, Plan *plan)
}
}
+ /* Also fix up the information in PartitionPruneInfos. */
+ foreach (lc, root->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ ListCell *l;
+
+ foreach(l, pruneinfo->prune_infos)
+ {
+ List *prune_infos = lfirst(l);
+ ListCell *l2;
+
+ foreach(l2, prune_infos)
+ {
+ PartitionedRelPruneInfo *pinfo = lfirst(l2);
+
+ /* RT index of the table to which the pinfo belongs. */
+ pinfo->rtindex += rtoffset;
+ }
+ }
+
+ glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
+ }
+
return result;
}
@@ -1658,21 +1681,12 @@ set_append_references(PlannerInfo *root,
aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
- if (aplan->part_prune_info)
- {
- foreach(l, aplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * This node's PartitionPruneInfo will be moved into a flat list in
+ * PlannerGlobal, so adjust its index to account for the entries already
+ * added to that list.
+ */
+ if (aplan->part_prune_index >= 0)
+ aplan->part_prune_index += list_length(root->glob->partPruneInfos);
/* We don't need to recurse to lefttree or righttree ... */
Assert(aplan->plan.lefttree == NULL);
@@ -1734,21 +1748,12 @@ set_mergeappend_references(PlannerInfo *root,
mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
- if (mplan->part_prune_info)
- {
- foreach(l, mplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * This node's PartitionPruneInfo will be moved into a flat list in
+ * PlannerGlobal, so adjust its index to account for the entries already
+ * added to that list.
+ */
+ if (mplan->part_prune_index >= 0)
+ mplan->part_prune_index += list_length(root->glob->partPruneInfos);
/* We don't need to recurse to lefttree or righttree ... */
Assert(mplan->plan.lefttree == NULL);
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 6188bf69cb..6565b6ed01 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -209,16 +209,20 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
/*
* make_partition_pruneinfo
- * Builds a PartitionPruneInfo which can be used in the executor to allow
- * additional partition pruning to take place. Returns NULL when
- * partition pruning would be useless.
+ * Checks whether the given set of quals can be used to build pruning
+ * steps that the executor can use to prune away unneeded partitions. If
+ * suitable quals are found, a PartitionPruneInfo is built and appended
+ * to the PlannerInfo's partPruneInfos list.
+ *
+ * The return value is the 0-based index of the item added to the
+ * partPruneInfos list, or -1 if nothing was added.
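+ *
+ * For example, the first PartitionPruneInfo recorded for a query is
+ * appended at index 0; an Append node stores that value in its
+ * part_prune_index field, and the executor later fetches the node back
+ * with list_nth(estate->es_part_prune_infos, part_prune_index).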
*
* 'parentrel' is the RelOptInfo for an appendrel, and 'subpaths' is the list
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
*/
-PartitionPruneInfo *
+int
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
List *prunequal)
@@ -332,7 +336,7 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* quals, then we can just not bother with run-time pruning.
*/
if (prunerelinfos == NIL)
- return NULL;
+ return -1;
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
@@ -358,7 +362,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
else
pruneinfo->other_subplans = NULL;
- return pruneinfo;
+ root->partPruneInfos = lappend(root->partPruneInfos, pruneinfo);
+
+ return list_length(root->partPruneInfos) - 1;
}
/*
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 708435e952..bf962af7af 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -123,9 +123,8 @@ typedef struct PartitionPruneState
extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
bool initial_prune);
-
#endif /* EXECPARTITION_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 01b1727fc0..4a741b053f 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -611,6 +611,7 @@ typedef struct EState
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 6bda383bea..e392fb6fc0 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -122,6 +122,9 @@ typedef struct PlannerGlobal
/* "flat" list of AppendRelInfos */
List *appendRelations;
+ /* List of PartitionPruneInfo contained in the plan */
+ List *partPruneInfos;
+
/* OIDs of relations the plan depends on */
List *relationOids;
@@ -503,6 +506,9 @@ struct PlannerInfo
/* Does this query modify any partition key columns? */
bool partColsUpdated;
+
+ /* PartitionPruneInfos added in this query's plan. */
+ List *partPruneInfos;
};
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 21e642a64c..3eb3e6e527 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -70,6 +70,9 @@ typedef struct PlannedStmt
struct Plan *planTree; /* tree of Plan nodes */
+ List *partPruneInfos; /* List of PartitionPruneInfo contained in
+ * the plan */
+
List *rtable; /* list of RangeTblEntry nodes */
/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
@@ -270,8 +273,8 @@ typedef struct Append
*/
int first_partial_plan;
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index into PlannedStmt.partPruneInfos, or -1 if no run-time pruning */
+ int part_prune_index;
} Append;
/* ----------------
@@ -305,8 +308,8 @@ typedef struct MergeAppend
/* NULLS FIRST/LAST directions */
bool *nullsFirst pg_node_attr(array_size(numCols));
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index into PlannedStmt.partPruneInfos, or -1 if no run-time pruning */
+ int part_prune_index;
} MergeAppend;
/* ----------------
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 90684efa25..ebf0dcff8c 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -70,10 +70,10 @@ typedef struct PartitionPruneContext
#define PruneCxtStateIdx(partnatts, step_id, keyno) \
((partnatts) * (step_id) + (keyno))
-extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
- struct RelOptInfo *parentrel,
- List *subpaths,
- List *prunequal);
+extern int make_partition_pruneinfo(struct PlannerInfo *root,
+ struct RelOptInfo *parentrel,
+ List *subpaths,
+ List *prunequal);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
--
2.35.3
Attachment: v22-0002-Optimize-AcquireExecutorLocks-by-locking-only-un.patch (application/octet-stream)
From 5f2d5ca36111f8007a7850fd985c7e965d621149 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v22 2/2] Optimize AcquireExecutorLocks() by locking only
unpruned partitions
This commit teaches AcquireExecutorLocks() to perform initial
partition pruning, notionally eliminating the subnodes of a generic
cached plan that need not be initialized during the plan's actual
execution, and to skip locking the partitions scanned by those
subnodes.

The result of performing initial pruning this way before execution
has started is passed to the executor as PartitionPruneResult nodes,
supplied along with the PlannedStmt by those callers of the executor
that obtained the plan through plancache.c. The list is NIL when the
plan was obtained by calling the planner directly, or when the plan
returned by plancache.c is not a generic one.
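
A minimal sketch of the intended call flow, using the interfaces changed
by this patch (variable names are illustrative only):

    /* Replan if needed; initial pruning runs under AcquireExecutorLocks() */
    cplan = GetCachedPlan(plansource, params, owner, queryEnv,
                          &part_prune_results_list);

    /* one List of PartitionPruneResults per PlannedStmt in the plan */
    forboth(lc, cplan->stmt_list, lc2, part_prune_results_list)
    {
        PlannedStmt *stmt = lfirst_node(PlannedStmt, lc);
        List       *part_prune_results = lfirst_node(List, lc2);

        qdesc = CreateQueryDesc(stmt, part_prune_results, query_string,
                                snapshot, InvalidSnapshot, dest,
                                params, queryEnv, 0);
        ...
    }

Callers that call the planner directly, such as BeginCopyTo(), simply
pass NIL for the new second argument of CreateQueryDesc().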
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 2 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/prepare.c | 26 ++-
src/backend/executor/README | 32 ++++
src/backend/executor/execMain.c | 51 ++++++
src/backend/executor/execParallel.c | 26 ++-
src/backend/executor/execPartition.c | 241 +++++++++++++++++++++----
src/backend/executor/execUtils.c | 1 +
src/backend/executor/functions.c | 2 +-
src/backend/executor/nodeAppend.c | 11 +-
src/backend/executor/nodeMergeAppend.c | 5 +-
src/backend/executor/spi.c | 27 ++-
src/backend/nodes/readfuncs.c | 8 +-
src/backend/optimizer/plan/planner.c | 2 +
src/backend/optimizer/plan/setrefs.c | 46 +++++
src/backend/partitioning/partprune.c | 41 ++++-
src/backend/tcop/postgres.c | 8 +-
src/backend/tcop/pquery.c | 28 ++-
src/backend/utils/cache/plancache.c | 208 ++++++++++++++++++---
src/backend/utils/mmgr/portalmem.c | 19 ++
src/include/commands/explain.h | 4 +-
src/include/executor/execPartition.h | 9 +-
src/include/executor/execdesc.h | 3 +
src/include/executor/executor.h | 3 +
src/include/nodes/execnodes.h | 1 +
src/include/nodes/nodes.h | 1 +
src/include/nodes/pathnodes.h | 12 ++
src/include/nodes/plannodes.h | 46 +++++
src/include/utils/plancache.h | 3 +-
src/include/utils/portal.h | 3 +
33 files changed, 782 insertions(+), 100 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 2527e66059..fb8779fec0 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -558,7 +558,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NIL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 152c29b551..942449544c 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NIL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index f86983c660..2f2b558608 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -407,7 +407,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NIL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
@@ -515,7 +515,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, List *part_prune_results,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
@@ -563,7 +564,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, part_prune_results, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 6b6720c690..06dfcd4d84 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -776,7 +776,7 @@ execute_sql_string(const char *sql)
{
QueryDesc *qdesc;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, NIL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 9ac0383459..65c8d0aa59 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -408,7 +408,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NIL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index c4b54d0547..b469e05672 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -155,6 +155,7 @@ ExecuteQuery(ParseState *pstate,
PreparedStatement *entry;
CachedPlan *cplan;
List *plan_list;
+ List *part_prune_results_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
Portal portal;
@@ -193,7 +194,10 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
plan_list = cplan->stmt_list;
/*
@@ -207,6 +211,9 @@ ExecuteQuery(ParseState *pstate,
plan_list,
cplan);
+ /* Copy Lists of PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_results_list);
+
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
* statement is one that produces tuples. Currently we insist that it be
@@ -576,7 +583,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
- ListCell *p;
+ List *part_prune_results_list;
+ ListCell *p,
+ *pp;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -619,7 +628,10 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Replan if needed, and acquire a transient refcount */
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, queryEnv);
+ CurrentResourceOwner, queryEnv,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -634,13 +646,15 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
plan_list = cplan->stmt_list;
/* Explain each query */
- foreach(p, plan_list)
+ forboth(p, plan_list, pp, part_prune_results_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ List *part_prune_results = lfirst_node(List, pp);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ ExplainOnePlan(pstmt, part_prune_results, into, es, query_string,
+ paramLI, queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 0b5183fc4a..f14f9197b5 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -65,6 +65,34 @@ found there. This currently only occurs for Append and MergeAppend nodes. In
this case the non-required subplans are ignored and the executor state's
subnode array will become out of sequence to the plan's subplan list.
+Execution-time pruning can in fact occur even before execution has started.
+One case where that happens is when plancache.c validates a cached generic
+plan for execution in GetCachedPlan(), which must lock all the relations
+that the plan will scan. If the generic plan contains nodes that can
+perform execution-time partition pruning (that is, nodes containing a
+PartitionPruneInfo), the subset of pruning steps in a given node's
+PartitionPruneInfo that do not depend on execution actually having started
+(called "initial" pruning steps) is performed to figure out the minimal set
+of child subplans that satisfy those steps. AcquireExecutorLocks() then
+locks only the relations scanned by the surviving child subplans, along
+with those present in PlannedStmt.minLockRelids. Note that the subplans
+are only notionally pruned; they are not removed from the plan tree.
+
+To prevent the executor, and any third-party execution code that inspects
+the plan tree, from trying to execute the subplans pruned as described
+above, the result of the pruning is passed to the executor as a List of
+PartitionPruneResult nodes via the QueryDesc. Each PartitionPruneResult
+stores, as a bitmapset (valid_subplan_offs), the indexes of the surviving
+subplans within the child-subplan list of the parent plan node that owns
+the corresponding PartitionPruneInfo. Consequently, when executing a
+generic plan, the executor must not re-evaluate the set of initially valid
+subplans for a given plan node by redoing the initial pruning that
+AcquireExecutorLocks() already performed when validating the plan; such
+re-evaluation might very well yield a different set of subplans, including
+some whose relations were never locked by AcquireExecutorLocks().
+
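+A rough sketch of the control flow that results when a cached generic plan
+is executed (function names as introduced by this patch; the plancache.c
+call sites are not shown in this hunk):
+
+    GetCachedPlan
+        AcquireExecutorLocks
+            ExecutorDoInitialPruning --- perform "initial" pruning steps,
+                                         locking only surviving partitions
+    CreateQueryDesc                  --- PartitionPruneResults attached
+    ExecutorStart
+        ExecInitPartitionPruning     --- reuse the PartitionPruneResults
+                                         instead of redoing initial pruning
+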
Each Plan node may have expression trees associated with it, to represent
its target list, qualification conditions, etc. These trees are also
read-only to the executor, but the executor state for expression evaluation
@@ -286,6 +314,10 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ [ ExecutorDoInitialPruning ] --- an optional step that performs initial
+ partition pruning on the plan tree, the result of which is passed
+ to the executor via the QueryDesc
+
CreateQueryDesc
ExecutorStart
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 32475e33ff..b59474841f 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -49,6 +49,7 @@
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/execdebug.h"
+#include "executor/execPartition.h"
#include "executor/nodeSubplan.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
@@ -104,6 +105,54 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
/* end of local decls */
+/* ----------------------------------------------------------------
+ * ExecutorDoInitialPruning
+ *
+ * For each plan tree node that has been assigned a PartitionPruneInfo,
+ * this performs initial partition pruning using the information contained
+ * therein to determine the set of child subplans that satisfy the initial
+ * pruning steps, expressed as a bitmapset of their indexes in the node's
+ * list of child subplans (for example, an Append's appendplans).
+ *
+ * The return value is a List of PartitionPruneResult nodes, one for every
+ * PartitionPruneInfo in the plan, each carrying the bitmapset of surviving
+ * subplan indexes for its plan node. The RT indexes of all the leaf
+ * partitions scanned by the chosen subplans are added to
+ * *scan_leafpart_rtis; note that that set is shared across all
+ * PartitionPruneInfos.
+ *
+ * The executor must see exactly the same set of subplans as valid for
+ * execution when doing ExecInitNode() on the plan nodes whose
+ * PartitionPruneInfos are processed here, so it must take the set from the
+ * corresponding PartitionPruneResult instead of computing it all over
+ * again by redoing the initial pruning. It's the caller's job to pass the
+ * PartitionPruneResults to the executor.
+ *
+ * Note: Partitioned tables mentioned in PartitionedRelPruneInfo nodes that
+ * drive the pruning will be locked before doing the pruning.
+ * ----------------------------------------------------------------
+ */
+List *
+ExecutorDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *part_prune_results = NIL;
+ ListCell *lc;
+
+ /* Only get here if there is any pruning to do. */
+ Assert(plannedstmt->containsInitialPruning);
+
+ foreach(lc, plannedstmt->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ PartitionPruneResult *pruneresult = makeNode(PartitionPruneResult);
+
+ pruneresult->valid_subplan_offs =
+ ExecPartitionDoInitialPruning(plannedstmt, params, pruneinfo,
+ scan_leafpart_rtis);
+ part_prune_results = lappend(part_prune_results, pruneresult);
+ }
+
+ return part_prune_results;
+}
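+
+/*
+ * A sketch of the expected caller pattern (hypothetical variable names;
+ * the intended caller is AcquireExecutorLocks() in plancache.c, whose
+ * changes are not part of this hunk):
+ *
+ *     rtis = bms_copy(plannedstmt->minLockRelids);
+ *     if (plannedstmt->containsInitialPruning)
+ *         results = ExecutorDoInitialPruning(plannedstmt, params, &rtis);
+ *     ... lock only the relations whose RT indexes ended up in 'rtis',
+ *     then pass 'results' on to CreateQueryDesc() ...
+ */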
/* ----------------------------------------------------------------
* ExecutorStart
@@ -806,6 +855,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ List *part_prune_results = queryDesc->part_prune_results;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -826,6 +876,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_plannedstmt = plannedstmt;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ estate->es_part_prune_results = part_prune_results;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index aca0c6f323..917079a034 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -66,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITION_PRUNE_RESULTS UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -182,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->containsInitialPruning = false;
pstmt->planTree = plan;
pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
@@ -597,12 +599,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_results_data;
+ char *part_prune_results_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_results_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -631,6 +636,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_results_data = nodeToString(estate->es_part_prune_results);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -657,6 +663,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized List of PartitionPruneResult. */
+ part_prune_results_len = strlen(part_prune_results_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_results_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -751,6 +762,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized List of PartitionPruneResult */
+ part_prune_results_space = shm_toc_allocate(pcxt->toc, part_prune_results_len);
+ memcpy(part_prune_results_space, part_prune_results_data, part_prune_results_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS,
+ part_prune_results_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1232,8 +1249,10 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_results_space;
char *paramspace;
PlannedStmt *pstmt;
+ List *part_prune_results;
ParamListInfo paramLI;
char *queryString;
@@ -1244,12 +1263,17 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
pstmtspace = shm_toc_lookup(toc, PARALLEL_KEY_PLANNEDSTMT, false);
pstmt = (PlannedStmt *) stringToNode(pstmtspace);
+ /* Reconstruct leader-supplied PartitionPruneResult. */
+ part_prune_results_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, false);
+ part_prune_results = (List *) stringToNode(part_prune_results_space);
+
/* Reconstruct ParamListInfo. */
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(¶mspace);
/* Create a QueryDesc for the query. */
- return CreateQueryDesc(pstmt,
+ return CreateQueryDesc(pstmt, part_prune_results,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 80197d5141..8728745c44 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -25,6 +25,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -185,7 +186,11 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *pruneinfo);
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -198,7 +203,8 @@ static void PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans);
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis);
/*
@@ -1746,8 +1752,10 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* considered to be a stable expression, it can change value from one plan
* node scan to the next during query execution. Stable comparison
* expressions that don't involve such Params allow partition pruning to be
- * done once during executor startup. Expressions that do involve such Params
- * require us to prune separately for each scan of the parent plan node.
+ * done once during executor startup, or during ExecutorDoInitialPruning(),
+ * which runs as part of performing AcquireExecutorLocks() on a given plan
+ * tree. Expressions that do involve such Params require us to prune
+ * separately for each scan of the parent plan node.
*
* Note that pruning away unneeded subplans during executor startup has the
* added benefit of not having to initialize the unneeded subplans at all.
@@ -1764,6 +1772,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* account for initial pruning possibly having eliminated some of the
* subplans.
*
+ * ExecPartitionDoInitialPruning:
+ * Do initial pruning with the information contained in a given
+ * PartitionPruneInfo to determine the minimal set of child subplans
+ * of its parent plan node (the one to which the PartitionPruneInfo
+ * belongs) that must be executed, along with the set of RT indexes of
+ * the leaf partitions that those subplans will scan.
+ *
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating the expressions
* that are safe to evaluate at a given point. This function is first
@@ -1781,8 +1796,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* On return, *initially_valid_subplans is assigned the set of indexes of
* child subplans that must be initialized along with the parent plan node.
- * Initial pruning is performed here if needed and in that case only the
- * surviving subplans' indexes are added.
+ * Initial pruning is performed here if needed, unless it has already been
+ * done by ExecutorDoInitialPruning(); in either case only the surviving
+ * subplans' indexes are added.
*
* If subplans are indeed pruned, subplan_map arrays contained in the returned
* PartitionPruneState are re-sequenced to not count those, though only if the
@@ -1794,28 +1810,65 @@ ExecInitPartitionPruning(PlanState *planstate,
int part_prune_index,
Bitmapset **initially_valid_subplans)
{
- PartitionPruneState *prunestate;
+ PartitionPruneState *prunestate = NULL;
EState *estate = planstate->state;
PartitionPruneInfo *pruneinfo = list_nth(estate->es_part_prune_infos,
part_prune_index);
+ PartitionPruneResult *pruneresult = NULL;
+ bool do_pruning = (pruneinfo->needs_init_pruning ||
+ pruneinfo->needs_exec_pruning);
+
+ /*
+ * No need to do initial pruning if it was already done by
+ * ExecutorDoInitialPruning(), which is the case if
+ * es_part_prune_results is set.
+ */
+ if (estate->es_part_prune_results)
+ {
+ pruneresult = list_nth(estate->es_part_prune_results, part_prune_index);
+ Assert(IsA(pruneresult, PartitionPruneResult));
+ do_pruning = pruneinfo->needs_exec_pruning;
+ }
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, planstate);
+ if (do_pruning)
+ {
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
- /* Create the working data structure for pruning */
- prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ /* For data reading, executor always omits detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
+ /*
+ * Create the working data structure for pruning. No need to consider
+ * initial pruning steps if we have a PartitionPruneResult.
+ */
+ prunestate = CreatePartitionPruneState(planstate, pruneinfo,
+ pruneresult == NULL, true,
+ NIL, planstate->ps_ExprContext,
+ estate->es_partition_directory);
+ }
/*
* Perform an initial partition prune pass, if required.
*/
- if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ if (pruneresult)
+ {
+ *initially_valid_subplans = bms_copy(pruneresult->valid_subplan_offs);
+ }
+ else if (prunestate && prunestate->do_initial_prune)
+ {
+ *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true,
+ NULL);
+ }
else
{
- /* No pruning, so we'll need to initialize all subplans */
+ /* No initial pruning, so we'll need to initialize all subplans */
Assert(n_total_subplans > 0);
*initially_valid_subplans = bms_add_range(NULL, 0,
n_total_subplans - 1);
+ return prunestate;
}
/*
@@ -1823,7 +1876,8 @@ ExecInitPartitionPruning(PlanState *planstate,
* that were removed above due to initial pruning. No need to do this if
* no steps were removed.
*/
- if (bms_num_members(*initially_valid_subplans) < n_total_subplans)
+ if (prunestate &&
+ bms_num_members(*initially_valid_subplans) < n_total_subplans)
{
/*
* We can safely skip this when !do_exec_prune, even though that
@@ -1839,11 +1893,74 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecPartitionDoInitialPruning
+ * Perform initial pruning using given PartitionPruneInfo to determine
+ * the minimal set of child subplans that will be executed and also the
+ * set of RT indexes of the leaf partitions scanned by those subplans.
+ */
+Bitmapset *
+ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *rtable = plannedstmt->rtable;
+ ExprContext *econtext;
+ PartitionDirectory pdir;
+ MemoryContext oldcontext,
+ tmpcontext;
+ PartitionPruneState *prunestate;
+ Bitmapset *valid_subplan_offs;
+
+ /*
+ * A temporary context for memory allocations required while executing
+ * partition pruning steps.
+ */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "initial pruning working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
+
+ /*
+ * PartitionDirectory to look up partition descriptors.
+ * Note that we don't omit detached partitions, just like during
+ * execution proper.
+ */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+
+ /*
+ * We don't yet have a PlanState for the parent plan node, so we must
+ * create a standalone ExprContext to evaluate pruning expressions,
+ * equipped with the information about the EXTERN parameters that the
+ * caller passed us. That is okay, because the initial pruning steps
+ * contain nothing that requires execution to have started, and hence
+ * nothing that needs the information contained in a PlanState.
+ */
+ econtext = CreateStandaloneExprContext();
+ econtext->ecxt_param_list_info = params;
+ prunestate = CreatePartitionPruneState(NULL, pruneinfo, true, false,
+ rtable, econtext, pdir);
+ MemoryContextSwitchTo(oldcontext);
+
+ /* Do the initial pruning. */
+ valid_subplan_offs = ExecFindMatchingSubPlans(prunestate, true,
+ scan_leafpart_rtis);
+
+ FreeExprContext(econtext, true);
+ DestroyPartitionDirectory(pdir);
+ MemoryContextDelete(tmpcontext);
+
+ return valid_subplan_offs;
+}
+
/*
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
*
- * 'planstate' is the parent plan node's execution state.
+ * 'planstate', if not NULL, is the parent plan node's execution state. It
+ * can be NULL when called before ExecutorStart(), in which case 'rtable'
+ * (the range table), 'econtext', and 'partdir' must be provided
+ * explicitly.
*
* 'pruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
@@ -1857,19 +1974,21 @@ ExecInitPartitionPruning(PlanState *planstate,
* PartitionedRelPruneInfo.
*/
static PartitionPruneState *
-CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
+CreatePartitionPruneState(PlanState *planstate,
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir)
{
- EState *estate = planstate->state;
+ EState *estate = planstate ? planstate->state : NULL;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
- /* For data reading, executor always omits detached partitions */
- if (estate->es_partition_directory == NULL)
- estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
+ Assert((estate != NULL) ||
+ (partdir != NULL && econtext != NULL && rtable != NIL));
n_part_hierarchies = list_length(pruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
@@ -1924,15 +2043,42 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * We must open the relation ourselves when called before execution
+ * has started, such as during ExecutorDoInitialPruning() on a
+ * cached plan. In that case, sub-partitions must be locked,
+ * because AcquirePlannerLocks() would not have seen them. (The
+ * first relation in a partrelpruneinfos list is always the root
+ * partitioned table appearing in the query, which
+ * AcquirePlannerLocks() would have locked; the Assert in
+ * relation_open() guards that assumption.)
+ */
+ if (estate == NULL)
+ {
+ RangeTblEntry *rte = rt_fetch(pinfo->rtindex, rtable);
+ int lockmode = (j == 0) ? NoLock : rte->rellockmode;
+
+ partrel = table_open(rte->relid, lockmode);
+ }
+ else
+ partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+
+ /*
+ * We can rely on the copy of the partitioned table's partition
+ * key in its relcache entry, because it can't change (or get
+ * destroyed) as long as the relation is locked. The partition
+ * descriptor is taken from the PartitionDirectory, which keeps
+ * the table open long enough for the descriptor to remain valid
+ * while it's used to perform the pruning steps.
*/
- partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
partkey = RelationGetPartitionKey(partrel);
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partdesc = PartitionDirectoryLookup(partdir, partrel);
+
+ /*
+ * Must close partrel, while keeping the lock, if we're not using
+ * the EState's entry.
+ */
+ if (estate == NULL)
+ table_close(partrel, NoLock);
/*
* Initialize the subplan_map and subpart_map.
@@ -1946,6 +2092,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
Assert(partdesc->nparts >= pinfo->nparts);
pprune->nparts = partdesc->nparts;
pprune->subplan_map = palloc(sizeof(int) * partdesc->nparts);
+ pprune->rti_map = palloc(sizeof(Index) * partdesc->nparts);
if (partdesc->nparts == pinfo->nparts)
{
/*
@@ -1956,6 +2103,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pprune->subpart_map = pinfo->subpart_map;
memcpy(pprune->subplan_map, pinfo->subplan_map,
sizeof(int) * pinfo->nparts);
+ memcpy(pprune->rti_map, pinfo->rti_map,
+ sizeof(Index) * pinfo->nparts);
/*
* Double-check that the list of unpruned relations has not
@@ -2006,6 +2155,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pinfo->subplan_map[pd_idx];
pprune->subpart_map[pp_idx] =
pinfo->subpart_map[pd_idx];
+ pprune->rti_map[pp_idx] =
+ pinfo->rti_map[pd_idx];
pd_idx++;
}
else
@@ -2013,6 +2164,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
/* this partdesc entry is not in the plan */
pprune->subplan_map[pp_idx] = -1;
pprune->subpart_map[pp_idx] = -1;
+ pprune->rti_map[pp_idx] = 0;
}
}
@@ -2034,7 +2186,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* Initialize pruning contexts as needed.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
- if (pinfo->initial_pruning_steps)
+ if (consider_initial_steps && pinfo->initial_pruning_steps)
{
InitPartitionPruneContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
@@ -2044,7 +2196,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps)
+ if (consider_exec_steps && pinfo->exec_pruning_steps)
{
InitPartitionPruneContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
@@ -2272,10 +2424,14 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
* runtime pruning.
+ *
+ * RT indexes of leaf partitions scanned by the chosen subplans are added to
+ * *scan_leafpart_rtis if the pointer is non-NULL.
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2310,7 +2466,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
*/
pprune = &prunedata->partrelprunedata[0];
find_matching_subplans_recurse(prunedata, pprune, initial_prune,
- &result);
+ &result, scan_leafpart_rtis);
/* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
@@ -2324,6 +2480,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_copy(*scan_leafpart_rtis);
MemoryContextReset(prunestate->prune_context);
@@ -2334,13 +2492,15 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* find_matching_subplans_recurse
* Recursive worker function for ExecFindMatchingSubPlans
*
- * Adds valid (non-prunable) subplan IDs to *validsubplans
+ * Adds valid (non-prunable) subplan IDs to *validsubplans, and RT indexes
+ * of the corresponding leaf partitions to *scan_leafpart_rtis (if asked for).
*/
static void
find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans)
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *partset;
int i;
@@ -2367,8 +2527,14 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
while ((i = bms_next_member(partset, i)) >= 0)
{
if (pprune->subplan_map[i] >= 0)
+ {
*validsubplans = bms_add_member(*validsubplans,
pprune->subplan_map[i]);
+ Assert(pprune->rti_map[i] > 0);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_add_member(*scan_leafpart_rtis,
+ pprune->rti_map[i]);
+ }
else
{
int partidx = pprune->subpart_map[i];
@@ -2376,7 +2542,8 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
if (partidx >= 0)
find_matching_subplans_recurse(prunedata,
&prunedata->partrelprunedata[partidx],
- initial_prune, validsubplans);
+ initial_prune, validsubplans,
+ scan_leafpart_rtis);
else
{
/*
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 21f4c10937..67a58c7163 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -134,6 +134,7 @@ CreateExecutorState(void)
estate->es_param_exec_vals = NULL;
estate->es_queryEnv = NULL;
+ estate->es_part_prune_results = NIL;
estate->es_query_cxt = qcontext;
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index e134a82ff7..18d3b98cdc 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -842,7 +842,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
- es->qd = CreateQueryDesc(es->stmt,
+ es->qd = CreateQueryDesc(es->stmt, NIL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index c6f86a6510..96880e122a 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -155,7 +155,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (appendstate->as_prune_state == NULL ||
+ (!appendstate->as_prune_state->do_exec_prune && nplans > 0))
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -577,7 +578,7 @@ choose_next_subplan_locally(AppendState *node)
}
else if (node->as_valid_subplans == NULL)
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
whichplan = -1;
}
@@ -642,7 +643,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
/*
* Mark each invalid plan as finished to allow the loop below to
@@ -717,7 +718,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
mark_invalid_subplans_as_finished(node);
}
@@ -868,7 +869,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
classify_matching_subplans(node);
}
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 8d35860c30..2312e5a633 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -103,7 +103,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplan, we can fill ms_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (mergestate->ms_prune_state == NULL ||
+ (!mergestate->ms_prune_state->do_exec_prune && nplans > 0))
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -218,7 +219,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index fd5796f1b9..93012a5b3b 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1578,6 +1578,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
CachedPlanSource *plansource;
CachedPlan *cplan;
List *stmt_list;
+ List *part_prune_results_list;
char *query_string;
Snapshot snapshot;
MemoryContext oldcontext;
@@ -1657,7 +1658,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1685,6 +1689,9 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
stmt_list,
cplan);
+ /* Copy Lists of PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_results_list);
+
/*
* Set up options for portal. Default SCROLL type is chosen the same way
* as PerformCursorOpen does it.
@@ -2092,7 +2099,8 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
/* Get the generic plan for the query */
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ NULL /* Not interested in PartitionPruneResults */);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2473,7 +2481,9 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
- ListCell *lc2;
+ List *part_prune_results_list;
+ ListCell *lc2,
+ *lc3;
spicallbackarg.query = plansource->query_string;
@@ -2549,8 +2559,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* plan, the refcount must be backed by the plan_owner.
*/
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
-
+ plan_owner, _SPI_current->queryEnv,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
stmt_list = cplan->stmt_list;
/*
@@ -2589,9 +2601,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, part_prune_results_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ List *part_prune_results = lfirst_node(List, lc3);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2663,7 +2676,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
else
snap = InvalidSnapshot;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, part_prune_results,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index b4ff855f7c..77990a2732 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -158,6 +158,11 @@
token = pg_strtok(&length); /* skip :fldname */ \
local_node->fldname = readIntCols(len)
+/* Read an Index array */
+#define READ_INDEX_ARRAY(fldname, len) \
+ token = pg_strtok(&length); /* skip :fldname */ \
+ local_node->fldname = readIndexCols(len)
+
/* Read a bool array */
#define READ_BOOL_ARRAY(fldname, len) \
token = pg_strtok(&length); /* skip :fldname */ \
@@ -795,7 +800,6 @@ fnname(int numCols) \
*/
READ_SCALAR_ARRAY(readAttrNumberCols, int16, atoi)
READ_SCALAR_ARRAY(readOidCols, Oid, atooid)
-/* outfuncs.c has writeIndexCols, but we don't yet need that here */
-/* READ_SCALAR_ARRAY(readIndexCols, Index, atoui) */
+READ_SCALAR_ARRAY(readIndexCols, Index, atoui)
READ_SCALAR_ARRAY(readIntCols, int, atoi)
READ_SCALAR_ARRAY(readBoolCols, bool, strtobool)
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 31fff597a7..4097cf7164 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -520,7 +520,9 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
result->partPruneInfos = glob->partPruneInfos;
+ result->containsInitialPruning = glob->containsInitialPruning;
result->rtable = glob->finalrtable;
+ result->minLockRelids = glob->minLockRelids;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 720f20f563..61d6934978 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -270,6 +270,16 @@ set_plan_references(PlannerInfo *root, Plan *plan)
*/
add_rtes_to_flat_rtable(root, false);
+ /*
+ * Add the query's adjusted range of RT indexes to glob->minLockRelids.
+ * The adjusted RT indexes of prunable relations will be deleted from the
+ * set below where PartitionPruneInfos are processed.
+ */
+ glob->minLockRelids =
+ bms_add_range(glob->minLockRelids,
+ rtoffset + 1,
+ rtoffset + list_length(root->parse->rtable));
+
/*
* Adjust RT indexes of PlanRowMarks and add to final rowmarks list
*/
@@ -352,6 +362,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach (lc, root->partPruneInfos)
{
PartitionPruneInfo *pruneinfo = lfirst(lc);
+ Bitmapset *leafpart_rtis = NULL;
ListCell *l;
foreach(l, pruneinfo->prune_infos)
@@ -362,15 +373,50 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach(l2, prune_infos)
{
PartitionedRelPruneInfo *pinfo = lfirst(l2);
+ int i;
/* RT index of the table to which the pinfo belongs. */
pinfo->rtindex += rtoffset;
+
+ /* Also adjust the RT indexes of the leaf partitions that might be scanned. */
+ for (i = 0; i < pinfo->nparts; i++)
+ {
+ if (pinfo->rti_map[i] > 0 && pinfo->subplan_map[i] >= 0)
+ {
+ pinfo->rti_map[i] += rtoffset;
+ leafpart_rtis = bms_add_member(leafpart_rtis,
+ pinfo->rti_map[i]);
+ }
+ }
}
}
+ if (pruneinfo->needs_init_pruning)
+ {
+ glob->containsInitialPruning = true;
+
+ /*
+ * Delete the leaf partition RTIs from the global set of relations
+ * to be locked before executing the plan; AcquireExecutorLocks()
+ * will add back the ones that survive initial pruning.
+ */
+ glob->minLockRelids = bms_del_members(glob->minLockRelids,
+ leafpart_rtis);
+ }
+
glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
}
+ /*
+ * If we deleted bits from glob->minLockRelids above, it seems worth
+ * doing a bms_copy() on it to get rid of any now-empty tail words, so
+ * that the loop over this set in AcquireExecutorLocks() need not wade
+ * through useless bit words.
+ */
+ if (glob->containsInitialPruning)
+ glob->minLockRelids = bms_copy(glob->minLockRelids);
+
return result;
}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 6565b6ed01..37f3e6af61 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,9 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -234,6 +236,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool needs_init_pruning = false;
+ bool needs_exec_pruning = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -313,12 +317,16 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_needs_init_pruning;
+ bool partrel_needs_exec_pruning;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_needs_init_pruning,
+ &partrel_needs_exec_pruning);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -327,6 +335,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+
+ needs_init_pruning |= partrel_needs_init_pruning;
+ needs_exec_pruning |= partrel_needs_exec_pruning;
}
pfree(relid_subplan_map);
@@ -341,6 +352,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->needs_init_pruning = needs_init_pruning;
+ pruneinfo->needs_exec_pruning = needs_exec_pruning;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -441,13 +454,18 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *needs_init_pruning and *needs_exec_pruning are set to indicate whether
+ * the returned PartitionedRelPruneInfos contain pruning steps that can be
+ * performed before and after execution begins, respectively.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -458,6 +476,10 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will find out below. */
+ *needs_init_pruning = false;
+ *needs_exec_pruning = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -545,6 +567,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+ * In the first pass, we determine whether the 2nd pass will be necessary
+ * by noting the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -619,6 +644,12 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ /* record which types of pruning steps we've seen so far */
+ if (initial_pruning_steps != NIL)
+ *needs_init_pruning = true;
+ if (exec_pruning_steps != NIL)
+ *needs_exec_pruning = true;
+
pinfolist = lappend(pinfolist, pinfo);
}
@@ -646,6 +677,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *subplan_map;
int *subpart_map;
Oid *relid_map;
+ Index *rti_map;
/*
* Construct the subplan and subpart maps for this partitioning level.
@@ -658,6 +690,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
+ rti_map = (Index *) palloc0(nparts * sizeof(Index));
present_parts = NULL;
i = -1;
@@ -672,6 +705,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
+ rti_map[i] = partrel->relid;
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
@@ -696,6 +730,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
pinfo->relid_map = relid_map;
+ pinfo->rti_map = rti_map;
}
pfree(relid_subpart_map);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index a9a1851c94..a1be8179e8 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1598,6 +1598,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ List *part_prune_results_list;
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -1972,7 +1973,9 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
/*
* Now we can define the portal.
@@ -1987,6 +1990,9 @@ exec_bind_message(StringInfo input_message)
cplan->stmt_list,
cplan);
+ /* Copy Lists of PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_results_list);
+
/* Done with the snapshot used for parameter I/O and parsing/planning */
if (snapshot_set)
PopActiveSnapshot();
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5aa5a350f3..226ee81b63 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -35,7 +35,7 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
+static void ProcessQuery(PlannedStmt *plan, List *part_prune_results,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +65,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ List *part_prune_results,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +78,8 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->part_prune_results = part_prune_results; /* ExecutorDoInitialPruning()
+ * output for plan */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +125,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * part_prune_results: ExecutorDoInitialPruning() output for the PlannedStmt
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +138,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ List *part_prune_results,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +150,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, part_prune_results, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -491,8 +496,13 @@ PortalStart(Portal portal, ParamListInfo params,
/*
* Create QueryDesc in portal's context; for the moment, set
* the destination to DestNone.
+ *
+ * There is no PartitionPruneResult unless the PlannedStmt is
+ * from a CachedPlan.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->part_prune_results_list == NIL ? NIL :
+ linitial(portal->part_prune_results_list),
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1225,6 +1235,8 @@ PortalRunMulti(Portal portal,
if (pstmt->utilityStmt == NULL)
{
+ List *part_prune_results = NIL;
+
/*
* process a plannable query.
*/
@@ -1271,10 +1283,18 @@ PortalRunMulti(Portal portal,
else
UpdateActiveSnapshotCommandId();
+ /*
+ * Determine if there's a corresponding List of PartitionPruneResult
+ * for this PlannedStmt.
+ */
+ if (portal->part_prune_results_list != NIL)
+ part_prune_results = list_nth(portal->part_prune_results_list,
+ foreach_current_index(stmtlist_item));
+
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_results,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1283,7 +1303,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_results,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 0d6a295674..957221c47e 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -99,14 +99,19 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_results_list);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv);
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_results_list);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_results_list,
+ List **lockedRelids_per_stmt);
+static void ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -782,6 +787,26 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
return tlist;
}
+/*
+ * FreePartitionPruneResults
+ * Frees the List of Lists of PartitionPruneResults for CheckCachedPlan()
+ */
+static void
+FreePartitionPruneResults(List *part_prune_results_list)
+{
+ ListCell *lc;
+
+ foreach(lc, part_prune_results_list)
+ {
+ List *part_prune_results = lfirst(lc);
+
+ /* Free both the PartitionPruneResults and the containing List. */
+ list_free_deep(part_prune_results);
+ }
+
+ list_free(part_prune_results_list);
+}
+
/*
* CheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
*
@@ -790,15 +815,20 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* On a "true" return, we have acquired the locks needed to run the plan.
* (We must do this for the "true" result to be race-condition-free.)
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_results_list.
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_results_list)
{
CachedPlan *plan = plansource->gplan;
/* Assert that caller checked the querytree */
Assert(plansource->is_valid);
+ *part_prune_results_list = NIL;
+
/* If there's no generic plan, just say "false" */
if (!plan)
return false;
@@ -820,13 +850,21 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ List *lockedRelids_per_stmt;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ /*
+ * Lock relations scanned by the plan. This is where the pruning
+ * happens if needed.
+ */
+ AcquireExecutorLocks(plan->stmt_list, boundParams,
+ part_prune_results_list,
+ &lockedRelids_per_stmt);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -848,7 +886,11 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ ReleaseExecutorLocks(plan->stmt_list, lockedRelids_per_stmt);
+
+ /* Release any PartitionPruneResults that may have been created. */
+ FreePartitionPruneResults(*part_prune_results_list);
+ *part_prune_results_list = NIL;
}
/*
@@ -874,10 +916,14 @@ CheckCachedPlan(CachedPlanSource *plansource)
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
* (unless this is a one-shot plan, in which case we don't copy the plan).
+ *
+ * A list of NILs is returned in *part_prune_results_list, meaning that
+ * no partition pruning has been done yet for the plans in stmt_list.
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv)
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_results_list)
{
CachedPlan *plan;
List *plist;
@@ -1007,6 +1053,17 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
MemoryContextSwitchTo(oldcxt);
+ /*
+ * No actual PartitionPruneResults yet to add, though must initialize
+ * the list to have the same number of elements as the list of
+ * PlannedStmts.
+ */
+ *part_prune_results_list = NIL;
+ foreach(lc, plist)
+ {
+ *part_prune_results_list = lappend(*part_prune_results_list, NIL);
+ }
+
return plan;
}
@@ -1126,6 +1183,19 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
+ * For every PlannedStmt found in the returned CachedPlan, an element that
+ * is either a List of PartitionPruneResults or NIL is added to
+ * *part_prune_results_list; the former if the PlannedStmt is from an
+ * existing CachedPlan that is otherwise valid and has
+ * containsInitialPruning set to true. Before returning such a CachedPlan,
+ * the "initial" pruning steps are performed by calling
+ * ExecutorDoInitialPruning(), so that AcquireExecutorLocks() locks only
+ * those leaf partitions whose subplans survive the "initial" pruning
+ * conditions. For each PartitionPruneInfo found in
+ * PlannedStmt.partPruneInfos, a PartitionPruneResult containing the bitmapset
+ * of the indexes of surviving subplans is added to the List for the
+ * PlannedStmt.
+ *
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
@@ -1139,11 +1209,13 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ List **part_prune_results_list)
{
CachedPlan *plan = NULL;
List *qlist;
bool customplan;
+ List *my_part_prune_results_list;
/* Assert caller is doing things in a sane order */
Assert(plansource->magic == CACHEDPLANSOURCE_MAGIC);
@@ -1160,7 +1232,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (CheckCachedPlan(plansource, boundParams,
+ &my_part_prune_results_list))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1169,7 +1242,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
else
{
/* Build a new generic plan */
- plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv,
+ &my_part_prune_results_list);
/* Just make real sure plansource->gplan is clear */
ReleaseGenericPlan(plansource);
/* Link the new generic plan into the plansource */
@@ -1214,7 +1288,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (customplan)
{
/* Build a custom plan */
- plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv,
+ &my_part_prune_results_list);
/* Accumulate total costs of custom plans */
plansource->total_custom_cost += cached_plan_cost(plan, true);
@@ -1246,6 +1321,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
plan->is_saved = true;
}
+ if (part_prune_results_list)
+ *part_prune_results_list = my_part_prune_results_list;
+
return plan;
}
@@ -1737,17 +1815,29 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_results_list.
+ *
+ * On return, *lockedRelids_per_stmt will contain a bitmapset for every
+ * PlannedStmt in stmt_list, containing the RT indexes of relation entries
+ * in its range table that were actually locked, or NULL if the PlannedStmt
+ * contains a utility statement.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_results_list,
+ List **lockedRelids_per_stmt)
{
ListCell *lc1;
+ *part_prune_results_list = *lockedRelids_per_stmt = NIL;
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ List *part_prune_results = NIL;
+ Bitmapset *allLockRelids;
+ Bitmapset *lockedRelids = NULL;
+ int rti;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1761,13 +1851,40 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
if (query)
- ScanQueryForLocks(query, acquire);
+ ScanQueryForLocks(query, true);
+ *part_prune_results_list = lappend(*part_prune_results_list, NIL);
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ /*
+ * Figure out the set of relations that would need to be locked
+ * before executing the plan.
+ */
+ if (plannedstmt->containsInitialPruning)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ Bitmapset *scan_leafpart_rtis = NULL;
+
+ /*
+ * Obtain the set of leaf partitions to be locked.
+ *
+ * The following does initial partition pruning using the
+ * PartitionPruneInfos found in plannedstmt->partPruneInfos and
+ * finds leaf partitions that survive that pruning across all the
+ * nodes in the plan tree.
+ */
+ part_prune_results = ExecutorDoInitialPruning(plannedstmt,
+ boundParams,
+ &scan_leafpart_rtis);
+ allLockRelids = bms_union(plannedstmt->minLockRelids,
+ scan_leafpart_rtis);
+ }
+ else
+ allLockRelids = plannedstmt->minLockRelids;
+
+ rti = -1;
+ while ((rti = bms_next_member(allLockRelids, rti)) > 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
if (rte->rtekind != RTE_RELATION)
continue;
@@ -1778,10 +1895,59 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
* fail if it's been dropped entirely --- we'll just transiently
* acquire a non-conflicting lock.
*/
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
+ LockRelationOid(rte->relid, rte->rellockmode);
+ lockedRelids = bms_add_member(lockedRelids, rti);
+ }
+
+ *part_prune_results_list = lappend(*part_prune_results_list,
+ part_prune_results);
+ *lockedRelids_per_stmt = lappend(*lockedRelids_per_stmt, lockedRelids);
+ }
+}
+
+/*
+ * ReleaseExecutorLocks
+ * Release locks that would've been acquired by an earlier call to
+ * AcquireExecutorLocks()
+ */
+static void
+ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt)
+{
+ ListCell *lc1,
+ *lc2;
+
+ forboth(lc1, stmt_list, lc2, lockedRelids_per_stmt)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ Bitmapset *lockedRelids = lfirst(lc2);
+ int rti;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /*
+ * Ignore utility statements, except those (such as EXPLAIN) that
+ * contain a parsed-but-not-planned query. Note: it's okay to use
+ * ScanQueryForLocks, even though the query hasn't been through
+ * rule rewriting, because rewriting doesn't change the query
+ * representation.
+ */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ Assert(lockedRelids == NULL);
+ if (query)
+ ScanQueryForLocks(query, false);
+ continue;
+ }
+
+ rti = -1;
+ while ((rti = bms_next_member(lockedRelids, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ Assert(rte->rtekind == RTE_RELATION);
+
+ /* See the comment in AcquireExecutorLocks(). */
+ UnlockRelationOid(rte->relid, rte->rellockmode);
}
}
}
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 3a161bdb88..4b156de524 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -303,6 +303,25 @@ PortalDefineQuery(Portal portal,
portal->status = PORTAL_DEFINED;
}
+/*
+ * PortalStorePartitionPruneResults
+ * Copy the given List of Lists of PartitionPruneResults into the
+ * portal's context
+ *
+ * This allows the caller to ensure that the list exists as long as the portal
+ * does.
+ */
+void
+PortalStorePartitionPruneResults(Portal portal, List *part_prune_results_list)
+{
+ MemoryContext oldcxt;
+
+ AssertArg(PortalIsValid(portal));
+ oldcxt = MemoryContextSwitchTo(portal->portalContext);
+ portal->part_prune_results_list = copyObject(part_prune_results_list);
+ MemoryContextSwitchTo(oldcxt);
+}
+
/*
* PortalReleaseCachedPlan
* Release a portal's reference to its cached plan, if any.
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 9ebde089ae..269cc4d562 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,9 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt,
+ List *part_prune_results,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index bf962af7af..bd8776402e 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -45,6 +45,7 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
+ * rti_map Range table index by partition index, or 0.
* present_parts A Bitmapset of the partition indexes that we
* have subplans or subparts for.
* initial_pruning_steps List of PartitionPruneSteps used to
@@ -61,6 +62,7 @@ typedef struct PartitionedRelPruningData
int nparts;
int *subplan_map;
int *subpart_map;
+ Index *rti_map;
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
@@ -126,5 +128,10 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int part_prune_index,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis);
+extern Bitmapset *ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index e79e2c001f..7d4379da7b 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,8 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ List *part_prune_results; /* ExecutorDoInitialPruning()'s
+ * output for plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +59,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ List *part_prune_results,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index ed95ed1176..c9a5e5fb68 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -185,6 +185,9 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
+extern List *ExecutorDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ Bitmapset **scan_leafpart_rtis);
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 4a741b053f..521a60b988 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -612,6 +612,7 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
+ List *es_part_prune_results; /* QueryDesc.part_prune_results */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index a80f43e540..937cc4629d 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -212,6 +212,7 @@ extern struct Bitmapset *readBitmapset(void);
extern uintptr_t readDatum(bool typbyval);
extern bool *readBoolCols(int numCols);
extern int *readIntCols(int numCols);
+extern Index *readIndexCols(int numCols);
extern Oid *readOidCols(int numCols);
extern int16 *readAttrNumberCols(int numCols);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index e392fb6fc0..494ae461be 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -125,6 +125,18 @@ typedef struct PlannerGlobal
/* List of PartitionPruneInfo contained in the plan */
List *partPruneInfos;
+ /*
+ * Do any of those PartitionPruneInfos have initial pruning steps in them?
+ */
+ bool containsInitialPruning;
+
+ /*
+ * Indexes of all range table entries minus indexes of range table entries
+ * of the leaf partitions scanned by prunable subplans; see
+ * AcquireExecutorLocks()
+ */
+ Bitmapset *minLockRelids;
+
/* OIDs of relations the plan depends on */
List *relationOids;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 3eb3e6e527..0bc4c8130a 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -73,8 +73,17 @@ typedef struct PlannedStmt
List *partPruneInfos; /* List of PartitionPruneInfo contained in
* the plan */
+ bool containsInitialPruning; /* Do any of those PartitionPruneInfos
+ * have initial pruning steps in them?
+ */
+
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *minLockRelids; /* Indexes of all range table entries minus
+ * indexes of range table entries of the leaf
+ * partitions scanned by prunable subplans;
+ * see AcquireExecutorLocks() */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
List *resultRelations; /* integer list of RT indexes, or NIL */
@@ -1410,6 +1419,13 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * needs_init_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its initial_pruning_steps set?
+ *
+ * needs_exec_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its exec_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1420,6 +1436,8 @@ typedef struct PartitionPruneInfo
NodeTag type;
List *prune_infos;
+ bool needs_init_pruning;
+ bool needs_exec_pruning;
Bitmapset *other_subplans;
} PartitionPruneInfo;
@@ -1464,6 +1482,9 @@ typedef struct PartitionedRelPruneInfo
/* relation OID by partition index, or 0 */
Oid *relid_map pg_node_attr(array_size(nparts));
+ /* Range table index by partition index, or 0. */
+ Index *rti_map pg_node_attr(array_size(nparts));
+
/*
* initial_pruning_steps shows how to prune during executor startup (i.e.,
* without use of any PARAM_EXEC Params); it is NIL if no startup pruning
@@ -1548,6 +1569,31 @@ typedef struct PartitionPruneStepCombine
List *source_stepids;
} PartitionPruneStepCombine;
+/*----------------
+ * PartitionPruneResult
+ *
+ * The result of performing ExecPartitionDoInitialPruning() on a given
+ * PartitionPruneInfo.
+ *
+ * valid_subplan_offs contains the indexes of subplans remaining after
+ * performing initial pruning by calling ExecFindMatchingSubPlans() on the
+ * PartitionPruneInfo.
+ *
+ * This is used to store the result of initial partition pruning that is
+ * performed before execution has started.  A module that needs to do so
+ * should call ExecutorDoInitialPruning() on a given PlannedStmt, which
+ * returns a List of PartitionPruneResult containing an entry for each
+ * PartitionPruneInfo present in PlannedStmt.part_prune_infos. The module
+ * should then pass that list, along with the PlannedStmt, to the executor,
+ * so that it can reuse the result of initial partition pruning when
+ * initializing the subplans for execution.
+ */
+typedef struct PartitionPruneResult
+{
+ NodeTag type;
+
+ Bitmapset *valid_subplan_offs;
+} PartitionPruneResult;
/*
* Plan invalidation info
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 0499635f59..32579d4788 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -220,7 +220,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ List **part_prune_results_list);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aeddbdafe5..1901fc5f28 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,7 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *part_prune_results_list; /* List of Lists of PartitionPruneResults */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -242,6 +243,8 @@ extern void PortalDefineQuery(Portal portal,
CommandTag commandTag,
List *stmts,
CachedPlan *cplan);
+extern void PortalStorePartitionPruneResults(Portal portal,
+ List *part_prune_results_list);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
--
2.35.3
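For readers skimming the diff, the locking logic that replaces the old
"lock everything in the range table" loop boils down to the following
sketch, condensed from the AcquireExecutorLocks() hunks above (the
utility-statement branch and the bookkeeping of *part_prune_results_list
and *lockedRelids_per_stmt are as in the patch; this is not the literal
patch text):

    /* Per PlannedStmt: compute the set of RT indexes to lock. */
    if (plannedstmt->containsInitialPruning)
    {
        Bitmapset  *scan_leafpart_rtis = NULL;

        /* Run only the "initial" pruning steps, using bound parameters. */
        part_prune_results = ExecutorDoInitialPruning(plannedstmt,
                                                      boundParams,
                                                      &scan_leafpart_rtis);
        /* Never-prunable relations plus the surviving leaf partitions. */
        allLockRelids = bms_union(plannedstmt->minLockRelids,
                                  scan_leafpart_rtis);
    }
    else
        allLockRelids = plannedstmt->minLockRelids;

    rti = -1;
    while ((rti = bms_next_member(allLockRelids, rti)) > 0)
    {
        RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);

        if (rte->rtekind != RTE_RELATION)
            continue;
        LockRelationOid(rte->relid, rte->rellockmode);
        lockedRelids = bms_add_member(lockedRelids, rti);
    }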
On Mon, Oct 17, 2022 at 6:29 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Wed, Oct 12, 2022 at 4:36 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > On Fri, Jul 29, 2022 at 1:20 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > > On Thu, Jul 28, 2022 at 1:27 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > > > 0001 adds es_part_prune_result but does not use it, so maybe the
> > > > introduction of that field should be deferred until it's needed for
> > > > something.
> > >
> > > Oops, looks like a mistake when breaking the patch.  Will move that
> > > bit to 0002.
> >
> > Fixed that and also noticed that I had defined PartitionPruneResult in
> > the wrong header (execnodes.h).  That led to PartitionPruneResult
> > nodes not being able to be written and read, because
> > src/backend/nodes/gen_node_support.pl doesn't create _out* and _read*
> > routines for the nodes defined in execnodes.h.  I moved its definition
> > to plannodes.h, even though it is not actually the planner that
> > instantiates those; no other include/nodes header sounds better.
>
> One more thing I realized is that Bitmapsets added to the List
> PartitionPruneResult.valid_subplan_offs_list are not actually
> read/write-able.  That's a problem that I also faced in [1], so I
> proposed a patch there to make Bitmapset a read/write-able Node and
> mark (only) the Bitmapsets that are added into read/write-able node
> trees with the corresponding NodeTag.  I'm including that patch here
> as well (0002) for the main patch to work (pass
> -DWRITE_READ_PARSE_PLAN_TREES build tests), though it might make sense
> to discuss it in its own thread?

Had second thoughts on the use of List of Bitmapsets for this, such
that the make-Bitmapset-Nodes patch is no longer needed.

I had defined PartitionPruneResult such that it stood for the results
of pruning for all PartitionPruneInfos contained in
PlannedStmt.partPruneInfos (covering all Append/MergeAppend nodes that
can use partition pruning in a given plan).  So, it had a List of
Bitmapset.  I think it's perhaps better for PartitionPruneResult to
cover only one PartitionPruneInfo and thus need only a Bitmapset and
not a List thereof, which I have implemented in the attached updated
patch 0002.  So, instead of needing to pass around a
PartitionPruneResult with each PlannedStmt, this now passes a List of
PartitionPruneResult with an entry for each in
PlannedStmt.partPruneInfos.
Rebased over 3b2db22fe.
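To make the resulting API contract concrete, here is a minimal sketch of
a plancache caller under this design (an illustration rather than code
from the patch; the surrounding variables plansource, params, sourceText,
dest and queryEnv are assumed to come from the caller's context):

    List       *part_prune_results_list;
    CachedPlan *cplan;
    ListCell   *lc1,
               *lc2;

    /*
     * One sublist per PlannedStmt in cplan->stmt_list; a sublist is NIL
     * unless the plan is a generic one for which "initial" pruning was
     * already done during AcquireExecutorLocks().
     */
    cplan = GetCachedPlan(plansource, params, NULL, NULL,
                          &part_prune_results_list);
    Assert(list_length(cplan->stmt_list) ==
           list_length(part_prune_results_list));

    forboth(lc1, cplan->stmt_list, lc2, part_prune_results_list)
    {
        PlannedStmt *pstmt = lfirst_node(PlannedStmt, lc1);
        List        *part_prune_results = lfirst(lc2);
        QueryDesc   *queryDesc;

        /*
         * Pass the pruning results alongside the plan so that
         * ExecInit[Merge]Append() can reuse them instead of redoing the
         * "initial" pruning steps.
         */
        queryDesc = CreateQueryDesc(pstmt, part_prune_results, sourceText,
                                    GetActiveSnapshot(), InvalidSnapshot,
                                    dest, params, queryEnv, 0);
        /* ... run and free the QueryDesc as usual ... */
    }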
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v23-0001-Move-PartitioPruneInfo-out-of-plan-nodes-into-Pl.patch
From c805965cadc12217406309221e2c89e3c17be433 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Fri, 27 May 2022 16:00:28 +0900
Subject: [PATCH v23 1/2] Move PartitioPruneInfo out of plan nodes into
PlannedStmt
The planner will now add a given PartitionPruneInfo to
PlannedStmt.partPruneInfos instead of directly to the
Append/MergeAppend plan node.  What gets set instead in the
latter is an index field which points to the list element
of PlannedStmt.partPruneInfos containing the PartitionPruneInfo
belonging to the plan node.
A later commit will make AcquireExecutorLocks() do the initial
partition pruning to determine a minimal set of partitions to be
locked when validating a plan tree, and it will need to consult the
PartitionPruneInfos referenced therein to do so.  It is better for
the PartitionPruneInfos to be accessible directly, by simply iterating
over PlannedStmt.partPruneInfos, than to require a walk of the plan
tree to find them.
---
src/backend/executor/execMain.c | 1 +
src/backend/executor/execParallel.c | 1 +
src/backend/executor/execPartition.c | 4 +-
src/backend/executor/execUtils.c | 1 +
src/backend/executor/nodeAppend.c | 4 +-
src/backend/executor/nodeMergeAppend.c | 4 +-
src/backend/optimizer/plan/createplan.c | 24 ++++-----
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 65 +++++++++++++------------
src/backend/partitioning/partprune.c | 18 ++++---
src/include/executor/execPartition.h | 3 +-
src/include/nodes/execnodes.h | 1 +
src/include/nodes/pathnodes.h | 6 +++
src/include/nodes/plannodes.h | 11 +++--
src/include/partitioning/partprune.h | 8 +--
15 files changed, 90 insertions(+), 62 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index d78862e660..32475e33ff 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -825,6 +825,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable);
estate->es_plannedstmt = plannedstmt;
+ estate->es_part_prune_infos = plannedstmt->partPruneInfos;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 99512826c5..aca0c6f323 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -183,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
pstmt->planTree = plan;
+ pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
pstmt->resultRelations = NIL;
pstmt->appendRelations = NIL;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 40e3c07693..80197d5141 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1791,11 +1791,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
PartitionPruneState *
ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
Bitmapset **initially_valid_subplans)
{
PartitionPruneState *prunestate;
EState *estate = planstate->state;
+ PartitionPruneInfo *pruneinfo = list_nth(estate->es_part_prune_infos,
+ part_prune_index);
/* We may need an expression context to evaluate partition exprs */
ExecAssignExprContext(estate, planstate);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9df1f81ea8..21f4c10937 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -119,6 +119,7 @@ CreateExecutorState(void)
estate->es_relations = NULL;
estate->es_rowmarks = NULL;
estate->es_plannedstmt = NULL;
+ estate->es_part_prune_infos = NIL;
estate->es_junkFilter = NULL;
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 357e10a1d7..c6f86a6510 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -134,7 +134,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_begun = false;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -145,7 +145,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&appendstate->ps,
list_length(node->appendplans),
- node->part_prune_info,
+ node->part_prune_index,
&validsubplans);
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index c5c62fa5c7..8d35860c30 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -82,7 +82,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.ExecProcNode = ExecMergeAppend;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -93,7 +93,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&mergestate->ps,
list_length(node->mergeplans),
- node->part_prune_info,
+ node->part_prune_index,
&validsubplans);
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index ac86ce9003..50a5719ac6 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1203,7 +1203,6 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
ListCell *subpaths;
int nasyncplans = 0;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
int nodenumsortkeys = 0;
AttrNumber *nodeSortColIdx = NULL;
Oid *nodeSortOperators = NULL;
@@ -1354,6 +1353,9 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ plan->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1377,16 +1379,14 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
}
if (prunequal != NIL)
- partpruneinfo =
- make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ plan->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
plan->appendplans = subplans;
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
- plan->part_prune_info = partpruneinfo;
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -1425,7 +1425,6 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
List *subplans = NIL;
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1518,6 +1517,9 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ node->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1541,13 +1543,13 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
}
if (prunequal != NIL)
- partpruneinfo = make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ node->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
node->mergeplans = subplans;
- node->part_prune_info = partpruneinfo;
+
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 78a8174534..240d50f1c0 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -519,6 +519,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->dependsOnRole = glob->dependsOnRole;
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
+ result->partPruneInfos = glob->partPruneInfos;
result->rtable = glob->finalrtable;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 1cb0abdbc1..720f20f563 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -348,6 +348,29 @@ set_plan_references(PlannerInfo *root, Plan *plan)
}
}
+ /* Also fix up the information in PartitionPruneInfos. */
+ foreach (lc, root->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ ListCell *l;
+
+ foreach(l, pruneinfo->prune_infos)
+ {
+ List *prune_infos = lfirst(l);
+ ListCell *l2;
+
+ foreach(l2, prune_infos)
+ {
+ PartitionedRelPruneInfo *pinfo = lfirst(l2);
+
+ /* RT index of the table to which the pinfo belongs. */
+ pinfo->rtindex += rtoffset;
+ }
+ }
+
+ glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
+ }
+
return result;
}
@@ -1658,21 +1681,12 @@ set_append_references(PlannerInfo *root,
aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
- if (aplan->part_prune_info)
- {
- foreach(l, aplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * PartitionPruneInfos will be added to a list in PlannerGlobal, so update
+ * the index.
+ */
+ if (aplan->part_prune_index >= 0)
+ aplan->part_prune_index += list_length(root->glob->partPruneInfos);
/* We don't need to recurse to lefttree or righttree ... */
Assert(aplan->plan.lefttree == NULL);
@@ -1734,21 +1748,12 @@ set_mergeappend_references(PlannerInfo *root,
mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
- if (mplan->part_prune_info)
- {
- foreach(l, mplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * PartitionPruneInfos will be added to a list in PlannerGlobal, so update
+ * the index.
+ */
+ if (mplan->part_prune_index >= 0)
+ mplan->part_prune_index += list_length(root->glob->partPruneInfos);
/* We don't need to recurse to lefttree or righttree ... */
Assert(mplan->plan.lefttree == NULL);
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 6188bf69cb..6565b6ed01 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -209,16 +209,20 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
/*
* make_partition_pruneinfo
- * Builds a PartitionPruneInfo which can be used in the executor to allow
- * additional partition pruning to take place. Returns NULL when
- * partition pruning would be useless.
+ * Checks if the given set of quals can be used to build pruning steps
+ * that the executor can use to prune away unneeded partitions. If
+ * suitable quals are found then a PartitionPruneInfo is built and tagged
+ * onto the PlannerInfo's partPruneInfos list.
+ *
+ * The return value is the 0-based index of the item added to the
+ * partPruneInfos list or -1 if nothing was added.
*
* 'parentrel' is the RelOptInfo for an appendrel, and 'subpaths' is the list
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
*/
-PartitionPruneInfo *
+int
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
List *prunequal)
@@ -332,7 +336,7 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* quals, then we can just not bother with run-time pruning.
*/
if (prunerelinfos == NIL)
- return NULL;
+ return -1;
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
@@ -358,7 +362,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
else
pruneinfo->other_subplans = NULL;
- return pruneinfo;
+ root->partPruneInfos = lappend(root->partPruneInfos, pruneinfo);
+
+ return list_length(root->partPruneInfos) - 1;
}
/*
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 708435e952..bf962af7af 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -123,9 +123,8 @@ typedef struct PartitionPruneState
extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
bool initial_prune);
-
#endif /* EXECPARTITION_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 01b1727fc0..4a741b053f 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -611,6 +611,7 @@ typedef struct EState
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 09342d128d..fbe75dca0f 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -122,6 +122,9 @@ typedef struct PlannerGlobal
/* "flat" list of AppendRelInfos */
List *appendRelations;
+ /* List of PartitionPruneInfo contained in the plan */
+ List *partPruneInfos;
+
/* OIDs of relations the plan depends on */
List *relationOids;
@@ -503,6 +506,9 @@ struct PlannerInfo
/* Does this query modify any partition key columns? */
bool partColsUpdated;
+
+ /* PartitionPruneInfos added in this query's plan. */
+ List *partPruneInfos;
};
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 5c2ab1b379..2e132afc5a 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -70,6 +70,9 @@ typedef struct PlannedStmt
struct Plan *planTree; /* tree of Plan nodes */
+ List *partPruneInfos; /* List of PartitionPruneInfo contained in
+ * the plan */
+
List *rtable; /* list of RangeTblEntry nodes */
/* rtable indexes of target relations for INSERT/UPDATE/DELETE/MERGE */
@@ -270,8 +273,8 @@ typedef struct Append
*/
int first_partial_plan;
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} Append;
/* ----------------
@@ -305,8 +308,8 @@ typedef struct MergeAppend
/* NULLS FIRST/LAST directions */
bool *nullsFirst pg_node_attr(array_size(numCols));
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} MergeAppend;
/* ----------------
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 90684efa25..ebf0dcff8c 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -70,10 +70,10 @@ typedef struct PartitionPruneContext
#define PruneCxtStateIdx(partnatts, step_id, keyno) \
((partnatts) * (step_id) + (keyno))
-extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
- struct RelOptInfo *parentrel,
- List *subpaths,
- List *prunequal);
+extern int make_partition_pruneinfo(struct PlannerInfo *root,
+ struct RelOptInfo *parentrel,
+ List *subpaths,
+ List *prunequal);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
--
2.35.3
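One detail of 0001 that is easy to miss because the two hunks sit in
different files: make_partition_pruneinfo() returns an index into the
per-subquery root->partPruneInfos list, and each subquery's list is later
appended to glob->partPruneInfos in set_plan_references(), so
set_append_references()/set_mergeappend_references() must convert the
local index into a global one.  Condensed from the createplan.c and
setrefs.c hunks above:

    /* createplan.c: index is relative to this subquery's list */
    plan->part_prune_index = make_partition_pruneinfo(root, rel,
                                                      best_path->subpaths,
                                                      prunequal);

    /*
     * setrefs.c: the per-subquery list is appended to glob->partPruneInfos,
     * so offset the local index by the number of PartitionPruneInfos
     * accumulated from subqueries processed earlier.
     */
    if (aplan->part_prune_index >= 0)
        aplan->part_prune_index += list_length(root->glob->partPruneInfos);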
v23-0002-Optimize-AcquireExecutorLocks-by-locking-only-un.patch
From ae9a6b7186c77888fd85dd7e4056dd3cd607617c Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v23 2/2] Optimize AcquireExecutorLocks() by locking only
unpruned partitions
This commit teaches AcquireExecutorLocks() to perform initial
partition pruning to notionally eliminate the subnodes of a
generic cached plan that need not be initialized during the actual
execution of the plan, and to skip locking the partitions scanned
by those subnodes.
The result of performing initial partition pruning this way before the
actual execution has started is made available to the executor via
PartitionPruneResult, passed along with the PlannedStmt by the
callers of the executor that used plancache.c to get the plan.  It is
NULL in the cases in which the plan is obtained by calling the planner
directly or if the plan obtained from plancache.c is not a generic one.
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 2 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/prepare.c | 26 ++-
src/backend/executor/README | 32 ++++
src/backend/executor/execMain.c | 51 ++++++
src/backend/executor/execParallel.c | 26 ++-
src/backend/executor/execPartition.c | 241 +++++++++++++++++++++----
src/backend/executor/execUtils.c | 1 +
src/backend/executor/functions.c | 2 +-
src/backend/executor/nodeAppend.c | 11 +-
src/backend/executor/nodeMergeAppend.c | 5 +-
src/backend/executor/spi.c | 27 ++-
src/backend/nodes/readfuncs.c | 8 +-
src/backend/optimizer/plan/planner.c | 2 +
src/backend/optimizer/plan/setrefs.c | 46 +++++
src/backend/partitioning/partprune.c | 41 ++++-
src/backend/tcop/postgres.c | 8 +-
src/backend/tcop/pquery.c | 28 ++-
src/backend/utils/cache/plancache.c | 208 ++++++++++++++++++---
src/backend/utils/mmgr/portalmem.c | 19 ++
src/include/commands/explain.h | 4 +-
src/include/executor/execPartition.h | 9 +-
src/include/executor/execdesc.h | 3 +
src/include/executor/executor.h | 3 +
src/include/nodes/execnodes.h | 1 +
src/include/nodes/nodes.h | 1 +
src/include/nodes/pathnodes.h | 12 ++
src/include/nodes/plannodes.h | 46 +++++
src/include/utils/plancache.h | 3 +-
src/include/utils/portal.h | 3 +
33 files changed, 782 insertions(+), 100 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 2527e66059..fb8779fec0 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -558,7 +558,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NIL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 152c29b551..942449544c 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NIL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index f86983c660..2f2b558608 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -407,7 +407,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NIL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
@@ -515,7 +515,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, List *part_prune_results,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
@@ -563,7 +564,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, part_prune_results, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 1a62e5dac5..cc36b6fd15 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -776,7 +776,7 @@ execute_sql_string(const char *sql)
{
QueryDesc *qdesc;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, NIL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 9ac0383459..65c8d0aa59 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -408,7 +408,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NIL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 9e29584d93..29b45539d3 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -155,6 +155,7 @@ ExecuteQuery(ParseState *pstate,
PreparedStatement *entry;
CachedPlan *cplan;
List *plan_list;
+ List *part_prune_results_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
Portal portal;
@@ -193,7 +194,10 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
plan_list = cplan->stmt_list;
/*
@@ -207,6 +211,9 @@ ExecuteQuery(ParseState *pstate,
plan_list,
cplan);
+ /* Copy Lists of PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_results_list);
+
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
* statement is one that produces tuples. Currently we insist that it be
@@ -576,7 +583,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
- ListCell *p;
+ List *part_prune_results_list;
+ ListCell *p,
+ *pp;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -619,7 +628,10 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Replan if needed, and acquire a transient refcount */
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, queryEnv);
+ CurrentResourceOwner, queryEnv,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -634,13 +646,15 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
plan_list = cplan->stmt_list;
/* Explain each query */
- foreach(p, plan_list)
+ forboth(p, plan_list, pp, part_prune_results_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ List *part_prune_results = lfirst_node(List, pp);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ ExplainOnePlan(pstmt, part_prune_results, into, es, query_string,
+ paramLI, queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 0b5183fc4a..f14f9197b5 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -65,6 +65,34 @@ found there. This currently only occurs for Append and MergeAppend nodes. In
this case the non-required subplans are ignored and the executor state's
subnode array will become out of sequence to the plan's subplan list.
+Actually, the so-called execution time pruning may also occur even before
+execution has started. One case where that occurs is when a cached generic
+plan is being validated for execution by plancache.c's GetCachedPlan(), which
+works by locking all the relations that will be scanned by that plan. If the
+generic plan contains nodes that can perform execution time partition pruning
+(that is, ones containing a PartitionPruneInfo), the subset of pruning steps
+in a given node's PartitionPruneInfo that do not depend on execution actually
+having started (called "initial" pruning steps) is performed to figure out
+the minimal set of child subplans that satisfy those pruning steps.
+AcquireExecutorLocks() looking at a given generic plan will then lock only the
+relations scanned by the child subplans that survived such pruning, along with
+those present in PlannedStmt.minLockRelids. Note that the subplans are only
+notionally pruned, that is, they are not removed from the plan tree as such.
+
+To prevent the executor and any third-party execution code that can look at
+the plan tree from trying to execute the subplans that were pruned as
+described above, the result of pruning is passed to the executor as a List
+of PartitionPruneResult nodes via the QueryDesc. Each PartitionPruneResult
+contains the set of indexes of the surviving subplans in the list of child
+subplans of the parent plan node to which the corresponding
+PartitionPruneInfo belongs, saved as a bitmapset (valid_subplan_offs).
+Consequently, an executor executing a generic plan must not re-evaluate the
+set of initially valid subplans for a given plan node by redoing the initial
+pruning if it was already done by AcquireExecutorLocks() when validating the
+plan. Such re-evaluation of the pruning steps may very well result in a
+different set of subplans, containing some whose relations were not locked by
+AcquireExecutorLocks().
+
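+As an illustrative sketch (variable names hypothetical), a consumer of a
+given plan node's PartitionPruneResult would read the surviving subplan
+indexes out of that bitmapset like this:
+
+    PartitionPruneResult *pruneresult =
+        list_nth(queryDesc->part_prune_results, part_prune_index);
+    int     i = -1;
+
+    /* walk the bitmapset of surviving subplan offsets */
+    while ((i = bms_next_member(pruneresult->valid_subplan_offs, i)) >= 0)
+        elog(DEBUG1, "subplan %d survived initial pruning", i);
+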
Each Plan node may have expression trees associated with it, to represent
its target list, qualification conditions, etc. These trees are also
read-only to the executor, but the executor state for expression evaluation
@@ -286,6 +314,10 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ [ ExecutorDoInitialPruning ] --- an optional step to perform initial
+ partition pruning on the plan tree, the result of which is passed
+ to the executor via QueryDesc (sketched further below)
+
CreateQueryDesc
ExecutorStart
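
(In sketch form, with hypothetical variable names, the sequence for a
cached generic plan is:

    cplan = GetCachedPlan(plansource, params, NULL, NULL,
                          &part_prune_results_list);
    queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, cplan->stmt_list),
                                linitial(part_prune_results_list),
                                sourceText, GetActiveSnapshot(),
                                InvalidSnapshot, dest, params, queryEnv, 0);
    ExecutorStart(queryDesc, 0);

where GetCachedPlan() has already run ExecutorDoInitialPruning() via
AcquireExecutorLocks() for plans that have containsInitialPruning set.)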
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 32475e33ff..b59474841f 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -49,6 +49,7 @@
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/execdebug.h"
+#include "executor/execPartition.h"
#include "executor/nodeSubplan.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
@@ -104,6 +105,54 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
/* end of local decls */
+/* ----------------------------------------------------------------
+ * ExecutorDoInitialPruning
+ *
+ * For each plan tree node that has been assigned a PartitionPruneInfo,
+ * this performs initial partition pruning using the information contained
+ * therein to determine the set of child subplans that satisfy the initial
+ * pruning steps, to be returned as a bitmapset of their indexes in the
+ * node's list of child subplans (for example, an Append's appendplans).
+ *
+ * The return value is a List of PartitionPruneResult nodes, one for every
+ * PartitionPruneInfo, each containing one of those bitmapsets. Also,
+ * *scan_leafpart_rtis is set to a bitmapset of the RT indexes of all the
+ * leaf partitions scanned by the chosen subplans; note that this set is
+ * shared across all PartitionPruneInfos.
+ *
+ * The executor must see exactly the same set of subplans as valid for
+ * execution when doing ExecInitNode() on the plan nodes whose
+ * PartitionPruneInfos are processed here. So, it must get the set from the
+ * aforementioned PartitionPruneResults, instead of computing it all over
+ * again by redoing the initial pruning. It's the caller's job to pass the
+ * PartitionPruneResults to the executor.
+ *
+ * Note: Partitioned tables mentioned in PartitionedRelPruneInfo nodes that
+ * drive the pruning will be locked before doing the pruning.
+ * ----------------------------------------------------------------
+ */
+List *
+ExecutorDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *part_prune_results = NIL;
+ ListCell *lc;
+
+ /* Only get here if there is any pruning to do. */
+ Assert(plannedstmt->containsInitialPruning);
+
+ foreach(lc, plannedstmt->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ PartitionPruneResult *pruneresult = makeNode(PartitionPruneResult);
+
+ pruneresult->valid_subplan_offs =
+ ExecPartitionDoInitialPruning(plannedstmt, params, pruneinfo,
+ scan_leafpart_rtis);
+ part_prune_results = lappend(part_prune_results, pruneresult);
+ }
+
+ return part_prune_results;
+}
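+
+/*
+ * For illustration (hypothetical caller): a plan containing two Append
+ * nodes that each carry a PartitionPruneInfo yields a two-element list
+ * here, e.g.:
+ *
+ *     Bitmapset  *rtis = NULL;
+ *     List       *results = ExecutorDoInitialPruning(stmt, params, &rtis);
+ *
+ *     Assert(list_length(results) == list_length(stmt->partPruneInfos));
+ *
+ * with 'rtis' then holding the RT indexes of every leaf partition scanned
+ * by the surviving subplans of both nodes.
+ */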
/* ----------------------------------------------------------------
* ExecutorStart
@@ -806,6 +855,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ List *part_prune_results = queryDesc->part_prune_results;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -826,6 +876,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_plannedstmt = plannedstmt;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ estate->es_part_prune_results = part_prune_results;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index aca0c6f323..917079a034 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -66,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITION_PRUNE_RESULTS UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -182,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->containsInitialPruning = false;
pstmt->planTree = plan;
pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
@@ -597,12 +599,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_results_data;
+ char *part_prune_results_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_results_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -631,6 +636,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_results_data = nodeToString(estate->es_part_prune_results);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -657,6 +663,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized List of PartitionPruneResult. */
+ part_prune_results_len = strlen(part_prune_results_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_results_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -751,6 +762,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized List of PartitionPruneResult */
+ part_prune_results_space = shm_toc_allocate(pcxt->toc, part_prune_results_len);
+ memcpy(part_prune_results_space, part_prune_results_data, part_prune_results_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS,
+ part_prune_results_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1232,8 +1249,10 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_results_space;
char *paramspace;
PlannedStmt *pstmt;
+ List *part_prune_results;
ParamListInfo paramLI;
char *queryString;
@@ -1244,12 +1263,17 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
pstmtspace = shm_toc_lookup(toc, PARALLEL_KEY_PLANNEDSTMT, false);
pstmt = (PlannedStmt *) stringToNode(pstmtspace);
+ /* Reconstruct the leader-supplied List of PartitionPruneResults. */
+ part_prune_results_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, false);
+ part_prune_results = (List *) stringToNode(part_prune_results_space);
+
/* Reconstruct ParamListInfo. */
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
/* Create a QueryDesc for the query. */
- return CreateQueryDesc(pstmt,
+ return CreateQueryDesc(pstmt, part_prune_results,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
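
(For reference, the leader-to-worker hand-off above uses ordinary node
serialization; a minimal sketch of the round trip, assuming node output and
read support for PartitionPruneResult, which the rest of the patch provides:

    /* leader */
    char   *buf = nodeToString(estate->es_part_prune_results);

    /* worker, after fetching buf from the shm_toc */
    List   *part_prune_results = (List *) stringToNode(buf);

This is the same round trip already used for the PlannedStmt itself.)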
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 80197d5141..8728745c44 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -25,6 +25,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -185,7 +186,11 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *pruneinfo);
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -198,7 +203,8 @@ static void PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans);
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis);
/*
@@ -1746,8 +1752,10 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* considered to be a stable expression, it can change value from one plan
* node scan to the next during query execution. Stable comparison
* expressions that don't involve such Params allow partition pruning to be
- * done once during executor startup. Expressions that do involve such Params
- * require us to prune separately for each scan of the parent plan node.
+ * done once during executor startup, or during ExecutorDoInitialPruning(),
+ * which runs as part of AcquireExecutorLocks() on a given plan tree.
+ * Expressions that do involve such Params require us to prune separately for
+ * each scan of the parent plan node.
*
* Note that pruning away unneeded subplans during executor startup has the
* added benefit of not having to initialize the unneeded subplans at all.
@@ -1764,6 +1772,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* account for initial pruning possibly having eliminated some of the
* subplans.
*
+ * ExecPartitionDoInitialPruning:
+ * Do initial pruning with the information contained in a given
+ * PartitionPruneInfo to determine the minimal set of child subplans of
+ * the parent plan node (the one to which the PartitionPruneInfo belongs)
+ * that must be executed, and also the set of RT indexes of the leaf
+ * partitions that will be scanned by those subplans.
+ *
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating the expressions
* that are safe to evaluate at a given point. This function is first
@@ -1781,8 +1796,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* On return, *initially_valid_subplans is assigned the set of indexes of
* child subplans that must be initialized along with the parent plan node.
- * Initial pruning is performed here if needed and in that case only the
- * surviving subplans' indexes are added.
+ * Initial pruning is performed here if needed (unless it has already been done
+ * by ExecutorDoInitialPruning()), and in that case only the surviving
+ * subplans' indexes are added.
*
* If subplans are indeed pruned, subplan_map arrays contained in the returned
* PartitionPruneState are re-sequenced to not count those, though only if the
@@ -1794,28 +1810,65 @@ ExecInitPartitionPruning(PlanState *planstate,
int part_prune_index,
Bitmapset **initially_valid_subplans)
{
- PartitionPruneState *prunestate;
+ PartitionPruneState *prunestate = NULL;
EState *estate = planstate->state;
PartitionPruneInfo *pruneinfo = list_nth(estate->es_part_prune_infos,
part_prune_index);
+ PartitionPruneResult *pruneresult = NULL;
+ bool do_pruning = (pruneinfo->needs_init_pruning ||
+ pruneinfo->needs_exec_pruning);
+
+ /*
+ * No need to do initial pruning if it was done already by
+ * ExecutorDoInitialPruning(), which it would be if es_part_prune_results
+ * is set.
+ */
+ if (estate->es_part_prune_results)
+ {
+ pruneresult = list_nth(estate->es_part_prune_results, part_prune_index);
+ Assert(IsA(pruneresult, PartitionPruneResult));
+ do_pruning = pruneinfo->needs_exec_pruning;
+ }
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, planstate);
+ if (do_pruning)
+ {
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
- /* Create the working data structure for pruning */
- prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ /* For data reading, executor always omits detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
+ /*
+ * Create the working data structure for pruning. No need to consider
+ * initial pruning steps if we have a PartitionPruneResult.
+ */
+ prunestate = CreatePartitionPruneState(planstate, pruneinfo,
+ pruneresult == NULL, true,
+ NIL, planstate->ps_ExprContext,
+ estate->es_partition_directory);
+ }
/*
* Perform an initial partition prune pass, if required.
*/
- if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ if (pruneresult)
+ {
+ *initially_valid_subplans = bms_copy(pruneresult->valid_subplan_offs);
+ }
+ else if (prunestate && prunestate->do_initial_prune)
+ {
+ *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true,
+ NULL);
+ }
else
{
- /* No pruning, so we'll need to initialize all subplans */
+ /* No initial pruning, so we'll need to initialize all subplans */
Assert(n_total_subplans > 0);
*initially_valid_subplans = bms_add_range(NULL, 0,
n_total_subplans - 1);
+ return prunestate;
}
/*
@@ -1823,7 +1876,8 @@ ExecInitPartitionPruning(PlanState *planstate,
* that were removed above due to initial pruning. No need to do this if
* no steps were removed.
*/
- if (bms_num_members(*initially_valid_subplans) < n_total_subplans)
+ if (prunestate &&
+ bms_num_members(*initially_valid_subplans) < n_total_subplans)
{
/*
* We can safely skip this when !do_exec_prune, even though that
@@ -1839,11 +1893,74 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecPartitionDoInitialPruning
+ * Perform initial pruning using given PartitionPruneInfo to determine
+ * the minimal set of child subplans that will be executed and also the
+ * set of RT indexes of the leaf partitions scanned by those subplans.
+ */
+Bitmapset *
+ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *rtable = plannedstmt->rtable;
+ ExprContext *econtext;
+ PartitionDirectory pdir;
+ MemoryContext oldcontext,
+ tmpcontext;
+ PartitionPruneState *prunestate;
+ Bitmapset *valid_subplan_offs;
+
+ /*
+ * A temporary context for memory allocations required while executing
+ * partition pruning steps.
+ */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "initial pruning working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
+
+ /*
+ * Create a PartitionDirectory to look up partition descriptors. Note
+ * that we omit detached partitions, just as during execution proper.
+ */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+
+ /*
+ * We don't yet have a PlanState for the parent plan node, so we must
+ * create a standalone ExprContext to evaluate pruning expressions,
+ * equipped with the information about the EXTERN parameters that the
+ * caller passed us. That's okay because the initial pruning steps do not
+ * contain anything that requires execution to have started, or that needs
+ * the information contained in a PlanState.
+ */
+ econtext = CreateStandaloneExprContext();
+ econtext->ecxt_param_list_info = params;
+ prunestate = CreatePartitionPruneState(NULL, pruneinfo, true, false,
+ rtable, econtext, pdir);
+ MemoryContextSwitchTo(oldcontext);
+
+ /* Do the initial pruning. */
+ valid_subplan_offs = ExecFindMatchingSubPlans(prunestate, true,
+ scan_leafpart_rtis);
+
+ FreeExprContext(econtext, true);
+ DestroyPartitionDirectory(pdir);
+ MemoryContextDelete(tmpcontext);
+
+ return valid_subplan_offs;
+}
+
/*
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
*
- * 'planstate' is the parent plan node's execution state.
+ * 'planstate', if not NULL, is the parent plan node's execution state. It
+ * can be NULL when called before ExecutorStart(), in which case
+ * 'rtable' (range table), 'econtext', and 'partdir' must be explicitly
+ * provided.
*
* 'pruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
@@ -1857,19 +1974,21 @@ ExecInitPartitionPruning(PlanState *planstate,
* PartitionedRelPruneInfo.
*/
static PartitionPruneState *
-CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
+CreatePartitionPruneState(PlanState *planstate,
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir)
{
- EState *estate = planstate->state;
+ EState *estate = planstate ? planstate->state : NULL;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
- /* For data reading, executor always omits detached partitions */
- if (estate->es_partition_directory == NULL)
- estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
+ Assert((estate != NULL) ||
+ (partdir != NULL && econtext != NULL && rtable != NIL));
n_part_hierarchies = list_length(pruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
@@ -1924,15 +2043,42 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * Must open the relation ourselves when called before execution has
+ * started, such as during ExecutorDoInitialPruning() on a cached
+ * plan. In that case, sub-partitions must be locked, because
+ * AcquirePlannerLocks() would not have seen them. (The first
+ * relation in a partrelpruneinfos list is always the root
+ * partitioned table appearing in the query, which
+ * AcquirePlannerLocks() would have locked; the Assert in
+ * relation_open() guards that assumption.)
+ */
+ if (estate == NULL)
+ {
+ RangeTblEntry *rte = rt_fetch(pinfo->rtindex, rtable);
+ int lockmode = (j == 0) ? NoLock : rte->rellockmode;
+
+ partrel = table_open(rte->relid, lockmode);
+ }
+ else
+ partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+
+ /*
+ * We can rely on the copy of the partitioned table's partition
+ * key in its relcache entry, because it can't change (or get
+ * destroyed) as long as the relation is locked. The partition
+ * descriptor is taken from the PartitionDirectory, which keeps
+ * the table open long enough for the descriptor to remain valid
+ * while it's used to perform the pruning steps.
*/
- partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
partkey = RelationGetPartitionKey(partrel);
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partdesc = PartitionDirectoryLookup(partdir, partrel);
+
+ /*
+ * Must close partrel, keeping the lock taken, if we're not using
+ * EState's entry.
+ */
+ if (estate == NULL)
+ table_close(partrel, NoLock);
/*
* Initialize the subplan_map and subpart_map.
@@ -1946,6 +2092,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
Assert(partdesc->nparts >= pinfo->nparts);
pprune->nparts = partdesc->nparts;
pprune->subplan_map = palloc(sizeof(int) * partdesc->nparts);
+ pprune->rti_map = palloc(sizeof(Index) * partdesc->nparts);
if (partdesc->nparts == pinfo->nparts)
{
/*
@@ -1956,6 +2103,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pprune->subpart_map = pinfo->subpart_map;
memcpy(pprune->subplan_map, pinfo->subplan_map,
sizeof(int) * pinfo->nparts);
+ memcpy(pprune->rti_map, pinfo->rti_map,
+ sizeof(int) * pinfo->nparts);
/*
* Double-check that the list of unpruned relations has not
@@ -2006,6 +2155,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pinfo->subplan_map[pd_idx];
pprune->subpart_map[pp_idx] =
pinfo->subpart_map[pd_idx];
+ pprune->rti_map[pp_idx] =
+ pinfo->rti_map[pd_idx];
pd_idx++;
}
else
@@ -2013,6 +2164,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
/* this partdesc entry is not in the plan */
pprune->subplan_map[pp_idx] = -1;
pprune->subpart_map[pp_idx] = -1;
+ pprune->rti_map[pp_idx] = 0;
}
}
@@ -2034,7 +2186,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* Initialize pruning contexts as needed.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
- if (pinfo->initial_pruning_steps)
+ if (consider_initial_steps && pinfo->initial_pruning_steps)
{
InitPartitionPruneContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
@@ -2044,7 +2196,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps)
+ if (consider_exec_steps && pinfo->exec_pruning_steps)
{
InitPartitionPruneContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
@@ -2272,10 +2424,14 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
* runtime pruning.
+ *
+ * RT indexes of leaf partitions scanned by the chosen subplans are added to
+ * *scan_leafpart_rtis if the pointer is non-NULL.
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2310,7 +2466,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
*/
pprune = &prunedata->partrelprunedata[0];
find_matching_subplans_recurse(prunedata, pprune, initial_prune,
- &result);
+ &result, scan_leafpart_rtis);
/* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
@@ -2324,6 +2480,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_copy(*scan_leafpart_rtis);
MemoryContextReset(prunestate->prune_context);
@@ -2334,13 +2492,15 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* find_matching_subplans_recurse
* Recursive worker function for ExecFindMatchingSubPlans
*
- * Adds valid (non-prunable) subplan IDs to *validsubplans
+ * Adds valid (non-prunable) subplan IDs to *validsubplans and RT indexes of
+ * the corresponding leaf partitions to *scan_leafpart_rtis (if asked for).
*/
static void
find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans)
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *partset;
int i;
@@ -2367,8 +2527,14 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
while ((i = bms_next_member(partset, i)) >= 0)
{
if (pprune->subplan_map[i] >= 0)
+ {
*validsubplans = bms_add_member(*validsubplans,
pprune->subplan_map[i]);
+ Assert(pprune->rti_map[i] > 0);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_add_member(*scan_leafpart_rtis,
+ pprune->rti_map[i]);
+ }
else
{
int partidx = pprune->subpart_map[i];
@@ -2376,7 +2542,8 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
if (partidx >= 0)
find_matching_subplans_recurse(prunedata,
&prunedata->partrelprunedata[partidx],
- initial_prune, validsubplans);
+ initial_prune, validsubplans,
+ scan_leafpart_rtis);
else
{
/*
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 21f4c10937..67a58c7163 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -134,6 +134,7 @@ CreateExecutorState(void)
estate->es_param_exec_vals = NULL;
estate->es_queryEnv = NULL;
+ estate->es_part_prune_results = NIL;
estate->es_query_cxt = qcontext;
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index e134a82ff7..18d3b98cdc 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -842,7 +842,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
- es->qd = CreateQueryDesc(es->stmt,
+ es->qd = CreateQueryDesc(es->stmt, NIL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index c6f86a6510..96880e122a 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -155,7 +155,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (appendstate->as_prune_state == NULL ||
+ (!appendstate->as_prune_state->do_exec_prune && nplans > 0))
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -577,7 +578,7 @@ choose_next_subplan_locally(AppendState *node)
}
else if (node->as_valid_subplans == NULL)
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
whichplan = -1;
}
@@ -642,7 +643,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
/*
* Mark each invalid plan as finished to allow the loop below to
@@ -717,7 +718,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
mark_invalid_subplans_as_finished(node);
}
@@ -868,7 +869,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
classify_matching_subplans(node);
}
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 8d35860c30..2312e5a633 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -103,7 +103,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplan, we can fill ms_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (mergestate->ms_prune_state == NULL ||
+ (!mergestate->ms_prune_state->do_exec_prune && nplans > 0))
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -218,7 +219,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index fd5796f1b9..93012a5b3b 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1578,6 +1578,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
CachedPlanSource *plansource;
CachedPlan *cplan;
List *stmt_list;
+ List *part_prune_results_list;
char *query_string;
Snapshot snapshot;
MemoryContext oldcontext;
@@ -1657,7 +1658,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1685,6 +1689,9 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
stmt_list,
cplan);
+ /* Copy Lists of PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_results_list);
+
/*
* Set up options for portal. Default SCROLL type is chosen the same way
* as PerformCursorOpen does it.
@@ -2092,7 +2099,8 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
/* Get the generic plan for the query */
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ NULL /* Not interested in PartitionPruneResults */);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2473,7 +2481,9 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
- ListCell *lc2;
+ List *part_prune_results_list;
+ ListCell *lc2,
+ *lc3;
spicallbackarg.query = plansource->query_string;
@@ -2549,8 +2559,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* plan, the refcount must be backed by the plan_owner.
*/
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
-
+ plan_owner, _SPI_current->queryEnv,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
stmt_list = cplan->stmt_list;
/*
@@ -2589,9 +2601,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, part_prune_results_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ List *part_prune_results = lfirst_node(List, lc3);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2663,7 +2676,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
else
snap = InvalidSnapshot;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, part_prune_results,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index b4ff855f7c..77990a2732 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -158,6 +158,11 @@
token = pg_strtok(&length); /* skip :fldname */ \
local_node->fldname = readIntCols(len)
+/* Read an Index array */
+#define READ_INDEX_ARRAY(fldname, len) \
+ token = pg_strtok(&length); /* skip :fldname */ \
+ local_node->fldname = readIndexCols(len)
+
/* Read a bool array */
#define READ_BOOL_ARRAY(fldname, len) \
token = pg_strtok(&length); /* skip :fldname */ \
@@ -795,7 +800,6 @@ fnname(int numCols) \
*/
READ_SCALAR_ARRAY(readAttrNumberCols, int16, atoi)
READ_SCALAR_ARRAY(readOidCols, Oid, atooid)
-/* outfuncs.c has writeIndexCols, but we don't yet need that here */
-/* READ_SCALAR_ARRAY(readIndexCols, Index, atoui) */
+READ_SCALAR_ARRAY(readIndexCols, Index, atoui)
READ_SCALAR_ARRAY(readIntCols, int, atoi)
READ_SCALAR_ARRAY(readBoolCols, bool, strtobool)
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 240d50f1c0..b7801ea04c 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -520,7 +520,9 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
result->partPruneInfos = glob->partPruneInfos;
+ result->containsInitialPruning = glob->containsInitialPruning;
result->rtable = glob->finalrtable;
+ result->minLockRelids = glob->minLockRelids;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 720f20f563..61d6934978 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -270,6 +270,16 @@ set_plan_references(PlannerInfo *root, Plan *plan)
*/
add_rtes_to_flat_rtable(root, false);
+ /*
+ * Add the query's adjusted range of RT indexes to glob->minLockRelids.
+ * The adjusted RT indexes of prunable relations will be deleted from the
+ * set below where PartitionPruneInfos are processed.
+ */
+ glob->minLockRelids =
+ bms_add_range(glob->minLockRelids,
+ rtoffset + 1,
+ rtoffset + list_length(root->parse->rtable));
+
/*
* Adjust RT indexes of PlanRowMarks and add to final rowmarks list
*/
@@ -352,6 +362,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach (lc, root->partPruneInfos)
{
PartitionPruneInfo *pruneinfo = lfirst(lc);
+ Bitmapset *leafpart_rtis = NULL;
ListCell *l;
foreach(l, pruneinfo->prune_infos)
@@ -362,15 +373,50 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach(l2, prune_infos)
{
PartitionedRelPruneInfo *pinfo = lfirst(l2);
+ int i;
/* RT index of the table to which the pinfo belongs. */
pinfo->rtindex += rtoffset;
+
+ /* Likewise for the leaf partitions that might be scanned. */
+ for (i = 0; i < pinfo->nparts; i++)
+ {
+ if (pinfo->rti_map[i] > 0 && pinfo->subplan_map[i] >= 0)
+ {
+ pinfo->rti_map[i] += rtoffset;
+ leafpart_rtis = bms_add_member(leafpart_rtis,
+ pinfo->rti_map[i]);
+ }
+ }
}
}
+ if (pruneinfo->needs_init_pruning)
+ {
+ glob->containsInitialPruning = true;
+
+ /*
+ * Delete the leaf partition RTIs from the global set of relations
+ * to be locked before executing the plan. AcquireExecutorLocks()
+ * will find the ones to add to the set after performing initial
+ * pruning.
+ */
+ glob->minLockRelids = bms_del_members(glob->minLockRelids,
+ leafpart_rtis);
+ }
+
glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
}
+ /*
+ * It seems worth doing a bms_copy() on glob->minLockRelids if we deleted
+ * bits from it above, to get rid of any empty tail words, so that the
+ * loop over this set in AcquireExecutorLocks() doesn't have to go
+ * through those useless bit words.
+ */
+ if (glob->containsInitialPruning)
+ glob->minLockRelids = bms_copy(glob->minLockRelids);
+
return result;
}
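
(To make the bitmapset arithmetic above concrete, a small worked example
with hypothetical numbers: take rtoffset = 0 and a four-entry range table
in which RTIs 3 and 4 are prunable leaf partitions:

    Bitmapset  *minLockRelids = bms_add_range(NULL, 1, 4);   /* {1 2 3 4} */
    Bitmapset  *leafpart_rtis = bms_add_member(NULL, 3);
    leafpart_rtis = bms_add_member(leafpart_rtis, 4);        /* {3 4} */

    minLockRelids = bms_del_members(minLockRelids, leafpart_rtis); /* {1 2} */

AcquireExecutorLocks() then locks RTIs 1 and 2 unconditionally and adds
back whichever of 3 and 4 survive initial pruning.)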
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 6565b6ed01..37f3e6af61 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,9 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -234,6 +236,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool needs_init_pruning = false;
+ bool needs_exec_pruning = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -313,12 +317,16 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_needs_init_pruning;
+ bool partrel_needs_exec_pruning;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_needs_init_pruning,
+ &partrel_needs_exec_pruning);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -327,6 +335,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+
+ needs_init_pruning |= partrel_needs_init_pruning;
+ needs_exec_pruning |= partrel_needs_exec_pruning;
}
pfree(relid_subplan_map);
@@ -341,6 +352,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->needs_init_pruning = needs_init_pruning;
+ pruneinfo->needs_exec_pruning = needs_exec_pruning;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -441,13 +454,18 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *needs_init_pruning and *needs_exec_pruning are set to indicate whether
+ * the returned PartitionedRelPruneInfos contain pruning steps that can be
+ * performed before execution begins and steps that must be performed
+ * during execution, respectively.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -458,6 +476,10 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will find out below. */
+ *needs_init_pruning = false;
+ *needs_exec_pruning = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -545,6 +567,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+ * In the first pass, we also determine whether the second pass will be
+ * necessary, by checking for the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -619,6 +644,12 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ /* record which types of pruning steps we've seen so far */
+ if (initial_pruning_steps != NIL)
+ *needs_init_pruning = true;
+ if (exec_pruning_steps != NIL)
+ *needs_exec_pruning = true;
+
pinfolist = lappend(pinfolist, pinfo);
}
@@ -646,6 +677,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *subplan_map;
int *subpart_map;
Oid *relid_map;
+ Index *rti_map;
/*
* Construct the subplan and subpart maps for this partitioning level.
@@ -658,6 +690,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
+ rti_map = (Index *) palloc0(nparts * sizeof(Index));
present_parts = NULL;
i = -1;
@@ -672,6 +705,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
+ rti_map[i] = partrel->relid;
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
@@ -696,6 +730,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
pinfo->relid_map = relid_map;
+ pinfo->rti_map = rti_map;
}
pfree(relid_subpart_map);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index a9a1851c94..a1be8179e8 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1598,6 +1598,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ List *part_prune_results_list;
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -1972,7 +1973,9 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
/*
* Now we can define the portal.
@@ -1987,6 +1990,9 @@ exec_bind_message(StringInfo input_message)
cplan->stmt_list,
cplan);
+ /* Copy Lists of PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_results_list);
+
/* Done with the snapshot used for parameter I/O and parsing/planning */
if (snapshot_set)
PopActiveSnapshot();
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5aa5a350f3..226ee81b63 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -35,7 +35,7 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
+static void ProcessQuery(PlannedStmt *plan, List *part_prune_results,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +65,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ List *part_prune_results,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +78,8 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->part_prune_results = part_prune_results; /* ExecutorDoInitialPruning()
+ * output for plan */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +125,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * part_prune_results: ExecutorDoInitialPruning() output for the PlannedStmt
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +138,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ List *part_prune_results,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +150,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, part_prune_results, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -491,8 +496,13 @@ PortalStart(Portal portal, ParamListInfo params,
/*
* Create QueryDesc in portal's context; for the moment, set
* the destination to DestNone.
+ *
+ * There is no PartitionPruneResult unless the PlannedStmt is
+ * from a CachedPlan.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->part_prune_results_list == NIL ? NIL :
+ linitial(portal->part_prune_results_list),
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1225,6 +1235,8 @@ PortalRunMulti(Portal portal,
if (pstmt->utilityStmt == NULL)
{
+ List *part_prune_results = NIL;
+
/*
* process a plannable query.
*/
@@ -1271,10 +1283,18 @@ PortalRunMulti(Portal portal,
else
UpdateActiveSnapshotCommandId();
+ /*
+ * Determine if there's a corresponding List of PartitionPruneResult
+ * for this PlannedStmt.
+ */
+ if (portal->part_prune_results_list != NIL)
+ part_prune_results = list_nth(portal->part_prune_results_list,
+ foreach_current_index(stmtlist_item));
+
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_results,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1283,7 +1303,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_results,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 0d6a295674..957221c47e 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -99,14 +99,19 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_results_list);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv);
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_results_list);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_results_list,
+ List **lockedRelids_per_stmt);
+static void ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -782,6 +787,26 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
return tlist;
}
+/*
+ * FreePartitionPruneResults
+ * Frees the List of Lists of PartitionPruneResults for CheckCachedPlan()
+ */
+static void
+FreePartitionPruneResults(List *part_prune_results_list)
+{
+ ListCell *lc;
+
+ foreach(lc, part_prune_results_list)
+ {
+ List *part_prune_results = lfirst(lc);
+
+ /* Free both the PartitionPruneResults and the containing List. */
+ list_free_deep(part_prune_results);
+ }
+
+ list_free(part_prune_results_list);
+}
+
/*
* CheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
*
@@ -790,15 +815,20 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* On a "true" return, we have acquired the locks needed to run the plan.
* (We must do this for the "true" result to be race-condition-free.)
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_results_list.
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_results_list)
{
CachedPlan *plan = plansource->gplan;
/* Assert that caller checked the querytree */
Assert(plansource->is_valid);
+ *part_prune_results_list = NIL;
+
/* If there's no generic plan, just say "false" */
if (!plan)
return false;
@@ -820,13 +850,21 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ List *lockedRelids_per_stmt;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ /*
+ * Lock the relations scanned by the plan. This is where initial
+ * pruning happens, if needed.
+ */
+ AcquireExecutorLocks(plan->stmt_list, boundParams,
+ part_prune_results_list,
+ &lockedRelids_per_stmt);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -848,7 +886,11 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ ReleaseExecutorLocks(plan->stmt_list, lockedRelids_per_stmt);
+
+ /* Release any PartitionPruneResults that may have been created. */
+ FreePartitionPruneResults(*part_prune_results_list);
+ *part_prune_results_list = NIL;
}
/*
@@ -874,10 +916,14 @@ CheckCachedPlan(CachedPlanSource *plansource)
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
* (unless this is a one-shot plan, in which case we don't copy the plan).
+ *
+ * A list of NILs is returned in *part_prune_results_list, meaning that
+ * no partition pruning has been done yet for the plans in stmt_list.
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv)
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_results_list)
{
CachedPlan *plan;
List *plist;
@@ -1007,6 +1053,17 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
MemoryContextSwitchTo(oldcxt);
+ /*
+ * No actual PartitionPruneResults to add yet, though we must initialize
+ * the list to have the same number of elements as the list of
+ * PlannedStmts.
+ */
+ *part_prune_results_list = NIL;
+ foreach(lc, plist)
+ {
+ *part_prune_results_list = lappend(*part_prune_results_list, NIL);
+ }
+
return plan;
}
@@ -1126,6 +1183,19 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
+ * For every PlannedStmt found in the returned CachedPlan, an element that
+ * is either a List of PartitionPruneResult nodes or NIL is added to
+ * *part_prune_results_list; the former if the PlannedStmt comes from an
+ * existing CachedPlan that is otherwise valid and has
+ * containsInitialPruning set to true. Before returning such a CachedPlan,
+ * the "initial" pruning steps are performed by calling
+ * ExecutorDoInitialPruning(), so that AcquireExecutorLocks() need lock
+ * only those leaf partitions whose subplans survive the "initial" pruning
+ * conditions. For each PartitionPruneInfo found in
+ * PlannedStmt.partPruneInfos, a PartitionPruneResult containing the
+ * bitmapset of the indexes of the surviving subplans is added to the List
+ * for that PlannedStmt.
+ *
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
@@ -1139,11 +1209,13 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ List **part_prune_results_list)
{
CachedPlan *plan = NULL;
List *qlist;
bool customplan;
+ List *my_part_prune_results_list;
/* Assert caller is doing things in a sane order */
Assert(plansource->magic == CACHEDPLANSOURCE_MAGIC);
@@ -1160,7 +1232,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (CheckCachedPlan(plansource, boundParams,
+ &my_part_prune_results_list))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1169,7 +1242,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
else
{
/* Build a new generic plan */
- plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv,
+ &my_part_prune_results_list);
/* Just make real sure plansource->gplan is clear */
ReleaseGenericPlan(plansource);
/* Link the new generic plan into the plansource */
@@ -1214,7 +1288,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (customplan)
{
/* Build a custom plan */
- plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv,
+ &my_part_prune_results_list);
/* Accumulate total costs of custom plans */
plansource->total_custom_cost += cached_plan_cost(plan, true);
@@ -1246,6 +1321,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
plan->is_saved = true;
}
+ if (part_prune_results_list)
+ *part_prune_results_list = my_part_prune_results_list;
+
return plan;
}
@@ -1737,17 +1815,29 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_results_list.
+ *
+ * On return, *lockedRelids_per_stmt will contain a bitmapset for every
+ * PlannedStmt in stmt_list, containing the RT indexes of relation entries
+ * in its range table that were actually locked, or NULL if the PlannedStmt
+ * contains a utility statement.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_results_list,
+ List **lockedRelids_per_stmt)
{
ListCell *lc1;
+ *part_prune_results_list = *lockedRelids_per_stmt = NIL;
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ List *part_prune_results = NIL;
+ Bitmapset *allLockRelids;
+ Bitmapset *lockedRelids = NULL;
+ int rti;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1761,13 +1851,40 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
if (query)
- ScanQueryForLocks(query, acquire);
+ ScanQueryForLocks(query, true);
+ *part_prune_results_list = lappend(*part_prune_results_list, NIL);
+ *lockedRelids_per_stmt = lappend(*lockedRelids_per_stmt, NULL);
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ /*
+ * Figure out the set of relations that would need to be locked
+ * before executing the plan.
+ */
+ if (plannedstmt->containsInitialPruning)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ Bitmapset *scan_leafpart_rtis = NULL;
+
+ /*
+ * Obtain the set of leaf partitions to be locked.
+ *
+ * The following does initial partition pruning using the
+ * PartitionPruneInfos found in plannedstmt->partPruneInfos and
+ * finds leaf partitions that survive that pruning across all the
+ * nodes in the plan tree.
+ */
+ part_prune_results = ExecutorDoInitialPruning(plannedstmt,
+ boundParams,
+ &scan_leafpart_rtis);
+ allLockRelids = bms_union(plannedstmt->minLockRelids,
+ scan_leafpart_rtis);
+ }
+ else
+ allLockRelids = plannedstmt->minLockRelids;
+
+ rti = -1;
+ while ((rti = bms_next_member(allLockRelids, rti)) > 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
if (rte->rtekind != RTE_RELATION)
continue;
@@ -1778,10 +1895,59 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
* fail if it's been dropped entirely --- we'll just transiently
* acquire a non-conflicting lock.
*/
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
+ LockRelationOid(rte->relid, rte->rellockmode);
+ lockedRelids = bms_add_member(lockedRelids, rti);
+ }
+
+ *part_prune_results_list = lappend(*part_prune_results_list,
+ part_prune_results);
+ *lockedRelids_per_stmt = lappend(*lockedRelids_per_stmt, lockedRelids);
+ }
+}
+
+/*
+ * ReleaseExecutorLocks
+ * Release locks that would've been acquired by an earlier call to
+ * AcquireExecutorLocks()
+ */
+static void
+ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt)
+{
+ ListCell *lc1,
+ *lc2;
+
+ forboth(lc1, stmt_list, lc2, lockedRelids_per_stmt)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ Bitmapset *lockedRelids = lfirst(lc2);
+ int rti;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /*
+ * Ignore utility statements, except those (such as EXPLAIN) that
+ * contain a parsed-but-not-planned query. Note: it's okay to use
+ * ScanQueryForLocks, even though the query hasn't been through
+ * rule rewriting, because rewriting doesn't change the query
+ * representation.
+ */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ Assert(lockedRelids == NULL);
+ if (query)
+ ScanQueryForLocks(query, false);
+ continue;
+ }
+
+ rti = -1;
+ while ((rti = bms_next_member(lockedRelids, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ Assert(rte->rtekind == RTE_RELATION);
+
+ /* See the comment in AcquireExecutorLocks(). */
+ UnlockRelationOid(rte->relid, rte->rellockmode);
}
}
}
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index c3e95346b6..74950bd163 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -303,6 +303,25 @@ PortalDefineQuery(Portal portal,
portal->status = PORTAL_DEFINED;
}
+/*
+ * PortalStorePartitionPruneResults
+ * Copy the given List of Lists of PartitionPruneResults into the
+ * portal's context
+ *
+ * This allows the caller to ensure that the list exists as long as the portal
+ * does.
+ */
+void
+PortalStorePartitionPruneResults(Portal portal, List *part_prune_results_list)
+{
+ MemoryContext oldcxt;
+
+ AssertArg(PortalIsValid(portal));
+ oldcxt = MemoryContextSwitchTo(portal->portalContext);
+ portal->part_prune_results_list = copyObject(part_prune_results_list);
+ MemoryContextSwitchTo(oldcxt);
+}
+
/*
* PortalReleaseCachedPlan
* Release a portal's reference to its cached plan, if any.
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 9ebde089ae..269cc4d562 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,9 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt,
+ List *part_prune_results,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index bf962af7af..bd8776402e 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -45,6 +45,7 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
+ * rti_map Range table index by partition index, or 0.
* present_parts A Bitmapset of the partition indexes that we
* have subplans or subparts for.
* initial_pruning_steps List of PartitionPruneSteps used to
@@ -61,6 +62,7 @@ typedef struct PartitionedRelPruningData
int nparts;
int *subplan_map;
int *subpart_map;
+ Index *rti_map;
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
@@ -126,5 +128,10 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int part_prune_index,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis);
+extern Bitmapset *ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index e79e2c001f..7d4379da7b 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,8 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ List *part_prune_results; /* ExecutorDoInitialPruning()'s
+ * output for plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +59,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ List *part_prune_results,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index ed95ed1176..c9a5e5fb68 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -185,6 +185,9 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
+extern List *ExecutorDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ Bitmapset **scan_leafpart_rtis);
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 4a741b053f..521a60b988 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -612,6 +612,7 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
+ List *es_part_prune_results; /* QueryDesc.part_prune_results */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index a80f43e540..937cc4629d 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -212,6 +212,7 @@ extern struct Bitmapset *readBitmapset(void);
extern uintptr_t readDatum(bool typbyval);
extern bool *readBoolCols(int numCols);
extern int *readIntCols(int numCols);
+extern Index *readIndexCols(int numCols);
extern Oid *readOidCols(int numCols);
extern int16 *readAttrNumberCols(int numCols);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index fbe75dca0f..354c2e96c3 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -125,6 +125,18 @@ typedef struct PlannerGlobal
/* List of PartitionPruneInfo contained in the plan */
List *partPruneInfos;
+ /*
+ * Do any of those PartitionPruneInfos have initial pruning steps in them?
+ */
+ bool containsInitialPruning;
+
+ /*
+ * Indexes of all range table entries minus indexes of range table entries
+ * of the leaf partitions scanned by prunable subplans; see
+ * AcquireExecutorLocks()
+ */
+ Bitmapset *minLockRelids;
+
/* OIDs of relations the plan depends on */
List *relationOids;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 2e132afc5a..c0717bf45e 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -73,8 +73,17 @@ typedef struct PlannedStmt
List *partPruneInfos; /* List of PartitionPruneInfo contained in
* the plan */
+ bool containsInitialPruning; /* Do any of those PartitionPruneInfos
+ * have initial pruning steps in them?
+ */
+
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *minLockRelids; /* Indexes of all range table entries minus
+ * indexes of range table entries of the leaf
+ * partitions scanned by prunable subplans;
+ * see AcquireExecutorLocks() */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE/MERGE */
List *resultRelations; /* integer list of RT indexes, or NIL */
@@ -1410,6 +1419,13 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * needs_init_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its initial_pruning_steps set?
+ *
+ * needs_exec_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its exec_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1420,6 +1436,8 @@ typedef struct PartitionPruneInfo
NodeTag type;
List *prune_infos;
+ bool needs_init_pruning;
+ bool needs_exec_pruning;
Bitmapset *other_subplans;
} PartitionPruneInfo;
@@ -1464,6 +1482,9 @@ typedef struct PartitionedRelPruneInfo
/* relation OID by partition index, or 0 */
Oid *relid_map pg_node_attr(array_size(nparts));
+ /* Range table index by partition index, or 0. */
+ Index *rti_map pg_node_attr(array_size(nparts));
+
/*
* initial_pruning_steps shows how to prune during executor startup (i.e.,
* without use of any PARAM_EXEC Params); it is NIL if no startup pruning
@@ -1548,6 +1569,31 @@ typedef struct PartitionPruneStepCombine
List *source_stepids;
} PartitionPruneStepCombine;
+/*----------------
+ * PartitionPruneResult
+ *
+ * The result of performing ExecPartitionDoInitialPruning() on a given
+ * PartitionPruneInfo.
+ *
+ * valid_subplan_offs contains the indexes of subplans remaining after
+ * performing initial pruning by calling ExecFindMatchingSubPlans() on the
+ * PartitionPruneInfo.
+ *
+ * This is used to store the result of initial partition pruning that is
+ * performed before the execution has started. A module that needs to do so
+ * should call ExecutorDoInitialPruning() on a given PlannedStmt, which
+ * returns a List of PartitionPruneResult containing an entry for each
+ * PartitionPruneInfo present in PlannedStmt.partPruneInfos. The module
+ * should then pass that list, along with the PlannedStmt, to the executor,
+ * so that it can reuse the result of initial partition pruning when
+ * initializing the subplans for execution.
+ */
+typedef struct PartitionPruneResult
+{
+ NodeTag type;
+
+ Bitmapset *valid_subplan_offs;
+} PartitionPruneResult;
/*
* Plan invalidation info
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 0499635f59..32579d4788 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -220,7 +220,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ List **part_prune_results_list);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aeddbdafe5..1901fc5f28 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,7 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *part_prune_results_list; /* List of Lists of PartitionPruneResults */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -242,6 +243,8 @@ extern void PortalDefineQuery(Portal portal,
CommandTag commandTag,
List *stmts,
CachedPlan *cplan);
+extern void PortalStorePartitionPruneResults(Portal portal,
+ List *part_prune_results_list);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
--
2.35.3
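
For illustration, a minimal sketch of how a caller is expected to use the
revised interfaces (run_cached_plan() is a hypothetical function, modeled
on the prepare.c and spi.c changes in the attached patches; utility
statements, snapshot management, and error handling are glossed over):

/*
 * Sketch: execute each statement of a cached plan, reusing the initial
 * pruning results that GetCachedPlan() computed while acquiring the
 * plan's locks.
 */
static void
run_cached_plan(CachedPlanSource *plansource, ParamListInfo params,
                DestReceiver *dest)
{
    List       *part_prune_results_list;
    CachedPlan *cplan;
    ListCell   *lc1,
               *lc2;

    /* Any "initial" pruning happens while the plan is validated/locked. */
    cplan = GetCachedPlan(plansource, params, NULL, NULL,
                          &part_prune_results_list);
    Assert(list_length(cplan->stmt_list) ==
           list_length(part_prune_results_list));

    forboth(lc1, cplan->stmt_list, lc2, part_prune_results_list)
    {
        PlannedStmt *stmt = lfirst_node(PlannedStmt, lc1);
        List       *part_prune_results = lfirst_node(List, lc2);
        QueryDesc  *qdesc;

        /*
         * Pass the pruning results along so that the executor reuses
         * them instead of redoing initial pruning, which could select
         * subplans whose partitions were never locked.
         */
        qdesc = CreateQueryDesc(stmt, part_prune_results,
                                plansource->query_string,
                                GetActiveSnapshot(), InvalidSnapshot,
                                dest, params, NULL, 0);
        ExecutorStart(qdesc, 0);
        ExecutorRun(qdesc, ForwardScanDirection, 0, true);
        ExecutorFinish(qdesc);
        ExecutorEnd(qdesc);
        FreeQueryDesc(qdesc);
    }

    ReleaseCachedPlan(cplan, NULL);
}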
On Thu, Oct 27, 2022 at 11:41 AM Amit Langote <amitlangote09@gmail.com> wrote:
On Mon, Oct 17, 2022 at 6:29 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Wed, Oct 12, 2022 at 4:36 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Fri, Jul 29, 2022 at 1:20 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Thu, Jul 28, 2022 at 1:27 AM Robert Haas <robertmhaas@gmail.com> wrote:
0001 adds es_part_prune_result but does not use it, so maybe the
introduction of that field should be deferred until it's needed for
something.

Oops, looks like a mistake when breaking the patch. Will move that bit to 0002.
Fixed that and also noticed that I had defined PartitionPruneResult in
the wrong header (execnodes.h). That led to PartitionPruneResult
nodes not being able to be written and read, because
src/backend/nodes/gen_node_support.pl doesn't create _out* and _read*
routines for the nodes defined in execnodes.h. I moved its definition
to plannodes.h, even though it is not actually the planner that
instantiates those; no other include/nodes header sounds better.

One more thing I realized is that Bitmapsets added to the List
PartitionPruneResult.valid_subplan_offs_list are not actually
read/write-able. That's a problem that I also faced in [1], so I
proposed a patch there to make Bitmapset a read/write-able Node and
mark (only) the Bitmapsets that are added into read/write-able node
trees with the corresponding NodeTag. I'm including that patch here
as well (0002) for the main patch to work (pass
-DWRITE_READ_PARSE_PLAN_TREES build tests), though it might make sense
to discuss it in its own thread?

Had second thoughts on the use of a List of Bitmapsets for this, such
that the make-Bitmapset-Nodes patch is no longer needed.

I had defined PartitionPruneResult such that it stood for the results
of pruning for all PartitionPruneInfos contained in
PlannedStmt.partPruneInfos (covering all Append/MergeAppend nodes that
can use partition pruning in a given plan). So, it had a List of
Bitmapsets. I think it's perhaps better for PartitionPruneResult to
cover only one PartitionPruneInfo and thus need only a Bitmapset and
not a List thereof, which I have implemented in the attached updated
patch 0002. So, instead of needing to pass around a
PartitionPruneResult with each PlannedStmt, this now passes a List of
PartitionPruneResult with an entry for each PartitionPruneInfo in
PlannedStmt.partPruneInfos.

Rebased over 3b2db22fe.
Updated 0002 to cope with AssertArg() being removed from the tree.
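
To illustrate the plannodes.h point above: gen_node_support.pl emits
_out*/_read* routines for nodes defined there, and Bitmapset-valued
fields are covered by the regular node-support code, which is what lets
the execParallel.c hunk below ship the pruning results to parallel
workers with nodeToString()/stringToNode(). A hypothetical round-trip
check:

    PartitionPruneResult *r = makeNode(PartitionPruneResult);
    PartitionPruneResult *r2;

    r->valid_subplan_offs = bms_make_singleton(3);

    /* Serialize and reconstruct, as the parallel-query code does. */
    r2 = (PartitionPruneResult *) stringToNode(nodeToString(r));
    Assert(bms_equal(r->valid_subplan_offs, r2->valid_subplan_offs));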
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
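
For a concrete sense of the lock-set computation in the patched
AcquireExecutorLocks(), a worked example with hypothetical numbers:

    /*
     * Generic plan over partitioned table p (RT index 1) whose 1024 leaf
     * partitions sit at RT indexes 2..1025, all scanned by prunable
     * subplans.  The planner records:
     *
     *     plannedstmt->minLockRelids = {1}
     *     plannedstmt->containsInitialPruning = true
     *
     * If the "initial" pruning steps select only the subplan scanning
     * the leaf at RT index 38:
     *
     *     bms_union(minLockRelids, scan_leafpart_rtis)
     *         = bms_union({1}, {38}) = {1, 38}
     *
     * so only 2 relations are locked, instead of the 1025 that locking
     * the entire range table would take.
     */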
Attachments:
v24-0002-Optimize-AcquireExecutorLocks-by-locking-only-un.patch
From 8f6456d27efb8719a7dd8a52bf0ad3c5033b31a3 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v24 2/2] Optimize AcquireExecutorLocks() by locking only
unpruned partitions
This commit teaches AcquireExecutorLocks() to perform initial
partition pruning to notionally eliminate the subnodes contained in a
generic cached plan that need not be initialized during the actual
execution of the plan, and to skip locking the partitions scanned by
those subnodes.

The result of performing initial partition pruning this way, before the
actual execution has started, is made available to the actual execution
via PartitionPruneResult, passed along with the PlannedStmt by the
callers of the executor that used plancache.c to get the plan. It is NULL
in the cases in which the plan is obtained by calling the planner
directly or if the plan obtained from plancache.c is not a generic one.
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 2 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/prepare.c | 26 ++-
src/backend/executor/README | 32 ++++
src/backend/executor/execMain.c | 51 ++++++
src/backend/executor/execParallel.c | 26 ++-
src/backend/executor/execPartition.c | 241 +++++++++++++++++++++----
src/backend/executor/execUtils.c | 1 +
src/backend/executor/functions.c | 2 +-
src/backend/executor/nodeAppend.c | 11 +-
src/backend/executor/nodeMergeAppend.c | 5 +-
src/backend/executor/spi.c | 27 ++-
src/backend/nodes/readfuncs.c | 8 +-
src/backend/optimizer/plan/planner.c | 2 +
src/backend/optimizer/plan/setrefs.c | 46 +++++
src/backend/partitioning/partprune.c | 41 ++++-
src/backend/tcop/postgres.c | 8 +-
src/backend/tcop/pquery.c | 28 ++-
src/backend/utils/cache/plancache.c | 208 ++++++++++++++++++---
src/backend/utils/mmgr/portalmem.c | 19 ++
src/include/commands/explain.h | 4 +-
src/include/executor/execPartition.h | 9 +-
src/include/executor/execdesc.h | 3 +
src/include/executor/executor.h | 3 +
src/include/nodes/execnodes.h | 1 +
src/include/nodes/nodes.h | 1 +
src/include/nodes/pathnodes.h | 12 ++
src/include/nodes/plannodes.h | 46 +++++
src/include/utils/plancache.h | 3 +-
src/include/utils/portal.h | 3 +
33 files changed, 782 insertions(+), 100 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index f26cc0d162..401a2280a3 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -558,7 +558,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NIL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 152c29b551..942449544c 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NIL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index f86983c660..2f2b558608 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -407,7 +407,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NIL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
@@ -515,7 +515,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, List *part_prune_results,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
@@ -563,7 +564,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, part_prune_results, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 1a62e5dac5..cc36b6fd15 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -776,7 +776,7 @@ execute_sql_string(const char *sql)
{
QueryDesc *qdesc;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, NIL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 9ac0383459..65c8d0aa59 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -408,7 +408,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NIL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 9e29584d93..29b45539d3 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -155,6 +155,7 @@ ExecuteQuery(ParseState *pstate,
PreparedStatement *entry;
CachedPlan *cplan;
List *plan_list;
+ List *part_prune_results_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
Portal portal;
@@ -193,7 +194,10 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
plan_list = cplan->stmt_list;
/*
@@ -207,6 +211,9 @@ ExecuteQuery(ParseState *pstate,
plan_list,
cplan);
+ /* Copy Lists of PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_results_list);
+
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
* statement is one that produces tuples. Currently we insist that it be
@@ -576,7 +583,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
- ListCell *p;
+ List *part_prune_results_list;
+ ListCell *p,
+ *pp;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -619,7 +628,10 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Replan if needed, and acquire a transient refcount */
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, queryEnv);
+ CurrentResourceOwner, queryEnv,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -634,13 +646,15 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
plan_list = cplan->stmt_list;
/* Explain each query */
- foreach(p, plan_list)
+ forboth(p, plan_list, pp, part_prune_results_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ List *part_prune_results = lfirst_node(List, pp);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ ExplainOnePlan(pstmt, part_prune_results, into, es, query_string,
+ paramLI, queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 0b5183fc4a..f14f9197b5 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -65,6 +65,34 @@ found there. This currently only occurs for Append and MergeAppend nodes. In
this case the non-required subplans are ignored and the executor state's
subnode array will become out of sequence to the plan's subplan list.
+Actually, the so-called execution time pruning may also occur even before the
+execution has started. One case where that occurs is when a cached generic
+plan is being validated for execution by plancache.c: GetCachedPlan(), which
+works by locking all the relations that will be scanned by that plan. If the
+generic plan contains nodes that can perform execution time partition pruning
+(that is, contain a PartitionPruneInfo), the pruning steps contained in a
+given node's PartitionPruneInfo that do not depend on the execution actually
+having started (called "initial" pruning steps) are performed to figure out
+the minimal set of child subplans that satisfy those pruning steps.
+AcquireExecutorLocks() looking at a given generic plan will then lock only the
+relations scanned by the child subplans that survived such pruning, along with
+those present in PlannedStmt.minLockRelids. Note that the subplans are only
+notionally pruned, that is, they are not removed from the plan tree as such.
+
+To prevent the executor and any third party execution code that can look at
+the plan tree from trying to execute the subplans that were pruned as
+described above, the result of pruning is passed to the executor as a List
+of PartitionPruneResult nodes via the QueryDesc. Each PartitionPruneResult
+consists of the set of indexes of the surviving subplans within the list of
+child subplans of the parent plan node to which the corresponding
+PartitionPruneInfo belongs, saved as a bitmapset (valid_subplan_offs). In
+other words, the executor executing a generic plan should not re-evaluate the
+set of
+initially valid subplans for a given plan node by redoing the initial pruning
+if it was already done by AcquireExecutorLocks() when validating the plan.
+Such re-evaluation of the pruning steps may very well end up resulting in a
+different set of subplans, containing some whose relations were not locked by
+AcquireExecutorLocks().
+
Each Plan node may have expression trees associated with it, to represent
its target list, qualification conditions, etc. These trees are also
read-only to the executor, but the executor state for expression evaluation
@@ -286,6 +314,10 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ [ ExecutorDoInitialPruning ] --- an optional step to perform initial
+ partition pruning on the plan tree, the result of which is passed
+ to the executor via QueryDesc
+
CreateQueryDesc
ExecutorStart
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 32475e33ff..b59474841f 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -49,6 +49,7 @@
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/execdebug.h"
+#include "executor/execPartition.h"
#include "executor/nodeSubplan.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
@@ -104,6 +105,54 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
/* end of local decls */
+/* ----------------------------------------------------------------
+ * ExecutorDoInitialPruning
+ *
+ * For each plan tree node that has been assigned a PartitionPruneInfo,
+ * this performs initial partition pruning using the information contained
+ * therein to determine the set of child subplans that satisfy the initial
+ * pruning steps, to be returned as a bitmapset of their indexes in the
+ * node's list of child subplans (for example, an Append's appendplans).
+ *
+ * Return value is a List of PartitionPruneResult nodes, one for every
+ * PartitionPruneInfo, each containing one such bitmapset. A bitmapset of
+ * the RT indexes of all the leaf partitions scanned by the chosen subplans
+ * is returned via *scan_leafpart_rtis; note that it is shared across all
+ * PartitionPruneInfos.
+ *
+ * The executor must see exactly the same set of subplans as valid for
+ * execution when doing ExecInitNode() on the plan nodes whose
+ * PartitionPruneInfos are processed here. So, it must get the set from the
+ * aforementioned PartitionPruneResult, instead of computing it all over
+ * again by redoing the initial pruning. It's the caller's job to pass the
+ * PartitionPruneResult to the executor.
+ *
+ * Note: Partitioned tables mentioned in PartitionedRelPruneInfo nodes that
+ * drive the pruning will be locked before doing the pruning.
+ * ----------------------------------------------------------------
+ */
+List *
+ExecutorDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *part_prune_results = NIL;
+ ListCell *lc;
+
+ /* Only get here if there is any pruning to do. */
+ Assert(plannedstmt->containsInitialPruning);
+
+ foreach(lc, plannedstmt->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ PartitionPruneResult *pruneresult = makeNode(PartitionPruneResult);
+
+ pruneresult->valid_subplan_offs =
+ ExecPartitionDoInitialPruning(plannedstmt, params, pruneinfo,
+ scan_leafpart_rtis);
+ part_prune_results = lappend(part_prune_results, pruneresult);
+ }
+
+ return part_prune_results;
+}
/* ----------------------------------------------------------------
* ExecutorStart
@@ -806,6 +855,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ List *part_prune_results = queryDesc->part_prune_results;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -826,6 +876,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_plannedstmt = plannedstmt;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ estate->es_part_prune_results = part_prune_results;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index aca0c6f323..917079a034 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -66,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITION_PRUNE_RESULTS UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -182,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->containsInitialPruning = false;
pstmt->planTree = plan;
pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
@@ -597,12 +599,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_results_data;
+ char *part_prune_results_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_results_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -631,6 +636,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_results_data = nodeToString(estate->es_part_prune_results);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -657,6 +663,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized List of PartitionPruneResult. */
+ part_prune_results_len = strlen(part_prune_results_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_results_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -751,6 +762,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized List of PartitionPruneResult */
+ part_prune_results_space = shm_toc_allocate(pcxt->toc, part_prune_results_len);
+ memcpy(part_prune_results_space, part_prune_results_data, part_prune_results_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS,
+ part_prune_results_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1232,8 +1249,10 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_results_space;
char *paramspace;
PlannedStmt *pstmt;
+ List *part_prune_results;
ParamListInfo paramLI;
char *queryString;
@@ -1244,12 +1263,17 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
pstmtspace = shm_toc_lookup(toc, PARALLEL_KEY_PLANNEDSTMT, false);
pstmt = (PlannedStmt *) stringToNode(pstmtspace);
+ /* Reconstruct leader-supplied PartitionPruneResult. */
+ part_prune_results_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, false);
+ part_prune_results = (List *) stringToNode(part_prune_results_space);
+
/* Reconstruct ParamListInfo. */
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
/* Create a QueryDesc for the query. */
- return CreateQueryDesc(pstmt,
+ return CreateQueryDesc(pstmt, part_prune_results,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 80197d5141..8728745c44 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -25,6 +25,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -185,7 +186,11 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *pruneinfo);
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -198,7 +203,8 @@ static void PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans);
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis);
/*
@@ -1746,8 +1752,10 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* considered to be a stable expression, it can change value from one plan
* node scan to the next during query execution. Stable comparison
* expressions that don't involve such Params allow partition pruning to be
- * done once during executor startup. Expressions that do involve such Params
- * require us to prune separately for each scan of the parent plan node.
+ * done once during executor startup or during ExecutorDoInitialPruning() that
+ * runs as part of performing AcquireExecutorLocks() on a given plan tree.
+ * Expressions that do involve such Params require us to prune separately for
+ * each scan of the parent plan node.
*
* Note that pruning away unneeded subplans during executor startup has the
* added benefit of not having to initialize the unneeded subplans at all.
@@ -1764,6 +1772,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* account for initial pruning possibly having eliminated some of the
* subplans.
*
+ * ExecPartitionDoInitialPruning:
+ * Do initial pruning with the information contained in a given
+ * PartitionPruneInfo to determine the minimal set of child subplans
+ * of the parent plan node (the one to which the PartitionPruneInfo
+ * belongs) that must be executed, and also the set of RT indexes of
+ * the leaf partitions that will be scanned by those subplans.
+ *
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating the expressions
* that are safe to evaluate at a given point. This function is first
@@ -1781,8 +1796,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* On return, *initially_valid_subplans is assigned the set of indexes of
* child subplans that must be initialized along with the parent plan node.
- * Initial pruning is performed here if needed and in that case only the
- * surviving subplans' indexes are added.
+ * Initial pruning is performed here if needed (unless it has already been done
+ * by ExecutorDoInitialPruning()), and in that case only the surviving
+ * subplans' indexes are added.
*
* If subplans are indeed pruned, subplan_map arrays contained in the returned
* PartitionPruneState are re-sequenced to not count those, though only if the
@@ -1794,28 +1810,65 @@ ExecInitPartitionPruning(PlanState *planstate,
int part_prune_index,
Bitmapset **initially_valid_subplans)
{
- PartitionPruneState *prunestate;
+ PartitionPruneState *prunestate = NULL;
EState *estate = planstate->state;
PartitionPruneInfo *pruneinfo = list_nth(estate->es_part_prune_infos,
part_prune_index);
+ PartitionPruneResult *pruneresult = NULL;
+ bool do_pruning = (pruneinfo->needs_init_pruning ||
+ pruneinfo->needs_exec_pruning);
+
+ /*
+ * No need to do initial pruning if it was done already by
+ * ExecutorDoInitialPruning(), which it would be if es_part_prune_results
+ * is set.
+ */
+ if (estate->es_part_prune_results)
+ {
+ pruneresult = list_nth(estate->es_part_prune_results, part_prune_index);
+ Assert(IsA(pruneresult, PartitionPruneResult));
+ do_pruning = pruneinfo->needs_exec_pruning;
+ }
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, planstate);
+ if (do_pruning)
+ {
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
- /* Create the working data structure for pruning */
- prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ /* For data reading, executor always omits detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
+ /*
+ * Create the working data structure for pruning. No need to consider
+ * initial pruning steps if we have a PartitionPruneResult.
+ */
+ prunestate = CreatePartitionPruneState(planstate, pruneinfo,
+ pruneresult == NULL, true,
+ NIL, planstate->ps_ExprContext,
+ estate->es_partition_directory);
+ }
/*
* Perform an initial partition prune pass, if required.
*/
- if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ if (pruneresult)
+ {
+ *initially_valid_subplans = bms_copy(pruneresult->valid_subplan_offs);
+ }
+ else if (prunestate && prunestate->do_initial_prune)
+ {
+ *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true,
+ NULL);
+ }
else
{
- /* No pruning, so we'll need to initialize all subplans */
+ /* No initial pruning, so we'll need to initialize all subplans */
Assert(n_total_subplans > 0);
*initially_valid_subplans = bms_add_range(NULL, 0,
n_total_subplans - 1);
+ return prunestate;
}
/*
@@ -1823,7 +1876,8 @@ ExecInitPartitionPruning(PlanState *planstate,
* that were removed above due to initial pruning. No need to do this if
* no steps were removed.
*/
- if (bms_num_members(*initially_valid_subplans) < n_total_subplans)
+ if (prunestate &&
+ bms_num_members(*initially_valid_subplans) < n_total_subplans)
{
/*
* We can safely skip this when !do_exec_prune, even though that
@@ -1839,11 +1893,74 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecPartitionDoInitialPruning
+ * Perform initial pruning using given PartitionPruneInfo to determine
+ * the minimal set of child subplans that will be executed and also the
+ * set of RT indexes of the leaf partitions scanned by those subplans.
+ */
+Bitmapset *
+ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *rtable = plannedstmt->rtable;
+ ExprContext *econtext;
+ PartitionDirectory pdir;
+ MemoryContext oldcontext,
+ tmpcontext;
+ PartitionPruneState *prunestate;
+ Bitmapset *valid_subplan_offs;
+
+ /*
+ * A temporary context for memory allocations required while executing
+ * partition pruning steps.
+ */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "initial pruning working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
+
+ /*
+ * PartitionDirectory to look up partition descriptors.
+ * Note that we don't omit detached partitions, just like during
+ * execution proper.
+ */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+
+ /*
+ * We don't yet have a PlanState for the parent plan node, so we must
+ * create a standalone ExprContext to evaluate pruning expressions,
+ * equipped with the information about the EXTERN parameters that the
+ * caller passed us. Note that that's okay because the initial pruning
+ * steps do not contain anything that requires the execution to have
+ * started, and thus nothing that needs the information contained in a
+ * PlanState.
+ */
+ econtext = CreateStandaloneExprContext();
+ econtext->ecxt_param_list_info = params;
+ prunestate = CreatePartitionPruneState(NULL, pruneinfo, true, false,
+ rtable, econtext, pdir);
+ MemoryContextSwitchTo(oldcontext);
+
+ /* Do the initial pruning. */
+ valid_subplan_offs = ExecFindMatchingSubPlans(prunestate, true,
+ scan_leafpart_rtis);
+
+ FreeExprContext(econtext, true);
+ DestroyPartitionDirectory(pdir);
+ MemoryContextDelete(tmpcontext);
+
+ return valid_subplan_offs;
+}
+
/*
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
*
- * 'planstate' is the parent plan node's execution state.
+ * 'planstate', if not NULL, is the parent plan node's execution state. It
+ * can be NULL if being called before ExecutorStart(), in which case,
+ * 'rtable' (range table), 'econtext', and 'partdir' must be explicitly
+ * provided.
*
* 'pruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
@@ -1857,19 +1974,21 @@ ExecInitPartitionPruning(PlanState *planstate,
* PartitionedRelPruneInfo.
*/
static PartitionPruneState *
-CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
+CreatePartitionPruneState(PlanState *planstate,
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir)
{
- EState *estate = planstate->state;
+ EState *estate = planstate ? planstate->state : NULL;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
- /* For data reading, executor always omits detached partitions */
- if (estate->es_partition_directory == NULL)
- estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
+ Assert((estate != NULL) ||
+ (partdir != NULL && econtext != NULL && rtable != NIL));
n_part_hierarchies = list_length(pruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
@@ -1924,15 +2043,42 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * Must open the relation by ourselves when called before the
+ * execution has started, such as when called during
+ * ExecutorDoInitialPruning() on a cached plan. In that case,
+ * sub-partitions must be locked, because AcquirePlannerLocks()
+ * would not have seen them. (The 1st relation in a
+ * partrelpruneinfos list is always the root partitioned table
+ * appearing in the query, which AcquirePlannerLocks() would have
+ * locked; the Assert in relation_open() guards that assumption.)
+ */
+ if (estate == NULL)
+ {
+ RangeTblEntry *rte = rt_fetch(pinfo->rtindex, rtable);
+ int lockmode = (j == 0) ? NoLock : rte->rellockmode;
+
+ partrel = table_open(rte->relid, lockmode);
+ }
+ else
+ partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+
+ /*
+ * We can rely on the copy of the partitioned table's partition
+ * key in its relcache entry, because it can't change (or get
+ * destroyed) as long as the relation is locked. The partition
+ * descriptor is taken from the PartitionDirectory associated with
+ * the table, which is held open long enough for the descriptor to
+ * remain valid while it's used to perform the pruning steps.
*/
- partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
partkey = RelationGetPartitionKey(partrel);
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partdesc = PartitionDirectoryLookup(partdir, partrel);
+
+ /*
+ * Must close partrel, keeping the lock taken, if we're not using
+ * EState's entry.
+ */
+ if (estate == NULL)
+ table_close(partrel, NoLock);
/*
* Initialize the subplan_map and subpart_map.
@@ -1946,6 +2092,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
Assert(partdesc->nparts >= pinfo->nparts);
pprune->nparts = partdesc->nparts;
pprune->subplan_map = palloc(sizeof(int) * partdesc->nparts);
+ pprune->rti_map = palloc(sizeof(Index) * partdesc->nparts);
if (partdesc->nparts == pinfo->nparts)
{
/*
@@ -1956,6 +2103,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pprune->subpart_map = pinfo->subpart_map;
memcpy(pprune->subplan_map, pinfo->subplan_map,
sizeof(int) * pinfo->nparts);
+ memcpy(pprune->rti_map, pinfo->rti_map,
+ sizeof(int) * pinfo->nparts);
/*
* Double-check that the list of unpruned relations has not
@@ -2006,6 +2155,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pinfo->subplan_map[pd_idx];
pprune->subpart_map[pp_idx] =
pinfo->subpart_map[pd_idx];
+ pprune->rti_map[pp_idx] =
+ pinfo->rti_map[pd_idx];
pd_idx++;
}
else
@@ -2013,6 +2164,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
/* this partdesc entry is not in the plan */
pprune->subplan_map[pp_idx] = -1;
pprune->subpart_map[pp_idx] = -1;
+ pprune->rti_map[pp_idx] = 0;
}
}
@@ -2034,7 +2186,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* Initialize pruning contexts as needed.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
- if (pinfo->initial_pruning_steps)
+ if (consider_initial_steps && pinfo->initial_pruning_steps)
{
InitPartitionPruneContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
@@ -2044,7 +2196,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps)
+ if (consider_exec_steps && pinfo->exec_pruning_steps)
{
InitPartitionPruneContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
@@ -2272,10 +2424,14 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
* runtime pruning.
+ *
+ * RT indexes of leaf partitions scanned by the chosen subplans are added to
+ * *scan_leafpart_rtis if the pointer is non-NULL.
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2310,7 +2466,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
*/
pprune = &prunedata->partrelprunedata[0];
find_matching_subplans_recurse(prunedata, pprune, initial_prune,
- &result);
+ &result, scan_leafpart_rtis);
/* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
@@ -2324,6 +2480,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_copy(*scan_leafpart_rtis);
MemoryContextReset(prunestate->prune_context);
@@ -2334,13 +2492,15 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* find_matching_subplans_recurse
* Recursive worker function for ExecFindMatchingSubPlans
*
- * Adds valid (non-prunable) subplan IDs to *validsubplans
+ * Adds valid (non-prunable) subplan IDs to *validsubplans and RT indexes of
+ * the corresponding leaf partitions to *scan_leafpart_rtis (if asked for).
*/
static void
find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans)
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *partset;
int i;
@@ -2367,8 +2527,14 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
while ((i = bms_next_member(partset, i)) >= 0)
{
if (pprune->subplan_map[i] >= 0)
+ {
*validsubplans = bms_add_member(*validsubplans,
pprune->subplan_map[i]);
+ Assert(pprune->rti_map[i] > 0);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_add_member(*scan_leafpart_rtis,
+ pprune->rti_map[i]);
+ }
else
{
int partidx = pprune->subpart_map[i];
@@ -2376,7 +2542,8 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
if (partidx >= 0)
find_matching_subplans_recurse(prunedata,
&prunedata->partrelprunedata[partidx],
- initial_prune, validsubplans);
+ initial_prune, validsubplans,
+ scan_leafpart_rtis);
else
{
/*
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 21f4c10937..67a58c7163 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -134,6 +134,7 @@ CreateExecutorState(void)
estate->es_param_exec_vals = NULL;
estate->es_queryEnv = NULL;
+ estate->es_part_prune_results = NIL;
estate->es_query_cxt = qcontext;
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index e134a82ff7..18d3b98cdc 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -842,7 +842,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
- es->qd = CreateQueryDesc(es->stmt,
+ es->qd = CreateQueryDesc(es->stmt, NIL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index c6f86a6510..96880e122a 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -155,7 +155,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (appendstate->as_prune_state == NULL ||
+ (!appendstate->as_prune_state->do_exec_prune && nplans > 0))
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -577,7 +578,7 @@ choose_next_subplan_locally(AppendState *node)
}
else if (node->as_valid_subplans == NULL)
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
whichplan = -1;
}
@@ -642,7 +643,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
/*
* Mark each invalid plan as finished to allow the loop below to
@@ -717,7 +718,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
mark_invalid_subplans_as_finished(node);
}
@@ -868,7 +869,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
classify_matching_subplans(node);
}
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 8d35860c30..2312e5a633 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -103,7 +103,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplan, we can fill ms_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (mergestate->ms_prune_state == NULL ||
+ (!mergestate->ms_prune_state->do_exec_prune && nplans > 0))
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -218,7 +219,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index fd5796f1b9..93012a5b3b 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1578,6 +1578,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
CachedPlanSource *plansource;
CachedPlan *cplan;
List *stmt_list;
+ List *part_prune_results_list;
char *query_string;
Snapshot snapshot;
MemoryContext oldcontext;
@@ -1657,7 +1658,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1685,6 +1689,9 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
stmt_list,
cplan);
+ /* Copy Lists of PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_results_list);
+
/*
* Set up options for portal. Default SCROLL type is chosen the same way
* as PerformCursorOpen does it.
@@ -2092,7 +2099,8 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
/* Get the generic plan for the query */
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ NULL /* Not interested in PartitionPruneResults */);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2473,7 +2481,9 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
- ListCell *lc2;
+ List *part_prune_results_list;
+ ListCell *lc2,
+ *lc3;
spicallbackarg.query = plansource->query_string;
@@ -2549,8 +2559,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* plan, the refcount must be backed by the plan_owner.
*/
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
-
+ plan_owner, _SPI_current->queryEnv,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
stmt_list = cplan->stmt_list;
/*
@@ -2589,9 +2601,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, part_prune_results_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ List *part_prune_results = lfirst_node(List, lc3);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2663,7 +2676,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
else
snap = InvalidSnapshot;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, part_prune_results,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index b4ff855f7c..77990a2732 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -158,6 +158,11 @@
token = pg_strtok(&length); /* skip :fldname */ \
local_node->fldname = readIntCols(len)
+/* Read an Index array */
+#define READ_INDEX_ARRAY(fldname, len) \
+ token = pg_strtok(&length); /* skip :fldname */ \
+ local_node->fldname = readIndexCols(len)
+
/* Read a bool array */
#define READ_BOOL_ARRAY(fldname, len) \
token = pg_strtok(&length); /* skip :fldname */ \
@@ -795,7 +800,6 @@ fnname(int numCols) \
*/
READ_SCALAR_ARRAY(readAttrNumberCols, int16, atoi)
READ_SCALAR_ARRAY(readOidCols, Oid, atooid)
-/* outfuncs.c has writeIndexCols, but we don't yet need that here */
-/* READ_SCALAR_ARRAY(readIndexCols, Index, atoui) */
+READ_SCALAR_ARRAY(readIndexCols, Index, atoui)
READ_SCALAR_ARRAY(readIntCols, int, atoi)
READ_SCALAR_ARRAY(readBoolCols, bool, strtobool)
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 799602f5ea..a96d316dca 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -520,7 +520,9 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
result->partPruneInfos = glob->partPruneInfos;
+ result->containsInitialPruning = glob->containsInitialPruning;
result->rtable = glob->finalrtable;
+ result->minLockRelids = glob->minLockRelids;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 720f20f563..61d6934978 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -270,6 +270,16 @@ set_plan_references(PlannerInfo *root, Plan *plan)
*/
add_rtes_to_flat_rtable(root, false);
+ /*
+ * Add the query's adjusted range of RT indexes to glob->minLockRelids.
+ * The adjusted RT indexes of prunable relations will be deleted from the
+ * set below where PartitionPruneInfos are processed.
+ */
+ glob->minLockRelids =
+ bms_add_range(glob->minLockRelids,
+ rtoffset + 1,
+ rtoffset + list_length(root->parse->rtable));
+
/*
* Adjust RT indexes of PlanRowMarks and add to final rowmarks list
*/
@@ -352,6 +362,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach (lc, root->partPruneInfos)
{
PartitionPruneInfo *pruneinfo = lfirst(lc);
+ Bitmapset *leafpart_rtis = NULL;
ListCell *l;
foreach(l, pruneinfo->prune_infos)
@@ -362,15 +373,50 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach(l2, prune_infos)
{
PartitionedRelPruneInfo *pinfo = lfirst(l2);
+ int i;
/* RT index of the table to which the pinfo belongs. */
pinfo->rtindex += rtoffset;
+
+ /* Also of the leaf partitions that might be scanned. */
+ for (i = 0; i < pinfo->nparts; i++)
+ {
+ if (pinfo->rti_map[i] > 0 && pinfo->subplan_map[i] >= 0)
+ {
+ pinfo->rti_map[i] += rtoffset;
+ leafpart_rtis = bms_add_member(leafpart_rtis,
+ pinfo->rti_map[i]);
+ }
+ }
}
}
+ if (pruneinfo->needs_init_pruning)
+ {
+ glob->containsInitialPruning = true;
+
+ /*
+ * Delete the leaf partition RTIs from the global set of relations
+ * to be locked before executing the plan. AcquireExecutorLocks()
+ * will find the ones to add to the set after performing initial
+ * pruning.
+ */
+ glob->minLockRelids = bms_del_members(glob->minLockRelids,
+ leafpart_rtis);
+ }
+
glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
}
+ /*
+ * If we deleted bits from glob->minLockRelids above, it seems worth
+ * doing a bms_copy() on it to get rid of any empty tail bits, so that
+ * the loop over this set in AcquireExecutorLocks() doesn't have to go
+ * through those useless bit words.
+ */
+ if (glob->containsInitialPruning)
+ glob->minLockRelids = bms_copy(glob->minLockRelids);
+
return result;
}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 6565b6ed01..37f3e6af61 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,9 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -234,6 +236,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool needs_init_pruning = false;
+ bool needs_exec_pruning = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -313,12 +317,16 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_needs_init_pruning;
+ bool partrel_needs_exec_pruning;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_needs_init_pruning,
+ &partrel_needs_exec_pruning);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -327,6 +335,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+
+ needs_init_pruning |= partrel_needs_init_pruning;
+ needs_exec_pruning |= partrel_needs_exec_pruning;
}
pfree(relid_subplan_map);
@@ -341,6 +352,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->needs_init_pruning = needs_init_pruning;
+ pruneinfo->needs_exec_pruning = needs_exec_pruning;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -441,13 +454,18 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *needs_init_pruning and *needs_exec_pruning are set to indicate whether
+ * the returned PartitionedRelPruneInfos contain pruning steps that can be
+ * performed before execution begins and during execution, respectively.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -458,6 +476,10 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will find out below. */
+ *needs_init_pruning = false;
+ *needs_exec_pruning = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -545,6 +567,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+ * In the first pass, we also detect whether the 2nd pass will be
+ * necessary by checking for the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -619,6 +644,12 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ /* record which types of pruning steps we've seen so far */
+ if (initial_pruning_steps != NIL)
+ *needs_init_pruning = true;
+ if (exec_pruning_steps != NIL)
+ *needs_exec_pruning = true;
+
pinfolist = lappend(pinfolist, pinfo);
}
@@ -646,6 +677,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *subplan_map;
int *subpart_map;
Oid *relid_map;
+ Index *rti_map;
/*
* Construct the subplan and subpart maps for this partitioning level.
@@ -658,6 +690,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
+ rti_map = (Index *) palloc0(nparts * sizeof(Index));
present_parts = NULL;
i = -1;
@@ -672,6 +705,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
+ rti_map[i] = partrel->relid;
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
@@ -696,6 +730,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
pinfo->relid_map = relid_map;
+ pinfo->rti_map = rti_map;
}
pfree(relid_subpart_map);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 3082093d1e..95ab1d0eef 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1598,6 +1598,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ List *part_prune_results_list;
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -1972,7 +1973,9 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
/*
* Now we can define the portal.
@@ -1987,6 +1990,9 @@ exec_bind_message(StringInfo input_message)
cplan->stmt_list,
cplan);
+ /* Copy Lists of PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_results_list);
+
/* Done with the snapshot used for parameter I/O and parsing/planning */
if (snapshot_set)
PopActiveSnapshot();
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 52e2db6452..280ed7d239 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -35,7 +35,7 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
+static void ProcessQuery(PlannedStmt *plan, List *part_prune_results,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +65,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ List *part_prune_results,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +78,8 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->part_prune_results = part_prune_results; /* ExecutorDoInitialPruning()
+ * output for plan */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +125,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * part_prune_results: ExecutorDoInitialPruning() output for the PlannedStmt
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +138,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ List *part_prune_results,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +150,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, part_prune_results, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -491,8 +496,13 @@ PortalStart(Portal portal, ParamListInfo params,
/*
* Create QueryDesc in portal's context; for the moment, set
* the destination to DestNone.
+ *
+ * There is no PartitionPruneResult unless the PlannedStmt is
+ * from a CachedPlan.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->part_prune_results_list == NIL ? NIL :
+ linitial(portal->part_prune_results_list),
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1225,6 +1235,8 @@ PortalRunMulti(Portal portal,
if (pstmt->utilityStmt == NULL)
{
+ List *part_prune_results = NIL;
+
/*
* process a plannable query.
*/
@@ -1271,10 +1283,18 @@ PortalRunMulti(Portal portal,
else
UpdateActiveSnapshotCommandId();
+ /*
+ * Determine if there's a corresponding List of PartitionPruneResult
+ * for this PlannedStmt.
+ */
+ if (portal->part_prune_results_list != NIL)
+ part_prune_results = list_nth(portal->part_prune_results_list,
+ foreach_current_index(stmtlist_item));
+
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_results,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1283,7 +1303,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_results,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index cc943205d3..af6fae6e3b 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -99,14 +99,19 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_results_list);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv);
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_results_list);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_results_list,
+ List **lockedRelids_per_stmt);
+static void ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -782,6 +787,26 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
return tlist;
}
+/*
+ * FreePartitionPruneResults
+ * Frees the List of Lists of PartitionPruneResults for CheckCachedPlan()
+ */
+static void
+FreePartitionPruneResults(List *part_prune_results_list)
+{
+ ListCell *lc;
+
+ foreach(lc, part_prune_results_list)
+ {
+ List *part_prune_results = lfirst(lc);
+
+ /* Free both the PartitionPruneResults and the containing List. */
+ list_free_deep(part_prune_results);
+ }
+
+ list_free(part_prune_results_list);
+}
+
/*
* CheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
*
@@ -790,15 +815,20 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* On a "true" return, we have acquired the locks needed to run the plan.
* (We must do this for the "true" result to be race-condition-free.)
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_results_list.
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_results_list)
{
CachedPlan *plan = plansource->gplan;
/* Assert that caller checked the querytree */
Assert(plansource->is_valid);
+ *part_prune_results_list = NIL;
+
/* If there's no generic plan, just say "false" */
if (!plan)
return false;
@@ -820,13 +850,21 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ List *lockedRelids_per_stmt;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ /*
+ * Lock the relations scanned by the plan. This is also where initial
+ * pruning is performed, if needed.
+ */
+ AcquireExecutorLocks(plan->stmt_list, boundParams,
+ part_prune_results_list,
+ &lockedRelids_per_stmt);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -848,7 +886,11 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ ReleaseExecutorLocks(plan->stmt_list, lockedRelids_per_stmt);
+
+ /* Release any PartitionPruneResults that may have been created. */
+ FreePartitionPruneResults(*part_prune_results_list);
+ *part_prune_results_list = NIL;
}
/*
@@ -874,10 +916,14 @@ CheckCachedPlan(CachedPlanSource *plansource)
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
* (unless this is a one-shot plan, in which case we don't copy the plan).
+ *
+ * A list of NILs is returned in *part_prune_results_list, meaning that
+ * no partition pruning has been done yet for the plans in stmt_list.
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv)
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_results_list)
{
CachedPlan *plan;
List *plist;
@@ -1007,6 +1053,17 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
MemoryContextSwitchTo(oldcxt);
+ /*
+ * No actual PartitionPruneResults yet to add, though must initialize
+ * the list to have the same number of elements as the list of
+ * PlannedStmts.
+ */
+ *part_prune_results_list = NIL;
+ foreach(lc, plist)
+ {
+ *part_prune_results_list = lappend(*part_prune_results_list, NIL);
+ }
+
return plan;
}
@@ -1126,6 +1183,19 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
+ * For every PlannedStmt found in the returned CachedPlan, an element that
+ * is either a List of PartitionPruneResults or NIL is added to
+ * *part_prune_results_list. It is the former if the PlannedStmt comes from
+ * an existing CachedPlan that is otherwise valid and has
+ * containsInitialPruning set to true. Before returning such a CachedPlan,
+ * the "initial" pruning steps are performed by calling
+ * ExecutorDoInitialPruning(), which prunes away subplans that don't satisfy
+ * those steps, so that AcquireExecutorLocks() needs to lock only the
+ * surviving leaf partitions. For each PartitionPruneInfo found in
+ * PlannedStmt.partPruneInfos, a PartitionPruneResult containing the bitmapset
+ * of the indexes of surviving subplans is added to the List for the
+ * PlannedStmt.
+ *
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
@@ -1139,11 +1209,13 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ List **part_prune_results_list)
{
CachedPlan *plan = NULL;
List *qlist;
bool customplan;
+ List *my_part_prune_results_list;
/* Assert caller is doing things in a sane order */
Assert(plansource->magic == CACHEDPLANSOURCE_MAGIC);
@@ -1160,7 +1232,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (CheckCachedPlan(plansource, boundParams,
+ &my_part_prune_results_list))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1169,7 +1242,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
else
{
/* Build a new generic plan */
- plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv,
+ &my_part_prune_results_list);
/* Just make real sure plansource->gplan is clear */
ReleaseGenericPlan(plansource);
/* Link the new generic plan into the plansource */
@@ -1214,7 +1288,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (customplan)
{
/* Build a custom plan */
- plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv,
+ &my_part_prune_results_list);
/* Accumulate total costs of custom plans */
plansource->total_custom_cost += cached_plan_cost(plan, true);
@@ -1246,6 +1321,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
plan->is_saved = true;
}
+ if (part_prune_results_list)
+ *part_prune_results_list = my_part_prune_results_list;
+
return plan;
}
@@ -1737,17 +1815,29 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_results_list.
+ *
+ * On return, *lockedRelids_per_stmt will contain, for every PlannedStmt in
+ * stmt_list, a bitmapset of the RT indexes of the relation entries in its
+ * range table that were actually locked, or NULL if the PlannedStmt
+ * contains a utility statement.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_results_list,
+ List **lockedRelids_per_stmt)
{
ListCell *lc1;
+ *part_prune_results_list = *lockedRelids_per_stmt = NIL;
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ List *part_prune_results = NIL;
+ Bitmapset *allLockRelids;
+ Bitmapset *lockedRelids = NULL;
+ int rti;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1761,13 +1851,40 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
if (query)
- ScanQueryForLocks(query, acquire);
+ ScanQueryForLocks(query, true);
+ *part_prune_results_list = lappend(*part_prune_results_list, NIL);
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ /*
+ * Figure out the set of relations that would need to be locked
+ * before executing the plan.
+ */
+ if (plannedstmt->containsInitialPruning)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ Bitmapset *scan_leafpart_rtis = NULL;
+
+ /*
+ * Obtain the set of leaf partitions to be locked.
+ *
+ * The following does initial partition pruning using the
+ * PartitionPruneInfos found in plannedstmt->partPruneInfos and
+ * finds leaf partitions that survive that pruning across all the
+ * nodes in the plan tree.
+ */
+ part_prune_results = ExecutorDoInitialPruning(plannedstmt,
+ boundParams,
+ &scan_leafpart_rtis);
+ allLockRelids = bms_union(plannedstmt->minLockRelids,
+ scan_leafpart_rtis);
+ }
+ else
+ allLockRelids = plannedstmt->minLockRelids;
+
+ rti = -1;
+ while ((rti = bms_next_member(allLockRelids, rti)) > 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
if (rte->rtekind != RTE_RELATION)
continue;
@@ -1778,10 +1895,59 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
* fail if it's been dropped entirely --- we'll just transiently
* acquire a non-conflicting lock.
*/
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
+ LockRelationOid(rte->relid, rte->rellockmode);
+ lockedRelids = bms_add_member(lockedRelids, rti);
+ }
+
+ *part_prune_results_list = lappend(*part_prune_results_list,
+ part_prune_results);
+ *lockedRelids_per_stmt = lappend(*lockedRelids_per_stmt, lockedRelids);
+ }
+}
+
+/*
+ * ReleaseExecutorLocks
+ * Release locks that would've been acquired by an earlier call to
+ * AcquireExecutorLocks()
+ */
+static void
+ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt)
+{
+ ListCell *lc1,
+ *lc2;
+
+ forboth(lc1, stmt_list, lc2, lockedRelids_per_stmt)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ Bitmapset *lockedRelids = lfirst(lc2);
+ int rti;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /*
+ * Ignore utility statements, except those (such as EXPLAIN) that
+ * contain a parsed-but-not-planned query. Note: it's okay to use
+ * ScanQueryForLocks, even though the query hasn't been through
+ * rule rewriting, because rewriting doesn't change the query
+ * representation.
+ */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ Assert(lockedRelids == NULL);
+ if (query)
+ ScanQueryForLocks(query, false);
+ continue;
+ }
+
+ rti = -1;
+ while ((rti = bms_next_member(lockedRelids, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ Assert(rte->rtekind == RTE_RELATION);
+
+ /* See the comment in AcquireExecutorLocks(). */
+ UnlockRelationOid(rte->relid, rte->rellockmode);
}
}
}
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 7b1ae6fdcf..5b9098971b 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -303,6 +303,25 @@ PortalDefineQuery(Portal portal,
portal->status = PORTAL_DEFINED;
}
+/*
+ * PortalStorePartitionPruneResults
+ * Copy the given List of Lists of PartitionPruneResults into the
+ * portal's context
+ *
+ * This allows the caller to ensure that the list exists as long as the portal
+ * does.
+ */
+void
+PortalStorePartitionPruneResults(Portal portal, List *part_prune_results_list)
+{
+ MemoryContext oldcxt;
+
+ Assert(PortalIsValid(portal));
+ oldcxt = MemoryContextSwitchTo(portal->portalContext);
+ portal->part_prune_results_list = copyObject(part_prune_results_list);
+ MemoryContextSwitchTo(oldcxt);
+}
+
/*
* PortalReleaseCachedPlan
* Release a portal's reference to its cached plan, if any.
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 9ebde089ae..269cc4d562 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,9 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt,
+ List *part_prune_results,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index bf962af7af..bd8776402e 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -45,6 +45,7 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
+ * rti_map Range table index by partition index, or 0.
* present_parts A Bitmapset of the partition indexes that we
* have subplans or subparts for.
* initial_pruning_steps List of PartitionPruneSteps used to
@@ -61,6 +62,7 @@ typedef struct PartitionedRelPruningData
int nparts;
int *subplan_map;
int *subpart_map;
+ Index *rti_map;
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
@@ -126,5 +128,10 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int part_prune_index,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis);
+extern Bitmapset *ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index e79e2c001f..7d4379da7b 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,8 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ List *part_prune_results; /* ExecutorDoInitialPruning()'s
+ * output for plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +59,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ List *part_prune_results,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index ed95ed1176..c9a5e5fb68 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -185,6 +185,9 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
+extern List *ExecutorDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ Bitmapset **scan_leafpart_rtis);
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 4a741b053f..521a60b988 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -612,6 +612,7 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
+ List *es_part_prune_results; /* QueryDesc.part_prune_results */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index a80f43e540..937cc4629d 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -212,6 +212,7 @@ extern struct Bitmapset *readBitmapset(void);
extern uintptr_t readDatum(bool typbyval);
extern bool *readBoolCols(int numCols);
extern int *readIntCols(int numCols);
+extern Index *readIndexCols(int numCols);
extern Oid *readOidCols(int numCols);
extern int16 *readAttrNumberCols(int numCols);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index fbe75dca0f..354c2e96c3 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -125,6 +125,18 @@ typedef struct PlannerGlobal
/* List of PartitionPruneInfo contained in the plan */
List *partPruneInfos;
+ /*
+ * Do any of those PartitionPruneInfos have initial pruning steps in them?
+ */
+ bool containsInitialPruning;
+
+ /*
+ * Indexes of all range table entries minus indexes of range table entries
+ * of the leaf partitions scanned by prunable subplans; see
+ * AcquireExecutorLocks()
+ */
+ Bitmapset *minLockRelids;
+
/* OIDs of relations the plan depends on */
List *relationOids;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 2e132afc5a..c0717bf45e 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -73,8 +73,17 @@ typedef struct PlannedStmt
List *partPruneInfos; /* List of PartitionPruneInfo contained in
* the plan */
+ bool containsInitialPruning; /* Do any of those PartitionPruneInfos
+ * have initial pruning steps in them?
+ */
+
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *minLockRelids; /* Indexes of all range table entries minus
+ * indexes of range table entries of the leaf
+ * partitions scanned by prunable subplans;
+ * see AcquireExecutorLocks() */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE/MERGE */
List *resultRelations; /* integer list of RT indexes, or NIL */
@@ -1410,6 +1419,13 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * needs_init_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its initial_pruning_steps set?
+ *
+ * needs_exec_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its exec_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1420,6 +1436,8 @@ typedef struct PartitionPruneInfo
NodeTag type;
List *prune_infos;
+ bool needs_init_pruning;
+ bool needs_exec_pruning;
Bitmapset *other_subplans;
} PartitionPruneInfo;
@@ -1464,6 +1482,9 @@ typedef struct PartitionedRelPruneInfo
/* relation OID by partition index, or 0 */
Oid *relid_map pg_node_attr(array_size(nparts));
+ /* Range table index by partition index, or 0. */
+ Index *rti_map pg_node_attr(array_size(nparts));
+
/*
* initial_pruning_steps shows how to prune during executor startup (i.e.,
* without use of any PARAM_EXEC Params); it is NIL if no startup pruning
@@ -1548,6 +1569,31 @@ typedef struct PartitionPruneStepCombine
List *source_stepids;
} PartitionPruneStepCombine;
+/*----------------
+ * PartitionPruneResult
+ *
+ * The result of performing ExecPartitionDoInitialPruning() on a given
+ * PartitionPruneInfo.
+ *
+ * valid_subplan_offs contains the indexes of the subplans remaining after
+ * performing initial pruning by calling ExecFindMatchingSubPlans() on the
+ * PartitionPruneInfo.
+ *
+ * This is used to store the result of initial partition pruning that is
+ * performed before execution has started. A module that needs to do such
+ * pruning should call ExecutorDoInitialPruning() on a given PlannedStmt,
+ * which returns a List of PartitionPruneResults containing an entry for
+ * each PartitionPruneInfo present in PlannedStmt.partPruneInfos. The module
+ * should then pass that list, along with the PlannedStmt, to the executor,
+ * so that it can reuse the result of initial partition pruning when
+ * initializing the subplans for execution.
+ */
+typedef struct PartitionPruneResult
+{
+ NodeTag type;
+
+ Bitmapset *valid_subplan_offs;
+} PartitionPruneResult;
/*
* Plan invalidation info
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 0499635f59..32579d4788 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -220,7 +220,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ List **part_prune_results_list);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aeddbdafe5..1901fc5f28 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,7 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *part_prune_results_list; /* List of Lists of PartitionPruneResults */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -242,6 +243,8 @@ extern void PortalDefineQuery(Portal portal,
CommandTag commandTag,
List *stmts,
CachedPlan *cplan);
+extern void PortalStorePartitionPruneResults(Portal portal,
+ List *part_prune_results_list);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
--
2.35.3
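To make the new calling convention concrete, here is a condensed sketch of
what the GetCachedPlan() call sites above now do (distilled from the SPI
hunks in this patch; surrounding variables such as plansource, params,
queryEnv, query_string, snapshot, and dest are assumed to be in scope, as
they are at those call sites):

    List       *part_prune_results_list;
    CachedPlan *cplan;
    QueryDesc  *qdesc;
    ListCell   *lc1,
               *lc2;

    /* Replan if needed; also receive per-statement pruning results. */
    cplan = GetCachedPlan(plansource, params, NULL, queryEnv,
                          &part_prune_results_list);
    Assert(list_length(cplan->stmt_list) ==
           list_length(part_prune_results_list));

    /* Walk both lists in lockstep; a pruning-results element may be NIL. */
    forboth(lc1, cplan->stmt_list, lc2, part_prune_results_list)
    {
        PlannedStmt *stmt = lfirst_node(PlannedStmt, lc1);
        List        *part_prune_results = lfirst_node(List, lc2);

        /* Hand both to the executor via the QueryDesc. */
        qdesc = CreateQueryDesc(stmt, part_prune_results, query_string,
                                snapshot, InvalidSnapshot, dest, params,
                                queryEnv, 0);
        /* ... run the plan, then FreeQueryDesc(qdesc) ... */
    }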
v24-0001-Move-PartitioPruneInfo-out-of-plan-nodes-into-Pl.patch
From 9819109681e87342bf22549f5ea316501f77235d Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Fri, 27 May 2022 16:00:28 +0900
Subject: [PATCH v24 1/2] Move PartitionPruneInfo out of plan nodes into
PlannedStmt
The planner will now add a given PartitionPruneInfo to
PlannedStmt.partPruneInfos instead of directly to the
Append/MergeAppend plan node. What gets set instead in the
latter is an index field which points to the list element
of PlannedStmt.partPruneInfos containing the PartitionPruneInfo
belonging to the plan node.
A later commit will make AcquireExecutorLocks() do the initial
partition pruning to determine a minimal set of partitions to be
locked when validating a plan tree, and it will need to consult the
PartitionPruneInfos referenced therein to do so. It is better for
the PartitionPruneInfos to be directly accessible by simply iterating
over PlannedStmt.partPruneInfos than to require a walk of the plan
tree to find them.
---
src/backend/executor/execMain.c | 1 +
src/backend/executor/execParallel.c | 1 +
src/backend/executor/execPartition.c | 4 +-
src/backend/executor/execUtils.c | 1 +
src/backend/executor/nodeAppend.c | 4 +-
src/backend/executor/nodeMergeAppend.c | 4 +-
src/backend/optimizer/plan/createplan.c | 24 ++++-----
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 65 +++++++++++++------------
src/backend/partitioning/partprune.c | 18 ++++---
src/include/executor/execPartition.h | 3 +-
src/include/nodes/execnodes.h | 1 +
src/include/nodes/pathnodes.h | 6 +++
src/include/nodes/plannodes.h | 11 +++--
src/include/partitioning/partprune.h | 8 +--
15 files changed, 90 insertions(+), 62 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index d78862e660..32475e33ff 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -825,6 +825,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable);
estate->es_plannedstmt = plannedstmt;
+ estate->es_part_prune_infos = plannedstmt->partPruneInfos;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 99512826c5..aca0c6f323 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -183,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
pstmt->planTree = plan;
+ pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
pstmt->resultRelations = NIL;
pstmt->appendRelations = NIL;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 40e3c07693..80197d5141 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1791,11 +1791,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
PartitionPruneState *
ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
Bitmapset **initially_valid_subplans)
{
PartitionPruneState *prunestate;
EState *estate = planstate->state;
+ PartitionPruneInfo *pruneinfo = list_nth(estate->es_part_prune_infos,
+ part_prune_index);
/* We may need an expression context to evaluate partition exprs */
ExecAssignExprContext(estate, planstate);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9df1f81ea8..21f4c10937 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -119,6 +119,7 @@ CreateExecutorState(void)
estate->es_relations = NULL;
estate->es_rowmarks = NULL;
estate->es_plannedstmt = NULL;
+ estate->es_part_prune_infos = NIL;
estate->es_junkFilter = NULL;
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 357e10a1d7..c6f86a6510 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -134,7 +134,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_begun = false;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -145,7 +145,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&appendstate->ps,
list_length(node->appendplans),
- node->part_prune_info,
+ node->part_prune_index,
&validsubplans);
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index c5c62fa5c7..8d35860c30 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -82,7 +82,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.ExecProcNode = ExecMergeAppend;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -93,7 +93,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&mergestate->ps,
list_length(node->mergeplans),
- node->part_prune_info,
+ node->part_prune_index,
&validsubplans);
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index ac86ce9003..50a5719ac6 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1203,7 +1203,6 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
ListCell *subpaths;
int nasyncplans = 0;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
int nodenumsortkeys = 0;
AttrNumber *nodeSortColIdx = NULL;
Oid *nodeSortOperators = NULL;
@@ -1354,6 +1353,9 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ plan->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1377,16 +1379,14 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
}
if (prunequal != NIL)
- partpruneinfo =
- make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ plan->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
plan->appendplans = subplans;
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
- plan->part_prune_info = partpruneinfo;
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -1425,7 +1425,6 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
List *subplans = NIL;
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1518,6 +1517,9 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ node->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1541,13 +1543,13 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
}
if (prunequal != NIL)
- partpruneinfo = make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ node->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
node->mergeplans = subplans;
- node->part_prune_info = partpruneinfo;
+
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 493a3af0fa..799602f5ea 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -519,6 +519,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->dependsOnRole = glob->dependsOnRole;
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
+ result->partPruneInfos = glob->partPruneInfos;
result->rtable = glob->finalrtable;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 1cb0abdbc1..720f20f563 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -348,6 +348,29 @@ set_plan_references(PlannerInfo *root, Plan *plan)
}
}
+ /* Also fix up the information in PartitionPruneInfos. */
+ foreach (lc, root->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ ListCell *l;
+
+ foreach(l, pruneinfo->prune_infos)
+ {
+ List *prune_infos = lfirst(l);
+ ListCell *l2;
+
+ foreach(l2, prune_infos)
+ {
+ PartitionedRelPruneInfo *pinfo = lfirst(l2);
+
+ /* RT index of the table to which the pinfo belongs. */
+ pinfo->rtindex += rtoffset;
+ }
+ }
+
+ glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
+ }
+
return result;
}
@@ -1658,21 +1681,12 @@ set_append_references(PlannerInfo *root,
aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
- if (aplan->part_prune_info)
- {
- foreach(l, aplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * PartitionPruneInfos will be added to a list in PlannerGlobal, so update
+ * the index.
+ */
+ if (aplan->part_prune_index >= 0)
+ aplan->part_prune_index += list_length(root->glob->partPruneInfos);
/* We don't need to recurse to lefttree or righttree ... */
Assert(aplan->plan.lefttree == NULL);
@@ -1734,21 +1748,12 @@ set_mergeappend_references(PlannerInfo *root,
mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
- if (mplan->part_prune_info)
- {
- foreach(l, mplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * PartitionPruneInfos will be added to a list in PlannerGlobal, so update
+ * the index.
+ */
+ if (mplan->part_prune_index >= 0)
+ mplan->part_prune_index += list_length(root->glob->partPruneInfos);
/* We don't need to recurse to lefttree or righttree ... */
Assert(mplan->plan.lefttree == NULL);
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 6188bf69cb..6565b6ed01 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -209,16 +209,20 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
/*
* make_partition_pruneinfo
- * Builds a PartitionPruneInfo which can be used in the executor to allow
- * additional partition pruning to take place. Returns NULL when
- * partition pruning would be useless.
+ * Checks if the given set of quals can be used to build pruning steps
+ * that the executor can use to prune away unneeded partitions. If
+ * suitable quals are found then a PartitionPruneInfo is built and tagged
+ * onto the PlannerInfo's partPruneInfos list.
+ *
+ * The return value is the 0-based index of the item added to the
+ * partPruneInfos list or -1 if nothing was added.
*
* 'parentrel' is the RelOptInfo for an appendrel, and 'subpaths' is the list
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
*/
-PartitionPruneInfo *
+int
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
List *prunequal)
@@ -332,7 +336,7 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* quals, then we can just not bother with run-time pruning.
*/
if (prunerelinfos == NIL)
- return NULL;
+ return -1;
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
@@ -358,7 +362,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
else
pruneinfo->other_subplans = NULL;
- return pruneinfo;
+ root->partPruneInfos = lappend(root->partPruneInfos, pruneinfo);
+
+ return list_length(root->partPruneInfos) - 1;
}
/*
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 708435e952..bf962af7af 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -123,9 +123,8 @@ typedef struct PartitionPruneState
extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
bool initial_prune);
-
#endif /* EXECPARTITION_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 01b1727fc0..4a741b053f 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -611,6 +611,7 @@ typedef struct EState
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 09342d128d..fbe75dca0f 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -122,6 +122,9 @@ typedef struct PlannerGlobal
/* "flat" list of AppendRelInfos */
List *appendRelations;
+ /* List of PartitionPruneInfo contained in the plan */
+ List *partPruneInfos;
+
/* OIDs of relations the plan depends on */
List *relationOids;
@@ -503,6 +506,9 @@ struct PlannerInfo
/* Does this query modify any partition key columns? */
bool partColsUpdated;
+
+ /* PartitionPruneInfos added in this query's plan. */
+ List *partPruneInfos;
};
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 5c2ab1b379..2e132afc5a 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -70,6 +70,9 @@ typedef struct PlannedStmt
struct Plan *planTree; /* tree of Plan nodes */
+ List *partPruneInfos; /* List of PartitionPruneInfo contained in
+ * the plan */
+
List *rtable; /* list of RangeTblEntry nodes */
/* rtable indexes of target relations for INSERT/UPDATE/DELETE/MERGE */
@@ -270,8 +273,8 @@ typedef struct Append
*/
int first_partial_plan;
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} Append;
/* ----------------
@@ -305,8 +308,8 @@ typedef struct MergeAppend
/* NULLS FIRST/LAST directions */
bool *nullsFirst pg_node_attr(array_size(numCols));
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} MergeAppend;
/* ----------------
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 90684efa25..ebf0dcff8c 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -70,10 +70,10 @@ typedef struct PartitionPruneContext
#define PruneCxtStateIdx(partnatts, step_id, keyno) \
((partnatts) * (step_id) + (keyno))
-extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
- struct RelOptInfo *parentrel,
- List *subpaths,
- List *prunequal);
+extern int make_partition_pruneinfo(struct PlannerInfo *root,
+ struct RelOptInfo *parentrel,
+ List *subpaths,
+ List *prunequal);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
--
2.35.3
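In short, 0001 replaces the PartitionPruneInfo pointer embedded in each
Append/MergeAppend node with an index into a flat list kept in PlannedStmt.
Condensed from the hunks above, the two sides of that indirection are:

    /* Planner (createplan.c): record a list index instead of a pointer. */
    plan->part_prune_index = make_partition_pruneinfo(root, rel,
                                                      best_path->subpaths,
                                                      prunequal);

    /*
     * Executor (execPartition.c): resolve the index through the EState,
     * which InitPlan() points at PlannedStmt.partPruneInfos.
     */
    PartitionPruneInfo *pruneinfo = list_nth(estate->es_part_prune_infos,
                                             part_prune_index);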
Looking at 0001, I wonder if we should have a crosscheck that a
PartitionPruneInfo you got from following an index is indeed constructed
for the relation that you think it is: previously, you were always sure
that the prune struct is for this node, because you followed a pointer
that was set up in the node itself. Now you only have an index, and you
have to trust that the index is correct.
I'm not sure how to implement this, or even if it's doable at all.
Keeping the OID of the partitioned table in the PartitionPruneInfo
struct is easy, but I don't know how to check it in ExecInitMergeAppend
and ExecInitAppend.
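To be concrete, the check I'm imagining would look something like the
sketch below -- note that the root_parent_oid field and the parent_reloid
variable are hypothetical, and where ExecInit[Merge]Append would get
parent_reloid from is exactly the part I don't see:

    /*
     * Hypothetical sketch: make_partition_pruneinfo() saves the parent
     * partitioned table's OID in a new root_parent_oid field, and the
     * executor verifies it after following part_prune_index.
     */
    PartitionPruneInfo *pruneinfo = list_nth(estate->es_part_prune_infos,
                                             node->part_prune_index);

    if (pruneinfo->root_parent_oid != parent_reloid)
        elog(ERROR, "PartitionPruneInfo at index %d was built for relation %u, not %u",
             node->part_prune_index, pruneinfo->root_parent_oid,
             parent_reloid);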
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"Find a bug in a program, and fix it, and the program will work today.
Show the program how to find and fix a bug, and the program
will work forever" (Oliver Silfridge)
Hi Alvaro,
Thanks for looking at this one.
On Thu, Dec 1, 2022 at 3:12 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Looking at 0001, I wonder if we should have a crosscheck that a
> PartitionPruneInfo you got from following an index is indeed constructed
> for the relation that you think it is: previously, you were always sure
> that the prune struct is for this node, because you followed a pointer
> that was set up in the node itself. Now you only have an index, and you
> have to trust that the index is correct.
Yeah, a crosscheck sounds like a good idea.
> I'm not sure how to implement this, or even if it's doable at all.
> Keeping the OID of the partitioned table in the PartitionPruneInfo
> struct is easy, but I don't know how to check it in ExecInitMergeAppend
> and ExecInitAppend.
Hmm, how about keeping the [Merge]Append's parent relation's RT index
in the PartitionPruneInfo and passing it down to
ExecInitPartitionPruning() from ExecInit[Merge]Append() for
cross-checking? Both Append and MergeAppend already have an
'apprelids' field that we can save a copy of in the
PartitionPruneInfo. Tried that in the attached delta patch.
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
PartitionPruneInfo-relids.patch (application/octet-stream)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 2bd069d889..9a631a9192 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1791,6 +1791,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* Initialize data structure needed for run-time partition pruning and
* do initial pruning if needed
*
+ * 'root_parent_relids' identifies the relation to which both the parent plan
+ * and the PartitionPruneInfo given by 'part_prune_index' belong.
+ *
* On return, *initially_valid_subplans is assigned the set of indexes of
* child subplans that must be initialized along with the parent plan node.
* Initial pruning is performed here if needed and in that case only the
@@ -1804,6 +1807,7 @@ PartitionPruneState *
ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
int part_prune_index,
+ Bitmapset *root_parent_relids,
Bitmapset **initially_valid_subplans)
{
PartitionPruneState *prunestate;
@@ -1811,6 +1815,14 @@ ExecInitPartitionPruning(PlanState *planstate,
PartitionPruneInfo *pruneinfo = list_nth(estate->es_part_prune_infos,
part_prune_index);
+ /* Sanity: part_prune_index gives the correct PartitionPruneInfo. */
+ if (!bms_equal(root_parent_relids, pruneinfo->root_parent_relids))
+ elog(ERROR, "wrong relids (%s) found in PartitionPruneInfo at part_prune_index=%d which has root_parent_relids=%s",
+ bmsToString(root_parent_relids),
+ part_prune_index,
+ bmsToString(pruneinfo->root_parent_relids));
+
+
/* We may need an expression context to evaluate partition exprs */
ExecAssignExprContext(estate, planstate);
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index c6f86a6510..99830198bd 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -146,6 +146,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
prunestate = ExecInitPartitionPruning(&appendstate->ps,
list_length(node->appendplans),
node->part_prune_index,
+ node->apprelids,
&validsubplans);
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 8d35860c30..f370f9f287 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -94,6 +94,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
prunestate = ExecInitPartitionPruning(&mergestate->ps,
list_length(node->mergeplans),
node->part_prune_index,
+ node->apprelids,
&validsubplans);
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 720f20f563..e67f0e3509 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -354,6 +354,8 @@ set_plan_references(PlannerInfo *root, Plan *plan)
PartitionPruneInfo *pruneinfo = lfirst(lc);
ListCell *l;
+ pruneinfo->root_parent_relids =
+ offset_relid_set(pruneinfo->root_parent_relids, rtoffset);
foreach(l, pruneinfo->prune_infos)
{
List *prune_infos = lfirst(l);
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 6565b6ed01..d48f6784c1 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -340,6 +340,7 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
+ pruneinfo->root_parent_relids = parentrel->relids;
pruneinfo->prune_infos = prunerelinfos;
/*
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index bf962af7af..17fabc18c9 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -124,6 +124,7 @@ typedef struct PartitionPruneState
extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
int part_prune_index,
+ Bitmapset *root_parent_relids,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
bool initial_prune);
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 2e132afc5a..b2d6f8fb6e 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -1407,6 +1407,8 @@ typedef struct PlanRowMark
* Then, since an Append-type node could have multiple partitioning
* hierarchies among its children, we have an unordered List of those Lists.
*
+ * root_parent_relids RelOptInfo.relids of the relation to which the parent
+ * plan node and this PartitionPruneInfo node belong
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
@@ -1419,6 +1421,7 @@ typedef struct PartitionPruneInfo
pg_node_attr(no_equal)
NodeTag type;
+ Bitmapset *root_parent_relids;
List *prune_infos;
Bitmapset *other_subplans;
} PartitionPruneInfo;
On 2022-Dec-01, Amit Langote wrote:
> Hmm, how about keeping the [Merge]Append's parent relation's RT index
> in the PartitionPruneInfo and passing it down to
> ExecInitPartitionPruning() from ExecInit[Merge]Append() for
> cross-checking? Both Append and MergeAppend already have an
> 'apprelids' field that we can save a copy of in the
> PartitionPruneInfo. Tried that in the attached delta patch.
Ah yeah, that sounds about what I was thinking. I've merged that in and
pushed to github, which had a strange pg_upgrade failure on Windows
mentioning log files that were not captured by the CI tooling. So I
pushed another one trying to grab those files, in case it wasn't a
one-off failure. It's running now:
https://cirrus-ci.com/task/5857239638999040
If all goes well with this run, I'll get this 0001 pushed.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Investigación es lo que hago cuando no sé lo que estoy haciendo"
(Wernher von Braun)
On Thu, Dec 1, 2022 at 8:21 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> On 2022-Dec-01, Amit Langote wrote:
> > Hmm, how about keeping the [Merge]Append's parent relation's RT index
> > in the PartitionPruneInfo and passing it down to
> > ExecInitPartitionPruning() from ExecInit[Merge]Append() for
> > cross-checking? Both Append and MergeAppend already have an
> > 'apprelids' field that we can save a copy of in the
> > PartitionPruneInfo. Tried that in the attached delta patch.
>
> Ah yeah, that sounds about what I was thinking. I've merged that in and
> pushed to github, which had a strange pg_upgrade failure on Windows
> mentioning log files that were not captured by the CI tooling. So I
> pushed another one trying to grab those files, in case it wasn't a
> one-off failure. It's running now:
> https://cirrus-ci.com/task/5857239638999040
>
> If all goes well with this run, I'll get this 0001 pushed.
Thanks for pushing 0001.
Rebased 0002 attached.
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v25-0001-Optimize-AcquireExecutorLocks-by-locking-only-un.patch (application/octet-stream)
From cff400af6c264d7a2651faec4d963e987797f588 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v25] Optimize AcquireExecutorLocks() by locking only unpruned
partitions
This commit teaches AcquireExecutorLocks() to perform initial
partition pruning to notionally eliminate the subnodes contained in a
generic cached plan that need not be initialized during the actual
execution of the plan and skip locking the partitions scanned by those
subnodes.
The result of performing initial partition pruning this way before the
actual execution has started is made available to the actual execution via
PartitionPruneResult, made available along with the PlannedStmt by the
callers of the executor that used plancache.c to get the plan. It is NULL
in the cases in which the plan is obtained by calling the planner
directly or if the plan obtained by plancache.c is not a generic one.
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 2 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/prepare.c | 26 ++-
src/backend/executor/README | 32 ++++
src/backend/executor/execMain.c | 51 ++++++
src/backend/executor/execParallel.c | 26 ++-
src/backend/executor/execPartition.c | 238 +++++++++++++++++++++----
src/backend/executor/execUtils.c | 1 +
src/backend/executor/functions.c | 2 +-
src/backend/executor/nodeAppend.c | 11 +-
src/backend/executor/nodeMergeAppend.c | 5 +-
src/backend/executor/spi.c | 27 ++-
src/backend/nodes/readfuncs.c | 8 +-
src/backend/optimizer/plan/planner.c | 2 +
src/backend/optimizer/plan/setrefs.c | 46 +++++
src/backend/partitioning/partprune.c | 41 ++++-
src/backend/tcop/postgres.c | 8 +-
src/backend/tcop/pquery.c | 28 ++-
src/backend/utils/cache/plancache.c | 208 ++++++++++++++++++---
src/backend/utils/mmgr/portalmem.c | 19 ++
src/include/commands/explain.h | 4 +-
src/include/executor/execPartition.h | 9 +-
src/include/executor/execdesc.h | 3 +
src/include/executor/executor.h | 3 +
src/include/nodes/execnodes.h | 1 +
src/include/nodes/nodes.h | 1 +
src/include/nodes/pathnodes.h | 12 ++
src/include/nodes/plannodes.h | 46 +++++
src/include/utils/plancache.h | 3 +-
src/include/utils/portal.h | 3 +
33 files changed, 781 insertions(+), 98 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index f26cc0d162..401a2280a3 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -558,7 +558,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NIL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 152c29b551..942449544c 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NIL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index f86983c660..2f2b558608 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -407,7 +407,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NIL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
@@ -515,7 +515,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, List *part_prune_results,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
@@ -563,7 +564,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, part_prune_results, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index cf1b1ca571..904cbcba4a 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -779,7 +779,7 @@ execute_sql_string(const char *sql)
{
QueryDesc *qdesc;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, NIL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 9ac0383459..65c8d0aa59 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -408,7 +408,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NIL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 9e29584d93..29b45539d3 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -155,6 +155,7 @@ ExecuteQuery(ParseState *pstate,
PreparedStatement *entry;
CachedPlan *cplan;
List *plan_list;
+ List *part_prune_results_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
Portal portal;
@@ -193,7 +194,10 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
plan_list = cplan->stmt_list;
/*
@@ -207,6 +211,9 @@ ExecuteQuery(ParseState *pstate,
plan_list,
cplan);
+ /* Copy Lists of PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_results_list);
+
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
* statement is one that produces tuples. Currently we insist that it be
@@ -576,7 +583,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
- ListCell *p;
+ List *part_prune_results_list;
+ ListCell *p,
+ *pp;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -619,7 +628,10 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Replan if needed, and acquire a transient refcount */
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, queryEnv);
+ CurrentResourceOwner, queryEnv,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -634,13 +646,15 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
plan_list = cplan->stmt_list;
/* Explain each query */
- foreach(p, plan_list)
+ forboth(p, plan_list, pp, part_prune_results_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ List *part_prune_results = lfirst_node(List, pp);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ ExplainOnePlan(pstmt, part_prune_results, into, es, query_string,
+ paramLI, queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 17775a49e2..5c59ac5da7 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -65,6 +65,34 @@ found there. This currently only occurs for Append and MergeAppend nodes. In
this case the non-required subplans are ignored and the executor state's
subnode array will become out of sequence to the plan's subplan list.
+Actually, the so-called execution time pruning may also occur even before the
+execution has started. One case where that occurs is when a cached generic
+plan is being validated for execution by plancache.c: GetCachedPlan(), which
+works by locking all the relations that will be scanned by that plan. If the
+generic plan contains nodes that can perform execution time partition pruning
+(that is, contain a PartitionPruneInfo), a subset of pruning steps contained
+in a given node's PartitionPruneInfo that do not depend on the execution
+actually having started (called "initial" pruning steps) are performed to
+figure out the minimal set of child subplans that satisfy those pruning steps.
+AcquireExecutorLocks() looking at a given generic plan will then lock only the
+relations scanned by the child subplans that survived such pruning, along with
+those present in PlannedStmt.minLockRelids. Note that the subplans are only
+notionally pruned, that is, they are not removed from the plan tree as such.
+
+To prevent the executor and any third party execution code that can look at
+the plan tree from trying to execute the subplans that were pruned as
+described above, the result of pruning is passed to the executor as a List
+of PartitionPruneResult nodes via the QueryDesc. Each PartitionPruneResult
+consists of the set of indexes of surviving subplans in the respective parent
+plan node's (the one to which the corresponding PartitionPruneInfo belongs)
+list of child subplans, saved as a bitmapset (valid_subplan_offs). In other
+words, the executor executing a generic plan should not re-evaluate the set of
+initially valid subplans for a given plan node by redoing the initial pruning
+if it was already done by AcquireExecutorLocks() when validating the plan.
+Such re-evaluation of the pruning steps may very well end up resulting in a
+different set of subplans, containing some whose relations were not locked by
+AcquireExecutorLocks().
+
Each Plan node may have expression trees associated with it, to represent
its target list, qualification conditions, etc. These trees are also
read-only to the executor, but the executor state for expression evaluation
@@ -286,6 +314,10 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ [ ExecutorDoInitialPruning ] --- an optional step to perform initial
+ partition pruning on the plan tree, the result of which is passed
+ to the executor via QueryDesc
+
CreateQueryDesc
ExecutorStart
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index b6751da574..7a4db80104 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -49,6 +49,7 @@
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/execdebug.h"
+#include "executor/execPartition.h"
#include "executor/nodeSubplan.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
@@ -104,6 +105,54 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
/* end of local decls */
+/* ----------------------------------------------------------------
+ * ExecutorDoInitialPruning
+ *
+ * For each plan tree node that has been assigned a PartitionPruneInfo,
+ * this performs initial partition pruning using the information contained
+ * therein to determine the set of child subplans that satisfy the initial
+ * pruning steps, to be returned as a bitmapset of their indexes in the
+ * node's list of child subplans (for example, an Append's appendplans).
+ *
+ * Return value is a list of PartitionPruneResult nodes, one for each
+ * PartitionPruneInfo, each carrying one such bitmapset. Also, the RT indexes
+ * of all the leaf partitions scanned by the chosen subplans are added to
+ * *scan_leafpart_rtis, which is shared across all PartitionPruneInfos.
+ *
+ * The executor must see exactly the same set of subplans as valid for
+ * execution when doing ExecInitNode() on the plan nodes whose
+ * PartitionPruneInfos are processed here. So, it must get the set from the
+ * aforementioned PartitionPruneResult, instead of computing it all over
+ * again by redoing the initial pruning. It's the caller's job to pass the
+ * PartitionPruneResult to the executor.
+ *
+ * Note: Partitioned tables mentioned in PartitionedRelPruneInfo nodes that
+ * drive the pruning will be locked before doing the pruning.
+ * ----------------------------------------------------------------
+ */
+List *
+ExecutorDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *part_prune_results = NIL;
+ ListCell *lc;
+
+ /* Only get here if there is any pruning to do. */
+ Assert(plannedstmt->containsInitialPruning);
+
+ foreach(lc, plannedstmt->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ PartitionPruneResult *pruneresult = makeNode(PartitionPruneResult);
+
+ pruneresult->valid_subplan_offs =
+ ExecPartitionDoInitialPruning(plannedstmt, params, pruneinfo,
+ scan_leafpart_rtis);
+ part_prune_results = lappend(part_prune_results, pruneresult);
+ }
+
+ return part_prune_results;
+}
/* ----------------------------------------------------------------
* ExecutorStart
@@ -806,6 +855,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ List *part_prune_results = queryDesc->part_prune_results;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -826,6 +876,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_plannedstmt = plannedstmt;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ estate->es_part_prune_results = part_prune_results;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index aca0c6f323..917079a034 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -66,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITION_PRUNE_RESULTS UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -182,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->containsInitialPruning = false;
pstmt->planTree = plan;
pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
@@ -597,12 +599,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_results_data;
+ char *part_prune_results_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_results_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -631,6 +636,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_results_data = nodeToString(estate->es_part_prune_results);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -657,6 +663,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized List of PartitionPruneResult. */
+ part_prune_results_len = strlen(part_prune_results_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_results_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -751,6 +762,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized List of PartitionPruneResult */
+ part_prune_results_space = shm_toc_allocate(pcxt->toc, part_prune_results_len);
+ memcpy(part_prune_results_space, part_prune_results_data, part_prune_results_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS,
+ part_prune_results_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1232,8 +1249,10 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_results_space;
char *paramspace;
PlannedStmt *pstmt;
+ List *part_prune_results;
ParamListInfo paramLI;
char *queryString;
@@ -1244,12 +1263,17 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
pstmtspace = shm_toc_lookup(toc, PARALLEL_KEY_PLANNEDSTMT, false);
pstmt = (PlannedStmt *) stringToNode(pstmtspace);
+ /* Reconstruct leader-supplied PartitionPruneResult. */
+ part_prune_results_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, false);
+ part_prune_results = (List *) stringToNode(part_prune_results_space);
+
/* Reconstruct ParamListInfo. */
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
/* Create a QueryDesc for the query. */
- return CreateQueryDesc(pstmt,
+ return CreateQueryDesc(pstmt, part_prune_results,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 8e6453aec2..13e450c0fa 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -25,6 +25,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -185,7 +186,11 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *pruneinfo);
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -198,7 +203,8 @@ static void PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans);
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis);
/*
@@ -1758,8 +1764,10 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* considered to be a stable expression, it can change value from one plan
* node scan to the next during query execution. Stable comparison
* expressions that don't involve such Params allow partition pruning to be
- * done once during executor startup. Expressions that do involve such Params
- * require us to prune separately for each scan of the parent plan node.
+ * done once during executor startup or during ExecutorDoInitialPruning() that
+ * runs as part of performing AcquireExecutorLocks() on a given plan tree.
+ * Expressions that do involve such Params require us to prune separately for
+ * each scan of the parent plan node.
*
* Note that pruning away unneeded subplans during executor startup has the
* added benefit of not having to initialize the unneeded subplans at all.
@@ -1776,6 +1784,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* account for initial pruning possibly having eliminated some of the
* subplans.
*
+ * ExecPartitionDoInitialPruning:
+ * Do initial pruning with the information contained in a given
+ * PartitionPruneInfo to determine the minimal set of child subplans
+ * to be executed of the parent plan node to which the PartitionPruneInfo
+ * belongs and also the set of the RT indexes of leaf partitions that will
+ * be scanned with those subplans.
+ *
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating the expressions
* that are safe to evaluate at a given point. This function is first
@@ -1796,8 +1811,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* On return, *initially_valid_subplans is assigned the set of indexes of
* child subplans that must be initialized along with the parent plan node.
- * Initial pruning is performed here if needed and in that case only the
- * surviving subplans' indexes are added.
+ * Initial pruning is performed here if needed (unless it has already been done
+ * by ExecutorDoInitialPruning()), and in that case only the surviving
+ * subplans' indexes are added.
*
* If subplans are indeed pruned, subplan_map arrays contained in the returned
* PartitionPruneState are re-sequenced to not count those, though only if the
@@ -1810,9 +1826,10 @@ ExecInitPartitionPruning(PlanState *planstate,
Bitmapset *root_parent_relids,
Bitmapset **initially_valid_subplans)
{
- PartitionPruneState *prunestate;
+ PartitionPruneState *prunestate = NULL;
EState *estate = planstate->state;
PartitionPruneInfo *pruneinfo;
+ PartitionPruneResult *pruneresult = NULL;
/* Obtain the pruneinfo we need, and make sure it's the right one */
pruneinfo = list_nth(estate->es_part_prune_infos, part_prune_index);
@@ -1828,20 +1845,57 @@ ExecInitPartitionPruning(PlanState *planstate,
/* We may need an expression context to evaluate partition exprs */
ExecAssignExprContext(estate, planstate);
- /* Create the working data structure for pruning */
- prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ /*
+ * No need to do initial pruning if it was done already by
+ * ExecutorDoInitialPruning(), which it would be if es_part_prune_results
+ * is set.
+ */
+ if (estate->es_part_prune_results)
+ {
+ pruneresult = list_nth(estate->es_part_prune_results, part_prune_index);
+ Assert(IsA(pruneresult, PartitionPruneResult));
+ }
+
+ if (pruneresult == NULL || pruneinfo->needs_exec_pruning)
+ {
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
+
+ /* For data reading, executor always omits detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
+ /*
+ * Create the working data structure for pruning. No need to consider
+ * initial pruning steps if we have a PartitionPruneResult.
+ */
+ prunestate = CreatePartitionPruneState(planstate, pruneinfo,
+ pruneresult == NULL,
+ pruneinfo->needs_exec_pruning,
+ NIL, planstate->ps_ExprContext,
+ estate->es_partition_directory);
+ }
/*
* Perform an initial partition prune pass, if required.
*/
- if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ if (pruneresult)
+ {
+ *initially_valid_subplans = bms_copy(pruneresult->valid_subplan_offs);
+ }
+ else if (prunestate && prunestate->do_initial_prune)
+ {
+ *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true,
+ NULL);
+ }
else
{
- /* No pruning, so we'll need to initialize all subplans */
+ /* No initial pruning, so we'll need to initialize all subplans */
Assert(n_total_subplans > 0);
*initially_valid_subplans = bms_add_range(NULL, 0,
n_total_subplans - 1);
+ return prunestate;
}
/*
@@ -1849,7 +1903,8 @@ ExecInitPartitionPruning(PlanState *planstate,
* that were removed above due to initial pruning. No need to do this if
* no steps were removed.
*/
- if (bms_num_members(*initially_valid_subplans) < n_total_subplans)
+ if (prunestate &&
+ bms_num_members(*initially_valid_subplans) < n_total_subplans)
{
/*
* We can safely skip this when !do_exec_prune, even though that
@@ -1865,11 +1920,74 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecPartitionDoInitialPruning
+ * Perform initial pruning using given PartitionPruneInfo to determine
+ * the minimal set of child subplans that will be executed and also the
+ * set of RT indexes of the leaf partitions scanned by those subplans.
+ */
+Bitmapset *
+ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *rtable = plannedstmt->rtable;
+ ExprContext *econtext;
+ PartitionDirectory pdir;
+ MemoryContext oldcontext,
+ tmpcontext;
+ PartitionPruneState *prunestate;
+ Bitmapset *valid_subplan_offs;
+
+ /*
+ * A temporary context for memory allocations required while executing
+ * partition pruning steps.
+ */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "initial pruning working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
+
+ /*
+ * PartitionDirectory to look up partition descriptors.
+ * Note that we don't omit detached partitions, just like during
+ * execution proper.
+ */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+
+ /*
+ * We don't yet have a PlanState for the parent plan node, so we must
+ * create a standalone ExprContext to evaluate pruning expressions,
+ * equipped with the information about the EXTERN parameters that the
+ * caller passed us. Note that that's okay because the initial pruning
+ * steps do not contain anything that requires the execution to have
+ * started and thus need the information contained in a PlanState.
+ */
+ econtext = CreateStandaloneExprContext();
+ econtext->ecxt_param_list_info = params;
+ prunestate = CreatePartitionPruneState(NULL, pruneinfo, true, false,
+ rtable, econtext, pdir);
+ MemoryContextSwitchTo(oldcontext);
+
+ /* Do the initial pruning. */
+ valid_subplan_offs = ExecFindMatchingSubPlans(prunestate, true,
+ scan_leafpart_rtis);
+
+ FreeExprContext(econtext, true);
+ DestroyPartitionDirectory(pdir);
+ MemoryContextDelete(tmpcontext);
+
+ return valid_subplan_offs;
+}
+
/*
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
*
- * 'planstate' is the parent plan node's execution state.
+ * 'planstate', if not NULL, is the parent plan node's execution state. It
+ * can be NULL if being called before ExecutorStart(), in which case,
+ * 'rtable' (range table), 'econtext', and 'partdir' must be explicitly
+ * provided.
*
* 'pruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
@@ -1883,19 +2001,21 @@ ExecInitPartitionPruning(PlanState *planstate,
* PartitionedRelPruneInfo.
*/
static PartitionPruneState *
-CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
+CreatePartitionPruneState(PlanState *planstate,
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir)
{
- EState *estate = planstate->state;
+ EState *estate = planstate ? planstate->state : NULL;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
- /* For data reading, executor always omits detached partitions */
- if (estate->es_partition_directory == NULL)
- estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
+ Assert((estate != NULL) ||
+ (partdir != NULL && econtext != NULL && rtable != NIL));
n_part_hierarchies = list_length(pruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
@@ -1950,15 +2070,42 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * Must open the relation by ourselves when called before the
+ * execution has started, such as, when called during
+ * ExecutorDoInitialPruning() on a cached plan. In that case,
+ * sub-partitions must be locked, because AcquirePlannerLocks()
+ * would not have seen them. (1st relation in a partrelpruneinfos
+ * list is always the root partitioned table appearing in the
+ * query, which AcquirePlannerLocks() would have locked; the
+ * Assert in relation_open() guards that assumption.)
+ */
+ if (estate == NULL)
+ {
+ RangeTblEntry *rte = rt_fetch(pinfo->rtindex, rtable);
+ int lockmode = (j == 0) ? NoLock : rte->rellockmode;
+
+ partrel = table_open(rte->relid, lockmode);
+ }
+ else
+ partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+
+ /*
+ * We can rely on the copy of the partitioned table's partition
+ * key in its relcache entry, because it can't change (or
+ * get destroyed) as long as the relation is locked. Partition
+ * descriptor is taken from the PartitionDirectory associated with
+ * the table that is held open long enough for the descriptor to
+ * remain valid while it's used to perform the pruning steps.
*/
- partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
partkey = RelationGetPartitionKey(partrel);
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partdesc = PartitionDirectoryLookup(partdir, partrel);
+
+ /*
+ * Must close partrel, keeping the lock taken, if we're not using
+ * EState's entry.
+ */
+ if (estate == NULL)
+ table_close(partrel, NoLock);
/*
* Initialize the subplan_map and subpart_map.
@@ -1972,6 +2119,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
Assert(partdesc->nparts >= pinfo->nparts);
pprune->nparts = partdesc->nparts;
pprune->subplan_map = palloc(sizeof(int) * partdesc->nparts);
+ pprune->rti_map = palloc(sizeof(Index) * partdesc->nparts);
if (partdesc->nparts == pinfo->nparts)
{
/*
@@ -1982,6 +2130,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pprune->subpart_map = pinfo->subpart_map;
memcpy(pprune->subplan_map, pinfo->subplan_map,
sizeof(int) * pinfo->nparts);
+ memcpy(pprune->rti_map, pinfo->rti_map,
+ sizeof(int) * pinfo->nparts);
/*
* Double-check that the list of unpruned relations has not
@@ -2032,6 +2182,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pinfo->subplan_map[pd_idx];
pprune->subpart_map[pp_idx] =
pinfo->subpart_map[pd_idx];
+ pprune->rti_map[pp_idx] =
+ pinfo->rti_map[pd_idx];
pd_idx++;
}
else
@@ -2039,6 +2191,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
/* this partdesc entry is not in the plan */
pprune->subplan_map[pp_idx] = -1;
pprune->subpart_map[pp_idx] = -1;
+ pprune->rti_map[pp_idx] = 0;
}
}
@@ -2060,7 +2213,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* Initialize pruning contexts as needed.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
- if (pinfo->initial_pruning_steps)
+ if (consider_initial_steps && pinfo->initial_pruning_steps)
{
InitPartitionPruneContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
@@ -2070,7 +2223,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps)
+ if (consider_exec_steps && pinfo->exec_pruning_steps)
{
InitPartitionPruneContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
@@ -2298,10 +2451,14 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
* runtime pruning.
+ *
+ * RT indexes of leaf partitions scanned by the chosen subplans are added to
+ * *scan_leafpart_rtis if the pointer is non-NULL.
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2336,7 +2493,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
*/
pprune = &prunedata->partrelprunedata[0];
find_matching_subplans_recurse(prunedata, pprune, initial_prune,
- &result);
+ &result, scan_leafpart_rtis);
/* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
@@ -2350,6 +2507,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_copy(*scan_leafpart_rtis);
MemoryContextReset(prunestate->prune_context);
@@ -2360,13 +2519,15 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* find_matching_subplans_recurse
* Recursive worker function for ExecFindMatchingSubPlans
*
- * Adds valid (non-prunable) subplan IDs to *validsubplans
+ * Adds valid (non-prunable) subplan IDs to *validsubplans and RT indexes of
+ * the corresponding leaf partitions to *scan_leafpart_rtis (if asked for).
*/
static void
find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans)
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *partset;
int i;
@@ -2393,8 +2554,14 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
while ((i = bms_next_member(partset, i)) >= 0)
{
if (pprune->subplan_map[i] >= 0)
+ {
*validsubplans = bms_add_member(*validsubplans,
pprune->subplan_map[i]);
+ Assert(pprune->rti_map[i] > 0);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_add_member(*scan_leafpart_rtis,
+ pprune->rti_map[i]);
+ }
else
{
int partidx = pprune->subpart_map[i];
@@ -2402,7 +2569,8 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
if (partidx >= 0)
find_matching_subplans_recurse(prunedata,
&prunedata->partrelprunedata[partidx],
- initial_prune, validsubplans);
+ initial_prune, validsubplans,
+ scan_leafpart_rtis);
else
{
/*
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9695de85b9..dce93a8c9f 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -135,6 +135,7 @@ CreateExecutorState(void)
estate->es_param_exec_vals = NULL;
estate->es_queryEnv = NULL;
+ estate->es_part_prune_results = NIL;
estate->es_query_cxt = qcontext;
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index dc13625171..bffb42ce71 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -842,7 +842,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
- es->qd = CreateQueryDesc(es->stmt,
+ es->qd = CreateQueryDesc(es->stmt, NIL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 99830198bd..3b917584de 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -156,7 +156,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (appendstate->as_prune_state == NULL ||
+ (!appendstate->as_prune_state->do_exec_prune && nplans > 0))
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -578,7 +579,7 @@ choose_next_subplan_locally(AppendState *node)
}
else if (node->as_valid_subplans == NULL)
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
whichplan = -1;
}
@@ -643,7 +644,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
/*
* Mark each invalid plan as finished to allow the loop below to
@@ -718,7 +719,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
mark_invalid_subplans_as_finished(node);
}
@@ -869,7 +870,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
classify_matching_subplans(node);
}
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index f370f9f287..ccfa083945 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -104,7 +104,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplan, we can fill ms_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (mergestate->ms_prune_state == NULL ||
+ (!mergestate->ms_prune_state->do_exec_prune && nplans > 0))
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -219,7 +220,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index fd5796f1b9..93012a5b3b 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1578,6 +1578,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
CachedPlanSource *plansource;
CachedPlan *cplan;
List *stmt_list;
+ List *part_prune_results_list;
char *query_string;
Snapshot snapshot;
MemoryContext oldcontext;
@@ -1657,7 +1658,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1685,6 +1689,9 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
stmt_list,
cplan);
+ /* Copy Lists of PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_results_list);
+
/*
* Set up options for portal. Default SCROLL type is chosen the same way
* as PerformCursorOpen does it.
@@ -2092,7 +2099,8 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
/* Get the generic plan for the query */
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ NULL /* Not interested in PartitionPruneResults */);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2473,7 +2481,9 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
- ListCell *lc2;
+ List *part_prune_results_list;
+ ListCell *lc2,
+ *lc3;
spicallbackarg.query = plansource->query_string;
@@ -2549,8 +2559,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* plan, the refcount must be backed by the plan_owner.
*/
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
-
+ plan_owner, _SPI_current->queryEnv,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
stmt_list = cplan->stmt_list;
/*
@@ -2589,9 +2601,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, part_prune_results_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ List *part_prune_results = lfirst_node(List, lc3);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2663,7 +2676,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
else
snap = InvalidSnapshot;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, part_prune_results,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 23776367c5..b01f55fb4f 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -158,6 +158,11 @@
token = pg_strtok(&length); /* skip :fldname */ \
local_node->fldname = readIntCols(len)
+/* Read an Index array */
+#define READ_INDEX_ARRAY(fldname, len) \
+ token = pg_strtok(&length); /* skip :fldname */ \
+ local_node->fldname = readIndexCols(len)
+
/* Read a bool array */
#define READ_BOOL_ARRAY(fldname, len) \
token = pg_strtok(&length); /* skip :fldname */ \
@@ -800,7 +805,6 @@ fnname(int numCols) \
*/
READ_SCALAR_ARRAY(readAttrNumberCols, int16, atoi)
READ_SCALAR_ARRAY(readOidCols, Oid, atooid)
-/* outfuncs.c has writeIndexCols, but we don't yet need that here */
-/* READ_SCALAR_ARRAY(readIndexCols, Index, atoui) */
+READ_SCALAR_ARRAY(readIndexCols, Index, atoui)
READ_SCALAR_ARRAY(readIntCols, int, atoi)
READ_SCALAR_ARRAY(readBoolCols, bool, strtobool)
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 799602f5ea..a96d316dca 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -520,7 +520,9 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
result->partPruneInfos = glob->partPruneInfos;
+ result->containsInitialPruning = glob->containsInitialPruning;
result->rtable = glob->finalrtable;
+ result->minLockRelids = glob->minLockRelids;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index e67f0e3509..5820f26fdb 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -270,6 +270,16 @@ set_plan_references(PlannerInfo *root, Plan *plan)
*/
add_rtes_to_flat_rtable(root, false);
+ /*
+ * Add the query's adjusted range of RT indexes to glob->minLockRelids.
+ * The adjusted RT indexes of prunable relations will be deleted from the
+ * set below where PartitionPruneInfos are processed.
+ */
+ glob->minLockRelids =
+ bms_add_range(glob->minLockRelids,
+ rtoffset + 1,
+ rtoffset + list_length(root->parse->rtable));
+
/*
* Adjust RT indexes of PlanRowMarks and add to final rowmarks list
*/
@@ -352,6 +362,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach (lc, root->partPruneInfos)
{
PartitionPruneInfo *pruneinfo = lfirst(lc);
+ Bitmapset *leafpart_rtis = NULL;
ListCell *l;
pruneinfo->root_parent_relids =
@@ -364,15 +375,50 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach(l2, prune_infos)
{
PartitionedRelPruneInfo *pinfo = lfirst(l2);
+ int i;
/* RT index of the table to which the pinfo belongs. */
pinfo->rtindex += rtoffset;
+
+ /* Also of the leaf partitions that might be scanned. */
+ for (i = 0; i < pinfo->nparts; i++)
+ {
+ if (pinfo->rti_map[i] > 0 && pinfo->subplan_map[i] >= 0)
+ {
+ pinfo->rti_map[i] += rtoffset;
+ leafpart_rtis = bms_add_member(leafpart_rtis,
+ pinfo->rti_map[i]);
+ }
+ }
}
}
+ if (pruneinfo->needs_init_pruning)
+ {
+ glob->containsInitialPruning = true;
+
+ /*
+ * Delete the leaf partition RTIs from the global set of relations
+ * to be locked before executing the plan. AcquireExecutorLocks()
+ * will find the ones to add to the set after performing initial
+ * pruning.
+ */
+ glob->minLockRelids = bms_del_members(glob->minLockRelids,
+ leafpart_rtis);
+ }
+
glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
}
+ /*
+ * It seems worth doing a bms_copy() on glob->minLockRelids if we deleted
+ * bits from it above to get rid of any empty tail bits. It seems better
+ * for the loop over this set in AcquireExecutorLocks() to not have to go
+ * through those useless bit words.
+ */
+ if (glob->containsInitialPruning)
+ glob->minLockRelids = bms_copy(glob->minLockRelids);
+
return result;
}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index d48f6784c1..d5556354f7 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,9 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -234,6 +236,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool needs_init_pruning = false;
+ bool needs_exec_pruning = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -313,12 +317,16 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_needs_init_pruning;
+ bool partrel_needs_exec_pruning;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_needs_init_pruning,
+ &partrel_needs_exec_pruning);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -327,6 +335,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+
+ needs_init_pruning |= partrel_needs_init_pruning;
+ needs_exec_pruning |= partrel_needs_exec_pruning;
}
pfree(relid_subplan_map);
@@ -342,6 +353,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->root_parent_relids = parentrel->relids;
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->needs_init_pruning = needs_init_pruning;
+ pruneinfo->needs_exec_pruning = needs_exec_pruning;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -442,13 +455,18 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *needs_init_pruning and *needs_exec_pruning are set to indicate whether
+ * the returned PartitionedRelPruneInfos contain pruning steps that can be
+ * performed before execution begins and during execution, respectively.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -459,6 +477,10 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will find out below. */
+ *needs_init_pruning = false;
+ *needs_exec_pruning = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -546,6 +568,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+ * The first pass also determines whether the 2nd pass will be necessary,
+ * by checking for the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -620,6 +645,12 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ /* record which types of pruning steps we've seen so far */
+ if (initial_pruning_steps != NIL)
+ *needs_init_pruning = true;
+ if (exec_pruning_steps != NIL)
+ *needs_exec_pruning = true;
+
pinfolist = lappend(pinfolist, pinfo);
}
@@ -647,6 +678,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *subplan_map;
int *subpart_map;
Oid *relid_map;
+ Index *rti_map;
/*
* Construct the subplan and subpart maps for this partitioning level.
@@ -659,6 +691,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
+ rti_map = (Index *) palloc0(nparts * sizeof(Index));
present_parts = NULL;
i = -1;
@@ -673,6 +706,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
+ rti_map[i] = partrel->relid;
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
@@ -697,6 +731,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
pinfo->relid_map = relid_map;
+ pinfo->rti_map = rti_map;
}
pfree(relid_subpart_map);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 3082093d1e..95ab1d0eef 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1598,6 +1598,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ List *part_prune_results_list;
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -1972,7 +1973,9 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
/*
* Now we can define the portal.
@@ -1987,6 +1990,9 @@ exec_bind_message(StringInfo input_message)
cplan->stmt_list,
cplan);
+ /* Copy Lists of PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_results_list);
+
/* Done with the snapshot used for parameter I/O and parsing/planning */
if (snapshot_set)
PopActiveSnapshot();
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 52e2db6452..280ed7d239 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -35,7 +35,7 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
+static void ProcessQuery(PlannedStmt *plan, List *part_prune_results,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +65,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ List *part_prune_results,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +78,8 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->part_prune_results = part_prune_results; /* ExecutorDoInitialPruning()
+ * output for plan */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +125,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * part_prune_results: ExecutorDoInitialPruning() output for the PlannedStmt
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +138,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ List *part_prune_results,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +150,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, part_prune_results, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -491,8 +496,13 @@ PortalStart(Portal portal, ParamListInfo params,
/*
* Create QueryDesc in portal's context; for the moment, set
* the destination to DestNone.
+ *
+ * There is no PartitionPruneResult unless the PlannedStmt is
+ * from a CachedPlan.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->part_prune_results_list == NIL ? NIL :
+ linitial(portal->part_prune_results_list),
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1225,6 +1235,8 @@ PortalRunMulti(Portal portal,
if (pstmt->utilityStmt == NULL)
{
+ List *part_prune_results = NIL;
+
/*
* process a plannable query.
*/
@@ -1271,10 +1283,18 @@ PortalRunMulti(Portal portal,
else
UpdateActiveSnapshotCommandId();
+ /*
+ * Determine if there's a corresponding List of PartitionPruneResult
+ * for this PlannedStmt.
+ */
+ if (portal->part_prune_results_list != NIL)
+ part_prune_results = list_nth(portal->part_prune_results_list,
+ foreach_current_index(stmtlist_item));
+
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_results,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1283,7 +1303,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_results,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index cc943205d3..af6fae6e3b 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -99,14 +99,19 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_results_list);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv);
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_results_list);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_results_list,
+ List **lockedRelids_per_stmt);
+static void ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -782,6 +787,26 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
return tlist;
}
+/*
+ * FreePartitionPruneResults
+ * Frees the List of Lists of PartitionPruneResults for CheckCachedPlan()
+ */
+static void
+FreePartitionPruneResults(List *part_prune_results_list)
+{
+ ListCell *lc;
+
+ foreach(lc, part_prune_results_list)
+ {
+ List *part_prune_results = lfirst(lc);
+
+ /* Free both the PartitionPruneResults and the containing List. */
+ list_free_deep(part_prune_results);
+ }
+
+ list_free(part_prune_results_list);
+}
+
/*
* CheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
*
@@ -790,15 +815,20 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* On a "true" return, we have acquired the locks needed to run the plan.
* (We must do this for the "true" result to be race-condition-free.)
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_results_list.
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_results_list)
{
CachedPlan *plan = plansource->gplan;
/* Assert that caller checked the querytree */
Assert(plansource->is_valid);
+ *part_prune_results_list = NIL;
+
/* If there's no generic plan, just say "false" */
if (!plan)
return false;
@@ -820,13 +850,21 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ List *lockedRelids_per_stmt;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ /*
+ * Lock relations scanned by the plan. This is where the pruning
+ * happens if needed.
+ */
+ AcquireExecutorLocks(plan->stmt_list, boundParams,
+ part_prune_results_list,
+ &lockedRelids_per_stmt);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -848,7 +886,11 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ ReleaseExecutorLocks(plan->stmt_list, lockedRelids_per_stmt);
+
+ /* Release any PartitionPruneResults that may have been created. */
+ FreePartitionPruneResults(*part_prune_results_list);
+ *part_prune_results_list = NIL;
}
/*
@@ -874,10 +916,14 @@ CheckCachedPlan(CachedPlanSource *plansource)
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
* (unless this is a one-shot plan, in which case we don't copy the plan).
+ *
+ * A list of NILs is returned in *part_prune_results_list, meaning that no
+ * partition pruning has been done yet for the plans in stmt_list.
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv)
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_results_list)
{
CachedPlan *plan;
List *plist;
@@ -1007,6 +1053,17 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
MemoryContextSwitchTo(oldcxt);
+ /*
+ * No actual PartitionPruneResults yet to add, though must initialize
+ * the list to have the same number of elements as the list of
+ * PlannedStmts.
+ */
+ *part_prune_results_list = NIL;
+ foreach(lc, plist)
+ {
+ *part_prune_results_list = lappend(*part_prune_results_list, NIL);
+ }
+
return plan;
}
@@ -1126,6 +1183,19 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
+ * For every PlannedStmt found in the returned CachedPlan, an element that
+ * is either a List of PartitionPruneResult or NIL is added to
+ * *part_prune_results_list; the former if the PlannedStmt is from
+ * an existing CachedPlan that is otherwise valid and has
+ * containsInitialPruning set to true. Before returning such a CachedPlan,
+ * those "initial" steps are performed by calling ExecutorDoInitialPruning()
+ * to determine only those leaf partitions that need to be locked by
+ * AcquireExecutorLocks() by pruning away subplans that don't match the
+ * "initial" pruning conditions. For each PartitionPruneInfo found in
+ * PlannedStmt.partPruneInfos, a PartitionPruneResult containing the bitmapset
+ * of the indexes of surviving subplans is added to the List for the
+ * PlannedStmt.
+ *
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
@@ -1139,11 +1209,13 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ List **part_prune_results_list)
{
CachedPlan *plan = NULL;
List *qlist;
bool customplan;
+ List *my_part_prune_results_list;
/* Assert caller is doing things in a sane order */
Assert(plansource->magic == CACHEDPLANSOURCE_MAGIC);
@@ -1160,7 +1232,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (CheckCachedPlan(plansource, boundParams,
+ &my_part_prune_results_list))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1169,7 +1242,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
else
{
/* Build a new generic plan */
- plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv,
+ &my_part_prune_results_list);
/* Just make real sure plansource->gplan is clear */
ReleaseGenericPlan(plansource);
/* Link the new generic plan into the plansource */
@@ -1214,7 +1288,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (customplan)
{
/* Build a custom plan */
- plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv,
+ &my_part_prune_results_list);
/* Accumulate total costs of custom plans */
plansource->total_custom_cost += cached_plan_cost(plan, true);
@@ -1246,6 +1321,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
plan->is_saved = true;
}
+ if (part_prune_results_list)
+ *part_prune_results_list = my_part_prune_results_list;
+
return plan;
}
@@ -1737,17 +1815,29 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_results_list.
+ *
+ * On return, *lockedRelids_per_stmt will contain a bitmapset for every
+ * PlannedStmt in stmt_list, containing the RT indexes of relation entries
+ * in its range table that were actually locked, or NULL if the PlannedStmt
+ * contains a utility statement.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_results_list,
+ List **lockedRelids_per_stmt)
{
ListCell *lc1;
+ *part_prune_results_list = *lockedRelids_per_stmt = NIL;
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ List *part_prune_results = NIL;
+ Bitmapset *allLockRelids;
+ Bitmapset *lockedRelids = NULL;
+ int rti;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1761,13 +1851,40 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
if (query)
- ScanQueryForLocks(query, acquire);
+ ScanQueryForLocks(query, true);
+ *part_prune_results_list = lappend(*part_prune_results_list, NIL);
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ /*
+ * Figure out the set of relations that would need to be locked
+ * before executing the plan.
+ */
+ if (plannedstmt->containsInitialPruning)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ Bitmapset *scan_leafpart_rtis = NULL;
+
+ /*
+ * Obtain the set of leaf partitions to be locked.
+ *
+ * The following does initial partition pruning using the
+ * PartitionPruneInfos found in plannedstmt->partPruneInfos and
+ * finds leaf partitions that survive that pruning across all the
+ * nodes in the plan tree.
+ */
+ part_prune_results = ExecutorDoInitialPruning(plannedstmt,
+ boundParams,
+ &scan_leafpart_rtis);
+ allLockRelids = bms_union(plannedstmt->minLockRelids,
+ scan_leafpart_rtis);
+ }
+ else
+ allLockRelids = plannedstmt->minLockRelids;
+
+ rti = -1;
+ while ((rti = bms_next_member(allLockRelids, rti)) > 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
if (rte->rtekind != RTE_RELATION)
continue;
@@ -1778,10 +1895,59 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
* fail if it's been dropped entirely --- we'll just transiently
* acquire a non-conflicting lock.
*/
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
+ LockRelationOid(rte->relid, rte->rellockmode);
+ lockedRelids = bms_add_member(lockedRelids, rti);
+ }
+
+ *part_prune_results_list = lappend(*part_prune_results_list,
+ part_prune_results);
+ *lockedRelids_per_stmt = lappend(*lockedRelids_per_stmt, lockedRelids);
+ }
+}
+
+/*
+ * ReleaseExecutorLocks
+ * Release locks that would've been acquired by an earlier call to
+ * AcquireExecutorLocks()
+ */
+static void
+ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt)
+{
+ ListCell *lc1,
+ *lc2;
+
+ forboth(lc1, stmt_list, lc2, lockedRelids_per_stmt)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ Bitmapset *lockedRelids = lfirst(lc2);
+ int rti;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /*
+ * Ignore utility statements, except those (such as EXPLAIN) that
+ * contain a parsed-but-not-planned query. Note: it's okay to use
+ * ScanQueryForLocks, even though the query hasn't been through
+ * rule rewriting, because rewriting doesn't change the query
+ * representation.
+ */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ Assert(lockedRelids == NULL);
+ if (query)
+ ScanQueryForLocks(query, false);
+ continue;
+ }
+
+ rti = -1;
+ while ((rti = bms_next_member(lockedRelids, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ Assert(rte->rtekind == RTE_RELATION);
+
+ /* See the comment in AcquireExecutorLocks(). */
+ UnlockRelationOid(rte->relid, rte->rellockmode);
}
}
}
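Condensed, the CheckCachedPlan() changes above amount to the following
flow once the plan has passed its first validity check (a schematic
sketch; "plan" and "boundParams" are assumed to be in scope as in the
hunk earlier in this patch):

    List       *results;
    List       *locked_per_stmt;

    /* Prune (if needed) and lock only what the pruned plan will scan. */
    AcquireExecutorLocks(plan->stmt_list, boundParams,
                         &results, &locked_per_stmt);

    if (plan->is_valid)
        return true;    /* locks and pruning results are good to use */

    /* Lost the race against a concurrent invalidation: undo everything. */
    ReleaseExecutorLocks(plan->stmt_list, locked_per_stmt);
    FreePartitionPruneResults(results);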
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 7b1ae6fdcf..5b9098971b 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -303,6 +303,25 @@ PortalDefineQuery(Portal portal,
portal->status = PORTAL_DEFINED;
}
+/*
+ * PortalStorePartitionPruneResults
+ * Copy the given List of Lists of PartitionPruneResults into the
+ * portal's context
+ *
+ * This allows the caller to ensure that the list exists as long as the portal
+ * does.
+ */
+void
+PortalStorePartitionPruneResults(Portal portal, List *part_prune_results_list)
+{
+ MemoryContext oldcxt;
+
+ Assert(PortalIsValid(portal));
+ oldcxt = MemoryContextSwitchTo(portal->portalContext);
+ portal->part_prune_results_list = copyObject(part_prune_results_list);
+ MemoryContextSwitchTo(oldcxt);
+}
+
/*
* PortalReleaseCachedPlan
* Release a portal's reference to its cached plan, if any.
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 9ebde089ae..269cc4d562 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,9 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt,
+ List *part_prune_results,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 17fabc18c9..4b98d0d2ef 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -45,6 +45,7 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
+ * rti_map Range table index by partition index, or 0.
* present_parts A Bitmapset of the partition indexes that we
* have subplans or subparts for.
* initial_pruning_steps List of PartitionPruneSteps used to
@@ -61,6 +62,7 @@ typedef struct PartitionedRelPruningData
int nparts;
int *subplan_map;
int *subpart_map;
+ Index *rti_map;
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
@@ -127,5 +129,10 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
Bitmapset *root_parent_relids,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis);
+extern Bitmapset *ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis);
#endif /* EXECPARTITION_H */
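The new ExecPartitionDoInitialPruning() entry point declared above is
driven once per PartitionPruneInfo; a minimal, hypothetical call site
(with "plannedstmt", "params", and "pruneinfo" assumed in scope) would
look like:

    Bitmapset  *scan_leafpart_rtis = NULL;
    Bitmapset  *valid_subplan_offs;
    int         i = -1;

    /* Surviving leaf-partition RTIs accumulate into scan_leafpart_rtis. */
    valid_subplan_offs = ExecPartitionDoInitialPruning(plannedstmt, params,
                                                       pruneinfo,
                                                       &scan_leafpart_rtis);

    while ((i = bms_next_member(valid_subplan_offs, i)) >= 0)
        elog(DEBUG1, "subplan %d survives initial pruning", i);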
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index e79e2c001f..7d4379da7b 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,8 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ List *part_prune_results; /* ExecutorDoInitialPruning()'s
+ * output for plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +59,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ List *part_prune_results,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index ed95ed1176..c9a5e5fb68 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -185,6 +185,9 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
+extern List *ExecutorDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ Bitmapset **scan_leafpart_rtis);
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index a2008846c6..369de42caf 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -615,6 +615,7 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
+ List *es_part_prune_results; /* QueryDesc.part_prune_results */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index a80f43e540..937cc4629d 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -212,6 +212,7 @@ extern struct Bitmapset *readBitmapset(void);
extern uintptr_t readDatum(bool typbyval);
extern bool *readBoolCols(int numCols);
extern int *readIntCols(int numCols);
+extern Index *readIndexCols(int numCols);
extern Oid *readOidCols(int numCols);
extern int16 *readAttrNumberCols(int numCols);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index dd4eb8679d..36abe4cf9e 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -125,6 +125,18 @@ typedef struct PlannerGlobal
/* List of PartitionPruneInfo contained in the plan */
List *partPruneInfos;
+ /*
+ * Do any of those PartitionPruneInfos have initial pruning steps in them?
+ */
+ bool containsInitialPruning;
+
+ /*
+ * Indexes of all range table entries minus indexes of range table entries
+ * of the leaf partitions scanned by prunable subplans; see
+ * AcquireExecutorLocks()
+ */
+ Bitmapset *minLockRelids;
+
/* OIDs of relations the plan depends on */
List *relationOids;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 2e202892a7..0cab6958d7 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -73,8 +73,17 @@ typedef struct PlannedStmt
List *partPruneInfos; /* List of PartitionPruneInfo contained in
* the plan */
+ bool containsInitialPruning; /* Do any of those PartitionPruneInfos
+ * have initial pruning steps in them?
+ */
+
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *minLockRelids; /* Indexes of all range table entries minus
+ * indexes of range table entries of the leaf
+ * partitions scanned by prunable subplans;
+ * see AcquireExecutorLocks() */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE/MERGE */
List *resultRelations; /* integer list of RT indexes, or NIL */
@@ -1414,6 +1423,13 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * needs_init_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its initial_pruning_steps set?
+ *
+ * needs_exec_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its exec_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1425,6 +1441,8 @@ typedef struct PartitionPruneInfo
NodeTag type;
Bitmapset *root_parent_relids;
List *prune_infos;
+ bool needs_init_pruning;
+ bool needs_exec_pruning;
Bitmapset *other_subplans;
} PartitionPruneInfo;
@@ -1469,6 +1487,9 @@ typedef struct PartitionedRelPruneInfo
/* relation OID by partition index, or 0 */
Oid *relid_map pg_node_attr(array_size(nparts));
+ /* Range table index by partition index, or 0. */
+ Index *rti_map pg_node_attr(array_size(nparts));
+
/*
* initial_pruning_steps shows how to prune during executor startup (i.e.,
* without use of any PARAM_EXEC Params); it is NIL if no startup pruning
@@ -1553,6 +1574,31 @@ typedef struct PartitionPruneStepCombine
List *source_stepids;
} PartitionPruneStepCombine;
+/*----------------
+ * PartitionPruneResult
+ *
+ * The result of performing ExecPartitionDoInitialPruning() on a given
+ * PartitionPruneInfo.
+ *
+ * valid_subplan_offs contains the indexes of subplans remaining after
+ * performing initial pruning by calling ExecFindMatchingSubPlans() on the
+ * PartitionPruneInfo.
+ *
+ * This is used to store the result of initial partition pruning that is
+ * performed before execution has started. A module that needs to do that
+ * should call ExecutorDoInitialPruning() on a given PlannedStmt, which
+ * returns a List of PartitionPruneResult containing an entry for each
+ * PartitionPruneInfo present in PlannedStmt.part_prune_infos. The module
+ * should then pass that list, along with the PlannedStmt, to the executor,
+ * so that it can reuse the result of initial partition pruning when
+ * initializing the subplans for execution.
+ */
+typedef struct PartitionPruneResult
+{
+ NodeTag type;
+
+ Bitmapset *valid_subplan_offs;
+} PartitionPruneResult;
/*
* Plan invalidation info
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 0499635f59..32579d4788 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -220,7 +220,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ List **part_prune_results_list);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aeddbdafe5..1901fc5f28 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,7 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *part_prune_results_list; /* List of Lists of PartitionPruneResults */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -242,6 +243,8 @@ extern void PortalDefineQuery(Portal portal,
CommandTag commandTag,
List *stmts,
CachedPlan *cplan);
+extern void PortalStorePartitionPruneResults(Portal portal,
+ List *part_prune_results_list);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
--
2.35.3
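To tie the scattered hunks above together, the portal code path after
this patch looks roughly as follows; this is a sketch following the
exec_bind_message() changes, with error handling and unrelated steps
elided:

    List       *part_prune_results_list;
    CachedPlan *cplan;

    /* Replan if needed; pruning results ride along, one List per stmt. */
    cplan = GetCachedPlan(psrc, params, NULL, NULL,
                          &part_prune_results_list);
    Assert(list_length(cplan->stmt_list) ==
           list_length(part_prune_results_list));

    PortalDefineQuery(portal, NULL, query_string, commandTag,
                      cplan->stmt_list, cplan);

    /* Copied into the portal's context so they live as long as it does. */
    PortalStorePartitionPruneResults(portal, part_prune_results_list);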
On Thu, Dec 1, 2022 at 9:43 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Thu, Dec 1, 2022 at 8:21 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2022-Dec-01, Amit Langote wrote:
Hmm, how about keeping the [Merge]Append's parent relation's RT index
in the PartitionPruneInfo and passing it down to
ExecInitPartitionPruning() from ExecInit[Merge]Append() for
cross-checking? Both Append and MergeAppend already have an
'apprelids' field that we can save a copy of in the
PartitionPruneInfo. Tried that in the attached delta patch.
Ah yeah, that sounds about what I was thinking. I've merged that in and
pushed to github, which had a strange pg_upgrade failure on Windows
mentioning log files that were not captured by the CI tooling. So I
pushed another one trying to grab those files, in case it wasn't a
one-off failure. It's running now:
https://cirrus-ci.com/task/5857239638999040
If all goes well with this run, I'll get this 0001 pushed.
Thanks for pushing 0001.
Rebased 0002 attached.
Thought it might be good for PartitionPruneResult to also have
root_parent_relids that matches that of the corresponding
PartitionPruneInfo. ExecInitPartitionPruning() does a sanity check
that the root_parent_relids of a given pair of PartitionPrune{Info |
Result} match.
Posting the patch separately as the attached 0002, just in case you
might think that the extra cross-checking would be overkill.
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v26-0002-Add-root_parent_relids-to-PartitionPruneResult.patch (application/octet-stream)
From f1af32816635254773386630b634835bd26d1227 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Fri, 2 Dec 2022 19:32:14 +0900
Subject: [PATCH v26 2/2] Add root_parent_relids to PartitionPruneResult
It's the same as the corresponding PartitionPruneInfo's root_parent_relids.
Like PartitionPruneInfo.root_parent_relids, it's there for cross-checking
that a PartitionPruneResult found at a given plan node's part_prune_index
actually matches the plan node.
---
src/backend/executor/execMain.c | 2 ++
src/backend/executor/execPartition.c | 13 +++++++++++--
src/include/nodes/plannodes.h | 7 +++++++
3 files changed, 20 insertions(+), 2 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 7a4db80104..1e84e47d46 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -145,6 +145,8 @@ ExecutorDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
PartitionPruneInfo *pruneinfo = lfirst(lc);
PartitionPruneResult *pruneresult = makeNode(PartitionPruneResult);
+ pruneresult->root_parent_relids =
+ bms_copy(pruneinfo->root_parent_relids);
pruneresult->valid_subplan_offs =
ExecPartitionDoInitialPruning(plannedstmt, params, pruneinfo,
scan_leafpart_rtis);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 13e450c0fa..eda14d6241 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1852,8 +1852,17 @@ ExecInitPartitionPruning(PlanState *planstate,
*/
if (estate->es_part_prune_results)
{
- pruneresult = list_nth(estate->es_part_prune_results, part_prune_index);
- Assert(IsA(pruneresult, PartitionPruneResult));
+ pruneresult = list_nth_node(PartitionPruneResult,
+ estate->es_part_prune_results,
+ part_prune_index);
+ if (!bms_equal(pruneresult->root_parent_relids, pruneinfo->root_parent_relids))
+ ereport(ERROR,
+ errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg_internal("mismatching PartitionPruneInfo and PartitionPruneResult at part_prune_index %d",
+ part_prune_index),
+ errdetail_internal("pruneresult relids %s, pruneinfo relids %s",
+ bmsToString(pruneresult->root_parent_relids),
+ bmsToString(pruneinfo->root_parent_relids)));
}
if (pruneresult == NULL || pruneinfo->needs_exec_pruning)
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 0cab6958d7..30f51414e9 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -1580,6 +1580,12 @@ typedef struct PartitionPruneStepCombine
* The result of performing ExecPartitionDoInitialPruning() on a given
* PartitionPruneInfo.
*
+ * root_parent_relids is the same as PartitionPruneInfo.root_parent_relids. It's
+ * there for cross-checking in ExecInitPartitionPruning() that the
+ * PartitionPruneResult and the PartitionPruneInfo at a given index in
+ * EState.es_part_prune_results and EState.es_part_prune_infos, respectively,
+ * belong to the same parent plan node.
+ *
* valid_subplan_offs contains the indexes of subplans remaining after
* performing initial pruning by calling ExecFindMatchingSubPlans() on the
* PartitionPruneInfo.
@@ -1597,6 +1603,7 @@ typedef struct PartitionPruneResult
{
NodeTag type;
+ Bitmapset *root_parent_relids;
Bitmapset *valid_subplan_offs;
} PartitionPruneResult;
--
2.35.3
v26-0001-Optimize-AcquireExecutorLocks-by-locking-only-un.patch (application/octet-stream)
From d8b8185b6ceb2a2a33a6af142f23a59fd93d5cdc Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v26 1/2] Optimize AcquireExecutorLocks() by locking only
unpruned partitions
This commit teaches AcquireExecutorLocks() to perform initial
partition pruning, notionally eliminating the subnodes contained in a
generic cached plan that need not be initialized during the actual
execution of the plan, and to skip locking the partitions scanned by
those subnodes.
The result of performing initial partition pruning this way, before the
actual execution has started, is made available to the executor via
PartitionPruneResult, supplied along with the PlannedStmt by those
callers of the executor that used plancache.c to get the plan. It is
NULL when the plan is obtained by calling the planner directly, or when
the plan obtained from plancache.c is not a generic one.
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 2 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/prepare.c | 26 ++-
src/backend/executor/README | 32 ++++
src/backend/executor/execMain.c | 51 ++++++
src/backend/executor/execParallel.c | 26 ++-
src/backend/executor/execPartition.c | 238 +++++++++++++++++++++----
src/backend/executor/execUtils.c | 1 +
src/backend/executor/functions.c | 2 +-
src/backend/executor/nodeAppend.c | 11 +-
src/backend/executor/nodeMergeAppend.c | 5 +-
src/backend/executor/spi.c | 27 ++-
src/backend/nodes/readfuncs.c | 8 +-
src/backend/optimizer/plan/planner.c | 2 +
src/backend/optimizer/plan/setrefs.c | 46 +++++
src/backend/partitioning/partprune.c | 41 ++++-
src/backend/tcop/postgres.c | 8 +-
src/backend/tcop/pquery.c | 28 ++-
src/backend/utils/cache/plancache.c | 208 ++++++++++++++++++---
src/backend/utils/mmgr/portalmem.c | 19 ++
src/include/commands/explain.h | 4 +-
src/include/executor/execPartition.h | 9 +-
src/include/executor/execdesc.h | 3 +
src/include/executor/executor.h | 3 +
src/include/nodes/execnodes.h | 1 +
src/include/nodes/nodes.h | 1 +
src/include/nodes/pathnodes.h | 12 ++
src/include/nodes/plannodes.h | 46 +++++
src/include/utils/plancache.h | 3 +-
src/include/utils/portal.h | 3 +
33 files changed, 781 insertions(+), 98 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index f26cc0d162..401a2280a3 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -558,7 +558,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NIL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 152c29b551..942449544c 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NIL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index f86983c660..2f2b558608 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -407,7 +407,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NIL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
@@ -515,7 +515,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, List *part_prune_results,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
@@ -563,7 +564,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, part_prune_results, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index cf1b1ca571..904cbcba4a 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -779,7 +779,7 @@ execute_sql_string(const char *sql)
{
QueryDesc *qdesc;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, NIL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 9ac0383459..65c8d0aa59 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -408,7 +408,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NIL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 9e29584d93..29b45539d3 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -155,6 +155,7 @@ ExecuteQuery(ParseState *pstate,
PreparedStatement *entry;
CachedPlan *cplan;
List *plan_list;
+ List *part_prune_results_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
Portal portal;
@@ -193,7 +194,10 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
plan_list = cplan->stmt_list;
/*
@@ -207,6 +211,9 @@ ExecuteQuery(ParseState *pstate,
plan_list,
cplan);
+ /* Copy Lists of PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_results_list);
+
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
* statement is one that produces tuples. Currently we insist that it be
@@ -576,7 +583,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
- ListCell *p;
+ List *part_prune_results_list;
+ ListCell *p,
+ *pp;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -619,7 +628,10 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Replan if needed, and acquire a transient refcount */
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, queryEnv);
+ CurrentResourceOwner, queryEnv,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -634,13 +646,15 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
plan_list = cplan->stmt_list;
/* Explain each query */
- foreach(p, plan_list)
+ forboth(p, plan_list, pp, part_prune_results_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ List *part_prune_results = lfirst_node(List, pp);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ ExplainOnePlan(pstmt, part_prune_results, into, es, query_string,
+ paramLI, queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 17775a49e2..5c59ac5da7 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -65,6 +65,34 @@ found there. This currently only occurs for Append and MergeAppend nodes. In
this case the non-required subplans are ignored and the executor state's
subnode array will become out of sequence to the plan's subplan list.
+Execution-time pruning may actually occur even before execution has
+started. One case where that happens is when a cached generic plan is
+being validated for execution by plancache.c's GetCachedPlan(), which
+must lock all the relations that will be scanned by that plan. If the
+generic plan contains nodes that can perform execution time partition pruning
+(that is, contain a PartitionPruneInfo), a subset of pruning steps contained
+in a given node's PartitionPruneInfo that do not depend on the execution
+actually having started (called "initial" pruning steps) are performed to
+figure out the minimal set of child subplans that satisfy those pruning steps.
+AcquireExecutorLocks() looking at a given generic plan will then lock only the
+relations scanned by the child subplans that survived such pruning, along with
+those present in PlannedStmt.minLockRelids. Note that the subplans are only
+notionally pruned, that is, they are not removed from the plan tree as such.
+
+To prevent the executor and any third party execution code that can look at
+the plan tree from trying to execute the subplans that were pruned as
+described above, the result of pruning is passed to the executor as a List
+of PartitionPruneResult nodes via the QueryDesc. Each PartitionPruneResult
+holds, as a bitmapset (valid_subplan_offs), the indexes of the surviving
+subplans within the list of child subplans of the parent plan node to
+which the corresponding PartitionPruneInfo belongs. Consequently, an
+executor executing a generic plan should not re-evaluate the set of
+initially valid subplans for a given plan node by redoing the initial pruning
+if it was already done by AcquireExecutorLocks() when validating the plan.
+Such re-evaluation of the pruning steps may very well end up resulting in a
+different set of subplans, containing some whose relations were not locked by
+AcquireExecutorLocks().
+
Each Plan node may have expression trees associated with it, to represent
its target list, qualification conditions, etc. These trees are also
read-only to the executor, but the executor state for expression evaluation
@@ -286,6 +314,10 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ [ ExecutorDoInitialPruning ] --- an optional step to perform initial
+ partition pruning on the plan tree, the result of which is passed
+ to the executor via QueryDesc
+
CreateQueryDesc
ExecutorStart
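A schematic rendering of that optional step (hypothetical caller;
"params", "query_string", and "dest" are assumed to be in scope, and
error handling is elided):

    List       *part_prune_results = NIL;
    Bitmapset  *scan_leafpart_rtis = NULL;
    QueryDesc  *qdesc;

    /* Only worthwhile when the plan has initial pruning steps. */
    if (plannedstmt->containsInitialPruning)
        part_prune_results = ExecutorDoInitialPruning(plannedstmt, params,
                                                      &scan_leafpart_rtis);

    qdesc = CreateQueryDesc(plannedstmt, part_prune_results, query_string,
                            GetActiveSnapshot(), InvalidSnapshot,
                            dest, params, NULL, 0);
    ExecutorStart(qdesc, 0);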
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index b6751da574..7a4db80104 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -49,6 +49,7 @@
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/execdebug.h"
+#include "executor/execPartition.h"
#include "executor/nodeSubplan.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
@@ -104,6 +105,54 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
/* end of local decls */
+/* ----------------------------------------------------------------
+ * ExecutorDoInitialPruning
+ *
+ * For each plan tree node that has been assigned a PartitionPruneInfo,
+ * this performs initial partition pruning using the information contained
+ * therein to determine the set of child subplans that satisfy the initial
+ * pruning steps, to be returned as a bitmapset of their indexes in the
+ * node's list of child subplans (for example, an Append's appendplans).
+ *
+ * The return value is a List of PartitionPruneResult nodes, with one
+ * element for every PartitionPruneInfo, each carrying one of those
+ * bitmapsets. *scan_leafpart_rtis is set to the RT indexes of all the leaf
+ * partitions scanned by the chosen subplans, accumulated across all
+ * PartitionPruneInfos.
+ *
+ * The executor must see exactly the same set of subplans as valid for
+ * execution when doing ExecInitNode() on the plan nodes whose
+ * PartitionPruneInfos are processed here. So, it must get the set from the
+ * aforementioned PartitionPruneResult, instead of computing it all over
+ * again by redoing the initial pruning. It's the caller's job to pass the
+ * PartitionPruneResult to the executor.
+ *
+ * Note: Partitioned tables mentioned in PartitionedRelPruneInfo nodes that
+ * drive the pruning will be locked before doing the pruning.
+ * ----------------------------------------------------------------
+ */
+List *
+ExecutorDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *part_prune_results = NIL;
+ ListCell *lc;
+
+ /* Only get here if there is any pruning to do. */
+ Assert(plannedstmt->containsInitialPruning);
+
+ foreach(lc, plannedstmt->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ PartitionPruneResult *pruneresult = makeNode(PartitionPruneResult);
+
+ pruneresult->valid_subplan_offs =
+ ExecPartitionDoInitialPruning(plannedstmt, params, pruneinfo,
+ scan_leafpart_rtis);
+ part_prune_results = lappend(part_prune_results, pruneresult);
+ }
+
+ return part_prune_results;
+}
/* ----------------------------------------------------------------
* ExecutorStart
@@ -806,6 +855,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ List *part_prune_results = queryDesc->part_prune_results;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -826,6 +876,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_plannedstmt = plannedstmt;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ estate->es_part_prune_results = part_prune_results;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index aca0c6f323..917079a034 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -66,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITION_PRUNE_RESULTS UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -182,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->containsInitialPruning = false;
pstmt->planTree = plan;
pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
@@ -597,12 +599,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_results_data;
+ char *part_prune_results_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_results_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -631,6 +636,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_results_data = nodeToString(estate->es_part_prune_results);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -657,6 +663,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized List of PartitionPruneResult. */
+ part_prune_results_len = strlen(part_prune_results_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_results_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -751,6 +762,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized List of PartitionPruneResult */
+ part_prune_results_space = shm_toc_allocate(pcxt->toc, part_prune_results_len);
+ memcpy(part_prune_results_space, part_prune_results_data, part_prune_results_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS,
+ part_prune_results_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1232,8 +1249,10 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_results_space;
char *paramspace;
PlannedStmt *pstmt;
+ List *part_prune_results;
ParamListInfo paramLI;
char *queryString;
@@ -1244,12 +1263,17 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
pstmtspace = shm_toc_lookup(toc, PARALLEL_KEY_PLANNEDSTMT, false);
pstmt = (PlannedStmt *) stringToNode(pstmtspace);
+ /* Reconstruct leader-supplied PartitionPruneResult. */
+ part_prune_results_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, false);
+ part_prune_results = (List *) stringToNode(part_prune_results_space);
+
/* Reconstruct ParamListInfo. */
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(¶mspace);
/* Create a QueryDesc for the query. */
- return CreateQueryDesc(pstmt,
+ return CreateQueryDesc(pstmt, part_prune_results,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 8e6453aec2..13e450c0fa 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -25,6 +25,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -185,7 +186,11 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *pruneinfo);
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -198,7 +203,8 @@ static void PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans);
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis);
/*
@@ -1758,8 +1764,10 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* considered to be a stable expression, it can change value from one plan
* node scan to the next during query execution. Stable comparison
* expressions that don't involve such Params allow partition pruning to be
- * done once during executor startup. Expressions that do involve such Params
- * require us to prune separately for each scan of the parent plan node.
+ * done once during executor startup, or by ExecutorDoInitialPruning(), which
+ * runs as part of performing AcquireExecutorLocks() on a given plan tree.
+ * Expressions that do involve such Params require us to prune separately for
+ * each scan of the parent plan node.
*
* Note that pruning away unneeded subplans during executor startup has the
* added benefit of not having to initialize the unneeded subplans at all.
@@ -1776,6 +1784,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* account for initial pruning possibly having eliminated some of the
* subplans.
*
+ * ExecPartitionDoInitialPruning:
+ * Do initial pruning with the information contained in a given
+ * PartitionPruneInfo to determine the minimal set of child subplans
+ * to be executed of the parent plan node to which the PartitionPruneInfo
+ * belongs and also the set of the RT indexes of leaf partitions that will
+ * be scanned with those subplans.
+ *
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating the expressions
* that are safe to evaluate at a given point. This function is first
@@ -1796,8 +1811,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* On return, *initially_valid_subplans is assigned the set of indexes of
* child subplans that must be initialized along with the parent plan node.
- * Initial pruning is performed here if needed and in that case only the
- * surviving subplans' indexes are added.
+ * Initial pruning is performed here if needed (unless it has already been done
+ * by ExecutorDoInitialPruning()), and in that case only the surviving
+ * subplans' indexes are added.
*
* If subplans are indeed pruned, subplan_map arrays contained in the returned
* PartitionPruneState are re-sequenced to not count those, though only if the
@@ -1810,9 +1826,10 @@ ExecInitPartitionPruning(PlanState *planstate,
Bitmapset *root_parent_relids,
Bitmapset **initially_valid_subplans)
{
- PartitionPruneState *prunestate;
+ PartitionPruneState *prunestate = NULL;
EState *estate = planstate->state;
PartitionPruneInfo *pruneinfo;
+ PartitionPruneResult *pruneresult = NULL;
/* Obtain the pruneinfo we need, and make sure it's the right one */
pruneinfo = list_nth(estate->es_part_prune_infos, part_prune_index);
@@ -1828,20 +1845,57 @@ ExecInitPartitionPruning(PlanState *planstate,
-	/* We may need an expression context to evaluate partition exprs */
-	ExecAssignExprContext(estate, planstate);
- /* Create the working data structure for pruning */
- prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ /*
+ * No need to do initial pruning if it was done already by
+ * ExecutorDoInitialPruning(), which it would be if es_part_prune_results
+ * is set.
+ */
+ if (estate->es_part_prune_results)
+ {
+ pruneresult = list_nth(estate->es_part_prune_results, part_prune_index);
+ Assert(IsA(pruneresult, PartitionPruneResult));
+ }
+
+ if (pruneresult == NULL || pruneinfo->needs_exec_pruning)
+ {
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
+
+ /* For data reading, executor always omits detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
+ /*
+ * Create the working data structure for pruning. No need to consider
+ * initial pruning steps if we have a PartitionPruneResult.
+ */
+ prunestate = CreatePartitionPruneState(planstate, pruneinfo,
+ pruneresult == NULL,
+ pruneinfo->needs_exec_pruning,
+ NIL, planstate->ps_ExprContext,
+ estate->es_partition_directory);
+ }
/*
* Perform an initial partition prune pass, if required.
*/
- if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ if (pruneresult)
+ {
+ *initially_valid_subplans = bms_copy(pruneresult->valid_subplan_offs);
+ }
+ else if (prunestate && prunestate->do_initial_prune)
+ {
+ *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true,
+ NULL);
+ }
else
{
- /* No pruning, so we'll need to initialize all subplans */
+ /* No initial pruning, so we'll need to initialize all subplans */
Assert(n_total_subplans > 0);
*initially_valid_subplans = bms_add_range(NULL, 0,
n_total_subplans - 1);
+ return prunestate;
}
/*
@@ -1849,7 +1903,8 @@ ExecInitPartitionPruning(PlanState *planstate,
* that were removed above due to initial pruning. No need to do this if
* no steps were removed.
*/
- if (bms_num_members(*initially_valid_subplans) < n_total_subplans)
+ if (prunestate &&
+ bms_num_members(*initially_valid_subplans) < n_total_subplans)
{
/*
* We can safely skip this when !do_exec_prune, even though that
@@ -1865,11 +1920,74 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecPartitionDoInitialPruning
+ * Perform initial pruning using given PartitionPruneInfo to determine
+ * the minimal set of child subplans that will be executed and also the
+ * set of RT indexes of the leaf partitions scanned by those subplans.
+ */
+Bitmapset *
+ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *rtable = plannedstmt->rtable;
+ ExprContext *econtext;
+ PartitionDirectory pdir;
+ MemoryContext oldcontext,
+ tmpcontext;
+ PartitionPruneState *prunestate;
+ Bitmapset *valid_subplan_offs;
+
+ /*
+ * A temporary context for memory allocations required while executing
+ * partition pruning steps.
+ */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "initial pruning working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
+
+ /*
+ * PartitionDirectory to look up partition descriptors.
+ * Note that detached partitions are treated here the same way as
+ * during execution proper (see ExecInitPartitionPruning()).
+ */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+
+ /*
+ * We don't yet have a PlanState for the parent plan node, so we must
+ * create a standalone ExprContext to evaluate pruning expressions,
+ * equipped with the information about the EXTERN parameters that the
+ * caller passed us.  That is okay because the initial pruning steps do
+ * not contain anything that requires execution to have started, and
+ * hence nothing that needs the information contained in a PlanState.
+ */
+ econtext = CreateStandaloneExprContext();
+ econtext->ecxt_param_list_info = params;
+ prunestate = CreatePartitionPruneState(NULL, pruneinfo, true, false,
+ rtable, econtext, pdir);
+ MemoryContextSwitchTo(oldcontext);
+
+ /* Do the initial pruning. */
+ valid_subplan_offs = ExecFindMatchingSubPlans(prunestate, true,
+ scan_leafpart_rtis);
+
+ FreeExprContext(econtext, true);
+ DestroyPartitionDirectory(pdir);
+ MemoryContextDelete(tmpcontext);
+
+ return valid_subplan_offs;
+}
+
/*
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
*
- * 'planstate' is the parent plan node's execution state.
+ * 'planstate', if not NULL, is the parent plan node's execution state.  It
+ * can be NULL when called before ExecutorStart(), in which case 'rtable'
+ * (range table), 'econtext', and 'partdir' must be explicitly provided.
*
* 'pruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
@@ -1883,19 +2001,21 @@ ExecInitPartitionPruning(PlanState *planstate,
* PartitionedRelPruneInfo.
*/
static PartitionPruneState *
-CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
+CreatePartitionPruneState(PlanState *planstate,
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir)
{
- EState *estate = planstate->state;
+ EState *estate = planstate ? planstate->state : NULL;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
- /* For data reading, executor always omits detached partitions */
- if (estate->es_partition_directory == NULL)
- estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
+ Assert((estate != NULL) ||
+ (partdir != NULL && econtext != NULL && rtable != NIL));
n_part_hierarchies = list_length(pruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
@@ -1950,15 +2070,42 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * Must open the relation ourselves when called before execution
+ * has started, such as during ExecutorDoInitialPruning() on a
+ * cached plan.  In that case,
+ * sub-partitions must be locked, because AcquirePlannerLocks()
+ * would not have seen them. (1st relation in a partrelpruneinfos
+ * list is always the root partitioned table appearing in the
+ * query, which AcquirePlannerLocks() would have locked; the
+ * Assert in relation_open() guards that assumption.)
+ */
+ if (estate == NULL)
+ {
+ RangeTblEntry *rte = rt_fetch(pinfo->rtindex, rtable);
+ int lockmode = (j == 0) ? NoLock : rte->rellockmode;
+
+ partrel = table_open(rte->relid, lockmode);
+ }
+ else
+ partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+
+ /*
+ * We can rely on the copy of the partitioned table's partition
+ * key in its relcache entry, because it can't change (or get
+ * destroyed) as long as the relation is locked.  The partition
+ * descriptor is taken from the PartitionDirectory, which keeps
+ * the table open long enough for the descriptor to remain valid
+ * while it's used to perform the pruning steps.
*/
- partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
partkey = RelationGetPartitionKey(partrel);
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partdesc = PartitionDirectoryLookup(partdir, partrel);
+
+ /*
+ * Must close partrel, keeping the lock taken, if we're not using
+ * EState's entry.
+ */
+ if (estate == NULL)
+ table_close(partrel, NoLock);
/*
* Initialize the subplan_map and subpart_map.
@@ -1972,6 +2119,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
Assert(partdesc->nparts >= pinfo->nparts);
pprune->nparts = partdesc->nparts;
pprune->subplan_map = palloc(sizeof(int) * partdesc->nparts);
+ pprune->rti_map = palloc(sizeof(Index) * partdesc->nparts);
if (partdesc->nparts == pinfo->nparts)
{
/*
@@ -1982,6 +2130,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pprune->subpart_map = pinfo->subpart_map;
memcpy(pprune->subplan_map, pinfo->subplan_map,
sizeof(int) * pinfo->nparts);
+ memcpy(pprune->rti_map, pinfo->rti_map,
+ sizeof(int) * pinfo->nparts);
/*
* Double-check that the list of unpruned relations has not
@@ -2032,6 +2182,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pinfo->subplan_map[pd_idx];
pprune->subpart_map[pp_idx] =
pinfo->subpart_map[pd_idx];
+ pprune->rti_map[pp_idx] =
+ pinfo->rti_map[pd_idx];
pd_idx++;
}
else
@@ -2039,6 +2191,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
/* this partdesc entry is not in the plan */
pprune->subplan_map[pp_idx] = -1;
pprune->subpart_map[pp_idx] = -1;
+ pprune->rti_map[pp_idx] = 0;
}
}
@@ -2060,7 +2213,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* Initialize pruning contexts as needed.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
- if (pinfo->initial_pruning_steps)
+ if (consider_initial_steps && pinfo->initial_pruning_steps)
{
InitPartitionPruneContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
@@ -2070,7 +2223,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps)
+ if (consider_exec_steps && pinfo->exec_pruning_steps)
{
InitPartitionPruneContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
@@ -2298,10 +2451,14 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
* runtime pruning.
+ *
+ * RT indexes of leaf partitions scanned by the chosen subplans are added to
+ * *scan_leafpart_rtis if the pointer is non-NULL.
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2336,7 +2493,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
*/
pprune = &prunedata->partrelprunedata[0];
find_matching_subplans_recurse(prunedata, pprune, initial_prune,
- &result);
+ &result, scan_leafpart_rtis);
/* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
@@ -2350,6 +2507,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_copy(*scan_leafpart_rtis);
MemoryContextReset(prunestate->prune_context);
@@ -2360,13 +2519,15 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* find_matching_subplans_recurse
* Recursive worker function for ExecFindMatchingSubPlans
*
- * Adds valid (non-prunable) subplan IDs to *validsubplans
+ * Adds valid (non-prunable) subplan IDs to *validsubplans and the RT indexes
+ * of the corresponding leaf partitions to *scan_leafpart_rtis (if asked for).
*/
static void
find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans)
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *partset;
int i;
@@ -2393,8 +2554,14 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
while ((i = bms_next_member(partset, i)) >= 0)
{
if (pprune->subplan_map[i] >= 0)
+ {
*validsubplans = bms_add_member(*validsubplans,
pprune->subplan_map[i]);
+ Assert(pprune->rti_map[i] > 0);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_add_member(*scan_leafpart_rtis,
+ pprune->rti_map[i]);
+ }
else
{
int partidx = pprune->subpart_map[i];
@@ -2402,7 +2569,8 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
if (partidx >= 0)
find_matching_subplans_recurse(prunedata,
&prunedata->partrelprunedata[partidx],
- initial_prune, validsubplans);
+ initial_prune, validsubplans,
+ scan_leafpart_rtis);
else
{
/*
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9695de85b9..dce93a8c9f 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -135,6 +135,7 @@ CreateExecutorState(void)
estate->es_param_exec_vals = NULL;
estate->es_queryEnv = NULL;
+ estate->es_part_prune_results = NIL;
estate->es_query_cxt = qcontext;
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index dc13625171..bffb42ce71 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -842,7 +842,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
- es->qd = CreateQueryDesc(es->stmt,
+ es->qd = CreateQueryDesc(es->stmt, NIL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 99830198bd..3b917584de 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -156,7 +156,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (appendstate->as_prune_state == NULL ||
+ (!appendstate->as_prune_state->do_exec_prune && nplans > 0))
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -578,7 +579,7 @@ choose_next_subplan_locally(AppendState *node)
}
else if (node->as_valid_subplans == NULL)
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
whichplan = -1;
}
@@ -643,7 +644,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
/*
* Mark each invalid plan as finished to allow the loop below to
@@ -718,7 +719,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
mark_invalid_subplans_as_finished(node);
}
@@ -869,7 +870,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
classify_matching_subplans(node);
}
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index f370f9f287..ccfa083945 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -104,7 +104,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplan, we can fill ms_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (mergestate->ms_prune_state == NULL ||
+ (!mergestate->ms_prune_state->do_exec_prune && nplans > 0))
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -219,7 +220,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index fd5796f1b9..93012a5b3b 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1578,6 +1578,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
CachedPlanSource *plansource;
CachedPlan *cplan;
List *stmt_list;
+ List *part_prune_results_list;
char *query_string;
Snapshot snapshot;
MemoryContext oldcontext;
@@ -1657,7 +1658,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1685,6 +1689,9 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
stmt_list,
cplan);
+ /* Copy Lists of PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_results_list);
+
/*
* Set up options for portal. Default SCROLL type is chosen the same way
* as PerformCursorOpen does it.
@@ -2092,7 +2099,8 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
/* Get the generic plan for the query */
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ NULL /* Not interested in PartitionPruneResults */);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2473,7 +2481,9 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
- ListCell *lc2;
+ List *part_prune_results_list;
+ ListCell *lc2,
+ *lc3;
spicallbackarg.query = plansource->query_string;
@@ -2549,8 +2559,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* plan, the refcount must be backed by the plan_owner.
*/
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
-
+ plan_owner, _SPI_current->queryEnv,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
stmt_list = cplan->stmt_list;
/*
@@ -2589,9 +2601,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, part_prune_results_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ List *part_prune_results = lfirst_node(List, lc3);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2663,7 +2676,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
else
snap = InvalidSnapshot;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, part_prune_results,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 23776367c5..b01f55fb4f 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -158,6 +158,11 @@
token = pg_strtok(&length); /* skip :fldname */ \
local_node->fldname = readIntCols(len)
+/* Read an Index array */
+#define READ_INDEX_ARRAY(fldname, len) \
+ token = pg_strtok(&length); /* skip :fldname */ \
+ local_node->fldname = readIndexCols(len)
+
/* Read a bool array */
#define READ_BOOL_ARRAY(fldname, len) \
token = pg_strtok(&length); /* skip :fldname */ \
@@ -800,7 +805,6 @@ fnname(int numCols) \
*/
READ_SCALAR_ARRAY(readAttrNumberCols, int16, atoi)
READ_SCALAR_ARRAY(readOidCols, Oid, atooid)
-/* outfuncs.c has writeIndexCols, but we don't yet need that here */
-/* READ_SCALAR_ARRAY(readIndexCols, Index, atoui) */
+READ_SCALAR_ARRAY(readIndexCols, Index, atoui)
READ_SCALAR_ARRAY(readIntCols, int, atoi)
READ_SCALAR_ARRAY(readBoolCols, bool, strtobool)
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 799602f5ea..a96d316dca 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -520,7 +520,9 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
result->partPruneInfos = glob->partPruneInfos;
+ result->containsInitialPruning = glob->containsInitialPruning;
result->rtable = glob->finalrtable;
+ result->minLockRelids = glob->minLockRelids;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index e67f0e3509..5820f26fdb 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -270,6 +270,16 @@ set_plan_references(PlannerInfo *root, Plan *plan)
*/
add_rtes_to_flat_rtable(root, false);
+ /*
+ * Add the query's adjusted range of RT indexes to glob->minLockRelids.
+ * The adjusted RT indexes of prunable relations will be deleted from the
+ * set below where PartitionPruneInfos are processed.
+ */
+ glob->minLockRelids =
+ bms_add_range(glob->minLockRelids,
+ rtoffset + 1,
+ rtoffset + list_length(root->parse->rtable));
+
/*
* Adjust RT indexes of PlanRowMarks and add to final rowmarks list
*/
@@ -352,6 +362,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach (lc, root->partPruneInfos)
{
PartitionPruneInfo *pruneinfo = lfirst(lc);
+ Bitmapset *leafpart_rtis = NULL;
ListCell *l;
pruneinfo->root_parent_relids =
@@ -364,15 +375,50 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach(l2, prune_infos)
{
PartitionedRelPruneInfo *pinfo = lfirst(l2);
+ int i;
/* RT index of the table to which the pinfo belongs. */
pinfo->rtindex += rtoffset;
+
+ /* Also of the leaf partitions that might be scanned. */
+ for (i = 0; i < pinfo->nparts; i++)
+ {
+ if (pinfo->rti_map[i] > 0 && pinfo->subplan_map[i] >= 0)
+ {
+ pinfo->rti_map[i] += rtoffset;
+ leafpart_rtis = bms_add_member(leafpart_rtis,
+ pinfo->rti_map[i]);
+ }
+ }
}
}
+ if (pruneinfo->needs_init_pruning)
+ {
+ glob->containsInitialPruning = true;
+
+ /*
+ * Delete the leaf partition RTIs from the global set of relations
+ * to be locked before executing the plan. AcquireExecutorLocks()
+ * will find the ones to add to the set after performing initial
+ * pruning.
+ */
+ glob->minLockRelids = bms_del_members(glob->minLockRelids,
+ leafpart_rtis);
+ }
+
glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
}
+ /*
+	 * If we deleted bits from glob->minLockRelids above, do a bms_copy() to
+	 * get rid of any resulting empty tail words, so that the loop over this
+	 * set in AcquireExecutorLocks() does not have to scan them uselessly.
+ */
+ if (glob->containsInitialPruning)
+ glob->minLockRelids = bms_copy(glob->minLockRelids);
+
return result;
}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index d48f6784c1..d5556354f7 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,9 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -234,6 +236,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool needs_init_pruning = false;
+ bool needs_exec_pruning = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -313,12 +317,16 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_needs_init_pruning;
+ bool partrel_needs_exec_pruning;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_needs_init_pruning,
+ &partrel_needs_exec_pruning);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -327,6 +335,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+
+ needs_init_pruning |= partrel_needs_init_pruning;
+ needs_exec_pruning |= partrel_needs_exec_pruning;
}
pfree(relid_subplan_map);
@@ -342,6 +353,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->root_parent_relids = parentrel->relids;
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->needs_init_pruning = needs_init_pruning;
+ pruneinfo->needs_exec_pruning = needs_exec_pruning;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -442,13 +455,18 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *needs_init_pruning and *needs_exec_pruning are set to indicate whether
+ * the returned PartitionedRelPruneInfos contain pruning steps that can be
+ * performed before and after execution begins, respectively.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -459,6 +477,10 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will find out below. */
+ *needs_init_pruning = false;
+ *needs_exec_pruning = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -546,6 +568,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+	 * That is, the first pass also determines whether the 2nd pass will be
+	 * necessary, by noting the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -620,6 +645,12 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ /* record which types of pruning steps we've seen so far */
+ if (initial_pruning_steps != NIL)
+ *needs_init_pruning = true;
+ if (exec_pruning_steps != NIL)
+ *needs_exec_pruning = true;
+
pinfolist = lappend(pinfolist, pinfo);
}
@@ -647,6 +678,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *subplan_map;
int *subpart_map;
Oid *relid_map;
+ Index *rti_map;
/*
* Construct the subplan and subpart maps for this partitioning level.
@@ -659,6 +691,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
+ rti_map = (Index *) palloc0(nparts * sizeof(Index));
present_parts = NULL;
i = -1;
@@ -673,6 +706,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
+ rti_map[i] = partrel->relid;
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
@@ -697,6 +731,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
pinfo->relid_map = relid_map;
+ pinfo->rti_map = rti_map;
}
pfree(relid_subpart_map);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 3082093d1e..95ab1d0eef 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1598,6 +1598,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ List *part_prune_results_list;
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -1972,7 +1973,9 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
/*
* Now we can define the portal.
@@ -1987,6 +1990,9 @@ exec_bind_message(StringInfo input_message)
cplan->stmt_list,
cplan);
+ /* Copy Lists of PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_results_list);
+
/* Done with the snapshot used for parameter I/O and parsing/planning */
if (snapshot_set)
PopActiveSnapshot();
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 52e2db6452..280ed7d239 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -35,7 +35,7 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
+static void ProcessQuery(PlannedStmt *plan, List *part_prune_results,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +65,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ List *part_prune_results,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +78,8 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->part_prune_results = part_prune_results; /* ExecutorDoInitialPruning()
+ * output for plan */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +125,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * part_prune_results: ExecutorDoInitialPruning() output for the PlannedStmt
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +138,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ List *part_prune_results,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +150,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, part_prune_results, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -491,8 +496,13 @@ PortalStart(Portal portal, ParamListInfo params,
/*
* Create QueryDesc in portal's context; for the moment, set
* the destination to DestNone.
+ *
+ * There is no PartitionPruneResult unless the PlannedStmt is
+ * from a CachedPlan.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->part_prune_results_list == NIL ? NIL :
+ linitial(portal->part_prune_results_list),
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1225,6 +1235,8 @@ PortalRunMulti(Portal portal,
if (pstmt->utilityStmt == NULL)
{
+ List *part_prune_results = NIL;
+
/*
* process a plannable query.
*/
@@ -1271,10 +1283,18 @@ PortalRunMulti(Portal portal,
else
UpdateActiveSnapshotCommandId();
+ /*
+ * Determine if there's a corresponding List of PartitionPruneResult
+ * for this PlannedStmt.
+ */
+ if (portal->part_prune_results_list != NIL)
+ part_prune_results = list_nth(portal->part_prune_results_list,
+ foreach_current_index(stmtlist_item));
+
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_results,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1283,7 +1303,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_results,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index cc943205d3..af6fae6e3b 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -99,14 +99,19 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_results_list);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv);
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_results_list);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_results_list,
+ List **lockedRelids_per_stmt);
+static void ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -782,6 +787,26 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
return tlist;
}
+/*
+ * FreePartitionPruneResults
+ * Frees the List of Lists of PartitionPruneResults for CheckCachedPlan()
+ */
+static void
+FreePartitionPruneResults(List *part_prune_results_list)
+{
+ ListCell *lc;
+
+ foreach(lc, part_prune_results_list)
+ {
+ List *part_prune_results = lfirst(lc);
+
+ /* Free both the PartitionPruneResults and the containing List. */
+ list_free_deep(part_prune_results);
+ }
+
+ list_free(part_prune_results_list);
+}
+
/*
* CheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
*
@@ -790,15 +815,20 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* On a "true" return, we have acquired the locks needed to run the plan.
* (We must do this for the "true" result to be race-condition-free.)
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_results_list.
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_results_list)
{
CachedPlan *plan = plansource->gplan;
/* Assert that caller checked the querytree */
Assert(plansource->is_valid);
+ *part_prune_results_list = NIL;
+
/* If there's no generic plan, just say "false" */
if (!plan)
return false;
@@ -820,13 +850,21 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ List *lockedRelids_per_stmt;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ /*
+ * Lock relations scanned by the plan. This is where the pruning
+ * happens if needed.
+ */
+ AcquireExecutorLocks(plan->stmt_list, boundParams,
+ part_prune_results_list,
+ &lockedRelids_per_stmt);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -848,7 +886,11 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ ReleaseExecutorLocks(plan->stmt_list, lockedRelids_per_stmt);
+
+	/* Release any PartitionPruneResults that may have been created. */
+ FreePartitionPruneResults(*part_prune_results_list);
+ *part_prune_results_list = NIL;
}
/*
@@ -874,10 +916,14 @@ CheckCachedPlan(CachedPlanSource *plansource)
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
* (unless this is a one-shot plan, in which case we don't copy the plan).
+ *
+ * A list of NILs is returned in *part_prune_results_list, meaning that no
+ * partition pruning has been done yet for the plans in stmt_list.
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv)
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_results_list)
{
CachedPlan *plan;
List *plist;
@@ -1007,6 +1053,17 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
MemoryContextSwitchTo(oldcxt);
+ /*
+ * No actual PartitionPruneResults yet to add, though must initialize
+ * the list to have the same number of elements as the list of
+ * PlannedStmts.
+ */
+ *part_prune_results_list = NIL;
+ foreach(lc, plist)
+ {
+ *part_prune_results_list = lappend(*part_prune_results_list, NIL);
+ }
+
return plan;
}
@@ -1126,6 +1183,19 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
+ * For every PlannedStmt found in the returned CachedPlan, an element that
+ * is either a List of PartitionPruneResults or NIL is added to
+ * *part_prune_results_list.  It is the former if the PlannedStmt comes from
+ * an existing CachedPlan that is otherwise valid and has
+ * containsInitialPruning set to true.  Before returning such a CachedPlan,
+ * the "initial" pruning steps are performed by calling
+ * ExecutorDoInitialPruning(), so that AcquireExecutorLocks() needs to lock
+ * only those leaf partitions whose subplans survive the "initial" pruning
+ * conditions.  For each PartitionPruneInfo found in
+ * PlannedStmt.partPruneInfos, a PartitionPruneResult containing the
+ * bitmapset of the indexes of the surviving subplans is added to the List
+ * for that PlannedStmt.
+ *
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
@@ -1139,11 +1209,13 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ List **part_prune_results_list)
{
CachedPlan *plan = NULL;
List *qlist;
bool customplan;
+ List *my_part_prune_results_list;
/* Assert caller is doing things in a sane order */
Assert(plansource->magic == CACHEDPLANSOURCE_MAGIC);
@@ -1160,7 +1232,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (CheckCachedPlan(plansource, boundParams,
+ &my_part_prune_results_list))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1169,7 +1242,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
else
{
/* Build a new generic plan */
- plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv,
+ &my_part_prune_results_list);
/* Just make real sure plansource->gplan is clear */
ReleaseGenericPlan(plansource);
/* Link the new generic plan into the plansource */
@@ -1214,7 +1288,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (customplan)
{
/* Build a custom plan */
- plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv,
+ &my_part_prune_results_list);
/* Accumulate total costs of custom plans */
plansource->total_custom_cost += cached_plan_cost(plan, true);
@@ -1246,6 +1321,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
plan->is_saved = true;
}
+ if (part_prune_results_list)
+ *part_prune_results_list = my_part_prune_results_list;
+
return plan;
}
@@ -1737,17 +1815,29 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_results_list.
+ *
+ * On return, *lockedRelids_per_stmt will contain a bitmapset for every
+ * PlannedStmt in stmt_list, containing the RT indexes of relation entries
+ * in its range table that were actually locked, or NULL if the PlannedStmt
+ * contains a utility statement.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_results_list,
+ List **lockedRelids_per_stmt)
{
ListCell *lc1;
+ *part_prune_results_list = *lockedRelids_per_stmt = NIL;
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ List *part_prune_results = NIL;
+ Bitmapset *allLockRelids;
+ Bitmapset *lockedRelids = NULL;
+ int rti;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1761,13 +1851,40 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
if (query)
- ScanQueryForLocks(query, acquire);
+ ScanQueryForLocks(query, true);
+ *part_prune_results_list = lappend(*part_prune_results_list, NIL);
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ /*
+ * Figure out the set of relations that would need to be locked
+ * before executing the plan.
+ */
+ if (plannedstmt->containsInitialPruning)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ Bitmapset *scan_leafpart_rtis = NULL;
+
+ /*
+ * Obtain the set of leaf partitions to be locked.
+ *
+ * The following does initial partition pruning using the
+ * PartitionPruneInfos found in plannedstmt->partPruneInfos and
+ * finds leaf partitions that survive that pruning across all the
+ * nodes in the plan tree.
+ */
+ part_prune_results = ExecutorDoInitialPruning(plannedstmt,
+ boundParams,
+ &scan_leafpart_rtis);
+ allLockRelids = bms_union(plannedstmt->minLockRelids,
+ scan_leafpart_rtis);
+ }
+ else
+ allLockRelids = plannedstmt->minLockRelids;
+
+ rti = -1;
+ while ((rti = bms_next_member(allLockRelids, rti)) > 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
if (rte->rtekind != RTE_RELATION)
continue;
@@ -1778,10 +1895,59 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
* fail if it's been dropped entirely --- we'll just transiently
* acquire a non-conflicting lock.
*/
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
+ LockRelationOid(rte->relid, rte->rellockmode);
+ lockedRelids = bms_add_member(lockedRelids, rti);
+ }
+
+ *part_prune_results_list = lappend(*part_prune_results_list,
+ part_prune_results);
+ *lockedRelids_per_stmt = lappend(*lockedRelids_per_stmt, lockedRelids);
+ }
+}
+
+/*
+ * ReleaseExecutorLocks
+ * Release locks that would've been acquired by an earlier call to
+ * AcquireExecutorLocks()
+ */
+static void
+ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt)
+{
+ ListCell *lc1,
+ *lc2;
+
+ forboth(lc1, stmt_list, lc2, lockedRelids_per_stmt)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ Bitmapset *lockedRelids = lfirst(lc2);
+ int rti;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /*
+ * Ignore utility statements, except those (such as EXPLAIN) that
+ * contain a parsed-but-not-planned query. Note: it's okay to use
+ * ScanQueryForLocks, even though the query hasn't been through
+ * rule rewriting, because rewriting doesn't change the query
+ * representation.
+ */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ Assert(lockedRelids == NULL);
+ if (query)
+ ScanQueryForLocks(query, false);
+ continue;
+ }
+
+ rti = -1;
+ while ((rti = bms_next_member(lockedRelids, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ Assert(rte->rtekind == RTE_RELATION);
+
+ /* See the comment in AcquireExecutorLocks(). */
+ UnlockRelationOid(rte->relid, rte->rellockmode);
}
}
}
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 7b1ae6fdcf..5b9098971b 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -303,6 +303,25 @@ PortalDefineQuery(Portal portal,
portal->status = PORTAL_DEFINED;
}
+/*
+ * PortalStorePartitionPruneResults
+ * Copy the given List of Lists of PartitionPruneResults into the
+ * portal's context
+ *
+ * This allows the caller to ensure that the list exists as long as the portal
+ * does.
+ */
+void
+PortalStorePartitionPruneResults(Portal portal, List *part_prune_results_list)
+{
+ MemoryContext oldcxt;
+
+ Assert(PortalIsValid(portal));
+ oldcxt = MemoryContextSwitchTo(portal->portalContext);
+ portal->part_prune_results_list = copyObject(part_prune_results_list);
+ MemoryContextSwitchTo(oldcxt);
+}
+
/*
* PortalReleaseCachedPlan
* Release a portal's reference to its cached plan, if any.
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 9ebde089ae..269cc4d562 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,9 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt,
+ List *part_prune_results,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 17fabc18c9..4b98d0d2ef 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -45,6 +45,7 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
+ * rti_map Range table index by partition index, or 0.
* present_parts A Bitmapset of the partition indexes that we
* have subplans or subparts for.
* initial_pruning_steps List of PartitionPruneSteps used to
@@ -61,6 +62,7 @@ typedef struct PartitionedRelPruningData
int nparts;
int *subplan_map;
int *subpart_map;
+ Index *rti_map;
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
@@ -127,5 +129,10 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
Bitmapset *root_parent_relids,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis);
+extern Bitmapset *ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index e79e2c001f..7d4379da7b 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,8 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ List *part_prune_results; /* ExecutorDoInitialPruning()'s
+ * output for plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +59,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ List *part_prune_results,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index ed95ed1176..c9a5e5fb68 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -185,6 +185,9 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
+extern List *ExecutorDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ Bitmapset **scan_leafpart_rtis);
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index a2008846c6..369de42caf 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -615,6 +615,7 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
+ List *es_part_prune_results; /* QueryDesc.part_prune_results */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index a80f43e540..937cc4629d 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -212,6 +212,7 @@ extern struct Bitmapset *readBitmapset(void);
extern uintptr_t readDatum(bool typbyval);
extern bool *readBoolCols(int numCols);
extern int *readIntCols(int numCols);
+extern Index *readIndexCols(int numCols);
extern Oid *readOidCols(int numCols);
extern int16 *readAttrNumberCols(int numCols);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index dd4eb8679d..36abe4cf9e 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -125,6 +125,18 @@ typedef struct PlannerGlobal
/* List of PartitionPruneInfo contained in the plan */
List *partPruneInfos;
+ /*
+ * Do any of those PartitionPruneInfos have initial pruning steps in them?
+ */
+ bool containsInitialPruning;
+
+ /*
+ * Indexes of all range table entries minus indexes of range table entries
+ * of the leaf partitions scanned by prunable subplans; see
+ * AcquireExecutorLocks()
+ */
+ Bitmapset *minLockRelids;
+
/* OIDs of relations the plan depends on */
List *relationOids;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 2e202892a7..0cab6958d7 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -73,8 +73,17 @@ typedef struct PlannedStmt
List *partPruneInfos; /* List of PartitionPruneInfo contained in
* the plan */
+ bool containsInitialPruning; /* Do any of those PartitionPruneInfos
+ * have initial pruning steps in them?
+ */
+
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *minLockRelids; /* Indexes of all range table entries minus
+ * indexes of range table entries of the leaf
+ * partitions scanned by prunable subplans;
+ * see AcquireExecutorLocks() */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE/MERGE */
List *resultRelations; /* integer list of RT indexes, or NIL */
@@ -1414,6 +1423,13 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * needs_init_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its initial_pruning_steps set?
+ *
+ * needs_exec_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its exec_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1425,6 +1441,8 @@ typedef struct PartitionPruneInfo
NodeTag type;
Bitmapset *root_parent_relids;
List *prune_infos;
+ bool needs_init_pruning;
+ bool needs_exec_pruning;
Bitmapset *other_subplans;
} PartitionPruneInfo;
@@ -1469,6 +1487,9 @@ typedef struct PartitionedRelPruneInfo
/* relation OID by partition index, or 0 */
Oid *relid_map pg_node_attr(array_size(nparts));
+ /* Range table index by partition index, or 0. */
+ Index *rti_map pg_node_attr(array_size(nparts));
+
/*
* initial_pruning_steps shows how to prune during executor startup (i.e.,
* without use of any PARAM_EXEC Params); it is NIL if no startup pruning
@@ -1553,6 +1574,31 @@ typedef struct PartitionPruneStepCombine
List *source_stepids;
} PartitionPruneStepCombine;
+/*----------------
+ * PartitionPruneResult
+ *
+ * The result of performing ExecPartitionDoInitialPruning() on a given
+ * PartitionPruneInfo.
+ *
+ * valid_subplan_offs contains the indexes of subplans remaining after
+ * performing initial pruning by calling ExecFindMatchingSubPlans() on the
+ * PartitionPruneInfo.
+ *
+ * This is used to store the result of initial partition pruning that is
+ * performed before execution has started.  A module that needs to perform
+ * such pruning should call ExecutorDoInitialPruning() on a given
+ * PlannedStmt, which returns a List of PartitionPruneResults containing an
+ * entry for each PartitionPruneInfo present in PlannedStmt.partPruneInfos.
+ * The module should then pass that list, along with the PlannedStmt, to the
+ * executor, so that it can reuse the result of initial partition pruning
+ * when initializing the subplans for execution.
+ */
+typedef struct PartitionPruneResult
+{
+ NodeTag type;
+
+ Bitmapset *valid_subplan_offs;
+} PartitionPruneResult;
/*
* Plan invalidation info
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 0499635f59..32579d4788 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -220,7 +220,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ List **part_prune_results_list);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aeddbdafe5..1901fc5f28 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,7 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *part_prune_results_list; /* List of Lists of PartitionPruneResults */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -242,6 +243,8 @@ extern void PortalDefineQuery(Portal portal,
CommandTag commandTag,
List *stmts,
CachedPlan *cplan);
+extern void PortalStorePartitionPruneResults(Portal portal,
+ List *part_prune_results_list);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
--
2.35.3
On Fri, Dec 2, 2022 at 7:40 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Thu, Dec 1, 2022 at 9:43 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Thu, Dec 1, 2022 at 8:21 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2022-Dec-01, Amit Langote wrote:
Hmm, how about keeping the [Merge]Append's parent relation's RT index
in the PartitionPruneInfo and passing it down to
ExecInitPartitionPruning() from ExecInit[Merge]Append() for
cross-checking? Both Append and MergeAppend already have an
'apprelids' field that we can save a copy of in the
PartitionPruneInfo. Tried that in the attached delta patch.
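In code form, the cross-check could look roughly like this (a sketch, not
the actual delta patch; the function name and the elog() text are
illustrative only, while bms_equal() and bmsToString() are the existing
bitmapset helpers):

/*
 * Sketch: ExecInit[Merge]Append() passes its node's 'apprelids' down, and
 * pruning-state initialization verifies it against the relids the planner
 * recorded in the PartitionPruneInfo.
 */
static void
CheckPartitionPruneInfoParent(PartitionPruneInfo *pruneinfo,
                              Bitmapset *parent_apprelids)
{
    if (!bms_equal(pruneinfo->root_parent_relids, parent_apprelids))
        elog(ERROR, "PartitionPruneInfo root parent relids %s do not match parent node's relids %s",
             bmsToString(pruneinfo->root_parent_relids),
             bmsToString(parent_apprelids));
}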
Ah yeah, that sounds about what I was thinking. I've merged that in and
pushed to github, which had a strange pg_upgrade failure on Windows
mentioning log files that were not captured by the CI tooling. So I
pushed another one trying to grab those files, in case it wasn't a
one-off failure. It's running now:
https://cirrus-ci.com/task/5857239638999040
If all goes well with this run, I'll get this 0001 pushed.
Thanks for pushing 0001.
Rebased 0002 attached.
Thought it might be good for PartitionPruneResult to also have a
root_parent_relids field that matches the corresponding
PartitionPruneInfo's. ExecInitPartitionPruning() does a sanity check
that the root_parent_relids of a given pair of PartitionPrune{Info |
Result} match.
Posting that separately as the attached 0002, just in case you might
think that the extra cross-checking would be overkill.
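That check, in rough code form (a sketch; the root_parent_relids field on
the Result side is the one proposed above, and the surrounding variables
are those of ExecInitPartitionPruning()):

    /* The pre-computed result must come from the matching PartitionPruneInfo. */
    pruneresult = list_nth(estate->es_part_prune_results, part_prune_index);
    Assert(IsA(pruneresult, PartitionPruneResult));
    if (!bms_equal(pruneinfo->root_parent_relids,
                   pruneresult->root_parent_relids))
        elog(ERROR, "mismatching PartitionPruneInfo and PartitionPruneResult");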
Rebased over 92c4dafe1eed and fixed some factual mistakes in the
comment above ExecutorDoInitialPruning().
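For reference, with the new out-parameter the caller-side pattern, as in
the prepare.c and spi.c hunks of the attached patch, looks roughly like
this (some surrounding declarations omitted):

    List       *part_prune_results_list;
    ListCell   *p,
               *pp;

    /* Replan if needed; also receive per-statement initial-pruning results. */
    cplan = GetCachedPlan(plansource, paramLI, NULL, queryEnv,
                          &part_prune_results_list);
    Assert(list_length(cplan->stmt_list) ==
           list_length(part_prune_results_list));

    forboth(p, cplan->stmt_list, pp, part_prune_results_list)
    {
        PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
        List        *part_prune_results = lfirst_node(List, pp);

        /* The executor reuses the pre-computed results via the QueryDesc. */
        queryDesc = CreateQueryDesc(pstmt, part_prune_results, query_string,
                                    GetActiveSnapshot(), InvalidSnapshot,
                                    dest, paramLI, queryEnv, 0);
        /* ... run and clean up as usual ... */
    }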
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v27-0001-Optimize-AcquireExecutorLocks-by-locking-only-un.patch (application/octet-stream)
From 6c4cf0b0a03bfac62e87f76bb3be9c1e62125a0c Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v27 1/2] Optimize AcquireExecutorLocks() by locking only
unpruned partitions
This commit teaches AcquireExecutorLocks() to perform initial
partition pruning to notionally eliminate the subnodes contained in a
generic cached plan that need not be initialized during the actual
execution of the plan and skip locking the partitions scanned by those
subnodes.
The result of performing initial partition pruning this way, before the
actual execution has started, is made available to the executor via
PartitionPruneResult nodes, passed along with the PlannedStmt by the
callers of the executor that used plancache.c to get the plan. It is NULL
in the cases in which the plan is obtained by calling the planner
directly, or if the plan obtained from plancache.c is not a generic one.
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 2 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/prepare.c | 26 ++-
src/backend/executor/README | 36 ++++
src/backend/executor/execMain.c | 53 ++++++
src/backend/executor/execParallel.c | 26 ++-
src/backend/executor/execPartition.c | 238 +++++++++++++++++++++----
src/backend/executor/execUtils.c | 1 +
src/backend/executor/functions.c | 2 +-
src/backend/executor/nodeAppend.c | 11 +-
src/backend/executor/nodeMergeAppend.c | 5 +-
src/backend/executor/spi.c | 27 ++-
src/backend/nodes/readfuncs.c | 8 +-
src/backend/optimizer/plan/planner.c | 2 +
src/backend/optimizer/plan/setrefs.c | 46 +++++
src/backend/partitioning/partprune.c | 41 ++++-
src/backend/tcop/postgres.c | 8 +-
src/backend/tcop/pquery.c | 28 ++-
src/backend/utils/cache/plancache.c | 208 ++++++++++++++++++---
src/backend/utils/mmgr/portalmem.c | 19 ++
src/include/commands/explain.h | 4 +-
src/include/executor/execPartition.h | 9 +-
src/include/executor/execdesc.h | 3 +
src/include/executor/executor.h | 3 +
src/include/nodes/execnodes.h | 1 +
src/include/nodes/nodes.h | 1 +
src/include/nodes/pathnodes.h | 12 ++
src/include/nodes/plannodes.h | 46 +++++
src/include/utils/plancache.h | 3 +-
src/include/utils/portal.h | 3 +
33 files changed, 787 insertions(+), 98 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index f26cc0d162..401a2280a3 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -558,7 +558,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NIL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 152c29b551..942449544c 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NIL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index f86983c660..2f2b558608 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -407,7 +407,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NIL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
@@ -515,7 +515,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, List *part_prune_results,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
@@ -563,7 +564,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, part_prune_results, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index cf1b1ca571..904cbcba4a 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -779,7 +779,7 @@ execute_sql_string(const char *sql)
{
QueryDesc *qdesc;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, NIL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 9ac0383459..65c8d0aa59 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -408,7 +408,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NIL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 9e29584d93..29b45539d3 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -155,6 +155,7 @@ ExecuteQuery(ParseState *pstate,
PreparedStatement *entry;
CachedPlan *cplan;
List *plan_list;
+ List *part_prune_results_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
Portal portal;
@@ -193,7 +194,10 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
plan_list = cplan->stmt_list;
/*
@@ -207,6 +211,9 @@ ExecuteQuery(ParseState *pstate,
plan_list,
cplan);
+ /* Copy Lists of PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_results_list);
+
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
* statement is one that produces tuples. Currently we insist that it be
@@ -576,7 +583,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
- ListCell *p;
+ List *part_prune_results_list;
+ ListCell *p,
+ *pp;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -619,7 +628,10 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Replan if needed, and acquire a transient refcount */
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, queryEnv);
+ CurrentResourceOwner, queryEnv,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -634,13 +646,15 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
plan_list = cplan->stmt_list;
/* Explain each query */
- foreach(p, plan_list)
+ forboth(p, plan_list, pp, part_prune_results_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ List *part_prune_results = lfirst_node(List, pp);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ ExplainOnePlan(pstmt, part_prune_results, into, es, query_string,
+ paramLI, queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 17775a49e2..7f8cf1494f 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -65,6 +65,38 @@ found there. This currently only occurs for Append and MergeAppend nodes. In
this case the non-required subplans are ignored and the executor state's
subnode array will become out of sequence to the plan's subplan list.
+The so-called execution time pruning may also occur even before the execution
+has actually started. One case where that occurs is when a cached generic
+plan is being validated for execution by plancache.c:GetCachedPlan(), which
+works by locking all the relations that will be scanned by that plan. If the
+generic plan contains nodes that can perform execution time partition pruning
+(that is, contain a PartitionPruneInfo), a subset of pruning steps contained
+in a given node's PartitionPruneInfo that do not depend on the execution
+actually having started (called "initial" pruning steps) are performed as part
+of the plan validation step, by calling ExecutorDoInitialPruning(). That
+returns the minimal set of child subplans that satisfy the initial pruning
+steps contained in each PartitionPruneInfo. AcquireExecutorLocks() will then
+lock only the relations scanned by those subplans, in addition to those present
+in PlannedStmt.minLockRelids. Note that the subplans are not really pruned as
+in being removed from the plan tree, so care is needed by the downstream
+users of such a plan that has undergone pre-execution initial pruning.
+
+To prevent the executor and any third party execution code that can look at
+the plan tree from trying to execute the subplans that were pruned as
+described above, the result of that pruning is passed to the executor as a
+List of PartitionPruneResult nodes via the QueryDesc, which is subsequently
+assigned to EState.es_part_prune_results. Each PartitionPruneResult therein
+consists of the set of indexes of surviving subplans in the respective parent
+plan node's (the one to which the corresponding PartitionPruneInfo belongs)
+list of child subplans, saved as a bitmapset valid_subplan_offs. The executor
+or any third party execution code working on a generic plan should not
+re-evaluate the set of initially valid subplans for a given plan node by
+redoing the initial pruning if a PartitionPruneResult belonging to that plan
+node is present in es_part_prune_results. Note that that is not simply a
+performance optimization, because such re-evaluation of the pruning steps may
+very well end up resulting in a different set of initially valid subplans,
+containing some whose relations were not locked by AcquireExecutorLocks().
+
Each Plan node may have expression trees associated with it, to represent
its target list, qualification conditions, etc. These trees are also
read-only to the executor, but the executor state for expression evaluation
@@ -286,6 +318,10 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ [ ExecutorDoInitialPruning ] --- an optional step to perform initial
+ partition pruning on the plan tree, the result of which is passed
+ to the executor via QueryDesc
+
CreateQueryDesc
ExecutorStart
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 12ff4f3de5..4d8c8e2e43 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -49,6 +49,7 @@
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/execdebug.h"
+#include "executor/execPartition.h"
#include "executor/nodeSubplan.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
@@ -104,6 +105,56 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
/* end of local decls */
+/* ----------------------------------------------------------------
+ * ExecutorDoInitialPruning
+ *
+ * For each plan tree node that has been assigned a PartitionPruneInfo,
+ * this performs initial partition pruning using the information contained
+ * therein to determine the set of child subplans that satisfy the initial
+ * pruning steps, to be returned as a bitmapset of their indexes in the
+ * node's list of child subplans (for example, an Append's appendplans).
+ *
+ * Return value is a List of PartitionPruneResult nodes, one for each
+ * PartitionPruneInfo found in plannedstmt->partPruneInfos, each
+ * containing a bitmapset of the indexes of unpruned child subplans.
+ * A bitmapset of the RT indexes of the leaf partitions scanned by those
+ * subplans is returned in *scan_leafpart_rtis, which is shared across all
+ * of those PartitionPruneResults.
+ *
+ * The executor must see exactly the same set of subplans as valid for
+ * execution when doing ExecInitNode() on the plan nodes whose
+ * PartitionPruneInfos are processed here. So, it must get the set from the
+ * aforementioned PartitionPruneResult, instead of computing it all over
+ * again by redoing the initial pruning. It's the caller's job to pass the
+ * PartitionPruneResult to the executor.
+ *
+ * Note: Partitioned tables mentioned in PartitionedRelPruneInfo nodes that
+ * drive the pruning will be locked before doing the pruning.
+ * ----------------------------------------------------------------
+ */
+List *
+ExecutorDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *part_prune_results = NIL;
+ ListCell *lc;
+
+ /* Only get here if there is any pruning to do. */
+ Assert(plannedstmt->containsInitialPruning);
+
+ foreach(lc, plannedstmt->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ PartitionPruneResult *pruneresult = makeNode(PartitionPruneResult);
+
+ pruneresult->valid_subplan_offs =
+ ExecPartitionDoInitialPruning(plannedstmt, params, pruneinfo,
+ scan_leafpart_rtis);
+ part_prune_results = lappend(part_prune_results, pruneresult);
+ }
+
+ return part_prune_results;
+}
/* ----------------------------------------------------------------
* ExecutorStart
@@ -806,6 +857,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ List *part_prune_results = queryDesc->part_prune_results;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -826,6 +878,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_plannedstmt = plannedstmt;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ estate->es_part_prune_results = part_prune_results;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index aca0c6f323..917079a034 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -66,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITION_PRUNE_RESULTS UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -182,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->containsInitialPruning = false;
pstmt->planTree = plan;
pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
@@ -597,12 +599,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_results_data;
+ char *part_prune_results_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_results_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -631,6 +636,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_results_data = nodeToString(estate->es_part_prune_results);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -657,6 +663,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized List of PartitionPruneResult. */
+ part_prune_results_len = strlen(part_prune_results_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_results_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -751,6 +762,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized List of PartitionPruneResult */
+ part_prune_results_space = shm_toc_allocate(pcxt->toc, part_prune_results_len);
+ memcpy(part_prune_results_space, part_prune_results_data, part_prune_results_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS,
+ part_prune_results_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1232,8 +1249,10 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_results_space;
char *paramspace;
PlannedStmt *pstmt;
+ List *part_prune_results;
ParamListInfo paramLI;
char *queryString;
@@ -1244,12 +1263,17 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
pstmtspace = shm_toc_lookup(toc, PARALLEL_KEY_PLANNEDSTMT, false);
pstmt = (PlannedStmt *) stringToNode(pstmtspace);
+ /* Reconstruct leader-supplied PartitionPruneResult. */
+ part_prune_results_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, false);
+ part_prune_results = (List *) stringToNode(part_prune_results_space);
+
/* Reconstruct ParamListInfo. */
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
/* Create a QueryDesc for the query. */
- return CreateQueryDesc(pstmt,
+ return CreateQueryDesc(pstmt, part_prune_results,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 88d0ea3adb..b0eb15b982 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -25,6 +25,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -185,7 +186,11 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *pruneinfo);
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -198,7 +203,8 @@ static void PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans);
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis);
/*
@@ -1749,8 +1755,10 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* considered to be a stable expression, it can change value from one plan
* node scan to the next during query execution. Stable comparison
* expressions that don't involve such Params allow partition pruning to be
- * done once during executor startup. Expressions that do involve such Params
- * require us to prune separately for each scan of the parent plan node.
+ * done once during executor startup or during ExecutorDoInitialPruning() that
+ * runs as part of performing AcquireExecutorLocks() on a given plan tree.
+ * Expressions that do involve such Params require us to prune separately for
+ * each scan of the parent plan node.
*
* Note that pruning away unneeded subplans during executor startup has the
* added benefit of not having to initialize the unneeded subplans at all.
@@ -1767,6 +1775,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* account for initial pruning possibly having eliminated some of the
* subplans.
*
+ * ExecPartitionDoInitialPruning:
+ * Do initial pruning with the information contained in a given
+ * PartitionPruneInfo to determine the minimal set of child subplans
+ * of the parent plan node (to which the PartitionPruneInfo belongs)
+ * that must be executed, and also the set of RT indexes of the leaf
+ * partitions that will be scanned by those subplans.
+ *
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating the expressions
* that are safe to evaluate at a given point. This function is first
@@ -1787,8 +1802,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* On return, *initially_valid_subplans is assigned the set of indexes of
* child subplans that must be initialized along with the parent plan node.
- * Initial pruning is performed here if needed and in that case only the
- * surviving subplans' indexes are added.
+ * Initial pruning is performed here if needed (unless it has already been done
+ * by ExecutorDoInitialPruning()), and in that case only the surviving
+ * subplans' indexes are added.
*
* If subplans are indeed pruned, subplan_map arrays contained in the returned
* PartitionPruneState are re-sequenced to not count those, though only if the
@@ -1801,9 +1817,10 @@ ExecInitPartitionPruning(PlanState *planstate,
Bitmapset *root_parent_relids,
Bitmapset **initially_valid_subplans)
{
- PartitionPruneState *prunestate;
+ PartitionPruneState *prunestate = NULL;
EState *estate = planstate->state;
PartitionPruneInfo *pruneinfo;
+ PartitionPruneResult *pruneresult = NULL;
/* Obtain the pruneinfo we need, and make sure it's the right one */
pruneinfo = list_nth(estate->es_part_prune_infos, part_prune_index);
@@ -1819,20 +1836,57 @@ ExecInitPartitionPruning(PlanState *planstate,
/* We may need an expression context to evaluate partition exprs */
ExecAssignExprContext(estate, planstate);
- /* Create the working data structure for pruning */
- prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ /*
+ * No need to do initial pruning if it was done already by
+ * ExecutorDoInitialPruning(), which it would be if es_part_prune_results
+ * is set.
+ */
+ if (estate->es_part_prune_results)
+ {
+ pruneresult = list_nth(estate->es_part_prune_results, part_prune_index);
+ Assert(IsA(pruneresult, PartitionPruneResult));
+ }
+
+ if (pruneresult == NULL || pruneinfo->needs_exec_pruning)
+ {
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
+
+ /* For data reading, executor always omits detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
+ /*
+ * Create the working data structure for pruning. No need to consider
+ * initial pruning steps if we have a PartitionPruneResult.
+ */
+ prunestate = CreatePartitionPruneState(planstate, pruneinfo,
+ pruneresult == NULL,
+ pruneinfo->needs_exec_pruning,
+ NIL, planstate->ps_ExprContext,
+ estate->es_partition_directory);
+ }
/*
* Perform an initial partition prune pass, if required.
*/
- if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ if (pruneresult)
+ {
+ *initially_valid_subplans = bms_copy(pruneresult->valid_subplan_offs);
+ }
+ else if (prunestate && prunestate->do_initial_prune)
+ {
+ *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true,
+ NULL);
+ }
else
{
- /* No pruning, so we'll need to initialize all subplans */
+ /* No initial pruning, so we'll need to initialize all subplans */
Assert(n_total_subplans > 0);
*initially_valid_subplans = bms_add_range(NULL, 0,
n_total_subplans - 1);
+ return prunestate;
}
/*
@@ -1840,7 +1894,8 @@ ExecInitPartitionPruning(PlanState *planstate,
* that were removed above due to initial pruning. No need to do this if
* no steps were removed.
*/
- if (bms_num_members(*initially_valid_subplans) < n_total_subplans)
+ if (prunestate &&
+ bms_num_members(*initially_valid_subplans) < n_total_subplans)
{
/*
* We can safely skip this when !do_exec_prune, even though that
@@ -1856,11 +1911,74 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecPartitionDoInitialPruning
+ * Perform initial pruning using given PartitionPruneInfo to determine
+ * the minimal set of child subplans that will be executed and also the
+ * set of RT indexes of the leaf partitions scanned by those subplans.
+ */
+Bitmapset *
+ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *rtable = plannedstmt->rtable;
+ ExprContext *econtext;
+ PartitionDirectory pdir;
+ MemoryContext oldcontext,
+ tmpcontext;
+ PartitionPruneState *prunestate;
+ Bitmapset *valid_subplan_offs;
+
+ /*
+ * A temporary context for memory allocations required while executing
+ * partition pruning steps.
+ */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "initial pruning working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
+
+ /*
+ * PartitionDirectory to look up partition descriptors.
+ * Note that we don't omit detached partitions, just like during
+ * execution proper.
+ */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+
+ /*
+ * We don't yet have a PlanState for the parent plan node, so we must
+ * create a standalone ExprContext to evaluate pruning expressions,
+ * equipped with the information about the EXTERN parameters that the
+ * caller passed us. Note that that's okay because the initial pruning
+ * steps do not contain anything that requires the execution to have
+ * started, and hence nothing that needs the information in a PlanState.
+ */
+ econtext = CreateStandaloneExprContext();
+ econtext->ecxt_param_list_info = params;
+ prunestate = CreatePartitionPruneState(NULL, pruneinfo, true, false,
+ rtable, econtext, pdir);
+ MemoryContextSwitchTo(oldcontext);
+
+ /* Do the initial pruning. */
+ valid_subplan_offs = ExecFindMatchingSubPlans(prunestate, true,
+ scan_leafpart_rtis);
+
+ FreeExprContext(econtext, true);
+ DestroyPartitionDirectory(pdir);
+ MemoryContextDelete(tmpcontext);
+
+ return valid_subplan_offs;
+}
+
/*
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
*
- * 'planstate' is the parent plan node's execution state.
+ * 'planstate', if not NULL, is the parent plan node's execution state. It
+ * can be NULL if being called before ExecutorStart(), in which case,
+ * 'rtable' (range table), 'econtext', and 'partdir' must be explicitly
+ * provided.
*
* 'pruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
@@ -1874,19 +1992,21 @@ ExecInitPartitionPruning(PlanState *planstate,
* PartitionedRelPruneInfo.
*/
static PartitionPruneState *
-CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
+CreatePartitionPruneState(PlanState *planstate,
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir)
{
- EState *estate = planstate->state;
+ EState *estate = planstate ? planstate->state : NULL;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
- /* For data reading, executor always omits detached partitions */
- if (estate->es_partition_directory == NULL)
- estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
+ Assert((estate != NULL) ||
+ (partdir != NULL && econtext != NULL && rtable != NIL));
n_part_hierarchies = list_length(pruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
@@ -1941,15 +2061,42 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * Must open the relation by ourselves when called before the
+ * execution has started, such as when called during
+ * ExecutorDoInitialPruning() on a cached plan. In that case,
+ * sub-partitions must be locked, because AcquirePlannerLocks()
+ * would not have seen them. (The 1st relation in a partrelpruneinfos
+ * list is always the root partitioned table appearing in the
+ * query, which AcquirePlannerLocks() would have locked; the
+ * Assert in relation_open() guards that assumption.)
+ */
+ if (estate == NULL)
+ {
+ RangeTblEntry *rte = rt_fetch(pinfo->rtindex, rtable);
+ int lockmode = (j == 0) ? NoLock : rte->rellockmode;
+
+ partrel = table_open(rte->relid, lockmode);
+ }
+ else
+ partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+
+ /*
+ * We can rely on the copy of the partitioned table's partition
+ * key in its relcache entry, because it can't change (or get
+ * destroyed) as long as the relation is locked. The partition
+ * descriptor is taken from the PartitionDirectory associated with
+ * the table that is held open long enough for the descriptor to
+ * remain valid while it's used to perform the pruning steps.
*/
- partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
partkey = RelationGetPartitionKey(partrel);
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partdesc = PartitionDirectoryLookup(partdir, partrel);
+
+ /*
+ * Must close partrel, keeping the lock taken, if we're not using
+ * EState's entry.
+ */
+ if (estate == NULL)
+ table_close(partrel, NoLock);
/*
* Initialize the subplan_map and subpart_map.
@@ -1963,6 +2110,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
Assert(partdesc->nparts >= pinfo->nparts);
pprune->nparts = partdesc->nparts;
pprune->subplan_map = palloc(sizeof(int) * partdesc->nparts);
+ pprune->rti_map = palloc(sizeof(Index) * partdesc->nparts);
if (partdesc->nparts == pinfo->nparts)
{
/*
@@ -1973,6 +2121,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pprune->subpart_map = pinfo->subpart_map;
memcpy(pprune->subplan_map, pinfo->subplan_map,
sizeof(int) * pinfo->nparts);
+ memcpy(pprune->rti_map, pinfo->rti_map,
+ sizeof(int) * pinfo->nparts);
/*
* Double-check that the list of unpruned relations has not
@@ -2023,6 +2173,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pinfo->subplan_map[pd_idx];
pprune->subpart_map[pp_idx] =
pinfo->subpart_map[pd_idx];
+ pprune->rti_map[pp_idx] =
+ pinfo->rti_map[pd_idx];
pd_idx++;
}
else
@@ -2030,6 +2182,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
/* this partdesc entry is not in the plan */
pprune->subplan_map[pp_idx] = -1;
pprune->subpart_map[pp_idx] = -1;
+ pprune->rti_map[pp_idx] = 0;
}
}
@@ -2051,7 +2204,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* Initialize pruning contexts as needed.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
- if (pinfo->initial_pruning_steps)
+ if (consider_initial_steps && pinfo->initial_pruning_steps)
{
InitPartitionPruneContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
@@ -2061,7 +2214,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps)
+ if (consider_exec_steps && pinfo->exec_pruning_steps)
{
InitPartitionPruneContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
@@ -2289,10 +2442,14 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
* runtime pruning.
+ *
+ * RT indexes of leaf partitions scanned by the chosen subplans are added to
+ * *scan_leafpart_rtis if the pointer is non-NULL.
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2327,7 +2484,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
*/
pprune = &prunedata->partrelprunedata[0];
find_matching_subplans_recurse(prunedata, pprune, initial_prune,
- &result);
+ &result, scan_leafpart_rtis);
/* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
@@ -2341,6 +2498,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_copy(*scan_leafpart_rtis);
MemoryContextReset(prunestate->prune_context);
@@ -2351,13 +2510,15 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* find_matching_subplans_recurse
* Recursive worker function for ExecFindMatchingSubPlans
*
- * Adds valid (non-prunable) subplan IDs to *validsubplans
+ * Adds valid (non-prunable) subplan IDs to *validsubplans and RT indexes
+ * of the corresponding leaf partitions to *scan_leafpart_rtis (if asked for).
*/
static void
find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans)
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *partset;
int i;
@@ -2384,8 +2545,14 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
while ((i = bms_next_member(partset, i)) >= 0)
{
if (pprune->subplan_map[i] >= 0)
+ {
*validsubplans = bms_add_member(*validsubplans,
pprune->subplan_map[i]);
+ Assert(pprune->rti_map[i] > 0);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_add_member(*scan_leafpart_rtis,
+ pprune->rti_map[i]);
+ }
else
{
int partidx = pprune->subpart_map[i];
@@ -2393,7 +2560,8 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
if (partidx >= 0)
find_matching_subplans_recurse(prunedata,
&prunedata->partrelprunedata[partidx],
- initial_prune, validsubplans);
+ initial_prune, validsubplans,
+ scan_leafpart_rtis);
else
{
/*
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 572c87e453..044bf3f491 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -135,6 +135,7 @@ CreateExecutorState(void)
estate->es_param_exec_vals = NULL;
estate->es_queryEnv = NULL;
+ estate->es_part_prune_results = NIL;
estate->es_query_cxt = qcontext;
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index dc13625171..bffb42ce71 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -842,7 +842,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
- es->qd = CreateQueryDesc(es->stmt,
+ es->qd = CreateQueryDesc(es->stmt, NIL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 99830198bd..3b917584de 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -156,7 +156,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (appendstate->as_prune_state == NULL ||
+ (!appendstate->as_prune_state->do_exec_prune && nplans > 0))
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -578,7 +579,7 @@ choose_next_subplan_locally(AppendState *node)
}
else if (node->as_valid_subplans == NULL)
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
whichplan = -1;
}
@@ -643,7 +644,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
/*
* Mark each invalid plan as finished to allow the loop below to
@@ -718,7 +719,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
mark_invalid_subplans_as_finished(node);
}
@@ -869,7 +870,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
classify_matching_subplans(node);
}
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index f370f9f287..ccfa083945 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -104,7 +104,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplan, we can fill ms_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (mergestate->ms_prune_state == NULL ||
+ (!mergestate->ms_prune_state->do_exec_prune && nplans > 0))
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -219,7 +220,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index fd5796f1b9..93012a5b3b 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1578,6 +1578,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
CachedPlanSource *plansource;
CachedPlan *cplan;
List *stmt_list;
+ List *part_prune_results_list;
char *query_string;
Snapshot snapshot;
MemoryContext oldcontext;
@@ -1657,7 +1658,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1685,6 +1689,9 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
stmt_list,
cplan);
+ /* Copy Lists of PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_results_list);
+
/*
* Set up options for portal. Default SCROLL type is chosen the same way
* as PerformCursorOpen does it.
@@ -2092,7 +2099,8 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
/* Get the generic plan for the query */
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ NULL /* Not interested in PartitionPruneResults */);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2473,7 +2481,9 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
- ListCell *lc2;
+ List *part_prune_results_list;
+ ListCell *lc2,
+ *lc3;
spicallbackarg.query = plansource->query_string;
@@ -2549,8 +2559,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* plan, the refcount must be backed by the plan_owner.
*/
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
-
+ plan_owner, _SPI_current->queryEnv,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
stmt_list = cplan->stmt_list;
/*
@@ -2589,9 +2601,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, part_prune_results_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ List *part_prune_results = lfirst_node(List, lc3);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2663,7 +2676,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
else
snap = InvalidSnapshot;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, part_prune_results,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 23776367c5..b01f55fb4f 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -158,6 +158,11 @@
token = pg_strtok(&length); /* skip :fldname */ \
local_node->fldname = readIntCols(len)
+/* Read an Index array */
+#define READ_INDEX_ARRAY(fldname, len) \
+ token = pg_strtok(&length); /* skip :fldname */ \
+ local_node->fldname = readIndexCols(len)
+
/* Read a bool array */
#define READ_BOOL_ARRAY(fldname, len) \
token = pg_strtok(&length); /* skip :fldname */ \
@@ -800,7 +805,6 @@ fnname(int numCols) \
*/
READ_SCALAR_ARRAY(readAttrNumberCols, int16, atoi)
READ_SCALAR_ARRAY(readOidCols, Oid, atooid)
-/* outfuncs.c has writeIndexCols, but we don't yet need that here */
-/* READ_SCALAR_ARRAY(readIndexCols, Index, atoui) */
+READ_SCALAR_ARRAY(readIndexCols, Index, atoui)
READ_SCALAR_ARRAY(readIntCols, int, atoi)
READ_SCALAR_ARRAY(readBoolCols, bool, strtobool)
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 799602f5ea..a96d316dca 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -520,7 +520,9 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
result->partPruneInfos = glob->partPruneInfos;
+ result->containsInitialPruning = glob->containsInitialPruning;
result->rtable = glob->finalrtable;
+ result->minLockRelids = glob->minLockRelids;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 399c1812d4..44ffe71c49 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -270,6 +270,16 @@ set_plan_references(PlannerInfo *root, Plan *plan)
*/
add_rtes_to_flat_rtable(root, false);
+ /*
+ * Add the query's adjusted range of RT indexes to glob->minLockRelids.
+ * The adjusted RT indexes of prunable relations will be deleted from the
+ * set below where PartitionPruneInfos are processed.
+ */
+ glob->minLockRelids =
+ bms_add_range(glob->minLockRelids,
+ rtoffset + 1,
+ rtoffset + list_length(root->parse->rtable));
+
/*
* Adjust RT indexes of PlanRowMarks and add to final rowmarks list
*/
@@ -353,6 +363,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
{
PartitionPruneInfo *pruneinfo = lfirst(lc);
ListCell *l;
+ Bitmapset *leafpart_rtis = NULL;
pruneinfo->root_parent_relids =
offset_relid_set(pruneinfo->root_parent_relids, rtoffset);
@@ -364,15 +375,50 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach(l2, prune_infos)
{
PartitionedRelPruneInfo *pinfo = lfirst(l2);
+ int i;
/* RT index of the table to which the pinfo belongs. */
pinfo->rtindex += rtoffset;
+
+ /* Also of the leaf partitions that might be scanned. */
+ for (i = 0; i < pinfo->nparts; i++)
+ {
+ if (pinfo->rti_map[i] > 0 && pinfo->subplan_map[i] >= 0)
+ {
+ pinfo->rti_map[i] += rtoffset;
+ leafpart_rtis = bms_add_member(leafpart_rtis,
+ pinfo->rti_map[i]);
+ }
+ }
}
}
+ if (pruneinfo->needs_init_pruning)
+ {
+ glob->containsInitialPruning = true;
+
+ /*
+ * Delete the leaf partition RTIs from the global set of relations
+ * to be locked before executing the plan. AcquireExecutorLocks()
+ * will find the ones to add to the set after performing initial
+ * pruning.
+ */
+ glob->minLockRelids = bms_del_members(glob->minLockRelids,
+ leafpart_rtis);
+ }
+
glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
}
+ /*
+ * It seems worth doing a bms_copy() on glob->minLockRelids if we deleted
+ * bits from it above, to get rid of any empty tail words, so that the
+ * loop over this set in AcquireExecutorLocks() doesn't have to go
+ * through those useless bit words.
+ */
+ if (glob->containsInitialPruning)
+ glob->minLockRelids = bms_copy(glob->minLockRelids);
+
return result;
}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index d48f6784c1..d5556354f7 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,9 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -234,6 +236,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool needs_init_pruning = false;
+ bool needs_exec_pruning = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -313,12 +317,16 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_needs_init_pruning;
+ bool partrel_needs_exec_pruning;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_needs_init_pruning,
+ &partrel_needs_exec_pruning);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -327,6 +335,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+
+ needs_init_pruning |= partrel_needs_init_pruning;
+ needs_exec_pruning |= partrel_needs_exec_pruning;
}
pfree(relid_subplan_map);
@@ -342,6 +353,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->root_parent_relids = parentrel->relids;
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->needs_init_pruning = needs_init_pruning;
+ pruneinfo->needs_exec_pruning = needs_exec_pruning;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -442,13 +455,18 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *needs_init_pruning and *needs_exec_pruning are set to indicate whether
+ * the returned PartitionedRelPruneInfos contain pruning steps that can be
+ * performed before and after execution begins, respectively.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -459,6 +477,10 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will find out below. */
+ *needs_init_pruning = false;
+ *needs_exec_pruning = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -546,6 +568,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+ * In the first pass, we also determine whether the 2nd pass will be
+ * necessary, by noting the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -620,6 +645,12 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ /* record which types of pruning steps we've seen so far */
+ if (initial_pruning_steps != NIL)
+ *needs_init_pruning = true;
+ if (exec_pruning_steps != NIL)
+ *needs_exec_pruning = true;
+
pinfolist = lappend(pinfolist, pinfo);
}
@@ -647,6 +678,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *subplan_map;
int *subpart_map;
Oid *relid_map;
+ Index *rti_map;
/*
* Construct the subplan and subpart maps for this partitioning level.
@@ -659,6 +691,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
+ rti_map = (Index *) palloc0(nparts * sizeof(Index));
present_parts = NULL;
i = -1;
@@ -673,6 +706,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
+ rti_map[i] = partrel->relid;
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
@@ -697,6 +731,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
pinfo->relid_map = relid_map;
+ pinfo->rti_map = rti_map;
}
pfree(relid_subpart_map);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 3082093d1e..95ab1d0eef 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1598,6 +1598,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ List *part_prune_results_list;
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -1972,7 +1973,9 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
/*
* Now we can define the portal.
@@ -1987,6 +1990,9 @@ exec_bind_message(StringInfo input_message)
cplan->stmt_list,
cplan);
+ /* Copy Lists of PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_results_list);
+
/* Done with the snapshot used for parameter I/O and parsing/planning */
if (snapshot_set)
PopActiveSnapshot();
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 52e2db6452..280ed7d239 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -35,7 +35,7 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
+static void ProcessQuery(PlannedStmt *plan, List *part_prune_results,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +65,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ List *part_prune_results,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +78,8 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->part_prune_results = part_prune_results; /* ExecutorDoInitialPruning()
+ * output for plan */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +125,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * part_prune_results: ExecutorDoInitialPruning() output for the PlannedStmt
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +138,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ List *part_prune_results,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +150,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, part_prune_results, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -491,8 +496,13 @@ PortalStart(Portal portal, ParamListInfo params,
/*
* Create QueryDesc in portal's context; for the moment, set
* the destination to DestNone.
+ *
+ * There is no PartitionPruneResult unless the PlannedStmt is
+ * from a CachedPlan.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->part_prune_results_list == NIL ? NIL :
+ linitial(portal->part_prune_results_list),
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1225,6 +1235,8 @@ PortalRunMulti(Portal portal,
if (pstmt->utilityStmt == NULL)
{
+ List *part_prune_results = NIL;
+
/*
* process a plannable query.
*/
@@ -1271,10 +1283,18 @@ PortalRunMulti(Portal portal,
else
UpdateActiveSnapshotCommandId();
+ /*
+ * Determine if there's a corresponding List of PartitionPruneResult
+ * for this PlannedStmt.
+ */
+ if (portal->part_prune_results_list != NIL)
+ part_prune_results = list_nth(portal->part_prune_results_list,
+ foreach_current_index(stmtlist_item));
+
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_results,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1283,7 +1303,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_results,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index cc943205d3..af6fae6e3b 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -99,14 +99,19 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_results_list);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv);
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_results_list);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_results_list,
+ List **lockedRelids_per_stmt);
+static void ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -782,6 +787,26 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
return tlist;
}
+/*
+ * FreePartitionPruneResults
+ * Frees the List of Lists of PartitionPruneResults for CheckCachedPlan()
+ */
+static void
+FreePartitionPruneResults(List *part_prune_results_list)
+{
+ ListCell *lc;
+
+ foreach(lc, part_prune_results_list)
+ {
+ List *part_prune_results = lfirst(lc);
+
+ /* Free both the PartitionPruneResults and the containing List. */
+ list_free_deep(part_prune_results);
+ }
+
+ list_free(part_prune_results_list);
+}
+
/*
* CheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
*
@@ -790,15 +815,20 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* On a "true" return, we have acquired the locks needed to run the plan.
* (We must do this for the "true" result to be race-condition-free.)
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_results_list.
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_results_list)
{
CachedPlan *plan = plansource->gplan;
/* Assert that caller checked the querytree */
Assert(plansource->is_valid);
+ *part_prune_results_list = NIL;
+
/* If there's no generic plan, just say "false" */
if (!plan)
return false;
@@ -820,13 +850,21 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ List *lockedRelids_per_stmt;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ /*
+ * Lock relations scanned by the plan. This is where the pruning
+ * happens if needed.
+ */
+ AcquireExecutorLocks(plan->stmt_list, boundParams,
+ part_prune_results_list,
+ &lockedRelids_per_stmt);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -848,7 +886,11 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ ReleaseExecutorLocks(plan->stmt_list, lockedRelids_per_stmt);
+
+ /* Release any PartitionPruneResults that may have been created. */
+ FreePartitionPruneResults(*part_prune_results_list);
+ *part_prune_results_list = NIL;
}
/*
@@ -874,10 +916,14 @@ CheckCachedPlan(CachedPlanSource *plansource)
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
* (unless this is a one-shot plan, in which case we don't copy the plan).
+ *
+ * A list of NILs is returned in *part_prune_results_list, meaning that
+ * no partition pruning has been done yet for the plans in stmt_list.
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv)
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_results_list)
{
CachedPlan *plan;
List *plist;
@@ -1007,6 +1053,17 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
MemoryContextSwitchTo(oldcxt);
+ /*
+ * No actual PartitionPruneResults to add yet, though we must initialize
+ * the list to have the same number of elements as the list of
+ * PlannedStmts.
+ */
+ *part_prune_results_list = NIL;
+ foreach(lc, plist)
+ {
+ *part_prune_results_list = lappend(*part_prune_results_list, NIL);
+ }
+
return plan;
}
@@ -1126,6 +1183,19 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
+ * For every PlannedStmt found in the returned CachedPlan, an element that
+ * is either a List of PartitionPruneResults or NIL is added to
+ * *part_prune_results_list. The former is used if the PlannedStmt comes
+ * from an existing CachedPlan that is otherwise valid and has
+ * containsInitialPruning set to true. Before returning such a CachedPlan,
+ * its "initial" pruning steps are performed by calling
+ * ExecutorDoInitialPruning(), so that AcquireExecutorLocks() needs to lock
+ * only the leaf partitions whose subplans survive those steps. For each
+ * PartitionPruneInfo found in PlannedStmt.partPruneInfos, a
+ * PartitionPruneResult containing the bitmapset of the indexes of the
+ * surviving subplans is added to the List for the PlannedStmt.
+ *
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
@@ -1139,11 +1209,13 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ List **part_prune_results_list)
{
CachedPlan *plan = NULL;
List *qlist;
bool customplan;
+ List *my_part_prune_results_list;
/* Assert caller is doing things in a sane order */
Assert(plansource->magic == CACHEDPLANSOURCE_MAGIC);
@@ -1160,7 +1232,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (CheckCachedPlan(plansource, boundParams,
+ &my_part_prune_results_list))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1169,7 +1242,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
else
{
/* Build a new generic plan */
- plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv,
+ &my_part_prune_results_list);
/* Just make real sure plansource->gplan is clear */
ReleaseGenericPlan(plansource);
/* Link the new generic plan into the plansource */
@@ -1214,7 +1288,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (customplan)
{
/* Build a custom plan */
- plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv,
+ &my_part_prune_results_list);
/* Accumulate total costs of custom plans */
plansource->total_custom_cost += cached_plan_cost(plan, true);
@@ -1246,6 +1321,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
plan->is_saved = true;
}
+ if (part_prune_results_list)
+ *part_prune_results_list = my_part_prune_results_list;
+
return plan;
}
@@ -1737,17 +1815,29 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_results_list.
+ *
+ * On return, *lockedRelids_per_stmt will contain a bitmapset for every
+ * PlannedStmt in stmt_list, containing the RT indexes of relation entries
+ * in its range table that were actually locked, or NULL if the PlannedStmt
+ * contains a utility statement.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_results_list,
+ List **lockedRelids_per_stmt)
{
ListCell *lc1;
+ *part_prune_results_list = *lockedRelids_per_stmt = NIL;
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ List *part_prune_results = NIL;
+ Bitmapset *allLockRelids;
+ Bitmapset *lockedRelids = NULL;
+ int rti;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1761,13 +1851,40 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
if (query)
- ScanQueryForLocks(query, acquire);
+ ScanQueryForLocks(query, true);
+ *part_prune_results_list = lappend(*part_prune_results_list, NIL);
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ /*
+ * Figure out the set of relations that would need to be locked
+ * before executing the plan.
+ */
+ if (plannedstmt->containsInitialPruning)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ Bitmapset *scan_leafpart_rtis = NULL;
+
+ /*
+ * Obtain the set of leaf partitions to be locked.
+ *
+ * The following does initial partition pruning using the
+ * PartitionPruneInfos found in plannedstmt->partPruneInfos and
+ * finds leaf partitions that survive that pruning across all the
+ * nodes in the plan tree.
+ */
+ part_prune_results = ExecutorDoInitialPruning(plannedstmt,
+ boundParams,
+ &scan_leafpart_rtis);
+ allLockRelids = bms_union(plannedstmt->minLockRelids,
+ scan_leafpart_rtis);
+ }
+ else
+ allLockRelids = plannedstmt->minLockRelids;
+
+ rti = -1;
+ while ((rti = bms_next_member(allLockRelids, rti)) > 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
if (rte->rtekind != RTE_RELATION)
continue;
@@ -1778,10 +1895,59 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
* fail if it's been dropped entirely --- we'll just transiently
* acquire a non-conflicting lock.
*/
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
+ LockRelationOid(rte->relid, rte->rellockmode);
+ lockedRelids = bms_add_member(lockedRelids, rti);
+ }
+
+ *part_prune_results_list = lappend(*part_prune_results_list,
+ part_prune_results);
+ *lockedRelids_per_stmt = lappend(*lockedRelids_per_stmt, lockedRelids);
+ }
+}
+
+/*
+ * ReleaseExecutorLocks
+ * Release locks that would've been acquired by an earlier call to
+ * AcquireExecutorLocks()
+ */
+static void
+ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt)
+{
+ ListCell *lc1,
+ *lc2;
+
+ forboth(lc1, stmt_list, lc2, lockedRelids_per_stmt)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ Bitmapset *lockedRelids = lfirst(lc2);
+ int rti;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /*
+ * Ignore utility statements, except those (such as EXPLAIN) that
+ * contain a parsed-but-not-planned query. Note: it's okay to use
+ * ScanQueryForLocks, even though the query hasn't been through
+ * rule rewriting, because rewriting doesn't change the query
+ * representation.
+ */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ Assert(lockedRelids == NULL);
+ if (query)
+ ScanQueryForLocks(query, false);
+ continue;
+ }
+
+ rti = -1;
+ while ((rti = bms_next_member(lockedRelids, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ Assert(rte->rtekind == RTE_RELATION);
+
+ /* See the comment in AcquireExecutorLocks(). */
+ UnlockRelationOid(rte->relid, rte->rellockmode);
}
}
}
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 7b1ae6fdcf..5b9098971b 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -303,6 +303,25 @@ PortalDefineQuery(Portal portal,
portal->status = PORTAL_DEFINED;
}
+/*
+ * PortalStorePartitionPruneResults
+ * Copy the given List of Lists of PartitionPruneResults into the
+ * portal's context
+ *
+ * This allows the caller to ensure that the list exists as long as the portal
+ * does.
+ */
+void
+PortalStorePartitionPruneResults(Portal portal, List *part_prune_results_list)
+{
+ MemoryContext oldcxt;
+
+ Assert(PortalIsValid(portal));
+ oldcxt = MemoryContextSwitchTo(portal->portalContext);
+ portal->part_prune_results_list = copyObject(part_prune_results_list);
+ MemoryContextSwitchTo(oldcxt);
+}
+
/*
* PortalReleaseCachedPlan
* Release a portal's reference to its cached plan, if any.
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 9ebde089ae..269cc4d562 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,9 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt,
+ List *part_prune_results,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 17fabc18c9..4b98d0d2ef 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -45,6 +45,7 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
+ * rti_map Range table index by partition index, or 0.
* present_parts A Bitmapset of the partition indexes that we
* have subplans or subparts for.
* initial_pruning_steps List of PartitionPruneSteps used to
@@ -61,6 +62,7 @@ typedef struct PartitionedRelPruningData
int nparts;
int *subplan_map;
int *subpart_map;
+ Index *rti_map;
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
@@ -127,5 +129,10 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
Bitmapset *root_parent_relids,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis);
+extern Bitmapset *ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index e79e2c001f..7d4379da7b 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,8 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ List *part_prune_results; /* ExecutorDoInitialPruning()'s
+ * output for plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +59,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ List *part_prune_results,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index aaf2bc78b9..32bbbc5927 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -185,6 +185,9 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
+extern List *ExecutorDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ Bitmapset **scan_leafpart_rtis);
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 71248a9466..9c6e8f5e13 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -619,6 +619,7 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
+ List *es_part_prune_results; /* QueryDesc.part_prune_results */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 1f33902947..c2f2544df5 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -218,6 +218,7 @@ extern struct Bitmapset *readBitmapset(void);
extern uintptr_t readDatum(bool typbyval);
extern bool *readBoolCols(int numCols);
extern int *readIntCols(int numCols);
+extern Index *readIndexCols(int numCols);
extern Oid *readOidCols(int numCols);
extern int16 *readAttrNumberCols(int numCols);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index dbaa9bb54d..e0e5c15b09 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -125,6 +125,18 @@ typedef struct PlannerGlobal
/* List of PartitionPruneInfo contained in the plan */
List *partPruneInfos;
+ /*
+ * Do any of those PartitionPruneInfos have initial pruning steps in them?
+ */
+ bool containsInitialPruning;
+
+ /*
+ * Indexes of all range table entries minus indexes of range table entries
+ * of the leaf partitions scanned by prunable subplans; see
+ * AcquireExecutorLocks()
+ */
+ Bitmapset *minLockRelids;
+
/* OIDs of relations the plan depends on */
List *relationOids;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index c36a15bd09..714e2cf2c7 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -73,8 +73,17 @@ typedef struct PlannedStmt
List *partPruneInfos; /* List of PartitionPruneInfo contained in the
* plan */
+ bool containsInitialPruning; /* Do any of those PartitionPruneInfos
+ * have initial pruning steps in them?
+ */
+
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *minLockRelids; /* Indexes of all range table entries minus
+ * indexes of range table entries of the leaf
+ * partitions scanned by prunable subplans;
+ * see AcquireExecutorLocks() */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE/MERGE */
List *resultRelations; /* integer list of RT indexes, or NIL */
@@ -1414,6 +1423,13 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * needs_init_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its initial_pruning_steps set?
+ *
+ * needs_exec_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its exec_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1425,6 +1441,8 @@ typedef struct PartitionPruneInfo
NodeTag type;
Bitmapset *root_parent_relids;
List *prune_infos;
+ bool needs_init_pruning;
+ bool needs_exec_pruning;
Bitmapset *other_subplans;
} PartitionPruneInfo;
@@ -1469,6 +1487,9 @@ typedef struct PartitionedRelPruneInfo
/* relation OID by partition index, or 0 */
Oid *relid_map pg_node_attr(array_size(nparts));
+ /* Range table index by partition index, or 0. */
+ Index *rti_map pg_node_attr(array_size(nparts));
+
/*
* initial_pruning_steps shows how to prune during executor startup (i.e.,
* without use of any PARAM_EXEC Params); it is NIL if no startup pruning
@@ -1553,6 +1574,31 @@ typedef struct PartitionPruneStepCombine
List *source_stepids;
} PartitionPruneStepCombine;
+/*----------------
+ * PartitionPruneResult
+ *
+ * The result of performing ExecPartitionDoInitialPruning() on a given
+ * PartitionPruneInfo.
+ *
+ * valid_subplan_offs contains the indexes of subplans remaining after
+ * performing initial pruning by calling ExecFindMatchingSubPlans() on the
+ * PartitionPruneInfo.
+ *
+ * This is used to store the result of initial partition pruning that is
+ * performed before the execution has started. A module that needs to do so
+ * should call ExecutorDoInitialPruning() on a given PlannedStmt, which
+ * returns a List of PartitionPruneResult containing an entry for each
+ * PartitionPruneInfo present in PlannedStmt.part_prune_infos. The module
+ * should then pass that list, along with the PlannedStmt, to the executor,
+ * so that it can reuse the result of initial partition pruning when
+ * initializing the subplans for execution.
+ */
+typedef struct PartitionPruneResult
+{
+ NodeTag type;
+
+ Bitmapset *valid_subplan_offs;
+} PartitionPruneResult;
/*
* Plan invalidation info
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 0499635f59..32579d4788 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -220,7 +220,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ List **part_prune_results_list);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aeddbdafe5..1901fc5f28 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,7 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *part_prune_results_list; /* List of Lists of PartitionPruneResults */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -242,6 +243,8 @@ extern void PortalDefineQuery(Portal portal,
CommandTag commandTag,
List *stmts,
CachedPlan *cplan);
+extern void PortalStorePartitionPruneResults(Portal portal,
+ List *part_prune_results_list);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
--
2.35.3
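To summarize the plancache.c changes above, the per-statement locking logic in AcquireExecutorLocks() now boils down to the following sketch (not standalone code; all names are taken from the patch):

/*
 * Sketch of the lock-set computation that the patch adds to
 * AcquireExecutorLocks(), for each PlannedStmt in the cached plan.
 */
List       *part_prune_results = NIL;
Bitmapset  *allLockRelids;
Bitmapset  *scan_leafpart_rtis = NULL;
int         rti;

if (plannedstmt->containsInitialPruning)
{
    /*
     * Do the "initial" pruning steps first, so that only the leaf
     * partitions scanned by surviving subplans enter the lock set.
     */
    part_prune_results = ExecutorDoInitialPruning(plannedstmt, boundParams,
                                                  &scan_leafpart_rtis);
    allLockRelids = bms_union(plannedstmt->minLockRelids,
                              scan_leafpart_rtis);
}
else
    allLockRelids = plannedstmt->minLockRelids;

rti = -1;
while ((rti = bms_next_member(allLockRelids, rti)) > 0)
{
    RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);

    if (rte->rtekind != RTE_RELATION)
        continue;

    LockRelationOid(rte->relid, rte->rellockmode);
}

Note that minLockRelids is computed by the planner to exclude the RT indexes of leaf partitions scanned by prunable subplans, so the bms_union() above adds back only the partitions that survived pruning.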
v27-0002-Add-root_parent_relids-to-PartitionPruneResult.patch
From 4ef1d918405a7c7c63a3e7376ccef57cf844796d Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Fri, 2 Dec 2022 19:32:14 +0900
Subject: [PATCH v27 2/2] Add root_parent_relids to PartitionPruneResult
It's the same as the corresponding PartitionPruneInfo's
root_parent_relids. It's there for cross-checking that a
PartitionPruneResult found at a given plan node's part_prune_index
actually matches that plan node.
---
src/backend/executor/execMain.c | 2 ++
src/backend/executor/execPartition.c | 13 +++++++++++--
src/include/nodes/plannodes.h | 7 +++++++
3 files changed, 20 insertions(+), 2 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4d8c8e2e43..3293a65d15 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -147,6 +147,8 @@ ExecutorDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
PartitionPruneInfo *pruneinfo = lfirst(lc);
PartitionPruneResult *pruneresult = makeNode(PartitionPruneResult);
+ pruneresult->root_parent_relids =
+ bms_copy(pruneinfo->root_parent_relids);
pruneresult->valid_subplan_offs =
ExecPartitionDoInitialPruning(plannedstmt, params, pruneinfo,
scan_leafpart_rtis);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index b0eb15b982..2eadc30ec8 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1843,8 +1843,17 @@ ExecInitPartitionPruning(PlanState *planstate,
*/
if (estate->es_part_prune_results)
{
- pruneresult = list_nth(estate->es_part_prune_results, part_prune_index);
- Assert(IsA(pruneresult, PartitionPruneResult));
+ pruneresult = list_nth_node(PartitionPruneResult,
+ estate->es_part_prune_results,
+ part_prune_index);
+ if (!bms_equal(pruneresult->root_parent_relids, pruneinfo->root_parent_relids))
+ ereport(ERROR,
+ errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg_internal("mismatching PartitionPruneInfo and PartitionPruneResult at part_prune_index %d",
+ part_prune_index),
+ errdetail_internal("prunresult relids %s, pruneinfo relids %s",
+ bmsToString(pruneresult->root_parent_relids),
+ bmsToString(pruneinfo->root_parent_relids)));
}
if (pruneresult == NULL || pruneinfo->needs_exec_pruning)
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 714e2cf2c7..ed664c5469 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -1580,6 +1580,12 @@ typedef struct PartitionPruneStepCombine
* The result of performing ExecPartitionDoInitialPruning() on a given
* PartitionPruneInfo.
*
+ * root_parent_relids is the same as PartitionPruneInfo.root_parent_relids. It's
+ * there for cross-checking in ExecInitPartitionPruning() that the
+ * PartitionPruneResult and the PartitionPruneInfo at a given index in
+ * EState.es_part_prune_results and EState.es_part_prune_infos, respectively,
+ * belong to the same parent plan node.
+ *
+ * valid_subplan_offs contains the indexes of subplans remaining after
* performing initial pruning by calling ExecFindMatchingSubPlans() on the
* PartitionPruneInfo.
@@ -1597,6 +1603,7 @@ typedef struct PartitionPruneResult
{
NodeTag type;
+ Bitmapset *root_parent_relids;
Bitmapset *valid_subplan_offs;
} PartitionPruneResult;
--
2.35.3
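For reference, the way ExecInitPartitionPruning() consumes a pre-computed PartitionPruneResult, including the cross-check this patch adds, condenses into the following sketch (assembled from the 0001 and 0002 hunks; not standalone code):

PartitionPruneResult *pruneresult = NULL;

if (estate->es_part_prune_results)
{
    pruneresult = list_nth_node(PartitionPruneResult,
                                estate->es_part_prune_results,
                                part_prune_index);

    /* 0002: cross-check that the result belongs to this plan node. */
    if (!bms_equal(pruneresult->root_parent_relids,
                   pruneinfo->root_parent_relids))
        elog(ERROR, "mismatching PartitionPruneInfo and PartitionPruneResult at part_prune_index %d",
             part_prune_index);
}

if (pruneresult)
{
    /*
     * Reuse the saved set instead of redoing initial pruning; redoing it
     * might select subplans whose relations AcquireExecutorLocks() never
     * locked.
     */
    *initially_valid_subplans = bms_copy(pruneresult->valid_subplan_offs);
}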
On Mon, Dec 5, 2022 at 12:00 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Fri, Dec 2, 2022 at 7:40 PM Amit Langote <amitlangote09@gmail.com> wrote:
Thought it might be good for PartitionPruneResult to also have
root_parent_relids that matches the corresponding
PartitionPruneInfo. ExecInitPartitionPruning() does a sanity check
that the root_parent_relids of a given pair of PartitionPrune{Info |
Result} match.

Posting the patch separately as the attached 0002, just in case you
might think that the extra cross-checking would be overkill.

Rebased over 92c4dafe1eed and fixed some factual mistakes in the
comment above ExecutorDoInitialPruning().

Sorry, I had forgotten to git-add hunks including some cosmetic
changes in that one. Here's another version.
--
Thanks,
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v28-0002-Add-root_parent_relids-to-PartitionPruneResult.patch
From 04f156396309f8c34a853ce1ad4e293fe4e2c4a2 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Fri, 2 Dec 2022 19:32:14 +0900
Subject: [PATCH v28 2/2] Add root_parent_relids to PartitionPruneResult
It's the same as the corresponding PartitionPruneInfo's
root_parent_relids. It's there for cross-checking that a
PartitionPruneResult found at a given plan node's part_prune_index
actually matches that plan node.
---
src/backend/executor/execMain.c | 2 ++
src/backend/executor/execPartition.c | 10 ++++++++++
src/include/nodes/plannodes.h | 7 +++++++
3 files changed, 19 insertions(+)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index f15265716a..554623751b 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -147,6 +147,8 @@ ExecutorDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
PartitionPruneResult *pruneresult = makeNode(PartitionPruneResult);
+ pruneresult->root_parent_relids =
+ bms_copy(pruneinfo->root_parent_relids);
pruneresult->valid_subplan_offs =
ExecPartitionDoInitialPruning(plannedstmt, params, pruneinfo,
scan_leafpart_rtis);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index bc8331a222..2eadc30ec8 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1842,9 +1842,19 @@ ExecInitPartitionPruning(PlanState *planstate,
* is set.
*/
if (estate->es_part_prune_results)
+ {
pruneresult = list_nth_node(PartitionPruneResult,
estate->es_part_prune_results,
part_prune_index);
+ if (!bms_equal(pruneresult->root_parent_relids, pruneinfo->root_parent_relids))
+ ereport(ERROR,
+ errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg_internal("mismatching PartitionPruneInfo and PartitionPruneResult at part_prune_index %d",
+ part_prune_index),
+ errdetail_internal("prunresult relids %s, pruneinfo relids %s",
+ bmsToString(pruneresult->root_parent_relids),
+ bmsToString(pruneinfo->root_parent_relids)));
+ }
if (pruneresult == NULL || pruneinfo->needs_exec_pruning)
{
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 714e2cf2c7..ed664c5469 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -1580,6 +1580,12 @@ typedef struct PartitionPruneStepCombine
* The result of performing ExecPartitionDoInitialPruning() on a given
* PartitionPruneInfo.
*
+ * root_parent_relids is the same as PartitionPruneInfo.root_parent_relids. It's
+ * there for cross-checking in ExecInitPartitionPruning() that the
+ * PartitionPruneResult and the PartitionPruneInfo at a given index in
+ * EState.es_part_prune_results and EState.es_part_prune_infos, respectively,
+ * belong to the same parent plan node.
+ *
+ * valid_subplan_offs contains the indexes of subplans remaining after
* performing initial pruning by calling ExecFindMatchingSubPlans() on the
* PartitionPruneInfo.
@@ -1597,6 +1603,7 @@ typedef struct PartitionPruneResult
{
NodeTag type;
+ Bitmapset *root_parent_relids;
Bitmapset *valid_subplan_offs;
} PartitionPruneResult;
--
2.35.3
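Caller-side, the contract of the reworked GetCachedPlan() is the one visible in the exec_bind_message() and ExecuteQuery() hunks of 0001; a minimal sketch, with all names taken from the patches:

List       *part_prune_results_list;
CachedPlan *cplan;

/* Replan if needed; initial pruning may run inside to limit locking. */
cplan = GetCachedPlan(psrc, params, NULL, NULL, &part_prune_results_list);

/* One entry, possibly NIL, per PlannedStmt in the cached plan. */
Assert(list_length(cplan->stmt_list) ==
       list_length(part_prune_results_list));

/* Copy the results into the portal's context so they live long enough. */
PortalStorePartitionPruneResults(portal, part_prune_results_list);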
v28-0001-Optimize-AcquireExecutorLocks-by-locking-only-un.patch
From 28bdd07ae15228bc3173257ab5968864455dda16 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v28 1/2] Optimize AcquireExecutorLocks() by locking only
unpruned partitions
This commit teaches AcquireExecutorLocks() to perform initial
partition pruning to notionally eliminate the subnodes contained in a
generic cached plan that need not be initialized during the actual
execution of the plan, and to skip locking the partitions scanned by
those subnodes.
The result of performing initial partition pruning this way before the
actual execution has started is passed to the executor via
PartitionPruneResult, supplied along with the PlannedStmt by the callers
of the executor that used plancache.c to get the plan. It is NULL when
the plan is obtained by calling the planner directly or when the plan
obtained from plancache.c is not a generic one.
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 2 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/prepare.c | 26 ++-
src/backend/executor/README | 36 ++++
src/backend/executor/execMain.c | 53 ++++++
src/backend/executor/execParallel.c | 26 ++-
src/backend/executor/execPartition.c | 237 +++++++++++++++++++++----
src/backend/executor/execUtils.c | 1 +
src/backend/executor/functions.c | 2 +-
src/backend/executor/nodeAppend.c | 11 +-
src/backend/executor/nodeMergeAppend.c | 5 +-
src/backend/executor/spi.c | 27 ++-
src/backend/nodes/readfuncs.c | 8 +-
src/backend/optimizer/plan/planner.c | 2 +
src/backend/optimizer/plan/setrefs.c | 46 +++++
src/backend/partitioning/partprune.c | 41 ++++-
src/backend/tcop/postgres.c | 8 +-
src/backend/tcop/pquery.c | 29 ++-
src/backend/utils/cache/plancache.c | 208 +++++++++++++++++++---
src/backend/utils/mmgr/portalmem.c | 19 ++
src/include/commands/explain.h | 4 +-
src/include/executor/execPartition.h | 9 +-
src/include/executor/execdesc.h | 3 +
src/include/executor/executor.h | 3 +
src/include/nodes/execnodes.h | 1 +
src/include/nodes/nodes.h | 1 +
src/include/nodes/pathnodes.h | 12 ++
src/include/nodes/plannodes.h | 46 +++++
src/include/utils/plancache.h | 3 +-
src/include/utils/portal.h | 3 +
33 files changed, 787 insertions(+), 98 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index f26cc0d162..401a2280a3 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -558,7 +558,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NIL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 152c29b551..942449544c 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NIL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index f86983c660..2f2b558608 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -407,7 +407,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NIL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
@@ -515,7 +515,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, List *part_prune_results,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
@@ -563,7 +564,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, part_prune_results, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index cf1b1ca571..904cbcba4a 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -779,7 +779,7 @@ execute_sql_string(const char *sql)
{
QueryDesc *qdesc;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, NIL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 9ac0383459..65c8d0aa59 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -408,7 +408,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NIL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 9e29584d93..29b45539d3 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -155,6 +155,7 @@ ExecuteQuery(ParseState *pstate,
PreparedStatement *entry;
CachedPlan *cplan;
List *plan_list;
+ List *part_prune_results_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
Portal portal;
@@ -193,7 +194,10 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
plan_list = cplan->stmt_list;
/*
@@ -207,6 +211,9 @@ ExecuteQuery(ParseState *pstate,
plan_list,
cplan);
+ /* Copy Lists of PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_results_list);
+
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
* statement is one that produces tuples. Currently we insist that it be
@@ -576,7 +583,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
- ListCell *p;
+ List *part_prune_results_list;
+ ListCell *p,
+ *pp;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -619,7 +628,10 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Replan if needed, and acquire a transient refcount */
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, queryEnv);
+ CurrentResourceOwner, queryEnv,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -634,13 +646,15 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
plan_list = cplan->stmt_list;
/* Explain each query */
- foreach(p, plan_list)
+ forboth(p, plan_list, pp, part_prune_results_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ List *part_prune_results = lfirst_node(List, pp);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ ExplainOnePlan(pstmt, part_prune_results, into, es, query_string,
+ paramLI, queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 17775a49e2..7f8cf1494f 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -65,6 +65,38 @@ found there. This currently only occurs for Append and MergeAppend nodes. In
this case the non-required subplans are ignored and the executor state's
subnode array will become out of sequence to the plan's subplan list.
+The so-called execution time pruning may also occur even before the execution
+has actually started. One case where that occurs is when a cached generic
+plan is being validated for execution by plancache.c:GetCachedPlan(), which
+works by locking all the relations that will be scanned by that plan. If the
+generic plan contains nodes that can perform execution time partition pruning
+(that is, contain a PartitionPruneInfo), a subset of pruning steps contained
+in a given node's PartitionPruneInfo that do not depend on the execution
+actually having started (called "initial" pruning steps) are performed as part
+of the plan validation step, by calling ExecutorDoInitialPruning(). That
+returns the minimal set of child subplans that satisfy thoe initial pruning
+steps contained in each PartitionPruneInfo. AcquireExecutorLocks() will then
+lock only the relations scanned by those subplans, in addition to those present
+inPlannedStmt.minLockRelids. Note that the subplans are not really pruned as
+in being removed from the plan tree, so care is needed by the downstreams
+users of such a plan that has undergone pre-execution initial pruning.
+
+To prevent the executor and any third party execution code that can look at
+the plan tree from trying to execute the subplans that were pruned as
+described above, the result of that pruning is passed to the executor as a
+List of PartitionPruneResult nodes via the QueryDesc, which is subsequently
+assigned to EState.es_part_prune_results. Each PartitionPruneResult therein
+consists of the set of indexes of surviving subplans in the respective parent
+plan node's (the one to which the corresponding PartitionPruneInfo belongs)
+list of child subplans, saved as a bitmapset valid_subplan_offs. The executor
+or any third party execution code working on a generic plan should not
+re-evaluate the set of initially valid subplans for a given plan node by
+redoing the initial pruning if a PartitionPruneResult belonging to that plan
+node is present in es_part_prune_results. Note that this is not simply a
+performance optimization, because such re-evaluation of the pruning steps may
+very well end up resulting in a different set of initially valid subplans,
+containing some whose relations were not locked by AcquireExecutorLocks().
+
Each Plan node may have expression trees associated with it, to represent
its target list, qualification conditions, etc. These trees are also
read-only to the executor, but the executor state for expression evaluation
@@ -286,6 +318,10 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ [ ExecutorDoInitialPruning ] --- an optional step to perform initial
+ partition pruning on the plan tree, the result of which is passed
+ to the executor via QueryDesc
+
CreateQueryDesc
ExecutorStart
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 12ff4f3de5..f15265716a 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -49,6 +49,7 @@
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/execdebug.h"
+#include "executor/execPartition.h"
#include "executor/nodeSubplan.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
@@ -104,6 +105,56 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
/* end of local decls */
+/* ----------------------------------------------------------------
+ * ExecutorDoInitialPruning
+ *
+ * For each plan tree node that has been assigned a PartitionPruneInfo,
+ * this performs initial partition pruning using the information contained
+ * therein to determine the set of child subplans that satisfy the initial
+ * pruning steps, to be returned as a bitmapset of their indexes in the
+ * node's list of child subplans (for example, an Append's appendplans).
+ *
+ * Return value is a List of PartitionPruneResult nodes, one for each
+ * PartitionPruneInfo found in plannedstmt->partPruneInfos, each
+ * containing a bitmapset of the indexes of unpruned child subplans.
+ * A bitmapset of the RT indexes of the leaf partitions scanned by those
+ * subplans is returned in *scan_leafpart_rtis, which is shared across all
+ * of those PartitionPruneResults.
+ *
+ * The executor must see exactly the same set of subplans as valid for
+ * execution when doing ExecInitNode() on the plan nodes whose
+ * PartitionPruneInfos are processed here. So, it must get the set from the
+ * aforementioned PartitionPruneResult, instead of computing it all over
+ * again by redoing the initial pruning. It's the caller's job to pass the
+ * PartitionPruneResult to the executor.
+ *
+ * Note: Partitioned tables mentioned in PartitionedRelPruneInfo nodes that
+ * drive the pruning will be locked before doing the pruning.
+ * ----------------------------------------------------------------
+ */
+List *
+ExecutorDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *part_prune_results = NIL;
+ ListCell *lc;
+
+ /* Only get here if there is any pruning to do. */
+ Assert(plannedstmt->containsInitialPruning);
+
+ foreach(lc, plannedstmt->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
+ PartitionPruneResult *pruneresult = makeNode(PartitionPruneResult);
+
+ pruneresult->valid_subplan_offs =
+ ExecPartitionDoInitialPruning(plannedstmt, params, pruneinfo,
+ scan_leafpart_rtis);
+ part_prune_results = lappend(part_prune_results, pruneresult);
+ }
+
+ return part_prune_results;
+}
/* ----------------------------------------------------------------
* ExecutorStart
@@ -806,6 +857,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ List *part_prune_results = queryDesc->part_prune_results;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -826,6 +878,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_plannedstmt = plannedstmt;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ estate->es_part_prune_results = part_prune_results;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index aca0c6f323..917079a034 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -66,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITION_PRUNE_RESULTS UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -182,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->containsInitialPruning = false;
pstmt->planTree = plan;
pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
@@ -597,12 +599,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_results_data;
+ char *part_prune_results_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_results_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -631,6 +636,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_results_data = nodeToString(estate->es_part_prune_results);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -657,6 +663,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized List of PartitionPruneResult. */
+ part_prune_results_len = strlen(part_prune_results_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_results_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -751,6 +762,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized List of PartitionPruneResult */
+ part_prune_results_space = shm_toc_allocate(pcxt->toc, part_prune_results_len);
+ memcpy(part_prune_results_space, part_prune_results_data, part_prune_results_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS,
+ part_prune_results_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1232,8 +1249,10 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_results_space;
char *paramspace;
PlannedStmt *pstmt;
+ List *part_prune_results;
ParamListInfo paramLI;
char *queryString;
@@ -1244,12 +1263,17 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
pstmtspace = shm_toc_lookup(toc, PARALLEL_KEY_PLANNEDSTMT, false);
pstmt = (PlannedStmt *) stringToNode(pstmtspace);
+ /* Reconstruct leader-supplied PartitionPruneResult. */
+ part_prune_results_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, false);
+ part_prune_results = (List *) stringToNode(part_prune_results_space);
+
/* Reconstruct ParamListInfo. */
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(¶mspace);
/* Create a QueryDesc for the query. */
- return CreateQueryDesc(pstmt,
+ return CreateQueryDesc(pstmt, part_prune_results,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 88d0ea3adb..bc8331a222 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -25,6 +25,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -185,7 +186,11 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *pruneinfo);
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -198,7 +203,8 @@ static void PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans);
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis);
/*
@@ -1749,8 +1755,10 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* considered to be a stable expression, it can change value from one plan
* node scan to the next during query execution. Stable comparison
* expressions that don't involve such Params allow partition pruning to be
- * done once during executor startup. Expressions that do involve such Params
- * require us to prune separately for each scan of the parent plan node.
+ * done once during executor startup or during ExecutorDoInitialPruning(),
+ * which runs as part of performing AcquireExecutorLocks() on a given plan tree.
+ * Expressions that do involve such Params require us to prune separately for
+ * each scan of the parent plan node.
*
* Note that pruning away unneeded subplans during executor startup has the
* added benefit of not having to initialize the unneeded subplans at all.
@@ -1767,6 +1775,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* account for initial pruning possibly having eliminated some of the
* subplans.
*
+ * ExecPartitionDoInitialPruning:
+ * Do initial pruning with the information contained in a given
+ * PartitionPruneInfo to determine the minimal set of child subplans of
+ * the parent plan node (the one the PartitionPruneInfo belongs to) that
+ * must be executed, and also the set of RT indexes of the leaf partitions
+ * that will be scanned by those subplans.
+ *
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating the expressions
* that are safe to evaluate at a given point. This function is first
@@ -1787,8 +1802,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* On return, *initially_valid_subplans is assigned the set of indexes of
* child subplans that must be initialized along with the parent plan node.
- * Initial pruning is performed here if needed and in that case only the
- * surviving subplans' indexes are added.
+ * Initial pruning is performed here if needed (unless it has already been done
+ * by ExecutorDoInitialPruning()), and in that case only the surviving
+ * subplans' indexes are added.
*
* If subplans are indeed pruned, subplan_map arrays contained in the returned
* PartitionPruneState are re-sequenced to not count those, though only if the
@@ -1801,9 +1817,10 @@ ExecInitPartitionPruning(PlanState *planstate,
Bitmapset *root_parent_relids,
Bitmapset **initially_valid_subplans)
{
- PartitionPruneState *prunestate;
+ PartitionPruneState *prunestate = NULL;
EState *estate = planstate->state;
PartitionPruneInfo *pruneinfo;
+ PartitionPruneResult *pruneresult = NULL;
/* Obtain the pruneinfo we need, and make sure it's the right one */
pruneinfo = list_nth(estate->es_part_prune_infos, part_prune_index);
@@ -1819,20 +1836,56 @@ ExecInitPartitionPruning(PlanState *planstate,
/* We may need an expression context to evaluate partition exprs */
ExecAssignExprContext(estate, planstate);
- /* Create the working data structure for pruning */
- prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ /*
+ * No need to do initial pruning if it has already been done by
+ * ExecutorDoInitialPruning(), which it would have been if
+ * es_part_prune_results is set.
+ */
+ if (estate->es_part_prune_results)
+ pruneresult = list_nth_node(PartitionPruneResult,
+ estate->es_part_prune_results,
+ part_prune_index);
+
+ if (pruneresult == NULL || pruneinfo->needs_exec_pruning)
+ {
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
+
+ /* For data reading, executor always omits detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
+ /*
+ * Create the working data structure for pruning. No need to consider
+ * initial pruning steps if we have a PartitionPruneResult.
+ */
+ prunestate = CreatePartitionPruneState(planstate, pruneinfo,
+ pruneresult == NULL,
+ pruneinfo->needs_exec_pruning,
+ NIL, planstate->ps_ExprContext,
+ estate->es_partition_directory);
+ }
/*
* Perform an initial partition prune pass, if required.
*/
- if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ if (pruneresult)
+ {
+ *initially_valid_subplans = bms_copy(pruneresult->valid_subplan_offs);
+ }
+ else if (prunestate && prunestate->do_initial_prune)
+ {
+ *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true,
+ NULL);
+ }
else
{
- /* No pruning, so we'll need to initialize all subplans */
+ /* No initial pruning, so we'll need to initialize all subplans */
Assert(n_total_subplans > 0);
*initially_valid_subplans = bms_add_range(NULL, 0,
n_total_subplans - 1);
+ return prunestate;
}
/*
@@ -1840,7 +1893,8 @@ ExecInitPartitionPruning(PlanState *planstate,
* that were removed above due to initial pruning. No need to do this if
* no steps were removed.
*/
- if (bms_num_members(*initially_valid_subplans) < n_total_subplans)
+ if (prunestate &&
+ bms_num_members(*initially_valid_subplans) < n_total_subplans)
{
/*
* We can safely skip this when !do_exec_prune, even though that
@@ -1856,11 +1910,74 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecPartitionDoInitialPruning
+ * Perform initial pruning using the given PartitionPruneInfo to determine
+ * the minimal set of child subplans that will be executed, and also the
+ * set of RT indexes of the leaf partitions scanned by those subplans.
+ */
+Bitmapset *
+ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *rtable = plannedstmt->rtable;
+ ExprContext *econtext;
+ PartitionDirectory pdir;
+ MemoryContext oldcontext,
+ tmpcontext;
+ PartitionPruneState *prunestate;
+ Bitmapset *valid_subplan_offs;
+
+ /*
+ * A temporary context for memory allocations required while executing
+ * partition pruning steps.
+ */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "initial pruning working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
+
+ /*
+ * Create a PartitionDirectory to look up partition descriptors.  Note
+ * that we don't omit detached partitions, just as during execution
+ * proper.
+ */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+
+ /*
+ * We don't yet have a PlanState for the parent plan node, so we must
+ * create a standalone ExprContext to evaluate pruning expressions,
+ * equipped with the information about the EXTERN parameters that the
+ * caller passed us.  Note that this is okay because the initial pruning
+ * steps do not contain anything that requires execution to have started,
+ * and thus nothing that needs the information contained in a PlanState.
+ */
+ econtext = CreateStandaloneExprContext();
+ econtext->ecxt_param_list_info = params;
+ prunestate = CreatePartitionPruneState(NULL, pruneinfo, true, false,
+ rtable, econtext, pdir);
+ MemoryContextSwitchTo(oldcontext);
+
+ /* Do the initial pruning. */
+ valid_subplan_offs = ExecFindMatchingSubPlans(prunestate, true,
+ scan_leafpart_rtis);
+
+ FreeExprContext(econtext, true);
+ DestroyPartitionDirectory(pdir);
+ MemoryContextDelete(tmpcontext);
+
+ return valid_subplan_offs;
+}
+
/*
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
*
- * 'planstate' is the parent plan node's execution state.
+ * 'planstate', if not NULL, is the parent plan node's execution state. It
+ * can be NULL when called before ExecutorStart(), in which case
+ * 'rtable' (range table), 'econtext', and 'partdir' must be provided
+ * explicitly.
*
* 'pruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
@@ -1874,19 +1991,21 @@ ExecInitPartitionPruning(PlanState *planstate,
* PartitionedRelPruneInfo.
*/
static PartitionPruneState *
-CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
+CreatePartitionPruneState(PlanState *planstate,
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir)
{
- EState *estate = planstate->state;
+ EState *estate = planstate ? planstate->state : NULL;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
- /* For data reading, executor always omits detached partitions */
- if (estate->es_partition_directory == NULL)
- estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
+ Assert((estate != NULL) ||
+ (partdir != NULL && econtext != NULL && rtable != NIL));
n_part_hierarchies = list_length(pruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
@@ -1941,15 +2060,42 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * Must open the relation ourselves when called before execution has
+ * started, such as during ExecutorDoInitialPruning() on a cached
+ * plan.  In that case, sub-partitions must be locked, because
+ * AcquirePlannerLocks() would not have seen them.  (The first
+ * relation in a partrelpruneinfos list is always the root
+ * partitioned table appearing in the query, which
+ * AcquirePlannerLocks() would have locked; the Assert in
+ * relation_open() guards that assumption.)
+ */
+ if (estate == NULL)
+ {
+ RangeTblEntry *rte = rt_fetch(pinfo->rtindex, rtable);
+ int lockmode = (j == 0) ? NoLock : rte->rellockmode;
+
+ partrel = table_open(rte->relid, lockmode);
+ }
+ else
+ partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+
+ /*
+ * We can rely on the copy of the partitioned table's partition
+ * key in its relcache entry, because it can't change (or get
+ * destroyed) as long as the relation is locked.  The partition
+ * descriptor is taken from the PartitionDirectory associated with
+ * the table, which is held open long enough for the descriptor to
+ * remain valid while it's used to perform the pruning steps.
*/
- partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
partkey = RelationGetPartitionKey(partrel);
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partdesc = PartitionDirectoryLookup(partdir, partrel);
+
+ /*
+ * Must close partrel, while keeping the lock we took, if we're not
+ * using the EState's entry.
+ */
+ if (estate == NULL)
+ table_close(partrel, NoLock);
/*
* Initialize the subplan_map and subpart_map.
@@ -1963,6 +2109,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
Assert(partdesc->nparts >= pinfo->nparts);
pprune->nparts = partdesc->nparts;
pprune->subplan_map = palloc(sizeof(int) * partdesc->nparts);
+ pprune->rti_map = palloc(sizeof(Index) * partdesc->nparts);
if (partdesc->nparts == pinfo->nparts)
{
/*
@@ -1973,6 +2120,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pprune->subpart_map = pinfo->subpart_map;
memcpy(pprune->subplan_map, pinfo->subplan_map,
sizeof(int) * pinfo->nparts);
+ memcpy(pprune->rti_map, pinfo->rti_map,
+ sizeof(int) * pinfo->nparts);
/*
* Double-check that the list of unpruned relations has not
@@ -2023,6 +2172,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pinfo->subplan_map[pd_idx];
pprune->subpart_map[pp_idx] =
pinfo->subpart_map[pd_idx];
+ pprune->rti_map[pp_idx] =
+ pinfo->rti_map[pd_idx];
pd_idx++;
}
else
@@ -2030,6 +2181,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
/* this partdesc entry is not in the plan */
pprune->subplan_map[pp_idx] = -1;
pprune->subpart_map[pp_idx] = -1;
+ pprune->rti_map[pp_idx] = 0;
}
}
@@ -2051,7 +2203,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* Initialize pruning contexts as needed.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
- if (pinfo->initial_pruning_steps)
+ if (consider_initial_steps && pinfo->initial_pruning_steps)
{
InitPartitionPruneContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
@@ -2061,7 +2213,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps)
+ if (consider_exec_steps && pinfo->exec_pruning_steps)
{
InitPartitionPruneContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
@@ -2289,10 +2441,14 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
* runtime pruning.
+ *
+ * RT indexes of leaf partitions scanned by the chosen subplans are added to
+ * *scan_leafpart_rtis if the pointer is non-NULL.
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2327,7 +2483,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
*/
pprune = &prunedata->partrelprunedata[0];
find_matching_subplans_recurse(prunedata, pprune, initial_prune,
- &result);
+ &result, scan_leafpart_rtis);
/* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
@@ -2341,6 +2497,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_copy(*scan_leafpart_rtis);
MemoryContextReset(prunestate->prune_context);
@@ -2351,13 +2509,15 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* find_matching_subplans_recurse
* Recursive worker function for ExecFindMatchingSubPlans
*
- * Adds valid (non-prunable) subplan IDs to *validsubplans
+ * Adds valid (non-prunable) subplan IDs to *validsubplans and the RT indexes
+ * of the corresponding leaf partitions to *scan_leafpart_rtis (if asked for).
*/
static void
find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans)
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *partset;
int i;
@@ -2384,8 +2544,14 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
while ((i = bms_next_member(partset, i)) >= 0)
{
if (pprune->subplan_map[i] >= 0)
+ {
*validsubplans = bms_add_member(*validsubplans,
pprune->subplan_map[i]);
+ Assert(pprune->rti_map[i] > 0);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_add_member(*scan_leafpart_rtis,
+ pprune->rti_map[i]);
+ }
else
{
int partidx = pprune->subpart_map[i];
@@ -2393,7 +2559,8 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
if (partidx >= 0)
find_matching_subplans_recurse(prunedata,
&prunedata->partrelprunedata[partidx],
- initial_prune, validsubplans);
+ initial_prune, validsubplans,
+ scan_leafpart_rtis);
else
{
/*
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 572c87e453..044bf3f491 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -135,6 +135,7 @@ CreateExecutorState(void)
estate->es_param_exec_vals = NULL;
estate->es_queryEnv = NULL;
+ estate->es_part_prune_results = NIL;
estate->es_query_cxt = qcontext;
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index dc13625171..bffb42ce71 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -842,7 +842,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
- es->qd = CreateQueryDesc(es->stmt,
+ es->qd = CreateQueryDesc(es->stmt, NIL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 99830198bd..3b917584de 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -156,7 +156,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (appendstate->as_prune_state == NULL ||
+ (!appendstate->as_prune_state->do_exec_prune && nplans > 0))
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -578,7 +579,7 @@ choose_next_subplan_locally(AppendState *node)
}
else if (node->as_valid_subplans == NULL)
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
whichplan = -1;
}
@@ -643,7 +644,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
/*
* Mark each invalid plan as finished to allow the loop below to
@@ -718,7 +719,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
mark_invalid_subplans_as_finished(node);
}
@@ -869,7 +870,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
classify_matching_subplans(node);
}
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index f370f9f287..ccfa083945 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -104,7 +104,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplan, we can fill ms_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (mergestate->ms_prune_state == NULL ||
+ (!mergestate->ms_prune_state->do_exec_prune && nplans > 0))
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -219,7 +220,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index fd5796f1b9..93012a5b3b 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1578,6 +1578,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
CachedPlanSource *plansource;
CachedPlan *cplan;
List *stmt_list;
+ List *part_prune_results_list;
char *query_string;
Snapshot snapshot;
MemoryContext oldcontext;
@@ -1657,7 +1658,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1685,6 +1689,9 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
stmt_list,
cplan);
+ /* Copy Lists of PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_results_list);
+
/*
* Set up options for portal. Default SCROLL type is chosen the same way
* as PerformCursorOpen does it.
@@ -2092,7 +2099,8 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
/* Get the generic plan for the query */
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ NULL /* Not interested in PartitionPruneResults */);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2473,7 +2481,9 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
- ListCell *lc2;
+ List *part_prune_results_list;
+ ListCell *lc2,
+ *lc3;
spicallbackarg.query = plansource->query_string;
@@ -2549,8 +2559,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* plan, the refcount must be backed by the plan_owner.
*/
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
-
+ plan_owner, _SPI_current->queryEnv,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
stmt_list = cplan->stmt_list;
/*
@@ -2589,9 +2601,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, part_prune_results_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ List *part_prune_results = lfirst_node(List, lc3);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2663,7 +2676,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
else
snap = InvalidSnapshot;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, part_prune_results,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 23776367c5..b01f55fb4f 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -158,6 +158,11 @@
token = pg_strtok(&length); /* skip :fldname */ \
local_node->fldname = readIntCols(len)
+/* Read an Index array */
+#define READ_INDEX_ARRAY(fldname, len) \
+ token = pg_strtok(&length); /* skip :fldname */ \
+ local_node->fldname = readIndexCols(len)
+
/* Read a bool array */
#define READ_BOOL_ARRAY(fldname, len) \
token = pg_strtok(&length); /* skip :fldname */ \
@@ -800,7 +805,6 @@ fnname(int numCols) \
*/
READ_SCALAR_ARRAY(readAttrNumberCols, int16, atoi)
READ_SCALAR_ARRAY(readOidCols, Oid, atooid)
-/* outfuncs.c has writeIndexCols, but we don't yet need that here */
-/* READ_SCALAR_ARRAY(readIndexCols, Index, atoui) */
+READ_SCALAR_ARRAY(readIndexCols, Index, atoui)
READ_SCALAR_ARRAY(readIntCols, int, atoi)
READ_SCALAR_ARRAY(readBoolCols, bool, strtobool)
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 799602f5ea..a96d316dca 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -520,7 +520,9 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
result->partPruneInfos = glob->partPruneInfos;
+ result->containsInitialPruning = glob->containsInitialPruning;
result->rtable = glob->finalrtable;
+ result->minLockRelids = glob->minLockRelids;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 399c1812d4..44ffe71c49 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -270,6 +270,16 @@ set_plan_references(PlannerInfo *root, Plan *plan)
*/
add_rtes_to_flat_rtable(root, false);
+ /*
+ * Add the query's adjusted range of RT indexes to glob->minLockRelids.
+ * The adjusted RT indexes of prunable relations will be deleted from the
+ * set below where PartitionPruneInfos are processed.
+ */
+ glob->minLockRelids =
+ bms_add_range(glob->minLockRelids,
+ rtoffset + 1,
+ rtoffset + list_length(root->parse->rtable));
+
/*
* Adjust RT indexes of PlanRowMarks and add to final rowmarks list
*/
@@ -353,6 +363,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
{
PartitionPruneInfo *pruneinfo = lfirst(lc);
ListCell *l;
+ Bitmapset *leafpart_rtis = NULL;
pruneinfo->root_parent_relids =
offset_relid_set(pruneinfo->root_parent_relids, rtoffset);
@@ -364,15 +375,50 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach(l2, prune_infos)
{
PartitionedRelPruneInfo *pinfo = lfirst(l2);
+ int i;
/* RT index of the table to which the pinfo belongs. */
pinfo->rtindex += rtoffset;
+
+ /* Also of the leaf partitions that might be scanned. */
+ for (i = 0; i < pinfo->nparts; i++)
+ {
+ if (pinfo->rti_map[i] > 0 && pinfo->subplan_map[i] >= 0)
+ {
+ pinfo->rti_map[i] += rtoffset;
+ leafpart_rtis = bms_add_member(leafpart_rtis,
+ pinfo->rti_map[i]);
+ }
+ }
}
}
+ if (pruneinfo->needs_init_pruning)
+ {
+ glob->containsInitialPruning = true;
+
+ /*
+ * Delete the leaf partition RTIs from the global set of relations
+ * to be locked before executing the plan. AcquireExecutorLocks()
+ * will find the ones to add to the set after performing initial
+ * pruning.
+ */
+ glob->minLockRelids = bms_del_members(glob->minLockRelids,
+ leafpart_rtis);
+ }
+
glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
}
+ /*
+ * It seems worth doing a bms_copy() on glob->minLockRelids if we deleted
+ * bits from it above, to get rid of any trailing empty words, so that the
+ * loop over this set in AcquireExecutorLocks() doesn't have to go through
+ * those useless bit words.
+ */
+ if (glob->containsInitialPruning)
+ glob->minLockRelids = bms_copy(glob->minLockRelids);
+
return result;
}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index d48f6784c1..d5556354f7 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,9 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -234,6 +236,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool needs_init_pruning = false;
+ bool needs_exec_pruning = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -313,12 +317,16 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_needs_init_pruning;
+ bool partrel_needs_exec_pruning;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_needs_init_pruning,
+ &partrel_needs_exec_pruning);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -327,6 +335,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+
+ needs_init_pruning |= partrel_needs_init_pruning;
+ needs_exec_pruning |= partrel_needs_exec_pruning;
}
pfree(relid_subplan_map);
@@ -342,6 +353,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->root_parent_relids = parentrel->relids;
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->needs_init_pruning = needs_init_pruning;
+ pruneinfo->needs_exec_pruning = needs_exec_pruning;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -442,13 +455,18 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *needs_init_pruning and *needs_exec_pruning are set to indicate whether
+ * the returned PartitionedRelPruneInfos contain pruning steps that can be
+ * performed before and after execution begins, respectively.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -459,6 +477,10 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will find out below. */
+ *needs_init_pruning = false;
+ *needs_exec_pruning = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -546,6 +568,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+ * In the first pass, we also determine whether the second pass will be
+ * necessary, by noting the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -620,6 +645,12 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ /* record which types of pruning steps we've seen so far */
+ if (initial_pruning_steps != NIL)
+ *needs_init_pruning = true;
+ if (exec_pruning_steps != NIL)
+ *needs_exec_pruning = true;
+
pinfolist = lappend(pinfolist, pinfo);
}
@@ -647,6 +678,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *subplan_map;
int *subpart_map;
Oid *relid_map;
+ Index *rti_map;
/*
* Construct the subplan and subpart maps for this partitioning level.
@@ -659,6 +691,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
+ rti_map = (Index *) palloc0(nparts * sizeof(Index));
present_parts = NULL;
i = -1;
@@ -673,6 +706,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
+ rti_map[i] = partrel->relid;
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
@@ -697,6 +731,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
pinfo->relid_map = relid_map;
+ pinfo->rti_map = rti_map;
}
pfree(relid_subpart_map);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 3082093d1e..95ab1d0eef 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1598,6 +1598,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ List *part_prune_results_list;
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -1972,7 +1973,9 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
/*
* Now we can define the portal.
@@ -1987,6 +1990,9 @@ exec_bind_message(StringInfo input_message)
cplan->stmt_list,
cplan);
+ /* Copy Lists of PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_results_list);
+
/* Done with the snapshot used for parameter I/O and parsing/planning */
if (snapshot_set)
PopActiveSnapshot();
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 52e2db6452..f582ff177b 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -35,7 +35,7 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
+static void ProcessQuery(PlannedStmt *plan, List *part_prune_results,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +65,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ List *part_prune_results,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +78,8 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->part_prune_results = part_prune_results; /* ExecutorDoInitialPruning()
+ * output for plan */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +125,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * part_prune_results: ExecutorDoInitialPruning() output for the PlannedStmt
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +138,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ List *part_prune_results,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +150,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, part_prune_results, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -491,8 +496,13 @@ PortalStart(Portal portal, ParamListInfo params,
/*
* Create QueryDesc in portal's context; for the moment, set
* the destination to DestNone.
+ *
+ * There is no PartitionPruneResult unless the PlannedStmt is
+ * from a CachedPlan.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->part_prune_results_list == NIL ? NIL :
+ linitial(portal->part_prune_results_list),
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1225,6 +1235,8 @@ PortalRunMulti(Portal portal,
if (pstmt->utilityStmt == NULL)
{
+ List *part_prune_results = NIL;
+
/*
* process a plannable query.
*/
@@ -1271,10 +1283,19 @@ PortalRunMulti(Portal portal,
else
UpdateActiveSnapshotCommandId();
+ /*
+ * Determine if there's a corresponding List of PartitionPruneResult
+ * for this PlannedStmt.
+ */
+ if (portal->part_prune_results_list != NIL)
+ part_prune_results = list_nth_node(List,
+ portal->part_prune_results_list,
+ foreach_current_index(stmtlist_item));
+
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_results,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1283,7 +1304,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_results,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index cc943205d3..8ff42153a1 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -99,14 +99,19 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_results_list);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv);
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_results_list);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_results_list,
+ List **lockedRelids_per_stmt);
+static void ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -782,6 +787,26 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
return tlist;
}
+/*
+ * FreePartitionPruneResults
+ * Frees the List of Lists of PartitionPruneResults for CheckCachedPlan()
+ */
+static void
+FreePartitionPruneResults(List *part_prune_results_list)
+{
+ ListCell *lc;
+
+ foreach(lc, part_prune_results_list)
+ {
+ List *part_prune_results = lfirst_node(List, lc);
+
+ /* Free both the PartitionPruneResults and the containing List. */
+ list_free_deep(part_prune_results);
+ }
+
+ list_free(part_prune_results_list);
+}
+
/*
* CheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
*
@@ -790,15 +815,20 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* On a "true" return, we have acquired the locks needed to run the plan.
* (We must do this for the "true" result to be race-condition-free.)
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_results_list.
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_results_list)
{
CachedPlan *plan = plansource->gplan;
/* Assert that caller checked the querytree */
Assert(plansource->is_valid);
+ *part_prune_results_list = NIL;
+
/* If there's no generic plan, just say "false" */
if (!plan)
return false;
@@ -820,13 +850,21 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ List *lockedRelids_per_stmt;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ /*
+ * Lock relations scanned by the plan. This is where the pruning
+ * happens if needed.
+ */
+ AcquireExecutorLocks(plan->stmt_list, boundParams,
+ part_prune_results_list,
+ &lockedRelids_per_stmt);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -848,7 +886,11 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ ReleaseExecutorLocks(plan->stmt_list, lockedRelids_per_stmt);
+
+ /* Release any PartitionPruneResults that may have been created. */
+ FreePartitionPruneResults(*part_prune_results_list);
+ *part_prune_results_list = NIL;
}
/*
@@ -874,10 +916,14 @@ CheckCachedPlan(CachedPlanSource *plansource)
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
* (unless this is a one-shot plan, in which case we don't copy the plan).
+ *
+ * A list of NILs is returned in *part_prune_results_list, meaning that
+ * no partition pruning has been done yet for the plans in stmt_list.
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv)
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_results_list)
{
CachedPlan *plan;
List *plist;
@@ -1007,6 +1053,17 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
MemoryContextSwitchTo(oldcxt);
+ /*
+ * There are no actual PartitionPruneResults to add yet, though we must
+ * initialize the list to have the same number of elements as the list of
+ * PlannedStmts.
+ */
+ *part_prune_results_list = NIL;
+ foreach(lc, plist)
+ {
+ *part_prune_results_list = lappend(*part_prune_results_list, NIL);
+ }
+
return plan;
}
@@ -1126,6 +1183,19 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
+ * For every PlannedStmt found in the returned CachedPlan, an element that
+ * is either a List of PartitionPruneResults or NIL is added to
+ * *part_prune_results_list; the former if the PlannedStmt comes from an
+ * existing CachedPlan that is otherwise valid and has
+ * containsInitialPruning set to true.  Before returning such a CachedPlan,
+ * the "initial" pruning steps are performed by calling
+ * ExecutorDoInitialPruning(), which prunes away subplans that don't match
+ * the "initial" pruning conditions, so that AcquireExecutorLocks() need
+ * lock only the leaf partitions scanned by the surviving subplans.  For
+ * each PartitionPruneInfo found in PlannedStmt.partPruneInfos, a
+ * PartitionPruneResult containing the bitmapset of the indexes of the
+ * surviving subplans is added to the List for that PlannedStmt.
+ *
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
@@ -1139,11 +1209,13 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ List **part_prune_results_list)
{
CachedPlan *plan = NULL;
List *qlist;
bool customplan;
+ List *my_part_prune_results_list;
/* Assert caller is doing things in a sane order */
Assert(plansource->magic == CACHEDPLANSOURCE_MAGIC);
@@ -1160,7 +1232,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (CheckCachedPlan(plansource, boundParams,
+ &my_part_prune_results_list))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1169,7 +1242,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
else
{
/* Build a new generic plan */
- plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv,
+ &my_part_prune_results_list);
/* Just make real sure plansource->gplan is clear */
ReleaseGenericPlan(plansource);
/* Link the new generic plan into the plansource */
@@ -1214,7 +1288,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (customplan)
{
/* Build a custom plan */
- plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv,
+ &my_part_prune_results_list);
/* Accumulate total costs of custom plans */
plansource->total_custom_cost += cached_plan_cost(plan, true);
@@ -1246,6 +1321,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
plan->is_saved = true;
}
+ if (part_prune_results_list)
+ *part_prune_results_list = my_part_prune_results_list;
+
return plan;
}
@@ -1737,17 +1815,29 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_results_list.
+ *
+ * On return, *lockedRelids_per_stmt will contain a bitmapset for every
+ * PlannedStmt in stmt_list, containing the RT indexes of relation entries
+ * in its range table that were actually locked, or NULL if the PlannedStmt
+ * contains a utility statement.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_results_list,
+ List **lockedRelids_per_stmt)
{
ListCell *lc1;
+ *part_prune_results_list = *lockedRelids_per_stmt = NIL;
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ List *part_prune_results = NIL;
+ Bitmapset *allLockRelids;
+ Bitmapset *lockedRelids = NULL;
+ int rti;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1761,13 +1851,40 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
if (query)
- ScanQueryForLocks(query, acquire);
+ ScanQueryForLocks(query, true);
+ *part_prune_results_list = lappend(*part_prune_results_list, NIL);
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ /*
+ * Figure out the set of relations that would need to be locked
+ * before executing the plan.
+ */
+ if (plannedstmt->containsInitialPruning)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ Bitmapset *scan_leafpart_rtis = NULL;
+
+ /*
+ * Obtain the set of leaf partitions to be locked.
+ *
+ * The following does initial partition pruning using the
+ * PartitionPruneInfos found in plannedstmt->partPruneInfos and
+ * finds leaf partitions that survive that pruning across all the
+ * nodes in the plan tree.
+ */
+ part_prune_results = ExecutorDoInitialPruning(plannedstmt,
+ boundParams,
+ &scan_leafpart_rtis);
+ allLockRelids = bms_union(plannedstmt->minLockRelids,
+ scan_leafpart_rtis);
+ }
+ else
+ allLockRelids = plannedstmt->minLockRelids;
+
+ rti = -1;
+ while ((rti = bms_next_member(allLockRelids, rti)) > 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
if (rte->rtekind != RTE_RELATION)
continue;
@@ -1778,10 +1895,59 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
* fail if it's been dropped entirely --- we'll just transiently
* acquire a non-conflicting lock.
*/
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
+ LockRelationOid(rte->relid, rte->rellockmode);
+ lockedRelids = bms_add_member(lockedRelids, rti);
+ }
+
+ *part_prune_results_list = lappend(*part_prune_results_list,
+ part_prune_results);
+ *lockedRelids_per_stmt = lappend(*lockedRelids_per_stmt, lockedRelids);
+ }
+}
+
+/*
+ * ReleaseExecutorLocks
+ * Release locks that would've been acquired by an earlier call to
+ * AcquireExecutorLocks()
+ */
+static void
+ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt)
+{
+ ListCell *lc1,
+ *lc2;
+
+ forboth(lc1, stmt_list, lc2, lockedRelids_per_stmt)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ Bitmapset *lockedRelids = lfirst_node(Bitmapset, lc2);
+ int rti;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /*
+ * Ignore utility statements, except those (such as EXPLAIN) that
+ * contain a parsed-but-not-planned query. Note: it's okay to use
+ * ScanQueryForLocks, even though the query hasn't been through
+ * rule rewriting, because rewriting doesn't change the query
+ * representation.
+ */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ Assert(lockedRelids == NULL);
+ if (query)
+ ScanQueryForLocks(query, false);
+ continue;
+ }
+
+ rti = -1;
+ while ((rti = bms_next_member(lockedRelids, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ Assert(rte->rtekind == RTE_RELATION);
+
+ /* See the comment in AcquireExecutorLocks(). */
+ UnlockRelationOid(rte->relid, rte->rellockmode);
}
}
}
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 7b1ae6fdcf..5b9098971b 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -303,6 +303,25 @@ PortalDefineQuery(Portal portal,
portal->status = PORTAL_DEFINED;
}
+/*
+ * PortalStorePartitionPruneResults
+ * Copy the given List of Lists of PartitionPruneResults into the
+ * portal's context
+ *
+ * This allows the caller to ensure that the list exists as long as the portal
+ * does.
+ */
+void
+PortalStorePartitionPruneResults(Portal portal, List *part_prune_results_list)
+{
+ MemoryContext oldcxt;
+
+ Assert(PortalIsValid(portal));
+ oldcxt = MemoryContextSwitchTo(portal->portalContext);
+ portal->part_prune_results_list = copyObject(part_prune_results_list);
+ MemoryContextSwitchTo(oldcxt);
+}
+
/*
* PortalReleaseCachedPlan
* Release a portal's reference to its cached plan, if any.
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 9ebde089ae..269cc4d562 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,9 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt,
+ List *part_prune_results,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 17fabc18c9..4b98d0d2ef 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -45,6 +45,7 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
+ * rti_map Range table index by partition index, or 0.
* present_parts A Bitmapset of the partition indexes that we
* have subplans or subparts for.
* initial_pruning_steps List of PartitionPruneSteps used to
@@ -61,6 +62,7 @@ typedef struct PartitionedRelPruningData
int nparts;
int *subplan_map;
int *subpart_map;
+ Index *rti_map;
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
@@ -127,5 +129,10 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
Bitmapset *root_parent_relids,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis);
+extern Bitmapset *ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index e79e2c001f..7d4379da7b 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,8 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ List *part_prune_results; /* ExecutorDoInitialPruning()'s
+ * output for plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +59,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ List *part_prune_results,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index aaf2bc78b9..32bbbc5927 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -185,6 +185,9 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
+extern List *ExecutorDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ Bitmapset **scan_leafpart_rtis);
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 71248a9466..9c6e8f5e13 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -619,6 +619,7 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
+ List *es_part_prune_results; /* QueryDesc.part_prune_results */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 1f33902947..c2f2544df5 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -218,6 +218,7 @@ extern struct Bitmapset *readBitmapset(void);
extern uintptr_t readDatum(bool typbyval);
extern bool *readBoolCols(int numCols);
extern int *readIntCols(int numCols);
+extern Index *readIndexCols(int numCols);
extern Oid *readOidCols(int numCols);
extern int16 *readAttrNumberCols(int numCols);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index dbaa9bb54d..e0e5c15b09 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -125,6 +125,18 @@ typedef struct PlannerGlobal
/* List of PartitionPruneInfo contained in the plan */
List *partPruneInfos;
+ /*
+ * Do any of those PartitionPruneInfos have initial pruning steps in them?
+ */
+ bool containsInitialPruning;
+
+ /*
+ * Indexes of all range table entries minus indexes of range table entries
+ * of the leaf partitions scanned by prunable subplans; see
+ * AcquireExecutorLocks()
+ */
+ Bitmapset *minLockRelids;
+
/* OIDs of relations the plan depends on */
List *relationOids;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index c36a15bd09..714e2cf2c7 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -73,8 +73,17 @@ typedef struct PlannedStmt
List *partPruneInfos; /* List of PartitionPruneInfo contained in the
* plan */
+ bool containsInitialPruning; /* Do any of those PartitionPruneInfos
+ * have initial pruning steps in them?
+ */
+
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *minLockRelids; /* Indexes of all range table entries minus
+ * indexes of range table entries of the leaf
+ * partitions scanned by prunable subplans;
+ * see AcquireExecutorLocks() */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE/MERGE */
List *resultRelations; /* integer list of RT indexes, or NIL */
@@ -1414,6 +1423,13 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * needs_init_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its initial_pruning_steps set?
+ *
+ * needs_exec_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its exec_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1425,6 +1441,8 @@ typedef struct PartitionPruneInfo
NodeTag type;
Bitmapset *root_parent_relids;
List *prune_infos;
+ bool needs_init_pruning;
+ bool needs_exec_pruning;
Bitmapset *other_subplans;
} PartitionPruneInfo;
@@ -1469,6 +1487,9 @@ typedef struct PartitionedRelPruneInfo
/* relation OID by partition index, or 0 */
Oid *relid_map pg_node_attr(array_size(nparts));
+ /* Range table index by partition index, or 0. */
+ Index *rti_map pg_node_attr(array_size(nparts));
+
/*
* initial_pruning_steps shows how to prune during executor startup (i.e.,
* without use of any PARAM_EXEC Params); it is NIL if no startup pruning
@@ -1553,6 +1574,31 @@ typedef struct PartitionPruneStepCombine
List *source_stepids;
} PartitionPruneStepCombine;
+/*----------------
+ * PartitionPruneResult
+ *
+ * The result of performing ExecPartitionDoInitialPruning() on a given
+ * PartitionPruneInfo.
+ *
+ * valid_subplan_offs contains the indexes of subplans remaining after
+ * performing initial pruning by calling ExecFindMatchingSubPlans() on the
+ * PartitionPruneInfo.
+ *
+ * This is used to store the result of initial partition pruning that is
+ * performed before execution has started.  A module that needs to do such
+ * pruning should call ExecutorDoInitialPruning() on a given PlannedStmt,
+ * which returns a List of PartitionPruneResults containing an entry for
+ * each PartitionPruneInfo present in PlannedStmt.partPruneInfos.  The
+ * module should then pass that list, along with the PlannedStmt, to the
+ * executor, so that it can reuse the result of initial partition pruning
+ * when initializing the subplans for execution.
+ */
+typedef struct PartitionPruneResult
+{
+ NodeTag type;
+
+ Bitmapset *valid_subplan_offs;
+} PartitionPruneResult;
/*
* Plan invalidation info
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 0499635f59..32579d4788 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -220,7 +220,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ List **part_prune_results_list);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aeddbdafe5..1901fc5f28 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,7 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *part_prune_results_list; /* List of Lists of PartitionPruneResults */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -242,6 +243,8 @@ extern void PortalDefineQuery(Portal portal,
CommandTag commandTag,
List *stmts,
CachedPlan *cplan);
+extern void PortalStorePartitionPruneResults(Portal portal,
+ List *part_prune_results_list);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
--
2.35.3
I find the API of GetCachedPlan() a little weird after this patch. I
think it may be better to have it return a pointer to a new struct --
one that contains both the CachedPlan pointer and the list of pruning
results. (As I understand it, the sole caller that isn't interested in
the pruning results, SPI_plan_get_cached_plan, can be explained by the
fact that it knows there won't be any. So I don't think we need to
worry about this case?)
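Roughly something like the following -- just a sketch, with invented
names (CachedPlanExtended and its layout are not from the patch):

/* Sketch only; struct and field names invented for illustration */
typedef struct CachedPlanExtended
{
	CachedPlan *cplan;			/* the cached plan itself, as today */
	List	   *part_prune_results_list;	/* one List of
								 * PartitionPruneResults per PlannedStmt
								 * in cplan->stmt_list, or NIL if no
								 * initial pruning was done */
} CachedPlanExtended;

extern CachedPlanExtended *GetCachedPlan(CachedPlanSource *plansource,
										 ParamListInfo boundParams,
										 ResourceOwner owner,
										 QueryEnvironment *queryEnv);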
And I think you should make that struct also be the last argument of
PortalDefineQuery, so you don't need the separate
PortalStorePartitionPruneResults function -- because as far as I can
tell, the callers that pass a non-NULL pointer there are exactly the
same ones that later call PortalStorePartitionPruneResults.
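That is, something like this (again only a sketch, reusing the invented
CachedPlanExtended from above):

extern void PortalDefineQuery(Portal portal,
							  const char *prepStmtName,
							  const char *sourceText,
							  CommandTag commandTag,
							  List *stmts,
							  CachedPlanExtended *cplan);	/* was CachedPlan * */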
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"The first law of live demos is: don't try to use the system.
Write a script that touches nothing, so no damage can be done." (Jakob Nielsen)
Thanks for the review.
On Wed, Dec 7, 2022 at 4:00 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
I find the API of GetCachedPlan() a little weird after this patch. I
think it may be better to have it return a pointer to a new struct --
one that contains both the CachedPlan pointer and the list of pruning
results. (As I understand, the sole caller that isn't interested in the
pruning results, SPI_plan_get_cached_plan, can be explained by the fact
that it knows there won't be any. So I don't think we need to worry
about this case?)
David, in his Apr 7 reply on this thread, also seemed to suggest
something similar.
Hmm, I was / am not so sure if GetCachedPlan() should return something
that is not CachedPlan. An idea I had today was to replace the
part_prune_results_list output List parameter with, say,
QueryInitPruningResult, or something like that and put the current
list into that struct. Was looking at QueryEnvironment to come up
with *that* name. Any thoughts?
And I think you should make that struct also be the last argument of
PortalDefineQuery, so you don't need the separate
PortalStorePartitionPruneResults function -- because as far as I can
tell, the callers that pass a non-NULL pointer there are exactly the
same ones that later call PortalStorePartitionPruneResults.
Yes, it would be better to not need PortalStorePartitionPruneResults.
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
On 2022-Dec-09, Amit Langote wrote:
On Wed, Dec 7, 2022 at 4:00 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
I find the API of GetCachedPlan() a little weird after this patch.
David, in his Apr 7 reply on this thread, also seemed to suggest
something similar.
Hmm, I was / am not so sure if GetCachedPlan() should return something
that is not CachedPlan. An idea I had today was to replace the
part_prune_results_list output List parameter with, say,
QueryInitPruningResult, or something like that and put the current
list into that struct. Was looking at QueryEnvironment to come up
with *that* name. Any thoughts?
Remind me again why part_prune_results_list is not part of struct
CachedPlan, then? I tried to understand that based on comments upthread,
but I was unable to find anything.
(My first reaction to your above comment was "well, rename GetCachedPlan
then, maybe to GetRunnablePlan", but then I'm wondering if CachedPlan is
in any way a structure that must be "immutable" in the way parser output
is. Looking at the comment at the top of plancache.c it appears to me
that it isn't, but maybe I'm missing something.)
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"The Postgresql hackers have what I call a "NASA space shot" mentality.
Quite refreshing in a world of "weekend drag racer" developers."
(Scott Marlowe)
On Fri, Dec 9, 2022 at 6:52 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2022-Dec-09, Amit Langote wrote:
On Wed, Dec 7, 2022 at 4:00 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
I find the API of GetCachedPlan() a little weird after this patch.
David, in his Apr 7 reply on this thread, also seemed to suggest
something similar.
Hmm, I was / am not so sure if GetCachedPlan() should return something
that is not CachedPlan. An idea I had today was to replace the
part_prune_results_list output List parameter with, say,
QueryInitPruningResult, or something like that and put the current
list into that struct. Was looking at QueryEnvironment to come up
with *that* name. Any thoughts?
Remind me again why part_prune_results_list is not part of struct
CachedPlan, then? I tried to understand that based on comments upthread,
but I was unable to find anything.
It used to be part of CachedPlan for a brief period of time (in patch
v12 I posted in [1]), but David, in his reply to [1], said he wasn't
so sure that it belonged there.
(My first reaction to your above comment was "well, rename GetCachedPlan
then, maybe to GetRunnablePlan", but then I'm wondering if CachedPlan is
in any way a structure that must be "immutable" in the way parser output
is. Looking at the comment at the top of plancache.c it appears to me
that it isn't, but maybe I'm missing something.)
CachedPlan *is* supposed to be read-only per the comment above the
CachedPlanSource definition:
* ...If we are using a generic
* cached plan then it is meant to be re-used across multiple executions, so
* callers must always treat CachedPlans as read-only.
FYI, there was even an idea of putting the PartitionPruneResults for a
given PlannedStmt into the PlannedStmt itself [2], but PlannedStmt is
supposed to be read-only too [3].
Maybe we need some new overarching context when invoking the plancache,
if Portal can't already be it, whose struct can be passed to
GetCachedPlan() to put the pruning results in? Perhaps the
GetRunnablePlan() that you floated could be a wrapper for
GetCachedPlan(), owning that new context.
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
[1]: /messages/by-id/CA+HiwqH4qQ_YVROr7TY0jSCuGn0oHhH79_DswOdXWN5UnMCBtQ@mail.gmail.com
[2]: /messages/by-id/CAApHDvp_DjVVkgSV24+UF7p_yKWeepgoo+W2SWLLhNmjwHTVYQ@mail.gmail.com
[3]: /messages/by-id/922566.1648784745@sss.pgh.pa.us
On 2022-Dec-09, Amit Langote wrote:
On Fri, Dec 9, 2022 at 6:52 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Remind me again why part_prune_results_list is not part of struct
CachedPlan, then? I tried to understand that based on comments upthread,
but I was unable to find anything.
It used to be part of CachedPlan for a brief period of time (in patch
v12 I posted in [1]), but David, in his reply to [1], said he wasn't
so sure that it belonged there.
I'm not sure I necessarily agree with that. I'll have a look at v12 to
try to understand what David was so unhappy about.
(My first reaction to your above comment was "well, rename GetCachedPlan
then, maybe to GetRunnablePlan", but then I'm wondering if CachedPlan is
in any way a structure that must be "immutable" in the way parser output
is. Looking at the comment at the top of plancache.c it appears to me
that it isn't, but maybe I'm missing something.)
CachedPlan *is* supposed to be read-only per the comment above the
CachedPlanSource definition:
* ...If we are using a generic
* cached plan then it is meant to be re-used across multiple executions, so
* callers must always treat CachedPlans as read-only.
I read that as implying that the part_prune_results_list must remain
intact as long as no invalidations occur. Does part_prune_results_list
really change as a result of something other than a sinval event?
Keep in mind that if a sinval message that touches one of the relations
in the plan arrives, then we'll discard it and generate it afresh. I
don't see that the part_prune_results_list would change otherwise, but
maybe I misunderstand?
FYI, there was even an idea of putting a PartitionPruneResults for a
given PlannedStmt into the PlannedStmt itself [2], but PlannedStmt is
supposed to be read-only too [3].
Hmm, I'm not familiar with PlannedStmt lifetime, but I'm definitely not
betting that Tom is wrong about this.
Maybe we need some new overarching context when invoking the plancache,
if Portal can't already be it, whose struct can be passed to
GetCachedPlan() to put the pruning results in? Perhaps the
GetRunnablePlan() that you floated could be a wrapper for
GetCachedPlan(), owning that new context.
Perhaps that is a solution. I'm not sure.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Uno puede defenderse de los ataques; contra los elogios se esta indefenso"
On Fri, Dec 9, 2022 at 7:49 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2022-Dec-09, Amit Langote wrote:
On Fri, Dec 9, 2022 at 6:52 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Remind me again why part_prune_results_list is not part of struct
CachedPlan, then? I tried to understand that based on comments upthread,
but I was unable to find anything.
(My first reaction to your above comment was "well, rename GetCachedPlan
then, maybe to GetRunnablePlan", but then I'm wondering if CachedPlan is
in any way a structure that must be "immutable" in the way parser output
is. Looking at the comment at the top of plancache.c it appears to me
that it isn't, but maybe I'm missing something.)
CachedPlan *is* supposed to be read-only per the comment above the
CachedPlanSource definition:
* ...If we are using a generic
* cached plan then it is meant to be re-used across multiple executions, so
* callers must always treat CachedPlans as read-only.
I read that as implying that the part_prune_results_list must remain
intact as long as no invalidations occur. Does part_prune_results_list
really change as a result of something other than a sinval event?
Keep in mind that if a sinval message that touches one of the relations
in the plan arrives, then we'll discard it and generate it afresh. I
don't see that the part_prune_results_list would change otherwise, but
maybe I misunderstand?
Pruning will be done afresh on every fetch of a given cached plan when
CheckCachedPlan() is called on it, so the part_prune_results_list part
will be discarded and rebuilt as many times as the plan is executed.
You'll find a description around CachedPlanSavePartitionPruneResults()
that's in v12.
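In other words, each execution goes through roughly this sequence
(sketch only; arguments abbreviated):

    /* CheckCachedPlan() revalidates the plan and redoes initial pruning */
    cplan = GetCachedPlan(plansource, params, owner, queryEnv,
                          &part_prune_results_list);

    /* ... execute, reusing part_prune_results_list ... */

    /* pruning results are discarded; the next execution prunes afresh */
    ReleaseCachedPlan(cplan, owner);

so the pruning results have the lifetime of one execution, not that of
the CachedPlan itself.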
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
On 2022-Dec-09, Amit Langote wrote:
Pruning will be done afresh on every fetch of a given cached plan when
CheckCachedPlan() is called on it, so the part_prune_results_list part
will be discarded and rebuilt as many times as the plan is executed.
You'll find a description around CachedPlanSavePartitionPruneResults()
that's in v12.
I see.
In that case, a separate container struct seems warranted.
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Industry suffers from the managerial dogma that for the sake of stability
and continuity, the company should be independent of the competence of
individual employees." (E. Dijkstra)
On Fri, Dec 9, 2022 at 8:37 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2022-Dec-09, Amit Langote wrote:
Pruning will be done afresh on every fetch of a given cached plan when
CheckCachedPlan() is called on it, so the part_prune_results_list part
will be discarded and rebuilt as many times as the plan is executed.
You'll find a description around CachedPlanSavePartitionPruneResults()
that's in v12.
I see.
In that case, a separate container struct seems warranted.
I thought about this today and played around with some container struct ideas.
Though, I started feeling like putting all the new logic being added
by this patch into plancache.c at the heart of GetCachedPlan() and
tweaking its API in kind of unintuitive ways may not have been such a
good idea to begin with. So I started thinking again about your
GetRunnablePlan() wrapper idea and thought maybe we could do something
with it. Let's say we name it GetCachedPlanLockPartitions() and put
the logic that does initial pruning with the new
ExecutorDoInitialPruning() in it, instead of in the normal
GetCachedPlan() path. Any callers that call GetCachedPlan() instead
call GetCachedPlanLockPartitions() with either the List ** parameter
as now or some container struct if that seems better. Whether
GetCachedPlanLockPartitions() needs to do anything other than return
the CachedPlan returned by GetCachedPlan() can be decided by the
latter setting, say, CachedPlan.has_unlocked_partitions. That will be
done by AcquireExecutorLocks() when it sees containsInitialPruning set
in any of the PlannedStmts, locking only the
PlannedStmt.minLockRelids set (which is all relations where no pruning
is needed!), leaving the partition locking to
GetCachedPlanLockPartitions(). If the CachedPlan is invalidated
during the partition locking phase, it calls GetCachedPlan() again;
maybe some refactoring is needed to avoid too much useless work in
such cases.
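In rough pseudo-C, the control flow I have in mind would be something
like the following (a sketch only; the locking helper's name and
signature are invented here for illustration):

    CachedPlan *
    GetCachedPlanLockPartitions(CachedPlanSource *plansource,
                                ParamListInfo boundParams,
                                ResourceOwner owner,
                                QueryEnvironment *queryEnv,
                                List **part_prune_results_list)
    {
        for (;;)
        {
            CachedPlan *cplan = GetCachedPlan(plansource, boundParams,
                                              owner, queryEnv);

            /* AcquireExecutorLocks() already locked everything needed. */
            if (!cplan->has_unlocked_partitions)
                return cplan;

            /*
             * Do initial pruning and lock the surviving partitions,
             * collecting the results for the executor to reuse; if that
             * locking invalidates the plan, start over.
             */
            if (CachedPlanLockPrunedPartitions(cplan, boundParams,
                                               part_prune_results_list))
                return cplan;

            ReleaseCachedPlan(cplan, owner);
        }
    }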
Thoughts?
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
On 2022-Dec-12, Amit Langote wrote:
I started feeling like putting all the new logic being added
by this patch into plancache.c at the heart of GetCachedPlan() and
tweaking its API in kind of unintuitive ways may not have been such a
good idea to begin with. So I started thinking again about your
GetRunnablePlan() wrapper idea and thought maybe we could do something
with it. Let's say we name it GetCachedPlanLockPartitions() and put
the logic that does initial pruning with the new
ExecutorDoInitialPruning() in it, instead of in the normal
GetCachedPlan() path. Any callers that call GetCachedPlan() instead
call GetCachedPlanLockPartitions() with either the List ** parameter
as now or some container struct if that seems better. Whether
GetCachedPlanLockPartitions() needs to do anything other than return
the CachedPlan returned by GetCachedPlan() can be decided by the
latter setting, say, CachedPlan.has_unlocked_partitions. That will be
done by AcquireExecutorLocks() when it sees containsInitialPruning set
in any of the PlannedStmts, locking only the
PlannedStmt.minLockRelids set (which is all relations where no pruning
is needed!), leaving the partition locking to
GetCachedPlanLockPartitions().
Hmm. This doesn't sound totally unreasonable, except for the point David
was making: perhaps we may want this container struct to accommodate
things other than just the partition pruning results in the future, so I
think its name (and that of the function that produces it) ought to be a
little more generic than that.
(I think this also answers your question on whether a List ** is better
than a container struct.)
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Las cosas son buenas o malas segun las hace nuestra opinión" (Lisias)
On Tue, Dec 13, 2022 at 2:24 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2022-Dec-12, Amit Langote wrote:
I started feeling like putting all the new logic being added
by this patch into plancache.c at the heart of GetCachedPlan() and
tweaking its API in kind of unintuitive ways may not have been such a
good idea to begin with. So I started thinking again about your
GetRunnablePlan() wrapper idea and thought maybe we could do something
with it. Let's say we name it GetCachedPlanLockPartitions() and put
the logic that does initial pruning with the new
ExecutorDoInitialPruning() in it, instead of in the normal
GetCachedPlan() path. Any callers that call GetCachedPlan() instead
call GetCachedPlanLockPartitions() with either the List ** parameter
as now or some container struct if that seems better. Whether
GetCachedPlanLockPartitions() needs to do anything other than return
the CachedPlan returned by GetCachedPlan() can be decided by the
latter setting, say, CachedPlan.has_unlocked_partitions. That will be
done by AcquireExecutorLocks() when it sees containsInitialPruning set
in any of the PlannedStmts, locking only the
PlannedStmt.minLockRelids set (which is all relations where no pruning
is needed!), leaving the partition locking to
GetCachedPlanLockPartitions().
Hmm. This doesn't sound totally unreasonable, except for the point David
was making: perhaps we may want this container struct to accommodate
things other than just the partition pruning results in the future, so I
think its name (and that of the function that produces it) ought to be a
little more generic than that.
(I think this also answers your question on whether a List ** is better
than a container struct.)
OK, so here's a WIP attempt at that.
I have moved the original functionality of GetCachedPlan() to
GetCachedPlanInternal(), turning the former into a sort of controller
as described shortly. The latter's CheckCachedPlan() part now only
locks the "minimal" set of, non-prunable, relations, making a note of
whether the plan contains any prunable subnodes and thus prunable
relations whose locking is deferred to the caller, GetCachedPlan().
GetCachedPlan(), as a sort of controller as mentioned before, does the
pruning if needed on the minimally valid plan returned by
GetCachedPlanInternal(), locks the partitions that survive, and redoes
the whole thing if the locking of partitions invalidates the plan.
The pruning results are returned through the new output parameter of
GetCachedPlan() of type CachedPlanExtra. I named it so after much
consideration, because all the new logic that produces stuff to put
into it is a part of the plancache module and has to do with
manipulating a CachedPlan. (I had considered CachedPlanExecInfo to
indicate that it contains information that is to be forwarded to the
executor, though that just didn't seem to fit in plancache.h.)
I have broken out a few things into a preparatory patch 0001. Mainly,
it invents PlannedStmt.minLockRelids to replace
AcquireExecutorLocks()'s current loop over the range table to figure
out the relations to lock. I also threw in a couple of pruning-related
non-functional changes in there to make it easier to read 0002, which
is the main patch.
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v29-0001-Preparatory-refactoring-before-reworking-CachedP.patch
From 14a1198bdaad007b1dc835f24caa42d3667c7048 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 13 Dec 2022 11:58:07 +0900
Subject: [PATCH v29 1/2] Preparatory refactoring before reworking CachedPlan
locking
Remember the RT indexes of RTEs that AcquireExecutorLocks() must
look at to consider locking in a bitmapset, so that instead of looping
over the range table to find those RTEs, it can look them up using
the RT indexes set in the bitmapset.
This also adds some extra information related to execution-time
pruning to the relevant plan nodes.
---
src/backend/executor/execParallel.c | 1 +
src/backend/executor/execPartition.c | 6 ++++
src/backend/nodes/readfuncs.c | 8 ++++--
src/backend/optimizer/plan/planner.c | 2 ++
src/backend/optimizer/plan/setrefs.c | 12 ++++++++
src/backend/partitioning/partprune.c | 42 ++++++++++++++++++++++++++--
src/backend/utils/cache/plancache.c | 10 +++++--
src/include/executor/execPartition.h | 2 ++
src/include/nodes/nodes.h | 1 +
src/include/nodes/pathnodes.h | 11 ++++++++
src/include/nodes/plannodes.h | 19 +++++++++++++
11 files changed, 106 insertions(+), 8 deletions(-)
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index a5b8e43ec5..65c4b63bbd 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -182,6 +182,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->containsInitialPruning = false; /* workers need not know! */
pstmt->planTree = plan;
pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 76d79b9741..5b62157712 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1956,6 +1956,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
Assert(partdesc->nparts >= pinfo->nparts);
pprune->nparts = partdesc->nparts;
pprune->subplan_map = palloc(sizeof(int) * partdesc->nparts);
+ pprune->rti_map = palloc(sizeof(Index) * partdesc->nparts);
if (partdesc->nparts == pinfo->nparts)
{
/*
@@ -1966,6 +1967,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pprune->subpart_map = pinfo->subpart_map;
memcpy(pprune->subplan_map, pinfo->subplan_map,
sizeof(int) * pinfo->nparts);
+ memcpy(pprune->rti_map, pinfo->rti_map,
+ sizeof(int) * pinfo->nparts);
/*
* Double-check that the list of unpruned relations has not
@@ -2016,6 +2019,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pinfo->subplan_map[pd_idx];
pprune->subpart_map[pp_idx] =
pinfo->subpart_map[pd_idx];
+ pprune->rti_map[pp_idx] =
+ pinfo->rti_map[pd_idx];
pd_idx++;
}
else
@@ -2023,6 +2028,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
/* this partdesc entry is not in the plan */
pprune->subplan_map[pp_idx] = -1;
pprune->subpart_map[pp_idx] = -1;
+ pprune->rti_map[pp_idx] = 0;
}
}
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 966b75f5a6..1161671fa4 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -158,6 +158,11 @@
token = pg_strtok(&length); /* skip :fldname */ \
local_node->fldname = readIntCols(len)
+/* Read an Index array */
+#define READ_INDEX_ARRAY(fldname, len) \
+ token = pg_strtok(&length); /* skip :fldname */ \
+ local_node->fldname = readIndexCols(len)
+
/* Read a bool array */
#define READ_BOOL_ARRAY(fldname, len) \
token = pg_strtok(&length); /* skip :fldname */ \
@@ -796,7 +801,6 @@ fnname(int numCols) \
*/
READ_SCALAR_ARRAY(readAttrNumberCols, int16, atoi)
READ_SCALAR_ARRAY(readOidCols, Oid, atooid)
-/* outfuncs.c has writeIndexCols, but we don't yet need that here */
-/* READ_SCALAR_ARRAY(readIndexCols, Index, atoui) */
+READ_SCALAR_ARRAY(readIndexCols, Index, atoui)
READ_SCALAR_ARRAY(readIntCols, int, atoi)
READ_SCALAR_ARRAY(readBoolCols, bool, strtobool)
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 5dd4f92720..620b163ef9 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -523,8 +523,10 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
result->partPruneInfos = glob->partPruneInfos;
+ result->containsInitialPruning = glob->containsInitialPruning;
result->rtable = glob->finalrtable;
result->permInfos = glob->finalrteperminfos;
+ result->minLockRelids = glob->minLockRelids;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 596f1fbc8e..ed43d5936d 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -279,6 +279,16 @@ set_plan_references(PlannerInfo *root, Plan *plan)
*/
add_rtes_to_flat_rtable(root, false);
+ /*
+ * Add the query's adjusted range of RT indexes to glob->minLockRelids.
+ * The adjusted RT indexes of prunable relations will be deleted from the
+ * set below where PartitionPruneInfos are processed.
+ */
+ glob->minLockRelids =
+ bms_add_range(glob->minLockRelids,
+ rtoffset + 1,
+ rtoffset + list_length(root->parse->rtable));
+
/*
* Adjust RT indexes of PlanRowMarks and add to final rowmarks list
*/
@@ -377,9 +387,11 @@ set_plan_references(PlannerInfo *root, Plan *plan)
/* RT index of the table to which the pinfo belongs. */
pinfo->rtindex += rtoffset;
}
+
}
glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
+ glob->containsInitialPruning |= pruneinfo->needs_init_pruning;
}
return result;
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index d48f6784c1..56270d7670 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,9 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -234,6 +236,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool needs_init_pruning = false;
+ bool needs_exec_pruning = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -313,12 +317,16 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_needs_init_pruning;
+ bool partrel_needs_exec_pruning;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_needs_init_pruning,
+ &partrel_needs_exec_pruning);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -327,6 +335,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+
+ needs_init_pruning |= partrel_needs_init_pruning;
+ needs_exec_pruning |= partrel_needs_exec_pruning;
}
pfree(relid_subplan_map);
@@ -342,6 +353,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->root_parent_relids = parentrel->relids;
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->needs_init_pruning = needs_init_pruning;
+ pruneinfo->needs_exec_pruning = needs_exec_pruning;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -442,13 +455,19 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *needs_init_pruning and *needs_exec_pruning are set to indicate whether
+ * the pruning steps contained in the returned PartitionedRelPruneInfos
+ * can be performed during executor startup and during execution,
+ * respectively.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -459,6 +478,10 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will find out below. */
+ *needs_init_pruning = false;
+ *needs_exec_pruning = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -546,6 +569,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+ * In the first pass, we determine whether the 2nd pass is necessary by
+ * noting the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -620,6 +646,12 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ /* record which types of pruning steps we've seen so far */
+ if (initial_pruning_steps != NIL)
+ *needs_init_pruning = true;
+ if (exec_pruning_steps != NIL)
+ *needs_exec_pruning = true;
+
pinfolist = lappend(pinfolist, pinfo);
}
@@ -647,6 +679,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *subplan_map;
int *subpart_map;
Oid *relid_map;
+ Index *rti_map;
/*
* Construct the subplan and subpart maps for this partitioning level.
@@ -659,6 +692,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
+ rti_map = (Index *) palloc0(nparts * sizeof(Index));
present_parts = NULL;
i = -1;
@@ -673,6 +707,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
+ rti_map[i] = partrel->relid;
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
@@ -697,6 +732,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
pinfo->relid_map = relid_map;
+ pinfo->rti_map = rti_map;
}
pfree(relid_subpart_map);
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index cc943205d3..339bb603f7 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -1747,7 +1747,8 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ Bitmapset *allLockRelids;
+ int rti;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1760,14 +1761,17 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
*/
Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+ Assert(plannedstmt->minLockRelids == NULL);
if (query)
ScanQueryForLocks(query, acquire);
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ allLockRelids = plannedstmt->minLockRelids;
+ rti = -1;
+ while ((rti = bms_next_member(allLockRelids, rti)) > 0)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
if (rte->rtekind != RTE_RELATION)
continue;
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 17fabc18c9..aeeaeb7884 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -45,6 +45,7 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
+ * rti_map Range table index by partition index, or 0.
* present_parts A Bitmapset of the partition indexes that we
* have subplans or subparts for.
* initial_pruning_steps List of PartitionPruneSteps used to
@@ -61,6 +62,7 @@ typedef struct PartitionedRelPruningData
int nparts;
int *subplan_map;
int *subpart_map;
+ Index *rti_map;
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 1f33902947..c2f2544df5 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -218,6 +218,7 @@ extern struct Bitmapset *readBitmapset(void);
extern uintptr_t readDatum(bool typbyval);
extern bool *readBoolCols(int numCols);
extern int *readIntCols(int numCols);
+extern Index *readIndexCols(int numCols);
extern Oid *readOidCols(int numCols);
extern int16 *readAttrNumberCols(int numCols);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 654dba61aa..4337e7aa34 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -128,6 +128,17 @@ typedef struct PlannerGlobal
/* List of PartitionPruneInfo contained in the plan */
List *partPruneInfos;
+ /*
+ * Do any of those PartitionPruneInfos have initial pruning steps in them?
+ */
+ bool containsInitialPruning;
+
+ /*
+ * Indexes of all range table entries; for AcquireExecutorLocks()'s
+ * perusal.
+ */
+ Bitmapset *minLockRelids;
+
/* OIDs of relations the plan depends on */
List *relationOids;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index bddfe86191..eb0a007946 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -73,11 +73,18 @@ typedef struct PlannedStmt
List *partPruneInfos; /* List of PartitionPruneInfo contained in the
* plan */
+ bool containsInitialPruning; /* Do any of those PartitionPruneInfos
+ * have initial pruning steps in them?
+ */
+
List *rtable; /* list of RangeTblEntry nodes */
List *permInfos; /* list of RTEPermissionInfo nodes for rtable
* entries needing one */
+ Bitmapset *minLockRelids; /* Indexes of all range table entries; for
+ * AcquireExecutorLocks()'s perusal */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE/MERGE */
List *resultRelations; /* integer list of RT indexes, or NIL */
@@ -1417,6 +1424,13 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * needs_init_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its initial_pruning_steps set?
+ *
+ * needs_exec_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its exec_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1428,6 +1442,8 @@ typedef struct PartitionPruneInfo
NodeTag type;
Bitmapset *root_parent_relids;
List *prune_infos;
+ bool needs_init_pruning;
+ bool needs_exec_pruning;
Bitmapset *other_subplans;
} PartitionPruneInfo;
@@ -1472,6 +1488,9 @@ typedef struct PartitionedRelPruneInfo
/* relation OID by partition index, or 0 */
Oid *relid_map pg_node_attr(array_size(nparts));
+ /* Range table index by partition index, or 0. */
+ Index *rti_map pg_node_attr(array_size(nparts));
+
/*
* initial_pruning_steps shows how to prune during executor startup (i.e.,
* without use of any PARAM_EXEC Params); it is NIL if no startup pruning
--
2.35.3
v29-0002-In-GetCachedPlan-only-lock-unpruned-partitions.patch
From 69855fffacf69575471beb69da761babadc9f75c Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v29 2/2] In GetCachedPlan(), only lock unpruned partitions
This does two things mainly:
* The planner now removes the RT indexes of "initially prunable"
partitions from PlannedStmt.minLockRelids such that the set only
contains the relations not subject to initial partition pruning. So,
AcquireExecutorLocks only locks a subset of the relations contained
in a plan, deferring the locking of prunable relations to the caller.
* GetCachedPlan(), if there are prunable relations in the plan,
performs the initial partition pruning using available EXTERN params
and locks the partitions remaining after that, so that the CachedPlan
that's returned is valid in a race-free manner including for any
partitions that will be scanned during execution.
To make the pruning possible before entering ExecutorStart(), this
also adds ExecPartitionDoInitialPruning(), which can be called by
GetCachedPlan() for a given PlannedStmt.
The result of performing initial partition pruning this way is made
available to the actual execution via PartitionPruneResult, of which
there is one for every PartitionPruneInfo contained in the PlannedStmt.
The List of PartitionPruneResults for a given PlannedStmt is returned
to the callers of GetCachedPlan() via its new output parameter of type
CachedPlanExtra, whose members currently only include said List.
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 2 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/prepare.c | 28 ++-
src/backend/executor/README | 31 ++-
src/backend/executor/execMain.c | 2 +
src/backend/executor/execParallel.c | 25 ++-
src/backend/executor/execPartition.c | 215 +++++++++++++++++----
src/backend/executor/execUtils.c | 1 +
src/backend/executor/functions.c | 2 +-
src/backend/executor/nodeAppend.c | 11 +-
src/backend/executor/nodeMergeAppend.c | 5 +-
src/backend/executor/spi.c | 31 ++-
src/backend/optimizer/plan/setrefs.c | 36 ++++
src/backend/tcop/postgres.c | 9 +-
src/backend/tcop/pquery.c | 28 ++-
src/backend/utils/cache/plancache.c | 257 +++++++++++++++++++++++--
src/backend/utils/mmgr/portalmem.c | 16 ++
src/include/commands/explain.h | 4 +-
src/include/executor/execPartition.h | 7 +-
src/include/executor/execdesc.h | 3 +
src/include/nodes/execnodes.h | 1 +
src/include/nodes/pathnodes.h | 4 +-
src/include/nodes/plannodes.h | 31 ++-
src/include/utils/plancache.h | 11 +-
src/include/utils/portal.h | 3 +
28 files changed, 694 insertions(+), 82 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index f26cc0d162..401a2280a3 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -558,7 +558,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NIL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 152c29b551..942449544c 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NIL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index f86983c660..2f2b558608 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -407,7 +407,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NIL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
@@ -515,7 +515,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, List *part_prune_results,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
@@ -563,7 +564,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, part_prune_results, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index cf1b1ca571..904cbcba4a 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -779,7 +779,7 @@ execute_sql_string(const char *sql)
{
QueryDesc *qdesc;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, NIL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 9ac0383459..65c8d0aa59 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -408,7 +408,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NIL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 9e29584d93..729384a9a6 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -154,6 +154,7 @@ ExecuteQuery(ParseState *pstate,
{
PreparedStatement *entry;
CachedPlan *cplan;
+ CachedPlanExtra *cplan_extra = NULL;
List *plan_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
@@ -193,7 +194,11 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL,
+ &cplan_extra);
+ Assert(cplan_extra == NULL ||
+ (list_length(cplan->stmt_list) ==
+ list_length(cplan_extra->part_prune_results_list)));
plan_list = cplan->stmt_list;
/*
@@ -207,6 +212,9 @@ ExecuteQuery(ParseState *pstate,
plan_list,
cplan);
+ if (cplan_extra)
+ PortalSaveCachedPlanExtra(portal, cplan_extra);
+
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
* statement is one that produces tuples. Currently we insist that it be
@@ -575,6 +583,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PreparedStatement *entry;
const char *query_string;
CachedPlan *cplan;
+ CachedPlanExtra *cplan_extra = NULL;
List *plan_list;
ListCell *p;
ParamListInfo paramLI = NULL;
@@ -619,7 +628,11 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Replan if needed, and acquire a transient refcount */
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, queryEnv);
+ CurrentResourceOwner, queryEnv,
+ &cplan_extra);
+ Assert(cplan_extra == NULL ||
+ (list_length(cplan->stmt_list) ==
+ list_length(cplan_extra->part_prune_results_list)));
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -637,10 +650,17 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
foreach(p, plan_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ List *part_prune_results = NIL;
+
+ if (cplan_extra)
+ part_prune_results = list_nth_node(List,
+ cplan_extra->part_prune_results_list,
+ foreach_current_index(p));
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ ExplainOnePlan(pstmt, part_prune_results, into, es, query_string,
+ paramLI, queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 17775a49e2..2222b3ed6f 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -63,7 +63,36 @@ if the executor determines that an entire subplan is not required due to
execution time partition pruning determining that no matching records will be
found there. This currently only occurs for Append and MergeAppend nodes. In
this case the non-required subplans are ignored and the executor state's
-subnode array will become out of sequence to the plan's subplan list.
+subnode array will become out of sequence to the plan's subplan list. Note
+that this is referred to as "initial" pruning, because it needs to occur only
+once during executor startup, and uses a set of pruning steps called
+initial pruning steps (see PartitionedRelPruneInfo.initial_pruning_steps).
+
+Actually, "initial" pruning may occur even before the execution startup in
+in some cases. For example, when a cached generic plan is validated for
+execution, which works by locking all the relations that will be scanned by
+that plan during execution. If the generic plan contains plan nodes that have
+prunable child subnodes, then this validation locking is performed after
+pruning child subnodes that need not be scanned during execution, that is,
+using initial pruning steps. When such a generic plan is forwarded for
+execution, it must be accompanied by the set of PartitionPruneResult nodes that
+contain the result of that pruning, which basically consists of a bitmapset of
+child subnode indexes that survived the pruning and thus whose relations would
+have been locked for execution. This is important, because, unlike the
+plan-time pruning and actual executor-startup pruning, this does not actually
+remove the pruned subnodes from the plan tree, but only marks them as being
+pruned. So, executor code (core or third party), especially code that runs
+before ExecutorStart() and thus looks at bare Plan trees (not PlanState trees)
+must beware of plan nodes that may actually have been pruned and thus subject
+to being invalidated by concurrent schema changes. For plan nodes that can
+have prunable child subnodes and thus contain a PartitionPruneInfo, such code
+must always check if the corresponding PartitionPruneResult exists
+in EState.es_part_prune_results at the given part_prune_index and use that to
+decide which subplans are valid for execution instead of redoing the pruning.
+Note that this is not just a performance optimization: if the pruning steps
+could produce a different result when executed multiple times, the executor
+might otherwise consider a different set of child subnodes valid than the set
+whose relations CachedPlanLockPartitions() locked.
Each Plan node may have expression trees associated with it, to represent
its target list, qualification conditions, etc. These trees are also
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 2c2b3a8874..229f61f72e 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -798,6 +798,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ List *part_prune_results = queryDesc->part_prune_results;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -819,6 +820,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_plannedstmt = plannedstmt;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ estate->es_part_prune_results = part_prune_results;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 65c4b63bbd..9745eba0af 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -66,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITION_PRUNE_RESULTS UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -599,12 +600,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_results_data;
+ char *part_prune_results_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_results_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -633,6 +637,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_results_data = nodeToString(estate->es_part_prune_results);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -659,6 +664,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized List of PartitionPruneResult. */
+ part_prune_results_len = strlen(part_prune_results_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_results_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -753,6 +763,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized List of PartitionPruneResult */
+ part_prune_results_space = shm_toc_allocate(pcxt->toc, part_prune_results_len);
+ memcpy(part_prune_results_space, part_prune_results_data, part_prune_results_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS,
+ part_prune_results_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1234,8 +1250,10 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_results_space;
char *paramspace;
PlannedStmt *pstmt;
+ List *part_prune_results;
ParamListInfo paramLI;
char *queryString;
@@ -1246,12 +1264,17 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
pstmtspace = shm_toc_lookup(toc, PARALLEL_KEY_PLANNEDSTMT, false);
pstmt = (PlannedStmt *) stringToNode(pstmtspace);
+ /* Reconstruct leader-supplied PartitionPruneResult. */
+ part_prune_results_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, false);
+ part_prune_results = (List *) stringToNode(part_prune_results_space);
+
/* Reconstruct ParamListInfo. */
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
/* Create a QueryDesc for the query. */
- return CreateQueryDesc(pstmt,
+ return CreateQueryDesc(pstmt, part_prune_results,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 5b62157712..dcd2bb0f90 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -25,6 +25,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -185,7 +186,11 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *pruneinfo);
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -198,7 +203,8 @@ static void PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans);
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis);
/*
@@ -1742,7 +1748,8 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* considered to be a stable expression, it can change value from one plan
* node scan to the next during query execution. Stable comparison
* expressions that don't involve such Params allow partition pruning to be
- * done once during executor startup. Expressions that do involve such Params
+ * done once during executor startup or even before that, such as when called
+ * from CachedPlanLockPartitions(). Expressions that do involve such Params
* require us to prune separately for each scan of the parent plan node.
*
* Note that pruning away unneeded subplans during executor startup has the
@@ -1760,6 +1767,12 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* account for initial pruning possibly having eliminated some of the
* subplans.
*
+ * ExecPartitionDoInitialPruning:
+ * Do initial pruning with the information contained in a given
+ * PartitionPruneInfo to determine the set of the parent plan node's
+ * child subnodes that are valid for execution and also the set of the RT
+ * indexes of leaf partitions scanned by those subnodes.
+ *
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating the expressions
* that are safe to evaluate at a given point. This function is first
@@ -1780,8 +1793,10 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* On return, *initially_valid_subplans is assigned the set of indexes of
* child subplans that must be initialized along with the parent plan node.
- * Initial pruning is performed here if needed and in that case only the
- * surviving subplans' indexes are added.
+ * That set is computed by either performing the "initial pruning" here or
+ * reusing the one present in EState.es_part_prune_results[part_prune_index]
+ * if it has been set, which it will have been if CachedPlanLockPartitions()
+ * has already done the initial pruning.
*
* If subplans are indeed pruned, subplan_map arrays contained in the returned
* PartitionPruneState are re-sequenced to not count those, though only if the
@@ -1794,9 +1809,10 @@ ExecInitPartitionPruning(PlanState *planstate,
Bitmapset *root_parent_relids,
Bitmapset **initially_valid_subplans)
{
- PartitionPruneState *prunestate;
+ PartitionPruneState *prunestate = NULL;
EState *estate = planstate->state;
PartitionPruneInfo *pruneinfo;
+ PartitionPruneResult *pruneresult = NULL;
/* Obtain the pruneinfo we need, and make sure it's the right one */
pruneinfo = list_nth(estate->es_part_prune_infos, part_prune_index);
@@ -1812,20 +1828,62 @@ ExecInitPartitionPruning(PlanState *planstate,
/* We may need an expression context to evaluate partition exprs */
ExecAssignExprContext(estate, planstate);
- /* Create the working data structure for pruning */
- prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ /* Initial pruning already done if es_part_prune_results has been set. */
+ if (estate->es_part_prune_results)
+ {
+ pruneresult = list_nth_node(PartitionPruneResult,
+ estate->es_part_prune_results,
+ part_prune_index);
+ if (!bms_equal(pruneresult->root_parent_relids, pruneinfo->root_parent_relids))
+ ereport(ERROR,
+ errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg_internal("mismatching PartitionPruneInfo and PartitionPruneResult at part_prune_index %d",
+ part_prune_index),
+ errdetail_internal("prunresult relids %s, pruneinfo relids %s",
+ bmsToString(pruneresult->root_parent_relids),
+ bmsToString(pruneinfo->root_parent_relids)));
+ }
+
+ if (pruneresult == NULL || pruneinfo->needs_exec_pruning)
+ {
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
+
+ /* For data reading, executor always omits detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
+ /*
+ * Create the working data structure for pruning. No need to consider
+ * initial pruning steps if we have a PartitionPruneResult.
+ */
+ prunestate = CreatePartitionPruneState(planstate, pruneinfo,
+ pruneresult == NULL,
+ pruneinfo->needs_exec_pruning,
+ NIL, planstate->ps_ExprContext,
+ estate->es_partition_directory);
+ }
/*
* Perform an initial partition prune pass, if required.
*/
- if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ if (pruneresult)
+ {
+ *initially_valid_subplans = bms_copy(pruneresult->valid_subplan_offs);
+ }
+ else if (prunestate && prunestate->do_initial_prune)
+ {
+ *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true,
+ NULL);
+ }
else
{
- /* No pruning, so we'll need to initialize all subplans */
+ /* No initial pruning, so we'll need to initialize all subplans */
Assert(n_total_subplans > 0);
*initially_valid_subplans = bms_add_range(NULL, 0,
n_total_subplans - 1);
+ return prunestate;
}
/*
@@ -1833,7 +1891,8 @@ ExecInitPartitionPruning(PlanState *planstate,
* that were removed above due to initial pruning. No need to do this if
* no steps were removed.
*/
- if (bms_num_members(*initially_valid_subplans) < n_total_subplans)
+ if (prunestate &&
+ bms_num_members(*initially_valid_subplans) < n_total_subplans)
{
/*
* We can safely skip this when !do_exec_prune, even though that
@@ -1849,11 +1908,58 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecPartitionDoInitialPruning
+ * Perform initial pruning using the given PartitionPruneInfo to determine
+ * the set of the parent plan node's child subnodes that are valid for
+ * execution
+ *
+ * On return, *scan_leafpart_rtis will contain the RT indexes of leaf
+ * partitions scanned by those valid subnodes.
+ *
+ * Note that this does not share state with the actual execution, so it must
+ * make do with the information in the PlannedStmt. For example, there isn't
+ * a PlanState for the parent plan node yet, so we must create a standalone
+ * ExprContext to evaluate pruning expressions, equipped with the information
+ * about the EXTERN parameters that we do have. Note that that's okay because
+ * the initial pruning steps do not contain anything that would require the
+ * execution to have started. Likewise, we create our own PartitionDirectory
+ * to look up the PartitionDescs to use.
+ */
+Bitmapset *
+ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *rtable = plannedstmt->rtable;
+ ExprContext *econtext;
+ PartitionDirectory pdir;
+ PartitionPruneState *prunestate;
+ Bitmapset *valid_subplan_offs;
+
+ /* Don't omit detached partitions, just like during execution proper. */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+ econtext = CreateStandaloneExprContext();
+ econtext->ecxt_param_list_info = params;
+ prunestate = CreatePartitionPruneState(NULL, pruneinfo, true, false,
+ rtable, econtext, pdir);
+ valid_subplan_offs = ExecFindMatchingSubPlans(prunestate, true,
+ scan_leafpart_rtis);
+
+ FreeExprContext(econtext, true);
+ DestroyPartitionDirectory(pdir);
+
+ return valid_subplan_offs;
+}
+
/*
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
*
- * 'planstate' is the parent plan node's execution state.
+ * 'planstate', if not NULL, is the parent plan node's execution state. It
+ * can be NULL when called before ExecutorStart(), in which case
+ * 'rtable' (range table), 'econtext', and 'partdir' must be explicitly
+ * provided.
*
* 'pruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
@@ -1867,19 +1973,21 @@ ExecInitPartitionPruning(PlanState *planstate,
* PartitionedRelPruneInfo.
*/
static PartitionPruneState *
-CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
+CreatePartitionPruneState(PlanState *planstate,
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir)
{
- EState *estate = planstate->state;
+ EState *estate = planstate ? planstate->state : NULL;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
- /* For data reading, executor always omits detached partitions */
- if (estate->es_partition_directory == NULL)
- estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
+ Assert((estate != NULL) ||
+ (partdir != NULL && econtext != NULL && rtable != NIL));
n_part_hierarchies = list_length(pruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
@@ -1934,15 +2042,39 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * Must open the relation ourselves when called before
+ * execution has started, such as when called from
+ * CachedPlanLockPartitions(). In that case, sub-partitions must
+ * be locked, because AcquirePlannerLocks() would have locked only
+ * the root parent.
+ */
+ if (estate == NULL)
+ {
+ RangeTblEntry *rte = rt_fetch(pinfo->rtindex, rtable);
+ int lockmode = (j == 0) ? NoLock : rte->rellockmode;
+
+ partrel = table_open(rte->relid, lockmode);
+ }
+ else
+ partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+
+ /*
+ * We can rely on the copy of the partitioned table's partition
+ * key in its relcache entry, because it can't change (or
+ * get destroyed) as long as the relation is locked. The partition
+ * descriptor is taken from the PartitionDirectory, in which the
+ * table is held open long enough for the descriptor to remain
+ * valid while it's used to perform the pruning steps.
*/
- partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
partkey = RelationGetPartitionKey(partrel);
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partdesc = PartitionDirectoryLookup(partdir, partrel);
+
+ /*
+ * Must close partrel, keeping the lock taken, if we're not using
+ * EState's entry.
+ */
+ if (estate == NULL)
+ table_close(partrel, NoLock);
/*
* Initialize the subplan_map and subpart_map.
@@ -2050,7 +2182,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* Initialize pruning contexts as needed.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
- if (pinfo->initial_pruning_steps)
+ if (consider_initial_steps && pinfo->initial_pruning_steps)
{
InitPartitionPruneContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
@@ -2060,7 +2192,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps)
+ if (consider_exec_steps && pinfo->exec_pruning_steps)
{
InitPartitionPruneContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
@@ -2288,10 +2420,14 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
* runtime pruning.
+ *
+ * RT indexes of leaf partitions scanned by the chosen subplans are added to
+ * *scan_leafpart_rtis if the pointer is non-NULL.
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2326,7 +2462,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
*/
pprune = &prunedata->partrelprunedata[0];
find_matching_subplans_recurse(prunedata, pprune, initial_prune,
- &result);
+ &result, scan_leafpart_rtis);
/* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
@@ -2340,6 +2476,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_copy(*scan_leafpart_rtis);
MemoryContextReset(prunestate->prune_context);
@@ -2350,13 +2488,15 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* find_matching_subplans_recurse
* Recursive worker function for ExecFindMatchingSubPlans
*
- * Adds valid (non-prunable) subplan IDs to *validsubplans
+ * Adds valid (non-prunable) subplan IDs to *validsubplans and RT indexes of
+ * the corresponding leaf partitions to *scan_leafpart_rtis (if asked for).
*/
static void
find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans)
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *partset;
int i;
@@ -2383,8 +2523,14 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
while ((i = bms_next_member(partset, i)) >= 0)
{
if (pprune->subplan_map[i] >= 0)
+ {
*validsubplans = bms_add_member(*validsubplans,
pprune->subplan_map[i]);
+ Assert(pprune->rti_map[i] > 0);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_add_member(*scan_leafpart_rtis,
+ pprune->rti_map[i]);
+ }
else
{
int partidx = pprune->subpart_map[i];
@@ -2392,7 +2538,8 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
if (partidx >= 0)
find_matching_subplans_recurse(prunedata,
&prunedata->partrelprunedata[partidx],
- initial_prune, validsubplans);
+ initial_prune, validsubplans,
+ scan_leafpart_rtis);
else
{
/*
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 87f4d53ca7..7d36c972d3 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -139,6 +139,7 @@ CreateExecutorState(void)
estate->es_param_exec_vals = NULL;
estate->es_queryEnv = NULL;
+ estate->es_part_prune_results = NIL;
estate->es_query_cxt = qcontext;
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index dc13625171..bffb42ce71 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -842,7 +842,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
- es->qd = CreateQueryDesc(es->stmt,
+ es->qd = CreateQueryDesc(es->stmt, NIL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 99830198bd..3b917584de 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -156,7 +156,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (appendstate->as_prune_state == NULL ||
+ (!appendstate->as_prune_state->do_exec_prune && nplans > 0))
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -578,7 +579,7 @@ choose_next_subplan_locally(AppendState *node)
}
else if (node->as_valid_subplans == NULL)
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
whichplan = -1;
}
@@ -643,7 +644,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
/*
* Mark each invalid plan as finished to allow the loop below to
@@ -718,7 +719,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
mark_invalid_subplans_as_finished(node);
}
@@ -869,7 +870,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
classify_matching_subplans(node);
}
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index f370f9f287..ccfa083945 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -104,7 +104,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplan, we can fill ms_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (mergestate->ms_prune_state == NULL ||
+ (!mergestate->ms_prune_state->do_exec_prune && nplans > 0))
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -219,7 +220,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index fd5796f1b9..2ecb9193aa 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1577,6 +1577,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
{
CachedPlanSource *plansource;
CachedPlan *cplan;
+ CachedPlanExtra *cplan_extra;
List *stmt_list;
char *query_string;
Snapshot snapshot;
@@ -1657,7 +1658,11 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &cplan_extra);
+ Assert(cplan_extra == NULL ||
+ (list_length(cplan->stmt_list) ==
+ list_length(cplan_extra->part_prune_results_list)));
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1685,6 +1690,9 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
stmt_list,
cplan);
+ if (cplan_extra)
+ PortalSaveCachedPlanExtra(portal, cplan_extra);
+
/*
* Set up options for portal. Default SCROLL type is chosen the same way
* as PerformCursorOpen does it.
@@ -2067,6 +2075,7 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
{
CachedPlanSource *plansource;
CachedPlan *cplan;
+ CachedPlanExtra *cplan_extra = NULL;
SPICallbackArg spicallbackarg;
ErrorContextCallback spierrcontext;
@@ -2092,8 +2101,12 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
/* Get the generic plan for the query */
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ &cplan_extra);
Assert(cplan == plansource->gplan);
+ Assert(cplan_extra == NULL ||
+ (list_length(cplan->stmt_list) ==
+ list_length(cplan_extra->part_prune_results_list)));
/* Pop the error context stack */
error_context_stack = spierrcontext.previous;
@@ -2399,6 +2412,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
SPICallbackArg spicallbackarg;
ErrorContextCallback spierrcontext;
CachedPlan *cplan = NULL;
+ CachedPlanExtra *cplan_extra = NULL;
ListCell *lc1;
/*
@@ -2549,8 +2563,12 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* plan, the refcount must be backed by the plan_owner.
*/
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
+ plan_owner, _SPI_current->queryEnv,
+ &cplan_extra);
+ Assert(cplan_extra == NULL ||
+ (list_length(cplan->stmt_list) ==
+ list_length(cplan_extra->part_prune_results_list)));
stmt_list = cplan->stmt_list;
/*
@@ -2592,9 +2610,14 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
foreach(lc2, stmt_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ List *part_prune_results = NIL;
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
+ if (cplan_extra)
+ part_prune_results = list_nth_node(List,
+ cplan_extra->part_prune_results_list,
+ foreach_current_index(lc2));
/*
* Reset output state. (Note that if a non-SPI receiver is used,
* _SPI_current->processed will stay zero, and that's what we'll
@@ -2663,7 +2686,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
else
snap = InvalidSnapshot;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, part_prune_results,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index ed43d5936d..db27cae297 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -372,6 +372,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
{
PartitionPruneInfo *pruneinfo = lfirst(lc);
ListCell *l;
+ Bitmapset *leafpart_rtis = NULL;
pruneinfo->root_parent_relids =
offset_relid_set(pruneinfo->root_parent_relids, rtoffset);
@@ -383,17 +384,52 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach(l2, prune_infos)
{
PartitionedRelPruneInfo *pinfo = lfirst(l2);
+ int i;
/* RT index of the table to which the pinfo belongs. */
pinfo->rtindex += rtoffset;
+
+ /* Also of the leaf partitions that might be scanned. */
+ for (i = 0; i < pinfo->nparts; i++)
+ {
+ if (pinfo->rti_map[i] > 0 && pinfo->subplan_map[i] >= 0)
+ {
+ pinfo->rti_map[i] += rtoffset;
+ leafpart_rtis = bms_add_member(leafpart_rtis,
+ pinfo->rti_map[i]);
+ }
+ }
}
}
+ if (pruneinfo->needs_init_pruning)
+ {
+ glob->containsInitialPruning = true;
+
+ /*
+ * Delete the leaf partition RTIs from the set of relations to be
+ * locked by AcquireExecutorLocks(). The actual set of leaf
+ * partitions to be locked is computed by
+ * CachedPlanLockPartitions().
+ */
+ glob->minLockRelids = bms_del_members(glob->minLockRelids,
+ leafpart_rtis);
+ }
+
glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
glob->containsInitialPruning |= pruneinfo->needs_init_pruning;
}
+ /*
+ * It seems worth doing a bms_copy() on glob->minLockRelids if we deleted
+ * bits from it above, to get rid of any empty tail words; that way, the
+ * loop over this set in AcquireExecutorLocks() need not walk through
+ * useless bit words.
+ */
+ if (glob->containsInitialPruning)
+ glob->minLockRelids = bms_copy(glob->minLockRelids);
+
return result;
}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index f8808d2191..9c1c7bfa9e 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1598,6 +1598,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ CachedPlanExtra *cplan_extra = NULL;
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -1972,7 +1973,10 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &cplan_extra);
+ Assert(cplan_extra == NULL ||
+ (list_length(cplan->stmt_list) ==
+ list_length(cplan_extra->part_prune_results_list)));
/*
* Now we can define the portal.
@@ -1987,6 +1991,9 @@ exec_bind_message(StringInfo input_message)
cplan->stmt_list,
cplan);
+ if (cplan_extra)
+ PortalSaveCachedPlanExtra(portal, cplan_extra);
+
/* Done with the snapshot used for parameter I/O and parsing/planning */
if (snapshot_set)
PopActiveSnapshot();
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 52e2db6452..32e6b7b767 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -35,7 +35,7 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
+static void ProcessQuery(PlannedStmt *plan, List *part_prune_results,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +65,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ List *part_prune_results,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +78,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->part_prune_results = part_prune_results;
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +124,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * part_prune_results: pruning results returned by CachedPlanLockPartitions()
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +137,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ List *part_prune_results,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +149,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, part_prune_results, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -491,8 +495,13 @@ PortalStart(Portal portal, ParamListInfo params,
/*
* Create QueryDesc in portal's context; for the moment, set
* the destination to DestNone.
+ *
+ * There is no PartitionPruneResult unless the PlannedStmt is
+ * from a CachedPlan.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->cplan_extra == NULL ? NIL :
+ linitial(portal->cplan_extra->part_prune_results_list),
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1225,6 +1234,8 @@ PortalRunMulti(Portal portal,
if (pstmt->utilityStmt == NULL)
{
+ List *part_prune_results = NIL;
+
/*
* process a plannable query.
*/
@@ -1271,10 +1282,19 @@ PortalRunMulti(Portal portal,
else
UpdateActiveSnapshotCommandId();
+ /*
+ * Determine if there's a corresponding List of PartitionPruneResult
+ * for this PlannedStmt.
+ */
+ if (portal->cplan_extra)
+ part_prune_results = list_nth_node(List,
+ portal->cplan_extra->part_prune_results_list,
+ foreach_current_index(stmtlist_item));
+
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_results,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1283,7 +1303,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_results,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 339bb603f7..7bd94e7632 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -59,6 +59,7 @@
#include "access/transam.h"
#include "catalog/namespace.h"
#include "executor/executor.h"
+#include "executor/execPartition.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/optimizer.h"
@@ -96,17 +97,20 @@ static dlist_head saved_plan_list = DLIST_STATIC_INIT(saved_plan_list);
*/
static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_list);
+static CachedPlan *GetCachedPlanInternal(CachedPlanSource *plansource,
+ ParamListInfo boundParams, ResourceOwner owner,
+ QueryEnvironment *queryEnv, bool *hasUnlockedParts);
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, bool *hasUnlockedParts);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static bool AcquireExecutorLocks(List *stmt_list, bool acquire);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -783,16 +787,23 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
}
/*
- * CheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
+ * CheckCachedPlan: see if the CachedPlanSource's generic plan is valid and
+ * set *hasUnlockedParts if any PlannedStmt contains "initially" prunable
+ * subnodes; partitions are not locked till initial pruning is done.
*
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
*
- * On a "true" return, we have acquired the locks needed to run the plan.
+ * On a "true" return, we have acquired the minimal set of locks needed to run
+ * the plan, that is, excluding partitions that are subject to being pruned
+ * before execution. The caller must perform that pruning and lock the
+ * partitions that remain before actually telling the world that the
+ * plan is "valid".
+ *
* (We must do this for the "true" result to be race-condition-free.)
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, bool *hasUnlockedParts)
{
CachedPlan *plan = plansource->gplan;
@@ -826,7 +837,7 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ *hasUnlockedParts = AcquireExecutorLocks(plan->stmt_list, true);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -848,7 +859,7 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ (void) AcquireExecutorLocks(plan->stmt_list, false);
}
/*
@@ -1120,7 +1131,125 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
}
/*
- * GetCachedPlan: get a cached plan from a CachedPlanSource.
+ * For each PlannedStmt in plan->stmt_list, do initial partition pruning if
+ * needed and lock the partitions that survive.  Returns plan->is_valid.
+ *
+ * On return, *part_prune_results_list, of the same length as
+ * plan->stmt_list, will contain either NIL if the PlannedStmt did not
+ * contain any PartitionPruneInfos requiring initial pruning, or a List of
+ * PartitionPruneResults with an element for each PartitionPruneInfo found
+ * in stmt->partPruneInfos.
+ *
+ * Also, on return, *lockedRelids_per_stmt, of the same length as
+ * plan->stmt_list, will contain either NULL if no additional relations
+ * needed to be locked for the PlannedStmt, or a bitmapset of the RT indexes of the locked partitions.
+ */
+static bool
+CachedPlanLockPartitions(CachedPlan *plan,
+ ParamListInfo boundParams,
+ ResourceOwner owner,
+ List **part_prune_results_list,
+ List **lockedRelids_per_stmt)
+{
+ List *my_part_prune_results_list = NIL;
+ List *my_lockedRelids_per_stmt = NIL;
+ ListCell *lc1;
+ MemoryContext oldcontext,
+ tmpcontext;
+
+ *part_prune_results_list = NIL;
+ *lockedRelids_per_stmt = NIL;
+
+ /*
+ * Create a temporary context for memory allocations required while
+ * executing partition pruning steps.
+ */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "CachedPlanLockPartitions() working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
+ foreach(lc1, plan->stmt_list)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ Bitmapset *lockPartRelids = NULL;
+ int rti;
+ List *part_prune_results = NIL;
+ Bitmapset *lockedRelids = NULL;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /*
+ * Ignore utility statements, because AcquireExecutorLocks on the
+ * parent CachedPlan would have dealt with these.  Still, let
+ * the caller know that no pruning is applicable to this statement.
+ */
+ my_part_prune_results_list = lappend(my_part_prune_results_list,
+ NIL);
+ *lockedRelids_per_stmt = lappend(*lockedRelids_per_stmt, NULL);
+ continue;
+ }
+
+ /* Figure out the partitions that would need to be locked. */
+ if (plannedstmt->containsInitialPruning)
+ {
+ ListCell *lc2;
+
+ foreach(lc2, plannedstmt->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc2);
+ PartitionPruneResult *pruneresult = makeNode(PartitionPruneResult);
+
+ pruneresult->root_parent_relids =
+ bms_copy(pruneinfo->root_parent_relids);
+ pruneresult->valid_subplan_offs =
+ ExecPartitionDoInitialPruning(plannedstmt, boundParams,
+ pruneinfo,
+ &lockPartRelids);
+ part_prune_results = lappend(part_prune_results, pruneresult);
+ }
+ }
+
+ rti = -1;
+ while ((rti = bms_next_member(lockPartRelids, rti)) > 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ Assert(rte->rtekind == RTE_RELATION);
+
+ /*
+ * Acquire the appropriate type of lock on each relation OID. Note
+ * that we don't actually try to open the rel, and hence will not
+ * fail if it's been dropped entirely --- we'll just transiently
+ * acquire a non-conflicting lock.
+ */
+ LockRelationOid(rte->relid, rte->rellockmode);
+ lockedRelids = bms_add_member(lockedRelids, rti);
+ }
+
+ my_part_prune_results_list = lappend(my_part_prune_results_list,
+ part_prune_results);
+ my_lockedRelids_per_stmt = lappend(my_lockedRelids_per_stmt,
+ lockedRelids);
+ }
+
+ /*
+ * If the plan is still valid, copy the prune results and lockedRelids
+ * bitmapsets into the caller's context.
+ */
+ MemoryContextSwitchTo(oldcontext);
+ if (plan->is_valid)
+ {
+ *part_prune_results_list = copyObject(my_part_prune_results_list);
+ *lockedRelids_per_stmt = copyObject(my_lockedRelids_per_stmt);
+ }
+
+ /* Delete the temporary context. */
+ MemoryContextDelete(tmpcontext);
+ return plan->is_valid;
+}
+
+/*
+ * GetCachedPlan: get a cached plan from a CachedPlanSource
*
* This function hides the logic that decides whether to use a generic
* plan or a custom plan for the given parameters: the caller does not know
@@ -1139,7 +1268,97 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ CachedPlanExtra **extra)
+{
+ CachedPlan *plan;
+
+ Assert(extra != NULL);
+ *extra = NULL;
+ for (;;)
+ {
+ bool hasUnlockedParts = false;
+
+ /* Actually get the plan. */
+ plan = GetCachedPlanInternal(plansource, boundParams, owner, queryEnv,
+ &hasUnlockedParts);
+ Assert(plan->is_valid);
+
+ /* Nothing to do if all relations already locked. */
+ if (!hasUnlockedParts)
+ return plan;
+ else
+ {
+ /*
+ * Do initial pruning to filter out partitions that need not be
+ * locked for execution.
+ */
+ ListCell *lc1,
+ *lc2;
+ List *part_prune_results_list;
+ List *lockedRelids_per_stmt;
+
+ /* Only a generic plan can ever have unlocked partitions in it. */
+ Assert(plan == plansource->gplan);
+
+ /*
+ * This does:
+ *
+ * 1) the pruning, returning in part_prune_results_list the
+ * PartitionPruneResult Lists for all statements
+ *
+ * 2) the locking of partitions that survive in each statement,
+ * returning in lockedRelids_per_stmt the RT indexes of those locked.
+ *
+ * True is returned if the plan is still valid after locking all
+ * partitions; false otherwise, in which case we must get a new
+ * plan.
+ */
+ if (CachedPlanLockPartitions(plan, boundParams, owner,
+ &part_prune_results_list,
+ &lockedRelids_per_stmt))
+ {
+ Assert(plan->is_valid);
+ *extra = (CachedPlanExtra *) palloc(sizeof(CachedPlanExtra));
+ (*extra)->part_prune_results_list = part_prune_results_list;
+ return plan;
+ }
+
+ /*
+ * Release the locks and start over. This is the same as what
+ * CheckCachedPlan does when doing AcquireExecutorLocks() causes
+ * the plan to be invalidated.
+ */
+ forboth(lc1, plan->stmt_list, lc2, lockedRelids_per_stmt)
+ {
+ PlannedStmt *plannedstmt = lfirst(lc1);
+ Bitmapset *lockedRelids = lfirst(lc2);
+ int rti;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ continue;
+ rti = -1;
+ while ((rti = bms_next_member(lockedRelids, rti)) > 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ Assert(rte->rtekind == RTE_RELATION);
+
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
+ }
+ }
+
+ Assert(false);
+ return NULL;
+}
+
+/* Internal workhorse of GetCachedPlan() */
+static CachedPlan *
+GetCachedPlanInternal(CachedPlanSource *plansource, ParamListInfo boundParams,
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ bool *hasUnlockedParts)
{
CachedPlan *plan = NULL;
List *qlist;
@@ -1160,7 +1379,7 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (CheckCachedPlan(plansource, hasUnlockedParts))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1738,11 +1957,16 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
* or release them if acquire is false.
+ *
+ * If some PlannedStmt(s) contain "initially prunable" partitions, they are not
+ * locked here. Instead, the caller is informed of their existence so that it
+ * can lock them after doing the initial pruning.
*/
-static void
+static bool
AcquireExecutorLocks(List *stmt_list, bool acquire)
{
ListCell *lc1;
+ bool hasUnlockedParts = false;
foreach(lc1, stmt_list)
{
@@ -1763,10 +1987,17 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
Assert(plannedstmt->minLockRelids == NULL);
if (query)
- ScanQueryForLocks(query, acquire);
+ ScanQueryForLocks(query, true);
continue;
}
+ /*
+ * If partitions can be pruned before execution, defer their locking to
+ * the caller.
+ */
+ if (plannedstmt->containsInitialPruning)
+ hasUnlockedParts = true;
+
allLockRelids = plannedstmt->minLockRelids;
rti = -1;
while ((rti = bms_next_member(allLockRelids, rti)) > 0)
@@ -1788,6 +2019,8 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
UnlockRelationOid(rte->relid, rte->rellockmode);
}
}
+
+ return hasUnlockedParts;
}
/*
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 7b1ae6fdcf..94a9db84e3 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -303,6 +303,22 @@ PortalDefineQuery(Portal portal,
portal->status = PORTAL_DEFINED;
}
+/*
+ * Copies the given CachedPlanExtra struct into the portal.
+ */
+void
+PortalSaveCachedPlanExtra(Portal portal, CachedPlanExtra *extra)
+{
+ MemoryContext oldcxt = MemoryContextSwitchTo(portal->portalContext);
+
+ Assert(portal->cplan_extra == NULL && extra != NULL);
+ portal->cplan_extra = (CachedPlanExtra *)
+ palloc(sizeof(CachedPlanExtra));
+ portal->cplan_extra->part_prune_results_list =
+ copyObject(extra->part_prune_results_list);
+ MemoryContextSwitchTo(oldcxt);
+}
+
/*
* PortalReleaseCachedPlan
* Release a portal's reference to its cached plan, if any.
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 9ebde089ae..269cc4d562 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,9 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt,
+ List *part_prune_results,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index aeeaeb7884..4b98d0d2ef 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -129,5 +129,10 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
Bitmapset *root_parent_relids,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis);
+extern Bitmapset *ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index e79e2c001f..5a7d075750 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,8 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ List *part_prune_results; /* PartitionPruneResults returned by
+ * CachedPlanLockPartitions() */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +59,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ List *part_prune_results,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 9a64a830a2..f1374057e5 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -617,6 +617,7 @@ typedef struct EState
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
+ List *es_part_prune_results; /* QueryDesc.part_prune_results */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 4337e7aa34..10f12e780e 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -134,8 +134,8 @@ typedef struct PlannerGlobal
bool containsInitialPruning;
/*
- * Indexes of all range table entries; for AcquireExecutorLocks()'s
- * perusal.
+ * Indexes of all range table entries except those of leaf partitions
+ * scanned by prunable subplans; for AcquireExecutorLocks() perusal.
*/
Bitmapset *minLockRelids;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index eb0a007946..ab8bc74e4a 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -82,7 +82,9 @@ typedef struct PlannedStmt
List *permInfos; /* list of RTEPermissionInfo nodes for rtable
* entries needing one */
- Bitmapset *minLockRelids; /* Indexes of all range table entries; for
+ Bitmapset *minLockRelids; /* Indexes of all range table entries except
+ * those of leaf partitions scanned by
+ * prunable subplans; for
* AcquireExecutorLocks()'s perusal */
/* rtable indexes of target relations for INSERT/UPDATE/DELETE/MERGE */
@@ -1575,6 +1577,33 @@ typedef struct PartitionPruneStepCombine
List *source_stepids;
} PartitionPruneStepCombine;
+/*----------------
+ * PartitionPruneResult
+ *
+ * The result of performing ExecPartitionDoInitialPruning() on a given
+ * PartitionPruneInfo.
+ *
+ * root_parent_relids is the same as PartitionPruneInfo.root_parent_relids. It's
+ * there for cross-checking in ExecInitPartitionPruning() that the
+ * PartitionPruneResult and the PartitionPruneInfo at a given index in
+ * EState.es_part_prune_results and EState.es_part_prune_infos, respectively,
+ * belong to the same parent plan node.
+ *
+ * valid_subplan_offs contains the indexes of subplans remaining after
+ * performing initial pruning by calling ExecFindMatchingSubPlans() on the
+ * PartitionPruneInfo.
+ *
+ * This is used to store the result of initial partition pruning that is
+ * performed before the execution has started, such as in
+ * CachedPlanLockPartitions().
+ */
+typedef struct PartitionPruneResult
+{
+ NodeTag type;
+
+ Bitmapset *root_parent_relids;
+ Bitmapset *valid_subplan_offs;
+} PartitionPruneResult;
/*
* Plan invalidation info
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 0499635f59..4ac66d2761 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -160,6 +160,14 @@ typedef struct CachedPlan
MemoryContext context; /* context containing this CachedPlan */
} CachedPlan;
+/*
+ * Additional information to pass to the executor when executing a CachedPlan.
+ */
+typedef struct CachedPlanExtra
+{
+ List *part_prune_results_list;
+} CachedPlanExtra;
+
/*
* CachedExpression is a low-overhead mechanism for caching the planned form
* of standalone scalar expressions. While such expressions are not usually
@@ -220,7 +228,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ CachedPlanExtra **extra);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aeddbdafe5..49bb00cda5 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,8 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ CachedPlanExtra *cplan_extra; /* CachedPlanExtra for cplan in Portal's
+ * memory */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -242,6 +244,7 @@ extern void PortalDefineQuery(Portal portal,
CommandTag commandTag,
List *stmts,
CachedPlan *cplan);
+extern void PortalSaveCachedPlanExtra(Portal portal, CachedPlanExtra *extra);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
--
2.35.3
On Wed, Dec 14, 2022 at 5:35 PM Amit Langote <amitlangote09@gmail.com> wrote:
I have moved the original functionality of GetCachedPlan() to
GetCachedPlanInternal(), turning the former into a sort of controller.
The latter's CheckCachedPlan() part now locks only the "minimal" set
of non-prunable relations, making a note of whether the plan contains
any prunable subnodes, and thus prunable relations whose locking is
deferred to the caller, GetCachedPlan().
GetCachedPlan(), acting as that controller, does the pruning if needed
on the minimally valid plan returned by GetCachedPlanInternal(), locks
the partitions that survive, and redoes the whole thing if the locking
of partitions invalidates the plan.
After sleeping on it, I realized this doesn't have to be that
complicated. Rather than turn GetCachedPlan() into a wrapper for
handling deferred partition locking as outlined above, I could have
changed it more simply as follows to get the same thing done:
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ bool hasUnlockedParts = false;
+
+ if (CheckCachedPlan(plansource, &hasUnlockedParts) &&
+ (!hasUnlockedParts ||
+ CachedPlanLockPartitions(plansource, boundParams, owner, extra)))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
Attached updated patch does it like that.
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v30-0002-In-GetCachedPlan-only-lock-unpruned-partitions.patchapplication/octet-stream; name=v30-0002-In-GetCachedPlan-only-lock-unpruned-partitions.patchDownload
From 4176843628ef29c1ff173ad0dfbdd13f7d07c225 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v30 2/2] In GetCachedPlan(), only lock unpruned partitions
This does two things mainly:
* The planner now removes the RT indexes of "initially prunable"
partitions from PlannedStmt.minLockRelids such that the set only
contains the relations not subject to initial partition pruning. So,
AcquireExecutorLocks only locks a subset of the relations contained
in a plan, deferring the locking of prunable relations to the caller.
* GetCachedPlan(), if there are prunable relations in the plan,
performs the initial partition pruning using available EXTERN params
and locks the partitions remaining after that, so the CachedPlan
that's returned is valid in a race-free manner, including for any
partitions that will be scanned during execution.
To make the pruning possible before entering ExecutorStart(), this
also adds ExecPartitionDoInitialPruning(), which can be called by
GetCachedPlan() for a given PlannedStmt.
The result of performing initial partition pruning this way is made
available to the actual execution via PartitionPruneResult, of which
there is one for every PartitionPruneInfo contained in the PlannedStmt.
The List of PartitionPruneResults for a given PlannedStmt is returned
to the callers of GetCachedPlan() via its new output parameter of type
CachedPlanExtra, whose members currently only include said List.
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 2 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/prepare.c | 28 +++-
src/backend/executor/README | 31 +++-
src/backend/executor/execMain.c | 2 +
src/backend/executor/execParallel.c | 25 ++-
src/backend/executor/execPartition.c | 215 +++++++++++++++++++++----
src/backend/executor/execUtils.c | 1 +
src/backend/executor/functions.c | 2 +-
src/backend/executor/nodeAppend.c | 11 +-
src/backend/executor/nodeMergeAppend.c | 5 +-
src/backend/executor/spi.c | 31 +++-
src/backend/optimizer/plan/setrefs.c | 36 +++++
src/backend/tcop/postgres.c | 9 +-
src/backend/tcop/pquery.c | 28 +++-
src/backend/utils/cache/plancache.c | 204 +++++++++++++++++++++--
src/backend/utils/mmgr/portalmem.c | 16 ++
src/include/commands/explain.h | 4 +-
src/include/executor/execPartition.h | 7 +-
src/include/executor/execdesc.h | 3 +
src/include/nodes/execnodes.h | 1 +
src/include/nodes/pathnodes.h | 4 +-
src/include/nodes/plannodes.h | 31 +++-
src/include/utils/plancache.h | 11 +-
src/include/utils/portal.h | 3 +
28 files changed, 640 insertions(+), 83 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index f26cc0d162..401a2280a3 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -558,7 +558,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NIL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 152c29b551..942449544c 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NIL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index f86983c660..2f2b558608 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -407,7 +407,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NIL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
@@ -515,7 +515,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, List *part_prune_results,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
@@ -563,7 +564,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, part_prune_results, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index cf1b1ca571..904cbcba4a 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -779,7 +779,7 @@ execute_sql_string(const char *sql)
{
QueryDesc *qdesc;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, NIL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 8ba2436a71..049a90f49d 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -409,7 +409,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NIL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 9e29584d93..729384a9a6 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -154,6 +154,7 @@ ExecuteQuery(ParseState *pstate,
{
PreparedStatement *entry;
CachedPlan *cplan;
+ CachedPlanExtra *cplan_extra = NULL;
List *plan_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
@@ -193,7 +194,11 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL,
+ &cplan_extra);
+ Assert(cplan_extra == NULL ||
+ (list_length(cplan->stmt_list) ==
+ list_length(cplan_extra->part_prune_results_list)));
plan_list = cplan->stmt_list;
/*
@@ -207,6 +212,9 @@ ExecuteQuery(ParseState *pstate,
plan_list,
cplan);
+ if (cplan_extra)
+ PortalSaveCachedPlanExtra(portal, cplan_extra);
+
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
* statement is one that produces tuples. Currently we insist that it be
@@ -575,6 +583,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PreparedStatement *entry;
const char *query_string;
CachedPlan *cplan;
+ CachedPlanExtra *cplan_extra = NULL;
List *plan_list;
ListCell *p;
ParamListInfo paramLI = NULL;
@@ -619,7 +628,11 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Replan if needed, and acquire a transient refcount */
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, queryEnv);
+ CurrentResourceOwner, queryEnv,
+ &cplan_extra);
+ Assert(cplan_extra == NULL ||
+ (list_length(cplan->stmt_list) ==
+ list_length(cplan_extra->part_prune_results_list)));
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -637,10 +650,17 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
foreach(p, plan_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ List *part_prune_results = NIL;
+
+ if (cplan_extra)
+ part_prune_results = list_nth_node(List,
+ cplan_extra->part_prune_results_list,
+ foreach_current_index(p));
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ ExplainOnePlan(pstmt, part_prune_results, into, es, query_string,
+ paramLI, queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 17775a49e2..2222b3ed6f 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -63,7 +63,36 @@ if the executor determines that an entire subplan is not required due to
execution time partition pruning determining that no matching records will be
found there. This currently only occurs for Append and MergeAppend nodes. In
this case the non-required subplans are ignored and the executor state's
-subnode array will become out of sequence to the plan's subplan list.
+subnode array will become out of sequence to the plan's subplan list. Note
+that this is referred to as "initial" pruning, because it needs to occur only
+once during the execution startup, and uses a set of pruning steps called
+initial pruning steps (see PartitionedRelPruneInfo.initial_pruning_steps).
+
+Actually, "initial" pruning may occur even before the execution startup in
+in some cases. For example, when a cached generic plan is validated for
+execution, which works by locking all the relations that will be scanned by
+that plan during execution. If the generic plan contains plan nodes that have
+prunable child subnodes, then this validation locking is performed after
+pruning child subnodes that need not be scanned during execution, that is,
+using initial pruning steps. When such a generic plan is forwarded for
+execution, it must be accompanied by the set of PartitionPruneResult nodes that
+contain the result of that pruning, which basically consists of a bitmapset of
+child subnode indexes that survived the pruning, whose relations would thus
+have been locked for execution.  This is important because, unlike plan-time
+pruning and actual executor-startup pruning, this does not remove the pruned
+subnodes from the plan tree, but only marks them as pruned.  So, executor
+code (core or third party), especially code that runs
+before ExecutorStart() and thus looks at bare Plan trees (not PlanState trees)
+must beware of plan nodes that may actually have been pruned and thus subject
+to being invalidated by concurrent schema changes. For plan nodes that can
+have prunable child subnodes and thus contain a PartitionPruneInfo, such code
+must always check if the corresponding PartitionPruneResult exists
+in EState.es_part_prune_results at the given part_prune_index and use that to
+decide which subplans are valid for execution instead of redoing the pruning.
+Note that this is not just a performance optimization but is also necessary
+for correctness: if the pruning steps produce a different result when executed
+multiple times, the executor could otherwise consider a different set of child
+subnodes valid than the set whose relations CachedPlanLockPartitions() locked.
Each Plan node may have expression trees associated with it, to represent
its target list, qualification conditions, etc. These trees are also
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 2c2b3a8874..229f61f72e 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -798,6 +798,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ List *part_prune_results = queryDesc->part_prune_results;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -819,6 +820,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_plannedstmt = plannedstmt;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ estate->es_part_prune_results = part_prune_results;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 65c4b63bbd..9745eba0af 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -66,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITION_PRUNE_RESULTS UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -599,12 +600,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_results_data;
+ char *part_prune_results_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_results_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -633,6 +637,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_results_data = nodeToString(estate->es_part_prune_results);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -659,6 +664,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized List of PartitionPruneResult. */
+ part_prune_results_len = strlen(part_prune_results_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_results_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -753,6 +763,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized List of PartitionPruneResult */
+ part_prune_results_space = shm_toc_allocate(pcxt->toc, part_prune_results_len);
+ memcpy(part_prune_results_space, part_prune_results_data, part_prune_results_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS,
+ part_prune_results_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1234,8 +1250,10 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_results_space;
char *paramspace;
PlannedStmt *pstmt;
+ List *part_prune_results;
ParamListInfo paramLI;
char *queryString;
@@ -1246,12 +1264,17 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
pstmtspace = shm_toc_lookup(toc, PARALLEL_KEY_PLANNEDSTMT, false);
pstmt = (PlannedStmt *) stringToNode(pstmtspace);
+ /* Reconstruct leader-supplied PartitionPruneResult. */
+ part_prune_results_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, false);
+ part_prune_results = (List *) stringToNode(part_prune_results_space);
+
/* Reconstruct ParamListInfo. */
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(¶mspace);
/* Create a QueryDesc for the query. */
- return CreateQueryDesc(pstmt,
+ return CreateQueryDesc(pstmt, part_prune_results,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 5b62157712..dcd2bb0f90 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -25,6 +25,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -185,7 +186,11 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *pruneinfo);
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -198,7 +203,8 @@ static void PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans);
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis);
/*
@@ -1742,7 +1748,8 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* considered to be a stable expression, it can change value from one plan
* node scan to the next during query execution. Stable comparison
* expressions that don't involve such Params allow partition pruning to be
- * done once during executor startup. Expressions that do involve such Params
+ * done once during executor startup or even before that, such as when called
+ * from CachedPlanLockPartitions(). Expressions that do involve such Params
* require us to prune separately for each scan of the parent plan node.
*
* Note that pruning away unneeded subplans during executor startup has the
@@ -1760,6 +1767,12 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* account for initial pruning possibly having eliminated some of the
* subplans.
*
+ * ExecPartitionDoInitialPruning:
+ * Do initial pruning with the information contained in a given
+ * PartitionPruneInfo to determine the set of the parent plan node's
+ * child subnodes that are valid for execution and also the set of the RT
+ * indexes of leaf partitions scanned by those subnodes.
+ *
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating the expressions
* that are safe to evaluate at a given point. This function is first
@@ -1780,8 +1793,10 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* On return, *initially_valid_subplans is assigned the set of indexes of
* child subplans that must be initialized along with the parent plan node.
- * Initial pruning is performed here if needed and in that case only the
- * surviving subplans' indexes are added.
+ * That set is computed by either performing the "initial pruning" here or
+ * reusing the one present in EState.es_part_prune_results[part_prune_index]
+ * if it has been set, which it will have been if
+ * CachedPlanLockPartitions() has already done the initial pruning.
*
* If subplans are indeed pruned, subplan_map arrays contained in the returned
* PartitionPruneState are re-sequenced to not count those, though only if the
@@ -1794,9 +1809,10 @@ ExecInitPartitionPruning(PlanState *planstate,
Bitmapset *root_parent_relids,
Bitmapset **initially_valid_subplans)
{
- PartitionPruneState *prunestate;
+ PartitionPruneState *prunestate = NULL;
EState *estate = planstate->state;
PartitionPruneInfo *pruneinfo;
+ PartitionPruneResult *pruneresult = NULL;
/* Obtain the pruneinfo we need, and make sure it's the right one */
pruneinfo = list_nth(estate->es_part_prune_infos, part_prune_index);
@@ -1812,20 +1828,62 @@ ExecInitPartitionPruning(PlanState *planstate,
/* We may need an expression context to evaluate partition exprs */
ExecAssignExprContext(estate, planstate);
- /* Create the working data structure for pruning */
- prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ /* Initial pruning already done if es_part_prune_results has been set. */
+ if (estate->es_part_prune_results)
+ {
+ pruneresult = list_nth_node(PartitionPruneResult,
+ estate->es_part_prune_results,
+ part_prune_index);
+ if (!bms_equal(pruneresult->root_parent_relids, pruneinfo->root_parent_relids))
+ ereport(ERROR,
+ errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg_internal("mismatching PartitionPruneInfo and PartitionPruneResult at part_prune_index %d",
+ part_prune_index),
+ errdetail_internal("prunresult relids %s, pruneinfo relids %s",
+ bmsToString(pruneresult->root_parent_relids),
+ bmsToString(pruneinfo->root_parent_relids)));
+ }
+
+ if (pruneresult == NULL || pruneinfo->needs_exec_pruning)
+ {
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
+
+ /* For data reading, executor always omits detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
+ /*
+ * Create the working data structure for pruning. No need to consider
+ * initial pruning steps if we have a PartitionPruneResult.
+ */
+ prunestate = CreatePartitionPruneState(planstate, pruneinfo,
+ pruneresult == NULL,
+ pruneinfo->needs_exec_pruning,
+ NIL, planstate->ps_ExprContext,
+ estate->es_partition_directory);
+ }
/*
* Perform an initial partition prune pass, if required.
*/
- if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ if (pruneresult)
+ {
+ *initially_valid_subplans = bms_copy(pruneresult->valid_subplan_offs);
+ }
+ else if (prunestate && prunestate->do_initial_prune)
+ {
+ *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true,
+ NULL);
+ }
else
{
- /* No pruning, so we'll need to initialize all subplans */
+ /* No initial pruning, so we'll need to initialize all subplans */
Assert(n_total_subplans > 0);
*initially_valid_subplans = bms_add_range(NULL, 0,
n_total_subplans - 1);
+ return prunestate;
}
/*
@@ -1833,7 +1891,8 @@ ExecInitPartitionPruning(PlanState *planstate,
* that were removed above due to initial pruning. No need to do this if
* no steps were removed.
*/
- if (bms_num_members(*initially_valid_subplans) < n_total_subplans)
+ if (prunestate &&
+ bms_num_members(*initially_valid_subplans) < n_total_subplans)
{
/*
* We can safely skip this when !do_exec_prune, even though that
@@ -1849,11 +1908,58 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecPartitionDoInitialPruning
+ * Perform initial pruning using the given PartitionPruneInfo to determine
+ * the set of the parent plan node's child subnodes that are valid for
+ * execution
+ *
+ * On return, *scan_leafpart_rtis will contain the RT indexes of leaf
+ * partitions scanned by those valid subnodes.
+ *
+ * Note that this does not share state with the actual execution, so it must
+ * make do with the information in the PlannedStmt. For example, there isn't
+ * a PlanState for the parent plan node yet, so we must create a standalone
+ * ExprContext to evaluate pruning expressions, equipped with the information
+ * about the EXTERN parameters that we do have. Note that that's okay because
+ * the initial pruning steps do not contain anything that would require the
+ * execution to have started. Likewise, we create our own PartitionDirectory
+ * to look up the PartitionDescs to use.
+ */
+Bitmapset *
+ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *rtable = plannedstmt->rtable;
+ ExprContext *econtext;
+ PartitionDirectory pdir;
+ PartitionPruneState *prunestate;
+ Bitmapset *valid_subplan_offs;
+
+ /* Don't omit detached partitions, just like during execution proper. */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+ econtext = CreateStandaloneExprContext();
+ econtext->ecxt_param_list_info = params;
+ prunestate = CreatePartitionPruneState(NULL, pruneinfo, true, false,
+ rtable, econtext, pdir);
+ valid_subplan_offs = ExecFindMatchingSubPlans(prunestate, true,
+ scan_leafpart_rtis);
+
+ FreeExprContext(econtext, true);
+ DestroyPartitionDirectory(pdir);
+
+ return valid_subplan_offs;
+}
+
/*
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
*
- * 'planstate' is the parent plan node's execution state.
+ * 'planstate', if not NULL, is the parent plan node's execution state. It
+ * can be NULL when called before ExecutorStart(), in which case 'rtable'
+ * (range table), 'econtext', and 'partdir' must be provided explicitly.
*
* 'pruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
@@ -1867,19 +1973,21 @@ ExecInitPartitionPruning(PlanState *planstate,
* PartitionedRelPruneInfo.
*/
static PartitionPruneState *
-CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
+CreatePartitionPruneState(PlanState *planstate,
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir)
{
- EState *estate = planstate->state;
+ EState *estate = planstate ? planstate->state : NULL;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
- /* For data reading, executor always omits detached partitions */
- if (estate->es_partition_directory == NULL)
- estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
+ Assert((estate != NULL) ||
+ (partdir != NULL && econtext != NULL && rtable != NIL));
n_part_hierarchies = list_length(pruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
@@ -1934,15 +2042,39 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * Must open the relation ourselves when called before execution
+ * has started, such as when called from
+ * CachedPlanLockPartitions(). In that case, sub-partitions must
+ * be locked, because AcquirePlannerLocks() would have locked only
+ * the root parent.
+ */
+ if (estate == NULL)
+ {
+ RangeTblEntry *rte = rt_fetch(pinfo->rtindex, rtable);
+ int lockmode = (j == 0) ? NoLock : rte->rellockmode;
+
+ partrel = table_open(rte->relid, lockmode);
+ }
+ else
+ partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+
+ /*
+ * We can rely on the copy of the partitioned table's partition
+ * key in its relcache entry, because it can't change (or get
+ * destroyed) as long as the relation is locked. The partition
+ * descriptor is taken from the PartitionDirectory, which holds
+ * onto it long enough for it to remain valid while it's used to
+ * perform the pruning steps.
*/
- partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
partkey = RelationGetPartitionKey(partrel);
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partdesc = PartitionDirectoryLookup(partdir, partrel);
+
+ /*
+ * Must close partrel, while keeping the lock we took, if we're
+ * not using EState's entry.
+ */
+ if (estate == NULL)
+ table_close(partrel, NoLock);
/*
* Initialize the subplan_map and subpart_map.
@@ -2050,7 +2182,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* Initialize pruning contexts as needed.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
- if (pinfo->initial_pruning_steps)
+ if (consider_initial_steps && pinfo->initial_pruning_steps)
{
InitPartitionPruneContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
@@ -2060,7 +2192,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps)
+ if (consider_exec_steps && pinfo->exec_pruning_steps)
{
InitPartitionPruneContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
@@ -2288,10 +2420,14 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
* runtime pruning.
+ *
+ * RT indexes of leaf partitions scanned by the chosen subplans are added to
+ * *scan_leafpart_rtis if the pointer is non-NULL.
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2326,7 +2462,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
*/
pprune = &prunedata->partrelprunedata[0];
find_matching_subplans_recurse(prunedata, pprune, initial_prune,
- &result);
+ &result, scan_leafpart_rtis);
/* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
@@ -2340,6 +2476,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_copy(*scan_leafpart_rtis);
MemoryContextReset(prunestate->prune_context);
@@ -2350,13 +2488,15 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* find_matching_subplans_recurse
* Recursive worker function for ExecFindMatchingSubPlans
*
- * Adds valid (non-prunable) subplan IDs to *validsubplans
+ * Adds valid (non-prunable) subplan IDs to *validsubplans and RT indexes of
+ * the corresponding leaf partitions to *scan_leafpart_rtis (if asked for).
*/
static void
find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans)
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *partset;
int i;
@@ -2383,8 +2523,14 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
while ((i = bms_next_member(partset, i)) >= 0)
{
if (pprune->subplan_map[i] >= 0)
+ {
*validsubplans = bms_add_member(*validsubplans,
pprune->subplan_map[i]);
+ Assert(pprune->rti_map[i] > 0);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_add_member(*scan_leafpart_rtis,
+ pprune->rti_map[i]);
+ }
else
{
int partidx = pprune->subpart_map[i];
@@ -2392,7 +2538,8 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
if (partidx >= 0)
find_matching_subplans_recurse(prunedata,
&prunedata->partrelprunedata[partidx],
- initial_prune, validsubplans);
+ initial_prune, validsubplans,
+ scan_leafpart_rtis);
else
{
/*
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 87f4d53ca7..7d36c972d3 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -139,6 +139,7 @@ CreateExecutorState(void)
estate->es_param_exec_vals = NULL;
estate->es_queryEnv = NULL;
+ estate->es_part_prune_results = NIL;
estate->es_query_cxt = qcontext;
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index dc13625171..bffb42ce71 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -842,7 +842,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
- es->qd = CreateQueryDesc(es->stmt,
+ es->qd = CreateQueryDesc(es->stmt, NIL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 99830198bd..3b917584de 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -156,7 +156,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (appendstate->as_prune_state == NULL ||
+ (!appendstate->as_prune_state->do_exec_prune && nplans > 0))
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -578,7 +579,7 @@ choose_next_subplan_locally(AppendState *node)
}
else if (node->as_valid_subplans == NULL)
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
whichplan = -1;
}
@@ -643,7 +644,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
/*
* Mark each invalid plan as finished to allow the loop below to
@@ -718,7 +719,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
mark_invalid_subplans_as_finished(node);
}
@@ -869,7 +870,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
classify_matching_subplans(node);
}
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index f370f9f287..ccfa083945 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -104,7 +104,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplan, we can fill ms_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (mergestate->ms_prune_state == NULL ||
+ (!mergestate->ms_prune_state->do_exec_prune && nplans > 0))
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -219,7 +220,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index fd5796f1b9..2ecb9193aa 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1577,6 +1577,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
{
CachedPlanSource *plansource;
CachedPlan *cplan;
+ CachedPlanExtra *cplan_extra;
List *stmt_list;
char *query_string;
Snapshot snapshot;
@@ -1657,7 +1658,11 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &cplan_extra);
+ Assert(cplan_extra == NULL ||
+ (list_length(cplan->stmt_list) ==
+ list_length(cplan_extra->part_prune_results_list)));
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1685,6 +1690,9 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
stmt_list,
cplan);
+ if (cplan_extra)
+ PortalSaveCachedPlanExtra(portal, cplan_extra);
+
/*
* Set up options for portal. Default SCROLL type is chosen the same way
* as PerformCursorOpen does it.
@@ -2067,6 +2075,7 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
{
CachedPlanSource *plansource;
CachedPlan *cplan;
+ CachedPlanExtra *cplan_extra = NULL;
SPICallbackArg spicallbackarg;
ErrorContextCallback spierrcontext;
@@ -2092,8 +2101,12 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
/* Get the generic plan for the query */
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ &cplan_extra);
Assert(cplan == plansource->gplan);
+ Assert(cplan_extra == NULL ||
+ (list_length(cplan->stmt_list) ==
+ list_length(cplan_extra->part_prune_results_list)));
/* Pop the error context stack */
error_context_stack = spierrcontext.previous;
@@ -2399,6 +2412,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
SPICallbackArg spicallbackarg;
ErrorContextCallback spierrcontext;
CachedPlan *cplan = NULL;
+ CachedPlanExtra *cplan_extra = NULL;
ListCell *lc1;
/*
@@ -2549,8 +2563,12 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* plan, the refcount must be backed by the plan_owner.
*/
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
+ plan_owner, _SPI_current->queryEnv,
+ &cplan_extra);
+ Assert(cplan_extra == NULL ||
+ (list_length(cplan->stmt_list) ==
+ list_length(cplan_extra->part_prune_results_list)));
stmt_list = cplan->stmt_list;
/*
@@ -2592,9 +2610,14 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
foreach(lc2, stmt_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ List *part_prune_results = NIL;
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
+ if (cplan_extra)
+ part_prune_results = list_nth_node(List,
+ cplan_extra->part_prune_results_list,
+ foreach_current_index(lc2));
/*
* Reset output state. (Note that if a non-SPI receiver is used,
* _SPI_current->processed will stay zero, and that's what we'll
@@ -2663,7 +2686,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
else
snap = InvalidSnapshot;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, part_prune_results,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index ed43d5936d..db27cae297 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -372,6 +372,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
{
PartitionPruneInfo *pruneinfo = lfirst(lc);
ListCell *l;
+ Bitmapset *leafpart_rtis = NULL;
pruneinfo->root_parent_relids =
offset_relid_set(pruneinfo->root_parent_relids, rtoffset);
@@ -383,17 +384,52 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach(l2, prune_infos)
{
PartitionedRelPruneInfo *pinfo = lfirst(l2);
+ int i;
/* RT index of the table to which the pinfo belongs. */
pinfo->rtindex += rtoffset;
+
+ /* Also of the leaf partitions that might be scanned. */
+ for (i = 0; i < pinfo->nparts; i++)
+ {
+ if (pinfo->rti_map[i] > 0 && pinfo->subplan_map[i] >= 0)
+ {
+ pinfo->rti_map[i] += rtoffset;
+ leafpart_rtis = bms_add_member(leafpart_rtis,
+ pinfo->rti_map[i]);
+ }
+ }
}
}
+ if (pruneinfo->needs_init_pruning)
+ {
+ glob->containsInitialPruning = true;
+
+ /*
+ * Delete the leaf partition RTIs from the set of relations to be
+ * locked by AcquireExecutorLocks(). The actual set of leaf
+ * partitions to be locked is computed by
+ * CachedPlanLockPartitions().
+ */
+ glob->minLockRelids = bms_del_members(glob->minLockRelids,
+ leafpart_rtis);
+ }
+
glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
glob->containsInitialPruning |= pruneinfo->needs_init_pruning;
}
+ /*
+ * If we deleted bits from glob->minLockRelids above, make a bms_copy()
+ * of it to get rid of any empty trailing words, so that the loop over
+ * this set in AcquireExecutorLocks() doesn't have to wade through
+ * useless bit words.
+ */
+ if (glob->containsInitialPruning)
+ glob->minLockRelids = bms_copy(glob->minLockRelids);
+
return result;
}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 01d264b5ab..e11e07658d 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1598,6 +1598,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ CachedPlanExtra *cplan_extra = NULL;
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -1972,7 +1973,10 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &cplan_extra);
+ Assert(cplan_extra == NULL ||
+ (list_length(cplan->stmt_list) ==
+ list_length(cplan_extra->part_prune_results_list)));
/*
* Now we can define the portal.
@@ -1987,6 +1991,9 @@ exec_bind_message(StringInfo input_message)
cplan->stmt_list,
cplan);
+ if (cplan_extra)
+ PortalSaveCachedPlanExtra(portal, cplan_extra);
+
/* Done with the snapshot used for parameter I/O and parsing/planning */
if (snapshot_set)
PopActiveSnapshot();
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 52e2db6452..32e6b7b767 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -35,7 +35,7 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
+static void ProcessQuery(PlannedStmt *plan, List *part_prune_results,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +65,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ List *part_prune_results,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +78,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->part_prune_results = part_prune_results;
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +124,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * part_prune_results: pruning results returned by CachedPlanLockPartitions()
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +137,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ List *part_prune_results,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +149,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, part_prune_results, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -491,8 +495,13 @@ PortalStart(Portal portal, ParamListInfo params,
/*
* Create QueryDesc in portal's context; for the moment, set
* the destination to DestNone.
+ *
+ * There is no PartitionPruneResult unless the PlannedStmt is
+ * from a CachedPlan.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->cplan_extra == NULL ? NIL :
+ linitial(portal->cplan_extra->part_prune_results_list),
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1225,6 +1234,8 @@ PortalRunMulti(Portal portal,
if (pstmt->utilityStmt == NULL)
{
+ List *part_prune_results = NIL;
+
/*
* process a plannable query.
*/
@@ -1271,10 +1282,19 @@ PortalRunMulti(Portal portal,
else
UpdateActiveSnapshotCommandId();
+ /*
+ * Determine if there's a corresponding List of PartitionPruneResult
+ * for this PlannedStmt.
+ */
+ if (portal->cplan_extra)
+ part_prune_results = list_nth_node(List,
+ portal->cplan_extra->part_prune_results_list,
+ foreach_current_index(stmtlist_item));
+
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_results,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1283,7 +1303,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_results,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 339bb603f7..16b9869fae 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -59,6 +59,7 @@
#include "access/transam.h"
#include "catalog/namespace.h"
#include "executor/executor.h"
+#include "executor/execPartition.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/optimizer.h"
@@ -99,14 +100,18 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, bool *hasUnlockedParts);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static bool AcquireExecutorLocks(List *stmt_list, bool acquire);
+static bool CachedPlanLockPartitions(CachedPlanSource *plansource,
+ ParamListInfo boundParams,
+ ResourceOwner owner,
+ CachedPlanExtra **extra);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -783,16 +788,23 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
}
/*
- * CheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
+ * CheckCachedPlan: see if the CachedPlanSource's generic plan is valid and
+ * set *hasUnlockedParts if any PlannedStmt contains "initially" prunable
+ * subnodes; such partitions are not locked until initial pruning is done.
*
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
*
- * On a "true" return, we have acquired the locks needed to run the plan.
+ * On a "true" return, we have acquired the minimal set of locks needed to run
+ * the plan, that is, excluding partitions that are subject to being pruned
+ * before execution. The caller must perform that pruning and lock the
+ * partitions that survive it before actually telling the world that the
+ * plan is "valid".
+ *
* (We must do this for the "true" result to be race-condition-free.)
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, bool *hasUnlockedParts)
{
CachedPlan *plan = plansource->gplan;
@@ -826,7 +838,7 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ *hasUnlockedParts = AcquireExecutorLocks(plan->stmt_list, true);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -848,7 +860,7 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ (void) AcquireExecutorLocks(plan->stmt_list, false);
}
/*
@@ -1120,14 +1132,17 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
}
/*
- * GetCachedPlan: get a cached plan from a CachedPlanSource.
+ * GetCachedPlan: get a cached plan from a CachedPlanSource
*
* This function hides the logic that decides whether to use a generic
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
* On return, the plan is valid and we have sufficient locks to begin
- * execution.
+ * execution. If the plan is a generic plan containing prunable partitions,
+ * the locks on partitions are taken after the pruning and the result of that
+ * pruning is saved in (*extra)->part_prune_results_list for the caller to pass
+ * to the executor, along with plan->stmt_list.
*
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
@@ -1139,12 +1154,16 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ CachedPlanExtra **extra)
{
CachedPlan *plan = NULL;
List *qlist;
bool customplan;
+ Assert(extra != NULL);
+ *extra = NULL;
+
/* Assert caller is doing things in a sane order */
Assert(plansource->magic == CACHEDPLANSOURCE_MAGIC);
Assert(plansource->is_complete);
@@ -1160,7 +1179,11 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ bool hasUnlockedParts = false;
+
+ if (CheckCachedPlan(plansource, &hasUnlockedParts) &&
+ hasUnlockedParts &&
+ CachedPlanLockPartitions(plansource, boundParams, owner, extra))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1282,6 +1305,147 @@ ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner)
}
}
+/*
+ * For each PlannedStmt in the generic plan, do the "initial" partition pruning
+ * if needed and lock only partitions that survive.
+ *
+ * On return, (*extra)->part_prune_results_list will contain one element for
+ * each PlannedStmt in the generic plan's stmt_list: either NIL, if the
+ * PlannedStmt contains no PartitionPruneInfos requiring initial pruning, or
+ * a List of PartitionPruneResults corresponding one-to-one to the
+ * PartitionPruneInfos in PlannedStmt.partPruneInfos.
+ *
+ * Returns true if the plan is still valid at the end; otherwise, the locks
+ * taken here are released and false is returned.
+ */
+static bool
+CachedPlanLockPartitions(CachedPlanSource *plansource,
+ ParamListInfo boundParams,
+ ResourceOwner owner,
+ CachedPlanExtra **extra)
+{
+ CachedPlan *plan = plansource->gplan;
+ List *part_prune_results_list = NIL;
+ List *lockedRelids_per_stmt = NIL;
+ ListCell *lc1,
+ *lc2;
+ MemoryContext oldcontext,
+ tmpcontext;
+
+ /*
+ * Won't be here without CheckCachedPlan() having validated a generic
+ * plan.
+ */
+ Assert(plansource->gplan != NULL);
+
+ /*
+ * Create a temporary context for memory allocations required while
+ * executing partition pruning steps.
+ */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "CachedPlanLockPartitions() working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
+ foreach(lc1, plan->stmt_list)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ Bitmapset *lockPartRelids = NULL;
+ int rti;
+ List *part_prune_results = NIL;
+ Bitmapset *lockedRelids = NULL;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /*
+ * Ignore utility statements, because AcquireExecutorLocks() on the
+ * parent CachedPlan would have dealt with these. Still, let the
+ * caller know that no pruning is applicable to this statement.
+ */
+ part_prune_results_list = lappend(part_prune_results_list, NIL);
+ lockedRelids_per_stmt = lappend(lockedRelids_per_stmt, NULL);
+ continue;
+ }
+
+ /* Figure out the partitions that would need to be locked. */
+ if (plannedstmt->containsInitialPruning)
+ {
+ foreach(lc2, plannedstmt->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc2);
+ PartitionPruneResult *pruneresult = makeNode(PartitionPruneResult);
+
+ pruneresult->root_parent_relids =
+ bms_copy(pruneinfo->root_parent_relids);
+ pruneresult->valid_subplan_offs =
+ ExecPartitionDoInitialPruning(plannedstmt, boundParams,
+ pruneinfo,
+ &lockPartRelids);
+ part_prune_results = lappend(part_prune_results, pruneresult);
+ }
+ }
+
+ /* Lock 'em. */
+ rti = -1;
+ while ((rti = bms_next_member(lockPartRelids, rti)) > 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ Assert(rte->rtekind == RTE_RELATION);
+
+ /*
+ * Acquire the appropriate type of lock on each relation OID. Note
+ * that we don't actually try to open the rel, and hence will not
+ * fail if it's been dropped entirely --- we'll just transiently
+ * acquire a non-conflicting lock.
+ */
+ LockRelationOid(rte->relid, rte->rellockmode);
+ lockedRelids = bms_add_member(lockedRelids, rti);
+ }
+
+ part_prune_results_list = lappend(part_prune_results_list,
+ part_prune_results);
+ lockedRelids_per_stmt = lappend(lockedRelids_per_stmt,
+ lockedRelids);
+ }
+
+ /*
+ * If the plan is still valid, set *extra, returning in it a copy of the
+ * pruning results obtained above, allocated in the caller's context.
+ */
+ MemoryContextSwitchTo(oldcontext);
+ if (plan->is_valid)
+ {
+ *extra = (CachedPlanExtra *) palloc(sizeof(CachedPlanExtra));
+ (*extra)->part_prune_results_list = copyObject(part_prune_results_list);
+ }
+ else
+ {
+ /*
+ * Release the now-useless locks. Note that this is the same as what
+ * CheckCachedPlan() does when the plan turns out to be invalid after
+ * AcquireExecutorLocks() has taken the locks.
+ */
+ forboth(lc1, plan->stmt_list, lc2, lockedRelids_per_stmt)
+ {
+ PlannedStmt *plannedstmt = lfirst(lc1);
+ Bitmapset *lockedRelids = lfirst(lc2);
+ int rti;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ continue;
+ rti = -1;
+ while ((rti = bms_next_member(lockedRelids, rti)) > 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ Assert(rte->rtekind == RTE_RELATION);
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
+ }
+
+ /* Delete the temporary context. */
+ MemoryContextDelete(tmpcontext);
+ return plan->is_valid;
+}
+
/*
* CachedPlanAllowsSimpleValidityCheck: can we use CachedPlanIsSimplyValid?
*
@@ -1738,11 +1902,16 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
* or release them if acquire is false.
+ *
+ * If some PlannedStmt(s) contain "initially prunable" partitions, those
+ * partitions are not locked here. Instead, the caller is informed of their
+ * existence via the return value, so that it can lock the survivors after
+ * doing the initial pruning.
*/
-static void
+static bool
AcquireExecutorLocks(List *stmt_list, bool acquire)
{
ListCell *lc1;
+ bool hasUnlockedParts = false;
foreach(lc1, stmt_list)
{
@@ -1763,10 +1932,17 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
Assert(plannedstmt->minLockRelids == NULL);
if (query)
- ScanQueryForLocks(query, acquire);
+ ScanQueryForLocks(query, true);
continue;
}
+ /*
+ * If partitions can be pruned before execution, defer their locking to
+ * the caller.
+ */
+ if (plannedstmt->containsInitialPruning)
+ hasUnlockedParts = true;
+
allLockRelids = plannedstmt->minLockRelids;
rti = -1;
while ((rti = bms_next_member(allLockRelids, rti)) > 0)
@@ -1788,6 +1964,8 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
UnlockRelationOid(rte->relid, rte->rellockmode);
}
}
+
+ return hasUnlockedParts;
}
/*
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 7b1ae6fdcf..94a9db84e3 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -303,6 +303,22 @@ PortalDefineQuery(Portal portal,
portal->status = PORTAL_DEFINED;
}
+/*
+ * Copies the given CachedPlanExtra struct into the portal.
+ */
+void
+PortalSaveCachedPlanExtra(Portal portal, CachedPlanExtra *extra)
+{
+ MemoryContext oldcxt = MemoryContextSwitchTo(portal->portalContext);
+
+ Assert(portal->cplan_extra == NULL && extra != NULL);
+ portal->cplan_extra = (CachedPlanExtra *)
+ palloc(sizeof(CachedPlanExtra));
+ portal->cplan_extra->part_prune_results_list =
+ copyObject(extra->part_prune_results_list);
+ MemoryContextSwitchTo(oldcxt);
+}
+
/*
* PortalReleaseCachedPlan
* Release a portal's reference to its cached plan, if any.
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 9ebde089ae..269cc4d562 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,9 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt,
+ List *part_prune_results,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index aeeaeb7884..4b98d0d2ef 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -129,5 +129,10 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
Bitmapset *root_parent_relids,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis);
+extern Bitmapset *ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index e79e2c001f..5a7d075750 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,8 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ List *part_prune_results; /* PartitionPruneResults returned by
+ * CachedPlanLockPartitions() */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +59,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ List *part_prune_results,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 9a64a830a2..f1374057e5 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -617,6 +617,7 @@ typedef struct EState
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
+ List *es_part_prune_results; /* QueryDesc.part_prune_results */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 4337e7aa34..10f12e780e 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -134,8 +134,8 @@ typedef struct PlannerGlobal
bool containsInitialPruning;
/*
- * Indexes of all range table entries; for AcquireExecutorLocks()'s
- * perusal.
+ * Indexes of all range table entries except those of leaf partitions
+ * scanned by prunable subplans; for AcquireExecutorLocks() perusal.
*/
Bitmapset *minLockRelids;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index eb0a007946..ab8bc74e4a 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -82,7 +82,9 @@ typedef struct PlannedStmt
List *permInfos; /* list of RTEPermissionInfo nodes for rtable
* entries needing one */
- Bitmapset *minLockRelids; /* Indexes of all range table entries; for
+ Bitmapset *minLockRelids; /* Indexes of all range table entries except
+ * those of leaf partitions scanned by
+ * prunable subplans; for
* AcquireExecutorLocks()'s perusal */
/* rtable indexes of target relations for INSERT/UPDATE/DELETE/MERGE */
@@ -1575,6 +1577,33 @@ typedef struct PartitionPruneStepCombine
List *source_stepids;
} PartitionPruneStepCombine;
+/*----------------
+ * PartitionPruneResult
+ *
+ * The result of performing ExecPartitionDoInitialPruning() on a given
+ * PartitionPruneInfo.
+ *
+ * root_parent_relids is the same as PartitionPruneInfo.root_parent_relids. It's
+ * there for cross-checking in ExecInitPartitionPruning() that the
+ * PartitionPruneResult and the PartitionPruneInfo at a given index in
+ * EState.es_part_prune_results and EState.es_part_prune_infos, respectively,
+ * belong to the same parent plan node.
+ *
+ * valid_subplan_offs contains the indexes of the subplans remaining after
+ * performing initial pruning by calling ExecFindMatchingSubPlans() on the
+ * PartitionPruneInfo.
+ *
+ * This is used to store the result of initial partition pruning that is
+ * performed before execution has started, such as in
+ * CachedPlanLockPartitions().
+ */
+typedef struct PartitionPruneResult
+{
+ NodeTag type;
+
+ Bitmapset *root_parent_relids;
+ Bitmapset *valid_subplan_offs;
+} PartitionPruneResult;
/*
* Plan invalidation info
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 0499635f59..4ac66d2761 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -160,6 +160,14 @@ typedef struct CachedPlan
MemoryContext context; /* context containing this CachedPlan */
} CachedPlan;
+/*
+ * Additional information to pass to the executor when executing a CachedPlan.
+ */
+typedef struct CachedPlanExtra
+{
+ List *part_prune_results_list;
+} CachedPlanExtra;
+
/*
* CachedExpression is a low-overhead mechanism for caching the planned form
* of standalone scalar expressions. While such expressions are not usually
@@ -220,7 +228,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ CachedPlanExtra **extra);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aeddbdafe5..49bb00cda5 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,8 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ CachedPlanExtra *cplan_extra; /* CachedPlanExtra for cplan in Portal's
+ * memory */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -242,6 +244,7 @@ extern void PortalDefineQuery(Portal portal,
CommandTag commandTag,
List *stmts,
CachedPlan *cplan);
+extern void PortalSaveCachedPlanExtra(Portal portal, CachedPlanExtra *extra);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
--
2.35.3
v30-0001-Preparatory-refactoring-before-reworking-CachedP.patch
From 22c64b3d1ade0cb0f413c17d84a9bb0dd4e6d734 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 13 Dec 2022 11:58:07 +0900
Subject: [PATCH v30 1/2] Preparatory refactoring before reworking CachedPlan
locking
Remember, in a bitmapset, the RT indexes of the RTEs that
AcquireExecutorLocks() must consider locking, so that instead of
looping over the whole range table to find those RTEs, it can look
them up directly using the RT indexes set in the bitmapset.
This also adds some extra information related to execution-time
pruning to the relevant plan nodes.
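
To make the intended change concrete, here is an editorial before/after
sketch of the locking loop (the "after" form mirrors the plancache.c hunks
in the attachment above; it is not a hunk from this patch):

    /* Before: scan the whole range table to find the relations to lock. */
    foreach(lc, plannedstmt->rtable)
    {
        RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);

        if (rte->rtekind == RTE_RELATION)
            LockRelationOid(rte->relid, rte->rellockmode);
    }

    /* After: visit only the RT indexes recorded in minLockRelids. */
    rti = -1;
    while ((rti = bms_next_member(plannedstmt->minLockRelids, rti)) > 0)
    {
        RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);

        Assert(rte->rtekind == RTE_RELATION);
        LockRelationOid(rte->relid, rte->rellockmode);
    }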
---
src/backend/executor/execParallel.c | 1 +
src/backend/executor/execPartition.c | 6 ++++
src/backend/nodes/readfuncs.c | 8 ++++--
src/backend/optimizer/plan/planner.c | 2 ++
src/backend/optimizer/plan/setrefs.c | 12 ++++++++
src/backend/partitioning/partprune.c | 42 ++++++++++++++++++++++++++--
src/backend/utils/cache/plancache.c | 10 +++++--
src/include/executor/execPartition.h | 2 ++
src/include/nodes/nodes.h | 1 +
src/include/nodes/pathnodes.h | 11 ++++++++
src/include/nodes/plannodes.h | 19 +++++++++++++
11 files changed, 106 insertions(+), 8 deletions(-)
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index a5b8e43ec5..65c4b63bbd 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -182,6 +182,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->containsInitialPruning = false; /* workers need not know! */
pstmt->planTree = plan;
pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 76d79b9741..5b62157712 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1956,6 +1956,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
Assert(partdesc->nparts >= pinfo->nparts);
pprune->nparts = partdesc->nparts;
pprune->subplan_map = palloc(sizeof(int) * partdesc->nparts);
+ pprune->rti_map = palloc(sizeof(Index) * partdesc->nparts);
if (partdesc->nparts == pinfo->nparts)
{
/*
@@ -1966,6 +1967,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pprune->subpart_map = pinfo->subpart_map;
memcpy(pprune->subplan_map, pinfo->subplan_map,
sizeof(int) * pinfo->nparts);
+ memcpy(pprune->rti_map, pinfo->rti_map,
+ sizeof(int) * pinfo->nparts);
/*
* Double-check that the list of unpruned relations has not
@@ -2016,6 +2019,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pinfo->subplan_map[pd_idx];
pprune->subpart_map[pp_idx] =
pinfo->subpart_map[pd_idx];
+ pprune->rti_map[pp_idx] =
+ pinfo->rti_map[pd_idx];
pd_idx++;
}
else
@@ -2023,6 +2028,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
/* this partdesc entry is not in the plan */
pprune->subplan_map[pp_idx] = -1;
pprune->subpart_map[pp_idx] = -1;
+ pprune->rti_map[pp_idx] = 0;
}
}
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 966b75f5a6..1161671fa4 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -158,6 +158,11 @@
token = pg_strtok(&length); /* skip :fldname */ \
local_node->fldname = readIntCols(len)
+/* Read an Index array */
+#define READ_INDEX_ARRAY(fldname, len) \
+ token = pg_strtok(&length); /* skip :fldname */ \
+ local_node->fldname = readIndexCols(len)
+
/* Read a bool array */
#define READ_BOOL_ARRAY(fldname, len) \
token = pg_strtok(&length); /* skip :fldname */ \
@@ -796,7 +801,6 @@ fnname(int numCols) \
*/
READ_SCALAR_ARRAY(readAttrNumberCols, int16, atoi)
READ_SCALAR_ARRAY(readOidCols, Oid, atooid)
-/* outfuncs.c has writeIndexCols, but we don't yet need that here */
-/* READ_SCALAR_ARRAY(readIndexCols, Index, atoui) */
+READ_SCALAR_ARRAY(readIndexCols, Index, atoui)
READ_SCALAR_ARRAY(readIntCols, int, atoi)
READ_SCALAR_ARRAY(readBoolCols, bool, strtobool)
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 5dd4f92720..620b163ef9 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -523,8 +523,10 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
result->partPruneInfos = glob->partPruneInfos;
+ result->containsInitialPruning = glob->containsInitialPruning;
result->rtable = glob->finalrtable;
result->permInfos = glob->finalrteperminfos;
+ result->minLockRelids = glob->minLockRelids;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 596f1fbc8e..ed43d5936d 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -279,6 +279,16 @@ set_plan_references(PlannerInfo *root, Plan *plan)
*/
add_rtes_to_flat_rtable(root, false);
+ /*
+ * Add the query's adjusted range of RT indexes to glob->minLockRelids.
+ * The adjusted RT indexes of prunable relations will be deleted from the
+ * set below where PartitionPruneInfos are processed.
+ */
+ glob->minLockRelids =
+ bms_add_range(glob->minLockRelids,
+ rtoffset + 1,
+ rtoffset + list_length(root->parse->rtable));
+
/*
* Adjust RT indexes of PlanRowMarks and add to final rowmarks list
*/
@@ -377,9 +387,11 @@ set_plan_references(PlannerInfo *root, Plan *plan)
/* RT index of the table to which the pinfo belongs. */
pinfo->rtindex += rtoffset;
}
+
}
glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
+ glob->containsInitialPruning |= pruneinfo->needs_init_pruning;
}
return result;
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index d48f6784c1..56270d7670 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,9 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -234,6 +236,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool needs_init_pruning = false;
+ bool needs_exec_pruning = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -313,12 +317,16 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_needs_init_pruning;
+ bool partrel_needs_exec_pruning;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_needs_init_pruning,
+ &partrel_needs_exec_pruning);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -327,6 +335,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+
+ needs_init_pruning |= partrel_needs_init_pruning;
+ needs_exec_pruning |= partrel_needs_exec_pruning;
}
pfree(relid_subplan_map);
@@ -342,6 +353,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->root_parent_relids = parentrel->relids;
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->needs_init_pruning = needs_init_pruning;
+ pruneinfo->needs_exec_pruning = needs_exec_pruning;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -442,13 +455,19 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *needs_init_pruning and *needs_exec_pruning are set to indicate whether
+ * the pruning steps contained in the returned PartitionedRelPruneInfos
+ * can be performed during executor startup and during execution,
+ * respectively.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -459,6 +478,10 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will find out below. */
+ *needs_init_pruning = false;
+ *needs_exec_pruning = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -546,6 +569,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+ * In the first pass, we detect whether the 2nd pass will be necessary
+ * by checking for the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -620,6 +646,12 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ /* record which types of pruning steps we've seen so far */
+ if (initial_pruning_steps != NIL)
+ *needs_init_pruning = true;
+ if (exec_pruning_steps != NIL)
+ *needs_exec_pruning = true;
+
pinfolist = lappend(pinfolist, pinfo);
}
@@ -647,6 +679,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *subplan_map;
int *subpart_map;
Oid *relid_map;
+ Index *rti_map;
/*
* Construct the subplan and subpart maps for this partitioning level.
@@ -659,6 +692,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
+ rti_map = (Index *) palloc0(nparts * sizeof(Index));
present_parts = NULL;
i = -1;
@@ -673,6 +707,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
+ rti_map[i] = partrel->relid;
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
@@ -697,6 +732,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
pinfo->relid_map = relid_map;
+ pinfo->rti_map = rti_map;
}
pfree(relid_subpart_map);
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index cc943205d3..339bb603f7 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -1747,7 +1747,8 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ Bitmapset *allLockRelids;
+ int rti;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1760,14 +1761,17 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
*/
Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+ Assert(plannedstmt->minLockRelids == NULL);
if (query)
ScanQueryForLocks(query, acquire);
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ allLockRelids = plannedstmt->minLockRelids;
+ rti = -1;
+ while ((rti = bms_next_member(allLockRelids, rti)) > 0)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
if (rte->rtekind != RTE_RELATION)
continue;
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 17fabc18c9..aeeaeb7884 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -45,6 +45,7 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
+ * rti_map Range table index by partition index, or 0.
* present_parts A Bitmapset of the partition indexes that we
* have subplans or subparts for.
* initial_pruning_steps List of PartitionPruneSteps used to
@@ -61,6 +62,7 @@ typedef struct PartitionedRelPruningData
int nparts;
int *subplan_map;
int *subpart_map;
+ Index *rti_map;
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 1f33902947..c2f2544df5 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -218,6 +218,7 @@ extern struct Bitmapset *readBitmapset(void);
extern uintptr_t readDatum(bool typbyval);
extern bool *readBoolCols(int numCols);
extern int *readIntCols(int numCols);
+extern Index *readIndexCols(int numCols);
extern Oid *readOidCols(int numCols);
extern int16 *readAttrNumberCols(int numCols);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 654dba61aa..4337e7aa34 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -128,6 +128,17 @@ typedef struct PlannerGlobal
/* List of PartitionPruneInfo contained in the plan */
List *partPruneInfos;
+ /*
+ * Do any of those PartitionPruneInfos have initial pruning steps in them?
+ */
+ bool containsInitialPruning;
+
+ /*
+ * Indexes of all range table entries; for AcquireExecutorLocks()'s
+ * perusal.
+ */
+ Bitmapset *minLockRelids;
+
/* OIDs of relations the plan depends on */
List *relationOids;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index bddfe86191..eb0a007946 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -73,11 +73,18 @@ typedef struct PlannedStmt
List *partPruneInfos; /* List of PartitionPruneInfo contained in the
* plan */
+ bool containsInitialPruning; /* Do any of those PartitionPruneInfos
+ * have initial pruning steps in them?
+ */
+
List *rtable; /* list of RangeTblEntry nodes */
List *permInfos; /* list of RTEPermissionInfo nodes for rtable
* entries needing one */
+ Bitmapset *minLockRelids; /* Indexes of all range table entries; for
+ * AcquireExecutorLocks()'s perusal */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE/MERGE */
List *resultRelations; /* integer list of RT indexes, or NIL */
@@ -1417,6 +1424,13 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * needs_init_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its initial_pruning_steps set?
+ *
+ * needs_exec_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its exec_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1428,6 +1442,8 @@ typedef struct PartitionPruneInfo
NodeTag type;
Bitmapset *root_parent_relids;
List *prune_infos;
+ bool needs_init_pruning;
+ bool needs_exec_pruning;
Bitmapset *other_subplans;
} PartitionPruneInfo;
@@ -1472,6 +1488,9 @@ typedef struct PartitionedRelPruneInfo
/* relation OID by partition index, or 0 */
Oid *relid_map pg_node_attr(array_size(nparts));
+ /* Range table index by partition index, or 0. */
+ Index *rti_map pg_node_attr(array_size(nparts));
+
/*
* initial_pruning_steps shows how to prune during executor startup (i.e.,
* without use of any PARAM_EXEC Params); it is NIL if no startup pruning
--
2.35.3
This version of the patch looks not entirely unreasonable to me. I'll
set this as Ready for Committer in case David or Tom or someone else
want to have a look and potentially commit it.
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
On Wed, Dec 21, 2022 at 7:18 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
This version of the patch looks not entirely unreasonable to me. I'll
set this as Ready for Committer in case David or Tom or someone else
want to have a look and potentially commit it.
Thank you, Alvaro.
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
This version of the patch looks not entirely unreasonable to me. I'll
set this as Ready for Committer in case David or Tom or someone else
want to have a look and potentially commit it.
I will have a look during the January CF.
regards, tom lane
I spent some time re-reading this whole thread, and the more I read
the less happy I got. We are adding a lot of complexity and introducing
coding hazards that will surely bite somebody someday. And after awhile
I had what felt like an epiphany: the whole problem arises because the
system is wrongly factored. We should get rid of AcquireExecutorLocks
altogether, allowing the plancache to hand back a generic plan that
it's not certain of the validity of, and instead integrate the
responsibility for acquiring locks into executor startup. It'd have
to be optional there, since we don't need new locks in the case of
executing a just-planned plan; but we can easily add another eflags
bit (EXEC_FLAG_GET_LOCKS or so). Then there has to be a convention
whereby the ExecInitNode traversal can return an indicator that
"we failed because the plan is stale, please make a new plan".
There are a couple reasons why this feels like a good idea:
* There's no need for worry about keeping the locking decisions in sync
with what executor startup does.
* We don't need to add the overhead proposed in the current patch to
pass forward data about what got locked/pruned. While that overhead
is hopefully less expensive than the locks it saved acquiring, it's
still overhead (and in some cases the patch will fail to save acquiring
any locks, making it certainly a net negative).
* In a successfully built execution state tree, there will simply
not be any nodes corresponding to pruned-away, never-locked subplans.
As long as code like EXPLAIN follows the state tree and doesn't poke
into plan nodes that have no matching state, it's secure against the
sort of problems that Robert worried about upthread.
While I've not attempted to write any code for this, I can also
think of a few issues that'd have to be resolved:
* We'd be pushing the responsibility for looping back and re-planning
out to fairly high-level calling code. There are only half a dozen
callers of GetCachedPlan, so there's not that many places to be
touched; but in some of those places the subsequent executor-start call
is not close by, so that the necessary refactoring might be pretty
painful. I doubt there's anything insurmountable, but we'd definitely
be changing some fundamental APIs.
* In some cases (views, at least) we need to acquire lock on relations
that aren't directly reflected anywhere in the plan tree. So there'd
have to be a separate mechanism for getting those locks and rechecking
validity afterward. A list of relevant relation OIDs might be enough
for that.
* We currently do ExecCheckPermissions() before initializing the
plan state tree. It won't do to check permissions on relations we
haven't yet locked, so that responsibility would have to be moved.
Maybe that could also be integrated into the initialization recursion?
Not sure.
* In the existing usage of AcquireExecutorLocks, if we do decide
that the plan is stale then we are able to release all the locks
we got before we go off and replan. I'm not certain if that behavior
needs to be preserved, but if it does then that would require some
additional bookkeeping in the executor.
* This approach is optimizing on the assumption that we usually
won't need to replan, because if we do then we might waste a fair
amount of executor startup overhead before discovering we have
to throw all that state away. I think that's clearly the right
way to bet, but perhaps somebody else has a different view.
Thoughts?
regards, tom lane
On Fri, Jan 20, 2023 at 4:39 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I spent some time re-reading this whole thread, and the more I read
the less happy I got.
Thanks a lot for your time on this.
We are adding a lot of complexity and introducing
coding hazards that will surely bite somebody someday. And after awhile
I had what felt like an epiphany: the whole problem arises because the
system is wrongly factored. We should get rid of AcquireExecutorLocks
altogether, allowing the plancache to hand back a generic plan that
it's not certain of the validity of, and instead integrate the
responsibility for acquiring locks into executor startup. It'd have
to be optional there, since we don't need new locks in the case of
executing a just-planned plan; but we can easily add another eflags
bit (EXEC_FLAG_GET_LOCKS or so). Then there has to be a convention
whereby the ExecInitNode traversal can return an indicator that
"we failed because the plan is stale, please make a new plan".
Interesting. The current implementation relies on
PlanCacheRelCallback() marking a generic CachedPlan as invalid, so
perhaps there will have to be some sharing of state between the
plancache and the executor for this to work?
There are a couple reasons why this feels like a good idea:
* There's no need for worry about keeping the locking decisions in sync
with what executor startup does.
* We don't need to add the overhead proposed in the current patch to
pass forward data about what got locked/pruned. While that overhead
is hopefully less expensive than the locks it saved acquiring, it's
still overhead (and in some cases the patch will fail to save acquiring
any locks, making it certainly a net negative).
* In a successfully built execution state tree, there will simply
not be any nodes corresponding to pruned-away, never-locked subplans.
As long as code like EXPLAIN follows the state tree and doesn't poke
into plan nodes that have no matching state, it's secure against the
sort of problems that Robert worried about upthread.
I think this is true with the patch as proposed too, but I was still a
bit worried about what an ExecutorStart_hook may be doing with an
uninitialized plan tree. Maybe we're mandating that the hook must
call standard_ExecutorStart() and only work with the finished
PlanState tree?
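For context, the hook pattern in question, as extensions such as
auto_explain already write it (a sketch, not tied to any particular
extension); under the scheme being discussed, the hook's own
inspection of the tree would simply have to wait until the standard
call returns, at which point the PlanState tree is complete and every
surviving relation is locked:

static void
my_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
    if (prev_ExecutorStart)
        prev_ExecutorStart(queryDesc, eflags);
    else
        standard_ExecutorStart(queryDesc, eflags);

    /*
     * Only here is queryDesc->planstate fully built; Plan nodes that
     * were pruned away have no matching PlanState and must not be
     * touched.
     */
}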
While I've not attempted to write any code for this, I can also
think of a few issues that'd have to be resolved:
* We'd be pushing the responsibility for looping back and re-planning
out to fairly high-level calling code. There are only half a dozen
callers of GetCachedPlan, so there's not that many places to be
touched; but in some of those places the subsequent executor-start call
is not close by, so that the necessary refactoring might be pretty
painful. I doubt there's anything insurmountable, but we'd definitely
be changing some fundamental APIs.
Yeah. I suppose mostly the same places that the current patch is
touching to pass around the PartitionPruneResult nodes.
* In some cases (views, at least) we need to acquire lock on relations
that aren't directly reflected anywhere in the plan tree. So there'd
have to be a separate mechanism for getting those locks and rechecking
validity afterward. A list of relevant relation OIDs might be enough
for that.
Hmm, a list of only the OIDs wouldn't preserve the lock mode, so maybe
a list or bitmapset of the RTIs, something along the lines of
PlannedStmt.minLockRelids in the patch?
It perhaps even makes sense to make a special list in PlannedStmt for
only the views?
* We currently do ExecCheckPermissions() before initializing the
plan state tree. It won't do to check permissions on relations we
haven't yet locked, so that responsibility would have to be moved.
Maybe that could also be integrated into the initialization recursion?
Not sure.
Ah, I remember mentioning moving that into ExecGetRangeTableRelation()
[1], though that wouldn't by itself cover relations that aren't
directly reflected in the plan tree, such as views. Though maybe
that's not a problem if we track views separately as mentioned above.
* In the existing usage of AcquireExecutorLocks, if we do decide
that the plan is stale then we are able to release all the locks
we got before we go off and replan. I'm not certain if that behavior
needs to be preserved, but if it does then that would require some
additional bookkeeping in the executor.
I think maybe we'll want to continue to release the existing locks,
because if we don't, it's possible we may keep some locks uselessly if
replanning might lock a different set of relations.
* This approach is optimizing on the assumption that we usually
won't need to replan, because if we do then we might waste a fair
amount of executor startup overhead before discovering we have
to throw all that state away. I think that's clearly the right
way to bet, but perhaps somebody else has a different view.
Not sure if you'd like it, because it would still keep the
PartitionPruneResult business, but this will be less of a problem if
we do the initial pruning at the beginning of InitPlan(), followed by
locking, before doing anything else. We would have initialized the
QueryDesc and the EState, but only minimally. That also keeps the
PartitionPruneResult business local to the executor.
Would you like me to hack up a PoC or are you already on that?
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
[1]: /messages/by-id/CA+HiwqG7ZruBmmih3wPsBZ4s0H2EhywrnXEduckY5Hr3fWzPWA@mail.gmail.com
Amit Langote <amitlangote09@gmail.com> writes:
On Fri, Jan 20, 2023 at 4:39 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I had what felt like an epiphany: the whole problem arises because the
system is wrongly factored. We should get rid of AcquireExecutorLocks
altogether, allowing the plancache to hand back a generic plan that
it's not certain of the validity of, and instead integrate the
responsibility for acquiring locks into executor startup.
Interesting. The current implementation relies on
PlanCacheRelCallback() marking a generic CachedPlan as invalid, so
perhaps there will have to be some sharing of state between the
plancache and the executor for this to work?
Yeah. Thinking a little harder, I think this would have to involve
passing a CachedPlan pointer to the executor, and what the executor
would do after acquiring each lock is to ask the plancache "hey, do
you still think this CachedPlan entry is valid?". In the case where
there's a problem, the AcceptInvalidationMessages call involved in
lock acquisition would lead to a cache inval that clears the validity
flag on the CachedPlan entry, and this would provide an inexpensive
way to check if that happened.
It might be possible to incorporate this pointer into PlannedStmt
instead of passing it separately.
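Concretely, the per-lock check could be as cheap as re-reading the
flag that PlanCacheRelCallback() clears; a minimal sketch, where
plan_still_valid() is a hypothetical helper and the executor is
assumed to have been handed the CachedPlan pointer somehow:

static bool
plan_still_valid(CachedPlan *cplan)
{
    /*
     * Lock acquisition runs AcceptInvalidationMessages(); if that
     * invalidated our plan, PlanCacheRelCallback() will have cleared
     * is_valid.  A freshly planned (uncached) statement can never go
     * stale here.
     */
    return cplan == NULL || cplan->is_valid;
}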
* In a successfully built execution state tree, there will simply
not be any nodes corresponding to pruned-away, never-locked subplans.
I think this is true with the patch as proposed too, but I was still a
bit worried about what an ExecutorStart_hook may be doing with an
uninitialized plan tree. Maybe we're mandating that the hook must
call standard_ExecutorStart() and only work with the finished
PlanState tree?
It would certainly be incumbent on any such hook to not touch
not-yet-locked parts of the plan tree. I'm not particularly concerned
about that sort of requirements change, because we'd be breaking APIs
all through this area in any case.
* In some cases (views, at least) we need to acquire lock on relations
that aren't directly reflected anywhere in the plan tree. So there'd
have to be a separate mechanism for getting those locks and rechecking
validity afterward. A list of relevant relation OIDs might be enough
for that.
Hmm, a list of only the OIDs wouldn't preserve the lock mode,
Good point. I wonder if we could integrate this with the
RTEPermissionInfo data structure?
Would you like me to hack up a PoC or are you already on that?
I'm not planning to work on this myself, I was hoping you would.
regards, tom lane
On Fri, Jan 20, 2023 at 12:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Amit Langote <amitlangote09@gmail.com> writes:
On Fri, Jan 20, 2023 at 4:39 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I had what felt like an epiphany: the whole problem arises because the
system is wrongly factored. We should get rid of AcquireExecutorLocks
altogether, allowing the plancache to hand back a generic plan that
it's not certain of the validity of, and instead integrate the
responsibility for acquiring locks into executor startup.
Interesting. The current implementation relies on
PlanCacheRelCallback() marking a generic CachedPlan as invalid, so
perhaps there will have to be some sharing of state between the
plancache and the executor for this to work?
Yeah. Thinking a little harder, I think this would have to involve
passing a CachedPlan pointer to the executor, and what the executor
would do after acquiring each lock is to ask the plancache "hey, do
you still think this CachedPlan entry is valid?". In the case where
there's a problem, the AcceptInvalidationMessages call involved in
lock acquisition would lead to a cache inval that clears the validity
flag on the CachedPlan entry, and this would provide an inexpensive
way to check if that happened.
OK, thanks, this is useful.
It might be possible to incorporate this pointer into PlannedStmt
instead of passing it separately.
Yeah, that would be less churn. Though, I wonder if you still hold
that PlannedStmt should not be scribbled upon outside the planner as
you said upthread [1]/messages/by-id/922566.1648784745@sss.pgh.pa.us?
* In a successfully built execution state tree, there will simply
not be any nodes corresponding to pruned-away, never-locked subplans.
I think this is true with the patch as proposed too, but I was still a
bit worried about what an ExecutorStart_hook may be doing with an
uninitialized plan tree. Maybe we're mandating that the hook must
call standard_ExecutorStart() and only work with the finished
PlanState tree?
It would certainly be incumbent on any such hook to not touch
not-yet-locked parts of the plan tree. I'm not particularly concerned
about that sort of requirements change, because we'd be breaking APIs
all through this area in any case.
OK. Perhaps something that should be documented around ExecutorStart().
* In some cases (views, at least) we need to acquire lock on relations
that aren't directly reflected anywhere in the plan tree. So there'd
have to be a separate mechanism for getting those locks and rechecking
validity afterward. A list of relevant relation OIDs might be enough
for that.Hmm, a list of only the OIDs wouldn't preserve the lock mode,
Good point. I wonder if we could integrate this with the
RTEPermissionInfo data structure?
You mean adding a rellockmode field to RTEPermissionInfo?
Would you like me to hack up a PoC or are you already on that?
I'm not planning to work on this myself, I was hoping you would.
Alright, I'll try to get something out early next week. Thanks for
all the pointers.
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Amit Langote <amitlangote09@gmail.com> writes:
On Fri, Jan 20, 2023 at 12:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
It might be possible to incorporate this pointer into PlannedStmt
instead of passing it separately.
Yeah, that would be less churn. Though, I wonder if you still hold
that PlannedStmt should not be scribbled upon outside the planner as
you said upthread [1]?
Well, the whole point of that rule is that the executor can't modify
a plancache entry. If the plancache itself sets a field in such an
entry, that doesn't seem problematic from here.
But there's other possibilities if that bothers you; QueryDesc
could hold the field, for example. Also, I bet we'd want to copy
it into EState for the main initialization recursion.
regards, tom lane
On Fri, Jan 20, 2023 at 12:58 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Amit Langote <amitlangote09@gmail.com> writes:
On Fri, Jan 20, 2023 at 12:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
It might be possible to incorporate this pointer into PlannedStmt
instead of passing it separately.
Yeah, that would be less churn. Though, I wonder if you still hold
that PlannedStmt should not be scribbled upon outside the planner as
you said upthread [1]?
Well, the whole point of that rule is that the executor can't modify
a plancache entry. If the plancache itself sets a field in such an
entry, that doesn't seem problematic from here.
But there's other possibilities if that bothers you; QueryDesc
could hold the field, for example. Also, I bet we'd want to copy
it into EState for the main initialization recursion.
QueryDesc sounds good to me, and yes, also a copy in EState in any case.
So I started looking at the call sites of CreateQueryDesc() and
stopped to look at ExecParallelGetQueryDesc(). AFAICS, we wouldn't
need to pass the CachedPlan to a parallel worker's rerun of
InitPlan(), because 1) it doesn't make sense to call the plancache in
a parallel worker, 2) the leader should already have taken all the
locks necessary for executing a given plan subnode that it intends
to pass to a worker in ExecInitGather(). Does that make sense?
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
On Fri, Jan 20, 2023 at 12:52 PM Amit Langote <amitlangote09@gmail.com> wrote:
Alright, I'll try to get something out early next week. Thanks for
all the pointers.
Sorry for the delay. Attached is what I've come up with so far.
I didn't actually go with calling the plancache on every lock taken on
a relation, that is, in ExecGetRangeTableRelation(). One thing about
doing it that way that I didn't quite like (or didn't see a clean
enough way to code) is the need to complicate the ExecInitNode()
traversal for handling the abrupt suspension of the ongoing setup of
the PlanState tree.
So, I decided to keep the current model of locking all the relations
that need to be locked before doing anything else in InitPlan(), much
as how AcquireExecutorLocks() does it. A new function called from
the top of InitPlan that I've called ExecLockRelationsIfNeeded() does
that locking after performing the initial pruning in the same manner
as the earlier patch did. That does mean that I needed to keep all
the adjustments of the pruning code that are required for such
out-of-ExecInitNode() invocation of initial pruning, including the
PartitionPruneResult nodes that carry the result of that pruning for
ExecInitNode()-time reuse, though they no longer need to be passed
through many unrelated interfaces.
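In outline, the top of InitPlan() then looks something like the
sketch below; ExecLockRelationsIfNeeded() is the function named
above, though its exact signature and the staleness check are
paraphrased here:

static void
InitPlan(QueryDesc *queryDesc, int eflags)
{
    EState *estate = queryDesc->estate;

    /*
     * Perform "initial" pruning and lock only the relations that
     * survive it, stashing the PartitionPruneResults in the EState
     * for reuse when ExecInitNode() reaches the Append/MergeAppend
     * nodes.
     */
    ExecLockRelationsIfNeeded(estate, queryDesc->plannedstmt, eflags);

    /* Locking may have rendered a cached plan stale; if so, bail. */
    if (queryDesc->cplan && !queryDesc->cplan->is_valid)
        return;         /* ExecutorStart() will set *replan */

    /* ... proceed with constructing the PlanState tree as before ... */
}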
Anyways, here's a description of the patches:
0001 adjusts various call sites of ExecutorStart() to cope with the
possibility of being asked to recreate a CachedPlan, if one is
involved. The main objective here is to have as little as possible
happen between the GetCachedPlan() call that returned the CachedPlan
and ExecutorStart(), so as to minimize the chances of failing to
clean up some resource when a replan turns out to be necessary.
0002 is preparatory refactoring to make out-of-ExecInitNode()
invocation of pruning possible.
0003 moves the responsibility of CachedPlan validation locking into
ExecutorStart() as described above.
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v31-0001-Move-ExecutorStart-closer-to-GetCachedPlan.patchapplication/octet-stream; name=v31-0001-Move-ExecutorStart-closer-to-GetCachedPlan.patchDownload
From 4cfc3fdfb4c31a163fc3b0657be77927314cc1ca Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Fri, 20 Jan 2023 16:52:31 +0900
Subject: [PATCH v31 1/3] Move ExecutorStart() closer to GetCachedPlan()
This is in preparation for moving CachedPlan validation locking
into ExecutorStart(). The intent is to not have many steps between
GetCachedPlan() and ExecutorStart() so that if the latter invalidates
a CachedPlan, there's not much resource cleanup to worry about.
---
contrib/auto_explain/auto_explain.c | 9 +-
.../pg_stat_statements/pg_stat_statements.c | 8 +-
src/backend/commands/copyto.c | 5 +-
src/backend/commands/createas.c | 4 +-
src/backend/commands/explain.c | 148 ++++++---
src/backend/commands/extension.c | 3 +-
src/backend/commands/matview.c | 4 +-
src/backend/commands/portalcmds.c | 2 +-
src/backend/commands/prepare.c | 85 +++--
src/backend/executor/execMain.c | 18 +-
src/backend/executor/execParallel.c | 9 +-
src/backend/executor/functions.c | 3 +-
src/backend/executor/spi.c | 47 ++-
src/backend/nodes/Makefile | 1 +
src/backend/nodes/gen_node_support.pl | 2 +
src/backend/tcop/postgres.c | 12 +-
src/backend/tcop/pquery.c | 292 +++++++++---------
src/backend/utils/mmgr/portalmem.c | 6 +
src/include/commands/explain.h | 8 +-
src/include/executor/execdesc.h | 4 +
src/include/executor/executor.h | 6 +-
src/include/nodes/meson.build | 1 +
src/include/tcop/pquery.h | 3 +-
src/include/utils/portal.h | 2 +
24 files changed, 411 insertions(+), 271 deletions(-)
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index c3ac27ae99..0f20b97781 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -78,7 +78,8 @@ static ExecutorRun_hook_type prev_ExecutorRun = NULL;
static ExecutorFinish_hook_type prev_ExecutorFinish = NULL;
static ExecutorEnd_hook_type prev_ExecutorEnd = NULL;
-static void explain_ExecutorStart(QueryDesc *queryDesc, int eflags);
+static void explain_ExecutorStart(QueryDesc *queryDesc, int eflags,
+ bool *replan);
static void explain_ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction,
uint64 count, bool execute_once);
@@ -259,7 +260,7 @@ _PG_init(void)
* ExecutorStart hook: start up logging if needed
*/
static void
-explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
+explain_ExecutorStart(QueryDesc *queryDesc, int eflags, bool *replan)
{
/*
* At the beginning of each top-level statement, decide whether we'll
@@ -296,9 +297,9 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
}
if (prev_ExecutorStart)
- prev_ExecutorStart(queryDesc, eflags);
+ prev_ExecutorStart(queryDesc, eflags, replan);
else
- standard_ExecutorStart(queryDesc, eflags);
+ standard_ExecutorStart(queryDesc, eflags, replan);
if (auto_explain_enabled())
{
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index ad1fe44496..76348419ae 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -325,7 +325,7 @@ static PlannedStmt *pgss_planner(Query *parse,
const char *query_string,
int cursorOptions,
ParamListInfo boundParams);
-static void pgss_ExecutorStart(QueryDesc *queryDesc, int eflags);
+static void pgss_ExecutorStart(QueryDesc *queryDesc, int eflags, bool *replan);
static void pgss_ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction,
uint64 count, bool execute_once);
@@ -962,12 +962,12 @@ pgss_planner(Query *parse,
* ExecutorStart hook: start up tracking if needed
*/
static void
-pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
+pgss_ExecutorStart(QueryDesc *queryDesc, int eflags, bool *replan)
{
if (prev_ExecutorStart)
- prev_ExecutorStart(queryDesc, eflags);
+ prev_ExecutorStart(queryDesc, eflags, replan);
else
- standard_ExecutorStart(queryDesc, eflags);
+ standard_ExecutorStart(queryDesc, eflags, replan);
/*
* If query has queryId zero, don't track it. This prevents double
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 8043b4e9b1..b6d8fa59d5 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -558,7 +558,8 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL,
+ pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
@@ -568,7 +569,7 @@ BeginCopyTo(ParseState *pstate,
*
* ExecutorStart computes a result tupdesc for us
*/
- ExecutorStart(cstate->queryDesc, 0);
+ ExecutorStart(cstate->queryDesc, 0, NULL);
tupDesc = cstate->queryDesc->tupDesc;
}
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index d6c6d514f3..ee33f02602 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,12 +325,12 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
/* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, GetIntoRelEFlags(into));
+ ExecutorStart(queryDesc, GetIntoRelEFlags(into), NULL);
/* run the plan to completion */
ExecutorRun(queryDesc, ForwardScanDirection, 0L, true);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 5212a64b1e..fcb227533c 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -384,6 +384,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
else
{
PlannedStmt *plan;
+ QueryDesc *queryDesc;
instr_time planstart,
planduration;
BufferUsage bufusage_start,
@@ -406,12 +407,94 @@ ExplainOneQuery(Query *query, int cursorOptions,
BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
}
+ queryDesc = ExplainQueryDesc(plan, NULL, queryString, into, es,
+ params, queryEnv, NULL);
+
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(queryDesc, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
+
+ /* One pushed by ExplainQueryDesc(). */
+ PopActiveSnapshot();
}
}
+/*
+ * ExplainQueryDesc
+ * Set up QueryDesc for EXPLAINing a given plan
+ *
+ * On return, *replan is set to true if cplan is found to have been
+ * invalidated since its creation.
+ */
+QueryDesc *
+ExplainQueryDesc(PlannedStmt *stmt, CachedPlan *cplan,
+ const char *queryString, IntoClause *into, ExplainState *es,
+ ParamListInfo params, QueryEnvironment *queryEnv,
+ bool *replan)
+{
+ QueryDesc *queryDesc;
+ DestReceiver *dest;
+ int eflags;
+ int instrument_option = 0;
+
+ /*
+ * Normally we discard the query's output, but if explaining CREATE TABLE
+ * AS, we'd better use the appropriate tuple receiver.
+ */
+ if (into)
+ dest = CreateIntoRelDestReceiver(into);
+ else
+ dest = None_Receiver;
+
+ if (es->analyze && es->timing)
+ instrument_option |= INSTRUMENT_TIMER;
+ else if (es->analyze)
+ instrument_option |= INSTRUMENT_ROWS;
+
+ if (es->buffers)
+ instrument_option |= INSTRUMENT_BUFFERS;
+ if (es->wal)
+ instrument_option |= INSTRUMENT_WAL;
+
+ /*
+ * Use a snapshot with an updated command ID to ensure this query sees
+ * results of any previously executed queries.
+ */
+ PushCopiedSnapshot(GetActiveSnapshot());
+ UpdateActiveSnapshotCommandId();
+
+ /* Create a QueryDesc for the query */
+ queryDesc = CreateQueryDesc(stmt, cplan, queryString,
+ GetActiveSnapshot(), InvalidSnapshot,
+ dest, params, queryEnv, instrument_option);
+
+ /* Select execution options */
+ if (es->analyze)
+ eflags = 0; /* default run-to-completion flags */
+ else
+ eflags = EXEC_FLAG_EXPLAIN_ONLY;
+ if (into)
+ eflags |= GetIntoRelEFlags(into);
+
+ /*
+ * Call ExecutorStart to prepare the plan for execution. A cached plan
+ * may get invalidated as we're doing that.
+ */
+ if (replan)
+ *replan = false;
+ ExecutorStart(queryDesc, eflags, replan);
+ if (replan && *replan)
+ {
+ /* Clean up. */
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+ PopActiveSnapshot();
+ return NULL;
+ }
+
+ return queryDesc;
+}
+
/*
* ExplainOneUtility -
* print out the execution plan for one utility statement
@@ -515,30 +598,18 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(QueryDesc *queryDesc,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
{
- DestReceiver *dest;
- QueryDesc *queryDesc;
+ PlannedStmt *plannedstmt = queryDesc->plannedstmt;
instr_time starttime;
double totaltime = 0;
- int eflags;
- int instrument_option = 0;
Assert(plannedstmt->commandType != CMD_UTILITY);
- if (es->analyze && es->timing)
- instrument_option |= INSTRUMENT_TIMER;
- else if (es->analyze)
- instrument_option |= INSTRUMENT_ROWS;
-
- if (es->buffers)
- instrument_option |= INSTRUMENT_BUFFERS;
- if (es->wal)
- instrument_option |= INSTRUMENT_WAL;
-
/*
* We always collect timing for the entire statement, even when node-level
* timing is off, so we don't look at es->timing here. (We could skip
@@ -546,38 +617,6 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
*/
INSTR_TIME_SET_CURRENT(starttime);
- /*
- * Use a snapshot with an updated command ID to ensure this query sees
- * results of any previously executed queries.
- */
- PushCopiedSnapshot(GetActiveSnapshot());
- UpdateActiveSnapshotCommandId();
-
- /*
- * Normally we discard the query's output, but if explaining CREATE TABLE
- * AS, we'd better use the appropriate tuple receiver.
- */
- if (into)
- dest = CreateIntoRelDestReceiver(into);
- else
- dest = None_Receiver;
-
- /* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, instrument_option);
-
- /* Select execution options */
- if (es->analyze)
- eflags = 0; /* default run-to-completion flags */
- else
- eflags = EXEC_FLAG_EXPLAIN_ONLY;
- if (into)
- eflags |= GetIntoRelEFlags(into);
-
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, eflags);
-
/* Execute the plan for statistics if asked for */
if (es->analyze)
{
@@ -658,8 +697,6 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
FreeQueryDesc(queryDesc);
- PopActiveSnapshot();
-
/* We need a CCI just in case query expanded to multiple plans */
if (es->analyze)
CommandCounterIncrement();
@@ -4854,6 +4891,17 @@ ExplainDummyGroup(const char *objtype, const char *labelname, ExplainState *es)
}
}
+/*
+ * Discard output buffer for a fresh restart.
+ */
+void
+ExplainResetOutput(ExplainState *es)
+{
+ Assert(es->str);
+ resetStringInfo(es->str);
+ ExplainBeginOutput(es);
+}
+
/*
* Emit the start-of-output boilerplate.
*
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index b1509cc505..1493b99beb 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -780,11 +780,12 @@ execute_sql_string(const char *sql)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
- ExecutorStart(qdesc, 0);
+ ExecutorStart(qdesc, 0, NULL);
ExecutorRun(qdesc, ForwardScanDirection, 0, true);
ExecutorFinish(qdesc);
ExecutorEnd(qdesc);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index fb30d2595c..e13b344ba3 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -409,12 +409,12 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
/* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, 0);
+ ExecutorStart(queryDesc, 0, NULL);
/* run the plan */
ExecutorRun(queryDesc, ForwardScanDirection, 0L, true);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 8a3cf98cce..9fd27bf07a 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -143,7 +143,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
/*
* Start execution, inserting parameters if any.
*/
- PortalStart(portal, params, 0, GetActiveSnapshot());
+ PortalStart(portal, params, 0, GetActiveSnapshot(), NULL);
Assert(portal->strategy == PORTAL_ONE_SELECT);
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 18f70319fc..c1fa1b72be 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -155,6 +155,7 @@ ExecuteQuery(ParseState *pstate,
PreparedStatement *entry;
CachedPlan *cplan;
List *plan_list;
+ bool replan;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
Portal portal;
@@ -193,6 +194,7 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
+replan:
cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
plan_list = cplan->stmt_list;
@@ -251,9 +253,16 @@ ExecuteQuery(ParseState *pstate,
}
/*
- * Run the portal as appropriate.
+ * Run the portal as appropriate. If the portal contains a cached plan,
+ * it must be recreated if *replan is set.
*/
- PortalStart(portal, paramLI, eflags, GetActiveSnapshot());
+ PortalStart(portal, paramLI, eflags, GetActiveSnapshot(), &replan);
+
+ if (replan)
+ {
+ MarkPortalFailed(portal);
+ goto replan;
+ }
(void) PortalRun(portal, count, false, true, dest, dest, qc);
@@ -574,7 +583,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
{
PreparedStatement *entry;
const char *query_string;
- CachedPlan *cplan;
+ CachedPlan *cplan = NULL;
List *plan_list;
ListCell *p;
ParamListInfo paramLI = NULL;
@@ -583,6 +592,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
instr_time planduration;
BufferUsage bufusage_start,
bufusage;
+ bool replan = true;
if (es->buffers)
bufusage_start = pgBufferUsage;
@@ -618,38 +628,57 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
- cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, queryEnv);
+ while (replan)
+ {
+ cplan = GetCachedPlan(entry->plansource, paramLI,
+ CurrentResourceOwner, queryEnv);
- INSTR_TIME_SET_CURRENT(planduration);
- INSTR_TIME_SUBTRACT(planduration, planstart);
+ INSTR_TIME_SET_CURRENT(planduration);
+ INSTR_TIME_SUBTRACT(planduration, planstart);
- /* calc differences of buffer counters. */
- if (es->buffers)
- {
- memset(&bufusage, 0, sizeof(BufferUsage));
- BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
- }
+ /* calc differences of buffer counters. */
+ if (es->buffers)
+ {
+ memset(&bufusage, 0, sizeof(BufferUsage));
+ BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
+ }
- plan_list = cplan->stmt_list;
+ plan_list = cplan->stmt_list;
- /* Explain each query */
- foreach(p, plan_list)
- {
- PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ /* Explain each query */
+ foreach(p, plan_list)
+ {
+ PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
- if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
- else
- ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
- paramLI, queryEnv);
+ if (pstmt->commandType != CMD_UTILITY)
+ {
+ QueryDesc *queryDesc;
+
+ queryDesc = ExplainQueryDesc(pstmt, cplan, queryString,
+ into, es, paramLI, queryEnv,
+ &replan);
+ if (replan)
+ {
+ ExplainResetOutput(es);
+ break;
+ }
+ ExplainOnePlan(queryDesc, into, es, query_string, paramLI,
+ queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
+
+ /* One pushed by ExplainQueryDesc(). */
+ PopActiveSnapshot();
+ }
+ else
+ ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
+ paramLI, queryEnv);
- /* No need for CommandCounterIncrement, as ExplainOnePlan did it */
+ /* No need for CommandCounterIncrement, as ExplainOnePlan did it */
- /* Separate plans with an appropriate separator */
- if (lnext(plan_list, p) != NULL)
- ExplainSeparatePlans(es);
+ /* Separate plans with an appropriate separator */
+ if (lnext(plan_list, p) != NULL)
+ ExplainSeparatePlans(es);
+ }
}
if (estate)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index a5115b9c1f..45c999bcdb 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -119,6 +119,11 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
*
* eflags contains flag bits as described in executor.h.
*
+ * replan must be non-NULL when executing a cached query plan. On return,
+ * *replan is set if queryDesc->cplan is found to have been invalidated. In
+ * that case, callers must recreate the CachedPlan before retrying the
+ * execution.
+ *
* NB: the CurrentMemoryContext when this is called will become the parent
* of the per-query context used for this Executor invocation.
*
@@ -129,8 +134,10 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
* ----------------------------------------------------------------
*/
void
-ExecutorStart(QueryDesc *queryDesc, int eflags)
+ExecutorStart(QueryDesc *queryDesc, int eflags, bool *replan)
{
+ Assert(replan != NULL || queryDesc->cplan == NULL);
+
/*
* In some cases (e.g. an EXECUTE statement) a query execution will skip
* parse analysis, which means that the query_id won't be reported. Note
@@ -140,13 +147,13 @@ ExecutorStart(QueryDesc *queryDesc, int eflags)
pgstat_report_query_id(queryDesc->plannedstmt->queryId, false);
if (ExecutorStart_hook)
- (*ExecutorStart_hook) (queryDesc, eflags);
+ (*ExecutorStart_hook) (queryDesc, eflags, replan);
else
- standard_ExecutorStart(queryDesc, eflags);
+ standard_ExecutorStart(queryDesc, eflags, replan);
}
void
-standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
+standard_ExecutorStart(QueryDesc *queryDesc, int eflags, bool *replan)
{
EState *estate;
MemoryContext oldcontext;
@@ -2797,7 +2804,8 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
* Child EPQ EStates share the parent's copy of unchanging state such as
* the snapshot, rangetable, and external Param info. They need their own
* copies of local state, including a tuple table, es_param_exec_vals,
- * result-rel info, etc.
+ * result-rel info, etc. Also, we don't pass the parent's copy of the
+ * CachedPlan, because no new locks will be taken for EvalPlanQual().
*/
rcestate->es_direction = ForwardScanDirection;
rcestate->es_snapshot = parentestate->es_snapshot;
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index aa3f283453..5f97f5353f 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1249,8 +1249,13 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
- /* Create a QueryDesc for the query. */
+ /*
+ * Create a QueryDesc for the query. Note that no CachedPlan is available
+ * here even if the containing plan tree may have come from one in the
+ * leader.
+ */
return CreateQueryDesc(pstmt,
+ NULL,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
@@ -1431,7 +1436,7 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Start up the executor */
queryDesc->plannedstmt->jitFlags = fpes->jit_flags;
- ExecutorStart(queryDesc, fpes->eflags);
+ ExecutorStart(queryDesc, fpes->eflags, NULL);
/* Special executor initialization steps for parallel workers */
queryDesc->planstate->state->es_query_dsa = area;
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 50e06ec693..df37bfb4ed 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -843,6 +843,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest = None_Receiver;
es->qd = CreateQueryDesc(es->stmt,
+ NULL, /* fmgr_sql() doesn't use CachedPlans */
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -867,7 +868,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
eflags = EXEC_FLAG_SKIP_TRIGGERS;
else
eflags = 0; /* default run-to-completion flags */
- ExecutorStart(es->qd, eflags);
+ ExecutorStart(es->qd, eflags, NULL);
}
es->status = F_EXEC_RUN;
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 61f03e3999..9a3398b591 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -71,7 +71,7 @@ static int _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
static ParamListInfo _SPI_convert_params(int nargs, Oid *argtypes,
Datum *Values, const char *Nulls);
-static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount);
+static int _SPI_pquery(QueryDesc *queryDesc, uint64 tcount);
static void _SPI_error_callback(void *arg);
@@ -1578,6 +1578,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
CachedPlanSource *plansource;
CachedPlan *cplan;
List *stmt_list;
+ bool replan;
char *query_string;
Snapshot snapshot;
MemoryContext oldcontext;
@@ -1657,6 +1658,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
+replan:
cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
stmt_list = cplan->stmt_list;
@@ -1766,9 +1768,16 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
}
/*
- * Start portal execution.
+ * Start portal execution. If the portal contains a cached plan, it must
+ * be recreated if *replan is set.
*/
- PortalStart(portal, paramLI, 0, snapshot);
+ PortalStart(portal, paramLI, 0, snapshot, &replan);
+
+ if (replan)
+ {
+ MarkPortalFailed(portal);
+ goto replan;
+ }
Assert(portal->strategy != PORTAL_MULTI_QUERY);
@@ -2548,6 +2557,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
+replan:
cplan = GetCachedPlan(plansource, options->params,
plan_owner, _SPI_current->queryEnv);
@@ -2657,6 +2667,8 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
QueryDesc *qdesc;
Snapshot snap;
+ int eflags;
+ bool replan = false;
if (ActiveSnapshotSet())
snap = GetActiveSnapshot();
@@ -2664,14 +2676,28 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
snap = InvalidSnapshot;
qdesc = CreateQueryDesc(stmt,
+ cplan,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
options->params,
_SPI_current->queryEnv,
0);
- res = _SPI_pquery(qdesc, fire_triggers,
- canSetTag ? options->tcount : 0);
+
+ /* Select execution options */
+ if (fire_triggers)
+ eflags = 0; /* default run-to-completion flags */
+ else
+ eflags = EXEC_FLAG_SKIP_TRIGGERS;
+ ExecutorStart(qdesc, eflags, &replan);
+ if (replan)
+ {
+ ExecutorEnd(qdesc);
+ FreeQueryDesc(qdesc);
+ goto replan;
+ }
+
+ res = _SPI_pquery(qdesc, canSetTag ? options->tcount : 0);
FreeQueryDesc(qdesc);
}
else
@@ -2846,10 +2872,9 @@ _SPI_convert_params(int nargs, Oid *argtypes,
}
static int
-_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
+_SPI_pquery(QueryDesc *queryDesc, uint64 tcount)
{
int operation = queryDesc->operation;
- int eflags;
int res;
switch (operation)
@@ -2893,14 +2918,6 @@ _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
ResetUsage();
#endif
- /* Select execution options */
- if (fire_triggers)
- eflags = 0; /* default run-to-completion flags */
- else
- eflags = EXEC_FLAG_SKIP_TRIGGERS;
-
- ExecutorStart(queryDesc, eflags);
-
ExecutorRun(queryDesc, ForwardScanDirection, tcount, true);
_SPI_current->processed = queryDesc->estate->es_processed;
diff --git a/src/backend/nodes/Makefile b/src/backend/nodes/Makefile
index af12c64878..7fb0d2d202 100644
--- a/src/backend/nodes/Makefile
+++ b/src/backend/nodes/Makefile
@@ -52,6 +52,7 @@ node_headers = \
access/tsmapi.h \
commands/event_trigger.h \
commands/trigger.h \
+ executor/execdesc.h \
executor/tuptable.h \
foreign/fdwapi.h \
nodes/bitmapset.h \
diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl
index b3c1ead496..74f83f12a6 100644
--- a/src/backend/nodes/gen_node_support.pl
+++ b/src/backend/nodes/gen_node_support.pl
@@ -63,6 +63,7 @@ my @all_input_files = qw(
access/tsmapi.h
commands/event_trigger.h
commands/trigger.h
+ executor/execdesc.h
executor/tuptable.h
foreign/fdwapi.h
nodes/bitmapset.h
@@ -87,6 +88,7 @@ my @nodetag_only_files = qw(
access/tsmapi.h
commands/event_trigger.h
commands/trigger.h
+ executor/execdesc.h
executor/tuptable.h
foreign/fdwapi.h
nodes/lockoptions.h
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 470b734e9e..1617b93ecc 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1195,7 +1195,7 @@ exec_simple_query(const char *query_string)
/*
* Start the portal. No parameters here.
*/
- PortalStart(portal, NULL, 0, InvalidSnapshot);
+ PortalStart(portal, NULL, 0, InvalidSnapshot, NULL);
/*
* Select the appropriate output format: text unless we are doing a
@@ -1597,6 +1597,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ bool replan;
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -1971,6 +1972,7 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
+replan:
cplan = GetCachedPlan(psrc, params, NULL, NULL);
/*
@@ -1993,7 +1995,13 @@ exec_bind_message(StringInfo input_message)
/*
* And we're ready to start portal execution.
*/
- PortalStart(portal, params, 0, InvalidSnapshot);
+ PortalStart(portal, params, 0, InvalidSnapshot, &replan);
+
+ if (replan)
+ {
+ MarkPortalFailed(portal);
+ goto replan;
+ }
/*
* Apply the result format requests to the portal.
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5f0248acc5..97de5c53e3 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -19,6 +19,7 @@
#include "access/xact.h"
#include "commands/prepare.h"
+#include "executor/execdesc.h"
#include "executor/tstoreReceiver.h"
#include "miscadmin.h"
#include "pg_trace.h"
@@ -35,12 +36,6 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
- const char *sourceText,
- ParamListInfo params,
- QueryEnvironment *queryEnv,
- DestReceiver *dest,
- QueryCompletion *qc);
static void FillPortalStore(Portal portal, bool isTopLevel);
static uint64 RunFromStore(Portal portal, ScanDirection direction, uint64 count,
DestReceiver *dest);
@@ -65,6 +60,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -75,8 +71,10 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
{
QueryDesc *qd = (QueryDesc *) palloc(sizeof(QueryDesc));
+ qd->type = T_QueryDesc;
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->cplan = cplan; /* CachedPlan, if plan is from one */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -116,86 +114,6 @@ FreeQueryDesc(QueryDesc *qdesc)
}
-/*
- * ProcessQuery
- * Execute a single plannable query within a PORTAL_MULTI_QUERY,
- * PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
- *
- * plan: the plan tree for the query
- * sourceText: the source text of the query
- * params: any parameters needed
- * dest: where to send results
- * qc: where to store the command completion status data.
- *
- * qc may be NULL if caller doesn't want a status string.
- *
- * Must be called in a memory context that will be reset or deleted on
- * error; otherwise the executor's memory usage will be leaked.
- */
-static void
-ProcessQuery(PlannedStmt *plan,
- const char *sourceText,
- ParamListInfo params,
- QueryEnvironment *queryEnv,
- DestReceiver *dest,
- QueryCompletion *qc)
-{
- QueryDesc *queryDesc;
-
- /*
- * Create the QueryDesc object
- */
- queryDesc = CreateQueryDesc(plan, sourceText,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, 0);
-
- /*
- * Call ExecutorStart to prepare the plan for execution
- */
- ExecutorStart(queryDesc, 0);
-
- /*
- * Run the plan to completion.
- */
- ExecutorRun(queryDesc, ForwardScanDirection, 0L, true);
-
- /*
- * Build command completion status data, if caller wants one.
- */
- if (qc)
- {
- switch (queryDesc->operation)
- {
- case CMD_SELECT:
- SetQueryCompletion(qc, CMDTAG_SELECT, queryDesc->estate->es_processed);
- break;
- case CMD_INSERT:
- SetQueryCompletion(qc, CMDTAG_INSERT, queryDesc->estate->es_processed);
- break;
- case CMD_UPDATE:
- SetQueryCompletion(qc, CMDTAG_UPDATE, queryDesc->estate->es_processed);
- break;
- case CMD_DELETE:
- SetQueryCompletion(qc, CMDTAG_DELETE, queryDesc->estate->es_processed);
- break;
- case CMD_MERGE:
- SetQueryCompletion(qc, CMDTAG_MERGE, queryDesc->estate->es_processed);
- break;
- default:
- SetQueryCompletion(qc, CMDTAG_UNKNOWN, queryDesc->estate->es_processed);
- break;
- }
- }
-
- /*
- * Now, we close down all the scans and free allocated resources.
- */
- ExecutorFinish(queryDesc);
- ExecutorEnd(queryDesc);
-
- FreeQueryDesc(queryDesc);
-}
-
/*
* ChoosePortalStrategy
* Select portal execution strategy given the intended statement list.
@@ -427,15 +345,16 @@ FetchStatementTargetList(Node *stmt)
* to be used for cursors).
*
* On return, portal is ready to accept PortalRun() calls, and the result
- * tupdesc (if any) is known.
+ * tupdesc (if any) is known, unless *replan is set to true, in which case
+ * the caller must retry after generating a new CachedPlan.
*/
void
PortalStart(Portal portal, ParamListInfo params,
- int eflags, Snapshot snapshot)
+ int eflags, Snapshot snapshot,
+ bool *replan)
{
Portal saveActivePortal;
ResourceOwner saveResourceOwner;
- MemoryContext savePortalContext;
MemoryContext oldContext;
QueryDesc *queryDesc;
int myeflags;
@@ -443,20 +362,21 @@ PortalStart(Portal portal, ParamListInfo params,
Assert(PortalIsValid(portal));
Assert(portal->status == PORTAL_DEFINED);
+ if (replan)
+ *replan = false;
+
/*
* Set up global portal context pointers.
*/
saveActivePortal = ActivePortal;
saveResourceOwner = CurrentResourceOwner;
- savePortalContext = PortalContext;
PG_TRY();
{
ActivePortal = portal;
if (portal->resowner)
CurrentResourceOwner = portal->resowner;
- PortalContext = portal->portalContext;
- oldContext = MemoryContextSwitchTo(PortalContext);
+ oldContext = MemoryContextSwitchTo(portal->queryContext);
/* Must remember portal param list, if any */
portal->portalParams = params;
@@ -472,6 +392,8 @@ PortalStart(Portal portal, ParamListInfo params,
switch (portal->strategy)
{
case PORTAL_ONE_SELECT:
+ case PORTAL_ONE_RETURNING:
+ case PORTAL_ONE_MOD_WITH:
/* Must set snapshot before starting executor. */
if (snapshot)
@@ -493,6 +415,7 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->cplan,
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -501,30 +424,48 @@ PortalStart(Portal portal, ParamListInfo params,
portal->queryEnv,
0);
+ /* Remember for PortalRunMulti() */
+ portal->qdescs = lappend(portal->qdescs, queryDesc);
+
/*
* If it's a scrollable cursor, executor needs to support
* REWIND and backwards scan, as well as whatever the caller
* might've asked for.
*/
- if (portal->cursorOptions & CURSOR_OPT_SCROLL)
+ if (portal->strategy == PORTAL_ONE_SELECT &&
+ (portal->cursorOptions & CURSOR_OPT_SCROLL))
myeflags = eflags | EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD;
else
myeflags = eflags;
/*
- * Call ExecutorStart to prepare the plan for execution
+ * Call ExecutorStart to prepare the plan for execution. A
+ * cached plan may get invalidated as we're doing that.
*/
- ExecutorStart(queryDesc, myeflags);
+ ExecutorStart(queryDesc, myeflags, replan);
+ if (replan && *replan)
+ {
+ Assert(queryDesc->cplan);
+ PopActiveSnapshot();
+ goto early_exit;
+ }
/*
- * This tells PortalCleanup to shut down the executor
+ * This tells PortalCleanup to shut down the executor, though that is
+ * not needed for queries handled by PortalRunMulti().
*/
- portal->queryDesc = queryDesc;
+ if (portal->strategy == PORTAL_ONE_SELECT)
+ portal->queryDesc = queryDesc;
/*
- * Remember tuple descriptor (computed by ExecutorStart)
+ * Remember tuple descriptor (computed by ExecutorStart),
+ * making it independent of the QueryDesc for queries handled
+ * by PortalRunMulti().
*/
- portal->tupDesc = queryDesc->tupDesc;
+ if (portal->strategy != PORTAL_ONE_SELECT)
+ portal->tupDesc = CreateTupleDescCopy(queryDesc->tupDesc);
+ else
+ portal->tupDesc = queryDesc->tupDesc;
/*
* Reset cursor position data to "start of query"
@@ -536,29 +477,6 @@ PortalStart(Portal portal, ParamListInfo params,
PopActiveSnapshot();
break;
- case PORTAL_ONE_RETURNING:
- case PORTAL_ONE_MOD_WITH:
-
- /*
- * We don't start the executor until we are told to run the
- * portal. We do need to set up the result tupdesc.
- */
- {
- PlannedStmt *pstmt;
-
- pstmt = PortalGetPrimaryStmt(portal);
- portal->tupDesc =
- ExecCleanTypeFromTL(pstmt->planTree->targetlist);
- }
-
- /*
- * Reset cursor position data to "start of query"
- */
- portal->atStart = true;
- portal->atEnd = false; /* allow fetches */
- portal->portalPos = 0;
- break;
-
case PORTAL_UTIL_SELECT:
/*
@@ -581,7 +499,61 @@ PortalStart(Portal portal, ParamListInfo params,
break;
case PORTAL_MULTI_QUERY:
- /* Need do nothing now */
+ {
+ ListCell *lc;
+ bool pushed_active_snapshot = false;
+
+ foreach(lc, portal->stmts)
+ {
+ PlannedStmt *plan = lfirst_node(PlannedStmt, lc);
+ bool is_utility = (plan->utilityStmt != NULL);
+
+ /* Must set snapshot before starting executor. */
+ if (!pushed_active_snapshot && !is_utility)
+ {
+ PushActiveSnapshot(GetTransactionSnapshot());
+ pushed_active_snapshot = true;
+ }
+
+ /*
+ * Create the QueryDesc object. DestReceiver will
+ * be set in PortalRunMulti().
+ */
+ queryDesc = CreateQueryDesc(plan, portal->cplan,
+ portal->sourceText,
+ pushed_active_snapshot ?
+ GetActiveSnapshot() :
+ InvalidSnapshot,
+ InvalidSnapshot,
+ NULL,
+ params,
+ portal->queryEnv, 0);
+
+ /* Remember for PortalRunMulti() */
+ portal->qdescs = lappend(portal->qdescs, queryDesc);
+
+ /*
+ * Call ExecutorStart to prepare the plan for
+ * execution. A cached plan may get invalidated as
+ * we're doing that.
+ */
+ if (!is_utility)
+ {
+ ExecutorStart(queryDesc, 0, replan);
+ if (replan && *replan)
+ {
+ Assert(queryDesc->cplan);
+ if (pushed_active_snapshot)
+ PopActiveSnapshot();
+ goto early_exit;
+ }
+ }
+ }
+
+ if (pushed_active_snapshot)
+ PopActiveSnapshot();
+ }
+
portal->tupDesc = NULL;
break;
}
@@ -594,19 +566,18 @@ PortalStart(Portal portal, ParamListInfo params,
/* Restore global vars and propagate error */
ActivePortal = saveActivePortal;
CurrentResourceOwner = saveResourceOwner;
- PortalContext = savePortalContext;
PG_RE_THROW();
}
PG_END_TRY();
+ portal->status = PORTAL_READY;
+
+early_exit:
MemoryContextSwitchTo(oldContext);
ActivePortal = saveActivePortal;
CurrentResourceOwner = saveResourceOwner;
- PortalContext = savePortalContext;
-
- portal->status = PORTAL_READY;
}
/*
@@ -1193,7 +1164,7 @@ PortalRunMulti(Portal portal,
QueryCompletion *qc)
{
bool active_snapshot_set = false;
- ListCell *stmtlist_item;
+ ListCell *qdesc_item;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1214,9 +1185,10 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
- foreach(stmtlist_item, portal->stmts)
+ foreach(qdesc_item, portal->qdescs)
{
- PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ QueryDesc *qdesc = lfirst_node(QueryDesc, qdesc_item);
+ PlannedStmt *pstmt = qdesc->plannedstmt;
/*
* If we got a cancel signal in prior command, quit
@@ -1241,7 +1213,7 @@ PortalRunMulti(Portal portal,
*/
if (!active_snapshot_set)
{
- Snapshot snapshot = GetTransactionSnapshot();
+ Snapshot snapshot = qdesc->snapshot;
/* If told to, register the snapshot and save in portal */
if (setHoldSnapshot)
@@ -1271,23 +1243,38 @@ PortalRunMulti(Portal portal,
else
UpdateActiveSnapshotCommandId();
+ /*
+ * Run the plan to completion.
+ */
+ qdesc->dest = dest;
+ ExecutorRun(qdesc, ForwardScanDirection, 0L, true);
+
+ /*
+ * Build command completion status data if needed.
+ */
if (pstmt->canSetTag)
{
- /* statement can set tag string */
- ProcessQuery(pstmt,
- portal->sourceText,
- portal->portalParams,
- portal->queryEnv,
- dest, qc);
- }
- else
- {
- /* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
- portal->sourceText,
- portal->portalParams,
- portal->queryEnv,
- altdest, NULL);
+ switch (qdesc->operation)
+ {
+ case CMD_SELECT:
+ SetQueryCompletion(qc, CMDTAG_SELECT, qdesc->estate->es_processed);
+ break;
+ case CMD_INSERT:
+ SetQueryCompletion(qc, CMDTAG_INSERT, qdesc->estate->es_processed);
+ break;
+ case CMD_UPDATE:
+ SetQueryCompletion(qc, CMDTAG_UPDATE, qdesc->estate->es_processed);
+ break;
+ case CMD_DELETE:
+ SetQueryCompletion(qc, CMDTAG_DELETE, qdesc->estate->es_processed);
+ break;
+ case CMD_MERGE:
+ SetQueryCompletion(qc, CMDTAG_MERGE, qdesc->estate->es_processed);
+ break;
+ default:
+ SetQueryCompletion(qc, CMDTAG_UNKNOWN, qdesc->estate->es_processed);
+ break;
+ }
}
if (log_executor_stats)
@@ -1346,8 +1333,19 @@ PortalRunMulti(Portal portal,
* Increment command counter between queries, but not after the last
* one.
*/
- if (lnext(portal->stmts, stmtlist_item) != NULL)
+ if (lnext(portal->qdescs, qdesc_item) != NULL)
CommandCounterIncrement();
+
+ /* portal->queryDesc is free'd by PortalCleanup(). */
+ if (qdesc != portal->queryDesc)
+ {
+ if (qdesc->estate)
+ {
+ ExecutorFinish(qdesc);
+ ExecutorEnd(qdesc);
+ }
+ FreeQueryDesc(qdesc);
+ }
}
/* Pop the snapshot if we pushed one. */
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 06dfa85f04..3ad80c7ecb 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -201,6 +201,10 @@ CreatePortal(const char *name, bool allowDup, bool dupSilent)
portal->portalContext = AllocSetContextCreate(TopPortalContext,
"PortalContext",
ALLOCSET_SMALL_SIZES);
+ /* initialize portal's query context to store QueryDescs */
+ portal->queryContext = AllocSetContextCreate(TopPortalContext,
+ "PortalQueryContext",
+ ALLOCSET_SMALL_SIZES);
/* create a resource owner for the portal */
portal->resowner = ResourceOwnerCreate(CurTransactionResourceOwner,
@@ -224,6 +228,7 @@ CreatePortal(const char *name, bool allowDup, bool dupSilent)
/* for named portals reuse portal->name copy */
MemoryContextSetIdentifier(portal->portalContext, portal->name[0] ? portal->name : "<unnamed>");
+ MemoryContextSetIdentifier(portal->queryContext, portal->name[0] ? portal->name : "<unnamed>");
return portal;
}
@@ -594,6 +599,7 @@ PortalDrop(Portal portal, bool isTopCommit)
/* release subsidiary storage */
MemoryContextDelete(portal->portalContext);
+ MemoryContextDelete(portal->queryContext);
/* release portal struct (it's in TopPortalContext) */
pfree(portal);
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 7c1071ddd1..ea35adfb3d 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,12 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern QueryDesc *ExplainQueryDesc(PlannedStmt *stmt, struct CachedPlan *cplan,
+ const char *queryString, IntoClause *into, ExplainState *es,
+ ParamListInfo params, QueryEnvironment *queryEnv,
+ bool *replan);
+extern void ExplainOnePlan(QueryDesc *queryDesc,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
@@ -103,6 +108,7 @@ extern void ExplainQueryParameters(ExplainState *es, ParamListInfo params, int m
extern void ExplainBeginOutput(ExplainState *es);
extern void ExplainEndOutput(ExplainState *es);
+extern void ExplainResetOutput(ExplainState *es);
extern void ExplainSeparatePlans(ExplainState *es);
extern void ExplainPropertyList(const char *qlabel, List *data,
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index af2bf36dfb..4b7368a0dc 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -32,9 +32,12 @@
*/
typedef struct QueryDesc
{
+ NodeTag type;
+
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ struct CachedPlan *cplan; /* CachedPlan, if plannedstmt is from one */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +60,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ struct CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index e7e25c057e..63f3d09804 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -62,7 +62,7 @@
/* Hook for plugins to get control in ExecutorStart() */
-typedef void (*ExecutorStart_hook_type) (QueryDesc *queryDesc, int eflags);
+typedef void (*ExecutorStart_hook_type) (QueryDesc *queryDesc, int eflags, bool *replan);
extern PGDLLIMPORT ExecutorStart_hook_type ExecutorStart_hook;
/* Hook for plugins to get control in ExecutorRun() */
@@ -187,8 +187,8 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
-extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
-extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern void ExecutorStart(QueryDesc *queryDesc, int eflags, bool *replan);
+extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags, bool *replan);
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count, bool execute_once);
extern void standard_ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/meson.build b/src/include/nodes/meson.build
index efe0834afb..a8fdd9e176 100644
--- a/src/include/nodes/meson.build
+++ b/src/include/nodes/meson.build
@@ -13,6 +13,7 @@ node_support_input_i = [
'access/tsmapi.h',
'commands/event_trigger.h',
'commands/trigger.h',
+ 'executor/execdesc.h',
'executor/tuptable.h',
'foreign/fdwapi.h',
'nodes/bitmapset.h',
diff --git a/src/include/tcop/pquery.h b/src/include/tcop/pquery.h
index a5e65b98aa..08783f1b43 100644
--- a/src/include/tcop/pquery.h
+++ b/src/include/tcop/pquery.h
@@ -30,7 +30,8 @@ extern List *FetchPortalTargetList(Portal portal);
extern List *FetchStatementTargetList(Node *stmt);
extern void PortalStart(Portal portal, ParamListInfo params,
- int eflags, Snapshot snapshot);
+ int eflags, Snapshot snapshot,
+ bool *replan);
extern void PortalSetResultFormat(Portal portal, int nFormats,
int16 *formats);
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aa08b1e0fc..af059e30f8 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,8 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *qdescs; /* list of QueryDescs */
+ MemoryContext queryContext; /* memory for QueryDescs and children */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
--
2.35.3
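For clarity, the retry protocol that the postgres.c and pquery.c changes
above establish can be condensed into the following sketch (intermediate
steps elided; names as in the patch):

    bool        replan;

replan:
    /* (Re)build the plan; may be generic or custom. */
    cplan = GetCachedPlan(psrc, params, NULL, NULL);

    /* ... portal setup elided ... */

    /*
     * PortalStart() now runs ExecutorStart() for most portal strategies;
     * it reports through 'replan' whether taking the executor locks
     * invalidated the CachedPlan.
     */
    PortalStart(portal, params, 0, InvalidSnapshot, &replan);

    if (replan)
    {
        /* Plan went stale under us; discard it and start over. */
        MarkPortalFailed(portal);
        goto replan;
    }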
Attachment: v31-0003-Move-CachedPlan-validation-locking-to-ExecutorSt.patch
From 0522447f5816211ac3e32ebc6920d7f7805718d6 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Thu, 26 Jan 2023 10:52:24 +0900
Subject: [PATCH v31 3/3] Move CachedPlan validation locking to ExecutorStart()
---
src/backend/executor/execMain.c | 163 +++++++++++++++++++++++--
src/backend/executor/execParallel.c | 38 +++++-
src/backend/executor/execPartition.c | 90 +++++++++++---
src/backend/executor/execUtils.c | 8 +-
src/backend/executor/nodeAppend.c | 11 +-
src/backend/executor/nodeMergeAppend.c | 5 +-
src/backend/optimizer/plan/setrefs.c | 36 ++++++
src/backend/utils/cache/plancache.c | 146 +++++++---------------
src/include/executor/execPartition.h | 8 +-
src/include/executor/execdesc.h | 6 +
src/include/executor/executor.h | 2 +
src/include/nodes/execnodes.h | 1 +
src/include/nodes/pathnodes.h | 4 +-
src/include/nodes/plannodes.h | 31 ++++-
src/include/utils/plancache.h | 1 +
15 files changed, 404 insertions(+), 146 deletions(-)
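Before the diffs, a condensed sketch of the heart of this patch, namely
ExecLockRelationsIfNeeded() as added to execMain.c below (the
EXEC_FLAG_GET_LOCKS bookkeeping around the pruning call is elided here):

    static void
    ExecLockRelationsIfNeeded(QueryDesc *queryDesc, bool *replan)
    {
        PlannedStmt *plannedstmt = queryDesc->plannedstmt;
        EState     *estate = queryDesc->estate;
        CachedPlan *cplan = queryDesc->cplan;
        Bitmapset  *allLockRelids;

        /* Nothing to do if the plan tree is not cached. */
        if (cplan == NULL || cplan->is_oneshot)
            return;

        /* The planner-computed set minus prunable leaf partitions... */
        allLockRelids = plannedstmt->minLockRelids;

        /* ...plus only those leaf partitions surviving initial pruning. */
        if (plannedstmt->containsInitialPruning)
            allLockRelids =
                bms_add_members(allLockRelids,
                                ExecDoInitialPartitionPruning(plannedstmt,
                                                              estate));

        AcquireExecutorLocks(allLockRelids, estate, true);

        /* Locking may have processed invalidation messages; recheck. */
        *replan = !CachedPlanStillValid(cplan);

        /* On an invalidation, release the now-useless locks. */
        if (*replan)
            AcquireExecutorLocks(allLockRelids, estate, false);
    }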
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 45c999bcdb..68743d5f66 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -49,6 +49,7 @@
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/execdebug.h"
+#include "executor/execPartition.h"
#include "executor/nodeSubplan.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
@@ -64,6 +65,7 @@
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/partcache.h"
+#include "utils/plancache.h"
#include "utils/rls.h"
#include "utils/ruleutils.h"
#include "utils/snapmgr.h"
@@ -79,7 +81,12 @@ ExecutorEnd_hook_type ExecutorEnd_hook = NULL;
ExecutorCheckPerms_hook_type ExecutorCheckPerms_hook = NULL;
/* decls for local routines only used within this module */
-static void InitPlan(QueryDesc *queryDesc, int eflags);
+static void InitPlan(QueryDesc *queryDesc, int eflags, bool *replan);
+static void ExecLockRelationsIfNeeded(QueryDesc *queryDesc, bool *replan);
+static Bitmapset *ExecDoInitialPartitionPruning(PlannedStmt *stmt,
+ EState *estate);
+static void AcquireExecutorLocks(Bitmapset *lockRelids, EState *estate,
+ bool acquire);
static void CheckValidRowMarkRel(Relation rel, RowMarkType markType);
static void ExecPostprocessPlan(EState *estate);
static void ExecEndPlan(PlanState *planstate, EState *estate);
@@ -270,7 +277,7 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags, bool *replan)
/*
* Initialize the plan state tree
*/
- InitPlan(queryDesc, eflags);
+ InitPlan(queryDesc, eflags, replan);
MemoryContextSwitchTo(oldcontext);
}
@@ -801,7 +808,7 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
* ----------------------------------------------------------------
*/
static void
-InitPlan(QueryDesc *queryDesc, int eflags)
+InitPlan(QueryDesc *queryDesc, int eflags, bool *replan)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
@@ -814,19 +821,26 @@ InitPlan(QueryDesc *queryDesc, int eflags)
int i;
/*
- * Do permissions checks and save the list for later use.
- */
- ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
- estate->es_rteperminfos = plannedstmt->permInfos;
-
- /*
- * initialize the node's execution state
+ * Initialize es_range_table and es_relations.
*/
ExecInitRangeTable(estate, rangeTable);
estate->es_plannedstmt = plannedstmt;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ /*
+ * Acquire locks on relations referenced in the plan if it comes
+ * from a CachedPlan, after performing "initial" partition pruning.
+ * Results of pruning, if any, are saved in es_part_prune_results.
+ */
+ ExecLockRelationsIfNeeded(queryDesc, replan);
+
+ /*
+ * Do permissions checks and save the list for later use.
+ */
+ ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
+ estate->es_rteperminfos = plannedstmt->permInfos;
+
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
*/
@@ -982,6 +996,133 @@ InitPlan(QueryDesc *queryDesc, int eflags)
queryDesc->planstate = planstate;
}
+/*
+ * ExecLockRelationsIfNeeded
+ * Lock relations that a query's plan depends on if the plan comes
+ * from a CachedPlan
+ *
+ * On return, we have acquired all the locks needed to run the plan.
+ * Also, *replan is set to true if the CachedPlan was invalidated while
+ * those locks were being acquired, in which case they are released again.
+ */
+static void
+ExecLockRelationsIfNeeded(QueryDesc *queryDesc, bool *replan)
+{
+ PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ EState *estate = queryDesc->estate;
+ CachedPlan *cplan = queryDesc->cplan;
+ Bitmapset *allLockRelids;
+
+ /* Nothing to do if the plan tree is not cached. */
+ if (cplan == NULL || cplan->is_oneshot)
+ return;
+
+ Assert(plannedstmt);
+ Assert(replan);
+ *replan = false;
+
+ /*
+ * Temporarily signal to ExecGetRangeTableRelation() that it must take
+ * a lock. This is needed for ExecCreatePartitionPruneState() to be
+ * able to open parent partitioned tables using
+ * ExecGetRangeTableRelation().
+ */
+ estate->es_top_eflags |= EXEC_FLAG_GET_LOCKS;
+
+ allLockRelids = plannedstmt->minLockRelids;
+ if (plannedstmt->containsInitialPruning)
+ {
+ Bitmapset *partRelids = ExecDoInitialPartitionPruning(plannedstmt,
+ estate);
+
+ allLockRelids = bms_add_members(allLockRelids, partRelids);
+ }
+
+ /* Done with it. */
+ estate->es_top_eflags &= ~EXEC_FLAG_GET_LOCKS;
+
+ /* Acquire locks. */
+ AcquireExecutorLocks(allLockRelids, estate, true);
+
+ /* Check if the plan got invalidated while we were acquiring locks. */
+ *replan = !CachedPlanStillValid(cplan);
+
+ /* Release useless locks if needed. */
+ if (*replan)
+ AcquireExecutorLocks(allLockRelids, estate, false);
+}
+
+/*
+ * ExecDoInitialPartitionPruning
+ * Perform initial partition pruning if needed by the plan
+ *
+ * The return value is the set of RT indexes of surviving partitions.
+ * A list of PartitionPruneResults, one for each element of
+ * plannedstmt->partPruneInfos, is saved in estate->es_part_prune_results.
+ */
+static Bitmapset *
+ExecDoInitialPartitionPruning(PlannedStmt *plannedstmt, EState *estate)
+{
+ ListCell *lc;
+ Bitmapset *lockPartRelids = NULL;
+
+ Assert(plannedstmt->containsInitialPruning);
+ Assert(plannedstmt->partPruneInfos);
+
+ foreach(lc, plannedstmt->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
+ PartitionPruneState *prunestate;
+ PartitionPruneResult *pruneresult;
+ Bitmapset *validsubplans;
+
+ /* No PlanState here; unnecessary for "initial" pruning. */
+ prunestate = ExecCreatePartitionPruneState(NULL, estate, pruneinfo,
+ true, false);
+ validsubplans = ExecFindMatchingSubPlans(prunestate, true,
+ &lockPartRelids);
+
+ pruneresult = makeNode(PartitionPruneResult);
+ pruneresult->root_parent_relids = bms_copy(pruneinfo->root_parent_relids);
+ pruneresult->validsubplans = bms_copy(validsubplans);
+ estate->es_part_prune_results = lappend(estate->es_part_prune_results,
+ pruneresult);
+ }
+
+ return lockPartRelids;
+}
+
+/*
+ * AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
+ * or release them if acquire is false.
+ */
+static void
+AcquireExecutorLocks(Bitmapset *lockRelids, EState *estate, bool acquire)
+{
+ int rti;
+
+ rti = -1;
+ while ((rti = bms_next_member(lockRelids, rti)) > 0)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(rti, estate);
+
+ if (!(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
+ continue;
+
+ /*
+ * Acquire the appropriate type of lock on each relation OID. Note
+ * that we don't actually try to open the rel, and hence will not
+ * fail if it's been dropped entirely --- we'll just transiently
+ * acquire a non-conflicting lock.
+ */
+ if (acquire)
+ LockRelationOid(rte->relid, rte->rellockmode);
+ else
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+}
+
/*
* Check that a proposed result relation is a legal target for the operation
*
@@ -1396,7 +1537,7 @@ ExecGetAncestorResultRels(EState *estate, ResultRelInfo *resultRelInfo)
/*
* All ancestors up to the root target relation must have been
- * locked by the planner or AcquireExecutorLocks().
+ * locked by the planner.
*/
ancRel = table_open(ancOid, NoLock);
rInfo = makeNode(ResultRelInfo);
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 1f5d6d4d64..5c967451ce 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -66,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITION_PRUNE_RESULTS UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -599,12 +600,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_results_data;
+ char *part_prune_results_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_results_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -633,6 +637,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_results_data = nodeToString(estate->es_part_prune_results);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -659,6 +664,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized List of PartitionPruneResult. */
+ part_prune_results_len = strlen(part_prune_results_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_results_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -753,6 +763,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized List of PartitionPruneResult */
+ part_prune_results_space = shm_toc_allocate(pcxt->toc, part_prune_results_len);
+ memcpy(part_prune_results_space, part_prune_results_data, part_prune_results_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS,
+ part_prune_results_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1234,8 +1250,11 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_results_space;
char *paramspace;
PlannedStmt *pstmt;
+ QueryDesc *queryDesc;
+ List *part_prune_results;
ParamListInfo paramLI;
char *queryString;
@@ -1246,6 +1265,11 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
pstmtspace = shm_toc_lookup(toc, PARALLEL_KEY_PLANNEDSTMT, false);
pstmt = (PlannedStmt *) stringToNode(pstmtspace);
+ /* Reconstruct leader-supplied PartitionPruneResult. */
+ part_prune_results_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, false);
+ part_prune_results = (List *) stringToNode(part_prune_results_space);
+
/* Reconstruct ParamListInfo. */
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
@@ -1255,11 +1279,15 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
* here even if the containing plan tree may have come from one in the
* leader.
*/
- return CreateQueryDesc(pstmt,
- NULL,
- queryString,
- GetActiveSnapshot(), InvalidSnapshot,
- receiver, paramLI, NULL, instrument_options);
+ queryDesc = CreateQueryDesc(pstmt,
+ NULL,
+ queryString,
+ GetActiveSnapshot(), InvalidSnapshot,
+ receiver, paramLI, NULL, instrument_options);
+
+ queryDesc->part_prune_results = part_prune_results;
+
+ return queryDesc;
}
/*
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 4b91bb7403..09e0d7aa9c 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -196,7 +196,8 @@ static void PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans);
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis);
/*
@@ -1782,8 +1783,10 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* On return, *initially_valid_subplans is assigned the set of indexes of
* child subplans that must be initialized along with the parent plan node.
- * Initial pruning is performed here if needed and in that case only the
- * surviving subplans' indexes are added.
+ * That set is computed by either performing the "initial pruning" here or
+ * reusing the one present in EState.es_part_prune_results[part_prune_index]
+ * if it has been set, which is the case when
+ * ExecDoInitialPartitionPruning() has already done the initial pruning.
*
* If subplans are indeed pruned, subplan_map arrays contained in the returned
* PartitionPruneState are re-sequenced to not count those, though only if the
@@ -1796,9 +1799,10 @@ ExecInitPartitionPruning(PlanState *planstate,
Bitmapset *root_parent_relids,
Bitmapset **initially_valid_subplans)
{
- PartitionPruneState *prunestate;
+ PartitionPruneState *prunestate = NULL;
EState *estate = planstate->state;
PartitionPruneInfo *pruneinfo;
+ PartitionPruneResult *pruneresult = NULL;
/* Obtain the pruneinfo we need, and make sure it's the right one */
pruneinfo = list_nth(estate->es_part_prune_infos, part_prune_index);
@@ -1814,22 +1818,56 @@ ExecInitPartitionPruning(PlanState *planstate,
/* We may need an expression context to evaluate partition exprs */
ExecAssignExprContext(estate, planstate);
- /* Create the working data structure for pruning */
- prunestate = ExecCreatePartitionPruneState(planstate, estate, pruneinfo,
- pruneinfo->needs_init_pruning,
- pruneinfo->needs_exec_pruning);
+ /* Initial pruning already done if es_part_prune_results has been set. */
+ if (estate->es_part_prune_results)
+ {
+ pruneresult = list_nth_node(PartitionPruneResult,
+ estate->es_part_prune_results,
+ part_prune_index);
+ if (!bms_equal(root_parent_relids, pruneinfo->root_parent_relids))
+ ereport(ERROR,
+ errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg_internal("mismatching PartitionPruneInfo and PartitionPruneResult at part_prune_index %d",
+ part_prune_index),
+ errdetail_internal("prunresult relids %s, pruneinfo relids %s",
+ bmsToString(pruneresult->root_parent_relids),
+ bmsToString(pruneinfo->root_parent_relids)));
+ }
+
+ if (pruneresult == NULL || pruneinfo->needs_exec_pruning)
+ {
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
+
+ /*
+ * Create the working data structure for pruning. No need to consider
+ * initial pruning steps if we have a PartitionPruneResult.
+ */
+ prunestate = ExecCreatePartitionPruneState(planstate, estate,
+ pruneinfo,
+ pruneresult == NULL,
+ pruneinfo->needs_exec_pruning);
+ }
/*
* Perform an initial partition prune pass, if required.
*/
- if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ if (pruneresult)
+ {
+ *initially_valid_subplans = bms_copy(pruneresult->validsubplans);
+ }
+ else if (prunestate && prunestate->do_initial_prune)
+ {
+ *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true,
+ NULL);
+ }
else
{
- /* No pruning, so we'll need to initialize all subplans */
+ /* No initial pruning, so we'll need to initialize all subplans */
Assert(n_total_subplans > 0);
*initially_valid_subplans = bms_add_range(NULL, 0,
n_total_subplans - 1);
+ return prunestate;
}
/*
@@ -1837,7 +1875,8 @@ ExecInitPartitionPruning(PlanState *planstate,
* that were removed above due to initial pruning. No need to do this if
* no steps were removed.
*/
- if (bms_num_members(*initially_valid_subplans) < n_total_subplans)
+ if (prunestate &&
+ bms_num_members(*initially_valid_subplans) < n_total_subplans)
{
/*
* We can safely skip this when !do_exec_prune, even though that
@@ -2295,10 +2334,14 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
* runtime pruning.
+ *
+ * RT indexes of leaf partitions scanned by the chosen subplans are added to
+ * *scan_leafpart_rtis if the pointer is non-NULL.
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2333,10 +2376,10 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
*/
pprune = &prunedata->partrelprunedata[0];
find_matching_subplans_recurse(prunedata, pprune, initial_prune,
- &result);
+ &result, scan_leafpart_rtis);
/* Expression eval may have used space in ExprContext too */
- if (pprune->exec_pruning_steps)
+ if (pprune->exec_pruning_steps && !initial_prune)
ResetExprContext(pprune->exec_context.exprcontext);
}
@@ -2347,6 +2390,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_copy(*scan_leafpart_rtis);
MemoryContextReset(prunestate->prune_context);
@@ -2357,13 +2402,15 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* find_matching_subplans_recurse
* Recursive worker function for ExecFindMatchingSubPlans
*
- * Adds valid (non-prunable) subplan IDs to *validsubplans
+ * Adds valid (non-prunable) subplan IDs to *validsubplans and RT indexes of
+ * the corresponding leaf partitions to *scan_leafpart_rtis (if asked for).
*/
static void
find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans)
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *partset;
int i;
@@ -2390,8 +2437,14 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
while ((i = bms_next_member(partset, i)) >= 0)
{
if (pprune->subplan_map[i] >= 0)
+ {
*validsubplans = bms_add_member(*validsubplans,
pprune->subplan_map[i]);
+ Assert(pprune->rti_map[i] > 0);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_add_member(*scan_leafpart_rtis,
+ pprune->rti_map[i]);
+ }
else
{
int partidx = pprune->subpart_map[i];
@@ -2399,7 +2452,8 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
if (partidx >= 0)
find_matching_subplans_recurse(prunedata,
&prunedata->partrelprunedata[partidx],
- initial_prune, validsubplans);
+ initial_prune, validsubplans,
+ scan_leafpart_rtis);
else
{
/*
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index c33a3c0bec..035ed8a872 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -140,6 +140,7 @@ CreateExecutorState(void)
estate->es_param_exec_vals = NULL;
estate->es_queryEnv = NULL;
+ estate->es_part_prune_results = NIL;
estate->es_query_cxt = qcontext;
@@ -800,7 +801,12 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rte->rtekind == RTE_RELATION);
- if (!IsParallelWorker())
+ /*
+ * Must take a lock on the relation if we got here by way of
+ * ExecLockRelationsIfNeeded().
+ */
+ if (!IsParallelWorker() &&
+ (estate->es_top_eflags & EXEC_FLAG_GET_LOCKS) == 0)
{
/*
* In a normal query, we should already have the appropriate lock,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index cb25499b3f..2f585793da 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -156,7 +156,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (appendstate->as_prune_state == NULL ||
+ (!appendstate->as_prune_state->do_exec_prune && nplans > 0))
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -578,7 +579,7 @@ choose_next_subplan_locally(AppendState *node)
}
else if (node->as_valid_subplans == NULL)
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
whichplan = -1;
}
@@ -643,7 +644,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
/*
* Mark each invalid plan as finished to allow the loop below to
@@ -718,7 +719,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
mark_invalid_subplans_as_finished(node);
}
@@ -869,7 +870,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
classify_matching_subplans(node);
}
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 399b39c598..c653084515 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -104,7 +104,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplan, we can fill ms_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (mergestate->ms_prune_state == NULL ||
+ (!mergestate->ms_prune_state->do_exec_prune && nplans > 0))
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -219,7 +220,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index b4fa8d90bc..ff363be811 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -372,6 +372,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
{
PartitionPruneInfo *pruneinfo = lfirst(lc);
ListCell *l;
+ Bitmapset *leafpart_rtis = NULL;
pruneinfo->root_parent_relids =
offset_relid_set(pruneinfo->root_parent_relids, rtoffset);
@@ -383,17 +384,52 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach(l2, prune_infos)
{
PartitionedRelPruneInfo *pinfo = lfirst(l2);
+ int i;
/* RT index of the table to which the pinfo belongs. */
pinfo->rtindex += rtoffset;
+
+ /* Also of the leaf partitions that might be scanned. */
+ for (i = 0; i < pinfo->nparts; i++)
+ {
+ if (pinfo->rti_map[i] > 0 && pinfo->subplan_map[i] >= 0)
+ {
+ pinfo->rti_map[i] += rtoffset;
+ leafpart_rtis = bms_add_member(leafpart_rtis,
+ pinfo->rti_map[i]);
+ }
+ }
}
}
+ if (pruneinfo->needs_init_pruning)
+ {
+ glob->containsInitialPruning = true;
+
+ /*
+ * Delete the leaf partition RTIs from the set of relations to be
+ * locked by AcquireExecutorLocks(). The actual set of leaf
+ * partitions to be locked is computed by
+ * ExecLockRelationsIfNeeded().
+ */
+ glob->minLockRelids = bms_del_members(glob->minLockRelids,
+ leafpart_rtis);
+ }
+
glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
glob->containsInitialPruning |= pruneinfo->needs_init_pruning;
}
+ /*
+ * It seems worth doing a bms_copy() on glob->minLockRelids if we deleted
+ * bits from it above, to get rid of any empty tail words: that way the
+ * loop over this set in AcquireExecutorLocks() need not wade through
+ * useless bit words.
+ */
+ if (glob->containsInitialPruning)
+ glob->minLockRelids = bms_copy(glob->minLockRelids);
+
return result;
}
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index f113170140..af5e9b1609 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -100,13 +100,13 @@ static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool GenericPlanIsValid(CachedPlan *cplan);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -787,9 +787,6 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
- *
- * On a "true" return, we have acquired the locks needed to run the plan.
- * (We must do this for the "true" result to be race-condition-free.)
*/
static bool
CheckCachedPlan(CachedPlanSource *plansource)
@@ -803,60 +800,69 @@ CheckCachedPlan(CachedPlanSource *plansource)
if (!plan)
return false;
- Assert(plan->magic == CACHEDPLAN_MAGIC);
- /* Generic plans are never one-shot */
- Assert(!plan->is_oneshot);
+ if (GenericPlanIsValid(plan))
+ return true;
/*
- * If plan isn't valid for current role, we can't use it.
+ * Plan has been invalidated, so unlink it from the parent and release it.
*/
- if (plan->is_valid && plan->dependsOnRole &&
- plan->planRoleId != GetUserId())
- plan->is_valid = false;
+ ReleaseGenericPlan(plansource);
- /*
- * If it appears valid, acquire locks and recheck; this is much the same
- * logic as in RevalidateCachedQuery, but for a plan.
- */
- if (plan->is_valid)
+ return false;
+}
+
+/*
+ * GenericPlanIsValid
+ * Is a generic plan still valid?
+ *
+ * It may have gone stale due to concurrent schema modifications of relations
+ * mentioned in the plan, or for a couple of other reasons checked below.
+ */
+static bool
+GenericPlanIsValid(CachedPlan *cplan)
+{
+ Assert(cplan != NULL);
+ Assert(cplan->magic == CACHEDPLAN_MAGIC);
+ /* Generic plans are never one-shot */
+ Assert(!cplan->is_oneshot);
+
+ if (cplan->is_valid)
{
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
- Assert(plan->refcount > 0);
-
- AcquireExecutorLocks(plan->stmt_list, true);
+ Assert(cplan->refcount > 0);
/*
- * If plan was transient, check to see if TransactionXmin has
- * advanced, and if so invalidate it.
+ * If plan isn't valid for current role, we can't use it.
*/
- if (plan->is_valid &&
- TransactionIdIsValid(plan->saved_xmin) &&
- !TransactionIdEquals(plan->saved_xmin, TransactionXmin))
- plan->is_valid = false;
+ if (cplan->dependsOnRole && cplan->planRoleId != GetUserId())
+ cplan->is_valid = false;
/*
- * By now, if any invalidation has happened, the inval callback
- * functions will have marked the plan invalid.
+ * If plan was transient, check to see if TransactionXmin has
+ * advanced, and if so invalidate it.
*/
- if (plan->is_valid)
- {
- /* Successfully revalidated and locked the query. */
- return true;
- }
-
- /* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ if (TransactionIdIsValid(cplan->saved_xmin) &&
+ !TransactionIdEquals(cplan->saved_xmin, TransactionXmin))
+ cplan->is_valid = false;
}
- /*
- * Plan has been invalidated, so unlink it from the parent and release it.
- */
- ReleaseGenericPlan(plansource);
+ return cplan->is_valid;
+}
- return false;
+/*
+ * CachedPlanStillValid
+ * Returns true if a cached generic plan is still valid
+ *
+ * Called by the executor after it has finished taking locks on a plan tree
+ * in a CachedPlan.
+ */
+bool
+CachedPlanStillValid(CachedPlan *cplan)
+{
+ return GenericPlanIsValid(cplan);
}
/*
@@ -1126,9 +1132,6 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
- * On return, the plan is valid and we have sufficient locks to begin
- * execution.
- *
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
* the refcount has been reported to that ResourceOwner (note that this
@@ -1362,6 +1365,7 @@ CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
/*
* Reject if AcquireExecutorLocks would have anything to do. This is
* probably unnecessary given the previous check, but let's be safe.
+ * XXX - maybe remove?
*/
foreach(lc, plan->stmt_list)
{
@@ -1735,62 +1739,6 @@ QueryListGetPrimaryStmt(List *stmts)
return NULL;
}
-/*
- * AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
- */
-static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
-{
- ListCell *lc1;
-
- foreach(lc1, stmt_list)
- {
- PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- Bitmapset *allLockRelids;
- int rti;
-
- if (plannedstmt->commandType == CMD_UTILITY)
- {
- /*
- * Ignore utility statements, except those (such as EXPLAIN) that
- * contain a parsed-but-not-planned query. Note: it's okay to use
- * ScanQueryForLocks, even though the query hasn't been through
- * rule rewriting, because rewriting doesn't change the query
- * representation.
- */
- Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
-
- Assert(plannedstmt->minLockRelids == NULL);
- if (query)
- ScanQueryForLocks(query, acquire);
- continue;
- }
-
- allLockRelids = plannedstmt->minLockRelids;
- rti = -1;
- while ((rti = bms_next_member(allLockRelids, rti)) > 0)
- {
- RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
-
- if (!(rte->rtekind == RTE_RELATION ||
- (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
- continue;
-
- /*
- * Acquire the appropriate type of lock on each relation OID. Note
- * that we don't actually try to open the rel, and hence will not
- * fail if it's been dropped entirely --- we'll just transiently
- * acquire a non-conflicting lock.
- */
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
- }
- }
-}
-
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 21d85a7809..526f5781da 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -133,5 +133,11 @@ extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
bool consider_initial_steps,
bool consider_exec_steps);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis);
+extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
+ EState *estate,
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index 4b7368a0dc..595297df6c 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -46,6 +46,12 @@ typedef struct QueryDesc
QueryEnvironment *queryEnv; /* query environment passed in */
int instrument_options; /* OR of InstrumentOption flags */
+ /*
+ * Used by ExecParallelGetQueryDesc() to save the result of initial
+ * partition pruning performed by the leader.
+ */
+ List *part_prune_results; /* list of PartitionPruneResult */
+
/* These fields are set by ExecutorStart */
TupleDesc tupDesc; /* descriptor for result tuples */
EState *estate; /* executor's query-wide state */
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 63f3d09804..755e231675 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -59,6 +59,8 @@
#define EXEC_FLAG_MARK 0x0008 /* need mark/restore */
#define EXEC_FLAG_SKIP_TRIGGERS 0x0010 /* skip AfterTrigger calls */
#define EXEC_FLAG_WITH_NO_DATA 0x0020 /* rel scannability doesn't matter */
+#define EXEC_FLAG_GET_LOCKS 0x0400 /* should ExecGetRangeTableRelation()
+ * lock relations? */
/* Hook for plugins to get control in ExecutorStart() */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 20f4c8b35f..b361592e2d 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -620,6 +620,7 @@ typedef struct EState
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
+ List *es_part_prune_results; /* QueryDesc.part_prune_results */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index d00b5dcb03..83e5c665c7 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -134,8 +134,8 @@ typedef struct PlannerGlobal
bool containsInitialPruning;
/*
- * Indexes of all range table entries; for AcquireExecutorLocks()'s
- * perusal.
+ * Indexes of all range table entries except those of leaf partitions
+ * scanned by prunable subplans; for AcquireExecutorLocks() perusal.
*/
Bitmapset *minLockRelids;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 7b53f990e0..e76e945c8c 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -82,8 +82,9 @@ typedef struct PlannedStmt
List *permInfos; /* list of RTEPermissionInfo nodes for rtable
* entries needing one */
- Bitmapset *minLockRelids; /* Indexes of all range table entries; for
- * AcquireExecutorLocks()'s perusal */
+ Bitmapset *minLockRelids; /* Indexes of all range table entries except
+ * those of leaf partitions scanned by
+ * prunable subplans */
/* rtable indexes of target relations for INSERT/UPDATE/DELETE/MERGE */
List *resultRelations; /* integer list of RT indexes, or NIL */
@@ -1575,6 +1576,32 @@ typedef struct PartitionPruneStepCombine
List *source_stepids;
} PartitionPruneStepCombine;
+/*----------------
+ * PartitionPruneResult
+ *
+ * The result of performing ExecDoInitialPartitionPruning() on a given
+ * PartitionPruneInfo.
+ *
+ * root_parent_relids is the same as PartitionPruneInfo.root_parent_relids.
+ * It's there for cross-checking in ExecInitPartitionPruning() that the
+ * PartitionPruneResult and the PartitionPruneInfo at a given index in
+ * EState.es_part_prune_results and EState.es_part_prune_infos, respectively,
+ * belong to the same parent plan node.
+ *
+ * validsubplans contains the indexes of subplans remaining after performing
+ * initial pruning by calling ExecFindMatchingSubPlans() on the
+ * PartitionPruneInfo.
+ *
+ * This is used to store the result of initial partition pruning that is
+ * performed in ExecDoInitialPartitionPruning().
+ */
+typedef struct PartitionPruneResult
+{
+ NodeTag type;
+
+ Bitmapset *root_parent_relids;
+ Bitmapset *validsubplans;
+} PartitionPruneResult;
/*
* Plan invalidation info
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index a443181d41..7c664bad35 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -221,6 +221,7 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
QueryEnvironment *queryEnv);
+extern bool CachedPlanStillValid(CachedPlan *cplan);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
--
2.35.3
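Note that the ExecutorStart() signature change above also affects
ExecutorStart_hook implementations in extensions, which must accept and
forward the new out-parameter. A hypothetical hook might look roughly like
this (a sketch, not part of the patches; my_ExecutorStart and
prev_ExecutorStart are illustrative names following the usual hook idiom):

    static ExecutorStart_hook_type prev_ExecutorStart = NULL;

    static void
    my_ExecutorStart(QueryDesc *queryDesc, int eflags, bool *replan)
    {
        /* Chain to the previous hook, or to the standard routine. */
        if (prev_ExecutorStart)
            prev_ExecutorStart(queryDesc, eflags, replan);
        else
            standard_ExecutorStart(queryDesc, eflags, replan);

        /*
         * If the cached plan went stale, the caller will replan and retry,
         * so skip any per-query setup that assumes a valid plan.
         */
        if (replan && *replan)
            return;

        /* ... per-query extension setup would go here ... */
    }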
Attachment: v31-0002-Preparatory-refactoring-before-reworking-CachedP.patch
From 936ef111a9b515c0d0111637a22959ea62e92b5d Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 13 Dec 2022 11:58:07 +0900
Subject: [PATCH v31 2/3] Preparatory refactoring before reworking CachedPlan
locking
Remember, in a bitmapset, the RT indexes of the RTEs that
AcquireExecutorLocks() must consider locking, so that instead of looping
over the whole range table to find those RTEs, it can look them up
directly using the RT indexes set in the bitmapset.
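Concretely, the lookup pattern this enables looks as follows (condensed;
the real code, visible in the AcquireExecutorLocks() hunks of the 0003
patch above, also handles RTE_SUBQUERY entries that carry a relid):

    Bitmapset  *allLockRelids = plannedstmt->minLockRelids;
    int         rti = -1;

    /* Visit only the RT indexes recorded in the bitmapset. */
    while ((rti = bms_next_member(allLockRelids, rti)) > 0)
    {
        RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);

        if (rte->rtekind == RTE_RELATION)
            LockRelationOid(rte->relid, rte->rellockmode);
    }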
This also adds some extra information related to execution-time
pruning to the relevant plan nodes.
---
---
src/backend/executor/execParallel.c | 1 +
src/backend/executor/execPartition.c | 34 +++++++++++++++-------
src/backend/nodes/readfuncs.c | 8 ++++--
src/backend/optimizer/plan/planner.c | 2 ++
src/backend/optimizer/plan/setrefs.c | 12 ++++++++
src/backend/partitioning/partprune.c | 42 ++++++++++++++++++++++++++--
src/backend/utils/cache/plancache.c | 10 +++++--
src/include/executor/execPartition.h | 6 ++++
src/include/nodes/nodes.h | 1 +
src/include/nodes/pathnodes.h | 11 ++++++++
src/include/nodes/plannodes.h | 19 +++++++++++++
11 files changed, 128 insertions(+), 18 deletions(-)
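To visualize the new per-partition bookkeeping this patch adds: each
PartitionedRelPruneInfo/PartitionedRelPruningData now carries three
parallel arrays indexed by partition number (a schematic summary, not
code copied from the patch):

    /* For partition number i of a given partitioned table: */
    int   subplan_map[i];   /* index of the subplan scanning partition i,
                             * or -1 if none */
    int   subpart_map[i];   /* index of partition i's own
                             * PartitionedRelPruneInfo if it is itself
                             * partitioned, or -1 */
    Index rti_map[i];       /* RT index of leaf partition i, or 0 if the
                             * partition does not appear in the plan */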
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 5f97f5353f..1f5d6d4d64 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -182,6 +182,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->containsInitialPruning = false; /* workers need not know! */
pstmt->planTree = plan;
pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 651ad24fc1..4b91bb7403 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -184,8 +184,6 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
int maxfieldlen);
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
-static PartitionPruneState *CreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *pruneinfo);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -1759,6 +1757,11 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* account for initial pruning possibly having eliminated some of the
* subplans.
*
+ * ExecCreatePartitionPruneState:
+ * A sub-routine of ExecInitPartitionPruning() that creates the
+ * PartitionPruneState from a given PartitionPruneInfo. Exported for the
+ * use by callers that don't need to do ExecInitPartitionPruning().
+ *
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating the expressions
* that are safe to evaluate at a given point. This function is first
@@ -1812,7 +1815,9 @@ ExecInitPartitionPruning(PlanState *planstate,
ExecAssignExprContext(estate, planstate);
/* Create the working data structure for pruning */
- prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ prunestate = ExecCreatePartitionPruneState(planstate, estate, pruneinfo,
+ pruneinfo->needs_init_pruning,
+ pruneinfo->needs_exec_pruning);
/*
* Perform an initial partition prune pass, if required.
@@ -1849,7 +1854,7 @@ ExecInitPartitionPruning(PlanState *planstate,
}
/*
- * CreatePartitionPruneState
+ * ExecCreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
*
* 'planstate' is the parent plan node's execution state.
@@ -1865,15 +1870,18 @@ ExecInitPartitionPruning(PlanState *planstate,
* re-evaluate which partitions match the pruning steps provided in each
* PartitionedRelPruneInfo.
*/
-static PartitionPruneState *
-CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
+PartitionPruneState *
+ExecCreatePartitionPruneState(PlanState *planstate, EState *estate,
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps)
{
- EState *estate = planstate->state;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
+ ExprContext *econtext = planstate ? planstate->ps_ExprContext :
+ GetPerTupleExprContext(estate);
/* For data reading, executor always omits detached partitions */
if (estate->es_partition_directory == NULL)
@@ -1955,6 +1963,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
Assert(partdesc->nparts >= pinfo->nparts);
pprune->nparts = partdesc->nparts;
pprune->subplan_map = palloc(sizeof(int) * partdesc->nparts);
+ pprune->rti_map = palloc(sizeof(Index) * partdesc->nparts);
if (partdesc->nparts == pinfo->nparts)
{
/*
@@ -1965,6 +1974,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pprune->subpart_map = pinfo->subpart_map;
memcpy(pprune->subplan_map, pinfo->subplan_map,
sizeof(int) * pinfo->nparts);
+ memcpy(pprune->rti_map, pinfo->rti_map,
+ sizeof(Index) * pinfo->nparts);
/*
* Double-check that the list of unpruned relations has not
@@ -2015,6 +2026,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pinfo->subplan_map[pd_idx];
pprune->subpart_map[pp_idx] =
pinfo->subpart_map[pd_idx];
+ pprune->rti_map[pp_idx] =
+ pinfo->rti_map[pd_idx];
pd_idx++;
}
else
@@ -2022,6 +2035,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
/* this partdesc entry is not in the plan */
pprune->subplan_map[pp_idx] = -1;
pprune->subpart_map[pp_idx] = -1;
+ pprune->rti_map[pp_idx] = 0;
}
}
@@ -2043,7 +2057,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* Initialize pruning contexts as needed.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
- if (pinfo->initial_pruning_steps)
+ if (consider_initial_steps && pinfo->initial_pruning_steps)
{
InitPartitionPruneContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
@@ -2053,7 +2067,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps)
+ if (consider_exec_steps && pinfo->exec_pruning_steps)
{
InitPartitionPruneContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index f3629cdfd1..caf2a60493 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -158,6 +158,11 @@
token = pg_strtok(&length); /* skip :fldname */ \
local_node->fldname = readIntCols(len)
+/* Read an Index array */
+#define READ_INDEX_ARRAY(fldname, len) \
+ token = pg_strtok(&length); /* skip :fldname */ \
+ local_node->fldname = readIndexCols(len)
+
/* Read a bool array */
#define READ_BOOL_ARRAY(fldname, len) \
token = pg_strtok(&length); /* skip :fldname */ \
@@ -800,7 +805,6 @@ fnname(int numCols) \
*/
READ_SCALAR_ARRAY(readAttrNumberCols, int16, atoi)
READ_SCALAR_ARRAY(readOidCols, Oid, atooid)
-/* outfuncs.c has writeIndexCols, but we don't yet need that here */
-/* READ_SCALAR_ARRAY(readIndexCols, Index, atoui) */
+READ_SCALAR_ARRAY(readIndexCols, Index, atoui)
READ_SCALAR_ARRAY(readIntCols, int, atoi)
READ_SCALAR_ARRAY(readBoolCols, bool, strtobool)
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 05f44faf6e..2b7238cf24 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -525,8 +525,10 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
result->partPruneInfos = glob->partPruneInfos;
+ result->containsInitialPruning = glob->containsInitialPruning;
result->rtable = glob->finalrtable;
result->permInfos = glob->finalrteperminfos;
+ result->minLockRelids = glob->minLockRelids;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 85ba9d1ca1..b4fa8d90bc 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -279,6 +279,16 @@ set_plan_references(PlannerInfo *root, Plan *plan)
*/
add_rtes_to_flat_rtable(root, false);
+ /*
+ * Add the query's adjusted range of RT indexes to glob->minLockRelids.
+ * The adjusted RT indexes of prunable relations will be deleted from the
+ * set below where PartitionPruneInfos are processed.
+ */
+ glob->minLockRelids =
+ bms_add_range(glob->minLockRelids,
+ rtoffset + 1,
+ rtoffset + list_length(root->parse->rtable));
+
/*
* Adjust RT indexes of PlanRowMarks and add to final rowmarks list
*/
@@ -377,9 +387,11 @@ set_plan_references(PlannerInfo *root, Plan *plan)
/* RT index of the table to which the pinfo belongs. */
pinfo->rtindex += rtoffset;
}
+
}
glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
+ glob->containsInitialPruning |= pruneinfo->needs_init_pruning;
}
return result;
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 510145e3c0..9ae41053da 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,9 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -234,6 +236,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool needs_init_pruning = false;
+ bool needs_exec_pruning = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -313,12 +317,16 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_needs_init_pruning;
+ bool partrel_needs_exec_pruning;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_needs_init_pruning,
+ &partrel_needs_exec_pruning);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -327,6 +335,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+
+ needs_init_pruning |= partrel_needs_init_pruning;
+ needs_exec_pruning |= partrel_needs_exec_pruning;
}
pfree(relid_subplan_map);
@@ -342,6 +353,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->root_parent_relids = parentrel->relids;
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->needs_init_pruning = needs_init_pruning;
+ pruneinfo->needs_exec_pruning = needs_exec_pruning;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -442,13 +455,19 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *needs_init_pruning and *needs_exec_pruning are set to indicate whether
+ * the pruning steps contained in the returned PartitionedRelPruneInfos
+ * can be performed during executor startup and during execution,
+ * respectively.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -459,6 +478,10 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will find out below. */
+ *needs_init_pruning = false;
+ *needs_exec_pruning = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -546,6 +569,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+ * In the first pass, we also determine whether the 2nd pass will be
+ * necessary, by noting the presence of any EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -620,6 +646,12 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ /* record which types of pruning steps we've seen so far */
+ if (initial_pruning_steps != NIL)
+ *needs_init_pruning = true;
+ if (exec_pruning_steps != NIL)
+ *needs_exec_pruning = true;
+
pinfolist = lappend(pinfolist, pinfo);
}
@@ -647,6 +679,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *subplan_map;
int *subpart_map;
Oid *relid_map;
+ Index *rti_map;
/*
* Construct the subplan and subpart maps for this partitioning level.
@@ -659,6 +692,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
+ rti_map = (Index *) palloc0(nparts * sizeof(Index));
present_parts = NULL;
i = -1;
@@ -673,6 +707,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
+ rti_map[i] = partrel->relid;
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
@@ -697,6 +732,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
pinfo->relid_map = relid_map;
+ pinfo->rti_map = rti_map;
}
pfree(relid_subpart_map);
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 77c2ba3f8f..f113170140 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -1747,7 +1747,8 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ Bitmapset *allLockRelids;
+ int rti;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1760,14 +1761,17 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
*/
Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+ Assert(plannedstmt->minLockRelids == NULL);
if (query)
ScanQueryForLocks(query, acquire);
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ allLockRelids = plannedstmt->minLockRelids;
+ rti = -1;
+ while ((rti = bms_next_member(allLockRelids, rti)) > 0)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
if (!(rte->rtekind == RTE_RELATION ||
(rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index ee487e42dd..21d85a7809 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -45,6 +45,7 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
+ * rti_map Range table index by partition index, or 0.
* present_parts A Bitmapset of the partition indexes that we
* have subplans or subparts for.
* initial_pruning_steps List of PartitionPruneSteps used to
@@ -61,6 +62,7 @@ typedef struct PartitionedRelPruningData
int nparts;
int *subplan_map;
int *subpart_map;
+ Index *rti_map;
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
@@ -126,6 +128,10 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int part_prune_index,
Bitmapset *root_parent_relids,
Bitmapset **initially_valid_subplans);
+extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate, EState *estate,
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
bool initial_prune);
#endif /* EXECPARTITION_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 10752e8011..1de8f3fadc 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -218,6 +218,7 @@ extern struct Bitmapset *readBitmapset(void);
extern uintptr_t readDatum(bool typbyval);
extern bool *readBoolCols(int numCols);
extern int *readIntCols(int numCols);
+extern Index *readIndexCols(int numCols);
extern Oid *readOidCols(int numCols);
extern int16 *readAttrNumberCols(int numCols);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 2d1d8f4bcd..d00b5dcb03 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -128,6 +128,17 @@ typedef struct PlannerGlobal
/* List of PartitionPruneInfo contained in the plan */
List *partPruneInfos;
+ /*
+ * Do any of those PartitionPruneInfos have initial pruning steps in them?
+ */
+ bool containsInitialPruning;
+
+ /*
+ * Indexes of all range table entries; for AcquireExecutorLocks()'s
+ * perusal.
+ */
+ Bitmapset *minLockRelids;
+
/* OIDs of relations the plan depends on */
List *relationOids;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index c1234fcf36..7b53f990e0 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -73,11 +73,18 @@ typedef struct PlannedStmt
List *partPruneInfos; /* List of PartitionPruneInfo contained in the
* plan */
+ bool containsInitialPruning; /* Do any of those PartitionPruneInfos
+ * have initial pruning steps in them?
+ */
+
List *rtable; /* list of RangeTblEntry nodes */
List *permInfos; /* list of RTEPermissionInfo nodes for rtable
* entries needing one */
+ Bitmapset *minLockRelids; /* Indexes of all range table entries; for
+ * AcquireExecutorLocks()'s perusal */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE/MERGE */
List *resultRelations; /* integer list of RT indexes, or NIL */
@@ -1417,6 +1424,13 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * needs_init_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its initial_pruning_steps set?
+ *
+ * needs_exec_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its exec_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1428,6 +1442,8 @@ typedef struct PartitionPruneInfo
NodeTag type;
Bitmapset *root_parent_relids;
List *prune_infos;
+ bool needs_init_pruning;
+ bool needs_exec_pruning;
Bitmapset *other_subplans;
} PartitionPruneInfo;
@@ -1472,6 +1488,9 @@ typedef struct PartitionedRelPruneInfo
/* relation OID by partition index, or 0 */
Oid *relid_map pg_node_attr(array_size(nparts));
+ /* Range table index by partition index, or 0. */
+ Index *rti_map pg_node_attr(array_size(nparts));
+
/*
* initial_pruning_steps shows how to prune during executor startup (i.e.,
* without use of any PARAM_EXEC Params); it is NIL if no startup pruning
--
2.35.3
On Fri, Jan 27, 2023 at 4:01 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Fri, Jan 20, 2023 at 12:52 PM Amit Langote <amitlangote09@gmail.com> wrote:
Alright, I'll try to get something out early next week. Thanks for
all the pointers.

Sorry for the delay. Attached is what I've come up with so far.
I didn't actually go with calling into the plancache on every lock
taken on a relation, that is, in ExecGetRangeTableRelation(). What I
didn't quite like about that approach (or didn't see a clean enough
way to code) is the need to complicate the ExecInitNode() traversal
to handle abruptly suspending the ongoing setup of the PlanState
tree.
OK, I gave this one more try, and attached is what I came up with.
This adds an ExecPlanStillValid(), which is called right after
anything that may in turn call ExecGetRangeTableRelation(), which has
been taught to lock a relation if EXEC_FLAG_GET_LOCKS has been passed
in EState.es_top_eflags. That includes all ExecInitNode() calls and a
few other functions that call ExecGetRangeTableRelation() directly,
such as ExecOpenScanRelation(). If ExecPlanStillValid() returns
false, that is, if EState.es_cachedplan is found to have been
invalidated after a lock was taken by ExecGetRangeTableRelation(),
whatever function called it must return immediately, and so must its
caller, and so on. ExecEndPlan() seems to be able to clean up after a
partially finished attempt at initializing a PlanState tree in this
way.

That said, my preliminary testing may not have caught cases where
pointers to resources that are normally put into the nodes of a
PlanState tree are left dangling, because a partially built PlanState
tree is not accessible to ExecEndPlan(); QueryDesc.planstate would
remain NULL in such cases. Maybe es_tupleTable and es_relations are
the only things that need to be explicitly released, and the rest is
taken care of by resetting the ExecutorState context.
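
For illustration, the per-node bail-out pattern looks like this in the
attached patch (excerpted from its nodeLimit.c hunk; the comment is
mine):

    /* Initializing the child may take locks when running a cached
     * plan, and those locks may invalidate that very plan. */
    outerPlanState(limitstate) = ExecInitNode(outerPlan, estate, eflags);
    if (!ExecPlanStillValid(estate))
        return limitstate;  /* callers unwind; ExecEndPlan() cleans up */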
On testing, I'm afraid we're going to need something like
src/test/modules/delay_execution to test that concurrent changes to
relation(s) in PlannedStmt.relationOids that occur somewhere between
RevalidateCachedQuery() and InitPlan() result in the latter being
aborted, and that that case is handled correctly. It seems that it is
only the locking of partitions (which are not present in an unplanned
Query and thus not protected by AcquirePlannerLocks()) that can
trigger replanning of a CachedPlan, so any tests we write should
involve partitions; a sketch of what such a module might look like
follows this paragraph. Should the tests try to cover as many plan
shapes as possible, though, given the uncertainty around ExecEndPlan()
robustness, or should manual auditing suffice to be sure that
nothing's broken?
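
Here is a minimal sketch of such a module, assuming we stall in an
ExecutorStart_hook by waiting on an advisory lock; the GUC name
"delay_execution.executor_lock_id" and the hook placement are
inventions of this sketch (the existing delay_execution module waits
in planner_hook instead):

    #include "postgres.h"

    #include <limits.h>

    #include "executor/executor.h"
    #include "fmgr.h"
    #include "utils/fmgrprotos.h"
    #include "utils/guc.h"

    PG_MODULE_MAGIC;

    /* Advisory lock ID to wait on before InitPlan(); 0 disables the delay. */
    static int delay_lock_id = 0;

    static ExecutorStart_hook_type prev_ExecutorStart = NULL;

    static void
    delay_ExecutorStart(QueryDesc *queryDesc, int eflags)
    {
        if (delay_lock_id != 0)
        {
            /*
             * Block until a concurrent session releases the advisory lock,
             * giving it a window to ALTER a partition between plan
             * revalidation and InitPlan().
             */
            (void) DirectFunctionCall1(pg_advisory_lock_int8,
                                       Int64GetDatum((int64) delay_lock_id));
            (void) DirectFunctionCall1(pg_advisory_unlock_int8,
                                       Int64GetDatum((int64) delay_lock_id));
        }

        if (prev_ExecutorStart)
            prev_ExecutorStart(queryDesc, eflags);
        else
            standard_ExecutorStart(queryDesc, eflags);
    }

    void
    _PG_init(void)
    {
        DefineCustomIntVariable("delay_execution.executor_lock_id",
                                "Advisory lock ID to wait on in ExecutorStart.",
                                NULL,
                                &delay_lock_id,
                                0, 0, INT_MAX,
                                PGC_USERSET, 0,
                                NULL, NULL, NULL);

        prev_ExecutorStart = ExecutorStart_hook;
        ExecutorStart_hook = delay_ExecutorStart;
    }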
On possibly needing to move permission checking to occur *after*
taking locks: I realized that we don't really need to, because no
relation whose permissions need checking should still be unlocked by
the time we get to ExecCheckPermissions(); note that we only check
permissions of tables that are present in the original parse tree,
and RevalidateCachedQuery() should have locked those. I found a
couple of exceptions to that invariant: views sometimes appear not to
be in the set of relations that RevalidateCachedQuery() locks. So I
invented PlannedStmt.viewRelations, a list of RT indexes of view RTEs
that is populated in setrefs.c; ExecLockViewRelations(), called before
ExecCheckPermissions(), locks those, as shown in the excerpt below.
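
The locking loop itself is straightforward (excerpted from the
execMain.c hunk of the attached patch):

    foreach(lc, viewRelations)
    {
        Index          rti = lfirst_int(lc);
        RangeTblEntry *rte = exec_rt_fetch(rti, estate);

        Assert(OidIsValid(rte->relid));
        Assert(rte->relkind == RELKIND_VIEW);
        Assert(rte->rellockmode != NoLock);

        if (acquire)
            LockRelationOid(rte->relid, rte->rellockmode);
        else
            UnlockRelationOid(rte->relid, rte->rellockmode);
    }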
Thanks,

--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v32-0001-Move-AcquireExecutorLocks-s-responsibility-into-.patch (application/octet-stream)
From d48cb6fe06f7d3d98adb36299966daff7df25a3b Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Fri, 20 Jan 2023 16:52:31 +0900
Subject: [PATCH v32] Move AcquireExecutorLocks()'s responsibility into the
executor
---
contrib/postgres_fdw/postgres_fdw.c | 4 +
src/backend/commands/copyto.c | 4 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 142 ++++++----
src/backend/commands/extension.c | 1 +
src/backend/commands/matview.c | 2 +-
src/backend/commands/portalcmds.c | 16 +-
src/backend/commands/prepare.c | 29 +-
src/backend/executor/execMain.c | 98 ++++++-
src/backend/executor/execParallel.c | 7 +-
src/backend/executor/execPartition.c | 4 +
src/backend/executor/execProcnode.c | 5 +
src/backend/executor/execUtils.c | 5 +-
src/backend/executor/functions.c | 1 +
src/backend/executor/nodeAgg.c | 2 +
src/backend/executor/nodeAppend.c | 4 +
src/backend/executor/nodeBitmapAnd.c | 2 +
src/backend/executor/nodeBitmapHeapscan.c | 4 +
src/backend/executor/nodeBitmapOr.c | 2 +
src/backend/executor/nodeCustom.c | 2 +
src/backend/executor/nodeForeignscan.c | 4 +
src/backend/executor/nodeGather.c | 2 +
src/backend/executor/nodeGatherMerge.c | 2 +
src/backend/executor/nodeGroup.c | 2 +
src/backend/executor/nodeHash.c | 2 +
src/backend/executor/nodeHashjoin.c | 4 +
src/backend/executor/nodeIncrementalSort.c | 2 +
src/backend/executor/nodeIndexonlyscan.c | 2 +
src/backend/executor/nodeIndexscan.c | 2 +
src/backend/executor/nodeLimit.c | 2 +
src/backend/executor/nodeLockRows.c | 2 +
src/backend/executor/nodeMaterial.c | 2 +
src/backend/executor/nodeMemoize.c | 2 +
src/backend/executor/nodeMergeAppend.c | 4 +
src/backend/executor/nodeMergejoin.c | 4 +
src/backend/executor/nodeModifyTable.c | 7 +
src/backend/executor/nodeNestloop.c | 4 +
src/backend/executor/nodeProjectSet.c | 2 +
src/backend/executor/nodeRecursiveunion.c | 4 +
src/backend/executor/nodeResult.c | 2 +
src/backend/executor/nodeSamplescan.c | 2 +
src/backend/executor/nodeSeqscan.c | 2 +
src/backend/executor/nodeSetOp.c | 2 +
src/backend/executor/nodeSort.c | 2 +
src/backend/executor/nodeSubqueryscan.c | 2 +
src/backend/executor/nodeTidrangescan.c | 2 +
src/backend/executor/nodeTidscan.c | 2 +
src/backend/executor/nodeUnique.c | 2 +
src/backend/executor/nodeWindowAgg.c | 2 +
src/backend/executor/spi.c | 44 +++-
src/backend/nodes/Makefile | 1 +
src/backend/nodes/gen_node_support.pl | 2 +
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 5 +
src/backend/rewrite/rewriteHandler.c | 7 +-
src/backend/storage/lmgr/lmgr.c | 45 ++++
src/backend/tcop/postgres.c | 8 +
src/backend/tcop/pquery.c | 291 +++++++++++----------
src/backend/utils/cache/lsyscache.c | 21 ++
src/backend/utils/cache/plancache.c | 134 +++-------
src/backend/utils/mmgr/portalmem.c | 6 +
src/include/commands/explain.h | 7 +-
src/include/executor/execdesc.h | 5 +
src/include/executor/executor.h | 12 +
src/include/nodes/execnodes.h | 2 +
src/include/nodes/meson.build | 1 +
src/include/nodes/pathnodes.h | 3 +
src/include/nodes/plannodes.h | 3 +
src/include/storage/lmgr.h | 1 +
src/include/utils/lsyscache.h | 1 +
src/include/utils/plancache.h | 15 ++
src/include/utils/portal.h | 4 +
72 files changed, 698 insertions(+), 332 deletions(-)
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index f5926ab89d..93f3f8b5d1 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2659,7 +2659,11 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
/* Get info about foreign table. */
rtindex = node->resultRelInfo->ri_RangeTableIndex;
if (fsplan->scan.scanrelid == 0)
+ {
dmstate->rel = ExecOpenScanRelation(estate, rtindex, eflags);
+ if (!ExecPlanStillValid(estate))
+ return;
+ }
else
dmstate->rel = node->ss.ss_currentRelation;
table = GetForeignTable(RelationGetRelid(dmstate->rel));
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 8043b4e9b1..a438c547e8 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -558,7 +558,8 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL,
+ pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
@@ -569,6 +570,7 @@ BeginCopyTo(ParseState *pstate,
* ExecutorStart computes a result tupdesc for us
*/
ExecutorStart(cstate->queryDesc, 0);
+ Assert(cstate->queryDesc->plan_valid);
tupDesc = cstate->queryDesc->tupDesc;
}
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index d6c6d514f3..a55b851574 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index fbbf28cf06..8fdc966a73 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -384,6 +384,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
else
{
PlannedStmt *plan;
+ QueryDesc *queryDesc;
instr_time planstart,
planduration;
BufferUsage bufusage_start,
@@ -406,12 +407,89 @@ ExplainOneQuery(Query *query, int cursorOptions,
BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
}
+ queryDesc = ExplainQueryDesc(plan, NULL, queryString, into, es,
+ params, queryEnv);
+ Assert(queryDesc);
+
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(queryDesc, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
+/*
+ * ExplainQueryDesc
+ * Set up QueryDesc for EXPLAINing a given plan
+ *
+ * This returns NULL if cplan is found to have been invalidated since its
+ * creation.
+ */
+QueryDesc *
+ExplainQueryDesc(PlannedStmt *stmt, CachedPlan *cplan,
+ const char *queryString, IntoClause *into, ExplainState *es,
+ ParamListInfo params, QueryEnvironment *queryEnv)
+{
+ QueryDesc *queryDesc;
+ DestReceiver *dest;
+ int eflags;
+ int instrument_option = 0;
+
+ /*
+ * Normally we discard the query's output, but if explaining CREATE TABLE
+ * AS, we'd better use the appropriate tuple receiver.
+ */
+ if (into)
+ dest = CreateIntoRelDestReceiver(into);
+ else
+ dest = None_Receiver;
+
+ if (es->analyze && es->timing)
+ instrument_option |= INSTRUMENT_TIMER;
+ else if (es->analyze)
+ instrument_option |= INSTRUMENT_ROWS;
+
+ if (es->buffers)
+ instrument_option |= INSTRUMENT_BUFFERS;
+ if (es->wal)
+ instrument_option |= INSTRUMENT_WAL;
+
+ /*
+ * Use a snapshot with an updated command ID to ensure this query sees
+ * results of any previously executed queries.
+ */
+ PushCopiedSnapshot(GetActiveSnapshot());
+ UpdateActiveSnapshotCommandId();
+
+ /* Create a QueryDesc for the query */
+ queryDesc = CreateQueryDesc(stmt, cplan, queryString,
+ GetActiveSnapshot(), InvalidSnapshot,
+ dest, params, queryEnv, instrument_option);
+
+ /* Select execution options */
+ if (es->analyze)
+ eflags = 0; /* default run-to-completion flags */
+ else
+ eflags = EXEC_FLAG_EXPLAIN_ONLY;
+ if (into)
+ eflags |= GetIntoRelEFlags(into);
+
+ /*
+ * Call ExecutorStart to prepare the plan for execution. A cached plan
+ * may get invalidated as we're doing that.
+ */
+ ExecutorStart(queryDesc, eflags);
+ if (!queryDesc->plan_valid)
+ {
+ /* Clean up. */
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+ PopActiveSnapshot();
+ return NULL;
+ }
+
+ return queryDesc;
+}
+
/*
* ExplainOneUtility -
* print out the execution plan for one utility statement
@@ -515,29 +593,16 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(QueryDesc *queryDesc,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
{
- DestReceiver *dest;
- QueryDesc *queryDesc;
instr_time starttime;
double totaltime = 0;
- int eflags;
- int instrument_option = 0;
-
- Assert(plannedstmt->commandType != CMD_UTILITY);
- if (es->analyze && es->timing)
- instrument_option |= INSTRUMENT_TIMER;
- else if (es->analyze)
- instrument_option |= INSTRUMENT_ROWS;
-
- if (es->buffers)
- instrument_option |= INSTRUMENT_BUFFERS;
- if (es->wal)
- instrument_option |= INSTRUMENT_WAL;
+ Assert(queryDesc->plannedstmt->commandType != CMD_UTILITY);
/*
* We always collect timing for the entire statement, even when node-level
@@ -546,38 +611,6 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
*/
INSTR_TIME_SET_CURRENT(starttime);
- /*
- * Use a snapshot with an updated command ID to ensure this query sees
- * results of any previously executed queries.
- */
- PushCopiedSnapshot(GetActiveSnapshot());
- UpdateActiveSnapshotCommandId();
-
- /*
- * Normally we discard the query's output, but if explaining CREATE TABLE
- * AS, we'd better use the appropriate tuple receiver.
- */
- if (into)
- dest = CreateIntoRelDestReceiver(into);
- else
- dest = None_Receiver;
-
- /* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, instrument_option);
-
- /* Select execution options */
- if (es->analyze)
- eflags = 0; /* default run-to-completion flags */
- else
- eflags = EXEC_FLAG_EXPLAIN_ONLY;
- if (into)
- eflags |= GetIntoRelEFlags(into);
-
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, eflags);
-
/* Execute the plan for statistics if asked for */
if (es->analyze)
{
@@ -4851,6 +4884,17 @@ ExplainDummyGroup(const char *objtype, const char *labelname, ExplainState *es)
}
}
+/*
+ * Discard output buffer for a fresh restart.
+ */
+void
+ExplainResetOutput(ExplainState *es)
+{
+ Assert(es->str);
+ resetStringInfo(es->str);
+ ExplainBeginOutput(es);
+}
+
/*
* Emit the start-of-output boilerplate.
*
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index b1509cc505..e2f79cc7a7 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -780,6 +780,7 @@ execute_sql_string(const char *sql)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index fb30d2595c..17d457ccfb 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -409,7 +409,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 8a3cf98cce..3c34ab4351 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -146,6 +146,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
PortalStart(portal, params, 0, GetActiveSnapshot());
Assert(portal->strategy == PORTAL_ONE_SELECT);
+ Assert(portal->plan_valid);
/*
* We're done; the query won't actually be run until PerformPortalFetch is
@@ -249,6 +250,17 @@ PerformPortalClose(const char *name)
PortalDrop(portal, false);
}
+/*
+ * Release a portal's QueryDesc.
+ */
+void
+PortalQueryFinish(QueryDesc *queryDesc)
+{
+ ExecutorFinish(queryDesc);
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+}
+
/*
* PortalCleanup
*
@@ -295,9 +307,7 @@ PortalCleanup(Portal portal)
if (portal->resowner)
CurrentResourceOwner = portal->resowner;
- ExecutorFinish(queryDesc);
- ExecutorEnd(queryDesc);
- FreeQueryDesc(queryDesc);
+ PortalQueryFinish(queryDesc);
CurrentResourceOwner = saveResourceOwner;
}
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 18f70319fc..6c72b46f07 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -183,6 +183,7 @@ ExecuteQuery(ParseState *pstate,
paramLI = EvaluateParams(pstate, entry, stmt->params, estate);
}
+replan:
/* Create a new portal to run the query in */
portal = CreateNewPortal();
/* Don't display the portal in pg_cursors, it is for internal use only */
@@ -251,10 +252,17 @@ ExecuteQuery(ParseState *pstate,
}
/*
- * Run the portal as appropriate.
+ * Run the portal as appropriate. If the portal turns out to contain an
+ * invalidated cached plan, recreate it and retry; see the replan label.
*/
PortalStart(portal, paramLI, eflags, GetActiveSnapshot());
+ if (!portal->plan_valid)
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
+
(void) PortalRun(portal, count, false, true, dest, dest, qc);
PortalDrop(portal, false);
@@ -574,7 +582,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
{
PreparedStatement *entry;
const char *query_string;
- CachedPlan *cplan;
+ CachedPlan *cplan = NULL;
List *plan_list;
ListCell *p;
ParamListInfo paramLI = NULL;
@@ -618,6 +626,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
+replan:
cplan = GetCachedPlan(entry->plansource, paramLI,
CurrentResourceOwner, queryEnv);
@@ -639,8 +648,20 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ {
+ QueryDesc *queryDesc;
+
+ queryDesc = ExplainQueryDesc(pstmt, cplan, queryString,
+ into, es, paramLI, queryEnv);
+ if (queryDesc == NULL)
+ {
+ ExplainResetOutput(es);
+ goto replan;
+ }
+ ExplainOnePlan(queryDesc, into, es, query_string, paramLI,
+ queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
+ }
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index a5115b9c1f..47bc6a1f3a 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -119,6 +119,11 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
*
* eflags contains flag bits as described in executor.h.
*
+ * If queryDesc->cplan is set, queryDesc->plan_valid is set to false on
+ * return if the CachedPlan is found to have been invalidated while taking
+ * locks. In that case, callers must recreate the CachedPlan before
+ * retrying the execution.
+ *
* NB: the CurrentMemoryContext when this is called will become the parent
* of the per-query context used for this Executor invocation.
*
@@ -131,6 +136,10 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
void
ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ /* Take locks if the plan tree comes from a CachedPlan. */
+ if (queryDesc->cplan)
+ eflags |= EXEC_FLAG_GET_LOCKS;
+
/*
* In some cases (e.g. an EXECUTE statement) a query execution will skip
* parse analysis, which means that the query_id won't be reported. Note
@@ -582,6 +591,16 @@ ExecCheckPermissions(List *rangeTable, List *rteperminfos,
RTEPermissionInfo *perminfo = lfirst_node(RTEPermissionInfo, l);
Assert(OidIsValid(perminfo->relid));
+
+ /*
+ * Relations whose permissions need to be checked must already have
+ * been locked by the parser or by AcquirePlannerLocks() if a
+ * cached plan is being executed.
+ * XXX shouldn't we skip calling ExecCheckPermissions from InitPlan
+ * in a parallel worker?
+ */
+ Assert(CheckRelLockedByMe(perminfo->relid, AccessShareLock, true) ||
+ IsParallelWorker());
result = ExecCheckOneRelPerms(perminfo);
if (!result)
{
@@ -785,12 +804,43 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
PreventCommandIfParallelMode(CreateCommandName((Node *) plannedstmt));
}
+/*
+ * Lock view relations in a given query's range table.
+ */
+static void
+ExecLockViewRelations(List *viewRelations, EState *estate, bool acquire)
+{
+ ListCell *lc;
+
+ foreach(lc, viewRelations)
+ {
+ Index rti = lfirst_int(lc);
+ RangeTblEntry *rte = exec_rt_fetch(rti, estate);
+
+ Assert(OidIsValid(rte->relid));
+ Assert(rte->relkind == RELKIND_VIEW);
+ Assert(rte->rellockmode != NoLock);
+
+ if (acquire)
+ LockRelationOid(rte->relid, rte->rellockmode);
+ else
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+}
/* ----------------------------------------------------------------
* InitPlan
*
* Initializes the query plan: open files, allocate storage
* and start up the rule manager
+ *
+ * If queryDesc contains a CachedPlan, this takes locks on relations.
+ * If any of those relations have undergone concurrent schema changes
+ * between successfully performing RevalidateCachedQuery() on the
+ * containing CachedPlanSource and here, locking those relations would
+ * invalidate the CachedPlan by way of PlanCacheRelCallback(). In that
+ * case, queryDesc->plan_valid would be set to false to tell the caller
+ * to retry after creating a new CachedPlan.
* ----------------------------------------------------------------
*/
static void
@@ -807,17 +857,21 @@ InitPlan(QueryDesc *queryDesc, int eflags)
int i;
/*
- * Do permissions checks and save the list for later use.
+ * initialize the node's execution state
*/
- ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
- estate->es_rteperminfos = plannedstmt->permInfos;
+ ExecInitRangeTable(estate, rangeTable);
+
+ if (eflags & EXEC_FLAG_GET_LOCKS)
+ ExecLockViewRelations(plannedstmt->viewRelations, estate, true);
/*
- * initialize the node's execution state
+ * Do permissions checks and save the list for later use.
*/
- ExecInitRangeTable(estate, rangeTable);
+ ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
+ estate->es_rteperminfos = plannedstmt->permInfos;
estate->es_plannedstmt = plannedstmt;
+ estate->es_cachedplan = queryDesc->cplan;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
/*
@@ -850,6 +904,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
case ROW_MARK_KEYSHARE:
case ROW_MARK_REFERENCE:
relation = ExecGetRangeTableRelation(estate, rc->rti);
+ if (!ExecPlanStillValid(estate))
+ goto failed;
break;
case ROW_MARK_COPY:
/* no physical table access is required */
@@ -917,6 +973,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
sp_eflags |= EXEC_FLAG_REWIND;
subplanstate = ExecInitNode(subplan, estate, sp_eflags);
+ if (!ExecPlanStillValid(estate))
+ goto failed;
estate->es_subplanstates = lappend(estate->es_subplanstates,
subplanstate);
@@ -930,6 +988,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
* processing tuples.
*/
planstate = ExecInitNode(plan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ goto failed;
/*
* Get the tuple descriptor describing the type of tuples to return.
@@ -973,6 +1033,19 @@ InitPlan(QueryDesc *queryDesc, int eflags)
queryDesc->tupDesc = tupType;
queryDesc->planstate = planstate;
+ queryDesc->plan_valid = true;
+ return;
+
+failed:
+ /*
+ * Plan initialization failed. Mark QueryDesc as such and release useless
+ * locks.
+ */
+ queryDesc->plan_valid = false;
+ if (eflags & EXEC_FLAG_GET_LOCKS)
+ ExecLockViewRelations(plannedstmt->viewRelations, estate, false);
+ /* Also ask ExecCloseRangeTableRelations() to release locks. */
+ estate->es_top_eflags |= EXEC_FLAG_REL_LOCKS;
}
/*
@@ -1389,7 +1462,7 @@ ExecGetAncestorResultRels(EState *estate, ResultRelInfo *resultRelInfo)
/*
* All ancestors up to the root target relation must have been
- * locked by the planner or AcquireExecutorLocks().
+ * locked.
*/
ancRel = table_open(ancOid, NoLock);
rInfo = makeNode(ResultRelInfo);
@@ -1558,7 +1631,8 @@ ExecCloseResultRelations(EState *estate)
/*
* Close all relations opened by ExecGetRangeTableRelation().
*
- * We do not release any locks we might hold on those rels.
+ * We do not release any locks we might hold on those rels, unless
+ * the caller asked otherwise.
*/
void
ExecCloseRangeTableRelations(EState *estate)
@@ -1567,8 +1641,12 @@ ExecCloseRangeTableRelations(EState *estate)
for (i = 0; i < estate->es_range_table_size; i++)
{
+ int lockmode = NoLock;
+
+ if (estate->es_top_eflags & EXEC_FLAG_REL_LOCKS)
+ lockmode = exec_rt_fetch(i+1, estate)->rellockmode;
if (estate->es_relations[i])
- table_close(estate->es_relations[i], NoLock);
+ table_close(estate->es_relations[i], lockmode);
}
}
@@ -2797,7 +2875,8 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
* Child EPQ EStates share the parent's copy of unchanging state such as
* the snapshot, rangetable, and external Param info. They need their own
* copies of local state, including a tuple table, es_param_exec_vals,
- * result-rel info, etc.
+ * result-rel info, etc. Also, we don't pass the parent's copy of the
+ * CachedPlan, because no new locks will be taken for EvalPlanQual().
*/
rcestate->es_direction = ForwardScanDirection;
rcestate->es_snapshot = parentestate->es_snapshot;
@@ -2883,6 +2962,7 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
PlanState *subplanstate;
subplanstate = ExecInitNode(subplan, rcestate, 0);
+ Assert(ExecPlanStillValid(rcestate));
rcestate->es_subplanstates = lappend(rcestate->es_subplanstates,
subplanstate);
}
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index aa3f283453..fe1d173501 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1249,8 +1249,13 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(¶mspace);
- /* Create a QueryDesc for the query. */
+ /*
+ * Create a QueryDesc for the query. Note that no CachedPlan is available
+ * here even if the containing plan tree may have come from one in the
+ * leader.
+ */
return CreateQueryDesc(pstmt,
+ NULL,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 651ad24fc1..a1bb1ac50f 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1813,6 +1813,8 @@ ExecInitPartitionPruning(PlanState *planstate,
/* Create the working data structure for pruning */
prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Perform an initial partition prune pass, if required.
@@ -1939,6 +1941,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* duration of this executor run.
*/
partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
partkey = RelationGetPartitionKey(partrel);
partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
partrel);
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 4d288bc8d4..bfc4b6f81c 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -388,6 +388,9 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
break;
}
+ if (!ExecPlanStillValid(estate))
+ return result;
+
ExecSetExecProcNode(result, result->ExecProcNode);
/*
@@ -402,6 +405,8 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
Assert(IsA(subplan, SubPlan));
sstate = ExecInitSubPlan(subplan, result);
+ if (!ExecPlanStillValid(estate))
+ return result;
subps = lappend(subps, sstate);
}
result->initPlan = subps;
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index c33a3c0bec..d5bd268514 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -800,7 +800,8 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rte->rtekind == RTE_RELATION);
- if (!IsParallelWorker())
+ if (!IsParallelWorker() &&
+ (estate->es_top_eflags & EXEC_FLAG_GET_LOCKS) == 0)
{
/*
* In a normal query, we should already have the appropriate lock,
@@ -844,6 +845,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
Relation resultRelationDesc;
resultRelationDesc = ExecGetRangeTableRelation(estate, rti);
+ if (!ExecPlanStillValid(estate))
+ return;
InitResultRelInfo(resultRelInfo,
resultRelationDesc,
rti,
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 50e06ec693..949bdfc837 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -843,6 +843,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest = None_Receiver;
es->qd = CreateQueryDesc(es->stmt,
+ NULL, /* fmgr_sql() doesn't use CachedPlans */
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 20d23696a5..94b7d08c93 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -3295,6 +3295,8 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
eflags &= ~EXEC_FLAG_REWIND;
outerPlan = outerPlan(node);
outerPlanState(aggstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return aggstate;
/*
* initialize source tuple type.
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index cb25499b3f..2e0bfbe85a 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -148,6 +148,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
node->part_prune_index,
node->apprelids,
&validsubplans);
+ if (!ExecPlanStillValid(estate))
+ return appendstate;
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
@@ -218,6 +220,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
firstvalid = j;
appendplanstates[j++] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return appendstate;
}
appendstate->as_first_partial_plan = firstvalid;
diff --git a/src/backend/executor/nodeBitmapAnd.c b/src/backend/executor/nodeBitmapAnd.c
index 4c5eb2b23b..6b559bae2b 100644
--- a/src/backend/executor/nodeBitmapAnd.c
+++ b/src/backend/executor/nodeBitmapAnd.c
@@ -89,6 +89,8 @@ ExecInitBitmapAnd(BitmapAnd *node, EState *estate, int eflags)
{
initNode = (Plan *) lfirst(l);
bitmapplanstates[i] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return bitmapandstate;
i++;
}
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index f35df0b8bf..a545018701 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -763,11 +763,15 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return scanstate;
/*
* initialize child nodes
*/
outerPlanState(scanstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return scanstate;
/*
* get the scan type from the relation descriptor.
diff --git a/src/backend/executor/nodeBitmapOr.c b/src/backend/executor/nodeBitmapOr.c
index 0bf8af9652..87eb5dd5d3 100644
--- a/src/backend/executor/nodeBitmapOr.c
+++ b/src/backend/executor/nodeBitmapOr.c
@@ -90,6 +90,8 @@ ExecInitBitmapOr(BitmapOr *node, EState *estate, int eflags)
{
initNode = (Plan *) lfirst(l);
bitmapplanstates[i] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return bitmaporstate;
i++;
}
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
index bd42c65b29..efb94f9c59 100644
--- a/src/backend/executor/nodeCustom.c
+++ b/src/backend/executor/nodeCustom.c
@@ -61,6 +61,8 @@ ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
if (scanrelid > 0)
{
scan_rel = ExecOpenScanRelation(estate, scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return css;
css->ss.ss_currentRelation = scan_rel;
}
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index c2139acca0..c9a072e911 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -173,6 +173,8 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
if (scanrelid > 0)
{
currentRelation = ExecOpenScanRelation(estate, scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return scanstate;
scanstate->ss.ss_currentRelation = currentRelation;
fdwroutine = GetFdwRoutineForRelation(currentRelation, true);
}
@@ -264,6 +266,8 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
if (outerPlan(node))
outerPlanState(scanstate) =
ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return scanstate;
/*
* Tell the FDW to initialize the scan.
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 307fc10eea..365d3af3e4 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -89,6 +89,8 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
*/
outerNode = outerPlan(node);
outerPlanState(gatherstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return gatherstate;
tupDesc = ExecGetResultType(outerPlanState(gatherstate));
/*
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 9d5e1a46e9..8d2809f079 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -108,6 +108,8 @@ ExecInitGatherMerge(GatherMerge *node, EState *estate, int eflags)
*/
outerNode = outerPlan(node);
outerPlanState(gm_state) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return gm_state;
/*
* Leader may access ExecProcNode result directly (if
diff --git a/src/backend/executor/nodeGroup.c b/src/backend/executor/nodeGroup.c
index 25a1618952..fa6dad3939 100644
--- a/src/backend/executor/nodeGroup.c
+++ b/src/backend/executor/nodeGroup.c
@@ -185,6 +185,8 @@ ExecInitGroup(Group *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(grpstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return grpstate;
/*
* Initialize scan slot and type.
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index eceee99374..6afc04edf1 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -379,6 +379,8 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(hashstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return hashstate;
/*
* initialize our result slot and type. No need to build projection
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index b215e3f59a..0e2f931efa 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -659,8 +659,12 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
hashNode = (Hash *) innerPlan(node);
outerPlanState(hjstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return hjstate;
outerDesc = ExecGetResultType(outerPlanState(hjstate));
innerPlanState(hjstate) = ExecInitNode((Plan *) hashNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return hjstate;
innerDesc = ExecGetResultType(innerPlanState(hjstate));
/*
diff --git a/src/backend/executor/nodeIncrementalSort.c b/src/backend/executor/nodeIncrementalSort.c
index 12bc22f33c..093c33d8ca 100644
--- a/src/backend/executor/nodeIncrementalSort.c
+++ b/src/backend/executor/nodeIncrementalSort.c
@@ -1041,6 +1041,8 @@ ExecInitIncrementalSort(IncrementalSort *node, EState *estate, int eflags)
* nodes may be able to do something more useful.
*/
outerPlanState(incrsortstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return incrsortstate;
/*
* Initialize scan slot and type.
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 0b43a9b969..a37a48c94a 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -512,6 +512,8 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return indexstate;
indexstate->ss.ss_currentRelation = currentRelation;
indexstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 4540c7781d..00dcb8424f 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -925,6 +925,8 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return indexstate;
indexstate->ss.ss_currentRelation = currentRelation;
indexstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index 425fbfc405..2fcbde74ed 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -476,6 +476,8 @@ ExecInitLimit(Limit *node, EState *estate, int eflags)
*/
outerPlan = outerPlan(node);
outerPlanState(limitstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return limitstate;
/*
* initialize child expressions
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 407414fc0c..3a8aa2b5a4 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -323,6 +323,8 @@ ExecInitLockRows(LockRows *node, EState *estate, int eflags)
* then initialize outer plan
*/
outerPlanState(lrstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return lrstate;
/* node returns unmodified slots from the outer plan */
lrstate->ps.resultopsset = true;
diff --git a/src/backend/executor/nodeMaterial.c b/src/backend/executor/nodeMaterial.c
index 09632678b0..09982fd38c 100644
--- a/src/backend/executor/nodeMaterial.c
+++ b/src/backend/executor/nodeMaterial.c
@@ -214,6 +214,8 @@ ExecInitMaterial(Material *node, EState *estate, int eflags)
outerPlan = outerPlan(node);
outerPlanState(matstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return matstate;
/*
* Initialize result type and slot. No need to initialize projection info
diff --git a/src/backend/executor/nodeMemoize.c b/src/backend/executor/nodeMemoize.c
index 74f7d21bc8..ad7a1f6fe0 100644
--- a/src/backend/executor/nodeMemoize.c
+++ b/src/backend/executor/nodeMemoize.c
@@ -931,6 +931,8 @@ ExecInitMemoize(Memoize *node, EState *estate, int eflags)
outerNode = outerPlan(node);
outerPlanState(mstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return mstate;
/*
* Initialize return slot and type. No need to initialize projection info
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 399b39c598..c3fdddecc5 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -96,6 +96,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
node->part_prune_index,
node->apprelids,
&validsubplans);
+ if (!ExecPlanStillValid(estate))
+ return mergestate;
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
@@ -152,6 +154,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
Plan *initNode = (Plan *) list_nth(node->mergeplans, i);
mergeplanstates[j++] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return mergestate;
}
mergestate->ps.ps_ProjInfo = NULL;
diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 809aa215c6..489c651a25 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -1482,11 +1482,15 @@ ExecInitMergeJoin(MergeJoin *node, EState *estate, int eflags)
mergestate->mj_SkipMarkRestore = node->skip_mark_restore;
outerPlanState(mergestate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return mergestate;
outerDesc = ExecGetResultType(outerPlanState(mergestate));
innerPlanState(mergestate) = ExecInitNode(innerPlan(node), estate,
mergestate->mj_SkipMarkRestore ?
eflags :
(eflags | EXEC_FLAG_MARK));
+ if (!ExecPlanStillValid(estate))
+ return mergestate;
innerDesc = ExecGetResultType(innerPlanState(mergestate));
/*
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 1ac65172e4..27dda57c3d 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -4010,6 +4010,9 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
linitial_int(node->resultRelations));
}
+ if (!ExecPlanStillValid(estate))
+ return mtstate;
+
/* set up epqstate with dummy subplan data for the moment */
EvalPlanQualInit(&mtstate->mt_epqstate, estate, NULL, NIL, node->epqParam);
mtstate->fireBSTriggers = true;
@@ -4036,6 +4039,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
if (resultRelInfo != mtstate->rootResultRelInfo)
{
ExecInitResultRelation(estate, resultRelInfo, resultRelation);
+ if (!ExecPlanStillValid(estate))
+ return mtstate;
/*
* For child result relations, store the root result relation
@@ -4063,6 +4068,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* Now we may initialize the subplan.
*/
outerPlanState(mtstate) = ExecInitNode(subplan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return mtstate;
/*
* Do additional per-result-relation initialization.
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index b3d52e69ec..299f6b3a57 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -295,11 +295,15 @@ ExecInitNestLoop(NestLoop *node, EState *estate, int eflags)
* values.
*/
outerPlanState(nlstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return nlstate;
if (node->nestParams == NIL)
eflags |= EXEC_FLAG_REWIND;
else
eflags &= ~EXEC_FLAG_REWIND;
innerPlanState(nlstate) = ExecInitNode(innerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return nlstate;
/*
* Initialize result slot, type and projection.
diff --git a/src/backend/executor/nodeProjectSet.c b/src/backend/executor/nodeProjectSet.c
index f6ff3dc44c..b85ba2cf23 100644
--- a/src/backend/executor/nodeProjectSet.c
+++ b/src/backend/executor/nodeProjectSet.c
@@ -247,6 +247,8 @@ ExecInitProjectSet(ProjectSet *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(state) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return state;
/*
* we don't use inner plan
diff --git a/src/backend/executor/nodeRecursiveunion.c b/src/backend/executor/nodeRecursiveunion.c
index e781003934..967fe4f287 100644
--- a/src/backend/executor/nodeRecursiveunion.c
+++ b/src/backend/executor/nodeRecursiveunion.c
@@ -244,7 +244,11 @@ ExecInitRecursiveUnion(RecursiveUnion *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(rustate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return rustate;
innerPlanState(rustate) = ExecInitNode(innerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return rustate;
/*
* If hashing, precompute fmgr lookup data for inner loop, and create the
diff --git a/src/backend/executor/nodeResult.c b/src/backend/executor/nodeResult.c
index 4219712d30..a79d407fa8 100644
--- a/src/backend/executor/nodeResult.c
+++ b/src/backend/executor/nodeResult.c
@@ -208,6 +208,8 @@ ExecInitResult(Result *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(resstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return resstate;
/*
* we don't use inner plan
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index d7e22b1dbb..31a6148977 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -125,6 +125,8 @@ ExecInitSampleScan(SampleScan *node, EState *estate, int eflags)
ExecOpenScanRelation(estate,
node->scan.scanrelid,
eflags);
+ if (!ExecPlanStillValid(estate))
+ return scanstate;
/* we won't set up the HeapScanDesc till later */
scanstate->ss.ss_currentScanDesc = NULL;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 4da0f28f7b..88fe4d40d5 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -153,6 +153,8 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
ExecOpenScanRelation(estate,
node->scan.scanrelid,
eflags);
+ if (!ExecPlanStillValid(estate))
+ return scanstate;
/* and create slot with the appropriate rowtype */
ExecInitScanTupleSlot(estate, &scanstate->ss,
diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index 4bc2406b89..697dc699a5 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -528,6 +528,8 @@ ExecInitSetOp(SetOp *node, EState *estate, int eflags)
if (node->strategy == SETOP_HASHED)
eflags &= ~EXEC_FLAG_REWIND;
outerPlanState(setopstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return setopstate;
outerDesc = ExecGetResultType(outerPlanState(setopstate));
/*
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index c6c72c6e67..c8ed534f29 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -263,6 +263,8 @@ ExecInitSort(Sort *node, EState *estate, int eflags)
eflags &= ~(EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK);
outerPlanState(sortstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return sortstate;
/*
* Initialize scan slot and type.
diff --git a/src/backend/executor/nodeSubqueryscan.c b/src/backend/executor/nodeSubqueryscan.c
index 42471bfc04..3bb8bbbb84 100644
--- a/src/backend/executor/nodeSubqueryscan.c
+++ b/src/backend/executor/nodeSubqueryscan.c
@@ -124,6 +124,8 @@ ExecInitSubqueryScan(SubqueryScan *node, EState *estate, int eflags)
* initialize subquery
*/
subquerystate->subplan = ExecInitNode(node->subplan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return subquerystate;
/*
* Initialize scan slot and type (needed by ExecAssignScanProjectionInfo)
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 2124c55ef5..c528a63c38 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -386,6 +386,8 @@ ExecInitTidRangeScan(TidRangeScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return tidrangestate;
tidrangestate->ss.ss_currentRelation = currentRelation;
tidrangestate->ss.ss_currentScanDesc = NULL; /* no table scan here */
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index fe6a964ee1..a8e449e70a 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -529,6 +529,8 @@ ExecInitTidScan(TidScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return tidstate;
tidstate->ss.ss_currentRelation = currentRelation;
tidstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
diff --git a/src/backend/executor/nodeUnique.c b/src/backend/executor/nodeUnique.c
index 45035d74fa..6b183d7324 100644
--- a/src/backend/executor/nodeUnique.c
+++ b/src/backend/executor/nodeUnique.c
@@ -136,6 +136,8 @@ ExecInitUnique(Unique *node, EState *estate, int eflags)
* then initialize outer plan
*/
outerPlanState(uniquestate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return uniquestate;
/*
* Initialize result slot and type. Unique nodes do no projections, so
diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index d61d57e9a8..239ad14dfc 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -2450,6 +2450,8 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags)
*/
outerPlan = outerPlan(node);
outerPlanState(winstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return winstate;
/*
* initialize source tuple type (which is also the tuple type that we'll
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 61f03e3999..38d76c6719 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -71,7 +71,7 @@ static int _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
static ParamListInfo _SPI_convert_params(int nargs, Oid *argtypes,
Datum *Values, const char *Nulls);
-static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount);
+static int _SPI_pquery(QueryDesc *queryDesc, uint64 tcount);
static void _SPI_error_callback(void *arg);
@@ -1623,6 +1623,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
_SPI_current->processed = 0;
_SPI_current->tuptable = NULL;
+replan:
/* Create the portal */
if (name == NULL || name[0] == '\0')
{
@@ -1766,7 +1767,8 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
}
/*
- * Start portal execution.
+ * Start portal execution. If the portal contains a cached plan, it must
+ * be recreated if portal->plan_valid comes back false.
*/
PortalStart(portal, paramLI, 0, snapshot);
@@ -1775,6 +1777,12 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
/* Pop the error context stack */
error_context_stack = spierrcontext.previous;
+ if (!portal->plan_valid)
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
+
/* Pop the SPI stack */
_SPI_end_call(true);
@@ -2548,6 +2556,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
+replan:
cplan = GetCachedPlan(plansource, options->params,
plan_owner, _SPI_current->queryEnv);
@@ -2657,6 +2666,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
QueryDesc *qdesc;
Snapshot snap;
+ int eflags;
if (ActiveSnapshotSet())
snap = GetActiveSnapshot();
@@ -2664,14 +2674,29 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
snap = InvalidSnapshot;
qdesc = CreateQueryDesc(stmt,
+ cplan,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
options->params,
_SPI_current->queryEnv,
0);
- res = _SPI_pquery(qdesc, fire_triggers,
- canSetTag ? options->tcount : 0);
+
+ /* Select execution options */
+ if (fire_triggers)
+ eflags = 0; /* default run-to-completion flags */
+ else
+ eflags = EXEC_FLAG_SKIP_TRIGGERS;
+ ExecutorStart(qdesc, eflags);
+ if (!qdesc->plan_valid)
+ {
+ ExecutorFinish(qdesc);
+ ExecutorEnd(qdesc);
+ FreeQueryDesc(qdesc);
+ goto replan;
+ }
+
+ res = _SPI_pquery(qdesc, canSetTag ? options->tcount : 0);
FreeQueryDesc(qdesc);
}
else
@@ -2846,10 +2871,9 @@ _SPI_convert_params(int nargs, Oid *argtypes,
}
static int
-_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
+_SPI_pquery(QueryDesc *queryDesc, uint64 tcount)
{
int operation = queryDesc->operation;
- int eflags;
int res;
switch (operation)
@@ -2893,14 +2917,6 @@ _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
ResetUsage();
#endif
- /* Select execution options */
- if (fire_triggers)
- eflags = 0; /* default run-to-completion flags */
- else
- eflags = EXEC_FLAG_SKIP_TRIGGERS;
-
- ExecutorStart(queryDesc, eflags);
-
ExecutorRun(queryDesc, ForwardScanDirection, tcount, true);
_SPI_current->processed = queryDesc->estate->es_processed;
diff --git a/src/backend/nodes/Makefile b/src/backend/nodes/Makefile
index af12c64878..7fb0d2d202 100644
--- a/src/backend/nodes/Makefile
+++ b/src/backend/nodes/Makefile
@@ -52,6 +52,7 @@ node_headers = \
access/tsmapi.h \
commands/event_trigger.h \
commands/trigger.h \
+ executor/execdesc.h \
executor/tuptable.h \
foreign/fdwapi.h \
nodes/bitmapset.h \
diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl
index 19ed29657c..69e60206ba 100644
--- a/src/backend/nodes/gen_node_support.pl
+++ b/src/backend/nodes/gen_node_support.pl
@@ -63,6 +63,7 @@ my @all_input_files = qw(
access/tsmapi.h
commands/event_trigger.h
commands/trigger.h
+ executor/execdesc.h
executor/tuptable.h
foreign/fdwapi.h
nodes/bitmapset.h
@@ -87,6 +88,7 @@ my @nodetag_only_files = qw(
access/tsmapi.h
commands/event_trigger.h
commands/trigger.h
+ executor/execdesc.h
executor/tuptable.h
foreign/fdwapi.h
nodes/lockoptions.h
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index db5ff6fdca..670eba3a3a 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -527,6 +527,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->partPruneInfos = glob->partPruneInfos;
result->rtable = glob->finalrtable;
result->permInfos = glob->finalrteperminfos;
+ result->viewRelations = glob->viewRelations;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 186fc8014b..454e30e0ca 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -16,6 +16,7 @@
#include "postgres.h"
#include "access/transam.h"
+#include "catalog/pg_class.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -599,6 +600,10 @@ add_rte_to_flat_rtable(PlannerGlobal *glob, List *rteperminfos,
(newrte->rtekind == RTE_SUBQUERY && OidIsValid(newrte->relid)))
glob->relationOids = lappend_oid(glob->relationOids, newrte->relid);
+ if (newrte->relkind == RELKIND_VIEW)
+ glob->viewRelations = lappend_int(glob->viewRelations,
+ list_length(glob->finalrtable));
+
/*
* Add a copy of the RTEPermissionInfo, if any, corresponding to this RTE
* to the flattened global list.
diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index c74bac20b1..29d13e95db 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -1834,11 +1834,10 @@ ApplyRetrieveRule(Query *parsetree,
/*
* Clear fields that should not be set in a subquery RTE. Note that we
- * leave the relid, rellockmode, and perminfoindex fields set, so that the
- * view relation can be appropriately locked before execution and its
- * permissions checked.
+ * leave the relid, relkind, rellockmode, and perminfoindex fields set,
+ * so that the view relation can be appropriately locked before execution
+ * and its permissions checked.
*/
- rte->relkind = 0;
rte->tablesample = NULL;
rte->inh = false; /* must not be set for a subquery */
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index ee9b89a672..c807e9cdcc 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -27,6 +27,7 @@
#include "storage/procarray.h"
#include "storage/sinvaladt.h"
#include "utils/inval.h"
+#include "utils/lsyscache.h"
/*
@@ -364,6 +365,50 @@ CheckRelationLockedByMe(Relation relation, LOCKMODE lockmode, bool orstronger)
return false;
}
+/*
+ * CheckRelLockedByMe
+ *
+ * Returns true if current transaction holds a lock on the given relation of
+ * mode 'lockmode'. If 'orstronger' is true, a stronger lockmode is also OK.
+ * ("Stronger" is defined as "numerically higher", which is a bit
+ * semantically dubious but is OK for the purposes we use this for.)
+ */
+bool
+CheckRelLockedByMe(Oid relid, LOCKMODE lockmode, bool orstronger)
+{
+ Oid dbId = get_rel_relisshared(relid) ? InvalidOid : MyDatabaseId;
+ LOCKTAG tag;
+
+ SET_LOCKTAG_RELATION(tag, dbId, relid);
+
+ if (LockHeldByMe(&tag, lockmode))
+ return true;
+
+ if (orstronger)
+ {
+ LOCKMODE slockmode;
+
+ for (slockmode = lockmode + 1;
+ slockmode <= MaxLockMode;
+ slockmode++)
+ {
+ if (LockHeldByMe(&tag, slockmode))
+ {
+#ifdef NOT_USED
+ /* Sometimes this might be useful for debugging purposes */
+ elog(WARNING, "lock mode %s substituted for %s on relation %s",
+ GetLockmodeName(tag.locktag_lockmethodid, slockmode),
+ GetLockmodeName(tag.locktag_lockmethodid, lockmode),
+ get_rel_name(relid));
+#endif
+ return true;
+ }
+ }
+ }
+
+ return false;
+}
+
/*
* LockHasWaitersRelation
*
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 470b734e9e..34d3f4ff8d 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1196,6 +1196,7 @@ exec_simple_query(const char *query_string)
* Start the portal. No parameters here.
*/
PortalStart(portal, NULL, 0, InvalidSnapshot);
+ Assert(portal->plan_valid);
/*
* Select the appropriate output format: text unless we are doing a
@@ -1700,6 +1701,7 @@ exec_bind_message(StringInfo input_message)
"commands ignored until end of transaction block"),
errdetail_abort()));
+replan:
/*
* Create the portal. Allow silent replacement of an existing portal only
* if the unnamed portal is specified.
@@ -1995,6 +1997,12 @@ exec_bind_message(StringInfo input_message)
*/
PortalStart(portal, params, 0, InvalidSnapshot);
+ if (!portal->plan_valid)
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
+
/*
* Apply the result format requests to the portal.
*/
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5f0248acc5..cf3a9790d6 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -19,6 +19,7 @@
#include "access/xact.h"
#include "commands/prepare.h"
+#include "executor/execdesc.h"
#include "executor/tstoreReceiver.h"
#include "miscadmin.h"
#include "pg_trace.h"
@@ -35,12 +36,6 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
- const char *sourceText,
- ParamListInfo params,
- QueryEnvironment *queryEnv,
- DestReceiver *dest,
- QueryCompletion *qc);
static void FillPortalStore(Portal portal, bool isTopLevel);
static uint64 RunFromStore(Portal portal, ScanDirection direction, uint64 count,
DestReceiver *dest);
@@ -65,6 +60,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -75,8 +71,10 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
{
QueryDesc *qd = (QueryDesc *) palloc(sizeof(QueryDesc));
+ qd->type = T_QueryDesc;
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->cplan = cplan; /* CachedPlan, if plan is from one */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -116,86 +114,6 @@ FreeQueryDesc(QueryDesc *qdesc)
}
-/*
- * ProcessQuery
- * Execute a single plannable query within a PORTAL_MULTI_QUERY,
- * PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
- *
- * plan: the plan tree for the query
- * sourceText: the source text of the query
- * params: any parameters needed
- * dest: where to send results
- * qc: where to store the command completion status data.
- *
- * qc may be NULL if caller doesn't want a status string.
- *
- * Must be called in a memory context that will be reset or deleted on
- * error; otherwise the executor's memory usage will be leaked.
- */
-static void
-ProcessQuery(PlannedStmt *plan,
- const char *sourceText,
- ParamListInfo params,
- QueryEnvironment *queryEnv,
- DestReceiver *dest,
- QueryCompletion *qc)
-{
- QueryDesc *queryDesc;
-
- /*
- * Create the QueryDesc object
- */
- queryDesc = CreateQueryDesc(plan, sourceText,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, 0);
-
- /*
- * Call ExecutorStart to prepare the plan for execution
- */
- ExecutorStart(queryDesc, 0);
-
- /*
- * Run the plan to completion.
- */
- ExecutorRun(queryDesc, ForwardScanDirection, 0L, true);
-
- /*
- * Build command completion status data, if caller wants one.
- */
- if (qc)
- {
- switch (queryDesc->operation)
- {
- case CMD_SELECT:
- SetQueryCompletion(qc, CMDTAG_SELECT, queryDesc->estate->es_processed);
- break;
- case CMD_INSERT:
- SetQueryCompletion(qc, CMDTAG_INSERT, queryDesc->estate->es_processed);
- break;
- case CMD_UPDATE:
- SetQueryCompletion(qc, CMDTAG_UPDATE, queryDesc->estate->es_processed);
- break;
- case CMD_DELETE:
- SetQueryCompletion(qc, CMDTAG_DELETE, queryDesc->estate->es_processed);
- break;
- case CMD_MERGE:
- SetQueryCompletion(qc, CMDTAG_MERGE, queryDesc->estate->es_processed);
- break;
- default:
- SetQueryCompletion(qc, CMDTAG_UNKNOWN, queryDesc->estate->es_processed);
- break;
- }
- }
-
- /*
- * Now, we close down all the scans and free allocated resources.
- */
- ExecutorFinish(queryDesc);
- ExecutorEnd(queryDesc);
-
- FreeQueryDesc(queryDesc);
-}
-
/*
* ChoosePortalStrategy
* Select portal execution strategy given the intended statement list.
@@ -427,7 +345,8 @@ FetchStatementTargetList(Node *stmt)
* to be used for cursors).
*
* On return, portal is ready to accept PortalRun() calls, and the result
- * tupdesc (if any) is known.
+ * tupdesc (if any) is known, unless portal->plan_valid is set to false, in
+ * which case the caller must retry after generating a new CachedPlan.
*/
void
PortalStart(Portal portal, ParamListInfo params,
@@ -435,7 +354,6 @@ PortalStart(Portal portal, ParamListInfo params,
{
Portal saveActivePortal;
ResourceOwner saveResourceOwner;
- MemoryContext savePortalContext;
MemoryContext oldContext;
QueryDesc *queryDesc;
int myeflags;
@@ -448,15 +366,13 @@ PortalStart(Portal portal, ParamListInfo params,
*/
saveActivePortal = ActivePortal;
saveResourceOwner = CurrentResourceOwner;
- savePortalContext = PortalContext;
PG_TRY();
{
ActivePortal = portal;
if (portal->resowner)
CurrentResourceOwner = portal->resowner;
- PortalContext = portal->portalContext;
- oldContext = MemoryContextSwitchTo(PortalContext);
+ oldContext = MemoryContextSwitchTo(portal->queryContext);
/* Must remember portal param list, if any */
portal->portalParams = params;
@@ -472,6 +388,8 @@ PortalStart(Portal portal, ParamListInfo params,
switch (portal->strategy)
{
case PORTAL_ONE_SELECT:
+ case PORTAL_ONE_RETURNING:
+ case PORTAL_ONE_MOD_WITH:
/* Must set snapshot before starting executor. */
if (snapshot)
@@ -493,6 +411,7 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->cplan,
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -501,30 +420,50 @@ PortalStart(Portal portal, ParamListInfo params,
portal->queryEnv,
0);
+ /* Remember for PortalRunMulti() */
+ portal->qdescs = lappend(portal->qdescs, queryDesc);
+
/*
* If it's a scrollable cursor, executor needs to support
* REWIND and backwards scan, as well as whatever the caller
* might've asked for.
*/
- if (portal->cursorOptions & CURSOR_OPT_SCROLL)
+ if (portal->strategy == PORTAL_ONE_SELECT &&
+ (portal->cursorOptions & CURSOR_OPT_SCROLL))
myeflags = eflags | EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD;
else
myeflags = eflags;
/*
- * Call ExecutorStart to prepare the plan for execution
+ * Call ExecutorStart to prepare the plan for execution. A
+ * cached plan may get invalidated as we're doing that.
*/
ExecutorStart(queryDesc, myeflags);
+ if (!queryDesc->plan_valid)
+ {
+ Assert(queryDesc->cplan);
+ PortalQueryFinish(queryDesc);
+ PopActiveSnapshot();
+ portal->plan_valid = false;
+ goto early_exit;
+ }
/*
- * This tells PortalCleanup to shut down the executor
+ * This tells PortalCleanup to shut down the executor, though
+ * that is not needed for queries handled by PortalRunMulti().
*/
- portal->queryDesc = queryDesc;
+ if (portal->strategy == PORTAL_ONE_SELECT)
+ portal->queryDesc = queryDesc;
/*
- * Remember tuple descriptor (computed by ExecutorStart)
+ * Remember tuple descriptor (computed by ExecutorStart),
+ * making it independent of the QueryDesc for queries handled
+ * by PortalRunMulti().
*/
- portal->tupDesc = queryDesc->tupDesc;
+ if (portal->strategy != PORTAL_ONE_SELECT)
+ portal->tupDesc = CreateTupleDescCopy(queryDesc->tupDesc);
+ else
+ portal->tupDesc = queryDesc->tupDesc;
/*
* Reset cursor position data to "start of query"
@@ -532,33 +471,11 @@ PortalStart(Portal portal, ParamListInfo params,
portal->atStart = true;
portal->atEnd = false; /* allow fetches */
portal->portalPos = 0;
+ portal->plan_valid = true;
PopActiveSnapshot();
break;
- case PORTAL_ONE_RETURNING:
- case PORTAL_ONE_MOD_WITH:
-
- /*
- * We don't start the executor until we are told to run the
- * portal. We do need to set up the result tupdesc.
- */
- {
- PlannedStmt *pstmt;
-
- pstmt = PortalGetPrimaryStmt(portal);
- portal->tupDesc =
- ExecCleanTypeFromTL(pstmt->planTree->targetlist);
- }
-
- /*
- * Reset cursor position data to "start of query"
- */
- portal->atStart = true;
- portal->atEnd = false; /* allow fetches */
- portal->portalPos = 0;
- break;
-
case PORTAL_UTIL_SELECT:
/*
@@ -578,11 +495,69 @@ PortalStart(Portal portal, ParamListInfo params,
portal->atStart = true;
portal->atEnd = false; /* allow fetches */
portal->portalPos = 0;
+ portal->plan_valid = true;
break;
case PORTAL_MULTI_QUERY:
- /* Need do nothing now */
+ {
+ ListCell *lc;
+ bool pushed_active_snapshot = false;
+
+ foreach(lc, portal->stmts)
+ {
+ PlannedStmt *plan = lfirst_node(PlannedStmt, lc);
+ bool is_utility = (plan->utilityStmt != NULL);
+
+ /* Must set snapshot before starting executor. */
+ if (!pushed_active_snapshot && !is_utility)
+ {
+ PushActiveSnapshot(GetTransactionSnapshot());
+ pushed_active_snapshot = true;
+ }
+
+ /*
+ * Create the QueryDesc object. DestReceiver will
+ * be set in PortalRunMulti().
+ */
+ queryDesc = CreateQueryDesc(plan, portal->cplan,
+ portal->sourceText,
+ pushed_active_snapshot ?
+ GetActiveSnapshot() :
+ InvalidSnapshot,
+ InvalidSnapshot,
+ NULL,
+ params,
+ portal->queryEnv, 0);
+
+ /* Remember for PortalRunMulti() */
+ portal->qdescs = lappend(portal->qdescs, queryDesc);
+
+ /*
+ * Call ExecutorStart to prepare the plan for
+ * execution. A cached plan may get invalidated as
+ * we're doing that.
+ */
+ if (!is_utility)
+ {
+ ExecutorStart(queryDesc, 0);
+ if (!queryDesc->plan_valid)
+ {
+ Assert(queryDesc->cplan);
+ PortalQueryFinish(queryDesc);
+ if (pushed_active_snapshot)
+ PopActiveSnapshot();
+ portal->plan_valid = false;
+ goto early_exit;
+ }
+ }
+ }
+
+ if (pushed_active_snapshot)
+ PopActiveSnapshot();
+ }
+
portal->tupDesc = NULL;
+ portal->plan_valid = true;
break;
}
}
@@ -594,19 +569,18 @@ PortalStart(Portal portal, ParamListInfo params,
/* Restore global vars and propagate error */
ActivePortal = saveActivePortal;
CurrentResourceOwner = saveResourceOwner;
- PortalContext = savePortalContext;
PG_RE_THROW();
}
PG_END_TRY();
+ portal->status = PORTAL_READY;
+
+early_exit:
MemoryContextSwitchTo(oldContext);
ActivePortal = saveActivePortal;
CurrentResourceOwner = saveResourceOwner;
- PortalContext = savePortalContext;
-
- portal->status = PORTAL_READY;
}
/*
@@ -1193,7 +1167,7 @@ PortalRunMulti(Portal portal,
QueryCompletion *qc)
{
bool active_snapshot_set = false;
- ListCell *stmtlist_item;
+ ListCell *qdesc_item;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1214,9 +1188,10 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
- foreach(stmtlist_item, portal->stmts)
+ foreach(qdesc_item, portal->qdescs)
{
- PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ QueryDesc *qdesc = lfirst_node(QueryDesc, qdesc_item);
+ PlannedStmt *pstmt = qdesc->plannedstmt;
/*
* If we got a cancel signal in prior command, quit
@@ -1241,7 +1216,7 @@ PortalRunMulti(Portal portal,
*/
if (!active_snapshot_set)
{
- Snapshot snapshot = GetTransactionSnapshot();
+ Snapshot snapshot = qdesc->snapshot;
/* If told to, register the snapshot and save in portal */
if (setHoldSnapshot)
@@ -1271,23 +1246,38 @@ PortalRunMulti(Portal portal,
else
UpdateActiveSnapshotCommandId();
+ /*
+ * Run the plan to completion.
+ */
+ qdesc->dest = dest;
+ ExecutorRun(qdesc, ForwardScanDirection, 0L, true);
+
+ /*
+ * Build command completion status data if needed.
+ */
if (pstmt->canSetTag)
{
- /* statement can set tag string */
- ProcessQuery(pstmt,
- portal->sourceText,
- portal->portalParams,
- portal->queryEnv,
- dest, qc);
- }
- else
- {
- /* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
- portal->sourceText,
- portal->portalParams,
- portal->queryEnv,
- altdest, NULL);
+ switch (qdesc->operation)
+ {
+ case CMD_SELECT:
+ SetQueryCompletion(qc, CMDTAG_SELECT, qdesc->estate->es_processed);
+ break;
+ case CMD_INSERT:
+ SetQueryCompletion(qc, CMDTAG_INSERT, qdesc->estate->es_processed);
+ break;
+ case CMD_UPDATE:
+ SetQueryCompletion(qc, CMDTAG_UPDATE, qdesc->estate->es_processed);
+ break;
+ case CMD_DELETE:
+ SetQueryCompletion(qc, CMDTAG_DELETE, qdesc->estate->es_processed);
+ break;
+ case CMD_MERGE:
+ SetQueryCompletion(qc, CMDTAG_MERGE, qdesc->estate->es_processed);
+ break;
+ default:
+ SetQueryCompletion(qc, CMDTAG_UNKNOWN, qdesc->estate->es_processed);
+ break;
+ }
}
if (log_executor_stats)
@@ -1346,8 +1336,19 @@ PortalRunMulti(Portal portal,
* Increment command counter between queries, but not after the last
* one.
*/
- if (lnext(portal->stmts, stmtlist_item) != NULL)
+ if (lnext(portal->qdescs, qdesc_item) != NULL)
CommandCounterIncrement();
+
+ /* portal->queryDesc is free'd by PortalCleanup(). */
+ if (qdesc != portal->queryDesc)
+ {
+ if (qdesc->estate)
+ {
+ ExecutorFinish(qdesc);
+ ExecutorEnd(qdesc);
+ }
+ FreeQueryDesc(qdesc);
+ }
}
/* Pop the snapshot if we pushed one. */
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index c07382051d..38ae43e24b 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -2073,6 +2073,27 @@ get_rel_persistence(Oid relid)
return result;
}
+/*
+ * get_rel_relisshared
+ *
+ * Returns whether the given relation is shared
+ */
+bool
+get_rel_relisshared(Oid relid)
+{
+ HeapTuple tp;
+ Form_pg_class reltup;
+ bool result;
+
+ tp = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+ if (!HeapTupleIsValid(tp))
+ elog(ERROR, "cache lookup failed for relation %u", relid);
+ reltup = (Form_pg_class) GETSTRUCT(tp);
+ result = reltup->relisshared;
+ ReleaseSysCache(tp);
+
+ return result;
+}
/* ---------- TRANSFORM CACHE ---------- */
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 77c2ba3f8f..4e455d815f 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -100,13 +100,13 @@ static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool GenericPlanIsValid(CachedPlan *cplan);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -787,9 +787,6 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
- *
- * On a "true" return, we have acquired the locks needed to run the plan.
- * (We must do this for the "true" result to be race-condition-free.)
*/
static bool
CheckCachedPlan(CachedPlanSource *plansource)
@@ -803,60 +800,56 @@ CheckCachedPlan(CachedPlanSource *plansource)
if (!plan)
return false;
- Assert(plan->magic == CACHEDPLAN_MAGIC);
- /* Generic plans are never one-shot */
- Assert(!plan->is_oneshot);
+ if (GenericPlanIsValid(plan))
+ return true;
/*
- * If plan isn't valid for current role, we can't use it.
+ * Plan has been invalidated, so unlink it from the parent and release it.
*/
- if (plan->is_valid && plan->dependsOnRole &&
- plan->planRoleId != GetUserId())
- plan->is_valid = false;
+ ReleaseGenericPlan(plansource);
- /*
- * If it appears valid, acquire locks and recheck; this is much the same
- * logic as in RevalidateCachedQuery, but for a plan.
- */
- if (plan->is_valid)
+ return false;
+}
+
+/*
+ * GenericPlanIsValid
+ * Is a generic plan still valid?
+ *
+ * It may have gone stale due to concurrent schema modifications of relations
+ * mentioned in the plan, or due to the other conditions checked below.
+ */
+static bool
+GenericPlanIsValid(CachedPlan *cplan)
+{
+ Assert(cplan != NULL);
+ Assert(cplan->magic == CACHEDPLAN_MAGIC);
+ /* Generic plans are never one-shot */
+ Assert(!cplan->is_oneshot);
+
+ if (cplan->is_valid)
{
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
- Assert(plan->refcount > 0);
-
- AcquireExecutorLocks(plan->stmt_list, true);
+ Assert(cplan->refcount > 0);
/*
- * If plan was transient, check to see if TransactionXmin has
- * advanced, and if so invalidate it.
+ * If plan isn't valid for current role, we can't use it.
*/
- if (plan->is_valid &&
- TransactionIdIsValid(plan->saved_xmin) &&
- !TransactionIdEquals(plan->saved_xmin, TransactionXmin))
- plan->is_valid = false;
+ if (cplan->dependsOnRole && cplan->planRoleId != GetUserId())
+ cplan->is_valid = false;
/*
- * By now, if any invalidation has happened, the inval callback
- * functions will have marked the plan invalid.
+ * If plan was transient, check to see if TransactionXmin has
+ * advanced, and if so invalidate it.
*/
- if (plan->is_valid)
- {
- /* Successfully revalidated and locked the query. */
- return true;
- }
-
- /* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ if (TransactionIdIsValid(cplan->saved_xmin) &&
+ !TransactionIdEquals(cplan->saved_xmin, TransactionXmin))
+ cplan->is_valid = false;
}
- /*
- * Plan has been invalidated, so unlink it from the parent and release it.
- */
- ReleaseGenericPlan(plansource);
-
- return false;
+ return cplan->is_valid;
}
/*
@@ -1126,9 +1119,6 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
- * On return, the plan is valid and we have sufficient locks to begin
- * execution.
- *
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
* the refcount has been reported to that ResourceOwner (note that this
@@ -1360,8 +1350,8 @@ CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
}
/*
- * Reject if AcquireExecutorLocks would have anything to do. This is
- * probably unnecessary given the previous check, but let's be safe.
+ * Reject if the executor would need to take any locks beyond those
+ * taken by AcquirePlannerLocks() on a given query.
*/
foreach(lc, plan->stmt_list)
{
@@ -1735,58 +1725,6 @@ QueryListGetPrimaryStmt(List *stmts)
return NULL;
}
-/*
- * AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
- */
-static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
-{
- ListCell *lc1;
-
- foreach(lc1, stmt_list)
- {
- PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
-
- if (plannedstmt->commandType == CMD_UTILITY)
- {
- /*
- * Ignore utility statements, except those (such as EXPLAIN) that
- * contain a parsed-but-not-planned query. Note: it's okay to use
- * ScanQueryForLocks, even though the query hasn't been through
- * rule rewriting, because rewriting doesn't change the query
- * representation.
- */
- Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
-
- if (query)
- ScanQueryForLocks(query, acquire);
- continue;
- }
-
- foreach(lc2, plannedstmt->rtable)
- {
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
-
- if (!(rte->rtekind == RTE_RELATION ||
- (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
- continue;
-
- /*
- * Acquire the appropriate type of lock on each relation OID. Note
- * that we don't actually try to open the rel, and hence will not
- * fail if it's been dropped entirely --- we'll just transiently
- * acquire a non-conflicting lock.
- */
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
- }
- }
-}
-
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 06dfa85f04..3ad80c7ecb 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -201,6 +201,10 @@ CreatePortal(const char *name, bool allowDup, bool dupSilent)
portal->portalContext = AllocSetContextCreate(TopPortalContext,
"PortalContext",
ALLOCSET_SMALL_SIZES);
+ /* initialize portal's query context to store QueryDescs */
+ portal->queryContext = AllocSetContextCreate(TopPortalContext,
+ "PortalQueryContext",
+ ALLOCSET_SMALL_SIZES);
/* create a resource owner for the portal */
portal->resowner = ResourceOwnerCreate(CurTransactionResourceOwner,
@@ -224,6 +228,7 @@ CreatePortal(const char *name, bool allowDup, bool dupSilent)
/* for named portals reuse portal->name copy */
MemoryContextSetIdentifier(portal->portalContext, portal->name[0] ? portal->name : "<unnamed>");
+ MemoryContextSetIdentifier(portal->queryContext, portal->name[0] ? portal->name : "<unnamed>");
return portal;
}
@@ -594,6 +599,7 @@ PortalDrop(Portal portal, bool isTopCommit)
/* release subsidiary storage */
MemoryContextDelete(portal->portalContext);
+ MemoryContextDelete(portal->queryContext);
/* release portal struct (it's in TopPortalContext) */
pfree(portal);
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 7c1071ddd1..da39b2e4ff 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,11 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern QueryDesc *ExplainQueryDesc(PlannedStmt *stmt, struct CachedPlan *cplan,
+ const char *queryString, IntoClause *into, ExplainState *es,
+ ParamListInfo params, QueryEnvironment *queryEnv);
+extern void ExplainOnePlan(QueryDesc *queryDesc,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
@@ -103,6 +107,7 @@ extern void ExplainQueryParameters(ExplainState *es, ParamListInfo params, int m
extern void ExplainBeginOutput(ExplainState *es);
extern void ExplainEndOutput(ExplainState *es);
+extern void ExplainResetOutput(ExplainState *es);
extern void ExplainSeparatePlans(ExplainState *es);
extern void ExplainPropertyList(const char *qlabel, List *data,
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index af2bf36dfb..c36c25b497 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -32,9 +32,12 @@
*/
typedef struct QueryDesc
{
+ NodeTag type;
+
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ struct CachedPlan *cplan; /* CachedPlan, if plannedstmt is from one */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -47,6 +50,7 @@ typedef struct QueryDesc
TupleDesc tupDesc; /* descriptor for result tuples */
EState *estate; /* executor's query-wide state */
PlanState *planstate; /* tree of per-plan-node state */
+ bool plan_valid; /* is planstate tree fully valid? */
/* This field is set by ExecutorRun */
bool already_executed; /* true if previously executed */
@@ -57,6 +61,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ struct CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index e7e25c057e..15a1abaacf 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -19,6 +19,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/plancache.h"
/*
@@ -59,6 +60,10 @@
#define EXEC_FLAG_MARK 0x0008 /* need mark/restore */
#define EXEC_FLAG_SKIP_TRIGGERS 0x0010 /* skip AfterTrigger calls */
#define EXEC_FLAG_WITH_NO_DATA 0x0020 /* rel scannability doesn't matter */
+#define EXEC_FLAG_GET_LOCKS 0x0400 /* should ExecGetRangeTableRelation
+ * lock relations? */
+#define EXEC_FLAG_REL_LOCKS 0x8000 /* should ExecCloseRangeTableRelations
+ * release locks? */
/* Hook for plugins to get control in ExecutorStart() */
@@ -245,6 +250,13 @@ extern void ExecEndNode(PlanState *node);
extern void ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/* Is the CachedPlan (if any) that the plan tree came from still valid? */
+static inline bool
+ExecPlanStillValid(EState *estate)
+{
+ return estate->es_cachedplan ?
+ CachedPlanStillValid(estate->es_cachedplan) : true;
+}
/* ----------------------------------------------------------------
* ExecProcNode
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 20f4c8b35f..89f5a627c8 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -619,6 +619,8 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ struct CachedPlan *es_cachedplan; /* CachedPlan if plannedstmt is from
+ * one */
List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
const char *es_sourceText; /* Source text from QueryDesc */
diff --git a/src/include/nodes/meson.build b/src/include/nodes/meson.build
index efe0834afb..a8fdd9e176 100644
--- a/src/include/nodes/meson.build
+++ b/src/include/nodes/meson.build
@@ -13,6 +13,7 @@ node_support_input_i = [
'access/tsmapi.h',
'commands/event_trigger.h',
'commands/trigger.h',
+ 'executor/execdesc.h',
'executor/tuptable.h',
'foreign/fdwapi.h',
'nodes/bitmapset.h',
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 0d4b1ec4e4..71004fee75 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -116,6 +116,9 @@ typedef struct PlannerGlobal
/* "flat" list of RTEPermissionInfos */
List *finalrteperminfos;
+ /* "flat" list of integer RT indexes */
+ List *viewRelations;
+
/* "flat" list of PlanRowMarks */
List *finalrowmarks;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 4781a9c632..da9e73fb16 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -78,6 +78,9 @@ typedef struct PlannedStmt
List *permInfos; /* list of RTEPermissionInfo nodes for rtable
* entries needing one */
+ List *viewRelations; /* integer list of RT indexes, or NIL if no
+ * views are queried */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE/MERGE */
List *resultRelations; /* integer list of RT indexes, or NIL */
diff --git a/src/include/storage/lmgr.h b/src/include/storage/lmgr.h
index 4ee91e3cf9..598bf2688a 100644
--- a/src/include/storage/lmgr.h
+++ b/src/include/storage/lmgr.h
@@ -48,6 +48,7 @@ extern bool ConditionalLockRelation(Relation relation, LOCKMODE lockmode);
extern void UnlockRelation(Relation relation, LOCKMODE lockmode);
extern bool CheckRelationLockedByMe(Relation relation, LOCKMODE lockmode,
bool orstronger);
+extern bool CheckRelLockedByMe(Oid relid, LOCKMODE lockmode, bool orstronger);
extern bool LockHasWaitersRelation(Relation relation, LOCKMODE lockmode);
extern void LockRelationIdForSession(LockRelId *relid, LOCKMODE lockmode);
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index 4f5418b972..3074e604dd 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -139,6 +139,7 @@ extern char get_rel_relkind(Oid relid);
extern bool get_rel_relispartition(Oid relid);
extern Oid get_rel_tablespace(Oid relid);
extern char get_rel_persistence(Oid relid);
+extern bool get_rel_relisshared(Oid relid);
extern Oid get_transform_fromsql(Oid typid, Oid langid, List *trftypes);
extern Oid get_transform_tosql(Oid typid, Oid langid, List *trftypes);
extern bool get_typisdefined(Oid typid);
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index a443181d41..c2e485ac2c 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -221,6 +221,21 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
QueryEnvironment *queryEnv);
+
+/*
+ * CachedPlanStillValid
+ * Returns whether a cached generic plan is still valid
+ *
+ * Called by the executor after it has finished taking locks on a plan tree
+ * in a CachedPlan.
+ */
+static inline bool
+CachedPlanStillValid(CachedPlan *cplan)
+{
+ return cplan->is_valid;
+}
+
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aa08b1e0fc..332a08ccb4 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,9 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *qdescs; /* list of QueryDescs */
+ bool plan_valid; /* are plan(s) ready for execution? */
+ MemoryContext queryContext; /* memory for QueryDescs and children */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -242,6 +245,7 @@ extern void PortalDefineQuery(Portal portal,
CommandTag commandTag,
List *stmts,
CachedPlan *cplan);
+extern void PortalQueryFinish(QueryDesc *queryDesc);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
--
2.35.3
On Thu, Feb 2, 2023 at 11:49 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Fri, Jan 27, 2023 at 4:01 PM Amit Langote <amitlangote09@gmail.com> wrote:
I didn't actually go with calling the plancache on every lock taken on
a relation, that is, in ExecGetRangeTableRelation(). One thing about
doing it that way that I didn't quite like (or didn't see a clean
enough way to code) is the need to complicate the ExecInitNode()
traversal for handling the abrupt suspension of the ongoing setup of
the PlanState tree.
OK, I gave this one more try and attached is what I came up with.
This adds an ExecPlanStillValid() check, which is performed right
after anything that may in turn call ExecGetRangeTableRelation(); the
latter has been taught to lock a relation if EXEC_FLAG_GET_LOCKS has
been passed in EState.es_top_eflags. That includes all ExecInitNode()
calls and a few other functions that call ExecGetRangeTableRelation()
directly, such as ExecOpenScanRelation(). If ExecPlanStillValid()
returns false, that is, if EState.es_cachedplan is found to have been
invalidated by a lock taken in ExecGetRangeTableRelation(), whatever
function called it must return immediately, and so must its caller,
and so on. ExecEndPlan() seems to be able to clean up after a
partially finished attempt at initializing a PlanState tree in this
way. Maybe my preliminary testing didn't catch cases where pointers
to resources that are normally put into the nodes of a PlanState tree
are left dangling, because a partially built PlanState tree is not
accessible to ExecEndPlan(); QueryDesc.planstate would remain NULL in
such cases. Maybe es_tupleTable and es_relations are the only things
that need to be explicitly released, with the rest taken care of by
resetting the ExecutorState context.
In the attached updated patch, I've made the functions that check
ExecPlanStillValid() return NULL (if they return anything) instead of
partially initialized structs. Those partially initialized structs
were never subsequently looked at anyway.
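As a minimal sketch of that pattern, modeled on the ExecInitUnique()
hunk shown earlier (an illustration rather than a verbatim patch
excerpt, with most of the initialization elided):

UniqueState *
ExecInitUnique(Unique *node, EState *estate, int eflags)
{
    UniqueState *uniquestate = makeNode(UniqueState);

    uniquestate->ps.plan = (Plan *) node;
    uniquestate->ps.state = estate;

    /* ExecInitNode() may lock relations via ExecGetRangeTableRelation() */
    outerPlanState(uniquestate) = ExecInitNode(outerPlan(node), estate, eflags);
    if (!ExecPlanStillValid(estate))
        return NULL;            /* each caller bails out the same way */

    /* ... remaining initialization elided ... */

    return uniquestate;
}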
On testing, I'm afraid we're going to need something like
src/test/modules/delay_execution to test that concurrent changes to
relation(s) in PlannedStmt.relationOids that occur somewhere between
RevalidateCachedQuery() and InitPlan() cause the latter to be aborted
and that this is handled correctly. It seems like it is only the
locking of partitions (which are not present in an unplanned Query and
thus not protected by AcquirePlannerLocks()) that can trigger
replanning of a CachedPlan, so any tests we write should involve
partitions. Should this try to test as many plan shapes as possible,
though, given the uncertainty around ExecEndPlan()'s robustness, or
should manual auditing suffice to be sure that nothing's broken?
I've added a test case under src/test/modules/delay_execution by
adding a new ExecutorStart_hook that works similarly to
delay_execution_planner(). The test works by allowing a concurrent
session to drop an object referenced in a cached plan while that plan
is being initialized, with the ExecutorStart_hook waiting to get an
advisory lock. The concurrent drop of the referenced object is
detected during ExecInitNode() and thus triggers replanning of the
cached plan.
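Roughly, such a hook can look like the sketch below; the GUC and
function names here are illustrative, not necessarily what the module
actually uses:

#include "postgres.h"
#include "executor/executor.h"
#include "fmgr.h"
#include "utils/fmgrprotos.h"

static ExecutorStart_hook_type prev_ExecutorStart = NULL;
static int executor_start_lock_id = 0;  /* hypothetical GUC; 0 = disabled */

static void
delay_execution_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
    if (executor_start_lock_id != 0)
    {
        /*
         * Block on the advisory lock held by the concurrent session,
         * giving it a window to drop an object referenced by the cached
         * plan, then release the lock right away.
         */
        (void) DirectFunctionCall1(pg_advisory_lock_int8,
                                   Int64GetDatum((int64) executor_start_lock_id));
        (void) DirectFunctionCall1(pg_advisory_unlock_int8,
                                   Int64GetDatum((int64) executor_start_lock_id));
    }

    /* Proceed; plan initialization can now detect the invalidation. */
    if (prev_ExecutorStart)
        prev_ExecutorStart(queryDesc, eflags);
    else
        standard_ExecutorStart(queryDesc, eflags);
}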
While testing, I also fixed a bug in ExplainExecuteQuery() and tidied
up some comments.
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v33-0001-Move-AcquireExecutorLocks-s-responsibility-into-.patch
From 4a8e2b6c4d87e0ae81becd74b4a1e4d5217eb05a Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Fri, 20 Jan 2023 16:52:31 +0900
Subject: [PATCH v33] Move AcquireExecutorLocks()'s responsibility into the
executor
---
contrib/postgres_fdw/postgres_fdw.c | 4 +
src/backend/commands/copyto.c | 4 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 142 ++++++---
src/backend/commands/extension.c | 1 +
src/backend/commands/matview.c | 2 +-
src/backend/commands/portalcmds.c | 16 +-
src/backend/commands/prepare.c | 30 +-
src/backend/executor/execMain.c | 105 ++++++-
src/backend/executor/execParallel.c | 7 +-
src/backend/executor/execPartition.c | 4 +
src/backend/executor/execProcnode.c | 5 +
src/backend/executor/execUtils.c | 5 +-
src/backend/executor/functions.c | 1 +
src/backend/executor/nodeAgg.c | 2 +
src/backend/executor/nodeAppend.c | 4 +
src/backend/executor/nodeBitmapAnd.c | 2 +
src/backend/executor/nodeBitmapHeapscan.c | 4 +
src/backend/executor/nodeBitmapOr.c | 2 +
src/backend/executor/nodeCustom.c | 2 +
src/backend/executor/nodeForeignscan.c | 4 +
src/backend/executor/nodeGather.c | 2 +
src/backend/executor/nodeGatherMerge.c | 2 +
src/backend/executor/nodeGroup.c | 2 +
src/backend/executor/nodeHash.c | 2 +
src/backend/executor/nodeHashjoin.c | 4 +
src/backend/executor/nodeIncrementalSort.c | 2 +
src/backend/executor/nodeIndexonlyscan.c | 2 +
src/backend/executor/nodeIndexscan.c | 2 +
src/backend/executor/nodeLimit.c | 2 +
src/backend/executor/nodeLockRows.c | 2 +
src/backend/executor/nodeMaterial.c | 2 +
src/backend/executor/nodeMemoize.c | 2 +
src/backend/executor/nodeMergeAppend.c | 4 +
src/backend/executor/nodeMergejoin.c | 4 +
src/backend/executor/nodeModifyTable.c | 7 +
src/backend/executor/nodeNestloop.c | 4 +
src/backend/executor/nodeProjectSet.c | 2 +
src/backend/executor/nodeRecursiveunion.c | 4 +
src/backend/executor/nodeResult.c | 2 +
src/backend/executor/nodeSamplescan.c | 2 +
src/backend/executor/nodeSeqscan.c | 2 +
src/backend/executor/nodeSetOp.c | 2 +
src/backend/executor/nodeSort.c | 2 +
src/backend/executor/nodeSubqueryscan.c | 2 +
src/backend/executor/nodeTidrangescan.c | 2 +
src/backend/executor/nodeTidscan.c | 2 +
src/backend/executor/nodeUnique.c | 2 +
src/backend/executor/nodeWindowAgg.c | 2 +
src/backend/executor/spi.c | 44 ++-
src/backend/nodes/Makefile | 1 +
src/backend/nodes/gen_node_support.pl | 2 +
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 5 +
src/backend/rewrite/rewriteHandler.c | 7 +-
src/backend/storage/lmgr/lmgr.c | 45 +++
src/backend/tcop/postgres.c | 8 +
src/backend/tcop/pquery.c | 291 +++++++++---------
src/backend/utils/cache/lsyscache.c | 21 ++
src/backend/utils/cache/plancache.c | 134 +++-----
src/backend/utils/mmgr/portalmem.c | 6 +
src/include/commands/explain.h | 7 +-
src/include/executor/execdesc.h | 5 +
src/include/executor/executor.h | 12 +
src/include/nodes/execnodes.h | 2 +
src/include/nodes/meson.build | 1 +
src/include/nodes/pathnodes.h | 3 +
src/include/nodes/plannodes.h | 3 +
src/include/storage/lmgr.h | 1 +
src/include/utils/lsyscache.h | 1 +
src/include/utils/plancache.h | 15 +
src/include/utils/portal.h | 4 +
src/test/modules/delay_execution/Makefile | 3 +-
.../modules/delay_execution/delay_execution.c | 63 +++-
.../expected/cached-plan-replan.out | 43 +++
.../specs/cached-plan-replan.spec | 39 +++
76 files changed, 846 insertions(+), 340 deletions(-)
create mode 100644 src/test/modules/delay_execution/expected/cached-plan-replan.out
create mode 100644 src/test/modules/delay_execution/specs/cached-plan-replan.spec
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index f5926ab89d..93f3f8b5d1 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2659,7 +2659,11 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
/* Get info about foreign table. */
rtindex = node->resultRelInfo->ri_RangeTableIndex;
if (fsplan->scan.scanrelid == 0)
+ {
dmstate->rel = ExecOpenScanRelation(estate, rtindex, eflags);
+ if (!ExecPlanStillValid(estate))
+ return;
+ }
else
dmstate->rel = node->ss.ss_currentRelation;
table = GetForeignTable(RelationGetRelid(dmstate->rel));
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 8043b4e9b1..a438c547e8 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -558,7 +558,8 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL,
+ pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
@@ -569,6 +570,7 @@ BeginCopyTo(ParseState *pstate,
* ExecutorStart computes a result tupdesc for us
*/
ExecutorStart(cstate->queryDesc, 0);
+ Assert(cstate->queryDesc->plan_valid);
tupDesc = cstate->queryDesc->tupDesc;
}
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index d6c6d514f3..a55b851574 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index fbbf28cf06..8fdc966a73 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -384,6 +384,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
else
{
PlannedStmt *plan;
+ QueryDesc *queryDesc;
instr_time planstart,
planduration;
BufferUsage bufusage_start,
@@ -406,12 +407,89 @@ ExplainOneQuery(Query *query, int cursorOptions,
BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
}
+ queryDesc = ExplainQueryDesc(plan, NULL, queryString, into, es,
+ params, queryEnv);
+ Assert(queryDesc);
+
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(queryDesc, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
+/*
+ * ExplainQueryDesc
+ * Set up QueryDesc for EXPLAINing a given plan
+ *
+ * This returns NULL if cplan is found to have been invalidated since its
+ * creation.
+ */
+QueryDesc *
+ExplainQueryDesc(PlannedStmt *stmt, CachedPlan *cplan,
+ const char *queryString, IntoClause *into, ExplainState *es,
+ ParamListInfo params, QueryEnvironment *queryEnv)
+{
+ QueryDesc *queryDesc;
+ DestReceiver *dest;
+ int eflags;
+ int instrument_option = 0;
+
+ /*
+ * Normally we discard the query's output, but if explaining CREATE TABLE
+ * AS, we'd better use the appropriate tuple receiver.
+ */
+ if (into)
+ dest = CreateIntoRelDestReceiver(into);
+ else
+ dest = None_Receiver;
+
+ if (es->analyze && es->timing)
+ instrument_option |= INSTRUMENT_TIMER;
+ else if (es->analyze)
+ instrument_option |= INSTRUMENT_ROWS;
+
+ if (es->buffers)
+ instrument_option |= INSTRUMENT_BUFFERS;
+ if (es->wal)
+ instrument_option |= INSTRUMENT_WAL;
+
+ /*
+ * Use a snapshot with an updated command ID to ensure this query sees
+ * results of any previously executed queries.
+ */
+ PushCopiedSnapshot(GetActiveSnapshot());
+ UpdateActiveSnapshotCommandId();
+
+ /* Create a QueryDesc for the query */
+ queryDesc = CreateQueryDesc(stmt, cplan, queryString,
+ GetActiveSnapshot(), InvalidSnapshot,
+ dest, params, queryEnv, instrument_option);
+
+ /* Select execution options */
+ if (es->analyze)
+ eflags = 0; /* default run-to-completion flags */
+ else
+ eflags = EXEC_FLAG_EXPLAIN_ONLY;
+ if (into)
+ eflags |= GetIntoRelEFlags(into);
+
+ /*
+ * Call ExecutorStart to prepare the plan for execution. A cached plan
+ * may get invalidated as we're doing that.
+ */
+ ExecutorStart(queryDesc, eflags);
+ if (!queryDesc->plan_valid)
+ {
+ /* Clean up. */
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+ PopActiveSnapshot();
+ return NULL;
+ }
+
+ return queryDesc;
+}
+
/*
* ExplainOneUtility -
* print out the execution plan for one utility statement
@@ -515,29 +593,16 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(QueryDesc *queryDesc,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
{
- DestReceiver *dest;
- QueryDesc *queryDesc;
instr_time starttime;
double totaltime = 0;
- int eflags;
- int instrument_option = 0;
-
- Assert(plannedstmt->commandType != CMD_UTILITY);
- if (es->analyze && es->timing)
- instrument_option |= INSTRUMENT_TIMER;
- else if (es->analyze)
- instrument_option |= INSTRUMENT_ROWS;
-
- if (es->buffers)
- instrument_option |= INSTRUMENT_BUFFERS;
- if (es->wal)
- instrument_option |= INSTRUMENT_WAL;
+ Assert(queryDesc->plannedstmt->commandType != CMD_UTILITY);
/*
* We always collect timing for the entire statement, even when node-level
@@ -546,38 +611,6 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
*/
INSTR_TIME_SET_CURRENT(starttime);
- /*
- * Use a snapshot with an updated command ID to ensure this query sees
- * results of any previously executed queries.
- */
- PushCopiedSnapshot(GetActiveSnapshot());
- UpdateActiveSnapshotCommandId();
-
- /*
- * Normally we discard the query's output, but if explaining CREATE TABLE
- * AS, we'd better use the appropriate tuple receiver.
- */
- if (into)
- dest = CreateIntoRelDestReceiver(into);
- else
- dest = None_Receiver;
-
- /* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, instrument_option);
-
- /* Select execution options */
- if (es->analyze)
- eflags = 0; /* default run-to-completion flags */
- else
- eflags = EXEC_FLAG_EXPLAIN_ONLY;
- if (into)
- eflags |= GetIntoRelEFlags(into);
-
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, eflags);
-
/* Execute the plan for statistics if asked for */
if (es->analyze)
{
@@ -4851,6 +4884,17 @@ ExplainDummyGroup(const char *objtype, const char *labelname, ExplainState *es)
}
}
+/*
+ * Discard output buffer for a fresh restart.
+ */
+void
+ExplainResetOutput(ExplainState *es)
+{
+ Assert(es->str);
+ resetStringInfo(es->str);
+ ExplainBeginOutput(es);
+}
+
/*
* Emit the start-of-output boilerplate.
*
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index b1509cc505..e2f79cc7a7 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -780,6 +780,7 @@ execute_sql_string(const char *sql)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index fb30d2595c..17d457ccfb 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -409,7 +409,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 8a3cf98cce..3c34ab4351 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -146,6 +146,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
PortalStart(portal, params, 0, GetActiveSnapshot());
Assert(portal->strategy == PORTAL_ONE_SELECT);
+ Assert(portal->plan_valid);
/*
* We're done; the query won't actually be run until PerformPortalFetch is
@@ -249,6 +250,17 @@ PerformPortalClose(const char *name)
PortalDrop(portal, false);
}
+/*
+ * Shut down execution of, and release, a portal's QueryDesc.
+ */
+void
+PortalQueryFinish(QueryDesc *queryDesc)
+{
+ ExecutorFinish(queryDesc);
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+}
+
/*
* PortalCleanup
*
@@ -295,9 +307,7 @@ PortalCleanup(Portal portal)
if (portal->resowner)
CurrentResourceOwner = portal->resowner;
- ExecutorFinish(queryDesc);
- ExecutorEnd(queryDesc);
- FreeQueryDesc(queryDesc);
+ PortalQueryFinish(queryDesc);
CurrentResourceOwner = saveResourceOwner;
}
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 18f70319fc..3099536a54 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -183,6 +183,7 @@ ExecuteQuery(ParseState *pstate,
paramLI = EvaluateParams(pstate, entry, stmt->params, estate);
}
+replan:
/* Create a new portal to run the query in */
portal = CreateNewPortal();
/* Don't display the portal in pg_cursors, it is for internal use only */
@@ -251,10 +252,17 @@ ExecuteQuery(ParseState *pstate,
}
/*
- * Run the portal as appropriate.
+ * Run the portal as appropriate. If the portal's cached plan turns out
+ * to have been invalidated during PortalStart(), drop the portal and
+ * loop back to the replan label above to build a fresh plan.
*/
PortalStart(portal, paramLI, eflags, GetActiveSnapshot());
+ if (!portal->plan_valid)
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
+
(void) PortalRun(portal, count, false, true, dest, dest, qc);
PortalDrop(portal, false);
@@ -574,7 +582,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
{
PreparedStatement *entry;
const char *query_string;
- CachedPlan *cplan;
+ CachedPlan *cplan = NULL;
List *plan_list;
ListCell *p;
ParamListInfo paramLI = NULL;
@@ -618,6 +626,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
+replan:
cplan = GetCachedPlan(entry->plansource, paramLI,
CurrentResourceOwner, queryEnv);
@@ -639,8 +648,21 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ {
+ QueryDesc *queryDesc;
+
+ queryDesc = ExplainQueryDesc(pstmt, cplan, queryString,
+ into, es, paramLI, queryEnv);
+ if (queryDesc == NULL)
+ {
+ ExplainResetOutput(es);
+ ReleaseCachedPlan(cplan, CurrentResourceOwner);
+ goto replan;
+ }
+ ExplainOnePlan(queryDesc, into, es, query_string, paramLI,
+ queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
+ }
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
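(The retry protocol that ExecuteQuery follows here is the same one used by
exec_bind_message() and SPI in the hunks further down. Condensed to a
skeleton -- a sketch only, with portal naming, eflags, and snapshot details
elided -- it amounts to:)

static void
run_cached_plan(CachedPlanSource *plansource, ParamListInfo params,
				DestReceiver *dest, QueryCompletion *qc)
{
	for (;;)
	{
		CachedPlan *cplan = GetCachedPlan(plansource, params,
										  CurrentResourceOwner, NULL);
		Portal		portal = CreateNewPortal();

		/* Portal takes over the plan refcount acquired above. */
		PortalDefineQuery(portal, NULL, plansource->query_string,
						  plansource->commandTag, cplan->stmt_list, cplan);
		PortalStart(portal, params, 0, GetActiveSnapshot());
		if (portal->plan_valid)
		{
			(void) PortalRun(portal, FETCH_ALL, false, true, dest, dest, qc);
			PortalDrop(portal, false);
			break;
		}

		/* A lock taken during ExecutorStart() invalidated the plan. */
		PortalDrop(portal, false);
	}
}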
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index a5115b9c1f..d97c9de409 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -126,11 +126,27 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
* get control when ExecutorStart is called. Such a plugin would
* normally call standard_ExecutorStart().
*
+ * Normally, the plan tree given in queryDesc->plannedstmt is known to be
+ * valid in that *all* relations contained in plannedstmt->relationOids have
+ * already been locked. That may not be the case, however, if the
+ * plannedstmt comes from a CachedPlan, given in queryDesc->cplan. Locks
+ * needed to validate such a plan tree must be taken while it is being
+ * initialized in InitPlan(), so this function sets the flag
+ * EXEC_FLAG_GET_LOCKS. If the CachedPlan gets invalidated as those locks
+ * are taken, InitPlan() returns without setting queryDesc->planstate and
+ * sets queryDesc->plan_valid to false. The caller must then retry the
+ * execution with a freshly created CachedPlan.
* ----------------------------------------------------------------
*/
void
ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ /* Take locks if the plan tree comes from a CachedPlan. */
+ Assert(queryDesc->cplan == NULL ||
+ CachedPlanStillValid(queryDesc->cplan));
+ if (queryDesc->cplan)
+ eflags |= EXEC_FLAG_GET_LOCKS;
+
/*
* In some cases (e.g. an EXECUTE statement) a query execution will skip
* parse analysis, which means that the query_id won't be reported. Note
@@ -582,6 +598,16 @@ ExecCheckPermissions(List *rangeTable, List *rteperminfos,
RTEPermissionInfo *perminfo = lfirst_node(RTEPermissionInfo, l);
Assert(OidIsValid(perminfo->relid));
+
+ /*
+ * Relations whose permissions need to be checked must already have
+ * been locked by the parser or by AcquirePlannerLocks() if a
+ * cached plan is being executed.
+ * XXX shouldn't we skip calling ExecCheckPermissions from InitPlan
+ * in a parallel worker?
+ */
+ Assert(CheckRelLockedByMe(perminfo->relid, AccessShareLock, true) ||
+ IsParallelWorker());
result = ExecCheckOneRelPerms(perminfo);
if (!result)
{
@@ -785,12 +811,43 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
PreventCommandIfParallelMode(CreateCommandName((Node *) plannedstmt));
}
+/*
+ * Lock or unlock, as directed by 'acquire', the view relations in a
+ * given query's range table.
+ */
+static void
+ExecLockViewRelations(List *viewRelations, EState *estate, bool acquire)
+{
+ ListCell *lc;
+
+ foreach(lc, viewRelations)
+ {
+ Index rti = lfirst_int(lc);
+ RangeTblEntry *rte = exec_rt_fetch(rti, estate);
+
+ Assert(OidIsValid(rte->relid));
+ Assert(rte->relkind == RELKIND_VIEW);
+ Assert(rte->rellockmode != NoLock);
+
+ if (acquire)
+ LockRelationOid(rte->relid, rte->rellockmode);
+ else
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+}
/* ----------------------------------------------------------------
* InitPlan
*
* Initializes the query plan: open files, allocate storage
* and start up the rule manager
+ *
+ * If queryDesc contains a CachedPlan, this also takes locks on the
+ * relations in the plan. If any of those relations underwent a
+ * concurrent schema change between successfully performing
+ * RevalidateCachedQuery() on the containing CachedPlanSource and here,
+ * taking its lock here invalidates the CachedPlan by way of
+ * PlanCacheRelCallback(). In that case, queryDesc->plan_valid is set to
+ * false to tell the caller to retry after creating a new CachedPlan.
* ----------------------------------------------------------------
*/
static void
@@ -807,17 +864,21 @@ InitPlan(QueryDesc *queryDesc, int eflags)
int i;
/*
- * Do permissions checks and save the list for later use.
+ * initialize the node's execution state
*/
- ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
- estate->es_rteperminfos = plannedstmt->permInfos;
+ ExecInitRangeTable(estate, rangeTable);
+
+ if (eflags & EXEC_FLAG_GET_LOCKS)
+ ExecLockViewRelations(plannedstmt->viewRelations, estate, true);
/*
- * initialize the node's execution state
+ * Do permissions checks and save the list for later use.
*/
- ExecInitRangeTable(estate, rangeTable);
+ ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
+ estate->es_rteperminfos = plannedstmt->permInfos;
estate->es_plannedstmt = plannedstmt;
+ estate->es_cachedplan = queryDesc->cplan;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
/*
@@ -850,6 +911,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
case ROW_MARK_KEYSHARE:
case ROW_MARK_REFERENCE:
relation = ExecGetRangeTableRelation(estate, rc->rti);
+ if (!ExecPlanStillValid(estate))
+ goto failed;
break;
case ROW_MARK_COPY:
/* no physical table access is required */
@@ -917,6 +980,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
sp_eflags |= EXEC_FLAG_REWIND;
subplanstate = ExecInitNode(subplan, estate, sp_eflags);
+ if (!ExecPlanStillValid(estate))
+ goto failed;
estate->es_subplanstates = lappend(estate->es_subplanstates,
subplanstate);
@@ -930,6 +995,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
* processing tuples.
*/
planstate = ExecInitNode(plan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ goto failed;
/*
* Get the tuple descriptor describing the type of tuples to return.
@@ -973,6 +1040,19 @@ InitPlan(QueryDesc *queryDesc, int eflags)
queryDesc->tupDesc = tupType;
queryDesc->planstate = planstate;
+ queryDesc->plan_valid = true;
+ return;
+
+failed:
+ /*
+ * Plan initialization failed. Mark QueryDesc as such and release useless
+ * locks.
+ */
+ queryDesc->plan_valid = false;
+ if (eflags & EXEC_FLAG_GET_LOCKS)
+ ExecLockViewRelations(plannedstmt->viewRelations, estate, false);
+ /* Also ask ExecCloseRangeTableRelations() to release locks. */
+ estate->es_top_eflags |= EXEC_FLAG_REL_LOCKS;
}
/*
@@ -1389,7 +1469,7 @@ ExecGetAncestorResultRels(EState *estate, ResultRelInfo *resultRelInfo)
/*
* All ancestors up to the root target relation must have been
- * locked by the planner or AcquireExecutorLocks().
+ * locked.
*/
ancRel = table_open(ancOid, NoLock);
rInfo = makeNode(ResultRelInfo);
@@ -1558,7 +1638,8 @@ ExecCloseResultRelations(EState *estate)
/*
* Close all relations opened by ExecGetRangeTableRelation().
*
- * We do not release any locks we might hold on those rels.
+ * We do not release any locks we might hold on those rels, unless
+ * the caller asked otherwise.
*/
void
ExecCloseRangeTableRelations(EState *estate)
@@ -1567,8 +1648,12 @@ ExecCloseRangeTableRelations(EState *estate)
for (i = 0; i < estate->es_range_table_size; i++)
{
+ int lockmode = NoLock;
+
+ if (estate->es_top_eflags & EXEC_FLAG_REL_LOCKS)
+ lockmode = exec_rt_fetch(i+1, estate)->rellockmode;
if (estate->es_relations[i])
- table_close(estate->es_relations[i], NoLock);
+ table_close(estate->es_relations[i], lockmode);
}
}
@@ -2797,7 +2882,8 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
* Child EPQ EStates share the parent's copy of unchanging state such as
* the snapshot, rangetable, and external Param info. They need their own
* copies of local state, including a tuple table, es_param_exec_vals,
- * result-rel info, etc.
+ * result-rel info, etc. Also, we don't pass the parent's copy of the
+ * CachedPlan, because no new locks will be taken for EvalPlanQual().
*/
rcestate->es_direction = ForwardScanDirection;
rcestate->es_snapshot = parentestate->es_snapshot;
@@ -2883,6 +2969,7 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
PlanState *subplanstate;
subplanstate = ExecInitNode(subplan, rcestate, 0);
+ Assert(ExecPlanStillValid(rcestate));
rcestate->es_subplanstates = lappend(rcestate->es_subplanstates,
subplanstate);
}
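(Seen from the caller's side, the contract these execMain.c changes
establish is small; the spi.c hunk below exercises it literally, and it
amounts to the following sketch:)

/* Sketch of the caller-side contract; mirrors the spi.c hunk below. */
static bool
start_cached_plan_once(QueryDesc *queryDesc, int eflags)
{
	/* ExecutorStart() adds EXEC_FLAG_GET_LOCKS itself if cplan is set. */
	ExecutorStart(queryDesc, eflags);
	if (queryDesc->plan_valid)
		return true;			/* locks taken; safe to ExecutorRun() */

	/* InitPlan() bailed out and already arranged for lock release. */
	ExecutorFinish(queryDesc);
	ExecutorEnd(queryDesc);
	FreeQueryDesc(queryDesc);
	return false;				/* caller must build a new CachedPlan */
}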
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index aa3f283453..fe1d173501 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1249,8 +1249,13 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(¶mspace);
- /* Create a QueryDesc for the query. */
+ /*
+ * Create a QueryDesc for the query. Note that no CachedPlan is available
+ * here even though the plan tree may have come from one in the
+ * leader.
+ */
return CreateQueryDesc(pstmt,
+ NULL,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 651ad24fc1..a1bb1ac50f 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1813,6 +1813,8 @@ ExecInitPartitionPruning(PlanState *planstate,
/* Create the working data structure for pruning */
prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Perform an initial partition prune pass, if required.
@@ -1939,6 +1941,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* duration of this executor run.
*/
partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
partkey = RelationGetPartitionKey(partrel);
partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
partrel);
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 4d288bc8d4..bd0c2cba92 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -388,6 +388,9 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
break;
}
+ if (!ExecPlanStillValid(estate))
+ return NULL;
+
ExecSetExecProcNode(result, result->ExecProcNode);
/*
@@ -402,6 +405,8 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
Assert(IsA(subplan, SubPlan));
sstate = ExecInitSubPlan(subplan, result);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
subps = lappend(subps, sstate);
}
result->initPlan = subps;
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index c33a3c0bec..d5bd268514 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -800,7 +800,8 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rte->rtekind == RTE_RELATION);
- if (!IsParallelWorker())
+ if (!IsParallelWorker() &&
+ (estate->es_top_eflags & EXEC_FLAG_GET_LOCKS) == 0)
{
/*
* In a normal query, we should already have the appropriate lock,
@@ -844,6 +845,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
Relation resultRelationDesc;
resultRelationDesc = ExecGetRangeTableRelation(estate, rti);
+ if (!ExecPlanStillValid(estate))
+ return;
InitResultRelInfo(resultRelInfo,
resultRelationDesc,
rti,
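(Only the changed condition is visible in the hunk above. For context, the
resulting shape of ExecGetRangeTableRelation() is roughly as below; the
else branch is pre-existing code for parallel workers that the patch does
not quote:)

/* Sketch of the resulting logic inside ExecGetRangeTableRelation() */
if (!IsParallelWorker() &&
	(estate->es_top_eflags & EXEC_FLAG_GET_LOCKS) == 0)
{
	/* Normal case: the lock must already be held somewhere upstream. */
	rel = table_open(rte->relid, NoLock);
}
else
{
	/*
	 * Parallel worker, or a cached plan whose locking was deferred to
	 * executor startup: take the lock now.  For a cached plan, this is
	 * precisely the step that can fire PlanCacheRelCallback() and make
	 * ExecPlanStillValid() start returning false.
	 */
	rel = table_open(rte->relid, rte->rellockmode);
}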
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 50e06ec693..949bdfc837 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -843,6 +843,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest = None_Receiver;
es->qd = CreateQueryDesc(es->stmt,
+ NULL, /* fmgr_sql() doesn't use CachedPlans */
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 20d23696a5..f9b668dc01 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -3295,6 +3295,8 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
eflags &= ~EXEC_FLAG_REWIND;
outerPlan = outerPlan(node);
outerPlanState(aggstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* initialize source tuple type.
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index cb25499b3f..fd0ad98621 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -148,6 +148,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
node->part_prune_index,
node->apprelids,
&validsubplans);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
@@ -218,6 +220,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
firstvalid = j;
appendplanstates[j++] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
}
appendstate->as_first_partial_plan = firstvalid;
diff --git a/src/backend/executor/nodeBitmapAnd.c b/src/backend/executor/nodeBitmapAnd.c
index 4c5eb2b23b..98cbeb2502 100644
--- a/src/backend/executor/nodeBitmapAnd.c
+++ b/src/backend/executor/nodeBitmapAnd.c
@@ -89,6 +89,8 @@ ExecInitBitmapAnd(BitmapAnd *node, EState *estate, int eflags)
{
initNode = (Plan *) lfirst(l);
bitmapplanstates[i] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
i++;
}
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index f35df0b8bf..121b1afa5d 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -763,11 +763,15 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* initialize child nodes
*/
outerPlanState(scanstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* get the scan type from the relation descriptor.
diff --git a/src/backend/executor/nodeBitmapOr.c b/src/backend/executor/nodeBitmapOr.c
index 0bf8af9652..be736946f1 100644
--- a/src/backend/executor/nodeBitmapOr.c
+++ b/src/backend/executor/nodeBitmapOr.c
@@ -90,6 +90,8 @@ ExecInitBitmapOr(BitmapOr *node, EState *estate, int eflags)
{
initNode = (Plan *) lfirst(l);
bitmapplanstates[i] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
i++;
}
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
index bd42c65b29..91239cc500 100644
--- a/src/backend/executor/nodeCustom.c
+++ b/src/backend/executor/nodeCustom.c
@@ -61,6 +61,8 @@ ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
if (scanrelid > 0)
{
scan_rel = ExecOpenScanRelation(estate, scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
css->ss.ss_currentRelation = scan_rel;
}
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index c2139acca0..f130d5863d 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -173,6 +173,8 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
if (scanrelid > 0)
{
currentRelation = ExecOpenScanRelation(estate, scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
scanstate->ss.ss_currentRelation = currentRelation;
fdwroutine = GetFdwRoutineForRelation(currentRelation, true);
}
@@ -264,6 +266,8 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
if (outerPlan(node))
outerPlanState(scanstate) =
ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Tell the FDW to initialize the scan.
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 307fc10eea..4a7715b8cc 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -89,6 +89,8 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
*/
outerNode = outerPlan(node);
outerPlanState(gatherstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
tupDesc = ExecGetResultType(outerPlanState(gatherstate));
/*
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 9d5e1a46e9..9e383c96ff 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -108,6 +108,8 @@ ExecInitGatherMerge(GatherMerge *node, EState *estate, int eflags)
*/
outerNode = outerPlan(node);
outerPlanState(gm_state) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Leader may access ExecProcNode result directly (if
diff --git a/src/backend/executor/nodeGroup.c b/src/backend/executor/nodeGroup.c
index 25a1618952..87af2a92f9 100644
--- a/src/backend/executor/nodeGroup.c
+++ b/src/backend/executor/nodeGroup.c
@@ -185,6 +185,8 @@ ExecInitGroup(Group *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(grpstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize scan slot and type.
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index eceee99374..c8fedee777 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -379,6 +379,8 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(hashstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* initialize our result slot and type. No need to build projection
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index b215e3f59a..86420e8f17 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -659,8 +659,12 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
hashNode = (Hash *) innerPlan(node);
outerPlanState(hjstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
outerDesc = ExecGetResultType(outerPlanState(hjstate));
innerPlanState(hjstate) = ExecInitNode((Plan *) hashNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
innerDesc = ExecGetResultType(innerPlanState(hjstate));
/*
diff --git a/src/backend/executor/nodeIncrementalSort.c b/src/backend/executor/nodeIncrementalSort.c
index 12bc22f33c..0456ad779f 100644
--- a/src/backend/executor/nodeIncrementalSort.c
+++ b/src/backend/executor/nodeIncrementalSort.c
@@ -1041,6 +1041,8 @@ ExecInitIncrementalSort(IncrementalSort *node, EState *estate, int eflags)
* nodes may be able to do something more useful.
*/
outerPlanState(incrsortstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize scan slot and type.
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 0b43a9b969..e0aaeb5ebd 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -512,6 +512,8 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
indexstate->ss.ss_currentRelation = currentRelation;
indexstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 4540c7781d..5090ee39e0 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -925,6 +925,8 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
indexstate->ss.ss_currentRelation = currentRelation;
indexstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index 425fbfc405..d8789553e1 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -476,6 +476,8 @@ ExecInitLimit(Limit *node, EState *estate, int eflags)
*/
outerPlan = outerPlan(node);
outerPlanState(limitstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* initialize child expressions
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 407414fc0c..9104954bb1 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -323,6 +323,8 @@ ExecInitLockRows(LockRows *node, EState *estate, int eflags)
* then initialize outer plan
*/
outerPlanState(lrstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/* node returns unmodified slots from the outer plan */
lrstate->ps.resultopsset = true;
diff --git a/src/backend/executor/nodeMaterial.c b/src/backend/executor/nodeMaterial.c
index 09632678b0..6ef50d3960 100644
--- a/src/backend/executor/nodeMaterial.c
+++ b/src/backend/executor/nodeMaterial.c
@@ -214,6 +214,8 @@ ExecInitMaterial(Material *node, EState *estate, int eflags)
outerPlan = outerPlan(node);
outerPlanState(matstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize result type and slot. No need to initialize projection info
diff --git a/src/backend/executor/nodeMemoize.c b/src/backend/executor/nodeMemoize.c
index 74f7d21bc8..4ecc60a238 100644
--- a/src/backend/executor/nodeMemoize.c
+++ b/src/backend/executor/nodeMemoize.c
@@ -931,6 +931,8 @@ ExecInitMemoize(Memoize *node, EState *estate, int eflags)
outerNode = outerPlan(node);
outerPlanState(mstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize return slot and type. No need to initialize projection info
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 399b39c598..b12a02c028 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -96,6 +96,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
node->part_prune_index,
node->apprelids,
&validsubplans);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
@@ -152,6 +154,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
Plan *initNode = (Plan *) list_nth(node->mergeplans, i);
mergeplanstates[j++] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
}
mergestate->ps.ps_ProjInfo = NULL;
diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 809aa215c6..0157a7ff3c 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -1482,11 +1482,15 @@ ExecInitMergeJoin(MergeJoin *node, EState *estate, int eflags)
mergestate->mj_SkipMarkRestore = node->skip_mark_restore;
outerPlanState(mergestate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
outerDesc = ExecGetResultType(outerPlanState(mergestate));
innerPlanState(mergestate) = ExecInitNode(innerPlan(node), estate,
mergestate->mj_SkipMarkRestore ?
eflags :
(eflags | EXEC_FLAG_MARK));
+ if (!ExecPlanStillValid(estate))
+ return NULL;
innerDesc = ExecGetResultType(innerPlanState(mergestate));
/*
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 1ac65172e4..355964d103 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -4010,6 +4010,9 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
linitial_int(node->resultRelations));
}
+ if (!ExecPlanStillValid(estate))
+ return NULL;
+
/* set up epqstate with dummy subplan data for the moment */
EvalPlanQualInit(&mtstate->mt_epqstate, estate, NULL, NIL, node->epqParam);
mtstate->fireBSTriggers = true;
@@ -4036,6 +4039,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
if (resultRelInfo != mtstate->rootResultRelInfo)
{
ExecInitResultRelation(estate, resultRelInfo, resultRelation);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* For child result relations, store the root result relation
@@ -4063,6 +4068,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* Now we may initialize the subplan.
*/
outerPlanState(mtstate) = ExecInitNode(subplan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Do additional per-result-relation initialization.
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index b3d52e69ec..e4319f5c90 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -295,11 +295,15 @@ ExecInitNestLoop(NestLoop *node, EState *estate, int eflags)
* values.
*/
outerPlanState(nlstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
if (node->nestParams == NIL)
eflags |= EXEC_FLAG_REWIND;
else
eflags &= ~EXEC_FLAG_REWIND;
innerPlanState(nlstate) = ExecInitNode(innerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize result slot, type and projection.
diff --git a/src/backend/executor/nodeProjectSet.c b/src/backend/executor/nodeProjectSet.c
index f6ff3dc44c..a168cd68f6 100644
--- a/src/backend/executor/nodeProjectSet.c
+++ b/src/backend/executor/nodeProjectSet.c
@@ -247,6 +247,8 @@ ExecInitProjectSet(ProjectSet *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(state) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* we don't use inner plan
diff --git a/src/backend/executor/nodeRecursiveunion.c b/src/backend/executor/nodeRecursiveunion.c
index e781003934..3dae9b1497 100644
--- a/src/backend/executor/nodeRecursiveunion.c
+++ b/src/backend/executor/nodeRecursiveunion.c
@@ -244,7 +244,11 @@ ExecInitRecursiveUnion(RecursiveUnion *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(rustate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
innerPlanState(rustate) = ExecInitNode(innerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* If hashing, precompute fmgr lookup data for inner loop, and create the
diff --git a/src/backend/executor/nodeResult.c b/src/backend/executor/nodeResult.c
index 4219712d30..9da456be4a 100644
--- a/src/backend/executor/nodeResult.c
+++ b/src/backend/executor/nodeResult.c
@@ -208,6 +208,8 @@ ExecInitResult(Result *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(resstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* we don't use inner plan
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index d7e22b1dbb..22357e7a0e 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -125,6 +125,8 @@ ExecInitSampleScan(SampleScan *node, EState *estate, int eflags)
ExecOpenScanRelation(estate,
node->scan.scanrelid,
eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/* we won't set up the HeapScanDesc till later */
scanstate->ss.ss_currentScanDesc = NULL;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 4da0f28f7b..b0b34cd14e 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -153,6 +153,8 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
ExecOpenScanRelation(estate,
node->scan.scanrelid,
eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/* and create slot with the appropriate rowtype */
ExecInitScanTupleSlot(estate, &scanstate->ss,
diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index 4bc2406b89..2c350e6c24 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -528,6 +528,8 @@ ExecInitSetOp(SetOp *node, EState *estate, int eflags)
if (node->strategy == SETOP_HASHED)
eflags &= ~EXEC_FLAG_REWIND;
outerPlanState(setopstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
outerDesc = ExecGetResultType(outerPlanState(setopstate));
/*
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index c6c72c6e67..216a5afb40 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -263,6 +263,8 @@ ExecInitSort(Sort *node, EState *estate, int eflags)
eflags &= ~(EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK);
outerPlanState(sortstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize scan slot and type.
diff --git a/src/backend/executor/nodeSubqueryscan.c b/src/backend/executor/nodeSubqueryscan.c
index 42471bfc04..34afe14bea 100644
--- a/src/backend/executor/nodeSubqueryscan.c
+++ b/src/backend/executor/nodeSubqueryscan.c
@@ -124,6 +124,8 @@ ExecInitSubqueryScan(SubqueryScan *node, EState *estate, int eflags)
* initialize subquery
*/
subquerystate->subplan = ExecInitNode(node->subplan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize scan slot and type (needed by ExecAssignScanProjectionInfo)
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 2124c55ef5..613b377c7c 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -386,6 +386,8 @@ ExecInitTidRangeScan(TidRangeScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
tidrangestate->ss.ss_currentRelation = currentRelation;
tidrangestate->ss.ss_currentScanDesc = NULL; /* no table scan here */
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index fe6a964ee1..23782aad89 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -529,6 +529,8 @@ ExecInitTidScan(TidScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
tidstate->ss.ss_currentRelation = currentRelation;
tidstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
diff --git a/src/backend/executor/nodeUnique.c b/src/backend/executor/nodeUnique.c
index 45035d74fa..06257e9e51 100644
--- a/src/backend/executor/nodeUnique.c
+++ b/src/backend/executor/nodeUnique.c
@@ -136,6 +136,8 @@ ExecInitUnique(Unique *node, EState *estate, int eflags)
* then initialize outer plan
*/
outerPlanState(uniquestate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize result slot and type. Unique nodes do no projections, so
diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index d61d57e9a8..d8a9f1e94e 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -2450,6 +2450,8 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags)
*/
outerPlan = outerPlan(node);
outerPlanState(winstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* initialize source tuple type (which is also the tuple type that we'll
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 61f03e3999..38d76c6719 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -71,7 +71,7 @@ static int _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
static ParamListInfo _SPI_convert_params(int nargs, Oid *argtypes,
Datum *Values, const char *Nulls);
-static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount);
+static int _SPI_pquery(QueryDesc *queryDesc, uint64 tcount);
static void _SPI_error_callback(void *arg);
@@ -1623,6 +1623,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
_SPI_current->processed = 0;
_SPI_current->tuptable = NULL;
+replan:
/* Create the portal */
if (name == NULL || name[0] == '\0')
{
@@ -1766,7 +1767,8 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
}
/*
- * Start portal execution.
+ * Start portal execution. If the portal's cached plan turns out to have
+ * been invalidated, drop the portal and loop back to the replan label
+ * above to build a fresh plan.
*/
PortalStart(portal, paramLI, 0, snapshot);
@@ -1775,6 +1777,12 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
/* Pop the error context stack */
error_context_stack = spierrcontext.previous;
+ if (!portal->plan_valid)
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
+
/* Pop the SPI stack */
_SPI_end_call(true);
@@ -2548,6 +2556,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
+replan:
cplan = GetCachedPlan(plansource, options->params,
plan_owner, _SPI_current->queryEnv);
@@ -2657,6 +2666,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
QueryDesc *qdesc;
Snapshot snap;
+ int eflags;
if (ActiveSnapshotSet())
snap = GetActiveSnapshot();
@@ -2664,14 +2674,29 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
snap = InvalidSnapshot;
qdesc = CreateQueryDesc(stmt,
+ cplan,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
options->params,
_SPI_current->queryEnv,
0);
- res = _SPI_pquery(qdesc, fire_triggers,
- canSetTag ? options->tcount : 0);
+
+ /* Select execution options */
+ if (fire_triggers)
+ eflags = 0; /* default run-to-completion flags */
+ else
+ eflags = EXEC_FLAG_SKIP_TRIGGERS;
+ ExecutorStart(qdesc, eflags);
+ if (!qdesc->plan_valid)
+ {
+ ExecutorFinish(qdesc);
+ ExecutorEnd(qdesc);
+ FreeQueryDesc(qdesc);
+ goto replan;
+ }
+
+ res = _SPI_pquery(qdesc, canSetTag ? options->tcount : 0);
FreeQueryDesc(qdesc);
}
else
@@ -2846,10 +2871,9 @@ _SPI_convert_params(int nargs, Oid *argtypes,
}
static int
-_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
+_SPI_pquery(QueryDesc *queryDesc, uint64 tcount)
{
int operation = queryDesc->operation;
- int eflags;
int res;
switch (operation)
@@ -2893,14 +2917,6 @@ _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
ResetUsage();
#endif
- /* Select execution options */
- if (fire_triggers)
- eflags = 0; /* default run-to-completion flags */
- else
- eflags = EXEC_FLAG_SKIP_TRIGGERS;
-
- ExecutorStart(queryDesc, eflags);
-
ExecutorRun(queryDesc, ForwardScanDirection, tcount, true);
_SPI_current->processed = queryDesc->estate->es_processed;
diff --git a/src/backend/nodes/Makefile b/src/backend/nodes/Makefile
index af12c64878..7fb0d2d202 100644
--- a/src/backend/nodes/Makefile
+++ b/src/backend/nodes/Makefile
@@ -52,6 +52,7 @@ node_headers = \
access/tsmapi.h \
commands/event_trigger.h \
commands/trigger.h \
+ executor/execdesc.h \
executor/tuptable.h \
foreign/fdwapi.h \
nodes/bitmapset.h \
diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl
index 19ed29657c..69e60206ba 100644
--- a/src/backend/nodes/gen_node_support.pl
+++ b/src/backend/nodes/gen_node_support.pl
@@ -63,6 +63,7 @@ my @all_input_files = qw(
access/tsmapi.h
commands/event_trigger.h
commands/trigger.h
+ executor/execdesc.h
executor/tuptable.h
foreign/fdwapi.h
nodes/bitmapset.h
@@ -87,6 +88,7 @@ my @nodetag_only_files = qw(
access/tsmapi.h
commands/event_trigger.h
commands/trigger.h
+ executor/execdesc.h
executor/tuptable.h
foreign/fdwapi.h
nodes/lockoptions.h
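(These build hunks, together with the Makefile one above, teach the
node-support machinery about execdesc.h, so that T_QueryDesc exists for the
qd->type assignment in the pquery.c hunk below. The execdesc.h hunk itself
is not quoted; presumably the head of the struct now looks roughly like
this -- the field placement is a guess:)

typedef struct QueryDesc
{
	NodeTag		type;			/* new: node tag, so IsA() etc. work */
	CmdType		operation;		/* CMD_SELECT, CMD_UPDATE, etc. */
	PlannedStmt *plannedstmt;	/* planner's output */
	CachedPlan *cplan;			/* new: CachedPlan this came from, if any */
	bool		plan_valid;		/* new: false if ExecutorStart() found the
								 * CachedPlan to have been invalidated */
	const char *sourceText;		/* source text of the query */
	/* ... remaining fields unchanged ... */
} QueryDesc;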
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index db5ff6fdca..670eba3a3a 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -527,6 +527,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->partPruneInfos = glob->partPruneInfos;
result->rtable = glob->finalrtable;
result->permInfos = glob->finalrteperminfos;
+ result->viewRelations = glob->viewRelations;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 186fc8014b..454e30e0ca 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -16,6 +16,7 @@
#include "postgres.h"
#include "access/transam.h"
+#include "catalog/pg_class.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -599,6 +600,10 @@ add_rte_to_flat_rtable(PlannerGlobal *glob, List *rteperminfos,
(newrte->rtekind == RTE_SUBQUERY && OidIsValid(newrte->relid)))
glob->relationOids = lappend_oid(glob->relationOids, newrte->relid);
+ if (newrte->relkind == RELKIND_VIEW)
+ glob->viewRelations = lappend_int(glob->viewRelations,
+ list_length(glob->finalrtable));
+
/*
* Add a copy of the RTEPermissionInfo, if any, corresponding to this RTE
* to the flattened global list.
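(The index recorded here is 1-based: the RTE is appended to
glob->finalrtable just above this hunk, so list_length(glob->finalrtable)
is its position. For a query such as SELECT * FROM some_view, the rewriter
leaves the view behind as an RTE_SUBQUERY entry that still carries relid
and relkind -- see the rewriteHandler.c hunk just below -- so this branch
records its RT index for ExecLockViewRelations() to resolve later via
exec_rt_fetch(), which for reference is essentially:)

/* Existing helper, shown for reference; not part of the patch */
static inline RangeTblEntry *
exec_rt_fetch(Index rti, EState *estate)
{
	return (RangeTblEntry *) list_nth(estate->es_range_table, rti - 1);
}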
diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index c74bac20b1..29d13e95db 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -1834,11 +1834,10 @@ ApplyRetrieveRule(Query *parsetree,
/*
* Clear fields that should not be set in a subquery RTE. Note that we
- * leave the relid, rellockmode, and perminfoindex fields set, so that the
- * view relation can be appropriately locked before execution and its
- * permissions checked.
+ * leave the relid, relkind, rellockmode, and perminfoindex fields set,
+ * so that the view relation can be appropriately locked before execution
+ * and its permissions checked.
*/
- rte->relkind = 0;
rte->tablesample = NULL;
rte->inh = false; /* must not be set for a subquery */
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index ee9b89a672..c807e9cdcc 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -27,6 +27,7 @@
#include "storage/procarray.h"
#include "storage/sinvaladt.h"
#include "utils/inval.h"
+#include "utils/lsyscache.h"
/*
@@ -364,6 +365,50 @@ CheckRelationLockedByMe(Relation relation, LOCKMODE lockmode, bool orstronger)
return false;
}
+/*
+ * CheckRelLockedByMe
+ *
+ * Returns true if current transaction holds a lock on the given relation of
+ * mode 'lockmode'. If 'orstronger' is true, a stronger lockmode is also OK.
+ * ("Stronger" is defined as "numerically higher", which is a bit
+ * semantically dubious but is OK for the purposes we use this for.)
+ */
+bool
+CheckRelLockedByMe(Oid relid, LOCKMODE lockmode, bool orstronger)
+{
+ Oid dbId = get_rel_relisshared(relid) ? InvalidOid : MyDatabaseId;
+ LOCKTAG tag;
+
+ SET_LOCKTAG_RELATION(tag, dbId, relid);
+
+ if (LockHeldByMe(&tag, lockmode))
+ return true;
+
+ if (orstronger)
+ {
+ LOCKMODE slockmode;
+
+ for (slockmode = lockmode + 1;
+ slockmode <= MaxLockMode;
+ slockmode++)
+ {
+ if (LockHeldByMe(&tag, slockmode))
+ {
+#ifdef NOT_USED
+ /* Sometimes this might be useful for debugging purposes */
+ elog(WARNING, "lock mode %s substituted for %s on relation %s",
+ GetLockmodeName(tag.locktag_lockmethodid, slockmode),
+ GetLockmodeName(tag.locktag_lockmethodid, lockmode),
+ get_rel_name(relid));
+#endif
+ return true;
+ }
+ }
+ }
+
+ return false;
+}
+
/*
* LockHasWaitersRelation
*
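(CheckRelLockedByMe() is the OID-taking sibling of CheckRelationLockedByMe()
just above it; the new ExecCheckPermissions() assertion needs it because no
Relation is open at that point, and get_rel_relisshared() must be consulted
because shared catalogs use InvalidOid as the database part of their lock
tags. A hypothetical use:)

/* Hypothetical example, not part of the patch */
static void
assert_locked(Oid relid)
{
	/* Works for shared catalogs such as pg_database, too. */
	if (!CheckRelLockedByMe(relid, AccessShareLock, true))
		elog(ERROR, "relation %u is not locked by this backend", relid);
}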
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 470b734e9e..34d3f4ff8d 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1196,6 +1196,7 @@ exec_simple_query(const char *query_string)
* Start the portal. No parameters here.
*/
PortalStart(portal, NULL, 0, InvalidSnapshot);
+ Assert(portal->plan_valid);
/*
* Select the appropriate output format: text unless we are doing a
@@ -1700,6 +1701,7 @@ exec_bind_message(StringInfo input_message)
"commands ignored until end of transaction block"),
errdetail_abort()));
+replan:
/*
* Create the portal. Allow silent replacement of an existing portal only
* if the unnamed portal is specified.
@@ -1995,6 +1997,12 @@ exec_bind_message(StringInfo input_message)
*/
PortalStart(portal, params, 0, InvalidSnapshot);
+ if (!portal->plan_valid)
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
+
/*
* Apply the result format requests to the portal.
*/
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5f0248acc5..cf3a9790d6 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -19,6 +19,7 @@
#include "access/xact.h"
#include "commands/prepare.h"
+#include "executor/execdesc.h"
#include "executor/tstoreReceiver.h"
#include "miscadmin.h"
#include "pg_trace.h"
@@ -35,12 +36,6 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
- const char *sourceText,
- ParamListInfo params,
- QueryEnvironment *queryEnv,
- DestReceiver *dest,
- QueryCompletion *qc);
static void FillPortalStore(Portal portal, bool isTopLevel);
static uint64 RunFromStore(Portal portal, ScanDirection direction, uint64 count,
DestReceiver *dest);
@@ -65,6 +60,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -75,8 +71,10 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
{
QueryDesc *qd = (QueryDesc *) palloc(sizeof(QueryDesc));
+ qd->type = T_QueryDesc;
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->cplan = cplan; /* CachedPlan, if plan is from one */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -116,86 +114,6 @@ FreeQueryDesc(QueryDesc *qdesc)
}
-/*
- * ProcessQuery
- * Execute a single plannable query within a PORTAL_MULTI_QUERY,
- * PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
- *
- * plan: the plan tree for the query
- * sourceText: the source text of the query
- * params: any parameters needed
- * dest: where to send results
- * qc: where to store the command completion status data.
- *
- * qc may be NULL if caller doesn't want a status string.
- *
- * Must be called in a memory context that will be reset or deleted on
- * error; otherwise the executor's memory usage will be leaked.
- */
-static void
-ProcessQuery(PlannedStmt *plan,
- const char *sourceText,
- ParamListInfo params,
- QueryEnvironment *queryEnv,
- DestReceiver *dest,
- QueryCompletion *qc)
-{
- QueryDesc *queryDesc;
-
- /*
- * Create the QueryDesc object
- */
- queryDesc = CreateQueryDesc(plan, sourceText,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, 0);
-
- /*
- * Call ExecutorStart to prepare the plan for execution
- */
- ExecutorStart(queryDesc, 0);
-
- /*
- * Run the plan to completion.
- */
- ExecutorRun(queryDesc, ForwardScanDirection, 0L, true);
-
- /*
- * Build command completion status data, if caller wants one.
- */
- if (qc)
- {
- switch (queryDesc->operation)
- {
- case CMD_SELECT:
- SetQueryCompletion(qc, CMDTAG_SELECT, queryDesc->estate->es_processed);
- break;
- case CMD_INSERT:
- SetQueryCompletion(qc, CMDTAG_INSERT, queryDesc->estate->es_processed);
- break;
- case CMD_UPDATE:
- SetQueryCompletion(qc, CMDTAG_UPDATE, queryDesc->estate->es_processed);
- break;
- case CMD_DELETE:
- SetQueryCompletion(qc, CMDTAG_DELETE, queryDesc->estate->es_processed);
- break;
- case CMD_MERGE:
- SetQueryCompletion(qc, CMDTAG_MERGE, queryDesc->estate->es_processed);
- break;
- default:
- SetQueryCompletion(qc, CMDTAG_UNKNOWN, queryDesc->estate->es_processed);
- break;
- }
- }
-
- /*
- * Now, we close down all the scans and free allocated resources.
- */
- ExecutorFinish(queryDesc);
- ExecutorEnd(queryDesc);
-
- FreeQueryDesc(queryDesc);
-}
-
/*
* ChoosePortalStrategy
* Select portal execution strategy given the intended statement list.
@@ -427,7 +345,8 @@ FetchStatementTargetList(Node *stmt)
* to be used for cursors).
*
* On return, portal is ready to accept PortalRun() calls, and the result
- * tupdesc (if any) is known.
+ * tupdesc (if any) is known, unless portal->plan_valid is set to false, in
+ * which case the caller must retry after generating a new CachedPlan.
*/
void
PortalStart(Portal portal, ParamListInfo params,
@@ -435,7 +354,6 @@ PortalStart(Portal portal, ParamListInfo params,
{
Portal saveActivePortal;
ResourceOwner saveResourceOwner;
- MemoryContext savePortalContext;
MemoryContext oldContext;
QueryDesc *queryDesc;
int myeflags;
@@ -448,15 +366,13 @@ PortalStart(Portal portal, ParamListInfo params,
*/
saveActivePortal = ActivePortal;
saveResourceOwner = CurrentResourceOwner;
- savePortalContext = PortalContext;
PG_TRY();
{
ActivePortal = portal;
if (portal->resowner)
CurrentResourceOwner = portal->resowner;
- PortalContext = portal->portalContext;
- oldContext = MemoryContextSwitchTo(PortalContext);
+ oldContext = MemoryContextSwitchTo(portal->queryContext);
/* Must remember portal param list, if any */
portal->portalParams = params;
@@ -472,6 +388,8 @@ PortalStart(Portal portal, ParamListInfo params,
switch (portal->strategy)
{
case PORTAL_ONE_SELECT:
+ case PORTAL_ONE_RETURNING:
+ case PORTAL_ONE_MOD_WITH:
/* Must set snapshot before starting executor. */
if (snapshot)
@@ -493,6 +411,7 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->cplan,
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -501,30 +420,50 @@ PortalStart(Portal portal, ParamListInfo params,
portal->queryEnv,
0);
+ /* Remember for PortalRunMulti() */
+ portal->qdescs = lappend(portal->qdescs, queryDesc);
+
/*
* If it's a scrollable cursor, executor needs to support
* REWIND and backwards scan, as well as whatever the caller
* might've asked for.
*/
- if (portal->cursorOptions & CURSOR_OPT_SCROLL)
+ if (portal->strategy == PORTAL_ONE_SELECT &&
+ (portal->cursorOptions & CURSOR_OPT_SCROLL))
myeflags = eflags | EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD;
else
myeflags = eflags;
/*
- * Call ExecutorStart to prepare the plan for execution
+ * Call ExecutorStart to prepare the plan for execution. A
+ * cached plan may get invalidated as we're doing that.
*/
ExecutorStart(queryDesc, myeflags);
+ if (!queryDesc->plan_valid)
+ {
+ Assert(queryDesc->cplan);
+ PortalQueryFinish(queryDesc);
+ PopActiveSnapshot();
+ portal->plan_valid = false;
+ goto early_exit;
+ }
/*
- * This tells PortalCleanup to shut down the executor
+ * This tells PortalCleanup to shut down the executor; that is not
+ * needed for queries handled by PortalRunMulti().
*/
- portal->queryDesc = queryDesc;
+ if (portal->strategy == PORTAL_ONE_SELECT)
+ portal->queryDesc = queryDesc;
/*
- * Remember tuple descriptor (computed by ExecutorStart)
+ * Remember tuple descriptor (computed by ExecutorStart),
+ * though make it independent of QueryDesc for queries handled
+ * by PortalRunMulti().
*/
- portal->tupDesc = queryDesc->tupDesc;
+ if (portal->strategy != PORTAL_ONE_SELECT)
+ portal->tupDesc = CreateTupleDescCopy(queryDesc->tupDesc);
+ else
+ portal->tupDesc = queryDesc->tupDesc;
/*
* Reset cursor position data to "start of query"
@@ -532,33 +471,11 @@ PortalStart(Portal portal, ParamListInfo params,
portal->atStart = true;
portal->atEnd = false; /* allow fetches */
portal->portalPos = 0;
+ portal->plan_valid = true;
PopActiveSnapshot();
break;
- case PORTAL_ONE_RETURNING:
- case PORTAL_ONE_MOD_WITH:
-
- /*
- * We don't start the executor until we are told to run the
- * portal. We do need to set up the result tupdesc.
- */
- {
- PlannedStmt *pstmt;
-
- pstmt = PortalGetPrimaryStmt(portal);
- portal->tupDesc =
- ExecCleanTypeFromTL(pstmt->planTree->targetlist);
- }
-
- /*
- * Reset cursor position data to "start of query"
- */
- portal->atStart = true;
- portal->atEnd = false; /* allow fetches */
- portal->portalPos = 0;
- break;
-
case PORTAL_UTIL_SELECT:
/*
@@ -578,11 +495,69 @@ PortalStart(Portal portal, ParamListInfo params,
portal->atStart = true;
portal->atEnd = false; /* allow fetches */
portal->portalPos = 0;
+ portal->plan_valid = true;
break;
case PORTAL_MULTI_QUERY:
- /* Need do nothing now */
+ {
+ ListCell *lc;
+ bool pushed_active_snapshot = false;
+
+ foreach(lc, portal->stmts)
+ {
+ PlannedStmt *plan = lfirst_node(PlannedStmt, lc);
+ bool is_utility = (plan->utilityStmt != NULL);
+
+ /* Must set snapshot before starting executor. */
+ if (!pushed_active_snapshot && !is_utility)
+ {
+ PushActiveSnapshot(GetTransactionSnapshot());
+ pushed_active_snapshot = true;
+ }
+
+ /*
+ * Create the QueryDesc object. DestReceiver will
+ * be set in PortalRunMulti().
+ */
+ queryDesc = CreateQueryDesc(plan, portal->cplan,
+ portal->sourceText,
+ pushed_active_snapshot ?
+ GetActiveSnapshot() :
+ InvalidSnapshot,
+ InvalidSnapshot,
+ NULL,
+ params,
+ portal->queryEnv, 0);
+
+ /* Remember for PortalRunMulti() */
+ portal->qdescs = lappend(portal->qdescs, queryDesc);
+
+ /*
+ * Call ExecutorStart to prepare the plan for
+ * execution. A cached plan may get invalidated as
+ * we're doing that.
+ */
+ if (!is_utility)
+ {
+ ExecutorStart(queryDesc, 0);
+ if (!queryDesc->plan_valid)
+ {
+ Assert(queryDesc->cplan);
+ PortalQueryFinish(queryDesc);
+ if (pushed_active_snapshot)
+ PopActiveSnapshot();
+ portal->plan_valid = false;
+ goto early_exit;
+ }
+ }
+ }
+
+ if (pushed_active_snapshot)
+ PopActiveSnapshot();
+ }
+
portal->tupDesc = NULL;
+ portal->plan_valid = true;
break;
}
}
@@ -594,19 +569,18 @@ PortalStart(Portal portal, ParamListInfo params,
/* Restore global vars and propagate error */
ActivePortal = saveActivePortal;
CurrentResourceOwner = saveResourceOwner;
- PortalContext = savePortalContext;
PG_RE_THROW();
}
PG_END_TRY();
+ portal->status = PORTAL_READY;
+
+early_exit:
MemoryContextSwitchTo(oldContext);
ActivePortal = saveActivePortal;
CurrentResourceOwner = saveResourceOwner;
- PortalContext = savePortalContext;
-
- portal->status = PORTAL_READY;
}
/*
@@ -1193,7 +1167,7 @@ PortalRunMulti(Portal portal,
QueryCompletion *qc)
{
bool active_snapshot_set = false;
- ListCell *stmtlist_item;
+ ListCell *qdesc_item;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1214,9 +1188,10 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
- foreach(stmtlist_item, portal->stmts)
+ foreach(qdesc_item, portal->qdescs)
{
- PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ QueryDesc *qdesc = lfirst_node(QueryDesc, qdesc_item);
+ PlannedStmt *pstmt = qdesc->plannedstmt;
/*
* If we got a cancel signal in prior command, quit
@@ -1241,7 +1216,7 @@ PortalRunMulti(Portal portal,
*/
if (!active_snapshot_set)
{
- Snapshot snapshot = GetTransactionSnapshot();
+ Snapshot snapshot = qdesc->snapshot;
/* If told to, register the snapshot and save in portal */
if (setHoldSnapshot)
@@ -1271,23 +1246,38 @@ PortalRunMulti(Portal portal,
else
UpdateActiveSnapshotCommandId();
+ /*
+ * Run the plan to completion.
+ */
+ qdesc->dest = dest;
+ ExecutorRun(qdesc, ForwardScanDirection, 0L, true);
+
+ /*
+ * Build command completion status data if needed.
+ */
if (pstmt->canSetTag)
{
- /* statement can set tag string */
- ProcessQuery(pstmt,
- portal->sourceText,
- portal->portalParams,
- portal->queryEnv,
- dest, qc);
- }
- else
- {
- /* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
- portal->sourceText,
- portal->portalParams,
- portal->queryEnv,
- altdest, NULL);
+ switch (qdesc->operation)
+ {
+ case CMD_SELECT:
+ SetQueryCompletion(qc, CMDTAG_SELECT, qdesc->estate->es_processed);
+ break;
+ case CMD_INSERT:
+ SetQueryCompletion(qc, CMDTAG_INSERT, qdesc->estate->es_processed);
+ break;
+ case CMD_UPDATE:
+ SetQueryCompletion(qc, CMDTAG_UPDATE, qdesc->estate->es_processed);
+ break;
+ case CMD_DELETE:
+ SetQueryCompletion(qc, CMDTAG_DELETE, qdesc->estate->es_processed);
+ break;
+ case CMD_MERGE:
+ SetQueryCompletion(qc, CMDTAG_MERGE, qdesc->estate->es_processed);
+ break;
+ default:
+ SetQueryCompletion(qc, CMDTAG_UNKNOWN, qdesc->estate->es_processed);
+ break;
+ }
}
if (log_executor_stats)
@@ -1346,8 +1336,19 @@ PortalRunMulti(Portal portal,
* Increment command counter between queries, but not after the last
* one.
*/
- if (lnext(portal->stmts, stmtlist_item) != NULL)
+ if (lnext(portal->qdescs, qdesc_item) != NULL)
CommandCounterIncrement();
+
+ /* portal->queryDesc is free'd by PortalCleanup(). */
+ if (qdesc != portal->queryDesc)
+ {
+ if (qdesc->estate)
+ {
+ ExecutorFinish(qdesc);
+ ExecutorEnd(qdesc);
+ }
+ FreeQueryDesc(qdesc);
+ }
}
/* Pop the snapshot if we pushed one. */
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index c07382051d..38ae43e24b 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -2073,6 +2073,27 @@ get_rel_persistence(Oid relid)
return result;
}
+/*
+ * get_rel_relisshared
+ *
+ * Returns whether the given relation is shared
+ */
+bool
+get_rel_relisshared(Oid relid)
+{
+ HeapTuple tp;
+ Form_pg_class reltup;
+ bool result;
+
+ tp = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+ if (!HeapTupleIsValid(tp))
+ elog(ERROR, "cache lookup failed for relation %u", relid);
+ reltup = (Form_pg_class) GETSTRUCT(tp);
+ result = reltup->relisshared;
+ ReleaseSysCache(tp);
+
+ return result;
+}
/* ---------- TRANSFORM CACHE ---------- */
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 77c2ba3f8f..4e455d815f 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -100,13 +100,13 @@ static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool GenericPlanIsValid(CachedPlan *cplan);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -787,9 +787,6 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
- *
- * On a "true" return, we have acquired the locks needed to run the plan.
- * (We must do this for the "true" result to be race-condition-free.)
*/
static bool
CheckCachedPlan(CachedPlanSource *plansource)
@@ -803,60 +800,56 @@ CheckCachedPlan(CachedPlanSource *plansource)
if (!plan)
return false;
- Assert(plan->magic == CACHEDPLAN_MAGIC);
- /* Generic plans are never one-shot */
- Assert(!plan->is_oneshot);
+ if (GenericPlanIsValid(plan))
+ return true;
/*
- * If plan isn't valid for current role, we can't use it.
+ * Plan has been invalidated, so unlink it from the parent and release it.
*/
- if (plan->is_valid && plan->dependsOnRole &&
- plan->planRoleId != GetUserId())
- plan->is_valid = false;
+ ReleaseGenericPlan(plansource);
- /*
- * If it appears valid, acquire locks and recheck; this is much the same
- * logic as in RevalidateCachedQuery, but for a plan.
- */
- if (plan->is_valid)
+ return false;
+}
+
+/*
+ * GenericPlanIsValid
+ * Is a generic plan still valid?
+ *
+ * It may have gone stale due to concurrent schema modifications of relations
+ * mentioned in the plan, or for a couple of other reasons checked below.
+ */
+static bool
+GenericPlanIsValid(CachedPlan *cplan)
+{
+ Assert(cplan != NULL);
+ Assert(cplan->magic == CACHEDPLAN_MAGIC);
+ /* Generic plans are never one-shot */
+ Assert(!cplan->is_oneshot);
+
+ if (cplan->is_valid)
{
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
- Assert(plan->refcount > 0);
-
- AcquireExecutorLocks(plan->stmt_list, true);
+ Assert(cplan->refcount > 0);
/*
- * If plan was transient, check to see if TransactionXmin has
- * advanced, and if so invalidate it.
+ * If plan isn't valid for current role, we can't use it.
*/
- if (plan->is_valid &&
- TransactionIdIsValid(plan->saved_xmin) &&
- !TransactionIdEquals(plan->saved_xmin, TransactionXmin))
- plan->is_valid = false;
+ if (cplan->dependsOnRole && cplan->planRoleId != GetUserId())
+ cplan->is_valid = false;
/*
- * By now, if any invalidation has happened, the inval callback
- * functions will have marked the plan invalid.
+ * If plan was transient, check to see if TransactionXmin has
+ * advanced, and if so invalidate it.
*/
- if (plan->is_valid)
- {
- /* Successfully revalidated and locked the query. */
- return true;
- }
-
- /* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ if (TransactionIdIsValid(cplan->saved_xmin) &&
+ !TransactionIdEquals(cplan->saved_xmin, TransactionXmin))
+ cplan->is_valid = false;
}
- /*
- * Plan has been invalidated, so unlink it from the parent and release it.
- */
- ReleaseGenericPlan(plansource);
-
- return false;
+ return cplan->is_valid;
}
/*
@@ -1126,9 +1119,6 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
- * On return, the plan is valid and we have sufficient locks to begin
- * execution.
- *
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
* the refcount has been reported to that ResourceOwner (note that this
@@ -1360,8 +1350,8 @@ CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
}
/*
- * Reject if AcquireExecutorLocks would have anything to do. This is
- * probably unnecessary given the previous check, but let's be safe.
+ * Reject if the executor would need to take any locks beyond those taken
+ * by AcquirePlannerLocks() on a given query.
*/
foreach(lc, plan->stmt_list)
{
@@ -1735,58 +1725,6 @@ QueryListGetPrimaryStmt(List *stmts)
return NULL;
}
-/*
- * AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
- */
-static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
-{
- ListCell *lc1;
-
- foreach(lc1, stmt_list)
- {
- PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
-
- if (plannedstmt->commandType == CMD_UTILITY)
- {
- /*
- * Ignore utility statements, except those (such as EXPLAIN) that
- * contain a parsed-but-not-planned query. Note: it's okay to use
- * ScanQueryForLocks, even though the query hasn't been through
- * rule rewriting, because rewriting doesn't change the query
- * representation.
- */
- Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
-
- if (query)
- ScanQueryForLocks(query, acquire);
- continue;
- }
-
- foreach(lc2, plannedstmt->rtable)
- {
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
-
- if (!(rte->rtekind == RTE_RELATION ||
- (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
- continue;
-
- /*
- * Acquire the appropriate type of lock on each relation OID. Note
- * that we don't actually try to open the rel, and hence will not
- * fail if it's been dropped entirely --- we'll just transiently
- * acquire a non-conflicting lock.
- */
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
- }
- }
-}
-
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 06dfa85f04..3ad80c7ecb 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -201,6 +201,10 @@ CreatePortal(const char *name, bool allowDup, bool dupSilent)
portal->portalContext = AllocSetContextCreate(TopPortalContext,
"PortalContext",
ALLOCSET_SMALL_SIZES);
+ /* initialize portal's query context to store QueryDescs */
+ portal->queryContext = AllocSetContextCreate(TopPortalContext,
+ "PortalQueryContext",
+ ALLOCSET_SMALL_SIZES);
/* create a resource owner for the portal */
portal->resowner = ResourceOwnerCreate(CurTransactionResourceOwner,
@@ -224,6 +228,7 @@ CreatePortal(const char *name, bool allowDup, bool dupSilent)
/* for named portals reuse portal->name copy */
MemoryContextSetIdentifier(portal->portalContext, portal->name[0] ? portal->name : "<unnamed>");
+ MemoryContextSetIdentifier(portal->queryContext, portal->name[0] ? portal->name : "<unnamed>");
return portal;
}
@@ -594,6 +599,7 @@ PortalDrop(Portal portal, bool isTopCommit)
/* release subsidiary storage */
MemoryContextDelete(portal->portalContext);
+ MemoryContextDelete(portal->queryContext);
/* release portal struct (it's in TopPortalContext) */
pfree(portal);
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 7c1071ddd1..da39b2e4ff 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,11 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern QueryDesc *ExplainQueryDesc(PlannedStmt *stmt, struct CachedPlan *cplan,
+ const char *queryString, IntoClause *into, ExplainState *es,
+ ParamListInfo params, QueryEnvironment *queryEnv);
+extern void ExplainOnePlan(QueryDesc *queryDesc,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
@@ -103,6 +107,7 @@ extern void ExplainQueryParameters(ExplainState *es, ParamListInfo params, int m
extern void ExplainBeginOutput(ExplainState *es);
extern void ExplainEndOutput(ExplainState *es);
+extern void ExplainResetOutput(ExplainState *es);
extern void ExplainSeparatePlans(ExplainState *es);
extern void ExplainPropertyList(const char *qlabel, List *data,
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index af2bf36dfb..c36c25b497 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -32,9 +32,12 @@
*/
typedef struct QueryDesc
{
+ NodeTag type;
+
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ struct CachedPlan *cplan; /* CachedPlan, if plannedstmt is from one */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -47,6 +50,7 @@ typedef struct QueryDesc
TupleDesc tupDesc; /* descriptor for result tuples */
EState *estate; /* executor's query-wide state */
PlanState *planstate; /* tree of per-plan-node state */
+ bool plan_valid; /* is planstate tree fully valid? */
/* This field is set by ExecutorRun */
bool already_executed; /* true if previously executed */
@@ -57,6 +61,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ struct CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index e7e25c057e..8c680358e8 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -19,6 +19,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/plancache.h"
/*
@@ -59,6 +60,10 @@
#define EXEC_FLAG_MARK 0x0008 /* need mark/restore */
#define EXEC_FLAG_SKIP_TRIGGERS 0x0010 /* skip AfterTrigger calls */
#define EXEC_FLAG_WITH_NO_DATA 0x0020 /* rel scannability doesn't matter */
+#define EXEC_FLAG_GET_LOCKS 0x0400 /* should ExecGetRangeTableRelation
+ * lock relations? */
+#define EXEC_FLAG_REL_LOCKS 0x8000 /* should ExecCloseRangeTableRelations
+ * release locks? */
/* Hook for plugins to get control in ExecutorStart() */
@@ -245,6 +250,13 @@ extern void ExecEndNode(PlanState *node);
extern void ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/* Is the cached plan, if any, still valid? */
+static inline bool
+ExecPlanStillValid(EState *estate)
+{
+ return estate->es_cachedplan == NULL ? true :
+ CachedPlanStillValid(estate->es_cachedplan);
+}
/* ----------------------------------------------------------------
* ExecProcNode
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 20f4c8b35f..89f5a627c8 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -619,6 +619,8 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ struct CachedPlan *es_cachedplan; /* CachedPlan if plannedstmt is from
+ * one */
List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
const char *es_sourceText; /* Source text from QueryDesc */
diff --git a/src/include/nodes/meson.build b/src/include/nodes/meson.build
index efe0834afb..a8fdd9e176 100644
--- a/src/include/nodes/meson.build
+++ b/src/include/nodes/meson.build
@@ -13,6 +13,7 @@ node_support_input_i = [
'access/tsmapi.h',
'commands/event_trigger.h',
'commands/trigger.h',
+ 'executor/execdesc.h',
'executor/tuptable.h',
'foreign/fdwapi.h',
'nodes/bitmapset.h',
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 0d4b1ec4e4..71004fee75 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -116,6 +116,9 @@ typedef struct PlannerGlobal
/* "flat" list of RTEPermissionInfos */
List *finalrteperminfos;
+ /* "flat" list of integer RT indexes */
+ List *viewRelations;
+
/* "flat" list of PlanRowMarks */
List *finalrowmarks;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 4781a9c632..da9e73fb16 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -78,6 +78,9 @@ typedef struct PlannedStmt
List *permInfos; /* list of RTEPermissionInfo nodes for rtable
* entries needing one */
+ List *viewRelations; /* integer list of RT indexes, or NIL if no
+ * views are queried */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE/MERGE */
List *resultRelations; /* integer list of RT indexes, or NIL */
diff --git a/src/include/storage/lmgr.h b/src/include/storage/lmgr.h
index 4ee91e3cf9..598bf2688a 100644
--- a/src/include/storage/lmgr.h
+++ b/src/include/storage/lmgr.h
@@ -48,6 +48,7 @@ extern bool ConditionalLockRelation(Relation relation, LOCKMODE lockmode);
extern void UnlockRelation(Relation relation, LOCKMODE lockmode);
extern bool CheckRelationLockedByMe(Relation relation, LOCKMODE lockmode,
bool orstronger);
+extern bool CheckRelLockedByMe(Oid relid, LOCKMODE lockmode, bool orstronger);
extern bool LockHasWaitersRelation(Relation relation, LOCKMODE lockmode);
extern void LockRelationIdForSession(LockRelId *relid, LOCKMODE lockmode);
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index 4f5418b972..3074e604dd 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -139,6 +139,7 @@ extern char get_rel_relkind(Oid relid);
extern bool get_rel_relispartition(Oid relid);
extern Oid get_rel_tablespace(Oid relid);
extern char get_rel_persistence(Oid relid);
+extern bool get_rel_relisshared(Oid relid);
extern Oid get_transform_fromsql(Oid typid, Oid langid, List *trftypes);
extern Oid get_transform_tosql(Oid typid, Oid langid, List *trftypes);
extern bool get_typisdefined(Oid typid);
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index a443181d41..c2e485ac2c 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -221,6 +221,20 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
QueryEnvironment *queryEnv);
+
+/*
+ * CachedPlanStillValid
+ * Returns if a cached generic plan is still valid
+ *
+ * Called by the executor after it has finished taking locks on a plan tree
+ * in a CachedPlan.
+ */
+static inline bool
+CachedPlanStillValid(CachedPlan *cplan)
+{
+ return cplan->is_valid;
+}
+
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aa08b1e0fc..332a08ccb4 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,9 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *qdescs; /* list of QueryDescs */
+ bool plan_valid; /* are plan(s) ready for execution? */
+ MemoryContext queryContext; /* memory for QueryDescs and children */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -242,6 +245,7 @@ extern void PortalDefineQuery(Portal portal,
CommandTag commandTag,
List *stmts,
CachedPlan *cplan);
+extern void PortalQueryFinish(QueryDesc *queryDesc);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
diff --git a/src/test/modules/delay_execution/Makefile b/src/test/modules/delay_execution/Makefile
index 70f24e846d..2fca84d027 100644
--- a/src/test/modules/delay_execution/Makefile
+++ b/src/test/modules/delay_execution/Makefile
@@ -8,7 +8,8 @@ OBJS = \
delay_execution.o
ISOLATION = partition-addition \
- partition-removal-1
+ partition-removal-1 \
+ cached-plan-replan
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/src/test/modules/delay_execution/delay_execution.c b/src/test/modules/delay_execution/delay_execution.c
index 7cd76eb34b..5d7a3e9858 100644
--- a/src/test/modules/delay_execution/delay_execution.c
+++ b/src/test/modules/delay_execution/delay_execution.c
@@ -1,14 +1,18 @@
/*-------------------------------------------------------------------------
*
* delay_execution.c
- * Test module to allow delay between parsing and execution of a query.
+ * Test module to introduce delay at various points during execution of a
+ * query to test that execution proceeds safely in light of concurrent
+ * changes.
*
* The delay is implemented by taking and immediately releasing a specified
* advisory lock. If another process has previously taken that lock, the
* current process will be blocked until the lock is released; otherwise,
* there's no effect. This allows an isolationtester script to reliably
- * test behaviors where some specified action happens in another backend
- * between parsing and execution of any desired query.
+ * test behaviors where some specified action happens in another backend in
+ * a couple of cases: 1) between parsing and execution of any desired query
+ * when using the planner_hook, 2) between RevalidateCachedQuery() and
+ * ExecutorStart() when using the ExecutorStart_hook.
*
* Copyright (c) 2020-2023, PostgreSQL Global Development Group
*
@@ -22,6 +26,7 @@
#include <limits.h>
+#include "executor/executor.h"
#include "optimizer/planner.h"
#include "utils/builtins.h"
#include "utils/guc.h"
@@ -32,9 +37,11 @@ PG_MODULE_MAGIC;
/* GUC: advisory lock ID to use. Zero disables the feature. */
static int post_planning_lock_id = 0;
+static int executor_start_lock_id = 0;
-/* Save previous planner hook user to be a good citizen */
+/* Save previous hook users to be a good citizen */
static planner_hook_type prev_planner_hook = NULL;
+static ExecutorStart_hook_type prev_ExecutorStart_hook = NULL;
/* planner_hook function to provide the desired delay */
@@ -70,11 +77,41 @@ delay_execution_planner(Query *parse, const char *query_string,
return result;
}
+/* ExecutorStart_hook function to provide the desired delay */
+static void
+delay_execution_ExecutorStart(QueryDesc *queryDesc, int eflags)
+{
+ /* If enabled, delay by taking and releasing the specified lock */
+ if (executor_start_lock_id != 0)
+ {
+ DirectFunctionCall1(pg_advisory_lock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+ DirectFunctionCall1(pg_advisory_unlock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+
+ /*
+ * Ensure that we notice any pending invalidations, since the advisory
+ * lock functions don't do this.
+ */
+ AcceptInvalidationMessages();
+ }
+
+ /* Now start the executor, possibly via a previous hook user */
+ if (prev_ExecutorStart_hook)
+ prev_ExecutorStart_hook(queryDesc, eflags);
+ else
+ standard_ExecutorStart(queryDesc, eflags);
+
+ if (executor_start_lock_id != 0)
+ elog(NOTICE, "Finished ExecutorStart(): CachedPlan is %s",
+ queryDesc->cplan->is_valid ? "valid" : "not valid");
+}
+
/* Module load function */
void
_PG_init(void)
{
- /* Set up the GUC to control which lock is used */
+ /* Set up GUCs to control which lock is used */
DefineCustomIntVariable("delay_execution.post_planning_lock_id",
"Sets the advisory lock ID to be locked/unlocked after planning.",
"Zero disables the delay.",
@@ -86,10 +123,22 @@ _PG_init(void)
NULL,
NULL,
NULL);
-
+ DefineCustomIntVariable("delay_execution.executor_start_lock_id",
+ "Sets the advisory lock ID to be locked/unlocked before starting execution.",
+ "Zero disables the delay.",
+ &executor_start_lock_id,
+ 0,
+ 0, INT_MAX,
+ PGC_USERSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
MarkGUCPrefixReserved("delay_execution");
- /* Install our hook */
+ /* Install our hooks. */
prev_planner_hook = planner_hook;
planner_hook = delay_execution_planner;
+ prev_ExecutorStart_hook = ExecutorStart_hook;
+ ExecutorStart_hook = delay_execution_ExecutorStart;
}
diff --git a/src/test/modules/delay_execution/expected/cached-plan-replan.out b/src/test/modules/delay_execution/expected/cached-plan-replan.out
new file mode 100644
index 0000000000..eaac55122b
--- /dev/null
+++ b/src/test/modules/delay_execution/expected/cached-plan-replan.out
@@ -0,0 +1,43 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1prep s2lock s1exec s2dropi s2unlock
+step s1prep: SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1;
+ EXPLAIN (COSTS OFF) EXECUTE q (1);
+QUERY PLAN
+---------------------------------------
+Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo1 foo_1
+ Recheck Cond: (a = $1)
+ -> Bitmap Index Scan on foo1_a
+ Index Cond: (a = $1)
+(6 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); <waiting ...>
+step s2dropi: DROP INDEX foo1_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+----------------------------
+Append
+ Subplans Removed: 1
+ -> Seq Scan on foo1 foo_1
+ Filter: (a = $1)
+(4 rows)
+
diff --git a/src/test/modules/delay_execution/specs/cached-plan-replan.spec b/src/test/modules/delay_execution/specs/cached-plan-replan.spec
new file mode 100644
index 0000000000..5bd5fdbf1c
--- /dev/null
+++ b/src/test/modules/delay_execution/specs/cached-plan-replan.spec
@@ -0,0 +1,39 @@
+# Test to check that invalidation of a cached plan during ExecutorStart
+# correctly triggers replanning and re-execution.
+
+setup
+{
+ CREATE TABLE foo (a int, b text) PARTITION BY LIST(a);
+ CREATE TABLE foo1 PARTITION OF foo FOR VALUES IN (1);
+ CREATE INDEX foo1_a ON foo1 (a);
+ CREATE TABLE foo2 PARTITION OF foo FOR VALUES IN (2);
+ CREATE VIEW foov AS SELECT * FROM foo;
+}
+
+teardown
+{
+ DROP VIEW foov;
+ DROP TABLE foo;
+}
+
+session "s1"
+# Creates a prepared statement and forces creation of a generic plan
+step "s1prep" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+# Executes a generic plan
+step "s1exec" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+
+session "s2"
+step "s2lock" { SELECT pg_advisory_lock(12345); }
+step "s2unlock" { SELECT pg_advisory_unlock(12345); }
+step "s2dropi" { DROP INDEX foo1_a; }
+
+# While "s1exec" waits to acquire the advisory lock, "s2drop" is able to drop
+# the index being used in the cached plan for `q`, so when "s1exec" is then
+# unblocked and initializes the cached plan for execution, it detects the
+# concurrent index drop and causes the cached plan to be discarded and
+# recreated without the index.
+permutation "s1prep" "s2lock" "s1exec" "s2dropi" "s2unlock"
--
2.35.3
Hi,
On 2023-02-03 22:01:09 +0900, Amit Langote wrote:
I've added a test case under src/test/modules/delay_execution by adding
a new ExecutorStart_hook that works similarly to
delay_execution_planner(). The test works by allowing a concurrent
session to drop an object being referenced in a cached plan being
initialized while the ExecutorStart_hook waits to get an advisory
lock. The concurrent drop of the referenced object is detected during
ExecInitNode() and thus triggers replanning of the cached plan.

I also fixed a bug in ExplainExecuteQuery() while testing, and fixed up some comments.
The tests seem to frequently hang on freebsd:
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest%2F42%2F3478
Greetings,
Andres Freund
On Tue, Feb 7, 2023 at 23:38 Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2023-02-03 22:01:09 +0900, Amit Langote wrote:
I've added a test case under src/test/modules/delay_execution by adding
a new ExecutorStart_hook that works similarly to
delay_execution_planner(). The test works by allowing a concurrent
session to drop an object being referenced in a cached plan being
initialized while the ExecutorStart_hook waits to get an advisory
lock. The concurrent drop of the referenced object is detected during
ExecInitNode() and thus triggers replanning of the cached plan.

I also fixed a bug in ExplainExecuteQuery() while testing, and fixed up
some comments.
The tests seem to frequently hang on freebsd:
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest%2F42%2F3478
Thanks for the heads up. I’ve noticed this one too, though I couldn’t find
the testrun artifacts like I could get for some other failures (on other
cirrus machines). Has anyone else been in a similar situation?
<https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest%2F42%2F3478>
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
On Wed, Feb 8, 2023 at 7:31 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Tue, Feb 7, 2023 at 23:38 Andres Freund <andres@anarazel.de> wrote:
The tests seem to frequently hang on freebsd:
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest%2F42%2F3478Thanks for the heads up. I’ve noticed this one too, though couldn’t find the testrun artifacts like I could get for some other failures (on other cirrus machines). Has anyone else been a similar situation?
I think I have figured out what might be going wrong on that cfbot
animal after building with the same CPPFLAGS as that animal locally.
I had forgotten to update _out/_readRangeTblEntry() to account for the
patch's change that a view's RTE_SUBQUERY now also preserves relkind,
in addition to relid and rellockmode, for locking purposes.
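For reference, the _outRangeTblEntry() side of that fix looks roughly
like the below (a sketch, not the exact hunk, which is in the attached
patch; matching READ_OID_FIELD / READ_CHAR_FIELD / READ_INT_FIELD calls
are added to _readRangeTblEntry()):

    case RTE_SUBQUERY:
        WRITE_NODE_FIELD(subquery);
        WRITE_BOOL_FIELD(security_barrier);
        /* preserved so the executor can lock views in cached plans */
        WRITE_OID_FIELD(relid);
        WRITE_CHAR_FIELD(relkind);      /* this was the missing piece */
        WRITE_INT_FIELD(rellockmode);
        break;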
Also, I noticed that a multi-query Portal execution with rules was
failing (thanks to a regression test added in a7d71c41db) because the
snapshot used for the 2nd query onward was not being updated for the
command ID change under the patched model of multi-query Portal
execution. To wit, under the patched model, all queries in the
multi-query Portal case undergo ExecutorStart() before any of them is
run with ExecutorRun(). However, the patch hadn't changed things to
update the snapshot's command ID for the 2nd query onwards, which
caused the aforementioned test case to fail.
This new model does, however, mean that the 2nd query onwards must use
PushCopiedSnapshot(), given the current requirement of
UpdateActiveSnapshotCommandId() that the snapshot passed to it must
not be referenced anywhere else. The new model basically requires
that each query's QueryDesc point to its own copy of the
ActiveSnapshot. That may not be a point in favor of the patched model,
though. For now, I haven't been able to come up with a better
alternative.
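To illustrate, with the fix described above, the PORTAL_MULTI_QUERY
loop in PortalStart() ends up looking roughly like this (a simplified
sketch with utility statements and error handling omitted; see the
attached patch for the real thing):

    foreach(lc, portal->stmts)
    {
        PlannedStmt *plan = lfirst_node(PlannedStmt, lc);

        if (foreach_current_index(lc) == 0)
            PushActiveSnapshot(GetTransactionSnapshot());
        else
        {
            /*
             * 2nd query onwards: push a *copy* of the active snapshot,
             * because UpdateActiveSnapshotCommandId() requires that the
             * snapshot it modifies not be referenced anywhere else.
             */
            PushCopiedSnapshot(GetActiveSnapshot());
            UpdateActiveSnapshotCommandId();
        }

        /* DestReceiver will be set later in PortalRunMulti() */
        queryDesc = CreateQueryDesc(plan, portal->cplan, portal->sourceText,
                                    GetActiveSnapshot(), InvalidSnapshot,
                                    NULL, params, portal->queryEnv, 0);
        portal->qdescs = lappend(portal->qdescs, queryDesc);
        ExecutorStart(queryDesc, 0);
    }

That way, each QueryDesc ends up holding a pointer to its own snapshot
copy.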
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v34-0001-Move-AcquireExecutorLocks-s-responsibility-into-.patchapplication/octet-stream; name=v34-0001-Move-AcquireExecutorLocks-s-responsibility-into-.patchDownload
From 1e8b5333a9619ec87c3e31c2034422309a4719ad Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Fri, 20 Jan 2023 16:52:31 +0900
Subject: [PATCH v34] Move AcquireExecutorLocks()'s responsibility into the
executor
---
contrib/postgres_fdw/postgres_fdw.c | 4 +
src/backend/commands/copyto.c | 4 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 142 +++++---
src/backend/commands/extension.c | 1 +
src/backend/commands/matview.c | 2 +-
src/backend/commands/portalcmds.c | 16 +-
src/backend/commands/prepare.c | 30 +-
src/backend/executor/execMain.c | 105 +++++-
src/backend/executor/execParallel.c | 7 +-
src/backend/executor/execPartition.c | 4 +
src/backend/executor/execProcnode.c | 5 +
src/backend/executor/execUtils.c | 5 +-
src/backend/executor/functions.c | 1 +
src/backend/executor/nodeAgg.c | 2 +
src/backend/executor/nodeAppend.c | 4 +
src/backend/executor/nodeBitmapAnd.c | 2 +
src/backend/executor/nodeBitmapHeapscan.c | 4 +
src/backend/executor/nodeBitmapOr.c | 2 +
src/backend/executor/nodeCustom.c | 2 +
src/backend/executor/nodeForeignscan.c | 4 +
src/backend/executor/nodeGather.c | 2 +
src/backend/executor/nodeGatherMerge.c | 2 +
src/backend/executor/nodeGroup.c | 2 +
src/backend/executor/nodeHash.c | 2 +
src/backend/executor/nodeHashjoin.c | 4 +
src/backend/executor/nodeIncrementalSort.c | 2 +
src/backend/executor/nodeIndexonlyscan.c | 2 +
src/backend/executor/nodeIndexscan.c | 2 +
src/backend/executor/nodeLimit.c | 2 +
src/backend/executor/nodeLockRows.c | 2 +
src/backend/executor/nodeMaterial.c | 2 +
src/backend/executor/nodeMemoize.c | 2 +
src/backend/executor/nodeMergeAppend.c | 4 +
src/backend/executor/nodeMergejoin.c | 4 +
src/backend/executor/nodeModifyTable.c | 7 +
src/backend/executor/nodeNestloop.c | 4 +
src/backend/executor/nodeProjectSet.c | 2 +
src/backend/executor/nodeRecursiveunion.c | 4 +
src/backend/executor/nodeResult.c | 2 +
src/backend/executor/nodeSamplescan.c | 2 +
src/backend/executor/nodeSeqscan.c | 2 +
src/backend/executor/nodeSetOp.c | 2 +
src/backend/executor/nodeSort.c | 2 +
src/backend/executor/nodeSubqueryscan.c | 2 +
src/backend/executor/nodeTidrangescan.c | 2 +
src/backend/executor/nodeTidscan.c | 2 +
src/backend/executor/nodeUnique.c | 2 +
src/backend/executor/nodeWindowAgg.c | 2 +
src/backend/executor/spi.c | 46 ++-
src/backend/nodes/Makefile | 1 +
src/backend/nodes/gen_node_support.pl | 2 +
src/backend/nodes/outfuncs.c | 1 +
src/backend/nodes/readfuncs.c | 1 +
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 5 +
src/backend/rewrite/rewriteHandler.c | 7 +-
src/backend/storage/lmgr/lmgr.c | 45 +++
src/backend/tcop/postgres.c | 8 +
src/backend/tcop/pquery.c | 307 +++++++++---------
src/backend/utils/cache/lsyscache.c | 21 ++
src/backend/utils/cache/plancache.c | 134 ++------
src/backend/utils/mmgr/portalmem.c | 6 +
src/include/commands/explain.h | 7 +-
src/include/executor/execdesc.h | 5 +
src/include/executor/executor.h | 12 +
src/include/nodes/execnodes.h | 2 +
src/include/nodes/meson.build | 1 +
src/include/nodes/pathnodes.h | 3 +
src/include/nodes/plannodes.h | 3 +
src/include/storage/lmgr.h | 1 +
src/include/utils/lsyscache.h | 1 +
src/include/utils/plancache.h | 15 +
src/include/utils/portal.h | 4 +
src/test/modules/delay_execution/Makefile | 3 +-
.../modules/delay_execution/delay_execution.c | 63 +++-
.../expected/cached-plan-replan.out | 43 +++
.../specs/cached-plan-replan.spec | 39 +++
78 files changed, 866 insertions(+), 340 deletions(-)
create mode 100644 src/test/modules/delay_execution/expected/cached-plan-replan.out
create mode 100644 src/test/modules/delay_execution/specs/cached-plan-replan.spec
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index f5926ab89d..93f3f8b5d1 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2659,7 +2659,11 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
/* Get info about foreign table. */
rtindex = node->resultRelInfo->ri_RangeTableIndex;
if (fsplan->scan.scanrelid == 0)
+ {
dmstate->rel = ExecOpenScanRelation(estate, rtindex, eflags);
+ if (!ExecPlanStillValid(estate))
+ return;
+ }
else
dmstate->rel = node->ss.ss_currentRelation;
table = GetForeignTable(RelationGetRelid(dmstate->rel));
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 8043b4e9b1..a438c547e8 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -558,7 +558,8 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL,
+ pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
@@ -569,6 +570,7 @@ BeginCopyTo(ParseState *pstate,
* ExecutorStart computes a result tupdesc for us
*/
ExecutorStart(cstate->queryDesc, 0);
+ Assert(cstate->queryDesc->plan_valid);
tupDesc = cstate->queryDesc->tupDesc;
}
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index d6c6d514f3..a55b851574 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index e57bda7b62..e56ccdca66 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -384,6 +384,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
else
{
PlannedStmt *plan;
+ QueryDesc *queryDesc;
instr_time planstart,
planduration;
BufferUsage bufusage_start,
@@ -406,12 +407,89 @@ ExplainOneQuery(Query *query, int cursorOptions,
BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
}
+ queryDesc = ExplainQueryDesc(plan, NULL, queryString, into, es,
+ params, queryEnv);
+ Assert(queryDesc);
+
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(queryDesc, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
+/*
+ * ExplainQueryDesc
+ * Set up QueryDesc for EXPLAINing a given plan
+ *
+ * This returns NULL if cplan is found to have been invalidated since its
+ * creation.
+ */
+QueryDesc *
+ExplainQueryDesc(PlannedStmt *stmt, CachedPlan *cplan,
+ const char *queryString, IntoClause *into, ExplainState *es,
+ ParamListInfo params, QueryEnvironment *queryEnv)
+{
+ QueryDesc *queryDesc;
+ DestReceiver *dest;
+ int eflags;
+ int instrument_option = 0;
+
+ /*
+ * Normally we discard the query's output, but if explaining CREATE TABLE
+ * AS, we'd better use the appropriate tuple receiver.
+ */
+ if (into)
+ dest = CreateIntoRelDestReceiver(into);
+ else
+ dest = None_Receiver;
+
+ if (es->analyze && es->timing)
+ instrument_option |= INSTRUMENT_TIMER;
+ else if (es->analyze)
+ instrument_option |= INSTRUMENT_ROWS;
+
+ if (es->buffers)
+ instrument_option |= INSTRUMENT_BUFFERS;
+ if (es->wal)
+ instrument_option |= INSTRUMENT_WAL;
+
+ /*
+ * Use a snapshot with an updated command ID to ensure this query sees
+ * results of any previously executed queries.
+ */
+ PushCopiedSnapshot(GetActiveSnapshot());
+ UpdateActiveSnapshotCommandId();
+
+ /* Create a QueryDesc for the query */
+ queryDesc = CreateQueryDesc(stmt, cplan, queryString,
+ GetActiveSnapshot(), InvalidSnapshot,
+ dest, params, queryEnv, instrument_option);
+
+ /* Select execution options */
+ if (es->analyze)
+ eflags = 0; /* default run-to-completion flags */
+ else
+ eflags = EXEC_FLAG_EXPLAIN_ONLY;
+ if (into)
+ eflags |= GetIntoRelEFlags(into);
+
+ /*
+ * Call ExecutorStart to prepare the plan for execution. A cached plan
+ * may get invalidated as we're doing that.
+ */
+ ExecutorStart(queryDesc, eflags);
+ if (!queryDesc->plan_valid)
+ {
+ /* Clean up. */
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+ PopActiveSnapshot();
+ return NULL;
+ }
+
+ return queryDesc;
+}
+
/*
* ExplainOneUtility -
* print out the execution plan for one utility statement
@@ -515,29 +593,16 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(QueryDesc *queryDesc,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
{
- DestReceiver *dest;
- QueryDesc *queryDesc;
instr_time starttime;
double totaltime = 0;
- int eflags;
- int instrument_option = 0;
-
- Assert(plannedstmt->commandType != CMD_UTILITY);
- if (es->analyze && es->timing)
- instrument_option |= INSTRUMENT_TIMER;
- else if (es->analyze)
- instrument_option |= INSTRUMENT_ROWS;
-
- if (es->buffers)
- instrument_option |= INSTRUMENT_BUFFERS;
- if (es->wal)
- instrument_option |= INSTRUMENT_WAL;
+ Assert(queryDesc->plannedstmt->commandType != CMD_UTILITY);
/*
* We always collect timing for the entire statement, even when node-level
@@ -546,38 +611,6 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
*/
INSTR_TIME_SET_CURRENT(starttime);
- /*
- * Use a snapshot with an updated command ID to ensure this query sees
- * results of any previously executed queries.
- */
- PushCopiedSnapshot(GetActiveSnapshot());
- UpdateActiveSnapshotCommandId();
-
- /*
- * Normally we discard the query's output, but if explaining CREATE TABLE
- * AS, we'd better use the appropriate tuple receiver.
- */
- if (into)
- dest = CreateIntoRelDestReceiver(into);
- else
- dest = None_Receiver;
-
- /* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, instrument_option);
-
- /* Select execution options */
- if (es->analyze)
- eflags = 0; /* default run-to-completion flags */
- else
- eflags = EXEC_FLAG_EXPLAIN_ONLY;
- if (into)
- eflags |= GetIntoRelEFlags(into);
-
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, eflags);
-
/* Execute the plan for statistics if asked for */
if (es->analyze)
{
@@ -4851,6 +4884,17 @@ ExplainDummyGroup(const char *objtype, const char *labelname, ExplainState *es)
}
}
+/*
+ * Discard output buffer for a fresh restart.
+ */
+void
+ExplainResetOutput(ExplainState *es)
+{
+ Assert(es->str);
+ resetStringInfo(es->str);
+ ExplainBeginOutput(es);
+}
+
/*
* Emit the start-of-output boilerplate.
*
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index b1509cc505..e2f79cc7a7 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -780,6 +780,7 @@ execute_sql_string(const char *sql)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index fb30d2595c..17d457ccfb 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -409,7 +409,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 8a3cf98cce..3c34ab4351 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -146,6 +146,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
PortalStart(portal, params, 0, GetActiveSnapshot());
Assert(portal->strategy == PORTAL_ONE_SELECT);
+ Assert(portal->plan_valid);
/*
* We're done; the query won't actually be run until PerformPortalFetch is
@@ -249,6 +250,17 @@ PerformPortalClose(const char *name)
PortalDrop(portal, false);
}
+/*
+ * Release a portal's QueryDesc.
+ */
+void
+PortalQueryFinish(QueryDesc *queryDesc)
+{
+ ExecutorFinish(queryDesc);
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+}
+
/*
* PortalCleanup
*
@@ -295,9 +307,7 @@ PortalCleanup(Portal portal)
if (portal->resowner)
CurrentResourceOwner = portal->resowner;
- ExecutorFinish(queryDesc);
- ExecutorEnd(queryDesc);
- FreeQueryDesc(queryDesc);
+ PortalQueryFinish(queryDesc);
CurrentResourceOwner = saveResourceOwner;
}
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 18f70319fc..3099536a54 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -183,6 +183,7 @@ ExecuteQuery(ParseState *pstate,
paramLI = EvaluateParams(pstate, entry, stmt->params, estate);
}
+replan:
/* Create a new portal to run the query in */
portal = CreateNewPortal();
/* Don't display the portal in pg_cursors, it is for internal use only */
@@ -251,10 +252,17 @@ ExecuteQuery(ParseState *pstate,
}
/*
- * Run the portal as appropriate.
+ * Run the portal as appropriate. If the portal contains a cached plan,
+ * it must be recreated if PortalStart() finds it to be invalid.
*/
PortalStart(portal, paramLI, eflags, GetActiveSnapshot());
+ if (!portal->plan_valid)
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
+
(void) PortalRun(portal, count, false, true, dest, dest, qc);
PortalDrop(portal, false);
@@ -574,7 +582,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
{
PreparedStatement *entry;
const char *query_string;
- CachedPlan *cplan;
+ CachedPlan *cplan = NULL;
List *plan_list;
ListCell *p;
ParamListInfo paramLI = NULL;
@@ -618,6 +626,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
+replan:
cplan = GetCachedPlan(entry->plansource, paramLI,
CurrentResourceOwner, queryEnv);
@@ -639,8 +648,21 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ {
+ QueryDesc *queryDesc;
+
+ queryDesc = ExplainQueryDesc(pstmt, cplan, queryString,
+ into, es, paramLI, queryEnv);
+ if (queryDesc == NULL)
+ {
+ ExplainResetOutput(es);
+ ReleaseCachedPlan(cplan, CurrentResourceOwner);
+ goto replan;
+ }
+ ExplainOnePlan(queryDesc, into, es, query_string, paramLI,
+ queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
+ }
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 39bfb48dc2..dafdd8a783 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -126,11 +126,27 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
* get control when ExecutorStart is called. Such a plugin would
* normally call standard_ExecutorStart().
*
+ * Normally, the plan tree given in queryDesc->plannedstmt is known to be
+ * valid in that *all* relations contained in plannedstmt->relationOids have
+ * already been locked. That may not be the case, however, if the plannedstmt
+ * comes from a CachedPlan, one given in queryDesc->cplan. Locks necessary
+ * to validate such a plan tree must be taken when initializing the plan tree
+ * in InitPlan(), so this sets the eflag EXEC_FLAG_GET_LOCKS. If the
+ * CachedPlan gets invalidated as these locks are taken, InitPlan returns
+ * without setting queryDesc->planstate and sets queryDesc->plan_valid to
+ * false. Caller must retry the execution with a freshly created CachedPlan
+ * in that case.
* ----------------------------------------------------------------
*/
void
ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ /* Take locks if the plan tree comes from a CachedPlan. */
+ Assert(queryDesc->cplan == NULL ||
+ CachedPlanStillValid(queryDesc->cplan));
+ if (queryDesc->cplan)
+ eflags |= EXEC_FLAG_GET_LOCKS;
+
/*
* In some cases (e.g. an EXECUTE statement) a query execution will skip
* parse analysis, which means that the query_id won't be reported. Note
@@ -582,6 +598,16 @@ ExecCheckPermissions(List *rangeTable, List *rteperminfos,
RTEPermissionInfo *perminfo = lfirst_node(RTEPermissionInfo, l);
Assert(OidIsValid(perminfo->relid));
+
+ /*
+ * Relations whose permissions need to be checked must already have
+ * been locked by the parser or by AcquirePlannerLocks() if a
+ * cached plan is being executed.
+ * XXX shouldn't we skip calling ExecCheckPermissions from InitPlan
+ * in a parallel worker?
+ */
+ Assert(CheckRelLockedByMe(perminfo->relid, AccessShareLock, true) ||
+ IsParallelWorker());
result = ExecCheckOneRelPerms(perminfo);
if (!result)
{
@@ -785,12 +811,43 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
PreventCommandIfParallelMode(CreateCommandName((Node *) plannedstmt));
}
+/*
+ * Lock, or unlock if !acquire, view relations in a given query's range table.
+ */
+static void
+ExecLockViewRelations(List *viewRelations, EState *estate, bool acquire)
+{
+ ListCell *lc;
+
+ foreach(lc, viewRelations)
+ {
+ Index rti = lfirst_int(lc);
+ RangeTblEntry *rte = exec_rt_fetch(rti, estate);
+
+ Assert(OidIsValid(rte->relid));
+ Assert(rte->relkind == RELKIND_VIEW);
+ Assert(rte->rellockmode != NoLock);
+
+ if (acquire)
+ LockRelationOid(rte->relid, rte->rellockmode);
+ else
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+}
/* ----------------------------------------------------------------
* InitPlan
*
* Initializes the query plan: open files, allocate storage
* and start up the rule manager
+ *
+ * If queryDesc contains a CachedPlan, this takes locks on relations.
+ * If any of those relations have undergone concurrent schema changes
+ * between successfully performing RevalidateCachedQuery() on the
+ * containing CachedPlanSource and here, locking those relations would
+ * invalidate the CachedPlan by way of PlanCacheRelCallback(). In that
+ * case, queryDesc->plan_valid would be set to false to tell the caller
+ * to retry after creating a new CachedPlan.
* ----------------------------------------------------------------
*/
static void
@@ -807,17 +864,21 @@ InitPlan(QueryDesc *queryDesc, int eflags)
int i;
/*
- * Do permissions checks and save the list for later use.
+ * initialize the node's execution state
*/
- ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
- estate->es_rteperminfos = plannedstmt->permInfos;
+ ExecInitRangeTable(estate, rangeTable);
+
+ if (eflags & EXEC_FLAG_GET_LOCKS)
+ ExecLockViewRelations(plannedstmt->viewRelations, estate, true);
/*
- * initialize the node's execution state
+ * Do permissions checks and save the list for later use.
*/
- ExecInitRangeTable(estate, rangeTable);
+ ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
+ estate->es_rteperminfos = plannedstmt->permInfos;
estate->es_plannedstmt = plannedstmt;
+ estate->es_cachedplan = queryDesc->cplan;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
/*
@@ -850,6 +911,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
case ROW_MARK_KEYSHARE:
case ROW_MARK_REFERENCE:
relation = ExecGetRangeTableRelation(estate, rc->rti);
+ if (!ExecPlanStillValid(estate))
+ goto failed;
break;
case ROW_MARK_COPY:
/* no physical table access is required */
@@ -917,6 +980,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
sp_eflags |= EXEC_FLAG_REWIND;
subplanstate = ExecInitNode(subplan, estate, sp_eflags);
+ if (!ExecPlanStillValid(estate))
+ goto failed;
estate->es_subplanstates = lappend(estate->es_subplanstates,
subplanstate);
@@ -930,6 +995,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
* processing tuples.
*/
planstate = ExecInitNode(plan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ goto failed;
/*
* Get the tuple descriptor describing the type of tuples to return.
@@ -973,6 +1040,19 @@ InitPlan(QueryDesc *queryDesc, int eflags)
queryDesc->tupDesc = tupType;
queryDesc->planstate = planstate;
+ queryDesc->plan_valid = true;
+ return;
+
+failed:
+ /*
+ * Plan initialization failed. Mark QueryDesc as such and release useless
+ * locks.
+ */
+ queryDesc->plan_valid = false;
+ if (eflags & EXEC_FLAG_GET_LOCKS)
+ ExecLockViewRelations(plannedstmt->viewRelations, estate, false);
+ /* Also ask ExecCloseRangeTableRelations() to release locks. */
+ estate->es_top_eflags |= EXEC_FLAG_REL_LOCKS;
}
/*
@@ -1389,7 +1469,7 @@ ExecGetAncestorResultRels(EState *estate, ResultRelInfo *resultRelInfo)
/*
* All ancestors up to the root target relation must have been
- * locked by the planner or AcquireExecutorLocks().
+ * locked.
*/
ancRel = table_open(ancOid, NoLock);
rInfo = makeNode(ResultRelInfo);
@@ -1558,7 +1638,8 @@ ExecCloseResultRelations(EState *estate)
/*
* Close all relations opened by ExecGetRangeTableRelation().
*
- * We do not release any locks we might hold on those rels.
+ * We do not release any locks we might hold on those rels, unless
+ * the caller asked otherwise.
*/
void
ExecCloseRangeTableRelations(EState *estate)
@@ -1567,8 +1648,12 @@ ExecCloseRangeTableRelations(EState *estate)
for (i = 0; i < estate->es_range_table_size; i++)
{
+ LOCKMODE lockmode = NoLock;
+
+ if (estate->es_top_eflags & EXEC_FLAG_REL_LOCKS)
+ lockmode = exec_rt_fetch(i+1, estate)->rellockmode;
if (estate->es_relations[i])
- table_close(estate->es_relations[i], NoLock);
+ table_close(estate->es_relations[i], lockmode);
}
}
@@ -2797,7 +2882,8 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
* Child EPQ EStates share the parent's copy of unchanging state such as
* the snapshot, rangetable, and external Param info. They need their own
* copies of local state, including a tuple table, es_param_exec_vals,
- * result-rel info, etc.
+ * result-rel info, etc.  Also, we don't pass the parent's copy of the
+ * CachedPlan, because no new locks will be taken for EvalPlanQual().
*/
rcestate->es_direction = ForwardScanDirection;
rcestate->es_snapshot = parentestate->es_snapshot;
@@ -2883,6 +2969,7 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
PlanState *subplanstate;
subplanstate = ExecInitNode(subplan, rcestate, 0);
+ Assert(ExecPlanStillValid(rcestate));
rcestate->es_subplanstates = lappend(rcestate->es_subplanstates,
subplanstate);
}
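To illustrate the pattern that the per-node changes below all follow: each
ExecInit* function re-checks the CachedPlan's validity after any step that
opens or locks a relation, or recurses via ExecInitNode(), and bails out
with NULL if the plan went stale. A schematic sketch (not part of the
patch; Foo, FooState, and ExecInitFoo are made-up stand-ins):

    static FooState *
    ExecInitFoo(Foo *node, EState *estate, int eflags)
    {
        FooState   *state = makeNode(FooState);

        /* Recursing may open and lock relations, invalidating the plan. */
        outerPlanState(state) = ExecInitNode(outerPlan(node), estate, eflags);
        if (!ExecPlanStillValid(estate))
            return NULL;        /* caller likewise bails out with NULL */

        /* ... the rest of the node's initialization would go here ... */

        return state;
    }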
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index aa3f283453..fe1d173501 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1249,8 +1249,13 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
- /* Create a QueryDesc for the query. */
+ /*
+ * Create a QueryDesc for the query. Note that no CachedPlan is available
+ * here, even though the contained plan tree may have come from one in the
+ * leader.
+ */
return CreateQueryDesc(pstmt,
+ NULL,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index fd6ca8a5d9..ae6a974e7a 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1817,6 +1817,8 @@ ExecInitPartitionPruning(PlanState *planstate,
/* Create the working data structure for pruning */
prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Perform an initial partition prune pass, if required.
@@ -1943,6 +1945,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* duration of this executor run.
*/
partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
partkey = RelationGetPartitionKey(partrel);
partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
partrel);
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 4d288bc8d4..bd0c2cba92 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -388,6 +388,9 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
break;
}
+ if (!ExecPlanStillValid(estate))
+ return NULL;
+
ExecSetExecProcNode(result, result->ExecProcNode);
/*
@@ -402,6 +405,8 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
Assert(IsA(subplan, SubPlan));
sstate = ExecInitSubPlan(subplan, result);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
subps = lappend(subps, sstate);
}
result->initPlan = subps;
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index c33a3c0bec..d5bd268514 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -800,7 +800,8 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rte->rtekind == RTE_RELATION);
- if (!IsParallelWorker())
+ if (!IsParallelWorker() &&
+ (estate->es_top_eflags & EXEC_FLAG_GET_LOCKS) == 0)
{
/*
* In a normal query, we should already have the appropriate lock,
@@ -844,6 +845,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
Relation resultRelationDesc;
resultRelationDesc = ExecGetRangeTableRelation(estate, rti);
+ if (!ExecPlanStillValid(estate))
+ return;
InitResultRelInfo(resultRelInfo,
resultRelationDesc,
rti,
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 50e06ec693..949bdfc837 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -843,6 +843,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest = None_Receiver;
es->qd = CreateQueryDesc(es->stmt,
+ NULL, /* fmgr_sql() doesn't use CachedPlans */
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 20d23696a5..f9b668dc01 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -3295,6 +3295,8 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
eflags &= ~EXEC_FLAG_REWIND;
outerPlan = outerPlan(node);
outerPlanState(aggstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* initialize source tuple type.
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index cb25499b3f..fd0ad98621 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -148,6 +148,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
node->part_prune_index,
node->apprelids,
&validsubplans);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
@@ -218,6 +220,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
firstvalid = j;
appendplanstates[j++] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
}
appendstate->as_first_partial_plan = firstvalid;
diff --git a/src/backend/executor/nodeBitmapAnd.c b/src/backend/executor/nodeBitmapAnd.c
index 4c5eb2b23b..98cbeb2502 100644
--- a/src/backend/executor/nodeBitmapAnd.c
+++ b/src/backend/executor/nodeBitmapAnd.c
@@ -89,6 +89,8 @@ ExecInitBitmapAnd(BitmapAnd *node, EState *estate, int eflags)
{
initNode = (Plan *) lfirst(l);
bitmapplanstates[i] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
i++;
}
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index f35df0b8bf..121b1afa5d 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -763,11 +763,15 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* initialize child nodes
*/
outerPlanState(scanstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* get the scan type from the relation descriptor.
diff --git a/src/backend/executor/nodeBitmapOr.c b/src/backend/executor/nodeBitmapOr.c
index 0bf8af9652..be736946f1 100644
--- a/src/backend/executor/nodeBitmapOr.c
+++ b/src/backend/executor/nodeBitmapOr.c
@@ -90,6 +90,8 @@ ExecInitBitmapOr(BitmapOr *node, EState *estate, int eflags)
{
initNode = (Plan *) lfirst(l);
bitmapplanstates[i] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
i++;
}
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
index bd42c65b29..91239cc500 100644
--- a/src/backend/executor/nodeCustom.c
+++ b/src/backend/executor/nodeCustom.c
@@ -61,6 +61,8 @@ ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
if (scanrelid > 0)
{
scan_rel = ExecOpenScanRelation(estate, scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
css->ss.ss_currentRelation = scan_rel;
}
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index c2139acca0..f130d5863d 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -173,6 +173,8 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
if (scanrelid > 0)
{
currentRelation = ExecOpenScanRelation(estate, scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
scanstate->ss.ss_currentRelation = currentRelation;
fdwroutine = GetFdwRoutineForRelation(currentRelation, true);
}
@@ -264,6 +266,8 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
if (outerPlan(node))
outerPlanState(scanstate) =
ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Tell the FDW to initialize the scan.
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 307fc10eea..4a7715b8cc 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -89,6 +89,8 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
*/
outerNode = outerPlan(node);
outerPlanState(gatherstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
tupDesc = ExecGetResultType(outerPlanState(gatherstate));
/*
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 9d5e1a46e9..9e383c96ff 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -108,6 +108,8 @@ ExecInitGatherMerge(GatherMerge *node, EState *estate, int eflags)
*/
outerNode = outerPlan(node);
outerPlanState(gm_state) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Leader may access ExecProcNode result directly (if
diff --git a/src/backend/executor/nodeGroup.c b/src/backend/executor/nodeGroup.c
index 25a1618952..87af2a92f9 100644
--- a/src/backend/executor/nodeGroup.c
+++ b/src/backend/executor/nodeGroup.c
@@ -185,6 +185,8 @@ ExecInitGroup(Group *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(grpstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize scan slot and type.
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index eceee99374..c8fedee777 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -379,6 +379,8 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(hashstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* initialize our result slot and type. No need to build projection
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index b215e3f59a..86420e8f17 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -659,8 +659,12 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
hashNode = (Hash *) innerPlan(node);
outerPlanState(hjstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
outerDesc = ExecGetResultType(outerPlanState(hjstate));
innerPlanState(hjstate) = ExecInitNode((Plan *) hashNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
innerDesc = ExecGetResultType(innerPlanState(hjstate));
/*
diff --git a/src/backend/executor/nodeIncrementalSort.c b/src/backend/executor/nodeIncrementalSort.c
index 12bc22f33c..0456ad779f 100644
--- a/src/backend/executor/nodeIncrementalSort.c
+++ b/src/backend/executor/nodeIncrementalSort.c
@@ -1041,6 +1041,8 @@ ExecInitIncrementalSort(IncrementalSort *node, EState *estate, int eflags)
* nodes may be able to do something more useful.
*/
outerPlanState(incrsortstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize scan slot and type.
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 0b43a9b969..e0aaeb5ebd 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -512,6 +512,8 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
indexstate->ss.ss_currentRelation = currentRelation;
indexstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 4540c7781d..5090ee39e0 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -925,6 +925,8 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
indexstate->ss.ss_currentRelation = currentRelation;
indexstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index 425fbfc405..d8789553e1 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -476,6 +476,8 @@ ExecInitLimit(Limit *node, EState *estate, int eflags)
*/
outerPlan = outerPlan(node);
outerPlanState(limitstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* initialize child expressions
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 407414fc0c..9104954bb1 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -323,6 +323,8 @@ ExecInitLockRows(LockRows *node, EState *estate, int eflags)
* then initialize outer plan
*/
outerPlanState(lrstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/* node returns unmodified slots from the outer plan */
lrstate->ps.resultopsset = true;
diff --git a/src/backend/executor/nodeMaterial.c b/src/backend/executor/nodeMaterial.c
index 09632678b0..6ef50d3960 100644
--- a/src/backend/executor/nodeMaterial.c
+++ b/src/backend/executor/nodeMaterial.c
@@ -214,6 +214,8 @@ ExecInitMaterial(Material *node, EState *estate, int eflags)
outerPlan = outerPlan(node);
outerPlanState(matstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize result type and slot. No need to initialize projection info
diff --git a/src/backend/executor/nodeMemoize.c b/src/backend/executor/nodeMemoize.c
index 74f7d21bc8..4ecc60a238 100644
--- a/src/backend/executor/nodeMemoize.c
+++ b/src/backend/executor/nodeMemoize.c
@@ -931,6 +931,8 @@ ExecInitMemoize(Memoize *node, EState *estate, int eflags)
outerNode = outerPlan(node);
outerPlanState(mstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize return slot and type. No need to initialize projection info
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 399b39c598..b12a02c028 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -96,6 +96,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
node->part_prune_index,
node->apprelids,
&validsubplans);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
@@ -152,6 +154,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
Plan *initNode = (Plan *) list_nth(node->mergeplans, i);
mergeplanstates[j++] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
}
mergestate->ps.ps_ProjInfo = NULL;
diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 809aa215c6..0157a7ff3c 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -1482,11 +1482,15 @@ ExecInitMergeJoin(MergeJoin *node, EState *estate, int eflags)
mergestate->mj_SkipMarkRestore = node->skip_mark_restore;
outerPlanState(mergestate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
outerDesc = ExecGetResultType(outerPlanState(mergestate));
innerPlanState(mergestate) = ExecInitNode(innerPlan(node), estate,
mergestate->mj_SkipMarkRestore ?
eflags :
(eflags | EXEC_FLAG_MARK));
+ if (!ExecPlanStillValid(estate))
+ return NULL;
innerDesc = ExecGetResultType(innerPlanState(mergestate));
/*
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 6f0543af83..66c0ebe16d 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -4006,6 +4006,9 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
linitial_int(node->resultRelations));
}
+ if (!ExecPlanStillValid(estate))
+ return NULL;
+
/* set up epqstate with dummy subplan data for the moment */
EvalPlanQualInit(&mtstate->mt_epqstate, estate, NULL, NIL, node->epqParam);
mtstate->fireBSTriggers = true;
@@ -4032,6 +4035,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
if (resultRelInfo != mtstate->rootResultRelInfo)
{
ExecInitResultRelation(estate, resultRelInfo, resultRelation);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* For child result relations, store the root result relation
@@ -4059,6 +4064,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* Now we may initialize the subplan.
*/
outerPlanState(mtstate) = ExecInitNode(subplan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Do additional per-result-relation initialization.
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index b3d52e69ec..e4319f5c90 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -295,11 +295,15 @@ ExecInitNestLoop(NestLoop *node, EState *estate, int eflags)
* values.
*/
outerPlanState(nlstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
if (node->nestParams == NIL)
eflags |= EXEC_FLAG_REWIND;
else
eflags &= ~EXEC_FLAG_REWIND;
innerPlanState(nlstate) = ExecInitNode(innerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize result slot, type and projection.
diff --git a/src/backend/executor/nodeProjectSet.c b/src/backend/executor/nodeProjectSet.c
index f6ff3dc44c..a168cd68f6 100644
--- a/src/backend/executor/nodeProjectSet.c
+++ b/src/backend/executor/nodeProjectSet.c
@@ -247,6 +247,8 @@ ExecInitProjectSet(ProjectSet *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(state) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* we don't use inner plan
diff --git a/src/backend/executor/nodeRecursiveunion.c b/src/backend/executor/nodeRecursiveunion.c
index e781003934..3dae9b1497 100644
--- a/src/backend/executor/nodeRecursiveunion.c
+++ b/src/backend/executor/nodeRecursiveunion.c
@@ -244,7 +244,11 @@ ExecInitRecursiveUnion(RecursiveUnion *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(rustate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
innerPlanState(rustate) = ExecInitNode(innerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* If hashing, precompute fmgr lookup data for inner loop, and create the
diff --git a/src/backend/executor/nodeResult.c b/src/backend/executor/nodeResult.c
index 4219712d30..9da456be4a 100644
--- a/src/backend/executor/nodeResult.c
+++ b/src/backend/executor/nodeResult.c
@@ -208,6 +208,8 @@ ExecInitResult(Result *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(resstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* we don't use inner plan
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index d7e22b1dbb..22357e7a0e 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -125,6 +125,8 @@ ExecInitSampleScan(SampleScan *node, EState *estate, int eflags)
ExecOpenScanRelation(estate,
node->scan.scanrelid,
eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/* we won't set up the HeapScanDesc till later */
scanstate->ss.ss_currentScanDesc = NULL;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 4da0f28f7b..b0b34cd14e 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -153,6 +153,8 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
ExecOpenScanRelation(estate,
node->scan.scanrelid,
eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/* and create slot with the appropriate rowtype */
ExecInitScanTupleSlot(estate, &scanstate->ss,
diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index 4bc2406b89..2c350e6c24 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -528,6 +528,8 @@ ExecInitSetOp(SetOp *node, EState *estate, int eflags)
if (node->strategy == SETOP_HASHED)
eflags &= ~EXEC_FLAG_REWIND;
outerPlanState(setopstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
outerDesc = ExecGetResultType(outerPlanState(setopstate));
/*
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index c6c72c6e67..216a5afb40 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -263,6 +263,8 @@ ExecInitSort(Sort *node, EState *estate, int eflags)
eflags &= ~(EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK);
outerPlanState(sortstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize scan slot and type.
diff --git a/src/backend/executor/nodeSubqueryscan.c b/src/backend/executor/nodeSubqueryscan.c
index 42471bfc04..34afe14bea 100644
--- a/src/backend/executor/nodeSubqueryscan.c
+++ b/src/backend/executor/nodeSubqueryscan.c
@@ -124,6 +124,8 @@ ExecInitSubqueryScan(SubqueryScan *node, EState *estate, int eflags)
* initialize subquery
*/
subquerystate->subplan = ExecInitNode(node->subplan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize scan slot and type (needed by ExecAssignScanProjectionInfo)
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 2124c55ef5..613b377c7c 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -386,6 +386,8 @@ ExecInitTidRangeScan(TidRangeScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
tidrangestate->ss.ss_currentRelation = currentRelation;
tidrangestate->ss.ss_currentScanDesc = NULL; /* no table scan here */
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index 862bd0330b..1b0a2d8083 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -529,6 +529,8 @@ ExecInitTidScan(TidScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
tidstate->ss.ss_currentRelation = currentRelation;
tidstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
diff --git a/src/backend/executor/nodeUnique.c b/src/backend/executor/nodeUnique.c
index 45035d74fa..06257e9e51 100644
--- a/src/backend/executor/nodeUnique.c
+++ b/src/backend/executor/nodeUnique.c
@@ -136,6 +136,8 @@ ExecInitUnique(Unique *node, EState *estate, int eflags)
* then initialize outer plan
*/
outerPlanState(uniquestate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize result slot and type. Unique nodes do no projections, so
diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 7c07fb0684..7363291023 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -2451,6 +2451,8 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags)
*/
outerPlan = outerPlan(node);
outerPlanState(winstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* initialize source tuple type (which is also the tuple type that we'll
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index e3a170c38b..edea7675d4 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -71,7 +71,7 @@ static int _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
static ParamListInfo _SPI_convert_params(int nargs, Oid *argtypes,
Datum *Values, const char *Nulls);
-static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount);
+static int _SPI_pquery(QueryDesc *queryDesc, uint64 tcount);
static void _SPI_error_callback(void *arg);
@@ -1623,6 +1623,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
_SPI_current->processed = 0;
_SPI_current->tuptable = NULL;
+replan:
/* Create the portal */
if (name == NULL || name[0] == '\0')
{
@@ -1766,7 +1767,8 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
}
/*
- * Start portal execution.
+ * Start portal execution. If the portal's cached plan turns out to be
+ * invalid, control jumps back to the replan label above to rebuild it.
*/
PortalStart(portal, paramLI, 0, snapshot);
@@ -1775,6 +1777,12 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
/* Pop the error context stack */
error_context_stack = spierrcontext.previous;
+ if (!portal->plan_valid)
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
+
/* Pop the SPI stack */
_SPI_end_call(true);
@@ -2552,6 +2560,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
+replan:
cplan = GetCachedPlan(plansource, options->params,
plan_owner, _SPI_current->queryEnv);
@@ -2661,6 +2670,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
QueryDesc *qdesc;
Snapshot snap;
+ int eflags;
if (ActiveSnapshotSet())
snap = GetActiveSnapshot();
@@ -2668,14 +2678,31 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
snap = InvalidSnapshot;
qdesc = CreateQueryDesc(stmt,
+ cplan,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
options->params,
_SPI_current->queryEnv,
0);
- res = _SPI_pquery(qdesc, fire_triggers,
- canSetTag ? options->tcount : 0);
+
+ /* Select execution options */
+ if (fire_triggers)
+ eflags = 0; /* default run-to-completion flags */
+ else
+ eflags = EXEC_FLAG_SKIP_TRIGGERS;
+ ExecutorStart(qdesc, eflags);
+ if (!qdesc->plan_valid)
+ {
+ ExecutorFinish(qdesc);
+ ExecutorEnd(qdesc);
+ FreeQueryDesc(qdesc);
+ Assert(cplan);
+ ReleaseCachedPlan(cplan, plan_owner);
+ goto replan;
+ }
+
+ res = _SPI_pquery(qdesc, canSetTag ? options->tcount : 0);
FreeQueryDesc(qdesc);
}
else
@@ -2850,10 +2877,9 @@ _SPI_convert_params(int nargs, Oid *argtypes,
}
static int
-_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
+_SPI_pquery(QueryDesc *queryDesc, uint64 tcount)
{
int operation = queryDesc->operation;
- int eflags;
int res;
switch (operation)
@@ -2897,14 +2923,6 @@ _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
ResetUsage();
#endif
- /* Select execution options */
- if (fire_triggers)
- eflags = 0; /* default run-to-completion flags */
- else
- eflags = EXEC_FLAG_SKIP_TRIGGERS;
-
- ExecutorStart(queryDesc, eflags);
-
ExecutorRun(queryDesc, ForwardScanDirection, tcount, true);
_SPI_current->processed = queryDesc->estate->es_processed;
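Condensing the hunks above, the retry protocol in _SPI_execute_plan() now
looks roughly like this (a sketch reconstructed from the diff, with
unrelated details elided):

    replan:
        cplan = GetCachedPlan(plansource, options->params,
                              plan_owner, _SPI_current->queryEnv);
        /* ... build the QueryDesc as shown above ... */
        ExecutorStart(qdesc, eflags);
        if (!qdesc->plan_valid)
        {
            /* Plan went stale during ExecutorStart(); discard and retry. */
            ExecutorFinish(qdesc);
            ExecutorEnd(qdesc);
            FreeQueryDesc(qdesc);
            ReleaseCachedPlan(cplan, plan_owner);
            goto replan;
        }
        res = _SPI_pquery(qdesc, canSetTag ? options->tcount : 0);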
diff --git a/src/backend/nodes/Makefile b/src/backend/nodes/Makefile
index af12c64878..7fb0d2d202 100644
--- a/src/backend/nodes/Makefile
+++ b/src/backend/nodes/Makefile
@@ -52,6 +52,7 @@ node_headers = \
access/tsmapi.h \
commands/event_trigger.h \
commands/trigger.h \
+ executor/execdesc.h \
executor/tuptable.h \
foreign/fdwapi.h \
nodes/bitmapset.h \
diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl
index ecbcadb8bf..b1cf61c0a2 100644
--- a/src/backend/nodes/gen_node_support.pl
+++ b/src/backend/nodes/gen_node_support.pl
@@ -63,6 +63,7 @@ my @all_input_files = qw(
access/tsmapi.h
commands/event_trigger.h
commands/trigger.h
+ executor/execdesc.h
executor/tuptable.h
foreign/fdwapi.h
nodes/bitmapset.h
@@ -87,6 +88,7 @@ my @nodetag_only_files = qw(
access/tsmapi.h
commands/event_trigger.h
commands/trigger.h
+ executor/execdesc.h
executor/tuptable.h
foreign/fdwapi.h
nodes/lockoptions.h
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index ba00b99249..955286513d 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -513,6 +513,7 @@ _outRangeTblEntry(StringInfo str, const RangeTblEntry *node)
WRITE_BOOL_FIELD(security_barrier);
/* we re-use these RELATION fields, too: */
WRITE_OID_FIELD(relid);
+ WRITE_CHAR_FIELD(relkind);
WRITE_INT_FIELD(rellockmode);
WRITE_UINT_FIELD(perminfoindex);
break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index f3629cdfd1..3bc5a6dca0 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -480,6 +480,7 @@ _readRangeTblEntry(void)
READ_BOOL_FIELD(security_barrier);
/* we re-use these RELATION fields, too: */
READ_OID_FIELD(relid);
+ READ_CHAR_FIELD(relkind);
READ_INT_FIELD(rellockmode);
READ_UINT_FIELD(perminfoindex);
break;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index a1873ce26d..271d2539e8 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -527,6 +527,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->partPruneInfos = glob->partPruneInfos;
result->rtable = glob->finalrtable;
result->permInfos = glob->finalrteperminfos;
+ result->viewRelations = glob->viewRelations;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 5cc8366af6..f13240bf33 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -16,6 +16,7 @@
#include "postgres.h"
#include "access/transam.h"
+#include "catalog/pg_class.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -604,6 +605,10 @@ add_rte_to_flat_rtable(PlannerGlobal *glob, List *rteperminfos,
(newrte->rtekind == RTE_SUBQUERY && OidIsValid(newrte->relid)))
glob->relationOids = lappend_oid(glob->relationOids, newrte->relid);
+ if (newrte->relkind == RELKIND_VIEW)
+ glob->viewRelations = lappend_int(glob->viewRelations,
+ list_length(glob->finalrtable));
+
/*
* Add a copy of the RTEPermissionInfo, if any, corresponding to this RTE
* to the flattened global list.
diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index a614e3f5bd..de07e53178 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -1847,11 +1847,10 @@ ApplyRetrieveRule(Query *parsetree,
/*
* Clear fields that should not be set in a subquery RTE. Note that we
- * leave the relid, rellockmode, and perminfoindex fields set, so that the
- * view relation can be appropriately locked before execution and its
- * permissions checked.
+ * leave the relid, relkind, rellockmode, and perminfoindex fields set,
+ * so that the view relation can be appropriately locked before execution
+ * and its permissions checked.
*/
- rte->relkind = 0;
rte->tablesample = NULL;
rte->inh = false; /* must not be set for a subquery */
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index ee9b89a672..c807e9cdcc 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -27,6 +27,7 @@
#include "storage/procarray.h"
#include "storage/sinvaladt.h"
#include "utils/inval.h"
+#include "utils/lsyscache.h"
/*
@@ -364,6 +365,50 @@ CheckRelationLockedByMe(Relation relation, LOCKMODE lockmode, bool orstronger)
return false;
}
+/*
+ * CheckRelLockedByMe
+ *
+ * Returns true if current transaction holds a lock on the given relation of
+ * mode 'lockmode'. If 'orstronger' is true, a stronger lockmode is also OK.
+ * ("Stronger" is defined as "numerically higher", which is a bit
+ * semantically dubious but is OK for the purposes we use this for.)
+ */
+bool
+CheckRelLockedByMe(Oid relid, LOCKMODE lockmode, bool orstronger)
+{
+ Oid dbId = get_rel_relisshared(relid) ? InvalidOid : MyDatabaseId;
+ LOCKTAG tag;
+
+ SET_LOCKTAG_RELATION(tag, dbId, relid);
+
+ if (LockHeldByMe(&tag, lockmode))
+ return true;
+
+ if (orstronger)
+ {
+ LOCKMODE slockmode;
+
+ for (slockmode = lockmode + 1;
+ slockmode <= MaxLockMode;
+ slockmode++)
+ {
+ if (LockHeldByMe(&tag, slockmode))
+ {
+#ifdef NOT_USED
+ /* Sometimes this might be useful for debugging purposes */
+ elog(WARNING, "lock mode %s substituted for %s on relation %s",
+ GetLockmodeName(tag.locktag_lockmethodid, slockmode),
+ GetLockmodeName(tag.locktag_lockmethodid, lockmode),
+ get_rel_name(relid));
+#endif
+ return true;
+ }
+ }
+ }
+
+ return false;
+}
+
/*
* LockHasWaitersRelation
*
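For illustration, a hypothetical caller that has only a relation OID (and
so cannot use CheckRelationLockedByMe(), which needs an open Relation)
might use the new function in an assertion like this (usage sketch, not
part of the patch):

    /* Verify the executor already holds the lock the plan expects. */
    Assert(CheckRelLockedByMe(rte->relid, rte->rellockmode, true));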
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index cab709b07b..a291bcfcfc 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1199,6 +1199,7 @@ exec_simple_query(const char *query_string)
* Start the portal. No parameters here.
*/
PortalStart(portal, NULL, 0, InvalidSnapshot);
+ Assert(portal->plan_valid);
/*
* Select the appropriate output format: text unless we are doing a
@@ -1703,6 +1704,7 @@ exec_bind_message(StringInfo input_message)
"commands ignored until end of transaction block"),
errdetail_abort()));
+replan:
/*
* Create the portal. Allow silent replacement of an existing portal only
* if the unnamed portal is specified.
@@ -1998,6 +2000,12 @@ exec_bind_message(StringInfo input_message)
*/
PortalStart(portal, params, 0, InvalidSnapshot);
+ if (!portal->plan_valid)
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
+
/*
* Apply the result format requests to the portal.
*/
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5f0248acc5..b9df5d4a04 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -19,6 +19,7 @@
#include "access/xact.h"
#include "commands/prepare.h"
+#include "executor/execdesc.h"
#include "executor/tstoreReceiver.h"
#include "miscadmin.h"
#include "pg_trace.h"
@@ -35,12 +36,6 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
- const char *sourceText,
- ParamListInfo params,
- QueryEnvironment *queryEnv,
- DestReceiver *dest,
- QueryCompletion *qc);
static void FillPortalStore(Portal portal, bool isTopLevel);
static uint64 RunFromStore(Portal portal, ScanDirection direction, uint64 count,
DestReceiver *dest);
@@ -65,6 +60,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -75,8 +71,10 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
{
QueryDesc *qd = (QueryDesc *) palloc(sizeof(QueryDesc));
+ qd->type = T_QueryDesc;
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->cplan = cplan; /* CachedPlan, if plan is from one */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -116,86 +114,6 @@ FreeQueryDesc(QueryDesc *qdesc)
}
-/*
- * ProcessQuery
- * Execute a single plannable query within a PORTAL_MULTI_QUERY,
- * PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
- *
- * plan: the plan tree for the query
- * sourceText: the source text of the query
- * params: any parameters needed
- * dest: where to send results
- * qc: where to store the command completion status data.
- *
- * qc may be NULL if caller doesn't want a status string.
- *
- * Must be called in a memory context that will be reset or deleted on
- * error; otherwise the executor's memory usage will be leaked.
- */
-static void
-ProcessQuery(PlannedStmt *plan,
- const char *sourceText,
- ParamListInfo params,
- QueryEnvironment *queryEnv,
- DestReceiver *dest,
- QueryCompletion *qc)
-{
- QueryDesc *queryDesc;
-
- /*
- * Create the QueryDesc object
- */
- queryDesc = CreateQueryDesc(plan, sourceText,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, 0);
-
- /*
- * Call ExecutorStart to prepare the plan for execution
- */
- ExecutorStart(queryDesc, 0);
-
- /*
- * Run the plan to completion.
- */
- ExecutorRun(queryDesc, ForwardScanDirection, 0L, true);
-
- /*
- * Build command completion status data, if caller wants one.
- */
- if (qc)
- {
- switch (queryDesc->operation)
- {
- case CMD_SELECT:
- SetQueryCompletion(qc, CMDTAG_SELECT, queryDesc->estate->es_processed);
- break;
- case CMD_INSERT:
- SetQueryCompletion(qc, CMDTAG_INSERT, queryDesc->estate->es_processed);
- break;
- case CMD_UPDATE:
- SetQueryCompletion(qc, CMDTAG_UPDATE, queryDesc->estate->es_processed);
- break;
- case CMD_DELETE:
- SetQueryCompletion(qc, CMDTAG_DELETE, queryDesc->estate->es_processed);
- break;
- case CMD_MERGE:
- SetQueryCompletion(qc, CMDTAG_MERGE, queryDesc->estate->es_processed);
- break;
- default:
- SetQueryCompletion(qc, CMDTAG_UNKNOWN, queryDesc->estate->es_processed);
- break;
- }
- }
-
- /*
- * Now, we close down all the scans and free allocated resources.
- */
- ExecutorFinish(queryDesc);
- ExecutorEnd(queryDesc);
-
- FreeQueryDesc(queryDesc);
-}
-
/*
* ChoosePortalStrategy
* Select portal execution strategy given the intended statement list.
@@ -427,7 +345,8 @@ FetchStatementTargetList(Node *stmt)
* to be used for cursors).
*
* On return, portal is ready to accept PortalRun() calls, and the result
- * tupdesc (if any) is known.
+ * tupdesc (if any) is known, unless portal->plan_valid is set to false, in
+ * which case the caller must retry after generating a new CachedPlan.
*/
void
PortalStart(Portal portal, ParamListInfo params,
@@ -435,7 +354,6 @@ PortalStart(Portal portal, ParamListInfo params,
{
Portal saveActivePortal;
ResourceOwner saveResourceOwner;
- MemoryContext savePortalContext;
MemoryContext oldContext;
QueryDesc *queryDesc;
int myeflags;
@@ -448,15 +366,13 @@ PortalStart(Portal portal, ParamListInfo params,
*/
saveActivePortal = ActivePortal;
saveResourceOwner = CurrentResourceOwner;
- savePortalContext = PortalContext;
PG_TRY();
{
ActivePortal = portal;
if (portal->resowner)
CurrentResourceOwner = portal->resowner;
- PortalContext = portal->portalContext;
- oldContext = MemoryContextSwitchTo(PortalContext);
+ oldContext = MemoryContextSwitchTo(portal->queryContext);
/* Must remember portal param list, if any */
portal->portalParams = params;
@@ -472,6 +388,8 @@ PortalStart(Portal portal, ParamListInfo params,
switch (portal->strategy)
{
case PORTAL_ONE_SELECT:
+ case PORTAL_ONE_RETURNING:
+ case PORTAL_ONE_MOD_WITH:
/* Must set snapshot before starting executor. */
if (snapshot)
@@ -493,6 +411,7 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->cplan,
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -501,30 +420,50 @@ PortalStart(Portal portal, ParamListInfo params,
portal->queryEnv,
0);
+ /* Remember for PortalRunMulti() */
+ portal->qdescs = lappend(portal->qdescs, queryDesc);
+
/*
* If it's a scrollable cursor, executor needs to support
* REWIND and backwards scan, as well as whatever the caller
* might've asked for.
*/
- if (portal->cursorOptions & CURSOR_OPT_SCROLL)
+ if (portal->strategy == PORTAL_ONE_SELECT &&
+ (portal->cursorOptions & CURSOR_OPT_SCROLL))
myeflags = eflags | EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD;
else
myeflags = eflags;
/*
- * Call ExecutorStart to prepare the plan for execution
+ * Call ExecutorStart to prepare the plan for execution. A
+ * cached plan may get invalidated as we're doing that.
*/
ExecutorStart(queryDesc, myeflags);
+ if (!queryDesc->plan_valid)
+ {
+ Assert(queryDesc->cplan);
+ PortalQueryFinish(queryDesc);
+ PopActiveSnapshot();
+ portal->plan_valid = false;
+ goto early_exit;
+ }
/*
- * This tells PortalCleanup to shut down the executor
+ * This tells PortalCleanup to shut down the executor, though that
+ * is not needed for queries handled by PortalRunMulti().
*/
- portal->queryDesc = queryDesc;
+ if (portal->strategy == PORTAL_ONE_SELECT)
+ portal->queryDesc = queryDesc;
/*
- * Remember tuple descriptor (computed by ExecutorStart)
+ * Remember tuple descriptor (computed by ExecutorStart),
+ * though for queries handled by PortalRunMulti(), copy it so
+ * that it outlives the QueryDesc.
*/
- portal->tupDesc = queryDesc->tupDesc;
+ if (portal->strategy != PORTAL_ONE_SELECT)
+ portal->tupDesc = CreateTupleDescCopy(queryDesc->tupDesc);
+ else
+ portal->tupDesc = queryDesc->tupDesc;
/*
* Reset cursor position data to "start of query"
@@ -532,33 +471,11 @@ PortalStart(Portal portal, ParamListInfo params,
portal->atStart = true;
portal->atEnd = false; /* allow fetches */
portal->portalPos = 0;
+ portal->plan_valid = true;
PopActiveSnapshot();
break;
- case PORTAL_ONE_RETURNING:
- case PORTAL_ONE_MOD_WITH:
-
- /*
- * We don't start the executor until we are told to run the
- * portal. We do need to set up the result tupdesc.
- */
- {
- PlannedStmt *pstmt;
-
- pstmt = PortalGetPrimaryStmt(portal);
- portal->tupDesc =
- ExecCleanTypeFromTL(pstmt->planTree->targetlist);
- }
-
- /*
- * Reset cursor position data to "start of query"
- */
- portal->atStart = true;
- portal->atEnd = false; /* allow fetches */
- portal->portalPos = 0;
- break;
-
case PORTAL_UTIL_SELECT:
/*
@@ -578,11 +495,85 @@ PortalStart(Portal portal, ParamListInfo params,
portal->atStart = true;
portal->atEnd = false; /* allow fetches */
portal->portalPos = 0;
+ portal->plan_valid = true;
break;
case PORTAL_MULTI_QUERY:
- /* Need do nothing now */
+ {
+ ListCell *lc;
+ bool first = true;
+
+ foreach(lc, portal->stmts)
+ {
+ PlannedStmt *plan = lfirst_node(PlannedStmt, lc);
+ bool is_utility = (plan->utilityStmt != NULL);
+
+ /*
+ * Push the snapshot to be used by the executor.
+ */
+ if (!is_utility)
+ {
+ /*
+ * Must copy the snapshot if we'll need to update
+ * its command ID.
+ */
+ if (!first)
+ PushCopiedSnapshot(GetTransactionSnapshot());
+ else
+ PushActiveSnapshot(GetTransactionSnapshot());
+ }
+
+ /*
+ * From the 2nd statement onwards, update the command
+ * ID and the snapshot to match.
+ */
+ if (!first)
+ {
+ CommandCounterIncrement();
+ UpdateActiveSnapshotCommandId();
+ }
+
+ first = false;
+
+ /*
+ * Create the QueryDesc object. DestReceiver will
+ * be set in PortalRunMulti().
+ */
+ queryDesc = CreateQueryDesc(plan, portal->cplan,
+ portal->sourceText,
+ !is_utility ?
+ GetActiveSnapshot() :
+ InvalidSnapshot,
+ InvalidSnapshot,
+ NULL,
+ params,
+ portal->queryEnv, 0);
+
+ /* Remember for PortalRunMulti() */
+ portal->qdescs = lappend(portal->qdescs, queryDesc);
+
+ if (is_utility)
+ continue;
+
+ /*
+ * Call ExecutorStart to prepare the plan for
+ * execution. A cached plan may get invalidated as
+ * we're doing that.
+ */
+ ExecutorStart(queryDesc, 0);
+ PopActiveSnapshot();
+ if (!queryDesc->plan_valid)
+ {
+ Assert(queryDesc->cplan);
+ PortalQueryFinish(queryDesc);
+ portal->plan_valid = false;
+ goto early_exit;
+ }
+ }
+ }
+
portal->tupDesc = NULL;
+ portal->plan_valid = true;
break;
}
}
@@ -594,19 +585,18 @@ PortalStart(Portal portal, ParamListInfo params,
/* Restore global vars and propagate error */
ActivePortal = saveActivePortal;
CurrentResourceOwner = saveResourceOwner;
- PortalContext = savePortalContext;
PG_RE_THROW();
}
PG_END_TRY();
+ portal->status = PORTAL_READY;
+
+early_exit:
MemoryContextSwitchTo(oldContext);
ActivePortal = saveActivePortal;
CurrentResourceOwner = saveResourceOwner;
- PortalContext = savePortalContext;
-
- portal->status = PORTAL_READY;
}
/*
@@ -1193,7 +1183,7 @@ PortalRunMulti(Portal portal,
QueryCompletion *qc)
{
bool active_snapshot_set = false;
- ListCell *stmtlist_item;
+ ListCell *qdesc_item;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1214,9 +1204,10 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
- foreach(stmtlist_item, portal->stmts)
+ foreach(qdesc_item, portal->qdescs)
{
- PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ QueryDesc *qdesc = lfirst_node(QueryDesc, qdesc_item);
+ PlannedStmt *pstmt = qdesc->plannedstmt;
/*
* If we got a cancel signal in prior command, quit
@@ -1241,7 +1232,7 @@ PortalRunMulti(Portal portal,
*/
if (!active_snapshot_set)
{
- Snapshot snapshot = GetTransactionSnapshot();
+ Snapshot snapshot = qdesc->snapshot;
/* If told to, register the snapshot and save in portal */
if (setHoldSnapshot)
@@ -1271,23 +1262,38 @@ PortalRunMulti(Portal portal,
else
UpdateActiveSnapshotCommandId();
+ /*
+ * Run the plan to completion.
+ */
+ qdesc->dest = dest;
+ ExecutorRun(qdesc, ForwardScanDirection, 0L, true);
+
+ /*
+ * Build command completion status data if needed.
+ */
if (pstmt->canSetTag)
{
- /* statement can set tag string */
- ProcessQuery(pstmt,
- portal->sourceText,
- portal->portalParams,
- portal->queryEnv,
- dest, qc);
- }
- else
- {
- /* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
- portal->sourceText,
- portal->portalParams,
- portal->queryEnv,
- altdest, NULL);
+ switch (qdesc->operation)
+ {
+ case CMD_SELECT:
+ SetQueryCompletion(qc, CMDTAG_SELECT, qdesc->estate->es_processed);
+ break;
+ case CMD_INSERT:
+ SetQueryCompletion(qc, CMDTAG_INSERT, qdesc->estate->es_processed);
+ break;
+ case CMD_UPDATE:
+ SetQueryCompletion(qc, CMDTAG_UPDATE, qdesc->estate->es_processed);
+ break;
+ case CMD_DELETE:
+ SetQueryCompletion(qc, CMDTAG_DELETE, qdesc->estate->es_processed);
+ break;
+ case CMD_MERGE:
+ SetQueryCompletion(qc, CMDTAG_MERGE, qdesc->estate->es_processed);
+ break;
+ default:
+ SetQueryCompletion(qc, CMDTAG_UNKNOWN, qdesc->estate->es_processed);
+ break;
+ }
}
if (log_executor_stats)
@@ -1346,8 +1352,19 @@ PortalRunMulti(Portal portal,
* Increment command counter between queries, but not after the last
* one.
*/
- if (lnext(portal->stmts, stmtlist_item) != NULL)
+ if (lnext(portal->qdescs, qdesc_item) != NULL)
CommandCounterIncrement();
+
+ /* portal->queryDesc is freed by PortalCleanup(). */
+ if (qdesc != portal->queryDesc)
+ {
+ if (qdesc->estate)
+ {
+ ExecutorFinish(qdesc);
+ ExecutorEnd(qdesc);
+ }
+ FreeQueryDesc(qdesc);
+ }
}
/* Pop the snapshot if we pushed one. */
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index c07382051d..38ae43e24b 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -2073,6 +2073,27 @@ get_rel_persistence(Oid relid)
return result;
}
+/*
+ * get_rel_relisshared
+ *
+ * Returns whether the given relation is shared
+ */
+bool
+get_rel_relisshared(Oid relid)
+{
+ HeapTuple tp;
+ Form_pg_class reltup;
+ bool result;
+
+ tp = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+ if (!HeapTupleIsValid(tp))
+ elog(ERROR, "cache lookup failed for relation %u", relid);
+ reltup = (Form_pg_class) GETSTRUCT(tp);
+ result = reltup->relisshared;
+ ReleaseSysCache(tp);
+
+ return result;
+}
/* ---------- TRANSFORM CACHE ---------- */
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 77c2ba3f8f..4e455d815f 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -100,13 +100,13 @@ static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool GenericPlanIsValid(CachedPlan *cplan);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -787,9 +787,6 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
- *
- * On a "true" return, we have acquired the locks needed to run the plan.
- * (We must do this for the "true" result to be race-condition-free.)
*/
static bool
CheckCachedPlan(CachedPlanSource *plansource)
@@ -803,60 +800,56 @@ CheckCachedPlan(CachedPlanSource *plansource)
if (!plan)
return false;
- Assert(plan->magic == CACHEDPLAN_MAGIC);
- /* Generic plans are never one-shot */
- Assert(!plan->is_oneshot);
+ if (GenericPlanIsValid(plan))
+ return true;
/*
- * If plan isn't valid for current role, we can't use it.
+ * Plan has been invalidated, so unlink it from the parent and release it.
*/
- if (plan->is_valid && plan->dependsOnRole &&
- plan->planRoleId != GetUserId())
- plan->is_valid = false;
+ ReleaseGenericPlan(plansource);
- /*
- * If it appears valid, acquire locks and recheck; this is much the same
- * logic as in RevalidateCachedQuery, but for a plan.
- */
- if (plan->is_valid)
+ return false;
+}
+
+/*
+ * GenericPlanIsValid
+ * Is a generic plan still valid?
+ *
+ * It may have gone stale due to concurrent schema modifications of relations
+ * mentioned in the plan, a role change, or the advancement of
+ * TransactionXmin past a transient plan's saved_xmin; see the checks below.
+ */
+static bool
+GenericPlanIsValid(CachedPlan *cplan)
+{
+ Assert(cplan != NULL);
+ Assert(cplan->magic == CACHEDPLAN_MAGIC);
+ /* Generic plans are never one-shot */
+ Assert(!cplan->is_oneshot);
+
+ if (cplan->is_valid)
{
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
- Assert(plan->refcount > 0);
-
- AcquireExecutorLocks(plan->stmt_list, true);
+ Assert(cplan->refcount > 0);
/*
- * If plan was transient, check to see if TransactionXmin has
- * advanced, and if so invalidate it.
+ * If plan isn't valid for current role, we can't use it.
*/
- if (plan->is_valid &&
- TransactionIdIsValid(plan->saved_xmin) &&
- !TransactionIdEquals(plan->saved_xmin, TransactionXmin))
- plan->is_valid = false;
+ if (cplan->dependsOnRole && cplan->planRoleId != GetUserId())
+ cplan->is_valid = false;
/*
- * By now, if any invalidation has happened, the inval callback
- * functions will have marked the plan invalid.
+ * If plan was transient, check to see if TransactionXmin has
+ * advanced, and if so invalidate it.
*/
- if (plan->is_valid)
- {
- /* Successfully revalidated and locked the query. */
- return true;
- }
-
- /* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ if (TransactionIdIsValid(cplan->saved_xmin) &&
+ !TransactionIdEquals(cplan->saved_xmin, TransactionXmin))
+ cplan->is_valid = false;
}
- /*
- * Plan has been invalidated, so unlink it from the parent and release it.
- */
- ReleaseGenericPlan(plansource);
-
- return false;
+ return cplan->is_valid;
}
/*
@@ -1126,9 +1119,6 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
- * On return, the plan is valid and we have sufficient locks to begin
- * execution.
- *
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
* the refcount has been reported to that ResourceOwner (note that this
@@ -1360,8 +1350,8 @@ CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
}
/*
- * Reject if AcquireExecutorLocks would have anything to do. This is
- * probably unnecessary given the previous check, but let's be safe.
+ * Reject if the executor would need to take any locks beyond those taken
+ * by AcquirePlannerLocks() on the query.
*/
foreach(lc, plan->stmt_list)
{
@@ -1735,58 +1725,6 @@ QueryListGetPrimaryStmt(List *stmts)
return NULL;
}
-/*
- * AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
- */
-static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
-{
- ListCell *lc1;
-
- foreach(lc1, stmt_list)
- {
- PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
-
- if (plannedstmt->commandType == CMD_UTILITY)
- {
- /*
- * Ignore utility statements, except those (such as EXPLAIN) that
- * contain a parsed-but-not-planned query. Note: it's okay to use
- * ScanQueryForLocks, even though the query hasn't been through
- * rule rewriting, because rewriting doesn't change the query
- * representation.
- */
- Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
-
- if (query)
- ScanQueryForLocks(query, acquire);
- continue;
- }
-
- foreach(lc2, plannedstmt->rtable)
- {
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
-
- if (!(rte->rtekind == RTE_RELATION ||
- (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
- continue;
-
- /*
- * Acquire the appropriate type of lock on each relation OID. Note
- * that we don't actually try to open the rel, and hence will not
- * fail if it's been dropped entirely --- we'll just transiently
- * acquire a non-conflicting lock.
- */
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
- }
- }
-}
-
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 06dfa85f04..3ad80c7ecb 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -201,6 +201,10 @@ CreatePortal(const char *name, bool allowDup, bool dupSilent)
portal->portalContext = AllocSetContextCreate(TopPortalContext,
"PortalContext",
ALLOCSET_SMALL_SIZES);
+ /* initialize portal's query context to store QueryDescs */
+ portal->queryContext = AllocSetContextCreate(TopPortalContext,
+ "PortalQueryContext",
+ ALLOCSET_SMALL_SIZES);
/* create a resource owner for the portal */
portal->resowner = ResourceOwnerCreate(CurTransactionResourceOwner,
@@ -224,6 +228,7 @@ CreatePortal(const char *name, bool allowDup, bool dupSilent)
/* for named portals reuse portal->name copy */
MemoryContextSetIdentifier(portal->portalContext, portal->name[0] ? portal->name : "<unnamed>");
+ MemoryContextSetIdentifier(portal->queryContext, portal->name[0] ? portal->name : "<unnamed>");
return portal;
}
@@ -594,6 +599,7 @@ PortalDrop(Portal portal, bool isTopCommit)
/* release subsidiary storage */
MemoryContextDelete(portal->portalContext);
+ MemoryContextDelete(portal->queryContext);
/* release portal struct (it's in TopPortalContext) */
pfree(portal);
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 7c1071ddd1..da39b2e4ff 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,11 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern QueryDesc *ExplainQueryDesc(PlannedStmt *stmt, struct CachedPlan *cplan,
+ const char *queryString, IntoClause *into, ExplainState *es,
+ ParamListInfo params, QueryEnvironment *queryEnv);
+extern void ExplainOnePlan(QueryDesc *queryDesc,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
@@ -103,6 +107,7 @@ extern void ExplainQueryParameters(ExplainState *es, ParamListInfo params, int m
extern void ExplainBeginOutput(ExplainState *es);
extern void ExplainEndOutput(ExplainState *es);
+extern void ExplainResetOutput(ExplainState *es);
extern void ExplainSeparatePlans(ExplainState *es);
extern void ExplainPropertyList(const char *qlabel, List *data,
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index af2bf36dfb..c36c25b497 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -32,9 +32,12 @@
*/
typedef struct QueryDesc
{
+ NodeTag type;
+
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ struct CachedPlan *cplan; /* CachedPlan, if plannedstmt is from one */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -47,6 +50,7 @@ typedef struct QueryDesc
TupleDesc tupDesc; /* descriptor for result tuples */
EState *estate; /* executor's query-wide state */
PlanState *planstate; /* tree of per-plan-node state */
+ bool plan_valid; /* is planstate tree fully valid? */
/* This field is set by ExecutorRun */
bool already_executed; /* true if previously executed */
@@ -57,6 +61,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ struct CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index e7e25c057e..8c680358e8 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -19,6 +19,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/plancache.h"
/*
@@ -59,6 +60,10 @@
#define EXEC_FLAG_MARK 0x0008 /* need mark/restore */
#define EXEC_FLAG_SKIP_TRIGGERS 0x0010 /* skip AfterTrigger calls */
#define EXEC_FLAG_WITH_NO_DATA 0x0020 /* rel scannability doesn't matter */
+#define EXEC_FLAG_GET_LOCKS 0x0400 /* should ExecGetRangeTableRelation
+ * lock relations? */
+#define EXEC_FLAG_REL_LOCKS 0x8000 /* should ExecCloseRangeTableRelations
+ * release locks? */
/* Hook for plugins to get control in ExecutorStart() */
@@ -245,6 +250,13 @@ extern void ExecEndNode(PlanState *node);
extern void ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/* Is the CachedPlan providing this plan tree, if any, still valid? */
+static inline bool
+ExecPlanStillValid(EState *estate)
+{
+ return estate->es_cachedplan == NULL ? true :
+ CachedPlanStillValid(estate->es_cachedplan);
+}
/* ----------------------------------------------------------------
* ExecProcNode
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 20f4c8b35f..89f5a627c8 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -619,6 +619,8 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ struct CachedPlan *es_cachedplan; /* CachedPlan if plannedstmt is from
+ * one */
List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
const char *es_sourceText; /* Source text from QueryDesc */
diff --git a/src/include/nodes/meson.build b/src/include/nodes/meson.build
index efe0834afb..a8fdd9e176 100644
--- a/src/include/nodes/meson.build
+++ b/src/include/nodes/meson.build
@@ -13,6 +13,7 @@ node_support_input_i = [
'access/tsmapi.h',
'commands/event_trigger.h',
'commands/trigger.h',
+ 'executor/execdesc.h',
'executor/tuptable.h',
'foreign/fdwapi.h',
'nodes/bitmapset.h',
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index d61a62da19..9b888b0d75 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -116,6 +116,9 @@ typedef struct PlannerGlobal
/* "flat" list of RTEPermissionInfos */
List *finalrteperminfos;
+ /* "flat" list of integer RT indexes */
+ List *viewRelations;
+
/* "flat" list of PlanRowMarks */
List *finalrowmarks;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 659bd05c0c..496410198f 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -78,6 +78,9 @@ typedef struct PlannedStmt
List *permInfos; /* list of RTEPermissionInfo nodes for rtable
* entries needing one */
+ List *viewRelations; /* integer list of RT indexes, or NIL if no
+ * views are queried */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE/MERGE */
List *resultRelations; /* integer list of RT indexes, or NIL */
diff --git a/src/include/storage/lmgr.h b/src/include/storage/lmgr.h
index 4ee91e3cf9..598bf2688a 100644
--- a/src/include/storage/lmgr.h
+++ b/src/include/storage/lmgr.h
@@ -48,6 +48,7 @@ extern bool ConditionalLockRelation(Relation relation, LOCKMODE lockmode);
extern void UnlockRelation(Relation relation, LOCKMODE lockmode);
extern bool CheckRelationLockedByMe(Relation relation, LOCKMODE lockmode,
bool orstronger);
+extern bool CheckRelLockedByMe(Oid relid, LOCKMODE lockmode, bool orstronger);
extern bool LockHasWaitersRelation(Relation relation, LOCKMODE lockmode);
extern void LockRelationIdForSession(LockRelId *relid, LOCKMODE lockmode);
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index 4f5418b972..3074e604dd 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -139,6 +139,7 @@ extern char get_rel_relkind(Oid relid);
extern bool get_rel_relispartition(Oid relid);
extern Oid get_rel_tablespace(Oid relid);
extern char get_rel_persistence(Oid relid);
+extern bool get_rel_relisshared(Oid relid);
extern Oid get_transform_fromsql(Oid typid, Oid langid, List *trftypes);
extern Oid get_transform_tosql(Oid typid, Oid langid, List *trftypes);
extern bool get_typisdefined(Oid typid);
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index a443181d41..c2e485ac2c 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -221,6 +221,20 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
QueryEnvironment *queryEnv);
+
+/*
+ * CachedPlanStillValid
+ * Returns if a cached generic plan is still valid
+ *
+ * Called by the executor after it has finished taking locks on a plan tree
+ * in a CachedPlan.
+ */
+static inline bool
+CachedPlanStillValid(CachedPlan *cplan)
+{
+ return cplan->is_valid;
+}
+
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aa08b1e0fc..332a08ccb4 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,9 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *qdescs; /* list of QueryDescs */
+ bool plan_valid; /* are plan(s) ready for execution? */
+ MemoryContext queryContext; /* memory for QueryDescs and children */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -242,6 +245,7 @@ extern void PortalDefineQuery(Portal portal,
CommandTag commandTag,
List *stmts,
CachedPlan *cplan);
+extern void PortalQueryFinish(QueryDesc *queryDesc);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
diff --git a/src/test/modules/delay_execution/Makefile b/src/test/modules/delay_execution/Makefile
index 70f24e846d..2fca84d027 100644
--- a/src/test/modules/delay_execution/Makefile
+++ b/src/test/modules/delay_execution/Makefile
@@ -8,7 +8,8 @@ OBJS = \
delay_execution.o
ISOLATION = partition-addition \
- partition-removal-1
+ partition-removal-1 \
+ cached-plan-replan
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/src/test/modules/delay_execution/delay_execution.c b/src/test/modules/delay_execution/delay_execution.c
index 7cd76eb34b..5d7a3e9858 100644
--- a/src/test/modules/delay_execution/delay_execution.c
+++ b/src/test/modules/delay_execution/delay_execution.c
@@ -1,14 +1,18 @@
/*-------------------------------------------------------------------------
*
* delay_execution.c
- * Test module to allow delay between parsing and execution of a query.
+ * Test module to introduce delay at various points during execution of a
+ * query to test that execution proceeds safely in light of concurrent
+ * changes.
*
* The delay is implemented by taking and immediately releasing a specified
* advisory lock. If another process has previously taken that lock, the
* current process will be blocked until the lock is released; otherwise,
* there's no effect. This allows an isolationtester script to reliably
- * test behaviors where some specified action happens in another backend
- * between parsing and execution of any desired query.
+ * test behaviors where some specified action happens in another backend in
+ * a couple of cases: 1) between parsing and execution of any desired query
+ * when using the planner_hook, 2) between RevalidateCachedQuery() and
+ * ExecutorStart() when using the ExecutorStart_hook.
*
* Copyright (c) 2020-2023, PostgreSQL Global Development Group
*
@@ -22,6 +26,7 @@
#include <limits.h>
+#include "executor/executor.h"
#include "optimizer/planner.h"
#include "utils/builtins.h"
#include "utils/guc.h"
@@ -32,9 +37,11 @@ PG_MODULE_MAGIC;
/* GUC: advisory lock ID to use. Zero disables the feature. */
static int post_planning_lock_id = 0;
+static int executor_start_lock_id = 0;
-/* Save previous planner hook user to be a good citizen */
+/* Save previous hook users to be a good citizen */
static planner_hook_type prev_planner_hook = NULL;
+static ExecutorStart_hook_type prev_ExecutorStart_hook = NULL;
/* planner_hook function to provide the desired delay */
@@ -70,11 +77,41 @@ delay_execution_planner(Query *parse, const char *query_string,
return result;
}
+/* ExecutorStart_hook function to provide the desired delay */
+static void
+delay_execution_ExecutorStart(QueryDesc *queryDesc, int eflags)
+{
+ /* If enabled, delay by taking and releasing the specified lock */
+ if (executor_start_lock_id != 0)
+ {
+ DirectFunctionCall1(pg_advisory_lock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+ DirectFunctionCall1(pg_advisory_unlock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+
+ /*
+ * Ensure that we notice any pending invalidations, since the advisory
+ * lock functions don't do this.
+ */
+ AcceptInvalidationMessages();
+ }
+
+ /* Now start the executor, possibly via a previous hook user */
+ if (prev_ExecutorStart_hook)
+ prev_ExecutorStart_hook(queryDesc, eflags);
+ else
+ standard_ExecutorStart(queryDesc, eflags);
+
+ if (executor_start_lock_id != 0)
+ elog(NOTICE, "Finished ExecutorStart(): CachedPlan is %s",
+ queryDesc->cplan->is_valid ? "valid" : "not valid");
+}
+
/* Module load function */
void
_PG_init(void)
{
- /* Set up the GUC to control which lock is used */
+ /* Set up GUCs to control which lock is used */
DefineCustomIntVariable("delay_execution.post_planning_lock_id",
"Sets the advisory lock ID to be locked/unlocked after planning.",
"Zero disables the delay.",
@@ -86,10 +123,22 @@ _PG_init(void)
NULL,
NULL,
NULL);
-
+ DefineCustomIntVariable("delay_execution.executor_start_lock_id",
+ "Sets the advisory lock ID to be locked/unlocked before starting execution.",
+ "Zero disables the delay.",
+ &executor_start_lock_id,
+ 0,
+ 0, INT_MAX,
+ PGC_USERSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
MarkGUCPrefixReserved("delay_execution");
- /* Install our hook */
+ /* Install our hooks. */
prev_planner_hook = planner_hook;
planner_hook = delay_execution_planner;
+ prev_ExecutorStart_hook = ExecutorStart_hook;
+ ExecutorStart_hook = delay_execution_ExecutorStart;
}
diff --git a/src/test/modules/delay_execution/expected/cached-plan-replan.out b/src/test/modules/delay_execution/expected/cached-plan-replan.out
new file mode 100644
index 0000000000..eaac55122b
--- /dev/null
+++ b/src/test/modules/delay_execution/expected/cached-plan-replan.out
@@ -0,0 +1,43 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1prep s2lock s1exec s2dropi s2unlock
+step s1prep: SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1;
+ EXPLAIN (COSTS OFF) EXECUTE q (1);
+QUERY PLAN
+---------------------------------------
+Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo1 foo_1
+ Recheck Cond: (a = $1)
+ -> Bitmap Index Scan on foo1_a
+ Index Cond: (a = $1)
+(6 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); <waiting ...>
+step s2dropi: DROP INDEX foo1_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+----------------------------
+Append
+ Subplans Removed: 1
+ -> Seq Scan on foo1 foo_1
+ Filter: (a = $1)
+(4 rows)
+
diff --git a/src/test/modules/delay_execution/specs/cached-plan-replan.spec b/src/test/modules/delay_execution/specs/cached-plan-replan.spec
new file mode 100644
index 0000000000..5bd5fdbf1c
--- /dev/null
+++ b/src/test/modules/delay_execution/specs/cached-plan-replan.spec
@@ -0,0 +1,39 @@
+# Test to check that invalidation of a cached plan during ExecutorStart
+# correctly triggers replanning and re-execution.
+
+setup
+{
+ CREATE TABLE foo (a int, b text) PARTITION BY LIST(a);
+ CREATE TABLE foo1 PARTITION OF foo FOR VALUES IN (1);
+ CREATE INDEX foo1_a ON foo1 (a);
+ CREATE TABLE foo2 PARTITION OF foo FOR VALUES IN (2);
+ CREATE VIEW foov AS SELECT * FROM foo;
+}
+
+teardown
+{
+ DROP VIEW foov;
+ DROP TABLE foo;
+}
+
+session "s1"
+# Creates a prepared statement and forces creation of a generic plan
+step "s1prep" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+# Executes a generic plan
+step "s1exec" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+
+session "s2"
+step "s2lock" { SELECT pg_advisory_lock(12345); }
+step "s2unlock" { SELECT pg_advisory_unlock(12345); }
+step "s2dropi" { DROP INDEX foo1_a; }
+
+# While "s1exec" waits to acquire the advisory lock, "s2drop" is able to drop
+# the index being used in the cached plan for `q`, so when "s1exec" is then
+# unblocked and initializes the cached plan for execution, it detects the
+# concurrent index drop and causes the cached plan to be discarded and
+# recreated without the index.
+permutation "s1prep" "s2lock" "s1exec" "s2dropi" "s2unlock"
--
2.35.3
On Thu, Mar 2, 2023 at 10:52 PM Amit Langote <amitlangote09@gmail.com> wrote:
I think I have figured out what might be going wrong on that cfbot
animal after building with the same CPPFLAGS as that animal locally.
I had forgotten to update _out/_readRangeTblEntry() to account for the
patch's change that a view's RTE_SUBQUERY now also preserves relkind
in addition to relid and rellockmode for the locking consideration.
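
As a minimal sketch of the sort of change involved (assuming the usual
WRITE_*/READ_* field macros; the patch's actual hunk may differ), the
RTE_SUBQUERY case in _outRangeTblEntry() would grow something like:

    case RTE_SUBQUERY:
        WRITE_NODE_FIELD(subquery);
        WRITE_BOOL_FIELD(security_barrier);
        /* these are now preserved for views, for locking purposes */
        WRITE_OID_FIELD(relid);
        WRITE_CHAR_FIELD(relkind);
        WRITE_INT_FIELD(rellockmode);
        break;

with the matching READ_OID_FIELD/READ_CHAR_FIELD/READ_INT_FIELD calls
added to _readRangeTblEntry().

Also, I noticed that a multi-query Portal execution with rules was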
failing (thanks to a regression test added in a7d71c41db) because of
the snapshot used for the 2nd query onward not being updated for the
command ID change under the patched model of multi-query Portal
execution. To wit, under the patched model, all queries in the
multi-query Portal case undergo ExecutorStart() before any of them is
run with ExecutorRun(). The patch hadn't been changed, however, to
update the snapshot's command ID for the 2nd query onward, which caused
the aforementioned test case to fail.

This new model does mean, however, that the 2nd query onward must use
PushCopiedSnapshot() given the current requirement of
UpdateActiveSnapshotCommandId() that the snapshot passed to it must
not be referenced anywhere else. The new model basically requires
that each query's QueryDesc points to its own copy of the
ActiveSnapshot. That may not be a point in favor of the patched model,
though; for now, I haven't been able to come up with a better
alternative.
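
As a rough illustration of that requirement (the helper name below is
invented for this sketch and is not code from the patch):

    #include "postgres.h"
    #include "utils/snapmgr.h"

    /*
     * Give the next query in a multi-query Portal its own copy of the
     * active snapshot before bumping its command ID, since
     * UpdateActiveSnapshotCommandId() requires that the snapshot it
     * modifies not be referenced anywhere else.
     */
    static void
    PushSnapshotForNextPortalQuery(void)
    {
        PushCopiedSnapshot(GetActiveSnapshot());
        UpdateActiveSnapshotCommandId();
    }

The matching PopActiveSnapshot() would then be done once that query's
execution is finished.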
Here's a new version addressing the following 2 points.
* I realized that, like views, non-leaf relations of partition trees
scanned by an Append/MergeAppend would need to be locked separately,
because ExecInitNode() traversal of the plan tree would not account
for them. That is, they are not opened using
ExecGetRangeTableRelation() or ExecOpenScanRelation(). One exception
is that some (if not all) of those non-leaf relations may be
referenced in PartitionPruneInfo and so locked as part of initializing
the corresponding PartitionPruneState, but I decided not to complicate
the code to filter out such relations from the set locked separately.
To carry the set of relations to lock, the refactoring patch 0001
re-introduces the List of Bitmapset field named allpartrelids into
Append/MergeAppend nodes, which we had previously removed on the
grounds that those relations need not be locked separately (commits
f2343653f5b, f003a7522bf).
* I decided to initialize QueryDesc.planstate even in the cases where
ExecInitNode() traversal is aborted in the middle on detecting
CachedPlan invalidation such that it points to a partially initialized
PlanState tree. My earlier thinking that each PlanState node need not
be visited for resource cleanup in such cases was naive after all. To
that end, I've fixed the ExecEndNode() subroutines of all Plan node
types to account for potentially uninitialized fields. There are a
couple of cases where I'm a bit doubtful though. In
ExecEndCustomScan(), there's no indication in CustomScanState whether
it's OK to call EndCustomScan() when BeginCustomScan() may not have
been called. For ForeignScanState, I've assumed that
ForeignScanState.fdw_state being set can be used as a marker that
BeginForeignScan would have been called, though maybe that's not a
solid assumption.
I'm also attaching a new (small) patch 0003 that eliminates the
loop-over-rangetable in ExecCloseRangeTableRelations() in favor of
iterating over a new List field of EState named es_opened_relations,
which is populated by ExecGetRangeTableRelation() with only the
relations that were opened. This speeds up
ExecCloseRangeTableRelations() significantly for the cases with many
runtime-prunable partitions.
Thanks,

--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v35-0003-Track-opened-range-table-relations-in-a-List-in-.patch
From ff01e18a889f4e7ecb11d58488676395973656b0 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Mon, 13 Mar 2023 15:59:38 +0900
Subject: [PATCH v35 3/3] Track opened range table relations in a List in
EState
This makes ExecCloseRangeTableRelations faster when there are many
relations in the range table but only a few are opened during
execution, such as when run-time pruning kicks in on an Append
containing 1000s of partition subplans.
---
src/backend/executor/execMain.c | 9 +++++----
src/backend/executor/execUtils.c | 2 ++
src/include/nodes/execnodes.h | 2 ++
3 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index fc0d2ca481..380739e3a2 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1630,12 +1630,13 @@ ExecCloseResultRelations(EState *estate)
void
ExecCloseRangeTableRelations(EState *estate)
{
- int i;
+ ListCell *lc;
- for (i = 0; i < estate->es_range_table_size; i++)
+ foreach(lc, estate->es_opened_relations)
{
- if (estate->es_relations[i])
- table_close(estate->es_relations[i], NoLock);
+ Relation rel = lfirst(lc);
+
+ table_close(rel, NoLock);
}
}
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index a485e7dfc5..f7053072d9 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -829,6 +829,8 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
}
estate->es_relations[rti - 1] = rel;
+ estate->es_opened_relations = lappend(estate->es_opened_relations,
+ rel);
}
return rel;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index c6b3885bf6..214a5f5ea4 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -617,6 +617,8 @@ typedef struct EState
Index es_range_table_size; /* size of the range table arrays */
Relation *es_relations; /* Array of per-range-table-entry Relation
* pointers, or NULL if not yet opened */
+ List *es_opened_relations; /* List of non-NULL entries in
+ * es_relations in no specific order */
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
--
2.35.3
v35-0002-Move-AcquireExecutorLocks-s-responsibility-into-.patch
From 48465ed72fc40ac3f26c27703267513295b288f6 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Fri, 20 Jan 2023 16:52:31 +0900
Subject: [PATCH v35 2/3] Move AcquireExecutorLocks()'s responsibility into the
executor
This commit introduces a new executor flag EXEC_FLAG_GET_LOCKS that
should be passed in eflags to ExecutorStart() if the PlannedStmt
comes from a CachedPlan. When set, the executor will take locks
on any relations referenced in the plan nodes that need to be
initialized for execution. That excludes any partitions that can
be pruned during the executor initialization phase, that is, based
on the values of only the external (PARAM_EXTERN) parameters.
Relations that are not explicitly mentioned in the plan tree, such
as views and non-leaf partition parents whose children are mentioned
in Append/MergeAppend nodes, are locked separately. After taking each
lock, the executor calls CachedPlanStillValid() to check if
CachedPlan.is_valid has been reset by PlanCacheRelCallback() due to
concurrent modification of relations referenced in the plan. If it
is found that the CachedPlan is indeed invalid, the recursive
ExecInitNode() traversal is aborted at that point. To allow the
proper cleanup of such a partially initialized planstate tree,
ExecEndNode() subroutines of various plan nodes have been fixed to
account for potentially uninitialized fields. It is the caller's
(of ExecutorStart()) responsibility to call ExecutorEnd() even on
a QueryDesc containing such a partially initialized PlanState tree.
Call sites that use plancache (GetCachedPlan) to get the plan trees
to pass to the executor for execution should now be prepared to
handle the case that the plan tree may be flagged by the executor as
stale as described above. To that end, this commit refactors the
relevant code sites to move the ExecutorStart() call closer to the
GetCachedPlan() call to reduce the friction in the cases where
replanning is needed due to a CachedPlan being marked stale in this
manner. Callers must check that QueryDesc.plan_valid is true before
passing it on to ExecutorRun() for execution.
PortalStart() now performs CreateQueryDesc() and ExecutorStart() for
all portal strategies, including those pertaining to multiple queries.
The QueryDescs for strategies handled by PortalRunMulti() are
remembered in the Portal in a new List field 'qdescs', allocated in a
new memory context 'queryContext'. This new arrangment is to make it
easier to discard and recreate a Portal if the CachedPlan goes stale
during setup.
---
contrib/postgres_fdw/postgres_fdw.c | 4 +
src/backend/commands/copyto.c | 4 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 146 +++++---
src/backend/commands/extension.c | 2 +
src/backend/commands/matview.c | 3 +-
src/backend/commands/portalcmds.c | 16 +-
src/backend/commands/prepare.c | 32 +-
src/backend/executor/execMain.c | 89 ++++-
src/backend/executor/execParallel.c | 8 +-
src/backend/executor/execPartition.c | 4 +
src/backend/executor/execProcnode.c | 9 +
src/backend/executor/execUtils.c | 60 +++-
src/backend/executor/functions.c | 2 +
src/backend/executor/nodeAgg.c | 23 +-
src/backend/executor/nodeAppend.c | 23 +-
src/backend/executor/nodeBitmapAnd.c | 10 +-
src/backend/executor/nodeBitmapHeapscan.c | 10 +-
src/backend/executor/nodeBitmapIndexscan.c | 2 +
src/backend/executor/nodeBitmapOr.c | 10 +-
src/backend/executor/nodeCtescan.c | 6 +-
src/backend/executor/nodeCustom.c | 12 +-
src/backend/executor/nodeForeignscan.c | 22 +-
src/backend/executor/nodeFunctionscan.c | 3 +-
src/backend/executor/nodeGather.c | 2 +
src/backend/executor/nodeGatherMerge.c | 2 +
src/backend/executor/nodeGroup.c | 5 +-
src/backend/executor/nodeHash.c | 2 +
src/backend/executor/nodeHashjoin.c | 13 +-
src/backend/executor/nodeIncrementalSort.c | 14 +-
src/backend/executor/nodeIndexonlyscan.c | 7 +-
src/backend/executor/nodeIndexscan.c | 7 +-
src/backend/executor/nodeLimit.c | 2 +
src/backend/executor/nodeLockRows.c | 2 +
src/backend/executor/nodeMaterial.c | 5 +-
src/backend/executor/nodeMemoize.c | 12 +-
src/backend/executor/nodeMergeAppend.c | 23 +-
src/backend/executor/nodeMergejoin.c | 10 +-
src/backend/executor/nodeModifyTable.c | 13 +-
.../executor/nodeNamedtuplestorescan.c | 3 +-
src/backend/executor/nodeNestloop.c | 7 +-
src/backend/executor/nodeProjectSet.c | 5 +-
src/backend/executor/nodeRecursiveunion.c | 4 +
src/backend/executor/nodeResult.c | 5 +-
src/backend/executor/nodeSamplescan.c | 5 +-
src/backend/executor/nodeSeqscan.c | 5 +-
src/backend/executor/nodeSetOp.c | 5 +-
src/backend/executor/nodeSort.c | 8 +-
src/backend/executor/nodeSubqueryscan.c | 5 +-
src/backend/executor/nodeTableFuncscan.c | 3 +-
src/backend/executor/nodeTidrangescan.c | 5 +-
src/backend/executor/nodeTidscan.c | 5 +-
src/backend/executor/nodeUnique.c | 5 +-
src/backend/executor/nodeValuesscan.c | 3 +-
src/backend/executor/nodeWindowAgg.c | 55 +++-
src/backend/executor/nodeWorktablescan.c | 3 +-
src/backend/executor/spi.c | 53 ++-
src/backend/nodes/outfuncs.c | 1 +
src/backend/nodes/readfuncs.c | 1 +
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 5 +
src/backend/rewrite/rewriteHandler.c | 7 +-
src/backend/storage/lmgr/lmgr.c | 45 +++
src/backend/tcop/postgres.c | 13 +-
src/backend/tcop/pquery.c | 311 ++++++++++--------
src/backend/utils/cache/lsyscache.c | 21 ++
src/backend/utils/cache/plancache.c | 134 ++------
src/backend/utils/mmgr/portalmem.c | 6 +
src/include/commands/explain.h | 7 +-
src/include/executor/execdesc.h | 5 +
src/include/executor/executor.h | 12 +
src/include/nodes/execnodes.h | 2 +
src/include/nodes/pathnodes.h | 3 +
src/include/nodes/plannodes.h | 3 +
src/include/storage/lmgr.h | 1 +
src/include/utils/lsyscache.h | 1 +
src/include/utils/plancache.h | 14 +
src/include/utils/portal.h | 4 +
src/test/modules/delay_execution/Makefile | 3 +-
.../modules/delay_execution/delay_execution.c | 63 +++-
.../expected/cached-plan-replan.out | 117 +++++++
.../specs/cached-plan-replan.spec | 50 +++
82 files changed, 1213 insertions(+), 422 deletions(-)
create mode 100644 src/test/modules/delay_execution/expected/cached-plan-replan.out
create mode 100644 src/test/modules/delay_execution/specs/cached-plan-replan.spec
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index f5926ab89d..93f3f8b5d1 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2659,7 +2659,11 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
/* Get info about foreign table. */
rtindex = node->resultRelInfo->ri_RangeTableIndex;
if (fsplan->scan.scanrelid == 0)
+ {
dmstate->rel = ExecOpenScanRelation(estate, rtindex, eflags);
+ if (!ExecPlanStillValid(estate))
+ return;
+ }
else
dmstate->rel = node->ss.ss_currentRelation;
table = GetForeignTable(RelationGetRelid(dmstate->rel));
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index beea1ac687..e9f77d5711 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -558,7 +558,8 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL,
+ pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
@@ -569,6 +570,7 @@ BeginCopyTo(ParseState *pstate,
* ExecutorStart computes a result tupdesc for us
*/
ExecutorStart(cstate->queryDesc, 0);
+ Assert(cstate->queryDesc->plan_valid);
tupDesc = cstate->queryDesc->tupDesc;
}
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index d6c6d514f3..a55b851574 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index e57bda7b62..acae5b455b 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -384,6 +384,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
else
{
PlannedStmt *plan;
+ QueryDesc *queryDesc;
instr_time planstart,
planduration;
BufferUsage bufusage_start,
@@ -406,12 +407,93 @@ ExplainOneQuery(Query *query, int cursorOptions,
BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
}
+ queryDesc = ExplainQueryDesc(plan, NULL, queryString, into, es,
+ params, queryEnv);
+ Assert(queryDesc);
+
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(queryDesc, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
+/*
+ * ExplainQueryDesc
+ * Set up QueryDesc for EXPLAINing a given plan
+ *
+ * This returns NULL if cplan is found to have been invalidated since its
+ * creation.
+ */
+QueryDesc *
+ExplainQueryDesc(PlannedStmt *stmt, CachedPlan *cplan,
+ const char *queryString, IntoClause *into, ExplainState *es,
+ ParamListInfo params, QueryEnvironment *queryEnv)
+{
+ QueryDesc *queryDesc;
+ DestReceiver *dest;
+ int eflags;
+ int instrument_option = 0;
+
+ /*
+ * Normally we discard the query's output, but if explaining CREATE TABLE
+ * AS, we'd better use the appropriate tuple receiver.
+ */
+ if (into)
+ dest = CreateIntoRelDestReceiver(into);
+ else
+ dest = None_Receiver;
+
+ if (es->analyze && es->timing)
+ instrument_option |= INSTRUMENT_TIMER;
+ else if (es->analyze)
+ instrument_option |= INSTRUMENT_ROWS;
+
+ if (es->buffers)
+ instrument_option |= INSTRUMENT_BUFFERS;
+ if (es->wal)
+ instrument_option |= INSTRUMENT_WAL;
+
+ /*
+ * Use a snapshot with an updated command ID to ensure this query sees
+ * results of any previously executed queries.
+ */
+ PushCopiedSnapshot(GetActiveSnapshot());
+ UpdateActiveSnapshotCommandId();
+
+ /* Create a QueryDesc for the query */
+ queryDesc = CreateQueryDesc(stmt, cplan, queryString,
+ GetActiveSnapshot(), InvalidSnapshot,
+ dest, params, queryEnv, instrument_option);
+
+ /* Select execution options */
+ if (es->analyze)
+ eflags = 0; /* default run-to-completion flags */
+ else
+ eflags = EXEC_FLAG_EXPLAIN_ONLY;
+ if (into)
+ eflags |= GetIntoRelEFlags(into);
+
+ /* Take locks if using a CachedPlan */
+ if (queryDesc->cplan)
+ eflags |= EXEC_FLAG_GET_LOCKS;
+
+ /*
+ * Call ExecutorStart to prepare the plan for execution. A cached plan
+ * may get invalidated as we're doing that.
+ */
+ ExecutorStart(queryDesc, eflags);
+ if (!queryDesc->plan_valid)
+ {
+ /* Clean up. */
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+ PopActiveSnapshot();
+ return NULL;
+ }
+
+ return queryDesc;
+}
+
/*
* ExplainOneUtility -
* print out the execution plan for one utility statement
@@ -515,29 +597,16 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(QueryDesc *queryDesc,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
{
- DestReceiver *dest;
- QueryDesc *queryDesc;
instr_time starttime;
double totaltime = 0;
- int eflags;
- int instrument_option = 0;
- Assert(plannedstmt->commandType != CMD_UTILITY);
-
- if (es->analyze && es->timing)
- instrument_option |= INSTRUMENT_TIMER;
- else if (es->analyze)
- instrument_option |= INSTRUMENT_ROWS;
-
- if (es->buffers)
- instrument_option |= INSTRUMENT_BUFFERS;
- if (es->wal)
- instrument_option |= INSTRUMENT_WAL;
+ Assert(queryDesc->plannedstmt->commandType != CMD_UTILITY);
/*
* We always collect timing for the entire statement, even when node-level
@@ -546,38 +615,6 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
*/
INSTR_TIME_SET_CURRENT(starttime);
- /*
- * Use a snapshot with an updated command ID to ensure this query sees
- * results of any previously executed queries.
- */
- PushCopiedSnapshot(GetActiveSnapshot());
- UpdateActiveSnapshotCommandId();
-
- /*
- * Normally we discard the query's output, but if explaining CREATE TABLE
- * AS, we'd better use the appropriate tuple receiver.
- */
- if (into)
- dest = CreateIntoRelDestReceiver(into);
- else
- dest = None_Receiver;
-
- /* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, instrument_option);
-
- /* Select execution options */
- if (es->analyze)
- eflags = 0; /* default run-to-completion flags */
- else
- eflags = EXEC_FLAG_EXPLAIN_ONLY;
- if (into)
- eflags |= GetIntoRelEFlags(into);
-
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, eflags);
-
/* Execute the plan for statistics if asked for */
if (es->analyze)
{
@@ -4851,6 +4888,17 @@ ExplainDummyGroup(const char *objtype, const char *labelname, ExplainState *es)
}
}
+/*
+ * Discard output buffer for a fresh restart.
+ */
+void
+ExplainResetOutput(ExplainState *es)
+{
+ Assert(es->str);
+ resetStringInfo(es->str);
+ ExplainBeginOutput(es);
+}
+
/*
* Emit the start-of-output boilerplate.
*
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 02ff4a9a7f..2d2ef98b54 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -780,11 +780,13 @@ execute_sql_string(const char *sql)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
ExecutorStart(qdesc, 0);
+ Assert(qdesc->plan_valid);
ExecutorRun(qdesc, ForwardScanDirection, 0, true);
ExecutorFinish(qdesc);
ExecutorEnd(qdesc);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index fb30d2595c..9adaf6c527 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -409,12 +409,13 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, 0);
+ Assert(queryDesc->plan_valid);
/* run the plan */
ExecutorRun(queryDesc, ForwardScanDirection, 0L, true);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 8a3cf98cce..3c34ab4351 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -146,6 +146,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
PortalStart(portal, params, 0, GetActiveSnapshot());
Assert(portal->strategy == PORTAL_ONE_SELECT);
+ Assert(portal->plan_valid);
/*
* We're done; the query won't actually be run until PerformPortalFetch is
@@ -249,6 +250,17 @@ PerformPortalClose(const char *name)
PortalDrop(portal, false);
}
+/*
+ * Release a portal's QueryDesc.
+ */
+void
+PortalQueryFinish(QueryDesc *queryDesc)
+{
+ ExecutorFinish(queryDesc);
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+}
+
/*
* PortalCleanup
*
@@ -295,9 +307,7 @@ PortalCleanup(Portal portal)
if (portal->resowner)
CurrentResourceOwner = portal->resowner;
- ExecutorFinish(queryDesc);
- ExecutorEnd(queryDesc);
- FreeQueryDesc(queryDesc);
+ PortalQueryFinish(queryDesc);
CurrentResourceOwner = saveResourceOwner;
}
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 18f70319fc..c9070ed97f 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -183,6 +183,7 @@ ExecuteQuery(ParseState *pstate,
paramLI = EvaluateParams(pstate, entry, stmt->params, estate);
}
+replan:
/* Create a new portal to run the query in */
portal = CreateNewPortal();
/* Don't display the portal in pg_cursors, it is for internal use only */
@@ -251,10 +252,19 @@ ExecuteQuery(ParseState *pstate,
}
/*
- * Run the portal as appropriate.
+ * Run the portal as appropriate. If the portal contains a cached plan,
+ * it must be recreated if portal->plan_valid is false, which indicates
+ * that the cached plan was found to have been invalidated while
+ * initializing one of the plan trees contained in it.
*/
PortalStart(portal, paramLI, eflags, GetActiveSnapshot());
+ if (!portal->plan_valid)
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
+
(void) PortalRun(portal, count, false, true, dest, dest, qc);
PortalDrop(portal, false);
@@ -574,7 +584,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
{
PreparedStatement *entry;
const char *query_string;
- CachedPlan *cplan;
+ CachedPlan *cplan = NULL;
List *plan_list;
ListCell *p;
ParamListInfo paramLI = NULL;
@@ -618,6 +628,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
+replan:
cplan = GetCachedPlan(entry->plansource, paramLI,
CurrentResourceOwner, queryEnv);
@@ -639,8 +650,21 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ {
+ QueryDesc *queryDesc;
+
+ queryDesc = ExplainQueryDesc(pstmt, cplan, queryString,
+ into, es, paramLI, queryEnv);
+ if (queryDesc == NULL)
+ {
+ ExplainResetOutput(es);
+ ReleaseCachedPlan(cplan, CurrentResourceOwner);
+ goto replan;
+ }
+ ExplainOnePlan(queryDesc, into, es, query_string, paramLI,
+ queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
+ }
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index b32f419176..fc0d2ca481 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -126,11 +126,32 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
* get control when ExecutorStart is called. Such a plugin would
* normally call standard_ExecutorStart().
*
+ * Normally, the plan tree given in queryDesc->plannedstmt is known to be
+ * valid in that *all* relations contained in plannedstmt->relationOids have
+ * already been locked. That may not be the case however if the plannedstmt
+ * comes from a CachedPlan, one given in queryDesc->cplan, in which case only
+ * some of the relations referenced in the plan would have been locked; to
+ * wit, those that AcquirePlannerLocks() deems necessary. Locks necessary
+ * to fully validate such a plan tree, including relations that are added by
+ * the planner, will be taken when initializing the plan tree in InitPlan();
+ * the caller must have set the EXEC_FLAG_GET_LOCKS bit in eflags. If the
+ * CachedPlan gets invalidated as these locks are taken, plan tree
+ * initialization is suspended at the point when the invalidation is first
+ * detected and InitPlan() returns after setting queryDesc->plan_valid to
+ * false. queryDesc->planstate would be pointing to a potentially partially
+ * initialized PlanState tree in that case. Callers must retry the execution
+ * with a freshly created CachedPlan in that case, after properly freeing the
+ * partially valid QueryDesc.
* ----------------------------------------------------------------
*/
void
ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ /* Take locks if the plan tree comes from a CachedPlan. */
+ Assert(queryDesc->cplan == NULL ||
+ (CachedPlanStillValid(queryDesc->cplan) &&
+ (eflags & EXEC_FLAG_GET_LOCKS) != 0));
+
/*
* In some cases (e.g. an EXECUTE statement) a query execution will skip
* parse analysis, which means that the query_id won't be reported. Note
@@ -582,6 +603,16 @@ ExecCheckPermissions(List *rangeTable, List *rteperminfos,
RTEPermissionInfo *perminfo = lfirst_node(RTEPermissionInfo, l);
Assert(OidIsValid(perminfo->relid));
+
+ /*
+ * Relations whose permissions need to be checked must already have
+ * been locked by the parser or by AcquirePlannerLocks() if a
+ * cached plan is being executed.
+ * XXX shouldn't we skip calling ExecCheckPermissions from InitPlan
+ * in a parallel worker?
+ */
+ Assert(CheckRelLockedByMe(perminfo->relid, AccessShareLock, true) ||
+ IsParallelWorker());
result = ExecCheckOneRelPerms(perminfo);
if (!result)
{
@@ -785,12 +816,19 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
PreventCommandIfParallelMode(CreateCommandName((Node *) plannedstmt));
}
-
/* ----------------------------------------------------------------
* InitPlan
*
* Initializes the query plan: open files, allocate storage
* and start up the rule manager
+ *
+ * If queryDesc contains a CachedPlan, this takes locks on relations.
+ * If any of those relations have undergone concurrent schema changes
+ * between successfully performing RevalidateCachedQuery() on the
+ * containing CachedPlanSource and here, locking those relations would
+ * invalidate the CachedPlan by way of PlanCacheRelCallback(). In that
+ * case, queryDesc->plan_valid would be set to false to tell the caller
+ * to retry after creating a new CachedPlan.
* ----------------------------------------------------------------
*/
static void
@@ -801,20 +839,32 @@ InitPlan(QueryDesc *queryDesc, int eflags)
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
- PlanState *planstate;
+ PlanState *planstate = NULL;
TupleDesc tupType;
ListCell *l;
int i;
/*
- * Do permissions checks
+ * Set up range table in EState.
*/
- ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
+ ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
+
+ /* Make sure ExecPlanStillValid() can work. */
+ estate->es_cachedplan = queryDesc->cplan;
/*
- * initialize the node's execution state
+ * Lock any views that were mentioned in the query if needed. View
+ * relations must be locked separately like this, because they are not
+ * referenced in the plan tree.
*/
- ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
+ ExecLockViewRelations(plannedstmt->viewRelations, estate);
+ if (!ExecPlanStillValid(estate))
+ goto failed;
+
+ /*
+ * Do permissions checks
+ */
+ ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
estate->es_plannedstmt = plannedstmt;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
@@ -849,6 +899,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
case ROW_MARK_KEYSHARE:
case ROW_MARK_REFERENCE:
relation = ExecGetRangeTableRelation(estate, rc->rti);
+ if (!ExecPlanStillValid(estate))
+ goto failed;
break;
case ROW_MARK_COPY:
/* no physical table access is required */
@@ -919,6 +971,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_subplanstates = lappend(estate->es_subplanstates,
subplanstate);
+ if (!ExecPlanStillValid(estate))
+ goto failed;
i++;
}
@@ -929,6 +983,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
* processing tuples.
*/
planstate = ExecInitNode(plan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ goto failed;
/*
* Get the tuple descriptor describing the type of tuples to return.
@@ -972,6 +1028,17 @@ InitPlan(QueryDesc *queryDesc, int eflags)
queryDesc->tupDesc = tupType;
queryDesc->planstate = planstate;
+ queryDesc->plan_valid = true;
+ return;
+
+failed:
+ /*
+ * Plan initialization failed. Mark QueryDesc as such. Note that we do
+ * set planstate, even if it may only be partially initialized, so that
+ * ExecEndPlan() can process it.
+ */
+ queryDesc->planstate = planstate;
+ queryDesc->plan_valid = false;
}
/*
@@ -1389,7 +1456,7 @@ ExecGetAncestorResultRels(EState *estate, ResultRelInfo *resultRelInfo)
/*
* All ancestors up to the root target relation must have been
- * locked by the planner or AcquireExecutorLocks().
+ * locked.
*/
ancRel = table_open(ancOid, NoLock);
rInfo = makeNode(ResultRelInfo);
@@ -2797,7 +2864,8 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
* Child EPQ EStates share the parent's copy of unchanging state such as
* the snapshot, rangetable, and external Param info. They need their own
* copies of local state, including a tuple table, es_param_exec_vals,
- * result-rel info, etc.
+ * result-rel info, etc. Also, we don't pass the parent's copy of the
+ * CachedPlan, because no new locks will be taken for EvalPlanQual().
*/
rcestate->es_direction = ForwardScanDirection;
rcestate->es_snapshot = parentestate->es_snapshot;
@@ -2884,6 +2952,7 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
PlanState *subplanstate;
subplanstate = ExecInitNode(subplan, rcestate, 0);
+ Assert(ExecPlanStillValid(rcestate));
rcestate->es_subplanstates = lappend(rcestate->es_subplanstates,
subplanstate);
}
@@ -2937,6 +3006,10 @@ EvalPlanQualEnd(EPQState *epqstate)
MemoryContext oldcontext;
ListCell *l;
+ /* Nothing to do if EvalPlanQualInit() wasn't done to begin with. */
+ if (epqstate->parentestate == NULL)
+ return;
+
rtsize = epqstate->parentestate->es_range_table_size;
/*
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index aa3f283453..df4cc5ddaf 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1249,8 +1249,13 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(¶mspace);
- /* Create a QueryDesc for the query. */
+ /*
+ * Create a QueryDesc for the query. Note that no CachedPlan is available
+ * here, even though the plan tree may have come from one in the
+ * leader.
+ */
return CreateQueryDesc(pstmt,
+ NULL,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
@@ -1432,6 +1437,7 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Start up the executor */
queryDesc->plannedstmt->jitFlags = fpes->jit_flags;
ExecutorStart(queryDesc, fpes->eflags);
+ Assert(queryDesc->plan_valid);
/* Special executor initialization steps for parallel workers */
queryDesc->planstate->state->es_query_dsa = area;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index fd6ca8a5d9..ae6a974e7a 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1817,6 +1817,8 @@ ExecInitPartitionPruning(PlanState *planstate,
/* Create the working data structure for pruning */
prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Perform an initial partition prune pass, if required.
@@ -1943,6 +1945,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* duration of this executor run.
*/
partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
partkey = RelationGetPartitionKey(partrel);
partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
partrel);
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 4d288bc8d4..6f3c37b6fd 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -388,6 +388,9 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
break;
}
+ if (!ExecPlanStillValid(estate))
+ return result;
+
ExecSetExecProcNode(result, result->ExecProcNode);
/*
@@ -403,6 +406,12 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
Assert(IsA(subplan, SubPlan));
sstate = ExecInitSubPlan(subplan, result);
subps = lappend(subps, sstate);
+ if (!ExecPlanStillValid(estate))
+ {
+ /* Don't lose track of those initialized. */
+ result->initPlan = subps;
+ return result;
+ }
}
result->initPlan = subps;
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 012dbb6965..a485e7dfc5 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -804,7 +804,8 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rte->rtekind == RTE_RELATION);
- if (!IsParallelWorker())
+ if (!IsParallelWorker() &&
+ (estate->es_top_eflags & EXEC_FLAG_GET_LOCKS) == 0)
{
/*
* In a normal query, we should already have the appropriate lock,
@@ -833,6 +834,61 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
return rel;
}
+/*
+ * ExecLockViewRelations
+ * Lock view relations, if any, in a given query
+ */
+void
+ExecLockViewRelations(List *viewRelations, EState *estate)
+{
+ ListCell *lc;
+
+ /* Nothing to do if no locks need to be taken. */
+ if ((estate->es_top_eflags & EXEC_FLAG_GET_LOCKS) == 0)
+ return;
+
+ foreach(lc, viewRelations)
+ {
+ Index rti = lfirst_int(lc);
+ RangeTblEntry *rte = exec_rt_fetch(rti, estate);
+
+ Assert(OidIsValid(rte->relid));
+ Assert(rte->relkind == RELKIND_VIEW);
+ Assert(rte->rellockmode != NoLock);
+
+ LockRelationOid(rte->relid, rte->rellockmode);
+ }
+}
+
+/*
+ * ExecLockAppendNonLeafRelations
+ * Lock non-leaf relations whose children are scanned by a given
+ * Append/MergeAppend node
+ */
+void
+ExecLockAppendNonLeafRelations(EState *estate, List *allpartrelids)
+{
+ ListCell *l;
+
+ /* Nothing to do if no locks need to be taken. */
+ if ((estate->es_top_eflags & EXEC_FLAG_GET_LOCKS) == 0)
+ return;
+
+ foreach(l, allpartrelids)
+ {
+ Bitmapset *partrelids = lfirst_node(Bitmapset, l);
+ int i;
+
+ i = -1;
+ while ((i = bms_next_member(partrelids, i)) > 0)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(i, estate);
+
+ LockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
+}
+
/*
* ExecInitResultRelation
* Open relation given by the passed-in RT index and fill its
@@ -848,6 +904,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
Relation resultRelationDesc;
resultRelationDesc = ExecGetRangeTableRelation(estate, rti);
+ if (!ExecPlanStillValid(estate))
+ return;
InitResultRelInfo(resultRelInfo,
resultRelationDesc,
rti,
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 50e06ec693..f8c9de1fda 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -843,6 +843,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest = None_Receiver;
es->qd = CreateQueryDesc(es->stmt,
+ NULL, /* fmgr_sql() doesn't use CachedPlans */
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -868,6 +869,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
eflags = 0; /* default run-to-completion flags */
ExecutorStart(es->qd, eflags);
+ Assert(es->qd->plan_valid);
}
es->status = F_EXEC_RUN;
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 19342a420c..06e0d7d149 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -3134,15 +3134,18 @@ hashagg_reset_spill_state(AggState *aggstate)
{
HashAggSpill *spill = &aggstate->hash_spills[setno];
- pfree(spill->ntuples);
- pfree(spill->partitions);
+ if (spill->ntuples)
+ pfree(spill->ntuples);
+ if (spill->partitions)
+ pfree(spill->partitions);
}
pfree(aggstate->hash_spills);
aggstate->hash_spills = NULL;
}
/* free batches */
- list_free_deep(aggstate->hash_batches);
+ if (aggstate->hash_batches)
+ list_free_deep(aggstate->hash_batches);
aggstate->hash_batches = NIL;
/* close tape set */
@@ -3296,6 +3299,8 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
eflags &= ~EXEC_FLAG_REWIND;
outerPlan = outerPlan(node);
outerPlanState(aggstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return aggstate;
/*
* initialize source tuple type.
@@ -4336,10 +4341,13 @@ ExecEndAgg(AggState *node)
{
AggStatePerTrans pertrans = &node->pertrans[transno];
- for (setno = 0; setno < numGroupingSets; setno++)
+ if (pertrans)
{
- if (pertrans->sortstates[setno])
- tuplesort_end(pertrans->sortstates[setno]);
+ for (setno = 0; setno < numGroupingSets; setno++)
+ {
+ if (pertrans->sortstates[setno])
+ tuplesort_end(pertrans->sortstates[setno]);
+ }
}
}
@@ -4357,7 +4365,8 @@ ExecEndAgg(AggState *node)
ExecFreeExprContext(&node->ss.ps);
/* clean up tuple table */
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
outerPlan = outerPlanState(node);
ExecEndNode(outerPlan);
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index c185b11c67..091f979c46 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -109,10 +109,11 @@ AppendState *
ExecInitAppend(Append *node, EState *estate, int eflags)
{
AppendState *appendstate = makeNode(AppendState);
- PlanState **appendplanstates;
+ PlanState **appendplanstates = NULL;
Bitmapset *validsubplans;
Bitmapset *asyncplans;
int nplans;
+ int ninited = 0;
int nasyncplans;
int firstvalid;
int i,
@@ -133,6 +134,15 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_syncdone = false;
appendstate->as_begun = false;
+ /*
+ * Lock non-leaf partitions. In the pruning case, some of these locks
+ * will be retaken when the partition is opened for pruning, but it
+ * does not seem worthwhile to spend cycles to filter those out here.
+ */
+ ExecLockAppendNonLeafRelations(estate, node->allpartrelids);
+ if (!ExecPlanStillValid(estate))
+ goto early_exit;
+
/* If run-time partition pruning is enabled, then set that up now */
if (node->part_prune_index >= 0)
{
@@ -148,6 +158,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
node->part_prune_index,
node->apprelids,
&validsubplans);
+ if (!ExecPlanStillValid(estate))
+ goto early_exit;
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
@@ -222,11 +234,12 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
firstvalid = j;
appendplanstates[j++] = ExecInitNode(initNode, estate, eflags);
+ ninited++;
+ if (!ExecPlanStillValid(estate))
+ goto early_exit;
}
appendstate->as_first_partial_plan = firstvalid;
- appendstate->appendplans = appendplanstates;
- appendstate->as_nplans = nplans;
/* Initialize async state */
appendstate->as_asyncplans = asyncplans;
@@ -276,6 +289,10 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
/* For parallel query, this will be overridden later. */
appendstate->choose_next_subplan = choose_next_subplan_locally;
+early_exit:
+ appendstate->appendplans = appendplanstates;
+ appendstate->as_nplans = ninited;
+
return appendstate;
}
diff --git a/src/backend/executor/nodeBitmapAnd.c b/src/backend/executor/nodeBitmapAnd.c
index 4c5eb2b23b..acc6c50e20 100644
--- a/src/backend/executor/nodeBitmapAnd.c
+++ b/src/backend/executor/nodeBitmapAnd.c
@@ -57,6 +57,7 @@ ExecInitBitmapAnd(BitmapAnd *node, EState *estate, int eflags)
BitmapAndState *bitmapandstate = makeNode(BitmapAndState);
PlanState **bitmapplanstates;
int nplans;
+ int ninited = 0;
int i;
ListCell *l;
Plan *initNode;
@@ -77,8 +78,6 @@ ExecInitBitmapAnd(BitmapAnd *node, EState *estate, int eflags)
bitmapandstate->ps.plan = (Plan *) node;
bitmapandstate->ps.state = estate;
bitmapandstate->ps.ExecProcNode = ExecBitmapAnd;
- bitmapandstate->bitmapplans = bitmapplanstates;
- bitmapandstate->nplans = nplans;
/*
* call ExecInitNode on each of the plans to be executed and save the
@@ -89,6 +88,9 @@ ExecInitBitmapAnd(BitmapAnd *node, EState *estate, int eflags)
{
initNode = (Plan *) lfirst(l);
bitmapplanstates[i] = ExecInitNode(initNode, estate, eflags);
+ ninited++;
+ if (!ExecPlanStillValid(estate))
+ goto early_exit;
i++;
}
@@ -99,6 +101,10 @@ ExecInitBitmapAnd(BitmapAnd *node, EState *estate, int eflags)
* ExecQual or ExecProject. They don't need any tuple slots either.
*/
+early_exit:
+ bitmapandstate->bitmapplans = bitmapplanstates;
+ bitmapandstate->nplans = ninited;
+
return bitmapandstate;
}
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index f35df0b8bf..e6a689eefb 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -665,7 +665,8 @@ ExecEndBitmapHeapScan(BitmapHeapScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/*
* close down subplans
@@ -693,7 +694,8 @@ ExecEndBitmapHeapScan(BitmapHeapScanState *node)
/*
* close heap scan
*/
- table_endscan(scanDesc);
+ if (scanDesc)
+ table_endscan(scanDesc);
}
/* ----------------------------------------------------------------
@@ -763,11 +765,15 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return scanstate;
/*
* initialize child nodes
*/
outerPlanState(scanstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return scanstate;
/*
* get the scan type from the relation descriptor.
diff --git a/src/backend/executor/nodeBitmapIndexscan.c b/src/backend/executor/nodeBitmapIndexscan.c
index 83ec9ede89..cc8332ef68 100644
--- a/src/backend/executor/nodeBitmapIndexscan.c
+++ b/src/backend/executor/nodeBitmapIndexscan.c
@@ -263,6 +263,8 @@ ExecInitBitmapIndexScan(BitmapIndexScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
indexstate->biss_RelationDesc = index_open(node->indexid, lockmode);
+ if (!ExecPlanStillValid(estate))
+ return indexstate;
/*
* Initialize index-specific scan state
diff --git a/src/backend/executor/nodeBitmapOr.c b/src/backend/executor/nodeBitmapOr.c
index 0bf8af9652..babad1b4b2 100644
--- a/src/backend/executor/nodeBitmapOr.c
+++ b/src/backend/executor/nodeBitmapOr.c
@@ -58,6 +58,7 @@ ExecInitBitmapOr(BitmapOr *node, EState *estate, int eflags)
BitmapOrState *bitmaporstate = makeNode(BitmapOrState);
PlanState **bitmapplanstates;
int nplans;
+ int ninited = 0;
int i;
ListCell *l;
Plan *initNode;
@@ -78,8 +79,6 @@ ExecInitBitmapOr(BitmapOr *node, EState *estate, int eflags)
bitmaporstate->ps.plan = (Plan *) node;
bitmaporstate->ps.state = estate;
bitmaporstate->ps.ExecProcNode = ExecBitmapOr;
- bitmaporstate->bitmapplans = bitmapplanstates;
- bitmaporstate->nplans = nplans;
/*
* call ExecInitNode on each of the plans to be executed and save the
@@ -90,6 +89,9 @@ ExecInitBitmapOr(BitmapOr *node, EState *estate, int eflags)
{
initNode = (Plan *) lfirst(l);
bitmapplanstates[i] = ExecInitNode(initNode, estate, eflags);
+ ninited++;
+ if (!ExecPlanStillValid(estate))
+ goto early_exit;
i++;
}
@@ -100,6 +102,10 @@ ExecInitBitmapOr(BitmapOr *node, EState *estate, int eflags)
* ExecQual or ExecProject. They don't need any tuple slots either.
*/
+early_exit:
+ bitmaporstate->bitmapplans = bitmapplanstates;
+ bitmaporstate->nplans = ninited;
+
return bitmaporstate;
}
diff --git a/src/backend/executor/nodeCtescan.c b/src/backend/executor/nodeCtescan.c
index cc4c4243e2..eed5b75a4f 100644
--- a/src/backend/executor/nodeCtescan.c
+++ b/src/backend/executor/nodeCtescan.c
@@ -297,14 +297,16 @@ ExecEndCteScan(CteScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/*
* If I am the leader, free the tuplestore.
*/
if (node->leader == node)
{
- tuplestore_end(node->cte_table);
+ if (node->cte_table)
+ tuplestore_end(node->cte_table);
node->cte_table = NULL;
}
}
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
index bd42c65b29..b03499fae5 100644
--- a/src/backend/executor/nodeCustom.c
+++ b/src/backend/executor/nodeCustom.c
@@ -61,6 +61,8 @@ ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
if (scanrelid > 0)
{
scan_rel = ExecOpenScanRelation(estate, scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return css;
css->ss.ss_currentRelation = scan_rel;
}
@@ -127,6 +129,10 @@ ExecCustomScan(PlanState *pstate)
void
ExecEndCustomScan(CustomScanState *node)
{
+ /*
+ * XXX - BeginCustomScan() may not have been called if ExecInitCustomScan()
+ * took the early-exit path.
+ */
Assert(node->methods->EndCustomScan != NULL);
node->methods->EndCustomScan(node);
@@ -134,8 +140,10 @@ ExecEndCustomScan(CustomScanState *node)
ExecFreeExprContext(&node->ss.ps);
/* Clean out the tuple table */
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ps.ps_ResultTupleSlot)
+ ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
void
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index c2139acca0..d3f0a65485 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -173,6 +173,8 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
if (scanrelid > 0)
{
currentRelation = ExecOpenScanRelation(estate, scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return scanstate;
scanstate->ss.ss_currentRelation = currentRelation;
fdwroutine = GetFdwRoutineForRelation(currentRelation, true);
}
@@ -264,6 +266,8 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
if (outerPlan(node))
outerPlanState(scanstate) =
ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return scanstate;
/*
* Tell the FDW to initialize the scan.
@@ -300,14 +304,17 @@ ExecEndForeignScan(ForeignScanState *node)
ForeignScan *plan = (ForeignScan *) node->ss.ps.plan;
EState *estate = node->ss.ps.state;
- /* Let the FDW shut down */
- if (plan->operation != CMD_SELECT)
+ /* Let the FDW shut down, but only if it actually got started. */
+ if (node->fdw_state)
{
- if (estate->es_epq_active == NULL)
- node->fdwroutine->EndDirectModify(node);
+ if (plan->operation != CMD_SELECT)
+ {
+ if (estate->es_epq_active == NULL)
+ node->fdwroutine->EndDirectModify(node);
+ }
+ else
+ node->fdwroutine->EndForeignScan(node);
}
- else
- node->fdwroutine->EndForeignScan(node);
/* Shut down any outer plan. */
if (outerPlanState(node))
@@ -319,7 +326,8 @@ ExecEndForeignScan(ForeignScanState *node)
/* clean out the tuple table */
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeFunctionscan.c b/src/backend/executor/nodeFunctionscan.c
index dd06ef8aee..792ecda4a9 100644
--- a/src/backend/executor/nodeFunctionscan.c
+++ b/src/backend/executor/nodeFunctionscan.c
@@ -533,7 +533,8 @@ ExecEndFunctionScan(FunctionScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/*
* Release slots and tuplestore resources
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 307fc10eea..365d3af3e4 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -89,6 +89,8 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
*/
outerNode = outerPlan(node);
outerPlanState(gatherstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return gatherstate;
tupDesc = ExecGetResultType(outerPlanState(gatherstate));
/*
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 9d5e1a46e9..8d2809f079 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -108,6 +108,8 @@ ExecInitGatherMerge(GatherMerge *node, EState *estate, int eflags)
*/
outerNode = outerPlan(node);
outerPlanState(gm_state) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return gm_state;
/*
* Leader may access ExecProcNode result directly (if
diff --git a/src/backend/executor/nodeGroup.c b/src/backend/executor/nodeGroup.c
index 25a1618952..e0832bb778 100644
--- a/src/backend/executor/nodeGroup.c
+++ b/src/backend/executor/nodeGroup.c
@@ -185,6 +185,8 @@ ExecInitGroup(Group *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(grpstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return grpstate;
/*
* Initialize scan slot and type.
@@ -231,7 +233,8 @@ ExecEndGroup(GroupState *node)
ExecFreeExprContext(&node->ss.ps);
/* clean up tuple table */
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
outerPlan = outerPlanState(node);
ExecEndNode(outerPlan);
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index eceee99374..6afc04edf1 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -379,6 +379,8 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(hashstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return hashstate;
/*
* initialize our result slot and type. No need to build projection
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index b215e3f59a..9efb238d1c 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -659,8 +659,12 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
hashNode = (Hash *) innerPlan(node);
outerPlanState(hjstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return hjstate;
outerDesc = ExecGetResultType(outerPlanState(hjstate));
innerPlanState(hjstate) = ExecInitNode((Plan *) hashNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return hjstate;
innerDesc = ExecGetResultType(innerPlanState(hjstate));
/*
@@ -781,9 +785,12 @@ ExecEndHashJoin(HashJoinState *node)
/*
* clean out the tuple table
*/
- ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->hj_OuterTupleSlot);
- ExecClearTuple(node->hj_HashTupleSlot);
+ if (node->js.ps.ps_ResultTupleSlot)
+ ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
+ if (node->hj_OuterTupleSlot)
+ ExecClearTuple(node->hj_OuterTupleSlot);
+ if (node->hj_HashTupleSlot)
+ ExecClearTuple(node->hj_HashTupleSlot);
/*
* clean up subtrees
diff --git a/src/backend/executor/nodeIncrementalSort.c b/src/backend/executor/nodeIncrementalSort.c
index 12bc22f33c..6b2da56044 100644
--- a/src/backend/executor/nodeIncrementalSort.c
+++ b/src/backend/executor/nodeIncrementalSort.c
@@ -1041,6 +1041,8 @@ ExecInitIncrementalSort(IncrementalSort *node, EState *estate, int eflags)
* nodes may be able to do something more useful.
*/
outerPlanState(incrsortstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return incrsortstate;
/*
* Initialize scan slot and type.
@@ -1080,12 +1082,16 @@ ExecEndIncrementalSort(IncrementalSortState *node)
SO_printf("ExecEndIncrementalSort: shutting down sort node\n");
/* clean out the scan tuple */
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/* must drop pointer to sort result tuple */
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+ if (node->ss.ps.ps_ResultTupleSlot)
+ ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
/* must drop standalone tuple slots from outer node */
- ExecDropSingleTupleTableSlot(node->group_pivot);
- ExecDropSingleTupleTableSlot(node->transfer_tuple);
+ if (node->group_pivot)
+ ExecDropSingleTupleTableSlot(node->group_pivot);
+ if (node->transfer_tuple)
+ ExecDropSingleTupleTableSlot(node->transfer_tuple);
/*
* Release tuplesort resources.
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 0b43a9b969..b60a086464 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -394,7 +394,8 @@ ExecEndIndexOnlyScan(IndexOnlyScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/*
* close the index relation (no-op if we didn't open it)
@@ -512,6 +513,8 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return indexstate;
indexstate->ss.ss_currentRelation = currentRelation;
indexstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
@@ -565,6 +568,8 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
indexstate->ioss_RelationDesc = index_open(node->indexid, lockmode);
+ if (!ExecPlanStillValid(estate))
+ return indexstate;
/*
* Initialize index-specific scan state
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 4540c7781d..628c233919 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -808,7 +808,8 @@ ExecEndIndexScan(IndexScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/*
* close the index relation (no-op if we didn't open it)
@@ -925,6 +926,8 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return indexstate;
indexstate->ss.ss_currentRelation = currentRelation;
indexstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
@@ -970,6 +973,8 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
indexstate->iss_RelationDesc = index_open(node->indexid, lockmode);
+ if (!ExecPlanStillValid(estate))
+ return indexstate;
/*
* Initialize index-specific scan state
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index 425fbfc405..2fcbde74ed 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -476,6 +476,8 @@ ExecInitLimit(Limit *node, EState *estate, int eflags)
*/
outerPlan = outerPlan(node);
outerPlanState(limitstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return limitstate;
/*
* initialize child expressions
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 407414fc0c..3a8aa2b5a4 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -323,6 +323,8 @@ ExecInitLockRows(LockRows *node, EState *estate, int eflags)
* then initialize outer plan
*/
outerPlanState(lrstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return lrstate;
/* node returns unmodified slots from the outer plan */
lrstate->ps.resultopsset = true;
diff --git a/src/backend/executor/nodeMaterial.c b/src/backend/executor/nodeMaterial.c
index 09632678b0..f146ebb1d7 100644
--- a/src/backend/executor/nodeMaterial.c
+++ b/src/backend/executor/nodeMaterial.c
@@ -214,6 +214,8 @@ ExecInitMaterial(Material *node, EState *estate, int eflags)
outerPlan = outerPlan(node);
outerPlanState(matstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return matstate;
/*
* Initialize result type and slot. No need to initialize projection info
@@ -242,7 +244,8 @@ ExecEndMaterial(MaterialState *node)
/*
* clean out the tuple table
*/
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/*
* Release tuplestore resources
diff --git a/src/backend/executor/nodeMemoize.c b/src/backend/executor/nodeMemoize.c
index 74f7d21bc8..a6df43ba19 100644
--- a/src/backend/executor/nodeMemoize.c
+++ b/src/backend/executor/nodeMemoize.c
@@ -931,6 +931,8 @@ ExecInitMemoize(Memoize *node, EState *estate, int eflags)
outerNode = outerPlan(node);
outerPlanState(mstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return mstate;
/*
* Initialize return slot and type. No need to initialize projection info
@@ -1036,6 +1038,7 @@ ExecEndMemoize(MemoizeState *node)
{
#ifdef USE_ASSERT_CHECKING
/* Validate the memory accounting code is correct in assert builds. */
+ if (node->hashtable)
{
int count;
uint64 mem = 0;
@@ -1082,11 +1085,14 @@ ExecEndMemoize(MemoizeState *node)
}
/* Remove the cache context */
- MemoryContextDelete(node->tableContext);
+ if (node->tableContext)
+ MemoryContextDelete(node->tableContext);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/* must drop pointer to cache result tuple */
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+ if (node->ss.ps.ps_ResultTupleSlot)
+ ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
/*
* free exprcontext
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 399b39c598..40bba35499 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -65,9 +65,10 @@ MergeAppendState *
ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
{
MergeAppendState *mergestate = makeNode(MergeAppendState);
- PlanState **mergeplanstates;
+ PlanState **mergeplanstates = NULL;
Bitmapset *validsubplans;
int nplans;
+ int ninited = 0;
int i,
j;
@@ -81,6 +82,15 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.state = estate;
mergestate->ps.ExecProcNode = ExecMergeAppend;
+ /*
+ * Lock non-leaf partitions. In the pruning case, some of these locks
+ * will be retaken when the partition is opened for pruning, but it does
+ * not seem worthwhile to spend cycles filtering those out here.
+ */
+ ExecLockAppendNonLeafRelations(estate, node->allpartrelids);
+ if (!ExecPlanStillValid(estate))
+ goto early_exit;
+
/* If run-time partition pruning is enabled, then set that up now */
if (node->part_prune_index >= 0)
{
@@ -96,6 +106,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
node->part_prune_index,
node->apprelids,
&validsubplans);
+ if (!ExecPlanStillValid(estate))
+ goto early_exit;
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
@@ -122,8 +134,6 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
}
mergeplanstates = (PlanState **) palloc(nplans * sizeof(PlanState *));
- mergestate->mergeplans = mergeplanstates;
- mergestate->ms_nplans = nplans;
mergestate->ms_slots = (TupleTableSlot **) palloc0(sizeof(TupleTableSlot *) * nplans);
mergestate->ms_heap = binaryheap_allocate(nplans, heap_compare_slots,
@@ -152,6 +162,9 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
Plan *initNode = (Plan *) list_nth(node->mergeplans, i);
mergeplanstates[j++] = ExecInitNode(initNode, estate, eflags);
+ ninited++;
+ if (!ExecPlanStillValid(estate))
+ goto early_exit;
}
mergestate->ps.ps_ProjInfo = NULL;
@@ -188,6 +201,10 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
*/
mergestate->ms_initialized = false;
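+
+ /* As in ExecInitAppend, count only the subplans initialized so far. */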
+early_exit:
+ mergestate->mergeplans = mergeplanstates;
+ mergestate->ms_nplans = ninited;
+
return mergestate;
}
diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 809aa215c6..968be05568 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -1482,11 +1482,15 @@ ExecInitMergeJoin(MergeJoin *node, EState *estate, int eflags)
mergestate->mj_SkipMarkRestore = node->skip_mark_restore;
outerPlanState(mergestate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return mergestate;
outerDesc = ExecGetResultType(outerPlanState(mergestate));
innerPlanState(mergestate) = ExecInitNode(innerPlan(node), estate,
mergestate->mj_SkipMarkRestore ?
eflags :
(eflags | EXEC_FLAG_MARK));
+ if (!ExecPlanStillValid(estate))
+ return mergestate;
innerDesc = ExecGetResultType(innerPlanState(mergestate));
/*
@@ -1642,8 +1646,10 @@ ExecEndMergeJoin(MergeJoinState *node)
/*
* clean out the tuple table
*/
- ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->mj_MarkedTupleSlot);
+ if (node->js.ps.ps_ResultTupleSlot)
+ ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
+ if (node->mj_MarkedTupleSlot)
+ ExecClearTuple(node->mj_MarkedTupleSlot);
/*
* shut down the subplans
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 3fa2b930a5..7cdbe7f5f5 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -3919,6 +3919,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
Plan *subplan = outerPlan(node);
CmdType operation = node->operation;
int nrels = list_length(node->resultRelations);
+ int ninited = 0;
ResultRelInfo *resultRelInfo;
List *arowmarks;
ListCell *l;
@@ -3940,7 +3941,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
mtstate->canSetTag = node->canSetTag;
mtstate->mt_done = false;
- mtstate->mt_nrels = nrels;
mtstate->resultRelInfo = (ResultRelInfo *)
palloc(nrels * sizeof(ResultRelInfo));
@@ -3975,6 +3975,9 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
linitial_int(node->resultRelations));
}
+ if (!ExecPlanStillValid(estate))
+ goto early_exit;
+
/* set up epqstate with dummy subplan data for the moment */
EvalPlanQualInit(&mtstate->mt_epqstate, estate, NULL, NIL, node->epqParam);
mtstate->fireBSTriggers = true;
@@ -4001,6 +4004,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
if (resultRelInfo != mtstate->rootResultRelInfo)
{
ExecInitResultRelation(estate, resultRelInfo, resultRelation);
+ if (!ExecPlanStillValid(estate))
+ goto early_exit;
/*
* For child result relations, store the root result relation
@@ -4028,11 +4033,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* Now we may initialize the subplan.
*/
outerPlanState(mtstate) = ExecInitNode(subplan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ goto early_exit;
/*
* Do additional per-result-relation initialization.
*/
- for (i = 0; i < nrels; i++)
+ for (i = 0; i < nrels; i++, ninited++)
{
resultRelInfo = &mtstate->resultRelInfo[i];
@@ -4381,6 +4388,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
estate->es_auxmodifytables = lcons(mtstate,
estate->es_auxmodifytables);
+early_exit:
+ mtstate->mt_nrels = ninited;
return mtstate;
}
diff --git a/src/backend/executor/nodeNamedtuplestorescan.c b/src/backend/executor/nodeNamedtuplestorescan.c
index 46832ad82f..1f92c43d3b 100644
--- a/src/backend/executor/nodeNamedtuplestorescan.c
+++ b/src/backend/executor/nodeNamedtuplestorescan.c
@@ -174,7 +174,8 @@ ExecEndNamedTuplestoreScan(NamedTuplestoreScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index b3d52e69ec..deda0c2559 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -295,11 +295,15 @@ ExecInitNestLoop(NestLoop *node, EState *estate, int eflags)
* values.
*/
outerPlanState(nlstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return nlstate;
if (node->nestParams == NIL)
eflags |= EXEC_FLAG_REWIND;
else
eflags &= ~EXEC_FLAG_REWIND;
innerPlanState(nlstate) = ExecInitNode(innerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return nlstate;
/*
* Initialize result slot, type and projection.
@@ -372,7 +376,8 @@ ExecEndNestLoop(NestLoopState *node)
/*
* clean out the tuple table
*/
- ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
+ if (node->js.ps.ps_ResultTupleSlot)
+ ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
/*
* close down subplans
diff --git a/src/backend/executor/nodeProjectSet.c b/src/backend/executor/nodeProjectSet.c
index f6ff3dc44c..85d20c4680 100644
--- a/src/backend/executor/nodeProjectSet.c
+++ b/src/backend/executor/nodeProjectSet.c
@@ -247,6 +247,8 @@ ExecInitProjectSet(ProjectSet *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(state) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return state;
/*
* we don't use inner plan
@@ -328,7 +330,8 @@ ExecEndProjectSet(ProjectSetState *node)
/*
* clean out the tuple table
*/
- ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ if (node->ps.ps_ResultTupleSlot)
+ ExecClearTuple(node->ps.ps_ResultTupleSlot);
/*
* shut down subplans
diff --git a/src/backend/executor/nodeRecursiveunion.c b/src/backend/executor/nodeRecursiveunion.c
index e781003934..967fe4f287 100644
--- a/src/backend/executor/nodeRecursiveunion.c
+++ b/src/backend/executor/nodeRecursiveunion.c
@@ -244,7 +244,11 @@ ExecInitRecursiveUnion(RecursiveUnion *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(rustate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return rustate;
innerPlanState(rustate) = ExecInitNode(innerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return rustate;
/*
* If hashing, precompute fmgr lookup data for inner loop, and create the
diff --git a/src/backend/executor/nodeResult.c b/src/backend/executor/nodeResult.c
index 4219712d30..c549b684a3 100644
--- a/src/backend/executor/nodeResult.c
+++ b/src/backend/executor/nodeResult.c
@@ -208,6 +208,8 @@ ExecInitResult(Result *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(resstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return resstate;
/*
* we don't use inner plan
@@ -248,7 +250,8 @@ ExecEndResult(ResultState *node)
/*
* clean out the tuple table
*/
- ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ if (node->ps.ps_ResultTupleSlot)
+ ExecClearTuple(node->ps.ps_ResultTupleSlot);
/*
* shut down subplans
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index d7e22b1dbb..b3bc9b1f77 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -125,6 +125,8 @@ ExecInitSampleScan(SampleScan *node, EState *estate, int eflags)
ExecOpenScanRelation(estate,
node->scan.scanrelid,
eflags);
+ if (!ExecPlanStillValid(estate))
+ return scanstate;
/* we won't set up the HeapScanDesc till later */
scanstate->ss.ss_currentScanDesc = NULL;
@@ -198,7 +200,8 @@ ExecEndSampleScan(SampleScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/*
* close heap scan
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 4da0f28f7b..e7ca19ee4e 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -153,6 +153,8 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
ExecOpenScanRelation(estate,
node->scan.scanrelid,
eflags);
+ if (!ExecPlanStillValid(estate))
+ return scanstate;
/* and create slot with the appropriate rowtype */
ExecInitScanTupleSlot(estate, &scanstate->ss,
@@ -200,7 +202,8 @@ ExecEndSeqScan(SeqScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/*
* close heap scan
diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index 4bc2406b89..95950a5c20 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -528,6 +528,8 @@ ExecInitSetOp(SetOp *node, EState *estate, int eflags)
if (node->strategy == SETOP_HASHED)
eflags &= ~EXEC_FLAG_REWIND;
outerPlanState(setopstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return setopstate;
outerDesc = ExecGetResultType(outerPlanState(setopstate));
/*
@@ -583,7 +585,8 @@ void
ExecEndSetOp(SetOpState *node)
{
/* clean up tuple table */
- ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ if (node->ps.ps_ResultTupleSlot)
+ ExecClearTuple(node->ps.ps_ResultTupleSlot);
/* free subsidiary stuff including hashtable */
if (node->tableContext)
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index c6c72c6e67..89fef86aba 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -263,6 +263,8 @@ ExecInitSort(Sort *node, EState *estate, int eflags)
eflags &= ~(EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK);
outerPlanState(sortstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return sortstate;
/*
* Initialize scan slot and type.
@@ -306,9 +308,11 @@ ExecEndSort(SortState *node)
/*
* clean out the tuple table
*/
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/* must drop pointer to sort result tuple */
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+ if (node->ss.ps.ps_ResultTupleSlot)
+ ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
/*
* Release tuplesort resources
diff --git a/src/backend/executor/nodeSubqueryscan.c b/src/backend/executor/nodeSubqueryscan.c
index 42471bfc04..9b8cddc89f 100644
--- a/src/backend/executor/nodeSubqueryscan.c
+++ b/src/backend/executor/nodeSubqueryscan.c
@@ -124,6 +124,8 @@ ExecInitSubqueryScan(SubqueryScan *node, EState *estate, int eflags)
* initialize subquery
*/
subquerystate->subplan = ExecInitNode(node->subplan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return subquerystate;
/*
* Initialize scan slot and type (needed by ExecAssignScanProjectionInfo)
@@ -177,7 +179,8 @@ ExecEndSubqueryScan(SubqueryScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/*
* close down subquery
diff --git a/src/backend/executor/nodeTableFuncscan.c b/src/backend/executor/nodeTableFuncscan.c
index 0c6c912778..d7536953f1 100644
--- a/src/backend/executor/nodeTableFuncscan.c
+++ b/src/backend/executor/nodeTableFuncscan.c
@@ -223,7 +223,8 @@ ExecEndTableFuncScan(TableFuncScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/*
* Release tuplestore resources
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 2124c55ef5..1ae451d7a6 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -342,7 +342,8 @@ ExecEndTidRangeScan(TidRangeScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
/* ----------------------------------------------------------------
@@ -386,6 +387,8 @@ ExecInitTidRangeScan(TidRangeScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return tidrangestate;
tidrangestate->ss.ss_currentRelation = currentRelation;
tidrangestate->ss.ss_currentScanDesc = NULL; /* no table scan here */
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index 862bd0330b..9fe76b1c60 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -483,7 +483,8 @@ ExecEndTidScan(TidScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
/* ----------------------------------------------------------------
@@ -529,6 +530,8 @@ ExecInitTidScan(TidScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return tidstate;
tidstate->ss.ss_currentRelation = currentRelation;
tidstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
diff --git a/src/backend/executor/nodeUnique.c b/src/backend/executor/nodeUnique.c
index 45035d74fa..69f23b02c6 100644
--- a/src/backend/executor/nodeUnique.c
+++ b/src/backend/executor/nodeUnique.c
@@ -136,6 +136,8 @@ ExecInitUnique(Unique *node, EState *estate, int eflags)
* then initialize outer plan
*/
outerPlanState(uniquestate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return uniquestate;
/*
* Initialize result slot and type. Unique nodes do no projections, so
@@ -169,7 +171,8 @@ void
ExecEndUnique(UniqueState *node)
{
/* clean up tuple table */
- ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ if (node->ps.ps_ResultTupleSlot)
+ ExecClearTuple(node->ps.ps_ResultTupleSlot);
ExecFreeExprContext(&node->ps);
diff --git a/src/backend/executor/nodeValuesscan.c b/src/backend/executor/nodeValuesscan.c
index 32ace63017..f5dedbab63 100644
--- a/src/backend/executor/nodeValuesscan.c
+++ b/src/backend/executor/nodeValuesscan.c
@@ -340,7 +340,8 @@ ExecEndValuesScan(ValuesScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 7c07fb0684..616bb97675 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -1334,7 +1334,7 @@ release_partition(WindowAggState *winstate)
WindowStatePerFunc perfuncstate = &(winstate->perfunc[i]);
/* Release any partition-local state of this window function */
- if (perfuncstate->winobj)
+ if (winstate->perfunc && perfuncstate->winobj)
perfuncstate->winobj->localmem = NULL;
}
@@ -1344,12 +1344,17 @@ release_partition(WindowAggState *winstate)
* any aggregate temp data). We don't rely on retail pfree because some
* aggregates might have allocated data we don't have direct pointers to.
*/
- MemoryContextResetAndDeleteChildren(winstate->partcontext);
- MemoryContextResetAndDeleteChildren(winstate->aggcontext);
- for (i = 0; i < winstate->numaggs; i++)
+ if (winstate->partcontext)
+ MemoryContextResetAndDeleteChildren(winstate->partcontext);
+ if (winstate->aggcontext)
+ MemoryContextResetAndDeleteChildren(winstate->aggcontext);
+ if (winstate->peragg)
{
- if (winstate->peragg[i].aggcontext != winstate->aggcontext)
- MemoryContextResetAndDeleteChildren(winstate->peragg[i].aggcontext);
+ for (i = 0; i < winstate->numaggs; i++)
+ {
+ if (winstate->peragg[i].aggcontext != winstate->aggcontext)
+ MemoryContextResetAndDeleteChildren(winstate->peragg[i].aggcontext);
+ }
}
if (winstate->buffer)
@@ -2451,6 +2456,8 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags)
*/
outerPlan = outerPlan(node);
outerPlanState(winstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return winstate;
/*
* initialize source tuple type (which is also the tuple type that we'll
@@ -2679,11 +2686,16 @@ ExecEndWindowAgg(WindowAggState *node)
release_partition(node);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
- ExecClearTuple(node->first_part_slot);
- ExecClearTuple(node->agg_row_slot);
- ExecClearTuple(node->temp_slot_1);
- ExecClearTuple(node->temp_slot_2);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->first_part_slot)
+ ExecClearTuple(node->first_part_slot);
+ if (node->agg_row_slot)
+ ExecClearTuple(node->agg_row_slot);
+ if (node->temp_slot_1)
+ ExecClearTuple(node->temp_slot_1);
+ if (node->temp_slot_2)
+ ExecClearTuple(node->temp_slot_2);
if (node->framehead_slot)
ExecClearTuple(node->framehead_slot);
if (node->frametail_slot)
@@ -2696,16 +2708,23 @@ ExecEndWindowAgg(WindowAggState *node)
node->ss.ps.ps_ExprContext = node->tmpcontext;
ExecFreeExprContext(&node->ss.ps);
- for (i = 0; i < node->numaggs; i++)
+ if (node->peragg)
{
- if (node->peragg[i].aggcontext != node->aggcontext)
- MemoryContextDelete(node->peragg[i].aggcontext);
+ for (i = 0; i < node->numaggs; i++)
+ {
+ if (node->peragg[i].aggcontext != node->aggcontext)
+ MemoryContextDelete(node->peragg[i].aggcontext);
+ }
}
- MemoryContextDelete(node->partcontext);
- MemoryContextDelete(node->aggcontext);
+ if (node->partcontext)
+ MemoryContextDelete(node->partcontext);
+ if (node->aggcontext)
+ MemoryContextDelete(node->aggcontext);
- pfree(node->perfunc);
- pfree(node->peragg);
+ if (node->perfunc)
+ pfree(node->perfunc);
+ if (node->peragg)
+ pfree(node->peragg);
outerPlan = outerPlanState(node);
ExecEndNode(outerPlan);
diff --git a/src/backend/executor/nodeWorktablescan.c b/src/backend/executor/nodeWorktablescan.c
index 0c13448236..d70c6afde3 100644
--- a/src/backend/executor/nodeWorktablescan.c
+++ b/src/backend/executor/nodeWorktablescan.c
@@ -200,7 +200,8 @@ ExecEndWorkTableScan(WorkTableScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index e3a170c38b..26a9ea342a 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -71,7 +71,7 @@ static int _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
static ParamListInfo _SPI_convert_params(int nargs, Oid *argtypes,
Datum *Values, const char *Nulls);
-static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount);
+static int _SPI_pquery(QueryDesc *queryDesc, uint64 tcount);
static void _SPI_error_callback(void *arg);
@@ -1623,6 +1623,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
_SPI_current->processed = 0;
_SPI_current->tuptable = NULL;
+replan:
/* Create the portal */
if (name == NULL || name[0] == '\0')
{
@@ -1766,7 +1767,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
}
/*
- * Start portal execution.
+ * Start portal execution. If the portal contains a cached plan, it must
+ * be recreated if portal->plan_valid comes back false, which indicates
+ * that the cached plan was invalidated while initializing one of the
+ * plan trees it contains.
*/
PortalStart(portal, paramLI, 0, snapshot);
@@ -1775,6 +1779,12 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
/* Pop the error context stack */
error_context_stack = spierrcontext.previous;
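+ /*
+ * If the plan in the portal was invalidated during PortalStart(), drop
+ * the portal and retry from scratch with a freshly built plan.
+ */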
+ if (!portal->plan_valid)
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
+
/* Pop the SPI stack */
_SPI_end_call(true);
@@ -2552,6 +2562,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
+replan:
cplan = GetCachedPlan(plansource, options->params,
plan_owner, _SPI_current->queryEnv);
@@ -2661,6 +2672,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
QueryDesc *qdesc;
Snapshot snap;
+ int eflags;
if (ActiveSnapshotSet())
snap = GetActiveSnapshot();
@@ -2668,14 +2680,36 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
snap = InvalidSnapshot;
qdesc = CreateQueryDesc(stmt,
+ cplan,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
options->params,
_SPI_current->queryEnv,
0);
- res = _SPI_pquery(qdesc, fire_triggers,
- canSetTag ? options->tcount : 0);
+
+ /* Select execution options */
+ if (fire_triggers)
+ eflags = 0; /* default run-to-completion flags */
+ else
+ eflags = EXEC_FLAG_SKIP_TRIGGERS;
+
+ /* Take locks if using a CachedPlan */
+ if (qdesc->cplan)
+ eflags |= EXEC_FLAG_GET_LOCKS;
+
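+ /*
+ * Start the executor; if the cached plan is invalidated during
+ * ExecutorStart(), release everything and loop back to replan.
+ */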
+ ExecutorStart(qdesc, eflags);
+ if (!qdesc->plan_valid)
+ {
+ ExecutorFinish(qdesc);
+ ExecutorEnd(qdesc);
+ FreeQueryDesc(qdesc);
+ Assert(cplan);
+ ReleaseCachedPlan(cplan, plan_owner);
+ goto replan;
+ }
+
+ res = _SPI_pquery(qdesc, canSetTag ? options->tcount : 0);
FreeQueryDesc(qdesc);
}
else
@@ -2850,10 +2884,9 @@ _SPI_convert_params(int nargs, Oid *argtypes,
}
static int
-_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
+_SPI_pquery(QueryDesc *queryDesc, uint64 tcount)
{
int operation = queryDesc->operation;
- int eflags;
int res;
switch (operation)
@@ -2897,14 +2930,6 @@ _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
ResetUsage();
#endif
- /* Select execution options */
- if (fire_triggers)
- eflags = 0; /* default run-to-completion flags */
- else
- eflags = EXEC_FLAG_SKIP_TRIGGERS;
-
- ExecutorStart(queryDesc, eflags);
-
ExecutorRun(queryDesc, ForwardScanDirection, tcount, true);
_SPI_current->processed = queryDesc->estate->es_processed;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index ba00b99249..955286513d 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -513,6 +513,7 @@ _outRangeTblEntry(StringInfo str, const RangeTblEntry *node)
WRITE_BOOL_FIELD(security_barrier);
/* we re-use these RELATION fields, too: */
WRITE_OID_FIELD(relid);
+ WRITE_CHAR_FIELD(relkind);
WRITE_INT_FIELD(rellockmode);
WRITE_UINT_FIELD(perminfoindex);
break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index f3629cdfd1..3bc5a6dca0 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -480,6 +480,7 @@ _readRangeTblEntry(void)
READ_BOOL_FIELD(security_barrier);
/* we re-use these RELATION fields, too: */
READ_OID_FIELD(relid);
+ READ_CHAR_FIELD(relkind);
READ_INT_FIELD(rellockmode);
READ_UINT_FIELD(perminfoindex);
break;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 62b3ec96cc..5f3ffd98af 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -527,6 +527,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->partPruneInfos = glob->partPruneInfos;
result->rtable = glob->finalrtable;
result->permInfos = glob->finalrteperminfos;
+ result->viewRelations = glob->viewRelations;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 5cc8366af6..f13240bf33 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -16,6 +16,7 @@
#include "postgres.h"
#include "access/transam.h"
+#include "catalog/pg_class.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -604,6 +605,10 @@ add_rte_to_flat_rtable(PlannerGlobal *glob, List *rteperminfos,
(newrte->rtekind == RTE_SUBQUERY && OidIsValid(newrte->relid)))
glob->relationOids = lappend_oid(glob->relationOids, newrte->relid);
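+ /*
+ * Remember the range table index of each view RTE, so that the views
+ * can be locked before executing a cached plan that depends on them.
+ */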
+ if (newrte->relkind == RELKIND_VIEW)
+ glob->viewRelations = lappend_int(glob->viewRelations,
+ list_length(glob->finalrtable));
+
/*
* Add a copy of the RTEPermissionInfo, if any, corresponding to this RTE
* to the flattened global list.
diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index 980dc1816f..1631c8b993 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -1849,11 +1849,10 @@ ApplyRetrieveRule(Query *parsetree,
/*
* Clear fields that should not be set in a subquery RTE. Note that we
- * leave the relid, rellockmode, and perminfoindex fields set, so that the
- * view relation can be appropriately locked before execution and its
- * permissions checked.
+ * leave the relid, relkind, rellockmode, and perminfoindex fields set,
+ * so that the view relation can be appropriately locked before execution
+ * and its permissions checked.
*/
- rte->relkind = 0;
rte->tablesample = NULL;
rte->inh = false; /* must not be set for a subquery */
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index ee9b89a672..c807e9cdcc 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -27,6 +27,7 @@
#include "storage/procarray.h"
#include "storage/sinvaladt.h"
#include "utils/inval.h"
+#include "utils/lsyscache.h"
/*
@@ -364,6 +365,50 @@ CheckRelationLockedByMe(Relation relation, LOCKMODE lockmode, bool orstronger)
return false;
}
+/*
+ * CheckRelLockedByMe
+ *
+ * Like CheckRelationLockedByMe(), but takes the relation's OID, so it can
+ * be used without opening the relation. Returns true if the current
+ * transaction holds a lock of mode 'lockmode' on the given relation. If
+ * 'orstronger' is true, a stronger lockmode is also OK.
+ * ("Stronger" is defined as "numerically higher", which is a bit
+ * semantically dubious but is OK for the purposes we use this for.)
+ */
+bool
+CheckRelLockedByMe(Oid relid, LOCKMODE lockmode, bool orstronger)
+{
+ Oid dbId = get_rel_relisshared(relid) ? InvalidOid : MyDatabaseId;
+ LOCKTAG tag;
+
+ SET_LOCKTAG_RELATION(tag, dbId, relid);
+
+ if (LockHeldByMe(&tag, lockmode))
+ return true;
+
+ if (orstronger)
+ {
+ LOCKMODE slockmode;
+
+ for (slockmode = lockmode + 1;
+ slockmode <= MaxLockMode;
+ slockmode++)
+ {
+ if (LockHeldByMe(&tag, slockmode))
+ {
+#ifdef NOT_USED
+ /* Sometimes this might be useful for debugging purposes */
+ elog(WARNING, "lock mode %s substituted for %s on relation %s",
+ GetLockmodeName(tag.locktag_lockmethodid, slockmode),
+ GetLockmodeName(tag.locktag_lockmethodid, lockmode),
+ get_rel_name(relid));
+#endif
+ return true;
+ }
+ }
+ }
+
+ return false;
+}
+
/*
* LockHasWaitersRelation
*
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index cab709b07b..6d0ea07801 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1199,6 +1199,7 @@ exec_simple_query(const char *query_string)
* Start the portal. No parameters here.
*/
PortalStart(portal, NULL, 0, InvalidSnapshot);
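+ /* Plans here were just made, not cached, so cannot have gone stale. */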
+ Assert(portal->plan_valid);
/*
* Select the appropriate output format: text unless we are doing a
@@ -1703,6 +1704,7 @@ exec_bind_message(StringInfo input_message)
"commands ignored until end of transaction block"),
errdetail_abort()));
+replan:
/*
* Create the portal. Allow silent replacement of an existing portal only
* if the unnamed portal is specified.
@@ -1994,10 +1996,19 @@ exec_bind_message(StringInfo input_message)
PopActiveSnapshot();
/*
- * And we're ready to start portal execution.
+ * Start portal execution. If the portal contains a cached plan, it must
+ * be recreated if portal->plan_valid comes back false, which indicates
+ * that the cached plan was invalidated while initializing one of the
+ * plan trees it contains.
*/
PortalStart(portal, params, 0, InvalidSnapshot);
+ if (!portal->plan_valid)
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
+
/*
* Apply the result format requests to the portal.
*/
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5f0248acc5..c93a950d7f 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -19,6 +19,7 @@
#include "access/xact.h"
#include "commands/prepare.h"
+#include "executor/execdesc.h"
#include "executor/tstoreReceiver.h"
#include "miscadmin.h"
#include "pg_trace.h"
@@ -35,12 +36,6 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
- const char *sourceText,
- ParamListInfo params,
- QueryEnvironment *queryEnv,
- DestReceiver *dest,
- QueryCompletion *qc);
static void FillPortalStore(Portal portal, bool isTopLevel);
static uint64 RunFromStore(Portal portal, ScanDirection direction, uint64 count,
DestReceiver *dest);
@@ -65,6 +60,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +73,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->cplan = cplan; /* CachedPlan, if plan is from one */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -116,86 +113,6 @@ FreeQueryDesc(QueryDesc *qdesc)
}
-/*
- * ProcessQuery
- * Execute a single plannable query within a PORTAL_MULTI_QUERY,
- * PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
- *
- * plan: the plan tree for the query
- * sourceText: the source text of the query
- * params: any parameters needed
- * dest: where to send results
- * qc: where to store the command completion status data.
- *
- * qc may be NULL if caller doesn't want a status string.
- *
- * Must be called in a memory context that will be reset or deleted on
- * error; otherwise the executor's memory usage will be leaked.
- */
-static void
-ProcessQuery(PlannedStmt *plan,
- const char *sourceText,
- ParamListInfo params,
- QueryEnvironment *queryEnv,
- DestReceiver *dest,
- QueryCompletion *qc)
-{
- QueryDesc *queryDesc;
-
- /*
- * Create the QueryDesc object
- */
- queryDesc = CreateQueryDesc(plan, sourceText,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, 0);
-
- /*
- * Call ExecutorStart to prepare the plan for execution
- */
- ExecutorStart(queryDesc, 0);
-
- /*
- * Run the plan to completion.
- */
- ExecutorRun(queryDesc, ForwardScanDirection, 0L, true);
-
- /*
- * Build command completion status data, if caller wants one.
- */
- if (qc)
- {
- switch (queryDesc->operation)
- {
- case CMD_SELECT:
- SetQueryCompletion(qc, CMDTAG_SELECT, queryDesc->estate->es_processed);
- break;
- case CMD_INSERT:
- SetQueryCompletion(qc, CMDTAG_INSERT, queryDesc->estate->es_processed);
- break;
- case CMD_UPDATE:
- SetQueryCompletion(qc, CMDTAG_UPDATE, queryDesc->estate->es_processed);
- break;
- case CMD_DELETE:
- SetQueryCompletion(qc, CMDTAG_DELETE, queryDesc->estate->es_processed);
- break;
- case CMD_MERGE:
- SetQueryCompletion(qc, CMDTAG_MERGE, queryDesc->estate->es_processed);
- break;
- default:
- SetQueryCompletion(qc, CMDTAG_UNKNOWN, queryDesc->estate->es_processed);
- break;
- }
- }
-
- /*
- * Now, we close down all the scans and free allocated resources.
- */
- ExecutorFinish(queryDesc);
- ExecutorEnd(queryDesc);
-
- FreeQueryDesc(queryDesc);
-}
-
/*
* ChoosePortalStrategy
* Select portal execution strategy given the intended statement list.
@@ -427,7 +344,8 @@ FetchStatementTargetList(Node *stmt)
* to be used for cursors).
*
* On return, portal is ready to accept PortalRun() calls, and the result
- * tupdesc (if any) is known.
+ * tupdesc (if any) is known, unless portal->plan_valid is set to false, in
+ * which case the caller must retry after generating a new CachedPlan.
*/
void
PortalStart(Portal portal, ParamListInfo params,
@@ -435,7 +353,6 @@ PortalStart(Portal portal, ParamListInfo params,
{
Portal saveActivePortal;
ResourceOwner saveResourceOwner;
- MemoryContext savePortalContext;
MemoryContext oldContext;
QueryDesc *queryDesc;
int myeflags;
@@ -448,15 +365,13 @@ PortalStart(Portal portal, ParamListInfo params,
*/
saveActivePortal = ActivePortal;
saveResourceOwner = CurrentResourceOwner;
- savePortalContext = PortalContext;
PG_TRY();
{
ActivePortal = portal;
if (portal->resowner)
CurrentResourceOwner = portal->resowner;
- PortalContext = portal->portalContext;
- oldContext = MemoryContextSwitchTo(PortalContext);
+ oldContext = MemoryContextSwitchTo(portal->queryContext);
/* Must remember portal param list, if any */
portal->portalParams = params;
@@ -472,6 +387,8 @@ PortalStart(Portal portal, ParamListInfo params,
switch (portal->strategy)
{
case PORTAL_ONE_SELECT:
+ case PORTAL_ONE_RETURNING:
+ case PORTAL_ONE_MOD_WITH:
/* Must set snapshot before starting executor. */
if (snapshot)
@@ -493,6 +410,7 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->cplan,
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -501,30 +419,56 @@ PortalStart(Portal portal, ParamListInfo params,
portal->queryEnv,
0);
+ /* Remember for PortalRunMulti(). */
+ if (portal->strategy == PORTAL_ONE_RETURNING ||
+ portal->strategy == PORTAL_ONE_MOD_WITH)
+ portal->qdescs = list_make1(queryDesc);
+
/*
* If it's a scrollable cursor, executor needs to support
* REWIND and backwards scan, as well as whatever the caller
* might've asked for.
*/
- if (portal->cursorOptions & CURSOR_OPT_SCROLL)
+ if (portal->strategy == PORTAL_ONE_SELECT &&
+ (portal->cursorOptions & CURSOR_OPT_SCROLL))
myeflags = eflags | EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD;
else
myeflags = eflags;
+ /* Take locks if using a CachedPlan */
+ if (queryDesc->cplan)
+ myeflags |= EXEC_FLAG_GET_LOCKS;
+
/*
- * Call ExecutorStart to prepare the plan for execution
+ * Call ExecutorStart to prepare the plan for execution. A
+ * cached plan may get invalidated as we're doing that.
*/
ExecutorStart(queryDesc, myeflags);
+ if (!queryDesc->plan_valid)
+ {
+ Assert(queryDesc->cplan);
+ PortalQueryFinish(queryDesc);
+ PopActiveSnapshot();
+ portal->plan_valid = false;
+ goto early_exit;
+ }
/*
- * This tells PortalCleanup to shut down the executor
+ * This tells PortalCleanup to shut down the executor; that is
+ * not needed for queries handled by PortalRunMulti().
*/
- portal->queryDesc = queryDesc;
+ if (portal->strategy == PORTAL_ONE_SELECT)
+ portal->queryDesc = queryDesc;
/*
- * Remember tuple descriptor (computed by ExecutorStart)
+ * Remember tuple descriptor (computed by ExecutorStart),
+ * but make it independent of the QueryDesc for queries
+ * handled by PortalRunMulti().
*/
- portal->tupDesc = queryDesc->tupDesc;
+ if (portal->strategy != PORTAL_ONE_SELECT)
+ portal->tupDesc = CreateTupleDescCopy(queryDesc->tupDesc);
+ else
+ portal->tupDesc = queryDesc->tupDesc;
/*
* Reset cursor position data to "start of query"
@@ -532,33 +476,11 @@ PortalStart(Portal portal, ParamListInfo params,
portal->atStart = true;
portal->atEnd = false; /* allow fetches */
portal->portalPos = 0;
+ portal->plan_valid = true;
PopActiveSnapshot();
break;
- case PORTAL_ONE_RETURNING:
- case PORTAL_ONE_MOD_WITH:
-
- /*
- * We don't start the executor until we are told to run the
- * portal. We do need to set up the result tupdesc.
- */
- {
- PlannedStmt *pstmt;
-
- pstmt = PortalGetPrimaryStmt(portal);
- portal->tupDesc =
- ExecCleanTypeFromTL(pstmt->planTree->targetlist);
- }
-
- /*
- * Reset cursor position data to "start of query"
- */
- portal->atStart = true;
- portal->atEnd = false; /* allow fetches */
- portal->portalPos = 0;
- break;
-
case PORTAL_UTIL_SELECT:
/*
@@ -578,11 +500,90 @@ PortalStart(Portal portal, ParamListInfo params,
portal->atStart = true;
portal->atEnd = false; /* allow fetches */
portal->portalPos = 0;
+ portal->plan_valid = true;
break;
case PORTAL_MULTI_QUERY:
- /* Need do nothing now */
+ {
+ ListCell *lc;
+ bool first = true;
+
+ /* Take locks if using a CachedPlan */
+ myeflags = 0;
+ if (portal->cplan)
+ myeflags |= EXEC_FLAG_GET_LOCKS;
+
+ foreach(lc, portal->stmts)
+ {
+ PlannedStmt *plan = lfirst_node(PlannedStmt, lc);
+ bool is_utility = (plan->utilityStmt != NULL);
+
+ /*
+ * Push the snapshot to be used by the executor.
+ */
+ if (!is_utility)
+ {
+ /*
+ * Must copy the snapshot if we'll need to update
+ * its command ID.
+ */
+ if (!first)
+ PushCopiedSnapshot(GetTransactionSnapshot());
+ else
+ PushActiveSnapshot(GetTransactionSnapshot());
+ }
+
+ /*
+ * From the 2nd statement onwards, update the command
+ * ID and the snapshot to match.
+ */
+ if (!first)
+ {
+ CommandCounterIncrement();
+ UpdateActiveSnapshotCommandId();
+ }
+
+ first = false;
+
+ /*
+ * Create the QueryDesc object. DestReceiver will
+ * be set in PortalRunMulti().
+ */
+ queryDesc = CreateQueryDesc(plan, portal->cplan,
+ portal->sourceText,
+ !is_utility ?
+ GetActiveSnapshot() :
+ InvalidSnapshot,
+ InvalidSnapshot,
+ NULL,
+ params,
+ portal->queryEnv, 0);
+
+ /* Remember for PortalRunMulti() */
+ portal->qdescs = lappend(portal->qdescs, queryDesc);
+
+ if (is_utility)
+ continue;
+
+ /*
+ * Call ExecutorStart to prepare the plan for
+ * execution. A cached plan may get invalidated as
+ * we're doing that.
+ */
+ ExecutorStart(queryDesc, myeflags);
+ PopActiveSnapshot();
+ if (!queryDesc->plan_valid)
+ {
+ Assert(queryDesc->cplan);
+ PortalQueryFinish(queryDesc);
+ portal->plan_valid = false;
+ goto early_exit;
+ }
+ }
+ }
+
portal->tupDesc = NULL;
+ portal->plan_valid = true;
break;
}
}
@@ -594,19 +595,18 @@ PortalStart(Portal portal, ParamListInfo params,
/* Restore global vars and propagate error */
ActivePortal = saveActivePortal;
CurrentResourceOwner = saveResourceOwner;
- PortalContext = savePortalContext;
PG_RE_THROW();
}
PG_END_TRY();
+ portal->status = PORTAL_READY;
+
+early_exit:
MemoryContextSwitchTo(oldContext);
ActivePortal = saveActivePortal;
CurrentResourceOwner = saveResourceOwner;
- PortalContext = savePortalContext;
-
- portal->status = PORTAL_READY;
}
/*
@@ -1193,7 +1193,7 @@ PortalRunMulti(Portal portal,
QueryCompletion *qc)
{
bool active_snapshot_set = false;
- ListCell *stmtlist_item;
+ ListCell *qdesc_item;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1214,9 +1214,10 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
- foreach(stmtlist_item, portal->stmts)
+ foreach(qdesc_item, portal->qdescs)
{
- PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ QueryDesc *qdesc = (QueryDesc *) lfirst(qdesc_item);
+ PlannedStmt *pstmt = qdesc->plannedstmt;
/*
* If we got a cancel signal in prior command, quit
@@ -1271,23 +1272,38 @@ PortalRunMulti(Portal portal,
else
UpdateActiveSnapshotCommandId();
+ /*
+ * Run the plan to completion.
+ */
+ qdesc->dest = dest;
+ ExecutorRun(qdesc, ForwardScanDirection, 0L, true);
+
+ /*
+ * Build command completion status data if needed.
+ */
if (pstmt->canSetTag)
{
- /* statement can set tag string */
- ProcessQuery(pstmt,
- portal->sourceText,
- portal->portalParams,
- portal->queryEnv,
- dest, qc);
- }
- else
- {
- /* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
- portal->sourceText,
- portal->portalParams,
- portal->queryEnv,
- altdest, NULL);
+ switch (qdesc->operation)
+ {
+ case CMD_SELECT:
+ SetQueryCompletion(qc, CMDTAG_SELECT, qdesc->estate->es_processed);
+ break;
+ case CMD_INSERT:
+ SetQueryCompletion(qc, CMDTAG_INSERT, qdesc->estate->es_processed);
+ break;
+ case CMD_UPDATE:
+ SetQueryCompletion(qc, CMDTAG_UPDATE, qdesc->estate->es_processed);
+ break;
+ case CMD_DELETE:
+ SetQueryCompletion(qc, CMDTAG_DELETE, qdesc->estate->es_processed);
+ break;
+ case CMD_MERGE:
+ SetQueryCompletion(qc, CMDTAG_MERGE, qdesc->estate->es_processed);
+ break;
+ default:
+ SetQueryCompletion(qc, CMDTAG_UNKNOWN, qdesc->estate->es_processed);
+ break;
+ }
}
if (log_executor_stats)
@@ -1346,8 +1362,15 @@ PortalRunMulti(Portal portal,
* Increment command counter between queries, but not after the last
* one.
*/
- if (lnext(portal->stmts, stmtlist_item) != NULL)
+ if (lnext(portal->qdescs, qdesc_item) != NULL)
CommandCounterIncrement();
+
+ if (qdesc->estate)
+ {
+ ExecutorFinish(qdesc);
+ ExecutorEnd(qdesc);
+ }
+ FreeQueryDesc(qdesc);
}
/* Pop the snapshot if we pushed one. */
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index c07382051d..38ae43e24b 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -2073,6 +2073,27 @@ get_rel_persistence(Oid relid)
return result;
}
+/*
+ * get_rel_relisshared
+ *
+ * Returns whether the given relation is shared
+ */
+bool
+get_rel_relisshared(Oid relid)
+{
+ HeapTuple tp;
+ Form_pg_class reltup;
+ bool result;
+
+ tp = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+ if (!HeapTupleIsValid(tp))
+ elog(ERROR, "cache lookup failed for relation %u", relid);
+ reltup = (Form_pg_class) GETSTRUCT(tp);
+ result = reltup->relisshared;
+ ReleaseSysCache(tp);
+
+ return result;
+}
/* ---------- TRANSFORM CACHE ---------- */
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 77c2ba3f8f..4e455d815f 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -100,13 +100,13 @@ static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool GenericPlanIsValid(CachedPlan *cplan);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -787,9 +787,6 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
- *
- * On a "true" return, we have acquired the locks needed to run the plan.
- * (We must do this for the "true" result to be race-condition-free.)
*/
static bool
CheckCachedPlan(CachedPlanSource *plansource)
@@ -803,60 +800,56 @@ CheckCachedPlan(CachedPlanSource *plansource)
if (!plan)
return false;
- Assert(plan->magic == CACHEDPLAN_MAGIC);
- /* Generic plans are never one-shot */
- Assert(!plan->is_oneshot);
+ if (GenericPlanIsValid(plan))
+ return true;
/*
- * If plan isn't valid for current role, we can't use it.
+ * Plan has been invalidated, so unlink it from the parent and release it.
*/
- if (plan->is_valid && plan->dependsOnRole &&
- plan->planRoleId != GetUserId())
- plan->is_valid = false;
+ ReleaseGenericPlan(plansource);
- /*
- * If it appears valid, acquire locks and recheck; this is much the same
- * logic as in RevalidateCachedQuery, but for a plan.
- */
- if (plan->is_valid)
+ return false;
+}
+
+/*
+ * GenericPlanIsValid
+ * Is a generic plan still valid?
+ *
+ * It may have gone stale due to concurrent schema modifications of relations
+ * mentioned in the plan, or due to a couple of other conditions checked below.
+ */
+static bool
+GenericPlanIsValid(CachedPlan *cplan)
+{
+ Assert(cplan != NULL);
+ Assert(cplan->magic == CACHEDPLAN_MAGIC);
+ /* Generic plans are never one-shot */
+ Assert(!cplan->is_oneshot);
+
+ if (cplan->is_valid)
{
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
- Assert(plan->refcount > 0);
-
- AcquireExecutorLocks(plan->stmt_list, true);
+ Assert(cplan->refcount > 0);
/*
- * If plan was transient, check to see if TransactionXmin has
- * advanced, and if so invalidate it.
+ * If plan isn't valid for current role, we can't use it.
*/
- if (plan->is_valid &&
- TransactionIdIsValid(plan->saved_xmin) &&
- !TransactionIdEquals(plan->saved_xmin, TransactionXmin))
- plan->is_valid = false;
+ if (cplan->dependsOnRole && cplan->planRoleId != GetUserId())
+ cplan->is_valid = false;
/*
- * By now, if any invalidation has happened, the inval callback
- * functions will have marked the plan invalid.
+ * If plan was transient, check to see if TransactionXmin has
+ * advanced, and if so invalidate it.
*/
- if (plan->is_valid)
- {
- /* Successfully revalidated and locked the query. */
- return true;
- }
-
- /* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ if (TransactionIdIsValid(cplan->saved_xmin) &&
+ !TransactionIdEquals(cplan->saved_xmin, TransactionXmin))
+ cplan->is_valid = false;
}
- /*
- * Plan has been invalidated, so unlink it from the parent and release it.
- */
- ReleaseGenericPlan(plansource);
-
- return false;
+ return cplan->is_valid;
}
/*
@@ -1126,9 +1119,6 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
- * On return, the plan is valid and we have sufficient locks to begin
- * execution.
- *
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
* the refcount has been reported to that ResourceOwner (note that this
@@ -1360,8 +1350,8 @@ CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
}
/*
- * Reject if AcquireExecutorLocks would have anything to do. This is
- * probably unnecessary given the previous check, but let's be safe.
+ * Reject if the executor would need to take additional locks, that is, in
+ * addition to those taken by AcquirePlannerLocks() on a given query.
*/
foreach(lc, plan->stmt_list)
{
@@ -1735,58 +1725,6 @@ QueryListGetPrimaryStmt(List *stmts)
return NULL;
}
-/*
- * AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
- */
-static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
-{
- ListCell *lc1;
-
- foreach(lc1, stmt_list)
- {
- PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
-
- if (plannedstmt->commandType == CMD_UTILITY)
- {
- /*
- * Ignore utility statements, except those (such as EXPLAIN) that
- * contain a parsed-but-not-planned query. Note: it's okay to use
- * ScanQueryForLocks, even though the query hasn't been through
- * rule rewriting, because rewriting doesn't change the query
- * representation.
- */
- Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
-
- if (query)
- ScanQueryForLocks(query, acquire);
- continue;
- }
-
- foreach(lc2, plannedstmt->rtable)
- {
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
-
- if (!(rte->rtekind == RTE_RELATION ||
- (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
- continue;
-
- /*
- * Acquire the appropriate type of lock on each relation OID. Note
- * that we don't actually try to open the rel, and hence will not
- * fail if it's been dropped entirely --- we'll just transiently
- * acquire a non-conflicting lock.
- */
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
- }
- }
-}
-
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 06dfa85f04..3ad80c7ecb 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -201,6 +201,10 @@ CreatePortal(const char *name, bool allowDup, bool dupSilent)
portal->portalContext = AllocSetContextCreate(TopPortalContext,
"PortalContext",
ALLOCSET_SMALL_SIZES);
+ /* initialize portal's query context to store QueryDescs */
+ portal->queryContext = AllocSetContextCreate(TopPortalContext,
+ "PortalQueryContext",
+ ALLOCSET_SMALL_SIZES);
/* create a resource owner for the portal */
portal->resowner = ResourceOwnerCreate(CurTransactionResourceOwner,
@@ -224,6 +228,7 @@ CreatePortal(const char *name, bool allowDup, bool dupSilent)
/* for named portals reuse portal->name copy */
MemoryContextSetIdentifier(portal->portalContext, portal->name[0] ? portal->name : "<unnamed>");
+ MemoryContextSetIdentifier(portal->queryContext, portal->name[0] ? portal->name : "<unnamed>");
return portal;
}
@@ -594,6 +599,7 @@ PortalDrop(Portal portal, bool isTopCommit)
/* release subsidiary storage */
MemoryContextDelete(portal->portalContext);
+ MemoryContextDelete(portal->queryContext);
/* release portal struct (it's in TopPortalContext) */
pfree(portal);
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 7c1071ddd1..da39b2e4ff 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,11 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern QueryDesc *ExplainQueryDesc(PlannedStmt *stmt, struct CachedPlan *cplan,
+ const char *queryString, IntoClause *into, ExplainState *es,
+ ParamListInfo params, QueryEnvironment *queryEnv);
+extern void ExplainOnePlan(QueryDesc *queryDesc,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
@@ -103,6 +107,7 @@ extern void ExplainQueryParameters(ExplainState *es, ParamListInfo params, int m
extern void ExplainBeginOutput(ExplainState *es);
extern void ExplainEndOutput(ExplainState *es);
+extern void ExplainResetOutput(ExplainState *es);
extern void ExplainSeparatePlans(ExplainState *es);
extern void ExplainPropertyList(const char *qlabel, List *data,
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index af2bf36dfb..c36c25b497 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -32,9 +32,12 @@
*/
typedef struct QueryDesc
{
+ NodeTag type;
+
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ struct CachedPlan *cplan; /* CachedPlan, if plannedstmt is from one */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -47,6 +50,7 @@ typedef struct QueryDesc
TupleDesc tupDesc; /* descriptor for result tuples */
EState *estate; /* executor's query-wide state */
PlanState *planstate; /* tree of per-plan-node state */
+ bool plan_valid; /* is planstate tree fully valid? */
/* This field is set by ExecutorRun */
bool already_executed; /* true if previously executed */
@@ -57,6 +61,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ struct CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 946abc0051..5f860662b1 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -19,6 +19,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/plancache.h"
/*
@@ -59,6 +60,8 @@
#define EXEC_FLAG_MARK 0x0008 /* need mark/restore */
#define EXEC_FLAG_SKIP_TRIGGERS 0x0010 /* skip AfterTrigger calls */
#define EXEC_FLAG_WITH_NO_DATA 0x0020 /* rel scannability doesn't matter */
+#define EXEC_FLAG_GET_LOCKS 0x0400 /* should the executor lock
+ * relations? */
/* Hook for plugins to get control in ExecutorStart() */
@@ -245,6 +248,13 @@ extern void ExecEndNode(PlanState *node);
extern void ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/* Is the CachedPlan, if any, that this plan tree came from still valid? */
+static inline bool
+ExecPlanStillValid(EState *estate)
+{
+ return estate->es_cachedplan == NULL ? true :
+ CachedPlanStillValid(estate->es_cachedplan);
+}
/* ----------------------------------------------------------------
* ExecProcNode
@@ -579,6 +589,8 @@ exec_rt_fetch(Index rti, EState *estate)
}
extern Relation ExecGetRangeTableRelation(EState *estate, Index rti);
+extern void ExecLockViewRelations(List *viewRelations, EState *estate);
+extern void ExecLockAppendNonLeafRelations(EState *estate, List *allpartrelids);
extern void ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
Index rti);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index bc67cb9ed8..c6b3885bf6 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -621,6 +621,8 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ struct CachedPlan *es_cachedplan; /* CachedPlan if plannedstmt is from
+ * one */
List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
const char *es_sourceText; /* Source text from QueryDesc */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index d61a62da19..9b888b0d75 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -116,6 +116,9 @@ typedef struct PlannerGlobal
/* "flat" list of RTEPermissionInfos */
List *finalrteperminfos;
+ /* "flat" list of integer RT indexes */
+ List *viewRelations;
+
/* "flat" list of PlanRowMarks */
List *finalrowmarks;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index a0bb16cff4..7cae624bbd 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -78,6 +78,9 @@ typedef struct PlannedStmt
List *permInfos; /* list of RTEPermissionInfo nodes for rtable
* entries needing one */
+ List *viewRelations; /* integer list of RT indexes, or NIL if no
+ * views are queried */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE/MERGE */
List *resultRelations; /* integer list of RT indexes, or NIL */
diff --git a/src/include/storage/lmgr.h b/src/include/storage/lmgr.h
index 4ee91e3cf9..598bf2688a 100644
--- a/src/include/storage/lmgr.h
+++ b/src/include/storage/lmgr.h
@@ -48,6 +48,7 @@ extern bool ConditionalLockRelation(Relation relation, LOCKMODE lockmode);
extern void UnlockRelation(Relation relation, LOCKMODE lockmode);
extern bool CheckRelationLockedByMe(Relation relation, LOCKMODE lockmode,
bool orstronger);
+extern bool CheckRelLockedByMe(Oid relid, LOCKMODE lockmode, bool orstronger);
extern bool LockHasWaitersRelation(Relation relation, LOCKMODE lockmode);
extern void LockRelationIdForSession(LockRelId *relid, LOCKMODE lockmode);
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index 4f5418b972..3074e604dd 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -139,6 +139,7 @@ extern char get_rel_relkind(Oid relid);
extern bool get_rel_relispartition(Oid relid);
extern Oid get_rel_tablespace(Oid relid);
extern char get_rel_persistence(Oid relid);
+extern bool get_rel_relisshared(Oid relid);
extern Oid get_transform_fromsql(Oid typid, Oid langid, List *trftypes);
extern Oid get_transform_tosql(Oid typid, Oid langid, List *trftypes);
extern bool get_typisdefined(Oid typid);
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index a443181d41..8990fe72e3 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -221,6 +221,20 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
QueryEnvironment *queryEnv);
+
+/*
+ * CachedPlanStillValid
+ * Returns whether a cached generic plan is still valid
+ *
+ * Called by the executor after each relation lock taken while initializing
+ * the plan tree of a CachedPlan.
+ */
+static inline bool
+CachedPlanStillValid(CachedPlan *cplan)
+{
+ return cplan->is_valid;
+}
+
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aa08b1e0fc..332a08ccb4 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,9 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *qdescs; /* list of QueryDescs */
+ bool plan_valid; /* are plan(s) ready for execution? */
+ MemoryContext queryContext; /* memory for QueryDescs and children */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -242,6 +245,7 @@ extern void PortalDefineQuery(Portal portal,
CommandTag commandTag,
List *stmts,
CachedPlan *cplan);
+extern void PortalQueryFinish(QueryDesc *queryDesc);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
diff --git a/src/test/modules/delay_execution/Makefile b/src/test/modules/delay_execution/Makefile
index 70f24e846d..2fca84d027 100644
--- a/src/test/modules/delay_execution/Makefile
+++ b/src/test/modules/delay_execution/Makefile
@@ -8,7 +8,8 @@ OBJS = \
delay_execution.o
ISOLATION = partition-addition \
- partition-removal-1
+ partition-removal-1 \
+ cached-plan-replan
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/src/test/modules/delay_execution/delay_execution.c b/src/test/modules/delay_execution/delay_execution.c
index 7cd76eb34b..5d7a3e9858 100644
--- a/src/test/modules/delay_execution/delay_execution.c
+++ b/src/test/modules/delay_execution/delay_execution.c
@@ -1,14 +1,18 @@
/*-------------------------------------------------------------------------
*
* delay_execution.c
- * Test module to allow delay between parsing and execution of a query.
+ * Test module to introduce delay at various points during execution of a
+ * query to test that execution proceeds safely in light of concurrent
+ * changes.
*
* The delay is implemented by taking and immediately releasing a specified
* advisory lock. If another process has previously taken that lock, the
* current process will be blocked until the lock is released; otherwise,
* there's no effect. This allows an isolationtester script to reliably
- * test behaviors where some specified action happens in another backend
- * between parsing and execution of any desired query.
+ * test behaviors where some specified action happens in another backend in
+ * a couple of cases: 1) between parsing and execution of any desired query
+ * when using the planner_hook, 2) between RevalidateCachedQuery() and
+ * ExecutorStart() when using the ExecutorStart_hook.
*
* Copyright (c) 2020-2023, PostgreSQL Global Development Group
*
@@ -22,6 +26,7 @@
#include <limits.h>
+#include "executor/executor.h"
#include "optimizer/planner.h"
#include "utils/builtins.h"
#include "utils/guc.h"
@@ -32,9 +37,11 @@ PG_MODULE_MAGIC;
/* GUC: advisory lock ID to use. Zero disables the feature. */
static int post_planning_lock_id = 0;
+static int executor_start_lock_id = 0;
-/* Save previous planner hook user to be a good citizen */
+/* Save previous hook users to be a good citizen */
static planner_hook_type prev_planner_hook = NULL;
+static ExecutorStart_hook_type prev_ExecutorStart_hook = NULL;
/* planner_hook function to provide the desired delay */
@@ -70,11 +77,41 @@ delay_execution_planner(Query *parse, const char *query_string,
return result;
}
+/* ExecutorStart_hook function to provide the desired delay */
+static void
+delay_execution_ExecutorStart(QueryDesc *queryDesc, int eflags)
+{
+ /* If enabled, delay by taking and releasing the specified lock */
+ if (executor_start_lock_id != 0)
+ {
+ DirectFunctionCall1(pg_advisory_lock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+ DirectFunctionCall1(pg_advisory_unlock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+
+ /*
+ * Ensure that we notice any pending invalidations, since the advisory
+ * lock functions don't do this.
+ */
+ AcceptInvalidationMessages();
+ }
+
+ /* Now start the executor, possibly via a previous hook user */
+ if (prev_ExecutorStart_hook)
+ prev_ExecutorStart_hook(queryDesc, eflags);
+ else
+ standard_ExecutorStart(queryDesc, eflags);
+
+ if (executor_start_lock_id != 0)
+ elog(NOTICE, "Finished ExecutorStart(): CachedPlan is %s",
+ queryDesc->cplan->is_valid ? "valid" : "not valid");
+}
+
/* Module load function */
void
_PG_init(void)
{
- /* Set up the GUC to control which lock is used */
+ /* Set up GUCs to control which lock is used */
DefineCustomIntVariable("delay_execution.post_planning_lock_id",
"Sets the advisory lock ID to be locked/unlocked after planning.",
"Zero disables the delay.",
@@ -86,10 +123,22 @@ _PG_init(void)
NULL,
NULL,
NULL);
-
+ DefineCustomIntVariable("delay_execution.executor_start_lock_id",
+ "Sets the advisory lock ID to be locked/unlocked before starting execution.",
+ "Zero disables the delay.",
+ &executor_start_lock_id,
+ 0,
+ 0, INT_MAX,
+ PGC_USERSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
MarkGUCPrefixReserved("delay_execution");
- /* Install our hook */
+ /* Install our hooks. */
prev_planner_hook = planner_hook;
planner_hook = delay_execution_planner;
+ prev_ExecutorStart_hook = ExecutorStart_hook;
+ ExecutorStart_hook = delay_execution_ExecutorStart;
}
diff --git a/src/test/modules/delay_execution/expected/cached-plan-replan.out b/src/test/modules/delay_execution/expected/cached-plan-replan.out
new file mode 100644
index 0000000000..4f450b9d9b
--- /dev/null
+++ b/src/test/modules/delay_execution/expected/cached-plan-replan.out
@@ -0,0 +1,117 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1prep s2lock s1exec s2dropi s2unlock
+step s1prep: SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1;
+ EXPLAIN (COSTS OFF) EXECUTE q (1);
+QUERY PLAN
+--------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo11 foo_1
+ Recheck Cond: (a = $1)
+ -> Bitmap Index Scan on foo11_a_idx
+ Index Cond: (a = $1)
+(6 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+-----------------------------
+Append
+ Subplans Removed: 1
+ -> Seq Scan on foo11 foo_1
+ Filter: (a = $1)
+(4 rows)
+
+
+starting permutation: s1prep2 s2lock s1exec2 s2dropi s2unlock
+step s1prep2: SET plan_cache_mode = force_generic_plan;
+ SET enable_partitionwise_aggregate = on;
+ SET enable_partitionwise_join = on;
+ PREPARE q2 AS SELECT t1.a, count(t2.b) FROM foo t1, foo t2 WHERE t1.a = t2.a GROUP BY 1;
+ EXPLAIN (COSTS OFF) EXECUTE q2;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+----------------------------------------------------------------
+Append
+ -> GroupAggregate
+ Group Key: t1.a
+ -> Merge Join
+ Merge Cond: (t1.a = t2.a)
+ -> Index Only Scan using foo11_a_idx on foo11 t1
+ -> Materialize
+ -> Index Scan using foo11_a_idx on foo11 t2
+ -> GroupAggregate
+ Group Key: t1_1.a
+ -> Merge Join
+ Merge Cond: (t1_1.a = t2_1.a)
+ -> Sort
+ Sort Key: t1_1.a
+ -> Seq Scan on foo2 t1_1
+ -> Sort
+ Sort Key: t2_1.a
+ -> Seq Scan on foo2 t2_1
+(18 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec2: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec2: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------------------------------
+Append
+ -> GroupAggregate
+ Group Key: t1.a
+ -> Merge Join
+ Merge Cond: (t1.a = t2.a)
+ -> Sort
+ Sort Key: t1.a
+ -> Seq Scan on foo11 t1
+ -> Sort
+ Sort Key: t2.a
+ -> Seq Scan on foo11 t2
+ -> GroupAggregate
+ Group Key: t1_1.a
+ -> Merge Join
+ Merge Cond: (t1_1.a = t2_1.a)
+ -> Sort
+ Sort Key: t1_1.a
+ -> Seq Scan on foo2 t1_1
+ -> Sort
+ Sort Key: t2_1.a
+ -> Seq Scan on foo2 t2_1
+(21 rows)
+
diff --git a/src/test/modules/delay_execution/specs/cached-plan-replan.spec b/src/test/modules/delay_execution/specs/cached-plan-replan.spec
new file mode 100644
index 0000000000..67cfed7044
--- /dev/null
+++ b/src/test/modules/delay_execution/specs/cached-plan-replan.spec
@@ -0,0 +1,50 @@
+# Test to check that invalidation of a cached plan during ExecutorStart
+# correctly triggers replanning and re-execution.
+
+setup
+{
+ CREATE TABLE foo (a int, b text) PARTITION BY LIST(a);
+ CREATE TABLE foo1 PARTITION OF foo FOR VALUES IN (1) PARTITION BY LIST (a);
+ CREATE TABLE foo11 PARTITION OF foo1 FOR VALUES IN (1);
+ CREATE INDEX foo11_a ON foo1 (a);
+ CREATE TABLE foo2 PARTITION OF foo FOR VALUES IN (2);
+ CREATE VIEW foov AS SELECT * FROM foo;
+}
+
+teardown
+{
+ DROP VIEW foov;
+ DROP TABLE foo;
+}
+
+session "s1"
+# Creates a prepared statement and forces creation of a generic plan
+step "s1prep" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+
+step "s1prep2" { SET plan_cache_mode = force_generic_plan;
+ SET enable_partitionwise_aggregate = on;
+ SET enable_partitionwise_join = on;
+ PREPARE q2 AS SELECT t1.a, count(t2.b) FROM foo t1, foo t2 WHERE t1.a = t2.a GROUP BY 1;
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+# Executes a generic plan
+step "s1exec" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+step "s1exec2" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+
+session "s2"
+step "s2lock" { SELECT pg_advisory_lock(12345); }
+step "s2unlock" { SELECT pg_advisory_unlock(12345); }
+step "s2dropi" { DROP INDEX foo11_a; }
+
+# While "s1exec" waits to acquire the advisory lock, "s2drop" is able to drop
+# the index being used in the cached plan for `q`, so when "s1exec" is then
+# unblocked and initializes the cached plan for execution, it detects the
+# concurrent index drop and causes the cached plan to be discarded and
+# recreated without the index.
+permutation "s1prep" "s2lock" "s1exec" "s2dropi" "s2unlock"
+permutation "s1prep2" "s2lock" "s1exec2" "s2dropi" "s2unlock"
--
2.35.3
v35-0001-Add-field-to-store-partitioned-relids-to-Append-.patch
From d4c6eddd02d048a42a9941ab7dc8dcf1411fef78 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Thu, 9 Mar 2023 11:26:06 +0900
Subject: [PATCH v35 1/3] Add field to store partitioned relids to
Append/MergeAppend
A future commit would like to move the locking of relations
referenced in a cached plan from the current loop-over-rangetable in
AcquireExecutorLocks() to the ExecInitNode() traversal of the plan
tree. Given that partitioned tables (their RT indexes) would not be
accessible via the new way of finding the relations to lock, add a
field to Append/MergeAppend to track them separately.
This moves the code that looks up the partitioned parent relids of a
given list of leaf partition subpaths of an Append/MergeAppend out
of make_partition_pruneinfo() into its own function called
add_append_subpath_partrelids(). In doing so, the code is generalized
to handle cases where the child rels can be joinrels or upper
(grouping) rels. Also, to make it easier to traverse the parent
chain of a child grouping rel, this makes its RelOptInfo.parent be
set, as is already done for baserels and joinrels.
---
src/backend/optimizer/plan/createplan.c | 36 +++++--
src/backend/optimizer/plan/planner.c | 3 +
src/backend/optimizer/util/appendinfo.c | 134 ++++++++++++++++++++++++
src/backend/partitioning/partprune.c | 124 +++-------------------
src/include/nodes/plannodes.h | 14 +++
src/include/optimizer/appendinfo.h | 3 +
src/include/partitioning/partprune.h | 3 +-
7 files changed, 194 insertions(+), 123 deletions(-)
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index fa09a6103b..94fe0a28ad 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -25,6 +25,7 @@
#include "nodes/extensible.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
+#include "optimizer/appendinfo.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/optimizer.h"
@@ -1209,6 +1210,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
Oid *nodeCollations = NULL;
bool *nodeNullsFirst = NULL;
bool consider_async = false;
+ List *allpartrelids = NIL;
/*
* The subpaths list could be empty, if every child was proven empty by
@@ -1350,18 +1352,24 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
++nasyncplans;
}
+ /* Populate partitioned parent relids. */
+ allpartrelids = add_append_subpath_partrelids(root, subpath, rel,
+ allpartrelids);
+
subplans = lappend(subplans, subplan);
}
+ plan->allpartrelids = allpartrelids;
+
/* Set below if we find quals that we can use to run-time prune */
plan->part_prune_index = -1;
/*
- * If any quals exist, they may be useful to perform further partition
- * pruning during execution. Gather information needed by the executor to
- * do partition pruning.
+ * If scanning partitions, check if there are quals that may be useful to
+ * perform further partition pruning during execution. Gather information
+ * needed by the executor to do partition pruning.
*/
- if (enable_partition_pruning)
+ if (enable_partition_pruning && allpartrelids != NIL)
{
List *prunequal;
@@ -1381,7 +1389,8 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
if (prunequal != NIL)
plan->part_prune_index = make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ allpartrelids);
}
plan->appendplans = subplans;
@@ -1425,6 +1434,7 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
List *subplans = NIL;
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
+ List *allpartrelids = NIL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1514,18 +1524,23 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplan = (Plan *) sort;
}
+ allpartrelids = add_append_subpath_partrelids(root, subpath, rel,
+ allpartrelids);
+
subplans = lappend(subplans, subplan);
}
+ node->allpartrelids = allpartrelids;
+
/* Set below if we find quals that we can use to run-time prune */
node->part_prune_index = -1;
/*
- * If any quals exist, they may be useful to perform further partition
- * pruning during execution. Gather information needed by the executor to
- * do partition pruning.
+ * If scanning partitions, check if there are quals that may be useful to
+ * perform further partition pruning during execution. Gather information
+ * needed by the executor to do partition pruning.
*/
- if (enable_partition_pruning)
+ if (enable_partition_pruning && allpartrelids != NIL)
{
List *prunequal;
@@ -1545,7 +1560,8 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
if (prunequal != NIL)
node->part_prune_index = make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ allpartrelids);
}
node->mergeplans = subplans;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index a1873ce26d..62b3ec96cc 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -7801,8 +7801,11 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
agg_costs, gd, &child_extra,
&child_partially_grouped_rel);
+ /* Mark as child of grouped_rel. */
+ child_grouped_rel->parent = grouped_rel;
if (child_partially_grouped_rel)
{
+ child_partially_grouped_rel->parent = grouped_rel;
partially_grouped_live_children =
lappend(partially_grouped_live_children,
child_partially_grouped_rel);
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index 9d377385f1..4876742ab2 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -40,6 +40,7 @@ static void make_inh_translation_list(Relation oldrelation,
AppendRelInfo *appinfo);
static Node *adjust_appendrel_attrs_mutator(Node *node,
adjust_appendrel_attrs_context *context);
+static List *add_part_relids(List *allpartrelids, Bitmapset *partrelids);
/*
@@ -1031,3 +1032,136 @@ distribute_row_identity_vars(PlannerInfo *root)
}
}
}
+
+/*
+ * add_append_subpath_partrelids
+ * Look up a child subpath's rel's partitioned parent relids up to
+ * parentrel and add the bitmapset containing those into
+ * 'allpartrelids'
+ */
+List *
+add_append_subpath_partrelids(PlannerInfo *root, Path *subpath,
+ RelOptInfo *parentrel,
+ List *allpartrelids)
+{
+ RelOptInfo *prel = subpath->parent;
+ Relids partrelids = NULL;
+
+ /* Nothing to do if there's no parent to begin with. */
+ if (!IS_OTHER_REL(prel))
+ return allpartrelids;
+
+ /*
+ * Traverse up to the pathrel's topmost partitioned parent, collecting
+ * parent relids as we go; but stop if we reach parentrel. (Normally, a
+ * pathrel's topmost partitioned parent is either parentrel or a UNION ALL
+ * appendrel child of parentrel. But when handling partitionwise joins of
+ * multi-level partitioning trees, we can see an append path whose
+ * parentrel is an intermediate partitioned table.)
+ */
+ do
+ {
+ Relids parent_relids = NULL;
+
+ /*
+ * For simple child rels, we can simply get the parent relid from
+ * prel->parent. But for partitionwise join and aggregate child rels,
+ * while we can use prel->parent to move up the tree, parent relids to
+ * add into 'partrelids' must be found the hard way through the
+ * AppendRelInfos, because 1) a joinrel's relids may point to RTE_JOIN
+ * entries, and 2) the topmost parent grouping rel's relids field is left NULL.
+ */
+ if (IS_SIMPLE_REL(prel))
+ {
+ prel = prel->parent;
+ /* Stop once we reach the root partitioned rel. */
+ if (!IS_PARTITIONED_REL(prel))
+ break;
+ parent_relids = bms_add_members(parent_relids, prel->relids);
+ }
+ else
+ {
+ AppendRelInfo **appinfos;
+ int nappinfos,
+ i;
+
+ appinfos = find_appinfos_by_relids(root, prel->relids,
+ &nappinfos);
+ for (i = 0; i < nappinfos; i++)
+ {
+ AppendRelInfo *appinfo = appinfos[i];
+
+ parent_relids = bms_add_member(parent_relids,
+ appinfo->parent_relid);
+ }
+ pfree(appinfos);
+ prel = prel->parent;
+ }
+ /* accept this level as an interesting parent */
+ partrelids = bms_add_members(partrelids, parent_relids);
+ if (prel == parentrel)
+ break; /* don't traverse above parentrel */
+ } while (IS_OTHER_REL(prel));
+
+ if (partrelids == NULL)
+ return allpartrelids;
+
+ return add_part_relids(allpartrelids, partrelids);
+}
+
+/*
+ * add_part_relids
+ * Add new info to a list of Bitmapsets of partitioned relids.
+ *
+ * Within 'allpartrelids', there is one Bitmapset for each topmost parent
+ * partitioned rel. Each Bitmapset contains the RT indexes of the topmost
+ * parent as well as its relevant non-leaf child partitions. Since (by
+ * construction of the rangetable list) parent partitions must have lower
+ * RT indexes than their children, we can distinguish the topmost parent
+ * as being the lowest set bit in the Bitmapset.
+ *
+ * 'partrelids' contains the RT indexes of a parent partitioned rel, and
+ * possibly some non-leaf children, that are newly identified as parents of
+ * some subpath rel passed to make_partition_pruneinfo(). These are added
+ * to an appropriate member of 'allpartrelids'.
+ *
+ * Note that the list contains only RT indexes of partitioned tables that
+ * are parents of some scan-level relation appearing in the 'subpaths' that
+ * make_partition_pruneinfo() is dealing with. Also, "topmost" parents are
+ * not allowed to be higher than the 'parentrel' associated with the append
+ * path. In this way, we avoid expending cycles on partitioned rels that
+ * can't contribute useful pruning information for the problem at hand.
+ * (It is possible for 'parentrel' to be a child partitioned table, and it
+ * is also possible for scan-level relations to be child partitioned tables
+ * rather than leaf partitions. Hence we must construct this relation set
+ * with reference to the particular append path we're dealing with, rather
+ * than looking at the full partitioning structure represented in the
+ * RelOptInfos.)
+ */
+static List *
+add_part_relids(List *allpartrelids, Bitmapset *partrelids)
+{
+ Index targetpart;
+ ListCell *lc;
+
+ /* We can easily get the lowest set bit this way: */
+ targetpart = bms_next_member(partrelids, -1);
+ Assert(targetpart > 0);
+
+ /* Look for a matching topmost parent */
+ foreach(lc, allpartrelids)
+ {
+ Bitmapset *currpartrelids = (Bitmapset *) lfirst(lc);
+ Index currtarget = bms_next_member(currpartrelids, -1);
+
+ if (targetpart == currtarget)
+ {
+ /* Found a match, so add any new RT indexes to this hierarchy */
+ currpartrelids = bms_add_members(currpartrelids, partrelids);
+ lfirst(lc) = currpartrelids;
+ return allpartrelids;
+ }
+ }
+ /* No match, so add the new partition hierarchy to the list */
+ return lappend(allpartrelids, partrelids);
+}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 510145e3c0..3557e07082 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -138,7 +138,6 @@ typedef struct PruneStepResult
} PruneStepResult;
-static List *add_part_relids(List *allpartrelids, Bitmapset *partrelids);
static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
RelOptInfo *parentrel,
List *prunequal,
@@ -221,33 +220,32 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
+ * 'allpartrelids' contains Bitmapsets of RT indexes of partitioned parents
+ * whose partitions' Paths are in 'subpaths'; there's one Bitmapset for every
+ * partition tree involved.
*/
int
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
- List *prunequal)
+ List *prunequal,
+ List *allpartrelids)
{
PartitionPruneInfo *pruneinfo;
Bitmapset *allmatchedsubplans = NULL;
- List *allpartrelids;
List *prunerelinfos;
int *relid_subplan_map;
ListCell *lc;
int i;
+ Assert(list_length(allpartrelids) > 0);
+
/*
- * Scan the subpaths to see which ones are scans of partition child
- * relations, and identify their parent partitioned rels. (Note: we must
- * restrict the parent partitioned rels to be parentrel or children of
- * parentrel, otherwise we couldn't translate prunequal to match.)
- *
- * Also construct a temporary array to map from partition-child-relation
- * relid to the index in 'subpaths' of the scan plan for that partition.
+ * Construct a temporary array to map from partition-child-relation relid
+ * to the index in 'subpaths' of the scan plan for that partition.
* (Use of "subplan" rather than "subpath" is a bit of a misnomer, but
* we'll let it stand.) For convenience, we use 1-based indexes here, so
* that zero can represent an un-filled array entry.
*/
- allpartrelids = NIL;
relid_subplan_map = palloc0(sizeof(int) * root->simple_rel_array_size);
i = 1;
@@ -256,50 +254,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Path *path = (Path *) lfirst(lc);
RelOptInfo *pathrel = path->parent;
- /* We don't consider partitioned joins here */
- if (pathrel->reloptkind == RELOPT_OTHER_MEMBER_REL)
- {
- RelOptInfo *prel = pathrel;
- Bitmapset *partrelids = NULL;
-
- /*
- * Traverse up to the pathrel's topmost partitioned parent,
- * collecting parent relids as we go; but stop if we reach
- * parentrel. (Normally, a pathrel's topmost partitioned parent
- * is either parentrel or a UNION ALL appendrel child of
- * parentrel. But when handling partitionwise joins of
- * multi-level partitioning trees, we can see an append path whose
- * parentrel is an intermediate partitioned table.)
- */
- do
- {
- AppendRelInfo *appinfo;
-
- Assert(prel->relid < root->simple_rel_array_size);
- appinfo = root->append_rel_array[prel->relid];
- prel = find_base_rel(root, appinfo->parent_relid);
- if (!IS_PARTITIONED_REL(prel))
- break; /* reached a non-partitioned parent */
- /* accept this level as an interesting parent */
- partrelids = bms_add_member(partrelids, prel->relid);
- if (prel == parentrel)
- break; /* don't traverse above parentrel */
- } while (prel->reloptkind == RELOPT_OTHER_MEMBER_REL);
-
- if (partrelids)
- {
- /*
- * Found some relevant parent partitions, which may or may not
- * overlap with partition trees we already found. Add new
- * information to the allpartrelids list.
- */
- allpartrelids = add_part_relids(allpartrelids, partrelids);
- /* Also record the subplan in relid_subplan_map[] */
- /* No duplicates please */
- Assert(relid_subplan_map[pathrel->relid] == 0);
- relid_subplan_map[pathrel->relid] = i;
- }
- }
+ /* No duplicates please */
+ Assert(relid_subplan_map[pathrel->relid] == 0);
+ relid_subplan_map[pathrel->relid] = i;
i++;
}
@@ -368,63 +325,6 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
return list_length(root->partPruneInfos) - 1;
}
-/*
- * add_part_relids
- * Add new info to a list of Bitmapsets of partitioned relids.
- *
- * Within 'allpartrelids', there is one Bitmapset for each topmost parent
- * partitioned rel. Each Bitmapset contains the RT indexes of the topmost
- * parent as well as its relevant non-leaf child partitions. Since (by
- * construction of the rangetable list) parent partitions must have lower
- * RT indexes than their children, we can distinguish the topmost parent
- * as being the lowest set bit in the Bitmapset.
- *
- * 'partrelids' contains the RT indexes of a parent partitioned rel, and
- * possibly some non-leaf children, that are newly identified as parents of
- * some subpath rel passed to make_partition_pruneinfo(). These are added
- * to an appropriate member of 'allpartrelids'.
- *
- * Note that the list contains only RT indexes of partitioned tables that
- * are parents of some scan-level relation appearing in the 'subpaths' that
- * make_partition_pruneinfo() is dealing with. Also, "topmost" parents are
- * not allowed to be higher than the 'parentrel' associated with the append
- * path. In this way, we avoid expending cycles on partitioned rels that
- * can't contribute useful pruning information for the problem at hand.
- * (It is possible for 'parentrel' to be a child partitioned table, and it
- * is also possible for scan-level relations to be child partitioned tables
- * rather than leaf partitions. Hence we must construct this relation set
- * with reference to the particular append path we're dealing with, rather
- * than looking at the full partitioning structure represented in the
- * RelOptInfos.)
- */
-static List *
-add_part_relids(List *allpartrelids, Bitmapset *partrelids)
-{
- Index targetpart;
- ListCell *lc;
-
- /* We can easily get the lowest set bit this way: */
- targetpart = bms_next_member(partrelids, -1);
- Assert(targetpart > 0);
-
- /* Look for a matching topmost parent */
- foreach(lc, allpartrelids)
- {
- Bitmapset *currpartrelids = (Bitmapset *) lfirst(lc);
- Index currtarget = bms_next_member(currpartrelids, -1);
-
- if (targetpart == currtarget)
- {
- /* Found a match, so add any new RT indexes to this hierarchy */
- currpartrelids = bms_add_members(currpartrelids, partrelids);
- lfirst(lc) = currpartrelids;
- return allpartrelids;
- }
- }
- /* No match, so add the new partition hierarchy to the list */
- return lappend(allpartrelids, partrelids);
-}
-
/*
* make_partitionedrel_pruneinfo
* Build a List of PartitionedRelPruneInfos, one for each interesting
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 659bd05c0c..a0bb16cff4 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -270,6 +270,13 @@ typedef struct Append
List *appendplans;
int nasyncplans; /* # of asynchronous plans */
+ /*
+ * RTIs of all partitioned tables whose children are scanned by
+ * appendplans. The list contains a bitmapset for every partition tree
+ * covered by this Append.
+ */
+ List *allpartrelids;
+
/*
* All 'appendplans' preceding this index are non-partial plans. All
* 'appendplans' from this index onwards are partial plans.
@@ -294,6 +301,13 @@ typedef struct MergeAppend
List *mergeplans;
+ /*
+ * RTIs of all partitioned tables whose children are scanned by
+ * mergeplans. The list contains a bitmapset for every partition tree
+ * covered by this MergeAppend.
+ */
+ List *allpartrelids;
+
/* these fields are just like the sort-key info in struct Sort: */
/* number of sort-key columns */
diff --git a/src/include/optimizer/appendinfo.h b/src/include/optimizer/appendinfo.h
index a05f91f77d..1621a7319a 100644
--- a/src/include/optimizer/appendinfo.h
+++ b/src/include/optimizer/appendinfo.h
@@ -46,5 +46,8 @@ extern void add_row_identity_columns(PlannerInfo *root, Index rtindex,
RangeTblEntry *target_rte,
Relation target_relation);
extern void distribute_row_identity_vars(PlannerInfo *root);
+extern List *add_append_subpath_partrelids(PlannerInfo *root, Path *subpath,
+ RelOptInfo *parentrel,
+ List *allpartrelids);
#endif /* APPENDINFO_H */
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index c0d6889d47..2d907d31d4 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -73,7 +73,8 @@ typedef struct PartitionPruneContext
extern int make_partition_pruneinfo(struct PlannerInfo *root,
struct RelOptInfo *parentrel,
List *subpaths,
- List *prunequal);
+ List *prunequal,
+ List *allpartrelids);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
--
2.35.3
On Tue, Mar 14, 2023 at 7:07 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Thu, Mar 2, 2023 at 10:52 PM Amit Langote <amitlangote09@gmail.com> wrote:
I think I have figured out what might be going wrong on that cfbot
animal after building with the same CPPFLAGS as that animal locally.
I had forgotten to update _out/_readRangeTblEntry() to account for the
patch's change that a view's RTE_SUBQUERY now also preserves relkind
in addition to relid and rellockmode for the locking consideration.

Also, I noticed that a multi-query Portal execution with rules was
failing (thanks to a regression test added in a7d71c41db) because the
snapshot used for the 2nd query onward was not being updated for the
command ID change under the patched model of multi-query Portal
execution. To wit, under the patched model, all queries in the
multi-query Portal case undergo ExecutorStart() before any of them is
run with ExecutorRun(). The patch hadn't, however, changed things to
update the snapshot's command ID for the 2nd query onwards, which
caused the aforementioned test case to fail.

This new model does however mean that the 2nd query onwards must use
PushCopiedSnapshot(), given the current requirement of
UpdateActiveSnapshotCommandId() that the snapshot passed to it must
not be referenced anywhere else. The new model basically requires
that each query's QueryDesc point to its own copy of the
ActiveSnapshot. That may not be a point in favor of the patched
model, though. For now, I haven't been able to come up with a better
alternative.
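To make that concrete, here is a condensed sketch of the per-statement
snapshot handling described above. It restates the logic of the
PortalStart() hunk quoted earlier (utility statements, which skip the
snapshot push, are left out), so treat it as an illustration rather
than the patch itself:

    bool        first = true;
    ListCell   *lc;

    foreach(lc, portal->stmts)
    {
        /*
         * Every statement after the first gets its own *copied*
         * snapshot, because UpdateActiveSnapshotCommandId() requires
         * that the snapshot it modifies not be referenced anywhere
         * else.
         */
        if (first)
            PushActiveSnapshot(GetTransactionSnapshot());
        else
        {
            PushCopiedSnapshot(GetTransactionSnapshot());
            CommandCounterIncrement();
            UpdateActiveSnapshotCommandId();
        }
        first = false;

        /* ... CreateQueryDesc() and ExecutorStart() as in the patch ... */
    }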
Here's a new version addressing the following 2 points.

* Like views, I realized that non-leaf relations of partition trees
scanned by an Append/MergeAppend would need to be locked separately,
because the ExecInitNode() traversal of the plan tree would not
account for them. That is, they are not opened using
ExecGetRangeTableRelation() or ExecOpenScanRelation(). One exception
is that some (if not all) of those non-leaf relations may be
referenced in a PartitionPruneInfo and so locked as part of
initializing the corresponding PartitionPruneState, but I decided not
to complicate the code to filter out such relations from the set
locked separately. To carry the set of relations to lock, the
refactoring patch 0001 re-introduces the List-of-Bitmapsets field
named allpartrelids into Append/MergeAppend nodes, which we had
previously removed on the grounds that those relations need not be
locked separately (commits f2343653f5b, f003a7522bf). A rough sketch
of what the separate locking might look like follows.
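This sketch is hypothetical: the declaration added to executor.h is
real, but the body below is my own illustration of the idea and may
differ from the function in the patch (for one thing, the real version
would presumably also recheck plan validity after each lock):

    /*
     * Hypothetical sketch: lock the non-leaf partitioned tables
     * recorded in an Append/MergeAppend node's allpartrelids, which
     * the ExecInitNode() traversal would otherwise never open and
     * hence never lock.
     */
    void
    ExecLockAppendNonLeafRelations(EState *estate, List *allpartrelids)
    {
        ListCell   *lc;

        foreach(lc, allpartrelids)
        {
            Bitmapset  *partrelids = (Bitmapset *) lfirst(lc);
            int         rti = -1;

            while ((rti = bms_next_member(partrelids, rti)) >= 0)
            {
                RangeTblEntry *rte = exec_rt_fetch(rti, estate);

                Assert(rte->rtekind == RTE_RELATION);
                LockRelationOid(rte->relid, rte->rellockmode);
            }
        }
    }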
* I decided to initialize QueryDesc.planstate even in the cases where
the ExecInitNode() traversal is aborted partway through on detecting
CachedPlan invalidation, such that it points to a partially
initialized PlanState tree. My earlier thinking that each PlanState
node need not be visited for resource cleanup in such cases was naive
after all. To that end, I've fixed the ExecEndNode() subroutines of
all Plan node types to account for potentially uninitialized fields
(the general shape of those fixes is sketched below). There are a
couple of cases where I'm a bit doubtful though. In
ExecEndCustomScan(), there's no indication in CustomScanState whether
it's OK to call EndCustomScan() when BeginCustomScan() may not have
been called. For ForeignScanState, I've assumed that
ForeignScanState.fdw_state being set can be used as a marker that
BeginForeignScan() would have been called, though maybe that's not a
solid assumption.
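For illustration, the ExecEndNode() hardening has roughly this shape,
shown here as a simplified, hypothetical version of the seqscan
cleanup rather than the patch's actual per-node diffs:

    void
    ExecEndSeqScan(SeqScanState *node)
    {
        /*
         * Illustrative pattern only: any resource normally set up by
         * ExecInitSeqScan() may be missing if plan initialization was
         * aborted partway through on detecting CachedPlan
         * invalidation, so check each one before releasing it.
         */
        if (node->ss.ss_currentScanDesc != NULL)
            table_endscan(node->ss.ss_currentScanDesc);
        if (node->ss.ps.ps_ResultTupleSlot != NULL)
            ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
        /* ... similar checks for any other fields ... */
    }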
I'm also attaching a new (small) patch 0003 that eliminates the
loop-over-rangetable in ExecCloseRangeTableRelations() in favor of
iterating over a new List field of EState named es_opened_relations,
which is populated by ExecGetRangeTableRelation() with only the
relations that were actually opened. This speeds up
ExecCloseRangeTableRelations() significantly for the cases with many
runtime-prunable partitions.
Here's another version with some cosmetic changes, like fixing some
factually incorrect / obsolete comments and typos that I found. I
also noticed that I had missed noting, near some table_open() calls,
that the locks they take can't possibly invalidate the plan (for
example, those on lazily opened partition routing target partitions)
and thus don't need the treatment that locking during execution
initialization otherwise requires.
--
Thanks,

Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v36-0003-Track-opened-range-table-relations-in-a-List-in-.patch
From d9e140e37ab0dab860ace7c662fba4c0061c7679 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Mon, 13 Mar 2023 15:59:38 +0900
Subject: [PATCH v36 3/3] Track opened range table relations in a List in
EState
This makes ExecCloseRangeTableRelations faster when there are many
relations in the range table but only a few are opened during
execution, such as when run-time pruning kicks in on an Append
containing 1000s of partition subplans.
---
src/backend/executor/execMain.c | 9 +++++----
src/backend/executor/execUtils.c | 2 ++
src/include/nodes/execnodes.h | 2 ++
3 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index fd6702a686..506e087474 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1630,12 +1630,13 @@ ExecCloseResultRelations(EState *estate)
void
ExecCloseRangeTableRelations(EState *estate)
{
- int i;
+ ListCell *lc;
- for (i = 0; i < estate->es_range_table_size; i++)
+ foreach(lc, estate->es_opened_relations)
{
- if (estate->es_relations[i])
- table_close(estate->es_relations[i], NoLock);
+ Relation rel = lfirst(lc);
+
+ table_close(rel, NoLock);
}
}
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index a485e7dfc5..f7053072d9 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -829,6 +829,8 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
}
estate->es_relations[rti - 1] = rel;
+ estate->es_opened_relations = lappend(estate->es_opened_relations,
+ rel);
}
return rel;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index dfa72848c7..984fd2e423 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -619,6 +619,8 @@ typedef struct EState
Index es_range_table_size; /* size of the range table arrays */
Relation *es_relations; /* Array of per-range-table-entry Relation
* pointers, or NULL if not yet opened */
+ List *es_opened_relations; /* List of non-NULL entries in
+ * es_relations in no specific order */
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
--
2.35.3
v36-0001-Add-field-to-store-partitioned-relids-to-Append-.patch
From 281b73dcf3ee569f70ee112c46cd3bc77258f4aa Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Thu, 9 Mar 2023 11:26:06 +0900
Subject: [PATCH v36 1/3] Add field to store partitioned relids to
Append/MergeAppend
A future commit would like to move the locking of relations referenced
in a cached plan from the current loop-over-rangetable in
AcquireExecutorLocks() into the ExecInitNode() traversal of the plan tree.
Given that partitioned tables (their RT indexes) would not be
accessible via the new way of finding the relations to lock, add a
field to Append/MergeAppend to track them separately.
This refactors the code that looks up the partitioned parent relids
of a given list of leaf partition subpaths of an Append/MergeAppend
out of make_partition_pruneinfo() and into its own function,
add_append_subpath_partrelids(). In doing so, the code is generalized
to handle cases where the child rels can be joinrels or upper
(grouping) rels. Also, to make it easier to traverse the parent chain
of a child grouping rel, its RelOptInfo.parent is now set, as is
already done for baserels and joinrels.
---
src/backend/optimizer/plan/createplan.c | 36 +++++--
src/backend/optimizer/plan/planner.c | 3 +
src/backend/optimizer/util/appendinfo.c | 134 ++++++++++++++++++++++++
src/backend/partitioning/partprune.c | 124 +++-------------------
src/include/nodes/plannodes.h | 14 +++
src/include/optimizer/appendinfo.h | 3 +
src/include/partitioning/partprune.h | 3 +-
7 files changed, 194 insertions(+), 123 deletions(-)
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 910ffbf1e1..794cdb5e3b 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -25,6 +25,7 @@
#include "nodes/extensible.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
+#include "optimizer/appendinfo.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/optimizer.h"
@@ -1209,6 +1210,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
Oid *nodeCollations = NULL;
bool *nodeNullsFirst = NULL;
bool consider_async = false;
+ List *allpartrelids = NIL;
/*
* The subpaths list could be empty, if every child was proven empty by
@@ -1350,18 +1352,24 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
++nasyncplans;
}
+ /* Populate partitioned parent relids. */
+ allpartrelids = add_append_subpath_partrelids(root, subpath, rel,
+ allpartrelids);
+
subplans = lappend(subplans, subplan);
}
+ plan->allpartrelids = allpartrelids;
+
/* Set below if we find quals that we can use to run-time prune */
plan->part_prune_index = -1;
/*
- * If any quals exist, they may be useful to perform further partition
- * pruning during execution. Gather information needed by the executor to
- * do partition pruning.
+ * If scanning partitions, check if there are quals that may be useful to
+ * perform further partition pruning during execution. Gather information
+ * needed by the executor to do partition pruning.
*/
- if (enable_partition_pruning)
+ if (enable_partition_pruning && allpartrelids != NIL)
{
List *prunequal;
@@ -1381,7 +1389,8 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
if (prunequal != NIL)
plan->part_prune_index = make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ allpartrelids);
}
plan->appendplans = subplans;
@@ -1425,6 +1434,7 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
List *subplans = NIL;
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
+ List *allpartrelids = NIL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1514,18 +1524,23 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplan = (Plan *) sort;
}
+ allpartrelids = add_append_subpath_partrelids(root, subpath, rel,
+ allpartrelids);
+
subplans = lappend(subplans, subplan);
}
+ node->allpartrelids = allpartrelids;
+
/* Set below if we find quals that we can use to run-time prune */
node->part_prune_index = -1;
/*
- * If any quals exist, they may be useful to perform further partition
- * pruning during execution. Gather information needed by the executor to
- * do partition pruning.
+ * If scanning partitions, check if there are quals that may be useful to
+ * perform further partition pruning during execution. Gather information
+ * needed by the executor to do partition pruning.
*/
- if (enable_partition_pruning)
+ if (enable_partition_pruning && allpartrelids != NIL)
{
List *prunequal;
@@ -1537,7 +1552,8 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
if (prunequal != NIL)
node->part_prune_index = make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ allpartrelids);
}
node->mergeplans = subplans;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index a1873ce26d..62b3ec96cc 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -7801,8 +7801,11 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
agg_costs, gd, &child_extra,
&child_partially_grouped_rel);
+ /* Mark as child of grouped_rel. */
+ child_grouped_rel->parent = grouped_rel;
if (child_partially_grouped_rel)
{
+ child_partially_grouped_rel->parent = grouped_rel;
partially_grouped_live_children =
lappend(partially_grouped_live_children,
child_partially_grouped_rel);
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index 9d377385f1..4876742ab2 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -40,6 +40,7 @@ static void make_inh_translation_list(Relation oldrelation,
AppendRelInfo *appinfo);
static Node *adjust_appendrel_attrs_mutator(Node *node,
adjust_appendrel_attrs_context *context);
+static List *add_part_relids(List *allpartrelids, Bitmapset *partrelids);
/*
@@ -1031,3 +1032,136 @@ distribute_row_identity_vars(PlannerInfo *root)
}
}
}
+
+/*
+ * add_append_subpath_partrelids
+ * Look up a child subpath's rel's partitioned parent relids up to
+ * parentrel and add the bitmapset containing those into
+ * 'allpartrelids'
+ */
+List *
+add_append_subpath_partrelids(PlannerInfo *root, Path *subpath,
+ RelOptInfo *parentrel,
+ List *allpartrelids)
+{
+ RelOptInfo *prel = subpath->parent;
+ Relids partrelids = NULL;
+
+ /* Nothing to do if there's no parent to begin with. */
+ if (!IS_OTHER_REL(prel))
+ return allpartrelids;
+
+ /*
+ * Traverse up to the pathrel's topmost partitioned parent, collecting
+ * parent relids as we go; but stop if we reach parentrel. (Normally, a
+ * pathrel's topmost partitioned parent is either parentrel or a UNION ALL
+ * appendrel child of parentrel. But when handling partitionwise joins of
+ * multi-level partitioning trees, we can see an append path whose
+ * parentrel is an intermediate partitioned table.)
+ */
+ do
+ {
+ Relids parent_relids = NULL;
+
+ /*
+ * For simple child rels, we can simply get the parent relid from
+ * prel->parent. But for partitionwise join and aggregate child rels,
+ * while we can use prel->parent to move up the tree, parent relids to
+ * add into 'partrelids' must be found the hard way through the
+ * AppendRelInfos, because 1) a joinrel's relids may point to RTE_JOIN
+ * entries, 2) topmost parent grouping rel's relids field is left NULL.
+ */
+ if (IS_SIMPLE_REL(prel))
+ {
+ prel = prel->parent;
+ /* Stop once we reach a non-partitioned parent. */
+ if (!IS_PARTITIONED_REL(prel))
+ break;
+ parent_relids = bms_add_members(parent_relids, prel->relids);
+ }
+ else
+ {
+ AppendRelInfo **appinfos;
+ int nappinfos,
+ i;
+
+ appinfos = find_appinfos_by_relids(root, prel->relids,
+ &nappinfos);
+ for (i = 0; i < nappinfos; i++)
+ {
+ AppendRelInfo *appinfo = appinfos[i];
+
+ parent_relids = bms_add_member(parent_relids,
+ appinfo->parent_relid);
+ }
+ pfree(appinfos);
+ prel = prel->parent;
+ }
+ /* accept this level as an interesting parent */
+ partrelids = bms_add_members(partrelids, parent_relids);
+ if (prel == parentrel)
+ break; /* don't traverse above parentrel */
+ } while (IS_OTHER_REL(prel));
+
+ if (partrelids == NULL)
+ return allpartrelids;
+
+ return add_part_relids(allpartrelids, partrelids);
+}
+
+/*
+ * add_part_relids
+ * Add new info to a list of Bitmapsets of partitioned relids.
+ *
+ * Within 'allpartrelids', there is one Bitmapset for each topmost parent
+ * partitioned rel. Each Bitmapset contains the RT indexes of the topmost
+ * parent as well as its relevant non-leaf child partitions. Since (by
+ * construction of the rangetable list) parent partitions must have lower
+ * RT indexes than their children, we can distinguish the topmost parent
+ * as being the lowest set bit in the Bitmapset.
+ *
+ * 'partrelids' contains the RT indexes of a parent partitioned rel, and
+ * possibly some non-leaf children, that are newly identified as parents of
+ * some subpath rel passed to make_partition_pruneinfo(). These are added
+ * to an appropriate member of 'allpartrelids'.
+ *
+ * Note that the list contains only RT indexes of partitioned tables that
+ * are parents of some scan-level relation appearing in the 'subpaths' that
+ * make_partition_pruneinfo() is dealing with. Also, "topmost" parents are
+ * not allowed to be higher than the 'parentrel' associated with the append
+ * path. In this way, we avoid expending cycles on partitioned rels that
+ * can't contribute useful pruning information for the problem at hand.
+ * (It is possible for 'parentrel' to be a child partitioned table, and it
+ * is also possible for scan-level relations to be child partitioned tables
+ * rather than leaf partitions. Hence we must construct this relation set
+ * with reference to the particular append path we're dealing with, rather
+ * than looking at the full partitioning structure represented in the
+ * RelOptInfos.)
+ */
+static List *
+add_part_relids(List *allpartrelids, Bitmapset *partrelids)
+{
+ Index targetpart;
+ ListCell *lc;
+
+ /* We can easily get the lowest set bit this way: */
+ targetpart = bms_next_member(partrelids, -1);
+ Assert(targetpart > 0);
+
+ /* Look for a matching topmost parent */
+ foreach(lc, allpartrelids)
+ {
+ Bitmapset *currpartrelids = (Bitmapset *) lfirst(lc);
+ Index currtarget = bms_next_member(currpartrelids, -1);
+
+ if (targetpart == currtarget)
+ {
+ /* Found a match, so add any new RT indexes to this hierarchy */
+ currpartrelids = bms_add_members(currpartrelids, partrelids);
+ lfirst(lc) = currpartrelids;
+ return allpartrelids;
+ }
+ }
+ /* No match, so add the new partition hierarchy to the list */
+ return lappend(allpartrelids, partrelids);
+}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 510145e3c0..3557e07082 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -138,7 +138,6 @@ typedef struct PruneStepResult
} PruneStepResult;
-static List *add_part_relids(List *allpartrelids, Bitmapset *partrelids);
static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
RelOptInfo *parentrel,
List *prunequal,
@@ -221,33 +220,32 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
+ * 'allpartrelids' contains Bitmapsets of RT indexes of partitioned parents
+ * whose partitions' Paths are in 'subpaths'; there's one Bitmapset for every
+ * partition tree involved.
*/
int
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
- List *prunequal)
+ List *prunequal,
+ List *allpartrelids)
{
PartitionPruneInfo *pruneinfo;
Bitmapset *allmatchedsubplans = NULL;
- List *allpartrelids;
List *prunerelinfos;
int *relid_subplan_map;
ListCell *lc;
int i;
+ Assert(list_length(allpartrelids) > 0);
+
/*
- * Scan the subpaths to see which ones are scans of partition child
- * relations, and identify their parent partitioned rels. (Note: we must
- * restrict the parent partitioned rels to be parentrel or children of
- * parentrel, otherwise we couldn't translate prunequal to match.)
- *
- * Also construct a temporary array to map from partition-child-relation
- * relid to the index in 'subpaths' of the scan plan for that partition.
+ * Construct a temporary array to map from partition-child-relation relid
+ * to the index in 'subpaths' of the scan plan for that partition.
* (Use of "subplan" rather than "subpath" is a bit of a misnomer, but
* we'll let it stand.) For convenience, we use 1-based indexes here, so
* that zero can represent an un-filled array entry.
*/
- allpartrelids = NIL;
relid_subplan_map = palloc0(sizeof(int) * root->simple_rel_array_size);
i = 1;
@@ -256,50 +254,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Path *path = (Path *) lfirst(lc);
RelOptInfo *pathrel = path->parent;
- /* We don't consider partitioned joins here */
- if (pathrel->reloptkind == RELOPT_OTHER_MEMBER_REL)
- {
- RelOptInfo *prel = pathrel;
- Bitmapset *partrelids = NULL;
-
- /*
- * Traverse up to the pathrel's topmost partitioned parent,
- * collecting parent relids as we go; but stop if we reach
- * parentrel. (Normally, a pathrel's topmost partitioned parent
- * is either parentrel or a UNION ALL appendrel child of
- * parentrel. But when handling partitionwise joins of
- * multi-level partitioning trees, we can see an append path whose
- * parentrel is an intermediate partitioned table.)
- */
- do
- {
- AppendRelInfo *appinfo;
-
- Assert(prel->relid < root->simple_rel_array_size);
- appinfo = root->append_rel_array[prel->relid];
- prel = find_base_rel(root, appinfo->parent_relid);
- if (!IS_PARTITIONED_REL(prel))
- break; /* reached a non-partitioned parent */
- /* accept this level as an interesting parent */
- partrelids = bms_add_member(partrelids, prel->relid);
- if (prel == parentrel)
- break; /* don't traverse above parentrel */
- } while (prel->reloptkind == RELOPT_OTHER_MEMBER_REL);
-
- if (partrelids)
- {
- /*
- * Found some relevant parent partitions, which may or may not
- * overlap with partition trees we already found. Add new
- * information to the allpartrelids list.
- */
- allpartrelids = add_part_relids(allpartrelids, partrelids);
- /* Also record the subplan in relid_subplan_map[] */
- /* No duplicates please */
- Assert(relid_subplan_map[pathrel->relid] == 0);
- relid_subplan_map[pathrel->relid] = i;
- }
- }
+ /* No duplicates please */
+ Assert(relid_subplan_map[pathrel->relid] == 0);
+ relid_subplan_map[pathrel->relid] = i;
i++;
}
@@ -368,63 +325,6 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
return list_length(root->partPruneInfos) - 1;
}
-/*
- * add_part_relids
- * Add new info to a list of Bitmapsets of partitioned relids.
- *
- * Within 'allpartrelids', there is one Bitmapset for each topmost parent
- * partitioned rel. Each Bitmapset contains the RT indexes of the topmost
- * parent as well as its relevant non-leaf child partitions. Since (by
- * construction of the rangetable list) parent partitions must have lower
- * RT indexes than their children, we can distinguish the topmost parent
- * as being the lowest set bit in the Bitmapset.
- *
- * 'partrelids' contains the RT indexes of a parent partitioned rel, and
- * possibly some non-leaf children, that are newly identified as parents of
- * some subpath rel passed to make_partition_pruneinfo(). These are added
- * to an appropriate member of 'allpartrelids'.
- *
- * Note that the list contains only RT indexes of partitioned tables that
- * are parents of some scan-level relation appearing in the 'subpaths' that
- * make_partition_pruneinfo() is dealing with. Also, "topmost" parents are
- * not allowed to be higher than the 'parentrel' associated with the append
- * path. In this way, we avoid expending cycles on partitioned rels that
- * can't contribute useful pruning information for the problem at hand.
- * (It is possible for 'parentrel' to be a child partitioned table, and it
- * is also possible for scan-level relations to be child partitioned tables
- * rather than leaf partitions. Hence we must construct this relation set
- * with reference to the particular append path we're dealing with, rather
- * than looking at the full partitioning structure represented in the
- * RelOptInfos.)
- */
-static List *
-add_part_relids(List *allpartrelids, Bitmapset *partrelids)
-{
- Index targetpart;
- ListCell *lc;
-
- /* We can easily get the lowest set bit this way: */
- targetpart = bms_next_member(partrelids, -1);
- Assert(targetpart > 0);
-
- /* Look for a matching topmost parent */
- foreach(lc, allpartrelids)
- {
- Bitmapset *currpartrelids = (Bitmapset *) lfirst(lc);
- Index currtarget = bms_next_member(currpartrelids, -1);
-
- if (targetpart == currtarget)
- {
- /* Found a match, so add any new RT indexes to this hierarchy */
- currpartrelids = bms_add_members(currpartrelids, partrelids);
- lfirst(lc) = currpartrelids;
- return allpartrelids;
- }
- }
- /* No match, so add the new partition hierarchy to the list */
- return lappend(allpartrelids, partrelids);
-}
-
/*
* make_partitionedrel_pruneinfo
* Build a List of PartitionedRelPruneInfos, one for each interesting
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 659bd05c0c..a0bb16cff4 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -270,6 +270,13 @@ typedef struct Append
List *appendplans;
int nasyncplans; /* # of asynchronous plans */
+ /*
+ * RTIs of all partitioned tables whose children are scanned by
+ * appendplans. The list contains a bitmapset for every partition tree
+ * covered by this Append.
+ */
+ List *allpartrelids;
+
/*
* All 'appendplans' preceding this index are non-partial plans. All
* 'appendplans' from this index onwards are partial plans.
@@ -294,6 +301,13 @@ typedef struct MergeAppend
List *mergeplans;
+ /*
+ * RTIs of all partitioned tables whose children are scanned by
+ * mergeplans. The list contains a bitmapset for every partition tree
+ * covered by this MergeAppend.
+ */
+ List *allpartrelids;
+
/* these fields are just like the sort-key info in struct Sort: */
/* number of sort-key columns */
diff --git a/src/include/optimizer/appendinfo.h b/src/include/optimizer/appendinfo.h
index a05f91f77d..1621a7319a 100644
--- a/src/include/optimizer/appendinfo.h
+++ b/src/include/optimizer/appendinfo.h
@@ -46,5 +46,8 @@ extern void add_row_identity_columns(PlannerInfo *root, Index rtindex,
RangeTblEntry *target_rte,
Relation target_relation);
extern void distribute_row_identity_vars(PlannerInfo *root);
+extern List *add_append_subpath_partrelids(PlannerInfo *root, Path *subpath,
+ RelOptInfo *parentrel,
+ List *allpartrelids);
#endif /* APPENDINFO_H */
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index c0d6889d47..2d907d31d4 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -73,7 +73,8 @@ typedef struct PartitionPruneContext
extern int make_partition_pruneinfo(struct PlannerInfo *root,
struct RelOptInfo *parentrel,
List *subpaths,
- List *prunequal);
+ List *prunequal,
+ List *allpartrelids);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
--
2.35.3
v36-0002-Move-AcquireExecutorLocks-s-responsibility-into-.patch
From 5defff624f9072b62ec48a7aa0c13faf233bf336 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Fri, 20 Jan 2023 16:52:31 +0900
Subject: [PATCH v36 2/3] Move AcquireExecutorLocks()'s responsibility into the
executor
This commit introduces a new executor flag EXEC_FLAG_GET_LOCKS that
should be passed in eflags to ExecutorStart() if the PlannedStmt
comes from a CachedPlan. When set, the executor will take locks
on any relations referenced in the plan nodes that need to be
initialized for execution. That excludes any partitions that can
be pruned during the executor initialization phase, that is, based
on the values of only the external (PARAM_EXTERN) parameters.
Relations that are not explicitly mentioned in the plan tree, such
as views and non-leaf partition parents whose children are mentioned
in Append/MergeAppend nodes, are locked separately. After taking each
lock, the executor calls CachedPlanStillValid() to check if
CachedPlan.is_valid has been reset by PlanCacheRelCallback() due to
concurrent modification of relations referenced in the plan. If it
is found that the CachedPlan is indeed invalid, the recursive
ExecInitNode() traversal is aborted at that point. To allow the
proper cleanup of such a partially initialized planstate tree,
ExecEndNode() subroutines of various plan nodes have been fixed to
account for potentially uninitialized fields. It is the responsibility
of the caller of ExecutorStart() to call ExecutorEnd() even on a
QueryDesc containing such a partially initialized PlanState tree.
Call sites that use plancache (GetCachedPlan) to get the plan trees
to pass to the executor for execution should now be prepared to
handle the case that the plan tree may be flagged by the executor as
stale as described above. To that end, this commit refactors the
relevant code sites to move the ExecutorStart() call closer to the
GetCachedPlan() call to reduce the friction in the cases where
replanning is needed due to a CachedPlan being marked stale in this
manner. Callers must check that QueryDesc.plan_valid is true before
passing it on to ExecutorRun() for execution.
PortalStart() now performs CreateQueryDesc() and ExecutorStart() for
all portal strategies, including those pertaining to multiple queries.
The QueryDescs for strategies handled by PortalRunMulti() are
remembered in the Portal in a new List field 'qdescs', allocated in a
new memory context 'queryContext'. This new arrangement makes it
easier to discard and recreate a Portal if the CachedPlan goes stale
during setup.
---
contrib/postgres_fdw/postgres_fdw.c | 4 +
src/backend/commands/copyto.c | 4 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 146 +++++---
src/backend/commands/extension.c | 2 +
src/backend/commands/matview.c | 3 +-
src/backend/commands/portalcmds.c | 16 +-
src/backend/commands/prepare.c | 32 +-
src/backend/executor/execMain.c | 89 ++++-
src/backend/executor/execParallel.c | 8 +-
src/backend/executor/execPartition.c | 15 +
src/backend/executor/execProcnode.c | 9 +
src/backend/executor/execUtils.c | 60 +++-
src/backend/executor/functions.c | 2 +
src/backend/executor/nodeAgg.c | 23 +-
src/backend/executor/nodeAppend.c | 23 +-
src/backend/executor/nodeBitmapAnd.c | 10 +-
src/backend/executor/nodeBitmapHeapscan.c | 10 +-
src/backend/executor/nodeBitmapIndexscan.c | 2 +
src/backend/executor/nodeBitmapOr.c | 10 +-
src/backend/executor/nodeCtescan.c | 6 +-
src/backend/executor/nodeCustom.c | 12 +-
src/backend/executor/nodeForeignscan.c | 22 +-
src/backend/executor/nodeFunctionscan.c | 3 +-
src/backend/executor/nodeGather.c | 2 +
src/backend/executor/nodeGatherMerge.c | 2 +
src/backend/executor/nodeGroup.c | 5 +-
src/backend/executor/nodeHash.c | 2 +
src/backend/executor/nodeHashjoin.c | 13 +-
src/backend/executor/nodeIncrementalSort.c | 14 +-
src/backend/executor/nodeIndexonlyscan.c | 7 +-
src/backend/executor/nodeIndexscan.c | 7 +-
src/backend/executor/nodeLimit.c | 2 +
src/backend/executor/nodeLockRows.c | 2 +
src/backend/executor/nodeMaterial.c | 5 +-
src/backend/executor/nodeMemoize.c | 12 +-
src/backend/executor/nodeMergeAppend.c | 23 +-
src/backend/executor/nodeMergejoin.c | 10 +-
src/backend/executor/nodeModifyTable.c | 13 +-
.../executor/nodeNamedtuplestorescan.c | 3 +-
src/backend/executor/nodeNestloop.c | 7 +-
src/backend/executor/nodeProjectSet.c | 5 +-
src/backend/executor/nodeRecursiveunion.c | 4 +
src/backend/executor/nodeResult.c | 5 +-
src/backend/executor/nodeSamplescan.c | 5 +-
src/backend/executor/nodeSeqscan.c | 5 +-
src/backend/executor/nodeSetOp.c | 5 +-
src/backend/executor/nodeSort.c | 8 +-
src/backend/executor/nodeSubqueryscan.c | 5 +-
src/backend/executor/nodeTableFuncscan.c | 3 +-
src/backend/executor/nodeTidrangescan.c | 5 +-
src/backend/executor/nodeTidscan.c | 5 +-
src/backend/executor/nodeUnique.c | 5 +-
src/backend/executor/nodeValuesscan.c | 3 +-
src/backend/executor/nodeWindowAgg.c | 55 +++-
src/backend/executor/nodeWorktablescan.c | 3 +-
src/backend/executor/spi.c | 53 ++-
src/backend/nodes/outfuncs.c | 1 +
src/backend/nodes/readfuncs.c | 1 +
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 5 +
src/backend/rewrite/rewriteHandler.c | 7 +-
src/backend/storage/lmgr/lmgr.c | 45 +++
src/backend/tcop/postgres.c | 13 +-
src/backend/tcop/pquery.c | 311 ++++++++++--------
src/backend/utils/cache/lsyscache.c | 21 ++
src/backend/utils/cache/plancache.c | 134 ++------
src/backend/utils/mmgr/portalmem.c | 6 +
src/include/commands/explain.h | 7 +-
src/include/executor/execdesc.h | 5 +
src/include/executor/executor.h | 12 +
src/include/nodes/execnodes.h | 2 +
src/include/nodes/pathnodes.h | 3 +
src/include/nodes/plannodes.h | 3 +
src/include/storage/lmgr.h | 1 +
src/include/utils/lsyscache.h | 1 +
src/include/utils/plancache.h | 14 +
src/include/utils/portal.h | 4 +
src/test/modules/delay_execution/Makefile | 3 +-
.../modules/delay_execution/delay_execution.c | 63 +++-
.../expected/cached-plan-replan.out | 117 +++++++
.../specs/cached-plan-replan.spec | 50 +++
82 files changed, 1224 insertions(+), 422 deletions(-)
create mode 100644 src/test/modules/delay_execution/expected/cached-plan-replan.out
create mode 100644 src/test/modules/delay_execution/specs/cached-plan-replan.spec
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index f5926ab89d..93f3f8b5d1 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2659,7 +2659,11 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
/* Get info about foreign table. */
rtindex = node->resultRelInfo->ri_RangeTableIndex;
if (fsplan->scan.scanrelid == 0)
+ {
dmstate->rel = ExecOpenScanRelation(estate, rtindex, eflags);
+ if (!ExecPlanStillValid(estate))
+ return;
+ }
else
dmstate->rel = node->ss.ss_currentRelation;
table = GetForeignTable(RelationGetRelid(dmstate->rel));
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index beea1ac687..e9f77d5711 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -558,7 +558,8 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL,
+ pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
@@ -569,6 +570,7 @@ BeginCopyTo(ParseState *pstate,
* ExecutorStart computes a result tupdesc for us
*/
ExecutorStart(cstate->queryDesc, 0);
+ Assert(cstate->queryDesc->plan_valid);
tupDesc = cstate->queryDesc->tupDesc;
}
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index d6c6d514f3..a55b851574 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index e57bda7b62..acae5b455b 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -384,6 +384,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
else
{
PlannedStmt *plan;
+ QueryDesc *queryDesc;
instr_time planstart,
planduration;
BufferUsage bufusage_start,
@@ -406,12 +407,93 @@ ExplainOneQuery(Query *query, int cursorOptions,
BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
}
+ queryDesc = ExplainQueryDesc(plan, NULL, queryString, into, es,
+ params, queryEnv);
+ Assert(queryDesc);
+
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(queryDesc, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
+/*
+ * ExplainQueryDesc
+ * Set up QueryDesc for EXPLAINing a given plan
+ *
+ * This returns NULL if cplan is found to have been invalidated since its
+ * creation.
+ */
+QueryDesc *
+ExplainQueryDesc(PlannedStmt *stmt, CachedPlan *cplan,
+ const char *queryString, IntoClause *into, ExplainState *es,
+ ParamListInfo params, QueryEnvironment *queryEnv)
+{
+ QueryDesc *queryDesc;
+ DestReceiver *dest;
+ int eflags;
+ int instrument_option = 0;
+
+ /*
+ * Normally we discard the query's output, but if explaining CREATE TABLE
+ * AS, we'd better use the appropriate tuple receiver.
+ */
+ if (into)
+ dest = CreateIntoRelDestReceiver(into);
+ else
+ dest = None_Receiver;
+
+ if (es->analyze && es->timing)
+ instrument_option |= INSTRUMENT_TIMER;
+ else if (es->analyze)
+ instrument_option |= INSTRUMENT_ROWS;
+
+ if (es->buffers)
+ instrument_option |= INSTRUMENT_BUFFERS;
+ if (es->wal)
+ instrument_option |= INSTRUMENT_WAL;
+
+ /*
+ * Use a snapshot with an updated command ID to ensure this query sees
+ * results of any previously executed queries.
+ */
+ PushCopiedSnapshot(GetActiveSnapshot());
+ UpdateActiveSnapshotCommandId();
+
+ /* Create a QueryDesc for the query */
+ queryDesc = CreateQueryDesc(stmt, cplan, queryString,
+ GetActiveSnapshot(), InvalidSnapshot,
+ dest, params, queryEnv, instrument_option);
+
+ /* Select execution options */
+ if (es->analyze)
+ eflags = 0; /* default run-to-completion flags */
+ else
+ eflags = EXEC_FLAG_EXPLAIN_ONLY;
+ if (into)
+ eflags |= GetIntoRelEFlags(into);
+
+ /* Take locks if using a CachedPlan */
+ if (queryDesc->cplan)
+ eflags |= EXEC_FLAG_GET_LOCKS;
+
+ /*
+ * Call ExecutorStart to prepare the plan for execution. A cached plan
+ * may get invalidated as we're doing that.
+ */
+ ExecutorStart(queryDesc, eflags);
+ if (!queryDesc->plan_valid)
+ {
+ /* Clean up. */
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+ PopActiveSnapshot();
+ return NULL;
+ }
+
+ return queryDesc;
+}
+
/*
* ExplainOneUtility -
* print out the execution plan for one utility statement
@@ -515,29 +597,16 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(QueryDesc *queryDesc,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
{
- DestReceiver *dest;
- QueryDesc *queryDesc;
instr_time starttime;
double totaltime = 0;
- int eflags;
- int instrument_option = 0;
- Assert(plannedstmt->commandType != CMD_UTILITY);
-
- if (es->analyze && es->timing)
- instrument_option |= INSTRUMENT_TIMER;
- else if (es->analyze)
- instrument_option |= INSTRUMENT_ROWS;
-
- if (es->buffers)
- instrument_option |= INSTRUMENT_BUFFERS;
- if (es->wal)
- instrument_option |= INSTRUMENT_WAL;
+ Assert(queryDesc->plannedstmt->commandType != CMD_UTILITY);
/*
* We always collect timing for the entire statement, even when node-level
@@ -546,38 +615,6 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
*/
INSTR_TIME_SET_CURRENT(starttime);
- /*
- * Use a snapshot with an updated command ID to ensure this query sees
- * results of any previously executed queries.
- */
- PushCopiedSnapshot(GetActiveSnapshot());
- UpdateActiveSnapshotCommandId();
-
- /*
- * Normally we discard the query's output, but if explaining CREATE TABLE
- * AS, we'd better use the appropriate tuple receiver.
- */
- if (into)
- dest = CreateIntoRelDestReceiver(into);
- else
- dest = None_Receiver;
-
- /* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, instrument_option);
-
- /* Select execution options */
- if (es->analyze)
- eflags = 0; /* default run-to-completion flags */
- else
- eflags = EXEC_FLAG_EXPLAIN_ONLY;
- if (into)
- eflags |= GetIntoRelEFlags(into);
-
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, eflags);
-
/* Execute the plan for statistics if asked for */
if (es->analyze)
{
@@ -4851,6 +4888,17 @@ ExplainDummyGroup(const char *objtype, const char *labelname, ExplainState *es)
}
}
+/*
+ * Discard output buffer for a fresh restart.
+ */
+void
+ExplainResetOutput(ExplainState *es)
+{
+ Assert(es->str);
+ resetStringInfo(es->str);
+ ExplainBeginOutput(es);
+}
+
/*
* Emit the start-of-output boilerplate.
*
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 0eabe18335..5a76343123 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -797,11 +797,13 @@ execute_sql_string(const char *sql)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
ExecutorStart(qdesc, 0);
+ Assert(qdesc->plan_valid);
ExecutorRun(qdesc, ForwardScanDirection, 0, true);
ExecutorFinish(qdesc);
ExecutorEnd(qdesc);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index fb30d2595c..9adaf6c527 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -409,12 +409,13 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, 0);
+ Assert(queryDesc->plan_valid);
/* run the plan */
ExecutorRun(queryDesc, ForwardScanDirection, 0L, true);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 8a3cf98cce..3c34ab4351 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -146,6 +146,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
PortalStart(portal, params, 0, GetActiveSnapshot());
Assert(portal->strategy == PORTAL_ONE_SELECT);
+ Assert(portal->plan_valid);
/*
* We're done; the query won't actually be run until PerformPortalFetch is
@@ -249,6 +250,17 @@ PerformPortalClose(const char *name)
PortalDrop(portal, false);
}
+/*
+ * Release a portal's QueryDesc.
+ */
+void
+PortalQueryFinish(QueryDesc *queryDesc)
+{
+ ExecutorFinish(queryDesc);
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+}
+
/*
* PortalCleanup
*
@@ -295,9 +307,7 @@ PortalCleanup(Portal portal)
if (portal->resowner)
CurrentResourceOwner = portal->resowner;
- ExecutorFinish(queryDesc);
- ExecutorEnd(queryDesc);
- FreeQueryDesc(queryDesc);
+ PortalQueryFinish(queryDesc);
CurrentResourceOwner = saveResourceOwner;
}
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 18f70319fc..c9070ed97f 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -183,6 +183,7 @@ ExecuteQuery(ParseState *pstate,
paramLI = EvaluateParams(pstate, entry, stmt->params, estate);
}
+replan:
/* Create a new portal to run the query in */
portal = CreateNewPortal();
/* Don't display the portal in pg_cursors, it is for internal use only */
@@ -251,10 +252,19 @@ ExecuteQuery(ParseState *pstate,
}
/*
- * Run the portal as appropriate.
+ * Run the portal as appropriate. If the portal contains a cached plan, it
+ * must be recreated if portal->plan_valid is false, which means that the
+ * cached plan was found to have been invalidated when initializing one of
+ * the plan trees contained in it.
*/
PortalStart(portal, paramLI, eflags, GetActiveSnapshot());
+ if (!portal->plan_valid)
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
+
(void) PortalRun(portal, count, false, true, dest, dest, qc);
PortalDrop(portal, false);
@@ -574,7 +584,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
{
PreparedStatement *entry;
const char *query_string;
- CachedPlan *cplan;
+ CachedPlan *cplan = NULL;
List *plan_list;
ListCell *p;
ParamListInfo paramLI = NULL;
@@ -618,6 +628,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
+replan:
cplan = GetCachedPlan(entry->plansource, paramLI,
CurrentResourceOwner, queryEnv);
@@ -639,8 +650,21 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ {
+ QueryDesc *queryDesc;
+
+ queryDesc = ExplainQueryDesc(pstmt, cplan, queryString,
+ into, es, paramLI, queryEnv);
+ if (queryDesc == NULL)
+ {
+ ExplainResetOutput(es);
+ ReleaseCachedPlan(cplan, CurrentResourceOwner);
+ goto replan;
+ }
+ ExplainOnePlan(queryDesc, into, es, query_string, paramLI,
+ queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
+ }
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index b32f419176..fd6702a686 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -126,11 +126,32 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
* get control when ExecutorStart is called. Such a plugin would
* normally call standard_ExecutorStart().
*
+ * Normally, the plan tree given in queryDesc->plannedstmt is known to be
+ * valid in that *all* relations contained in plannedstmt->relationOids have
+ * already been locked. That may not be the case however if the plannedstmt
+ * comes from a CachedPlan, one given in queryDesc->cplan, in which case only
+ * some of the relations referenced in the plan would have been locked; to
+ * wit, those that AcquirePlannerLocks() deems necessary. Locks necessary
+ * to fully validate such a plan tree, including relations that are added by
+ * the planner, will be taken when initializing the plan tree in InitPlan();
+ * the caller must have set the EXEC_FLAG_GET_LOCKS bit in eflags. If the
+ * CachedPlan gets invalidated as these locks are taken, plan tree
+ * initialization is suspended at the point when such invalidation is first
+ * detected and InitPlan() returns after setting queryDesc->plan_valid to
+ * false. queryDesc->planstate would be pointing to a potentially partially
+ * initialized PlanState tree in that case. Callers must then retry the
+ * execution with a freshly created CachedPlan, after properly freeing the
+ * partially valid QueryDesc.
* ----------------------------------------------------------------
*/
void
ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ /* Any CachedPlan given must still be valid and must come with EXEC_FLAG_GET_LOCKS. */
+ Assert(queryDesc->cplan == NULL ||
+ (CachedPlanStillValid(queryDesc->cplan) &&
+ (eflags & EXEC_FLAG_GET_LOCKS) != 0));
+
/*
* In some cases (e.g. an EXECUTE statement) a query execution will skip
* parse analysis, which means that the query_id won't be reported. Note
@@ -582,6 +603,16 @@ ExecCheckPermissions(List *rangeTable, List *rteperminfos,
RTEPermissionInfo *perminfo = lfirst_node(RTEPermissionInfo, l);
Assert(OidIsValid(perminfo->relid));
+
+ /*
+ * Relations whose permissions need to be checked must already have
+ * been locked by the parser or by AcquirePlannerLocks() if a
+ * cached plan is being executed.
+ * XXX shouldn't we skip calling ExecCheckPermissions from InitPlan
+ * in a parallel worker?
+ */
+ Assert(CheckRelLockedByMe(perminfo->relid, AccessShareLock, true) ||
+ IsParallelWorker());
result = ExecCheckOneRelPerms(perminfo);
if (!result)
{
@@ -785,12 +816,19 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
PreventCommandIfParallelMode(CreateCommandName((Node *) plannedstmt));
}
-
/* ----------------------------------------------------------------
* InitPlan
*
* Initializes the query plan: open files, allocate storage
* and start up the rule manager
+ *
+ * If queryDesc contains a CachedPlan, this takes locks on relations.
+ * If any of those relations have undergone concurrent schema changes
+ * between successfully performing RevalidateCachedQuery() on the
+ * containing CachedPlanSource and here, locking those relations would
+ * invalidate the CachedPlan by way of PlanCacheRelCallback(). In that
+ * case, queryDesc->plan_valid would be set to false to tell the caller
+ * to retry after creating a new CachedPlan.
* ----------------------------------------------------------------
*/
static void
@@ -801,20 +839,32 @@ InitPlan(QueryDesc *queryDesc, int eflags)
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
- PlanState *planstate;
+ PlanState *planstate = NULL;
TupleDesc tupType;
ListCell *l;
int i;
/*
- * Do permissions checks
+ * Set up range table in EState.
*/
- ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
+ ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
+
+ /* Make sure ExecPlanStillValid() can work. */
+ estate->es_cachedplan = queryDesc->cplan;
/*
- * initialize the node's execution state
+ * Lock any views that were mentioned in the query if needed. View
+ * relations must be locked separately like this, because they are not
+ * referenced in the plan tree.
*/
- ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
+ ExecLockViewRelations(plannedstmt->viewRelations, estate);
+ if (!ExecPlanStillValid(estate))
+ goto failed;
+
+ /*
+ * Do permissions checks
+ */
+ ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
estate->es_plannedstmt = plannedstmt;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
@@ -849,6 +899,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
case ROW_MARK_KEYSHARE:
case ROW_MARK_REFERENCE:
relation = ExecGetRangeTableRelation(estate, rc->rti);
+ if (!ExecPlanStillValid(estate))
+ goto failed;
break;
case ROW_MARK_COPY:
/* no physical table access is required */
@@ -919,6 +971,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_subplanstates = lappend(estate->es_subplanstates,
subplanstate);
+ if (!ExecPlanStillValid(estate))
+ goto failed;
i++;
}
@@ -929,6 +983,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
* processing tuples.
*/
planstate = ExecInitNode(plan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ goto failed;
/*
* Get the tuple descriptor describing the type of tuples to return.
@@ -972,6 +1028,17 @@ InitPlan(QueryDesc *queryDesc, int eflags)
queryDesc->tupDesc = tupType;
queryDesc->planstate = planstate;
+ queryDesc->plan_valid = true;
+ return;
+
+failed:
+ /*
+ * Plan initialization failed. Mark QueryDesc as such. Note that we do
+ * set planstate, even if it may only be partially initialized, so that
+ * ExecEndPlan() can process it.
+ */
+ queryDesc->planstate = planstate;
+ queryDesc->plan_valid = false;
}
/*
@@ -1389,7 +1456,7 @@ ExecGetAncestorResultRels(EState *estate, ResultRelInfo *resultRelInfo)
/*
* All ancestors up to the root target relation must have been
- * locked by the planner or AcquireExecutorLocks().
+ * locked.
*/
ancRel = table_open(ancOid, NoLock);
rInfo = makeNode(ResultRelInfo);
@@ -2797,7 +2864,8 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
* Child EPQ EStates share the parent's copy of unchanging state such as
* the snapshot, rangetable, and external Param info. They need their own
* copies of local state, including a tuple table, es_param_exec_vals,
- * result-rel info, etc.
+ * result-rel info, etc. Also, we don't pass the parent's copy of the
+ * CachedPlan, because no new locks will be taken for EvalPlanQual().
*/
rcestate->es_direction = ForwardScanDirection;
rcestate->es_snapshot = parentestate->es_snapshot;
@@ -2884,6 +2952,7 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
PlanState *subplanstate;
subplanstate = ExecInitNode(subplan, rcestate, 0);
+ Assert(ExecPlanStillValid(rcestate));
rcestate->es_subplanstates = lappend(rcestate->es_subplanstates,
subplanstate);
}
@@ -2937,6 +3006,10 @@ EvalPlanQualEnd(EPQState *epqstate)
MemoryContext oldcontext;
ListCell *l;
+ /* Nothing to do if EvalPlanQualInit() wasn't done to begin with. */
+ if (epqstate->parentestate == NULL)
+ return;
+
rtsize = epqstate->parentestate->es_range_table_size;
/*
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index aa3f283453..df4cc5ddaf 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1249,8 +1249,13 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(¶mspace);
- /* Create a QueryDesc for the query. */
+ /*
+ * Create a QueryDesc for the query. Note that no CachedPlan is available
+ * here even though the plan tree being passed may have come from one in the
+ * leader.
+ */
return CreateQueryDesc(pstmt,
+ NULL,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
@@ -1432,6 +1437,7 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Start up the executor */
queryDesc->plannedstmt->jitFlags = fpes->jit_flags;
ExecutorStart(queryDesc, fpes->eflags);
+ Assert(queryDesc->plan_valid);
/* Special executor initialization steps for parallel workers */
queryDesc->planstate->state->es_query_dsa = area;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index fd6ca8a5d9..b8580e98f7 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -513,6 +513,12 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
oldcxt = MemoryContextSwitchTo(proute->memcxt);
+ /*
+ * Note that while we must check ExecPlanStillValid() for other locks taken
+ * during execution initialization, it is OK to not do so for partitions
+ * opened like this, for tuple routing, because it can't possibly
+ * invalidate the plan.
+ */
partrel = table_open(partOid, RowExclusiveLock);
leaf_part_rri = makeNode(ResultRelInfo);
@@ -1111,6 +1117,11 @@ ExecInitPartitionDispatchInfo(EState *estate,
* Only sub-partitioned tables need to be locked here. The root
* partitioned table will already have been locked as it's referenced in
* the query's rtable.
+ *
+ * Note that while we must check ExecPlanStillValid() for other locks taken
+ * during execution initialization, it is OK to not do so for partitions
+ * opened like this, for tuple routing, because it can't possibly
+ * invalidate the plan.
*/
if (partoid != RelationGetRelid(proute->partition_root))
rel = table_open(partoid, RowExclusiveLock);
@@ -1817,6 +1828,8 @@ ExecInitPartitionPruning(PlanState *planstate,
/* Create the working data structure for pruning */
prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Perform an initial partition prune pass, if required.
@@ -1943,6 +1956,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* duration of this executor run.
*/
partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
partkey = RelationGetPartitionKey(partrel);
partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
partrel);
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 4d288bc8d4..6f3c37b6fd 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -388,6 +388,9 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
break;
}
+ if (!ExecPlanStillValid(estate))
+ return result;
+
ExecSetExecProcNode(result, result->ExecProcNode);
/*
@@ -403,6 +406,12 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
Assert(IsA(subplan, SubPlan));
sstate = ExecInitSubPlan(subplan, result);
subps = lappend(subps, sstate);
+ if (!ExecPlanStillValid(estate))
+ {
+ /* Don't lose track of those initialized. */
+ result->initPlan = subps;
+ return result;
+ }
}
result->initPlan = subps;
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 012dbb6965..a485e7dfc5 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -804,7 +804,8 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rte->rtekind == RTE_RELATION);
- if (!IsParallelWorker())
+ if (!IsParallelWorker() &&
+ (estate->es_top_eflags & EXEC_FLAG_GET_LOCKS) == 0)
{
/*
* In a normal query, we should already have the appropriate lock,
@@ -833,6 +834,61 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
return rel;
}
+/*
+ * ExecLockViewRelations
+ * Lock view relations, if any, in a given query
+ */
+void
+ExecLockViewRelations(List *viewRelations, EState *estate)
+{
+ ListCell *lc;
+
+ /* Nothing to do if no locks need to be taken. */
+ if ((estate->es_top_eflags & EXEC_FLAG_GET_LOCKS) == 0)
+ return;
+
+ foreach(lc, viewRelations)
+ {
+ Index rti = lfirst_int(lc);
+ RangeTblEntry *rte = exec_rt_fetch(rti, estate);
+
+ Assert(OidIsValid(rte->relid));
+ Assert(rte->relkind == RELKIND_VIEW);
+ Assert(rte->rellockmode != NoLock);
+
+ LockRelationOid(rte->relid, rte->rellockmode);
+ }
+}
+
+/*
+ * ExecLockAppendNonLeafRelations
+ * Lock non-leaf relations whose children are scanned by a given
+ * Append/MergeAppend node
+ */
+void
+ExecLockAppendNonLeafRelations(EState *estate, List *allpartrelids)
+{
+ ListCell *l;
+
+ /* Nothing to do if no locks need to be taken. */
+ if ((estate->es_top_eflags & EXEC_FLAG_GET_LOCKS) == 0)
+ return;
+
+ foreach(l, allpartrelids)
+ {
+ Bitmapset *partrelids = lfirst_node(Bitmapset, l);
+ int i;
+
+ i = -1;
+ while ((i = bms_next_member(partrelids, i)) > 0)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(i, estate);
+
+ LockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
+}
+
/*
* ExecInitResultRelation
* Open relation given by the passed-in RT index and fill its
@@ -848,6 +904,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
Relation resultRelationDesc;
resultRelationDesc = ExecGetRangeTableRelation(estate, rti);
+ if (!ExecPlanStillValid(estate))
+ return;
InitResultRelInfo(resultRelInfo,
resultRelationDesc,
rti,
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 50e06ec693..f8c9de1fda 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -843,6 +843,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest = None_Receiver;
es->qd = CreateQueryDesc(es->stmt,
+ NULL, /* fmgr_sql() doesn't use CachedPlans */
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -868,6 +869,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
eflags = 0; /* default run-to-completion flags */
ExecutorStart(es->qd, eflags);
+ Assert(es->qd->plan_valid);
}
es->status = F_EXEC_RUN;
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 19342a420c..06e0d7d149 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -3134,15 +3134,18 @@ hashagg_reset_spill_state(AggState *aggstate)
{
HashAggSpill *spill = &aggstate->hash_spills[setno];
- pfree(spill->ntuples);
- pfree(spill->partitions);
+ if (spill->ntuples)
+ pfree(spill->ntuples);
+ if (spill->partitions)
+ pfree(spill->partitions);
}
pfree(aggstate->hash_spills);
aggstate->hash_spills = NULL;
}
/* free batches */
- list_free_deep(aggstate->hash_batches);
+ if (aggstate->hash_batches)
+ list_free_deep(aggstate->hash_batches);
aggstate->hash_batches = NIL;
/* close tape set */
@@ -3296,6 +3299,8 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
eflags &= ~EXEC_FLAG_REWIND;
outerPlan = outerPlan(node);
outerPlanState(aggstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return aggstate;
/*
* initialize source tuple type.
@@ -4336,10 +4341,13 @@ ExecEndAgg(AggState *node)
{
AggStatePerTrans pertrans = &node->pertrans[transno];
- for (setno = 0; setno < numGroupingSets; setno++)
+ if (pertrans)
{
- if (pertrans->sortstates[setno])
- tuplesort_end(pertrans->sortstates[setno]);
+ for (setno = 0; setno < numGroupingSets; setno++)
+ {
+ if (pertrans->sortstates[setno])
+ tuplesort_end(pertrans->sortstates[setno]);
+ }
}
}
@@ -4357,7 +4365,8 @@ ExecEndAgg(AggState *node)
ExecFreeExprContext(&node->ss.ps);
/* clean up tuple table */
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
outerPlan = outerPlanState(node);
ExecEndNode(outerPlan);
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index c185b11c67..091f979c46 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -109,10 +109,11 @@ AppendState *
ExecInitAppend(Append *node, EState *estate, int eflags)
{
AppendState *appendstate = makeNode(AppendState);
- PlanState **appendplanstates;
+ PlanState **appendplanstates = NULL;
Bitmapset *validsubplans;
Bitmapset *asyncplans;
int nplans;
+ int ninited = 0;
int nasyncplans;
int firstvalid;
int i,
@@ -133,6 +134,15 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_syncdone = false;
appendstate->as_begun = false;
+ /*
+ * Lock non-leaf partitions. In the pruning case, some of these locks
+ * will be retaken when the partition is opened for pruning, but it
+ * does not seem worthwhile to spend cycles to filter those out here.
+ */
+ ExecLockAppendNonLeafRelations(estate, node->allpartrelids);
+ if (!ExecPlanStillValid(estate))
+ goto early_exit;
+
/* If run-time partition pruning is enabled, then set that up now */
if (node->part_prune_index >= 0)
{
@@ -148,6 +158,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
node->part_prune_index,
node->apprelids,
&validsubplans);
+ if (!ExecPlanStillValid(estate))
+ goto early_exit;
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
@@ -222,11 +234,12 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
firstvalid = j;
appendplanstates[j++] = ExecInitNode(initNode, estate, eflags);
+ ninited++;
+ if (!ExecPlanStillValid(estate))
+ goto early_exit;
}
appendstate->as_first_partial_plan = firstvalid;
- appendstate->appendplans = appendplanstates;
- appendstate->as_nplans = nplans;
/* Initialize async state */
appendstate->as_asyncplans = asyncplans;
@@ -276,6 +289,10 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
/* For parallel query, this will be overridden later. */
appendstate->choose_next_subplan = choose_next_subplan_locally;
+early_exit:
+ appendstate->appendplans = appendplanstates;
+ appendstate->as_nplans = ninited;
+
return appendstate;
}
diff --git a/src/backend/executor/nodeBitmapAnd.c b/src/backend/executor/nodeBitmapAnd.c
index 4c5eb2b23b..acc6c50e20 100644
--- a/src/backend/executor/nodeBitmapAnd.c
+++ b/src/backend/executor/nodeBitmapAnd.c
@@ -57,6 +57,7 @@ ExecInitBitmapAnd(BitmapAnd *node, EState *estate, int eflags)
BitmapAndState *bitmapandstate = makeNode(BitmapAndState);
PlanState **bitmapplanstates;
int nplans;
+ int ninited = 0;
int i;
ListCell *l;
Plan *initNode;
@@ -77,8 +78,6 @@ ExecInitBitmapAnd(BitmapAnd *node, EState *estate, int eflags)
bitmapandstate->ps.plan = (Plan *) node;
bitmapandstate->ps.state = estate;
bitmapandstate->ps.ExecProcNode = ExecBitmapAnd;
- bitmapandstate->bitmapplans = bitmapplanstates;
- bitmapandstate->nplans = nplans;
/*
* call ExecInitNode on each of the plans to be executed and save the
@@ -89,6 +88,9 @@ ExecInitBitmapAnd(BitmapAnd *node, EState *estate, int eflags)
{
initNode = (Plan *) lfirst(l);
bitmapplanstates[i] = ExecInitNode(initNode, estate, eflags);
+ ninited++;
+ if (!ExecPlanStillValid(estate))
+ goto early_exit;
i++;
}
@@ -99,6 +101,10 @@ ExecInitBitmapAnd(BitmapAnd *node, EState *estate, int eflags)
* ExecQual or ExecProject. They don't need any tuple slots either.
*/
+early_exit:
+ bitmapandstate->bitmapplans = bitmapplanstates;
+ bitmapandstate->nplans = ninited;
+
return bitmapandstate;
}
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index f35df0b8bf..e6a689eefb 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -665,7 +665,8 @@ ExecEndBitmapHeapScan(BitmapHeapScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/*
* close down subplans
@@ -693,7 +694,8 @@ ExecEndBitmapHeapScan(BitmapHeapScanState *node)
/*
* close heap scan
*/
- table_endscan(scanDesc);
+ if (scanDesc)
+ table_endscan(scanDesc);
}
/* ----------------------------------------------------------------
@@ -763,11 +765,15 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return scanstate;
/*
* initialize child nodes
*/
outerPlanState(scanstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return scanstate;
/*
* get the scan type from the relation descriptor.
diff --git a/src/backend/executor/nodeBitmapIndexscan.c b/src/backend/executor/nodeBitmapIndexscan.c
index 83ec9ede89..cc8332ef68 100644
--- a/src/backend/executor/nodeBitmapIndexscan.c
+++ b/src/backend/executor/nodeBitmapIndexscan.c
@@ -263,6 +263,8 @@ ExecInitBitmapIndexScan(BitmapIndexScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
indexstate->biss_RelationDesc = index_open(node->indexid, lockmode);
+ if (!ExecPlanStillValid(estate))
+ return indexstate;
/*
* Initialize index-specific scan state
diff --git a/src/backend/executor/nodeBitmapOr.c b/src/backend/executor/nodeBitmapOr.c
index 0bf8af9652..babad1b4b2 100644
--- a/src/backend/executor/nodeBitmapOr.c
+++ b/src/backend/executor/nodeBitmapOr.c
@@ -58,6 +58,7 @@ ExecInitBitmapOr(BitmapOr *node, EState *estate, int eflags)
BitmapOrState *bitmaporstate = makeNode(BitmapOrState);
PlanState **bitmapplanstates;
int nplans;
+ int ninited = 0;
int i;
ListCell *l;
Plan *initNode;
@@ -78,8 +79,6 @@ ExecInitBitmapOr(BitmapOr *node, EState *estate, int eflags)
bitmaporstate->ps.plan = (Plan *) node;
bitmaporstate->ps.state = estate;
bitmaporstate->ps.ExecProcNode = ExecBitmapOr;
- bitmaporstate->bitmapplans = bitmapplanstates;
- bitmaporstate->nplans = nplans;
/*
* call ExecInitNode on each of the plans to be executed and save the
@@ -90,6 +89,9 @@ ExecInitBitmapOr(BitmapOr *node, EState *estate, int eflags)
{
initNode = (Plan *) lfirst(l);
bitmapplanstates[i] = ExecInitNode(initNode, estate, eflags);
+ ninited++;
+ if (!ExecPlanStillValid(estate))
+ goto early_exit;
i++;
}
@@ -100,6 +102,10 @@ ExecInitBitmapOr(BitmapOr *node, EState *estate, int eflags)
* ExecQual or ExecProject. They don't need any tuple slots either.
*/
+early_exit:
+ bitmaporstate->bitmapplans = bitmapplanstates;
+ bitmaporstate->nplans = ninited;
+
return bitmaporstate;
}
diff --git a/src/backend/executor/nodeCtescan.c b/src/backend/executor/nodeCtescan.c
index cc4c4243e2..eed5b75a4f 100644
--- a/src/backend/executor/nodeCtescan.c
+++ b/src/backend/executor/nodeCtescan.c
@@ -297,14 +297,16 @@ ExecEndCteScan(CteScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/*
* If I am the leader, free the tuplestore.
*/
if (node->leader == node)
{
- tuplestore_end(node->cte_table);
+ if (node->cte_table)
+ tuplestore_end(node->cte_table);
node->cte_table = NULL;
}
}
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
index bd42c65b29..b03499fae5 100644
--- a/src/backend/executor/nodeCustom.c
+++ b/src/backend/executor/nodeCustom.c
@@ -61,6 +61,8 @@ ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
if (scanrelid > 0)
{
scan_rel = ExecOpenScanRelation(estate, scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return css;
css->ss.ss_currentRelation = scan_rel;
}
@@ -127,6 +129,10 @@ ExecCustomScan(PlanState *pstate)
void
ExecEndCustomScan(CustomScanState *node)
{
+ /*
+ * XXX - BeginCustomScan() may not have occurred if ExecInitCustomScan()
+ * hit the early exit case.
+ */
Assert(node->methods->EndCustomScan != NULL);
node->methods->EndCustomScan(node);
@@ -134,8 +140,10 @@ ExecEndCustomScan(CustomScanState *node)
ExecFreeExprContext(&node->ss.ps);
/* Clean out the tuple table */
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ps.ps_ResultTupleSlot)
+ ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
void
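The XXX above amounts to a new contract for custom scan providers:
EndCustomScan can now run without BeginCustomScan ever having been called, so
provider callbacks need to tolerate partially initialized state. A
hypothetical provider callback showing the defensive shape (MyScanState,
my_end_custom_scan, and its scan_desc field are invented for illustration):

    typedef struct MyScanState
    {
        CustomScanState css;
        TableScanDesc scan_desc;    /* set in BeginCustomScan; may be NULL */
    } MyScanState;

    static void
    my_end_custom_scan(CustomScanState *node)
    {
        MyScanState *mstate = (MyScanState *) node;

        /* BeginCustomScan may never have run if plan validation failed */
        if (mstate->scan_desc)
            table_endscan(mstate->scan_desc);
    }

Foreign scans, by contrast, are handled in the executor itself: the fdw_state
test added to ExecEndForeignScan below means FDWs don't need an equivalent
change.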
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index c2139acca0..d3f0a65485 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -173,6 +173,8 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
if (scanrelid > 0)
{
currentRelation = ExecOpenScanRelation(estate, scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return scanstate;
scanstate->ss.ss_currentRelation = currentRelation;
fdwroutine = GetFdwRoutineForRelation(currentRelation, true);
}
@@ -264,6 +266,8 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
if (outerPlan(node))
outerPlanState(scanstate) =
ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return scanstate;
/*
* Tell the FDW to initialize the scan.
@@ -300,14 +304,17 @@ ExecEndForeignScan(ForeignScanState *node)
ForeignScan *plan = (ForeignScan *) node->ss.ps.plan;
EState *estate = node->ss.ps.state;
- /* Let the FDW shut down */
- if (plan->operation != CMD_SELECT)
+ /* Let the FDW shut down if needed. */
+ if (node->fdw_state)
{
- if (estate->es_epq_active == NULL)
- node->fdwroutine->EndDirectModify(node);
+ if (plan->operation != CMD_SELECT)
+ {
+ if (estate->es_epq_active == NULL)
+ node->fdwroutine->EndDirectModify(node);
+ }
+ else
+ node->fdwroutine->EndForeignScan(node);
}
- else
- node->fdwroutine->EndForeignScan(node);
/* Shut down any outer plan. */
if (outerPlanState(node))
@@ -319,7 +326,8 @@ ExecEndForeignScan(ForeignScanState *node)
/* clean out the tuple table */
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeFunctionscan.c b/src/backend/executor/nodeFunctionscan.c
index dd06ef8aee..792ecda4a9 100644
--- a/src/backend/executor/nodeFunctionscan.c
+++ b/src/backend/executor/nodeFunctionscan.c
@@ -533,7 +533,8 @@ ExecEndFunctionScan(FunctionScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/*
* Release slots and tuplestore resources
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 307fc10eea..365d3af3e4 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -89,6 +89,8 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
*/
outerNode = outerPlan(node);
outerPlanState(gatherstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return gatherstate;
tupDesc = ExecGetResultType(outerPlanState(gatherstate));
/*
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 9d5e1a46e9..8d2809f079 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -108,6 +108,8 @@ ExecInitGatherMerge(GatherMerge *node, EState *estate, int eflags)
*/
outerNode = outerPlan(node);
outerPlanState(gm_state) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return gm_state;
/*
* Leader may access ExecProcNode result directly (if
diff --git a/src/backend/executor/nodeGroup.c b/src/backend/executor/nodeGroup.c
index 25a1618952..e0832bb778 100644
--- a/src/backend/executor/nodeGroup.c
+++ b/src/backend/executor/nodeGroup.c
@@ -185,6 +185,8 @@ ExecInitGroup(Group *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(grpstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return grpstate;
/*
* Initialize scan slot and type.
@@ -231,7 +233,8 @@ ExecEndGroup(GroupState *node)
ExecFreeExprContext(&node->ss.ps);
/* clean up tuple table */
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
outerPlan = outerPlanState(node);
ExecEndNode(outerPlan);
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 1e624fed7a..a8966f8b4a 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -386,6 +386,8 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(hashstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return hashstate;
/*
* initialize our result slot and type. No need to build projection
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 32f12fefd7..1448a7ddba 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -689,8 +689,12 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
hashNode = (Hash *) innerPlan(node);
outerPlanState(hjstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return hjstate;
outerDesc = ExecGetResultType(outerPlanState(hjstate));
innerPlanState(hjstate) = ExecInitNode((Plan *) hashNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return hjstate;
innerDesc = ExecGetResultType(innerPlanState(hjstate));
/*
@@ -811,9 +815,12 @@ ExecEndHashJoin(HashJoinState *node)
/*
* clean out the tuple table
*/
- ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->hj_OuterTupleSlot);
- ExecClearTuple(node->hj_HashTupleSlot);
+ if (node->js.ps.ps_ResultTupleSlot)
+ ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
+ if (node->hj_OuterTupleSlot)
+ ExecClearTuple(node->hj_OuterTupleSlot);
+ if (node->hj_HashTupleSlot)
+ ExecClearTuple(node->hj_HashTupleSlot);
/*
* clean up subtrees
diff --git a/src/backend/executor/nodeIncrementalSort.c b/src/backend/executor/nodeIncrementalSort.c
index 12bc22f33c..6b2da56044 100644
--- a/src/backend/executor/nodeIncrementalSort.c
+++ b/src/backend/executor/nodeIncrementalSort.c
@@ -1041,6 +1041,8 @@ ExecInitIncrementalSort(IncrementalSort *node, EState *estate, int eflags)
* nodes may be able to do something more useful.
*/
outerPlanState(incrsortstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return incrsortstate;
/*
* Initialize scan slot and type.
@@ -1080,12 +1082,16 @@ ExecEndIncrementalSort(IncrementalSortState *node)
SO_printf("ExecEndIncrementalSort: shutting down sort node\n");
/* clean out the scan tuple */
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/* must drop pointer to sort result tuple */
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+ if (node->ss.ps.ps_ResultTupleSlot)
+ ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
/* must drop standalone tuple slots from outer node */
- ExecDropSingleTupleTableSlot(node->group_pivot);
- ExecDropSingleTupleTableSlot(node->transfer_tuple);
+ if (node->group_pivot)
+ ExecDropSingleTupleTableSlot(node->group_pivot);
+ if (node->transfer_tuple)
+ ExecDropSingleTupleTableSlot(node->transfer_tuple);
/*
* Release tuplesort resources.
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 0b43a9b969..b60a086464 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -394,7 +394,8 @@ ExecEndIndexOnlyScan(IndexOnlyScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/*
* close the index relation (no-op if we didn't open it)
@@ -512,6 +513,8 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return indexstate;
indexstate->ss.ss_currentRelation = currentRelation;
indexstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
@@ -565,6 +568,8 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
indexstate->ioss_RelationDesc = index_open(node->indexid, lockmode);
+ if (!ExecPlanStillValid(estate))
+ return indexstate;
/*
* Initialize index-specific scan state
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 4540c7781d..628c233919 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -808,7 +808,8 @@ ExecEndIndexScan(IndexScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/*
* close the index relation (no-op if we didn't open it)
@@ -925,6 +926,8 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return indexstate;
indexstate->ss.ss_currentRelation = currentRelation;
indexstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
@@ -970,6 +973,8 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
indexstate->iss_RelationDesc = index_open(node->indexid, lockmode);
+ if (!ExecPlanStillValid(estate))
+ return indexstate;
/*
* Initialize index-specific scan state
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index 425fbfc405..2fcbde74ed 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -476,6 +476,8 @@ ExecInitLimit(Limit *node, EState *estate, int eflags)
*/
outerPlan = outerPlan(node);
outerPlanState(limitstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return limitstate;
/*
* initialize child expressions
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 407414fc0c..3a8aa2b5a4 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -323,6 +323,8 @@ ExecInitLockRows(LockRows *node, EState *estate, int eflags)
* then initialize outer plan
*/
outerPlanState(lrstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return lrstate;
/* node returns unmodified slots from the outer plan */
lrstate->ps.resultopsset = true;
diff --git a/src/backend/executor/nodeMaterial.c b/src/backend/executor/nodeMaterial.c
index 09632678b0..f146ebb1d7 100644
--- a/src/backend/executor/nodeMaterial.c
+++ b/src/backend/executor/nodeMaterial.c
@@ -214,6 +214,8 @@ ExecInitMaterial(Material *node, EState *estate, int eflags)
outerPlan = outerPlan(node);
outerPlanState(matstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return matstate;
/*
* Initialize result type and slot. No need to initialize projection info
@@ -242,7 +244,8 @@ ExecEndMaterial(MaterialState *node)
/*
* clean out the tuple table
*/
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/*
* Release tuplestore resources
diff --git a/src/backend/executor/nodeMemoize.c b/src/backend/executor/nodeMemoize.c
index 4f04269e26..3003ee1e5c 100644
--- a/src/backend/executor/nodeMemoize.c
+++ b/src/backend/executor/nodeMemoize.c
@@ -938,6 +938,8 @@ ExecInitMemoize(Memoize *node, EState *estate, int eflags)
outerNode = outerPlan(node);
outerPlanState(mstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return mstate;
/*
* Initialize return slot and type. No need to initialize projection info
@@ -1043,6 +1045,7 @@ ExecEndMemoize(MemoizeState *node)
{
#ifdef USE_ASSERT_CHECKING
/* Validate the memory accounting code is correct in assert builds. */
+ if (node->hashtable)
{
int count;
uint64 mem = 0;
@@ -1089,11 +1092,14 @@ ExecEndMemoize(MemoizeState *node)
}
/* Remove the cache context */
- MemoryContextDelete(node->tableContext);
+ if (node->tableContext)
+ MemoryContextDelete(node->tableContext);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/* must drop pointer to cache result tuple */
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+ if (node->ss.ps.ps_ResultTupleSlot)
+ ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
/*
* free exprcontext
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 399b39c598..40bba35499 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -65,9 +65,10 @@ MergeAppendState *
ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
{
MergeAppendState *mergestate = makeNode(MergeAppendState);
- PlanState **mergeplanstates;
+ PlanState **mergeplanstates = NULL;
Bitmapset *validsubplans;
int nplans;
+ int ninited = 0;
int i,
j;
@@ -81,6 +82,15 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.state = estate;
mergestate->ps.ExecProcNode = ExecMergeAppend;
+ /*
+ * Lock non-leaf partitions. In the pruning case, some of these locks
+ * will be retaken when the partition is opened for pruning, but it
+ * does not seem worthwhile to spend cycles to filter those out here.
+ */
+ ExecLockAppendNonLeafRelations(estate, node->allpartrelids);
+ if (!ExecPlanStillValid(estate))
+ goto early_exit;
+
/* If run-time partition pruning is enabled, then set that up now */
if (node->part_prune_index >= 0)
{
@@ -96,6 +106,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
node->part_prune_index,
node->apprelids,
&validsubplans);
+ if (!ExecPlanStillValid(estate))
+ goto early_exit;
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
@@ -122,8 +134,6 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
}
mergeplanstates = (PlanState **) palloc(nplans * sizeof(PlanState *));
- mergestate->mergeplans = mergeplanstates;
- mergestate->ms_nplans = nplans;
mergestate->ms_slots = (TupleTableSlot **) palloc0(sizeof(TupleTableSlot *) * nplans);
mergestate->ms_heap = binaryheap_allocate(nplans, heap_compare_slots,
@@ -152,6 +162,9 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
Plan *initNode = (Plan *) list_nth(node->mergeplans, i);
mergeplanstates[j++] = ExecInitNode(initNode, estate, eflags);
+ ninited++;
+ if (!ExecPlanStillValid(estate))
+ goto early_exit;
}
mergestate->ps.ps_ProjInfo = NULL;
@@ -188,6 +201,10 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
*/
mergestate->ms_initialized = false;
+early_exit:
+ mergestate->mergeplans = mergeplanstates;
+ mergestate->ms_nplans = ninited;
+
return mergestate;
}
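This is also why the early_exit paths store ninited instead of nplans:
ExecEnd[Merge]Append walks exactly ms_nplans/as_nplans children, so a
half-built node tears down cleanly. In outline (this matches the existing
shutdown loop, not new code):

    /* shutdown only visits the children that were actually initialized */
    for (i = 0; i < node->ms_nplans; i++)
        ExecEndNode(node->mergeplans[i]);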
diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 809aa215c6..968be05568 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -1482,11 +1482,15 @@ ExecInitMergeJoin(MergeJoin *node, EState *estate, int eflags)
mergestate->mj_SkipMarkRestore = node->skip_mark_restore;
outerPlanState(mergestate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return mergestate;
outerDesc = ExecGetResultType(outerPlanState(mergestate));
innerPlanState(mergestate) = ExecInitNode(innerPlan(node), estate,
mergestate->mj_SkipMarkRestore ?
eflags :
(eflags | EXEC_FLAG_MARK));
+ if (!ExecPlanStillValid(estate))
+ return mergestate;
innerDesc = ExecGetResultType(innerPlanState(mergestate));
/*
@@ -1642,8 +1646,10 @@ ExecEndMergeJoin(MergeJoinState *node)
/*
* clean out the tuple table
*/
- ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->mj_MarkedTupleSlot);
+ if (node->js.ps.ps_ResultTupleSlot)
+ ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
+ if (node->mj_MarkedTupleSlot)
+ ExecClearTuple(node->mj_MarkedTupleSlot);
/*
* shut down the subplans
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 3a67389508..b07d7cac28 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -3922,6 +3922,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
Plan *subplan = outerPlan(node);
CmdType operation = node->operation;
int nrels = list_length(node->resultRelations);
+ int ninited = 0;
ResultRelInfo *resultRelInfo;
List *arowmarks;
ListCell *l;
@@ -3943,7 +3944,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
mtstate->canSetTag = node->canSetTag;
mtstate->mt_done = false;
- mtstate->mt_nrels = nrels;
mtstate->resultRelInfo = (ResultRelInfo *)
palloc(nrels * sizeof(ResultRelInfo));
@@ -3978,6 +3978,9 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
linitial_int(node->resultRelations));
}
+ if (!ExecPlanStillValid(estate))
+ goto early_exit;
+
/* set up epqstate with dummy subplan data for the moment */
EvalPlanQualInit(&mtstate->mt_epqstate, estate, NULL, NIL, node->epqParam);
mtstate->fireBSTriggers = true;
@@ -4004,6 +4007,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
if (resultRelInfo != mtstate->rootResultRelInfo)
{
ExecInitResultRelation(estate, resultRelInfo, resultRelation);
+ if (!ExecPlanStillValid(estate))
+ goto early_exit;
/*
* For child result relations, store the root result relation
@@ -4031,11 +4036,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* Now we may initialize the subplan.
*/
outerPlanState(mtstate) = ExecInitNode(subplan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ goto early_exit;
/*
* Do additional per-result-relation initialization.
*/
- for (i = 0; i < nrels; i++)
+ for (i = 0; i < nrels; i++, ninited++)
{
resultRelInfo = &mtstate->resultRelInfo[i];
@@ -4384,6 +4391,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
estate->es_auxmodifytables = lcons(mtstate,
estate->es_auxmodifytables);
+early_exit:
+ mtstate->mt_nrels = ninited;
return mtstate;
}
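Same idea for ModifyTable: mt_nrels now counts only the result relations
whose per-relation initialization completed, so the cleanup loop in
ExecEndModifyTable stays within initialized state. Roughly (a sketch of the
existing loop shape, abbreviated):

    for (i = 0; i < node->mt_nrels; i++)
    {
        ResultRelInfo *resultRelInfo = node->resultRelInfo + i;

        /* per-relation shutdown: FDW callbacks, RETURNING projections, ... */
        if (!resultRelInfo->ri_usesFdwDirectModify &&
            resultRelInfo->ri_FdwRoutine != NULL &&
            resultRelInfo->ri_FdwRoutine->EndForeignModify != NULL)
            resultRelInfo->ri_FdwRoutine->EndForeignModify(node->ps.state,
                                                           resultRelInfo);
    }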
diff --git a/src/backend/executor/nodeNamedtuplestorescan.c b/src/backend/executor/nodeNamedtuplestorescan.c
index 46832ad82f..1f92c43d3b 100644
--- a/src/backend/executor/nodeNamedtuplestorescan.c
+++ b/src/backend/executor/nodeNamedtuplestorescan.c
@@ -174,7 +174,8 @@ ExecEndNamedTuplestoreScan(NamedTuplestoreScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index b3d52e69ec..deda0c2559 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -295,11 +295,15 @@ ExecInitNestLoop(NestLoop *node, EState *estate, int eflags)
* values.
*/
outerPlanState(nlstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return nlstate;
if (node->nestParams == NIL)
eflags |= EXEC_FLAG_REWIND;
else
eflags &= ~EXEC_FLAG_REWIND;
innerPlanState(nlstate) = ExecInitNode(innerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return nlstate;
/*
* Initialize result slot, type and projection.
@@ -372,7 +376,8 @@ ExecEndNestLoop(NestLoopState *node)
/*
* clean out the tuple table
*/
- ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
+ if (node->js.ps.ps_ResultTupleSlot)
+ ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
/*
* close down subplans
diff --git a/src/backend/executor/nodeProjectSet.c b/src/backend/executor/nodeProjectSet.c
index f6ff3dc44c..85d20c4680 100644
--- a/src/backend/executor/nodeProjectSet.c
+++ b/src/backend/executor/nodeProjectSet.c
@@ -247,6 +247,8 @@ ExecInitProjectSet(ProjectSet *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(state) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return state;
/*
* we don't use inner plan
@@ -328,7 +330,8 @@ ExecEndProjectSet(ProjectSetState *node)
/*
* clean out the tuple table
*/
- ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ if (node->ps.ps_ResultTupleSlot)
+ ExecClearTuple(node->ps.ps_ResultTupleSlot);
/*
* shut down subplans
diff --git a/src/backend/executor/nodeRecursiveunion.c b/src/backend/executor/nodeRecursiveunion.c
index e781003934..967fe4f287 100644
--- a/src/backend/executor/nodeRecursiveunion.c
+++ b/src/backend/executor/nodeRecursiveunion.c
@@ -244,7 +244,11 @@ ExecInitRecursiveUnion(RecursiveUnion *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(rustate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return rustate;
innerPlanState(rustate) = ExecInitNode(innerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return rustate;
/*
* If hashing, precompute fmgr lookup data for inner loop, and create the
diff --git a/src/backend/executor/nodeResult.c b/src/backend/executor/nodeResult.c
index 4219712d30..c549b684a3 100644
--- a/src/backend/executor/nodeResult.c
+++ b/src/backend/executor/nodeResult.c
@@ -208,6 +208,8 @@ ExecInitResult(Result *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(resstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return resstate;
/*
* we don't use inner plan
@@ -248,7 +250,8 @@ ExecEndResult(ResultState *node)
/*
* clean out the tuple table
*/
- ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ if (node->ps.ps_ResultTupleSlot)
+ ExecClearTuple(node->ps.ps_ResultTupleSlot);
/*
* shut down subplans
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index d7e22b1dbb..b3bc9b1f77 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -125,6 +125,8 @@ ExecInitSampleScan(SampleScan *node, EState *estate, int eflags)
ExecOpenScanRelation(estate,
node->scan.scanrelid,
eflags);
+ if (!ExecPlanStillValid(estate))
+ return scanstate;
/* we won't set up the HeapScanDesc till later */
scanstate->ss.ss_currentScanDesc = NULL;
@@ -198,7 +200,8 @@ ExecEndSampleScan(SampleScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/*
* close heap scan
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 4da0f28f7b..e7ca19ee4e 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -153,6 +153,8 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
ExecOpenScanRelation(estate,
node->scan.scanrelid,
eflags);
+ if (!ExecPlanStillValid(estate))
+ return scanstate;
/* and create slot with the appropriate rowtype */
ExecInitScanTupleSlot(estate, &scanstate->ss,
@@ -200,7 +202,8 @@ ExecEndSeqScan(SeqScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/*
* close heap scan
diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index 4bc2406b89..95950a5c20 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -528,6 +528,8 @@ ExecInitSetOp(SetOp *node, EState *estate, int eflags)
if (node->strategy == SETOP_HASHED)
eflags &= ~EXEC_FLAG_REWIND;
outerPlanState(setopstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return setopstate;
outerDesc = ExecGetResultType(outerPlanState(setopstate));
/*
@@ -583,7 +585,8 @@ void
ExecEndSetOp(SetOpState *node)
{
/* clean up tuple table */
- ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ if (node->ps.ps_ResultTupleSlot)
+ ExecClearTuple(node->ps.ps_ResultTupleSlot);
/* free subsidiary stuff including hashtable */
if (node->tableContext)
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index c6c72c6e67..89fef86aba 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -263,6 +263,8 @@ ExecInitSort(Sort *node, EState *estate, int eflags)
eflags &= ~(EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK);
outerPlanState(sortstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return sortstate;
/*
* Initialize scan slot and type.
@@ -306,9 +308,11 @@ ExecEndSort(SortState *node)
/*
* clean out the tuple table
*/
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/* must drop pointer to sort result tuple */
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+ if (node->ss.ps.ps_ResultTupleSlot)
+ ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
/*
* Release tuplesort resources
diff --git a/src/backend/executor/nodeSubqueryscan.c b/src/backend/executor/nodeSubqueryscan.c
index 42471bfc04..9b8cddc89f 100644
--- a/src/backend/executor/nodeSubqueryscan.c
+++ b/src/backend/executor/nodeSubqueryscan.c
@@ -124,6 +124,8 @@ ExecInitSubqueryScan(SubqueryScan *node, EState *estate, int eflags)
* initialize subquery
*/
subquerystate->subplan = ExecInitNode(node->subplan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return subquerystate;
/*
* Initialize scan slot and type (needed by ExecAssignScanProjectionInfo)
@@ -177,7 +179,8 @@ ExecEndSubqueryScan(SubqueryScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/*
* close down subquery
diff --git a/src/backend/executor/nodeTableFuncscan.c b/src/backend/executor/nodeTableFuncscan.c
index 0c6c912778..d7536953f1 100644
--- a/src/backend/executor/nodeTableFuncscan.c
+++ b/src/backend/executor/nodeTableFuncscan.c
@@ -223,7 +223,8 @@ ExecEndTableFuncScan(TableFuncScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/*
* Release tuplestore resources
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 2124c55ef5..1ae451d7a6 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -342,7 +342,8 @@ ExecEndTidRangeScan(TidRangeScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
/* ----------------------------------------------------------------
@@ -386,6 +387,8 @@ ExecInitTidRangeScan(TidRangeScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return tidrangestate;
tidrangestate->ss.ss_currentRelation = currentRelation;
tidrangestate->ss.ss_currentScanDesc = NULL; /* no table scan here */
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index 862bd0330b..9fe76b1c60 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -483,7 +483,8 @@ ExecEndTidScan(TidScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
/* ----------------------------------------------------------------
@@ -529,6 +530,8 @@ ExecInitTidScan(TidScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return tidstate;
tidstate->ss.ss_currentRelation = currentRelation;
tidstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
diff --git a/src/backend/executor/nodeUnique.c b/src/backend/executor/nodeUnique.c
index 45035d74fa..69f23b02c6 100644
--- a/src/backend/executor/nodeUnique.c
+++ b/src/backend/executor/nodeUnique.c
@@ -136,6 +136,8 @@ ExecInitUnique(Unique *node, EState *estate, int eflags)
* then initialize outer plan
*/
outerPlanState(uniquestate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return uniquestate;
/*
* Initialize result slot and type. Unique nodes do no projections, so
@@ -169,7 +171,8 @@ void
ExecEndUnique(UniqueState *node)
{
/* clean up tuple table */
- ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ if (node->ps.ps_ResultTupleSlot)
+ ExecClearTuple(node->ps.ps_ResultTupleSlot);
ExecFreeExprContext(&node->ps);
diff --git a/src/backend/executor/nodeValuesscan.c b/src/backend/executor/nodeValuesscan.c
index 32ace63017..f5dedbab63 100644
--- a/src/backend/executor/nodeValuesscan.c
+++ b/src/backend/executor/nodeValuesscan.c
@@ -340,7 +340,8 @@ ExecEndValuesScan(ValuesScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 7c07fb0684..616bb97675 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -1334,7 +1334,7 @@ release_partition(WindowAggState *winstate)
WindowStatePerFunc perfuncstate = &(winstate->perfunc[i]);
/* Release any partition-local state of this window function */
- if (perfuncstate->winobj)
+ if (perfuncstate && perfuncstate->winobj)
perfuncstate->winobj->localmem = NULL;
}
@@ -1344,12 +1344,17 @@ release_partition(WindowAggState *winstate)
* any aggregate temp data). We don't rely on retail pfree because some
* aggregates might have allocated data we don't have direct pointers to.
*/
- MemoryContextResetAndDeleteChildren(winstate->partcontext);
- MemoryContextResetAndDeleteChildren(winstate->aggcontext);
- for (i = 0; i < winstate->numaggs; i++)
+ if (winstate->partcontext)
+ MemoryContextResetAndDeleteChildren(winstate->partcontext);
+ if (winstate->aggcontext)
+ MemoryContextResetAndDeleteChildren(winstate->aggcontext);
+ if (winstate->peragg)
{
- if (winstate->peragg[i].aggcontext != winstate->aggcontext)
- MemoryContextResetAndDeleteChildren(winstate->peragg[i].aggcontext);
+ for (i = 0; i < winstate->numaggs; i++)
+ {
+ if (winstate->peragg[i].aggcontext != winstate->aggcontext)
+ MemoryContextResetAndDeleteChildren(winstate->peragg[i].aggcontext);
+ }
}
if (winstate->buffer)
@@ -2451,6 +2456,8 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags)
*/
outerPlan = outerPlan(node);
outerPlanState(winstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return winstate;
/*
* initialize source tuple type (which is also the tuple type that we'll
@@ -2679,11 +2686,16 @@ ExecEndWindowAgg(WindowAggState *node)
release_partition(node);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
- ExecClearTuple(node->first_part_slot);
- ExecClearTuple(node->agg_row_slot);
- ExecClearTuple(node->temp_slot_1);
- ExecClearTuple(node->temp_slot_2);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->first_part_slot)
+ ExecClearTuple(node->first_part_slot);
+ if (node->agg_row_slot)
+ ExecClearTuple(node->agg_row_slot);
+ if (node->temp_slot_1)
+ ExecClearTuple(node->temp_slot_1);
+ if (node->temp_slot_2)
+ ExecClearTuple(node->temp_slot_2);
if (node->framehead_slot)
ExecClearTuple(node->framehead_slot);
if (node->frametail_slot)
@@ -2696,16 +2708,23 @@ ExecEndWindowAgg(WindowAggState *node)
node->ss.ps.ps_ExprContext = node->tmpcontext;
ExecFreeExprContext(&node->ss.ps);
- for (i = 0; i < node->numaggs; i++)
+ if (node->peragg)
{
- if (node->peragg[i].aggcontext != node->aggcontext)
- MemoryContextDelete(node->peragg[i].aggcontext);
+ for (i = 0; i < node->numaggs; i++)
+ {
+ if (node->peragg[i].aggcontext != node->aggcontext)
+ MemoryContextDelete(node->peragg[i].aggcontext);
+ }
}
- MemoryContextDelete(node->partcontext);
- MemoryContextDelete(node->aggcontext);
+ if (node->partcontext)
+ MemoryContextDelete(node->partcontext);
+ if (node->aggcontext)
+ MemoryContextDelete(node->aggcontext);
- pfree(node->perfunc);
- pfree(node->peragg);
+ if (node->perfunc)
+ pfree(node->perfunc);
+ if (node->peragg)
+ pfree(node->peragg);
outerPlan = outerPlanState(node);
ExecEndNode(outerPlan);
diff --git a/src/backend/executor/nodeWorktablescan.c b/src/backend/executor/nodeWorktablescan.c
index 0c13448236..d70c6afde3 100644
--- a/src/backend/executor/nodeWorktablescan.c
+++ b/src/backend/executor/nodeWorktablescan.c
@@ -200,7 +200,8 @@ ExecEndWorkTableScan(WorkTableScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index e3a170c38b..26a9ea342a 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -71,7 +71,7 @@ static int _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
static ParamListInfo _SPI_convert_params(int nargs, Oid *argtypes,
Datum *Values, const char *Nulls);
-static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount);
+static int _SPI_pquery(QueryDesc *queryDesc, uint64 tcount);
static void _SPI_error_callback(void *arg);
@@ -1623,6 +1623,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
_SPI_current->processed = 0;
_SPI_current->tuptable = NULL;
+replan:
/* Create the portal */
if (name == NULL || name[0] == '\0')
{
@@ -1766,7 +1767,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
}
/*
- * Start portal execution.
+ * Start portal execution. If the portal contains a cached plan and
+ * portal->plan_valid comes back false, the cached plan was invalidated
+ * while initializing one of the plan trees it contains, so it must be
+ * recreated.
*/
PortalStart(portal, paramLI, 0, snapshot);
@@ -1775,6 +1779,12 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
/* Pop the error context stack */
error_context_stack = spierrcontext.previous;
+ if (!portal->plan_valid)
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
+
/* Pop the SPI stack */
_SPI_end_call(true);
@@ -2552,6 +2562,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
+replan:
cplan = GetCachedPlan(plansource, options->params,
plan_owner, _SPI_current->queryEnv);
@@ -2661,6 +2672,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
QueryDesc *qdesc;
Snapshot snap;
+ int eflags;
if (ActiveSnapshotSet())
snap = GetActiveSnapshot();
@@ -2668,14 +2680,36 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
snap = InvalidSnapshot;
qdesc = CreateQueryDesc(stmt,
+ cplan,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
options->params,
_SPI_current->queryEnv,
0);
- res = _SPI_pquery(qdesc, fire_triggers,
- canSetTag ? options->tcount : 0);
+
+ /* Select execution options */
+ if (fire_triggers)
+ eflags = 0; /* default run-to-completion flags */
+ else
+ eflags = EXEC_FLAG_SKIP_TRIGGERS;
+
+ /* Take locks if using a CachedPlan */
+ if (qdesc->cplan)
+ eflags |= EXEC_FLAG_GET_LOCKS;
+
+ ExecutorStart(qdesc, eflags);
+ if (!qdesc->plan_valid)
+ {
+ ExecutorFinish(qdesc);
+ ExecutorEnd(qdesc);
+ FreeQueryDesc(qdesc);
+ Assert(cplan);
+ ReleaseCachedPlan(cplan, plan_owner);
+ goto replan;
+ }
+
+ res = _SPI_pquery(qdesc, canSetTag ? options->tcount : 0);
FreeQueryDesc(qdesc);
}
else
@@ -2850,10 +2884,9 @@ _SPI_convert_params(int nargs, Oid *argtypes,
}
static int
-_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
+_SPI_pquery(QueryDesc *queryDesc, uint64 tcount)
{
int operation = queryDesc->operation;
- int eflags;
int res;
switch (operation)
@@ -2897,14 +2930,6 @@ _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
ResetUsage();
#endif
- /* Select execution options */
- if (fire_triggers)
- eflags = 0; /* default run-to-completion flags */
- else
- eflags = EXEC_FLAG_SKIP_TRIGGERS;
-
- ExecutorStart(queryDesc, eflags);
-
ExecutorRun(queryDesc, ForwardScanDirection, tcount, true);
_SPI_current->processed = queryDesc->estate->es_processed;
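The SPI side shows the full retry protocol in one place. Condensed from the
hunks above (elisions marked with ...):

    replan:
        cplan = GetCachedPlan(plansource, options->params,
                              plan_owner, _SPI_current->queryEnv);
        ...
        /* Locks are deferred to plan-tree initialization */
        if (qdesc->cplan)
            eflags |= EXEC_FLAG_GET_LOCKS;

        ExecutorStart(qdesc, eflags);
        if (!qdesc->plan_valid)
        {
            /* invalidated while locking: discard this attempt and replan */
            ExecutorFinish(qdesc);
            ExecutorEnd(qdesc);
            FreeQueryDesc(qdesc);
            ReleaseCachedPlan(cplan, plan_owner);
            goto replan;
        }

The same shape appears in SPI_cursor_open_internal and exec_bind_message,
with PortalDrop() standing in for the executor shutdown.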
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index ba00b99249..955286513d 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -513,6 +513,7 @@ _outRangeTblEntry(StringInfo str, const RangeTblEntry *node)
WRITE_BOOL_FIELD(security_barrier);
/* we re-use these RELATION fields, too: */
WRITE_OID_FIELD(relid);
+ WRITE_CHAR_FIELD(relkind);
WRITE_INT_FIELD(rellockmode);
WRITE_UINT_FIELD(perminfoindex);
break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 597e5b3ea8..a136ae1d60 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -503,6 +503,7 @@ _readRangeTblEntry(void)
READ_BOOL_FIELD(security_barrier);
/* we re-use these RELATION fields, too: */
READ_OID_FIELD(relid);
+ READ_CHAR_FIELD(relkind);
READ_INT_FIELD(rellockmode);
READ_UINT_FIELD(perminfoindex);
break;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 62b3ec96cc..5f3ffd98af 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -527,6 +527,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->partPruneInfos = glob->partPruneInfos;
result->rtable = glob->finalrtable;
result->permInfos = glob->finalrteperminfos;
+ result->viewRelations = glob->viewRelations;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 5cc8366af6..f13240bf33 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -16,6 +16,7 @@
#include "postgres.h"
#include "access/transam.h"
+#include "catalog/pg_class.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -604,6 +605,10 @@ add_rte_to_flat_rtable(PlannerGlobal *glob, List *rteperminfos,
(newrte->rtekind == RTE_SUBQUERY && OidIsValid(newrte->relid)))
glob->relationOids = lappend_oid(glob->relationOids, newrte->relid);
+ if (newrte->relkind == RELKIND_VIEW)
+ glob->viewRelations = lappend_int(glob->viewRelations,
+ list_length(glob->finalrtable));
+
/*
* Add a copy of the RTEPermissionInfo, if any, corresponding to this RTE
* to the flattened global list.
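Since the RTE has already been appended to finalrtable at this point,
list_length(glob->finalrtable) is the view's 1-based range-table index, so
viewRelations ends up as a list of RT indexes. A hypothetical consumer, to
show the intended use (not code from the patch): locking the views in a
cached plan without walking the whole flat range table:

    foreach(lc, plannedstmt->viewRelations)
    {
        int            rti = lfirst_int(lc);
        RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);

        Assert(rte->relkind == RELKIND_VIEW);
        LockRelationOid(rte->relid, rte->rellockmode);
    }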
diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index 980dc1816f..1631c8b993 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -1849,11 +1849,10 @@ ApplyRetrieveRule(Query *parsetree,
/*
* Clear fields that should not be set in a subquery RTE. Note that we
- * leave the relid, rellockmode, and perminfoindex fields set, so that the
- * view relation can be appropriately locked before execution and its
- * permissions checked.
+ * leave the relid, relkind, rellockmode, and perminfoindex fields set,
+ * so that the view relation can be appropriately locked before execution
+ * and its permissions checked.
*/
- rte->relkind = 0;
rte->tablesample = NULL;
rte->inh = false; /* must not be set for a subquery */
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index ee9b89a672..c807e9cdcc 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -27,6 +27,7 @@
#include "storage/procarray.h"
#include "storage/sinvaladt.h"
#include "utils/inval.h"
+#include "utils/lsyscache.h"
/*
@@ -364,6 +365,50 @@ CheckRelationLockedByMe(Relation relation, LOCKMODE lockmode, bool orstronger)
return false;
}
+/*
+ * CheckRelLockedByMe
+ *
+ * Returns true if current transaction holds a lock on the given relation of
+ * mode 'lockmode'. If 'orstronger' is true, a stronger lockmode is also OK.
+ * ("Stronger" is defined as "numerically higher", which is a bit
+ * semantically dubious but is OK for the purposes we use this for.)
+ */
+bool
+CheckRelLockedByMe(Oid relid, LOCKMODE lockmode, bool orstronger)
+{
+ Oid dbId = get_rel_relisshared(relid) ? InvalidOid : MyDatabaseId;
+ LOCKTAG tag;
+
+ SET_LOCKTAG_RELATION(tag, dbId, relid);
+
+ if (LockHeldByMe(&tag, lockmode))
+ return true;
+
+ if (orstronger)
+ {
+ LOCKMODE slockmode;
+
+ for (slockmode = lockmode + 1;
+ slockmode <= MaxLockMode;
+ slockmode++)
+ {
+ if (LockHeldByMe(&tag, slockmode))
+ {
+#ifdef NOT_USED
+ /* Sometimes this might be useful for debugging purposes */
+ elog(WARNING, "lock mode %s substituted for %s on relation %s",
+ GetLockmodeName(tag.locktag_lockmethodid, slockmode),
+ GetLockmodeName(tag.locktag_lockmethodid, lockmode),
+ get_rel_name(relid));
+#endif
+ return true;
+ }
+ }
+ }
+
+ return false;
+}
+
/*
* LockHasWaitersRelation
*
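CheckRelLockedByMe differs from CheckRelationLockedByMe only in taking a
pg_class OID instead of an open Relation, which matters here because the
caller may not have the relation open at all. A plausible use, as an
assumption check (illustrative, not from the patch):

    /*
     * A relation reached during deferred-lock initialization should
     * already be locked in at least its range-table lockmode.
     */
    Assert(CheckRelLockedByMe(rte->relid, rte->rellockmode, true));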
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index cab709b07b..6d0ea07801 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1199,6 +1199,7 @@ exec_simple_query(const char *query_string)
* Start the portal. No parameters here.
*/
PortalStart(portal, NULL, 0, InvalidSnapshot);
+ Assert(portal->plan_valid);
/*
* Select the appropriate output format: text unless we are doing a
@@ -1703,6 +1704,7 @@ exec_bind_message(StringInfo input_message)
"commands ignored until end of transaction block"),
errdetail_abort()));
+replan:
/*
* Create the portal. Allow silent replacement of an existing portal only
* if the unnamed portal is specified.
@@ -1994,10 +1996,19 @@ exec_bind_message(StringInfo input_message)
PopActiveSnapshot();
/*
- * And we're ready to start portal execution.
+ * Start portal execution. If the portal contains a cached plan and
+ * portal->plan_valid comes back false, the cached plan was invalidated
+ * while initializing one of the plan trees it contains, so it must be
+ * recreated.
*/
PortalStart(portal, params, 0, InvalidSnapshot);
+ if (!portal->plan_valid)
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
+
/*
* Apply the result format requests to the portal.
*/
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5f0248acc5..c93a950d7f 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -19,6 +19,7 @@
#include "access/xact.h"
#include "commands/prepare.h"
+#include "executor/execdesc.h"
#include "executor/tstoreReceiver.h"
#include "miscadmin.h"
#include "pg_trace.h"
@@ -35,12 +36,6 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
- const char *sourceText,
- ParamListInfo params,
- QueryEnvironment *queryEnv,
- DestReceiver *dest,
- QueryCompletion *qc);
static void FillPortalStore(Portal portal, bool isTopLevel);
static uint64 RunFromStore(Portal portal, ScanDirection direction, uint64 count,
DestReceiver *dest);
@@ -65,6 +60,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +73,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->cplan = cplan; /* CachedPlan, if plan is from one */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -116,86 +113,6 @@ FreeQueryDesc(QueryDesc *qdesc)
}
-/*
- * ProcessQuery
- * Execute a single plannable query within a PORTAL_MULTI_QUERY,
- * PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
- *
- * plan: the plan tree for the query
- * sourceText: the source text of the query
- * params: any parameters needed
- * dest: where to send results
- * qc: where to store the command completion status data.
- *
- * qc may be NULL if caller doesn't want a status string.
- *
- * Must be called in a memory context that will be reset or deleted on
- * error; otherwise the executor's memory usage will be leaked.
- */
-static void
-ProcessQuery(PlannedStmt *plan,
- const char *sourceText,
- ParamListInfo params,
- QueryEnvironment *queryEnv,
- DestReceiver *dest,
- QueryCompletion *qc)
-{
- QueryDesc *queryDesc;
-
- /*
- * Create the QueryDesc object
- */
- queryDesc = CreateQueryDesc(plan, sourceText,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, 0);
-
- /*
- * Call ExecutorStart to prepare the plan for execution
- */
- ExecutorStart(queryDesc, 0);
-
- /*
- * Run the plan to completion.
- */
- ExecutorRun(queryDesc, ForwardScanDirection, 0L, true);
-
- /*
- * Build command completion status data, if caller wants one.
- */
- if (qc)
- {
- switch (queryDesc->operation)
- {
- case CMD_SELECT:
- SetQueryCompletion(qc, CMDTAG_SELECT, queryDesc->estate->es_processed);
- break;
- case CMD_INSERT:
- SetQueryCompletion(qc, CMDTAG_INSERT, queryDesc->estate->es_processed);
- break;
- case CMD_UPDATE:
- SetQueryCompletion(qc, CMDTAG_UPDATE, queryDesc->estate->es_processed);
- break;
- case CMD_DELETE:
- SetQueryCompletion(qc, CMDTAG_DELETE, queryDesc->estate->es_processed);
- break;
- case CMD_MERGE:
- SetQueryCompletion(qc, CMDTAG_MERGE, queryDesc->estate->es_processed);
- break;
- default:
- SetQueryCompletion(qc, CMDTAG_UNKNOWN, queryDesc->estate->es_processed);
- break;
- }
- }
-
- /*
- * Now, we close down all the scans and free allocated resources.
- */
- ExecutorFinish(queryDesc);
- ExecutorEnd(queryDesc);
-
- FreeQueryDesc(queryDesc);
-}
-
/*
* ChoosePortalStrategy
* Select portal execution strategy given the intended statement list.
@@ -427,7 +344,8 @@ FetchStatementTargetList(Node *stmt)
* to be used for cursors).
*
* On return, portal is ready to accept PortalRun() calls, and the result
- * tupdesc (if any) is known.
+ * tupdesc (if any) is known, unless portal->plan_valid is set to false, in
+ * which case the caller must retry after generating a new CachedPlan.
*/
void
PortalStart(Portal portal, ParamListInfo params,
@@ -435,7 +353,6 @@ PortalStart(Portal portal, ParamListInfo params,
{
Portal saveActivePortal;
ResourceOwner saveResourceOwner;
- MemoryContext savePortalContext;
MemoryContext oldContext;
QueryDesc *queryDesc;
int myeflags;
@@ -448,15 +365,13 @@ PortalStart(Portal portal, ParamListInfo params,
*/
saveActivePortal = ActivePortal;
saveResourceOwner = CurrentResourceOwner;
- savePortalContext = PortalContext;
PG_TRY();
{
ActivePortal = portal;
if (portal->resowner)
CurrentResourceOwner = portal->resowner;
- PortalContext = portal->portalContext;
- oldContext = MemoryContextSwitchTo(PortalContext);
+ oldContext = MemoryContextSwitchTo(portal->queryContext);
/* Must remember portal param list, if any */
portal->portalParams = params;
@@ -472,6 +387,8 @@ PortalStart(Portal portal, ParamListInfo params,
switch (portal->strategy)
{
case PORTAL_ONE_SELECT:
+ case PORTAL_ONE_RETURNING:
+ case PORTAL_ONE_MOD_WITH:
/* Must set snapshot before starting executor. */
if (snapshot)
@@ -493,6 +410,7 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->cplan,
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -501,30 +419,56 @@ PortalStart(Portal portal, ParamListInfo params,
portal->queryEnv,
0);
+ /* Remember for PortalRunMulti(). */
+ if (portal->strategy == PORTAL_ONE_RETURNING ||
+ portal->strategy == PORTAL_ONE_MOD_WITH)
+ portal->qdescs = list_make1(queryDesc);
+
/*
* If it's a scrollable cursor, executor needs to support
* REWIND and backwards scan, as well as whatever the caller
* might've asked for.
*/
- if (portal->cursorOptions & CURSOR_OPT_SCROLL)
+ if (portal->strategy == PORTAL_ONE_SELECT &&
+ (portal->cursorOptions & CURSOR_OPT_SCROLL))
myeflags = eflags | EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD;
else
myeflags = eflags;
+ /* Take locks if using a CachedPlan */
+ if (queryDesc->cplan)
+ myeflags |= EXEC_FLAG_GET_LOCKS;
+
/*
- * Call ExecutorStart to prepare the plan for execution
+ * Call ExecutorStart to prepare the plan for execution. A
+ * cached plan may get invalidated as we're doing that.
*/
ExecutorStart(queryDesc, myeflags);
+ if (!queryDesc->plan_valid)
+ {
+ Assert(queryDesc->cplan);
+ PortalQueryFinish(queryDesc);
+ PopActiveSnapshot();
+ portal->plan_valid = false;
+ goto early_exit;
+ }
/*
- * This tells PortalCleanup to shut down the executor
+ * This tells PortalCleanup to shut down the executor; that is
+ * not needed for queries handled by PortalRunMulti().
*/
- portal->queryDesc = queryDesc;
+ if (portal->strategy == PORTAL_ONE_SELECT)
+ portal->queryDesc = queryDesc;
/*
- * Remember tuple descriptor (computed by ExecutorStart)
+ * Remember tuple descriptor (computed by ExecutorStart),
+ * though make it independent of QueryDesc for queries handled
+ * by PortalRunMulti().
*/
- portal->tupDesc = queryDesc->tupDesc;
+ if (portal->strategy != PORTAL_ONE_SELECT)
+ portal->tupDesc = CreateTupleDescCopy(queryDesc->tupDesc);
+ else
+ portal->tupDesc = queryDesc->tupDesc;
/*
* Reset cursor position data to "start of query"
@@ -532,33 +476,11 @@ PortalStart(Portal portal, ParamListInfo params,
portal->atStart = true;
portal->atEnd = false; /* allow fetches */
portal->portalPos = 0;
+ portal->plan_valid = true;
PopActiveSnapshot();
break;
- case PORTAL_ONE_RETURNING:
- case PORTAL_ONE_MOD_WITH:
-
- /*
- * We don't start the executor until we are told to run the
- * portal. We do need to set up the result tupdesc.
- */
- {
- PlannedStmt *pstmt;
-
- pstmt = PortalGetPrimaryStmt(portal);
- portal->tupDesc =
- ExecCleanTypeFromTL(pstmt->planTree->targetlist);
- }
-
- /*
- * Reset cursor position data to "start of query"
- */
- portal->atStart = true;
- portal->atEnd = false; /* allow fetches */
- portal->portalPos = 0;
- break;
-
case PORTAL_UTIL_SELECT:
/*
@@ -578,11 +500,90 @@ PortalStart(Portal portal, ParamListInfo params,
portal->atStart = true;
portal->atEnd = false; /* allow fetches */
portal->portalPos = 0;
+ portal->plan_valid = true;
break;
case PORTAL_MULTI_QUERY:
- /* Need do nothing now */
+ {
+ ListCell *lc;
+ bool first = true;
+
+ /* Take locks if using a CachedPlan */
+ myeflags = 0;
+ if (portal->cplan)
+ myeflags |= EXEC_FLAG_GET_LOCKS;
+
+ foreach(lc, portal->stmts)
+ {
+ PlannedStmt *plan = lfirst_node(PlannedStmt, lc);
+ bool is_utility = (plan->utilityStmt != NULL);
+
+ /*
+ * Push the snapshot to be used by the executor.
+ */
+ if (!is_utility)
+ {
+ /*
+ * Must copy the snapshot if we'll need to update
+ * its command ID.
+ */
+ if (!first)
+ PushCopiedSnapshot(GetTransactionSnapshot());
+ else
+ PushActiveSnapshot(GetTransactionSnapshot());
+ }
+
+ /*
+ * From the 2nd statement onwards, update the command
+ * ID and the snapshot to match.
+ */
+ if (!first)
+ {
+ CommandCounterIncrement();
+ UpdateActiveSnapshotCommandId();
+ }
+
+ first = false;
+
+ /*
+ * Create the QueryDesc object. DestReceiver will
+ * be set in PortalRunMulti().
+ */
+ queryDesc = CreateQueryDesc(plan, portal->cplan,
+ portal->sourceText,
+ !is_utility ?
+ GetActiveSnapshot() :
+ InvalidSnapshot,
+ InvalidSnapshot,
+ NULL,
+ params,
+ portal->queryEnv, 0);
+
+ /* Remember for PortalRunMulti() */
+ portal->qdescs = lappend(portal->qdescs, queryDesc);
+
+ if (is_utility)
+ continue;
+
+ /*
+ * Call ExecutorStart to prepare the plan for
+ * execution. A cached plan may get invalidated as
+ * we're doing that.
+ */
+ ExecutorStart(queryDesc, myeflags);
+ PopActiveSnapshot();
+ if (!queryDesc->plan_valid)
+ {
+ Assert(queryDesc->cplan);
+ PortalQueryFinish(queryDesc);
+ portal->plan_valid = false;
+ goto early_exit;
+ }
+ }
+ }
+
portal->tupDesc = NULL;
+ portal->plan_valid = true;
break;
}
}
@@ -594,19 +595,18 @@ PortalStart(Portal portal, ParamListInfo params,
/* Restore global vars and propagate error */
ActivePortal = saveActivePortal;
CurrentResourceOwner = saveResourceOwner;
- PortalContext = savePortalContext;
PG_RE_THROW();
}
PG_END_TRY();
+ portal->status = PORTAL_READY;
+
+early_exit:
MemoryContextSwitchTo(oldContext);
ActivePortal = saveActivePortal;
CurrentResourceOwner = saveResourceOwner;
- PortalContext = savePortalContext;
-
- portal->status = PORTAL_READY;
}
/*
@@ -1193,7 +1193,7 @@ PortalRunMulti(Portal portal,
QueryCompletion *qc)
{
bool active_snapshot_set = false;
- ListCell *stmtlist_item;
+ ListCell *qdesc_item;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1214,9 +1214,10 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
- foreach(stmtlist_item, portal->stmts)
+ foreach(qdesc_item, portal->qdescs)
{
- PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ QueryDesc *qdesc = (QueryDesc *) lfirst(qdesc_item);
+ PlannedStmt *pstmt = qdesc->plannedstmt;
/*
* If we got a cancel signal in prior command, quit
@@ -1271,23 +1272,38 @@ PortalRunMulti(Portal portal,
else
UpdateActiveSnapshotCommandId();
+ /*
+ * Run the plan to completion.
+ */
+ qdesc->dest = dest;
+ ExecutorRun(qdesc, ForwardScanDirection, 0L, true);
+
+ /*
+ * Build command completion status data if needed.
+ */
if (pstmt->canSetTag)
{
- /* statement can set tag string */
- ProcessQuery(pstmt,
- portal->sourceText,
- portal->portalParams,
- portal->queryEnv,
- dest, qc);
- }
- else
- {
- /* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
- portal->sourceText,
- portal->portalParams,
- portal->queryEnv,
- altdest, NULL);
+ switch (qdesc->operation)
+ {
+ case CMD_SELECT:
+ SetQueryCompletion(qc, CMDTAG_SELECT, qdesc->estate->es_processed);
+ break;
+ case CMD_INSERT:
+ SetQueryCompletion(qc, CMDTAG_INSERT, qdesc->estate->es_processed);
+ break;
+ case CMD_UPDATE:
+ SetQueryCompletion(qc, CMDTAG_UPDATE, qdesc->estate->es_processed);
+ break;
+ case CMD_DELETE:
+ SetQueryCompletion(qc, CMDTAG_DELETE, qdesc->estate->es_processed);
+ break;
+ case CMD_MERGE:
+ SetQueryCompletion(qc, CMDTAG_MERGE, qdesc->estate->es_processed);
+ break;
+ default:
+ SetQueryCompletion(qc, CMDTAG_UNKNOWN, qdesc->estate->es_processed);
+ break;
+ }
}
if (log_executor_stats)
@@ -1346,8 +1362,15 @@ PortalRunMulti(Portal portal,
* Increment command counter between queries, but not after the last
* one.
*/
- if (lnext(portal->stmts, stmtlist_item) != NULL)
+ if (lnext(portal->qdescs, qdesc_item) != NULL)
CommandCounterIncrement();
+
+ if (qdesc->estate)
+ {
+ ExecutorFinish(qdesc);
+ ExecutorEnd(qdesc);
+ }
+ FreeQueryDesc(qdesc);
}
/* Pop the snapshot if we pushed one. */
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index c07382051d..38ae43e24b 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -2073,6 +2073,27 @@ get_rel_persistence(Oid relid)
return result;
}
+/*
+ * get_rel_relisshared
+ *
+ * Returns whether the given relation is shared
+ */
+bool
+get_rel_relisshared(Oid relid)
+{
+ HeapTuple tp;
+ Form_pg_class reltup;
+ bool result;
+
+ tp = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+ if (!HeapTupleIsValid(tp))
+ elog(ERROR, "cache lookup failed for relation %u", relid);
+ reltup = (Form_pg_class) GETSTRUCT(tp);
+ result = reltup->relisshared;
+ ReleaseSysCache(tp);
+
+ return result;
+}
/* ---------- TRANSFORM CACHE ---------- */
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 77c2ba3f8f..4e455d815f 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -100,13 +100,13 @@ static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool GenericPlanIsValid(CachedPlan *cplan);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -787,9 +787,6 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
- *
- * On a "true" return, we have acquired the locks needed to run the plan.
- * (We must do this for the "true" result to be race-condition-free.)
*/
static bool
CheckCachedPlan(CachedPlanSource *plansource)
@@ -803,60 +800,56 @@ CheckCachedPlan(CachedPlanSource *plansource)
if (!plan)
return false;
- Assert(plan->magic == CACHEDPLAN_MAGIC);
- /* Generic plans are never one-shot */
- Assert(!plan->is_oneshot);
+ if (GenericPlanIsValid(plan))
+ return true;
/*
- * If plan isn't valid for current role, we can't use it.
+ * Plan has been invalidated, so unlink it from the parent and release it.
*/
- if (plan->is_valid && plan->dependsOnRole &&
- plan->planRoleId != GetUserId())
- plan->is_valid = false;
+ ReleaseGenericPlan(plansource);
- /*
- * If it appears valid, acquire locks and recheck; this is much the same
- * logic as in RevalidateCachedQuery, but for a plan.
- */
- if (plan->is_valid)
+ return false;
+}
+
+/*
+ * GenericPlanIsValid
+ * Is a generic plan still valid?
+ *
+ * It may have gone stale due to concurrent schema modifications of relations
+ * mentioned in the plan or a couple of other things mentioned below.
+ */
+static bool
+GenericPlanIsValid(CachedPlan *cplan)
+{
+ Assert(cplan != NULL);
+ Assert(cplan->magic == CACHEDPLAN_MAGIC);
+ /* Generic plans are never one-shot */
+ Assert(!cplan->is_oneshot);
+
+ if (cplan->is_valid)
{
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
- Assert(plan->refcount > 0);
-
- AcquireExecutorLocks(plan->stmt_list, true);
+ Assert(cplan->refcount > 0);
/*
- * If plan was transient, check to see if TransactionXmin has
- * advanced, and if so invalidate it.
+ * If plan isn't valid for current role, we can't use it.
*/
- if (plan->is_valid &&
- TransactionIdIsValid(plan->saved_xmin) &&
- !TransactionIdEquals(plan->saved_xmin, TransactionXmin))
- plan->is_valid = false;
+ if (cplan->dependsOnRole && cplan->planRoleId != GetUserId())
+ cplan->is_valid = false;
/*
- * By now, if any invalidation has happened, the inval callback
- * functions will have marked the plan invalid.
+ * If plan was transient, check to see if TransactionXmin has
+ * advanced, and if so invalidate it.
*/
- if (plan->is_valid)
- {
- /* Successfully revalidated and locked the query. */
- return true;
- }
-
- /* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ if (TransactionIdIsValid(cplan->saved_xmin) &&
+ !TransactionIdEquals(cplan->saved_xmin, TransactionXmin))
+ cplan->is_valid = false;
}
- /*
- * Plan has been invalidated, so unlink it from the parent and release it.
- */
- ReleaseGenericPlan(plansource);
-
- return false;
+ return cplan->is_valid;
}
/*
@@ -1126,9 +1119,6 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
- * On return, the plan is valid and we have sufficient locks to begin
- * execution.
- *
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
* the refcount has been reported to that ResourceOwner (note that this
@@ -1360,8 +1350,8 @@ CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
}
/*
- * Reject if AcquireExecutorLocks would have anything to do. This is
- * probably unnecessary given the previous check, but let's be safe.
+ * Reject if the executor would need to take additional locks, that is, in
+ * addition to those taken by AcquirePlannerLocks() on a given query.
*/
foreach(lc, plan->stmt_list)
{
@@ -1735,58 +1725,6 @@ QueryListGetPrimaryStmt(List *stmts)
return NULL;
}
-/*
- * AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
- */
-static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
-{
- ListCell *lc1;
-
- foreach(lc1, stmt_list)
- {
- PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
-
- if (plannedstmt->commandType == CMD_UTILITY)
- {
- /*
- * Ignore utility statements, except those (such as EXPLAIN) that
- * contain a parsed-but-not-planned query. Note: it's okay to use
- * ScanQueryForLocks, even though the query hasn't been through
- * rule rewriting, because rewriting doesn't change the query
- * representation.
- */
- Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
-
- if (query)
- ScanQueryForLocks(query, acquire);
- continue;
- }
-
- foreach(lc2, plannedstmt->rtable)
- {
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
-
- if (!(rte->rtekind == RTE_RELATION ||
- (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
- continue;
-
- /*
- * Acquire the appropriate type of lock on each relation OID. Note
- * that we don't actually try to open the rel, and hence will not
- * fail if it's been dropped entirely --- we'll just transiently
- * acquire a non-conflicting lock.
- */
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
- }
- }
-}
-
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 06dfa85f04..3ad80c7ecb 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -201,6 +201,10 @@ CreatePortal(const char *name, bool allowDup, bool dupSilent)
portal->portalContext = AllocSetContextCreate(TopPortalContext,
"PortalContext",
ALLOCSET_SMALL_SIZES);
+ /* initialize portal's query context to store QueryDescs */
+ portal->queryContext = AllocSetContextCreate(TopPortalContext,
+ "PortalQueryContext",
+ ALLOCSET_SMALL_SIZES);
/* create a resource owner for the portal */
portal->resowner = ResourceOwnerCreate(CurTransactionResourceOwner,
@@ -224,6 +228,7 @@ CreatePortal(const char *name, bool allowDup, bool dupSilent)
/* for named portals reuse portal->name copy */
MemoryContextSetIdentifier(portal->portalContext, portal->name[0] ? portal->name : "<unnamed>");
+ MemoryContextSetIdentifier(portal->queryContext, portal->name[0] ? portal->name : "<unnamed>");
return portal;
}
@@ -594,6 +599,7 @@ PortalDrop(Portal portal, bool isTopCommit)
/* release subsidiary storage */
MemoryContextDelete(portal->portalContext);
+ MemoryContextDelete(portal->queryContext);
/* release portal struct (it's in TopPortalContext) */
pfree(portal);
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 7c1071ddd1..da39b2e4ff 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,11 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern QueryDesc *ExplainQueryDesc(PlannedStmt *stmt, struct CachedPlan *cplan,
+ const char *queryString, IntoClause *into, ExplainState *es,
+ ParamListInfo params, QueryEnvironment *queryEnv);
+extern void ExplainOnePlan(QueryDesc *queryDesc,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
@@ -103,6 +107,7 @@ extern void ExplainQueryParameters(ExplainState *es, ParamListInfo params, int m
extern void ExplainBeginOutput(ExplainState *es);
extern void ExplainEndOutput(ExplainState *es);
+extern void ExplainResetOutput(ExplainState *es);
extern void ExplainSeparatePlans(ExplainState *es);
extern void ExplainPropertyList(const char *qlabel, List *data,
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index af2bf36dfb..c36c25b497 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -32,9 +32,12 @@
*/
typedef struct QueryDesc
{
+ NodeTag type;
+
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ struct CachedPlan *cplan; /* CachedPlan, if plannedstmt is from one */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -47,6 +50,7 @@ typedef struct QueryDesc
TupleDesc tupDesc; /* descriptor for result tuples */
EState *estate; /* executor's query-wide state */
PlanState *planstate; /* tree of per-plan-node state */
+ bool plan_valid; /* is planstate tree fully valid? */
/* This field is set by ExecutorRun */
bool already_executed; /* true if previously executed */
@@ -57,6 +61,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ struct CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index dbd77050c7..ebb6665950 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -19,6 +19,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/plancache.h"
/*
@@ -59,6 +60,8 @@
#define EXEC_FLAG_MARK 0x0008 /* need mark/restore */
#define EXEC_FLAG_SKIP_TRIGGERS 0x0010 /* skip AfterTrigger calls */
#define EXEC_FLAG_WITH_NO_DATA 0x0020 /* rel scannability doesn't matter */
+#define EXEC_FLAG_GET_LOCKS 0x0400 /* should the executor lock
+ * relations? */
/* Hook for plugins to get control in ExecutorStart() */
@@ -245,6 +248,13 @@ extern void ExecEndNode(PlanState *node);
extern void ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/* Is the CachedPlan, if any, that the plan tree came from still valid? */
+static inline bool
+ExecPlanStillValid(EState *estate)
+{
+ return estate->es_cachedplan == NULL ? true :
+ CachedPlanStillValid(estate->es_cachedplan);
+}
/* ----------------------------------------------------------------
* ExecProcNode
@@ -579,6 +589,8 @@ exec_rt_fetch(Index rti, EState *estate)
}
extern Relation ExecGetRangeTableRelation(EState *estate, Index rti);
+extern void ExecLockViewRelations(List *viewRelations, EState *estate);
+extern void ExecLockAppendNonLeafRelations(EState *estate, List *allpartrelids);
extern void ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
Index rti);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index d97f5a8e7d..dfa72848c7 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -623,6 +623,8 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ struct CachedPlan *es_cachedplan; /* CachedPlan if plannedstmt is from
+ * one */
List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
const char *es_sourceText; /* Source text from QueryDesc */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index d61a62da19..9b888b0d75 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -116,6 +116,9 @@ typedef struct PlannerGlobal
/* "flat" list of RTEPermissionInfos */
List *finalrteperminfos;
+ /* "flat" list of integer RT indexes */
+ List *viewRelations;
+
/* "flat" list of PlanRowMarks */
List *finalrowmarks;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index a0bb16cff4..7cae624bbd 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -78,6 +78,9 @@ typedef struct PlannedStmt
List *permInfos; /* list of RTEPermissionInfo nodes for rtable
* entries needing one */
+ List *viewRelations; /* integer list of RT indexes, or NIL if no
+ * views are queried */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE/MERGE */
List *resultRelations; /* integer list of RT indexes, or NIL */
diff --git a/src/include/storage/lmgr.h b/src/include/storage/lmgr.h
index 4ee91e3cf9..598bf2688a 100644
--- a/src/include/storage/lmgr.h
+++ b/src/include/storage/lmgr.h
@@ -48,6 +48,7 @@ extern bool ConditionalLockRelation(Relation relation, LOCKMODE lockmode);
extern void UnlockRelation(Relation relation, LOCKMODE lockmode);
extern bool CheckRelationLockedByMe(Relation relation, LOCKMODE lockmode,
bool orstronger);
+extern bool CheckRelLockedByMe(Oid relid, LOCKMODE lockmode, bool orstronger);
extern bool LockHasWaitersRelation(Relation relation, LOCKMODE lockmode);
extern void LockRelationIdForSession(LockRelId *relid, LOCKMODE lockmode);
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index 4f5418b972..3074e604dd 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -139,6 +139,7 @@ extern char get_rel_relkind(Oid relid);
extern bool get_rel_relispartition(Oid relid);
extern Oid get_rel_tablespace(Oid relid);
extern char get_rel_persistence(Oid relid);
+extern bool get_rel_relisshared(Oid relid);
extern Oid get_transform_fromsql(Oid typid, Oid langid, List *trftypes);
extern Oid get_transform_tosql(Oid typid, Oid langid, List *trftypes);
extern bool get_typisdefined(Oid typid);
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index a443181d41..8990fe72e3 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -221,6 +221,20 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
QueryEnvironment *queryEnv);
+
+/*
+ * CachedPlanStillValid
+ * Returns whether a cached generic plan is still valid
+ *
+ * Called by the executor on every relation lock taken when initializing the
+ * plan tree in the CachedPlan.
+ */
+static inline bool
+CachedPlanStillValid(CachedPlan *cplan)
+{
+ return cplan->is_valid;
+}
+
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aa08b1e0fc..332a08ccb4 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,9 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *qdescs; /* list of QueryDescs */
+ bool plan_valid; /* are plan(s) ready for execution? */
+ MemoryContext queryContext; /* memory for QueryDescs and children */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -242,6 +245,7 @@ extern void PortalDefineQuery(Portal portal,
CommandTag commandTag,
List *stmts,
CachedPlan *cplan);
+extern void PortalQueryFinish(QueryDesc *queryDesc);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
diff --git a/src/test/modules/delay_execution/Makefile b/src/test/modules/delay_execution/Makefile
index 70f24e846d..2fca84d027 100644
--- a/src/test/modules/delay_execution/Makefile
+++ b/src/test/modules/delay_execution/Makefile
@@ -8,7 +8,8 @@ OBJS = \
delay_execution.o
ISOLATION = partition-addition \
- partition-removal-1
+ partition-removal-1 \
+ cached-plan-replan
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/src/test/modules/delay_execution/delay_execution.c b/src/test/modules/delay_execution/delay_execution.c
index 7cd76eb34b..5d7a3e9858 100644
--- a/src/test/modules/delay_execution/delay_execution.c
+++ b/src/test/modules/delay_execution/delay_execution.c
@@ -1,14 +1,18 @@
/*-------------------------------------------------------------------------
*
* delay_execution.c
- * Test module to allow delay between parsing and execution of a query.
+ * Test module to introduce delay at various points during execution of a
+ * query to test that execution proceeds safely in light of concurrent
+ * changes.
*
* The delay is implemented by taking and immediately releasing a specified
* advisory lock. If another process has previously taken that lock, the
* current process will be blocked until the lock is released; otherwise,
* there's no effect. This allows an isolationtester script to reliably
- * test behaviors where some specified action happens in another backend
- * between parsing and execution of any desired query.
+ * test behaviors where some specified action happens in another backend in
+ * a couple of cases: 1) between parsing and execution of any desired query
+ * when using the planner_hook, 2) between RevalidateCachedQuery() and
+ * ExecutorStart() when using the ExecutorStart_hook.
*
* Copyright (c) 2020-2023, PostgreSQL Global Development Group
*
@@ -22,6 +26,7 @@
#include <limits.h>
+#include "executor/executor.h"
#include "optimizer/planner.h"
#include "utils/builtins.h"
#include "utils/guc.h"
@@ -32,9 +37,11 @@ PG_MODULE_MAGIC;
/* GUC: advisory lock ID to use. Zero disables the feature. */
static int post_planning_lock_id = 0;
+static int executor_start_lock_id = 0;
-/* Save previous planner hook user to be a good citizen */
+/* Save previous hook users to be a good citizen */
static planner_hook_type prev_planner_hook = NULL;
+static ExecutorStart_hook_type prev_ExecutorStart_hook = NULL;
/* planner_hook function to provide the desired delay */
@@ -70,11 +77,41 @@ delay_execution_planner(Query *parse, const char *query_string,
return result;
}
+/* ExecutorStart_hook function to provide the desired delay */
+static void
+delay_execution_ExecutorStart(QueryDesc *queryDesc, int eflags)
+{
+ /* If enabled, delay by taking and releasing the specified lock */
+ if (executor_start_lock_id != 0)
+ {
+ DirectFunctionCall1(pg_advisory_lock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+ DirectFunctionCall1(pg_advisory_unlock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+
+ /*
+ * Ensure that we notice any pending invalidations, since the advisory
+ * lock functions don't do this.
+ */
+ AcceptInvalidationMessages();
+ }
+
+ /* Now start the executor, possibly via a previous hook user */
+ if (prev_ExecutorStart_hook)
+ prev_ExecutorStart_hook(queryDesc, eflags);
+ else
+ standard_ExecutorStart(queryDesc, eflags);
+
+ if (executor_start_lock_id != 0)
+ elog(NOTICE, "Finished ExecutorStart(): CachedPlan is %s",
+ queryDesc->cplan->is_valid ? "valid" : "not valid");
+}
+
/* Module load function */
void
_PG_init(void)
{
- /* Set up the GUC to control which lock is used */
+ /* Set up GUCs to control which lock is used */
DefineCustomIntVariable("delay_execution.post_planning_lock_id",
"Sets the advisory lock ID to be locked/unlocked after planning.",
"Zero disables the delay.",
@@ -86,10 +123,22 @@ _PG_init(void)
NULL,
NULL,
NULL);
-
+ DefineCustomIntVariable("delay_execution.executor_start_lock_id",
+ "Sets the advisory lock ID to be locked/unlocked before starting execution.",
+ "Zero disables the delay.",
+ &executor_start_lock_id,
+ 0,
+ 0, INT_MAX,
+ PGC_USERSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
MarkGUCPrefixReserved("delay_execution");
- /* Install our hook */
+ /* Install our hooks. */
prev_planner_hook = planner_hook;
planner_hook = delay_execution_planner;
+ prev_ExecutorStart_hook = ExecutorStart_hook;
+ ExecutorStart_hook = delay_execution_ExecutorStart;
}
diff --git a/src/test/modules/delay_execution/expected/cached-plan-replan.out b/src/test/modules/delay_execution/expected/cached-plan-replan.out
new file mode 100644
index 0000000000..4f450b9d9b
--- /dev/null
+++ b/src/test/modules/delay_execution/expected/cached-plan-replan.out
@@ -0,0 +1,117 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1prep s2lock s1exec s2dropi s2unlock
+step s1prep: SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1;
+ EXPLAIN (COSTS OFF) EXECUTE q (1);
+QUERY PLAN
+--------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo11 foo_1
+ Recheck Cond: (a = $1)
+ -> Bitmap Index Scan on foo11_a_idx
+ Index Cond: (a = $1)
+(6 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+-----------------------------
+Append
+ Subplans Removed: 1
+ -> Seq Scan on foo11 foo_1
+ Filter: (a = $1)
+(4 rows)
+
+
+starting permutation: s1prep2 s2lock s1exec2 s2dropi s2unlock
+step s1prep2: SET plan_cache_mode = force_generic_plan;
+ SET enable_partitionwise_aggregate = on;
+ SET enable_partitionwise_join = on;
+ PREPARE q2 AS SELECT t1.a, count(t2.b) FROM foo t1, foo t2 WHERE t1.a = t2.a GROUP BY 1;
+ EXPLAIN (COSTS OFF) EXECUTE q2;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+----------------------------------------------------------------
+Append
+ -> GroupAggregate
+ Group Key: t1.a
+ -> Merge Join
+ Merge Cond: (t1.a = t2.a)
+ -> Index Only Scan using foo11_a_idx on foo11 t1
+ -> Materialize
+ -> Index Scan using foo11_a_idx on foo11 t2
+ -> GroupAggregate
+ Group Key: t1_1.a
+ -> Merge Join
+ Merge Cond: (t1_1.a = t2_1.a)
+ -> Sort
+ Sort Key: t1_1.a
+ -> Seq Scan on foo2 t1_1
+ -> Sort
+ Sort Key: t2_1.a
+ -> Seq Scan on foo2 t2_1
+(18 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec2: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec2: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------------------------------
+Append
+ -> GroupAggregate
+ Group Key: t1.a
+ -> Merge Join
+ Merge Cond: (t1.a = t2.a)
+ -> Sort
+ Sort Key: t1.a
+ -> Seq Scan on foo11 t1
+ -> Sort
+ Sort Key: t2.a
+ -> Seq Scan on foo11 t2
+ -> GroupAggregate
+ Group Key: t1_1.a
+ -> Merge Join
+ Merge Cond: (t1_1.a = t2_1.a)
+ -> Sort
+ Sort Key: t1_1.a
+ -> Seq Scan on foo2 t1_1
+ -> Sort
+ Sort Key: t2_1.a
+ -> Seq Scan on foo2 t2_1
+(21 rows)
+
diff --git a/src/test/modules/delay_execution/specs/cached-plan-replan.spec b/src/test/modules/delay_execution/specs/cached-plan-replan.spec
new file mode 100644
index 0000000000..67cfed7044
--- /dev/null
+++ b/src/test/modules/delay_execution/specs/cached-plan-replan.spec
@@ -0,0 +1,50 @@
+# Test to check that invalidation of a cached plan during ExecutorStart
+# correctly triggers replanning and re-execution.
+
+setup
+{
+ CREATE TABLE foo (a int, b text) PARTITION BY LIST(a);
+ CREATE TABLE foo1 PARTITION OF foo FOR VALUES IN (1) PARTITION BY LIST (a);
+ CREATE TABLE foo11 PARTITION OF foo1 FOR VALUES IN (1);
+ CREATE INDEX foo11_a ON foo1 (a);
+ CREATE TABLE foo2 PARTITION OF foo FOR VALUES IN (2);
+ CREATE VIEW foov AS SELECT * FROM foo;
+}
+
+teardown
+{
+ DROP VIEW foov;
+ DROP TABLE foo;
+}
+
+session "s1"
+# Creates a prepared statement and forces creation of a generic plan
+step "s1prep" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+
+step "s1prep2" { SET plan_cache_mode = force_generic_plan;
+ SET enable_partitionwise_aggregate = on;
+ SET enable_partitionwise_join = on;
+ PREPARE q2 AS SELECT t1.a, count(t2.b) FROM foo t1, foo t2 WHERE t1.a = t2.a GROUP BY 1;
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+# Executes a generic plan
+step "s1exec" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+step "s1exec2" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+
+session "s2"
+step "s2lock" { SELECT pg_advisory_lock(12345); }
+step "s2unlock" { SELECT pg_advisory_unlock(12345); }
+step "s2dropi" { DROP INDEX foo11_a; }
+
+# While "s1exec" waits to acquire the advisory lock, "s2drop" is able to drop
+# the index being used in the cached plan for `q`, so when "s1exec" is then
+# unblocked and initializes the cached plan for execution, it detects the
+# concurrent index drop and causes the cached plan to be discarded and
+# recreated without the index.
+permutation "s1prep" "s2lock" "s1exec" "s2dropi" "s2unlock"
+permutation "s1prep2" "s2lock" "s1exec2" "s2dropi" "s2unlock"
--
2.35.3
On Wed, Mar 22, 2023 at 9:48 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Tue, Mar 14, 2023 at 7:07 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Thu, Mar 2, 2023 at 10:52 PM Amit Langote <amitlangote09@gmail.com> wrote:
I think I have figured out what might be going wrong on that cfbot
animal after building with the same CPPFLAGS as that animal locally.
I had forgotten to update _out/_readRangeTblEntry() to account for the
patch's change that a view's RTE_SUBQUERY now also preserves relkind
in addition to relid and rellockmode for the locking consideration.
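
For reference, the _outRangeTblEntry() side of that fix amounts to
something like the following sketch; the macros and field order here
are illustrative rather than the patch text, and _readRangeTblEntry()
gets the matching READ_OID_FIELD()/READ_CHAR_FIELD()/READ_INT_FIELD()
calls:

    case RTE_SUBQUERY:
        WRITE_NODE_FIELD(subquery);
        WRITE_BOOL_FIELD(security_barrier);
        /* we reuse these RELATION fields for the locking consideration: */
        WRITE_OID_FIELD(relid);
        WRITE_CHAR_FIELD(relkind);
        WRITE_INT_FIELD(rellockmode);
        break;
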
Also, I noticed that a multi-query Portal execution with rules was
failing (thanks to a regression test added in a7d71c41db) because of
the snapshot used for the 2nd query onward not being updated for
the command ID change under the patched model of multi-query Portal
execution.
To wit, under the patched model, all queries in the multi-query Portal
case undergo ExecutorStart() before any of them is run with
ExecutorRun(). The patch hadn't been changed, however, to update the
snapshot's command ID for the 2nd query onwards, which caused the
aforementioned test case to fail.

This new model does, however, mean that the 2nd query onwards must use
PushCopiedSnapshot() given the current requirement of
UpdateActiveSnapshotCommandId() that the snapshot passed to it must
not be referenced anywhere else. The new model basically requires
that each query's QueryDesc points to its own copy of the
ActiveSnapshot. That may not be a point in favor of the patched model,
though. For now, I haven't been able to come up with a better
alternative.
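
To illustrate, the snapshot discipline that the patched PortalStart()
must follow boils down to roughly this per-statement logic (a minimal
sketch, not the patch text; the helper name is made up):

    /* Push a snapshot for the statement about to be ExecutorStart()'d. */
    static void
    push_snapshot_for_stmt(bool first)
    {
        if (first)
            PushActiveSnapshot(GetTransactionSnapshot());
        else
        {
            /*
             * Must copy: UpdateActiveSnapshotCommandId() requires that
             * the snapshot it updates not be referenced anywhere else.
             */
            PushCopiedSnapshot(GetTransactionSnapshot());
            CommandCounterIncrement();
            UpdateActiveSnapshotCommandId();
        }
    }

Each statement's ExecutorStart() then runs under its own pushed
snapshot, which is popped once initialization is done.
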
Here's a new version addressing the following 2 points.
* Like views, I realized that non-leaf relations of partition trees
scanned by an Append/MergeAppend would need to be locked separately,
because ExecInitNode() traversal of the plan tree would not account
for them. That is, they are not opened using
ExecGetRangeTableRelation() or ExecOpenScanRelation(). One exception
is that some (if not all) of those non-leaf relations may be
referenced in PartitionPruneInfo and so locked as part of initializing
the corresponding PartitionPruneState, but I decided not to complicate
the code to filter out such relations from the set locked separately.
To carry the set of relations to lock, the refactoring patch 0001
re-introduces the List of Bitmapset field named allpartrelids into
Append/MergeAppend nodes, which we had previously removed on the
grounds that those relations need not be locked separately (commits
f2343653f5b, f003a7522bf).

* I decided to initialize QueryDesc.planstate even in the cases where
ExecInitNode() traversal is aborted in the middle on detecting
CachedPlan invalidation such that it points to a partially initialized
PlanState tree. My earlier thinking that each PlanState node need not
be visited for resource cleanup in such cases was naive after all.
Accordingly, I've fixed the ExecEndNode() subroutines of all Plan node
types to account for potentially uninitialized fields. There are a
couple of cases where I'm a bit doubtful though. In
ExecEndCustomScan(), there's no indication in CustomScanState whether
it's OK to call EndCustomScan() when BeginCustomScan() may not have
been called. For ForeignScanState, I've assumed that
ForeignScanState.fdw_state being set can be used as a marker that
BeginForeignScan would have been called, though maybe that's not a
solid assumption.
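
The ExecEndNode() fixes all follow the same defensive pattern: release
a resource only if the aborted ExecInitNode() traversal got far enough
to create it. A rough sketch, using ExecEndSeqScan() as the example
(not the actual patch hunk):

    void
    ExecEndSeqScan(SeqScanState *node)
    {
        /* the scan may never have been started */
        if (node->ss.ss_currentScanDesc != NULL)
            table_endscan(node->ss.ss_currentScanDesc);

        /* slots may not exist if initialization was aborted early */
        if (node->ss.ss_ScanTupleSlot != NULL)
            ExecClearTuple(node->ss.ss_ScanTupleSlot);
        if (node->ss.ps.ps_ResultTupleSlot != NULL)
            ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
    }
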
I'm also attaching a new (small) patch 0003 that eliminates the
loop-over-rangetable in ExecCloseRangeTableRelations() in favor of
iterating over a new List field of EState named es_opened_relations,
which is populated by ExecGetRangeTableRelation() with only the
relations that were opened. This speeds up
ExecCloseRangeTableRelations() significantly for the cases with many
runtime-prunable partitions.

Here's another version with some cosmetic changes, like fixing some
factually incorrect / obsolete comments and typos that I found. I
also noticed that I had missed noting near some table_open() calls that
locks taken with those can't possibly invalidate a plan (such as
lazily opened partition routing target partitions) and thus don't need
the treatment that locking during execution initialization requires.
Rebased over 3c05284d83b2 ("Invent GENERIC_PLAN option for EXPLAIN.").
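
For callers, the contract that 0002 establishes looks roughly like the
loop below. This is a hedged sketch with made-up surrounding variables
(plansource, params, owner, dest, query_string), not code from the
patches:

    /* Retry until ExecutorStart() initializes a still-valid plan. */
    for (;;)
    {
        CachedPlan *cplan = GetCachedPlan(plansource, params, owner, NULL);
        QueryDesc  *queryDesc;

        queryDesc = CreateQueryDesc(linitial_node(PlannedStmt,
                                                  cplan->stmt_list),
                                    cplan, query_string,
                                    GetActiveSnapshot(), InvalidSnapshot,
                                    dest, params, NULL, 0);
        ExecutorStart(queryDesc, EXEC_FLAG_GET_LOCKS);
        if (queryDesc->plan_valid)
            break;              /* safe to proceed to ExecutorRun() */

        /*
         * The plan went stale during initialization; per the new
         * contract, ExecutorEnd() must still be called to clean up the
         * partially initialized PlanState tree before replanning.
         */
        ExecutorEnd(queryDesc);
        FreeQueryDesc(queryDesc);
        ReleaseCachedPlan(cplan, owner);
    }
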
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v37-0001-Add-field-to-store-partitioned-relids-to-Append-.patch
From dfc41510ef3ebec38e7a56b639ffa41193109b43 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Thu, 9 Mar 2023 11:26:06 +0900
Subject: [PATCH v37 1/3] Add field to store partitioned relids to
Append/MergeAppend
A future commit would like to move the timing of locking relations
referenced in a cached plan to ExecInitNode() traversal of the plan
tree from the current loop-over-rangetable in AcquireExecutorLocks().
Given that partitioned tables (their RT indexes) would not be
accessible via the new way of finding the relations to lock, add a
field to Append/MergeAppend to track them separately.
This refactors the code to look up partitioned parent relids from a
given list of leaf partition subpaths of an Append/MergeAppend out
of make_partition_pruneinfo() into its own function called
add_append_subpath_partrelids(). In doing so, the code is generalized
to handle cases where child rels can be joinrels or
upper (grouping) rels. Also, to make it easier to traverse the parent
chain of a child grouping rel, this makes its RelOptInfo.parent be
set, as is already done for baserels and joinrels.
---
src/backend/optimizer/plan/createplan.c | 36 +++++--
src/backend/optimizer/plan/planner.c | 3 +
src/backend/optimizer/util/appendinfo.c | 134 ++++++++++++++++++++++++
src/backend/partitioning/partprune.c | 124 +++-------------------
src/include/nodes/plannodes.h | 14 +++
src/include/optimizer/appendinfo.h | 3 +
src/include/partitioning/partprune.h | 3 +-
7 files changed, 194 insertions(+), 123 deletions(-)
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 910ffbf1e1..794cdb5e3b 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -25,6 +25,7 @@
#include "nodes/extensible.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
+#include "optimizer/appendinfo.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/optimizer.h"
@@ -1209,6 +1210,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
Oid *nodeCollations = NULL;
bool *nodeNullsFirst = NULL;
bool consider_async = false;
+ List *allpartrelids = NIL;
/*
* The subpaths list could be empty, if every child was proven empty by
@@ -1350,18 +1352,24 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
++nasyncplans;
}
+ /* Populate partitioned parent relids. */
+ allpartrelids = add_append_subpath_partrelids(root, subpath, rel,
+ allpartrelids);
+
subplans = lappend(subplans, subplan);
}
+ plan->allpartrelids = allpartrelids;
+
/* Set below if we find quals that we can use to run-time prune */
plan->part_prune_index = -1;
/*
- * If any quals exist, they may be useful to perform further partition
- * pruning during execution. Gather information needed by the executor to
- * do partition pruning.
+ * If scanning partitions, check if there are quals that may be useful to
+ * perform further partition pruning during execution. Gather information
+ * needed by the executor to do partition pruning.
*/
- if (enable_partition_pruning)
+ if (enable_partition_pruning && allpartrelids != NIL)
{
List *prunequal;
@@ -1381,7 +1389,8 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
if (prunequal != NIL)
plan->part_prune_index = make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ allpartrelids);
}
plan->appendplans = subplans;
@@ -1425,6 +1434,7 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
List *subplans = NIL;
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
+ List *allpartrelids = NIL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1514,18 +1524,23 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplan = (Plan *) sort;
}
+ allpartrelids = add_append_subpath_partrelids(root, subpath, rel,
+ allpartrelids);
+
subplans = lappend(subplans, subplan);
}
+ node->allpartrelids = allpartrelids;
+
/* Set below if we find quals that we can use to run-time prune */
node->part_prune_index = -1;
/*
- * If any quals exist, they may be useful to perform further partition
- * pruning during execution. Gather information needed by the executor to
- * do partition pruning.
+ * If scanning partitions, check if there are quals that may be useful to
+ * perform further partition pruning during execution. Gather information
+ * needed by the executor to do partition pruning.
*/
- if (enable_partition_pruning)
+ if (enable_partition_pruning && allpartrelids != NIL)
{
List *prunequal;
@@ -1537,7 +1552,8 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
if (prunequal != NIL)
node->part_prune_index = make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ allpartrelids);
}
node->mergeplans = subplans;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index a1873ce26d..62b3ec96cc 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -7801,8 +7801,11 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
agg_costs, gd, &child_extra,
&child_partially_grouped_rel);
+ /* Mark as child of grouped_rel. */
+ child_grouped_rel->parent = grouped_rel;
if (child_partially_grouped_rel)
{
+ child_partially_grouped_rel->parent = grouped_rel;
partially_grouped_live_children =
lappend(partially_grouped_live_children,
child_partially_grouped_rel);
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index 9d377385f1..4876742ab2 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -40,6 +40,7 @@ static void make_inh_translation_list(Relation oldrelation,
AppendRelInfo *appinfo);
static Node *adjust_appendrel_attrs_mutator(Node *node,
adjust_appendrel_attrs_context *context);
+static List *add_part_relids(List *allpartrelids, Bitmapset *partrelids);
/*
@@ -1031,3 +1032,136 @@ distribute_row_identity_vars(PlannerInfo *root)
}
}
}
+
+/*
+ * add_append_subpath_partrelids
+ * Look up a child subpath's rel's partitioned parent relids up to
+ * parentrel and add the bitmapset containing those into
+ * 'allpartrelids'
+ */
+List *
+add_append_subpath_partrelids(PlannerInfo *root, Path *subpath,
+ RelOptInfo *parentrel,
+ List *allpartrelids)
+{
+ RelOptInfo *prel = subpath->parent;
+ Relids partrelids = NULL;
+
+ /* Nothing to do if there's no parent to begin with. */
+ if (!IS_OTHER_REL(prel))
+ return allpartrelids;
+
+ /*
+ * Traverse up to the pathrel's topmost partitioned parent, collecting
+ * parent relids as we go; but stop if we reach parentrel. (Normally, a
+ * pathrel's topmost partitioned parent is either parentrel or a UNION ALL
+ * appendrel child of parentrel. But when handling partitionwise joins of
+ * multi-level partitioning trees, we can see an append path whose
+ * parentrel is an intermediate partitioned table.)
+ */
+ do
+ {
+ Relids parent_relids = NULL;
+
+ /*
+ * For simple child rels, we can simply get the parent relid from
+ * prel->parent. But for partitionwise join and aggregate child rels,
+ * while we can use prel->parent to move up the tree, parent relids to
+ * add into 'partrelids' must be found the hard way through the
+ * AppendInfoInfos, because 1) a joinrel's relids may point to RTE_JOIN
+ * entries, 2) topmost parent grouping rel's relids field is left NULL.
+ */
+ if (IS_SIMPLE_REL(prel))
+ {
+ prel = prel->parent;
+ /* Stop once we reach the root partitioned rel. */
+ if (!IS_PARTITIONED_REL(prel))
+ break;
+ parent_relids = bms_add_members(parent_relids, prel->relids);
+ }
+ else
+ {
+ AppendRelInfo **appinfos;
+ int nappinfos,
+ i;
+
+ appinfos = find_appinfos_by_relids(root, prel->relids,
+ &nappinfos);
+ for (i = 0; i < nappinfos; i++)
+ {
+ AppendRelInfo *appinfo = appinfos[i];
+
+ parent_relids = bms_add_member(parent_relids,
+ appinfo->parent_relid);
+ }
+ pfree(appinfos);
+ prel = prel->parent;
+ }
+ /* accept this level as an interesting parent */
+ partrelids = bms_add_members(partrelids, parent_relids);
+ if (prel == parentrel)
+ break; /* don't traverse above parentrel */
+ } while (IS_OTHER_REL(prel));
+
+ if (partrelids == NULL)
+ return allpartrelids;
+
+ return add_part_relids(allpartrelids, partrelids);
+}
+
+/*
+ * add_part_relids
+ * Add new info to a list of Bitmapsets of partitioned relids.
+ *
+ * Within 'allpartrelids', there is one Bitmapset for each topmost parent
+ * partitioned rel. Each Bitmapset contains the RT indexes of the topmost
+ * parent as well as its relevant non-leaf child partitions. Since (by
+ * construction of the rangetable list) parent partitions must have lower
+ * RT indexes than their children, we can distinguish the topmost parent
+ * as being the lowest set bit in the Bitmapset.
+ *
+ * 'partrelids' contains the RT indexes of a parent partitioned rel, and
+ * possibly some non-leaf children, that are newly identified as parents of
+ * some subpath rel passed to make_partition_pruneinfo(). These are added
+ * to an appropriate member of 'allpartrelids'.
+ *
+ * Note that the list contains only RT indexes of partitioned tables that
+ * are parents of some scan-level relation appearing in the 'subpaths' that
+ * make_partition_pruneinfo() is dealing with. Also, "topmost" parents are
+ * not allowed to be higher than the 'parentrel' associated with the append
+ * path. In this way, we avoid expending cycles on partitioned rels that
+ * can't contribute useful pruning information for the problem at hand.
+ * (It is possible for 'parentrel' to be a child partitioned table, and it
+ * is also possible for scan-level relations to be child partitioned tables
+ * rather than leaf partitions. Hence we must construct this relation set
+ * with reference to the particular append path we're dealing with, rather
+ * than looking at the full partitioning structure represented in the
+ * RelOptInfos.)
+ */
+static List *
+add_part_relids(List *allpartrelids, Bitmapset *partrelids)
+{
+ Index targetpart;
+ ListCell *lc;
+
+ /* We can easily get the lowest set bit this way: */
+ targetpart = bms_next_member(partrelids, -1);
+ Assert(targetpart > 0);
+
+ /* Look for a matching topmost parent */
+ foreach(lc, allpartrelids)
+ {
+ Bitmapset *currpartrelids = (Bitmapset *) lfirst(lc);
+ Index currtarget = bms_next_member(currpartrelids, -1);
+
+ if (targetpart == currtarget)
+ {
+ /* Found a match, so add any new RT indexes to this hierarchy */
+ currpartrelids = bms_add_members(currpartrelids, partrelids);
+ lfirst(lc) = currpartrelids;
+ return allpartrelids;
+ }
+ }
+ /* No match, so add the new partition hierarchy to the list */
+ return lappend(allpartrelids, partrelids);
+}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 510145e3c0..3557e07082 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -138,7 +138,6 @@ typedef struct PruneStepResult
} PruneStepResult;
-static List *add_part_relids(List *allpartrelids, Bitmapset *partrelids);
static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
RelOptInfo *parentrel,
List *prunequal,
@@ -221,33 +220,32 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
+ * 'allpartrelids' contains Bitmapsets of RT indexes of partitioned parents
+ * whose partitions' Paths are in 'subpaths'; there's one Bitmapset for every
+ * partition tree involved.
*/
int
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
- List *prunequal)
+ List *prunequal,
+ List *allpartrelids)
{
PartitionPruneInfo *pruneinfo;
Bitmapset *allmatchedsubplans = NULL;
- List *allpartrelids;
List *prunerelinfos;
int *relid_subplan_map;
ListCell *lc;
int i;
+ Assert(list_length(allpartrelids) > 0);
+
/*
- * Scan the subpaths to see which ones are scans of partition child
- * relations, and identify their parent partitioned rels. (Note: we must
- * restrict the parent partitioned rels to be parentrel or children of
- * parentrel, otherwise we couldn't translate prunequal to match.)
- *
- * Also construct a temporary array to map from partition-child-relation
- * relid to the index in 'subpaths' of the scan plan for that partition.
+ * Construct a temporary array to map from partition-child-relation relid
+ * to the index in 'subpaths' of the scan plan for that partition.
* (Use of "subplan" rather than "subpath" is a bit of a misnomer, but
* we'll let it stand.) For convenience, we use 1-based indexes here, so
* that zero can represent an un-filled array entry.
*/
- allpartrelids = NIL;
relid_subplan_map = palloc0(sizeof(int) * root->simple_rel_array_size);
i = 1;
@@ -256,50 +254,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Path *path = (Path *) lfirst(lc);
RelOptInfo *pathrel = path->parent;
- /* We don't consider partitioned joins here */
- if (pathrel->reloptkind == RELOPT_OTHER_MEMBER_REL)
- {
- RelOptInfo *prel = pathrel;
- Bitmapset *partrelids = NULL;
-
- /*
- * Traverse up to the pathrel's topmost partitioned parent,
- * collecting parent relids as we go; but stop if we reach
- * parentrel. (Normally, a pathrel's topmost partitioned parent
- * is either parentrel or a UNION ALL appendrel child of
- * parentrel. But when handling partitionwise joins of
- * multi-level partitioning trees, we can see an append path whose
- * parentrel is an intermediate partitioned table.)
- */
- do
- {
- AppendRelInfo *appinfo;
-
- Assert(prel->relid < root->simple_rel_array_size);
- appinfo = root->append_rel_array[prel->relid];
- prel = find_base_rel(root, appinfo->parent_relid);
- if (!IS_PARTITIONED_REL(prel))
- break; /* reached a non-partitioned parent */
- /* accept this level as an interesting parent */
- partrelids = bms_add_member(partrelids, prel->relid);
- if (prel == parentrel)
- break; /* don't traverse above parentrel */
- } while (prel->reloptkind == RELOPT_OTHER_MEMBER_REL);
-
- if (partrelids)
- {
- /*
- * Found some relevant parent partitions, which may or may not
- * overlap with partition trees we already found. Add new
- * information to the allpartrelids list.
- */
- allpartrelids = add_part_relids(allpartrelids, partrelids);
- /* Also record the subplan in relid_subplan_map[] */
- /* No duplicates please */
- Assert(relid_subplan_map[pathrel->relid] == 0);
- relid_subplan_map[pathrel->relid] = i;
- }
- }
+ /* No duplicates please */
+ Assert(relid_subplan_map[pathrel->relid] == 0);
+ relid_subplan_map[pathrel->relid] = i;
i++;
}
@@ -368,63 +325,6 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
return list_length(root->partPruneInfos) - 1;
}
-/*
- * add_part_relids
- * Add new info to a list of Bitmapsets of partitioned relids.
- *
- * Within 'allpartrelids', there is one Bitmapset for each topmost parent
- * partitioned rel. Each Bitmapset contains the RT indexes of the topmost
- * parent as well as its relevant non-leaf child partitions. Since (by
- * construction of the rangetable list) parent partitions must have lower
- * RT indexes than their children, we can distinguish the topmost parent
- * as being the lowest set bit in the Bitmapset.
- *
- * 'partrelids' contains the RT indexes of a parent partitioned rel, and
- * possibly some non-leaf children, that are newly identified as parents of
- * some subpath rel passed to make_partition_pruneinfo(). These are added
- * to an appropriate member of 'allpartrelids'.
- *
- * Note that the list contains only RT indexes of partitioned tables that
- * are parents of some scan-level relation appearing in the 'subpaths' that
- * make_partition_pruneinfo() is dealing with. Also, "topmost" parents are
- * not allowed to be higher than the 'parentrel' associated with the append
- * path. In this way, we avoid expending cycles on partitioned rels that
- * can't contribute useful pruning information for the problem at hand.
- * (It is possible for 'parentrel' to be a child partitioned table, and it
- * is also possible for scan-level relations to be child partitioned tables
- * rather than leaf partitions. Hence we must construct this relation set
- * with reference to the particular append path we're dealing with, rather
- * than looking at the full partitioning structure represented in the
- * RelOptInfos.)
- */
-static List *
-add_part_relids(List *allpartrelids, Bitmapset *partrelids)
-{
- Index targetpart;
- ListCell *lc;
-
- /* We can easily get the lowest set bit this way: */
- targetpart = bms_next_member(partrelids, -1);
- Assert(targetpart > 0);
-
- /* Look for a matching topmost parent */
- foreach(lc, allpartrelids)
- {
- Bitmapset *currpartrelids = (Bitmapset *) lfirst(lc);
- Index currtarget = bms_next_member(currpartrelids, -1);
-
- if (targetpart == currtarget)
- {
- /* Found a match, so add any new RT indexes to this hierarchy */
- currpartrelids = bms_add_members(currpartrelids, partrelids);
- lfirst(lc) = currpartrelids;
- return allpartrelids;
- }
- }
- /* No match, so add the new partition hierarchy to the list */
- return lappend(allpartrelids, partrelids);
-}
-
/*
* make_partitionedrel_pruneinfo
* Build a List of PartitionedRelPruneInfos, one for each interesting
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 659bd05c0c..a0bb16cff4 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -270,6 +270,13 @@ typedef struct Append
List *appendplans;
int nasyncplans; /* # of asynchronous plans */
+ /*
+ * RTIs of all partitioned tables whose children are scanned by
+ * appendplans. The list contains a bitmapset for every partition tree
+ * covered by this Append.
+ */
+ List *allpartrelids;
+
/*
* All 'appendplans' preceding this index are non-partial plans. All
* 'appendplans' from this index onwards are partial plans.
@@ -294,6 +301,13 @@ typedef struct MergeAppend
List *mergeplans;
+ /*
+ * RTIs of all partitioned tables whose children are scanned by
+ * mergeplans. The list contains a bitmapset for every partition tree
+ * covered by this MergeAppend.
+ */
+ List *allpartrelids;
+
/* these fields are just like the sort-key info in struct Sort: */
/* number of sort-key columns */
diff --git a/src/include/optimizer/appendinfo.h b/src/include/optimizer/appendinfo.h
index a05f91f77d..1621a7319a 100644
--- a/src/include/optimizer/appendinfo.h
+++ b/src/include/optimizer/appendinfo.h
@@ -46,5 +46,8 @@ extern void add_row_identity_columns(PlannerInfo *root, Index rtindex,
RangeTblEntry *target_rte,
Relation target_relation);
extern void distribute_row_identity_vars(PlannerInfo *root);
+extern List *add_append_subpath_partrelids(PlannerInfo *root, Path *subpath,
+ RelOptInfo *parentrel,
+ List *allpartrelids);
#endif /* APPENDINFO_H */
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index c0d6889d47..2d907d31d4 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -73,7 +73,8 @@ typedef struct PartitionPruneContext
extern int make_partition_pruneinfo(struct PlannerInfo *root,
struct RelOptInfo *parentrel,
List *subpaths,
- List *prunequal);
+ List *prunequal,
+ List *allpartrelids);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
--
2.35.3
v37-0003-Track-opened-range-table-relations-in-a-List-in-.patch
From 3c67d3142062334e4ac061f3eb5bc0be306fbb1c Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Mon, 13 Mar 2023 15:59:38 +0900
Subject: [PATCH v37 3/3] Track opened range table relations in a List in
EState
This makes ExecCloseRangeTableRelations faster when there are many
relations in the range table but only a few are opened during
execution, such as when run-time pruning kicks in on an Append
containing 1000s of partition subplans.
---
src/backend/executor/execMain.c | 9 +++++----
src/backend/executor/execUtils.c | 2 ++
src/include/nodes/execnodes.h | 2 ++
3 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 0366be9fd6..94f8324cff 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1630,12 +1630,13 @@ ExecCloseResultRelations(EState *estate)
void
ExecCloseRangeTableRelations(EState *estate)
{
- int i;
+ ListCell *lc;
- for (i = 0; i < estate->es_range_table_size; i++)
+ foreach(lc, estate->es_opened_relations)
{
- if (estate->es_relations[i])
- table_close(estate->es_relations[i], NoLock);
+ Relation rel = lfirst(lc);
+
+ table_close(rel, NoLock);
}
}
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index a485e7dfc5..f7053072d9 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -829,6 +829,8 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
}
estate->es_relations[rti - 1] = rel;
+ estate->es_opened_relations = lappend(estate->es_opened_relations,
+ rel);
}
return rel;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index dfa72848c7..984fd2e423 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -619,6 +619,8 @@ typedef struct EState
Index es_range_table_size; /* size of the range table arrays */
Relation *es_relations; /* Array of per-range-table-entry Relation
* pointers, or NULL if not yet opened */
+ List *es_opened_relations; /* List of non-NULL entries in
+ * es_relations in no specific order */
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
--
2.35.3
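The commit message of v37-0002 below describes a retry protocol for plancache callers: ExecutorStart() may find the CachedPlan invalidated while it takes locks, in which case the caller must discard the partially initialized QueryDesc and replan. A minimal caller-side sketch of that protocol follows; run_one_query() is a hypothetical wrapper, not a function from the patch, while GetCachedPlan(), CreateQueryDesc() (with its new CachedPlan argument), EXEC_FLAG_GET_LOCKS, and plan_valid are used as the patch defines them.

/*
 * Sketch: execute one cached query, replanning if the CachedPlan goes
 * stale while ExecutorStart() initializes (and locks) the plan tree.
 */
static void
run_one_query(CachedPlanSource *plansource, ParamListInfo params,
			  DestReceiver *dest, const char *query_string)
{
	for (;;)
	{
		CachedPlan *cplan = GetCachedPlan(plansource, params,
										  CurrentResourceOwner, NULL);
		PlannedStmt *stmt = linitial_node(PlannedStmt, cplan->stmt_list);
		QueryDesc  *queryDesc;

		queryDesc = CreateQueryDesc(stmt, cplan, query_string,
									GetActiveSnapshot(), InvalidSnapshot,
									dest, params, NULL, 0);

		/* Tell the executor to take locks as it initializes the plan. */
		ExecutorStart(queryDesc, EXEC_FLAG_GET_LOCKS);
		if (queryDesc->plan_valid)
		{
			ExecutorRun(queryDesc, ForwardScanDirection, 0, true);
			ExecutorFinish(queryDesc);
			ExecutorEnd(queryDesc);
			FreeQueryDesc(queryDesc);
			ReleaseCachedPlan(cplan, CurrentResourceOwner);
			return;
		}

		/* Taking locks invalidated the plan; clean up fully, then retry. */
		ExecutorEnd(queryDesc);
		FreeQueryDesc(queryDesc);
		ReleaseCachedPlan(cplan, CurrentResourceOwner);
	}
}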
Attachment: v37-0002-Move-AcquireExecutorLocks-s-responsibility-into-.patch (application/octet-stream)
From 4ac824f6f0f6795c3a813d5b046f3b44ee223377 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Fri, 20 Jan 2023 16:52:31 +0900
Subject: [PATCH v37 2/3] Move AcquireExecutorLocks()'s responsibility into the
executor
This commit introduces a new executor flag EXEC_FLAG_GET_LOCKS that
should be passed in eflags to ExecutorStart() if the PlannedStmt
comes from a CachedPlan. When set, the executor will take locks
on any relations referenced in the plan nodes that need to be
initialized for execution. That excludes any partitions that can
be pruned during the executor initialization phase, that is, based
on the values of only the external (PARAM_EXTERN) parameters.
Relations that are not explicitly mentioned in the plan tree, such
as views and non-leaf partition parents whose children are mentioned
in Append/MergeAppend nodes, are locked separately. After taking each
lock, the executor calls CachedPlanStillValid() to check if
CachedPlan.is_valid has been reset by PlanCacheRelCallback() due to
concurrent modification of relations referenced in the plan. If it
is found that the CachedPlan is indeed invalid, the recursive
ExecInitNode() traversal is aborted at that point. To allow the
proper cleanup of such a partially initialized planstate tree,
ExecEndNode() subroutines of various plan nodes have been fixed to
account for potentially uninitialized fields. It is the caller's
(of ExecutorStart()) responsibility to call ExecutorEnd() even on
a QueryDesc containing such a partially initialized PlanState tree.
Call sites that use plancache (GetCachedPlan) to get the plan trees
to pass to the executor for execution should now be prepared to
handle the case that the plan tree may be flagged by the executor as
stale as described above. To that end, this commit refactors the
relevant code sites to move the ExecutorStart() call closer to the
GetCachedPlan() call to reduce the friction in the cases where
replanning is needed due to a CachedPlan being marked stale in this
manner. Callers must check that QueryDesc.plan_valid is true before
passing it on to ExecutorRun() for execution.
PortalStart() now performs CreateQueryDesc() and ExecutorStart() for
all portal strategies, including those pertaining to multiple queries.
The QueryDescs for strategies handled by PortalRunMulti() are
remembered in the Portal in a new List field 'qdescs', allocated in a
new memory context 'queryContext'. This new arrangement is to make it
easier to discard and recreate a Portal if the CachedPlan goes stale
during setup.
---
contrib/postgres_fdw/postgres_fdw.c | 4 +
src/backend/commands/copyto.c | 4 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 150 ++++++---
src/backend/commands/extension.c | 2 +
src/backend/commands/matview.c | 3 +-
src/backend/commands/portalcmds.c | 16 +-
src/backend/commands/prepare.c | 32 +-
src/backend/executor/execMain.c | 89 ++++-
src/backend/executor/execParallel.c | 8 +-
src/backend/executor/execPartition.c | 15 +
src/backend/executor/execProcnode.c | 9 +
src/backend/executor/execUtils.c | 60 +++-
src/backend/executor/functions.c | 2 +
src/backend/executor/nodeAgg.c | 23 +-
src/backend/executor/nodeAppend.c | 23 +-
src/backend/executor/nodeBitmapAnd.c | 10 +-
src/backend/executor/nodeBitmapHeapscan.c | 10 +-
src/backend/executor/nodeBitmapIndexscan.c | 2 +
src/backend/executor/nodeBitmapOr.c | 10 +-
src/backend/executor/nodeCtescan.c | 6 +-
src/backend/executor/nodeCustom.c | 12 +-
src/backend/executor/nodeForeignscan.c | 22 +-
src/backend/executor/nodeFunctionscan.c | 3 +-
src/backend/executor/nodeGather.c | 2 +
src/backend/executor/nodeGatherMerge.c | 2 +
src/backend/executor/nodeGroup.c | 5 +-
src/backend/executor/nodeHash.c | 2 +
src/backend/executor/nodeHashjoin.c | 13 +-
src/backend/executor/nodeIncrementalSort.c | 14 +-
src/backend/executor/nodeIndexonlyscan.c | 7 +-
src/backend/executor/nodeIndexscan.c | 7 +-
src/backend/executor/nodeLimit.c | 2 +
src/backend/executor/nodeLockRows.c | 2 +
src/backend/executor/nodeMaterial.c | 5 +-
src/backend/executor/nodeMemoize.c | 12 +-
src/backend/executor/nodeMergeAppend.c | 23 +-
src/backend/executor/nodeMergejoin.c | 10 +-
src/backend/executor/nodeModifyTable.c | 13 +-
.../executor/nodeNamedtuplestorescan.c | 3 +-
src/backend/executor/nodeNestloop.c | 7 +-
src/backend/executor/nodeProjectSet.c | 5 +-
src/backend/executor/nodeRecursiveunion.c | 4 +
src/backend/executor/nodeResult.c | 5 +-
src/backend/executor/nodeSamplescan.c | 5 +-
src/backend/executor/nodeSeqscan.c | 5 +-
src/backend/executor/nodeSetOp.c | 5 +-
src/backend/executor/nodeSort.c | 8 +-
src/backend/executor/nodeSubqueryscan.c | 5 +-
src/backend/executor/nodeTableFuncscan.c | 3 +-
src/backend/executor/nodeTidrangescan.c | 5 +-
src/backend/executor/nodeTidscan.c | 5 +-
src/backend/executor/nodeUnique.c | 5 +-
src/backend/executor/nodeValuesscan.c | 3 +-
src/backend/executor/nodeWindowAgg.c | 55 +++-
src/backend/executor/nodeWorktablescan.c | 3 +-
src/backend/executor/spi.c | 53 ++-
src/backend/nodes/outfuncs.c | 1 +
src/backend/nodes/readfuncs.c | 1 +
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 5 +
src/backend/rewrite/rewriteHandler.c | 7 +-
src/backend/storage/lmgr/lmgr.c | 45 +++
src/backend/tcop/postgres.c | 13 +-
src/backend/tcop/pquery.c | 311 ++++++++++--------
src/backend/utils/cache/lsyscache.c | 21 ++
src/backend/utils/cache/plancache.c | 134 ++------
src/backend/utils/mmgr/portalmem.c | 6 +
src/include/commands/explain.h | 7 +-
src/include/executor/execdesc.h | 5 +
src/include/executor/executor.h | 16 +
src/include/nodes/execnodes.h | 2 +
src/include/nodes/pathnodes.h | 3 +
src/include/nodes/plannodes.h | 3 +
src/include/storage/lmgr.h | 1 +
src/include/utils/lsyscache.h | 1 +
src/include/utils/plancache.h | 14 +
src/include/utils/portal.h | 4 +
src/test/modules/delay_execution/Makefile | 3 +-
.../modules/delay_execution/delay_execution.c | 63 +++-
.../expected/cached-plan-replan.out | 117 +++++++
.../specs/cached-plan-replan.spec | 50 +++
82 files changed, 1230 insertions(+), 424 deletions(-)
create mode 100644 src/test/modules/delay_execution/expected/cached-plan-replan.out
create mode 100644 src/test/modules/delay_execution/specs/cached-plan-replan.spec
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index f5926ab89d..93f3f8b5d1 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2659,7 +2659,11 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
/* Get info about foreign table. */
rtindex = node->resultRelInfo->ri_RangeTableIndex;
if (fsplan->scan.scanrelid == 0)
+ {
dmstate->rel = ExecOpenScanRelation(estate, rtindex, eflags);
+ if (!ExecPlanStillValid(estate))
+ return;
+ }
else
dmstate->rel = node->ss.ss_currentRelation;
table = GetForeignTable(RelationGetRelid(dmstate->rel));
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index beea1ac687..e9f77d5711 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -558,7 +558,8 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL,
+ pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
@@ -569,6 +570,7 @@ BeginCopyTo(ParseState *pstate,
* ExecutorStart computes a result tupdesc for us
*/
ExecutorStart(cstate->queryDesc, 0);
+ Assert(cstate->queryDesc->plan_valid);
tupDesc = cstate->queryDesc->tupDesc;
}
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index d6c6d514f3..a55b851574 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 878d2fd172..826a47af0a 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -393,6 +393,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
else
{
PlannedStmt *plan;
+ QueryDesc *queryDesc;
instr_time planstart,
planduration;
BufferUsage bufusage_start,
@@ -415,12 +416,95 @@ ExplainOneQuery(Query *query, int cursorOptions,
BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
}
+ queryDesc = ExplainQueryDesc(plan, NULL, queryString, into, es,
+ params, queryEnv);
+ Assert(queryDesc);
+
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(queryDesc, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
+/*
+ * ExplainQueryDesc
+ * Set up QueryDesc for EXPLAINing a given plan
+ *
+ * This returns NULL if cplan is found to have been invalidated since its
+ * creation.
+ */
+QueryDesc *
+ExplainQueryDesc(PlannedStmt *stmt, CachedPlan *cplan,
+ const char *queryString, IntoClause *into, ExplainState *es,
+ ParamListInfo params, QueryEnvironment *queryEnv)
+{
+ QueryDesc *queryDesc;
+ DestReceiver *dest;
+ int eflags;
+ int instrument_option = 0;
+
+ /*
+ * Normally we discard the query's output, but if explaining CREATE TABLE
+ * AS, we'd better use the appropriate tuple receiver.
+ */
+ if (into)
+ dest = CreateIntoRelDestReceiver(into);
+ else
+ dest = None_Receiver;
+
+ if (es->analyze && es->timing)
+ instrument_option |= INSTRUMENT_TIMER;
+ else if (es->analyze)
+ instrument_option |= INSTRUMENT_ROWS;
+
+ if (es->buffers)
+ instrument_option |= INSTRUMENT_BUFFERS;
+ if (es->wal)
+ instrument_option |= INSTRUMENT_WAL;
+
+ /*
+ * Use a snapshot with an updated command ID to ensure this query sees
+ * results of any previously executed queries.
+ */
+ PushCopiedSnapshot(GetActiveSnapshot());
+ UpdateActiveSnapshotCommandId();
+
+ /* Create a QueryDesc for the query */
+ queryDesc = CreateQueryDesc(stmt, cplan, queryString,
+ GetActiveSnapshot(), InvalidSnapshot,
+ dest, params, queryEnv, instrument_option);
+
+ /* Select execution options */
+ if (es->analyze)
+ eflags = 0; /* default run-to-completion flags */
+ else
+ eflags = EXEC_FLAG_EXPLAIN_ONLY;
+ if (es->generic)
+ eflags |= EXEC_FLAG_EXPLAIN_GENERIC;
+ if (into)
+ eflags |= GetIntoRelEFlags(into);
+
+ /* Take locks if using a CachedPlan */
+ if (queryDesc->cplan)
+ eflags |= EXEC_FLAG_GET_LOCKS;
+
+ /*
+ * Call ExecutorStart to prepare the plan for execution. A cached plan
+ * may get invalidated as we're doing that.
+ */
+ ExecutorStart(queryDesc, eflags);
+ if (!queryDesc->plan_valid)
+ {
+ /* Clean up. */
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+ PopActiveSnapshot();
+ return NULL;
+ }
+
+ return queryDesc;
+}
+
/*
* ExplainOneUtility -
* print out the execution plan for one utility statement
@@ -524,29 +608,16 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(QueryDesc *queryDesc,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
{
- DestReceiver *dest;
- QueryDesc *queryDesc;
instr_time starttime;
double totaltime = 0;
- int eflags;
- int instrument_option = 0;
- Assert(plannedstmt->commandType != CMD_UTILITY);
-
- if (es->analyze && es->timing)
- instrument_option |= INSTRUMENT_TIMER;
- else if (es->analyze)
- instrument_option |= INSTRUMENT_ROWS;
-
- if (es->buffers)
- instrument_option |= INSTRUMENT_BUFFERS;
- if (es->wal)
- instrument_option |= INSTRUMENT_WAL;
+ Assert(queryDesc->plannedstmt->commandType != CMD_UTILITY);
/*
* We always collect timing for the entire statement, even when node-level
@@ -555,40 +626,6 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
*/
INSTR_TIME_SET_CURRENT(starttime);
- /*
- * Use a snapshot with an updated command ID to ensure this query sees
- * results of any previously executed queries.
- */
- PushCopiedSnapshot(GetActiveSnapshot());
- UpdateActiveSnapshotCommandId();
-
- /*
- * Normally we discard the query's output, but if explaining CREATE TABLE
- * AS, we'd better use the appropriate tuple receiver.
- */
- if (into)
- dest = CreateIntoRelDestReceiver(into);
- else
- dest = None_Receiver;
-
- /* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, instrument_option);
-
- /* Select execution options */
- if (es->analyze)
- eflags = 0; /* default run-to-completion flags */
- else
- eflags = EXEC_FLAG_EXPLAIN_ONLY;
- if (es->generic)
- eflags |= EXEC_FLAG_EXPLAIN_GENERIC;
- if (into)
- eflags |= GetIntoRelEFlags(into);
-
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, eflags);
-
/* Execute the plan for statistics if asked for */
if (es->analyze)
{
@@ -4862,6 +4899,17 @@ ExplainDummyGroup(const char *objtype, const char *labelname, ExplainState *es)
}
}
+/*
+ * Discard output buffer for a fresh restart.
+ */
+void
+ExplainResetOutput(ExplainState *es)
+{
+ Assert(es->str);
+ resetStringInfo(es->str);
+ ExplainBeginOutput(es);
+}
+
/*
* Emit the start-of-output boilerplate.
*
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 0eabe18335..5a76343123 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -797,11 +797,13 @@ execute_sql_string(const char *sql)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
ExecutorStart(qdesc, 0);
+ Assert(qdesc->plan_valid);
ExecutorRun(qdesc, ForwardScanDirection, 0, true);
ExecutorFinish(qdesc);
ExecutorEnd(qdesc);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index c00b9df3e3..80f2c38b35 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -409,12 +409,13 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, 0);
+ Assert(queryDesc->plan_valid);
/* run the plan */
ExecutorRun(queryDesc, ForwardScanDirection, 0L, true);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 8a3cf98cce..3c34ab4351 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -146,6 +146,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
PortalStart(portal, params, 0, GetActiveSnapshot());
Assert(portal->strategy == PORTAL_ONE_SELECT);
+ Assert(portal->plan_valid);
/*
* We're done; the query won't actually be run until PerformPortalFetch is
@@ -249,6 +250,17 @@ PerformPortalClose(const char *name)
PortalDrop(portal, false);
}
+/*
+ * Release a portal's QueryDesc.
+ */
+void
+PortalQueryFinish(QueryDesc *queryDesc)
+{
+ ExecutorFinish(queryDesc);
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+}
+
/*
* PortalCleanup
*
@@ -295,9 +307,7 @@ PortalCleanup(Portal portal)
if (portal->resowner)
CurrentResourceOwner = portal->resowner;
- ExecutorFinish(queryDesc);
- ExecutorEnd(queryDesc);
- FreeQueryDesc(queryDesc);
+ PortalQueryFinish(queryDesc);
CurrentResourceOwner = saveResourceOwner;
}
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 18f70319fc..c9070ed97f 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -183,6 +183,7 @@ ExecuteQuery(ParseState *pstate,
paramLI = EvaluateParams(pstate, entry, stmt->params, estate);
}
+replan:
/* Create a new portal to run the query in */
portal = CreateNewPortal();
/* Don't display the portal in pg_cursors, it is for internal use only */
@@ -251,10 +252,19 @@ ExecuteQuery(ParseState *pstate,
}
/*
- * Run the portal as appropriate.
+ * Run the portal as appropriate. If the portal contains a cached plan, it
+ * must be recreated if portal->plan_valid is false, which indicates that
+ * the cached plan was invalidated while initializing one of the plan
+ * trees contained in it.
*/
PortalStart(portal, paramLI, eflags, GetActiveSnapshot());
+ if (!portal->plan_valid)
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
+
(void) PortalRun(portal, count, false, true, dest, dest, qc);
PortalDrop(portal, false);
@@ -574,7 +584,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
{
PreparedStatement *entry;
const char *query_string;
- CachedPlan *cplan;
+ CachedPlan *cplan = NULL;
List *plan_list;
ListCell *p;
ParamListInfo paramLI = NULL;
@@ -618,6 +628,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
+replan:
cplan = GetCachedPlan(entry->plansource, paramLI,
CurrentResourceOwner, queryEnv);
@@ -639,8 +650,21 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ {
+ QueryDesc *queryDesc;
+
+ queryDesc = ExplainQueryDesc(pstmt, cplan, queryString,
+ into, es, paramLI, queryEnv);
+ if (queryDesc == NULL)
+ {
+ ExplainResetOutput(es);
+ ReleaseCachedPlan(cplan, CurrentResourceOwner);
+ goto replan;
+ }
+ ExplainOnePlan(queryDesc, into, es, query_string, paramLI,
+ queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
+ }
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 1b007dc32c..0366be9fd6 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -126,11 +126,32 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
* get control when ExecutorStart is called. Such a plugin would
* normally call standard_ExecutorStart().
*
+ * Normally, the plan tree given in queryDesc->plannedstmt is known to be
+ * valid in that *all* relations contained in plannedstmt->relationOids have
+ * already been locked. That may not be the case however if the plannedstmt
+ * comes from a CachedPlan, one given in queryDesc->cplan, in which case only
+ * some of the relations referenced in the plan would have been locked; to
+ * wit, those that AcquirePlannerLocks() deems necessary. Locks necessary
+ * to fully validate such a plan tree, including relations that are added by
+ * the planner, will be taken when initializing the plan tree in InitPlan();
+ * the caller must have set the EXEC_FLAG_GET_LOCKS bit in eflags. If the
+ * CachedPlan gets invalidated as these locks are taken, plan tree
+ * initialization is suspended at the point when such invalidation is first
+ * detected and InitPlan() returns after setting queryDesc->plan_valid to
+ * false. queryDesc->planstate will then point to a potentially partially
+ * initialized PlanState tree. Callers must retry the execution with a
+ * freshly created CachedPlan in that case, after properly freeing the
+ * partially valid QueryDesc.
* ----------------------------------------------------------------
*/
void
ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ /* Take locks if the plan tree comes from a CachedPlan. */
+ Assert(queryDesc->cplan == NULL ||
+ (CachedPlanStillValid(queryDesc->cplan) &&
+ (eflags & EXEC_FLAG_GET_LOCKS) != 0));
+
/*
* In some cases (e.g. an EXECUTE statement) a query execution will skip
* parse analysis, which means that the query_id won't be reported. Note
@@ -582,6 +603,16 @@ ExecCheckPermissions(List *rangeTable, List *rteperminfos,
RTEPermissionInfo *perminfo = lfirst_node(RTEPermissionInfo, l);
Assert(OidIsValid(perminfo->relid));
+
+ /*
+ * Relations whose permissions need to be checked must already have
+ * been locked by the parser or by AcquirePlannerLocks() if a
+ * cached plan is being executed.
+ * XXX shouldn't we skip calling ExecCheckPermissions from InitPlan
+ * in a parallel worker?
+ */
+ Assert(CheckRelLockedByMe(perminfo->relid, AccessShareLock, true) ||
+ IsParallelWorker());
result = ExecCheckOneRelPerms(perminfo);
if (!result)
{
@@ -785,12 +816,19 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
PreventCommandIfParallelMode(CreateCommandName((Node *) plannedstmt));
}
-
/* ----------------------------------------------------------------
* InitPlan
*
* Initializes the query plan: open files, allocate storage
* and start up the rule manager
+ *
+ * If queryDesc contains a CachedPlan, this takes locks on relations.
+ * If any of those relations have undergone concurrent schema changes
+ * between successfully performing RevalidateCachedQuery() on the
+ * containing CachedPlanSource and here, locking those relations would
+ * invalidate the CachedPlan by way of PlanCacheRelCallback(). In that
+ * case, queryDesc->plan_valid would be set to false to tell the caller
+ * to retry after creating a new CachedPlan.
* ----------------------------------------------------------------
*/
static void
@@ -801,20 +839,32 @@ InitPlan(QueryDesc *queryDesc, int eflags)
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
- PlanState *planstate;
+ PlanState *planstate = NULL;
TupleDesc tupType;
ListCell *l;
int i;
/*
- * Do permissions checks
+ * Set up range table in EState.
*/
- ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
+ ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
+
+ /* Make sure ExecPlanStillValid() can work. */
+ estate->es_cachedplan = queryDesc->cplan;
/*
- * initialize the node's execution state
+ * Lock any views that were mentioned in the query if needed. View
+ * relations must be locked separately like this, because they are not
+ * referenced in the plan tree.
*/
- ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
+ ExecLockViewRelations(plannedstmt->viewRelations, estate);
+ if (!ExecPlanStillValid(estate))
+ goto failed;
+
+ /*
+ * Do permissions checks
+ */
+ ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
estate->es_plannedstmt = plannedstmt;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
@@ -849,6 +899,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
case ROW_MARK_KEYSHARE:
case ROW_MARK_REFERENCE:
relation = ExecGetRangeTableRelation(estate, rc->rti);
+ if (!ExecPlanStillValid(estate))
+ goto failed;
break;
case ROW_MARK_COPY:
/* no physical table access is required */
@@ -919,6 +971,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_subplanstates = lappend(estate->es_subplanstates,
subplanstate);
+ if (!ExecPlanStillValid(estate))
+ goto failed;
i++;
}
@@ -929,6 +983,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
* processing tuples.
*/
planstate = ExecInitNode(plan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ goto failed;
/*
* Get the tuple descriptor describing the type of tuples to return.
@@ -972,6 +1028,17 @@ InitPlan(QueryDesc *queryDesc, int eflags)
queryDesc->tupDesc = tupType;
queryDesc->planstate = planstate;
+ queryDesc->plan_valid = true;
+ return;
+
+failed:
+ /*
+ * Plan initialization failed. Mark QueryDesc as such. Note that we do
+ * set planstate, even if it may only be partially initialized, so that
+ * ExecEndPlan() can process it.
+ */
+ queryDesc->planstate = planstate;
+ queryDesc->plan_valid = false;
}
/*
@@ -1389,7 +1456,7 @@ ExecGetAncestorResultRels(EState *estate, ResultRelInfo *resultRelInfo)
/*
* All ancestors up to the root target relation must have been
- * locked by the planner or AcquireExecutorLocks().
+ * locked.
*/
ancRel = table_open(ancOid, NoLock);
rInfo = makeNode(ResultRelInfo);
@@ -2797,7 +2864,8 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
* Child EPQ EStates share the parent's copy of unchanging state such as
* the snapshot, rangetable, and external Param info. They need their own
* copies of local state, including a tuple table, es_param_exec_vals,
- * result-rel info, etc.
+ * result-rel info, etc. Also, we don't pass the parent's copy of the
+ * CachedPlan, because no new locks will be taken for EvalPlanQual().
*/
rcestate->es_direction = ForwardScanDirection;
rcestate->es_snapshot = parentestate->es_snapshot;
@@ -2884,6 +2952,7 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
PlanState *subplanstate;
subplanstate = ExecInitNode(subplan, rcestate, 0);
+ Assert(ExecPlanStillValid(rcestate));
rcestate->es_subplanstates = lappend(rcestate->es_subplanstates,
subplanstate);
}
@@ -2937,6 +3006,10 @@ EvalPlanQualEnd(EPQState *epqstate)
MemoryContext oldcontext;
ListCell *l;
+ /* Nothing to do if EvalPlanQualInit() wasn't done to begin with. */
+ if (epqstate->parentestate == NULL)
+ return;
+
rtsize = epqstate->parentestate->es_range_table_size;
/*
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index aa3f283453..df4cc5ddaf 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1249,8 +1249,13 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
- /* Create a QueryDesc for the query. */
+ /*
+ * Create a QueryDesc for the query. Note that no CachedPlan is available
+ * here even if the containing plan tree may have come from one in the
+ * leader.
+ */
return CreateQueryDesc(pstmt,
+ NULL,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
@@ -1432,6 +1437,7 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Start up the executor */
queryDesc->plannedstmt->jitFlags = fpes->jit_flags;
ExecutorStart(queryDesc, fpes->eflags);
+ Assert(queryDesc->plan_valid);
/* Special executor initialization steps for parallel workers */
queryDesc->planstate->state->es_query_dsa = area;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 9799968a42..3425ffcca7 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -513,6 +513,12 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
oldcxt = MemoryContextSwitchTo(proute->memcxt);
+ /*
+ * Note that while we must check ExecPlanStillValid() for other locks taken
+ * during execution initialization, it is OK not to do so for partitions
+ * opened like this, for tuple routing, because taking these locks cannot
+ * invalidate the plan.
+ */
partrel = table_open(partOid, RowExclusiveLock);
leaf_part_rri = makeNode(ResultRelInfo);
@@ -1111,6 +1117,11 @@ ExecInitPartitionDispatchInfo(EState *estate,
* Only sub-partitioned tables need to be locked here. The root
* partitioned table will already have been locked as it's referenced in
* the query's rtable.
+ *
+ * Note that while we must check ExecPlanStillValid() for other locks taken
+ * during execution initialization, it is OK not to do so for partitions
+ * opened like this, for tuple routing, because taking these locks cannot
+ * invalidate the plan.
*/
if (partoid != RelationGetRelid(proute->partition_root))
rel = table_open(partoid, RowExclusiveLock);
@@ -1817,6 +1828,8 @@ ExecInitPartitionPruning(PlanState *planstate,
/* Create the working data structure for pruning */
prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Perform an initial partition prune pass, if required.
@@ -1943,6 +1956,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* duration of this executor run.
*/
partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
partkey = RelationGetPartitionKey(partrel);
partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
partrel);
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 4d288bc8d4..6f3c37b6fd 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -388,6 +388,9 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
break;
}
+ if (!ExecPlanStillValid(estate))
+ return result;
+
ExecSetExecProcNode(result, result->ExecProcNode);
/*
@@ -403,6 +406,12 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
Assert(IsA(subplan, SubPlan));
sstate = ExecInitSubPlan(subplan, result);
subps = lappend(subps, sstate);
+ if (!ExecPlanStillValid(estate))
+ {
+ /* Don't lose track of those initialized. */
+ result->initPlan = subps;
+ return result;
+ }
}
result->initPlan = subps;
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 012dbb6965..a485e7dfc5 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -804,7 +804,8 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rte->rtekind == RTE_RELATION);
- if (!IsParallelWorker())
+ if (!IsParallelWorker() &&
+ (estate->es_top_eflags & EXEC_FLAG_GET_LOCKS) == 0)
{
/*
* In a normal query, we should already have the appropriate lock,
@@ -833,6 +834,61 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
return rel;
}
+/*
+ * ExecLockViewRelations
+ * Lock view relations, if any, in a given query
+ */
+void
+ExecLockViewRelations(List *viewRelations, EState *estate)
+{
+ ListCell *lc;
+
+ /* Nothing to do if no locks need to be taken. */
+ if ((estate->es_top_eflags & EXEC_FLAG_GET_LOCKS) == 0)
+ return;
+
+ foreach(lc, viewRelations)
+ {
+ Index rti = lfirst_int(lc);
+ RangeTblEntry *rte = exec_rt_fetch(rti, estate);
+
+ Assert(OidIsValid(rte->relid));
+ Assert(rte->relkind == RELKIND_VIEW);
+ Assert(rte->rellockmode != NoLock);
+
+ LockRelationOid(rte->relid, rte->rellockmode);
+ }
+}
+
+/*
+ * ExecLockAppendNonLeafRelations
+ * Lock non-leaf relations whose children are scanned by a given
+ * Append/MergeAppend node
+ */
+void
+ExecLockAppendNonLeafRelations(EState *estate, List *allpartrelids)
+{
+ ListCell *l;
+
+ /* Nothing to do if no locks need to be taken. */
+ if ((estate->es_top_eflags & EXEC_FLAG_GET_LOCKS) == 0)
+ return;
+
+ foreach(l, allpartrelids)
+ {
+ Bitmapset *partrelids = lfirst_node(Bitmapset, l);
+ int i;
+
+ i = -1;
+ while ((i = bms_next_member(partrelids, i)) > 0)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(i, estate);
+
+ LockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
+}
+
/*
* ExecInitResultRelation
* Open relation given by the passed-in RT index and fill its
@@ -848,6 +904,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
Relation resultRelationDesc;
resultRelationDesc = ExecGetRangeTableRelation(estate, rti);
+ if (!ExecPlanStillValid(estate))
+ return;
InitResultRelInfo(resultRelInfo,
resultRelationDesc,
rti,
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index f55424eb5a..c88f72bc4e 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -838,6 +838,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest = None_Receiver;
es->qd = CreateQueryDesc(es->stmt,
+ NULL, /* fmgr_sql() doesn't use CachedPlans */
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -863,6 +864,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
eflags = 0; /* default run-to-completion flags */
ExecutorStart(es->qd, eflags);
+ Assert(es->qd->plan_valid);
}
es->status = F_EXEC_RUN;
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 19342a420c..06e0d7d149 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -3134,15 +3134,18 @@ hashagg_reset_spill_state(AggState *aggstate)
{
HashAggSpill *spill = &aggstate->hash_spills[setno];
- pfree(spill->ntuples);
- pfree(spill->partitions);
+ if (spill->ntuples)
+ pfree(spill->ntuples);
+ if (spill->partitions)
+ pfree(spill->partitions);
}
pfree(aggstate->hash_spills);
aggstate->hash_spills = NULL;
}
/* free batches */
- list_free_deep(aggstate->hash_batches);
+ if (aggstate->hash_batches)
+ list_free_deep(aggstate->hash_batches);
aggstate->hash_batches = NIL;
/* close tape set */
@@ -3296,6 +3299,8 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
eflags &= ~EXEC_FLAG_REWIND;
outerPlan = outerPlan(node);
outerPlanState(aggstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return aggstate;
/*
* initialize source tuple type.
@@ -4336,10 +4341,13 @@ ExecEndAgg(AggState *node)
{
AggStatePerTrans pertrans = &node->pertrans[transno];
- for (setno = 0; setno < numGroupingSets; setno++)
+ if (pertrans)
{
- if (pertrans->sortstates[setno])
- tuplesort_end(pertrans->sortstates[setno]);
+ for (setno = 0; setno < numGroupingSets; setno++)
+ {
+ if (pertrans->sortstates[setno])
+ tuplesort_end(pertrans->sortstates[setno]);
+ }
}
}
@@ -4357,7 +4365,8 @@ ExecEndAgg(AggState *node)
ExecFreeExprContext(&node->ss.ps);
/* clean up tuple table */
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
outerPlan = outerPlanState(node);
ExecEndNode(outerPlan);
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index c185b11c67..091f979c46 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -109,10 +109,11 @@ AppendState *
ExecInitAppend(Append *node, EState *estate, int eflags)
{
AppendState *appendstate = makeNode(AppendState);
- PlanState **appendplanstates;
+ PlanState **appendplanstates = NULL;
Bitmapset *validsubplans;
Bitmapset *asyncplans;
int nplans;
+ int ninited = 0;
int nasyncplans;
int firstvalid;
int i,
@@ -133,6 +134,15 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_syncdone = false;
appendstate->as_begun = false;
+ /*
+ * Lock non-leaf partitions. In the pruning case, some of these locks
+ * will be retaken when the partition is opened for pruning, but it
+ * does not seem worthwhile to spend cycles to filter those out here.
+ */
+ ExecLockAppendNonLeafRelations(estate, node->allpartrelids);
+ if (!ExecPlanStillValid(estate))
+ goto early_exit;
+
/* If run-time partition pruning is enabled, then set that up now */
if (node->part_prune_index >= 0)
{
@@ -148,6 +158,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
node->part_prune_index,
node->apprelids,
&validsubplans);
+ if (!ExecPlanStillValid(estate))
+ goto early_exit;
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
@@ -222,11 +234,12 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
firstvalid = j;
appendplanstates[j++] = ExecInitNode(initNode, estate, eflags);
+ ninited++;
+ if (!ExecPlanStillValid(estate))
+ goto early_exit;
}
appendstate->as_first_partial_plan = firstvalid;
- appendstate->appendplans = appendplanstates;
- appendstate->as_nplans = nplans;
/* Initialize async state */
appendstate->as_asyncplans = asyncplans;
@@ -276,6 +289,10 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
/* For parallel query, this will be overridden later. */
appendstate->choose_next_subplan = choose_next_subplan_locally;
+early_exit:
+ appendstate->appendplans = appendplanstates;
+ appendstate->as_nplans = ninited;
+
return appendstate;
}
diff --git a/src/backend/executor/nodeBitmapAnd.c b/src/backend/executor/nodeBitmapAnd.c
index 4c5eb2b23b..acc6c50e20 100644
--- a/src/backend/executor/nodeBitmapAnd.c
+++ b/src/backend/executor/nodeBitmapAnd.c
@@ -57,6 +57,7 @@ ExecInitBitmapAnd(BitmapAnd *node, EState *estate, int eflags)
BitmapAndState *bitmapandstate = makeNode(BitmapAndState);
PlanState **bitmapplanstates;
int nplans;
+ int ninited = 0;
int i;
ListCell *l;
Plan *initNode;
@@ -77,8 +78,6 @@ ExecInitBitmapAnd(BitmapAnd *node, EState *estate, int eflags)
bitmapandstate->ps.plan = (Plan *) node;
bitmapandstate->ps.state = estate;
bitmapandstate->ps.ExecProcNode = ExecBitmapAnd;
- bitmapandstate->bitmapplans = bitmapplanstates;
- bitmapandstate->nplans = nplans;
/*
* call ExecInitNode on each of the plans to be executed and save the
@@ -89,6 +88,9 @@ ExecInitBitmapAnd(BitmapAnd *node, EState *estate, int eflags)
{
initNode = (Plan *) lfirst(l);
bitmapplanstates[i] = ExecInitNode(initNode, estate, eflags);
+ ninited++;
+ if (!ExecPlanStillValid(estate))
+ goto early_exit;
i++;
}
@@ -99,6 +101,10 @@ ExecInitBitmapAnd(BitmapAnd *node, EState *estate, int eflags)
* ExecQual or ExecProject. They don't need any tuple slots either.
*/
+early_exit:
+ bitmapandstate->bitmapplans = bitmapplanstates;
+ bitmapandstate->nplans = ninited;
+
return bitmapandstate;
}
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index f35df0b8bf..e6a689eefb 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -665,7 +665,8 @@ ExecEndBitmapHeapScan(BitmapHeapScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/*
* close down subplans
@@ -693,7 +694,8 @@ ExecEndBitmapHeapScan(BitmapHeapScanState *node)
/*
* close heap scan
*/
- table_endscan(scanDesc);
+ if (scanDesc)
+ table_endscan(scanDesc);
}
/* ----------------------------------------------------------------
@@ -763,11 +765,15 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return scanstate;
/*
* initialize child nodes
*/
outerPlanState(scanstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return scanstate;
/*
* get the scan type from the relation descriptor.
diff --git a/src/backend/executor/nodeBitmapIndexscan.c b/src/backend/executor/nodeBitmapIndexscan.c
index 83ec9ede89..cc8332ef68 100644
--- a/src/backend/executor/nodeBitmapIndexscan.c
+++ b/src/backend/executor/nodeBitmapIndexscan.c
@@ -263,6 +263,8 @@ ExecInitBitmapIndexScan(BitmapIndexScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
indexstate->biss_RelationDesc = index_open(node->indexid, lockmode);
+ if (!ExecPlanStillValid(estate))
+ return indexstate;
/*
* Initialize index-specific scan state
diff --git a/src/backend/executor/nodeBitmapOr.c b/src/backend/executor/nodeBitmapOr.c
index 0bf8af9652..babad1b4b2 100644
--- a/src/backend/executor/nodeBitmapOr.c
+++ b/src/backend/executor/nodeBitmapOr.c
@@ -58,6 +58,7 @@ ExecInitBitmapOr(BitmapOr *node, EState *estate, int eflags)
BitmapOrState *bitmaporstate = makeNode(BitmapOrState);
PlanState **bitmapplanstates;
int nplans;
+ int ninited = 0;
int i;
ListCell *l;
Plan *initNode;
@@ -78,8 +79,6 @@ ExecInitBitmapOr(BitmapOr *node, EState *estate, int eflags)
bitmaporstate->ps.plan = (Plan *) node;
bitmaporstate->ps.state = estate;
bitmaporstate->ps.ExecProcNode = ExecBitmapOr;
- bitmaporstate->bitmapplans = bitmapplanstates;
- bitmaporstate->nplans = nplans;
/*
* call ExecInitNode on each of the plans to be executed and save the
@@ -90,6 +89,9 @@ ExecInitBitmapOr(BitmapOr *node, EState *estate, int eflags)
{
initNode = (Plan *) lfirst(l);
bitmapplanstates[i] = ExecInitNode(initNode, estate, eflags);
+ ninited++;
+ if (!ExecPlanStillValid(estate))
+ goto early_exit;
i++;
}
@@ -100,6 +102,10 @@ ExecInitBitmapOr(BitmapOr *node, EState *estate, int eflags)
* ExecQual or ExecProject. They don't need any tuple slots either.
*/
+early_exit:
+ bitmaporstate->bitmapplans = bitmapplanstates;
+ bitmaporstate->nplans = ninited;
+
return bitmaporstate;
}
diff --git a/src/backend/executor/nodeCtescan.c b/src/backend/executor/nodeCtescan.c
index cc4c4243e2..eed5b75a4f 100644
--- a/src/backend/executor/nodeCtescan.c
+++ b/src/backend/executor/nodeCtescan.c
@@ -297,14 +297,16 @@ ExecEndCteScan(CteScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/*
* If I am the leader, free the tuplestore.
*/
if (node->leader == node)
{
- tuplestore_end(node->cte_table);
+ if (node->cte_table)
+ tuplestore_end(node->cte_table);
node->cte_table = NULL;
}
}
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
index bd42c65b29..b03499fae5 100644
--- a/src/backend/executor/nodeCustom.c
+++ b/src/backend/executor/nodeCustom.c
@@ -61,6 +61,8 @@ ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
if (scanrelid > 0)
{
scan_rel = ExecOpenScanRelation(estate, scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return css;
css->ss.ss_currentRelation = scan_rel;
}
@@ -127,6 +129,10 @@ ExecCustomScan(PlanState *pstate)
void
ExecEndCustomScan(CustomScanState *node)
{
+ /*
+ * XXX - BeginCustomScan() may not have occurred if ExecInitCustomScan()
+ * hit the early exit case.
+ */
Assert(node->methods->EndCustomScan != NULL);
node->methods->EndCustomScan(node);
@@ -134,8 +140,10 @@ ExecEndCustomScan(CustomScanState *node)
ExecFreeExprContext(&node->ss.ps);
/* Clean out the tuple table */
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ps.ps_ResultTupleSlot)
+ ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
void
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index c2139acca0..d3f0a65485 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -173,6 +173,8 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
if (scanrelid > 0)
{
currentRelation = ExecOpenScanRelation(estate, scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return scanstate;
scanstate->ss.ss_currentRelation = currentRelation;
fdwroutine = GetFdwRoutineForRelation(currentRelation, true);
}
@@ -264,6 +266,8 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
if (outerPlan(node))
outerPlanState(scanstate) =
ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return scanstate;
/*
* Tell the FDW to initialize the scan.
@@ -300,14 +304,17 @@ ExecEndForeignScan(ForeignScanState *node)
ForeignScan *plan = (ForeignScan *) node->ss.ps.plan;
EState *estate = node->ss.ps.state;
- /* Let the FDW shut down */
- if (plan->operation != CMD_SELECT)
+ /* Let the FDW shut down if needed. */
+ if (node->fdw_state)
{
- if (estate->es_epq_active == NULL)
- node->fdwroutine->EndDirectModify(node);
+ if (plan->operation != CMD_SELECT)
+ {
+ if (estate->es_epq_active == NULL)
+ node->fdwroutine->EndDirectModify(node);
+ }
+ else
+ node->fdwroutine->EndForeignScan(node);
}
- else
- node->fdwroutine->EndForeignScan(node);
/* Shut down any outer plan. */
if (outerPlanState(node))
@@ -319,7 +326,8 @@ ExecEndForeignScan(ForeignScanState *node)
/* clean out the tuple table */
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeFunctionscan.c b/src/backend/executor/nodeFunctionscan.c
index dd06ef8aee..792ecda4a9 100644
--- a/src/backend/executor/nodeFunctionscan.c
+++ b/src/backend/executor/nodeFunctionscan.c
@@ -533,7 +533,8 @@ ExecEndFunctionScan(FunctionScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/*
* Release slots and tuplestore resources
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 307fc10eea..365d3af3e4 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -89,6 +89,8 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
*/
outerNode = outerPlan(node);
outerPlanState(gatherstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return gatherstate;
tupDesc = ExecGetResultType(outerPlanState(gatherstate));
/*
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 9d5e1a46e9..8d2809f079 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -108,6 +108,8 @@ ExecInitGatherMerge(GatherMerge *node, EState *estate, int eflags)
*/
outerNode = outerPlan(node);
outerPlanState(gm_state) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return gm_state;
/*
* Leader may access ExecProcNode result directly (if
diff --git a/src/backend/executor/nodeGroup.c b/src/backend/executor/nodeGroup.c
index 25a1618952..e0832bb778 100644
--- a/src/backend/executor/nodeGroup.c
+++ b/src/backend/executor/nodeGroup.c
@@ -185,6 +185,8 @@ ExecInitGroup(Group *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(grpstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return grpstate;
/*
* Initialize scan slot and type.
@@ -231,7 +233,8 @@ ExecEndGroup(GroupState *node)
ExecFreeExprContext(&node->ss.ps);
/* clean up tuple table */
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
outerPlan = outerPlanState(node);
ExecEndNode(outerPlan);
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 748c9b0024..891bcee919 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -386,6 +386,8 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(hashstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return hashstate;
/*
* initialize our result slot and type. No need to build projection
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index f189fb4d28..93ce0c8be0 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -691,8 +691,12 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
hashNode = (Hash *) innerPlan(node);
outerPlanState(hjstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return hjstate;
outerDesc = ExecGetResultType(outerPlanState(hjstate));
innerPlanState(hjstate) = ExecInitNode((Plan *) hashNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return hjstate;
innerDesc = ExecGetResultType(innerPlanState(hjstate));
/*
@@ -813,9 +817,12 @@ ExecEndHashJoin(HashJoinState *node)
/*
* clean out the tuple table
*/
- ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->hj_OuterTupleSlot);
- ExecClearTuple(node->hj_HashTupleSlot);
+ if (node->js.ps.ps_ResultTupleSlot)
+ ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
+ if (node->hj_OuterTupleSlot)
+ ExecClearTuple(node->hj_OuterTupleSlot);
+ if (node->hj_HashTupleSlot)
+ ExecClearTuple(node->hj_HashTupleSlot);
/*
* clean up subtrees
diff --git a/src/backend/executor/nodeIncrementalSort.c b/src/backend/executor/nodeIncrementalSort.c
index 12bc22f33c..6b2da56044 100644
--- a/src/backend/executor/nodeIncrementalSort.c
+++ b/src/backend/executor/nodeIncrementalSort.c
@@ -1041,6 +1041,8 @@ ExecInitIncrementalSort(IncrementalSort *node, EState *estate, int eflags)
* nodes may be able to do something more useful.
*/
outerPlanState(incrsortstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return incrsortstate;
/*
* Initialize scan slot and type.
@@ -1080,12 +1082,16 @@ ExecEndIncrementalSort(IncrementalSortState *node)
SO_printf("ExecEndIncrementalSort: shutting down sort node\n");
/* clean out the scan tuple */
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/* must drop pointer to sort result tuple */
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+ if (node->ss.ps.ps_ResultTupleSlot)
+ ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
/* must drop standalone tuple slots from outer node */
- ExecDropSingleTupleTableSlot(node->group_pivot);
- ExecDropSingleTupleTableSlot(node->transfer_tuple);
+ if (node->group_pivot)
+ ExecDropSingleTupleTableSlot(node->group_pivot);
+ if (node->transfer_tuple)
+ ExecDropSingleTupleTableSlot(node->transfer_tuple);
/*
* Release tuplesort resources.
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 0b43a9b969..b60a086464 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -394,7 +394,8 @@ ExecEndIndexOnlyScan(IndexOnlyScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/*
* close the index relation (no-op if we didn't open it)
@@ -512,6 +513,8 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return indexstate;
indexstate->ss.ss_currentRelation = currentRelation;
indexstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
@@ -565,6 +568,8 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
indexstate->ioss_RelationDesc = index_open(node->indexid, lockmode);
+ if (!ExecPlanStillValid(estate))
+ return indexstate;
/*
* Initialize index-specific scan state
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 4540c7781d..628c233919 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -808,7 +808,8 @@ ExecEndIndexScan(IndexScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/*
* close the index relation (no-op if we didn't open it)
@@ -925,6 +926,8 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return indexstate;
indexstate->ss.ss_currentRelation = currentRelation;
indexstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
@@ -970,6 +973,8 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
indexstate->iss_RelationDesc = index_open(node->indexid, lockmode);
+ if (!ExecPlanStillValid(estate))
+ return indexstate;
/*
* Initialize index-specific scan state
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index 425fbfc405..2fcbde74ed 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -476,6 +476,8 @@ ExecInitLimit(Limit *node, EState *estate, int eflags)
*/
outerPlan = outerPlan(node);
outerPlanState(limitstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return limitstate;
/*
* initialize child expressions
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 407414fc0c..3a8aa2b5a4 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -323,6 +323,8 @@ ExecInitLockRows(LockRows *node, EState *estate, int eflags)
* then initialize outer plan
*/
outerPlanState(lrstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return lrstate;
/* node returns unmodified slots from the outer plan */
lrstate->ps.resultopsset = true;
diff --git a/src/backend/executor/nodeMaterial.c b/src/backend/executor/nodeMaterial.c
index 09632678b0..f146ebb1d7 100644
--- a/src/backend/executor/nodeMaterial.c
+++ b/src/backend/executor/nodeMaterial.c
@@ -214,6 +214,8 @@ ExecInitMaterial(Material *node, EState *estate, int eflags)
outerPlan = outerPlan(node);
outerPlanState(matstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return matstate;
/*
* Initialize result type and slot. No need to initialize projection info
@@ -242,7 +244,8 @@ ExecEndMaterial(MaterialState *node)
/*
* clean out the tuple table
*/
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/*
* Release tuplestore resources
diff --git a/src/backend/executor/nodeMemoize.c b/src/backend/executor/nodeMemoize.c
index 4f04269e26..3003ee1e5c 100644
--- a/src/backend/executor/nodeMemoize.c
+++ b/src/backend/executor/nodeMemoize.c
@@ -938,6 +938,8 @@ ExecInitMemoize(Memoize *node, EState *estate, int eflags)
outerNode = outerPlan(node);
outerPlanState(mstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return mstate;
/*
* Initialize return slot and type. No need to initialize projection info
@@ -1043,6 +1045,7 @@ ExecEndMemoize(MemoizeState *node)
{
#ifdef USE_ASSERT_CHECKING
/* Validate the memory accounting code is correct in assert builds. */
+ if (node->hashtable)
{
int count;
uint64 mem = 0;
@@ -1089,11 +1092,14 @@ ExecEndMemoize(MemoizeState *node)
}
/* Remove the cache context */
- MemoryContextDelete(node->tableContext);
+ if (node->tableContext)
+ MemoryContextDelete(node->tableContext);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/* must drop pointer to cache result tuple */
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+ if (node->ss.ps.ps_ResultTupleSlot)
+ ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
/*
* free exprcontext
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 399b39c598..40bba35499 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -65,9 +65,10 @@ MergeAppendState *
ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
{
MergeAppendState *mergestate = makeNode(MergeAppendState);
- PlanState **mergeplanstates;
+ PlanState **mergeplanstates = NULL;
Bitmapset *validsubplans;
int nplans;
+ int ninited = 0;
int i,
j;
@@ -81,6 +82,15 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.state = estate;
mergestate->ps.ExecProcNode = ExecMergeAppend;
+ /*
+ * Lock non-leaf partitions. In the pruning case, some of these locks
+ * will be retaken when the partition is opened for pruning, but it
+ * does not seem worthwhile to spend cycles to filter those out here.
+ */
+ ExecLockAppendNonLeafRelations(estate, node->allpartrelids);
+ if (!ExecPlanStillValid(estate))
+ goto early_exit;
+
/* If run-time partition pruning is enabled, then set that up now */
if (node->part_prune_index >= 0)
{
@@ -96,6 +106,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
node->part_prune_index,
node->apprelids,
&validsubplans);
+ if (!ExecPlanStillValid(estate))
+ return mergestate;
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
@@ -122,8 +134,6 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
}
mergeplanstates = (PlanState **) palloc(nplans * sizeof(PlanState *));
- mergestate->mergeplans = mergeplanstates;
- mergestate->ms_nplans = nplans;
mergestate->ms_slots = (TupleTableSlot **) palloc0(sizeof(TupleTableSlot *) * nplans);
mergestate->ms_heap = binaryheap_allocate(nplans, heap_compare_slots,
@@ -152,6 +162,9 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
Plan *initNode = (Plan *) list_nth(node->mergeplans, i);
mergeplanstates[j++] = ExecInitNode(initNode, estate, eflags);
+ ninited++;
+ if (!ExecPlanStillValid(estate))
+ goto early_exit;
}
mergestate->ps.ps_ProjInfo = NULL;
@@ -188,6 +201,10 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
*/
mergestate->ms_initialized = false;
+early_exit:
+ mergestate->mergeplans = mergeplanstates;
+ mergestate->ms_nplans = ninited;
+
return mergestate;
}
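
The ExecInit* changes above and below all follow one convention: every step
that can take a new lock -- initializing a child via ExecInitNode(), opening a
scan relation, setting up run-time pruning -- can invalidate the CachedPlan,
so the function checks ExecPlanStillValid() immediately afterwards and returns
the partially built node if the plan has gone stale. A minimal sketch of the
convention, where Foo/FooState are placeholders rather than nodes from this
patch:

    FooState *
    ExecInitFoo(Foo *node, EState *estate, int eflags)
    {
        FooState *fstate = makeNode(FooState);

        fstate->ps.plan = (Plan *) node;
        fstate->ps.state = estate;

        /* Initializing the child may lock relations and fire invalidations. */
        outerPlanState(fstate) = ExecInitNode(outerPlan(node), estate, eflags);
        if (!ExecPlanStillValid(estate))
            return fstate;      /* leave the remaining fields unset */

        /* ... remaining, possibly lock-taking, initialization ... */

        return fstate;
    }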
diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 809aa215c6..968be05568 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -1482,11 +1482,15 @@ ExecInitMergeJoin(MergeJoin *node, EState *estate, int eflags)
mergestate->mj_SkipMarkRestore = node->skip_mark_restore;
outerPlanState(mergestate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return mergestate;
outerDesc = ExecGetResultType(outerPlanState(mergestate));
innerPlanState(mergestate) = ExecInitNode(innerPlan(node), estate,
mergestate->mj_SkipMarkRestore ?
eflags :
(eflags | EXEC_FLAG_MARK));
+ if (!ExecPlanStillValid(estate))
+ return mergestate;
innerDesc = ExecGetResultType(innerPlanState(mergestate));
/*
@@ -1642,8 +1646,10 @@ ExecEndMergeJoin(MergeJoinState *node)
/*
* clean out the tuple table
*/
- ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->mj_MarkedTupleSlot);
+ if (node->js.ps.ps_ResultTupleSlot)
+ ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
+ if (node->mj_MarkedTupleSlot)
+ ExecClearTuple(node->mj_MarkedTupleSlot);
/*
* shut down the subplans
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index e350375681..8a70543326 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -3900,6 +3900,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
Plan *subplan = outerPlan(node);
CmdType operation = node->operation;
int nrels = list_length(node->resultRelations);
+ int ninited = 0;
ResultRelInfo *resultRelInfo;
List *arowmarks;
ListCell *l;
@@ -3921,7 +3922,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
mtstate->canSetTag = node->canSetTag;
mtstate->mt_done = false;
- mtstate->mt_nrels = nrels;
mtstate->resultRelInfo = (ResultRelInfo *)
palloc(nrels * sizeof(ResultRelInfo));
@@ -3956,6 +3956,9 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
linitial_int(node->resultRelations));
}
+ if (!ExecPlanStillValid(estate))
+ goto early_exit;
+
/* set up epqstate with dummy subplan data for the moment */
EvalPlanQualInit(&mtstate->mt_epqstate, estate, NULL, NIL, node->epqParam);
mtstate->fireBSTriggers = true;
@@ -3982,6 +3985,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
if (resultRelInfo != mtstate->rootResultRelInfo)
{
ExecInitResultRelation(estate, resultRelInfo, resultRelation);
+ if (!ExecPlanStillValid(estate))
+ goto early_exit;
/*
* For child result relations, store the root result relation
@@ -4009,11 +4014,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* Now we may initialize the subplan.
*/
outerPlanState(mtstate) = ExecInitNode(subplan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ goto early_exit;
/*
* Do additional per-result-relation initialization.
*/
- for (i = 0; i < nrels; i++)
+ for (i = 0; i < nrels; i++, ninited++)
{
resultRelInfo = &mtstate->resultRelInfo[i];
@@ -4362,6 +4369,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
estate->es_auxmodifytables = lcons(mtstate,
estate->es_auxmodifytables);
+early_exit:
+ mtstate->mt_nrels = ninited;
return mtstate;
}
diff --git a/src/backend/executor/nodeNamedtuplestorescan.c b/src/backend/executor/nodeNamedtuplestorescan.c
index 46832ad82f..1f92c43d3b 100644
--- a/src/backend/executor/nodeNamedtuplestorescan.c
+++ b/src/backend/executor/nodeNamedtuplestorescan.c
@@ -174,7 +174,8 @@ ExecEndNamedTuplestoreScan(NamedTuplestoreScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index b3d52e69ec..deda0c2559 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -295,11 +295,15 @@ ExecInitNestLoop(NestLoop *node, EState *estate, int eflags)
* values.
*/
outerPlanState(nlstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return nlstate;
if (node->nestParams == NIL)
eflags |= EXEC_FLAG_REWIND;
else
eflags &= ~EXEC_FLAG_REWIND;
innerPlanState(nlstate) = ExecInitNode(innerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return nlstate;
/*
* Initialize result slot, type and projection.
@@ -372,7 +376,8 @@ ExecEndNestLoop(NestLoopState *node)
/*
* clean out the tuple table
*/
- ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
+ if (node->js.ps.ps_ResultTupleSlot)
+ ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
/*
* close down subplans
diff --git a/src/backend/executor/nodeProjectSet.c b/src/backend/executor/nodeProjectSet.c
index f6ff3dc44c..85d20c4680 100644
--- a/src/backend/executor/nodeProjectSet.c
+++ b/src/backend/executor/nodeProjectSet.c
@@ -247,6 +247,8 @@ ExecInitProjectSet(ProjectSet *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(state) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return state;
/*
* we don't use inner plan
@@ -328,7 +330,8 @@ ExecEndProjectSet(ProjectSetState *node)
/*
* clean out the tuple table
*/
- ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ if (node->ps.ps_ResultTupleSlot)
+ ExecClearTuple(node->ps.ps_ResultTupleSlot);
/*
* shut down subplans
diff --git a/src/backend/executor/nodeRecursiveunion.c b/src/backend/executor/nodeRecursiveunion.c
index e781003934..967fe4f287 100644
--- a/src/backend/executor/nodeRecursiveunion.c
+++ b/src/backend/executor/nodeRecursiveunion.c
@@ -244,7 +244,11 @@ ExecInitRecursiveUnion(RecursiveUnion *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(rustate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return rustate;
innerPlanState(rustate) = ExecInitNode(innerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return rustate;
/*
* If hashing, precompute fmgr lookup data for inner loop, and create the
diff --git a/src/backend/executor/nodeResult.c b/src/backend/executor/nodeResult.c
index 4219712d30..c549b684a3 100644
--- a/src/backend/executor/nodeResult.c
+++ b/src/backend/executor/nodeResult.c
@@ -208,6 +208,8 @@ ExecInitResult(Result *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(resstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return resstate;
/*
* we don't use inner plan
@@ -248,7 +250,8 @@ ExecEndResult(ResultState *node)
/*
* clean out the tuple table
*/
- ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ if (node->ps.ps_ResultTupleSlot)
+ ExecClearTuple(node->ps.ps_ResultTupleSlot);
/*
* shut down subplans
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index d7e22b1dbb..b3bc9b1f77 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -125,6 +125,8 @@ ExecInitSampleScan(SampleScan *node, EState *estate, int eflags)
ExecOpenScanRelation(estate,
node->scan.scanrelid,
eflags);
+ if (!ExecPlanStillValid(estate))
+ return scanstate;
/* we won't set up the HeapScanDesc till later */
scanstate->ss.ss_currentScanDesc = NULL;
@@ -198,7 +200,8 @@ ExecEndSampleScan(SampleScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/*
* close heap scan
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 4da0f28f7b..e7ca19ee4e 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -153,6 +153,8 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
ExecOpenScanRelation(estate,
node->scan.scanrelid,
eflags);
+ if (!ExecPlanStillValid(estate))
+ return scanstate;
/* and create slot with the appropriate rowtype */
ExecInitScanTupleSlot(estate, &scanstate->ss,
@@ -200,7 +202,8 @@ ExecEndSeqScan(SeqScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/*
* close heap scan
diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index 4bc2406b89..95950a5c20 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -528,6 +528,8 @@ ExecInitSetOp(SetOp *node, EState *estate, int eflags)
if (node->strategy == SETOP_HASHED)
eflags &= ~EXEC_FLAG_REWIND;
outerPlanState(setopstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return setopstate;
outerDesc = ExecGetResultType(outerPlanState(setopstate));
/*
@@ -583,7 +585,8 @@ void
ExecEndSetOp(SetOpState *node)
{
/* clean up tuple table */
- ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ if (node->ps.ps_ResultTupleSlot)
+ ExecClearTuple(node->ps.ps_ResultTupleSlot);
/* free subsidiary stuff including hashtable */
if (node->tableContext)
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index c6c72c6e67..89fef86aba 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -263,6 +263,8 @@ ExecInitSort(Sort *node, EState *estate, int eflags)
eflags &= ~(EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK);
outerPlanState(sortstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return sortstate;
/*
* Initialize scan slot and type.
@@ -306,9 +308,11 @@ ExecEndSort(SortState *node)
/*
* clean out the tuple table
*/
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/* must drop pointer to sort result tuple */
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+ if (node->ss.ps.ps_ResultTupleSlot)
+ ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
/*
* Release tuplesort resources
diff --git a/src/backend/executor/nodeSubqueryscan.c b/src/backend/executor/nodeSubqueryscan.c
index 42471bfc04..9b8cddc89f 100644
--- a/src/backend/executor/nodeSubqueryscan.c
+++ b/src/backend/executor/nodeSubqueryscan.c
@@ -124,6 +124,8 @@ ExecInitSubqueryScan(SubqueryScan *node, EState *estate, int eflags)
* initialize subquery
*/
subquerystate->subplan = ExecInitNode(node->subplan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return subquerystate;
/*
* Initialize scan slot and type (needed by ExecAssignScanProjectionInfo)
@@ -177,7 +179,8 @@ ExecEndSubqueryScan(SubqueryScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/*
* close down subquery
diff --git a/src/backend/executor/nodeTableFuncscan.c b/src/backend/executor/nodeTableFuncscan.c
index 0c6c912778..d7536953f1 100644
--- a/src/backend/executor/nodeTableFuncscan.c
+++ b/src/backend/executor/nodeTableFuncscan.c
@@ -223,7 +223,8 @@ ExecEndTableFuncScan(TableFuncScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/*
* Release tuplestore resources
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 2124c55ef5..1ae451d7a6 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -342,7 +342,8 @@ ExecEndTidRangeScan(TidRangeScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
/* ----------------------------------------------------------------
@@ -386,6 +387,8 @@ ExecInitTidRangeScan(TidRangeScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return tidrangestate;
tidrangestate->ss.ss_currentRelation = currentRelation;
tidrangestate->ss.ss_currentScanDesc = NULL; /* no table scan here */
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index 862bd0330b..9fe76b1c60 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -483,7 +483,8 @@ ExecEndTidScan(TidScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
/* ----------------------------------------------------------------
@@ -529,6 +530,8 @@ ExecInitTidScan(TidScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return tidstate;
tidstate->ss.ss_currentRelation = currentRelation;
tidstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
diff --git a/src/backend/executor/nodeUnique.c b/src/backend/executor/nodeUnique.c
index 45035d74fa..69f23b02c6 100644
--- a/src/backend/executor/nodeUnique.c
+++ b/src/backend/executor/nodeUnique.c
@@ -136,6 +136,8 @@ ExecInitUnique(Unique *node, EState *estate, int eflags)
* then initialize outer plan
*/
outerPlanState(uniquestate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return uniquestate;
/*
* Initialize result slot and type. Unique nodes do no projections, so
@@ -169,7 +171,8 @@ void
ExecEndUnique(UniqueState *node)
{
/* clean up tuple table */
- ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ if (node->ps.ps_ResultTupleSlot)
+ ExecClearTuple(node->ps.ps_ResultTupleSlot);
ExecFreeExprContext(&node->ps);
diff --git a/src/backend/executor/nodeValuesscan.c b/src/backend/executor/nodeValuesscan.c
index 32ace63017..f5dedbab63 100644
--- a/src/backend/executor/nodeValuesscan.c
+++ b/src/backend/executor/nodeValuesscan.c
@@ -340,7 +340,8 @@ ExecEndValuesScan(ValuesScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 7c07fb0684..616bb97675 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -1334,7 +1334,7 @@ release_partition(WindowAggState *winstate)
WindowStatePerFunc perfuncstate = &(winstate->perfunc[i]);
/* Release any partition-local state of this window function */
- if (perfuncstate->winobj)
+ if (perfuncstate && perfuncstate->winobj)
perfuncstate->winobj->localmem = NULL;
}
@@ -1344,12 +1344,17 @@ release_partition(WindowAggState *winstate)
* any aggregate temp data). We don't rely on retail pfree because some
* aggregates might have allocated data we don't have direct pointers to.
*/
- MemoryContextResetAndDeleteChildren(winstate->partcontext);
- MemoryContextResetAndDeleteChildren(winstate->aggcontext);
- for (i = 0; i < winstate->numaggs; i++)
+ if (winstate->partcontext)
+ MemoryContextResetAndDeleteChildren(winstate->partcontext);
+ if (winstate->aggcontext)
+ MemoryContextResetAndDeleteChildren(winstate->aggcontext);
+ if (winstate->peragg)
{
- if (winstate->peragg[i].aggcontext != winstate->aggcontext)
- MemoryContextResetAndDeleteChildren(winstate->peragg[i].aggcontext);
+ for (i = 0; i < winstate->numaggs; i++)
+ {
+ if (winstate->peragg[i].aggcontext != winstate->aggcontext)
+ MemoryContextResetAndDeleteChildren(winstate->peragg[i].aggcontext);
+ }
}
if (winstate->buffer)
@@ -2451,6 +2456,8 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags)
*/
outerPlan = outerPlan(node);
outerPlanState(winstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return winstate;
/*
* initialize source tuple type (which is also the tuple type that we'll
@@ -2679,11 +2686,16 @@ ExecEndWindowAgg(WindowAggState *node)
release_partition(node);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
- ExecClearTuple(node->first_part_slot);
- ExecClearTuple(node->agg_row_slot);
- ExecClearTuple(node->temp_slot_1);
- ExecClearTuple(node->temp_slot_2);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->first_part_slot)
+ ExecClearTuple(node->first_part_slot);
+ if (node->agg_row_slot)
+ ExecClearTuple(node->agg_row_slot);
+ if (node->temp_slot_1)
+ ExecClearTuple(node->temp_slot_1);
+ if (node->temp_slot_2)
+ ExecClearTuple(node->temp_slot_2);
if (node->framehead_slot)
ExecClearTuple(node->framehead_slot);
if (node->frametail_slot)
@@ -2696,16 +2708,23 @@ ExecEndWindowAgg(WindowAggState *node)
node->ss.ps.ps_ExprContext = node->tmpcontext;
ExecFreeExprContext(&node->ss.ps);
- for (i = 0; i < node->numaggs; i++)
+ if (node->peragg)
{
- if (node->peragg[i].aggcontext != node->aggcontext)
- MemoryContextDelete(node->peragg[i].aggcontext);
+ for (i = 0; i < node->numaggs; i++)
+ {
+ if (node->peragg[i].aggcontext != node->aggcontext)
+ MemoryContextDelete(node->peragg[i].aggcontext);
+ }
}
- MemoryContextDelete(node->partcontext);
- MemoryContextDelete(node->aggcontext);
+ if (node->partcontext)
+ MemoryContextDelete(node->partcontext);
+ if (node->aggcontext)
+ MemoryContextDelete(node->aggcontext);
- pfree(node->perfunc);
- pfree(node->peragg);
+ if (node->perfunc)
+ pfree(node->perfunc);
+ if (node->peragg)
+ pfree(node->peragg);
outerPlan = outerPlanState(node);
ExecEndNode(outerPlan);
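
The flip side of those early returns is visible in the ExecEnd* hunks such as
ExecEndWindowAgg above: teardown can no longer assume that every field was
set, hence the null checks added before each ExecClearTuple(),
MemoryContextDelete(), and pfree(). The resulting shape, again with a
placeholder node (note that ExecEndNode() itself already tolerates a NULL
child):

    void
    ExecEndFoo(FooState *node)
    {
        if (node->ps.ps_ResultTupleSlot)
            ExecClearTuple(node->ps.ps_ResultTupleSlot);
        if (node->tmpcontext)           /* hypothetical per-node context */
            MemoryContextDelete(node->tmpcontext);

        ExecEndNode(outerPlanState(node));
    }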
diff --git a/src/backend/executor/nodeWorktablescan.c b/src/backend/executor/nodeWorktablescan.c
index 0c13448236..d70c6afde3 100644
--- a/src/backend/executor/nodeWorktablescan.c
+++ b/src/backend/executor/nodeWorktablescan.c
@@ -200,7 +200,8 @@ ExecEndWorkTableScan(WorkTableScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index e3a170c38b..26a9ea342a 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -71,7 +71,7 @@ static int _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
static ParamListInfo _SPI_convert_params(int nargs, Oid *argtypes,
Datum *Values, const char *Nulls);
-static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount);
+static int _SPI_pquery(QueryDesc *queryDesc, uint64 tcount);
static void _SPI_error_callback(void *arg);
@@ -1623,6 +1623,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
_SPI_current->processed = 0;
_SPI_current->tuptable = NULL;
+replan:
/* Create the portal */
if (name == NULL || name[0] == '\0')
{
@@ -1766,7 +1767,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
}
/*
- * Start portal execution.
+ * Start portal execution. If the portal contains a cached plan, it must
+ * be recreated if portal->plan_valid is false, which indicates that the
+ * cached plan was invalidated while initializing one of the
+ * plan trees contained in it.
*/
PortalStart(portal, paramLI, 0, snapshot);
@@ -1775,6 +1779,12 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
/* Pop the error context stack */
error_context_stack = spierrcontext.previous;
+ if (!portal->plan_valid)
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
+
/* Pop the SPI stack */
_SPI_end_call(true);
@@ -2552,6 +2562,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
+replan:
cplan = GetCachedPlan(plansource, options->params,
plan_owner, _SPI_current->queryEnv);
@@ -2661,6 +2672,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
QueryDesc *qdesc;
Snapshot snap;
+ int eflags;
if (ActiveSnapshotSet())
snap = GetActiveSnapshot();
@@ -2668,14 +2680,36 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
snap = InvalidSnapshot;
qdesc = CreateQueryDesc(stmt,
+ cplan,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
options->params,
_SPI_current->queryEnv,
0);
- res = _SPI_pquery(qdesc, fire_triggers,
- canSetTag ? options->tcount : 0);
+
+ /* Select execution options */
+ if (fire_triggers)
+ eflags = 0; /* default run-to-completion flags */
+ else
+ eflags = EXEC_FLAG_SKIP_TRIGGERS;
+
+ /* Take locks if using a CachedPlan */
+ if (qdesc->cplan)
+ eflags |= EXEC_FLAG_GET_LOCKS;
+
+ ExecutorStart(qdesc, eflags);
+ if (!qdesc->plan_valid)
+ {
+ ExecutorFinish(qdesc);
+ ExecutorEnd(qdesc);
+ FreeQueryDesc(qdesc);
+ Assert(cplan);
+ ReleaseCachedPlan(cplan, plan_owner);
+ goto replan;
+ }
+
+ res = _SPI_pquery(qdesc, canSetTag ? options->tcount : 0);
FreeQueryDesc(qdesc);
}
else
@@ -2850,10 +2884,9 @@ _SPI_convert_params(int nargs, Oid *argtypes,
}
static int
-_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
+_SPI_pquery(QueryDesc *queryDesc, uint64 tcount)
{
int operation = queryDesc->operation;
- int eflags;
int res;
switch (operation)
@@ -2897,14 +2930,6 @@ _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
ResetUsage();
#endif
- /* Select execution options */
- if (fire_triggers)
- eflags = 0; /* default run-to-completion flags */
- else
- eflags = EXEC_FLAG_SKIP_TRIGGERS;
-
- ExecutorStart(queryDesc, eflags);
-
ExecutorRun(queryDesc, ForwardScanDirection, tcount, true);
_SPI_current->processed = queryDesc->estate->es_processed;
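
Taken together, the _SPI_execute_plan() changes above form a retry loop: build
a QueryDesc around the CachedPlan, let ExecutorStart() take the remaining
locks under EXEC_FLAG_GET_LOCKS, and if initialization saw the plan go stale,
discard everything and replan. The same control flow written as a loop instead
of the patch's goto, with run_query() as a hypothetical stand-in for the
_SPI_pquery() and executor-shutdown steps:

    for (;;)
    {
        CachedPlan *cplan = GetCachedPlan(plansource, options->params,
                                          plan_owner, _SPI_current->queryEnv);
        QueryDesc  *qdesc = CreateQueryDesc(stmt, cplan,
                                            plansource->query_string,
                                            snap, crosscheck_snapshot,
                                            dest, options->params,
                                            _SPI_current->queryEnv, 0);

        ExecutorStart(qdesc, eflags | EXEC_FLAG_GET_LOCKS);
        if (qdesc->plan_valid)
        {
            run_query(qdesc);   /* hypothetical: ExecutorRun() etc. */
            break;
        }

        /* Plan invalidated while locking; discard it and build a fresh one. */
        ExecutorFinish(qdesc);
        ExecutorEnd(qdesc);
        FreeQueryDesc(qdesc);
        ReleaseCachedPlan(cplan, plan_owner);
    }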
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index ba00b99249..955286513d 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -513,6 +513,7 @@ _outRangeTblEntry(StringInfo str, const RangeTblEntry *node)
WRITE_BOOL_FIELD(security_barrier);
/* we re-use these RELATION fields, too: */
WRITE_OID_FIELD(relid);
+ WRITE_CHAR_FIELD(relkind);
WRITE_INT_FIELD(rellockmode);
WRITE_UINT_FIELD(perminfoindex);
break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 597e5b3ea8..a136ae1d60 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -503,6 +503,7 @@ _readRangeTblEntry(void)
READ_BOOL_FIELD(security_barrier);
/* we re-use these RELATION fields, too: */
READ_OID_FIELD(relid);
+ READ_CHAR_FIELD(relkind);
READ_INT_FIELD(rellockmode);
READ_UINT_FIELD(perminfoindex);
break;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 62b3ec96cc..5f3ffd98af 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -527,6 +527,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->partPruneInfos = glob->partPruneInfos;
result->rtable = glob->finalrtable;
result->permInfos = glob->finalrteperminfos;
+ result->viewRelations = glob->viewRelations;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 5cc8366af6..f13240bf33 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -16,6 +16,7 @@
#include "postgres.h"
#include "access/transam.h"
+#include "catalog/pg_class.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -604,6 +605,10 @@ add_rte_to_flat_rtable(PlannerGlobal *glob, List *rteperminfos,
(newrte->rtekind == RTE_SUBQUERY && OidIsValid(newrte->relid)))
glob->relationOids = lappend_oid(glob->relationOids, newrte->relid);
+ if (newrte->relkind == RELKIND_VIEW)
+ glob->viewRelations = lappend_int(glob->viewRelations,
+ list_length(glob->finalrtable));
+
/*
* Add a copy of the RTEPermissionInfo, if any, corresponding to this RTE
* to the flattened global list.
diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index 980dc1816f..1631c8b993 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -1849,11 +1849,10 @@ ApplyRetrieveRule(Query *parsetree,
/*
* Clear fields that should not be set in a subquery RTE. Note that we
- * leave the relid, rellockmode, and perminfoindex fields set, so that the
- * view relation can be appropriately locked before execution and its
- * permissions checked.
+ * leave the relid, relkind, rellockmode, and perminfoindex fields set,
+ * so that the view relation can be appropriately locked before execution
+ * and its permissions checked.
*/
- rte->relkind = 0;
rte->tablesample = NULL;
rte->inh = false; /* must not be set for a subquery */
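
Because ApplyRetrieveRule() now leaves relkind set on subquery RTEs that came
from views, setrefs.c can record each view's flat range-table index in
PlannedStmt.viewRelations, letting the executor lock exactly those entries
instead of rescanning the whole range table. This excerpt only declares
ExecLockViewRelations(), so the following is a guess at its shape, not its
actual body:

    void
    ExecLockViewRelations(List *viewRelations, EState *estate)
    {
        ListCell   *lc;

        foreach(lc, viewRelations)
        {
            int            rti = lfirst_int(lc);
            RangeTblEntry *rte = exec_rt_fetch(rti, estate);

            Assert(rte->relkind == RELKIND_VIEW);
            LockRelationOid(rte->relid, rte->rellockmode);
            if (!ExecPlanStillValid(estate))
                break;      /* locking fired an invalidation; stop early */
        }
    }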
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index ee9b89a672..c807e9cdcc 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -27,6 +27,7 @@
#include "storage/procarray.h"
#include "storage/sinvaladt.h"
#include "utils/inval.h"
+#include "utils/lsyscache.h"
/*
@@ -364,6 +365,50 @@ CheckRelationLockedByMe(Relation relation, LOCKMODE lockmode, bool orstronger)
return false;
}
+/*
+ * CheckRelLockedByMe
+ *
+ * Returns true if current transaction holds a lock on the given relation of
+ * mode 'lockmode'. If 'orstronger' is true, a stronger lockmode is also OK.
+ * ("Stronger" is defined as "numerically higher", which is a bit
+ * semantically dubious but is OK for the purposes we use this for.)
+ */
+bool
+CheckRelLockedByMe(Oid relid, LOCKMODE lockmode, bool orstronger)
+{
+ Oid dbId = get_rel_relisshared(relid) ? InvalidOid : MyDatabaseId;
+ LOCKTAG tag;
+
+ SET_LOCKTAG_RELATION(tag, dbId, relid);
+
+ if (LockHeldByMe(&tag, lockmode))
+ return true;
+
+ if (orstronger)
+ {
+ LOCKMODE slockmode;
+
+ for (slockmode = lockmode + 1;
+ slockmode <= MaxLockMode;
+ slockmode++)
+ {
+ if (LockHeldByMe(&tag, slockmode))
+ {
+#ifdef NOT_USED
+ /* Sometimes this might be useful for debugging purposes */
+ elog(WARNING, "lock mode %s substituted for %s on relation %s",
+ GetLockmodeName(tag.locktag_lockmethodid, slockmode),
+ GetLockmodeName(tag.locktag_lockmethodid, lockmode),
+ get_rel_name(relid));
+#endif
+ return true;
+ }
+ }
+ }
+
+ return false;
+}
+
/*
* LockHasWaitersRelation
*
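
CheckRelLockedByMe() mirrors CheckRelationLockedByMe() but takes a bare OID,
so callers can assert lock coverage for relations they never opened -- which
is the situation once locks on pruned-away partitions are skipped. A
hypothetical assertion-style use:

    /* Illustrative only: verify each surviving relation is locked. */
    static void
    assert_rels_locked(List *reloids, LOCKMODE lockmode)
    {
        ListCell *lc;

        foreach(lc, reloids)
            Assert(CheckRelLockedByMe(lfirst_oid(lc), lockmode, true));
    }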
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index cab709b07b..6d0ea07801 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1199,6 +1199,7 @@ exec_simple_query(const char *query_string)
* Start the portal. No parameters here.
*/
PortalStart(portal, NULL, 0, InvalidSnapshot);
+ Assert(portal->plan_valid);
/*
* Select the appropriate output format: text unless we are doing a
@@ -1703,6 +1704,7 @@ exec_bind_message(StringInfo input_message)
"commands ignored until end of transaction block"),
errdetail_abort()));
+replan:
/*
* Create the portal. Allow silent replacement of an existing portal only
* if the unnamed portal is specified.
@@ -1994,10 +1996,19 @@ exec_bind_message(StringInfo input_message)
PopActiveSnapshot();
/*
- * And we're ready to start portal execution.
+ * Start portal execution. If the portal contains a cached plan, it must
+ * be recreated if portal->plan_valid is false, which indicates that the
+ * cached plan was invalidated while initializing one of the
+ * plan trees contained in it.
*/
PortalStart(portal, params, 0, InvalidSnapshot);
+ if (!portal->plan_valid)
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
+
/*
* Apply the result format requests to the portal.
*/
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5f0248acc5..c93a950d7f 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -19,6 +19,7 @@
#include "access/xact.h"
#include "commands/prepare.h"
+#include "executor/execdesc.h"
#include "executor/tstoreReceiver.h"
#include "miscadmin.h"
#include "pg_trace.h"
@@ -35,12 +36,6 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
- const char *sourceText,
- ParamListInfo params,
- QueryEnvironment *queryEnv,
- DestReceiver *dest,
- QueryCompletion *qc);
static void FillPortalStore(Portal portal, bool isTopLevel);
static uint64 RunFromStore(Portal portal, ScanDirection direction, uint64 count,
DestReceiver *dest);
@@ -65,6 +60,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +73,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->cplan = cplan; /* CachedPlan, if plan is from one */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -116,86 +113,6 @@ FreeQueryDesc(QueryDesc *qdesc)
}
-/*
- * ProcessQuery
- * Execute a single plannable query within a PORTAL_MULTI_QUERY,
- * PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
- *
- * plan: the plan tree for the query
- * sourceText: the source text of the query
- * params: any parameters needed
- * dest: where to send results
- * qc: where to store the command completion status data.
- *
- * qc may be NULL if caller doesn't want a status string.
- *
- * Must be called in a memory context that will be reset or deleted on
- * error; otherwise the executor's memory usage will be leaked.
- */
-static void
-ProcessQuery(PlannedStmt *plan,
- const char *sourceText,
- ParamListInfo params,
- QueryEnvironment *queryEnv,
- DestReceiver *dest,
- QueryCompletion *qc)
-{
- QueryDesc *queryDesc;
-
- /*
- * Create the QueryDesc object
- */
- queryDesc = CreateQueryDesc(plan, sourceText,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, 0);
-
- /*
- * Call ExecutorStart to prepare the plan for execution
- */
- ExecutorStart(queryDesc, 0);
-
- /*
- * Run the plan to completion.
- */
- ExecutorRun(queryDesc, ForwardScanDirection, 0L, true);
-
- /*
- * Build command completion status data, if caller wants one.
- */
- if (qc)
- {
- switch (queryDesc->operation)
- {
- case CMD_SELECT:
- SetQueryCompletion(qc, CMDTAG_SELECT, queryDesc->estate->es_processed);
- break;
- case CMD_INSERT:
- SetQueryCompletion(qc, CMDTAG_INSERT, queryDesc->estate->es_processed);
- break;
- case CMD_UPDATE:
- SetQueryCompletion(qc, CMDTAG_UPDATE, queryDesc->estate->es_processed);
- break;
- case CMD_DELETE:
- SetQueryCompletion(qc, CMDTAG_DELETE, queryDesc->estate->es_processed);
- break;
- case CMD_MERGE:
- SetQueryCompletion(qc, CMDTAG_MERGE, queryDesc->estate->es_processed);
- break;
- default:
- SetQueryCompletion(qc, CMDTAG_UNKNOWN, queryDesc->estate->es_processed);
- break;
- }
- }
-
- /*
- * Now, we close down all the scans and free allocated resources.
- */
- ExecutorFinish(queryDesc);
- ExecutorEnd(queryDesc);
-
- FreeQueryDesc(queryDesc);
-}
-
/*
* ChoosePortalStrategy
* Select portal execution strategy given the intended statement list.
@@ -427,7 +344,8 @@ FetchStatementTargetList(Node *stmt)
* to be used for cursors).
*
* On return, portal is ready to accept PortalRun() calls, and the result
- * tupdesc (if any) is known.
+ * tupdesc (if any) is known, unless portal->plan_valid is set to false, in
+ * which case the caller must retry after generating a new CachedPlan.
*/
void
PortalStart(Portal portal, ParamListInfo params,
@@ -435,7 +353,6 @@ PortalStart(Portal portal, ParamListInfo params,
{
Portal saveActivePortal;
ResourceOwner saveResourceOwner;
- MemoryContext savePortalContext;
MemoryContext oldContext;
QueryDesc *queryDesc;
int myeflags;
@@ -448,15 +365,13 @@ PortalStart(Portal portal, ParamListInfo params,
*/
saveActivePortal = ActivePortal;
saveResourceOwner = CurrentResourceOwner;
- savePortalContext = PortalContext;
PG_TRY();
{
ActivePortal = portal;
if (portal->resowner)
CurrentResourceOwner = portal->resowner;
- PortalContext = portal->portalContext;
- oldContext = MemoryContextSwitchTo(PortalContext);
+ oldContext = MemoryContextSwitchTo(portal->queryContext);
/* Must remember portal param list, if any */
portal->portalParams = params;
@@ -472,6 +387,8 @@ PortalStart(Portal portal, ParamListInfo params,
switch (portal->strategy)
{
case PORTAL_ONE_SELECT:
+ case PORTAL_ONE_RETURNING:
+ case PORTAL_ONE_MOD_WITH:
/* Must set snapshot before starting executor. */
if (snapshot)
@@ -493,6 +410,7 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->cplan,
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -501,30 +419,56 @@ PortalStart(Portal portal, ParamListInfo params,
portal->queryEnv,
0);
+ /* Remember for PortalRunMulti(). */
+ if (portal->strategy == PORTAL_ONE_RETURNING ||
+ portal->strategy == PORTAL_ONE_MOD_WITH)
+ portal->qdescs = list_make1(queryDesc);
+
/*
* If it's a scrollable cursor, executor needs to support
* REWIND and backwards scan, as well as whatever the caller
* might've asked for.
*/
- if (portal->cursorOptions & CURSOR_OPT_SCROLL)
+ if (portal->strategy == PORTAL_ONE_SELECT &&
+ (portal->cursorOptions & CURSOR_OPT_SCROLL))
myeflags = eflags | EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD;
else
myeflags = eflags;
+ /* Take locks if using a CachedPlan */
+ if (queryDesc->cplan)
+ myeflags |= EXEC_FLAG_GET_LOCKS;
+
/*
- * Call ExecutorStart to prepare the plan for execution
+ * Call ExecutorStart to prepare the plan for execution. A
+ * cached plan may get invalidated as we're doing that.
*/
ExecutorStart(queryDesc, myeflags);
+ if (!queryDesc->plan_valid)
+ {
+ Assert(queryDesc->cplan);
+ PortalQueryFinish(queryDesc);
+ PopActiveSnapshot();
+ portal->plan_valid = false;
+ goto early_exit;
+ }
/*
- * This tells PortalCleanup to shut down the executor
+ * This tells PortalCleanup to shut down the executor, though
+ * this is not needed for queries handled by PortalRunMulti().
*/
- portal->queryDesc = queryDesc;
+ if (portal->strategy == PORTAL_ONE_SELECT)
+ portal->queryDesc = queryDesc;
/*
- * Remember tuple descriptor (computed by ExecutorStart)
+ * Remember tuple descriptor (computed by ExecutorStart),
+ * though make it independent of QueryDesc for queries handled
+ * by PortalRunMulti().
*/
- portal->tupDesc = queryDesc->tupDesc;
+ if (portal->strategy != PORTAL_ONE_SELECT)
+ portal->tupDesc = CreateTupleDescCopy(queryDesc->tupDesc);
+ else
+ portal->tupDesc = queryDesc->tupDesc;
/*
* Reset cursor position data to "start of query"
@@ -532,33 +476,11 @@ PortalStart(Portal portal, ParamListInfo params,
portal->atStart = true;
portal->atEnd = false; /* allow fetches */
portal->portalPos = 0;
+ portal->plan_valid = true;
PopActiveSnapshot();
break;
- case PORTAL_ONE_RETURNING:
- case PORTAL_ONE_MOD_WITH:
-
- /*
- * We don't start the executor until we are told to run the
- * portal. We do need to set up the result tupdesc.
- */
- {
- PlannedStmt *pstmt;
-
- pstmt = PortalGetPrimaryStmt(portal);
- portal->tupDesc =
- ExecCleanTypeFromTL(pstmt->planTree->targetlist);
- }
-
- /*
- * Reset cursor position data to "start of query"
- */
- portal->atStart = true;
- portal->atEnd = false; /* allow fetches */
- portal->portalPos = 0;
- break;
-
case PORTAL_UTIL_SELECT:
/*
@@ -578,11 +500,90 @@ PortalStart(Portal portal, ParamListInfo params,
portal->atStart = true;
portal->atEnd = false; /* allow fetches */
portal->portalPos = 0;
+ portal->plan_valid = true;
break;
case PORTAL_MULTI_QUERY:
- /* Need do nothing now */
+ {
+ ListCell *lc;
+ bool first = true;
+
+ /* Take locks if using a CachedPlan */
+ myeflags = 0;
+ if (portal->cplan)
+ myeflags |= EXEC_FLAG_GET_LOCKS;
+
+ foreach(lc, portal->stmts)
+ {
+ PlannedStmt *plan = lfirst_node(PlannedStmt, lc);
+ bool is_utility = (plan->utilityStmt != NULL);
+
+ /*
+ * Push the snapshot to be used by the executor.
+ */
+ if (!is_utility)
+ {
+ /*
+ * Must copy the snapshot if we'll need to update
+ * its command ID.
+ */
+ if (!first)
+ PushCopiedSnapshot(GetTransactionSnapshot());
+ else
+ PushActiveSnapshot(GetTransactionSnapshot());
+ }
+
+ /*
+ * From the 2nd statement onwards, update the command
+ * ID and the snapshot to match.
+ */
+ if (!first)
+ {
+ CommandCounterIncrement();
+ UpdateActiveSnapshotCommandId();
+ }
+
+ first = false;
+
+ /*
+ * Create the QueryDesc object. DestReceiver will
+ * be set in PortalRunMulti().
+ */
+ queryDesc = CreateQueryDesc(plan, portal->cplan,
+ portal->sourceText,
+ !is_utility ?
+ GetActiveSnapshot() :
+ InvalidSnapshot,
+ InvalidSnapshot,
+ NULL,
+ params,
+ portal->queryEnv, 0);
+
+ /* Remember for PortalRunMulti() */
+ portal->qdescs = lappend(portal->qdescs, queryDesc);
+
+ if (is_utility)
+ continue;
+
+ /*
+ * Call ExecutorStart to prepare the plan for
+ * execution. A cached plan may get invalidated as
+ * we're doing that.
+ */
+ ExecutorStart(queryDesc, myeflags);
+ PopActiveSnapshot();
+ if (!queryDesc->plan_valid)
+ {
+ Assert(queryDesc->cplan);
+ PortalQueryFinish(queryDesc);
+ portal->plan_valid = false;
+ goto early_exit;
+ }
+ }
+ }
+
portal->tupDesc = NULL;
+ portal->plan_valid = true;
break;
}
}
@@ -594,19 +595,18 @@ PortalStart(Portal portal, ParamListInfo params,
/* Restore global vars and propagate error */
ActivePortal = saveActivePortal;
CurrentResourceOwner = saveResourceOwner;
- PortalContext = savePortalContext;
PG_RE_THROW();
}
PG_END_TRY();
+ portal->status = PORTAL_READY;
+
+early_exit:
MemoryContextSwitchTo(oldContext);
ActivePortal = saveActivePortal;
CurrentResourceOwner = saveResourceOwner;
- PortalContext = savePortalContext;
-
- portal->status = PORTAL_READY;
}
/*
@@ -1193,7 +1193,7 @@ PortalRunMulti(Portal portal,
QueryCompletion *qc)
{
bool active_snapshot_set = false;
- ListCell *stmtlist_item;
+ ListCell *qdesc_item;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1214,9 +1214,10 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
- foreach(stmtlist_item, portal->stmts)
+ foreach(qdesc_item, portal->qdescs)
{
- PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ QueryDesc *qdesc = (QueryDesc *) lfirst(qdesc_item);
+ PlannedStmt *pstmt = qdesc->plannedstmt;
/*
* If we got a cancel signal in prior command, quit
@@ -1271,23 +1272,38 @@ PortalRunMulti(Portal portal,
else
UpdateActiveSnapshotCommandId();
+ /*
+ * Run the plan to completion.
+ */
+ qdesc->dest = dest;
+ ExecutorRun(qdesc, ForwardScanDirection, 0L, true);
+
+ /*
+ * Build command completion status data if needed.
+ */
if (pstmt->canSetTag)
{
- /* statement can set tag string */
- ProcessQuery(pstmt,
- portal->sourceText,
- portal->portalParams,
- portal->queryEnv,
- dest, qc);
- }
- else
- {
- /* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
- portal->sourceText,
- portal->portalParams,
- portal->queryEnv,
- altdest, NULL);
+ switch (qdesc->operation)
+ {
+ case CMD_SELECT:
+ SetQueryCompletion(qc, CMDTAG_SELECT, qdesc->estate->es_processed);
+ break;
+ case CMD_INSERT:
+ SetQueryCompletion(qc, CMDTAG_INSERT, qdesc->estate->es_processed);
+ break;
+ case CMD_UPDATE:
+ SetQueryCompletion(qc, CMDTAG_UPDATE, qdesc->estate->es_processed);
+ break;
+ case CMD_DELETE:
+ SetQueryCompletion(qc, CMDTAG_DELETE, qdesc->estate->es_processed);
+ break;
+ case CMD_MERGE:
+ SetQueryCompletion(qc, CMDTAG_MERGE, qdesc->estate->es_processed);
+ break;
+ default:
+ SetQueryCompletion(qc, CMDTAG_UNKNOWN, qdesc->estate->es_processed);
+ break;
+ }
}
if (log_executor_stats)
@@ -1346,8 +1362,15 @@ PortalRunMulti(Portal portal,
* Increment command counter between queries, but not after the last
* one.
*/
- if (lnext(portal->stmts, stmtlist_item) != NULL)
+ if (lnext(portal->qdescs, qdesc_item) != NULL)
CommandCounterIncrement();
+
+ if (qdesc->estate)
+ {
+ ExecutorFinish(qdesc);
+ ExecutorEnd(qdesc);
+ }
+ FreeQueryDesc(qdesc);
}
/* Pop the snapshot if we pushed one. */
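
Since PortalStart() now runs ExecutorStart() for PORTAL_ONE_RETURNING,
PORTAL_ONE_MOD_WITH, and PORTAL_MULTI_QUERY portals too, it can come back with
portal->plan_valid set to false instead of a ready portal, and retrying is the
caller's job, as in exec_bind_message() and SPI_cursor_open_internal() earlier
in the patch. The caller-side contract, written as a loop:

    for (;;)
    {
        Portal portal = CreatePortal(name, false, false);

        /* ... PortalDefineQuery() with a freshly obtained CachedPlan ... */

        PortalStart(portal, params, 0, InvalidSnapshot);
        if (portal->plan_valid)
            break;              /* PORTAL_READY; proceed to PortalRun() */

        /* Cached plan went stale inside ExecutorStart(); rebuild and retry. */
        PortalDrop(portal, false);
    }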
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index c7607895cd..014cd476f4 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -2073,6 +2073,27 @@ get_rel_persistence(Oid relid)
return result;
}
+/*
+ * get_rel_relisshared
+ *
+ * Returns whether the given relation is shared
+ */
+bool
+get_rel_relisshared(Oid relid)
+{
+ HeapTuple tp;
+ Form_pg_class reltup;
+ bool result;
+
+ tp = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+ if (!HeapTupleIsValid(tp))
+ elog(ERROR, "cache lookup failed for relation %u", relid);
+ reltup = (Form_pg_class) GETSTRUCT(tp);
+ result = reltup->relisshared;
+ ReleaseSysCache(tp);
+
+ return result;
+}
/* ---------- TRANSFORM CACHE ---------- */
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 77c2ba3f8f..4e455d815f 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -100,13 +100,13 @@ static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool GenericPlanIsValid(CachedPlan *cplan);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -787,9 +787,6 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
- *
- * On a "true" return, we have acquired the locks needed to run the plan.
- * (We must do this for the "true" result to be race-condition-free.)
*/
static bool
CheckCachedPlan(CachedPlanSource *plansource)
@@ -803,60 +800,56 @@ CheckCachedPlan(CachedPlanSource *plansource)
if (!plan)
return false;
- Assert(plan->magic == CACHEDPLAN_MAGIC);
- /* Generic plans are never one-shot */
- Assert(!plan->is_oneshot);
+ if (GenericPlanIsValid(plan))
+ return true;
/*
- * If plan isn't valid for current role, we can't use it.
+ * Plan has been invalidated, so unlink it from the parent and release it.
*/
- if (plan->is_valid && plan->dependsOnRole &&
- plan->planRoleId != GetUserId())
- plan->is_valid = false;
+ ReleaseGenericPlan(plansource);
- /*
- * If it appears valid, acquire locks and recheck; this is much the same
- * logic as in RevalidateCachedQuery, but for a plan.
- */
- if (plan->is_valid)
+ return false;
+}
+
+/*
+ * GenericPlanIsValid
+ * Is a generic plan still valid?
+ *
+ * It may have gone stale due to concurrent schema modifications of relations
+ * mentioned in the plan, a role change, or advancement of TransactionXmin,
+ * as checked below.
+ */
+static bool
+GenericPlanIsValid(CachedPlan *cplan)
+{
+ Assert(cplan != NULL);
+ Assert(cplan->magic == CACHEDPLAN_MAGIC);
+ /* Generic plans are never one-shot */
+ Assert(!cplan->is_oneshot);
+
+ if (cplan->is_valid)
{
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
- Assert(plan->refcount > 0);
-
- AcquireExecutorLocks(plan->stmt_list, true);
+ Assert(cplan->refcount > 0);
/*
- * If plan was transient, check to see if TransactionXmin has
- * advanced, and if so invalidate it.
+ * If plan isn't valid for current role, we can't use it.
*/
- if (plan->is_valid &&
- TransactionIdIsValid(plan->saved_xmin) &&
- !TransactionIdEquals(plan->saved_xmin, TransactionXmin))
- plan->is_valid = false;
+ if (cplan->dependsOnRole && cplan->planRoleId != GetUserId())
+ cplan->is_valid = false;
/*
- * By now, if any invalidation has happened, the inval callback
- * functions will have marked the plan invalid.
+ * If plan was transient, check to see if TransactionXmin has
+ * advanced, and if so invalidate it.
*/
- if (plan->is_valid)
- {
- /* Successfully revalidated and locked the query. */
- return true;
- }
-
- /* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ if (TransactionIdIsValid(cplan->saved_xmin) &&
+ !TransactionIdEquals(cplan->saved_xmin, TransactionXmin))
+ cplan->is_valid = false;
}
- /*
- * Plan has been invalidated, so unlink it from the parent and release it.
- */
- ReleaseGenericPlan(plansource);
-
- return false;
+ return cplan->is_valid;
}
/*
@@ -1126,9 +1119,6 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
- * On return, the plan is valid and we have sufficient locks to begin
- * execution.
- *
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
* the refcount has been reported to that ResourceOwner (note that this
@@ -1360,8 +1350,8 @@ CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
}
/*
- * Reject if AcquireExecutorLocks would have anything to do. This is
- * probably unnecessary given the previous check, but let's be safe.
+ * Reject if the executor would need to take any locks beyond those taken
+ * by AcquirePlannerLocks() on a given query.
*/
foreach(lc, plan->stmt_list)
{
@@ -1735,58 +1725,6 @@ QueryListGetPrimaryStmt(List *stmts)
return NULL;
}
-/*
- * AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
- */
-static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
-{
- ListCell *lc1;
-
- foreach(lc1, stmt_list)
- {
- PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
-
- if (plannedstmt->commandType == CMD_UTILITY)
- {
- /*
- * Ignore utility statements, except those (such as EXPLAIN) that
- * contain a parsed-but-not-planned query. Note: it's okay to use
- * ScanQueryForLocks, even though the query hasn't been through
- * rule rewriting, because rewriting doesn't change the query
- * representation.
- */
- Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
-
- if (query)
- ScanQueryForLocks(query, acquire);
- continue;
- }
-
- foreach(lc2, plannedstmt->rtable)
- {
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
-
- if (!(rte->rtekind == RTE_RELATION ||
- (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
- continue;
-
- /*
- * Acquire the appropriate type of lock on each relation OID. Note
- * that we don't actually try to open the rel, and hence will not
- * fail if it's been dropped entirely --- we'll just transiently
- * acquire a non-conflicting lock.
- */
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
- }
- }
-}
-
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
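
With AcquireExecutorLocks() gone, CheckCachedPlan() merely consults the
invalidation flag plus the role and TransactionXmin checks; the locking that
used to validate the plan now happens inside ExecutorStart() under
EXEC_FLAG_GET_LOCKS. The mechanism relies on lock acquisition processing
pending sinval messages. A sketch of that interaction, with a hypothetical
helper name:

    static bool
    lock_rel_and_check_plan(EState *estate, Oid relid, LOCKMODE lockmode)
    {
        /*
         * Taking the lock accepts pending invalidation messages; if a
         * concurrent DDL hit 'relid', the plancache inval callback will
         * have cleared es_cachedplan->is_valid before this returns.
         */
        LockRelationOid(relid, lockmode);

        return ExecPlanStillValid(estate);  /* false => take the early exit */
    }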
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 06dfa85f04..3ad80c7ecb 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -201,6 +201,10 @@ CreatePortal(const char *name, bool allowDup, bool dupSilent)
portal->portalContext = AllocSetContextCreate(TopPortalContext,
"PortalContext",
ALLOCSET_SMALL_SIZES);
+ /* initialize portal's query context to store QueryDescs */
+ portal->queryContext = AllocSetContextCreate(TopPortalContext,
+ "PortalQueryContext",
+ ALLOCSET_SMALL_SIZES);
/* create a resource owner for the portal */
portal->resowner = ResourceOwnerCreate(CurTransactionResourceOwner,
@@ -224,6 +228,7 @@ CreatePortal(const char *name, bool allowDup, bool dupSilent)
/* for named portals reuse portal->name copy */
MemoryContextSetIdentifier(portal->portalContext, portal->name[0] ? portal->name : "<unnamed>");
+ MemoryContextSetIdentifier(portal->queryContext, portal->name[0] ? portal->name : "<unnamed>");
return portal;
}
@@ -594,6 +599,7 @@ PortalDrop(Portal portal, bool isTopCommit)
/* release subsidiary storage */
MemoryContextDelete(portal->portalContext);
+ MemoryContextDelete(portal->queryContext);
/* release portal struct (it's in TopPortalContext) */
pfree(portal);
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 3d3e632a0c..392abb5150 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -88,7 +88,11 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern QueryDesc *ExplainQueryDesc(PlannedStmt *stmt, struct CachedPlan *cplan,
+ const char *queryString, IntoClause *into, ExplainState *es,
+ ParamListInfo params, QueryEnvironment *queryEnv);
+extern void ExplainOnePlan(QueryDesc *queryDesc,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
@@ -104,6 +108,7 @@ extern void ExplainQueryParameters(ExplainState *es, ParamListInfo params, int m
extern void ExplainBeginOutput(ExplainState *es);
extern void ExplainEndOutput(ExplainState *es);
+extern void ExplainResetOutput(ExplainState *es);
extern void ExplainSeparatePlans(ExplainState *es);
extern void ExplainPropertyList(const char *qlabel, List *data,
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index af2bf36dfb..c36c25b497 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -32,9 +32,12 @@
*/
typedef struct QueryDesc
{
+ NodeTag type;
+
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ struct CachedPlan *cplan; /* CachedPlan, if plannedstmt is from one */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -47,6 +50,7 @@ typedef struct QueryDesc
TupleDesc tupDesc; /* descriptor for result tuples */
EState *estate; /* executor's query-wide state */
PlanState *planstate; /* tree of per-plan-node state */
+ bool plan_valid; /* is planstate tree fully valid? */
/* This field is set by ExecutorRun */
bool already_executed; /* true if previously executed */
@@ -57,6 +61,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ struct CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index f9e6bf3d4a..a6ac772400 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -19,6 +19,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/plancache.h"
/*
@@ -61,6 +62,10 @@
* WITH_NO_DATA indicates that we are performing REFRESH MATERIALIZED VIEW
* ... WITH NO DATA. Currently, the only effect is to suppress errors about
* scanning unpopulated materialized views.
+ *
+ * GET_LOCKS indicates that the caller of ExecutorStart() is executing a
+ * cached plan, which must be validated by taking the remaining locks necessary
+ * for execution.
*/
#define EXEC_FLAG_EXPLAIN_ONLY 0x0001 /* EXPLAIN, no ANALYZE */
#define EXEC_FLAG_EXPLAIN_GENERIC 0x0002 /* EXPLAIN (GENERIC_PLAN) */
@@ -69,6 +74,8 @@
#define EXEC_FLAG_MARK 0x0010 /* need mark/restore */
#define EXEC_FLAG_SKIP_TRIGGERS 0x0020 /* skip AfterTrigger setup */
#define EXEC_FLAG_WITH_NO_DATA 0x0040 /* REFRESH ... WITH NO DATA */
+#define EXEC_FLAG_GET_LOCKS 0x0400 /* should the executor lock
+ * relations? */
/* Hook for plugins to get control in ExecutorStart() */
@@ -255,6 +262,13 @@ extern void ExecEndNode(PlanState *node);
extern void ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/* Is the cached plan, if any, known to still be valid? */
+static inline bool
+ExecPlanStillValid(EState *estate)
+{
+ return estate->es_cachedplan == NULL ? true :
+ CachedPlanStillValid(estate->es_cachedplan);
+}
/* ----------------------------------------------------------------
* ExecProcNode
@@ -589,6 +603,8 @@ exec_rt_fetch(Index rti, EState *estate)
}
extern Relation ExecGetRangeTableRelation(EState *estate, Index rti);
+extern void ExecLockViewRelations(List *viewRelations, EState *estate);
+extern void ExecLockAppendNonLeafRelations(EState *estate, List *allpartrelids);
extern void ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
Index rti);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index d97f5a8e7d..dfa72848c7 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -623,6 +623,8 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ struct CachedPlan *es_cachedplan; /* CachedPlan if plannedstmt is from
+ * one */
List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
const char *es_sourceText; /* Source text from QueryDesc */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index d61a62da19..9b888b0d75 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -116,6 +116,9 @@ typedef struct PlannerGlobal
/* "flat" list of RTEPermissionInfos */
List *finalrteperminfos;
+ /* "flat" list of integer RT indexes */
+ List *viewRelations;
+
/* "flat" list of PlanRowMarks */
List *finalrowmarks;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index a0bb16cff4..7cae624bbd 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -78,6 +78,9 @@ typedef struct PlannedStmt
List *permInfos; /* list of RTEPermissionInfo nodes for rtable
* entries needing one */
+ List *viewRelations; /* integer list of RT indexes, or NIL if no
+ * views are queried */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE/MERGE */
List *resultRelations; /* integer list of RT indexes, or NIL */
diff --git a/src/include/storage/lmgr.h b/src/include/storage/lmgr.h
index 4ee91e3cf9..598bf2688a 100644
--- a/src/include/storage/lmgr.h
+++ b/src/include/storage/lmgr.h
@@ -48,6 +48,7 @@ extern bool ConditionalLockRelation(Relation relation, LOCKMODE lockmode);
extern void UnlockRelation(Relation relation, LOCKMODE lockmode);
extern bool CheckRelationLockedByMe(Relation relation, LOCKMODE lockmode,
bool orstronger);
+extern bool CheckRelLockedByMe(Oid relid, LOCKMODE lockmode, bool orstronger);
extern bool LockHasWaitersRelation(Relation relation, LOCKMODE lockmode);
extern void LockRelationIdForSession(LockRelId *relid, LOCKMODE lockmode);
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index 4f5418b972..3074e604dd 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -139,6 +139,7 @@ extern char get_rel_relkind(Oid relid);
extern bool get_rel_relispartition(Oid relid);
extern Oid get_rel_tablespace(Oid relid);
extern char get_rel_persistence(Oid relid);
+extern bool get_rel_relisshared(Oid relid);
extern Oid get_transform_fromsql(Oid typid, Oid langid, List *trftypes);
extern Oid get_transform_tosql(Oid typid, Oid langid, List *trftypes);
extern bool get_typisdefined(Oid typid);
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index a443181d41..8990fe72e3 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -221,6 +221,20 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
QueryEnvironment *queryEnv);
+
+/*
+ * CachedPlanStillValid
+ * Returns whether a cached generic plan is still valid
+ *
+ * Called by the executor after each relation lock taken while
+ * initializing the plan tree in the CachedPlan.
+ */
+static inline bool
+CachedPlanStillValid(CachedPlan *cplan)
+{
+ return cplan->is_valid;
+}
+
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aa08b1e0fc..332a08ccb4 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,9 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *qdescs; /* list of QueryDescs */
+ bool plan_valid; /* are plan(s) ready for execution? */
+ MemoryContext queryContext; /* memory for QueryDescs and children */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -242,6 +245,7 @@ extern void PortalDefineQuery(Portal portal,
CommandTag commandTag,
List *stmts,
CachedPlan *cplan);
+extern void PortalQueryFinish(QueryDesc *queryDesc);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
diff --git a/src/test/modules/delay_execution/Makefile b/src/test/modules/delay_execution/Makefile
index 70f24e846d..2fca84d027 100644
--- a/src/test/modules/delay_execution/Makefile
+++ b/src/test/modules/delay_execution/Makefile
@@ -8,7 +8,8 @@ OBJS = \
delay_execution.o
ISOLATION = partition-addition \
- partition-removal-1
+ partition-removal-1 \
+ cached-plan-replan
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/src/test/modules/delay_execution/delay_execution.c b/src/test/modules/delay_execution/delay_execution.c
index 7cd76eb34b..5d7a3e9858 100644
--- a/src/test/modules/delay_execution/delay_execution.c
+++ b/src/test/modules/delay_execution/delay_execution.c
@@ -1,14 +1,18 @@
/*-------------------------------------------------------------------------
*
* delay_execution.c
- * Test module to allow delay between parsing and execution of a query.
+ * Test module to introduce delay at various points during execution of a
+ * query to test that execution proceeds safely in light of concurrent
+ * changes.
*
* The delay is implemented by taking and immediately releasing a specified
* advisory lock. If another process has previously taken that lock, the
* current process will be blocked until the lock is released; otherwise,
* there's no effect. This allows an isolationtester script to reliably
- * test behaviors where some specified action happens in another backend
- * between parsing and execution of any desired query.
+ * test behaviors where some specified action happens in another backend in
+ * a couple of cases: 1) between parsing and execution of any desired query
+ * when using the planner_hook, 2) between RevalidateCachedQuery() and
+ * ExecutorStart() when using the ExecutorStart_hook.
*
* Copyright (c) 2020-2023, PostgreSQL Global Development Group
*
@@ -22,6 +26,7 @@
#include <limits.h>
+#include "executor/executor.h"
#include "optimizer/planner.h"
#include "utils/builtins.h"
#include "utils/guc.h"
@@ -32,9 +37,11 @@ PG_MODULE_MAGIC;
/* GUC: advisory lock ID to use. Zero disables the feature. */
static int post_planning_lock_id = 0;
+static int executor_start_lock_id = 0;
-/* Save previous planner hook user to be a good citizen */
+/* Save previous hook users to be a good citizen */
static planner_hook_type prev_planner_hook = NULL;
+static ExecutorStart_hook_type prev_ExecutorStart_hook = NULL;
/* planner_hook function to provide the desired delay */
@@ -70,11 +77,41 @@ delay_execution_planner(Query *parse, const char *query_string,
return result;
}
+/* ExecutorStart_hook function to provide the desired delay */
+static void
+delay_execution_ExecutorStart(QueryDesc *queryDesc, int eflags)
+{
+ /* If enabled, delay by taking and releasing the specified lock */
+ if (executor_start_lock_id != 0)
+ {
+ DirectFunctionCall1(pg_advisory_lock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+ DirectFunctionCall1(pg_advisory_unlock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+
+ /*
+ * Ensure that we notice any pending invalidations, since the advisory
+ * lock functions don't do this.
+ */
+ AcceptInvalidationMessages();
+ }
+
+ /* Now start the executor, possibly via a previous hook user */
+ if (prev_ExecutorStart_hook)
+ prev_ExecutorStart_hook(queryDesc, eflags);
+ else
+ standard_ExecutorStart(queryDesc, eflags);
+
+ if (executor_start_lock_id != 0)
+ elog(NOTICE, "Finished ExecutorStart(): CachedPlan is %s",
+ queryDesc->cplan->is_valid ? "valid" : "not valid");
+}
+
/* Module load function */
void
_PG_init(void)
{
- /* Set up the GUC to control which lock is used */
+ /* Set up GUCs to control which lock is used */
DefineCustomIntVariable("delay_execution.post_planning_lock_id",
"Sets the advisory lock ID to be locked/unlocked after planning.",
"Zero disables the delay.",
@@ -86,10 +123,22 @@ _PG_init(void)
NULL,
NULL,
NULL);
-
+ DefineCustomIntVariable("delay_execution.executor_start_lock_id",
+ "Sets the advisory lock ID to be locked/unlocked before starting execution.",
+ "Zero disables the delay.",
+ &executor_start_lock_id,
+ 0,
+ 0, INT_MAX,
+ PGC_USERSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
MarkGUCPrefixReserved("delay_execution");
- /* Install our hook */
+ /* Install our hooks. */
prev_planner_hook = planner_hook;
planner_hook = delay_execution_planner;
+ prev_ExecutorStart_hook = ExecutorStart_hook;
+ ExecutorStart_hook = delay_execution_ExecutorStart;
}
diff --git a/src/test/modules/delay_execution/expected/cached-plan-replan.out b/src/test/modules/delay_execution/expected/cached-plan-replan.out
new file mode 100644
index 0000000000..4f450b9d9b
--- /dev/null
+++ b/src/test/modules/delay_execution/expected/cached-plan-replan.out
@@ -0,0 +1,117 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1prep s2lock s1exec s2dropi s2unlock
+step s1prep: SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1;
+ EXPLAIN (COSTS OFF) EXECUTE q (1);
+QUERY PLAN
+--------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo11 foo_1
+ Recheck Cond: (a = $1)
+ -> Bitmap Index Scan on foo11_a_idx
+ Index Cond: (a = $1)
+(6 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+-----------------------------
+Append
+ Subplans Removed: 1
+ -> Seq Scan on foo11 foo_1
+ Filter: (a = $1)
+(4 rows)
+
+
+starting permutation: s1prep2 s2lock s1exec2 s2dropi s2unlock
+step s1prep2: SET plan_cache_mode = force_generic_plan;
+ SET enable_partitionwise_aggregate = on;
+ SET enable_partitionwise_join = on;
+ PREPARE q2 AS SELECT t1.a, count(t2.b) FROM foo t1, foo t2 WHERE t1.a = t2.a GROUP BY 1;
+ EXPLAIN (COSTS OFF) EXECUTE q2;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+----------------------------------------------------------------
+Append
+ -> GroupAggregate
+ Group Key: t1.a
+ -> Merge Join
+ Merge Cond: (t1.a = t2.a)
+ -> Index Only Scan using foo11_a_idx on foo11 t1
+ -> Materialize
+ -> Index Scan using foo11_a_idx on foo11 t2
+ -> GroupAggregate
+ Group Key: t1_1.a
+ -> Merge Join
+ Merge Cond: (t1_1.a = t2_1.a)
+ -> Sort
+ Sort Key: t1_1.a
+ -> Seq Scan on foo2 t1_1
+ -> Sort
+ Sort Key: t2_1.a
+ -> Seq Scan on foo2 t2_1
+(18 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec2: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec2: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------------------------------
+Append
+ -> GroupAggregate
+ Group Key: t1.a
+ -> Merge Join
+ Merge Cond: (t1.a = t2.a)
+ -> Sort
+ Sort Key: t1.a
+ -> Seq Scan on foo11 t1
+ -> Sort
+ Sort Key: t2.a
+ -> Seq Scan on foo11 t2
+ -> GroupAggregate
+ Group Key: t1_1.a
+ -> Merge Join
+ Merge Cond: (t1_1.a = t2_1.a)
+ -> Sort
+ Sort Key: t1_1.a
+ -> Seq Scan on foo2 t1_1
+ -> Sort
+ Sort Key: t2_1.a
+ -> Seq Scan on foo2 t2_1
+(21 rows)
+
diff --git a/src/test/modules/delay_execution/specs/cached-plan-replan.spec b/src/test/modules/delay_execution/specs/cached-plan-replan.spec
new file mode 100644
index 0000000000..67cfed7044
--- /dev/null
+++ b/src/test/modules/delay_execution/specs/cached-plan-replan.spec
@@ -0,0 +1,50 @@
+# Test to check that invalidation of a cached plan during ExecutorStart
+# correctly triggers replanning and re-execution.
+
+setup
+{
+ CREATE TABLE foo (a int, b text) PARTITION BY LIST(a);
+ CREATE TABLE foo1 PARTITION OF foo FOR VALUES IN (1) PARTITION BY LIST (a);
+ CREATE TABLE foo11 PARTITION OF foo1 FOR VALUES IN (1);
+ CREATE INDEX foo11_a ON foo1 (a);
+ CREATE TABLE foo2 PARTITION OF foo FOR VALUES IN (2);
+ CREATE VIEW foov AS SELECT * FROM foo;
+}
+
+teardown
+{
+ DROP VIEW foov;
+ DROP TABLE foo;
+}
+
+session "s1"
+# Creates a prepared statement and forces creation of a generic plan
+step "s1prep" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+
+step "s1prep2" { SET plan_cache_mode = force_generic_plan;
+ SET enable_partitionwise_aggregate = on;
+ SET enable_partitionwise_join = on;
+ PREPARE q2 AS SELECT t1.a, count(t2.b) FROM foo t1, foo t2 WHERE t1.a = t2.a GROUP BY 1;
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+# Executes a generic plan
+step "s1exec" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+step "s1exec2" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+
+session "s2"
+step "s2lock" { SELECT pg_advisory_lock(12345); }
+step "s2unlock" { SELECT pg_advisory_unlock(12345); }
+step "s2dropi" { DROP INDEX foo11_a; }
+
+# While "s1exec" waits to acquire the advisory lock, "s2drop" is able to drop
+# the index being used in the cached plan for `q`, so when "s1exec" is then
+# unblocked and initializes the cached plan for execution, it detects the
+# concurrent index drop and causes the cached plan to be discarded and
+# recreated without the index.
+permutation "s1prep" "s2lock" "s1exec" "s2dropi" "s2unlock"
+permutation "s1prep2" "s2lock" "s1exec2" "s2dropi" "s2unlock"
--
2.35.3
On Tue, Mar 14, 2023 at 7:07 PM Amit Langote <amitlangote09@gmail.com> wrote:
* I decided to initialize QueryDesc.planstate even in the cases where
the ExecInitNode() traversal is aborted midway on detecting CachedPlan
invalidation, so that it points to a partially initialized PlanState
tree.
be visited for resource cleanup in such cases was naive after all. To
that end, I've fixed the ExecEndNode() subroutines of all Plan node
types to account for potentially uninitialized fields. There are a
couple of cases where I'm a bit doubtful though. In
ExecEndCustomScan(), there's no indication in CustomScanState whether
it's OK to call EndCustomScan() when BeginCustomScan() may not have
been called. For ForeignScanState, I've assumed that
ForeignScanState.fdw_state being set can be used as a marker that
BeginForeignScan would have been called, though maybe that's not a
solid assumption.
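To illustrate the pattern (a minimal sketch of my own, not code from
the patch; it leans on the ForeignScan case above and assumes that
fdw_state remaining NULL means BeginForeignScan never ran):

    void
    ExecEndForeignScanSketch(ForeignScanState *node)
    {
        /*
         * fdw_state left NULL is taken to mean that BeginForeignScan
         * never ran because ExecInitNode() aborted early, so skip the
         * FDW's end callback in that case.
         */
        if (node->fdw_state != NULL)
            node->fdwroutine->EndForeignScan(node);

        /* ... remaining teardown, each step checking for NULL fields ... */
    }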
It seems I hadn't noted in ExecEndNode()'s comment that all node
types' recursive subroutines need to handle the change made by this
patch, namely that the corresponding ExecInitNode() subroutine may now
return early without having initialized all state struct fields.
Also noted in the documentation for CustomScan and ForeignScan that
the Begin*Scan callback may not have been called at all, so the
End*Scan callback should handle that gracefully.
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v38-0001-Add-field-to-store-partitioned-relids-to-Append-.patch (application/octet-stream)
From dfc41510ef3ebec38e7a56b639ffa41193109b43 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Thu, 9 Mar 2023 11:26:06 +0900
Subject: [PATCH v38 1/3] Add field to store partitioned relids to
Append/MergeAppend
A future commit will move the locking of relations referenced in a
cached plan from the current loop over the range table in
AcquireExecutorLocks() into the ExecInitNode() traversal of the plan
tree. Given that partitioned tables (their RT indexes) would not be
accessible via that new way of finding the relations to lock, add a
field to Append/MergeAppend to track them separately.
To that end, this moves the code that looks up the partitioned parent
relids of a given list of leaf partition subpaths of an
Append/MergeAppend out of make_partition_pruneinfo() into its own
function, add_append_subpath_partrelids(), generalizing it to handle
the cases where the child rels can be joinrels or upper (grouping)
rels. Also, to make it easier to traverse the parent chain of a child
grouping rel, its RelOptInfo.parent is now set, as is already done
for baserels and joinrels.
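For intuition, a hand-worked example (mine, not taken from the patch):
given a partition tree p (RT index 1) with children p1 (2) and p2 (3),
where p1 is further partitioned into leaf p11 (4), an Append over
scans of p11 and p2 ends up with allpartrelids = [{1, 2}], that is,
one bitmapset for the single partition tree involved, holding the RT
indexes of the partitioned parents p and p1 but not of the leaves.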
---
src/backend/optimizer/plan/createplan.c | 36 +++++--
src/backend/optimizer/plan/planner.c | 3 +
src/backend/optimizer/util/appendinfo.c | 134 ++++++++++++++++++++++++
src/backend/partitioning/partprune.c | 124 +++-------------------
src/include/nodes/plannodes.h | 14 +++
src/include/optimizer/appendinfo.h | 3 +
src/include/partitioning/partprune.h | 3 +-
7 files changed, 194 insertions(+), 123 deletions(-)
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 910ffbf1e1..794cdb5e3b 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -25,6 +25,7 @@
#include "nodes/extensible.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
+#include "optimizer/appendinfo.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/optimizer.h"
@@ -1209,6 +1210,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
Oid *nodeCollations = NULL;
bool *nodeNullsFirst = NULL;
bool consider_async = false;
+ List *allpartrelids = NIL;
/*
* The subpaths list could be empty, if every child was proven empty by
@@ -1350,18 +1352,24 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
++nasyncplans;
}
+ /* Populate partitioned parent relids. */
+ allpartrelids = add_append_subpath_partrelids(root, subpath, rel,
+ allpartrelids);
+
subplans = lappend(subplans, subplan);
}
+ plan->allpartrelids = allpartrelids;
+
/* Set below if we find quals that we can use to run-time prune */
plan->part_prune_index = -1;
/*
- * If any quals exist, they may be useful to perform further partition
- * pruning during execution. Gather information needed by the executor to
- * do partition pruning.
+ * If scanning partitions, check if there are quals that may be useful to
+ * perform further partition pruning during execution. Gather information
+ * needed by the executor to do partition pruning.
*/
- if (enable_partition_pruning)
+ if (enable_partition_pruning && allpartrelids != NIL)
{
List *prunequal;
@@ -1381,7 +1389,8 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
if (prunequal != NIL)
plan->part_prune_index = make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ allpartrelids);
}
plan->appendplans = subplans;
@@ -1425,6 +1434,7 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
List *subplans = NIL;
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
+ List *allpartrelids = NIL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1514,18 +1524,23 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplan = (Plan *) sort;
}
+ allpartrelids = add_append_subpath_partrelids(root, subpath, rel,
+ allpartrelids);
+
subplans = lappend(subplans, subplan);
}
+ node->allpartrelids = allpartrelids;
+
/* Set below if we find quals that we can use to run-time prune */
node->part_prune_index = -1;
/*
- * If any quals exist, they may be useful to perform further partition
- * pruning during execution. Gather information needed by the executor to
- * do partition pruning.
+ * If scanning partitions, check if there are quals that may be useful to
+ * perform further partition pruning during execution. Gather information
+ * needed by the executor to do partition pruning.
*/
- if (enable_partition_pruning)
+ if (enable_partition_pruning && allpartrelids != NIL)
{
List *prunequal;
@@ -1537,7 +1552,8 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
if (prunequal != NIL)
node->part_prune_index = make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ allpartrelids);
}
node->mergeplans = subplans;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index a1873ce26d..62b3ec96cc 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -7801,8 +7801,11 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
agg_costs, gd, &child_extra,
&child_partially_grouped_rel);
+ /* Mark as child of grouped_rel. */
+ child_grouped_rel->parent = grouped_rel;
if (child_partially_grouped_rel)
{
+ child_partially_grouped_rel->parent = grouped_rel;
partially_grouped_live_children =
lappend(partially_grouped_live_children,
child_partially_grouped_rel);
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index 9d377385f1..4876742ab2 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -40,6 +40,7 @@ static void make_inh_translation_list(Relation oldrelation,
AppendRelInfo *appinfo);
static Node *adjust_appendrel_attrs_mutator(Node *node,
adjust_appendrel_attrs_context *context);
+static List *add_part_relids(List *allpartrelids, Bitmapset *partrelids);
/*
@@ -1031,3 +1032,136 @@ distribute_row_identity_vars(PlannerInfo *root)
}
}
}
+
+/*
+ * add_append_subpath_partrelids
+ * Look up the partitioned parent relids of a child subpath's rel, up
+ * to parentrel, and add the bitmapset containing them into
+ * 'allpartrelids'
+ */
+List *
+add_append_subpath_partrelids(PlannerInfo *root, Path *subpath,
+ RelOptInfo *parentrel,
+ List *allpartrelids)
+{
+ RelOptInfo *prel = subpath->parent;
+ Relids partrelids = NULL;
+
+ /* Nothing to do if there's no parent to begin with. */
+ if (!IS_OTHER_REL(prel))
+ return allpartrelids;
+
+ /*
+ * Traverse up to the pathrel's topmost partitioned parent, collecting
+ * parent relids as we go; but stop if we reach parentrel. (Normally, a
+ * pathrel's topmost partitioned parent is either parentrel or a UNION ALL
+ * appendrel child of parentrel. But when handling partitionwise joins of
+ * multi-level partitioning trees, we can see an append path whose
+ * parentrel is an intermediate partitioned table.)
+ */
+ do
+ {
+ Relids parent_relids = NULL;
+
+ /*
+ * For simple child rels, we can simply get the parent relid from
+ * prel->parent. But for partitionwise join and aggregate child rels,
+ * while we can use prel->parent to move up the tree, parent relids to
+ * add into 'partrelids' must be found the hard way through the
+ * AppendRelInfos, because 1) a joinrel's relids may point to RTE_JOIN
+ * entries, and 2) a topmost grouping rel's relids field is left NULL.
+ */
+ if (IS_SIMPLE_REL(prel))
+ {
+ prel = prel->parent;
+ /* Stop once we reach the root partitioned rel. */
+ if (!IS_PARTITIONED_REL(prel))
+ break;
+ parent_relids = bms_add_members(parent_relids, prel->relids);
+ }
+ else
+ {
+ AppendRelInfo **appinfos;
+ int nappinfos,
+ i;
+
+ appinfos = find_appinfos_by_relids(root, prel->relids,
+ &nappinfos);
+ for (i = 0; i < nappinfos; i++)
+ {
+ AppendRelInfo *appinfo = appinfos[i];
+
+ parent_relids = bms_add_member(parent_relids,
+ appinfo->parent_relid);
+ }
+ pfree(appinfos);
+ prel = prel->parent;
+ }
+ /* accept this level as an interesting parent */
+ partrelids = bms_add_members(partrelids, parent_relids);
+ if (prel == parentrel)
+ break; /* don't traverse above parentrel */
+ } while (IS_OTHER_REL(prel));
+
+ if (partrelids == NULL)
+ return allpartrelids;
+
+ return add_part_relids(allpartrelids, partrelids);
+}
+
+/*
+ * add_part_relids
+ * Add new info to a list of Bitmapsets of partitioned relids.
+ *
+ * Within 'allpartrelids', there is one Bitmapset for each topmost parent
+ * partitioned rel. Each Bitmapset contains the RT indexes of the topmost
+ * parent as well as its relevant non-leaf child partitions. Since (by
+ * construction of the rangetable list) parent partitions must have lower
+ * RT indexes than their children, we can distinguish the topmost parent
+ * as being the lowest set bit in the Bitmapset.
+ *
+ * 'partrelids' contains the RT indexes of a parent partitioned rel, and
+ * possibly some non-leaf children, that are newly identified as parents of
+ * some subpath rel passed to make_partition_pruneinfo(). These are added
+ * to an appropriate member of 'allpartrelids'.
+ *
+ * Note that the list contains only RT indexes of partitioned tables that
+ * are parents of some scan-level relation appearing in the 'subpaths' that
+ * make_partition_pruneinfo() is dealing with. Also, "topmost" parents are
+ * not allowed to be higher than the 'parentrel' associated with the append
+ * path. In this way, we avoid expending cycles on partitioned rels that
+ * can't contribute useful pruning information for the problem at hand.
+ * (It is possible for 'parentrel' to be a child partitioned table, and it
+ * is also possible for scan-level relations to be child partitioned tables
+ * rather than leaf partitions. Hence we must construct this relation set
+ * with reference to the particular append path we're dealing with, rather
+ * than looking at the full partitioning structure represented in the
+ * RelOptInfos.)
+ */
+static List *
+add_part_relids(List *allpartrelids, Bitmapset *partrelids)
+{
+ Index targetpart;
+ ListCell *lc;
+
+ /* We can easily get the lowest set bit this way: */
+ targetpart = bms_next_member(partrelids, -1);
+ Assert(targetpart > 0);
+
+ /* Look for a matching topmost parent */
+ foreach(lc, allpartrelids)
+ {
+ Bitmapset *currpartrelids = (Bitmapset *) lfirst(lc);
+ Index currtarget = bms_next_member(currpartrelids, -1);
+
+ if (targetpart == currtarget)
+ {
+ /* Found a match, so add any new RT indexes to this hierarchy */
+ currpartrelids = bms_add_members(currpartrelids, partrelids);
+ lfirst(lc) = currpartrelids;
+ return allpartrelids;
+ }
+ }
+ /* No match, so add the new partition hierarchy to the list */
+ return lappend(allpartrelids, partrelids);
+}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 510145e3c0..3557e07082 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -138,7 +138,6 @@ typedef struct PruneStepResult
} PruneStepResult;
-static List *add_part_relids(List *allpartrelids, Bitmapset *partrelids);
static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
RelOptInfo *parentrel,
List *prunequal,
@@ -221,33 +220,32 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
+ * 'allpartrelids' contains Bitmapsets of RT indexes of partitioned parents
+ * whose partitions' Paths are in 'subpaths'; there's one Bitmapset for every
+ * partition tree involved.
*/
int
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
- List *prunequal)
+ List *prunequal,
+ List *allpartrelids)
{
PartitionPruneInfo *pruneinfo;
Bitmapset *allmatchedsubplans = NULL;
- List *allpartrelids;
List *prunerelinfos;
int *relid_subplan_map;
ListCell *lc;
int i;
+ Assert(list_length(allpartrelids) > 0);
+
/*
- * Scan the subpaths to see which ones are scans of partition child
- * relations, and identify their parent partitioned rels. (Note: we must
- * restrict the parent partitioned rels to be parentrel or children of
- * parentrel, otherwise we couldn't translate prunequal to match.)
- *
- * Also construct a temporary array to map from partition-child-relation
- * relid to the index in 'subpaths' of the scan plan for that partition.
+ * Construct a temporary array to map from partition-child-relation relid
+ * to the index in 'subpaths' of the scan plan for that partition.
* (Use of "subplan" rather than "subpath" is a bit of a misnomer, but
* we'll let it stand.) For convenience, we use 1-based indexes here, so
* that zero can represent an un-filled array entry.
*/
- allpartrelids = NIL;
relid_subplan_map = palloc0(sizeof(int) * root->simple_rel_array_size);
i = 1;
@@ -256,50 +254,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Path *path = (Path *) lfirst(lc);
RelOptInfo *pathrel = path->parent;
- /* We don't consider partitioned joins here */
- if (pathrel->reloptkind == RELOPT_OTHER_MEMBER_REL)
- {
- RelOptInfo *prel = pathrel;
- Bitmapset *partrelids = NULL;
-
- /*
- * Traverse up to the pathrel's topmost partitioned parent,
- * collecting parent relids as we go; but stop if we reach
- * parentrel. (Normally, a pathrel's topmost partitioned parent
- * is either parentrel or a UNION ALL appendrel child of
- * parentrel. But when handling partitionwise joins of
- * multi-level partitioning trees, we can see an append path whose
- * parentrel is an intermediate partitioned table.)
- */
- do
- {
- AppendRelInfo *appinfo;
-
- Assert(prel->relid < root->simple_rel_array_size);
- appinfo = root->append_rel_array[prel->relid];
- prel = find_base_rel(root, appinfo->parent_relid);
- if (!IS_PARTITIONED_REL(prel))
- break; /* reached a non-partitioned parent */
- /* accept this level as an interesting parent */
- partrelids = bms_add_member(partrelids, prel->relid);
- if (prel == parentrel)
- break; /* don't traverse above parentrel */
- } while (prel->reloptkind == RELOPT_OTHER_MEMBER_REL);
-
- if (partrelids)
- {
- /*
- * Found some relevant parent partitions, which may or may not
- * overlap with partition trees we already found. Add new
- * information to the allpartrelids list.
- */
- allpartrelids = add_part_relids(allpartrelids, partrelids);
- /* Also record the subplan in relid_subplan_map[] */
- /* No duplicates please */
- Assert(relid_subplan_map[pathrel->relid] == 0);
- relid_subplan_map[pathrel->relid] = i;
- }
- }
+ /* No duplicates please */
+ Assert(relid_subplan_map[pathrel->relid] == 0);
+ relid_subplan_map[pathrel->relid] = i;
i++;
}
@@ -368,63 +325,6 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
return list_length(root->partPruneInfos) - 1;
}
-/*
- * add_part_relids
- * Add new info to a list of Bitmapsets of partitioned relids.
- *
- * Within 'allpartrelids', there is one Bitmapset for each topmost parent
- * partitioned rel. Each Bitmapset contains the RT indexes of the topmost
- * parent as well as its relevant non-leaf child partitions. Since (by
- * construction of the rangetable list) parent partitions must have lower
- * RT indexes than their children, we can distinguish the topmost parent
- * as being the lowest set bit in the Bitmapset.
- *
- * 'partrelids' contains the RT indexes of a parent partitioned rel, and
- * possibly some non-leaf children, that are newly identified as parents of
- * some subpath rel passed to make_partition_pruneinfo(). These are added
- * to an appropriate member of 'allpartrelids'.
- *
- * Note that the list contains only RT indexes of partitioned tables that
- * are parents of some scan-level relation appearing in the 'subpaths' that
- * make_partition_pruneinfo() is dealing with. Also, "topmost" parents are
- * not allowed to be higher than the 'parentrel' associated with the append
- * path. In this way, we avoid expending cycles on partitioned rels that
- * can't contribute useful pruning information for the problem at hand.
- * (It is possible for 'parentrel' to be a child partitioned table, and it
- * is also possible for scan-level relations to be child partitioned tables
- * rather than leaf partitions. Hence we must construct this relation set
- * with reference to the particular append path we're dealing with, rather
- * than looking at the full partitioning structure represented in the
- * RelOptInfos.)
- */
-static List *
-add_part_relids(List *allpartrelids, Bitmapset *partrelids)
-{
- Index targetpart;
- ListCell *lc;
-
- /* We can easily get the lowest set bit this way: */
- targetpart = bms_next_member(partrelids, -1);
- Assert(targetpart > 0);
-
- /* Look for a matching topmost parent */
- foreach(lc, allpartrelids)
- {
- Bitmapset *currpartrelids = (Bitmapset *) lfirst(lc);
- Index currtarget = bms_next_member(currpartrelids, -1);
-
- if (targetpart == currtarget)
- {
- /* Found a match, so add any new RT indexes to this hierarchy */
- currpartrelids = bms_add_members(currpartrelids, partrelids);
- lfirst(lc) = currpartrelids;
- return allpartrelids;
- }
- }
- /* No match, so add the new partition hierarchy to the list */
- return lappend(allpartrelids, partrelids);
-}
-
/*
* make_partitionedrel_pruneinfo
* Build a List of PartitionedRelPruneInfos, one for each interesting
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 659bd05c0c..a0bb16cff4 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -270,6 +270,13 @@ typedef struct Append
List *appendplans;
int nasyncplans; /* # of asynchronous plans */
+ /*
+ * RTIs of all partitioned tables whose children are scanned by
+ * appendplans. The list contains a bitmapset for every partition tree
+ * covered by this Append.
+ */
+ List *allpartrelids;
+
/*
* All 'appendplans' preceding this index are non-partial plans. All
* 'appendplans' from this index onwards are partial plans.
@@ -294,6 +301,13 @@ typedef struct MergeAppend
List *mergeplans;
+ /*
+ * RTIs of all partitioned tables whose children are scanned by
+ * mergeplans. The list contains a bitmapset for every partition tree
+ * covered by this MergeAppend.
+ */
+ List *allpartrelids;
+
/* these fields are just like the sort-key info in struct Sort: */
/* number of sort-key columns */
diff --git a/src/include/optimizer/appendinfo.h b/src/include/optimizer/appendinfo.h
index a05f91f77d..1621a7319a 100644
--- a/src/include/optimizer/appendinfo.h
+++ b/src/include/optimizer/appendinfo.h
@@ -46,5 +46,8 @@ extern void add_row_identity_columns(PlannerInfo *root, Index rtindex,
RangeTblEntry *target_rte,
Relation target_relation);
extern void distribute_row_identity_vars(PlannerInfo *root);
+extern List *add_append_subpath_partrelids(PlannerInfo *root, Path *subpath,
+ RelOptInfo *parentrel,
+ List *allpartrelids);
#endif /* APPENDINFO_H */
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index c0d6889d47..2d907d31d4 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -73,7 +73,8 @@ typedef struct PartitionPruneContext
extern int make_partition_pruneinfo(struct PlannerInfo *root,
struct RelOptInfo *parentrel,
List *subpaths,
- List *prunequal);
+ List *prunequal,
+ List *allpartrelids);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
--
2.35.3
v38-0003-Track-opened-range-table-relations-in-a-List-in-.patch (application/octet-stream)
From 7859c3ee10dbe81606241478ef085aeb7d45a95d Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Mon, 13 Mar 2023 15:59:38 +0900
Subject: [PATCH v38 3/3] Track opened range table relations in a List in
EState
This makes ExecCloseRangeTableRelations faster when there are many
relations in the range table but only a few are opened during
execution, such as when run-time pruning kicks in on an Append
containing thousands of partition subplans.
---
src/backend/executor/execMain.c | 9 +++++----
src/backend/executor/execUtils.c | 2 ++
src/include/nodes/execnodes.h | 2 ++
3 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 0366be9fd6..94f8324cff 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1630,12 +1630,13 @@ ExecCloseResultRelations(EState *estate)
void
ExecCloseRangeTableRelations(EState *estate)
{
- int i;
+ ListCell *lc;
- for (i = 0; i < estate->es_range_table_size; i++)
+ foreach(lc, estate->es_opened_relations)
{
- if (estate->es_relations[i])
- table_close(estate->es_relations[i], NoLock);
+ Relation rel = lfirst(lc);
+
+ table_close(rel, NoLock);
}
}
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index a485e7dfc5..f7053072d9 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -829,6 +829,8 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
}
estate->es_relations[rti - 1] = rel;
+ estate->es_opened_relations = lappend(estate->es_opened_relations,
+ rel);
}
return rel;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index dfa72848c7..984fd2e423 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -619,6 +619,8 @@ typedef struct EState
Index es_range_table_size; /* size of the range table arrays */
Relation *es_relations; /* Array of per-range-table-entry Relation
* pointers, or NULL if not yet opened */
+ List *es_opened_relations; /* List of non-NULL entries in
+ * es_relations in no specific order */
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
--
2.35.3
v38-0002-Move-AcquireExecutorLocks-s-responsibility-into-.patch (application/octet-stream)
From 3476743ef207ba23f0c366aba509c439a4cdf559 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Fri, 20 Jan 2023 16:52:31 +0900
Subject: [PATCH v38 2/3] Move AcquireExecutorLocks()'s responsibility into the
executor
This commit introduces a new executor flag EXEC_FLAG_GET_LOCKS that
should be passed in eflags to ExecutorStart() if the PlannedStmt
comes from a CachedPlan. When set, the executor will take locks
on any relations referenced in the plan nodes that need to be
initialized for execution. That excludes any partitions that can
be pruned during the executor initialization phase, that is, based
on the values of only the external (PARAM_EXTERN) parameters.
Relations that are not explicitly mentioned in the plan tree, such
as views and non-leaf partition parents whose children are mentioned
in Append/MergeAppend nodes, are locked separately. After taking each
lock, the executor calls CachedPlanStillValid() to check if
CachedPlan.is_valid has been reset by PlanCacheRelCallback() due to
concurrent modification of relations referenced in the plan. If it
is found that the CachedPlan is indeed invalid, the recursive
ExecInitNode() traversal is aborted at that point. To allow the
proper cleanup of such a partially initialized planstate tree,
ExecEndNode() subroutines of various plan nodes have been fixed to
account for potentially uninitialized fields. It is the caller's
(of ExecutorStart()) responsibility to call ExecutorEnd() even on
a QueryDesc containing such a partially initialized PlanState tree.
Call sites that use plancache (GetCachedPlan) to get the plan trees
to pass to the executor for execution should now be prepared to
handle the case that the plan tree may be flagged by the executor as
stale as described above. To that end, this commit refactors the
relevant code sites to move the ExecutorStart() call closer to the
GetCachedPlan() call to reduce the friction in the cases where
replanning is needed due to a CachedPlan being marked stale in this
manner. Callers must check that QueryDesc.plan_valid is true before
passing it on to ExecutorRun() for execution.
PortalStart() now performs CreateQueryDesc() and ExecutorStart() for
all portal strategies, including those pertaining to multiple queries.
The QueryDescs for strategies handled by PortalRunMulti() are
remembered in the Portal in a new List field 'qdescs', allocated in a
new memory context 'queryContext'. This new arrangment is to make it
easier to discard and recreate a Portal if the CachedPlan goes stale
during setup.
---
contrib/postgres_fdw/postgres_fdw.c | 4 +
doc/src/sgml/custom-scan.sgml | 4 +-
doc/src/sgml/fdwhandler.sgml | 4 +-
src/backend/commands/copyto.c | 4 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 150 ++++++---
src/backend/commands/extension.c | 2 +
src/backend/commands/matview.c | 3 +-
src/backend/commands/portalcmds.c | 16 +-
src/backend/commands/prepare.c | 32 +-
src/backend/executor/execMain.c | 89 ++++-
src/backend/executor/execParallel.c | 8 +-
src/backend/executor/execPartition.c | 15 +
src/backend/executor/execProcnode.c | 16 +
src/backend/executor/execUtils.c | 60 +++-
src/backend/executor/functions.c | 2 +
src/backend/executor/nodeAgg.c | 23 +-
src/backend/executor/nodeAppend.c | 23 +-
src/backend/executor/nodeBitmapAnd.c | 10 +-
src/backend/executor/nodeBitmapHeapscan.c | 10 +-
src/backend/executor/nodeBitmapIndexscan.c | 2 +
src/backend/executor/nodeBitmapOr.c | 10 +-
src/backend/executor/nodeCtescan.c | 6 +-
src/backend/executor/nodeCustom.c | 13 +-
src/backend/executor/nodeForeignscan.c | 28 +-
src/backend/executor/nodeFunctionscan.c | 3 +-
src/backend/executor/nodeGather.c | 2 +
src/backend/executor/nodeGatherMerge.c | 2 +
src/backend/executor/nodeGroup.c | 5 +-
src/backend/executor/nodeHash.c | 2 +
src/backend/executor/nodeHashjoin.c | 13 +-
src/backend/executor/nodeIncrementalSort.c | 14 +-
src/backend/executor/nodeIndexonlyscan.c | 7 +-
src/backend/executor/nodeIndexscan.c | 7 +-
src/backend/executor/nodeLimit.c | 2 +
src/backend/executor/nodeLockRows.c | 2 +
src/backend/executor/nodeMaterial.c | 5 +-
src/backend/executor/nodeMemoize.c | 12 +-
src/backend/executor/nodeMergeAppend.c | 23 +-
src/backend/executor/nodeMergejoin.c | 10 +-
src/backend/executor/nodeModifyTable.c | 13 +-
.../executor/nodeNamedtuplestorescan.c | 3 +-
src/backend/executor/nodeNestloop.c | 7 +-
src/backend/executor/nodeProjectSet.c | 5 +-
src/backend/executor/nodeRecursiveunion.c | 4 +
src/backend/executor/nodeResult.c | 5 +-
src/backend/executor/nodeSamplescan.c | 5 +-
src/backend/executor/nodeSeqscan.c | 5 +-
src/backend/executor/nodeSetOp.c | 5 +-
src/backend/executor/nodeSort.c | 8 +-
src/backend/executor/nodeSubqueryscan.c | 5 +-
src/backend/executor/nodeTableFuncscan.c | 3 +-
src/backend/executor/nodeTidrangescan.c | 5 +-
src/backend/executor/nodeTidscan.c | 5 +-
src/backend/executor/nodeUnique.c | 5 +-
src/backend/executor/nodeValuesscan.c | 3 +-
src/backend/executor/nodeWindowAgg.c | 55 +++-
src/backend/executor/nodeWorktablescan.c | 3 +-
src/backend/executor/spi.c | 53 ++-
src/backend/nodes/outfuncs.c | 1 +
src/backend/nodes/readfuncs.c | 1 +
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 5 +
src/backend/rewrite/rewriteHandler.c | 7 +-
src/backend/storage/lmgr/lmgr.c | 45 +++
src/backend/tcop/postgres.c | 13 +-
src/backend/tcop/pquery.c | 311 ++++++++++--------
src/backend/utils/cache/lsyscache.c | 21 ++
src/backend/utils/cache/plancache.c | 134 ++------
src/backend/utils/mmgr/portalmem.c | 6 +
src/include/commands/explain.h | 7 +-
src/include/executor/execdesc.h | 5 +
src/include/executor/executor.h | 16 +
src/include/nodes/execnodes.h | 2 +
src/include/nodes/pathnodes.h | 3 +
src/include/nodes/plannodes.h | 3 +
src/include/storage/lmgr.h | 1 +
src/include/utils/lsyscache.h | 1 +
src/include/utils/plancache.h | 14 +
src/include/utils/portal.h | 4 +
src/test/modules/delay_execution/Makefile | 3 +-
.../modules/delay_execution/delay_execution.c | 63 +++-
.../expected/cached-plan-replan.out | 117 +++++++
.../specs/cached-plan-replan.spec | 50 +++
84 files changed, 1250 insertions(+), 426 deletions(-)
create mode 100644 src/test/modules/delay_execution/expected/cached-plan-replan.out
create mode 100644 src/test/modules/delay_execution/specs/cached-plan-replan.spec
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index f5926ab89d..93f3f8b5d1 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2659,7 +2659,11 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
/* Get info about foreign table. */
rtindex = node->resultRelInfo->ri_RangeTableIndex;
if (fsplan->scan.scanrelid == 0)
+ {
dmstate->rel = ExecOpenScanRelation(estate, rtindex, eflags);
+ if (!ExecPlanStillValid(estate))
+ return;
+ }
else
dmstate->rel = node->ss.ss_currentRelation;
table = GetForeignTable(RelationGetRelid(dmstate->rel));
diff --git a/doc/src/sgml/custom-scan.sgml b/doc/src/sgml/custom-scan.sgml
index 93d96f2f56..c755e2d681 100644
--- a/doc/src/sgml/custom-scan.sgml
+++ b/doc/src/sgml/custom-scan.sgml
@@ -275,7 +275,9 @@ void (*EndCustomScan) (CustomScanState *node);
</programlisting>
Clean up any private data associated with the <literal>CustomScanState</literal>.
This method is required, but it does not need to do anything if there is
- no associated data or it will be cleaned up automatically.
+ no associated data or it will be cleaned up automatically. Note that this
+ may be called even if the corresponding <function>BeginCustomScan</function>
+ was not called by <function>ExecInitCustomScan</function>.
</para>
<para>
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index ac1717bc3c..a97dcd9054 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -289,7 +289,9 @@ EndForeignScan(ForeignScanState *node);
End the scan and release resources. It is normally not important
to release palloc'd memory, but for example open files and connections
- to remote servers should be cleaned up.
+ to remote servers should be cleaned up. Note that this may be called
+ even if the corresponding <function>BeginForeignScan</function> was
+ not called by <function>ExecInitForeignScan</function>.
</para>
</sect2>
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index beea1ac687..e9f77d5711 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -558,7 +558,8 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL,
+ pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
@@ -569,6 +570,7 @@ BeginCopyTo(ParseState *pstate,
* ExecutorStart computes a result tupdesc for us
*/
ExecutorStart(cstate->queryDesc, 0);
+ Assert(cstate->queryDesc->plan_valid);
tupDesc = cstate->queryDesc->tupDesc;
}
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index d6c6d514f3..a55b851574 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 878d2fd172..826a47af0a 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -393,6 +393,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
else
{
PlannedStmt *plan;
+ QueryDesc *queryDesc;
instr_time planstart,
planduration;
BufferUsage bufusage_start,
@@ -415,12 +416,95 @@ ExplainOneQuery(Query *query, int cursorOptions,
BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
}
+ queryDesc = ExplainQueryDesc(plan, NULL, queryString, into, es,
+ params, queryEnv);
+ Assert(queryDesc);
+
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(queryDesc, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
+/*
+ * ExplainQueryDesc
+ * Set up QueryDesc for EXPLAINing a given plan
+ *
+ * This returns NULL if cplan is found to have been invalidated since its
+ * creation.
+ */
+QueryDesc *
+ExplainQueryDesc(PlannedStmt *stmt, CachedPlan *cplan,
+ const char *queryString, IntoClause *into, ExplainState *es,
+ ParamListInfo params, QueryEnvironment *queryEnv)
+{
+ QueryDesc *queryDesc;
+ DestReceiver *dest;
+ int eflags;
+ int instrument_option = 0;
+
+ /*
+ * Normally we discard the query's output, but if explaining CREATE TABLE
+ * AS, we'd better use the appropriate tuple receiver.
+ */
+ if (into)
+ dest = CreateIntoRelDestReceiver(into);
+ else
+ dest = None_Receiver;
+
+ if (es->analyze && es->timing)
+ instrument_option |= INSTRUMENT_TIMER;
+ else if (es->analyze)
+ instrument_option |= INSTRUMENT_ROWS;
+
+ if (es->buffers)
+ instrument_option |= INSTRUMENT_BUFFERS;
+ if (es->wal)
+ instrument_option |= INSTRUMENT_WAL;
+
+ /*
+ * Use a snapshot with an updated command ID to ensure this query sees
+ * results of any previously executed queries.
+ */
+ PushCopiedSnapshot(GetActiveSnapshot());
+ UpdateActiveSnapshotCommandId();
+
+ /* Create a QueryDesc for the query */
+ queryDesc = CreateQueryDesc(stmt, cplan, queryString,
+ GetActiveSnapshot(), InvalidSnapshot,
+ dest, params, queryEnv, instrument_option);
+
+ /* Select execution options */
+ if (es->analyze)
+ eflags = 0; /* default run-to-completion flags */
+ else
+ eflags = EXEC_FLAG_EXPLAIN_ONLY;
+ if (es->generic)
+ eflags |= EXEC_FLAG_EXPLAIN_GENERIC;
+ if (into)
+ eflags |= GetIntoRelEFlags(into);
+
+ /* Take locks if using a CachedPlan */
+ if (queryDesc->cplan)
+ eflags |= EXEC_FLAG_GET_LOCKS;
+
+ /*
+ * Call ExecutorStart to prepare the plan for execution. A cached plan
+ * may get invalidated as we're doing that.
+ */
+ ExecutorStart(queryDesc, eflags);
+ if (!queryDesc->plan_valid)
+ {
+ /* Clean up. */
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+ PopActiveSnapshot();
+ return NULL;
+ }
+
+ return queryDesc;
+}
+
/*
* ExplainOneUtility -
* print out the execution plan for one utility statement
@@ -524,29 +608,16 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(QueryDesc *queryDesc,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
{
- DestReceiver *dest;
- QueryDesc *queryDesc;
instr_time starttime;
double totaltime = 0;
- int eflags;
- int instrument_option = 0;
- Assert(plannedstmt->commandType != CMD_UTILITY);
-
- if (es->analyze && es->timing)
- instrument_option |= INSTRUMENT_TIMER;
- else if (es->analyze)
- instrument_option |= INSTRUMENT_ROWS;
-
- if (es->buffers)
- instrument_option |= INSTRUMENT_BUFFERS;
- if (es->wal)
- instrument_option |= INSTRUMENT_WAL;
+ Assert(queryDesc->plannedstmt->commandType != CMD_UTILITY);
/*
* We always collect timing for the entire statement, even when node-level
@@ -555,40 +626,6 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
*/
INSTR_TIME_SET_CURRENT(starttime);
- /*
- * Use a snapshot with an updated command ID to ensure this query sees
- * results of any previously executed queries.
- */
- PushCopiedSnapshot(GetActiveSnapshot());
- UpdateActiveSnapshotCommandId();
-
- /*
- * Normally we discard the query's output, but if explaining CREATE TABLE
- * AS, we'd better use the appropriate tuple receiver.
- */
- if (into)
- dest = CreateIntoRelDestReceiver(into);
- else
- dest = None_Receiver;
-
- /* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, instrument_option);
-
- /* Select execution options */
- if (es->analyze)
- eflags = 0; /* default run-to-completion flags */
- else
- eflags = EXEC_FLAG_EXPLAIN_ONLY;
- if (es->generic)
- eflags |= EXEC_FLAG_EXPLAIN_GENERIC;
- if (into)
- eflags |= GetIntoRelEFlags(into);
-
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, eflags);
-
/* Execute the plan for statistics if asked for */
if (es->analyze)
{
@@ -4862,6 +4899,17 @@ ExplainDummyGroup(const char *objtype, const char *labelname, ExplainState *es)
}
}
+/*
+ * Discard output buffer for a fresh restart.
+ */
+void
+ExplainResetOutput(ExplainState *es)
+{
+ Assert(es->str);
+ resetStringInfo(es->str);
+ ExplainBeginOutput(es);
+}
+
/*
* Emit the start-of-output boilerplate.
*
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 0eabe18335..5a76343123 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -797,11 +797,13 @@ execute_sql_string(const char *sql)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
ExecutorStart(qdesc, 0);
+ Assert(qdesc->plan_valid);
ExecutorRun(qdesc, ForwardScanDirection, 0, true);
ExecutorFinish(qdesc);
ExecutorEnd(qdesc);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index c00b9df3e3..80f2c38b35 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -409,12 +409,13 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, 0);
+ Assert(queryDesc->plan_valid);
/* run the plan */
ExecutorRun(queryDesc, ForwardScanDirection, 0L, true);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 8a3cf98cce..3c34ab4351 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -146,6 +146,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
PortalStart(portal, params, 0, GetActiveSnapshot());
Assert(portal->strategy == PORTAL_ONE_SELECT);
+ Assert(portal->plan_valid);
/*
* We're done; the query won't actually be run until PerformPortalFetch is
@@ -249,6 +250,17 @@ PerformPortalClose(const char *name)
PortalDrop(portal, false);
}
+/*
+ * Release a portal's QueryDesc.
+ */
+void
+PortalQueryFinish(QueryDesc *queryDesc)
+{
+ ExecutorFinish(queryDesc);
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+}
+
/*
* PortalCleanup
*
@@ -295,9 +307,7 @@ PortalCleanup(Portal portal)
if (portal->resowner)
CurrentResourceOwner = portal->resowner;
- ExecutorFinish(queryDesc);
- ExecutorEnd(queryDesc);
- FreeQueryDesc(queryDesc);
+ PortalQueryFinish(queryDesc);
CurrentResourceOwner = saveResourceOwner;
}
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 18f70319fc..c9070ed97f 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -183,6 +183,7 @@ ExecuteQuery(ParseState *pstate,
paramLI = EvaluateParams(pstate, entry, stmt->params, estate);
}
+replan:
/* Create a new portal to run the query in */
portal = CreateNewPortal();
/* Don't display the portal in pg_cursors, it is for internal use only */
@@ -251,10 +252,19 @@ ExecuteQuery(ParseState *pstate,
}
/*
- * Run the portal as appropriate.
+ * Run the portal as appropriate. If the portal contains a cached plan, it
+ * must be recreated when portal->plan_valid is false, which indicates that
+ * the cached plan was invalidated while initializing one of the plan trees
+ * contained in it.
*/
PortalStart(portal, paramLI, eflags, GetActiveSnapshot());
+ if (!portal->plan_valid)
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
+
(void) PortalRun(portal, count, false, true, dest, dest, qc);
PortalDrop(portal, false);
@@ -574,7 +584,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
{
PreparedStatement *entry;
const char *query_string;
- CachedPlan *cplan;
+ CachedPlan *cplan = NULL;
List *plan_list;
ListCell *p;
ParamListInfo paramLI = NULL;
@@ -618,6 +628,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
+replan:
cplan = GetCachedPlan(entry->plansource, paramLI,
CurrentResourceOwner, queryEnv);
@@ -639,8 +650,21 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ {
+ QueryDesc *queryDesc;
+
+ queryDesc = ExplainQueryDesc(pstmt, cplan, queryString,
+ into, es, paramLI, queryEnv);
+ if (queryDesc == NULL)
+ {
+ ExplainResetOutput(es);
+ ReleaseCachedPlan(cplan, CurrentResourceOwner);
+ goto replan;
+ }
+ ExplainOnePlan(queryDesc, into, es, query_string, paramLI,
+ queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
+ }
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 1b007dc32c..0366be9fd6 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -126,11 +126,32 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
* get control when ExecutorStart is called. Such a plugin would
* normally call standard_ExecutorStart().
*
+ * Normally, the plan tree given in queryDesc->plannedstmt is known to be
+ * valid in that *all* relations contained in plannedstmt->relationOids have
+ * already been locked. That may not be the case however if the plannedstmt
+ * comes from a CachedPlan, one given in queryDesc->cplan, in which case only
+ * some of the relations referenced in the plan would have been locked; to
+ * wit, those that AcquirePlannerLocks() deems necessary. Locks necessary
+ * to fully validate such a plan tree, including relations that are added by
+ * the planner, will be taken when initializing the plan tree in InitPlan();
+ * the caller must have set the EXEC_FLAG_GET_LOCKS bit in eflags. If the
+ * CachedPlan gets invalidated as these locks are taken, plan tree
+ * initialization is suspended at the point when such invalidation is first
+ * detected and InitPlan() returns after setting queryDesc->plan_valid to
+ * false. queryDesc->planstate would be pointing to a potentially partially
+ * initialized PlanState tree in that case. Callers must then retry the
+ * execution with a freshly created CachedPlan, after properly freeing the
+ * partially initialized QueryDesc.
* ----------------------------------------------------------------
*/
void
ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ /* A plan tree from a CachedPlan must still be valid, with locking requested. */
+ Assert(queryDesc->cplan == NULL ||
+ (CachedPlanStillValid(queryDesc->cplan) &&
+ (eflags & EXEC_FLAG_GET_LOCKS) != 0));
+
/*
* In some cases (e.g. an EXECUTE statement) a query execution will skip
* parse analysis, which means that the query_id won't be reported. Note
@@ -582,6 +603,16 @@ ExecCheckPermissions(List *rangeTable, List *rteperminfos,
RTEPermissionInfo *perminfo = lfirst_node(RTEPermissionInfo, l);
Assert(OidIsValid(perminfo->relid));
+
+ /*
+ * Relations whose permissions need to be checked must already have
+ * been locked by the parser or by AcquirePlannerLocks() if a
+ * cached plan is being executed.
+ *
+ * XXX shouldn't we skip calling ExecCheckPermissions from InitPlan
+ * in a parallel worker?
+ */
+ Assert(CheckRelLockedByMe(perminfo->relid, AccessShareLock, true) ||
+ IsParallelWorker());
result = ExecCheckOneRelPerms(perminfo);
if (!result)
{
@@ -785,12 +816,19 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
PreventCommandIfParallelMode(CreateCommandName((Node *) plannedstmt));
}
-
/* ----------------------------------------------------------------
* InitPlan
*
* Initializes the query plan: open files, allocate storage
* and start up the rule manager
+ *
+ * If queryDesc contains a CachedPlan, this takes locks on relations.
+ * If any of those relations have undergone concurrent schema changes
+ * between successfully performing RevalidateCachedQuery() on the
+ * containing CachedPlanSource and here, locking those relations would
+ * invalidate the CachedPlan by way of PlanCacheRelCallback(). In that
+ * case, queryDesc->plan_valid would be set to false to tell the caller
+ * to retry after creating a new CachedPlan.
* ----------------------------------------------------------------
*/
static void
@@ -801,20 +839,32 @@ InitPlan(QueryDesc *queryDesc, int eflags)
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
- PlanState *planstate;
+ PlanState *planstate = NULL;
TupleDesc tupType;
ListCell *l;
int i;
/*
- * Do permissions checks
+ * Set up range table in EState.
*/
- ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
+ ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
+
+ /* Make sure ExecPlanStillValid() can work. */
+ estate->es_cachedplan = queryDesc->cplan;
/*
- * initialize the node's execution state
+ * Lock any views that were mentioned in the query if needed. View
+ * relations must be locked separately like this, because they are not
+ * referenced in the plan tree.
*/
- ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
+ ExecLockViewRelations(plannedstmt->viewRelations, estate);
+ if (!ExecPlanStillValid(estate))
+ goto failed;
+
+ /*
+ * Do permissions checks
+ */
+ ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
estate->es_plannedstmt = plannedstmt;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
@@ -849,6 +899,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
case ROW_MARK_KEYSHARE:
case ROW_MARK_REFERENCE:
relation = ExecGetRangeTableRelation(estate, rc->rti);
+ if (!ExecPlanStillValid(estate))
+ goto failed;
break;
case ROW_MARK_COPY:
/* no physical table access is required */
@@ -919,6 +971,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_subplanstates = lappend(estate->es_subplanstates,
subplanstate);
+ if (!ExecPlanStillValid(estate))
+ goto failed;
i++;
}
@@ -929,6 +983,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
* processing tuples.
*/
planstate = ExecInitNode(plan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ goto failed;
/*
* Get the tuple descriptor describing the type of tuples to return.
@@ -972,6 +1028,17 @@ InitPlan(QueryDesc *queryDesc, int eflags)
queryDesc->tupDesc = tupType;
queryDesc->planstate = planstate;
+ queryDesc->plan_valid = true;
+ return;
+
+failed:
+ /*
+ * Plan initialization failed. Mark QueryDesc as such. Note that we do
+ * set planstate, even if it may only be partially initialized, so that
+ * ExecEndPlan() can process it.
+ */
+ queryDesc->planstate = planstate;
+ queryDesc->plan_valid = false;
}
/*
@@ -1389,7 +1456,7 @@ ExecGetAncestorResultRels(EState *estate, ResultRelInfo *resultRelInfo)
/*
* All ancestors up to the root target relation must have been
- * locked by the planner or AcquireExecutorLocks().
+ * locked.
*/
ancRel = table_open(ancOid, NoLock);
rInfo = makeNode(ResultRelInfo);
@@ -2797,7 +2864,8 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
* Child EPQ EStates share the parent's copy of unchanging state such as
* the snapshot, rangetable, and external Param info. They need their own
* copies of local state, including a tuple table, es_param_exec_vals,
- * result-rel info, etc.
+ * result-rel info, etc. Also, we don't pass the parent's copy of the
+ * CachedPlan, because no new locks will be taken for EvalPlanQual().
*/
rcestate->es_direction = ForwardScanDirection;
rcestate->es_snapshot = parentestate->es_snapshot;
@@ -2884,6 +2952,7 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
PlanState *subplanstate;
subplanstate = ExecInitNode(subplan, rcestate, 0);
+ Assert(ExecPlanStillValid(rcestate));
rcestate->es_subplanstates = lappend(rcestate->es_subplanstates,
subplanstate);
}
@@ -2937,6 +3006,10 @@ EvalPlanQualEnd(EPQState *epqstate)
MemoryContext oldcontext;
ListCell *l;
+ /* Nothing to do if EvalPlanQualInit() wasn't done to begin with. */
+ if (epqstate->parentestate == NULL)
+ return;
+
rtsize = epqstate->parentestate->es_range_table_size;
/*
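
ExecPlanStillValid() is used throughout the executor changes that follow;
its definition is not visible in this excerpt, but given how es_cachedplan
is set up above, it conceptually reduces to the following (a sketch under
that assumption; the patch's actual definition may differ in detail):

	/* Sketch: the plan is valid unless its CachedPlan got invalidated. */
	static inline bool
	ExecPlanStillValid(EState *estate)
	{
		return estate->es_cachedplan == NULL ||
			   CachedPlanStillValid(estate->es_cachedplan);
	}
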
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index aa3f283453..df4cc5ddaf 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1249,8 +1249,13 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
- /* Create a QueryDesc for the query. */
+ /*
+ * Create a QueryDesc for the query. Note that no CachedPlan is available
+ * here, even though the plan tree may have come from one in the leader.
+ */
return CreateQueryDesc(pstmt,
+ NULL,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
@@ -1432,6 +1437,7 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Start up the executor */
queryDesc->plannedstmt->jitFlags = fpes->jit_flags;
ExecutorStart(queryDesc, fpes->eflags);
+ Assert(queryDesc->plan_valid);
/* Special executor initialization steps for parallel workers */
queryDesc->planstate->state->es_query_dsa = area;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 9799968a42..3425ffcca7 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -513,6 +513,12 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
oldcxt = MemoryContextSwitchTo(proute->memcxt);
+ /*
+ * Note that while we must check ExecPlanStillValid() for other locks
+ * taken during execution initialization, it is OK not to do so for
+ * partitions opened like this for tuple routing, because locking them
+ * can't possibly invalidate the plan.
+ */
partrel = table_open(partOid, RowExclusiveLock);
leaf_part_rri = makeNode(ResultRelInfo);
@@ -1111,6 +1117,11 @@ ExecInitPartitionDispatchInfo(EState *estate,
* Only sub-partitioned tables need to be locked here. The root
* partitioned table will already have been locked as it's referenced in
* the query's rtable.
+ *
+ * Note that while we must check ExecPlanStillValid() for other locks
+ * taken during execution initialization, it is OK not to do so for
+ * partitions opened like this for tuple routing, because locking them
+ * can't possibly invalidate the plan.
*/
if (partoid != RelationGetRelid(proute->partition_root))
rel = table_open(partoid, RowExclusiveLock);
@@ -1817,6 +1828,8 @@ ExecInitPartitionPruning(PlanState *planstate,
/* Create the working data structure for pruning */
prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Perform an initial partition prune pass, if required.
@@ -1943,6 +1956,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* duration of this executor run.
*/
partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
partkey = RelationGetPartitionKey(partrel);
partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
partrel);
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 4d288bc8d4..a9b22e0f16 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -388,6 +388,9 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
break;
}
+ if (!ExecPlanStillValid(estate))
+ return result;
+
ExecSetExecProcNode(result, result->ExecProcNode);
/*
@@ -403,6 +406,12 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
Assert(IsA(subplan, SubPlan));
sstate = ExecInitSubPlan(subplan, result);
subps = lappend(subps, sstate);
+ if (!ExecPlanStillValid(estate))
+ {
+ /* Don't lose track of those initialized. */
+ result->initPlan = subps;
+ return result;
+ }
}
result->initPlan = subps;
@@ -551,6 +560,13 @@ MultiExecProcNode(PlanState *node)
* After this operation, the query plan will not be able to be
* processed any further. This should be called only after
* the query plan has been fully executed.
+ *
+ * Note: Subroutines for the various node types must be prepared to handle
+ * the case where the corresponding ExecInitNode() subroutine returned
+ * early because the lock taken on a relation handled by the node caused
+ * the plan to be invalidated (ExecPlanStillValid() stops returning true),
+ * in which case not all of the fields of the node's PlanState struct will
+ * have been initialized.
* ----------------------------------------------------------------
*/
void
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 012dbb6965..a485e7dfc5 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -804,7 +804,8 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rte->rtekind == RTE_RELATION);
- if (!IsParallelWorker())
+ if (!IsParallelWorker() &&
+ (estate->es_top_eflags & EXEC_FLAG_GET_LOCKS) == 0)
{
/*
* In a normal query, we should already have the appropriate lock,
@@ -833,6 +834,61 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
return rel;
}
+/*
+ * ExecLockViewRelations
+ * Lock view relations, if any, in a given query
+ */
+void
+ExecLockViewRelations(List *viewRelations, EState *estate)
+{
+ ListCell *lc;
+
+ /* Nothing to do if no locks need to be taken. */
+ if ((estate->es_top_eflags & EXEC_FLAG_GET_LOCKS) == 0)
+ return;
+
+ foreach(lc, viewRelations)
+ {
+ Index rti = lfirst_int(lc);
+ RangeTblEntry *rte = exec_rt_fetch(rti, estate);
+
+ Assert(OidIsValid(rte->relid));
+ Assert(rte->relkind == RELKIND_VIEW);
+ Assert(rte->rellockmode != NoLock);
+
+ LockRelationOid(rte->relid, rte->rellockmode);
+ }
+}
+
+/*
+ * ExecLockAppendNonLeafRelations
+ * Lock non-leaf relations whose children are scanned by a given
+ * Append/MergeAppend node
+ */
+void
+ExecLockAppendNonLeafRelations(EState *estate, List *allpartrelids)
+{
+ ListCell *l;
+
+ /* Nothing to do if no locks need to be taken. */
+ if ((estate->es_top_eflags & EXEC_FLAG_GET_LOCKS) == 0)
+ return;
+
+ foreach(l, allpartrelids)
+ {
+ Bitmapset *partrelids = lfirst_node(Bitmapset, l);
+ int i;
+
+ i = -1;
+ while ((i = bms_next_member(partrelids, i)) > 0)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(i, estate);
+
+ LockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
+}
+
/*
* ExecInitResultRelation
* Open relation given by the passed-in RT index and fill its
@@ -848,6 +904,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
Relation resultRelationDesc;
resultRelationDesc = ExecGetRangeTableRelation(estate, rti);
+ if (!ExecPlanStillValid(estate))
+ return;
InitResultRelInfo(resultRelInfo,
resultRelationDesc,
rti,
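
Both helpers above are no-ops unless EXEC_FLAG_GET_LOCKS is set in
es_top_eflags.  Per the ExecutorStart() contract, the flag is expected to
be passed only when the plan tree came from a CachedPlan, roughly as in
this hypothetical caller-side sketch (the real call sites live in the
portal/SPI code, not shown here):

	int		eflags = 0;

	if (queryDesc->cplan != NULL)
		eflags |= EXEC_FLAG_GET_LOCKS;	/* take remaining locks at init time */
	ExecutorStart(queryDesc, eflags);
	if (!queryDesc->plan_valid)
	{
		/* free the partially initialized QueryDesc, then replan */
	}
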
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index f55424eb5a..c88f72bc4e 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -838,6 +838,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest = None_Receiver;
es->qd = CreateQueryDesc(es->stmt,
+ NULL, /* fmgr_sql() doesn't use CachedPlans */
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -863,6 +864,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
eflags = 0; /* default run-to-completion flags */
ExecutorStart(es->qd, eflags);
+ Assert(es->qd->plan_valid);
}
es->status = F_EXEC_RUN;
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 19342a420c..06e0d7d149 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -3134,15 +3134,18 @@ hashagg_reset_spill_state(AggState *aggstate)
{
HashAggSpill *spill = &aggstate->hash_spills[setno];
- pfree(spill->ntuples);
- pfree(spill->partitions);
+ if (spill->ntuples)
+ pfree(spill->ntuples);
+ if (spill->partitions)
+ pfree(spill->partitions);
}
pfree(aggstate->hash_spills);
aggstate->hash_spills = NULL;
}
/* free batches */
- list_free_deep(aggstate->hash_batches);
+ if (aggstate->hash_batches)
+ list_free_deep(aggstate->hash_batches);
aggstate->hash_batches = NIL;
/* close tape set */
@@ -3296,6 +3299,8 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
eflags &= ~EXEC_FLAG_REWIND;
outerPlan = outerPlan(node);
outerPlanState(aggstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return aggstate;
/*
* initialize source tuple type.
@@ -4336,10 +4341,13 @@ ExecEndAgg(AggState *node)
{
AggStatePerTrans pertrans = &node->pertrans[transno];
- for (setno = 0; setno < numGroupingSets; setno++)
+ if (node->pertrans)
{
- if (pertrans->sortstates[setno])
- tuplesort_end(pertrans->sortstates[setno]);
+ for (setno = 0; setno < numGroupingSets; setno++)
+ {
+ if (pertrans->sortstates[setno])
+ tuplesort_end(pertrans->sortstates[setno]);
+ }
}
}
@@ -4357,7 +4365,8 @@ ExecEndAgg(AggState *node)
ExecFreeExprContext(&node->ss.ps);
/* clean up tuple table */
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
outerPlan = outerPlanState(node);
ExecEndNode(outerPlan);
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index c185b11c67..091f979c46 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -109,10 +109,11 @@ AppendState *
ExecInitAppend(Append *node, EState *estate, int eflags)
{
AppendState *appendstate = makeNode(AppendState);
- PlanState **appendplanstates;
+ PlanState **appendplanstates = NULL;
Bitmapset *validsubplans;
Bitmapset *asyncplans;
int nplans;
+ int ninited = 0;
int nasyncplans;
int firstvalid;
int i,
@@ -133,6 +134,15 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_syncdone = false;
appendstate->as_begun = false;
+ /*
+ * Lock non-leaf partitions. In the pruning case, some of these locks
+ * will be retaken when the partition is opened for pruning, but it
+ * does not seem worthwhile to spend cycles to filter those out here.
+ */
+ ExecLockAppendNonLeafRelations(estate, node->allpartrelids);
+ if (!ExecPlanStillValid(estate))
+ goto early_exit;
+
/* If run-time partition pruning is enabled, then set that up now */
if (node->part_prune_index >= 0)
{
@@ -148,6 +158,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
node->part_prune_index,
node->apprelids,
&validsubplans);
+ if (!ExecPlanStillValid(estate))
+ goto early_exit;
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
@@ -222,11 +234,12 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
firstvalid = j;
appendplanstates[j++] = ExecInitNode(initNode, estate, eflags);
+ ninited++;
+ if (!ExecPlanStillValid(estate))
+ goto early_exit;
}
appendstate->as_first_partial_plan = firstvalid;
- appendstate->appendplans = appendplanstates;
- appendstate->as_nplans = nplans;
/* Initialize async state */
appendstate->as_asyncplans = asyncplans;
@@ -276,6 +289,10 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
/* For parallel query, this will be overridden later. */
appendstate->choose_next_subplan = choose_next_subplan_locally;
+early_exit:
+ appendstate->appendplans = appendplanstates;
+ appendstate->as_nplans = ninited;
+
return appendstate;
}
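
The same early-exit discipline recurs in every node with multiple
children below (BitmapAnd/Or, MergeAppend, ModifyTable): count only the
sub-nodes actually initialized and publish that count, so ExecEndNode()
never walks uninitialized slots.  In schematic form (illustrative only):

	for (i = 0; i < nplans; i++)
	{
		states[i] = ExecInitNode(subplans[i], estate, eflags);
		ninited++;
		if (!ExecPlanStillValid(estate))
			break;				/* a lock invalidated the plan; stop here */
	}
	node->planstates = states;
	node->nplans = ninited;		/* shutdown only sees initialized children */
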
diff --git a/src/backend/executor/nodeBitmapAnd.c b/src/backend/executor/nodeBitmapAnd.c
index 4c5eb2b23b..acc6c50e20 100644
--- a/src/backend/executor/nodeBitmapAnd.c
+++ b/src/backend/executor/nodeBitmapAnd.c
@@ -57,6 +57,7 @@ ExecInitBitmapAnd(BitmapAnd *node, EState *estate, int eflags)
BitmapAndState *bitmapandstate = makeNode(BitmapAndState);
PlanState **bitmapplanstates;
int nplans;
+ int ninited = 0;
int i;
ListCell *l;
Plan *initNode;
@@ -77,8 +78,6 @@ ExecInitBitmapAnd(BitmapAnd *node, EState *estate, int eflags)
bitmapandstate->ps.plan = (Plan *) node;
bitmapandstate->ps.state = estate;
bitmapandstate->ps.ExecProcNode = ExecBitmapAnd;
- bitmapandstate->bitmapplans = bitmapplanstates;
- bitmapandstate->nplans = nplans;
/*
* call ExecInitNode on each of the plans to be executed and save the
@@ -89,6 +88,9 @@ ExecInitBitmapAnd(BitmapAnd *node, EState *estate, int eflags)
{
initNode = (Plan *) lfirst(l);
bitmapplanstates[i] = ExecInitNode(initNode, estate, eflags);
+ ninited++;
+ if (!ExecPlanStillValid(estate))
+ goto early_exit;
i++;
}
@@ -99,6 +101,10 @@ ExecInitBitmapAnd(BitmapAnd *node, EState *estate, int eflags)
* ExecQual or ExecProject. They don't need any tuple slots either.
*/
+early_exit:
+ bitmapandstate->bitmapplans = bitmapplanstates;
+ bitmapandstate->nplans = ninited;
+
return bitmapandstate;
}
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index f35df0b8bf..e6a689eefb 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -665,7 +665,8 @@ ExecEndBitmapHeapScan(BitmapHeapScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/*
* close down subplans
@@ -693,7 +694,8 @@ ExecEndBitmapHeapScan(BitmapHeapScanState *node)
/*
* close heap scan
*/
- table_endscan(scanDesc);
+ if (scanDesc)
+ table_endscan(scanDesc);
}
/* ----------------------------------------------------------------
@@ -763,11 +765,15 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return scanstate;
/*
* initialize child nodes
*/
outerPlanState(scanstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return scanstate;
/*
* get the scan type from the relation descriptor.
diff --git a/src/backend/executor/nodeBitmapIndexscan.c b/src/backend/executor/nodeBitmapIndexscan.c
index 83ec9ede89..cc8332ef68 100644
--- a/src/backend/executor/nodeBitmapIndexscan.c
+++ b/src/backend/executor/nodeBitmapIndexscan.c
@@ -263,6 +263,8 @@ ExecInitBitmapIndexScan(BitmapIndexScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
indexstate->biss_RelationDesc = index_open(node->indexid, lockmode);
+ if (!ExecPlanStillValid(estate))
+ return indexstate;
/*
* Initialize index-specific scan state
diff --git a/src/backend/executor/nodeBitmapOr.c b/src/backend/executor/nodeBitmapOr.c
index 0bf8af9652..babad1b4b2 100644
--- a/src/backend/executor/nodeBitmapOr.c
+++ b/src/backend/executor/nodeBitmapOr.c
@@ -58,6 +58,7 @@ ExecInitBitmapOr(BitmapOr *node, EState *estate, int eflags)
BitmapOrState *bitmaporstate = makeNode(BitmapOrState);
PlanState **bitmapplanstates;
int nplans;
+ int ninited = 0;
int i;
ListCell *l;
Plan *initNode;
@@ -78,8 +79,6 @@ ExecInitBitmapOr(BitmapOr *node, EState *estate, int eflags)
bitmaporstate->ps.plan = (Plan *) node;
bitmaporstate->ps.state = estate;
bitmaporstate->ps.ExecProcNode = ExecBitmapOr;
- bitmaporstate->bitmapplans = bitmapplanstates;
- bitmaporstate->nplans = nplans;
/*
* call ExecInitNode on each of the plans to be executed and save the
@@ -90,6 +89,9 @@ ExecInitBitmapOr(BitmapOr *node, EState *estate, int eflags)
{
initNode = (Plan *) lfirst(l);
bitmapplanstates[i] = ExecInitNode(initNode, estate, eflags);
+ ninited++;
+ if (!ExecPlanStillValid(estate))
+ goto early_exit;
i++;
}
@@ -100,6 +102,10 @@ ExecInitBitmapOr(BitmapOr *node, EState *estate, int eflags)
* ExecQual or ExecProject. They don't need any tuple slots either.
*/
+early_exit:
+ bitmaporstate->bitmapplans = bitmapplanstates;
+ bitmaporstate->nplans = ninited;
+
return bitmaporstate;
}
diff --git a/src/backend/executor/nodeCtescan.c b/src/backend/executor/nodeCtescan.c
index cc4c4243e2..eed5b75a4f 100644
--- a/src/backend/executor/nodeCtescan.c
+++ b/src/backend/executor/nodeCtescan.c
@@ -297,14 +297,16 @@ ExecEndCteScan(CteScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/*
* If I am the leader, free the tuplestore.
*/
if (node->leader == node)
{
- tuplestore_end(node->cte_table);
+ if (node->cte_table)
+ tuplestore_end(node->cte_table);
node->cte_table = NULL;
}
}
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
index bd42c65b29..544593ccaf 100644
--- a/src/backend/executor/nodeCustom.c
+++ b/src/backend/executor/nodeCustom.c
@@ -61,6 +61,8 @@ ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
if (scanrelid > 0)
{
scan_rel = ExecOpenScanRelation(estate, scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return css;
css->ss.ss_currentRelation = scan_rel;
}
@@ -127,6 +129,11 @@ ExecCustomScan(PlanState *pstate)
void
ExecEndCustomScan(CustomScanState *node)
{
+ /*
+ * XXX - BeginCustomScan() may not have been called if ExecInitCustomScan()
+ * hit the early-exit case. Perhaps we should document that custom scan
+ * providers must be prepared to handle that situation.
+ */
Assert(node->methods->EndCustomScan != NULL);
node->methods->EndCustomScan(node);
@@ -134,8 +141,10 @@ ExecEndCustomScan(CustomScanState *node)
ExecFreeExprContext(&node->ss.ps);
/* Clean out the tuple table */
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ps.ps_ResultTupleSlot)
+ ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
void
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index c2139acca0..aa54f60127 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -173,6 +173,8 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
if (scanrelid > 0)
{
currentRelation = ExecOpenScanRelation(estate, scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return scanstate;
scanstate->ss.ss_currentRelation = currentRelation;
fdwroutine = GetFdwRoutineForRelation(currentRelation, true);
}
@@ -264,6 +266,8 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
if (outerPlan(node))
outerPlanState(scanstate) =
ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return scanstate;
/*
* Tell the FDW to initialize the scan.
@@ -300,14 +304,23 @@ ExecEndForeignScan(ForeignScanState *node)
ForeignScan *plan = (ForeignScan *) node->ss.ps.plan;
EState *estate = node->ss.ps.state;
- /* Let the FDW shut down */
- if (plan->operation != CMD_SELECT)
+ /*
+ * Let the FDW shut down if needed.
+ *
+ * XXX - BeginDirectModify()/BeginForeignScan() may not have been called
+ * if ExecInitForeignScan() returned early due to the plan being
+ * invalidated upon taking a lock on the foreign table.
+ */
+ if (node->fdw_state)
{
- if (estate->es_epq_active == NULL)
- node->fdwroutine->EndDirectModify(node);
+ if (plan->operation != CMD_SELECT)
+ {
+ if (estate->es_epq_active == NULL)
+ node->fdwroutine->EndDirectModify(node);
+ }
+ else
+ node->fdwroutine->EndForeignScan(node);
}
- else
- node->fdwroutine->EndForeignScan(node);
/* Shut down any outer plan. */
if (outerPlanState(node))
@@ -319,7 +332,8 @@ ExecEndForeignScan(ForeignScanState *node)
/* clean out the tuple table */
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeFunctionscan.c b/src/backend/executor/nodeFunctionscan.c
index dd06ef8aee..792ecda4a9 100644
--- a/src/backend/executor/nodeFunctionscan.c
+++ b/src/backend/executor/nodeFunctionscan.c
@@ -533,7 +533,8 @@ ExecEndFunctionScan(FunctionScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/*
* Release slots and tuplestore resources
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 307fc10eea..365d3af3e4 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -89,6 +89,8 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
*/
outerNode = outerPlan(node);
outerPlanState(gatherstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return gatherstate;
tupDesc = ExecGetResultType(outerPlanState(gatherstate));
/*
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 9d5e1a46e9..8d2809f079 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -108,6 +108,8 @@ ExecInitGatherMerge(GatherMerge *node, EState *estate, int eflags)
*/
outerNode = outerPlan(node);
outerPlanState(gm_state) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return gm_state;
/*
* Leader may access ExecProcNode result directly (if
diff --git a/src/backend/executor/nodeGroup.c b/src/backend/executor/nodeGroup.c
index 25a1618952..e0832bb778 100644
--- a/src/backend/executor/nodeGroup.c
+++ b/src/backend/executor/nodeGroup.c
@@ -185,6 +185,8 @@ ExecInitGroup(Group *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(grpstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return grpstate;
/*
* Initialize scan slot and type.
@@ -231,7 +233,8 @@ ExecEndGroup(GroupState *node)
ExecFreeExprContext(&node->ss.ps);
/* clean up tuple table */
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
outerPlan = outerPlanState(node);
ExecEndNode(outerPlan);
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 748c9b0024..891bcee919 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -386,6 +386,8 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(hashstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return hashstate;
/*
* initialize our result slot and type. No need to build projection
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index f189fb4d28..93ce0c8be0 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -691,8 +691,12 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
hashNode = (Hash *) innerPlan(node);
outerPlanState(hjstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return hjstate;
outerDesc = ExecGetResultType(outerPlanState(hjstate));
innerPlanState(hjstate) = ExecInitNode((Plan *) hashNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return hjstate;
innerDesc = ExecGetResultType(innerPlanState(hjstate));
/*
@@ -813,9 +817,12 @@ ExecEndHashJoin(HashJoinState *node)
/*
* clean out the tuple table
*/
- ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->hj_OuterTupleSlot);
- ExecClearTuple(node->hj_HashTupleSlot);
+ if (node->js.ps.ps_ResultTupleSlot)
+ ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
+ if (node->hj_OuterTupleSlot)
+ ExecClearTuple(node->hj_OuterTupleSlot);
+ if (node->hj_HashTupleSlot)
+ ExecClearTuple(node->hj_HashTupleSlot);
/*
* clean up subtrees
diff --git a/src/backend/executor/nodeIncrementalSort.c b/src/backend/executor/nodeIncrementalSort.c
index 12bc22f33c..6b2da56044 100644
--- a/src/backend/executor/nodeIncrementalSort.c
+++ b/src/backend/executor/nodeIncrementalSort.c
@@ -1041,6 +1041,8 @@ ExecInitIncrementalSort(IncrementalSort *node, EState *estate, int eflags)
* nodes may be able to do something more useful.
*/
outerPlanState(incrsortstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return incrsortstate;
/*
* Initialize scan slot and type.
@@ -1080,12 +1082,16 @@ ExecEndIncrementalSort(IncrementalSortState *node)
SO_printf("ExecEndIncrementalSort: shutting down sort node\n");
/* clean out the scan tuple */
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/* must drop pointer to sort result tuple */
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+ if (node->ss.ps.ps_ResultTupleSlot)
+ ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
/* must drop standalone tuple slots from outer node */
- ExecDropSingleTupleTableSlot(node->group_pivot);
- ExecDropSingleTupleTableSlot(node->transfer_tuple);
+ if (node->group_pivot)
+ ExecDropSingleTupleTableSlot(node->group_pivot);
+ if (node->transfer_tuple)
+ ExecDropSingleTupleTableSlot(node->transfer_tuple);
/*
* Release tuplesort resources.
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 0b43a9b969..b60a086464 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -394,7 +394,8 @@ ExecEndIndexOnlyScan(IndexOnlyScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/*
* close the index relation (no-op if we didn't open it)
@@ -512,6 +513,8 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return indexstate;
indexstate->ss.ss_currentRelation = currentRelation;
indexstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
@@ -565,6 +568,8 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
indexstate->ioss_RelationDesc = index_open(node->indexid, lockmode);
+ if (!ExecPlanStillValid(estate))
+ return indexstate;
/*
* Initialize index-specific scan state
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 4540c7781d..628c233919 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -808,7 +808,8 @@ ExecEndIndexScan(IndexScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/*
* close the index relation (no-op if we didn't open it)
@@ -925,6 +926,8 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return indexstate;
indexstate->ss.ss_currentRelation = currentRelation;
indexstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
@@ -970,6 +973,8 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
indexstate->iss_RelationDesc = index_open(node->indexid, lockmode);
+ if (!ExecPlanStillValid(estate))
+ return indexstate;
/*
* Initialize index-specific scan state
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index 425fbfc405..2fcbde74ed 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -476,6 +476,8 @@ ExecInitLimit(Limit *node, EState *estate, int eflags)
*/
outerPlan = outerPlan(node);
outerPlanState(limitstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return limitstate;
/*
* initialize child expressions
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 407414fc0c..3a8aa2b5a4 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -323,6 +323,8 @@ ExecInitLockRows(LockRows *node, EState *estate, int eflags)
* then initialize outer plan
*/
outerPlanState(lrstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return lrstate;
/* node returns unmodified slots from the outer plan */
lrstate->ps.resultopsset = true;
diff --git a/src/backend/executor/nodeMaterial.c b/src/backend/executor/nodeMaterial.c
index 09632678b0..f146ebb1d7 100644
--- a/src/backend/executor/nodeMaterial.c
+++ b/src/backend/executor/nodeMaterial.c
@@ -214,6 +214,8 @@ ExecInitMaterial(Material *node, EState *estate, int eflags)
outerPlan = outerPlan(node);
outerPlanState(matstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return matstate;
/*
* Initialize result type and slot. No need to initialize projection info
@@ -242,7 +244,8 @@ ExecEndMaterial(MaterialState *node)
/*
* clean out the tuple table
*/
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/*
* Release tuplestore resources
diff --git a/src/backend/executor/nodeMemoize.c b/src/backend/executor/nodeMemoize.c
index 4f04269e26..3003ee1e5c 100644
--- a/src/backend/executor/nodeMemoize.c
+++ b/src/backend/executor/nodeMemoize.c
@@ -938,6 +938,8 @@ ExecInitMemoize(Memoize *node, EState *estate, int eflags)
outerNode = outerPlan(node);
outerPlanState(mstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return mstate;
/*
* Initialize return slot and type. No need to initialize projection info
@@ -1043,6 +1045,7 @@ ExecEndMemoize(MemoizeState *node)
{
#ifdef USE_ASSERT_CHECKING
/* Validate the memory accounting code is correct in assert builds. */
+ if (node->hashtable)
{
int count;
uint64 mem = 0;
@@ -1089,11 +1092,14 @@ ExecEndMemoize(MemoizeState *node)
}
/* Remove the cache context */
- MemoryContextDelete(node->tableContext);
+ if (node->tableContext)
+ MemoryContextDelete(node->tableContext);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/* must drop pointer to cache result tuple */
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+ if (node->ss.ps.ps_ResultTupleSlot)
+ ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
/*
* free exprcontext
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 399b39c598..40bba35499 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -65,9 +65,10 @@ MergeAppendState *
ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
{
MergeAppendState *mergestate = makeNode(MergeAppendState);
- PlanState **mergeplanstates;
+ PlanState **mergeplanstates = NULL;
Bitmapset *validsubplans;
int nplans;
+ int ninited = 0;
int i,
j;
@@ -81,6 +82,15 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.state = estate;
mergestate->ps.ExecProcNode = ExecMergeAppend;
+ /*
+ * Lock non-leaf partitions. In the pruning case, some of these locks
+ * will be retaken when the partition is opened for pruning, but it
+ * does not seem worthwhile to spend cycles to filter those out here.
+ */
+ ExecLockAppendNonLeafRelations(estate, node->allpartrelids);
+ if (!ExecPlanStillValid(estate))
+ goto early_exit;
+
/* If run-time partition pruning is enabled, then set that up now */
if (node->part_prune_index >= 0)
{
@@ -96,6 +106,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
node->part_prune_index,
node->apprelids,
&validsubplans);
+ if (!ExecPlanStillValid(estate))
+ goto early_exit;
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
@@ -122,8 +134,6 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
}
mergeplanstates = (PlanState **) palloc(nplans * sizeof(PlanState *));
- mergestate->mergeplans = mergeplanstates;
- mergestate->ms_nplans = nplans;
mergestate->ms_slots = (TupleTableSlot **) palloc0(sizeof(TupleTableSlot *) * nplans);
mergestate->ms_heap = binaryheap_allocate(nplans, heap_compare_slots,
@@ -152,6 +162,9 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
Plan *initNode = (Plan *) list_nth(node->mergeplans, i);
mergeplanstates[j++] = ExecInitNode(initNode, estate, eflags);
+ ninited++;
+ if (!ExecPlanStillValid(estate))
+ goto early_exit;
}
mergestate->ps.ps_ProjInfo = NULL;
@@ -188,6 +201,10 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
*/
mergestate->ms_initialized = false;
+early_exit:
+ mergestate->mergeplans = mergeplanstates;
+ mergestate->ms_nplans = ninited;
+
return mergestate;
}
diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 809aa215c6..968be05568 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -1482,11 +1482,15 @@ ExecInitMergeJoin(MergeJoin *node, EState *estate, int eflags)
mergestate->mj_SkipMarkRestore = node->skip_mark_restore;
outerPlanState(mergestate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return mergestate;
outerDesc = ExecGetResultType(outerPlanState(mergestate));
innerPlanState(mergestate) = ExecInitNode(innerPlan(node), estate,
mergestate->mj_SkipMarkRestore ?
eflags :
(eflags | EXEC_FLAG_MARK));
+ if (!ExecPlanStillValid(estate))
+ return mergestate;
innerDesc = ExecGetResultType(innerPlanState(mergestate));
/*
@@ -1642,8 +1646,10 @@ ExecEndMergeJoin(MergeJoinState *node)
/*
* clean out the tuple table
*/
- ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->mj_MarkedTupleSlot);
+ if (node->js.ps.ps_ResultTupleSlot)
+ ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
+ if (node->mj_MarkedTupleSlot)
+ ExecClearTuple(node->mj_MarkedTupleSlot);
/*
* shut down the subplans
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index e350375681..8a70543326 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -3900,6 +3900,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
Plan *subplan = outerPlan(node);
CmdType operation = node->operation;
int nrels = list_length(node->resultRelations);
+ int ninited = 0;
ResultRelInfo *resultRelInfo;
List *arowmarks;
ListCell *l;
@@ -3921,7 +3922,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
mtstate->canSetTag = node->canSetTag;
mtstate->mt_done = false;
- mtstate->mt_nrels = nrels;
mtstate->resultRelInfo = (ResultRelInfo *)
palloc(nrels * sizeof(ResultRelInfo));
@@ -3956,6 +3956,9 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
linitial_int(node->resultRelations));
}
+ if (!ExecPlanStillValid(estate))
+ goto early_exit;
+
/* set up epqstate with dummy subplan data for the moment */
EvalPlanQualInit(&mtstate->mt_epqstate, estate, NULL, NIL, node->epqParam);
mtstate->fireBSTriggers = true;
@@ -3982,6 +3985,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
if (resultRelInfo != mtstate->rootResultRelInfo)
{
ExecInitResultRelation(estate, resultRelInfo, resultRelation);
+ if (!ExecPlanStillValid(estate))
+ goto early_exit;
/*
* For child result relations, store the root result relation
@@ -4009,11 +4014,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* Now we may initialize the subplan.
*/
outerPlanState(mtstate) = ExecInitNode(subplan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ goto early_exit;
/*
* Do additional per-result-relation initialization.
*/
- for (i = 0; i < nrels; i++)
+ for (i = 0; i < nrels; i++, ninited++)
{
resultRelInfo = &mtstate->resultRelInfo[i];
@@ -4362,6 +4369,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
estate->es_auxmodifytables = lcons(mtstate,
estate->es_auxmodifytables);
+early_exit:
+ mtstate->mt_nrels = ninited;
return mtstate;
}
diff --git a/src/backend/executor/nodeNamedtuplestorescan.c b/src/backend/executor/nodeNamedtuplestorescan.c
index 46832ad82f..1f92c43d3b 100644
--- a/src/backend/executor/nodeNamedtuplestorescan.c
+++ b/src/backend/executor/nodeNamedtuplestorescan.c
@@ -174,7 +174,8 @@ ExecEndNamedTuplestoreScan(NamedTuplestoreScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index b3d52e69ec..deda0c2559 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -295,11 +295,15 @@ ExecInitNestLoop(NestLoop *node, EState *estate, int eflags)
* values.
*/
outerPlanState(nlstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return nlstate;
if (node->nestParams == NIL)
eflags |= EXEC_FLAG_REWIND;
else
eflags &= ~EXEC_FLAG_REWIND;
innerPlanState(nlstate) = ExecInitNode(innerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return nlstate;
/*
* Initialize result slot, type and projection.
@@ -372,7 +376,8 @@ ExecEndNestLoop(NestLoopState *node)
/*
* clean out the tuple table
*/
- ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
+ if (node->js.ps.ps_ResultTupleSlot)
+ ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
/*
* close down subplans
diff --git a/src/backend/executor/nodeProjectSet.c b/src/backend/executor/nodeProjectSet.c
index f6ff3dc44c..85d20c4680 100644
--- a/src/backend/executor/nodeProjectSet.c
+++ b/src/backend/executor/nodeProjectSet.c
@@ -247,6 +247,8 @@ ExecInitProjectSet(ProjectSet *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(state) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return state;
/*
* we don't use inner plan
@@ -328,7 +330,8 @@ ExecEndProjectSet(ProjectSetState *node)
/*
* clean out the tuple table
*/
- ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ if (node->ps.ps_ResultTupleSlot)
+ ExecClearTuple(node->ps.ps_ResultTupleSlot);
/*
* shut down subplans
diff --git a/src/backend/executor/nodeRecursiveunion.c b/src/backend/executor/nodeRecursiveunion.c
index e781003934..967fe4f287 100644
--- a/src/backend/executor/nodeRecursiveunion.c
+++ b/src/backend/executor/nodeRecursiveunion.c
@@ -244,7 +244,11 @@ ExecInitRecursiveUnion(RecursiveUnion *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(rustate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return rustate;
innerPlanState(rustate) = ExecInitNode(innerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return rustate;
/*
* If hashing, precompute fmgr lookup data for inner loop, and create the
diff --git a/src/backend/executor/nodeResult.c b/src/backend/executor/nodeResult.c
index 4219712d30..c549b684a3 100644
--- a/src/backend/executor/nodeResult.c
+++ b/src/backend/executor/nodeResult.c
@@ -208,6 +208,8 @@ ExecInitResult(Result *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(resstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return resstate;
/*
* we don't use inner plan
@@ -248,7 +250,8 @@ ExecEndResult(ResultState *node)
/*
* clean out the tuple table
*/
- ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ if (node->ps.ps_ResultTupleSlot)
+ ExecClearTuple(node->ps.ps_ResultTupleSlot);
/*
* shut down subplans
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index d7e22b1dbb..b3bc9b1f77 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -125,6 +125,8 @@ ExecInitSampleScan(SampleScan *node, EState *estate, int eflags)
ExecOpenScanRelation(estate,
node->scan.scanrelid,
eflags);
+ if (!ExecPlanStillValid(estate))
+ return scanstate;
/* we won't set up the HeapScanDesc till later */
scanstate->ss.ss_currentScanDesc = NULL;
@@ -198,7 +200,8 @@ ExecEndSampleScan(SampleScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/*
* close heap scan
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 4da0f28f7b..e7ca19ee4e 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -153,6 +153,8 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
ExecOpenScanRelation(estate,
node->scan.scanrelid,
eflags);
+ if (!ExecPlanStillValid(estate))
+ return scanstate;
/* and create slot with the appropriate rowtype */
ExecInitScanTupleSlot(estate, &scanstate->ss,
@@ -200,7 +202,8 @@ ExecEndSeqScan(SeqScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/*
* close heap scan
diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index 4bc2406b89..95950a5c20 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -528,6 +528,8 @@ ExecInitSetOp(SetOp *node, EState *estate, int eflags)
if (node->strategy == SETOP_HASHED)
eflags &= ~EXEC_FLAG_REWIND;
outerPlanState(setopstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return setopstate;
outerDesc = ExecGetResultType(outerPlanState(setopstate));
/*
@@ -583,7 +585,8 @@ void
ExecEndSetOp(SetOpState *node)
{
/* clean up tuple table */
- ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ if (node->ps.ps_ResultTupleSlot)
+ ExecClearTuple(node->ps.ps_ResultTupleSlot);
/* free subsidiary stuff including hashtable */
if (node->tableContext)
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index c6c72c6e67..89fef86aba 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -263,6 +263,8 @@ ExecInitSort(Sort *node, EState *estate, int eflags)
eflags &= ~(EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK);
outerPlanState(sortstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return sortstate;
/*
* Initialize scan slot and type.
@@ -306,9 +308,11 @@ ExecEndSort(SortState *node)
/*
* clean out the tuple table
*/
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/* must drop pointer to sort result tuple */
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
+ if (node->ss.ps.ps_ResultTupleSlot)
+ ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
/*
* Release tuplesort resources
diff --git a/src/backend/executor/nodeSubqueryscan.c b/src/backend/executor/nodeSubqueryscan.c
index 42471bfc04..9b8cddc89f 100644
--- a/src/backend/executor/nodeSubqueryscan.c
+++ b/src/backend/executor/nodeSubqueryscan.c
@@ -124,6 +124,8 @@ ExecInitSubqueryScan(SubqueryScan *node, EState *estate, int eflags)
* initialize subquery
*/
subquerystate->subplan = ExecInitNode(node->subplan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return subquerystate;
/*
* Initialize scan slot and type (needed by ExecAssignScanProjectionInfo)
@@ -177,7 +179,8 @@ ExecEndSubqueryScan(SubqueryScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/*
* close down subquery
diff --git a/src/backend/executor/nodeTableFuncscan.c b/src/backend/executor/nodeTableFuncscan.c
index 0c6c912778..d7536953f1 100644
--- a/src/backend/executor/nodeTableFuncscan.c
+++ b/src/backend/executor/nodeTableFuncscan.c
@@ -223,7 +223,8 @@ ExecEndTableFuncScan(TableFuncScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/*
* Release tuplestore resources
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 2124c55ef5..1ae451d7a6 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -342,7 +342,8 @@ ExecEndTidRangeScan(TidRangeScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
/* ----------------------------------------------------------------
@@ -386,6 +387,8 @@ ExecInitTidRangeScan(TidRangeScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return tidrangestate;
tidrangestate->ss.ss_currentRelation = currentRelation;
tidrangestate->ss.ss_currentScanDesc = NULL; /* no table scan here */
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index 862bd0330b..9fe76b1c60 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -483,7 +483,8 @@ ExecEndTidScan(TidScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
/* ----------------------------------------------------------------
@@ -529,6 +530,8 @@ ExecInitTidScan(TidScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return tidstate;
tidstate->ss.ss_currentRelation = currentRelation;
tidstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
diff --git a/src/backend/executor/nodeUnique.c b/src/backend/executor/nodeUnique.c
index 45035d74fa..69f23b02c6 100644
--- a/src/backend/executor/nodeUnique.c
+++ b/src/backend/executor/nodeUnique.c
@@ -136,6 +136,8 @@ ExecInitUnique(Unique *node, EState *estate, int eflags)
* then initialize outer plan
*/
outerPlanState(uniquestate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return uniquestate;
/*
* Initialize result slot and type. Unique nodes do no projections, so
@@ -169,7 +171,8 @@ void
ExecEndUnique(UniqueState *node)
{
/* clean up tuple table */
- ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ if (node->ps.ps_ResultTupleSlot)
+ ExecClearTuple(node->ps.ps_ResultTupleSlot);
ExecFreeExprContext(&node->ps);
diff --git a/src/backend/executor/nodeValuesscan.c b/src/backend/executor/nodeValuesscan.c
index 32ace63017..f5dedbab63 100644
--- a/src/backend/executor/nodeValuesscan.c
+++ b/src/backend/executor/nodeValuesscan.c
@@ -340,7 +340,8 @@ ExecEndValuesScan(ValuesScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 7c07fb0684..616bb97675 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -1334,7 +1334,7 @@ release_partition(WindowAggState *winstate)
WindowStatePerFunc perfuncstate = &(winstate->perfunc[i]);
/* Release any partition-local state of this window function */
- if (perfuncstate->winobj)
+ if (winstate->perfunc && perfuncstate->winobj)
perfuncstate->winobj->localmem = NULL;
}
@@ -1344,12 +1344,17 @@ release_partition(WindowAggState *winstate)
* any aggregate temp data). We don't rely on retail pfree because some
* aggregates might have allocated data we don't have direct pointers to.
*/
- MemoryContextResetAndDeleteChildren(winstate->partcontext);
- MemoryContextResetAndDeleteChildren(winstate->aggcontext);
- for (i = 0; i < winstate->numaggs; i++)
+ if (winstate->partcontext)
+ MemoryContextResetAndDeleteChildren(winstate->partcontext);
+ if (winstate->aggcontext)
+ MemoryContextResetAndDeleteChildren(winstate->aggcontext);
+ if (winstate->peragg)
{
- if (winstate->peragg[i].aggcontext != winstate->aggcontext)
- MemoryContextResetAndDeleteChildren(winstate->peragg[i].aggcontext);
+ for (i = 0; i < winstate->numaggs; i++)
+ {
+ if (winstate->peragg[i].aggcontext != winstate->aggcontext)
+ MemoryContextResetAndDeleteChildren(winstate->peragg[i].aggcontext);
+ }
}
if (winstate->buffer)
@@ -2451,6 +2456,8 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags)
*/
outerPlan = outerPlan(node);
outerPlanState(winstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return winstate;
/*
* initialize source tuple type (which is also the tuple type that we'll
@@ -2679,11 +2686,16 @@ ExecEndWindowAgg(WindowAggState *node)
release_partition(node);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
- ExecClearTuple(node->first_part_slot);
- ExecClearTuple(node->agg_row_slot);
- ExecClearTuple(node->temp_slot_1);
- ExecClearTuple(node->temp_slot_2);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->first_part_slot)
+ ExecClearTuple(node->first_part_slot);
+ if (node->agg_row_slot)
+ ExecClearTuple(node->agg_row_slot);
+ if (node->temp_slot_1)
+ ExecClearTuple(node->temp_slot_1);
+ if (node->temp_slot_2)
+ ExecClearTuple(node->temp_slot_2);
if (node->framehead_slot)
ExecClearTuple(node->framehead_slot);
if (node->frametail_slot)
@@ -2696,16 +2708,23 @@ ExecEndWindowAgg(WindowAggState *node)
node->ss.ps.ps_ExprContext = node->tmpcontext;
ExecFreeExprContext(&node->ss.ps);
- for (i = 0; i < node->numaggs; i++)
+ if (node->peragg)
{
- if (node->peragg[i].aggcontext != node->aggcontext)
- MemoryContextDelete(node->peragg[i].aggcontext);
+ for (i = 0; i < node->numaggs; i++)
+ {
+ if (node->peragg[i].aggcontext != node->aggcontext)
+ MemoryContextDelete(node->peragg[i].aggcontext);
+ }
}
- MemoryContextDelete(node->partcontext);
- MemoryContextDelete(node->aggcontext);
+ if (node->partcontext)
+ MemoryContextDelete(node->partcontext);
+ if (node->aggcontext)
+ MemoryContextDelete(node->aggcontext);
- pfree(node->perfunc);
- pfree(node->peragg);
+ if (node->perfunc)
+ pfree(node->perfunc);
+ if (node->peragg)
+ pfree(node->peragg);
outerPlan = outerPlanState(node);
ExecEndNode(outerPlan);
diff --git a/src/backend/executor/nodeWorktablescan.c b/src/backend/executor/nodeWorktablescan.c
index 0c13448236..d70c6afde3 100644
--- a/src/backend/executor/nodeWorktablescan.c
+++ b/src/backend/executor/nodeWorktablescan.c
@@ -200,7 +200,8 @@ ExecEndWorkTableScan(WorkTableScanState *node)
*/
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->ss.ss_ScanTupleSlot)
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index e3a170c38b..26a9ea342a 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -71,7 +71,7 @@ static int _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
static ParamListInfo _SPI_convert_params(int nargs, Oid *argtypes,
Datum *Values, const char *Nulls);
-static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount);
+static int _SPI_pquery(QueryDesc *queryDesc, uint64 tcount);
static void _SPI_error_callback(void *arg);
@@ -1623,6 +1623,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
_SPI_current->processed = 0;
_SPI_current->tuptable = NULL;
+replan:
/* Create the portal */
if (name == NULL || name[0] == '\0')
{
@@ -1766,7 +1767,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
}
/*
- * Start portal execution.
+ * Start portal execution. If the portal contains a cached plan, it
+ * must be recreated when portal->plan_valid comes back false, which
+ * indicates that the cached plan was invalidated while initializing
+ * one of the plan trees contained in it.
*/
PortalStart(portal, paramLI, 0, snapshot);
@@ -1775,6 +1779,12 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
/* Pop the error context stack */
error_context_stack = spierrcontext.previous;
+ if (!portal->plan_valid)
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
+
/* Pop the SPI stack */
_SPI_end_call(true);
@@ -2552,6 +2562,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
+replan:
cplan = GetCachedPlan(plansource, options->params,
plan_owner, _SPI_current->queryEnv);
@@ -2661,6 +2672,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
QueryDesc *qdesc;
Snapshot snap;
+ int eflags;
if (ActiveSnapshotSet())
snap = GetActiveSnapshot();
@@ -2668,14 +2680,36 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
snap = InvalidSnapshot;
qdesc = CreateQueryDesc(stmt,
+ cplan,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
options->params,
_SPI_current->queryEnv,
0);
- res = _SPI_pquery(qdesc, fire_triggers,
- canSetTag ? options->tcount : 0);
+
+ /* Select execution options */
+ if (fire_triggers)
+ eflags = 0; /* default run-to-completion flags */
+ else
+ eflags = EXEC_FLAG_SKIP_TRIGGERS;
+
+ /* Take locks if using a CachedPlan */
+ if (qdesc->cplan)
+ eflags |= EXEC_FLAG_GET_LOCKS;
+
+ ExecutorStart(qdesc, eflags);
+ if (!qdesc->plan_valid)
+ {
+ ExecutorFinish(qdesc);
+ ExecutorEnd(qdesc);
+ FreeQueryDesc(qdesc);
+ Assert(cplan);
+ ReleaseCachedPlan(cplan, plan_owner);
+ goto replan;
+ }
+
+ res = _SPI_pquery(qdesc, canSetTag ? options->tcount : 0);
FreeQueryDesc(qdesc);
}
else
@@ -2850,10 +2884,9 @@ _SPI_convert_params(int nargs, Oid *argtypes,
}
static int
-_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
+_SPI_pquery(QueryDesc *queryDesc, uint64 tcount)
{
int operation = queryDesc->operation;
- int eflags;
int res;
switch (operation)
@@ -2897,14 +2930,6 @@ _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
ResetUsage();
#endif
- /* Select execution options */
- if (fire_triggers)
- eflags = 0; /* default run-to-completion flags */
- else
- eflags = EXEC_FLAG_SKIP_TRIGGERS;
-
- ExecutorStart(queryDesc, eflags);
-
ExecutorRun(queryDesc, ForwardScanDirection, tcount, true);
_SPI_current->processed = queryDesc->estate->es_processed;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index ba00b99249..955286513d 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -513,6 +513,7 @@ _outRangeTblEntry(StringInfo str, const RangeTblEntry *node)
WRITE_BOOL_FIELD(security_barrier);
/* we re-use these RELATION fields, too: */
WRITE_OID_FIELD(relid);
+ WRITE_CHAR_FIELD(relkind);
WRITE_INT_FIELD(rellockmode);
WRITE_UINT_FIELD(perminfoindex);
break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 597e5b3ea8..a136ae1d60 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -503,6 +503,7 @@ _readRangeTblEntry(void)
READ_BOOL_FIELD(security_barrier);
/* we re-use these RELATION fields, too: */
READ_OID_FIELD(relid);
+ READ_CHAR_FIELD(relkind);
READ_INT_FIELD(rellockmode);
READ_UINT_FIELD(perminfoindex);
break;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 62b3ec96cc..5f3ffd98af 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -527,6 +527,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->partPruneInfos = glob->partPruneInfos;
result->rtable = glob->finalrtable;
result->permInfos = glob->finalrteperminfos;
+ result->viewRelations = glob->viewRelations;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 5cc8366af6..f13240bf33 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -16,6 +16,7 @@
#include "postgres.h"
#include "access/transam.h"
+#include "catalog/pg_class.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -604,6 +605,10 @@ add_rte_to_flat_rtable(PlannerGlobal *glob, List *rteperminfos,
(newrte->rtekind == RTE_SUBQUERY && OidIsValid(newrte->relid)))
glob->relationOids = lappend_oid(glob->relationOids, newrte->relid);
+ if (newrte->relkind == RELKIND_VIEW)
+ glob->viewRelations = lappend_int(glob->viewRelations,
+ list_length(glob->finalrtable));
+
/*
* Add a copy of the RTEPermissionInfo, if any, corresponding to this RTE
* to the flattened global list.
diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index 980dc1816f..1631c8b993 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -1849,11 +1849,10 @@ ApplyRetrieveRule(Query *parsetree,
/*
* Clear fields that should not be set in a subquery RTE. Note that we
- * leave the relid, rellockmode, and perminfoindex fields set, so that the
- * view relation can be appropriately locked before execution and its
- * permissions checked.
+ * leave the relid, relkind, rellockmode, and perminfoindex fields set,
+ * so that the view relation can be appropriately locked before execution
+ * and its permissions checked.
*/
- rte->relkind = 0;
rte->tablesample = NULL;
rte->inh = false; /* must not be set for a subquery */
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index ee9b89a672..c807e9cdcc 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -27,6 +27,7 @@
#include "storage/procarray.h"
#include "storage/sinvaladt.h"
#include "utils/inval.h"
+#include "utils/lsyscache.h"
/*
@@ -364,6 +365,50 @@ CheckRelationLockedByMe(Relation relation, LOCKMODE lockmode, bool orstronger)
return false;
}
+/*
+ * CheckRelLockedByMe
+ *
+ * Returns true if current transaction holds a lock on the given relation of
+ * mode 'lockmode'. If 'orstronger' is true, a stronger lockmode is also OK.
+ * ("Stronger" is defined as "numerically higher", which is a bit
+ * semantically dubious but is OK for the purposes we use this for.)
+ */
+bool
+CheckRelLockedByMe(Oid relid, LOCKMODE lockmode, bool orstronger)
+{
+ Oid dbId = get_rel_relisshared(relid) ? InvalidOid : MyDatabaseId;
+ LOCKTAG tag;
+
+ SET_LOCKTAG_RELATION(tag, dbId, relid);
+
+ if (LockHeldByMe(&tag, lockmode))
+ return true;
+
+ if (orstronger)
+ {
+ LOCKMODE slockmode;
+
+ for (slockmode = lockmode + 1;
+ slockmode <= MaxLockMode;
+ slockmode++)
+ {
+ if (LockHeldByMe(&tag, slockmode))
+ {
+#ifdef NOT_USED
+ /* Sometimes this might be useful for debugging purposes */
+ elog(WARNING, "lock mode %s substituted for %s on relation %s",
+ GetLockmodeName(tag.locktag_lockmethodid, slockmode),
+ GetLockmodeName(tag.locktag_lockmethodid, lockmode),
+ get_rel_name(relid));
+#endif
+ return true;
+ }
+ }
+ }
+
+ return false;
+}
+
/*
* LockHasWaitersRelation
*
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index cab709b07b..6d0ea07801 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1199,6 +1199,7 @@ exec_simple_query(const char *query_string)
* Start the portal. No parameters here.
*/
PortalStart(portal, NULL, 0, InvalidSnapshot);
+ Assert(portal->plan_valid);
/*
* Select the appropriate output format: text unless we are doing a
@@ -1703,6 +1704,7 @@ exec_bind_message(StringInfo input_message)
"commands ignored until end of transaction block"),
errdetail_abort()));
+replan:
/*
* Create the portal. Allow silent replacement of an existing portal only
* if the unnamed portal is specified.
@@ -1994,10 +1996,19 @@ exec_bind_message(StringInfo input_message)
PopActiveSnapshot();
/*
- * And we're ready to start portal execution.
+ * Start portal execution. If the portal contains a cached plan, it
+ * must be recreated when portal->plan_valid comes back false, which
+ * indicates that the cached plan was invalidated while initializing
+ * one of the plan trees contained in it.
*/
PortalStart(portal, params, 0, InvalidSnapshot);
+ if (!portal->plan_valid)
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
+
/*
* Apply the result format requests to the portal.
*/
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5f0248acc5..c93a950d7f 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -19,6 +19,7 @@
#include "access/xact.h"
#include "commands/prepare.h"
+#include "executor/execdesc.h"
#include "executor/tstoreReceiver.h"
#include "miscadmin.h"
#include "pg_trace.h"
@@ -35,12 +36,6 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
- const char *sourceText,
- ParamListInfo params,
- QueryEnvironment *queryEnv,
- DestReceiver *dest,
- QueryCompletion *qc);
static void FillPortalStore(Portal portal, bool isTopLevel);
static uint64 RunFromStore(Portal portal, ScanDirection direction, uint64 count,
DestReceiver *dest);
@@ -65,6 +60,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +73,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->cplan = cplan; /* CachedPlan, if plan is from one */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -116,86 +113,6 @@ FreeQueryDesc(QueryDesc *qdesc)
}
-/*
- * ProcessQuery
- * Execute a single plannable query within a PORTAL_MULTI_QUERY,
- * PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
- *
- * plan: the plan tree for the query
- * sourceText: the source text of the query
- * params: any parameters needed
- * dest: where to send results
- * qc: where to store the command completion status data.
- *
- * qc may be NULL if caller doesn't want a status string.
- *
- * Must be called in a memory context that will be reset or deleted on
- * error; otherwise the executor's memory usage will be leaked.
- */
-static void
-ProcessQuery(PlannedStmt *plan,
- const char *sourceText,
- ParamListInfo params,
- QueryEnvironment *queryEnv,
- DestReceiver *dest,
- QueryCompletion *qc)
-{
- QueryDesc *queryDesc;
-
- /*
- * Create the QueryDesc object
- */
- queryDesc = CreateQueryDesc(plan, sourceText,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, 0);
-
- /*
- * Call ExecutorStart to prepare the plan for execution
- */
- ExecutorStart(queryDesc, 0);
-
- /*
- * Run the plan to completion.
- */
- ExecutorRun(queryDesc, ForwardScanDirection, 0L, true);
-
- /*
- * Build command completion status data, if caller wants one.
- */
- if (qc)
- {
- switch (queryDesc->operation)
- {
- case CMD_SELECT:
- SetQueryCompletion(qc, CMDTAG_SELECT, queryDesc->estate->es_processed);
- break;
- case CMD_INSERT:
- SetQueryCompletion(qc, CMDTAG_INSERT, queryDesc->estate->es_processed);
- break;
- case CMD_UPDATE:
- SetQueryCompletion(qc, CMDTAG_UPDATE, queryDesc->estate->es_processed);
- break;
- case CMD_DELETE:
- SetQueryCompletion(qc, CMDTAG_DELETE, queryDesc->estate->es_processed);
- break;
- case CMD_MERGE:
- SetQueryCompletion(qc, CMDTAG_MERGE, queryDesc->estate->es_processed);
- break;
- default:
- SetQueryCompletion(qc, CMDTAG_UNKNOWN, queryDesc->estate->es_processed);
- break;
- }
- }
-
- /*
- * Now, we close down all the scans and free allocated resources.
- */
- ExecutorFinish(queryDesc);
- ExecutorEnd(queryDesc);
-
- FreeQueryDesc(queryDesc);
-}
-
/*
* ChoosePortalStrategy
* Select portal execution strategy given the intended statement list.
@@ -427,7 +344,8 @@ FetchStatementTargetList(Node *stmt)
* to be used for cursors).
*
* On return, portal is ready to accept PortalRun() calls, and the result
- * tupdesc (if any) is known.
+ * tupdesc (if any) is known, unless portal->plan_valid is set to false, in
+ * which case, the caller must retry after generating a new CachedPlan.
*/
void
PortalStart(Portal portal, ParamListInfo params,
@@ -435,7 +353,6 @@ PortalStart(Portal portal, ParamListInfo params,
{
Portal saveActivePortal;
ResourceOwner saveResourceOwner;
- MemoryContext savePortalContext;
MemoryContext oldContext;
QueryDesc *queryDesc;
int myeflags;
@@ -448,15 +365,13 @@ PortalStart(Portal portal, ParamListInfo params,
*/
saveActivePortal = ActivePortal;
saveResourceOwner = CurrentResourceOwner;
- savePortalContext = PortalContext;
PG_TRY();
{
ActivePortal = portal;
if (portal->resowner)
CurrentResourceOwner = portal->resowner;
- PortalContext = portal->portalContext;
- oldContext = MemoryContextSwitchTo(PortalContext);
+ oldContext = MemoryContextSwitchTo(portal->queryContext);
/* Must remember portal param list, if any */
portal->portalParams = params;
@@ -472,6 +387,8 @@ PortalStart(Portal portal, ParamListInfo params,
switch (portal->strategy)
{
case PORTAL_ONE_SELECT:
+ case PORTAL_ONE_RETURNING:
+ case PORTAL_ONE_MOD_WITH:
/* Must set snapshot before starting executor. */
if (snapshot)
@@ -493,6 +410,7 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->cplan,
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -501,30 +419,56 @@ PortalStart(Portal portal, ParamListInfo params,
portal->queryEnv,
0);
+ /* Remember for PortalRunMulti(). */
+ if (portal->strategy == PORTAL_ONE_RETURNING ||
+ portal->strategy == PORTAL_ONE_MOD_WITH)
+ portal->qdescs = list_make1(queryDesc);
+
/*
* If it's a scrollable cursor, executor needs to support
* REWIND and backwards scan, as well as whatever the caller
* might've asked for.
*/
- if (portal->cursorOptions & CURSOR_OPT_SCROLL)
+ if (portal->strategy == PORTAL_ONE_SELECT &&
+ (portal->cursorOptions & CURSOR_OPT_SCROLL))
myeflags = eflags | EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD;
else
myeflags = eflags;
+ /* Take locks if using a CachedPlan */
+ if (queryDesc->cplan)
+ myeflags |= EXEC_FLAG_GET_LOCKS;
+
/*
- * Call ExecutorStart to prepare the plan for execution
+ * Call ExecutorStart to prepare the plan for execution. A
+ * cached plan may get invalidated as we're doing that.
*/
ExecutorStart(queryDesc, myeflags);
+ if (!queryDesc->plan_valid)
+ {
+ Assert(queryDesc->cplan);
+ PortalQueryFinish(queryDesc);
+ PopActiveSnapshot();
+ portal->plan_valid = false;
+ goto early_exit;
+ }
/*
- * This tells PortalCleanup to shut down the executor
+ * This tells PortalCleanup to shut down the executor, though
+ * that is not needed for queries handled by PortalRunMulti().
*/
- portal->queryDesc = queryDesc;
+ if (portal->strategy == PORTAL_ONE_SELECT)
+ portal->queryDesc = queryDesc;
/*
- * Remember tuple descriptor (computed by ExecutorStart)
+ * Remember tuple descriptor (computed by ExecutorStart),
+ * though make it independent of QueryDesc for queries handled
+ * by PortalRunMulti().
*/
- portal->tupDesc = queryDesc->tupDesc;
+ if (portal->strategy != PORTAL_ONE_SELECT)
+ portal->tupDesc = CreateTupleDescCopy(queryDesc->tupDesc);
+ else
+ portal->tupDesc = queryDesc->tupDesc;
/*
* Reset cursor position data to "start of query"
@@ -532,33 +476,11 @@ PortalStart(Portal portal, ParamListInfo params,
portal->atStart = true;
portal->atEnd = false; /* allow fetches */
portal->portalPos = 0;
+ portal->plan_valid = true;
PopActiveSnapshot();
break;
- case PORTAL_ONE_RETURNING:
- case PORTAL_ONE_MOD_WITH:
-
- /*
- * We don't start the executor until we are told to run the
- * portal. We do need to set up the result tupdesc.
- */
- {
- PlannedStmt *pstmt;
-
- pstmt = PortalGetPrimaryStmt(portal);
- portal->tupDesc =
- ExecCleanTypeFromTL(pstmt->planTree->targetlist);
- }
-
- /*
- * Reset cursor position data to "start of query"
- */
- portal->atStart = true;
- portal->atEnd = false; /* allow fetches */
- portal->portalPos = 0;
- break;
-
case PORTAL_UTIL_SELECT:
/*
@@ -578,11 +500,90 @@ PortalStart(Portal portal, ParamListInfo params,
portal->atStart = true;
portal->atEnd = false; /* allow fetches */
portal->portalPos = 0;
+ portal->plan_valid = true;
break;
case PORTAL_MULTI_QUERY:
- /* Need do nothing now */
+ {
+ ListCell *lc;
+ bool first = true;
+
+ /* Take locks if using a CachedPlan */
+ myeflags = 0;
+ if (portal->cplan)
+ myeflags |= EXEC_FLAG_GET_LOCKS;
+
+ foreach(lc, portal->stmts)
+ {
+ PlannedStmt *plan = lfirst_node(PlannedStmt, lc);
+ bool is_utility = (plan->utilityStmt != NULL);
+
+ /*
+ * Push the snapshot to be used by the executor.
+ */
+ if (!is_utility)
+ {
+ /*
+ * Must copy the snapshot if we'll need to update
+ * its command ID.
+ */
+ if (!first)
+ PushCopiedSnapshot(GetTransactionSnapshot());
+ else
+ PushActiveSnapshot(GetTransactionSnapshot());
+ }
+
+ /*
+ * From the 2nd statement onwards, update the command
+ * ID and the snapshot to match.
+ */
+ if (!first)
+ {
+ CommandCounterIncrement();
+ UpdateActiveSnapshotCommandId();
+ }
+
+ first = false;
+
+ /*
+ * Create the QueryDesc object. DestReceiver will
+ * be set in PortalRunMulti().
+ */
+ queryDesc = CreateQueryDesc(plan, portal->cplan,
+ portal->sourceText,
+ !is_utility ?
+ GetActiveSnapshot() :
+ InvalidSnapshot,
+ InvalidSnapshot,
+ NULL,
+ params,
+ portal->queryEnv, 0);
+
+ /* Remember for PortalRunMulti() */
+ portal->qdescs = lappend(portal->qdescs, queryDesc);
+
+ if (is_utility)
+ continue;
+
+ /*
+ * Call ExecutorStart to prepare the plan for
+ * execution. A cached plan may get invalidated as
+ * we're doing that.
+ */
+ ExecutorStart(queryDesc, myeflags);
+ PopActiveSnapshot();
+ if (!queryDesc->plan_valid)
+ {
+ Assert(queryDesc->cplan);
+ PortalQueryFinish(queryDesc);
+ portal->plan_valid = false;
+ goto early_exit;
+ }
+ }
+ }
+
portal->tupDesc = NULL;
+ portal->plan_valid = true;
break;
}
}
@@ -594,19 +595,18 @@ PortalStart(Portal portal, ParamListInfo params,
/* Restore global vars and propagate error */
ActivePortal = saveActivePortal;
CurrentResourceOwner = saveResourceOwner;
- PortalContext = savePortalContext;
PG_RE_THROW();
}
PG_END_TRY();
+ portal->status = PORTAL_READY;
+
+early_exit:
MemoryContextSwitchTo(oldContext);
ActivePortal = saveActivePortal;
CurrentResourceOwner = saveResourceOwner;
- PortalContext = savePortalContext;
-
- portal->status = PORTAL_READY;
}
/*
@@ -1193,7 +1193,7 @@ PortalRunMulti(Portal portal,
QueryCompletion *qc)
{
bool active_snapshot_set = false;
- ListCell *stmtlist_item;
+ ListCell *qdesc_item;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1214,9 +1214,10 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
- foreach(stmtlist_item, portal->stmts)
+ foreach(qdesc_item, portal->qdescs)
{
- PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ QueryDesc *qdesc = (QueryDesc *) lfirst(qdesc_item);
+ PlannedStmt *pstmt = qdesc->plannedstmt;
/*
* If we got a cancel signal in prior command, quit
@@ -1271,23 +1272,38 @@ PortalRunMulti(Portal portal,
else
UpdateActiveSnapshotCommandId();
+ /*
+ * Run the plan to completion.
+ */
+ qdesc->dest = dest;
+ ExecutorRun(qdesc, ForwardScanDirection, 0L, true);
+
+ /*
+ * Build command completion status data if needed.
+ */
if (pstmt->canSetTag)
{
- /* statement can set tag string */
- ProcessQuery(pstmt,
- portal->sourceText,
- portal->portalParams,
- portal->queryEnv,
- dest, qc);
- }
- else
- {
- /* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
- portal->sourceText,
- portal->portalParams,
- portal->queryEnv,
- altdest, NULL);
+ switch (qdesc->operation)
+ {
+ case CMD_SELECT:
+ SetQueryCompletion(qc, CMDTAG_SELECT, qdesc->estate->es_processed);
+ break;
+ case CMD_INSERT:
+ SetQueryCompletion(qc, CMDTAG_INSERT, qdesc->estate->es_processed);
+ break;
+ case CMD_UPDATE:
+ SetQueryCompletion(qc, CMDTAG_UPDATE, qdesc->estate->es_processed);
+ break;
+ case CMD_DELETE:
+ SetQueryCompletion(qc, CMDTAG_DELETE, qdesc->estate->es_processed);
+ break;
+ case CMD_MERGE:
+ SetQueryCompletion(qc, CMDTAG_MERGE, qdesc->estate->es_processed);
+ break;
+ default:
+ SetQueryCompletion(qc, CMDTAG_UNKNOWN, qdesc->estate->es_processed);
+ break;
+ }
}
if (log_executor_stats)
@@ -1346,8 +1362,15 @@ PortalRunMulti(Portal portal,
* Increment command counter between queries, but not after the last
* one.
*/
- if (lnext(portal->stmts, stmtlist_item) != NULL)
+ if (lnext(portal->qdescs, qdesc_item) != NULL)
CommandCounterIncrement();
+
+ if (qdesc->estate)
+ {
+ ExecutorFinish(qdesc);
+ ExecutorEnd(qdesc);
+ }
+ FreeQueryDesc(qdesc);
}
/* Pop the snapshot if we pushed one. */
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index c7607895cd..014cd476f4 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -2073,6 +2073,27 @@ get_rel_persistence(Oid relid)
return result;
}
+/*
+ * get_rel_relisshared
+ *
+ * Returns whether the given relation is shared
+ */
+bool
+get_rel_relisshared(Oid relid)
+{
+ HeapTuple tp;
+ Form_pg_class reltup;
+ bool result;
+
+ tp = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+ if (!HeapTupleIsValid(tp))
+ elog(ERROR, "cache lookup failed for relation %u", relid);
+ reltup = (Form_pg_class) GETSTRUCT(tp);
+ result = reltup->relisshared;
+ ReleaseSysCache(tp);
+
+ return result;
+}
/* ---------- TRANSFORM CACHE ---------- */
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 77c2ba3f8f..4e455d815f 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -100,13 +100,13 @@ static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool GenericPlanIsValid(CachedPlan *cplan);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -787,9 +787,6 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
- *
- * On a "true" return, we have acquired the locks needed to run the plan.
- * (We must do this for the "true" result to be race-condition-free.)
*/
static bool
CheckCachedPlan(CachedPlanSource *plansource)
@@ -803,60 +800,56 @@ CheckCachedPlan(CachedPlanSource *plansource)
if (!plan)
return false;
- Assert(plan->magic == CACHEDPLAN_MAGIC);
- /* Generic plans are never one-shot */
- Assert(!plan->is_oneshot);
+ if (GenericPlanIsValid(plan))
+ return true;
/*
- * If plan isn't valid for current role, we can't use it.
+ * Plan has been invalidated, so unlink it from the parent and release it.
*/
- if (plan->is_valid && plan->dependsOnRole &&
- plan->planRoleId != GetUserId())
- plan->is_valid = false;
+ ReleaseGenericPlan(plansource);
- /*
- * If it appears valid, acquire locks and recheck; this is much the same
- * logic as in RevalidateCachedQuery, but for a plan.
- */
- if (plan->is_valid)
+ return false;
+}
+
+/*
+ * GenericPlanIsValid
+ * Is a generic plan still valid?
+ *
+ * It may have gone stale due to concurrent schema modifications of relations
+ * mentioned in the plan or a couple of other things mentioned below.
+ */
+static bool
+GenericPlanIsValid(CachedPlan *cplan)
+{
+ Assert(cplan != NULL);
+ Assert(cplan->magic == CACHEDPLAN_MAGIC);
+ /* Generic plans are never one-shot */
+ Assert(!cplan->is_oneshot);
+
+ if (cplan->is_valid)
{
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
- Assert(plan->refcount > 0);
-
- AcquireExecutorLocks(plan->stmt_list, true);
+ Assert(cplan->refcount > 0);
/*
- * If plan was transient, check to see if TransactionXmin has
- * advanced, and if so invalidate it.
+ * If plan isn't valid for current role, we can't use it.
*/
- if (plan->is_valid &&
- TransactionIdIsValid(plan->saved_xmin) &&
- !TransactionIdEquals(plan->saved_xmin, TransactionXmin))
- plan->is_valid = false;
+ if (cplan->dependsOnRole && cplan->planRoleId != GetUserId())
+ cplan->is_valid = false;
/*
- * By now, if any invalidation has happened, the inval callback
- * functions will have marked the plan invalid.
+ * If plan was transient, check to see if TransactionXmin has
+ * advanced, and if so invalidate it.
*/
- if (plan->is_valid)
- {
- /* Successfully revalidated and locked the query. */
- return true;
- }
-
- /* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ if (TransactionIdIsValid(cplan->saved_xmin) &&
+ !TransactionIdEquals(cplan->saved_xmin, TransactionXmin))
+ cplan->is_valid = false;
}
- /*
- * Plan has been invalidated, so unlink it from the parent and release it.
- */
- ReleaseGenericPlan(plansource);
-
- return false;
+ return cplan->is_valid;
}
/*
@@ -1126,9 +1119,6 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
- * On return, the plan is valid and we have sufficient locks to begin
- * execution.
- *
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
* the refcount has been reported to that ResourceOwner (note that this
@@ -1360,8 +1350,8 @@ CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
}
/*
- * Reject if AcquireExecutorLocks would have anything to do. This is
- * probably unnecessary given the previous check, but let's be safe.
+ * Reject if the executor would need to take additional locks, that is, in
+ * addition to those taken by AcquirePlannerLocks() on a given query.
*/
foreach(lc, plan->stmt_list)
{
@@ -1735,58 +1725,6 @@ QueryListGetPrimaryStmt(List *stmts)
return NULL;
}
-/*
- * AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
- */
-static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
-{
- ListCell *lc1;
-
- foreach(lc1, stmt_list)
- {
- PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
-
- if (plannedstmt->commandType == CMD_UTILITY)
- {
- /*
- * Ignore utility statements, except those (such as EXPLAIN) that
- * contain a parsed-but-not-planned query. Note: it's okay to use
- * ScanQueryForLocks, even though the query hasn't been through
- * rule rewriting, because rewriting doesn't change the query
- * representation.
- */
- Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
-
- if (query)
- ScanQueryForLocks(query, acquire);
- continue;
- }
-
- foreach(lc2, plannedstmt->rtable)
- {
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
-
- if (!(rte->rtekind == RTE_RELATION ||
- (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
- continue;
-
- /*
- * Acquire the appropriate type of lock on each relation OID. Note
- * that we don't actually try to open the rel, and hence will not
- * fail if it's been dropped entirely --- we'll just transiently
- * acquire a non-conflicting lock.
- */
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
- }
- }
-}
-
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 06dfa85f04..3ad80c7ecb 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -201,6 +201,10 @@ CreatePortal(const char *name, bool allowDup, bool dupSilent)
portal->portalContext = AllocSetContextCreate(TopPortalContext,
"PortalContext",
ALLOCSET_SMALL_SIZES);
+ /* initialize portal's query context to store QueryDescs */
+ portal->queryContext = AllocSetContextCreate(TopPortalContext,
+ "PortalQueryContext",
+ ALLOCSET_SMALL_SIZES);
/* create a resource owner for the portal */
portal->resowner = ResourceOwnerCreate(CurTransactionResourceOwner,
@@ -224,6 +228,7 @@ CreatePortal(const char *name, bool allowDup, bool dupSilent)
/* for named portals reuse portal->name copy */
MemoryContextSetIdentifier(portal->portalContext, portal->name[0] ? portal->name : "<unnamed>");
+ MemoryContextSetIdentifier(portal->queryContext, portal->name[0] ? portal->name : "<unnamed>");
return portal;
}
@@ -594,6 +599,7 @@ PortalDrop(Portal portal, bool isTopCommit)
/* release subsidiary storage */
MemoryContextDelete(portal->portalContext);
+ MemoryContextDelete(portal->queryContext);
/* release portal struct (it's in TopPortalContext) */
pfree(portal);
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 3d3e632a0c..392abb5150 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -88,7 +88,11 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern QueryDesc *ExplainQueryDesc(PlannedStmt *stmt, struct CachedPlan *cplan,
+ const char *queryString, IntoClause *into, ExplainState *es,
+ ParamListInfo params, QueryEnvironment *queryEnv);
+extern void ExplainOnePlan(QueryDesc *queryDesc,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
@@ -104,6 +108,7 @@ extern void ExplainQueryParameters(ExplainState *es, ParamListInfo params, int m
extern void ExplainBeginOutput(ExplainState *es);
extern void ExplainEndOutput(ExplainState *es);
+extern void ExplainResetOutput(ExplainState *es);
extern void ExplainSeparatePlans(ExplainState *es);
extern void ExplainPropertyList(const char *qlabel, List *data,
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index af2bf36dfb..c36c25b497 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -32,9 +32,12 @@
*/
typedef struct QueryDesc
{
+ NodeTag type;
+
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ struct CachedPlan *cplan; /* CachedPlan, if plannedstmt is from one */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -47,6 +50,7 @@ typedef struct QueryDesc
TupleDesc tupDesc; /* descriptor for result tuples */
EState *estate; /* executor's query-wide state */
PlanState *planstate; /* tree of per-plan-node state */
+ bool plan_valid; /* is planstate tree fully valid? */
/* This field is set by ExecutorRun */
bool already_executed; /* true if previously executed */
@@ -57,6 +61,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ struct CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index f9e6bf3d4a..a6ac772400 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -19,6 +19,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/plancache.h"
/*
@@ -61,6 +62,10 @@
* WITH_NO_DATA indicates that we are performing REFRESH MATERIALIZED VIEW
* ... WITH NO DATA. Currently, the only effect is to suppress errors about
* scanning unpopulated materialized views.
+ *
+ * GET_LOCKS indicates that the caller of ExecutorStart() is executing a
+ * cached plan which must be validated by taking the remaining locks necessary
+ * for execution.
*/
#define EXEC_FLAG_EXPLAIN_ONLY 0x0001 /* EXPLAIN, no ANALYZE */
#define EXEC_FLAG_EXPLAIN_GENERIC 0x0002 /* EXPLAIN (GENERIC_PLAN) */
@@ -69,6 +74,8 @@
#define EXEC_FLAG_MARK 0x0010 /* need mark/restore */
#define EXEC_FLAG_SKIP_TRIGGERS 0x0020 /* skip AfterTrigger setup */
#define EXEC_FLAG_WITH_NO_DATA 0x0040 /* REFRESH ... WITH NO DATA */
+#define EXEC_FLAG_GET_LOCKS 0x0400 /* should the executor lock
+ * relations? */
/* Hook for plugins to get control in ExecutorStart() */
@@ -255,6 +262,13 @@ extern void ExecEndNode(PlanState *node);
extern void ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/* Is the CachedPlan, if any, that the plan tree came from still valid? */
+static inline bool
+ExecPlanStillValid(EState *estate)
+{
+ return estate->es_cachedplan == NULL ? true :
+ CachedPlanStillValid(estate->es_cachedplan);
+}
/* ----------------------------------------------------------------
* ExecProcNode
@@ -589,6 +603,8 @@ exec_rt_fetch(Index rti, EState *estate)
}
extern Relation ExecGetRangeTableRelation(EState *estate, Index rti);
+extern void ExecLockViewRelations(List *viewRelations, EState *estate);
+extern void ExecLockAppendNonLeafRelations(EState *estate, List *allpartrelids);
extern void ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
Index rti);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index d97f5a8e7d..dfa72848c7 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -623,6 +623,8 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ struct CachedPlan *es_cachedplan; /* CachedPlan if plannedstmt is from
+ * one */
List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
const char *es_sourceText; /* Source text from QueryDesc */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index d61a62da19..9b888b0d75 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -116,6 +116,9 @@ typedef struct PlannerGlobal
/* "flat" list of RTEPermissionInfos */
List *finalrteperminfos;
+ /* "flat" list of integer RT indexes */
+ List *viewRelations;
+
/* "flat" list of PlanRowMarks */
List *finalrowmarks;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index a0bb16cff4..7cae624bbd 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -78,6 +78,9 @@ typedef struct PlannedStmt
List *permInfos; /* list of RTEPermissionInfo nodes for rtable
* entries needing one */
+ List *viewRelations; /* integer list of RT indexes, or NIL if no
+ * views are queried */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE/MERGE */
List *resultRelations; /* integer list of RT indexes, or NIL */
diff --git a/src/include/storage/lmgr.h b/src/include/storage/lmgr.h
index 4ee91e3cf9..598bf2688a 100644
--- a/src/include/storage/lmgr.h
+++ b/src/include/storage/lmgr.h
@@ -48,6 +48,7 @@ extern bool ConditionalLockRelation(Relation relation, LOCKMODE lockmode);
extern void UnlockRelation(Relation relation, LOCKMODE lockmode);
extern bool CheckRelationLockedByMe(Relation relation, LOCKMODE lockmode,
bool orstronger);
+extern bool CheckRelLockedByMe(Oid relid, LOCKMODE lockmode, bool orstronger);
extern bool LockHasWaitersRelation(Relation relation, LOCKMODE lockmode);
extern void LockRelationIdForSession(LockRelId *relid, LOCKMODE lockmode);
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index 4f5418b972..3074e604dd 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -139,6 +139,7 @@ extern char get_rel_relkind(Oid relid);
extern bool get_rel_relispartition(Oid relid);
extern Oid get_rel_tablespace(Oid relid);
extern char get_rel_persistence(Oid relid);
+extern bool get_rel_relisshared(Oid relid);
extern Oid get_transform_fromsql(Oid typid, Oid langid, List *trftypes);
extern Oid get_transform_tosql(Oid typid, Oid langid, List *trftypes);
extern bool get_typisdefined(Oid typid);
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index a443181d41..8990fe72e3 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -221,6 +221,20 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
QueryEnvironment *queryEnv);
+
+/*
+ * CachedPlanStillValid
+ * Returns whether a cached generic plan is still valid
+ *
+ * Called by the executor on every relation lock taken when initializing the
+ * plan tree in the CachedPlan.
+ */
+static inline bool
+CachedPlanStillValid(CachedPlan *cplan)
+{
+ return cplan->is_valid;
+}
+
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aa08b1e0fc..332a08ccb4 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,9 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *qdescs; /* list of QueryDescs */
+ bool plan_valid; /* are plan(s) ready for execution? */
+ MemoryContext queryContext; /* memory for QueryDescs and children */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -242,6 +245,7 @@ extern void PortalDefineQuery(Portal portal,
CommandTag commandTag,
List *stmts,
CachedPlan *cplan);
+extern void PortalQueryFinish(QueryDesc *queryDesc);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
diff --git a/src/test/modules/delay_execution/Makefile b/src/test/modules/delay_execution/Makefile
index 70f24e846d..2fca84d027 100644
--- a/src/test/modules/delay_execution/Makefile
+++ b/src/test/modules/delay_execution/Makefile
@@ -8,7 +8,8 @@ OBJS = \
delay_execution.o
ISOLATION = partition-addition \
- partition-removal-1
+ partition-removal-1 \
+ cached-plan-replan
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/src/test/modules/delay_execution/delay_execution.c b/src/test/modules/delay_execution/delay_execution.c
index 7cd76eb34b..5d7a3e9858 100644
--- a/src/test/modules/delay_execution/delay_execution.c
+++ b/src/test/modules/delay_execution/delay_execution.c
@@ -1,14 +1,18 @@
/*-------------------------------------------------------------------------
*
* delay_execution.c
- * Test module to allow delay between parsing and execution of a query.
+ * Test module to introduce delay at various points during execution of a
+ * query to test that execution proceeds safely in light of concurrent
+ * changes.
*
* The delay is implemented by taking and immediately releasing a specified
* advisory lock. If another process has previously taken that lock, the
* current process will be blocked until the lock is released; otherwise,
* there's no effect. This allows an isolationtester script to reliably
- * test behaviors where some specified action happens in another backend
- * between parsing and execution of any desired query.
+ * test behaviors where some specified action happens in another backend in
+ * a couple of cases: 1) between parsing and execution of any desired query
+ * when using the planner_hook, 2) between RevalidateCachedQuery() and
+ * ExecutorStart() when using the ExecutorStart_hook.
*
* Copyright (c) 2020-2023, PostgreSQL Global Development Group
*
@@ -22,6 +26,7 @@
#include <limits.h>
+#include "executor/executor.h"
#include "optimizer/planner.h"
#include "utils/builtins.h"
#include "utils/guc.h"
@@ -32,9 +37,11 @@ PG_MODULE_MAGIC;
/* GUC: advisory lock ID to use. Zero disables the feature. */
static int post_planning_lock_id = 0;
+static int executor_start_lock_id = 0;
-/* Save previous planner hook user to be a good citizen */
+/* Save previous hook users to be a good citizen */
static planner_hook_type prev_planner_hook = NULL;
+static ExecutorStart_hook_type prev_ExecutorStart_hook = NULL;
/* planner_hook function to provide the desired delay */
@@ -70,11 +77,41 @@ delay_execution_planner(Query *parse, const char *query_string,
return result;
}
+/* ExecutorStart_hook function to provide the desired delay */
+static void
+delay_execution_ExecutorStart(QueryDesc *queryDesc, int eflags)
+{
+ /* If enabled, delay by taking and releasing the specified lock */
+ if (executor_start_lock_id != 0)
+ {
+ DirectFunctionCall1(pg_advisory_lock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+ DirectFunctionCall1(pg_advisory_unlock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+
+ /*
+ * Ensure that we notice any pending invalidations, since the advisory
+ * lock functions don't do this.
+ */
+ AcceptInvalidationMessages();
+ }
+
+ /* Now start the executor, possibly via a previous hook user */
+ if (prev_ExecutorStart_hook)
+ prev_ExecutorStart_hook(queryDesc, eflags);
+ else
+ standard_ExecutorStart(queryDesc, eflags);
+
+ if (executor_start_lock_id != 0)
+ elog(NOTICE, "Finished ExecutorStart(): CachedPlan is %s",
+ queryDesc->cplan->is_valid ? "valid" : "not valid");
+}
+
/* Module load function */
void
_PG_init(void)
{
- /* Set up the GUC to control which lock is used */
+ /* Set up GUCs to control which lock is used */
DefineCustomIntVariable("delay_execution.post_planning_lock_id",
"Sets the advisory lock ID to be locked/unlocked after planning.",
"Zero disables the delay.",
@@ -86,10 +123,22 @@ _PG_init(void)
NULL,
NULL,
NULL);
-
+ DefineCustomIntVariable("delay_execution.executor_start_lock_id",
+ "Sets the advisory lock ID to be locked/unlocked before starting execution.",
+ "Zero disables the delay.",
+ &executor_start_lock_id,
+ 0,
+ 0, INT_MAX,
+ PGC_USERSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
MarkGUCPrefixReserved("delay_execution");
- /* Install our hook */
+ /* Install our hooks. */
prev_planner_hook = planner_hook;
planner_hook = delay_execution_planner;
+ prev_ExecutorStart_hook = ExecutorStart_hook;
+ ExecutorStart_hook = delay_execution_ExecutorStart;
}
diff --git a/src/test/modules/delay_execution/expected/cached-plan-replan.out b/src/test/modules/delay_execution/expected/cached-plan-replan.out
new file mode 100644
index 0000000000..4f450b9d9b
--- /dev/null
+++ b/src/test/modules/delay_execution/expected/cached-plan-replan.out
@@ -0,0 +1,117 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1prep s2lock s1exec s2dropi s2unlock
+step s1prep: SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1;
+ EXPLAIN (COSTS OFF) EXECUTE q (1);
+QUERY PLAN
+--------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo11 foo_1
+ Recheck Cond: (a = $1)
+ -> Bitmap Index Scan on foo11_a_idx
+ Index Cond: (a = $1)
+(6 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+-----------------------------
+Append
+ Subplans Removed: 1
+ -> Seq Scan on foo11 foo_1
+ Filter: (a = $1)
+(4 rows)
+
+
+starting permutation: s1prep2 s2lock s1exec2 s2dropi s2unlock
+step s1prep2: SET plan_cache_mode = force_generic_plan;
+ SET enable_partitionwise_aggregate = on;
+ SET enable_partitionwise_join = on;
+ PREPARE q2 AS SELECT t1.a, count(t2.b) FROM foo t1, foo t2 WHERE t1.a = t2.a GROUP BY 1;
+ EXPLAIN (COSTS OFF) EXECUTE q2;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+----------------------------------------------------------------
+Append
+ -> GroupAggregate
+ Group Key: t1.a
+ -> Merge Join
+ Merge Cond: (t1.a = t2.a)
+ -> Index Only Scan using foo11_a_idx on foo11 t1
+ -> Materialize
+ -> Index Scan using foo11_a_idx on foo11 t2
+ -> GroupAggregate
+ Group Key: t1_1.a
+ -> Merge Join
+ Merge Cond: (t1_1.a = t2_1.a)
+ -> Sort
+ Sort Key: t1_1.a
+ -> Seq Scan on foo2 t1_1
+ -> Sort
+ Sort Key: t2_1.a
+ -> Seq Scan on foo2 t2_1
+(18 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec2: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec2: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------------------------------
+Append
+ -> GroupAggregate
+ Group Key: t1.a
+ -> Merge Join
+ Merge Cond: (t1.a = t2.a)
+ -> Sort
+ Sort Key: t1.a
+ -> Seq Scan on foo11 t1
+ -> Sort
+ Sort Key: t2.a
+ -> Seq Scan on foo11 t2
+ -> GroupAggregate
+ Group Key: t1_1.a
+ -> Merge Join
+ Merge Cond: (t1_1.a = t2_1.a)
+ -> Sort
+ Sort Key: t1_1.a
+ -> Seq Scan on foo2 t1_1
+ -> Sort
+ Sort Key: t2_1.a
+ -> Seq Scan on foo2 t2_1
+(21 rows)
+
diff --git a/src/test/modules/delay_execution/specs/cached-plan-replan.spec b/src/test/modules/delay_execution/specs/cached-plan-replan.spec
new file mode 100644
index 0000000000..67cfed7044
--- /dev/null
+++ b/src/test/modules/delay_execution/specs/cached-plan-replan.spec
@@ -0,0 +1,50 @@
+# Test to check that invalidation of a cached plan during ExecutorStart
+# correctly triggers replanning and re-execution.
+
+setup
+{
+ CREATE TABLE foo (a int, b text) PARTITION BY LIST(a);
+ CREATE TABLE foo1 PARTITION OF foo FOR VALUES IN (1) PARTITION BY LIST (a);
+ CREATE TABLE foo11 PARTITION OF foo1 FOR VALUES IN (1);
+ CREATE INDEX foo11_a ON foo1 (a);
+ CREATE TABLE foo2 PARTITION OF foo FOR VALUES IN (2);
+ CREATE VIEW foov AS SELECT * FROM foo;
+}
+
+teardown
+{
+ DROP VIEW foov;
+ DROP TABLE foo;
+}
+
+session "s1"
+# Creates a prepared statement and forces creation of a generic plan
+step "s1prep" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+
+step "s1prep2" { SET plan_cache_mode = force_generic_plan;
+ SET enable_partitionwise_aggregate = on;
+ SET enable_partitionwise_join = on;
+ PREPARE q2 AS SELECT t1.a, count(t2.b) FROM foo t1, foo t2 WHERE t1.a = t2.a GROUP BY 1;
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+# Executes a generic plan
+step "s1exec" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+step "s1exec2" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+
+session "s2"
+step "s2lock" { SELECT pg_advisory_lock(12345); }
+step "s2unlock" { SELECT pg_advisory_unlock(12345); }
+step "s2dropi" { DROP INDEX foo11_a; }
+
+# While "s1exec" waits to acquire the advisory lock, "s2drop" is able to drop
+# the index being used in the cached plan for `q`, so when "s1exec" is then
+# unblocked and initializes the cached plan for execution, it detects the
+# concurrent index drop and causes the cached plan to be discarded and
+# recreated without the index.
+permutation "s1prep" "s2lock" "s1exec" "s2dropi" "s2unlock"
+permutation "s1prep2" "s2lock" "s1exec2" "s2dropi" "s2unlock"
--
2.35.3
Amit Langote <amitlangote09@gmail.com> writes:
[ v38 patchset ]
I spent a little bit of time looking through this, and concluded that
it's not something I will be wanting to push into v16 at this stage.
The patch doesn't seem very close to being committable on its own
terms, and even if it was now is not a great time in the dev cycle
to be making significant executor API changes. Too much risk of
having to thrash the API during beta, or even change it some more
in v17. I suggest that we push this forward to the next CF with the
hope of landing it early in v17.
A few concrete thoughts:
* I understand that your plan now is to acquire locks on all the
originally-named tables, then do permissions checks (which will
involve only those tables), then dynamically lock just inheritance and
partitioning child tables as we descend the plan tree. That seems
more or less okay to me, but it could be reflected better in the
structure of the patch perhaps.
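
Concretely, that startup order might look something like the sketch
below; the function name and the earlyLockRelids list are illustrative
placeholders, not names from the patch:

	/*
	 * Hypothetical sketch of the startup order described above.  Both
	 * the helper name and the earlyLockRelids list are made up here.
	 */
	static void
	ExecLockNamedTables(EState *estate, List *earlyLockRelids)
	{
		ListCell   *lc;

		/* Lock only the originally-named tables, in range-table order. */
		foreach(lc, earlyLockRelids)
		{
			Index		rti = lfirst_int(lc);
			RangeTblEntry *rte = exec_rt_fetch(rti, estate);

			LockRelationOid(rte->relid, rte->rellockmode);
		}

		/*
		 * Permissions are then checked against these tables only;
		 * inheritance and partitioning children get locked later, as
		 * ExecInitNode() descends into the surviving subplans.
		 */
	}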
* In particular I don't much like the "viewRelations" list, which
seems like a wart; those ought to be handled more nearly the same way
as other RTEs. (One concrete reason why is that this scheme is going
to result in locking views in a different order than they were locked
during original parsing, which perhaps could contribute to deadlocks.)
Maybe we should store an integer list of which RTIs need to be locked
in the early phase? Building that in the parser/rewriter would provide
a solid guide to the original locking order, so we'd be trivially sure
of duplicating that. (It might be close enough to follow the RT list
order, which is basically what AcquireExecutorLocks does today, but
this'd be more certain to do the right thing.) I'm less concerned
about lock order for child tables because those are just going to
follow the inheritance or partitioning structure.
* I don't understand the need for changes like this:
/* clean up tuple table */
- ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ if (node->ps.ps_ResultTupleSlot)
+ ExecClearTuple(node->ps.ps_ResultTupleSlot);
ISTM that the process ought to involve taking a lock (if needed)
before we have built any execution state for a given plan node,
and if we find we have to fail, returning NULL instead of a
partially-valid planstate node. Otherwise, considerations of how
to handle partially-valid nodes are going to metastasize into all
sorts of places, almost certainly including EXPLAIN for instance.
I think we ought to be able to limit the damage to "parent nodes
might have NULL child links that you wouldn't have expected".
That wouldn't faze ExecEndNode at all, nor most other code.
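
A sketch of that shape, with a hypothetical ExecLockScanRelation()
standing in for whatever helper the patch ends up using:

	static SeqScanState *
	ExecInitSeqScanSketch(SeqScan *node, EState *estate, int eflags)
	{
		SeqScanState *scanstate;

		/*
		 * Lock and recheck plan validity before allocating anything, so
		 * that failure surfaces as a NULL planstate, not a partial one.
		 */
		if (!ExecLockScanRelation(estate, node->scan.scanrelid))
			return NULL;	/* cached plan went stale; caller unwinds */

		scanstate = makeNode(SeqScanState);
		/* ... the usual initialization runs only on the success path ... */
		return scanstate;
	}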
* More attention is needed to comments. For example, in a couple of
places in plancache.c you have removed function header comments
defining API details and not replaced them with any info about the new
details, despite the fact that those details are more complex than the
old.
It seems I hadn't noted in ExecEndNode()'s comment that all node
types' recursive subroutines need to handle the change made by this
patch that the corresponding ExecInitNode() subroutine may now return
early without having initialized all state struct fields.
Also noted in the documentation for CustomScan and ForeignScan that
the Begin*Scan callback may not have been called at all, so the
End*Scan should handle that gracefully.
Yeah, I think we need to avoid adding such requirements. It's the
sort of thing that would far too easily get past developer testing
and only fail once in a blue moon in the field.
regards, tom lane
On Tue, Apr 4, 2023 at 6:41 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Amit Langote <amitlangote09@gmail.com> writes:
[ v38 patchset ]
I spent a little bit of time looking through this, and concluded that
it's not something I will be wanting to push into v16 at this stage.
The patch doesn't seem very close to being committable on its own
terms, and even if it was now is not a great time in the dev cycle
to be making significant executor API changes. Too much risk of
having to thrash the API during beta, or even change it some more
in v17. I suggest that we push this forward to the next CF with the
hope of landing it early in v17.
OK, thanks a lot for your feedback.
A few concrete thoughts:
* I understand that your plan now is to acquire locks on all the
originally-named tables, then do permissions checks (which will
involve only those tables), then dynamically lock just inheritance and
partitioning child tables as we descend the plan tree.
Actually, with the current implementation of the patch, *all* of the
relations mentioned in the plan tree would get locked during the
ExecInitNode() traversal of the plan tree (and of those in
plannedstmt->subplans), not just the inheritance child tables.
Locking of non-child tables done by the executor after this patch is
duplicative with AcquirePlannerLocks(), so that's something to be
improved.
That seems
more or less okay to me, but it could be reflected better in the
structure of the patch perhaps.

* In particular I don't much like the "viewRelations" list, which
seems like a wart; those ought to be handled more nearly the same way
as other RTEs. (One concrete reason why is that this scheme is going
to result in locking views in a different order than they were locked
during original parsing, which perhaps could contribute to deadlocks.)
Maybe we should store an integer list of which RTIs need to be locked
in the early phase? Building that in the parser/rewriter would provide
a solid guide to the original locking order, so we'd be trivially sure
of duplicating that. (It might be close enough to follow the RT list
order, which is basically what AcquireExecutorLocks does today, but
this'd be more certain to do the right thing.) I'm less concerned
about lock order for child tables because those are just going to
follow the inheritance or partitioning structure.
What you've described here sounds somewhat like what I had implemented
in the patch versions till v31, though it used a bitmapset named
minLockRelids that is initialized by setrefs.c. Your idea of
initializing a list before planning seems more appealing offhand than
the code I had added in setrefs.c to populate that minLockRelids
bitmapset, which would be bms_add_range(1, list_length(finalrtable)),
followed by bms_del_members(set-of-child-rel-rtis).
I'll give your idea a try.
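(For reference, that v31-era construction amounts to roughly the
following; 'childrelrtis' is a placeholder for the set of child-rel RT
indexes, so this is a sketch rather than the actual v31 code:)

    Bitmapset  *minLockRelids;

    /* Start with every RT index in the final range table... */
    minLockRelids = bms_add_range(NULL, 1, list_length(glob->finalrtable));
    /* ...then remove the RT indexes of inheritance/partition children. */
    minLockRelids = bms_del_members(minLockRelids, childrelrtis);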
* I don't understand the need for changes like this:
     /* clean up tuple table */
-    ExecClearTuple(node->ps.ps_ResultTupleSlot);
+    if (node->ps.ps_ResultTupleSlot)
+        ExecClearTuple(node->ps.ps_ResultTupleSlot);
ISTM that the process ought to involve taking a lock (if needed)
before we have built any execution state for a given plan node,
and if we find we have to fail, returning NULL instead of a
partially-valid planstate node. Otherwise, considerations of how
to handle partially-valid nodes are going to metastasize into all
sorts of places, almost certainly including EXPLAIN for instance.
I think we ought to be able to limit the damage to "parent nodes
might have NULL child links that you wouldn't have expected".
That wouldn't faze ExecEndNode at all, nor most other code.
Hmm, yes, taking a lock before allocating any of the stuff to add into
the planstate seems like it's much easier to reason about than the
alternative I've implemented.
* More attention is needed to comments. For example, in a couple of
places in plancache.c you have removed function header comments
defining API details and not replaced them with any info about the new
details, despite the fact that those details are more complex than the
old.
OK, yeah, maybe I've added a bunch of explanations in execMain.c that
should perhaps have been in plancache.c.
It seems I hadn't noted in ExecEndNode()'s comment that all node
types' recursive subroutines need to handle the change made by this
patch whereby the corresponding ExecInitNode() subroutine may now
return early without having initialized all state struct fields.
Also noted in the documentation for CustomScan and ForeignScan that
the Begin*Scan callback may not have been called at all, so the
End*Scan should handle that gracefully.
Yeah, I think we need to avoid adding such requirements. It's the
sort of thing that would far too easily get past developer testing
and only fail once in a blue moon in the field.
OK, got it.
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
On Tue, Apr 4, 2023 at 10:29 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Tue, Apr 4, 2023 at 6:41 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
A few concrete thoughts:
* I understand that your plan now is to acquire locks on all the
originally-named tables, then do permissions checks (which will
involve only those tables), then dynamically lock just inheritance and
partitioning child tables as we descend the plan tree.
Actually, with the current implementation of the patch, *all* of the
relations mentioned in the plan tree would get locked during the
ExecInitNode() traversal of the plan tree (and of those in
plannedstmt->subplans), not just the inheritance child tables.
Locking of non-child tables done by the executor after this patch is
duplicative with AcquirePlannerLocks(), so that's something to be
improved.
That seems
more or less okay to me, but it could be reflected better in the
structure of the patch perhaps.
* In particular I don't much like the "viewRelations" list, which
seems like a wart; those ought to be handled more nearly the same way
as other RTEs. (One concrete reason why is that this scheme is going
to result in locking views in a different order than they were locked
during original parsing, which perhaps could contribute to deadlocks.)
Maybe we should store an integer list of which RTIs need to be locked
in the early phase? Building that in the parser/rewriter would provide
a solid guide to the original locking order, so we'd be trivially sure
of duplicating that. (It might be close enough to follow the RT list
order, which is basically what AcquireExecutorLocks does today, but
this'd be more certain to do the right thing.) I'm less concerned
about lock order for child tables because those are just going to
follow the inheritance or partitioning structure.
What you've described here sounds somewhat like what I had implemented
in the patch versions till v31, though it used a bitmapset named
minLockRelids that is initialized by setrefs.c. Your idea of
initializing a list before planning seems more appealing offhand than
the code I had added in setrefs.c to populate that minLockRelids
bitmapset, which would be bms_add_range(1, list_length(finalrtable)),
followed by bms_del_members(set-of-child-rel-rtis).
I'll give your idea a try.
After sleeping on this, I think we perhaps don't need to remember
originally-named relations if only for the purpose of locking them for
execution. That's because, for a reused (cached) plan,
AcquirePlannerLocks() would have taken those locks anyway.
AcquirePlannerLocks() doesn't lock inheritance children because they would
be added to the range table by the planner, so they should be locked
separately for execution, if needed. I thought taking the execution-time
locks only when inside ExecInit[Merge]Append would work, but then we have
cases where single-child Append/MergeAppend are stripped of the
Append/MergeAppend nodes by setrefs.c. Maybe we need a place to remember
such child relations, that is, only in the cases where Append/MergeAppend
elision occurs, in something perhaps esoteric-sounding like
PlannedStmt.elidedAppendChildRels?
Another set of child relations that are not covered by Append/MergeAppend
child nodes is non-leaf partitions. I've proposed adding a List of
Bitmapset field to Append/MergeAppend named 'allpartrelids' as part of this
patchset (patch 0001) to track those for execution-time locking.
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Here is a new version. Summary of main changes since the last version
that Tom reviewed back in April:
* ExecInitNode() subroutines now return NULL (as opposed to a
partially initialized PlanState node as in the last version) upon
detecting that the CachedPlan that the plan tree is from is no longer
valid due to invalidation messages processed upon taking locks. Plan
tree subnodes that are fully initialized till the point of detection
are added by ExecInitNode() into a List in EState called
es_inited_plannodes. ExecEndPlan() now iterates over that list to
close each one individually using ExecEndNode(). ExecEndNode() or its
subroutines thus no longer need to be recursive to close the child
nodes. Also, with this design, there is no longer the possibility of
partially initialized PlanState trees with partially initialized
individual PlanState nodes, so the ExecEndNode() subroutine changes
that were in the last version to account for partial initialization
are not necessary.
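To sketch the ExecEndPlan() side of that (simplified, not the literal
patch code):

    ListCell   *lc;

    foreach(lc, estate->es_inited_plannodes)
    {
        PlanState  *ps = (PlanState *) lfirst(lc);

        ExecEndNode(ps);
    }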
* Instead of setting EXEC_FLAG_GET_LOCKS in es_top_eflags for the
entire duration of InitPlan(), it is now only set in ExecInitAppend()
and ExecInitMergeAppend(), because that's where the subnodes scanning
child tables would be and the executor only needs to lock child tables
to validate a CachedPlan in a race-free manner. Parent tables that
appear in the query would have been locked by AcquirePlannerLocks().
Child tables whose scan subnodes don't appear under Append/MergeAppend
(due to the latter being removed by setrefs.c for there being only a
single child) are identified in PlannedStmt.elidedAppendChildRels
and InitPlan() locks each one found there if the plan tree is from a
CachedPlan.
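Roughly, the handshake looks like this (a simplified sketch of the
mechanism described above, not the exact patch code):

    /* In ExecInitAppend() / ExecInitMergeAppend(): */
    estate->es_top_eflags |= EXEC_FLAG_GET_LOCKS;

    /* In ExecGetRangeTableRelation(): */
    if (estate->es_top_eflags & EXEC_FLAG_GET_LOCKS)
        rel = table_open(rte->relid, rte->rellockmode);
    else
        rel = table_open(rte->relid, NoLock);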
* There's no longer PlannedStmt.viewRelations, because view relations
need not be tracked separately for locking as AcquirePlannerLocks()
covers them.
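So a caller's retry protocol ends up looking like this (abbreviated
from the ExecuteQuery() / ExplainExecuteQuery() changes in the 0003
patch below; 'stmt' is the PlannedStmt taken from cplan->stmt_list and
error handling is omitted):

    for (;;)
    {
        cplan = GetCachedPlan(plansource, params, CurrentResourceOwner,
                              queryEnv);
        queryDesc = CreateQueryDesc(stmt, cplan, query_string,
                                    GetActiveSnapshot(), InvalidSnapshot,
                                    dest, params, queryEnv, 0);
        ExecutorStart(queryDesc, eflags);
        if (queryDesc->plan_valid)
            break;      /* all locks taken, plan still valid */

        /* Plan went stale while locking child tables; clean up and retry. */
        ExecutorEnd(queryDesc);
        FreeQueryDesc(queryDesc);
        ReleaseCachedPlan(cplan, CurrentResourceOwner);
    }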
Attachments:
v39-0004-Track-opened-range-table-relations-in-a-List-in-.patch
From b27b16024d8e673062520b8a3792b71d51e1aed9 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Mon, 13 Mar 2023 15:59:38 +0900
Subject: [PATCH v39 4/4] Track opened range table relations in a List in
EState
This makes ExecCloseRangeTableRelations faster when there are many
relations in the range table but only a few are opened during
execution, such as when run-time pruning kicks in on an Append
containing 1000s of partition subplans.
---
src/backend/executor/execMain.c | 9 +++++----
src/backend/executor/execUtils.c | 2 ++
src/include/nodes/execnodes.h | 2 ++
3 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 00db6eb307..28b72213c4 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1649,12 +1649,13 @@ ExecCloseResultRelations(EState *estate)
void
ExecCloseRangeTableRelations(EState *estate)
{
- int i;
+ ListCell *lc;
- for (i = 0; i < estate->es_range_table_size; i++)
+ foreach(lc, estate->es_opened_relations)
{
- if (estate->es_relations[i])
- table_close(estate->es_relations[i], NoLock);
+ Relation rel = lfirst(lc);
+
+ table_close(rel, NoLock);
}
}
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 599db4d597..be100f4bd8 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -831,6 +831,8 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
}
estate->es_relations[rti - 1] = rel;
+ estate->es_opened_relations = lappend(estate->es_opened_relations,
+ rel);
}
return rel;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index f0c5177b06..be06c40766 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -619,6 +619,8 @@ typedef struct EState
Index es_range_table_size; /* size of the range table arrays */
Relation *es_relations; /* Array of per-range-table-entry Relation
* pointers, or NULL if not yet opened */
+ List *es_opened_relations; /* List of non-NULL entries in
+ * es_relations in no specific order */
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
--
2.35.3
v39-0002-Add-a-PlannedStmt-field-to-store-RT-indexes-of-o.patch
From aa3de3f0770cf7f2b91d70de90922fdfce947cf5 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Wed, 7 Jun 2023 21:00:58 +0900
Subject: [PATCH v39 2/4] Add a PlannedStmt field to store RT indexes of
once-child relations
A future commit will teach the executor to lock only the child tables
appearing in a plan; to identify those tables, it will rely on the
fact that plan tree subnodes that scan the child tables normally appear
under an Append/MergeAppend node. But when there's only one child
subnode, setrefs.c removes the redundant Append/MergeAppend node,
making it impossible for the executor to identify the subnode as
scanning a child table.
This commit makes setrefs.c store the RT indexes of child tables
scanned by such once-child subnodes into a new field of PlannedStmt
called elidedAppendChildRels.
There are no users of that field as of this commit but the
aforementioned future commit will use it to lock child tables that
don't appear under Append/MergeAppend.
---
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 97 ++++++++++++++++++++++++++++
src/include/nodes/pathnodes.h | 3 +
src/include/nodes/plannodes.h | 2 +
4 files changed, 103 insertions(+)
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 8e3d2c1e35..27a4c7585a 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -526,6 +526,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->permInfos = glob->finalrteperminfos;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
+ result->elidedAppendChildRels = glob->elidedAppendChildRels;
result->subplans = glob->subplans;
result->rewindPlanIDs = glob->rewindPlanIDs;
result->rowMarks = glob->finalrowmarks;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index c4db6812ec..8ccc869bdd 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -16,6 +16,7 @@
#include "postgres.h"
#include "access/transam.h"
+#include "catalog/pg_class.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -134,6 +135,7 @@ static void flatten_unplanned_rtes(PlannerGlobal *glob, RangeTblEntry *rte);
static bool flatten_rtes_walker(Node *node, flatten_rtes_walker_context *cxt);
static void add_rte_to_flat_rtable(PlannerGlobal *glob, List *rteperminfos,
RangeTblEntry *rte);
+static List *add_plan_scanrelids(List *scanrelids, Plan *plan);
static Plan *set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset);
static Plan *set_indexonlyscan_references(PlannerInfo *root,
IndexOnlyScan *plan,
@@ -601,6 +603,93 @@ add_rte_to_flat_rtable(PlannerGlobal *glob, List *rteperminfos,
}
}
+/*
+ * Recursively adds the RT index(es) of the relation(s) scanned by plan
+ * to 'scanrelids'.
+ */
+static List *
+add_plan_scanrelids(List *scanrelids, Plan *plan)
+{
+ if (plan == NULL)
+ return scanrelids;
+
+ switch (nodeTag(plan))
+ {
+ case T_SeqScan:
+ case T_SampleScan:
+ case T_IndexScan:
+ case T_IndexOnlyScan:
+ case T_BitmapHeapScan:
+ case T_TidScan:
+ case T_TidRangeScan:
+ case T_SubqueryScan:
+ case T_FunctionScan:
+ case T_TableFuncScan:
+ case T_ValuesScan:
+ case T_CteScan:
+ case T_WorkTableScan:
+ case T_NamedTuplestoreScan:
+ case T_ForeignScan:
+ case T_CustomScan:
+ scanrelids = lappend_int(scanrelids, ((Scan *) plan)->scanrelid);
+ break;
+
+ /* Recurse for nodes that have child plans. */
+ case T_Append:
+ {
+ Append *aplan = (Append *) plan;
+ ListCell *l;
+
+ foreach(l, aplan->appendplans)
+ scanrelids = add_plan_scanrelids(scanrelids,
+ (Plan *) lfirst(l));
+ }
+ break;
+
+ case T_MergeAppend:
+ {
+ MergeAppend *mplan = (MergeAppend *) plan;
+ ListCell *l;
+
+ foreach(l, mplan->mergeplans)
+ scanrelids = add_plan_scanrelids(scanrelids,
+ (Plan *) lfirst(l));
+ }
+ break;
+
+ case T_BitmapAnd:
+ {
+ BitmapAnd *baplan = (BitmapAnd *) plan;
+ ListCell *l;
+
+ foreach(l, baplan->bitmapplans)
+ scanrelids = add_plan_scanrelids(scanrelids,
+ (Plan *) lfirst(l));
+ }
+ break;
+
+ case T_BitmapOr:
+ {
+ BitmapOr *boplan = (BitmapOr *) plan;
+ ListCell *l;
+
+ foreach(l, boplan->bitmapplans)
+ scanrelids = add_plan_scanrelids(scanrelids,
+ (Plan *) lfirst(l));
+ }
+ break;
+
+ default:
+ break;
+ }
+
+ /* Recurse into child plans. */
+ scanrelids = add_plan_scanrelids(scanrelids, plan->lefttree);
+ scanrelids = add_plan_scanrelids(scanrelids, plan->righttree);
+
+ return scanrelids;
+}
+
/*
* set_plan_refs: recurse through the Plan nodes of a single subquery level
*/
@@ -1743,7 +1832,11 @@ set_append_references(PlannerInfo *root,
Plan *p = (Plan *) linitial(aplan->appendplans);
if (p->parallel_aware == aplan->plan.parallel_aware)
+ {
+ root->glob->elidedAppendChildRels =
+ add_plan_scanrelids(root->glob->elidedAppendChildRels, p);
return clean_up_removed_plan_level((Plan *) aplan, p);
+ }
}
/*
@@ -1821,7 +1914,11 @@ set_mergeappend_references(PlannerInfo *root,
Plan *p = (Plan *) linitial(mplan->mergeplans);
if (p->parallel_aware == mplan->plan.parallel_aware)
+ {
+ root->glob->elidedAppendChildRels =
+ add_plan_scanrelids(root->glob->elidedAppendChildRels, p);
return clean_up_removed_plan_level((Plan *) mplan, p);
+ }
}
/*
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index c17b53f7ad..4303482499 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -125,6 +125,9 @@ typedef struct PlannerGlobal
/* "flat" list of AppendRelInfos */
List *appendRelations;
+ /* "flat" list of integer RT indexes */
+ List *elidedAppendChildRels;
+
/* OIDs of relations the plan depends on */
List *relationOids;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 7a5f3ba625..203268d1ba 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -80,6 +80,8 @@ typedef struct PlannedStmt
List *appendRelations; /* list of AppendRelInfo nodes */
+ List *elidedAppendChildRels; /* "flat" list of integer RT indexes */
+
List *subplans; /* Plan trees for SubPlan expressions; note
* that some could be NULL */
--
2.35.3
v39-0001-Add-field-to-store-partitioned-relids-to-Append-.patch
From 6d035cbd208f4b7de978bf4c7fbb7e1f8db07d24 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Thu, 9 Mar 2023 11:26:06 +0900
Subject: [PATCH v39 1/4] Add field to store partitioned relids to
Append/MergeAppend
A future commit will move the locking of relations referenced in a
cached plan from the current loop-over-rangetable in
AcquireExecutorLocks() to the ExecInitNode() traversal of the plan
tree.
But partitioned tables do not have their own Scan nodes for the
executor to be able to find them through the plan tree traversal, so
their RT indexes must be remembered via this new field of
Append/MergeAppend node.
The code to look up partitioned parent relids for a given list of
partition scan subpaths of an Append/MergeAppend is already present
in make_partition_pruneinfo() but was local to partprune.c. This
commit refactors that code into its own function called
add_append_subpath_partrelids(). Considering that the partitioned
parent relids must also be looked up for the cases where
Append/MergeAppend child subpaths are not simple scan nodes, but also
join or aggregrate nodes (due to partitionwise join and aggregate
features), the code in new function needs to be generalized to the
cases where child rels can be joinrels or upper (grouping) rels.
Finally, to facilitate the lookup of parent rels in
add_append_subpath_partrelids(), set the link to parent rels in the
RelOptInfos of child grouping rels too, like it's already done for
the RelOptInfos of child baserels (scan-level) and child joinrels.
---
src/backend/optimizer/plan/createplan.c | 41 ++++++--
src/backend/optimizer/plan/planner.c | 3 +
src/backend/optimizer/plan/setrefs.c | 4 +
src/backend/optimizer/util/appendinfo.c | 134 ++++++++++++++++++++++++
src/backend/partitioning/partprune.c | 124 +++-------------------
src/include/nodes/plannodes.h | 14 +++
src/include/optimizer/appendinfo.h | 3 +
src/include/partitioning/partprune.h | 3 +-
8 files changed, 203 insertions(+), 123 deletions(-)
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 4bb38160b3..48febf4045 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -25,6 +25,7 @@
#include "nodes/extensible.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
+#include "optimizer/appendinfo.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/optimizer.h"
@@ -1210,6 +1211,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
Oid *nodeCollations = NULL;
bool *nodeNullsFirst = NULL;
bool consider_async = false;
+ List *allpartrelids = NIL;
/*
* The subpaths list could be empty, if every child was proven empty by
@@ -1351,15 +1353,23 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
++nasyncplans;
}
+ /*
+ * Find partitioned parent rel(s) of the subpath's rel(s).
+ */
+ allpartrelids = add_append_subpath_partrelids(root, subpath, rel,
+ allpartrelids);
+
subplans = lappend(subplans, subplan);
}
+ plan->allpartrelids = allpartrelids;
+
/*
- * If any quals exist, they may be useful to perform further partition
- * pruning during execution. Gather information needed by the executor to
- * do partition pruning.
+ * If scanning partitions, check if there are quals that may be useful to
+ * perform further partition pruning during execution. Gather information
+ * needed by the executor to do partition pruning.
*/
- if (enable_partition_pruning)
+ if (enable_partition_pruning && allpartrelids != NIL)
{
List *prunequal;
@@ -1380,7 +1390,8 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
partpruneinfo =
make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ allpartrelids);
}
plan->appendplans = subplans;
@@ -1426,6 +1437,7 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
PartitionPruneInfo *partpruneinfo = NULL;
+ List *allpartrelids = NIL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1515,15 +1527,23 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplan = (Plan *) sort;
}
+ /*
+ * Find partitioned parent rel(s) of the subpath's rel(s).
+ */
+ allpartrelids = add_append_subpath_partrelids(root, subpath, rel,
+ allpartrelids);
+
subplans = lappend(subplans, subplan);
}
+ node->allpartrelids = allpartrelids;
+
/*
- * If any quals exist, they may be useful to perform further partition
- * pruning during execution. Gather information needed by the executor to
- * do partition pruning.
+ * If scanning partitions, check if there are quals that may be useful to
+ * perform further partition pruning during execution. Gather information
+ * needed by the executor to do partition pruning.
*/
- if (enable_partition_pruning)
+ if (enable_partition_pruning && allpartrelids != NIL)
{
List *prunequal;
@@ -1535,7 +1555,8 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
if (prunequal != NIL)
partpruneinfo = make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ allpartrelids);
}
node->mergeplans = subplans;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 1e4dd27dba..8e3d2c1e35 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -7798,8 +7798,11 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
agg_costs, gd, &child_extra,
&child_partially_grouped_rel);
+ /* Mark as child of grouped_rel. */
+ child_grouped_rel->parent = grouped_rel;
if (child_partially_grouped_rel)
{
+ child_partially_grouped_rel->parent = grouped_rel;
partially_grouped_live_children =
lappend(partially_grouped_live_children,
child_partially_grouped_rel);
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 1ca26baa25..c4db6812ec 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1754,6 +1754,8 @@ set_append_references(PlannerInfo *root,
set_dummy_tlist_references((Plan *) aplan, rtoffset);
aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
+ foreach(l, aplan->allpartrelids)
+ lfirst(l) = offset_relid_set((Relids) lfirst(l), rtoffset);
if (aplan->part_prune_info)
{
@@ -1830,6 +1832,8 @@ set_mergeappend_references(PlannerInfo *root,
set_dummy_tlist_references((Plan *) mplan, rtoffset);
mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
+ foreach(l, mplan->allpartrelids)
+ lfirst(l) = offset_relid_set((Relids) lfirst(l), rtoffset);
if (mplan->part_prune_info)
{
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index f456b3b0a4..14c74a3a4e 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -41,6 +41,7 @@ static void make_inh_translation_list(Relation oldrelation,
AppendRelInfo *appinfo);
static Node *adjust_appendrel_attrs_mutator(Node *node,
adjust_appendrel_attrs_context *context);
+static List *add_part_relids(List *allpartrelids, Bitmapset *partrelids);
/*
@@ -1035,3 +1036,136 @@ distribute_row_identity_vars(PlannerInfo *root)
}
}
}
+
+/*
+ * add_append_subpath_partrelids
+ * Look up a child subpath's rel's partitioned parent relids up to
+ * parentrel and add the bitmapset containing those into
+ * 'allpartrelids'
+ */
+List *
+add_append_subpath_partrelids(PlannerInfo *root, Path *subpath,
+ RelOptInfo *parentrel,
+ List *allpartrelids)
+{
+ RelOptInfo *prel = subpath->parent;
+ Relids partrelids = NULL;
+
+ /* Nothing to do if there's no parent to begin with. */
+ if (!IS_OTHER_REL(prel))
+ return allpartrelids;
+
+ /*
+ * Traverse up to the pathrel's topmost partitioned parent, collecting
+ * parent relids as we go; but stop if we reach parentrel. (Normally, a
+ * pathrel's topmost partitioned parent is either parentrel or a UNION ALL
+ * appendrel child of parentrel. But when handling partitionwise joins of
+ * multi-level partitioning trees, we can see an append path whose
+ * parentrel is an intermediate partitioned table.)
+ */
+ do
+ {
+ Relids parent_relids = NULL;
+
+ /*
+ * For simple child rels, we can simply get the parent relid from
+ * prel->parent. But for partitionwise join and aggregate child rels,
+ * while we can use prel->parent to move up the tree, parent relids to
+ * add into 'partrelids' must be found the hard way through the
+ * AppendRelInfos, because 1) a joinrel's relids may point to RTE_JOIN
+ * entries, 2) topmost parent grouping rel's relids field is left NULL.
+ */
+ if (IS_SIMPLE_REL(prel))
+ {
+ prel = prel->parent;
+ /* Stop once we reach the root partitioned rel. */
+ if (!IS_PARTITIONED_REL(prel))
+ break;
+ parent_relids = bms_add_members(parent_relids, prel->relids);
+ }
+ else
+ {
+ AppendRelInfo **appinfos;
+ int nappinfos,
+ i;
+
+ appinfos = find_appinfos_by_relids(root, prel->relids,
+ &nappinfos);
+ for (i = 0; i < nappinfos; i++)
+ {
+ AppendRelInfo *appinfo = appinfos[i];
+
+ parent_relids = bms_add_member(parent_relids,
+ appinfo->parent_relid);
+ }
+ pfree(appinfos);
+ prel = prel->parent;
+ }
+ /* accept this level as an interesting parent */
+ partrelids = bms_add_members(partrelids, parent_relids);
+ if (prel == parentrel)
+ break; /* don't traverse above parentrel */
+ } while (IS_OTHER_REL(prel));
+
+ if (partrelids == NULL)
+ return allpartrelids;
+
+ return add_part_relids(allpartrelids, partrelids);
+}
+
+/*
+ * add_part_relids
+ * Add new info to a list of Bitmapsets of partitioned relids.
+ *
+ * Within 'allpartrelids', there is one Bitmapset for each topmost parent
+ * partitioned rel. Each Bitmapset contains the RT indexes of the topmost
+ * parent as well as its relevant non-leaf child partitions. Since (by
+ * construction of the rangetable list) parent partitions must have lower
+ * RT indexes than their children, we can distinguish the topmost parent
+ * as being the lowest set bit in the Bitmapset.
+ *
+ * 'partrelids' contains the RT indexes of a parent partitioned rel, and
+ * possibly some non-leaf children, that are newly identified as parents of
+ * some subpath rel passed to make_partition_pruneinfo(). These are added
+ * to an appropriate member of 'allpartrelids'.
+ *
+ * Note that the list contains only RT indexes of partitioned tables that
+ * are parents of some scan-level relation appearing in the 'subpaths' that
+ * make_partition_pruneinfo() is dealing with. Also, "topmost" parents are
+ * not allowed to be higher than the 'parentrel' associated with the append
+ * path. In this way, we avoid expending cycles on partitioned rels that
+ * can't contribute useful pruning information for the problem at hand.
+ * (It is possible for 'parentrel' to be a child partitioned table, and it
+ * is also possible for scan-level relations to be child partitioned tables
+ * rather than leaf partitions. Hence we must construct this relation set
+ * with reference to the particular append path we're dealing with, rather
+ * than looking at the full partitioning structure represented in the
+ * RelOptInfos.)
+ */
+static List *
+add_part_relids(List *allpartrelids, Bitmapset *partrelids)
+{
+ Index targetpart;
+ ListCell *lc;
+
+ /* We can easily get the lowest set bit this way: */
+ targetpart = bms_next_member(partrelids, -1);
+ Assert(targetpart > 0);
+
+ /* Look for a matching topmost parent */
+ foreach(lc, allpartrelids)
+ {
+ Bitmapset *currpartrelids = (Bitmapset *) lfirst(lc);
+ Index currtarget = bms_next_member(currpartrelids, -1);
+
+ if (targetpart == currtarget)
+ {
+ /* Found a match, so add any new RT indexes to this hierarchy */
+ currpartrelids = bms_add_members(currpartrelids, partrelids);
+ lfirst(lc) = currpartrelids;
+ return allpartrelids;
+ }
+ }
+ /* No match, so add the new partition hierarchy to the list */
+ return lappend(allpartrelids, partrelids);
+}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 7179b22a05..213512a5f4 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -138,7 +138,6 @@ typedef struct PruneStepResult
} PruneStepResult;
-static List *add_part_relids(List *allpartrelids, Bitmapset *partrelids);
static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
RelOptInfo *parentrel,
List *prunequal,
@@ -218,33 +217,32 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
+ * 'allpartrelids' contains Bitmapsets of RT indexes of partitioned parents
+ * whose partitions' Paths are in 'subpaths'; there's one Bitmapset for every
+ * partition tree involved.
*/
PartitionPruneInfo *
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
- List *prunequal)
+ List *prunequal,
+ List *allpartrelids)
{
PartitionPruneInfo *pruneinfo;
Bitmapset *allmatchedsubplans = NULL;
- List *allpartrelids;
List *prunerelinfos;
int *relid_subplan_map;
ListCell *lc;
int i;
+ Assert(list_length(allpartrelids) > 0);
+
/*
- * Scan the subpaths to see which ones are scans of partition child
- * relations, and identify their parent partitioned rels. (Note: we must
- * restrict the parent partitioned rels to be parentrel or children of
- * parentrel, otherwise we couldn't translate prunequal to match.)
- *
- * Also construct a temporary array to map from partition-child-relation
- * relid to the index in 'subpaths' of the scan plan for that partition.
+ * Construct a temporary array to map from partition-child-relation relid
+ * to the index in 'subpaths' of the scan plan for that partition.
* (Use of "subplan" rather than "subpath" is a bit of a misnomer, but
* we'll let it stand.) For convenience, we use 1-based indexes here, so
* that zero can represent an un-filled array entry.
*/
- allpartrelids = NIL;
relid_subplan_map = palloc0(sizeof(int) * root->simple_rel_array_size);
i = 1;
@@ -253,50 +251,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Path *path = (Path *) lfirst(lc);
RelOptInfo *pathrel = path->parent;
- /* We don't consider partitioned joins here */
- if (pathrel->reloptkind == RELOPT_OTHER_MEMBER_REL)
- {
- RelOptInfo *prel = pathrel;
- Bitmapset *partrelids = NULL;
-
- /*
- * Traverse up to the pathrel's topmost partitioned parent,
- * collecting parent relids as we go; but stop if we reach
- * parentrel. (Normally, a pathrel's topmost partitioned parent
- * is either parentrel or a UNION ALL appendrel child of
- * parentrel. But when handling partitionwise joins of
- * multi-level partitioning trees, we can see an append path whose
- * parentrel is an intermediate partitioned table.)
- */
- do
- {
- AppendRelInfo *appinfo;
-
- Assert(prel->relid < root->simple_rel_array_size);
- appinfo = root->append_rel_array[prel->relid];
- prel = find_base_rel(root, appinfo->parent_relid);
- if (!IS_PARTITIONED_REL(prel))
- break; /* reached a non-partitioned parent */
- /* accept this level as an interesting parent */
- partrelids = bms_add_member(partrelids, prel->relid);
- if (prel == parentrel)
- break; /* don't traverse above parentrel */
- } while (prel->reloptkind == RELOPT_OTHER_MEMBER_REL);
-
- if (partrelids)
- {
- /*
- * Found some relevant parent partitions, which may or may not
- * overlap with partition trees we already found. Add new
- * information to the allpartrelids list.
- */
- allpartrelids = add_part_relids(allpartrelids, partrelids);
- /* Also record the subplan in relid_subplan_map[] */
- /* No duplicates please */
- Assert(relid_subplan_map[pathrel->relid] == 0);
- relid_subplan_map[pathrel->relid] = i;
- }
- }
+ /* No duplicates please */
+ Assert(relid_subplan_map[pathrel->relid] == 0);
+ relid_subplan_map[pathrel->relid] = i;
i++;
}
@@ -362,63 +319,6 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
return pruneinfo;
}
-/*
- * add_part_relids
- * Add new info to a list of Bitmapsets of partitioned relids.
- *
- * Within 'allpartrelids', there is one Bitmapset for each topmost parent
- * partitioned rel. Each Bitmapset contains the RT indexes of the topmost
- * parent as well as its relevant non-leaf child partitions. Since (by
- * construction of the rangetable list) parent partitions must have lower
- * RT indexes than their children, we can distinguish the topmost parent
- * as being the lowest set bit in the Bitmapset.
- *
- * 'partrelids' contains the RT indexes of a parent partitioned rel, and
- * possibly some non-leaf children, that are newly identified as parents of
- * some subpath rel passed to make_partition_pruneinfo(). These are added
- * to an appropriate member of 'allpartrelids'.
- *
- * Note that the list contains only RT indexes of partitioned tables that
- * are parents of some scan-level relation appearing in the 'subpaths' that
- * make_partition_pruneinfo() is dealing with. Also, "topmost" parents are
- * not allowed to be higher than the 'parentrel' associated with the append
- * path. In this way, we avoid expending cycles on partitioned rels that
- * can't contribute useful pruning information for the problem at hand.
- * (It is possible for 'parentrel' to be a child partitioned table, and it
- * is also possible for scan-level relations to be child partitioned tables
- * rather than leaf partitions. Hence we must construct this relation set
- * with reference to the particular append path we're dealing with, rather
- * than looking at the full partitioning structure represented in the
- * RelOptInfos.)
- */
-static List *
-add_part_relids(List *allpartrelids, Bitmapset *partrelids)
-{
- Index targetpart;
- ListCell *lc;
-
- /* We can easily get the lowest set bit this way: */
- targetpart = bms_next_member(partrelids, -1);
- Assert(targetpart > 0);
-
- /* Look for a matching topmost parent */
- foreach(lc, allpartrelids)
- {
- Bitmapset *currpartrelids = (Bitmapset *) lfirst(lc);
- Index currtarget = bms_next_member(currpartrelids, -1);
-
- if (targetpart == currtarget)
- {
- /* Found a match, so add any new RT indexes to this hierarchy */
- currpartrelids = bms_add_members(currpartrelids, partrelids);
- lfirst(lc) = currpartrelids;
- return allpartrelids;
- }
- }
- /* No match, so add the new partition hierarchy to the list */
- return lappend(allpartrelids, partrelids);
-}
-
/*
* make_partitionedrel_pruneinfo
* Build a List of PartitionedRelPruneInfos, one for each interesting
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 1b787fe031..7a5f3ba625 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -267,6 +267,13 @@ typedef struct Append
List *appendplans;
int nasyncplans; /* # of asynchronous plans */
+ /*
+ * RTIs of all partitioned tables whose children are scanned by
+ * appendplans. The list contains a bitmapset for every partition tree
+ * covered by this Append.
+ */
+ List *allpartrelids;
+
/*
* All 'appendplans' preceding this index are non-partial plans. All
* 'appendplans' from this index onwards are partial plans.
@@ -291,6 +298,13 @@ typedef struct MergeAppend
List *mergeplans;
+ /*
+ * RTIs of all partitioned tables whose children are scanned by
+ * mergeplans. The list contains a bitmapset for every partition tree
+ * covered by this MergeAppend.
+ */
+ List *allpartrelids;
+
/* these fields are just like the sort-key info in struct Sort: */
/* number of sort-key columns */
diff --git a/src/include/optimizer/appendinfo.h b/src/include/optimizer/appendinfo.h
index a05f91f77d..1621a7319a 100644
--- a/src/include/optimizer/appendinfo.h
+++ b/src/include/optimizer/appendinfo.h
@@ -46,5 +46,8 @@ extern void add_row_identity_columns(PlannerInfo *root, Index rtindex,
RangeTblEntry *target_rte,
Relation target_relation);
extern void distribute_row_identity_vars(PlannerInfo *root);
+extern List *add_append_subpath_partrelids(PlannerInfo *root, Path *subpath,
+ RelOptInfo *parentrel,
+ List *allpartrelids);
#endif /* APPENDINFO_H */
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 8636e04e37..caa774a111 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -73,7 +73,8 @@ typedef struct PartitionPruneContext
extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
struct RelOptInfo *parentrel,
List *subpaths,
- List *prunequal);
+ List *prunequal,
+ List *allpartrelids);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
--
2.35.3
v39-0003-Delay-locking-of-child-tables-in-cached-plans-un.patch
From 5a91c233c6bd0c66db36b41a4c3005a6f3a910bd Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Fri, 20 Jan 2023 16:52:31 +0900
Subject: [PATCH v39 3/4] Delay locking of child tables in cached plans until
ExecutorStart()
Currently, GetCachedPlan() takes locks on all relations contained in
a cached plan before returning it as a valid plan to its callers for
execution. One disadvantage is that if the plan contains partitions
that are prunable with conditions involving EXTERN parameters, many
of them would be locked unnecessarily, because only those that
survive the pruning need to have been locked. Locking all partitions
this way causes significant delay when there are many partitions.
This commit rearranges things so that the child tables are now locked
during the ExecInitNode() initialization of the plan tree in a
CachedPlan. If the locking of child tables causes the CachedPlan
to go stale (that is, its is_valid flag is set to false by
PlanCacheRelCallback() when an invalidation message matching some
child table contained in the plan is processed), the executor abandons
further execution with it and asks the caller to retry with a new one.
When a CachedPlan is found to have gone stale as described above,
QueryDesc.planstate is set to NULL, indicating that no execution is
possible with the plan tree as is. Some plan tree subnodes may get
fully initialized before the CachedPlan's staleness is detected, so
to ensure that they are released by ExecEndPlan(), ExecInitNode()
now adds successfully initialized nodes to a new List in EState
called es_inited_plannodes. ExecEndPlan() releases them
individually by calling ExecEndNode() on each. ExecEndNode() is
no longer recursive, because all nodes that need to be closed are
found by iterating over es_inited_plannodes.
This commit introduces a new executor flag EXEC_FLAG_GET_LOCKS that
should be added into eflags to indicate that
ExecGetRangeTableRelation() should take a lock on the table. It is
only set in ExecInit[Merge]Append(), given that the child tables, or
the plan subnodes that scan them, normally appear under
Append/MergeAppend. Those that don't are locked directly in
InitPlan() using their RT indexes found in
PlannedStmt.elidedAppendChildRels.
Call sites that use plancache (GetCachedPlan) to get the plan trees
to pass to the executor for execution should now be prepared to
handle the case that the plan tree may be flagged by the executor as
stale as described above. To that end, this commit refactors the
relevant code sites to move the ExecutorStart() call closer to the
GetCachedPlan() to implement the replan loop conveniently.
PortalStart() now performs CreateQueryDesc() and ExecutorStart() for
all portal strategies, including those pertaining to multiple queries.
The QueryDescs for strategies handled by PortalRunMulti() are
remembered in the Portal in a new List field 'qdescs', allocated in a
new memory context 'queryContext'. This new arrangement is to make it
easier to discard and recreate a Portal if the CachedPlan goes stale
during setup.
---
contrib/postgres_fdw/postgres_fdw.c | 4 +
src/backend/commands/copyto.c | 4 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 146 ++++++---
src/backend/commands/extension.c | 2 +
src/backend/commands/matview.c | 3 +-
src/backend/commands/portalcmds.c | 16 +-
src/backend/commands/prepare.c | 32 +-
src/backend/executor/execMain.c | 116 +++++--
src/backend/executor/execParallel.c | 9 +-
src/backend/executor/execPartition.c | 14 +
src/backend/executor/execProcnode.c | 50 ++-
src/backend/executor/execUtils.c | 65 +++-
src/backend/executor/functions.c | 2 +
src/backend/executor/nodeAgg.c | 6 +-
src/backend/executor/nodeAppend.c | 54 ++--
src/backend/executor/nodeBitmapAnd.c | 31 +-
src/backend/executor/nodeBitmapHeapscan.c | 9 +-
src/backend/executor/nodeBitmapIndexscan.c | 9 +-
src/backend/executor/nodeBitmapOr.c | 31 +-
src/backend/executor/nodeCustom.c | 2 +
src/backend/executor/nodeForeignscan.c | 8 +-
src/backend/executor/nodeGather.c | 4 +-
src/backend/executor/nodeGatherMerge.c | 3 +-
src/backend/executor/nodeGroup.c | 7 +-
src/backend/executor/nodeHash.c | 10 +-
src/backend/executor/nodeHashjoin.c | 10 +-
src/backend/executor/nodeIncrementalSort.c | 7 +-
src/backend/executor/nodeIndexonlyscan.c | 11 +-
src/backend/executor/nodeIndexscan.c | 11 +-
src/backend/executor/nodeLimit.c | 3 +-
src/backend/executor/nodeLockRows.c | 3 +-
src/backend/executor/nodeMaterial.c | 7 +-
src/backend/executor/nodeMemoize.c | 7 +-
src/backend/executor/nodeMergeAppend.c | 53 +--
src/backend/executor/nodeMergejoin.c | 10 +-
src/backend/executor/nodeModifyTable.c | 12 +-
src/backend/executor/nodeNestloop.c | 10 +-
src/backend/executor/nodeProjectSet.c | 7 +-
src/backend/executor/nodeRecursiveunion.c | 10 +-
src/backend/executor/nodeResult.c | 7 +-
src/backend/executor/nodeSamplescan.c | 2 +
src/backend/executor/nodeSeqscan.c | 2 +
src/backend/executor/nodeSetOp.c | 4 +-
src/backend/executor/nodeSort.c | 7 +-
src/backend/executor/nodeSubqueryscan.c | 7 +-
src/backend/executor/nodeTidrangescan.c | 2 +
src/backend/executor/nodeTidscan.c | 2 +
src/backend/executor/nodeUnique.c | 4 +-
src/backend/executor/nodeWindowAgg.c | 6 +-
src/backend/executor/spi.c | 49 ++-
src/backend/storage/lmgr/lmgr.c | 45 +++
src/backend/tcop/postgres.c | 13 +-
src/backend/tcop/pquery.c | 305 +++++++++---------
src/backend/utils/cache/lsyscache.c | 21 ++
src/backend/utils/cache/plancache.c | 149 +++------
src/backend/utils/mmgr/portalmem.c | 9 +
src/include/commands/explain.h | 7 +-
src/include/executor/execdesc.h | 5 +
src/include/executor/executor.h | 21 ++
src/include/nodes/execnodes.h | 6 +
src/include/storage/lmgr.h | 1 +
src/include/utils/lsyscache.h | 1 +
src/include/utils/plancache.h | 14 +
src/include/utils/portal.h | 4 +
src/test/modules/delay_execution/Makefile | 3 +-
.../modules/delay_execution/delay_execution.c | 63 +++-
.../expected/cached-plan-replan.out | 156 +++++++++
.../specs/cached-plan-replan.spec | 61 ++++
69 files changed, 1218 insertions(+), 558 deletions(-)
create mode 100644 src/test/modules/delay_execution/expected/cached-plan-replan.out
create mode 100644 src/test/modules/delay_execution/specs/cached-plan-replan.spec
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index c5cada55fb..1edd4c3f17 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2658,7 +2658,11 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
/* Get info about foreign table. */
rtindex = node->resultRelInfo->ri_RangeTableIndex;
if (fsplan->scan.scanrelid == 0)
+ {
dmstate->rel = ExecOpenScanRelation(estate, rtindex, eflags);
+ if (!ExecPlanStillValid(estate))
+ return;
+ }
else
dmstate->rel = node->ss.ss_currentRelation;
table = GetForeignTable(RelationGetRelid(dmstate->rel));
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 9e4b2437a5..8244194681 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -558,7 +558,8 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL,
+ pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
@@ -569,6 +570,7 @@ BeginCopyTo(ParseState *pstate,
* ExecutorStart computes a result tupdesc for us
*/
ExecutorStart(cstate->queryDesc, 0);
+ Assert(cstate->queryDesc->plan_valid);
tupDesc = cstate->queryDesc->tupDesc;
}
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index e91920ca14..18b07c0200 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 15f9bddcdf..91632d83e4 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -393,6 +393,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
else
{
PlannedStmt *plan;
+ QueryDesc *queryDesc;
instr_time planstart,
planduration;
BufferUsage bufusage_start,
@@ -415,12 +416,91 @@ ExplainOneQuery(Query *query, int cursorOptions,
BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
}
+ queryDesc = ExplainQueryDesc(plan, NULL, queryString, into, es,
+ params, queryEnv);
+ Assert(queryDesc);
+
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(queryDesc, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
+/*
+ * ExplainQueryDesc
+ * Set up QueryDesc for EXPLAINing a given plan
+ *
+ * This returns NULL if cplan is found to have been invalidated since its
+ * creation.
+ */
+QueryDesc *
+ExplainQueryDesc(PlannedStmt *stmt, CachedPlan *cplan,
+ const char *queryString, IntoClause *into, ExplainState *es,
+ ParamListInfo params, QueryEnvironment *queryEnv)
+{
+ QueryDesc *queryDesc;
+ DestReceiver *dest;
+ int eflags;
+ int instrument_option = 0;
+
+ /*
+ * Normally we discard the query's output, but if explaining CREATE TABLE
+ * AS, we'd better use the appropriate tuple receiver.
+ */
+ if (into)
+ dest = CreateIntoRelDestReceiver(into);
+ else
+ dest = None_Receiver;
+
+ if (es->analyze && es->timing)
+ instrument_option |= INSTRUMENT_TIMER;
+ else if (es->analyze)
+ instrument_option |= INSTRUMENT_ROWS;
+
+ if (es->buffers)
+ instrument_option |= INSTRUMENT_BUFFERS;
+ if (es->wal)
+ instrument_option |= INSTRUMENT_WAL;
+
+ /*
+ * Use a snapshot with an updated command ID to ensure this query sees
+ * results of any previously executed queries.
+ */
+ PushCopiedSnapshot(GetActiveSnapshot());
+ UpdateActiveSnapshotCommandId();
+
+ /* Create a QueryDesc for the query */
+ queryDesc = CreateQueryDesc(stmt, cplan, queryString,
+ GetActiveSnapshot(), InvalidSnapshot,
+ dest, params, queryEnv, instrument_option);
+
+ /* Select execution options */
+ if (es->analyze)
+ eflags = 0; /* default run-to-completion flags */
+ else
+ eflags = EXEC_FLAG_EXPLAIN_ONLY;
+ if (es->generic)
+ eflags |= EXEC_FLAG_EXPLAIN_GENERIC;
+ if (into)
+ eflags |= GetIntoRelEFlags(into);
+
+ /*
+ * Call ExecutorStart to prepare the plan for execution. A cached plan
+ * may get invalidated as we're doing that.
+ */
+ ExecutorStart(queryDesc, eflags);
+ if (!queryDesc->plan_valid)
+ {
+ /* Clean up. */
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+ PopActiveSnapshot();
+ return NULL;
+ }
+
+ return queryDesc;
+}
+
/*
* ExplainOneUtility -
* print out the execution plan for one utility statement
@@ -524,29 +604,16 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(QueryDesc *queryDesc,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
{
- DestReceiver *dest;
- QueryDesc *queryDesc;
instr_time starttime;
double totaltime = 0;
- int eflags;
- int instrument_option = 0;
-
- Assert(plannedstmt->commandType != CMD_UTILITY);
- if (es->analyze && es->timing)
- instrument_option |= INSTRUMENT_TIMER;
- else if (es->analyze)
- instrument_option |= INSTRUMENT_ROWS;
-
- if (es->buffers)
- instrument_option |= INSTRUMENT_BUFFERS;
- if (es->wal)
- instrument_option |= INSTRUMENT_WAL;
+ Assert(queryDesc->plannedstmt->commandType != CMD_UTILITY);
/*
* We always collect timing for the entire statement, even when node-level
@@ -555,40 +622,6 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
*/
INSTR_TIME_SET_CURRENT(starttime);
- /*
- * Use a snapshot with an updated command ID to ensure this query sees
- * results of any previously executed queries.
- */
- PushCopiedSnapshot(GetActiveSnapshot());
- UpdateActiveSnapshotCommandId();
-
- /*
- * Normally we discard the query's output, but if explaining CREATE TABLE
- * AS, we'd better use the appropriate tuple receiver.
- */
- if (into)
- dest = CreateIntoRelDestReceiver(into);
- else
- dest = None_Receiver;
-
- /* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, instrument_option);
-
- /* Select execution options */
- if (es->analyze)
- eflags = 0; /* default run-to-completion flags */
- else
- eflags = EXEC_FLAG_EXPLAIN_ONLY;
- if (es->generic)
- eflags |= EXEC_FLAG_EXPLAIN_GENERIC;
- if (into)
- eflags |= GetIntoRelEFlags(into);
-
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, eflags);
-
/* Execute the plan for statistics if asked for */
if (es->analyze)
{
@@ -4865,6 +4898,17 @@ ExplainDummyGroup(const char *objtype, const char *labelname, ExplainState *es)
}
}
+/*
+ * Discard output buffer for a fresh restart.
+ */
+void
+ExplainResetOutput(ExplainState *es)
+{
+ Assert(es->str);
+ resetStringInfo(es->str);
+ ExplainBeginOutput(es);
+}
+
/*
* Emit the start-of-output boilerplate.
*
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 0eabe18335..5a76343123 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -797,11 +797,13 @@ execute_sql_string(const char *sql)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
ExecutorStart(qdesc, 0);
+ Assert(qdesc->plan_valid);
ExecutorRun(qdesc, ForwardScanDirection, 0, true);
ExecutorFinish(qdesc);
ExecutorEnd(qdesc);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index f9a3bdfc3a..1c1ce1e17d 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -409,12 +409,13 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, 0);
+ Assert(queryDesc->plan_valid);
/* run the plan */
ExecutorRun(queryDesc, ForwardScanDirection, 0, true);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 73ed7aa2f0..4abbec054b 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -146,6 +146,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
PortalStart(portal, params, 0, GetActiveSnapshot());
Assert(portal->strategy == PORTAL_ONE_SELECT);
+ Assert(portal->plan_valid);
/*
* We're done; the query won't actually be run until PerformPortalFetch is
@@ -249,6 +250,17 @@ PerformPortalClose(const char *name)
PortalDrop(portal, false);
}
+/*
+ * Release a portal's QueryDesc.
+ */
+void
+PortalQueryFinish(QueryDesc *queryDesc)
+{
+ ExecutorFinish(queryDesc);
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+}
+
/*
* PortalCleanup
*
@@ -295,9 +307,7 @@ PortalCleanup(Portal portal)
if (portal->resowner)
CurrentResourceOwner = portal->resowner;
- ExecutorFinish(queryDesc);
- ExecutorEnd(queryDesc);
- FreeQueryDesc(queryDesc);
+ PortalQueryFinish(queryDesc);
CurrentResourceOwner = saveResourceOwner;
}
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 18f70319fc..c9070ed97f 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -183,6 +183,7 @@ ExecuteQuery(ParseState *pstate,
paramLI = EvaluateParams(pstate, entry, stmt->params, estate);
}
+replan:
/* Create a new portal to run the query in */
portal = CreateNewPortal();
/* Don't display the portal in pg_cursors, it is for internal use only */
@@ -251,10 +252,19 @@ ExecuteQuery(ParseState *pstate,
}
/*
- * Run the portal as appropriate.
+ * Run the portal as appropriate. If the portal contains a cached plan, it
+ * must be recreated if portal->plan_valid is false which tells that the
+ * cached plan was found to have been invalidated when initializing one of
+ * the plan trees contained in it.
*/
PortalStart(portal, paramLI, eflags, GetActiveSnapshot());
+ if (!portal->plan_valid)
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
+
(void) PortalRun(portal, count, false, true, dest, dest, qc);
PortalDrop(portal, false);
@@ -574,7 +584,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
{
PreparedStatement *entry;
const char *query_string;
- CachedPlan *cplan;
+ CachedPlan *cplan = NULL;
List *plan_list;
ListCell *p;
ParamListInfo paramLI = NULL;
@@ -618,6 +628,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
+replan:
cplan = GetCachedPlan(entry->plansource, paramLI,
CurrentResourceOwner, queryEnv);
@@ -639,8 +650,21 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ {
+ QueryDesc *queryDesc;
+
+ queryDesc = ExplainQueryDesc(pstmt, cplan, queryString,
+ into, es, paramLI, queryEnv);
+ if (queryDesc == NULL)
+ {
+ ExplainResetOutput(es);
+ ReleaseCachedPlan(cplan, CurrentResourceOwner);
+ goto replan;
+ }
+ ExplainOnePlan(queryDesc, into, es, query_string, paramLI,
+ queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
+ }
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index c76fdf59ec..00db6eb307 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -611,6 +611,16 @@ ExecCheckPermissions(List *rangeTable, List *rteperminfos,
RTEPermissionInfo *perminfo = lfirst_node(RTEPermissionInfo, l);
Assert(OidIsValid(perminfo->relid));
+
+ /*
+ * Relations whose permissions need to be checked must already have
+ * been locked by the parser or by AcquirePlannerLocks() if a
+ * cached plan is being executed.
+ * XXX shouldn't we skip calling ExecCheckPermissions from InitPlan
+ * in a parallel worker?
+ */
+ Assert(CheckRelLockedByMe(perminfo->relid, AccessShareLock, true) ||
+ IsParallelWorker());
result = ExecCheckOneRelPerms(perminfo);
if (!result)
{
@@ -820,6 +830,23 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
*
* Initializes the query plan: open files, allocate storage
* and start up the rule manager
+ *
+ * Normally, the plan tree given in queryDesc->plannedstmt is known to be
+ * valid, in that *all* relations contained in plannedstmt->relationOids
+ * have already been locked. That may not be the case, however, if the
+ * plannedstmt comes from a CachedPlan (given in queryDesc->cplan):
+ * AcquirePlannerLocks() locks only the relations mentioned in the query,
+ * not any child relations that the planner may have added. Locks on the
+ * child relations are taken when their Scan nodes are initialized by
+ * ExecInitNode(), which is done here. If the CachedPlan gets invalidated
+ * as these locks are taken, plan tree initialization is suspended at the
+ * point where the invalidation is first detected; queryDesc->planstate is
+ * then set to NULL and queryDesc->plan_valid to false. In that case,
+ * callers must retry the execution after creating a new CachedPlan, and
+ * after properly releasing the resources of this QueryDesc, which
+ * includes calling ExecutorFinish() and ExecutorEnd() on the EState
+ * contained therein.
* ----------------------------------------------------------------
*/
static void
@@ -830,7 +857,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
- PlanState *planstate;
+ PlanState *planstate = NULL;
TupleDesc tupType;
ListCell *l;
int i;
@@ -841,10 +868,11 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
/*
- * initialize the node's execution state
+ * Set up range table in EState.
*/
ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
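+
+ /*
+ * Save the CachedPlan, if any, so that plan validity can be rechecked
+ * via ExecPlanStillValid() as locks are taken during initialization.
+ */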
+ estate->es_cachedplan = queryDesc->cplan;
estate->es_plannedstmt = plannedstmt;
/*
@@ -877,6 +905,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
case ROW_MARK_KEYSHARE:
case ROW_MARK_REFERENCE:
relation = ExecGetRangeTableRelation(estate, rc->rti);
+ if (!ExecPlanStillValid(estate))
+ goto plan_init_suspended;
break;
case ROW_MARK_COPY:
/* no physical table access is required */
@@ -920,6 +950,16 @@ InitPlan(QueryDesc *queryDesc, int eflags)
/* signal that this EState is not used for EPQ */
estate->es_epq_active = NULL;
+ /*
+ * Must take locks on child tables if running a cached plan, because
+ * GetCachedPlan() would've only locked the root parent named in the
+ * query. Child relations that appear under Append/MergeAppend are
+ * locked in ExecInit[Merge]Append().
+ */
+ if (estate->es_cachedplan)
+ ExecLockElidedAppendChildRelations(estate,
+ plannedstmt->elidedAppendChildRels);
+
/*
* Initialize private state information for each SubPlan. We must do this
* before running ExecInitNode on the main query tree, since
@@ -944,10 +984,14 @@ InitPlan(QueryDesc *queryDesc, int eflags)
sp_eflags |= EXEC_FLAG_REWIND;
subplanstate = ExecInitNode(subplan, estate, sp_eflags);
+ if (!ExecPlanStillValid(estate))
+ {
+ Assert(subplanstate == NULL);
+ goto plan_init_suspended;
+ }
estate->es_subplanstates = lappend(estate->es_subplanstates,
subplanstate);
-
i++;
}
@@ -957,6 +1001,11 @@ InitPlan(QueryDesc *queryDesc, int eflags)
* processing tuples.
*/
planstate = ExecInitNode(plan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ {
+ Assert(planstate == NULL);
+ goto plan_init_suspended;
+ }
/*
* Get the tuple descriptor describing the type of tuples to return.
@@ -999,7 +1048,19 @@ InitPlan(QueryDesc *queryDesc, int eflags)
}
queryDesc->tupDesc = tupType;
+ Assert(planstate != NULL);
queryDesc->planstate = planstate;
+ queryDesc->plan_valid = true;
+ return;
+
+plan_init_suspended:
+ /*
+ * Plan tree initialization was suspended because the plan was found
+ * to be invalid. Mark the QueryDesc as such; ExecEndPlan() will clean
+ * up the already-initialized plan nodes via estate->es_inited_plannodes.
+ */
+ Assert(planstate == NULL);
+ queryDesc->planstate = NULL;
+ queryDesc->plan_valid = false;
}
/*
@@ -1417,7 +1478,7 @@ ExecGetAncestorResultRels(EState *estate, ResultRelInfo *resultRelInfo)
/*
* All ancestors up to the root target relation must have been
- * locked by the planner or AcquireExecutorLocks().
+ * locked.
*/
ancRel = table_open(ancOid, NoLock);
rInfo = makeNode(ResultRelInfo);
@@ -1495,18 +1556,15 @@ ExecEndPlan(PlanState *planstate, EState *estate)
ListCell *l;
/*
- * shut down the node-type-specific query processing
- */
- ExecEndNode(planstate);
-
- /*
- * for subplans too
+ * Shut down the node-type-specific query processing for all nodes that
+ * were initialized during InitPlan(), both those in the main plan tree
+ * and those in subplans (es_subplanstates), if any.
*/
- foreach(l, estate->es_subplanstates)
+ foreach(l, estate->es_inited_plannodes)
{
- PlanState *subplanstate = (PlanState *) lfirst(l);
+ PlanState *planstate = (PlanState *) lfirst(l);
- ExecEndNode(subplanstate);
+ ExecEndNode(planstate);
}
/*
@@ -2849,7 +2907,8 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
* Child EPQ EStates share the parent's copy of unchanging state such as
* the snapshot, rangetable, and external Param info. They need their own
* copies of local state, including a tuple table, es_param_exec_vals,
- * result-rel info, etc.
+ * result-rel info, etc. Also, we don't pass the parent's copy of the
+ * CachedPlan, because no new locks will be taken for EvalPlanQual().
*/
rcestate->es_direction = ForwardScanDirection;
rcestate->es_snapshot = parentestate->es_snapshot;
@@ -2936,6 +2995,12 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
PlanState *subplanstate;
subplanstate = ExecInitNode(subplan, rcestate, 0);
+
+ /*
+ * At this point, we had better not have received any new invalidation
+ * messages that would have caused the plan tree to go stale.
+ */
+ Assert(ExecPlanStillValid(rcestate) && subplanstate);
rcestate->es_subplanstates = lappend(rcestate->es_subplanstates,
subplanstate);
}
@@ -2979,6 +3044,12 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
*/
epqstate->recheckplanstate = ExecInitNode(planTree, rcestate, 0);
+ /*
+ * At this point, we had better not have received any new invalidation messages
+ * that would have caused the plan tree to go stale.
+ */
+ Assert(ExecPlanStillValid(rcestate) && epqstate->recheckplanstate);
+
MemoryContextSwitchTo(oldcontext);
}
@@ -3001,6 +3072,10 @@ EvalPlanQualEnd(EPQState *epqstate)
MemoryContext oldcontext;
ListCell *l;
+ /* Nothing to do if EvalPlanQualInit() wasn't done to begin with. */
+ if (epqstate->parentestate == NULL)
+ return;
+
rtsize = epqstate->parentestate->es_range_table_size;
/*
@@ -3021,13 +3096,16 @@ EvalPlanQualEnd(EPQState *epqstate)
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
- ExecEndNode(epqstate->recheckplanstate);
-
- foreach(l, estate->es_subplanstates)
+ /*
+ * Shut down the node-type-specific query processing for all nodes that
+ * were initialized during EvalPlanQualStart(), both those in the main
+ * plan tree and those in subplans (es_subplanstates), if any.
+ */
+ foreach(l, estate->es_inited_plannodes)
{
- PlanState *subplanstate = (PlanState *) lfirst(l);
+ PlanState *planstate = (PlanState *) lfirst(l);
- ExecEndNode(subplanstate);
+ ExecEndNode(planstate);
}
/* throw away the per-estate tuple table, some node may have used it */
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index cc2b8ccab7..72f1511720 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1248,8 +1248,14 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(¶mspace);
- /* Create a QueryDesc for the query. */
+ /*
+ * Create a QueryDesc for the query. Note that no CachedPlan is available
+ * here, even though the leader may have gotten the plan tree from one.
+ * That's fine, because the leader will have taken all the locks needed
+ * for the plan tree that we have here to be fully valid.
+ */
return CreateQueryDesc(pstmt,
+ NULL,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
@@ -1431,6 +1437,7 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Start up the executor */
queryDesc->plannedstmt->jitFlags = fpes->jit_flags;
ExecutorStart(queryDesc, fpes->eflags);
+ Assert(queryDesc->plan_valid);
/* Special executor initialization steps for parallel workers */
queryDesc->planstate->state->es_query_dsa = area;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index eb8a87fd63..cf73d28baa 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -513,6 +513,13 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
oldcxt = MemoryContextSwitchTo(proute->memcxt);
+ /*
+ * Note that while we normally check ExecPlanStillValid(estate) after each
+ * lock taken during execution initialization, it is fine not to do so for
+ * partitions opened here for tuple routing. Locks taken here can't
+ * possibly invalidate the plan given that the plan doesn't contain any
+ * info about those partitions.
+ */
partrel = table_open(partOid, RowExclusiveLock);
leaf_part_rri = makeNode(ResultRelInfo);
@@ -1111,6 +1118,9 @@ ExecInitPartitionDispatchInfo(EState *estate,
* Only sub-partitioned tables need to be locked here. The root
* partitioned table will already have been locked as it's referenced in
* the query's rtable.
+ *
+ * See the comment in ExecInitPartitionInfo() about taking locks and
+ * not checking ExecPlanStillValid(estate) here.
*/
if (partoid != RelationGetRelid(proute->partition_root))
rel = table_open(partoid, RowExclusiveLock);
@@ -1801,6 +1811,8 @@ ExecInitPartitionPruning(PlanState *planstate,
/* Create the working data structure for pruning */
prunestate = CreatePartitionPruneState(planstate, pruneinfo);
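+ /* CreatePartitionPruneState() may have taken locks, invalidating the plan. */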
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Perform an initial partition prune pass, if required.
@@ -1927,6 +1939,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* duration of this executor run.
*/
partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
partkey = RelationGetPartitionKey(partrel);
partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
partrel);
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 4d288bc8d4..f3bb1d4591 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -135,7 +135,17 @@ static bool ExecShutdownNode_walker(PlanState *node, void *context);
* 'estate' is the shared execution state for the plan tree
* 'eflags' is a bitwise OR of flag bits described in executor.h
*
- * Returns a PlanState node corresponding to the given Plan node.
+ * Returns a PlanState node corresponding to the given Plan node, or NULL.
+ *
+ * NULL is returned either if the input node is NULL or if the plan tree
+ * that the node is a part of is found to have been invalidated while
+ * taking a lock on the relation mentioned in the node or in a child
+ * node. The latter case arises if the plan tree contains inheritance/
+ * partition child tables and comes from a CachedPlan.
+ *
+ * Also, all non-NULL PlanState nodes are added to
+ * estate->es_inited_plannodes, which ExecEndPlan() iterates over to
+ * close each one using ExecEndNode().
* ------------------------------------------------------------------------
*/
PlanState *
@@ -388,6 +398,13 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
break;
}
+ if (!ExecPlanStillValid(estate))
+ {
+ Assert(result == NULL);
+ return NULL;
+ }
+
+ Assert(result != NULL);
ExecSetExecProcNode(result, result->ExecProcNode);
/*
@@ -411,6 +428,13 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
result->instrument = InstrAlloc(1, estate->es_instrument,
result->async_capable);
+ /*
+ * Remember valid PlanState nodes in EState for the processing in
+ * ExecEndPlan().
+ */
+ estate->es_inited_plannodes = lappend(estate->es_inited_plannodes,
+ result);
+
return result;
}
@@ -545,29 +569,21 @@ MultiExecProcNode(PlanState *node)
/* ----------------------------------------------------------------
* ExecEndNode
*
- * Recursively cleans up all the nodes in the plan rooted
- * at 'node'.
+ * Cleans up node
*
- * After this operation, the query plan will not be able to be
- * processed any further. This should be called only after
+ * Child nodes, if any, are closed separately by the caller, so the
+ * ExecEnd* routine for a given node type is only responsible for
+ * cleaning up the resources local to that node.
+ *
+ * After this operation, the query plan containing this node will not be
+ * able to be processed any further. This should be called only after
* the query plan has been fully executed.
* ----------------------------------------------------------------
*/
void
ExecEndNode(PlanState *node)
{
- /*
- * do nothing when we get to the end of a leaf on tree.
- */
- if (node == NULL)
- return;
-
- /*
- * Make sure there's enough stack available. Need to check here, in
- * addition to ExecProcNode() (via ExecProcNodeFirst()), because it's not
- * guaranteed that ExecProcNode() is reached for all nodes.
- */
- check_stack_depth();
+ Assert(node != NULL);
if (node->chgParam != NULL)
{
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 4758ab4132..599db4d597 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -804,7 +804,8 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rte->rtekind == RTE_RELATION);
- if (!IsParallelWorker())
+ if (!IsParallelWorker() &&
+ (estate->es_top_eflags & EXEC_FLAG_GET_LOCKS) == 0)
{
/*
* In a normal query, we should already have the appropriate lock,
@@ -820,9 +821,11 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
else
{
/*
- * If we are a parallel worker, we need to obtain our own local
- * lock on the relation. This ensures sane behavior in case the
- * parent process exits before we do.
+ * Take a lock if we are a parallel worker or if the caller has set
+ * the EXEC_FLAG_GET_LOCKS flag (done by callers that open child
+ * relations when initializing a cached plan). Parallel workers
+ * need their own local lock on the relation; this ensures sane
+ * behavior in case the parent process exits before we do.
*/
rel = table_open(rte->relid, rte->rellockmode);
}
@@ -833,6 +836,58 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
return rel;
}
+/*
+ * ExecLockElidedAppendChildRelations
+ * Lock child relations whose parent Append/MergeAppend node was removed
+ * by the planner
+ */
+void
+ExecLockElidedAppendChildRelations(EState *estate, List *elidedAppendChildRels)
+{
+ ListCell *l;
+
+ foreach(l, elidedAppendChildRels)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(lfirst_int(l), estate);
+
+ Assert(rte->rtekind == RTE_RELATION);
+ Assert(OidIsValid(rte->relid));
+ Assert(rte->rellockmode != NoLock);
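+ /* Lock with the mode the parser/planner recorded in the RTE. */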
+ LockRelationOid(rte->relid, rte->rellockmode);
+ }
+}
+
+/*
+ * ExecLockAppendNonLeafRelations
+ * Lock non-leaf relations whose children are scanned by a given
+ * Append/MergeAppend node
+ */
+void
+ExecLockAppendNonLeafRelations(EState *estate, List *allpartrelids)
+{
+ ListCell *l;
+
+ foreach(l, allpartrelids)
+ {
+ Bitmapset *partrelids = lfirst_node(Bitmapset, l);
+ int i;
+
+ /*
+ * XXX - there is really no need to lock the first member of each
+ * bitmapset, because it stands for the root parent mentioned in the
+ * query, which should always have been locked before entering the
+ * executor.
+ */
+ i = -1;
+ while ((i = bms_next_member(partrelids, i)) > 0)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(i, estate);
+
+ LockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
+}
+
/*
* ExecInitResultRelation
* Open relation given by the passed-in RT index and fill its
@@ -848,6 +903,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
Relation resultRelationDesc;
resultRelationDesc = ExecGetRangeTableRelation(estate, rti);
+ if (!ExecPlanStillValid(estate))
+ return;
InitResultRelInfo(resultRelInfo,
resultRelationDesc,
rti,
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index f55424eb5a..c88f72bc4e 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -838,6 +838,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest = None_Receiver;
es->qd = CreateQueryDesc(es->stmt,
+ NULL, /* fmgr_sql() doesn't use CachedPlans */
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -863,6 +864,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
eflags = 0; /* default run-to-completion flags */
ExecutorStart(es->qd, eflags);
+ Assert(es->qd->plan_valid);
}
es->status = F_EXEC_RUN;
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 468db94fe5..54f742820b 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -3304,6 +3304,8 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
eflags &= ~EXEC_FLAG_REWIND;
outerPlan = outerPlan(node);
outerPlanState(aggstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* initialize source tuple type.
@@ -4304,7 +4306,6 @@ GetAggInitVal(Datum textInitVal, Oid transtype)
void
ExecEndAgg(AggState *node)
{
- PlanState *outerPlan;
int transno;
int numGroupingSets = Max(node->maxsets, 1);
int setno;
@@ -4366,9 +4367,6 @@ ExecEndAgg(AggState *node)
/* clean up tuple table */
ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
}
void
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 609df6b9e6..bd81d0ca4b 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -133,6 +133,24 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_syncdone = false;
appendstate->as_begun = false;
+ /*
+ * Must take locks on child tables if running a cached plan, because
+ * GetCachedPlan() would've only locked the root parent named in the
+ * query.
+ *
+ * First, lock the non-leaf partitions, before doing any pruning. Even
+ * when no pruning is to be done, non-leaf partitions must still be
+ * locked explicitly like this, because they're not referenced elsewhere
+ * in the plan tree.
+ */
+ if (estate->es_cachedplan)
+ {
+ ExecLockAppendNonLeafRelations(estate, node->allpartrelids);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
+ }
+
/* If run-time partition pruning is enabled, then set that up now */
if (node->part_prune_info != NULL)
{
@@ -147,6 +165,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
list_length(node->appendplans),
node->part_prune_info,
&validsubplans);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
@@ -188,6 +208,13 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendplanstates = (PlanState **) palloc(nplans *
sizeof(PlanState *));
+ /*
+ * Set EXEC_FLAG_GET_LOCKS in es_top_eflags so that ExecInitNode()
+ * recursively locks the child relations appearing in appendplans.
+ */
+ if (estate->es_cachedplan)
+ estate->es_top_eflags |= EXEC_FLAG_GET_LOCKS;
+
/*
* call ExecInitNode on each of the valid plans to be executed and save
* the results into the appendplanstates array.
@@ -221,8 +248,12 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
firstvalid = j;
appendplanstates[j++] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
}
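+
+ /*
+ * All children are initialized; ExecGetRangeTableRelation() no longer
+ * needs to take locks.
+ */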
+ estate->es_top_eflags &= ~EXEC_FLAG_GET_LOCKS;
+
appendstate->as_first_partial_plan = firstvalid;
appendstate->appendplans = appendplanstates;
appendstate->as_nplans = nplans;
@@ -376,30 +407,15 @@ ExecAppend(PlanState *pstate)
/* ----------------------------------------------------------------
* ExecEndAppend
- *
- * Shuts down the subscans of the append node.
- *
- * Returns nothing of interest.
* ----------------------------------------------------------------
*/
void
ExecEndAppend(AppendState *node)
{
- PlanState **appendplans;
- int nplans;
- int i;
-
- /*
- * get information from the node
- */
- appendplans = node->appendplans;
- nplans = node->as_nplans;
-
- /*
- * shut down each of the subscans
- */
- for (i = 0; i < nplans; i++)
- ExecEndNode(appendplans[i]);
+ /*
+ * Nothing to do here; the subscans of the Append node will be cleaned
+ * up by ExecEndPlan().
+ */
}
void
diff --git a/src/backend/executor/nodeBitmapAnd.c b/src/backend/executor/nodeBitmapAnd.c
index 4c5eb2b23b..187aea4bb8 100644
--- a/src/backend/executor/nodeBitmapAnd.c
+++ b/src/backend/executor/nodeBitmapAnd.c
@@ -88,8 +88,9 @@ ExecInitBitmapAnd(BitmapAnd *node, EState *estate, int eflags)
foreach(l, node->bitmapplans)
{
initNode = (Plan *) lfirst(l);
- bitmapplanstates[i] = ExecInitNode(initNode, estate, eflags);
- i++;
+ bitmapplanstates[i++] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
}
/*
@@ -168,33 +169,15 @@ MultiExecBitmapAnd(BitmapAndState *node)
/* ----------------------------------------------------------------
* ExecEndBitmapAnd
- *
- * Shuts down the subscans of the BitmapAnd node.
- *
- * Returns nothing of interest.
* ----------------------------------------------------------------
*/
void
ExecEndBitmapAnd(BitmapAndState *node)
{
- PlanState **bitmapplans;
- int nplans;
- int i;
-
- /*
- * get information from the node
- */
- bitmapplans = node->bitmapplans;
- nplans = node->nplans;
-
- /*
- * shut down each of the subscans (that we've initialized)
- */
- for (i = 0; i < nplans; i++)
- {
- if (bitmapplans[i])
- ExecEndNode(bitmapplans[i]);
- }
+ /*
+ * Nothing to do here; any subscans that were initialized will be
+ * cleaned up by ExecEndPlan().
+ */
}
void
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index f35df0b8bf..ee1008519b 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -667,11 +667,6 @@ ExecEndBitmapHeapScan(BitmapHeapScanState *node)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
ExecClearTuple(node->ss.ss_ScanTupleSlot);
- /*
- * close down subplans
- */
- ExecEndNode(outerPlanState(node));
-
/*
* release bitmaps and buffers if any
*/
@@ -763,11 +758,15 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* initialize child nodes
*/
outerPlanState(scanstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* get the scan type from the relation descriptor.
diff --git a/src/backend/executor/nodeBitmapIndexscan.c b/src/backend/executor/nodeBitmapIndexscan.c
index 83ec9ede89..99015812a1 100644
--- a/src/backend/executor/nodeBitmapIndexscan.c
+++ b/src/backend/executor/nodeBitmapIndexscan.c
@@ -211,6 +211,7 @@ BitmapIndexScanState *
ExecInitBitmapIndexScan(BitmapIndexScan *node, EState *estate, int eflags)
{
BitmapIndexScanState *indexstate;
+ Relation indexRelation;
LOCKMODE lockmode;
/* check for unsupported flags */
@@ -262,7 +263,13 @@ ExecInitBitmapIndexScan(BitmapIndexScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
- indexstate->biss_RelationDesc = index_open(node->indexid, lockmode);
+ indexRelation = index_open(node->indexid, lockmode);
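+ /* The lock taken by index_open() may have invalidated a cached plan. */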
+ if (!ExecPlanStillValid(estate))
+ {
+ index_close(indexRelation, lockmode);
+ return NULL;
+ }
+ indexstate->biss_RelationDesc = indexRelation;
/*
* Initialize index-specific scan state
diff --git a/src/backend/executor/nodeBitmapOr.c b/src/backend/executor/nodeBitmapOr.c
index 0bf8af9652..3f51918fe1 100644
--- a/src/backend/executor/nodeBitmapOr.c
+++ b/src/backend/executor/nodeBitmapOr.c
@@ -89,8 +89,9 @@ ExecInitBitmapOr(BitmapOr *node, EState *estate, int eflags)
foreach(l, node->bitmapplans)
{
initNode = (Plan *) lfirst(l);
- bitmapplanstates[i] = ExecInitNode(initNode, estate, eflags);
- i++;
+ bitmapplanstates[i++] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
}
/*
@@ -186,33 +187,15 @@ MultiExecBitmapOr(BitmapOrState *node)
/* ----------------------------------------------------------------
* ExecEndBitmapOr
- *
- * Shuts down the subscans of the BitmapOr node.
- *
- * Returns nothing of interest.
* ----------------------------------------------------------------
*/
void
ExecEndBitmapOr(BitmapOrState *node)
{
- PlanState **bitmapplans;
- int nplans;
- int i;
-
- /*
- * get information from the node
- */
- bitmapplans = node->bitmapplans;
- nplans = node->nplans;
-
- /*
- * shut down each of the subscans (that we've initialized)
- */
- for (i = 0; i < nplans; i++)
- {
- if (bitmapplans[i])
- ExecEndNode(bitmapplans[i]);
- }
+ /*
+ * Nothing to do here; any subscans that were initialized will be
+ * cleaned up by ExecEndPlan().
+ */
}
void
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
index bd42c65b29..91239cc500 100644
--- a/src/backend/executor/nodeCustom.c
+++ b/src/backend/executor/nodeCustom.c
@@ -61,6 +61,8 @@ ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
if (scanrelid > 0)
{
scan_rel = ExecOpenScanRelation(estate, scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
css->ss.ss_currentRelation = scan_rel;
}
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index c2139acca0..207165f44f 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -173,6 +173,8 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
if (scanrelid > 0)
{
currentRelation = ExecOpenScanRelation(estate, scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
scanstate->ss.ss_currentRelation = currentRelation;
fdwroutine = GetFdwRoutineForRelation(currentRelation, true);
}
@@ -264,6 +266,8 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
if (outerPlan(node))
outerPlanState(scanstate) =
ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Tell the FDW to initialize the scan.
@@ -309,10 +313,6 @@ ExecEndForeignScan(ForeignScanState *node)
else
node->fdwroutine->EndForeignScan(node);
- /* Shut down any outer plan. */
- if (outerPlanState(node))
- ExecEndNode(outerPlanState(node));
-
/* Free the exprcontext */
ExecFreeExprContext(&node->ss.ps);
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 307fc10eea..400c8b42ed 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -89,6 +89,9 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
*/
outerNode = outerPlan(node);
outerPlanState(gatherstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
+
tupDesc = ExecGetResultType(outerPlanState(gatherstate));
/*
@@ -248,7 +251,6 @@ ExecGather(PlanState *pstate)
void
ExecEndGather(GatherState *node)
{
- ExecEndNode(outerPlanState(node)); /* let children clean up first */
ExecShutdownGather(node);
ExecFreeExprContext(&node->ps);
if (node->ps.ps_ResultTupleSlot)
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 9d5e1a46e9..9077c4bc55 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -108,6 +108,8 @@ ExecInitGatherMerge(GatherMerge *node, EState *estate, int eflags)
*/
outerNode = outerPlan(node);
outerPlanState(gm_state) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Leader may access ExecProcNode result directly (if
@@ -288,7 +290,6 @@ ExecGatherMerge(PlanState *pstate)
void
ExecEndGatherMerge(GatherMergeState *node)
{
- ExecEndNode(outerPlanState(node)); /* let children clean up first */
ExecShutdownGatherMerge(node);
ExecFreeExprContext(&node->ps);
if (node->ps.ps_ResultTupleSlot)
diff --git a/src/backend/executor/nodeGroup.c b/src/backend/executor/nodeGroup.c
index 25a1618952..976e739ab7 100644
--- a/src/backend/executor/nodeGroup.c
+++ b/src/backend/executor/nodeGroup.c
@@ -185,6 +185,8 @@ ExecInitGroup(Group *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(grpstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize scan slot and type.
@@ -226,15 +228,10 @@ ExecInitGroup(Group *node, EState *estate, int eflags)
void
ExecEndGroup(GroupState *node)
{
- PlanState *outerPlan;
-
ExecFreeExprContext(&node->ss.ps);
/* clean up tuple table */
ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
}
void
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 8b5c35b82b..fc7a6b2ccc 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -386,6 +386,8 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(hashstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* initialize our result slot and type. No need to build projection
@@ -413,18 +415,10 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
void
ExecEndHash(HashState *node)
{
- PlanState *outerPlan;
-
/*
* free exprcontext
*/
ExecFreeExprContext(&node->ps);
-
- /*
- * shut down the subplan
- */
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
}
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 980746128b..4c4b39ce2d 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -752,8 +752,12 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
hashNode = (Hash *) innerPlan(node);
outerPlanState(hjstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
outerDesc = ExecGetResultType(outerPlanState(hjstate));
innerPlanState(hjstate) = ExecInitNode((Plan *) hashNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
innerDesc = ExecGetResultType(innerPlanState(hjstate));
/*
@@ -878,12 +882,6 @@ ExecEndHashJoin(HashJoinState *node)
ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
ExecClearTuple(node->hj_OuterTupleSlot);
ExecClearTuple(node->hj_HashTupleSlot);
-
- /*
- * clean up subtrees
- */
- ExecEndNode(outerPlanState(node));
- ExecEndNode(innerPlanState(node));
}
/*
diff --git a/src/backend/executor/nodeIncrementalSort.c b/src/backend/executor/nodeIncrementalSort.c
index 34257ce34b..8dfb2cb0f6 100644
--- a/src/backend/executor/nodeIncrementalSort.c
+++ b/src/backend/executor/nodeIncrementalSort.c
@@ -1041,6 +1041,8 @@ ExecInitIncrementalSort(IncrementalSort *node, EState *estate, int eflags)
* nodes may be able to do something more useful.
*/
outerPlanState(incrsortstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize scan slot and type.
@@ -1101,11 +1103,6 @@ ExecEndIncrementalSort(IncrementalSortState *node)
node->prefixsort_state = NULL;
}
- /*
- * Shut down the subplan.
- */
- ExecEndNode(outerPlanState(node));
-
SO_printf("ExecEndIncrementalSort: sort node shutdown\n");
}
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 0b43a9b969..ea8bef4b97 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -490,6 +490,7 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
{
IndexOnlyScanState *indexstate;
Relation currentRelation;
+ Relation indexRelation;
LOCKMODE lockmode;
TupleDesc tupDesc;
@@ -512,6 +513,8 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
indexstate->ss.ss_currentRelation = currentRelation;
indexstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
@@ -564,7 +567,13 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
- indexstate->ioss_RelationDesc = index_open(node->indexid, lockmode);
+ indexRelation = index_open(node->indexid, lockmode);
+ if (!ExecPlanStillValid(estate))
+ {
+ index_close(indexRelation, lockmode);
+ return NULL;
+ }
+ indexstate->ioss_RelationDesc = indexRelation;
/*
* Initialize index-specific scan state
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 4540c7781d..956e9e5543 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -904,6 +904,7 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
{
IndexScanState *indexstate;
Relation currentRelation;
+ Relation indexRelation;
LOCKMODE lockmode;
/*
@@ -925,6 +926,8 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
indexstate->ss.ss_currentRelation = currentRelation;
indexstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
@@ -969,7 +972,13 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
- indexstate->iss_RelationDesc = index_open(node->indexid, lockmode);
+ indexRelation = index_open(node->indexid, lockmode);
+ if (!ExecPlanStillValid(estate))
+ {
+ index_close(indexRelation, lockmode);
+ return NULL;
+ }
+ indexstate->iss_RelationDesc = indexRelation;
/*
* Initialize index-specific scan state
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index 425fbfc405..1cc884bc65 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -476,6 +476,8 @@ ExecInitLimit(Limit *node, EState *estate, int eflags)
*/
outerPlan = outerPlan(node);
outerPlanState(limitstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* initialize child expressions
@@ -535,7 +537,6 @@ void
ExecEndLimit(LimitState *node)
{
ExecFreeExprContext(&node->ps);
- ExecEndNode(outerPlanState(node));
}
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index e459971d32..77731c0c8c 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -322,6 +322,8 @@ ExecInitLockRows(LockRows *node, EState *estate, int eflags)
* then initialize outer plan
*/
outerPlanState(lrstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/* node returns unmodified slots from the outer plan */
lrstate->ps.resultopsset = true;
@@ -386,7 +388,6 @@ ExecEndLockRows(LockRowsState *node)
{
/* We may have shut down EPQ already, but no harm in another call */
EvalPlanQualEnd(&node->lr_epqstate);
- ExecEndNode(outerPlanState(node));
}
diff --git a/src/backend/executor/nodeMaterial.c b/src/backend/executor/nodeMaterial.c
index 09632678b0..a38b9805a5 100644
--- a/src/backend/executor/nodeMaterial.c
+++ b/src/backend/executor/nodeMaterial.c
@@ -214,6 +214,8 @@ ExecInitMaterial(Material *node, EState *estate, int eflags)
outerPlan = outerPlan(node);
outerPlanState(matstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize result type and slot. No need to initialize projection info
@@ -250,11 +252,6 @@ ExecEndMaterial(MaterialState *node)
if (node->tuplestorestate != NULL)
tuplestore_end(node->tuplestorestate);
node->tuplestorestate = NULL;
-
- /*
- * shut down the subplan
- */
- ExecEndNode(outerPlanState(node));
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeMemoize.c b/src/backend/executor/nodeMemoize.c
index 4f04269e26..a8997ba7da 100644
--- a/src/backend/executor/nodeMemoize.c
+++ b/src/backend/executor/nodeMemoize.c
@@ -938,6 +938,8 @@ ExecInitMemoize(Memoize *node, EState *estate, int eflags)
outerNode = outerPlan(node);
outerPlanState(mstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize return slot and type. No need to initialize projection info
@@ -1099,11 +1101,6 @@ ExecEndMemoize(MemoizeState *node)
* free exprcontext
*/
ExecFreeExprContext(&node->ss.ps);
-
- /*
- * shut down the subplan
- */
- ExecEndNode(outerPlanState(node));
}
void
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 21b5726e6e..06a4827e00 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -81,6 +81,24 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.state = estate;
mergestate->ps.ExecProcNode = ExecMergeAppend;
+ /*
+ * Must take locks on child tables if running a cached plan, because
+ * GetCachedPlan() would've only locked the root parent named in the
+ * query.
+ *
+ * First, lock the non-leaf partitions, before doing any pruning. Even
+ * when no pruning is to be done, non-leaf partitions must still be
+ * locked explicitly like this, because they're not referenced elsewhere
+ * in the plan tree.
+ */
+ if (estate->es_cachedplan)
+ {
+ ExecLockAppendNonLeafRelations(estate, node->allpartrelids);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
+ }
+
/* If run-time partition pruning is enabled, then set that up now */
if (node->part_prune_info != NULL)
{
@@ -95,6 +113,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
list_length(node->mergeplans),
node->part_prune_info,
&validsubplans);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
@@ -140,6 +160,13 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.resultopsset = true;
mergestate->ps.resultopsfixed = false;
+ /*
+ * Set EXEC_FLAG_GET_LOCKS in es_top_eflags so that ExecInitNode()
+ * recursively locks the child relations appearing in mergeplans.
+ */
+ if (estate->es_cachedplan)
+ estate->es_top_eflags |= EXEC_FLAG_GET_LOCKS;
+
/*
* call ExecInitNode on each of the valid plans to be executed and save
* the results into the mergeplanstates array.
@@ -151,8 +178,12 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
Plan *initNode = (Plan *) list_nth(node->mergeplans, i);
mergeplanstates[j++] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
}
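+
+ /*
+ * All children are initialized; ExecGetRangeTableRelation() no longer
+ * needs to take locks.
+ */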
+ estate->es_top_eflags &= ~EXEC_FLAG_GET_LOCKS;
+
mergestate->ps.ps_ProjInfo = NULL;
/*
@@ -310,30 +341,14 @@ heap_compare_slots(Datum a, Datum b, void *arg)
/* ----------------------------------------------------------------
* ExecEndMergeAppend
- *
- * Shuts down the subscans of the MergeAppend node.
- *
- * Returns nothing of interest.
* ----------------------------------------------------------------
*/
void
ExecEndMergeAppend(MergeAppendState *node)
{
- PlanState **mergeplans;
- int nplans;
- int i;
-
- /*
- * get information from the node
- */
- mergeplans = node->mergeplans;
- nplans = node->ms_nplans;
-
- /*
- * shut down each of the subscans
- */
- for (i = 0; i < nplans; i++)
- ExecEndNode(mergeplans[i]);
+ /*
+ * Nothing to do here; the subscans will be cleaned up by ExecEndPlan().
+ */
}
void
diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 00f96d045e..c6644c6816 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -1490,11 +1490,15 @@ ExecInitMergeJoin(MergeJoin *node, EState *estate, int eflags)
mergestate->mj_SkipMarkRestore = node->skip_mark_restore;
outerPlanState(mergestate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
outerDesc = ExecGetResultType(outerPlanState(mergestate));
innerPlanState(mergestate) = ExecInitNode(innerPlan(node), estate,
mergestate->mj_SkipMarkRestore ?
eflags :
(eflags | EXEC_FLAG_MARK));
+ if (!ExecPlanStillValid(estate))
+ return NULL;
innerDesc = ExecGetResultType(innerPlanState(mergestate));
/*
@@ -1654,12 +1658,6 @@ ExecEndMergeJoin(MergeJoinState *node)
ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
ExecClearTuple(node->mj_MarkedTupleSlot);
- /*
- * shut down the subplans
- */
- ExecEndNode(innerPlanState(node));
- ExecEndNode(outerPlanState(node));
-
MJ1_printf("ExecEndMergeJoin: %s\n",
"node processing ended");
}
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 2a5fec8d01..0c3aeb1154 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -3984,6 +3984,9 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
linitial_int(node->resultRelations));
}
+ if (!ExecPlanStillValid(estate))
+ return NULL;
+
/* set up epqstate with dummy subplan data for the moment */
EvalPlanQualInit(&mtstate->mt_epqstate, estate, NULL, NIL,
node->epqParam, node->resultRelations);
@@ -4011,6 +4014,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
if (resultRelInfo != mtstate->rootResultRelInfo)
{
ExecInitResultRelation(estate, resultRelInfo, resultRelation);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* For child result relations, store the root result relation
@@ -4038,6 +4043,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* Now we may initialize the subplan.
*/
outerPlanState(mtstate) = ExecInitNode(subplan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Do additional per-result-relation initialization.
@@ -4460,11 +4467,6 @@ ExecEndModifyTable(ModifyTableState *node)
* Terminate EPQ execution if active
*/
EvalPlanQualEnd(&node->mt_epqstate);
-
- /*
- * shut down subplan
- */
- ExecEndNode(outerPlanState(node));
}
void
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index b3d52e69ec..71a1f8101c 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -295,11 +295,15 @@ ExecInitNestLoop(NestLoop *node, EState *estate, int eflags)
* values.
*/
outerPlanState(nlstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
if (node->nestParams == NIL)
eflags |= EXEC_FLAG_REWIND;
else
eflags &= ~EXEC_FLAG_REWIND;
innerPlanState(nlstate) = ExecInitNode(innerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize result slot, type and projection.
@@ -374,12 +378,6 @@ ExecEndNestLoop(NestLoopState *node)
*/
ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
- /*
- * close down subplans
- */
- ExecEndNode(outerPlanState(node));
- ExecEndNode(innerPlanState(node));
-
NL1_printf("ExecEndNestLoop: %s\n",
"node processing ended");
}
diff --git a/src/backend/executor/nodeProjectSet.c b/src/backend/executor/nodeProjectSet.c
index f6ff3dc44c..abcbd7e765 100644
--- a/src/backend/executor/nodeProjectSet.c
+++ b/src/backend/executor/nodeProjectSet.c
@@ -247,6 +247,8 @@ ExecInitProjectSet(ProjectSet *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(state) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* we don't use inner plan
@@ -329,11 +331,6 @@ ExecEndProjectSet(ProjectSetState *node)
* clean out the tuple table
*/
ExecClearTuple(node->ps.ps_ResultTupleSlot);
-
- /*
- * shut down subplans
- */
- ExecEndNode(outerPlanState(node));
}
void
diff --git a/src/backend/executor/nodeRecursiveunion.c b/src/backend/executor/nodeRecursiveunion.c
index e781003934..84a706458a 100644
--- a/src/backend/executor/nodeRecursiveunion.c
+++ b/src/backend/executor/nodeRecursiveunion.c
@@ -244,7 +244,11 @@ ExecInitRecursiveUnion(RecursiveUnion *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(rustate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
innerPlanState(rustate) = ExecInitNode(innerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* If hashing, precompute fmgr lookup data for inner loop, and create the
@@ -280,12 +284,6 @@ ExecEndRecursiveUnion(RecursiveUnionState *node)
MemoryContextDelete(node->tempContext);
if (node->tableContext)
MemoryContextDelete(node->tableContext);
-
- /*
- * close down subplans
- */
- ExecEndNode(outerPlanState(node));
- ExecEndNode(innerPlanState(node));
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeResult.c b/src/backend/executor/nodeResult.c
index 4219712d30..330ca68d12 100644
--- a/src/backend/executor/nodeResult.c
+++ b/src/backend/executor/nodeResult.c
@@ -208,6 +208,8 @@ ExecInitResult(Result *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(resstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* we don't use inner plan
@@ -249,11 +251,6 @@ ExecEndResult(ResultState *node)
* clean out the tuple table
*/
ExecClearTuple(node->ps.ps_ResultTupleSlot);
-
- /*
- * shut down subplans
- */
- ExecEndNode(outerPlanState(node));
}
void
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index d7e22b1dbb..22357e7a0e 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -125,6 +125,8 @@ ExecInitSampleScan(SampleScan *node, EState *estate, int eflags)
ExecOpenScanRelation(estate,
node->scan.scanrelid,
eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/* we won't set up the HeapScanDesc till later */
scanstate->ss.ss_currentScanDesc = NULL;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 4da0f28f7b..b0b34cd14e 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -153,6 +153,8 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
ExecOpenScanRelation(estate,
node->scan.scanrelid,
eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/* and create slot with the appropriate rowtype */
ExecInitScanTupleSlot(estate, &scanstate->ss,
diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index 4bc2406b89..912cf7b37f 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -528,6 +528,8 @@ ExecInitSetOp(SetOp *node, EState *estate, int eflags)
if (node->strategy == SETOP_HASHED)
eflags &= ~EXEC_FLAG_REWIND;
outerPlanState(setopstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
outerDesc = ExecGetResultType(outerPlanState(setopstate));
/*
@@ -589,8 +591,6 @@ ExecEndSetOp(SetOpState *node)
if (node->tableContext)
MemoryContextDelete(node->tableContext);
ExecFreeExprContext(&node->ps);
-
- ExecEndNode(outerPlanState(node));
}
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index c6c72c6e67..1ba53373c2 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -263,6 +263,8 @@ ExecInitSort(Sort *node, EState *estate, int eflags)
eflags &= ~(EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK);
outerPlanState(sortstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize scan slot and type.
@@ -317,11 +319,6 @@ ExecEndSort(SortState *node)
tuplesort_end((Tuplesortstate *) node->tuplesortstate);
node->tuplesortstate = NULL;
- /*
- * shut down the subplan
- */
- ExecEndNode(outerPlanState(node));
-
SO1_printf("ExecEndSort: %s\n",
"sort node shutdown");
}
diff --git a/src/backend/executor/nodeSubqueryscan.c b/src/backend/executor/nodeSubqueryscan.c
index 42471bfc04..12014250ae 100644
--- a/src/backend/executor/nodeSubqueryscan.c
+++ b/src/backend/executor/nodeSubqueryscan.c
@@ -124,6 +124,8 @@ ExecInitSubqueryScan(SubqueryScan *node, EState *estate, int eflags)
* initialize subquery
*/
subquerystate->subplan = ExecInitNode(node->subplan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize scan slot and type (needed by ExecAssignScanProjectionInfo)
@@ -178,11 +180,6 @@ ExecEndSubqueryScan(SubqueryScanState *node)
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
- /*
- * close down subquery
- */
- ExecEndNode(node->subplan);
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 2124c55ef5..613b377c7c 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -386,6 +386,8 @@ ExecInitTidRangeScan(TidRangeScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
tidrangestate->ss.ss_currentRelation = currentRelation;
tidrangestate->ss.ss_currentScanDesc = NULL; /* no table scan here */
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index 862bd0330b..1b0a2d8083 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -529,6 +529,8 @@ ExecInitTidScan(TidScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
tidstate->ss.ss_currentRelation = currentRelation;
tidstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
diff --git a/src/backend/executor/nodeUnique.c b/src/backend/executor/nodeUnique.c
index 45035d74fa..bd71033622 100644
--- a/src/backend/executor/nodeUnique.c
+++ b/src/backend/executor/nodeUnique.c
@@ -136,6 +136,8 @@ ExecInitUnique(Unique *node, EState *estate, int eflags)
* then initialize outer plan
*/
outerPlanState(uniquestate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize result slot and type. Unique nodes do no projections, so
@@ -172,8 +174,6 @@ ExecEndUnique(UniqueState *node)
ExecClearTuple(node->ps.ps_ResultTupleSlot);
ExecFreeExprContext(&node->ps);
-
- ExecEndNode(outerPlanState(node));
}
diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 310ac23e3a..483f23da18 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -2458,6 +2458,8 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags)
*/
outerPlan = outerPlan(node);
outerPlanState(winstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* initialize source tuple type (which is also the tuple type that we'll
@@ -2681,7 +2683,6 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags)
void
ExecEndWindowAgg(WindowAggState *node)
{
- PlanState *outerPlan;
int i;
release_partition(node);
@@ -2713,9 +2714,6 @@ ExecEndWindowAgg(WindowAggState *node)
pfree(node->perfunc);
pfree(node->peragg);
-
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
}
/* -----------------
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 33975687b3..07b1f453e2 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -71,7 +71,7 @@ static int _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
static ParamListInfo _SPI_convert_params(int nargs, Oid *argtypes,
Datum *Values, const char *Nulls);
-static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount);
+static int _SPI_pquery(QueryDesc *queryDesc, uint64 tcount);
static void _SPI_error_callback(void *arg);
@@ -1623,6 +1623,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
_SPI_current->processed = 0;
_SPI_current->tuptable = NULL;
+replan:
/* Create the portal */
if (name == NULL || name[0] == '\0')
{
@@ -1766,7 +1767,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
}
/*
- * Start portal execution.
+ * Start portal execution. If the portal contains a cached plan and
+ * portal->plan_valid comes back false, the cached plan was found to
+ * have been invalidated while initializing one of the plan trees
+ * contained in it, so the portal must be recreated.
*/
PortalStart(portal, paramLI, 0, snapshot);
@@ -1775,6 +1779,12 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
/* Pop the error context stack */
error_context_stack = spierrcontext.previous;
+ if (!portal->plan_valid)
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
+
/* Pop the SPI stack */
_SPI_end_call(true);
@@ -2552,6 +2562,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
+replan:
cplan = GetCachedPlan(plansource, options->params,
plan_owner, _SPI_current->queryEnv);
@@ -2661,6 +2672,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
QueryDesc *qdesc;
Snapshot snap;
+ int eflags;
if (ActiveSnapshotSet())
snap = GetActiveSnapshot();
@@ -2668,14 +2680,32 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
snap = InvalidSnapshot;
qdesc = CreateQueryDesc(stmt,
+ cplan,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
options->params,
_SPI_current->queryEnv,
0);
- res = _SPI_pquery(qdesc, fire_triggers,
- canSetTag ? options->tcount : 0);
+
+ /* Select execution options */
+ if (fire_triggers)
+ eflags = 0; /* default run-to-completion flags */
+ else
+ eflags = EXEC_FLAG_SKIP_TRIGGERS;
+
+ ExecutorStart(qdesc, eflags);
+ if (!qdesc->plan_valid)
+ {
+ ExecutorFinish(qdesc);
+ ExecutorEnd(qdesc);
+ FreeQueryDesc(qdesc);
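+ /* Only a plan that came from a CachedPlan can be found invalid. */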
+ Assert(cplan);
+ ReleaseCachedPlan(cplan, plan_owner);
+ goto replan;
+ }
+
+ res = _SPI_pquery(qdesc, canSetTag ? options->tcount : 0);
FreeQueryDesc(qdesc);
}
else
@@ -2850,10 +2880,9 @@ _SPI_convert_params(int nargs, Oid *argtypes,
}
static int
-_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
+_SPI_pquery(QueryDesc *queryDesc, uint64 tcount)
{
int operation = queryDesc->operation;
- int eflags;
int res;
switch (operation)
@@ -2897,14 +2926,6 @@ _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
ResetUsage();
#endif
- /* Select execution options */
- if (fire_triggers)
- eflags = 0; /* default run-to-completion flags */
- else
- eflags = EXEC_FLAG_SKIP_TRIGGERS;
-
- ExecutorStart(queryDesc, eflags);
-
ExecutorRun(queryDesc, ForwardScanDirection, tcount, true);
_SPI_current->processed = queryDesc->estate->es_processed;
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index ee9b89a672..c807e9cdcc 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -27,6 +27,7 @@
#include "storage/procarray.h"
#include "storage/sinvaladt.h"
#include "utils/inval.h"
+#include "utils/lsyscache.h"
/*
@@ -364,6 +365,50 @@ CheckRelationLockedByMe(Relation relation, LOCKMODE lockmode, bool orstronger)
return false;
}
+/*
+ * CheckRelLockedByMe
+ *
+ * Returns true if current transaction holds a lock on the given relation of
+ * mode 'lockmode'. If 'orstronger' is true, a stronger lockmode is also OK.
+ * ("Stronger" is defined as "numerically higher", which is a bit
+ * semantically dubious but is OK for the purposes we use this for.)
+ */
+bool
+CheckRelLockedByMe(Oid relid, LOCKMODE lockmode, bool orstronger)
+{
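+ /* Shared relations are locked with InvalidOid as the lock tag's database ID. */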
+ Oid dbId = get_rel_relisshared(relid) ? InvalidOid : MyDatabaseId;
+ LOCKTAG tag;
+
+ SET_LOCKTAG_RELATION(tag, dbId, relid);
+
+ if (LockHeldByMe(&tag, lockmode))
+ return true;
+
+ if (orstronger)
+ {
+ LOCKMODE slockmode;
+
+ for (slockmode = lockmode + 1;
+ slockmode <= MaxLockMode;
+ slockmode++)
+ {
+ if (LockHeldByMe(&tag, slockmode))
+ {
+#ifdef NOT_USED
+ /* Sometimes this might be useful for debugging purposes */
+ elog(WARNING, "lock mode %s substituted for %s on relation %s",
+ GetLockmodeName(tag.locktag_lockmethodid, slockmode),
+ GetLockmodeName(tag.locktag_lockmethodid, lockmode),
+ get_rel_name(relid));
+#endif
+ return true;
+ }
+ }
+ }
+
+ return false;
+}
+
/*
* LockHasWaitersRelation
*
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 01b6cc1f7d..4931fb2da7 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1233,6 +1233,7 @@ exec_simple_query(const char *query_string)
* Start the portal. No parameters here.
*/
PortalStart(portal, NULL, 0, InvalidSnapshot);
+ Assert(portal->plan_valid);
/*
* Select the appropriate output format: text unless we are doing a
@@ -1737,6 +1738,7 @@ exec_bind_message(StringInfo input_message)
"commands ignored until end of transaction block"),
errdetail_abort()));
+replan:
/*
* Create the portal. Allow silent replacement of an existing portal only
* if the unnamed portal is specified.
@@ -2028,10 +2030,19 @@ exec_bind_message(StringInfo input_message)
PopActiveSnapshot();
/*
- * And we're ready to start portal execution.
+ * Start portal execution. If the portal contains a cached plan, it must
+ * be recreated if portal->plan_valid is false which tells that the cached
+ * plan was found to have been invalidated when initializing one of the
+ * plan trees contained in it.
*/
PortalStart(portal, params, 0, InvalidSnapshot);
+ if (!portal->plan_valid)
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
+
/*
* Apply the result format requests to the portal.
*/
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5565f200c3..09ee6069f9 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -19,6 +19,7 @@
#include "access/xact.h"
#include "commands/prepare.h"
+#include "executor/execdesc.h"
#include "executor/tstoreReceiver.h"
#include "miscadmin.h"
#include "pg_trace.h"
@@ -35,12 +36,6 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
- const char *sourceText,
- ParamListInfo params,
- QueryEnvironment *queryEnv,
- DestReceiver *dest,
- QueryCompletion *qc);
static void FillPortalStore(Portal portal, bool isTopLevel);
static uint64 RunFromStore(Portal portal, ScanDirection direction, uint64 count,
DestReceiver *dest);
@@ -65,6 +60,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +73,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->cplan = cplan; /* CachedPlan, if plan is from one */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -116,86 +113,6 @@ FreeQueryDesc(QueryDesc *qdesc)
}
-/*
- * ProcessQuery
- * Execute a single plannable query within a PORTAL_MULTI_QUERY,
- * PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
- *
- * plan: the plan tree for the query
- * sourceText: the source text of the query
- * params: any parameters needed
- * dest: where to send results
- * qc: where to store the command completion status data.
- *
- * qc may be NULL if caller doesn't want a status string.
- *
- * Must be called in a memory context that will be reset or deleted on
- * error; otherwise the executor's memory usage will be leaked.
- */
-static void
-ProcessQuery(PlannedStmt *plan,
- const char *sourceText,
- ParamListInfo params,
- QueryEnvironment *queryEnv,
- DestReceiver *dest,
- QueryCompletion *qc)
-{
- QueryDesc *queryDesc;
-
- /*
- * Create the QueryDesc object
- */
- queryDesc = CreateQueryDesc(plan, sourceText,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, 0);
-
- /*
- * Call ExecutorStart to prepare the plan for execution
- */
- ExecutorStart(queryDesc, 0);
-
- /*
- * Run the plan to completion.
- */
- ExecutorRun(queryDesc, ForwardScanDirection, 0, true);
-
- /*
- * Build command completion status data, if caller wants one.
- */
- if (qc)
- {
- switch (queryDesc->operation)
- {
- case CMD_SELECT:
- SetQueryCompletion(qc, CMDTAG_SELECT, queryDesc->estate->es_processed);
- break;
- case CMD_INSERT:
- SetQueryCompletion(qc, CMDTAG_INSERT, queryDesc->estate->es_processed);
- break;
- case CMD_UPDATE:
- SetQueryCompletion(qc, CMDTAG_UPDATE, queryDesc->estate->es_processed);
- break;
- case CMD_DELETE:
- SetQueryCompletion(qc, CMDTAG_DELETE, queryDesc->estate->es_processed);
- break;
- case CMD_MERGE:
- SetQueryCompletion(qc, CMDTAG_MERGE, queryDesc->estate->es_processed);
- break;
- default:
- SetQueryCompletion(qc, CMDTAG_UNKNOWN, queryDesc->estate->es_processed);
- break;
- }
- }
-
- /*
- * Now, we close down all the scans and free allocated resources.
- */
- ExecutorFinish(queryDesc);
- ExecutorEnd(queryDesc);
-
- FreeQueryDesc(queryDesc);
-}
-
/*
* ChoosePortalStrategy
* Select portal execution strategy given the intended statement list.
@@ -427,7 +344,8 @@ FetchStatementTargetList(Node *stmt)
* to be used for cursors).
*
* On return, portal is ready to accept PortalRun() calls, and the result
- * tupdesc (if any) is known.
+ * tupdesc (if any) is known, unless portal->plan_valid is set to false, in
+ * which case, the caller must retry after generating a new CachedPlan.
*/
void
PortalStart(Portal portal, ParamListInfo params,
@@ -435,10 +353,9 @@ PortalStart(Portal portal, ParamListInfo params,
{
Portal saveActivePortal;
ResourceOwner saveResourceOwner;
- MemoryContext savePortalContext;
MemoryContext oldContext;
QueryDesc *queryDesc;
- int myeflags;
+ int myeflags = 0;
Assert(PortalIsValid(portal));
Assert(portal->status == PORTAL_DEFINED);
@@ -448,15 +365,13 @@ PortalStart(Portal portal, ParamListInfo params,
*/
saveActivePortal = ActivePortal;
saveResourceOwner = CurrentResourceOwner;
- savePortalContext = PortalContext;
PG_TRY();
{
ActivePortal = portal;
if (portal->resowner)
CurrentResourceOwner = portal->resowner;
- PortalContext = portal->portalContext;
- oldContext = MemoryContextSwitchTo(PortalContext);
+ oldContext = MemoryContextSwitchTo(portal->queryContext);
/* Must remember portal param list, if any */
portal->portalParams = params;
@@ -472,6 +387,8 @@ PortalStart(Portal portal, ParamListInfo params,
switch (portal->strategy)
{
case PORTAL_ONE_SELECT:
+ case PORTAL_ONE_RETURNING:
+ case PORTAL_ONE_MOD_WITH:
/* Must set snapshot before starting executor. */
if (snapshot)
@@ -493,6 +410,7 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->cplan,
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -501,30 +419,52 @@ PortalStart(Portal portal, ParamListInfo params,
portal->queryEnv,
0);
+ /* Remember for PortalRunMulti(). */
+ if (portal->strategy == PORTAL_ONE_RETURNING ||
+ portal->strategy == PORTAL_ONE_MOD_WITH)
+ portal->qdescs = list_make1(queryDesc);
+
/*
* If it's a scrollable cursor, executor needs to support
* REWIND and backwards scan, as well as whatever the caller
* might've asked for.
*/
- if (portal->cursorOptions & CURSOR_OPT_SCROLL)
+ if (portal->strategy == PORTAL_ONE_SELECT &&
+ (portal->cursorOptions & CURSOR_OPT_SCROLL))
myeflags = eflags | EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD;
else
myeflags = eflags;
/*
- * Call ExecutorStart to prepare the plan for execution
+ * Call ExecutorStart to prepare the plan for execution. A
+ * cached plan may get invalidated as we're doing that.
*/
ExecutorStart(queryDesc, myeflags);
+ if (!queryDesc->plan_valid)
+ {
+ Assert(queryDesc->cplan);
+ PortalQueryFinish(queryDesc);
+ PopActiveSnapshot();
+ portal->plan_valid = false;
+ goto early_exit;
+ }
/*
- * This tells PortalCleanup to shut down the executor
+ * This tells PortalCleanup to shut down the executor, though
+ * not needed for queries handled by PortalRunMulti().
*/
- portal->queryDesc = queryDesc;
+ if (portal->strategy == PORTAL_ONE_SELECT)
+ portal->queryDesc = queryDesc;
/*
- * Remember tuple descriptor (computed by ExecutorStart)
+ * Remember tuple descriptor (computed by ExecutorStart),
+ * though make it independent of QueryDesc for queries handled
+ * by PortalRunMulti().
*/
- portal->tupDesc = queryDesc->tupDesc;
+ if (portal->strategy != PORTAL_ONE_SELECT)
+ portal->tupDesc = CreateTupleDescCopy(queryDesc->tupDesc);
+ else
+ portal->tupDesc = queryDesc->tupDesc;
/*
* Reset cursor position data to "start of query"
@@ -532,33 +472,11 @@ PortalStart(Portal portal, ParamListInfo params,
portal->atStart = true;
portal->atEnd = false; /* allow fetches */
portal->portalPos = 0;
+ portal->plan_valid = true;
PopActiveSnapshot();
break;
- case PORTAL_ONE_RETURNING:
- case PORTAL_ONE_MOD_WITH:
-
- /*
- * We don't start the executor until we are told to run the
- * portal. We do need to set up the result tupdesc.
- */
- {
- PlannedStmt *pstmt;
-
- pstmt = PortalGetPrimaryStmt(portal);
- portal->tupDesc =
- ExecCleanTypeFromTL(pstmt->planTree->targetlist);
- }
-
- /*
- * Reset cursor position data to "start of query"
- */
- portal->atStart = true;
- portal->atEnd = false; /* allow fetches */
- portal->portalPos = 0;
- break;
-
case PORTAL_UTIL_SELECT:
/*
@@ -578,11 +496,86 @@ PortalStart(Portal portal, ParamListInfo params,
portal->atStart = true;
portal->atEnd = false; /* allow fetches */
portal->portalPos = 0;
+ portal->plan_valid = true;
break;
case PORTAL_MULTI_QUERY:
- /* Need do nothing now */
+ {
+ ListCell *lc;
+ bool first = true;
+
+ myeflags = eflags;
+ foreach(lc, portal->stmts)
+ {
+ PlannedStmt *plan = lfirst_node(PlannedStmt, lc);
+ bool is_utility = (plan->utilityStmt != NULL);
+
+ /*
+ * Push the snapshot to be used by the executor.
+ */
+ if (!is_utility)
+ {
+ /*
+ * Must copy the snapshot if we'll need to update
+ * its command ID.
+ */
+ if (!first)
+ PushCopiedSnapshot(GetTransactionSnapshot());
+ else
+ PushActiveSnapshot(GetTransactionSnapshot());
+ }
+
+ /*
+ * From the 2nd statement onwards, update the command
+ * ID and the snapshot to match.
+ */
+ if (!first)
+ {
+ CommandCounterIncrement();
+ UpdateActiveSnapshotCommandId();
+ }
+
+ first = false;
+
+ /*
+ * Create the QueryDesc object. DestReceiver will
+ * be set in PortalRunMulti().
+ */
+ queryDesc = CreateQueryDesc(plan, portal->cplan,
+ portal->sourceText,
+ !is_utility ?
+ GetActiveSnapshot() :
+ InvalidSnapshot,
+ InvalidSnapshot,
+ NULL,
+ params,
+ portal->queryEnv, 0);
+
+ /* Remember for PortalRunMulti() */
+ portal->qdescs = lappend(portal->qdescs, queryDesc);
+
+ if (is_utility)
+ continue;
+
+ /*
+ * Call ExecutorStart to prepare the plan for
+ * execution. A cached plan may get invalidated as
+ * we're doing that.
+ */
+ ExecutorStart(queryDesc, myeflags);
+ PopActiveSnapshot();
+ if (!queryDesc->plan_valid)
+ {
+ Assert(queryDesc->cplan);
+ PortalQueryFinish(queryDesc);
+ portal->plan_valid = false;
+ goto early_exit;
+ }
+ }
+ }
+
portal->tupDesc = NULL;
+ portal->plan_valid = true;
break;
}
}
@@ -594,19 +587,18 @@ PortalStart(Portal portal, ParamListInfo params,
/* Restore global vars and propagate error */
ActivePortal = saveActivePortal;
CurrentResourceOwner = saveResourceOwner;
- PortalContext = savePortalContext;
PG_RE_THROW();
}
PG_END_TRY();
+ portal->status = PORTAL_READY;
+
+early_exit:
MemoryContextSwitchTo(oldContext);
ActivePortal = saveActivePortal;
CurrentResourceOwner = saveResourceOwner;
- PortalContext = savePortalContext;
-
- portal->status = PORTAL_READY;
}
/*
@@ -1193,7 +1185,7 @@ PortalRunMulti(Portal portal,
QueryCompletion *qc)
{
bool active_snapshot_set = false;
- ListCell *stmtlist_item;
+ ListCell *qdesc_item;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1214,9 +1206,10 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
- foreach(stmtlist_item, portal->stmts)
+ foreach(qdesc_item, portal->qdescs)
{
- PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ QueryDesc *qdesc = (QueryDesc *) lfirst(qdesc_item);
+ PlannedStmt *pstmt = qdesc->plannedstmt;
/*
* If we got a cancel signal in prior command, quit
@@ -1271,23 +1264,38 @@ PortalRunMulti(Portal portal,
else
UpdateActiveSnapshotCommandId();
+ /*
+ * Run the plan to completion.
+ */
+ qdesc->dest = dest;
+ ExecutorRun(qdesc, ForwardScanDirection, 0, true);
+
+ /*
+ * Build command completion status data if needed.
+ */
if (pstmt->canSetTag)
{
- /* statement can set tag string */
- ProcessQuery(pstmt,
- portal->sourceText,
- portal->portalParams,
- portal->queryEnv,
- dest, qc);
- }
- else
- {
- /* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
- portal->sourceText,
- portal->portalParams,
- portal->queryEnv,
- altdest, NULL);
+ switch (qdesc->operation)
+ {
+ case CMD_SELECT:
+ SetQueryCompletion(qc, CMDTAG_SELECT, qdesc->estate->es_processed);
+ break;
+ case CMD_INSERT:
+ SetQueryCompletion(qc, CMDTAG_INSERT, qdesc->estate->es_processed);
+ break;
+ case CMD_UPDATE:
+ SetQueryCompletion(qc, CMDTAG_UPDATE, qdesc->estate->es_processed);
+ break;
+ case CMD_DELETE:
+ SetQueryCompletion(qc, CMDTAG_DELETE, qdesc->estate->es_processed);
+ break;
+ case CMD_MERGE:
+ SetQueryCompletion(qc, CMDTAG_MERGE, qdesc->estate->es_processed);
+ break;
+ default:
+ SetQueryCompletion(qc, CMDTAG_UNKNOWN, qdesc->estate->es_processed);
+ break;
+ }
}
if (log_executor_stats)
@@ -1346,8 +1354,15 @@ PortalRunMulti(Portal portal,
* Increment command counter between queries, but not after the last
* one.
*/
- if (lnext(portal->stmts, stmtlist_item) != NULL)
+ if (lnext(portal->qdescs, qdesc_item) != NULL)
CommandCounterIncrement();
+
+ if (qdesc->estate)
+ {
+ ExecutorFinish(qdesc);
+ ExecutorEnd(qdesc);
+ }
+ FreeQueryDesc(qdesc);
}
/* Pop the snapshot if we pushed one. */
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 60978f9415..de3fc756e2 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -2073,6 +2073,27 @@ get_rel_persistence(Oid relid)
return result;
}
+/*
+ * get_rel_relisshared
+ *
+ * Returns whether the given relation is shared
+ */
+bool
+get_rel_relisshared(Oid relid)
+{
+ HeapTuple tp;
+ Form_pg_class reltup;
+ bool result;
+
+ tp = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+ if (!HeapTupleIsValid(tp))
+ elog(ERROR, "cache lookup failed for relation %u", relid);
+ reltup = (Form_pg_class) GETSTRUCT(tp);
+ result = reltup->relisshared;
+ ReleaseSysCache(tp);
+
+ return result;
+}
/* ---------- TRANSFORM CACHE ---------- */
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 87210fcf62..16fb85312b 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -100,13 +100,13 @@ static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool GenericPlanIsValid(CachedPlan *cplan);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -788,8 +788,14 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
*
- * On a "true" return, we have acquired the locks needed to run the plan.
- * (We must do this for the "true" result to be race-condition-free.)
+ * Note though that if the plan contains any child relations that would have
+ * been added by the planner, which would not have been locked yet (because
+ * AcquirePlannerLocks() only locks relations that would be present in the
+ * range table before entering the planner), the plan could go stale before
+ * it reaches execution if any of those child relations get modified
+ * concurrently. The executor must check that the plan (CachedPlan) is still
+ * valid after taking a lock on each of the child tables, and if it is not,
+ * ask the caller to recreate the plan.
*/
static bool
CheckCachedPlan(CachedPlanSource *plansource)
@@ -803,60 +809,56 @@ CheckCachedPlan(CachedPlanSource *plansource)
if (!plan)
return false;
- Assert(plan->magic == CACHEDPLAN_MAGIC);
- /* Generic plans are never one-shot */
- Assert(!plan->is_oneshot);
+ if (GenericPlanIsValid(plan))
+ return true;
/*
- * If plan isn't valid for current role, we can't use it.
+ * Plan has been invalidated, so unlink it from the parent and release it.
*/
- if (plan->is_valid && plan->dependsOnRole &&
- plan->planRoleId != GetUserId())
- plan->is_valid = false;
+ ReleaseGenericPlan(plansource);
- /*
- * If it appears valid, acquire locks and recheck; this is much the same
- * logic as in RevalidateCachedQuery, but for a plan.
- */
- if (plan->is_valid)
+ return false;
+}
+
+/*
+ * GenericPlanIsValid
+ * Is a generic plan still valid?
+ *
+ * It may have gone stale due to concurrent schema modifications of relations
+ * mentioned in the plan or a couple of other things mentioned below.
+ */
+static bool
+GenericPlanIsValid(CachedPlan *cplan)
+{
+ Assert(cplan != NULL);
+ Assert(cplan->magic == CACHEDPLAN_MAGIC);
+ /* Generic plans are never one-shot */
+ Assert(!cplan->is_oneshot);
+
+ if (cplan->is_valid)
{
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
- Assert(plan->refcount > 0);
-
- AcquireExecutorLocks(plan->stmt_list, true);
+ Assert(cplan->refcount > 0);
/*
- * If plan was transient, check to see if TransactionXmin has
- * advanced, and if so invalidate it.
+ * If plan isn't valid for current role, we can't use it.
*/
- if (plan->is_valid &&
- TransactionIdIsValid(plan->saved_xmin) &&
- !TransactionIdEquals(plan->saved_xmin, TransactionXmin))
- plan->is_valid = false;
+ if (cplan->dependsOnRole && cplan->planRoleId != GetUserId())
+ cplan->is_valid = false;
/*
- * By now, if any invalidation has happened, the inval callback
- * functions will have marked the plan invalid.
+ * If plan was transient, check to see if TransactionXmin has
+ * advanced, and if so invalidate it.
*/
- if (plan->is_valid)
- {
- /* Successfully revalidated and locked the query. */
- return true;
- }
-
- /* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ if (TransactionIdIsValid(cplan->saved_xmin) &&
+ !TransactionIdEquals(cplan->saved_xmin, TransactionXmin))
+ cplan->is_valid = false;
}
- /*
- * Plan has been invalidated, so unlink it from the parent and release it.
- */
- ReleaseGenericPlan(plansource);
-
- return false;
+ return cplan->is_valid;
}
/*
@@ -1126,8 +1128,15 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
- * On return, the plan is valid and we have sufficient locks to begin
- * execution.
+ * On return, the plan is valid unless it contains inheritance/partition
+ * child tables; that is, only the locks on the tables directly mentioned in
+ * the query have been taken. If any of those tables have child tables, the
+ * executor must also lock those before executing the plan, and if the plan
+ * gets invalidated as a result of taking those locks, it must ask the caller to get
+ * a new plan by calling here again. Locking of the child tables must be
+ * deferred to the executor like this, because not all child tables may need
+ * to be locked; some may get pruned during the executor plan initialization
+ * phase (InitPlan()).
*
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
@@ -1360,8 +1369,8 @@ CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
}
/*
- * Reject if AcquireExecutorLocks would have anything to do. This is
- * probably unnecessary given the previous check, but let's be safe.
+ * Reject if the executor would need to take additional locks, that is, in
+ * addition to those taken by AcquirePlannerLocks() on a given query.
*/
foreach(lc, plan->stmt_list)
{
@@ -1735,58 +1744,6 @@ QueryListGetPrimaryStmt(List *stmts)
return NULL;
}
-/*
- * AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
- */
-static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
-{
- ListCell *lc1;
-
- foreach(lc1, stmt_list)
- {
- PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
-
- if (plannedstmt->commandType == CMD_UTILITY)
- {
- /*
- * Ignore utility statements, except those (such as EXPLAIN) that
- * contain a parsed-but-not-planned query. Note: it's okay to use
- * ScanQueryForLocks, even though the query hasn't been through
- * rule rewriting, because rewriting doesn't change the query
- * representation.
- */
- Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
-
- if (query)
- ScanQueryForLocks(query, acquire);
- continue;
- }
-
- foreach(lc2, plannedstmt->rtable)
- {
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
-
- if (!(rte->rtekind == RTE_RELATION ||
- (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
- continue;
-
- /*
- * Acquire the appropriate type of lock on each relation OID. Note
- * that we don't actually try to open the rel, and hence will not
- * fail if it's been dropped entirely --- we'll just transiently
- * acquire a non-conflicting lock.
- */
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
- }
- }
-}
-
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 06dfa85f04..0cad450dcd 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -201,6 +201,13 @@ CreatePortal(const char *name, bool allowDup, bool dupSilent)
portal->portalContext = AllocSetContextCreate(TopPortalContext,
"PortalContext",
ALLOCSET_SMALL_SIZES);
+ /*
+ * initialize portal's query context to store QueryDescs created during
+ * PortalStart() and then used in PortalRun().
+ */
+ portal->queryContext = AllocSetContextCreate(TopPortalContext,
+ "PortalQueryContext",
+ ALLOCSET_SMALL_SIZES);
/* create a resource owner for the portal */
portal->resowner = ResourceOwnerCreate(CurTransactionResourceOwner,
@@ -224,6 +231,7 @@ CreatePortal(const char *name, bool allowDup, bool dupSilent)
/* for named portals reuse portal->name copy */
MemoryContextSetIdentifier(portal->portalContext, portal->name[0] ? portal->name : "<unnamed>");
+ MemoryContextSetIdentifier(portal->queryContext, portal->name[0] ? portal->name : "<unnamed>");
return portal;
}
@@ -594,6 +602,7 @@ PortalDrop(Portal portal, bool isTopCommit)
/* release subsidiary storage */
MemoryContextDelete(portal->portalContext);
+ MemoryContextDelete(portal->queryContext);
/* release portal struct (it's in TopPortalContext) */
pfree(portal);
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 3d3e632a0c..392abb5150 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -88,7 +88,11 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern QueryDesc *ExplainQueryDesc(PlannedStmt *stmt, struct CachedPlan *cplan,
+ const char *queryString, IntoClause *into, ExplainState *es,
+ ParamListInfo params, QueryEnvironment *queryEnv);
+extern void ExplainOnePlan(QueryDesc *queryDesc,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
@@ -104,6 +108,7 @@ extern void ExplainQueryParameters(ExplainState *es, ParamListInfo params, int m
extern void ExplainBeginOutput(ExplainState *es);
extern void ExplainEndOutput(ExplainState *es);
+extern void ExplainResetOutput(ExplainState *es);
extern void ExplainSeparatePlans(ExplainState *es);
extern void ExplainPropertyList(const char *qlabel, List *data,
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index af2bf36dfb..c36c25b497 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -32,9 +32,12 @@
*/
typedef struct QueryDesc
{
+ NodeTag type;
+
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ struct CachedPlan *cplan; /* CachedPlan, if plannedstmt is from one */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -47,6 +50,7 @@ typedef struct QueryDesc
TupleDesc tupDesc; /* descriptor for result tuples */
EState *estate; /* executor's query-wide state */
PlanState *planstate; /* tree of per-plan-node state */
+ bool plan_valid; /* is planstate tree fully valid? */
/* This field is set by ExecutorRun */
bool already_executed; /* true if previously executed */
@@ -57,6 +61,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ struct CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index ac02247947..7cba22acc4 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -19,6 +19,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/plancache.h"
/*
@@ -61,6 +62,10 @@
* WITH_NO_DATA indicates that we are performing REFRESH MATERIALIZED VIEW
* ... WITH NO DATA. Currently, the only effect is to suppress errors about
* scanning unpopulated materialized views.
+ *
+ * GET_LOCKS indicates that the caller of ExecutorStart() is executing a
+ * cached plan which must be validated by taking the remaining locks necessary
+ * for execution.
*/
#define EXEC_FLAG_EXPLAIN_ONLY 0x0001 /* EXPLAIN, no ANALYZE */
#define EXEC_FLAG_EXPLAIN_GENERIC 0x0002 /* EXPLAIN (GENERIC_PLAN) */
@@ -69,6 +74,8 @@
#define EXEC_FLAG_MARK 0x0010 /* need mark/restore */
#define EXEC_FLAG_SKIP_TRIGGERS 0x0020 /* skip AfterTrigger setup */
#define EXEC_FLAG_WITH_NO_DATA 0x0040 /* REFRESH ... WITH NO DATA */
+#define EXEC_FLAG_GET_LOCKS 0x0400 /* should the executor lock
+ * relations? */
/* Hook for plugins to get control in ExecutorStart() */
@@ -256,6 +263,17 @@ extern void ExecEndNode(PlanState *node);
extern void ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * Is the cached plan, if any, still valid at this point? That is, not
+ * invalidated by the incoming invalidation messages that have been processed
+ * recently.
+ */
+static inline bool
+ExecPlanStillValid(EState *estate)
+{
+ return estate->es_cachedplan == NULL ? true :
+ CachedPlanStillValid(estate->es_cachedplan);
+}
/* ----------------------------------------------------------------
* ExecProcNode
@@ -590,6 +608,9 @@ exec_rt_fetch(Index rti, EState *estate)
}
extern Relation ExecGetRangeTableRelation(EState *estate, Index rti);
+extern void ExecLockElidedAppendChildRelations(EState *estate,
+ List *elidedAppendChildRels);
+extern void ExecLockAppendNonLeafRelations(EState *estate, List *allpartrelids);
extern void ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
Index rti);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index cb714f4a19..f0c5177b06 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -623,6 +623,8 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ struct CachedPlan *es_cachedplan; /* CachedPlan if plannedstmt is from
+ * one */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
@@ -671,6 +673,10 @@ typedef struct EState
List *es_exprcontexts; /* List of ExprContexts within EState */
+ List *es_inited_plannodes; /* List of PlanState nodes from the
+ * plan tree that were fully
+ * initialized */
+
List *es_subplanstates; /* List of PlanState for SubPlans */
List *es_auxmodifytables; /* List of secondary ModifyTableStates */
diff --git a/src/include/storage/lmgr.h b/src/include/storage/lmgr.h
index 4ee91e3cf9..598bf2688a 100644
--- a/src/include/storage/lmgr.h
+++ b/src/include/storage/lmgr.h
@@ -48,6 +48,7 @@ extern bool ConditionalLockRelation(Relation relation, LOCKMODE lockmode);
extern void UnlockRelation(Relation relation, LOCKMODE lockmode);
extern bool CheckRelationLockedByMe(Relation relation, LOCKMODE lockmode,
bool orstronger);
+extern bool CheckRelLockedByMe(Oid relid, LOCKMODE lockmode, bool orstronger);
extern bool LockHasWaitersRelation(Relation relation, LOCKMODE lockmode);
extern void LockRelationIdForSession(LockRelId *relid, LOCKMODE lockmode);
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index 4f5418b972..3074e604dd 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -139,6 +139,7 @@ extern char get_rel_relkind(Oid relid);
extern bool get_rel_relispartition(Oid relid);
extern Oid get_rel_tablespace(Oid relid);
extern char get_rel_persistence(Oid relid);
+extern bool get_rel_relisshared(Oid relid);
extern Oid get_transform_fromsql(Oid typid, Oid langid, List *trftypes);
extern Oid get_transform_tosql(Oid typid, Oid langid, List *trftypes);
extern bool get_typisdefined(Oid typid);
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index a443181d41..8990fe72e3 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -221,6 +221,20 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
QueryEnvironment *queryEnv);
+
+/*
+ * CachedPlanStillValid
+ * Returns whether a cached generic plan is still valid
+ *
+ * Called by the executor on every relation lock taken when initializing the
+ * plan tree in the CachedPlan.
+ */
+static inline bool
+CachedPlanStillValid(CachedPlan *cplan)
+{
+ return cplan->is_valid;
+}
+
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aa08b1e0fc..24d420b9e9 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,9 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *qdescs; /* list of QueryDescs */
+ MemoryContext queryContext; /* memory for QueryDescs and children */
+ bool plan_valid; /* are plans in qdescs ready for execution? */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -242,6 +245,7 @@ extern void PortalDefineQuery(Portal portal,
CommandTag commandTag,
List *stmts,
CachedPlan *cplan);
+extern void PortalQueryFinish(QueryDesc *queryDesc);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
diff --git a/src/test/modules/delay_execution/Makefile b/src/test/modules/delay_execution/Makefile
index 70f24e846d..2fca84d027 100644
--- a/src/test/modules/delay_execution/Makefile
+++ b/src/test/modules/delay_execution/Makefile
@@ -8,7 +8,8 @@ OBJS = \
delay_execution.o
ISOLATION = partition-addition \
- partition-removal-1
+ partition-removal-1 \
+ cached-plan-replan
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/src/test/modules/delay_execution/delay_execution.c b/src/test/modules/delay_execution/delay_execution.c
index 7cd76eb34b..515b2c0c95 100644
--- a/src/test/modules/delay_execution/delay_execution.c
+++ b/src/test/modules/delay_execution/delay_execution.c
@@ -1,14 +1,18 @@
/*-------------------------------------------------------------------------
*
* delay_execution.c
- * Test module to allow delay between parsing and execution of a query.
+ * Test module to introduce delay at various points during execution of a
+ * query to test that execution proceeds safely in light of concurrent
+ * changes.
*
* The delay is implemented by taking and immediately releasing a specified
* advisory lock. If another process has previously taken that lock, the
* current process will be blocked until the lock is released; otherwise,
* there's no effect. This allows an isolationtester script to reliably
- * test behaviors where some specified action happens in another backend
- * between parsing and execution of any desired query.
+ * test behaviors where some specified action happens in another backend in
+ * a couple of cases: 1) between parsing and execution of any desired query
+ * when using the planner_hook, 2) between RevalidateCachedQuery() and
+ * ExecutorStart() when using the ExecutorStart_hook.
*
* Copyright (c) 2020-2023, PostgreSQL Global Development Group
*
@@ -22,6 +26,7 @@
#include <limits.h>
+#include "executor/executor.h"
#include "optimizer/planner.h"
#include "utils/builtins.h"
#include "utils/guc.h"
@@ -32,9 +37,11 @@ PG_MODULE_MAGIC;
/* GUC: advisory lock ID to use. Zero disables the feature. */
static int post_planning_lock_id = 0;
+static int executor_start_lock_id = 0;
-/* Save previous planner hook user to be a good citizen */
+/* Save previous hook users to be a good citizen */
static planner_hook_type prev_planner_hook = NULL;
+static ExecutorStart_hook_type prev_ExecutorStart_hook = NULL;
/* planner_hook function to provide the desired delay */
@@ -70,11 +77,41 @@ delay_execution_planner(Query *parse, const char *query_string,
return result;
}
+/* ExecutorStart_hook function to provide the desired delay */
+static void
+delay_execution_ExecutorStart(QueryDesc *queryDesc, int eflags)
+{
+ /* If enabled, delay by taking and releasing the specified lock */
+ if (executor_start_lock_id != 0)
+ {
+ DirectFunctionCall1(pg_advisory_lock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+ DirectFunctionCall1(pg_advisory_unlock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+
+ /*
+ * Ensure that we notice any pending invalidations, since the advisory
+ * lock functions don't do this.
+ */
+ AcceptInvalidationMessages();
+ }
+
+ /* Now start the executor, possibly via a previous hook user */
+ if (prev_ExecutorStart_hook)
+ prev_ExecutorStart_hook(queryDesc, eflags);
+ else
+ standard_ExecutorStart(queryDesc, eflags);
+
+ if (executor_start_lock_id != 0)
+ elog(NOTICE, "Finished ExecutorStart(): CachedPlan is %s",
+ queryDesc->cplan->is_valid ? "valid" : "not valid");
+}
+
/* Module load function */
void
_PG_init(void)
{
- /* Set up the GUC to control which lock is used */
+ /* Set up GUCs to control which lock is used */
DefineCustomIntVariable("delay_execution.post_planning_lock_id",
"Sets the advisory lock ID to be locked/unlocked after planning.",
"Zero disables the delay.",
@@ -86,10 +123,22 @@ _PG_init(void)
NULL,
NULL,
NULL);
-
+ DefineCustomIntVariable("delay_execution.executor_start_lock_id",
+ "Sets the advisory lock ID to be locked/unlocked before starting execution.",
+ "Zero disables the delay.",
+ &executor_start_lock_id,
+ 0,
+ 0, INT_MAX,
+ PGC_USERSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
MarkGUCPrefixReserved("delay_execution");
- /* Install our hook */
+ /* Install our hooks. */
prev_planner_hook = planner_hook;
planner_hook = delay_execution_planner;
+ prev_ExecutorStart_hook = ExecutorStart_hook;
+ ExecutorStart_hook = delay_execution_ExecutorStart;
}
diff --git a/src/test/modules/delay_execution/expected/cached-plan-replan.out b/src/test/modules/delay_execution/expected/cached-plan-replan.out
new file mode 100644
index 0000000000..0ac6a17c2b
--- /dev/null
+++ b/src/test/modules/delay_execution/expected/cached-plan-replan.out
@@ -0,0 +1,156 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1prep s2lock s1exec s2dropi s2unlock
+step s1prep: SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1;
+ EXPLAIN (COSTS OFF) EXECUTE q (1);
+QUERY PLAN
+--------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo11 foo_1
+ Recheck Cond: (a = $1)
+ -> Bitmap Index Scan on foo11_a_idx
+ Index Cond: (a = $1)
+(6 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+-----------------------------
+Append
+ Subplans Removed: 1
+ -> Seq Scan on foo11 foo_1
+ Filter: (a = $1)
+(4 rows)
+
+
+starting permutation: s1prep2 s2lock s1exec2 s2dropi s2unlock
+step s1prep2: SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = 1;
+ EXPLAIN (COSTS OFF) EXECUTE q2;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------
+Bitmap Heap Scan on foo11 foo
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on foo11_a_idx
+ Index Cond: (a = 1)
+(4 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec2: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec2: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------
+Seq Scan on foo11 foo
+ Filter: (a = 1)
+(2 rows)
+
+
+starting permutation: s1prep3 s2lock s1exec3 s2dropi s2unlock
+step s1prep3: SET plan_cache_mode = force_generic_plan;
+ SET enable_partitionwise_aggregate = on;
+ SET enable_partitionwise_join = on;
+ PREPARE q3 AS SELECT t1.a, count(t2.b) FROM foo t1, foo t2 WHERE t1.a = t2.a GROUP BY 1;
+ EXPLAIN (COSTS OFF) EXECUTE q3;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+----------------------------------------------------------------
+Append
+ -> GroupAggregate
+ Group Key: t1.a
+ -> Merge Join
+ Merge Cond: (t1.a = t2.a)
+ -> Index Only Scan using foo11_a_idx on foo11 t1
+ -> Materialize
+ -> Index Scan using foo11_a_idx on foo11 t2
+ -> GroupAggregate
+ Group Key: t1_1.a
+ -> Merge Join
+ Merge Cond: (t1_1.a = t2_1.a)
+ -> Sort
+ Sort Key: t1_1.a
+ -> Seq Scan on foo2 t1_1
+ -> Sort
+ Sort Key: t2_1.a
+ -> Seq Scan on foo2 t2_1
+(18 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec3: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec3: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------------------------------
+Append
+ -> GroupAggregate
+ Group Key: t1.a
+ -> Merge Join
+ Merge Cond: (t1.a = t2.a)
+ -> Sort
+ Sort Key: t1.a
+ -> Seq Scan on foo11 t1
+ -> Sort
+ Sort Key: t2.a
+ -> Seq Scan on foo11 t2
+ -> GroupAggregate
+ Group Key: t1_1.a
+ -> Merge Join
+ Merge Cond: (t1_1.a = t2_1.a)
+ -> Sort
+ Sort Key: t1_1.a
+ -> Seq Scan on foo2 t1_1
+ -> Sort
+ Sort Key: t2_1.a
+ -> Seq Scan on foo2 t2_1
+(21 rows)
+
diff --git a/src/test/modules/delay_execution/specs/cached-plan-replan.spec b/src/test/modules/delay_execution/specs/cached-plan-replan.spec
new file mode 100644
index 0000000000..3c92cbd5c6
--- /dev/null
+++ b/src/test/modules/delay_execution/specs/cached-plan-replan.spec
@@ -0,0 +1,61 @@
+# Test to check that invalidation of cached generic plans during ExecutorStart
+# correctly triggers replanning and re-execution.
+
+setup
+{
+ CREATE TABLE foo (a int, b text) PARTITION BY LIST(a);
+ CREATE TABLE foo1 PARTITION OF foo FOR VALUES IN (1) PARTITION BY LIST (a);
+ CREATE TABLE foo11 PARTITION OF foo1 FOR VALUES IN (1);
+ CREATE INDEX foo11_a ON foo1 (a);
+ CREATE TABLE foo2 PARTITION OF foo FOR VALUES IN (2);
+ CREATE VIEW foov AS SELECT * FROM foo;
+}
+
+teardown
+{
+ DROP VIEW foov;
+ DROP TABLE foo;
+}
+
+session "s1"
+# Append with run-time pruning
+step "s1prep" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+
+# no Append case (only one partition selected by the planner)
+step "s1prep2" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = 1;
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+
+# Append with partition-wise join aggregate and join plans as child subplans
+step "s1prep3" { SET plan_cache_mode = force_generic_plan;
+ SET enable_partitionwise_aggregate = on;
+ SET enable_partitionwise_join = on;
+ PREPARE q3 AS SELECT t1.a, count(t2.b) FROM foo t1, foo t2 WHERE t1.a = t2.a GROUP BY 1;
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+
+# Executes a generic plan
+step "s1exec" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+step "s1exec2" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+step "s1exec3" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+
+session "s2"
+step "s2lock" { SELECT pg_advisory_lock(12345); }
+step "s2unlock" { SELECT pg_advisory_unlock(12345); }
+step "s2dropi" { DROP INDEX foo11_a; }
+
+# While "s1exec", etc. wait to acquire the advisory lock, "s2drop" is able to
+# drop the index being used in the cached plan. When "s1exec" is then
+# unblocked and initializes the cached plan for execution, it detects the
+# concurrent index drop and causes the cached plan to be discarded and
+# recreated without the index.
+permutation "s1prep" "s2lock" "s1exec" "s2dropi" "s2unlock"
+permutation "s1prep2" "s2lock" "s1exec2" "s2dropi" "s2unlock"
+permutation "s1prep3" "s2lock" "s1exec3" "s2dropi" "s2unlock"
--
2.35.3
On 8 Jun 2023, at 16:23, Amit Langote <amitlangote09@gmail.com> wrote:
Here is a new version.
The local planstate variable in the hunk below is shadowing the function
parameter planstate, which causes a compiler warning:
@@ -1495,18 +1556,15 @@ ExecEndPlan(PlanState *planstate, EState *estate)
ListCell *l;
/*
- * shut down the node-type-specific query processing
- */
- ExecEndNode(planstate);
-
- /*
- * for subplans too
+ * Shut down the node-type-specific query processing for all nodes that
+ * were initialized during InitPlan(), both in the main plan tree and those
+ * in subplans (es_subplanstates), if any.
*/
- foreach(l, estate->es_subplanstates)
+ foreach(l, estate->es_inited_plannodes)
{
- PlanState *subplanstate = (PlanState *) lfirst(l);
+ PlanState *planstate = (PlanState *) lfirst(l);
--
Daniel Gustafsson
On Mon, Jul 3, 2023 at 10:27 PM Daniel Gustafsson <daniel@yesql.se> wrote:
On 8 Jun 2023, at 16:23, Amit Langote <amitlangote09@gmail.com> wrote:
Here is a new version.
The local planstate variable in the hunk below is shadowing the function
parameter planstate, which causes a compiler warning:
Thanks Daniel for the heads up.
Attached new version fixes that and contains a few other notable
changes. Before going into the details of those changes, let me
reiterate in broad strokes what the patch is trying to do.
The idea is to move the locking of some tables referenced in a cached
(generic) plan from plancache/GetCachedPlan() to the
executor/ExecutorStart(). Specifically, the locking of inheritance
child tables. Why? Because partition pruning with "initial pruning
steps" contained in the Append/MergeAppend nodes may eliminate some
child tables that need not have been locked to begin with, though the
pruning can only occur during ExecutorStart().
After applying this patch, GetCachedPlan() only locks the tables that
are directly mentioned in the query to ensure that the
analyzed-rewritten-but-unplanned query tree backing a given CachedPlan
is still valid (cf RevalidateCachedQuery()), but not the tables in the
CachedPlan that would have been added by the planner. Tables in a
CachePlan that would not be locked currently only include the
inheritance child tables / partitions of the tables mentioned in the
query. This means that the plan trees in a given CachedPlan returned
by GetCachedPlan() are only partially valid and are subject to
invalidation because concurrent sessions can possibly modify the child
tables referenced in them before ExecutorStart() gets around to
locking them. If the concurrent modifications do happen,
ExecutorStart() is now equipped to detect them by way of noticing that
the CachedPlan is invalidated and inform the caller to discard and
recreate the CachedPlan. This entails changing all the call sites of
ExecutorStart() that pass it a plan tree from a CachedPlan to
implement the replan-and-retry-execution loop.
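To illustrate, here is a minimal sketch of what such a replan-and-retry
loop looks like, modeled on the _SPI_execute_plan() and exec_bind_message()
hunks earlier in this thread. Error handling, snapshot setup, and the
portal machinery are omitted, and the surrounding variables (plansource,
params, plan_owner, queryEnv, dest) are assumed to be in scope; this is
not the exact code in the patch:

    for (;;)
    {
        CachedPlan *cplan;
        QueryDesc  *qdesc;

        /* Locks only the tables directly mentioned in the query. */
        cplan = GetCachedPlan(plansource, params, plan_owner, queryEnv);

        qdesc = CreateQueryDesc(linitial_node(PlannedStmt, cplan->stmt_list),
                                cplan,  /* lets the executor check is_valid */
                                plansource->query_string,
                                GetActiveSnapshot(), InvalidSnapshot,
                                dest, params, queryEnv, 0);

        /* May lock surviving child tables, which can invalidate the plan. */
        ExecutorStart(qdesc, 0);
        if (qdesc->plan_valid)
        {
            ExecutorRun(qdesc, ForwardScanDirection, 0, true);
            ExecutorFinish(qdesc);
            ExecutorEnd(qdesc);
            FreeQueryDesc(qdesc);
            ReleaseCachedPlan(cplan, plan_owner);
            break;
        }

        /* Plan went stale during ExecutorStart(); discard it and retry. */
        ExecutorFinish(qdesc);
        ExecutorEnd(qdesc);
        FreeQueryDesc(qdesc);
        ReleaseCachedPlan(cplan, plan_owner);
    }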
Given the above, ExecutorStart(), which has not needed so far to take
any locks (except on indexes mentioned in IndexScans), now needs to
lock child tables if executing a cached plan which contains them. In
the previous versions, the patch used a flag passed in
EState.es_top_eflags to signal ExecGetRangeTableRelation() to lock the
table. The flag would be set in ExecInitAppend() and
ExecInitMergeAppend() for the duration of the loop that initializes
child subplans with the assumption that that's where the child tables
would be opened. But not all child subplans of Append/MergeAppend
scan child tables (think UNION ALL queries), so this approach can
result in redundant locking. Worse, I needed to invent
PlannedStmt.elidedAppendChildRelations to separately track child
tables whose Scan nodes' parent Append/MergeAppend would be removed by
setrefs.c in some cases.
So, this new patch uses a flag in the RangeTblEntry itself to denote
if the table is a child table instead of the above roundabout way.
ExecGetRangeTableRelation() can simply look at the RTE to decide
whether to take a lock or not. I considered adding a new bool field,
but noticed we already have inFromCl to track whether a given RTE is for
a table/entity directly mentioned in the query or for something added
behind-the-scenes into the range table as the field's description in
parsenodes.h says. RTEs for child tables are added behind-the-scenes
by the planner and it makes perfect sense to me to mark their inFromCl
as false. I can't find anything that relies on the current behavior
of inFromCl being set to the same value as the root inheritance parent
(true). Patch 0002 makes this change for child RTEs.
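In other words, the executor-side check reduces to something like the
following simplified sketch (not the exact code in the patch; rti and
estate are assumed to be in scope, and the early-exit convention shown is
only illustrative):

    RangeTblEntry *rte = exec_rt_fetch(rti, estate);
    Relation    rel;

    /*
     * RTEs whose inFromCl is false were added behind the scenes by the
     * planner for inheritance children, so GetCachedPlan() has not locked
     * them; lock them here.  Everything else is already locked.
     */
    if (estate->es_cachedplan != NULL && !rte->inFromCl)
    {
        /* Taking the lock also processes pending invalidation messages. */
        LockRelationOid(rte->relid, rte->rellockmode);

        /*
         * If an invalidation matched the CachedPlan, abandon plan
         * initialization; ExecutorStart() then reports plan_valid = false
         * and the caller replans.
         */
        if (!ExecPlanStillValid(estate))
            return NULL;
    }
    rel = table_open(rte->relid, NoLock);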
A few other notes:
* A parallel worker does ExecutorStart() without access to the
CachedPlan that the leader may have gotten its plan tree from. This
means that parallel workers do not have the ability to detect plan
tree invalidations. I think that's fine, because if the leader would
have been able to launch workers at all, it would also have gotten all
the locks to protect the (portion of) the plan tree that the workers
would be executing. I had an off-list discussion about this with
Robert and he mentioned his concern that each parallel worker would
have its own view of which child subplans of a parallel Append are
"valid", a view that depends on the result of its own evaluation of
initial pruning. So, there may be race conditions whereby a worker may
try to execute plan nodes that are no longer valid, for example, if a
partition a worker considers valid is not viewed as such by the leader
and thus not locked. I shared my thoughts as to why that sounds
unlikely at [1], though maybe I'm a bit too optimistic?
* For multi-query portals, you can't now do ExecutorStart()
immediately followed by ExecutorRun() for each query in the portal,
because ExecutorStart() may now fail to start a plan if it gets
invalidated. So PortalStart() now does ExecutorStart()s for all
queries and remembers the QueryDescs for PortalRun() then to do
ExecutorRun()s using. A consequence of this is that
CommandCounterIncrement() now must be done between the
ExecutorStart()s of the individual plans in PortalStart() and not
between the ExecutorRun()s in PortalRunMulti(). make check-world
passes with this new arrangement, though I'm not entirely confident
that there are no problems lurking.
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
[1]: https://postgr.es/m/CA+HiwqFA=swkzgGK8AmXUNFtLeEXFJwFyY3E7cTxvL46aa1OTw@mail.gmail.com
Attachments:
v40-0004-Track-opened-range-table-relations-in-a-List-in-.patch
From 7643c369e4877ad77d57c38e7c86c888efff3771 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Tue, 4 Jul 2023 22:36:49 +0900
Subject: [PATCH v40 4/4] Track opened range table relations in a List in
EState
This makes ExecCloseRangeTableRelations faster when there are many
relations in the range table but only a few are opened during
execution, such as when run-time pruning kicks in on an Append
containing thousands of partition subplans.
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
src/backend/executor/execMain.c | 9 +++++----
src/backend/executor/execUtils.c | 2 ++
src/include/nodes/execnodes.h | 2 ++
3 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index a2f6ac9d1c..053d8a2dc2 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1650,12 +1650,13 @@ ExecCloseResultRelations(EState *estate)
void
ExecCloseRangeTableRelations(EState *estate)
{
- int i;
+ ListCell *lc;
- for (i = 0; i < estate->es_range_table_size; i++)
+ foreach(lc, estate->es_opened_relations)
{
- if (estate->es_relations[i])
- table_close(estate->es_relations[i], NoLock);
+ Relation rel = lfirst(lc);
+
+ table_close(rel, NoLock);
}
}
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index af92d2b3c3..f0320cfa34 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -837,6 +837,8 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
}
estate->es_relations[rti - 1] = rel;
+ estate->es_opened_relations = lappend(estate->es_opened_relations,
+ rel);
}
return rel;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index f0c5177b06..be06c40766 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -619,6 +619,8 @@ typedef struct EState
Index es_range_table_size; /* size of the range table arrays */
Relation *es_relations; /* Array of per-range-table-entry Relation
* pointers, or NULL if not yet opened */
+ List *es_opened_relations; /* List of non-NULL entries in
+ * es_relations in no specific order */
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
--
2.35.3
v40-0002-Set-inFromCl-to-false-in-child-table-RTEs.patch
From cede423bb7cfeb879b78f63087d18326cda13f5b Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Tue, 4 Jul 2023 22:36:43 +0900
Subject: [PATCH v40 2/4] Set inFromCl to false in child table RTEs
This is to allow the executor to distinguish tables that are
directly mentioned in the query from those that get added to the
query during planning. A subsequent commit will teach the executor
to lock only the tables of the latter kind when executing a cached
plan.
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
src/backend/optimizer/util/inherit.c | 6 ++++++
src/backend/parser/analyze.c | 7 +++----
src/include/nodes/parsenodes.h | 9 +++++++--
3 files changed, 16 insertions(+), 6 deletions(-)
diff --git a/src/backend/optimizer/util/inherit.c b/src/backend/optimizer/util/inherit.c
index 94de855a22..9bac07bf40 100644
--- a/src/backend/optimizer/util/inherit.c
+++ b/src/backend/optimizer/util/inherit.c
@@ -492,6 +492,12 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
}
else
childrte->inh = false;
+ /*
+ * Mark the child table as not being directly mentioned in the query.
+ * This allows the executor's ExecGetRangeTableRelation() to conveniently
+ * identify it as an inheritance child table.
+ */
+ childrte->inFromCl = false;
childrte->securityQuals = NIL;
/*
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 4006632092..bcf6fcdde2 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -3267,10 +3267,9 @@ transformLockingClause(ParseState *pstate, Query *qry, LockingClause *lc,
/*
* Lock all regular tables used in query and its subqueries. We
* examine inFromCl to exclude auto-added RTEs, particularly NEW/OLD
- * in rules. This is a bit of an abuse of a mostly-obsolete flag, but
- * it's convenient. We can't rely on the namespace mechanism that has
- * largely replaced inFromCl, since for example we need to lock
- * base-relation RTEs even if they are masked by upper joins.
+ * in rules. We can't rely on the namespace mechanism since for
+ * example we need to lock base-relation RTEs even if they are masked
+ * by upper joins.
*/
i = 0;
foreach(rt, qry->rtable)
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 88b03cc472..c1360f87ee 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -995,11 +995,16 @@ typedef struct PartitionCmd
*
* inFromCl marks those range variables that are listed in the FROM clause.
* It's false for RTEs that are added to a query behind the scenes, such
- * as the NEW and OLD variables for a rule, or the subqueries of a UNION.
+ * as the NEW and OLD variables for a rule, or the subqueries of a UNION,
+ * or the RTEs of inheritance child tables that are added by the planner.
* This flag is not used during parsing (except in transformLockingClause,
* q.v.); the parser now uses a separate "namespace" data structure to
* control visibility. But it is needed by ruleutils.c to determine
- * whether RTEs should be shown in decompiled queries.
+ * whether RTEs should be shown in decompiled queries. It is used by the
+ * executor to determine whether a given RTE_RELATION entry belongs to a
+ * table directly mentioned in the query or to a child table added by the
+ * planner. It needs to know that for the case where the child tables in a
+ * cached plan still need to be locked.
*
* securityQuals is a list of security barrier quals (boolean expressions),
* to be tested in the listed order before returning a row from the
--
2.35.3
v40-0003-Delay-locking-of-child-tables-in-cached-plans-un.patch
From 5fcc6f7b2d55efbd61dc4cf9ac69f3ff6b4f81a4 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Tue, 4 Jul 2023 22:36:45 +0900
Subject: [PATCH v40 3/4] Delay locking of child tables in cached plans until
ExecutorStart()
Currently, GetCachedPlan() takes a lock on all relations contained in
a cached plan before returning it as a valid plan to its callers for
execution. One disadvantage is that if the plan contains partitions
that are prunable with conditions involving EXTERN parameters and
other stable expressions (known as "initial pruning"), many of them
would be locked unnecessarily, because only those that survive
initial pruning need to be locked. Locking all partitions this way
causes a significant delay when there are many partitions. Note that
initial pruning occurs during the executor's initialization of the
plan, that is, in InitPlan().
This commit rearranges things to move the locking of child tables
referenced in a cached plan to occur during InitPlan() so that
initial pruning can eliminate any child tables that need not be
scanned and thus locked.
To determine whether a given table is a child table,
ExecGetRangeTableRelation() now looks at the RTE's inFromCl field,
which is true only for tables that are directly mentioned in the
query and false for child tables. Note that any tables whose RTEs'
inFromCl is true would already have been locked by GetCachedPlan(),
so they need not be locked again during execution.
If the locking of child tables causes the CachedPlan to go stale, that
is, its is_valid is set to false by PlanCacheRelCallback() when an
invalidation message matching some child table contained in the plan
is processed, ExecInitNode() abandons the initialization of the
remaining nodes in the plan tree. In that case, InitPlan() returns
after setting QueryDesc.planstate to NULL to indicate to the caller
that no execution is possible with the plan tree as is. Some plan
tree subnodes may get fully initialized by ExecInitNode() before the
CachedPlan's invalidation is detected, so to ensure that they are
released by ExecEndPlan(), ExecInitNode() now adds each fully
initialized PlanState node to a new List in EState called
es_inited_plannodes. ExecEndPlan() releases them individually by
calling ExecEndNode() on each element of that List. ExecEndNode() is
no longer recursive, because all nodes that need to be closed can be
found in es_inited_plannodes.
Call sites that use GetCachedPlan() to get the plan trees to pass to
the executor must now be prepared to handle the case where the old
CachedPlan gets invalidated during ExecutorStart() as described
above. This commit therefore refactors the relevant call sites to
move the ExecutorStart() call closer to the GetCachedPlan() call,
which makes the replan loop convenient to implement.
Given this new behavior, PortalStart() must now always perform
ExecutorStart() so that it can drop and recreate cached plans if
needed; previously, it did so only for single-query portals.
For multi-query portals, the QueryDescs that are now created during
PortalStart() are remembered in a new List field of Portal called
'qdescs' and allocated in a new memory context 'queryContext'.
PortalRunMulti() now simply performs ExecutorRun() on the
QueryDescs found in 'qdescs'.
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
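A condensed sketch of the replan loop that the refactored call sites
implement (pieced together from the ExecuteQuery() and
ExplainExecuteQuery() hunks below; snapshot, portal, and error handling
are omitted):

	replan:
		cplan = GetCachedPlan(plansource, paramLI,
							  CurrentResourceOwner, queryEnv);
		/* ... build a QueryDesc around the plan tree taken from cplan ... */
		ExecutorStart(queryDesc, eflags);
		if (!queryDesc->plan_valid)
		{
			/*
			 * Taking locks on child tables during InitPlan() invalidated
			 * the CachedPlan; release everything and plan again.
			 */
			ExecutorEnd(queryDesc);
			FreeQueryDesc(queryDesc);
			ReleaseCachedPlan(cplan, CurrentResourceOwner);
			goto replan;
		}
		/* queryDesc->plan_valid: proceed with ExecutorRun() as usual */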
contrib/postgres_fdw/postgres_fdw.c | 4 +
src/backend/commands/copyto.c | 4 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 145 +++++---
src/backend/commands/extension.c | 2 +
src/backend/commands/matview.c | 3 +-
src/backend/commands/portalcmds.c | 16 +-
src/backend/commands/prepare.c | 32 +-
src/backend/executor/execMain.c | 106 +++++-
src/backend/executor/execParallel.c | 12 +-
src/backend/executor/execPartition.c | 14 +
src/backend/executor/execProcnode.c | 50 ++-
src/backend/executor/execUtils.c | 63 +++-
src/backend/executor/functions.c | 2 +
src/backend/executor/nodeAgg.c | 6 +-
src/backend/executor/nodeAppend.c | 48 ++-
src/backend/executor/nodeBitmapAnd.c | 31 +-
src/backend/executor/nodeBitmapHeapscan.c | 9 +-
src/backend/executor/nodeBitmapIndexscan.c | 9 +-
src/backend/executor/nodeBitmapOr.c | 31 +-
src/backend/executor/nodeCustom.c | 2 +
src/backend/executor/nodeForeignscan.c | 8 +-
src/backend/executor/nodeGather.c | 4 +-
src/backend/executor/nodeGatherMerge.c | 3 +-
src/backend/executor/nodeGroup.c | 7 +-
src/backend/executor/nodeHash.c | 10 +-
src/backend/executor/nodeHashjoin.c | 10 +-
src/backend/executor/nodeIncrementalSort.c | 7 +-
src/backend/executor/nodeIndexonlyscan.c | 11 +-
src/backend/executor/nodeIndexscan.c | 11 +-
src/backend/executor/nodeLimit.c | 3 +-
src/backend/executor/nodeLockRows.c | 3 +-
src/backend/executor/nodeMaterial.c | 7 +-
src/backend/executor/nodeMemoize.c | 7 +-
src/backend/executor/nodeMergeAppend.c | 47 ++-
src/backend/executor/nodeMergejoin.c | 10 +-
src/backend/executor/nodeModifyTable.c | 12 +-
src/backend/executor/nodeNestloop.c | 10 +-
src/backend/executor/nodeProjectSet.c | 7 +-
src/backend/executor/nodeRecursiveunion.c | 10 +-
src/backend/executor/nodeResult.c | 7 +-
src/backend/executor/nodeSamplescan.c | 2 +
src/backend/executor/nodeSeqscan.c | 2 +
src/backend/executor/nodeSetOp.c | 4 +-
src/backend/executor/nodeSort.c | 7 +-
src/backend/executor/nodeSubqueryscan.c | 7 +-
src/backend/executor/nodeTidrangescan.c | 2 +
src/backend/executor/nodeTidscan.c | 2 +
src/backend/executor/nodeUnique.c | 4 +-
src/backend/executor/nodeWindowAgg.c | 6 +-
src/backend/executor/spi.c | 49 ++-
src/backend/storage/lmgr/lmgr.c | 45 +++
src/backend/tcop/postgres.c | 13 +-
src/backend/tcop/pquery.c | 340 +++++++++---------
src/backend/utils/cache/lsyscache.c | 21 ++
src/backend/utils/cache/plancache.c | 149 +++-----
src/backend/utils/mmgr/portalmem.c | 9 +
src/include/commands/explain.h | 7 +-
src/include/executor/execdesc.h | 5 +
src/include/executor/executor.h | 13 +
src/include/nodes/execnodes.h | 6 +
src/include/storage/lmgr.h | 1 +
src/include/utils/lsyscache.h | 1 +
src/include/utils/plancache.h | 14 +
src/include/utils/portal.h | 4 +
src/test/modules/delay_execution/Makefile | 3 +-
.../modules/delay_execution/delay_execution.c | 63 +++-
.../expected/cached-plan-replan.out | 156 ++++++++
.../specs/cached-plan-replan.spec | 61 ++++
69 files changed, 1193 insertions(+), 588 deletions(-)
create mode 100644 src/test/modules/delay_execution/expected/cached-plan-replan.out
create mode 100644 src/test/modules/delay_execution/specs/cached-plan-replan.spec
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index c5cada55fb..1edd4c3f17 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2658,7 +2658,11 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
/* Get info about foreign table. */
rtindex = node->resultRelInfo->ri_RangeTableIndex;
if (fsplan->scan.scanrelid == 0)
+ {
dmstate->rel = ExecOpenScanRelation(estate, rtindex, eflags);
+ if (!ExecPlanStillValid(estate))
+ return;
+ }
else
dmstate->rel = node->ss.ss_currentRelation;
table = GetForeignTable(RelationGetRelid(dmstate->rel));
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 9e4b2437a5..8244194681 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -558,7 +558,8 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL,
+ pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
@@ -569,6 +570,7 @@ BeginCopyTo(ParseState *pstate,
* ExecutorStart computes a result tupdesc for us
*/
ExecutorStart(cstate->queryDesc, 0);
+ Assert(cstate->queryDesc->plan_valid);
tupDesc = cstate->queryDesc->tupDesc;
}
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index e91920ca14..18b07c0200 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 8570b14f62..b1ea45ef2c 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -393,6 +393,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
else
{
PlannedStmt *plan;
+ QueryDesc *queryDesc;
instr_time planstart,
planduration;
BufferUsage bufusage_start,
@@ -415,12 +416,90 @@ ExplainOneQuery(Query *query, int cursorOptions,
BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
}
+ queryDesc = ExplainQueryDesc(plan, NULL, queryString, into, es,
+ params, queryEnv);
+ Assert(queryDesc);
+
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(queryDesc, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
+/*
+ * ExplainQueryDesc
+ * Set up QueryDesc for EXPLAINing a given plan
+ *
+ * This returns NULL if cplan is found to be no longer valid.
+ */
+QueryDesc *
+ExplainQueryDesc(PlannedStmt *stmt, CachedPlan *cplan,
+ const char *queryString, IntoClause *into, ExplainState *es,
+ ParamListInfo params, QueryEnvironment *queryEnv)
+{
+ QueryDesc *queryDesc;
+ DestReceiver *dest;
+ int eflags;
+ int instrument_option = 0;
+
+ /*
+ * Normally we discard the query's output, but if explaining CREATE TABLE
+ * AS, we'd better use the appropriate tuple receiver.
+ */
+ if (into)
+ dest = CreateIntoRelDestReceiver(into);
+ else
+ dest = None_Receiver;
+
+ if (es->analyze && es->timing)
+ instrument_option |= INSTRUMENT_TIMER;
+ else if (es->analyze)
+ instrument_option |= INSTRUMENT_ROWS;
+
+ if (es->buffers)
+ instrument_option |= INSTRUMENT_BUFFERS;
+ if (es->wal)
+ instrument_option |= INSTRUMENT_WAL;
+
+ /*
+ * Use a snapshot with an updated command ID to ensure this query sees
+ * results of any previously executed queries.
+ */
+ PushCopiedSnapshot(GetActiveSnapshot());
+ UpdateActiveSnapshotCommandId();
+
+ /* Create a QueryDesc for the query */
+ queryDesc = CreateQueryDesc(stmt, cplan, queryString,
+ GetActiveSnapshot(), InvalidSnapshot,
+ dest, params, queryEnv, instrument_option);
+
+ /* Select execution options */
+ if (es->analyze)
+ eflags = 0; /* default run-to-completion flags */
+ else
+ eflags = EXEC_FLAG_EXPLAIN_ONLY;
+ if (es->generic)
+ eflags |= EXEC_FLAG_EXPLAIN_GENERIC;
+ if (into)
+ eflags |= GetIntoRelEFlags(into);
+
+ /*
+ * Call ExecutorStart to prepare the plan for execution. A cached plan
+ * may get invalidated as we're doing that.
+ */
+ ExecutorStart(queryDesc, eflags);
+ if (!queryDesc->plan_valid)
+ {
+ /* Clean up. */
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+ PopActiveSnapshot();
+ return NULL;
+ }
+
+ return queryDesc;
+}
+
/*
* ExplainOneUtility -
* print out the execution plan for one utility statement
@@ -524,29 +603,16 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(QueryDesc *queryDesc,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
{
- DestReceiver *dest;
- QueryDesc *queryDesc;
instr_time starttime;
double totaltime = 0;
- int eflags;
- int instrument_option = 0;
-
- Assert(plannedstmt->commandType != CMD_UTILITY);
- if (es->analyze && es->timing)
- instrument_option |= INSTRUMENT_TIMER;
- else if (es->analyze)
- instrument_option |= INSTRUMENT_ROWS;
-
- if (es->buffers)
- instrument_option |= INSTRUMENT_BUFFERS;
- if (es->wal)
- instrument_option |= INSTRUMENT_WAL;
+ Assert(queryDesc->plannedstmt->commandType != CMD_UTILITY);
/*
* We always collect timing for the entire statement, even when node-level
@@ -555,40 +621,6 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
*/
INSTR_TIME_SET_CURRENT(starttime);
- /*
- * Use a snapshot with an updated command ID to ensure this query sees
- * results of any previously executed queries.
- */
- PushCopiedSnapshot(GetActiveSnapshot());
- UpdateActiveSnapshotCommandId();
-
- /*
- * Normally we discard the query's output, but if explaining CREATE TABLE
- * AS, we'd better use the appropriate tuple receiver.
- */
- if (into)
- dest = CreateIntoRelDestReceiver(into);
- else
- dest = None_Receiver;
-
- /* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, instrument_option);
-
- /* Select execution options */
- if (es->analyze)
- eflags = 0; /* default run-to-completion flags */
- else
- eflags = EXEC_FLAG_EXPLAIN_ONLY;
- if (es->generic)
- eflags |= EXEC_FLAG_EXPLAIN_GENERIC;
- if (into)
- eflags |= GetIntoRelEFlags(into);
-
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, eflags);
-
/* Execute the plan for statistics if asked for */
if (es->analyze)
{
@@ -4865,6 +4897,17 @@ ExplainDummyGroup(const char *objtype, const char *labelname, ExplainState *es)
}
}
+/*
+ * Discard output buffer for a fresh restart.
+ */
+void
+ExplainResetOutput(ExplainState *es)
+{
+ Assert(es->str);
+ resetStringInfo(es->str);
+ ExplainBeginOutput(es);
+}
+
/*
* Emit the start-of-output boilerplate.
*
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 0eabe18335..5a76343123 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -797,11 +797,13 @@ execute_sql_string(const char *sql)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
ExecutorStart(qdesc, 0);
+ Assert(qdesc->plan_valid);
ExecutorRun(qdesc, ForwardScanDirection, 0, true);
ExecutorFinish(qdesc);
ExecutorEnd(qdesc);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index f9a3bdfc3a..1c1ce1e17d 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -409,12 +409,13 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, 0);
+ Assert(queryDesc->plan_valid);
/* run the plan */
ExecutorRun(queryDesc, ForwardScanDirection, 0, true);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 73ed7aa2f0..4abbec054b 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -146,6 +146,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
PortalStart(portal, params, 0, GetActiveSnapshot());
Assert(portal->strategy == PORTAL_ONE_SELECT);
+ Assert(portal->plan_valid);
/*
* We're done; the query won't actually be run until PerformPortalFetch is
@@ -249,6 +250,17 @@ PerformPortalClose(const char *name)
PortalDrop(portal, false);
}
+/*
+ * Release a portal's QueryDesc.
+ */
+void
+PortalQueryFinish(QueryDesc *queryDesc)
+{
+ ExecutorFinish(queryDesc);
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+}
+
/*
* PortalCleanup
*
@@ -295,9 +307,7 @@ PortalCleanup(Portal portal)
if (portal->resowner)
CurrentResourceOwner = portal->resowner;
- ExecutorFinish(queryDesc);
- ExecutorEnd(queryDesc);
- FreeQueryDesc(queryDesc);
+ PortalQueryFinish(queryDesc);
CurrentResourceOwner = saveResourceOwner;
}
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 18f70319fc..c9070ed97f 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -183,6 +183,7 @@ ExecuteQuery(ParseState *pstate,
paramLI = EvaluateParams(pstate, entry, stmt->params, estate);
}
+replan:
/* Create a new portal to run the query in */
portal = CreateNewPortal();
/* Don't display the portal in pg_cursors, it is for internal use only */
@@ -251,10 +252,19 @@ ExecuteQuery(ParseState *pstate,
}
/*
- * Run the portal as appropriate.
+	 * Run the portal as appropriate. If the portal contains a cached plan, it
+	 * must be recreated if portal->plan_valid is false, which indicates that
+	 * the cached plan was found to have been invalidated while initializing
+	 * one of the plan trees contained in it.
*/
PortalStart(portal, paramLI, eflags, GetActiveSnapshot());
+ if (!portal->plan_valid)
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
+
(void) PortalRun(portal, count, false, true, dest, dest, qc);
PortalDrop(portal, false);
@@ -574,7 +584,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
{
PreparedStatement *entry;
const char *query_string;
- CachedPlan *cplan;
+ CachedPlan *cplan = NULL;
List *plan_list;
ListCell *p;
ParamListInfo paramLI = NULL;
@@ -618,6 +628,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
+replan:
cplan = GetCachedPlan(entry->plansource, paramLI,
CurrentResourceOwner, queryEnv);
@@ -639,8 +650,21 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ {
+ QueryDesc *queryDesc;
+
+ queryDesc = ExplainQueryDesc(pstmt, cplan, queryString,
+ into, es, paramLI, queryEnv);
+ if (queryDesc == NULL)
+ {
+ ExplainResetOutput(es);
+ ReleaseCachedPlan(cplan, CurrentResourceOwner);
+ goto replan;
+ }
+ ExplainOnePlan(queryDesc, into, es, query_string, paramLI,
+ queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
+ }
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4c5a7bbf62..a2f6ac9d1c 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -620,6 +620,17 @@ ExecCheckPermissions(List *rangeTable, List *rteperminfos,
RTEPermissionInfo *perminfo = lfirst_node(RTEPermissionInfo, l);
Assert(OidIsValid(perminfo->relid));
+
+ /*
+ * Relations whose permissions need to be checked must already have
+ * been locked by the parser or by GetCachedPlan() if a cached plan is
+ * being executed.
+ *
+ * XXX shouldn't we skip calling ExecCheckPermissions from InitPlan
+ * in a parallel worker?
+ */
+ Assert(CheckRelLockedByMe(perminfo->relid, AccessShareLock, true) ||
+ IsParallelWorker());
result = ExecCheckOneRelPerms(perminfo);
if (!result)
{
@@ -829,6 +840,23 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
*
* Initializes the query plan: open files, allocate storage
* and start up the rule manager
+ *
+ *
+ * Normally, the plan tree given in queryDesc->plannedstmt is known to be
+ * valid in a race-free manner, that is, all relations contained in
+ * plannedstmt->relationOids would have already been locked. That is not
+ * the case, however, if the plannedstmt comes from a CachedPlan, given in
+ * queryDesc->cplan. That's because GetCachedPlan() locks only the tables
+ * mentioned in the original query, not the child tables that the planner
+ * would have added to the plan. In that case, locks on child tables are
+ * taken when their Scan nodes are initialized by ExecInitNode(), which is
+ * done here. If the CachedPlan gets invalidated as those locks are taken,
+ * plan tree initialization is suspended at the point where the
+ * invalidation is first detected, queryDesc->planstate is set to NULL,
+ * and queryDesc->plan_valid to false. Callers must then retry the
+ * execution after creating a new CachedPlan, having properly released the
+ * resources of this QueryDesc, which includes calling ExecutorFinish()
+ * and ExecutorEnd() on the EState contained therein.
* ----------------------------------------------------------------
*/
static void
@@ -839,7 +867,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
- PlanState *planstate;
+ PlanState *planstate = NULL;
TupleDesc tupType;
ListCell *l;
int i;
@@ -850,10 +878,11 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
/*
- * initialize the node's execution state
+ * Set up range table in EState.
*/
ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
+ estate->es_cachedplan = queryDesc->cplan;
estate->es_plannedstmt = plannedstmt;
/*
@@ -886,6 +915,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
case ROW_MARK_KEYSHARE:
case ROW_MARK_REFERENCE:
relation = ExecGetRangeTableRelation(estate, rc->rti);
+ if (!ExecPlanStillValid(estate))
+ goto plan_init_suspended;
break;
case ROW_MARK_COPY:
/* no physical table access is required */
@@ -953,6 +984,11 @@ InitPlan(QueryDesc *queryDesc, int eflags)
sp_eflags |= EXEC_FLAG_REWIND;
subplanstate = ExecInitNode(subplan, estate, sp_eflags);
+ if (!ExecPlanStillValid(estate))
+ {
+ Assert(subplanstate == NULL);
+ goto plan_init_suspended;
+ }
estate->es_subplanstates = lappend(estate->es_subplanstates,
subplanstate);
@@ -966,6 +1002,11 @@ InitPlan(QueryDesc *queryDesc, int eflags)
* processing tuples.
*/
planstate = ExecInitNode(plan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ {
+ Assert(planstate == NULL);
+ goto plan_init_suspended;
+ }
/*
* Get the tuple descriptor describing the type of tuples to return.
@@ -1008,7 +1049,19 @@ InitPlan(QueryDesc *queryDesc, int eflags)
}
queryDesc->tupDesc = tupType;
+ Assert(planstate != NULL);
queryDesc->planstate = planstate;
+ queryDesc->plan_valid = true;
+ return;
+
+plan_init_suspended:
+ /*
+	 * Plan initialization was suspended; mark the QueryDesc as such.
+	 * ExecEndPlan() will clean up nodes in estate->es_inited_plannodes.
+ */
+ Assert(planstate == NULL);
+ queryDesc->planstate = NULL;
+ queryDesc->plan_valid = false;
}
/*
@@ -1426,7 +1479,7 @@ ExecGetAncestorResultRels(EState *estate, ResultRelInfo *resultRelInfo)
/*
* All ancestors up to the root target relation must have been
- * locked by the planner or AcquireExecutorLocks().
+ * locked by the planner or ExecLockAppendNonLeafRelations().
*/
ancRel = table_open(ancOid, NoLock);
rInfo = makeNode(ResultRelInfo);
@@ -1504,18 +1557,15 @@ ExecEndPlan(PlanState *planstate, EState *estate)
ListCell *l;
/*
- * shut down the node-type-specific query processing
+	 * Shut down the node-type-specific query processing for all nodes that
+	 * were initialized during InitPlan(), both those in the main plan tree
+	 * and those in subplans (es_subplanstates), if any.
*/
- ExecEndNode(planstate);
-
- /*
- * for subplans too
- */
- foreach(l, estate->es_subplanstates)
+ foreach(l, estate->es_inited_plannodes)
{
- PlanState *subplanstate = (PlanState *) lfirst(l);
+ PlanState *pstate = (PlanState *) lfirst(l);
- ExecEndNode(subplanstate);
+ ExecEndNode(pstate);
}
/*
@@ -2858,7 +2908,8 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
* Child EPQ EStates share the parent's copy of unchanging state such as
* the snapshot, rangetable, and external Param info. They need their own
* copies of local state, including a tuple table, es_param_exec_vals,
- * result-rel info, etc.
+	 * result-rel info, etc. Also, we don't pass the parent's copy of the
+	 * CachedPlan, because no new locks will be taken for EvalPlanQual().
*/
rcestate->es_direction = ForwardScanDirection;
rcestate->es_snapshot = parentestate->es_snapshot;
@@ -2945,6 +2996,12 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
PlanState *subplanstate;
subplanstate = ExecInitNode(subplan, rcestate, 0);
+
+ /*
+		 * At this point, we had better not have received any new invalidation
+		 * messages that would have caused the plan tree to go stale.
+ */
+ Assert(ExecPlanStillValid(rcestate) && subplanstate);
rcestate->es_subplanstates = lappend(rcestate->es_subplanstates,
subplanstate);
}
@@ -2988,6 +3045,12 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
*/
epqstate->recheckplanstate = ExecInitNode(planTree, rcestate, 0);
+ /*
+	 * At this point, we had better not have received any new invalidation
+	 * messages that would have caused the plan tree to go stale.
+ */
+ Assert(ExecPlanStillValid(rcestate) && epqstate->recheckplanstate);
+
MemoryContextSwitchTo(oldcontext);
}
@@ -3010,6 +3073,10 @@ EvalPlanQualEnd(EPQState *epqstate)
MemoryContext oldcontext;
ListCell *l;
+ /* Nothing to do if EvalPlanQualInit() wasn't done to begin with. */
+ if (epqstate->parentestate == NULL)
+ return;
+
rtsize = epqstate->parentestate->es_range_table_size;
/*
@@ -3030,13 +3097,16 @@ EvalPlanQualEnd(EPQState *epqstate)
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
- ExecEndNode(epqstate->recheckplanstate);
-
- foreach(l, estate->es_subplanstates)
+ /*
+	 * Shut down the node-type-specific query processing for all nodes that
+	 * were initialized during EvalPlanQualStart(), both those in the main
+	 * plan tree and those in subplans (es_subplanstates), if any.
+ */
+ foreach(l, estate->es_inited_plannodes)
{
- PlanState *subplanstate = (PlanState *) lfirst(l);
+ PlanState *planstate = (PlanState *) lfirst(l);
- ExecEndNode(subplanstate);
+ ExecEndNode(planstate);
}
/* throw away the per-estate tuple table, some node may have used it */
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index cc2b8ccab7..42df7b6428 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1248,8 +1248,17 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(¶mspace);
- /* Create a QueryDesc for the query. */
+ /*
+	 * Create a QueryDesc for the query. Note that no CachedPlan is available
+	 * here even if the leader may have gotten the plan tree from one. That's
+	 * fine though, because the leader would have taken all the locks
+	 * necessary for the plan tree that we have here to be fully valid. That
+	 * remains true even though we take our own copies of those locks in
+	 * ExecGetRangeTableRelation(), because every lock we take there is one
+	 * that the leader already holds.
+ */
return CreateQueryDesc(pstmt,
+ NULL,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
@@ -1431,6 +1440,7 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Start up the executor */
queryDesc->plannedstmt->jitFlags = fpes->jit_flags;
ExecutorStart(queryDesc, fpes->eflags);
+ Assert(queryDesc->plan_valid);
/* Special executor initialization steps for parallel workers */
queryDesc->planstate->state->es_query_dsa = area;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index eb8a87fd63..cf73d28baa 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -513,6 +513,13 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
oldcxt = MemoryContextSwitchTo(proute->memcxt);
+ /*
+	 * Note that while we normally check ExecPlanStillValid(estate) after each
+	 * lock taken during execution initialization, it is fine not to do so for
+ * partitions opened here, for tuple routing. Locks taken here can't
+ * possibly invalidate the plan given that the plan doesn't contain any
+ * info about those partitions.
+ */
partrel = table_open(partOid, RowExclusiveLock);
leaf_part_rri = makeNode(ResultRelInfo);
@@ -1111,6 +1118,9 @@ ExecInitPartitionDispatchInfo(EState *estate,
* Only sub-partitioned tables need to be locked here. The root
* partitioned table will already have been locked as it's referenced in
* the query's rtable.
+ *
+ * See the comment in ExecInitPartitionInfo() about taking locks and
+ * not checking ExecPlanStillValid(estate) here.
*/
if (partoid != RelationGetRelid(proute->partition_root))
rel = table_open(partoid, RowExclusiveLock);
@@ -1801,6 +1811,8 @@ ExecInitPartitionPruning(PlanState *planstate,
/* Create the working data structure for pruning */
prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Perform an initial partition prune pass, if required.
@@ -1927,6 +1939,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* duration of this executor run.
*/
partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
partkey = RelationGetPartitionKey(partrel);
partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
partrel);
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 4d288bc8d4..f3bb1d4591 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -135,7 +135,17 @@ static bool ExecShutdownNode_walker(PlanState *node, void *context);
* 'estate' is the shared execution state for the plan tree
* 'eflags' is a bitwise OR of flag bits described in executor.h
*
- * Returns a PlanState node corresponding to the given Plan node.
+ * Returns a PlanState node corresponding to the given Plan node or NULL.
+ *
+ * NULL may be returned either if the input node is NULL or if the plan
+ * tree that the node is a part of is found to have been invalidated when
+ * taking a lock on the relation mentioned in the node or in a child
+ * node. The latter case arises if the plan tree contains inheritance/
+ * partition child tables and is from a CachedPlan.
+ *
+ * Also, all non-NULL PlanState nodes are added to
+ * estate->es_inited_plannodes, which ExecEndPlan() iterates over to close
+ * each one using ExecEndNode().
* ------------------------------------------------------------------------
*/
PlanState *
@@ -388,6 +398,13 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
break;
}
+ if (!ExecPlanStillValid(estate))
+ {
+ Assert(result == NULL);
+ return NULL;
+ }
+
+ Assert(result != NULL);
ExecSetExecProcNode(result, result->ExecProcNode);
/*
@@ -411,6 +428,13 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
result->instrument = InstrAlloc(1, estate->es_instrument,
result->async_capable);
+ /*
+ * Remember valid PlanState nodes in EState for the processing in
+ * ExecEndPlan().
+ */
+ estate->es_inited_plannodes = lappend(estate->es_inited_plannodes,
+ result);
+
return result;
}
@@ -545,29 +569,21 @@ MultiExecProcNode(PlanState *node)
/* ----------------------------------------------------------------
* ExecEndNode
*
- * Recursively cleans up all the nodes in the plan rooted
- * at 'node'.
+ * Cleans up node
*
- * After this operation, the query plan will not be able to be
- * processed any further. This should be called only after
+ * Child nodes, if any, would have been closed by the caller, so the
+ * ExecEnd* routine for a given node type is only responsible for
+ * cleaning up the resources local to that node.
+ *
+ * After this operation, the query plan containing this node will not be
+ * able to be processed any further. This should be called only after
* the query plan has been fully executed.
* ----------------------------------------------------------------
*/
void
ExecEndNode(PlanState *node)
{
- /*
- * do nothing when we get to the end of a leaf on tree.
- */
- if (node == NULL)
- return;
-
- /*
- * Make sure there's enough stack available. Need to check here, in
- * addition to ExecProcNode() (via ExecProcNodeFirst()), because it's not
- * guaranteed that ExecProcNode() is reached for all nodes.
- */
- check_stack_depth();
+ Assert(node != NULL);
if (node->chgParam != NULL)
{
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index c06b228858..af92d2b3c3 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -804,7 +804,25 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rte->rtekind == RTE_RELATION);
- if (!IsParallelWorker())
+ if (IsParallelWorker() ||
+ (estate->es_cachedplan != NULL && !rte->inFromCl))
+ {
+ /*
+ * Take a lock if we are a parallel worker or if this is a child
+ * table referenced in a cached plan.
+ *
+ * Parallel workers need to have their own local lock on the
+ * relation. This ensures sane behavior in case the parent process
+ * exits before we do.
+ *
+ * When executing a cached plan, child tables must be locked
+ * here, because plancache.c (GetCachedPlan()) would only have
+ * locked tables mentioned in the query, that is, tables whose
+ * RTEs' inFromCl is true.
+ */
+ rel = table_open(rte->relid, rte->rellockmode);
+ }
+ else
{
/*
* In a normal query, we should already have the appropriate lock,
@@ -817,15 +835,6 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rte->rellockmode == AccessShareLock ||
CheckRelationLockedByMe(rel, rte->rellockmode, false));
}
- else
- {
- /*
- * If we are a parallel worker, we need to obtain our own local
- * lock on the relation. This ensures sane behavior in case the
- * parent process exits before we do.
- */
- rel = table_open(rte->relid, rte->rellockmode);
- }
estate->es_relations[rti - 1] = rel;
}
@@ -833,6 +842,38 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
return rel;
}
+/*
+ * ExecLockAppendNonLeafRelations
+ * Lock non-leaf relations whose children are scanned by a given
+ * Append/MergeAppend node
+ */
+void
+ExecLockAppendNonLeafRelations(EState *estate, List *allpartrelids)
+{
+ ListCell *l;
+
+ /* This should get called only when executing cached plans. */
+ Assert(estate->es_cachedplan != NULL);
+ foreach(l, allpartrelids)
+ {
+ Bitmapset *partrelids = lfirst_node(Bitmapset, l);
+ int i;
+
+ /*
+ * Note that we don't lock the first member (i=0) of each bitmapset
+ * because it stands for the root parent mentioned in the query that
+ * should always have been locked before entering the executor.
+ */
+ i = 0;
+ while ((i = bms_next_member(partrelids, i)) > 0)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(i, estate);
+
+ LockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
+}
+
/*
* ExecInitResultRelation
* Open relation given by the passed-in RT index and fill its
@@ -848,6 +889,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
Relation resultRelationDesc;
resultRelationDesc = ExecGetRangeTableRelation(estate, rti);
+ if (!ExecPlanStillValid(estate))
+ return;
InitResultRelInfo(resultRelInfo,
resultRelationDesc,
rti,
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index f55424eb5a..c88f72bc4e 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -838,6 +838,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest = None_Receiver;
es->qd = CreateQueryDesc(es->stmt,
+ NULL, /* fmgr_sql() doesn't use CachedPlans */
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -863,6 +864,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
eflags = 0; /* default run-to-completion flags */
ExecutorStart(es->qd, eflags);
+ Assert(es->qd->plan_valid);
}
es->status = F_EXEC_RUN;
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 468db94fe5..54f742820b 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -3304,6 +3304,8 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
eflags &= ~EXEC_FLAG_REWIND;
outerPlan = outerPlan(node);
outerPlanState(aggstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* initialize source tuple type.
@@ -4304,7 +4306,6 @@ GetAggInitVal(Datum textInitVal, Oid transtype)
void
ExecEndAgg(AggState *node)
{
- PlanState *outerPlan;
int transno;
int numGroupingSets = Max(node->maxsets, 1);
int setno;
@@ -4366,9 +4367,6 @@ ExecEndAgg(AggState *node)
/* clean up tuple table */
ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
}
void
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 609df6b9e6..a6dadb7d07 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -133,6 +133,27 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_syncdone = false;
appendstate->as_begun = false;
+ /*
+ * Must take locks on child tables if running a cached plan, because
+ * GetCachedPlan() would've only locked the root parent named in the
+ * query.
+ *
+	 * First, lock non-leaf partitions before doing any pruning. Even when
+	 * no pruning is to be done, non-leaf partitions must still be locked
+	 * explicitly like this, because they're not referenced elsewhere in
+ * the plan tree. XXX - OTOH, non-leaf partitions mentioned in
+ * part_prune_info, if any, would be opened by ExecInitPartitionPruning()
+ * using ExecGetRangeTableRelation() which locks child tables, redundantly
+ * in this case.
+ */
+ if (estate->es_cachedplan)
+ {
+ ExecLockAppendNonLeafRelations(estate, node->allpartrelids);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
+
+ }
+
/* If run-time partition pruning is enabled, then set that up now */
if (node->part_prune_info != NULL)
{
@@ -147,6 +168,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
list_length(node->appendplans),
node->part_prune_info,
&validsubplans);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
@@ -221,6 +244,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
firstvalid = j;
appendplanstates[j++] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
}
appendstate->as_first_partial_plan = firstvalid;
@@ -376,30 +401,15 @@ ExecAppend(PlanState *pstate)
/* ----------------------------------------------------------------
* ExecEndAppend
- *
- * Shuts down the subscans of the append node.
- *
- * Returns nothing of interest.
* ----------------------------------------------------------------
*/
void
ExecEndAppend(AppendState *node)
{
- PlanState **appendplans;
- int nplans;
- int i;
-
- /*
- * get information from the node
- */
- appendplans = node->appendplans;
- nplans = node->as_nplans;
-
- /*
- * shut down each of the subscans
- */
- for (i = 0; i < nplans; i++)
- ExecEndNode(appendplans[i]);
+ /*
+ * Nothing to do as subscans of the append node would be cleaned up by
+ * ExecEndPlan().
+ */
}
void
diff --git a/src/backend/executor/nodeBitmapAnd.c b/src/backend/executor/nodeBitmapAnd.c
index 4c5eb2b23b..187aea4bb8 100644
--- a/src/backend/executor/nodeBitmapAnd.c
+++ b/src/backend/executor/nodeBitmapAnd.c
@@ -88,8 +88,9 @@ ExecInitBitmapAnd(BitmapAnd *node, EState *estate, int eflags)
foreach(l, node->bitmapplans)
{
initNode = (Plan *) lfirst(l);
- bitmapplanstates[i] = ExecInitNode(initNode, estate, eflags);
- i++;
+ bitmapplanstates[i++] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
}
/*
@@ -168,33 +169,15 @@ MultiExecBitmapAnd(BitmapAndState *node)
/* ----------------------------------------------------------------
* ExecEndBitmapAnd
- *
- * Shuts down the subscans of the BitmapAnd node.
- *
- * Returns nothing of interest.
* ----------------------------------------------------------------
*/
void
ExecEndBitmapAnd(BitmapAndState *node)
{
- PlanState **bitmapplans;
- int nplans;
- int i;
-
- /*
- * get information from the node
- */
- bitmapplans = node->bitmapplans;
- nplans = node->nplans;
-
- /*
- * shut down each of the subscans (that we've initialized)
- */
- for (i = 0; i < nplans; i++)
- {
- if (bitmapplans[i])
- ExecEndNode(bitmapplans[i]);
- }
+ /*
+ * Nothing to do as any subscans that would have been initialized would
+ * be cleaned up by ExecEndPlan().
+ */
}
void
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index f35df0b8bf..ee1008519b 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -667,11 +667,6 @@ ExecEndBitmapHeapScan(BitmapHeapScanState *node)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
ExecClearTuple(node->ss.ss_ScanTupleSlot);
- /*
- * close down subplans
- */
- ExecEndNode(outerPlanState(node));
-
/*
* release bitmaps and buffers if any
*/
@@ -763,11 +758,15 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* initialize child nodes
*/
outerPlanState(scanstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* get the scan type from the relation descriptor.
diff --git a/src/backend/executor/nodeBitmapIndexscan.c b/src/backend/executor/nodeBitmapIndexscan.c
index 83ec9ede89..99015812a1 100644
--- a/src/backend/executor/nodeBitmapIndexscan.c
+++ b/src/backend/executor/nodeBitmapIndexscan.c
@@ -211,6 +211,7 @@ BitmapIndexScanState *
ExecInitBitmapIndexScan(BitmapIndexScan *node, EState *estate, int eflags)
{
BitmapIndexScanState *indexstate;
+ Relation indexRelation;
LOCKMODE lockmode;
/* check for unsupported flags */
@@ -262,7 +263,13 @@ ExecInitBitmapIndexScan(BitmapIndexScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
- indexstate->biss_RelationDesc = index_open(node->indexid, lockmode);
+ indexRelation = index_open(node->indexid, lockmode);
+ if (!ExecPlanStillValid(estate))
+ {
+ index_close(indexRelation, lockmode);
+ return NULL;
+ }
+ indexstate->biss_RelationDesc = indexRelation;
/*
* Initialize index-specific scan state
diff --git a/src/backend/executor/nodeBitmapOr.c b/src/backend/executor/nodeBitmapOr.c
index 0bf8af9652..3f51918fe1 100644
--- a/src/backend/executor/nodeBitmapOr.c
+++ b/src/backend/executor/nodeBitmapOr.c
@@ -89,8 +89,9 @@ ExecInitBitmapOr(BitmapOr *node, EState *estate, int eflags)
foreach(l, node->bitmapplans)
{
initNode = (Plan *) lfirst(l);
- bitmapplanstates[i] = ExecInitNode(initNode, estate, eflags);
- i++;
+ bitmapplanstates[i++] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
}
/*
@@ -186,33 +187,15 @@ MultiExecBitmapOr(BitmapOrState *node)
/* ----------------------------------------------------------------
* ExecEndBitmapOr
- *
- * Shuts down the subscans of the BitmapOr node.
- *
- * Returns nothing of interest.
* ----------------------------------------------------------------
*/
void
ExecEndBitmapOr(BitmapOrState *node)
{
- PlanState **bitmapplans;
- int nplans;
- int i;
-
- /*
- * get information from the node
- */
- bitmapplans = node->bitmapplans;
- nplans = node->nplans;
-
- /*
- * shut down each of the subscans (that we've initialized)
- */
- for (i = 0; i < nplans; i++)
- {
- if (bitmapplans[i])
- ExecEndNode(bitmapplans[i]);
- }
+ /*
+ * Nothing to do as any subscans that would have been initialized would
+ * be cleaned up by ExecEndPlan().
+ */
}
void
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
index bd42c65b29..91239cc500 100644
--- a/src/backend/executor/nodeCustom.c
+++ b/src/backend/executor/nodeCustom.c
@@ -61,6 +61,8 @@ ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
if (scanrelid > 0)
{
scan_rel = ExecOpenScanRelation(estate, scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
css->ss.ss_currentRelation = scan_rel;
}
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index c2139acca0..207165f44f 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -173,6 +173,8 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
if (scanrelid > 0)
{
currentRelation = ExecOpenScanRelation(estate, scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
scanstate->ss.ss_currentRelation = currentRelation;
fdwroutine = GetFdwRoutineForRelation(currentRelation, true);
}
@@ -264,6 +266,8 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
if (outerPlan(node))
outerPlanState(scanstate) =
ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Tell the FDW to initialize the scan.
@@ -309,10 +313,6 @@ ExecEndForeignScan(ForeignScanState *node)
else
node->fdwroutine->EndForeignScan(node);
- /* Shut down any outer plan. */
- if (outerPlanState(node))
- ExecEndNode(outerPlanState(node));
-
/* Free the exprcontext */
ExecFreeExprContext(&node->ss.ps);
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 307fc10eea..400c8b42ed 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -89,6 +89,9 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
*/
outerNode = outerPlan(node);
outerPlanState(gatherstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
+
tupDesc = ExecGetResultType(outerPlanState(gatherstate));
/*
@@ -248,7 +251,6 @@ ExecGather(PlanState *pstate)
void
ExecEndGather(GatherState *node)
{
- ExecEndNode(outerPlanState(node)); /* let children clean up first */
ExecShutdownGather(node);
ExecFreeExprContext(&node->ps);
if (node->ps.ps_ResultTupleSlot)
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 9d5e1a46e9..9077c4bc55 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -108,6 +108,8 @@ ExecInitGatherMerge(GatherMerge *node, EState *estate, int eflags)
*/
outerNode = outerPlan(node);
outerPlanState(gm_state) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Leader may access ExecProcNode result directly (if
@@ -288,7 +290,6 @@ ExecGatherMerge(PlanState *pstate)
void
ExecEndGatherMerge(GatherMergeState *node)
{
- ExecEndNode(outerPlanState(node)); /* let children clean up first */
ExecShutdownGatherMerge(node);
ExecFreeExprContext(&node->ps);
if (node->ps.ps_ResultTupleSlot)
diff --git a/src/backend/executor/nodeGroup.c b/src/backend/executor/nodeGroup.c
index 25a1618952..976e739ab7 100644
--- a/src/backend/executor/nodeGroup.c
+++ b/src/backend/executor/nodeGroup.c
@@ -185,6 +185,8 @@ ExecInitGroup(Group *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(grpstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize scan slot and type.
@@ -226,15 +228,10 @@ ExecInitGroup(Group *node, EState *estate, int eflags)
void
ExecEndGroup(GroupState *node)
{
- PlanState *outerPlan;
-
ExecFreeExprContext(&node->ss.ps);
/* clean up tuple table */
ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
}
void
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 8b5c35b82b..fc7a6b2ccc 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -386,6 +386,8 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(hashstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* initialize our result slot and type. No need to build projection
@@ -413,18 +415,10 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
void
ExecEndHash(HashState *node)
{
- PlanState *outerPlan;
-
/*
* free exprcontext
*/
ExecFreeExprContext(&node->ps);
-
- /*
- * shut down the subplan
- */
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
}
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 980746128b..4c4b39ce2d 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -752,8 +752,12 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
hashNode = (Hash *) innerPlan(node);
outerPlanState(hjstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
outerDesc = ExecGetResultType(outerPlanState(hjstate));
innerPlanState(hjstate) = ExecInitNode((Plan *) hashNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
innerDesc = ExecGetResultType(innerPlanState(hjstate));
/*
@@ -878,12 +882,6 @@ ExecEndHashJoin(HashJoinState *node)
ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
ExecClearTuple(node->hj_OuterTupleSlot);
ExecClearTuple(node->hj_HashTupleSlot);
-
- /*
- * clean up subtrees
- */
- ExecEndNode(outerPlanState(node));
- ExecEndNode(innerPlanState(node));
}
/*
diff --git a/src/backend/executor/nodeIncrementalSort.c b/src/backend/executor/nodeIncrementalSort.c
index 7683e3341c..5b11afeb96 100644
--- a/src/backend/executor/nodeIncrementalSort.c
+++ b/src/backend/executor/nodeIncrementalSort.c
@@ -1041,6 +1041,8 @@ ExecInitIncrementalSort(IncrementalSort *node, EState *estate, int eflags)
* nodes may be able to do something more useful.
*/
outerPlanState(incrsortstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize scan slot and type.
@@ -1101,11 +1103,6 @@ ExecEndIncrementalSort(IncrementalSortState *node)
node->prefixsort_state = NULL;
}
- /*
- * Shut down the subplan.
- */
- ExecEndNode(outerPlanState(node));
-
SO_printf("ExecEndIncrementalSort: sort node shutdown\n");
}
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 0b43a9b969..ea8bef4b97 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -490,6 +490,7 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
{
IndexOnlyScanState *indexstate;
Relation currentRelation;
+ Relation indexRelation;
LOCKMODE lockmode;
TupleDesc tupDesc;
@@ -512,6 +513,8 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
indexstate->ss.ss_currentRelation = currentRelation;
indexstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
@@ -564,7 +567,13 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
- indexstate->ioss_RelationDesc = index_open(node->indexid, lockmode);
+ indexRelation = index_open(node->indexid, lockmode);
+ if (!ExecPlanStillValid(estate))
+ {
+ index_close(indexRelation, lockmode);
+ return NULL;
+ }
+ indexstate->ioss_RelationDesc = indexRelation;
/*
* Initialize index-specific scan state
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 4540c7781d..956e9e5543 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -904,6 +904,7 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
{
IndexScanState *indexstate;
Relation currentRelation;
+ Relation indexRelation;
LOCKMODE lockmode;
/*
@@ -925,6 +926,8 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
indexstate->ss.ss_currentRelation = currentRelation;
indexstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
@@ -969,7 +972,13 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
- indexstate->iss_RelationDesc = index_open(node->indexid, lockmode);
+ indexRelation = index_open(node->indexid, lockmode);
+ if (!ExecPlanStillValid(estate))
+ {
+ index_close(indexRelation, lockmode);
+ return NULL;
+ }
+ indexstate->iss_RelationDesc = indexRelation;
/*
* Initialize index-specific scan state
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index 425fbfc405..1cc884bc65 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -476,6 +476,8 @@ ExecInitLimit(Limit *node, EState *estate, int eflags)
*/
outerPlan = outerPlan(node);
outerPlanState(limitstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* initialize child expressions
@@ -535,7 +537,6 @@ void
ExecEndLimit(LimitState *node)
{
ExecFreeExprContext(&node->ps);
- ExecEndNode(outerPlanState(node));
}
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index e459971d32..77731c0c8c 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -322,6 +322,8 @@ ExecInitLockRows(LockRows *node, EState *estate, int eflags)
* then initialize outer plan
*/
outerPlanState(lrstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/* node returns unmodified slots from the outer plan */
lrstate->ps.resultopsset = true;
@@ -386,7 +388,6 @@ ExecEndLockRows(LockRowsState *node)
{
/* We may have shut down EPQ already, but no harm in another call */
EvalPlanQualEnd(&node->lr_epqstate);
- ExecEndNode(outerPlanState(node));
}
diff --git a/src/backend/executor/nodeMaterial.c b/src/backend/executor/nodeMaterial.c
index 09632678b0..a38b9805a5 100644
--- a/src/backend/executor/nodeMaterial.c
+++ b/src/backend/executor/nodeMaterial.c
@@ -214,6 +214,8 @@ ExecInitMaterial(Material *node, EState *estate, int eflags)
outerPlan = outerPlan(node);
outerPlanState(matstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize result type and slot. No need to initialize projection info
@@ -250,11 +252,6 @@ ExecEndMaterial(MaterialState *node)
if (node->tuplestorestate != NULL)
tuplestore_end(node->tuplestorestate);
node->tuplestorestate = NULL;
-
- /*
- * shut down the subplan
- */
- ExecEndNode(outerPlanState(node));
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeMemoize.c b/src/backend/executor/nodeMemoize.c
index 4f04269e26..a8997ba7da 100644
--- a/src/backend/executor/nodeMemoize.c
+++ b/src/backend/executor/nodeMemoize.c
@@ -938,6 +938,8 @@ ExecInitMemoize(Memoize *node, EState *estate, int eflags)
outerNode = outerPlan(node);
outerPlanState(mstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize return slot and type. No need to initialize projection info
@@ -1099,11 +1101,6 @@ ExecEndMemoize(MemoizeState *node)
* free exprcontext
*/
ExecFreeExprContext(&node->ss.ps);
-
- /*
- * shut down the subplan
- */
- ExecEndNode(outerPlanState(node));
}
void
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 21b5726e6e..8718f20825 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -81,6 +81,27 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.state = estate;
mergestate->ps.ExecProcNode = ExecMergeAppend;
+ /*
+ * Must take locks on child tables if running a cached plan, because
+ * GetCachedPlan() would've only locked the root parent named in the
+ * query.
+ *
+	 * First, lock non-leaf partitions before doing any pruning. Even when
+	 * no pruning is to be done, non-leaf partitions must still be locked
+	 * explicitly like this, because they're not referenced elsewhere in
+ * the plan tree. XXX - OTOH, non-leaf partitions mentioned in
+ * part_prune_info, if any, would be opened by ExecInitPartitionPruning()
+ * using ExecGetRangeTableRelation() which locks child tables, redundantly
+ * in this case.
+ */
+ if (estate->es_cachedplan)
+ {
+ ExecLockAppendNonLeafRelations(estate, node->allpartrelids);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
+
+ }
+
/* If run-time partition pruning is enabled, then set that up now */
if (node->part_prune_info != NULL)
{
@@ -95,6 +116,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
list_length(node->mergeplans),
node->part_prune_info,
&validsubplans);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
@@ -151,6 +174,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
Plan *initNode = (Plan *) list_nth(node->mergeplans, i);
mergeplanstates[j++] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
}
mergestate->ps.ps_ProjInfo = NULL;
@@ -310,30 +335,14 @@ heap_compare_slots(Datum a, Datum b, void *arg)
/* ----------------------------------------------------------------
* ExecEndMergeAppend
- *
- * Shuts down the subscans of the MergeAppend node.
- *
- * Returns nothing of interest.
* ----------------------------------------------------------------
*/
void
ExecEndMergeAppend(MergeAppendState *node)
{
- PlanState **mergeplans;
- int nplans;
- int i;
-
- /*
- * get information from the node
- */
- mergeplans = node->mergeplans;
- nplans = node->ms_nplans;
-
- /*
- * shut down each of the subscans
- */
- for (i = 0; i < nplans; i++)
- ExecEndNode(mergeplans[i]);
+ /*
+ * Nothing to do here; the subscans will be cleaned up by ExecEndPlan().
+ */
}
void
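[The bail-out pattern above recurs in every ExecInit* function this patch
touches. As a minimal sketch of the convention (the Foo node type and its
state are hypothetical; ExecInitNode() and ExecPlanStillValid() are the real
entry points):

    static PlanState *
    ExecInitFoo(Foo *node, EState *estate, int eflags)
    {
        FooState   *state = makeNode(FooState);

        /*
         * Initializing a child may lock previously unlocked partitions;
         * taking a lock processes pending invalidation messages, which can
         * mark the CachedPlan invalid.
         */
        outerPlanState(state) = ExecInitNode(outerPlan(node), estate, eflags);
        if (!ExecPlanStillValid(estate))
            return NULL;        /* abandon initialization; caller replans */

        return (PlanState *) state;
    }
]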
diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 00f96d045e..c6644c6816 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -1490,11 +1490,15 @@ ExecInitMergeJoin(MergeJoin *node, EState *estate, int eflags)
mergestate->mj_SkipMarkRestore = node->skip_mark_restore;
outerPlanState(mergestate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
outerDesc = ExecGetResultType(outerPlanState(mergestate));
innerPlanState(mergestate) = ExecInitNode(innerPlan(node), estate,
mergestate->mj_SkipMarkRestore ?
eflags :
(eflags | EXEC_FLAG_MARK));
+ if (!ExecPlanStillValid(estate))
+ return NULL;
innerDesc = ExecGetResultType(innerPlanState(mergestate));
/*
@@ -1654,12 +1658,6 @@ ExecEndMergeJoin(MergeJoinState *node)
ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
ExecClearTuple(node->mj_MarkedTupleSlot);
- /*
- * shut down the subplans
- */
- ExecEndNode(innerPlanState(node));
- ExecEndNode(outerPlanState(node));
-
MJ1_printf("ExecEndMergeJoin: %s\n",
"node processing ended");
}
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 2a5fec8d01..0c3aeb1154 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -3984,6 +3984,9 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
linitial_int(node->resultRelations));
}
+ if (!ExecPlanStillValid(estate))
+ return NULL;
+
/* set up epqstate with dummy subplan data for the moment */
EvalPlanQualInit(&mtstate->mt_epqstate, estate, NULL, NIL,
node->epqParam, node->resultRelations);
@@ -4011,6 +4014,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
if (resultRelInfo != mtstate->rootResultRelInfo)
{
ExecInitResultRelation(estate, resultRelInfo, resultRelation);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* For child result relations, store the root result relation
@@ -4038,6 +4043,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* Now we may initialize the subplan.
*/
outerPlanState(mtstate) = ExecInitNode(subplan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Do additional per-result-relation initialization.
@@ -4460,11 +4467,6 @@ ExecEndModifyTable(ModifyTableState *node)
* Terminate EPQ execution if active
*/
EvalPlanQualEnd(&node->mt_epqstate);
-
- /*
- * shut down subplan
- */
- ExecEndNode(outerPlanState(node));
}
void
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index b3d52e69ec..71a1f8101c 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -295,11 +295,15 @@ ExecInitNestLoop(NestLoop *node, EState *estate, int eflags)
* values.
*/
outerPlanState(nlstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
if (node->nestParams == NIL)
eflags |= EXEC_FLAG_REWIND;
else
eflags &= ~EXEC_FLAG_REWIND;
innerPlanState(nlstate) = ExecInitNode(innerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize result slot, type and projection.
@@ -374,12 +378,6 @@ ExecEndNestLoop(NestLoopState *node)
*/
ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
- /*
- * close down subplans
- */
- ExecEndNode(outerPlanState(node));
- ExecEndNode(innerPlanState(node));
-
NL1_printf("ExecEndNestLoop: %s\n",
"node processing ended");
}
diff --git a/src/backend/executor/nodeProjectSet.c b/src/backend/executor/nodeProjectSet.c
index f6ff3dc44c..abcbd7e765 100644
--- a/src/backend/executor/nodeProjectSet.c
+++ b/src/backend/executor/nodeProjectSet.c
@@ -247,6 +247,8 @@ ExecInitProjectSet(ProjectSet *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(state) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* we don't use inner plan
@@ -329,11 +331,6 @@ ExecEndProjectSet(ProjectSetState *node)
* clean out the tuple table
*/
ExecClearTuple(node->ps.ps_ResultTupleSlot);
-
- /*
- * shut down subplans
- */
- ExecEndNode(outerPlanState(node));
}
void
diff --git a/src/backend/executor/nodeRecursiveunion.c b/src/backend/executor/nodeRecursiveunion.c
index e781003934..84a706458a 100644
--- a/src/backend/executor/nodeRecursiveunion.c
+++ b/src/backend/executor/nodeRecursiveunion.c
@@ -244,7 +244,11 @@ ExecInitRecursiveUnion(RecursiveUnion *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(rustate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
innerPlanState(rustate) = ExecInitNode(innerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* If hashing, precompute fmgr lookup data for inner loop, and create the
@@ -280,12 +284,6 @@ ExecEndRecursiveUnion(RecursiveUnionState *node)
MemoryContextDelete(node->tempContext);
if (node->tableContext)
MemoryContextDelete(node->tableContext);
-
- /*
- * close down subplans
- */
- ExecEndNode(outerPlanState(node));
- ExecEndNode(innerPlanState(node));
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeResult.c b/src/backend/executor/nodeResult.c
index 4219712d30..330ca68d12 100644
--- a/src/backend/executor/nodeResult.c
+++ b/src/backend/executor/nodeResult.c
@@ -208,6 +208,8 @@ ExecInitResult(Result *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(resstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* we don't use inner plan
@@ -249,11 +251,6 @@ ExecEndResult(ResultState *node)
* clean out the tuple table
*/
ExecClearTuple(node->ps.ps_ResultTupleSlot);
-
- /*
- * shut down subplans
- */
- ExecEndNode(outerPlanState(node));
}
void
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index d7e22b1dbb..22357e7a0e 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -125,6 +125,8 @@ ExecInitSampleScan(SampleScan *node, EState *estate, int eflags)
ExecOpenScanRelation(estate,
node->scan.scanrelid,
eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/* we won't set up the HeapScanDesc till later */
scanstate->ss.ss_currentScanDesc = NULL;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 4da0f28f7b..b0b34cd14e 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -153,6 +153,8 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
ExecOpenScanRelation(estate,
node->scan.scanrelid,
eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/* and create slot with the appropriate rowtype */
ExecInitScanTupleSlot(estate, &scanstate->ss,
diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index 4bc2406b89..912cf7b37f 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -528,6 +528,8 @@ ExecInitSetOp(SetOp *node, EState *estate, int eflags)
if (node->strategy == SETOP_HASHED)
eflags &= ~EXEC_FLAG_REWIND;
outerPlanState(setopstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
outerDesc = ExecGetResultType(outerPlanState(setopstate));
/*
@@ -589,8 +591,6 @@ ExecEndSetOp(SetOpState *node)
if (node->tableContext)
MemoryContextDelete(node->tableContext);
ExecFreeExprContext(&node->ps);
-
- ExecEndNode(outerPlanState(node));
}
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index c6c72c6e67..1ba53373c2 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -263,6 +263,8 @@ ExecInitSort(Sort *node, EState *estate, int eflags)
eflags &= ~(EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK);
outerPlanState(sortstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize scan slot and type.
@@ -317,11 +319,6 @@ ExecEndSort(SortState *node)
tuplesort_end((Tuplesortstate *) node->tuplesortstate);
node->tuplesortstate = NULL;
- /*
- * shut down the subplan
- */
- ExecEndNode(outerPlanState(node));
-
SO1_printf("ExecEndSort: %s\n",
"sort node shutdown");
}
diff --git a/src/backend/executor/nodeSubqueryscan.c b/src/backend/executor/nodeSubqueryscan.c
index 42471bfc04..12014250ae 100644
--- a/src/backend/executor/nodeSubqueryscan.c
+++ b/src/backend/executor/nodeSubqueryscan.c
@@ -124,6 +124,8 @@ ExecInitSubqueryScan(SubqueryScan *node, EState *estate, int eflags)
* initialize subquery
*/
subquerystate->subplan = ExecInitNode(node->subplan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize scan slot and type (needed by ExecAssignScanProjectionInfo)
@@ -178,11 +180,6 @@ ExecEndSubqueryScan(SubqueryScanState *node)
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
- /*
- * close down subquery
- */
- ExecEndNode(node->subplan);
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 2124c55ef5..613b377c7c 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -386,6 +386,8 @@ ExecInitTidRangeScan(TidRangeScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
tidrangestate->ss.ss_currentRelation = currentRelation;
tidrangestate->ss.ss_currentScanDesc = NULL; /* no table scan here */
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index 862bd0330b..1b0a2d8083 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -529,6 +529,8 @@ ExecInitTidScan(TidScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
tidstate->ss.ss_currentRelation = currentRelation;
tidstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
diff --git a/src/backend/executor/nodeUnique.c b/src/backend/executor/nodeUnique.c
index 45035d74fa..bd71033622 100644
--- a/src/backend/executor/nodeUnique.c
+++ b/src/backend/executor/nodeUnique.c
@@ -136,6 +136,8 @@ ExecInitUnique(Unique *node, EState *estate, int eflags)
* then initialize outer plan
*/
outerPlanState(uniquestate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize result slot and type. Unique nodes do no projections, so
@@ -172,8 +174,6 @@ ExecEndUnique(UniqueState *node)
ExecClearTuple(node->ps.ps_ResultTupleSlot);
ExecFreeExprContext(&node->ps);
-
- ExecEndNode(outerPlanState(node));
}
diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 310ac23e3a..483f23da18 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -2458,6 +2458,8 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags)
*/
outerPlan = outerPlan(node);
outerPlanState(winstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* initialize source tuple type (which is also the tuple type that we'll
@@ -2681,7 +2683,6 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags)
void
ExecEndWindowAgg(WindowAggState *node)
{
- PlanState *outerPlan;
int i;
release_partition(node);
@@ -2713,9 +2714,6 @@ ExecEndWindowAgg(WindowAggState *node)
pfree(node->perfunc);
pfree(node->peragg);
-
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
}
/* -----------------
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 33975687b3..07b1f453e2 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -71,7 +71,7 @@ static int _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
static ParamListInfo _SPI_convert_params(int nargs, Oid *argtypes,
Datum *Values, const char *Nulls);
-static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount);
+static int _SPI_pquery(QueryDesc *queryDesc, uint64 tcount);
static void _SPI_error_callback(void *arg);
@@ -1623,6 +1623,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
_SPI_current->processed = 0;
_SPI_current->tuptable = NULL;
+replan:
/* Create the portal */
if (name == NULL || name[0] == '\0')
{
@@ -1766,7 +1767,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
}
/*
- * Start portal execution.
+ * Start portal execution. If the portal contains a cached plan and
+ * portal->plan_valid is false on return, the cached plan was invalidated
+ * while initializing one of the plan trees contained in it, so it must
+ * be recreated.
*/
PortalStart(portal, paramLI, 0, snapshot);
@@ -1775,6 +1779,12 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
/* Pop the error context stack */
error_context_stack = spierrcontext.previous;
+ if (!portal->plan_valid)
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
+
/* Pop the SPI stack */
_SPI_end_call(true);
@@ -2552,6 +2562,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
+replan:
cplan = GetCachedPlan(plansource, options->params,
plan_owner, _SPI_current->queryEnv);
@@ -2661,6 +2672,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
QueryDesc *qdesc;
Snapshot snap;
+ int eflags;
if (ActiveSnapshotSet())
snap = GetActiveSnapshot();
@@ -2668,14 +2680,32 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
snap = InvalidSnapshot;
qdesc = CreateQueryDesc(stmt,
+ cplan,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
options->params,
_SPI_current->queryEnv,
0);
- res = _SPI_pquery(qdesc, fire_triggers,
- canSetTag ? options->tcount : 0);
+
+ /* Select execution options */
+ if (fire_triggers)
+ eflags = 0; /* default run-to-completion flags */
+ else
+ eflags = EXEC_FLAG_SKIP_TRIGGERS;
+
+ ExecutorStart(qdesc, eflags);
+ if (!qdesc->plan_valid)
+ {
+ ExecutorFinish(qdesc);
+ ExecutorEnd(qdesc);
+ FreeQueryDesc(qdesc);
+ Assert(cplan);
+ ReleaseCachedPlan(cplan, plan_owner);
+ goto replan;
+ }
+
+ res = _SPI_pquery(qdesc, canSetTag ? options->tcount : 0);
FreeQueryDesc(qdesc);
}
else
@@ -2850,10 +2880,9 @@ _SPI_convert_params(int nargs, Oid *argtypes,
}
static int
-_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
+_SPI_pquery(QueryDesc *queryDesc, uint64 tcount)
{
int operation = queryDesc->operation;
- int eflags;
int res;
switch (operation)
@@ -2897,14 +2926,6 @@ _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
ResetUsage();
#endif
- /* Select execution options */
- if (fire_triggers)
- eflags = 0; /* default run-to-completion flags */
- else
- eflags = EXEC_FLAG_SKIP_TRIGGERS;
-
- ExecutorStart(queryDesc, eflags);
-
ExecutorRun(queryDesc, ForwardScanDirection, tcount, true);
_SPI_current->processed = queryDesc->estate->es_processed;
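[To make the retry protocol in _SPI_execute_plan() explicit, here is the
goto-based logic above restated as a loop; a sketch only, with the
surrounding variables (plansource, stmt, snapshots, dest) assumed from
context:

    for (;;)
    {
        cplan = GetCachedPlan(plansource, options->params, plan_owner,
                              _SPI_current->queryEnv);
        qdesc = CreateQueryDesc(stmt, cplan, plansource->query_string,
                                snap, crosscheck_snapshot, dest,
                                options->params, _SPI_current->queryEnv, 0);
        ExecutorStart(qdesc, eflags);
        if (qdesc->plan_valid)
            break;              /* plan is usable; locks are held */

        /* Plan went stale during initialization; clean up and retry. */
        ExecutorFinish(qdesc);
        ExecutorEnd(qdesc);
        FreeQueryDesc(qdesc);
        ReleaseCachedPlan(cplan, plan_owner);
    }
]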
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index ee9b89a672..c807e9cdcc 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -27,6 +27,7 @@
#include "storage/procarray.h"
#include "storage/sinvaladt.h"
#include "utils/inval.h"
+#include "utils/lsyscache.h"
/*
@@ -364,6 +365,50 @@ CheckRelationLockedByMe(Relation relation, LOCKMODE lockmode, bool orstronger)
return false;
}
+/*
+ * CheckRelLockedByMe
+ *
+ * Returns true if current transaction holds a lock on the given relation of
+ * mode 'lockmode'. If 'orstronger' is true, a stronger lockmode is also OK.
+ * ("Stronger" is defined as "numerically higher", which is a bit
+ * semantically dubious but is OK for the purposes we use this for.)
+ */
+bool
+CheckRelLockedByMe(Oid relid, LOCKMODE lockmode, bool orstronger)
+{
+ Oid dbId = get_rel_relisshared(relid) ? InvalidOid : MyDatabaseId;
+ LOCKTAG tag;
+
+ SET_LOCKTAG_RELATION(tag, dbId, relid);
+
+ if (LockHeldByMe(&tag, lockmode))
+ return true;
+
+ if (orstronger)
+ {
+ LOCKMODE slockmode;
+
+ for (slockmode = lockmode + 1;
+ slockmode <= MaxLockMode;
+ slockmode++)
+ {
+ if (LockHeldByMe(&tag, slockmode))
+ {
+#ifdef NOT_USED
+ /* Sometimes this might be useful for debugging purposes */
+ elog(WARNING, "lock mode %s substituted for %s on relation %s",
+ GetLockmodeName(tag.locktag_lockmethodid, slockmode),
+ GetLockmodeName(tag.locktag_lockmethodid, lockmode),
+ get_rel_name(relid));
+#endif
+ return true;
+ }
+ }
+ }
+
+ return false;
+}
+
/*
* LockHasWaitersRelation
*
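[For reference, the expected use of the new CheckRelLockedByMe() is
assertion-style checks keyed by OID where no Relation is open; a hypothetical
call site:

    /* Check that this backend already holds at least AccessShareLock. */
    Assert(CheckRelLockedByMe(partrelid, AccessShareLock, true));
]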
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 01b6cc1f7d..4931fb2da7 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1233,6 +1233,7 @@ exec_simple_query(const char *query_string)
* Start the portal. No parameters here.
*/
PortalStart(portal, NULL, 0, InvalidSnapshot);
+ Assert(portal->plan_valid);
/*
* Select the appropriate output format: text unless we are doing a
@@ -1737,6 +1738,7 @@ exec_bind_message(StringInfo input_message)
"commands ignored until end of transaction block"),
errdetail_abort()));
+replan:
/*
* Create the portal. Allow silent replacement of an existing portal only
* if the unnamed portal is specified.
@@ -2028,10 +2030,19 @@ exec_bind_message(StringInfo input_message)
PopActiveSnapshot();
/*
- * And we're ready to start portal execution.
+ * Start portal execution. If the portal contains a cached plan and
+ * portal->plan_valid is false on return, the cached plan was invalidated
+ * while initializing one of the plan trees contained in it, so it must
+ * be recreated.
*/
PortalStart(portal, params, 0, InvalidSnapshot);
+ if (!portal->plan_valid)
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
+
/*
* Apply the result format requests to the portal.
*/
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5565f200c3..dab971ab0f 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -19,6 +19,7 @@
#include "access/xact.h"
#include "commands/prepare.h"
+#include "executor/execdesc.h"
#include "executor/tstoreReceiver.h"
#include "miscadmin.h"
#include "pg_trace.h"
@@ -35,12 +36,6 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
- const char *sourceText,
- ParamListInfo params,
- QueryEnvironment *queryEnv,
- DestReceiver *dest,
- QueryCompletion *qc);
static void FillPortalStore(Portal portal, bool isTopLevel);
static uint64 RunFromStore(Portal portal, ScanDirection direction, uint64 count,
DestReceiver *dest);
@@ -65,6 +60,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +73,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->cplan = cplan; /* CachedPlan, if plan is from one */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -116,86 +113,6 @@ FreeQueryDesc(QueryDesc *qdesc)
}
-/*
- * ProcessQuery
- * Execute a single plannable query within a PORTAL_MULTI_QUERY,
- * PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
- *
- * plan: the plan tree for the query
- * sourceText: the source text of the query
- * params: any parameters needed
- * dest: where to send results
- * qc: where to store the command completion status data.
- *
- * qc may be NULL if caller doesn't want a status string.
- *
- * Must be called in a memory context that will be reset or deleted on
- * error; otherwise the executor's memory usage will be leaked.
- */
-static void
-ProcessQuery(PlannedStmt *plan,
- const char *sourceText,
- ParamListInfo params,
- QueryEnvironment *queryEnv,
- DestReceiver *dest,
- QueryCompletion *qc)
-{
- QueryDesc *queryDesc;
-
- /*
- * Create the QueryDesc object
- */
- queryDesc = CreateQueryDesc(plan, sourceText,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, 0);
-
- /*
- * Call ExecutorStart to prepare the plan for execution
- */
- ExecutorStart(queryDesc, 0);
-
- /*
- * Run the plan to completion.
- */
- ExecutorRun(queryDesc, ForwardScanDirection, 0, true);
-
- /*
- * Build command completion status data, if caller wants one.
- */
- if (qc)
- {
- switch (queryDesc->operation)
- {
- case CMD_SELECT:
- SetQueryCompletion(qc, CMDTAG_SELECT, queryDesc->estate->es_processed);
- break;
- case CMD_INSERT:
- SetQueryCompletion(qc, CMDTAG_INSERT, queryDesc->estate->es_processed);
- break;
- case CMD_UPDATE:
- SetQueryCompletion(qc, CMDTAG_UPDATE, queryDesc->estate->es_processed);
- break;
- case CMD_DELETE:
- SetQueryCompletion(qc, CMDTAG_DELETE, queryDesc->estate->es_processed);
- break;
- case CMD_MERGE:
- SetQueryCompletion(qc, CMDTAG_MERGE, queryDesc->estate->es_processed);
- break;
- default:
- SetQueryCompletion(qc, CMDTAG_UNKNOWN, queryDesc->estate->es_processed);
- break;
- }
- }
-
- /*
- * Now, we close down all the scans and free allocated resources.
- */
- ExecutorFinish(queryDesc);
- ExecutorEnd(queryDesc);
-
- FreeQueryDesc(queryDesc);
-}
-
/*
* ChoosePortalStrategy
* Select portal execution strategy given the intended statement list.
@@ -427,7 +344,8 @@ FetchStatementTargetList(Node *stmt)
* to be used for cursors).
*
* On return, portal is ready to accept PortalRun() calls, and the result
- * tupdesc (if any) is known.
+ * tupdesc (if any) is known, unless portal->plan_valid is set to false, in
+ * which case the caller must retry after generating a new CachedPlan.
*/
void
PortalStart(Portal portal, ParamListInfo params,
@@ -435,10 +353,9 @@ PortalStart(Portal portal, ParamListInfo params,
{
Portal saveActivePortal;
ResourceOwner saveResourceOwner;
- MemoryContext savePortalContext;
MemoryContext oldContext;
QueryDesc *queryDesc;
- int myeflags;
+ int myeflags = 0;
Assert(PortalIsValid(portal));
Assert(portal->status == PORTAL_DEFINED);
@@ -448,15 +365,13 @@ PortalStart(Portal portal, ParamListInfo params,
*/
saveActivePortal = ActivePortal;
saveResourceOwner = CurrentResourceOwner;
- savePortalContext = PortalContext;
PG_TRY();
{
ActivePortal = portal;
if (portal->resowner)
CurrentResourceOwner = portal->resowner;
- PortalContext = portal->portalContext;
- oldContext = MemoryContextSwitchTo(PortalContext);
+ oldContext = MemoryContextSwitchTo(portal->queryContext);
/* Must remember portal param list, if any */
portal->portalParams = params;
@@ -472,6 +387,8 @@ PortalStart(Portal portal, ParamListInfo params,
switch (portal->strategy)
{
case PORTAL_ONE_SELECT:
+ case PORTAL_ONE_RETURNING:
+ case PORTAL_ONE_MOD_WITH:
/* Must set snapshot before starting executor. */
if (snapshot)
@@ -493,6 +410,7 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->cplan,
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -501,30 +419,52 @@ PortalStart(Portal portal, ParamListInfo params,
portal->queryEnv,
0);
+ /* Remember for PortalRunMulti(). */
+ if (portal->strategy == PORTAL_ONE_RETURNING ||
+ portal->strategy == PORTAL_ONE_MOD_WITH)
+ portal->qdescs = list_make1(queryDesc);
+
/*
* If it's a scrollable cursor, executor needs to support
* REWIND and backwards scan, as well as whatever the caller
* might've asked for.
*/
- if (portal->cursorOptions & CURSOR_OPT_SCROLL)
+ if (portal->strategy == PORTAL_ONE_SELECT &&
+ (portal->cursorOptions & CURSOR_OPT_SCROLL))
myeflags = eflags | EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD;
else
myeflags = eflags;
/*
- * Call ExecutorStart to prepare the plan for execution
+ * Call ExecutorStart to prepare the plan for execution. A
+ * cached plan may get invalidated as we're doing that.
*/
ExecutorStart(queryDesc, myeflags);
+ if (!queryDesc->plan_valid)
+ {
+ Assert(queryDesc->cplan);
+ PortalQueryFinish(queryDesc);
+ PopActiveSnapshot();
+ portal->plan_valid = false;
+ goto early_exit;
+ }
/*
- * This tells PortalCleanup to shut down the executor
+ * This tells PortalCleanup to shut down the executor; that is not
+ * needed for queries handled by PortalRunMulti().
*/
- portal->queryDesc = queryDesc;
+ if (portal->strategy == PORTAL_ONE_SELECT)
+ portal->queryDesc = queryDesc;
/*
- * Remember tuple descriptor (computed by ExecutorStart)
+ * Remember tuple descriptor (computed by ExecutorStart),
+ * though make it independent of QueryDesc for queries handled
+ * by PortalRunMulti().
*/
- portal->tupDesc = queryDesc->tupDesc;
+ if (portal->strategy != PORTAL_ONE_SELECT)
+ portal->tupDesc = CreateTupleDescCopy(queryDesc->tupDesc);
+ else
+ portal->tupDesc = queryDesc->tupDesc;
/*
* Reset cursor position data to "start of query"
@@ -532,33 +472,11 @@ PortalStart(Portal portal, ParamListInfo params,
portal->atStart = true;
portal->atEnd = false; /* allow fetches */
portal->portalPos = 0;
+ portal->plan_valid = true;
PopActiveSnapshot();
break;
- case PORTAL_ONE_RETURNING:
- case PORTAL_ONE_MOD_WITH:
-
- /*
- * We don't start the executor until we are told to run the
- * portal. We do need to set up the result tupdesc.
- */
- {
- PlannedStmt *pstmt;
-
- pstmt = PortalGetPrimaryStmt(portal);
- portal->tupDesc =
- ExecCleanTypeFromTL(pstmt->planTree->targetlist);
- }
-
- /*
- * Reset cursor position data to "start of query"
- */
- portal->atStart = true;
- portal->atEnd = false; /* allow fetches */
- portal->portalPos = 0;
- break;
-
case PORTAL_UTIL_SELECT:
/*
@@ -578,11 +496,87 @@ PortalStart(Portal portal, ParamListInfo params,
portal->atStart = true;
portal->atEnd = false; /* allow fetches */
portal->portalPos = 0;
+ portal->plan_valid = true;
break;
case PORTAL_MULTI_QUERY:
- /* Need do nothing now */
+ {
+ ListCell *lc;
+ bool first = true;
+
+ myeflags = eflags;
+ foreach(lc, portal->stmts)
+ {
+ PlannedStmt *plan = lfirst_node(PlannedStmt, lc);
+ bool is_utility = (plan->utilityStmt != NULL);
+
+ /*
+ * Push the snapshot to be used by the executor.
+ */
+ if (!is_utility)
+ {
+ /*
+ * Must copy the snapshot for all statements
+ * except the first, as we'll need to update its
+ * command ID.
+ */
+ if (!first)
+ PushCopiedSnapshot(GetTransactionSnapshot());
+ else
+ PushActiveSnapshot(GetTransactionSnapshot());
+ }
+
+ /*
+ * From the 2nd statement onwards, update the command
+ * ID and the snapshot to match.
+ */
+ if (!first)
+ {
+ CommandCounterIncrement();
+ UpdateActiveSnapshotCommandId();
+ }
+
+ first = false;
+
+ /*
+ * Create the QueryDesc object. DestReceiver will
+ * be set in PortalRunMulti().
+ */
+ queryDesc = CreateQueryDesc(plan, portal->cplan,
+ portal->sourceText,
+ !is_utility ?
+ GetActiveSnapshot() :
+ InvalidSnapshot,
+ InvalidSnapshot,
+ NULL,
+ params,
+ portal->queryEnv, 0);
+
+ /* Remember for PortalRunMulti() */
+ portal->qdescs = lappend(portal->qdescs, queryDesc);
+
+ if (is_utility)
+ continue;
+
+ /*
+ * Call ExecutorStart to prepare the plan for
+ * execution. A cached plan may get invalidated as
+ * we're doing that.
+ */
+ ExecutorStart(queryDesc, myeflags);
+ PopActiveSnapshot();
+ if (!queryDesc->plan_valid)
+ {
+ Assert(queryDesc->cplan);
+ PortalQueryFinish(queryDesc);
+ portal->plan_valid = false;
+ goto early_exit;
+ }
+ }
+ }
+
portal->tupDesc = NULL;
+ portal->plan_valid = true;
break;
}
}
@@ -594,19 +588,18 @@ PortalStart(Portal portal, ParamListInfo params,
/* Restore global vars and propagate error */
ActivePortal = saveActivePortal;
CurrentResourceOwner = saveResourceOwner;
- PortalContext = savePortalContext;
PG_RE_THROW();
}
PG_END_TRY();
+ portal->status = PORTAL_READY;
+
+early_exit:
MemoryContextSwitchTo(oldContext);
ActivePortal = saveActivePortal;
CurrentResourceOwner = saveResourceOwner;
- PortalContext = savePortalContext;
-
- portal->status = PORTAL_READY;
}
/*
@@ -1193,7 +1186,7 @@ PortalRunMulti(Portal portal,
QueryCompletion *qc)
{
bool active_snapshot_set = false;
- ListCell *stmtlist_item;
+ ListCell *qdesc_item;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1214,9 +1207,10 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
- foreach(stmtlist_item, portal->stmts)
+ foreach(qdesc_item, portal->qdescs)
{
- PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ QueryDesc *qdesc = (QueryDesc *) lfirst(qdesc_item);
+ PlannedStmt *pstmt = qdesc->plannedstmt;
/*
* If we got a cancel signal in prior command, quit
@@ -1233,33 +1227,26 @@ PortalRunMulti(Portal portal,
if (log_executor_stats)
ResetUsage();
- /*
- * Must always have a snapshot for plannable queries. First time
- * through, take a new snapshot; for subsequent queries in the
- * same portal, just update the snapshot's copy of the command
- * counter.
- */
+ /* Push the snapshot for plannable queries. */
if (!active_snapshot_set)
{
- Snapshot snapshot = GetTransactionSnapshot();
+ Snapshot snapshot = qdesc->snapshot;
- /* If told to, register the snapshot and save in portal */
+ /*
+ * If told to, register the snapshot and save in portal
+ *
+ * Note that the command ID of qdesc->snapshot for the 2nd query
+ * onwards would have been updated in PortalStart() to account
+ * for the CCI() done between queries, but it's OK that we don't
+ * likewise update holdSnapshot's command ID here.
+ */
if (setHoldSnapshot)
{
snapshot = RegisterSnapshot(snapshot);
portal->holdSnapshot = snapshot;
}
- /*
- * We can't have the holdSnapshot also be the active one,
- * because UpdateActiveSnapshotCommandId would complain. So
- * force an extra snapshot copy. Plain PushActiveSnapshot
- * would have copied the transaction snapshot anyway, so this
- * only adds a copy step when setHoldSnapshot is true. (It's
- * okay for the command ID of the active snapshot to diverge
- * from what holdSnapshot has.)
- */
- PushCopiedSnapshot(snapshot);
+ PushActiveSnapshot(snapshot);
/*
* As for PORTAL_ONE_SELECT portals, it does not seem
@@ -1268,26 +1255,39 @@ PortalRunMulti(Portal portal,
active_snapshot_set = true;
}
- else
- UpdateActiveSnapshotCommandId();
+ /*
+ * Run the plan to completion.
+ */
+ qdesc->dest = dest;
+ ExecutorRun(qdesc, ForwardScanDirection, 0, true);
+
+ /*
+ * Build command completion status data if needed.
+ */
if (pstmt->canSetTag)
{
- /* statement can set tag string */
- ProcessQuery(pstmt,
- portal->sourceText,
- portal->portalParams,
- portal->queryEnv,
- dest, qc);
- }
- else
- {
- /* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
- portal->sourceText,
- portal->portalParams,
- portal->queryEnv,
- altdest, NULL);
+ switch (qdesc->operation)
+ {
+ case CMD_SELECT:
+ SetQueryCompletion(qc, CMDTAG_SELECT, qdesc->estate->es_processed);
+ break;
+ case CMD_INSERT:
+ SetQueryCompletion(qc, CMDTAG_INSERT, qdesc->estate->es_processed);
+ break;
+ case CMD_UPDATE:
+ SetQueryCompletion(qc, CMDTAG_UPDATE, qdesc->estate->es_processed);
+ break;
+ case CMD_DELETE:
+ SetQueryCompletion(qc, CMDTAG_DELETE, qdesc->estate->es_processed);
+ break;
+ case CMD_MERGE:
+ SetQueryCompletion(qc, CMDTAG_MERGE, qdesc->estate->es_processed);
+ break;
+ default:
+ SetQueryCompletion(qc, CMDTAG_UNKNOWN, qdesc->estate->es_processed);
+ break;
+ }
}
if (log_executor_stats)
@@ -1342,12 +1342,12 @@ PortalRunMulti(Portal portal,
if (portal->stmts == NIL)
break;
- /*
- * Increment command counter between queries, but not after the last
- * one.
- */
- if (lnext(portal->stmts, stmtlist_item) != NULL)
- CommandCounterIncrement();
+ if (qdesc->estate)
+ {
+ ExecutorFinish(qdesc);
+ ExecutorEnd(qdesc);
+ }
+ FreeQueryDesc(qdesc);
}
/* Pop the snapshot if we pushed one. */
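[To summarize the pquery.c restructuring, a sketch of the new division of
labor, with snapshot handling, utility statements, and completion tags
elided:

    /* PortalStart(), PORTAL_MULTI_QUERY case: create and start QueryDescs */
    foreach(lc, portal->stmts)
    {
        PlannedStmt *plan = lfirst_node(PlannedStmt, lc);
        QueryDesc   *queryDesc = CreateQueryDesc(plan, portal->cplan,
                                                 portal->sourceText,
                                                 GetActiveSnapshot(),
                                                 InvalidSnapshot, NULL,
                                                 params, portal->queryEnv, 0);

        portal->qdescs = lappend(portal->qdescs, queryDesc);
        ExecutorStart(queryDesc, eflags);
        if (!queryDesc->plan_valid)
        {
            portal->plan_valid = false; /* caller drops portal, replans */
            return;
        }
    }

    /* PortalRunMulti() then merely runs and shuts down each QueryDesc */
    foreach(lc, portal->qdescs)
    {
        QueryDesc  *qdesc = (QueryDesc *) lfirst(lc);

        qdesc->dest = dest;
        ExecutorRun(qdesc, ForwardScanDirection, 0, true);
        ExecutorFinish(qdesc);
        ExecutorEnd(qdesc);
        FreeQueryDesc(qdesc);
    }
]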
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 60978f9415..de3fc756e2 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -2073,6 +2073,27 @@ get_rel_persistence(Oid relid)
return result;
}
+/*
+ * get_rel_relisshared
+ *
+ * Returns whether the given relation is shared
+ */
+bool
+get_rel_relisshared(Oid relid)
+{
+ HeapTuple tp;
+ Form_pg_class reltup;
+ bool result;
+
+ tp = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+ if (!HeapTupleIsValid(tp))
+ elog(ERROR, "cache lookup failed for relation %u", relid);
+ reltup = (Form_pg_class) GETSTRUCT(tp);
+ result = reltup->relisshared;
+ ReleaseSysCache(tp);
+
+ return result;
+}
/* ---------- TRANSFORM CACHE ---------- */
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 3d3f7a9bea..e6237d70b3 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -102,13 +102,13 @@ static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool GenericPlanIsValid(CachedPlan *cplan);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -790,8 +790,14 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
*
- * On a "true" return, we have acquired the locks needed to run the plan.
- * (We must do this for the "true" result to be race-condition-free.)
+ * Note though that the plan may contain child relations added by the
+ * planner, which will not have been locked yet, because
+ * AcquirePlannerLocks() only locks relations present in the range table
+ * before entering the planner. Such a plan can go stale before it reaches
+ * execution if any of those child relations get modified concurrently.
+ * The executor must therefore recheck that the plan (CachedPlan) is still
+ * valid after taking a lock on each of the child tables, and if it is
+ * not, ask the caller to recreate the plan.
*/
static bool
CheckCachedPlan(CachedPlanSource *plansource)
@@ -805,60 +811,56 @@ CheckCachedPlan(CachedPlanSource *plansource)
if (!plan)
return false;
- Assert(plan->magic == CACHEDPLAN_MAGIC);
- /* Generic plans are never one-shot */
- Assert(!plan->is_oneshot);
+ if (GenericPlanIsValid(plan))
+ return true;
/*
- * If plan isn't valid for current role, we can't use it.
+ * Plan has been invalidated, so unlink it from the parent and release it.
*/
- if (plan->is_valid && plan->dependsOnRole &&
- plan->planRoleId != GetUserId())
- plan->is_valid = false;
+ ReleaseGenericPlan(plansource);
- /*
- * If it appears valid, acquire locks and recheck; this is much the same
- * logic as in RevalidateCachedQuery, but for a plan.
- */
- if (plan->is_valid)
+ return false;
+}
+
+/*
+ * GenericPlanIsValid
+ * Is a generic plan still valid?
+ *
+ * It may have gone stale due to concurrent schema modifications of relations
+ * mentioned in the plan, a role change, or advancement of TransactionXmin.
+ */
+static bool
+GenericPlanIsValid(CachedPlan *cplan)
+{
+ Assert(cplan != NULL);
+ Assert(cplan->magic == CACHEDPLAN_MAGIC);
+ /* Generic plans are never one-shot */
+ Assert(!cplan->is_oneshot);
+
+ if (cplan->is_valid)
{
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
- Assert(plan->refcount > 0);
-
- AcquireExecutorLocks(plan->stmt_list, true);
+ Assert(cplan->refcount > 0);
/*
- * If plan was transient, check to see if TransactionXmin has
- * advanced, and if so invalidate it.
+ * If plan isn't valid for current role, we can't use it.
*/
- if (plan->is_valid &&
- TransactionIdIsValid(plan->saved_xmin) &&
- !TransactionIdEquals(plan->saved_xmin, TransactionXmin))
- plan->is_valid = false;
+ if (cplan->dependsOnRole && cplan->planRoleId != GetUserId())
+ cplan->is_valid = false;
/*
- * By now, if any invalidation has happened, the inval callback
- * functions will have marked the plan invalid.
+ * If plan was transient, check to see if TransactionXmin has
+ * advanced, and if so invalidate it.
*/
- if (plan->is_valid)
- {
- /* Successfully revalidated and locked the query. */
- return true;
- }
-
- /* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ if (TransactionIdIsValid(cplan->saved_xmin) &&
+ !TransactionIdEquals(cplan->saved_xmin, TransactionXmin))
+ cplan->is_valid = false;
}
- /*
- * Plan has been invalidated, so unlink it from the parent and release it.
- */
- ReleaseGenericPlan(plansource);
-
- return false;
+ return cplan->is_valid;
}
/*
@@ -1128,8 +1130,15 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
- * On return, the plan is valid and we have sufficient locks to begin
- * execution.
+ * On return, the plan is valid unless it contains inheritance/partition
+ * child tables; in that case, only the locks on the tables mentioned in the
+ * query itself have been taken. If any of those tables have inheritance or
+ * partition children, the executor must also lock them before executing the
+ * plan, and if the plan gets invalidated as a result of taking those locks,
+ * it must ask the caller to get a new plan by calling here again. Locking
+ * of the child tables must be deferred to the executor like this because
+ * not all of them may need to be locked; some may get pruned during the
+ * executor's plan-initialization phase (InitPlan()).
*
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
@@ -1362,8 +1371,8 @@ CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
}
/*
- * Reject if AcquireExecutorLocks would have anything to do. This is
- * probably unnecessary given the previous check, but let's be safe.
+ * Reject if the executor would need to take any locks beyond those taken
+ * by AcquirePlannerLocks() for a given query.
*/
foreach(lc, plan->stmt_list)
{
@@ -1737,58 +1746,6 @@ QueryListGetPrimaryStmt(List *stmts)
return NULL;
}
-/*
- * AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
- */
-static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
-{
- ListCell *lc1;
-
- foreach(lc1, stmt_list)
- {
- PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
-
- if (plannedstmt->commandType == CMD_UTILITY)
- {
- /*
- * Ignore utility statements, except those (such as EXPLAIN) that
- * contain a parsed-but-not-planned query. Note: it's okay to use
- * ScanQueryForLocks, even though the query hasn't been through
- * rule rewriting, because rewriting doesn't change the query
- * representation.
- */
- Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
-
- if (query)
- ScanQueryForLocks(query, acquire);
- continue;
- }
-
- foreach(lc2, plannedstmt->rtable)
- {
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
-
- if (!(rte->rtekind == RTE_RELATION ||
- (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
- continue;
-
- /*
- * Acquire the appropriate type of lock on each relation OID. Note
- * that we don't actually try to open the rel, and hence will not
- * fail if it's been dropped entirely --- we'll just transiently
- * acquire a non-conflicting lock.
- */
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
- }
- }
-}
-
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 06dfa85f04..0cad450dcd 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -201,6 +201,13 @@ CreatePortal(const char *name, bool allowDup, bool dupSilent)
portal->portalContext = AllocSetContextCreate(TopPortalContext,
"PortalContext",
ALLOCSET_SMALL_SIZES);
+ /*
+ * initialize portal's query context to store QueryDescs created during
+ * PortalStart() and then used in PortalRun().
+ */
+ portal->queryContext = AllocSetContextCreate(TopPortalContext,
+ "PortalQueryContext",
+ ALLOCSET_SMALL_SIZES);
/* create a resource owner for the portal */
portal->resowner = ResourceOwnerCreate(CurTransactionResourceOwner,
@@ -224,6 +231,7 @@ CreatePortal(const char *name, bool allowDup, bool dupSilent)
/* for named portals reuse portal->name copy */
MemoryContextSetIdentifier(portal->portalContext, portal->name[0] ? portal->name : "<unnamed>");
+ MemoryContextSetIdentifier(portal->queryContext, portal->name[0] ? portal->name : "<unnamed>");
return portal;
}
@@ -594,6 +602,7 @@ PortalDrop(Portal portal, bool isTopCommit)
/* release subsidiary storage */
MemoryContextDelete(portal->portalContext);
+ MemoryContextDelete(portal->queryContext);
/* release portal struct (it's in TopPortalContext) */
pfree(portal);
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 3d3e632a0c..392abb5150 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -88,7 +88,11 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern QueryDesc *ExplainQueryDesc(PlannedStmt *stmt, struct CachedPlan *cplan,
+ const char *queryString, IntoClause *into, ExplainState *es,
+ ParamListInfo params, QueryEnvironment *queryEnv);
+extern void ExplainOnePlan(QueryDesc *queryDesc,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
@@ -104,6 +108,7 @@ extern void ExplainQueryParameters(ExplainState *es, ParamListInfo params, int m
extern void ExplainBeginOutput(ExplainState *es);
extern void ExplainEndOutput(ExplainState *es);
+extern void ExplainResetOutput(ExplainState *es);
extern void ExplainSeparatePlans(ExplainState *es);
extern void ExplainPropertyList(const char *qlabel, List *data,
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index af2bf36dfb..c36c25b497 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -32,9 +32,12 @@
*/
typedef struct QueryDesc
{
+ NodeTag type;
+
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ struct CachedPlan *cplan; /* CachedPlan, if plannedstmt is from one */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -47,6 +50,7 @@ typedef struct QueryDesc
TupleDesc tupDesc; /* descriptor for result tuples */
EState *estate; /* executor's query-wide state */
PlanState *planstate; /* tree of per-plan-node state */
+ bool plan_valid; /* is planstate tree fully valid? */
/* This field is set by ExecutorRun */
bool already_executed; /* true if previously executed */
@@ -57,6 +61,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ struct CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index ac02247947..640b905973 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -19,6 +19,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/plancache.h"
/*
@@ -256,6 +257,17 @@ extern void ExecEndNode(PlanState *node);
extern void ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * Is the cached plan, if any, still valid at this point? That is, has it
+ * not been invalidated by any of the invalidation messages processed so
+ * far during plan initialization?
+ */
+static inline bool
+ExecPlanStillValid(EState *estate)
+{
+ return estate->es_cachedplan == NULL ? true :
+ CachedPlanStillValid(estate->es_cachedplan);
+}
/* ----------------------------------------------------------------
* ExecProcNode
@@ -590,6 +602,7 @@ exec_rt_fetch(Index rti, EState *estate)
}
extern Relation ExecGetRangeTableRelation(EState *estate, Index rti);
+extern void ExecLockAppendNonLeafRelations(EState *estate, List *allpartrelids);
extern void ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
Index rti);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index cb714f4a19..f0c5177b06 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -623,6 +623,8 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ struct CachedPlan *es_cachedplan; /* CachedPlan if plannedstmt is from
+ * one */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
@@ -671,6 +673,10 @@ typedef struct EState
List *es_exprcontexts; /* List of ExprContexts within EState */
+ List *es_inited_plannodes; /* List of PlanState nodes of the
+ * plan tree that were fully
+ * initialized */
+
List *es_subplanstates; /* List of PlanState for SubPlans */
List *es_auxmodifytables; /* List of secondary ModifyTableStates */
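[Presumably (that part of the patch appears outside the hunks quoted here),
es_inited_plannodes is what lets ExecEndPlan() replace the per-node
ExecEndNode() calls removed from the ExecEnd* routines above, along these
lines:

    /* In ExecEndPlan(), a sketch: */
    foreach(lc, estate->es_inited_plannodes)
        ExecEndNode((PlanState *) lfirst(lc));
]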
diff --git a/src/include/storage/lmgr.h b/src/include/storage/lmgr.h
index 4ee91e3cf9..598bf2688a 100644
--- a/src/include/storage/lmgr.h
+++ b/src/include/storage/lmgr.h
@@ -48,6 +48,7 @@ extern bool ConditionalLockRelation(Relation relation, LOCKMODE lockmode);
extern void UnlockRelation(Relation relation, LOCKMODE lockmode);
extern bool CheckRelationLockedByMe(Relation relation, LOCKMODE lockmode,
bool orstronger);
+extern bool CheckRelLockedByMe(Oid relid, LOCKMODE lockmode, bool orstronger);
extern bool LockHasWaitersRelation(Relation relation, LOCKMODE lockmode);
extern void LockRelationIdForSession(LockRelId *relid, LOCKMODE lockmode);
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index 4f5418b972..3074e604dd 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -139,6 +139,7 @@ extern char get_rel_relkind(Oid relid);
extern bool get_rel_relispartition(Oid relid);
extern Oid get_rel_tablespace(Oid relid);
extern char get_rel_persistence(Oid relid);
+extern bool get_rel_relisshared(Oid relid);
extern Oid get_transform_fromsql(Oid typid, Oid langid, List *trftypes);
extern Oid get_transform_tosql(Oid typid, Oid langid, List *trftypes);
extern bool get_typisdefined(Oid typid);
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index a443181d41..8990fe72e3 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -221,6 +221,20 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
QueryEnvironment *queryEnv);
+
+/*
+ * CachedPlanStillValid
+ * Returns whether a cached generic plan is still valid
+ *
+ * Called by the executor after each relation lock it takes while
+ * initializing the plan tree contained in a CachedPlan.
+ */
+static inline bool
+CachedPlanStillValid(CachedPlan *cplan)
+{
+ return cplan->is_valid;
+}
+
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
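[Spelled out, the chain that CachedPlanStillValid() and the executor's
ExecPlanStillValid() rely on looks like this; LockRelationOid(),
AcceptInvalidationMessages() and PlanCacheRelCallback() are existing
functions, the rest is from this patch:

    ExecGetRangeTableRelation(estate, rti)
      -> LockRelationOid(childrelid, lockmode)
           -> AcceptInvalidationMessages()
                -> PlanCacheRelCallback()   /* may clear cplan->is_valid */
    ExecPlanStillValid(estate)              /* sees the cleared flag, so the
                                             * ExecInit* routine returns NULL */
]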
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aa08b1e0fc..24d420b9e9 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,9 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *qdescs; /* list of QueryDescs */
+ MemoryContext queryContext; /* memory for QueryDescs and children */
+ bool plan_valid; /* are plans in qdescs ready for execution? */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -242,6 +245,7 @@ extern void PortalDefineQuery(Portal portal,
CommandTag commandTag,
List *stmts,
CachedPlan *cplan);
+extern void PortalQueryFinish(QueryDesc *queryDesc);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
diff --git a/src/test/modules/delay_execution/Makefile b/src/test/modules/delay_execution/Makefile
index 70f24e846d..2fca84d027 100644
--- a/src/test/modules/delay_execution/Makefile
+++ b/src/test/modules/delay_execution/Makefile
@@ -8,7 +8,8 @@ OBJS = \
delay_execution.o
ISOLATION = partition-addition \
- partition-removal-1
+ partition-removal-1 \
+ cached-plan-replan
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/src/test/modules/delay_execution/delay_execution.c b/src/test/modules/delay_execution/delay_execution.c
index 7cd76eb34b..515b2c0c95 100644
--- a/src/test/modules/delay_execution/delay_execution.c
+++ b/src/test/modules/delay_execution/delay_execution.c
@@ -1,14 +1,18 @@
/*-------------------------------------------------------------------------
*
* delay_execution.c
- * Test module to allow delay between parsing and execution of a query.
+ * Test module to introduce a delay at various points during the execution
+ * of a query, to check that execution proceeds safely in the face of
+ * concurrent changes.
*
* The delay is implemented by taking and immediately releasing a specified
* advisory lock. If another process has previously taken that lock, the
* current process will be blocked until the lock is released; otherwise,
* there's no effect. This allows an isolationtester script to reliably
- * test behaviors where some specified action happens in another backend
- * between parsing and execution of any desired query.
+ * test behaviors where some specified action happens in another backend in
+ * a couple of cases: 1) between parsing and execution of any desired query
+ * when using the planner_hook, 2) between RevalidateCachedQuery() and
+ * ExecutorStart() when using the ExecutorStart_hook.
*
* Copyright (c) 2020-2023, PostgreSQL Global Development Group
*
@@ -22,6 +26,7 @@
#include <limits.h>
+#include "executor/executor.h"
#include "optimizer/planner.h"
#include "utils/builtins.h"
#include "utils/guc.h"
@@ -32,9 +37,11 @@ PG_MODULE_MAGIC;
/* GUC: advisory lock ID to use. Zero disables the feature. */
static int post_planning_lock_id = 0;
+static int executor_start_lock_id = 0;
-/* Save previous planner hook user to be a good citizen */
+/* Save previous hook users to be a good citizen */
static planner_hook_type prev_planner_hook = NULL;
+static ExecutorStart_hook_type prev_ExecutorStart_hook = NULL;
/* planner_hook function to provide the desired delay */
@@ -70,11 +77,41 @@ delay_execution_planner(Query *parse, const char *query_string,
return result;
}
+/* ExecutorStart_hook function to provide the desired delay */
+static void
+delay_execution_ExecutorStart(QueryDesc *queryDesc, int eflags)
+{
+ /* If enabled, delay by taking and releasing the specified lock */
+ if (executor_start_lock_id != 0)
+ {
+ DirectFunctionCall1(pg_advisory_lock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+ DirectFunctionCall1(pg_advisory_unlock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+
+ /*
+ * Ensure that we notice any pending invalidations, since the advisory
+ * lock functions don't do this.
+ */
+ AcceptInvalidationMessages();
+ }
+
+ /* Now start the executor, possibly via a previous hook user */
+ if (prev_ExecutorStart_hook)
+ prev_ExecutorStart_hook(queryDesc, eflags);
+ else
+ standard_ExecutorStart(queryDesc, eflags);
+
+ if (executor_start_lock_id != 0)
+ elog(NOTICE, "Finished ExecutorStart(): CachedPlan is %s",
+ queryDesc->cplan == NULL ? "not used" : queryDesc->cplan->is_valid ? "valid" : "not valid");
+}
+
/* Module load function */
void
_PG_init(void)
{
- /* Set up the GUC to control which lock is used */
+ /* Set up GUCs to control which lock is used */
DefineCustomIntVariable("delay_execution.post_planning_lock_id",
"Sets the advisory lock ID to be locked/unlocked after planning.",
"Zero disables the delay.",
@@ -86,10 +123,22 @@ _PG_init(void)
NULL,
NULL,
NULL);
-
+ DefineCustomIntVariable("delay_execution.executor_start_lock_id",
+ "Sets the advisory lock ID to be locked/unlocked before starting execution.",
+ "Zero disables the delay.",
+ &executor_start_lock_id,
+ 0,
+ 0, INT_MAX,
+ PGC_USERSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
MarkGUCPrefixReserved("delay_execution");
- /* Install our hook */
+ /* Install our hooks. */
prev_planner_hook = planner_hook;
planner_hook = delay_execution_planner;
+ prev_ExecutorStart_hook = ExecutorStart_hook;
+ ExecutorStart_hook = delay_execution_ExecutorStart;
}
diff --git a/src/test/modules/delay_execution/expected/cached-plan-replan.out b/src/test/modules/delay_execution/expected/cached-plan-replan.out
new file mode 100644
index 0000000000..0ac6a17c2b
--- /dev/null
+++ b/src/test/modules/delay_execution/expected/cached-plan-replan.out
@@ -0,0 +1,156 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1prep s2lock s1exec s2dropi s2unlock
+step s1prep: SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1;
+ EXPLAIN (COSTS OFF) EXECUTE q (1);
+QUERY PLAN
+--------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo11 foo_1
+ Recheck Cond: (a = $1)
+ -> Bitmap Index Scan on foo11_a_idx
+ Index Cond: (a = $1)
+(6 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+-----------------------------
+Append
+ Subplans Removed: 1
+ -> Seq Scan on foo11 foo_1
+ Filter: (a = $1)
+(4 rows)
+
+
+starting permutation: s1prep2 s2lock s1exec2 s2dropi s2unlock
+step s1prep2: SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = 1;
+ EXPLAIN (COSTS OFF) EXECUTE q2;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------
+Bitmap Heap Scan on foo11 foo
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on foo11_a_idx
+ Index Cond: (a = 1)
+(4 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec2: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec2: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------
+Seq Scan on foo11 foo
+ Filter: (a = 1)
+(2 rows)
+
+
+starting permutation: s1prep3 s2lock s1exec3 s2dropi s2unlock
+step s1prep3: SET plan_cache_mode = force_generic_plan;
+ SET enable_partitionwise_aggregate = on;
+ SET enable_partitionwise_join = on;
+ PREPARE q3 AS SELECT t1.a, count(t2.b) FROM foo t1, foo t2 WHERE t1.a = t2.a GROUP BY 1;
+ EXPLAIN (COSTS OFF) EXECUTE q3;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+----------------------------------------------------------------
+Append
+ -> GroupAggregate
+ Group Key: t1.a
+ -> Merge Join
+ Merge Cond: (t1.a = t2.a)
+ -> Index Only Scan using foo11_a_idx on foo11 t1
+ -> Materialize
+ -> Index Scan using foo11_a_idx on foo11 t2
+ -> GroupAggregate
+ Group Key: t1_1.a
+ -> Merge Join
+ Merge Cond: (t1_1.a = t2_1.a)
+ -> Sort
+ Sort Key: t1_1.a
+ -> Seq Scan on foo2 t1_1
+ -> Sort
+ Sort Key: t2_1.a
+ -> Seq Scan on foo2 t2_1
+(18 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec3: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec3: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------------------------------
+Append
+ -> GroupAggregate
+ Group Key: t1.a
+ -> Merge Join
+ Merge Cond: (t1.a = t2.a)
+ -> Sort
+ Sort Key: t1.a
+ -> Seq Scan on foo11 t1
+ -> Sort
+ Sort Key: t2.a
+ -> Seq Scan on foo11 t2
+ -> GroupAggregate
+ Group Key: t1_1.a
+ -> Merge Join
+ Merge Cond: (t1_1.a = t2_1.a)
+ -> Sort
+ Sort Key: t1_1.a
+ -> Seq Scan on foo2 t1_1
+ -> Sort
+ Sort Key: t2_1.a
+ -> Seq Scan on foo2 t2_1
+(21 rows)
+
diff --git a/src/test/modules/delay_execution/specs/cached-plan-replan.spec b/src/test/modules/delay_execution/specs/cached-plan-replan.spec
new file mode 100644
index 0000000000..3c92cbd5c6
--- /dev/null
+++ b/src/test/modules/delay_execution/specs/cached-plan-replan.spec
@@ -0,0 +1,61 @@
+# Test to check that invalidation of cached generic plans during ExecutorStart
+# correctly triggers replanning and re-execution.
+
+setup
+{
+ CREATE TABLE foo (a int, b text) PARTITION BY LIST(a);
+ CREATE TABLE foo1 PARTITION OF foo FOR VALUES IN (1) PARTITION BY LIST (a);
+ CREATE TABLE foo11 PARTITION OF foo1 FOR VALUES IN (1);
+ CREATE INDEX foo11_a ON foo1 (a);
+ CREATE TABLE foo2 PARTITION OF foo FOR VALUES IN (2);
+ CREATE VIEW foov AS SELECT * FROM foo;
+}
+
+teardown
+{
+ DROP VIEW foov;
+ DROP TABLE foo;
+}
+
+session "s1"
+# Append with run-time pruning
+step "s1prep" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+
+# no Append case (only one partition selected by the planner)
+step "s1prep2" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = 1;
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+
+# Append with partitionwise aggregate and join plans as child subplans
+step "s1prep3" { SET plan_cache_mode = force_generic_plan;
+ SET enable_partitionwise_aggregate = on;
+ SET enable_partitionwise_join = on;
+ PREPARE q3 AS SELECT t1.a, count(t2.b) FROM foo t1, foo t2 WHERE t1.a = t2.a GROUP BY 1;
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+
+# Executes a generic plan
+step "s1exec" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+step "s1exec2" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+step "s1exec3" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+
+session "s2"
+step "s2lock" { SELECT pg_advisory_lock(12345); }
+step "s2unlock" { SELECT pg_advisory_unlock(12345); }
+step "s2dropi" { DROP INDEX foo11_a; }
+
+# While "s1exec", etc. wait to acquire the advisory lock, "s2drop" is able to
+# drop the index being used in the cached plan. When "s1exec" is then
+# unblocked and initializes the cached plan for execution, it detects the
+# concurrent index drop and causes the cached plan to be discarded and
+# recreated without the index.
+permutation "s1prep" "s2lock" "s1exec" "s2dropi" "s2unlock"
+permutation "s1prep2" "s2lock" "s1exec2" "s2dropi" "s2unlock"
+permutation "s1prep3" "s2lock" "s1exec3" "s2dropi" "s2unlock"
--
2.35.3
v40-0001-Add-field-to-store-parent-relids-to-Append-Merge.patchapplication/octet-stream; name=v40-0001-Add-field-to-store-parent-relids-to-Append-Merge.patchDownload
From 54554c16763831037b48d7c7686f883c2c882108 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Tue, 4 Jul 2023 22:36:31 +0900
Subject: [PATCH v40 1/4] Add field to store parent relids to
Append/MergeAppend
There's no way currently in the executor to tell if the child
subplans of Append/MergeAppend are scanning partitions, and if
they indeed do, what the RT indexes of their parent/ancestor tables
are. Executor doesn't need to see their RT indexes except for
run-time pruning, in which case they can can be found in the
PartitionPruneInfo, but a future commit will create a need for
them to be available at all times for the purpose of locking
those parent/ancestor tables when executing a cached plan.
The code to look up partitioned parent relids for a given list of
partition scan subpaths of an Append/MergeAppend is already present
in make_partition_pruneinfo() but it's local to partprune.c. This
commit refactors that code into its own function called
add_append_subpath_partrelids() defined in appendinfo.c and
generalizes it to consider child join and aggregate paths. To
facilitate looking up the parent rels of child grouping rels in
add_append_subpath_partrelids(), parent links are now also set in
the RelOptInfos of child grouping rels, like they are in
those of child base and join rels.
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
src/backend/optimizer/plan/createplan.c | 41 ++++++--
src/backend/optimizer/plan/planner.c | 3 +
src/backend/optimizer/plan/setrefs.c | 4 +
src/backend/optimizer/util/appendinfo.c | 134 ++++++++++++++++++++++++
src/backend/partitioning/partprune.c | 124 +++-------------------
src/include/nodes/plannodes.h | 14 +++
src/include/optimizer/appendinfo.h | 3 +
src/include/partitioning/partprune.h | 3 +-
8 files changed, 203 insertions(+), 123 deletions(-)
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index ec73789bc2..5ee2b5b7f9 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -25,6 +25,7 @@
#include "nodes/extensible.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
+#include "optimizer/appendinfo.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/optimizer.h"
@@ -1210,6 +1211,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
Oid *nodeCollations = NULL;
bool *nodeNullsFirst = NULL;
bool consider_async = false;
+ List *allpartrelids = NIL;
/*
* The subpaths list could be empty, if every child was proven empty by
@@ -1351,15 +1353,23 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
++nasyncplans;
}
+ /*
+ * Find partitioned parent rel(s) of the subpath's rel(s).
+ */
+ allpartrelids = add_append_subpath_partrelids(root, subpath, rel,
+ allpartrelids);
+
subplans = lappend(subplans, subplan);
}
+ plan->allpartrelids = allpartrelids;
+
/*
- * If any quals exist, they may be useful to perform further partition
- * pruning during execution. Gather information needed by the executor to
- * do partition pruning.
+ * If scanning partitions, check if there are quals that may be useful to
+ * perform further partition pruning during execution. Gather information
+ * needed by the executor to do partition pruning.
*/
- if (enable_partition_pruning)
+ if (enable_partition_pruning && allpartrelids != NIL)
{
List *prunequal;
@@ -1380,7 +1390,8 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
partpruneinfo =
make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ allpartrelids);
}
plan->appendplans = subplans;
@@ -1426,6 +1437,7 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
PartitionPruneInfo *partpruneinfo = NULL;
+ List *allpartrelids = NIL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1515,15 +1527,23 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplan = (Plan *) sort;
}
+ /*
+ * Find partitioned parent rel(s) of the subpath's rel(s).
+ */
+ allpartrelids = add_append_subpath_partrelids(root, subpath, rel,
+ allpartrelids);
+
subplans = lappend(subplans, subplan);
}
+ node->allpartrelids = allpartrelids;
+
/*
- * If any quals exist, they may be useful to perform further partition
- * pruning during execution. Gather information needed by the executor to
- * do partition pruning.
+ * If scanning partitions, check if there are quals that may be useful to
+ * perform further partition pruning during execution. Gather information
+ * needed by the executor to do partition pruning.
*/
- if (enable_partition_pruning)
+ if (enable_partition_pruning && allpartrelids != NIL)
{
List *prunequal;
@@ -1535,7 +1555,8 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
if (prunequal != NIL)
partpruneinfo = make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ allpartrelids);
}
node->mergeplans = subplans;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 0e12fdeb60..7bc6eec364 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -7835,8 +7835,11 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
agg_costs, gd, &child_extra,
&child_partially_grouped_rel);
+ /* Mark as child of grouped_rel. */
+ child_grouped_rel->parent = grouped_rel;
if (child_partially_grouped_rel)
{
+ child_partially_grouped_rel->parent = grouped_rel;
partially_grouped_live_children =
lappend(partially_grouped_live_children,
child_partially_grouped_rel);
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index c63758cb2b..396c83e357 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1754,6 +1754,8 @@ set_append_references(PlannerInfo *root,
set_dummy_tlist_references((Plan *) aplan, rtoffset);
aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
+ foreach(l, aplan->allpartrelids)
+ lfirst(l) = offset_relid_set((Relids) lfirst(l), rtoffset);
if (aplan->part_prune_info)
{
@@ -1830,6 +1832,8 @@ set_mergeappend_references(PlannerInfo *root,
set_dummy_tlist_references((Plan *) mplan, rtoffset);
mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
+ foreach(l, mplan->allpartrelids)
+ lfirst(l) = offset_relid_set((Relids) lfirst(l), rtoffset);
if (mplan->part_prune_info)
{
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index f456b3b0a4..5bd8e82b9b 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -41,6 +41,7 @@ static void make_inh_translation_list(Relation oldrelation,
AppendRelInfo *appinfo);
static Node *adjust_appendrel_attrs_mutator(Node *node,
adjust_appendrel_attrs_context *context);
+static List *add_part_relids(List *allpartrelids, Bitmapset *partrelids);
/*
@@ -1035,3 +1036,136 @@ distribute_row_identity_vars(PlannerInfo *root)
}
}
}
+
+/*
+ * add_append_subpath_partrelids
+ * Look up a child subpath's rel's partitioned parent relids up to
+ * parentrel and add the bitmapset containing those into
+ * 'allpartrelids'
+ */
+List *
+add_append_subpath_partrelids(PlannerInfo *root, Path *subpath,
+ RelOptInfo *parentrel,
+ List *allpartrelids)
+{
+ RelOptInfo *prel = subpath->parent;
+ Relids partrelids = NULL;
+
+ /* Nothing to do if there's no parent to begin with. */
+ if (!IS_OTHER_REL(prel))
+ return allpartrelids;
+
+ /*
+ * Traverse up to the pathrel's topmost partitioned parent, collecting
+ * parent relids as we go; but stop if we reach parentrel. (Normally, a
+ * pathrel's topmost partitioned parent is either parentrel or a UNION ALL
+ * appendrel child of parentrel. But when handling partitionwise joins of
+ * multi-level partitioning trees, we can see an append path whose
+ * parentrel is an intermediate partitioned table.)
+ */
+ do
+ {
+ Relids parent_relids = NULL;
+
+ /*
+ * For simple child rels, we can simply set the parent_relids to
+ * prel->parent->relids. But for partitionwise join and aggregate
+ * child rels, while we can use prel->parent to move up the tree,
+ * parent_relids must be found the hard way through AppendRelInfos,
+ * because 1) a joinrel's relids may point to RTE_JOIN entries,
+ * 2) topmost parent grouping rel's relids field is NULL.
+ */
+ if (IS_SIMPLE_REL(prel))
+ {
+ prel = prel->parent;
+ /* Stop once we reach a non-partitioned parent. */
+ if (!IS_PARTITIONED_REL(prel))
+ break;
+ parent_relids = bms_add_members(parent_relids, prel->relids);
+ }
+ else
+ {
+ AppendRelInfo **appinfos;
+ int nappinfos,
+ i;
+
+ appinfos = find_appinfos_by_relids(root, prel->relids,
+ &nappinfos);
+ for (i = 0; i < nappinfos; i++)
+ {
+ AppendRelInfo *appinfo = appinfos[i];
+
+ parent_relids = bms_add_member(parent_relids,
+ appinfo->parent_relid);
+ }
+ pfree(appinfos);
+ prel = prel->parent;
+ }
+ /* accept this level as an interesting parent */
+ partrelids = bms_add_members(partrelids, parent_relids);
+ if (prel == parentrel)
+ break; /* don't traverse above parentrel */
+ } while (IS_OTHER_REL(prel));
+
+ if (partrelids == NULL)
+ return allpartrelids;
+
+ return add_part_relids(allpartrelids, partrelids);
+}
+
+/*
+ * add_part_relids
+ * Add new info to a list of Bitmapsets of partitioned relids.
+ *
+ * Within 'allpartrelids', there is one Bitmapset for each topmost parent
+ * partitioned rel. Each Bitmapset contains the RT indexes of the topmost
+ * parent as well as its relevant non-leaf child partitions. Since (by
+ * construction of the rangetable list) parent partitions must have lower
+ * RT indexes than their children, we can distinguish the topmost parent
+ * as being the lowest set bit in the Bitmapset.
+ *
+ * 'partrelids' contains the RT indexes of a parent partitioned rel, and
+ * possibly some non-leaf children, that are newly identified as parents of
+ * some subpath rel passed to make_partition_pruneinfo(). These are added
+ * to an appropriate member of 'allpartrelids'.
+ *
+ * Note that the list contains only RT indexes of partitioned tables that
+ * are parents of some scan-level relation appearing in the 'subpaths' that
+ * make_partition_pruneinfo() is dealing with. Also, "topmost" parents are
+ * not allowed to be higher than the 'parentrel' associated with the append
+ * path. In this way, we avoid expending cycles on partitioned rels that
+ * can't contribute useful pruning information for the problem at hand.
+ * (It is possible for 'parentrel' to be a child partitioned table, and it
+ * is also possible for scan-level relations to be child partitioned tables
+ * rather than leaf partitions. Hence we must construct this relation set
+ * with reference to the particular append path we're dealing with, rather
+ * than looking at the full partitioning structure represented in the
+ * RelOptInfos.)
+ */
+static List *
+add_part_relids(List *allpartrelids, Bitmapset *partrelids)
+{
+ Index targetpart;
+ ListCell *lc;
+
+ /* We can easily get the lowest set bit this way: */
+ targetpart = bms_next_member(partrelids, -1);
+ Assert(targetpart > 0);
+
+ /* Look for a matching topmost parent */
+ foreach(lc, allpartrelids)
+ {
+ Bitmapset *currpartrelids = (Bitmapset *) lfirst(lc);
+ Index currtarget = bms_next_member(currpartrelids, -1);
+
+ if (targetpart == currtarget)
+ {
+ /* Found a match, so add any new RT indexes to this hierarchy */
+ currpartrelids = bms_add_members(currpartrelids, partrelids);
+ lfirst(lc) = currpartrelids;
+ return allpartrelids;
+ }
+ }
+ /* No match, so add the new partition hierarchy to the list */
+ return lappend(allpartrelids, partrelids);
+}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 7179b22a05..213512a5f4 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -138,7 +138,6 @@ typedef struct PruneStepResult
} PruneStepResult;
-static List *add_part_relids(List *allpartrelids, Bitmapset *partrelids);
static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
RelOptInfo *parentrel,
List *prunequal,
@@ -218,33 +217,32 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
+ * 'allpartrelids' contains Bitmapsets of RT indexes of partitioned parents
+ * whose partitions' Paths are in 'subpaths'; there's one Bitmapset for every
+ * partition tree involved.
*/
PartitionPruneInfo *
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
- List *prunequal)
+ List *prunequal,
+ List *allpartrelids)
{
PartitionPruneInfo *pruneinfo;
Bitmapset *allmatchedsubplans = NULL;
- List *allpartrelids;
List *prunerelinfos;
int *relid_subplan_map;
ListCell *lc;
int i;
+ Assert(list_length(allpartrelids) > 0);
+
/*
- * Scan the subpaths to see which ones are scans of partition child
- * relations, and identify their parent partitioned rels. (Note: we must
- * restrict the parent partitioned rels to be parentrel or children of
- * parentrel, otherwise we couldn't translate prunequal to match.)
- *
- * Also construct a temporary array to map from partition-child-relation
- * relid to the index in 'subpaths' of the scan plan for that partition.
+ * Construct a temporary array to map from partition-child-relation relid
+ * to the index in 'subpaths' of the scan plan for that partition.
* (Use of "subplan" rather than "subpath" is a bit of a misnomer, but
* we'll let it stand.) For convenience, we use 1-based indexes here, so
* that zero can represent an un-filled array entry.
*/
- allpartrelids = NIL;
relid_subplan_map = palloc0(sizeof(int) * root->simple_rel_array_size);
i = 1;
@@ -253,50 +251,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Path *path = (Path *) lfirst(lc);
RelOptInfo *pathrel = path->parent;
- /* We don't consider partitioned joins here */
- if (pathrel->reloptkind == RELOPT_OTHER_MEMBER_REL)
- {
- RelOptInfo *prel = pathrel;
- Bitmapset *partrelids = NULL;
-
- /*
- * Traverse up to the pathrel's topmost partitioned parent,
- * collecting parent relids as we go; but stop if we reach
- * parentrel. (Normally, a pathrel's topmost partitioned parent
- * is either parentrel or a UNION ALL appendrel child of
- * parentrel. But when handling partitionwise joins of
- * multi-level partitioning trees, we can see an append path whose
- * parentrel is an intermediate partitioned table.)
- */
- do
- {
- AppendRelInfo *appinfo;
-
- Assert(prel->relid < root->simple_rel_array_size);
- appinfo = root->append_rel_array[prel->relid];
- prel = find_base_rel(root, appinfo->parent_relid);
- if (!IS_PARTITIONED_REL(prel))
- break; /* reached a non-partitioned parent */
- /* accept this level as an interesting parent */
- partrelids = bms_add_member(partrelids, prel->relid);
- if (prel == parentrel)
- break; /* don't traverse above parentrel */
- } while (prel->reloptkind == RELOPT_OTHER_MEMBER_REL);
-
- if (partrelids)
- {
- /*
- * Found some relevant parent partitions, which may or may not
- * overlap with partition trees we already found. Add new
- * information to the allpartrelids list.
- */
- allpartrelids = add_part_relids(allpartrelids, partrelids);
- /* Also record the subplan in relid_subplan_map[] */
- /* No duplicates please */
- Assert(relid_subplan_map[pathrel->relid] == 0);
- relid_subplan_map[pathrel->relid] = i;
- }
- }
+ /* No duplicates please */
+ Assert(relid_subplan_map[pathrel->relid] == 0);
+ relid_subplan_map[pathrel->relid] = i;
i++;
}
@@ -362,63 +319,6 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
return pruneinfo;
}
-/*
- * add_part_relids
- * Add new info to a list of Bitmapsets of partitioned relids.
- *
- * Within 'allpartrelids', there is one Bitmapset for each topmost parent
- * partitioned rel. Each Bitmapset contains the RT indexes of the topmost
- * parent as well as its relevant non-leaf child partitions. Since (by
- * construction of the rangetable list) parent partitions must have lower
- * RT indexes than their children, we can distinguish the topmost parent
- * as being the lowest set bit in the Bitmapset.
- *
- * 'partrelids' contains the RT indexes of a parent partitioned rel, and
- * possibly some non-leaf children, that are newly identified as parents of
- * some subpath rel passed to make_partition_pruneinfo(). These are added
- * to an appropriate member of 'allpartrelids'.
- *
- * Note that the list contains only RT indexes of partitioned tables that
- * are parents of some scan-level relation appearing in the 'subpaths' that
- * make_partition_pruneinfo() is dealing with. Also, "topmost" parents are
- * not allowed to be higher than the 'parentrel' associated with the append
- * path. In this way, we avoid expending cycles on partitioned rels that
- * can't contribute useful pruning information for the problem at hand.
- * (It is possible for 'parentrel' to be a child partitioned table, and it
- * is also possible for scan-level relations to be child partitioned tables
- * rather than leaf partitions. Hence we must construct this relation set
- * with reference to the particular append path we're dealing with, rather
- * than looking at the full partitioning structure represented in the
- * RelOptInfos.)
- */
-static List *
-add_part_relids(List *allpartrelids, Bitmapset *partrelids)
-{
- Index targetpart;
- ListCell *lc;
-
- /* We can easily get the lowest set bit this way: */
- targetpart = bms_next_member(partrelids, -1);
- Assert(targetpart > 0);
-
- /* Look for a matching topmost parent */
- foreach(lc, allpartrelids)
- {
- Bitmapset *currpartrelids = (Bitmapset *) lfirst(lc);
- Index currtarget = bms_next_member(currpartrelids, -1);
-
- if (targetpart == currtarget)
- {
- /* Found a match, so add any new RT indexes to this hierarchy */
- currpartrelids = bms_add_members(currpartrelids, partrelids);
- lfirst(lc) = currpartrelids;
- return allpartrelids;
- }
- }
- /* No match, so add the new partition hierarchy to the list */
- return lappend(allpartrelids, partrelids);
-}
-
/*
* make_partitionedrel_pruneinfo
* Build a List of PartitionedRelPruneInfos, one for each interesting
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 1b787fe031..7a5f3ba625 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -267,6 +267,13 @@ typedef struct Append
List *appendplans;
int nasyncplans; /* # of asynchronous plans */
+ /*
+ * RTIs of all partitioned tables whose children are scanned by
+ * appendplans. The list contains a bitmapset for every partition tree
+ * covered by this Append.
+ */
+ List *allpartrelids;
+
/*
* All 'appendplans' preceding this index are non-partial plans. All
* 'appendplans' from this index onwards are partial plans.
@@ -291,6 +298,13 @@ typedef struct MergeAppend
List *mergeplans;
+ /*
+ * RTIs of all partitioned tables whose children are scanned by
+ * mergeplans. The list contains a bitmapset for every partition tree
+ * covered by this MergeAppend.
+ */
+ List *allpartrelids;
+
/* these fields are just like the sort-key info in struct Sort: */
/* number of sort-key columns */
diff --git a/src/include/optimizer/appendinfo.h b/src/include/optimizer/appendinfo.h
index a05f91f77d..1621a7319a 100644
--- a/src/include/optimizer/appendinfo.h
+++ b/src/include/optimizer/appendinfo.h
@@ -46,5 +46,8 @@ extern void add_row_identity_columns(PlannerInfo *root, Index rtindex,
RangeTblEntry *target_rte,
Relation target_relation);
extern void distribute_row_identity_vars(PlannerInfo *root);
+extern List *add_append_subpath_partrelids(PlannerInfo *root, Path *subpath,
+ RelOptInfo *parentrel,
+ List *allpartrelids);
#endif /* APPENDINFO_H */
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 8636e04e37..caa774a111 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -73,7 +73,8 @@ typedef struct PartitionPruneContext
extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
struct RelOptInfo *parentrel,
List *subpaths,
- List *prunequal);
+ List *prunequal,
+ List *allpartrelids);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
--
2.35.3
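As a small aside on the convention the above patch's add_part_relids()
relies on: parent partitioned tables get lower RT indexes than their
children, so the topmost parent of each partition hierarchy can be
recovered as the lowest set bit of its Bitmapset. Here is a standalone
toy illustration of that convention in plain C, deliberately not using
the actual Bitmapset API, with made-up RT indexes:

	#include <stdio.h>

	int
	main(void)
	{
		/* RT indexes 1 (root), 3 and 7 of one partition tree, as a bitmask */
		unsigned int partrelids = (1u << 1) | (1u << 3) | (1u << 7);

		/* lowest set bit = topmost parent's RT index */
		int			topmost = __builtin_ctz(partrelids);

		printf("topmost parent RTI = %d\n", topmost);	/* prints 1 */
		return 0;
	}

In the patch itself, the same lookup is done with
bms_next_member(partrelids, -1), as the comments above add_part_relids()
explain.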
On Thu, Jul 6, 2023 at 11:29 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Mon, Jul 3, 2023 at 10:27 PM Daniel Gustafsson <daniel@yesql.se> wrote:
On 8 Jun 2023, at 16:23, Amit Langote <amitlangote09@gmail.com> wrote:
Here is a new version.

The local planstate variable in the hunk below is shadowing the function
parameter planstate, which causes a compiler warning.

Thanks Daniel for the heads up.
Attached new version fixes that and contains a few other notable
changes. Before going into the details of those changes, let me
reiterate in broad strokes what the patch is trying to do.

The idea is to move the locking of some tables referenced in a cached
(generic) plan from plancache/GetCachedPlan() to the
executor/ExecutorStart(). Specifically, the locking of inheritance
child tables. Why? Because partition pruning with "initial pruning
steps" contained in the Append/MergeAppend nodes may eliminate some
child tables that need not have been locked to begin with, though the
pruning can only occur during ExecutorStart().

After applying this patch, GetCachedPlan() only locks the tables that
are directly mentioned in the query to ensure that the
analyzed-rewritten-but-unplanned query tree backing a given CachedPlan
is still valid (cf RevalidateCachedQuery()), but not the tables in the
CachedPlan that would have been added by the planner. Tables in a
CachePlan that would not be locked currently only include the
inheritance child tables / partitions of the tables mentioned in the
query. This means that the plan trees in a given CachedPlan returned
by GetCachedPlan() are only partially valid and are subject to
invalidation because concurrent sessions can possibly modify the child
tables referenced in them before ExecutorStart() gets around to
locking them. If the concurrent modifications do happen,
ExecutorStart() is now equipped to detect them by way of noticing that
the CachedPlan is invalidated and inform the caller to discard and
recreate the CachedPlan. This entails changing all the call sites of
ExecutorStart() that pass it a plan tree from a CachedPlan to
implement the replan-and-retry-execution loop.
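To illustrate, here is a minimal sketch of that loop, modeled on the
ExecuteQuery() and ExplainExecuteQuery() changes in the attached patch;
the QueryDesc setup is abbreviated and the variable declarations are
omitted, so treat this as an approximation rather than an exact hunk:

	replan:
		cplan = GetCachedPlan(plansource, params, CurrentResourceOwner,
							  queryEnv);
		queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, cplan->stmt_list),
									cplan, query_string, GetActiveSnapshot(),
									InvalidSnapshot, dest, params, queryEnv, 0);
		ExecutorStart(queryDesc, 0);
		if (!queryDesc->plan_valid)
		{
			/* Plan went stale while locking child tables; retry with a new one. */
			ExecutorEnd(queryDesc);
			FreeQueryDesc(queryDesc);
			ReleaseCachedPlan(cplan, CurrentResourceOwner);
			goto replan;
		}
		/* ... proceed with ExecutorRun() etc. as before ... */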
Given the above, ExecutorStart(), which has not needed so far to take
any locks (except on indexes mentioned in IndexScans), now needs to
lock child tables if executing a cached plan which contains them. In
the previous versions, the patch used a flag passed in
EState.es_top_eflags to signal ExecGetRangeTableRelation() to lock the
table. The flag would be set in ExecInitAppend() and
ExecInitMergeAppend() for the duration of the loop that initializes
child subplans with the assumption that that's where the child tables
would be opened. But not all child subplans of Append/MergeAppend
scan child tables (think UNION ALL queries), so this approach can
result in redundant locking. Worse, I needed to invent
PlannedStmt.elidedAppendChildRelations to separately track child
tables whose Scan nodes' parent Append/MergeAppend would be removed by
setrefs.c in some cases.

So, this new patch uses a flag in the RangeTblEntry itself to denote
if the table is a child table instead of the above roundabout way.
ExecGetRangeTableRelation() can simply look at the RTE to decide
whether to take a lock or not. I considered adding a new bool field,
but noticed we already have inFromCl to track if a given RTE is for
a table/entity directly mentioned in the query or for something added
behind-the-scenes into the range table as the field's description in
parsenodes.h says. RTEs for child tables are added behind-the-scenes
by the planner and it makes perfect sense to me to mark their inFromCl
as false. I can't find anything that relies on the current behavior
of inFromCl being set to the same value as the root inheritance parent
(true). Patch 0002 makes this change for child RTEs.
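Concretely, the decision in ExecGetRangeTableRelation() becomes roughly
the following. This is a simplified sketch rather than the exact hunk
from the patch; the es_cachedplan test is shorthand for "we are
executing a cached plan":

	RangeTblEntry *rte = exec_rt_fetch(rti, estate);

	if (!rte->inFromCl && estate->es_cachedplan != NULL)
	{
		/* Child table added by the planner; not locked yet, so lock it. */
		rel = table_open(rte->relid, rte->rellockmode);
	}
	else
	{
		/* Already locked by the parser or GetCachedPlan(). */
		rel = table_open(rte->relid, NoLock);
	}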
A few other notes:

* A parallel worker does ExecutorStart() without access to the
CachedPlan that the leader may have gotten its plan tree from. This
means that parallel workers do not have the ability to detect plan
tree invalidations. I think that's fine, because if the leader would
have been able to launch workers at all, it would also have gotten all
the locks to protect the (portion of) the plan tree that the workers
would be executing. I had an off-list discussion about this with
Robert and he mentioned his concern that each parallel worker would
have its own view of which child subplans of a parallel Append are
"valid" that depends on the result of its own evaluation of initial
pruning. So, there may be race conditions whereby a worker may try
to execute plan nodes that are no longer valid, for example, if the
partition a worker considers valid is not viewed as such by the leader
and thus not locked. I shared my thoughts as to why that sounds
unlikely at [1], though maybe I'm a bit too optimistic?

* For multi-query portals, you can't now do ExecutorStart()
immediately followed by ExecutorRun() for each query in the portal,
because ExecutorStart() may now fail to start a plan if it gets
invalidated. So PortalStart() now does ExecutorStart()s for all
queries and remembers the QueryDescs for PortalRun() then to do
ExecutorRun()s using. A consequence of this is that
CommandCounterIncrement() now must be done between the
ExecutorStart()s of the individual plans in PortalStart() and not
between the ExecutorRun()s in PortalRunMulti(). make check-world
passes with this new arrangement, though I'm not entirely confident
that there are no problems lurking.
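To make that last point concrete, PortalRunMulti() essentially reduces
to looping over the QueryDescs saved by PortalStart(), roughly like the
following simplified sketch, which ignores utility statements and
DestReceiver details:

	ListCell   *lc;

	foreach(lc, portal->qdescs)
	{
		QueryDesc  *queryDesc = (QueryDesc *) lfirst(lc);

		/*
		 * ExecutorStart() already ran in PortalStart(), which also does
		 * CommandCounterIncrement() between the queries.
		 */
		ExecutorRun(queryDesc, ForwardScanDirection, 0, true);
	}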
In an absolutely brown-paper-bag moment, I realized that I had not
updated src/backend/executor/README to reflect the changes to the
executor's control flow that this patch makes. That is, after
scrapping the old design back in January, whose details *were*
reflected in the patches before that redesign.
Anyway, the attached fixes that.
Tom, do you think you have bandwidth in the near future to give this
another look? I think I've addressed the comments that you had given
back in April, though as mentioned in the previous message, there may
still be some funny-looking aspects still remaining. In any case, I
have no intention of pressing ahead with the patch without another
committer having had a chance to sign off on it.
--
Thanks,
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v41-0003-Delay-locking-of-child-tables-in-cached-plans-un.patchapplication/x-patch; name=v41-0003-Delay-locking-of-child-tables-in-cached-plans-un.patchDownload
From 17f8ce26b86c3e8aebfd5279fc6bbdc7997894f8 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Tue, 4 Jul 2023 22:36:45 +0900
Subject: [PATCH v41 3/4] Delay locking of child tables in cached plans until
ExecutorStart()
Currently, GetCachedPlan() takes a lock on all relations contained in
a cached plan before returning it as a valid plan to its callers for
execution. One disadvantage is that if the plan contains partitions
that are prunable with conditions involving EXTERN parameters and
other stable expressions (known as "initial pruning"), many of them
would be locked unnecessarily, because only those that survive
initial pruning need to have been locked. Locking all partitions this
way causes significant delay when there are many partitions. Note
that initial pruning occurs during the executor's initialization of the
plan, that is, InitPlan().
This commit rearranges things to move the locking of child tables
referenced in a cached plan to occur during InitPlan() so that
initial pruning can eliminate any child tables that need not be
scanned and thus locked.
To determine that a given table is a child table,
ExecGetRangeTableRelation() now looks at the RTE's inFromCl field,
which is only true for tables that are directly mentioned in the
query but false for child tables. Note that any tables whose RTEs'
inFromCl is true would already have been locked by GetCachedPlan(),
so need not be locked again during execution.
If the locking of child tables causes the CachedPlan to go stale, that
is, its is_valid flag is set to false by PlanCacheRelCallback() when an
invalidation message matching some child table contained in the plan
is processed, ExecInitNode() abandons the initialization of the
remaining nodes in the plan tree. In that case, InitPlan() returns
after setting QueryDesc.planstate to NULL to indicate to the caller
that no execution is possible with the plan tree as is. Some
plan tree subnodes may get fully initialized by ExecInitNode() before
the CachedPlan's invalidation is detected, so to ensure that they
are released by ExecEndPlan(), ExecInitNode() now adds the PlanState
nodes of the fully initialized plan nodes to a new List in
EState called es_inited_plannodes. ExecEndPlan() releases them
individually by calling ExecEndNode() on each element of the new
List. ExecEndNode() is no longer recursive, because all nodes that
need to be closed can be found in es_inited_plannodes.
Call sites that use GetCachedPlan() to get the plan trees to pass to
the executor should now be prepared to handle the case where the old
CachedPlan gets invalidated during ExecutorStart() as described
above. So this commit refactors the relevant code sites to move the
ExecutorStart() call closer to the GetCachedPlan() to implement the
replan loop conveniently.
Given this new behavior, PortalStart() now must always perform
ExecutorStart() to be able to drop and recreate cached plans if
needed, which is currently done only for single-query portals.
For multi-query portals, the QueryDescs that are now created during
PortalStart() are remembered in a new List field of Portal called
'qdescs' and allocated in a new memory context 'queryContext'.
PortalRunMulti() now simply performs ExecutorRun() on the
QueryDescs found in 'qdescs'.
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
contrib/postgres_fdw/postgres_fdw.c | 4 +
src/backend/commands/copyto.c | 4 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 145 +++++---
src/backend/commands/extension.c | 2 +
src/backend/commands/matview.c | 3 +-
src/backend/commands/portalcmds.c | 16 +-
src/backend/commands/prepare.c | 32 +-
src/backend/executor/README | 39 +-
src/backend/executor/execMain.c | 106 +++++-
src/backend/executor/execParallel.c | 12 +-
src/backend/executor/execPartition.c | 14 +
src/backend/executor/execProcnode.c | 50 ++-
src/backend/executor/execUtils.c | 63 +++-
src/backend/executor/functions.c | 2 +
src/backend/executor/nodeAgg.c | 6 +-
src/backend/executor/nodeAppend.c | 48 ++-
src/backend/executor/nodeBitmapAnd.c | 31 +-
src/backend/executor/nodeBitmapHeapscan.c | 9 +-
src/backend/executor/nodeBitmapIndexscan.c | 9 +-
src/backend/executor/nodeBitmapOr.c | 31 +-
src/backend/executor/nodeCustom.c | 2 +
src/backend/executor/nodeForeignscan.c | 8 +-
src/backend/executor/nodeGather.c | 4 +-
src/backend/executor/nodeGatherMerge.c | 3 +-
src/backend/executor/nodeGroup.c | 7 +-
src/backend/executor/nodeHash.c | 10 +-
src/backend/executor/nodeHashjoin.c | 10 +-
src/backend/executor/nodeIncrementalSort.c | 7 +-
src/backend/executor/nodeIndexonlyscan.c | 11 +-
src/backend/executor/nodeIndexscan.c | 11 +-
src/backend/executor/nodeLimit.c | 3 +-
src/backend/executor/nodeLockRows.c | 3 +-
src/backend/executor/nodeMaterial.c | 7 +-
src/backend/executor/nodeMemoize.c | 7 +-
src/backend/executor/nodeMergeAppend.c | 47 ++-
src/backend/executor/nodeMergejoin.c | 10 +-
src/backend/executor/nodeModifyTable.c | 12 +-
src/backend/executor/nodeNestloop.c | 10 +-
src/backend/executor/nodeProjectSet.c | 7 +-
src/backend/executor/nodeRecursiveunion.c | 10 +-
src/backend/executor/nodeResult.c | 7 +-
src/backend/executor/nodeSamplescan.c | 2 +
src/backend/executor/nodeSeqscan.c | 2 +
src/backend/executor/nodeSetOp.c | 4 +-
src/backend/executor/nodeSort.c | 7 +-
src/backend/executor/nodeSubqueryscan.c | 7 +-
src/backend/executor/nodeTidrangescan.c | 2 +
src/backend/executor/nodeTidscan.c | 2 +
src/backend/executor/nodeUnique.c | 4 +-
src/backend/executor/nodeWindowAgg.c | 6 +-
src/backend/executor/spi.c | 49 ++-
src/backend/storage/lmgr/lmgr.c | 45 +++
src/backend/tcop/postgres.c | 13 +-
src/backend/tcop/pquery.c | 340 +++++++++---------
src/backend/utils/cache/lsyscache.c | 21 ++
src/backend/utils/cache/plancache.c | 149 +++-----
src/backend/utils/mmgr/portalmem.c | 9 +
src/include/commands/explain.h | 7 +-
src/include/executor/execdesc.h | 5 +
src/include/executor/executor.h | 13 +
src/include/nodes/execnodes.h | 6 +
src/include/storage/lmgr.h | 1 +
src/include/utils/lsyscache.h | 1 +
src/include/utils/plancache.h | 14 +
src/include/utils/portal.h | 4 +
src/test/modules/delay_execution/Makefile | 3 +-
.../modules/delay_execution/delay_execution.c | 63 +++-
.../expected/cached-plan-replan.out | 156 ++++++++
.../specs/cached-plan-replan.spec | 61 ++++
70 files changed, 1231 insertions(+), 589 deletions(-)
create mode 100644 src/test/modules/delay_execution/expected/cached-plan-replan.out
create mode 100644 src/test/modules/delay_execution/specs/cached-plan-replan.spec
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index c5cada55fb..1edd4c3f17 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2658,7 +2658,11 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
/* Get info about foreign table. */
rtindex = node->resultRelInfo->ri_RangeTableIndex;
if (fsplan->scan.scanrelid == 0)
+ {
dmstate->rel = ExecOpenScanRelation(estate, rtindex, eflags);
+ if (!ExecPlanStillValid(estate))
+ return;
+ }
else
dmstate->rel = node->ss.ss_currentRelation;
table = GetForeignTable(RelationGetRelid(dmstate->rel));
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 9e4b2437a5..8244194681 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -558,7 +558,8 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL,
+ pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
@@ -569,6 +570,7 @@ BeginCopyTo(ParseState *pstate,
* ExecutorStart computes a result tupdesc for us
*/
ExecutorStart(cstate->queryDesc, 0);
+ Assert(cstate->queryDesc->plan_valid);
tupDesc = cstate->queryDesc->tupDesc;
}
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index e91920ca14..18b07c0200 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 8570b14f62..b1ea45ef2c 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -393,6 +393,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
else
{
PlannedStmt *plan;
+ QueryDesc *queryDesc;
instr_time planstart,
planduration;
BufferUsage bufusage_start,
@@ -415,12 +416,90 @@ ExplainOneQuery(Query *query, int cursorOptions,
BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
}
+ queryDesc = ExplainQueryDesc(plan, NULL, queryString, into, es,
+ params, queryEnv);
+ Assert(queryDesc);
+
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(queryDesc, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
+/*
+ * ExplainQueryDesc
+ * Set up QueryDesc for EXPLAINing a given plan
+ *
+ * This returns NULL if cplan is found to be no longer valid.
+ */
+QueryDesc *
+ExplainQueryDesc(PlannedStmt *stmt, CachedPlan *cplan,
+ const char *queryString, IntoClause *into, ExplainState *es,
+ ParamListInfo params, QueryEnvironment *queryEnv)
+{
+ QueryDesc *queryDesc;
+ DestReceiver *dest;
+ int eflags;
+ int instrument_option = 0;
+
+ /*
+ * Normally we discard the query's output, but if explaining CREATE TABLE
+ * AS, we'd better use the appropriate tuple receiver.
+ */
+ if (into)
+ dest = CreateIntoRelDestReceiver(into);
+ else
+ dest = None_Receiver;
+
+ if (es->analyze && es->timing)
+ instrument_option |= INSTRUMENT_TIMER;
+ else if (es->analyze)
+ instrument_option |= INSTRUMENT_ROWS;
+
+ if (es->buffers)
+ instrument_option |= INSTRUMENT_BUFFERS;
+ if (es->wal)
+ instrument_option |= INSTRUMENT_WAL;
+
+ /*
+ * Use a snapshot with an updated command ID to ensure this query sees
+ * results of any previously executed queries.
+ */
+ PushCopiedSnapshot(GetActiveSnapshot());
+ UpdateActiveSnapshotCommandId();
+
+ /* Create a QueryDesc for the query */
+ queryDesc = CreateQueryDesc(stmt, cplan, queryString,
+ GetActiveSnapshot(), InvalidSnapshot,
+ dest, params, queryEnv, instrument_option);
+
+ /* Select execution options */
+ if (es->analyze)
+ eflags = 0; /* default run-to-completion flags */
+ else
+ eflags = EXEC_FLAG_EXPLAIN_ONLY;
+ if (es->generic)
+ eflags |= EXEC_FLAG_EXPLAIN_GENERIC;
+ if (into)
+ eflags |= GetIntoRelEFlags(into);
+
+ /*
+ * Call ExecutorStart to prepare the plan for execution. A cached plan
+ * may get invalidated as we're doing that.
+ */
+ ExecutorStart(queryDesc, eflags);
+ if (!queryDesc->plan_valid)
+ {
+ /* Clean up. */
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+ PopActiveSnapshot();
+ return NULL;
+ }
+
+ return queryDesc;
+}
+
/*
* ExplainOneUtility -
* print out the execution plan for one utility statement
@@ -524,29 +603,16 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(QueryDesc *queryDesc,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
{
- DestReceiver *dest;
- QueryDesc *queryDesc;
instr_time starttime;
double totaltime = 0;
- int eflags;
- int instrument_option = 0;
-
- Assert(plannedstmt->commandType != CMD_UTILITY);
- if (es->analyze && es->timing)
- instrument_option |= INSTRUMENT_TIMER;
- else if (es->analyze)
- instrument_option |= INSTRUMENT_ROWS;
-
- if (es->buffers)
- instrument_option |= INSTRUMENT_BUFFERS;
- if (es->wal)
- instrument_option |= INSTRUMENT_WAL;
+ Assert(queryDesc->plannedstmt->commandType != CMD_UTILITY);
/*
* We always collect timing for the entire statement, even when node-level
@@ -555,40 +621,6 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
*/
INSTR_TIME_SET_CURRENT(starttime);
- /*
- * Use a snapshot with an updated command ID to ensure this query sees
- * results of any previously executed queries.
- */
- PushCopiedSnapshot(GetActiveSnapshot());
- UpdateActiveSnapshotCommandId();
-
- /*
- * Normally we discard the query's output, but if explaining CREATE TABLE
- * AS, we'd better use the appropriate tuple receiver.
- */
- if (into)
- dest = CreateIntoRelDestReceiver(into);
- else
- dest = None_Receiver;
-
- /* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, instrument_option);
-
- /* Select execution options */
- if (es->analyze)
- eflags = 0; /* default run-to-completion flags */
- else
- eflags = EXEC_FLAG_EXPLAIN_ONLY;
- if (es->generic)
- eflags |= EXEC_FLAG_EXPLAIN_GENERIC;
- if (into)
- eflags |= GetIntoRelEFlags(into);
-
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, eflags);
-
/* Execute the plan for statistics if asked for */
if (es->analyze)
{
@@ -4865,6 +4897,17 @@ ExplainDummyGroup(const char *objtype, const char *labelname, ExplainState *es)
}
}
+/*
+ * Discard output buffer for a fresh restart.
+ */
+void
+ExplainResetOutput(ExplainState *es)
+{
+ Assert(es->str);
+ resetStringInfo(es->str);
+ ExplainBeginOutput(es);
+}
+
/*
* Emit the start-of-output boilerplate.
*
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 9a2ee1c600..72d60d9c4f 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -797,11 +797,13 @@ execute_sql_string(const char *sql)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
ExecutorStart(qdesc, 0);
+ Assert(qdesc->plan_valid);
ExecutorRun(qdesc, ForwardScanDirection, 0, true);
ExecutorFinish(qdesc);
ExecutorEnd(qdesc);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index ac2e74fa3f..a64880b719 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -408,12 +408,13 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, 0);
+ Assert(queryDesc->plan_valid);
/* run the plan */
ExecutorRun(queryDesc, ForwardScanDirection, 0, true);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 73ed7aa2f0..4abbec054b 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -146,6 +146,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
PortalStart(portal, params, 0, GetActiveSnapshot());
Assert(portal->strategy == PORTAL_ONE_SELECT);
+ Assert(portal->plan_valid);
/*
* We're done; the query won't actually be run until PerformPortalFetch is
@@ -249,6 +250,17 @@ PerformPortalClose(const char *name)
PortalDrop(portal, false);
}
+/*
+ * Release a portal's QueryDesc.
+ */
+void
+PortalQueryFinish(QueryDesc *queryDesc)
+{
+ ExecutorFinish(queryDesc);
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+}
+
/*
* PortalCleanup
*
@@ -295,9 +307,7 @@ PortalCleanup(Portal portal)
if (portal->resowner)
CurrentResourceOwner = portal->resowner;
- ExecutorFinish(queryDesc);
- ExecutorEnd(queryDesc);
- FreeQueryDesc(queryDesc);
+ PortalQueryFinish(queryDesc);
CurrentResourceOwner = saveResourceOwner;
}
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 18f70319fc..c9070ed97f 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -183,6 +183,7 @@ ExecuteQuery(ParseState *pstate,
paramLI = EvaluateParams(pstate, entry, stmt->params, estate);
}
+replan:
/* Create a new portal to run the query in */
portal = CreateNewPortal();
/* Don't display the portal in pg_cursors, it is for internal use only */
@@ -251,10 +252,19 @@ ExecuteQuery(ParseState *pstate,
}
/*
- * Run the portal as appropriate.
+ * Run the portal as appropriate. If the portal contains a cached plan, it
+ * must be recreated if portal->plan_valid is false which tells that the
+ * cached plan was found to have been invalidated when initializing one of
+ * the plan trees contained in it.
*/
PortalStart(portal, paramLI, eflags, GetActiveSnapshot());
+ if (!portal->plan_valid)
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
+
(void) PortalRun(portal, count, false, true, dest, dest, qc);
PortalDrop(portal, false);
@@ -574,7 +584,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
{
PreparedStatement *entry;
const char *query_string;
- CachedPlan *cplan;
+ CachedPlan *cplan = NULL;
List *plan_list;
ListCell *p;
ParamListInfo paramLI = NULL;
@@ -618,6 +628,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
+replan:
cplan = GetCachedPlan(entry->plansource, paramLI,
CurrentResourceOwner, queryEnv);
@@ -639,8 +650,21 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ {
+ QueryDesc *queryDesc;
+
+ queryDesc = ExplainQueryDesc(pstmt, cplan, queryString,
+ into, es, paramLI, queryEnv);
+ if (queryDesc == NULL)
+ {
+ ExplainResetOutput(es);
+ ReleaseCachedPlan(cplan, CurrentResourceOwner);
+ goto replan;
+ }
+ ExplainOnePlan(queryDesc, into, es, query_string, paramLI,
+ queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
+ }
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 17775a49e2..110062e0e8 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -280,6 +280,38 @@ are typically reset to empty once per tuple. Per-tuple contexts are usually
associated with ExprContexts, and commonly each PlanState node has its own
ExprContext to evaluate its qual and targetlist expressions in.
+Relation Locking
+----------------
+
+Normally, the executor does not lock non-index relations appearing in a given
+plan tree when initializing it for execution if the plan tree is freshly
+created, that is, not derived from a CachedPlan. The reason for that is that
+the locks must already have been taken during parsing, rewriting, and planning
+of the query in that case. If the plan tree is a cached one, there may still
+be unlocked relations present in the plan tree, because GetCachedPlan() only
+locks the relations that would be present in the query's range table before
+planning occurs, but not relations that would have been added to the range
+table during planning. This means that inheritance child tables, which are
+added to the query's range table during planning, would not have been
+locked if they are present in a cached plan tree.
+
+GetCachedPlan() punts on locking child tables because not all may actually be
+scanned during a given execution of the plan if the child tables are partitions
+which may get pruned away due to executor-initialization-time pruning. So the
+locking of child tables must wait until executor initialization time, which
+occurs during ExecInitNode() on the plan nodes containing the child tables.
+
+So, there's a time window during which a cached plan tree could go stale
+if it contains child tables, because they could get changed in other backends
+before ExecInitNode() gets a lock on them. This means the executor now must
+check the validity of the plan tree every time it takes a lock on a child
+table contained in the tree (after executor-initialization-time pruning, if
+any, has been performed), which it does by looking at CachedPlan.is_valid of
+the CachedPlan passed to it. If the plan tree is indeed stale (is_valid=false),
+the executor must give up initializing it any further and return to the
+caller, letting it know that the execution must be retried with
+a new plan tree.
+
Query Processing Control Flow
-----------------------------
@@ -300,6 +332,11 @@ This is a sketch of control flow for full query processing:
creates per-tuple context
ExecInitExpr
+As mentioned in the "Relation Locking" section, if the plan tree is found to
+be stale during one of the recursive calls of ExecInitNode() shown above after
+taking a lock on a child table, control is returned to the caller, which
+must redo the steps from CreateQueryDesc with a new plan tree.
+
ExecutorRun
ExecProcNode --- recursively called in per-query context
ExecEvalExpr --- called in per-tuple context
@@ -310,7 +347,7 @@ This is a sketch of control flow for full query processing:
AfterTriggerEndQuery
ExecutorEnd
- ExecEndNode --- recursively releases resources
+ ExecEndNode --- releases resources
FreeExecutorState
frees per-query context and child contexts
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4c5a7bbf62..a2f6ac9d1c 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -620,6 +620,17 @@ ExecCheckPermissions(List *rangeTable, List *rteperminfos,
RTEPermissionInfo *perminfo = lfirst_node(RTEPermissionInfo, l);
Assert(OidIsValid(perminfo->relid));
+
+ /*
+ * Relations whose permissions need to be checked must already have
+ * been locked by the parser or by GetCachedPlan() if a cached plan is
+ * being executed.
+ *
+ * XXX shouldn't we skip calling ExecCheckPermissions from InitPlan
+ * in a parallel worker?
+ */
+ Assert(CheckRelLockedByMe(perminfo->relid, AccessShareLock, true) ||
+ IsParallelWorker());
result = ExecCheckOneRelPerms(perminfo);
if (!result)
{
@@ -829,6 +840,23 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
*
* Initializes the query plan: open files, allocate storage
* and start up the rule manager
+ *
+ *
+ * Normally, the plan tree given in queryDesc->plannedstmt is known to be
+ * valid in a race-free manner, that is, all relations contained in
+ * plannedstmt->relationOids would have already been locked. That is not the
+ * case however if the plannedstmt comes from a CachedPlan, one given in
+ * queryDesc->cplan. That's because GetCachedPlan() only locks the tables
+ * that are mentioned in the original query but not the child tables, which
+ * would have been added to the plan by the planner. In that case, locks on
+ * child tables will be taken when initializing their Scan nodes in
+ * ExecInitNode() to be done here. If the CachedPlan gets invalidated as
+ * those locks are taken, plan tree initialization is suspended at the point
+ * where the invalidation is first detected, queryDesc->planstate will be set
+ * to NULL, and queryDesc->plan_valid to false. Callers must retry the
+ * execution after creating a new CachedPlan in that case, after properly
+ * releasing the resources of this QueryDesc, which includes calling
+ * ExecutorFinish() and ExecutorEnd() on the EState contained therein.
* ----------------------------------------------------------------
*/
static void
@@ -839,7 +867,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
- PlanState *planstate;
+ PlanState *planstate = NULL;
TupleDesc tupType;
ListCell *l;
int i;
@@ -850,10 +878,11 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
/*
- * initialize the node's execution state
+ * Set up range table in EState.
*/
ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
+ estate->es_cachedplan = queryDesc->cplan;
estate->es_plannedstmt = plannedstmt;
/*
@@ -886,6 +915,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
case ROW_MARK_KEYSHARE:
case ROW_MARK_REFERENCE:
relation = ExecGetRangeTableRelation(estate, rc->rti);
+ if (!ExecPlanStillValid(estate))
+ goto plan_init_suspended;
break;
case ROW_MARK_COPY:
/* no physical table access is required */
@@ -953,6 +984,11 @@ InitPlan(QueryDesc *queryDesc, int eflags)
sp_eflags |= EXEC_FLAG_REWIND;
subplanstate = ExecInitNode(subplan, estate, sp_eflags);
+ if (!ExecPlanStillValid(estate))
+ {
+ Assert(subplanstate == NULL);
+ goto plan_init_suspended;
+ }
estate->es_subplanstates = lappend(estate->es_subplanstates,
subplanstate);
@@ -966,6 +1002,11 @@ InitPlan(QueryDesc *queryDesc, int eflags)
* processing tuples.
*/
planstate = ExecInitNode(plan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ {
+ Assert(planstate == NULL);
+ goto plan_init_suspended;
+ }
/*
* Get the tuple descriptor describing the type of tuples to return.
@@ -1008,7 +1049,19 @@ InitPlan(QueryDesc *queryDesc, int eflags)
}
queryDesc->tupDesc = tupType;
+ Assert(planstate != NULL);
queryDesc->planstate = planstate;
+ queryDesc->plan_valid = true;
+ return;
+
+plan_init_suspended:
+ /*
+ * Plan initialization was suspended.  Mark the QueryDesc as such;
+ * ExecEndPlan() will clean up any already-initialized plan nodes found
+ * in estate->es_inited_plannodes.
+ */
+ Assert(planstate == NULL);
+ queryDesc->planstate = NULL;
+ queryDesc->plan_valid = false;
}
/*
@@ -1426,7 +1479,7 @@ ExecGetAncestorResultRels(EState *estate, ResultRelInfo *resultRelInfo)
/*
* All ancestors up to the root target relation must have been
- * locked by the planner or AcquireExecutorLocks().
+ * locked by the planner or ExecLockAppendNonLeafRelations().
*/
ancRel = table_open(ancOid, NoLock);
rInfo = makeNode(ResultRelInfo);
@@ -1504,18 +1557,15 @@ ExecEndPlan(PlanState *planstate, EState *estate)
ListCell *l;
/*
- * shut down the node-type-specific query processing
+ * Shut down the node-type-specific query processing for all nodes that
+ * were initialized during InitPlan(), both those in the main plan tree
+ * and those in subplans (es_subplanstates), if any.
*/
- ExecEndNode(planstate);
-
- /*
- * for subplans too
- */
- foreach(l, estate->es_subplanstates)
+ foreach(l, estate->es_inited_plannodes)
{
- PlanState *subplanstate = (PlanState *) lfirst(l);
+ PlanState *pstate = (PlanState *) lfirst(l);
- ExecEndNode(subplanstate);
+ ExecEndNode(pstate);
}
/*
@@ -2858,7 +2908,8 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
* Child EPQ EStates share the parent's copy of unchanging state such as
* the snapshot, rangetable, and external Param info. They need their own
* copies of local state, including a tuple table, es_param_exec_vals,
- * result-rel info, etc.
+ * result-rel info, etc.  Also, we don't pass the parent's copy of the
+ * CachedPlan, because no new locks will be taken for EvalPlanQual().
*/
rcestate->es_direction = ForwardScanDirection;
rcestate->es_snapshot = parentestate->es_snapshot;
@@ -2945,6 +2996,12 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
PlanState *subplanstate;
subplanstate = ExecInitNode(subplan, rcestate, 0);
+
+ /*
+ * At this point, we had better not have received any new invalidation
+ * messages that would have caused the plan tree to go stale.
+ */
+ Assert(ExecPlanStillValid(rcestate) && subplanstate);
rcestate->es_subplanstates = lappend(rcestate->es_subplanstates,
subplanstate);
}
@@ -2988,6 +3045,12 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
*/
epqstate->recheckplanstate = ExecInitNode(planTree, rcestate, 0);
+ /*
+ * At this point, we had better not have received any new invalidation
+ * messages that would have caused the plan tree to go stale.
+ */
+ Assert(ExecPlanStillValid(rcestate) && epqstate->recheckplanstate);
+
MemoryContextSwitchTo(oldcontext);
}
@@ -3010,6 +3073,10 @@ EvalPlanQualEnd(EPQState *epqstate)
MemoryContext oldcontext;
ListCell *l;
+ /* Nothing to do if EvalPlanQualInit() wasn't done to begin with. */
+ if (epqstate->parentestate == NULL)
+ return;
+
rtsize = epqstate->parentestate->es_range_table_size;
/*
@@ -3030,13 +3097,16 @@ EvalPlanQualEnd(EPQState *epqstate)
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
- ExecEndNode(epqstate->recheckplanstate);
-
- foreach(l, estate->es_subplanstates)
+ /*
+ * Shut down the node-type-specific query processing for all nodes that
+ * were initialized during EvalPlanQualStart(), both in the main plan tree
+ * and those in subplans (es_subplanstates), if any.
+ */
+ foreach(l, estate->es_inited_plannodes)
{
- PlanState *subplanstate = (PlanState *) lfirst(l);
+ PlanState *planstate = (PlanState *) lfirst(l);
- ExecEndNode(subplanstate);
+ ExecEndNode(planstate);
}
/* throw away the per-estate tuple table, some node may have used it */
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index cc2b8ccab7..42df7b6428 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1248,8 +1248,17 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(¶mspace);
- /* Create a QueryDesc for the query. */
+ /*
+ * Create a QueryDesc for the query.  Note that no CachedPlan is available
+ * here even if the leader may have gotten the plan tree from one.  That's
+ * fine though, because the leader would have taken all the locks needed
+ * for the plan tree that we have here to be fully valid.  That remains
+ * true even though we will take our own copies of those locks in
+ * ExecGetRangeTableRelation(), because none of them can be a lock that
+ * the leader doesn't already hold.
+ */
return CreateQueryDesc(pstmt,
+ NULL,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
@@ -1431,6 +1440,7 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Start up the executor */
queryDesc->plannedstmt->jitFlags = fpes->jit_flags;
ExecutorStart(queryDesc, fpes->eflags);
+ Assert(queryDesc->plan_valid);
/* Special executor initialization steps for parallel workers */
queryDesc->planstate->state->es_query_dsa = area;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index eb8a87fd63..cf73d28baa 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -513,6 +513,13 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
oldcxt = MemoryContextSwitchTo(proute->memcxt);
+ /*
+ * Note that while we normally check ExecPlanStillValid(estate) after each
+ * lock taken during execution initialization, it is fine not to do so for
+ * the partitions opened here for tuple routing.  Locks taken here can't
+ * possibly invalidate the plan, given that the plan doesn't contain any
+ * info about these partitions.
+ */
partrel = table_open(partOid, RowExclusiveLock);
leaf_part_rri = makeNode(ResultRelInfo);
@@ -1111,6 +1118,9 @@ ExecInitPartitionDispatchInfo(EState *estate,
* Only sub-partitioned tables need to be locked here. The root
* partitioned table will already have been locked as it's referenced in
* the query's rtable.
+ *
+ * See the comment in ExecInitPartitionInfo() about taking locks and
+ * not checking ExecPlanStillValid(estate) here.
*/
if (partoid != RelationGetRelid(proute->partition_root))
rel = table_open(partoid, RowExclusiveLock);
@@ -1801,6 +1811,8 @@ ExecInitPartitionPruning(PlanState *planstate,
/* Create the working data structure for pruning */
prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Perform an initial partition prune pass, if required.
@@ -1927,6 +1939,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* duration of this executor run.
*/
partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
partkey = RelationGetPartitionKey(partrel);
partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
partrel);
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 4d288bc8d4..f3bb1d4591 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -135,7 +135,17 @@ static bool ExecShutdownNode_walker(PlanState *node, void *context);
* 'estate' is the shared execution state for the plan tree
* 'eflags' is a bitwise OR of flag bits described in executor.h
*
- * Returns a PlanState node corresponding to the given Plan node.
+ * Returns a PlanState node corresponding to the given Plan node, or NULL.
+ *
+ * NULL is returned either if the input node is NULL or if the plan tree
+ * that the node is part of is found to have been invalidated while taking
+ * a lock on the relation mentioned in the node or in one of its child
+ * nodes.  The latter case can arise only if the plan tree contains
+ * inheritance/partition child tables and comes from a CachedPlan.
+ *
+ * Also, every non-NULL PlanState node is added to
+ * estate->es_inited_plannodes, which ExecEndPlan() iterates over to close
+ * each one using ExecEndNode().
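+ *
+ * Callers that initialize child nodes are therefore expected to check
+ * ExecPlanStillValid(estate) after each ExecInitNode() call and bail out
+ * with NULL themselves, per the pattern used throughout the ExecInit*
+ * routines:
+ *
+ *		outerPlanState(state) = ExecInitNode(outerPlan(node), estate, eflags);
+ *		if (!ExecPlanStillValid(estate))
+ *			return NULL;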
* ------------------------------------------------------------------------
*/
PlanState *
@@ -388,6 +398,13 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
break;
}
+ if (!ExecPlanStillValid(estate))
+ {
+ Assert(result == NULL);
+ return NULL;
+ }
+
+ Assert(result != NULL);
ExecSetExecProcNode(result, result->ExecProcNode);
/*
@@ -411,6 +428,13 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
result->instrument = InstrAlloc(1, estate->es_instrument,
result->async_capable);
+ /*
+ * Remember valid PlanState nodes in the EState, so that ExecEndPlan()
+ * can close them.
+ */
+ estate->es_inited_plannodes = lappend(estate->es_inited_plannodes,
+ result);
+
return result;
}
@@ -545,29 +569,21 @@ MultiExecProcNode(PlanState *node)
/* ----------------------------------------------------------------
* ExecEndNode
*
- * Recursively cleans up all the nodes in the plan rooted
- * at 'node'.
+ * Cleans up node
*
- * After this operation, the query plan will not be able to be
- * processed any further. This should be called only after
+ * Child nodes, if any, will already have been closed by the caller
+ * (ExecEndPlan() closes nodes in initialization order, children before
+ * parents), so the ExecEnd* routine for a given node type is only
+ * responsible for cleaning up resources local to that node.
+ *
+ * After this operation, the query plan containing this node will not be
+ * able to be processed any further. This should be called only after
* the query plan has been fully executed.
* ----------------------------------------------------------------
*/
void
ExecEndNode(PlanState *node)
{
- /*
- * do nothing when we get to the end of a leaf on tree.
- */
- if (node == NULL)
- return;
-
- /*
- * Make sure there's enough stack available. Need to check here, in
- * addition to ExecProcNode() (via ExecProcNodeFirst()), because it's not
- * guaranteed that ExecProcNode() is reached for all nodes.
- */
- check_stack_depth();
+ Assert(node != NULL);
if (node->chgParam != NULL)
{
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index c06b228858..af92d2b3c3 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -804,7 +804,25 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rte->rtekind == RTE_RELATION);
- if (!IsParallelWorker())
+ if (IsParallelWorker() ||
+ (estate->es_cachedplan != NULL && !rte->inFromCl))
+ {
+ /*
+ * Take a lock if we are a parallel worker or if this is a child
+ * table referenced in a cached plan.
+ *
+ * Parallel workers need to have their own local lock on the
+ * relation. This ensures sane behavior in case the parent process
+ * exits before we do.
+ *
+ * When executing a cached plan, child tables must be locked
+ * here, because plancache.c (GetCachedPlan()) would only have
+ * locked tables mentioned in the query, that is, tables whose
+ * RTEs' inFromCl is true.
+ */
+ rel = table_open(rte->relid, rte->rellockmode);
+ }
+ else
{
/*
* In a normal query, we should already have the appropriate lock,
@@ -817,15 +835,6 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rte->rellockmode == AccessShareLock ||
CheckRelationLockedByMe(rel, rte->rellockmode, false));
}
- else
- {
- /*
- * If we are a parallel worker, we need to obtain our own local
- * lock on the relation. This ensures sane behavior in case the
- * parent process exits before we do.
- */
- rel = table_open(rte->relid, rte->rellockmode);
- }
estate->es_relations[rti - 1] = rel;
}
@@ -833,6 +842,38 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
return rel;
}
+/*
+ * ExecLockAppendNonLeafRelations
+ * Lock non-leaf relations whose children are scanned by a given
+ * Append/MergeAppend node
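+ *
+ * Currently called from ExecInitAppend() and ExecInitMergeAppend() when
+ * executing a cached plan; see the comments there for why this must be
+ * done before initial pruning.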
+ */
+void
+ExecLockAppendNonLeafRelations(EState *estate, List *allpartrelids)
+{
+ ListCell *l;
+
+ /* This should get called only when executing cached plans. */
+ Assert(estate->es_cachedplan != NULL);
+ foreach(l, allpartrelids)
+ {
+ Bitmapset *partrelids = lfirst_node(Bitmapset, l);
+ int i;
+
+ /*
+ * Note that we don't lock the first member (i = 0) of each bitmapset,
+ * because it stands for the root parent mentioned in the query, which
+ * should always have been locked before entering the executor.
+ */
+ i = 0;
+ while ((i = bms_next_member(partrelids, i)) > 0)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(i, estate);
+
+ LockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
+}
+
/*
* ExecInitResultRelation
* Open relation given by the passed-in RT index and fill its
@@ -848,6 +889,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
Relation resultRelationDesc;
resultRelationDesc = ExecGetRangeTableRelation(estate, rti);
+ if (!ExecPlanStillValid(estate))
+ return;
InitResultRelInfo(resultRelInfo,
resultRelationDesc,
rti,
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index f55424eb5a..c88f72bc4e 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -838,6 +838,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest = None_Receiver;
es->qd = CreateQueryDesc(es->stmt,
+ NULL, /* fmgr_sql() doesn't use CachedPlans */
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -863,6 +864,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
eflags = 0; /* default run-to-completion flags */
ExecutorStart(es->qd, eflags);
+ Assert(es->qd->plan_valid);
}
es->status = F_EXEC_RUN;
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 468db94fe5..54f742820b 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -3304,6 +3304,8 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
eflags &= ~EXEC_FLAG_REWIND;
outerPlan = outerPlan(node);
outerPlanState(aggstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* initialize source tuple type.
@@ -4304,7 +4306,6 @@ GetAggInitVal(Datum textInitVal, Oid transtype)
void
ExecEndAgg(AggState *node)
{
- PlanState *outerPlan;
int transno;
int numGroupingSets = Max(node->maxsets, 1);
int setno;
@@ -4366,9 +4367,6 @@ ExecEndAgg(AggState *node)
/* clean up tuple table */
ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
}
void
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 609df6b9e6..a6dadb7d07 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -133,6 +133,27 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_syncdone = false;
appendstate->as_begun = false;
+ /*
+ * Must take locks on child tables if running a cached plan, because
+ * GetCachedPlan() would've only locked the root parent named in the
+ * query.
+ *
+ * First lock non-leaf partitions, before doing any pruning.  Even when
+ * no pruning is to be done, non-leaf partitions must still be locked
+ * explicitly like this, because they're not referenced elsewhere in
+ * the plan tree.  XXX - OTOH, non-leaf partitions mentioned in
+ * part_prune_info, if any, would be opened by ExecInitPartitionPruning()
+ * via ExecGetRangeTableRelation(), which locks child tables, so those
+ * would be locked redundantly in this case.
+ */
+ if (estate->es_cachedplan)
+ {
+ ExecLockAppendNonLeafRelations(estate, node->allpartrelids);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
+ }
+
/* If run-time partition pruning is enabled, then set that up now */
if (node->part_prune_info != NULL)
{
@@ -147,6 +168,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
list_length(node->appendplans),
node->part_prune_info,
&validsubplans);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
@@ -221,6 +244,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
firstvalid = j;
appendplanstates[j++] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
}
appendstate->as_first_partial_plan = firstvalid;
@@ -376,30 +401,15 @@ ExecAppend(PlanState *pstate)
/* ----------------------------------------------------------------
* ExecEndAppend
- *
- * Shuts down the subscans of the append node.
- *
- * Returns nothing of interest.
* ----------------------------------------------------------------
*/
void
ExecEndAppend(AppendState *node)
{
- PlanState **appendplans;
- int nplans;
- int i;
-
- /*
- * get information from the node
- */
- appendplans = node->appendplans;
- nplans = node->as_nplans;
-
- /*
- * shut down each of the subscans
- */
- for (i = 0; i < nplans; i++)
- ExecEndNode(appendplans[i]);
+ /*
+ * Nothing to do here; the subscans of the Append node are cleaned up by
+ * ExecEndPlan().
+ */
}
void
diff --git a/src/backend/executor/nodeBitmapAnd.c b/src/backend/executor/nodeBitmapAnd.c
index 4c5eb2b23b..187aea4bb8 100644
--- a/src/backend/executor/nodeBitmapAnd.c
+++ b/src/backend/executor/nodeBitmapAnd.c
@@ -88,8 +88,9 @@ ExecInitBitmapAnd(BitmapAnd *node, EState *estate, int eflags)
foreach(l, node->bitmapplans)
{
initNode = (Plan *) lfirst(l);
- bitmapplanstates[i] = ExecInitNode(initNode, estate, eflags);
- i++;
+ bitmapplanstates[i++] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
}
/*
@@ -168,33 +169,15 @@ MultiExecBitmapAnd(BitmapAndState *node)
/* ----------------------------------------------------------------
* ExecEndBitmapAnd
- *
- * Shuts down the subscans of the BitmapAnd node.
- *
- * Returns nothing of interest.
* ----------------------------------------------------------------
*/
void
ExecEndBitmapAnd(BitmapAndState *node)
{
- PlanState **bitmapplans;
- int nplans;
- int i;
-
- /*
- * get information from the node
- */
- bitmapplans = node->bitmapplans;
- nplans = node->nplans;
-
- /*
- * shut down each of the subscans (that we've initialized)
- */
- for (i = 0; i < nplans; i++)
- {
- if (bitmapplans[i])
- ExecEndNode(bitmapplans[i]);
- }
+ /*
+ * Nothing to do here; any subscans that were initialized are cleaned up
+ * by ExecEndPlan().
+ */
}
void
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index f35df0b8bf..ee1008519b 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -667,11 +667,6 @@ ExecEndBitmapHeapScan(BitmapHeapScanState *node)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
ExecClearTuple(node->ss.ss_ScanTupleSlot);
- /*
- * close down subplans
- */
- ExecEndNode(outerPlanState(node));
-
/*
* release bitmaps and buffers if any
*/
@@ -763,11 +758,15 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* initialize child nodes
*/
outerPlanState(scanstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* get the scan type from the relation descriptor.
diff --git a/src/backend/executor/nodeBitmapIndexscan.c b/src/backend/executor/nodeBitmapIndexscan.c
index 83ec9ede89..99015812a1 100644
--- a/src/backend/executor/nodeBitmapIndexscan.c
+++ b/src/backend/executor/nodeBitmapIndexscan.c
@@ -211,6 +211,7 @@ BitmapIndexScanState *
ExecInitBitmapIndexScan(BitmapIndexScan *node, EState *estate, int eflags)
{
BitmapIndexScanState *indexstate;
+ Relation indexRelation;
LOCKMODE lockmode;
/* check for unsupported flags */
@@ -262,7 +263,13 @@ ExecInitBitmapIndexScan(BitmapIndexScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
- indexstate->biss_RelationDesc = index_open(node->indexid, lockmode);
+ indexRelation = index_open(node->indexid, lockmode);
+ if (!ExecPlanStillValid(estate))
+ {
+ index_close(indexRelation, lockmode);
+ return NULL;
+ }
+ indexstate->biss_RelationDesc = indexRelation;
/*
* Initialize index-specific scan state
diff --git a/src/backend/executor/nodeBitmapOr.c b/src/backend/executor/nodeBitmapOr.c
index 0bf8af9652..3f51918fe1 100644
--- a/src/backend/executor/nodeBitmapOr.c
+++ b/src/backend/executor/nodeBitmapOr.c
@@ -89,8 +89,9 @@ ExecInitBitmapOr(BitmapOr *node, EState *estate, int eflags)
foreach(l, node->bitmapplans)
{
initNode = (Plan *) lfirst(l);
- bitmapplanstates[i] = ExecInitNode(initNode, estate, eflags);
- i++;
+ bitmapplanstates[i++] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
}
/*
@@ -186,33 +187,15 @@ MultiExecBitmapOr(BitmapOrState *node)
/* ----------------------------------------------------------------
* ExecEndBitmapOr
- *
- * Shuts down the subscans of the BitmapOr node.
- *
- * Returns nothing of interest.
* ----------------------------------------------------------------
*/
void
ExecEndBitmapOr(BitmapOrState *node)
{
- PlanState **bitmapplans;
- int nplans;
- int i;
-
- /*
- * get information from the node
- */
- bitmapplans = node->bitmapplans;
- nplans = node->nplans;
-
- /*
- * shut down each of the subscans (that we've initialized)
- */
- for (i = 0; i < nplans; i++)
- {
- if (bitmapplans[i])
- ExecEndNode(bitmapplans[i]);
- }
+ /*
+ * Nothing to do here; any subscans that were initialized are cleaned up
+ * by ExecEndPlan().
+ */
}
void
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
index bd42c65b29..91239cc500 100644
--- a/src/backend/executor/nodeCustom.c
+++ b/src/backend/executor/nodeCustom.c
@@ -61,6 +61,8 @@ ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
if (scanrelid > 0)
{
scan_rel = ExecOpenScanRelation(estate, scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
css->ss.ss_currentRelation = scan_rel;
}
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index c2139acca0..207165f44f 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -173,6 +173,8 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
if (scanrelid > 0)
{
currentRelation = ExecOpenScanRelation(estate, scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
scanstate->ss.ss_currentRelation = currentRelation;
fdwroutine = GetFdwRoutineForRelation(currentRelation, true);
}
@@ -264,6 +266,8 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
if (outerPlan(node))
outerPlanState(scanstate) =
ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Tell the FDW to initialize the scan.
@@ -309,10 +313,6 @@ ExecEndForeignScan(ForeignScanState *node)
else
node->fdwroutine->EndForeignScan(node);
- /* Shut down any outer plan. */
- if (outerPlanState(node))
- ExecEndNode(outerPlanState(node));
-
/* Free the exprcontext */
ExecFreeExprContext(&node->ss.ps);
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 307fc10eea..400c8b42ed 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -89,6 +89,9 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
*/
outerNode = outerPlan(node);
outerPlanState(gatherstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
+
tupDesc = ExecGetResultType(outerPlanState(gatherstate));
/*
@@ -248,7 +251,6 @@ ExecGather(PlanState *pstate)
void
ExecEndGather(GatherState *node)
{
- ExecEndNode(outerPlanState(node)); /* let children clean up first */
ExecShutdownGather(node);
ExecFreeExprContext(&node->ps);
if (node->ps.ps_ResultTupleSlot)
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 9d5e1a46e9..9077c4bc55 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -108,6 +108,8 @@ ExecInitGatherMerge(GatherMerge *node, EState *estate, int eflags)
*/
outerNode = outerPlan(node);
outerPlanState(gm_state) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Leader may access ExecProcNode result directly (if
@@ -288,7 +290,6 @@ ExecGatherMerge(PlanState *pstate)
void
ExecEndGatherMerge(GatherMergeState *node)
{
- ExecEndNode(outerPlanState(node)); /* let children clean up first */
ExecShutdownGatherMerge(node);
ExecFreeExprContext(&node->ps);
if (node->ps.ps_ResultTupleSlot)
diff --git a/src/backend/executor/nodeGroup.c b/src/backend/executor/nodeGroup.c
index 25a1618952..976e739ab7 100644
--- a/src/backend/executor/nodeGroup.c
+++ b/src/backend/executor/nodeGroup.c
@@ -185,6 +185,8 @@ ExecInitGroup(Group *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(grpstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize scan slot and type.
@@ -226,15 +228,10 @@ ExecInitGroup(Group *node, EState *estate, int eflags)
void
ExecEndGroup(GroupState *node)
{
- PlanState *outerPlan;
-
ExecFreeExprContext(&node->ss.ps);
/* clean up tuple table */
ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
}
void
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 8b5c35b82b..fc7a6b2ccc 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -386,6 +386,8 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(hashstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* initialize our result slot and type. No need to build projection
@@ -413,18 +415,10 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
void
ExecEndHash(HashState *node)
{
- PlanState *outerPlan;
-
/*
* free exprcontext
*/
ExecFreeExprContext(&node->ps);
-
- /*
- * shut down the subplan
- */
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
}
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 980746128b..4c4b39ce2d 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -752,8 +752,12 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
hashNode = (Hash *) innerPlan(node);
outerPlanState(hjstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
outerDesc = ExecGetResultType(outerPlanState(hjstate));
innerPlanState(hjstate) = ExecInitNode((Plan *) hashNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
innerDesc = ExecGetResultType(innerPlanState(hjstate));
/*
@@ -878,12 +882,6 @@ ExecEndHashJoin(HashJoinState *node)
ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
ExecClearTuple(node->hj_OuterTupleSlot);
ExecClearTuple(node->hj_HashTupleSlot);
-
- /*
- * clean up subtrees
- */
- ExecEndNode(outerPlanState(node));
- ExecEndNode(innerPlanState(node));
}
/*
diff --git a/src/backend/executor/nodeIncrementalSort.c b/src/backend/executor/nodeIncrementalSort.c
index 7683e3341c..5b11afeb96 100644
--- a/src/backend/executor/nodeIncrementalSort.c
+++ b/src/backend/executor/nodeIncrementalSort.c
@@ -1041,6 +1041,8 @@ ExecInitIncrementalSort(IncrementalSort *node, EState *estate, int eflags)
* nodes may be able to do something more useful.
*/
outerPlanState(incrsortstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize scan slot and type.
@@ -1101,11 +1103,6 @@ ExecEndIncrementalSort(IncrementalSortState *node)
node->prefixsort_state = NULL;
}
- /*
- * Shut down the subplan.
- */
- ExecEndNode(outerPlanState(node));
-
SO_printf("ExecEndIncrementalSort: sort node shutdown\n");
}
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 0b43a9b969..ea8bef4b97 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -490,6 +490,7 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
{
IndexOnlyScanState *indexstate;
Relation currentRelation;
+ Relation indexRelation;
LOCKMODE lockmode;
TupleDesc tupDesc;
@@ -512,6 +513,8 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
indexstate->ss.ss_currentRelation = currentRelation;
indexstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
@@ -564,7 +567,13 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
- indexstate->ioss_RelationDesc = index_open(node->indexid, lockmode);
+ indexRelation = index_open(node->indexid, lockmode);
+ if (!ExecPlanStillValid(estate))
+ {
+ index_close(indexRelation, lockmode);
+ return NULL;
+ }
+ indexstate->ioss_RelationDesc = indexRelation;
/*
* Initialize index-specific scan state
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 4540c7781d..956e9e5543 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -904,6 +904,7 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
{
IndexScanState *indexstate;
Relation currentRelation;
+ Relation indexRelation;
LOCKMODE lockmode;
/*
@@ -925,6 +926,8 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
indexstate->ss.ss_currentRelation = currentRelation;
indexstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
@@ -969,7 +972,13 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
- indexstate->iss_RelationDesc = index_open(node->indexid, lockmode);
+ indexRelation = index_open(node->indexid, lockmode);
+ if (!ExecPlanStillValid(estate))
+ {
+ index_close(indexRelation, lockmode);
+ return NULL;
+ }
+ indexstate->iss_RelationDesc = indexRelation;
/*
* Initialize index-specific scan state
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index 425fbfc405..1cc884bc65 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -476,6 +476,8 @@ ExecInitLimit(Limit *node, EState *estate, int eflags)
*/
outerPlan = outerPlan(node);
outerPlanState(limitstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* initialize child expressions
@@ -535,7 +537,6 @@ void
ExecEndLimit(LimitState *node)
{
ExecFreeExprContext(&node->ps);
- ExecEndNode(outerPlanState(node));
}
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index e459971d32..77731c0c8c 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -322,6 +322,8 @@ ExecInitLockRows(LockRows *node, EState *estate, int eflags)
* then initialize outer plan
*/
outerPlanState(lrstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/* node returns unmodified slots from the outer plan */
lrstate->ps.resultopsset = true;
@@ -386,7 +388,6 @@ ExecEndLockRows(LockRowsState *node)
{
/* We may have shut down EPQ already, but no harm in another call */
EvalPlanQualEnd(&node->lr_epqstate);
- ExecEndNode(outerPlanState(node));
}
diff --git a/src/backend/executor/nodeMaterial.c b/src/backend/executor/nodeMaterial.c
index 09632678b0..a38b9805a5 100644
--- a/src/backend/executor/nodeMaterial.c
+++ b/src/backend/executor/nodeMaterial.c
@@ -214,6 +214,8 @@ ExecInitMaterial(Material *node, EState *estate, int eflags)
outerPlan = outerPlan(node);
outerPlanState(matstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize result type and slot. No need to initialize projection info
@@ -250,11 +252,6 @@ ExecEndMaterial(MaterialState *node)
if (node->tuplestorestate != NULL)
tuplestore_end(node->tuplestorestate);
node->tuplestorestate = NULL;
-
- /*
- * shut down the subplan
- */
- ExecEndNode(outerPlanState(node));
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeMemoize.c b/src/backend/executor/nodeMemoize.c
index 4f04269e26..a8997ba7da 100644
--- a/src/backend/executor/nodeMemoize.c
+++ b/src/backend/executor/nodeMemoize.c
@@ -938,6 +938,8 @@ ExecInitMemoize(Memoize *node, EState *estate, int eflags)
outerNode = outerPlan(node);
outerPlanState(mstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize return slot and type. No need to initialize projection info
@@ -1099,11 +1101,6 @@ ExecEndMemoize(MemoizeState *node)
* free exprcontext
*/
ExecFreeExprContext(&node->ss.ps);
-
- /*
- * shut down the subplan
- */
- ExecEndNode(outerPlanState(node));
}
void
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 21b5726e6e..8718f20825 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -81,6 +81,27 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.state = estate;
mergestate->ps.ExecProcNode = ExecMergeAppend;
+ /*
+ * Must take locks on child tables if running a cached plan, because
+ * GetCachedPlan() would've only locked the root parent named in the
+ * query.
+ *
+ * First lock non-leaf partitions, before doing any pruning.  Even when
+ * no pruning is to be done, non-leaf partitions must still be locked
+ * explicitly like this, because they're not referenced elsewhere in
+ * the plan tree.  XXX - OTOH, non-leaf partitions mentioned in
+ * part_prune_info, if any, would be opened by ExecInitPartitionPruning()
+ * via ExecGetRangeTableRelation(), which locks child tables, so those
+ * would be locked redundantly in this case.
+ */
+ if (estate->es_cachedplan)
+ {
+ ExecLockAppendNonLeafRelations(estate, node->allpartrelids);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
+ }
+
/* If run-time partition pruning is enabled, then set that up now */
if (node->part_prune_info != NULL)
{
@@ -95,6 +116,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
list_length(node->mergeplans),
node->part_prune_info,
&validsubplans);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
@@ -151,6 +174,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
Plan *initNode = (Plan *) list_nth(node->mergeplans, i);
mergeplanstates[j++] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
}
mergestate->ps.ps_ProjInfo = NULL;
@@ -310,30 +335,14 @@ heap_compare_slots(Datum a, Datum b, void *arg)
/* ----------------------------------------------------------------
* ExecEndMergeAppend
- *
- * Shuts down the subscans of the MergeAppend node.
- *
- * Returns nothing of interest.
* ----------------------------------------------------------------
*/
void
ExecEndMergeAppend(MergeAppendState *node)
{
- PlanState **mergeplans;
- int nplans;
- int i;
-
- /*
- * get information from the node
- */
- mergeplans = node->mergeplans;
- nplans = node->ms_nplans;
-
- /*
- * shut down each of the subscans
- */
- for (i = 0; i < nplans; i++)
- ExecEndNode(mergeplans[i]);
+ /*
+ * Nothing to do here; the subscans are cleaned up by ExecEndPlan().
+ */
}
void
diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 00f96d045e..c6644c6816 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -1490,11 +1490,15 @@ ExecInitMergeJoin(MergeJoin *node, EState *estate, int eflags)
mergestate->mj_SkipMarkRestore = node->skip_mark_restore;
outerPlanState(mergestate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
outerDesc = ExecGetResultType(outerPlanState(mergestate));
innerPlanState(mergestate) = ExecInitNode(innerPlan(node), estate,
mergestate->mj_SkipMarkRestore ?
eflags :
(eflags | EXEC_FLAG_MARK));
+ if (!ExecPlanStillValid(estate))
+ return NULL;
innerDesc = ExecGetResultType(innerPlanState(mergestate));
/*
@@ -1654,12 +1658,6 @@ ExecEndMergeJoin(MergeJoinState *node)
ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
ExecClearTuple(node->mj_MarkedTupleSlot);
- /*
- * shut down the subplans
- */
- ExecEndNode(innerPlanState(node));
- ExecEndNode(outerPlanState(node));
-
MJ1_printf("ExecEndMergeJoin: %s\n",
"node processing ended");
}
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 2a5fec8d01..0c3aeb1154 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -3984,6 +3984,9 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
linitial_int(node->resultRelations));
}
+ if (!ExecPlanStillValid(estate))
+ return NULL;
+
/* set up epqstate with dummy subplan data for the moment */
EvalPlanQualInit(&mtstate->mt_epqstate, estate, NULL, NIL,
node->epqParam, node->resultRelations);
@@ -4011,6 +4014,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
if (resultRelInfo != mtstate->rootResultRelInfo)
{
ExecInitResultRelation(estate, resultRelInfo, resultRelation);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* For child result relations, store the root result relation
@@ -4038,6 +4043,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* Now we may initialize the subplan.
*/
outerPlanState(mtstate) = ExecInitNode(subplan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Do additional per-result-relation initialization.
@@ -4460,11 +4467,6 @@ ExecEndModifyTable(ModifyTableState *node)
* Terminate EPQ execution if active
*/
EvalPlanQualEnd(&node->mt_epqstate);
-
- /*
- * shut down subplan
- */
- ExecEndNode(outerPlanState(node));
}
void
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index b3d52e69ec..71a1f8101c 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -295,11 +295,15 @@ ExecInitNestLoop(NestLoop *node, EState *estate, int eflags)
* values.
*/
outerPlanState(nlstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
if (node->nestParams == NIL)
eflags |= EXEC_FLAG_REWIND;
else
eflags &= ~EXEC_FLAG_REWIND;
innerPlanState(nlstate) = ExecInitNode(innerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize result slot, type and projection.
@@ -374,12 +378,6 @@ ExecEndNestLoop(NestLoopState *node)
*/
ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
- /*
- * close down subplans
- */
- ExecEndNode(outerPlanState(node));
- ExecEndNode(innerPlanState(node));
-
NL1_printf("ExecEndNestLoop: %s\n",
"node processing ended");
}
diff --git a/src/backend/executor/nodeProjectSet.c b/src/backend/executor/nodeProjectSet.c
index f6ff3dc44c..abcbd7e765 100644
--- a/src/backend/executor/nodeProjectSet.c
+++ b/src/backend/executor/nodeProjectSet.c
@@ -247,6 +247,8 @@ ExecInitProjectSet(ProjectSet *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(state) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* we don't use inner plan
@@ -329,11 +331,6 @@ ExecEndProjectSet(ProjectSetState *node)
* clean out the tuple table
*/
ExecClearTuple(node->ps.ps_ResultTupleSlot);
-
- /*
- * shut down subplans
- */
- ExecEndNode(outerPlanState(node));
}
void
diff --git a/src/backend/executor/nodeRecursiveunion.c b/src/backend/executor/nodeRecursiveunion.c
index e781003934..84a706458a 100644
--- a/src/backend/executor/nodeRecursiveunion.c
+++ b/src/backend/executor/nodeRecursiveunion.c
@@ -244,7 +244,11 @@ ExecInitRecursiveUnion(RecursiveUnion *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(rustate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
innerPlanState(rustate) = ExecInitNode(innerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* If hashing, precompute fmgr lookup data for inner loop, and create the
@@ -280,12 +284,6 @@ ExecEndRecursiveUnion(RecursiveUnionState *node)
MemoryContextDelete(node->tempContext);
if (node->tableContext)
MemoryContextDelete(node->tableContext);
-
- /*
- * close down subplans
- */
- ExecEndNode(outerPlanState(node));
- ExecEndNode(innerPlanState(node));
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeResult.c b/src/backend/executor/nodeResult.c
index 4219712d30..330ca68d12 100644
--- a/src/backend/executor/nodeResult.c
+++ b/src/backend/executor/nodeResult.c
@@ -208,6 +208,8 @@ ExecInitResult(Result *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(resstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* we don't use inner plan
@@ -249,11 +251,6 @@ ExecEndResult(ResultState *node)
* clean out the tuple table
*/
ExecClearTuple(node->ps.ps_ResultTupleSlot);
-
- /*
- * shut down subplans
- */
- ExecEndNode(outerPlanState(node));
}
void
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index d7e22b1dbb..22357e7a0e 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -125,6 +125,8 @@ ExecInitSampleScan(SampleScan *node, EState *estate, int eflags)
ExecOpenScanRelation(estate,
node->scan.scanrelid,
eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/* we won't set up the HeapScanDesc till later */
scanstate->ss.ss_currentScanDesc = NULL;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 4da0f28f7b..b0b34cd14e 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -153,6 +153,8 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
ExecOpenScanRelation(estate,
node->scan.scanrelid,
eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/* and create slot with the appropriate rowtype */
ExecInitScanTupleSlot(estate, &scanstate->ss,
diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index 4bc2406b89..912cf7b37f 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -528,6 +528,8 @@ ExecInitSetOp(SetOp *node, EState *estate, int eflags)
if (node->strategy == SETOP_HASHED)
eflags &= ~EXEC_FLAG_REWIND;
outerPlanState(setopstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
outerDesc = ExecGetResultType(outerPlanState(setopstate));
/*
@@ -589,8 +591,6 @@ ExecEndSetOp(SetOpState *node)
if (node->tableContext)
MemoryContextDelete(node->tableContext);
ExecFreeExprContext(&node->ps);
-
- ExecEndNode(outerPlanState(node));
}
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index c6c72c6e67..1ba53373c2 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -263,6 +263,8 @@ ExecInitSort(Sort *node, EState *estate, int eflags)
eflags &= ~(EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK);
outerPlanState(sortstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize scan slot and type.
@@ -317,11 +319,6 @@ ExecEndSort(SortState *node)
tuplesort_end((Tuplesortstate *) node->tuplesortstate);
node->tuplesortstate = NULL;
- /*
- * shut down the subplan
- */
- ExecEndNode(outerPlanState(node));
-
SO1_printf("ExecEndSort: %s\n",
"sort node shutdown");
}
diff --git a/src/backend/executor/nodeSubqueryscan.c b/src/backend/executor/nodeSubqueryscan.c
index 42471bfc04..12014250ae 100644
--- a/src/backend/executor/nodeSubqueryscan.c
+++ b/src/backend/executor/nodeSubqueryscan.c
@@ -124,6 +124,8 @@ ExecInitSubqueryScan(SubqueryScan *node, EState *estate, int eflags)
* initialize subquery
*/
subquerystate->subplan = ExecInitNode(node->subplan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize scan slot and type (needed by ExecAssignScanProjectionInfo)
@@ -178,11 +180,6 @@ ExecEndSubqueryScan(SubqueryScanState *node)
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
- /*
- * close down subquery
- */
- ExecEndNode(node->subplan);
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 2124c55ef5..613b377c7c 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -386,6 +386,8 @@ ExecInitTidRangeScan(TidRangeScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
tidrangestate->ss.ss_currentRelation = currentRelation;
tidrangestate->ss.ss_currentScanDesc = NULL; /* no table scan here */
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index 862bd0330b..1b0a2d8083 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -529,6 +529,8 @@ ExecInitTidScan(TidScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
tidstate->ss.ss_currentRelation = currentRelation;
tidstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
diff --git a/src/backend/executor/nodeUnique.c b/src/backend/executor/nodeUnique.c
index 45035d74fa..bd71033622 100644
--- a/src/backend/executor/nodeUnique.c
+++ b/src/backend/executor/nodeUnique.c
@@ -136,6 +136,8 @@ ExecInitUnique(Unique *node, EState *estate, int eflags)
* then initialize outer plan
*/
outerPlanState(uniquestate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize result slot and type. Unique nodes do no projections, so
@@ -172,8 +174,6 @@ ExecEndUnique(UniqueState *node)
ExecClearTuple(node->ps.ps_ResultTupleSlot);
ExecFreeExprContext(&node->ps);
-
- ExecEndNode(outerPlanState(node));
}
diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 310ac23e3a..483f23da18 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -2458,6 +2458,8 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags)
*/
outerPlan = outerPlan(node);
outerPlanState(winstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* initialize source tuple type (which is also the tuple type that we'll
@@ -2681,7 +2683,6 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags)
void
ExecEndWindowAgg(WindowAggState *node)
{
- PlanState *outerPlan;
int i;
release_partition(node);
@@ -2713,9 +2714,6 @@ ExecEndWindowAgg(WindowAggState *node)
pfree(node->perfunc);
pfree(node->peragg);
-
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
}
/* -----------------
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 33975687b3..07b1f453e2 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -71,7 +71,7 @@ static int _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
static ParamListInfo _SPI_convert_params(int nargs, Oid *argtypes,
Datum *Values, const char *Nulls);
-static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount);
+static int _SPI_pquery(QueryDesc *queryDesc, uint64 tcount);
static void _SPI_error_callback(void *arg);
@@ -1623,6 +1623,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
_SPI_current->processed = 0;
_SPI_current->tuptable = NULL;
+replan:
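+
+ /*
+ * This is the retry point: if PortalStart() below finds that the cached
+ * plan has been invalidated during executor startup, the portal is
+ * dropped and control comes back here to build a new one with a fresh
+ * plan.
+ */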
/* Create the portal */
if (name == NULL || name[0] == '\0')
{
@@ -1766,7 +1767,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
}
/*
- * Start portal execution.
+ * Start portal execution.  If portal->plan_valid comes back false, the
+ * portal's cached plan was found to have been invalidated while
+ * initializing one of the plan trees contained in it, so the portal must
+ * be recreated with a new plan.
*/
PortalStart(portal, paramLI, 0, snapshot);
@@ -1775,6 +1779,12 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
/* Pop the error context stack */
error_context_stack = spierrcontext.previous;
+ if (!portal->plan_valid)
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
+
/* Pop the SPI stack */
_SPI_end_call(true);
@@ -2552,6 +2562,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
+replan:
cplan = GetCachedPlan(plansource, options->params,
plan_owner, _SPI_current->queryEnv);
@@ -2661,6 +2672,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
QueryDesc *qdesc;
Snapshot snap;
+ int eflags;
if (ActiveSnapshotSet())
snap = GetActiveSnapshot();
@@ -2668,14 +2680,32 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
snap = InvalidSnapshot;
qdesc = CreateQueryDesc(stmt,
+ cplan,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
options->params,
_SPI_current->queryEnv,
0);
- res = _SPI_pquery(qdesc, fire_triggers,
- canSetTag ? options->tcount : 0);
+
+ /* Select execution options */
+ if (fire_triggers)
+ eflags = 0; /* default run-to-completion flags */
+ else
+ eflags = EXEC_FLAG_SKIP_TRIGGERS;
+
+ ExecutorStart(qdesc, eflags);
+ if (!qdesc->plan_valid)
+ {
+ ExecutorFinish(qdesc);
+ ExecutorEnd(qdesc);
+ FreeQueryDesc(qdesc);
+ Assert(cplan);
+ ReleaseCachedPlan(cplan, plan_owner);
+ goto replan;
+ }
+
+ res = _SPI_pquery(qdesc, canSetTag ? options->tcount : 0);
FreeQueryDesc(qdesc);
}
else
@@ -2850,10 +2880,9 @@ _SPI_convert_params(int nargs, Oid *argtypes,
}
static int
-_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
+_SPI_pquery(QueryDesc *queryDesc, uint64 tcount)
{
int operation = queryDesc->operation;
- int eflags;
int res;
switch (operation)
@@ -2897,14 +2926,6 @@ _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
ResetUsage();
#endif
- /* Select execution options */
- if (fire_triggers)
- eflags = 0; /* default run-to-completion flags */
- else
- eflags = EXEC_FLAG_SKIP_TRIGGERS;
-
- ExecutorStart(queryDesc, eflags);
-
ExecutorRun(queryDesc, ForwardScanDirection, tcount, true);
_SPI_current->processed = queryDesc->estate->es_processed;
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index ee9b89a672..c807e9cdcc 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -27,6 +27,7 @@
#include "storage/procarray.h"
#include "storage/sinvaladt.h"
#include "utils/inval.h"
+#include "utils/lsyscache.h"
/*
@@ -364,6 +365,50 @@ CheckRelationLockedByMe(Relation relation, LOCKMODE lockmode, bool orstronger)
return false;
}
+/*
+ * CheckRelLockedByMe
+ *
+ * Returns true if current transaction holds a lock on the given relation of
+ * mode 'lockmode'. If 'orstronger' is true, a stronger lockmode is also OK.
+ * ("Stronger" is defined as "numerically higher", which is a bit
+ * semantically dubious but is OK for the purposes we use this for.)
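+ *
+ * Unlike CheckRelationLockedByMe(), this takes a relation OID rather than
+ * an open Relation, so it can be used where the relation isn't open,
+ * e.g. the Assert added in ExecCheckPermissions().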
+ */
+bool
+CheckRelLockedByMe(Oid relid, LOCKMODE lockmode, bool orstronger)
+{
+ Oid dbId = get_rel_relisshared(relid) ? InvalidOid : MyDatabaseId;
+ LOCKTAG tag;
+
+ SET_LOCKTAG_RELATION(tag, dbId, relid);
+
+ if (LockHeldByMe(&tag, lockmode))
+ return true;
+
+ if (orstronger)
+ {
+ LOCKMODE slockmode;
+
+ for (slockmode = lockmode + 1;
+ slockmode <= MaxLockMode;
+ slockmode++)
+ {
+ if (LockHeldByMe(&tag, slockmode))
+ {
+#ifdef NOT_USED
+ /* Sometimes this might be useful for debugging purposes */
+ elog(WARNING, "lock mode %s substituted for %s on relation %s",
+ GetLockmodeName(tag.locktag_lockmethodid, slockmode),
+ GetLockmodeName(tag.locktag_lockmethodid, lockmode),
+ get_rel_name(relid));
+#endif
+ return true;
+ }
+ }
+ }
+
+ return false;
+}
+
/*
* LockHasWaitersRelation
*
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 36cc99ec9c..160aef92f8 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1233,6 +1233,7 @@ exec_simple_query(const char *query_string)
* Start the portal. No parameters here.
*/
PortalStart(portal, NULL, 0, InvalidSnapshot);
+ Assert(portal->plan_valid);
/*
* Select the appropriate output format: text unless we are doing a
@@ -1737,6 +1738,7 @@ exec_bind_message(StringInfo input_message)
"commands ignored until end of transaction block"),
errdetail_abort()));
+replan:
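+ /*
+ * Retry point: control comes back here if PortalStart() below finds the
+ * cached plan to have been invalidated during executor startup.
+ */
+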
/*
* Create the portal. Allow silent replacement of an existing portal only
* if the unnamed portal is specified.
@@ -2028,10 +2030,19 @@ exec_bind_message(StringInfo input_message)
PopActiveSnapshot();
/*
- * And we're ready to start portal execution.
+ * Start portal execution.  If portal->plan_valid comes back false, the
+ * portal's cached plan was found to have been invalidated while
+ * initializing one of the plan trees contained in it, so the portal must
+ * be recreated with a new plan.
*/
PortalStart(portal, params, 0, InvalidSnapshot);
+ if (!portal->plan_valid)
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
+
/*
* Apply the result format requests to the portal.
*/
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5565f200c3..dab971ab0f 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -19,6 +19,7 @@
#include "access/xact.h"
#include "commands/prepare.h"
+#include "executor/execdesc.h"
#include "executor/tstoreReceiver.h"
#include "miscadmin.h"
#include "pg_trace.h"
@@ -35,12 +36,6 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
- const char *sourceText,
- ParamListInfo params,
- QueryEnvironment *queryEnv,
- DestReceiver *dest,
- QueryCompletion *qc);
static void FillPortalStore(Portal portal, bool isTopLevel);
static uint64 RunFromStore(Portal portal, ScanDirection direction, uint64 count,
DestReceiver *dest);
@@ -65,6 +60,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +73,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->cplan = cplan; /* CachedPlan, if plan is from one */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -116,86 +113,6 @@ FreeQueryDesc(QueryDesc *qdesc)
}
-/*
- * ProcessQuery
- * Execute a single plannable query within a PORTAL_MULTI_QUERY,
- * PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
- *
- * plan: the plan tree for the query
- * sourceText: the source text of the query
- * params: any parameters needed
- * dest: where to send results
- * qc: where to store the command completion status data.
- *
- * qc may be NULL if caller doesn't want a status string.
- *
- * Must be called in a memory context that will be reset or deleted on
- * error; otherwise the executor's memory usage will be leaked.
- */
-static void
-ProcessQuery(PlannedStmt *plan,
- const char *sourceText,
- ParamListInfo params,
- QueryEnvironment *queryEnv,
- DestReceiver *dest,
- QueryCompletion *qc)
-{
- QueryDesc *queryDesc;
-
- /*
- * Create the QueryDesc object
- */
- queryDesc = CreateQueryDesc(plan, sourceText,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, 0);
-
- /*
- * Call ExecutorStart to prepare the plan for execution
- */
- ExecutorStart(queryDesc, 0);
-
- /*
- * Run the plan to completion.
- */
- ExecutorRun(queryDesc, ForwardScanDirection, 0, true);
-
- /*
- * Build command completion status data, if caller wants one.
- */
- if (qc)
- {
- switch (queryDesc->operation)
- {
- case CMD_SELECT:
- SetQueryCompletion(qc, CMDTAG_SELECT, queryDesc->estate->es_processed);
- break;
- case CMD_INSERT:
- SetQueryCompletion(qc, CMDTAG_INSERT, queryDesc->estate->es_processed);
- break;
- case CMD_UPDATE:
- SetQueryCompletion(qc, CMDTAG_UPDATE, queryDesc->estate->es_processed);
- break;
- case CMD_DELETE:
- SetQueryCompletion(qc, CMDTAG_DELETE, queryDesc->estate->es_processed);
- break;
- case CMD_MERGE:
- SetQueryCompletion(qc, CMDTAG_MERGE, queryDesc->estate->es_processed);
- break;
- default:
- SetQueryCompletion(qc, CMDTAG_UNKNOWN, queryDesc->estate->es_processed);
- break;
- }
- }
-
- /*
- * Now, we close down all the scans and free allocated resources.
- */
- ExecutorFinish(queryDesc);
- ExecutorEnd(queryDesc);
-
- FreeQueryDesc(queryDesc);
-}
-
/*
* ChoosePortalStrategy
* Select portal execution strategy given the intended statement list.
@@ -427,7 +344,8 @@ FetchStatementTargetList(Node *stmt)
* to be used for cursors).
*
* On return, portal is ready to accept PortalRun() calls, and the result
- * tupdesc (if any) is known.
+ * tupdesc (if any) is known, unless portal->plan_valid is set to false, in
+ * which case the caller must retry after generating a new CachedPlan.
*/
void
PortalStart(Portal portal, ParamListInfo params,
@@ -435,10 +353,9 @@ PortalStart(Portal portal, ParamListInfo params,
{
Portal saveActivePortal;
ResourceOwner saveResourceOwner;
- MemoryContext savePortalContext;
MemoryContext oldContext;
QueryDesc *queryDesc;
- int myeflags;
+ int myeflags = 0;
Assert(PortalIsValid(portal));
Assert(portal->status == PORTAL_DEFINED);
@@ -448,15 +365,13 @@ PortalStart(Portal portal, ParamListInfo params,
*/
saveActivePortal = ActivePortal;
saveResourceOwner = CurrentResourceOwner;
- savePortalContext = PortalContext;
PG_TRY();
{
ActivePortal = portal;
if (portal->resowner)
CurrentResourceOwner = portal->resowner;
- PortalContext = portal->portalContext;
- oldContext = MemoryContextSwitchTo(PortalContext);
+ oldContext = MemoryContextSwitchTo(portal->queryContext);
/* Must remember portal param list, if any */
portal->portalParams = params;
@@ -472,6 +387,8 @@ PortalStart(Portal portal, ParamListInfo params,
switch (portal->strategy)
{
case PORTAL_ONE_SELECT:
+ case PORTAL_ONE_RETURNING:
+ case PORTAL_ONE_MOD_WITH:
/* Must set snapshot before starting executor. */
if (snapshot)
@@ -493,6 +410,7 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->cplan,
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -501,30 +419,52 @@ PortalStart(Portal portal, ParamListInfo params,
portal->queryEnv,
0);
+ /* Remember for PortalRunMulti(). */
+ if (portal->strategy == PORTAL_ONE_RETURNING ||
+ portal->strategy == PORTAL_ONE_MOD_WITH)
+ portal->qdescs = list_make1(queryDesc);
+
/*
* If it's a scrollable cursor, executor needs to support
* REWIND and backwards scan, as well as whatever the caller
* might've asked for.
*/
- if (portal->cursorOptions & CURSOR_OPT_SCROLL)
+ if (portal->strategy == PORTAL_ONE_SELECT &&
+ (portal->cursorOptions & CURSOR_OPT_SCROLL))
myeflags = eflags | EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD;
else
myeflags = eflags;
/*
- * Call ExecutorStart to prepare the plan for execution
+ * Call ExecutorStart to prepare the plan for execution. A
+ * cached plan may get invalidated as we're doing that.
*/
ExecutorStart(queryDesc, myeflags);
+ if (!queryDesc->plan_valid)
+ {
+ Assert(queryDesc->cplan);
+ PortalQueryFinish(queryDesc);
+ PopActiveSnapshot();
+ portal->plan_valid = false;
+ goto early_exit;
+ }
/*
- * This tells PortalCleanup to shut down the executor
+ * This tells PortalCleanup to shut down the executor; that is not
+ * needed for queries handled by PortalRunMulti().
*/
- portal->queryDesc = queryDesc;
+ if (portal->strategy == PORTAL_ONE_SELECT)
+ portal->queryDesc = queryDesc;
/*
- * Remember tuple descriptor (computed by ExecutorStart)
+ * Remember tuple descriptor (computed by ExecutorStart),
+ * but make it independent of the QueryDesc for queries handled
+ * by PortalRunMulti().
*/
- portal->tupDesc = queryDesc->tupDesc;
+ if (portal->strategy != PORTAL_ONE_SELECT)
+ portal->tupDesc = CreateTupleDescCopy(queryDesc->tupDesc);
+ else
+ portal->tupDesc = queryDesc->tupDesc;
/*
* Reset cursor position data to "start of query"
@@ -532,33 +472,11 @@ PortalStart(Portal portal, ParamListInfo params,
portal->atStart = true;
portal->atEnd = false; /* allow fetches */
portal->portalPos = 0;
+ portal->plan_valid = true;
PopActiveSnapshot();
break;
- case PORTAL_ONE_RETURNING:
- case PORTAL_ONE_MOD_WITH:
-
- /*
- * We don't start the executor until we are told to run the
- * portal. We do need to set up the result tupdesc.
- */
- {
- PlannedStmt *pstmt;
-
- pstmt = PortalGetPrimaryStmt(portal);
- portal->tupDesc =
- ExecCleanTypeFromTL(pstmt->planTree->targetlist);
- }
-
- /*
- * Reset cursor position data to "start of query"
- */
- portal->atStart = true;
- portal->atEnd = false; /* allow fetches */
- portal->portalPos = 0;
- break;
-
case PORTAL_UTIL_SELECT:
/*
@@ -578,11 +496,87 @@ PortalStart(Portal portal, ParamListInfo params,
portal->atStart = true;
portal->atEnd = false; /* allow fetches */
portal->portalPos = 0;
+ portal->plan_valid = true;
break;
case PORTAL_MULTI_QUERY:
- /* Need do nothing now */
+ {
+ ListCell *lc;
+ bool first = true;
+
+ myeflags = eflags;
+ foreach(lc, portal->stmts)
+ {
+ PlannedStmt *plan = lfirst_node(PlannedStmt, lc);
+ bool is_utility = (plan->utilityStmt != NULL);
+
+ /*
+ * Push the snapshot to be used by the executor.
+ */
+ if (!is_utility)
+ {
+ /*
+ * Must copy the snapshot for all statements
+ * except the first, as we'll need to update its
+ * command ID.
+ */
+ if (!first)
+ PushCopiedSnapshot(GetTransactionSnapshot());
+ else
+ PushActiveSnapshot(GetTransactionSnapshot());
+ }
+
+ /*
+ * From the 2nd statement onwards, update the command
+ * ID and the snapshot to match.
+ */
+ if (!first)
+ {
+ CommandCounterIncrement();
+ UpdateActiveSnapshotCommandId();
+ }
+
+ first = false;
+
+ /*
+ * Create the QueryDesc object. DestReceiver will
+ * be set in PortalRunMulti().
+ */
+ queryDesc = CreateQueryDesc(plan, portal->cplan,
+ portal->sourceText,
+ !is_utility ?
+ GetActiveSnapshot() :
+ InvalidSnapshot,
+ InvalidSnapshot,
+ NULL,
+ params,
+ portal->queryEnv, 0);
+
+ /* Remember for PortalRunMulti() */
+ portal->qdescs = lappend(portal->qdescs, queryDesc);
+
+ if (is_utility)
+ continue;
+
+ /*
+ * Call ExecutorStart to prepare the plan for
+ * execution. A cached plan may get invalidated as
+ * we're doing that.
+ */
+ ExecutorStart(queryDesc, myeflags);
+ PopActiveSnapshot();
+ if (!queryDesc->plan_valid)
+ {
+ Assert(queryDesc->cplan);
+ PortalQueryFinish(queryDesc);
+ portal->plan_valid = false;
+ goto early_exit;
+ }
+ }
+ }
+
portal->tupDesc = NULL;
+ portal->plan_valid = true;
break;
}
}
@@ -594,19 +588,18 @@ PortalStart(Portal portal, ParamListInfo params,
/* Restore global vars and propagate error */
ActivePortal = saveActivePortal;
CurrentResourceOwner = saveResourceOwner;
- PortalContext = savePortalContext;
PG_RE_THROW();
}
PG_END_TRY();
+ portal->status = PORTAL_READY;
+
+early_exit:
MemoryContextSwitchTo(oldContext);
ActivePortal = saveActivePortal;
CurrentResourceOwner = saveResourceOwner;
- PortalContext = savePortalContext;
-
- portal->status = PORTAL_READY;
}
/*
@@ -1193,7 +1186,7 @@ PortalRunMulti(Portal portal,
QueryCompletion *qc)
{
bool active_snapshot_set = false;
- ListCell *stmtlist_item;
+ ListCell *qdesc_item;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1214,9 +1207,10 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
- foreach(stmtlist_item, portal->stmts)
+ foreach(qdesc_item, portal->qdescs)
{
- PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ QueryDesc *qdesc = (QueryDesc *) lfirst(qdesc_item);
+ PlannedStmt *pstmt = qdesc->plannedstmt;
/*
* If we got a cancel signal in prior command, quit
@@ -1233,33 +1227,26 @@ PortalRunMulti(Portal portal,
if (log_executor_stats)
ResetUsage();
- /*
- * Must always have a snapshot for plannable queries. First time
- * through, take a new snapshot; for subsequent queries in the
- * same portal, just update the snapshot's copy of the command
- * counter.
- */
+ /* Push the snapshot for plannable queries. */
if (!active_snapshot_set)
{
- Snapshot snapshot = GetTransactionSnapshot();
+ Snapshot snapshot = qdesc->snapshot;
- /* If told to, register the snapshot and save in portal */
+ /*
+ * If told to, register the snapshot and save in portal
+ *
+ * Note that the command ID of qdesc->snapshot for the 2nd query
+ * onwards would have been updated in PortalStart() to account
+ * for the CCI() done between queries, but it's OK that we don't
+ * likewise update holdSnapshot's command ID here.
+ */
if (setHoldSnapshot)
{
snapshot = RegisterSnapshot(snapshot);
portal->holdSnapshot = snapshot;
}
- /*
- * We can't have the holdSnapshot also be the active one,
- * because UpdateActiveSnapshotCommandId would complain. So
- * force an extra snapshot copy. Plain PushActiveSnapshot
- * would have copied the transaction snapshot anyway, so this
- * only adds a copy step when setHoldSnapshot is true. (It's
- * okay for the command ID of the active snapshot to diverge
- * from what holdSnapshot has.)
- */
- PushCopiedSnapshot(snapshot);
+ PushActiveSnapshot(snapshot);
/*
* As for PORTAL_ONE_SELECT portals, it does not seem
@@ -1268,26 +1255,39 @@ PortalRunMulti(Portal portal,
active_snapshot_set = true;
}
- else
- UpdateActiveSnapshotCommandId();
+ /*
+ * Run the plan to completion.
+ */
+ qdesc->dest = dest;
+ ExecutorRun(qdesc, ForwardScanDirection, 0, true);
+
+ /*
+ * Build command completion status data if needed.
+ */
if (pstmt->canSetTag)
{
- /* statement can set tag string */
- ProcessQuery(pstmt,
- portal->sourceText,
- portal->portalParams,
- portal->queryEnv,
- dest, qc);
- }
- else
- {
- /* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
- portal->sourceText,
- portal->portalParams,
- portal->queryEnv,
- altdest, NULL);
+ switch (qdesc->operation)
+ {
+ case CMD_SELECT:
+ SetQueryCompletion(qc, CMDTAG_SELECT, qdesc->estate->es_processed);
+ break;
+ case CMD_INSERT:
+ SetQueryCompletion(qc, CMDTAG_INSERT, qdesc->estate->es_processed);
+ break;
+ case CMD_UPDATE:
+ SetQueryCompletion(qc, CMDTAG_UPDATE, qdesc->estate->es_processed);
+ break;
+ case CMD_DELETE:
+ SetQueryCompletion(qc, CMDTAG_DELETE, qdesc->estate->es_processed);
+ break;
+ case CMD_MERGE:
+ SetQueryCompletion(qc, CMDTAG_MERGE, qdesc->estate->es_processed);
+ break;
+ default:
+ SetQueryCompletion(qc, CMDTAG_UNKNOWN, qdesc->estate->es_processed);
+ break;
+ }
}
if (log_executor_stats)
@@ -1342,12 +1342,12 @@ PortalRunMulti(Portal portal,
if (portal->stmts == NIL)
break;
- /*
- * Increment command counter between queries, but not after the last
- * one.
- */
- if (lnext(portal->stmts, stmtlist_item) != NULL)
- CommandCounterIncrement();
+ if (qdesc->estate)
+ {
+ ExecutorFinish(qdesc);
+ ExecutorEnd(qdesc);
+ }
+ FreeQueryDesc(qdesc);
}
/* Pop the snapshot if we pushed one. */
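
With ProcessQuery() removed, the lifecycle of each statement in a
multi-query portal is now split between PortalStart() and
PortalRunMulti().  Roughly (a simplified sketch of the two phases, not
the exact patch code):

    /* PortalStart(), per PlannedStmt in portal->stmts: */
    queryDesc = CreateQueryDesc(pstmt, portal->cplan, portal->sourceText,
                                GetActiveSnapshot(), InvalidSnapshot,
                                NULL,   /* dest is set later */
                                params, portal->queryEnv, 0);
    ExecutorStart(queryDesc, eflags);   /* may clear plan_valid */

    /* PortalRunMulti(), per surviving QueryDesc in portal->qdescs: */
    queryDesc->dest = dest;
    ExecutorRun(queryDesc, ForwardScanDirection, 0, true);
    ExecutorFinish(queryDesc);
    ExecutorEnd(queryDesc);
    FreeQueryDesc(queryDesc);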
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 60978f9415..de3fc756e2 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -2073,6 +2073,27 @@ get_rel_persistence(Oid relid)
return result;
}
+/*
+ * get_rel_relisshared
+ *
+ * Returns whether the given relation is shared
+ */
+bool
+get_rel_relisshared(Oid relid)
+{
+ HeapTuple tp;
+ Form_pg_class reltup;
+ bool result;
+
+ tp = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+ if (!HeapTupleIsValid(tp))
+ elog(ERROR, "cache lookup failed for relation %u", relid);
+ reltup = (Form_pg_class) GETSTRUCT(tp);
+ result = reltup->relisshared;
+ ReleaseSysCache(tp);
+
+ return result;
+}
/* ---------- TRANSFORM CACHE ---------- */
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 3d3f7a9bea..e6237d70b3 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -102,13 +102,13 @@ static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool GenericPlanIsValid(CachedPlan *cplan);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -790,8 +790,14 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
*
- * On a "true" return, we have acquired the locks needed to run the plan.
- * (We must do this for the "true" result to be race-condition-free.)
+ * Note though that if the plan contains any child relations that would have
+ * been added by the planner, which would not have been locked yet (because
+ * AcquirePlannerLocks() only locks relations that would be present in the
+ * range table before entering the planner), the plan could go stale before
+ * it reaches execution if any of those child relations get modified
+ * concurrently. The executor must check that the plan (CachedPlan) is still
+ * valid after taking a lock on each of the child tables, and if it is not,
+ * ask the caller to recreate the plan.
*/
static bool
CheckCachedPlan(CachedPlanSource *plansource)
@@ -805,60 +811,56 @@ CheckCachedPlan(CachedPlanSource *plansource)
if (!plan)
return false;
- Assert(plan->magic == CACHEDPLAN_MAGIC);
- /* Generic plans are never one-shot */
- Assert(!plan->is_oneshot);
+ if (GenericPlanIsValid(plan))
+ return true;
/*
- * If plan isn't valid for current role, we can't use it.
+ * Plan has been invalidated, so unlink it from the parent and release it.
*/
- if (plan->is_valid && plan->dependsOnRole &&
- plan->planRoleId != GetUserId())
- plan->is_valid = false;
+ ReleaseGenericPlan(plansource);
- /*
- * If it appears valid, acquire locks and recheck; this is much the same
- * logic as in RevalidateCachedQuery, but for a plan.
- */
- if (plan->is_valid)
+ return false;
+}
+
+/*
+ * GenericPlanIsValid
+ * Is a generic plan still valid?
+ *
+ * It may have gone stale due to concurrent schema modifications of relations
+ * mentioned in the plan, or due to the role and transaction-xmin checks
+ * performed below.
+ */
+static bool
+GenericPlanIsValid(CachedPlan *cplan)
+{
+ Assert(cplan != NULL);
+ Assert(cplan->magic == CACHEDPLAN_MAGIC);
+ /* Generic plans are never one-shot */
+ Assert(!cplan->is_oneshot);
+
+ if (cplan->is_valid)
{
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
- Assert(plan->refcount > 0);
-
- AcquireExecutorLocks(plan->stmt_list, true);
+ Assert(cplan->refcount > 0);
/*
- * If plan was transient, check to see if TransactionXmin has
- * advanced, and if so invalidate it.
+ * If plan isn't valid for current role, we can't use it.
*/
- if (plan->is_valid &&
- TransactionIdIsValid(plan->saved_xmin) &&
- !TransactionIdEquals(plan->saved_xmin, TransactionXmin))
- plan->is_valid = false;
+ if (cplan->dependsOnRole && cplan->planRoleId != GetUserId())
+ cplan->is_valid = false;
/*
- * By now, if any invalidation has happened, the inval callback
- * functions will have marked the plan invalid.
+ * If plan was transient, check to see if TransactionXmin has
+ * advanced, and if so invalidate it.
*/
- if (plan->is_valid)
- {
- /* Successfully revalidated and locked the query. */
- return true;
- }
-
- /* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ if (TransactionIdIsValid(cplan->saved_xmin) &&
+ !TransactionIdEquals(cplan->saved_xmin, TransactionXmin))
+ cplan->is_valid = false;
}
- /*
- * Plan has been invalidated, so unlink it from the parent and release it.
- */
- ReleaseGenericPlan(plansource);
-
- return false;
+ return cplan->is_valid;
}
/*
@@ -1128,8 +1130,15 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
- * On return, the plan is valid and we have sufficient locks to begin
- * execution.
+ * On return, only the locks on the tables directly mentioned in the query
+ * have been taken, so the plan is guaranteed valid only if it contains no
+ * inheritance/partition child tables.  If it does, the executor must also
+ * lock those child tables before executing the plan, and if the plan gets
+ * invalidated as a result of taking those locks, it must ask the caller to
+ * get a new plan by calling here again.  Locking of the child tables must
+ * be deferred to the executor like this because not all child tables may
+ * need to be locked; some may get pruned during the executor's plan
+ * initialization phase (InitPlan()).
*
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
@@ -1362,8 +1371,8 @@ CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
}
/*
- * Reject if AcquireExecutorLocks would have anything to do. This is
- * probably unnecessary given the previous check, but let's be safe.
+ * Reject if the executor would need to take additional locks, that is, in
+ * addition to those taken by AcquirePlannerLocks() on a given query.
*/
foreach(lc, plan->stmt_list)
{
@@ -1737,58 +1746,6 @@ QueryListGetPrimaryStmt(List *stmts)
return NULL;
}
-/*
- * AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
- */
-static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
-{
- ListCell *lc1;
-
- foreach(lc1, stmt_list)
- {
- PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
-
- if (plannedstmt->commandType == CMD_UTILITY)
- {
- /*
- * Ignore utility statements, except those (such as EXPLAIN) that
- * contain a parsed-but-not-planned query. Note: it's okay to use
- * ScanQueryForLocks, even though the query hasn't been through
- * rule rewriting, because rewriting doesn't change the query
- * representation.
- */
- Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
-
- if (query)
- ScanQueryForLocks(query, acquire);
- continue;
- }
-
- foreach(lc2, plannedstmt->rtable)
- {
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
-
- if (!(rte->rtekind == RTE_RELATION ||
- (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
- continue;
-
- /*
- * Acquire the appropriate type of lock on each relation OID. Note
- * that we don't actually try to open the rel, and hence will not
- * fail if it's been dropped entirely --- we'll just transiently
- * acquire a non-conflicting lock.
- */
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
- }
- }
-}
-
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 06dfa85f04..0cad450dcd 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -201,6 +201,13 @@ CreatePortal(const char *name, bool allowDup, bool dupSilent)
portal->portalContext = AllocSetContextCreate(TopPortalContext,
"PortalContext",
ALLOCSET_SMALL_SIZES);
+ /*
+ * Initialize the portal's query context, which stores the QueryDescs
+ * created during PortalStart() and then used in PortalRun().
+ */
+ portal->queryContext = AllocSetContextCreate(TopPortalContext,
+ "PortalQueryContext",
+ ALLOCSET_SMALL_SIZES);
/* create a resource owner for the portal */
portal->resowner = ResourceOwnerCreate(CurTransactionResourceOwner,
@@ -224,6 +231,7 @@ CreatePortal(const char *name, bool allowDup, bool dupSilent)
/* for named portals reuse portal->name copy */
MemoryContextSetIdentifier(portal->portalContext, portal->name[0] ? portal->name : "<unnamed>");
+ MemoryContextSetIdentifier(portal->queryContext, portal->name[0] ? portal->name : "<unnamed>");
return portal;
}
@@ -594,6 +602,7 @@ PortalDrop(Portal portal, bool isTopCommit)
/* release subsidiary storage */
MemoryContextDelete(portal->portalContext);
+ MemoryContextDelete(portal->queryContext);
/* release portal struct (it's in TopPortalContext) */
pfree(portal);
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 3d3e632a0c..392abb5150 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -88,7 +88,11 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern QueryDesc *ExplainQueryDesc(PlannedStmt *stmt, struct CachedPlan *cplan,
+ const char *queryString, IntoClause *into, ExplainState *es,
+ ParamListInfo params, QueryEnvironment *queryEnv);
+extern void ExplainOnePlan(QueryDesc *queryDesc,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
@@ -104,6 +108,7 @@ extern void ExplainQueryParameters(ExplainState *es, ParamListInfo params, int m
extern void ExplainBeginOutput(ExplainState *es);
extern void ExplainEndOutput(ExplainState *es);
+extern void ExplainResetOutput(ExplainState *es);
extern void ExplainSeparatePlans(ExplainState *es);
extern void ExplainPropertyList(const char *qlabel, List *data,
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index af2bf36dfb..c36c25b497 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -32,9 +32,12 @@
*/
typedef struct QueryDesc
{
+ NodeTag type;
+
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ struct CachedPlan *cplan; /* CachedPlan, if plannedstmt is from one */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -47,6 +50,7 @@ typedef struct QueryDesc
TupleDesc tupDesc; /* descriptor for result tuples */
EState *estate; /* executor's query-wide state */
PlanState *planstate; /* tree of per-plan-node state */
+ bool plan_valid; /* is planstate tree fully valid? */
/* This field is set by ExecutorRun */
bool already_executed; /* true if previously executed */
@@ -57,6 +61,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ struct CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index ac02247947..640b905973 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -19,6 +19,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/plancache.h"
/*
@@ -256,6 +257,17 @@ extern void ExecEndNode(PlanState *node);
extern void ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * Is the cached plan, if any, still valid at this point? That is, not
+ * invalidated by the incoming invalidation messages that have been processed
+ * recently.
+ */
+static inline bool
+ExecPlanStillValid(EState *estate)
+{
+ return estate->es_cachedplan == NULL ? true :
+ CachedPlanStillValid(estate->es_cachedplan);
+}
/* ----------------------------------------------------------------
* ExecProcNode
@@ -590,6 +602,7 @@ exec_rt_fetch(Index rti, EState *estate)
}
extern Relation ExecGetRangeTableRelation(EState *estate, Index rti);
+extern void ExecLockAppendNonLeafRelations(EState *estate, List *allpartrelids);
extern void ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
Index rti);
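
The intended use of ExecPlanStillValid() is that plan initialization
re-checks validity immediately after each new relation lock it takes.
For example (a hypothetical call site; childrelid and the surrounding
control flow are illustrative only, the real call sites being in the
execMain.c/execUtils.c changes):

    /* Lock a child table that AcquirePlannerLocks() did not cover. */
    Relation    rel = table_open(childrelid, rte->rellockmode);

    if (!ExecPlanStillValid(estate))
        return;     /* CachedPlan went stale while locking; ExecutorStart()
                     * will report plan_valid = false and the caller must
                     * replan */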
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index cb714f4a19..f0c5177b06 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -623,6 +623,8 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ struct CachedPlan *es_cachedplan; /* CachedPlan if plannedstmt is from
+ * one */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
@@ -671,6 +673,10 @@ typedef struct EState
List *es_exprcontexts; /* List of ExprContexts within EState */
+ List *es_inited_plannodes; /* List of PlanStates of the plan
+ * tree nodes that were fully
+ * initialized */
+
List *es_subplanstates; /* List of PlanState for SubPlans */
List *es_auxmodifytables; /* List of secondary ModifyTableStates */
diff --git a/src/include/storage/lmgr.h b/src/include/storage/lmgr.h
index 4ee91e3cf9..598bf2688a 100644
--- a/src/include/storage/lmgr.h
+++ b/src/include/storage/lmgr.h
@@ -48,6 +48,7 @@ extern bool ConditionalLockRelation(Relation relation, LOCKMODE lockmode);
extern void UnlockRelation(Relation relation, LOCKMODE lockmode);
extern bool CheckRelationLockedByMe(Relation relation, LOCKMODE lockmode,
bool orstronger);
+extern bool CheckRelLockedByMe(Oid relid, LOCKMODE lockmode, bool orstronger);
extern bool LockHasWaitersRelation(Relation relation, LOCKMODE lockmode);
extern void LockRelationIdForSession(LockRelId *relid, LOCKMODE lockmode);
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index 4f5418b972..3074e604dd 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -139,6 +139,7 @@ extern char get_rel_relkind(Oid relid);
extern bool get_rel_relispartition(Oid relid);
extern Oid get_rel_tablespace(Oid relid);
extern char get_rel_persistence(Oid relid);
+extern bool get_rel_relisshared(Oid relid);
extern Oid get_transform_fromsql(Oid typid, Oid langid, List *trftypes);
extern Oid get_transform_tosql(Oid typid, Oid langid, List *trftypes);
extern bool get_typisdefined(Oid typid);
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index a443181d41..8990fe72e3 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -221,6 +221,20 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
QueryEnvironment *queryEnv);
+
+/*
+ * CachedPlanStillValid
+ * Returns whether a cached generic plan is still valid
+ *
+ * Called by the executor after each relation lock it takes while
+ * initializing the plan tree contained in the CachedPlan.
+ */
+static inline bool
+CachedPlanStillValid(CachedPlan *cplan)
+{
+ return cplan->is_valid;
+}
+
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aa08b1e0fc..24d420b9e9 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,9 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *qdescs; /* list of QueryDescs */
+ MemoryContext queryContext; /* memory for QueryDescs and children */
+ bool plan_valid; /* are plans in qdescs ready for execution? */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -242,6 +245,7 @@ extern void PortalDefineQuery(Portal portal,
CommandTag commandTag,
List *stmts,
CachedPlan *cplan);
+extern void PortalQueryFinish(QueryDesc *queryDesc);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
diff --git a/src/test/modules/delay_execution/Makefile b/src/test/modules/delay_execution/Makefile
index 70f24e846d..2fca84d027 100644
--- a/src/test/modules/delay_execution/Makefile
+++ b/src/test/modules/delay_execution/Makefile
@@ -8,7 +8,8 @@ OBJS = \
delay_execution.o
ISOLATION = partition-addition \
- partition-removal-1
+ partition-removal-1 \
+ cached-plan-replan
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/src/test/modules/delay_execution/delay_execution.c b/src/test/modules/delay_execution/delay_execution.c
index 7cd76eb34b..515b2c0c95 100644
--- a/src/test/modules/delay_execution/delay_execution.c
+++ b/src/test/modules/delay_execution/delay_execution.c
@@ -1,14 +1,18 @@
/*-------------------------------------------------------------------------
*
* delay_execution.c
- * Test module to allow delay between parsing and execution of a query.
+ * Test module to introduce delay at various points during execution of a
+ * query to test that execution proceeds safely in light of concurrent
+ * changes.
*
* The delay is implemented by taking and immediately releasing a specified
* advisory lock. If another process has previously taken that lock, the
* current process will be blocked until the lock is released; otherwise,
* there's no effect. This allows an isolationtester script to reliably
- * test behaviors where some specified action happens in another backend
- * between parsing and execution of any desired query.
+ * test behaviors where some specified action happens in another backend in
+ * a couple of cases: 1) between parsing and execution of any desired query
+ * when using the planner_hook, 2) between RevalidateCachedQuery() and
+ * ExecutorStart() when using the ExecutorStart_hook.
*
* Copyright (c) 2020-2023, PostgreSQL Global Development Group
*
@@ -22,6 +26,7 @@
#include <limits.h>
+#include "executor/executor.h"
#include "optimizer/planner.h"
#include "utils/builtins.h"
#include "utils/guc.h"
@@ -32,9 +37,11 @@ PG_MODULE_MAGIC;
/* GUC: advisory lock ID to use. Zero disables the feature. */
static int post_planning_lock_id = 0;
+static int executor_start_lock_id = 0;
-/* Save previous planner hook user to be a good citizen */
+/* Save previous hook users to be a good citizen */
static planner_hook_type prev_planner_hook = NULL;
+static ExecutorStart_hook_type prev_ExecutorStart_hook = NULL;
/* planner_hook function to provide the desired delay */
@@ -70,11 +77,41 @@ delay_execution_planner(Query *parse, const char *query_string,
return result;
}
+/* ExecutorStart_hook function to provide the desired delay */
+static void
+delay_execution_ExecutorStart(QueryDesc *queryDesc, int eflags)
+{
+ /* If enabled, delay by taking and releasing the specified lock */
+ if (executor_start_lock_id != 0)
+ {
+ DirectFunctionCall1(pg_advisory_lock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+ DirectFunctionCall1(pg_advisory_unlock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+
+ /*
+ * Ensure that we notice any pending invalidations, since the advisory
+ * lock functions don't do this.
+ */
+ AcceptInvalidationMessages();
+ }
+
+ /* Now start the executor, possibly via a previous hook user */
+ if (prev_ExecutorStart_hook)
+ prev_ExecutorStart_hook(queryDesc, eflags);
+ else
+ standard_ExecutorStart(queryDesc, eflags);
+
+ if (executor_start_lock_id != 0)
+ elog(NOTICE, "Finished ExecutorStart(): CachedPlan is %s",
+ queryDesc->cplan->is_valid ? "valid" : "not valid");
+}
+
/* Module load function */
void
_PG_init(void)
{
- /* Set up the GUC to control which lock is used */
+ /* Set up GUCs to control which lock is used */
DefineCustomIntVariable("delay_execution.post_planning_lock_id",
"Sets the advisory lock ID to be locked/unlocked after planning.",
"Zero disables the delay.",
@@ -86,10 +123,22 @@ _PG_init(void)
NULL,
NULL,
NULL);
-
+ DefineCustomIntVariable("delay_execution.executor_start_lock_id",
+ "Sets the advisory lock ID to be locked/unlocked before starting execution.",
+ "Zero disables the delay.",
+ &executor_start_lock_id,
+ 0,
+ 0, INT_MAX,
+ PGC_USERSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
MarkGUCPrefixReserved("delay_execution");
- /* Install our hook */
+ /* Install our hooks. */
prev_planner_hook = planner_hook;
planner_hook = delay_execution_planner;
+ prev_ExecutorStart_hook = ExecutorStart_hook;
+ ExecutorStart_hook = delay_execution_ExecutorStart;
}
diff --git a/src/test/modules/delay_execution/expected/cached-plan-replan.out b/src/test/modules/delay_execution/expected/cached-plan-replan.out
new file mode 100644
index 0000000000..0ac6a17c2b
--- /dev/null
+++ b/src/test/modules/delay_execution/expected/cached-plan-replan.out
@@ -0,0 +1,156 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1prep s2lock s1exec s2dropi s2unlock
+step s1prep: SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1;
+ EXPLAIN (COSTS OFF) EXECUTE q (1);
+QUERY PLAN
+--------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo11 foo_1
+ Recheck Cond: (a = $1)
+ -> Bitmap Index Scan on foo11_a_idx
+ Index Cond: (a = $1)
+(6 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+-----------------------------
+Append
+ Subplans Removed: 1
+ -> Seq Scan on foo11 foo_1
+ Filter: (a = $1)
+(4 rows)
+
+
+starting permutation: s1prep2 s2lock s1exec2 s2dropi s2unlock
+step s1prep2: SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = 1;
+ EXPLAIN (COSTS OFF) EXECUTE q2;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------
+Bitmap Heap Scan on foo11 foo
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on foo11_a_idx
+ Index Cond: (a = 1)
+(4 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec2: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec2: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------
+Seq Scan on foo11 foo
+ Filter: (a = 1)
+(2 rows)
+
+
+starting permutation: s1prep3 s2lock s1exec3 s2dropi s2unlock
+step s1prep3: SET plan_cache_mode = force_generic_plan;
+ SET enable_partitionwise_aggregate = on;
+ SET enable_partitionwise_join = on;
+ PREPARE q3 AS SELECT t1.a, count(t2.b) FROM foo t1, foo t2 WHERE t1.a = t2.a GROUP BY 1;
+ EXPLAIN (COSTS OFF) EXECUTE q3;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+----------------------------------------------------------------
+Append
+ -> GroupAggregate
+ Group Key: t1.a
+ -> Merge Join
+ Merge Cond: (t1.a = t2.a)
+ -> Index Only Scan using foo11_a_idx on foo11 t1
+ -> Materialize
+ -> Index Scan using foo11_a_idx on foo11 t2
+ -> GroupAggregate
+ Group Key: t1_1.a
+ -> Merge Join
+ Merge Cond: (t1_1.a = t2_1.a)
+ -> Sort
+ Sort Key: t1_1.a
+ -> Seq Scan on foo2 t1_1
+ -> Sort
+ Sort Key: t2_1.a
+ -> Seq Scan on foo2 t2_1
+(18 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec3: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec3: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------------------------------
+Append
+ -> GroupAggregate
+ Group Key: t1.a
+ -> Merge Join
+ Merge Cond: (t1.a = t2.a)
+ -> Sort
+ Sort Key: t1.a
+ -> Seq Scan on foo11 t1
+ -> Sort
+ Sort Key: t2.a
+ -> Seq Scan on foo11 t2
+ -> GroupAggregate
+ Group Key: t1_1.a
+ -> Merge Join
+ Merge Cond: (t1_1.a = t2_1.a)
+ -> Sort
+ Sort Key: t1_1.a
+ -> Seq Scan on foo2 t1_1
+ -> Sort
+ Sort Key: t2_1.a
+ -> Seq Scan on foo2 t2_1
+(21 rows)
+
diff --git a/src/test/modules/delay_execution/specs/cached-plan-replan.spec b/src/test/modules/delay_execution/specs/cached-plan-replan.spec
new file mode 100644
index 0000000000..3c92cbd5c6
--- /dev/null
+++ b/src/test/modules/delay_execution/specs/cached-plan-replan.spec
@@ -0,0 +1,61 @@
+# Test to check that invalidation of cached generic plans during ExecutorStart
+# correctly triggers replanning and re-execution.
+
+setup
+{
+ CREATE TABLE foo (a int, b text) PARTITION BY LIST(a);
+ CREATE TABLE foo1 PARTITION OF foo FOR VALUES IN (1) PARTITION BY LIST (a);
+ CREATE TABLE foo11 PARTITION OF foo1 FOR VALUES IN (1);
+ CREATE INDEX foo11_a ON foo1 (a);
+ CREATE TABLE foo2 PARTITION OF foo FOR VALUES IN (2);
+ CREATE VIEW foov AS SELECT * FROM foo;
+}
+
+teardown
+{
+ DROP VIEW foov;
+ DROP TABLE foo;
+}
+
+session "s1"
+# Append with run-time pruning
+step "s1prep" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+
+# no Append case (only one partition selected by the planner)
+step "s1prep2" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = 1;
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+
+# Append with partition-wise join aggregate and join plans as child subplans
+step "s1prep3" { SET plan_cache_mode = force_generic_plan;
+ SET enable_partitionwise_aggregate = on;
+ SET enable_partitionwise_join = on;
+ PREPARE q3 AS SELECT t1.a, count(t2.b) FROM foo t1, foo t2 WHERE t1.a = t2.a GROUP BY 1;
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+
+# Executes a generic plan
+step "s1exec" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+step "s1exec2" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+step "s1exec3" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+
+session "s2"
+step "s2lock" { SELECT pg_advisory_lock(12345); }
+step "s2unlock" { SELECT pg_advisory_unlock(12345); }
+step "s2dropi" { DROP INDEX foo11_a; }
+
+# While "s1exec", etc. wait to acquire the advisory lock, "s2drop" is able to
+# drop the index being used in the cached plan. When "s1exec" is then
+# unblocked and initializes the cached plan for execution, it detects the
+# concurrent index drop and causes the cached plan to be discarded and
+# recreated without the index.
+permutation "s1prep" "s2lock" "s1exec" "s2dropi" "s2unlock"
+permutation "s1prep2" "s2lock" "s1exec2" "s2dropi" "s2unlock"
+permutation "s1prep3" "s2lock" "s1exec3" "s2dropi" "s2unlock"
--
2.35.3
v41-0001-Add-field-to-store-parent-relids-to-Append-Merge.patch (application/x-patch)
From 91dca92a9918335952178af1f3827087d3b33485 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Tue, 4 Jul 2023 22:36:31 +0900
Subject: [PATCH v41 1/4] Add field to store parent relids to
Append/MergeAppend
There's no way currently in the executor to tell if the child
subplans of Append/MergeAppend are scanning partitions, and if
they indeed do, what the RT indexes of their parent/ancestor tables
are.  The executor doesn't need to see their RT indexes except for
run-time pruning, in which case they can be found in the
PartitionPruneInfo, but a future commit will create a need for
them to be available at all times for the purpose of locking
those parent/ancestor tables when executing a cached plan.
The code to look up partitioned parent relids for a given list of
partition scan subpaths of an Append/MergeAppend is already present
in make_partition_pruneinfo() but it's local to partprune.c. This
commit refactors that code into its own function called
add_append_subpath_partrelids() defined in appendinfo.c and
generalizes it to consider child join and aggregate paths. To
facilitate looking up the parent rels of child grouping rels in
add_append_subpath_partrelids(), parent links are now set in
the RelOptInfos of child grouping rels too, like they are in
those of child base and join rels.
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
src/backend/optimizer/plan/createplan.c | 41 ++++++--
src/backend/optimizer/plan/planner.c | 3 +
src/backend/optimizer/plan/setrefs.c | 4 +
src/backend/optimizer/util/appendinfo.c | 134 ++++++++++++++++++++++++
src/backend/partitioning/partprune.c | 124 +++-------------------
src/include/nodes/plannodes.h | 14 +++
src/include/optimizer/appendinfo.h | 3 +
src/include/partitioning/partprune.h | 3 +-
8 files changed, 203 insertions(+), 123 deletions(-)
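
To make the new field's shape concrete: for an Append scanning the
two-level tree used by the isolation test above (foo partitioned into
foo1 and foo2, with foo1 further partitioned into foo11), and assuming
hypothetical RT indexes foo=1 and foo1=2, allpartrelids would be
list({1, 2}): one bitmapset for the single partition tree involved,
containing only the partitioned ancestors' RTIs, never the leaf
partitions', with the lowest member (1) identifying the topmost parent.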
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index ec73789bc2..5ee2b5b7f9 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -25,6 +25,7 @@
#include "nodes/extensible.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
+#include "optimizer/appendinfo.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/optimizer.h"
@@ -1210,6 +1211,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
Oid *nodeCollations = NULL;
bool *nodeNullsFirst = NULL;
bool consider_async = false;
+ List *allpartrelids = NIL;
/*
* The subpaths list could be empty, if every child was proven empty by
@@ -1351,15 +1353,23 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
++nasyncplans;
}
+ /*
+ * Find partitioned parent rel(s) of the subpath's rel(s).
+ */
+ allpartrelids = add_append_subpath_partrelids(root, subpath, rel,
+ allpartrelids);
+
subplans = lappend(subplans, subplan);
}
+ plan->allpartrelids = allpartrelids;
+
/*
- * If any quals exist, they may be useful to perform further partition
- * pruning during execution. Gather information needed by the executor to
- * do partition pruning.
+ * If scanning partitions, check if there are quals that may be useful to
+ * perform further partition pruning during execution. Gather information
+ * needed by the executor to do partition pruning.
*/
- if (enable_partition_pruning)
+ if (enable_partition_pruning && allpartrelids != NIL)
{
List *prunequal;
@@ -1380,7 +1390,8 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
partpruneinfo =
make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ allpartrelids);
}
plan->appendplans = subplans;
@@ -1426,6 +1437,7 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
PartitionPruneInfo *partpruneinfo = NULL;
+ List *allpartrelids = NIL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1515,15 +1527,23 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplan = (Plan *) sort;
}
+ /*
+ * Find partitioned parent rel(s) of the subpath's rel(s).
+ */
+ allpartrelids = add_append_subpath_partrelids(root, subpath, rel,
+ allpartrelids);
+
subplans = lappend(subplans, subplan);
}
+ node->allpartrelids = allpartrelids;
+
/*
- * If any quals exist, they may be useful to perform further partition
- * pruning during execution. Gather information needed by the executor to
- * do partition pruning.
+ * If scanning partitions, check if there are quals that may be useful to
+ * perform further partition pruning during execution. Gather information
+ * needed by the executor to do partition pruning.
*/
- if (enable_partition_pruning)
+ if (enable_partition_pruning && allpartrelids != NIL)
{
List *prunequal;
@@ -1535,7 +1555,8 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
if (prunequal != NIL)
partpruneinfo = make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ allpartrelids);
}
node->mergeplans = subplans;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 0e12fdeb60..7bc6eec364 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -7835,8 +7835,11 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
agg_costs, gd, &child_extra,
&child_partially_grouped_rel);
+ /* Mark as child of grouped_rel. */
+ child_grouped_rel->parent = grouped_rel;
if (child_partially_grouped_rel)
{
+ child_partially_grouped_rel->parent = grouped_rel;
partially_grouped_live_children =
lappend(partially_grouped_live_children,
child_partially_grouped_rel);
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index c63758cb2b..396c83e357 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1754,6 +1754,8 @@ set_append_references(PlannerInfo *root,
set_dummy_tlist_references((Plan *) aplan, rtoffset);
aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
+ foreach(l, aplan->allpartrelids)
+ lfirst(l) = offset_relid_set((Relids) lfirst(l), rtoffset);
if (aplan->part_prune_info)
{
@@ -1830,6 +1832,8 @@ set_mergeappend_references(PlannerInfo *root,
set_dummy_tlist_references((Plan *) mplan, rtoffset);
mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
+ foreach(l, mplan->allpartrelids)
+ lfirst(l) = offset_relid_set((Relids) lfirst(l), rtoffset);
if (mplan->part_prune_info)
{
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index f456b3b0a4..5bd8e82b9b 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -41,6 +41,7 @@ static void make_inh_translation_list(Relation oldrelation,
AppendRelInfo *appinfo);
static Node *adjust_appendrel_attrs_mutator(Node *node,
adjust_appendrel_attrs_context *context);
+static List *add_part_relids(List *allpartrelids, Bitmapset *partrelids);
/*
@@ -1035,3 +1036,136 @@ distribute_row_identity_vars(PlannerInfo *root)
}
}
}
+
+/*
+ * add_append_subpath_partrelids
+ * Look up a child subpath's rel's partitioned parent relids up to
+ * parentrel and add the bitmapset containing those into
+ * 'allpartrelids'
+ */
+List *
+add_append_subpath_partrelids(PlannerInfo *root, Path *subpath,
+ RelOptInfo *parentrel,
+ List *allpartrelids)
+{
+ RelOptInfo *prel = subpath->parent;
+ Relids partrelids = NULL;
+
+ /* Nothing to do if there's no parent to begin with. */
+ if (!IS_OTHER_REL(prel))
+ return allpartrelids;
+
+ /*
+ * Traverse up to the pathrel's topmost partitioned parent, collecting
+ * parent relids as we go; but stop if we reach parentrel. (Normally, a
+ * pathrel's topmost partitioned parent is either parentrel or a UNION ALL
+ * appendrel child of parentrel. But when handling partitionwise joins of
+ * multi-level partitioning trees, we can see an append path whose
+ * parentrel is an intermediate partitioned table.)
+ */
+ do
+ {
+ Relids parent_relids = NULL;
+
+ /*
+ * For simple child rels, we can simply set the parent_relids to
+ * prel->parent->relids. But for partitionwise join and aggregate
+ * child rels, while we can use prel->parent to move up the tree,
+ * parent_relids must be found the hard way through AppendRelInfos,
+ * because 1) a joinrel's relids may point to RTE_JOIN entries,
+ * 2) the topmost parent grouping rel's relids field is NULL.
+ */
+ if (IS_SIMPLE_REL(prel))
+ {
+ prel = prel->parent;
+ /* Stop once we reach the root partitioned rel. */
+ if (!IS_PARTITIONED_REL(prel))
+ break;
+ parent_relids = bms_add_members(parent_relids, prel->relids);
+ }
+ else
+ {
+ AppendRelInfo **appinfos;
+ int nappinfos,
+ i;
+
+ appinfos = find_appinfos_by_relids(root, prel->relids,
+ &nappinfos);
+ for (i = 0; i < nappinfos; i++)
+ {
+ AppendRelInfo *appinfo = appinfos[i];
+
+ parent_relids = bms_add_member(parent_relids,
+ appinfo->parent_relid);
+ }
+ pfree(appinfos);
+ prel = prel->parent;
+ }
+ /* accept this level as an interesting parent */
+ partrelids = bms_add_members(partrelids, parent_relids);
+ if (prel == parentrel)
+ break; /* don't traverse above parentrel */
+ } while (IS_OTHER_REL(prel));
+
+ if (partrelids == NULL)
+ return allpartrelids;
+
+ return add_part_relids(allpartrelids, partrelids);
+}
+
+/*
+ * add_part_relids
+ * Add new info to a list of Bitmapsets of partitioned relids.
+ *
+ * Within 'allpartrelids', there is one Bitmapset for each topmost parent
+ * partitioned rel. Each Bitmapset contains the RT indexes of the topmost
+ * parent as well as its relevant non-leaf child partitions. Since (by
+ * construction of the rangetable list) parent partitions must have lower
+ * RT indexes than their children, we can distinguish the topmost parent
+ * as being the lowest set bit in the Bitmapset.
+ *
+ * 'partrelids' contains the RT indexes of a parent partitioned rel, and
+ * possibly some non-leaf children, that are newly identified as parents of
+ * some subpath rel passed to make_partition_pruneinfo(). These are added
+ * to an appropriate member of 'allpartrelids'.
+ *
+ * Note that the list contains only RT indexes of partitioned tables that
+ * are parents of some scan-level relation appearing in the 'subpaths' that
+ * make_partition_pruneinfo() is dealing with. Also, "topmost" parents are
+ * not allowed to be higher than the 'parentrel' associated with the append
+ * path. In this way, we avoid expending cycles on partitioned rels that
+ * can't contribute useful pruning information for the problem at hand.
+ * (It is possible for 'parentrel' to be a child partitioned table, and it
+ * is also possible for scan-level relations to be child partitioned tables
+ * rather than leaf partitions. Hence we must construct this relation set
+ * with reference to the particular append path we're dealing with, rather
+ * than looking at the full partitioning structure represented in the
+ * RelOptInfos.)
+ */
+static List *
+add_part_relids(List *allpartrelids, Bitmapset *partrelids)
+{
+ Index targetpart;
+ ListCell *lc;
+
+ /* We can easily get the lowest set bit this way: */
+ targetpart = bms_next_member(partrelids, -1);
+ Assert(targetpart > 0);
+
+ /* Look for a matching topmost parent */
+ foreach(lc, allpartrelids)
+ {
+ Bitmapset *currpartrelids = (Bitmapset *) lfirst(lc);
+ Index currtarget = bms_next_member(currpartrelids, -1);
+
+ if (targetpart == currtarget)
+ {
+ /* Found a match, so add any new RT indexes to this hierarchy */
+ currpartrelids = bms_add_members(currpartrelids, partrelids);
+ lfirst(lc) = currpartrelids;
+ return allpartrelids;
+ }
+ }
+ /* No match, so add the new partition hierarchy to the list */
+ return lappend(allpartrelids, partrelids);
+}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 7179b22a05..213512a5f4 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -138,7 +138,6 @@ typedef struct PruneStepResult
} PruneStepResult;
-static List *add_part_relids(List *allpartrelids, Bitmapset *partrelids);
static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
RelOptInfo *parentrel,
List *prunequal,
@@ -218,33 +217,32 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
+ * 'allpartrelids' contains Bitmapsets of RT indexes of partitioned parents
+ * whose partitions' Paths are in 'subpaths'; there's one Bitmapset for every
+ * partition tree involved.
*/
PartitionPruneInfo *
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
- List *prunequal)
+ List *prunequal,
+ List *allpartrelids)
{
PartitionPruneInfo *pruneinfo;
Bitmapset *allmatchedsubplans = NULL;
- List *allpartrelids;
List *prunerelinfos;
int *relid_subplan_map;
ListCell *lc;
int i;
+ Assert(list_length(allpartrelids) > 0);
+
/*
- * Scan the subpaths to see which ones are scans of partition child
- * relations, and identify their parent partitioned rels. (Note: we must
- * restrict the parent partitioned rels to be parentrel or children of
- * parentrel, otherwise we couldn't translate prunequal to match.)
- *
- * Also construct a temporary array to map from partition-child-relation
- * relid to the index in 'subpaths' of the scan plan for that partition.
+ * Construct a temporary array to map from partition-child-relation relid
+ * to the index in 'subpaths' of the scan plan for that partition.
* (Use of "subplan" rather than "subpath" is a bit of a misnomer, but
* we'll let it stand.) For convenience, we use 1-based indexes here, so
* that zero can represent an un-filled array entry.
*/
- allpartrelids = NIL;
relid_subplan_map = palloc0(sizeof(int) * root->simple_rel_array_size);
i = 1;
@@ -253,50 +251,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Path *path = (Path *) lfirst(lc);
RelOptInfo *pathrel = path->parent;
- /* We don't consider partitioned joins here */
- if (pathrel->reloptkind == RELOPT_OTHER_MEMBER_REL)
- {
- RelOptInfo *prel = pathrel;
- Bitmapset *partrelids = NULL;
-
- /*
- * Traverse up to the pathrel's topmost partitioned parent,
- * collecting parent relids as we go; but stop if we reach
- * parentrel. (Normally, a pathrel's topmost partitioned parent
- * is either parentrel or a UNION ALL appendrel child of
- * parentrel. But when handling partitionwise joins of
- * multi-level partitioning trees, we can see an append path whose
- * parentrel is an intermediate partitioned table.)
- */
- do
- {
- AppendRelInfo *appinfo;
-
- Assert(prel->relid < root->simple_rel_array_size);
- appinfo = root->append_rel_array[prel->relid];
- prel = find_base_rel(root, appinfo->parent_relid);
- if (!IS_PARTITIONED_REL(prel))
- break; /* reached a non-partitioned parent */
- /* accept this level as an interesting parent */
- partrelids = bms_add_member(partrelids, prel->relid);
- if (prel == parentrel)
- break; /* don't traverse above parentrel */
- } while (prel->reloptkind == RELOPT_OTHER_MEMBER_REL);
-
- if (partrelids)
- {
- /*
- * Found some relevant parent partitions, which may or may not
- * overlap with partition trees we already found. Add new
- * information to the allpartrelids list.
- */
- allpartrelids = add_part_relids(allpartrelids, partrelids);
- /* Also record the subplan in relid_subplan_map[] */
- /* No duplicates please */
- Assert(relid_subplan_map[pathrel->relid] == 0);
- relid_subplan_map[pathrel->relid] = i;
- }
- }
+ /* No duplicates please */
+ Assert(relid_subplan_map[pathrel->relid] == 0);
+ relid_subplan_map[pathrel->relid] = i;
i++;
}
@@ -362,63 +319,6 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
return pruneinfo;
}
-/*
- * add_part_relids
- * Add new info to a list of Bitmapsets of partitioned relids.
- *
- * Within 'allpartrelids', there is one Bitmapset for each topmost parent
- * partitioned rel. Each Bitmapset contains the RT indexes of the topmost
- * parent as well as its relevant non-leaf child partitions. Since (by
- * construction of the rangetable list) parent partitions must have lower
- * RT indexes than their children, we can distinguish the topmost parent
- * as being the lowest set bit in the Bitmapset.
- *
- * 'partrelids' contains the RT indexes of a parent partitioned rel, and
- * possibly some non-leaf children, that are newly identified as parents of
- * some subpath rel passed to make_partition_pruneinfo(). These are added
- * to an appropriate member of 'allpartrelids'.
- *
- * Note that the list contains only RT indexes of partitioned tables that
- * are parents of some scan-level relation appearing in the 'subpaths' that
- * make_partition_pruneinfo() is dealing with. Also, "topmost" parents are
- * not allowed to be higher than the 'parentrel' associated with the append
- * path. In this way, we avoid expending cycles on partitioned rels that
- * can't contribute useful pruning information for the problem at hand.
- * (It is possible for 'parentrel' to be a child partitioned table, and it
- * is also possible for scan-level relations to be child partitioned tables
- * rather than leaf partitions. Hence we must construct this relation set
- * with reference to the particular append path we're dealing with, rather
- * than looking at the full partitioning structure represented in the
- * RelOptInfos.)
- */
-static List *
-add_part_relids(List *allpartrelids, Bitmapset *partrelids)
-{
- Index targetpart;
- ListCell *lc;
-
- /* We can easily get the lowest set bit this way: */
- targetpart = bms_next_member(partrelids, -1);
- Assert(targetpart > 0);
-
- /* Look for a matching topmost parent */
- foreach(lc, allpartrelids)
- {
- Bitmapset *currpartrelids = (Bitmapset *) lfirst(lc);
- Index currtarget = bms_next_member(currpartrelids, -1);
-
- if (targetpart == currtarget)
- {
- /* Found a match, so add any new RT indexes to this hierarchy */
- currpartrelids = bms_add_members(currpartrelids, partrelids);
- lfirst(lc) = currpartrelids;
- return allpartrelids;
- }
- }
- /* No match, so add the new partition hierarchy to the list */
- return lappend(allpartrelids, partrelids);
-}
-
/*
* make_partitionedrel_pruneinfo
* Build a List of PartitionedRelPruneInfos, one for each interesting
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 1b787fe031..7a5f3ba625 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -267,6 +267,13 @@ typedef struct Append
List *appendplans;
int nasyncplans; /* # of asynchronous plans */
+ /*
+ * RTIs of all partitioned tables whose children are scanned by
+ * appendplans. The list contains a bitmapset for every partition tree
+ * covered by this Append.
+ */
+ List *allpartrelids;
+
/*
* All 'appendplans' preceding this index are non-partial plans. All
* 'appendplans' from this index onwards are partial plans.
@@ -291,6 +298,13 @@ typedef struct MergeAppend
List *mergeplans;
+ /*
+ * RTIs of all partitioned tables whose children are scanned by
+ * mergeplans. The list contains a bitmapset for every partition tree
+ * covered by this MergeAppend.
+ */
+ List *allpartrelids;
+
/* these fields are just like the sort-key info in struct Sort: */
/* number of sort-key columns */
diff --git a/src/include/optimizer/appendinfo.h b/src/include/optimizer/appendinfo.h
index a05f91f77d..1621a7319a 100644
--- a/src/include/optimizer/appendinfo.h
+++ b/src/include/optimizer/appendinfo.h
@@ -46,5 +46,8 @@ extern void add_row_identity_columns(PlannerInfo *root, Index rtindex,
RangeTblEntry *target_rte,
Relation target_relation);
extern void distribute_row_identity_vars(PlannerInfo *root);
+extern List *add_append_subpath_partrelids(PlannerInfo *root, Path *subpath,
+ RelOptInfo *parentrel,
+ List *allpartrelids);
#endif /* APPENDINFO_H */
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 8636e04e37..caa774a111 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -73,7 +73,8 @@ typedef struct PartitionPruneContext
extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
struct RelOptInfo *parentrel,
List *subpaths,
- List *prunequal);
+ List *prunequal,
+ List *allpartrelids);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
--
2.35.3
v41-0004-Track-opened-range-table-relations-in-a-List-in-.patch (application/x-patch)
From b8fd387987d28b13a2229515d0fa451e332b9d8e Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Tue, 4 Jul 2023 22:36:49 +0900
Subject: [PATCH v41 4/4] Track opened range table relations in a List in
EState
This makes ExecCloseRangeTableRelations faster when there are many
relations in the range table but only a few are opened during
execution, such as when run-time pruning kicks in on an Append
containing thousands of partition subplans.
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
src/backend/executor/execMain.c | 9 +++++----
src/backend/executor/execUtils.c | 2 ++
src/include/nodes/execnodes.h | 2 ++
3 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index a2f6ac9d1c..053d8a2dc2 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1650,12 +1650,13 @@ ExecCloseResultRelations(EState *estate)
void
ExecCloseRangeTableRelations(EState *estate)
{
- int i;
+ ListCell *lc;
- for (i = 0; i < estate->es_range_table_size; i++)
+ foreach(lc, estate->es_opened_relations)
{
- if (estate->es_relations[i])
- table_close(estate->es_relations[i], NoLock);
+ Relation rel = lfirst(lc);
+
+ table_close(rel, NoLock);
}
}
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index af92d2b3c3..f0320cfa34 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -837,6 +837,8 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
}
estate->es_relations[rti - 1] = rel;
+ estate->es_opened_relations = lappend(estate->es_opened_relations,
+ rel);
}
return rel;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index f0c5177b06..be06c40766 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -619,6 +619,8 @@ typedef struct EState
Index es_range_table_size; /* size of the range table arrays */
Relation *es_relations; /* Array of per-range-table-entry Relation
* pointers, or NULL if not yet opened */
+ List *es_opened_relations; /* List of non-NULL entries in
+ * es_relations in no specific order */
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
--
2.35.3
v41-0002-Set-inFromCl-to-false-in-child-table-RTEs.patch (application/x-patch)
From 02f7fa054649fda593cd17fc2c0cb40543cf9bae Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Tue, 4 Jul 2023 22:36:43 +0900
Subject: [PATCH v41 2/4] Set inFromCl to false in child table RTEs
This is to allow the executor to distinguish tables that are
directly mentioned in the query from those that get added to the
query during planning. A subsequent commit will teach the executor
to lock only the tables of the latter kind when executing a cached
plan.
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
src/backend/optimizer/util/inherit.c | 6 ++++++
src/backend/parser/analyze.c | 7 +++----
src/include/nodes/parsenodes.h | 9 +++++++--
3 files changed, 16 insertions(+), 6 deletions(-)
diff --git a/src/backend/optimizer/util/inherit.c b/src/backend/optimizer/util/inherit.c
index 94de855a22..9bac07bf40 100644
--- a/src/backend/optimizer/util/inherit.c
+++ b/src/backend/optimizer/util/inherit.c
@@ -492,6 +492,12 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
}
else
childrte->inh = false;
+ /*
+ * Mark child tables as not being directly mentioned in the query. This
+ * allows the executor's ExecGetRangeTableRelation() to conveniently
+ * identify them as inheritance child tables.
+ */
+ childrte->inFromCl = false;
childrte->securityQuals = NIL;
/*
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 4006632092..bcf6fcdde2 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -3267,10 +3267,9 @@ transformLockingClause(ParseState *pstate, Query *qry, LockingClause *lc,
/*
* Lock all regular tables used in query and its subqueries. We
* examine inFromCl to exclude auto-added RTEs, particularly NEW/OLD
- * in rules. This is a bit of an abuse of a mostly-obsolete flag, but
- * it's convenient. We can't rely on the namespace mechanism that has
- * largely replaced inFromCl, since for example we need to lock
- * base-relation RTEs even if they are masked by upper joins.
+ * in rules. We can't rely on the namespace mechanism since for
+ * example we need to lock base-relation RTEs even if they are masked
+ * by upper joins.
*/
i = 0;
foreach(rt, qry->rtable)
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index efb5c3e098..a891272e6e 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -994,11 +994,16 @@ typedef struct PartitionCmd
*
* inFromCl marks those range variables that are listed in the FROM clause.
* It's false for RTEs that are added to a query behind the scenes, such
- * as the NEW and OLD variables for a rule, or the subqueries of a UNION.
+ * as the NEW and OLD variables for a rule, or the subqueries of a UNION,
+ * or the RTEs of inheritance child tables that are added by the planner.
* This flag is not used during parsing (except in transformLockingClause,
* q.v.); the parser now uses a separate "namespace" data structure to
* control visibility. But it is needed by ruleutils.c to determine
- * whether RTEs should be shown in decompiled queries.
+ * whether RTEs should be shown in decompiled queries. It is also used by
+ * the executor to determine whether a given RTE_RELATION entry belongs to
+ * a table directly mentioned in the query or to a child table added by the
+ * planner. It needs to know this when the child tables in a plan must be
+ * locked.
*
* securityQuals is a list of security barrier quals (boolean expressions),
* to be tested in the listed order before returning a row from the
--
2.35.3
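For reference, a minimal sketch of the executor-side check this flag enables (not the patch's actual code; exec_rt_fetch() and table_open() are the existing APIs, and the real ExecGetRangeTableRelation() in the 0003 patch has more to consider):

#include "access/table.h"
#include "executor/executor.h"

/*
 * Sketch only: open the relation for range table entry 'rti', taking a
 * lock only if it is a planner-added child table.  Tables with inFromCl
 * set were already locked when the cached plan was handed out.
 */
static Relation
open_rangetable_rel_sketch(EState *estate, Index rti)
{
    RangeTblEntry *rte = exec_rt_fetch(rti, estate);

    Assert(rte->rtekind == RTE_RELATION);

    if (rte->inFromCl)
        return table_open(rte->relid, NoLock);  /* lock already held */

    /* Inheritance child added by the planner; lock it now. */
    return table_open(rte->relid, rte->rellockmode);
}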
On Thu, 13 Jul 2023 at 13:59, Amit Langote <amitlangote09@gmail.com> wrote:
> In an absolutely brown-paper-bag moment, I realized that I had not
> updated src/backend/executor/README to reflect the changes to the
> executor's control flow that this patch makes. That is, after
> scrapping the old design back in January whose details *were*
> reflected in the patches before that redesign.
>
> Anyway, the attached fixes that.
>
> Tom, do you think you have bandwidth in the near future to give this
> another look? I think I've addressed the comments that you had given
> back in April, though as mentioned in the previous message, there may
> still be some funny-looking aspects still remaining. In any case, I
> have no intention of pressing ahead with the patch without another
> committer having had a chance to sign off on it.
I've only just started taking a look at this, and my first test drive
yields very impressive results:
8192 partitions (3 runs, 10000 rows):
Head      391.294989    382.622481    379.252236
Patched   13088.145995  13406.135531  13431.828051
Looking at your changes to README, I would like to suggest rewording
the following:
+table during planning. This means that inheritance child tables, which are
+added to the query's range table during planning, if they are present in a
+cached plan tree would not have been locked.
To:
This means that inheritance child tables present in a cached plan
tree, which are added to the query's range table during planning,
would not have been locked.
Also, further down:
s/intiatialize/initialize/
I'll carry on taking a closer look and see if I can break it.
Thom
Hi Thom,
On Tue, Jul 18, 2023 at 1:33 AM Thom Brown <thom@linux.com> wrote:
On Thu, 13 Jul 2023 at 13:59, Amit Langote <amitlangote09@gmail.com> wrote:
> > In an absolutely brown-paper-bag moment, I realized that I had not
> > updated src/backend/executor/README to reflect the changes to the
> > executor's control flow that this patch makes. That is, after
> > scrapping the old design back in January whose details *were*
> > reflected in the patches before that redesign.
> >
> > Anyway, the attached fixes that.
> >
> > Tom, do you think you have bandwidth in the near future to give this
> > another look? I think I've addressed the comments that you had given
> > back in April, though as mentioned in the previous message, there may
> > still be some funny-looking aspects still remaining. In any case, I
> > have no intention of pressing ahead with the patch without another
> > committer having had a chance to sign off on it.
>
> I've only just started taking a look at this, and my first test drive
> yields very impressive results:
>
> 8192 partitions (3 runs, 10000 rows)
> Head      391.294989    382.622481    379.252236
> Patched   13088.145995  13406.135531  13431.828051
Just to be sure, did you use pgbench -M prepared with plan_cache_mode
= force_generic_plan in postgresql.conf?
> Looking at your changes to README, I would like to suggest rewording
> the following:
>
> +table during planning. This means that inheritance child tables, which are
> +added to the query's range table during planning, if they are present in a
> +cached plan tree would not have been locked.
>
> To:
>
> This means that inheritance child tables present in a cached plan
> tree, which are added to the query's range table during planning,
> would not have been locked.
>
> Also, further down:
>
> s/intiatialize/initialize/
>
> I'll carry on taking a closer look and see if I can break it.
Thanks for looking. I've fixed these issues in the attached updated
patch. I've also changed the position of a newly added paragraph in
src/backend/executor/README so that it doesn't break the flow of the
existing text.
Thanks,

--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v42-0001-Add-field-to-store-parent-relids-to-Append-Merge.patch (application/octet-stream)
From b6413e324c5be9273a3c33aa026c06cdfb710da7 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Tue, 4 Jul 2023 22:36:31 +0900
Subject: [PATCH v42 1/4] Add field to store parent relids to
Append/MergeAppend
There's no way currently in the executor to tell if the child
subplans of Append/MergeAppend are scanning partitions, and if
they indeed do, what the RT indexes of their parent/ancestor tables
are. The executor doesn't need to see their RT indexes except for
run-time pruning, in which case they can be found in the
PartitionPruneInfo, but a future commit will create a need for
them to be available at all times for the purpose of locking
those parent/ancestor tables when executing a cached plan.
The code to look up partitioned parent relids for a given list of
partition scan subpaths of an Append/MergeAppend is already present
in make_partition_pruneinfo() but it's local to partprune.c. This
commit refactors that code into its own function called
add_append_subpath_partrelids() defined in appendinfo.c and
generalizes it to consider child join and aggregate paths. To
facilitate looking up the parent rels of child grouping rels in
add_append_subpath_partrelids(), parent links are now set in
the RelOptInfos of child grouping rels too, like they are in
those of child base and join rels.
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
src/backend/optimizer/plan/createplan.c | 41 ++++++--
src/backend/optimizer/plan/planner.c | 3 +
src/backend/optimizer/plan/setrefs.c | 4 +
src/backend/optimizer/util/appendinfo.c | 134 ++++++++++++++++++++++++
src/backend/partitioning/partprune.c | 124 +++-------------------
src/include/nodes/plannodes.h | 14 +++
src/include/optimizer/appendinfo.h | 3 +
src/include/partitioning/partprune.h | 3 +-
8 files changed, 203 insertions(+), 123 deletions(-)
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index af48109058..8ac1d3909b 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -25,6 +25,7 @@
#include "nodes/extensible.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
+#include "optimizer/appendinfo.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/optimizer.h"
@@ -1210,6 +1211,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
Oid *nodeCollations = NULL;
bool *nodeNullsFirst = NULL;
bool consider_async = false;
+ List *allpartrelids = NIL;
/*
* The subpaths list could be empty, if every child was proven empty by
@@ -1351,15 +1353,23 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
++nasyncplans;
}
+ /*
+ * Find partitioned parent rel(s) of the subpath's rel(s).
+ */
+ allpartrelids = add_append_subpath_partrelids(root, subpath, rel,
+ allpartrelids);
+
subplans = lappend(subplans, subplan);
}
+ plan->allpartrelids = allpartrelids;
+
/*
- * If any quals exist, they may be useful to perform further partition
- * pruning during execution. Gather information needed by the executor to
- * do partition pruning.
+ * If scanning partitions, check if there are quals that may be useful to
+ * perform further partition pruning during execution. Gather information
+ * needed by the executor to do partition pruning.
*/
- if (enable_partition_pruning)
+ if (enable_partition_pruning && allpartrelids != NIL)
{
List *prunequal;
@@ -1380,7 +1390,8 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
partpruneinfo =
make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ allpartrelids);
}
plan->appendplans = subplans;
@@ -1426,6 +1437,7 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
PartitionPruneInfo *partpruneinfo = NULL;
+ List *allpartrelids = NIL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1515,15 +1527,23 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplan = (Plan *) sort;
}
+ /*
+ * Find partitioned parent rel(s) of the subpath's rel(s).
+ */
+ allpartrelids = add_append_subpath_partrelids(root, subpath, rel,
+ allpartrelids);
+
subplans = lappend(subplans, subplan);
}
+ node->allpartrelids = allpartrelids;
+
/*
- * If any quals exist, they may be useful to perform further partition
- * pruning during execution. Gather information needed by the executor to
- * do partition pruning.
+ * If scanning partitions, check if there are quals that may be useful to
+ * perform further partition pruning during execution. Gather information
+ * needed by the executor to do partition pruning.
*/
- if (enable_partition_pruning)
+ if (enable_partition_pruning && allpartrelids != NIL)
{
List *prunequal;
@@ -1535,7 +1555,8 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
if (prunequal != NIL)
partpruneinfo = make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ allpartrelids);
}
node->mergeplans = subplans;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 44efb1f4eb..f97bc09113 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -7855,8 +7855,11 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
agg_costs, gd, &child_extra,
&child_partially_grouped_rel);
+ /* Mark as child of grouped_rel. */
+ child_grouped_rel->parent = grouped_rel;
if (child_partially_grouped_rel)
{
+ child_partially_grouped_rel->parent = grouped_rel;
partially_grouped_live_children =
lappend(partially_grouped_live_children,
child_partially_grouped_rel);
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 97fa561e4e..854dd7c8af 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1766,6 +1766,8 @@ set_append_references(PlannerInfo *root,
set_dummy_tlist_references((Plan *) aplan, rtoffset);
aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
+ foreach(l, aplan->allpartrelids)
+ lfirst(l) = offset_relid_set((Relids) lfirst(l), rtoffset);
if (aplan->part_prune_info)
{
@@ -1842,6 +1844,8 @@ set_mergeappend_references(PlannerInfo *root,
set_dummy_tlist_references((Plan *) mplan, rtoffset);
mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
+ foreach(l, mplan->allpartrelids)
+ lfirst(l) = offset_relid_set((Relids) lfirst(l), rtoffset);
if (mplan->part_prune_info)
{
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index f456b3b0a4..5bd8e82b9b 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -41,6 +41,7 @@ static void make_inh_translation_list(Relation oldrelation,
AppendRelInfo *appinfo);
static Node *adjust_appendrel_attrs_mutator(Node *node,
adjust_appendrel_attrs_context *context);
+static List *add_part_relids(List *allpartrelids, Bitmapset *partrelids);
/*
@@ -1035,3 +1036,136 @@ distribute_row_identity_vars(PlannerInfo *root)
}
}
}
+
+/*
+ * add_append_subpath_partrelids
+ * Look up a child subpath's rel's partitioned parent relids up to
+ * parentrel and add the bitmapset containing those into
+ * 'allpartrelids'
+ */
+List *
+add_append_subpath_partrelids(PlannerInfo *root, Path *subpath,
+ RelOptInfo *parentrel,
+ List *allpartrelids)
+{
+ RelOptInfo *prel = subpath->parent;
+ Relids partrelids = NULL;
+
+ /* Nothing to do if there's no parent to begin with. */
+ if (!IS_OTHER_REL(prel))
+ return allpartrelids;
+
+ /*
+ * Traverse up to the pathrel's topmost partitioned parent, collecting
+ * parent relids as we go; but stop if we reach parentrel. (Normally, a
+ * pathrel's topmost partitioned parent is either parentrel or a UNION ALL
+ * appendrel child of parentrel. But when handling partitionwise joins of
+ * multi-level partitioning trees, we can see an append path whose
+ * parentrel is an intermediate partitioned table.)
+ */
+ do
+ {
+ Relids parent_relids = NULL;
+
+ /*
+ * For simple child rels, we can simply set the parent_relids to
+ * prel->parent->relids. But for partitionwise join and aggregate
+ * child rels, while we can use prel->parent to move up the tree,
+ * parent_relids must be found the hard way through AppendRelInfos,
+ * because 1) a joinrel's relids may point to RTE_JOIN entries, and
+ * 2) the topmost parent grouping rel's relids field is NULL.
+ */
+ if (IS_SIMPLE_REL(prel))
+ {
+ prel = prel->parent;
+ /* Stop once we reach the root partitioned rel. */
+ if (!IS_PARTITIONED_REL(prel))
+ break;
+ parent_relids = bms_add_members(parent_relids, prel->relids);
+ }
+ else
+ {
+ AppendRelInfo **appinfos;
+ int nappinfos,
+ i;
+
+ appinfos = find_appinfos_by_relids(root, prel->relids,
+ &nappinfos);
+ for (i = 0; i < nappinfos; i++)
+ {
+ AppendRelInfo *appinfo = appinfos[i];
+
+ parent_relids = bms_add_member(parent_relids,
+ appinfo->parent_relid);
+ }
+ pfree(appinfos);
+ prel = prel->parent;
+ }
+ /* accept this level as an interesting parent */
+ partrelids = bms_add_members(partrelids, parent_relids);
+ if (prel == parentrel)
+ break; /* don't traverse above parentrel */
+ } while (IS_OTHER_REL(prel));
+
+ if (partrelids == NULL)
+ return allpartrelids;
+
+ return add_part_relids(allpartrelids, partrelids);
+}
+
+/*
+ * add_part_relids
+ * Add new info to a list of Bitmapsets of partitioned relids.
+ *
+ * Within 'allpartrelids', there is one Bitmapset for each topmost parent
+ * partitioned rel. Each Bitmapset contains the RT indexes of the topmost
+ * parent as well as its relevant non-leaf child partitions. Since (by
+ * construction of the rangetable list) parent partitions must have lower
+ * RT indexes than their children, we can distinguish the topmost parent
+ * as being the lowest set bit in the Bitmapset.
+ *
+ * 'partrelids' contains the RT indexes of a parent partitioned rel, and
+ * possibly some non-leaf children, that are newly identified as parents of
+ * some subpath rel passed to make_partition_pruneinfo(). These are added
+ * to an appropriate member of 'allpartrelids'.
+ *
+ * Note that the list contains only RT indexes of partitioned tables that
+ * are parents of some scan-level relation appearing in the 'subpaths' that
+ * make_partition_pruneinfo() is dealing with. Also, "topmost" parents are
+ * not allowed to be higher than the 'parentrel' associated with the append
+ * path. In this way, we avoid expending cycles on partitioned rels that
+ * can't contribute useful pruning information for the problem at hand.
+ * (It is possible for 'parentrel' to be a child partitioned table, and it
+ * is also possible for scan-level relations to be child partitioned tables
+ * rather than leaf partitions. Hence we must construct this relation set
+ * with reference to the particular append path we're dealing with, rather
+ * than looking at the full partitioning structure represented in the
+ * RelOptInfos.)
+ */
+static List *
+add_part_relids(List *allpartrelids, Bitmapset *partrelids)
+{
+ Index targetpart;
+ ListCell *lc;
+
+ /* We can easily get the lowest set bit this way: */
+ targetpart = bms_next_member(partrelids, -1);
+ Assert(targetpart > 0);
+
+ /* Look for a matching topmost parent */
+ foreach(lc, allpartrelids)
+ {
+ Bitmapset *currpartrelids = (Bitmapset *) lfirst(lc);
+ Index currtarget = bms_next_member(currpartrelids, -1);
+
+ if (targetpart == currtarget)
+ {
+ /* Found a match, so add any new RT indexes to this hierarchy */
+ currpartrelids = bms_add_members(currpartrelids, partrelids);
+ lfirst(lc) = currpartrelids;
+ return allpartrelids;
+ }
+ }
+ /* No match, so add the new partition hierarchy to the list */
+ return lappend(allpartrelids, partrelids);
+}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 7179b22a05..213512a5f4 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -138,7 +138,6 @@ typedef struct PruneStepResult
} PruneStepResult;
-static List *add_part_relids(List *allpartrelids, Bitmapset *partrelids);
static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
RelOptInfo *parentrel,
List *prunequal,
@@ -218,33 +217,32 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
+ * 'allpartrelids' contains Bitmapsets of RT indexes of partitioned parents
+ * whose partitions' Paths are in 'subpaths'; there's one Bitmapset for every
+ * partition tree involved.
*/
PartitionPruneInfo *
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
- List *prunequal)
+ List *prunequal,
+ List *allpartrelids)
{
PartitionPruneInfo *pruneinfo;
Bitmapset *allmatchedsubplans = NULL;
- List *allpartrelids;
List *prunerelinfos;
int *relid_subplan_map;
ListCell *lc;
int i;
+ Assert(list_length(allpartrelids) > 0);
+
/*
- * Scan the subpaths to see which ones are scans of partition child
- * relations, and identify their parent partitioned rels. (Note: we must
- * restrict the parent partitioned rels to be parentrel or children of
- * parentrel, otherwise we couldn't translate prunequal to match.)
- *
- * Also construct a temporary array to map from partition-child-relation
- * relid to the index in 'subpaths' of the scan plan for that partition.
+ * Construct a temporary array to map from partition-child-relation relid
+ * to the index in 'subpaths' of the scan plan for that partition.
* (Use of "subplan" rather than "subpath" is a bit of a misnomer, but
* we'll let it stand.) For convenience, we use 1-based indexes here, so
* that zero can represent an un-filled array entry.
*/
- allpartrelids = NIL;
relid_subplan_map = palloc0(sizeof(int) * root->simple_rel_array_size);
i = 1;
@@ -253,50 +251,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Path *path = (Path *) lfirst(lc);
RelOptInfo *pathrel = path->parent;
- /* We don't consider partitioned joins here */
- if (pathrel->reloptkind == RELOPT_OTHER_MEMBER_REL)
- {
- RelOptInfo *prel = pathrel;
- Bitmapset *partrelids = NULL;
-
- /*
- * Traverse up to the pathrel's topmost partitioned parent,
- * collecting parent relids as we go; but stop if we reach
- * parentrel. (Normally, a pathrel's topmost partitioned parent
- * is either parentrel or a UNION ALL appendrel child of
- * parentrel. But when handling partitionwise joins of
- * multi-level partitioning trees, we can see an append path whose
- * parentrel is an intermediate partitioned table.)
- */
- do
- {
- AppendRelInfo *appinfo;
-
- Assert(prel->relid < root->simple_rel_array_size);
- appinfo = root->append_rel_array[prel->relid];
- prel = find_base_rel(root, appinfo->parent_relid);
- if (!IS_PARTITIONED_REL(prel))
- break; /* reached a non-partitioned parent */
- /* accept this level as an interesting parent */
- partrelids = bms_add_member(partrelids, prel->relid);
- if (prel == parentrel)
- break; /* don't traverse above parentrel */
- } while (prel->reloptkind == RELOPT_OTHER_MEMBER_REL);
-
- if (partrelids)
- {
- /*
- * Found some relevant parent partitions, which may or may not
- * overlap with partition trees we already found. Add new
- * information to the allpartrelids list.
- */
- allpartrelids = add_part_relids(allpartrelids, partrelids);
- /* Also record the subplan in relid_subplan_map[] */
- /* No duplicates please */
- Assert(relid_subplan_map[pathrel->relid] == 0);
- relid_subplan_map[pathrel->relid] = i;
- }
- }
+ /* No duplicates please */
+ Assert(relid_subplan_map[pathrel->relid] == 0);
+ relid_subplan_map[pathrel->relid] = i;
i++;
}
@@ -362,63 +319,6 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
return pruneinfo;
}
-/*
- * add_part_relids
- * Add new info to a list of Bitmapsets of partitioned relids.
- *
- * Within 'allpartrelids', there is one Bitmapset for each topmost parent
- * partitioned rel. Each Bitmapset contains the RT indexes of the topmost
- * parent as well as its relevant non-leaf child partitions. Since (by
- * construction of the rangetable list) parent partitions must have lower
- * RT indexes than their children, we can distinguish the topmost parent
- * as being the lowest set bit in the Bitmapset.
- *
- * 'partrelids' contains the RT indexes of a parent partitioned rel, and
- * possibly some non-leaf children, that are newly identified as parents of
- * some subpath rel passed to make_partition_pruneinfo(). These are added
- * to an appropriate member of 'allpartrelids'.
- *
- * Note that the list contains only RT indexes of partitioned tables that
- * are parents of some scan-level relation appearing in the 'subpaths' that
- * make_partition_pruneinfo() is dealing with. Also, "topmost" parents are
- * not allowed to be higher than the 'parentrel' associated with the append
- * path. In this way, we avoid expending cycles on partitioned rels that
- * can't contribute useful pruning information for the problem at hand.
- * (It is possible for 'parentrel' to be a child partitioned table, and it
- * is also possible for scan-level relations to be child partitioned tables
- * rather than leaf partitions. Hence we must construct this relation set
- * with reference to the particular append path we're dealing with, rather
- * than looking at the full partitioning structure represented in the
- * RelOptInfos.)
- */
-static List *
-add_part_relids(List *allpartrelids, Bitmapset *partrelids)
-{
- Index targetpart;
- ListCell *lc;
-
- /* We can easily get the lowest set bit this way: */
- targetpart = bms_next_member(partrelids, -1);
- Assert(targetpart > 0);
-
- /* Look for a matching topmost parent */
- foreach(lc, allpartrelids)
- {
- Bitmapset *currpartrelids = (Bitmapset *) lfirst(lc);
- Index currtarget = bms_next_member(currpartrelids, -1);
-
- if (targetpart == currtarget)
- {
- /* Found a match, so add any new RT indexes to this hierarchy */
- currpartrelids = bms_add_members(currpartrelids, partrelids);
- lfirst(lc) = currpartrelids;
- return allpartrelids;
- }
- }
- /* No match, so add the new partition hierarchy to the list */
- return lappend(allpartrelids, partrelids);
-}
-
/*
* make_partitionedrel_pruneinfo
* Build a List of PartitionedRelPruneInfos, one for each interesting
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 1b787fe031..7a5f3ba625 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -267,6 +267,13 @@ typedef struct Append
List *appendplans;
int nasyncplans; /* # of asynchronous plans */
+ /*
+ * RTIs of all partitioned tables whose children are scanned by
+ * appendplans. The list contains a bitmapset for every partition tree
+ * covered by this Append.
+ */
+ List *allpartrelids;
+
/*
* All 'appendplans' preceding this index are non-partial plans. All
* 'appendplans' from this index onwards are partial plans.
@@ -291,6 +298,13 @@ typedef struct MergeAppend
List *mergeplans;
+ /*
+ * RTIs of all partitioned tables whose children are scanned by
+ * mergeplans. The list contains a bitmapset for every partition tree
+ * covered by this MergeAppend.
+ */
+ List *allpartrelids;
+
/* these fields are just like the sort-key info in struct Sort: */
/* number of sort-key columns */
diff --git a/src/include/optimizer/appendinfo.h b/src/include/optimizer/appendinfo.h
index a05f91f77d..1621a7319a 100644
--- a/src/include/optimizer/appendinfo.h
+++ b/src/include/optimizer/appendinfo.h
@@ -46,5 +46,8 @@ extern void add_row_identity_columns(PlannerInfo *root, Index rtindex,
RangeTblEntry *target_rte,
Relation target_relation);
extern void distribute_row_identity_vars(PlannerInfo *root);
+extern List *add_append_subpath_partrelids(PlannerInfo *root, Path *subpath,
+ RelOptInfo *parentrel,
+ List *allpartrelids);
#endif /* APPENDINFO_H */
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 8636e04e37..caa774a111 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -73,7 +73,8 @@ typedef struct PartitionPruneContext
extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
struct RelOptInfo *parentrel,
List *subpaths,
- List *prunequal);
+ List *prunequal,
+ List *allpartrelids);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
--
2.35.3
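Tangentially, the lowest-set-bit convention that add_part_relids() continues to rely on is easy to demonstrate with the bitmapset API; a standalone sketch with made-up RT indexes:

#include "postgres.h"
#include "nodes/bitmapset.h"

/*
 * Sketch: because parents get lower RT indexes than their children, the
 * topmost parent of a partition tree is simply the set's lowest member.
 */
static void
topmost_parent_demo(void)
{
    Bitmapset  *partrelids = NULL;

    /* Hypothetical RT indexes: 3 is the root, 5 and 7 are non-leaf children. */
    partrelids = bms_add_member(partrelids, 5);
    partrelids = bms_add_member(partrelids, 7);
    partrelids = bms_add_member(partrelids, 3);

    /* bms_next_member(set, -1) yields the smallest member, i.e. 3 here. */
    elog(DEBUG1, "topmost parent RTI = %d",
         bms_next_member(partrelids, -1));
}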
v42-0004-Track-opened-range-table-relations-in-a-List-in-.patch (application/octet-stream)
From b9b750fd6323342672613c7efcdb47287264baa4 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Tue, 4 Jul 2023 22:36:49 +0900
Subject: [PATCH v42 4/4] Track opened range table relations in a List in
EState
This makes ExecCloseRangeTableRelations faster when there are many
relations in the range table but only a few are opened during
execution, such as when run-time pruning kicks in on an Append
containing thousands of partition subplans.
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
src/backend/executor/execMain.c | 9 +++++----
src/backend/executor/execUtils.c | 2 ++
src/include/nodes/execnodes.h | 2 ++
3 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index a2f6ac9d1c..053d8a2dc2 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1650,12 +1650,13 @@ ExecCloseResultRelations(EState *estate)
void
ExecCloseRangeTableRelations(EState *estate)
{
- int i;
+ ListCell *lc;
- for (i = 0; i < estate->es_range_table_size; i++)
+ foreach(lc, estate->es_opened_relations)
{
- if (estate->es_relations[i])
- table_close(estate->es_relations[i], NoLock);
+ Relation rel = lfirst(lc);
+
+ table_close(rel, NoLock);
}
}
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index af92d2b3c3..f0320cfa34 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -837,6 +837,8 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
}
estate->es_relations[rti - 1] = rel;
+ estate->es_opened_relations = lappend(estate->es_opened_relations,
+ rel);
}
return rel;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index f0c5177b06..be06c40766 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -619,6 +619,8 @@ typedef struct EState
Index es_range_table_size; /* size of the range table arrays */
Relation *es_relations; /* Array of per-range-table-entry Relation
* pointers, or NULL if not yet opened */
+ List *es_opened_relations; /* List of non-NULL entries in
+ * es_relations in no specific order */
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
--
2.35.3
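The underlying pattern, stated in isolation: remember exactly what was opened so that cleanup visits only those entries instead of scanning the whole range-table-sized array. A standalone sketch with hypothetical helpers (want_to_scan() and rti_to_relid() don't exist; lock modes are elided):

#include "postgres.h"
#include "access/table.h"
#include "nodes/pg_list.h"

/*
 * Sketch: open only the relations that survive pruning, remembering each
 * one in a list; closing then costs O(#opened), not O(range-table size).
 */
static void
open_and_close_sketch(int range_table_size)
{
    List       *opened = NIL;
    ListCell   *lc;

    for (int rti = 1; rti <= range_table_size; rti++)
    {
        if (!want_to_scan(rti))     /* hypothetical pruning check */
            continue;
        opened = lappend(opened, table_open(rti_to_relid(rti), NoLock));
    }

    foreach(lc, opened)
        table_close((Relation) lfirst(lc), NoLock);
}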
v42-0002-Set-inFromCl-to-false-in-child-table-RTEs.patch (application/octet-stream)
From a16ff85f49bd4d67413b7397fe2ad2fd642f0284 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Tue, 4 Jul 2023 22:36:43 +0900
Subject: [PATCH v42 2/4] Set inFromCl to false in child table RTEs
This is to allow the executor to distinguish tables that are
directly mentioned in the query from those that get added to the
query during planning. A subsequent commit will teach the executor
to lock only the tables of the latter kind when executing a cached
plan.
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
src/backend/optimizer/util/inherit.c | 6 ++++++
src/backend/parser/analyze.c | 7 +++----
src/include/nodes/parsenodes.h | 9 +++++++--
3 files changed, 16 insertions(+), 6 deletions(-)
diff --git a/src/backend/optimizer/util/inherit.c b/src/backend/optimizer/util/inherit.c
index 94de855a22..9bac07bf40 100644
--- a/src/backend/optimizer/util/inherit.c
+++ b/src/backend/optimizer/util/inherit.c
@@ -492,6 +492,12 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
}
else
childrte->inh = false;
+ /*
+ * Mark child tables as not being directly mentioned in the query. This
+ * allows the executor's ExecGetRangeTableRelation() to conveniently
+ * identify them as inheritance child tables.
+ */
+ childrte->inFromCl = false;
childrte->securityQuals = NIL;
/*
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 4006632092..bcf6fcdde2 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -3267,10 +3267,9 @@ transformLockingClause(ParseState *pstate, Query *qry, LockingClause *lc,
/*
* Lock all regular tables used in query and its subqueries. We
* examine inFromCl to exclude auto-added RTEs, particularly NEW/OLD
- * in rules. This is a bit of an abuse of a mostly-obsolete flag, but
- * it's convenient. We can't rely on the namespace mechanism that has
- * largely replaced inFromCl, since for example we need to lock
- * base-relation RTEs even if they are masked by upper joins.
+ * in rules. We can't rely on the namespace mechanism since for
+ * example we need to lock base-relation RTEs even if they are masked
+ * by upper joins.
*/
i = 0;
foreach(rt, qry->rtable)
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index efb5c3e098..a891272e6e 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -994,11 +994,16 @@ typedef struct PartitionCmd
*
* inFromCl marks those range variables that are listed in the FROM clause.
* It's false for RTEs that are added to a query behind the scenes, such
- * as the NEW and OLD variables for a rule, or the subqueries of a UNION.
+ * as the NEW and OLD variables for a rule, or the subqueries of a UNION,
+ * or the RTEs of inheritance child tables that are added by the planner.
* This flag is not used during parsing (except in transformLockingClause,
* q.v.); the parser now uses a separate "namespace" data structure to
* control visibility. But it is needed by ruleutils.c to determine
- * whether RTEs should be shown in decompiled queries.
+ * whether RTEs should be shown in decompiled queries. It is also used by
+ * the executor to determine whether a given RTE_RELATION entry belongs to
+ * a table directly mentioned in the query or to a child table added by the
+ * planner. It needs to know this when the child tables in a plan must be
+ * locked.
*
* securityQuals is a list of security barrier quals (boolean expressions),
* to be tested in the listed order before returning a row from the
--
2.35.3
v42-0003-Delay-locking-of-child-tables-in-cached-plans-un.patch (application/octet-stream)
From 731074be2036f02d081fe7c731e21330b6b47b4a Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Tue, 4 Jul 2023 22:36:45 +0900
Subject: [PATCH v42 3/4] Delay locking of child tables in cached plans until
ExecutorStart()
Currently, GetCachedPlan() takes a lock on all relations contained in
a cached plan before returning it as a valid plan to its callers for
execution. One disadvantage is that if the plan contains partitions
that are prunable with conditions involving EXTERN parameters and
other stable expressions (known as "initial pruning"), many of them
would be locked unnecessarily, because only those that survive
initial pruning need to have been locked. Locking all partitions this
way causes significant delay when there are many partitions. Note
that initial pruning occurs during the executor's initialization of the
plan, that is, InitPlan().
This commit rearranges things to move the locking of child tables
referenced in a cached plan to occur during InitPlan() so that
initial pruning can eliminate any child tables that need not be
scanned and thus locked.
To determine that a given table is a child table,
ExecGetRangeTableRelation() now looks at the RTE's inFromCl field,
which is true only for tables that are directly mentioned in the
query and false for child tables. Note that any tables whose RTEs'
inFromCl is true would already have been locked by GetCachedPlan(),
so need not be locked again during execution.
If the locking of child tables causes the CachedPlan to go stale, that
is, its is_valid field set to false by PlanCacheRelCallback() when an
invalidation message matching some child table contained in the plan
is processed, ExecInitNode() abandons the initialization of the
remaining nodes in the plan tree. In that case, InitPlan() returns
after setting QueryDesc.planstate to NULL to indicate to the caller
that no execution is possible with the plan tree as is. Some
plan tree subnodes may get fully initialized by ExecInitNode() before
the CachedPlan's invalidation is detected, so to ensure that they
are released by ExecEndPlan(), ExecInitNode() now adds the PlanState
nodes of the nodes that are fully initialized to a new List in
EState called es_inited_plannodes. ExecEndPlan() releases them
individually by calling ExecEndNode() on each element of the new
List. ExecEndNode() is no longer recursive, because all nodes that
need to be closed can be found in es_inited_plannodes.
Call sites that use GetCachedPlan() to get the plan trees to pass to
the executor should now be prepared to handle the case where the old
CachedPlan gets invalidated during ExecutorStart() as described
above. So this commit refactors the relevant code sites to move the
ExecutorStart() call closer to the GetCachedPlan() to implement the
replan loop conveniently.
Given this new behavior, PortalStart() now must always perform
ExecutorStart() to be able to drop and recreate cached plans if
needed, whereas previously it did so only for single-query portals.
For multi-query portals, the QueryDescs that are now created during
PortalStart() are remembered in a new List field of Portal called
'qdescs' and allocated in a new memory context 'queryContext'.
PortalRunMulti() now simply performs ExecutorRun() on the
QueryDescs found in 'qdescs'.
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
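In caller terms, the resulting retry loop looks roughly like this (condensed from the ExecuteQuery() changes further down in this patch; the elision skips the portal setup details):

replan:
    /* Create a new portal and start it; PortalStart() runs ExecutorStart(). */
    portal = CreateNewPortal();
    ...
    PortalStart(portal, paramLI, eflags, GetActiveSnapshot());
    if (!portal->plan_valid)
    {
        /* The cached plan went stale during InitPlan(); redo from scratch. */
        PortalDrop(portal, false);
        goto replan;
    }
    (void) PortalRun(portal, count, false, true, dest, dest, qc);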
---
contrib/postgres_fdw/postgres_fdw.c | 4 +
src/backend/commands/copyto.c | 4 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 145 +++++---
src/backend/commands/extension.c | 2 +
src/backend/commands/matview.c | 3 +-
src/backend/commands/portalcmds.c | 16 +-
src/backend/commands/prepare.c | 32 +-
src/backend/executor/README | 41 ++-
src/backend/executor/execMain.c | 106 +++++-
src/backend/executor/execParallel.c | 12 +-
src/backend/executor/execPartition.c | 14 +
src/backend/executor/execProcnode.c | 50 ++-
src/backend/executor/execUtils.c | 63 +++-
src/backend/executor/functions.c | 2 +
src/backend/executor/nodeAgg.c | 6 +-
src/backend/executor/nodeAppend.c | 48 ++-
src/backend/executor/nodeBitmapAnd.c | 31 +-
src/backend/executor/nodeBitmapHeapscan.c | 9 +-
src/backend/executor/nodeBitmapIndexscan.c | 9 +-
src/backend/executor/nodeBitmapOr.c | 31 +-
src/backend/executor/nodeCustom.c | 2 +
src/backend/executor/nodeForeignscan.c | 8 +-
src/backend/executor/nodeGather.c | 4 +-
src/backend/executor/nodeGatherMerge.c | 3 +-
src/backend/executor/nodeGroup.c | 7 +-
src/backend/executor/nodeHash.c | 10 +-
src/backend/executor/nodeHashjoin.c | 10 +-
src/backend/executor/nodeIncrementalSort.c | 7 +-
src/backend/executor/nodeIndexonlyscan.c | 11 +-
src/backend/executor/nodeIndexscan.c | 11 +-
src/backend/executor/nodeLimit.c | 3 +-
src/backend/executor/nodeLockRows.c | 3 +-
src/backend/executor/nodeMaterial.c | 7 +-
src/backend/executor/nodeMemoize.c | 7 +-
src/backend/executor/nodeMergeAppend.c | 47 ++-
src/backend/executor/nodeMergejoin.c | 10 +-
src/backend/executor/nodeModifyTable.c | 12 +-
src/backend/executor/nodeNestloop.c | 10 +-
src/backend/executor/nodeProjectSet.c | 7 +-
src/backend/executor/nodeRecursiveunion.c | 10 +-
src/backend/executor/nodeResult.c | 7 +-
src/backend/executor/nodeSamplescan.c | 2 +
src/backend/executor/nodeSeqscan.c | 2 +
src/backend/executor/nodeSetOp.c | 4 +-
src/backend/executor/nodeSort.c | 7 +-
src/backend/executor/nodeSubqueryscan.c | 7 +-
src/backend/executor/nodeTidrangescan.c | 2 +
src/backend/executor/nodeTidscan.c | 2 +
src/backend/executor/nodeUnique.c | 4 +-
src/backend/executor/nodeWindowAgg.c | 6 +-
src/backend/executor/spi.c | 49 ++-
src/backend/storage/lmgr/lmgr.c | 45 +++
src/backend/tcop/postgres.c | 13 +-
src/backend/tcop/pquery.c | 340 +++++++++---------
src/backend/utils/cache/lsyscache.c | 21 ++
src/backend/utils/cache/plancache.c | 149 +++-----
src/backend/utils/mmgr/portalmem.c | 9 +
src/include/commands/explain.h | 7 +-
src/include/executor/execdesc.h | 5 +
src/include/executor/executor.h | 13 +
src/include/nodes/execnodes.h | 6 +
src/include/storage/lmgr.h | 1 +
src/include/utils/lsyscache.h | 1 +
src/include/utils/plancache.h | 14 +
src/include/utils/portal.h | 4 +
src/test/modules/delay_execution/Makefile | 3 +-
.../modules/delay_execution/delay_execution.c | 63 +++-
.../expected/cached-plan-replan.out | 156 ++++++++
.../specs/cached-plan-replan.spec | 61 ++++
70 files changed, 1233 insertions(+), 589 deletions(-)
create mode 100644 src/test/modules/delay_execution/expected/cached-plan-replan.out
create mode 100644 src/test/modules/delay_execution/specs/cached-plan-replan.spec
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index c5cada55fb..1edd4c3f17 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2658,7 +2658,11 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
/* Get info about foreign table. */
rtindex = node->resultRelInfo->ri_RangeTableIndex;
if (fsplan->scan.scanrelid == 0)
+ {
dmstate->rel = ExecOpenScanRelation(estate, rtindex, eflags);
+ if (!ExecPlanStillValid(estate))
+ return;
+ }
else
dmstate->rel = node->ss.ss_currentRelation;
table = GetForeignTable(RelationGetRelid(dmstate->rel));
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 9e4b2437a5..8244194681 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -558,7 +558,8 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL,
+ pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
@@ -569,6 +570,7 @@ BeginCopyTo(ParseState *pstate,
* ExecutorStart computes a result tupdesc for us
*/
ExecutorStart(cstate->queryDesc, 0);
+ Assert(cstate->queryDesc->plan_valid);
tupDesc = cstate->queryDesc->tupDesc;
}
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index e91920ca14..18b07c0200 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 8570b14f62..b1ea45ef2c 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -393,6 +393,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
else
{
PlannedStmt *plan;
+ QueryDesc *queryDesc;
instr_time planstart,
planduration;
BufferUsage bufusage_start,
@@ -415,12 +416,90 @@ ExplainOneQuery(Query *query, int cursorOptions,
BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
}
+ queryDesc = ExplainQueryDesc(plan, NULL, queryString, into, es,
+ params, queryEnv);
+ Assert(queryDesc);
+
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(queryDesc, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
+/*
+ * ExplainQueryDesc
+ * Set up QueryDesc for EXPLAINing a given plan
+ *
+ * This returns NULL if cplan is found to be no longer valid.
+ */
+QueryDesc *
+ExplainQueryDesc(PlannedStmt *stmt, CachedPlan *cplan,
+ const char *queryString, IntoClause *into, ExplainState *es,
+ ParamListInfo params, QueryEnvironment *queryEnv)
+{
+ QueryDesc *queryDesc;
+ DestReceiver *dest;
+ int eflags;
+ int instrument_option = 0;
+
+ /*
+ * Normally we discard the query's output, but if explaining CREATE TABLE
+ * AS, we'd better use the appropriate tuple receiver.
+ */
+ if (into)
+ dest = CreateIntoRelDestReceiver(into);
+ else
+ dest = None_Receiver;
+
+ if (es->analyze && es->timing)
+ instrument_option |= INSTRUMENT_TIMER;
+ else if (es->analyze)
+ instrument_option |= INSTRUMENT_ROWS;
+
+ if (es->buffers)
+ instrument_option |= INSTRUMENT_BUFFERS;
+ if (es->wal)
+ instrument_option |= INSTRUMENT_WAL;
+
+ /*
+ * Use a snapshot with an updated command ID to ensure this query sees
+ * results of any previously executed queries.
+ */
+ PushCopiedSnapshot(GetActiveSnapshot());
+ UpdateActiveSnapshotCommandId();
+
+ /* Create a QueryDesc for the query */
+ queryDesc = CreateQueryDesc(stmt, cplan, queryString,
+ GetActiveSnapshot(), InvalidSnapshot,
+ dest, params, queryEnv, instrument_option);
+
+ /* Select execution options */
+ if (es->analyze)
+ eflags = 0; /* default run-to-completion flags */
+ else
+ eflags = EXEC_FLAG_EXPLAIN_ONLY;
+ if (es->generic)
+ eflags |= EXEC_FLAG_EXPLAIN_GENERIC;
+ if (into)
+ eflags |= GetIntoRelEFlags(into);
+
+ /*
+ * Call ExecutorStart to prepare the plan for execution. A cached plan
+ * may get invalidated as we're doing that.
+ */
+ ExecutorStart(queryDesc, eflags);
+ if (!queryDesc->plan_valid)
+ {
+ /* Clean up. */
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+ PopActiveSnapshot();
+ return NULL;
+ }
+
+ return queryDesc;
+}
+
/*
* ExplainOneUtility -
* print out the execution plan for one utility statement
@@ -524,29 +603,16 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(QueryDesc *queryDesc,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
{
- DestReceiver *dest;
- QueryDesc *queryDesc;
instr_time starttime;
double totaltime = 0;
- int eflags;
- int instrument_option = 0;
-
- Assert(plannedstmt->commandType != CMD_UTILITY);
- if (es->analyze && es->timing)
- instrument_option |= INSTRUMENT_TIMER;
- else if (es->analyze)
- instrument_option |= INSTRUMENT_ROWS;
-
- if (es->buffers)
- instrument_option |= INSTRUMENT_BUFFERS;
- if (es->wal)
- instrument_option |= INSTRUMENT_WAL;
+ Assert(queryDesc->plannedstmt->commandType != CMD_UTILITY);
/*
* We always collect timing for the entire statement, even when node-level
@@ -555,40 +621,6 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
*/
INSTR_TIME_SET_CURRENT(starttime);
- /*
- * Use a snapshot with an updated command ID to ensure this query sees
- * results of any previously executed queries.
- */
- PushCopiedSnapshot(GetActiveSnapshot());
- UpdateActiveSnapshotCommandId();
-
- /*
- * Normally we discard the query's output, but if explaining CREATE TABLE
- * AS, we'd better use the appropriate tuple receiver.
- */
- if (into)
- dest = CreateIntoRelDestReceiver(into);
- else
- dest = None_Receiver;
-
- /* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, instrument_option);
-
- /* Select execution options */
- if (es->analyze)
- eflags = 0; /* default run-to-completion flags */
- else
- eflags = EXEC_FLAG_EXPLAIN_ONLY;
- if (es->generic)
- eflags |= EXEC_FLAG_EXPLAIN_GENERIC;
- if (into)
- eflags |= GetIntoRelEFlags(into);
-
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, eflags);
-
/* Execute the plan for statistics if asked for */
if (es->analyze)
{
@@ -4865,6 +4897,17 @@ ExplainDummyGroup(const char *objtype, const char *labelname, ExplainState *es)
}
}
+/*
+ * Discard output buffer for a fresh restart.
+ */
+void
+ExplainResetOutput(ExplainState *es)
+{
+ Assert(es->str);
+ resetStringInfo(es->str);
+ ExplainBeginOutput(es);
+}
+
/*
* Emit the start-of-output boilerplate.
*
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 9a2ee1c600..72d60d9c4f 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -797,11 +797,13 @@ execute_sql_string(const char *sql)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
ExecutorStart(qdesc, 0);
+ Assert(qdesc->plan_valid);
ExecutorRun(qdesc, ForwardScanDirection, 0, true);
ExecutorFinish(qdesc);
ExecutorEnd(qdesc);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index ac2e74fa3f..a64880b719 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -408,12 +408,13 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, 0);
+ Assert(queryDesc->plan_valid);
/* run the plan */
ExecutorRun(queryDesc, ForwardScanDirection, 0, true);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 73ed7aa2f0..4abbec054b 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -146,6 +146,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
PortalStart(portal, params, 0, GetActiveSnapshot());
Assert(portal->strategy == PORTAL_ONE_SELECT);
+ Assert(portal->plan_valid);
/*
* We're done; the query won't actually be run until PerformPortalFetch is
@@ -249,6 +250,17 @@ PerformPortalClose(const char *name)
PortalDrop(portal, false);
}
+/*
+ * Release a portal's QueryDesc.
+ */
+void
+PortalQueryFinish(QueryDesc *queryDesc)
+{
+ ExecutorFinish(queryDesc);
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+}
+
/*
* PortalCleanup
*
@@ -295,9 +307,7 @@ PortalCleanup(Portal portal)
if (portal->resowner)
CurrentResourceOwner = portal->resowner;
- ExecutorFinish(queryDesc);
- ExecutorEnd(queryDesc);
- FreeQueryDesc(queryDesc);
+ PortalQueryFinish(queryDesc);
CurrentResourceOwner = saveResourceOwner;
}
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 18f70319fc..c9070ed97f 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -183,6 +183,7 @@ ExecuteQuery(ParseState *pstate,
paramLI = EvaluateParams(pstate, entry, stmt->params, estate);
}
+replan:
/* Create a new portal to run the query in */
portal = CreateNewPortal();
/* Don't display the portal in pg_cursors, it is for internal use only */
@@ -251,10 +252,19 @@ ExecuteQuery(ParseState *pstate,
}
/*
- * Run the portal as appropriate.
+ * Run the portal as appropriate. If the portal contains a cached plan, it
+ * must be recreated if portal->plan_valid is false, which indicates that
+ * the cached plan was found to have been invalidated while initializing
+ * one of the plan trees contained in it.
*/
PortalStart(portal, paramLI, eflags, GetActiveSnapshot());
+ if (!portal->plan_valid)
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
+
(void) PortalRun(portal, count, false, true, dest, dest, qc);
PortalDrop(portal, false);
@@ -574,7 +584,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
{
PreparedStatement *entry;
const char *query_string;
- CachedPlan *cplan;
+ CachedPlan *cplan = NULL;
List *plan_list;
ListCell *p;
ParamListInfo paramLI = NULL;
@@ -618,6 +628,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
+replan:
cplan = GetCachedPlan(entry->plansource, paramLI,
CurrentResourceOwner, queryEnv);
@@ -639,8 +650,21 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ {
+ QueryDesc *queryDesc;
+
+ queryDesc = ExplainQueryDesc(pstmt, cplan, queryString,
+ into, es, paramLI, queryEnv);
+ if (queryDesc == NULL)
+ {
+ ExplainResetOutput(es);
+ ReleaseCachedPlan(cplan, CurrentResourceOwner);
+ goto replan;
+ }
+ ExplainOnePlan(queryDesc, into, es, query_string, paramLI,
+ queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
+ }
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 17775a49e2..fcbe266f8a 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -280,6 +280,39 @@ are typically reset to empty once per tuple. Per-tuple contexts are usually
associated with ExprContexts, and commonly each PlanState node has its own
ExprContext to evaluate its qual and targetlist expressions in.
+Relation Locking
+----------------
+
+Normally, the executor does not lock non-index relations appearing in a given
+plan tree when initializing it for execution if the plan tree is freshly
+created, that is, not derived from a CachedPlan. That is because the locks
+must already have been taken during parsing, rewriting, and planning of the
+query in that case. If the plan tree is a cached one, there may still
+be unlocked relations present in the plan tree, because GetCachedPlan() only
+locks the relations that would be present in the query's range table before
+planning occurs, but not relations that would have been added to the range
+table during planning. This means that inheritance child tables present in
+a cached plan, which are added to the query's range table during planning,
+would not have been locked when the plan enters the executor.
+
+GetCachedPlan() punts on locking child tables because not all of them may
+actually be scanned during a given execution of the plan; if they are
+partitions, some may get pruned away by executor-initialization-time pruning.
+Locking of child tables is therefore deferred until execution initialization,
+that is, until ExecInitNode() is called on the plan nodes that contain the
+child tables.
+
+So, there's a time window during which a cached plan tree containing child
+tables could go stale, because those tables could be changed in other
+backends before ExecInitNode() gets a lock on them. This means the executor
+now must check the validity of the plan tree every time it takes a lock on a
+child table contained in the tree (after executor-initialization-time
+pruning, if any, has been performed), which it does by looking at
+CachedPlan.is_valid of the CachedPlan passed to it. If the plan tree is
+indeed stale (is_valid=false), the executor must stop initializing it and
+return to the caller, letting it know that the execution must be retried
+with a new plan tree.
+
Query Processing Control Flow
-----------------------------
@@ -310,12 +343,18 @@ This is a sketch of control flow for full query processing:
AfterTriggerEndQuery
ExecutorEnd
- ExecEndNode --- recursively releases resources
+ ExecEndNode --- releases resources
FreeExecutorState
frees per-query context and child contexts
FreeQueryDesc
+As mentioned in the "Relation Locking" section, if the plan tree is found to
+be stale during one of the recursive calls of ExecInitNode() after taking a
+lock on a child table, control is immediately returned to the caller of
+ExecutorStart(), which must redo the steps from CreateQueryDesc with a new
+plan tree.
+
Per above comments, it's not really critical for ExecEndNode to free any
memory; it'll all go away in FreeExecutorState anyway. However, we do need to
be careful to close relations, drop buffer pins, etc, so we do need to scan
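To illustrate the retry protocol described above, here is a minimal sketch of
what a caller of ExecutorStart() does under this scheme; it mirrors the "goto
replan" pattern used in ExecuteQuery() and _SPI_execute_plan() below, with
the plansource, params, queryEnv, query_string, and dest variables assumed to
have been set up by the caller:

    for (;;)
    {
        CachedPlan  *cplan = GetCachedPlan(plansource, params,
                                           CurrentResourceOwner, queryEnv);
        PlannedStmt *pstmt = linitial_node(PlannedStmt, cplan->stmt_list);
        QueryDesc   *queryDesc = CreateQueryDesc(pstmt, cplan, query_string,
                                                 GetActiveSnapshot(),
                                                 InvalidSnapshot, dest,
                                                 params, queryEnv, 0);

        ExecutorStart(queryDesc, 0);
        if (queryDesc->plan_valid)
        {
            /* Plan survived initialization; run it to completion. */
            ExecutorRun(queryDesc, ForwardScanDirection, 0, true);
            ExecutorFinish(queryDesc);
            ExecutorEnd(queryDesc);
            FreeQueryDesc(queryDesc);
            ReleaseCachedPlan(cplan, CurrentResourceOwner);
            break;
        }

        /*
         * The CachedPlan went stale during ExecutorStart(); release this
         * QueryDesc's resources, as required by the InitPlan() header
         * comment, and loop back to replan.
         */
        ExecutorFinish(queryDesc);
        ExecutorEnd(queryDesc);
        FreeQueryDesc(queryDesc);
        ReleaseCachedPlan(cplan, CurrentResourceOwner);
    }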
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4c5a7bbf62..a2f6ac9d1c 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -620,6 +620,17 @@ ExecCheckPermissions(List *rangeTable, List *rteperminfos,
RTEPermissionInfo *perminfo = lfirst_node(RTEPermissionInfo, l);
Assert(OidIsValid(perminfo->relid));
+
+ /*
+ * Relations whose permissions need to be checked must already have
+ * been locked by the parser or by GetCachedPlan() if a cached plan is
+ * being executed.
+ *
+ * XXX shouldn't we skip calling ExecCheckPermissions from InitPlan
+ * in a parallel worker?
+ */
+ Assert(CheckRelLockedByMe(perminfo->relid, AccessShareLock, true) ||
+ IsParallelWorker());
result = ExecCheckOneRelPerms(perminfo);
if (!result)
{
@@ -829,6 +840,23 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
*
* Initializes the query plan: open files, allocate storage
* and start up the rule manager
+ *
+ *
+ * Normally, the plan tree given in queryDesc->plannedstmt is known to be
+ * valid in a race-free manner, that is, all relations contained in
+ * plannedstmt->relationOids would have already been locked. That is not
+ * the case, however, if the plannedstmt comes from a CachedPlan, one given
+ * in queryDesc->cplan. That's because GetCachedPlan() only locks the tables
+ * that are mentioned in the original query, but not the child tables that
+ * the planner would have added to the plan. In that case, locks on child
+ * tables will be taken when their Scan nodes are initialized by
+ * ExecInitNode(), called from here. If the CachedPlan gets invalidated as
+ * those locks are taken, plan tree initialization is suspended at the point
+ * where the invalidation is first detected, queryDesc->planstate is set to
+ * NULL, and queryDesc->plan_valid to false. In that case, callers must
+ * properly release the resources of this QueryDesc, which includes calling
+ * ExecutorFinish() and ExecutorEnd() on the EState contained therein, and
+ * retry the execution with a new CachedPlan.
* ----------------------------------------------------------------
*/
static void
@@ -839,7 +867,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
- PlanState *planstate;
+ PlanState *planstate = NULL;
TupleDesc tupType;
ListCell *l;
int i;
@@ -850,10 +878,11 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
/*
- * initialize the node's execution state
+ * Set up range table in EState.
*/
ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
+ estate->es_cachedplan = queryDesc->cplan;
estate->es_plannedstmt = plannedstmt;
/*
@@ -886,6 +915,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
case ROW_MARK_KEYSHARE:
case ROW_MARK_REFERENCE:
relation = ExecGetRangeTableRelation(estate, rc->rti);
+ if (!ExecPlanStillValid(estate))
+ goto plan_init_suspended;
break;
case ROW_MARK_COPY:
/* no physical table access is required */
@@ -953,6 +984,11 @@ InitPlan(QueryDesc *queryDesc, int eflags)
sp_eflags |= EXEC_FLAG_REWIND;
subplanstate = ExecInitNode(subplan, estate, sp_eflags);
+ if (!ExecPlanStillValid(estate))
+ {
+ Assert(subplanstate == NULL);
+ goto plan_init_suspended;
+ }
estate->es_subplanstates = lappend(estate->es_subplanstates,
subplanstate);
@@ -966,6 +1002,11 @@ InitPlan(QueryDesc *queryDesc, int eflags)
* processing tuples.
*/
planstate = ExecInitNode(plan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ {
+ Assert(planstate == NULL);
+ goto plan_init_suspended;
+ }
/*
* Get the tuple descriptor describing the type of tuples to return.
@@ -1008,7 +1049,19 @@ InitPlan(QueryDesc *queryDesc, int eflags)
}
queryDesc->tupDesc = tupType;
+ Assert(planstate != NULL);
queryDesc->planstate = planstate;
+ queryDesc->plan_valid = true;
+ return;
+
+plan_init_suspended:
+ /*
+ * Plan initialization was suspended because the plan went stale. Mark
+ * the QueryDesc as such; ExecEndPlan() will clean up any plan nodes that
+ * were initialized, using estate->es_inited_plannodes.
+ */
+ Assert(planstate == NULL);
+ queryDesc->planstate = NULL;
+ queryDesc->plan_valid = false;
}
/*
@@ -1426,7 +1479,7 @@ ExecGetAncestorResultRels(EState *estate, ResultRelInfo *resultRelInfo)
/*
* All ancestors up to the root target relation must have been
- * locked by the planner or AcquireExecutorLocks().
+ * locked by the planner or ExecLockAppendNonLeafRelations().
*/
ancRel = table_open(ancOid, NoLock);
rInfo = makeNode(ResultRelInfo);
@@ -1504,18 +1557,15 @@ ExecEndPlan(PlanState *planstate, EState *estate)
ListCell *l;
/*
- * shut down the node-type-specific query processing
+ * Shut down the node-type-specific query processing for all nodes that
+ * were initialized during InitPlan(), both those in the main plan tree
+ * and those in subplans (es_subplanstates), if any.
*/
- ExecEndNode(planstate);
-
- /*
- * for subplans too
- */
- foreach(l, estate->es_subplanstates)
+ foreach(l, estate->es_inited_plannodes)
{
- PlanState *subplanstate = (PlanState *) lfirst(l);
+ PlanState *pstate = (PlanState *) lfirst(l);
- ExecEndNode(subplanstate);
+ ExecEndNode(pstate);
}
/*
@@ -2858,7 +2908,8 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
* Child EPQ EStates share the parent's copy of unchanging state such as
* the snapshot, rangetable, and external Param info. They need their own
* copies of local state, including a tuple table, es_param_exec_vals,
- * result-rel info, etc.
+ * result-rel info, etc. Also, we don't pass the parent's copy of the
+ * CachedPlan, because no new locks will be taken for EvalPlanQual().
*/
rcestate->es_direction = ForwardScanDirection;
rcestate->es_snapshot = parentestate->es_snapshot;
@@ -2945,6 +2996,12 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
PlanState *subplanstate;
subplanstate = ExecInitNode(subplan, rcestate, 0);
+
+ /*
+ * At this point, we had better not received any new invalidation
+ * messages that would have caused the plan tree to go stale.
+ */
+ Assert(ExecPlanStillValid(rcestate) && subplanstate);
rcestate->es_subplanstates = lappend(rcestate->es_subplanstates,
subplanstate);
}
@@ -2988,6 +3045,12 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
*/
epqstate->recheckplanstate = ExecInitNode(planTree, rcestate, 0);
+ /*
+ * At this point, we had better not have received any new invalidation
+ * messages that would have caused the plan tree to go stale.
+ */
+ Assert(ExecPlanStillValid(rcestate) && epqstate->recheckplanstate);
+
MemoryContextSwitchTo(oldcontext);
}
@@ -3010,6 +3073,10 @@ EvalPlanQualEnd(EPQState *epqstate)
MemoryContext oldcontext;
ListCell *l;
+ /* Nothing to do if EvalPlanQualInit() wasn't done to begin with. */
+ if (epqstate->parentestate == NULL)
+ return;
+
rtsize = epqstate->parentestate->es_range_table_size;
/*
@@ -3030,13 +3097,16 @@ EvalPlanQualEnd(EPQState *epqstate)
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
- ExecEndNode(epqstate->recheckplanstate);
-
- foreach(l, estate->es_subplanstates)
+ /*
+ * Shut down the node-type-specific query processing for all nodes that
+ * were initialized during EvalPlanQualStart(), both those in the main
+ * plan tree and those in subplans (es_subplanstates), if any.
+ */
+ foreach(l, estate->es_inited_plannodes)
{
- PlanState *subplanstate = (PlanState *) lfirst(l);
+ PlanState *planstate = (PlanState *) lfirst(l);
- ExecEndNode(subplanstate);
+ ExecEndNode(planstate);
}
/* throw away the per-estate tuple table, some node may have used it */
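The ExecPlanStillValid() check used throughout these changes is not shown in
this excerpt; going by the README's description ("looking at
CachedPlan.is_valid of the CachedPlan passed to it"), it presumably reduces
to a test along these lines:

    /* Sketch only; the actual definition is not part of this excerpt. */
    static inline bool
    ExecPlanStillValid(EState *estate)
    {
        return estate->es_cachedplan == NULL ||
               estate->es_cachedplan->is_valid;
    }

That would also explain why freshly planned queries and parallel workers,
both of which run with es_cachedplan set to NULL, pass the check trivially
and can Assert(queryDesc->plan_valid) right after ExecutorStart().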
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index cc2b8ccab7..42df7b6428 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1248,8 +1248,17 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(¶mspace);
- /* Create a QueryDesc for the query. */
+ /*
+ * Create a QueryDesc for the query. Note that no CachedPlan is available
+ * here, even if the leader got the plan tree from one. That's fine,
+ * because the leader would have taken all the locks necessary for the
+ * plan tree that we have here to be fully valid. We do take our own
+ * copies of those locks in ExecGetRangeTableRelation(), but every one of
+ * them is a lock the leader already holds.
+ */
return CreateQueryDesc(pstmt,
+ NULL,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
@@ -1431,6 +1440,7 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Start up the executor */
queryDesc->plannedstmt->jitFlags = fpes->jit_flags;
ExecutorStart(queryDesc, fpes->eflags);
+ Assert(queryDesc->plan_valid);
/* Special executor initialization steps for parallel workers */
queryDesc->planstate->state->es_query_dsa = area;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index eb8a87fd63..cf73d28baa 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -513,6 +513,13 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
oldcxt = MemoryContextSwitchTo(proute->memcxt);
+ /*
+ * Note that while we normally check ExecPlanStillValid(estate) after each
+ * lock taken during execution initialization, it is fine not to do so
+ * for partitions opened here for tuple routing. Locks taken here can't
+ * possibly invalidate the plan, given that the plan doesn't contain any
+ * info about those partitions.
+ */
partrel = table_open(partOid, RowExclusiveLock);
leaf_part_rri = makeNode(ResultRelInfo);
@@ -1111,6 +1118,9 @@ ExecInitPartitionDispatchInfo(EState *estate,
* Only sub-partitioned tables need to be locked here. The root
* partitioned table will already have been locked as it's referenced in
* the query's rtable.
+ *
+ * See the comment in ExecInitPartitionInfo() about taking locks and
+ * not checking ExecPlanStillValid(estate) here.
*/
if (partoid != RelationGetRelid(proute->partition_root))
rel = table_open(partoid, RowExclusiveLock);
@@ -1801,6 +1811,8 @@ ExecInitPartitionPruning(PlanState *planstate,
/* Create the working data structure for pruning */
prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Perform an initial partition prune pass, if required.
@@ -1927,6 +1939,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* duration of this executor run.
*/
partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
partkey = RelationGetPartitionKey(partrel);
partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
partrel);
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 4d288bc8d4..f3bb1d4591 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -135,7 +135,17 @@ static bool ExecShutdownNode_walker(PlanState *node, void *context);
* 'estate' is the shared execution state for the plan tree
* 'eflags' is a bitwise OR of flag bits described in executor.h
*
- * Returns a PlanState node corresponding to the given Plan node.
+ * Returns a PlanState node corresponding to the given Plan node, or NULL.
+ *
+ * NULL may be returned either if the input node is NULL or if the plan
+ * tree that the node is a part of is found to have been invalidated when
+ * taking a lock on the relation mentioned in the node or in a child
+ * node. The latter case arises if the plan tree contains inheritance/
+ * partition child tables and comes from a CachedPlan.
+ *
+ * Also, all non-NULL PlanState nodes are added to
+ * estate->es_inited_plannodes, which ExecEndPlan() iterates over to close
+ * each one using ExecEndNode().
* ------------------------------------------------------------------------
*/
PlanState *
@@ -388,6 +398,13 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
break;
}
+ if (!ExecPlanStillValid(estate))
+ {
+ Assert(result == NULL);
+ return NULL;
+ }
+
+ Assert(result != NULL);
ExecSetExecProcNode(result, result->ExecProcNode);
/*
@@ -411,6 +428,13 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
result->instrument = InstrAlloc(1, estate->es_instrument,
result->async_capable);
+ /*
+ * Remember valid PlanState nodes in EState for the processing in
+ * ExecEndPlan().
+ */
+ estate->es_inited_plannodes = lappend(estate->es_inited_plannodes,
+ result);
+
return result;
}
@@ -545,29 +569,21 @@ MultiExecProcNode(PlanState *node)
/* ----------------------------------------------------------------
* ExecEndNode
*
- * Recursively cleans up all the nodes in the plan rooted
- * at 'node'.
+ * Cleans up the given node
*
- * After this operation, the query plan will not be able to be
- * processed any further. This should be called only after
+ * Child nodes, if any, will have been closed by the caller, so the
+ * ExecEnd* routine for a given node type is only responsible for
+ * cleaning up the resources local to that node.
+ *
+ * After this operation, the query plan containing this node will not be
+ * able to be processed any further. This should be called only after
* the query plan has been fully executed.
* ----------------------------------------------------------------
*/
void
ExecEndNode(PlanState *node)
{
- /*
- * do nothing when we get to the end of a leaf on tree.
- */
- if (node == NULL)
- return;
-
- /*
- * Make sure there's enough stack available. Need to check here, in
- * addition to ExecProcNode() (via ExecProcNodeFirst()), because it's not
- * guaranteed that ExecProcNode() is reached for all nodes.
- */
- check_stack_depth();
+ Assert(node != NULL);
if (node->chgParam != NULL)
{
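The per-node changes that follow all apply the same mechanical pattern. As a
composite sketch, for a hypothetical node type "Foo" rather than any real
executor node:

    static FooState *
    ExecInitFoo(Foo *node, EState *estate, int eflags)
    {
        FooState *state = makeNode(FooState);

        state->ps.plan = (Plan *) node;
        state->ps.state = estate;

        /* Initializing a child may take a lock on a child table ... */
        outerPlanState(state) = ExecInitNode(outerPlan(node), estate, eflags);

        /* ... so bail out if that lock found the plan to have gone stale. */
        if (!ExecPlanStillValid(estate))
            return NULL;

        return state;
    }

Cleanup of a partially initialized tree is what es_inited_plannodes is for:
because ExecInitNode() appends a node to that list only after its children
have appended themselves, ExecEndPlan() walking the list ends child nodes
before their parents, which is why the ExecEnd* routines no longer recurse.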
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index c06b228858..af92d2b3c3 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -804,7 +804,25 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rte->rtekind == RTE_RELATION);
- if (!IsParallelWorker())
+ if (IsParallelWorker() ||
+ (estate->es_cachedplan != NULL && !rte->inFromCl))
+ {
+ /*
+ * Take a lock if we are a parallel worker or if this is a child
+ * table referenced in a cached plan.
+ *
+ * Parallel workers need to have their own local lock on the
+ * relation. This ensures sane behavior in case the parent process
+ * exits before we do.
+ *
+ * When executing a cached plan, child tables must be locked
+ * here, because plancache.c (GetCachedPlan()) would only have
+ * locked tables mentioned in the query, that is, tables whose
+ * RTEs' inFromCl is true.
+ */
+ rel = table_open(rte->relid, rte->rellockmode);
+ }
+ else
{
/*
* In a normal query, we should already have the appropriate lock,
@@ -817,15 +835,6 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rte->rellockmode == AccessShareLock ||
CheckRelationLockedByMe(rel, rte->rellockmode, false));
}
- else
- {
- /*
- * If we are a parallel worker, we need to obtain our own local
- * lock on the relation. This ensures sane behavior in case the
- * parent process exits before we do.
- */
- rel = table_open(rte->relid, rte->rellockmode);
- }
estate->es_relations[rti - 1] = rel;
}
@@ -833,6 +842,38 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
return rel;
}
+/*
+ * ExecLockAppendNonLeafRelations
+ * Lock non-leaf relations whose children are scanned by a given
+ * Append/MergeAppend node
+ */
+void
+ExecLockAppendNonLeafRelations(EState *estate, List *allpartrelids)
+{
+ ListCell *l;
+
+ /* This should get called only when executing cached plans. */
+ Assert(estate->es_cachedplan != NULL);
+ foreach(l, allpartrelids)
+ {
+ Bitmapset *partrelids = lfirst_node(Bitmapset, l);
+ int i;
+
+ /*
+ * Note that we skip the first member of each bitmapset, because it
+ * stands for the root parent mentioned in the query, which must always
+ * have been locked before entering the executor.
+ */
+ i = bms_next_member(partrelids, 0); /* skip the root parent */
+ while ((i = bms_next_member(partrelids, i)) > 0)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(i, estate);
+
+ LockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
+}
+
/*
* ExecInitResultRelation
* Open relation given by the passed-in RT index and fill its
@@ -848,6 +889,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
Relation resultRelationDesc;
resultRelationDesc = ExecGetRangeTableRelation(estate, rti);
+ if (!ExecPlanStillValid(estate))
+ return;
InitResultRelInfo(resultRelInfo,
resultRelationDesc,
rti,
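To make the inFromCl distinction in ExecGetRangeTableRelation() concrete with
an assumed example: for a cached plan of a query such as "SELECT * FROM
parted_tab", the parent table's RTE comes from the query text, so its
inFromCl is true and GetCachedPlan() locks it, whereas the partitions' RTEs
were added by the planner during inheritance expansion, so their inFromCl is
false and they get locked here (or in ExecLockAppendNonLeafRelations())
instead.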
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index f55424eb5a..c88f72bc4e 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -838,6 +838,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest = None_Receiver;
es->qd = CreateQueryDesc(es->stmt,
+ NULL, /* fmgr_sql() doesn't use CachedPlans */
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -863,6 +864,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
eflags = 0; /* default run-to-completion flags */
ExecutorStart(es->qd, eflags);
+ Assert(es->qd->plan_valid);
}
es->status = F_EXEC_RUN;
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 468db94fe5..54f742820b 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -3304,6 +3304,8 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
eflags &= ~EXEC_FLAG_REWIND;
outerPlan = outerPlan(node);
outerPlanState(aggstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* initialize source tuple type.
@@ -4304,7 +4306,6 @@ GetAggInitVal(Datum textInitVal, Oid transtype)
void
ExecEndAgg(AggState *node)
{
- PlanState *outerPlan;
int transno;
int numGroupingSets = Max(node->maxsets, 1);
int setno;
@@ -4366,9 +4367,6 @@ ExecEndAgg(AggState *node)
/* clean up tuple table */
ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
}
void
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 609df6b9e6..a6dadb7d07 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -133,6 +133,27 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_syncdone = false;
appendstate->as_begun = false;
+ /*
+ * Must take locks on child tables if running a cached plan, because
+ * GetCachedPlan() would've only locked the root parent named in the
+ * query.
+ *
+ * First, lock non-leaf partitions, before doing any pruning. Even when
+ * no pruning is to be done, non-leaf partitions must still be locked
+ * explicitly like this, because they're not referenced elsewhere in
+ * the plan tree. XXX - OTOH, non-leaf partitions mentioned in
+ * part_prune_info, if any, would be opened by ExecInitPartitionPruning()
+ * using ExecGetRangeTableRelation(), which locks child tables, redundantly
+ * in this case.
+ */
+ if (estate->es_cachedplan)
+ {
+ ExecLockAppendNonLeafRelations(estate, node->allpartrelids);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
+ }
+
/* If run-time partition pruning is enabled, then set that up now */
if (node->part_prune_info != NULL)
{
@@ -147,6 +168,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
list_length(node->appendplans),
node->part_prune_info,
&validsubplans);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
@@ -221,6 +244,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
firstvalid = j;
appendplanstates[j++] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
}
appendstate->as_first_partial_plan = firstvalid;
@@ -376,30 +401,15 @@ ExecAppend(PlanState *pstate)
/* ----------------------------------------------------------------
* ExecEndAppend
- *
- * Shuts down the subscans of the append node.
- *
- * Returns nothing of interest.
* ----------------------------------------------------------------
*/
void
ExecEndAppend(AppendState *node)
{
- PlanState **appendplans;
- int nplans;
- int i;
-
- /*
- * get information from the node
- */
- appendplans = node->appendplans;
- nplans = node->as_nplans;
-
- /*
- * shut down each of the subscans
- */
- for (i = 0; i < nplans; i++)
- ExecEndNode(appendplans[i]);
+ /*
+ * Nothing to do here, as the subscans of the Append node will be cleaned
+ * up by ExecEndPlan().
+ */
}
void
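Putting the pieces for Append together: under a cached plan, ExecInitAppend()
first locks the non-leaf partitions via ExecLockAppendNonLeafRelations(),
then performs initial pruning (whose setup opens the partitioned tables
through ExecGetRangeTableRelation()), and finally calls ExecInitNode() on
each surviving subplan, which locks the leaf partitions it scans;
ExecPlanStillValid() is checked after each of these steps.
ExecInitMergeAppend() further below follows the same sequence.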
diff --git a/src/backend/executor/nodeBitmapAnd.c b/src/backend/executor/nodeBitmapAnd.c
index 4c5eb2b23b..187aea4bb8 100644
--- a/src/backend/executor/nodeBitmapAnd.c
+++ b/src/backend/executor/nodeBitmapAnd.c
@@ -88,8 +88,9 @@ ExecInitBitmapAnd(BitmapAnd *node, EState *estate, int eflags)
foreach(l, node->bitmapplans)
{
initNode = (Plan *) lfirst(l);
- bitmapplanstates[i] = ExecInitNode(initNode, estate, eflags);
- i++;
+ bitmapplanstates[i++] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
}
/*
@@ -168,33 +169,15 @@ MultiExecBitmapAnd(BitmapAndState *node)
/* ----------------------------------------------------------------
* ExecEndBitmapAnd
- *
- * Shuts down the subscans of the BitmapAnd node.
- *
- * Returns nothing of interest.
* ----------------------------------------------------------------
*/
void
ExecEndBitmapAnd(BitmapAndState *node)
{
- PlanState **bitmapplans;
- int nplans;
- int i;
-
- /*
- * get information from the node
- */
- bitmapplans = node->bitmapplans;
- nplans = node->nplans;
-
- /*
- * shut down each of the subscans (that we've initialized)
- */
- for (i = 0; i < nplans; i++)
- {
- if (bitmapplans[i])
- ExecEndNode(bitmapplans[i]);
- }
+ /*
+ * Nothing to do here, as any subscans that were initialized will be
+ * cleaned up by ExecEndPlan().
+ */
}
void
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index f35df0b8bf..ee1008519b 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -667,11 +667,6 @@ ExecEndBitmapHeapScan(BitmapHeapScanState *node)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
ExecClearTuple(node->ss.ss_ScanTupleSlot);
- /*
- * close down subplans
- */
- ExecEndNode(outerPlanState(node));
-
/*
* release bitmaps and buffers if any
*/
@@ -763,11 +758,15 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* initialize child nodes
*/
outerPlanState(scanstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* get the scan type from the relation descriptor.
diff --git a/src/backend/executor/nodeBitmapIndexscan.c b/src/backend/executor/nodeBitmapIndexscan.c
index 83ec9ede89..99015812a1 100644
--- a/src/backend/executor/nodeBitmapIndexscan.c
+++ b/src/backend/executor/nodeBitmapIndexscan.c
@@ -211,6 +211,7 @@ BitmapIndexScanState *
ExecInitBitmapIndexScan(BitmapIndexScan *node, EState *estate, int eflags)
{
BitmapIndexScanState *indexstate;
+ Relation indexRelation;
LOCKMODE lockmode;
/* check for unsupported flags */
@@ -262,7 +263,13 @@ ExecInitBitmapIndexScan(BitmapIndexScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
- indexstate->biss_RelationDesc = index_open(node->indexid, lockmode);
+ indexRelation = index_open(node->indexid, lockmode);
+ if (!ExecPlanStillValid(estate))
+ {
+ index_close(indexRelation, lockmode);
+ return NULL;
+ }
+ indexstate->biss_RelationDesc = indexRelation;
/*
* Initialize index-specific scan state
diff --git a/src/backend/executor/nodeBitmapOr.c b/src/backend/executor/nodeBitmapOr.c
index 0bf8af9652..3f51918fe1 100644
--- a/src/backend/executor/nodeBitmapOr.c
+++ b/src/backend/executor/nodeBitmapOr.c
@@ -89,8 +89,9 @@ ExecInitBitmapOr(BitmapOr *node, EState *estate, int eflags)
foreach(l, node->bitmapplans)
{
initNode = (Plan *) lfirst(l);
- bitmapplanstates[i] = ExecInitNode(initNode, estate, eflags);
- i++;
+ bitmapplanstates[i++] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
}
/*
@@ -186,33 +187,15 @@ MultiExecBitmapOr(BitmapOrState *node)
/* ----------------------------------------------------------------
* ExecEndBitmapOr
- *
- * Shuts down the subscans of the BitmapOr node.
- *
- * Returns nothing of interest.
* ----------------------------------------------------------------
*/
void
ExecEndBitmapOr(BitmapOrState *node)
{
- PlanState **bitmapplans;
- int nplans;
- int i;
-
- /*
- * get information from the node
- */
- bitmapplans = node->bitmapplans;
- nplans = node->nplans;
-
- /*
- * shut down each of the subscans (that we've initialized)
- */
- for (i = 0; i < nplans; i++)
- {
- if (bitmapplans[i])
- ExecEndNode(bitmapplans[i]);
- }
+ /*
+ * Nothing to do here, as any subscans that were initialized will be
+ * cleaned up by ExecEndPlan().
+ */
}
void
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
index bd42c65b29..91239cc500 100644
--- a/src/backend/executor/nodeCustom.c
+++ b/src/backend/executor/nodeCustom.c
@@ -61,6 +61,8 @@ ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
if (scanrelid > 0)
{
scan_rel = ExecOpenScanRelation(estate, scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
css->ss.ss_currentRelation = scan_rel;
}
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index c2139acca0..207165f44f 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -173,6 +173,8 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
if (scanrelid > 0)
{
currentRelation = ExecOpenScanRelation(estate, scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
scanstate->ss.ss_currentRelation = currentRelation;
fdwroutine = GetFdwRoutineForRelation(currentRelation, true);
}
@@ -264,6 +266,8 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
if (outerPlan(node))
outerPlanState(scanstate) =
ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Tell the FDW to initialize the scan.
@@ -309,10 +313,6 @@ ExecEndForeignScan(ForeignScanState *node)
else
node->fdwroutine->EndForeignScan(node);
- /* Shut down any outer plan. */
- if (outerPlanState(node))
- ExecEndNode(outerPlanState(node));
-
/* Free the exprcontext */
ExecFreeExprContext(&node->ss.ps);
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 307fc10eea..400c8b42ed 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -89,6 +89,9 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
*/
outerNode = outerPlan(node);
outerPlanState(gatherstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
+
tupDesc = ExecGetResultType(outerPlanState(gatherstate));
/*
@@ -248,7 +251,6 @@ ExecGather(PlanState *pstate)
void
ExecEndGather(GatherState *node)
{
- ExecEndNode(outerPlanState(node)); /* let children clean up first */
ExecShutdownGather(node);
ExecFreeExprContext(&node->ps);
if (node->ps.ps_ResultTupleSlot)
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 9d5e1a46e9..9077c4bc55 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -108,6 +108,8 @@ ExecInitGatherMerge(GatherMerge *node, EState *estate, int eflags)
*/
outerNode = outerPlan(node);
outerPlanState(gm_state) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Leader may access ExecProcNode result directly (if
@@ -288,7 +290,6 @@ ExecGatherMerge(PlanState *pstate)
void
ExecEndGatherMerge(GatherMergeState *node)
{
- ExecEndNode(outerPlanState(node)); /* let children clean up first */
ExecShutdownGatherMerge(node);
ExecFreeExprContext(&node->ps);
if (node->ps.ps_ResultTupleSlot)
diff --git a/src/backend/executor/nodeGroup.c b/src/backend/executor/nodeGroup.c
index 25a1618952..976e739ab7 100644
--- a/src/backend/executor/nodeGroup.c
+++ b/src/backend/executor/nodeGroup.c
@@ -185,6 +185,8 @@ ExecInitGroup(Group *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(grpstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize scan slot and type.
@@ -226,15 +228,10 @@ ExecInitGroup(Group *node, EState *estate, int eflags)
void
ExecEndGroup(GroupState *node)
{
- PlanState *outerPlan;
-
ExecFreeExprContext(&node->ss.ps);
/* clean up tuple table */
ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
}
void
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 8b5c35b82b..fc7a6b2ccc 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -386,6 +386,8 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(hashstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* initialize our result slot and type. No need to build projection
@@ -413,18 +415,10 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
void
ExecEndHash(HashState *node)
{
- PlanState *outerPlan;
-
/*
* free exprcontext
*/
ExecFreeExprContext(&node->ps);
-
- /*
- * shut down the subplan
- */
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
}
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 980746128b..4c4b39ce2d 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -752,8 +752,12 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
hashNode = (Hash *) innerPlan(node);
outerPlanState(hjstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
outerDesc = ExecGetResultType(outerPlanState(hjstate));
innerPlanState(hjstate) = ExecInitNode((Plan *) hashNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
innerDesc = ExecGetResultType(innerPlanState(hjstate));
/*
@@ -878,12 +882,6 @@ ExecEndHashJoin(HashJoinState *node)
ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
ExecClearTuple(node->hj_OuterTupleSlot);
ExecClearTuple(node->hj_HashTupleSlot);
-
- /*
- * clean up subtrees
- */
- ExecEndNode(outerPlanState(node));
- ExecEndNode(innerPlanState(node));
}
/*
diff --git a/src/backend/executor/nodeIncrementalSort.c b/src/backend/executor/nodeIncrementalSort.c
index 7683e3341c..5b11afeb96 100644
--- a/src/backend/executor/nodeIncrementalSort.c
+++ b/src/backend/executor/nodeIncrementalSort.c
@@ -1041,6 +1041,8 @@ ExecInitIncrementalSort(IncrementalSort *node, EState *estate, int eflags)
* nodes may be able to do something more useful.
*/
outerPlanState(incrsortstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize scan slot and type.
@@ -1101,11 +1103,6 @@ ExecEndIncrementalSort(IncrementalSortState *node)
node->prefixsort_state = NULL;
}
- /*
- * Shut down the subplan.
- */
- ExecEndNode(outerPlanState(node));
-
SO_printf("ExecEndIncrementalSort: sort node shutdown\n");
}
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 0b43a9b969..ea8bef4b97 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -490,6 +490,7 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
{
IndexOnlyScanState *indexstate;
Relation currentRelation;
+ Relation indexRelation;
LOCKMODE lockmode;
TupleDesc tupDesc;
@@ -512,6 +513,8 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
indexstate->ss.ss_currentRelation = currentRelation;
indexstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
@@ -564,7 +567,13 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
- indexstate->ioss_RelationDesc = index_open(node->indexid, lockmode);
+ indexRelation = index_open(node->indexid, lockmode);
+ if (!ExecPlanStillValid(estate))
+ {
+ index_close(indexRelation, lockmode);
+ return NULL;
+ }
+ indexstate->ioss_RelationDesc = indexRelation;
/*
* Initialize index-specific scan state
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 4540c7781d..956e9e5543 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -904,6 +904,7 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
{
IndexScanState *indexstate;
Relation currentRelation;
+ Relation indexRelation;
LOCKMODE lockmode;
/*
@@ -925,6 +926,8 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
indexstate->ss.ss_currentRelation = currentRelation;
indexstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
@@ -969,7 +972,13 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
- indexstate->iss_RelationDesc = index_open(node->indexid, lockmode);
+ indexRelation = index_open(node->indexid, lockmode);
+ if (!ExecPlanStillValid(estate))
+ {
+ index_close(indexRelation, lockmode);
+ return NULL;
+ }
+ indexstate->iss_RelationDesc = indexRelation;
/*
* Initialize index-specific scan state
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index 425fbfc405..1cc884bc65 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -476,6 +476,8 @@ ExecInitLimit(Limit *node, EState *estate, int eflags)
*/
outerPlan = outerPlan(node);
outerPlanState(limitstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* initialize child expressions
@@ -535,7 +537,6 @@ void
ExecEndLimit(LimitState *node)
{
ExecFreeExprContext(&node->ps);
- ExecEndNode(outerPlanState(node));
}
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index e459971d32..77731c0c8c 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -322,6 +322,8 @@ ExecInitLockRows(LockRows *node, EState *estate, int eflags)
* then initialize outer plan
*/
outerPlanState(lrstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/* node returns unmodified slots from the outer plan */
lrstate->ps.resultopsset = true;
@@ -386,7 +388,6 @@ ExecEndLockRows(LockRowsState *node)
{
/* We may have shut down EPQ already, but no harm in another call */
EvalPlanQualEnd(&node->lr_epqstate);
- ExecEndNode(outerPlanState(node));
}
diff --git a/src/backend/executor/nodeMaterial.c b/src/backend/executor/nodeMaterial.c
index 09632678b0..a38b9805a5 100644
--- a/src/backend/executor/nodeMaterial.c
+++ b/src/backend/executor/nodeMaterial.c
@@ -214,6 +214,8 @@ ExecInitMaterial(Material *node, EState *estate, int eflags)
outerPlan = outerPlan(node);
outerPlanState(matstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize result type and slot. No need to initialize projection info
@@ -250,11 +252,6 @@ ExecEndMaterial(MaterialState *node)
if (node->tuplestorestate != NULL)
tuplestore_end(node->tuplestorestate);
node->tuplestorestate = NULL;
-
- /*
- * shut down the subplan
- */
- ExecEndNode(outerPlanState(node));
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeMemoize.c b/src/backend/executor/nodeMemoize.c
index 4f04269e26..a8997ba7da 100644
--- a/src/backend/executor/nodeMemoize.c
+++ b/src/backend/executor/nodeMemoize.c
@@ -938,6 +938,8 @@ ExecInitMemoize(Memoize *node, EState *estate, int eflags)
outerNode = outerPlan(node);
outerPlanState(mstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize return slot and type. No need to initialize projection info
@@ -1099,11 +1101,6 @@ ExecEndMemoize(MemoizeState *node)
* free exprcontext
*/
ExecFreeExprContext(&node->ss.ps);
-
- /*
- * shut down the subplan
- */
- ExecEndNode(outerPlanState(node));
}
void
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 21b5726e6e..8718f20825 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -81,6 +81,27 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.state = estate;
mergestate->ps.ExecProcNode = ExecMergeAppend;
+ /*
+ * Must take locks on child tables if running a cached plan, because
+ * GetCachedPlan() would've only locked the root parent named in the
+ * query.
+ *
+ * First, lock non-leaf partitions, before doing any pruning. Even when
+ * no pruning is to be done, non-leaf partitions must still be locked
+ * explicitly like this, because they're not referenced elsewhere in
+ * the plan tree. XXX - OTOH, non-leaf partitions mentioned in
+ * part_prune_info, if any, would be opened by ExecInitPartitionPruning()
+ * using ExecGetRangeTableRelation(), which locks child tables, redundantly
+ * in this case.
+ */
+ if (estate->es_cachedplan)
+ {
+ ExecLockAppendNonLeafRelations(estate, node->allpartrelids);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
+ }
+
/* If run-time partition pruning is enabled, then set that up now */
if (node->part_prune_info != NULL)
{
@@ -95,6 +116,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
list_length(node->mergeplans),
node->part_prune_info,
&validsubplans);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
@@ -151,6 +174,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
Plan *initNode = (Plan *) list_nth(node->mergeplans, i);
mergeplanstates[j++] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
}
mergestate->ps.ps_ProjInfo = NULL;
@@ -310,30 +335,14 @@ heap_compare_slots(Datum a, Datum b, void *arg)
/* ----------------------------------------------------------------
* ExecEndMergeAppend
- *
- * Shuts down the subscans of the MergeAppend node.
- *
- * Returns nothing of interest.
* ----------------------------------------------------------------
*/
void
ExecEndMergeAppend(MergeAppendState *node)
{
- PlanState **mergeplans;
- int nplans;
- int i;
-
- /*
- * get information from the node
- */
- mergeplans = node->mergeplans;
- nplans = node->ms_nplans;
-
- /*
- * shut down each of the subscans
- */
- for (i = 0; i < nplans; i++)
- ExecEndNode(mergeplans[i]);
+ /*
+ * Nothing to do here, as subscans will be cleaned up by ExecEndPlan().
+ */
}
void
diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 00f96d045e..c6644c6816 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -1490,11 +1490,15 @@ ExecInitMergeJoin(MergeJoin *node, EState *estate, int eflags)
mergestate->mj_SkipMarkRestore = node->skip_mark_restore;
outerPlanState(mergestate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
outerDesc = ExecGetResultType(outerPlanState(mergestate));
innerPlanState(mergestate) = ExecInitNode(innerPlan(node), estate,
mergestate->mj_SkipMarkRestore ?
eflags :
(eflags | EXEC_FLAG_MARK));
+ if (!ExecPlanStillValid(estate))
+ return NULL;
innerDesc = ExecGetResultType(innerPlanState(mergestate));
/*
@@ -1654,12 +1658,6 @@ ExecEndMergeJoin(MergeJoinState *node)
ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
ExecClearTuple(node->mj_MarkedTupleSlot);
- /*
- * shut down the subplans
- */
- ExecEndNode(innerPlanState(node));
- ExecEndNode(outerPlanState(node));
-
MJ1_printf("ExecEndMergeJoin: %s\n",
"node processing ended");
}
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 2a5fec8d01..0c3aeb1154 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -3984,6 +3984,9 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
linitial_int(node->resultRelations));
}
+ if (!ExecPlanStillValid(estate))
+ return NULL;
+
/* set up epqstate with dummy subplan data for the moment */
EvalPlanQualInit(&mtstate->mt_epqstate, estate, NULL, NIL,
node->epqParam, node->resultRelations);
@@ -4011,6 +4014,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
if (resultRelInfo != mtstate->rootResultRelInfo)
{
ExecInitResultRelation(estate, resultRelInfo, resultRelation);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* For child result relations, store the root result relation
@@ -4038,6 +4043,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* Now we may initialize the subplan.
*/
outerPlanState(mtstate) = ExecInitNode(subplan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Do additional per-result-relation initialization.
@@ -4460,11 +4467,6 @@ ExecEndModifyTable(ModifyTableState *node)
* Terminate EPQ execution if active
*/
EvalPlanQualEnd(&node->mt_epqstate);
-
- /*
- * shut down subplan
- */
- ExecEndNode(outerPlanState(node));
}
void
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index b3d52e69ec..71a1f8101c 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -295,11 +295,15 @@ ExecInitNestLoop(NestLoop *node, EState *estate, int eflags)
* values.
*/
outerPlanState(nlstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
if (node->nestParams == NIL)
eflags |= EXEC_FLAG_REWIND;
else
eflags &= ~EXEC_FLAG_REWIND;
innerPlanState(nlstate) = ExecInitNode(innerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize result slot, type and projection.
@@ -374,12 +378,6 @@ ExecEndNestLoop(NestLoopState *node)
*/
ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
- /*
- * close down subplans
- */
- ExecEndNode(outerPlanState(node));
- ExecEndNode(innerPlanState(node));
-
NL1_printf("ExecEndNestLoop: %s\n",
"node processing ended");
}
diff --git a/src/backend/executor/nodeProjectSet.c b/src/backend/executor/nodeProjectSet.c
index f6ff3dc44c..abcbd7e765 100644
--- a/src/backend/executor/nodeProjectSet.c
+++ b/src/backend/executor/nodeProjectSet.c
@@ -247,6 +247,8 @@ ExecInitProjectSet(ProjectSet *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(state) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* we don't use inner plan
@@ -329,11 +331,6 @@ ExecEndProjectSet(ProjectSetState *node)
* clean out the tuple table
*/
ExecClearTuple(node->ps.ps_ResultTupleSlot);
-
- /*
- * shut down subplans
- */
- ExecEndNode(outerPlanState(node));
}
void
diff --git a/src/backend/executor/nodeRecursiveunion.c b/src/backend/executor/nodeRecursiveunion.c
index e781003934..84a706458a 100644
--- a/src/backend/executor/nodeRecursiveunion.c
+++ b/src/backend/executor/nodeRecursiveunion.c
@@ -244,7 +244,11 @@ ExecInitRecursiveUnion(RecursiveUnion *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(rustate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
innerPlanState(rustate) = ExecInitNode(innerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* If hashing, precompute fmgr lookup data for inner loop, and create the
@@ -280,12 +284,6 @@ ExecEndRecursiveUnion(RecursiveUnionState *node)
MemoryContextDelete(node->tempContext);
if (node->tableContext)
MemoryContextDelete(node->tableContext);
-
- /*
- * close down subplans
- */
- ExecEndNode(outerPlanState(node));
- ExecEndNode(innerPlanState(node));
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeResult.c b/src/backend/executor/nodeResult.c
index 4219712d30..330ca68d12 100644
--- a/src/backend/executor/nodeResult.c
+++ b/src/backend/executor/nodeResult.c
@@ -208,6 +208,8 @@ ExecInitResult(Result *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(resstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* we don't use inner plan
@@ -249,11 +251,6 @@ ExecEndResult(ResultState *node)
* clean out the tuple table
*/
ExecClearTuple(node->ps.ps_ResultTupleSlot);
-
- /*
- * shut down subplans
- */
- ExecEndNode(outerPlanState(node));
}
void
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index d7e22b1dbb..22357e7a0e 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -125,6 +125,8 @@ ExecInitSampleScan(SampleScan *node, EState *estate, int eflags)
ExecOpenScanRelation(estate,
node->scan.scanrelid,
eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/* we won't set up the HeapScanDesc till later */
scanstate->ss.ss_currentScanDesc = NULL;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 4da0f28f7b..b0b34cd14e 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -153,6 +153,8 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
ExecOpenScanRelation(estate,
node->scan.scanrelid,
eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/* and create slot with the appropriate rowtype */
ExecInitScanTupleSlot(estate, &scanstate->ss,
diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index 4bc2406b89..912cf7b37f 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -528,6 +528,8 @@ ExecInitSetOp(SetOp *node, EState *estate, int eflags)
if (node->strategy == SETOP_HASHED)
eflags &= ~EXEC_FLAG_REWIND;
outerPlanState(setopstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
outerDesc = ExecGetResultType(outerPlanState(setopstate));
/*
@@ -589,8 +591,6 @@ ExecEndSetOp(SetOpState *node)
if (node->tableContext)
MemoryContextDelete(node->tableContext);
ExecFreeExprContext(&node->ps);
-
- ExecEndNode(outerPlanState(node));
}
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index c6c72c6e67..1ba53373c2 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -263,6 +263,8 @@ ExecInitSort(Sort *node, EState *estate, int eflags)
eflags &= ~(EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK);
outerPlanState(sortstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize scan slot and type.
@@ -317,11 +319,6 @@ ExecEndSort(SortState *node)
tuplesort_end((Tuplesortstate *) node->tuplesortstate);
node->tuplesortstate = NULL;
- /*
- * shut down the subplan
- */
- ExecEndNode(outerPlanState(node));
-
SO1_printf("ExecEndSort: %s\n",
"sort node shutdown");
}
diff --git a/src/backend/executor/nodeSubqueryscan.c b/src/backend/executor/nodeSubqueryscan.c
index 42471bfc04..12014250ae 100644
--- a/src/backend/executor/nodeSubqueryscan.c
+++ b/src/backend/executor/nodeSubqueryscan.c
@@ -124,6 +124,8 @@ ExecInitSubqueryScan(SubqueryScan *node, EState *estate, int eflags)
* initialize subquery
*/
subquerystate->subplan = ExecInitNode(node->subplan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize scan slot and type (needed by ExecAssignScanProjectionInfo)
@@ -178,11 +180,6 @@ ExecEndSubqueryScan(SubqueryScanState *node)
if (node->ss.ps.ps_ResultTupleSlot)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
- /*
- * close down subquery
- */
- ExecEndNode(node->subplan);
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 2124c55ef5..613b377c7c 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -386,6 +386,8 @@ ExecInitTidRangeScan(TidRangeScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
tidrangestate->ss.ss_currentRelation = currentRelation;
tidrangestate->ss.ss_currentScanDesc = NULL; /* no table scan here */
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index 862bd0330b..1b0a2d8083 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -529,6 +529,8 @@ ExecInitTidScan(TidScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
tidstate->ss.ss_currentRelation = currentRelation;
tidstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
diff --git a/src/backend/executor/nodeUnique.c b/src/backend/executor/nodeUnique.c
index 45035d74fa..bd71033622 100644
--- a/src/backend/executor/nodeUnique.c
+++ b/src/backend/executor/nodeUnique.c
@@ -136,6 +136,8 @@ ExecInitUnique(Unique *node, EState *estate, int eflags)
* then initialize outer plan
*/
outerPlanState(uniquestate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize result slot and type. Unique nodes do no projections, so
@@ -172,8 +174,6 @@ ExecEndUnique(UniqueState *node)
ExecClearTuple(node->ps.ps_ResultTupleSlot);
ExecFreeExprContext(&node->ps);
-
- ExecEndNode(outerPlanState(node));
}
diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 310ac23e3a..483f23da18 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -2458,6 +2458,8 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags)
*/
outerPlan = outerPlan(node);
outerPlanState(winstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* initialize source tuple type (which is also the tuple type that we'll
@@ -2681,7 +2683,6 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags)
void
ExecEndWindowAgg(WindowAggState *node)
{
- PlanState *outerPlan;
int i;
release_partition(node);
@@ -2713,9 +2714,6 @@ ExecEndWindowAgg(WindowAggState *node)
pfree(node->perfunc);
pfree(node->peragg);
-
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
}
/* -----------------
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 33975687b3..07b1f453e2 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -71,7 +71,7 @@ static int _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
static ParamListInfo _SPI_convert_params(int nargs, Oid *argtypes,
Datum *Values, const char *Nulls);
-static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount);
+static int _SPI_pquery(QueryDesc *queryDesc, uint64 tcount);
static void _SPI_error_callback(void *arg);
@@ -1623,6 +1623,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
_SPI_current->processed = 0;
_SPI_current->tuptable = NULL;
+replan:
/* Create the portal */
if (name == NULL || name[0] == '\0')
{
@@ -1766,7 +1767,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
}
/*
- * Start portal execution.
+ * Start portal execution. If the portal contains a cached plan, it must
+ * be recreated if portal->plan_valid is false, which indicates that the
+ * cached plan was found to have been invalidated while initializing one
+ * of the plan trees contained in it.
*/
PortalStart(portal, paramLI, 0, snapshot);
@@ -1775,6 +1779,12 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
/* Pop the error context stack */
error_context_stack = spierrcontext.previous;
+ if (!portal->plan_valid)
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
+
/* Pop the SPI stack */
_SPI_end_call(true);
@@ -2552,6 +2562,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
+replan:
cplan = GetCachedPlan(plansource, options->params,
plan_owner, _SPI_current->queryEnv);
@@ -2661,6 +2672,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
QueryDesc *qdesc;
Snapshot snap;
+ int eflags;
if (ActiveSnapshotSet())
snap = GetActiveSnapshot();
@@ -2668,14 +2680,32 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
snap = InvalidSnapshot;
qdesc = CreateQueryDesc(stmt,
+ cplan,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
options->params,
_SPI_current->queryEnv,
0);
- res = _SPI_pquery(qdesc, fire_triggers,
- canSetTag ? options->tcount : 0);
+
+ /* Select execution options */
+ if (fire_triggers)
+ eflags = 0; /* default run-to-completion flags */
+ else
+ eflags = EXEC_FLAG_SKIP_TRIGGERS;
+
+ ExecutorStart(qdesc, eflags);
+ if (!qdesc->plan_valid)
+ {
+ ExecutorFinish(qdesc);
+ ExecutorEnd(qdesc);
+ FreeQueryDesc(qdesc);
+ Assert(cplan);
+ ReleaseCachedPlan(cplan, plan_owner);
+ goto replan;
+ }
+
+ res = _SPI_pquery(qdesc, canSetTag ? options->tcount : 0);
FreeQueryDesc(qdesc);
}
else
@@ -2850,10 +2880,9 @@ _SPI_convert_params(int nargs, Oid *argtypes,
}
static int
-_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
+_SPI_pquery(QueryDesc *queryDesc, uint64 tcount)
{
int operation = queryDesc->operation;
- int eflags;
int res;
switch (operation)
@@ -2897,14 +2926,6 @@ _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
ResetUsage();
#endif
- /* Select execution options */
- if (fire_triggers)
- eflags = 0; /* default run-to-completion flags */
- else
- eflags = EXEC_FLAG_SKIP_TRIGGERS;
-
- ExecutorStart(queryDesc, eflags);
-
ExecutorRun(queryDesc, ForwardScanDirection, tcount, true);
_SPI_current->processed = queryDesc->estate->es_processed;
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index ee9b89a672..c807e9cdcc 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -27,6 +27,7 @@
#include "storage/procarray.h"
#include "storage/sinvaladt.h"
#include "utils/inval.h"
+#include "utils/lsyscache.h"
/*
@@ -364,6 +365,50 @@ CheckRelationLockedByMe(Relation relation, LOCKMODE lockmode, bool orstronger)
return false;
}
+/*
+ * CheckRelLockedByMe
+ *
+ * Returns true if current transaction holds a lock on the given relation of
+ * mode 'lockmode'. If 'orstronger' is true, a stronger lockmode is also OK.
+ * ("Stronger" is defined as "numerically higher", which is a bit
+ * semantically dubious but is OK for the purposes we use this for.)
+ */
+bool
+CheckRelLockedByMe(Oid relid, LOCKMODE lockmode, bool orstronger)
+{
+ Oid dbId = get_rel_relisshared(relid) ? InvalidOid : MyDatabaseId;
+ LOCKTAG tag;
+
+ SET_LOCKTAG_RELATION(tag, dbId, relid);
+
+ if (LockHeldByMe(&tag, lockmode))
+ return true;
+
+ if (orstronger)
+ {
+ LOCKMODE slockmode;
+
+ for (slockmode = lockmode + 1;
+ slockmode <= MaxLockMode;
+ slockmode++)
+ {
+ if (LockHeldByMe(&tag, slockmode))
+ {
+#ifdef NOT_USED
+ /* Sometimes this might be useful for debugging purposes */
+ elog(WARNING, "lock mode %s substituted for %s on relation %s",
+ GetLockmodeName(tag.locktag_lockmethodid, slockmode),
+ GetLockmodeName(tag.locktag_lockmethodid, lockmode),
+ get_rel_name(relid));
+#endif
+ return true;
+ }
+ }
+ }
+
+ return false;
+}
+
/*
* LockHasWaitersRelation
*
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 36cc99ec9c..160aef92f8 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1233,6 +1233,7 @@ exec_simple_query(const char *query_string)
* Start the portal. No parameters here.
*/
PortalStart(portal, NULL, 0, InvalidSnapshot);
+ Assert(portal->plan_valid);
/*
* Select the appropriate output format: text unless we are doing a
@@ -1737,6 +1738,7 @@ exec_bind_message(StringInfo input_message)
"commands ignored until end of transaction block"),
errdetail_abort()));
+replan:
/*
* Create the portal. Allow silent replacement of an existing portal only
* if the unnamed portal is specified.
@@ -2028,10 +2030,19 @@ exec_bind_message(StringInfo input_message)
PopActiveSnapshot();
/*
- * And we're ready to start portal execution.
+ * Start portal execution. If the portal contains a cached plan, it must
+ * be recreated if portal->plan_valid is false, which indicates that the
+ * cached plan was found to have been invalidated while initializing one
+ * of the plan trees contained in it.
*/
PortalStart(portal, params, 0, InvalidSnapshot);
+ if (!portal->plan_valid)
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
+
/*
* Apply the result format requests to the portal.
*/
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5565f200c3..dab971ab0f 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -19,6 +19,7 @@
#include "access/xact.h"
#include "commands/prepare.h"
+#include "executor/execdesc.h"
#include "executor/tstoreReceiver.h"
#include "miscadmin.h"
#include "pg_trace.h"
@@ -35,12 +36,6 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
- const char *sourceText,
- ParamListInfo params,
- QueryEnvironment *queryEnv,
- DestReceiver *dest,
- QueryCompletion *qc);
static void FillPortalStore(Portal portal, bool isTopLevel);
static uint64 RunFromStore(Portal portal, ScanDirection direction, uint64 count,
DestReceiver *dest);
@@ -65,6 +60,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +73,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->cplan = cplan; /* CachedPlan, if plan is from one */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -116,86 +113,6 @@ FreeQueryDesc(QueryDesc *qdesc)
}
-/*
- * ProcessQuery
- * Execute a single plannable query within a PORTAL_MULTI_QUERY,
- * PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
- *
- * plan: the plan tree for the query
- * sourceText: the source text of the query
- * params: any parameters needed
- * dest: where to send results
- * qc: where to store the command completion status data.
- *
- * qc may be NULL if caller doesn't want a status string.
- *
- * Must be called in a memory context that will be reset or deleted on
- * error; otherwise the executor's memory usage will be leaked.
- */
-static void
-ProcessQuery(PlannedStmt *plan,
- const char *sourceText,
- ParamListInfo params,
- QueryEnvironment *queryEnv,
- DestReceiver *dest,
- QueryCompletion *qc)
-{
- QueryDesc *queryDesc;
-
- /*
- * Create the QueryDesc object
- */
- queryDesc = CreateQueryDesc(plan, sourceText,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, 0);
-
- /*
- * Call ExecutorStart to prepare the plan for execution
- */
- ExecutorStart(queryDesc, 0);
-
- /*
- * Run the plan to completion.
- */
- ExecutorRun(queryDesc, ForwardScanDirection, 0, true);
-
- /*
- * Build command completion status data, if caller wants one.
- */
- if (qc)
- {
- switch (queryDesc->operation)
- {
- case CMD_SELECT:
- SetQueryCompletion(qc, CMDTAG_SELECT, queryDesc->estate->es_processed);
- break;
- case CMD_INSERT:
- SetQueryCompletion(qc, CMDTAG_INSERT, queryDesc->estate->es_processed);
- break;
- case CMD_UPDATE:
- SetQueryCompletion(qc, CMDTAG_UPDATE, queryDesc->estate->es_processed);
- break;
- case CMD_DELETE:
- SetQueryCompletion(qc, CMDTAG_DELETE, queryDesc->estate->es_processed);
- break;
- case CMD_MERGE:
- SetQueryCompletion(qc, CMDTAG_MERGE, queryDesc->estate->es_processed);
- break;
- default:
- SetQueryCompletion(qc, CMDTAG_UNKNOWN, queryDesc->estate->es_processed);
- break;
- }
- }
-
- /*
- * Now, we close down all the scans and free allocated resources.
- */
- ExecutorFinish(queryDesc);
- ExecutorEnd(queryDesc);
-
- FreeQueryDesc(queryDesc);
-}
-
/*
* ChoosePortalStrategy
* Select portal execution strategy given the intended statement list.
@@ -427,7 +344,8 @@ FetchStatementTargetList(Node *stmt)
* to be used for cursors).
*
* On return, portal is ready to accept PortalRun() calls, and the result
- * tupdesc (if any) is known.
+ * tupdesc (if any) is known, unless portal->plan_valid is set to false, in
+ * which case the caller must retry after generating a new CachedPlan.
*/
void
PortalStart(Portal portal, ParamListInfo params,
@@ -435,10 +353,9 @@ PortalStart(Portal portal, ParamListInfo params,
{
Portal saveActivePortal;
ResourceOwner saveResourceOwner;
- MemoryContext savePortalContext;
MemoryContext oldContext;
QueryDesc *queryDesc;
- int myeflags;
+ int myeflags = 0;
Assert(PortalIsValid(portal));
Assert(portal->status == PORTAL_DEFINED);
@@ -448,15 +365,13 @@ PortalStart(Portal portal, ParamListInfo params,
*/
saveActivePortal = ActivePortal;
saveResourceOwner = CurrentResourceOwner;
- savePortalContext = PortalContext;
PG_TRY();
{
ActivePortal = portal;
if (portal->resowner)
CurrentResourceOwner = portal->resowner;
- PortalContext = portal->portalContext;
- oldContext = MemoryContextSwitchTo(PortalContext);
+ oldContext = MemoryContextSwitchTo(portal->queryContext);
/* Must remember portal param list, if any */
portal->portalParams = params;
@@ -472,6 +387,8 @@ PortalStart(Portal portal, ParamListInfo params,
switch (portal->strategy)
{
case PORTAL_ONE_SELECT:
+ case PORTAL_ONE_RETURNING:
+ case PORTAL_ONE_MOD_WITH:
/* Must set snapshot before starting executor. */
if (snapshot)
@@ -493,6 +410,7 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->cplan,
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -501,30 +419,52 @@ PortalStart(Portal portal, ParamListInfo params,
portal->queryEnv,
0);
+ /* Remember for PortalRunMulti(). */
+ if (portal->strategy == PORTAL_ONE_RETURNING ||
+ portal->strategy == PORTAL_ONE_MOD_WITH)
+ portal->qdescs = list_make1(queryDesc);
+
/*
* If it's a scrollable cursor, executor needs to support
* REWIND and backwards scan, as well as whatever the caller
* might've asked for.
*/
- if (portal->cursorOptions & CURSOR_OPT_SCROLL)
+ if (portal->strategy == PORTAL_ONE_SELECT &&
+ (portal->cursorOptions & CURSOR_OPT_SCROLL))
myeflags = eflags | EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD;
else
myeflags = eflags;
/*
- * Call ExecutorStart to prepare the plan for execution
+ * Call ExecutorStart to prepare the plan for execution. A
+ * cached plan may get invalidated as we're doing that.
*/
ExecutorStart(queryDesc, myeflags);
+ if (!queryDesc->plan_valid)
+ {
+ Assert(queryDesc->cplan);
+ PortalQueryFinish(queryDesc);
+ PopActiveSnapshot();
+ portal->plan_valid = false;
+ goto early_exit;
+ }
/*
- * This tells PortalCleanup to shut down the executor
+ * This tells PortalCleanup to shut down the executor, though
+ * not needed for queries handled by PortalRunMulti().
*/
- portal->queryDesc = queryDesc;
+ if (portal->strategy == PORTAL_ONE_SELECT)
+ portal->queryDesc = queryDesc;
/*
- * Remember tuple descriptor (computed by ExecutorStart)
+ * Remember tuple descriptor (computed by ExecutorStart),
+ * though make it independent of QueryDesc for queries handled
+ * by PortalRunMulti().
*/
- portal->tupDesc = queryDesc->tupDesc;
+ if (portal->strategy != PORTAL_ONE_SELECT)
+ portal->tupDesc = CreateTupleDescCopy(queryDesc->tupDesc);
+ else
+ portal->tupDesc = queryDesc->tupDesc;
/*
* Reset cursor position data to "start of query"
@@ -532,33 +472,11 @@ PortalStart(Portal portal, ParamListInfo params,
portal->atStart = true;
portal->atEnd = false; /* allow fetches */
portal->portalPos = 0;
+ portal->plan_valid = true;
PopActiveSnapshot();
break;
- case PORTAL_ONE_RETURNING:
- case PORTAL_ONE_MOD_WITH:
-
- /*
- * We don't start the executor until we are told to run the
- * portal. We do need to set up the result tupdesc.
- */
- {
- PlannedStmt *pstmt;
-
- pstmt = PortalGetPrimaryStmt(portal);
- portal->tupDesc =
- ExecCleanTypeFromTL(pstmt->planTree->targetlist);
- }
-
- /*
- * Reset cursor position data to "start of query"
- */
- portal->atStart = true;
- portal->atEnd = false; /* allow fetches */
- portal->portalPos = 0;
- break;
-
case PORTAL_UTIL_SELECT:
/*
@@ -578,11 +496,87 @@ PortalStart(Portal portal, ParamListInfo params,
portal->atStart = true;
portal->atEnd = false; /* allow fetches */
portal->portalPos = 0;
+ portal->plan_valid = true;
break;
case PORTAL_MULTI_QUERY:
- /* Need do nothing now */
+ {
+ ListCell *lc;
+ bool first = true;
+
+ myeflags = eflags;
+ foreach(lc, portal->stmts)
+ {
+ PlannedStmt *plan = lfirst_node(PlannedStmt, lc);
+ bool is_utility = (plan->utilityStmt != NULL);
+
+ /*
+ * Push the snapshot to be used by the executor.
+ */
+ if (!is_utility)
+ {
+ /*
+ * Must copy the snapshot for all statements
+ * except the first as we'll need to update its
+ * command ID.
+ */
+ if (!first)
+ PushCopiedSnapshot(GetTransactionSnapshot());
+ else
+ PushActiveSnapshot(GetTransactionSnapshot());
+ }
+
+ /*
+ * From the 2nd statement onwards, update the command
+ * ID and the snapshot to match.
+ */
+ if (!first)
+ {
+ CommandCounterIncrement();
+ UpdateActiveSnapshotCommandId();
+ }
+
+ first = false;
+
+ /*
+ * Create the QueryDesc object. DestReceiver will
+ * be set in PortalRunMulti().
+ */
+ queryDesc = CreateQueryDesc(plan, portal->cplan,
+ portal->sourceText,
+ !is_utility ?
+ GetActiveSnapshot() :
+ InvalidSnapshot,
+ InvalidSnapshot,
+ NULL,
+ params,
+ portal->queryEnv, 0);
+
+ /* Remember for PortalRunMulti() */
+ portal->qdescs = lappend(portal->qdescs, queryDesc);
+
+ if (is_utility)
+ continue;
+
+ /*
+ * Call ExecutorStart to prepare the plan for
+ * execution. A cached plan may get invalidated as
+ * we're doing that.
+ */
+ ExecutorStart(queryDesc, myeflags);
+ PopActiveSnapshot();
+ if (!queryDesc->plan_valid)
+ {
+ Assert(queryDesc->cplan);
+ PortalQueryFinish(queryDesc);
+ portal->plan_valid = false;
+ goto early_exit;
+ }
+ }
+ }
+
portal->tupDesc = NULL;
+ portal->plan_valid = true;
break;
}
}
@@ -594,19 +588,18 @@ PortalStart(Portal portal, ParamListInfo params,
/* Restore global vars and propagate error */
ActivePortal = saveActivePortal;
CurrentResourceOwner = saveResourceOwner;
- PortalContext = savePortalContext;
PG_RE_THROW();
}
PG_END_TRY();
+ portal->status = PORTAL_READY;
+
+early_exit:
MemoryContextSwitchTo(oldContext);
ActivePortal = saveActivePortal;
CurrentResourceOwner = saveResourceOwner;
- PortalContext = savePortalContext;
-
- portal->status = PORTAL_READY;
}
/*
@@ -1193,7 +1186,7 @@ PortalRunMulti(Portal portal,
QueryCompletion *qc)
{
bool active_snapshot_set = false;
- ListCell *stmtlist_item;
+ ListCell *qdesc_item;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1214,9 +1207,10 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
- foreach(stmtlist_item, portal->stmts)
+ foreach(qdesc_item, portal->qdescs)
{
- PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ QueryDesc *qdesc = (QueryDesc *) lfirst(qdesc_item);
+ PlannedStmt *pstmt = qdesc->plannedstmt;
/*
* If we got a cancel signal in prior command, quit
@@ -1233,33 +1227,26 @@ PortalRunMulti(Portal portal,
if (log_executor_stats)
ResetUsage();
- /*
- * Must always have a snapshot for plannable queries. First time
- * through, take a new snapshot; for subsequent queries in the
- * same portal, just update the snapshot's copy of the command
- * counter.
- */
+ /* Push the snapshot for plannable queries. */
if (!active_snapshot_set)
{
- Snapshot snapshot = GetTransactionSnapshot();
+ Snapshot snapshot = qdesc->snapshot;
- /* If told to, register the snapshot and save in portal */
+ /*
+ * If told to, register the snapshot and save in portal
+ *
+ * Note that the command ID of qdesc->snapshot for the 2nd query
+ * onwards would have been updated in PortalStart() to account
+ * for CCI() done between queries, but it's OK that here we
+ * don't likewise update holdSnapshot's command ID.
+ */
if (setHoldSnapshot)
{
snapshot = RegisterSnapshot(snapshot);
portal->holdSnapshot = snapshot;
}
- /*
- * We can't have the holdSnapshot also be the active one,
- * because UpdateActiveSnapshotCommandId would complain. So
- * force an extra snapshot copy. Plain PushActiveSnapshot
- * would have copied the transaction snapshot anyway, so this
- * only adds a copy step when setHoldSnapshot is true. (It's
- * okay for the command ID of the active snapshot to diverge
- * from what holdSnapshot has.)
- */
- PushCopiedSnapshot(snapshot);
+ PushActiveSnapshot(snapshot);
/*
* As for PORTAL_ONE_SELECT portals, it does not seem
@@ -1268,26 +1255,39 @@ PortalRunMulti(Portal portal,
active_snapshot_set = true;
}
- else
- UpdateActiveSnapshotCommandId();
+ /*
+ * Run the plan to completion.
+ */
+ qdesc->dest = dest;
+ ExecutorRun(qdesc, ForwardScanDirection, 0, true);
+
+ /*
+ * Build command completion status data if needed.
+ */
if (pstmt->canSetTag)
{
- /* statement can set tag string */
- ProcessQuery(pstmt,
- portal->sourceText,
- portal->portalParams,
- portal->queryEnv,
- dest, qc);
- }
- else
- {
- /* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
- portal->sourceText,
- portal->portalParams,
- portal->queryEnv,
- altdest, NULL);
+ switch (qdesc->operation)
+ {
+ case CMD_SELECT:
+ SetQueryCompletion(qc, CMDTAG_SELECT, qdesc->estate->es_processed);
+ break;
+ case CMD_INSERT:
+ SetQueryCompletion(qc, CMDTAG_INSERT, qdesc->estate->es_processed);
+ break;
+ case CMD_UPDATE:
+ SetQueryCompletion(qc, CMDTAG_UPDATE, qdesc->estate->es_processed);
+ break;
+ case CMD_DELETE:
+ SetQueryCompletion(qc, CMDTAG_DELETE, qdesc->estate->es_processed);
+ break;
+ case CMD_MERGE:
+ SetQueryCompletion(qc, CMDTAG_MERGE, qdesc->estate->es_processed);
+ break;
+ default:
+ SetQueryCompletion(qc, CMDTAG_UNKNOWN, qdesc->estate->es_processed);
+ break;
+ }
}
if (log_executor_stats)
@@ -1342,12 +1342,12 @@ PortalRunMulti(Portal portal,
if (portal->stmts == NIL)
break;
- /*
- * Increment command counter between queries, but not after the last
- * one.
- */
- if (lnext(portal->stmts, stmtlist_item) != NULL)
- CommandCounterIncrement();
+ if (qdesc->estate)
+ {
+ ExecutorFinish(qdesc);
+ ExecutorEnd(qdesc);
+ }
+ FreeQueryDesc(qdesc);
}
/* Pop the snapshot if we pushed one. */
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 2c01a86c29..2adb1588a9 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -2095,6 +2095,27 @@ get_rel_persistence(Oid relid)
return result;
}
+/*
+ * get_rel_relisshared
+ *
+ * Returns whether the given relation is shared
+ */
+bool
+get_rel_relisshared(Oid relid)
+{
+ HeapTuple tp;
+ Form_pg_class reltup;
+ bool result;
+
+ tp = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+ if (!HeapTupleIsValid(tp))
+ elog(ERROR, "cache lookup failed for relation %u", relid);
+ reltup = (Form_pg_class) GETSTRUCT(tp);
+ result = reltup->relisshared;
+ ReleaseSysCache(tp);
+
+ return result;
+}
/* ---------- TRANSFORM CACHE ---------- */
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 3d3f7a9bea..e6237d70b3 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -102,13 +102,13 @@ static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool GenericPlanIsValid(CachedPlan *cplan);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -790,8 +790,14 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
*
- * On a "true" return, we have acquired the locks needed to run the plan.
- * (We must do this for the "true" result to be race-condition-free.)
+ * Note though that if the plan contains any child relations that would have
+ * been added by the planner, which would not have been locked yet (because
+ * AcquirePlannerLocks() only locks relations that would be present in the
+ * range table before entering the planner), the plan could go stale before
+ * it reaches execution if any of those child relations get modified
+ * concurrently. The executor must check that the plan (CachedPlan) is still
+ * valid after taking a lock on each of the child tables, and if it is not,
+ * ask the caller to recreate the plan.
*/
static bool
CheckCachedPlan(CachedPlanSource *plansource)
@@ -805,60 +811,56 @@ CheckCachedPlan(CachedPlanSource *plansource)
if (!plan)
return false;
- Assert(plan->magic == CACHEDPLAN_MAGIC);
- /* Generic plans are never one-shot */
- Assert(!plan->is_oneshot);
+ if (GenericPlanIsValid(plan))
+ return true;
/*
- * If plan isn't valid for current role, we can't use it.
+ * Plan has been invalidated, so unlink it from the parent and release it.
*/
- if (plan->is_valid && plan->dependsOnRole &&
- plan->planRoleId != GetUserId())
- plan->is_valid = false;
+ ReleaseGenericPlan(plansource);
- /*
- * If it appears valid, acquire locks and recheck; this is much the same
- * logic as in RevalidateCachedQuery, but for a plan.
- */
- if (plan->is_valid)
+ return false;
+}
+
+/*
+ * GenericPlanIsValid
+ * Is a generic plan still valid?
+ *
+ * It may have gone stale due to concurrent schema modifications of relations
+ * mentioned in the plan, or due to the other conditions checked below.
+ */
+static bool
+GenericPlanIsValid(CachedPlan *cplan)
+{
+ Assert(cplan != NULL);
+ Assert(cplan->magic == CACHEDPLAN_MAGIC);
+ /* Generic plans are never one-shot */
+ Assert(!cplan->is_oneshot);
+
+ if (cplan->is_valid)
{
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
- Assert(plan->refcount > 0);
-
- AcquireExecutorLocks(plan->stmt_list, true);
+ Assert(cplan->refcount > 0);
/*
- * If plan was transient, check to see if TransactionXmin has
- * advanced, and if so invalidate it.
+ * If plan isn't valid for current role, we can't use it.
*/
- if (plan->is_valid &&
- TransactionIdIsValid(plan->saved_xmin) &&
- !TransactionIdEquals(plan->saved_xmin, TransactionXmin))
- plan->is_valid = false;
+ if (cplan->dependsOnRole && cplan->planRoleId != GetUserId())
+ cplan->is_valid = false;
/*
- * By now, if any invalidation has happened, the inval callback
- * functions will have marked the plan invalid.
+ * If plan was transient, check to see if TransactionXmin has
+ * advanced, and if so invalidate it.
*/
- if (plan->is_valid)
- {
- /* Successfully revalidated and locked the query. */
- return true;
- }
-
- /* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ if (TransactionIdIsValid(cplan->saved_xmin) &&
+ !TransactionIdEquals(cplan->saved_xmin, TransactionXmin))
+ cplan->is_valid = false;
}
- /*
- * Plan has been invalidated, so unlink it from the parent and release it.
- */
- ReleaseGenericPlan(plansource);
-
- return false;
+ return cplan->is_valid;
}
/*
@@ -1128,8 +1130,15 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
- * On return, the plan is valid and we have sufficient locks to begin
- * execution.
+ * On return, the plan is valid unless it contains inheritance/partition child
+ * tables, that is, only the locks on the tables mentioned in the query have
+ * been taken. If any of those tables have inheritance/partition child tables,
+ * the executor must also lock them before executing the plan and, if the plan
+ * gets invalidated as a result of taking those locks, ask the caller to get a
+ * new plan by calling here again. Locking of the child tables must be
+ * deferred to the executor like this, because not all child tables may need
+ * to be locked; some may get pruned during the executor plan initialization
+ * phase (InitPlan()).
*
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
@@ -1362,8 +1371,8 @@ CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
}
/*
- * Reject if AcquireExecutorLocks would have anything to do. This is
- * probably unnecessary given the previous check, but let's be safe.
+ * Reject if the executor would need to take additional locks, that is, in
+ * addition to those taken by AcquirePlannerLocks() on a given query.
*/
foreach(lc, plan->stmt_list)
{
@@ -1737,58 +1746,6 @@ QueryListGetPrimaryStmt(List *stmts)
return NULL;
}
-/*
- * AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
- */
-static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
-{
- ListCell *lc1;
-
- foreach(lc1, stmt_list)
- {
- PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
-
- if (plannedstmt->commandType == CMD_UTILITY)
- {
- /*
- * Ignore utility statements, except those (such as EXPLAIN) that
- * contain a parsed-but-not-planned query. Note: it's okay to use
- * ScanQueryForLocks, even though the query hasn't been through
- * rule rewriting, because rewriting doesn't change the query
- * representation.
- */
- Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
-
- if (query)
- ScanQueryForLocks(query, acquire);
- continue;
- }
-
- foreach(lc2, plannedstmt->rtable)
- {
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
-
- if (!(rte->rtekind == RTE_RELATION ||
- (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
- continue;
-
- /*
- * Acquire the appropriate type of lock on each relation OID. Note
- * that we don't actually try to open the rel, and hence will not
- * fail if it's been dropped entirely --- we'll just transiently
- * acquire a non-conflicting lock.
- */
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
- }
- }
-}
-
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 06dfa85f04..0cad450dcd 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -201,6 +201,13 @@ CreatePortal(const char *name, bool allowDup, bool dupSilent)
portal->portalContext = AllocSetContextCreate(TopPortalContext,
"PortalContext",
ALLOCSET_SMALL_SIZES);
+ /*
+ * initialize portal's query context to store QueryDescs created during
+ * PortalStart() and then used in PortalRun().
+ */
+ portal->queryContext = AllocSetContextCreate(TopPortalContext,
+ "PortalQueryContext",
+ ALLOCSET_SMALL_SIZES);
/* create a resource owner for the portal */
portal->resowner = ResourceOwnerCreate(CurTransactionResourceOwner,
@@ -224,6 +231,7 @@ CreatePortal(const char *name, bool allowDup, bool dupSilent)
/* for named portals reuse portal->name copy */
MemoryContextSetIdentifier(portal->portalContext, portal->name[0] ? portal->name : "<unnamed>");
+ MemoryContextSetIdentifier(portal->queryContext, portal->name[0] ? portal->name : "<unnamed>");
return portal;
}
@@ -594,6 +602,7 @@ PortalDrop(Portal portal, bool isTopCommit)
/* release subsidiary storage */
MemoryContextDelete(portal->portalContext);
+ MemoryContextDelete(portal->queryContext);
/* release portal struct (it's in TopPortalContext) */
pfree(portal);
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 3d3e632a0c..392abb5150 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -88,7 +88,11 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern QueryDesc *ExplainQueryDesc(PlannedStmt *stmt, struct CachedPlan *cplan,
+ const char *queryString, IntoClause *into, ExplainState *es,
+ ParamListInfo params, QueryEnvironment *queryEnv);
+extern void ExplainOnePlan(QueryDesc *queryDesc,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
@@ -104,6 +108,7 @@ extern void ExplainQueryParameters(ExplainState *es, ParamListInfo params, int m
extern void ExplainBeginOutput(ExplainState *es);
extern void ExplainEndOutput(ExplainState *es);
+extern void ExplainResetOutput(ExplainState *es);
extern void ExplainSeparatePlans(ExplainState *es);
extern void ExplainPropertyList(const char *qlabel, List *data,
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index af2bf36dfb..c36c25b497 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -32,9 +32,12 @@
*/
typedef struct QueryDesc
{
+ NodeTag type;
+
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ struct CachedPlan *cplan; /* CachedPlan, if plannedstmt is from one */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -47,6 +50,7 @@ typedef struct QueryDesc
TupleDesc tupDesc; /* descriptor for result tuples */
EState *estate; /* executor's query-wide state */
PlanState *planstate; /* tree of per-plan-node state */
+ bool plan_valid; /* is planstate tree fully valid? */
/* This field is set by ExecutorRun */
bool already_executed; /* true if previously executed */
@@ -57,6 +61,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ struct CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index c677e490d7..e05e23bb4a 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -19,6 +19,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/plancache.h"
/*
@@ -256,6 +257,17 @@ extern void ExecEndNode(PlanState *node);
extern void ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * Is the cached plan, if any, still valid at this point? That is, not
+ * invalidated by the incoming invalidation messages that have been processed
+ * recently.
+ */
+static inline bool
+ExecPlanStillValid(EState *estate)
+{
+ return estate->es_cachedplan == NULL ? true :
+ CachedPlanStillValid(estate->es_cachedplan);
+}
/* ----------------------------------------------------------------
* ExecProcNode
@@ -590,6 +602,7 @@ exec_rt_fetch(Index rti, EState *estate)
}
extern Relation ExecGetRangeTableRelation(EState *estate, Index rti);
+extern void ExecLockAppendNonLeafRelations(EState *estate, List *allpartrelids);
extern void ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
Index rti);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index cb714f4a19..f0c5177b06 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -623,6 +623,8 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ struct CachedPlan *es_cachedplan; /* CachedPlan if plannedstmt is from
+ * one */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
@@ -671,6 +673,10 @@ typedef struct EState
List *es_exprcontexts; /* List of ExprContexts within EState */
+ List *es_inited_plannodes; /* List of PlanState of nodes from the
+ * plan tree that were fully
+ * initialized */
+
List *es_subplanstates; /* List of PlanState for SubPlans */
List *es_auxmodifytables; /* List of secondary ModifyTableStates */
diff --git a/src/include/storage/lmgr.h b/src/include/storage/lmgr.h
index 4ee91e3cf9..598bf2688a 100644
--- a/src/include/storage/lmgr.h
+++ b/src/include/storage/lmgr.h
@@ -48,6 +48,7 @@ extern bool ConditionalLockRelation(Relation relation, LOCKMODE lockmode);
extern void UnlockRelation(Relation relation, LOCKMODE lockmode);
extern bool CheckRelationLockedByMe(Relation relation, LOCKMODE lockmode,
bool orstronger);
+extern bool CheckRelLockedByMe(Oid relid, LOCKMODE lockmode, bool orstronger);
extern bool LockHasWaitersRelation(Relation relation, LOCKMODE lockmode);
extern void LockRelationIdForSession(LockRelId *relid, LOCKMODE lockmode);
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index f5fdbfe116..a024e5dcd0 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -140,6 +140,7 @@ extern char get_rel_relkind(Oid relid);
extern bool get_rel_relispartition(Oid relid);
extern Oid get_rel_tablespace(Oid relid);
extern char get_rel_persistence(Oid relid);
+extern bool get_rel_relisshared(Oid relid);
extern Oid get_transform_fromsql(Oid typid, Oid langid, List *trftypes);
extern Oid get_transform_tosql(Oid typid, Oid langid, List *trftypes);
extern bool get_typisdefined(Oid typid);
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index a443181d41..8990fe72e3 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -221,6 +221,20 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
QueryEnvironment *queryEnv);
+
+/*
+ * CachedPlanStillValid
+ * Returns whether a cached generic plan is still valid
+ *
+ * Called by the executor on every relation lock taken when initializing the
+ * plan tree in the CachedPlan.
+ */
+static inline bool
+CachedPlanStillValid(CachedPlan *cplan)
+{
+ return cplan->is_valid;
+}
+
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aa08b1e0fc..24d420b9e9 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,9 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *qdescs; /* list of QueryDescs */
+ MemoryContext queryContext; /* memory for QueryDescs and children */
+ bool plan_valid; /* are plans in qdescs ready for execution? */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -242,6 +245,7 @@ extern void PortalDefineQuery(Portal portal,
CommandTag commandTag,
List *stmts,
CachedPlan *cplan);
+extern void PortalQueryFinish(QueryDesc *queryDesc);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
diff --git a/src/test/modules/delay_execution/Makefile b/src/test/modules/delay_execution/Makefile
index 70f24e846d..2fca84d027 100644
--- a/src/test/modules/delay_execution/Makefile
+++ b/src/test/modules/delay_execution/Makefile
@@ -8,7 +8,8 @@ OBJS = \
delay_execution.o
ISOLATION = partition-addition \
- partition-removal-1
+ partition-removal-1 \
+ cached-plan-replan
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/src/test/modules/delay_execution/delay_execution.c b/src/test/modules/delay_execution/delay_execution.c
index 7cd76eb34b..515b2c0c95 100644
--- a/src/test/modules/delay_execution/delay_execution.c
+++ b/src/test/modules/delay_execution/delay_execution.c
@@ -1,14 +1,18 @@
/*-------------------------------------------------------------------------
*
* delay_execution.c
- * Test module to allow delay between parsing and execution of a query.
+ * Test module to introduce delay at various points during execution of a
+ * query to test that execution proceeds safely in light of concurrent
+ * changes.
*
* The delay is implemented by taking and immediately releasing a specified
* advisory lock. If another process has previously taken that lock, the
* current process will be blocked until the lock is released; otherwise,
* there's no effect. This allows an isolationtester script to reliably
- * test behaviors where some specified action happens in another backend
- * between parsing and execution of any desired query.
+ * test behaviors where some specified action happens in another backend in
+ * a couple of cases: 1) between parsing and execution of any desired query
+ * when using the planner_hook, 2) between RevalidateCachedQuery() and
+ * ExecutorStart() when using the ExecutorStart_hook.
*
* Copyright (c) 2020-2023, PostgreSQL Global Development Group
*
@@ -22,6 +26,7 @@
#include <limits.h>
+#include "executor/executor.h"
#include "optimizer/planner.h"
#include "utils/builtins.h"
#include "utils/guc.h"
@@ -32,9 +37,11 @@ PG_MODULE_MAGIC;
/* GUC: advisory lock ID to use. Zero disables the feature. */
static int post_planning_lock_id = 0;
+static int executor_start_lock_id = 0;
-/* Save previous planner hook user to be a good citizen */
+/* Save previous hook users to be a good citizen */
static planner_hook_type prev_planner_hook = NULL;
+static ExecutorStart_hook_type prev_ExecutorStart_hook = NULL;
/* planner_hook function to provide the desired delay */
@@ -70,11 +77,41 @@ delay_execution_planner(Query *parse, const char *query_string,
return result;
}
+/* ExecutorStart_hook function to provide the desired delay */
+static void
+delay_execution_ExecutorStart(QueryDesc *queryDesc, int eflags)
+{
+ /* If enabled, delay by taking and releasing the specified lock */
+ if (executor_start_lock_id != 0)
+ {
+ DirectFunctionCall1(pg_advisory_lock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+ DirectFunctionCall1(pg_advisory_unlock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+
+ /*
+ * Ensure that we notice any pending invalidations, since the advisory
+ * lock functions don't do this.
+ */
+ AcceptInvalidationMessages();
+ }
+
+ /* Now start the executor, possibly via a previous hook user */
+ if (prev_ExecutorStart_hook)
+ prev_ExecutorStart_hook(queryDesc, eflags);
+ else
+ standard_ExecutorStart(queryDesc, eflags);
+
+ if (executor_start_lock_id != 0)
+ elog(NOTICE, "Finished ExecutorStart(): CachedPlan is %s",
+ queryDesc->cplan->is_valid ? "valid" : "not valid");
+}
+
/* Module load function */
void
_PG_init(void)
{
- /* Set up the GUC to control which lock is used */
+ /* Set up GUCs to control which lock is used */
DefineCustomIntVariable("delay_execution.post_planning_lock_id",
"Sets the advisory lock ID to be locked/unlocked after planning.",
"Zero disables the delay.",
@@ -86,10 +123,22 @@ _PG_init(void)
NULL,
NULL,
NULL);
-
+ DefineCustomIntVariable("delay_execution.executor_start_lock_id",
+ "Sets the advisory lock ID to be locked/unlocked before starting execution.",
+ "Zero disables the delay.",
+ &executor_start_lock_id,
+ 0,
+ 0, INT_MAX,
+ PGC_USERSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
MarkGUCPrefixReserved("delay_execution");
- /* Install our hook */
+ /* Install our hooks. */
prev_planner_hook = planner_hook;
planner_hook = delay_execution_planner;
+ prev_ExecutorStart_hook = ExecutorStart_hook;
+ ExecutorStart_hook = delay_execution_ExecutorStart;
}
diff --git a/src/test/modules/delay_execution/expected/cached-plan-replan.out b/src/test/modules/delay_execution/expected/cached-plan-replan.out
new file mode 100644
index 0000000000..0ac6a17c2b
--- /dev/null
+++ b/src/test/modules/delay_execution/expected/cached-plan-replan.out
@@ -0,0 +1,156 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1prep s2lock s1exec s2dropi s2unlock
+step s1prep: SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1;
+ EXPLAIN (COSTS OFF) EXECUTE q (1);
+QUERY PLAN
+--------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo11 foo_1
+ Recheck Cond: (a = $1)
+ -> Bitmap Index Scan on foo11_a_idx
+ Index Cond: (a = $1)
+(6 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+-----------------------------
+Append
+ Subplans Removed: 1
+ -> Seq Scan on foo11 foo_1
+ Filter: (a = $1)
+(4 rows)
+
+
+starting permutation: s1prep2 s2lock s1exec2 s2dropi s2unlock
+step s1prep2: SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = 1;
+ EXPLAIN (COSTS OFF) EXECUTE q2;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------
+Bitmap Heap Scan on foo11 foo
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on foo11_a_idx
+ Index Cond: (a = 1)
+(4 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec2: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec2: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------
+Seq Scan on foo11 foo
+ Filter: (a = 1)
+(2 rows)
+
+
+starting permutation: s1prep3 s2lock s1exec3 s2dropi s2unlock
+step s1prep3: SET plan_cache_mode = force_generic_plan;
+ SET enable_partitionwise_aggregate = on;
+ SET enable_partitionwise_join = on;
+ PREPARE q3 AS SELECT t1.a, count(t2.b) FROM foo t1, foo t2 WHERE t1.a = t2.a GROUP BY 1;
+ EXPLAIN (COSTS OFF) EXECUTE q3;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+----------------------------------------------------------------
+Append
+ -> GroupAggregate
+ Group Key: t1.a
+ -> Merge Join
+ Merge Cond: (t1.a = t2.a)
+ -> Index Only Scan using foo11_a_idx on foo11 t1
+ -> Materialize
+ -> Index Scan using foo11_a_idx on foo11 t2
+ -> GroupAggregate
+ Group Key: t1_1.a
+ -> Merge Join
+ Merge Cond: (t1_1.a = t2_1.a)
+ -> Sort
+ Sort Key: t1_1.a
+ -> Seq Scan on foo2 t1_1
+ -> Sort
+ Sort Key: t2_1.a
+ -> Seq Scan on foo2 t2_1
+(18 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec3: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec3: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------------------------------
+Append
+ -> GroupAggregate
+ Group Key: t1.a
+ -> Merge Join
+ Merge Cond: (t1.a = t2.a)
+ -> Sort
+ Sort Key: t1.a
+ -> Seq Scan on foo11 t1
+ -> Sort
+ Sort Key: t2.a
+ -> Seq Scan on foo11 t2
+ -> GroupAggregate
+ Group Key: t1_1.a
+ -> Merge Join
+ Merge Cond: (t1_1.a = t2_1.a)
+ -> Sort
+ Sort Key: t1_1.a
+ -> Seq Scan on foo2 t1_1
+ -> Sort
+ Sort Key: t2_1.a
+ -> Seq Scan on foo2 t2_1
+(21 rows)
+
diff --git a/src/test/modules/delay_execution/specs/cached-plan-replan.spec b/src/test/modules/delay_execution/specs/cached-plan-replan.spec
new file mode 100644
index 0000000000..3c92cbd5c6
--- /dev/null
+++ b/src/test/modules/delay_execution/specs/cached-plan-replan.spec
@@ -0,0 +1,61 @@
+# Test to check that invalidation of cached generic plans during ExecutorStart
+# correctly triggers replanning and re-execution.
+
+setup
+{
+ CREATE TABLE foo (a int, b text) PARTITION BY LIST(a);
+ CREATE TABLE foo1 PARTITION OF foo FOR VALUES IN (1) PARTITION BY LIST (a);
+ CREATE TABLE foo11 PARTITION OF foo1 FOR VALUES IN (1);
+ CREATE INDEX foo11_a ON foo1 (a);
+ CREATE TABLE foo2 PARTITION OF foo FOR VALUES IN (2);
+ CREATE VIEW foov AS SELECT * FROM foo;
+}
+
+teardown
+{
+ DROP VIEW foov;
+ DROP TABLE foo;
+}
+
+session "s1"
+# Append with run-time pruning
+step "s1prep" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+
+# no Append case (only one partition selected by the planner)
+step "s1prep2" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = 1;
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+
+# Append with partition-wise join aggregate and join plans as child subplans
+step "s1prep3" { SET plan_cache_mode = force_generic_plan;
+ SET enable_partitionwise_aggregate = on;
+ SET enable_partitionwise_join = on;
+ PREPARE q3 AS SELECT t1.a, count(t2.b) FROM foo t1, foo t2 WHERE t1.a = t2.a GROUP BY 1;
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+
+# Executes a generic plan
+step "s1exec" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+step "s1exec2" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+step "s1exec3" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+
+session "s2"
+step "s2lock" { SELECT pg_advisory_lock(12345); }
+step "s2unlock" { SELECT pg_advisory_unlock(12345); }
+step "s2dropi" { DROP INDEX foo11_a; }
+
+# While "s1exec", etc. wait to acquire the advisory lock, "s2drop" is able to
+# drop the index being used in the cached plan. When "s1exec" is then
+# unblocked and initializes the cached plan for execution, it detects the
+# concurrent index drop and causes the cached plan to be discarded and
+# recreated without the index.
+permutation "s1prep" "s2lock" "s1exec" "s2dropi" "s2unlock"
+permutation "s1prep2" "s2lock" "s1exec2" "s2dropi" "s2unlock"
+permutation "s1prep3" "s2lock" "s1exec3" "s2dropi" "s2unlock"
--
2.35.3
On Tue, 18 Jul 2023, 08:26 Amit Langote <amitlangote09@gmail.com> wrote:
Hi Thom,
On Tue, Jul 18, 2023 at 1:33 AM Thom Brown <thom@linux.com> wrote:
On Thu, 13 Jul 2023 at 13:59, Amit Langote <amitlangote09@gmail.com> wrote:
In an absolutely brown-paper-bag moment, I realized that I had not
updated src/backend/executor/README to reflect the changes to the
executor's control flow that this patch makes. That is, after
scrapping the old design back in January whose details *were*
reflected in the patches before that redesign. Anyway, the attached fixes that.
Tom, do you think you have bandwidth in the near future to give this
another look? I think I've addressed the comments that you had given
back in April, though as mentioned in the previous message, there may
still be some funny-looking aspects remaining. In any case, I
have no intention of pressing ahead with the patch without another
committer having had a chance to sign off on it.
I've only just started taking a look at this, and my first test drive
yields very impressive results:
8192 partitions (3 runs, 10000 rows)
Head 391.294989 382.622481 379.252236
Patched 13088.145995 13406.135531 13431.828051
Just to be sure, did you use pgbench -Mprepared with plan_cache_mode
= force_generic_plan in postgresql.conf?
I did.
For full disclosure, I also had max_locks_per_transaction set to 10000.
Looking at your changes to README, I would like to suggest rewording the following:
+table during planning. This means that inheritance child tables, which are
+added to the query's range table during planning, if they are present in a
+cached plan tree would not have been locked.
To:
This means that inheritance child tables present in a cached plan
tree, which are added to the query's range table during planning,
would not have been locked.
Also, further down:
s/intiatialize/initialize/
I'll carry on taking a closer look and see if I can break it.
Thanks for looking. I've fixed these issues in the attached updated
patch. I've also changed the position of a newly added paragraph in
src/backend/executor/README so that it doesn't break the flow of the
existing text.
Thanks.
Thom
While I was chatting with Robert about this patch set, he suggested that it
would be better to break out some executor refactoring changes from
the main patch (0003) into a separate patch. To wit, the changes to
make the PlanState tree cleanup in ExecEndPlan() non-recursive by
walking a flat list of PlanState nodes instead of the recursive tree
walk that ExecEndNode() currently does. That allows us to cleanly
handle the cases where the PlanState tree is only partially
constructed when ExecInitNode() detects in the middle of its
construction that the plan tree is no longer valid after receiving and
processing an invalidation message on locking child tables. Or at
least more cleanly than the previously proposed approach of adjusting
ExecEndNode() subroutines for the individual node types to gracefully
handle such partially initialized PlanState trees.
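To make that concrete, here is a minimal sketch of the new cleanup loop,
assuming, as the 0001 patch does with the es_inited_plannodes list, that
ExecInitNode() appends every PlanState it finishes building to a flat list
in the EState. The function name ExecEndPlanNodes is made up for
illustration; the patch does this work inside ExecEndPlan():
/*
 * Illustrative only: walk the flat list of initialized nodes instead of
 * recursing from the root.  Because only nodes that were actually
 * initialized are on the list, a partially constructed PlanState tree is
 * cleaned up exactly like a complete one.
 */
static void
ExecEndPlanNodes(EState *estate)
{
    ListCell   *lc;

    foreach(lc, estate->es_inited_plannodes)
    {
        PlanState  *ps = (PlanState *) lfirst(lc);

        /* Node-type subroutines no longer recurse into their children. */
        ExecEndNode(ps);
    }
}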
With the new approach, node-type-specific subroutines of ExecEndNode()
need not close their child nodes, because ExecEndPlan() directly closes
each node that was initialized. I couldn't find any instance of
breakage caused by this decoupling of child node cleanup from the
parent node's cleanup. Comments in ExecEndGather() and
ExecEndGatherMerge() appear to suggest that outerPlan must be closed
before the local cleanup:
void
ExecEndGather(GatherState *node)
{
- ExecEndNode(outerPlanState(node)); /* let children clean up first */
+ /* outerPlan is closed separately. */
ExecShutdownGather(node);
ExecFreeExprContext(&node->ps);
But I don't think there's a problem, because what ExecShutdownGather()
does seems entirely independent of cleanup of outerPlan.
As for the performance impact of maintaining the list of initialized
nodes to use during the cleanup phase, I couldn't find a regression,
nor any improvement from replacing the tree walk with a linear scan of
a list. Actually, ExecEndNode() is pretty far down in the perf profile
anyway, so the performance difference caused by the patch hardly
matters. See the following contrived example:
create table f();
analyze f;
explain (costs off) select count(*) from f f1, f f2, f f3, f f4, f f5,
f f6, f f7, f f8, f f9, f f10;
QUERY PLAN
------------------------------------------------------------------------------
Aggregate
-> Nested Loop
-> Nested Loop
-> Nested Loop
-> Nested Loop
-> Nested Loop
-> Nested Loop
-> Nested Loop
-> Nested Loop
-> Nested Loop
-> Seq Scan on f f1
-> Seq Scan on f f2
-> Seq Scan on f f3
-> Seq Scan on f f4
-> Seq Scan on f f5
-> Seq Scan on f f6
-> Seq Scan on f f7
-> Seq Scan on f f8
-> Seq Scan on f f9
-> Seq Scan on f f10
(20 rows)
do $$
begin
for i in 1..100000 loop
perform count(*) from f f1, f f2, f f3, f f4, f f5, f f6, f f7, f f8,
f f9, f f10;
end loop;
end; $$;
Times for the DO:
Unpatched:
Time: 756.353 ms
Time: 745.752 ms
Time: 749.184 ms
Patched:
Time: 737.717 ms
Time: 747.815 ms
Time: 753.456 ms
I've attached the new refactoring patch as 0001.
Another change I've made in the main patch is to change the API of
ExecutorStart() (and ExecutorStart_hook) to explicitly return a
boolean indicating whether or not the plan initialization was
successful. That way seems better than making the callers figure that
out by seeing that QueryDesc.planstate is NULL and/or checking
QueryDesc.plan_valid. Correspondingly, PortalStart() now also returns
true or false matching what ExecutorStart() returned. I suppose this
better alerts any extensions that use the ExecutorStart_hook to fix
their code to do the right thing.
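To illustrate the intended control flow at a call site, here is a rough
sketch; the variable names and the exact cleanup sequence are assumptions
for illustration, and the real call sites are in the attached patch:
/* Sketch of a caller retry loop under the boolean-returning API. */
for (;;)
{
    CachedPlan *cplan = GetCachedPlan(plansource, params, owner, queryEnv);
    QueryDesc  *qdesc = CreateQueryDesc(linitial_node(PlannedStmt,
                                                      cplan->stmt_list),
                                        cplan, query_string,
                                        GetActiveSnapshot(), InvalidSnapshot,
                                        dest, params, queryEnv, 0);

    if (ExecutorStart(qdesc, eflags))
        break;                  /* plan survived initialization; go run it */

    /*
     * The CachedPlan was invalidated while the executor was locking child
     * tables; discard this attempt and build a new plan.
     */
    ExecutorEnd(qdesc);
    FreeQueryDesc(qdesc);
    ReleaseCachedPlan(cplan, owner);
}
The goto-based replan loops in the patch amount to the same thing.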
Having extracted the ExecEndNode() change, I'm also starting to feel
inclined to extract a couple of other bits from the main patch as
separate patches, such as moving the ExecutorStart() call from
PortalRun() to PortalStart() for the multi-query portals. I'll do
that in the next version.
Attachments:
v43-0002-Add-field-to-store-parent-relids-to-Append-Merge.patchapplication/octet-stream; name=v43-0002-Add-field-to-store-parent-relids-to-Append-Merge.patchDownload
From 94117d1144355b2af718570244438853146edb4c Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Tue, 4 Jul 2023 22:36:31 +0900
Subject: [PATCH v43 2/5] Add field to store parent relids to
Append/MergeAppend
There's no way currently in the executor to tell if the child
subplans of Append/MergeAppend are scanning partitions, and if
they indeed do, what the RT indexes of their parent/ancestor tables
are. The executor doesn't need to see their RT indexes except for
run-time pruning, in which case they can be found in the
PartitionPruneInfo, but a future commit will create a need for
them to be available at all times for the purpose of locking
those parent/ancestor tables when executing a cached plan.
The code to look up partitioned parent relids for a given list of
partition scan subpaths of an Append/MergeAppend is already present
in make_partition_pruneinfo() but it's local to partprune.c. This
commit refactors that code into its own function called
add_append_subpath_partrelids() defined in appendinfo.c and
generalizes it to consider child join and aggregate paths. To
facilitate looking up the parent rels of child grouping rels in
add_append_subpath_partrelids(), parent links are now set in the
RelOptInfos of child grouping rels too, like they are in those of
child base and join rels.
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
src/backend/optimizer/plan/createplan.c | 41 ++++++--
src/backend/optimizer/plan/planner.c | 3 +
src/backend/optimizer/plan/setrefs.c | 4 +
src/backend/optimizer/util/appendinfo.c | 134 ++++++++++++++++++++++++
src/backend/partitioning/partprune.c | 124 +++-------------------
src/include/nodes/plannodes.h | 14 +++
src/include/optimizer/appendinfo.h | 3 +
src/include/partitioning/partprune.h | 3 +-
8 files changed, 203 insertions(+), 123 deletions(-)
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index af48109058..8ac1d3909b 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -25,6 +25,7 @@
#include "nodes/extensible.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
+#include "optimizer/appendinfo.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/optimizer.h"
@@ -1210,6 +1211,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
Oid *nodeCollations = NULL;
bool *nodeNullsFirst = NULL;
bool consider_async = false;
+ List *allpartrelids = NIL;
/*
* The subpaths list could be empty, if every child was proven empty by
@@ -1351,15 +1353,23 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
++nasyncplans;
}
+ /*
+ * Find partitioned parent rel(s) of the subpath's rel(s).
+ */
+ allpartrelids = add_append_subpath_partrelids(root, subpath, rel,
+ allpartrelids);
+
subplans = lappend(subplans, subplan);
}
+ plan->allpartrelids = allpartrelids;
+
/*
- * If any quals exist, they may be useful to perform further partition
- * pruning during execution. Gather information needed by the executor to
- * do partition pruning.
+ * If scanning partitions, check if there are quals that may be useful to
+ * perform further partition pruning during execution. Gather information
+ * needed by the executor to do partition pruning.
*/
- if (enable_partition_pruning)
+ if (enable_partition_pruning && allpartrelids != NIL)
{
List *prunequal;
@@ -1380,7 +1390,8 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
partpruneinfo =
make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ allpartrelids);
}
plan->appendplans = subplans;
@@ -1426,6 +1437,7 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
PartitionPruneInfo *partpruneinfo = NULL;
+ List *allpartrelids = NIL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1515,15 +1527,23 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplan = (Plan *) sort;
}
+ /*
+ * Find partitioned parent rel(s) of the subpath's rel(s).
+ */
+ allpartrelids = add_append_subpath_partrelids(root, subpath, rel,
+ allpartrelids);
+
subplans = lappend(subplans, subplan);
}
+ node->allpartrelids = allpartrelids;
+
/*
- * If any quals exist, they may be useful to perform further partition
- * pruning during execution. Gather information needed by the executor to
- * do partition pruning.
+ * If scanning partitions, check if there are quals that may be useful to
+ * perform further partition pruning during execution. Gather information
+ * needed by the executor to do partition pruning.
*/
- if (enable_partition_pruning)
+ if (enable_partition_pruning && allpartrelids != NIL)
{
List *prunequal;
@@ -1535,7 +1555,8 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
if (prunequal != NIL)
partpruneinfo = make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ allpartrelids);
}
node->mergeplans = subplans;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 44efb1f4eb..f97bc09113 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -7855,8 +7855,11 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
agg_costs, gd, &child_extra,
&child_partially_grouped_rel);
+ /* Mark as child of grouped_rel. */
+ child_grouped_rel->parent = grouped_rel;
if (child_partially_grouped_rel)
{
+ child_partially_grouped_rel->parent = grouped_rel;
partially_grouped_live_children =
lappend(partially_grouped_live_children,
child_partially_grouped_rel);
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 97fa561e4e..854dd7c8af 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1766,6 +1766,8 @@ set_append_references(PlannerInfo *root,
set_dummy_tlist_references((Plan *) aplan, rtoffset);
aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
+ foreach(l, aplan->allpartrelids)
+ lfirst(l) = offset_relid_set((Relids) lfirst(l), rtoffset);
if (aplan->part_prune_info)
{
@@ -1842,6 +1844,8 @@ set_mergeappend_references(PlannerInfo *root,
set_dummy_tlist_references((Plan *) mplan, rtoffset);
mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
+ foreach(l, mplan->allpartrelids)
+ lfirst(l) = offset_relid_set((Relids) lfirst(l), rtoffset);
if (mplan->part_prune_info)
{
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index f456b3b0a4..5bd8e82b9b 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -41,6 +41,7 @@ static void make_inh_translation_list(Relation oldrelation,
AppendRelInfo *appinfo);
static Node *adjust_appendrel_attrs_mutator(Node *node,
adjust_appendrel_attrs_context *context);
+static List *add_part_relids(List *allpartrelids, Bitmapset *partrelids);
/*
@@ -1035,3 +1036,136 @@ distribute_row_identity_vars(PlannerInfo *root)
}
}
}
+
+/*
+ * add_append_subpath_partrelids
+ * Look up a child subpath's rel's partitioned parent relids up to
+ * parentrel and add the bitmapset containing those into
+ * 'allpartrelids'
+ */
+List *
+add_append_subpath_partrelids(PlannerInfo *root, Path *subpath,
+ RelOptInfo *parentrel,
+ List *allpartrelids)
+{
+ RelOptInfo *prel = subpath->parent;
+ Relids partrelids = NULL;
+
+ /* Nothing to do if there's no parent to begin with. */
+ if (!IS_OTHER_REL(prel))
+ return allpartrelids;
+
+ /*
+ * Traverse up to the pathrel's topmost partitioned parent, collecting
+ * parent relids as we go; but stop if we reach parentrel. (Normally, a
+ * pathrel's topmost partitioned parent is either parentrel or a UNION ALL
+ * appendrel child of parentrel. But when handling partitionwise joins of
+ * multi-level partitioning trees, we can see an append path whose
+ * parentrel is an intermediate partitioned table.)
+ */
+ do
+ {
+ Relids parent_relids = NULL;
+
+ /*
+ * For simple child rels, we can simply set the parent_relids to
+ * prel->parent->relids. But for partitionwise join and aggregate
+ * child rels, while we can use prel->parent to move up the tree,
+ * parent_relids must be found the hard way through AppendRelInfos,
+ * because 1) a joinrel's relids may point to RTE_JOIN entries,
+ * 2) topmost parent grouping rel's relids field is NULL.
+ */
+ if (IS_SIMPLE_REL(prel))
+ {
+ prel = prel->parent;
+ /* Stop once we reach the root partitioned rel. */
+ if (!IS_PARTITIONED_REL(prel))
+ break;
+ parent_relids = bms_add_members(parent_relids, prel->relids);
+ }
+ else
+ {
+ AppendRelInfo **appinfos;
+ int nappinfos,
+ i;
+
+ appinfos = find_appinfos_by_relids(root, prel->relids,
+ &nappinfos);
+ for (i = 0; i < nappinfos; i++)
+ {
+ AppendRelInfo *appinfo = appinfos[i];
+
+ parent_relids = bms_add_member(parent_relids,
+ appinfo->parent_relid);
+ }
+ pfree(appinfos);
+ prel = prel->parent;
+ }
+ /* accept this level as an interesting parent */
+ partrelids = bms_add_members(partrelids, parent_relids);
+ if (prel == parentrel)
+ break; /* don't traverse above parentrel */
+ } while (IS_OTHER_REL(prel));
+
+ if (partrelids == NULL)
+ return allpartrelids;
+
+ return add_part_relids(allpartrelids, partrelids);
+}
+
+/*
+ * add_part_relids
+ * Add new info to a list of Bitmapsets of partitioned relids.
+ *
+ * Within 'allpartrelids', there is one Bitmapset for each topmost parent
+ * partitioned rel. Each Bitmapset contains the RT indexes of the topmost
+ * parent as well as its relevant non-leaf child partitions. Since (by
+ * construction of the rangetable list) parent partitions must have lower
+ * RT indexes than their children, we can distinguish the topmost parent
+ * as being the lowest set bit in the Bitmapset.
+ *
+ * 'partrelids' contains the RT indexes of a parent partitioned rel, and
+ * possibly some non-leaf children, that are newly identified as parents of
+ * some subpath rel passed to make_partition_pruneinfo(). These are added
+ * to an appropriate member of 'allpartrelids'.
+ *
+ * Note that the list contains only RT indexes of partitioned tables that
+ * are parents of some scan-level relation appearing in the 'subpaths' that
+ * make_partition_pruneinfo() is dealing with. Also, "topmost" parents are
+ * not allowed to be higher than the 'parentrel' associated with the append
+ * path. In this way, we avoid expending cycles on partitioned rels that
+ * can't contribute useful pruning information for the problem at hand.
+ * (It is possible for 'parentrel' to be a child partitioned table, and it
+ * is also possible for scan-level relations to be child partitioned tables
+ * rather than leaf partitions. Hence we must construct this relation set
+ * with reference to the particular append path we're dealing with, rather
+ * than looking at the full partitioning structure represented in the
+ * RelOptInfos.)
+ */
+static List *
+add_part_relids(List *allpartrelids, Bitmapset *partrelids)
+{
+ Index targetpart;
+ ListCell *lc;
+
+ /* We can easily get the lowest set bit this way: */
+ targetpart = bms_next_member(partrelids, -1);
+ Assert(targetpart > 0);
+
+ /* Look for a matching topmost parent */
+ foreach(lc, allpartrelids)
+ {
+ Bitmapset *currpartrelids = (Bitmapset *) lfirst(lc);
+ Index currtarget = bms_next_member(currpartrelids, -1);
+
+ if (targetpart == currtarget)
+ {
+ /* Found a match, so add any new RT indexes to this hierarchy */
+ currpartrelids = bms_add_members(currpartrelids, partrelids);
+ lfirst(lc) = currpartrelids;
+ return allpartrelids;
+ }
+ }
+ /* No match, so add the new partition hierarchy to the list */
+ return lappend(allpartrelids, partrelids);
+}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 7179b22a05..213512a5f4 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -138,7 +138,6 @@ typedef struct PruneStepResult
} PruneStepResult;
-static List *add_part_relids(List *allpartrelids, Bitmapset *partrelids);
static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
RelOptInfo *parentrel,
List *prunequal,
@@ -218,33 +217,32 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
+ * 'allpartrelids' contains Bitmapsets of RT indexes of partitioned parents
+ * whose partitions' Paths are in 'subpaths'; there's one Bitmapset for every
+ * partition tree involved.
*/
PartitionPruneInfo *
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
- List *prunequal)
+ List *prunequal,
+ List *allpartrelids)
{
PartitionPruneInfo *pruneinfo;
Bitmapset *allmatchedsubplans = NULL;
- List *allpartrelids;
List *prunerelinfos;
int *relid_subplan_map;
ListCell *lc;
int i;
+ Assert(list_length(allpartrelids) > 0);
+
/*
- * Scan the subpaths to see which ones are scans of partition child
- * relations, and identify their parent partitioned rels. (Note: we must
- * restrict the parent partitioned rels to be parentrel or children of
- * parentrel, otherwise we couldn't translate prunequal to match.)
- *
- * Also construct a temporary array to map from partition-child-relation
- * relid to the index in 'subpaths' of the scan plan for that partition.
+ * Construct a temporary array to map from partition-child-relation relid
+ * to the index in 'subpaths' of the scan plan for that partition.
* (Use of "subplan" rather than "subpath" is a bit of a misnomer, but
* we'll let it stand.) For convenience, we use 1-based indexes here, so
* that zero can represent an un-filled array entry.
*/
- allpartrelids = NIL;
relid_subplan_map = palloc0(sizeof(int) * root->simple_rel_array_size);
i = 1;
@@ -253,50 +251,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Path *path = (Path *) lfirst(lc);
RelOptInfo *pathrel = path->parent;
- /* We don't consider partitioned joins here */
- if (pathrel->reloptkind == RELOPT_OTHER_MEMBER_REL)
- {
- RelOptInfo *prel = pathrel;
- Bitmapset *partrelids = NULL;
-
- /*
- * Traverse up to the pathrel's topmost partitioned parent,
- * collecting parent relids as we go; but stop if we reach
- * parentrel. (Normally, a pathrel's topmost partitioned parent
- * is either parentrel or a UNION ALL appendrel child of
- * parentrel. But when handling partitionwise joins of
- * multi-level partitioning trees, we can see an append path whose
- * parentrel is an intermediate partitioned table.)
- */
- do
- {
- AppendRelInfo *appinfo;
-
- Assert(prel->relid < root->simple_rel_array_size);
- appinfo = root->append_rel_array[prel->relid];
- prel = find_base_rel(root, appinfo->parent_relid);
- if (!IS_PARTITIONED_REL(prel))
- break; /* reached a non-partitioned parent */
- /* accept this level as an interesting parent */
- partrelids = bms_add_member(partrelids, prel->relid);
- if (prel == parentrel)
- break; /* don't traverse above parentrel */
- } while (prel->reloptkind == RELOPT_OTHER_MEMBER_REL);
-
- if (partrelids)
- {
- /*
- * Found some relevant parent partitions, which may or may not
- * overlap with partition trees we already found. Add new
- * information to the allpartrelids list.
- */
- allpartrelids = add_part_relids(allpartrelids, partrelids);
- /* Also record the subplan in relid_subplan_map[] */
- /* No duplicates please */
- Assert(relid_subplan_map[pathrel->relid] == 0);
- relid_subplan_map[pathrel->relid] = i;
- }
- }
+ /* No duplicates please */
+ Assert(relid_subplan_map[pathrel->relid] == 0);
+ relid_subplan_map[pathrel->relid] = i;
i++;
}
@@ -362,63 +319,6 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
return pruneinfo;
}
-/*
- * add_part_relids
- * Add new info to a list of Bitmapsets of partitioned relids.
- *
- * Within 'allpartrelids', there is one Bitmapset for each topmost parent
- * partitioned rel. Each Bitmapset contains the RT indexes of the topmost
- * parent as well as its relevant non-leaf child partitions. Since (by
- * construction of the rangetable list) parent partitions must have lower
- * RT indexes than their children, we can distinguish the topmost parent
- * as being the lowest set bit in the Bitmapset.
- *
- * 'partrelids' contains the RT indexes of a parent partitioned rel, and
- * possibly some non-leaf children, that are newly identified as parents of
- * some subpath rel passed to make_partition_pruneinfo(). These are added
- * to an appropriate member of 'allpartrelids'.
- *
- * Note that the list contains only RT indexes of partitioned tables that
- * are parents of some scan-level relation appearing in the 'subpaths' that
- * make_partition_pruneinfo() is dealing with. Also, "topmost" parents are
- * not allowed to be higher than the 'parentrel' associated with the append
- * path. In this way, we avoid expending cycles on partitioned rels that
- * can't contribute useful pruning information for the problem at hand.
- * (It is possible for 'parentrel' to be a child partitioned table, and it
- * is also possible for scan-level relations to be child partitioned tables
- * rather than leaf partitions. Hence we must construct this relation set
- * with reference to the particular append path we're dealing with, rather
- * than looking at the full partitioning structure represented in the
- * RelOptInfos.)
- */
-static List *
-add_part_relids(List *allpartrelids, Bitmapset *partrelids)
-{
- Index targetpart;
- ListCell *lc;
-
- /* We can easily get the lowest set bit this way: */
- targetpart = bms_next_member(partrelids, -1);
- Assert(targetpart > 0);
-
- /* Look for a matching topmost parent */
- foreach(lc, allpartrelids)
- {
- Bitmapset *currpartrelids = (Bitmapset *) lfirst(lc);
- Index currtarget = bms_next_member(currpartrelids, -1);
-
- if (targetpart == currtarget)
- {
- /* Found a match, so add any new RT indexes to this hierarchy */
- currpartrelids = bms_add_members(currpartrelids, partrelids);
- lfirst(lc) = currpartrelids;
- return allpartrelids;
- }
- }
- /* No match, so add the new partition hierarchy to the list */
- return lappend(allpartrelids, partrelids);
-}
-
/*
* make_partitionedrel_pruneinfo
* Build a List of PartitionedRelPruneInfos, one for each interesting
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 1b787fe031..7a5f3ba625 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -267,6 +267,13 @@ typedef struct Append
List *appendplans;
int nasyncplans; /* # of asynchronous plans */
+ /*
+ * RTIs of all partitioned tables whose children are scanned by
+ * appendplans. The list contains a bitmapset for every partition tree
+ * covered by this Append.
+ */
+ List *allpartrelids;
+
/*
* All 'appendplans' preceding this index are non-partial plans. All
* 'appendplans' from this index onwards are partial plans.
@@ -291,6 +298,13 @@ typedef struct MergeAppend
List *mergeplans;
+ /*
+ * RTIs of all partitioned tables whose children are scanned by
+ * mergeplans. The list contains a bitmapset for every partition tree
+ * covered by this MergeAppend.
+ */
+ List *allpartrelids;
+
/* these fields are just like the sort-key info in struct Sort: */
/* number of sort-key columns */
diff --git a/src/include/optimizer/appendinfo.h b/src/include/optimizer/appendinfo.h
index a05f91f77d..1621a7319a 100644
--- a/src/include/optimizer/appendinfo.h
+++ b/src/include/optimizer/appendinfo.h
@@ -46,5 +46,8 @@ extern void add_row_identity_columns(PlannerInfo *root, Index rtindex,
RangeTblEntry *target_rte,
Relation target_relation);
extern void distribute_row_identity_vars(PlannerInfo *root);
+extern List *add_append_subpath_partrelids(PlannerInfo *root, Path *subpath,
+ RelOptInfo *parentrel,
+ List *allpartrelids);
#endif /* APPENDINFO_H */
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 8636e04e37..caa774a111 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -73,7 +73,8 @@ typedef struct PartitionPruneContext
extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
struct RelOptInfo *parentrel,
List *subpaths,
- List *prunequal);
+ List *prunequal,
+ List *allpartrelids);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
--
2.35.3
v43-0001-Make-PlanState-tree-cleanup-non-recursive.patchapplication/octet-stream; name=v43-0001-Make-PlanState-tree-cleanup-non-recursive.patchDownload
From a55bd363690bc4c28047e4b874ce80384e37c49d Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Tue, 1 Aug 2023 11:36:24 +0900
Subject: [PATCH v43 1/5] Make PlanState tree cleanup non-recursive
With this change, node type specific subroutines of ExecEndNode()
are no longer required to also clean up the child nodes of a given
node, only its own stuff. Instead, ExecEndPlan() calls
ExecEndNode() directly for each node in the PlanState tree by
iterating over a list (EState.es_planstate_nodes) of all those nodes
built during the ExecInitNode() traversal of the tree.
This changes the order in which the nodes get cleaned up, because
they are now cleaned up in the order in which they are added into
the list, which is from leaf level up to the root, whereas with the
current recursive approach cleanup occurs from the root to the
leaves. The change seems harmless though, because there isn't
necessarily any coupling between the cleanup actions of parent
and child nodes.
The main motivation behind this change is to allow the cases in
the future where ExecInitNode() traversal of the plan tree may
be aborted in the middle resulting in a partially initialized
PlanState tree. Dealing with that case by making the cleanup
phase walk over a list of successfully initialized nodes seems
better / more robust than making the individual ExecEndNode()
subroutines deal with partially valid PlanState nodes.
---
src/backend/executor/README | 4 +-
src/backend/executor/execMain.c | 36 +++++++-------
src/backend/executor/execProcnode.c | 56 ++++++++++------------
src/backend/executor/execUtils.c | 2 +
src/backend/executor/nodeAgg.c | 4 +-
src/backend/executor/nodeAppend.c | 20 +-------
src/backend/executor/nodeBitmapAnd.c | 23 +--------
src/backend/executor/nodeBitmapHeapscan.c | 5 +-
src/backend/executor/nodeBitmapOr.c | 23 +--------
src/backend/executor/nodeForeignscan.c | 4 +-
src/backend/executor/nodeGather.c | 2 +-
src/backend/executor/nodeGatherMerge.c | 2 +-
src/backend/executor/nodeGroup.c | 5 +-
src/backend/executor/nodeHash.c | 8 +---
src/backend/executor/nodeHashjoin.c | 6 +--
src/backend/executor/nodeIncrementalSort.c | 5 +-
src/backend/executor/nodeLimit.c | 2 +-
src/backend/executor/nodeLockRows.c | 2 +-
src/backend/executor/nodeMaterial.c | 5 +-
src/backend/executor/nodeMemoize.c | 5 +-
src/backend/executor/nodeMergeAppend.c | 20 +-------
src/backend/executor/nodeMergejoin.c | 6 +--
src/backend/executor/nodeModifyTable.c | 7 +--
src/backend/executor/nodeNestloop.c | 6 +--
src/backend/executor/nodeProjectSet.c | 5 +-
src/backend/executor/nodeRecursiveunion.c | 6 +--
src/backend/executor/nodeResult.c | 5 +-
src/backend/executor/nodeSetOp.c | 2 +-
src/backend/executor/nodeSort.c | 5 +-
src/backend/executor/nodeSubqueryscan.c | 5 +-
src/backend/executor/nodeUnique.c | 2 +-
src/backend/executor/nodeWindowAgg.c | 4 +-
src/include/nodes/execnodes.h | 2 +
33 files changed, 80 insertions(+), 214 deletions(-)
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 17775a49e2..67a5c1769b 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -310,13 +310,13 @@ This is a sketch of control flow for full query processing:
AfterTriggerEndQuery
ExecutorEnd
- ExecEndNode --- recursively releases resources
+ ExecEndPlan --- releases plan resources
FreeExecutorState
frees per-query context and child contexts
FreeQueryDesc
-Per above comments, it's not really critical for ExecEndNode to free any
+Per above comments, it's not really critical for ExecEndPlan to free any
memory; it'll all go away in FreeExecutorState anyway. However, we do need to
be careful to close relations, drop buffer pins, etc, so we do need to scan
the plan state tree to find these sorts of resources.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4c5a7bbf62..235bb52ccc 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -82,7 +82,7 @@ ExecutorCheckPerms_hook_type ExecutorCheckPerms_hook = NULL;
static void InitPlan(QueryDesc *queryDesc, int eflags);
static void CheckValidRowMarkRel(Relation rel, RowMarkType markType);
static void ExecPostprocessPlan(EState *estate);
-static void ExecEndPlan(PlanState *planstate, EState *estate);
+static void ExecEndPlan(EState *estate);
static void ExecutePlan(EState *estate, PlanState *planstate,
bool use_parallel_mode,
CmdType operation,
@@ -500,7 +500,7 @@ standard_ExecutorEnd(QueryDesc *queryDesc)
*/
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
- ExecEndPlan(queryDesc->planstate, estate);
+ ExecEndPlan(estate);
/* do away with our snapshots */
UnregisterSnapshot(estate->es_snapshot);
@@ -1499,23 +1499,21 @@ ExecPostprocessPlan(EState *estate)
* ----------------------------------------------------------------
*/
static void
-ExecEndPlan(PlanState *planstate, EState *estate)
+ExecEndPlan(EState *estate)
{
ListCell *l;
/*
- * shut down the node-type-specific query processing
+ * Shut down the node-type-specific query processing for all nodes that
+ * were initialized in InitPlan(). That includes the nodes in both the
+ * main plan tree (es_plannedstmt->planTree) and those in subplans
+ * (es_plannedstmt->subplans).
*/
- ExecEndNode(planstate);
-
- /*
- * for subplans too
- */
- foreach(l, estate->es_subplanstates)
+ foreach(l, estate->es_planstate_nodes)
{
- PlanState *subplanstate = (PlanState *) lfirst(l);
+ PlanState *pstate = (PlanState *) lfirst(l);
- ExecEndNode(subplanstate);
+ ExecEndNode(pstate);
}
/*
@@ -3030,13 +3028,17 @@ EvalPlanQualEnd(EPQState *epqstate)
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
- ExecEndNode(epqstate->recheckplanstate);
-
- foreach(l, estate->es_subplanstates)
+ /*
+ * Shut down the node-type-specific query processing for all nodes that
+ * were initialized in InitPlan(). That includes the nodes in both the
+ * main plan tree (epqstate->plan) and those in subplans
+ * (es_plannedstmt->subplans).
+ */
+ foreach(l, estate->es_planstate_nodes)
{
- PlanState *subplanstate = (PlanState *) lfirst(l);
+ PlanState *planstate = (PlanState *) lfirst(l);
- ExecEndNode(subplanstate);
+ ExecEndNode(planstate);
}
/* throw away the per-estate tuple table, some node may have used it */
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 4d288bc8d4..653f74cf58 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -1,11 +1,13 @@
/*-------------------------------------------------------------------------
*
* execProcnode.c
- * contains dispatch functions which call the appropriate "initialize",
- * "get a tuple", and "cleanup" routines for the given node type.
- * If the node has children, then it will presumably call ExecInitNode,
- * ExecProcNode, or ExecEndNode on its subnodes and do the appropriate
- * processing.
+ * Contains dispatch functions ExecInitNode(), ExecProcNode(), and
+ * ExecEndNode(), which call the appropriate "initialize", "get a tuple",
+ * and "cleanup" routines, respectively, for the given node type.
+ *
+ * While the first two process the node's children recursively, ExecEndNode()
+ * is only concerned with cleaning up the node itself; the children
+ * are processed separately by the caller.
*
* Portions Copyright (c) 1996-2023, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
@@ -49,7 +51,9 @@
* Eventually this calls ExecInitNode() on the right and left subplans
* and so forth until the entire plan is initialized. The result
* of ExecInitNode() is a plan state tree built with the same structure
- * as the underlying plan tree.
+ * as the underlying plan tree. (The plan state nodes are also added to
+ * a list in the same order in which they are created for the final
+ * cleanup processing.)
*
* * Then when ExecutorRun() is called, it calls ExecutePlan() which calls
* ExecProcNode() repeatedly on the top node of the plan state tree.
@@ -61,14 +65,10 @@
* form the tuples it returns.
*
* * Eventually ExecSeqScan() stops returning tuples and the nest
- * loop join ends. Lastly, ExecutorEnd() calls ExecEndNode() which
- * calls ExecEndNestLoop() which in turn calls ExecEndNode() on
- * its subplans which result in ExecEndSeqScan().
+ * loop join ends. Lastly, ExecutorEnd() calls ExecEndPlan(), which
+ * in turn calls ExecEndNode() on all the nodes that were initialized:
+ * the two Seq Scans and the Nest Loop in this case.
*
- * This should show how the executor works by having
- * ExecInitNode(), ExecProcNode() and ExecEndNode() dispatch
- * their work to the appropriate node support routines which may
- * in turn call these routines themselves on their subplans.
*/
#include "postgres.h"
@@ -136,6 +136,9 @@ static bool ExecShutdownNode_walker(PlanState *node, void *context);
* 'eflags' is a bitwise OR of flag bits described in executor.h
*
* Returns a PlanState node corresponding to the given Plan node.
+ *
+ * As a side-effect, all PlanState nodes that are created are appended to
+ * estate->es_planstate_nodes for the cleanup processing in ExecEndPlan().
* ------------------------------------------------------------------------
*/
PlanState *
@@ -411,6 +414,10 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
result->instrument = InstrAlloc(1, estate->es_instrument,
result->async_capable);
+ /* And remember for the cleanup processing in ExecEndPlan(). */
+ estate->es_planstate_nodes = lappend(estate->es_planstate_nodes,
+ result);
+
return result;
}
@@ -545,29 +552,18 @@ MultiExecProcNode(PlanState *node)
/* ----------------------------------------------------------------
* ExecEndNode
*
- * Recursively cleans up all the nodes in the plan rooted
- * at 'node'.
+ * Cleans up node
*
- * After this operation, the query plan will not be able to be
- * processed any further. This should be called only after
- * the query plan has been fully executed.
+ * Unlike ExecInitNode(), this does not recurse into child nodes, because
+ * they are processed separately. So the ExecEnd* routine for any given
+ * node type is only responsible for cleaning up its own resources.
* ----------------------------------------------------------------
*/
void
ExecEndNode(PlanState *node)
{
- /*
- * do nothing when we get to the end of a leaf on tree.
- */
- if (node == NULL)
- return;
-
- /*
- * Make sure there's enough stack available. Need to check here, in
- * addition to ExecProcNode() (via ExecProcNodeFirst()), because it's not
- * guaranteed that ExecProcNode() is reached for all nodes.
- */
- check_stack_depth();
+ /* We only ever get called on nodes that were actually initialized. */
+ Assert(node != NULL);
if (node->chgParam != NULL)
{
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index c06b228858..b567165003 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -154,6 +154,8 @@ CreateExecutorState(void)
estate->es_exprcontexts = NIL;
+ estate->es_planstate_nodes = NIL;
+
estate->es_subplanstates = NIL;
estate->es_auxmodifytables = NIL;
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 468db94fe5..e9d9ab6bdd 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -4304,7 +4304,6 @@ GetAggInitVal(Datum textInitVal, Oid transtype)
void
ExecEndAgg(AggState *node)
{
- PlanState *outerPlan;
int transno;
int numGroupingSets = Max(node->maxsets, 1);
int setno;
@@ -4367,8 +4366,7 @@ ExecEndAgg(AggState *node)
/* clean up tuple table */
ExecClearTuple(node->ss.ss_ScanTupleSlot);
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
+ /* outerPlan is closed separately. */
}
void
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 609df6b9e6..9148d7d3b1 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -376,30 +376,12 @@ ExecAppend(PlanState *pstate)
/* ----------------------------------------------------------------
* ExecEndAppend
- *
- * Shuts down the subscans of the append node.
- *
- * Returns nothing of interest.
* ----------------------------------------------------------------
*/
void
ExecEndAppend(AppendState *node)
{
- PlanState **appendplans;
- int nplans;
- int i;
-
- /*
- * get information from the node
- */
- appendplans = node->appendplans;
- nplans = node->as_nplans;
-
- /*
- * shut down each of the subscans
- */
- for (i = 0; i < nplans; i++)
- ExecEndNode(appendplans[i]);
+ /* Nothing to do as the nodes in appendplans are closed separately. */
}
void
diff --git a/src/backend/executor/nodeBitmapAnd.c b/src/backend/executor/nodeBitmapAnd.c
index 4c5eb2b23b..147592f7e2 100644
--- a/src/backend/executor/nodeBitmapAnd.c
+++ b/src/backend/executor/nodeBitmapAnd.c
@@ -168,33 +168,12 @@ MultiExecBitmapAnd(BitmapAndState *node)
/* ----------------------------------------------------------------
* ExecEndBitmapAnd
- *
- * Shuts down the subscans of the BitmapAnd node.
- *
- * Returns nothing of interest.
* ----------------------------------------------------------------
*/
void
ExecEndBitmapAnd(BitmapAndState *node)
{
- PlanState **bitmapplans;
- int nplans;
- int i;
-
- /*
- * get information from the node
- */
- bitmapplans = node->bitmapplans;
- nplans = node->nplans;
-
- /*
- * shut down each of the subscans (that we've initialized)
- */
- for (i = 0; i < nplans; i++)
- {
- if (bitmapplans[i])
- ExecEndNode(bitmapplans[i]);
- }
+ /* Nothing to do as the nodes in bitmapplans are closed separately. */
}
void
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index f35df0b8bf..d58ee4f4e1 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -667,10 +667,7 @@ ExecEndBitmapHeapScan(BitmapHeapScanState *node)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
ExecClearTuple(node->ss.ss_ScanTupleSlot);
- /*
- * close down subplans
- */
- ExecEndNode(outerPlanState(node));
+ /* outerPlan is closed separately. */
/*
* release bitmaps and buffers if any
diff --git a/src/backend/executor/nodeBitmapOr.c b/src/backend/executor/nodeBitmapOr.c
index 0bf8af9652..736852a0ae 100644
--- a/src/backend/executor/nodeBitmapOr.c
+++ b/src/backend/executor/nodeBitmapOr.c
@@ -186,33 +186,12 @@ MultiExecBitmapOr(BitmapOrState *node)
/* ----------------------------------------------------------------
* ExecEndBitmapOr
- *
- * Shuts down the subscans of the BitmapOr node.
- *
- * Returns nothing of interest.
* ----------------------------------------------------------------
*/
void
ExecEndBitmapOr(BitmapOrState *node)
{
- PlanState **bitmapplans;
- int nplans;
- int i;
-
- /*
- * get information from the node
- */
- bitmapplans = node->bitmapplans;
- nplans = node->nplans;
-
- /*
- * shut down each of the subscans (that we've initialized)
- */
- for (i = 0; i < nplans; i++)
- {
- if (bitmapplans[i])
- ExecEndNode(bitmapplans[i]);
- }
+ /* Nothing to do as the nodes in bitmapplans are closed separately. */
}
void
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index c2139acca0..e6616dd718 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -309,9 +309,7 @@ ExecEndForeignScan(ForeignScanState *node)
else
node->fdwroutine->EndForeignScan(node);
- /* Shut down any outer plan. */
- if (outerPlanState(node))
- ExecEndNode(outerPlanState(node));
+ /* outerPlan is closed separately. */
/* Free the exprcontext */
ExecFreeExprContext(&node->ss.ps);
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 307fc10eea..f7a69f185b 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -248,7 +248,7 @@ ExecGather(PlanState *pstate)
void
ExecEndGather(GatherState *node)
{
- ExecEndNode(outerPlanState(node)); /* let children clean up first */
+ /* outerPlan is closed separately. */
ExecShutdownGather(node);
ExecFreeExprContext(&node->ps);
if (node->ps.ps_ResultTupleSlot)
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 9d5e1a46e9..d357ff0c47 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -288,7 +288,7 @@ ExecGatherMerge(PlanState *pstate)
void
ExecEndGatherMerge(GatherMergeState *node)
{
- ExecEndNode(outerPlanState(node)); /* let children clean up first */
+ /* outerPlan is closed separately. */
ExecShutdownGatherMerge(node);
ExecFreeExprContext(&node->ps);
if (node->ps.ps_ResultTupleSlot)
diff --git a/src/backend/executor/nodeGroup.c b/src/backend/executor/nodeGroup.c
index 25a1618952..2badcc7e60 100644
--- a/src/backend/executor/nodeGroup.c
+++ b/src/backend/executor/nodeGroup.c
@@ -226,15 +226,12 @@ ExecInitGroup(Group *node, EState *estate, int eflags)
void
ExecEndGroup(GroupState *node)
{
- PlanState *outerPlan;
-
ExecFreeExprContext(&node->ss.ps);
/* clean up tuple table */
ExecClearTuple(node->ss.ss_ScanTupleSlot);
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
+ /* outerPlan is closed separately. */
}
void
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 8b5c35b82b..edd2324384 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -413,18 +413,12 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
void
ExecEndHash(HashState *node)
{
- PlanState *outerPlan;
-
/*
* free exprcontext
*/
ExecFreeExprContext(&node->ps);
- /*
- * shut down the subplan
- */
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
+ /* outerPlan is closed separately. */
}
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 980746128b..8078d7f229 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -879,11 +879,7 @@ ExecEndHashJoin(HashJoinState *node)
ExecClearTuple(node->hj_OuterTupleSlot);
ExecClearTuple(node->hj_HashTupleSlot);
- /*
- * clean up subtrees
- */
- ExecEndNode(outerPlanState(node));
- ExecEndNode(innerPlanState(node));
+ /* outerPlan and innerPlan are closed separately. */
}
/*
diff --git a/src/backend/executor/nodeIncrementalSort.c b/src/backend/executor/nodeIncrementalSort.c
index 7683e3341c..52b146cfb8 100644
--- a/src/backend/executor/nodeIncrementalSort.c
+++ b/src/backend/executor/nodeIncrementalSort.c
@@ -1101,10 +1101,7 @@ ExecEndIncrementalSort(IncrementalSortState *node)
node->prefixsort_state = NULL;
}
- /*
- * Shut down the subplan.
- */
- ExecEndNode(outerPlanState(node));
+ /* outerPlan is closed separately. */
SO_printf("ExecEndIncrementalSort: sort node shutdown\n");
}
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index 425fbfc405..a75099dd73 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -535,7 +535,7 @@ void
ExecEndLimit(LimitState *node)
{
ExecFreeExprContext(&node->ps);
- ExecEndNode(outerPlanState(node));
+ /* outerPlan is closed separately. */
}
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index e459971d32..55de8d3d65 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -386,7 +386,7 @@ ExecEndLockRows(LockRowsState *node)
{
/* We may have shut down EPQ already, but no harm in another call */
EvalPlanQualEnd(&node->lr_epqstate);
- ExecEndNode(outerPlanState(node));
+ /* outerPlan is closed separately. */
}
diff --git a/src/backend/executor/nodeMaterial.c b/src/backend/executor/nodeMaterial.c
index 09632678b0..ef04e9a8e7 100644
--- a/src/backend/executor/nodeMaterial.c
+++ b/src/backend/executor/nodeMaterial.c
@@ -251,10 +251,7 @@ ExecEndMaterial(MaterialState *node)
tuplestore_end(node->tuplestorestate);
node->tuplestorestate = NULL;
- /*
- * shut down the subplan
- */
- ExecEndNode(outerPlanState(node));
+ /* outerPlan is closed separately. */
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeMemoize.c b/src/backend/executor/nodeMemoize.c
index 4f04269e26..61578d4b5c 100644
--- a/src/backend/executor/nodeMemoize.c
+++ b/src/backend/executor/nodeMemoize.c
@@ -1100,10 +1100,7 @@ ExecEndMemoize(MemoizeState *node)
*/
ExecFreeExprContext(&node->ss.ps);
- /*
- * shut down the subplan
- */
- ExecEndNode(outerPlanState(node));
+ /* outerPlan is closed separately. */
}
void
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 21b5726e6e..8aa64944c9 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -310,30 +310,12 @@ heap_compare_slots(Datum a, Datum b, void *arg)
/* ----------------------------------------------------------------
* ExecEndMergeAppend
- *
- * Shuts down the subscans of the MergeAppend node.
- *
- * Returns nothing of interest.
* ----------------------------------------------------------------
*/
void
ExecEndMergeAppend(MergeAppendState *node)
{
- PlanState **mergeplans;
- int nplans;
- int i;
-
- /*
- * get information from the node
- */
- mergeplans = node->mergeplans;
- nplans = node->ms_nplans;
-
- /*
- * shut down each of the subscans
- */
- for (i = 0; i < nplans; i++)
- ExecEndNode(mergeplans[i]);
+ /* Nothing to do as the nodes in mergeplans are closed separately. */
}
void
diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 00f96d045e..7b530d9088 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -1654,11 +1654,7 @@ ExecEndMergeJoin(MergeJoinState *node)
ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
ExecClearTuple(node->mj_MarkedTupleSlot);
- /*
- * shut down the subplans
- */
- ExecEndNode(innerPlanState(node));
- ExecEndNode(outerPlanState(node));
+ /* outerPlan and innerPlan are closed separately. */
MJ1_printf("ExecEndMergeJoin: %s\n",
"node processing ended");
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 2a5fec8d01..bdbaa4753b 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -4397,7 +4397,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
/* ----------------------------------------------------------------
* ExecEndModifyTable
*
- * Shuts down the plan.
+ * Releases ModifyTable resources.
*
* Returns nothing of interest.
* ----------------------------------------------------------------
@@ -4461,10 +4461,7 @@ ExecEndModifyTable(ModifyTableState *node)
*/
EvalPlanQualEnd(&node->mt_epqstate);
- /*
- * shut down subplan
- */
- ExecEndNode(outerPlanState(node));
+ /* outerPlan is closed separately. */
}
void
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index b3d52e69ec..5cfb50a366 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -374,11 +374,7 @@ ExecEndNestLoop(NestLoopState *node)
*/
ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
- /*
- * close down subplans
- */
- ExecEndNode(outerPlanState(node));
- ExecEndNode(innerPlanState(node));
+ /* outerPlan and innerPlan are closed separately. */
NL1_printf("ExecEndNestLoop: %s\n",
"node processing ended");
diff --git a/src/backend/executor/nodeProjectSet.c b/src/backend/executor/nodeProjectSet.c
index f6ff3dc44c..4a388220ee 100644
--- a/src/backend/executor/nodeProjectSet.c
+++ b/src/backend/executor/nodeProjectSet.c
@@ -330,10 +330,7 @@ ExecEndProjectSet(ProjectSetState *node)
*/
ExecClearTuple(node->ps.ps_ResultTupleSlot);
- /*
- * shut down subplans
- */
- ExecEndNode(outerPlanState(node));
+ /* outerPlan is closed separately. */
}
void
diff --git a/src/backend/executor/nodeRecursiveunion.c b/src/backend/executor/nodeRecursiveunion.c
index e781003934..aee31c7139 100644
--- a/src/backend/executor/nodeRecursiveunion.c
+++ b/src/backend/executor/nodeRecursiveunion.c
@@ -281,11 +281,7 @@ ExecEndRecursiveUnion(RecursiveUnionState *node)
if (node->tableContext)
MemoryContextDelete(node->tableContext);
- /*
- * close down subplans
- */
- ExecEndNode(outerPlanState(node));
- ExecEndNode(innerPlanState(node));
+ /* outerPlan and innerPlan are closed separately. */
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeResult.c b/src/backend/executor/nodeResult.c
index 4219712d30..a100b144be 100644
--- a/src/backend/executor/nodeResult.c
+++ b/src/backend/executor/nodeResult.c
@@ -250,10 +250,7 @@ ExecEndResult(ResultState *node)
*/
ExecClearTuple(node->ps.ps_ResultTupleSlot);
- /*
- * shut down subplans
- */
- ExecEndNode(outerPlanState(node));
+ /* outerPlan is closed separately. */
}
void
diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index 4bc2406b89..f7db9a3415 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -590,7 +590,7 @@ ExecEndSetOp(SetOpState *node)
MemoryContextDelete(node->tableContext);
ExecFreeExprContext(&node->ps);
- ExecEndNode(outerPlanState(node));
+ /* outerPlan is closed separately. */
}
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index c6c72c6e67..078d041c40 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -317,10 +317,7 @@ ExecEndSort(SortState *node)
tuplesort_end((Tuplesortstate *) node->tuplesortstate);
node->tuplesortstate = NULL;
- /*
- * shut down the subplan
- */
- ExecEndNode(outerPlanState(node));
+ /* outerPlan is closed separately. */
SO1_printf("ExecEndSort: %s\n",
"sort node shutdown");
diff --git a/src/backend/executor/nodeSubqueryscan.c b/src/backend/executor/nodeSubqueryscan.c
index 42471bfc04..bc55a82fc3 100644
--- a/src/backend/executor/nodeSubqueryscan.c
+++ b/src/backend/executor/nodeSubqueryscan.c
@@ -179,10 +179,7 @@ ExecEndSubqueryScan(SubqueryScanState *node)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
ExecClearTuple(node->ss.ss_ScanTupleSlot);
- /*
- * close down subquery
- */
- ExecEndNode(node->subplan);
+ /* subplan is closed separately. */
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeUnique.c b/src/backend/executor/nodeUnique.c
index 45035d74fa..50babacdc8 100644
--- a/src/backend/executor/nodeUnique.c
+++ b/src/backend/executor/nodeUnique.c
@@ -173,7 +173,7 @@ ExecEndUnique(UniqueState *node)
ExecFreeExprContext(&node->ps);
- ExecEndNode(outerPlanState(node));
+ /* outerPlan is closed separately. */
}
diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 310ac23e3a..648cdadc32 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -2681,7 +2681,6 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags)
void
ExecEndWindowAgg(WindowAggState *node)
{
- PlanState *outerPlan;
int i;
release_partition(node);
@@ -2714,8 +2713,7 @@ ExecEndWindowAgg(WindowAggState *node)
pfree(node->perfunc);
pfree(node->peragg);
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
+ /* outerPlan is closed separately. */
}
/* -----------------
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index cb714f4a19..233fb6b4f9 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -671,6 +671,8 @@ typedef struct EState
List *es_exprcontexts; /* List of ExprContexts within EState */
+ List *es_planstate_nodes; /* "flat" list of PlanState nodes */
+
List *es_subplanstates; /* List of PlanState for SubPlans */
List *es_auxmodifytables; /* List of secondary ModifyTableStates */
--
2.35.3
v43-0005-Track-opened-range-table-relations-in-a-List-in-.patchapplication/octet-stream; name=v43-0005-Track-opened-range-table-relations-in-a-List-in-.patchDownload
From b04831f37758fd86bf1e0fb41d8bf1001de4e8a0 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Tue, 4 Jul 2023 22:36:49 +0900
Subject: [PATCH v43 5/5] Track opened range table relations in a List in
EState
This makes ExecCloseRangeTableRelations faster when there are many
relations in the range table but only a few are opened during
execution, such as when run-time pruning kicks in on an Append
containing thousands of partition subplans.
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
src/backend/executor/execMain.c | 9 +++++----
src/backend/executor/execUtils.c | 2 ++
src/include/nodes/execnodes.h | 2 ++
3 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4a4b4b7690..ae9f45355f 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1653,12 +1653,13 @@ ExecCloseResultRelations(EState *estate)
void
ExecCloseRangeTableRelations(EState *estate)
{
- int i;
+ ListCell *lc;
- for (i = 0; i < estate->es_range_table_size; i++)
+ foreach(lc, estate->es_opened_relations)
{
- if (estate->es_relations[i])
- table_close(estate->es_relations[i], NoLock);
+ Relation rel = lfirst(lc);
+
+ table_close(rel, NoLock);
}
}
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index ee12235b2f..5ae993e29c 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -839,6 +839,8 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
}
estate->es_relations[rti - 1] = rel;
+ estate->es_opened_relations = lappend(estate->es_opened_relations,
+ rel);
}
return rel;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 20c1bacae1..c519a6d5dc 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -619,6 +619,8 @@ typedef struct EState
Index es_range_table_size; /* size of the range table arrays */
Relation *es_relations; /* Array of per-range-table-entry Relation
* pointers, or NULL if not yet opened */
+ List *es_opened_relations; /* List of non-NULL entries in
+ * es_relations in no specific order */
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
--
2.35.3
v43-0003-Set-inFromCl-to-false-in-child-table-RTEs.patchapplication/octet-stream; name=v43-0003-Set-inFromCl-to-false-in-child-table-RTEs.patchDownload
From b6e4c896027f832ff0e5762795355a89fd93afb2 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Tue, 4 Jul 2023 22:36:43 +0900
Subject: [PATCH v43 3/5] Set inFromCl to false in child table RTEs
This is to allow the executor to distinguish tables that are
directly mentioned in the query from those that get added to the
query during planning. A subsequent commit will teach the executor
to lock only the tables of the latter kind when executing a cached
plan.
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
src/backend/optimizer/util/inherit.c | 6 ++++++
src/backend/parser/analyze.c | 7 +++----
src/include/nodes/parsenodes.h | 9 +++++++--
3 files changed, 16 insertions(+), 6 deletions(-)
diff --git a/src/backend/optimizer/util/inherit.c b/src/backend/optimizer/util/inherit.c
index 94de855a22..9bac07bf40 100644
--- a/src/backend/optimizer/util/inherit.c
+++ b/src/backend/optimizer/util/inherit.c
@@ -492,6 +492,12 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
}
else
childrte->inh = false;
+ /*
+ * Mark child tables as not being directly mentioned in the query. This
+ * allows the executor's ExecGetRangeTableRelation() to conveniently
+ * identify them as inheritance child tables.
+ */
+ childrte->inFromCl = false;
childrte->securityQuals = NIL;
/*
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 4006632092..bcf6fcdde2 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -3267,10 +3267,9 @@ transformLockingClause(ParseState *pstate, Query *qry, LockingClause *lc,
/*
* Lock all regular tables used in query and its subqueries. We
* examine inFromCl to exclude auto-added RTEs, particularly NEW/OLD
- * in rules. This is a bit of an abuse of a mostly-obsolete flag, but
- * it's convenient. We can't rely on the namespace mechanism that has
- * largely replaced inFromCl, since for example we need to lock
- * base-relation RTEs even if they are masked by upper joins.
+ * in rules. We can't rely on the namespace mechanism since for
+ * example we need to lock base-relation RTEs even if they are masked
+ * by upper joins.
*/
i = 0;
foreach(rt, qry->rtable)
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index fe003ded50..72f2b0c04f 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -994,11 +994,16 @@ typedef struct PartitionCmd
*
* inFromCl marks those range variables that are listed in the FROM clause.
* It's false for RTEs that are added to a query behind the scenes, such
- * as the NEW and OLD variables for a rule, or the subqueries of a UNION.
+ * as the NEW and OLD variables for a rule, or the subqueries of a UNION,
+ * or the RTEs of inheritance child tables that are added by the planner.
* This flag is not used during parsing (except in transformLockingClause,
* q.v.); the parser now uses a separate "namespace" data structure to
* control visibility. But it is needed by ruleutils.c to determine
- * whether RTEs should be shown in decompiled queries.
+ * whether RTEs should be shown in decompiled queries. It is used by the
+ * executor to determine whether a given RTE_RELATION entry belongs to a table
+ * directly mentioned in the query or to a child table added by the planner.
+ * It needs to know that for the case where the child tables in a plan need
+ * to be locked.
*
* securityQuals is a list of security barrier quals (boolean expressions),
* to be tested in the listed order before returning a row from the
--
2.35.3
v43-0004-Delay-locking-of-child-tables-in-cached-plans-un.patchapplication/octet-stream; name=v43-0004-Delay-locking-of-child-tables-in-cached-plans-un.patchDownload
From 90e9a2935a97ff085f521a34dfd86b9800542ab1 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Tue, 4 Jul 2023 22:36:45 +0900
Subject: [PATCH v43 4/5] Delay locking of child tables in cached plans until
ExecutorStart()
Currently, GetCachedPlan() takes a lock on all relations contained in
a cached plan before returning it as a valid plan to its callers for
execution. One disadvantage is that if the plan contains partitions
that are prunable with conditions involving EXTERN parameters and
other stable expressions (known as "initial pruning"), many of them
would be locked unnecessarily, because only those that survive
initial pruning need to have been locked. Locking all partitions this
way causes significant delay when there are many partitions. Note
that initial pruning occurs during executor's initialization of the
plan, that is, InitPlan().
This commit rearranges things to move the locking of child tables
referenced in a cached plan to occur during ExecInitNode() so that
initial pruning in the ExecInitNode() subroutines of the plan nodes
that support pruning can eliminate any child tables that need not be
scanned and thus locked.
To determine that a given table is a child table,
ExecGetRangeTableRelation() now looks at the RTE's inFromCl field,
which is only true for tables that are directly mentioned in the
query but false for child tables. Note that any tables whose RTEs'
inFromCl is true would already have been locked by GetCachedPlan(),
so need not be locked again during execution.
If the locking of child tables causes the CachedPlan to go stale, that
is, its is_valid is set to false by PlanCacheRelCallback() when an
invalidation message matching some child table contained in the plan
is processed, ExecInitNode() abandons the initialization of the
remaining nodes in the plan tree. In that case, InitPlan() returns
after setting QueryDesc.planstate to NULL to indicate to the caller
that no execution is possible with the plan tree as is. Also,
ExecutorStart() now returns true or false to indicate whether or not
QueryDesc.planstate points to a successfully initialized PlanState
tree. Call sites that use GetCachedPlan() to get the plan trees to
pass to the executor should now be prepared to retry when
ExecutorStart() returns false.
Given this new behavior, PortalStart() must now always perform
ExecutorStart() so that cached plans can be dropped and recreated if
needed; previously, that was done only for single-query portals.
For multi-query portals, the QueryDescs that are now created during
PortalStart() are remembered in a new List field of Portal called
'qdescs' and allocated in a new memory context 'queryContext'.
PortalRunMulti() now simply performs ExecutorRun() on the
QueryDescs found in 'qdescs'.
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
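To illustrate the retry contract, here is a simplified sketch of what such
call sites now do, condensed from the ExecuteQuery() changes in this patch
(portal setup and error handling are omitted):

replan:
	portal = CreateNewPortal();
	/* ... get the plan list from GetCachedPlan() and attach it ... */

	/*
	 * PortalStart() now runs ExecutorStart() under the hood and returns
	 * false if locking a child table invalidated the CachedPlan, in which
	 * case the portal is dropped and everything is redone with a freshly
	 * created plan.
	 */
	if (!PortalStart(portal, paramLI, eflags, GetActiveSnapshot()))
	{
		PortalDrop(portal, false);
		goto replan;
	}

	(void) PortalRun(portal, count, false, true, dest, dest, qc);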
---
contrib/auto_explain/auto_explain.c | 12 +-
.../pg_stat_statements/pg_stat_statements.c | 12 +-
contrib/postgres_fdw/postgres_fdw.c | 4 +
src/backend/commands/copyto.c | 10 +-
src/backend/commands/createas.c | 9 +-
src/backend/commands/explain.c | 144 +++++---
src/backend/commands/extension.c | 5 +-
src/backend/commands/matview.c | 9 +-
src/backend/commands/portalcmds.c | 24 +-
src/backend/commands/prepare.c | 32 +-
src/backend/executor/README | 39 ++
src/backend/executor/execMain.c | 96 ++++-
src/backend/executor/execParallel.c | 18 +-
src/backend/executor/execPartition.c | 14 +
src/backend/executor/execProcnode.c | 20 +-
src/backend/executor/execUtils.c | 63 +++-
src/backend/executor/functions.c | 5 +-
src/backend/executor/nodeAgg.c | 2 +
src/backend/executor/nodeAppend.c | 25 ++
src/backend/executor/nodeBitmapAnd.c | 5 +-
src/backend/executor/nodeBitmapHeapscan.c | 4 +
src/backend/executor/nodeBitmapIndexscan.c | 9 +-
src/backend/executor/nodeBitmapOr.c | 5 +-
src/backend/executor/nodeCustom.c | 2 +
src/backend/executor/nodeForeignscan.c | 4 +
src/backend/executor/nodeGather.c | 3 +
src/backend/executor/nodeGatherMerge.c | 2 +
src/backend/executor/nodeGroup.c | 2 +
src/backend/executor/nodeHash.c | 2 +
src/backend/executor/nodeHashjoin.c | 4 +
src/backend/executor/nodeIncrementalSort.c | 2 +
src/backend/executor/nodeIndexonlyscan.c | 11 +-
src/backend/executor/nodeIndexscan.c | 11 +-
src/backend/executor/nodeLimit.c | 2 +
src/backend/executor/nodeLockRows.c | 2 +
src/backend/executor/nodeMaterial.c | 2 +
src/backend/executor/nodeMemoize.c | 2 +
src/backend/executor/nodeMergeAppend.c | 25 ++
src/backend/executor/nodeMergejoin.c | 4 +
src/backend/executor/nodeModifyTable.c | 7 +
src/backend/executor/nodeNestloop.c | 4 +
src/backend/executor/nodeProjectSet.c | 2 +
src/backend/executor/nodeRecursiveunion.c | 4 +
src/backend/executor/nodeResult.c | 2 +
src/backend/executor/nodeSamplescan.c | 2 +
src/backend/executor/nodeSeqscan.c | 2 +
src/backend/executor/nodeSetOp.c | 2 +
src/backend/executor/nodeSort.c | 2 +
src/backend/executor/nodeSubqueryscan.c | 2 +
src/backend/executor/nodeTidrangescan.c | 2 +
src/backend/executor/nodeTidscan.c | 2 +
src/backend/executor/nodeUnique.c | 2 +
src/backend/executor/nodeWindowAgg.c | 2 +
src/backend/executor/spi.c | 51 ++-
src/backend/storage/lmgr/lmgr.c | 45 +++
src/backend/tcop/postgres.c | 19 +-
src/backend/tcop/pquery.c | 344 +++++++++---------
src/backend/utils/cache/lsyscache.c | 21 ++
src/backend/utils/cache/plancache.c | 149 +++-----
src/backend/utils/mmgr/portalmem.c | 9 +
src/include/commands/explain.h | 7 +-
src/include/executor/execdesc.h | 4 +
src/include/executor/executor.h | 19 +-
src/include/nodes/execnodes.h | 2 +
src/include/storage/lmgr.h | 1 +
src/include/tcop/pquery.h | 2 +-
src/include/utils/lsyscache.h | 1 +
src/include/utils/plancache.h | 14 +
src/include/utils/portal.h | 3 +
src/test/modules/delay_execution/Makefile | 3 +-
.../modules/delay_execution/delay_execution.c | 67 +++-
.../expected/cached-plan-replan.out | 156 ++++++++
.../specs/cached-plan-replan.spec | 61 ++++
73 files changed, 1251 insertions(+), 410 deletions(-)
create mode 100644 src/test/modules/delay_execution/expected/cached-plan-replan.out
create mode 100644 src/test/modules/delay_execution/specs/cached-plan-replan.spec
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index c3ac27ae99..a0630d7944 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -78,7 +78,7 @@ static ExecutorRun_hook_type prev_ExecutorRun = NULL;
static ExecutorFinish_hook_type prev_ExecutorFinish = NULL;
static ExecutorEnd_hook_type prev_ExecutorEnd = NULL;
-static void explain_ExecutorStart(QueryDesc *queryDesc, int eflags);
+static bool explain_ExecutorStart(QueryDesc *queryDesc, int eflags);
static void explain_ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction,
uint64 count, bool execute_once);
@@ -258,9 +258,11 @@ _PG_init(void)
/*
* ExecutorStart hook: start up logging if needed
*/
-static void
+static bool
explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ bool plan_valid;
+
/*
* At the beginning of each top-level statement, decide whether we'll
* sample this statement. If nested-statement explaining is enabled,
@@ -296,9 +298,9 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
}
if (prev_ExecutorStart)
- prev_ExecutorStart(queryDesc, eflags);
+ plan_valid = prev_ExecutorStart(queryDesc, eflags);
else
- standard_ExecutorStart(queryDesc, eflags);
+ plan_valid = standard_ExecutorStart(queryDesc, eflags);
if (auto_explain_enabled())
{
@@ -316,6 +318,8 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcxt);
}
}
+
+ return plan_valid;
}
/*
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 55b957d251..1160a7326a 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -325,7 +325,7 @@ static PlannedStmt *pgss_planner(Query *parse,
const char *query_string,
int cursorOptions,
ParamListInfo boundParams);
-static void pgss_ExecutorStart(QueryDesc *queryDesc, int eflags);
+static bool pgss_ExecutorStart(QueryDesc *queryDesc, int eflags);
static void pgss_ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction,
uint64 count, bool execute_once);
@@ -963,13 +963,15 @@ pgss_planner(Query *parse,
/*
* ExecutorStart hook: start up tracking if needed
*/
-static void
+static bool
pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ bool plan_valid;
+
if (prev_ExecutorStart)
- prev_ExecutorStart(queryDesc, eflags);
+ plan_valid = prev_ExecutorStart(queryDesc, eflags);
else
- standard_ExecutorStart(queryDesc, eflags);
+ plan_valid = standard_ExecutorStart(queryDesc, eflags);
/*
* If query has queryId zero, don't track it. This prevents double
@@ -992,6 +994,8 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcxt);
}
}
+
+ return plan_valid;
}
/*
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index c5cada55fb..1edd4c3f17 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2658,7 +2658,11 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
/* Get info about foreign table. */
rtindex = node->resultRelInfo->ri_RangeTableIndex;
if (fsplan->scan.scanrelid == 0)
+ {
dmstate->rel = ExecOpenScanRelation(estate, rtindex, eflags);
+ if (!ExecPlanStillValid(estate))
+ return;
+ }
else
dmstate->rel = node->ss.ss_currentRelation;
table = GetForeignTable(RelationGetRelid(dmstate->rel));
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 9e4b2437a5..2b0d0a8ebd 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -558,7 +558,8 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL,
+ pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
@@ -568,7 +569,12 @@ BeginCopyTo(ParseState *pstate,
*
* ExecutorStart computes a result tupdesc for us
*/
- ExecutorStart(cstate->queryDesc, 0);
+ {
+ bool plan_valid PG_USED_FOR_ASSERTS_ONLY;
+
+ plan_valid = ExecutorStart(cstate->queryDesc, 0);
+ Assert(plan_valid);
+ }
tupDesc = cstate->queryDesc->tupDesc;
}
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index e91920ca14..bb359bb190 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,12 +325,17 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
/* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, GetIntoRelEFlags(into));
+ {
+ bool plan_valid PG_USED_FOR_ASSERTS_ONLY;
+
+ plan_valid = ExecutorStart(queryDesc, GetIntoRelEFlags(into));
+ Assert(plan_valid);
+ }
/* run the plan to completion */
ExecutorRun(queryDesc, ForwardScanDirection, 0, true);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 8570b14f62..954d83fb0a 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -393,6 +393,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
else
{
PlannedStmt *plan;
+ QueryDesc *queryDesc;
instr_time planstart,
planduration;
BufferUsage bufusage_start,
@@ -415,12 +416,89 @@ ExplainOneQuery(Query *query, int cursorOptions,
BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
}
+ queryDesc = ExplainQueryDesc(plan, NULL, queryString, into, es,
+ params, queryEnv);
+ Assert(queryDesc);
+
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(queryDesc, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
+/*
+ * ExplainQueryDesc
+ * Set up QueryDesc for EXPLAINing a given plan
+ *
+ * This returns NULL if cplan is found to be no longer valid.
+ */
+QueryDesc *
+ExplainQueryDesc(PlannedStmt *stmt, CachedPlan *cplan,
+ const char *queryString, IntoClause *into, ExplainState *es,
+ ParamListInfo params, QueryEnvironment *queryEnv)
+{
+ QueryDesc *queryDesc;
+ DestReceiver *dest;
+ int eflags;
+ int instrument_option = 0;
+
+ /*
+ * Normally we discard the query's output, but if explaining CREATE TABLE
+ * AS, we'd better use the appropriate tuple receiver.
+ */
+ if (into)
+ dest = CreateIntoRelDestReceiver(into);
+ else
+ dest = None_Receiver;
+
+ if (es->analyze && es->timing)
+ instrument_option |= INSTRUMENT_TIMER;
+ else if (es->analyze)
+ instrument_option |= INSTRUMENT_ROWS;
+
+ if (es->buffers)
+ instrument_option |= INSTRUMENT_BUFFERS;
+ if (es->wal)
+ instrument_option |= INSTRUMENT_WAL;
+
+ /*
+ * Use a snapshot with an updated command ID to ensure this query sees
+ * results of any previously executed queries.
+ */
+ PushCopiedSnapshot(GetActiveSnapshot());
+ UpdateActiveSnapshotCommandId();
+
+ /* Create a QueryDesc for the query */
+ queryDesc = CreateQueryDesc(stmt, cplan, queryString,
+ GetActiveSnapshot(), InvalidSnapshot,
+ dest, params, queryEnv, instrument_option);
+
+ /* Select execution options */
+ if (es->analyze)
+ eflags = 0; /* default run-to-completion flags */
+ else
+ eflags = EXEC_FLAG_EXPLAIN_ONLY;
+ if (es->generic)
+ eflags |= EXEC_FLAG_EXPLAIN_GENERIC;
+ if (into)
+ eflags |= GetIntoRelEFlags(into);
+
+ /*
+ * Call ExecutorStart to prepare the plan for execution. A cached plan
+ * may get invalidated as we're doing that.
+ */
+ if (!ExecutorStart(queryDesc, eflags))
+ {
+ /* Clean up. */
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+ PopActiveSnapshot();
+ return NULL;
+ }
+
+ return queryDesc;
+}
+
/*
* ExplainOneUtility -
* print out the execution plan for one utility statement
@@ -524,29 +602,16 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(QueryDesc *queryDesc,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
{
- DestReceiver *dest;
- QueryDesc *queryDesc;
instr_time starttime;
double totaltime = 0;
- int eflags;
- int instrument_option = 0;
-
- Assert(plannedstmt->commandType != CMD_UTILITY);
- if (es->analyze && es->timing)
- instrument_option |= INSTRUMENT_TIMER;
- else if (es->analyze)
- instrument_option |= INSTRUMENT_ROWS;
-
- if (es->buffers)
- instrument_option |= INSTRUMENT_BUFFERS;
- if (es->wal)
- instrument_option |= INSTRUMENT_WAL;
+ Assert(queryDesc->plannedstmt->commandType != CMD_UTILITY);
/*
* We always collect timing for the entire statement, even when node-level
@@ -555,40 +620,6 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
*/
INSTR_TIME_SET_CURRENT(starttime);
- /*
- * Use a snapshot with an updated command ID to ensure this query sees
- * results of any previously executed queries.
- */
- PushCopiedSnapshot(GetActiveSnapshot());
- UpdateActiveSnapshotCommandId();
-
- /*
- * Normally we discard the query's output, but if explaining CREATE TABLE
- * AS, we'd better use the appropriate tuple receiver.
- */
- if (into)
- dest = CreateIntoRelDestReceiver(into);
- else
- dest = None_Receiver;
-
- /* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, instrument_option);
-
- /* Select execution options */
- if (es->analyze)
- eflags = 0; /* default run-to-completion flags */
- else
- eflags = EXEC_FLAG_EXPLAIN_ONLY;
- if (es->generic)
- eflags |= EXEC_FLAG_EXPLAIN_GENERIC;
- if (into)
- eflags |= GetIntoRelEFlags(into);
-
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, eflags);
-
/* Execute the plan for statistics if asked for */
if (es->analyze)
{
@@ -4865,6 +4896,17 @@ ExplainDummyGroup(const char *objtype, const char *labelname, ExplainState *es)
}
}
+/*
+ * Discard output buffer for a fresh restart.
+ */
+void
+ExplainResetOutput(ExplainState *es)
+{
+ Assert(es->str);
+ resetStringInfo(es->str);
+ ExplainBeginOutput(es);
+}
+
/*
* Emit the start-of-output boilerplate.
*
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 4cc994ca31..477c299112 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -795,13 +795,16 @@ execute_sql_string(const char *sql)
if (stmt->utilityStmt == NULL)
{
QueryDesc *qdesc;
+ bool plan_valid PG_USED_FOR_ASSERTS_ONLY;
qdesc = CreateQueryDesc(stmt,
+ NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
- ExecutorStart(qdesc, 0);
+ plan_valid = ExecutorStart(qdesc, 0);
+ Assert(plan_valid);
ExecutorRun(qdesc, ForwardScanDirection, 0, true);
ExecutorFinish(qdesc);
ExecutorEnd(qdesc);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index ac2e74fa3f..fb0b29384c 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -408,12 +408,17 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
/* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, 0);
+ {
+ bool plan_valid PG_USED_FOR_ASSERTS_ONLY;
+
+ plan_valid = ExecutorStart(queryDesc, 0);
+ Assert(plan_valid);
+ }
/* run the plan */
ExecutorRun(queryDesc, ForwardScanDirection, 0, true);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 73ed7aa2f0..fdce72c9a5 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -143,9 +143,14 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
/*
* Start execution, inserting parameters if any.
*/
- PortalStart(portal, params, 0, GetActiveSnapshot());
+ {
+ bool plan_valid PG_USED_FOR_ASSERTS_ONLY;
+
+ plan_valid = PortalStart(portal, params, 0, GetActiveSnapshot());
- Assert(portal->strategy == PORTAL_ONE_SELECT);
+ Assert(portal->strategy == PORTAL_ONE_SELECT);
+ Assert(plan_valid);
+ }
/*
* We're done; the query won't actually be run until PerformPortalFetch is
@@ -249,6 +254,17 @@ PerformPortalClose(const char *name)
PortalDrop(portal, false);
}
+/*
+ * Release a portal's QueryDesc.
+ */
+void
+PortalQueryFinish(QueryDesc *queryDesc)
+{
+ ExecutorFinish(queryDesc);
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+}
+
/*
* PortalCleanup
*
@@ -295,9 +311,7 @@ PortalCleanup(Portal portal)
if (portal->resowner)
CurrentResourceOwner = portal->resowner;
- ExecutorFinish(queryDesc);
- ExecutorEnd(queryDesc);
- FreeQueryDesc(queryDesc);
+ PortalQueryFinish(queryDesc);
CurrentResourceOwner = saveResourceOwner;
}
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 18f70319fc..07f0421182 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -183,6 +183,7 @@ ExecuteQuery(ParseState *pstate,
paramLI = EvaluateParams(pstate, entry, stmt->params, estate);
}
+replan:
/* Create a new portal to run the query in */
portal = CreateNewPortal();
/* Don't display the portal in pg_cursors, it is for internal use only */
@@ -251,9 +252,16 @@ ExecuteQuery(ParseState *pstate,
}
/*
- * Run the portal as appropriate.
+ * Run the portal as appropriate. If the portal contains a cached plan, it
+ * must be recreated if portal->plan_valid is false, which indicates that
+ * the cached plan was found to have been invalidated while initializing
+ * one of the plan trees contained in it.
*/
- PortalStart(portal, paramLI, eflags, GetActiveSnapshot());
+ if (!PortalStart(portal, paramLI, eflags, GetActiveSnapshot()))
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
(void) PortalRun(portal, count, false, true, dest, dest, qc);
@@ -574,7 +582,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
{
PreparedStatement *entry;
const char *query_string;
- CachedPlan *cplan;
+ CachedPlan *cplan = NULL;
List *plan_list;
ListCell *p;
ParamListInfo paramLI = NULL;
@@ -618,6 +626,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
+replan:
cplan = GetCachedPlan(entry->plansource, paramLI,
CurrentResourceOwner, queryEnv);
@@ -639,8 +648,21 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ {
+ QueryDesc *queryDesc;
+
+ queryDesc = ExplainQueryDesc(pstmt, cplan, queryString,
+ into, es, paramLI, queryEnv);
+ if (queryDesc == NULL)
+ {
+ ExplainResetOutput(es);
+ ReleaseCachedPlan(cplan, CurrentResourceOwner);
+ goto replan;
+ }
+ ExplainOnePlan(queryDesc, into, es, query_string, paramLI,
+ queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
+ }
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 67a5c1769b..f0312376c5 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -280,6 +280,39 @@ are typically reset to empty once per tuple. Per-tuple contexts are usually
associated with ExprContexts, and commonly each PlanState node has its own
ExprContext to evaluate its qual and targetlist expressions in.
+Relation Locking
+----------------
+
+Normally, the executor does not lock non-index relations appearing in a given
+plan tree when initializing it for execution if the plan tree is freshly
+created, that is, not derived from a CachedPlan. That is because the locks
+must already have been taken during parsing, rewriting, and planning of the
+query in that case. If the plan tree is a cached one, there may still
+be unlocked relations present in the plan tree, because GetCachedPlan() only
+locks the relations that would be present in the query's range table before
+planning occurs, but not relations that would have been added to the range
+table during planning. This means that inheritance child tables present in
+a cached plan, which are added to the query's range table during planning,
+would not have been locked when the plan enters the executor.
+
+GetCachedPlan() punts on locking child tables because not all of them may
+actually be scanned during a given execution of the plan: partitions, in
+particular, may get pruned away by executor-initialization-time pruning. So
+the locking of child tables is deferred until execution initialization,
+which occurs during ExecInitNode() on the plan nodes containing the child
+tables.
+
+So, there's a time window during which a cached plan tree could go stale
+if it contains child tables, because they could get changed in other backends
+before ExecInitNode() gets a lock on them. This means the executor now must
+check the validity of the plan tree every time it takes a lock on a child
+table contained in the tree (after executor-initialization-time pruning, if
+any, has been performed), which it does by looking at the is_valid flag of
+the CachedPlan passed to it. If the plan tree is indeed stale
+(is_valid=false), the executor must abandon the initialization and return
+to the caller, letting it know that the execution must be retried with a
+new plan tree.
+
Query Processing Control Flow
-----------------------------
@@ -316,6 +349,12 @@ This is a sketch of control flow for full query processing:
FreeQueryDesc
+As mentioned in the "Relation Locking" section, if the plan tree is found to
+be stale during one of the recursive calls of ExecInitNode() after taking a
+lock on a child table, control is immediately returned to the caller of
+ExecutorStart(), which must redo the steps from CreateQueryDesc() with a new
+plan tree.
+
Per above comments, it's not really critical for ExecEndPlan to free any
memory; it'll all go away in FreeExecutorState anyway. However, we do need to
be careful to close relations, drop buffer pins, etc, so we do need to scan
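To make the above rule concrete, the validity test that the ExecInitNode()
subroutines perform presumably reduces to checking the is_valid flag of the
CachedPlan attached to the EState, roughly as below; this is only a sketch,
as the actual definition of ExecPlanStillValid() is not shown in this
excerpt:

static inline bool
ExecPlanStillValid(EState *estate)
{
	/*
	 * A freshly planned tree has no CachedPlan attached and is always
	 * valid; a cached one remains valid only as long as no invalidation
	 * message has cleared its is_valid flag.
	 */
	return estate->es_cachedplan == NULL ||
		estate->es_cachedplan->is_valid;
}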
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 235bb52ccc..4a4b4b7690 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -79,7 +79,7 @@ ExecutorEnd_hook_type ExecutorEnd_hook = NULL;
ExecutorCheckPerms_hook_type ExecutorCheckPerms_hook = NULL;
/* decls for local routines only used within this module */
-static void InitPlan(QueryDesc *queryDesc, int eflags);
+static bool InitPlan(QueryDesc *queryDesc, int eflags);
static void CheckValidRowMarkRel(Relation rel, RowMarkType markType);
static void ExecPostprocessPlan(EState *estate);
static void ExecEndPlan(EState *estate);
@@ -128,7 +128,7 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
*
* ----------------------------------------------------------------
*/
-void
+bool
ExecutorStart(QueryDesc *queryDesc, int eflags)
{
/*
@@ -140,14 +140,15 @@ ExecutorStart(QueryDesc *queryDesc, int eflags)
pgstat_report_query_id(queryDesc->plannedstmt->queryId, false);
if (ExecutorStart_hook)
- (*ExecutorStart_hook) (queryDesc, eflags);
- else
- standard_ExecutorStart(queryDesc, eflags);
+ return (*ExecutorStart_hook) (queryDesc, eflags);
+
+ return standard_ExecutorStart(queryDesc, eflags);
}
-void
+bool
standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ bool plan_valid;
EState *estate;
MemoryContext oldcontext;
@@ -263,9 +264,11 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/*
* Initialize the plan state tree
*/
- InitPlan(queryDesc, eflags);
+ plan_valid = InitPlan(queryDesc, eflags);
MemoryContextSwitchTo(oldcontext);
+
+ return plan_valid;
}
/* ----------------------------------------------------------------
@@ -620,6 +623,17 @@ ExecCheckPermissions(List *rangeTable, List *rteperminfos,
RTEPermissionInfo *perminfo = lfirst_node(RTEPermissionInfo, l);
Assert(OidIsValid(perminfo->relid));
+
+ /*
+ * Relations whose permissions need to be checked must already have
+ * been locked by the parser or by GetCachedPlan() if a cached plan is
+ * being executed.
+ *
+ * XXX shouldn't we skip calling ExecCheckPermissions from InitPlan
+ * in a parallel worker?
+ */
+ Assert(CheckRelLockedByMe(perminfo->relid, AccessShareLock, true) ||
+ IsParallelWorker());
result = ExecCheckOneRelPerms(perminfo);
if (!result)
{
@@ -829,9 +843,26 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
*
* Initializes the query plan: open files, allocate storage
* and start up the rule manager
+ *
+ * Normally, the plan tree given in queryDesc->plannedstmt is known to be
+ * valid in a race-free manner, that is, all relations contained in
+ * plannedstmt->relationOids would have already been locked. That is not the
+ * case, however, if the plannedstmt comes from a CachedPlan, one given in
+ * queryDesc->cplan. That's because GetCachedPlan() only locks the tables
+ * that are mentioned in the original query but not the child tables, which
+ * would have been added to the plan by the planner. In that case, locks on
+ * child tables will be taken when their Scan nodes are initialized by the
+ * ExecInitNode() calls done here. If the CachedPlan gets invalidated as
+ * those locks are taken, plan tree initialization is suspended at the point
+ * where the invalidation is first detected, queryDesc->planstate will be set
+ * to NULL, and queryDesc->plan_valid to false. In that case, callers must
+ * properly release the resources of this QueryDesc, which includes calling
+ * ExecutorFinish() and ExecutorEnd() on the EState contained therein, and
+ * then retry the execution with a newly created CachedPlan.
* ----------------------------------------------------------------
*/
-static void
+static bool
InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
@@ -839,7 +870,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
- PlanState *planstate;
+ PlanState *planstate = NULL;
TupleDesc tupType;
ListCell *l;
int i;
@@ -850,10 +881,11 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
/*
- * initialize the node's execution state
+ * Set up range table in EState.
*/
ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
+ estate->es_cachedplan = queryDesc->cplan;
estate->es_plannedstmt = plannedstmt;
/*
@@ -886,6 +918,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
case ROW_MARK_KEYSHARE:
case ROW_MARK_REFERENCE:
relation = ExecGetRangeTableRelation(estate, rc->rti);
+ if (!ExecPlanStillValid(estate))
+ goto plan_init_suspended;
break;
case ROW_MARK_COPY:
/* no physical table access is required */
@@ -953,6 +987,11 @@ InitPlan(QueryDesc *queryDesc, int eflags)
sp_eflags |= EXEC_FLAG_REWIND;
subplanstate = ExecInitNode(subplan, estate, sp_eflags);
+ if (!ExecPlanStillValid(estate))
+ {
+ Assert(subplanstate == NULL);
+ goto plan_init_suspended;
+ }
estate->es_subplanstates = lappend(estate->es_subplanstates,
subplanstate);
@@ -966,6 +1005,11 @@ InitPlan(QueryDesc *queryDesc, int eflags)
* processing tuples.
*/
planstate = ExecInitNode(plan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ {
+ Assert(planstate == NULL);
+ goto plan_init_suspended;
+ }
/*
* Get the tuple descriptor describing the type of tuples to return.
@@ -1008,7 +1052,18 @@ InitPlan(QueryDesc *queryDesc, int eflags)
}
queryDesc->tupDesc = tupType;
+ Assert(planstate != NULL);
queryDesc->planstate = planstate;
+ return true;
+
+plan_init_suspended:
+ /*
+ * Plan initialization failed. Mark QueryDesc as such. ExecEndPlan()
+ * will clean up initialized plan nodes from estate->es_inited_plannodes.
+ */
+ Assert(planstate == NULL);
+ queryDesc->planstate = NULL;
+ return false;
}
/*
@@ -1426,7 +1481,7 @@ ExecGetAncestorResultRels(EState *estate, ResultRelInfo *resultRelInfo)
/*
* All ancestors up to the root target relation must have been
- * locked by the planner or AcquireExecutorLocks().
+ * locked by the planner or ExecLockAppendNonLeafRelations().
*/
ancRel = table_open(ancOid, NoLock);
rInfo = makeNode(ResultRelInfo);
@@ -2856,7 +2911,8 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
* Child EPQ EStates share the parent's copy of unchanging state such as
* the snapshot, rangetable, and external Param info. They need their own
* copies of local state, including a tuple table, es_param_exec_vals,
- * result-rel info, etc.
+ * result-rel info, etc. Also, we don't pass the parent's copy of the
+ * CachedPlan, because no new locks will be taken for EvalPlanQual().
*/
rcestate->es_direction = ForwardScanDirection;
rcestate->es_snapshot = parentestate->es_snapshot;
@@ -2943,6 +2999,12 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
PlanState *subplanstate;
subplanstate = ExecInitNode(subplan, rcestate, 0);
+
+ /*
+ * At this point, we had better not have received any new invalidation
+ * messages that would have caused the plan tree to go stale.
+ */
+ Assert(ExecPlanStillValid(rcestate) && subplanstate);
rcestate->es_subplanstates = lappend(rcestate->es_subplanstates,
subplanstate);
}
@@ -2986,6 +3048,12 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
*/
epqstate->recheckplanstate = ExecInitNode(planTree, rcestate, 0);
+ /*
+ * At this point, we had better not have received any new invalidation
+ * messages
+ * that would have caused the plan tree to go stale.
+ */
+ Assert(ExecPlanStillValid(rcestate) && epqstate->recheckplanstate);
+
MemoryContextSwitchTo(oldcontext);
}
@@ -3008,6 +3076,10 @@ EvalPlanQualEnd(EPQState *epqstate)
MemoryContext oldcontext;
ListCell *l;
+ /* Nothing to do if EvalPlanQualInit() wasn't done to begin with. */
+ if (epqstate->parentestate == NULL)
+ return;
+
rtsize = epqstate->parentestate->es_range_table_size;
/*
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index cc2b8ccab7..d32bc74609 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1248,8 +1248,17 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(¶mspace);
- /* Create a QueryDesc for the query. */
+ /*
+ * Create a QueryDesc for the query. Note that no CachedPlan is available
+ * here even if the leader may have gotten the plan tree from one. That's
+ * fine though, because the leader would have taken the locks necessary
+ * for the plan tree that we have here to be fully valid. That is true
+ * despite the fact that we will be taking our own copies of those locks
+ * in ExecGetRangeTableRelation(), because none of them would be the locks
+ * that are not already taken by the leader.
+ */
return CreateQueryDesc(pstmt,
+ NULL,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
@@ -1430,7 +1439,12 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Start up the executor */
queryDesc->plannedstmt->jitFlags = fpes->jit_flags;
- ExecutorStart(queryDesc, fpes->eflags);
+ {
+ bool plan_valid PG_USED_FOR_ASSERTS_ONLY;
+
+ plan_valid = ExecutorStart(queryDesc, fpes->eflags);
+ Assert(plan_valid);
+ }
/* Special executor initialization steps for parallel workers */
queryDesc->planstate->state->es_query_dsa = area;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index eb8a87fd63..cf73d28baa 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -513,6 +513,13 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
oldcxt = MemoryContextSwitchTo(proute->memcxt);
+ /*
+ * Note that while we normally check ExecPlanStillValid(estate) after each
+ * lock taken during execution initialization, it is fine not to do so for
+ * the partitions opened here for tuple routing. Locks taken here can't
+ * possibly invalidate the plan given that the plan doesn't contain any
+ * info about those partitions.
+ */
partrel = table_open(partOid, RowExclusiveLock);
leaf_part_rri = makeNode(ResultRelInfo);
@@ -1111,6 +1118,9 @@ ExecInitPartitionDispatchInfo(EState *estate,
* Only sub-partitioned tables need to be locked here. The root
* partitioned table will already have been locked as it's referenced in
* the query's rtable.
+ *
+ * See the comment in ExecInitPartitionInfo() about taking locks and
+ * not checking ExecPlanStillValid(estate) here.
*/
if (partoid != RelationGetRelid(proute->partition_root))
rel = table_open(partoid, RowExclusiveLock);
@@ -1801,6 +1811,8 @@ ExecInitPartitionPruning(PlanState *planstate,
/* Create the working data structure for pruning */
prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Perform an initial partition prune pass, if required.
@@ -1927,6 +1939,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* duration of this executor run.
*/
partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
partkey = RelationGetPartitionKey(partrel);
partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
partrel);
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 653f74cf58..2dcacafd03 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -135,10 +135,17 @@ static bool ExecShutdownNode_walker(PlanState *node, void *context);
* 'estate' is the shared execution state for the plan tree
* 'eflags' is a bitwise OR of flag bits described in executor.h
*
- * Returns a PlanState node corresponding to the given Plan node.
+ * Returns a PlanState node corresponding to the given Plan node or NULL.
*
- * As a side-effect, all PlanState nodes that are created are appended to
- * estate->es_planstate_nodes for the cleanup processing in ExecEndPlan().
+ * NULL may be returned either if the input node is NULL or if the plan
+ * tree that the node is a part of is found to have been invalidated when
+ * taking a lock on the relation mentioned in the node or in a child
+ * node. The latter case arises if the plan tree contains inheritance/
+ * partition child tables and is from a CachedPlan.
+ *
+ * As a side-effect, all PlanState nodes that are successfully created are
+ * appended to estate->es_planstate_nodes for the cleanup processing in
+ * ExecEndPlan().
* ------------------------------------------------------------------------
*/
PlanState *
@@ -391,6 +398,13 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
break;
}
+ if (!ExecPlanStillValid(estate))
+ {
+ Assert(result == NULL);
+ return NULL;
+ }
+
+ Assert(result != NULL);
ExecSetExecProcNode(result, result->ExecProcNode);
/*
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index b567165003..ee12235b2f 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -806,7 +806,25 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rte->rtekind == RTE_RELATION);
- if (!IsParallelWorker())
+ if (IsParallelWorker() ||
+ (estate->es_cachedplan != NULL && !rte->inFromCl))
+ {
+ /*
+ * Take a lock if we are a parallel worker or if this is a child
+ * table referenced in a cached plan.
+ *
+ * Parallel workers need to have their own local lock on the
+ * relation. This ensures sane behavior in case the parent process
+ * exits before we do.
+ *
+ * When executing a cached plan, child tables must be locked
+ * here, because plancache.c (GetCachedPlan()) would only have
+ * locked tables mentioned in the query, that is, tables whose
+ * RTEs' inFromCl is true.
+ */
+ rel = table_open(rte->relid, rte->rellockmode);
+ }
+ else
{
/*
* In a normal query, we should already have the appropriate lock,
@@ -819,15 +837,6 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rte->rellockmode == AccessShareLock ||
CheckRelationLockedByMe(rel, rte->rellockmode, false));
}
- else
- {
- /*
- * If we are a parallel worker, we need to obtain our own local
- * lock on the relation. This ensures sane behavior in case the
- * parent process exits before we do.
- */
- rel = table_open(rte->relid, rte->rellockmode);
- }
estate->es_relations[rti - 1] = rel;
}
@@ -835,6 +844,38 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
return rel;
}
+/*
+ * ExecLockAppendNonLeafRelations
+ * Lock non-leaf relations whose children are scanned by a given
+ * Append/MergeAppend node
+ */
+void
+ExecLockAppendNonLeafRelations(EState *estate, List *allpartrelids)
+{
+ ListCell *l;
+
+ /* This should get called only when executing cached plans. */
+ Assert(estate->es_cachedplan != NULL);
+ foreach(l, allpartrelids)
+ {
+ Bitmapset *partrelids = lfirst_node(Bitmapset, l);
+ int i;
+
+ /*
+ * Note that we don't lock the first member (i=0) of each bitmapset
+ * because it stands for the root parent mentioned in the query, which
+ * should already have been locked before entering the executor.
+ */
+ i = 0;
+ while ((i = bms_next_member(partrelids, i)) > 0)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(i, estate);
+
+ LockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
+}
+
/*
* ExecInitResultRelation
* Open relation given by the passed-in RT index and fill its
@@ -850,6 +891,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
Relation resultRelationDesc;
resultRelationDesc = ExecGetRangeTableRelation(estate, rti);
+ if (!ExecPlanStillValid(estate))
+ return;
InitResultRelInfo(resultRelInfo,
resultRelationDesc,
rti,
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index f55424eb5a..bc09ef992c 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -838,6 +838,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest = None_Receiver;
es->qd = CreateQueryDesc(es->stmt,
+ NULL, /* fmgr_sql() doesn't use CachedPlans */
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -857,12 +858,14 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
* lazyEval mode for any statement that could possibly queue triggers.
*/
int eflags;
+ bool plan_valid PG_USED_FOR_ASSERTS_ONLY;
if (es->lazyEval)
eflags = EXEC_FLAG_SKIP_TRIGGERS;
else
eflags = 0; /* default run-to-completion flags */
- ExecutorStart(es->qd, eflags);
+ plan_valid = ExecutorStart(es->qd, eflags);
+ Assert(plan_valid);
}
es->status = F_EXEC_RUN;
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index e9d9ab6bdd..9553a85115 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -3304,6 +3304,8 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
eflags &= ~EXEC_FLAG_REWIND;
outerPlan = outerPlan(node);
outerPlanState(aggstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* initialize source tuple type.
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 9148d7d3b1..b0cae25b33 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -133,6 +133,27 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_syncdone = false;
appendstate->as_begun = false;
+ /*
+ * Must take locks on child tables if running a cached plan, because
+ * GetCachedPlan() would've only locked the root parent named in the
+ * query.
+ *
+ * First lock non-leaf partitions before doing any pruning. Even when
+ * no pruning is to be done, non-leaf partitions still must be locked
+ * explicitly like this, because they're not referenced elsewhere in
+ * the plan tree. XXX - OTOH, non-leaf partitions mentioned in
+ * part_prune_info, if any, would be opened by ExecInitPartitionPruning()
+ * using ExecGetRangeTableRelation() which locks child tables, redundantly
+ * in this case.
+ */
+ if (estate->es_cachedplan)
+ {
+ ExecLockAppendNonLeafRelations(estate, node->allpartrelids);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
+
+ }
+
/* If run-time partition pruning is enabled, then set that up now */
if (node->part_prune_info != NULL)
{
@@ -147,6 +168,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
list_length(node->appendplans),
node->part_prune_info,
&validsubplans);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
@@ -221,6 +244,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
firstvalid = j;
appendplanstates[j++] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
}
appendstate->as_first_partial_plan = firstvalid;
diff --git a/src/backend/executor/nodeBitmapAnd.c b/src/backend/executor/nodeBitmapAnd.c
index 147592f7e2..53afcef21c 100644
--- a/src/backend/executor/nodeBitmapAnd.c
+++ b/src/backend/executor/nodeBitmapAnd.c
@@ -88,8 +88,9 @@ ExecInitBitmapAnd(BitmapAnd *node, EState *estate, int eflags)
foreach(l, node->bitmapplans)
{
initNode = (Plan *) lfirst(l);
- bitmapplanstates[i] = ExecInitNode(initNode, estate, eflags);
- i++;
+ bitmapplanstates[i++] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
}
/*
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index d58ee4f4e1..388a02ec99 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -760,11 +760,15 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* initialize child nodes
*/
outerPlanState(scanstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* get the scan type from the relation descriptor.
diff --git a/src/backend/executor/nodeBitmapIndexscan.c b/src/backend/executor/nodeBitmapIndexscan.c
index 83ec9ede89..99015812a1 100644
--- a/src/backend/executor/nodeBitmapIndexscan.c
+++ b/src/backend/executor/nodeBitmapIndexscan.c
@@ -211,6 +211,7 @@ BitmapIndexScanState *
ExecInitBitmapIndexScan(BitmapIndexScan *node, EState *estate, int eflags)
{
BitmapIndexScanState *indexstate;
+ Relation indexRelation;
LOCKMODE lockmode;
/* check for unsupported flags */
@@ -262,7 +263,13 @@ ExecInitBitmapIndexScan(BitmapIndexScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
- indexstate->biss_RelationDesc = index_open(node->indexid, lockmode);
+ indexRelation = index_open(node->indexid, lockmode);
+ if (!ExecPlanStillValid(estate))
+ {
+ index_close(indexRelation, lockmode);
+ return NULL;
+ }
+ indexstate->biss_RelationDesc = indexRelation;
/*
* Initialize index-specific scan state
diff --git a/src/backend/executor/nodeBitmapOr.c b/src/backend/executor/nodeBitmapOr.c
index 736852a0ae..425f22ee48 100644
--- a/src/backend/executor/nodeBitmapOr.c
+++ b/src/backend/executor/nodeBitmapOr.c
@@ -89,8 +89,9 @@ ExecInitBitmapOr(BitmapOr *node, EState *estate, int eflags)
foreach(l, node->bitmapplans)
{
initNode = (Plan *) lfirst(l);
- bitmapplanstates[i] = ExecInitNode(initNode, estate, eflags);
- i++;
+ bitmapplanstates[i++] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
}
/*
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
index bd42c65b29..91239cc500 100644
--- a/src/backend/executor/nodeCustom.c
+++ b/src/backend/executor/nodeCustom.c
@@ -61,6 +61,8 @@ ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
if (scanrelid > 0)
{
scan_rel = ExecOpenScanRelation(estate, scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
css->ss.ss_currentRelation = scan_rel;
}
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index e6616dd718..71495313db 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -173,6 +173,8 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
if (scanrelid > 0)
{
currentRelation = ExecOpenScanRelation(estate, scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
scanstate->ss.ss_currentRelation = currentRelation;
fdwroutine = GetFdwRoutineForRelation(currentRelation, true);
}
@@ -264,6 +266,8 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
if (outerPlan(node))
outerPlanState(scanstate) =
ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Tell the FDW to initialize the scan.
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index f7a69f185b..c5652aeb2d 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -89,6 +89,9 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
*/
outerNode = outerPlan(node);
outerPlanState(gatherstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
+
tupDesc = ExecGetResultType(outerPlanState(gatherstate));
/*
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index d357ff0c47..1191b9e420 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -108,6 +108,8 @@ ExecInitGatherMerge(GatherMerge *node, EState *estate, int eflags)
*/
outerNode = outerPlan(node);
outerPlanState(gm_state) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Leader may access ExecProcNode result directly (if
diff --git a/src/backend/executor/nodeGroup.c b/src/backend/executor/nodeGroup.c
index 2badcc7e60..b4c3044c1f 100644
--- a/src/backend/executor/nodeGroup.c
+++ b/src/backend/executor/nodeGroup.c
@@ -185,6 +185,8 @@ ExecInitGroup(Group *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(grpstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize scan slot and type.
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index edd2324384..b2119febb6 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -386,6 +386,8 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(hashstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* initialize our result slot and type. No need to build projection
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 8078d7f229..d5ff80660e 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -752,8 +752,12 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
hashNode = (Hash *) innerPlan(node);
outerPlanState(hjstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
outerDesc = ExecGetResultType(outerPlanState(hjstate));
innerPlanState(hjstate) = ExecInitNode((Plan *) hashNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
innerDesc = ExecGetResultType(innerPlanState(hjstate));
/*
diff --git a/src/backend/executor/nodeIncrementalSort.c b/src/backend/executor/nodeIncrementalSort.c
index 52b146cfb8..785896e5ea 100644
--- a/src/backend/executor/nodeIncrementalSort.c
+++ b/src/backend/executor/nodeIncrementalSort.c
@@ -1041,6 +1041,8 @@ ExecInitIncrementalSort(IncrementalSort *node, EState *estate, int eflags)
* nodes may be able to do something more useful.
*/
outerPlanState(incrsortstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize scan slot and type.
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 0b43a9b969..ea8bef4b97 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -490,6 +490,7 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
{
IndexOnlyScanState *indexstate;
Relation currentRelation;
+ Relation indexRelation;
LOCKMODE lockmode;
TupleDesc tupDesc;
@@ -512,6 +513,8 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
indexstate->ss.ss_currentRelation = currentRelation;
indexstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
@@ -564,7 +567,13 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
- indexstate->ioss_RelationDesc = index_open(node->indexid, lockmode);
+ indexRelation = index_open(node->indexid, lockmode);
+ if (!ExecPlanStillValid(estate))
+ {
+ index_close(indexRelation, lockmode);
+ return NULL;
+ }
+ indexstate->ioss_RelationDesc = indexRelation;
/*
* Initialize index-specific scan state
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 4540c7781d..956e9e5543 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -904,6 +904,7 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
{
IndexScanState *indexstate;
Relation currentRelation;
+ Relation indexRelation;
LOCKMODE lockmode;
/*
@@ -925,6 +926,8 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
indexstate->ss.ss_currentRelation = currentRelation;
indexstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
@@ -969,7 +972,13 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
- indexstate->iss_RelationDesc = index_open(node->indexid, lockmode);
+ indexRelation = index_open(node->indexid, lockmode);
+ if (!ExecPlanStillValid(estate))
+ {
+ index_close(indexRelation, lockmode);
+ return NULL;
+ }
+ indexstate->iss_RelationDesc = indexRelation;
/*
* Initialize index-specific scan state
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index a75099dd73..a1fc36a3f0 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -476,6 +476,8 @@ ExecInitLimit(Limit *node, EState *estate, int eflags)
*/
outerPlan = outerPlan(node);
outerPlanState(limitstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* initialize child expressions
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 55de8d3d65..ff86a82b92 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -322,6 +322,8 @@ ExecInitLockRows(LockRows *node, EState *estate, int eflags)
* then initialize outer plan
*/
outerPlanState(lrstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/* node returns unmodified slots from the outer plan */
lrstate->ps.resultopsset = true;
diff --git a/src/backend/executor/nodeMaterial.c b/src/backend/executor/nodeMaterial.c
index ef04e9a8e7..8d02ac0ccb 100644
--- a/src/backend/executor/nodeMaterial.c
+++ b/src/backend/executor/nodeMaterial.c
@@ -214,6 +214,8 @@ ExecInitMaterial(Material *node, EState *estate, int eflags)
outerPlan = outerPlan(node);
outerPlanState(matstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize result type and slot. No need to initialize projection info
diff --git a/src/backend/executor/nodeMemoize.c b/src/backend/executor/nodeMemoize.c
index 61578d4b5c..a994d48fb2 100644
--- a/src/backend/executor/nodeMemoize.c
+++ b/src/backend/executor/nodeMemoize.c
@@ -938,6 +938,8 @@ ExecInitMemoize(Memoize *node, EState *estate, int eflags)
outerNode = outerPlan(node);
outerPlanState(mstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize return slot and type. No need to initialize projection info
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 8aa64944c9..18808c19ae 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -81,6 +81,27 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.state = estate;
mergestate->ps.ExecProcNode = ExecMergeAppend;
+ /*
+ * Must take locks on child tables if running a cached plan, because
+ * GetCachedPlan() would've only locked the root parent named in the
+ * query.
+ *
+ * First lock non-leaf partitions before doing any pruning. Even when
+ * no pruning is to be done, non-leaf partitions still must be locked
+ * explicitly like this, because they're not referenced elsewhere in
+ * the plan tree. XXX - OTOH, non-leaf partitions mentioned in
+ * part_prune_info, if any, would be opened by ExecInitPartitionPruning()
+ * using ExecGetRangeTableRelation() which locks child tables, redundantly
+ * in this case.
+ */
+ if (estate->es_cachedplan)
+ {
+ ExecLockAppendNonLeafRelations(estate, node->allpartrelids);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
+
+ }
+
/* If run-time partition pruning is enabled, then set that up now */
if (node->part_prune_info != NULL)
{
@@ -95,6 +116,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
list_length(node->mergeplans),
node->part_prune_info,
&validsubplans);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
@@ -151,6 +174,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
Plan *initNode = (Plan *) list_nth(node->mergeplans, i);
mergeplanstates[j++] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
}
mergestate->ps.ps_ProjInfo = NULL;
diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 7b530d9088..0d92ec278a 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -1490,11 +1490,15 @@ ExecInitMergeJoin(MergeJoin *node, EState *estate, int eflags)
mergestate->mj_SkipMarkRestore = node->skip_mark_restore;
outerPlanState(mergestate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
outerDesc = ExecGetResultType(outerPlanState(mergestate));
innerPlanState(mergestate) = ExecInitNode(innerPlan(node), estate,
mergestate->mj_SkipMarkRestore ?
eflags :
(eflags | EXEC_FLAG_MARK));
+ if (!ExecPlanStillValid(estate))
+ return NULL;
innerDesc = ExecGetResultType(innerPlanState(mergestate));
/*
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index bdbaa4753b..2245a67397 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -3984,6 +3984,9 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
linitial_int(node->resultRelations));
}
+ if (!ExecPlanStillValid(estate))
+ return NULL;
+
/* set up epqstate with dummy subplan data for the moment */
EvalPlanQualInit(&mtstate->mt_epqstate, estate, NULL, NIL,
node->epqParam, node->resultRelations);
@@ -4011,6 +4014,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
if (resultRelInfo != mtstate->rootResultRelInfo)
{
ExecInitResultRelation(estate, resultRelInfo, resultRelation);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* For child result relations, store the root result relation
@@ -4038,6 +4043,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* Now we may initialize the subplan.
*/
outerPlanState(mtstate) = ExecInitNode(subplan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Do additional per-result-relation initialization.
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index 5cfb50a366..e24554f4f8 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -295,11 +295,15 @@ ExecInitNestLoop(NestLoop *node, EState *estate, int eflags)
* values.
*/
outerPlanState(nlstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
if (node->nestParams == NIL)
eflags |= EXEC_FLAG_REWIND;
else
eflags &= ~EXEC_FLAG_REWIND;
innerPlanState(nlstate) = ExecInitNode(innerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize result slot, type and projection.
diff --git a/src/backend/executor/nodeProjectSet.c b/src/backend/executor/nodeProjectSet.c
index 4a388220ee..863bf2cc65 100644
--- a/src/backend/executor/nodeProjectSet.c
+++ b/src/backend/executor/nodeProjectSet.c
@@ -247,6 +247,8 @@ ExecInitProjectSet(ProjectSet *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(state) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* we don't use inner plan
diff --git a/src/backend/executor/nodeRecursiveunion.c b/src/backend/executor/nodeRecursiveunion.c
index aee31c7139..3f3de771d0 100644
--- a/src/backend/executor/nodeRecursiveunion.c
+++ b/src/backend/executor/nodeRecursiveunion.c
@@ -244,7 +244,11 @@ ExecInitRecursiveUnion(RecursiveUnion *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(rustate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
innerPlanState(rustate) = ExecInitNode(innerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* If hashing, precompute fmgr lookup data for inner loop, and create the
diff --git a/src/backend/executor/nodeResult.c b/src/backend/executor/nodeResult.c
index a100b144be..f2206af451 100644
--- a/src/backend/executor/nodeResult.c
+++ b/src/backend/executor/nodeResult.c
@@ -208,6 +208,8 @@ ExecInitResult(Result *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(resstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* we don't use inner plan
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index d7e22b1dbb..22357e7a0e 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -125,6 +125,8 @@ ExecInitSampleScan(SampleScan *node, EState *estate, int eflags)
ExecOpenScanRelation(estate,
node->scan.scanrelid,
eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/* we won't set up the HeapScanDesc till later */
scanstate->ss.ss_currentScanDesc = NULL;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 4da0f28f7b..b0b34cd14e 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -153,6 +153,8 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
ExecOpenScanRelation(estate,
node->scan.scanrelid,
eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/* and create slot with the appropriate rowtype */
ExecInitScanTupleSlot(estate, &scanstate->ss,
diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index f7db9a3415..3535aa298c 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -528,6 +528,8 @@ ExecInitSetOp(SetOp *node, EState *estate, int eflags)
if (node->strategy == SETOP_HASHED)
eflags &= ~EXEC_FLAG_REWIND;
outerPlanState(setopstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
outerDesc = ExecGetResultType(outerPlanState(setopstate));
/*
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index 078d041c40..547203ebfd 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -263,6 +263,8 @@ ExecInitSort(Sort *node, EState *estate, int eflags)
eflags &= ~(EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK);
outerPlanState(sortstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize scan slot and type.
diff --git a/src/backend/executor/nodeSubqueryscan.c b/src/backend/executor/nodeSubqueryscan.c
index bc55a82fc3..8b6629c939 100644
--- a/src/backend/executor/nodeSubqueryscan.c
+++ b/src/backend/executor/nodeSubqueryscan.c
@@ -124,6 +124,8 @@ ExecInitSubqueryScan(SubqueryScan *node, EState *estate, int eflags)
* initialize subquery
*/
subquerystate->subplan = ExecInitNode(node->subplan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize scan slot and type (needed by ExecAssignScanProjectionInfo)
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 2124c55ef5..613b377c7c 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -386,6 +386,8 @@ ExecInitTidRangeScan(TidRangeScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
tidrangestate->ss.ss_currentRelation = currentRelation;
tidrangestate->ss.ss_currentScanDesc = NULL; /* no table scan here */
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index 862bd0330b..1b0a2d8083 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -529,6 +529,8 @@ ExecInitTidScan(TidScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
tidstate->ss.ss_currentRelation = currentRelation;
tidstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
diff --git a/src/backend/executor/nodeUnique.c b/src/backend/executor/nodeUnique.c
index 50babacdc8..ae9af8f21e 100644
--- a/src/backend/executor/nodeUnique.c
+++ b/src/backend/executor/nodeUnique.c
@@ -136,6 +136,8 @@ ExecInitUnique(Unique *node, EState *estate, int eflags)
* then initialize outer plan
*/
outerPlanState(uniquestate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize result slot and type. Unique nodes do no projections, so
diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 648cdadc32..77de2d0c22 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -2458,6 +2458,8 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags)
*/
outerPlan = outerPlan(node);
outerPlanState(winstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* initialize source tuple type (which is also the tuple type that we'll
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 33975687b3..cfee208719 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -71,7 +71,7 @@ static int _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
static ParamListInfo _SPI_convert_params(int nargs, Oid *argtypes,
Datum *Values, const char *Nulls);
-static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount);
+static int _SPI_pquery(QueryDesc *queryDesc, uint64 tcount);
static void _SPI_error_callback(void *arg);
@@ -1582,6 +1582,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
Snapshot snapshot;
MemoryContext oldcontext;
Portal portal;
+ bool plan_valid;
SPICallbackArg spicallbackarg;
ErrorContextCallback spierrcontext;
@@ -1623,6 +1624,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
_SPI_current->processed = 0;
_SPI_current->tuptable = NULL;
+replan:
/* Create the portal */
if (name == NULL || name[0] == '\0')
{
@@ -1766,15 +1768,24 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
}
/*
- * Start portal execution.
+ * Start portal execution. If the portal contains a cached plan, it must
+ * be recreated if PortalStart() returns false, which indicates that the
+ * cached plan was invalidated while initializing one of the plan trees
+ * contained in it.
*/
- PortalStart(portal, paramLI, 0, snapshot);
+ plan_valid = PortalStart(portal, paramLI, 0, snapshot);
Assert(portal->strategy != PORTAL_MULTI_QUERY);
/* Pop the error context stack */
error_context_stack = spierrcontext.previous;
+ if (!plan_valid)
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
+
/* Pop the SPI stack */
_SPI_end_call(true);
@@ -2552,6 +2563,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
+replan:
cplan = GetCachedPlan(plansource, options->params,
plan_owner, _SPI_current->queryEnv);
@@ -2661,6 +2673,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
QueryDesc *qdesc;
Snapshot snap;
+ int eflags;
if (ActiveSnapshotSet())
snap = GetActiveSnapshot();
@@ -2668,14 +2681,31 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
snap = InvalidSnapshot;
qdesc = CreateQueryDesc(stmt,
+ cplan,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
options->params,
_SPI_current->queryEnv,
0);
- res = _SPI_pquery(qdesc, fire_triggers,
- canSetTag ? options->tcount : 0);
+
+ /* Select execution options */
+ if (fire_triggers)
+ eflags = 0; /* default run-to-completion flags */
+ else
+ eflags = EXEC_FLAG_SKIP_TRIGGERS;
+
+ if (!ExecutorStart(qdesc, eflags))
+ {
+ ExecutorFinish(qdesc);
+ ExecutorEnd(qdesc);
+ FreeQueryDesc(qdesc);
+ Assert(cplan);
+ ReleaseCachedPlan(cplan, plan_owner);
+ goto replan;
+ }
+
+ res = _SPI_pquery(qdesc, canSetTag ? options->tcount : 0);
FreeQueryDesc(qdesc);
}
else
@@ -2850,10 +2880,9 @@ _SPI_convert_params(int nargs, Oid *argtypes,
}
static int
-_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
+_SPI_pquery(QueryDesc *queryDesc, uint64 tcount)
{
int operation = queryDesc->operation;
- int eflags;
int res;
switch (operation)
@@ -2897,14 +2926,6 @@ _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
ResetUsage();
#endif
- /* Select execution options */
- if (fire_triggers)
- eflags = 0; /* default run-to-completion flags */
- else
- eflags = EXEC_FLAG_SKIP_TRIGGERS;
-
- ExecutorStart(queryDesc, eflags);
-
ExecutorRun(queryDesc, ForwardScanDirection, tcount, true);
_SPI_current->processed = queryDesc->estate->es_processed;
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index ee9b89a672..c807e9cdcc 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -27,6 +27,7 @@
#include "storage/procarray.h"
#include "storage/sinvaladt.h"
#include "utils/inval.h"
+#include "utils/lsyscache.h"
/*
@@ -364,6 +365,50 @@ CheckRelationLockedByMe(Relation relation, LOCKMODE lockmode, bool orstronger)
return false;
}
+/*
+ * CheckRelLockedByMe
+ *
+ * Returns true if current transaction holds a lock on the given relation of
+ * mode 'lockmode'. If 'orstronger' is true, a stronger lockmode is also OK.
+ * ("Stronger" is defined as "numerically higher", which is a bit
+ * semantically dubious but is OK for the purposes we use this for.)
+ */
+bool
+CheckRelLockedByMe(Oid relid, LOCKMODE lockmode, bool orstronger)
+{
+ Oid dbId = get_rel_relisshared(relid) ? InvalidOid : MyDatabaseId;
+ LOCKTAG tag;
+
+ SET_LOCKTAG_RELATION(tag, dbId, relid);
+
+ if (LockHeldByMe(&tag, lockmode))
+ return true;
+
+ if (orstronger)
+ {
+ LOCKMODE slockmode;
+
+ for (slockmode = lockmode + 1;
+ slockmode <= MaxLockMode;
+ slockmode++)
+ {
+ if (LockHeldByMe(&tag, slockmode))
+ {
+#ifdef NOT_USED
+ /* Sometimes this might be useful for debugging purposes */
+ elog(WARNING, "lock mode %s substituted for %s on relation %s",
+ GetLockmodeName(tag.locktag_lockmethodid, slockmode),
+ GetLockmodeName(tag.locktag_lockmethodid, lockmode),
+ get_rel_name(relid));
+#endif
+ return true;
+ }
+ }
+ }
+
+ return false;
+}
+
/*
* LockHasWaitersRelation
*
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 36cc99ec9c..cf27fa3968 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1232,7 +1232,12 @@ exec_simple_query(const char *query_string)
/*
* Start the portal. No parameters here.
*/
- PortalStart(portal, NULL, 0, InvalidSnapshot);
+ {
+ bool plan_valid PG_USED_FOR_ASSERTS_ONLY;
+
+ plan_valid = PortalStart(portal, NULL, 0, InvalidSnapshot);
+ Assert(plan_valid);
+ }
/*
* Select the appropriate output format: text unless we are doing a
@@ -1737,6 +1742,7 @@ exec_bind_message(StringInfo input_message)
"commands ignored until end of transaction block"),
errdetail_abort()));
+replan:
/*
* Create the portal. Allow silent replacement of an existing portal only
* if the unnamed portal is specified.
@@ -2028,9 +2034,16 @@ exec_bind_message(StringInfo input_message)
PopActiveSnapshot();
/*
- * And we're ready to start portal execution.
+ * Start portal execution. If the portal contains a cached plan, it must
+ * be recreated if PortalStart() returns false, which indicates that the
+ * cached plan was invalidated while initializing one of the plan trees
+ * contained in it.
*/
- PortalStart(portal, params, 0, InvalidSnapshot);
+ if (!PortalStart(portal, params, 0, InvalidSnapshot))
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
/*
* Apply the result format requests to the portal.
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5565f200c3..528f795d4f 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -19,6 +19,7 @@
#include "access/xact.h"
#include "commands/prepare.h"
+#include "executor/execdesc.h"
#include "executor/tstoreReceiver.h"
#include "miscadmin.h"
#include "pg_trace.h"
@@ -35,12 +36,6 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
- const char *sourceText,
- ParamListInfo params,
- QueryEnvironment *queryEnv,
- DestReceiver *dest,
- QueryCompletion *qc);
static void FillPortalStore(Portal portal, bool isTopLevel);
static uint64 RunFromStore(Portal portal, ScanDirection direction, uint64 count,
DestReceiver *dest);
@@ -65,6 +60,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +73,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->cplan = cplan; /* CachedPlan, if plan is from one */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -116,86 +113,6 @@ FreeQueryDesc(QueryDesc *qdesc)
}
-/*
- * ProcessQuery
- * Execute a single plannable query within a PORTAL_MULTI_QUERY,
- * PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
- *
- * plan: the plan tree for the query
- * sourceText: the source text of the query
- * params: any parameters needed
- * dest: where to send results
- * qc: where to store the command completion status data.
- *
- * qc may be NULL if caller doesn't want a status string.
- *
- * Must be called in a memory context that will be reset or deleted on
- * error; otherwise the executor's memory usage will be leaked.
- */
-static void
-ProcessQuery(PlannedStmt *plan,
- const char *sourceText,
- ParamListInfo params,
- QueryEnvironment *queryEnv,
- DestReceiver *dest,
- QueryCompletion *qc)
-{
- QueryDesc *queryDesc;
-
- /*
- * Create the QueryDesc object
- */
- queryDesc = CreateQueryDesc(plan, sourceText,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, 0);
-
- /*
- * Call ExecutorStart to prepare the plan for execution
- */
- ExecutorStart(queryDesc, 0);
-
- /*
- * Run the plan to completion.
- */
- ExecutorRun(queryDesc, ForwardScanDirection, 0, true);
-
- /*
- * Build command completion status data, if caller wants one.
- */
- if (qc)
- {
- switch (queryDesc->operation)
- {
- case CMD_SELECT:
- SetQueryCompletion(qc, CMDTAG_SELECT, queryDesc->estate->es_processed);
- break;
- case CMD_INSERT:
- SetQueryCompletion(qc, CMDTAG_INSERT, queryDesc->estate->es_processed);
- break;
- case CMD_UPDATE:
- SetQueryCompletion(qc, CMDTAG_UPDATE, queryDesc->estate->es_processed);
- break;
- case CMD_DELETE:
- SetQueryCompletion(qc, CMDTAG_DELETE, queryDesc->estate->es_processed);
- break;
- case CMD_MERGE:
- SetQueryCompletion(qc, CMDTAG_MERGE, queryDesc->estate->es_processed);
- break;
- default:
- SetQueryCompletion(qc, CMDTAG_UNKNOWN, queryDesc->estate->es_processed);
- break;
- }
- }
-
- /*
- * Now, we close down all the scans and free allocated resources.
- */
- ExecutorFinish(queryDesc);
- ExecutorEnd(queryDesc);
-
- FreeQueryDesc(queryDesc);
-}
-
/*
* ChoosePortalStrategy
* Select portal execution strategy given the intended statement list.
@@ -426,19 +343,21 @@ FetchStatementTargetList(Node *stmt)
* presently ignored for non-PORTAL_ONE_SELECT portals (it's only intended
* to be used for cursors).
*
- * On return, portal is ready to accept PortalRun() calls, and the result
- * tupdesc (if any) is known.
+ * True is returned if the portal is ready to accept PortalRun() calls and
+ * the result tupdesc (if any) is known. False if the plan tree is no
+ * longer valid, in which case the caller must retry after generating a
+ * new CachedPlan.
*/
-void
+bool
PortalStart(Portal portal, ParamListInfo params,
int eflags, Snapshot snapshot)
{
Portal saveActivePortal;
ResourceOwner saveResourceOwner;
- MemoryContext savePortalContext;
MemoryContext oldContext;
QueryDesc *queryDesc;
- int myeflags;
+ int myeflags = 0;
+ bool plan_valid = true;
Assert(PortalIsValid(portal));
Assert(portal->status == PORTAL_DEFINED);
@@ -448,15 +367,13 @@ PortalStart(Portal portal, ParamListInfo params,
*/
saveActivePortal = ActivePortal;
saveResourceOwner = CurrentResourceOwner;
- savePortalContext = PortalContext;
PG_TRY();
{
ActivePortal = portal;
if (portal->resowner)
CurrentResourceOwner = portal->resowner;
- PortalContext = portal->portalContext;
- oldContext = MemoryContextSwitchTo(PortalContext);
+ oldContext = MemoryContextSwitchTo(portal->queryContext);
/* Must remember portal param list, if any */
portal->portalParams = params;
@@ -472,6 +389,8 @@ PortalStart(Portal portal, ParamListInfo params,
switch (portal->strategy)
{
case PORTAL_ONE_SELECT:
+ case PORTAL_ONE_RETURNING:
+ case PORTAL_ONE_MOD_WITH:
/* Must set snapshot before starting executor. */
if (snapshot)
@@ -493,6 +412,7 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->cplan,
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -501,30 +421,51 @@ PortalStart(Portal portal, ParamListInfo params,
portal->queryEnv,
0);
+ /* Remember for PortalRunMulti(). */
+ if (portal->strategy == PORTAL_ONE_RETURNING ||
+ portal->strategy == PORTAL_ONE_MOD_WITH)
+ portal->qdescs = list_make1(queryDesc);
+
/*
* If it's a scrollable cursor, executor needs to support
* REWIND and backwards scan, as well as whatever the caller
* might've asked for.
*/
- if (portal->cursorOptions & CURSOR_OPT_SCROLL)
+ if (portal->strategy == PORTAL_ONE_SELECT &&
+ (portal->cursorOptions & CURSOR_OPT_SCROLL))
myeflags = eflags | EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD;
else
myeflags = eflags;
/*
- * Call ExecutorStart to prepare the plan for execution
+ * Call ExecutorStart to prepare the plan for execution. A
+ * cached plan may get invalidated as we're doing that.
*/
- ExecutorStart(queryDesc, myeflags);
+ if (!ExecutorStart(queryDesc, myeflags))
+ {
+ Assert(queryDesc->cplan);
+ PortalQueryFinish(queryDesc);
+ PopActiveSnapshot();
+ plan_valid = false;
+ goto plan_init_failed;
+ }
/*
- * This tells PortalCleanup to shut down the executor
+ * This tells PortalCleanup to shut down the executor, though that
+ * is not needed for queries handled by PortalRunMulti().
*/
- portal->queryDesc = queryDesc;
+ if (portal->strategy == PORTAL_ONE_SELECT)
+ portal->queryDesc = queryDesc;
/*
- * Remember tuple descriptor (computed by ExecutorStart)
+ * Remember tuple descriptor (computed by ExecutorStart),
+ * though make it independent of QueryDesc for queries handled
+ * by PortalRunMulti().
*/
- portal->tupDesc = queryDesc->tupDesc;
+ if (portal->strategy != PORTAL_ONE_SELECT)
+ portal->tupDesc = CreateTupleDescCopy(queryDesc->tupDesc);
+ else
+ portal->tupDesc = queryDesc->tupDesc;
/*
* Reset cursor position data to "start of query"
@@ -536,29 +477,6 @@ PortalStart(Portal portal, ParamListInfo params,
PopActiveSnapshot();
break;
- case PORTAL_ONE_RETURNING:
- case PORTAL_ONE_MOD_WITH:
-
- /*
- * We don't start the executor until we are told to run the
- * portal. We do need to set up the result tupdesc.
- */
- {
- PlannedStmt *pstmt;
-
- pstmt = PortalGetPrimaryStmt(portal);
- portal->tupDesc =
- ExecCleanTypeFromTL(pstmt->planTree->targetlist);
- }
-
- /*
- * Reset cursor position data to "start of query"
- */
- portal->atStart = true;
- portal->atEnd = false; /* allow fetches */
- portal->portalPos = 0;
- break;
-
case PORTAL_UTIL_SELECT:
/*
@@ -581,7 +499,81 @@ PortalStart(Portal portal, ParamListInfo params,
break;
case PORTAL_MULTI_QUERY:
- /* Need do nothing now */
+ {
+ ListCell *lc;
+ bool first = true;
+
+ myeflags = eflags;
+ foreach(lc, portal->stmts)
+ {
+ PlannedStmt *plan = lfirst_node(PlannedStmt, lc);
+ bool is_utility = (plan->utilityStmt != NULL);
+
+ /*
+ * Push the snapshot to be used by the executor.
+ */
+ if (!is_utility)
+ {
+ /*
+ * Must copy the snapshot for all statements
+ * except the first, as we'll need to update its
+ * command ID.
+ */
+ if (!first)
+ PushCopiedSnapshot(GetTransactionSnapshot());
+ else
+ PushActiveSnapshot(GetTransactionSnapshot());
+ }
+
+ /*
+ * From the 2nd statement onwards, update the command
+ * ID and the snapshot to match.
+ */
+ if (!first)
+ {
+ CommandCounterIncrement();
+ UpdateActiveSnapshotCommandId();
+ }
+
+ first = false;
+
+ /*
+ * Create the QueryDesc object. DestReceiver will
+ * be set in PortalRunMulti().
+ */
+ queryDesc = CreateQueryDesc(plan, portal->cplan,
+ portal->sourceText,
+ !is_utility ?
+ GetActiveSnapshot() :
+ InvalidSnapshot,
+ InvalidSnapshot,
+ NULL,
+ params,
+ portal->queryEnv, 0);
+
+ /* Remember for PortalRunMulti() */
+ portal->qdescs = lappend(portal->qdescs, queryDesc);
+
+ if (is_utility)
+ continue;
+
+ /*
+ * Call ExecutorStart to prepare the plan for
+ * execution. A cached plan may get invalidated as
+ * we're doing that.
+ */
+ if (!ExecutorStart(queryDesc, myeflags))
+ {
+ PopActiveSnapshot();
+ Assert(queryDesc->cplan);
+ PortalQueryFinish(queryDesc);
+ plan_valid = false;
+ goto plan_init_failed;
+ }
+ PopActiveSnapshot();
+ }
+ }
+
portal->tupDesc = NULL;
break;
}
@@ -594,19 +586,20 @@ PortalStart(Portal portal, ParamListInfo params,
/* Restore global vars and propagate error */
ActivePortal = saveActivePortal;
CurrentResourceOwner = saveResourceOwner;
- PortalContext = savePortalContext;
PG_RE_THROW();
}
PG_END_TRY();
+ portal->status = PORTAL_READY;
+
+plan_init_failed:
MemoryContextSwitchTo(oldContext);
ActivePortal = saveActivePortal;
CurrentResourceOwner = saveResourceOwner;
- PortalContext = savePortalContext;
- portal->status = PORTAL_READY;
+ return plan_valid;
}
/*
@@ -1193,7 +1186,7 @@ PortalRunMulti(Portal portal,
QueryCompletion *qc)
{
bool active_snapshot_set = false;
- ListCell *stmtlist_item;
+ ListCell *qdesc_item;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1214,9 +1207,10 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
- foreach(stmtlist_item, portal->stmts)
+ foreach(qdesc_item, portal->qdescs)
{
- PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ QueryDesc *qdesc = (QueryDesc *) lfirst(qdesc_item);
+ PlannedStmt *pstmt = qdesc->plannedstmt;
/*
* If we got a cancel signal in prior command, quit
@@ -1233,33 +1227,26 @@ PortalRunMulti(Portal portal,
if (log_executor_stats)
ResetUsage();
- /*
- * Must always have a snapshot for plannable queries. First time
- * through, take a new snapshot; for subsequent queries in the
- * same portal, just update the snapshot's copy of the command
- * counter.
- */
+ /* Push the snapshot for plannable queries. */
if (!active_snapshot_set)
{
- Snapshot snapshot = GetTransactionSnapshot();
+ Snapshot snapshot = qdesc->snapshot;
- /* If told to, register the snapshot and save in portal */
+ /*
+ * If told to, register the snapshot and save it in the portal.
+ *
+ * Note that the command ID of qdesc->snapshot for the 2nd query
+ * onwards would have been updated in PortalStart() to account
+ * for the CCI() done between queries, but it's OK that here we
+ * don't likewise update holdSnapshot's command ID.
+ */
if (setHoldSnapshot)
{
snapshot = RegisterSnapshot(snapshot);
portal->holdSnapshot = snapshot;
}
- /*
- * We can't have the holdSnapshot also be the active one,
- * because UpdateActiveSnapshotCommandId would complain. So
- * force an extra snapshot copy. Plain PushActiveSnapshot
- * would have copied the transaction snapshot anyway, so this
- * only adds a copy step when setHoldSnapshot is true. (It's
- * okay for the command ID of the active snapshot to diverge
- * from what holdSnapshot has.)
- */
- PushCopiedSnapshot(snapshot);
+ PushActiveSnapshot(snapshot);
/*
* As for PORTAL_ONE_SELECT portals, it does not seem
@@ -1268,26 +1255,39 @@ PortalRunMulti(Portal portal,
active_snapshot_set = true;
}
- else
- UpdateActiveSnapshotCommandId();
+ /*
+ * Run the plan to completion.
+ */
+ qdesc->dest = dest;
+ ExecutorRun(qdesc, ForwardScanDirection, 0, true);
+
+ /*
+ * Build command completion status data if needed.
+ */
if (pstmt->canSetTag)
{
- /* statement can set tag string */
- ProcessQuery(pstmt,
- portal->sourceText,
- portal->portalParams,
- portal->queryEnv,
- dest, qc);
- }
- else
- {
- /* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
- portal->sourceText,
- portal->portalParams,
- portal->queryEnv,
- altdest, NULL);
+ switch (qdesc->operation)
+ {
+ case CMD_SELECT:
+ SetQueryCompletion(qc, CMDTAG_SELECT, qdesc->estate->es_processed);
+ break;
+ case CMD_INSERT:
+ SetQueryCompletion(qc, CMDTAG_INSERT, qdesc->estate->es_processed);
+ break;
+ case CMD_UPDATE:
+ SetQueryCompletion(qc, CMDTAG_UPDATE, qdesc->estate->es_processed);
+ break;
+ case CMD_DELETE:
+ SetQueryCompletion(qc, CMDTAG_DELETE, qdesc->estate->es_processed);
+ break;
+ case CMD_MERGE:
+ SetQueryCompletion(qc, CMDTAG_MERGE, qdesc->estate->es_processed);
+ break;
+ default:
+ SetQueryCompletion(qc, CMDTAG_UNKNOWN, qdesc->estate->es_processed);
+ break;
+ }
}
if (log_executor_stats)
@@ -1342,12 +1342,12 @@ PortalRunMulti(Portal portal,
if (portal->stmts == NIL)
break;
- /*
- * Increment command counter between queries, but not after the last
- * one.
- */
- if (lnext(portal->stmts, stmtlist_item) != NULL)
- CommandCounterIncrement();
+ if (qdesc->estate)
+ {
+ ExecutorFinish(qdesc);
+ ExecutorEnd(qdesc);
+ }
+ FreeQueryDesc(qdesc);
}
/* Pop the snapshot if we pushed one. */
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index fc6d267e44..2725d02312 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -2095,6 +2095,27 @@ get_rel_persistence(Oid relid)
return result;
}
+/*
+ * get_rel_relisshared
+ *
+ * Returns whether the given relation is shared
+ */
+bool
+get_rel_relisshared(Oid relid)
+{
+ HeapTuple tp;
+ Form_pg_class reltup;
+ bool result;
+
+ tp = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+ if (!HeapTupleIsValid(tp))
+ elog(ERROR, "cache lookup failed for relation %u", relid);
+ reltup = (Form_pg_class) GETSTRUCT(tp);
+ result = reltup->relisshared;
+ ReleaseSysCache(tp);
+
+ return result;
+}
/* ---------- TRANSFORM CACHE ---------- */
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index d67cd9a405..84a354a701 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -102,13 +102,13 @@ static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool GenericPlanIsValid(CachedPlan *cplan);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -790,8 +790,14 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
*
- * On a "true" return, we have acquired the locks needed to run the plan.
- * (We must do this for the "true" result to be race-condition-free.)
+ * Note though that the plan may contain child relations added by the
+ * planner that have not been locked yet (AcquirePlannerLocks() only locks
+ * relations present in the range table before planning), so the plan can
+ * still go stale before execution if any of those child relations are
+ * modified concurrently. The executor must therefore check that the
+ * CachedPlan is still valid after locking each child table and, if it is
+ * not, ask the caller to recreate the plan.
*/
static bool
CheckCachedPlan(CachedPlanSource *plansource)
@@ -805,60 +811,56 @@ CheckCachedPlan(CachedPlanSource *plansource)
if (!plan)
return false;
- Assert(plan->magic == CACHEDPLAN_MAGIC);
- /* Generic plans are never one-shot */
- Assert(!plan->is_oneshot);
+ if (GenericPlanIsValid(plan))
+ return true;
/*
- * If plan isn't valid for current role, we can't use it.
+ * Plan has been invalidated, so unlink it from the parent and release it.
*/
- if (plan->is_valid && plan->dependsOnRole &&
- plan->planRoleId != GetUserId())
- plan->is_valid = false;
+ ReleaseGenericPlan(plansource);
- /*
- * If it appears valid, acquire locks and recheck; this is much the same
- * logic as in RevalidateCachedQuery, but for a plan.
- */
- if (plan->is_valid)
+ return false;
+}
+
+/*
+ * GenericPlanIsValid
+ * Is a generic plan still valid?
+ *
+ * It may have gone stale due to concurrent schema modifications of
+ * relations mentioned in the plan, a role change, or, for transient
+ * plans, TransactionXmin advancing; see the checks below.
+ */
+static bool
+GenericPlanIsValid(CachedPlan *cplan)
+{
+ Assert(cplan != NULL);
+ Assert(cplan->magic == CACHEDPLAN_MAGIC);
+ /* Generic plans are never one-shot */
+ Assert(!cplan->is_oneshot);
+
+ if (cplan->is_valid)
{
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
- Assert(plan->refcount > 0);
-
- AcquireExecutorLocks(plan->stmt_list, true);
+ Assert(cplan->refcount > 0);
/*
- * If plan was transient, check to see if TransactionXmin has
- * advanced, and if so invalidate it.
+ * If plan isn't valid for current role, we can't use it.
*/
- if (plan->is_valid &&
- TransactionIdIsValid(plan->saved_xmin) &&
- !TransactionIdEquals(plan->saved_xmin, TransactionXmin))
- plan->is_valid = false;
+ if (cplan->dependsOnRole && cplan->planRoleId != GetUserId())
+ cplan->is_valid = false;
/*
- * By now, if any invalidation has happened, the inval callback
- * functions will have marked the plan invalid.
+ * If plan was transient, check to see if TransactionXmin has
+ * advanced, and if so invalidate it.
*/
- if (plan->is_valid)
- {
- /* Successfully revalidated and locked the query. */
- return true;
- }
-
- /* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ if (TransactionIdIsValid(cplan->saved_xmin) &&
+ !TransactionIdEquals(cplan->saved_xmin, TransactionXmin))
+ cplan->is_valid = false;
}
- /*
- * Plan has been invalidated, so unlink it from the parent and release it.
- */
- ReleaseGenericPlan(plansource);
-
- return false;
+ return cplan->is_valid;
}
/*
@@ -1128,8 +1130,15 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
- * On return, the plan is valid and we have sufficient locks to begin
- * execution.
+ * On return, the plan is valid unless it contains inheritance/partition
+ * child tables; that is, only the locks on the tables mentioned in the
+ * query have been taken. If any of those tables have child tables, the
+ * executor must also lock those before executing the plan, and if the
+ * plan gets invalidated as a result of taking those locks, it must ask
+ * the caller to get a new plan by calling here again. Locking of the
+ * child tables must be deferred to the executor like this because not
+ * all of them may need to be locked; some may get pruned during the
+ * executor's plan initialization phase (InitPlan()).
*
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
@@ -1362,8 +1371,8 @@ CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
}
/*
- * Reject if AcquireExecutorLocks would have anything to do. This is
- * probably unnecessary given the previous check, but let's be safe.
+ * Reject if the executor would need to take locks in addition to those
+ * taken by AcquirePlannerLocks() on the given query.
*/
foreach(lc, plan->stmt_list)
{
@@ -1739,58 +1748,6 @@ QueryListGetPrimaryStmt(List *stmts)
return NULL;
}
-/*
- * AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
- */
-static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
-{
- ListCell *lc1;
-
- foreach(lc1, stmt_list)
- {
- PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
-
- if (plannedstmt->commandType == CMD_UTILITY)
- {
- /*
- * Ignore utility statements, except those (such as EXPLAIN) that
- * contain a parsed-but-not-planned query. Note: it's okay to use
- * ScanQueryForLocks, even though the query hasn't been through
- * rule rewriting, because rewriting doesn't change the query
- * representation.
- */
- Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
-
- if (query)
- ScanQueryForLocks(query, acquire);
- continue;
- }
-
- foreach(lc2, plannedstmt->rtable)
- {
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
-
- if (!(rte->rtekind == RTE_RELATION ||
- (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
- continue;
-
- /*
- * Acquire the appropriate type of lock on each relation OID. Note
- * that we don't actually try to open the rel, and hence will not
- * fail if it's been dropped entirely --- we'll just transiently
- * acquire a non-conflicting lock.
- */
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
- }
- }
-}
-
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 06dfa85f04..0cad450dcd 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -201,6 +201,13 @@ CreatePortal(const char *name, bool allowDup, bool dupSilent)
portal->portalContext = AllocSetContextCreate(TopPortalContext,
"PortalContext",
ALLOCSET_SMALL_SIZES);
+ /*
+ * Initialize the portal's query context, which stores the QueryDescs
+ * created during PortalStart() and later used in PortalRun().
+ */
+ portal->queryContext = AllocSetContextCreate(TopPortalContext,
+ "PortalQueryContext",
+ ALLOCSET_SMALL_SIZES);
/* create a resource owner for the portal */
portal->resowner = ResourceOwnerCreate(CurTransactionResourceOwner,
@@ -224,6 +231,7 @@ CreatePortal(const char *name, bool allowDup, bool dupSilent)
/* for named portals reuse portal->name copy */
MemoryContextSetIdentifier(portal->portalContext, portal->name[0] ? portal->name : "<unnamed>");
+ MemoryContextSetIdentifier(portal->queryContext, portal->name[0] ? portal->name : "<unnamed>");
return portal;
}
@@ -594,6 +602,7 @@ PortalDrop(Portal portal, bool isTopCommit)
/* release subsidiary storage */
MemoryContextDelete(portal->portalContext);
+ MemoryContextDelete(portal->queryContext);
/* release portal struct (it's in TopPortalContext) */
pfree(portal);
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 3d3e632a0c..392abb5150 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -88,7 +88,11 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern QueryDesc *ExplainQueryDesc(PlannedStmt *stmt, struct CachedPlan *cplan,
+ const char *queryString, IntoClause *into, ExplainState *es,
+ ParamListInfo params, QueryEnvironment *queryEnv);
+extern void ExplainOnePlan(QueryDesc *queryDesc,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
@@ -104,6 +108,7 @@ extern void ExplainQueryParameters(ExplainState *es, ParamListInfo params, int m
extern void ExplainBeginOutput(ExplainState *es);
extern void ExplainEndOutput(ExplainState *es);
+extern void ExplainResetOutput(ExplainState *es);
extern void ExplainSeparatePlans(ExplainState *es);
extern void ExplainPropertyList(const char *qlabel, List *data,
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index af2bf36dfb..4b7368a0dc 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -32,9 +32,12 @@
*/
typedef struct QueryDesc
{
+ NodeTag type;
+
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ struct CachedPlan *cplan; /* CachedPlan, if plannedstmt is from one */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +60,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ struct CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index c677e490d7..edf2f13d04 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -19,6 +19,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/plancache.h"
/*
@@ -72,7 +73,7 @@
/* Hook for plugins to get control in ExecutorStart() */
-typedef void (*ExecutorStart_hook_type) (QueryDesc *queryDesc, int eflags);
+typedef bool (*ExecutorStart_hook_type) (QueryDesc *queryDesc, int eflags);
extern PGDLLIMPORT ExecutorStart_hook_type ExecutorStart_hook;
/* Hook for plugins to get control in ExecutorRun() */
@@ -197,8 +198,8 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
-extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
-extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern bool ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern bool standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count, bool execute_once);
extern void standard_ExecutorRun(QueryDesc *queryDesc,
@@ -256,6 +257,17 @@ extern void ExecEndNode(PlanState *node);
extern void ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * Is the cached plan, if any, still valid at this point? That is, not
+ * invalidated by the incoming invalidation messages that have been processed
+ * recently.
+ */
+static inline bool
+ExecPlanStillValid(EState *estate)
+{
+ return estate->es_cachedplan == NULL ? true :
+ CachedPlanStillValid(estate->es_cachedplan);
+}
/* ----------------------------------------------------------------
* ExecProcNode
@@ -590,6 +602,7 @@ exec_rt_fetch(Index rti, EState *estate)
}
extern Relation ExecGetRangeTableRelation(EState *estate, Index rti);
+extern void ExecLockAppendNonLeafRelations(EState *estate, List *allpartrelids);
extern void ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
Index rti);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 233fb6b4f9..20c1bacae1 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -623,6 +623,8 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ struct CachedPlan *es_cachedplan; /* CachedPlan if plannedstmt is from
+ * one */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/storage/lmgr.h b/src/include/storage/lmgr.h
index 4ee91e3cf9..598bf2688a 100644
--- a/src/include/storage/lmgr.h
+++ b/src/include/storage/lmgr.h
@@ -48,6 +48,7 @@ extern bool ConditionalLockRelation(Relation relation, LOCKMODE lockmode);
extern void UnlockRelation(Relation relation, LOCKMODE lockmode);
extern bool CheckRelationLockedByMe(Relation relation, LOCKMODE lockmode,
bool orstronger);
+extern bool CheckRelLockedByMe(Oid relid, LOCKMODE lockmode, bool orstronger);
extern bool LockHasWaitersRelation(Relation relation, LOCKMODE lockmode);
extern void LockRelationIdForSession(LockRelId *relid, LOCKMODE lockmode);
diff --git a/src/include/tcop/pquery.h b/src/include/tcop/pquery.h
index a5e65b98aa..577b81a9ee 100644
--- a/src/include/tcop/pquery.h
+++ b/src/include/tcop/pquery.h
@@ -29,7 +29,7 @@ extern List *FetchPortalTargetList(Portal portal);
extern List *FetchStatementTargetList(Node *stmt);
-extern void PortalStart(Portal portal, ParamListInfo params,
+extern bool PortalStart(Portal portal, ParamListInfo params,
int eflags, Snapshot snapshot);
extern void PortalSetResultFormat(Portal portal, int nFormats,
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index f5fdbfe116..a024e5dcd0 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -140,6 +140,7 @@ extern char get_rel_relkind(Oid relid);
extern bool get_rel_relispartition(Oid relid);
extern Oid get_rel_tablespace(Oid relid);
extern char get_rel_persistence(Oid relid);
+extern bool get_rel_relisshared(Oid relid);
extern Oid get_transform_fromsql(Oid typid, Oid langid, List *trftypes);
extern Oid get_transform_tosql(Oid typid, Oid langid, List *trftypes);
extern bool get_typisdefined(Oid typid);
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 916e59d9fe..c83a67fea3 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -221,6 +221,20 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
QueryEnvironment *queryEnv);
+
+/*
+ * CachedPlanStillValid
+ * Returns whether a cached generic plan is still valid
+ *
+ * Called by the executor after each relation lock it takes while
+ * initializing the plan tree in the CachedPlan.
+ */
+static inline bool
+CachedPlanStillValid(CachedPlan *cplan)
+{
+ return cplan->is_valid;
+}
+
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aa08b1e0fc..e7e2fb0c3f 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,8 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *qdescs; /* list of QueryDescs */
+ MemoryContext queryContext; /* memory for QueryDescs and children */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -242,6 +244,7 @@ extern void PortalDefineQuery(Portal portal,
CommandTag commandTag,
List *stmts,
CachedPlan *cplan);
+extern void PortalQueryFinish(QueryDesc *queryDesc);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
diff --git a/src/test/modules/delay_execution/Makefile b/src/test/modules/delay_execution/Makefile
index 70f24e846d..2fca84d027 100644
--- a/src/test/modules/delay_execution/Makefile
+++ b/src/test/modules/delay_execution/Makefile
@@ -8,7 +8,8 @@ OBJS = \
delay_execution.o
ISOLATION = partition-addition \
- partition-removal-1
+ partition-removal-1 \
+ cached-plan-replan
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/src/test/modules/delay_execution/delay_execution.c b/src/test/modules/delay_execution/delay_execution.c
index 7cd76eb34b..ce189156ad 100644
--- a/src/test/modules/delay_execution/delay_execution.c
+++ b/src/test/modules/delay_execution/delay_execution.c
@@ -1,14 +1,18 @@
/*-------------------------------------------------------------------------
*
* delay_execution.c
- * Test module to allow delay between parsing and execution of a query.
+ * Test module to introduce delays at various points during the execution
+ * of a query, to test that execution proceeds safely in the face of
+ * concurrent changes.
*
* The delay is implemented by taking and immediately releasing a specified
* advisory lock. If another process has previously taken that lock, the
* current process will be blocked until the lock is released; otherwise,
* there's no effect. This allows an isolationtester script to reliably
- * test behaviors where some specified action happens in another backend
- * between parsing and execution of any desired query.
+ * test behaviors where some specified action happens in another backend
+ * at either of two points: 1) between parsing and execution of any
+ * desired query, when using the planner_hook, or 2) between
+ * RevalidateCachedQuery() and ExecutorStart(), when using the
+ * ExecutorStart_hook.
*
* Copyright (c) 2020-2023, PostgreSQL Global Development Group
*
@@ -22,6 +26,7 @@
#include <limits.h>
+#include "executor/executor.h"
#include "optimizer/planner.h"
#include "utils/builtins.h"
#include "utils/guc.h"
@@ -32,9 +37,11 @@ PG_MODULE_MAGIC;
/* GUC: advisory lock ID to use. Zero disables the feature. */
static int post_planning_lock_id = 0;
+static int executor_start_lock_id = 0;
-/* Save previous planner hook user to be a good citizen */
+/* Save previous hook users to be a good citizen */
static planner_hook_type prev_planner_hook = NULL;
+static ExecutorStart_hook_type prev_ExecutorStart_hook = NULL;
/* planner_hook function to provide the desired delay */
@@ -70,11 +77,45 @@ delay_execution_planner(Query *parse, const char *query_string,
return result;
}
+/* ExecutorStart_hook function to provide the desired delay */
+static bool
+delay_execution_ExecutorStart(QueryDesc *queryDesc, int eflags)
+{
+ bool plan_valid;
+
+ /* If enabled, delay by taking and releasing the specified lock */
+ if (executor_start_lock_id != 0)
+ {
+ DirectFunctionCall1(pg_advisory_lock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+ DirectFunctionCall1(pg_advisory_unlock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+
+ /*
+ * Ensure that we notice any pending invalidations, since the advisory
+ * lock functions don't do this.
+ */
+ AcceptInvalidationMessages();
+ }
+
+ /* Now start the executor, possibly via a previous hook user */
+ if (prev_ExecutorStart_hook)
+ plan_valid = prev_ExecutorStart_hook(queryDesc, eflags);
+ else
+ plan_valid = standard_ExecutorStart(queryDesc, eflags);
+
+ if (executor_start_lock_id != 0)
+ elog(NOTICE, "Finished ExecutorStart(): CachedPlan is %s",
+ plan_valid ? "valid" : "not valid");
+
+ return plan_valid;
+}
+
/* Module load function */
void
_PG_init(void)
{
- /* Set up the GUC to control which lock is used */
+ /* Set up GUCs to control which lock is used */
DefineCustomIntVariable("delay_execution.post_planning_lock_id",
"Sets the advisory lock ID to be locked/unlocked after planning.",
"Zero disables the delay.",
@@ -86,10 +127,22 @@ _PG_init(void)
NULL,
NULL,
NULL);
-
+ DefineCustomIntVariable("delay_execution.executor_start_lock_id",
+ "Sets the advisory lock ID to be locked/unlocked before starting execution.",
+ "Zero disables the delay.",
+ &executor_start_lock_id,
+ 0,
+ 0, INT_MAX,
+ PGC_USERSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
MarkGUCPrefixReserved("delay_execution");
- /* Install our hook */
+ /* Install our hooks. */
prev_planner_hook = planner_hook;
planner_hook = delay_execution_planner;
+ prev_ExecutorStart_hook = ExecutorStart_hook;
+ ExecutorStart_hook = delay_execution_ExecutorStart;
}
diff --git a/src/test/modules/delay_execution/expected/cached-plan-replan.out b/src/test/modules/delay_execution/expected/cached-plan-replan.out
new file mode 100644
index 0000000000..0ac6a17c2b
--- /dev/null
+++ b/src/test/modules/delay_execution/expected/cached-plan-replan.out
@@ -0,0 +1,156 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1prep s2lock s1exec s2dropi s2unlock
+step s1prep: SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1;
+ EXPLAIN (COSTS OFF) EXECUTE q (1);
+QUERY PLAN
+--------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo11 foo_1
+ Recheck Cond: (a = $1)
+ -> Bitmap Index Scan on foo11_a_idx
+ Index Cond: (a = $1)
+(6 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+-----------------------------
+Append
+ Subplans Removed: 1
+ -> Seq Scan on foo11 foo_1
+ Filter: (a = $1)
+(4 rows)
+
+
+starting permutation: s1prep2 s2lock s1exec2 s2dropi s2unlock
+step s1prep2: SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = 1;
+ EXPLAIN (COSTS OFF) EXECUTE q2;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------
+Bitmap Heap Scan on foo11 foo
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on foo11_a_idx
+ Index Cond: (a = 1)
+(4 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec2: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec2: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------
+Seq Scan on foo11 foo
+ Filter: (a = 1)
+(2 rows)
+
+
+starting permutation: s1prep3 s2lock s1exec3 s2dropi s2unlock
+step s1prep3: SET plan_cache_mode = force_generic_plan;
+ SET enable_partitionwise_aggregate = on;
+ SET enable_partitionwise_join = on;
+ PREPARE q3 AS SELECT t1.a, count(t2.b) FROM foo t1, foo t2 WHERE t1.a = t2.a GROUP BY 1;
+ EXPLAIN (COSTS OFF) EXECUTE q3;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+----------------------------------------------------------------
+Append
+ -> GroupAggregate
+ Group Key: t1.a
+ -> Merge Join
+ Merge Cond: (t1.a = t2.a)
+ -> Index Only Scan using foo11_a_idx on foo11 t1
+ -> Materialize
+ -> Index Scan using foo11_a_idx on foo11 t2
+ -> GroupAggregate
+ Group Key: t1_1.a
+ -> Merge Join
+ Merge Cond: (t1_1.a = t2_1.a)
+ -> Sort
+ Sort Key: t1_1.a
+ -> Seq Scan on foo2 t1_1
+ -> Sort
+ Sort Key: t2_1.a
+ -> Seq Scan on foo2 t2_1
+(18 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec3: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec3: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------------------------------
+Append
+ -> GroupAggregate
+ Group Key: t1.a
+ -> Merge Join
+ Merge Cond: (t1.a = t2.a)
+ -> Sort
+ Sort Key: t1.a
+ -> Seq Scan on foo11 t1
+ -> Sort
+ Sort Key: t2.a
+ -> Seq Scan on foo11 t2
+ -> GroupAggregate
+ Group Key: t1_1.a
+ -> Merge Join
+ Merge Cond: (t1_1.a = t2_1.a)
+ -> Sort
+ Sort Key: t1_1.a
+ -> Seq Scan on foo2 t1_1
+ -> Sort
+ Sort Key: t2_1.a
+ -> Seq Scan on foo2 t2_1
+(21 rows)
+
diff --git a/src/test/modules/delay_execution/specs/cached-plan-replan.spec b/src/test/modules/delay_execution/specs/cached-plan-replan.spec
new file mode 100644
index 0000000000..3c92cbd5c6
--- /dev/null
+++ b/src/test/modules/delay_execution/specs/cached-plan-replan.spec
@@ -0,0 +1,61 @@
+# Test to check that invalidation of cached generic plans during ExecutorStart
+# correctly triggers replanning and re-execution.
+
+setup
+{
+ CREATE TABLE foo (a int, b text) PARTITION BY LIST(a);
+ CREATE TABLE foo1 PARTITION OF foo FOR VALUES IN (1) PARTITION BY LIST (a);
+ CREATE TABLE foo11 PARTITION OF foo1 FOR VALUES IN (1);
+ CREATE INDEX foo11_a ON foo1 (a);
+ CREATE TABLE foo2 PARTITION OF foo FOR VALUES IN (2);
+ CREATE VIEW foov AS SELECT * FROM foo;
+}
+
+teardown
+{
+ DROP VIEW foov;
+ DROP TABLE foo;
+}
+
+session "s1"
+# Append with run-time pruning
+step "s1prep" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+
+# no Append case (only one partition selected by the planner)
+step "s1prep2" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = 1;
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+
+# Append with partition-wise join aggregate and join plans as child subplans
+step "s1prep3" { SET plan_cache_mode = force_generic_plan;
+ SET enable_partitionwise_aggregate = on;
+ SET enable_partitionwise_join = on;
+ PREPARE q3 AS SELECT t1.a, count(t2.b) FROM foo t1, foo t2 WHERE t1.a = t2.a GROUP BY 1;
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+
+# Executes a generic plan
+step "s1exec" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+step "s1exec2" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+step "s1exec3" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+
+session "s2"
+step "s2lock" { SELECT pg_advisory_lock(12345); }
+step "s2unlock" { SELECT pg_advisory_unlock(12345); }
+step "s2dropi" { DROP INDEX foo11_a; }
+
+# While "s1exec", etc. wait to acquire the advisory lock, "s2drop" is able to
+# drop the index being used in the cached plan. When "s1exec" is then
+# unblocked and initializes the cached plan for execution, it detects the
+# concurrent index drop and causes the cached plan to be discarded and
+# recreated without the index.
+permutation "s1prep" "s2lock" "s1exec" "s2dropi" "s2unlock"
+permutation "s1prep2" "s2lock" "s1exec2" "s2dropi" "s2unlock"
+permutation "s1prep3" "s2lock" "s1exec3" "s2dropi" "s2unlock"
--
2.35.3
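For anyone reading the test above without the module source at hand: the
spec works because delay_execution installs an ExecutorStart hook that
pauses on an advisory lock before plan initialization. The following is a
minimal sketch of that hook pattern, not the patch's actual code; the GUC
variable executor_start_lock_id matches the setting used in the spec, while
everything else is simplified for illustration.

static bool
delay_execution_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
	/* Pause here if the test has set an advisory lock ID. */
	if (executor_start_lock_id != 0)
	{
		DirectFunctionCall1(pg_advisory_lock_int8,
							Int64GetDatum((int64) executor_start_lock_id));
		DirectFunctionCall1(pg_advisory_unlock_int8,
							Int64GetDatum((int64) executor_start_lock_id));
	}

	/* Chain to the next hook, or the standard function if none. */
	return prev_ExecutorStart ? prev_ExecutorStart(queryDesc, eflags) :
		standard_ExecutorStart(queryDesc, eflags);
}

While the session is blocked on the advisory lock, the concurrent DROP
INDEX can invalidate the CachedPlan, which is exactly the window the
permutations above exercise.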
On Wed, Aug 2, 2023 at 10:39 PM Amit Langote <amitlangote09@gmail.com> wrote:
Having extracted the ExecEndNode() change, I'm also starting to feel
inclined to extract a couple of other bits from the main patch as
separate patches, such as moving the ExecutorStart() call from
PortalRun() to PortalStart() for the multi-query portals. I'll do
that in the next version.
Here's a patch set where the refactoring to move the ExecutorStart()
calls to be closer to GetCachedPlan() (for the call sites that use a
CachedPlan) is extracted into a separate patch, 0002. Its commit
message notes an aspect of this refactoring that I feel a bit nervous
about -- needing to also move the CommandCounterIncrement() call from
the loop in PortalRunMulti() to PortalStart(), which now does
ExecutorStart() for the PORTAL_MULTI_QUERY case.
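To make the end goal concrete, here is a minimal sketch of the replan loop
that this refactoring prepares for at the GetCachedPlan() call sites. The
boolean-returning ExecutorStart() is per the later patch in this series;
plansource, query_string, dest and friends are assumed to be in scope, and
error and resource handling is omitted.

for (;;)
{
	CachedPlan *cplan = GetCachedPlan(plansource, params,
									  CurrentResourceOwner, queryEnv);
	PlannedStmt *pstmt = linitial_node(PlannedStmt, cplan->stmt_list);
	QueryDesc  *queryDesc = CreateQueryDesc(pstmt, query_string,
											GetActiveSnapshot(),
											InvalidSnapshot,
											dest, params, queryEnv, 0);

	if (ExecutorStart(queryDesc, 0))
	{
		/* Plan tree initialized successfully; run it to completion. */
		ExecutorRun(queryDesc, ForwardScanDirection, 0, true);
		ExecutorFinish(queryDesc);
		ExecutorEnd(queryDesc);
		FreeQueryDesc(queryDesc);
		ReleaseCachedPlan(cplan, CurrentResourceOwner);
		break;
	}

	/* CachedPlan went stale during ExecutorStart(); discard and retry. */
	FreeQueryDesc(queryDesc);
	ReleaseCachedPlan(cplan, CurrentResourceOwner);
}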
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v44-0004-Set-inFromCl-to-false-in-child-table-RTEs.patchapplication/octet-stream; name=v44-0004-Set-inFromCl-to-false-in-child-table-RTEs.patchDownload
From 26df10ea36b2089d59129b066d3dfaedb3aa5e0c Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Tue, 4 Jul 2023 22:36:43 +0900
Subject: [PATCH v44 4/6] Set inFromCl to false in child table RTEs
This is to allow the executor to distinguish tables that are
directly mentioned in the query from those that get added to the
query during planning. A subsequent commit will teach the executor
to lock only the tables of the latter kind when executing a cached
plan.
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
src/backend/optimizer/util/inherit.c | 6 ++++++
src/backend/parser/analyze.c | 7 +++----
src/include/nodes/parsenodes.h | 9 +++++++--
3 files changed, 16 insertions(+), 6 deletions(-)
diff --git a/src/backend/optimizer/util/inherit.c b/src/backend/optimizer/util/inherit.c
index 94de855a22..9bac07bf40 100644
--- a/src/backend/optimizer/util/inherit.c
+++ b/src/backend/optimizer/util/inherit.c
@@ -492,6 +492,12 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
}
else
childrte->inh = false;
+ /*
+ * Mark the child table as not being directly mentioned in the query.
+ * This allows the executor's ExecGetRangeTableRelation() to conveniently
+ * identify it as an inheritance child table.
+ */
+ childrte->inFromCl = false;
childrte->securityQuals = NIL;
/*
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 4006632092..bcf6fcdde2 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -3267,10 +3267,9 @@ transformLockingClause(ParseState *pstate, Query *qry, LockingClause *lc,
/*
* Lock all regular tables used in query and its subqueries. We
* examine inFromCl to exclude auto-added RTEs, particularly NEW/OLD
- * in rules. This is a bit of an abuse of a mostly-obsolete flag, but
- * it's convenient. We can't rely on the namespace mechanism that has
- * largely replaced inFromCl, since for example we need to lock
- * base-relation RTEs even if they are masked by upper joins.
+ * in rules. We can't rely on the namespace mechanism since for
+ * example we need to lock base-relation RTEs even if they are masked
+ * by upper joins.
*/
i = 0;
foreach(rt, qry->rtable)
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index fe003ded50..72f2b0c04f 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -994,11 +994,16 @@ typedef struct PartitionCmd
*
* inFromCl marks those range variables that are listed in the FROM clause.
* It's false for RTEs that are added to a query behind the scenes, such
- * as the NEW and OLD variables for a rule, or the subqueries of a UNION.
+ * as the NEW and OLD variables for a rule, or the subqueries of a UNION,
+ * or the RTEs of inheritance child tables that are added by the planner.
* This flag is not used during parsing (except in transformLockingClause,
* q.v.); the parser now uses a separate "namespace" data structure to
* control visibility. But it is needed by ruleutils.c to determine
- * whether RTEs should be shown in decompiled queries.
+ * whether RTEs should be shown in decompiled queries. The executor also
+ * uses it to determine whether a given RTE_RELATION entry belongs to a
+ * table directly mentioned in the query or to a child table added by the
+ * planner, which it needs to know when the child tables in a plan must be
+ * locked.
*
* securityQuals is a list of security barrier quals (boolean expressions),
* to be tested in the listed order before returning a row from the
--
2.35.3
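To make the intended executor-side use of the flag concrete, here is a
rough sketch of the check this enables; the real code is the
ExecGetRangeTableRelation() change in patch 0005, so treat the details
below as illustrative only.

	RangeTblEntry *rte = exec_rt_fetch(rti, estate);

	Assert(rte->rtekind == RTE_RELATION);

	if (!rte->inFromCl)
	{
		/*
		 * Not directly mentioned in the query, so this must be an
		 * inheritance child added by the planner.  GetCachedPlan() did not
		 * lock it, so take the lock now, during executor initialization.
		 */
		LockRelationOid(rte->relid, rte->rellockmode);
	}

	/* The relation is locked either way by now, so no lock here. */
	rel = table_open(rte->relid, NoLock);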
v44-0006-Track-opened-range-table-relations-in-a-List-in-.patchapplication/octet-stream; name=v44-0006-Track-opened-range-table-relations-in-a-List-in-.patchDownload
From 4f73533573cb5959b15268455605816c0316d0e6 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Tue, 4 Jul 2023 22:36:49 +0900
Subject: [PATCH v44 6/6] Track opened range table relations in a List in
EState
This makes ExecCloseRangeTableRelations faster when there are many
relations in the range table but only a few are opened during
execution, such as when run-time pruning kicks in on an Append
containing thousands of partition subplans.
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
src/backend/executor/execMain.c | 9 +++++----
src/backend/executor/execUtils.c | 2 ++
src/include/nodes/execnodes.h | 2 ++
3 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index dcfbf58495..c574cd3cdc 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1646,12 +1646,13 @@ ExecCloseResultRelations(EState *estate)
void
ExecCloseRangeTableRelations(EState *estate)
{
- int i;
+ ListCell *lc;
- for (i = 0; i < estate->es_range_table_size; i++)
+ foreach(lc, estate->es_opened_relations)
{
- if (estate->es_relations[i])
- table_close(estate->es_relations[i], NoLock);
+ Relation rel = lfirst(lc);
+
+ table_close(rel, NoLock);
}
}
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index ee12235b2f..5ae993e29c 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -839,6 +839,8 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
}
estate->es_relations[rti - 1] = rel;
+ estate->es_opened_relations = lappend(estate->es_opened_relations,
+ rel);
}
return rel;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 20c1bacae1..c519a6d5dc 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -619,6 +619,8 @@ typedef struct EState
Index es_range_table_size; /* size of the range table arrays */
Relation *es_relations; /* Array of per-range-table-entry Relation
* pointers, or NULL if not yet opened */
+ List *es_opened_relations; /* List of non-NULL entries in
+ * es_relations in no specific order */
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
--
2.35.3
v44-0002-Refactoring-to-move-ExecutorStart-calls-to-be-ne.patchapplication/octet-stream; name=v44-0002-Refactoring-to-move-ExecutorStart-calls-to-be-ne.patchDownload
From 0d1067505ed1c49d4a75ad5d7f4eec4a19d7b5d6 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Thu, 3 Aug 2023 12:34:31 +0900
Subject: [PATCH v44 2/6] Refactoring to move ExecutorStart() calls to be near
GetCachedPlan()
An upcoming patch will make ExecutorStart() detect the invalidation
of a CachedPlan when initializing the plan tree contained in it. A
caller must retry with a new CachedPlan when ExecutorStart() detects
an invalidation. Having the ExecutorStart() call in the same place
as, or near, the GetCachedPlan() call makes it more convenient to
implement the replan loop.
The following sites have thus been modified:
* The ExecutorStart() call in ExplainOnePlan() is moved, along with
CreateQueryDesc(), into a new function ExplainQueryDesc(), which
callers must now invoke before calling ExplainOnePlan().
* The ExecutorStart() call in _SPI_pquery() is moved to its caller
_SPI_execute_plan().
* The ExecutorStart() call in PortalRunMulti() is moved to
PortalStart(). This requires a new List field in PortalData to
store the QueryDescs created in PortalStart() and the associated
memory context field. One unintended consequence is that the
CommandCounterIncrement() between queries in PORTAL_MULTI_QUERY
cases is now done in the loop in PortalStart() and not in
PortalRunMulti(). That still seems to work because the Snapshot
registered in QueryDesc/EState is updated to account for the
CCI().
---
src/backend/commands/explain.c | 121 ++++++-----
src/backend/commands/prepare.c | 12 +-
src/backend/executor/spi.c | 27 +--
src/backend/tcop/pquery.c | 311 +++++++++++++----------------
src/backend/utils/mmgr/portalmem.c | 9 +
src/include/commands/explain.h | 6 +-
src/include/utils/portal.h | 2 +
7 files changed, 250 insertions(+), 238 deletions(-)
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 8570b14f62..59d57f9c10 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -393,6 +393,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
else
{
PlannedStmt *plan;
+ QueryDesc *queryDesc;
instr_time planstart,
planduration;
BufferUsage bufusage_start,
@@ -415,12 +416,77 @@ ExplainOneQuery(Query *query, int cursorOptions,
BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
}
+ queryDesc = ExplainQueryDesc(plan, queryString, into, es,
+ params, queryEnv);
+ Assert(queryDesc);
+
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(queryDesc, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
+/*
+ * ExplainQueryDesc
+ * Set up QueryDesc for EXPLAINing a given plan
+ */
+QueryDesc *
+ExplainQueryDesc(PlannedStmt *stmt,
+ const char *queryString, IntoClause *into, ExplainState *es,
+ ParamListInfo params, QueryEnvironment *queryEnv)
+{
+ QueryDesc *queryDesc;
+ DestReceiver *dest;
+ int eflags;
+ int instrument_option = 0;
+
+ /*
+ * Normally we discard the query's output, but if explaining CREATE TABLE
+ * AS, we'd better use the appropriate tuple receiver.
+ */
+ if (into)
+ dest = CreateIntoRelDestReceiver(into);
+ else
+ dest = None_Receiver;
+
+ if (es->analyze && es->timing)
+ instrument_option |= INSTRUMENT_TIMER;
+ else if (es->analyze)
+ instrument_option |= INSTRUMENT_ROWS;
+
+ if (es->buffers)
+ instrument_option |= INSTRUMENT_BUFFERS;
+ if (es->wal)
+ instrument_option |= INSTRUMENT_WAL;
+
+ /*
+ * Use a snapshot with an updated command ID to ensure this query sees
+ * results of any previously executed queries.
+ */
+ PushCopiedSnapshot(GetActiveSnapshot());
+ UpdateActiveSnapshotCommandId();
+
+ /* Create a QueryDesc for the query */
+ queryDesc = CreateQueryDesc(stmt, queryString,
+ GetActiveSnapshot(), InvalidSnapshot,
+ dest, params, queryEnv, instrument_option);
+
+ /* Select execution options */
+ if (es->analyze)
+ eflags = 0; /* default run-to-completion flags */
+ else
+ eflags = EXEC_FLAG_EXPLAIN_ONLY;
+ if (es->generic)
+ eflags |= EXEC_FLAG_EXPLAIN_GENERIC;
+ if (into)
+ eflags |= GetIntoRelEFlags(into);
+
+ /* Call ExecutorStart to prepare the plan for execution. */
+ ExecutorStart(queryDesc, eflags);
+
+ return queryDesc;
+}
+
/*
* ExplainOneUtility -
* print out the execution plan for one utility statement
@@ -524,29 +590,16 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(QueryDesc *queryDesc,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
{
- DestReceiver *dest;
- QueryDesc *queryDesc;
instr_time starttime;
double totaltime = 0;
- int eflags;
- int instrument_option = 0;
- Assert(plannedstmt->commandType != CMD_UTILITY);
-
- if (es->analyze && es->timing)
- instrument_option |= INSTRUMENT_TIMER;
- else if (es->analyze)
- instrument_option |= INSTRUMENT_ROWS;
-
- if (es->buffers)
- instrument_option |= INSTRUMENT_BUFFERS;
- if (es->wal)
- instrument_option |= INSTRUMENT_WAL;
+ Assert(queryDesc->plannedstmt->commandType != CMD_UTILITY);
/*
* We always collect timing for the entire statement, even when node-level
@@ -555,40 +608,6 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
*/
INSTR_TIME_SET_CURRENT(starttime);
- /*
- * Use a snapshot with an updated command ID to ensure this query sees
- * results of any previously executed queries.
- */
- PushCopiedSnapshot(GetActiveSnapshot());
- UpdateActiveSnapshotCommandId();
-
- /*
- * Normally we discard the query's output, but if explaining CREATE TABLE
- * AS, we'd better use the appropriate tuple receiver.
- */
- if (into)
- dest = CreateIntoRelDestReceiver(into);
- else
- dest = None_Receiver;
-
- /* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, instrument_option);
-
- /* Select execution options */
- if (es->analyze)
- eflags = 0; /* default run-to-completion flags */
- else
- eflags = EXEC_FLAG_EXPLAIN_ONLY;
- if (es->generic)
- eflags |= EXEC_FLAG_EXPLAIN_GENERIC;
- if (into)
- eflags |= GetIntoRelEFlags(into);
-
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, eflags);
-
/* Execute the plan for statistics if asked for */
if (es->analyze)
{
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 18f70319fc..1e9a98ad6e 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -639,8 +639,16 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ {
+ QueryDesc *queryDesc;
+
+ queryDesc = ExplainQueryDesc(pstmt, queryString,
+ into, es, paramLI, queryEnv);
+ Assert(queryDesc != NULL);
+ ExplainOnePlan(queryDesc, into, es, query_string, paramLI,
+ queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
+ }
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 33975687b3..d36ca35d3a 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -71,7 +71,7 @@ static int _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
static ParamListInfo _SPI_convert_params(int nargs, Oid *argtypes,
Datum *Values, const char *Nulls);
-static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount);
+static int _SPI_pquery(QueryDesc *queryDesc, uint64 tcount);
static void _SPI_error_callback(void *arg);
@@ -2661,6 +2661,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
QueryDesc *qdesc;
Snapshot snap;
+ int eflags;
if (ActiveSnapshotSet())
snap = GetActiveSnapshot();
@@ -2674,8 +2675,17 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
options->params,
_SPI_current->queryEnv,
0);
- res = _SPI_pquery(qdesc, fire_triggers,
- canSetTag ? options->tcount : 0);
+
+ /* Select execution options */
+ if (fire_triggers)
+ eflags = 0; /* default run-to-completion flags */
+ else
+ eflags = EXEC_FLAG_SKIP_TRIGGERS;
+
+ ExecutorStart(qdesc, eflags);
+
+ res = _SPI_pquery(qdesc, canSetTag ? options->tcount : 0);
+
FreeQueryDesc(qdesc);
}
else
@@ -2850,10 +2860,9 @@ _SPI_convert_params(int nargs, Oid *argtypes,
}
static int
-_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
+_SPI_pquery(QueryDesc *queryDesc, uint64 tcount)
{
int operation = queryDesc->operation;
- int eflags;
int res;
switch (operation)
@@ -2897,14 +2906,6 @@ _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
ResetUsage();
#endif
- /* Select execution options */
- if (fire_triggers)
- eflags = 0; /* default run-to-completion flags */
- else
- eflags = EXEC_FLAG_SKIP_TRIGGERS;
-
- ExecutorStart(queryDesc, eflags);
-
ExecutorRun(queryDesc, ForwardScanDirection, tcount, true);
_SPI_current->processed = queryDesc->estate->es_processed;
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5565f200c3..701808f303 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -19,6 +19,7 @@
#include "access/xact.h"
#include "commands/prepare.h"
+#include "executor/execdesc.h"
#include "executor/tstoreReceiver.h"
#include "miscadmin.h"
#include "pg_trace.h"
@@ -35,12 +36,6 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
- const char *sourceText,
- ParamListInfo params,
- QueryEnvironment *queryEnv,
- DestReceiver *dest,
- QueryCompletion *qc);
static void FillPortalStore(Portal portal, bool isTopLevel);
static uint64 RunFromStore(Portal portal, ScanDirection direction, uint64 count,
DestReceiver *dest);
@@ -116,86 +111,6 @@ FreeQueryDesc(QueryDesc *qdesc)
}
-/*
- * ProcessQuery
- * Execute a single plannable query within a PORTAL_MULTI_QUERY,
- * PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
- *
- * plan: the plan tree for the query
- * sourceText: the source text of the query
- * params: any parameters needed
- * dest: where to send results
- * qc: where to store the command completion status data.
- *
- * qc may be NULL if caller doesn't want a status string.
- *
- * Must be called in a memory context that will be reset or deleted on
- * error; otherwise the executor's memory usage will be leaked.
- */
-static void
-ProcessQuery(PlannedStmt *plan,
- const char *sourceText,
- ParamListInfo params,
- QueryEnvironment *queryEnv,
- DestReceiver *dest,
- QueryCompletion *qc)
-{
- QueryDesc *queryDesc;
-
- /*
- * Create the QueryDesc object
- */
- queryDesc = CreateQueryDesc(plan, sourceText,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, 0);
-
- /*
- * Call ExecutorStart to prepare the plan for execution
- */
- ExecutorStart(queryDesc, 0);
-
- /*
- * Run the plan to completion.
- */
- ExecutorRun(queryDesc, ForwardScanDirection, 0, true);
-
- /*
- * Build command completion status data, if caller wants one.
- */
- if (qc)
- {
- switch (queryDesc->operation)
- {
- case CMD_SELECT:
- SetQueryCompletion(qc, CMDTAG_SELECT, queryDesc->estate->es_processed);
- break;
- case CMD_INSERT:
- SetQueryCompletion(qc, CMDTAG_INSERT, queryDesc->estate->es_processed);
- break;
- case CMD_UPDATE:
- SetQueryCompletion(qc, CMDTAG_UPDATE, queryDesc->estate->es_processed);
- break;
- case CMD_DELETE:
- SetQueryCompletion(qc, CMDTAG_DELETE, queryDesc->estate->es_processed);
- break;
- case CMD_MERGE:
- SetQueryCompletion(qc, CMDTAG_MERGE, queryDesc->estate->es_processed);
- break;
- default:
- SetQueryCompletion(qc, CMDTAG_UNKNOWN, queryDesc->estate->es_processed);
- break;
- }
- }
-
- /*
- * Now, we close down all the scans and free allocated resources.
- */
- ExecutorFinish(queryDesc);
- ExecutorEnd(queryDesc);
-
- FreeQueryDesc(queryDesc);
-}
-
/*
* ChoosePortalStrategy
* Select portal execution strategy given the intended statement list.
@@ -435,10 +350,9 @@ PortalStart(Portal portal, ParamListInfo params,
{
Portal saveActivePortal;
ResourceOwner saveResourceOwner;
- MemoryContext savePortalContext;
MemoryContext oldContext;
QueryDesc *queryDesc;
- int myeflags;
+ int myeflags = 0;
Assert(PortalIsValid(portal));
Assert(portal->status == PORTAL_DEFINED);
@@ -448,15 +362,13 @@ PortalStart(Portal portal, ParamListInfo params,
*/
saveActivePortal = ActivePortal;
saveResourceOwner = CurrentResourceOwner;
- savePortalContext = PortalContext;
PG_TRY();
{
ActivePortal = portal;
if (portal->resowner)
CurrentResourceOwner = portal->resowner;
- PortalContext = portal->portalContext;
- oldContext = MemoryContextSwitchTo(PortalContext);
+ oldContext = MemoryContextSwitchTo(portal->queryContext);
/* Must remember portal param list, if any */
portal->portalParams = params;
@@ -472,6 +384,8 @@ PortalStart(Portal portal, ParamListInfo params,
switch (portal->strategy)
{
case PORTAL_ONE_SELECT:
+ case PORTAL_ONE_RETURNING:
+ case PORTAL_ONE_MOD_WITH:
/* Must set snapshot before starting executor. */
if (snapshot)
@@ -489,8 +403,8 @@ PortalStart(Portal portal, ParamListInfo params,
*/
/*
- * Create QueryDesc in portal's context; for the moment, set
- * the destination to DestNone.
+ * Create QueryDesc in portal->queryContext; for the moment,
+ * set the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
portal->sourceText,
@@ -501,30 +415,41 @@ PortalStart(Portal portal, ParamListInfo params,
portal->queryEnv,
0);
+ /* Remember for PortalRunMulti(). */
+ if (portal->strategy == PORTAL_ONE_RETURNING ||
+ portal->strategy == PORTAL_ONE_MOD_WITH)
+ portal->qdescs = list_make1(queryDesc);
+
/*
* If it's a scrollable cursor, executor needs to support
* REWIND and backwards scan, as well as whatever the caller
* might've asked for.
*/
- if (portal->cursorOptions & CURSOR_OPT_SCROLL)
+ if (portal->strategy == PORTAL_ONE_SELECT &&
+ (portal->cursorOptions & CURSOR_OPT_SCROLL))
myeflags = eflags | EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD;
else
myeflags = eflags;
- /*
- * Call ExecutorStart to prepare the plan for execution
- */
+ /* Call ExecutorStart to prepare the plan for execution. */
ExecutorStart(queryDesc, myeflags);
/*
- * This tells PortalCleanup to shut down the executor
+ * This tells PortalCleanup to shut down the executor, though
+ * not needed for queries handled by PortalRunMulti().
*/
- portal->queryDesc = queryDesc;
+ if (portal->strategy == PORTAL_ONE_SELECT)
+ portal->queryDesc = queryDesc;
/*
- * Remember tuple descriptor (computed by ExecutorStart)
+ * Remember tuple descriptor (computed by ExecutorStart),
+ * though make it independent of QueryDesc for queries handled
+ * by PortalRunMulti().
*/
- portal->tupDesc = queryDesc->tupDesc;
+ if (portal->strategy != PORTAL_ONE_SELECT)
+ portal->tupDesc = CreateTupleDescCopy(queryDesc->tupDesc);
+ else
+ portal->tupDesc = queryDesc->tupDesc;
/*
* Reset cursor position data to "start of query"
@@ -536,29 +461,6 @@ PortalStart(Portal portal, ParamListInfo params,
PopActiveSnapshot();
break;
- case PORTAL_ONE_RETURNING:
- case PORTAL_ONE_MOD_WITH:
-
- /*
- * We don't start the executor until we are told to run the
- * portal. We do need to set up the result tupdesc.
- */
- {
- PlannedStmt *pstmt;
-
- pstmt = PortalGetPrimaryStmt(portal);
- portal->tupDesc =
- ExecCleanTypeFromTL(pstmt->planTree->targetlist);
- }
-
- /*
- * Reset cursor position data to "start of query"
- */
- portal->atStart = true;
- portal->atEnd = false; /* allow fetches */
- portal->portalPos = 0;
- break;
-
case PORTAL_UTIL_SELECT:
/*
@@ -581,7 +483,69 @@ PortalStart(Portal portal, ParamListInfo params,
break;
case PORTAL_MULTI_QUERY:
- /* Need do nothing now */
+ {
+ ListCell *lc;
+ bool first = true;
+
+ myeflags = eflags;
+ foreach(lc, portal->stmts)
+ {
+ PlannedStmt *plan = lfirst_node(PlannedStmt, lc);
+ bool is_utility = (plan->utilityStmt != NULL);
+
+ /*
+ * Push the snapshot to be used by the executor.
+ */
+ if (!is_utility)
+ {
+ /*
+ * Must copy the snapshot for all statements
+ * except the first, as we'll need to update its
+ * command ID.
+ */
+ if (!first)
+ PushCopiedSnapshot(GetTransactionSnapshot());
+ else
+ PushActiveSnapshot(GetTransactionSnapshot());
+ }
+
+ /*
+ * From the 2nd statement onwards, update the command
+ * ID and the snapshot to match.
+ */
+ if (!first)
+ {
+ CommandCounterIncrement();
+ UpdateActiveSnapshotCommandId();
+ }
+
+ first = false;
+
+ /*
+ * Create the QueryDesc. DestReceiver will be set in
+ * PortalRunMulti() before calling ExecutorRun().
+ */
+ queryDesc = CreateQueryDesc(plan,
+ portal->sourceText,
+ !is_utility ?
+ GetActiveSnapshot() :
+ InvalidSnapshot,
+ InvalidSnapshot,
+ NULL,
+ params,
+ portal->queryEnv, 0);
+
+ /* Remember for PortalRunMulti() */
+ portal->qdescs = lappend(portal->qdescs, queryDesc);
+
+ if (is_utility)
+ continue;
+
+ ExecutorStart(queryDesc, myeflags);
+ PopActiveSnapshot();
+ }
+ }
+
portal->tupDesc = NULL;
break;
}
@@ -594,7 +558,6 @@ PortalStart(Portal portal, ParamListInfo params,
/* Restore global vars and propagate error */
ActivePortal = saveActivePortal;
CurrentResourceOwner = saveResourceOwner;
- PortalContext = savePortalContext;
PG_RE_THROW();
}
@@ -604,7 +567,6 @@ PortalStart(Portal portal, ParamListInfo params,
ActivePortal = saveActivePortal;
CurrentResourceOwner = saveResourceOwner;
- PortalContext = savePortalContext;
portal->status = PORTAL_READY;
}
@@ -1193,7 +1155,7 @@ PortalRunMulti(Portal portal,
QueryCompletion *qc)
{
bool active_snapshot_set = false;
- ListCell *stmtlist_item;
+ ListCell *qdesc_item;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1214,9 +1176,10 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
- foreach(stmtlist_item, portal->stmts)
+ foreach(qdesc_item, portal->qdescs)
{
- PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ QueryDesc *qdesc = (QueryDesc *) lfirst(qdesc_item);
+ PlannedStmt *pstmt = qdesc->plannedstmt;
/*
* If we got a cancel signal in prior command, quit
@@ -1233,33 +1196,26 @@ PortalRunMulti(Portal portal,
if (log_executor_stats)
ResetUsage();
- /*
- * Must always have a snapshot for plannable queries. First time
- * through, take a new snapshot; for subsequent queries in the
- * same portal, just update the snapshot's copy of the command
- * counter.
- */
+ /* Push the snapshot for plannable queries. */
if (!active_snapshot_set)
{
- Snapshot snapshot = GetTransactionSnapshot();
+ Snapshot snapshot = qdesc->snapshot;
- /* If told to, register the snapshot and save in portal */
+ /*
+ * If told to, register the snapshot and save in portal
+ *
+ * Note that the command ID of qdesc->snapshot for the 2nd query
+ * onwards would have been updated in PortalStart() to account
+ * for CCI() done between queries, but it's OK that here we
+ * don't likewise update holdSnapshot's command ID.
+ */
if (setHoldSnapshot)
{
snapshot = RegisterSnapshot(snapshot);
portal->holdSnapshot = snapshot;
}
- /*
- * We can't have the holdSnapshot also be the active one,
- * because UpdateActiveSnapshotCommandId would complain. So
- * force an extra snapshot copy. Plain PushActiveSnapshot
- * would have copied the transaction snapshot anyway, so this
- * only adds a copy step when setHoldSnapshot is true. (It's
- * okay for the command ID of the active snapshot to diverge
- * from what holdSnapshot has.)
- */
- PushCopiedSnapshot(snapshot);
+ PushActiveSnapshot(snapshot);
/*
* As for PORTAL_ONE_SELECT portals, it does not seem
@@ -1268,26 +1224,39 @@ PortalRunMulti(Portal portal,
active_snapshot_set = true;
}
- else
- UpdateActiveSnapshotCommandId();
+ /*
+ * Run the plan to completion.
+ */
+ qdesc->dest = dest;
+ ExecutorRun(qdesc, ForwardScanDirection, 0, true);
+
+ /*
+ * Build command completion status data if needed.
+ */
if (pstmt->canSetTag)
{
- /* statement can set tag string */
- ProcessQuery(pstmt,
- portal->sourceText,
- portal->portalParams,
- portal->queryEnv,
- dest, qc);
- }
- else
- {
- /* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
- portal->sourceText,
- portal->portalParams,
- portal->queryEnv,
- altdest, NULL);
+ switch (qdesc->operation)
+ {
+ case CMD_SELECT:
+ SetQueryCompletion(qc, CMDTAG_SELECT, qdesc->estate->es_processed);
+ break;
+ case CMD_INSERT:
+ SetQueryCompletion(qc, CMDTAG_INSERT, qdesc->estate->es_processed);
+ break;
+ case CMD_UPDATE:
+ SetQueryCompletion(qc, CMDTAG_UPDATE, qdesc->estate->es_processed);
+ break;
+ case CMD_DELETE:
+ SetQueryCompletion(qc, CMDTAG_DELETE, qdesc->estate->es_processed);
+ break;
+ case CMD_MERGE:
+ SetQueryCompletion(qc, CMDTAG_MERGE, qdesc->estate->es_processed);
+ break;
+ default:
+ SetQueryCompletion(qc, CMDTAG_UNKNOWN, qdesc->estate->es_processed);
+ break;
+ }
}
if (log_executor_stats)
@@ -1342,12 +1311,12 @@ PortalRunMulti(Portal portal,
if (portal->stmts == NIL)
break;
- /*
- * Increment command counter between queries, but not after the last
- * one.
- */
- if (lnext(portal->stmts, stmtlist_item) != NULL)
- CommandCounterIncrement();
+ if (qdesc->estate)
+ {
+ ExecutorFinish(qdesc);
+ ExecutorEnd(qdesc);
+ }
+ FreeQueryDesc(qdesc);
}
/* Pop the snapshot if we pushed one. */
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 06dfa85f04..0cad450dcd 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -201,6 +201,13 @@ CreatePortal(const char *name, bool allowDup, bool dupSilent)
portal->portalContext = AllocSetContextCreate(TopPortalContext,
"PortalContext",
ALLOCSET_SMALL_SIZES);
+ /*
+ * Initialize the portal's query context to store QueryDescs created during
+ * PortalStart() and then used in PortalRun().
+ */
+ portal->queryContext = AllocSetContextCreate(TopPortalContext,
+ "PortalQueryContext",
+ ALLOCSET_SMALL_SIZES);
/* create a resource owner for the portal */
portal->resowner = ResourceOwnerCreate(CurTransactionResourceOwner,
@@ -224,6 +231,7 @@ CreatePortal(const char *name, bool allowDup, bool dupSilent)
/* for named portals reuse portal->name copy */
MemoryContextSetIdentifier(portal->portalContext, portal->name[0] ? portal->name : "<unnamed>");
+ MemoryContextSetIdentifier(portal->queryContext, portal->name[0] ? portal->name : "<unnamed>");
return portal;
}
@@ -594,6 +602,7 @@ PortalDrop(Portal portal, bool isTopCommit)
/* release subsidiary storage */
MemoryContextDelete(portal->portalContext);
+ MemoryContextDelete(portal->queryContext);
/* release portal struct (it's in TopPortalContext) */
pfree(portal);
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 3d3e632a0c..08ea852b65 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -88,7 +88,11 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern QueryDesc *ExplainQueryDesc(PlannedStmt *stmt,
+ const char *queryString, IntoClause *into, ExplainState *es,
+ ParamListInfo params, QueryEnvironment *queryEnv);
+extern void ExplainOnePlan(QueryDesc *queryDesc,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aa08b1e0fc..af059e30f8 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,8 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *qdescs; /* list of QueryDescs */
+ MemoryContext queryContext; /* memory for QueryDescs and children */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
--
2.35.3
v44-0003-Add-field-to-store-parent-relids-to-Append-Merge.patchapplication/octet-stream; name=v44-0003-Add-field-to-store-parent-relids-to-Append-Merge.patchDownload
From 10c7bbe9f1a489d0fcfeaf027a7df919fed490c8 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Tue, 4 Jul 2023 22:36:31 +0900
Subject: [PATCH v44 3/6] Add field to store parent relids to
Append/MergeAppend
There's currently no way in the executor to tell whether the child
subplans of an Append/MergeAppend are scanning partitions, and if
so, what the RT indexes of their parent/ancestor tables are. The
executor doesn't need to see their RT indexes except for run-time
pruning, in which case they can be found in the PartitionPruneInfo,
but a future commit will create a need for them to be available at
all times for the purpose of locking those parent/ancestor tables
when executing a cached plan.
The code to look up partitioned parent relids for a given list of
partition scan subpaths of an Append/MergeAppend is already present
in make_partition_pruneinfo() but it's local to partprune.c. This
commit refactors that code into its own function called
add_append_subpath_partrelids() defined in appendinfo.c and
generalizes it to consider child join and aggregate paths. To
facilitate looking up parent rels of child grouping rels in
add_append_subpath_partrelids(), parent links are now also set in
the RelOptInfos of child grouping rels, like they are in
those of child base and join rels.
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
src/backend/optimizer/plan/createplan.c | 41 ++++++--
src/backend/optimizer/plan/planner.c | 3 +
src/backend/optimizer/plan/setrefs.c | 4 +
src/backend/optimizer/util/appendinfo.c | 134 ++++++++++++++++++++++++
src/backend/partitioning/partprune.c | 124 +++-------------------
src/include/nodes/plannodes.h | 14 +++
src/include/optimizer/appendinfo.h | 3 +
src/include/partitioning/partprune.h | 3 +-
8 files changed, 203 insertions(+), 123 deletions(-)
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index af48109058..8ac1d3909b 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -25,6 +25,7 @@
#include "nodes/extensible.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
+#include "optimizer/appendinfo.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/optimizer.h"
@@ -1210,6 +1211,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
Oid *nodeCollations = NULL;
bool *nodeNullsFirst = NULL;
bool consider_async = false;
+ List *allpartrelids = NIL;
/*
* The subpaths list could be empty, if every child was proven empty by
@@ -1351,15 +1353,23 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
++nasyncplans;
}
+ /*
+ * Find partitioned parent rel(s) of the subpath's rel(s).
+ */
+ allpartrelids = add_append_subpath_partrelids(root, subpath, rel,
+ allpartrelids);
+
subplans = lappend(subplans, subplan);
}
+ plan->allpartrelids = allpartrelids;
+
/*
- * If any quals exist, they may be useful to perform further partition
- * pruning during execution. Gather information needed by the executor to
- * do partition pruning.
+ * If scanning partitions, check if there are quals that may be useful to
+ * perform further partition pruning during execution. Gather information
+ * needed by the executor to do partition pruning.
*/
- if (enable_partition_pruning)
+ if (enable_partition_pruning && allpartrelids != NIL)
{
List *prunequal;
@@ -1380,7 +1390,8 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
partpruneinfo =
make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ allpartrelids);
}
plan->appendplans = subplans;
@@ -1426,6 +1437,7 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
PartitionPruneInfo *partpruneinfo = NULL;
+ List *allpartrelids = NIL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1515,15 +1527,23 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplan = (Plan *) sort;
}
+ /*
+ * Find partitioned parent rel(s) of the subpath's rel(s).
+ */
+ allpartrelids = add_append_subpath_partrelids(root, subpath, rel,
+ allpartrelids);
+
subplans = lappend(subplans, subplan);
}
+ node->allpartrelids = allpartrelids;
+
/*
- * If any quals exist, they may be useful to perform further partition
- * pruning during execution. Gather information needed by the executor to
- * do partition pruning.
+ * If scanning partitions, check if there are quals that may be useful to
+ * perform further partition pruning during execution. Gather information
+ * needed by the executor to do partition pruning.
*/
- if (enable_partition_pruning)
+ if (enable_partition_pruning && allpartrelids != NIL)
{
List *prunequal;
@@ -1535,7 +1555,8 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
if (prunequal != NIL)
partpruneinfo = make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ allpartrelids);
}
node->mergeplans = subplans;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 44efb1f4eb..f97bc09113 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -7855,8 +7855,11 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
agg_costs, gd, &child_extra,
&child_partially_grouped_rel);
+ /* Mark as child of grouped_rel. */
+ child_grouped_rel->parent = grouped_rel;
if (child_partially_grouped_rel)
{
+ child_partially_grouped_rel->parent = grouped_rel;
partially_grouped_live_children =
lappend(partially_grouped_live_children,
child_partially_grouped_rel);
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 97fa561e4e..854dd7c8af 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1766,6 +1766,8 @@ set_append_references(PlannerInfo *root,
set_dummy_tlist_references((Plan *) aplan, rtoffset);
aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
+ foreach(l, aplan->allpartrelids)
+ lfirst(l) = offset_relid_set((Relids) lfirst(l), rtoffset);
if (aplan->part_prune_info)
{
@@ -1842,6 +1844,8 @@ set_mergeappend_references(PlannerInfo *root,
set_dummy_tlist_references((Plan *) mplan, rtoffset);
mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
+ foreach(l, mplan->allpartrelids)
+ lfirst(l) = offset_relid_set((Relids) lfirst(l), rtoffset);
if (mplan->part_prune_info)
{
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index f456b3b0a4..5bd8e82b9b 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -41,6 +41,7 @@ static void make_inh_translation_list(Relation oldrelation,
AppendRelInfo *appinfo);
static Node *adjust_appendrel_attrs_mutator(Node *node,
adjust_appendrel_attrs_context *context);
+static List *add_part_relids(List *allpartrelids, Bitmapset *partrelids);
/*
@@ -1035,3 +1036,136 @@ distribute_row_identity_vars(PlannerInfo *root)
}
}
}
+
+/*
+ * add_append_subpath_partrelids
+ * Look up a child subpath's rel's partitioned parent relids up to
+ * parentrel and add the bitmapset containing those into
+ * 'allpartrelids'
+ */
+List *
+add_append_subpath_partrelids(PlannerInfo *root, Path *subpath,
+ RelOptInfo *parentrel,
+ List *allpartrelids)
+{
+ RelOptInfo *prel = subpath->parent;
+ Relids partrelids = NULL;
+
+ /* Nothing to do if there's no parent to begin with. */
+ if (!IS_OTHER_REL(prel))
+ return allpartrelids;
+
+ /*
+ * Traverse up to the pathrel's topmost partitioned parent, collecting
+ * parent relids as we go; but stop if we reach parentrel. (Normally, a
+ * pathrel's topmost partitioned parent is either parentrel or a UNION ALL
+ * appendrel child of parentrel. But when handling partitionwise joins of
+ * multi-level partitioning trees, we can see an append path whose
+ * parentrel is an intermediate partitioned table.)
+ */
+ do
+ {
+ Relids parent_relids = NULL;
+
+ /*
+ * For simple child rels, we can simply set the parent_relids to
+ * prel->parent->relids. But for partitionwise join and aggregate
+ * child rels, while we can use prel->parent to move up the tree,
+ * parent_relids must be found the hard way through AppendRelInfos,
+ * because 1) a joinrel's relids may point to RTE_JOIN entries,
+ * 2) topmost parent grouping rel's relids field is NULL.
+ */
+ if (IS_SIMPLE_REL(prel))
+ {
+ prel = prel->parent;
+ /* Stop once we reach the root partitioned rel. */
+ if (!IS_PARTITIONED_REL(prel))
+ break;
+ parent_relids = bms_add_members(parent_relids, prel->relids);
+ }
+ else
+ {
+ AppendRelInfo **appinfos;
+ int nappinfos,
+ i;
+
+ appinfos = find_appinfos_by_relids(root, prel->relids,
+ &nappinfos);
+ for (i = 0; i < nappinfos; i++)
+ {
+ AppendRelInfo *appinfo = appinfos[i];
+
+ parent_relids = bms_add_member(parent_relids,
+ appinfo->parent_relid);
+ }
+ pfree(appinfos);
+ prel = prel->parent;
+ }
+ /* accept this level as an interesting parent */
+ partrelids = bms_add_members(partrelids, parent_relids);
+ if (prel == parentrel)
+ break; /* don't traverse above parentrel */
+ } while (IS_OTHER_REL(prel));
+
+ if (partrelids == NULL)
+ return allpartrelids;
+
+ return add_part_relids(allpartrelids, partrelids);
+}
+
+/*
+ * add_part_relids
+ * Add new info to a list of Bitmapsets of partitioned relids.
+ *
+ * Within 'allpartrelids', there is one Bitmapset for each topmost parent
+ * partitioned rel. Each Bitmapset contains the RT indexes of the topmost
+ * parent as well as its relevant non-leaf child partitions. Since (by
+ * construction of the rangetable list) parent partitions must have lower
+ * RT indexes than their children, we can distinguish the topmost parent
+ * as being the lowest set bit in the Bitmapset.
+ *
+ * 'partrelids' contains the RT indexes of a parent partitioned rel, and
+ * possibly some non-leaf children, that are newly identified as parents of
+ * some subpath rel passed to make_partition_pruneinfo(). These are added
+ * to an appropriate member of 'allpartrelids'.
+ *
+ * Note that the list contains only RT indexes of partitioned tables that
+ * are parents of some scan-level relation appearing in the 'subpaths' that
+ * make_partition_pruneinfo() is dealing with. Also, "topmost" parents are
+ * not allowed to be higher than the 'parentrel' associated with the append
+ * path. In this way, we avoid expending cycles on partitioned rels that
+ * can't contribute useful pruning information for the problem at hand.
+ * (It is possible for 'parentrel' to be a child partitioned table, and it
+ * is also possible for scan-level relations to be child partitioned tables
+ * rather than leaf partitions. Hence we must construct this relation set
+ * with reference to the particular append path we're dealing with, rather
+ * than looking at the full partitioning structure represented in the
+ * RelOptInfos.)
+ */
+static List *
+add_part_relids(List *allpartrelids, Bitmapset *partrelids)
+{
+ Index targetpart;
+ ListCell *lc;
+
+ /* We can easily get the lowest set bit this way: */
+ targetpart = bms_next_member(partrelids, -1);
+ Assert(targetpart > 0);
+
+ /* Look for a matching topmost parent */
+ foreach(lc, allpartrelids)
+ {
+ Bitmapset *currpartrelids = (Bitmapset *) lfirst(lc);
+ Index currtarget = bms_next_member(currpartrelids, -1);
+
+ if (targetpart == currtarget)
+ {
+ /* Found a match, so add any new RT indexes to this hierarchy */
+ currpartrelids = bms_add_members(currpartrelids, partrelids);
+ lfirst(lc) = currpartrelids;
+ return allpartrelids;
+ }
+ }
+ /* No match, so add the new partition hierarchy to the list */
+ return lappend(allpartrelids, partrelids);
+}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 7179b22a05..213512a5f4 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -138,7 +138,6 @@ typedef struct PruneStepResult
} PruneStepResult;
-static List *add_part_relids(List *allpartrelids, Bitmapset *partrelids);
static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
RelOptInfo *parentrel,
List *prunequal,
@@ -218,33 +217,32 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
+ * 'allpartrelids' contains Bitmapsets of RT indexes of partitioned parents
+ * whose partitions' Paths are in 'subpaths'; there's one Bitmapset for every
+ * partition tree involved.
*/
PartitionPruneInfo *
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
- List *prunequal)
+ List *prunequal,
+ List *allpartrelids)
{
PartitionPruneInfo *pruneinfo;
Bitmapset *allmatchedsubplans = NULL;
- List *allpartrelids;
List *prunerelinfos;
int *relid_subplan_map;
ListCell *lc;
int i;
+ Assert(list_length(allpartrelids) > 0);
+
/*
- * Scan the subpaths to see which ones are scans of partition child
- * relations, and identify their parent partitioned rels. (Note: we must
- * restrict the parent partitioned rels to be parentrel or children of
- * parentrel, otherwise we couldn't translate prunequal to match.)
- *
- * Also construct a temporary array to map from partition-child-relation
- * relid to the index in 'subpaths' of the scan plan for that partition.
+ * Construct a temporary array to map from partition-child-relation relid
+ * to the index in 'subpaths' of the scan plan for that partition.
* (Use of "subplan" rather than "subpath" is a bit of a misnomer, but
* we'll let it stand.) For convenience, we use 1-based indexes here, so
* that zero can represent an un-filled array entry.
*/
- allpartrelids = NIL;
relid_subplan_map = palloc0(sizeof(int) * root->simple_rel_array_size);
i = 1;
@@ -253,50 +251,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Path *path = (Path *) lfirst(lc);
RelOptInfo *pathrel = path->parent;
- /* We don't consider partitioned joins here */
- if (pathrel->reloptkind == RELOPT_OTHER_MEMBER_REL)
- {
- RelOptInfo *prel = pathrel;
- Bitmapset *partrelids = NULL;
-
- /*
- * Traverse up to the pathrel's topmost partitioned parent,
- * collecting parent relids as we go; but stop if we reach
- * parentrel. (Normally, a pathrel's topmost partitioned parent
- * is either parentrel or a UNION ALL appendrel child of
- * parentrel. But when handling partitionwise joins of
- * multi-level partitioning trees, we can see an append path whose
- * parentrel is an intermediate partitioned table.)
- */
- do
- {
- AppendRelInfo *appinfo;
-
- Assert(prel->relid < root->simple_rel_array_size);
- appinfo = root->append_rel_array[prel->relid];
- prel = find_base_rel(root, appinfo->parent_relid);
- if (!IS_PARTITIONED_REL(prel))
- break; /* reached a non-partitioned parent */
- /* accept this level as an interesting parent */
- partrelids = bms_add_member(partrelids, prel->relid);
- if (prel == parentrel)
- break; /* don't traverse above parentrel */
- } while (prel->reloptkind == RELOPT_OTHER_MEMBER_REL);
-
- if (partrelids)
- {
- /*
- * Found some relevant parent partitions, which may or may not
- * overlap with partition trees we already found. Add new
- * information to the allpartrelids list.
- */
- allpartrelids = add_part_relids(allpartrelids, partrelids);
- /* Also record the subplan in relid_subplan_map[] */
- /* No duplicates please */
- Assert(relid_subplan_map[pathrel->relid] == 0);
- relid_subplan_map[pathrel->relid] = i;
- }
- }
+ /* No duplicates please */
+ Assert(relid_subplan_map[pathrel->relid] == 0);
+ relid_subplan_map[pathrel->relid] = i;
i++;
}
@@ -362,63 +319,6 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
return pruneinfo;
}
-/*
- * add_part_relids
- * Add new info to a list of Bitmapsets of partitioned relids.
- *
- * Within 'allpartrelids', there is one Bitmapset for each topmost parent
- * partitioned rel. Each Bitmapset contains the RT indexes of the topmost
- * parent as well as its relevant non-leaf child partitions. Since (by
- * construction of the rangetable list) parent partitions must have lower
- * RT indexes than their children, we can distinguish the topmost parent
- * as being the lowest set bit in the Bitmapset.
- *
- * 'partrelids' contains the RT indexes of a parent partitioned rel, and
- * possibly some non-leaf children, that are newly identified as parents of
- * some subpath rel passed to make_partition_pruneinfo(). These are added
- * to an appropriate member of 'allpartrelids'.
- *
- * Note that the list contains only RT indexes of partitioned tables that
- * are parents of some scan-level relation appearing in the 'subpaths' that
- * make_partition_pruneinfo() is dealing with. Also, "topmost" parents are
- * not allowed to be higher than the 'parentrel' associated with the append
- * path. In this way, we avoid expending cycles on partitioned rels that
- * can't contribute useful pruning information for the problem at hand.
- * (It is possible for 'parentrel' to be a child partitioned table, and it
- * is also possible for scan-level relations to be child partitioned tables
- * rather than leaf partitions. Hence we must construct this relation set
- * with reference to the particular append path we're dealing with, rather
- * than looking at the full partitioning structure represented in the
- * RelOptInfos.)
- */
-static List *
-add_part_relids(List *allpartrelids, Bitmapset *partrelids)
-{
- Index targetpart;
- ListCell *lc;
-
- /* We can easily get the lowest set bit this way: */
- targetpart = bms_next_member(partrelids, -1);
- Assert(targetpart > 0);
-
- /* Look for a matching topmost parent */
- foreach(lc, allpartrelids)
- {
- Bitmapset *currpartrelids = (Bitmapset *) lfirst(lc);
- Index currtarget = bms_next_member(currpartrelids, -1);
-
- if (targetpart == currtarget)
- {
- /* Found a match, so add any new RT indexes to this hierarchy */
- currpartrelids = bms_add_members(currpartrelids, partrelids);
- lfirst(lc) = currpartrelids;
- return allpartrelids;
- }
- }
- /* No match, so add the new partition hierarchy to the list */
- return lappend(allpartrelids, partrelids);
-}
-
/*
* make_partitionedrel_pruneinfo
* Build a List of PartitionedRelPruneInfos, one for each interesting
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 1b787fe031..7a5f3ba625 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -267,6 +267,13 @@ typedef struct Append
List *appendplans;
int nasyncplans; /* # of asynchronous plans */
+ /*
+ * RTIs of all partitioned tables whose children are scanned by
+ * appendplans. The list contains a bitmapset for every partition tree
+ * covered by this Append.
+ */
+ List *allpartrelids;
+
/*
* All 'appendplans' preceding this index are non-partial plans. All
* 'appendplans' from this index onwards are partial plans.
@@ -291,6 +298,13 @@ typedef struct MergeAppend
List *mergeplans;
+ /*
+ * RTIs of all partitioned tables whose children are scanned by
+ * mergeplans. The list contains a bitmapset for every partition tree
+ * covered by this MergeAppend.
+ */
+ List *allpartrelids;
+
/* these fields are just like the sort-key info in struct Sort: */
/* number of sort-key columns */
diff --git a/src/include/optimizer/appendinfo.h b/src/include/optimizer/appendinfo.h
index a05f91f77d..1621a7319a 100644
--- a/src/include/optimizer/appendinfo.h
+++ b/src/include/optimizer/appendinfo.h
@@ -46,5 +46,8 @@ extern void add_row_identity_columns(PlannerInfo *root, Index rtindex,
RangeTblEntry *target_rte,
Relation target_relation);
extern void distribute_row_identity_vars(PlannerInfo *root);
+extern List *add_append_subpath_partrelids(PlannerInfo *root, Path *subpath,
+ RelOptInfo *parentrel,
+ List *allpartrelids);
#endif /* APPENDINFO_H */
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 8636e04e37..caa774a111 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -73,7 +73,8 @@ typedef struct PartitionPruneContext
extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
struct RelOptInfo *parentrel,
List *subpaths,
- List *prunequal);
+ List *prunequal,
+ List *allpartrelids);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
--
2.35.3
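A sketch of how a consumer might walk the new field may help; the real
consumer is the locking logic added later in this series, so the loop below
is illustrative only. Each member of allpartrelids is a Bitmapset covering
one partition tree, with the topmost parent being the lowest set bit:

	ListCell   *lc;

	foreach(lc, aplan->allpartrelids)
	{
		Bitmapset  *partrelids = (Bitmapset *) lfirst(lc);
		int			rti = -1;

		while ((rti = bms_next_member(partrelids, rti)) >= 0)
		{
			RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);

			/* Partitioned ancestors are never scanned, so lock them here. */
			LockRelationOid(rte->relid, rte->rellockmode);
		}
	}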
v44-0005-Delay-locking-of-child-tables-in-cached-plans-un.patchapplication/octet-stream; name=v44-0005-Delay-locking-of-child-tables-in-cached-plans-un.patchDownload
From a51b753af1c38e1a7750e9f09738d304ebd7de07 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Tue, 4 Jul 2023 22:36:45 +0900
Subject: [PATCH v44 5/6] Delay locking of child tables in cached plans until
ExecutorStart()
Currently, GetCachedPlan() takes a lock on all relations contained in
a cached plan before returning it as a valid plan to its callers for
execution. One disadvantage is that if the plan contains partitions
that are prunable with conditions involving EXTERN parameters and
other stable expressions (known as "initial pruning"), many of them
would be locked unnecessarily, because only those that survive
initial pruning need to be locked. Locking all partitions this
way causes significant delay when there are many partitions. Note
that initial pruning occurs during the executor's initialization of the
plan, that is, ExecInitNode().
This commit rearranges things to move the locking of child tables
referenced in a cached plan to occur during ExecInitNode() so that
initial pruning in the ExecInitNode() subroutines of the plan nodes
that support pruning can eliminate any child tables that need not be
scanned and thus locked.
To determine whether a given table is a child table,
ExecGetRangeTableRelation() now looks at the RTE's inFromCl field,
which is only true for tables that are directly mentioned in the
query but false for child tables. Note that any tables whose RTEs'
inFromCl is true would already have been locked by GetCachedPlan(),
so need not be locked again during execution.
If the locking of child tables causes the CachedPlan to go stale, that
is, its is_valid flag is set to false by PlanCacheRelCallback() when an
invalidation message matching some child table contained in the plan
is processed, ExecInitNode() abandons the initialization of the
remaining nodes in the plan tree. In that case, InitPlan() returns
after setting QueryDesc.planstate to NULL to indicate to the caller
that no execution is possible with the plan tree as is. Also,
ExecutorStart() now returns true or false to indicate whether or not
QueryDesc.planstate points to a successfully initialized PlanState
tree. Call sites that use GetCachedPlan() to get the plan trees to
pass to the executor should now be prepared to retry in the cases
where ExecutorStart() returns false.
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
contrib/auto_explain/auto_explain.c | 12 +-
.../pg_stat_statements/pg_stat_statements.c | 12 +-
contrib/postgres_fdw/postgres_fdw.c | 4 +
src/backend/commands/copyto.c | 7 +-
src/backend/commands/createas.c | 10 +-
src/backend/commands/explain.c | 33 +++-
src/backend/commands/extension.c | 4 +-
src/backend/commands/matview.c | 10 +-
src/backend/commands/portalcmds.c | 5 +-
src/backend/commands/prepare.c | 23 ++-
src/backend/executor/README | 37 +++++
src/backend/executor/execMain.c | 89 ++++++++--
src/backend/executor/execParallel.c | 14 +-
src/backend/executor/execPartition.c | 14 ++
src/backend/executor/execProcnode.c | 20 ++-
src/backend/executor/execUtils.c | 63 +++++--
src/backend/executor/functions.c | 5 +-
src/backend/executor/nodeAgg.c | 2 +
src/backend/executor/nodeAppend.c | 23 +++
src/backend/executor/nodeBitmapAnd.c | 5 +-
src/backend/executor/nodeBitmapHeapscan.c | 4 +
src/backend/executor/nodeBitmapIndexscan.c | 9 +-
src/backend/executor/nodeBitmapOr.c | 5 +-
src/backend/executor/nodeCustom.c | 2 +
src/backend/executor/nodeForeignscan.c | 4 +
src/backend/executor/nodeGather.c | 3 +
src/backend/executor/nodeGatherMerge.c | 2 +
src/backend/executor/nodeGroup.c | 2 +
src/backend/executor/nodeHash.c | 2 +
src/backend/executor/nodeHashjoin.c | 4 +
src/backend/executor/nodeIncrementalSort.c | 2 +
src/backend/executor/nodeIndexonlyscan.c | 11 +-
src/backend/executor/nodeIndexscan.c | 11 +-
src/backend/executor/nodeLimit.c | 2 +
src/backend/executor/nodeLockRows.c | 2 +
src/backend/executor/nodeMaterial.c | 2 +
src/backend/executor/nodeMemoize.c | 2 +
src/backend/executor/nodeMergeAppend.c | 23 +++
src/backend/executor/nodeMergejoin.c | 4 +
src/backend/executor/nodeModifyTable.c | 7 +
src/backend/executor/nodeNestloop.c | 4 +
src/backend/executor/nodeProjectSet.c | 2 +
src/backend/executor/nodeRecursiveunion.c | 4 +
src/backend/executor/nodeResult.c | 2 +
src/backend/executor/nodeSamplescan.c | 2 +
src/backend/executor/nodeSeqscan.c | 2 +
src/backend/executor/nodeSetOp.c | 2 +
src/backend/executor/nodeSort.c | 2 +
src/backend/executor/nodeSubqueryscan.c | 2 +
src/backend/executor/nodeTidrangescan.c | 2 +
src/backend/executor/nodeTidscan.c | 2 +
src/backend/executor/nodeUnique.c | 2 +
src/backend/executor/nodeWindowAgg.c | 2 +
src/backend/executor/spi.c | 26 ++-
src/backend/storage/lmgr/lmgr.c | 45 +++++
src/backend/tcop/postgres.c | 18 +-
src/backend/tcop/pquery.c | 49 +++++-
src/backend/utils/cache/lsyscache.c | 21 +++
src/backend/utils/cache/plancache.c | 156 +++++++-----------
src/include/commands/explain.h | 3 +-
src/include/executor/execdesc.h | 4 +
src/include/executor/executor.h | 19 ++-
src/include/nodes/execnodes.h | 2 +
src/include/storage/lmgr.h | 1 +
src/include/tcop/pquery.h | 2 +-
src/include/utils/lsyscache.h | 1 +
src/include/utils/plancache.h | 14 ++
src/test/modules/delay_execution/Makefile | 3 +-
.../modules/delay_execution/delay_execution.c | 67 +++++++-
.../expected/cached-plan-replan.out | 156 ++++++++++++++++++
.../specs/cached-plan-replan.spec | 61 +++++++
71 files changed, 984 insertions(+), 189 deletions(-)
create mode 100644 src/test/modules/delay_execution/expected/cached-plan-replan.out
create mode 100644 src/test/modules/delay_execution/specs/cached-plan-replan.spec
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index c3ac27ae99..a0630d7944 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -78,7 +78,7 @@ static ExecutorRun_hook_type prev_ExecutorRun = NULL;
static ExecutorFinish_hook_type prev_ExecutorFinish = NULL;
static ExecutorEnd_hook_type prev_ExecutorEnd = NULL;
-static void explain_ExecutorStart(QueryDesc *queryDesc, int eflags);
+static bool explain_ExecutorStart(QueryDesc *queryDesc, int eflags);
static void explain_ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction,
uint64 count, bool execute_once);
@@ -258,9 +258,11 @@ _PG_init(void)
/*
* ExecutorStart hook: start up logging if needed
*/
-static void
+static bool
explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ bool plan_valid;
+
/*
* At the beginning of each top-level statement, decide whether we'll
* sample this statement. If nested-statement explaining is enabled,
@@ -296,9 +298,9 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
}
if (prev_ExecutorStart)
- prev_ExecutorStart(queryDesc, eflags);
+ plan_valid = prev_ExecutorStart(queryDesc, eflags);
else
- standard_ExecutorStart(queryDesc, eflags);
+ plan_valid = standard_ExecutorStart(queryDesc, eflags);
if (auto_explain_enabled())
{
@@ -316,6 +318,8 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcxt);
}
}
+
+ return plan_valid;
}
/*
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 55b957d251..1160a7326a 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -325,7 +325,7 @@ static PlannedStmt *pgss_planner(Query *parse,
const char *query_string,
int cursorOptions,
ParamListInfo boundParams);
-static void pgss_ExecutorStart(QueryDesc *queryDesc, int eflags);
+static bool pgss_ExecutorStart(QueryDesc *queryDesc, int eflags);
static void pgss_ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction,
uint64 count, bool execute_once);
@@ -963,13 +963,15 @@ pgss_planner(Query *parse,
/*
* ExecutorStart hook: start up tracking if needed
*/
-static void
+static bool
pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ bool plan_valid;
+
if (prev_ExecutorStart)
- prev_ExecutorStart(queryDesc, eflags);
+ plan_valid = prev_ExecutorStart(queryDesc, eflags);
else
- standard_ExecutorStart(queryDesc, eflags);
+ plan_valid = standard_ExecutorStart(queryDesc, eflags);
/*
* If query has queryId zero, don't track it. This prevents double
@@ -992,6 +994,8 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcxt);
}
}
+
+ return plan_valid;
}
/*
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index c5cada55fb..1edd4c3f17 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2658,7 +2658,11 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
/* Get info about foreign table. */
rtindex = node->resultRelInfo->ri_RangeTableIndex;
if (fsplan->scan.scanrelid == 0)
+ {
dmstate->rel = ExecOpenScanRelation(estate, rtindex, eflags);
+ if (!ExecPlanStillValid(estate))
+ return;
+ }
else
dmstate->rel = node->ss.ss_currentRelation;
table = GetForeignTable(RelationGetRelid(dmstate->rel));
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 9e4b2437a5..916d6dced3 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -558,7 +558,8 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL,
+ pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
@@ -567,8 +568,10 @@ BeginCopyTo(ParseState *pstate,
* Call ExecutorStart to prepare the plan for execution.
*
* ExecutorStart computes a result tupdesc for us
+ *
+ * OK to ignore the return value; plan can't become invalid.
*/
- ExecutorStart(cstate->queryDesc, 0);
+ (void) ExecutorStart(cstate->queryDesc, 0);
tupDesc = cstate->queryDesc->tupDesc;
}
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index e91920ca14..e5cce4c07c 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,12 +325,16 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, GetIntoRelEFlags(into));
+ /*
+ * call ExecutorStart to prepare the plan for execution
+ *
+ * OK to ignore the return value; plan can't become invalid.
+ */
+ (void) ExecutorStart(queryDesc, GetIntoRelEFlags(into));
/* run the plan to completion */
ExecutorRun(queryDesc, ForwardScanDirection, 0, true);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 59d57f9c10..6171a20fe2 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -416,7 +416,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
}
- queryDesc = ExplainQueryDesc(plan, queryString, into, es,
+ queryDesc = ExplainQueryDesc(plan, NULL, queryString, into, es,
params, queryEnv);
Assert(queryDesc);
@@ -429,9 +429,11 @@ ExplainOneQuery(Query *query, int cursorOptions,
/*
* ExplainQueryDesc
* Set up QueryDesc for EXPLAINing a given plan
+ *
+ * This returns NULL if cplan is found to be no longer valid.
*/
QueryDesc *
-ExplainQueryDesc(PlannedStmt *stmt,
+ExplainQueryDesc(PlannedStmt *stmt, CachedPlan *cplan,
const char *queryString, IntoClause *into, ExplainState *es,
ParamListInfo params, QueryEnvironment *queryEnv)
{
@@ -467,7 +469,7 @@ ExplainQueryDesc(PlannedStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(stmt, queryString,
+ queryDesc = CreateQueryDesc(stmt, cplan, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
@@ -481,8 +483,18 @@ ExplainQueryDesc(PlannedStmt *stmt,
if (into)
eflags |= GetIntoRelEFlags(into);
- /* Call ExecutorStart to prepare the plan for execution. */
- ExecutorStart(queryDesc, eflags);
+ /*
+ * Call ExecutorStart to prepare the plan for execution. A cached plan
+ * may get invalidated during plan initialization.
+ */
+ if (!ExecutorStart(queryDesc, eflags))
+ {
+ /* Clean up. */
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+ PopActiveSnapshot();
+ return NULL;
+ }
return queryDesc;
}
@@ -4884,6 +4896,17 @@ ExplainDummyGroup(const char *objtype, const char *labelname, ExplainState *es)
}
}
+/*
+ * Discard output buffer for a fresh restart.
+ */
+void
+ExplainResetOutput(ExplainState *es)
+{
+ Assert(es->str);
+ resetStringInfo(es->str);
+ ExplainBeginOutput(es);
+}
+
/*
* Emit the start-of-output boilerplate.
*
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 4cc994ca31..8a0859a355 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -797,11 +797,13 @@ execute_sql_string(const char *sql)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
- ExecutorStart(qdesc, 0);
+ /* OK to ignore the return value; plan can't become invalid. */
+ (void) ExecutorStart(qdesc, 0);
ExecutorRun(qdesc, ForwardScanDirection, 0, true);
ExecutorFinish(qdesc);
ExecutorEnd(qdesc);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index ac2e74fa3f..38795ce7ca 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -408,12 +408,16 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, 0);
+ /*
+ * call ExecutorStart to prepare the plan for execution
+ *
+ * OK to ignore the return value; plan can't become invalid.
+ */
+ (void) ExecutorStart(queryDesc, 0);
/* run the plan */
ExecutorRun(queryDesc, ForwardScanDirection, 0, true);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 73ed7aa2f0..5120f93414 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -142,9 +142,10 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
/*
* Start execution, inserting parameters if any.
+ *
+ * OK to ignore the return value; plan can't become invalid here.
*/
- PortalStart(portal, params, 0, GetActiveSnapshot());
-
+ (void) PortalStart(portal, params, 0, GetActiveSnapshot());
Assert(portal->strategy == PORTAL_ONE_SELECT);
/*
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 1e9a98ad6e..156c3c5fee 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -183,6 +183,7 @@ ExecuteQuery(ParseState *pstate,
paramLI = EvaluateParams(pstate, entry, stmt->params, estate);
}
+replan:
/* Create a new portal to run the query in */
portal = CreateNewPortal();
/* Don't display the portal in pg_cursors, it is for internal use only */
@@ -251,9 +252,15 @@ ExecuteQuery(ParseState *pstate,
}
/*
- * Run the portal as appropriate.
+ * Run the portal as appropriate. If the portal contains a cached plan,
+ * the portal must be recreated if the cached plan is found to have been
+ * invalidated while initializing one of the plan trees contained in it.
*/
- PortalStart(portal, paramLI, eflags, GetActiveSnapshot());
+ if (!PortalStart(portal, paramLI, eflags, GetActiveSnapshot()))
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
(void) PortalRun(portal, count, false, true, dest, dest, qc);
@@ -574,7 +581,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
{
PreparedStatement *entry;
const char *query_string;
- CachedPlan *cplan;
+ CachedPlan *cplan = NULL;
List *plan_list;
ListCell *p;
ParamListInfo paramLI = NULL;
@@ -618,6 +625,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
+replan:
cplan = GetCachedPlan(entry->plansource, paramLI,
CurrentResourceOwner, queryEnv);
@@ -642,9 +650,14 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
{
QueryDesc *queryDesc;
- queryDesc = ExplainQueryDesc(pstmt, queryString,
+ queryDesc = ExplainQueryDesc(pstmt, cplan, queryString,
into, es, paramLI, queryEnv);
- Assert(queryDesc != NULL);
+ if (queryDesc == NULL)
+ {
+ ExplainResetOutput(es);
+ ReleaseCachedPlan(cplan, CurrentResourceOwner);
+ goto replan;
+ }
ExplainOnePlan(queryDesc, into, es, query_string, paramLI,
queryEnv, &planduration,
(es->buffers ? &bufusage : NULL));
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 67a5c1769b..5113523bb9 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -280,6 +280,37 @@ are typically reset to empty once per tuple. Per-tuple contexts are usually
associated with ExprContexts, and commonly each PlanState node has its own
ExprContext to evaluate its qual and targetlist expressions in.
+Relation Locking
+----------------
+
+Normally, the executor does not lock non-index relations appearing in a given
+plan tree when initializing it for execution if the plan tree is freshly
+created, that is, not derived from a CachedPlan. The reason for that is that
+the locks must already have been taken during parsing, rewriting, and planning
+of the query in that case. If the plan tree is a cached one, there may still
+be unlocked relations present in the plan tree, because GetCachedPlan() only
+locks the relations that would be present in the query's range table before
+planning occurs, but not relations that would have been added to the range
+table during planning. This means that inheritance child tables present in
+a cached plan, which are added to the query's range table during planning,
+would not have been locked when the plan enters the executor.
+
+GetCachedPlan() punts on locking child tables because not all of them may
+actually be scanned during a given execution of the plan; if the child tables
+are partitions, some may get pruned away by execution-initialization-time
+pruning. So the locking of child tables is deferred until execution
+initialization, which occurs during ExecInitNode() of the plan nodes
+containing the child tables.
+
+So, there's a time window during which a cached plan tree could go stale if
+it contains child tables, because they could get changed in other backends
+before ExecInitNode() gets a lock on them. This means the executor must now
+check the validity of the plan tree every time it takes a lock on a child
+table that remains in the tree after execution-initialization-time pruning.
+It does that by looking at the is_valid flag of the CachedPlan passed to it.
+If the plan tree is indeed stale (is_valid=false), the executor must give up
+initializing it any further and return to the caller, letting it know that
+the execution must be retried with a new plan tree.
Query Processing Control Flow
-----------------------------
@@ -316,6 +347,12 @@ This is a sketch of control flow for full query processing:
FreeQueryDesc
+As mentioned in the "Relation Locking" section, if the plan tree is found to
+be stale during one of the recursive calls of ExecInitNode() after taking a
+lock on a child table, control is immediately returned to the caller of
+ExecutorStart(), which must redo the steps from CreateQueryDesc with a new
+plan tree.
+
Per above comments, it's not really critical for ExecEndPlan to free any
memory; it'll all go away in FreeExecutorState anyway. However, we do need to
be careful to close relations, drop buffer pins, etc, so we do need to scan
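An aside on the ExecPlanStillValid() checks peppered through the
executor changes below: per the README text above, the test just
consults the is_valid flag of the CachedPlan hanging off the EState.
A sketch of what such a helper could look like, assuming the
es_cachedplan field this patch adds; the patch's real definition lives
in the executor.h hunk, which is not quoted in this excerpt:

static inline bool
ExecPlanStillValid(EState *estate)
{
	/*
	 * Freshly-made plan trees have no CachedPlan attached and cannot go
	 * stale; cached ones go stale as soon as PlanCacheRelCallback()
	 * clears is_valid in response to a matching invalidation message.
	 */
	return estate->es_cachedplan == NULL ||
		   estate->es_cachedplan->is_valid;
}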
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 235bb52ccc..dcfbf58495 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -79,7 +79,7 @@ ExecutorEnd_hook_type ExecutorEnd_hook = NULL;
ExecutorCheckPerms_hook_type ExecutorCheckPerms_hook = NULL;
/* decls for local routines only used within this module */
-static void InitPlan(QueryDesc *queryDesc, int eflags);
+static bool InitPlan(QueryDesc *queryDesc, int eflags);
static void CheckValidRowMarkRel(Relation rel, RowMarkType markType);
static void ExecPostprocessPlan(EState *estate);
static void ExecEndPlan(EState *estate);
@@ -119,6 +119,13 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
*
* eflags contains flag bits as described in executor.h.
*
+ * Plan initialization may fail if the input plan tree is found to have been
+ * invalidated, which can happen if it comes from a CachedPlan.
+ *
+ * Returns true if the plan was successfully initialized, false otherwise.
+ * In the latter case, the caller must call ExecutorEnd() on 'queryDesc' to
+ * clean up after the failed plan initialization.
+ *
* NB: the CurrentMemoryContext when this is called will become the parent
* of the per-query context used for this Executor invocation.
*
@@ -128,7 +135,7 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
*
* ----------------------------------------------------------------
*/
-void
+bool
ExecutorStart(QueryDesc *queryDesc, int eflags)
{
/*
@@ -140,14 +147,15 @@ ExecutorStart(QueryDesc *queryDesc, int eflags)
pgstat_report_query_id(queryDesc->plannedstmt->queryId, false);
if (ExecutorStart_hook)
- (*ExecutorStart_hook) (queryDesc, eflags);
- else
- standard_ExecutorStart(queryDesc, eflags);
+ return (*ExecutorStart_hook) (queryDesc, eflags);
+
+ return standard_ExecutorStart(queryDesc, eflags);
}
-void
+bool
standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ bool plan_valid;
EState *estate;
MemoryContext oldcontext;
@@ -263,9 +271,11 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/*
* Initialize the plan state tree
*/
- InitPlan(queryDesc, eflags);
+ plan_valid = InitPlan(queryDesc, eflags);
MemoryContextSwitchTo(oldcontext);
+
+ return plan_valid;
}
/* ----------------------------------------------------------------
@@ -620,6 +630,17 @@ ExecCheckPermissions(List *rangeTable, List *rteperminfos,
RTEPermissionInfo *perminfo = lfirst_node(RTEPermissionInfo, l);
Assert(OidIsValid(perminfo->relid));
+
+ /*
+ * Relations whose permissions need to be checked must already have
+ * been locked by the parser or by GetCachedPlan() if a cached plan is
+ * being executed.
+ *
+ * XXX shouldn't we skip calling ExecCheckPermissions from InitPlan
+ * in a parallel worker?
+ */
+ Assert(CheckRelLockedByMe(perminfo->relid, AccessShareLock, true) ||
+ IsParallelWorker());
result = ExecCheckOneRelPerms(perminfo);
if (!result)
{
@@ -829,9 +850,12 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
*
* Initializes the query plan: open files, allocate storage
* and start up the rule manager
+ *
+ * Returns true if the plan tree is successfully initialized for execution,
+ * false otherwise.
* ----------------------------------------------------------------
*/
-static void
+static bool
InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
@@ -839,7 +863,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
- PlanState *planstate;
+ PlanState *planstate = NULL;
TupleDesc tupType;
ListCell *l;
int i;
@@ -850,10 +874,11 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
/*
- * initialize the node's execution state
+ * Set up range table in EState.
*/
ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
+ estate->es_cachedplan = queryDesc->cplan;
estate->es_plannedstmt = plannedstmt;
/*
@@ -886,6 +911,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
case ROW_MARK_KEYSHARE:
case ROW_MARK_REFERENCE:
relation = ExecGetRangeTableRelation(estate, rc->rti);
+ if (!ExecPlanStillValid(estate))
+ goto plan_init_suspended;
break;
case ROW_MARK_COPY:
/* no physical table access is required */
@@ -953,6 +980,11 @@ InitPlan(QueryDesc *queryDesc, int eflags)
sp_eflags |= EXEC_FLAG_REWIND;
subplanstate = ExecInitNode(subplan, estate, sp_eflags);
+ if (!ExecPlanStillValid(estate))
+ {
+ Assert(subplanstate == NULL);
+ goto plan_init_suspended;
+ }
estate->es_subplanstates = lappend(estate->es_subplanstates,
subplanstate);
@@ -966,6 +998,11 @@ InitPlan(QueryDesc *queryDesc, int eflags)
* processing tuples.
*/
planstate = ExecInitNode(plan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ {
+ Assert(planstate == NULL);
+ goto plan_init_suspended;
+ }
/*
* Get the tuple descriptor describing the type of tuples to return.
@@ -1008,7 +1045,18 @@ InitPlan(QueryDesc *queryDesc, int eflags)
}
queryDesc->tupDesc = tupType;
+ Assert(planstate != NULL);
queryDesc->planstate = planstate;
+ return true;
+
+plan_init_suspended:
+ /*
+ * Plan initialization failed. Mark QueryDesc as such. ExecEndPlan()
+ * will clean up initialized plan nodes from estate->es_planstate_nodes.
+ */
+ Assert(planstate == NULL);
+ queryDesc->planstate = NULL;
+ return false;
}
/*
@@ -1426,7 +1474,7 @@ ExecGetAncestorResultRels(EState *estate, ResultRelInfo *resultRelInfo)
/*
* All ancestors up to the root target relation must have been
- * locked by the planner or AcquireExecutorLocks().
+ * locked by the planner or ExecLockAppendNonLeafRelations().
*/
ancRel = table_open(ancOid, NoLock);
rInfo = makeNode(ResultRelInfo);
@@ -2856,7 +2904,8 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
* Child EPQ EStates share the parent's copy of unchanging state such as
* the snapshot, rangetable, and external Param info. They need their own
* copies of local state, including a tuple table, es_param_exec_vals,
- * result-rel info, etc.
+ * result-rel info, etc. Also, we don't pass the parent's copy of the
+ * CachedPlan, because no new locks will be taken for EvalPlanQual().
*/
rcestate->es_direction = ForwardScanDirection;
rcestate->es_snapshot = parentestate->es_snapshot;
@@ -2943,6 +2992,12 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
PlanState *subplanstate;
subplanstate = ExecInitNode(subplan, rcestate, 0);
+
+ /*
+ * At this point, we had better not have received any new invalidation
+ * messages that would have caused the plan tree to go stale.
+ */
+ Assert(ExecPlanStillValid(rcestate) && subplanstate);
rcestate->es_subplanstates = lappend(rcestate->es_subplanstates,
subplanstate);
}
@@ -2986,6 +3041,12 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
*/
epqstate->recheckplanstate = ExecInitNode(planTree, rcestate, 0);
+ /*
+ * At this point, we had better not have received any new invalidation messages
+ * that would have caused the plan tree to go stale.
+ */
+ Assert(ExecPlanStillValid(rcestate) && epqstate->recheckplanstate);
+
MemoryContextSwitchTo(oldcontext);
}
@@ -3008,6 +3069,10 @@ EvalPlanQualEnd(EPQState *epqstate)
MemoryContext oldcontext;
ListCell *l;
+ /* Nothing to do if EvalPlanQualInit() wasn't done to begin with. */
+ if (epqstate->parentestate == NULL)
+ return;
+
rtsize = epqstate->parentestate->es_range_table_size;
/*
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index cc2b8ccab7..bfa2a8ec18 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1248,8 +1248,17 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(¶mspace);
- /* Create a QueryDesc for the query. */
+ /*
+ * Create a QueryDesc for the query. Note that no CachedPlan is available
+ * here even if the leader may have gotten the plan tree from one. That's
+ * fine though, because the leader would have taken the locks necessary
+ * for the plan tree that we have here to be fully valid. That holds
+ * even though we will take our own copies of those locks in
+ * ExecGetRangeTableRelation(), because every lock we take there is one
+ * the leader already holds.
+ */
return CreateQueryDesc(pstmt,
+ NULL,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
@@ -1430,7 +1439,8 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Start up the executor */
queryDesc->plannedstmt->jitFlags = fpes->jit_flags;
- ExecutorStart(queryDesc, fpes->eflags);
+ /* OK to ignore the return value; plan can't become invalid. */
+ (void) ExecutorStart(queryDesc, fpes->eflags);
/* Special executor initialization steps for parallel workers */
queryDesc->planstate->state->es_query_dsa = area;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index eb8a87fd63..cf73d28baa 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -513,6 +513,13 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
oldcxt = MemoryContextSwitchTo(proute->memcxt);
+ /*
+ * Note that while we normally check ExecPlanStillValid(estate) after each
+ * lock taken during execution initialization, it is fine not to do so for
+ * partitions opened here for tuple routing. Locks taken here can't
+ * possibly invalidate the plan given that the plan doesn't contain any
+ * info about those partitions.
+ */
partrel = table_open(partOid, RowExclusiveLock);
leaf_part_rri = makeNode(ResultRelInfo);
@@ -1111,6 +1118,9 @@ ExecInitPartitionDispatchInfo(EState *estate,
* Only sub-partitioned tables need to be locked here. The root
* partitioned table will already have been locked as it's referenced in
* the query's rtable.
+ *
+ * See the comment in ExecInitPartitionInfo() about taking locks and
+ * not checking ExecPlanStillValid(estate) here.
*/
if (partoid != RelationGetRelid(proute->partition_root))
rel = table_open(partoid, RowExclusiveLock);
@@ -1801,6 +1811,8 @@ ExecInitPartitionPruning(PlanState *planstate,
/* Create the working data structure for pruning */
prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Perform an initial partition prune pass, if required.
@@ -1927,6 +1939,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* duration of this executor run.
*/
partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
partkey = RelationGetPartitionKey(partrel);
partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
partrel);
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 653f74cf58..2dcacafd03 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -135,10 +135,17 @@ static bool ExecShutdownNode_walker(PlanState *node, void *context);
* 'estate' is the shared execution state for the plan tree
* 'eflags' is a bitwise OR of flag bits described in executor.h
*
- * Returns a PlanState node corresponding to the given Plan node.
+ * Returns a PlanState node corresponding to the given Plan node or NULL.
*
- * As a side-effect, all PlanState nodes that are created are appended to
- * estate->es_planstate_nodes for the cleanup processing in ExecEndPlan().
+ * NULL may be returned either if the input node is NULL or if the plan
+ * tree that the node is a part of is found to have been invalidated when
+ * taking a lock on the relation mentioned in the node or in a child
+ * node. The latter case arises if the plan tree contains inheritance/
+ * partition child tables and is from a CachedPlan.
+ *
+ * As a side-effect, all PlanState nodes that are successfully created are
+ * appended to estate->es_planstate_nodes for the cleanup processing in
+ * ExecEndPlan().
* ------------------------------------------------------------------------
*/
PlanState *
@@ -391,6 +398,13 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
break;
}
+ if (!ExecPlanStillValid(estate))
+ {
+ Assert(result == NULL);
+ return NULL;
+ }
+
+ Assert(result != NULL);
ExecSetExecProcNode(result, result->ExecProcNode);
/*
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index b567165003..ee12235b2f 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -806,7 +806,25 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rte->rtekind == RTE_RELATION);
- if (!IsParallelWorker())
+ if (IsParallelWorker() ||
+ (estate->es_cachedplan != NULL && !rte->inFromCl))
+ {
+ /*
+ * Take a lock if we are a parallel worker or if this is a child
+ * table referenced in a cached plan.
+ *
+ * Parallel workers need to have their own local lock on the
+ * relation. This ensures sane behavior in case the parent process
+ * exits before we do.
+ *
+ * When executing a cached plan, child tables must be locked
+ * here, because plancache.c (GetCachedPlan()) would only have
+ * locked tables mentioned in the query, that is, tables whose
+ * RTEs' inFromCl is true.
+ */
+ rel = table_open(rte->relid, rte->rellockmode);
+ }
+ else
{
/*
* In a normal query, we should already have the appropriate lock,
@@ -819,15 +837,6 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rte->rellockmode == AccessShareLock ||
CheckRelationLockedByMe(rel, rte->rellockmode, false));
}
- else
- {
- /*
- * If we are a parallel worker, we need to obtain our own local
- * lock on the relation. This ensures sane behavior in case the
- * parent process exits before we do.
- */
- rel = table_open(rte->relid, rte->rellockmode);
- }
estate->es_relations[rti - 1] = rel;
}
@@ -835,6 +844,38 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
return rel;
}
+/*
+ * ExecLockAppendNonLeafRelations
+ * Lock non-leaf relations whose children are scanned by a given
+ * Append/MergeAppend node
+ */
+void
+ExecLockAppendNonLeafRelations(EState *estate, List *allpartrelids)
+{
+ ListCell *l;
+
+ /* This should get called only when executing cached plans. */
+ Assert(estate->es_cachedplan != NULL);
+ foreach(l, allpartrelids)
+ {
+ Bitmapset *partrelids = lfirst_node(Bitmapset, l);
+ int i;
+
+ /*
+ * Note that we don't lock the first member (i=0) of each bitmapset
+ * because it stands for the root parent mentioned in the query that
+ * should always have been locked before entering the executor.
+ */
+ i = 0;
+ while ((i = bms_next_member(partrelids, i)) > 0)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(i, estate);
+
+ LockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
+}
+
/*
* ExecInitResultRelation
* Open relation given by the passed-in RT index and fill its
@@ -850,6 +891,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
Relation resultRelationDesc;
resultRelationDesc = ExecGetRangeTableRelation(estate, rti);
+ if (!ExecPlanStillValid(estate))
+ return;
InitResultRelInfo(resultRelInfo,
resultRelationDesc,
rti,
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index f55424eb5a..4ddf4fd7a9 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -838,6 +838,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest = None_Receiver;
es->qd = CreateQueryDesc(es->stmt,
+ NULL, /* fmgr_sql() doesn't use CachedPlans */
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -862,7 +863,9 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
eflags = EXEC_FLAG_SKIP_TRIGGERS;
else
eflags = 0; /* default run-to-completion flags */
- ExecutorStart(es->qd, eflags);
+
+ /* OK to ignore the return value; plan can't become invalid. */
+ (void) ExecutorStart(es->qd, eflags);
}
es->status = F_EXEC_RUN;
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index e9d9ab6bdd..9553a85115 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -3304,6 +3304,8 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
eflags &= ~EXEC_FLAG_REWIND;
outerPlan = outerPlan(node);
outerPlanState(aggstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* initialize source tuple type.
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 9148d7d3b1..222434a84d 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -133,6 +133,25 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_syncdone = false;
appendstate->as_begun = false;
+ /*
+ * Lock non-leaf partitions whose leaf children are present in
+ * node->appendplans. Only need to do so if executing a cached
+ * plan, because child tables present in cached plans are not
+ * locked before execution.
+ *
+ * XXX - some of the non-leaf partitions may also be mentioned in
+ * part_prune_info, in which case they would get locked again in
+ * ExecInitPartitionPruning(), because it calls
+ * ExecGetRangeTableRelation(), which locks child tables.
+ */
+ if (estate->es_cachedplan)
+ {
+ ExecLockAppendNonLeafRelations(estate, node->allpartrelids);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
+ }
+
/* If run-time partition pruning is enabled, then set that up now */
if (node->part_prune_info != NULL)
{
@@ -147,6 +166,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
list_length(node->appendplans),
node->part_prune_info,
&validsubplans);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
@@ -221,6 +242,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
firstvalid = j;
appendplanstates[j++] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
}
appendstate->as_first_partial_plan = firstvalid;
diff --git a/src/backend/executor/nodeBitmapAnd.c b/src/backend/executor/nodeBitmapAnd.c
index 147592f7e2..53afcef21c 100644
--- a/src/backend/executor/nodeBitmapAnd.c
+++ b/src/backend/executor/nodeBitmapAnd.c
@@ -88,8 +88,9 @@ ExecInitBitmapAnd(BitmapAnd *node, EState *estate, int eflags)
foreach(l, node->bitmapplans)
{
initNode = (Plan *) lfirst(l);
- bitmapplanstates[i] = ExecInitNode(initNode, estate, eflags);
- i++;
+ bitmapplanstates[i++] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
}
/*
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index d58ee4f4e1..388a02ec99 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -760,11 +760,15 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* initialize child nodes
*/
outerPlanState(scanstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* get the scan type from the relation descriptor.
diff --git a/src/backend/executor/nodeBitmapIndexscan.c b/src/backend/executor/nodeBitmapIndexscan.c
index 83ec9ede89..99015812a1 100644
--- a/src/backend/executor/nodeBitmapIndexscan.c
+++ b/src/backend/executor/nodeBitmapIndexscan.c
@@ -211,6 +211,7 @@ BitmapIndexScanState *
ExecInitBitmapIndexScan(BitmapIndexScan *node, EState *estate, int eflags)
{
BitmapIndexScanState *indexstate;
+ Relation indexRelation;
LOCKMODE lockmode;
/* check for unsupported flags */
@@ -262,7 +263,13 @@ ExecInitBitmapIndexScan(BitmapIndexScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
- indexstate->biss_RelationDesc = index_open(node->indexid, lockmode);
+ indexRelation = index_open(node->indexid, lockmode);
+ if (!ExecPlanStillValid(estate))
+ {
+ index_close(indexRelation, lockmode);
+ return NULL;
+ }
+ indexstate->biss_RelationDesc = indexRelation;
/*
* Initialize index-specific scan state
diff --git a/src/backend/executor/nodeBitmapOr.c b/src/backend/executor/nodeBitmapOr.c
index 736852a0ae..425f22ee48 100644
--- a/src/backend/executor/nodeBitmapOr.c
+++ b/src/backend/executor/nodeBitmapOr.c
@@ -89,8 +89,9 @@ ExecInitBitmapOr(BitmapOr *node, EState *estate, int eflags)
foreach(l, node->bitmapplans)
{
initNode = (Plan *) lfirst(l);
- bitmapplanstates[i] = ExecInitNode(initNode, estate, eflags);
- i++;
+ bitmapplanstates[i++] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
}
/*
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
index bd42c65b29..91239cc500 100644
--- a/src/backend/executor/nodeCustom.c
+++ b/src/backend/executor/nodeCustom.c
@@ -61,6 +61,8 @@ ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
if (scanrelid > 0)
{
scan_rel = ExecOpenScanRelation(estate, scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
css->ss.ss_currentRelation = scan_rel;
}
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index e6616dd718..71495313db 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -173,6 +173,8 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
if (scanrelid > 0)
{
currentRelation = ExecOpenScanRelation(estate, scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
scanstate->ss.ss_currentRelation = currentRelation;
fdwroutine = GetFdwRoutineForRelation(currentRelation, true);
}
@@ -264,6 +266,8 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
if (outerPlan(node))
outerPlanState(scanstate) =
ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Tell the FDW to initialize the scan.
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index f7a69f185b..c5652aeb2d 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -89,6 +89,9 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
*/
outerNode = outerPlan(node);
outerPlanState(gatherstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
+
tupDesc = ExecGetResultType(outerPlanState(gatherstate));
/*
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index d357ff0c47..1191b9e420 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -108,6 +108,8 @@ ExecInitGatherMerge(GatherMerge *node, EState *estate, int eflags)
*/
outerNode = outerPlan(node);
outerPlanState(gm_state) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Leader may access ExecProcNode result directly (if
diff --git a/src/backend/executor/nodeGroup.c b/src/backend/executor/nodeGroup.c
index 2badcc7e60..b4c3044c1f 100644
--- a/src/backend/executor/nodeGroup.c
+++ b/src/backend/executor/nodeGroup.c
@@ -185,6 +185,8 @@ ExecInitGroup(Group *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(grpstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize scan slot and type.
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index edd2324384..b2119febb6 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -386,6 +386,8 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(hashstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* initialize our result slot and type. No need to build projection
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 8078d7f229..d5ff80660e 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -752,8 +752,12 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
hashNode = (Hash *) innerPlan(node);
outerPlanState(hjstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
outerDesc = ExecGetResultType(outerPlanState(hjstate));
innerPlanState(hjstate) = ExecInitNode((Plan *) hashNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
innerDesc = ExecGetResultType(innerPlanState(hjstate));
/*
diff --git a/src/backend/executor/nodeIncrementalSort.c b/src/backend/executor/nodeIncrementalSort.c
index 52b146cfb8..785896e5ea 100644
--- a/src/backend/executor/nodeIncrementalSort.c
+++ b/src/backend/executor/nodeIncrementalSort.c
@@ -1041,6 +1041,8 @@ ExecInitIncrementalSort(IncrementalSort *node, EState *estate, int eflags)
* nodes may be able to do something more useful.
*/
outerPlanState(incrsortstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize scan slot and type.
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 0b43a9b969..ea8bef4b97 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -490,6 +490,7 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
{
IndexOnlyScanState *indexstate;
Relation currentRelation;
+ Relation indexRelation;
LOCKMODE lockmode;
TupleDesc tupDesc;
@@ -512,6 +513,8 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
indexstate->ss.ss_currentRelation = currentRelation;
indexstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
@@ -564,7 +567,13 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
- indexstate->ioss_RelationDesc = index_open(node->indexid, lockmode);
+ indexRelation = index_open(node->indexid, lockmode);
+ if (!ExecPlanStillValid(estate))
+ {
+ index_close(indexRelation, lockmode);
+ return NULL;
+ }
+ indexstate->ioss_RelationDesc = indexRelation;
/*
* Initialize index-specific scan state
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 4540c7781d..956e9e5543 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -904,6 +904,7 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
{
IndexScanState *indexstate;
Relation currentRelation;
+ Relation indexRelation;
LOCKMODE lockmode;
/*
@@ -925,6 +926,8 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
indexstate->ss.ss_currentRelation = currentRelation;
indexstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
@@ -969,7 +972,13 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
- indexstate->iss_RelationDesc = index_open(node->indexid, lockmode);
+ indexRelation = index_open(node->indexid, lockmode);
+ if (!ExecPlanStillValid(estate))
+ {
+ index_close(indexRelation, lockmode);
+ return NULL;
+ }
+ indexstate->iss_RelationDesc = indexRelation;
/*
* Initialize index-specific scan state
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index a75099dd73..a1fc36a3f0 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -476,6 +476,8 @@ ExecInitLimit(Limit *node, EState *estate, int eflags)
*/
outerPlan = outerPlan(node);
outerPlanState(limitstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* initialize child expressions
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 55de8d3d65..ff86a82b92 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -322,6 +322,8 @@ ExecInitLockRows(LockRows *node, EState *estate, int eflags)
* then initialize outer plan
*/
outerPlanState(lrstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/* node returns unmodified slots from the outer plan */
lrstate->ps.resultopsset = true;
diff --git a/src/backend/executor/nodeMaterial.c b/src/backend/executor/nodeMaterial.c
index ef04e9a8e7..8d02ac0ccb 100644
--- a/src/backend/executor/nodeMaterial.c
+++ b/src/backend/executor/nodeMaterial.c
@@ -214,6 +214,8 @@ ExecInitMaterial(Material *node, EState *estate, int eflags)
outerPlan = outerPlan(node);
outerPlanState(matstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize result type and slot. No need to initialize projection info
diff --git a/src/backend/executor/nodeMemoize.c b/src/backend/executor/nodeMemoize.c
index 61578d4b5c..a994d48fb2 100644
--- a/src/backend/executor/nodeMemoize.c
+++ b/src/backend/executor/nodeMemoize.c
@@ -938,6 +938,8 @@ ExecInitMemoize(Memoize *node, EState *estate, int eflags)
outerNode = outerPlan(node);
outerPlanState(mstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize return slot and type. No need to initialize projection info
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 8aa64944c9..14d07c30e8 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -81,6 +81,25 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.state = estate;
mergestate->ps.ExecProcNode = ExecMergeAppend;
+ /*
+ * Lock non-leaf partitions whose leaf children are present in
+ * node->mergeplans. Only need to do so if executing a cached
+ * plan, because child tables present in cached plans are not
+ * locked before execution.
+ *
+ * XXX - some of the non-leaf partitions may also be mentioned in
+ * part_prune_info, in which case they would get locked again in
+ * ExecInitPartitionPruning(), because it calls
+ * ExecGetRangeTableRelation(), which locks child tables.
+ */
+ if (estate->es_cachedplan)
+ {
+ ExecLockAppendNonLeafRelations(estate, node->allpartrelids);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
+ }
+
/* If run-time partition pruning is enabled, then set that up now */
if (node->part_prune_info != NULL)
{
@@ -95,6 +114,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
list_length(node->mergeplans),
node->part_prune_info,
&validsubplans);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
@@ -151,6 +172,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
Plan *initNode = (Plan *) list_nth(node->mergeplans, i);
mergeplanstates[j++] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
}
mergestate->ps.ps_ProjInfo = NULL;
diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 7b530d9088..0d92ec278a 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -1490,11 +1490,15 @@ ExecInitMergeJoin(MergeJoin *node, EState *estate, int eflags)
mergestate->mj_SkipMarkRestore = node->skip_mark_restore;
outerPlanState(mergestate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
outerDesc = ExecGetResultType(outerPlanState(mergestate));
innerPlanState(mergestate) = ExecInitNode(innerPlan(node), estate,
mergestate->mj_SkipMarkRestore ?
eflags :
(eflags | EXEC_FLAG_MARK));
+ if (!ExecPlanStillValid(estate))
+ return NULL;
innerDesc = ExecGetResultType(innerPlanState(mergestate));
/*
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index bdbaa4753b..2245a67397 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -3984,6 +3984,9 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
linitial_int(node->resultRelations));
}
+ if (!ExecPlanStillValid(estate))
+ return NULL;
+
/* set up epqstate with dummy subplan data for the moment */
EvalPlanQualInit(&mtstate->mt_epqstate, estate, NULL, NIL,
node->epqParam, node->resultRelations);
@@ -4011,6 +4014,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
if (resultRelInfo != mtstate->rootResultRelInfo)
{
ExecInitResultRelation(estate, resultRelInfo, resultRelation);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* For child result relations, store the root result relation
@@ -4038,6 +4043,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* Now we may initialize the subplan.
*/
outerPlanState(mtstate) = ExecInitNode(subplan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Do additional per-result-relation initialization.
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index 5cfb50a366..e24554f4f8 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -295,11 +295,15 @@ ExecInitNestLoop(NestLoop *node, EState *estate, int eflags)
* values.
*/
outerPlanState(nlstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
if (node->nestParams == NIL)
eflags |= EXEC_FLAG_REWIND;
else
eflags &= ~EXEC_FLAG_REWIND;
innerPlanState(nlstate) = ExecInitNode(innerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize result slot, type and projection.
diff --git a/src/backend/executor/nodeProjectSet.c b/src/backend/executor/nodeProjectSet.c
index 4a388220ee..863bf2cc65 100644
--- a/src/backend/executor/nodeProjectSet.c
+++ b/src/backend/executor/nodeProjectSet.c
@@ -247,6 +247,8 @@ ExecInitProjectSet(ProjectSet *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(state) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* we don't use inner plan
diff --git a/src/backend/executor/nodeRecursiveunion.c b/src/backend/executor/nodeRecursiveunion.c
index aee31c7139..3f3de771d0 100644
--- a/src/backend/executor/nodeRecursiveunion.c
+++ b/src/backend/executor/nodeRecursiveunion.c
@@ -244,7 +244,11 @@ ExecInitRecursiveUnion(RecursiveUnion *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(rustate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
innerPlanState(rustate) = ExecInitNode(innerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* If hashing, precompute fmgr lookup data for inner loop, and create the
diff --git a/src/backend/executor/nodeResult.c b/src/backend/executor/nodeResult.c
index a100b144be..f2206af451 100644
--- a/src/backend/executor/nodeResult.c
+++ b/src/backend/executor/nodeResult.c
@@ -208,6 +208,8 @@ ExecInitResult(Result *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(resstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* we don't use inner plan
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index d7e22b1dbb..22357e7a0e 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -125,6 +125,8 @@ ExecInitSampleScan(SampleScan *node, EState *estate, int eflags)
ExecOpenScanRelation(estate,
node->scan.scanrelid,
eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/* we won't set up the HeapScanDesc till later */
scanstate->ss.ss_currentScanDesc = NULL;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 4da0f28f7b..b0b34cd14e 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -153,6 +153,8 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
ExecOpenScanRelation(estate,
node->scan.scanrelid,
eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/* and create slot with the appropriate rowtype */
ExecInitScanTupleSlot(estate, &scanstate->ss,
diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index f7db9a3415..3535aa298c 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -528,6 +528,8 @@ ExecInitSetOp(SetOp *node, EState *estate, int eflags)
if (node->strategy == SETOP_HASHED)
eflags &= ~EXEC_FLAG_REWIND;
outerPlanState(setopstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
outerDesc = ExecGetResultType(outerPlanState(setopstate));
/*
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index 078d041c40..547203ebfd 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -263,6 +263,8 @@ ExecInitSort(Sort *node, EState *estate, int eflags)
eflags &= ~(EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK);
outerPlanState(sortstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize scan slot and type.
diff --git a/src/backend/executor/nodeSubqueryscan.c b/src/backend/executor/nodeSubqueryscan.c
index bc55a82fc3..8b6629c939 100644
--- a/src/backend/executor/nodeSubqueryscan.c
+++ b/src/backend/executor/nodeSubqueryscan.c
@@ -124,6 +124,8 @@ ExecInitSubqueryScan(SubqueryScan *node, EState *estate, int eflags)
* initialize subquery
*/
subquerystate->subplan = ExecInitNode(node->subplan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize scan slot and type (needed by ExecAssignScanProjectionInfo)
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 2124c55ef5..613b377c7c 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -386,6 +386,8 @@ ExecInitTidRangeScan(TidRangeScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
tidrangestate->ss.ss_currentRelation = currentRelation;
tidrangestate->ss.ss_currentScanDesc = NULL; /* no table scan here */
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index 862bd0330b..1b0a2d8083 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -529,6 +529,8 @@ ExecInitTidScan(TidScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
tidstate->ss.ss_currentRelation = currentRelation;
tidstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
diff --git a/src/backend/executor/nodeUnique.c b/src/backend/executor/nodeUnique.c
index 50babacdc8..ae9af8f21e 100644
--- a/src/backend/executor/nodeUnique.c
+++ b/src/backend/executor/nodeUnique.c
@@ -136,6 +136,8 @@ ExecInitUnique(Unique *node, EState *estate, int eflags)
* then initialize outer plan
*/
outerPlanState(uniquestate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize result slot and type. Unique nodes do no projections, so
diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 648cdadc32..77de2d0c22 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -2458,6 +2458,8 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags)
*/
outerPlan = outerPlan(node);
outerPlanState(winstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* initialize source tuple type (which is also the tuple type that we'll
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index d36ca35d3a..9c4ed74240 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1582,6 +1582,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
Snapshot snapshot;
MemoryContext oldcontext;
Portal portal;
+ bool plan_valid;
SPICallbackArg spicallbackarg;
ErrorContextCallback spierrcontext;
@@ -1623,6 +1624,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
_SPI_current->processed = 0;
_SPI_current->tuptable = NULL;
+replan:
/* Create the portal */
if (name == NULL || name[0] == '\0')
{
@@ -1766,15 +1768,23 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
}
/*
- * Start portal execution.
+ * Start portal execution. If the portal contains a cached plan that turns
+ * out to have been invalidated while initializing one of its plan trees,
+ * the portal must be dropped and recreated so that a fresh plan is used.
*/
- PortalStart(portal, paramLI, 0, snapshot);
+ plan_valid = PortalStart(portal, paramLI, 0, snapshot);
Assert(portal->strategy != PORTAL_MULTI_QUERY);
/* Pop the error context stack */
error_context_stack = spierrcontext.previous;
+ if (!plan_valid)
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
+
/* Pop the SPI stack */
_SPI_end_call(true);
@@ -2552,6 +2562,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
+replan:
cplan = GetCachedPlan(plansource, options->params,
plan_owner, _SPI_current->queryEnv);
@@ -2669,6 +2680,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
snap = InvalidSnapshot;
qdesc = CreateQueryDesc(stmt,
+ cplan,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
@@ -2682,10 +2694,16 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
else
eflags = EXEC_FLAG_SKIP_TRIGGERS;
- ExecutorStart(qdesc, eflags);
+ if (!ExecutorStart(qdesc, eflags))
+ {
+ ExecutorEnd(qdesc);
+ FreeQueryDesc(qdesc);
+ Assert(cplan);
+ ReleaseCachedPlan(cplan, plan_owner);
+ goto replan;
+ }
res = _SPI_pquery(qdesc, canSetTag ? options->tcount : 0);
-
FreeQueryDesc(qdesc);
}
else
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index ee9b89a672..c807e9cdcc 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -27,6 +27,7 @@
#include "storage/procarray.h"
#include "storage/sinvaladt.h"
#include "utils/inval.h"
+#include "utils/lsyscache.h"
/*
@@ -364,6 +365,50 @@ CheckRelationLockedByMe(Relation relation, LOCKMODE lockmode, bool orstronger)
return false;
}
+/*
+ * CheckRelLockedByMe
+ *
+ * Returns true if the current transaction holds a lock of mode 'lockmode' on
+ * the given relation. If 'orstronger' is true, a stronger lockmode is also OK.
+ * ("Stronger" is defined as "numerically higher", which is a bit
+ * semantically dubious but is OK for the purposes we use this for.)
+ */
+bool
+CheckRelLockedByMe(Oid relid, LOCKMODE lockmode, bool orstronger)
+{
+ Oid dbId = get_rel_relisshared(relid) ? InvalidOid : MyDatabaseId;
+ LOCKTAG tag;
+
+ SET_LOCKTAG_RELATION(tag, dbId, relid);
+
+ if (LockHeldByMe(&tag, lockmode))
+ return true;
+
+ if (orstronger)
+ {
+ LOCKMODE slockmode;
+
+ for (slockmode = lockmode + 1;
+ slockmode <= MaxLockMode;
+ slockmode++)
+ {
+ if (LockHeldByMe(&tag, slockmode))
+ {
+#ifdef NOT_USED
+ /* Sometimes this might be useful for debugging purposes */
+ elog(WARNING, "lock mode %s substituted for %s on relation %s",
+ GetLockmodeName(tag.locktag_lockmethodid, slockmode),
+ GetLockmodeName(tag.locktag_lockmethodid, lockmode),
+ get_rel_name(relid));
+#endif
+ return true;
+ }
+ }
+ }
+
+ return false;
+}
+
/*
* LockHasWaitersRelation
*
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 36cc99ec9c..88724a8d67 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1232,7 +1232,12 @@ exec_simple_query(const char *query_string)
/*
* Start the portal. No parameters here.
*/
- PortalStart(portal, NULL, 0, InvalidSnapshot);
+ {
+ bool plan_valid PG_USED_FOR_ASSERTS_ONLY;
+
+ plan_valid = PortalStart(portal, NULL, 0, InvalidSnapshot);
+ Assert(plan_valid);
+ }
/*
* Select the appropriate output format: text unless we are doing a
@@ -1737,6 +1742,7 @@ exec_bind_message(StringInfo input_message)
"commands ignored until end of transaction block"),
errdetail_abort()));
+replan:
/*
* Create the portal. Allow silent replacement of an existing portal only
* if the unnamed portal is specified.
@@ -2028,9 +2034,15 @@ exec_bind_message(StringInfo input_message)
PopActiveSnapshot();
/*
- * And we're ready to start portal execution.
+ * Start portal execution. If the portal contains a cached plan that turns
+ * out to have been invalidated while initializing one of its plan trees,
+ * the portal must be dropped and recreated so that a fresh plan is used.
*/
- PortalStart(portal, params, 0, InvalidSnapshot);
+ if (!PortalStart(portal, params, 0, InvalidSnapshot))
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
/*
* Apply the result format requests to the portal.
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 701808f303..48cd6f4304 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -60,6 +60,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -72,6 +73,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->cplan = cplan; /* CachedPlan, if plan is from one */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -341,10 +343,12 @@ FetchStatementTargetList(Node *stmt)
* presently ignored for non-PORTAL_ONE_SELECT portals (it's only intended
* to be used for cursors).
*
- * On return, portal is ready to accept PortalRun() calls, and the result
- * tupdesc (if any) is known.
+ * True is returned if the portal is ready to accept PortalRun() calls, and
+ * the result tupdesc (if any) is known. False is returned if the plan tree
+ * is no longer valid, in which case the caller must retry after generating
+ * a new CachedPlan.
*/
-void
+bool
PortalStart(Portal portal, ParamListInfo params,
int eflags, Snapshot snapshot)
{
@@ -353,6 +357,7 @@ PortalStart(Portal portal, ParamListInfo params,
MemoryContext oldContext;
QueryDesc *queryDesc;
int myeflags = 0;
+ bool plan_valid = true;
Assert(PortalIsValid(portal));
Assert(portal->status == PORTAL_DEFINED);
@@ -407,6 +412,7 @@ PortalStart(Portal portal, ParamListInfo params,
* set the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->cplan,
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -431,8 +437,19 @@ PortalStart(Portal portal, ParamListInfo params,
else
myeflags = eflags;
- /* Call ExecutorStart to prepare the plan for execution. */
- ExecutorStart(queryDesc, myeflags);
+ /*
+ * Call ExecutorStart to prepare the plan for execution. A cached
+ * plan may get invalidated during plan initialization.
+ */
+ if (!ExecutorStart(queryDesc, myeflags))
+ {
+ Assert(queryDesc->cplan);
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+ PopActiveSnapshot();
+ plan_valid = false;
+ goto plan_init_failed;
+ }
/*
* This tells PortalCleanup to shut down the executor, though
@@ -525,7 +542,7 @@ PortalStart(Portal portal, ParamListInfo params,
* Create the QueryDesc. DestReceiver will be set in
* PortalRunMulti() before calling ExecutorRun().
*/
- queryDesc = CreateQueryDesc(plan,
+ queryDesc = CreateQueryDesc(plan, portal->cplan,
portal->sourceText,
!is_utility ?
GetActiveSnapshot() :
@@ -541,7 +558,20 @@ PortalStart(Portal portal, ParamListInfo params,
if (is_utility)
continue;
- ExecutorStart(queryDesc, myeflags);
+ /*
+ * Call ExecutorStart to prepare the plan for
+ * execution. A cached plan may get invalidated
+ * during plan initialization.
+ */
+ if (!ExecutorStart(queryDesc, myeflags))
+ {
+ PopActiveSnapshot();
+ Assert(queryDesc->cplan);
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+ plan_valid = false;
+ goto plan_init_failed;
+ }
PopActiveSnapshot();
}
}
@@ -563,12 +593,15 @@ PortalStart(Portal portal, ParamListInfo params,
}
PG_END_TRY();
+ portal->status = PORTAL_READY;
+
+plan_init_failed:
MemoryContextSwitchTo(oldContext);
ActivePortal = saveActivePortal;
CurrentResourceOwner = saveResourceOwner;
- portal->status = PORTAL_READY;
+ return plan_valid;
}
/*
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index fc6d267e44..2725d02312 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -2095,6 +2095,27 @@ get_rel_persistence(Oid relid)
return result;
}
+/*
+ * get_rel_relisshared
+ *
+ * Returns whether the given relation is shared
+ */
+bool
+get_rel_relisshared(Oid relid)
+{
+ HeapTuple tp;
+ Form_pg_class reltup;
+ bool result;
+
+ tp = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+ if (!HeapTupleIsValid(tp))
+ elog(ERROR, "cache lookup failed for relation %u", relid);
+ reltup = (Form_pg_class) GETSTRUCT(tp);
+ result = reltup->relisshared;
+ ReleaseSysCache(tp);
+
+ return result;
+}
/* ---------- TRANSFORM CACHE ---------- */
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index d67cd9a405..c5a7616b33 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -102,13 +102,13 @@ static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool GenericPlanIsValid(CachedPlan *cplan);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -790,8 +790,15 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
*
- * On a "true" return, we have acquired the locks needed to run the plan.
- * (We must do this for the "true" result to be race-condition-free.)
+ * If the plan contains any child relations added by the planner, they will
+ * not have been locked yet, because AcquirePlannerLocks() only locks the
+ * relations present in the original query's range table (that is, before
+ * entering the planner). So the plan could go stale before it reaches
+ * execution if any of those child relations are modified concurrently. The
+ * executor must check that the plan (CachedPlan) is still valid after taking
+ * a lock on each of the child tables during the plan initialization phase,
+ * and if it is not, ask the caller to recreate the plan.
*/
static bool
CheckCachedPlan(CachedPlanSource *plansource)
@@ -805,60 +812,56 @@ CheckCachedPlan(CachedPlanSource *plansource)
if (!plan)
return false;
- Assert(plan->magic == CACHEDPLAN_MAGIC);
- /* Generic plans are never one-shot */
- Assert(!plan->is_oneshot);
+ if (GenericPlanIsValid(plan))
+ return true;
/*
- * If plan isn't valid for current role, we can't use it.
+ * Plan has been invalidated, so unlink it from the parent and release it.
*/
- if (plan->is_valid && plan->dependsOnRole &&
- plan->planRoleId != GetUserId())
- plan->is_valid = false;
+ ReleaseGenericPlan(plansource);
- /*
- * If it appears valid, acquire locks and recheck; this is much the same
- * logic as in RevalidateCachedQuery, but for a plan.
- */
- if (plan->is_valid)
+ return false;
+}
+
+/*
+ * GenericPlanIsValid
+ * Is a generic plan still valid?
+ *
+ * It may have gone stale due to concurrent schema modifications of relations
+ * mentioned in the plan, or for the other reasons checked below.
+ */
+static bool
+GenericPlanIsValid(CachedPlan *cplan)
+{
+ Assert(cplan != NULL);
+ Assert(cplan->magic == CACHEDPLAN_MAGIC);
+ /* Generic plans are never one-shot */
+ Assert(!cplan->is_oneshot);
+
+ if (cplan->is_valid)
{
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
- Assert(plan->refcount > 0);
-
- AcquireExecutorLocks(plan->stmt_list, true);
+ Assert(cplan->refcount > 0);
/*
- * If plan was transient, check to see if TransactionXmin has
- * advanced, and if so invalidate it.
+ * If plan isn't valid for current role, we can't use it.
*/
- if (plan->is_valid &&
- TransactionIdIsValid(plan->saved_xmin) &&
- !TransactionIdEquals(plan->saved_xmin, TransactionXmin))
- plan->is_valid = false;
+ if (cplan->dependsOnRole && cplan->planRoleId != GetUserId())
+ cplan->is_valid = false;
/*
- * By now, if any invalidation has happened, the inval callback
- * functions will have marked the plan invalid.
+ * If plan was transient, check to see if TransactionXmin has
+ * advanced, and if so invalidate it.
*/
- if (plan->is_valid)
- {
- /* Successfully revalidated and locked the query. */
- return true;
- }
-
- /* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ if (TransactionIdIsValid(cplan->saved_xmin) &&
+ !TransactionIdEquals(cplan->saved_xmin, TransactionXmin))
+ cplan->is_valid = false;
}
- /*
- * Plan has been invalidated, so unlink it from the parent and release it.
- */
- ReleaseGenericPlan(plansource);
-
- return false;
+ return cplan->is_valid;
}
/*
@@ -1128,8 +1131,16 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
- * On return, the plan is valid and we have sufficient locks to begin
- * execution.
+ * On return, the plan is valid unless it contains inheritance/partition child
+ * tables, which will not have been locked, because only the tables mentioned
+ * in the original query are locked here. Child tables are locked by the
+ * executor when initializing the plan tree, and if the plan gets invalidated
+ * as a result of taking those locks, the executor must ask the caller to get
+ * a new plan by calling here again. Locking of the child tables is deferred
+ * to the executor in this manner because not all of them may need to be
+ * locked: some may get pruned during executor plan initialization, which
+ * performs initial pruning on any nodes that support partition pruning.
*
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
@@ -1164,7 +1175,10 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
{
if (CheckCachedPlan(plansource))
{
- /* We want a generic plan, and we already have a valid one */
+ /*
+ * We want a generic plan, and we already have a valid one (but see
+ * the header comment).
+ */
plan = plansource->gplan;
Assert(plan->magic == CACHEDPLAN_MAGIC);
}
@@ -1362,8 +1376,8 @@ CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
}
/*
- * Reject if AcquireExecutorLocks would have anything to do. This is
- * probably unnecessary given the previous check, but let's be safe.
+ * Reject if the executor would need to take any locks beyond those
+ * taken by AcquirePlannerLocks() on the given query.
*/
foreach(lc, plan->stmt_list)
{
@@ -1739,58 +1753,6 @@ QueryListGetPrimaryStmt(List *stmts)
return NULL;
}
-/*
- * AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
- */
-static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
-{
- ListCell *lc1;
-
- foreach(lc1, stmt_list)
- {
- PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
-
- if (plannedstmt->commandType == CMD_UTILITY)
- {
- /*
- * Ignore utility statements, except those (such as EXPLAIN) that
- * contain a parsed-but-not-planned query. Note: it's okay to use
- * ScanQueryForLocks, even though the query hasn't been through
- * rule rewriting, because rewriting doesn't change the query
- * representation.
- */
- Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
-
- if (query)
- ScanQueryForLocks(query, acquire);
- continue;
- }
-
- foreach(lc2, plannedstmt->rtable)
- {
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
-
- if (!(rte->rtekind == RTE_RELATION ||
- (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
- continue;
-
- /*
- * Acquire the appropriate type of lock on each relation OID. Note
- * that we don't actually try to open the rel, and hence will not
- * fail if it's been dropped entirely --- we'll just transiently
- * acquire a non-conflicting lock.
- */
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
- }
- }
-}
-
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 08ea852b65..392abb5150 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -88,7 +88,7 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern QueryDesc *ExplainQueryDesc(PlannedStmt *stmt,
+extern QueryDesc *ExplainQueryDesc(PlannedStmt *stmt, struct CachedPlan *cplan,
const char *queryString, IntoClause *into, ExplainState *es,
ParamListInfo params, QueryEnvironment *queryEnv);
extern void ExplainOnePlan(QueryDesc *queryDesc,
@@ -108,6 +108,7 @@ extern void ExplainQueryParameters(ExplainState *es, ParamListInfo params, int m
extern void ExplainBeginOutput(ExplainState *es);
extern void ExplainEndOutput(ExplainState *es);
+extern void ExplainResetOutput(ExplainState *es);
extern void ExplainSeparatePlans(ExplainState *es);
extern void ExplainPropertyList(const char *qlabel, List *data,
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index af2bf36dfb..4b7368a0dc 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -32,9 +32,12 @@
*/
typedef struct QueryDesc
{
+ NodeTag type;
+
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ struct CachedPlan *cplan; /* CachedPlan, if plannedstmt is from one */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +60,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ struct CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index c677e490d7..edf2f13d04 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -19,6 +19,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/plancache.h"
/*
@@ -72,7 +73,7 @@
/* Hook for plugins to get control in ExecutorStart() */
-typedef void (*ExecutorStart_hook_type) (QueryDesc *queryDesc, int eflags);
+typedef bool (*ExecutorStart_hook_type) (QueryDesc *queryDesc, int eflags);
extern PGDLLIMPORT ExecutorStart_hook_type ExecutorStart_hook;
/* Hook for plugins to get control in ExecutorRun() */
@@ -197,8 +198,8 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
-extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
-extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern bool ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern bool standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count, bool execute_once);
extern void standard_ExecutorRun(QueryDesc *queryDesc,
@@ -256,6 +257,17 @@ extern void ExecEndNode(PlanState *node);
extern void ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * Is the cached plan, if any, still valid at this point, that is, not
+ * invalidated by any invalidation messages processed so far?
+ */
+static inline bool
+ExecPlanStillValid(EState *estate)
+{
+ return estate->es_cachedplan == NULL ? true :
+ CachedPlanStillValid(estate->es_cachedplan);
+}
/* ----------------------------------------------------------------
* ExecProcNode
@@ -590,6 +602,7 @@ exec_rt_fetch(Index rti, EState *estate)
}
extern Relation ExecGetRangeTableRelation(EState *estate, Index rti);
+extern void ExecLockAppendNonLeafRelations(EState *estate, List *allpartrelids);
extern void ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
Index rti);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 233fb6b4f9..20c1bacae1 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -623,6 +623,8 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ struct CachedPlan *es_cachedplan; /* CachedPlan if plannedstmt is from
+ * one */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/storage/lmgr.h b/src/include/storage/lmgr.h
index 4ee91e3cf9..598bf2688a 100644
--- a/src/include/storage/lmgr.h
+++ b/src/include/storage/lmgr.h
@@ -48,6 +48,7 @@ extern bool ConditionalLockRelation(Relation relation, LOCKMODE lockmode);
extern void UnlockRelation(Relation relation, LOCKMODE lockmode);
extern bool CheckRelationLockedByMe(Relation relation, LOCKMODE lockmode,
bool orstronger);
+extern bool CheckRelLockedByMe(Oid relid, LOCKMODE lockmode, bool orstronger);
extern bool LockHasWaitersRelation(Relation relation, LOCKMODE lockmode);
extern void LockRelationIdForSession(LockRelId *relid, LOCKMODE lockmode);
diff --git a/src/include/tcop/pquery.h b/src/include/tcop/pquery.h
index a5e65b98aa..577b81a9ee 100644
--- a/src/include/tcop/pquery.h
+++ b/src/include/tcop/pquery.h
@@ -29,7 +29,7 @@ extern List *FetchPortalTargetList(Portal portal);
extern List *FetchStatementTargetList(Node *stmt);
-extern void PortalStart(Portal portal, ParamListInfo params,
+extern bool PortalStart(Portal portal, ParamListInfo params,
int eflags, Snapshot snapshot);
extern void PortalSetResultFormat(Portal portal, int nFormats,
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index f5fdbfe116..a024e5dcd0 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -140,6 +140,7 @@ extern char get_rel_relkind(Oid relid);
extern bool get_rel_relispartition(Oid relid);
extern Oid get_rel_tablespace(Oid relid);
extern char get_rel_persistence(Oid relid);
+extern bool get_rel_relisshared(Oid relid);
extern Oid get_transform_fromsql(Oid typid, Oid langid, List *trftypes);
extern Oid get_transform_tosql(Oid typid, Oid langid, List *trftypes);
extern bool get_typisdefined(Oid typid);
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 916e59d9fe..c83a67fea3 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -221,6 +221,20 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
QueryEnvironment *queryEnv);
+
+/*
+ * CachedPlanStillValid
+ * Returns whether a cached generic plan is still valid
+ *
+ * Called by the executor after each relation lock it takes while
+ * initializing the plan tree of a CachedPlan.
+ */
+static inline bool
+CachedPlanStillValid(CachedPlan *cplan)
+{
+ return cplan->is_valid;
+}
+
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/test/modules/delay_execution/Makefile b/src/test/modules/delay_execution/Makefile
index 70f24e846d..2fca84d027 100644
--- a/src/test/modules/delay_execution/Makefile
+++ b/src/test/modules/delay_execution/Makefile
@@ -8,7 +8,8 @@ OBJS = \
delay_execution.o
ISOLATION = partition-addition \
- partition-removal-1
+ partition-removal-1 \
+ cached-plan-replan
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/src/test/modules/delay_execution/delay_execution.c b/src/test/modules/delay_execution/delay_execution.c
index 7cd76eb34b..ce189156ad 100644
--- a/src/test/modules/delay_execution/delay_execution.c
+++ b/src/test/modules/delay_execution/delay_execution.c
@@ -1,14 +1,18 @@
/*-------------------------------------------------------------------------
*
* delay_execution.c
- * Test module to allow delay between parsing and execution of a query.
+ * Test module to introduce delays at various points during the execution
+ * of a query, to test that execution proceeds safely in the face of
+ * concurrent changes.
*
* The delay is implemented by taking and immediately releasing a specified
* advisory lock. If another process has previously taken that lock, the
* current process will be blocked until the lock is released; otherwise,
* there's no effect. This allows an isolationtester script to reliably
- * test behaviors where some specified action happens in another backend
- * between parsing and execution of any desired query.
+ * test behaviors where some specified action happens in another backend at
+ * either of two points: 1) between parsing and execution of any desired
+ * query, when using the planner_hook; 2) between RevalidateCachedQuery()
+ * and ExecutorStart(), when using the ExecutorStart_hook.
*
* Copyright (c) 2020-2023, PostgreSQL Global Development Group
*
@@ -22,6 +26,7 @@
#include <limits.h>
+#include "executor/executor.h"
#include "optimizer/planner.h"
#include "utils/builtins.h"
#include "utils/guc.h"
@@ -32,9 +37,11 @@ PG_MODULE_MAGIC;
/* GUC: advisory lock ID to use. Zero disables the feature. */
static int post_planning_lock_id = 0;
+static int executor_start_lock_id = 0;
-/* Save previous planner hook user to be a good citizen */
+/* Save previous hook users to be a good citizen */
static planner_hook_type prev_planner_hook = NULL;
+static ExecutorStart_hook_type prev_ExecutorStart_hook = NULL;
/* planner_hook function to provide the desired delay */
@@ -70,11 +77,45 @@ delay_execution_planner(Query *parse, const char *query_string,
return result;
}
+/* ExecutorStart_hook function to provide the desired delay */
+static bool
+delay_execution_ExecutorStart(QueryDesc *queryDesc, int eflags)
+{
+ bool plan_valid;
+
+ /* If enabled, delay by taking and releasing the specified lock */
+ if (executor_start_lock_id != 0)
+ {
+ DirectFunctionCall1(pg_advisory_lock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+ DirectFunctionCall1(pg_advisory_unlock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+
+ /*
+ * Ensure that we notice any pending invalidations, since the advisory
+ * lock functions don't do this.
+ */
+ AcceptInvalidationMessages();
+ }
+
+ /* Now start the executor, possibly via a previous hook user */
+ if (prev_ExecutorStart_hook)
+ plan_valid = prev_ExecutorStart_hook(queryDesc, eflags);
+ else
+ plan_valid = standard_ExecutorStart(queryDesc, eflags);
+
+ if (executor_start_lock_id != 0)
+ elog(NOTICE, "Finished ExecutorStart(): CachedPlan is %s",
+ plan_valid ? "valid" : "not valid");
+
+ return plan_valid;
+}
+
/* Module load function */
void
_PG_init(void)
{
- /* Set up the GUC to control which lock is used */
+ /* Set up GUCs to control which lock is used */
DefineCustomIntVariable("delay_execution.post_planning_lock_id",
"Sets the advisory lock ID to be locked/unlocked after planning.",
"Zero disables the delay.",
@@ -86,10 +127,22 @@ _PG_init(void)
NULL,
NULL,
NULL);
-
+ DefineCustomIntVariable("delay_execution.executor_start_lock_id",
+ "Sets the advisory lock ID to be locked/unlocked before starting execution.",
+ "Zero disables the delay.",
+ &executor_start_lock_id,
+ 0,
+ 0, INT_MAX,
+ PGC_USERSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
MarkGUCPrefixReserved("delay_execution");
- /* Install our hook */
+ /* Install our hooks. */
prev_planner_hook = planner_hook;
planner_hook = delay_execution_planner;
+ prev_ExecutorStart_hook = ExecutorStart_hook;
+ ExecutorStart_hook = delay_execution_ExecutorStart;
}
diff --git a/src/test/modules/delay_execution/expected/cached-plan-replan.out b/src/test/modules/delay_execution/expected/cached-plan-replan.out
new file mode 100644
index 0000000000..0ac6a17c2b
--- /dev/null
+++ b/src/test/modules/delay_execution/expected/cached-plan-replan.out
@@ -0,0 +1,156 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1prep s2lock s1exec s2dropi s2unlock
+step s1prep: SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1;
+ EXPLAIN (COSTS OFF) EXECUTE q (1);
+QUERY PLAN
+--------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo11 foo_1
+ Recheck Cond: (a = $1)
+ -> Bitmap Index Scan on foo11_a_idx
+ Index Cond: (a = $1)
+(6 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+-----------------------------
+Append
+ Subplans Removed: 1
+ -> Seq Scan on foo11 foo_1
+ Filter: (a = $1)
+(4 rows)
+
+
+starting permutation: s1prep2 s2lock s1exec2 s2dropi s2unlock
+step s1prep2: SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = 1;
+ EXPLAIN (COSTS OFF) EXECUTE q2;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------
+Bitmap Heap Scan on foo11 foo
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on foo11_a_idx
+ Index Cond: (a = 1)
+(4 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec2: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec2: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------
+Seq Scan on foo11 foo
+ Filter: (a = 1)
+(2 rows)
+
+
+starting permutation: s1prep3 s2lock s1exec3 s2dropi s2unlock
+step s1prep3: SET plan_cache_mode = force_generic_plan;
+ SET enable_partitionwise_aggregate = on;
+ SET enable_partitionwise_join = on;
+ PREPARE q3 AS SELECT t1.a, count(t2.b) FROM foo t1, foo t2 WHERE t1.a = t2.a GROUP BY 1;
+ EXPLAIN (COSTS OFF) EXECUTE q3;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+----------------------------------------------------------------
+Append
+ -> GroupAggregate
+ Group Key: t1.a
+ -> Merge Join
+ Merge Cond: (t1.a = t2.a)
+ -> Index Only Scan using foo11_a_idx on foo11 t1
+ -> Materialize
+ -> Index Scan using foo11_a_idx on foo11 t2
+ -> GroupAggregate
+ Group Key: t1_1.a
+ -> Merge Join
+ Merge Cond: (t1_1.a = t2_1.a)
+ -> Sort
+ Sort Key: t1_1.a
+ -> Seq Scan on foo2 t1_1
+ -> Sort
+ Sort Key: t2_1.a
+ -> Seq Scan on foo2 t2_1
+(18 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec3: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec3: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------------------------------
+Append
+ -> GroupAggregate
+ Group Key: t1.a
+ -> Merge Join
+ Merge Cond: (t1.a = t2.a)
+ -> Sort
+ Sort Key: t1.a
+ -> Seq Scan on foo11 t1
+ -> Sort
+ Sort Key: t2.a
+ -> Seq Scan on foo11 t2
+ -> GroupAggregate
+ Group Key: t1_1.a
+ -> Merge Join
+ Merge Cond: (t1_1.a = t2_1.a)
+ -> Sort
+ Sort Key: t1_1.a
+ -> Seq Scan on foo2 t1_1
+ -> Sort
+ Sort Key: t2_1.a
+ -> Seq Scan on foo2 t2_1
+(21 rows)
+
diff --git a/src/test/modules/delay_execution/specs/cached-plan-replan.spec b/src/test/modules/delay_execution/specs/cached-plan-replan.spec
new file mode 100644
index 0000000000..3c92cbd5c6
--- /dev/null
+++ b/src/test/modules/delay_execution/specs/cached-plan-replan.spec
@@ -0,0 +1,61 @@
+# Test to check that invalidation of cached generic plans during ExecutorStart
+# correctly triggers replanning and re-execution.
+
+setup
+{
+ CREATE TABLE foo (a int, b text) PARTITION BY LIST(a);
+ CREATE TABLE foo1 PARTITION OF foo FOR VALUES IN (1) PARTITION BY LIST (a);
+ CREATE TABLE foo11 PARTITION OF foo1 FOR VALUES IN (1);
+ CREATE INDEX foo11_a ON foo1 (a);
+ CREATE TABLE foo2 PARTITION OF foo FOR VALUES IN (2);
+ CREATE VIEW foov AS SELECT * FROM foo;
+}
+
+teardown
+{
+ DROP VIEW foov;
+ DROP TABLE foo;
+}
+
+session "s1"
+# Append with run-time pruning
+step "s1prep" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+
+# no Append case (only one partition selected by the planner)
+step "s1prep2" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = 1;
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+
+# Append with partition-wise aggregate and join plans as child subplans
+step "s1prep3" { SET plan_cache_mode = force_generic_plan;
+ SET enable_partitionwise_aggregate = on;
+ SET enable_partitionwise_join = on;
+ PREPARE q3 AS SELECT t1.a, count(t2.b) FROM foo t1, foo t2 WHERE t1.a = t2.a GROUP BY 1;
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+
+# Executes a generic plan
+step "s1exec" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+step "s1exec2" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+step "s1exec3" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+
+session "s2"
+step "s2lock" { SELECT pg_advisory_lock(12345); }
+step "s2unlock" { SELECT pg_advisory_unlock(12345); }
+step "s2dropi" { DROP INDEX foo11_a; }
+
+# While "s1exec", etc. wait to acquire the advisory lock, "s2drop" is able to
+# drop the index being used in the cached plan. When "s1exec" is then
+# unblocked and initializes the cached plan for execution, it detects the
+# concurrent index drop and causes the cached plan to be discarded and
+# recreated without the index.
+permutation "s1prep" "s2lock" "s1exec" "s2dropi" "s2unlock"
+permutation "s1prep2" "s2lock" "s1exec2" "s2dropi" "s2unlock"
+permutation "s1prep3" "s2lock" "s1exec3" "s2dropi" "s2unlock"
--
2.35.3
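To summarize the control flow that the patch above adds at the ExecutorStart()
and PortalStart() call sites: when executor startup reports that the cached
plan went stale while partition locks were being taken, the caller releases
the plan and loops back to build a fresh one. Below is a minimal standalone
C sketch of that loop; CachedPlan, GetCachedPlan(), and ExecutorStart() here
are simplified stand-ins for the real PostgreSQL structures and functions,
reduced to just the retry logic:

    /*
     * Standalone sketch of the replan loop.  All types and functions are
     * toy stand-ins, keeping only the control flow added at the
     * ExecutorStart() call sites.
     */
    #include <stdbool.h>
    #include <stdio.h>

    typedef struct CachedPlan { bool is_valid; } CachedPlan;

    /* Mock: hand out an already-invalidated plan first, then a fresh one. */
    static CachedPlan *GetCachedPlan(void)
    {
        static CachedPlan plans[] = {{false}, {true}};
        static int calls = 0;

        return &plans[calls++];
    }

    /*
     * Mock executor startup: taking locks on surviving partitions may
     * process invalidation messages that mark the plan stale, in which
     * case failure is reported instead of running a stale plan.
     */
    static bool ExecutorStart(CachedPlan *cplan)
    {
        return cplan->is_valid;
    }

    int main(void)
    {
        CachedPlan *cplan;

    replan:
        cplan = GetCachedPlan();
        if (!ExecutorStart(cplan))
        {
            /* release the stale plan and retry with a fresh one */
            printf("plan invalidated during init, replanning\n");
            goto replan;
        }
        printf("executing valid plan\n");
        return 0;
    }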
v44-0001-Make-PlanState-tree-cleanup-non-recursive.patch (application/octet-stream)
From a55bd363690bc4c28047e4b874ce80384e37c49d Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Tue, 1 Aug 2023 11:36:24 +0900
Subject: [PATCH v44 1/6] Make PlanState tree cleanup non-recursive
With this change, node type specific subroutines of ExecEndNode()
are no longer required to also clean up the child nodes of a given
node, only the node's own state. Instead, ExecEndPlan() calls
ExecEndNode() directly for each node in the PlanState tree by
iterating over a list (EState.es_planstate_nodes) of all those nodes
built during the ExecInitNode() traversal of the tree.
This changes the order in which the nodes get cleaned up: they are
now cleaned up in the order in which they were added to the list,
which is from the leaf level up to the root, whereas the current
recursive approach cleans up from the root down to the leaves. The
change seems harmless though, because there isn't necessarily any
coupling between the cleanup actions of parent and child nodes.
The main motivation behind this change is to allow for future cases
in which the ExecInitNode() traversal of the plan tree may be aborted
midway, resulting in a partially initialized PlanState tree. Dealing
with that case by making the cleanup phase walk over a list of
successfully initialized nodes seems more robust than making the
individual ExecEndNode() subroutines cope with partially valid
PlanState nodes.
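To make the shape of the change concrete before diving into the diff, here
is a minimal standalone C sketch; Node, State, init_node(), and end_plan()
are toy stand-ins for PlanState, EState, ExecInitNode(), and ExecEndPlan(),
not the patched executor code. Initialization appends each node to a list
as it is created, and cleanup walks that list instead of recursing from the
root:

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct Node
    {
        const char  *name;
        struct Node *next;      /* list link; analogous to es_planstate_nodes */
    } Node;

    typedef struct State
    {
        Node       *init_order; /* nodes in the order they were initialized */
    } State;

    /* "Initialize" a node and append it to the cleanup list. */
    static void init_node(State *s, const char *name)
    {
        Node   *n = malloc(sizeof(Node));
        Node  **tail = &s->init_order;

        n->name = name;
        n->next = NULL;
        while (*tail)
            tail = &(*tail)->next;
        *tail = n;
    }

    /* Non-recursive cleanup: each node releases only its own resources. */
    static void end_plan(State *s)
    {
        Node   *n = s->init_order;

        while (n)
        {
            Node   *next = n->next;

            printf("cleaning up %s\n", n->name);
            free(n);
            n = next;
        }
    }

    int main(void)
    {
        State   s = {NULL};

        /* children finish initializing before their parents, so the list
         * is naturally in leaf-to-root order */
        init_node(&s, "SeqScan (leaf)");
        init_node(&s, "Sort");
        init_node(&s, "Limit (root)");
        end_plan(&s);           /* prints leaf first, root last */
        return 0;
    }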
---
src/backend/executor/README | 4 +-
src/backend/executor/execMain.c | 36 +++++++-------
src/backend/executor/execProcnode.c | 56 ++++++++++------------
src/backend/executor/execUtils.c | 2 +
src/backend/executor/nodeAgg.c | 4 +-
src/backend/executor/nodeAppend.c | 20 +-------
src/backend/executor/nodeBitmapAnd.c | 23 +--------
src/backend/executor/nodeBitmapHeapscan.c | 5 +-
src/backend/executor/nodeBitmapOr.c | 23 +--------
src/backend/executor/nodeForeignscan.c | 4 +-
src/backend/executor/nodeGather.c | 2 +-
src/backend/executor/nodeGatherMerge.c | 2 +-
src/backend/executor/nodeGroup.c | 5 +-
src/backend/executor/nodeHash.c | 8 +---
src/backend/executor/nodeHashjoin.c | 6 +--
src/backend/executor/nodeIncrementalSort.c | 5 +-
src/backend/executor/nodeLimit.c | 2 +-
src/backend/executor/nodeLockRows.c | 2 +-
src/backend/executor/nodeMaterial.c | 5 +-
src/backend/executor/nodeMemoize.c | 5 +-
src/backend/executor/nodeMergeAppend.c | 20 +-------
src/backend/executor/nodeMergejoin.c | 6 +--
src/backend/executor/nodeModifyTable.c | 7 +--
src/backend/executor/nodeNestloop.c | 6 +--
src/backend/executor/nodeProjectSet.c | 5 +-
src/backend/executor/nodeRecursiveunion.c | 6 +--
src/backend/executor/nodeResult.c | 5 +-
src/backend/executor/nodeSetOp.c | 2 +-
src/backend/executor/nodeSort.c | 5 +-
src/backend/executor/nodeSubqueryscan.c | 5 +-
src/backend/executor/nodeUnique.c | 2 +-
src/backend/executor/nodeWindowAgg.c | 4 +-
src/include/nodes/execnodes.h | 2 +
33 files changed, 80 insertions(+), 214 deletions(-)
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 17775a49e2..67a5c1769b 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -310,13 +310,13 @@ This is a sketch of control flow for full query processing:
AfterTriggerEndQuery
ExecutorEnd
- ExecEndNode --- recursively releases resources
+ ExecEndPlan --- releases plan resources
FreeExecutorState
frees per-query context and child contexts
FreeQueryDesc
-Per above comments, it's not really critical for ExecEndNode to free any
+Per above comments, it's not really critical for ExecEndPlan to free any
memory; it'll all go away in FreeExecutorState anyway. However, we do need to
be careful to close relations, drop buffer pins, etc, so we do need to scan
the plan state tree to find these sorts of resources.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4c5a7bbf62..235bb52ccc 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -82,7 +82,7 @@ ExecutorCheckPerms_hook_type ExecutorCheckPerms_hook = NULL;
static void InitPlan(QueryDesc *queryDesc, int eflags);
static void CheckValidRowMarkRel(Relation rel, RowMarkType markType);
static void ExecPostprocessPlan(EState *estate);
-static void ExecEndPlan(PlanState *planstate, EState *estate);
+static void ExecEndPlan(EState *estate);
static void ExecutePlan(EState *estate, PlanState *planstate,
bool use_parallel_mode,
CmdType operation,
@@ -500,7 +500,7 @@ standard_ExecutorEnd(QueryDesc *queryDesc)
*/
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
- ExecEndPlan(queryDesc->planstate, estate);
+ ExecEndPlan(estate);
/* do away with our snapshots */
UnregisterSnapshot(estate->es_snapshot);
@@ -1499,23 +1499,21 @@ ExecPostprocessPlan(EState *estate)
* ----------------------------------------------------------------
*/
static void
-ExecEndPlan(PlanState *planstate, EState *estate)
+ExecEndPlan(EState *estate)
{
ListCell *l;
/*
- * shut down the node-type-specific query processing
+ * Shut down the node-type-specific query processing for all nodes that
+ * were initialized in InitPlan(). That includes the nodes in the main
+ * plan tree (es_plannedstmt->planTree) as well as those in subplans
+ * (es_plannedstmt->subplans).
*/
- ExecEndNode(planstate);
-
- /*
- * for subplans too
- */
- foreach(l, estate->es_subplanstates)
+ foreach(l, estate->es_planstate_nodes)
{
- PlanState *subplanstate = (PlanState *) lfirst(l);
+ PlanState *pstate = (PlanState *) lfirst(l);
- ExecEndNode(subplanstate);
+ ExecEndNode(pstate);
}
/*
@@ -3030,13 +3028,17 @@ EvalPlanQualEnd(EPQState *epqstate)
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
- ExecEndNode(epqstate->recheckplanstate);
-
- foreach(l, estate->es_subplanstates)
+ /*
+ * Shut down the node-type-specific query processing for all nodes that
+ * were initialized in EvalPlanQualStart(). That includes the nodes in
+ * the EPQ plan tree (epqstate->plan) as well as those in subplans
+ * (es_plannedstmt->subplans).
+ */
+ foreach(l, estate->es_planstate_nodes)
{
- PlanState *subplanstate = (PlanState *) lfirst(l);
+ PlanState *planstate = (PlanState *) lfirst(l);
- ExecEndNode(subplanstate);
+ ExecEndNode(planstate);
}
/* throw away the per-estate tuple table, some node may have used it */
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 4d288bc8d4..653f74cf58 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -1,11 +1,13 @@
/*-------------------------------------------------------------------------
*
* execProcnode.c
- * contains dispatch functions which call the appropriate "initialize",
- * "get a tuple", and "cleanup" routines for the given node type.
- * If the node has children, then it will presumably call ExecInitNode,
- * ExecProcNode, or ExecEndNode on its subnodes and do the appropriate
- * processing.
+ * Contains dispatch functions ExecInitNode(), ExecProcNode(), and
+ * ExecEndNode(), which call the appropriate "initialize", "get a tuple",
+ * and "cleanup" routines, respectively, for the given node type.
+ *
+ * While the first two process the node's children recursively, ExecEndNode()
+ * is only concerned with the cleaning of the node itself while the children
+ * are processed by the caller.
*
* Portions Copyright (c) 1996-2023, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
@@ -49,7 +51,9 @@
* Eventually this calls ExecInitNode() on the right and left subplans
* and so forth until the entire plan is initialized. The result
* of ExecInitNode() is a plan state tree built with the same structure
- * as the underlying plan tree.
+ * as the underlying plan tree. (The plan state nodes are also added to
+ * a list, in the order in which they are created, for the final
+ * cleanup processing.)
*
* * Then when ExecutorRun() is called, it calls ExecutePlan() which calls
* ExecProcNode() repeatedly on the top node of the plan state tree.
@@ -61,14 +65,10 @@
* form the tuples it returns.
*
* * Eventually ExecSeqScan() stops returning tuples and the nest
- * loop join ends. Lastly, ExecutorEnd() calls ExecEndNode() which
- * calls ExecEndNestLoop() which in turn calls ExecEndNode() on
- * its subplans which result in ExecEndSeqScan().
+ * loop join ends. Lastly, ExecutorEnd() calls ExecEndPlan(), which
+ * in turn calls ExecEndNode() on all the nodes that were initialized:
+ * the two Seq Scans and the Nest Loop in this case.
*
- * This should show how the executor works by having
- * ExecInitNode(), ExecProcNode() and ExecEndNode() dispatch
- * their work to the appropriate node support routines which may
- * in turn call these routines themselves on their subplans.
*/
#include "postgres.h"
@@ -136,6 +136,9 @@ static bool ExecShutdownNode_walker(PlanState *node, void *context);
* 'eflags' is a bitwise OR of flag bits described in executor.h
*
* Returns a PlanState node corresponding to the given Plan node.
+ *
+ * As a side-effect, all PlanState nodes that are created are appended to
+ * estate->es_planstate_nodes for the cleanup processing in ExecEndPlan().
* ------------------------------------------------------------------------
*/
PlanState *
@@ -411,6 +414,10 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
result->instrument = InstrAlloc(1, estate->es_instrument,
result->async_capable);
+ /* And remember for the cleanup processing in ExecEndPlan(). */
+ estate->es_planstate_nodes = lappend(estate->es_planstate_nodes,
+ result);
+
return result;
}
@@ -545,29 +552,18 @@ MultiExecProcNode(PlanState *node)
/* ----------------------------------------------------------------
* ExecEndNode
*
- * Recursively cleans up all the nodes in the plan rooted
- * at 'node'.
+ * Cleans up node
*
- * After this operation, the query plan will not be able to be
- * processed any further. This should be called only after
- * the query plan has been fully executed.
+ * Unlike ExecInitNode(), this does not recurse into child nodes, because
+ * they are processed separately. So the ExecEnd* routine for any given
+ * node type is only responsible for cleaning up its own resources.
* ----------------------------------------------------------------
*/
void
ExecEndNode(PlanState *node)
{
- /*
- * do nothing when we get to the end of a leaf on tree.
- */
- if (node == NULL)
- return;
-
- /*
- * Make sure there's enough stack available. Need to check here, in
- * addition to ExecProcNode() (via ExecProcNodeFirst()), because it's not
- * guaranteed that ExecProcNode() is reached for all nodes.
- */
- check_stack_depth();
+ /* We only ever get called on nodes that were actually initialized. */
+ Assert(node != NULL);
if (node->chgParam != NULL)
{
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index c06b228858..b567165003 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -154,6 +154,8 @@ CreateExecutorState(void)
estate->es_exprcontexts = NIL;
+ estate->es_planstate_nodes = NIL;
+
estate->es_subplanstates = NIL;
estate->es_auxmodifytables = NIL;
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 468db94fe5..e9d9ab6bdd 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -4304,7 +4304,6 @@ GetAggInitVal(Datum textInitVal, Oid transtype)
void
ExecEndAgg(AggState *node)
{
- PlanState *outerPlan;
int transno;
int numGroupingSets = Max(node->maxsets, 1);
int setno;
@@ -4367,8 +4366,7 @@ ExecEndAgg(AggState *node)
/* clean up tuple table */
ExecClearTuple(node->ss.ss_ScanTupleSlot);
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
+ /* outerPlan is closed separately. */
}
void
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 609df6b9e6..9148d7d3b1 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -376,30 +376,12 @@ ExecAppend(PlanState *pstate)
/* ----------------------------------------------------------------
* ExecEndAppend
- *
- * Shuts down the subscans of the append node.
- *
- * Returns nothing of interest.
* ----------------------------------------------------------------
*/
void
ExecEndAppend(AppendState *node)
{
- PlanState **appendplans;
- int nplans;
- int i;
-
- /*
- * get information from the node
- */
- appendplans = node->appendplans;
- nplans = node->as_nplans;
-
- /*
- * shut down each of the subscans
- */
- for (i = 0; i < nplans; i++)
- ExecEndNode(appendplans[i]);
+ /* Nothing to do as the nodes in appendplans are closed separately. */
}
void
diff --git a/src/backend/executor/nodeBitmapAnd.c b/src/backend/executor/nodeBitmapAnd.c
index 4c5eb2b23b..147592f7e2 100644
--- a/src/backend/executor/nodeBitmapAnd.c
+++ b/src/backend/executor/nodeBitmapAnd.c
@@ -168,33 +168,12 @@ MultiExecBitmapAnd(BitmapAndState *node)
/* ----------------------------------------------------------------
* ExecEndBitmapAnd
- *
- * Shuts down the subscans of the BitmapAnd node.
- *
- * Returns nothing of interest.
* ----------------------------------------------------------------
*/
void
ExecEndBitmapAnd(BitmapAndState *node)
{
- PlanState **bitmapplans;
- int nplans;
- int i;
-
- /*
- * get information from the node
- */
- bitmapplans = node->bitmapplans;
- nplans = node->nplans;
-
- /*
- * shut down each of the subscans (that we've initialized)
- */
- for (i = 0; i < nplans; i++)
- {
- if (bitmapplans[i])
- ExecEndNode(bitmapplans[i]);
- }
+ /* Nothing to do as the nodes in bitmapplans are closed separately. */
}
void
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index f35df0b8bf..d58ee4f4e1 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -667,10 +667,7 @@ ExecEndBitmapHeapScan(BitmapHeapScanState *node)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
ExecClearTuple(node->ss.ss_ScanTupleSlot);
- /*
- * close down subplans
- */
- ExecEndNode(outerPlanState(node));
+ /* outerPlan is closed separately. */
/*
* release bitmaps and buffers if any
diff --git a/src/backend/executor/nodeBitmapOr.c b/src/backend/executor/nodeBitmapOr.c
index 0bf8af9652..736852a0ae 100644
--- a/src/backend/executor/nodeBitmapOr.c
+++ b/src/backend/executor/nodeBitmapOr.c
@@ -186,33 +186,12 @@ MultiExecBitmapOr(BitmapOrState *node)
/* ----------------------------------------------------------------
* ExecEndBitmapOr
- *
- * Shuts down the subscans of the BitmapOr node.
- *
- * Returns nothing of interest.
* ----------------------------------------------------------------
*/
void
ExecEndBitmapOr(BitmapOrState *node)
{
- PlanState **bitmapplans;
- int nplans;
- int i;
-
- /*
- * get information from the node
- */
- bitmapplans = node->bitmapplans;
- nplans = node->nplans;
-
- /*
- * shut down each of the subscans (that we've initialized)
- */
- for (i = 0; i < nplans; i++)
- {
- if (bitmapplans[i])
- ExecEndNode(bitmapplans[i]);
- }
+ /* Nothing to do as the nodes in bitmapplans are closed separately. */
}
void
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index c2139acca0..e6616dd718 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -309,9 +309,7 @@ ExecEndForeignScan(ForeignScanState *node)
else
node->fdwroutine->EndForeignScan(node);
- /* Shut down any outer plan. */
- if (outerPlanState(node))
- ExecEndNode(outerPlanState(node));
+ /* outerPlan is closed separately. */
/* Free the exprcontext */
ExecFreeExprContext(&node->ss.ps);
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 307fc10eea..f7a69f185b 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -248,7 +248,7 @@ ExecGather(PlanState *pstate)
void
ExecEndGather(GatherState *node)
{
- ExecEndNode(outerPlanState(node)); /* let children clean up first */
+ /* outerPlan is closed separately. */
ExecShutdownGather(node);
ExecFreeExprContext(&node->ps);
if (node->ps.ps_ResultTupleSlot)
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 9d5e1a46e9..d357ff0c47 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -288,7 +288,7 @@ ExecGatherMerge(PlanState *pstate)
void
ExecEndGatherMerge(GatherMergeState *node)
{
- ExecEndNode(outerPlanState(node)); /* let children clean up first */
+ /* outerPlan is closed separately. */
ExecShutdownGatherMerge(node);
ExecFreeExprContext(&node->ps);
if (node->ps.ps_ResultTupleSlot)
diff --git a/src/backend/executor/nodeGroup.c b/src/backend/executor/nodeGroup.c
index 25a1618952..2badcc7e60 100644
--- a/src/backend/executor/nodeGroup.c
+++ b/src/backend/executor/nodeGroup.c
@@ -226,15 +226,12 @@ ExecInitGroup(Group *node, EState *estate, int eflags)
void
ExecEndGroup(GroupState *node)
{
- PlanState *outerPlan;
-
ExecFreeExprContext(&node->ss.ps);
/* clean up tuple table */
ExecClearTuple(node->ss.ss_ScanTupleSlot);
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
+ /* outerPlan is closed separately. */
}
void
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 8b5c35b82b..edd2324384 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -413,18 +413,12 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
void
ExecEndHash(HashState *node)
{
- PlanState *outerPlan;
-
/*
* free exprcontext
*/
ExecFreeExprContext(&node->ps);
- /*
- * shut down the subplan
- */
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
+ /* outerPlan is closed separately. */
}
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 980746128b..8078d7f229 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -879,11 +879,7 @@ ExecEndHashJoin(HashJoinState *node)
ExecClearTuple(node->hj_OuterTupleSlot);
ExecClearTuple(node->hj_HashTupleSlot);
- /*
- * clean up subtrees
- */
- ExecEndNode(outerPlanState(node));
- ExecEndNode(innerPlanState(node));
+ /* outerPlan and innerPlan are closed separately. */
}
/*
diff --git a/src/backend/executor/nodeIncrementalSort.c b/src/backend/executor/nodeIncrementalSort.c
index 7683e3341c..52b146cfb8 100644
--- a/src/backend/executor/nodeIncrementalSort.c
+++ b/src/backend/executor/nodeIncrementalSort.c
@@ -1101,10 +1101,7 @@ ExecEndIncrementalSort(IncrementalSortState *node)
node->prefixsort_state = NULL;
}
- /*
- * Shut down the subplan.
- */
- ExecEndNode(outerPlanState(node));
+ /* outerPlan is closed separately. */
SO_printf("ExecEndIncrementalSort: sort node shutdown\n");
}
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index 425fbfc405..a75099dd73 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -535,7 +535,7 @@ void
ExecEndLimit(LimitState *node)
{
ExecFreeExprContext(&node->ps);
- ExecEndNode(outerPlanState(node));
+ /* outerPlan is closed separately. */
}
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index e459971d32..55de8d3d65 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -386,7 +386,7 @@ ExecEndLockRows(LockRowsState *node)
{
/* We may have shut down EPQ already, but no harm in another call */
EvalPlanQualEnd(&node->lr_epqstate);
- ExecEndNode(outerPlanState(node));
+ /* outerPlan is closed separately. */
}
diff --git a/src/backend/executor/nodeMaterial.c b/src/backend/executor/nodeMaterial.c
index 09632678b0..ef04e9a8e7 100644
--- a/src/backend/executor/nodeMaterial.c
+++ b/src/backend/executor/nodeMaterial.c
@@ -251,10 +251,7 @@ ExecEndMaterial(MaterialState *node)
tuplestore_end(node->tuplestorestate);
node->tuplestorestate = NULL;
- /*
- * shut down the subplan
- */
- ExecEndNode(outerPlanState(node));
+ /* outerPlan is closed separately. */
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeMemoize.c b/src/backend/executor/nodeMemoize.c
index 4f04269e26..61578d4b5c 100644
--- a/src/backend/executor/nodeMemoize.c
+++ b/src/backend/executor/nodeMemoize.c
@@ -1100,10 +1100,7 @@ ExecEndMemoize(MemoizeState *node)
*/
ExecFreeExprContext(&node->ss.ps);
- /*
- * shut down the subplan
- */
- ExecEndNode(outerPlanState(node));
+ /* outerPlan is closed separately. */
}
void
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 21b5726e6e..8aa64944c9 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -310,30 +310,12 @@ heap_compare_slots(Datum a, Datum b, void *arg)
/* ----------------------------------------------------------------
* ExecEndMergeAppend
- *
- * Shuts down the subscans of the MergeAppend node.
- *
- * Returns nothing of interest.
* ----------------------------------------------------------------
*/
void
ExecEndMergeAppend(MergeAppendState *node)
{
- PlanState **mergeplans;
- int nplans;
- int i;
-
- /*
- * get information from the node
- */
- mergeplans = node->mergeplans;
- nplans = node->ms_nplans;
-
- /*
- * shut down each of the subscans
- */
- for (i = 0; i < nplans; i++)
- ExecEndNode(mergeplans[i]);
+ /* Nothing to do as the nodes in mergeplans are closed separately. */
}
void
diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 00f96d045e..7b530d9088 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -1654,11 +1654,7 @@ ExecEndMergeJoin(MergeJoinState *node)
ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
ExecClearTuple(node->mj_MarkedTupleSlot);
- /*
- * shut down the subplans
- */
- ExecEndNode(innerPlanState(node));
- ExecEndNode(outerPlanState(node));
+ /* outerPlan and innerPlan are closed separately. */
MJ1_printf("ExecEndMergeJoin: %s\n",
"node processing ended");
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 2a5fec8d01..bdbaa4753b 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -4397,7 +4397,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
/* ----------------------------------------------------------------
* ExecEndModifyTable
*
- * Shuts down the plan.
+ * Releases ModifyTable resources.
*
* Returns nothing of interest.
* ----------------------------------------------------------------
@@ -4461,10 +4461,7 @@ ExecEndModifyTable(ModifyTableState *node)
*/
EvalPlanQualEnd(&node->mt_epqstate);
- /*
- * shut down subplan
- */
- ExecEndNode(outerPlanState(node));
+ /* outerPlan is closed separately. */
}
void
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index b3d52e69ec..5cfb50a366 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -374,11 +374,7 @@ ExecEndNestLoop(NestLoopState *node)
*/
ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
- /*
- * close down subplans
- */
- ExecEndNode(outerPlanState(node));
- ExecEndNode(innerPlanState(node));
+ /* outerPlan and innerPlan are closed separately. */
NL1_printf("ExecEndNestLoop: %s\n",
"node processing ended");
diff --git a/src/backend/executor/nodeProjectSet.c b/src/backend/executor/nodeProjectSet.c
index f6ff3dc44c..4a388220ee 100644
--- a/src/backend/executor/nodeProjectSet.c
+++ b/src/backend/executor/nodeProjectSet.c
@@ -330,10 +330,7 @@ ExecEndProjectSet(ProjectSetState *node)
*/
ExecClearTuple(node->ps.ps_ResultTupleSlot);
- /*
- * shut down subplans
- */
- ExecEndNode(outerPlanState(node));
+ /* outerPlan is closed separately. */
}
void
diff --git a/src/backend/executor/nodeRecursiveunion.c b/src/backend/executor/nodeRecursiveunion.c
index e781003934..aee31c7139 100644
--- a/src/backend/executor/nodeRecursiveunion.c
+++ b/src/backend/executor/nodeRecursiveunion.c
@@ -281,11 +281,7 @@ ExecEndRecursiveUnion(RecursiveUnionState *node)
if (node->tableContext)
MemoryContextDelete(node->tableContext);
- /*
- * close down subplans
- */
- ExecEndNode(outerPlanState(node));
- ExecEndNode(innerPlanState(node));
+ /* outerPlan and innerPlan are closed separately. */
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeResult.c b/src/backend/executor/nodeResult.c
index 4219712d30..a100b144be 100644
--- a/src/backend/executor/nodeResult.c
+++ b/src/backend/executor/nodeResult.c
@@ -250,10 +250,7 @@ ExecEndResult(ResultState *node)
*/
ExecClearTuple(node->ps.ps_ResultTupleSlot);
- /*
- * shut down subplans
- */
- ExecEndNode(outerPlanState(node));
+ /* outerPlan is closed separately. */
}
void
diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index 4bc2406b89..f7db9a3415 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -590,7 +590,7 @@ ExecEndSetOp(SetOpState *node)
MemoryContextDelete(node->tableContext);
ExecFreeExprContext(&node->ps);
- ExecEndNode(outerPlanState(node));
+ /* outerPlan is closed separately. */
}
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index c6c72c6e67..078d041c40 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -317,10 +317,7 @@ ExecEndSort(SortState *node)
tuplesort_end((Tuplesortstate *) node->tuplesortstate);
node->tuplesortstate = NULL;
- /*
- * shut down the subplan
- */
- ExecEndNode(outerPlanState(node));
+ /* outerPlan is closed separately. */
SO1_printf("ExecEndSort: %s\n",
"sort node shutdown");
diff --git a/src/backend/executor/nodeSubqueryscan.c b/src/backend/executor/nodeSubqueryscan.c
index 42471bfc04..bc55a82fc3 100644
--- a/src/backend/executor/nodeSubqueryscan.c
+++ b/src/backend/executor/nodeSubqueryscan.c
@@ -179,10 +179,7 @@ ExecEndSubqueryScan(SubqueryScanState *node)
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
ExecClearTuple(node->ss.ss_ScanTupleSlot);
- /*
- * close down subquery
- */
- ExecEndNode(node->subplan);
+ /* subplan is closed separately. */
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeUnique.c b/src/backend/executor/nodeUnique.c
index 45035d74fa..50babacdc8 100644
--- a/src/backend/executor/nodeUnique.c
+++ b/src/backend/executor/nodeUnique.c
@@ -173,7 +173,7 @@ ExecEndUnique(UniqueState *node)
ExecFreeExprContext(&node->ps);
- ExecEndNode(outerPlanState(node));
+ /* outerPlan is closed separately. */
}
diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 310ac23e3a..648cdadc32 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -2681,7 +2681,6 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags)
void
ExecEndWindowAgg(WindowAggState *node)
{
- PlanState *outerPlan;
int i;
release_partition(node);
@@ -2714,8 +2713,7 @@ ExecEndWindowAgg(WindowAggState *node)
pfree(node->perfunc);
pfree(node->peragg);
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
+ /* outerPlan is closed separately. */
}
/* -----------------
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index cb714f4a19..233fb6b4f9 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -671,6 +671,8 @@ typedef struct EState
List *es_exprcontexts; /* List of ExprContexts within EState */
+ List *es_planstate_nodes; /* "flat" list of PlanState nodes */
+
List *es_subplanstates; /* List of PlanState for SubPlans */
List *es_auxmodifytables; /* List of secondary ModifyTableStates */
--
2.35.3
On Thu, Aug 3, 2023 at 4:37 AM Amit Langote <amitlangote09@gmail.com> wrote:
Here's a patch set where the refactoring to move the ExecutorStart()
calls to be closer to GetCachedPlan() (for the call sites that use a
CachedPlan) is extracted into a separate patch, 0002. Its commit
message notes an aspect of this refactoring that I feel a bit nervous
about -- needing to also move the CommandCounterIncrement() call from
the loop in PortalRunMulti() to PortalStart() which now does
ExecutorStart() for the PORTAL_MULTI_QUERY case.
I spent some time today reviewing 0001. Here are a few thoughts and
notes about things that I looked at.
First, I wondered whether it was really adequate for ExecEndPlan() to
just loop over estate->es_planstate_nodes and call it good. Put
differently, is it possible that we could ever have more than one
relevant EState, say for a subplan or an EPQ execution or something,
so that this loop wouldn't cover everything? I found nothing to make
me think that this is a real danger.
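For illustration, a minimal sketch of that loop under 0001's design,
assuming the list field is named es_planstate_nodes as in the patch
hunk above; this is not the patch text itself:
    ListCell   *lc;

    foreach(lc, estate->es_planstate_nodes)
    {
        PlanState  *ps = (PlanState *) lfirst(lc);

        /* Children were appended before their parents, so end them first. */
        ExecEndNode(ps);
    }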
Second, I wondered whether the ordering of cleanup operations could be
an issue. Right now, a node can position cleanup code before, after,
or both before and after recursing to child nodes, whereas with this
design change, the cleanup code will always be run before recursing to
child nodes. Here, I think we have problems. Both ExecEndGather and
ExecEndGatherMerge intentionally clean up the children before the
parent, so that the child shutdown happens before
ExecParallelCleanup(). Based on the comment and commit
acf555bc53acb589b5a2827e65d655fa8c9adee0, this appears to be
intentional, and you can sort of see why from looking at the stuff
that happens in ExecParallelCleanup(). If the instrumentation data
vanishes before the child nodes have a chance to clean things up,
maybe EXPLAIN ANALYZE won't reflect that instrumentation any more. If
the DSA vanishes, maybe we'll crash if we try to access it. If we
actually reach DestroyParallelContext(), we're just going to start
killing the workers. None of that sounds like what we want.
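For context, the ordering in question, as ExecEndGather has it in
current sources, simplified:
    void
    ExecEndGather(GatherState *node)
    {
        ExecEndNode(outerPlanState(node));  /* let children clean up first */
        ExecShutdownGather(node);           /* then tear down parallelism */
        /* ... slot and expression-context cleanup ... */
    }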
The good news, of a sort, is that I think this might be the only case
of this sort of problem. Most nodes recurse at the end, after doing
all the cleanup, so the behavior won't change. Moreover, even if it
did, most cleanup operations look pretty localized -- they affect only
the node itself, and not its children. A somewhat interesting case is
nodes associated with subplans. Right now, because of the coding of
ExecEndPlan, nodes associated with subplans are all cleaned up at the
very end, after everything that's not inside of a subplan. But with
this change, they'd get cleaned up in the order of initialization,
which actually seems more natural, as long as it doesn't break
anything, which I think it probably won't since, as I mentioned, in most
cases node cleanup looks quite localized, i.e. it doesn't care whether
it happens before or after the cleanup of other nodes.
I think something will have to be done about the parallel query stuff,
though. I'm not sure exactly what. It is a little weird that Gather
and Gather Merge treat starting and killing workers as a purely
"private matter" that they can decide to handle without the executor
overall being very much aware of it. So maybe there's a way that some
of the cleanup logic here could be hoisted up into the general
executor machinery, that is, first end all the nodes, and then go
back, and end all the parallelism using, maybe, another list inside of
the estate. However, I think that the existence of ExecShutdownNode()
is a complication here -- we need to make sure that we don't break
either the case where that happens before overall plan shutdown, or the
case where it doesn't.
Third, a couple of minor comments on details of how you actually made
these changes in the patch set. Personally, I would remove all of the
"is closed separately" comments that you added. I think it's a
violation of the general coding principle that you should make the
code look like it's always been that way. Sure, in the immediate
future, people might wonder why you don't need to recurse, but 5 or 10
years from now that's just going to be clutter. Second, in the cases
where the ExecEndNode functions end up completely empty, I would
suggest just removing the functions entirely and making the switch
that dispatches on the node type have a switch case that lists all the
nodes that don't need a callback here and say /* Nothing to do for these
node types */ break;. This will save a few CPU cycles and I think it
will be easier to read as well.
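A sketch of that suggested dispatch; which node types end up with no
callback at all is an assumption here:
    switch (nodeTag(node))
    {
        /* ... cases that still need node-specific cleanup ... */

        case T_LimitState:
        case T_UniqueState:
        case T_SetOpState:
            /* Nothing to do for these node types */
            break;

        /* ... */
    }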
Fourth, I wonder whether we really need this patch at all. I initially
thought we did, because if we abandon the initialization of a plan
partway through, then we end up with a plan that is in a state that
previously would never have occurred, and we still have to be able to
clean it up. However, perhaps it's a difference without a distinction.
Say we have a partial plan tree, where not all of the PlanState nodes
ever got created. We then just call the existing version of
ExecEndPlan() on it, with no changes. What goes wrong? Sure, we might
call ExecEndNode() on some null pointers where in the current world
there would always be valid pointers, but ExecEndNode() will handle
that just fine, by doing nothing for those nodes, because it starts
with a NULL-check.
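The check in question, at the top of ExecEndNode(), simplified:
    void
    ExecEndNode(PlanState *node)
    {
        /* do nothing when we get to the end of a leaf on tree */
        if (node == NULL)
            return;

        /* ... per-node-type dispatch follows ... */
    }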
Another alternative design might be to switch ExecEndNode to use
planstate_tree_walker to walk the node tree, removing the walk from
the node-type-specific functions as in this patch, and deleting the
end-node functions that are no longer required altogether, as proposed
above. I somehow feel that this would be cleaner than the status quo,
but here again, I'm not sure we really need it. planstate_tree_walker
would just pass over any NULL pointers that it found without doing
anything, but the current code does that too, so while this might be
more beautiful than what we have now, I'm not sure that there's any
real reason to do it. The fact that, like the current patch, it would
change the order in which nodes are cleaned up is also an issue -- the
Gather/Gather Merge ordering issues might be easier to handle this way
with some hack in ExecEndNode() than they are with the design you have
now, but we'd still have to do something about them, I believe.
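One hypothetical shape for that alternative, where
ExecEndNodeInternal is an invented name for the non-recursive,
node-type-specific cleanup, and the order shown (parent before
children) is just one of the two possible choices:
    static bool
    ExecEndNodeWalker(PlanState *node, void *context)
    {
        if (node == NULL)
            return false;
        ExecEndNodeInternal(node);  /* per-node cleanup, no recursion */
        return planstate_tree_walker(node, ExecEndNodeWalker, context);
    }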
Sorry if this is a bit of a meandering review, but those are my thoughts.
--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
Second, I wondered whether the ordering of cleanup operations could be
an issue. Right now, a node can position cleanup code before, after,
or both before and after recursing to child nodes, whereas with this
design change, the cleanup code will always be run before recursing to
child nodes. Here, I think we have problems. Both ExecGather and
ExecEndGatherMerge intentionally clean up the children before the
parent, so that the child shutdown happens before
ExecParallelCleanup(). Based on the comment and commit
acf555bc53acb589b5a2827e65d655fa8c9adee0, this appears to be
intentional, and you can sort of see why from looking at the stuff
that happens in ExecParallelCleanup().
Right, I doubt that changing that is going to work out well.
Hash joins might have issues with it too.
Could it work to make the patch force child cleanup before parent,
instead of after? Or would that break other places?
On the whole though I think it's probably a good idea to leave
parent nodes in control of the timing, so I kind of side with
your later comment about whether we want to change this at all.
regards, tom lane
On Mon, Aug 7, 2023 at 11:44 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Right, I doubt that changing that is going to work out well.
Hash joins might have issues with it too.
I thought about the case, because Hash and Hash Join are such closely
intertwined nodes, but I don't see any problem there. It doesn't
really look like it would matter in what order things got cleaned up.
Unless I'm missing something, all of the data structures are just
independent things that we have to get rid of sometime.
Could it work to make the patch force child cleanup before parent,
instead of after? Or would that break other places?
To me, it seems like the overwhelming majority of the code simply
doesn't care. You could pick an order out of a hat and it would be
100% OK. But I haven't gone and looked through it with this specific
idea in mind.
On the whole though I think it's probably a good idea to leave
parent nodes in control of the timing, so I kind of side with
your later comment about whether we want to change this at all.
My overall feeling here is that what Gather and Gather Merge is doing
is pretty weird. I think I kind of knew that at the time this was all
getting implemented and reviewed, but I wasn't keen to introduce more
infrastructure changes than necessary given that parallel query, as a
project, was still pretty new and I didn't want to give other hackers
more reasons to be unhappy with what was already a lot of very
wide-ranging change to the system. A good number of years having gone
by now, and other people having worked on that code some more, I'm not
too worried about someone calling for a wholesale revert of parallel
query. However, there's a second problem here as well, which is that
I'm still not sure what the right thing to do is. We've fiddled around
with the shutdown sequence for parallel query a number of times now,
and I think there's still stuff that doesn't work quite right,
especially around getting all of the instrumentation data back to the
leader. I haven't spent enough time on this recently enough to be sure
what if any problems remain, though.
So on the one hand, I don't really like the fact that we have an
ad-hoc recursion arrangement here, instead of using
planstate_tree_walker or, as Amit proposes, a List. Giving subordinate
nodes control over the ordering when they don't really need it just
means we have more code with more possibility for bugs and less
certainty about whether the theoretical flexibility is doing anything
in practice. But on the other hand, because we know that at least for
the Gather/GatherMerge case it seems like it probably matters
somewhat, it definitely seems appealing not to change anything as part
of this patch set that we don't really have to.
I've had it firmly in my mind here that we were going to need to
change something somehow -- I mean, the possibility of returning in
the middle of node initialization seems like a pretty major change to
the way this stuff works, and it seems hard for me to believe that we
can just do that and not have to adjust any code anywhere else. Can it
really be true that we can do that and yet not end up creating any
states anywhere with which the current cleanup code is unprepared to
cope? Maybe, but it would seem like rather good luck if that's how it
shakes out. Still, at the moment, I'm having a hard time understanding
what this particular change buys us.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Aug 8, 2023 at 12:36 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Aug 3, 2023 at 4:37 AM Amit Langote <amitlangote09@gmail.com> wrote:
Here's a patch set where the refactoring to move the ExecutorStart()
calls to be closer to GetCachedPlan() (for the call sites that use a
CachedPlan) is extracted into a separate patch, 0002. Its commit
message notes an aspect of this refactoring that I feel a bit nervous
about -- needing to also move the CommandCounterIncrement() call from
the loop in PortalRunMulti() to PortalStart() which now does
ExecutorStart() for the PORTAL_MULTI_QUERY case.
I spent some time today reviewing 0001. Here are a few thoughts and
notes about things that I looked at.
Thanks for taking a look at this.
First, I wondered whether it was really adequate for ExecEndPlan() to
just loop over estate->es_planstate_nodes and call it good. Put
differently, is it possible that we could ever have more than one
relevant EState, say for a subplan or an EPQ execution or something,
so that this loop wouldn't cover everything? I found nothing to make
me think that this is a real danger.
Check.
Second, I wondered whether the ordering of cleanup operations could be
an issue. Right now, a node can position cleanup code before, after,
or both before and after recursing to child nodes, whereas with this
design change, the cleanup code will always be run before recursing to
child nodes.
Because a node is appended to es_planstate_nodes at the end of
ExecInitNode(), child nodes get added before their parent nodes. So
the children are cleaned up first.
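Sketch of that arrangement at the bottom of ExecInitNode(), per the
patch; simplified:
    /*
     * The per-node-type ExecInit* subroutine called above has already
     * recursed into any child plans, so children were appended to the
     * list before this node.
     */
    estate->es_planstate_nodes = lappend(estate->es_planstate_nodes,
                                         result);
    return result;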
Here, I think we have problems. Both ExecEndGather and
ExecEndGatherMerge intentionally clean up the children before the
parent, so that the child shutdown happens before
ExecParallelCleanup(). Based on the comment and commit
acf555bc53acb589b5a2827e65d655fa8c9adee0, this appears to be
intentional, and you can sort of see why from looking at the stuff
that happens in ExecParallelCleanup(). If the instrumentation data
vanishes before the child nodes have a chance to clean things up,
maybe EXPLAIN ANALYZE won't reflect that instrumentation any more. If
the DSA vanishes, maybe we'll crash if we try to access it. If we
actually reach DestroyParallelContext(), we're just going to start
killing the workers. None of that sounds like what we want.
The good news, of a sort, is that I think this might be the only case
of this sort of problem. Most nodes recurse at the end, after doing
all the cleanup, so the behavior won't change. Moreover, even if it
did, most cleanup operations look pretty localized -- they affect only
the node itself, and not its children. A somewhat interesting case is
nodes associated with subplans. Right now, because of the coding of
ExecEndPlan, nodes associated with subplans are all cleaned up at the
very end, after everything that's not inside of a subplan. But with
this change, they'd get cleaned up in the order of initialization,
which actually seems more natural, as long as it doesn't break
anything, which I think it probably won't since, as I mentioned, in most
cases node cleanup looks quite localized, i.e. it doesn't care whether
it happens before or after the cleanup of other nodes.
I think something will have to be done about the parallel query stuff,
though. I'm not sure exactly what. It is a little weird that Gather
and Gather Merge treat starting and killing workers as a purely
"private matter" that they can decide to handle without the executor
overall being very much aware of it. So maybe there's a way that some
of the cleanup logic here could be hoisted up into the general
executor machinery, that is, first end all the nodes, and then go
back, and end all the parallelism using, maybe, another list inside of
the estate. However, I think that the existence of ExecShutdownNode()
is a complication here -- we need to make sure that we don't break
either the case where that happens before overall plan shutdown, or the
case where it doesn't.
Given that children are closed before their parents, the order of operations
in ExecEndGather[Merge] is unchanged.
Third, a couple of minor comments on details of how you actually made
these changes in the patch set. Personally, I would remove all of the
"is closed separately" comments that you added. I think it's a
violation of the general coding principle that you should make the
code look like it's always been that way. Sure, in the immediate
future, people might wonder why you don't need to recurse, but 5 or 10
years from now that's just going to be clutter. Second, in the cases
where the ExecEndNode functions end up completely empty, I would
suggest just removing the functions entirely and making the switch
that dispatches on the node type have a switch case that lists all the
nodes that don't need a callback here and say /* Nothing to do for these
node types */ break;. This will save a few CPU cycles and I think it
will be easier to read as well.
I agree with both suggestions.
Fourth, I wonder whether we really need this patch at all. I initially
thought we did, because if we abandon the initialization of a plan
partway through, then we end up with a plan that is in a state that
previously would never have occurred, and we still have to be able to
clean it up. However, perhaps it's a difference without a distinction.
Say we have a partial plan tree, where not all of the PlanState nodes
ever got created. We then just call the existing version of
ExecEndPlan() on it, with no changes. What goes wrong? Sure, we might
call ExecEndNode() on some null pointers where in the current world
there would always be valid pointers, but ExecEndNode() will handle
that just fine, by doing nothing for those nodes, because it starts
with a NULL-check.
Well, not all cleanup actions for a given node type are recursive
calls to ExecEndNode(); some are also things like this:
/*
* clean out the tuple table
*/
ExecClearTuple(node->ps.ps_ResultTupleSlot);
But should ExecInitNode() subroutines return the partially initialized
PlanState node or NULL on detecting invalidation? If I'm
understanding how you think this should be working correctly, I think
you mean the former, because if it were the latter, ExecInitNode()
would end up returning NULL at the top for the root and then there's
nothing to pass to ExecEndNode(), so no way to clean up to begin with.
In that case, I think we will need to adjust ExecEndNode() subroutines
to add `if (node->ps.ps_ResultTupleSlot)` in the above code, for
example. That's something Tom had said he doesn't like very much [1].
Some node types such as Append, BitmapAnd, etc. that contain a list of
subplans would need some adjustment, such as using palloc0 for
as_appendplans[], etc. so that uninitialized subplans have NULL in the
array.
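A sketch of that adjustment for Append, borrowing the
ExecPlanStillValid() name that appears later in this thread; the
early-return placement is an assumption:
    ListCell   *lc;
    int         i = 0;

    /* palloc0 leaves every array slot NULL until it is filled in */
    appendstate->appendplans = (PlanState **)
        palloc0(nplans * sizeof(PlanState *));

    foreach(lc, node->appendplans)
    {
        Plan       *initNode = (Plan *) lfirst(lc);

        appendstate->appendplans[i++] = ExecInitNode(initNode, estate,
                                                     eflags);
        if (!ExecPlanStillValid(estate))
            return appendstate;     /* remaining entries stay NULL */
    }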
There are also issues around ForeignScan and CustomScan
ExecEndNode()-time callbacks when those nodes are partially initialized -- is
it OK to call the *EndScan callback if the *BeginScan one may not have
been called to begin with? Though, perhaps we can adjust the
ExecInitNode() subroutines for those to return NULL by opening the
relation and checking for invalidation at the beginning instead of in
the middle. That should be done for all Scan or leaf-level node
types.
Anyway, I guess, for the patch's purpose, maybe we should bite the
bullet and make those adjustments rather than change ExecEndNode() as
proposed. I can give that another try.
Another alternative design might be to switch ExecEndNode to use
planstate_tree_walker to walk the node tree, removing the walk from
the node-type-specific functions as in this patch, and deleting the
end-node functions that are no longer required altogether, as proposed
above. I somehow feel that this would be cleaner than the status quo,
but here again, I'm not sure we really need it. planstate_tree_walker
would just pass over any NULL pointers that it found without doing
anything, but the current code does that too, so while this might be
more beautiful than what we have now, I'm not sure that there's any
real reason to do it. The fact that, like the current patch, it would
change the order in which nodes are cleaned up is also an issue -- the
Gather/Gather Merge ordering issues might be easier to handle this way
with some hack in ExecEndNode() than they are with the design you have
now, but we'd still have to do something about them, I believe.
It might be interesting to see if introducing planstate_tree_walker()
in ExecEndNode() makes it easier to reason about ExecEndNode()
generally speaking, but I think you may be right that doing so may
not really make matters easier for the partially initialized
planstate tree case.
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
On Tue, Aug 8, 2023 at 10:32 AM Amit Langote <amitlangote09@gmail.com> wrote:
But should ExecInitNode() subroutines return the partially initialized
PlanState node or NULL on detecting invalidation? If I'm
understanding how you think this should be working correctly, I think
you mean the former, because if it were the latter, ExecInitNode()
would end up returning NULL at the top for the root and then there's
nothing to pass to ExecEndNode(), so no way to clean up to begin with.
In that case, I think we will need to adjust ExecEndNode() subroutines
to add `if (node->ps.ps_ResultTupleSlot)` in the above code, for
example. That's something Tom had said he doesn't like very much [1].
Yeah, I understood Tom's goal as being "don't return partially
initialized nodes."
Personally, I'm not sure that's an important goal. In fact, I don't
even think it's a desirable one. It doesn't look difficult to audit
the end-node functions for cases where they'd fail if a particular
pointer were NULL instead of pointing to some real data, and just
fixing all such cases to have NULL-tests looks like purely mechanical
work that we are unlikely to get wrong. And at least some cases
wouldn't require any changes at all.
If we don't do that, the complexity doesn't go away. It just moves
someplace else. Presumably what we do in that case is have
ExecInitNode functions undo any initialization that they've already
done before returning NULL. There are basically two ways to do that.
Option one is to add code at the point where they return early to
clean up anything they've already initialized, but that code is likely
to substantially duplicate whatever the ExecEndNode function already
knows how to do, and it's very easy for logic like this to get broken
if somebody rearranges an ExecInitNode function down the road. Option
two is to rearrange the ExecInitNode functions now, to open relations
or recurse at the beginning, so that we discover the need to fail
before we initialize anything. That restricts our ability to further
rearrange the functions in future somewhat, but more importantly,
IMHO, it introduces more risk right now. Checking that the ExecEndNode
function will not fail if some pointers are randomly null is a lot
easier than checking that changing the order of operations in an
ExecInitNode function breaks nothing.
I'm not here to say that we can't do one of those things. But I think
adding null-tests to ExecEndNode functions looks like *far* less work
and *way* less risk.
There's a second issue here, too, which is when we abort ExecInitNode
partway through, how do we signal that? You're rightly pointing out
here that if we do that by returning NULL, then we don't do it by
returning a pointer to the partially initialized node that we just
created, which means that we either need to store those partially
initialized nodes in a separate data structure as you propose to do in
0001, or else we need to pick a different signalling convention. We
could change (a) ExecInitNode to have an additional argument, bool
*kaboom, or (b) we could make it return bool and return the node
pointer via a new additional argument, or (c) we could put a Boolean
flag into the estate and let the function signal failure by flipping
the value of the flag. If we do any of those things, then as far as I
can see 0001 is unnecessary. If we do none of them but also avoid
creating partially initialized nodes by one of the two techniques
mentioned two paragraphs prior, then 0001 is also unnecessary. If we
do none of them but do create partially initialized nodes, then we
need 0001.
So if this were a restaurant menu, then it might look like this:
Prix Fixe Menu (choose one from each)
First Course - How do we clean up after partial initialization?
(1) ExecInitNode functions produce partially initialized nodes
(2) ExecInitNode functions get refactored so that the stuff that can
cause early exit always happens first, so that no cleanup is ever
needed
(3) ExecInitNode functions do any required cleanup in situ
Second Course - How do we signal that initialization stopped early?
(A) Return NULL.
(B) Add a bool * out-parameter to ExecInitNode.
(C) Add a Node * out-parameter to ExecInitNode and change the return
value to bool.
(D) Add a bool to the EState.
(E) Something else, maybe.
I think that we need 0001 if we choose specifically (1) and (A). My
gut feeling is that the least-invasive way to do this project is to
choose (1) and (D). My second choice would be (1) and (C), and my
third choice would be (1) and (A). If I can't have (1), I think I
prefer (2) over (3), but I also believe I prefer hiding in a deep hole
to either of them. Maybe I'm not seeing the whole picture correctly
here, but both (2) and (3) look awfully painful to me.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, Aug 9, 2023 at 1:05 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Aug 8, 2023 at 10:32 AM Amit Langote <amitlangote09@gmail.com> wrote:
But should ExecInitNode() subroutines return the partially initialized
PlanState node or NULL on detecting invalidation? If I'm
understanding how you think this should be working correctly, I think
you mean the former, because if it were the latter, ExecInitNode()
would end up returning NULL at the top for the root and then there's
nothing to pass to ExecEndNode(), so no way to clean up to begin with.
In that case, I think we will need to adjust ExecEndNode() subroutines
to add `if (node->ps.ps_ResultTupleSlot)` in the above code, for
example. That's something Tom had said he doesn't like very much [1].
Yeah, I understood Tom's goal as being "don't return partially
initialized nodes."
Personally, I'm not sure that's an important goal. In fact, I don't
even think it's a desirable one. It doesn't look difficult to audit
the end-node functions for cases where they'd fail if a particular
pointer were NULL instead of pointing to some real data, and just
fixing all such cases to have NULL-tests looks like purely mechanical
work that we are unlikely to get wrong. And at least some cases
wouldn't require any changes at all.
If we don't do that, the complexity doesn't go away. It just moves
someplace else. Presumably what we do in that case is have
ExecInitNode functions undo any initialization that they've already
done before returning NULL. There are basically two ways to do that.
Option one is to add code at the point where they return early to
clean up anything they've already initialized, but that code is likely
to substantially duplicate whatever the ExecEndNode function already
knows how to do, and it's very easy for logic like this to get broken
if somebody rearranges an ExecInitNode function down the road.
Yeah, I too am not a fan of making ExecInitNode() clean up partially
initialized nodes.
Option
two is to rearrange the ExecInitNode functions now, to open relations
or recurse at the beginning, so that we discover the need to fail
before we initialize anything. That restricts our ability to further
rearrange the functions in future somewhat, but more importantly,
IMHO, it introduces more risk right now. Checking that the ExecEndNode
function will not fail if some pointers are randomly null is a lot
easier than checking that changing the order of operations in an
ExecInitNode function breaks nothing.I'm not here to say that we can't do one of those things. But I think
adding null-tests to ExecEndNode functions looks like *far* less work
and *way* less risk.
+1
There's a second issue here, too, which is when we abort ExecInitNode
partway through, how do we signal that? You're rightly pointing out
here that if we do that by returning NULL, then we don't do it by
returning a pointer to the partially initialized node that we just
created, which means that we either need to store those partially
initialized nodes in a separate data structure as you propose to do in
0001, or else we need to pick a different signalling convention. We
could change (a) ExecInitNode to have an additional argument, bool
*kaboom, or (b) we could make it return bool and return the node
pointer via a new additional argument, or (c) we could put a Boolean
flag into the estate and let the function signal failure by flipping
the value of the flag.
The failure can already be detected by seeing that
ExecPlanStillValid(estate) is false. The question is what ExecInitNode()
or any of its subroutines should return once it is. I think the
following convention works:
Return partially initialized state from ExecInit* function where we
detect the invalidation after calling ExecInitNode() on a child plan,
so that ExecEndNode() can recurse to clean it up.
Return NULL from ExecInit* functions where we detect the invalidation
after opening and locking a relation but before calling ExecInitNode()
to initialize a child plan, if there's one at all. Even if we have
already set things like ExprContext and TupleTableSlot fields, they
are cleaned up independently of the plan tree anyway, via the cleanup
driven by es_exprcontexts and es_tupleTable, respectively. I even noticed bits
like this in ExecEnd* functions:
- /*
- * Free the exprcontext(s) ... now dead code, see ExecFreeExprContext
- */
-#ifdef NOT_USED
- ExecFreeExprContext(&node->ss.ps);
- if (node->ioss_RuntimeContext)
- FreeExprContext(node->ioss_RuntimeContext, true);
-#endif
So, AFAICS, the ExprContext and TupleTableSlot cleanup in ExecEnd*
functions is unnecessary but remains around because nobody has gotten
around to removing it.
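Sketch of that two-way convention using a hypothetical unary node
type "Foo"; only the shape is meant to be accurate:
    FooState *
    ExecInitFoo(Foo *node, EState *estate, int eflags)
    {
        FooState   *fstate = makeNode(FooState);

        /* ... open and lock relation, possibly noticing invalidation ... */
        if (!ExecPlanStillValid(estate))
            return NULL;    /* nothing recursive to clean up yet */

        outerPlanState(fstate) = ExecInitNode(outerPlan(node), estate,
                                              eflags);
        if (!ExecPlanStillValid(estate))
            return fstate;  /* partial; ExecEndNode() can recurse */

        /* ... rest of initialization ... */
        return fstate;
    }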
If we do any of those things, then as far as I
can see 0001 is unnecessary. If we do none of them but also avoid
creating partially initialized nodes by one of the two techniques
mentioned two paragraphs prior, then 0001 is also unnecessary. If we
do none of them but do create partially initialized nodes, then we
need 0001.
So if this were a restaurant menu, then it might look like this:
Prix Fixe Menu (choose one from each)
First Course - How do we clean up after partial initialization?
(1) ExecInitNode functions produce partially initialized nodes
(2) ExecInitNode functions get refactored so that the stuff that can
cause early exit always happens first, so that no cleanup is ever
needed
(3) ExecInitNode functions do any required cleanup in situ
Second Course - How do we signal that initialization stopped early?
(A) Return NULL.
(B) Add a bool * out-parameter to ExecInitNode.
(C) Add a Node * out-parameter to ExecInitNode and change the return
value to bool.
(D) Add a bool to the EState.
(E) Something else, maybe.
I think that we need 0001 if we choose specifically (1) and (A). My
gut feeling is that the least-invasive way to do this project is to
choose (1) and (D). My second choice would be (1) and (C), and my
third choice would be (1) and (A). If I can't have (1), I think I
prefer (2) over (3), but I also believe I prefer hiding in a deep hole
to either of them. Maybe I'm not seeing the whole picture correctly
here, but both (2) and (3) look awfully painful to me.
I think what I've ended up with in the attached 0001 (WIP) is a
combination of (1), (2), and (D). As mentioned above, (D) is
implemented with the
ExecPlanStillValid() function.
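A minimal sketch of how (D) can be exposed, assuming a bool flag in
EState; the field name and macro placement are assumptions, and the
actual patch may differ in detail:
    /* in EState */
    bool        es_plan_still_valid;    /* cleared on invalidation */

    /* accessor */
    #define ExecPlanStillValid(estate) ((estate)->es_plan_still_valid)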
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v45-0006-Track-opened-range-table-relations-in-a-List-in-.patch
From 77a07e115a18527ebe9312d703d5c76d94fc1f84 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Tue, 4 Jul 2023 22:36:49 +0900
Subject: [PATCH v45 6/6] Track opened range table relations in a List in
EState
This makes ExecCloseRangeTableRelations faster when there are many
relations in the range table but only a few are opened during
execution, such as when run-time pruning kicks in on an Append
containing thousands of partition subplans.
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
src/backend/executor/execMain.c | 9 +++++----
src/backend/executor/execUtils.c | 2 ++
src/include/nodes/execnodes.h | 2 ++
3 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 1197ff2bf2..d678940a3f 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1640,12 +1640,13 @@ ExecCloseResultRelations(EState *estate)
void
ExecCloseRangeTableRelations(EState *estate)
{
- int i;
+ ListCell *lc;
- for (i = 0; i < estate->es_range_table_size; i++)
+ foreach(lc, estate->es_opened_relations)
{
- if (estate->es_relations[i])
- table_close(estate->es_relations[i], NoLock);
+ Relation rel = lfirst(lc);
+
+ table_close(rel, NoLock);
}
}
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index af92d2b3c3..f0320cfa34 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -837,6 +837,8 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
}
estate->es_relations[rti - 1] = rel;
+ estate->es_opened_relations = lappend(estate->es_opened_relations,
+ rel);
}
return rel;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 719a728319..06bf829b78 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -619,6 +619,8 @@ typedef struct EState
Index es_range_table_size; /* size of the range table arrays */
Relation *es_relations; /* Array of per-range-table-entry Relation
* pointers, or NULL if not yet opened */
+ List *es_opened_relations; /* List of non-NULL entries in
+ * es_relations in no specific order */
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
--
2.35.3
v45-0003-Add-field-to-store-parent-relids-to-Append-Merge.patch
From 0a9161783a415ea5593514cb4d650cd3bc6c0601 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Tue, 4 Jul 2023 22:36:31 +0900
Subject: [PATCH v45 3/6] Add field to store parent relids to
Append/MergeAppend
There's no way currently in the executor to tell if the child
subplans of Append/MergeAppend are scanning partitions, and if
they indeed do, what the RT indexes of their parent/ancestor tables
are. The executor doesn't need to see their RT indexes except for
run-time pruning, in which case they can be found in the
PartitionPruneInfo, but a future commit will create a need for
them to be available at all times for the purpose of locking
those parent/ancestor tables when executing a cached plan.
The code to look up partitioned parent relids for a given list of
partition scan subpaths of an Append/MergeAppend is already present
in make_partition_pruneinfo() but it's local to partprune.c. This
commit refactors that code into its own function called
add_append_subpath_partrelids() defined in appendinfo.c and
generalizes it to consider child join and aggregate paths. To
facilitate looking up of parent rels of child grouping rels in
add_append_subpath_partrelids(), parent links are now also set in
the RelOptInfos of child grouping rels too, like they are in
those of child base and join rels.
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
src/backend/optimizer/plan/createplan.c | 41 ++++++--
src/backend/optimizer/plan/planner.c | 3 +
src/backend/optimizer/plan/setrefs.c | 4 +
src/backend/optimizer/util/appendinfo.c | 134 ++++++++++++++++++++++++
src/backend/partitioning/partprune.c | 124 +++-------------------
src/include/nodes/plannodes.h | 14 +++
src/include/optimizer/appendinfo.h | 3 +
src/include/partitioning/partprune.h | 3 +-
8 files changed, 203 insertions(+), 123 deletions(-)
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index af48109058..8ac1d3909b 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -25,6 +25,7 @@
#include "nodes/extensible.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
+#include "optimizer/appendinfo.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/optimizer.h"
@@ -1210,6 +1211,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
Oid *nodeCollations = NULL;
bool *nodeNullsFirst = NULL;
bool consider_async = false;
+ List *allpartrelids = NIL;
/*
* The subpaths list could be empty, if every child was proven empty by
@@ -1351,15 +1353,23 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
++nasyncplans;
}
+ /*
+ * Find partitioned parent rel(s) of the subpath's rel(s).
+ */
+ allpartrelids = add_append_subpath_partrelids(root, subpath, rel,
+ allpartrelids);
+
subplans = lappend(subplans, subplan);
}
+ plan->allpartrelids = allpartrelids;
+
/*
- * If any quals exist, they may be useful to perform further partition
- * pruning during execution. Gather information needed by the executor to
- * do partition pruning.
+ * If scanning partitions, check if there are quals that may be useful to
+ * perform further partition pruning during execution. Gather information
+ * needed by the executor to do partition pruning.
*/
- if (enable_partition_pruning)
+ if (enable_partition_pruning && allpartrelids != NIL)
{
List *prunequal;
@@ -1380,7 +1390,8 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
partpruneinfo =
make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ allpartrelids);
}
plan->appendplans = subplans;
@@ -1426,6 +1437,7 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
PartitionPruneInfo *partpruneinfo = NULL;
+ List *allpartrelids = NIL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1515,15 +1527,23 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplan = (Plan *) sort;
}
+ /*
+ * Find partitioned parent rel(s) of the subpath's rel(s).
+ */
+ allpartrelids = add_append_subpath_partrelids(root, subpath, rel,
+ allpartrelids);
+
subplans = lappend(subplans, subplan);
}
+ node->allpartrelids = allpartrelids;
+
/*
- * If any quals exist, they may be useful to perform further partition
- * pruning during execution. Gather information needed by the executor to
- * do partition pruning.
+ * If scanning partitions, check if there are quals that may be useful to
+ * perform further partition pruning during execution. Gather information
+ * needed by the executor to do partition pruning.
*/
- if (enable_partition_pruning)
+ if (enable_partition_pruning && allpartrelids != NIL)
{
List *prunequal;
@@ -1535,7 +1555,8 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
if (prunequal != NIL)
partpruneinfo = make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ allpartrelids);
}
node->mergeplans = subplans;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 44efb1f4eb..f97bc09113 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -7855,8 +7855,11 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
agg_costs, gd, &child_extra,
&child_partially_grouped_rel);
+ /* Mark as child of grouped_rel. */
+ child_grouped_rel->parent = grouped_rel;
if (child_partially_grouped_rel)
{
+ child_partially_grouped_rel->parent = grouped_rel;
partially_grouped_live_children =
lappend(partially_grouped_live_children,
child_partially_grouped_rel);
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 97fa561e4e..854dd7c8af 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1766,6 +1766,8 @@ set_append_references(PlannerInfo *root,
set_dummy_tlist_references((Plan *) aplan, rtoffset);
aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
+ foreach(l, aplan->allpartrelids)
+ lfirst(l) = offset_relid_set((Relids) lfirst(l), rtoffset);
if (aplan->part_prune_info)
{
@@ -1842,6 +1844,8 @@ set_mergeappend_references(PlannerInfo *root,
set_dummy_tlist_references((Plan *) mplan, rtoffset);
mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
+ foreach(l, mplan->allpartrelids)
+ lfirst(l) = offset_relid_set((Relids) lfirst(l), rtoffset);
if (mplan->part_prune_info)
{
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index f456b3b0a4..5bd8e82b9b 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -41,6 +41,7 @@ static void make_inh_translation_list(Relation oldrelation,
AppendRelInfo *appinfo);
static Node *adjust_appendrel_attrs_mutator(Node *node,
adjust_appendrel_attrs_context *context);
+static List *add_part_relids(List *allpartrelids, Bitmapset *partrelids);
/*
@@ -1035,3 +1036,136 @@ distribute_row_identity_vars(PlannerInfo *root)
}
}
}
+
+/*
+ * add_append_subpath_partrelids
+ * Look up a child subpath's rel's partitioned parent relids up to
+ * parentrel and add the bitmapset containing those into
+ * 'allpartrelids'
+ */
+List *
+add_append_subpath_partrelids(PlannerInfo *root, Path *subpath,
+ RelOptInfo *parentrel,
+ List *allpartrelids)
+{
+ RelOptInfo *prel = subpath->parent;
+ Relids partrelids = NULL;
+
+ /* Nothing to do if there's no parent to begin with. */
+ if (!IS_OTHER_REL(prel))
+ return allpartrelids;
+
+ /*
+ * Traverse up to the pathrel's topmost partitioned parent, collecting
+ * parent relids as we go; but stop if we reach parentrel. (Normally, a
+ * pathrel's topmost partitioned parent is either parentrel or a UNION ALL
+ * appendrel child of parentrel. But when handling partitionwise joins of
+ * multi-level partitioning trees, we can see an append path whose
+ * parentrel is an intermediate partitioned table.)
+ */
+ do
+ {
+ Relids parent_relids = NULL;
+
+ /*
+ * For simple child rels, we can simply set the parent_relids to
+ * prel->parent->relids. But for partitionwise join and aggregate
+ * child rels, while we can use prel->parent to move up the tree,
+ * parent_relids must be found the hard way through AppendRelInfos,
+ * because 1) a joinrel's relids may point to RTE_JOIN entries,
+ * 2) topmost parent grouping rel's relids field is NULL.
+ */
+ if (IS_SIMPLE_REL(prel))
+ {
+ prel = prel->parent;
+ /* Stop once we reach the root partitioned rel. */
+ if (!IS_PARTITIONED_REL(prel))
+ break;
+ parent_relids = bms_add_members(parent_relids, prel->relids);
+ }
+ else
+ {
+ AppendRelInfo **appinfos;
+ int nappinfos,
+ i;
+
+ appinfos = find_appinfos_by_relids(root, prel->relids,
+ &nappinfos);
+ for (i = 0; i < nappinfos; i++)
+ {
+ AppendRelInfo *appinfo = appinfos[i];
+
+ parent_relids = bms_add_member(parent_relids,
+ appinfo->parent_relid);
+ }
+ pfree(appinfos);
+ prel = prel->parent;
+ }
+ /* accept this level as an interesting parent */
+ partrelids = bms_add_members(partrelids, parent_relids);
+ if (prel == parentrel)
+ break; /* don't traverse above parentrel */
+ } while (IS_OTHER_REL(prel));
+
+ if (partrelids == NULL)
+ return allpartrelids;
+
+ return add_part_relids(allpartrelids, partrelids);
+}
+
+/*
+ * add_part_relids
+ * Add new info to a list of Bitmapsets of partitioned relids.
+ *
+ * Within 'allpartrelids', there is one Bitmapset for each topmost parent
+ * partitioned rel. Each Bitmapset contains the RT indexes of the topmost
+ * parent as well as its relevant non-leaf child partitions. Since (by
+ * construction of the rangetable list) parent partitions must have lower
+ * RT indexes than their children, we can distinguish the topmost parent
+ * as being the lowest set bit in the Bitmapset.
+ *
+ * 'partrelids' contains the RT indexes of a parent partitioned rel, and
+ * possibly some non-leaf children, that are newly identified as parents of
+ * some subpath rel passed to make_partition_pruneinfo(). These are added
+ * to an appropriate member of 'allpartrelids'.
+ *
+ * Note that the list contains only RT indexes of partitioned tables that
+ * are parents of some scan-level relation appearing in the 'subpaths' that
+ * make_partition_pruneinfo() is dealing with. Also, "topmost" parents are
+ * not allowed to be higher than the 'parentrel' associated with the append
+ * path. In this way, we avoid expending cycles on partitioned rels that
+ * can't contribute useful pruning information for the problem at hand.
+ * (It is possible for 'parentrel' to be a child partitioned table, and it
+ * is also possible for scan-level relations to be child partitioned tables
+ * rather than leaf partitions. Hence we must construct this relation set
+ * with reference to the particular append path we're dealing with, rather
+ * than looking at the full partitioning structure represented in the
+ * RelOptInfos.)
+ */
+static List *
+add_part_relids(List *allpartrelids, Bitmapset *partrelids)
+{
+ Index targetpart;
+ ListCell *lc;
+
+ /* We can easily get the lowest set bit this way: */
+ targetpart = bms_next_member(partrelids, -1);
+ Assert(targetpart > 0);
+
+ /* Look for a matching topmost parent */
+ foreach(lc, allpartrelids)
+ {
+ Bitmapset *currpartrelids = (Bitmapset *) lfirst(lc);
+ Index currtarget = bms_next_member(currpartrelids, -1);
+
+ if (targetpart == currtarget)
+ {
+ /* Found a match, so add any new RT indexes to this hierarchy */
+ currpartrelids = bms_add_members(currpartrelids, partrelids);
+ lfirst(lc) = currpartrelids;
+ return allpartrelids;
+ }
+ }
+ /* No match, so add the new partition hierarchy to the list */
+ return lappend(allpartrelids, partrelids);
+}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 7179b22a05..213512a5f4 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -138,7 +138,6 @@ typedef struct PruneStepResult
} PruneStepResult;
-static List *add_part_relids(List *allpartrelids, Bitmapset *partrelids);
static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
RelOptInfo *parentrel,
List *prunequal,
@@ -218,33 +217,32 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
+ * 'allpartrelids' contains Bitmapsets of RT indexes of partitioned parents
+ * whose partitions' Paths are in 'subpaths'; there's one Bitmapset for every
+ * partition tree involved.
*/
PartitionPruneInfo *
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
- List *prunequal)
+ List *prunequal,
+ List *allpartrelids)
{
PartitionPruneInfo *pruneinfo;
Bitmapset *allmatchedsubplans = NULL;
- List *allpartrelids;
List *prunerelinfos;
int *relid_subplan_map;
ListCell *lc;
int i;
+ Assert(list_length(allpartrelids) > 0);
+
/*
- * Scan the subpaths to see which ones are scans of partition child
- * relations, and identify their parent partitioned rels. (Note: we must
- * restrict the parent partitioned rels to be parentrel or children of
- * parentrel, otherwise we couldn't translate prunequal to match.)
- *
- * Also construct a temporary array to map from partition-child-relation
- * relid to the index in 'subpaths' of the scan plan for that partition.
+ * Construct a temporary array to map from partition-child-relation relid
+ * to the index in 'subpaths' of the scan plan for that partition.
* (Use of "subplan" rather than "subpath" is a bit of a misnomer, but
* we'll let it stand.) For convenience, we use 1-based indexes here, so
* that zero can represent an un-filled array entry.
*/
- allpartrelids = NIL;
relid_subplan_map = palloc0(sizeof(int) * root->simple_rel_array_size);
i = 1;
@@ -253,50 +251,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Path *path = (Path *) lfirst(lc);
RelOptInfo *pathrel = path->parent;
- /* We don't consider partitioned joins here */
- if (pathrel->reloptkind == RELOPT_OTHER_MEMBER_REL)
- {
- RelOptInfo *prel = pathrel;
- Bitmapset *partrelids = NULL;
-
- /*
- * Traverse up to the pathrel's topmost partitioned parent,
- * collecting parent relids as we go; but stop if we reach
- * parentrel. (Normally, a pathrel's topmost partitioned parent
- * is either parentrel or a UNION ALL appendrel child of
- * parentrel. But when handling partitionwise joins of
- * multi-level partitioning trees, we can see an append path whose
- * parentrel is an intermediate partitioned table.)
- */
- do
- {
- AppendRelInfo *appinfo;
-
- Assert(prel->relid < root->simple_rel_array_size);
- appinfo = root->append_rel_array[prel->relid];
- prel = find_base_rel(root, appinfo->parent_relid);
- if (!IS_PARTITIONED_REL(prel))
- break; /* reached a non-partitioned parent */
- /* accept this level as an interesting parent */
- partrelids = bms_add_member(partrelids, prel->relid);
- if (prel == parentrel)
- break; /* don't traverse above parentrel */
- } while (prel->reloptkind == RELOPT_OTHER_MEMBER_REL);
-
- if (partrelids)
- {
- /*
- * Found some relevant parent partitions, which may or may not
- * overlap with partition trees we already found. Add new
- * information to the allpartrelids list.
- */
- allpartrelids = add_part_relids(allpartrelids, partrelids);
- /* Also record the subplan in relid_subplan_map[] */
- /* No duplicates please */
- Assert(relid_subplan_map[pathrel->relid] == 0);
- relid_subplan_map[pathrel->relid] = i;
- }
- }
+ /* No duplicates please */
+ Assert(relid_subplan_map[pathrel->relid] == 0);
+ relid_subplan_map[pathrel->relid] = i;
i++;
}
@@ -362,63 +319,6 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
return pruneinfo;
}
-/*
- * add_part_relids
- * Add new info to a list of Bitmapsets of partitioned relids.
- *
- * Within 'allpartrelids', there is one Bitmapset for each topmost parent
- * partitioned rel. Each Bitmapset contains the RT indexes of the topmost
- * parent as well as its relevant non-leaf child partitions. Since (by
- * construction of the rangetable list) parent partitions must have lower
- * RT indexes than their children, we can distinguish the topmost parent
- * as being the lowest set bit in the Bitmapset.
- *
- * 'partrelids' contains the RT indexes of a parent partitioned rel, and
- * possibly some non-leaf children, that are newly identified as parents of
- * some subpath rel passed to make_partition_pruneinfo(). These are added
- * to an appropriate member of 'allpartrelids'.
- *
- * Note that the list contains only RT indexes of partitioned tables that
- * are parents of some scan-level relation appearing in the 'subpaths' that
- * make_partition_pruneinfo() is dealing with. Also, "topmost" parents are
- * not allowed to be higher than the 'parentrel' associated with the append
- * path. In this way, we avoid expending cycles on partitioned rels that
- * can't contribute useful pruning information for the problem at hand.
- * (It is possible for 'parentrel' to be a child partitioned table, and it
- * is also possible for scan-level relations to be child partitioned tables
- * rather than leaf partitions. Hence we must construct this relation set
- * with reference to the particular append path we're dealing with, rather
- * than looking at the full partitioning structure represented in the
- * RelOptInfos.)
- */
-static List *
-add_part_relids(List *allpartrelids, Bitmapset *partrelids)
-{
- Index targetpart;
- ListCell *lc;
-
- /* We can easily get the lowest set bit this way: */
- targetpart = bms_next_member(partrelids, -1);
- Assert(targetpart > 0);
-
- /* Look for a matching topmost parent */
- foreach(lc, allpartrelids)
- {
- Bitmapset *currpartrelids = (Bitmapset *) lfirst(lc);
- Index currtarget = bms_next_member(currpartrelids, -1);
-
- if (targetpart == currtarget)
- {
- /* Found a match, so add any new RT indexes to this hierarchy */
- currpartrelids = bms_add_members(currpartrelids, partrelids);
- lfirst(lc) = currpartrelids;
- return allpartrelids;
- }
- }
- /* No match, so add the new partition hierarchy to the list */
- return lappend(allpartrelids, partrelids);
-}
-
/*
* make_partitionedrel_pruneinfo
* Build a List of PartitionedRelPruneInfos, one for each interesting
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 1b787fe031..7a5f3ba625 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -267,6 +267,13 @@ typedef struct Append
List *appendplans;
int nasyncplans; /* # of asynchronous plans */
+ /*
+ * RTIs of all partitioned tables whose children are scanned by
+ * appendplans. The list contains a bitmapset for every partition tree
+ * covered by this Append.
+ */
+ List *allpartrelids;
+
/*
* All 'appendplans' preceding this index are non-partial plans. All
* 'appendplans' from this index onwards are partial plans.
@@ -291,6 +298,13 @@ typedef struct MergeAppend
List *mergeplans;
+ /*
+ * RTIs of all partitioned tables whose children are scanned by
+ * mergeplans. The list contains a bitmapset for every partition tree
+ * covered by this MergeAppend.
+ */
+ List *allpartrelids;
+
/* these fields are just like the sort-key info in struct Sort: */
/* number of sort-key columns */
diff --git a/src/include/optimizer/appendinfo.h b/src/include/optimizer/appendinfo.h
index a05f91f77d..1621a7319a 100644
--- a/src/include/optimizer/appendinfo.h
+++ b/src/include/optimizer/appendinfo.h
@@ -46,5 +46,8 @@ extern void add_row_identity_columns(PlannerInfo *root, Index rtindex,
RangeTblEntry *target_rte,
Relation target_relation);
extern void distribute_row_identity_vars(PlannerInfo *root);
+extern List *add_append_subpath_partrelids(PlannerInfo *root, Path *subpath,
+ RelOptInfo *parentrel,
+ List *allpartrelids);
#endif /* APPENDINFO_H */
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 8636e04e37..caa774a111 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -73,7 +73,8 @@ typedef struct PartitionPruneContext
extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
struct RelOptInfo *parentrel,
List *subpaths,
- List *prunequal);
+ List *prunequal,
+ List *allpartrelids);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
--
2.35.3
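(Illustration, not part of the patch: the new allpartrelids field is a List of
Bitmapsets, one per partition tree, built with the usual List/Bitmapset APIs.
The RT indexes below are hypothetical.)

	/*
	 * Hypothetical example: an Append scanning the leaves of a two-level
	 * partition tree whose root has RT index 1 and whose intermediate
	 * partitioned child has RT index 3.  Leaf partitions are not included;
	 * the lowest set bit (1) identifies the topmost parent.
	 */
	Bitmapset  *tree = NULL;
	List	   *allpartrelids = NIL;

	tree = bms_add_member(tree, 1);		/* topmost partitioned parent */
	tree = bms_add_member(tree, 3);		/* intermediate partitioned child */
	allpartrelids = lappend(allpartrelids, tree);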
v45-0005-Delay-locking-of-child-tables-in-cached-plans-un.patchapplication/octet-stream; name=v45-0005-Delay-locking-of-child-tables-in-cached-plans-un.patchDownload
From 0ed0a6928f0c9a4c713850b6f0d72ab5a00c8425 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Tue, 4 Jul 2023 22:36:45 +0900
Subject: [PATCH v45 5/6] Delay locking of child tables in cached plans until
ExecutorStart()
Currently, GetCachedPlan() takes a lock on all relations contained in
a cached plan before returning it as a valid plan to its callers for
execution. One disadvantage is that if the plan contains partitions
that are prunable with conditions involving EXTERN parameters and
other stable expressions (known as "initial pruning"), many of them
would be locked unnecessarily, because only those that survive
initial pruning need to have been locked. Locking all partitions this
way causes significant delay when there are many partitions. Note
that initial pruning occurs during the executor's initialization of the
plan, that is, in ExecInitNode().
This commit rearranges things to move the locking of child tables
referenced in a cached plan to occur during ExecInitNode() so that
initial pruning in the ExecInitNode() subroutines of the plan nodes
that support pruning can eliminate any child tables that need not be
scanned and thus locked.
To determine that a given table is a child table,
ExecGetRangeTableRelation() now looks at the RTE's inFromCl field,
which is true only for tables directly mentioned in the query and
false for child tables. Note that any tables whose RTEs'
inFromCl is true would already have been locked by GetCachedPlan(),
so need not be locked again during execution.
If the locking of child tables causes the CachedPlan to go stale, that
is, its is_valid flag is set to false by PlanCacheRelCallback() when an
invalidation message matching some child table contained in the plan
is processed, ExecInitNode() abandons the initialization of the
remaining nodes in the plan tree. In that case, InitPlan() returns
after setting QueryDesc.planstate to NULL to indicate to the caller
that no execution is possible with the plan tree as is. Also,
ExecutorStart() now returns true or false to indicate whether or not
QueryDesc.planstate points to a successfully initialized PlanState
tree. Call sites that use GetCachedPlan() to get the plan trees to
pass to the executor should now be prepared to retry in the cases
where ExecutorStart() returns false.
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
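(A rough caller-side sketch, not part of the patch, of the retry contract
described above; plansource, params, dest, and queryEnv are assumed to be in
scope.)

	for (;;)
	{
		CachedPlan *cplan = GetCachedPlan(plansource, params,
										  CurrentResourceOwner, queryEnv);
		PlannedStmt *stmt = linitial_node(PlannedStmt, cplan->stmt_list);
		QueryDesc  *qdesc = CreateQueryDesc(stmt, cplan,
											plansource->query_string,
											GetActiveSnapshot(),
											InvalidSnapshot,
											dest, params, queryEnv, 0);

		if (ExecutorStart(qdesc, 0))
		{
			ExecutorRun(qdesc, ForwardScanDirection, 0, true);
			ExecutorFinish(qdesc);
			ExecutorEnd(qdesc);
			FreeQueryDesc(qdesc);
			ReleaseCachedPlan(cplan, CurrentResourceOwner);
			break;
		}

		/* Plan went stale while locking child tables; clean up and replan. */
		ExecutorEnd(qdesc);
		FreeQueryDesc(qdesc);
		ReleaseCachedPlan(cplan, CurrentResourceOwner);
	}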
---
contrib/auto_explain/auto_explain.c | 12 +-
.../pg_stat_statements/pg_stat_statements.c | 12 +-
src/backend/commands/copyto.c | 7 +-
src/backend/commands/createas.c | 10 +-
src/backend/commands/explain.c | 33 +++-
src/backend/commands/extension.c | 4 +-
src/backend/commands/matview.c | 10 +-
src/backend/commands/portalcmds.c | 5 +-
src/backend/commands/prepare.c | 23 ++-
src/backend/executor/README | 39 ++++-
src/backend/executor/execMain.c | 64 +++++--
src/backend/executor/execParallel.c | 14 +-
src/backend/executor/execPartition.c | 10 ++
src/backend/executor/execUtils.c | 61 +++++--
src/backend/executor/functions.c | 5 +-
src/backend/executor/nodeAppend.c | 19 +++
src/backend/executor/nodeMergeAppend.c | 19 +++
src/backend/executor/spi.c | 26 ++-
src/backend/storage/lmgr/lmgr.c | 45 +++++
src/backend/tcop/postgres.c | 18 +-
src/backend/tcop/pquery.c | 49 +++++-
src/backend/utils/cache/lsyscache.c | 21 +++
src/backend/utils/cache/plancache.c | 156 +++++++-----------
src/include/commands/explain.h | 3 +-
src/include/executor/execdesc.h | 4 +
src/include/executor/executor.h | 7 +-
src/include/storage/lmgr.h | 1 +
src/include/tcop/pquery.h | 2 +-
src/include/utils/lsyscache.h | 1 +
src/test/modules/delay_execution/Makefile | 3 +-
.../modules/delay_execution/delay_execution.c | 67 +++++++-
.../expected/cached-plan-replan.out | 156 ++++++++++++++++++
.../specs/cached-plan-replan.spec | 61 +++++++
33 files changed, 786 insertions(+), 181 deletions(-)
create mode 100644 src/test/modules/delay_execution/expected/cached-plan-replan.out
create mode 100644 src/test/modules/delay_execution/specs/cached-plan-replan.spec
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index c3ac27ae99..a0630d7944 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -78,7 +78,7 @@ static ExecutorRun_hook_type prev_ExecutorRun = NULL;
static ExecutorFinish_hook_type prev_ExecutorFinish = NULL;
static ExecutorEnd_hook_type prev_ExecutorEnd = NULL;
-static void explain_ExecutorStart(QueryDesc *queryDesc, int eflags);
+static bool explain_ExecutorStart(QueryDesc *queryDesc, int eflags);
static void explain_ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction,
uint64 count, bool execute_once);
@@ -258,9 +258,11 @@ _PG_init(void)
/*
* ExecutorStart hook: start up logging if needed
*/
-static void
+static bool
explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ bool plan_valid;
+
/*
* At the beginning of each top-level statement, decide whether we'll
* sample this statement. If nested-statement explaining is enabled,
@@ -296,9 +298,9 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
}
if (prev_ExecutorStart)
- prev_ExecutorStart(queryDesc, eflags);
+ plan_valid = prev_ExecutorStart(queryDesc, eflags);
else
- standard_ExecutorStart(queryDesc, eflags);
+ plan_valid = standard_ExecutorStart(queryDesc, eflags);
if (auto_explain_enabled())
{
@@ -316,6 +318,8 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcxt);
}
}
+
+ return plan_valid;
}
/*
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 55b957d251..1160a7326a 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -325,7 +325,7 @@ static PlannedStmt *pgss_planner(Query *parse,
const char *query_string,
int cursorOptions,
ParamListInfo boundParams);
-static void pgss_ExecutorStart(QueryDesc *queryDesc, int eflags);
+static bool pgss_ExecutorStart(QueryDesc *queryDesc, int eflags);
static void pgss_ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction,
uint64 count, bool execute_once);
@@ -963,13 +963,15 @@ pgss_planner(Query *parse,
/*
* ExecutorStart hook: start up tracking if needed
*/
-static void
+static bool
pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ bool plan_valid;
+
if (prev_ExecutorStart)
- prev_ExecutorStart(queryDesc, eflags);
+ plan_valid = prev_ExecutorStart(queryDesc, eflags);
else
- standard_ExecutorStart(queryDesc, eflags);
+ plan_valid = standard_ExecutorStart(queryDesc, eflags);
/*
* If query has queryId zero, don't track it. This prevents double
@@ -992,6 +994,8 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcxt);
}
}
+
+ return plan_valid;
}
/*
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 9e4b2437a5..916d6dced3 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -558,7 +558,8 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL,
+ pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
@@ -567,8 +568,10 @@ BeginCopyTo(ParseState *pstate,
* Call ExecutorStart to prepare the plan for execution.
*
* ExecutorStart computes a result tupdesc for us
+ *
+ * OK to ignore the return value; plan can't become invalid.
*/
- ExecutorStart(cstate->queryDesc, 0);
+ (void) ExecutorStart(cstate->queryDesc, 0);
tupDesc = cstate->queryDesc->tupDesc;
}
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index e91920ca14..e5cce4c07c 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,12 +325,16 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, GetIntoRelEFlags(into));
+ /*
+ * call ExecutorStart to prepare the plan for execution
+ *
+ * OK to ignore the return value; plan can't become invalid.
+ */
+ (void) ExecutorStart(queryDesc, GetIntoRelEFlags(into));
/* run the plan to completion */
ExecutorRun(queryDesc, ForwardScanDirection, 0, true);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 59d57f9c10..6171a20fe2 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -416,7 +416,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
}
- queryDesc = ExplainQueryDesc(plan, queryString, into, es,
+ queryDesc = ExplainQueryDesc(plan, NULL, queryString, into, es,
params, queryEnv);
Assert(queryDesc);
@@ -429,9 +429,11 @@ ExplainOneQuery(Query *query, int cursorOptions,
/*
* ExplainQueryDesc
* Set up QueryDesc for EXPLAINing a given plan
+ *
+ * This returns NULL if cplan is found to be no longer valid.
*/
QueryDesc *
-ExplainQueryDesc(PlannedStmt *stmt,
+ExplainQueryDesc(PlannedStmt *stmt, CachedPlan *cplan,
const char *queryString, IntoClause *into, ExplainState *es,
ParamListInfo params, QueryEnvironment *queryEnv)
{
@@ -467,7 +469,7 @@ ExplainQueryDesc(PlannedStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(stmt, queryString,
+ queryDesc = CreateQueryDesc(stmt, cplan, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
@@ -481,8 +483,18 @@ ExplainQueryDesc(PlannedStmt *stmt,
if (into)
eflags |= GetIntoRelEFlags(into);
- /* Call ExecutorStart to prepare the plan for execution. */
- ExecutorStart(queryDesc, eflags);
+ /*
+ * Call ExecutorStart to prepare the plan for execution. A cached plan
+	 * may get invalidated during plan initialization.
+ */
+ if (!ExecutorStart(queryDesc, eflags))
+ {
+ /* Clean up. */
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+ PopActiveSnapshot();
+ return NULL;
+ }
return queryDesc;
}
@@ -4884,6 +4896,17 @@ ExplainDummyGroup(const char *objtype, const char *labelname, ExplainState *es)
}
}
+/*
+ * Discard output buffer for a fresh restart.
+ */
+void
+ExplainResetOutput(ExplainState *es)
+{
+ Assert(es->str);
+ resetStringInfo(es->str);
+ ExplainBeginOutput(es);
+}
+
/*
* Emit the start-of-output boilerplate.
*
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 535072d181..93a683e312 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -797,11 +797,13 @@ execute_sql_string(const char *sql)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
- ExecutorStart(qdesc, 0);
+ /* OK to ignore the return value; plan can't become invalid. */
+ (void) ExecutorStart(qdesc, 0);
ExecutorRun(qdesc, ForwardScanDirection, 0, true);
ExecutorFinish(qdesc);
ExecutorEnd(qdesc);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index ac2e74fa3f..38795ce7ca 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -408,12 +408,16 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, 0);
+ /*
+ * call ExecutorStart to prepare the plan for execution
+ *
+ * OK to ignore the return value; plan can't become invalid.
+ */
+ (void) ExecutorStart(queryDesc, 0);
/* run the plan */
ExecutorRun(queryDesc, ForwardScanDirection, 0, true);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 73ed7aa2f0..5120f93414 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -142,9 +142,10 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
/*
* Start execution, inserting parameters if any.
+ *
+ * OK to ignore the return value; plan can't become invalid here.
*/
- PortalStart(portal, params, 0, GetActiveSnapshot());
-
+ (void) PortalStart(portal, params, 0, GetActiveSnapshot());
Assert(portal->strategy == PORTAL_ONE_SELECT);
/*
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 1e9a98ad6e..156c3c5fee 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -183,6 +183,7 @@ ExecuteQuery(ParseState *pstate,
paramLI = EvaluateParams(pstate, entry, stmt->params, estate);
}
+replan:
/* Create a new portal to run the query in */
portal = CreateNewPortal();
/* Don't display the portal in pg_cursors, it is for internal use only */
@@ -251,9 +252,15 @@ ExecuteQuery(ParseState *pstate,
}
/*
- * Run the portal as appropriate.
+ * Run the portal as appropriate. If the portal contains a cached plan, it
+ * must be recreated if the cached plan was found to have been invalidated
+ * when initializing one of the plan trees contained in it.
*/
- PortalStart(portal, paramLI, eflags, GetActiveSnapshot());
+ if (!PortalStart(portal, paramLI, eflags, GetActiveSnapshot()))
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
(void) PortalRun(portal, count, false, true, dest, dest, qc);
@@ -574,7 +581,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
{
PreparedStatement *entry;
const char *query_string;
- CachedPlan *cplan;
+ CachedPlan *cplan = NULL;
List *plan_list;
ListCell *p;
ParamListInfo paramLI = NULL;
@@ -618,6 +625,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
+replan:
cplan = GetCachedPlan(entry->plansource, paramLI,
CurrentResourceOwner, queryEnv);
@@ -642,9 +650,14 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
{
QueryDesc *queryDesc;
- queryDesc = ExplainQueryDesc(pstmt, queryString,
+ queryDesc = ExplainQueryDesc(pstmt, cplan, queryString,
into, es, paramLI, queryEnv);
- Assert(queryDesc != NULL);
+ if (queryDesc == NULL)
+ {
+ ExplainResetOutput(es);
+ ReleaseCachedPlan(cplan, CurrentResourceOwner);
+ goto replan;
+ }
ExplainOnePlan(queryDesc, into, es, query_string, paramLI,
queryEnv, &planduration,
(es->buffers ? &bufusage : NULL));
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 17775a49e2..0a7bb42ccb 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -280,6 +280,37 @@ are typically reset to empty once per tuple. Per-tuple contexts are usually
associated with ExprContexts, and commonly each PlanState node has its own
ExprContext to evaluate its qual and targetlist expressions in.
+Relation Locking
+----------------
+
+Normally, the executor does not lock non-index relations appearing in a given
+plan tree when initializing it for execution if the plan tree is freshly
+created, that is, not derived from a CachedPlan; in that case, the locks must
+already have been taken during parsing, rewriting, and planning of the query.
+If the plan tree is a cached one, there may still
+be unlocked relations present in the plan tree, because GetCachedPlan() only
+locks the relations that would be present in the query's range table before
+planning occurs, but not relations that would have been added to the range
+table during planning. This means that inheritance child tables present in
+a cached plan, which are added to the query's range table during planning,
+would not have been locked when the plan enters the executor.
+
+GetCachedPlan() punts on locking child tables because not all of them may
+actually be scanned during a given execution of the plan; if the child tables
+are partitions, some may get pruned away by execution-initialization-time
+pruning. Locking of child tables is therefore deferred until execution
+initialization, that is, until ExecInitNode() is called on the plan nodes
+containing them.
+
+So there is a time window during which a cached plan tree containing child
+tables could go stale, because those tables could get changed in other
+backends before ExecInitNode() gets a lock on them. The executor therefore
+must check the validity of the plan tree each time it takes a lock on a child
+table that survived execution-initialization-time pruning. It does that by
+looking at the is_valid flag of the CachedPlan passed to it. If the plan
+tree is indeed stale (is_valid=false), the executor must give up initializing
+it any further and return to the caller, letting it know that execution must
+be retried with a new plan tree.
Query Processing Control Flow
-----------------------------
@@ -316,7 +347,13 @@ This is a sketch of control flow for full query processing:
FreeQueryDesc
-Per above comments, it's not really critical for ExecEndNode to free any
+As mentioned in the "Relation Locking" section, if the plan tree is found to
+be stale during one of the recursive calls of ExecInitNode() after taking a
+lock on a child table, control is immediately returned to the caller of
+ExecutorStart(), which must redo the steps from CreateQueryDesc with a new
+plan tree.
+
+Per above comments, it's not really critical for ExecEndPlan to free any
memory; it'll all go away in FreeExecutorState anyway. However, we do need to
be careful to close relations, drop buffer pins, etc, so we do need to scan
the plan state tree to find these sorts of resources.
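(For reference, a sketch of the validity check the README text above refers
to; the actual ExecPlanStillValid() is added by an earlier patch in this
series. es_cachedplan is NULL for freshly planned trees, which cannot go
stale during initialization.)

	static inline bool
	ExecPlanStillValid(EState *estate)
	{
		/* Freshly planned trees have no CachedPlan and are always valid. */
		return estate->es_cachedplan == NULL ||
			estate->es_cachedplan->is_valid;
	}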
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 1a848b1c20..1197ff2bf2 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -79,7 +79,7 @@ ExecutorEnd_hook_type ExecutorEnd_hook = NULL;
ExecutorCheckPerms_hook_type ExecutorCheckPerms_hook = NULL;
/* decls for local routines only used within this module */
-static void InitPlan(QueryDesc *queryDesc, int eflags);
+static bool InitPlan(QueryDesc *queryDesc, int eflags);
static void CheckValidRowMarkRel(Relation rel, RowMarkType markType);
static void ExecPostprocessPlan(EState *estate);
static void ExecEndPlan(PlanState *planstate, EState *estate);
@@ -119,6 +119,13 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
*
* eflags contains flag bits as described in executor.h.
*
+ * Plan initialization may fail if the input plan tree is found to have been
+ * invalidated, which can happen if it comes from a CachedPlan.
+ *
+ * Returns true if the plan was successfully initialized, false otherwise. If
+ * the latter, the caller must call ExecutorEnd() on 'queryDesc' to clean up
+ * after failed plan initialization.
+ *
* NB: the CurrentMemoryContext when this is called will become the parent
* of the per-query context used for this Executor invocation.
*
@@ -128,7 +135,7 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
*
* ----------------------------------------------------------------
*/
-void
+bool
ExecutorStart(QueryDesc *queryDesc, int eflags)
{
/*
@@ -140,14 +147,15 @@ ExecutorStart(QueryDesc *queryDesc, int eflags)
pgstat_report_query_id(queryDesc->plannedstmt->queryId, false);
if (ExecutorStart_hook)
- (*ExecutorStart_hook) (queryDesc, eflags);
- else
- standard_ExecutorStart(queryDesc, eflags);
+ return (*ExecutorStart_hook) (queryDesc, eflags);
+
+ return standard_ExecutorStart(queryDesc, eflags);
}
-void
+bool
standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ bool plan_valid;
EState *estate;
MemoryContext oldcontext;
@@ -263,9 +271,11 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/*
* Initialize the plan state tree
*/
- InitPlan(queryDesc, eflags);
+ plan_valid = InitPlan(queryDesc, eflags);
MemoryContextSwitchTo(oldcontext);
+
+ return plan_valid;
}
/* ----------------------------------------------------------------
@@ -620,6 +630,17 @@ ExecCheckPermissions(List *rangeTable, List *rteperminfos,
RTEPermissionInfo *perminfo = lfirst_node(RTEPermissionInfo, l);
Assert(OidIsValid(perminfo->relid));
+
+ /*
+ * Relations whose permissions need to be checked must already have
+ * been locked by the parser or by GetCachedPlan() if a cached plan is
+ * being executed.
+ *
+ * XXX shouldn't we skip calling ExecCheckPermissions from InitPlan
+ * in a parallel worker?
+ */
+ Assert(CheckRelLockedByMe(perminfo->relid, AccessShareLock, true) ||
+ IsParallelWorker());
result = ExecCheckOneRelPerms(perminfo);
if (!result)
{
@@ -829,9 +850,12 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
*
* Initializes the query plan: open files, allocate storage
* and start up the rule manager
+ *
+ * Returns true if the plan tree is successfully initialized for execution,
+ * false otherwise.
* ----------------------------------------------------------------
*/
-static void
+static bool
InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
@@ -850,12 +874,12 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
/*
- * initialize the node's execution state
+ * Set up range table in EState.
*/
ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
estate->es_plannedstmt = plannedstmt;
- estate->es_cachedplan = NULL;
+ estate->es_cachedplan = queryDesc->cplan;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
@@ -1016,7 +1040,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
queryDesc->tupDesc = tupType;
Assert(planstate != NULL);
queryDesc->planstate = planstate;
- return;
+ return true;
plan_init_suspended:
/*
@@ -1024,6 +1048,7 @@ plan_init_suspended:
* will clean up initialized plan nodes from estate->es_planstate_nodes.
*/
queryDesc->planstate = planstate;
+ return false;
}
/*
@@ -1441,7 +1466,7 @@ ExecGetAncestorResultRels(EState *estate, ResultRelInfo *resultRelInfo)
/*
* All ancestors up to the root target relation must have been
- * locked by the planner or AcquireExecutorLocks().
+ * locked by the planner or ExecLockAppendNonLeafRelations().
*/
ancRel = table_open(ancOid, NoLock);
rInfo = makeNode(ResultRelInfo);
@@ -2873,7 +2898,8 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
* Child EPQ EStates share the parent's copy of unchanging state such as
* the snapshot, rangetable, and external Param info. They need their own
* copies of local state, including a tuple table, es_param_exec_vals,
- * result-rel info, etc.
+	 * result-rel info, etc. Also, we don't pass the parent's copy of the
+ * CachedPlan, because no new locks will be taken for EvalPlanQual().
*/
rcestate->es_direction = ForwardScanDirection;
rcestate->es_snapshot = parentestate->es_snapshot;
@@ -2960,6 +2986,12 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
PlanState *subplanstate;
subplanstate = ExecInitNode(subplan, rcestate, 0);
+
+ /*
+	 * At this point, we had better not have received any new invalidation
+ * messages that would have caused the plan tree to go stale.
+ */
+ Assert(ExecPlanStillValid(rcestate));
rcestate->es_subplanstates = lappend(rcestate->es_subplanstates,
subplanstate);
}
@@ -3003,6 +3035,12 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
*/
epqstate->recheckplanstate = ExecInitNode(planTree, rcestate, 0);
+ /*
+	 * At this point, we had better not have received any new invalidation messages
+ * that would have caused the plan tree to go stale.
+ */
+ Assert(ExecPlanStillValid(rcestate));
+
MemoryContextSwitchTo(oldcontext);
}
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index cc2b8ccab7..bfa2a8ec18 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1248,8 +1248,17 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(¶mspace);
- /* Create a QueryDesc for the query. */
+ /*
+ * Create a QueryDesc for the query. Note that no CachedPlan is available
+	 * here even though the leader may have gotten the plan tree from one.
+	 * That's fine, because the leader would have taken all the locks needed
+	 * for the plan tree we have here to be fully valid. That remains true
+	 * even though we take our own copies of those locks in
+	 * ExecGetRangeTableRelation(), because every lock we take there is one
+	 * that the leader already holds.
+ */
return CreateQueryDesc(pstmt,
+ NULL,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
@@ -1430,7 +1439,8 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Start up the executor */
queryDesc->plannedstmt->jitFlags = fpes->jit_flags;
- ExecutorStart(queryDesc, fpes->eflags);
+ /* OK to ignore the return value; plan can't become invalid. */
+ (void) ExecutorStart(queryDesc, fpes->eflags);
/* Special executor initialization steps for parallel workers */
queryDesc->planstate->state->es_query_dsa = area;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index e88455368c..cf73d28baa 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -513,6 +513,13 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
oldcxt = MemoryContextSwitchTo(proute->memcxt);
+ /*
+ * Note that while we normally check ExecPlanStillValid(estate) after each
+ * lock taken during execution initialization, it is fine not do so for
+	 * lock taken during execution initialization, it is fine not to do so for
+ * possibly invalidate the plan given that the plan doesn't contain any
+ * info about those partitions.
+ */
partrel = table_open(partOid, RowExclusiveLock);
leaf_part_rri = makeNode(ResultRelInfo);
@@ -1111,6 +1118,9 @@ ExecInitPartitionDispatchInfo(EState *estate,
* Only sub-partitioned tables need to be locked here. The root
* partitioned table will already have been locked as it's referenced in
* the query's rtable.
+ *
+ * See the comment in ExecInitPartitionInfo() about taking locks and
+ * not checking ExecPlanStillValid(estate) here.
*/
if (partoid != RelationGetRelid(proute->partition_root))
rel = table_open(partoid, RowExclusiveLock);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index f4611bdd27..af92d2b3c3 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -804,7 +804,25 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rte->rtekind == RTE_RELATION);
- if (!IsParallelWorker())
+ if (IsParallelWorker() ||
+ (estate->es_cachedplan != NULL && !rte->inFromCl))
+ {
+ /*
+ * Take a lock if we are a parallel worker or if this is a child
+ * table referenced in a cached plan.
+ *
+ * Parallel workers need to have their own local lock on the
+ * relation. This ensures sane behavior in case the parent process
+ * exits before we do.
+ *
+ * When executing a cached plan, child tables must be locked
+ * here, because plancache.c (GetCachedPlan()) would only have
+ * locked tables mentioned in the query, that is, tables whose
+ * RTEs' inFromCl is true.
+ */
+ rel = table_open(rte->relid, rte->rellockmode);
+ }
+ else
{
/*
* In a normal query, we should already have the appropriate lock,
@@ -817,15 +835,6 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rte->rellockmode == AccessShareLock ||
CheckRelationLockedByMe(rel, rte->rellockmode, false));
}
- else
- {
- /*
- * If we are a parallel worker, we need to obtain our own local
- * lock on the relation. This ensures sane behavior in case the
- * parent process exits before we do.
- */
- rel = table_open(rte->relid, rte->rellockmode);
- }
estate->es_relations[rti - 1] = rel;
}
@@ -833,6 +842,38 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
return rel;
}
+/*
+ * ExecLockAppendNonLeafRelations
+ * Lock non-leaf relations whose children are scanned by a given
+ * Append/MergeAppend node
+ */
+void
+ExecLockAppendNonLeafRelations(EState *estate, List *allpartrelids)
+{
+ ListCell *l;
+
+ /* This should get called only when executing cached plans. */
+ Assert(estate->es_cachedplan != NULL);
+ foreach(l, allpartrelids)
+ {
+ Bitmapset *partrelids = lfirst_node(Bitmapset, l);
+ int i;
+
+ /*
+		 * Skip the first member of each bitmapset: it stands for the root
+		 * parent mentioned in the query, which must already have been locked
+		 * before entering the executor.
+ */
+		i = bms_next_member(partrelids, 0);	/* skip the root parent */
+ while ((i = bms_next_member(partrelids, i)) > 0)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(i, estate);
+
+ LockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
+}
+
/*
* ExecInitResultRelation
* Open relation given by the passed-in RT index and fill its
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index f55424eb5a..4ddf4fd7a9 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -838,6 +838,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest = None_Receiver;
es->qd = CreateQueryDesc(es->stmt,
+ NULL, /* fmgr_sql() doesn't use CachedPlans */
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -862,7 +863,9 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
eflags = EXEC_FLAG_SKIP_TRIGGERS;
else
eflags = 0; /* default run-to-completion flags */
- ExecutorStart(es->qd, eflags);
+
+ /* OK to ignore the return value; plan can't become invalid. */
+ (void) ExecutorStart(es->qd, eflags);
}
es->status = F_EXEC_RUN;
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 5d9fa4bff3..ce60ca2126 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -133,6 +133,25 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_syncdone = false;
appendstate->as_begun = false;
+ /*
+ * Lock non-leaf partitions whose leaf children are present in
+ * node->appendplans. Only need to do so if executing a cached
+ * plan, because child tables present in cached plans are not
+ * locked before execution.
+ *
+ * XXX - some of the non-leaf partitions may also be mentioned in
+	 * part_prune_info, in which case they would get locked again in
+	 * ExecInitPartitionPruning(), because it calls
+	 * ExecGetRangeTableRelation(), which locks child tables.
+ */
+ if (estate->es_cachedplan)
+ {
+ ExecLockAppendNonLeafRelations(estate, node->allpartrelids);
+ if (!ExecPlanStillValid(estate))
+			return NULL;
+	}
+
/* If run-time partition pruning is enabled, then set that up now */
if (node->part_prune_info != NULL)
{
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 255b05aad3..657fa91ec4 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -81,6 +81,25 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.state = estate;
mergestate->ps.ExecProcNode = ExecMergeAppend;
+ /*
+ * Lock non-leaf partitions whose leaf children are present in
+ * node->mergeplans. Only need to do so if executing a cached
+ * plan, because child tables present in cached plans are not
+ * locked before execution.
+ *
+ * XXX - some of the non-leaf partitions may also be mentioned in
+	 * part_prune_info, in which case they would get locked again in
+	 * ExecInitPartitionPruning(), because it calls
+	 * ExecGetRangeTableRelation(), which locks child tables.
+ */
+ if (estate->es_cachedplan)
+ {
+ ExecLockAppendNonLeafRelations(estate, node->allpartrelids);
+ if (!ExecPlanStillValid(estate))
+			return NULL;
+	}
+
/* If run-time partition pruning is enabled, then set that up now */
if (node->part_prune_info != NULL)
{
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index d36ca35d3a..9c4ed74240 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1582,6 +1582,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
Snapshot snapshot;
MemoryContext oldcontext;
Portal portal;
+ bool plan_valid;
SPICallbackArg spicallbackarg;
ErrorContextCallback spierrcontext;
@@ -1623,6 +1624,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
_SPI_current->processed = 0;
_SPI_current->tuptable = NULL;
+replan:
/* Create the portal */
if (name == NULL || name[0] == '\0')
{
@@ -1766,15 +1768,23 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
}
/*
- * Start portal execution.
+ * Start portal execution. If the portal contains a cached plan, it must
+ * be recreated if the cached plan was found to have been invalidated when
+ * initializing one of the plan trees contained in it.
*/
- PortalStart(portal, paramLI, 0, snapshot);
+ plan_valid = PortalStart(portal, paramLI, 0, snapshot);
Assert(portal->strategy != PORTAL_MULTI_QUERY);
/* Pop the error context stack */
error_context_stack = spierrcontext.previous;
+ if (!plan_valid)
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
+
/* Pop the SPI stack */
_SPI_end_call(true);
@@ -2552,6 +2562,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
+replan:
cplan = GetCachedPlan(plansource, options->params,
plan_owner, _SPI_current->queryEnv);
@@ -2669,6 +2680,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
snap = InvalidSnapshot;
qdesc = CreateQueryDesc(stmt,
+ cplan,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
@@ -2682,10 +2694,16 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
else
eflags = EXEC_FLAG_SKIP_TRIGGERS;
- ExecutorStart(qdesc, eflags);
+ if (!ExecutorStart(qdesc, eflags))
+ {
+ ExecutorEnd(qdesc);
+ FreeQueryDesc(qdesc);
+ Assert(cplan);
+ ReleaseCachedPlan(cplan, plan_owner);
+ goto replan;
+ }
res = _SPI_pquery(qdesc, canSetTag ? options->tcount : 0);
-
FreeQueryDesc(qdesc);
}
else
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index ee9b89a672..c807e9cdcc 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -27,6 +27,7 @@
#include "storage/procarray.h"
#include "storage/sinvaladt.h"
#include "utils/inval.h"
+#include "utils/lsyscache.h"
/*
@@ -364,6 +365,50 @@ CheckRelationLockedByMe(Relation relation, LOCKMODE lockmode, bool orstronger)
return false;
}
+/*
+ * CheckRelLockedByMe
+ *
+ * Returns true if current transaction holds a lock on the given relation of
+ * mode 'lockmode'. If 'orstronger' is true, a stronger lockmode is also OK.
+ * ("Stronger" is defined as "numerically higher", which is a bit
+ * semantically dubious but is OK for the purposes we use this for.)
+ */
+bool
+CheckRelLockedByMe(Oid relid, LOCKMODE lockmode, bool orstronger)
+{
+ Oid dbId = get_rel_relisshared(relid) ? InvalidOid : MyDatabaseId;
+ LOCKTAG tag;
+
+ SET_LOCKTAG_RELATION(tag, dbId, relid);
+
+ if (LockHeldByMe(&tag, lockmode))
+ return true;
+
+ if (orstronger)
+ {
+ LOCKMODE slockmode;
+
+ for (slockmode = lockmode + 1;
+ slockmode <= MaxLockMode;
+ slockmode++)
+ {
+ if (LockHeldByMe(&tag, slockmode))
+ {
+#ifdef NOT_USED
+ /* Sometimes this might be useful for debugging purposes */
+ elog(WARNING, "lock mode %s substituted for %s on relation %s",
+ GetLockmodeName(tag.locktag_lockmethodid, slockmode),
+ GetLockmodeName(tag.locktag_lockmethodid, lockmode),
+					 get_rel_name(relid));
+#endif
+ return true;
+ }
+ }
+ }
+
+ return false;
+}
+
/*
* LockHasWaitersRelation
*
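(Usage sketch: unlike CheckRelationLockedByMe(), the new variant needs only
the relation OID, so an assertion like the following, hypothetical one can be
made without opening the relation.)

	Assert(CheckRelLockedByMe(rte->relid, rte->rellockmode, true));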
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 36cc99ec9c..88724a8d67 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1232,7 +1232,12 @@ exec_simple_query(const char *query_string)
/*
* Start the portal. No parameters here.
*/
- PortalStart(portal, NULL, 0, InvalidSnapshot);
+ {
+ bool plan_valid PG_USED_FOR_ASSERTS_ONLY;
+
+ plan_valid = PortalStart(portal, NULL, 0, InvalidSnapshot);
+ Assert(plan_valid);
+ }
/*
* Select the appropriate output format: text unless we are doing a
@@ -1737,6 +1742,7 @@ exec_bind_message(StringInfo input_message)
"commands ignored until end of transaction block"),
errdetail_abort()));
+replan:
/*
* Create the portal. Allow silent replacement of an existing portal only
* if the unnamed portal is specified.
@@ -2028,9 +2034,15 @@ exec_bind_message(StringInfo input_message)
PopActiveSnapshot();
/*
- * And we're ready to start portal execution.
+ * Start portal execution. If the portal contains a cached plan, it must
+ * be recreated if the cached plan was found to have been invalidated when
+ * initializing one of the plan trees contained in it.
*/
- PortalStart(portal, params, 0, InvalidSnapshot);
+ if (!PortalStart(portal, params, 0, InvalidSnapshot))
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
/*
* Apply the result format requests to the portal.
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 701808f303..48cd6f4304 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -60,6 +60,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -72,6 +73,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->cplan = cplan; /* CachedPlan, if plan is from one */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -341,10 +343,12 @@ FetchStatementTargetList(Node *stmt)
* presently ignored for non-PORTAL_ONE_SELECT portals (it's only intended
* to be used for cursors).
*
- * On return, portal is ready to accept PortalRun() calls, and the result
- * tupdesc (if any) is known.
+ * True is returned if portal is ready to accept PortalRun() calls, and the
+ * result tupdesc (if any) is known. False if the plan tree is no longer
+ * valid, in which case the caller must retry after generating a new
+ * CachedPlan.
*/
-void
+bool
PortalStart(Portal portal, ParamListInfo params,
int eflags, Snapshot snapshot)
{
@@ -353,6 +357,7 @@ PortalStart(Portal portal, ParamListInfo params,
MemoryContext oldContext;
QueryDesc *queryDesc;
int myeflags = 0;
+ bool plan_valid = true;
Assert(PortalIsValid(portal));
Assert(portal->status == PORTAL_DEFINED);
@@ -407,6 +412,7 @@ PortalStart(Portal portal, ParamListInfo params,
* set the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->cplan,
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -431,8 +437,19 @@ PortalStart(Portal portal, ParamListInfo params,
else
myeflags = eflags;
- /* Call ExecutorStart to prepare the plan for execution. */
- ExecutorStart(queryDesc, myeflags);
+ /*
+ * Call ExecutorStart to prepare the plan for execution. A
+	 * cached plan may get invalidated during plan initialization.
+ */
+ if (!ExecutorStart(queryDesc, myeflags))
+ {
+ Assert(queryDesc->cplan);
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+ PopActiveSnapshot();
+ plan_valid = false;
+ goto plan_init_failed;
+ }
/*
* This tells PortalCleanup to shut down the executor, though
@@ -525,7 +542,7 @@ PortalStart(Portal portal, ParamListInfo params,
* Create the QueryDesc. DestReceiver will be set in
* PortalRunMulti() before calling ExecutorRun().
*/
- queryDesc = CreateQueryDesc(plan,
+ queryDesc = CreateQueryDesc(plan, portal->cplan,
portal->sourceText,
!is_utility ?
GetActiveSnapshot() :
@@ -541,7 +558,20 @@ PortalStart(Portal portal, ParamListInfo params,
if (is_utility)
continue;
- ExecutorStart(queryDesc, myeflags);
+ /*
+ * Call ExecutorStart to prepare the plan for
+ * execution. A cached plan may get invalidated
+			 * during plan initialization.
+ */
+ if (!ExecutorStart(queryDesc, myeflags))
+ {
+ PopActiveSnapshot();
+ Assert(queryDesc->cplan);
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+ plan_valid = false;
+ goto plan_init_failed;
+ }
PopActiveSnapshot();
}
}
@@ -563,12 +593,15 @@ PortalStart(Portal portal, ParamListInfo params,
}
PG_END_TRY();
+ portal->status = PORTAL_READY;
+
+plan_init_failed:
MemoryContextSwitchTo(oldContext);
ActivePortal = saveActivePortal;
CurrentResourceOwner = saveResourceOwner;
- portal->status = PORTAL_READY;
+ return plan_valid;
}
/*
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index fc6d267e44..2725d02312 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -2095,6 +2095,27 @@ get_rel_persistence(Oid relid)
return result;
}
+/*
+ * get_rel_relisshared
+ *
+ * Returns whether the given relation is shared
+ */
+bool
+get_rel_relisshared(Oid relid)
+{
+ HeapTuple tp;
+ Form_pg_class reltup;
+ bool result;
+
+ tp = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+ if (!HeapTupleIsValid(tp))
+ elog(ERROR, "cache lookup failed for relation %u", relid);
+ reltup = (Form_pg_class) GETSTRUCT(tp);
+ result = reltup->relisshared;
+ ReleaseSysCache(tp);
+
+ return result;
+}
/* ---------- TRANSFORM CACHE ---------- */
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index d67cd9a405..c5a7616b33 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -102,13 +102,13 @@ static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool GenericPlanIsValid(CachedPlan *cplan);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -790,8 +790,15 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
*
- * On a "true" return, we have acquired the locks needed to run the plan.
- * (We must do this for the "true" result to be race-condition-free.)
+ * If the plan contains any child relations that would have been added by the
+ * planner, they would not have been locked yet, because AcquirePlannerLocks()
+ * only locks relations that would be present in the original query's range
+ * table (that is, before entering the planner). So, the plan could go stale
+ * before it reaches execution if any of those child relations get modified
+ * concurrently. The executor must check that the plan (CachedPlan) is still
+ * valid after taking a lock on each of the child tables during the plan
+ * initialization phase, and if it is not, ask the caller to recreate the
+ * plan.
*/
static bool
CheckCachedPlan(CachedPlanSource *plansource)
@@ -805,60 +812,56 @@ CheckCachedPlan(CachedPlanSource *plansource)
if (!plan)
return false;
- Assert(plan->magic == CACHEDPLAN_MAGIC);
- /* Generic plans are never one-shot */
- Assert(!plan->is_oneshot);
+ if (GenericPlanIsValid(plan))
+ return true;
/*
- * If plan isn't valid for current role, we can't use it.
+ * Plan has been invalidated, so unlink it from the parent and release it.
*/
- if (plan->is_valid && plan->dependsOnRole &&
- plan->planRoleId != GetUserId())
- plan->is_valid = false;
+ ReleaseGenericPlan(plansource);
- /*
- * If it appears valid, acquire locks and recheck; this is much the same
- * logic as in RevalidateCachedQuery, but for a plan.
- */
- if (plan->is_valid)
+ return false;
+}
+
+/*
+ * GenericPlanIsValid
+ * Is a generic plan still valid?
+ *
+ * It may have gone stale due to concurrent schema modifications of relations
+ * mentioned in the plan, or due to the role and TransactionXmin checks
+ * performed below.
+ */
+static bool
+GenericPlanIsValid(CachedPlan *cplan)
+{
+ Assert(cplan != NULL);
+ Assert(cplan->magic == CACHEDPLAN_MAGIC);
+ /* Generic plans are never one-shot */
+ Assert(!cplan->is_oneshot);
+
+ if (cplan->is_valid)
{
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
- Assert(plan->refcount > 0);
-
- AcquireExecutorLocks(plan->stmt_list, true);
+ Assert(cplan->refcount > 0);
/*
- * If plan was transient, check to see if TransactionXmin has
- * advanced, and if so invalidate it.
+ * If plan isn't valid for current role, we can't use it.
*/
- if (plan->is_valid &&
- TransactionIdIsValid(plan->saved_xmin) &&
- !TransactionIdEquals(plan->saved_xmin, TransactionXmin))
- plan->is_valid = false;
+ if (cplan->dependsOnRole && cplan->planRoleId != GetUserId())
+ cplan->is_valid = false;
/*
- * By now, if any invalidation has happened, the inval callback
- * functions will have marked the plan invalid.
+ * If plan was transient, check to see if TransactionXmin has
+ * advanced, and if so invalidate it.
*/
- if (plan->is_valid)
- {
- /* Successfully revalidated and locked the query. */
- return true;
- }
-
- /* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ if (TransactionIdIsValid(cplan->saved_xmin) &&
+ !TransactionIdEquals(cplan->saved_xmin, TransactionXmin))
+ cplan->is_valid = false;
}
- /*
- * Plan has been invalidated, so unlink it from the parent and release it.
- */
- ReleaseGenericPlan(plansource);
-
- return false;
+ return cplan->is_valid;
}
/*
@@ -1128,8 +1131,16 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
- * On return, the plan is valid and we have sufficient locks to begin
- * execution.
+ * On return, the plan is valid unless it contains inheritance/partition
+ * child tables, which will not have been locked yet, because here we only
+ * lock the tables mentioned in the original query. Child tables are locked
+ * by the executor when initializing the plan tree, and if the plan gets
+ * invalidated as a result of taking those locks, the executor must ask the
+ * caller to get a new plan by calling here again. Locking of child tables
+ * is deferred to the executor in this manner because not all of them may
+ * need to be locked: some may get pruned during executor plan
+ * initialization, which performs initial pruning on any nodes that support
+ * partition pruning.
*
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
@@ -1164,7 +1175,10 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
{
if (CheckCachedPlan(plansource))
{
- /* We want a generic plan, and we already have a valid one */
+ /*
+ * We want a generic plan, and we already have a valid one, though
+ * see the header comment.
+ */
plan = plansource->gplan;
Assert(plan->magic == CACHEDPLAN_MAGIC);
}
@@ -1362,8 +1376,8 @@ CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
}
/*
- * Reject if AcquireExecutorLocks would have anything to do. This is
- * probably unnecessary given the previous check, but let's be safe.
+	 * Reject if the executor would need to take any locks beyond those taken
+	 * by AcquirePlannerLocks() on the given query.
*/
foreach(lc, plan->stmt_list)
{
@@ -1739,58 +1753,6 @@ QueryListGetPrimaryStmt(List *stmts)
return NULL;
}
-/*
- * AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
- */
-static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
-{
- ListCell *lc1;
-
- foreach(lc1, stmt_list)
- {
- PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
-
- if (plannedstmt->commandType == CMD_UTILITY)
- {
- /*
- * Ignore utility statements, except those (such as EXPLAIN) that
- * contain a parsed-but-not-planned query. Note: it's okay to use
- * ScanQueryForLocks, even though the query hasn't been through
- * rule rewriting, because rewriting doesn't change the query
- * representation.
- */
- Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
-
- if (query)
- ScanQueryForLocks(query, acquire);
- continue;
- }
-
- foreach(lc2, plannedstmt->rtable)
- {
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
-
- if (!(rte->rtekind == RTE_RELATION ||
- (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
- continue;
-
- /*
- * Acquire the appropriate type of lock on each relation OID. Note
- * that we don't actually try to open the rel, and hence will not
- * fail if it's been dropped entirely --- we'll just transiently
- * acquire a non-conflicting lock.
- */
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
- }
- }
-}
-
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 08ea852b65..392abb5150 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -88,7 +88,7 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern QueryDesc *ExplainQueryDesc(PlannedStmt *stmt,
+extern QueryDesc *ExplainQueryDesc(PlannedStmt *stmt, struct CachedPlan *cplan,
const char *queryString, IntoClause *into, ExplainState *es,
ParamListInfo params, QueryEnvironment *queryEnv);
extern void ExplainOnePlan(QueryDesc *queryDesc,
@@ -108,6 +108,7 @@ extern void ExplainQueryParameters(ExplainState *es, ParamListInfo params, int m
extern void ExplainBeginOutput(ExplainState *es);
extern void ExplainEndOutput(ExplainState *es);
+extern void ExplainResetOutput(ExplainState *es);
extern void ExplainSeparatePlans(ExplainState *es);
extern void ExplainPropertyList(const char *qlabel, List *data,
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index af2bf36dfb..4b7368a0dc 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -32,9 +32,12 @@
*/
typedef struct QueryDesc
{
+ NodeTag type;
+
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ struct CachedPlan *cplan; /* CachedPlan, if plannedstmt is from one */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +60,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ struct CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 8ec636cab8..edf2f13d04 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -73,7 +73,7 @@
/* Hook for plugins to get control in ExecutorStart() */
-typedef void (*ExecutorStart_hook_type) (QueryDesc *queryDesc, int eflags);
+typedef bool (*ExecutorStart_hook_type) (QueryDesc *queryDesc, int eflags);
extern PGDLLIMPORT ExecutorStart_hook_type ExecutorStart_hook;
/* Hook for plugins to get control in ExecutorRun() */
@@ -198,8 +198,8 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
-extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
-extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern bool ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern bool standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count, bool execute_once);
extern void standard_ExecutorRun(QueryDesc *queryDesc,
@@ -602,6 +602,7 @@ exec_rt_fetch(Index rti, EState *estate)
}
extern Relation ExecGetRangeTableRelation(EState *estate, Index rti);
+extern void ExecLockAppendNonLeafRelations(EState *estate, List *allpartrelids);
extern void ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
Index rti);
diff --git a/src/include/storage/lmgr.h b/src/include/storage/lmgr.h
index 4ee91e3cf9..598bf2688a 100644
--- a/src/include/storage/lmgr.h
+++ b/src/include/storage/lmgr.h
@@ -48,6 +48,7 @@ extern bool ConditionalLockRelation(Relation relation, LOCKMODE lockmode);
extern void UnlockRelation(Relation relation, LOCKMODE lockmode);
extern bool CheckRelationLockedByMe(Relation relation, LOCKMODE lockmode,
bool orstronger);
+extern bool CheckRelLockedByMe(Oid relid, LOCKMODE lockmode, bool orstronger);
extern bool LockHasWaitersRelation(Relation relation, LOCKMODE lockmode);
extern void LockRelationIdForSession(LockRelId *relid, LOCKMODE lockmode);
diff --git a/src/include/tcop/pquery.h b/src/include/tcop/pquery.h
index a5e65b98aa..577b81a9ee 100644
--- a/src/include/tcop/pquery.h
+++ b/src/include/tcop/pquery.h
@@ -29,7 +29,7 @@ extern List *FetchPortalTargetList(Portal portal);
extern List *FetchStatementTargetList(Node *stmt);
-extern void PortalStart(Portal portal, ParamListInfo params,
+extern bool PortalStart(Portal portal, ParamListInfo params,
int eflags, Snapshot snapshot);
extern void PortalSetResultFormat(Portal portal, int nFormats,
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index f5fdbfe116..a024e5dcd0 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -140,6 +140,7 @@ extern char get_rel_relkind(Oid relid);
extern bool get_rel_relispartition(Oid relid);
extern Oid get_rel_tablespace(Oid relid);
extern char get_rel_persistence(Oid relid);
+extern bool get_rel_relisshared(Oid relid);
extern Oid get_transform_fromsql(Oid typid, Oid langid, List *trftypes);
extern Oid get_transform_tosql(Oid typid, Oid langid, List *trftypes);
extern bool get_typisdefined(Oid typid);
diff --git a/src/test/modules/delay_execution/Makefile b/src/test/modules/delay_execution/Makefile
index 70f24e846d..2fca84d027 100644
--- a/src/test/modules/delay_execution/Makefile
+++ b/src/test/modules/delay_execution/Makefile
@@ -8,7 +8,8 @@ OBJS = \
delay_execution.o
ISOLATION = partition-addition \
- partition-removal-1
+ partition-removal-1 \
+ cached-plan-replan
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/src/test/modules/delay_execution/delay_execution.c b/src/test/modules/delay_execution/delay_execution.c
index 7cd76eb34b..ce189156ad 100644
--- a/src/test/modules/delay_execution/delay_execution.c
+++ b/src/test/modules/delay_execution/delay_execution.c
@@ -1,14 +1,18 @@
/*-------------------------------------------------------------------------
*
* delay_execution.c
- * Test module to allow delay between parsing and execution of a query.
+ * Test module to introduce delay at various points during the execution
+ * of a query, to test that execution proceeds safely in the face of
+ * concurrent changes.
*
* The delay is implemented by taking and immediately releasing a specified
* advisory lock. If another process has previously taken that lock, the
* current process will be blocked until the lock is released; otherwise,
* there's no effect. This allows an isolationtester script to reliably
- * test behaviors where some specified action happens in another backend
- * between parsing and execution of any desired query.
+ * test behaviors where some specified action happens in another backend at
+ * either of two points: 1) between parsing and execution of any desired
+ * query, when using the planner_hook; 2) between RevalidateCachedQuery()
+ * and ExecutorStart(), when using the ExecutorStart_hook.
*
* Copyright (c) 2020-2023, PostgreSQL Global Development Group
*
@@ -22,6 +26,7 @@
#include <limits.h>
+#include "executor/executor.h"
#include "optimizer/planner.h"
#include "utils/builtins.h"
#include "utils/guc.h"
@@ -32,9 +37,11 @@ PG_MODULE_MAGIC;
/* GUC: advisory lock ID to use. Zero disables the feature. */
static int post_planning_lock_id = 0;
+static int executor_start_lock_id = 0;
-/* Save previous planner hook user to be a good citizen */
+/* Save previous hook users to be a good citizen */
static planner_hook_type prev_planner_hook = NULL;
+static ExecutorStart_hook_type prev_ExecutorStart_hook = NULL;
/* planner_hook function to provide the desired delay */
@@ -70,11 +77,45 @@ delay_execution_planner(Query *parse, const char *query_string,
return result;
}
+/* ExecutorStart_hook function to provide the desired delay */
+static bool
+delay_execution_ExecutorStart(QueryDesc *queryDesc, int eflags)
+{
+ bool plan_valid;
+
+ /* If enabled, delay by taking and releasing the specified lock */
+ if (executor_start_lock_id != 0)
+ {
+ DirectFunctionCall1(pg_advisory_lock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+ DirectFunctionCall1(pg_advisory_unlock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+
+ /*
+ * Ensure that we notice any pending invalidations, since the advisory
+ * lock functions don't do this.
+ */
+ AcceptInvalidationMessages();
+ }
+
+ /* Now start the executor, possibly via a previous hook user */
+ if (prev_ExecutorStart_hook)
+ plan_valid = prev_ExecutorStart_hook(queryDesc, eflags);
+ else
+ plan_valid = standard_ExecutorStart(queryDesc, eflags);
+
+ if (executor_start_lock_id != 0)
+ elog(NOTICE, "Finished ExecutorStart(): CachedPlan is %s",
+ plan_valid ? "valid" : "not valid");
+
+ return plan_valid;
+}
+
/* Module load function */
void
_PG_init(void)
{
- /* Set up the GUC to control which lock is used */
+ /* Set up GUCs to control which lock is used */
DefineCustomIntVariable("delay_execution.post_planning_lock_id",
"Sets the advisory lock ID to be locked/unlocked after planning.",
"Zero disables the delay.",
@@ -86,10 +127,22 @@ _PG_init(void)
NULL,
NULL,
NULL);
-
+ DefineCustomIntVariable("delay_execution.executor_start_lock_id",
+ "Sets the advisory lock ID to be locked/unlocked before starting execution.",
+ "Zero disables the delay.",
+ &executor_start_lock_id,
+ 0,
+ 0, INT_MAX,
+ PGC_USERSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
MarkGUCPrefixReserved("delay_execution");
- /* Install our hook */
+ /* Install our hooks. */
prev_planner_hook = planner_hook;
planner_hook = delay_execution_planner;
+ prev_ExecutorStart_hook = ExecutorStart_hook;
+ ExecutorStart_hook = delay_execution_ExecutorStart;
}
diff --git a/src/test/modules/delay_execution/expected/cached-plan-replan.out b/src/test/modules/delay_execution/expected/cached-plan-replan.out
new file mode 100644
index 0000000000..0ac6a17c2b
--- /dev/null
+++ b/src/test/modules/delay_execution/expected/cached-plan-replan.out
@@ -0,0 +1,156 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1prep s2lock s1exec s2dropi s2unlock
+step s1prep: SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1;
+ EXPLAIN (COSTS OFF) EXECUTE q (1);
+QUERY PLAN
+--------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo11 foo_1
+ Recheck Cond: (a = $1)
+ -> Bitmap Index Scan on foo11_a_idx
+ Index Cond: (a = $1)
+(6 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+-----------------------------
+Append
+ Subplans Removed: 1
+ -> Seq Scan on foo11 foo_1
+ Filter: (a = $1)
+(4 rows)
+
+
+starting permutation: s1prep2 s2lock s1exec2 s2dropi s2unlock
+step s1prep2: SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = 1;
+ EXPLAIN (COSTS OFF) EXECUTE q2;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------
+Bitmap Heap Scan on foo11 foo
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on foo11_a_idx
+ Index Cond: (a = 1)
+(4 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec2: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec2: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------
+Seq Scan on foo11 foo
+ Filter: (a = 1)
+(2 rows)
+
+
+starting permutation: s1prep3 s2lock s1exec3 s2dropi s2unlock
+step s1prep3: SET plan_cache_mode = force_generic_plan;
+ SET enable_partitionwise_aggregate = on;
+ SET enable_partitionwise_join = on;
+ PREPARE q3 AS SELECT t1.a, count(t2.b) FROM foo t1, foo t2 WHERE t1.a = t2.a GROUP BY 1;
+ EXPLAIN (COSTS OFF) EXECUTE q3;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+----------------------------------------------------------------
+Append
+ -> GroupAggregate
+ Group Key: t1.a
+ -> Merge Join
+ Merge Cond: (t1.a = t2.a)
+ -> Index Only Scan using foo11_a_idx on foo11 t1
+ -> Materialize
+ -> Index Scan using foo11_a_idx on foo11 t2
+ -> GroupAggregate
+ Group Key: t1_1.a
+ -> Merge Join
+ Merge Cond: (t1_1.a = t2_1.a)
+ -> Sort
+ Sort Key: t1_1.a
+ -> Seq Scan on foo2 t1_1
+ -> Sort
+ Sort Key: t2_1.a
+ -> Seq Scan on foo2 t2_1
+(18 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec3: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec3: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------------------------------
+Append
+ -> GroupAggregate
+ Group Key: t1.a
+ -> Merge Join
+ Merge Cond: (t1.a = t2.a)
+ -> Sort
+ Sort Key: t1.a
+ -> Seq Scan on foo11 t1
+ -> Sort
+ Sort Key: t2.a
+ -> Seq Scan on foo11 t2
+ -> GroupAggregate
+ Group Key: t1_1.a
+ -> Merge Join
+ Merge Cond: (t1_1.a = t2_1.a)
+ -> Sort
+ Sort Key: t1_1.a
+ -> Seq Scan on foo2 t1_1
+ -> Sort
+ Sort Key: t2_1.a
+ -> Seq Scan on foo2 t2_1
+(21 rows)
+
diff --git a/src/test/modules/delay_execution/specs/cached-plan-replan.spec b/src/test/modules/delay_execution/specs/cached-plan-replan.spec
new file mode 100644
index 0000000000..3c92cbd5c6
--- /dev/null
+++ b/src/test/modules/delay_execution/specs/cached-plan-replan.spec
@@ -0,0 +1,61 @@
+# Test to check that invalidation of cached generic plans during ExecutorStart
+# correctly triggers replanning and re-execution.
+
+setup
+{
+ CREATE TABLE foo (a int, b text) PARTITION BY LIST(a);
+ CREATE TABLE foo1 PARTITION OF foo FOR VALUES IN (1) PARTITION BY LIST (a);
+ CREATE TABLE foo11 PARTITION OF foo1 FOR VALUES IN (1);
+ CREATE INDEX foo11_a ON foo1 (a);
+ CREATE TABLE foo2 PARTITION OF foo FOR VALUES IN (2);
+ CREATE VIEW foov AS SELECT * FROM foo;
+}
+
+teardown
+{
+ DROP VIEW foov;
+ DROP TABLE foo;
+}
+
+session "s1"
+# Append with run-time pruning
+step "s1prep" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+
+# no Append case (only one partition selected by the planner)
+step "s1prep2" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = 1;
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+
+# Append with partition-wise aggregate and join plans as child subplans
+step "s1prep3" { SET plan_cache_mode = force_generic_plan;
+ SET enable_partitionwise_aggregate = on;
+ SET enable_partitionwise_join = on;
+ PREPARE q3 AS SELECT t1.a, count(t2.b) FROM foo t1, foo t2 WHERE t1.a = t2.a GROUP BY 1;
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+
+# Executes a generic plan
+step "s1exec" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+step "s1exec2" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+step "s1exec3" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+
+session "s2"
+step "s2lock" { SELECT pg_advisory_lock(12345); }
+step "s2unlock" { SELECT pg_advisory_unlock(12345); }
+step "s2dropi" { DROP INDEX foo11_a; }
+
+# While "s1exec", etc. wait to acquire the advisory lock, "s2drop" is able to
+# drop the index being used in the cached plan. When "s1exec" is then
+# unblocked and initializes the cached plan for execution, it detects the
+# concurrent index drop and causes the cached plan to be discarded and
+# recreated without the index.
+permutation "s1prep" "s2lock" "s1exec" "s2dropi" "s2unlock"
+permutation "s1prep2" "s2lock" "s1exec2" "s2dropi" "s2unlock"
+permutation "s1prep3" "s2lock" "s1exec3" "s2dropi" "s2unlock"
--
2.35.3
Attachment: v45-0004-Set-inFromCl-to-false-in-child-table-RTEs.patch (application/octet-stream)
From af264e901e7636bf081a68257544d933b16d77ca Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Tue, 4 Jul 2023 22:36:43 +0900
Subject: [PATCH v45 4/6] Set inFromCl to false in child table RTEs
This is to allow the executor to distinguish tables that are directly
mentioned in the query from those that get added to the query during
planning. A subsequent commit will teach the executor to lock only
tables of the latter kind when executing a cached plan.
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
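For illustration, here is a minimal sketch of the kind of check this
enables on the executor side (hypothetical code; the actual test is added
to ExecGetRangeTableRelation() by a subsequent patch in the series):

    RangeTblEntry *rte = exec_rt_fetch(rti, estate);

    if (rte->rtekind == RTE_RELATION && !rte->inFromCl)
    {
        /*
         * Not directly mentioned in the query, so this must be an
         * inheritance/partition child table added by the planner,
         * whose lock may not have been taken by AcquireExecutorLocks().
         */
    }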
---
src/backend/optimizer/util/inherit.c | 6 ++++++
src/backend/parser/analyze.c | 7 +++----
src/include/nodes/parsenodes.h | 9 +++++++--
3 files changed, 16 insertions(+), 6 deletions(-)
diff --git a/src/backend/optimizer/util/inherit.c b/src/backend/optimizer/util/inherit.c
index 94de855a22..9bac07bf40 100644
--- a/src/backend/optimizer/util/inherit.c
+++ b/src/backend/optimizer/util/inherit.c
@@ -492,6 +492,12 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
}
else
childrte->inh = false;
+ /*
+ * Mark child tables as not being directly mentioned in the query. This
+ * allows the executor's ExecGetRangeTableRelation() to conveniently
+ * identify them as inheritance child tables.
+ */
+ childrte->inFromCl = false;
childrte->securityQuals = NIL;
/*
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 4006632092..bcf6fcdde2 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -3267,10 +3267,9 @@ transformLockingClause(ParseState *pstate, Query *qry, LockingClause *lc,
/*
* Lock all regular tables used in query and its subqueries. We
* examine inFromCl to exclude auto-added RTEs, particularly NEW/OLD
- * in rules. This is a bit of an abuse of a mostly-obsolete flag, but
- * it's convenient. We can't rely on the namespace mechanism that has
- * largely replaced inFromCl, since for example we need to lock
- * base-relation RTEs even if they are masked by upper joins.
+ * in rules. We can't rely on the namespace mechanism since for
+ * example we need to lock base-relation RTEs even if they are masked
+ * by upper joins.
*/
i = 0;
foreach(rt, qry->rtable)
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index fe003ded50..72f2b0c04f 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -994,11 +994,16 @@ typedef struct PartitionCmd
*
* inFromCl marks those range variables that are listed in the FROM clause.
* It's false for RTEs that are added to a query behind the scenes, such
- * as the NEW and OLD variables for a rule, or the subqueries of a UNION.
+ * as the NEW and OLD variables for a rule, or the subqueries of a UNION,
+ * or the RTEs of inheritance child tables that are added by the planner.
* This flag is not used during parsing (except in transformLockingClause,
* q.v.); the parser now uses a separate "namespace" data structure to
* control visibility. But it is needed by ruleutils.c to determine
- * whether RTEs should be shown in decompiled queries.
+ * whether RTEs should be shown in decompiled queries. It is also used
+ * by the executor to determine whether a given RTE_RELATION entry
+ * belongs to a table directly mentioned in the query or to a child
+ * table added by the planner; the executor needs to know that when
+ * the child tables in a plan need to be locked.
*
* securityQuals is a list of security barrier quals (boolean expressions),
* to be tested in the listed order before returning a row from the
--
2.35.3
Attachment: v45-0002-Refactoring-to-move-ExecutorStart-calls-to-be-ne.patch (application/octet-stream)
From 1b27fcbdfed5426ff19721d3e19664adb84f9c7d Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Thu, 3 Aug 2023 12:34:31 +0900
Subject: [PATCH v45 2/6] Refactoring to move ExecutorStart() calls to be near
GetCachedPlan()
An upcoming patch will make ExecutorStart() detect the invalidation
of a CachedPlan when initializing the plan tree contained in it. A
caller must retry with a new CachedPlan when ExecutorStart() detects
an invalidation. Having the ExecutorStart() call in the same place
as, or near, the GetCachedPlan() call makes it more convenient to
implement the replan loop.
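As a rough illustration, once ExecutorStart() is taught (by a later patch)
to return false when the CachedPlan is invalidated during initialization,
the replan loop at such a call site could look like this hypothetical,
abbreviated sketch:

    for (;;)
    {
        CachedPlan  *cplan = GetCachedPlan(plansource, params,
                                           NULL, queryEnv);
        PlannedStmt *pstmt = linitial_node(PlannedStmt, cplan->stmt_list);
        QueryDesc   *queryDesc = CreateQueryDesc(pstmt, query_string,
                                                 GetActiveSnapshot(),
                                                 InvalidSnapshot, dest,
                                                 params, queryEnv, 0);

        if (ExecutorStart(queryDesc, 0))
            break;              /* plan initialized successfully */

        /* CachedPlan was invalidated; clean up and retry with a new one */
        ExecutorEnd(queryDesc);
        FreeQueryDesc(queryDesc);
        ReleaseCachedPlan(cplan, CurrentResourceOwner);
    }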
The following sites have thus been modified:
* The ExecutorStart() call in ExplainOnePlan() is moved, along with
CreateQueryDesc(), into a new function ExplainQueryDesc(), which
callers of ExplainOnePlan() now invoke first.
* The ExecutorStart() call in _SPI_pquery() is moved to its caller
_SPI_execute_plan().
* The ExecutorStart() call in PortalRunMulti() is moved to
PortalStart(). This requires adding a new List field to PortalData
to store the QueryDescs created in PortalStart(), along with an
associated memory context field. One unintended consequence is that the
CommandCounterIncrement() between queries in PORTAL_MULTI_QUERY
cases is now done in the loop in PortalStart() and not in
PortalRunMulti(). That still seems to work because the Snapshot
registered in QueryDesc/EState is updated to account for the
CCI().
---
src/backend/commands/explain.c | 121 ++++++-----
src/backend/commands/prepare.c | 12 +-
src/backend/executor/spi.c | 27 +--
src/backend/tcop/pquery.c | 311 +++++++++++++----------------
src/backend/utils/mmgr/portalmem.c | 9 +
src/include/commands/explain.h | 6 +-
src/include/utils/portal.h | 2 +
7 files changed, 250 insertions(+), 238 deletions(-)
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 8570b14f62..59d57f9c10 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -393,6 +393,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
else
{
PlannedStmt *plan;
+ QueryDesc *queryDesc;
instr_time planstart,
planduration;
BufferUsage bufusage_start,
@@ -415,12 +416,77 @@ ExplainOneQuery(Query *query, int cursorOptions,
BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
}
+ queryDesc = ExplainQueryDesc(plan, queryString, into, es,
+ params, queryEnv);
+ Assert(queryDesc);
+
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(queryDesc, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
+/*
+ * ExplainQueryDesc
+ * Set up QueryDesc for EXPLAINing a given plan
+ */
+QueryDesc *
+ExplainQueryDesc(PlannedStmt *stmt,
+ const char *queryString, IntoClause *into, ExplainState *es,
+ ParamListInfo params, QueryEnvironment *queryEnv)
+{
+ QueryDesc *queryDesc;
+ DestReceiver *dest;
+ int eflags;
+ int instrument_option = 0;
+
+ /*
+ * Normally we discard the query's output, but if explaining CREATE TABLE
+ * AS, we'd better use the appropriate tuple receiver.
+ */
+ if (into)
+ dest = CreateIntoRelDestReceiver(into);
+ else
+ dest = None_Receiver;
+
+ if (es->analyze && es->timing)
+ instrument_option |= INSTRUMENT_TIMER;
+ else if (es->analyze)
+ instrument_option |= INSTRUMENT_ROWS;
+
+ if (es->buffers)
+ instrument_option |= INSTRUMENT_BUFFERS;
+ if (es->wal)
+ instrument_option |= INSTRUMENT_WAL;
+
+ /*
+ * Use a snapshot with an updated command ID to ensure this query sees
+ * results of any previously executed queries.
+ */
+ PushCopiedSnapshot(GetActiveSnapshot());
+ UpdateActiveSnapshotCommandId();
+
+ /* Create a QueryDesc for the query */
+ queryDesc = CreateQueryDesc(stmt, queryString,
+ GetActiveSnapshot(), InvalidSnapshot,
+ dest, params, queryEnv, instrument_option);
+
+ /* Select execution options */
+ if (es->analyze)
+ eflags = 0; /* default run-to-completion flags */
+ else
+ eflags = EXEC_FLAG_EXPLAIN_ONLY;
+ if (es->generic)
+ eflags |= EXEC_FLAG_EXPLAIN_GENERIC;
+ if (into)
+ eflags |= GetIntoRelEFlags(into);
+
+ /* Call ExecutorStart to prepare the plan for execution. */
+ ExecutorStart(queryDesc, eflags);
+
+ return queryDesc;
+}
+
/*
* ExplainOneUtility -
* print out the execution plan for one utility statement
@@ -524,29 +590,16 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(QueryDesc *queryDesc,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
{
- DestReceiver *dest;
- QueryDesc *queryDesc;
instr_time starttime;
double totaltime = 0;
- int eflags;
- int instrument_option = 0;
- Assert(plannedstmt->commandType != CMD_UTILITY);
-
- if (es->analyze && es->timing)
- instrument_option |= INSTRUMENT_TIMER;
- else if (es->analyze)
- instrument_option |= INSTRUMENT_ROWS;
-
- if (es->buffers)
- instrument_option |= INSTRUMENT_BUFFERS;
- if (es->wal)
- instrument_option |= INSTRUMENT_WAL;
+ Assert(queryDesc->plannedstmt->commandType != CMD_UTILITY);
/*
* We always collect timing for the entire statement, even when node-level
@@ -555,40 +608,6 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
*/
INSTR_TIME_SET_CURRENT(starttime);
- /*
- * Use a snapshot with an updated command ID to ensure this query sees
- * results of any previously executed queries.
- */
- PushCopiedSnapshot(GetActiveSnapshot());
- UpdateActiveSnapshotCommandId();
-
- /*
- * Normally we discard the query's output, but if explaining CREATE TABLE
- * AS, we'd better use the appropriate tuple receiver.
- */
- if (into)
- dest = CreateIntoRelDestReceiver(into);
- else
- dest = None_Receiver;
-
- /* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, instrument_option);
-
- /* Select execution options */
- if (es->analyze)
- eflags = 0; /* default run-to-completion flags */
- else
- eflags = EXEC_FLAG_EXPLAIN_ONLY;
- if (es->generic)
- eflags |= EXEC_FLAG_EXPLAIN_GENERIC;
- if (into)
- eflags |= GetIntoRelEFlags(into);
-
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, eflags);
-
/* Execute the plan for statistics if asked for */
if (es->analyze)
{
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 18f70319fc..1e9a98ad6e 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -639,8 +639,16 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ {
+ QueryDesc *queryDesc;
+
+ queryDesc = ExplainQueryDesc(pstmt, queryString,
+ into, es, paramLI, queryEnv);
+ Assert(queryDesc != NULL);
+ ExplainOnePlan(queryDesc, into, es, query_string, paramLI,
+ queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
+ }
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 33975687b3..d36ca35d3a 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -71,7 +71,7 @@ static int _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
static ParamListInfo _SPI_convert_params(int nargs, Oid *argtypes,
Datum *Values, const char *Nulls);
-static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount);
+static int _SPI_pquery(QueryDesc *queryDesc, uint64 tcount);
static void _SPI_error_callback(void *arg);
@@ -2661,6 +2661,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
QueryDesc *qdesc;
Snapshot snap;
+ int eflags;
if (ActiveSnapshotSet())
snap = GetActiveSnapshot();
@@ -2674,8 +2675,17 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
options->params,
_SPI_current->queryEnv,
0);
- res = _SPI_pquery(qdesc, fire_triggers,
- canSetTag ? options->tcount : 0);
+
+ /* Select execution options */
+ if (fire_triggers)
+ eflags = 0; /* default run-to-completion flags */
+ else
+ eflags = EXEC_FLAG_SKIP_TRIGGERS;
+
+ ExecutorStart(qdesc, eflags);
+
+ res = _SPI_pquery(qdesc, canSetTag ? options->tcount : 0);
+
FreeQueryDesc(qdesc);
}
else
@@ -2850,10 +2860,9 @@ _SPI_convert_params(int nargs, Oid *argtypes,
}
static int
-_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
+_SPI_pquery(QueryDesc *queryDesc, uint64 tcount)
{
int operation = queryDesc->operation;
- int eflags;
int res;
switch (operation)
@@ -2897,14 +2906,6 @@ _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
ResetUsage();
#endif
- /* Select execution options */
- if (fire_triggers)
- eflags = 0; /* default run-to-completion flags */
- else
- eflags = EXEC_FLAG_SKIP_TRIGGERS;
-
- ExecutorStart(queryDesc, eflags);
-
ExecutorRun(queryDesc, ForwardScanDirection, tcount, true);
_SPI_current->processed = queryDesc->estate->es_processed;
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5565f200c3..701808f303 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -19,6 +19,7 @@
#include "access/xact.h"
#include "commands/prepare.h"
+#include "executor/execdesc.h"
#include "executor/tstoreReceiver.h"
#include "miscadmin.h"
#include "pg_trace.h"
@@ -35,12 +36,6 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
- const char *sourceText,
- ParamListInfo params,
- QueryEnvironment *queryEnv,
- DestReceiver *dest,
- QueryCompletion *qc);
static void FillPortalStore(Portal portal, bool isTopLevel);
static uint64 RunFromStore(Portal portal, ScanDirection direction, uint64 count,
DestReceiver *dest);
@@ -116,86 +111,6 @@ FreeQueryDesc(QueryDesc *qdesc)
}
-/*
- * ProcessQuery
- * Execute a single plannable query within a PORTAL_MULTI_QUERY,
- * PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
- *
- * plan: the plan tree for the query
- * sourceText: the source text of the query
- * params: any parameters needed
- * dest: where to send results
- * qc: where to store the command completion status data.
- *
- * qc may be NULL if caller doesn't want a status string.
- *
- * Must be called in a memory context that will be reset or deleted on
- * error; otherwise the executor's memory usage will be leaked.
- */
-static void
-ProcessQuery(PlannedStmt *plan,
- const char *sourceText,
- ParamListInfo params,
- QueryEnvironment *queryEnv,
- DestReceiver *dest,
- QueryCompletion *qc)
-{
- QueryDesc *queryDesc;
-
- /*
- * Create the QueryDesc object
- */
- queryDesc = CreateQueryDesc(plan, sourceText,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, 0);
-
- /*
- * Call ExecutorStart to prepare the plan for execution
- */
- ExecutorStart(queryDesc, 0);
-
- /*
- * Run the plan to completion.
- */
- ExecutorRun(queryDesc, ForwardScanDirection, 0, true);
-
- /*
- * Build command completion status data, if caller wants one.
- */
- if (qc)
- {
- switch (queryDesc->operation)
- {
- case CMD_SELECT:
- SetQueryCompletion(qc, CMDTAG_SELECT, queryDesc->estate->es_processed);
- break;
- case CMD_INSERT:
- SetQueryCompletion(qc, CMDTAG_INSERT, queryDesc->estate->es_processed);
- break;
- case CMD_UPDATE:
- SetQueryCompletion(qc, CMDTAG_UPDATE, queryDesc->estate->es_processed);
- break;
- case CMD_DELETE:
- SetQueryCompletion(qc, CMDTAG_DELETE, queryDesc->estate->es_processed);
- break;
- case CMD_MERGE:
- SetQueryCompletion(qc, CMDTAG_MERGE, queryDesc->estate->es_processed);
- break;
- default:
- SetQueryCompletion(qc, CMDTAG_UNKNOWN, queryDesc->estate->es_processed);
- break;
- }
- }
-
- /*
- * Now, we close down all the scans and free allocated resources.
- */
- ExecutorFinish(queryDesc);
- ExecutorEnd(queryDesc);
-
- FreeQueryDesc(queryDesc);
-}
-
/*
* ChoosePortalStrategy
* Select portal execution strategy given the intended statement list.
@@ -435,10 +350,9 @@ PortalStart(Portal portal, ParamListInfo params,
{
Portal saveActivePortal;
ResourceOwner saveResourceOwner;
- MemoryContext savePortalContext;
MemoryContext oldContext;
QueryDesc *queryDesc;
- int myeflags;
+ int myeflags = 0;
Assert(PortalIsValid(portal));
Assert(portal->status == PORTAL_DEFINED);
@@ -448,15 +362,13 @@ PortalStart(Portal portal, ParamListInfo params,
*/
saveActivePortal = ActivePortal;
saveResourceOwner = CurrentResourceOwner;
- savePortalContext = PortalContext;
PG_TRY();
{
ActivePortal = portal;
if (portal->resowner)
CurrentResourceOwner = portal->resowner;
- PortalContext = portal->portalContext;
- oldContext = MemoryContextSwitchTo(PortalContext);
+ oldContext = MemoryContextSwitchTo(portal->queryContext);
/* Must remember portal param list, if any */
portal->portalParams = params;
@@ -472,6 +384,8 @@ PortalStart(Portal portal, ParamListInfo params,
switch (portal->strategy)
{
case PORTAL_ONE_SELECT:
+ case PORTAL_ONE_RETURNING:
+ case PORTAL_ONE_MOD_WITH:
/* Must set snapshot before starting executor. */
if (snapshot)
@@ -489,8 +403,8 @@ PortalStart(Portal portal, ParamListInfo params,
*/
/*
- * Create QueryDesc in portal's context; for the moment, set
- * the destination to DestNone.
+ * Create QueryDesc in portal->queryContext; for the moment,
+ * set the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
portal->sourceText,
@@ -501,30 +415,41 @@ PortalStart(Portal portal, ParamListInfo params,
portal->queryEnv,
0);
+ /* Remember for PortalRunMulti(). */
+ if (portal->strategy == PORTAL_ONE_RETURNING ||
+ portal->strategy == PORTAL_ONE_MOD_WITH)
+ portal->qdescs = list_make1(queryDesc);
+
/*
* If it's a scrollable cursor, executor needs to support
* REWIND and backwards scan, as well as whatever the caller
* might've asked for.
*/
- if (portal->cursorOptions & CURSOR_OPT_SCROLL)
+ if (portal->strategy == PORTAL_ONE_SELECT &&
+ (portal->cursorOptions & CURSOR_OPT_SCROLL))
myeflags = eflags | EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD;
else
myeflags = eflags;
- /*
- * Call ExecutorStart to prepare the plan for execution
- */
+ /* Call ExecutorStart to prepare the plan for execution. */
ExecutorStart(queryDesc, myeflags);
/*
- * This tells PortalCleanup to shut down the executor
+ * This tells PortalCleanup to shut down the executor, though
+ * not needed for queries handled by PortalRunMulti().
*/
- portal->queryDesc = queryDesc;
+ if (portal->strategy == PORTAL_ONE_SELECT)
+ portal->queryDesc = queryDesc;
/*
- * Remember tuple descriptor (computed by ExecutorStart)
+ * Remember tuple descriptor (computed by ExecutorStart),
+ * though make it independent of QueryDesc for queries handled
+ * by PortalRunMulti().
*/
- portal->tupDesc = queryDesc->tupDesc;
+ if (portal->strategy != PORTAL_ONE_SELECT)
+ portal->tupDesc = CreateTupleDescCopy(queryDesc->tupDesc);
+ else
+ portal->tupDesc = queryDesc->tupDesc;
/*
* Reset cursor position data to "start of query"
@@ -536,29 +461,6 @@ PortalStart(Portal portal, ParamListInfo params,
PopActiveSnapshot();
break;
- case PORTAL_ONE_RETURNING:
- case PORTAL_ONE_MOD_WITH:
-
- /*
- * We don't start the executor until we are told to run the
- * portal. We do need to set up the result tupdesc.
- */
- {
- PlannedStmt *pstmt;
-
- pstmt = PortalGetPrimaryStmt(portal);
- portal->tupDesc =
- ExecCleanTypeFromTL(pstmt->planTree->targetlist);
- }
-
- /*
- * Reset cursor position data to "start of query"
- */
- portal->atStart = true;
- portal->atEnd = false; /* allow fetches */
- portal->portalPos = 0;
- break;
-
case PORTAL_UTIL_SELECT:
/*
@@ -581,7 +483,69 @@ PortalStart(Portal portal, ParamListInfo params,
break;
case PORTAL_MULTI_QUERY:
- /* Need do nothing now */
+ {
+ ListCell *lc;
+ bool first = true;
+
+ myeflags = eflags;
+ foreach(lc, portal->stmts)
+ {
+ PlannedStmt *plan = lfirst_node(PlannedStmt, lc);
+ bool is_utility = (plan->utilityStmt != NULL);
+
+ /*
+ * Push the snapshot to be used by the executor.
+ */
+ if (!is_utility)
+ {
+ /*
+ * Must copy the snapshot for all statements
+ * except the first as we'll need to update its
+ * command ID.
+ */
+ if (!first)
+ PushCopiedSnapshot(GetTransactionSnapshot());
+ else
+ PushActiveSnapshot(GetTransactionSnapshot());
+ }
+
+ /*
+ * From the 2nd statement onwards, update the command
+ * ID and the snapshot to match.
+ */
+ if (!first)
+ {
+ CommandCounterIncrement();
+ UpdateActiveSnapshotCommandId();
+ }
+
+ first = false;
+
+ /*
+ * Create the QueryDesc. DestReceiver will be set in
+ * PortalRunMulti() before calling ExecutorRun().
+ */
+ queryDesc = CreateQueryDesc(plan,
+ portal->sourceText,
+ !is_utility ?
+ GetActiveSnapshot() :
+ InvalidSnapshot,
+ InvalidSnapshot,
+ NULL,
+ params,
+ portal->queryEnv, 0);
+
+ /* Remember for PortalRunMulti() */
+ portal->qdescs = lappend(portal->qdescs, queryDesc);
+
+ if (is_utility)
+ continue;
+
+ ExecutorStart(queryDesc, myeflags);
+ PopActiveSnapshot();
+ }
+ }
+
portal->tupDesc = NULL;
break;
}
@@ -594,7 +558,6 @@ PortalStart(Portal portal, ParamListInfo params,
/* Restore global vars and propagate error */
ActivePortal = saveActivePortal;
CurrentResourceOwner = saveResourceOwner;
- PortalContext = savePortalContext;
PG_RE_THROW();
}
@@ -604,7 +567,6 @@ PortalStart(Portal portal, ParamListInfo params,
ActivePortal = saveActivePortal;
CurrentResourceOwner = saveResourceOwner;
- PortalContext = savePortalContext;
portal->status = PORTAL_READY;
}
@@ -1193,7 +1155,7 @@ PortalRunMulti(Portal portal,
QueryCompletion *qc)
{
bool active_snapshot_set = false;
- ListCell *stmtlist_item;
+ ListCell *qdesc_item;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1214,9 +1176,10 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
- foreach(stmtlist_item, portal->stmts)
+ foreach(qdesc_item, portal->qdescs)
{
- PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ QueryDesc *qdesc = (QueryDesc *) lfirst(qdesc_item);
+ PlannedStmt *pstmt = qdesc->plannedstmt;
/*
* If we got a cancel signal in prior command, quit
@@ -1233,33 +1196,26 @@ PortalRunMulti(Portal portal,
if (log_executor_stats)
ResetUsage();
- /*
- * Must always have a snapshot for plannable queries. First time
- * through, take a new snapshot; for subsequent queries in the
- * same portal, just update the snapshot's copy of the command
- * counter.
- */
+ /* Push the snapshot for plannable queries. */
if (!active_snapshot_set)
{
- Snapshot snapshot = GetTransactionSnapshot();
+ Snapshot snapshot = qdesc->snapshot;
- /* If told to, register the snapshot and save in portal */
+ /*
+ * If told to, register the snapshot and save in portal
+ *
+ * Note that the command ID of qdesc->snapshot for 2nd query
+ * onwards would have been updated in PortalStart() to account
+ * for CCI() done between queries, but it's OK that here we
+ * don't likewise update holdSnapshot's command ID.
+ */
if (setHoldSnapshot)
{
snapshot = RegisterSnapshot(snapshot);
portal->holdSnapshot = snapshot;
}
- /*
- * We can't have the holdSnapshot also be the active one,
- * because UpdateActiveSnapshotCommandId would complain. So
- * force an extra snapshot copy. Plain PushActiveSnapshot
- * would have copied the transaction snapshot anyway, so this
- * only adds a copy step when setHoldSnapshot is true. (It's
- * okay for the command ID of the active snapshot to diverge
- * from what holdSnapshot has.)
- */
- PushCopiedSnapshot(snapshot);
+ PushActiveSnapshot(snapshot);
/*
* As for PORTAL_ONE_SELECT portals, it does not seem
@@ -1268,26 +1224,39 @@ PortalRunMulti(Portal portal,
active_snapshot_set = true;
}
- else
- UpdateActiveSnapshotCommandId();
+ /*
+ * Run the plan to completion.
+ */
+ qdesc->dest = dest;
+ ExecutorRun(qdesc, ForwardScanDirection, 0, true);
+
+ /*
+ * Build command completion status data if needed.
+ */
if (pstmt->canSetTag)
{
- /* statement can set tag string */
- ProcessQuery(pstmt,
- portal->sourceText,
- portal->portalParams,
- portal->queryEnv,
- dest, qc);
- }
- else
- {
- /* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
- portal->sourceText,
- portal->portalParams,
- portal->queryEnv,
- altdest, NULL);
+ switch (qdesc->operation)
+ {
+ case CMD_SELECT:
+ SetQueryCompletion(qc, CMDTAG_SELECT, qdesc->estate->es_processed);
+ break;
+ case CMD_INSERT:
+ SetQueryCompletion(qc, CMDTAG_INSERT, qdesc->estate->es_processed);
+ break;
+ case CMD_UPDATE:
+ SetQueryCompletion(qc, CMDTAG_UPDATE, qdesc->estate->es_processed);
+ break;
+ case CMD_DELETE:
+ SetQueryCompletion(qc, CMDTAG_DELETE, qdesc->estate->es_processed);
+ break;
+ case CMD_MERGE:
+ SetQueryCompletion(qc, CMDTAG_MERGE, qdesc->estate->es_processed);
+ break;
+ default:
+ SetQueryCompletion(qc, CMDTAG_UNKNOWN, qdesc->estate->es_processed);
+ break;
+ }
}
if (log_executor_stats)
@@ -1342,12 +1311,12 @@ PortalRunMulti(Portal portal,
if (portal->stmts == NIL)
break;
- /*
- * Increment command counter between queries, but not after the last
- * one.
- */
- if (lnext(portal->stmts, stmtlist_item) != NULL)
- CommandCounterIncrement();
+ if (qdesc->estate)
+ {
+ ExecutorFinish(qdesc);
+ ExecutorEnd(qdesc);
+ }
+ FreeQueryDesc(qdesc);
}
/* Pop the snapshot if we pushed one. */
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 06dfa85f04..0cad450dcd 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -201,6 +201,13 @@ CreatePortal(const char *name, bool allowDup, bool dupSilent)
portal->portalContext = AllocSetContextCreate(TopPortalContext,
"PortalContext",
ALLOCSET_SMALL_SIZES);
+ /*
+ * initialize portal's query context to store QueryDescs created during
+ * PortalStart() and then used in PortalRun().
+ */
+ portal->queryContext = AllocSetContextCreate(TopPortalContext,
+ "PortalQueryContext",
+ ALLOCSET_SMALL_SIZES);
/* create a resource owner for the portal */
portal->resowner = ResourceOwnerCreate(CurTransactionResourceOwner,
@@ -224,6 +231,7 @@ CreatePortal(const char *name, bool allowDup, bool dupSilent)
/* for named portals reuse portal->name copy */
MemoryContextSetIdentifier(portal->portalContext, portal->name[0] ? portal->name : "<unnamed>");
+ MemoryContextSetIdentifier(portal->queryContext, portal->name[0] ? portal->name : "<unnamed>");
return portal;
}
@@ -594,6 +602,7 @@ PortalDrop(Portal portal, bool isTopCommit)
/* release subsidiary storage */
MemoryContextDelete(portal->portalContext);
+ MemoryContextDelete(portal->queryContext);
/* release portal struct (it's in TopPortalContext) */
pfree(portal);
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 3d3e632a0c..08ea852b65 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -88,7 +88,11 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern QueryDesc *ExplainQueryDesc(PlannedStmt *stmt,
+ const char *queryString, IntoClause *into, ExplainState *es,
+ ParamListInfo params, QueryEnvironment *queryEnv);
+extern void ExplainOnePlan(QueryDesc *queryDesc,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aa08b1e0fc..af059e30f8 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,8 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *qdescs; /* list of QueryDescs */
+ MemoryContext queryContext; /* memory for QueryDescs and children */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
--
2.35.3
Attachment: v45-0001-Add-support-for-allowing-ExecInitNode-to-detect-.patch (application/octet-stream)
From ddaa761e96cd4b34035edd40f6bc17b953574496 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Fri, 11 Aug 2023 14:09:29 +0900
Subject: [PATCH v45 1/6] Add support for allowing ExecInitNode to detect
CachedPlan invalidation
This means ExecInitNode() will check for CachedPlan invalidation at
various points (after locking tables, initializing child plans, etc.)
and, upon detecting invalidation, return either a partially initialized
planstate node, or NULL if nothing has been initialized yet and hence no
cleanup is necessary.
ExecEndNode() subroutines now always check a pointer for nullness
before calling cleanup on it.
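The per-node pattern, as a minimal sketch ("Foo" and foostate stand in
for any node type; this mirrors the hunks below):

    /* In ExecInitFoo(), after each step that takes a lock or
     * initializes a child plan: */
    outerPlanState(foostate) = ExecInitNode(outerPlan(node), estate, eflags);
    if (!ExecPlanStillValid(estate))
        return foostate;        /* partially initialized; safe for
                                 * ExecEndNode() thanks to NULL checks */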
---
contrib/postgres_fdw/postgres_fdw.c | 4 ++++
src/backend/executor/execMain.c | 23 +++++++++++++++++--
src/backend/executor/execPartition.c | 4 ++++
src/backend/executor/execProcnode.c | 12 +++++++++-
src/backend/executor/execUtils.c | 2 ++
src/backend/executor/nodeAgg.c | 15 ++++--------
src/backend/executor/nodeAppend.c | 10 +++++---
src/backend/executor/nodeBitmapAnd.c | 7 +++---
src/backend/executor/nodeBitmapHeapscan.c | 19 +++++----------
src/backend/executor/nodeBitmapIndexscan.c | 10 ++------
src/backend/executor/nodeBitmapOr.c | 7 +++---
src/backend/executor/nodeCtescan.c | 12 ----------
src/backend/executor/nodeCustom.c | 16 ++++++-------
src/backend/executor/nodeForeignscan.c | 12 ++++------
src/backend/executor/nodeFunctionscan.c | 12 ----------
src/backend/executor/nodeGather.c | 6 ++---
src/backend/executor/nodeGatherMerge.c | 5 ++--
src/backend/executor/nodeGroup.c | 7 ++----
src/backend/executor/nodeHash.c | 7 ++----
src/backend/executor/nodeHashjoin.c | 16 ++++---------
src/backend/executor/nodeIncrementalSort.c | 10 ++------
src/backend/executor/nodeIndexonlyscan.c | 20 ++++------------
src/backend/executor/nodeIndexscan.c | 20 ++++------------
src/backend/executor/nodeLimit.c | 3 ++-
src/backend/executor/nodeLockRows.c | 2 ++
src/backend/executor/nodeMaterial.c | 7 ++----
src/backend/executor/nodeMemoize.c | 12 +++-------
src/backend/executor/nodeMergeAppend.c | 6 ++++-
src/backend/executor/nodeMergejoin.c | 16 ++++---------
src/backend/executor/nodeModifyTable.c | 18 ++++++---------
.../executor/nodeNamedtuplestorescan.c | 11 ---------
src/backend/executor/nodeNestloop.c | 15 ++++--------
src/backend/executor/nodeProjectSet.c | 12 ++--------
src/backend/executor/nodeRecursiveunion.c | 10 ++++++--
src/backend/executor/nodeResult.c | 12 ++--------
src/backend/executor/nodeSamplescan.c | 16 +++----------
src/backend/executor/nodeSeqscan.c | 14 ++---------
src/backend/executor/nodeSetOp.c | 6 ++---
src/backend/executor/nodeSort.c | 9 ++------
src/backend/executor/nodeSubqueryscan.c | 14 ++---------
src/backend/executor/nodeTableFuncscan.c | 12 ----------
src/backend/executor/nodeTidrangescan.c | 14 ++---------
src/backend/executor/nodeTidscan.c | 14 ++---------
src/backend/executor/nodeUnique.c | 7 ++----
src/backend/executor/nodeValuesscan.c | 13 -----------
src/backend/executor/nodeWindowAgg.c | 13 ++---------
src/backend/executor/nodeWorktablescan.c | 11 ---------
src/include/executor/executor.h | 12 ++++++++++
src/include/nodes/execnodes.h | 2 ++
src/include/utils/plancache.h | 14 +++++++++++
50 files changed, 191 insertions(+), 360 deletions(-)
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index c5cada55fb..1edd4c3f17 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2658,7 +2658,11 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
/* Get info about foreign table. */
rtindex = node->resultRelInfo->ri_RangeTableIndex;
if (fsplan->scan.scanrelid == 0)
+ {
dmstate->rel = ExecOpenScanRelation(estate, rtindex, eflags);
+ if (!ExecPlanStillValid(estate))
+ return;
+ }
else
dmstate->rel = node->ss.ss_currentRelation;
table = GetForeignTable(RelationGetRelid(dmstate->rel));
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4c5a7bbf62..1a848b1c20 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -839,7 +839,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
- PlanState *planstate;
+ PlanState *planstate = NULL;
TupleDesc tupType;
ListCell *l;
int i;
@@ -855,6 +855,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
estate->es_plannedstmt = plannedstmt;
+ estate->es_cachedplan = NULL;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
@@ -886,6 +887,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
case ROW_MARK_KEYSHARE:
case ROW_MARK_REFERENCE:
relation = ExecGetRangeTableRelation(estate, rc->rti);
+ if (!ExecPlanStillValid(estate))
+ goto plan_init_suspended;
break;
case ROW_MARK_COPY:
/* no physical table access is required */
@@ -953,9 +956,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
sp_eflags |= EXEC_FLAG_REWIND;
subplanstate = ExecInitNode(subplan, estate, sp_eflags);
-
estate->es_subplanstates = lappend(estate->es_subplanstates,
subplanstate);
+ if (!ExecPlanStillValid(estate))
+ goto plan_init_suspended;
i++;
}
@@ -966,6 +970,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
* processing tuples.
*/
planstate = ExecInitNode(plan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ goto plan_init_suspended;
/*
* Get the tuple descriptor describing the type of tuples to return.
@@ -1008,6 +1014,15 @@ InitPlan(QueryDesc *queryDesc, int eflags)
}
queryDesc->tupDesc = tupType;
+ Assert(planstate != NULL);
+ queryDesc->planstate = planstate;
+ return;
+
+plan_init_suspended:
+ /*
+ * Plan initialization failed. Mark QueryDesc as such. ExecEndPlan()
+ * will clean up initialized plan nodes from estate->es_planstate_nodes.
+ */
queryDesc->planstate = planstate;
}
@@ -3010,6 +3025,10 @@ EvalPlanQualEnd(EPQState *epqstate)
MemoryContext oldcontext;
ListCell *l;
+ /* Nothing to do if EvalPlanQualInit() wasn't done to begin with. */
+ if (epqstate->parentestate == NULL)
+ return;
+
rtsize = epqstate->parentestate->es_range_table_size;
/*
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index eb8a87fd63..e88455368c 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1801,6 +1801,8 @@ ExecInitPartitionPruning(PlanState *planstate,
/* Create the working data structure for pruning */
prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Perform an initial partition prune pass, if required.
@@ -1927,6 +1929,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* duration of this executor run.
*/
partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
partkey = RelationGetPartitionKey(partrel);
partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
partrel);
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 4d288bc8d4..842c6751c5 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -135,7 +135,13 @@ static bool ExecShutdownNode_walker(PlanState *node, void *context);
* 'estate' is the shared execution state for the plan tree
* 'eflags' is a bitwise OR of flag bits described in executor.h
*
- * Returns a PlanState node corresponding to the given Plan node.
+ * Returns a PlanState node corresponding to the given Plan node or NULL.
+ *
+ * NULL may be returned either if the input node is NULL or if the plan
+ * tree that the node is a part of is found to have been invalidated when
+ * taking a lock on the relation mentioned in the node or in a child
+ * node. The latter case arises if the plan tree contains inheritance/
+ * partition child tables and is from a CachedPlan.
* ------------------------------------------------------------------------
*/
PlanState *
@@ -388,6 +394,10 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
break;
}
+ if (!ExecPlanStillValid(estate))
+ return result;
+
+ Assert(result != NULL);
ExecSetExecProcNode(result, result->ExecProcNode);
/*
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index c06b228858..f4611bdd27 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -848,6 +848,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
Relation resultRelationDesc;
resultRelationDesc = ExecGetRangeTableRelation(estate, rti);
+ if (!ExecPlanStillValid(estate))
+ return;
InitResultRelInfo(resultRelInfo,
resultRelationDesc,
rti,
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 468db94fe5..f46c3df199 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -3150,7 +3150,8 @@ hashagg_reset_spill_state(AggState *aggstate)
}
/* free batches */
- list_free_deep(aggstate->hash_batches);
+ if (aggstate->hash_batches)
+ list_free_deep(aggstate->hash_batches);
aggstate->hash_batches = NIL;
/* close tape set */
@@ -3304,6 +3305,8 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
eflags &= ~EXEC_FLAG_REWIND;
outerPlan = outerPlan(node);
outerPlanState(aggstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return aggstate;
/*
* initialize source tuple type.
@@ -4357,16 +4360,6 @@ ExecEndAgg(AggState *node)
if (node->hashcontext)
ReScanExprContext(node->hashcontext);
- /*
- * We don't actually free any ExprContexts here (see comment in
- * ExecFreeExprContext), just unlinking the output one from the plan node
- * suffices.
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /* clean up tuple table */
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
outerPlan = outerPlanState(node);
ExecEndNode(outerPlan);
}
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 609df6b9e6..5d9fa4bff3 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -147,6 +147,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
list_length(node->appendplans),
node->part_prune_info,
&validsubplans);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
@@ -185,8 +187,9 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->ps.resultopsset = true;
appendstate->ps.resultopsfixed = false;
- appendplanstates = (PlanState **) palloc(nplans *
- sizeof(PlanState *));
+ appendstate->appendplans = appendplanstates =
+ (PlanState **) palloc0(nplans * sizeof(PlanState *));
+ appendstate->as_nplans = nplans;
/*
* call ExecInitNode on each of the valid plans to be executed and save
@@ -221,11 +224,12 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
firstvalid = j;
appendplanstates[j++] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return appendstate;
}
appendstate->as_first_partial_plan = firstvalid;
appendstate->appendplans = appendplanstates;
- appendstate->as_nplans = nplans;
/* Initialize async state */
appendstate->as_asyncplans = asyncplans;
diff --git a/src/backend/executor/nodeBitmapAnd.c b/src/backend/executor/nodeBitmapAnd.c
index 4c5eb2b23b..93e5de0c1a 100644
--- a/src/backend/executor/nodeBitmapAnd.c
+++ b/src/backend/executor/nodeBitmapAnd.c
@@ -78,7 +78,6 @@ ExecInitBitmapAnd(BitmapAnd *node, EState *estate, int eflags)
bitmapandstate->ps.state = estate;
bitmapandstate->ps.ExecProcNode = ExecBitmapAnd;
bitmapandstate->bitmapplans = bitmapplanstates;
- bitmapandstate->nplans = nplans;
/*
* call ExecInitNode on each of the plans to be executed and save the
@@ -88,8 +87,10 @@ ExecInitBitmapAnd(BitmapAnd *node, EState *estate, int eflags)
foreach(l, node->bitmapplans)
{
initNode = (Plan *) lfirst(l);
- bitmapplanstates[i] = ExecInitNode(initNode, estate, eflags);
- i++;
+ bitmapplanstates[i++] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return bitmapandstate;
+ bitmapandstate->nplans = i;
}
/*
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index f35df0b8bf..3cdece852c 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -655,18 +655,6 @@ ExecEndBitmapHeapScan(BitmapHeapScanState *node)
*/
scanDesc = node->ss.ss_currentScanDesc;
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clear out tuple table slots
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
/*
* close down subplans
*/
@@ -693,7 +681,8 @@ ExecEndBitmapHeapScan(BitmapHeapScanState *node)
/*
* close heap scan
*/
- table_endscan(scanDesc);
+ if (scanDesc)
+ table_endscan(scanDesc);
}
/* ----------------------------------------------------------------
@@ -763,11 +752,15 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return scanstate;
/*
* initialize child nodes
*/
outerPlanState(scanstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return scanstate;
/*
* get the scan type from the relation descriptor.
diff --git a/src/backend/executor/nodeBitmapIndexscan.c b/src/backend/executor/nodeBitmapIndexscan.c
index 83ec9ede89..4200472d02 100644
--- a/src/backend/executor/nodeBitmapIndexscan.c
+++ b/src/backend/executor/nodeBitmapIndexscan.c
@@ -184,14 +184,6 @@ ExecEndBitmapIndexScan(BitmapIndexScanState *node)
indexRelationDesc = node->biss_RelationDesc;
indexScanDesc = node->biss_ScanDesc;
- /*
- * Free the exprcontext ... now dead code, see ExecFreeExprContext
- */
-#ifdef NOT_USED
- if (node->biss_RuntimeContext)
- FreeExprContext(node->biss_RuntimeContext, true);
-#endif
-
/*
* close the index relation (no-op if we didn't open it)
*/
@@ -263,6 +255,8 @@ ExecInitBitmapIndexScan(BitmapIndexScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
indexstate->biss_RelationDesc = index_open(node->indexid, lockmode);
+ if (!ExecPlanStillValid(estate))
+ return indexstate;
/*
* Initialize index-specific scan state
diff --git a/src/backend/executor/nodeBitmapOr.c b/src/backend/executor/nodeBitmapOr.c
index 0bf8af9652..e0e9228e35 100644
--- a/src/backend/executor/nodeBitmapOr.c
+++ b/src/backend/executor/nodeBitmapOr.c
@@ -79,7 +79,6 @@ ExecInitBitmapOr(BitmapOr *node, EState *estate, int eflags)
bitmaporstate->ps.state = estate;
bitmaporstate->ps.ExecProcNode = ExecBitmapOr;
bitmaporstate->bitmapplans = bitmapplanstates;
- bitmaporstate->nplans = nplans;
/*
* call ExecInitNode on each of the plans to be executed and save the
@@ -89,8 +88,10 @@ ExecInitBitmapOr(BitmapOr *node, EState *estate, int eflags)
foreach(l, node->bitmapplans)
{
initNode = (Plan *) lfirst(l);
- bitmapplanstates[i] = ExecInitNode(initNode, estate, eflags);
- i++;
+ bitmapplanstates[i++] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return bitmaporstate;
+ bitmaporstate->nplans = i;
}
/*
diff --git a/src/backend/executor/nodeCtescan.c b/src/backend/executor/nodeCtescan.c
index cc4c4243e2..a0c0c4be33 100644
--- a/src/backend/executor/nodeCtescan.c
+++ b/src/backend/executor/nodeCtescan.c
@@ -287,18 +287,6 @@ ExecInitCteScan(CteScan *node, EState *estate, int eflags)
void
ExecEndCteScan(CteScanState *node)
{
- /*
- * Free exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clean out the tuple table
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
/*
* If I am the leader, free the tuplestore.
*/
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
index bd42c65b29..38061c30b9 100644
--- a/src/backend/executor/nodeCustom.c
+++ b/src/backend/executor/nodeCustom.c
@@ -61,6 +61,8 @@ ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
if (scanrelid > 0)
{
scan_rel = ExecOpenScanRelation(estate, scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
css->ss.ss_currentRelation = scan_rel;
}
@@ -127,15 +129,11 @@ ExecCustomScan(PlanState *pstate)
void
ExecEndCustomScan(CustomScanState *node)
{
- Assert(node->methods->EndCustomScan != NULL);
- node->methods->EndCustomScan(node);
-
- /* Free the exprcontext */
- ExecFreeExprContext(&node->ss.ps);
-
- /* Clean out the tuple table */
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ if (node->methods)
+ {
+ Assert(node->methods->EndCustomScan != NULL);
+ node->methods->EndCustomScan(node);
+ }
}
void
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index c2139acca0..a3705082a9 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -173,6 +173,8 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
if (scanrelid > 0)
{
currentRelation = ExecOpenScanRelation(estate, scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
scanstate->ss.ss_currentRelation = currentRelation;
fdwroutine = GetFdwRoutineForRelation(currentRelation, true);
}
@@ -264,6 +266,8 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
if (outerPlan(node))
outerPlanState(scanstate) =
ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return scanstate;
/*
* Tell the FDW to initialize the scan.
@@ -312,14 +316,6 @@ ExecEndForeignScan(ForeignScanState *node)
/* Shut down any outer plan. */
if (outerPlanState(node))
ExecEndNode(outerPlanState(node));
-
- /* Free the exprcontext */
- ExecFreeExprContext(&node->ss.ps);
-
- /* clean out the tuple table */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeFunctionscan.c b/src/backend/executor/nodeFunctionscan.c
index dd06ef8aee..a49c1a2c85 100644
--- a/src/backend/executor/nodeFunctionscan.c
+++ b/src/backend/executor/nodeFunctionscan.c
@@ -523,18 +523,6 @@ ExecEndFunctionScan(FunctionScanState *node)
{
int i;
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clean out the tuple table
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
/*
* Release slots and tuplestore resources
*/
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 307fc10eea..6b26e03f74 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -89,6 +89,9 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
*/
outerNode = outerPlan(node);
outerPlanState(gatherstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return gatherstate;
+
tupDesc = ExecGetResultType(outerPlanState(gatherstate));
/*
@@ -250,9 +253,6 @@ ExecEndGather(GatherState *node)
{
ExecEndNode(outerPlanState(node)); /* let children clean up first */
ExecShutdownGather(node);
- ExecFreeExprContext(&node->ps);
- if (node->ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ps.ps_ResultTupleSlot);
}
/*
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 9d5e1a46e9..84412f94bb 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -108,6 +108,8 @@ ExecInitGatherMerge(GatherMerge *node, EState *estate, int eflags)
*/
outerNode = outerPlan(node);
outerPlanState(gm_state) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return gm_state;
/*
* Leader may access ExecProcNode result directly (if
@@ -290,9 +292,6 @@ ExecEndGatherMerge(GatherMergeState *node)
{
ExecEndNode(outerPlanState(node)); /* let children clean up first */
ExecShutdownGatherMerge(node);
- ExecFreeExprContext(&node->ps);
- if (node->ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ps.ps_ResultTupleSlot);
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeGroup.c b/src/backend/executor/nodeGroup.c
index 25a1618952..b6068887f6 100644
--- a/src/backend/executor/nodeGroup.c
+++ b/src/backend/executor/nodeGroup.c
@@ -185,6 +185,8 @@ ExecInitGroup(Group *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(grpstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return grpstate;
/*
* Initialize scan slot and type.
@@ -228,11 +230,6 @@ ExecEndGroup(GroupState *node)
{
PlanState *outerPlan;
- ExecFreeExprContext(&node->ss.ps);
-
- /* clean up tuple table */
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
outerPlan = outerPlanState(node);
ExecEndNode(outerPlan);
}
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 8b5c35b82b..030bf0ed43 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -386,6 +386,8 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(hashstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return hashstate;
/*
* initialize our result slot and type. No need to build projection
@@ -415,11 +417,6 @@ ExecEndHash(HashState *node)
{
PlanState *outerPlan;
- /*
- * free exprcontext
- */
- ExecFreeExprContext(&node->ps);
-
/*
* shut down the subplan
*/
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 980746128b..49a6ba4276 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -752,8 +752,12 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
hashNode = (Hash *) innerPlan(node);
outerPlanState(hjstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return hjstate;
outerDesc = ExecGetResultType(outerPlanState(hjstate));
innerPlanState(hjstate) = ExecInitNode((Plan *) hashNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return hjstate;
innerDesc = ExecGetResultType(innerPlanState(hjstate));
/*
@@ -867,18 +871,6 @@ ExecEndHashJoin(HashJoinState *node)
node->hj_HashTable = NULL;
}
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->js.ps);
-
- /*
- * clean out the tuple table
- */
- ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->hj_OuterTupleSlot);
- ExecClearTuple(node->hj_HashTupleSlot);
-
/*
* clean up subtrees
*/
diff --git a/src/backend/executor/nodeIncrementalSort.c b/src/backend/executor/nodeIncrementalSort.c
index 7683e3341c..6caa1aa306 100644
--- a/src/backend/executor/nodeIncrementalSort.c
+++ b/src/backend/executor/nodeIncrementalSort.c
@@ -1041,6 +1041,8 @@ ExecInitIncrementalSort(IncrementalSort *node, EState *estate, int eflags)
* nodes may be able to do something more useful.
*/
outerPlanState(incrsortstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return incrsortstate;
/*
* Initialize scan slot and type.
@@ -1079,14 +1081,6 @@ ExecEndIncrementalSort(IncrementalSortState *node)
{
SO_printf("ExecEndIncrementalSort: shutting down sort node\n");
- /* clean out the scan tuple */
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
- /* must drop pointer to sort result tuple */
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- /* must drop standalone tuple slots from outer node */
- ExecDropSingleTupleTableSlot(node->group_pivot);
- ExecDropSingleTupleTableSlot(node->transfer_tuple);
-
/*
* Release tuplesort resources.
*/
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 0b43a9b969..ea7fd89c0c 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -380,22 +380,6 @@ ExecEndIndexOnlyScan(IndexOnlyScanState *node)
node->ioss_VMBuffer = InvalidBuffer;
}
- /*
- * Free the exprcontext(s) ... now dead code, see ExecFreeExprContext
- */
-#ifdef NOT_USED
- ExecFreeExprContext(&node->ss.ps);
- if (node->ioss_RuntimeContext)
- FreeExprContext(node->ioss_RuntimeContext, true);
-#endif
-
- /*
- * clear out tuple table slots
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
/*
* close the index relation (no-op if we didn't open it)
*/
@@ -512,6 +496,8 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
indexstate->ss.ss_currentRelation = currentRelation;
indexstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
@@ -565,6 +551,8 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
indexstate->ioss_RelationDesc = index_open(node->indexid, lockmode);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize index-specific scan state
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 4540c7781d..906358011a 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -794,22 +794,6 @@ ExecEndIndexScan(IndexScanState *node)
indexRelationDesc = node->iss_RelationDesc;
indexScanDesc = node->iss_ScanDesc;
- /*
- * Free the exprcontext(s) ... now dead code, see ExecFreeExprContext
- */
-#ifdef NOT_USED
- ExecFreeExprContext(&node->ss.ps);
- if (node->iss_RuntimeContext)
- FreeExprContext(node->iss_RuntimeContext, true);
-#endif
-
- /*
- * clear out tuple table slots
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
/*
* close the index relation (no-op if we didn't open it)
*/
@@ -925,6 +909,8 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
indexstate->ss.ss_currentRelation = currentRelation;
indexstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
@@ -970,6 +956,8 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
indexstate->iss_RelationDesc = index_open(node->indexid, lockmode);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize index-specific scan state
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index 425fbfc405..6760de0f25 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -476,6 +476,8 @@ ExecInitLimit(Limit *node, EState *estate, int eflags)
*/
outerPlan = outerPlan(node);
outerPlanState(limitstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return limitstate;
/*
* initialize child expressions
@@ -534,7 +536,6 @@ ExecInitLimit(Limit *node, EState *estate, int eflags)
void
ExecEndLimit(LimitState *node)
{
- ExecFreeExprContext(&node->ps);
ExecEndNode(outerPlanState(node));
}
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index e459971d32..2599332f01 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -322,6 +322,8 @@ ExecInitLockRows(LockRows *node, EState *estate, int eflags)
* then initialize outer plan
*/
outerPlanState(lrstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return lrstate;
/* node returns unmodified slots from the outer plan */
lrstate->ps.resultopsset = true;
diff --git a/src/backend/executor/nodeMaterial.c b/src/backend/executor/nodeMaterial.c
index 09632678b0..b974ebdc8a 100644
--- a/src/backend/executor/nodeMaterial.c
+++ b/src/backend/executor/nodeMaterial.c
@@ -214,6 +214,8 @@ ExecInitMaterial(Material *node, EState *estate, int eflags)
outerPlan = outerPlan(node);
outerPlanState(matstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return matstate;
/*
* Initialize result type and slot. No need to initialize projection info
@@ -239,11 +241,6 @@ ExecInitMaterial(Material *node, EState *estate, int eflags)
void
ExecEndMaterial(MaterialState *node)
{
- /*
- * clean out the tuple table
- */
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
/*
* Release tuplestore resources
*/
diff --git a/src/backend/executor/nodeMemoize.c b/src/backend/executor/nodeMemoize.c
index 4f04269e26..d0cdbe1fd7 100644
--- a/src/backend/executor/nodeMemoize.c
+++ b/src/backend/executor/nodeMemoize.c
@@ -938,6 +938,8 @@ ExecInitMemoize(Memoize *node, EState *estate, int eflags)
outerNode = outerPlan(node);
outerPlanState(mstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return mstate;
/*
* Initialize return slot and type. No need to initialize projection info
@@ -1043,6 +1045,7 @@ ExecEndMemoize(MemoizeState *node)
{
#ifdef USE_ASSERT_CHECKING
/* Validate the memory accounting code is correct in assert builds. */
+ if (node->hashtable)
{
int count;
uint64 mem = 0;
@@ -1091,15 +1094,6 @@ ExecEndMemoize(MemoizeState *node)
/* Remove the cache context */
MemoryContextDelete(node->tableContext);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
- /* must drop pointer to cache result tuple */
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
-
- /*
- * free exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
/*
* shut down the subplan
*/
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 21b5726e6e..255b05aad3 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -95,6 +95,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
list_length(node->mergeplans),
node->part_prune_info,
&validsubplans);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
@@ -120,7 +122,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ms_prune_state = NULL;
}
- mergeplanstates = (PlanState **) palloc(nplans * sizeof(PlanState *));
+ mergeplanstates = (PlanState **) palloc0(nplans * sizeof(PlanState *));
mergestate->mergeplans = mergeplanstates;
mergestate->ms_nplans = nplans;
@@ -151,6 +153,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
Plan *initNode = (Plan *) list_nth(node->mergeplans, i);
mergeplanstates[j++] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return mergestate;
}
mergestate->ps.ps_ProjInfo = NULL;
diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 00f96d045e..e7f4512419 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -1490,11 +1490,15 @@ ExecInitMergeJoin(MergeJoin *node, EState *estate, int eflags)
mergestate->mj_SkipMarkRestore = node->skip_mark_restore;
outerPlanState(mergestate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return mergestate;
outerDesc = ExecGetResultType(outerPlanState(mergestate));
innerPlanState(mergestate) = ExecInitNode(innerPlan(node), estate,
mergestate->mj_SkipMarkRestore ?
eflags :
(eflags | EXEC_FLAG_MARK));
+ if (!ExecPlanStillValid(estate))
+ return mergestate;
innerDesc = ExecGetResultType(innerPlanState(mergestate));
/*
@@ -1642,18 +1646,6 @@ ExecEndMergeJoin(MergeJoinState *node)
{
MJ1_printf("ExecEndMergeJoin: %s\n",
"ending node processing");
-
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->js.ps);
-
- /*
- * clean out the tuple table
- */
- ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->mj_MarkedTupleSlot);
-
/*
* shut down the subplans
*/
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 5005d8c0d1..c28d5058e9 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -3985,6 +3985,9 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
linitial_int(node->resultRelations));
}
+ if (!ExecPlanStillValid(estate))
+ return NULL;
+
/* set up epqstate with dummy subplan data for the moment */
EvalPlanQualInit(&mtstate->mt_epqstate, estate, NULL, NIL,
node->epqParam, node->resultRelations);
@@ -4012,6 +4015,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
if (resultRelInfo != mtstate->rootResultRelInfo)
{
ExecInitResultRelation(estate, resultRelInfo, resultRelation);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* For child result relations, store the root result relation
@@ -4039,6 +4044,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* Now we may initialize the subplan.
*/
outerPlanState(mtstate) = ExecInitNode(subplan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return mtstate;
/*
* Do additional per-result-relation initialization.
@@ -4446,17 +4453,6 @@ ExecEndModifyTable(ModifyTableState *node)
ExecDropSingleTupleTableSlot(node->mt_root_tuple_slot);
}
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->ps);
-
- /*
- * clean out the tuple table
- */
- if (node->ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ps.ps_ResultTupleSlot);
-
/*
* Terminate EPQ execution if active
*/
diff --git a/src/backend/executor/nodeNamedtuplestorescan.c b/src/backend/executor/nodeNamedtuplestorescan.c
index 46832ad82f..e142ef593b 100644
--- a/src/backend/executor/nodeNamedtuplestorescan.c
+++ b/src/backend/executor/nodeNamedtuplestorescan.c
@@ -164,17 +164,6 @@ ExecInitNamedTuplestoreScan(NamedTuplestoreScan *node, EState *estate, int eflag
void
ExecEndNamedTuplestoreScan(NamedTuplestoreScanState *node)
{
- /*
- * Free exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clean out the tuple table
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index b3d52e69ec..0158a3e592 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -295,11 +295,15 @@ ExecInitNestLoop(NestLoop *node, EState *estate, int eflags)
* values.
*/
outerPlanState(nlstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return nlstate;
if (node->nestParams == NIL)
eflags |= EXEC_FLAG_REWIND;
else
eflags &= ~EXEC_FLAG_REWIND;
innerPlanState(nlstate) = ExecInitNode(innerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return nlstate;
/*
* Initialize result slot, type and projection.
@@ -363,17 +367,6 @@ ExecEndNestLoop(NestLoopState *node)
{
NL1_printf("ExecEndNestLoop: %s\n",
"ending node processing");
-
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->js.ps);
-
- /*
- * clean out the tuple table
- */
- ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
-
/*
* close down subplans
*/
diff --git a/src/backend/executor/nodeProjectSet.c b/src/backend/executor/nodeProjectSet.c
index f6ff3dc44c..1b4774d4f7 100644
--- a/src/backend/executor/nodeProjectSet.c
+++ b/src/backend/executor/nodeProjectSet.c
@@ -247,6 +247,8 @@ ExecInitProjectSet(ProjectSet *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(state) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return state;
/*
* we don't use inner plan
@@ -320,16 +322,6 @@ ExecInitProjectSet(ProjectSet *node, EState *estate, int eflags)
void
ExecEndProjectSet(ProjectSetState *node)
{
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->ps);
-
- /*
- * clean out the tuple table
- */
- ExecClearTuple(node->ps.ps_ResultTupleSlot);
-
/*
* shut down subplans
*/
diff --git a/src/backend/executor/nodeRecursiveunion.c b/src/backend/executor/nodeRecursiveunion.c
index e781003934..ca4f78685d 100644
--- a/src/backend/executor/nodeRecursiveunion.c
+++ b/src/backend/executor/nodeRecursiveunion.c
@@ -244,7 +244,11 @@ ExecInitRecursiveUnion(RecursiveUnion *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(rustate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return rustate;
innerPlanState(rustate) = ExecInitNode(innerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return rustate;
/*
* If hashing, precompute fmgr lookup data for inner loop, and create the
@@ -272,8 +276,10 @@ void
ExecEndRecursiveUnion(RecursiveUnionState *node)
{
/* Release tuplestores */
- tuplestore_end(node->working_table);
- tuplestore_end(node->intermediate_table);
+ if (node->working_table)
+ tuplestore_end(node->working_table);
+ if (node->intermediate_table)
+ tuplestore_end(node->intermediate_table);
/* free subsidiary stuff including hashtable */
if (node->tempContext)
diff --git a/src/backend/executor/nodeResult.c b/src/backend/executor/nodeResult.c
index 4219712d30..d4ea101cbe 100644
--- a/src/backend/executor/nodeResult.c
+++ b/src/backend/executor/nodeResult.c
@@ -208,6 +208,8 @@ ExecInitResult(Result *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(resstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return resstate;
/*
* we don't use inner plan
@@ -240,16 +242,6 @@ ExecInitResult(Result *node, EState *estate, int eflags)
void
ExecEndResult(ResultState *node)
{
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->ps);
-
- /*
- * clean out the tuple table
- */
- ExecClearTuple(node->ps.ps_ResultTupleSlot);
-
/*
* shut down subplans
*/
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index d7e22b1dbb..02a7db96e1 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -125,6 +125,8 @@ ExecInitSampleScan(SampleScan *node, EState *estate, int eflags)
ExecOpenScanRelation(estate,
node->scan.scanrelid,
eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/* we won't set up the HeapScanDesc till later */
scanstate->ss.ss_currentScanDesc = NULL;
@@ -185,21 +187,9 @@ ExecEndSampleScan(SampleScanState *node)
/*
* Tell sampling function that we finished the scan.
*/
- if (node->tsmroutine->EndSampleScan)
+ if (node->tsmroutine && node->tsmroutine->EndSampleScan)
node->tsmroutine->EndSampleScan(node);
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clean out the tuple table
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
/*
* close heap scan
*/
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 4da0f28f7b..48e20aa735 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -153,6 +153,8 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
ExecOpenScanRelation(estate,
node->scan.scanrelid,
eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/* and create slot with the appropriate rowtype */
ExecInitScanTupleSlot(estate, &scanstate->ss,
@@ -190,18 +192,6 @@ ExecEndSeqScan(SeqScanState *node)
*/
scanDesc = node->ss.ss_currentScanDesc;
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clean out the tuple table
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
/*
* close heap scan
*/
diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index 4bc2406b89..7a3a142204 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -528,6 +528,8 @@ ExecInitSetOp(SetOp *node, EState *estate, int eflags)
if (node->strategy == SETOP_HASHED)
eflags &= ~EXEC_FLAG_REWIND;
outerPlanState(setopstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return setopstate;
outerDesc = ExecGetResultType(outerPlanState(setopstate));
/*
@@ -582,13 +584,9 @@ ExecInitSetOp(SetOp *node, EState *estate, int eflags)
void
ExecEndSetOp(SetOpState *node)
{
- /* clean up tuple table */
- ExecClearTuple(node->ps.ps_ResultTupleSlot);
-
/* free subsidiary stuff including hashtable */
if (node->tableContext)
MemoryContextDelete(node->tableContext);
- ExecFreeExprContext(&node->ps);
ExecEndNode(outerPlanState(node));
}
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index c6c72c6e67..3ebbc46604 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -263,6 +263,8 @@ ExecInitSort(Sort *node, EState *estate, int eflags)
eflags &= ~(EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK);
outerPlanState(sortstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return sortstate;
/*
* Initialize scan slot and type.
@@ -303,13 +305,6 @@ ExecEndSort(SortState *node)
SO1_printf("ExecEndSort: %s\n",
"shutting down sort node");
- /*
- * clean out the tuple table
- */
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
- /* must drop pointer to sort result tuple */
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
-
/*
* Release tuplesort resources
*/
diff --git a/src/backend/executor/nodeSubqueryscan.c b/src/backend/executor/nodeSubqueryscan.c
index 42471bfc04..3c5c7c2ebb 100644
--- a/src/backend/executor/nodeSubqueryscan.c
+++ b/src/backend/executor/nodeSubqueryscan.c
@@ -124,6 +124,8 @@ ExecInitSubqueryScan(SubqueryScan *node, EState *estate, int eflags)
* initialize subquery
*/
subquerystate->subplan = ExecInitNode(node->subplan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return subquerystate;
/*
* Initialize scan slot and type (needed by ExecAssignScanProjectionInfo)
@@ -167,18 +169,6 @@ ExecInitSubqueryScan(SubqueryScan *node, EState *estate, int eflags)
void
ExecEndSubqueryScan(SubqueryScanState *node)
{
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clean out the upper tuple table
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
/*
* close down subquery
*/
diff --git a/src/backend/executor/nodeTableFuncscan.c b/src/backend/executor/nodeTableFuncscan.c
index 791cbd2372..a60dcd4943 100644
--- a/src/backend/executor/nodeTableFuncscan.c
+++ b/src/backend/executor/nodeTableFuncscan.c
@@ -213,18 +213,6 @@ ExecInitTableFuncScan(TableFuncScan *node, EState *estate, int eflags)
void
ExecEndTableFuncScan(TableFuncScanState *node)
{
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clean out the tuple table
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
/*
* Release tuplestore resources
*/
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 2124c55ef5..d337f3d54a 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -331,18 +331,6 @@ ExecEndTidRangeScan(TidRangeScanState *node)
if (scan != NULL)
table_endscan(scan);
-
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clear out tuple table slots
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
/* ----------------------------------------------------------------
@@ -386,6 +374,8 @@ ExecInitTidRangeScan(TidRangeScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
tidrangestate->ss.ss_currentRelation = currentRelation;
tidrangestate->ss.ss_currentScanDesc = NULL; /* no table scan here */
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index 862bd0330b..9637f354b2 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -472,18 +472,6 @@ ExecEndTidScan(TidScanState *node)
{
if (node->ss.ss_currentScanDesc)
table_endscan(node->ss.ss_currentScanDesc);
-
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clear out tuple table slots
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
/* ----------------------------------------------------------------
@@ -529,6 +517,8 @@ ExecInitTidScan(TidScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
tidstate->ss.ss_currentRelation = currentRelation;
tidstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
diff --git a/src/backend/executor/nodeUnique.c b/src/backend/executor/nodeUnique.c
index 45035d74fa..28630e380e 100644
--- a/src/backend/executor/nodeUnique.c
+++ b/src/backend/executor/nodeUnique.c
@@ -136,6 +136,8 @@ ExecInitUnique(Unique *node, EState *estate, int eflags)
* then initialize outer plan
*/
outerPlanState(uniquestate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return uniquestate;
/*
* Initialize result slot and type. Unique nodes do no projections, so
@@ -168,11 +170,6 @@ ExecInitUnique(Unique *node, EState *estate, int eflags)
void
ExecEndUnique(UniqueState *node)
{
- /* clean up tuple table */
- ExecClearTuple(node->ps.ps_ResultTupleSlot);
-
- ExecFreeExprContext(&node->ps);
-
ExecEndNode(outerPlanState(node));
}
diff --git a/src/backend/executor/nodeValuesscan.c b/src/backend/executor/nodeValuesscan.c
index 32ace63017..3f86783ad7 100644
--- a/src/backend/executor/nodeValuesscan.c
+++ b/src/backend/executor/nodeValuesscan.c
@@ -328,19 +328,6 @@ ExecInitValuesScan(ValuesScan *node, EState *estate, int eflags)
void
ExecEndValuesScan(ValuesScanState *node)
{
- /*
- * Free both exprcontexts
- */
- ExecFreeExprContext(&node->ss.ps);
- node->ss.ps.ps_ExprContext = node->rowcontext;
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clean out the tuple table
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 310ac23e3a..a4153be495 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -2458,6 +2458,8 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags)
*/
outerPlan = outerPlan(node);
outerPlanState(winstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return winstate;
/*
* initialize source tuple type (which is also the tuple type that we'll
@@ -2691,17 +2693,6 @@ ExecEndWindowAgg(WindowAggState *node)
ExecClearTuple(node->agg_row_slot);
ExecClearTuple(node->temp_slot_1);
ExecClearTuple(node->temp_slot_2);
- if (node->framehead_slot)
- ExecClearTuple(node->framehead_slot);
- if (node->frametail_slot)
- ExecClearTuple(node->frametail_slot);
-
- /*
- * Free both the expr contexts.
- */
- ExecFreeExprContext(&node->ss.ps);
- node->ss.ps.ps_ExprContext = node->tmpcontext;
- ExecFreeExprContext(&node->ss.ps);
for (i = 0; i < node->numaggs; i++)
{
diff --git a/src/backend/executor/nodeWorktablescan.c b/src/backend/executor/nodeWorktablescan.c
index 0c13448236..cc63ddfeca 100644
--- a/src/backend/executor/nodeWorktablescan.c
+++ b/src/backend/executor/nodeWorktablescan.c
@@ -190,17 +190,6 @@ ExecInitWorkTableScan(WorkTableScan *node, EState *estate, int eflags)
void
ExecEndWorkTableScan(WorkTableScanState *node)
{
- /*
- * Free exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clean out the tuple table
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
/* ----------------------------------------------------------------
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index c677e490d7..8ec636cab8 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -19,6 +19,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/plancache.h"
/*
@@ -256,6 +257,17 @@ extern void ExecEndNode(PlanState *node);
extern void ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * Is the cached plan, if any, still valid at this point? That is, not
+ * invalidated by the incoming invalidation messages that have been processed
+ * recently.
+ */
+static inline bool
+ExecPlanStillValid(EState *estate)
+{
+ return estate->es_cachedplan == NULL ? true :
+ CachedPlanStillValid(estate->es_cachedplan);
+}
/* ----------------------------------------------------------------
* ExecProcNode
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index cb714f4a19..719a728319 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -623,6 +623,8 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ struct CachedPlan *es_cachedplan; /* CachedPlan if plannedstmt is from
* one, or NULL if not */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 916e59d9fe..c83a67fea3 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -221,6 +221,20 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
QueryEnvironment *queryEnv);
+
+/*
+ * CachedPlanStillValid
+ * Returns whether a cached generic plan is still valid
+ *
+ * Called by the executor on every relation lock taken when initializing the
+ * plan tree in the CachedPlan.
+ */
+static inline bool
+CachedPlanStillValid(CachedPlan *cplan)
+{
+ return cplan->is_valid;
+}
+
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
--
2.35.3
On Fri, Aug 11, 2023 at 14:31 Amit Langote <amitlangote09@gmail.com> wrote:
On Wed, Aug 9, 2023 at 1:05 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Aug 8, 2023 at 10:32 AM Amit Langote <amitlangote09@gmail.com> wrote:
But should ExecInitNode() subroutines return the partially initialized
PlanState node or NULL on detecting invalidation? If I'm
understanding how you think this should be working correctly, I think
you mean the former, because if it were the latter, ExecInitNode()
would end up returning NULL at the top for the root and then there's
nothing to pass to ExecEndNode(), so no way to clean up to begin with.
In that case, I think we will need to adjust ExecEndNode() subroutines
to add `if (node->ps.ps_ResultTupleSlot)` in the above code, for
example. That's something Tom had said he doesn't like very much [1].

Yeah, I understood Tom's goal as being "don't return partially
initialized nodes."

Personally, I'm not sure that's an important goal. In fact, I don't
even think it's a desirable one. It doesn't look difficult to audit
the end-node functions for cases where they'd fail if a particular
pointer were NULL instead of pointing to some real data, and just
fixing all such cases to have NULL-tests looks like purely mechanical
work that we are unlikely to get wrong. And at least some cases
wouldn't require any changes at all.

If we don't do that, the complexity doesn't go away. It just moves
someplace else. Presumably what we do in that case is have
ExecInitNode functions undo any initialization that they've already
done before returning NULL. There are basically two ways to do that.
Option one is to add code at the point where they return early to
clean up anything they've already initialized, but that code is likely
to substantially duplicate whatever the ExecEndNode function already
knows how to do, and it's very easy for logic like this to get broken
if somebody rearranges an ExecInitNode function down the road.

Yeah, I too am not a fan of making ExecInitNode() clean up partially
initialized nodes.

Option two is to rearrange the ExecInitNode functions now, to open relations
or recurse at the beginning, so that we discover the need to fail
before we initialize anything. That restricts our ability to further
rearrange the functions in future somewhat, but more importantly,
IMHO, it introduces more risk right now. Checking that the ExecEndNode
function will not fail if some pointers are randomly null is a lot
easier than checking that changing the order of operations in an
ExecInitNode function breaks nothing.

I'm not here to say that we can't do one of those things. But I think
adding null-tests to ExecEndNode functions looks like *far* less work
and *way* less risk.

+1
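For illustration, a minimal sketch of the mechanical NULL-test style being
discussed here (modeled on the ExecEndBitmapHeapScan() hunk in the v1 patch
above; "FooScan" is a placeholder, not a real node type):

void
ExecEndFooScan(FooScanState *node)
{
    TableScanDesc scanDesc = node->ss.ss_currentScanDesc;

    /*
     * The scan may never have been started if ExecInitFooScan() bailed
     * out early on detecting plan invalidation.
     */
    if (scanDesc != NULL)
        table_endscan(scanDesc);

    /* ExecEndNode() itself copes with a NULL child. */
    if (outerPlanState(node))
        ExecEndNode(outerPlanState(node));
}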
There's a second issue here, too, which is when we abort ExecInitNode
partway through, how do we signal that? You're rightly pointing out
here that if we do that by returning NULL, then we don't do it by
returning a pointer to the partially initialized node that we just
created, which means that we either need to store those partially
initialized nodes in a separate data structure as you propose to do in
0001, or else we need to pick a different signalling convention. We
could change (a) ExecInitNode to have an additional argument, bool
*kaboom, or (b) we could make it return bool and return the node
pointer via a new additional argument, or (c) we could put a Boolean
flag into the estate and let the function signal failure by flipping
the value of the flag.

The failure can already be detected by seeing that
ExecPlanIsValid(estate) is false. The question is what ExecInitNode()
or any of its subroutines should return once it is. I think the
following convention works:

1. Return partially initialized state from an ExecInit* function where we
detect the invalidation after calling ExecInitNode() on a child plan,
so that ExecEndNode() can recurse to clean it up.

2. Return NULL from ExecInit* functions where we detect the invalidation
after opening and locking a relation but before calling ExecInitNode()
to initialize a child plan, if there's one at all. Even if we may have
set up things like ExprContext and TupleTableSlot fields, they are
cleaned up independently of the plan tree anyway via the cleanup done
using es_exprcontexts and es_tupleTable, respectively.
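Schematically, the convention would look like this in an ExecInit*
function ("Foo" is a placeholder node type; the ExecPlanStillValid()
checks mirror the ones added by the patch):

FooState *
ExecInitFoo(Foo *node, EState *estate, int eflags)
{
    FooState   *state = makeNode(FooState);

    /* Convention 2: invalidation detected before any child plan is set up. */
    state->ss.ss_currentRelation =
        ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
    if (!ExecPlanStillValid(estate))
        return NULL;            /* nothing for ExecEndNode() to visit */

    /* Convention 1: invalidation detected after initializing a child plan. */
    outerPlanState(state) = ExecInitNode(outerPlan(node), estate, eflags);
    if (!ExecPlanStillValid(estate))
        return state;           /* partially initialized; ExecEndNode()
                                 * recurses into the child to clean up */

    /* ... rest of the initialization ... */
    return state;
}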
I even noticed bits like this in ExecEnd* functions:

- /*
- * Free the exprcontext(s) ... now dead code, see ExecFreeExprContext
- */
-#ifdef NOT_USED
- ExecFreeExprContext(&node->ss.ps);
- if (node->ioss_RuntimeContext)
- FreeExprContext(node->ioss_RuntimeContext, true);
-#endif

So, AFAICS, the ExprContext and TupleTableSlot cleanup in ExecEnd*
functions is unnecessary but remains around because nobody has gotten
around to removing it.

If we do any of those things, then as far as I
can see 0001 is unnecessary. If we do none of them but also avoid
creating partially initialized nodes by one of the two techniques
mentioned two paragraphs prior, then 0001 is also unnecessary. If we
do none of them but do create partially initialized nodes, then we
need 0001.

So if this were a restaurant menu, then it might look like this:
Prix Fixe Menu (choose one from each)
First Course - How do we clean up after partial initialization?
(1) ExecInitNode functions produce partially initialized nodes
(2) ExecInitNode functions get refactored so that the stuff that can
cause early exit always happens first, so that no cleanup is ever
needed
(3) ExecInitNode functions do any required cleanup in situ

Second Course - How do we signal that initialization stopped early?
(A) Return NULL.
(B) Add a bool * out-parameter to ExecInitNode.
(C) Add a Node * out-parameter to ExecInitNode and change the return
value to bool.
(D) Add a bool to the EState.
(E) Something else, maybe.

I think that we need 0001 if we choose specifically (1) and (A). My
gut feeling is that the least-invasive way to do this project is to
choose (1) and (D). My second choice would be (1) and (C), and my
third choice would be (1) and (A). If I can't have (1), I think I
prefer (2) over (3), but I also believe I prefer hiding in a deep hole
to either of them. Maybe I'm not seeing the whole picture correctly
here, but both (2) and (3) look awfully painful to me.

I think what I've ended up with in the attached 0001 (WIP) is a
combination of (1), (2), and (D). As mentioned above, (D) is implemented
with the
ExecPlanStillValid() function.
After removing the unnecessary cleanup code from most node types’ ExecEnd*
functions, one thing I’m tempted to do is remove the functions that do
nothing else but recurse to close the outerPlan, innerPlan child nodes. We
could instead have ExecEndNode() itself recurse to close outerPlan,
innerPlan child nodes at the top, which preserves the
close-child-before-self behavior for Gather* nodes, and call node type
specific cleanup functions only for nodes that do have any local cleanup
to do. Perhaps we could even use planstate_tree_walker() called at the top
instead of the usual bottom so that nodes with a list of child subplans
like Append also don’t need to have their own ExecEnd* functions.
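For what it's worth, a rough sketch of that idea (purely illustrative, not
taken from any posted patch; ExecShutdownGather() stands in for node-local
cleanup):

static bool
ExecEndNode_walker(PlanState *node, void *context)
{
    if (node == NULL)
        return false;

    /*
     * Recurse into all children first: outer/inner plans as well as
     * subplan lists such as Append's, preserving close-child-before-self.
     */
    planstate_tree_walker(node, ExecEndNode_walker, context);

    /* Then do node-local cleanup, only for node types that need any. */
    switch (nodeTag(node))
    {
        case T_GatherState:
            ExecShutdownGather((GatherState *) node);
            break;
        default:
            break;
    }

    return false;
}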
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
On Fri, Aug 11, 2023 at 9:50 AM Amit Langote <amitlangote09@gmail.com> wrote:
After removing the unnecessary cleanup code from most node types’ ExecEnd* functions, one thing I’m tempted to do is remove the functions that do nothing else but recurse to close the outerPlan, innerPlan child nodes. We could instead have ExecEndNode() itself recurse to close outerPlan, innerPlan child nodes at the top, which preserves the close-child-before-self behavior for Gather* nodes, and close node type specific cleanup functions for nodes that do have any local cleanup to do. Perhaps, we could even use planstate_tree_walker() called at the top instead of the usual bottom so that nodes with a list of child subplans like Append also don’t need to have their own ExecEnd* functions.
I think 0001 needs to be split up. Like, this is code cleanup:
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
This is providing for NULL pointers where we don't currently:
- list_free_deep(aggstate->hash_batches);
+ if (aggstate->hash_batches)
+ list_free_deep(aggstate->hash_batches);
And this is the early return mechanism per se:
+ if (!ExecPlanStillValid(estate))
+ return aggstate;
I think at least those 3 kinds of changes deserve to be in separate
patches with separate commit messages explaining the rationale behind
each e.g. "Remove unnecessary cleanup calls in ExecEnd* functions.
These calls are no longer required, because <reasons>. Removing them
saves a few CPU cycles and simplifies planned refactoring, so do
that."
--
Robert Haas
EDB: http://www.enterprisedb.com
Thanks for taking a look.
On Mon, Aug 28, 2023 at 10:43 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Aug 11, 2023 at 9:50 AM Amit Langote <amitlangote09@gmail.com> wrote:
After removing the unnecessary cleanup code from most node types’ ExecEnd* functions, one thing I’m tempted to do is remove the functions that do nothing else but recurse to close the outerPlan, innerPlan child nodes. We could instead have ExecEndNode() itself recurse to close outerPlan, innerPlan child nodes at the top, which preserves the close-child-before-self behavior for Gather* nodes, and close node type specific cleanup functions for nodes that do have any local cleanup to do. Perhaps, we could even use planstate_tree_walker() called at the top instead of the usual bottom so that nodes with a list of child subplans like Append also don’t need to have their own ExecEnd* functions.
I think 0001 needs to be split up. Like, this is code cleanup:
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);

This is providing for NULL pointers where we don't currently:

- list_free_deep(aggstate->hash_batches);
+ if (aggstate->hash_batches)
+     list_free_deep(aggstate->hash_batches);

And this is the early return mechanism per se:

+ if (!ExecPlanStillValid(estate))
+     return aggstate;

I think at least those 3 kinds of changes deserve to be in separate
patches with separate commit messages explaining the rationale behind
each e.g. "Remove unnecessary cleanup calls in ExecEnd* functions.
These calls are no longer required, because <reasons>. Removing them
saves a few CPU cycles and simplifies planned refactoring, so do
that."
Breaking up the patch as you describe makes sense, so I've done that:
Attached 0001 removes unnecessary cleanup calls from ExecEnd*() routines.
0002 adds NULLness checks in ExecEnd*() routines on some pointers that
may not be initialized by the corresponding ExecInit*() routines in
the case where it returns early.
0003 adds the early return mechanism based on checking CachedPlan
invalidation, though no CachedPlan is actually passed to the executor
yet, so no functional changes here yet.
Other patches are rebased over these. One significant change is in
0004 which does the refactoring to make the callers of ExecutorStart()
aware that it may now return with a partially initialized planstate
tree that should not be executed. I added a new flag
EState.es_canceled to denote that state of execution, complementing
the existing es_finished. I also needed to add
AfterTriggerCancelQuery() to ensure that we don't attempt to fire a
canceled query's triggers. Most of these changes are needed only to
appease the various Asserts in these parts of the code and I thought
they are warranted given the introduction of a new state of query
execution.
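To make the intended control flow concrete, here is a hedged sketch of the
replan loop that a caller can now build around GetCachedPlan() and
ExecutorStart(); create_query_desc_from() is a hypothetical helper, while
the other functions shown are the real ones discussed above:

for (;;)
{
    CachedPlan *cplan = GetCachedPlan(plansource, params, owner, queryEnv);
    QueryDesc  *queryDesc = create_query_desc_from(cplan);  /* hypothetical */

    if (ExecutorStart(queryDesc, eflags))
        break;                  /* plan still valid; go on to ExecutorRun() */

    /* CachedPlan was invalidated during ExecInitNode(); clean up and retry. */
    ExecutorEnd(queryDesc);
    FreeQueryDesc(queryDesc);
    ReleaseCachedPlan(cplan, owner);
}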
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v46-0004-Make-ExecutorStart-return-early-upon-plan-invali.patch
From 76a2848e8f70ccbbf9c1844c5f3c49fa728ae169 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Thu, 3 Aug 2023 12:34:31 +0900
Subject: [PATCH v46 4/8] Make ExecutorStart() return early upon plan
invalidation
When passing a plan tree from a CachedPlan to the executor,
ExecutorStart() can now return a planstate tree that isn't completely
set up. This scenario occurs if the CachedPlan becomes invalidated while
it's being initialized with ExecInitNode(). Execution must be retried
with a new CachedPlan when that scenario occurs. A partially initialized
EState must be cleaned up by calling ExecutorEnd() and
FreeExecutorState().
ExecutorStart() and ExecutorStart_hook() now return a Boolean telling
the caller if the plan initialization failed.
For the replan loop in that context, it makes more sense to have the
ExecutorStart() call either in the same scope as, or closer to, where
GetCachedPlan() is invoked. So this commit modifies the following call
sites:
* The ExecutorStart() call in ExplainOnePlan() is moved into a new
function ExplainQueryDesc() along with CreateQueryDesc(). Callers
of ExplainOnePlan() should now call the new function first.
* The ExecutorStart() call in _SPI_pquery() is moved to its caller
_SPI_execute_plan().
* The ExecutorStart() call in PortalRunMulti() is moved to
PortalStart(). This requires a new List field in PortalData to
store the QueryDescs created in PortalStart() and a new memory
context for those. One unintended consequence is that
CommandCounterIncrement() between queries in PORTAL_MULTI_QUERY
cases is now done in the loop in PortalStart() and not in
PortalRunMulti(). That still works because the Snapshot registered
in QueryDesc/EState is updated to account for the CCI().
This commit also adds a new flag to EState called es_canceled that
complements es_finished to denote the new scenario where
ExecutorStart() returns with a partially set-up planstate tree. Also,
to reset the AFTER trigger state that would have been set up in
ExecutorStart(), this adds a new function AfterTriggerCancelQuery()
which is called from ExecutorEnd() (not ExecutorFinish()) when
es_canceled is true.
Note that this commit by itself doesn't make any functional change,
because the CachedPlan is not passed into the executor yet.
---
contrib/auto_explain/auto_explain.c | 12 +-
.../pg_stat_statements/pg_stat_statements.c | 12 +-
src/backend/commands/copyto.c | 4 +-
src/backend/commands/createas.c | 8 +-
src/backend/commands/explain.c | 142 ++++---
src/backend/commands/extension.c | 3 +-
src/backend/commands/matview.c | 8 +-
src/backend/commands/portalcmds.c | 5 +-
src/backend/commands/prepare.c | 31 +-
src/backend/commands/trigger.c | 13 +
src/backend/executor/execMain.c | 57 ++-
src/backend/executor/execParallel.c | 3 +-
src/backend/executor/execUtils.c | 1 +
src/backend/executor/functions.c | 4 +-
src/backend/executor/spi.c | 48 ++-
src/backend/tcop/postgres.c | 18 +-
src/backend/tcop/pquery.c | 345 +++++++++---------
src/backend/utils/mmgr/portalmem.c | 9 +
src/include/commands/explain.h | 7 +-
src/include/commands/trigger.h | 1 +
src/include/executor/executor.h | 6 +-
src/include/nodes/execnodes.h | 3 +
src/include/tcop/pquery.h | 2 +-
src/include/utils/portal.h | 2 +
24 files changed, 460 insertions(+), 284 deletions(-)
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index c3ac27ae99..a0630d7944 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -78,7 +78,7 @@ static ExecutorRun_hook_type prev_ExecutorRun = NULL;
static ExecutorFinish_hook_type prev_ExecutorFinish = NULL;
static ExecutorEnd_hook_type prev_ExecutorEnd = NULL;
-static void explain_ExecutorStart(QueryDesc *queryDesc, int eflags);
+static bool explain_ExecutorStart(QueryDesc *queryDesc, int eflags);
static void explain_ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction,
uint64 count, bool execute_once);
@@ -258,9 +258,11 @@ _PG_init(void)
/*
* ExecutorStart hook: start up logging if needed
*/
-static void
+static bool
explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ bool plan_valid;
+
/*
* At the beginning of each top-level statement, decide whether we'll
* sample this statement. If nested-statement explaining is enabled,
@@ -296,9 +298,9 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
}
if (prev_ExecutorStart)
- prev_ExecutorStart(queryDesc, eflags);
+ plan_valid = prev_ExecutorStart(queryDesc, eflags);
else
- standard_ExecutorStart(queryDesc, eflags);
+ plan_valid = standard_ExecutorStart(queryDesc, eflags);
if (auto_explain_enabled())
{
@@ -316,6 +318,8 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcxt);
}
}
+
+ return plan_valid;
}
/*
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 06b65aeef5..5354dff7d7 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -324,7 +324,7 @@ static PlannedStmt *pgss_planner(Query *parse,
const char *query_string,
int cursorOptions,
ParamListInfo boundParams);
-static void pgss_ExecutorStart(QueryDesc *queryDesc, int eflags);
+static bool pgss_ExecutorStart(QueryDesc *queryDesc, int eflags);
static void pgss_ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction,
uint64 count, bool execute_once);
@@ -961,13 +961,15 @@ pgss_planner(Query *parse,
/*
* ExecutorStart hook: start up tracking if needed
*/
-static void
+static bool
pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ bool plan_valid;
+
if (prev_ExecutorStart)
- prev_ExecutorStart(queryDesc, eflags);
+ plan_valid = prev_ExecutorStart(queryDesc, eflags);
else
- standard_ExecutorStart(queryDesc, eflags);
+ plan_valid = standard_ExecutorStart(queryDesc, eflags);
/*
* If query has queryId zero, don't track it. This prevents double
@@ -990,6 +992,8 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcxt);
}
}
+
+ return plan_valid;
}
/*
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index eaa3172793..a45489f8f5 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -567,8 +567,10 @@ BeginCopyTo(ParseState *pstate,
* Call ExecutorStart to prepare the plan for execution.
*
* ExecutorStart computes a result tupdesc for us
+ *
+ * OK to ignore the return value; plan can't become invalid.
*/
- ExecutorStart(cstate->queryDesc, 0);
+ (void) ExecutorStart(cstate->queryDesc, 0);
tupDesc = cstate->queryDesc->tupDesc;
}
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index e91920ca14..167db4cf56 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -329,8 +329,12 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, GetIntoRelEFlags(into));
+ /*
+ * call ExecutorStart to prepare the plan for execution
+ *
+ * OK to ignore the return value; plan can't become invalid.
+ */
+ (void) ExecutorStart(queryDesc, GetIntoRelEFlags(into));
/* run the plan to completion */
ExecutorRun(queryDesc, ForwardScanDirection, 0, true);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 8570b14f62..fe9314bc96 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -393,6 +393,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
else
{
PlannedStmt *plan;
+ QueryDesc *queryDesc;
instr_time planstart,
planduration;
BufferUsage bufusage_start,
@@ -415,12 +416,87 @@ ExplainOneQuery(Query *query, int cursorOptions,
BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
}
+ queryDesc = ExplainQueryDesc(plan, queryString, into, es,
+ params, queryEnv);
+ Assert(queryDesc);
+
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(queryDesc, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
+/*
+ * ExplainQueryDesc
+ * Set up QueryDesc for EXPLAINing a given plan
+ */
+QueryDesc *
+ExplainQueryDesc(PlannedStmt *stmt,
+ const char *queryString, IntoClause *into, ExplainState *es,
+ ParamListInfo params, QueryEnvironment *queryEnv)
+{
+ QueryDesc *queryDesc;
+ DestReceiver *dest;
+ int eflags;
+ int instrument_option = 0;
+
+ /*
+ * Normally we discard the query's output, but if explaining CREATE TABLE
+ * AS, we'd better use the appropriate tuple receiver.
+ */
+ if (into)
+ dest = CreateIntoRelDestReceiver(into);
+ else
+ dest = None_Receiver;
+
+ if (es->analyze && es->timing)
+ instrument_option |= INSTRUMENT_TIMER;
+ else if (es->analyze)
+ instrument_option |= INSTRUMENT_ROWS;
+
+ if (es->buffers)
+ instrument_option |= INSTRUMENT_BUFFERS;
+ if (es->wal)
+ instrument_option |= INSTRUMENT_WAL;
+
+ /*
+ * Use a snapshot with an updated command ID to ensure this query sees
+ * results of any previously executed queries.
+ */
+ PushCopiedSnapshot(GetActiveSnapshot());
+ UpdateActiveSnapshotCommandId();
+
+ /* Create a QueryDesc for the query */
+ queryDesc = CreateQueryDesc(stmt, queryString,
+ GetActiveSnapshot(), InvalidSnapshot,
+ dest, params, queryEnv, instrument_option);
+
+ /* Select execution options */
+ if (es->analyze)
+ eflags = 0; /* default run-to-completion flags */
+ else
+ eflags = EXEC_FLAG_EXPLAIN_ONLY;
+ if (es->generic)
+ eflags |= EXEC_FLAG_EXPLAIN_GENERIC;
+ if (into)
+ eflags |= GetIntoRelEFlags(into);
+
+ /*
+ * Call ExecutorStart to prepare the plan for execution. A cached plan
+ * may get invalidated during plan initialization.
+ */
+ if (!ExecutorStart(queryDesc, eflags))
+ {
+ /* Clean up. */
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+ PopActiveSnapshot();
+ return NULL;
+ }
+
+ return queryDesc;
+}
+
/*
* ExplainOneUtility -
* print out the execution plan for one utility statement
@@ -524,29 +600,16 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(QueryDesc *queryDesc,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
{
- DestReceiver *dest;
- QueryDesc *queryDesc;
instr_time starttime;
double totaltime = 0;
- int eflags;
- int instrument_option = 0;
-
- Assert(plannedstmt->commandType != CMD_UTILITY);
- if (es->analyze && es->timing)
- instrument_option |= INSTRUMENT_TIMER;
- else if (es->analyze)
- instrument_option |= INSTRUMENT_ROWS;
-
- if (es->buffers)
- instrument_option |= INSTRUMENT_BUFFERS;
- if (es->wal)
- instrument_option |= INSTRUMENT_WAL;
+ Assert(queryDesc->plannedstmt->commandType != CMD_UTILITY);
/*
* We always collect timing for the entire statement, even when node-level
@@ -555,40 +618,6 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
*/
INSTR_TIME_SET_CURRENT(starttime);
- /*
- * Use a snapshot with an updated command ID to ensure this query sees
- * results of any previously executed queries.
- */
- PushCopiedSnapshot(GetActiveSnapshot());
- UpdateActiveSnapshotCommandId();
-
- /*
- * Normally we discard the query's output, but if explaining CREATE TABLE
- * AS, we'd better use the appropriate tuple receiver.
- */
- if (into)
- dest = CreateIntoRelDestReceiver(into);
- else
- dest = None_Receiver;
-
- /* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, instrument_option);
-
- /* Select execution options */
- if (es->analyze)
- eflags = 0; /* default run-to-completion flags */
- else
- eflags = EXEC_FLAG_EXPLAIN_ONLY;
- if (es->generic)
- eflags |= EXEC_FLAG_EXPLAIN_GENERIC;
- if (into)
- eflags |= GetIntoRelEFlags(into);
-
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, eflags);
-
/* Execute the plan for statistics if asked for */
if (es->analyze)
{
@@ -4865,6 +4894,17 @@ ExplainDummyGroup(const char *objtype, const char *labelname, ExplainState *es)
}
}
+/*
+ * Discard output buffer for a fresh restart.
+ */
+void
+ExplainResetOutput(ExplainState *es)
+{
+ Assert(es->str);
+ resetStringInfo(es->str);
+ ExplainBeginOutput(es);
+}
+
/*
* Emit the start-of-output boilerplate.
*
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 535072d181..b702a65e81 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -801,7 +801,8 @@ execute_sql_string(const char *sql)
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
- ExecutorStart(qdesc, 0);
+ /* OK to ignore the return value; plan can't become invalid. */
+ (void) ExecutorStart(qdesc, 0);
ExecutorRun(qdesc, ForwardScanDirection, 0, true);
ExecutorFinish(qdesc);
ExecutorEnd(qdesc);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index ac2e74fa3f..7124994a43 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -412,8 +412,12 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, 0);
+ /*
+ * call ExecutorStart to prepare the plan for execution
+ *
+ * OK to ignore the return value; plan can't become invalid.
+ */
+ (void) ExecutorStart(queryDesc, 0);
/* run the plan */
ExecutorRun(queryDesc, ForwardScanDirection, 0, true);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 73ed7aa2f0..5120f93414 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -142,9 +142,10 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
/*
* Start execution, inserting parameters if any.
+ *
+ * OK to ignore the return value; plan can't become invalid here.
*/
- PortalStart(portal, params, 0, GetActiveSnapshot());
-
+ (void) PortalStart(portal, params, 0, GetActiveSnapshot());
Assert(portal->strategy == PORTAL_ONE_SELECT);
/*
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 18f70319fc..699df429c4 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -183,6 +183,7 @@ ExecuteQuery(ParseState *pstate,
paramLI = EvaluateParams(pstate, entry, stmt->params, estate);
}
+replan:
/* Create a new portal to run the query in */
portal = CreateNewPortal();
/* Don't display the portal in pg_cursors, it is for internal use only */
@@ -251,9 +252,15 @@ ExecuteQuery(ParseState *pstate,
}
/*
- * Run the portal as appropriate.
+ * Run the portal as appropriate. If the portal contains a cached plan,
+ * the portal must be recreated if the plan is found to have been
+ * invalidated while initializing one of the plan trees contained in it.
*/
- PortalStart(portal, paramLI, eflags, GetActiveSnapshot());
+ if (!PortalStart(portal, paramLI, eflags, GetActiveSnapshot()))
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
(void) PortalRun(portal, count, false, true, dest, dest, qc);
@@ -574,7 +581,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
{
PreparedStatement *entry;
const char *query_string;
- CachedPlan *cplan;
+ CachedPlan *cplan = NULL;
List *plan_list;
ListCell *p;
ParamListInfo paramLI = NULL;
@@ -618,6 +625,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
+replan:
cplan = GetCachedPlan(entry->plansource, paramLI,
CurrentResourceOwner, queryEnv);
@@ -639,8 +647,21 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ {
+ QueryDesc *queryDesc;
+
+ queryDesc = ExplainQueryDesc(pstmt, queryString,
+ into, es, paramLI, queryEnv);
+ if (queryDesc == NULL)
+ {
+ ExplainResetOutput(es);
+ ReleaseCachedPlan(cplan, CurrentResourceOwner);
+ goto replan;
+ }
+ ExplainOnePlan(queryDesc, into, es, query_string, paramLI,
+ queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
+ }
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
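For reference, the retry protocol that the portal-based call sites above implement boils down to the following shape; this is a condensed sketch using names from ExecuteQuery(), not verbatim code from the patch:

replan:
	/* Replan if needed, and acquire a transient refcount */
	cplan = GetCachedPlan(entry->plansource, paramLI,
						  CurrentResourceOwner, queryEnv);

	portal = CreateNewPortal();
	PortalDefineQuery(portal, NULL, query_string,
					  entry->plansource->commandTag,
					  cplan->stmt_list, cplan);

	if (!PortalStart(portal, paramLI, eflags, GetActiveSnapshot()))
	{
		/* Plan went stale during executor startup; rebuild and retry. */
		PortalDrop(portal, false);
		goto replan;
	}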
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 52177759ab..dd139432b9 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -5009,6 +5009,19 @@ AfterTriggerBeginQuery(void)
afterTriggers.query_depth++;
}
+/* ----------
+ * AfterTriggerCancelQuery()
+ *
+ * Called from ExecutorEnd() if the query execution was canceled.
+ * ----------
+ */
+void
+AfterTriggerCancelQuery(void)
+{
+ /* Set to a value denoting that no query is active. */
+ afterTriggers.query_depth = -1;
+}
+
/* ----------
* AfterTriggerEndQuery()
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index f3054cbe7e..88ebfb218b 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -79,7 +79,7 @@ ExecutorEnd_hook_type ExecutorEnd_hook = NULL;
ExecutorCheckPerms_hook_type ExecutorCheckPerms_hook = NULL;
/* decls for local routines only used within this module */
-static void InitPlan(QueryDesc *queryDesc, int eflags);
+static bool InitPlan(QueryDesc *queryDesc, int eflags);
static void CheckValidRowMarkRel(Relation rel, RowMarkType markType);
static void ExecPostprocessPlan(EState *estate);
static void ExecEndPlan(PlanState *planstate, EState *estate);
@@ -119,6 +119,13 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
*
* eflags contains flag bits as described in executor.h.
*
+ * Plan initialization may fail if the input plan tree is found to have been
+ * invalidated, which can happen if it comes from a CachedPlan.
+ *
+ * Returns true if the plan was successfully initialized, false otherwise. In
+ * the latter case, the caller must call ExecutorEnd() on 'queryDesc' to clean up
+ * after failed plan initialization.
+ *
* NB: the CurrentMemoryContext when this is called will become the parent
* of the per-query context used for this Executor invocation.
*
@@ -128,7 +135,7 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
*
* ----------------------------------------------------------------
*/
-void
+bool
ExecutorStart(QueryDesc *queryDesc, int eflags)
{
/*
@@ -140,14 +147,15 @@ ExecutorStart(QueryDesc *queryDesc, int eflags)
pgstat_report_query_id(queryDesc->plannedstmt->queryId, false);
if (ExecutorStart_hook)
- (*ExecutorStart_hook) (queryDesc, eflags);
- else
- standard_ExecutorStart(queryDesc, eflags);
+ return (*ExecutorStart_hook) (queryDesc, eflags);
+
+ return standard_ExecutorStart(queryDesc, eflags);
}
-void
+bool
standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ bool plan_valid;
EState *estate;
MemoryContext oldcontext;
@@ -263,9 +271,14 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/*
* Initialize the plan state tree
*/
- InitPlan(queryDesc, eflags);
+ plan_valid = InitPlan(queryDesc, eflags);
+
+ /* Mark execution as canceled if plan won't be executed. */
+ estate->es_canceled = !plan_valid;
MemoryContextSwitchTo(oldcontext);
+
+ return plan_valid;
}
/* ----------------------------------------------------------------
@@ -325,6 +338,7 @@ standard_ExecutorRun(QueryDesc *queryDesc,
estate = queryDesc->estate;
Assert(estate != NULL);
+ Assert(!estate->es_canceled);
Assert(!(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
/*
@@ -429,7 +443,7 @@ standard_ExecutorFinish(QueryDesc *queryDesc)
Assert(!(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
/* This should be run once and only once per Executor instance */
- Assert(!estate->es_finished);
+ Assert(!estate->es_finished && !estate->es_canceled);
/* Switch into per-query memory context */
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
@@ -488,11 +502,11 @@ standard_ExecutorEnd(QueryDesc *queryDesc)
Assert(estate != NULL);
/*
- * Check that ExecutorFinish was called, unless in EXPLAIN-only mode. This
- * Assert is needed because ExecutorFinish is new as of 9.1, and callers
- * might forget to call it.
+ * Check that ExecutorFinish was called, unless in EXPLAIN-only mode or if
+ * execution was canceled. This Assert is needed because ExecutorFinish is
+ * new as of 9.1, and callers might forget to call it.
*/
- Assert(estate->es_finished ||
+ Assert(estate->es_finished || estate->es_canceled ||
(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
/*
@@ -506,6 +520,14 @@ standard_ExecutorEnd(QueryDesc *queryDesc)
UnregisterSnapshot(estate->es_snapshot);
UnregisterSnapshot(estate->es_crosscheck_snapshot);
+ /*
+ * Cancel trigger execution too if the query execution was canceled.
+ */
+ if (estate->es_canceled &&
+ !(estate->es_top_eflags &
+ (EXEC_FLAG_SKIP_TRIGGERS | EXEC_FLAG_EXPLAIN_ONLY)))
+ AfterTriggerCancelQuery();
+
/*
* Must switch out of context before destroying it
*/
@@ -829,9 +851,12 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
*
* Initializes the query plan: open files, allocate storage
* and start up the rule manager
+ *
+ * Returns true if the plan tree is successfully initialized for execution,
+ * false otherwise.
* ----------------------------------------------------------------
*/
-static void
+static bool
InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
@@ -1014,9 +1039,15 @@ InitPlan(QueryDesc *queryDesc, int eflags)
}
}
+ queryDesc->tupDesc = tupType;
+ Assert(planstate != NULL);
+ queryDesc->planstate = planstate;
+ return true;
+
plan_init_suspended:
queryDesc->tupDesc = tupType;
queryDesc->planstate = planstate;
+ return false;
}
/*
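Callers that drive ExecutorStart() directly rather than through a portal follow the contract spelled out above; roughly (a sketch mirroring ExplainQueryDesc() and the SPI changes below):

	if (!ExecutorStart(queryDesc, eflags))
	{
		/* InitPlan() stopped early because the plan was invalidated. */
		ExecutorEnd(queryDesc);	/* mandatory cleanup after failed startup */
		FreeQueryDesc(queryDesc);
		/* ... release the CachedPlan, replan, and retry ... */
	}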
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index cc2b8ccab7..f84a3a17d5 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1430,7 +1430,8 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Start up the executor */
queryDesc->plannedstmt->jitFlags = fpes->jit_flags;
- ExecutorStart(queryDesc, fpes->eflags);
+ /* OK to ignore the return value; plan can't become invalid. */
+ (void) ExecutorStart(queryDesc, fpes->eflags);
/* Special executor initialization steps for parallel workers */
queryDesc->planstate->state->es_query_dsa = area;
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index c3f7279b06..da8a1511ac 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -151,6 +151,7 @@ CreateExecutorState(void)
estate->es_top_eflags = 0;
estate->es_instrument = 0;
estate->es_finished = false;
+ estate->es_canceled = false;
estate->es_exprcontexts = NIL;
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index f55424eb5a..8cf0b3132d 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -862,7 +862,9 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
eflags = EXEC_FLAG_SKIP_TRIGGERS;
else
eflags = 0; /* default run-to-completion flags */
- ExecutorStart(es->qd, eflags);
+
+ /* OK to ignore the return value; plan can't become invalid. */
+ (void) ExecutorStart(es->qd, eflags);
}
es->status = F_EXEC_RUN;
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 33975687b3..6a96d7fc22 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -71,7 +71,7 @@ static int _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
static ParamListInfo _SPI_convert_params(int nargs, Oid *argtypes,
Datum *Values, const char *Nulls);
-static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount);
+static int _SPI_pquery(QueryDesc *queryDesc, uint64 tcount);
static void _SPI_error_callback(void *arg);
@@ -1582,6 +1582,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
Snapshot snapshot;
MemoryContext oldcontext;
Portal portal;
+ bool plan_valid;
SPICallbackArg spicallbackarg;
ErrorContextCallback spierrcontext;
@@ -1623,6 +1624,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
_SPI_current->processed = 0;
_SPI_current->tuptable = NULL;
+replan:
/* Create the portal */
if (name == NULL || name[0] == '\0')
{
@@ -1766,15 +1768,23 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
}
/*
- * Start portal execution.
+ * Start portal execution. If the portal contains a cached plan, the
+ * portal must be recreated if the plan is found to have been invalidated
+ * while initializing one of the plan trees contained in it.
*/
- PortalStart(portal, paramLI, 0, snapshot);
+ plan_valid = PortalStart(portal, paramLI, 0, snapshot);
Assert(portal->strategy != PORTAL_MULTI_QUERY);
/* Pop the error context stack */
error_context_stack = spierrcontext.previous;
+ if (!plan_valid)
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
+
/* Pop the SPI stack */
_SPI_end_call(true);
@@ -2552,6 +2562,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
+replan:
cplan = GetCachedPlan(plansource, options->params,
plan_owner, _SPI_current->queryEnv);
@@ -2661,6 +2672,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
QueryDesc *qdesc;
Snapshot snap;
+ int eflags;
if (ActiveSnapshotSet())
snap = GetActiveSnapshot();
@@ -2674,8 +2686,23 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
options->params,
_SPI_current->queryEnv,
0);
- res = _SPI_pquery(qdesc, fire_triggers,
- canSetTag ? options->tcount : 0);
+
+ /* Select execution options */
+ if (fire_triggers)
+ eflags = 0; /* default run-to-completion flags */
+ else
+ eflags = EXEC_FLAG_SKIP_TRIGGERS;
+
+ if (!ExecutorStart(qdesc, eflags))
+ {
+ ExecutorEnd(qdesc);
+ FreeQueryDesc(qdesc);
+ Assert(cplan);
+ ReleaseCachedPlan(cplan, plan_owner);
+ goto replan;
+ }
+
+ res = _SPI_pquery(qdesc, canSetTag ? options->tcount : 0);
FreeQueryDesc(qdesc);
}
else
@@ -2850,10 +2877,9 @@ _SPI_convert_params(int nargs, Oid *argtypes,
}
static int
-_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
+_SPI_pquery(QueryDesc *queryDesc, uint64 tcount)
{
int operation = queryDesc->operation;
- int eflags;
int res;
switch (operation)
@@ -2897,14 +2923,6 @@ _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
ResetUsage();
#endif
- /* Select execution options */
- if (fire_triggers)
- eflags = 0; /* default run-to-completion flags */
- else
- eflags = EXEC_FLAG_SKIP_TRIGGERS;
-
- ExecutorStart(queryDesc, eflags);
-
ExecutorRun(queryDesc, ForwardScanDirection, tcount, true);
_SPI_current->processed = queryDesc->estate->es_processed;
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index e4756f8be2..204002cff2 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1232,7 +1232,12 @@ exec_simple_query(const char *query_string)
/*
* Start the portal. No parameters here.
*/
- PortalStart(portal, NULL, 0, InvalidSnapshot);
+ {
+ bool plan_valid PG_USED_FOR_ASSERTS_ONLY;
+
+ plan_valid = PortalStart(portal, NULL, 0, InvalidSnapshot);
+ Assert(plan_valid);
+ }
/*
* Select the appropriate output format: text unless we are doing a
@@ -1737,6 +1742,7 @@ exec_bind_message(StringInfo input_message)
"commands ignored until end of transaction block"),
errdetail_abort()));
+replan:
/*
* Create the portal. Allow silent replacement of an existing portal only
* if the unnamed portal is specified.
@@ -2028,9 +2034,15 @@ exec_bind_message(StringInfo input_message)
PopActiveSnapshot();
/*
- * And we're ready to start portal execution.
+ * Start portal execution. If the portal contains a cached plan, the
+ * portal must be recreated if the plan is found to have been invalidated
+ * while initializing one of the plan trees contained in it.
*/
- PortalStart(portal, params, 0, InvalidSnapshot);
+ if (!PortalStart(portal, params, 0, InvalidSnapshot))
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
/*
* Apply the result format requests to the portal.
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5565f200c3..9a96b77f1e 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -19,6 +19,7 @@
#include "access/xact.h"
#include "commands/prepare.h"
+#include "executor/execdesc.h"
#include "executor/tstoreReceiver.h"
#include "miscadmin.h"
#include "pg_trace.h"
@@ -35,12 +36,6 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
- const char *sourceText,
- ParamListInfo params,
- QueryEnvironment *queryEnv,
- DestReceiver *dest,
- QueryCompletion *qc);
static void FillPortalStore(Portal portal, bool isTopLevel);
static uint64 RunFromStore(Portal portal, ScanDirection direction, uint64 count,
DestReceiver *dest);
@@ -116,86 +111,6 @@ FreeQueryDesc(QueryDesc *qdesc)
}
-/*
- * ProcessQuery
- * Execute a single plannable query within a PORTAL_MULTI_QUERY,
- * PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
- *
- * plan: the plan tree for the query
- * sourceText: the source text of the query
- * params: any parameters needed
- * dest: where to send results
- * qc: where to store the command completion status data.
- *
- * qc may be NULL if caller doesn't want a status string.
- *
- * Must be called in a memory context that will be reset or deleted on
- * error; otherwise the executor's memory usage will be leaked.
- */
-static void
-ProcessQuery(PlannedStmt *plan,
- const char *sourceText,
- ParamListInfo params,
- QueryEnvironment *queryEnv,
- DestReceiver *dest,
- QueryCompletion *qc)
-{
- QueryDesc *queryDesc;
-
- /*
- * Create the QueryDesc object
- */
- queryDesc = CreateQueryDesc(plan, sourceText,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, 0);
-
- /*
- * Call ExecutorStart to prepare the plan for execution
- */
- ExecutorStart(queryDesc, 0);
-
- /*
- * Run the plan to completion.
- */
- ExecutorRun(queryDesc, ForwardScanDirection, 0, true);
-
- /*
- * Build command completion status data, if caller wants one.
- */
- if (qc)
- {
- switch (queryDesc->operation)
- {
- case CMD_SELECT:
- SetQueryCompletion(qc, CMDTAG_SELECT, queryDesc->estate->es_processed);
- break;
- case CMD_INSERT:
- SetQueryCompletion(qc, CMDTAG_INSERT, queryDesc->estate->es_processed);
- break;
- case CMD_UPDATE:
- SetQueryCompletion(qc, CMDTAG_UPDATE, queryDesc->estate->es_processed);
- break;
- case CMD_DELETE:
- SetQueryCompletion(qc, CMDTAG_DELETE, queryDesc->estate->es_processed);
- break;
- case CMD_MERGE:
- SetQueryCompletion(qc, CMDTAG_MERGE, queryDesc->estate->es_processed);
- break;
- default:
- SetQueryCompletion(qc, CMDTAG_UNKNOWN, queryDesc->estate->es_processed);
- break;
- }
- }
-
- /*
- * Now, we close down all the scans and free allocated resources.
- */
- ExecutorFinish(queryDesc);
- ExecutorEnd(queryDesc);
-
- FreeQueryDesc(queryDesc);
-}
-
/*
* ChoosePortalStrategy
* Select portal execution strategy given the intended statement list.
@@ -426,19 +341,21 @@ FetchStatementTargetList(Node *stmt)
* presently ignored for non-PORTAL_ONE_SELECT portals (it's only intended
* to be used for cursors).
*
- * On return, portal is ready to accept PortalRun() calls, and the result
- * tupdesc (if any) is known.
+ * True is returned if the portal is ready to accept PortalRun() calls, and
+ * the result tupdesc (if any) is known. False if the plan tree is no longer
+ * valid, in which case the caller must retry after generating a new
+ * CachedPlan.
*/
-void
+bool
PortalStart(Portal portal, ParamListInfo params,
int eflags, Snapshot snapshot)
{
Portal saveActivePortal;
ResourceOwner saveResourceOwner;
- MemoryContext savePortalContext;
MemoryContext oldContext;
QueryDesc *queryDesc;
- int myeflags;
+ int myeflags = 0;
+ bool plan_valid = true;
Assert(PortalIsValid(portal));
Assert(portal->status == PORTAL_DEFINED);
@@ -448,15 +365,13 @@ PortalStart(Portal portal, ParamListInfo params,
*/
saveActivePortal = ActivePortal;
saveResourceOwner = CurrentResourceOwner;
- savePortalContext = PortalContext;
PG_TRY();
{
ActivePortal = portal;
if (portal->resowner)
CurrentResourceOwner = portal->resowner;
- PortalContext = portal->portalContext;
- oldContext = MemoryContextSwitchTo(PortalContext);
+ oldContext = MemoryContextSwitchTo(portal->queryContext);
/* Must remember portal param list, if any */
portal->portalParams = params;
@@ -472,6 +387,8 @@ PortalStart(Portal portal, ParamListInfo params,
switch (portal->strategy)
{
case PORTAL_ONE_SELECT:
+ case PORTAL_ONE_RETURNING:
+ case PORTAL_ONE_MOD_WITH:
/* Must set snapshot before starting executor. */
if (snapshot)
@@ -489,8 +406,8 @@ PortalStart(Portal portal, ParamListInfo params,
*/
/*
- * Create QueryDesc in portal's context; for the moment, set
- * the destination to DestNone.
+ * Create QueryDesc in portal->queryContext; for the moment,
+ * set the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
portal->sourceText,
@@ -501,30 +418,51 @@ PortalStart(Portal portal, ParamListInfo params,
portal->queryEnv,
0);
+ /* Remember for PortalRunMulti(). */
+ if (portal->strategy == PORTAL_ONE_RETURNING ||
+ portal->strategy == PORTAL_ONE_MOD_WITH)
+ portal->qdescs = list_make1(queryDesc);
+
/*
* If it's a scrollable cursor, executor needs to support
* REWIND and backwards scan, as well as whatever the caller
* might've asked for.
*/
- if (portal->cursorOptions & CURSOR_OPT_SCROLL)
+ if (portal->strategy == PORTAL_ONE_SELECT &&
+ (portal->cursorOptions & CURSOR_OPT_SCROLL))
myeflags = eflags | EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD;
else
myeflags = eflags;
/*
- * Call ExecutorStart to prepare the plan for execution
+ * Call ExecutorStart to prepare the plan for execution. A
+ * cached plan may get invalidated during plan initialization.
*/
- ExecutorStart(queryDesc, myeflags);
+ if (!ExecutorStart(queryDesc, myeflags))
+ {
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+ PopActiveSnapshot();
+ plan_valid = false;
+ goto plan_init_failed;
+ }
/*
- * This tells PortalCleanup to shut down the executor
+ * This tells PortalCleanup to shut down the executor, though
+ * not needed for queries handled by PortalRunMulti().
*/
- portal->queryDesc = queryDesc;
+ if (portal->strategy == PORTAL_ONE_SELECT)
+ portal->queryDesc = queryDesc;
/*
- * Remember tuple descriptor (computed by ExecutorStart)
+ * Remember tuple descriptor (computed by ExecutorStart),
+ * but make it independent of the QueryDesc for queries handled
+ * by PortalRunMulti().
*/
- portal->tupDesc = queryDesc->tupDesc;
+ if (portal->strategy != PORTAL_ONE_SELECT)
+ portal->tupDesc = CreateTupleDescCopy(queryDesc->tupDesc);
+ else
+ portal->tupDesc = queryDesc->tupDesc;
/*
* Reset cursor position data to "start of query"
@@ -536,29 +474,6 @@ PortalStart(Portal portal, ParamListInfo params,
PopActiveSnapshot();
break;
- case PORTAL_ONE_RETURNING:
- case PORTAL_ONE_MOD_WITH:
-
- /*
- * We don't start the executor until we are told to run the
- * portal. We do need to set up the result tupdesc.
- */
- {
- PlannedStmt *pstmt;
-
- pstmt = PortalGetPrimaryStmt(portal);
- portal->tupDesc =
- ExecCleanTypeFromTL(pstmt->planTree->targetlist);
- }
-
- /*
- * Reset cursor position data to "start of query"
- */
- portal->atStart = true;
- portal->atEnd = false; /* allow fetches */
- portal->portalPos = 0;
- break;
-
case PORTAL_UTIL_SELECT:
/*
@@ -581,7 +496,81 @@ PortalStart(Portal portal, ParamListInfo params,
break;
case PORTAL_MULTI_QUERY:
- /* Need do nothing now */
+ {
+ ListCell *lc;
+ bool first = true;
+
+ myeflags = eflags;
+ foreach(lc, portal->stmts)
+ {
+ PlannedStmt *plan = lfirst_node(PlannedStmt, lc);
+ bool is_utility = (plan->utilityStmt != NULL);
+
+ /*
+ * Push the snapshot to be used by the executor.
+ */
+ if (!is_utility)
+ {
+ /*
+ * Must copy the snapshot for all statements
+ * except the first, as we'll need to update its
+ * command ID.
+ */
+ if (!first)
+ PushCopiedSnapshot(GetTransactionSnapshot());
+ else
+ PushActiveSnapshot(GetTransactionSnapshot());
+ }
+
+ /*
+ * From the 2nd statement onwards, update the command
+ * ID and the snapshot to match.
+ */
+ if (!first)
+ {
+ CommandCounterIncrement();
+ UpdateActiveSnapshotCommandId();
+ }
+
+ first = false;
+
+ /*
+ * Create the QueryDesc. DestReceiver will be set in
+ * PortalRunMulti() before calling ExecutorRun().
+ */
+ queryDesc = CreateQueryDesc(plan,
+ portal->sourceText,
+ !is_utility ?
+ GetActiveSnapshot() :
+ InvalidSnapshot,
+ InvalidSnapshot,
+ NULL,
+ params,
+ portal->queryEnv, 0);
+
+ /* Remember for PortalRunMulti() */
+ portal->qdescs = lappend(portal->qdescs, queryDesc);
+
+ if (is_utility)
+ continue;
+
+ /*
+ * Call ExecutorStart to prepare the plan for
+ * execution. A cached plan may get invalidated
+ * during plan initialization.
+ */
+ if (!ExecutorStart(queryDesc, myeflags))
+ {
+ PopActiveSnapshot();
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+ plan_valid = false;
+ goto plan_init_failed;
+ }
+ PopActiveSnapshot();
+ }
+ }
+
portal->tupDesc = NULL;
break;
}
@@ -594,19 +583,20 @@ PortalStart(Portal portal, ParamListInfo params,
/* Restore global vars and propagate error */
ActivePortal = saveActivePortal;
CurrentResourceOwner = saveResourceOwner;
- PortalContext = savePortalContext;
PG_RE_THROW();
}
PG_END_TRY();
+ portal->status = PORTAL_READY;
+
+plan_init_failed:
MemoryContextSwitchTo(oldContext);
ActivePortal = saveActivePortal;
CurrentResourceOwner = saveResourceOwner;
- PortalContext = savePortalContext;
- portal->status = PORTAL_READY;
+ return plan_valid;
}
/*
@@ -1193,7 +1183,7 @@ PortalRunMulti(Portal portal,
QueryCompletion *qc)
{
bool active_snapshot_set = false;
- ListCell *stmtlist_item;
+ ListCell *qdesc_item;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1214,9 +1204,10 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
- foreach(stmtlist_item, portal->stmts)
+ foreach(qdesc_item, portal->qdescs)
{
- PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ QueryDesc *qdesc = (QueryDesc *) lfirst(qdesc_item);
+ PlannedStmt *pstmt = qdesc->plannedstmt;
/*
* If we got a cancel signal in prior command, quit
@@ -1233,33 +1224,26 @@ PortalRunMulti(Portal portal,
if (log_executor_stats)
ResetUsage();
- /*
- * Must always have a snapshot for plannable queries. First time
- * through, take a new snapshot; for subsequent queries in the
- * same portal, just update the snapshot's copy of the command
- * counter.
- */
+ /* Push the snapshot for plannable queries. */
if (!active_snapshot_set)
{
- Snapshot snapshot = GetTransactionSnapshot();
+ Snapshot snapshot = qdesc->snapshot;
- /* If told to, register the snapshot and save in portal */
+ /*
+ * If told to, register the snapshot and save in portal
+ *
+ * Note that the command ID of qdesc->snapshot for the 2nd query
+ * onwards would have been updated in PortalStart() to account
+ * for CCI() done between queries, but it's OK that here we
+ * don't likewise update holdSnapshot's command ID.
+ */
if (setHoldSnapshot)
{
snapshot = RegisterSnapshot(snapshot);
portal->holdSnapshot = snapshot;
}
- /*
- * We can't have the holdSnapshot also be the active one,
- * because UpdateActiveSnapshotCommandId would complain. So
- * force an extra snapshot copy. Plain PushActiveSnapshot
- * would have copied the transaction snapshot anyway, so this
- * only adds a copy step when setHoldSnapshot is true. (It's
- * okay for the command ID of the active snapshot to diverge
- * from what holdSnapshot has.)
- */
- PushCopiedSnapshot(snapshot);
+ PushActiveSnapshot(snapshot);
/*
* As for PORTAL_ONE_SELECT portals, it does not seem
@@ -1268,26 +1252,39 @@ PortalRunMulti(Portal portal,
active_snapshot_set = true;
}
- else
- UpdateActiveSnapshotCommandId();
+ /*
+ * Run the plan to completion.
+ */
+ qdesc->dest = dest;
+ ExecutorRun(qdesc, ForwardScanDirection, 0, true);
+
+ /*
+ * Build command completion status data if needed.
+ */
if (pstmt->canSetTag)
{
- /* statement can set tag string */
- ProcessQuery(pstmt,
- portal->sourceText,
- portal->portalParams,
- portal->queryEnv,
- dest, qc);
- }
- else
- {
- /* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
- portal->sourceText,
- portal->portalParams,
- portal->queryEnv,
- altdest, NULL);
+ switch (qdesc->operation)
+ {
+ case CMD_SELECT:
+ SetQueryCompletion(qc, CMDTAG_SELECT, qdesc->estate->es_processed);
+ break;
+ case CMD_INSERT:
+ SetQueryCompletion(qc, CMDTAG_INSERT, qdesc->estate->es_processed);
+ break;
+ case CMD_UPDATE:
+ SetQueryCompletion(qc, CMDTAG_UPDATE, qdesc->estate->es_processed);
+ break;
+ case CMD_DELETE:
+ SetQueryCompletion(qc, CMDTAG_DELETE, qdesc->estate->es_processed);
+ break;
+ case CMD_MERGE:
+ SetQueryCompletion(qc, CMDTAG_MERGE, qdesc->estate->es_processed);
+ break;
+ default:
+ SetQueryCompletion(qc, CMDTAG_UNKNOWN, qdesc->estate->es_processed);
+ break;
+ }
}
if (log_executor_stats)
@@ -1342,12 +1339,12 @@ PortalRunMulti(Portal portal,
if (portal->stmts == NIL)
break;
- /*
- * Increment command counter between queries, but not after the last
- * one.
- */
- if (lnext(portal->stmts, stmtlist_item) != NULL)
- CommandCounterIncrement();
+ if (qdesc->estate)
+ {
+ ExecutorFinish(qdesc);
+ ExecutorEnd(qdesc);
+ }
+ FreeQueryDesc(qdesc);
}
/* Pop the snapshot if we pushed one. */
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 06dfa85f04..0cad450dcd 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -201,6 +201,13 @@ CreatePortal(const char *name, bool allowDup, bool dupSilent)
portal->portalContext = AllocSetContextCreate(TopPortalContext,
"PortalContext",
ALLOCSET_SMALL_SIZES);
+ /*
+ * initialize portal's query context to store QueryDescs created during
+ * PortalStart() and then used in PortalRun().
+ */
+ portal->queryContext = AllocSetContextCreate(TopPortalContext,
+ "PortalQueryContext",
+ ALLOCSET_SMALL_SIZES);
/* create a resource owner for the portal */
portal->resowner = ResourceOwnerCreate(CurTransactionResourceOwner,
@@ -224,6 +231,7 @@ CreatePortal(const char *name, bool allowDup, bool dupSilent)
/* for named portals reuse portal->name copy */
MemoryContextSetIdentifier(portal->portalContext, portal->name[0] ? portal->name : "<unnamed>");
+ MemoryContextSetIdentifier(portal->queryContext, portal->name[0] ? portal->name : "<unnamed>");
return portal;
}
@@ -594,6 +602,7 @@ PortalDrop(Portal portal, bool isTopCommit)
/* release subsidiary storage */
MemoryContextDelete(portal->portalContext);
+ MemoryContextDelete(portal->queryContext);
/* release portal struct (it's in TopPortalContext) */
pfree(portal);
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 3d3e632a0c..37554727ee 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -88,7 +88,11 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern QueryDesc *ExplainQueryDesc(PlannedStmt *stmt,
+ const char *queryString, IntoClause *into, ExplainState *es,
+ ParamListInfo params, QueryEnvironment *queryEnv);
+extern void ExplainOnePlan(QueryDesc *queryDesc,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
@@ -104,6 +108,7 @@ extern void ExplainQueryParameters(ExplainState *es, ParamListInfo params, int m
extern void ExplainBeginOutput(ExplainState *es);
extern void ExplainEndOutput(ExplainState *es);
+extern void ExplainResetOutput(ExplainState *es);
extern void ExplainSeparatePlans(ExplainState *es);
extern void ExplainPropertyList(const char *qlabel, List *data,
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index 430e3ca7dd..d4f7c29301 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -257,6 +257,7 @@ extern void ExecASTruncateTriggers(EState *estate,
extern void AfterTriggerBeginXact(void);
extern void AfterTriggerBeginQuery(void);
+extern void AfterTriggerCancelQuery(void);
extern void AfterTriggerEndQuery(EState *estate);
extern void AfterTriggerFireDeferred(void);
extern void AfterTriggerEndXact(bool isCommit);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 72cbf120c5..10c5cda169 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -73,7 +73,7 @@
/* Hook for plugins to get control in ExecutorStart() */
-typedef void (*ExecutorStart_hook_type) (QueryDesc *queryDesc, int eflags);
+typedef bool (*ExecutorStart_hook_type) (QueryDesc *queryDesc, int eflags);
extern PGDLLIMPORT ExecutorStart_hook_type ExecutorStart_hook;
/* Hook for plugins to get control in ExecutorRun() */
@@ -198,8 +198,8 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
-extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
-extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern bool ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern bool standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count, bool execute_once);
extern void standard_ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index b2a576b76d..0922be6678 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -670,6 +670,9 @@ typedef struct EState
int es_top_eflags; /* eflags passed to ExecutorStart */
int es_instrument; /* OR of InstrumentOption flags */
bool es_finished; /* true when ExecutorFinish is done */
+ bool es_canceled; /* true when execution was canceled
+ * upon finding that the plan was invalidated
+ * during ExecInitNode() */
List *es_exprcontexts; /* List of ExprContexts within EState */
diff --git a/src/include/tcop/pquery.h b/src/include/tcop/pquery.h
index a5e65b98aa..577b81a9ee 100644
--- a/src/include/tcop/pquery.h
+++ b/src/include/tcop/pquery.h
@@ -29,7 +29,7 @@ extern List *FetchPortalTargetList(Portal portal);
extern List *FetchStatementTargetList(Node *stmt);
-extern void PortalStart(Portal portal, ParamListInfo params,
+extern bool PortalStart(Portal portal, ParamListInfo params,
int eflags, Snapshot snapshot);
extern void PortalSetResultFormat(Portal portal, int nFormats,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aa08b1e0fc..af059e30f8 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,8 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *qdescs; /* list of QueryDescs */
+ MemoryContext queryContext; /* memory for QueryDescs and children */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
--
2.35.3
v46-0006-Set-inFromCl-to-false-in-child-table-RTEs.patch (application/octet-stream)
From 7f6ec474c66c75124c48c62a7fc5d68d3750cc37 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Tue, 4 Jul 2023 22:36:43 +0900
Subject: [PATCH v46 6/8] Set inFromCl to false in child table RTEs
This is to allow the executor to distinguish tables that are
directly mentioned in the query from those that get added to the
query during planning. A subsequent commit will teach the executor
to lock only the tables of the latter kind when executing a cached
plan.
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
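To make the intent concrete, the subsequent commit could consult the flag in the executor along these lines; this is a hypothetical sketch, not code contained in this patch:

	RangeTblEntry *rte = exec_rt_fetch(rti, estate);

	if (!rte->inFromCl)
	{
		/*
		 * This RTE was added by the planner for an inheritance child,
		 * so AcquireExecutorLocks() will not have locked it; take the
		 * lock now.
		 */
		LockRelationOid(rte->relid, rte->rellockmode);
	}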
---
src/backend/optimizer/util/inherit.c | 6 ++++++
src/backend/parser/analyze.c | 7 +++----
src/include/nodes/parsenodes.h | 9 +++++++--
3 files changed, 16 insertions(+), 6 deletions(-)
diff --git a/src/backend/optimizer/util/inherit.c b/src/backend/optimizer/util/inherit.c
index 94de855a22..9bac07bf40 100644
--- a/src/backend/optimizer/util/inherit.c
+++ b/src/backend/optimizer/util/inherit.c
@@ -492,6 +492,12 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
}
else
childrte->inh = false;
+ /*
+ * Mark child tables as not being directly mentioned in the query. This
+ * allows the executor's ExecGetRangeTableRelation() to conveniently
+ * identify them as inheritance child tables.
+ */
+ childrte->inFromCl = false;
childrte->securityQuals = NIL;
/*
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 7a1dfb6364..cf269f8c53 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -3305,10 +3305,9 @@ transformLockingClause(ParseState *pstate, Query *qry, LockingClause *lc,
/*
* Lock all regular tables used in query and its subqueries. We
* examine inFromCl to exclude auto-added RTEs, particularly NEW/OLD
- * in rules. This is a bit of an abuse of a mostly-obsolete flag, but
- * it's convenient. We can't rely on the namespace mechanism that has
- * largely replaced inFromCl, since for example we need to lock
- * base-relation RTEs even if they are masked by upper joins.
+ * in rules. We can't rely on the namespace mechanism since for
+ * example we need to lock base-relation RTEs even if they are masked
+ * by upper joins.
*/
i = 0;
foreach(rt, qry->rtable)
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index fef4c714b8..d875e11192 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -994,11 +994,16 @@ typedef struct PartitionCmd
*
* inFromCl marks those range variables that are listed in the FROM clause.
* It's false for RTEs that are added to a query behind the scenes, such
- * as the NEW and OLD variables for a rule, or the subqueries of a UNION.
+ * as the NEW and OLD variables for a rule, or the subqueries of a UNION,
+ * or the RTEs of inheritance child tables that are added by the planner.
* This flag is not used during parsing (except in transformLockingClause,
* q.v.); the parser now uses a separate "namespace" data structure to
* control visibility. But it is needed by ruleutils.c to determine
- * whether RTEs should be shown in decompiled queries.
+ * whether RTEs should be shown in decompiled queries. It is also used by
+ * the executor to determine whether a given RTE_RELATION entry belongs to a
+ * table directly mentioned in the query or to a child table added by the
+ * planner, which it needs to know when the child tables in a plan must be
+ * locked.
*
* securityQuals is a list of security barrier quals (boolean expressions),
* to be tested in the listed order before returning a row from the
--
2.35.3
v46-0005-Add-field-to-store-parent-relids-to-Append-Merge.patch (application/octet-stream)
From 766003a0342fb2eb659c5c8280cead5a74053c22 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Tue, 4 Jul 2023 22:36:31 +0900
Subject: [PATCH v46 5/8] Add field to store parent relids to
Append/MergeAppend
There's currently no way in the executor to tell whether the child
subplans of an Append/MergeAppend are scanning partitions, and if
so, what the RT indexes of their parent/ancestor tables are. The
executor doesn't need to see those RT indexes except for run-time
pruning, in which case they can be found in the PartitionPruneInfo,
but a future commit will create a need for them to be available at
all times for the purpose of locking those parent/ancestor tables
when executing a cached plan.
The code to look up partitioned parent relids for a given list of
partition scan subpaths of an Append/MergeAppend is already present
in make_partition_pruneinfo() but it's local to partprune.c. This
commit refactors that code into its own function called
add_append_subpath_partrelids() defined in appendinfo.c and
generalizes it to consider child join and aggregate paths. To
facilitate looking up of parent rels of child grouping rels in
add_append_subpath_partrelids(), parent links are now also set in
the RelOptInfos of child grouping rels too, like they are in
those of child base and join rels.
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
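For illustration, with allpartrelids available in the plan node, the future locking code could walk each partition tree's Bitmapset roughly as follows; a sketch only, not code contained in this patch:

	ListCell   *lc;

	foreach(lc, aplan->allpartrelids)
	{
		Bitmapset  *partrelids = (Bitmapset *) lfirst(lc);
		int			rti = -1;

		while ((rti = bms_next_member(partrelids, rti)) >= 0)
		{
			RangeTblEntry *rte = exec_rt_fetch(rti, estate);

			/* Lock the partitioned parent/ancestor table. */
			LockRelationOid(rte->relid, rte->rellockmode);
		}
	}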
---
src/backend/optimizer/plan/createplan.c | 41 ++++++--
src/backend/optimizer/plan/planner.c | 3 +
src/backend/optimizer/plan/setrefs.c | 4 +
src/backend/optimizer/util/appendinfo.c | 134 ++++++++++++++++++++++++
src/backend/partitioning/partprune.c | 124 +++-------------------
src/include/nodes/plannodes.h | 14 +++
src/include/optimizer/appendinfo.h | 3 +
src/include/partitioning/partprune.h | 3 +-
8 files changed, 203 insertions(+), 123 deletions(-)
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 34ca6d4ac2..d1f4f606bf 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -25,6 +25,7 @@
#include "nodes/extensible.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
+#include "optimizer/appendinfo.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/optimizer.h"
@@ -1229,6 +1230,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
Oid *nodeCollations = NULL;
bool *nodeNullsFirst = NULL;
bool consider_async = false;
+ List *allpartrelids = NIL;
/*
* The subpaths list could be empty, if every child was proven empty by
@@ -1370,15 +1372,23 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
++nasyncplans;
}
+ /*
+ * Find partitioned parent rel(s) of the subpath's rel(s).
+ */
+ allpartrelids = add_append_subpath_partrelids(root, subpath, rel,
+ allpartrelids);
+
subplans = lappend(subplans, subplan);
}
+ plan->allpartrelids = allpartrelids;
+
/*
- * If any quals exist, they may be useful to perform further partition
- * pruning during execution. Gather information needed by the executor to
- * do partition pruning.
+ * If scanning partitions, check if there are quals that may be useful to
+ * perform further partition pruning during execution. Gather information
+ * needed by the executor to do partition pruning.
*/
- if (enable_partition_pruning)
+ if (enable_partition_pruning && allpartrelids != NIL)
{
List *prunequal;
@@ -1399,7 +1409,8 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
partpruneinfo =
make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ allpartrelids);
}
plan->appendplans = subplans;
@@ -1445,6 +1456,7 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
PartitionPruneInfo *partpruneinfo = NULL;
+ List *allpartrelids = NIL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1534,15 +1546,23 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplan = (Plan *) sort;
}
+ /*
+ * Find partitioned parent rel(s) of the subpath's rel(s).
+ */
+ allpartrelids = add_append_subpath_partrelids(root, subpath, rel,
+ allpartrelids);
+
subplans = lappend(subplans, subplan);
}
+ node->allpartrelids = allpartrelids;
+
/*
- * If any quals exist, they may be useful to perform further partition
- * pruning during execution. Gather information needed by the executor to
- * do partition pruning.
+ * If scanning partitions, check if there are quals that may be useful to
+ * perform further partition pruning during execution. Gather information
+ * needed by the executor to do partition pruning.
*/
- if (enable_partition_pruning)
+ if (enable_partition_pruning && allpartrelids != NIL)
{
List *prunequal;
@@ -1554,7 +1574,8 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
if (prunequal != NIL)
partpruneinfo = make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ allpartrelids);
}
node->mergeplans = subplans;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 44efb1f4eb..f97bc09113 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -7855,8 +7855,11 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
agg_costs, gd, &child_extra,
&child_partially_grouped_rel);
+ /* Mark as child of grouped_rel. */
+ child_grouped_rel->parent = grouped_rel;
if (child_partially_grouped_rel)
{
+ child_partially_grouped_rel->parent = grouped_rel;
partially_grouped_live_children =
lappend(partially_grouped_live_children,
child_partially_grouped_rel);
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 97fa561e4e..854dd7c8af 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1766,6 +1766,8 @@ set_append_references(PlannerInfo *root,
set_dummy_tlist_references((Plan *) aplan, rtoffset);
aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
+ foreach(l, aplan->allpartrelids)
+ lfirst(l) = offset_relid_set((Relids) lfirst(l), rtoffset);
if (aplan->part_prune_info)
{
@@ -1842,6 +1844,8 @@ set_mergeappend_references(PlannerInfo *root,
set_dummy_tlist_references((Plan *) mplan, rtoffset);
mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
+ foreach(l, mplan->allpartrelids)
+ lfirst(l) = offset_relid_set((Relids) lfirst(l), rtoffset);
if (mplan->part_prune_info)
{
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index f456b3b0a4..5bd8e82b9b 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -41,6 +41,7 @@ static void make_inh_translation_list(Relation oldrelation,
AppendRelInfo *appinfo);
static Node *adjust_appendrel_attrs_mutator(Node *node,
adjust_appendrel_attrs_context *context);
+static List *add_part_relids(List *allpartrelids, Bitmapset *partrelids);
/*
@@ -1035,3 +1036,136 @@ distribute_row_identity_vars(PlannerInfo *root)
}
}
}
+
+/*
+ * add_append_subpath_partrelids
+ * Look up a child subpath's rel's partitioned parent relids up to
+ * parentrel and add the bitmapset containing those into
+ * 'allpartrelids'
+ */
+List *
+add_append_subpath_partrelids(PlannerInfo *root, Path *subpath,
+ RelOptInfo *parentrel,
+ List *allpartrelids)
+{
+ RelOptInfo *prel = subpath->parent;
+ Relids partrelids = NULL;
+
+ /* Nothing to do if there's no parent to begin with. */
+ if (!IS_OTHER_REL(prel))
+ return allpartrelids;
+
+ /*
+ * Traverse up to the pathrel's topmost partitioned parent, collecting
+ * parent relids as we go; but stop if we reach parentrel. (Normally, a
+ * pathrel's topmost partitioned parent is either parentrel or a UNION ALL
+ * appendrel child of parentrel. But when handling partitionwise joins of
+ * multi-level partitioning trees, we can see an append path whose
+ * parentrel is an intermediate partitioned table.)
+ */
+ do
+ {
+ Relids parent_relids = NULL;
+
+ /*
+ * For simple child rels, we can simply set the parent_relids to
+ * prel->parent->relids. But for partitionwise join and aggregate
+ * child rels, while we can use prel->parent to move up the tree,
+ * parent_relids must be found the hard way through AppendRelInfos,
+ * because 1) a joinrel's relids may point to RTE_JOIN entries, and
+ * 2) the topmost parent grouping rel's relids field is NULL.
+ */
+ if (IS_SIMPLE_REL(prel))
+ {
+ prel = prel->parent;
+ /* Stop once we reach the root partitioned rel. */
+ if (!IS_PARTITIONED_REL(prel))
+ break;
+ parent_relids = bms_add_members(parent_relids, prel->relids);
+ }
+ else
+ {
+ AppendRelInfo **appinfos;
+ int nappinfos,
+ i;
+
+ appinfos = find_appinfos_by_relids(root, prel->relids,
+ &nappinfos);
+ for (i = 0; i < nappinfos; i++)
+ {
+ AppendRelInfo *appinfo = appinfos[i];
+
+ parent_relids = bms_add_member(parent_relids,
+ appinfo->parent_relid);
+ }
+ pfree(appinfos);
+ prel = prel->parent;
+ }
+ /* accept this level as an interesting parent */
+ partrelids = bms_add_members(partrelids, parent_relids);
+ if (prel == parentrel)
+ break; /* don't traverse above parentrel */
+ } while (IS_OTHER_REL(prel));
+
+ if (partrelids == NULL)
+ return allpartrelids;
+
+ return add_part_relids(allpartrelids, partrelids);
+}
+
+/*
+ * add_part_relids
+ * Add new info to a list of Bitmapsets of partitioned relids.
+ *
+ * Within 'allpartrelids', there is one Bitmapset for each topmost parent
+ * partitioned rel. Each Bitmapset contains the RT indexes of the topmost
+ * parent as well as its relevant non-leaf child partitions. Since (by
+ * construction of the rangetable list) parent partitions must have lower
+ * RT indexes than their children, we can distinguish the topmost parent
+ * as being the lowest set bit in the Bitmapset.
+ *
+ * 'partrelids' contains the RT indexes of a parent partitioned rel, and
+ * possibly some non-leaf children, that are newly identified as parents of
+ * some subpath rel passed to make_partition_pruneinfo(). These are added
+ * to an appropriate member of 'allpartrelids'.
+ *
+ * Note that the list contains only RT indexes of partitioned tables that
+ * are parents of some scan-level relation appearing in the 'subpaths' that
+ * make_partition_pruneinfo() is dealing with. Also, "topmost" parents are
+ * not allowed to be higher than the 'parentrel' associated with the append
+ * path. In this way, we avoid expending cycles on partitioned rels that
+ * can't contribute useful pruning information for the problem at hand.
+ * (It is possible for 'parentrel' to be a child partitioned table, and it
+ * is also possible for scan-level relations to be child partitioned tables
+ * rather than leaf partitions. Hence we must construct this relation set
+ * with reference to the particular append path we're dealing with, rather
+ * than looking at the full partitioning structure represented in the
+ * RelOptInfos.)
+ */
+static List *
+add_part_relids(List *allpartrelids, Bitmapset *partrelids)
+{
+ Index targetpart;
+ ListCell *lc;
+
+ /* We can easily get the lowest set bit this way: */
+ targetpart = bms_next_member(partrelids, -1);
+ Assert(targetpart > 0);
+
+ /* Look for a matching topmost parent */
+ foreach(lc, allpartrelids)
+ {
+ Bitmapset *currpartrelids = (Bitmapset *) lfirst(lc);
+ Index currtarget = bms_next_member(currpartrelids, -1);
+
+ if (targetpart == currtarget)
+ {
+ /* Found a match, so add any new RT indexes to this hierarchy */
+ currpartrelids = bms_add_members(currpartrelids, partrelids);
+ lfirst(lc) = currpartrelids;
+ return allpartrelids;
+ }
+ }
+ /* No match, so add the new partition hierarchy to the list */
+ return lappend(allpartrelids, partrelids);
+}
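As a worked example of the invariant this function maintains: suppose partition tree p is at RT index 1 with a sub-partitioned child at RT index 3, and a second tree q is at RT index 2 (the indexes are made up):

	List	   *all = NIL;

	all = add_part_relids(all, bms_add_member(bms_make_singleton(1), 3));
	/* all = [ {1,3} ] */
	all = add_part_relids(all, bms_make_singleton(2));
	/* all = [ {1,3}, {2} ]; new topmost parent, so a new hierarchy */
	all = add_part_relids(all, bms_make_singleton(1));
	/* all = [ {1,3}, {2} ]; lowest member 1 matches the first set */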
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 7179b22a05..213512a5f4 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -138,7 +138,6 @@ typedef struct PruneStepResult
} PruneStepResult;
-static List *add_part_relids(List *allpartrelids, Bitmapset *partrelids);
static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
RelOptInfo *parentrel,
List *prunequal,
@@ -218,33 +217,32 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
+ * 'allpartrelids' contains Bitmapsets of RT indexes of partitioned parents
+ * whose partitions' Paths are in 'subpaths'; there's one Bitmapset for every
+ * partition tree involved.
*/
PartitionPruneInfo *
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
- List *prunequal)
+ List *prunequal,
+ List *allpartrelids)
{
PartitionPruneInfo *pruneinfo;
Bitmapset *allmatchedsubplans = NULL;
- List *allpartrelids;
List *prunerelinfos;
int *relid_subplan_map;
ListCell *lc;
int i;
+ Assert(list_length(allpartrelids) > 0);
+
/*
- * Scan the subpaths to see which ones are scans of partition child
- * relations, and identify their parent partitioned rels. (Note: we must
- * restrict the parent partitioned rels to be parentrel or children of
- * parentrel, otherwise we couldn't translate prunequal to match.)
- *
- * Also construct a temporary array to map from partition-child-relation
- * relid to the index in 'subpaths' of the scan plan for that partition.
+ * Construct a temporary array to map from partition-child-relation relid
+ * to the index in 'subpaths' of the scan plan for that partition.
* (Use of "subplan" rather than "subpath" is a bit of a misnomer, but
* we'll let it stand.) For convenience, we use 1-based indexes here, so
* that zero can represent an un-filled array entry.
*/
- allpartrelids = NIL;
relid_subplan_map = palloc0(sizeof(int) * root->simple_rel_array_size);
i = 1;
@@ -253,50 +251,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Path *path = (Path *) lfirst(lc);
RelOptInfo *pathrel = path->parent;
- /* We don't consider partitioned joins here */
- if (pathrel->reloptkind == RELOPT_OTHER_MEMBER_REL)
- {
- RelOptInfo *prel = pathrel;
- Bitmapset *partrelids = NULL;
-
- /*
- * Traverse up to the pathrel's topmost partitioned parent,
- * collecting parent relids as we go; but stop if we reach
- * parentrel. (Normally, a pathrel's topmost partitioned parent
- * is either parentrel or a UNION ALL appendrel child of
- * parentrel. But when handling partitionwise joins of
- * multi-level partitioning trees, we can see an append path whose
- * parentrel is an intermediate partitioned table.)
- */
- do
- {
- AppendRelInfo *appinfo;
-
- Assert(prel->relid < root->simple_rel_array_size);
- appinfo = root->append_rel_array[prel->relid];
- prel = find_base_rel(root, appinfo->parent_relid);
- if (!IS_PARTITIONED_REL(prel))
- break; /* reached a non-partitioned parent */
- /* accept this level as an interesting parent */
- partrelids = bms_add_member(partrelids, prel->relid);
- if (prel == parentrel)
- break; /* don't traverse above parentrel */
- } while (prel->reloptkind == RELOPT_OTHER_MEMBER_REL);
-
- if (partrelids)
- {
- /*
- * Found some relevant parent partitions, which may or may not
- * overlap with partition trees we already found. Add new
- * information to the allpartrelids list.
- */
- allpartrelids = add_part_relids(allpartrelids, partrelids);
- /* Also record the subplan in relid_subplan_map[] */
- /* No duplicates please */
- Assert(relid_subplan_map[pathrel->relid] == 0);
- relid_subplan_map[pathrel->relid] = i;
- }
- }
+ /* No duplicates please */
+ Assert(relid_subplan_map[pathrel->relid] == 0);
+ relid_subplan_map[pathrel->relid] = i;
i++;
}
@@ -362,63 +319,6 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
return pruneinfo;
}
-/*
- * add_part_relids
- * Add new info to a list of Bitmapsets of partitioned relids.
- *
- * Within 'allpartrelids', there is one Bitmapset for each topmost parent
- * partitioned rel. Each Bitmapset contains the RT indexes of the topmost
- * parent as well as its relevant non-leaf child partitions. Since (by
- * construction of the rangetable list) parent partitions must have lower
- * RT indexes than their children, we can distinguish the topmost parent
- * as being the lowest set bit in the Bitmapset.
- *
- * 'partrelids' contains the RT indexes of a parent partitioned rel, and
- * possibly some non-leaf children, that are newly identified as parents of
- * some subpath rel passed to make_partition_pruneinfo(). These are added
- * to an appropriate member of 'allpartrelids'.
- *
- * Note that the list contains only RT indexes of partitioned tables that
- * are parents of some scan-level relation appearing in the 'subpaths' that
- * make_partition_pruneinfo() is dealing with. Also, "topmost" parents are
- * not allowed to be higher than the 'parentrel' associated with the append
- * path. In this way, we avoid expending cycles on partitioned rels that
- * can't contribute useful pruning information for the problem at hand.
- * (It is possible for 'parentrel' to be a child partitioned table, and it
- * is also possible for scan-level relations to be child partitioned tables
- * rather than leaf partitions. Hence we must construct this relation set
- * with reference to the particular append path we're dealing with, rather
- * than looking at the full partitioning structure represented in the
- * RelOptInfos.)
- */
-static List *
-add_part_relids(List *allpartrelids, Bitmapset *partrelids)
-{
- Index targetpart;
- ListCell *lc;
-
- /* We can easily get the lowest set bit this way: */
- targetpart = bms_next_member(partrelids, -1);
- Assert(targetpart > 0);
-
- /* Look for a matching topmost parent */
- foreach(lc, allpartrelids)
- {
- Bitmapset *currpartrelids = (Bitmapset *) lfirst(lc);
- Index currtarget = bms_next_member(currpartrelids, -1);
-
- if (targetpart == currtarget)
- {
- /* Found a match, so add any new RT indexes to this hierarchy */
- currpartrelids = bms_add_members(currpartrelids, partrelids);
- lfirst(lc) = currpartrelids;
- return allpartrelids;
- }
- }
- /* No match, so add the new partition hierarchy to the list */
- return lappend(allpartrelids, partrelids);
-}
-
/*
* make_partitionedrel_pruneinfo
* Build a List of PartitionedRelPruneInfos, one for each interesting
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 1b787fe031..7a5f3ba625 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -267,6 +267,13 @@ typedef struct Append
List *appendplans;
int nasyncplans; /* # of asynchronous plans */
+ /*
+ * RTIs of all partitioned tables whose children are scanned by
+ * appendplans. The list contains a bitmapset for every partition tree
+ * covered by this Append.
+ */
+ List *allpartrelids;
+
/*
* All 'appendplans' preceding this index are non-partial plans. All
* 'appendplans' from this index onwards are partial plans.
@@ -291,6 +298,13 @@ typedef struct MergeAppend
List *mergeplans;
+ /*
+ * RTIs of all partitioned tables whose children are scanned by
+ * mergeplans. The list contains a bitmapset for every partition tree
+ * covered by this MergeAppend.
+ */
+ List *allpartrelids;
+
/* these fields are just like the sort-key info in struct Sort: */
/* number of sort-key columns */
diff --git a/src/include/optimizer/appendinfo.h b/src/include/optimizer/appendinfo.h
index a05f91f77d..1621a7319a 100644
--- a/src/include/optimizer/appendinfo.h
+++ b/src/include/optimizer/appendinfo.h
@@ -46,5 +46,8 @@ extern void add_row_identity_columns(PlannerInfo *root, Index rtindex,
RangeTblEntry *target_rte,
Relation target_relation);
extern void distribute_row_identity_vars(PlannerInfo *root);
+extern List *add_append_subpath_partrelids(PlannerInfo *root, Path *subpath,
+ RelOptInfo *parentrel,
+ List *allpartrelids);
#endif /* APPENDINFO_H */
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 8636e04e37..caa774a111 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -73,7 +73,8 @@ typedef struct PartitionPruneContext
extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
struct RelOptInfo *parentrel,
List *subpaths,
- List *prunequal);
+ List *prunequal,
+ List *allpartrelids);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
--
2.35.3
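
To make the "one Bitmapset per partition tree" arrangement described in the
comments above a bit more concrete, here is a minimal sketch; the table names
are made up for illustration and are not taken from the patch or its tests.
A UNION ALL over two separately partitioned tables yields a single Append
whose allpartrelids should carry two bitmapsets, one per partition tree:

  CREATE TABLE pt1 (a int) PARTITION BY LIST (a);
  CREATE TABLE pt1_1 PARTITION OF pt1 FOR VALUES IN (1);
  CREATE TABLE pt2 (a int) PARTITION BY LIST (a);
  CREATE TABLE pt2_1 PARTITION OF pt2 FOR VALUES IN (1);

  -- The Append for this query scans leaf partitions of both pt1 and
  -- pt2, so allpartrelids should contain one bitmapset holding pt1's
  -- RT index and another holding pt2's.
  EXPLAIN (COSTS OFF)
  SELECT a FROM pt1 WHERE a = 1
  UNION ALL
  SELECT a FROM pt2 WHERE a = 1;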
Attachment: v46-0008-Track-opened-range-table-relations-in-a-List-in-.patch (application/octet-stream)
From 5da59305b0000098cabf508f7c0e4a4a74a0c11a Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Tue, 4 Jul 2023 22:36:49 +0900
Subject: [PATCH v46 8/8] Track opened range table relations in a List in
EState
This makes ExecCloseRangeTableRelations faster when there are many
relations in the range table but only a few are opened during
execution, such as when run-time pruning kicks in on an Append
containing thousands of partition subplans.
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
src/backend/executor/execMain.c | 9 +++++----
src/backend/executor/execUtils.c | 2 ++
src/include/nodes/execnodes.h | 2 ++
3 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 09a104f0a3..6a010b74df 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1650,12 +1650,13 @@ ExecCloseResultRelations(EState *estate)
void
ExecCloseRangeTableRelations(EState *estate)
{
- int i;
+ ListCell *lc;
- for (i = 0; i < estate->es_range_table_size; i++)
+ foreach(lc, estate->es_opened_relations)
{
- if (estate->es_relations[i])
- table_close(estate->es_relations[i], NoLock);
+ Relation rel = lfirst(lc);
+
+ table_close(rel, NoLock);
}
}
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 94c8e5e875..3d1d467807 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -812,6 +812,8 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
}
estate->es_relations[rti - 1] = rel;
+ estate->es_opened_relations = lappend(estate->es_opened_relations,
+ rel);
}
return rel;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 0922be6678..fba1527792 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -619,6 +619,8 @@ typedef struct EState
Index es_range_table_size; /* size of the range table arrays */
Relation *es_relations; /* Array of per-range-table-entry Relation
* pointers, or NULL if not yet opened */
+ List *es_opened_relations; /* List of non-NULL entries in
+ * es_relations in no specific order */
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
--
2.35.3
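
As a rough way to observe the case this patch targets (the object names below
are made up), create a partitioned table with many partitions and execute a
generic plan that prunes down to one of them; es_opened_relations then holds
a single entry even though es_range_table_size may be in the thousands, so
ExecCloseRangeTableRelations() no longer has to scan the whole array:

  SET plan_cache_mode = force_generic_plan;
  CREATE TABLE t (a int) PARTITION BY RANGE (a);
  -- imagine a few thousand of these; two shown for brevity
  CREATE TABLE t_1 PARTITION OF t FOR VALUES FROM (0) TO (1000);
  CREATE TABLE t_2 PARTITION OF t FOR VALUES FROM (1000) TO (2000);
  PREPARE q (int) AS SELECT * FROM t WHERE a = $1;
  EXECUTE q (5);  -- run-time pruning opens only t_1, so it is the only
                  -- relation that needs closing at executor shutdown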
Attachment: v46-0007-Delay-locking-of-child-tables-in-cached-plans-un.patch (application/octet-stream)
From 8c561798798243d972ae50b3c46712c4c077876c Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Tue, 4 Jul 2023 22:36:45 +0900
Subject: [PATCH v46 7/8] Delay locking of child tables in cached plans until
ExecutorStart()
Currently, GetCachedPlan() takes a lock on all relations contained in
a cached plan before returning it as a valid plan to its callers for
execution. One disadvantage is that if the plan contains partitions
that are prunable with conditions involving EXTERN parameters and
other stable expressions (known as "initial pruning"), many of them
are locked unnecessarily, because only the partitions that survive
initial pruning actually need to be locked. Locking all partitions
this way causes significant delay when there are many of them. Note
that initial pruning occurs during the executor's initialization of
the plan, that is, in ExecInitNode().
This commit moves the locking of child tables referenced in a cached
plan into ExecInitNode(), so that initial pruning performed in the
ExecInitNode() subroutines of plan nodes that support it can eliminate
any child tables that need not be scanned, and therefore need not be
locked.
To determine whether a given table is a child table,
ExecGetRangeTableRelation() now looks at the RTE's inFromCl field,
which is true only for tables directly mentioned in the query and
false for child tables. Any tables whose RTEs have inFromCl set will
already have been locked by GetCachedPlan(), so they need not be
locked again during execution.
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
src/backend/commands/copyto.c | 3 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 8 +-
src/backend/commands/extension.c | 1 +
src/backend/commands/matview.c | 2 +-
src/backend/commands/prepare.c | 2 +-
src/backend/executor/README | 39 ++++-
src/backend/executor/execMain.c | 20 ++-
src/backend/executor/execParallel.c | 9 +-
src/backend/executor/execPartition.c | 10 ++
src/backend/executor/execUtils.c | 61 +++++--
src/backend/executor/functions.c | 1 +
src/backend/executor/nodeAppend.c | 19 +++
src/backend/executor/nodeMergeAppend.c | 19 +++
src/backend/executor/spi.c | 1 +
src/backend/storage/lmgr/lmgr.c | 45 +++++
src/backend/tcop/pquery.c | 7 +-
src/backend/utils/cache/lsyscache.c | 21 +++
src/backend/utils/cache/plancache.c | 157 +++++++-----------
src/include/commands/explain.h | 2 +-
src/include/executor/execdesc.h | 4 +
src/include/executor/executor.h | 1 +
src/include/storage/lmgr.h | 1 +
src/include/utils/lsyscache.h | 1 +
src/test/modules/delay_execution/Makefile | 3 +-
.../modules/delay_execution/delay_execution.c | 67 +++++++-
.../expected/cached-plan-replan.out | 156 +++++++++++++++++
.../specs/cached-plan-replan.spec | 61 +++++++
28 files changed, 592 insertions(+), 131 deletions(-)
create mode 100644 src/test/modules/delay_execution/expected/cached-plan-replan.out
create mode 100644 src/test/modules/delay_execution/specs/cached-plan-replan.spec
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index a45489f8f5..ab8bf0df72 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -558,7 +558,8 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL,
+ pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 167db4cf56..e5cce4c07c 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index fe9314bc96..6171a20fe2 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -416,7 +416,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
}
- queryDesc = ExplainQueryDesc(plan, queryString, into, es,
+ queryDesc = ExplainQueryDesc(plan, NULL, queryString, into, es,
params, queryEnv);
Assert(queryDesc);
@@ -429,9 +429,11 @@ ExplainOneQuery(Query *query, int cursorOptions,
/*
* ExplainQueryDesc
* Set up QueryDesc for EXPLAINing a given plan
+ *
+ * This returns NULL if cplan is found to be no longer valid.
*/
QueryDesc *
-ExplainQueryDesc(PlannedStmt *stmt,
+ExplainQueryDesc(PlannedStmt *stmt, CachedPlan *cplan,
const char *queryString, IntoClause *into, ExplainState *es,
ParamListInfo params, QueryEnvironment *queryEnv)
{
@@ -467,7 +469,7 @@ ExplainQueryDesc(PlannedStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(stmt, queryString,
+ queryDesc = CreateQueryDesc(stmt, cplan, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index b702a65e81..93a683e312 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -797,6 +797,7 @@ execute_sql_string(const char *sql)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 7124994a43..38795ce7ca 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -408,7 +408,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 699df429c4..156c3c5fee 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -650,7 +650,7 @@ replan:
{
QueryDesc *queryDesc;
- queryDesc = ExplainQueryDesc(pstmt, queryString,
+ queryDesc = ExplainQueryDesc(pstmt, cplan, queryString,
into, es, paramLI, queryEnv);
if (queryDesc == NULL)
{
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 17775a49e2..0a7bb42ccb 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -280,6 +280,37 @@ are typically reset to empty once per tuple. Per-tuple contexts are usually
associated with ExprContexts, and commonly each PlanState node has its own
ExprContext to evaluate its qual and targetlist expressions in.
+Relation Locking
+----------------
+
+Normally, the executor does not lock non-index relations appearing in a given
+plan tree when initializing it for execution if the plan tree is freshly
+created, that is, not derived from a CachedPlan; in that case, the necessary
+locks were already taken during parsing, rewriting, and planning of the
+query. If the plan tree comes from a CachedPlan, however, it may still
+contain unlocked relations, because GetCachedPlan() locks only the relations
+present in the query's range table before planning, not those added to the
+range table during planning. This means that inheritance child tables
+present in a cached plan, which are added to the query's range table during
+planning, are not yet locked when the plan enters the executor.
+
+GetCachedPlan() punts on locking child tables because not all of them may
+actually be scanned during a given execution of the plan; child tables that
+are partitions may get pruned away by the pruning that is performed when
+execution of the plan is initialized. So the locking of child tables is
+deferred until execution initialization, which occurs during ExecInitNode()
+on the plan nodes that contain the child tables.
+
+This leaves a window during which a cached plan tree containing child tables
+could go stale, because other backends could alter those tables before
+ExecInitNode() locks them. The executor must therefore recheck the validity
+of the plan tree each time it takes a lock on a child table after initial
+pruning has been performed. It does so by looking at the is_valid flag of
+the CachedPlan passed to it. If the plan tree is indeed stale
+(is_valid=false), the executor must give up initializing it any further and
+return to the caller, letting it know that execution must be retried with a
+new plan tree.
Query Processing Control Flow
-----------------------------
@@ -316,7 +347,13 @@ This is a sketch of control flow for full query processing:
FreeQueryDesc
-Per above comments, it's not really critical for ExecEndNode to free any
+As mentioned in the "Relation Locking" section, if the plan tree is found to
+be stale during one of the recursive calls of ExecInitNode() after taking a
+lock on a child table, control is immediately returned to the caller of
+ExecutorStart(), which must redo the steps from CreateQueryDesc with a new
+plan tree.
+
+Per above comments, it's not really critical for ExecEndPlan to free any
memory; it'll all go away in FreeExecutorState anyway. However, we do need to
be careful to close relations, drop buffer pins, etc, so we do need to scan
the plan state tree to find these sorts of resources.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 88ebfb218b..09a104f0a3 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -642,6 +642,17 @@ ExecCheckPermissions(List *rangeTable, List *rteperminfos,
RTEPermissionInfo *perminfo = lfirst_node(RTEPermissionInfo, l);
Assert(OidIsValid(perminfo->relid));
+
+ /*
+ * Relations whose permissions need to be checked must already have
+ * been locked by the parser or by GetCachedPlan() if a cached plan is
+ * being executed.
+ *
+ * XXX shouldn't we skip calling ExecCheckPermissions from InitPlan
+ * in a parallel worker?
+ */
+ Assert(CheckRelLockedByMe(perminfo->relid, AccessShareLock, true) ||
+ IsParallelWorker());
result = ExecCheckOneRelPerms(perminfo);
if (!result)
{
@@ -875,12 +886,12 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
/*
- * initialize the node's execution state
+ * Set up range table in EState.
*/
ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
estate->es_plannedstmt = plannedstmt;
- estate->es_cachedplan = NULL;
+ estate->es_cachedplan = queryDesc->cplan;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
@@ -1465,7 +1476,7 @@ ExecGetAncestorResultRels(EState *estate, ResultRelInfo *resultRelInfo)
/*
* All ancestors up to the root target relation must have been
- * locked by the planner or AcquireExecutorLocks().
+ * locked by the planner or ExecLockAppendNonLeafRelations().
*/
ancRel = table_open(ancOid, NoLock);
rInfo = makeNode(ResultRelInfo);
@@ -2897,7 +2908,8 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
* Child EPQ EStates share the parent's copy of unchanging state such as
* the snapshot, rangetable, and external Param info. They need their own
* copies of local state, including a tuple table, es_param_exec_vals,
- * result-rel info, etc.
+ * result-rel info, etc. Also, we don't pass the parent's copy of the
+ * CachedPlan, because no new locks will be taken for EvalPlanQual().
*/
rcestate->es_direction = ForwardScanDirection;
rcestate->es_snapshot = parentestate->es_snapshot;
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index f84a3a17d5..209f618a07 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1248,8 +1248,15 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
- /* Create a QueryDesc for the query. */
+ /*
+ * Set up a QueryDesc for the query. While the leader might have sourced
+ * the plan tree from a CachedPlan, we don't have one here. That is not
+ * a problem, because the leader already holds the locks needed to keep
+ * the plan tree valid. Even though we take our own locks in
+ * ExecGetRangeTableRelation(), they are all already held by the leader.
+ */
return CreateQueryDesc(pstmt,
+ NULL,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index e88455368c..cf73d28baa 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -513,6 +513,13 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
oldcxt = MemoryContextSwitchTo(proute->memcxt);
+ /*
+ * Note that while we normally check ExecPlanStillValid(estate) after each
+ * lock taken during execution initialization, it is fine not to do so
+ * for partitions opened here for tuple routing. Locks taken here can't
+ * possibly invalidate the plan given that the plan doesn't contain any
+ * info about those partitions.
+ */
partrel = table_open(partOid, RowExclusiveLock);
leaf_part_rri = makeNode(ResultRelInfo);
@@ -1111,6 +1118,9 @@ ExecInitPartitionDispatchInfo(EState *estate,
* Only sub-partitioned tables need to be locked here. The root
* partitioned table will already have been locked as it's referenced in
* the query's rtable.
+ *
+ * See the comment in ExecInitPartitionInfo() about taking locks and
+ * not checking ExecPlanStillValid(estate) here.
*/
if (partoid != RelationGetRelid(proute->partition_root))
rel = table_open(partoid, RowExclusiveLock);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index da8a1511ac..94c8e5e875 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -779,7 +779,25 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rte->rtekind == RTE_RELATION);
- if (!IsParallelWorker())
+ if (IsParallelWorker() ||
+ (estate->es_cachedplan != NULL && !rte->inFromCl))
+ {
+ /*
+ * Take a lock if we are a parallel worker or if this is a child
+ * table referenced in a cached plan.
+ *
+ * Parallel workers need to have their own local lock on the
+ * relation. This ensures sane behavior in case the parent process
+ * exits before we do.
+ *
+ * When executing a cached plan, child tables must be locked
+ * here, because plancache.c (GetCachedPlan()) would only have
+ * locked tables mentioned in the query, that is, tables whose
+ * RTEs' inFromCl is true.
+ */
+ rel = table_open(rte->relid, rte->rellockmode);
+ }
+ else
{
/*
* In a normal query, we should already have the appropriate lock,
@@ -792,15 +810,6 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rte->rellockmode == AccessShareLock ||
CheckRelationLockedByMe(rel, rte->rellockmode, false));
}
- else
- {
- /*
- * If we are a parallel worker, we need to obtain our own local
- * lock on the relation. This ensures sane behavior in case the
- * parent process exits before we do.
- */
- rel = table_open(rte->relid, rte->rellockmode);
- }
estate->es_relations[rti - 1] = rel;
}
@@ -808,6 +817,38 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
return rel;
}
+/*
+ * ExecLockAppendNonLeafRelations
+ * Lock non-leaf relations whose children are scanned by a given
+ * Append/MergeAppend node
+ */
+void
+ExecLockAppendNonLeafRelations(EState *estate, List *allpartrelids)
+{
+ ListCell *l;
+
+ /* This should get called only when executing cached plans. */
+ Assert(estate->es_cachedplan != NULL);
+ foreach(l, allpartrelids)
+ {
+ Bitmapset *partrelids = lfirst_node(Bitmapset, l);
+ int i;
+
+ /*
+ * Skip the first member of each bitmapset, because it stands for
+ * the root parent mentioned in the query, which should always have
+ * been locked before entering the executor.
+ */
+ i = bms_next_member(partrelids, -1);
+ while ((i = bms_next_member(partrelids, i)) > 0)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(i, estate);
+
+ LockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
+}
+
/*
* ExecInitResultRelation
* Open relation given by the passed-in RT index and fill its
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 8cf0b3132d..4ddf4fd7a9 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -838,6 +838,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest = None_Receiver;
es->qd = CreateQueryDesc(es->stmt,
+ NULL, /* fmgr_sql() doesn't use CachedPlans */
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 588f5388c7..20330c5c58 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -133,6 +133,25 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_syncdone = false;
appendstate->as_begun = false;
+ /*
+ * Lock non-leaf partitions whose leaf children are present in
+ * node->appendplans. Only need to do so if executing a cached
+ * plan, because child tables present in cached plans are not
+ * locked before execution.
+ *
+ * XXX - some of the non-leaf partitions may also be mentioned in
+ * part_prune_info; if so, they would get locked again in
+ * ExecInitPartitionPruning() because it calls
+ * ExecGetRangeTableRelation() which locks child tables.
+ */
+ if (estate->es_cachedplan)
+ {
+ ExecLockAppendNonLeafRelations(estate, node->allpartrelids);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
+ }
+
/* If run-time partition pruning is enabled, then set that up now */
if (node->part_prune_info != NULL)
{
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index c9d406c230..a8f9157192 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -81,6 +81,25 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.state = estate;
mergestate->ps.ExecProcNode = ExecMergeAppend;
+ /*
+ * Lock non-leaf partitions whose leaf children are present in
+ * node->mergeplans. Only need to do so if executing a cached
+ * plan, because child tables present in cached plans are not
+ * locked before execution.
+ *
+ * XXX - some of the non-leaf partitions may also be mentioned in
+ * part_prune_info; if so, they would get locked again in
+ * ExecInitPartitionPruning() because it calls
+ * ExecGetRangeTableRelation() which locks child tables.
+ */
+ if (estate->es_cachedplan)
+ {
+ ExecLockAppendNonLeafRelations(estate, node->allpartrelids);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
+ }
+
/* If run-time partition pruning is enabled, then set that up now */
if (node->part_prune_info != NULL)
{
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 6a96d7fc22..9c4ed74240 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -2680,6 +2680,7 @@ replan:
snap = InvalidSnapshot;
qdesc = CreateQueryDesc(stmt,
+ cplan,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index ee9b89a672..c807e9cdcc 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -27,6 +27,7 @@
#include "storage/procarray.h"
#include "storage/sinvaladt.h"
#include "utils/inval.h"
+#include "utils/lsyscache.h"
/*
@@ -364,6 +365,50 @@ CheckRelationLockedByMe(Relation relation, LOCKMODE lockmode, bool orstronger)
return false;
}
+/*
+ * CheckRelLockedByMe
+ *
+ * Returns true if current transaction holds a lock on the given relation of
+ * mode 'lockmode'. If 'orstronger' is true, a stronger lockmode is also OK.
+ * ("Stronger" is defined as "numerically higher", which is a bit
+ * semantically dubious but is OK for the purposes we use this for.)
+ */
+bool
+CheckRelLockedByMe(Oid relid, LOCKMODE lockmode, bool orstronger)
+{
+ Oid dbId = get_rel_relisshared(relid) ? InvalidOid : MyDatabaseId;
+ LOCKTAG tag;
+
+ SET_LOCKTAG_RELATION(tag, dbId, relid);
+
+ if (LockHeldByMe(&tag, lockmode))
+ return true;
+
+ if (orstronger)
+ {
+ LOCKMODE slockmode;
+
+ for (slockmode = lockmode + 1;
+ slockmode <= MaxLockMode;
+ slockmode++)
+ {
+ if (LockHeldByMe(&tag, slockmode))
+ {
+#ifdef NOT_USED
+ /* Sometimes this might be useful for debugging purposes */
+ elog(WARNING, "lock mode %s substituted for %s on relation %s",
+ GetLockmodeName(tag.locktag_lockmethodid, slockmode),
+ GetLockmodeName(tag.locktag_lockmethodid, lockmode),
+ get_rel_name(relid));
+#endif
+ return true;
+ }
+ }
+ }
+
+ return false;
+}
+
/*
* LockHasWaitersRelation
*
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 9a96b77f1e..48cd6f4304 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -60,6 +60,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -72,6 +73,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->cplan = cplan; /* CachedPlan, if plan is from one */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -410,6 +412,7 @@ PortalStart(Portal portal, ParamListInfo params,
* set the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->cplan,
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -440,6 +443,7 @@ PortalStart(Portal portal, ParamListInfo params,
*/
if (!ExecutorStart(queryDesc, myeflags))
{
+ Assert(queryDesc->cplan);
ExecutorEnd(queryDesc);
FreeQueryDesc(queryDesc);
PopActiveSnapshot();
@@ -538,7 +542,7 @@ PortalStart(Portal portal, ParamListInfo params,
* Create the QueryDesc. DestReceiver will be set in
* PortalRunMulti() before calling ExecutorRun().
*/
- queryDesc = CreateQueryDesc(plan,
+ queryDesc = CreateQueryDesc(plan, portal->cplan,
portal->sourceText,
!is_utility ?
GetActiveSnapshot() :
@@ -562,6 +566,7 @@ PortalStart(Portal portal, ParamListInfo params,
if (!ExecutorStart(queryDesc, myeflags))
{
PopActiveSnapshot();
+ Assert(queryDesc->cplan);
ExecutorEnd(queryDesc);
FreeQueryDesc(queryDesc);
plan_valid = false;
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index fc6d267e44..2725d02312 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -2095,6 +2095,27 @@ get_rel_persistence(Oid relid)
return result;
}
+/*
+ * get_rel_relisshared
+ *
+ * Returns whether the given relation is shared
+ */
+bool
+get_rel_relisshared(Oid relid)
+{
+ HeapTuple tp;
+ Form_pg_class reltup;
+ bool result;
+
+ tp = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+ if (!HeapTupleIsValid(tp))
+ elog(ERROR, "cache lookup failed for relation %u", relid);
+ reltup = (Form_pg_class) GETSTRUCT(tp);
+ result = reltup->relisshared;
+ ReleaseSysCache(tp);
+
+ return result;
+}
/* ---------- TRANSFORM CACHE ---------- */
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 7d4168f82f..39fb0878fe 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -104,13 +104,13 @@ static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool GenericPlanIsValid(CachedPlan *cplan);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -792,8 +792,15 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
*
- * On a "true" return, we have acquired the locks needed to run the plan.
- * (We must do this for the "true" result to be race-condition-free.)
+ * If the plan contains any child relations that would have been added by the
+ * planner, they would not have been locked yet, because AcquirePlannerLocks()
+ * only locks relations that would be present in the original query's range
+ * table (that is, before entering the planner). So, the plan could go stale
+ * before it reaches execution if any of those child relations get modified
+ * concurrently. The executor must check that the plan (CachedPlan) is still
+ * valid after taking a lock on each of the child tables during the plan
+ * initialization phase, and if it is not, ask the caller to recreate the
+ * plan.
*/
static bool
CheckCachedPlan(CachedPlanSource *plansource)
@@ -807,60 +814,56 @@ CheckCachedPlan(CachedPlanSource *plansource)
if (!plan)
return false;
- Assert(plan->magic == CACHEDPLAN_MAGIC);
- /* Generic plans are never one-shot */
- Assert(!plan->is_oneshot);
+ if (GenericPlanIsValid(plan))
+ return true;
/*
- * If plan isn't valid for current role, we can't use it.
+ * Plan has been invalidated, so unlink it from the parent and release it.
*/
- if (plan->is_valid && plan->dependsOnRole &&
- plan->planRoleId != GetUserId())
- plan->is_valid = false;
+ ReleaseGenericPlan(plansource);
- /*
- * If it appears valid, acquire locks and recheck; this is much the same
- * logic as in RevalidateCachedQuery, but for a plan.
- */
- if (plan->is_valid)
+ return false;
+}
+
+/*
+ * GenericPlanIsValid
+ * Is a generic plan still valid?
+ *
+ * It may have gone stale due to concurrent schema modifications of relations
+ * mentioned in the plan, or for the other reasons checked below.
+ */
+static bool
+GenericPlanIsValid(CachedPlan *cplan)
+{
+ Assert(cplan != NULL);
+ Assert(cplan->magic == CACHEDPLAN_MAGIC);
+ /* Generic plans are never one-shot */
+ Assert(!cplan->is_oneshot);
+
+ if (cplan->is_valid)
{
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
- Assert(plan->refcount > 0);
-
- AcquireExecutorLocks(plan->stmt_list, true);
+ Assert(cplan->refcount > 0);
/*
- * If plan was transient, check to see if TransactionXmin has
- * advanced, and if so invalidate it.
+ * If plan isn't valid for current role, we can't use it.
*/
- if (plan->is_valid &&
- TransactionIdIsValid(plan->saved_xmin) &&
- !TransactionIdEquals(plan->saved_xmin, TransactionXmin))
- plan->is_valid = false;
+ if (cplan->dependsOnRole && cplan->planRoleId != GetUserId())
+ cplan->is_valid = false;
/*
- * By now, if any invalidation has happened, the inval callback
- * functions will have marked the plan invalid.
+ * If plan was transient, check to see if TransactionXmin has
+ * advanced, and if so invalidate it.
*/
- if (plan->is_valid)
- {
- /* Successfully revalidated and locked the query. */
- return true;
- }
-
- /* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ if (TransactionIdIsValid(cplan->saved_xmin) &&
+ !TransactionIdEquals(cplan->saved_xmin, TransactionXmin))
+ cplan->is_valid = false;
}
- /*
- * Plan has been invalidated, so unlink it from the parent and release it.
- */
- ReleaseGenericPlan(plansource);
-
- return false;
+ return cplan->is_valid;
}
/*
@@ -1129,9 +1132,16 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* This function hides the logic that decides whether to use a generic
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
- *
- * On return, the plan is valid and we have sufficient locks to begin
- * execution.
+ *
+ * Upon return, the plan is generally valid. However, if it includes
+ * inheritance/partition child tables, they will not have been locked, since
+ * only tables mentioned in the original query are locked here. The executor
+ * locks these child tables when setting up the plan tree. If the plan
+ * turns out to have been invalidated after taking those locks, the
+ * executor asks the calling module to fetch a new plan by calling this
+ * function again. We defer child table locking to the executor like this
+ * because not all child tables may need to be locked; some may be pruned
+ * during executor plan initialization if the plan nodes under which they
+ * are scanned support partition pruning.
*
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
@@ -1166,7 +1176,10 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
{
if (CheckCachedPlan(plansource))
{
- /* We want a generic plan, and we already have a valid one */
+ /*
+ * We want a generic plan, and we already have a valid one, though
+ * see the header comment.
+ */
plan = plansource->gplan;
Assert(plan->magic == CACHEDPLAN_MAGIC);
}
@@ -1364,8 +1377,8 @@ CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
}
/*
- * Reject if AcquireExecutorLocks would have anything to do. This is
- * probably unnecessary given the previous check, but let's be safe.
+ * Reject if the executor would need to take additional locks, that is, in
+ * addition to those taken by AcquirePlannerLocks() on a given query.
*/
foreach(lc, plan->stmt_list)
{
@@ -1741,58 +1754,6 @@ QueryListGetPrimaryStmt(List *stmts)
return NULL;
}
-/*
- * AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
- */
-static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
-{
- ListCell *lc1;
-
- foreach(lc1, stmt_list)
- {
- PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
-
- if (plannedstmt->commandType == CMD_UTILITY)
- {
- /*
- * Ignore utility statements, except those (such as EXPLAIN) that
- * contain a parsed-but-not-planned query. Note: it's okay to use
- * ScanQueryForLocks, even though the query hasn't been through
- * rule rewriting, because rewriting doesn't change the query
- * representation.
- */
- Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
-
- if (query)
- ScanQueryForLocks(query, acquire);
- continue;
- }
-
- foreach(lc2, plannedstmt->rtable)
- {
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
-
- if (!(rte->rtekind == RTE_RELATION ||
- (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
- continue;
-
- /*
- * Acquire the appropriate type of lock on each relation OID. Note
- * that we don't actually try to open the rel, and hence will not
- * fail if it's been dropped entirely --- we'll just transiently
- * acquire a non-conflicting lock.
- */
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
- }
- }
-}
-
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 37554727ee..392abb5150 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -88,7 +88,7 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern QueryDesc *ExplainQueryDesc(PlannedStmt *stmt,
+extern QueryDesc *ExplainQueryDesc(PlannedStmt *stmt, struct CachedPlan *cplan,
const char *queryString, IntoClause *into, ExplainState *es,
ParamListInfo params, QueryEnvironment *queryEnv);
extern void ExplainOnePlan(QueryDesc *queryDesc,
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index af2bf36dfb..4b7368a0dc 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -32,9 +32,12 @@
*/
typedef struct QueryDesc
{
+ NodeTag type;
+
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ struct CachedPlan *cplan; /* CachedPlan, if plannedstmt is from one */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +60,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ struct CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 10c5cda169..eaa605e513 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -599,6 +599,7 @@ exec_rt_fetch(Index rti, EState *estate)
}
extern Relation ExecGetRangeTableRelation(EState *estate, Index rti);
+extern void ExecLockAppendNonLeafRelations(EState *estate, List *allpartrelids);
extern void ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
Index rti);
diff --git a/src/include/storage/lmgr.h b/src/include/storage/lmgr.h
index 4ee91e3cf9..598bf2688a 100644
--- a/src/include/storage/lmgr.h
+++ b/src/include/storage/lmgr.h
@@ -48,6 +48,7 @@ extern bool ConditionalLockRelation(Relation relation, LOCKMODE lockmode);
extern void UnlockRelation(Relation relation, LOCKMODE lockmode);
extern bool CheckRelationLockedByMe(Relation relation, LOCKMODE lockmode,
bool orstronger);
+extern bool CheckRelLockedByMe(Oid relid, LOCKMODE lockmode, bool orstronger);
extern bool LockHasWaitersRelation(Relation relation, LOCKMODE lockmode);
extern void LockRelationIdForSession(LockRelId *relid, LOCKMODE lockmode);
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index f5fdbfe116..a024e5dcd0 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -140,6 +140,7 @@ extern char get_rel_relkind(Oid relid);
extern bool get_rel_relispartition(Oid relid);
extern Oid get_rel_tablespace(Oid relid);
extern char get_rel_persistence(Oid relid);
+extern bool get_rel_relisshared(Oid relid);
extern Oid get_transform_fromsql(Oid typid, Oid langid, List *trftypes);
extern Oid get_transform_tosql(Oid typid, Oid langid, List *trftypes);
extern bool get_typisdefined(Oid typid);
diff --git a/src/test/modules/delay_execution/Makefile b/src/test/modules/delay_execution/Makefile
index 70f24e846d..2fca84d027 100644
--- a/src/test/modules/delay_execution/Makefile
+++ b/src/test/modules/delay_execution/Makefile
@@ -8,7 +8,8 @@ OBJS = \
delay_execution.o
ISOLATION = partition-addition \
- partition-removal-1
+ partition-removal-1 \
+ cached-plan-replan
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/src/test/modules/delay_execution/delay_execution.c b/src/test/modules/delay_execution/delay_execution.c
index 7cd76eb34b..ce189156ad 100644
--- a/src/test/modules/delay_execution/delay_execution.c
+++ b/src/test/modules/delay_execution/delay_execution.c
@@ -1,14 +1,18 @@
/*-------------------------------------------------------------------------
*
* delay_execution.c
- * Test module to allow delay between parsing and execution of a query.
+ * Test module to introduce delay at various points during execution of a
+ * query to test that execution proceeds safely in light of concurrent
+ * changes.
*
* The delay is implemented by taking and immediately releasing a specified
* advisory lock. If another process has previously taken that lock, the
* current process will be blocked until the lock is released; otherwise,
* there's no effect. This allows an isolationtester script to reliably
- * test behaviors where some specified action happens in another backend
- * between parsing and execution of any desired query.
+ * test behaviors where some specified action happens in another backend in
+ * a couple of cases: 1) between parsing and execution of any desired query
+ * when using the planner_hook, 2) between RevalidateCachedQuery() and
+ * ExecutorStart() when using the ExecutorStart_hook.
*
* Copyright (c) 2020-2023, PostgreSQL Global Development Group
*
@@ -22,6 +26,7 @@
#include <limits.h>
+#include "executor/executor.h"
#include "optimizer/planner.h"
#include "utils/builtins.h"
#include "utils/guc.h"
@@ -32,9 +37,11 @@ PG_MODULE_MAGIC;
/* GUC: advisory lock ID to use. Zero disables the feature. */
static int post_planning_lock_id = 0;
+static int executor_start_lock_id = 0;
-/* Save previous planner hook user to be a good citizen */
+/* Save previous hook users to be a good citizen */
static planner_hook_type prev_planner_hook = NULL;
+static ExecutorStart_hook_type prev_ExecutorStart_hook = NULL;
/* planner_hook function to provide the desired delay */
@@ -70,11 +77,45 @@ delay_execution_planner(Query *parse, const char *query_string,
return result;
}
+/* ExecutorStart_hook function to provide the desired delay */
+static bool
+delay_execution_ExecutorStart(QueryDesc *queryDesc, int eflags)
+{
+ bool plan_valid;
+
+ /* If enabled, delay by taking and releasing the specified lock */
+ if (executor_start_lock_id != 0)
+ {
+ DirectFunctionCall1(pg_advisory_lock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+ DirectFunctionCall1(pg_advisory_unlock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+
+ /*
+ * Ensure that we notice any pending invalidations, since the advisory
+ * lock functions don't do this.
+ */
+ AcceptInvalidationMessages();
+ }
+
+ /* Now start the executor, possibly via a previous hook user */
+ if (prev_ExecutorStart_hook)
+ plan_valid = prev_ExecutorStart_hook(queryDesc, eflags);
+ else
+ plan_valid = standard_ExecutorStart(queryDesc, eflags);
+
+ if (executor_start_lock_id != 0)
+ elog(NOTICE, "Finished ExecutorStart(): CachedPlan is %s",
+ plan_valid ? "valid" : "not valid");
+
+ return plan_valid;
+}
+
/* Module load function */
void
_PG_init(void)
{
- /* Set up the GUC to control which lock is used */
+ /* Set up GUCs to control which lock is used */
DefineCustomIntVariable("delay_execution.post_planning_lock_id",
"Sets the advisory lock ID to be locked/unlocked after planning.",
"Zero disables the delay.",
@@ -86,10 +127,22 @@ _PG_init(void)
NULL,
NULL,
NULL);
-
+ DefineCustomIntVariable("delay_execution.executor_start_lock_id",
+ "Sets the advisory lock ID to be locked/unlocked before starting execution.",
+ "Zero disables the delay.",
+ &executor_start_lock_id,
+ 0,
+ 0, INT_MAX,
+ PGC_USERSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
MarkGUCPrefixReserved("delay_execution");
- /* Install our hook */
+ /* Install our hooks. */
prev_planner_hook = planner_hook;
planner_hook = delay_execution_planner;
+ prev_ExecutorStart_hook = ExecutorStart_hook;
+ ExecutorStart_hook = delay_execution_ExecutorStart;
}
diff --git a/src/test/modules/delay_execution/expected/cached-plan-replan.out b/src/test/modules/delay_execution/expected/cached-plan-replan.out
new file mode 100644
index 0000000000..0ac6a17c2b
--- /dev/null
+++ b/src/test/modules/delay_execution/expected/cached-plan-replan.out
@@ -0,0 +1,156 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1prep s2lock s1exec s2dropi s2unlock
+step s1prep: SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1;
+ EXPLAIN (COSTS OFF) EXECUTE q (1);
+QUERY PLAN
+--------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo11 foo_1
+ Recheck Cond: (a = $1)
+ -> Bitmap Index Scan on foo11_a_idx
+ Index Cond: (a = $1)
+(6 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+-----------------------------
+Append
+ Subplans Removed: 1
+ -> Seq Scan on foo11 foo_1
+ Filter: (a = $1)
+(4 rows)
+
+
+starting permutation: s1prep2 s2lock s1exec2 s2dropi s2unlock
+step s1prep2: SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = 1;
+ EXPLAIN (COSTS OFF) EXECUTE q2;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------
+Bitmap Heap Scan on foo11 foo
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on foo11_a_idx
+ Index Cond: (a = 1)
+(4 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec2: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec2: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------
+Seq Scan on foo11 foo
+ Filter: (a = 1)
+(2 rows)
+
+
+starting permutation: s1prep3 s2lock s1exec3 s2dropi s2unlock
+step s1prep3: SET plan_cache_mode = force_generic_plan;
+ SET enable_partitionwise_aggregate = on;
+ SET enable_partitionwise_join = on;
+ PREPARE q3 AS SELECT t1.a, count(t2.b) FROM foo t1, foo t2 WHERE t1.a = t2.a GROUP BY 1;
+ EXPLAIN (COSTS OFF) EXECUTE q3;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+----------------------------------------------------------------
+Append
+ -> GroupAggregate
+ Group Key: t1.a
+ -> Merge Join
+ Merge Cond: (t1.a = t2.a)
+ -> Index Only Scan using foo11_a_idx on foo11 t1
+ -> Materialize
+ -> Index Scan using foo11_a_idx on foo11 t2
+ -> GroupAggregate
+ Group Key: t1_1.a
+ -> Merge Join
+ Merge Cond: (t1_1.a = t2_1.a)
+ -> Sort
+ Sort Key: t1_1.a
+ -> Seq Scan on foo2 t1_1
+ -> Sort
+ Sort Key: t2_1.a
+ -> Seq Scan on foo2 t2_1
+(18 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec3: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec3: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------------------------------
+Append
+ -> GroupAggregate
+ Group Key: t1.a
+ -> Merge Join
+ Merge Cond: (t1.a = t2.a)
+ -> Sort
+ Sort Key: t1.a
+ -> Seq Scan on foo11 t1
+ -> Sort
+ Sort Key: t2.a
+ -> Seq Scan on foo11 t2
+ -> GroupAggregate
+ Group Key: t1_1.a
+ -> Merge Join
+ Merge Cond: (t1_1.a = t2_1.a)
+ -> Sort
+ Sort Key: t1_1.a
+ -> Seq Scan on foo2 t1_1
+ -> Sort
+ Sort Key: t2_1.a
+ -> Seq Scan on foo2 t2_1
+(21 rows)
+
diff --git a/src/test/modules/delay_execution/specs/cached-plan-replan.spec b/src/test/modules/delay_execution/specs/cached-plan-replan.spec
new file mode 100644
index 0000000000..3c92cbd5c6
--- /dev/null
+++ b/src/test/modules/delay_execution/specs/cached-plan-replan.spec
@@ -0,0 +1,61 @@
+# Test to check that invalidation of cached generic plans during ExecutorStart
+# correctly triggers replanning and re-execution.
+
+setup
+{
+ CREATE TABLE foo (a int, b text) PARTITION BY LIST(a);
+ CREATE TABLE foo1 PARTITION OF foo FOR VALUES IN (1) PARTITION BY LIST (a);
+ CREATE TABLE foo11 PARTITION OF foo1 FOR VALUES IN (1);
+ CREATE INDEX foo11_a ON foo1 (a);
+ CREATE TABLE foo2 PARTITION OF foo FOR VALUES IN (2);
+ CREATE VIEW foov AS SELECT * FROM foo;
+}
+
+teardown
+{
+ DROP VIEW foov;
+ DROP TABLE foo;
+}
+
+session "s1"
+# Append with run-time pruning
+step "s1prep" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+
+# no Append case (only one partition selected by the planner)
+step "s1prep2" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = 1;
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+
+# Append with partition-wise join aggregate and join plans as child subplans
+step "s1prep3" { SET plan_cache_mode = force_generic_plan;
+ SET enable_partitionwise_aggregate = on;
+ SET enable_partitionwise_join = on;
+ PREPARE q3 AS SELECT t1.a, count(t2.b) FROM foo t1, foo t2 WHERE t1.a = t2.a GROUP BY 1;
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+
+# Executes a generic plan
+step "s1exec" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+step "s1exec2" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+step "s1exec3" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+
+session "s2"
+step "s2lock" { SELECT pg_advisory_lock(12345); }
+step "s2unlock" { SELECT pg_advisory_unlock(12345); }
+step "s2dropi" { DROP INDEX foo11_a; }
+
+# While "s1exec", etc. wait to acquire the advisory lock, "s2drop" is able to
+# drop the index being used in the cached plan. When "s1exec" is then
+# unblocked and initializes the cached plan for execution, it detects the
+# concurrent index drop and causes the cached plan to be discarded and
+# recreated without the index.
+permutation "s1prep" "s2lock" "s1exec" "s2dropi" "s2unlock"
+permutation "s1prep2" "s2lock" "s1exec2" "s2dropi" "s2unlock"
+permutation "s1prep3" "s2lock" "s1exec3" "s2dropi" "s2unlock"
--
2.35.3
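
For anyone who wants to reproduce the new test's scenario interactively, the
first permutation above translates to roughly the following two psql sessions,
after running the SQL from the spec's setup block:

  -- session 1
  LOAD 'delay_execution';
  SET delay_execution.executor_start_lock_id = 12345;
  SET plan_cache_mode = force_generic_plan;
  PREPARE q AS SELECT * FROM foov WHERE a = $1;

  -- session 2
  SELECT pg_advisory_lock(12345);

  -- session 1 (blocks inside the ExecutorStart hook)
  EXPLAIN (COSTS OFF) EXECUTE q (1);

  -- session 2
  DROP INDEX foo11_a;
  SELECT pg_advisory_unlock(12345);

  -- session 1 is unblocked, reports that the CachedPlan is not valid,
  -- and replans before executing.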
Attachment: v46-0003-Support-for-ExecInitNode-to-detect-CachedPlan-in.patch (application/octet-stream)
From c6234c690231d0aa9cc211309e7059d5c366d06e Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Fri, 11 Aug 2023 14:09:29 +0900
Subject: [PATCH v46 3/8] Support for ExecInitNode() to detect CachedPlan
invalidation
This commit adds checks to determine if a CachedPlan remains valid
during ExecInitNode() traversal of the plan from the CachedPlan. This
includes points right after opening/locking tables and during
recursive ExecInitNode() calls to initialize child plans. Depending
on the situation, specific ExecInit*() routines will:
* Return NULL if invalidation is spotted right after opening a table
or after a function that opens one, but before initializing child
nodes.
* Return the partially initialized PlanState node if invalidation is
found after recursively initializing a child node via
ExecInitNode().
A prior commit already fortified ExecEnd*() to cope with such
partially initialized PlanState nodes, which may have uninitialized
fields and missing child node links.
Importantly, this commit doesn't alter functionality. The CachedPlan
isn't fed to the executor as of now, and the executor doesn't lock
tables.
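
To illustrate the convention described above, here is a minimal sketch, not
part of the patch, of how an ExecInit* routine is expected to use
ExecPlanStillValid(); Foo, FooState, and ExecInitFoo are hypothetical
stand-ins:

/*
 * Hypothetical sketch only: Foo and FooState are stand-ins, not actual
 * executor node types.
 */
static FooState *
ExecInitFoo(Foo *node, EState *estate, int eflags)
{
    FooState   *foostate = makeNode(FooState);

    /* Opening/locking the scan relation may reveal invalidation. */
    foostate->ss.ss_currentRelation =
        ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
    if (!ExecPlanStillValid(estate))
        return NULL;            /* no child node initialized yet */

    /* Recursing into a child node may likewise reveal invalidation. */
    outerPlanState(foostate) = ExecInitNode(outerPlan(node), estate, eflags);
    if (!ExecPlanStillValid(estate))
        return foostate;        /* partially initialized PlanState */

    /* ... remaining initialization ... */
    return foostate;
}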
---
contrib/postgres_fdw/postgres_fdw.c | 4 ++++
src/backend/executor/execMain.c | 24 ++++++++++++++++++++--
src/backend/executor/execPartition.c | 4 ++++
src/backend/executor/execProcnode.c | 17 ++++++++++++++-
src/backend/executor/execUtils.c | 2 ++
src/backend/executor/nodeAgg.c | 2 ++
src/backend/executor/nodeAppend.c | 14 ++++++++++---
src/backend/executor/nodeBitmapAnd.c | 11 +++++++---
src/backend/executor/nodeBitmapHeapscan.c | 4 ++++
src/backend/executor/nodeBitmapIndexscan.c | 2 ++
src/backend/executor/nodeBitmapOr.c | 11 +++++++---
src/backend/executor/nodeCustom.c | 2 ++
src/backend/executor/nodeForeignscan.c | 4 ++++
src/backend/executor/nodeGather.c | 3 +++
src/backend/executor/nodeGatherMerge.c | 2 ++
src/backend/executor/nodeGroup.c | 2 ++
src/backend/executor/nodeHash.c | 2 ++
src/backend/executor/nodeHashjoin.c | 4 ++++
src/backend/executor/nodeIncrementalSort.c | 2 ++
src/backend/executor/nodeIndexonlyscan.c | 4 ++++
src/backend/executor/nodeIndexscan.c | 4 ++++
src/backend/executor/nodeLimit.c | 2 ++
src/backend/executor/nodeLockRows.c | 2 ++
src/backend/executor/nodeMaterial.c | 2 ++
src/backend/executor/nodeMemoize.c | 2 ++
src/backend/executor/nodeMergeAppend.c | 10 ++++++++-
src/backend/executor/nodeMergejoin.c | 4 ++++
src/backend/executor/nodeModifyTable.c | 7 +++++++
src/backend/executor/nodeNestloop.c | 4 ++++
src/backend/executor/nodeProjectSet.c | 2 ++
src/backend/executor/nodeRecursiveunion.c | 4 ++++
src/backend/executor/nodeResult.c | 2 ++
src/backend/executor/nodeSamplescan.c | 2 ++
src/backend/executor/nodeSeqscan.c | 2 ++
src/backend/executor/nodeSetOp.c | 2 ++
src/backend/executor/nodeSort.c | 2 ++
src/backend/executor/nodeSubqueryscan.c | 2 ++
src/backend/executor/nodeTidrangescan.c | 2 ++
src/backend/executor/nodeTidscan.c | 2 ++
src/backend/executor/nodeUnique.c | 2 ++
src/backend/executor/nodeWindowAgg.c | 2 ++
src/include/executor/executor.h | 10 +++++++++
src/include/nodes/execnodes.h | 2 ++
src/include/utils/plancache.h | 14 +++++++++++++
44 files changed, 196 insertions(+), 13 deletions(-)
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 1393716587..ab7ecb925c 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2660,7 +2660,11 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
/* Get info about foreign table. */
rtindex = node->resultRelInfo->ri_RangeTableIndex;
if (fsplan->scan.scanrelid == 0)
+ {
dmstate->rel = ExecOpenScanRelation(estate, rtindex, eflags);
+ if (!ExecPlanStillValid(estate))
+ return;
+ }
else
dmstate->rel = node->ss.ss_currentRelation;
table = GetForeignTable(RelationGetRelid(dmstate->rel));
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4c5a7bbf62..f3054cbe7e 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -839,8 +839,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
- PlanState *planstate;
- TupleDesc tupType;
+ PlanState *planstate = NULL;
+ TupleDesc tupType = NULL;
ListCell *l;
int i;
@@ -855,6 +855,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
estate->es_plannedstmt = plannedstmt;
+ estate->es_cachedplan = NULL;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
@@ -886,6 +887,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
case ROW_MARK_KEYSHARE:
case ROW_MARK_REFERENCE:
relation = ExecGetRangeTableRelation(estate, rc->rti);
+ if (!ExecPlanStillValid(estate))
+ goto plan_init_suspended;
break;
case ROW_MARK_COPY:
/* no physical table access is required */
@@ -956,6 +959,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_subplanstates = lappend(estate->es_subplanstates,
subplanstate);
+ if (!ExecPlanStillValid(estate))
+ goto plan_init_suspended;
i++;
}
@@ -966,6 +971,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
* processing tuples.
*/
planstate = ExecInitNode(plan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ goto plan_init_suspended;
/*
* Get the tuple descriptor describing the type of tuples to return.
@@ -1007,6 +1014,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
}
}
+plan_init_suspended:
queryDesc->tupDesc = tupType;
queryDesc->planstate = planstate;
}
@@ -2945,6 +2953,12 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
PlanState *subplanstate;
subplanstate = ExecInitNode(subplan, rcestate, 0);
+
+ /*
+ * At this point, we had better not received any new invalidation
+ * messages that would have caused the plan tree to go stale.
+ */
+ Assert(ExecPlanStillValid(rcestate));
rcestate->es_subplanstates = lappend(rcestate->es_subplanstates,
subplanstate);
}
@@ -2988,6 +3002,12 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
*/
epqstate->recheckplanstate = ExecInitNode(planTree, rcestate, 0);
+ /*
+ * At this point, we had better not have received any new invalidation messages
+ * that would have caused the plan tree to go stale.
+ */
+ Assert(ExecPlanStillValid(rcestate));
+
MemoryContextSwitchTo(oldcontext);
}
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index eb8a87fd63..e88455368c 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1801,6 +1801,8 @@ ExecInitPartitionPruning(PlanState *planstate,
/* Create the working data structure for pruning */
prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Perform an initial partition prune pass, if required.
@@ -1927,6 +1929,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* duration of this executor run.
*/
partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
partkey = RelationGetPartitionKey(partrel);
partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
partrel);
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 6098cdca69..d5952d0d50 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -135,7 +135,18 @@ static bool ExecShutdownNode_walker(PlanState *node, void *context);
* 'estate' is the shared execution state for the plan tree
* 'eflags' is a bitwise OR of flag bits described in executor.h
*
- * Returns a PlanState node corresponding to the given Plan node.
+ * Returns a PlanState node corresponding to the given Plan node or NULL.
+ *
+ * Various node-type-specific ExecInit* routines listed below either
+ * return NULL or a partially initialized PlanState tree if the CachedPlan
+ * is found to be invalidated. That is checked by calling
+ * ExecPlanStillValid() at various points, such as after opening/locking
+ * a relation, or after calling a function that does so, which includes
+ * recursive invocations of ExecInitNode() to initialize child nodes.
+ * A given ExecInit* routine should return NULL upon getting false from
+ * ExecPlanStillValid() if no child node has been initialized at the
+ * point of checking, and the partially initialized PlanState node if a
+ * child node has been recursively initialized.
* ------------------------------------------------------------------------
*/
PlanState *
@@ -388,6 +399,10 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
break;
}
+ if (!ExecPlanStillValid(estate))
+ return result;
+
+ Assert(result != NULL);
ExecSetExecProcNode(result, result->ExecProcNode);
/*
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 16704c0c2f..c3f7279b06 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -822,6 +822,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
Relation resultRelationDesc;
resultRelationDesc = ExecGetRangeTableRelation(estate, rti);
+ if (!ExecPlanStillValid(estate))
+ return;
InitResultRelInfo(resultRelInfo,
resultRelationDesc,
rti,
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index aac9e9fc80..f46c3df199 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -3305,6 +3305,8 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
eflags &= ~EXEC_FLAG_REWIND;
outerPlan = outerPlan(node);
outerPlanState(aggstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return aggstate;
/*
* initialize source tuple type.
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 609df6b9e6..588f5388c7 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -147,6 +147,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
list_length(node->appendplans),
node->part_prune_info,
&validsubplans);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
@@ -185,8 +187,13 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->ps.resultopsset = true;
appendstate->ps.resultopsfixed = false;
- appendplanstates = (PlanState **) palloc(nplans *
- sizeof(PlanState *));
+ /*
+ * Any uninitialized subnodes will have NULL in appendplans in the case of
+ * an early return.
+ */
+ appendstate->appendplans = appendplanstates =
+ (PlanState **) palloc0(nplans * sizeof(PlanState *));
+ appendstate->as_nplans = nplans;
/*
* call ExecInitNode on each of the valid plans to be executed and save
@@ -221,11 +228,12 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
firstvalid = j;
appendplanstates[j++] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return appendstate;
}
appendstate->as_first_partial_plan = firstvalid;
appendstate->appendplans = appendplanstates;
- appendstate->as_nplans = nplans;
/* Initialize async state */
appendstate->as_asyncplans = asyncplans;
diff --git a/src/backend/executor/nodeBitmapAnd.c b/src/backend/executor/nodeBitmapAnd.c
index 4c5eb2b23b..c0495ec90f 100644
--- a/src/backend/executor/nodeBitmapAnd.c
+++ b/src/backend/executor/nodeBitmapAnd.c
@@ -69,6 +69,10 @@ ExecInitBitmapAnd(BitmapAnd *node, EState *estate, int eflags)
*/
nplans = list_length(node->bitmapplans);
+ /*
+ * Any uninitialized subnodes will have NULL in bitmapplans in the case of
+ * an early return.
+ */
bitmapplanstates = (PlanState **) palloc0(nplans * sizeof(PlanState *));
/*
@@ -78,7 +82,6 @@ ExecInitBitmapAnd(BitmapAnd *node, EState *estate, int eflags)
bitmapandstate->ps.state = estate;
bitmapandstate->ps.ExecProcNode = ExecBitmapAnd;
bitmapandstate->bitmapplans = bitmapplanstates;
- bitmapandstate->nplans = nplans;
/*
* call ExecInitNode on each of the plans to be executed and save the
@@ -88,8 +91,10 @@ ExecInitBitmapAnd(BitmapAnd *node, EState *estate, int eflags)
foreach(l, node->bitmapplans)
{
initNode = (Plan *) lfirst(l);
- bitmapplanstates[i] = ExecInitNode(initNode, estate, eflags);
- i++;
+ bitmapplanstates[i++] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return bitmapandstate;
+ bitmapandstate->nplans = i;
}
/*
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index ffa51c06b4..3cdece852c 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -752,11 +752,15 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return scanstate;
/*
* initialize child nodes
*/
outerPlanState(scanstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return scanstate;
/*
* get the scan type from the relation descriptor.
diff --git a/src/backend/executor/nodeBitmapIndexscan.c b/src/backend/executor/nodeBitmapIndexscan.c
index 7cf8532bc9..4200472d02 100644
--- a/src/backend/executor/nodeBitmapIndexscan.c
+++ b/src/backend/executor/nodeBitmapIndexscan.c
@@ -255,6 +255,8 @@ ExecInitBitmapIndexScan(BitmapIndexScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
indexstate->biss_RelationDesc = index_open(node->indexid, lockmode);
+ if (!ExecPlanStillValid(estate))
+ return indexstate;
/*
* Initialize index-specific scan state
diff --git a/src/backend/executor/nodeBitmapOr.c b/src/backend/executor/nodeBitmapOr.c
index 0bf8af9652..00120669a5 100644
--- a/src/backend/executor/nodeBitmapOr.c
+++ b/src/backend/executor/nodeBitmapOr.c
@@ -70,6 +70,10 @@ ExecInitBitmapOr(BitmapOr *node, EState *estate, int eflags)
*/
nplans = list_length(node->bitmapplans);
+ /*
+ * Any uninitialized subnodes will have NULL in bitmapplans in the case of
+ * an early return.
+ */
bitmapplanstates = (PlanState **) palloc0(nplans * sizeof(PlanState *));
/*
@@ -79,7 +83,6 @@ ExecInitBitmapOr(BitmapOr *node, EState *estate, int eflags)
bitmaporstate->ps.state = estate;
bitmaporstate->ps.ExecProcNode = ExecBitmapOr;
bitmaporstate->bitmapplans = bitmapplanstates;
- bitmaporstate->nplans = nplans;
/*
* call ExecInitNode on each of the plans to be executed and save the
@@ -89,8 +92,10 @@ ExecInitBitmapOr(BitmapOr *node, EState *estate, int eflags)
foreach(l, node->bitmapplans)
{
initNode = (Plan *) lfirst(l);
- bitmapplanstates[i] = ExecInitNode(initNode, estate, eflags);
- i++;
+ bitmapplanstates[i++] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return bitmaporstate;
+ bitmaporstate->nplans = i;
}
/*
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
index e80be3af81..76f5c2fd09 100644
--- a/src/backend/executor/nodeCustom.c
+++ b/src/backend/executor/nodeCustom.c
@@ -61,6 +61,8 @@ ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
if (scanrelid > 0)
{
scan_rel = ExecOpenScanRelation(estate, scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
css->ss.ss_currentRelation = scan_rel;
}
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index d5aaa983f7..0eeb66530a 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -173,6 +173,8 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
if (scanrelid > 0)
{
currentRelation = ExecOpenScanRelation(estate, scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
scanstate->ss.ss_currentRelation = currentRelation;
fdwroutine = GetFdwRoutineForRelation(currentRelation, true);
}
@@ -264,6 +266,8 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
if (outerPlan(node))
outerPlanState(scanstate) =
ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return scanstate;
/*
* Tell the FDW to initialize the scan.
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index bb2500a469..6b26e03f74 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -89,6 +89,9 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
*/
outerNode = outerPlan(node);
outerPlanState(gatherstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return gatherstate;
+
tupDesc = ExecGetResultType(outerPlanState(gatherstate));
/*
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 7a71a58509..84412f94bb 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -108,6 +108,8 @@ ExecInitGatherMerge(GatherMerge *node, EState *estate, int eflags)
*/
outerNode = outerPlan(node);
outerPlanState(gm_state) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return gm_state;
/*
* Leader may access ExecProcNode result directly (if
diff --git a/src/backend/executor/nodeGroup.c b/src/backend/executor/nodeGroup.c
index 8c650f0e46..b6068887f6 100644
--- a/src/backend/executor/nodeGroup.c
+++ b/src/backend/executor/nodeGroup.c
@@ -185,6 +185,8 @@ ExecInitGroup(Group *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(grpstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return grpstate;
/*
* Initialize scan slot and type.
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index e72f0986c2..030bf0ed43 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -386,6 +386,8 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(hashstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return hashstate;
/*
* initialize our result slot and type. No need to build projection
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index aea44a9d56..49a6ba4276 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -752,8 +752,12 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
hashNode = (Hash *) innerPlan(node);
outerPlanState(hjstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return hjstate;
outerDesc = ExecGetResultType(outerPlanState(hjstate));
innerPlanState(hjstate) = ExecInitNode((Plan *) hashNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return hjstate;
innerDesc = ExecGetResultType(innerPlanState(hjstate));
/*
diff --git a/src/backend/executor/nodeIncrementalSort.c b/src/backend/executor/nodeIncrementalSort.c
index dcb8470ba7..6caa1aa306 100644
--- a/src/backend/executor/nodeIncrementalSort.c
+++ b/src/backend/executor/nodeIncrementalSort.c
@@ -1041,6 +1041,8 @@ ExecInitIncrementalSort(IncrementalSort *node, EState *estate, int eflags)
* nodes may be able to do something more useful.
*/
outerPlanState(incrsortstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return incrsortstate;
/*
* Initialize scan slot and type.
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index f1db35665c..ea7fd89c0c 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -496,6 +496,8 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
indexstate->ss.ss_currentRelation = currentRelation;
indexstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
@@ -549,6 +551,8 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
indexstate->ioss_RelationDesc = index_open(node->indexid, lockmode);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize index-specific scan state
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 14b9c00217..906358011a 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -909,6 +909,8 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
indexstate->ss.ss_currentRelation = currentRelation;
indexstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
@@ -954,6 +956,8 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
indexstate->iss_RelationDesc = index_open(node->indexid, lockmode);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize index-specific scan state
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index 5654158e3e..6760de0f25 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -476,6 +476,8 @@ ExecInitLimit(Limit *node, EState *estate, int eflags)
*/
outerPlan = outerPlan(node);
outerPlanState(limitstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return limitstate;
/*
* initialize child expressions
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index e459971d32..2599332f01 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -322,6 +322,8 @@ ExecInitLockRows(LockRows *node, EState *estate, int eflags)
* then initialize outer plan
*/
outerPlanState(lrstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return lrstate;
/* node returns unmodified slots from the outer plan */
lrstate->ps.resultopsset = true;
diff --git a/src/backend/executor/nodeMaterial.c b/src/backend/executor/nodeMaterial.c
index 753ea28915..b974ebdc8a 100644
--- a/src/backend/executor/nodeMaterial.c
+++ b/src/backend/executor/nodeMaterial.c
@@ -214,6 +214,8 @@ ExecInitMaterial(Material *node, EState *estate, int eflags)
outerPlan = outerPlan(node);
outerPlanState(matstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return matstate;
/*
* Initialize result type and slot. No need to initialize projection info
diff --git a/src/backend/executor/nodeMemoize.c b/src/backend/executor/nodeMemoize.c
index 5352ca10c8..d0cdbe1fd7 100644
--- a/src/backend/executor/nodeMemoize.c
+++ b/src/backend/executor/nodeMemoize.c
@@ -938,6 +938,8 @@ ExecInitMemoize(Memoize *node, EState *estate, int eflags)
outerNode = outerPlan(node);
outerPlanState(mstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return mstate;
/*
* Initialize return slot and type. No need to initialize projection info
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 21b5726e6e..c9d406c230 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -95,6 +95,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
list_length(node->mergeplans),
node->part_prune_info,
&validsubplans);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
@@ -120,7 +122,11 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ms_prune_state = NULL;
}
- mergeplanstates = (PlanState **) palloc(nplans * sizeof(PlanState *));
+ /*
+ * Any uninitialized subnodes will have NULL in mergeplans in the case of
+ * an early return.
+ */
+ mergeplanstates = (PlanState **) palloc0(nplans * sizeof(PlanState *));
mergestate->mergeplans = mergeplanstates;
mergestate->ms_nplans = nplans;
@@ -151,6 +157,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
Plan *initNode = (Plan *) list_nth(node->mergeplans, i);
mergeplanstates[j++] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return mergestate;
}
mergestate->ps.ps_ProjInfo = NULL;
diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 648fdd9a5f..e7f4512419 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -1490,11 +1490,15 @@ ExecInitMergeJoin(MergeJoin *node, EState *estate, int eflags)
mergestate->mj_SkipMarkRestore = node->skip_mark_restore;
outerPlanState(mergestate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return mergestate;
outerDesc = ExecGetResultType(outerPlanState(mergestate));
innerPlanState(mergestate) = ExecInitNode(innerPlan(node), estate,
mergestate->mj_SkipMarkRestore ?
eflags :
(eflags | EXEC_FLAG_MARK));
+ if (!ExecPlanStillValid(estate))
+ return mergestate;
innerDesc = ExecGetResultType(innerPlanState(mergestate));
/*
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index d21a178ad5..c28d5058e9 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -3985,6 +3985,9 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
linitial_int(node->resultRelations));
}
+ if (!ExecPlanStillValid(estate))
+ return NULL;
+
/* set up epqstate with dummy subplan data for the moment */
EvalPlanQualInit(&mtstate->mt_epqstate, estate, NULL, NIL,
node->epqParam, node->resultRelations);
@@ -4012,6 +4015,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
if (resultRelInfo != mtstate->rootResultRelInfo)
{
ExecInitResultRelation(estate, resultRelInfo, resultRelation);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* For child result relations, store the root result relation
@@ -4039,6 +4044,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* Now we may initialize the subplan.
*/
outerPlanState(mtstate) = ExecInitNode(subplan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return mtstate;
/*
* Do additional per-result-relation initialization.
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index fc8f833d8b..0158a3e592 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -295,11 +295,15 @@ ExecInitNestLoop(NestLoop *node, EState *estate, int eflags)
* values.
*/
outerPlanState(nlstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return nlstate;
if (node->nestParams == NIL)
eflags |= EXEC_FLAG_REWIND;
else
eflags &= ~EXEC_FLAG_REWIND;
innerPlanState(nlstate) = ExecInitNode(innerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return nlstate;
/*
* Initialize result slot, type and projection.
diff --git a/src/backend/executor/nodeProjectSet.c b/src/backend/executor/nodeProjectSet.c
index b4bbdc89b1..1b4774d4f7 100644
--- a/src/backend/executor/nodeProjectSet.c
+++ b/src/backend/executor/nodeProjectSet.c
@@ -247,6 +247,8 @@ ExecInitProjectSet(ProjectSet *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(state) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return state;
/*
* we don't use inner plan
diff --git a/src/backend/executor/nodeRecursiveunion.c b/src/backend/executor/nodeRecursiveunion.c
index 3dfcb4cafb..ca4f78685d 100644
--- a/src/backend/executor/nodeRecursiveunion.c
+++ b/src/backend/executor/nodeRecursiveunion.c
@@ -244,7 +244,11 @@ ExecInitRecursiveUnion(RecursiveUnion *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(rustate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return rustate;
innerPlanState(rustate) = ExecInitNode(innerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return rustate;
/*
* If hashing, precompute fmgr lookup data for inner loop, and create the
diff --git a/src/backend/executor/nodeResult.c b/src/backend/executor/nodeResult.c
index e9f5732f33..d4ea101cbe 100644
--- a/src/backend/executor/nodeResult.c
+++ b/src/backend/executor/nodeResult.c
@@ -208,6 +208,8 @@ ExecInitResult(Result *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(resstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return resstate;
/*
* we don't use inner plan
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index 1aa0e2a205..edda889e55 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -125,6 +125,8 @@ ExecInitSampleScan(SampleScan *node, EState *estate, int eflags)
ExecOpenScanRelation(estate,
node->scan.scanrelid,
eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/* we won't set up the HeapScanDesc till later */
scanstate->ss.ss_currentScanDesc = NULL;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 49a5933aff..48e20aa735 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -153,6 +153,8 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
ExecOpenScanRelation(estate,
node->scan.scanrelid,
eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/* and create slot with the appropriate rowtype */
ExecInitScanTupleSlot(estate, &scanstate->ss,
diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index 98c1b84d43..7a3a142204 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -528,6 +528,8 @@ ExecInitSetOp(SetOp *node, EState *estate, int eflags)
if (node->strategy == SETOP_HASHED)
eflags &= ~EXEC_FLAG_REWIND;
outerPlanState(setopstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return setopstate;
outerDesc = ExecGetResultType(outerPlanState(setopstate));
/*
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index eea7f2ae15..3ebbc46604 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -263,6 +263,8 @@ ExecInitSort(Sort *node, EState *estate, int eflags)
eflags &= ~(EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK);
outerPlanState(sortstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return sortstate;
/*
* Initialize scan slot and type.
diff --git a/src/backend/executor/nodeSubqueryscan.c b/src/backend/executor/nodeSubqueryscan.c
index 1ee6295660..3c5c7c2ebb 100644
--- a/src/backend/executor/nodeSubqueryscan.c
+++ b/src/backend/executor/nodeSubqueryscan.c
@@ -124,6 +124,8 @@ ExecInitSubqueryScan(SubqueryScan *node, EState *estate, int eflags)
* initialize subquery
*/
subquerystate->subplan = ExecInitNode(node->subplan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return subquerystate;
/*
* Initialize scan slot and type (needed by ExecAssignScanProjectionInfo)
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index da622d3f5f..d337f3d54a 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -374,6 +374,8 @@ ExecInitTidRangeScan(TidRangeScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
tidrangestate->ss.ss_currentRelation = currentRelation;
tidrangestate->ss.ss_currentScanDesc = NULL; /* no table scan here */
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index 15055077d0..9637f354b2 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -517,6 +517,8 @@ ExecInitTidScan(TidScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
tidstate->ss.ss_currentRelation = currentRelation;
tidstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
diff --git a/src/backend/executor/nodeUnique.c b/src/backend/executor/nodeUnique.c
index 01f951197c..28630e380e 100644
--- a/src/backend/executor/nodeUnique.c
+++ b/src/backend/executor/nodeUnique.c
@@ -136,6 +136,8 @@ ExecInitUnique(Unique *node, EState *estate, int eflags)
* then initialize outer plan
*/
outerPlanState(uniquestate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return uniquestate;
/*
* Initialize result slot and type. Unique nodes do no projections, so
diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 3849d2f847..04d4eebce4 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -2461,6 +2461,8 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags)
*/
outerPlan = outerPlan(node);
outerPlanState(winstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return winstate;
/*
* initialize source tuple type (which is also the tuple type that we'll
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index aeebe0e0ff..72cbf120c5 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -19,6 +19,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/plancache.h"
/*
@@ -256,6 +257,15 @@ extern void ExecEndNode(PlanState *node);
extern void ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * Is the CachedPlan in es_cachedplan still valid?
+ */
+static inline bool
+ExecPlanStillValid(EState *estate)
+{
+ return estate->es_cachedplan == NULL ? true :
+ CachedPlanStillValid(estate->es_cachedplan);
+}
/* ----------------------------------------------------------------
* ExecProcNode
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index cb714f4a19..b2a576b76d 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -623,6 +623,8 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ struct CachedPlan *es_cachedplan; /* CachedPlan if plannedstmt is from
+ * one or NULL if not */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 916e59d9fe..c83a67fea3 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -221,6 +221,20 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
QueryEnvironment *queryEnv);
+
+/*
+ * CachedPlanStillValid
+ * Returns whether a cached generic plan is still valid
+ *
+ * Called by the executor after each relation lock taken while initializing
+ * the plan tree contained in a CachedPlan.
+ */
+static inline bool
+CachedPlanStillValid(CachedPlan *cplan)
+{
+ return cplan->is_valid;
+}
+
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
--
2.35.3
v46-0001-Refactor-ExecEnd-routines-to-enhance-efficiency.patch
From ec8faad9bc9ae157ebca85a7892857a04f06fb39 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Fri, 1 Sep 2023 17:46:32 +0900
Subject: [PATCH v46 1/8] Refactor ExecEnd* routines to enhance efficiency
This commit removes unnecessary ExecFreeExprContext() calls in ExecEnd*
routines, as the actual cleanup is managed by FreeExecutorState(). With
no remaining callers of ExecFreeExprContext(), this commit also
removes the function.
This commit also drops redundant ExecClearTuple() calls, as
ExecResetTupleTable() in ExecEndPlan() already takes care of resetting
all TupleTableSlots.
After these modifications, the ExecEnd*() routines for ValuesScan,
NamedTuplestoreScan, and WorkTableScan became redundant. Thus, this
commit removes them. These changes not only optimize CPU usage during
ExecEndNode() processing but also pave the way for an upcoming patch.
This future patch aims to allow ExecEndNode() to expect PlanState
trees that are only partially initialized in some cases.
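
As a rough illustration, not from the patch itself, a typical ExecEnd*
routine after this refactoring reduces to releasing node-private resources
and recursing into its children, leaving slot and ExprContext cleanup to
ExecResetTupleTable() and FreeExecutorState(); ExecEndFoo and FooState are
hypothetical names:

/* Hypothetical post-refactor shape of an ExecEnd* routine. */
void
ExecEndFoo(FooState *node)
{
    /*
     * No ExecFreeExprContext() or ExecClearTuple() calls are needed here:
     * FreeExecutorState() frees all ExprContexts registered in the EState,
     * and ExecResetTupleTable() in ExecEndPlan() resets the TupleTableSlots.
     * Only node-private resources, such as tuplestores or scan descriptors,
     * would still need explicit cleanup before recursing.
     */
    ExecEndNode(outerPlanState(node));
}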
---
src/backend/executor/execProcnode.c | 18 +++++--------
src/backend/executor/execUtils.c | 26 -------------------
src/backend/executor/nodeAgg.c | 10 -------
src/backend/executor/nodeBitmapHeapscan.c | 12 ---------
src/backend/executor/nodeBitmapIndexscan.c | 8 ------
src/backend/executor/nodeCtescan.c | 13 +---------
src/backend/executor/nodeCustom.c | 8 +-----
src/backend/executor/nodeFunctionscan.c | 12 ---------
src/backend/executor/nodeGather.c | 3 ---
src/backend/executor/nodeGatherMerge.c | 3 ---
src/backend/executor/nodeGroup.c | 5 ----
src/backend/executor/nodeHash.c | 5 ----
src/backend/executor/nodeHashjoin.c | 12 ---------
src/backend/executor/nodeIncrementalSort.c | 8 ------
src/backend/executor/nodeIndexonlyscan.c | 16 ------------
src/backend/executor/nodeIndexscan.c | 16 ------------
src/backend/executor/nodeLimit.c | 1 -
src/backend/executor/nodeMaterial.c | 5 ----
src/backend/executor/nodeMemoize.c | 9 -------
src/backend/executor/nodeMergejoin.c | 12 ---------
src/backend/executor/nodeModifyTable.c | 11 --------
.../executor/nodeNamedtuplestorescan.c | 22 ----------------
src/backend/executor/nodeNestloop.c | 11 --------
src/backend/executor/nodeProjectSet.c | 10 -------
src/backend/executor/nodeResult.c | 10 -------
src/backend/executor/nodeSamplescan.c | 13 +---------
src/backend/executor/nodeSeqscan.c | 12 ---------
src/backend/executor/nodeSetOp.c | 4 ---
src/backend/executor/nodeSort.c | 7 -----
src/backend/executor/nodeSubqueryscan.c | 12 ---------
src/backend/executor/nodeTableFuncscan.c | 12 ---------
src/backend/executor/nodeTidrangescan.c | 12 ---------
src/backend/executor/nodeTidscan.c | 12 ---------
src/backend/executor/nodeUnique.c | 5 ----
src/backend/executor/nodeValuesscan.c | 24 -----------------
src/backend/executor/nodeWindowAgg.c | 17 ------------
src/backend/executor/nodeWorktablescan.c | 22 ----------------
src/include/executor/executor.h | 1 -
.../executor/nodeNamedtuplestorescan.h | 1 -
src/include/executor/nodeValuesscan.h | 1 -
src/include/executor/nodeWorktablescan.h | 1 -
41 files changed, 9 insertions(+), 413 deletions(-)
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 4d288bc8d4..6098cdca69 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -667,22 +667,10 @@ ExecEndNode(PlanState *node)
ExecEndTableFuncScan((TableFuncScanState *) node);
break;
- case T_ValuesScanState:
- ExecEndValuesScan((ValuesScanState *) node);
- break;
-
case T_CteScanState:
ExecEndCteScan((CteScanState *) node);
break;
- case T_NamedTuplestoreScanState:
- ExecEndNamedTuplestoreScan((NamedTuplestoreScanState *) node);
- break;
-
- case T_WorkTableScanState:
- ExecEndWorkTableScan((WorkTableScanState *) node);
- break;
-
case T_ForeignScanState:
ExecEndForeignScan((ForeignScanState *) node);
break;
@@ -757,6 +745,12 @@ ExecEndNode(PlanState *node)
ExecEndLimit((LimitState *) node);
break;
+ /* No clean up actions for these nodes. */
+ case T_ValuesScanState:
+ case T_NamedTuplestoreScanState:
+ case T_WorkTableScanState:
+ break;
+
default:
elog(ERROR, "unrecognized node type: %d", (int) nodeTag(node));
break;
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index c06b228858..16704c0c2f 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -638,32 +638,6 @@ tlist_matches_tupdesc(PlanState *ps, List *tlist, int varno, TupleDesc tupdesc)
return true;
}
-/* ----------------
- * ExecFreeExprContext
- *
- * A plan node's ExprContext should be freed explicitly during executor
- * shutdown because there may be shutdown callbacks to call. (Other resources
- * made by the above routines, such as projection info, don't need to be freed
- * explicitly because they're just memory in the per-query memory context.)
- *
- * However ... there is no particular need to do it during ExecEndNode,
- * because FreeExecutorState will free any remaining ExprContexts within
- * the EState. Letting FreeExecutorState do it allows the ExprContexts to
- * be freed in reverse order of creation, rather than order of creation as
- * will happen if we delete them here, which saves O(N^2) work in the list
- * cleanup inside FreeExprContext.
- * ----------------
- */
-void
-ExecFreeExprContext(PlanState *planstate)
-{
- /*
- * Per above discussion, don't actually delete the ExprContext. We do
- * unlink it from the plan node, though.
- */
- planstate->ps_ExprContext = NULL;
-}
-
/* ----------------------------------------------------------------
* Scan node support
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 468db94fe5..f154f28902 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -4357,16 +4357,6 @@ ExecEndAgg(AggState *node)
if (node->hashcontext)
ReScanExprContext(node->hashcontext);
- /*
- * We don't actually free any ExprContexts here (see comment in
- * ExecFreeExprContext), just unlinking the output one from the plan node
- * suffices.
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /* clean up tuple table */
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
outerPlan = outerPlanState(node);
ExecEndNode(outerPlan);
}
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index f35df0b8bf..2db0acfc76 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -655,18 +655,6 @@ ExecEndBitmapHeapScan(BitmapHeapScanState *node)
*/
scanDesc = node->ss.ss_currentScanDesc;
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clear out tuple table slots
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
/*
* close down subplans
*/
diff --git a/src/backend/executor/nodeBitmapIndexscan.c b/src/backend/executor/nodeBitmapIndexscan.c
index 83ec9ede89..7cf8532bc9 100644
--- a/src/backend/executor/nodeBitmapIndexscan.c
+++ b/src/backend/executor/nodeBitmapIndexscan.c
@@ -184,14 +184,6 @@ ExecEndBitmapIndexScan(BitmapIndexScanState *node)
indexRelationDesc = node->biss_RelationDesc;
indexScanDesc = node->biss_ScanDesc;
- /*
- * Free the exprcontext ... now dead code, see ExecFreeExprContext
- */
-#ifdef NOT_USED
- if (node->biss_RuntimeContext)
- FreeExprContext(node->biss_RuntimeContext, true);
-#endif
-
/*
* close the index relation (no-op if we didn't open it)
*/
diff --git a/src/backend/executor/nodeCtescan.c b/src/backend/executor/nodeCtescan.c
index cc4c4243e2..14e010c0ea 100644
--- a/src/backend/executor/nodeCtescan.c
+++ b/src/backend/executor/nodeCtescan.c
@@ -287,23 +287,12 @@ ExecInitCteScan(CteScan *node, EState *estate, int eflags)
void
ExecEndCteScan(CteScanState *node)
{
- /*
- * Free exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clean out the tuple table
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
/*
* If I am the leader, free the tuplestore.
*/
if (node->leader == node)
{
+ Assert(node->cte_table);
tuplestore_end(node->cte_table);
node->cte_table = NULL;
}
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
index bd42c65b29..e80be3af81 100644
--- a/src/backend/executor/nodeCustom.c
+++ b/src/backend/executor/nodeCustom.c
@@ -127,15 +127,9 @@ ExecCustomScan(PlanState *pstate)
void
ExecEndCustomScan(CustomScanState *node)
{
+ Assert(node->methods);
Assert(node->methods->EndCustomScan != NULL);
node->methods->EndCustomScan(node);
-
- /* Free the exprcontext */
- ExecFreeExprContext(&node->ss.ps);
-
- /* Clean out the tuple table */
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
void
diff --git a/src/backend/executor/nodeFunctionscan.c b/src/backend/executor/nodeFunctionscan.c
index dd06ef8aee..a49c1a2c85 100644
--- a/src/backend/executor/nodeFunctionscan.c
+++ b/src/backend/executor/nodeFunctionscan.c
@@ -523,18 +523,6 @@ ExecEndFunctionScan(FunctionScanState *node)
{
int i;
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clean out the tuple table
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
/*
* Release slots and tuplestore resources
*/
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 307fc10eea..bb2500a469 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -250,9 +250,6 @@ ExecEndGather(GatherState *node)
{
ExecEndNode(outerPlanState(node)); /* let children clean up first */
ExecShutdownGather(node);
- ExecFreeExprContext(&node->ps);
- if (node->ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ps.ps_ResultTupleSlot);
}
/*
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 9d5e1a46e9..7a71a58509 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -290,9 +290,6 @@ ExecEndGatherMerge(GatherMergeState *node)
{
ExecEndNode(outerPlanState(node)); /* let children clean up first */
ExecShutdownGatherMerge(node);
- ExecFreeExprContext(&node->ps);
- if (node->ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ps.ps_ResultTupleSlot);
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeGroup.c b/src/backend/executor/nodeGroup.c
index 25a1618952..8c650f0e46 100644
--- a/src/backend/executor/nodeGroup.c
+++ b/src/backend/executor/nodeGroup.c
@@ -228,11 +228,6 @@ ExecEndGroup(GroupState *node)
{
PlanState *outerPlan;
- ExecFreeExprContext(&node->ss.ps);
-
- /* clean up tuple table */
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
outerPlan = outerPlanState(node);
ExecEndNode(outerPlan);
}
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 8b5c35b82b..e72f0986c2 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -415,11 +415,6 @@ ExecEndHash(HashState *node)
{
PlanState *outerPlan;
- /*
- * free exprcontext
- */
- ExecFreeExprContext(&node->ps);
-
/*
* shut down the subplan
*/
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 980746128b..aea44a9d56 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -867,18 +867,6 @@ ExecEndHashJoin(HashJoinState *node)
node->hj_HashTable = NULL;
}
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->js.ps);
-
- /*
- * clean out the tuple table
- */
- ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->hj_OuterTupleSlot);
- ExecClearTuple(node->hj_HashTupleSlot);
-
/*
* clean up subtrees
*/
diff --git a/src/backend/executor/nodeIncrementalSort.c b/src/backend/executor/nodeIncrementalSort.c
index 7683e3341c..dcb8470ba7 100644
--- a/src/backend/executor/nodeIncrementalSort.c
+++ b/src/backend/executor/nodeIncrementalSort.c
@@ -1079,14 +1079,6 @@ ExecEndIncrementalSort(IncrementalSortState *node)
{
SO_printf("ExecEndIncrementalSort: shutting down sort node\n");
- /* clean out the scan tuple */
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
- /* must drop pointer to sort result tuple */
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- /* must drop standalone tuple slots from outer node */
- ExecDropSingleTupleTableSlot(node->group_pivot);
- ExecDropSingleTupleTableSlot(node->transfer_tuple);
-
/*
* Release tuplesort resources.
*/
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 0b43a9b969..f1db35665c 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -380,22 +380,6 @@ ExecEndIndexOnlyScan(IndexOnlyScanState *node)
node->ioss_VMBuffer = InvalidBuffer;
}
- /*
- * Free the exprcontext(s) ... now dead code, see ExecFreeExprContext
- */
-#ifdef NOT_USED
- ExecFreeExprContext(&node->ss.ps);
- if (node->ioss_RuntimeContext)
- FreeExprContext(node->ioss_RuntimeContext, true);
-#endif
-
- /*
- * clear out tuple table slots
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
/*
* close the index relation (no-op if we didn't open it)
*/
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 4540c7781d..14b9c00217 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -794,22 +794,6 @@ ExecEndIndexScan(IndexScanState *node)
indexRelationDesc = node->iss_RelationDesc;
indexScanDesc = node->iss_ScanDesc;
- /*
- * Free the exprcontext(s) ... now dead code, see ExecFreeExprContext
- */
-#ifdef NOT_USED
- ExecFreeExprContext(&node->ss.ps);
- if (node->iss_RuntimeContext)
- FreeExprContext(node->iss_RuntimeContext, true);
-#endif
-
- /*
- * clear out tuple table slots
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
/*
* close the index relation (no-op if we didn't open it)
*/
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index 425fbfc405..5654158e3e 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -534,7 +534,6 @@ ExecInitLimit(Limit *node, EState *estate, int eflags)
void
ExecEndLimit(LimitState *node)
{
- ExecFreeExprContext(&node->ps);
ExecEndNode(outerPlanState(node));
}
diff --git a/src/backend/executor/nodeMaterial.c b/src/backend/executor/nodeMaterial.c
index 09632678b0..753ea28915 100644
--- a/src/backend/executor/nodeMaterial.c
+++ b/src/backend/executor/nodeMaterial.c
@@ -239,11 +239,6 @@ ExecInitMaterial(Material *node, EState *estate, int eflags)
void
ExecEndMaterial(MaterialState *node)
{
- /*
- * clean out the tuple table
- */
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
/*
* Release tuplestore resources
*/
diff --git a/src/backend/executor/nodeMemoize.c b/src/backend/executor/nodeMemoize.c
index 4f04269e26..94bf479287 100644
--- a/src/backend/executor/nodeMemoize.c
+++ b/src/backend/executor/nodeMemoize.c
@@ -1091,15 +1091,6 @@ ExecEndMemoize(MemoizeState *node)
/* Remove the cache context */
MemoryContextDelete(node->tableContext);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
- /* must drop pointer to cache result tuple */
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
-
- /*
- * free exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
/*
* shut down the subplan
*/
diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 00f96d045e..648fdd9a5f 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -1642,18 +1642,6 @@ ExecEndMergeJoin(MergeJoinState *node)
{
MJ1_printf("ExecEndMergeJoin: %s\n",
"ending node processing");
-
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->js.ps);
-
- /*
- * clean out the tuple table
- */
- ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->mj_MarkedTupleSlot);
-
/*
* shut down the subplans
*/
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 5005d8c0d1..d21a178ad5 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -4446,17 +4446,6 @@ ExecEndModifyTable(ModifyTableState *node)
ExecDropSingleTupleTableSlot(node->mt_root_tuple_slot);
}
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->ps);
-
- /*
- * clean out the tuple table
- */
- if (node->ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ps.ps_ResultTupleSlot);
-
/*
* Terminate EPQ execution if active
*/
diff --git a/src/backend/executor/nodeNamedtuplestorescan.c b/src/backend/executor/nodeNamedtuplestorescan.c
index 46832ad82f..3547dc2b10 100644
--- a/src/backend/executor/nodeNamedtuplestorescan.c
+++ b/src/backend/executor/nodeNamedtuplestorescan.c
@@ -155,28 +155,6 @@ ExecInitNamedTuplestoreScan(NamedTuplestoreScan *node, EState *estate, int eflag
return scanstate;
}
-/* ----------------------------------------------------------------
- * ExecEndNamedTuplestoreScan
- *
- * frees any storage allocated through C routines.
- * ----------------------------------------------------------------
- */
-void
-ExecEndNamedTuplestoreScan(NamedTuplestoreScanState *node)
-{
- /*
- * Free exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clean out the tuple table
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-}
-
/* ----------------------------------------------------------------
* ExecReScanNamedTuplestoreScan
*
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index b3d52e69ec..fc8f833d8b 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -363,17 +363,6 @@ ExecEndNestLoop(NestLoopState *node)
{
NL1_printf("ExecEndNestLoop: %s\n",
"ending node processing");
-
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->js.ps);
-
- /*
- * clean out the tuple table
- */
- ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
-
/*
* close down subplans
*/
diff --git a/src/backend/executor/nodeProjectSet.c b/src/backend/executor/nodeProjectSet.c
index f6ff3dc44c..b4bbdc89b1 100644
--- a/src/backend/executor/nodeProjectSet.c
+++ b/src/backend/executor/nodeProjectSet.c
@@ -320,16 +320,6 @@ ExecInitProjectSet(ProjectSet *node, EState *estate, int eflags)
void
ExecEndProjectSet(ProjectSetState *node)
{
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->ps);
-
- /*
- * clean out the tuple table
- */
- ExecClearTuple(node->ps.ps_ResultTupleSlot);
-
/*
* shut down subplans
*/
diff --git a/src/backend/executor/nodeResult.c b/src/backend/executor/nodeResult.c
index 4219712d30..e9f5732f33 100644
--- a/src/backend/executor/nodeResult.c
+++ b/src/backend/executor/nodeResult.c
@@ -240,16 +240,6 @@ ExecInitResult(Result *node, EState *estate, int eflags)
void
ExecEndResult(ResultState *node)
{
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->ps);
-
- /*
- * clean out the tuple table
- */
- ExecClearTuple(node->ps.ps_ResultTupleSlot);
-
/*
* shut down subplans
*/
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index d7e22b1dbb..1aa0e2a205 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -185,21 +185,10 @@ ExecEndSampleScan(SampleScanState *node)
/*
* Tell sampling function that we finished the scan.
*/
+ Assert(node->tsmroutine);
if (node->tsmroutine->EndSampleScan)
node->tsmroutine->EndSampleScan(node);
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clean out the tuple table
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
/*
* close heap scan
*/
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 4da0f28f7b..49a5933aff 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -190,18 +190,6 @@ ExecEndSeqScan(SeqScanState *node)
*/
scanDesc = node->ss.ss_currentScanDesc;
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clean out the tuple table
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
/*
* close heap scan
*/
diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index 4bc2406b89..98c1b84d43 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -582,13 +582,9 @@ ExecInitSetOp(SetOp *node, EState *estate, int eflags)
void
ExecEndSetOp(SetOpState *node)
{
- /* clean up tuple table */
- ExecClearTuple(node->ps.ps_ResultTupleSlot);
-
/* free subsidiary stuff including hashtable */
if (node->tableContext)
MemoryContextDelete(node->tableContext);
- ExecFreeExprContext(&node->ps);
ExecEndNode(outerPlanState(node));
}
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index c6c72c6e67..eea7f2ae15 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -303,13 +303,6 @@ ExecEndSort(SortState *node)
SO1_printf("ExecEndSort: %s\n",
"shutting down sort node");
- /*
- * clean out the tuple table
- */
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
- /* must drop pointer to sort result tuple */
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
-
/*
* Release tuplesort resources
*/
diff --git a/src/backend/executor/nodeSubqueryscan.c b/src/backend/executor/nodeSubqueryscan.c
index 42471bfc04..1ee6295660 100644
--- a/src/backend/executor/nodeSubqueryscan.c
+++ b/src/backend/executor/nodeSubqueryscan.c
@@ -167,18 +167,6 @@ ExecInitSubqueryScan(SubqueryScan *node, EState *estate, int eflags)
void
ExecEndSubqueryScan(SubqueryScanState *node)
{
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clean out the upper tuple table
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
/*
* close down subquery
*/
diff --git a/src/backend/executor/nodeTableFuncscan.c b/src/backend/executor/nodeTableFuncscan.c
index 791cbd2372..a60dcd4943 100644
--- a/src/backend/executor/nodeTableFuncscan.c
+++ b/src/backend/executor/nodeTableFuncscan.c
@@ -213,18 +213,6 @@ ExecInitTableFuncScan(TableFuncScan *node, EState *estate, int eflags)
void
ExecEndTableFuncScan(TableFuncScanState *node)
{
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clean out the tuple table
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
/*
* Release tuplestore resources
*/
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 2124c55ef5..da622d3f5f 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -331,18 +331,6 @@ ExecEndTidRangeScan(TidRangeScanState *node)
if (scan != NULL)
table_endscan(scan);
-
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clear out tuple table slots
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index 862bd0330b..15055077d0 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -472,18 +472,6 @@ ExecEndTidScan(TidScanState *node)
{
if (node->ss.ss_currentScanDesc)
table_endscan(node->ss.ss_currentScanDesc);
-
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clear out tuple table slots
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeUnique.c b/src/backend/executor/nodeUnique.c
index 45035d74fa..01f951197c 100644
--- a/src/backend/executor/nodeUnique.c
+++ b/src/backend/executor/nodeUnique.c
@@ -168,11 +168,6 @@ ExecInitUnique(Unique *node, EState *estate, int eflags)
void
ExecEndUnique(UniqueState *node)
{
- /* clean up tuple table */
- ExecClearTuple(node->ps.ps_ResultTupleSlot);
-
- ExecFreeExprContext(&node->ps);
-
ExecEndNode(outerPlanState(node));
}
diff --git a/src/backend/executor/nodeValuesscan.c b/src/backend/executor/nodeValuesscan.c
index 32ace63017..fbfb067f3b 100644
--- a/src/backend/executor/nodeValuesscan.c
+++ b/src/backend/executor/nodeValuesscan.c
@@ -319,30 +319,6 @@ ExecInitValuesScan(ValuesScan *node, EState *estate, int eflags)
return scanstate;
}
-/* ----------------------------------------------------------------
- * ExecEndValuesScan
- *
- * frees any storage allocated through C routines.
- * ----------------------------------------------------------------
- */
-void
-ExecEndValuesScan(ValuesScanState *node)
-{
- /*
- * Free both exprcontexts
- */
- ExecFreeExprContext(&node->ss.ps);
- node->ss.ps.ps_ExprContext = node->rowcontext;
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clean out the tuple table
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-}
-
/* ----------------------------------------------------------------
* ExecReScanValuesScan
*
diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 310ac23e3a..77724a6daa 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -2686,23 +2686,6 @@ ExecEndWindowAgg(WindowAggState *node)
release_partition(node);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
- ExecClearTuple(node->first_part_slot);
- ExecClearTuple(node->agg_row_slot);
- ExecClearTuple(node->temp_slot_1);
- ExecClearTuple(node->temp_slot_2);
- if (node->framehead_slot)
- ExecClearTuple(node->framehead_slot);
- if (node->frametail_slot)
- ExecClearTuple(node->frametail_slot);
-
- /*
- * Free both the expr contexts.
- */
- ExecFreeExprContext(&node->ss.ps);
- node->ss.ps.ps_ExprContext = node->tmpcontext;
- ExecFreeExprContext(&node->ss.ps);
-
for (i = 0; i < node->numaggs; i++)
{
if (node->peragg[i].aggcontext != node->aggcontext)
diff --git a/src/backend/executor/nodeWorktablescan.c b/src/backend/executor/nodeWorktablescan.c
index 0c13448236..17a548865e 100644
--- a/src/backend/executor/nodeWorktablescan.c
+++ b/src/backend/executor/nodeWorktablescan.c
@@ -181,28 +181,6 @@ ExecInitWorkTableScan(WorkTableScan *node, EState *estate, int eflags)
return scanstate;
}
-/* ----------------------------------------------------------------
- * ExecEndWorkTableScan
- *
- * frees any storage allocated through C routines.
- * ----------------------------------------------------------------
- */
-void
-ExecEndWorkTableScan(WorkTableScanState *node)
-{
- /*
- * Free exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clean out the tuple table
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-}
-
/* ----------------------------------------------------------------
* ExecReScanWorkTableScan
*
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index c677e490d7..aeebe0e0ff 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -569,7 +569,6 @@ extern void ExecAssignProjectionInfo(PlanState *planstate,
TupleDesc inputDesc);
extern void ExecConditionalAssignProjectionInfo(PlanState *planstate,
TupleDesc inputDesc, int varno);
-extern void ExecFreeExprContext(PlanState *planstate);
extern void ExecAssignScanType(ScanState *scanstate, TupleDesc tupDesc);
extern void ExecCreateScanSlotFromOuterPlan(EState *estate,
ScanState *scanstate,
diff --git a/src/include/executor/nodeNamedtuplestorescan.h b/src/include/executor/nodeNamedtuplestorescan.h
index 3ff687023a..9d80236fe5 100644
--- a/src/include/executor/nodeNamedtuplestorescan.h
+++ b/src/include/executor/nodeNamedtuplestorescan.h
@@ -17,7 +17,6 @@
#include "nodes/execnodes.h"
extern NamedTuplestoreScanState *ExecInitNamedTuplestoreScan(NamedTuplestoreScan *node, EState *estate, int eflags);
-extern void ExecEndNamedTuplestoreScan(NamedTuplestoreScanState *node);
extern void ExecReScanNamedTuplestoreScan(NamedTuplestoreScanState *node);
#endif /* NODENAMEDTUPLESTORESCAN_H */
diff --git a/src/include/executor/nodeValuesscan.h b/src/include/executor/nodeValuesscan.h
index a52fa678df..fe3f043951 100644
--- a/src/include/executor/nodeValuesscan.h
+++ b/src/include/executor/nodeValuesscan.h
@@ -17,7 +17,6 @@
#include "nodes/execnodes.h"
extern ValuesScanState *ExecInitValuesScan(ValuesScan *node, EState *estate, int eflags);
-extern void ExecEndValuesScan(ValuesScanState *node);
extern void ExecReScanValuesScan(ValuesScanState *node);
#endif /* NODEVALUESSCAN_H */
diff --git a/src/include/executor/nodeWorktablescan.h b/src/include/executor/nodeWorktablescan.h
index e553a453f3..f31b22cec4 100644
--- a/src/include/executor/nodeWorktablescan.h
+++ b/src/include/executor/nodeWorktablescan.h
@@ -17,7 +17,6 @@
#include "nodes/execnodes.h"
extern WorkTableScanState *ExecInitWorkTableScan(WorkTableScan *node, EState *estate, int eflags);
-extern void ExecEndWorkTableScan(WorkTableScanState *node);
extern void ExecReScanWorkTableScan(WorkTableScanState *node);
#endif /* NODEWORKTABLESCAN_H */
--
2.35.3
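The rationale for these removals is that per-node expression contexts and
tuple table slots live in per-query memory owned by the EState, so executor
shutdown reclaims them wholesale (the executor README makes the same point:
it all goes away in FreeExecutorState() anyway). A minimal sketch of that
ownership, with illustrative variable names:

    EState     *estate = CreateExecutorState();
    ExprContext *econtext = CreateExprContext(estate);

    /* ... expressions get evaluated in econtext during execution ... */

    FreeExecutorState(estate);      /* shuts down econtext and releases all
                                     * per-query memory in one go */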
v46-0002-Check-pointer-NULLness-before-cleanup-in-ExecEnd.patch (application/octet-stream)
From 3dfe81f48a58a92e8c81469600d3502f18a8b137 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Fri, 1 Sep 2023 22:05:35 +0900
Subject: [PATCH v46 2/8] Check pointer NULLness before cleanup in ExecEnd*
routines
Many routines already perform such checks, but a few instances remain where
they are missing.
Currently, these NULLness checks might seem redundant since ExecEnd*
routines operate under the assumption that their matching ExecInit*
routine would have fully executed, ensuring pointers are set. However,
a forthcoming patch will modify ExecInit* routines to sometimes exit
early, potentially leaving some pointers in an undetermined state.
---
src/backend/executor/nodeAgg.c | 3 ++-
src/backend/executor/nodeBitmapHeapscan.c | 3 ++-
src/backend/executor/nodeForeignscan.c | 21 ++++++++------------
src/backend/executor/nodeMemoize.c | 1 +
src/backend/executor/nodeRecursiveunion.c | 6 ++++--
src/backend/executor/nodeWindowAgg.c | 24 +++++++++++++++--------
6 files changed, 33 insertions(+), 25 deletions(-)
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index f154f28902..aac9e9fc80 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -3150,7 +3150,8 @@ hashagg_reset_spill_state(AggState *aggstate)
}
/* free batches */
- list_free_deep(aggstate->hash_batches);
+ if (aggstate->hash_batches)
+ list_free_deep(aggstate->hash_batches);
aggstate->hash_batches = NIL;
/* close tape set */
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 2db0acfc76..ffa51c06b4 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -681,7 +681,8 @@ ExecEndBitmapHeapScan(BitmapHeapScanState *node)
/*
* close heap scan
*/
- table_endscan(scanDesc);
+ if (scanDesc)
+ table_endscan(scanDesc);
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index c2139acca0..d5aaa983f7 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -301,25 +301,20 @@ ExecEndForeignScan(ForeignScanState *node)
EState *estate = node->ss.ps.state;
/* Let the FDW shut down */
- if (plan->operation != CMD_SELECT)
+ if (node->fdwroutine)
{
- if (estate->es_epq_active == NULL)
- node->fdwroutine->EndDirectModify(node);
+ if (plan->operation != CMD_SELECT)
+ {
+ if (estate->es_epq_active == NULL)
+ node->fdwroutine->EndDirectModify(node);
+ }
+ else
+ node->fdwroutine->EndForeignScan(node);
}
- else
- node->fdwroutine->EndForeignScan(node);
/* Shut down any outer plan. */
if (outerPlanState(node))
ExecEndNode(outerPlanState(node));
-
- /* Free the exprcontext */
- ExecFreeExprContext(&node->ss.ps);
-
- /* clean out the tuple table */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeMemoize.c b/src/backend/executor/nodeMemoize.c
index 94bf479287..5352ca10c8 100644
--- a/src/backend/executor/nodeMemoize.c
+++ b/src/backend/executor/nodeMemoize.c
@@ -1043,6 +1043,7 @@ ExecEndMemoize(MemoizeState *node)
{
#ifdef USE_ASSERT_CHECKING
/* Validate the memory accounting code is correct in assert builds. */
+ if (node->hashtable)
{
int count;
uint64 mem = 0;
diff --git a/src/backend/executor/nodeRecursiveunion.c b/src/backend/executor/nodeRecursiveunion.c
index e781003934..3dfcb4cafb 100644
--- a/src/backend/executor/nodeRecursiveunion.c
+++ b/src/backend/executor/nodeRecursiveunion.c
@@ -272,8 +272,10 @@ void
ExecEndRecursiveUnion(RecursiveUnionState *node)
{
/* Release tuplestores */
- tuplestore_end(node->working_table);
- tuplestore_end(node->intermediate_table);
+ if (node->working_table)
+ tuplestore_end(node->working_table);
+ if (node->intermediate_table)
+ tuplestore_end(node->intermediate_table);
/* free subsidiary stuff including hashtable */
if (node->tempContext)
diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 77724a6daa..3849d2f847 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -1351,11 +1351,14 @@ release_partition(WindowAggState *winstate)
* any aggregate temp data). We don't rely on retail pfree because some
* aggregates might have allocated data we don't have direct pointers to.
*/
- MemoryContextResetAndDeleteChildren(winstate->partcontext);
- MemoryContextResetAndDeleteChildren(winstate->aggcontext);
+ if (winstate->partcontext)
+ MemoryContextResetAndDeleteChildren(winstate->partcontext);
+ if (winstate->aggcontext)
+ MemoryContextResetAndDeleteChildren(winstate->aggcontext);
for (i = 0; i < winstate->numaggs; i++)
{
- if (winstate->peragg[i].aggcontext != winstate->aggcontext)
+ if (winstate->peragg[i].aggcontext &&
+ winstate->peragg[i].aggcontext != winstate->aggcontext)
MemoryContextResetAndDeleteChildren(winstate->peragg[i].aggcontext);
}
@@ -2688,14 +2691,19 @@ ExecEndWindowAgg(WindowAggState *node)
for (i = 0; i < node->numaggs; i++)
{
- if (node->peragg[i].aggcontext != node->aggcontext)
+ if (node->peragg[i].aggcontext &&
+ node->peragg[i].aggcontext != node->aggcontext)
MemoryContextDelete(node->peragg[i].aggcontext);
}
- MemoryContextDelete(node->partcontext);
- MemoryContextDelete(node->aggcontext);
+ if (node->partcontext)
+ MemoryContextDelete(node->partcontext);
+ if (node->aggcontext)
+ MemoryContextDelete(node->aggcontext);
- pfree(node->perfunc);
- pfree(node->peragg);
+ if (node->perfunc)
+ pfree(node->perfunc);
+ if (node->peragg)
+ pfree(node->peragg);
outerPlan = outerPlanState(node);
ExecEndNode(outerPlan);
--
2.35.3
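To restate 0002's premise in one place: once an ExecInit* routine can return
early, its matching ExecEnd* routine must tolerate fields that were never
filled in. A minimal sketch of the resulting shape (FooState and its fields
are illustrative, not from the patch):

    void
    ExecEndFoo(FooState *node)
    {
        /* the scan may never have been started if ExecInitFoo() bailed early */
        if (node->scandesc != NULL)
            table_endscan(node->scandesc);

        /* contexts created late in ExecInitFoo() may likewise be absent */
        if (node->tmpcontext != NULL)
            MemoryContextDelete(node->tmpcontext);

        /* ExecEndNode() already copes with a NULL child */
        ExecEndNode(outerPlanState(node));
    }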
On Tue, Sep 5, 2023 at 3:13 AM Amit Langote <amitlangote09@gmail.com> wrote:
> Attached 0001 removes unnecessary cleanup calls from ExecEnd*() routines.
It also adds a few random Assert()s to verify that unrelated pointers
are not NULL. I suggest that it shouldn't do that.
The commit message doesn't mention the removal of the calls to
ExecDropSingleTupleTableSlot. It's not clear to me why that's OK and I
think it would be nice to mention it in the commit message, assuming
that it is in fact OK.
I suggest changing the subject line of the commit to something like
"Remove obsolete executor cleanup code."
> 0002 adds NULLness checks in ExecEnd*() routines on some pointers that
> may not be initialized by the corresponding ExecInit*() routines in
> the case where it returns early.
I think you should only add these where it's needed. For example, I
think list_free_deep(NIL) is fine.
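That's safe because NIL is simply the empty List, which the list-freeing
routines return from immediately; for instance (variable name made up):

    List       *batches = NIL;

    list_free_deep(batches);    /* no-op: nothing to walk or free */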
The changes to ExecEndForeignScan look like they include stuff that
belongs in 0001.
Personally, I prefer explicit NULL-tests i.e. if (x != NULL) to
implicit ones like if (x), but opinions vary.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Sep 5, 2023 at 11:41 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Sep 5, 2023 at 3:13 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > Attached 0001 removes unnecessary cleanup calls from ExecEnd*() routines.
>
> It also adds a few random Assert()s to verify that unrelated pointers
> are not NULL. I suggest that it shouldn't do that.

OK, removed.

> The commit message doesn't mention the removal of the calls to
> ExecDropSingleTupleTableSlot. It's not clear to me why that's OK and I
> think it would be nice to mention it in the commit message, assuming
> that it is in fact OK.

That is not OK, so I dropped their removal. I think I confused them
with slots in other functions initialized with
ExecInitExtraTupleSlot() that *are* put into the estate.

> I suggest changing the subject line of the commit to something like
> "Remove obsolete executor cleanup code."

Sure.

> > 0002 adds NULLness checks in ExecEnd*() routines on some pointers that
> > may not be initialized by the corresponding ExecInit*() routines in
> > the case where it returns early.
>
> I think you should only add these where it's needed. For example, I
> think list_free_deep(NIL) is fine.

OK, done.

> The changes to ExecEndForeignScan look like they include stuff that
> belongs in 0001.

Oops, yes. Moved to 0001.

> Personally, I prefer explicit NULL-tests i.e. if (x != NULL) to
> implicit ones like if (x), but opinions vary.
I agree, so I changed all the new tests to use the (x != NULL) form.
Typically, I try to stick with whatever style is used in the nearby
code, though I can see both styles being used in the ExecEnd*()
routines. I opted to use the style that we both happen to prefer.
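To illustrate, the RecursiveUnion hunk from v46-0002 above takes this shape
in the updated patches:

    if (node->working_table != NULL)
        tuplestore_end(node->working_table);
    if (node->intermediate_table != NULL)
        tuplestore_end(node->intermediate_table);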
Attached updated patches. Thanks for the review.
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v47-0005-Add-field-to-store-parent-relids-to-Append-Merge.patch (application/octet-stream)
From 32b706d61e4517654e85676477676cc11f682f63 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Wed, 6 Sep 2023 17:54:02 +0900
Subject: [PATCH v47 5/8] Add field to store parent relids to
Append/MergeAppend
There's currently no way in the executor to tell if the child
subplans of Append/MergeAppend are scanning partitions, and if
they indeed are, what the RT indexes of their parent/ancestor tables
are. The executor doesn't need to see their RT indexes except for
run-time pruning, in which case they can be found in the
PartitionPruneInfo, but a future commit will create a need for
them to be available at all times for the purpose of locking
those parent/ancestor tables when executing a cached plan.
The code to look up partitioned parent relids for a given list of
partition scan subpaths of an Append/MergeAppend is already present
in make_partition_pruneinfo() but it's local to partprune.c. This
commit refactors that code into its own function called
add_append_subpath_partrelids() defined in appendinfo.c and
generalizes it to consider child join and aggregate paths. To
facilitate looking up the parent rels of child grouping rels in
add_append_subpath_partrelids(), parent links are now also set in
the RelOptInfos of child grouping rels, like they are in
those of child base and join rels.
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
src/backend/optimizer/plan/createplan.c | 41 ++++++--
src/backend/optimizer/plan/planner.c | 3 +
src/backend/optimizer/plan/setrefs.c | 4 +
src/backend/optimizer/util/appendinfo.c | 134 ++++++++++++++++++++++++
src/backend/partitioning/partprune.c | 124 +++-------------------
src/include/nodes/plannodes.h | 14 +++
src/include/optimizer/appendinfo.h | 3 +
src/include/partitioning/partprune.h | 3 +-
8 files changed, 203 insertions(+), 123 deletions(-)
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 34ca6d4ac2..d1f4f606bf 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -25,6 +25,7 @@
#include "nodes/extensible.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
+#include "optimizer/appendinfo.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/optimizer.h"
@@ -1229,6 +1230,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
Oid *nodeCollations = NULL;
bool *nodeNullsFirst = NULL;
bool consider_async = false;
+ List *allpartrelids = NIL;
/*
* The subpaths list could be empty, if every child was proven empty by
@@ -1370,15 +1372,23 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
++nasyncplans;
}
+ /*
+ * Find partitioned parent rel(s) of the subpath's rel(s).
+ */
+ allpartrelids = add_append_subpath_partrelids(root, subpath, rel,
+ allpartrelids);
+
subplans = lappend(subplans, subplan);
}
+ plan->allpartrelids = allpartrelids;
+
/*
- * If any quals exist, they may be useful to perform further partition
- * pruning during execution. Gather information needed by the executor to
- * do partition pruning.
+ * If scanning partitions, check if there are quals that may be useful to
+ * perform further partition pruning during execution. Gather information
+ * needed by the executor to do partition pruning.
*/
- if (enable_partition_pruning)
+ if (enable_partition_pruning && allpartrelids != NIL)
{
List *prunequal;
@@ -1399,7 +1409,8 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
partpruneinfo =
make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ allpartrelids);
}
plan->appendplans = subplans;
@@ -1445,6 +1456,7 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
PartitionPruneInfo *partpruneinfo = NULL;
+ List *allpartrelids = NIL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1534,15 +1546,23 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplan = (Plan *) sort;
}
+ /*
+ * Find partitioned parent rel(s) of the subpath's rel(s).
+ */
+ allpartrelids = add_append_subpath_partrelids(root, subpath, rel,
+ allpartrelids);
+
subplans = lappend(subplans, subplan);
}
+ node->allpartrelids = allpartrelids;
+
/*
- * If any quals exist, they may be useful to perform further partition
- * pruning during execution. Gather information needed by the executor to
- * do partition pruning.
+ * If scanning partitions, check if there are quals that may be useful to
+ * perform further partition pruning during execution. Gather information
+ * needed by the executor to do partition pruning.
*/
- if (enable_partition_pruning)
+ if (enable_partition_pruning && allpartrelids != NIL)
{
List *prunequal;
@@ -1554,7 +1574,8 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
if (prunequal != NIL)
partpruneinfo = make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ allpartrelids);
}
node->mergeplans = subplans;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 44efb1f4eb..f97bc09113 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -7855,8 +7855,11 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
agg_costs, gd, &child_extra,
&child_partially_grouped_rel);
+ /* Mark as child of grouped_rel. */
+ child_grouped_rel->parent = grouped_rel;
if (child_partially_grouped_rel)
{
+ child_partially_grouped_rel->parent = grouped_rel;
partially_grouped_live_children =
lappend(partially_grouped_live_children,
child_partially_grouped_rel);
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 97fa561e4e..854dd7c8af 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1766,6 +1766,8 @@ set_append_references(PlannerInfo *root,
set_dummy_tlist_references((Plan *) aplan, rtoffset);
aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
+ foreach(l, aplan->allpartrelids)
+ lfirst(l) = offset_relid_set((Relids) lfirst(l), rtoffset);
if (aplan->part_prune_info)
{
@@ -1842,6 +1844,8 @@ set_mergeappend_references(PlannerInfo *root,
set_dummy_tlist_references((Plan *) mplan, rtoffset);
mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
+ foreach(l, mplan->allpartrelids)
+ lfirst(l) = offset_relid_set((Relids) lfirst(l), rtoffset);
if (mplan->part_prune_info)
{
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index f456b3b0a4..5bd8e82b9b 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -41,6 +41,7 @@ static void make_inh_translation_list(Relation oldrelation,
AppendRelInfo *appinfo);
static Node *adjust_appendrel_attrs_mutator(Node *node,
adjust_appendrel_attrs_context *context);
+static List *add_part_relids(List *allpartrelids, Bitmapset *partrelids);
/*
@@ -1035,3 +1036,136 @@ distribute_row_identity_vars(PlannerInfo *root)
}
}
}
+
+/*
+ * add_append_subpath_partrelids
+ * Look up a child subpath's rel's partitioned parent relids up to
+ * parentrel and add the bitmapset containing those into
+ * 'allpartrelids'
+ */
+List *
+add_append_subpath_partrelids(PlannerInfo *root, Path *subpath,
+ RelOptInfo *parentrel,
+ List *allpartrelids)
+{
+ RelOptInfo *prel = subpath->parent;
+ Relids partrelids = NULL;
+
+ /* Nothing to do if there's no parent to begin with. */
+ if (!IS_OTHER_REL(prel))
+ return allpartrelids;
+
+ /*
+ * Traverse up to the pathrel's topmost partitioned parent, collecting
+ * parent relids as we go; but stop if we reach parentrel. (Normally, a
+ * pathrel's topmost partitioned parent is either parentrel or a UNION ALL
+ * appendrel child of parentrel. But when handling partitionwise joins of
+ * multi-level partitioning trees, we can see an append path whose
+ * parentrel is an intermediate partitioned table.)
+ */
+ do
+ {
+ Relids parent_relids = NULL;
+
+ /*
+ * For simple child rels, we can simply set the parent_relids to
+ * prel->parent->relids. But for partitionwise join and aggregate
+ * child rels, while we can use prel->parent to move up the tree,
+ * parent_relids must be found the hard way through AppendRelInfos,
+ * because 1) a joinrel's relids may point to RTE_JOIN entries,
+ * 2) the topmost parent grouping rel's relids field is NULL.
+ */
+ if (IS_SIMPLE_REL(prel))
+ {
+ prel = prel->parent;
+ /* Stop once we reach the root partitioned rel. */
+ if (!IS_PARTITIONED_REL(prel))
+ break;
+ parent_relids = bms_add_members(parent_relids, prel->relids);
+ }
+ else
+ {
+ AppendRelInfo **appinfos;
+ int nappinfos,
+ i;
+
+ appinfos = find_appinfos_by_relids(root, prel->relids,
+ &nappinfos);
+ for (i = 0; i < nappinfos; i++)
+ {
+ AppendRelInfo *appinfo = appinfos[i];
+
+ parent_relids = bms_add_member(parent_relids,
+ appinfo->parent_relid);
+ }
+ pfree(appinfos);
+ prel = prel->parent;
+ }
+ /* accept this level as an interesting parent */
+ partrelids = bms_add_members(partrelids, parent_relids);
+ if (prel == parentrel)
+ break; /* don't traverse above parentrel */
+ } while (IS_OTHER_REL(prel));
+
+ if (partrelids == NULL)
+ return allpartrelids;
+
+ return add_part_relids(allpartrelids, partrelids);
+}
+
+/*
+ * add_part_relids
+ * Add new info to a list of Bitmapsets of partitioned relids.
+ *
+ * Within 'allpartrelids', there is one Bitmapset for each topmost parent
+ * partitioned rel. Each Bitmapset contains the RT indexes of the topmost
+ * parent as well as its relevant non-leaf child partitions. Since (by
+ * construction of the rangetable list) parent partitions must have lower
+ * RT indexes than their children, we can distinguish the topmost parent
+ * as being the lowest set bit in the Bitmapset.
+ *
+ * 'partrelids' contains the RT indexes of a parent partitioned rel, and
+ * possibly some non-leaf children, that are newly identified as parents of
+ * some subpath rel passed to make_partition_pruneinfo(). These are added
+ * to an appropriate member of 'allpartrelids'.
+ *
+ * Note that the list contains only RT indexes of partitioned tables that
+ * are parents of some scan-level relation appearing in the 'subpaths' that
+ * make_partition_pruneinfo() is dealing with. Also, "topmost" parents are
+ * not allowed to be higher than the 'parentrel' associated with the append
+ * path. In this way, we avoid expending cycles on partitioned rels that
+ * can't contribute useful pruning information for the problem at hand.
+ * (It is possible for 'parentrel' to be a child partitioned table, and it
+ * is also possible for scan-level relations to be child partitioned tables
+ * rather than leaf partitions. Hence we must construct this relation set
+ * with reference to the particular append path we're dealing with, rather
+ * than looking at the full partitioning structure represented in the
+ * RelOptInfos.)
+ */
+static List *
+add_part_relids(List *allpartrelids, Bitmapset *partrelids)
+{
+ Index targetpart;
+ ListCell *lc;
+
+ /* We can easily get the lowest set bit this way: */
+ targetpart = bms_next_member(partrelids, -1);
+ Assert(targetpart > 0);
+
+ /* Look for a matching topmost parent */
+ foreach(lc, allpartrelids)
+ {
+ Bitmapset *currpartrelids = (Bitmapset *) lfirst(lc);
+ Index currtarget = bms_next_member(currpartrelids, -1);
+
+ if (targetpart == currtarget)
+ {
+ /* Found a match, so add any new RT indexes to this hierarchy */
+ currpartrelids = bms_add_members(currpartrelids, partrelids);
+ lfirst(lc) = currpartrelids;
+ return allpartrelids;
+ }
+ }
+ /* No match, so add the new partition hierarchy to the list */
+ return lappend(allpartrelids, partrelids);
+}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 7179b22a05..213512a5f4 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -138,7 +138,6 @@ typedef struct PruneStepResult
} PruneStepResult;
-static List *add_part_relids(List *allpartrelids, Bitmapset *partrelids);
static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
RelOptInfo *parentrel,
List *prunequal,
@@ -218,33 +217,32 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
+ * 'allpartrelids' contains Bitmapsets of RT indexes of partitioned parents
+ * whose partitions' Paths are in 'subpaths'; there's one Bitmapset for every
+ * partition tree involved.
*/
PartitionPruneInfo *
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
- List *prunequal)
+ List *prunequal,
+ List *allpartrelids)
{
PartitionPruneInfo *pruneinfo;
Bitmapset *allmatchedsubplans = NULL;
- List *allpartrelids;
List *prunerelinfos;
int *relid_subplan_map;
ListCell *lc;
int i;
+ Assert(list_length(allpartrelids) > 0);
+
/*
- * Scan the subpaths to see which ones are scans of partition child
- * relations, and identify their parent partitioned rels. (Note: we must
- * restrict the parent partitioned rels to be parentrel or children of
- * parentrel, otherwise we couldn't translate prunequal to match.)
- *
- * Also construct a temporary array to map from partition-child-relation
- * relid to the index in 'subpaths' of the scan plan for that partition.
+ * Construct a temporary array to map from partition-child-relation relid
+ * to the index in 'subpaths' of the scan plan for that partition.
* (Use of "subplan" rather than "subpath" is a bit of a misnomer, but
* we'll let it stand.) For convenience, we use 1-based indexes here, so
* that zero can represent an un-filled array entry.
*/
- allpartrelids = NIL;
relid_subplan_map = palloc0(sizeof(int) * root->simple_rel_array_size);
i = 1;
@@ -253,50 +251,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Path *path = (Path *) lfirst(lc);
RelOptInfo *pathrel = path->parent;
- /* We don't consider partitioned joins here */
- if (pathrel->reloptkind == RELOPT_OTHER_MEMBER_REL)
- {
- RelOptInfo *prel = pathrel;
- Bitmapset *partrelids = NULL;
-
- /*
- * Traverse up to the pathrel's topmost partitioned parent,
- * collecting parent relids as we go; but stop if we reach
- * parentrel. (Normally, a pathrel's topmost partitioned parent
- * is either parentrel or a UNION ALL appendrel child of
- * parentrel. But when handling partitionwise joins of
- * multi-level partitioning trees, we can see an append path whose
- * parentrel is an intermediate partitioned table.)
- */
- do
- {
- AppendRelInfo *appinfo;
-
- Assert(prel->relid < root->simple_rel_array_size);
- appinfo = root->append_rel_array[prel->relid];
- prel = find_base_rel(root, appinfo->parent_relid);
- if (!IS_PARTITIONED_REL(prel))
- break; /* reached a non-partitioned parent */
- /* accept this level as an interesting parent */
- partrelids = bms_add_member(partrelids, prel->relid);
- if (prel == parentrel)
- break; /* don't traverse above parentrel */
- } while (prel->reloptkind == RELOPT_OTHER_MEMBER_REL);
-
- if (partrelids)
- {
- /*
- * Found some relevant parent partitions, which may or may not
- * overlap with partition trees we already found. Add new
- * information to the allpartrelids list.
- */
- allpartrelids = add_part_relids(allpartrelids, partrelids);
- /* Also record the subplan in relid_subplan_map[] */
- /* No duplicates please */
- Assert(relid_subplan_map[pathrel->relid] == 0);
- relid_subplan_map[pathrel->relid] = i;
- }
- }
+ /* No duplicates please */
+ Assert(relid_subplan_map[pathrel->relid] == 0);
+ relid_subplan_map[pathrel->relid] = i;
i++;
}
@@ -362,63 +319,6 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
return pruneinfo;
}
-/*
- * add_part_relids
- * Add new info to a list of Bitmapsets of partitioned relids.
- *
- * Within 'allpartrelids', there is one Bitmapset for each topmost parent
- * partitioned rel. Each Bitmapset contains the RT indexes of the topmost
- * parent as well as its relevant non-leaf child partitions. Since (by
- * construction of the rangetable list) parent partitions must have lower
- * RT indexes than their children, we can distinguish the topmost parent
- * as being the lowest set bit in the Bitmapset.
- *
- * 'partrelids' contains the RT indexes of a parent partitioned rel, and
- * possibly some non-leaf children, that are newly identified as parents of
- * some subpath rel passed to make_partition_pruneinfo(). These are added
- * to an appropriate member of 'allpartrelids'.
- *
- * Note that the list contains only RT indexes of partitioned tables that
- * are parents of some scan-level relation appearing in the 'subpaths' that
- * make_partition_pruneinfo() is dealing with. Also, "topmost" parents are
- * not allowed to be higher than the 'parentrel' associated with the append
- * path. In this way, we avoid expending cycles on partitioned rels that
- * can't contribute useful pruning information for the problem at hand.
- * (It is possible for 'parentrel' to be a child partitioned table, and it
- * is also possible for scan-level relations to be child partitioned tables
- * rather than leaf partitions. Hence we must construct this relation set
- * with reference to the particular append path we're dealing with, rather
- * than looking at the full partitioning structure represented in the
- * RelOptInfos.)
- */
-static List *
-add_part_relids(List *allpartrelids, Bitmapset *partrelids)
-{
- Index targetpart;
- ListCell *lc;
-
- /* We can easily get the lowest set bit this way: */
- targetpart = bms_next_member(partrelids, -1);
- Assert(targetpart > 0);
-
- /* Look for a matching topmost parent */
- foreach(lc, allpartrelids)
- {
- Bitmapset *currpartrelids = (Bitmapset *) lfirst(lc);
- Index currtarget = bms_next_member(currpartrelids, -1);
-
- if (targetpart == currtarget)
- {
- /* Found a match, so add any new RT indexes to this hierarchy */
- currpartrelids = bms_add_members(currpartrelids, partrelids);
- lfirst(lc) = currpartrelids;
- return allpartrelids;
- }
- }
- /* No match, so add the new partition hierarchy to the list */
- return lappend(allpartrelids, partrelids);
-}
-
/*
* make_partitionedrel_pruneinfo
* Build a List of PartitionedRelPruneInfos, one for each interesting
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 1b787fe031..7a5f3ba625 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -267,6 +267,13 @@ typedef struct Append
List *appendplans;
int nasyncplans; /* # of asynchronous plans */
+ /*
+ * RTIs of all partitioned tables whose children are scanned by
+ * appendplans. The list contains a bitmapset for every partition tree
+ * covered by this Append.
+ */
+ List *allpartrelids;
+
/*
* All 'appendplans' preceding this index are non-partial plans. All
* 'appendplans' from this index onwards are partial plans.
@@ -291,6 +298,13 @@ typedef struct MergeAppend
List *mergeplans;
+ /*
+ * RTIs of all partitioned tables whose children are scanned by
+ * mergeplans. The list contains a bitmapset for every partition tree
+ * covered by this MergeAppend.
+ */
+ List *allpartrelids;
+
/* these fields are just like the sort-key info in struct Sort: */
/* number of sort-key columns */
diff --git a/src/include/optimizer/appendinfo.h b/src/include/optimizer/appendinfo.h
index a05f91f77d..1621a7319a 100644
--- a/src/include/optimizer/appendinfo.h
+++ b/src/include/optimizer/appendinfo.h
@@ -46,5 +46,8 @@ extern void add_row_identity_columns(PlannerInfo *root, Index rtindex,
RangeTblEntry *target_rte,
Relation target_relation);
extern void distribute_row_identity_vars(PlannerInfo *root);
+extern List *add_append_subpath_partrelids(PlannerInfo *root, Path *subpath,
+ RelOptInfo *parentrel,
+ List *allpartrelids);
#endif /* APPENDINFO_H */
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 8636e04e37..caa774a111 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -73,7 +73,8 @@ typedef struct PartitionPruneContext
extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
struct RelOptInfo *parentrel,
List *subpaths,
- List *prunequal);
+ List *prunequal,
+ List *allpartrelids);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
--
2.35.3
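To make the allpartrelids representation concrete, with made-up RT indexes:
for an Append over a two-level partition tree whose root has RT index 1 and
whose intermediate partitioned child has RT index 3, the list carries a
single Bitmapset, and the topmost parent is recoverable as its lowest
member:

    Bitmapset  *partrelids = NULL;

    partrelids = bms_add_member(partrelids, 1);   /* topmost parent (lowest RTI) */
    partrelids = bms_add_member(partrelids, 3);   /* intermediate partitioned child */

    Assert(bms_next_member(partrelids, -1) == 1); /* lowest set bit == topmost parent */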
v47-0006-Set-inFromCl-to-false-in-child-table-RTEs.patch (application/octet-stream)
From c24beb3f1a74a9d3d485b85ea2a3026f71674aaa Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Wed, 6 Sep 2023 17:54:11 +0900
Subject: [PATCH v47 6/8] Set inFromCl to false in child table RTEs
This is to allow the executor to distinguish tables that are
directly mentioned in the query from those that get added to the
query during planning. A subsequent commit will teach the executor
to lock only the tables of the latter kind when executing a cached
plan.
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
src/backend/optimizer/util/inherit.c | 6 ++++++
src/backend/parser/analyze.c | 7 +++----
src/include/nodes/parsenodes.h | 9 +++++++--
3 files changed, 16 insertions(+), 6 deletions(-)
diff --git a/src/backend/optimizer/util/inherit.c b/src/backend/optimizer/util/inherit.c
index 94de855a22..9bac07bf40 100644
--- a/src/backend/optimizer/util/inherit.c
+++ b/src/backend/optimizer/util/inherit.c
@@ -492,6 +492,12 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
}
else
childrte->inh = false;
+ /*
+ * Mark child tables as not being directly mentioned in the query. This
+ * allows the executor's ExecGetRangeTableRelation() to conveniently
+ * identify them as inheritance child tables.
+ */
+ childrte->inFromCl = false;
childrte->securityQuals = NIL;
/*
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 7a1dfb6364..cf269f8c53 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -3305,10 +3305,9 @@ transformLockingClause(ParseState *pstate, Query *qry, LockingClause *lc,
/*
* Lock all regular tables used in query and its subqueries. We
* examine inFromCl to exclude auto-added RTEs, particularly NEW/OLD
- * in rules. This is a bit of an abuse of a mostly-obsolete flag, but
- * it's convenient. We can't rely on the namespace mechanism that has
- * largely replaced inFromCl, since for example we need to lock
- * base-relation RTEs even if they are masked by upper joins.
+ * in rules. We can't rely on the namespace mechanism since for
+ * example we need to lock base-relation RTEs even if they are masked
+ * by upper joins.
*/
i = 0;
foreach(rt, qry->rtable)
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index fef4c714b8..d875e11192 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -994,11 +994,16 @@ typedef struct PartitionCmd
*
* inFromCl marks those range variables that are listed in the FROM clause.
* It's false for RTEs that are added to a query behind the scenes, such
- * as the NEW and OLD variables for a rule, or the subqueries of a UNION.
+ * as the NEW and OLD variables for a rule, or the subqueries of a UNION,
+ * or the RTEs of inheritance child tables that are added by the planner.
* This flag is not used during parsing (except in transformLockingClause,
* q.v.); the parser now uses a separate "namespace" data structure to
* control visibility. But it is needed by ruleutils.c to determine
- * whether RTEs should be shown in decompiled queries.
+ * whether RTEs should be shown in decompiled queries. It is used by the
+ * executor to determine whether a given RTE_RELATION entry belongs to a table
+ * directly mentioned in the query or to a child table added by the planner.
+ * It needs to know that in cases where the child tables in a plan need
+ * to be locked.
*
* securityQuals is a list of security barrier quals (boolean expressions),
* to be tested in the listed order before returning a row from the
--
2.35.3
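The executor-side consumer of this flag is in 0007 below; the test in
ExecGetRangeTableRelation() boils down to the following (sketch of the
actual hunk):

    if (estate->es_cachedplan != NULL && !rte->inFromCl)
    {
        /* planner-added child table in a cached plan: lock it now */
        rel = table_open(rte->relid, rte->rellockmode);
    }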
v47-0008-Track-opened-range-table-relations-in-a-List-in-.patch (application/octet-stream)
From f28f8e58d3fd4120e4710f5aa9fe4060d064df22 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Wed, 6 Sep 2023 17:54:19 +0900
Subject: [PATCH v47 8/8] Track opened range table relations in a List in
EState
This makes ExecCloseRangeTableRelations faster when there are many
relations in the range table but only a few are opened during
execution, such as when run-time pruning kicks in on an Append
containing thousands of partition subplans.
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
src/backend/executor/execMain.c | 9 +++++----
src/backend/executor/execUtils.c | 2 ++
src/include/nodes/execnodes.h | 2 ++
3 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 9a3f6c5978..9d88cf30cb 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1650,12 +1650,13 @@ ExecCloseResultRelations(EState *estate)
void
ExecCloseRangeTableRelations(EState *estate)
{
- int i;
+ ListCell *lc;
- for (i = 0; i < estate->es_range_table_size; i++)
+ foreach(lc, estate->es_opened_relations)
{
- if (estate->es_relations[i])
- table_close(estate->es_relations[i], NoLock);
+ Relation rel = lfirst(lc);
+
+ table_close(rel, NoLock);
}
}
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 94c8e5e875..3d1d467807 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -812,6 +812,8 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
}
estate->es_relations[rti - 1] = rel;
+ estate->es_opened_relations = lappend(estate->es_opened_relations,
+ rel);
}
return rel;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 0922be6678..fba1527792 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -619,6 +619,8 @@ typedef struct EState
Index es_range_table_size; /* size of the range table arrays */
Relation *es_relations; /* Array of per-range-table-entry Relation
* pointers, or NULL if not yet opened */
+ List *es_opened_relations; /* List of non-NULL entries in
+ * es_relations in no specific order */
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
--
2.35.3
v47-0007-Delay-locking-of-child-tables-in-cached-plans-un.patch (application/octet-stream)
From 3acf239d7d3e4ca1db1aab2b258590ccd6cda87b Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Wed, 6 Sep 2023 17:54:15 +0900
Subject: [PATCH v47 7/8] Delay locking of child tables in cached plans until
ExecutorStart()
Currently, GetCachedPlan() takes a lock on all relations contained in
a cached plan before returning it as a valid plan to its callers for
execution. One disadvantage is that if the plan contains partitions
that are prunable with conditions involving EXTERN parameters and
other stable expressions (known as "initial pruning"), many of them
would be locked unnecessarily, because only those that survive
initial pruning need to have been locked. Locking all partitions this
way causes significant delay when there are many partitions. Note
that initial pruning occurs during the executor's initialization of the
plan, that is, in ExecInitNode().
This commit rearranges things to move the locking of child tables
referenced in a cached plan to occur during ExecInitNode() so that
initial pruning in the ExecInitNode() subroutines of the plan nodes
that support pruning can eliminate any child tables that need not be
scanned and thus locked.
To determine that a given table is a child table,
ExecGetRangeTableRelation() now looks at the RTE's inFromCl field,
which is only true for tables that are directly mentioned in the
query but false for child tables. Note that any tables whose RTEs'
inFromCl is true would already have been locked by GetCachedPlan(),
so need not be locked again during execution.
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
src/backend/commands/copyto.c | 3 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 8 +-
src/backend/commands/extension.c | 1 +
src/backend/commands/matview.c | 2 +-
src/backend/commands/prepare.c | 2 +-
src/backend/executor/README | 36 +++-
src/backend/executor/execMain.c | 18 +-
src/backend/executor/execParallel.c | 9 +-
src/backend/executor/execPartition.c | 10 ++
src/backend/executor/execUtils.c | 61 +++++--
src/backend/executor/functions.c | 1 +
src/backend/executor/nodeAppend.c | 19 +++
src/backend/executor/nodeMergeAppend.c | 19 +++
src/backend/executor/spi.c | 1 +
src/backend/storage/lmgr/lmgr.c | 45 +++++
src/backend/tcop/pquery.c | 7 +-
src/backend/utils/cache/lsyscache.c | 21 +++
src/backend/utils/cache/plancache.c | 154 +++++++----------
src/include/commands/explain.h | 2 +-
src/include/executor/execdesc.h | 4 +
src/include/executor/executor.h | 1 +
src/include/storage/lmgr.h | 1 +
src/include/utils/lsyscache.h | 1 +
src/test/modules/delay_execution/Makefile | 3 +-
.../modules/delay_execution/delay_execution.c | 67 +++++++-
.../expected/cached-plan-replan.out | 156 ++++++++++++++++++
.../specs/cached-plan-replan.spec | 61 +++++++
28 files changed, 586 insertions(+), 129 deletions(-)
create mode 100644 src/test/modules/delay_execution/expected/cached-plan-replan.out
create mode 100644 src/test/modules/delay_execution/specs/cached-plan-replan.spec
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index a45489f8f5..ab8bf0df72 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -558,7 +558,8 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL,
+ pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 167db4cf56..e5cce4c07c 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index fe9314bc96..6171a20fe2 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -416,7 +416,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
}
- queryDesc = ExplainQueryDesc(plan, queryString, into, es,
+ queryDesc = ExplainQueryDesc(plan, NULL, queryString, into, es,
params, queryEnv);
Assert(queryDesc);
@@ -429,9 +429,11 @@ ExplainOneQuery(Query *query, int cursorOptions,
/*
* ExplainQueryDesc
* Set up QueryDesc for EXPLAINing a given plan
+ *
+ * This returns NULL if cplan is found to be no longer valid.
*/
QueryDesc *
-ExplainQueryDesc(PlannedStmt *stmt,
+ExplainQueryDesc(PlannedStmt *stmt, CachedPlan *cplan,
const char *queryString, IntoClause *into, ExplainState *es,
ParamListInfo params, QueryEnvironment *queryEnv)
{
@@ -467,7 +469,7 @@ ExplainQueryDesc(PlannedStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(stmt, queryString,
+ queryDesc = CreateQueryDesc(stmt, cplan, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index b702a65e81..93a683e312 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -797,6 +797,7 @@ execute_sql_string(const char *sql)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 7124994a43..38795ce7ca 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -408,7 +408,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index bcdf56fe32..f8d0b0ee25 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -650,7 +650,7 @@ replan:
{
QueryDesc *queryDesc;
- queryDesc = ExplainQueryDesc(pstmt, queryString,
+ queryDesc = ExplainQueryDesc(pstmt, cplan, queryString,
into, es, paramLI, queryEnv);
if (queryDesc == NULL)
{
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 17775a49e2..6d2240610d 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -280,6 +280,34 @@ are typically reset to empty once per tuple. Per-tuple contexts are usually
associated with ExprContexts, and commonly each PlanState node has its own
ExprContext to evaluate its qual and targetlist expressions in.
+Relation Locking
+----------------
+
+Typically, when the executor initializes a plan tree for execution, it doesn't
+lock non-index relations if the plan tree is freshly generated and not derived
+from a CachedPlan. This is because such locks have already been established
+during the query's parsing, rewriting, and planning phases. However, with a
+cached plan tree, there can be relations that remain unlocked. The function
+GetCachedPlan() locks relations existing in the query's range table pre-planning
+but doesn't account for those added during the planning phase. Consequently,
+inheritance child tables, introduced to the query's range table during planning,
+won't be locked when the cached plan reaches the executor.
+
+The decision to have GetCachedPlan() defer locking child tables arises from the
+fact that not all of them might be accessed during plan execution. For instance, if
+child tables are partitions, some might be omitted due to pruning at
+execution-initialization-time. Thus, the responsibility of locking these child
+tables is pushed to execution-initialization-time, taking place in ExecInitNode()
+for plan nodes encompassing these tables.
+
+This approach opens a window where a cached plan tree with child tables could
+become outdated if another backend modifies these tables before ExecInitNode()
+locks them. Given this, the executor has the added duty to confirm the plan
+tree's validity whenever it locks a child table after execution-initialization-
+time pruning. This validation is done by checking the CachedPlan.is_valid attribute
+of the CachedPlan provided. If the plan tree is outdated (is_valid=false), the
+executor halts any further initialization and alerts the caller that they should
+retry execution with another freshly created plan tree.
Query Processing Control Flow
-----------------------------
@@ -316,7 +344,13 @@ This is a sketch of control flow for full query processing:
FreeQueryDesc
-Per above comments, it's not really critical for ExecEndNode to free any
+As mentioned in the "Relation Locking" section, if the plan tree is found to
+be stale during one of the recursive calls of ExecInitNode() after taking a
lock on a child table, control is immediately returned to the caller of
+ExecutorStart(), which must redo the steps from CreateQueryDesc with a new
+plan tree.
+
+Per above comments, it's not really critical for ExecEndPlan to free any
memory; it'll all go away in FreeExecutorState anyway. However, we do need to
be careful to close relations, drop buffer pins, etc, so we do need to scan
the plan state tree to find these sorts of resources.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 383ebee008..9a3f6c5978 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -642,6 +642,17 @@ ExecCheckPermissions(List *rangeTable, List *rteperminfos,
RTEPermissionInfo *perminfo = lfirst_node(RTEPermissionInfo, l);
Assert(OidIsValid(perminfo->relid));
+
+ /*
+ * Relations whose permissions need to be checked must already have
+ * been locked by the parser or by GetCachedPlan() if a cached plan is
+ * being executed.
+ *
+ * XXX shouldn't we skip calling ExecCheckPermissions from InitPlan
+ * in a parallel worker?
+ */
+ Assert(CheckRelLockedByMe(perminfo->relid, AccessShareLock, true) ||
+ IsParallelWorker());
result = ExecCheckOneRelPerms(perminfo);
if (!result)
{
@@ -880,7 +891,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
estate->es_plannedstmt = plannedstmt;
- estate->es_cachedplan = NULL;
+ estate->es_cachedplan = queryDesc->cplan;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
@@ -1465,7 +1476,7 @@ ExecGetAncestorResultRels(EState *estate, ResultRelInfo *resultRelInfo)
/*
* All ancestors up to the root target relation must have been
- * locked by the planner or AcquireExecutorLocks().
+ * locked by the planner or ExecLockAppendNonLeafRelations().
*/
ancRel = table_open(ancOid, NoLock);
rInfo = makeNode(ResultRelInfo);
@@ -2897,7 +2908,8 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
* Child EPQ EStates share the parent's copy of unchanging state such as
* the snapshot, rangetable, and external Param info. They need their own
* copies of local state, including a tuple table, es_param_exec_vals,
- * result-rel info, etc.
+ * result-rel info, etc. Also, we don't pass the parent's copy of the
+ * CachedPlan, because no new locks will be taken for EvalPlanQual().
*/
rcestate->es_direction = ForwardScanDirection;
rcestate->es_snapshot = parentestate->es_snapshot;
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index f84a3a17d5..209f618a07 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1248,8 +1248,15 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
- /* Create a QueryDesc for the query. */
+ /*
+ * Set up a QueryDesc for the query. While the leader might've sourced
+ * the plan tree from a CachedPlan, we don't have one here. This isn't
+ * an issue since the leader ensured the required locks, making our
+ * plan tree valid. Even as we get our own lock copies in
+ * ExecGetRangeTableRelation(), they're all already held by the leader.
+ */
return CreateQueryDesc(pstmt,
+ NULL,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index e88455368c..cf73d28baa 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -513,6 +513,13 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
oldcxt = MemoryContextSwitchTo(proute->memcxt);
+ /*
+ * Note that while we normally check ExecPlanStillValid(estate) after each
+	 * lock taken during execution initialization, it is fine not to do so
+	 * for partitions opened here for tuple routing. Locks taken here can't
+ * possibly invalidate the plan given that the plan doesn't contain any
+ * info about those partitions.
+ */
partrel = table_open(partOid, RowExclusiveLock);
leaf_part_rri = makeNode(ResultRelInfo);
@@ -1111,6 +1118,9 @@ ExecInitPartitionDispatchInfo(EState *estate,
* Only sub-partitioned tables need to be locked here. The root
* partitioned table will already have been locked as it's referenced in
* the query's rtable.
+ *
+ * See the comment in ExecInitPartitionInfo() about taking locks and
+ * not checking ExecPlanStillValid(estate) here.
*/
if (partoid != RelationGetRelid(proute->partition_root))
rel = table_open(partoid, RowExclusiveLock);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index da8a1511ac..94c8e5e875 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -779,7 +779,25 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rte->rtekind == RTE_RELATION);
- if (!IsParallelWorker())
+ if (IsParallelWorker() ||
+ (estate->es_cachedplan != NULL && !rte->inFromCl))
+ {
+ /*
+ * Take a lock if we are a parallel worker or if this is a child
+ * table referenced in a cached plan.
+ *
+ * Parallel workers need to have their own local lock on the
+ * relation. This ensures sane behavior in case the parent process
+ * exits before we do.
+ *
+ * When executing a cached plan, child tables must be locked
+ * here, because plancache.c (GetCachedPlan()) would only have
+ * locked tables mentioned in the query, that is, tables whose
+ * RTEs' inFromCl is true.
+ */
+ rel = table_open(rte->relid, rte->rellockmode);
+ }
+ else
{
/*
* In a normal query, we should already have the appropriate lock,
@@ -792,15 +810,6 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rte->rellockmode == AccessShareLock ||
CheckRelationLockedByMe(rel, rte->rellockmode, false));
}
- else
- {
- /*
- * If we are a parallel worker, we need to obtain our own local
- * lock on the relation. This ensures sane behavior in case the
- * parent process exits before we do.
- */
- rel = table_open(rte->relid, rte->rellockmode);
- }
estate->es_relations[rti - 1] = rel;
}
@@ -808,6 +817,38 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
return rel;
}
+/*
+ * ExecLockAppendNonLeafRelations
+ * Lock non-leaf relations whose children are scanned by a given
+ * Append/MergeAppend node
+ */
+void
+ExecLockAppendNonLeafRelations(EState *estate, List *allpartrelids)
+{
+ ListCell *l;
+
+ /* This should get called only when executing cached plans. */
+ Assert(estate->es_cachedplan != NULL);
+ foreach(l, allpartrelids)
+ {
+ Bitmapset *partrelids = lfirst_node(Bitmapset, l);
+ int i;
+
+ /*
+ * Note that we don't lock the first member (i=0) of each bitmapset
+ * because it stands for the root parent mentioned in the query that
+ * should always have been locked before entering the executor.
+ */
+ i = 0;
+ while ((i = bms_next_member(partrelids, i)) > 0)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(i, estate);
+
+ LockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
+}
+
/*
* ExecInitResultRelation
* Open relation given by the passed-in RT index and fill its
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 8cf0b3132d..4ddf4fd7a9 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -838,6 +838,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest = None_Receiver;
es->qd = CreateQueryDesc(es->stmt,
+ NULL, /* fmgr_sql() doesn't use CachedPlans */
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 588f5388c7..20330c5c58 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -133,6 +133,24 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_syncdone = false;
appendstate->as_begun = false;
+ /*
+ * Lock non-leaf partitions whose leaf children are present in
+ * node->appendplans. Only need to do so if executing a cached
+ * plan, because child tables present in cached plans are not
+ * locked before execution.
+ *
+	 * XXX - some of the non-leaf partitions may also be mentioned in
+	 * part_prune_info, in which case they would get locked again in
+	 * ExecInitPartitionPruning(), because it calls
+	 * ExecGetRangeTableRelation(), which locks child tables.
+	 */
+	if (estate->es_cachedplan)
+	{
+		ExecLockAppendNonLeafRelations(estate, node->allpartrelids);
+		if (!ExecPlanStillValid(estate))
+			return NULL;
+	}
+
/* If run-time partition pruning is enabled, then set that up now */
if (node->part_prune_info != NULL)
{
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index c9d406c230..a8f9157192 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -81,6 +81,24 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.state = estate;
mergestate->ps.ExecProcNode = ExecMergeAppend;
+ /*
+ * Lock non-leaf partitions whose leaf children are present in
+ * node->mergeplans. Only need to do so if executing a cached
+ * plan, because child tables present in cached plans are not
+ * locked before execution.
+ *
+	 * XXX - some of the non-leaf partitions may also be mentioned in
+	 * part_prune_info, in which case they would get locked again in
+	 * ExecInitPartitionPruning(), because it calls
+	 * ExecGetRangeTableRelation(), which locks child tables.
+	 */
+	if (estate->es_cachedplan)
+	{
+		ExecLockAppendNonLeafRelations(estate, node->allpartrelids);
+		if (!ExecPlanStillValid(estate))
+			return NULL;
+	}
+
/* If run-time partition pruning is enabled, then set that up now */
if (node->part_prune_info != NULL)
{
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 6a96d7fc22..9c4ed74240 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -2680,6 +2680,7 @@ replan:
snap = InvalidSnapshot;
qdesc = CreateQueryDesc(stmt,
+ cplan,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index ee9b89a672..c807e9cdcc 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -27,6 +27,7 @@
#include "storage/procarray.h"
#include "storage/sinvaladt.h"
#include "utils/inval.h"
+#include "utils/lsyscache.h"
/*
@@ -364,6 +365,50 @@ CheckRelationLockedByMe(Relation relation, LOCKMODE lockmode, bool orstronger)
return false;
}
+/*
+ * CheckRelLockedByMe
+ *
+ * Returns true if current transaction holds a lock on the given relation of
+ * mode 'lockmode'. If 'orstronger' is true, a stronger lockmode is also OK.
+ * ("Stronger" is defined as "numerically higher", which is a bit
+ * semantically dubious but is OK for the purposes we use this for.)
+ */
+bool
+CheckRelLockedByMe(Oid relid, LOCKMODE lockmode, bool orstronger)
+{
+ Oid dbId = get_rel_relisshared(relid) ? InvalidOid : MyDatabaseId;
+ LOCKTAG tag;
+
+ SET_LOCKTAG_RELATION(tag, dbId, relid);
+
+ if (LockHeldByMe(&tag, lockmode))
+ return true;
+
+ if (orstronger)
+ {
+ LOCKMODE slockmode;
+
+ for (slockmode = lockmode + 1;
+ slockmode <= MaxLockMode;
+ slockmode++)
+ {
+ if (LockHeldByMe(&tag, slockmode))
+ {
+#ifdef NOT_USED
+ /* Sometimes this might be useful for debugging purposes */
+				elog(WARNING, "lock mode %s substituted for %s on relation %u",
+					 GetLockmodeName(tag.locktag_lockmethodid, slockmode),
+					 GetLockmodeName(tag.locktag_lockmethodid, lockmode),
+					 relid);
+#endif
+ return true;
+ }
+ }
+ }
+
+ return false;
+}
+
/*
* LockHasWaitersRelation
*
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 9a96b77f1e..48cd6f4304 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -60,6 +60,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -72,6 +73,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->cplan = cplan; /* CachedPlan, if plan is from one */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -410,6 +412,7 @@ PortalStart(Portal portal, ParamListInfo params,
* set the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->cplan,
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -440,6 +443,7 @@ PortalStart(Portal portal, ParamListInfo params,
*/
if (!ExecutorStart(queryDesc, myeflags))
{
+ Assert(queryDesc->cplan);
ExecutorEnd(queryDesc);
FreeQueryDesc(queryDesc);
PopActiveSnapshot();
@@ -538,7 +542,7 @@ PortalStart(Portal portal, ParamListInfo params,
* Create the QueryDesc. DestReceiver will be set in
* PortalRunMulti() before calling ExecutorRun().
*/
- queryDesc = CreateQueryDesc(plan,
+ queryDesc = CreateQueryDesc(plan, portal->cplan,
portal->sourceText,
!is_utility ?
GetActiveSnapshot() :
@@ -562,6 +566,7 @@ PortalStart(Portal portal, ParamListInfo params,
if (!ExecutorStart(queryDesc, myeflags))
{
PopActiveSnapshot();
+ Assert(queryDesc->cplan);
ExecutorEnd(queryDesc);
FreeQueryDesc(queryDesc);
plan_valid = false;
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index fc6d267e44..2725d02312 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -2095,6 +2095,27 @@ get_rel_persistence(Oid relid)
return result;
}
+/*
+ * get_rel_relisshared
+ *
+ *		Returns whether the given relation is shared across all databases
+ */
+bool
+get_rel_relisshared(Oid relid)
+{
+ HeapTuple tp;
+ Form_pg_class reltup;
+ bool result;
+
+ tp = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+ if (!HeapTupleIsValid(tp))
+ elog(ERROR, "cache lookup failed for relation %u", relid);
+ reltup = (Form_pg_class) GETSTRUCT(tp);
+ result = reltup->relisshared;
+ ReleaseSysCache(tp);
+
+ return result;
+}
/* ---------- TRANSFORM CACHE ---------- */
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 7d4168f82f..35d903cb98 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -104,13 +104,13 @@ static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool GenericPlanIsValid(CachedPlan *cplan);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -792,8 +792,13 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
*
- * On a "true" return, we have acquired the locks needed to run the plan.
- * (We must do this for the "true" result to be race-condition-free.)
+ * If the plan includes child relations added by the planner, they will not
+ * be locked yet, because AcquirePlannerLocks() only locks relations present
+ * in the original query's range table, that is, the one as it was before
+ * planning. Hence, the plan can become stale if child relations are
+ * modified concurrently. During plan initialization, the executor must
+ * check that the plan (CachedPlan) remains valid after locking each child
+ * table; if it is found invalid, the caller must be asked to recreate it.
*/
static bool
CheckCachedPlan(CachedPlanSource *plansource)
@@ -807,60 +812,56 @@ CheckCachedPlan(CachedPlanSource *plansource)
if (!plan)
return false;
- Assert(plan->magic == CACHEDPLAN_MAGIC);
- /* Generic plans are never one-shot */
- Assert(!plan->is_oneshot);
+ if (GenericPlanIsValid(plan))
+ return true;
/*
- * If plan isn't valid for current role, we can't use it.
+ * Plan has been invalidated, so unlink it from the parent and release it.
*/
- if (plan->is_valid && plan->dependsOnRole &&
- plan->planRoleId != GetUserId())
- plan->is_valid = false;
+ ReleaseGenericPlan(plansource);
- /*
- * If it appears valid, acquire locks and recheck; this is much the same
- * logic as in RevalidateCachedQuery, but for a plan.
- */
- if (plan->is_valid)
+ return false;
+}
+
+/*
+ * GenericPlanIsValid
+ * Is a generic plan still valid?
+ *
+ * It may have gone stale due to concurrent schema modifications of relations
+ * mentioned in the plan, or due to the other conditions checked below.
+ */
+static bool
+GenericPlanIsValid(CachedPlan *cplan)
+{
+ Assert(cplan != NULL);
+ Assert(cplan->magic == CACHEDPLAN_MAGIC);
+ /* Generic plans are never one-shot */
+ Assert(!cplan->is_oneshot);
+
+ if (cplan->is_valid)
{
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
- Assert(plan->refcount > 0);
-
- AcquireExecutorLocks(plan->stmt_list, true);
+ Assert(cplan->refcount > 0);
/*
- * If plan was transient, check to see if TransactionXmin has
- * advanced, and if so invalidate it.
+ * If plan isn't valid for current role, we can't use it.
*/
- if (plan->is_valid &&
- TransactionIdIsValid(plan->saved_xmin) &&
- !TransactionIdEquals(plan->saved_xmin, TransactionXmin))
- plan->is_valid = false;
+ if (cplan->dependsOnRole && cplan->planRoleId != GetUserId())
+ cplan->is_valid = false;
/*
- * By now, if any invalidation has happened, the inval callback
- * functions will have marked the plan invalid.
+ * If plan was transient, check to see if TransactionXmin has
+ * advanced, and if so invalidate it.
*/
- if (plan->is_valid)
- {
- /* Successfully revalidated and locked the query. */
- return true;
- }
-
- /* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ if (TransactionIdIsValid(cplan->saved_xmin) &&
+ !TransactionIdEquals(cplan->saved_xmin, TransactionXmin))
+ cplan->is_valid = false;
}
- /*
- * Plan has been invalidated, so unlink it from the parent and release it.
- */
- ReleaseGenericPlan(plansource);
-
- return false;
+ return cplan->is_valid;
}
/*
@@ -1130,8 +1131,16 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
- * On return, the plan is valid and we have sufficient locks to begin
- * execution.
+ * Typically, the plan returned by this function is valid. One caveat
+ * concerns inheritance/partition child tables: those are not locked by
+ * this function, which locks only the tables directly mentioned in the
+ * original query. Locking the child tables is left to the executor, to be
+ * done during plan tree setup. If acquiring those locks invalidates the
+ * plan, the executor must inform the caller, which should then regenerate
+ * the plan by invoking this function again. The point of deferring the
+ * child table locking is efficiency: not all of them may need to be
+ * locked, because some may be pruned away during executor initialization
+ * if the corresponding plan nodes support partition pruning.
*
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
@@ -1166,7 +1175,10 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
{
if (CheckCachedPlan(plansource))
{
- /* We want a generic plan, and we already have a valid one */
+ /*
+		 * We want a generic plan, and we already have a valid one; but see
+		 * the header comment about deferred child-table locking.
+ */
plan = plansource->gplan;
Assert(plan->magic == CACHEDPLAN_MAGIC);
}
@@ -1364,8 +1376,8 @@ CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
}
/*
- * Reject if AcquireExecutorLocks would have anything to do. This is
- * probably unnecessary given the previous check, but let's be safe.
+	 * Reject if the executor would need to take any locks beyond those
+	 * already taken by AcquirePlannerLocks() on the given query.
*/
foreach(lc, plan->stmt_list)
{
@@ -1741,58 +1753,6 @@ QueryListGetPrimaryStmt(List *stmts)
return NULL;
}
-/*
- * AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
- */
-static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
-{
- ListCell *lc1;
-
- foreach(lc1, stmt_list)
- {
- PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
-
- if (plannedstmt->commandType == CMD_UTILITY)
- {
- /*
- * Ignore utility statements, except those (such as EXPLAIN) that
- * contain a parsed-but-not-planned query. Note: it's okay to use
- * ScanQueryForLocks, even though the query hasn't been through
- * rule rewriting, because rewriting doesn't change the query
- * representation.
- */
- Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
-
- if (query)
- ScanQueryForLocks(query, acquire);
- continue;
- }
-
- foreach(lc2, plannedstmt->rtable)
- {
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
-
- if (!(rte->rtekind == RTE_RELATION ||
- (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
- continue;
-
- /*
- * Acquire the appropriate type of lock on each relation OID. Note
- * that we don't actually try to open the rel, and hence will not
- * fail if it's been dropped entirely --- we'll just transiently
- * acquire a non-conflicting lock.
- */
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
- }
- }
-}
-
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 37554727ee..392abb5150 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -88,7 +88,7 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern QueryDesc *ExplainQueryDesc(PlannedStmt *stmt,
+extern QueryDesc *ExplainQueryDesc(PlannedStmt *stmt, struct CachedPlan *cplan,
const char *queryString, IntoClause *into, ExplainState *es,
ParamListInfo params, QueryEnvironment *queryEnv);
extern void ExplainOnePlan(QueryDesc *queryDesc,
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index af2bf36dfb..4b7368a0dc 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -32,9 +32,12 @@
*/
typedef struct QueryDesc
{
+ NodeTag type;
+
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ struct CachedPlan *cplan; /* CachedPlan, if plannedstmt is from one */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +60,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ struct CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 10c5cda169..eaa605e513 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -599,6 +599,7 @@ exec_rt_fetch(Index rti, EState *estate)
}
extern Relation ExecGetRangeTableRelation(EState *estate, Index rti);
+extern void ExecLockAppendNonLeafRelations(EState *estate, List *allpartrelids);
extern void ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
Index rti);
diff --git a/src/include/storage/lmgr.h b/src/include/storage/lmgr.h
index 4ee91e3cf9..598bf2688a 100644
--- a/src/include/storage/lmgr.h
+++ b/src/include/storage/lmgr.h
@@ -48,6 +48,7 @@ extern bool ConditionalLockRelation(Relation relation, LOCKMODE lockmode);
extern void UnlockRelation(Relation relation, LOCKMODE lockmode);
extern bool CheckRelationLockedByMe(Relation relation, LOCKMODE lockmode,
bool orstronger);
+extern bool CheckRelLockedByMe(Oid relid, LOCKMODE lockmode, bool orstronger);
extern bool LockHasWaitersRelation(Relation relation, LOCKMODE lockmode);
extern void LockRelationIdForSession(LockRelId *relid, LOCKMODE lockmode);
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index f5fdbfe116..a024e5dcd0 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -140,6 +140,7 @@ extern char get_rel_relkind(Oid relid);
extern bool get_rel_relispartition(Oid relid);
extern Oid get_rel_tablespace(Oid relid);
extern char get_rel_persistence(Oid relid);
+extern bool get_rel_relisshared(Oid relid);
extern Oid get_transform_fromsql(Oid typid, Oid langid, List *trftypes);
extern Oid get_transform_tosql(Oid typid, Oid langid, List *trftypes);
extern bool get_typisdefined(Oid typid);
diff --git a/src/test/modules/delay_execution/Makefile b/src/test/modules/delay_execution/Makefile
index 70f24e846d..2fca84d027 100644
--- a/src/test/modules/delay_execution/Makefile
+++ b/src/test/modules/delay_execution/Makefile
@@ -8,7 +8,8 @@ OBJS = \
delay_execution.o
ISOLATION = partition-addition \
- partition-removal-1
+ partition-removal-1 \
+ cached-plan-replan
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/src/test/modules/delay_execution/delay_execution.c b/src/test/modules/delay_execution/delay_execution.c
index 7cd76eb34b..ce189156ad 100644
--- a/src/test/modules/delay_execution/delay_execution.c
+++ b/src/test/modules/delay_execution/delay_execution.c
@@ -1,14 +1,18 @@
/*-------------------------------------------------------------------------
*
* delay_execution.c
- * Test module to allow delay between parsing and execution of a query.
+ * Test module to introduce delay at various points during execution of a
+ * query to test that execution proceeds safely in light of concurrent
+ * changes.
*
* The delay is implemented by taking and immediately releasing a specified
* advisory lock. If another process has previously taken that lock, the
* current process will be blocked until the lock is released; otherwise,
* there's no effect. This allows an isolationtester script to reliably
- * test behaviors where some specified action happens in another backend
- * between parsing and execution of any desired query.
+ * test behaviors where some specified action happens in another backend at
+ * either of two points: 1) between parsing and execution of any desired
+ * query when using the planner_hook, 2) between RevalidateCachedQuery() and
+ * ExecutorStart() when using the ExecutorStart_hook.
*
* Copyright (c) 2020-2023, PostgreSQL Global Development Group
*
@@ -22,6 +26,7 @@
#include <limits.h>
+#include "executor/executor.h"
#include "optimizer/planner.h"
#include "utils/builtins.h"
#include "utils/guc.h"
@@ -32,9 +37,11 @@ PG_MODULE_MAGIC;
/* GUC: advisory lock ID to use. Zero disables the feature. */
static int post_planning_lock_id = 0;
+static int executor_start_lock_id = 0;
-/* Save previous planner hook user to be a good citizen */
+/* Save previous hook users to be a good citizen */
static planner_hook_type prev_planner_hook = NULL;
+static ExecutorStart_hook_type prev_ExecutorStart_hook = NULL;
/* planner_hook function to provide the desired delay */
@@ -70,11 +77,45 @@ delay_execution_planner(Query *parse, const char *query_string,
return result;
}
+/* ExecutorStart_hook function to provide the desired delay */
+static bool
+delay_execution_ExecutorStart(QueryDesc *queryDesc, int eflags)
+{
+ bool plan_valid;
+
+ /* If enabled, delay by taking and releasing the specified lock */
+ if (executor_start_lock_id != 0)
+ {
+ DirectFunctionCall1(pg_advisory_lock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+ DirectFunctionCall1(pg_advisory_unlock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+
+ /*
+ * Ensure that we notice any pending invalidations, since the advisory
+ * lock functions don't do this.
+ */
+ AcceptInvalidationMessages();
+ }
+
+ /* Now start the executor, possibly via a previous hook user */
+ if (prev_ExecutorStart_hook)
+ plan_valid = prev_ExecutorStart_hook(queryDesc, eflags);
+ else
+ plan_valid = standard_ExecutorStart(queryDesc, eflags);
+
+ if (executor_start_lock_id != 0)
+ elog(NOTICE, "Finished ExecutorStart(): CachedPlan is %s",
+ plan_valid ? "valid" : "not valid");
+
+ return plan_valid;
+}
+
/* Module load function */
void
_PG_init(void)
{
- /* Set up the GUC to control which lock is used */
+ /* Set up GUCs to control which lock is used */
DefineCustomIntVariable("delay_execution.post_planning_lock_id",
"Sets the advisory lock ID to be locked/unlocked after planning.",
"Zero disables the delay.",
@@ -86,10 +127,22 @@ _PG_init(void)
NULL,
NULL,
NULL);
-
+ DefineCustomIntVariable("delay_execution.executor_start_lock_id",
+ "Sets the advisory lock ID to be locked/unlocked before starting execution.",
+ "Zero disables the delay.",
+ &executor_start_lock_id,
+ 0,
+ 0, INT_MAX,
+ PGC_USERSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
MarkGUCPrefixReserved("delay_execution");
- /* Install our hook */
+ /* Install our hooks. */
prev_planner_hook = planner_hook;
planner_hook = delay_execution_planner;
+ prev_ExecutorStart_hook = ExecutorStart_hook;
+ ExecutorStart_hook = delay_execution_ExecutorStart;
}
diff --git a/src/test/modules/delay_execution/expected/cached-plan-replan.out b/src/test/modules/delay_execution/expected/cached-plan-replan.out
new file mode 100644
index 0000000000..0ac6a17c2b
--- /dev/null
+++ b/src/test/modules/delay_execution/expected/cached-plan-replan.out
@@ -0,0 +1,156 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1prep s2lock s1exec s2dropi s2unlock
+step s1prep: SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1;
+ EXPLAIN (COSTS OFF) EXECUTE q (1);
+QUERY PLAN
+--------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo11 foo_1
+ Recheck Cond: (a = $1)
+ -> Bitmap Index Scan on foo11_a_idx
+ Index Cond: (a = $1)
+(6 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+-----------------------------
+Append
+ Subplans Removed: 1
+ -> Seq Scan on foo11 foo_1
+ Filter: (a = $1)
+(4 rows)
+
+
+starting permutation: s1prep2 s2lock s1exec2 s2dropi s2unlock
+step s1prep2: SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = 1;
+ EXPLAIN (COSTS OFF) EXECUTE q2;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------
+Bitmap Heap Scan on foo11 foo
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on foo11_a_idx
+ Index Cond: (a = 1)
+(4 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec2: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec2: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------
+Seq Scan on foo11 foo
+ Filter: (a = 1)
+(2 rows)
+
+
+starting permutation: s1prep3 s2lock s1exec3 s2dropi s2unlock
+step s1prep3: SET plan_cache_mode = force_generic_plan;
+ SET enable_partitionwise_aggregate = on;
+ SET enable_partitionwise_join = on;
+ PREPARE q3 AS SELECT t1.a, count(t2.b) FROM foo t1, foo t2 WHERE t1.a = t2.a GROUP BY 1;
+ EXPLAIN (COSTS OFF) EXECUTE q3;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+----------------------------------------------------------------
+Append
+ -> GroupAggregate
+ Group Key: t1.a
+ -> Merge Join
+ Merge Cond: (t1.a = t2.a)
+ -> Index Only Scan using foo11_a_idx on foo11 t1
+ -> Materialize
+ -> Index Scan using foo11_a_idx on foo11 t2
+ -> GroupAggregate
+ Group Key: t1_1.a
+ -> Merge Join
+ Merge Cond: (t1_1.a = t2_1.a)
+ -> Sort
+ Sort Key: t1_1.a
+ -> Seq Scan on foo2 t1_1
+ -> Sort
+ Sort Key: t2_1.a
+ -> Seq Scan on foo2 t2_1
+(18 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec3: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec3: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------------------------------
+Append
+ -> GroupAggregate
+ Group Key: t1.a
+ -> Merge Join
+ Merge Cond: (t1.a = t2.a)
+ -> Sort
+ Sort Key: t1.a
+ -> Seq Scan on foo11 t1
+ -> Sort
+ Sort Key: t2.a
+ -> Seq Scan on foo11 t2
+ -> GroupAggregate
+ Group Key: t1_1.a
+ -> Merge Join
+ Merge Cond: (t1_1.a = t2_1.a)
+ -> Sort
+ Sort Key: t1_1.a
+ -> Seq Scan on foo2 t1_1
+ -> Sort
+ Sort Key: t2_1.a
+ -> Seq Scan on foo2 t2_1
+(21 rows)
+
diff --git a/src/test/modules/delay_execution/specs/cached-plan-replan.spec b/src/test/modules/delay_execution/specs/cached-plan-replan.spec
new file mode 100644
index 0000000000..3c92cbd5c6
--- /dev/null
+++ b/src/test/modules/delay_execution/specs/cached-plan-replan.spec
@@ -0,0 +1,61 @@
+# Test to check that invalidation of cached generic plans during ExecutorStart
+# correctly triggers replanning and re-execution.
+
+setup
+{
+ CREATE TABLE foo (a int, b text) PARTITION BY LIST(a);
+ CREATE TABLE foo1 PARTITION OF foo FOR VALUES IN (1) PARTITION BY LIST (a);
+ CREATE TABLE foo11 PARTITION OF foo1 FOR VALUES IN (1);
+ CREATE INDEX foo11_a ON foo1 (a);
+ CREATE TABLE foo2 PARTITION OF foo FOR VALUES IN (2);
+ CREATE VIEW foov AS SELECT * FROM foo;
+}
+
+teardown
+{
+ DROP VIEW foov;
+ DROP TABLE foo;
+}
+
+session "s1"
+# Append with run-time pruning
+step "s1prep" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+
+# no Append case (only one partition selected by the planner)
+step "s1prep2" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = 1;
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+
+# Append with partition-wise join aggregate and join plans as child subplans
+step "s1prep3" { SET plan_cache_mode = force_generic_plan;
+ SET enable_partitionwise_aggregate = on;
+ SET enable_partitionwise_join = on;
+ PREPARE q3 AS SELECT t1.a, count(t2.b) FROM foo t1, foo t2 WHERE t1.a = t2.a GROUP BY 1;
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+
+# Executes a generic plan
+step "s1exec" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+step "s1exec2" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+step "s1exec3" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+
+session "s2"
+step "s2lock" { SELECT pg_advisory_lock(12345); }
+step "s2unlock" { SELECT pg_advisory_unlock(12345); }
+step "s2dropi" { DROP INDEX foo11_a; }
+
+# While "s1exec", etc. wait to acquire the advisory lock, "s2drop" is able to
+# drop the index being used in the cached plan. When "s1exec" is then
+# unblocked and initializes the cached plan for execution, it detects the
+# concurrent index drop and causes the cached plan to be discarded and
+# recreated without the index.
+permutation "s1prep" "s2lock" "s1exec" "s2dropi" "s2unlock"
+permutation "s1prep2" "s2lock" "s1exec2" "s2dropi" "s2unlock"
+permutation "s1prep3" "s2lock" "s1exec3" "s2dropi" "s2unlock"
--
2.35.3
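The executor changes above guard each newly taken child-table lock with
ExecPlanStillValid(estate). Its definition is not included in this excerpt;
here is a minimal sketch of what it is assumed to do, given that
es_cachedplan is set only when executing a cached plan:

    static inline bool
    ExecPlanStillValid(EState *estate)
    {
        /*
         * A plan not sourced from a CachedPlan cannot be invalidated by
         * taking locks, so it is always still valid; otherwise, what
         * matters is whether the CachedPlan's is_valid flag got cleared
         * by an invalidation callback while acquiring the lock.
         */
        return estate->es_cachedplan == NULL ||
               estate->es_cachedplan->is_valid;
    }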
Attachment: v47-0004-Adjustments-to-allow-ExecutorStart-to-sometimes-.patch (application/octet-stream)
From f9ff9aa3ddc05ee948d721a05ec552d5e959c498 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Wed, 6 Sep 2023 17:53:46 +0900
Subject: [PATCH v47 4/8] Adjustments to allow ExecutorStart() to sometimes
fail
When a plan tree sourced from a CachedPlan is passed to the executor,
ExecutorStart() may return an incompletely set up planstate tree. This
can happen if the CachedPlan is invalidated during ExecInitNode()
processing. In such cases, execution must be reattempted using a fresh
CachedPlan, and any partially initialized EState must be cleaned up by
invoking both ExecutorEnd() and FreeExecutorState().
ExecutorStart() (and ExecutorStart_hook()) now return a Boolean telling
the caller whether plan initialization succeeded.
To allow such a replan loop, it makes more sense to have the
ExecutorStart() call either in the same scope as, or close to, the site
where GetCachedPlan() is invoked. So this commit modifies the following
sites:
* The ExecutorStart() call in ExplainOnePlan() is moved into a new
function ExplainQueryDesc() along with CreateQueryDesc(). Callers
of ExplainOnePlan() should now call the new function first.
* The ExecutorStart() call in _SPI_pquery() is moved to its caller
_SPI_execute_plan().
* The ExecutorStart() call in PortalRunMulti() is moved to
PortalStart(). This requires a new List field in PortalData to
store the QueryDescs created in PortalStart() and a new memory
context for those. One unintended consequence is that
CommandCounterIncrement() between queries in the PORTAL_MULTI_QUERY
case is now done in the loop in PortalStart() and not in
PortalRunMulti(). That still works because the Snapshot registered
in QueryDesc/EState is updated to account for the CCI().
This commit also adds a new flag to EState called es_canceled that
complements es_finished to denote the new scenario where
ExecutorStart() returns with a partially set up planstate tree. Also,
to reset the AFTER trigger state that would have been set up in
ExecutorStart(), this adds a new function AfterTriggerCancelQuery(),
which is called from ExecutorEnd() (not ExecutorFinish()) when
es_canceled is true.
Note that this commit by itself doesn't make any functional change,
because the CachedPlan is not passed into the executor yet.
---
contrib/auto_explain/auto_explain.c | 12 +-
.../pg_stat_statements/pg_stat_statements.c | 12 +-
src/backend/commands/copyto.c | 4 +-
src/backend/commands/createas.c | 8 +-
src/backend/commands/explain.c | 142 ++++---
src/backend/commands/extension.c | 3 +-
src/backend/commands/matview.c | 8 +-
src/backend/commands/portalcmds.c | 5 +-
src/backend/commands/prepare.c | 31 +-
src/backend/commands/trigger.c | 13 +
src/backend/executor/execMain.c | 57 ++-
src/backend/executor/execParallel.c | 3 +-
src/backend/executor/execUtils.c | 1 +
src/backend/executor/functions.c | 4 +-
src/backend/executor/spi.c | 48 ++-
src/backend/tcop/postgres.c | 18 +-
src/backend/tcop/pquery.c | 345 +++++++++---------
src/backend/utils/mmgr/portalmem.c | 9 +
src/include/commands/explain.h | 7 +-
src/include/commands/trigger.h | 1 +
src/include/executor/executor.h | 6 +-
src/include/nodes/execnodes.h | 3 +
src/include/tcop/pquery.h | 2 +-
src/include/utils/portal.h | 2 +
24 files changed, 460 insertions(+), 284 deletions(-)
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index c3ac27ae99..a0630d7944 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -78,7 +78,7 @@ static ExecutorRun_hook_type prev_ExecutorRun = NULL;
static ExecutorFinish_hook_type prev_ExecutorFinish = NULL;
static ExecutorEnd_hook_type prev_ExecutorEnd = NULL;
-static void explain_ExecutorStart(QueryDesc *queryDesc, int eflags);
+static bool explain_ExecutorStart(QueryDesc *queryDesc, int eflags);
static void explain_ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction,
uint64 count, bool execute_once);
@@ -258,9 +258,11 @@ _PG_init(void)
/*
* ExecutorStart hook: start up logging if needed
*/
-static void
+static bool
explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ bool plan_valid;
+
/*
* At the beginning of each top-level statement, decide whether we'll
* sample this statement. If nested-statement explaining is enabled,
@@ -296,9 +298,9 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
}
if (prev_ExecutorStart)
- prev_ExecutorStart(queryDesc, eflags);
+ plan_valid = prev_ExecutorStart(queryDesc, eflags);
else
- standard_ExecutorStart(queryDesc, eflags);
+ plan_valid = standard_ExecutorStart(queryDesc, eflags);
if (auto_explain_enabled())
{
@@ -316,6 +318,8 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcxt);
}
}
+
+ return plan_valid;
}
/*
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 06b65aeef5..5354dff7d7 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -324,7 +324,7 @@ static PlannedStmt *pgss_planner(Query *parse,
const char *query_string,
int cursorOptions,
ParamListInfo boundParams);
-static void pgss_ExecutorStart(QueryDesc *queryDesc, int eflags);
+static bool pgss_ExecutorStart(QueryDesc *queryDesc, int eflags);
static void pgss_ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction,
uint64 count, bool execute_once);
@@ -961,13 +961,15 @@ pgss_planner(Query *parse,
/*
* ExecutorStart hook: start up tracking if needed
*/
-static void
+static bool
pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ bool plan_valid;
+
if (prev_ExecutorStart)
- prev_ExecutorStart(queryDesc, eflags);
+ plan_valid = prev_ExecutorStart(queryDesc, eflags);
else
- standard_ExecutorStart(queryDesc, eflags);
+ plan_valid = standard_ExecutorStart(queryDesc, eflags);
/*
* If query has queryId zero, don't track it. This prevents double
@@ -990,6 +992,8 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcxt);
}
}
+
+ return plan_valid;
}
/*
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index eaa3172793..a45489f8f5 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -567,8 +567,10 @@ BeginCopyTo(ParseState *pstate,
* Call ExecutorStart to prepare the plan for execution.
*
* ExecutorStart computes a result tupdesc for us
+ *
+ * OK to ignore the return value; plan can't become invalid.
*/
- ExecutorStart(cstate->queryDesc, 0);
+ (void) ExecutorStart(cstate->queryDesc, 0);
tupDesc = cstate->queryDesc->tupDesc;
}
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index e91920ca14..167db4cf56 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -329,8 +329,12 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, GetIntoRelEFlags(into));
+ /*
+ * call ExecutorStart to prepare the plan for execution
+ *
+ * OK to ignore the return value; plan can't become invalid.
+ */
+ (void) ExecutorStart(queryDesc, GetIntoRelEFlags(into));
/* run the plan to completion */
ExecutorRun(queryDesc, ForwardScanDirection, 0, true);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 8570b14f62..fe9314bc96 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -393,6 +393,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
else
{
PlannedStmt *plan;
+ QueryDesc *queryDesc;
instr_time planstart,
planduration;
BufferUsage bufusage_start,
@@ -415,12 +416,87 @@ ExplainOneQuery(Query *query, int cursorOptions,
BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
}
+ queryDesc = ExplainQueryDesc(plan, queryString, into, es,
+ params, queryEnv);
+ Assert(queryDesc);
+
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(queryDesc, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
+/*
+ * ExplainQueryDesc
+ * Set up QueryDesc for EXPLAINing a given plan
+ */
+QueryDesc *
+ExplainQueryDesc(PlannedStmt *stmt,
+ const char *queryString, IntoClause *into, ExplainState *es,
+ ParamListInfo params, QueryEnvironment *queryEnv)
+{
+ QueryDesc *queryDesc;
+ DestReceiver *dest;
+ int eflags;
+ int instrument_option = 0;
+
+ /*
+ * Normally we discard the query's output, but if explaining CREATE TABLE
+ * AS, we'd better use the appropriate tuple receiver.
+ */
+ if (into)
+ dest = CreateIntoRelDestReceiver(into);
+ else
+ dest = None_Receiver;
+
+ if (es->analyze && es->timing)
+ instrument_option |= INSTRUMENT_TIMER;
+ else if (es->analyze)
+ instrument_option |= INSTRUMENT_ROWS;
+
+ if (es->buffers)
+ instrument_option |= INSTRUMENT_BUFFERS;
+ if (es->wal)
+ instrument_option |= INSTRUMENT_WAL;
+
+ /*
+ * Use a snapshot with an updated command ID to ensure this query sees
+ * results of any previously executed queries.
+ */
+ PushCopiedSnapshot(GetActiveSnapshot());
+ UpdateActiveSnapshotCommandId();
+
+ /* Create a QueryDesc for the query */
+ queryDesc = CreateQueryDesc(stmt, queryString,
+ GetActiveSnapshot(), InvalidSnapshot,
+ dest, params, queryEnv, instrument_option);
+
+ /* Select execution options */
+ if (es->analyze)
+ eflags = 0; /* default run-to-completion flags */
+ else
+ eflags = EXEC_FLAG_EXPLAIN_ONLY;
+ if (es->generic)
+ eflags |= EXEC_FLAG_EXPLAIN_GENERIC;
+ if (into)
+ eflags |= GetIntoRelEFlags(into);
+
+ /*
+ * Call ExecutorStart to prepare the plan for execution. A cached plan
+	 * may get invalidated during plan initialization.
+ */
+ if (!ExecutorStart(queryDesc, eflags))
+ {
+ /* Clean up. */
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+ PopActiveSnapshot();
+ return NULL;
+ }
+
+ return queryDesc;
+}
+
/*
* ExplainOneUtility -
* print out the execution plan for one utility statement
@@ -524,29 +600,16 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(QueryDesc *queryDesc,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
{
- DestReceiver *dest;
- QueryDesc *queryDesc;
instr_time starttime;
double totaltime = 0;
- int eflags;
- int instrument_option = 0;
-
- Assert(plannedstmt->commandType != CMD_UTILITY);
- if (es->analyze && es->timing)
- instrument_option |= INSTRUMENT_TIMER;
- else if (es->analyze)
- instrument_option |= INSTRUMENT_ROWS;
-
- if (es->buffers)
- instrument_option |= INSTRUMENT_BUFFERS;
- if (es->wal)
- instrument_option |= INSTRUMENT_WAL;
+ Assert(queryDesc->plannedstmt->commandType != CMD_UTILITY);
/*
* We always collect timing for the entire statement, even when node-level
@@ -555,40 +618,6 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
*/
INSTR_TIME_SET_CURRENT(starttime);
- /*
- * Use a snapshot with an updated command ID to ensure this query sees
- * results of any previously executed queries.
- */
- PushCopiedSnapshot(GetActiveSnapshot());
- UpdateActiveSnapshotCommandId();
-
- /*
- * Normally we discard the query's output, but if explaining CREATE TABLE
- * AS, we'd better use the appropriate tuple receiver.
- */
- if (into)
- dest = CreateIntoRelDestReceiver(into);
- else
- dest = None_Receiver;
-
- /* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, instrument_option);
-
- /* Select execution options */
- if (es->analyze)
- eflags = 0; /* default run-to-completion flags */
- else
- eflags = EXEC_FLAG_EXPLAIN_ONLY;
- if (es->generic)
- eflags |= EXEC_FLAG_EXPLAIN_GENERIC;
- if (into)
- eflags |= GetIntoRelEFlags(into);
-
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, eflags);
-
/* Execute the plan for statistics if asked for */
if (es->analyze)
{
@@ -4865,6 +4894,17 @@ ExplainDummyGroup(const char *objtype, const char *labelname, ExplainState *es)
}
}
+/*
+ * Discard output buffer for a fresh restart.
+ */
+void
+ExplainResetOutput(ExplainState *es)
+{
+ Assert(es->str);
+ resetStringInfo(es->str);
+ ExplainBeginOutput(es);
+}
+
/*
* Emit the start-of-output boilerplate.
*
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 535072d181..b702a65e81 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -801,7 +801,8 @@ execute_sql_string(const char *sql)
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
- ExecutorStart(qdesc, 0);
+ /* OK to ignore the return value; plan can't become invalid. */
+ (void) ExecutorStart(qdesc, 0);
ExecutorRun(qdesc, ForwardScanDirection, 0, true);
ExecutorFinish(qdesc);
ExecutorEnd(qdesc);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index ac2e74fa3f..7124994a43 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -412,8 +412,12 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, 0);
+ /*
+ * call ExecutorStart to prepare the plan for execution
+ *
+ * OK to ignore the return value; plan can't become invalid.
+ */
+ (void) ExecutorStart(queryDesc, 0);
/* run the plan */
ExecutorRun(queryDesc, ForwardScanDirection, 0, true);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 73ed7aa2f0..5120f93414 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -142,9 +142,10 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
/*
* Start execution, inserting parameters if any.
+ *
+ * OK to ignore the return value; plan can't become invalid here.
*/
- PortalStart(portal, params, 0, GetActiveSnapshot());
-
+ (void) PortalStart(portal, params, 0, GetActiveSnapshot());
Assert(portal->strategy == PORTAL_ONE_SELECT);
/*
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 18f70319fc..bcdf56fe32 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -183,6 +183,7 @@ ExecuteQuery(ParseState *pstate,
paramLI = EvaluateParams(pstate, entry, stmt->params, estate);
}
+replan:
/* Create a new portal to run the query in */
portal = CreateNewPortal();
/* Don't display the portal in pg_cursors, it is for internal use only */
@@ -251,9 +252,15 @@ ExecuteQuery(ParseState *pstate,
}
/*
- * Run the portal as appropriate.
+ * Run the portal as appropriate. If the portal has a cached plan and
+ * it's found to be invalidated during the initialization of its plan
+ * trees, the plan must be regenerated.
*/
- PortalStart(portal, paramLI, eflags, GetActiveSnapshot());
+ if (!PortalStart(portal, paramLI, eflags, GetActiveSnapshot()))
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
(void) PortalRun(portal, count, false, true, dest, dest, qc);
@@ -574,7 +581,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
{
PreparedStatement *entry;
const char *query_string;
- CachedPlan *cplan;
+ CachedPlan *cplan = NULL;
List *plan_list;
ListCell *p;
ParamListInfo paramLI = NULL;
@@ -618,6 +625,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
+replan:
cplan = GetCachedPlan(entry->plansource, paramLI,
CurrentResourceOwner, queryEnv);
@@ -639,8 +647,21 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ {
+ QueryDesc *queryDesc;
+
+ queryDesc = ExplainQueryDesc(pstmt, queryString,
+ into, es, paramLI, queryEnv);
+ if (queryDesc == NULL)
+ {
+ ExplainResetOutput(es);
+ ReleaseCachedPlan(cplan, CurrentResourceOwner);
+ goto replan;
+ }
+ ExplainOnePlan(queryDesc, into, es, query_string, paramLI,
+ queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
+ }
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 52177759ab..dd139432b9 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -5009,6 +5009,19 @@ AfterTriggerBeginQuery(void)
afterTriggers.query_depth++;
}
+/* ----------
+ * AfterTriggerCancelQuery()
+ *
+ * Called from ExecutorEnd() if the query execution was canceled.
+ * ----------
+ */
+void
+AfterTriggerCancelQuery(void)
+{
+ /* Set to a value denoting that no query is active. */
+ afterTriggers.query_depth = -1;
+}
+
/* ----------
* AfterTriggerEndQuery()
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 66f1b7398d..383ebee008 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -79,7 +79,7 @@ ExecutorEnd_hook_type ExecutorEnd_hook = NULL;
ExecutorCheckPerms_hook_type ExecutorCheckPerms_hook = NULL;
/* decls for local routines only used within this module */
-static void InitPlan(QueryDesc *queryDesc, int eflags);
+static bool InitPlan(QueryDesc *queryDesc, int eflags);
static void CheckValidRowMarkRel(Relation rel, RowMarkType markType);
static void ExecPostprocessPlan(EState *estate);
static void ExecEndPlan(PlanState *planstate, EState *estate);
@@ -119,6 +119,13 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
*
* eflags contains flag bits as described in executor.h.
*
+ * Plan initialization may fail if the input plan tree is found to have been
+ * invalidated, which can happen if it comes from a CachedPlan.
+ *
+ * Returns true if the plan was successfully initialized, false otherwise.
+ * In the latter case, the caller must call ExecutorEnd() on 'queryDesc' to
+ * clean up after the failed plan initialization.
+ *
* NB: the CurrentMemoryContext when this is called will become the parent
* of the per-query context used for this Executor invocation.
*
@@ -128,7 +135,7 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
*
* ----------------------------------------------------------------
*/
-void
+bool
ExecutorStart(QueryDesc *queryDesc, int eflags)
{
/*
@@ -140,14 +147,15 @@ ExecutorStart(QueryDesc *queryDesc, int eflags)
pgstat_report_query_id(queryDesc->plannedstmt->queryId, false);
if (ExecutorStart_hook)
- (*ExecutorStart_hook) (queryDesc, eflags);
- else
- standard_ExecutorStart(queryDesc, eflags);
+ return (*ExecutorStart_hook) (queryDesc, eflags);
+
+ return standard_ExecutorStart(queryDesc, eflags);
}
-void
+bool
standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ bool plan_valid;
EState *estate;
MemoryContext oldcontext;
@@ -263,9 +271,14 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/*
* Initialize the plan state tree
*/
- InitPlan(queryDesc, eflags);
+ plan_valid = InitPlan(queryDesc, eflags);
+
+ /* Mark execution as canceled if plan won't be executed. */
+ estate->es_canceled = !plan_valid;
MemoryContextSwitchTo(oldcontext);
+
+ return plan_valid;
}
/* ----------------------------------------------------------------
@@ -325,6 +338,7 @@ standard_ExecutorRun(QueryDesc *queryDesc,
estate = queryDesc->estate;
Assert(estate != NULL);
+ Assert(!estate->es_canceled);
Assert(!(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
/*
@@ -429,7 +443,7 @@ standard_ExecutorFinish(QueryDesc *queryDesc)
Assert(!(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
/* This should be run once and only once per Executor instance */
- Assert(!estate->es_finished);
+ Assert(!estate->es_finished && !estate->es_canceled);
/* Switch into per-query memory context */
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
@@ -488,11 +502,11 @@ standard_ExecutorEnd(QueryDesc *queryDesc)
Assert(estate != NULL);
/*
- * Check that ExecutorFinish was called, unless in EXPLAIN-only mode. This
- * Assert is needed because ExecutorFinish is new as of 9.1, and callers
- * might forget to call it.
+ * Check that ExecutorFinish was called, unless in EXPLAIN-only mode or if
+ * execution was canceled. This Assert is needed because ExecutorFinish is
+ * new as of 9.1, and callers might forget to call it.
*/
- Assert(estate->es_finished ||
+ Assert(estate->es_finished || estate->es_canceled ||
(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
/*
@@ -506,6 +520,14 @@ standard_ExecutorEnd(QueryDesc *queryDesc)
UnregisterSnapshot(estate->es_snapshot);
UnregisterSnapshot(estate->es_crosscheck_snapshot);
+ /*
+ * Cancel trigger execution too if the query execution was canceled.
+ */
+ if (estate->es_canceled &&
+ !(estate->es_top_eflags &
+ (EXEC_FLAG_SKIP_TRIGGERS | EXEC_FLAG_EXPLAIN_ONLY)))
+ AfterTriggerCancelQuery();
+
/*
* Must switch out of context before destroying it
*/
@@ -829,9 +851,12 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
*
* Initializes the query plan: open files, allocate storage
* and start up the rule manager
+ *
+ * Returns true if the plan tree is successfully initialized for execution,
+ * false otherwise.
* ----------------------------------------------------------------
*/
-static void
+static bool
InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
@@ -1014,9 +1039,15 @@ InitPlan(QueryDesc *queryDesc, int eflags)
}
}
+ queryDesc->tupDesc = tupType;
+ Assert(planstate != NULL);
+ queryDesc->planstate = planstate;
+ return true;
+
plan_init_suspended:
queryDesc->tupDesc = tupType;
queryDesc->planstate = planstate;
+ return false;
}
/*
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index cc2b8ccab7..f84a3a17d5 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1430,7 +1430,8 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Start up the executor */
queryDesc->plannedstmt->jitFlags = fpes->jit_flags;
- ExecutorStart(queryDesc, fpes->eflags);
+ /* OK to ignore the return value; plan can't become invalid. */
+ (void) ExecutorStart(queryDesc, fpes->eflags);
/* Special executor initialization steps for parallel workers */
queryDesc->planstate->state->es_query_dsa = area;
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index c3f7279b06..da8a1511ac 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -151,6 +151,7 @@ CreateExecutorState(void)
estate->es_top_eflags = 0;
estate->es_instrument = 0;
estate->es_finished = false;
+ estate->es_canceled = false;
estate->es_exprcontexts = NIL;
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index f55424eb5a..8cf0b3132d 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -862,7 +862,9 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
eflags = EXEC_FLAG_SKIP_TRIGGERS;
else
eflags = 0; /* default run-to-completion flags */
- ExecutorStart(es->qd, eflags);
+
+ /* OK to ignore the return value; plan can't become invalid. */
+ (void) ExecutorStart(es->qd, eflags);
}
es->status = F_EXEC_RUN;
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 33975687b3..6a96d7fc22 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -71,7 +71,7 @@ static int _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
static ParamListInfo _SPI_convert_params(int nargs, Oid *argtypes,
Datum *Values, const char *Nulls);
-static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount);
+static int _SPI_pquery(QueryDesc *queryDesc, uint64 tcount);
static void _SPI_error_callback(void *arg);
@@ -1582,6 +1582,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
Snapshot snapshot;
MemoryContext oldcontext;
Portal portal;
+ bool plan_valid;
SPICallbackArg spicallbackarg;
ErrorContextCallback spierrcontext;
@@ -1623,6 +1624,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
_SPI_current->processed = 0;
_SPI_current->tuptable = NULL;
+replan:
/* Create the portal */
if (name == NULL || name[0] == '\0')
{
@@ -1766,15 +1768,23 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
}
/*
- * Start portal execution.
+ * Start portal execution. If the portal contains a cached plan, it must
+ * be recreated if the cached plan was found to have been invalidated when
+ * initializing one of the plan trees contained in it.
*/
- PortalStart(portal, paramLI, 0, snapshot);
+ plan_valid = PortalStart(portal, paramLI, 0, snapshot);
Assert(portal->strategy != PORTAL_MULTI_QUERY);
/* Pop the error context stack */
error_context_stack = spierrcontext.previous;
+ if (!plan_valid)
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
+
/* Pop the SPI stack */
_SPI_end_call(true);
@@ -2552,6 +2562,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
+replan:
cplan = GetCachedPlan(plansource, options->params,
plan_owner, _SPI_current->queryEnv);
@@ -2661,6 +2672,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
QueryDesc *qdesc;
Snapshot snap;
+ int eflags;
if (ActiveSnapshotSet())
snap = GetActiveSnapshot();
@@ -2674,8 +2686,23 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
options->params,
_SPI_current->queryEnv,
0);
- res = _SPI_pquery(qdesc, fire_triggers,
- canSetTag ? options->tcount : 0);
+
+ /* Select execution options */
+ if (fire_triggers)
+ eflags = 0; /* default run-to-completion flags */
+ else
+ eflags = EXEC_FLAG_SKIP_TRIGGERS;
+
+ if (!ExecutorStart(qdesc, eflags))
+ {
+ ExecutorEnd(qdesc);
+ FreeQueryDesc(qdesc);
+ Assert(cplan);
+ ReleaseCachedPlan(cplan, plan_owner);
+ goto replan;
+ }
+
+ res = _SPI_pquery(qdesc, canSetTag ? options->tcount : 0);
FreeQueryDesc(qdesc);
}
else
@@ -2850,10 +2877,9 @@ _SPI_convert_params(int nargs, Oid *argtypes,
}
static int
-_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
+_SPI_pquery(QueryDesc *queryDesc, uint64 tcount)
{
int operation = queryDesc->operation;
- int eflags;
int res;
switch (operation)
@@ -2897,14 +2923,6 @@ _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
ResetUsage();
#endif
- /* Select execution options */
- if (fire_triggers)
- eflags = 0; /* default run-to-completion flags */
- else
- eflags = EXEC_FLAG_SKIP_TRIGGERS;
-
- ExecutorStart(queryDesc, eflags);
-
ExecutorRun(queryDesc, ForwardScanDirection, tcount, true);
_SPI_current->processed = queryDesc->estate->es_processed;
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index e4756f8be2..204002cff2 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1232,7 +1232,12 @@ exec_simple_query(const char *query_string)
/*
* Start the portal. No parameters here.
*/
- PortalStart(portal, NULL, 0, InvalidSnapshot);
+ {
+ bool plan_valid PG_USED_FOR_ASSERTS_ONLY;
+
+ plan_valid = PortalStart(portal, NULL, 0, InvalidSnapshot);
+ Assert(plan_valid);
+ }
/*
* Select the appropriate output format: text unless we are doing a
@@ -1737,6 +1742,7 @@ exec_bind_message(StringInfo input_message)
"commands ignored until end of transaction block"),
errdetail_abort()));
+replan:
/*
* Create the portal. Allow silent replacement of an existing portal only
* if the unnamed portal is specified.
@@ -2028,9 +2034,15 @@ exec_bind_message(StringInfo input_message)
PopActiveSnapshot();
/*
- * And we're ready to start portal execution.
+ * Start portal execution. If the portal contains a cached plan, it must
+ * be recreated if the cached plan was found to have been invalidated when
+ * initializing one of the plan trees contained in it.
*/
- PortalStart(portal, params, 0, InvalidSnapshot);
+ if (!PortalStart(portal, params, 0, InvalidSnapshot))
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
/*
* Apply the result format requests to the portal.
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5565f200c3..9a96b77f1e 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -19,6 +19,7 @@
#include "access/xact.h"
#include "commands/prepare.h"
+#include "executor/execdesc.h"
#include "executor/tstoreReceiver.h"
#include "miscadmin.h"
#include "pg_trace.h"
@@ -35,12 +36,6 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
- const char *sourceText,
- ParamListInfo params,
- QueryEnvironment *queryEnv,
- DestReceiver *dest,
- QueryCompletion *qc);
static void FillPortalStore(Portal portal, bool isTopLevel);
static uint64 RunFromStore(Portal portal, ScanDirection direction, uint64 count,
DestReceiver *dest);
@@ -116,86 +111,6 @@ FreeQueryDesc(QueryDesc *qdesc)
}
-/*
- * ProcessQuery
- * Execute a single plannable query within a PORTAL_MULTI_QUERY,
- * PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
- *
- * plan: the plan tree for the query
- * sourceText: the source text of the query
- * params: any parameters needed
- * dest: where to send results
- * qc: where to store the command completion status data.
- *
- * qc may be NULL if caller doesn't want a status string.
- *
- * Must be called in a memory context that will be reset or deleted on
- * error; otherwise the executor's memory usage will be leaked.
- */
-static void
-ProcessQuery(PlannedStmt *plan,
- const char *sourceText,
- ParamListInfo params,
- QueryEnvironment *queryEnv,
- DestReceiver *dest,
- QueryCompletion *qc)
-{
- QueryDesc *queryDesc;
-
- /*
- * Create the QueryDesc object
- */
- queryDesc = CreateQueryDesc(plan, sourceText,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, 0);
-
- /*
- * Call ExecutorStart to prepare the plan for execution
- */
- ExecutorStart(queryDesc, 0);
-
- /*
- * Run the plan to completion.
- */
- ExecutorRun(queryDesc, ForwardScanDirection, 0, true);
-
- /*
- * Build command completion status data, if caller wants one.
- */
- if (qc)
- {
- switch (queryDesc->operation)
- {
- case CMD_SELECT:
- SetQueryCompletion(qc, CMDTAG_SELECT, queryDesc->estate->es_processed);
- break;
- case CMD_INSERT:
- SetQueryCompletion(qc, CMDTAG_INSERT, queryDesc->estate->es_processed);
- break;
- case CMD_UPDATE:
- SetQueryCompletion(qc, CMDTAG_UPDATE, queryDesc->estate->es_processed);
- break;
- case CMD_DELETE:
- SetQueryCompletion(qc, CMDTAG_DELETE, queryDesc->estate->es_processed);
- break;
- case CMD_MERGE:
- SetQueryCompletion(qc, CMDTAG_MERGE, queryDesc->estate->es_processed);
- break;
- default:
- SetQueryCompletion(qc, CMDTAG_UNKNOWN, queryDesc->estate->es_processed);
- break;
- }
- }
-
- /*
- * Now, we close down all the scans and free allocated resources.
- */
- ExecutorFinish(queryDesc);
- ExecutorEnd(queryDesc);
-
- FreeQueryDesc(queryDesc);
-}
-
/*
* ChoosePortalStrategy
* Select portal execution strategy given the intended statement list.
@@ -426,19 +341,21 @@ FetchStatementTargetList(Node *stmt)
* presently ignored for non-PORTAL_ONE_SELECT portals (it's only intended
* to be used for cursors).
*
- * On return, portal is ready to accept PortalRun() calls, and the result
- * tupdesc (if any) is known.
+ * True is returned if portal is ready to accept PortalRun() calls, and the
+ * result tupdesc (if any) is known. False if the plan tree is no longer
+ * valid, in which case the caller must retry after generating a new
+ * CachedPlan.
*/
-void
+bool
PortalStart(Portal portal, ParamListInfo params,
int eflags, Snapshot snapshot)
{
Portal saveActivePortal;
ResourceOwner saveResourceOwner;
- MemoryContext savePortalContext;
MemoryContext oldContext;
QueryDesc *queryDesc;
- int myeflags;
+ int myeflags = 0;
+ bool plan_valid = true;
Assert(PortalIsValid(portal));
Assert(portal->status == PORTAL_DEFINED);
@@ -448,15 +365,13 @@ PortalStart(Portal portal, ParamListInfo params,
*/
saveActivePortal = ActivePortal;
saveResourceOwner = CurrentResourceOwner;
- savePortalContext = PortalContext;
PG_TRY();
{
ActivePortal = portal;
if (portal->resowner)
CurrentResourceOwner = portal->resowner;
- PortalContext = portal->portalContext;
- oldContext = MemoryContextSwitchTo(PortalContext);
+ oldContext = MemoryContextSwitchTo(portal->queryContext);
/* Must remember portal param list, if any */
portal->portalParams = params;
@@ -472,6 +387,8 @@ PortalStart(Portal portal, ParamListInfo params,
switch (portal->strategy)
{
case PORTAL_ONE_SELECT:
+ case PORTAL_ONE_RETURNING:
+ case PORTAL_ONE_MOD_WITH:
/* Must set snapshot before starting executor. */
if (snapshot)
@@ -489,8 +406,8 @@ PortalStart(Portal portal, ParamListInfo params,
*/
/*
- * Create QueryDesc in portal's context; for the moment, set
- * the destination to DestNone.
+ * Create QueryDesc in portal->queryContext; for the moment,
+ * set the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
portal->sourceText,
@@ -501,30 +418,51 @@ PortalStart(Portal portal, ParamListInfo params,
portal->queryEnv,
0);
+ /* Remember for PortalRunMulti(). */
+ if (portal->strategy == PORTAL_ONE_RETURNING ||
+ portal->strategy == PORTAL_ONE_MOD_WITH)
+ portal->qdescs = list_make1(queryDesc);
+
/*
* If it's a scrollable cursor, executor needs to support
* REWIND and backwards scan, as well as whatever the caller
* might've asked for.
*/
- if (portal->cursorOptions & CURSOR_OPT_SCROLL)
+ if (portal->strategy == PORTAL_ONE_SELECT &&
+ (portal->cursorOptions & CURSOR_OPT_SCROLL))
myeflags = eflags | EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD;
else
myeflags = eflags;
/*
- * Call ExecutorStart to prepare the plan for execution
+ * Call ExecutorStart to prepare the plan for execution. A
+ * cached plan may get invalidated during plan initialization.
*/
- ExecutorStart(queryDesc, myeflags);
+ if (!ExecutorStart(queryDesc, myeflags))
+ {
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+ PopActiveSnapshot();
+ plan_valid = false;
+ goto plan_init_failed;
+ }
/*
- * This tells PortalCleanup to shut down the executor
+ * This tells PortalCleanup to shut down the executor, though
+ * that is not needed for queries handled by PortalRunMulti().
*/
- portal->queryDesc = queryDesc;
+ if (portal->strategy == PORTAL_ONE_SELECT)
+ portal->queryDesc = queryDesc;
/*
- * Remember tuple descriptor (computed by ExecutorStart)
+ * Remember tuple descriptor (computed by ExecutorStart),
+ * though make it independent of QueryDesc for queries handled
+ * by PortalRunMulti().
*/
- portal->tupDesc = queryDesc->tupDesc;
+ if (portal->strategy != PORTAL_ONE_SELECT)
+ portal->tupDesc = CreateTupleDescCopy(queryDesc->tupDesc);
+ else
+ portal->tupDesc = queryDesc->tupDesc;
/*
* Reset cursor position data to "start of query"
@@ -536,29 +474,6 @@ PortalStart(Portal portal, ParamListInfo params,
PopActiveSnapshot();
break;
- case PORTAL_ONE_RETURNING:
- case PORTAL_ONE_MOD_WITH:
-
- /*
- * We don't start the executor until we are told to run the
- * portal. We do need to set up the result tupdesc.
- */
- {
- PlannedStmt *pstmt;
-
- pstmt = PortalGetPrimaryStmt(portal);
- portal->tupDesc =
- ExecCleanTypeFromTL(pstmt->planTree->targetlist);
- }
-
- /*
- * Reset cursor position data to "start of query"
- */
- portal->atStart = true;
- portal->atEnd = false; /* allow fetches */
- portal->portalPos = 0;
- break;
-
case PORTAL_UTIL_SELECT:
/*
@@ -581,7 +496,81 @@ PortalStart(Portal portal, ParamListInfo params,
break;
case PORTAL_MULTI_QUERY:
- /* Need do nothing now */
+ {
+ ListCell *lc;
+ bool first = true;
+
+ myeflags = eflags;
+ foreach(lc, portal->stmts)
+ {
+ PlannedStmt *plan = lfirst_node(PlannedStmt, lc);
+ bool is_utility = (plan->utilityStmt != NULL);
+
+ /*
+ * Push the snapshot to be used by the executor.
+ */
+ if (!is_utility)
+ {
+ /*
+ * Must copy the snapshot for all statements
+ * except the first as we'll need to update its
+ * command ID.
+ */
+ if (!first)
+ PushCopiedSnapshot(GetTransactionSnapshot());
+ else
+ PushActiveSnapshot(GetTransactionSnapshot());
+ }
+
+ /*
+ * From the 2nd statement onwards, update the command
+ * ID and the snapshot to match.
+ */
+ if (!first)
+ {
+ CommandCounterIncrement();
+ UpdateActiveSnapshotCommandId();
+ }
+
+ first = false;
+
+ /*
+ * Create the QueryDesc. DestReceiver will be set in
+ * PortalRunMulti() before calling ExecutorRun().
+ */
+ queryDesc = CreateQueryDesc(plan,
+ portal->sourceText,
+ !is_utility ?
+ GetActiveSnapshot() :
+ InvalidSnapshot,
+ InvalidSnapshot,
+ NULL,
+ params,
+ portal->queryEnv, 0);
+
+ /* Remember for PortalRunMulti() */
+ portal->qdescs = lappend(portal->qdescs, queryDesc);
+
+ if (is_utility)
+ continue;
+
+ /*
+ * Call ExecutorStart to prepare the plan for
+ * execution. A cached plan may get invalidated
+ * during plan initialization.
+ */
+ if (!ExecutorStart(queryDesc, myeflags))
+ {
+ PopActiveSnapshot();
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+ plan_valid = false;
+ goto plan_init_failed;
+ }
+ PopActiveSnapshot();
+ }
+ }
+
portal->tupDesc = NULL;
break;
}
@@ -594,19 +583,20 @@ PortalStart(Portal portal, ParamListInfo params,
/* Restore global vars and propagate error */
ActivePortal = saveActivePortal;
CurrentResourceOwner = saveResourceOwner;
- PortalContext = savePortalContext;
PG_RE_THROW();
}
PG_END_TRY();
+ portal->status = PORTAL_READY;
+
+plan_init_failed:
MemoryContextSwitchTo(oldContext);
ActivePortal = saveActivePortal;
CurrentResourceOwner = saveResourceOwner;
- PortalContext = savePortalContext;
- portal->status = PORTAL_READY;
+ return plan_valid;
}
/*
@@ -1193,7 +1183,7 @@ PortalRunMulti(Portal portal,
QueryCompletion *qc)
{
bool active_snapshot_set = false;
- ListCell *stmtlist_item;
+ ListCell *qdesc_item;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1214,9 +1204,10 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
- foreach(stmtlist_item, portal->stmts)
+ foreach(qdesc_item, portal->qdescs)
{
- PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ QueryDesc *qdesc = (QueryDesc *) lfirst(qdesc_item);
+ PlannedStmt *pstmt = qdesc->plannedstmt;
/*
* If we got a cancel signal in prior command, quit
@@ -1233,33 +1224,26 @@ PortalRunMulti(Portal portal,
if (log_executor_stats)
ResetUsage();
- /*
- * Must always have a snapshot for plannable queries. First time
- * through, take a new snapshot; for subsequent queries in the
- * same portal, just update the snapshot's copy of the command
- * counter.
- */
+ /* Push the snapshot for plannable queries. */
if (!active_snapshot_set)
{
- Snapshot snapshot = GetTransactionSnapshot();
+ Snapshot snapshot = qdesc->snapshot;
- /* If told to, register the snapshot and save in portal */
+ /*
+ * If told to, register the snapshot and save in portal
+ *
+ * Note that the command ID of qdesc->snapshot for the 2nd query
+ * onwards would have been updated in PortalStart() to account
+ * for CCI() done between queries, but it's OK that here we
+ * don't likewise update holdSnapshot's command ID.
+ */
if (setHoldSnapshot)
{
snapshot = RegisterSnapshot(snapshot);
portal->holdSnapshot = snapshot;
}
- /*
- * We can't have the holdSnapshot also be the active one,
- * because UpdateActiveSnapshotCommandId would complain. So
- * force an extra snapshot copy. Plain PushActiveSnapshot
- * would have copied the transaction snapshot anyway, so this
- * only adds a copy step when setHoldSnapshot is true. (It's
- * okay for the command ID of the active snapshot to diverge
- * from what holdSnapshot has.)
- */
- PushCopiedSnapshot(snapshot);
+ PushActiveSnapshot(snapshot);
/*
* As for PORTAL_ONE_SELECT portals, it does not seem
@@ -1268,26 +1252,39 @@ PortalRunMulti(Portal portal,
active_snapshot_set = true;
}
- else
- UpdateActiveSnapshotCommandId();
+ /*
+ * Run the plan to completion.
+ */
+ qdesc->dest = dest;
+ ExecutorRun(qdesc, ForwardScanDirection, 0, true);
+
+ /*
+ * Build command completion status data if needed.
+ */
if (pstmt->canSetTag)
{
- /* statement can set tag string */
- ProcessQuery(pstmt,
- portal->sourceText,
- portal->portalParams,
- portal->queryEnv,
- dest, qc);
- }
- else
- {
- /* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
- portal->sourceText,
- portal->portalParams,
- portal->queryEnv,
- altdest, NULL);
+ switch (qdesc->operation)
+ {
+ case CMD_SELECT:
+ SetQueryCompletion(qc, CMDTAG_SELECT, qdesc->estate->es_processed);
+ break;
+ case CMD_INSERT:
+ SetQueryCompletion(qc, CMDTAG_INSERT, qdesc->estate->es_processed);
+ break;
+ case CMD_UPDATE:
+ SetQueryCompletion(qc, CMDTAG_UPDATE, qdesc->estate->es_processed);
+ break;
+ case CMD_DELETE:
+ SetQueryCompletion(qc, CMDTAG_DELETE, qdesc->estate->es_processed);
+ break;
+ case CMD_MERGE:
+ SetQueryCompletion(qc, CMDTAG_MERGE, qdesc->estate->es_processed);
+ break;
+ default:
+ SetQueryCompletion(qc, CMDTAG_UNKNOWN, qdesc->estate->es_processed);
+ break;
+ }
}
if (log_executor_stats)
@@ -1342,12 +1339,12 @@ PortalRunMulti(Portal portal,
if (portal->stmts == NIL)
break;
- /*
- * Increment command counter between queries, but not after the last
- * one.
- */
- if (lnext(portal->stmts, stmtlist_item) != NULL)
- CommandCounterIncrement();
+ if (qdesc->estate)
+ {
+ ExecutorFinish(qdesc);
+ ExecutorEnd(qdesc);
+ }
+ FreeQueryDesc(qdesc);
}
/* Pop the snapshot if we pushed one. */
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 06dfa85f04..0cad450dcd 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -201,6 +201,13 @@ CreatePortal(const char *name, bool allowDup, bool dupSilent)
portal->portalContext = AllocSetContextCreate(TopPortalContext,
"PortalContext",
ALLOCSET_SMALL_SIZES);
+ /*
+ * Initialize the portal's query context to store QueryDescs created during
+ * PortalStart() and then used in PortalRun().
+ */
+ portal->queryContext = AllocSetContextCreate(TopPortalContext,
+ "PortalQueryContext",
+ ALLOCSET_SMALL_SIZES);
/* create a resource owner for the portal */
portal->resowner = ResourceOwnerCreate(CurTransactionResourceOwner,
@@ -224,6 +231,7 @@ CreatePortal(const char *name, bool allowDup, bool dupSilent)
/* for named portals reuse portal->name copy */
MemoryContextSetIdentifier(portal->portalContext, portal->name[0] ? portal->name : "<unnamed>");
+ MemoryContextSetIdentifier(portal->queryContext, portal->name[0] ? portal->name : "<unnamed>");
return portal;
}
@@ -594,6 +602,7 @@ PortalDrop(Portal portal, bool isTopCommit)
/* release subsidiary storage */
MemoryContextDelete(portal->portalContext);
+ MemoryContextDelete(portal->queryContext);
/* release portal struct (it's in TopPortalContext) */
pfree(portal);
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 3d3e632a0c..37554727ee 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -88,7 +88,11 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern QueryDesc *ExplainQueryDesc(PlannedStmt *stmt,
+ const char *queryString, IntoClause *into, ExplainState *es,
+ ParamListInfo params, QueryEnvironment *queryEnv);
+extern void ExplainOnePlan(QueryDesc *queryDesc,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
@@ -104,6 +108,7 @@ extern void ExplainQueryParameters(ExplainState *es, ParamListInfo params, int m
extern void ExplainBeginOutput(ExplainState *es);
extern void ExplainEndOutput(ExplainState *es);
+extern void ExplainResetOutput(ExplainState *es);
extern void ExplainSeparatePlans(ExplainState *es);
extern void ExplainPropertyList(const char *qlabel, List *data,
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index 430e3ca7dd..d4f7c29301 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -257,6 +257,7 @@ extern void ExecASTruncateTriggers(EState *estate,
extern void AfterTriggerBeginXact(void);
extern void AfterTriggerBeginQuery(void);
+extern void AfterTriggerCancelQuery(void);
extern void AfterTriggerEndQuery(EState *estate);
extern void AfterTriggerFireDeferred(void);
extern void AfterTriggerEndXact(bool isCommit);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 72cbf120c5..10c5cda169 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -73,7 +73,7 @@
/* Hook for plugins to get control in ExecutorStart() */
-typedef void (*ExecutorStart_hook_type) (QueryDesc *queryDesc, int eflags);
+typedef bool (*ExecutorStart_hook_type) (QueryDesc *queryDesc, int eflags);
extern PGDLLIMPORT ExecutorStart_hook_type ExecutorStart_hook;
/* Hook for plugins to get control in ExecutorRun() */
@@ -198,8 +198,8 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
-extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
-extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern bool ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern bool standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count, bool execute_once);
extern void standard_ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index b2a576b76d..0922be6678 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -670,6 +670,9 @@ typedef struct EState
int es_top_eflags; /* eflags passed to ExecutorStart */
int es_instrument; /* OR of InstrumentOption flags */
bool es_finished; /* true when ExecutorFinish is done */
+ bool es_canceled; /* true when execution was canceled
+ * upon finding that the plan was invalidated
+ * during ExecInitNode() */
List *es_exprcontexts; /* List of ExprContexts within EState */
diff --git a/src/include/tcop/pquery.h b/src/include/tcop/pquery.h
index a5e65b98aa..577b81a9ee 100644
--- a/src/include/tcop/pquery.h
+++ b/src/include/tcop/pquery.h
@@ -29,7 +29,7 @@ extern List *FetchPortalTargetList(Portal portal);
extern List *FetchStatementTargetList(Node *stmt);
-extern void PortalStart(Portal portal, ParamListInfo params,
+extern bool PortalStart(Portal portal, ParamListInfo params,
int eflags, Snapshot snapshot);
extern void PortalSetResultFormat(Portal portal, int nFormats,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aa08b1e0fc..af059e30f8 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,8 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *qdescs; /* list of QueryDescs */
+ MemoryContext queryContext; /* memory for QueryDescs and children */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
--
2.35.3
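For readers skimming the diff above: the essential change is that ExecutorStart()
and PortalStart() now report cached-plan invalidation to their callers, each of
which wraps plan creation and executor startup in a retry loop. The following is
a condensed sketch of the loop that the _SPI_execute_plan() hunks install, with
snapshot setup and error handling elided; the identifiers are the ones that
appear in the patch, not a complete excerpt of any one hunk:

    replan:
        /* Get the cached plan, replanning first if it is already invalid. */
        cplan = GetCachedPlan(plansource, options->params,
                              plan_owner, _SPI_current->queryEnv);

        ... /* build a QueryDesc for the PlannedStmt taken from cplan */

        /* Select execution options */
        if (fire_triggers)
            eflags = 0;         /* default run-to-completion flags */
        else
            eflags = EXEC_FLAG_SKIP_TRIGGERS;

        /*
         * ExecutorStart() now returns false if the CachedPlan was
         * invalidated while the plan tree was being initialized.
         */
        if (!ExecutorStart(qdesc, eflags))
        {
            ExecutorEnd(qdesc); /* shuts down the partially built tree */
            FreeQueryDesc(qdesc);
            ReleaseCachedPlan(cplan, plan_owner);
            goto replan;        /* make a fresh plan and start over */
        }

        res = _SPI_pquery(qdesc, canSetTag ? options->tcount : 0);

exec_bind_message() and SPI_cursor_open_internal() follow the same shape, except
that they drop and recreate the whole portal when PortalStart() returns false.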
v47-0003-Support-for-ExecInitNode-to-detect-CachedPlan-in.patch
From 16f2dea62bcff534a41f3185c349248498d01383 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Wed, 6 Sep 2023 17:53:34 +0900
Subject: [PATCH v47 3/8] Support for ExecInitNode() to detect CachedPlan
invalidation
This commit adds checks to determine whether a CachedPlan remains valid
during the ExecInitNode() traversal of a plan tree taken from it. The
checks are placed right after opening/locking tables and after the
recursive ExecInitNode() calls that initialize child plans. Depending
on the situation, specific ExecInit*() routines will:
* Return NULL if invalidation is spotted right after opening a table
or after a function that opens one, but before initializing child
nodes.
* Return the partially initialized PlanState node if invalidation is
found after recursively initializing a child node via
ExecInitNode().
A prior commit already fortified ExecEnd*() to manage these partially
initialized nodes.
Importantly, this commit doesn't alter functionality. The CachedPlan
isn't fed to the executor as of now, and the executor doesn't lock
tables.
Reviewed-by: Robert Haas
---
contrib/postgres_fdw/postgres_fdw.c | 4 ++++
src/backend/executor/execMain.c | 24 ++++++++++++++++++++--
src/backend/executor/execPartition.c | 4 ++++
src/backend/executor/execProcnode.c | 19 ++++++++++++++++-
src/backend/executor/execUtils.c | 2 ++
src/backend/executor/nodeAgg.c | 2 ++
src/backend/executor/nodeAppend.c | 14 ++++++++++---
src/backend/executor/nodeBitmapAnd.c | 11 +++++++---
src/backend/executor/nodeBitmapHeapscan.c | 4 ++++
src/backend/executor/nodeBitmapIndexscan.c | 2 ++
src/backend/executor/nodeBitmapOr.c | 11 +++++++---
src/backend/executor/nodeCustom.c | 2 ++
src/backend/executor/nodeForeignscan.c | 4 ++++
src/backend/executor/nodeGather.c | 3 +++
src/backend/executor/nodeGatherMerge.c | 2 ++
src/backend/executor/nodeGroup.c | 2 ++
src/backend/executor/nodeHash.c | 2 ++
src/backend/executor/nodeHashjoin.c | 4 ++++
src/backend/executor/nodeIncrementalSort.c | 2 ++
src/backend/executor/nodeIndexonlyscan.c | 4 ++++
src/backend/executor/nodeIndexscan.c | 4 ++++
src/backend/executor/nodeLimit.c | 2 ++
src/backend/executor/nodeLockRows.c | 2 ++
src/backend/executor/nodeMaterial.c | 2 ++
src/backend/executor/nodeMemoize.c | 2 ++
src/backend/executor/nodeMergeAppend.c | 10 ++++++++-
src/backend/executor/nodeMergejoin.c | 4 ++++
src/backend/executor/nodeModifyTable.c | 7 +++++++
src/backend/executor/nodeNestloop.c | 4 ++++
src/backend/executor/nodeProjectSet.c | 2 ++
src/backend/executor/nodeRecursiveunion.c | 4 ++++
src/backend/executor/nodeResult.c | 2 ++
src/backend/executor/nodeSamplescan.c | 2 ++
src/backend/executor/nodeSeqscan.c | 2 ++
src/backend/executor/nodeSetOp.c | 2 ++
src/backend/executor/nodeSort.c | 2 ++
src/backend/executor/nodeSubqueryscan.c | 2 ++
src/backend/executor/nodeTidrangescan.c | 2 ++
src/backend/executor/nodeTidscan.c | 2 ++
src/backend/executor/nodeUnique.c | 2 ++
src/backend/executor/nodeWindowAgg.c | 2 ++
src/include/executor/executor.h | 10 +++++++++
src/include/nodes/execnodes.h | 2 ++
src/include/utils/plancache.h | 14 +++++++++++++
44 files changed, 198 insertions(+), 13 deletions(-)
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 1393716587..ab7ecb925c 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2660,7 +2660,11 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
/* Get info about foreign table. */
rtindex = node->resultRelInfo->ri_RangeTableIndex;
if (fsplan->scan.scanrelid == 0)
+ {
dmstate->rel = ExecOpenScanRelation(estate, rtindex, eflags);
+ if (!ExecPlanStillValid(estate))
+ return;
+ }
else
dmstate->rel = node->ss.ss_currentRelation;
table = GetForeignTable(RelationGetRelid(dmstate->rel));
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4c5a7bbf62..66f1b7398d 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -839,8 +839,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
- PlanState *planstate;
- TupleDesc tupType;
+ PlanState *planstate = NULL;
+ TupleDesc tupType = NULL;
ListCell *l;
int i;
@@ -855,6 +855,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
estate->es_plannedstmt = plannedstmt;
+ estate->es_cachedplan = NULL;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
@@ -886,6 +887,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
case ROW_MARK_KEYSHARE:
case ROW_MARK_REFERENCE:
relation = ExecGetRangeTableRelation(estate, rc->rti);
+ if (!ExecPlanStillValid(estate))
+ goto plan_init_suspended;
break;
case ROW_MARK_COPY:
/* no physical table access is required */
@@ -956,6 +959,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_subplanstates = lappend(estate->es_subplanstates,
subplanstate);
+ if (!ExecPlanStillValid(estate))
+ goto plan_init_suspended;
i++;
}
@@ -966,6 +971,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
* processing tuples.
*/
planstate = ExecInitNode(plan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ goto plan_init_suspended;
/*
* Get the tuple descriptor describing the type of tuples to return.
@@ -1007,6 +1014,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
}
}
+plan_init_suspended:
queryDesc->tupDesc = tupType;
queryDesc->planstate = planstate;
}
@@ -2945,6 +2953,12 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
PlanState *subplanstate;
subplanstate = ExecInitNode(subplan, rcestate, 0);
+
+ /*
+ * At this point, we shouldn't have received any new invalidation
+ * messages that would make the plan tree stale.
+ */
+ Assert(ExecPlanStillValid(rcestate));
rcestate->es_subplanstates = lappend(rcestate->es_subplanstates,
subplanstate);
}
@@ -2988,6 +3002,12 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
*/
epqstate->recheckplanstate = ExecInitNode(planTree, rcestate, 0);
+ /*
+ * At this point, we shouldn't have received any new invalidation messages
+ * that would make the plan tree stale.
+ */
+ Assert(ExecPlanStillValid(rcestate));
+
MemoryContextSwitchTo(oldcontext);
}
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index eb8a87fd63..e88455368c 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1801,6 +1801,8 @@ ExecInitPartitionPruning(PlanState *planstate,
/* Create the working data structure for pruning */
prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Perform an initial partition prune pass, if required.
@@ -1927,6 +1929,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* duration of this executor run.
*/
partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
partkey = RelationGetPartitionKey(partrel);
partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
partrel);
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 6098cdca69..169a52b038 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -135,7 +135,20 @@ static bool ExecShutdownNode_walker(PlanState *node, void *context);
* 'estate' is the shared execution state for the plan tree
* 'eflags' is a bitwise OR of flag bits described in executor.h
*
- * Returns a PlanState node corresponding to the given Plan node.
+ * Returns a PlanState node corresponding to the given Plan node or NULL.
+ *
+ * The node type-specific ExecInit* routines listed in this function can
+ * either return NULL or a partially initialized PlanState tree when they
+ * detect that the CachedPlan has been invalidated. This is determined by
+ * invoking ExecPlanStillValid() at key points, for instance, right
+ * after opening/locking a relation, or following the call to a function
+ * that might open/lock a relation. The latter involves recursive calls
+ * to ExecInitNode() for child node initialization. If an ExecInit*
+ * routine gets false from ExecPlanStillValid(), it should:
+ * - Return NULL if no child node was initialized at the time of
+ * checking.
+ * - Provide the partially initialized PlanState node if any child node
+ * was set up recursively by then.
* ------------------------------------------------------------------------
*/
PlanState *
@@ -388,6 +401,10 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
break;
}
+ if (!ExecPlanStillValid(estate))
+ return result;
+
+ Assert(result != NULL);
ExecSetExecProcNode(result, result->ExecProcNode);
/*
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 16704c0c2f..c3f7279b06 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -822,6 +822,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
Relation resultRelationDesc;
resultRelationDesc = ExecGetRangeTableRelation(estate, rti);
+ if (!ExecPlanStillValid(estate))
+ return;
InitResultRelInfo(resultRelInfo,
resultRelationDesc,
rti,
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index f154f28902..b70e8c2cd6 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -3304,6 +3304,8 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
eflags &= ~EXEC_FLAG_REWIND;
outerPlan = outerPlan(node);
outerPlanState(aggstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return aggstate;
/*
* initialize source tuple type.
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 609df6b9e6..588f5388c7 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -147,6 +147,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
list_length(node->appendplans),
node->part_prune_info,
&validsubplans);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
@@ -185,8 +187,13 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->ps.resultopsset = true;
appendstate->ps.resultopsfixed = false;
- appendplanstates = (PlanState **) palloc(nplans *
- sizeof(PlanState *));
+ /*
+ * Any uninitialized subnodes will have NULL in appendplans in the case of
+ * an early return.
+ */
+ appendstate->appendplans = appendplanstates =
+ (PlanState **) palloc0(nplans * sizeof(PlanState *));
+ appendstate->as_nplans = nplans;
/*
* call ExecInitNode on each of the valid plans to be executed and save
@@ -221,11 +228,12 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
firstvalid = j;
appendplanstates[j++] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return appendstate;
}
appendstate->as_first_partial_plan = firstvalid;
appendstate->appendplans = appendplanstates;
- appendstate->as_nplans = nplans;
/* Initialize async state */
appendstate->as_asyncplans = asyncplans;
diff --git a/src/backend/executor/nodeBitmapAnd.c b/src/backend/executor/nodeBitmapAnd.c
index 4c5eb2b23b..c0495ec90f 100644
--- a/src/backend/executor/nodeBitmapAnd.c
+++ b/src/backend/executor/nodeBitmapAnd.c
@@ -69,6 +69,10 @@ ExecInitBitmapAnd(BitmapAnd *node, EState *estate, int eflags)
*/
nplans = list_length(node->bitmapplans);
+ /*
+ * Any uninitialized subnodes will have NULL in bitmapplans in the case of
+ * an early return.
+ */
bitmapplanstates = (PlanState **) palloc0(nplans * sizeof(PlanState *));
/*
@@ -78,7 +82,6 @@ ExecInitBitmapAnd(BitmapAnd *node, EState *estate, int eflags)
bitmapandstate->ps.state = estate;
bitmapandstate->ps.ExecProcNode = ExecBitmapAnd;
bitmapandstate->bitmapplans = bitmapplanstates;
- bitmapandstate->nplans = nplans;
/*
* call ExecInitNode on each of the plans to be executed and save the
@@ -88,8 +91,10 @@ ExecInitBitmapAnd(BitmapAnd *node, EState *estate, int eflags)
foreach(l, node->bitmapplans)
{
initNode = (Plan *) lfirst(l);
- bitmapplanstates[i] = ExecInitNode(initNode, estate, eflags);
- i++;
+ bitmapplanstates[i++] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return bitmapandstate;
+ bitmapandstate->nplans = i;
}
/*
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index c8c1c9d88e..715c111be9 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -752,11 +752,15 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return scanstate;
/*
* initialize child nodes
*/
outerPlanState(scanstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return scanstate;
/*
* get the scan type from the relation descriptor.
diff --git a/src/backend/executor/nodeBitmapIndexscan.c b/src/backend/executor/nodeBitmapIndexscan.c
index 7cf8532bc9..4200472d02 100644
--- a/src/backend/executor/nodeBitmapIndexscan.c
+++ b/src/backend/executor/nodeBitmapIndexscan.c
@@ -255,6 +255,8 @@ ExecInitBitmapIndexScan(BitmapIndexScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
indexstate->biss_RelationDesc = index_open(node->indexid, lockmode);
+ if (!ExecPlanStillValid(estate))
+ return indexstate;
/*
* Initialize index-specific scan state
diff --git a/src/backend/executor/nodeBitmapOr.c b/src/backend/executor/nodeBitmapOr.c
index 0bf8af9652..00120669a5 100644
--- a/src/backend/executor/nodeBitmapOr.c
+++ b/src/backend/executor/nodeBitmapOr.c
@@ -70,6 +70,10 @@ ExecInitBitmapOr(BitmapOr *node, EState *estate, int eflags)
*/
nplans = list_length(node->bitmapplans);
+ /*
+ * Any uninitialized subnodes will have NULL in bitmapplans in the case of
+ * an early return.
+ */
bitmapplanstates = (PlanState **) palloc0(nplans * sizeof(PlanState *));
/*
@@ -79,7 +83,6 @@ ExecInitBitmapOr(BitmapOr *node, EState *estate, int eflags)
bitmaporstate->ps.state = estate;
bitmaporstate->ps.ExecProcNode = ExecBitmapOr;
bitmaporstate->bitmapplans = bitmapplanstates;
- bitmaporstate->nplans = nplans;
/*
* call ExecInitNode on each of the plans to be executed and save the
@@ -89,8 +92,10 @@ ExecInitBitmapOr(BitmapOr *node, EState *estate, int eflags)
foreach(l, node->bitmapplans)
{
initNode = (Plan *) lfirst(l);
- bitmapplanstates[i] = ExecInitNode(initNode, estate, eflags);
- i++;
+ bitmapplanstates[i++] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return bitmaporstate;
+ bitmaporstate->nplans = i;
}
/*
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
index 28b5bb9353..160eeee071 100644
--- a/src/backend/executor/nodeCustom.c
+++ b/src/backend/executor/nodeCustom.c
@@ -61,6 +61,8 @@ ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
if (scanrelid > 0)
{
scan_rel = ExecOpenScanRelation(estate, scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
css->ss.ss_currentRelation = scan_rel;
}
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 298ea59a1e..0f29b03977 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -173,6 +173,8 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
if (scanrelid > 0)
{
currentRelation = ExecOpenScanRelation(estate, scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
scanstate->ss.ss_currentRelation = currentRelation;
fdwroutine = GetFdwRoutineForRelation(currentRelation, true);
}
@@ -264,6 +266,8 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
if (outerPlan(node))
outerPlanState(scanstate) =
ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return scanstate;
/*
* Tell the FDW to initialize the scan.
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index bb2500a469..6b26e03f74 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -89,6 +89,9 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
*/
outerNode = outerPlan(node);
outerPlanState(gatherstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return gatherstate;
+
tupDesc = ExecGetResultType(outerPlanState(gatherstate));
/*
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 7a71a58509..84412f94bb 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -108,6 +108,8 @@ ExecInitGatherMerge(GatherMerge *node, EState *estate, int eflags)
*/
outerNode = outerPlan(node);
outerPlanState(gm_state) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return gm_state;
/*
* Leader may access ExecProcNode result directly (if
diff --git a/src/backend/executor/nodeGroup.c b/src/backend/executor/nodeGroup.c
index 8c650f0e46..b6068887f6 100644
--- a/src/backend/executor/nodeGroup.c
+++ b/src/backend/executor/nodeGroup.c
@@ -185,6 +185,8 @@ ExecInitGroup(Group *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(grpstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return grpstate;
/*
* Initialize scan slot and type.
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index e72f0986c2..030bf0ed43 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -386,6 +386,8 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(hashstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return hashstate;
/*
* initialize our result slot and type. No need to build projection
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index aea44a9d56..49a6ba4276 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -752,8 +752,12 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
hashNode = (Hash *) innerPlan(node);
outerPlanState(hjstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return hjstate;
outerDesc = ExecGetResultType(outerPlanState(hjstate));
innerPlanState(hjstate) = ExecInitNode((Plan *) hashNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return hjstate;
innerDesc = ExecGetResultType(innerPlanState(hjstate));
/*
diff --git a/src/backend/executor/nodeIncrementalSort.c b/src/backend/executor/nodeIncrementalSort.c
index 544e64dfab..e83feae353 100644
--- a/src/backend/executor/nodeIncrementalSort.c
+++ b/src/backend/executor/nodeIncrementalSort.c
@@ -1041,6 +1041,8 @@ ExecInitIncrementalSort(IncrementalSort *node, EState *estate, int eflags)
* nodes may be able to do something more useful.
*/
outerPlanState(incrsortstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return incrsortstate;
/*
* Initialize scan slot and type.
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index f1db35665c..ea7fd89c0c 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -496,6 +496,8 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
indexstate->ss.ss_currentRelation = currentRelation;
indexstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
@@ -549,6 +551,8 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
indexstate->ioss_RelationDesc = index_open(node->indexid, lockmode);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize index-specific scan state
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 14b9c00217..906358011a 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -909,6 +909,8 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
indexstate->ss.ss_currentRelation = currentRelation;
indexstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
@@ -954,6 +956,8 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
indexstate->iss_RelationDesc = index_open(node->indexid, lockmode);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Initialize index-specific scan state
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index 5654158e3e..6760de0f25 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -476,6 +476,8 @@ ExecInitLimit(Limit *node, EState *estate, int eflags)
*/
outerPlan = outerPlan(node);
outerPlanState(limitstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return limitstate;
/*
* initialize child expressions
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index e459971d32..2599332f01 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -322,6 +322,8 @@ ExecInitLockRows(LockRows *node, EState *estate, int eflags)
* then initialize outer plan
*/
outerPlanState(lrstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return lrstate;
/* node returns unmodified slots from the outer plan */
lrstate->ps.resultopsset = true;
diff --git a/src/backend/executor/nodeMaterial.c b/src/backend/executor/nodeMaterial.c
index 753ea28915..b974ebdc8a 100644
--- a/src/backend/executor/nodeMaterial.c
+++ b/src/backend/executor/nodeMaterial.c
@@ -214,6 +214,8 @@ ExecInitMaterial(Material *node, EState *estate, int eflags)
outerPlan = outerPlan(node);
outerPlanState(matstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return matstate;
/*
* Initialize result type and slot. No need to initialize projection info
diff --git a/src/backend/executor/nodeMemoize.c b/src/backend/executor/nodeMemoize.c
index 81f2acde5e..ac0a8a0ae4 100644
--- a/src/backend/executor/nodeMemoize.c
+++ b/src/backend/executor/nodeMemoize.c
@@ -938,6 +938,8 @@ ExecInitMemoize(Memoize *node, EState *estate, int eflags)
outerNode = outerPlan(node);
outerPlanState(mstate) = ExecInitNode(outerNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return mstate;
/*
* Initialize return slot and type. No need to initialize projection info
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 21b5726e6e..c9d406c230 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -95,6 +95,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
list_length(node->mergeplans),
node->part_prune_info,
&validsubplans);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
@@ -120,7 +122,11 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ms_prune_state = NULL;
}
- mergeplanstates = (PlanState **) palloc(nplans * sizeof(PlanState *));
+ /*
+ * Any uninitialized subnodes will have NULL in mergeplans in the case of
+ * an early return.
+ */
+ mergeplanstates = (PlanState **) palloc0(nplans * sizeof(PlanState *));
mergestate->mergeplans = mergeplanstates;
mergestate->ms_nplans = nplans;
@@ -151,6 +157,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
Plan *initNode = (Plan *) list_nth(node->mergeplans, i);
mergeplanstates[j++] = ExecInitNode(initNode, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return mergestate;
}
mergestate->ps.ps_ProjInfo = NULL;
diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 648fdd9a5f..e7f4512419 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -1490,11 +1490,15 @@ ExecInitMergeJoin(MergeJoin *node, EState *estate, int eflags)
mergestate->mj_SkipMarkRestore = node->skip_mark_restore;
outerPlanState(mergestate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return mergestate;
outerDesc = ExecGetResultType(outerPlanState(mergestate));
innerPlanState(mergestate) = ExecInitNode(innerPlan(node), estate,
mergestate->mj_SkipMarkRestore ?
eflags :
(eflags | EXEC_FLAG_MARK));
+ if (!ExecPlanStillValid(estate))
+ return mergestate;
innerDesc = ExecGetResultType(innerPlanState(mergestate));
/*
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index d21a178ad5..c28d5058e9 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -3985,6 +3985,9 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
linitial_int(node->resultRelations));
}
+ if (!ExecPlanStillValid(estate))
+ return NULL;
+
/* set up epqstate with dummy subplan data for the moment */
EvalPlanQualInit(&mtstate->mt_epqstate, estate, NULL, NIL,
node->epqParam, node->resultRelations);
@@ -4012,6 +4015,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
if (resultRelInfo != mtstate->rootResultRelInfo)
{
ExecInitResultRelation(estate, resultRelInfo, resultRelation);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* For child result relations, store the root result relation
@@ -4039,6 +4044,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* Now we may initialize the subplan.
*/
outerPlanState(mtstate) = ExecInitNode(subplan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return mtstate;
/*
* Do additional per-result-relation initialization.
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index fc8f833d8b..0158a3e592 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -295,11 +295,15 @@ ExecInitNestLoop(NestLoop *node, EState *estate, int eflags)
* values.
*/
outerPlanState(nlstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return nlstate;
if (node->nestParams == NIL)
eflags |= EXEC_FLAG_REWIND;
else
eflags &= ~EXEC_FLAG_REWIND;
innerPlanState(nlstate) = ExecInitNode(innerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return nlstate;
/*
* Initialize result slot, type and projection.
diff --git a/src/backend/executor/nodeProjectSet.c b/src/backend/executor/nodeProjectSet.c
index b4bbdc89b1..1b4774d4f7 100644
--- a/src/backend/executor/nodeProjectSet.c
+++ b/src/backend/executor/nodeProjectSet.c
@@ -247,6 +247,8 @@ ExecInitProjectSet(ProjectSet *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(state) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return state;
/*
* we don't use inner plan
diff --git a/src/backend/executor/nodeRecursiveunion.c b/src/backend/executor/nodeRecursiveunion.c
index 54cd6f2347..6398475c62 100644
--- a/src/backend/executor/nodeRecursiveunion.c
+++ b/src/backend/executor/nodeRecursiveunion.c
@@ -244,7 +244,11 @@ ExecInitRecursiveUnion(RecursiveUnion *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(rustate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return rustate;
innerPlanState(rustate) = ExecInitNode(innerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return rustate;
/*
* If hashing, precompute fmgr lookup data for inner loop, and create the
diff --git a/src/backend/executor/nodeResult.c b/src/backend/executor/nodeResult.c
index e9f5732f33..d4ea101cbe 100644
--- a/src/backend/executor/nodeResult.c
+++ b/src/backend/executor/nodeResult.c
@@ -208,6 +208,8 @@ ExecInitResult(Result *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(resstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return resstate;
/*
* we don't use inner plan
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index 41c1ea37ad..5bec5c1f64 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -125,6 +125,8 @@ ExecInitSampleScan(SampleScan *node, EState *estate, int eflags)
ExecOpenScanRelation(estate,
node->scan.scanrelid,
eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/* we won't set up the HeapScanDesc till later */
scanstate->ss.ss_currentScanDesc = NULL;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 49a5933aff..48e20aa735 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -153,6 +153,8 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
ExecOpenScanRelation(estate,
node->scan.scanrelid,
eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/* and create slot with the appropriate rowtype */
ExecInitScanTupleSlot(estate, &scanstate->ss,
diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index 98c1b84d43..7a3a142204 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -528,6 +528,8 @@ ExecInitSetOp(SetOp *node, EState *estate, int eflags)
if (node->strategy == SETOP_HASHED)
eflags &= ~EXEC_FLAG_REWIND;
outerPlanState(setopstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return setopstate;
outerDesc = ExecGetResultType(outerPlanState(setopstate));
/*
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index eea7f2ae15..3ebbc46604 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -263,6 +263,8 @@ ExecInitSort(Sort *node, EState *estate, int eflags)
eflags &= ~(EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK);
outerPlanState(sortstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return sortstate;
/*
* Initialize scan slot and type.
diff --git a/src/backend/executor/nodeSubqueryscan.c b/src/backend/executor/nodeSubqueryscan.c
index 1ee6295660..3c5c7c2ebb 100644
--- a/src/backend/executor/nodeSubqueryscan.c
+++ b/src/backend/executor/nodeSubqueryscan.c
@@ -124,6 +124,8 @@ ExecInitSubqueryScan(SubqueryScan *node, EState *estate, int eflags)
* initialize subquery
*/
subquerystate->subplan = ExecInitNode(node->subplan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return subquerystate;
/*
* Initialize scan slot and type (needed by ExecAssignScanProjectionInfo)
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index da622d3f5f..d337f3d54a 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -374,6 +374,8 @@ ExecInitTidRangeScan(TidRangeScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
tidrangestate->ss.ss_currentRelation = currentRelation;
tidrangestate->ss.ss_currentScanDesc = NULL; /* no table scan here */
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index 15055077d0..9637f354b2 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -517,6 +517,8 @@ ExecInitTidScan(TidScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
tidstate->ss.ss_currentRelation = currentRelation;
tidstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
diff --git a/src/backend/executor/nodeUnique.c b/src/backend/executor/nodeUnique.c
index 01f951197c..28630e380e 100644
--- a/src/backend/executor/nodeUnique.c
+++ b/src/backend/executor/nodeUnique.c
@@ -136,6 +136,8 @@ ExecInitUnique(Unique *node, EState *estate, int eflags)
* then initialize outer plan
*/
outerPlanState(uniquestate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return uniquestate;
/*
* Initialize result slot and type. Unique nodes do no projections, so
diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index f5170799e4..29151fe44b 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -2461,6 +2461,8 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags)
*/
outerPlan = outerPlan(node);
outerPlanState(winstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (!ExecPlanStillValid(estate))
+ return winstate;
/*
* initialize source tuple type (which is also the tuple type that we'll
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index aeebe0e0ff..72cbf120c5 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -19,6 +19,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/plancache.h"
/*
@@ -256,6 +257,15 @@ extern void ExecEndNode(PlanState *node);
extern void ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * Is the CachedPlan in es_cachedplan still valid?
+ */
+static inline bool
+ExecPlanStillValid(EState *estate)
+{
+ return estate->es_cachedplan == NULL ? true :
+ CachedPlanStillValid(estate->es_cachedplan);
+}
/* ----------------------------------------------------------------
* ExecProcNode
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index cb714f4a19..b2a576b76d 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -623,6 +623,8 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ struct CachedPlan *es_cachedplan; /* CachedPlan if plannedstmt is from
+ * one or NULL if not */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 916e59d9fe..0a9e041d51 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -221,6 +221,20 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
QueryEnvironment *queryEnv);
+
+/*
+ * CachedPlanStillValid
+ * Returns whether a cached generic plan is still valid
+ *
+ * Invoked by the executor for each relation lock acquired during the
+ * initialization of the plan tree within the CachedPlan.
+ */
+static inline bool
+CachedPlanStillValid(CachedPlan *cplan)
+{
+ return cplan->is_valid;
+}
+
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
--
2.35.3
v47-0002-Check-pointer-NULLness-before-cleanup-in-ExecEnd.patch
From 13e4d0b8d381c1d0e507a09e2350150d4f1bc671 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Wed, 6 Sep 2023 17:53:16 +0900
Subject: [PATCH v47 2/8] Check pointer NULLness before cleanup in ExecEnd*
routines
Many routines already perform this check, but a few instances remain.
Currently, these NULLness checks might seem redundant since ExecEnd*
routines operate under the assumption that their matching ExecInit*
routine would have fully executed, ensuring pointers are set. However,
a forthcoming patch will modify ExecInit* routines to sometimes exit
early, potentially leaving some pointers in an undetermined state.
Reviewed-by: Robert Haas
---
src/backend/executor/nodeBitmapHeapscan.c | 3 ++-
src/backend/executor/nodeForeignscan.c | 13 +++++++-----
src/backend/executor/nodeIncrementalSort.c | 6 ++++--
src/backend/executor/nodeMemoize.c | 1 +
src/backend/executor/nodeRecursiveunion.c | 6 ++++--
src/backend/executor/nodeWindowAgg.c | 24 ++++++++++++++--------
6 files changed, 35 insertions(+), 18 deletions(-)
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 2db0acfc76..c8c1c9d88e 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -681,7 +681,8 @@ ExecEndBitmapHeapScan(BitmapHeapScanState *node)
/*
* close heap scan
*/
- table_endscan(scanDesc);
+ if (scanDesc != NULL)
+ table_endscan(scanDesc);
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 73913ebb18..298ea59a1e 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -301,13 +301,16 @@ ExecEndForeignScan(ForeignScanState *node)
EState *estate = node->ss.ps.state;
/* Let the FDW shut down */
- if (plan->operation != CMD_SELECT)
+ if (node->fdwroutine != NULL)
{
- if (estate->es_epq_active == NULL)
- node->fdwroutine->EndDirectModify(node);
+ if (plan->operation != CMD_SELECT)
+ {
+ if (estate->es_epq_active == NULL)
+ node->fdwroutine->EndDirectModify(node);
+ }
+ else
+ node->fdwroutine->EndForeignScan(node);
}
- else
- node->fdwroutine->EndForeignScan(node);
/* Shut down any outer plan. */
if (outerPlanState(node))
diff --git a/src/backend/executor/nodeIncrementalSort.c b/src/backend/executor/nodeIncrementalSort.c
index cd094a190c..544e64dfab 100644
--- a/src/backend/executor/nodeIncrementalSort.c
+++ b/src/backend/executor/nodeIncrementalSort.c
@@ -1079,8 +1079,10 @@ ExecEndIncrementalSort(IncrementalSortState *node)
{
SO_printf("ExecEndIncrementalSort: shutting down sort node\n");
- ExecDropSingleTupleTableSlot(node->group_pivot);
- ExecDropSingleTupleTableSlot(node->transfer_tuple);
+ if (node->group_pivot != NULL)
+ ExecDropSingleTupleTableSlot(node->group_pivot);
+ if (node->transfer_tuple != NULL)
+ ExecDropSingleTupleTableSlot(node->transfer_tuple);
/*
* Release tuplesort resources.
diff --git a/src/backend/executor/nodeMemoize.c b/src/backend/executor/nodeMemoize.c
index 94bf479287..81f2acde5e 100644
--- a/src/backend/executor/nodeMemoize.c
+++ b/src/backend/executor/nodeMemoize.c
@@ -1043,6 +1043,7 @@ ExecEndMemoize(MemoizeState *node)
{
#ifdef USE_ASSERT_CHECKING
/* Validate the memory accounting code is correct in assert builds. */
+ if (node->hashtable != NULL)
{
int count;
uint64 mem = 0;
diff --git a/src/backend/executor/nodeRecursiveunion.c b/src/backend/executor/nodeRecursiveunion.c
index e781003934..54cd6f2347 100644
--- a/src/backend/executor/nodeRecursiveunion.c
+++ b/src/backend/executor/nodeRecursiveunion.c
@@ -272,8 +272,10 @@ void
ExecEndRecursiveUnion(RecursiveUnionState *node)
{
/* Release tuplestores */
- tuplestore_end(node->working_table);
- tuplestore_end(node->intermediate_table);
+ if (node->working_table != NULL)
+ tuplestore_end(node->working_table);
+ if (node->intermediate_table != NULL)
+ tuplestore_end(node->intermediate_table);
/* free subsidiary stuff including hashtable */
if (node->tempContext)
diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 77724a6daa..f5170799e4 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -1351,11 +1351,14 @@ release_partition(WindowAggState *winstate)
* any aggregate temp data). We don't rely on retail pfree because some
* aggregates might have allocated data we don't have direct pointers to.
*/
- MemoryContextResetAndDeleteChildren(winstate->partcontext);
- MemoryContextResetAndDeleteChildren(winstate->aggcontext);
+ if (winstate->partcontext != NULL)
+ MemoryContextResetAndDeleteChildren(winstate->partcontext);
+ if (winstate->aggcontext != NULL)
+ MemoryContextResetAndDeleteChildren(winstate->aggcontext);
for (i = 0; i < winstate->numaggs; i++)
{
- if (winstate->peragg[i].aggcontext != winstate->aggcontext)
+ if (winstate->peragg[i].aggcontext != NULL &&
+ winstate->peragg[i].aggcontext != winstate->aggcontext)
MemoryContextResetAndDeleteChildren(winstate->peragg[i].aggcontext);
}
@@ -2688,14 +2691,19 @@ ExecEndWindowAgg(WindowAggState *node)
for (i = 0; i < node->numaggs; i++)
{
- if (node->peragg[i].aggcontext != node->aggcontext)
+ if (node->peragg[i].aggcontext != NULL &&
+ node->peragg[i].aggcontext != node->aggcontext)
MemoryContextDelete(node->peragg[i].aggcontext);
}
- MemoryContextDelete(node->partcontext);
- MemoryContextDelete(node->aggcontext);
+ if (node->partcontext != NULL)
+ MemoryContextDelete(node->partcontext);
+ if (node->aggcontext != NULL)
+ MemoryContextDelete(node->aggcontext);
- pfree(node->perfunc);
- pfree(node->peragg);
+ if (node->perfunc != NULL)
+ pfree(node->perfunc);
+ if (node->peragg != NULL)
+ pfree(node->peragg);
outerPlan = outerPlanState(node);
ExecEndNode(outerPlan);
--
2.35.3
v47-0001-Remove-obsolete-executor-cleanup-code.patch
From ecc53680d6e685c8ad1efdb18c8d432751c142dc Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Wed, 6 Sep 2023 17:52:39 +0900
Subject: [PATCH v47 1/8] Remove obsolete executor cleanup code
This commit removes unnecessary ExecFreeExprContext() calls in ExecEnd*
routines, as the actual cleanup is managed by FreeExecutorState(). With
no remaining callers of ExecFreeExprContext(), this commit also
removes the function.
This commit also drops redundant ExecClearTuple() calls, as
ExecResetTupleTable() in ExecEndPlan() already takes care of resetting
all TupleTableSlots.
After these modifications, the ExecEnd*() routines for ValuesScan,
NamedTuplestoreScan, and WorkTableScan became redundant. Thus, this
commit removes them. These changes not only optimize CPU usage during
ExecEndNode() processing but also pave the way for an upcoming patch.
This future patch aims to allow ExecEndNode() to expect PlanState
trees that are only partially initialized in some cases.
Reviewed-by: Robert Haas
---
src/backend/executor/execProcnode.c | 18 +++++--------
src/backend/executor/execUtils.c | 26 -------------------
src/backend/executor/nodeAgg.c | 10 -------
src/backend/executor/nodeBitmapHeapscan.c | 12 ---------
src/backend/executor/nodeBitmapIndexscan.c | 8 ------
src/backend/executor/nodeCtescan.c | 12 ---------
src/backend/executor/nodeCustom.c | 7 -----
src/backend/executor/nodeForeignscan.c | 8 ------
src/backend/executor/nodeFunctionscan.c | 12 ---------
src/backend/executor/nodeGather.c | 3 ---
src/backend/executor/nodeGatherMerge.c | 3 ---
src/backend/executor/nodeGroup.c | 5 ----
src/backend/executor/nodeHash.c | 5 ----
src/backend/executor/nodeHashjoin.c | 12 ---------
src/backend/executor/nodeIncrementalSort.c | 5 ----
src/backend/executor/nodeIndexonlyscan.c | 16 ------------
src/backend/executor/nodeIndexscan.c | 16 ------------
src/backend/executor/nodeLimit.c | 1 -
src/backend/executor/nodeMaterial.c | 5 ----
src/backend/executor/nodeMemoize.c | 9 -------
src/backend/executor/nodeMergejoin.c | 12 ---------
src/backend/executor/nodeModifyTable.c | 11 --------
.../executor/nodeNamedtuplestorescan.c | 22 ----------------
src/backend/executor/nodeNestloop.c | 11 --------
src/backend/executor/nodeProjectSet.c | 10 -------
src/backend/executor/nodeResult.c | 10 -------
src/backend/executor/nodeSamplescan.c | 12 ---------
src/backend/executor/nodeSeqscan.c | 12 ---------
src/backend/executor/nodeSetOp.c | 4 ---
src/backend/executor/nodeSort.c | 7 -----
src/backend/executor/nodeSubqueryscan.c | 12 ---------
src/backend/executor/nodeTableFuncscan.c | 12 ---------
src/backend/executor/nodeTidrangescan.c | 12 ---------
src/backend/executor/nodeTidscan.c | 12 ---------
src/backend/executor/nodeUnique.c | 5 ----
src/backend/executor/nodeValuesscan.c | 24 -----------------
src/backend/executor/nodeWindowAgg.c | 17 ------------
src/backend/executor/nodeWorktablescan.c | 22 ----------------
src/include/executor/executor.h | 1 -
.../executor/nodeNamedtuplestorescan.h | 1 -
src/include/executor/nodeValuesscan.h | 1 -
src/include/executor/nodeWorktablescan.h | 1 -
42 files changed, 6 insertions(+), 418 deletions(-)
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 4d288bc8d4..6098cdca69 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -667,22 +667,10 @@ ExecEndNode(PlanState *node)
ExecEndTableFuncScan((TableFuncScanState *) node);
break;
- case T_ValuesScanState:
- ExecEndValuesScan((ValuesScanState *) node);
- break;
-
case T_CteScanState:
ExecEndCteScan((CteScanState *) node);
break;
- case T_NamedTuplestoreScanState:
- ExecEndNamedTuplestoreScan((NamedTuplestoreScanState *) node);
- break;
-
- case T_WorkTableScanState:
- ExecEndWorkTableScan((WorkTableScanState *) node);
- break;
-
case T_ForeignScanState:
ExecEndForeignScan((ForeignScanState *) node);
break;
@@ -757,6 +745,12 @@ ExecEndNode(PlanState *node)
ExecEndLimit((LimitState *) node);
break;
+ /* No clean up actions for these nodes. */
+ case T_ValuesScanState:
+ case T_NamedTuplestoreScanState:
+ case T_WorkTableScanState:
+ break;
+
default:
elog(ERROR, "unrecognized node type: %d", (int) nodeTag(node));
break;
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index c06b228858..16704c0c2f 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -638,32 +638,6 @@ tlist_matches_tupdesc(PlanState *ps, List *tlist, int varno, TupleDesc tupdesc)
return true;
}
-/* ----------------
- * ExecFreeExprContext
- *
- * A plan node's ExprContext should be freed explicitly during executor
- * shutdown because there may be shutdown callbacks to call. (Other resources
- * made by the above routines, such as projection info, don't need to be freed
- * explicitly because they're just memory in the per-query memory context.)
- *
- * However ... there is no particular need to do it during ExecEndNode,
- * because FreeExecutorState will free any remaining ExprContexts within
- * the EState. Letting FreeExecutorState do it allows the ExprContexts to
- * be freed in reverse order of creation, rather than order of creation as
- * will happen if we delete them here, which saves O(N^2) work in the list
- * cleanup inside FreeExprContext.
- * ----------------
- */
-void
-ExecFreeExprContext(PlanState *planstate)
-{
- /*
- * Per above discussion, don't actually delete the ExprContext. We do
- * unlink it from the plan node, though.
- */
- planstate->ps_ExprContext = NULL;
-}
-
/* ----------------------------------------------------------------
* Scan node support
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 468db94fe5..f154f28902 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -4357,16 +4357,6 @@ ExecEndAgg(AggState *node)
if (node->hashcontext)
ReScanExprContext(node->hashcontext);
- /*
- * We don't actually free any ExprContexts here (see comment in
- * ExecFreeExprContext), just unlinking the output one from the plan node
- * suffices.
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /* clean up tuple table */
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
outerPlan = outerPlanState(node);
ExecEndNode(outerPlan);
}
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index f35df0b8bf..2db0acfc76 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -655,18 +655,6 @@ ExecEndBitmapHeapScan(BitmapHeapScanState *node)
*/
scanDesc = node->ss.ss_currentScanDesc;
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clear out tuple table slots
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
/*
* close down subplans
*/
diff --git a/src/backend/executor/nodeBitmapIndexscan.c b/src/backend/executor/nodeBitmapIndexscan.c
index 83ec9ede89..7cf8532bc9 100644
--- a/src/backend/executor/nodeBitmapIndexscan.c
+++ b/src/backend/executor/nodeBitmapIndexscan.c
@@ -184,14 +184,6 @@ ExecEndBitmapIndexScan(BitmapIndexScanState *node)
indexRelationDesc = node->biss_RelationDesc;
indexScanDesc = node->biss_ScanDesc;
- /*
- * Free the exprcontext ... now dead code, see ExecFreeExprContext
- */
-#ifdef NOT_USED
- if (node->biss_RuntimeContext)
- FreeExprContext(node->biss_RuntimeContext, true);
-#endif
-
/*
* close the index relation (no-op if we didn't open it)
*/
diff --git a/src/backend/executor/nodeCtescan.c b/src/backend/executor/nodeCtescan.c
index cc4c4243e2..a0c0c4be33 100644
--- a/src/backend/executor/nodeCtescan.c
+++ b/src/backend/executor/nodeCtescan.c
@@ -287,18 +287,6 @@ ExecInitCteScan(CteScan *node, EState *estate, int eflags)
void
ExecEndCteScan(CteScanState *node)
{
- /*
- * Free exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clean out the tuple table
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
/*
* If I am the leader, free the tuplestore.
*/
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
index bd42c65b29..28b5bb9353 100644
--- a/src/backend/executor/nodeCustom.c
+++ b/src/backend/executor/nodeCustom.c
@@ -129,13 +129,6 @@ ExecEndCustomScan(CustomScanState *node)
{
Assert(node->methods->EndCustomScan != NULL);
node->methods->EndCustomScan(node);
-
- /* Free the exprcontext */
- ExecFreeExprContext(&node->ss.ps);
-
- /* Clean out the tuple table */
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
void
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index c2139acca0..73913ebb18 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -312,14 +312,6 @@ ExecEndForeignScan(ForeignScanState *node)
/* Shut down any outer plan. */
if (outerPlanState(node))
ExecEndNode(outerPlanState(node));
-
- /* Free the exprcontext */
- ExecFreeExprContext(&node->ss.ps);
-
- /* clean out the tuple table */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeFunctionscan.c b/src/backend/executor/nodeFunctionscan.c
index dd06ef8aee..a49c1a2c85 100644
--- a/src/backend/executor/nodeFunctionscan.c
+++ b/src/backend/executor/nodeFunctionscan.c
@@ -523,18 +523,6 @@ ExecEndFunctionScan(FunctionScanState *node)
{
int i;
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clean out the tuple table
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
/*
* Release slots and tuplestore resources
*/
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 307fc10eea..bb2500a469 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -250,9 +250,6 @@ ExecEndGather(GatherState *node)
{
ExecEndNode(outerPlanState(node)); /* let children clean up first */
ExecShutdownGather(node);
- ExecFreeExprContext(&node->ps);
- if (node->ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ps.ps_ResultTupleSlot);
}
/*
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 9d5e1a46e9..7a71a58509 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -290,9 +290,6 @@ ExecEndGatherMerge(GatherMergeState *node)
{
ExecEndNode(outerPlanState(node)); /* let children clean up first */
ExecShutdownGatherMerge(node);
- ExecFreeExprContext(&node->ps);
- if (node->ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ps.ps_ResultTupleSlot);
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeGroup.c b/src/backend/executor/nodeGroup.c
index 25a1618952..8c650f0e46 100644
--- a/src/backend/executor/nodeGroup.c
+++ b/src/backend/executor/nodeGroup.c
@@ -228,11 +228,6 @@ ExecEndGroup(GroupState *node)
{
PlanState *outerPlan;
- ExecFreeExprContext(&node->ss.ps);
-
- /* clean up tuple table */
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
outerPlan = outerPlanState(node);
ExecEndNode(outerPlan);
}
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 8b5c35b82b..e72f0986c2 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -415,11 +415,6 @@ ExecEndHash(HashState *node)
{
PlanState *outerPlan;
- /*
- * free exprcontext
- */
- ExecFreeExprContext(&node->ps);
-
/*
* shut down the subplan
*/
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 980746128b..aea44a9d56 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -867,18 +867,6 @@ ExecEndHashJoin(HashJoinState *node)
node->hj_HashTable = NULL;
}
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->js.ps);
-
- /*
- * clean out the tuple table
- */
- ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->hj_OuterTupleSlot);
- ExecClearTuple(node->hj_HashTupleSlot);
-
/*
* clean up subtrees
*/
diff --git a/src/backend/executor/nodeIncrementalSort.c b/src/backend/executor/nodeIncrementalSort.c
index 7683e3341c..cd094a190c 100644
--- a/src/backend/executor/nodeIncrementalSort.c
+++ b/src/backend/executor/nodeIncrementalSort.c
@@ -1079,11 +1079,6 @@ ExecEndIncrementalSort(IncrementalSortState *node)
{
SO_printf("ExecEndIncrementalSort: shutting down sort node\n");
- /* clean out the scan tuple */
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
- /* must drop pointer to sort result tuple */
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- /* must drop standalone tuple slots from outer node */
ExecDropSingleTupleTableSlot(node->group_pivot);
ExecDropSingleTupleTableSlot(node->transfer_tuple);
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 0b43a9b969..f1db35665c 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -380,22 +380,6 @@ ExecEndIndexOnlyScan(IndexOnlyScanState *node)
node->ioss_VMBuffer = InvalidBuffer;
}
- /*
- * Free the exprcontext(s) ... now dead code, see ExecFreeExprContext
- */
-#ifdef NOT_USED
- ExecFreeExprContext(&node->ss.ps);
- if (node->ioss_RuntimeContext)
- FreeExprContext(node->ioss_RuntimeContext, true);
-#endif
-
- /*
- * clear out tuple table slots
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
/*
* close the index relation (no-op if we didn't open it)
*/
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 4540c7781d..14b9c00217 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -794,22 +794,6 @@ ExecEndIndexScan(IndexScanState *node)
indexRelationDesc = node->iss_RelationDesc;
indexScanDesc = node->iss_ScanDesc;
- /*
- * Free the exprcontext(s) ... now dead code, see ExecFreeExprContext
- */
-#ifdef NOT_USED
- ExecFreeExprContext(&node->ss.ps);
- if (node->iss_RuntimeContext)
- FreeExprContext(node->iss_RuntimeContext, true);
-#endif
-
- /*
- * clear out tuple table slots
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
/*
* close the index relation (no-op if we didn't open it)
*/
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index 425fbfc405..5654158e3e 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -534,7 +534,6 @@ ExecInitLimit(Limit *node, EState *estate, int eflags)
void
ExecEndLimit(LimitState *node)
{
- ExecFreeExprContext(&node->ps);
ExecEndNode(outerPlanState(node));
}
diff --git a/src/backend/executor/nodeMaterial.c b/src/backend/executor/nodeMaterial.c
index 09632678b0..753ea28915 100644
--- a/src/backend/executor/nodeMaterial.c
+++ b/src/backend/executor/nodeMaterial.c
@@ -239,11 +239,6 @@ ExecInitMaterial(Material *node, EState *estate, int eflags)
void
ExecEndMaterial(MaterialState *node)
{
- /*
- * clean out the tuple table
- */
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
/*
* Release tuplestore resources
*/
diff --git a/src/backend/executor/nodeMemoize.c b/src/backend/executor/nodeMemoize.c
index 4f04269e26..94bf479287 100644
--- a/src/backend/executor/nodeMemoize.c
+++ b/src/backend/executor/nodeMemoize.c
@@ -1091,15 +1091,6 @@ ExecEndMemoize(MemoizeState *node)
/* Remove the cache context */
MemoryContextDelete(node->tableContext);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
- /* must drop pointer to cache result tuple */
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
-
- /*
- * free exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
/*
* shut down the subplan
*/
diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 00f96d045e..648fdd9a5f 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -1642,18 +1642,6 @@ ExecEndMergeJoin(MergeJoinState *node)
{
MJ1_printf("ExecEndMergeJoin: %s\n",
"ending node processing");
-
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->js.ps);
-
- /*
- * clean out the tuple table
- */
- ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->mj_MarkedTupleSlot);
-
/*
* shut down the subplans
*/
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 5005d8c0d1..d21a178ad5 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -4446,17 +4446,6 @@ ExecEndModifyTable(ModifyTableState *node)
ExecDropSingleTupleTableSlot(node->mt_root_tuple_slot);
}
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->ps);
-
- /*
- * clean out the tuple table
- */
- if (node->ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ps.ps_ResultTupleSlot);
-
/*
* Terminate EPQ execution if active
*/
diff --git a/src/backend/executor/nodeNamedtuplestorescan.c b/src/backend/executor/nodeNamedtuplestorescan.c
index 46832ad82f..3547dc2b10 100644
--- a/src/backend/executor/nodeNamedtuplestorescan.c
+++ b/src/backend/executor/nodeNamedtuplestorescan.c
@@ -155,28 +155,6 @@ ExecInitNamedTuplestoreScan(NamedTuplestoreScan *node, EState *estate, int eflag
return scanstate;
}
-/* ----------------------------------------------------------------
- * ExecEndNamedTuplestoreScan
- *
- * frees any storage allocated through C routines.
- * ----------------------------------------------------------------
- */
-void
-ExecEndNamedTuplestoreScan(NamedTuplestoreScanState *node)
-{
- /*
- * Free exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clean out the tuple table
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-}
-
/* ----------------------------------------------------------------
* ExecReScanNamedTuplestoreScan
*
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index b3d52e69ec..fc8f833d8b 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -363,17 +363,6 @@ ExecEndNestLoop(NestLoopState *node)
{
NL1_printf("ExecEndNestLoop: %s\n",
"ending node processing");
-
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->js.ps);
-
- /*
- * clean out the tuple table
- */
- ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
-
/*
* close down subplans
*/
diff --git a/src/backend/executor/nodeProjectSet.c b/src/backend/executor/nodeProjectSet.c
index f6ff3dc44c..b4bbdc89b1 100644
--- a/src/backend/executor/nodeProjectSet.c
+++ b/src/backend/executor/nodeProjectSet.c
@@ -320,16 +320,6 @@ ExecInitProjectSet(ProjectSet *node, EState *estate, int eflags)
void
ExecEndProjectSet(ProjectSetState *node)
{
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->ps);
-
- /*
- * clean out the tuple table
- */
- ExecClearTuple(node->ps.ps_ResultTupleSlot);
-
/*
* shut down subplans
*/
diff --git a/src/backend/executor/nodeResult.c b/src/backend/executor/nodeResult.c
index 4219712d30..e9f5732f33 100644
--- a/src/backend/executor/nodeResult.c
+++ b/src/backend/executor/nodeResult.c
@@ -240,16 +240,6 @@ ExecInitResult(Result *node, EState *estate, int eflags)
void
ExecEndResult(ResultState *node)
{
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->ps);
-
- /*
- * clean out the tuple table
- */
- ExecClearTuple(node->ps.ps_ResultTupleSlot);
-
/*
* shut down subplans
*/
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index d7e22b1dbb..41c1ea37ad 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -188,18 +188,6 @@ ExecEndSampleScan(SampleScanState *node)
if (node->tsmroutine->EndSampleScan)
node->tsmroutine->EndSampleScan(node);
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clean out the tuple table
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
/*
* close heap scan
*/
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 4da0f28f7b..49a5933aff 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -190,18 +190,6 @@ ExecEndSeqScan(SeqScanState *node)
*/
scanDesc = node->ss.ss_currentScanDesc;
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clean out the tuple table
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
/*
* close heap scan
*/
diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index 4bc2406b89..98c1b84d43 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -582,13 +582,9 @@ ExecInitSetOp(SetOp *node, EState *estate, int eflags)
void
ExecEndSetOp(SetOpState *node)
{
- /* clean up tuple table */
- ExecClearTuple(node->ps.ps_ResultTupleSlot);
-
/* free subsidiary stuff including hashtable */
if (node->tableContext)
MemoryContextDelete(node->tableContext);
- ExecFreeExprContext(&node->ps);
ExecEndNode(outerPlanState(node));
}
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index c6c72c6e67..eea7f2ae15 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -303,13 +303,6 @@ ExecEndSort(SortState *node)
SO1_printf("ExecEndSort: %s\n",
"shutting down sort node");
- /*
- * clean out the tuple table
- */
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
- /* must drop pointer to sort result tuple */
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
-
/*
* Release tuplesort resources
*/
diff --git a/src/backend/executor/nodeSubqueryscan.c b/src/backend/executor/nodeSubqueryscan.c
index 42471bfc04..1ee6295660 100644
--- a/src/backend/executor/nodeSubqueryscan.c
+++ b/src/backend/executor/nodeSubqueryscan.c
@@ -167,18 +167,6 @@ ExecInitSubqueryScan(SubqueryScan *node, EState *estate, int eflags)
void
ExecEndSubqueryScan(SubqueryScanState *node)
{
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clean out the upper tuple table
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
/*
* close down subquery
*/
diff --git a/src/backend/executor/nodeTableFuncscan.c b/src/backend/executor/nodeTableFuncscan.c
index 791cbd2372..a60dcd4943 100644
--- a/src/backend/executor/nodeTableFuncscan.c
+++ b/src/backend/executor/nodeTableFuncscan.c
@@ -213,18 +213,6 @@ ExecInitTableFuncScan(TableFuncScan *node, EState *estate, int eflags)
void
ExecEndTableFuncScan(TableFuncScanState *node)
{
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clean out the tuple table
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
/*
* Release tuplestore resources
*/
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 2124c55ef5..da622d3f5f 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -331,18 +331,6 @@ ExecEndTidRangeScan(TidRangeScanState *node)
if (scan != NULL)
table_endscan(scan);
-
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clear out tuple table slots
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index 862bd0330b..15055077d0 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -472,18 +472,6 @@ ExecEndTidScan(TidScanState *node)
{
if (node->ss.ss_currentScanDesc)
table_endscan(node->ss.ss_currentScanDesc);
-
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clear out tuple table slots
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeUnique.c b/src/backend/executor/nodeUnique.c
index 45035d74fa..01f951197c 100644
--- a/src/backend/executor/nodeUnique.c
+++ b/src/backend/executor/nodeUnique.c
@@ -168,11 +168,6 @@ ExecInitUnique(Unique *node, EState *estate, int eflags)
void
ExecEndUnique(UniqueState *node)
{
- /* clean up tuple table */
- ExecClearTuple(node->ps.ps_ResultTupleSlot);
-
- ExecFreeExprContext(&node->ps);
-
ExecEndNode(outerPlanState(node));
}
diff --git a/src/backend/executor/nodeValuesscan.c b/src/backend/executor/nodeValuesscan.c
index 32ace63017..fbfb067f3b 100644
--- a/src/backend/executor/nodeValuesscan.c
+++ b/src/backend/executor/nodeValuesscan.c
@@ -319,30 +319,6 @@ ExecInitValuesScan(ValuesScan *node, EState *estate, int eflags)
return scanstate;
}
-/* ----------------------------------------------------------------
- * ExecEndValuesScan
- *
- * frees any storage allocated through C routines.
- * ----------------------------------------------------------------
- */
-void
-ExecEndValuesScan(ValuesScanState *node)
-{
- /*
- * Free both exprcontexts
- */
- ExecFreeExprContext(&node->ss.ps);
- node->ss.ps.ps_ExprContext = node->rowcontext;
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clean out the tuple table
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-}
-
/* ----------------------------------------------------------------
* ExecReScanValuesScan
*
diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 310ac23e3a..77724a6daa 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -2686,23 +2686,6 @@ ExecEndWindowAgg(WindowAggState *node)
release_partition(node);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
- ExecClearTuple(node->first_part_slot);
- ExecClearTuple(node->agg_row_slot);
- ExecClearTuple(node->temp_slot_1);
- ExecClearTuple(node->temp_slot_2);
- if (node->framehead_slot)
- ExecClearTuple(node->framehead_slot);
- if (node->frametail_slot)
- ExecClearTuple(node->frametail_slot);
-
- /*
- * Free both the expr contexts.
- */
- ExecFreeExprContext(&node->ss.ps);
- node->ss.ps.ps_ExprContext = node->tmpcontext;
- ExecFreeExprContext(&node->ss.ps);
-
for (i = 0; i < node->numaggs; i++)
{
if (node->peragg[i].aggcontext != node->aggcontext)
diff --git a/src/backend/executor/nodeWorktablescan.c b/src/backend/executor/nodeWorktablescan.c
index 0c13448236..17a548865e 100644
--- a/src/backend/executor/nodeWorktablescan.c
+++ b/src/backend/executor/nodeWorktablescan.c
@@ -181,28 +181,6 @@ ExecInitWorkTableScan(WorkTableScan *node, EState *estate, int eflags)
return scanstate;
}
-/* ----------------------------------------------------------------
- * ExecEndWorkTableScan
- *
- * frees any storage allocated through C routines.
- * ----------------------------------------------------------------
- */
-void
-ExecEndWorkTableScan(WorkTableScanState *node)
-{
- /*
- * Free exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clean out the tuple table
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-}
-
/* ----------------------------------------------------------------
* ExecReScanWorkTableScan
*
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index c677e490d7..aeebe0e0ff 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -569,7 +569,6 @@ extern void ExecAssignProjectionInfo(PlanState *planstate,
TupleDesc inputDesc);
extern void ExecConditionalAssignProjectionInfo(PlanState *planstate,
TupleDesc inputDesc, int varno);
-extern void ExecFreeExprContext(PlanState *planstate);
extern void ExecAssignScanType(ScanState *scanstate, TupleDesc tupDesc);
extern void ExecCreateScanSlotFromOuterPlan(EState *estate,
ScanState *scanstate,
diff --git a/src/include/executor/nodeNamedtuplestorescan.h b/src/include/executor/nodeNamedtuplestorescan.h
index 3ff687023a..9d80236fe5 100644
--- a/src/include/executor/nodeNamedtuplestorescan.h
+++ b/src/include/executor/nodeNamedtuplestorescan.h
@@ -17,7 +17,6 @@
#include "nodes/execnodes.h"
extern NamedTuplestoreScanState *ExecInitNamedTuplestoreScan(NamedTuplestoreScan *node, EState *estate, int eflags);
-extern void ExecEndNamedTuplestoreScan(NamedTuplestoreScanState *node);
extern void ExecReScanNamedTuplestoreScan(NamedTuplestoreScanState *node);
#endif /* NODENAMEDTUPLESTORESCAN_H */
diff --git a/src/include/executor/nodeValuesscan.h b/src/include/executor/nodeValuesscan.h
index a52fa678df..fe3f043951 100644
--- a/src/include/executor/nodeValuesscan.h
+++ b/src/include/executor/nodeValuesscan.h
@@ -17,7 +17,6 @@
#include "nodes/execnodes.h"
extern ValuesScanState *ExecInitValuesScan(ValuesScan *node, EState *estate, int eflags);
-extern void ExecEndValuesScan(ValuesScanState *node);
extern void ExecReScanValuesScan(ValuesScanState *node);
#endif /* NODEVALUESSCAN_H */
diff --git a/src/include/executor/nodeWorktablescan.h b/src/include/executor/nodeWorktablescan.h
index e553a453f3..f31b22cec4 100644
--- a/src/include/executor/nodeWorktablescan.h
+++ b/src/include/executor/nodeWorktablescan.h
@@ -17,7 +17,6 @@
#include "nodes/execnodes.h"
extern WorkTableScanState *ExecInitWorkTableScan(WorkTableScan *node, EState *estate, int eflags);
-extern void ExecEndWorkTableScan(WorkTableScanState *node);
extern void ExecReScanWorkTableScan(WorkTableScanState *node);
#endif /* NODEWORKTABLESCAN_H */
--
2.35.3
On Wed, Sep 6, 2023 at 5:12 AM Amit Langote <amitlangote09@gmail.com> wrote:
Attached updated patches. Thanks for the review.
I think 0001 looks ready to commit. I'm not sure that the commit
message needs to mention future patches here, since this code cleanup
seems like a good idea regardless, but if you feel otherwise, fair
enough.
On 0002, some questions:
- In ExecEndLockRows, is the call to EvalPlanQualEnd a concern? i.e.
Does that function need any adjustment?
- In ExecEndMemoize, should there be a null-test around
MemoryContextDelete(node->tableContext) as we have in
ExecEndRecursiveUnion, ExecEndSetOp, etc.?
I wonder how we feel about setting pointers to NULL after freeing the
associated data structures. The existing code isn't consistent about
doing that, and making it do so would be a fairly large change that
would bloat this patch quite a bit. On the other hand, I think it's a
good practice as a general matter, and we do do it in some ExecEnd
functions.
On 0003, I have some doubt about whether we really have all the right
design decisions in detail here:
- Why have this weird rule where sometimes we return NULL and other
times the planstate? Is there any point to such a coding rule? Why not
just always return the planstate?
- Is there any point to all of these early exit cases? For example, in
ExecInitBitmapAnd, why exit early if initialization fails? Why not
just plunge ahead and if initialization failed the caller will notice
that and when we ExecEndNode some of the child node pointers will be
NULL but who cares? The obvious disadvantage of this approach is that
we're doing a bunch of unnecessary initialization, but we're also
speeding up the common case where we don't need to abort by avoiding a
branch that will rarely be taken. I'm not quite sure what the right
thing to do is here.
- The cases where we call ExecGetRangeTableRelation or
ExecOpenScanRelation are a bit subtler ... maybe initialization that
we're going to do later is going to barf if the tuple descriptor of
the relation isn't what we thought it was going to be. In that case it
becomes important to exit early. But if that's not actually a problem,
then we could apply the same principle here also -- don't pollute the
code with early-exit cases, just let it do its thing and sort it out
later. Do you know what the actual problems would be here if we didn't
exit early in these cases?
- Depending on the answers to the above points, one thing we could
think of doing is to put an early exit case into ExecInitNode itself: if
(unlikely(!ExecPlanStillValid(whatever))) return NULL (a sketch follows
this list). Maybe Andres or
someone is going to argue that that checks too often and is thus too
expensive, but it would be a lot more maintainable than having similar
checks strewn throughout the ExecInit* functions. Perhaps it deserves
some thought/benchmarking. More generally, if there's anything we can
do to centralize these checks in fewer places, I think that would be
worth considering. The patch isn't terribly large as it stands, so I
don't necessarily think that this is a critical issue, but I'm just
wondering if we can do better. I'm not even sure that it would be too
expensive to just initialize the whole plan always, and then just do
one test at the end. That's not OK if the changed tuple descriptor (or
something else) is going to crash or error out in a funny way or
something before initialization is completed, but if it's just going
to result in burning a few CPU cycles in a corner case, I don't know
if we should really care.
- The "At this point" comments don't give any rationale for why we
shouldn't have received any such invalidation messages. That makes
them fairly useless; the Assert by itself clarifies that you think
that case shouldn't happen. The comment's job is to justify that
claim.
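Just to illustrate the ExecInitNode idea above, the centralized check
could look roughly like this (a sketch only, reusing the
ExecPlanStillValid() helper from 0003; the dispatch switch is elided):

PlanState *
ExecInitNode(Plan *node, EState *estate, int eflags)
{
    /* do nothing when we get to the end of a leaf on tree */
    if (node == NULL)
        return NULL;

    /*
     * Sketch: if a concurrent change to a not-yet-locked child table
     * has already invalidated the CachedPlan, give up initializing at
     * once; the caller would retry with a fresh plan.
     */
    if (unlikely(!ExecPlanStillValid(estate)))
        return NULL;

    switch (nodeTag(node))
    {
        case T_SeqScan:
            return (PlanState *) ExecInitSeqScan((SeqScan *) node,
                                                 estate, eflags);
            /* ... dispatch for all the other node types, as today ... */
        default:
            elog(ERROR, "unrecognized node type: %d", (int) nodeTag(node));
            return NULL;        /* keep compiler quiet */
    }
}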
--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, Sep 6, 2023 at 11:20 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Sep 6, 2023 at 5:12 AM Amit Langote <amitlangote09@gmail.com> wrote:
Attached updated patches. Thanks for the review.
I think 0001 looks ready to commit. I'm not sure that the commit
message needs to mention future patches here, since this code cleanup
seems like a good idea regardless, but if you feel otherwise, fair
enough.
OK, I will remove the mention of future patches.
On 0002, some questions:
- In ExecEndLockRows, is the call to EvalPlanQualEnd a concern? i.e.
Does that function need any adjustment?
I think it does with the patch as it stands. It needs to have an
early exit at the top if parentestate is NULL, which it would be if
EvalPlanQualInit() wasn't called from an ExecInit*() function.
Though, per my answer below to your question as to whether there is
actually any need to interrupt all of the ExecInit*() routines,
nothing needs to change in ExecEndLockRows().
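For concreteness, the early exit being described would have amounted
to something like this sketch at the top of EvalPlanQualEnd() (per the
above, it turned out to be unnecessary):

void
EvalPlanQualEnd(EPQState *epqstate)
{
    /*
     * Sketch: nothing to clean up if EvalPlanQualInit() was never
     * reached, e.g. because ExecInitLockRows() exited early.
     */
    if (epqstate->parentestate == NULL)
        return;

    /* ... the existing cleanup code would follow unchanged ... */
}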
- In ExecEndMemoize, should there be a null-test around
MemoryContextDelete(node->tableContext) as we have in
ExecEndRecursiveUnion, ExecEndSetOp, etc.?
Oops, you're right. Added.
I wonder how we feel about setting pointers to NULL after freeing the
associated data structures. The existing code isn't consistent about
doing that, and making it do so would be a fairly large change that
would bloat this patch quite a bit. On the other hand, I think it's a
good practice as a general matter, and we do do it in some ExecEnd
functions.
I agree that it might be worthwhile to take the opportunity and make
the code more consistent in this regard. So, I've included those
changes too in 0002.
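For illustration, an ExecEnd routine written in that style reads like
this (a sketch with a made-up FooState; tupstore and tableContext are
hypothetical fields):

void
ExecEndFoo(FooState *node)
{
    /* Free subsidiary resources only if they were actually created. */
    if (node->tupstore != NULL)
    {
        tuplestore_end(node->tupstore);
        node->tupstore = NULL;      /* a repeat call is now harmless */
    }

    if (node->tableContext != NULL)
    {
        MemoryContextDelete(node->tableContext);
        node->tableContext = NULL;
    }

    /* Shut down the subplan. */
    ExecEndNode(outerPlanState(node));
}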
On 0003, I have some doubt about whether we really have all the right
design decisions in detail here:
- Why have this weird rule where sometimes we return NULL and other
times the planstate? Is there any point to such a coding rule? Why not
just always return the planstate?
- Is there any point to all of these early exit cases? For example, in
ExecInitBitmapAnd, why exit early if initialization fails? Why not
just plunge ahead and if initialization failed the caller will notice
that and when we ExecEndNode some of the child node pointers will be
NULL but who cares? The obvious disadvantage of this approach is that
we're doing a bunch of unnecessary initialization, but we're also
speeding up the common case where we don't need to abort by avoiding a
branch that will rarely be taken. I'm not quite sure what the right
thing to do is here.
- The cases where we call ExecGetRangeTableRelation or
ExecOpenScanRelation are a bit subtler ... maybe initialization that
we're going to do later is going to barf if the tuple descriptor of
the relation isn't what we thought it was going to be. In that case it
becomes important to exit early. But if that's not actually a problem,
then we could apply the same principle here also -- don't pollute the
code with early-exit cases, just let it do its thing and sort it out
later. Do you know what the actual problems would be here if we didn't
exit early in these cases?
- Depending on the answers to the above points, one thing we could
think of doing is to put an early exit case into ExecInitNode itself: if
(unlikely(!ExecPlanStillValid(whatever))) return NULL. Maybe Andres or
someone is going to argue that that checks too often and is thus too
expensive, but it would be a lot more maintainable than having similar
checks strewn throughout the ExecInit* functions. Perhaps it deserves
some thought/benchmarking. More generally, if there's anything we can
do to centralize these checks in fewer places, I think that would be
worth considering. The patch isn't terribly large as it stands, so I
don't necessarily think that this is a critical issue, but I'm just
wondering if we can do better. I'm not even sure that it would be too
expensive to just initialize the whole plan always, and then just do
one test at the end. That's not OK if the changed tuple descriptor (or
something else) is going to crash or error out in a funny way or
something before initialization is completed, but if it's just going
to result in burning a few CPU cycles in a corner case, I don't know
if we should really care.
I thought about this some and figured that adding the
is-CachedPlan-still-valid tests in the following places should suffice
after all:
1. In InitPlan() right after the top-level ExecInitNode() calls
2. In ExecInit*() functions of Scan nodes, right after
ExecOpenScanRelation() calls
CachedPlans can only become invalid because of concurrent changes to
the inheritance child tables referenced in the plan. Only the
following schema modifications of child tables can be performed
concurrently:
* Addition of a column (allowed only if traditional inheritance child)
* Addition of an index
* Addition of a non-index constraint
* Dropping of a child table (allowed only if traditional inheritance child)
* Dropping of an index referenced in the plan
The first three are not destructive enough to cause crashes or weird
errors during ExecInit*(), though the last two can be, hence the 2nd
set of tests after ExecOpenScanRelation() mentioned above.
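To make the second kind of test concrete, a scan node's ExecInit
routine ends up following the shape below (a sketch; the actual hunks,
e.g. the one in nodeSeqscan.c, do the same thing):

SeqScanState *
ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
{
    SeqScanState *scanstate;

    scanstate = makeNode(SeqScanState);
    scanstate->ss.ps.plan = (Plan *) node;
    scanstate->ss.ps.state = estate;

    /*
     * Opening the relation may take a fresh lock on a child table; if
     * that allowed a concurrent ALTER/DROP to invalidate the
     * CachedPlan, stop initializing right away.
     */
    scanstate->ss.ss_currentRelation =
        ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
    if (!ExecPlanStillValid(estate))
        return NULL;

    /* ... rest of the initialization as before ... */
    return scanstate;
}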
- The "At this point" comments don't give any rationale for why we
shouldn't have received any such invalidation messages. That makes
them fairly useless; the Assert by itself clarifies that you think
that case shouldn't happen. The comment's job is to justify that
claim.
I've rewritten the comments.
I'll post the updated set of patches shortly.
Thanks,
--
Amit Langote
EDB: http://www.enterprisedb.com
On Mon, Sep 25, 2023 at 9:57 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Wed, Sep 6, 2023 at 11:20 PM Robert Haas <robertmhaas@gmail.com> wrote:
- Is there any point to all of these early exit cases? For example, in
ExecInitBitmapAnd, why exit early if initialization fails? Why not
just plunge ahead and if initialization failed the caller will notice
that and when we ExecEndNode some of the child node pointers will be
NULL but who cares? The obvious disadvantage of this approach is that
we're doing a bunch of unnecessary initialization, but we're also
speeding up the common case where we don't need to abort by avoiding a
branch that will rarely be taken. I'm not quite sure what the right
thing to do is here.
I thought about this some and figured that adding the
is-CachedPlan-still-valid tests in the following places should suffice
after all:
1. In InitPlan() right after the top-level ExecInitNode() calls
2. In ExecInit*() functions of Scan nodes, right after
ExecOpenScanRelation() calls
After sleeping on this, I think we do need the checks after all the
ExecInitNode() calls too, because we have many instances of code
like the following:
outerPlanState(gatherstate) = ExecInitNode(outerNode, estate, eflags);
tupDesc = ExecGetResultType(outerPlanState(gatherstate));
<some code that dereferences tupDesc>
If outerNode is a SeqScan and ExecInitSeqScan() returned early because
ExecOpenScanRelation() detected that the plan was invalidated, then
tupDesc would be NULL here, causing the code to crash.
Now one might say that perhaps we should only add the
is-CachedPlan-valid test in the instances where there is an actual
risk of such misbehavior, but that could lead to confusion, now or
later. It seems better to add them after every ExecInitNode() call
while we're inventing the notion, because doing so relieves the
authors of future enhancements of the ExecInit*() routines from
worrying about any of this.
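With that rule, the Gather fragment above gets the guard shown in this
sketch:

    outerPlanState(gatherstate) = ExecInitNode(outerNode, estate, eflags);
    if (!ExecPlanStillValid(estate))
        return gatherstate;     /* partially initialized; per 0002,
                                 * ExecEndNode copes with NULL pointers */
    tupDesc = ExecGetResultType(outerPlanState(gatherstate));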
Attached 0003 should show how that turned out.
Updated 0002 as mentioned in the previous reply -- setting pointers to
NULL after freeing them more consistently across the various ExecEnd*()
routines and preferring the `if (pointer != NULL)` style over
`if (pointer)`.
Updated 0001's commit message to remove the mention of its relation to
any future commits. I intend to push it tomorrow.
Patches 0004 onwards contain changes too, mainly in terms of moving
code around from one patch to another, but I'll omit the details of
the specific changes for now.
Thanks,
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v47-0005-Teach-the-executor-to-lock-child-tables-in-some-.patch
From 5ce427a754d034cb2b3efa922e8c9d7dad200418 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Fri, 22 Sep 2023 18:17:15 +0900
Subject: [PATCH v47 5/9] Teach the executor to lock child tables in some cases
An upcoming commit will move the locking of child tables referenced
in a cached plan tree from GetCachedPlan() to the executor
initialization of the plan tree in ExecutorStart(). This commit
teaches ExecGetRangeTableRelation() to lock child tables if
EState.es_cachedplan points to a CachedPlan.
The executor must now deal with the cases where an unlocked child
table might have been concurrently dropped, so this modifies
ExecGetRangeTableRelation() to use try_table_open(). All of its
callers (and those of ExecOpenScanRelation() that calls it) must
now account for the child table disappearing, which means to abort
initializing the table's Scan node in the middle.
ExecGetRangeTableRelation() now examines inFromCl field of an RTE
to determine that a given range table relation is a child table, so
this commit also makes the planner set inFromCl to false in the
child tables' RTEs that it manufactures.
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
src/backend/executor/README | 36 +++++++++++++++++++++++-
src/backend/executor/execPartition.c | 2 ++
src/backend/executor/execUtils.c | 41 +++++++++++++++++++++-------
src/backend/optimizer/util/inherit.c | 7 +++++
src/backend/parser/analyze.c | 7 ++---
src/include/nodes/parsenodes.h | 8 ++++--
6 files changed, 84 insertions(+), 17 deletions(-)
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 17775a49e2..6d2240610d 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -280,6 +280,34 @@ are typically reset to empty once per tuple. Per-tuple contexts are usually
associated with ExprContexts, and commonly each PlanState node has its own
ExprContext to evaluate its qual and targetlist expressions in.
+Relation Locking
+----------------
+
+Typically, when the executor initializes a plan tree for execution, it doesn't
+lock non-index relations if the plan tree is freshly generated and not derived
+from a CachedPlan. This is because such locks have already been established
+during the query's parsing, rewriting, and planning phases. However, with a
+cached plan tree, there can be relations that remain unlocked. The function
+GetCachedPlan() locks relations existing in the query's range table pre-planning
+but doesn't account for those added during the planning phase. Consequently,
+inheritance child tables, introduced to the query's range table during planning,
+won't be locked when the cached plan reaches the executor.
+
+The decision to defer locking child tables with GetCachedPlan() arises from the
+fact that not all might be accessed during plan execution. For instance, if
+child tables are partitions, some might be omitted due to pruning at
+execution-initialization-time. Thus, the responsibility of locking these child
+tables is pushed to execution-initialization-time, taking place in ExecInitNode()
+for plan nodes encompassing these tables.
+
+This approach opens a window where a cached plan tree with child tables could
+become outdated if another backend modifies these tables before ExecInitNode()
+locks them. Given this, the executor has the added duty to confirm the plan
+tree's validity whenever it locks a child table post execution-initialization-
+pruning. This validation is done by checking the CachedPlan.is_valid attribute
+of the CachedPlan provided. If the plan tree is outdated (is_valid=false), the
+executor halts any further initialization and alerts the caller that they should
+retry execution with another freshly created plan tree.
Query Processing Control Flow
-----------------------------
@@ -316,7 +344,13 @@ This is a sketch of control flow for full query processing:
FreeQueryDesc
-Per above comments, it's not really critical for ExecEndNode to free any
+As mentioned in the "Relation Locking" section, if the plan tree is found to
+be stale during one of the recursive calls of ExecInitNode() after taking a
lock on a child table, control is immediately returned to the caller of
+ExecutorStart(), which must redo the steps from CreateQueryDesc with a new
+plan tree.
+
+Per above comments, it's not really critical for ExecEndPlan to free any
memory; it'll all go away in FreeExecutorState anyway. However, we do need to
be careful to close relations, drop buffer pins, etc, so we do need to scan
the plan state tree to find these sorts of resources.
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index eb8a87fd63..84978c5525 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1927,6 +1927,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* duration of this executor run.
*/
partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+ if (unlikely(partrel == NULL))
+ return NULL;
partkey = RelationGetPartitionKey(partrel);
partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
partrel);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index f0f5740c26..117773706a 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -697,6 +697,8 @@ ExecRelationIsTargetRelation(EState *estate, Index scanrelid)
*
* Open the heap relation to be scanned by a base-level scan plan node.
* This should be called during the node's ExecInit routine.
+ *
+ * NULL is returned if the relation is found to have been dropped.
* ----------------------------------------------------------------
*/
Relation
@@ -706,6 +708,8 @@ ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags)
/* Open the relation. */
rel = ExecGetRangeTableRelation(estate, scanrelid);
+ if (unlikely(rel == NULL))
+ return NULL;
/*
* Complain if we're attempting a scan of an unscannable relation, except
@@ -763,6 +767,9 @@ ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos)
* Open the Relation for a range table entry, if not already done
*
* The Relations will be closed again in ExecEndPlan().
+ *
+ * Returned value may be NULL if the relation is a child relation that is not
+ * already locked.
*/
Relation
ExecGetRangeTableRelation(EState *estate, Index rti)
@@ -779,7 +786,28 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rte->rtekind == RTE_RELATION);
- if (!IsParallelWorker())
+ if (IsParallelWorker() ||
+ (estate->es_cachedplan != NULL && !rte->inFromCl))
+ {
+ /*
+ * Take a lock if we are a parallel worker or if this is a child
+ * table referenced in a cached plan.
+ *
+ * Parallel workers need to have their own local lock on the
+ * relation. This ensures sane behavior in case the parent process
+ * exits before we do.
+ *
+ * When executing a cached plan, child tables must be locked
+ * here, because plancache.c (GetCachedPlan()) would only have
+ * locked tables mentioned in the query, that is, tables whose
+ * RTEs' inFromCl is true.
+ *
+ * Note that we use try_table_open() here, because without a lock
+ * held on the relation, it may have disappeared from under us.
+ */
+ rel = try_table_open(rte->relid, rte->rellockmode);
+ }
+ else
{
/*
* In a normal query, we should already have the appropriate lock,
@@ -792,15 +820,6 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rte->rellockmode == AccessShareLock ||
CheckRelationLockedByMe(rel, rte->rellockmode, false));
}
- else
- {
- /*
- * If we are a parallel worker, we need to obtain our own local
- * lock on the relation. This ensures sane behavior in case the
- * parent process exits before we do.
- */
- rel = table_open(rte->relid, rte->rellockmode);
- }
estate->es_relations[rti - 1] = rel;
}
@@ -823,6 +842,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
Relation resultRelationDesc;
resultRelationDesc = ExecGetRangeTableRelation(estate, rti);
+ if (unlikely(resultRelationDesc == NULL))
+ return;
InitResultRelInfo(resultRelInfo,
resultRelationDesc,
rti,
diff --git a/src/backend/optimizer/util/inherit.c b/src/backend/optimizer/util/inherit.c
index 94de855a22..1b30c0ff87 100644
--- a/src/backend/optimizer/util/inherit.c
+++ b/src/backend/optimizer/util/inherit.c
@@ -492,6 +492,13 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
}
else
childrte->inh = false;
+
+ /*
+ * Flag child tables as indirectly referenced in the query. This helps
+ * the executor's ExecGetRangeTableRelation() recognize them as
+ * inheritance children.
+ */
+ childrte->inFromCl = false;
childrte->securityQuals = NIL;
/*
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 7a1dfb6364..cf269f8c53 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -3305,10 +3305,9 @@ transformLockingClause(ParseState *pstate, Query *qry, LockingClause *lc,
/*
* Lock all regular tables used in query and its subqueries. We
* examine inFromCl to exclude auto-added RTEs, particularly NEW/OLD
- * in rules. This is a bit of an abuse of a mostly-obsolete flag, but
- * it's convenient. We can't rely on the namespace mechanism that has
- * largely replaced inFromCl, since for example we need to lock
- * base-relation RTEs even if they are masked by upper joins.
+ * in rules. We can't rely on the namespace mechanism since for
+ * example we need to lock base-relation RTEs even if they are masked
+ * by upper joins.
*/
i = 0;
foreach(rt, qry->rtable)
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index fef4c714b8..d8b5d0c502 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -994,11 +994,15 @@ typedef struct PartitionCmd
*
* inFromCl marks those range variables that are listed in the FROM clause.
* It's false for RTEs that are added to a query behind the scenes, such
- * as the NEW and OLD variables for a rule, or the subqueries of a UNION.
+ * as the NEW and OLD variables for a rule, or the subqueries of a UNION,
+ * or the RTEs of inheritance child tables that are added by the planner.
* This flag is not used during parsing (except in transformLockingClause,
* q.v.); the parser now uses a separate "namespace" data structure to
* control visibility. But it is needed by ruleutils.c to determine
- * whether RTEs should be shown in decompiled queries.
+ * whether RTEs should be shown in decompiled queries. The executor uses
+ * this to ascertain if an RTE_RELATION entry is for a table explicitly
+ * named in the query or a child table added by the planner. This
+ * distinction is vital when child tables in a plan must be locked.
*
* securityQuals is a list of security barrier quals (boolean expressions),
* to be tested in the listed order before returning a row from the
--
2.35.3
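To make the retry protocol described in the README hunk above concrete, here is
a minimal sketch (not part of the patch) of what a caller of the executor does
under this scheme. It mirrors the PortalStart() changes in patch 0008 further
down; BuildQueryDesc() is a hypothetical helper standing in for the usual
CreateQueryDesc() setup:

    for (;;)
    {
        CachedPlan *cplan = GetCachedPlan(plansource, params, owner, queryEnv);
        QueryDesc  *qdesc = BuildQueryDesc(cplan); /* hypothetical setup step */

        /* ExecutorStart() now reports whether the CachedPlan survived locking */
        if (ExecutorStart(qdesc, 0))
            break;              /* plan tree is valid; go on to run it */

        /* A child-table lock invalidated the plan; clean up and replan. */
        ExecutorEnd(qdesc);
        FreeQueryDesc(qdesc);
        ReleaseCachedPlan(cplan, owner);
    }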
Attachment: v47-0007-Add-field-to-store-parent-relids-to-Append-Merge.patch
From 3b2efe73e30649f7d2059b8a9d901b19a5986ad1 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Wed, 6 Sep 2023 17:54:02 +0900
Subject: [PATCH v47 7/9] Add field to store parent relids to
Append/MergeAppend
There's no way currently in the executor to tell if the child
subplans of Append/MergeAppend are scanning partitions, and if
they indeed do, what the RT indexes of their parent/ancestor tables
are. The executor doesn't need to see their RT indexes except for
run-time pruning, in which case they can be found in the
PartitionPruneInfo. A future commit will create a need for them to
be available at all times for the purpose of locking those
parent/ancestor tables when executing a cached plan, so add a
field called allpartrelids to Append/MergeAppend to store those
RT indexes. This also adds a function called
ExecLockAppendNonLeafPartitions() to lock those tables.
The code to look up partitioned parent relids for a given list of
partition scan subpaths of an Append/MergeAppend is already present
in make_partition_pruneinfo() but it's local to partprune.c. This
commit refactors that code into its own function called
add_append_subpath_partrelids() defined in appendinfo.c and
generalizes it to consider child join and aggregate paths. To
facilitate looking up the parent rels of child grouping rels in
add_append_subpath_partrelids(), parent links are now also set in
the RelOptInfos of child grouping rels, as they are in those of
child base and join rels.
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
src/backend/executor/execMain.c | 2 +-
src/backend/executor/execUtils.c | 33 ++++++
src/backend/executor/nodeAppend.c | 14 +++
src/backend/executor/nodeMergeAppend.c | 14 +++
src/backend/optimizer/plan/createplan.c | 41 ++++++--
src/backend/optimizer/plan/planner.c | 3 +
src/backend/optimizer/plan/setrefs.c | 4 +
src/backend/optimizer/util/appendinfo.c | 134 ++++++++++++++++++++++++
src/backend/partitioning/partprune.c | 124 +++-------------------
src/include/executor/executor.h | 1 +
src/include/nodes/plannodes.h | 14 +++
src/include/optimizer/appendinfo.h | 3 +
src/include/partitioning/partprune.h | 3 +-
13 files changed, 266 insertions(+), 124 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ffc62e379a..2804ec70f1 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1475,7 +1475,7 @@ ExecGetAncestorResultRels(EState *estate, ResultRelInfo *resultRelInfo)
/*
* All ancestors up to the root target relation must have been
- * locked by the planner or AcquireExecutorLocks().
+ * locked by the planner or ExecLockAppendNonLeafPartitions().
*/
ancRel = table_open(ancOid, NoLock);
rInfo = makeNode(ResultRelInfo);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 117773706a..2b7a08c9ba 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -827,6 +827,39 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
return rel;
}
+/*
+ * ExecLockAppendNonLeafPartitions
+ * Lock non-leaf partitions whose child partitions are scanned by a given
+ * Append/MergeAppend node
+ */
+void
+ExecLockAppendNonLeafPartitions(EState *estate, List *allpartrelids)
+{
+ ListCell *l;
+
+ /* This should get called only when executing cached plans. */
+ Assert(estate->es_cachedplan != NULL);
+ foreach(l, allpartrelids)
+ {
+ Bitmapset *partrelids = lfirst_node(Bitmapset, l);
+ int i = -1;
+
+ while ((i = bms_next_member(partrelids, i)) > 0)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(i, estate);
+
+ /*
+ * Don't lock the root parent mentioned in the query, because it
+ * should already have been locked before entering the executor.
+ */
+ if (!rte->inFromCl)
+ LockRelationOid(rte->relid, rte->rellockmode);
+ else
+ Assert(CheckRelLockedByMe(rte->relid, rte->rellockmode, true));
+ }
+ }
+}
+
/*
* ExecInitResultRelation
* Open relation given by the passed-in RT index and fill its
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 53ca9dc85d..4759511f87 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -133,6 +133,20 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_syncdone = false;
appendstate->as_begun = false;
+ /*
+ * Lock non-leaf partitions whose leaf children are present in
+ * node->appendplans. We only need to do so when executing a cached
+ * plan, because child tables present in cached plans are not
+ * locked before execution.
+ *
+ * XXX - some of the non-leaf partitions may also be mentioned in
+ * part_prune_info, in which case they would get locked again in
+ * ExecInitPartitionPruning(), because it calls
+ * ExecGetRangeTableRelation(), which locks child tables.
+ */
+ if (estate->es_cachedplan)
+ ExecLockAppendNonLeafPartitions(estate, node->allpartrelids);
+
/* If run-time partition pruning is enabled, then set that up now */
if (node->part_prune_info != NULL)
{
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 52c3edf278..158210aac1 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -81,6 +81,20 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.state = estate;
mergestate->ps.ExecProcNode = ExecMergeAppend;
+ /*
+ * Lock non-leaf partitions whose leaf children are present in
+ * node->mergeplans. We only need to do so when executing a cached
+ * plan, because child tables present in cached plans are not
+ * locked before execution.
+ *
+ * XXX - some of the non-leaf partitions may also be mentioned in
+ * part_prune_info, in which case they would get locked again in
+ * ExecInitPartitionPruning(), because it calls
+ * ExecGetRangeTableRelation(), which locks child tables.
+ */
+ if (estate->es_cachedplan)
+ ExecLockAppendNonLeafPartitions(estate, node->allpartrelids);
+
/* If run-time partition pruning is enabled, then set that up now */
if (node->part_prune_info != NULL)
{
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 34ca6d4ac2..d1f4f606bf 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -25,6 +25,7 @@
#include "nodes/extensible.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
+#include "optimizer/appendinfo.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/optimizer.h"
@@ -1229,6 +1230,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
Oid *nodeCollations = NULL;
bool *nodeNullsFirst = NULL;
bool consider_async = false;
+ List *allpartrelids = NIL;
/*
* The subpaths list could be empty, if every child was proven empty by
@@ -1370,15 +1372,23 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
++nasyncplans;
}
+ /*
+ * Find partitioned parent rel(s) of the subpath's rel(s).
+ */
+ allpartrelids = add_append_subpath_partrelids(root, subpath, rel,
+ allpartrelids);
+
subplans = lappend(subplans, subplan);
}
+ plan->allpartrelids = allpartrelids;
+
/*
- * If any quals exist, they may be useful to perform further partition
- * pruning during execution. Gather information needed by the executor to
- * do partition pruning.
+ * If scanning partitions, check if there are quals that may be useful to
+ * perform further partition pruning during execution. Gather information
+ * needed by the executor to do partition pruning.
*/
- if (enable_partition_pruning)
+ if (enable_partition_pruning && allpartrelids != NIL)
{
List *prunequal;
@@ -1399,7 +1409,8 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
partpruneinfo =
make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ allpartrelids);
}
plan->appendplans = subplans;
@@ -1445,6 +1456,7 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
PartitionPruneInfo *partpruneinfo = NULL;
+ List *allpartrelids = NIL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1534,15 +1546,23 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplan = (Plan *) sort;
}
+ /*
+ * Find partitioned parent rel(s) of the subpath's rel(s).
+ */
+ allpartrelids = add_append_subpath_partrelids(root, subpath, rel,
+ allpartrelids);
+
subplans = lappend(subplans, subplan);
}
+ node->allpartrelids = allpartrelids;
+
/*
- * If any quals exist, they may be useful to perform further partition
- * pruning during execution. Gather information needed by the executor to
- * do partition pruning.
+ * If scanning partitions, check if there are quals that may be useful to
+ * perform further partition pruning during execution. Gather information
+ * needed by the executor to do partition pruning.
*/
- if (enable_partition_pruning)
+ if (enable_partition_pruning && allpartrelids != NIL)
{
List *prunequal;
@@ -1554,7 +1574,8 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
if (prunequal != NIL)
partpruneinfo = make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ allpartrelids);
}
node->mergeplans = subplans;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 44efb1f4eb..f97bc09113 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -7855,8 +7855,11 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
agg_costs, gd, &child_extra,
&child_partially_grouped_rel);
+ /* Mark as child of grouped_rel. */
+ child_grouped_rel->parent = grouped_rel;
if (child_partially_grouped_rel)
{
+ child_partially_grouped_rel->parent = grouped_rel;
partially_grouped_live_children =
lappend(partially_grouped_live_children,
child_partially_grouped_rel);
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 5700bfb5cd..c235d3488f 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1766,6 +1766,8 @@ set_append_references(PlannerInfo *root,
set_dummy_tlist_references((Plan *) aplan, rtoffset);
aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
+ foreach(l, aplan->allpartrelids)
+ lfirst(l) = offset_relid_set((Relids) lfirst(l), rtoffset);
if (aplan->part_prune_info)
{
@@ -1842,6 +1844,8 @@ set_mergeappend_references(PlannerInfo *root,
set_dummy_tlist_references((Plan *) mplan, rtoffset);
mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
+ foreach(l, mplan->allpartrelids)
+ lfirst(l) = offset_relid_set((Relids) lfirst(l), rtoffset);
if (mplan->part_prune_info)
{
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index f456b3b0a4..5bd8e82b9b 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -41,6 +41,7 @@ static void make_inh_translation_list(Relation oldrelation,
AppendRelInfo *appinfo);
static Node *adjust_appendrel_attrs_mutator(Node *node,
adjust_appendrel_attrs_context *context);
+static List *add_part_relids(List *allpartrelids, Bitmapset *partrelids);
/*
@@ -1035,3 +1036,136 @@ distribute_row_identity_vars(PlannerInfo *root)
}
}
}
+
+/*
+ * add_append_subpath_partrelids
+ * Look up a child subpath's rel's partitioned parent relids up to
+ * parentrel and add the bitmapset containing those into
+ * 'allpartrelids'
+ */
+List *
+add_append_subpath_partrelids(PlannerInfo *root, Path *subpath,
+ RelOptInfo *parentrel,
+ List *allpartrelids)
+{
+ RelOptInfo *prel = subpath->parent;
+ Relids partrelids = NULL;
+
+ /* Nothing to do if there's no parent to begin with. */
+ if (!IS_OTHER_REL(prel))
+ return allpartrelids;
+
+ /*
+ * Traverse up to the pathrel's topmost partitioned parent, collecting
+ * parent relids as we go; but stop if we reach parentrel. (Normally, a
+ * pathrel's topmost partitioned parent is either parentrel or a UNION ALL
+ * appendrel child of parentrel. But when handling partitionwise joins of
+ * multi-level partitioning trees, we can see an append path whose
+ * parentrel is an intermediate partitioned table.)
+ */
+ do
+ {
+ Relids parent_relids = NULL;
+
+ /*
+ * For simple child rels, we can simply set the parent_relids to
+ * prel->parent->relids. But for partitionwise join and aggregate
+ * child rels, while we can use prel->parent to move up the tree,
+ * parent_relids must be found the hard way through AppendRelInfos,
+ * because 1) a joinrel's relids may point to RTE_JOIN entries,
+ * 2) topmost parent grouping rel's relids field is NULL.
+ */
+ if (IS_SIMPLE_REL(prel))
+ {
+ prel = prel->parent;
+ /* Stop if we've reached a non-partitioned parent. */
+ if (!IS_PARTITIONED_REL(prel))
+ break;
+ parent_relids = bms_add_members(parent_relids, prel->relids);
+ }
+ else
+ {
+ AppendRelInfo **appinfos;
+ int nappinfos,
+ i;
+
+ appinfos = find_appinfos_by_relids(root, prel->relids,
+ &nappinfos);
+ for (i = 0; i < nappinfos; i++)
+ {
+ AppendRelInfo *appinfo = appinfos[i];
+
+ parent_relids = bms_add_member(parent_relids,
+ appinfo->parent_relid);
+ }
+ pfree(appinfos);
+ prel = prel->parent;
+ }
+ /* accept this level as an interesting parent */
+ partrelids = bms_add_members(partrelids, parent_relids);
+ if (prel == parentrel)
+ break; /* don't traverse above parentrel */
+ } while (IS_OTHER_REL(prel));
+
+ if (partrelids == NULL)
+ return allpartrelids;
+
+ return add_part_relids(allpartrelids, partrelids);
+}
+
+/*
+ * add_part_relids
+ * Add new info to a list of Bitmapsets of partitioned relids.
+ *
+ * Within 'allpartrelids', there is one Bitmapset for each topmost parent
+ * partitioned rel. Each Bitmapset contains the RT indexes of the topmost
+ * parent as well as its relevant non-leaf child partitions. Since (by
+ * construction of the rangetable list) parent partitions must have lower
+ * RT indexes than their children, we can distinguish the topmost parent
+ * as being the lowest set bit in the Bitmapset.
+ *
+ * 'partrelids' contains the RT indexes of a parent partitioned rel, and
+ * possibly some non-leaf children, that are newly identified as parents of
+ * some subpath rel passed to make_partition_pruneinfo(). These are added
+ * to an appropriate member of 'allpartrelids'.
+ *
+ * Note that the list contains only RT indexes of partitioned tables that
+ * are parents of some scan-level relation appearing in the 'subpaths' that
+ * make_partition_pruneinfo() is dealing with. Also, "topmost" parents are
+ * not allowed to be higher than the 'parentrel' associated with the append
+ * path. In this way, we avoid expending cycles on partitioned rels that
+ * can't contribute useful pruning information for the problem at hand.
+ * (It is possible for 'parentrel' to be a child partitioned table, and it
+ * is also possible for scan-level relations to be child partitioned tables
+ * rather than leaf partitions. Hence we must construct this relation set
+ * with reference to the particular append path we're dealing with, rather
+ * than looking at the full partitioning structure represented in the
+ * RelOptInfos.)
+ */
+static List *
+add_part_relids(List *allpartrelids, Bitmapset *partrelids)
+{
+ Index targetpart;
+ ListCell *lc;
+
+ /* We can easily get the lowest set bit this way: */
+ targetpart = bms_next_member(partrelids, -1);
+ Assert(targetpart > 0);
+
+ /* Look for a matching topmost parent */
+ foreach(lc, allpartrelids)
+ {
+ Bitmapset *currpartrelids = (Bitmapset *) lfirst(lc);
+ Index currtarget = bms_next_member(currpartrelids, -1);
+
+ if (targetpart == currtarget)
+ {
+ /* Found a match, so add any new RT indexes to this hierarchy */
+ currpartrelids = bms_add_members(currpartrelids, partrelids);
+ lfirst(lc) = currpartrelids;
+ return allpartrelids;
+ }
+ }
+ /* No match, so add the new partition hierarchy to the list */
+ return lappend(allpartrelids, partrelids);
+}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 7179b22a05..213512a5f4 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -138,7 +138,6 @@ typedef struct PruneStepResult
} PruneStepResult;
-static List *add_part_relids(List *allpartrelids, Bitmapset *partrelids);
static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
RelOptInfo *parentrel,
List *prunequal,
@@ -218,33 +217,32 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
+ * 'allpartrelids' contains Bitmapsets of RT indexes of partitioned parents
+ * whose partitions' Paths are in 'subpaths'; there's one Bitmapset for every
+ * partition tree involved.
*/
PartitionPruneInfo *
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
- List *prunequal)
+ List *prunequal,
+ List *allpartrelids)
{
PartitionPruneInfo *pruneinfo;
Bitmapset *allmatchedsubplans = NULL;
- List *allpartrelids;
List *prunerelinfos;
int *relid_subplan_map;
ListCell *lc;
int i;
+ Assert(list_length(allpartrelids) > 0);
+
/*
- * Scan the subpaths to see which ones are scans of partition child
- * relations, and identify their parent partitioned rels. (Note: we must
- * restrict the parent partitioned rels to be parentrel or children of
- * parentrel, otherwise we couldn't translate prunequal to match.)
- *
- * Also construct a temporary array to map from partition-child-relation
- * relid to the index in 'subpaths' of the scan plan for that partition.
+ * Construct a temporary array to map from partition-child-relation relid
+ * to the index in 'subpaths' of the scan plan for that partition.
* (Use of "subplan" rather than "subpath" is a bit of a misnomer, but
* we'll let it stand.) For convenience, we use 1-based indexes here, so
* that zero can represent an un-filled array entry.
*/
- allpartrelids = NIL;
relid_subplan_map = palloc0(sizeof(int) * root->simple_rel_array_size);
i = 1;
@@ -253,50 +251,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Path *path = (Path *) lfirst(lc);
RelOptInfo *pathrel = path->parent;
- /* We don't consider partitioned joins here */
- if (pathrel->reloptkind == RELOPT_OTHER_MEMBER_REL)
- {
- RelOptInfo *prel = pathrel;
- Bitmapset *partrelids = NULL;
-
- /*
- * Traverse up to the pathrel's topmost partitioned parent,
- * collecting parent relids as we go; but stop if we reach
- * parentrel. (Normally, a pathrel's topmost partitioned parent
- * is either parentrel or a UNION ALL appendrel child of
- * parentrel. But when handling partitionwise joins of
- * multi-level partitioning trees, we can see an append path whose
- * parentrel is an intermediate partitioned table.)
- */
- do
- {
- AppendRelInfo *appinfo;
-
- Assert(prel->relid < root->simple_rel_array_size);
- appinfo = root->append_rel_array[prel->relid];
- prel = find_base_rel(root, appinfo->parent_relid);
- if (!IS_PARTITIONED_REL(prel))
- break; /* reached a non-partitioned parent */
- /* accept this level as an interesting parent */
- partrelids = bms_add_member(partrelids, prel->relid);
- if (prel == parentrel)
- break; /* don't traverse above parentrel */
- } while (prel->reloptkind == RELOPT_OTHER_MEMBER_REL);
-
- if (partrelids)
- {
- /*
- * Found some relevant parent partitions, which may or may not
- * overlap with partition trees we already found. Add new
- * information to the allpartrelids list.
- */
- allpartrelids = add_part_relids(allpartrelids, partrelids);
- /* Also record the subplan in relid_subplan_map[] */
- /* No duplicates please */
- Assert(relid_subplan_map[pathrel->relid] == 0);
- relid_subplan_map[pathrel->relid] = i;
- }
- }
+ /* No duplicates please */
+ Assert(relid_subplan_map[pathrel->relid] == 0);
+ relid_subplan_map[pathrel->relid] = i;
i++;
}
@@ -362,63 +319,6 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
return pruneinfo;
}
-/*
- * add_part_relids
- * Add new info to a list of Bitmapsets of partitioned relids.
- *
- * Within 'allpartrelids', there is one Bitmapset for each topmost parent
- * partitioned rel. Each Bitmapset contains the RT indexes of the topmost
- * parent as well as its relevant non-leaf child partitions. Since (by
- * construction of the rangetable list) parent partitions must have lower
- * RT indexes than their children, we can distinguish the topmost parent
- * as being the lowest set bit in the Bitmapset.
- *
- * 'partrelids' contains the RT indexes of a parent partitioned rel, and
- * possibly some non-leaf children, that are newly identified as parents of
- * some subpath rel passed to make_partition_pruneinfo(). These are added
- * to an appropriate member of 'allpartrelids'.
- *
- * Note that the list contains only RT indexes of partitioned tables that
- * are parents of some scan-level relation appearing in the 'subpaths' that
- * make_partition_pruneinfo() is dealing with. Also, "topmost" parents are
- * not allowed to be higher than the 'parentrel' associated with the append
- * path. In this way, we avoid expending cycles on partitioned rels that
- * can't contribute useful pruning information for the problem at hand.
- * (It is possible for 'parentrel' to be a child partitioned table, and it
- * is also possible for scan-level relations to be child partitioned tables
- * rather than leaf partitions. Hence we must construct this relation set
- * with reference to the particular append path we're dealing with, rather
- * than looking at the full partitioning structure represented in the
- * RelOptInfos.)
- */
-static List *
-add_part_relids(List *allpartrelids, Bitmapset *partrelids)
-{
- Index targetpart;
- ListCell *lc;
-
- /* We can easily get the lowest set bit this way: */
- targetpart = bms_next_member(partrelids, -1);
- Assert(targetpart > 0);
-
- /* Look for a matching topmost parent */
- foreach(lc, allpartrelids)
- {
- Bitmapset *currpartrelids = (Bitmapset *) lfirst(lc);
- Index currtarget = bms_next_member(currpartrelids, -1);
-
- if (targetpart == currtarget)
- {
- /* Found a match, so add any new RT indexes to this hierarchy */
- currpartrelids = bms_add_members(currpartrelids, partrelids);
- lfirst(lc) = currpartrelids;
- return allpartrelids;
- }
- }
- /* No match, so add the new partition hierarchy to the list */
- return lappend(allpartrelids, partrelids);
-}
-
/*
* make_partitionedrel_pruneinfo
* Build a List of PartitionedRelPruneInfos, one for each interesting
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 10c5cda169..74a471e3e3 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -601,6 +601,7 @@ exec_rt_fetch(Index rti, EState *estate)
extern Relation ExecGetRangeTableRelation(EState *estate, Index rti);
extern void ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
Index rti);
+extern void ExecLockAppendNonLeafPartitions(EState *estate, List *allpartrelids);
extern int executor_errposition(EState *estate, int location);
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 1b787fe031..7a5f3ba625 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -267,6 +267,13 @@ typedef struct Append
List *appendplans;
int nasyncplans; /* # of asynchronous plans */
+ /*
+ * RTIs of all partitioned tables whose children are scanned by
+ * appendplans. The list contains a bitmapset for every partition tree
+ * covered by this Append.
+ */
+ List *allpartrelids;
+
/*
* All 'appendplans' preceding this index are non-partial plans. All
* 'appendplans' from this index onwards are partial plans.
@@ -291,6 +298,13 @@ typedef struct MergeAppend
List *mergeplans;
+ /*
+ * RTIs of all partitioned tables whose children are scanned by
+ * mergeplans. The list contains a bitmapset for every partition tree
+ * covered by this MergeAppend.
+ */
+ List *allpartrelids;
+
/* these fields are just like the sort-key info in struct Sort: */
/* number of sort-key columns */
diff --git a/src/include/optimizer/appendinfo.h b/src/include/optimizer/appendinfo.h
index a05f91f77d..1621a7319a 100644
--- a/src/include/optimizer/appendinfo.h
+++ b/src/include/optimizer/appendinfo.h
@@ -46,5 +46,8 @@ extern void add_row_identity_columns(PlannerInfo *root, Index rtindex,
RangeTblEntry *target_rte,
Relation target_relation);
extern void distribute_row_identity_vars(PlannerInfo *root);
+extern List *add_append_subpath_partrelids(PlannerInfo *root, Path *subpath,
+ RelOptInfo *parentrel,
+ List *allpartrelids);
#endif /* APPENDINFO_H */
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 8636e04e37..caa774a111 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -73,7 +73,8 @@ typedef struct PartitionPruneContext
extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
struct RelOptInfo *parentrel,
List *subpaths,
- List *prunequal);
+ List *prunequal,
+ List *allpartrelids);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
--
2.35.3
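For reference, the shape of the new field: one Bitmapset per partition tree,
where (by range-table construction) the lowest set RT index is the tree's
topmost parent. Below is a minimal sketch of the traversal that
ExecLockAppendNonLeafPartitions() performs over it, with the locking itself
elided:

    ListCell   *l;

    foreach(l, node->allpartrelids)
    {
        /* one Bitmapset per partition tree covered by this Append */
        Bitmapset  *partrelids = lfirst_node(Bitmapset, l);

        /* lowest member is the topmost parent; the rest are non-leaf children */
        int         rti = bms_next_member(partrelids, -1);

        while (rti >= 0)
        {
            RangeTblEntry *rte = exec_rt_fetch(rti, estate);

            /* ... lock rte->relid with rte->rellockmode here ... */
            rti = bms_next_member(partrelids, rti);
        }
    }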
Attachment: v47-0009-Track-opened-range-table-relations-in-a-List-in-.patch
From 1bc15cf1211111a6986e097f5a7ac3899c9a784b Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Wed, 6 Sep 2023 17:54:19 +0900
Subject: [PATCH v47 9/9] Track opened range table relations in a List in
EState
This makes ExecCloseRangeTableRelations faster when there are many
relations in the range table but only a few are opened during
execution, such as when run-time pruning kicks in on an Append
containing thousands of partition subplans.
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
src/backend/executor/execMain.c | 9 +++++----
src/backend/executor/execUtils.c | 3 +++
src/include/nodes/execnodes.h | 2 ++
3 files changed, 10 insertions(+), 4 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 2804ec70f1..d559c1de61 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1649,12 +1649,13 @@ ExecCloseResultRelations(EState *estate)
void
ExecCloseRangeTableRelations(EState *estate)
{
- int i;
+ ListCell *lc;
- for (i = 0; i < estate->es_range_table_size; i++)
+ foreach(lc, estate->es_opened_relations)
{
- if (estate->es_relations[i])
- table_close(estate->es_relations[i], NoLock);
+ Relation rel = lfirst(lc);
+
+ table_close(rel, NoLock);
}
}
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 2b7a08c9ba..1dfef44495 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -822,6 +822,9 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
}
estate->es_relations[rti - 1] = rel;
+ if (rel != NULL)
+ estate->es_opened_relations = lappend(estate->es_opened_relations,
+ rel);
}
return rel;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index bb5734edb5..8bbe1f6b14 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -619,6 +619,8 @@ typedef struct EState
Index es_range_table_size; /* size of the range table arrays */
Relation *es_relations; /* Array of per-range-table-entry Relation
* pointers, or NULL if not yet opened */
+ List *es_opened_relations; /* List of non-NULL entries in
+ * es_relations in no specific order */
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
--
2.35.3
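The design choice, condensed from the hunks above into one sketch: keep two
views of the same relations, the sparse es_relations array for O(1) lookup by
RT index and the new es_opened_relations list so that cleanup visits only
entries that were actually opened (the array may hold thousands of NULLs when
pruning is effective):

    /* lookup stays O(1) by RT index */
    rel = estate->es_relations[rti - 1];
    if (rel == NULL)
    {
        rel = try_table_open(rte->relid, rte->rellockmode);
        estate->es_relations[rti - 1] = rel;
        if (rel != NULL)
            estate->es_opened_relations =
                lappend(estate->es_opened_relations, rel);
    }

    /* ... cleanup cost is proportional to what was opened, not to rt size */
    foreach(lc, estate->es_opened_relations)
        table_close((Relation) lfirst(lc), NoLock);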
Attachment: v47-0006-Assert-that-relations-needing-their-permissions-.patch
From 2b3187b0443f8569ddbb3fd55cb72f1c0af34a43 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Mon, 25 Sep 2023 11:52:02 +0900
Subject: [PATCH v47 6/9] Assert that relations needing their permissions
checked are locked
---
src/backend/executor/execMain.c | 11 +++++++
src/backend/storage/lmgr/lmgr.c | 45 +++++++++++++++++++++++++++++
src/backend/utils/cache/lsyscache.c | 21 ++++++++++++++
src/include/storage/lmgr.h | 1 +
src/include/utils/lsyscache.h | 1 +
5 files changed, 79 insertions(+)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 5755336abd..ffc62e379a 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -626,6 +626,17 @@ ExecCheckPermissions(List *rangeTable, List *rteperminfos,
(rte->rtekind == RTE_SUBQUERY &&
rte->relkind == RELKIND_VIEW));
+ /*
+ * Relations whose permissions need to be checked must already
+ * have been locked by the parser or by GetCachedPlan() if a
+ * cached plan is being executed.
+ *
+ * XXX Maybe we should skip calling ExecCheckPermissions from
+ * InitPlan in a parallel worker.
+ */
+ Assert(IsParallelWorker() ||
+ CheckRelLockedByMe(rte->relid, AccessShareLock, true));
+
(void) getRTEPermissionInfo(rteperminfos, rte);
/* Many-to-one mapping not allowed */
Assert(!bms_is_member(rte->perminfoindex, indexset));
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index ee9b89a672..c807e9cdcc 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -27,6 +27,7 @@
#include "storage/procarray.h"
#include "storage/sinvaladt.h"
#include "utils/inval.h"
+#include "utils/lsyscache.h"
/*
@@ -364,6 +365,50 @@ CheckRelationLockedByMe(Relation relation, LOCKMODE lockmode, bool orstronger)
return false;
}
+/*
+ * CheckRelLockedByMe
+ *
+ * Returns true if current transaction holds a lock on the given relation of
+ * mode 'lockmode'. If 'orstronger' is true, a stronger lockmode is also OK.
+ * ("Stronger" is defined as "numerically higher", which is a bit
+ * semantically dubious but is OK for the purposes we use this for.)
+ */
+bool
+CheckRelLockedByMe(Oid relid, LOCKMODE lockmode, bool orstronger)
+{
+ Oid dbId = get_rel_relisshared(relid) ? InvalidOid : MyDatabaseId;
+ LOCKTAG tag;
+
+ SET_LOCKTAG_RELATION(tag, dbId, relid);
+
+ if (LockHeldByMe(&tag, lockmode))
+ return true;
+
+ if (orstronger)
+ {
+ LOCKMODE slockmode;
+
+ for (slockmode = lockmode + 1;
+ slockmode <= MaxLockMode;
+ slockmode++)
+ {
+ if (LockHeldByMe(&tag, slockmode))
+ {
+#ifdef NOT_USED
+ /* Sometimes this might be useful for debugging purposes */
+ elog(WARNING, "lock mode %s substituted for %s on relation %s",
+ GetLockmodeName(tag.locktag_lockmethodid, slockmode),
+ GetLockmodeName(tag.locktag_lockmethodid, lockmode),
+ get_rel_name(relid));
+#endif
+ return true;
+ }
+ }
+ }
+
+ return false;
+}
+
/*
* LockHasWaitersRelation
*
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index fc6d267e44..2725d02312 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -2095,6 +2095,27 @@ get_rel_persistence(Oid relid)
return result;
}
+/*
+ * get_rel_relisshared
+ *
+ * Returns whether the given relation is shared
+ */
+bool
+get_rel_relisshared(Oid relid)
+{
+ HeapTuple tp;
+ Form_pg_class reltup;
+ bool result;
+
+ tp = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+ if (!HeapTupleIsValid(tp))
+ elog(ERROR, "cache lookup failed for relation %u", relid);
+ reltup = (Form_pg_class) GETSTRUCT(tp);
+ result = reltup->relisshared;
+ ReleaseSysCache(tp);
+
+ return result;
+}
/* ---------- TRANSFORM CACHE ---------- */
diff --git a/src/include/storage/lmgr.h b/src/include/storage/lmgr.h
index 4ee91e3cf9..598bf2688a 100644
--- a/src/include/storage/lmgr.h
+++ b/src/include/storage/lmgr.h
@@ -48,6 +48,7 @@ extern bool ConditionalLockRelation(Relation relation, LOCKMODE lockmode);
extern void UnlockRelation(Relation relation, LOCKMODE lockmode);
extern bool CheckRelationLockedByMe(Relation relation, LOCKMODE lockmode,
bool orstronger);
+extern bool CheckRelLockedByMe(Oid relid, LOCKMODE lockmode, bool orstronger);
extern bool LockHasWaitersRelation(Relation relation, LOCKMODE lockmode);
extern void LockRelationIdForSession(LockRelId *relid, LOCKMODE lockmode);
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index f5fdbfe116..a024e5dcd0 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -140,6 +140,7 @@ extern char get_rel_relkind(Oid relid);
extern bool get_rel_relispartition(Oid relid);
extern Oid get_rel_tablespace(Oid relid);
extern char get_rel_persistence(Oid relid);
+extern bool get_rel_relisshared(Oid relid);
extern Oid get_transform_fromsql(Oid typid, Oid langid, List *trftypes);
extern Oid get_transform_tosql(Oid typid, Oid langid, List *trftypes);
extern bool get_typisdefined(Oid typid);
--
2.35.3
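The assertion backs the existing convention of opening a relation with NoLock
once a lock is known to be held. Here is a minimal sketch of the pattern it
guards, assuming relid comes from a range-table entry locked upstream:

    Relation    rel;

    /* the lock must have been taken upstream (parser, planner, or executor) */
    Assert(CheckRelLockedByMe(relid, AccessShareLock, true));

    /* hence it is safe to open without acquiring a new lock */
    rel = table_open(relid, NoLock);
    /* ... use rel ... */
    table_close(rel, NoLock);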
Attachment: v47-0008-Delay-locking-of-child-tables-in-cached-plans-un.patch
From a9e257693790a8b3823b598edd071de7478b6bd0 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Wed, 6 Sep 2023 17:54:15 +0900
Subject: [PATCH v47 8/9] Delay locking of child tables in cached plans until
ExecutorStart()
Currently, GetCachedPlan() takes a lock on all relations contained in
a cached plan before returning it as a valid plan to its callers for
execution. One disadvantage is that if the plan contains partitions
that are prunable with conditions involving EXTERN parameters and
other stable expressions (known as "initial pruning"), many of them
would be locked unnecessarily, because only those that survive
initial pruning need to have been locked. Locking all partitions this
way causes significant delay when there are many partitions. Note
that initial pruning occurs during executor's initialization of the
plan, that is, ExecInitNode().
Previous commits have made all the necessary adjustments to make the
executor lock child tables, to detect invalidation of the CachedPlan
resulting from that, and to retry the execution with a new CachedPlan.
So, this commit simply removes the code in plancache.c that does the
"for execution" locking, aka AcquireExecutorLocks().
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
src/backend/executor/spi.c | 2 +-
src/backend/tcop/pquery.c | 6 +-
src/backend/utils/cache/plancache.c | 154 +++++++----------
src/test/modules/delay_execution/Makefile | 3 +-
.../modules/delay_execution/delay_execution.c | 67 +++++++-
.../expected/cached-plan-replan.out | 158 ++++++++++++++++++
.../specs/cached-plan-replan.spec | 61 +++++++
7 files changed, 343 insertions(+), 108 deletions(-)
create mode 100644 src/test/modules/delay_execution/expected/cached-plan-replan.out
create mode 100644 src/test/modules/delay_execution/specs/cached-plan-replan.spec
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 814ff1390f..9c4ed74240 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -2680,7 +2680,7 @@ replan:
snap = InvalidSnapshot;
qdesc = CreateQueryDesc(stmt,
- NULL,
+ cplan,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index fcf9925ed4..8d0772ae29 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -412,7 +412,7 @@ PortalStart(Portal portal, ParamListInfo params,
* set the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
- NULL,
+ portal->cplan,
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -443,6 +443,7 @@ PortalStart(Portal portal, ParamListInfo params,
*/
if (!ExecutorStart(queryDesc, myeflags))
{
+ Assert(queryDesc->cplan);
ExecutorEnd(queryDesc);
FreeQueryDesc(queryDesc);
PopActiveSnapshot();
@@ -542,7 +543,7 @@ PortalStart(Portal portal, ParamListInfo params,
* PortalRunMulti() before calling ExecutorRun().
*/
queryDesc = CreateQueryDesc(plan,
- NULL,
+ portal->cplan,
portal->sourceText,
!is_utility ?
GetActiveSnapshot() :
@@ -566,6 +567,7 @@ PortalStart(Portal portal, ParamListInfo params,
if (!ExecutorStart(queryDesc, myeflags))
{
PopActiveSnapshot();
+ Assert(queryDesc->cplan);
ExecutorEnd(queryDesc);
FreeQueryDesc(queryDesc);
plan_valid = false;
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 7d4168f82f..35d903cb98 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -104,13 +104,13 @@ static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool GenericPlanIsValid(CachedPlan *cplan);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -792,8 +792,13 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
*
- * On a "true" return, we have acquired the locks needed to run the plan.
- * (We must do this for the "true" result to be race-condition-free.)
+ * If the plan includes child relations introduced by the planner, they
+ * wouldn't be locked yet. This is because AcquirePlannerLocks() only locks
+ * relations present in the original query's range table (before planner
+ * entry). Hence, the plan might become stale if child relations are modified
+ * concurrently. During plan initialization, the executor must ensure that the
+ * plan (CachedPlan) remains valid after locking each child table. If found
+ * invalid, the caller should be prompted to recreate the plan.
*/
static bool
CheckCachedPlan(CachedPlanSource *plansource)
@@ -807,60 +812,56 @@ CheckCachedPlan(CachedPlanSource *plansource)
if (!plan)
return false;
- Assert(plan->magic == CACHEDPLAN_MAGIC);
- /* Generic plans are never one-shot */
- Assert(!plan->is_oneshot);
+ if (GenericPlanIsValid(plan))
+ return true;
/*
- * If plan isn't valid for current role, we can't use it.
+ * Plan has been invalidated, so unlink it from the parent and release it.
*/
- if (plan->is_valid && plan->dependsOnRole &&
- plan->planRoleId != GetUserId())
- plan->is_valid = false;
+ ReleaseGenericPlan(plansource);
- /*
- * If it appears valid, acquire locks and recheck; this is much the same
- * logic as in RevalidateCachedQuery, but for a plan.
- */
- if (plan->is_valid)
+ return false;
+}
+
+/*
+ * GenericPlanIsValid
+ * Is a generic plan still valid?
+ *
+ * It may have gone stale due to concurrent schema modifications of relations
+ * mentioned in the plan, or due to one of the other conditions checked below.
+ */
+static bool
+GenericPlanIsValid(CachedPlan *cplan)
+{
+ Assert(cplan != NULL);
+ Assert(cplan->magic == CACHEDPLAN_MAGIC);
+ /* Generic plans are never one-shot */
+ Assert(!cplan->is_oneshot);
+
+ if (cplan->is_valid)
{
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
- Assert(plan->refcount > 0);
-
- AcquireExecutorLocks(plan->stmt_list, true);
+ Assert(cplan->refcount > 0);
/*
- * If plan was transient, check to see if TransactionXmin has
- * advanced, and if so invalidate it.
+ * If plan isn't valid for current role, we can't use it.
*/
- if (plan->is_valid &&
- TransactionIdIsValid(plan->saved_xmin) &&
- !TransactionIdEquals(plan->saved_xmin, TransactionXmin))
- plan->is_valid = false;
+ if (cplan->dependsOnRole && cplan->planRoleId != GetUserId())
+ cplan->is_valid = false;
/*
- * By now, if any invalidation has happened, the inval callback
- * functions will have marked the plan invalid.
+ * If plan was transient, check to see if TransactionXmin has
+ * advanced, and if so invalidate it.
*/
- if (plan->is_valid)
- {
- /* Successfully revalidated and locked the query. */
- return true;
- }
-
- /* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ if (TransactionIdIsValid(cplan->saved_xmin) &&
+ !TransactionIdEquals(cplan->saved_xmin, TransactionXmin))
+ cplan->is_valid = false;
}
- /*
- * Plan has been invalidated, so unlink it from the parent and release it.
- */
- ReleaseGenericPlan(plansource);
-
- return false;
+ return cplan->is_valid;
}
/*
@@ -1130,8 +1131,16 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
- * On return, the plan is valid and we have sufficient locks to begin
- * execution.
+ * Typically, the plan returned by this function is valid. However, a caveat
+ * arises with inheritance/partition child tables. These aren't locked by
+ * this function, as we only lock tables directly mentioned in the original
+ * query here. The task of locking these child tables falls to the executor
+ * during plan tree setup. If acquiring these locks invalidates the plan, the
+ * executor should inform the caller to regenerate the plan by invoking this
+ * function again. The reason for this deferred child table locking mechanism
+ * is efficiency: not all of them may need to be locked. Some could be pruned during
+ * executor initialization, especially if their corresponding plan nodes
+ * facilitate partition pruning.
*
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
@@ -1166,7 +1175,10 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
{
if (CheckCachedPlan(plansource))
{
- /* We want a generic plan, and we already have a valid one */
+ /*
+ * We want a generic plan, and we already have a valid one, though
+ * see the header comment.
+ */
plan = plansource->gplan;
Assert(plan->magic == CACHEDPLAN_MAGIC);
}
@@ -1364,8 +1376,8 @@ CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
}
/*
- * Reject if AcquireExecutorLocks would have anything to do. This is
- * probably unnecessary given the previous check, but let's be safe.
+ * Reject if the executor would need to take additional locks, that is, in
+ * addition to those taken by AcquirePlannerLocks() on a given query.
*/
foreach(lc, plan->stmt_list)
{
@@ -1741,58 +1753,6 @@ QueryListGetPrimaryStmt(List *stmts)
return NULL;
}
-/*
- * AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
- */
-static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
-{
- ListCell *lc1;
-
- foreach(lc1, stmt_list)
- {
- PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
-
- if (plannedstmt->commandType == CMD_UTILITY)
- {
- /*
- * Ignore utility statements, except those (such as EXPLAIN) that
- * contain a parsed-but-not-planned query. Note: it's okay to use
- * ScanQueryForLocks, even though the query hasn't been through
- * rule rewriting, because rewriting doesn't change the query
- * representation.
- */
- Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
-
- if (query)
- ScanQueryForLocks(query, acquire);
- continue;
- }
-
- foreach(lc2, plannedstmt->rtable)
- {
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
-
- if (!(rte->rtekind == RTE_RELATION ||
- (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
- continue;
-
- /*
- * Acquire the appropriate type of lock on each relation OID. Note
- * that we don't actually try to open the rel, and hence will not
- * fail if it's been dropped entirely --- we'll just transiently
- * acquire a non-conflicting lock.
- */
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
- }
- }
-}
-
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/test/modules/delay_execution/Makefile b/src/test/modules/delay_execution/Makefile
index 70f24e846d..2fca84d027 100644
--- a/src/test/modules/delay_execution/Makefile
+++ b/src/test/modules/delay_execution/Makefile
@@ -8,7 +8,8 @@ OBJS = \
delay_execution.o
ISOLATION = partition-addition \
- partition-removal-1
+ partition-removal-1 \
+ cached-plan-replan
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/src/test/modules/delay_execution/delay_execution.c b/src/test/modules/delay_execution/delay_execution.c
index 7cd76eb34b..ce189156ad 100644
--- a/src/test/modules/delay_execution/delay_execution.c
+++ b/src/test/modules/delay_execution/delay_execution.c
@@ -1,14 +1,18 @@
/*-------------------------------------------------------------------------
*
* delay_execution.c
- * Test module to allow delay between parsing and execution of a query.
+ * Test module to introduce delay at various points during execution of a
+ * query to test that execution proceeds safely in light of concurrent
+ * changes.
*
* The delay is implemented by taking and immediately releasing a specified
* advisory lock. If another process has previously taken that lock, the
* current process will be blocked until the lock is released; otherwise,
* there's no effect. This allows an isolationtester script to reliably
- * test behaviors where some specified action happens in another backend
- * between parsing and execution of any desired query.
+ * test behaviors where some specified action happens in another backend in
+ * a couple of cases: 1) between parsing and execution of any desired query
+ * when using the planner_hook, 2) between RevalidateCachedQuery() and
+ * ExecutorStart() when using the ExecutorStart_hook.
*
* Copyright (c) 2020-2023, PostgreSQL Global Development Group
*
@@ -22,6 +26,7 @@
#include <limits.h>
+#include "executor/executor.h"
#include "optimizer/planner.h"
#include "utils/builtins.h"
#include "utils/guc.h"
@@ -32,9 +37,11 @@ PG_MODULE_MAGIC;
/* GUC: advisory lock ID to use. Zero disables the feature. */
static int post_planning_lock_id = 0;
+static int executor_start_lock_id = 0;
-/* Save previous planner hook user to be a good citizen */
+/* Save previous hook users to be a good citizen */
static planner_hook_type prev_planner_hook = NULL;
+static ExecutorStart_hook_type prev_ExecutorStart_hook = NULL;
/* planner_hook function to provide the desired delay */
@@ -70,11 +77,45 @@ delay_execution_planner(Query *parse, const char *query_string,
return result;
}
+/* ExecutorStart_hook function to provide the desired delay */
+static bool
+delay_execution_ExecutorStart(QueryDesc *queryDesc, int eflags)
+{
+ bool plan_valid;
+
+ /* If enabled, delay by taking and releasing the specified lock */
+ if (executor_start_lock_id != 0)
+ {
+ DirectFunctionCall1(pg_advisory_lock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+ DirectFunctionCall1(pg_advisory_unlock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+
+ /*
+ * Ensure that we notice any pending invalidations, since the advisory
+ * lock functions don't do this.
+ */
+ AcceptInvalidationMessages();
+ }
+
+ /* Now start the executor, possibly via a previous hook user */
+ if (prev_ExecutorStart_hook)
+ plan_valid = prev_ExecutorStart_hook(queryDesc, eflags);
+ else
+ plan_valid = standard_ExecutorStart(queryDesc, eflags);
+
+ if (executor_start_lock_id != 0)
+ elog(NOTICE, "Finished ExecutorStart(): CachedPlan is %s",
+ plan_valid ? "valid" : "not valid");
+
+ return plan_valid;
+}
+
/* Module load function */
void
_PG_init(void)
{
- /* Set up the GUC to control which lock is used */
+ /* Set up GUCs to control which lock is used */
DefineCustomIntVariable("delay_execution.post_planning_lock_id",
"Sets the advisory lock ID to be locked/unlocked after planning.",
"Zero disables the delay.",
@@ -86,10 +127,22 @@ _PG_init(void)
NULL,
NULL,
NULL);
-
+ DefineCustomIntVariable("delay_execution.executor_start_lock_id",
+ "Sets the advisory lock ID to be locked/unlocked before starting execution.",
+ "Zero disables the delay.",
+ &executor_start_lock_id,
+ 0,
+ 0, INT_MAX,
+ PGC_USERSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
MarkGUCPrefixReserved("delay_execution");
- /* Install our hook */
+ /* Install our hooks. */
prev_planner_hook = planner_hook;
planner_hook = delay_execution_planner;
+ prev_ExecutorStart_hook = ExecutorStart_hook;
+ ExecutorStart_hook = delay_execution_ExecutorStart;
}
diff --git a/src/test/modules/delay_execution/expected/cached-plan-replan.out b/src/test/modules/delay_execution/expected/cached-plan-replan.out
new file mode 100644
index 0000000000..122d81f2ee
--- /dev/null
+++ b/src/test/modules/delay_execution/expected/cached-plan-replan.out
@@ -0,0 +1,158 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1prep s2lock s1exec s2dropi s2unlock
+step s1prep: SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1 FOR UPDATE;
+ EXPLAIN (COSTS OFF) EXECUTE q (1);
+QUERY PLAN
+----------------------------------------------
+LockRows
+ -> Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo11 foo_1
+ Recheck Cond: (a = $1)
+ -> Bitmap Index Scan on foo11_a
+ Index Cond: (a = $1)
+(7 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+-----------------------------------
+LockRows
+ -> Append
+ Subplans Removed: 1
+ -> Seq Scan on foo11 foo_1
+ Filter: (a = $1)
+(5 rows)
+
+
+starting permutation: s1prep2 s2lock s1exec2 s2dropi s2unlock
+step s1prep2: SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = 1;
+ EXPLAIN (COSTS OFF) EXECUTE q2;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+----------------------------------
+Bitmap Heap Scan on foo11 foo
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on foo11_a
+ Index Cond: (a = 1)
+(4 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec2: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec2: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------
+Seq Scan on foo11 foo
+ Filter: (a = 1)
+(2 rows)
+
+
+starting permutation: s1prep3 s2lock s1exec3 s2dropi s2unlock
+step s1prep3: SET plan_cache_mode = force_generic_plan;
+ SET enable_partitionwise_aggregate = on;
+ SET enable_partitionwise_join = on;
+ PREPARE q3 AS SELECT t1.a, count(t2.b) FROM foo t1, foo t2 WHERE t1.a = t2.a GROUP BY 1;
+ EXPLAIN (COSTS OFF) EXECUTE q3;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+------------------------------------------------------------
+Append
+ -> GroupAggregate
+ Group Key: t1.a
+ -> Merge Join
+ Merge Cond: (t1.a = t2.a)
+ -> Index Only Scan using foo11_a on foo11 t1
+ -> Materialize
+ -> Index Scan using foo11_a on foo11 t2
+ -> GroupAggregate
+ Group Key: t1_1.a
+ -> Merge Join
+ Merge Cond: (t1_1.a = t2_1.a)
+ -> Sort
+ Sort Key: t1_1.a
+ -> Seq Scan on foo2 t1_1
+ -> Sort
+ Sort Key: t2_1.a
+ -> Seq Scan on foo2 t2_1
+(18 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec3: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec3: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------------------------------
+Append
+ -> GroupAggregate
+ Group Key: t1.a
+ -> Merge Join
+ Merge Cond: (t1.a = t2.a)
+ -> Sort
+ Sort Key: t1.a
+ -> Seq Scan on foo11 t1
+ -> Sort
+ Sort Key: t2.a
+ -> Seq Scan on foo11 t2
+ -> GroupAggregate
+ Group Key: t1_1.a
+ -> Merge Join
+ Merge Cond: (t1_1.a = t2_1.a)
+ -> Sort
+ Sort Key: t1_1.a
+ -> Seq Scan on foo2 t1_1
+ -> Sort
+ Sort Key: t2_1.a
+ -> Seq Scan on foo2 t2_1
+(21 rows)
+
diff --git a/src/test/modules/delay_execution/specs/cached-plan-replan.spec b/src/test/modules/delay_execution/specs/cached-plan-replan.spec
new file mode 100644
index 0000000000..2d0607b176
--- /dev/null
+++ b/src/test/modules/delay_execution/specs/cached-plan-replan.spec
@@ -0,0 +1,61 @@
+# Test to check that invalidation of cached generic plans during ExecutorStart
+# correctly triggers replanning and re-execution.
+
+setup
+{
+ CREATE TABLE foo (a int, b text) PARTITION BY LIST(a);
+ CREATE TABLE foo1 PARTITION OF foo FOR VALUES IN (1) PARTITION BY LIST (a);
+ CREATE TABLE foo11 PARTITION OF foo1 FOR VALUES IN (1);
+ CREATE INDEX foo11_a ON foo11 (a);
+ CREATE TABLE foo2 PARTITION OF foo FOR VALUES IN (2);
+ CREATE VIEW foov AS SELECT * FROM foo;
+}
+
+teardown
+{
+ DROP VIEW foov;
+ DROP TABLE foo;
+}
+
+session "s1"
+# Append with run-time pruning
+step "s1prep" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1 FOR UPDATE;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+
+# no Append case (only one partition selected by the planner)
+step "s1prep2" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = 1;
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+
+# Append with partition-wise aggregate and join plans as child subplans
+step "s1prep3" { SET plan_cache_mode = force_generic_plan;
+ SET enable_partitionwise_aggregate = on;
+ SET enable_partitionwise_join = on;
+ PREPARE q3 AS SELECT t1.a, count(t2.b) FROM foo t1, foo t2 WHERE t1.a = t2.a GROUP BY 1;
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+
+# Executes a generic plan
+step "s1exec" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+step "s1exec2" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+step "s1exec3" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+
+session "s2"
+step "s2lock" { SELECT pg_advisory_lock(12345); }
+step "s2unlock" { SELECT pg_advisory_unlock(12345); }
+step "s2dropi" { DROP INDEX foo11_a; }
+
+# While "s1exec", etc. wait to acquire the advisory lock, "s2drop" is able to
+# drop the index being used in the cached plan. When "s1exec" is then
+# unblocked and initializes the cached plan for execution, it detects the
+# concurrent index drop and causes the cached plan to be discarded and
+# recreated without the index.
+permutation "s1prep" "s2lock" "s1exec" "s2dropi" "s2unlock"
+permutation "s1prep2" "s2lock" "s1exec2" "s2dropi" "s2unlock"
+permutation "s1prep3" "s2lock" "s1exec3" "s2dropi" "s2unlock"
--
2.35.3
v47-0002-Check-pointer-NULLness-before-cleanup-in-ExecEnd.patchapplication/octet-stream; name=v47-0002-Check-pointer-NULLness-before-cleanup-in-ExecEnd.patchDownload
From 95af485bf11e05d49452601e0673ee3c415176dd Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Wed, 6 Sep 2023 17:53:16 +0900
Subject: [PATCH v47 2/9] Check pointer NULLness before cleanup in ExecEnd*
routines
Many routines already perform this check, but a few instances remain that don't.
Currently, these NULLness checks might seem redundant since ExecEnd*
routines operate under the assumption that their matching ExecInit*
routine would have fully executed, ensuring pointers are set. However,
a forthcoming patch will modify ExecInit* routines to sometimes exit
early, potentially leaving some pointers in an undetermined state,
so it will become crucial to have these NULLness checks in place.
Other than the ExecEnd* routines, this also adds a guard at the
beginning of EvalPlanQualEnd() to return early if the EPQState does
not appear to have been initialized.
While at it, set cleaned-up pointers to NULL more consistently across
ExecEnd* routines. Also for consistency, change all the NULLness
checks to follow the "if (pointer != NULL)" style over "if (pointer)".
Reviewed-by: Robert Haas
---
contrib/postgres_fdw/postgres_fdw.c | 5 ++-
src/backend/executor/execMain.c | 4 ++
src/backend/executor/nodeAgg.c | 27 +++++++++----
src/backend/executor/nodeAppend.c | 3 ++
src/backend/executor/nodeBitmapAnd.c | 4 +-
src/backend/executor/nodeBitmapHeapscan.c | 47 +++++++++++++++-------
src/backend/executor/nodeBitmapIndexscan.c | 23 +++++------
src/backend/executor/nodeBitmapOr.c | 4 +-
src/backend/executor/nodeForeignscan.c | 17 ++++----
src/backend/executor/nodeGather.c | 1 +
src/backend/executor/nodeGatherMerge.c | 1 +
src/backend/executor/nodeGroup.c | 6 +--
src/backend/executor/nodeHash.c | 6 +--
src/backend/executor/nodeHashjoin.c | 4 +-
src/backend/executor/nodeIncrementalSort.c | 13 +++++-
src/backend/executor/nodeIndexonlyscan.c | 25 ++++++------
src/backend/executor/nodeIndexscan.c | 23 +++++------
src/backend/executor/nodeLimit.c | 1 +
src/backend/executor/nodeLockRows.c | 1 +
src/backend/executor/nodeMaterial.c | 5 ++-
src/backend/executor/nodeMemoize.c | 8 +++-
src/backend/executor/nodeMergeAppend.c | 3 ++
src/backend/executor/nodeMergejoin.c | 2 +
src/backend/executor/nodeModifyTable.c | 11 ++++-
src/backend/executor/nodeNestloop.c | 2 +
src/backend/executor/nodeProjectSet.c | 1 +
src/backend/executor/nodeRecursiveunion.c | 24 +++++++++--
src/backend/executor/nodeResult.c | 1 +
src/backend/executor/nodeSamplescan.c | 7 +++-
src/backend/executor/nodeSeqscan.c | 16 +++-----
src/backend/executor/nodeSetOp.c | 6 ++-
src/backend/executor/nodeSort.c | 5 ++-
src/backend/executor/nodeSubqueryscan.c | 1 +
src/backend/executor/nodeTableFuncscan.c | 4 +-
src/backend/executor/nodeTidrangescan.c | 12 ++++--
src/backend/executor/nodeTidscan.c | 8 +++-
src/backend/executor/nodeUnique.c | 1 +
src/backend/executor/nodeWindowAgg.c | 41 ++++++++++++++-----
38 files changed, 252 insertions(+), 121 deletions(-)
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 1393716587..802f76c73e 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2126,7 +2126,10 @@ postgresEndForeignModify(EState *estate,
{
PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
- /* If fmstate is NULL, we are in EXPLAIN; nothing to do */
+ /*
+ * If fmstate is NULL, we are either in EXPLAIN or BeginForeignModify
+ * wasn't called; nothing to do in either case.
+ */
if (fmstate == NULL)
return;
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4c5a7bbf62..f7f18d3054 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -3010,6 +3010,10 @@ EvalPlanQualEnd(EPQState *epqstate)
MemoryContext oldcontext;
ListCell *l;
+ /* Nothing to do if EvalPlanQualInit() was never called. */
+ if (epqstate->parentestate == NULL)
+ return;
+
rtsize = epqstate->parentestate->es_range_table_size;
/*
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index f154f28902..af22b1676f 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -4304,7 +4304,6 @@ GetAggInitVal(Datum textInitVal, Oid transtype)
void
ExecEndAgg(AggState *node)
{
- PlanState *outerPlan;
int transno;
int numGroupingSets = Max(node->maxsets, 1);
int setno;
@@ -4314,7 +4313,7 @@ ExecEndAgg(AggState *node)
* worker back into shared memory so that it can be picked up by the main
* process to report in EXPLAIN ANALYZE.
*/
- if (node->shared_info && IsParallelWorker())
+ if (node->shared_info != NULL && IsParallelWorker())
{
AggregateInstrumentation *si;
@@ -4327,10 +4326,16 @@ ExecEndAgg(AggState *node)
/* Make sure we have closed any open tuplesorts */
- if (node->sort_in)
+ if (node->sort_in != NULL)
+ {
tuplesort_end(node->sort_in);
- if (node->sort_out)
+ node->sort_in = NULL;
+ }
+ if (node->sort_out != NULL)
+ {
tuplesort_end(node->sort_out);
+ node->sort_out = NULL;
+ }
hashagg_reset_spill_state(node);
@@ -4346,19 +4351,25 @@ ExecEndAgg(AggState *node)
for (setno = 0; setno < numGroupingSets; setno++)
{
- if (pertrans->sortstates[setno])
+ if (pertrans->sortstates[setno] != NULL)
tuplesort_end(pertrans->sortstates[setno]);
}
}
/* And ensure any agg shutdown callbacks have been called */
for (setno = 0; setno < numGroupingSets; setno++)
+ {
ReScanExprContext(node->aggcontexts[setno]);
- if (node->hashcontext)
+ node->aggcontexts[setno] = NULL;
+ }
+ if (node->hashcontext != NULL)
+ {
ReScanExprContext(node->hashcontext);
+ node->hashcontext = NULL;
+ }
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
+ ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
void
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 609df6b9e6..a2af221e05 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -399,7 +399,10 @@ ExecEndAppend(AppendState *node)
* shut down each of the subscans
*/
for (i = 0; i < nplans; i++)
+ {
ExecEndNode(appendplans[i]);
+ appendplans[i] = NULL;
+ }
}
void
diff --git a/src/backend/executor/nodeBitmapAnd.c b/src/backend/executor/nodeBitmapAnd.c
index 4c5eb2b23b..4abb0609a0 100644
--- a/src/backend/executor/nodeBitmapAnd.c
+++ b/src/backend/executor/nodeBitmapAnd.c
@@ -192,8 +192,8 @@ ExecEndBitmapAnd(BitmapAndState *node)
*/
for (i = 0; i < nplans; i++)
{
- if (bitmapplans[i])
- ExecEndNode(bitmapplans[i]);
+ ExecEndNode(bitmapplans[i]);
+ bitmapplans[i] = NULL;
}
}
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 2db0acfc76..d3f58c22f9 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -648,40 +648,59 @@ ExecReScanBitmapHeapScan(BitmapHeapScanState *node)
void
ExecEndBitmapHeapScan(BitmapHeapScanState *node)
{
- TableScanDesc scanDesc;
-
- /*
- * extract information from the node
- */
- scanDesc = node->ss.ss_currentScanDesc;
-
/*
* close down subplans
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
/*
* release bitmaps and buffers if any
*/
- if (node->tbmiterator)
+ if (node->tbmiterator != NULL)
+ {
tbm_end_iterate(node->tbmiterator);
- if (node->prefetch_iterator)
+ node->tbmiterator = NULL;
+ }
+ if (node->prefetch_iterator != NULL)
+ {
tbm_end_iterate(node->prefetch_iterator);
- if (node->tbm)
+ node->prefetch_iterator = NULL;
+ }
+ if (node->tbm != NULL)
+ {
tbm_free(node->tbm);
- if (node->shared_tbmiterator)
+ node->tbm = NULL;
+ }
+ if (node->shared_tbmiterator != NULL)
+ {
tbm_end_shared_iterate(node->shared_tbmiterator);
- if (node->shared_prefetch_iterator)
+ node->shared_tbmiterator = NULL;
+ }
+ if (node->shared_prefetch_iterator != NULL)
+ {
tbm_end_shared_iterate(node->shared_prefetch_iterator);
+ node->shared_prefetch_iterator = NULL;
+ }
if (node->vmbuffer != InvalidBuffer)
+ {
ReleaseBuffer(node->vmbuffer);
+ node->vmbuffer = InvalidBuffer;
+ }
if (node->pvmbuffer != InvalidBuffer)
+ {
ReleaseBuffer(node->pvmbuffer);
+ node->pvmbuffer = InvalidBuffer;
+ }
/*
- * close heap scan
+ * close heap scan (no-op if we didn't start it)
*/
- table_endscan(scanDesc);
+ if (node->ss.ss_currentScanDesc != NULL)
+ {
+ table_endscan(node->ss.ss_currentScanDesc);
+ node->ss.ss_currentScanDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeBitmapIndexscan.c b/src/backend/executor/nodeBitmapIndexscan.c
index 7cf8532bc9..488f11a3ff 100644
--- a/src/backend/executor/nodeBitmapIndexscan.c
+++ b/src/backend/executor/nodeBitmapIndexscan.c
@@ -175,22 +175,21 @@ ExecReScanBitmapIndexScan(BitmapIndexScanState *node)
void
ExecEndBitmapIndexScan(BitmapIndexScanState *node)
{
- Relation indexRelationDesc;
- IndexScanDesc indexScanDesc;
-
- /*
- * extract information from the node
- */
- indexRelationDesc = node->biss_RelationDesc;
- indexScanDesc = node->biss_ScanDesc;
+ /* close the scan (no-op if we didn't start it) */
+ if (node->biss_ScanDesc != NULL)
+ {
+ index_endscan(node->biss_ScanDesc);
+ node->biss_ScanDesc = NULL;
+ }
/*
* close the index relation (no-op if we didn't open it)
*/
- if (indexScanDesc)
- index_endscan(indexScanDesc);
- if (indexRelationDesc)
- index_close(indexRelationDesc, NoLock);
+ if (node->biss_RelationDesc != NULL)
+ {
+ index_close(node->biss_RelationDesc, NoLock);
+ node->biss_RelationDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeBitmapOr.c b/src/backend/executor/nodeBitmapOr.c
index 0bf8af9652..ace18593aa 100644
--- a/src/backend/executor/nodeBitmapOr.c
+++ b/src/backend/executor/nodeBitmapOr.c
@@ -210,8 +210,8 @@ ExecEndBitmapOr(BitmapOrState *node)
*/
for (i = 0; i < nplans; i++)
{
- if (bitmapplans[i])
- ExecEndNode(bitmapplans[i]);
+ ExecEndNode(bitmapplans[i]);
+ bitmapplans[i] = NULL;
}
}
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 73913ebb18..3aba28285a 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -301,17 +301,20 @@ ExecEndForeignScan(ForeignScanState *node)
EState *estate = node->ss.ps.state;
/* Let the FDW shut down */
- if (plan->operation != CMD_SELECT)
+ if (node->fdwroutine != NULL)
{
- if (estate->es_epq_active == NULL)
- node->fdwroutine->EndDirectModify(node);
+ if (plan->operation != CMD_SELECT)
+ {
+ if (estate->es_epq_active == NULL)
+ node->fdwroutine->EndDirectModify(node);
+ }
+ else
+ node->fdwroutine->EndForeignScan(node);
}
- else
- node->fdwroutine->EndForeignScan(node);
/* Shut down any outer plan. */
- if (outerPlanState(node))
- ExecEndNode(outerPlanState(node));
+ ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index bb2500a469..1a3c8abdad 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -249,6 +249,7 @@ void
ExecEndGather(GatherState *node)
{
ExecEndNode(outerPlanState(node)); /* let children clean up first */
+ outerPlanState(node) = NULL;
ExecShutdownGather(node);
}
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 7a71a58509..c6fb45fee0 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -289,6 +289,7 @@ void
ExecEndGatherMerge(GatherMergeState *node)
{
ExecEndNode(outerPlanState(node)); /* let children clean up first */
+ outerPlanState(node) = NULL;
ExecShutdownGatherMerge(node);
}
diff --git a/src/backend/executor/nodeGroup.c b/src/backend/executor/nodeGroup.c
index 8c650f0e46..6dfe5a1d23 100644
--- a/src/backend/executor/nodeGroup.c
+++ b/src/backend/executor/nodeGroup.c
@@ -226,10 +226,8 @@ ExecInitGroup(Group *node, EState *estate, int eflags)
void
ExecEndGroup(GroupState *node)
{
- PlanState *outerPlan;
-
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
+ ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
void
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index e72f0986c2..88ba336882 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -413,13 +413,11 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
void
ExecEndHash(HashState *node)
{
- PlanState *outerPlan;
-
/*
* shut down the subplan
*/
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
+ ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index aea44a9d56..6dc43b9ff2 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -861,7 +861,7 @@ ExecEndHashJoin(HashJoinState *node)
/*
* Free hash table
*/
- if (node->hj_HashTable)
+ if (node->hj_HashTable != NULL)
{
ExecHashTableDestroy(node->hj_HashTable);
node->hj_HashTable = NULL;
@@ -871,7 +871,9 @@ ExecEndHashJoin(HashJoinState *node)
* clean up subtrees
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
ExecEndNode(innerPlanState(node));
+ innerPlanState(node) = NULL;
}
/*
diff --git a/src/backend/executor/nodeIncrementalSort.c b/src/backend/executor/nodeIncrementalSort.c
index cd094a190c..28a0e81cb3 100644
--- a/src/backend/executor/nodeIncrementalSort.c
+++ b/src/backend/executor/nodeIncrementalSort.c
@@ -1079,8 +1079,16 @@ ExecEndIncrementalSort(IncrementalSortState *node)
{
SO_printf("ExecEndIncrementalSort: shutting down sort node\n");
- ExecDropSingleTupleTableSlot(node->group_pivot);
- ExecDropSingleTupleTableSlot(node->transfer_tuple);
+ if (node->group_pivot != NULL)
+ {
+ ExecDropSingleTupleTableSlot(node->group_pivot);
+ node->group_pivot = NULL;
+ }
+ if (node->transfer_tuple != NULL)
+ {
+ ExecDropSingleTupleTableSlot(node->transfer_tuple);
+ node->transfer_tuple = NULL;
+ }
/*
* Release tuplesort resources.
@@ -1100,6 +1108,7 @@ ExecEndIncrementalSort(IncrementalSortState *node)
* Shut down the subplan.
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
SO_printf("ExecEndIncrementalSort: sort node shutdown\n");
}
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index f1db35665c..1f3843abe9 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -364,15 +364,6 @@ ExecReScanIndexOnlyScan(IndexOnlyScanState *node)
void
ExecEndIndexOnlyScan(IndexOnlyScanState *node)
{
- Relation indexRelationDesc;
- IndexScanDesc indexScanDesc;
-
- /*
- * extract information from the node
- */
- indexRelationDesc = node->ioss_RelationDesc;
- indexScanDesc = node->ioss_ScanDesc;
-
/* Release VM buffer pin, if any. */
if (node->ioss_VMBuffer != InvalidBuffer)
{
@@ -380,13 +371,21 @@ ExecEndIndexOnlyScan(IndexOnlyScanState *node)
node->ioss_VMBuffer = InvalidBuffer;
}
+ /* close the scan (no-op if we didn't start it) */
+ if (node->ioss_ScanDesc != NULL)
+ {
+ index_endscan(node->ioss_ScanDesc);
+ node->ioss_ScanDesc = NULL;
+ }
+
/*
* close the index relation (no-op if we didn't open it)
*/
- if (indexScanDesc)
- index_endscan(indexScanDesc);
- if (indexRelationDesc)
- index_close(indexRelationDesc, NoLock);
+ if (node->ioss_RelationDesc != NULL)
+ {
+ index_close(node->ioss_RelationDesc, NoLock);
+ node->ioss_RelationDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 14b9c00217..32e1714f15 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -785,22 +785,21 @@ ExecIndexAdvanceArrayKeys(IndexArrayKeyInfo *arrayKeys, int numArrayKeys)
void
ExecEndIndexScan(IndexScanState *node)
{
- Relation indexRelationDesc;
- IndexScanDesc indexScanDesc;
-
- /*
- * extract information from the node
- */
- indexRelationDesc = node->iss_RelationDesc;
- indexScanDesc = node->iss_ScanDesc;
+ /* close the scan (no-op if we didn't start it) */
+ if (node->iss_ScanDesc != NULL)
+ {
+ index_endscan(node->iss_ScanDesc);
+ node->iss_ScanDesc = NULL;
+ }
/*
* close the index relation (no-op if we didn't open it)
*/
- if (indexScanDesc)
- index_endscan(indexScanDesc);
- if (indexRelationDesc)
- index_close(indexRelationDesc, NoLock);
+ if (node->iss_RelationDesc != NULL)
+ {
+ index_close(node->iss_RelationDesc, NoLock);
+ node->iss_RelationDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index 5654158e3e..a97bac9f6d 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -535,6 +535,7 @@ void
ExecEndLimit(LimitState *node)
{
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index e459971d32..26fbe95c57 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -387,6 +387,7 @@ ExecEndLockRows(LockRowsState *node)
/* We may have shut down EPQ already, but no harm in another call */
EvalPlanQualEnd(&node->lr_epqstate);
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
diff --git a/src/backend/executor/nodeMaterial.c b/src/backend/executor/nodeMaterial.c
index 753ea28915..03c514900b 100644
--- a/src/backend/executor/nodeMaterial.c
+++ b/src/backend/executor/nodeMaterial.c
@@ -243,13 +243,16 @@ ExecEndMaterial(MaterialState *node)
* Release tuplestore resources
*/
if (node->tuplestorestate != NULL)
+ {
tuplestore_end(node->tuplestorestate);
- node->tuplestorestate = NULL;
+ node->tuplestorestate = NULL;
+ }
/*
* shut down the subplan
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeMemoize.c b/src/backend/executor/nodeMemoize.c
index 94bf479287..ee4749c852 100644
--- a/src/backend/executor/nodeMemoize.c
+++ b/src/backend/executor/nodeMemoize.c
@@ -1043,6 +1043,7 @@ ExecEndMemoize(MemoizeState *node)
{
#ifdef USE_ASSERT_CHECKING
/* Validate the memory accounting code is correct in assert builds. */
+ if (node->hashtable != NULL)
{
int count;
uint64 mem = 0;
@@ -1089,12 +1090,17 @@ ExecEndMemoize(MemoizeState *node)
}
/* Remove the cache context */
- MemoryContextDelete(node->tableContext);
+ if (node->tableContext != NULL)
+ {
+ MemoryContextDelete(node->tableContext);
+ node->tableContext = NULL;
+ }
/*
* shut down the subplan
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
void
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 21b5726e6e..0a42a04b19 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -333,7 +333,10 @@ ExecEndMergeAppend(MergeAppendState *node)
* shut down each of the subscans
*/
for (i = 0; i < nplans; i++)
+ {
ExecEndNode(mergeplans[i]);
+ mergeplans[i] = NULL;
+ }
}
void
diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 648fdd9a5f..4d7d73a684 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -1646,7 +1646,9 @@ ExecEndMergeJoin(MergeJoinState *node)
* shut down the subplans
*/
ExecEndNode(innerPlanState(node));
+ innerPlanState(node) = NULL;
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
MJ1_printf("ExecEndMergeJoin: %s\n",
"node processing ended");
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index d21a178ad5..ea043c57c1 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -4430,7 +4430,9 @@ ExecEndModifyTable(ModifyTableState *node)
for (j = 0; j < resultRelInfo->ri_NumSlotsInitialized; j++)
{
ExecDropSingleTupleTableSlot(resultRelInfo->ri_Slots[j]);
+ resultRelInfo->ri_Slots[j] = NULL;
ExecDropSingleTupleTableSlot(resultRelInfo->ri_PlanSlots[j]);
+ resultRelInfo->ri_PlanSlots[j] = NULL;
}
}
@@ -4438,12 +4440,16 @@ ExecEndModifyTable(ModifyTableState *node)
* Close all the partitioned tables, leaf partitions, and their indices
* and release the slot used for tuple routing, if set.
*/
- if (node->mt_partition_tuple_routing)
+ if (node->mt_partition_tuple_routing != NULL)
{
ExecCleanupTupleRouting(node, node->mt_partition_tuple_routing);
+ node->mt_partition_tuple_routing = NULL;
- if (node->mt_root_tuple_slot)
+ if (node->mt_root_tuple_slot != NULL)
+ {
ExecDropSingleTupleTableSlot(node->mt_root_tuple_slot);
+ node->mt_root_tuple_slot = NULL;
+ }
}
/*
@@ -4455,6 +4461,7 @@ ExecEndModifyTable(ModifyTableState *node)
* shut down subplan
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
void
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index fc8f833d8b..76e02449f5 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -367,7 +367,9 @@ ExecEndNestLoop(NestLoopState *node)
* close down subplans
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
ExecEndNode(innerPlanState(node));
+ innerPlanState(node) = NULL;
NL1_printf("ExecEndNestLoop: %s\n",
"node processing ended");
diff --git a/src/backend/executor/nodeProjectSet.c b/src/backend/executor/nodeProjectSet.c
index b4bbdc89b1..e9b96416d3 100644
--- a/src/backend/executor/nodeProjectSet.c
+++ b/src/backend/executor/nodeProjectSet.c
@@ -324,6 +324,7 @@ ExecEndProjectSet(ProjectSetState *node)
* shut down subplans
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
void
diff --git a/src/backend/executor/nodeRecursiveunion.c b/src/backend/executor/nodeRecursiveunion.c
index e781003934..f6d60bcd6c 100644
--- a/src/backend/executor/nodeRecursiveunion.c
+++ b/src/backend/executor/nodeRecursiveunion.c
@@ -272,20 +272,36 @@ void
ExecEndRecursiveUnion(RecursiveUnionState *node)
{
/* Release tuplestores */
- tuplestore_end(node->working_table);
- tuplestore_end(node->intermediate_table);
+ if (node->working_table != NULL)
+ {
+ tuplestore_end(node->working_table);
+ node->working_table = NULL;
+ }
+ if (node->intermediate_table != NULL)
+ {
+ tuplestore_end(node->intermediate_table);
+ node->intermediate_table = NULL;
+ }
/* free subsidiary stuff including hashtable */
- if (node->tempContext)
+ if (node->tempContext != NULL)
+ {
MemoryContextDelete(node->tempContext);
- if (node->tableContext)
+ node->tempContext = NULL;
+ }
+ if (node->tableContext != NULL)
+ {
MemoryContextDelete(node->tableContext);
+ node->tableContext = NULL;
+ }
/*
* close down subplans
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
ExecEndNode(innerPlanState(node));
+ innerPlanState(node) = NULL;
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeResult.c b/src/backend/executor/nodeResult.c
index e9f5732f33..f15902e840 100644
--- a/src/backend/executor/nodeResult.c
+++ b/src/backend/executor/nodeResult.c
@@ -244,6 +244,7 @@ ExecEndResult(ResultState *node)
* shut down subplans
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
void
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index 41c1ea37ad..a6813559e6 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -185,14 +185,17 @@ ExecEndSampleScan(SampleScanState *node)
/*
* Tell sampling function that we finished the scan.
*/
- if (node->tsmroutine->EndSampleScan)
+ if (node->tsmroutine != NULL && node->tsmroutine->EndSampleScan)
node->tsmroutine->EndSampleScan(node);
/*
- * close heap scan
+ * close heap scan (no-op if we didn't start it)
*/
if (node->ss.ss_currentScanDesc)
+ {
table_endscan(node->ss.ss_currentScanDesc);
+ node->ss.ss_currentScanDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 49a5933aff..911266da07 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -183,18 +183,14 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
void
ExecEndSeqScan(SeqScanState *node)
{
- TableScanDesc scanDesc;
-
- /*
- * get information from node
- */
- scanDesc = node->ss.ss_currentScanDesc;
-
/*
- * close heap scan
+ * close heap scan (no-op if we didn't start it)
*/
- if (scanDesc != NULL)
- table_endscan(scanDesc);
+ if (node->ss.ss_currentScanDesc != NULL)
+ {
+ table_endscan(node->ss.ss_currentScanDesc);
+ node->ss.ss_currentScanDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index 98c1b84d43..5c2861d243 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -583,10 +583,14 @@ void
ExecEndSetOp(SetOpState *node)
{
/* free subsidiary stuff including hashtable */
- if (node->tableContext)
+ if (node->tableContext != NULL)
+ {
MemoryContextDelete(node->tableContext);
+ node->tableContext = NULL;
+ }
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index eea7f2ae15..c8a35b64a8 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -307,13 +307,16 @@ ExecEndSort(SortState *node)
* Release tuplesort resources
*/
if (node->tuplesortstate != NULL)
+ {
tuplesort_end((Tuplesortstate *) node->tuplesortstate);
- node->tuplesortstate = NULL;
+ node->tuplesortstate = NULL;
+ }
/*
* shut down the subplan
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
SO1_printf("ExecEndSort: %s\n",
"sort node shutdown");
diff --git a/src/backend/executor/nodeSubqueryscan.c b/src/backend/executor/nodeSubqueryscan.c
index 1ee6295660..91d7ae82ce 100644
--- a/src/backend/executor/nodeSubqueryscan.c
+++ b/src/backend/executor/nodeSubqueryscan.c
@@ -171,6 +171,7 @@ ExecEndSubqueryScan(SubqueryScanState *node)
* close down subquery
*/
ExecEndNode(node->subplan);
+ node->subplan = NULL;
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeTableFuncscan.c b/src/backend/executor/nodeTableFuncscan.c
index a60dcd4943..80ed4b26a8 100644
--- a/src/backend/executor/nodeTableFuncscan.c
+++ b/src/backend/executor/nodeTableFuncscan.c
@@ -217,8 +217,10 @@ ExecEndTableFuncScan(TableFuncScanState *node)
* Release tuplestore resources
*/
if (node->tupstore != NULL)
+ {
tuplestore_end(node->tupstore);
- node->tupstore = NULL;
+ node->tupstore = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index da622d3f5f..9147e4afa8 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -327,10 +327,14 @@ ExecReScanTidRangeScan(TidRangeScanState *node)
void
ExecEndTidRangeScan(TidRangeScanState *node)
{
- TableScanDesc scan = node->ss.ss_currentScanDesc;
-
- if (scan != NULL)
- table_endscan(scan);
+ /*
+ * close heap scan (no-op if we didn't start it)
+ */
+ if (node->ss.ss_currentScanDesc != NULL)
+ {
+ table_endscan(node->ss.ss_currentScanDesc);
+ node->ss.ss_currentScanDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index 15055077d0..74ec6afdcc 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -470,8 +470,14 @@ ExecReScanTidScan(TidScanState *node)
void
ExecEndTidScan(TidScanState *node)
{
- if (node->ss.ss_currentScanDesc)
+ /*
+ * close heap scan (no-op if we didn't start it)
+ */
+ if (node->ss.ss_currentScanDesc != NULL)
+ {
table_endscan(node->ss.ss_currentScanDesc);
+ node->ss.ss_currentScanDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeUnique.c b/src/backend/executor/nodeUnique.c
index 01f951197c..13c556326a 100644
--- a/src/backend/executor/nodeUnique.c
+++ b/src/backend/executor/nodeUnique.c
@@ -169,6 +169,7 @@ void
ExecEndUnique(UniqueState *node)
{
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 77724a6daa..c4c6f009ba 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -1351,11 +1351,14 @@ release_partition(WindowAggState *winstate)
* any aggregate temp data). We don't rely on retail pfree because some
* aggregates might have allocated data we don't have direct pointers to.
*/
- MemoryContextResetAndDeleteChildren(winstate->partcontext);
- MemoryContextResetAndDeleteChildren(winstate->aggcontext);
+ if (winstate->partcontext != NULL)
+ MemoryContextResetAndDeleteChildren(winstate->partcontext);
+ if (winstate->aggcontext != NULL)
+ MemoryContextResetAndDeleteChildren(winstate->aggcontext);
for (i = 0; i < winstate->numaggs; i++)
{
- if (winstate->peragg[i].aggcontext != winstate->aggcontext)
+ if (winstate->peragg[i].aggcontext != NULL &&
+ winstate->peragg[i].aggcontext != winstate->aggcontext)
MemoryContextResetAndDeleteChildren(winstate->peragg[i].aggcontext);
}
@@ -2681,24 +2684,40 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags)
void
ExecEndWindowAgg(WindowAggState *node)
{
- PlanState *outerPlan;
int i;
release_partition(node);
for (i = 0; i < node->numaggs; i++)
{
- if (node->peragg[i].aggcontext != node->aggcontext)
+ if (node->peragg[i].aggcontext != NULL &&
+ node->peragg[i].aggcontext != node->aggcontext)
MemoryContextDelete(node->peragg[i].aggcontext);
}
- MemoryContextDelete(node->partcontext);
- MemoryContextDelete(node->aggcontext);
+ if (node->partcontext != NULL)
+ {
+ MemoryContextDelete(node->partcontext);
+ node->partcontext = NULL;
+ }
+ if (node->aggcontext != NULL)
+ {
+ MemoryContextDelete(node->aggcontext);
+ node->aggcontext = NULL;
+ }
- pfree(node->perfunc);
- pfree(node->peragg);
+ if (node->perfunc != NULL)
+ {
+ pfree(node->perfunc);
+ node->perfunc = NULL;
+ }
+ if (node->peragg != NULL)
+ {
+ pfree(node->peragg);
+ node->peragg = NULL;
+ }
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
+ ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
/* -----------------
--
2.35.3
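
(A note between the attachments: the pattern v47-0002 applies across the
ExecEnd* routines is uniform, so here is a minimal sketch of it. FooState
and its sortstate field are hypothetical stand-ins, not taken from the
patch.)

    void
    ExecEndFoo(FooState *node)
    {
        /*
         * ExecInitFoo() may have exited early, so clean up only what was
         * actually set up, and reset each pointer so that a repeated or
         * partial shutdown is harmless.
         */
        if (node->sortstate != NULL)
        {
            tuplesort_end(node->sortstate);
            node->sortstate = NULL;
        }

        /* ExecEndNode() itself returns early when given a NULL planstate. */
        ExecEndNode(outerPlanState(node));
        outerPlanState(node) = NULL;
    }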
v47-0004-Adjustments-to-allow-ExecutorStart-to-sometimes-.patch
From 0fa7321f8c265515fefe53ef632ea09d64a87976 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Wed, 6 Sep 2023 17:53:46 +0900
Subject: [PATCH v47 4/9] Adjustments to allow ExecutorStart() to sometimes
fail
When a plan tree from a CachedPlan is passed to the executor,
ExecutorStart() may end up building an incompletely set-up planstate
tree, because the CachedPlan can be invalidated partway through
ExecInitNode() initialization. In such cases, the
execution should be reattempted using a fresh CachedPlan. Also, any
partially initialized EState must be cleaned up by invoking both
ExecutorEnd() and FreeExecutorState().
ExecutorStart() (and likewise ExecutorStart_hook) now returns a Boolean
telling the caller whether plan initialization succeeded.
For the replan loop in that context, it makes more sense to call
ExecutorStart() in the same scope as, or close to where,
GetCachedPlan() is invoked. So this commit modifies the following
sites:
* The ExecutorStart() call in ExplainOnePlan() is moved into a new
function ExplainQueryDesc() along with CreateQueryDesc(). Callers
of ExplainOnePlan() should now call the new function first.
* The ExecutorStart() call in _SPI_pquery() is moved to its caller
_SPI_execute_plan().
* The ExecutorStart() call in PortalRunMulti() is moved to
PortalStart(). This requires a new List field in PortalData to
store the QueryDescs created in PortalStart() and a new memory
context for those. One unintended consequence is that
CommandCounterIncrement() between queries in the PORTAL_MULTI_QUERY
case is now done in the loop in PortalStart() and not in
PortalRunMulti(). That still works because the Snapshot registered
in QueryDesc/EState is updated to account for the CCI().
This commit also adds a new flag to EState called es_canceled that
complements es_finished to denote the new scenario where
ExecutorStart() returns with a partially set-up planstate tree. Also,
to reset the AFTER trigger state that would have been set up in the
ExecutorStart(), this adds a new function AfterTriggerCancelQuery()
which is called from ExecutorEnd() (not ExecutorFinish()) when
es_canceled is true.
Note that this commit by itself doesn't make any functional change,
because the CachedPlan is not passed into the executor yet.
---
contrib/auto_explain/auto_explain.c | 12 +-
.../pg_stat_statements/pg_stat_statements.c | 12 +-
src/backend/commands/copyto.c | 5 +-
src/backend/commands/createas.c | 9 +-
src/backend/commands/explain.c | 145 +++++---
src/backend/commands/extension.c | 6 +-
src/backend/commands/matview.c | 9 +-
src/backend/commands/portalcmds.c | 6 +-
src/backend/commands/prepare.c | 31 +-
src/backend/commands/trigger.c | 13 +
src/backend/executor/execMain.c | 44 ++-
src/backend/executor/execParallel.c | 6 +-
src/backend/executor/execUtils.c | 1 +
src/backend/executor/functions.c | 7 +-
src/backend/executor/spi.c | 48 ++-
src/backend/tcop/postgres.c | 18 +-
src/backend/tcop/pquery.c | 346 +++++++++---------
src/backend/utils/mmgr/portalmem.c | 9 +
src/include/commands/explain.h | 7 +-
src/include/commands/trigger.h | 1 +
src/include/executor/executor.h | 6 +-
src/include/nodes/execnodes.h | 3 +
src/include/tcop/pquery.h | 2 +-
src/include/utils/portal.h | 2 +
24 files changed, 466 insertions(+), 282 deletions(-)
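
(Aside, not part of the patch: in caller terms, the contract these changes
create is roughly the loop below. This is a hedged sketch, not code from
the series; it assumes the CreateQueryDesc() signature as modified here to
accept the CachedPlan, plus surrounding locals such as plansource, params,
dest, and query_string. The replan: labels added to ExecuteQuery() and
ExplainExecuteQuery() below implement this same shape.)

    for (;;)
    {
        CachedPlan *cplan = GetCachedPlan(plansource, params,
                                          CurrentResourceOwner, queryEnv);
        PlannedStmt *pstmt = linitial_node(PlannedStmt, cplan->stmt_list);
        QueryDesc  *queryDesc = CreateQueryDesc(pstmt, cplan, query_string,
                                                GetActiveSnapshot(),
                                                InvalidSnapshot, dest,
                                                params, queryEnv, 0);

        if (ExecutorStart(queryDesc, 0))
        {
            /* Plan survived initialization; run it to completion. */
            ExecutorRun(queryDesc, ForwardScanDirection, 0, true);
            ExecutorFinish(queryDesc);
            ExecutorEnd(queryDesc);
            FreeQueryDesc(queryDesc);
            ReleaseCachedPlan(cplan, CurrentResourceOwner);
            break;
        }

        /*
         * The CachedPlan was invalidated during ExecInitNode().
         * ExecutorEnd() cleans up the partially built state (es_canceled),
         * after which we loop to plan afresh.
         */
        ExecutorEnd(queryDesc);
        FreeQueryDesc(queryDesc);
        ReleaseCachedPlan(cplan, CurrentResourceOwner);
    }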
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index c3ac27ae99..a0630d7944 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -78,7 +78,7 @@ static ExecutorRun_hook_type prev_ExecutorRun = NULL;
static ExecutorFinish_hook_type prev_ExecutorFinish = NULL;
static ExecutorEnd_hook_type prev_ExecutorEnd = NULL;
-static void explain_ExecutorStart(QueryDesc *queryDesc, int eflags);
+static bool explain_ExecutorStart(QueryDesc *queryDesc, int eflags);
static void explain_ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction,
uint64 count, bool execute_once);
@@ -258,9 +258,11 @@ _PG_init(void)
/*
* ExecutorStart hook: start up logging if needed
*/
-static void
+static bool
explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ bool plan_valid;
+
/*
* At the beginning of each top-level statement, decide whether we'll
* sample this statement. If nested-statement explaining is enabled,
@@ -296,9 +298,9 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
}
if (prev_ExecutorStart)
- prev_ExecutorStart(queryDesc, eflags);
+ plan_valid = prev_ExecutorStart(queryDesc, eflags);
else
- standard_ExecutorStart(queryDesc, eflags);
+ plan_valid = standard_ExecutorStart(queryDesc, eflags);
if (auto_explain_enabled())
{
@@ -316,6 +318,8 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcxt);
}
}
+
+ return plan_valid;
}
/*
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index a46f2db352..58cb62e872 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -330,7 +330,7 @@ static PlannedStmt *pgss_planner(Query *parse,
const char *query_string,
int cursorOptions,
ParamListInfo boundParams);
-static void pgss_ExecutorStart(QueryDesc *queryDesc, int eflags);
+static bool pgss_ExecutorStart(QueryDesc *queryDesc, int eflags);
static void pgss_ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction,
uint64 count, bool execute_once);
@@ -967,13 +967,15 @@ pgss_planner(Query *parse,
/*
* ExecutorStart hook: start up tracking if needed
*/
-static void
+static bool
pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ bool plan_valid;
+
if (prev_ExecutorStart)
- prev_ExecutorStart(queryDesc, eflags);
+ plan_valid = prev_ExecutorStart(queryDesc, eflags);
else
- standard_ExecutorStart(queryDesc, eflags);
+ plan_valid = standard_ExecutorStart(queryDesc, eflags);
/*
* If query has queryId zero, don't track it. This prevents double
@@ -996,6 +998,8 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcxt);
}
}
+
+ return plan_valid;
}
/*
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 0e3547c35b..f7730c8702 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -568,8 +568,11 @@ BeginCopyTo(ParseState *pstate,
* Call ExecutorStart to prepare the plan for execution.
*
* ExecutorStart computes a result tupdesc for us
+ *
+ * OK to ignore the return value; plan can't become invalid,
+ * because there's no CachedPlan.
*/
- ExecutorStart(cstate->queryDesc, 0);
+ (void) ExecutorStart(cstate->queryDesc, 0);
tupDesc = cstate->queryDesc->tupDesc;
}
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 18b07c0200..4a950c03ff 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -329,8 +329,13 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, GetIntoRelEFlags(into));
+ /*
+ * call ExecutorStart to prepare the plan for execution
+ *
+ * OK to ignore the return value; plan can't become invalid,
+ * because there's no CachedPlan.
+ */
+ (void) ExecutorStart(queryDesc, GetIntoRelEFlags(into));
/* run the plan to completion */
ExecutorRun(queryDesc, ForwardScanDirection, 0, true);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 281c47b2ee..8d1fe5738b 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -393,6 +393,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
else
{
PlannedStmt *plan;
+ QueryDesc *queryDesc;
instr_time planstart,
planduration;
BufferUsage bufusage_start,
@@ -415,12 +416,90 @@ ExplainOneQuery(Query *query, int cursorOptions,
BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
}
+ queryDesc = ExplainQueryDesc(plan, NULL, queryString, into, es,
+ params, queryEnv);
+ Assert(queryDesc);
+
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(queryDesc, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
+/*
+ * ExplainQueryDesc
+ * Set up QueryDesc for EXPLAINing a given plan
+ *
+ * This returns NULL if cplan is found to have been invalidated after
+ * calling ExecutorStart().
+ */
+QueryDesc *
+ExplainQueryDesc(PlannedStmt *stmt, CachedPlan *cplan,
+ const char *queryString, IntoClause *into, ExplainState *es,
+ ParamListInfo params, QueryEnvironment *queryEnv)
+{
+ QueryDesc *queryDesc;
+ DestReceiver *dest;
+ int eflags;
+ int instrument_option = 0;
+
+ /*
+ * Normally we discard the query's output, but if explaining CREATE TABLE
+ * AS, we'd better use the appropriate tuple receiver.
+ */
+ if (into)
+ dest = CreateIntoRelDestReceiver(into);
+ else
+ dest = None_Receiver;
+
+ if (es->analyze && es->timing)
+ instrument_option |= INSTRUMENT_TIMER;
+ else if (es->analyze)
+ instrument_option |= INSTRUMENT_ROWS;
+
+ if (es->buffers)
+ instrument_option |= INSTRUMENT_BUFFERS;
+ if (es->wal)
+ instrument_option |= INSTRUMENT_WAL;
+
+ /*
+ * Use a snapshot with an updated command ID to ensure this query sees
+ * results of any previously executed queries.
+ */
+ PushCopiedSnapshot(GetActiveSnapshot());
+ UpdateActiveSnapshotCommandId();
+
+ /* Create a QueryDesc for the query */
+ queryDesc = CreateQueryDesc(stmt, cplan, queryString,
+ GetActiveSnapshot(), InvalidSnapshot,
+ dest, params, queryEnv, instrument_option);
+
+ /* Select execution options */
+ if (es->analyze)
+ eflags = 0; /* default run-to-completion flags */
+ else
+ eflags = EXEC_FLAG_EXPLAIN_ONLY;
+ if (es->generic)
+ eflags |= EXEC_FLAG_EXPLAIN_GENERIC;
+ if (into)
+ eflags |= GetIntoRelEFlags(into);
+
+ /*
+ * Call ExecutorStart to prepare the plan for execution. A cached plan
+ * may get invalidated during plan initialization.
+ */
+ if (!ExecutorStart(queryDesc, eflags))
+ {
+ /* Clean up. */
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+ PopActiveSnapshot();
+ return NULL;
+ }
+
+ return queryDesc;
+}
+
/*
* ExplainOneUtility -
* print out the execution plan for one utility statement
@@ -524,29 +603,16 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(QueryDesc *queryDesc,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
{
- DestReceiver *dest;
- QueryDesc *queryDesc;
instr_time starttime;
double totaltime = 0;
- int eflags;
- int instrument_option = 0;
-
- Assert(plannedstmt->commandType != CMD_UTILITY);
- if (es->analyze && es->timing)
- instrument_option |= INSTRUMENT_TIMER;
- else if (es->analyze)
- instrument_option |= INSTRUMENT_ROWS;
-
- if (es->buffers)
- instrument_option |= INSTRUMENT_BUFFERS;
- if (es->wal)
- instrument_option |= INSTRUMENT_WAL;
+ Assert(queryDesc->plannedstmt->commandType != CMD_UTILITY);
/*
* We always collect timing for the entire statement, even when node-level
@@ -555,40 +621,6 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
*/
INSTR_TIME_SET_CURRENT(starttime);
- /*
- * Use a snapshot with an updated command ID to ensure this query sees
- * results of any previously executed queries.
- */
- PushCopiedSnapshot(GetActiveSnapshot());
- UpdateActiveSnapshotCommandId();
-
- /*
- * Normally we discard the query's output, but if explaining CREATE TABLE
- * AS, we'd better use the appropriate tuple receiver.
- */
- if (into)
- dest = CreateIntoRelDestReceiver(into);
- else
- dest = None_Receiver;
-
- /* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, NULL, queryString,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, instrument_option);
-
- /* Select execution options */
- if (es->analyze)
- eflags = 0; /* default run-to-completion flags */
- else
- eflags = EXEC_FLAG_EXPLAIN_ONLY;
- if (es->generic)
- eflags |= EXEC_FLAG_EXPLAIN_GENERIC;
- if (into)
- eflags |= GetIntoRelEFlags(into);
-
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, eflags);
-
/* Execute the plan for statistics if asked for */
if (es->analyze)
{
@@ -4873,6 +4905,17 @@ ExplainDummyGroup(const char *objtype, const char *labelname, ExplainState *es)
}
}
+/*
+ * Discard output buffer for a fresh restart.
+ */
+void
+ExplainResetOutput(ExplainState *es)
+{
+ Assert(es->str);
+ resetStringInfo(es->str);
+ ExplainBeginOutput(es);
+}
+
/*
* Emit the start-of-output boilerplate.
*
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index b287a2e84c..127d2a3b0a 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -802,7 +802,11 @@ execute_sql_string(const char *sql)
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
- ExecutorStart(qdesc, 0);
+ /*
+ * OK to ignore the return value; plan can't become invalid,
+ * because there's no CachedPlan.
+ */
+ (void) ExecutorStart(qdesc, 0);
ExecutorRun(qdesc, ForwardScanDirection, 0, true);
ExecutorFinish(qdesc);
ExecutorEnd(qdesc);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 22b8b820c3..7083fb2350 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -412,8 +412,13 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, 0);
+ /*
+ * call ExecutorStart to prepare the plan for execution
+ *
+ * OK to ignore the return value; plan can't become invalid,
+ * because there's no CachedPlan.
+ */
+ (void) ExecutorStart(queryDesc, 0);
/* run the plan */
ExecutorRun(queryDesc, ForwardScanDirection, 0, true);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 73ed7aa2f0..a1ee5c0acd 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -142,9 +142,11 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
/*
* Start execution, inserting parameters if any.
+ *
+ * OK to ignore the return value; plan can't become invalid here,
+ * because there's no CachedPlan.
*/
- PortalStart(portal, params, 0, GetActiveSnapshot());
-
+ (void) PortalStart(portal, params, 0, GetActiveSnapshot());
Assert(portal->strategy == PORTAL_ONE_SELECT);
/*
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 18f70319fc..f8d0b0ee25 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -183,6 +183,7 @@ ExecuteQuery(ParseState *pstate,
paramLI = EvaluateParams(pstate, entry, stmt->params, estate);
}
+replan:
/* Create a new portal to run the query in */
portal = CreateNewPortal();
/* Don't display the portal in pg_cursors, it is for internal use only */
@@ -251,9 +252,15 @@ ExecuteQuery(ParseState *pstate,
}
/*
- * Run the portal as appropriate.
+ * Run the portal as appropriate. If the portal has a cached plan and
+ * it's found to be invalidated during the initialization of its plan
+ * trees, the plan must be regenerated.
*/
- PortalStart(portal, paramLI, eflags, GetActiveSnapshot());
+ if (!PortalStart(portal, paramLI, eflags, GetActiveSnapshot()))
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
(void) PortalRun(portal, count, false, true, dest, dest, qc);
@@ -574,7 +581,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
{
PreparedStatement *entry;
const char *query_string;
- CachedPlan *cplan;
+ CachedPlan *cplan = NULL;
List *plan_list;
ListCell *p;
ParamListInfo paramLI = NULL;
@@ -618,6 +625,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
+replan:
cplan = GetCachedPlan(entry->plansource, paramLI,
CurrentResourceOwner, queryEnv);
@@ -639,8 +647,21 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ {
+ QueryDesc *queryDesc;
+
+ queryDesc = ExplainQueryDesc(pstmt, cplan, queryString,
+ into, es, paramLI, queryEnv);
+ if (queryDesc == NULL)
+ {
+ ExplainResetOutput(es);
+ ReleaseCachedPlan(cplan, CurrentResourceOwner);
+ goto replan;
+ }
+ ExplainOnePlan(queryDesc, into, es, query_string, paramLI,
+ queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
+ }
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 52177759ab..dd139432b9 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -5009,6 +5009,19 @@ AfterTriggerBeginQuery(void)
afterTriggers.query_depth++;
}
+/* ----------
+ * AfterTriggerCancelQuery()
+ *
+ * Called from ExecutorEnd() if the query execution was canceled.
+ * ----------
+ */
+void
+AfterTriggerCancelQuery(void)
+{
+ /* Set to a value denoting that no query is active. */
+ afterTriggers.query_depth = -1;
+}
+
/* ----------
* AfterTriggerEndQuery()
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index de7bf7ca67..5755336abd 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -119,6 +119,13 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
*
* eflags contains flag bits as described in executor.h.
*
+ * Plan initialization may fail if the input plan tree is found to have been
+ * invalidated, which can happen if it comes from a CachedPlan.
+ *
+ * Returns true if plan was successfully initialized and false otherwise. If
+ * the latter, the caller must call ExecutorEnd() on 'queryDesc' to clean up
+ * after failed plan initialization.
+ *
* NB: the CurrentMemoryContext when this is called will become the parent
* of the per-query context used for this Executor invocation.
*
@@ -128,7 +135,7 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
*
* ----------------------------------------------------------------
*/
-void
+bool
ExecutorStart(QueryDesc *queryDesc, int eflags)
{
/*
@@ -140,14 +147,15 @@ ExecutorStart(QueryDesc *queryDesc, int eflags)
pgstat_report_query_id(queryDesc->plannedstmt->queryId, false);
if (ExecutorStart_hook)
- (*ExecutorStart_hook) (queryDesc, eflags);
- else
- standard_ExecutorStart(queryDesc, eflags);
+ return (*ExecutorStart_hook) (queryDesc, eflags);
+
+ return standard_ExecutorStart(queryDesc, eflags);
}
-void
+bool
standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ bool plan_valid;
EState *estate;
MemoryContext oldcontext;
@@ -263,9 +271,14 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/*
* Initialize the plan state tree
*/
- (void) InitPlan(queryDesc, eflags);
+ plan_valid = InitPlan(queryDesc, eflags);
+
+ /* Mark execution as canceled if plan won't be executed. */
+ estate->es_canceled = !plan_valid;
MemoryContextSwitchTo(oldcontext);
+
+ return plan_valid;
}
/* ----------------------------------------------------------------
@@ -325,6 +338,7 @@ standard_ExecutorRun(QueryDesc *queryDesc,
estate = queryDesc->estate;
Assert(estate != NULL);
+ Assert(!estate->es_canceled);
Assert(!(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
/*
@@ -429,7 +443,7 @@ standard_ExecutorFinish(QueryDesc *queryDesc)
Assert(!(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
/* This should be run once and only once per Executor instance */
- Assert(!estate->es_finished);
+ Assert(!estate->es_finished && !estate->es_canceled);
/* Switch into per-query memory context */
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
@@ -488,11 +502,11 @@ standard_ExecutorEnd(QueryDesc *queryDesc)
Assert(estate != NULL);
/*
- * Check that ExecutorFinish was called, unless in EXPLAIN-only mode. This
- * Assert is needed because ExecutorFinish is new as of 9.1, and callers
- * might forget to call it.
+ * Check that ExecutorFinish was called, unless in EXPLAIN-only mode or if
+ * execution was canceled. This Assert is needed because ExecutorFinish is
+ * new as of 9.1, and callers might forget to call it.
*/
- Assert(estate->es_finished ||
+ Assert(estate->es_finished || estate->es_canceled ||
(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
/*
@@ -506,6 +520,14 @@ standard_ExecutorEnd(QueryDesc *queryDesc)
UnregisterSnapshot(estate->es_snapshot);
UnregisterSnapshot(estate->es_crosscheck_snapshot);
+ /*
+ * Cancel trigger execution too if the query execution was canceled.
+ */
+ if (estate->es_canceled &&
+ !(estate->es_top_eflags &
+ (EXEC_FLAG_SKIP_TRIGGERS | EXEC_FLAG_EXPLAIN_ONLY)))
+ AfterTriggerCancelQuery();
+
/*
* Must switch out of context before destroying it
*/
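
In other words, every caller of ExecutorStart() that passes a plan sourced
from a CachedPlan now follows this shape (a minimal sketch, not patch text):

    if (!ExecutorStart(queryDesc, eflags))
    {
        /*
         * The CachedPlan providing the plan tree was invalidated while
         * executor startup was taking locks; ExecutorEnd() must still be
         * called to clean up the partially initialized executor state.
         */
        ExecutorEnd(queryDesc);
        FreeQueryDesc(queryDesc);
        /* ...release the CachedPlan and retry with a freshly built plan... */
    }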
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 457ee46faf..13d2820a41 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1437,7 +1437,11 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Start up the executor */
queryDesc->plannedstmt->jitFlags = fpes->jit_flags;
- ExecutorStart(queryDesc, fpes->eflags);
+ /*
+ * OK to ignore the return value; plan can't become invalid,
+ * because there's no CachedPlan.
+ */
+ (void) ExecutorStart(queryDesc, fpes->eflags);
/* Special executor initialization steps for parallel workers */
queryDesc->planstate->state->es_query_dsa = area;
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 16704c0c2f..f0f5740c26 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -151,6 +151,7 @@ CreateExecutorState(void)
estate->es_top_eflags = 0;
estate->es_instrument = 0;
estate->es_finished = false;
+ estate->es_canceled = false;
estate->es_exprcontexts = NIL;
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 7e452ed743..606da72535 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -863,7 +863,12 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
eflags = EXEC_FLAG_SKIP_TRIGGERS;
else
eflags = 0; /* default run-to-completion flags */
- ExecutorStart(es->qd, eflags);
+
+ /*
+ * OK to ignore the return value; plan can't become invalid,
+ * because there's no CachedPlan.
+ */
+ (void) ExecutorStart(es->qd, eflags);
}
es->status = F_EXEC_RUN;
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index f2cca807ef..814ff1390f 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -71,7 +71,7 @@ static int _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
static ParamListInfo _SPI_convert_params(int nargs, Oid *argtypes,
Datum *Values, const char *Nulls);
-static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount);
+static int _SPI_pquery(QueryDesc *queryDesc, uint64 tcount);
static void _SPI_error_callback(void *arg);
@@ -1582,6 +1582,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
Snapshot snapshot;
MemoryContext oldcontext;
Portal portal;
+ bool plan_valid;
SPICallbackArg spicallbackarg;
ErrorContextCallback spierrcontext;
@@ -1623,6 +1624,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
_SPI_current->processed = 0;
_SPI_current->tuptable = NULL;
+replan:
/* Create the portal */
if (name == NULL || name[0] == '\0')
{
@@ -1766,15 +1768,23 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
}
/*
- * Start portal execution.
+ * Start portal execution. If the portal's plan trees come from a cached
+ * plan, the portal must be recreated when that cached plan is found to
+ * have been invalidated while initializing one of those plan trees.
*/
- PortalStart(portal, paramLI, 0, snapshot);
+ plan_valid = PortalStart(portal, paramLI, 0, snapshot);
Assert(portal->strategy != PORTAL_MULTI_QUERY);
/* Pop the error context stack */
error_context_stack = spierrcontext.previous;
+ if (!plan_valid)
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
+
/* Pop the SPI stack */
_SPI_end_call(true);
@@ -2552,6 +2562,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
+replan:
cplan = GetCachedPlan(plansource, options->params,
plan_owner, _SPI_current->queryEnv);
@@ -2661,6 +2672,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
QueryDesc *qdesc;
Snapshot snap;
+ int eflags;
if (ActiveSnapshotSet())
snap = GetActiveSnapshot();
@@ -2675,8 +2687,23 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
options->params,
_SPI_current->queryEnv,
0);
- res = _SPI_pquery(qdesc, fire_triggers,
- canSetTag ? options->tcount : 0);
+
+ /* Select execution options */
+ if (fire_triggers)
+ eflags = 0; /* default run-to-completion flags */
+ else
+ eflags = EXEC_FLAG_SKIP_TRIGGERS;
+
+ if (!ExecutorStart(qdesc, eflags))
+ {
+ ExecutorEnd(qdesc);
+ FreeQueryDesc(qdesc);
+ Assert(cplan);
+ ReleaseCachedPlan(cplan, plan_owner);
+ goto replan;
+ }
+
+ res = _SPI_pquery(qdesc, canSetTag ? options->tcount : 0);
FreeQueryDesc(qdesc);
}
else
@@ -2851,10 +2878,9 @@ _SPI_convert_params(int nargs, Oid *argtypes,
}
static int
-_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
+_SPI_pquery(QueryDesc *queryDesc, uint64 tcount)
{
int operation = queryDesc->operation;
- int eflags;
int res;
switch (operation)
@@ -2898,14 +2924,6 @@ _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
ResetUsage();
#endif
- /* Select execution options */
- if (fire_triggers)
- eflags = 0; /* default run-to-completion flags */
- else
- eflags = EXEC_FLAG_SKIP_TRIGGERS;
-
- ExecutorStart(queryDesc, eflags);
-
ExecutorRun(queryDesc, ForwardScanDirection, tcount, true);
_SPI_current->processed = queryDesc->estate->es_processed;
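
Condensed, the retry protocol that _SPI_execute_plan() now follows looks
like this (a sketch using the names from the hunk above):

    replan:
        cplan = GetCachedPlan(plansource, options->params,
                              plan_owner, _SPI_current->queryEnv);
        /* ... build the QueryDesc from cplan's statement list ... */
        if (!ExecutorStart(qdesc, eflags))
        {
            ExecutorEnd(qdesc);
            FreeQueryDesc(qdesc);
            ReleaseCachedPlan(cplan, plan_owner);
            goto replan;    /* GetCachedPlan() replans from plansource */
        }
        res = _SPI_pquery(qdesc, canSetTag ? options->tcount : 0);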
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 21b9763183..4f923bbcae 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1230,7 +1230,12 @@ exec_simple_query(const char *query_string)
/*
* Start the portal. No parameters here.
*/
- PortalStart(portal, NULL, 0, InvalidSnapshot);
+ {
+ bool plan_valid PG_USED_FOR_ASSERTS_ONLY;
+
+ plan_valid = PortalStart(portal, NULL, 0, InvalidSnapshot);
+ Assert(plan_valid);
+ }
/*
* Select the appropriate output format: text unless we are doing a
@@ -1735,6 +1740,7 @@ exec_bind_message(StringInfo input_message)
"commands ignored until end of transaction block"),
errdetail_abort()));
+replan:
/*
* Create the portal. Allow silent replacement of an existing portal only
* if the unnamed portal is specified.
@@ -2026,9 +2032,15 @@ exec_bind_message(StringInfo input_message)
PopActiveSnapshot();
/*
- * And we're ready to start portal execution.
+ * Start portal execution. If the portal's plan trees come from a cached
+ * plan, the portal must be recreated when that cached plan is found to
+ * have been invalidated while initializing one of those plan trees.
*/
- PortalStart(portal, params, 0, InvalidSnapshot);
+ if (!PortalStart(portal, params, 0, InvalidSnapshot))
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
/*
* Apply the result format requests to the portal.
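
The same pattern appears at the portal level; exec_bind_message() and
SPI_cursor_open_internal() now both do, in sketch form:

    replan:
        portal = CreatePortal(...);
        /* ... define the query, fetching the plan from the CachedPlan ... */
        if (!PortalStart(portal, params, 0, InvalidSnapshot))
        {
            /* a plan tree in the portal was invalidated during startup */
            PortalDrop(portal, false);
            goto replan;    /* rebuild the portal with a fresh plan */
        }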
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 4ef349df8b..fcf9925ed4 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -19,6 +19,7 @@
#include "access/xact.h"
#include "commands/prepare.h"
+#include "executor/execdesc.h"
#include "executor/tstoreReceiver.h"
#include "miscadmin.h"
#include "pg_trace.h"
@@ -35,12 +36,6 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
- const char *sourceText,
- ParamListInfo params,
- QueryEnvironment *queryEnv,
- DestReceiver *dest,
- QueryCompletion *qc);
static void FillPortalStore(Portal portal, bool isTopLevel);
static uint64 RunFromStore(Portal portal, ScanDirection direction, uint64 count,
DestReceiver *dest);
@@ -118,86 +113,6 @@ FreeQueryDesc(QueryDesc *qdesc)
}
-/*
- * ProcessQuery
- * Execute a single plannable query within a PORTAL_MULTI_QUERY,
- * PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
- *
- * plan: the plan tree for the query
- * sourceText: the source text of the query
- * params: any parameters needed
- * dest: where to send results
- * qc: where to store the command completion status data.
- *
- * qc may be NULL if caller doesn't want a status string.
- *
- * Must be called in a memory context that will be reset or deleted on
- * error; otherwise the executor's memory usage will be leaked.
- */
-static void
-ProcessQuery(PlannedStmt *plan,
- const char *sourceText,
- ParamListInfo params,
- QueryEnvironment *queryEnv,
- DestReceiver *dest,
- QueryCompletion *qc)
-{
- QueryDesc *queryDesc;
-
- /*
- * Create the QueryDesc object
- */
- queryDesc = CreateQueryDesc(plan, NULL, sourceText,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, 0);
-
- /*
- * Call ExecutorStart to prepare the plan for execution
- */
- ExecutorStart(queryDesc, 0);
-
- /*
- * Run the plan to completion.
- */
- ExecutorRun(queryDesc, ForwardScanDirection, 0, true);
-
- /*
- * Build command completion status data, if caller wants one.
- */
- if (qc)
- {
- switch (queryDesc->operation)
- {
- case CMD_SELECT:
- SetQueryCompletion(qc, CMDTAG_SELECT, queryDesc->estate->es_processed);
- break;
- case CMD_INSERT:
- SetQueryCompletion(qc, CMDTAG_INSERT, queryDesc->estate->es_processed);
- break;
- case CMD_UPDATE:
- SetQueryCompletion(qc, CMDTAG_UPDATE, queryDesc->estate->es_processed);
- break;
- case CMD_DELETE:
- SetQueryCompletion(qc, CMDTAG_DELETE, queryDesc->estate->es_processed);
- break;
- case CMD_MERGE:
- SetQueryCompletion(qc, CMDTAG_MERGE, queryDesc->estate->es_processed);
- break;
- default:
- SetQueryCompletion(qc, CMDTAG_UNKNOWN, queryDesc->estate->es_processed);
- break;
- }
- }
-
- /*
- * Now, we close down all the scans and free allocated resources.
- */
- ExecutorFinish(queryDesc);
- ExecutorEnd(queryDesc);
-
- FreeQueryDesc(queryDesc);
-}
-
/*
* ChoosePortalStrategy
* Select portal execution strategy given the intended statement list.
@@ -428,19 +343,21 @@ FetchStatementTargetList(Node *stmt)
* presently ignored for non-PORTAL_ONE_SELECT portals (it's only intended
* to be used for cursors).
*
- * On return, portal is ready to accept PortalRun() calls, and the result
- * tupdesc (if any) is known.
+ * Returns true if the portal is ready to accept PortalRun() calls and the
+ * result tupdesc (if any) is known. Returns false if the plan tree is no
+ * longer valid, in which case the caller must retry after generating a new
+ * CachedPlan.
*/
-void
+bool
PortalStart(Portal portal, ParamListInfo params,
int eflags, Snapshot snapshot)
{
Portal saveActivePortal;
ResourceOwner saveResourceOwner;
- MemoryContext savePortalContext;
MemoryContext oldContext;
QueryDesc *queryDesc;
- int myeflags;
+ int myeflags = 0;
+ bool plan_valid = true;
Assert(PortalIsValid(portal));
Assert(portal->status == PORTAL_DEFINED);
@@ -450,15 +367,13 @@ PortalStart(Portal portal, ParamListInfo params,
*/
saveActivePortal = ActivePortal;
saveResourceOwner = CurrentResourceOwner;
- savePortalContext = PortalContext;
PG_TRY();
{
ActivePortal = portal;
if (portal->resowner)
CurrentResourceOwner = portal->resowner;
- PortalContext = portal->portalContext;
- oldContext = MemoryContextSwitchTo(PortalContext);
+ oldContext = MemoryContextSwitchTo(portal->queryContext);
/* Must remember portal param list, if any */
portal->portalParams = params;
@@ -474,6 +389,8 @@ PortalStart(Portal portal, ParamListInfo params,
switch (portal->strategy)
{
case PORTAL_ONE_SELECT:
+ case PORTAL_ONE_RETURNING:
+ case PORTAL_ONE_MOD_WITH:
/* Must set snapshot before starting executor. */
if (snapshot)
@@ -491,8 +408,8 @@ PortalStart(Portal portal, ParamListInfo params,
*/
/*
- * Create QueryDesc in portal's context; for the moment, set
- * the destination to DestNone.
+ * Create QueryDesc in portal->queryContext; for the moment,
+ * set the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
NULL,
@@ -504,30 +421,51 @@ PortalStart(Portal portal, ParamListInfo params,
portal->queryEnv,
0);
+ /* Remember for PortalRunMulti(). */
+ if (portal->strategy == PORTAL_ONE_RETURNING ||
+ portal->strategy == PORTAL_ONE_MOD_WITH)
+ portal->qdescs = list_make1(queryDesc);
+
/*
* If it's a scrollable cursor, executor needs to support
* REWIND and backwards scan, as well as whatever the caller
* might've asked for.
*/
- if (portal->cursorOptions & CURSOR_OPT_SCROLL)
+ if (portal->strategy == PORTAL_ONE_SELECT &&
+ (portal->cursorOptions & CURSOR_OPT_SCROLL))
myeflags = eflags | EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD;
else
myeflags = eflags;
/*
- * Call ExecutorStart to prepare the plan for execution
+ * Call ExecutorStart to prepare the plan for execution. A
+ * cached plan may get invalidated during plan initialization.
*/
- ExecutorStart(queryDesc, myeflags);
+ if (!ExecutorStart(queryDesc, myeflags))
+ {
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+ PopActiveSnapshot();
+ plan_valid = false;
+ goto plan_init_failed;
+ }
/*
- * This tells PortalCleanup to shut down the executor
+ * This tells PortalCleanup to shut down the executor, though this
+ * is not needed for queries handled by PortalRunMulti().
*/
- portal->queryDesc = queryDesc;
+ if (portal->strategy == PORTAL_ONE_SELECT)
+ portal->queryDesc = queryDesc;
/*
- * Remember tuple descriptor (computed by ExecutorStart)
+ * Remember tuple descriptor (computed by ExecutorStart),
+ * though make it independent of QueryDesc for queries handled
+ * by PortalRunMulti().
*/
- portal->tupDesc = queryDesc->tupDesc;
+ if (portal->strategy != PORTAL_ONE_SELECT)
+ portal->tupDesc = CreateTupleDescCopy(queryDesc->tupDesc);
+ else
+ portal->tupDesc = queryDesc->tupDesc;
/*
* Reset cursor position data to "start of query"
@@ -539,29 +477,6 @@ PortalStart(Portal portal, ParamListInfo params,
PopActiveSnapshot();
break;
- case PORTAL_ONE_RETURNING:
- case PORTAL_ONE_MOD_WITH:
-
- /*
- * We don't start the executor until we are told to run the
- * portal. We do need to set up the result tupdesc.
- */
- {
- PlannedStmt *pstmt;
-
- pstmt = PortalGetPrimaryStmt(portal);
- portal->tupDesc =
- ExecCleanTypeFromTL(pstmt->planTree->targetlist);
- }
-
- /*
- * Reset cursor position data to "start of query"
- */
- portal->atStart = true;
- portal->atEnd = false; /* allow fetches */
- portal->portalPos = 0;
- break;
-
case PORTAL_UTIL_SELECT:
/*
@@ -584,7 +499,82 @@ PortalStart(Portal portal, ParamListInfo params,
break;
case PORTAL_MULTI_QUERY:
- /* Need do nothing now */
+ {
+ ListCell *lc;
+ bool first = true;
+
+ myeflags = eflags;
+ foreach(lc, portal->stmts)
+ {
+ PlannedStmt *plan = lfirst_node(PlannedStmt, lc);
+ bool is_utility = (plan->utilityStmt != NULL);
+
+ /*
+ * Push the snapshot to be used by the executor.
+ */
+ if (!is_utility)
+ {
+ /*
+ * Must copy the snapshot for all statements
+ * except the first, as we'll need to update its
+ * command ID.
+ */
+ if (!first)
+ PushCopiedSnapshot(GetTransactionSnapshot());
+ else
+ PushActiveSnapshot(GetTransactionSnapshot());
+ }
+
+ /*
+ * From the 2nd statement onwards, update the command
+ * ID and the snapshot to match.
+ */
+ if (!first)
+ {
+ CommandCounterIncrement();
+ UpdateActiveSnapshotCommandId();
+ }
+
+ first = false;
+
+ /*
+ * Create the QueryDesc. DestReceiver will be set in
+ * PortalRunMulti() before calling ExecutorRun().
+ */
+ queryDesc = CreateQueryDesc(plan,
+ NULL,
+ portal->sourceText,
+ !is_utility ?
+ GetActiveSnapshot() :
+ InvalidSnapshot,
+ InvalidSnapshot,
+ NULL,
+ params,
+ portal->queryEnv, 0);
+
+ /* Remember for PortalRunMulti() */
+ portal->qdescs = lappend(portal->qdescs, queryDesc);
+
+ if (is_utility)
+ continue;
+
+ /*
+ * Call ExecutorStart to prepare the plan for
+ * execution. A cached plan may get invalidated
+ * during plan initialization.
+ */
+ if (!ExecutorStart(queryDesc, myeflags))
+ {
+ PopActiveSnapshot();
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+ plan_valid = false;
+ goto plan_init_failed;
+ }
+ PopActiveSnapshot();
+ }
+ }
+
portal->tupDesc = NULL;
break;
}
@@ -597,19 +587,20 @@ PortalStart(Portal portal, ParamListInfo params,
/* Restore global vars and propagate error */
ActivePortal = saveActivePortal;
CurrentResourceOwner = saveResourceOwner;
- PortalContext = savePortalContext;
PG_RE_THROW();
}
PG_END_TRY();
+ portal->status = PORTAL_READY;
+
+plan_init_failed:
MemoryContextSwitchTo(oldContext);
ActivePortal = saveActivePortal;
CurrentResourceOwner = saveResourceOwner;
- PortalContext = savePortalContext;
- portal->status = PORTAL_READY;
+ return plan_valid;
}
/*
@@ -1196,7 +1187,7 @@ PortalRunMulti(Portal portal,
QueryCompletion *qc)
{
bool active_snapshot_set = false;
- ListCell *stmtlist_item;
+ ListCell *qdesc_item;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1217,9 +1208,10 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
- foreach(stmtlist_item, portal->stmts)
+ foreach(qdesc_item, portal->qdescs)
{
- PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ QueryDesc *qdesc = (QueryDesc *) lfirst(qdesc_item);
+ PlannedStmt *pstmt = qdesc->plannedstmt;
/*
* If we got a cancel signal in prior command, quit
@@ -1236,33 +1228,26 @@ PortalRunMulti(Portal portal,
if (log_executor_stats)
ResetUsage();
- /*
- * Must always have a snapshot for plannable queries. First time
- * through, take a new snapshot; for subsequent queries in the
- * same portal, just update the snapshot's copy of the command
- * counter.
- */
+ /* Push the snapshot for plannable queries. */
if (!active_snapshot_set)
{
- Snapshot snapshot = GetTransactionSnapshot();
+ Snapshot snapshot = qdesc->snapshot;
- /* If told to, register the snapshot and save in portal */
+ /*
+ * If told to, register the snapshot and save in portal
+ *
+ * Note that the command ID of qdesc->snapshot for the second and
+ * later queries would have been updated in PortalStart() to account
+ * for the CCI() done between queries, but it's OK that we don't
+ * likewise update holdSnapshot's command ID here.
+ */
if (setHoldSnapshot)
{
snapshot = RegisterSnapshot(snapshot);
portal->holdSnapshot = snapshot;
}
- /*
- * We can't have the holdSnapshot also be the active one,
- * because UpdateActiveSnapshotCommandId would complain. So
- * force an extra snapshot copy. Plain PushActiveSnapshot
- * would have copied the transaction snapshot anyway, so this
- * only adds a copy step when setHoldSnapshot is true. (It's
- * okay for the command ID of the active snapshot to diverge
- * from what holdSnapshot has.)
- */
- PushCopiedSnapshot(snapshot);
+ PushActiveSnapshot(snapshot);
/*
* As for PORTAL_ONE_SELECT portals, it does not seem
@@ -1271,26 +1256,39 @@ PortalRunMulti(Portal portal,
active_snapshot_set = true;
}
- else
- UpdateActiveSnapshotCommandId();
+ /*
+ * Run the plan to completion.
+ */
+ qdesc->dest = dest;
+ ExecutorRun(qdesc, ForwardScanDirection, 0, true);
+
+ /*
+ * Build command completion status data if needed.
+ */
if (pstmt->canSetTag)
{
- /* statement can set tag string */
- ProcessQuery(pstmt,
- portal->sourceText,
- portal->portalParams,
- portal->queryEnv,
- dest, qc);
- }
- else
- {
- /* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
- portal->sourceText,
- portal->portalParams,
- portal->queryEnv,
- altdest, NULL);
+ switch (qdesc->operation)
+ {
+ case CMD_SELECT:
+ SetQueryCompletion(qc, CMDTAG_SELECT, qdesc->estate->es_processed);
+ break;
+ case CMD_INSERT:
+ SetQueryCompletion(qc, CMDTAG_INSERT, qdesc->estate->es_processed);
+ break;
+ case CMD_UPDATE:
+ SetQueryCompletion(qc, CMDTAG_UPDATE, qdesc->estate->es_processed);
+ break;
+ case CMD_DELETE:
+ SetQueryCompletion(qc, CMDTAG_DELETE, qdesc->estate->es_processed);
+ break;
+ case CMD_MERGE:
+ SetQueryCompletion(qc, CMDTAG_MERGE, qdesc->estate->es_processed);
+ break;
+ default:
+ SetQueryCompletion(qc, CMDTAG_UNKNOWN, qdesc->estate->es_processed);
+ break;
+ }
}
if (log_executor_stats)
@@ -1345,12 +1343,12 @@ PortalRunMulti(Portal portal,
if (portal->stmts == NIL)
break;
- /*
- * Increment command counter between queries, but not after the last
- * one.
- */
- if (lnext(portal->stmts, stmtlist_item) != NULL)
- CommandCounterIncrement();
+ if (qdesc->estate)
+ {
+ ExecutorFinish(qdesc);
+ ExecutorEnd(qdesc);
+ }
+ FreeQueryDesc(qdesc);
}
/* Pop the snapshot if we pushed one. */
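
To summarize the new division of labor in pquery.c (a rough sketch, not
patch text; utility statements are skipped at startup as in the hunks above):

    /* PortalStart(), PORTAL_MULTI_QUERY case */
    foreach(lc, portal->stmts)
    {
        PlannedStmt *plan = lfirst_node(PlannedStmt, lc);

        queryDesc = CreateQueryDesc(plan, ...);  /* in portal->queryContext */
        portal->qdescs = lappend(portal->qdescs, queryDesc);
        if (!ExecutorStart(queryDesc, eflags))
            return false;               /* caller drops portal and replans */
    }

    /* PortalRunMulti() then runs what PortalStart() initialized */
    foreach(qdesc_item, portal->qdescs)
    {
        QueryDesc *qdesc = (QueryDesc *) lfirst(qdesc_item);

        qdesc->dest = dest;
        ExecutorRun(qdesc, ForwardScanDirection, 0, true);
        ExecutorFinish(qdesc);
        ExecutorEnd(qdesc);
        FreeQueryDesc(qdesc);
    }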
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 06dfa85f04..0cad450dcd 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -201,6 +201,13 @@ CreatePortal(const char *name, bool allowDup, bool dupSilent)
portal->portalContext = AllocSetContextCreate(TopPortalContext,
"PortalContext",
ALLOCSET_SMALL_SIZES);
+ /*
+ * Initialize the portal's query context, used to store QueryDescs created
+ * during PortalStart() and then used in PortalRun().
+ */
+ portal->queryContext = AllocSetContextCreate(TopPortalContext,
+ "PortalQueryContext",
+ ALLOCSET_SMALL_SIZES);
/* create a resource owner for the portal */
portal->resowner = ResourceOwnerCreate(CurTransactionResourceOwner,
@@ -224,6 +231,7 @@ CreatePortal(const char *name, bool allowDup, bool dupSilent)
/* for named portals reuse portal->name copy */
MemoryContextSetIdentifier(portal->portalContext, portal->name[0] ? portal->name : "<unnamed>");
+ MemoryContextSetIdentifier(portal->queryContext, portal->name[0] ? portal->name : "<unnamed>");
return portal;
}
@@ -594,6 +602,7 @@ PortalDrop(Portal portal, bool isTopCommit)
/* release subsidiary storage */
MemoryContextDelete(portal->portalContext);
+ MemoryContextDelete(portal->queryContext);
/* release portal struct (it's in TopPortalContext) */
pfree(portal);
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 3d3e632a0c..392abb5150 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -88,7 +88,11 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern QueryDesc *ExplainQueryDesc(PlannedStmt *stmt, struct CachedPlan *cplan,
+ const char *queryString, IntoClause *into, ExplainState *es,
+ ParamListInfo params, QueryEnvironment *queryEnv);
+extern void ExplainOnePlan(QueryDesc *queryDesc,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
@@ -104,6 +108,7 @@ extern void ExplainQueryParameters(ExplainState *es, ParamListInfo params, int m
extern void ExplainBeginOutput(ExplainState *es);
extern void ExplainEndOutput(ExplainState *es);
+extern void ExplainResetOutput(ExplainState *es);
extern void ExplainSeparatePlans(ExplainState *es);
extern void ExplainPropertyList(const char *qlabel, List *data,
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index 430e3ca7dd..d4f7c29301 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -257,6 +257,7 @@ extern void ExecASTruncateTriggers(EState *estate,
extern void AfterTriggerBeginXact(void);
extern void AfterTriggerBeginQuery(void);
+extern void AfterTriggerCancelQuery(void);
extern void AfterTriggerEndQuery(EState *estate);
extern void AfterTriggerFireDeferred(void);
extern void AfterTriggerEndXact(bool isCommit);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 72cbf120c5..10c5cda169 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -73,7 +73,7 @@
/* Hook for plugins to get control in ExecutorStart() */
-typedef void (*ExecutorStart_hook_type) (QueryDesc *queryDesc, int eflags);
+typedef bool (*ExecutorStart_hook_type) (QueryDesc *queryDesc, int eflags);
extern PGDLLIMPORT ExecutorStart_hook_type ExecutorStart_hook;
/* Hook for plugins to get control in ExecutorRun() */
@@ -198,8 +198,8 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
-extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
-extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern bool ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern bool standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count, bool execute_once);
extern void standard_ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 846eb32a1d..bb5734edb5 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -670,6 +670,9 @@ typedef struct EState
int es_top_eflags; /* eflags passed to ExecutorStart */
int es_instrument; /* OR of InstrumentOption flags */
bool es_finished; /* true when ExecutorFinish is done */
+ bool es_canceled; /* true when execution was canceled because
+ * the plan was found to have been
+ * invalidated during ExecInitNode() */
List *es_exprcontexts; /* List of ExprContexts within EState */
diff --git a/src/include/tcop/pquery.h b/src/include/tcop/pquery.h
index a5e65b98aa..577b81a9ee 100644
--- a/src/include/tcop/pquery.h
+++ b/src/include/tcop/pquery.h
@@ -29,7 +29,7 @@ extern List *FetchPortalTargetList(Portal portal);
extern List *FetchStatementTargetList(Node *stmt);
-extern void PortalStart(Portal portal, ParamListInfo params,
+extern bool PortalStart(Portal portal, ParamListInfo params,
int eflags, Snapshot snapshot);
extern void PortalSetResultFormat(Portal portal, int nFormats,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aa08b1e0fc..af059e30f8 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,8 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *qdescs; /* list of QueryDescs */
+ MemoryContext queryContext; /* memory for QueryDescs and children */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
--
2.35.3
v47-0003-Prepare-executor-to-support-detecting-CachedPlan.patch
From 5062207c1e6879cf865508e1325dcade88d39cc6 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Fri, 22 Sep 2023 18:12:04 +0900
Subject: [PATCH v47 3/9] Prepare executor to support detecting CachedPlan
invalidation
This adds checks at various points during the executor's
initialization of the plan tree to determine whether the originating
CachedPlan has become invalid as a result of taking locks on the
relations referenced in the plan. This includes adding the check
after every call to ExecOpenScanRelation() and to ExecInitNode(),
including the recursive ones to initialize child nodes.
If a given ExecInit*() function detects that the plan has become
invalid, it should return immediately even if the PlanState node
it's building may only be partially valid. That is crucial for
two reasons depending on where the check is:
* The checks following ExecOpenScanRelation() may find that the plan
has become invalid because the requested relation was dropped
or had its schema changed concurrently in a manner that risks
unsafe operations in the code that follows. For example, that
code might dereference a NULL pointer if the check failed
because the relation was dropped.
* For the checks following ExecInitNode(), the returned child
PlanState node might be only partially valid. The code that
follows may misbehave if it depends on inspecting the child
PlanState. Note that this commit adds the check following all
calls to ExecInitNode() that exist in the code base, even at
sites where there is no code that might misbehave today, because
such code might appear in the future. It seems like a good idea to
put the guards in place today rather than in the future when the
need arises.
To pass the CachedPlan that the executor will use for these checks,
this adds a new field to QueryDesc and a new parameter to
CreateQueryDesc(). No caller of CreateQueryDesc() is made to pass
an actual CachedPlan though, so there is no functional change.
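
The shape of the added check is the same at every call site; representative
sketches, taken from the hunks below:

    outerPlanState(state) = ExecInitNode(outerPlan(node), estate, eflags);
    if (unlikely(!ExecPlanStillValid(estate)))
        return state;       /* node may be only partially initialized */

and, after opening a scan relation:

    currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid,
                                           eflags);
    if (unlikely(!ExecPlanStillValid(estate)))
        return scanstate;   /* the relation pointer may not be usable */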
Reviewed-by: Robert Haas
---
contrib/postgres_fdw/postgres_fdw.c | 4 +++
src/backend/commands/copyto.c | 3 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 2 +-
src/backend/commands/extension.c | 1 +
src/backend/commands/matview.c | 2 +-
src/backend/executor/execMain.c | 39 ++++++++++++++++++----
src/backend/executor/execParallel.c | 9 ++++-
src/backend/executor/execProcnode.c | 4 +++
src/backend/executor/functions.c | 1 +
src/backend/executor/nodeAgg.c | 2 ++
src/backend/executor/nodeAppend.c | 10 +++---
src/backend/executor/nodeBitmapAnd.c | 2 ++
src/backend/executor/nodeBitmapHeapscan.c | 4 +++
src/backend/executor/nodeBitmapOr.c | 2 ++
src/backend/executor/nodeCustom.c | 2 ++
src/backend/executor/nodeForeignscan.c | 4 +++
src/backend/executor/nodeGather.c | 2 ++
src/backend/executor/nodeGatherMerge.c | 2 ++
src/backend/executor/nodeGroup.c | 2 ++
src/backend/executor/nodeHash.c | 2 ++
src/backend/executor/nodeHashjoin.c | 4 +++
src/backend/executor/nodeIncrementalSort.c | 2 ++
src/backend/executor/nodeIndexonlyscan.c | 2 ++
src/backend/executor/nodeIndexscan.c | 2 ++
src/backend/executor/nodeLimit.c | 2 ++
src/backend/executor/nodeLockRows.c | 2 ++
src/backend/executor/nodeMaterial.c | 2 ++
src/backend/executor/nodeMemoize.c | 2 ++
src/backend/executor/nodeMergeAppend.c | 4 ++-
src/backend/executor/nodeMergejoin.c | 4 +++
src/backend/executor/nodeModifyTable.c | 13 ++++++++
src/backend/executor/nodeNestloop.c | 4 +++
src/backend/executor/nodeProjectSet.c | 2 ++
src/backend/executor/nodeRecursiveunion.c | 4 +++
src/backend/executor/nodeResult.c | 2 ++
src/backend/executor/nodeSamplescan.c | 2 ++
src/backend/executor/nodeSeqscan.c | 2 ++
src/backend/executor/nodeSetOp.c | 2 ++
src/backend/executor/nodeSort.c | 2 ++
src/backend/executor/nodeSubqueryscan.c | 2 ++
src/backend/executor/nodeTidrangescan.c | 2 ++
src/backend/executor/nodeTidscan.c | 2 ++
src/backend/executor/nodeUnique.c | 2 ++
src/backend/executor/nodeWindowAgg.c | 2 ++
src/backend/executor/spi.c | 1 +
src/backend/tcop/pquery.c | 5 ++-
src/include/executor/execdesc.h | 4 +++
src/include/executor/executor.h | 10 ++++++
src/include/nodes/execnodes.h | 2 ++
src/include/utils/plancache.h | 14 ++++++++
51 files changed, 189 insertions(+), 17 deletions(-)
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 802f76c73e..4fa6bb121c 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2663,7 +2663,11 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
/* Get info about foreign table. */
rtindex = node->resultRelInfo->ri_RangeTableIndex;
if (fsplan->scan.scanrelid == 0)
+ {
dmstate->rel = ExecOpenScanRelation(estate, rtindex, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return;
+ }
else
dmstate->rel = node->ss.ss_currentRelation;
table = GetForeignTable(RelationGetRelid(dmstate->rel));
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index eaa3172793..0e3547c35b 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -558,7 +558,8 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL,
+ pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index e91920ca14..18b07c0200 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 13217807ee..281c47b2ee 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -572,7 +572,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 535072d181..b287a2e84c 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -797,6 +797,7 @@ execute_sql_string(const char *sql)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index ac2e74fa3f..22b8b820c3 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -408,7 +408,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index f7f18d3054..de7bf7ca67 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -79,7 +79,7 @@ ExecutorEnd_hook_type ExecutorEnd_hook = NULL;
ExecutorCheckPerms_hook_type ExecutorCheckPerms_hook = NULL;
/* decls for local routines only used within this module */
-static void InitPlan(QueryDesc *queryDesc, int eflags);
+static bool InitPlan(QueryDesc *queryDesc, int eflags);
static void CheckValidRowMarkRel(Relation rel, RowMarkType markType);
static void ExecPostprocessPlan(EState *estate);
static void ExecEndPlan(PlanState *planstate, EState *estate);
@@ -263,7 +263,7 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/*
* Initialize the plan state tree
*/
- InitPlan(queryDesc, eflags);
+ (void) InitPlan(queryDesc, eflags);
MemoryContextSwitchTo(oldcontext);
}
@@ -829,9 +829,13 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
*
* Initializes the query plan: open files, allocate storage
* and start up the rule manager
+ *
+ * Returns true if the plan tree is successfully initialized for execution,
+ * false otherwise. The latter case may occur if the CachedPlan that provides
+ * the plan tree (queryDesc->cplan) got invalidated during the initialization.
* ----------------------------------------------------------------
*/
-static void
+static bool
InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
@@ -839,11 +843,14 @@ InitPlan(QueryDesc *queryDesc, int eflags)
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
- PlanState *planstate;
- TupleDesc tupType;
+ PlanState *planstate = NULL;
+ TupleDesc tupType = NULL;
ListCell *l;
int i;
+ Assert(queryDesc->planstate == NULL);
+ Assert(queryDesc->tupDesc == NULL);
+
/*
* Do permissions checks
*/
@@ -855,6 +862,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
estate->es_plannedstmt = plannedstmt;
+ estate->es_cachedplan = queryDesc->cplan;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
@@ -886,6 +894,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
case ROW_MARK_KEYSHARE:
case ROW_MARK_REFERENCE:
relation = ExecGetRangeTableRelation(estate, rc->rti);
+ if (unlikely(relation == NULL))
+ return false;
break;
case ROW_MARK_COPY:
/* no physical table access is required */
@@ -956,6 +966,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_subplanstates = lappend(estate->es_subplanstates,
subplanstate);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return false;
i++;
}
@@ -966,6 +978,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
* processing tuples.
*/
planstate = ExecInitNode(plan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return false;
/*
* Get the tuple descriptor describing the type of tuples to return.
@@ -1009,6 +1023,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
queryDesc->tupDesc = tupType;
queryDesc->planstate = planstate;
+
+ return true;
}
/*
@@ -2858,7 +2874,8 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
* Child EPQ EStates share the parent's copy of unchanging state such as
* the snapshot, rangetable, and external Param info. They need their own
* copies of local state, including a tuple table, es_param_exec_vals,
- * result-rel info, etc.
+ * result-rel info, etc. Also, we don't pass the parent's copy of the
+ * CachedPlan, because no new locks will be taken for EvalPlanQual().
*/
rcestate->es_direction = ForwardScanDirection;
rcestate->es_snapshot = parentestate->es_snapshot;
@@ -2947,6 +2964,13 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
subplanstate = ExecInitNode(subplan, rcestate, 0);
rcestate->es_subplanstates = lappend(rcestate->es_subplanstates,
subplanstate);
+
+ /*
+ * All the necessary locks must already have been taken when
+ * initializing the parent's copy of subplanstate, so the CachedPlan,
+ * if any, should not have become invalid during ExecInitNode().
+ */
+ Assert(ExecPlanStillValid(rcestate));
}
/*
@@ -2988,6 +3012,9 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
*/
epqstate->recheckplanstate = ExecInitNode(planTree, rcestate, 0);
+ /* See the comment above. */
+ Assert(ExecPlanStillValid(rcestate));
+
MemoryContextSwitchTo(oldcontext);
}
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index cc2b8ccab7..457ee46faf 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1248,8 +1248,15 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(¶mspace);
- /* Create a QueryDesc for the query. */
+ /*
+ * Set up a QueryDesc for the query. While the leader might've sourced
+ * the plan tree from a CachedPlan, we don't have one here. That isn't
+ * an issue, because the leader already acquired the required locks,
+ * which makes our plan tree valid. Even though we take our own lock
+ * copies in ExecGetRangeTableRelation(), they're all already held by
+ * the leader.
+ */
return CreateQueryDesc(pstmt,
+ NULL,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 6098cdca69..f2264c7d84 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -136,6 +136,10 @@ static bool ExecShutdownNode_walker(PlanState *node, void *context);
* 'eflags' is a bitwise OR of flag bits described in executor.h
*
* Returns a PlanState node corresponding to the given Plan node.
+ *
+ * Callers should check upon returning that ExecPlanStillValid(estate)
+ * still returns true before continuing with their own processing, because
+ * the returned PlanState might otherwise be only partially valid.
* ------------------------------------------------------------------------
*/
PlanState *
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index f55424eb5a..7e452ed743 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -838,6 +838,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest = None_Receiver;
es->qd = CreateQueryDesc(es->stmt,
+ NULL, /* fmgr_sql() doesn't use CachedPlans */
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index af22b1676f..597d68139e 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -3304,6 +3304,8 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
eflags &= ~EXEC_FLAG_REWIND;
outerPlan = outerPlan(node);
outerPlanState(aggstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return aggstate;
/*
* initialize source tuple type.
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index a2af221e05..53ca9dc85d 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -185,8 +185,10 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->ps.resultopsset = true;
appendstate->ps.resultopsfixed = false;
- appendplanstates = (PlanState **) palloc(nplans *
- sizeof(PlanState *));
+ appendplanstates = (PlanState **) palloc0(nplans *
+ sizeof(PlanState *));
+ appendstate->appendplans = appendplanstates;
+ appendstate->as_nplans = nplans;
/*
* call ExecInitNode on each of the valid plans to be executed and save
@@ -221,11 +223,11 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
firstvalid = j;
appendplanstates[j++] = ExecInitNode(initNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return appendstate;
}
appendstate->as_first_partial_plan = firstvalid;
- appendstate->appendplans = appendplanstates;
- appendstate->as_nplans = nplans;
/* Initialize async state */
appendstate->as_asyncplans = asyncplans;
diff --git a/src/backend/executor/nodeBitmapAnd.c b/src/backend/executor/nodeBitmapAnd.c
index 4abb0609a0..7556be713c 100644
--- a/src/backend/executor/nodeBitmapAnd.c
+++ b/src/backend/executor/nodeBitmapAnd.c
@@ -89,6 +89,8 @@ ExecInitBitmapAnd(BitmapAnd *node, EState *estate, int eflags)
{
initNode = (Plan *) lfirst(l);
bitmapplanstates[i] = ExecInitNode(initNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return bitmapandstate;
i++;
}
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index d3f58c22f9..f1f8e16b17 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -770,11 +770,15 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return scanstate;
/*
* initialize child nodes
*/
outerPlanState(scanstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return scanstate;
/*
* get the scan type from the relation descriptor.
diff --git a/src/backend/executor/nodeBitmapOr.c b/src/backend/executor/nodeBitmapOr.c
index ace18593aa..7d2bf45d9c 100644
--- a/src/backend/executor/nodeBitmapOr.c
+++ b/src/backend/executor/nodeBitmapOr.c
@@ -90,6 +90,8 @@ ExecInitBitmapOr(BitmapOr *node, EState *estate, int eflags)
{
initNode = (Plan *) lfirst(l);
bitmapplanstates[i] = ExecInitNode(initNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return bitmaporstate;
i++;
}
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
index 28b5bb9353..a0befbd0c6 100644
--- a/src/backend/executor/nodeCustom.c
+++ b/src/backend/executor/nodeCustom.c
@@ -61,6 +61,8 @@ ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
if (scanrelid > 0)
{
scan_rel = ExecOpenScanRelation(estate, scanrelid, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return css;
css->ss.ss_currentRelation = scan_rel;
}
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 3aba28285a..336acff719 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -173,6 +173,8 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
if (scanrelid > 0)
{
currentRelation = ExecOpenScanRelation(estate, scanrelid, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return scanstate;
scanstate->ss.ss_currentRelation = currentRelation;
fdwroutine = GetFdwRoutineForRelation(currentRelation, true);
}
@@ -264,6 +266,8 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
if (outerPlan(node))
outerPlanState(scanstate) =
ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return scanstate;
/*
* Tell the FDW to initialize the scan.
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 1a3c8abdad..c524022c04 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -89,6 +89,8 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
*/
outerNode = outerPlan(node);
outerPlanState(gatherstate) = ExecInitNode(outerNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return gatherstate;
tupDesc = ExecGetResultType(outerPlanState(gatherstate));
/*
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index c6fb45fee0..676faabef5 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -108,6 +108,8 @@ ExecInitGatherMerge(GatherMerge *node, EState *estate, int eflags)
*/
outerNode = outerPlan(node);
outerPlanState(gm_state) = ExecInitNode(outerNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return gm_state;
/*
* Leader may access ExecProcNode result directly (if
diff --git a/src/backend/executor/nodeGroup.c b/src/backend/executor/nodeGroup.c
index 6dfe5a1d23..efa1c44ab4 100644
--- a/src/backend/executor/nodeGroup.c
+++ b/src/backend/executor/nodeGroup.c
@@ -185,6 +185,8 @@ ExecInitGroup(Group *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(grpstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return grpstate;
/*
* Initialize scan slot and type.
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 88ba336882..1a4bd5504e 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -386,6 +386,8 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(hashstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return hashstate;
/*
* initialize our result slot and type. No need to build projection
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 6dc43b9ff2..c0919074b0 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -752,8 +752,12 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
hashNode = (Hash *) innerPlan(node);
outerPlanState(hjstate) = ExecInitNode(outerNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return hjstate;
outerDesc = ExecGetResultType(outerPlanState(hjstate));
innerPlanState(hjstate) = ExecInitNode((Plan *) hashNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return hjstate;
innerDesc = ExecGetResultType(innerPlanState(hjstate));
/*
diff --git a/src/backend/executor/nodeIncrementalSort.c b/src/backend/executor/nodeIncrementalSort.c
index 28a0e81cb3..621ffafe02 100644
--- a/src/backend/executor/nodeIncrementalSort.c
+++ b/src/backend/executor/nodeIncrementalSort.c
@@ -1041,6 +1041,8 @@ ExecInitIncrementalSort(IncrementalSort *node, EState *estate, int eflags)
* nodes may be able to do something more useful.
*/
outerPlanState(incrsortstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return incrsortstate;
/*
* Initialize scan slot and type.
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 1f3843abe9..c555c14888 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -495,6 +495,8 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return indexstate;
indexstate->ss.ss_currentRelation = currentRelation;
indexstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 32e1714f15..a3bd1f7fb0 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -908,6 +908,8 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return indexstate;
indexstate->ss.ss_currentRelation = currentRelation;
indexstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index a97bac9f6d..ab133f1580 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -476,6 +476,8 @@ ExecInitLimit(Limit *node, EState *estate, int eflags)
*/
outerPlan = outerPlan(node);
outerPlanState(limitstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return limitstate;
/*
* initialize child expressions
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 26fbe95c57..e1ef768571 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -322,6 +322,8 @@ ExecInitLockRows(LockRows *node, EState *estate, int eflags)
* then initialize outer plan
*/
outerPlanState(lrstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return lrstate;
/* node returns unmodified slots from the outer plan */
lrstate->ps.resultopsset = true;
diff --git a/src/backend/executor/nodeMaterial.c b/src/backend/executor/nodeMaterial.c
index 03c514900b..c38eef099d 100644
--- a/src/backend/executor/nodeMaterial.c
+++ b/src/backend/executor/nodeMaterial.c
@@ -214,6 +214,8 @@ ExecInitMaterial(Material *node, EState *estate, int eflags)
outerPlan = outerPlan(node);
outerPlanState(matstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return matstate;
/*
* Initialize result type and slot. No need to initialize projection info
diff --git a/src/backend/executor/nodeMemoize.c b/src/backend/executor/nodeMemoize.c
index ee4749c852..a6bf66029c 100644
--- a/src/backend/executor/nodeMemoize.c
+++ b/src/backend/executor/nodeMemoize.c
@@ -938,6 +938,8 @@ ExecInitMemoize(Memoize *node, EState *estate, int eflags)
outerNode = outerPlan(node);
outerPlanState(mstate) = ExecInitNode(outerNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return mstate;
/*
* Initialize return slot and type. No need to initialize projection info
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 0a42a04b19..52c3edf278 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -120,7 +120,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ms_prune_state = NULL;
}
- mergeplanstates = (PlanState **) palloc(nplans * sizeof(PlanState *));
+ mergeplanstates = (PlanState **) palloc0(nplans * sizeof(PlanState *));
mergestate->mergeplans = mergeplanstates;
mergestate->ms_nplans = nplans;
@@ -151,6 +151,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
Plan *initNode = (Plan *) list_nth(node->mergeplans, i);
mergeplanstates[j++] = ExecInitNode(initNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return mergestate;
}
mergestate->ps.ps_ProjInfo = NULL;
diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 4d7d73a684..634c6d9fe5 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -1490,11 +1490,15 @@ ExecInitMergeJoin(MergeJoin *node, EState *estate, int eflags)
mergestate->mj_SkipMarkRestore = node->skip_mark_restore;
outerPlanState(mergestate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return mergestate;
outerDesc = ExecGetResultType(outerPlanState(mergestate));
innerPlanState(mergestate) = ExecInitNode(innerPlan(node), estate,
mergestate->mj_SkipMarkRestore ?
eflags :
(eflags | EXEC_FLAG_MARK));
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return mergestate;
innerDesc = ExecGetResultType(innerPlanState(mergestate));
/*
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index ea043c57c1..95d909c1d0 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -3985,6 +3985,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
linitial_int(node->resultRelations));
}
+ /*
+ * ExecInitResultRelation() may have returned without initializing
+ * rootResultRelInfo if the plan got invalidated, so check.
+ */
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return mtstate;
+
/* set up epqstate with dummy subplan data for the moment */
EvalPlanQualInit(&mtstate->mt_epqstate, estate, NULL, NIL,
node->epqParam, node->resultRelations);
@@ -4013,6 +4020,10 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
{
ExecInitResultRelation(estate, resultRelInfo, resultRelation);
+ /* See the comment above. */
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return mtstate;
+
/*
* For child result relations, store the root result relation
* pointer. We do so for the convenience of places that want to
@@ -4039,6 +4050,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* Now we may initialize the subplan.
*/
outerPlanState(mtstate) = ExecInitNode(subplan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return mtstate;
/*
* Do additional per-result-relation initialization.
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index 76e02449f5..64a24cb965 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -295,11 +295,15 @@ ExecInitNestLoop(NestLoop *node, EState *estate, int eflags)
* values.
*/
outerPlanState(nlstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return nlstate;
if (node->nestParams == NIL)
eflags |= EXEC_FLAG_REWIND;
else
eflags &= ~EXEC_FLAG_REWIND;
innerPlanState(nlstate) = ExecInitNode(innerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return nlstate;
/*
* Initialize result slot, type and projection.
diff --git a/src/backend/executor/nodeProjectSet.c b/src/backend/executor/nodeProjectSet.c
index e9b96416d3..706cc23a21 100644
--- a/src/backend/executor/nodeProjectSet.c
+++ b/src/backend/executor/nodeProjectSet.c
@@ -247,6 +247,8 @@ ExecInitProjectSet(ProjectSet *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(state) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return state;
/*
* we don't use inner plan
diff --git a/src/backend/executor/nodeRecursiveunion.c b/src/backend/executor/nodeRecursiveunion.c
index f6d60bcd6c..27dc318acb 100644
--- a/src/backend/executor/nodeRecursiveunion.c
+++ b/src/backend/executor/nodeRecursiveunion.c
@@ -244,7 +244,11 @@ ExecInitRecursiveUnion(RecursiveUnion *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(rustate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return rustate;
innerPlanState(rustate) = ExecInitNode(innerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return rustate;
/*
* If hashing, precompute fmgr lookup data for inner loop, and create the
diff --git a/src/backend/executor/nodeResult.c b/src/backend/executor/nodeResult.c
index f15902e840..6820d3bfd5 100644
--- a/src/backend/executor/nodeResult.c
+++ b/src/backend/executor/nodeResult.c
@@ -208,6 +208,8 @@ ExecInitResult(Result *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(resstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return resstate;
/*
* we don't use inner plan
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index a6813559e6..02051fea51 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -125,6 +125,8 @@ ExecInitSampleScan(SampleScan *node, EState *estate, int eflags)
ExecOpenScanRelation(estate,
node->scan.scanrelid,
eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return scanstate;
/* we won't set up the HeapScanDesc till later */
scanstate->ss.ss_currentScanDesc = NULL;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 911266da07..9e3ef94388 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -153,6 +153,8 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
ExecOpenScanRelation(estate,
node->scan.scanrelid,
eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return scanstate;
/* and create slot with the appropriate rowtype */
ExecInitScanTupleSlot(estate, &scanstate->ss,
diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index 5c2861d243..475af4df24 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -528,6 +528,8 @@ ExecInitSetOp(SetOp *node, EState *estate, int eflags)
if (node->strategy == SETOP_HASHED)
eflags &= ~EXEC_FLAG_REWIND;
outerPlanState(setopstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return setopstate;
outerDesc = ExecGetResultType(outerPlanState(setopstate));
/*
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index c8a35b64a8..9de717aa7c 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -263,6 +263,8 @@ ExecInitSort(Sort *node, EState *estate, int eflags)
eflags &= ~(EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK);
outerPlanState(sortstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return sortstate;
/*
* Initialize scan slot and type.
diff --git a/src/backend/executor/nodeSubqueryscan.c b/src/backend/executor/nodeSubqueryscan.c
index 91d7ae82ce..d9c10d1f6f 100644
--- a/src/backend/executor/nodeSubqueryscan.c
+++ b/src/backend/executor/nodeSubqueryscan.c
@@ -124,6 +124,8 @@ ExecInitSubqueryScan(SubqueryScan *node, EState *estate, int eflags)
* initialize subquery
*/
subquerystate->subplan = ExecInitNode(node->subplan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return subquerystate;
/*
* Initialize scan slot and type (needed by ExecAssignScanProjectionInfo)
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 9147e4afa8..a7482aee50 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -378,6 +378,8 @@ ExecInitTidRangeScan(TidRangeScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return tidrangestate;
tidrangestate->ss.ss_currentRelation = currentRelation;
tidrangestate->ss.ss_currentScanDesc = NULL; /* no table scan here */
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index 74ec6afdcc..657411ef19 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -523,6 +523,8 @@ ExecInitTidScan(TidScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return tidstate;
tidstate->ss.ss_currentRelation = currentRelation;
tidstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
diff --git a/src/backend/executor/nodeUnique.c b/src/backend/executor/nodeUnique.c
index 13c556326a..ee30688417 100644
--- a/src/backend/executor/nodeUnique.c
+++ b/src/backend/executor/nodeUnique.c
@@ -136,6 +136,8 @@ ExecInitUnique(Unique *node, EState *estate, int eflags)
* then initialize outer plan
*/
outerPlanState(uniquestate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return uniquestate;
/*
* Initialize result slot and type. Unique nodes do no projections, so
diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index c4c6f009ba..1246d7919a 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -2461,6 +2461,8 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags)
*/
outerPlan = outerPlan(node);
outerPlanState(winstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return winstate;
/*
* initialize source tuple type (which is also the tuple type that we'll
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 33975687b3..f2cca807ef 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -2668,6 +2668,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
snap = InvalidSnapshot;
qdesc = CreateQueryDesc(stmt,
+ NULL,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5565f200c3..4ef349df8b 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -65,6 +65,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +78,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->cplan = cplan; /* CachedPlan, if plan is from one */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -145,7 +147,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, NULL, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -493,6 +495,7 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ NULL,
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index af2bf36dfb..4b7368a0dc 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -32,9 +32,12 @@
*/
typedef struct QueryDesc
{
+ NodeTag type;
+
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ struct CachedPlan *cplan; /* CachedPlan, if plannedstmt is from one */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +60,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ struct CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index aeebe0e0ff..72cbf120c5 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -19,6 +19,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/plancache.h"
/*
@@ -256,6 +257,15 @@ extern void ExecEndNode(PlanState *node);
extern void ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * Is the CachedPlan in es_cachedplan still valid?
+ */
+static inline bool
+ExecPlanStillValid(EState *estate)
+{
+ return estate->es_cachedplan == NULL ? true :
+ CachedPlanStillValid(estate->es_cachedplan);
+}
/* ----------------------------------------------------------------
* ExecProcNode
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index cb714f4a19..846eb32a1d 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -623,6 +623,8 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ struct CachedPlan *es_cachedplan; /* CachedPlan if plannedstmt is from
+ * one, or NULL if not */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 916e59d9fe..0a9e041d51 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -221,6 +221,20 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
QueryEnvironment *queryEnv);
+
+/*
+ * CachedPlanStillValid
+ * Returns whether a cached generic plan is still valid
+ *
+ * Invoked by the executor for each relation lock acquired during the
+ * initialization of the plan tree within the CachedPlan.
+ */
+static inline bool
+CachedPlanStillValid(CachedPlan *cplan)
+{
+ return cplan->is_valid;
+}
+
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
--
2.35.3
v47-0001-Remove-obsolete-executor-cleanup-code.patch
From e015ba798b4b92bc5142ca94baaf12a1cda089d9 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Wed, 6 Sep 2023 17:52:39 +0900
Subject: [PATCH v47 1/9] Remove obsolete executor cleanup code
This commit removes unnecessary ExecExprFreeContext() calls in ExecEnd*
routines as the actual cleanup is managed by FreeExecutorState. With
no remaining callers for ExecExprFreeContext(), this commit also
removes the function.
This commit also drops redundant ExecClearTuple() calls, as
ExecResetTupleTable() in ExecEndPlan() already takes care of resetting
all TupleTableSlots initialized with ExecInitScanTupleSlot() and
ExecInitExtraTupleSlot().
After these modifications, the ExecEnd*() routines for ValuesScan,
NamedTuplestoreScan, and WorkTableScan became redundant. Thus, this
commit removes them.
Reviewed-by: Robert Haas
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
src/backend/executor/execProcnode.c | 18 +++++--------
src/backend/executor/execUtils.c | 26 -------------------
src/backend/executor/nodeAgg.c | 10 -------
src/backend/executor/nodeBitmapHeapscan.c | 12 ---------
src/backend/executor/nodeBitmapIndexscan.c | 8 ------
src/backend/executor/nodeCtescan.c | 12 ---------
src/backend/executor/nodeCustom.c | 7 -----
src/backend/executor/nodeForeignscan.c | 8 ------
src/backend/executor/nodeFunctionscan.c | 15 -----------
src/backend/executor/nodeGather.c | 3 ---
src/backend/executor/nodeGatherMerge.c | 3 ---
src/backend/executor/nodeGroup.c | 5 ----
src/backend/executor/nodeHash.c | 5 ----
src/backend/executor/nodeHashjoin.c | 12 ---------
src/backend/executor/nodeIncrementalSort.c | 5 ----
src/backend/executor/nodeIndexonlyscan.c | 16 ------------
src/backend/executor/nodeIndexscan.c | 16 ------------
src/backend/executor/nodeLimit.c | 1 -
src/backend/executor/nodeMaterial.c | 5 ----
src/backend/executor/nodeMemoize.c | 9 -------
src/backend/executor/nodeMergejoin.c | 12 ---------
src/backend/executor/nodeModifyTable.c | 11 --------
.../executor/nodeNamedtuplestorescan.c | 22 ----------------
src/backend/executor/nodeNestloop.c | 11 --------
src/backend/executor/nodeProjectSet.c | 10 -------
src/backend/executor/nodeResult.c | 10 -------
src/backend/executor/nodeSamplescan.c | 12 ---------
src/backend/executor/nodeSeqscan.c | 12 ---------
src/backend/executor/nodeSetOp.c | 4 ---
src/backend/executor/nodeSort.c | 7 -----
src/backend/executor/nodeSubqueryscan.c | 12 ---------
src/backend/executor/nodeTableFuncscan.c | 12 ---------
src/backend/executor/nodeTidrangescan.c | 12 ---------
src/backend/executor/nodeTidscan.c | 12 ---------
src/backend/executor/nodeUnique.c | 5 ----
src/backend/executor/nodeValuesscan.c | 24 -----------------
src/backend/executor/nodeWindowAgg.c | 17 ------------
src/backend/executor/nodeWorktablescan.c | 22 ----------------
src/include/executor/executor.h | 1 -
.../executor/nodeNamedtuplestorescan.h | 1 -
src/include/executor/nodeValuesscan.h | 1 -
src/include/executor/nodeWorktablescan.h | 1 -
42 files changed, 6 insertions(+), 421 deletions(-)
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 4d288bc8d4..6098cdca69 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -667,22 +667,10 @@ ExecEndNode(PlanState *node)
ExecEndTableFuncScan((TableFuncScanState *) node);
break;
- case T_ValuesScanState:
- ExecEndValuesScan((ValuesScanState *) node);
- break;
-
case T_CteScanState:
ExecEndCteScan((CteScanState *) node);
break;
- case T_NamedTuplestoreScanState:
- ExecEndNamedTuplestoreScan((NamedTuplestoreScanState *) node);
- break;
-
- case T_WorkTableScanState:
- ExecEndWorkTableScan((WorkTableScanState *) node);
- break;
-
case T_ForeignScanState:
ExecEndForeignScan((ForeignScanState *) node);
break;
@@ -757,6 +745,12 @@ ExecEndNode(PlanState *node)
ExecEndLimit((LimitState *) node);
break;
+ /* No clean up actions for these nodes. */
+ case T_ValuesScanState:
+ case T_NamedTuplestoreScanState:
+ case T_WorkTableScanState:
+ break;
+
default:
elog(ERROR, "unrecognized node type: %d", (int) nodeTag(node));
break;
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index c06b228858..16704c0c2f 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -638,32 +638,6 @@ tlist_matches_tupdesc(PlanState *ps, List *tlist, int varno, TupleDesc tupdesc)
return true;
}
-/* ----------------
- * ExecFreeExprContext
- *
- * A plan node's ExprContext should be freed explicitly during executor
- * shutdown because there may be shutdown callbacks to call. (Other resources
- * made by the above routines, such as projection info, don't need to be freed
- * explicitly because they're just memory in the per-query memory context.)
- *
- * However ... there is no particular need to do it during ExecEndNode,
- * because FreeExecutorState will free any remaining ExprContexts within
- * the EState. Letting FreeExecutorState do it allows the ExprContexts to
- * be freed in reverse order of creation, rather than order of creation as
- * will happen if we delete them here, which saves O(N^2) work in the list
- * cleanup inside FreeExprContext.
- * ----------------
- */
-void
-ExecFreeExprContext(PlanState *planstate)
-{
- /*
- * Per above discussion, don't actually delete the ExprContext. We do
- * unlink it from the plan node, though.
- */
- planstate->ps_ExprContext = NULL;
-}
-
/* ----------------------------------------------------------------
* Scan node support
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 468db94fe5..f154f28902 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -4357,16 +4357,6 @@ ExecEndAgg(AggState *node)
if (node->hashcontext)
ReScanExprContext(node->hashcontext);
- /*
- * We don't actually free any ExprContexts here (see comment in
- * ExecFreeExprContext), just unlinking the output one from the plan node
- * suffices.
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /* clean up tuple table */
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
outerPlan = outerPlanState(node);
ExecEndNode(outerPlan);
}
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index f35df0b8bf..2db0acfc76 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -655,18 +655,6 @@ ExecEndBitmapHeapScan(BitmapHeapScanState *node)
*/
scanDesc = node->ss.ss_currentScanDesc;
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clear out tuple table slots
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
/*
* close down subplans
*/
diff --git a/src/backend/executor/nodeBitmapIndexscan.c b/src/backend/executor/nodeBitmapIndexscan.c
index 83ec9ede89..7cf8532bc9 100644
--- a/src/backend/executor/nodeBitmapIndexscan.c
+++ b/src/backend/executor/nodeBitmapIndexscan.c
@@ -184,14 +184,6 @@ ExecEndBitmapIndexScan(BitmapIndexScanState *node)
indexRelationDesc = node->biss_RelationDesc;
indexScanDesc = node->biss_ScanDesc;
- /*
- * Free the exprcontext ... now dead code, see ExecFreeExprContext
- */
-#ifdef NOT_USED
- if (node->biss_RuntimeContext)
- FreeExprContext(node->biss_RuntimeContext, true);
-#endif
-
/*
* close the index relation (no-op if we didn't open it)
*/
diff --git a/src/backend/executor/nodeCtescan.c b/src/backend/executor/nodeCtescan.c
index cc4c4243e2..a0c0c4be33 100644
--- a/src/backend/executor/nodeCtescan.c
+++ b/src/backend/executor/nodeCtescan.c
@@ -287,18 +287,6 @@ ExecInitCteScan(CteScan *node, EState *estate, int eflags)
void
ExecEndCteScan(CteScanState *node)
{
- /*
- * Free exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clean out the tuple table
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
/*
* If I am the leader, free the tuplestore.
*/
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
index bd42c65b29..28b5bb9353 100644
--- a/src/backend/executor/nodeCustom.c
+++ b/src/backend/executor/nodeCustom.c
@@ -129,13 +129,6 @@ ExecEndCustomScan(CustomScanState *node)
{
Assert(node->methods->EndCustomScan != NULL);
node->methods->EndCustomScan(node);
-
- /* Free the exprcontext */
- ExecFreeExprContext(&node->ss.ps);
-
- /* Clean out the tuple table */
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
void
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index c2139acca0..73913ebb18 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -312,14 +312,6 @@ ExecEndForeignScan(ForeignScanState *node)
/* Shut down any outer plan. */
if (outerPlanState(node))
ExecEndNode(outerPlanState(node));
-
- /* Free the exprcontext */
- ExecFreeExprContext(&node->ss.ps);
-
- /* clean out the tuple table */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeFunctionscan.c b/src/backend/executor/nodeFunctionscan.c
index dd06ef8aee..2dddbcda14 100644
--- a/src/backend/executor/nodeFunctionscan.c
+++ b/src/backend/executor/nodeFunctionscan.c
@@ -523,18 +523,6 @@ ExecEndFunctionScan(FunctionScanState *node)
{
int i;
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clean out the tuple table
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
/*
* Release slots and tuplestore resources
*/
@@ -542,9 +530,6 @@ ExecEndFunctionScan(FunctionScanState *node)
{
FunctionScanPerFuncState *fs = &node->funcstates[i];
- if (fs->func_slot)
- ExecClearTuple(fs->func_slot);
-
if (fs->tstore != NULL)
{
tuplestore_end(node->funcstates[i].tstore);
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 307fc10eea..bb2500a469 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -250,9 +250,6 @@ ExecEndGather(GatherState *node)
{
ExecEndNode(outerPlanState(node)); /* let children clean up first */
ExecShutdownGather(node);
- ExecFreeExprContext(&node->ps);
- if (node->ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ps.ps_ResultTupleSlot);
}
/*
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 9d5e1a46e9..7a71a58509 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -290,9 +290,6 @@ ExecEndGatherMerge(GatherMergeState *node)
{
ExecEndNode(outerPlanState(node)); /* let children clean up first */
ExecShutdownGatherMerge(node);
- ExecFreeExprContext(&node->ps);
- if (node->ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ps.ps_ResultTupleSlot);
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeGroup.c b/src/backend/executor/nodeGroup.c
index 25a1618952..8c650f0e46 100644
--- a/src/backend/executor/nodeGroup.c
+++ b/src/backend/executor/nodeGroup.c
@@ -228,11 +228,6 @@ ExecEndGroup(GroupState *node)
{
PlanState *outerPlan;
- ExecFreeExprContext(&node->ss.ps);
-
- /* clean up tuple table */
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
outerPlan = outerPlanState(node);
ExecEndNode(outerPlan);
}
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 8b5c35b82b..e72f0986c2 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -415,11 +415,6 @@ ExecEndHash(HashState *node)
{
PlanState *outerPlan;
- /*
- * free exprcontext
- */
- ExecFreeExprContext(&node->ps);
-
/*
* shut down the subplan
*/
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 980746128b..aea44a9d56 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -867,18 +867,6 @@ ExecEndHashJoin(HashJoinState *node)
node->hj_HashTable = NULL;
}
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->js.ps);
-
- /*
- * clean out the tuple table
- */
- ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->hj_OuterTupleSlot);
- ExecClearTuple(node->hj_HashTupleSlot);
-
/*
* clean up subtrees
*/
diff --git a/src/backend/executor/nodeIncrementalSort.c b/src/backend/executor/nodeIncrementalSort.c
index 7683e3341c..cd094a190c 100644
--- a/src/backend/executor/nodeIncrementalSort.c
+++ b/src/backend/executor/nodeIncrementalSort.c
@@ -1079,11 +1079,6 @@ ExecEndIncrementalSort(IncrementalSortState *node)
{
SO_printf("ExecEndIncrementalSort: shutting down sort node\n");
- /* clean out the scan tuple */
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
- /* must drop pointer to sort result tuple */
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- /* must drop standalone tuple slots from outer node */
ExecDropSingleTupleTableSlot(node->group_pivot);
ExecDropSingleTupleTableSlot(node->transfer_tuple);
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 0b43a9b969..f1db35665c 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -380,22 +380,6 @@ ExecEndIndexOnlyScan(IndexOnlyScanState *node)
node->ioss_VMBuffer = InvalidBuffer;
}
- /*
- * Free the exprcontext(s) ... now dead code, see ExecFreeExprContext
- */
-#ifdef NOT_USED
- ExecFreeExprContext(&node->ss.ps);
- if (node->ioss_RuntimeContext)
- FreeExprContext(node->ioss_RuntimeContext, true);
-#endif
-
- /*
- * clear out tuple table slots
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
/*
* close the index relation (no-op if we didn't open it)
*/
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 4540c7781d..14b9c00217 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -794,22 +794,6 @@ ExecEndIndexScan(IndexScanState *node)
indexRelationDesc = node->iss_RelationDesc;
indexScanDesc = node->iss_ScanDesc;
- /*
- * Free the exprcontext(s) ... now dead code, see ExecFreeExprContext
- */
-#ifdef NOT_USED
- ExecFreeExprContext(&node->ss.ps);
- if (node->iss_RuntimeContext)
- FreeExprContext(node->iss_RuntimeContext, true);
-#endif
-
- /*
- * clear out tuple table slots
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
/*
* close the index relation (no-op if we didn't open it)
*/
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index 425fbfc405..5654158e3e 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -534,7 +534,6 @@ ExecInitLimit(Limit *node, EState *estate, int eflags)
void
ExecEndLimit(LimitState *node)
{
- ExecFreeExprContext(&node->ps);
ExecEndNode(outerPlanState(node));
}
diff --git a/src/backend/executor/nodeMaterial.c b/src/backend/executor/nodeMaterial.c
index 09632678b0..753ea28915 100644
--- a/src/backend/executor/nodeMaterial.c
+++ b/src/backend/executor/nodeMaterial.c
@@ -239,11 +239,6 @@ ExecInitMaterial(Material *node, EState *estate, int eflags)
void
ExecEndMaterial(MaterialState *node)
{
- /*
- * clean out the tuple table
- */
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
/*
* Release tuplestore resources
*/
diff --git a/src/backend/executor/nodeMemoize.c b/src/backend/executor/nodeMemoize.c
index 4f04269e26..94bf479287 100644
--- a/src/backend/executor/nodeMemoize.c
+++ b/src/backend/executor/nodeMemoize.c
@@ -1091,15 +1091,6 @@ ExecEndMemoize(MemoizeState *node)
/* Remove the cache context */
MemoryContextDelete(node->tableContext);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
- /* must drop pointer to cache result tuple */
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
-
- /*
- * free exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
/*
* shut down the subplan
*/
diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 00f96d045e..648fdd9a5f 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -1642,18 +1642,6 @@ ExecEndMergeJoin(MergeJoinState *node)
{
MJ1_printf("ExecEndMergeJoin: %s\n",
"ending node processing");
-
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->js.ps);
-
- /*
- * clean out the tuple table
- */
- ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->mj_MarkedTupleSlot);
-
/*
* shut down the subplans
*/
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 5005d8c0d1..d21a178ad5 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -4446,17 +4446,6 @@ ExecEndModifyTable(ModifyTableState *node)
ExecDropSingleTupleTableSlot(node->mt_root_tuple_slot);
}
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->ps);
-
- /*
- * clean out the tuple table
- */
- if (node->ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ps.ps_ResultTupleSlot);
-
/*
* Terminate EPQ execution if active
*/
diff --git a/src/backend/executor/nodeNamedtuplestorescan.c b/src/backend/executor/nodeNamedtuplestorescan.c
index 46832ad82f..3547dc2b10 100644
--- a/src/backend/executor/nodeNamedtuplestorescan.c
+++ b/src/backend/executor/nodeNamedtuplestorescan.c
@@ -155,28 +155,6 @@ ExecInitNamedTuplestoreScan(NamedTuplestoreScan *node, EState *estate, int eflag
return scanstate;
}
-/* ----------------------------------------------------------------
- * ExecEndNamedTuplestoreScan
- *
- * frees any storage allocated through C routines.
- * ----------------------------------------------------------------
- */
-void
-ExecEndNamedTuplestoreScan(NamedTuplestoreScanState *node)
-{
- /*
- * Free exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clean out the tuple table
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-}
-
/* ----------------------------------------------------------------
* ExecReScanNamedTuplestoreScan
*
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index b3d52e69ec..fc8f833d8b 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -363,17 +363,6 @@ ExecEndNestLoop(NestLoopState *node)
{
NL1_printf("ExecEndNestLoop: %s\n",
"ending node processing");
-
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->js.ps);
-
- /*
- * clean out the tuple table
- */
- ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
-
/*
* close down subplans
*/
diff --git a/src/backend/executor/nodeProjectSet.c b/src/backend/executor/nodeProjectSet.c
index f6ff3dc44c..b4bbdc89b1 100644
--- a/src/backend/executor/nodeProjectSet.c
+++ b/src/backend/executor/nodeProjectSet.c
@@ -320,16 +320,6 @@ ExecInitProjectSet(ProjectSet *node, EState *estate, int eflags)
void
ExecEndProjectSet(ProjectSetState *node)
{
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->ps);
-
- /*
- * clean out the tuple table
- */
- ExecClearTuple(node->ps.ps_ResultTupleSlot);
-
/*
* shut down subplans
*/
diff --git a/src/backend/executor/nodeResult.c b/src/backend/executor/nodeResult.c
index 4219712d30..e9f5732f33 100644
--- a/src/backend/executor/nodeResult.c
+++ b/src/backend/executor/nodeResult.c
@@ -240,16 +240,6 @@ ExecInitResult(Result *node, EState *estate, int eflags)
void
ExecEndResult(ResultState *node)
{
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->ps);
-
- /*
- * clean out the tuple table
- */
- ExecClearTuple(node->ps.ps_ResultTupleSlot);
-
/*
* shut down subplans
*/
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index d7e22b1dbb..41c1ea37ad 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -188,18 +188,6 @@ ExecEndSampleScan(SampleScanState *node)
if (node->tsmroutine->EndSampleScan)
node->tsmroutine->EndSampleScan(node);
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clean out the tuple table
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
/*
* close heap scan
*/
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 4da0f28f7b..49a5933aff 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -190,18 +190,6 @@ ExecEndSeqScan(SeqScanState *node)
*/
scanDesc = node->ss.ss_currentScanDesc;
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clean out the tuple table
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
/*
* close heap scan
*/
diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index 4bc2406b89..98c1b84d43 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -582,13 +582,9 @@ ExecInitSetOp(SetOp *node, EState *estate, int eflags)
void
ExecEndSetOp(SetOpState *node)
{
- /* clean up tuple table */
- ExecClearTuple(node->ps.ps_ResultTupleSlot);
-
/* free subsidiary stuff including hashtable */
if (node->tableContext)
MemoryContextDelete(node->tableContext);
- ExecFreeExprContext(&node->ps);
ExecEndNode(outerPlanState(node));
}
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index c6c72c6e67..eea7f2ae15 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -303,13 +303,6 @@ ExecEndSort(SortState *node)
SO1_printf("ExecEndSort: %s\n",
"shutting down sort node");
- /*
- * clean out the tuple table
- */
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
- /* must drop pointer to sort result tuple */
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
-
/*
* Release tuplesort resources
*/
diff --git a/src/backend/executor/nodeSubqueryscan.c b/src/backend/executor/nodeSubqueryscan.c
index 42471bfc04..1ee6295660 100644
--- a/src/backend/executor/nodeSubqueryscan.c
+++ b/src/backend/executor/nodeSubqueryscan.c
@@ -167,18 +167,6 @@ ExecInitSubqueryScan(SubqueryScan *node, EState *estate, int eflags)
void
ExecEndSubqueryScan(SubqueryScanState *node)
{
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clean out the upper tuple table
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
/*
* close down subquery
*/
diff --git a/src/backend/executor/nodeTableFuncscan.c b/src/backend/executor/nodeTableFuncscan.c
index 791cbd2372..a60dcd4943 100644
--- a/src/backend/executor/nodeTableFuncscan.c
+++ b/src/backend/executor/nodeTableFuncscan.c
@@ -213,18 +213,6 @@ ExecInitTableFuncScan(TableFuncScan *node, EState *estate, int eflags)
void
ExecEndTableFuncScan(TableFuncScanState *node)
{
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clean out the tuple table
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
/*
* Release tuplestore resources
*/
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 2124c55ef5..da622d3f5f 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -331,18 +331,6 @@ ExecEndTidRangeScan(TidRangeScanState *node)
if (scan != NULL)
table_endscan(scan);
-
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clear out tuple table slots
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index 862bd0330b..15055077d0 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -472,18 +472,6 @@ ExecEndTidScan(TidScanState *node)
{
if (node->ss.ss_currentScanDesc)
table_endscan(node->ss.ss_currentScanDesc);
-
- /*
- * Free the exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clear out tuple table slots
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeUnique.c b/src/backend/executor/nodeUnique.c
index 45035d74fa..01f951197c 100644
--- a/src/backend/executor/nodeUnique.c
+++ b/src/backend/executor/nodeUnique.c
@@ -168,11 +168,6 @@ ExecInitUnique(Unique *node, EState *estate, int eflags)
void
ExecEndUnique(UniqueState *node)
{
- /* clean up tuple table */
- ExecClearTuple(node->ps.ps_ResultTupleSlot);
-
- ExecFreeExprContext(&node->ps);
-
ExecEndNode(outerPlanState(node));
}
diff --git a/src/backend/executor/nodeValuesscan.c b/src/backend/executor/nodeValuesscan.c
index 32ace63017..fbfb067f3b 100644
--- a/src/backend/executor/nodeValuesscan.c
+++ b/src/backend/executor/nodeValuesscan.c
@@ -319,30 +319,6 @@ ExecInitValuesScan(ValuesScan *node, EState *estate, int eflags)
return scanstate;
}
-/* ----------------------------------------------------------------
- * ExecEndValuesScan
- *
- * frees any storage allocated through C routines.
- * ----------------------------------------------------------------
- */
-void
-ExecEndValuesScan(ValuesScanState *node)
-{
- /*
- * Free both exprcontexts
- */
- ExecFreeExprContext(&node->ss.ps);
- node->ss.ps.ps_ExprContext = node->rowcontext;
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clean out the tuple table
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-}
-
/* ----------------------------------------------------------------
* ExecReScanValuesScan
*
diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 310ac23e3a..77724a6daa 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -2686,23 +2686,6 @@ ExecEndWindowAgg(WindowAggState *node)
release_partition(node);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
- ExecClearTuple(node->first_part_slot);
- ExecClearTuple(node->agg_row_slot);
- ExecClearTuple(node->temp_slot_1);
- ExecClearTuple(node->temp_slot_2);
- if (node->framehead_slot)
- ExecClearTuple(node->framehead_slot);
- if (node->frametail_slot)
- ExecClearTuple(node->frametail_slot);
-
- /*
- * Free both the expr contexts.
- */
- ExecFreeExprContext(&node->ss.ps);
- node->ss.ps.ps_ExprContext = node->tmpcontext;
- ExecFreeExprContext(&node->ss.ps);
-
for (i = 0; i < node->numaggs; i++)
{
if (node->peragg[i].aggcontext != node->aggcontext)
diff --git a/src/backend/executor/nodeWorktablescan.c b/src/backend/executor/nodeWorktablescan.c
index 0c13448236..17a548865e 100644
--- a/src/backend/executor/nodeWorktablescan.c
+++ b/src/backend/executor/nodeWorktablescan.c
@@ -181,28 +181,6 @@ ExecInitWorkTableScan(WorkTableScan *node, EState *estate, int eflags)
return scanstate;
}
-/* ----------------------------------------------------------------
- * ExecEndWorkTableScan
- *
- * frees any storage allocated through C routines.
- * ----------------------------------------------------------------
- */
-void
-ExecEndWorkTableScan(WorkTableScanState *node)
-{
- /*
- * Free exprcontext
- */
- ExecFreeExprContext(&node->ss.ps);
-
- /*
- * clean out the tuple table
- */
- if (node->ss.ps.ps_ResultTupleSlot)
- ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
-}
-
/* ----------------------------------------------------------------
* ExecReScanWorkTableScan
*
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index c677e490d7..aeebe0e0ff 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -569,7 +569,6 @@ extern void ExecAssignProjectionInfo(PlanState *planstate,
TupleDesc inputDesc);
extern void ExecConditionalAssignProjectionInfo(PlanState *planstate,
TupleDesc inputDesc, int varno);
-extern void ExecFreeExprContext(PlanState *planstate);
extern void ExecAssignScanType(ScanState *scanstate, TupleDesc tupDesc);
extern void ExecCreateScanSlotFromOuterPlan(EState *estate,
ScanState *scanstate,
diff --git a/src/include/executor/nodeNamedtuplestorescan.h b/src/include/executor/nodeNamedtuplestorescan.h
index 3ff687023a..9d80236fe5 100644
--- a/src/include/executor/nodeNamedtuplestorescan.h
+++ b/src/include/executor/nodeNamedtuplestorescan.h
@@ -17,7 +17,6 @@
#include "nodes/execnodes.h"
extern NamedTuplestoreScanState *ExecInitNamedTuplestoreScan(NamedTuplestoreScan *node, EState *estate, int eflags);
-extern void ExecEndNamedTuplestoreScan(NamedTuplestoreScanState *node);
extern void ExecReScanNamedTuplestoreScan(NamedTuplestoreScanState *node);
#endif /* NODENAMEDTUPLESTORESCAN_H */
diff --git a/src/include/executor/nodeValuesscan.h b/src/include/executor/nodeValuesscan.h
index a52fa678df..fe3f043951 100644
--- a/src/include/executor/nodeValuesscan.h
+++ b/src/include/executor/nodeValuesscan.h
@@ -17,7 +17,6 @@
#include "nodes/execnodes.h"
extern ValuesScanState *ExecInitValuesScan(ValuesScan *node, EState *estate, int eflags);
-extern void ExecEndValuesScan(ValuesScanState *node);
extern void ExecReScanValuesScan(ValuesScanState *node);
#endif /* NODEVALUESSCAN_H */
diff --git a/src/include/executor/nodeWorktablescan.h b/src/include/executor/nodeWorktablescan.h
index e553a453f3..f31b22cec4 100644
--- a/src/include/executor/nodeWorktablescan.h
+++ b/src/include/executor/nodeWorktablescan.h
@@ -17,7 +17,6 @@
#include "nodes/execnodes.h"
extern WorkTableScanState *ExecInitWorkTableScan(WorkTableScan *node, EState *estate, int eflags);
-extern void ExecEndWorkTableScan(WorkTableScanState *node);
extern void ExecReScanWorkTableScan(WorkTableScanState *node);
#endif /* NODEWORKTABLESCAN_H */
--
2.35.3
On Tue, Sep 26, 2023 at 10:06 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Mon, Sep 25, 2023 at 9:57 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Wed, Sep 6, 2023 at 11:20 PM Robert Haas <robertmhaas@gmail.com> wrote:

- Is there any point to all of these early exit cases? For example, in
ExecInitBitmapAnd, why exit early if initialization fails? Why not
just plunge ahead, and if initialization failed the caller will notice
that, and when we ExecEndNode some of the child node pointers will be
NULL, but who cares? The obvious disadvantage of this approach is that
we're doing a bunch of unnecessary initialization, but we're also
speeding up the common case where we don't need to abort by avoiding a
branch that will rarely be taken. I'm not quite sure what the right
thing to do is here.

I thought about this some and figured that adding the
is-CachedPlan-still-valid tests in the following places should suffice
after all:

1. In InitPlan(), right after the top-level ExecInitNode() calls
2. In the ExecInit*() functions of Scan nodes, right after the
ExecOpenScanRelation() calls

After sleeping on this, I think we do need the checks after all the
ExecInitNode() calls too, because we have many instances of code like
the following:

    outerPlanState(gatherstate) = ExecInitNode(outerNode, estate, eflags);
    tupDesc = ExecGetResultType(outerPlanState(gatherstate));
    <some code that dereferences tupDesc>

If outerNode is a SeqScan and ExecInitSeqScan() returned early because
ExecOpenScanRelation() detected that the plan was invalidated, then
tupDesc would be NULL in this case, causing the code to crash.

Now one might say that perhaps we should only add the
is-CachedPlan-valid test in the instances where there is an actual
risk of such misbehavior, but that could lead to confusion, now or
later. It seems better to add them after every ExecInitNode() call
while we're inventing the notion, because doing so relieves the
authors of future enhancements of the ExecInit*() routines from
worrying about any of this.

Attached 0003 should show how that turned out.
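
To make the shape of those checks concrete, here is a minimal sketch
(mine, not lifted verbatim from the attached patches) of the pattern an
ExecInit*() routine ends up following; ExecPlanStillValid() is the
helper introduced by this patch series, and the surrounding lines mimic
the Gather example above:

    /*
     * Sketch only: guard the result of ExecInitNode() before using it.
     * If the CachedPlan backing this plan tree was invalidated while
     * the child was being initialized, return the partially set-up
     * node state; the caller of ExecutorStart() is expected to notice
     * the invalidation and retry with a freshly created plan tree.
     */
    outerPlanState(gatherstate) = ExecInitNode(outerNode, estate, eflags);
    if (unlikely(!ExecPlanStillValid(estate)))
        return gatherstate;

    /* Only now is it safe to ask for the child's result type. */
    tupDesc = ExecGetResultType(outerPlanState(gatherstate));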
Updated 0002 as mentioned in the previous reply -- setting pointers to
NULL after freeing them more consistently across the various ExecEnd*()
routines and using the `if (pointer != NULL)` style over the `if
(pointer)` style more consistently.

Updated 0001's commit message to remove the mention of its relation to
any future commits. I intend to push it tomorrow.

Pushed that one. Here are the rebased patches.
0001 seems ready to me, but I'll wait a couple more days for others to
weigh in. Just to highlight a kind of change that others may have
differing opinions on, consider this hunk from the patch:
- MemoryContextDelete(node->aggcontext);
+ if (node->aggcontext != NULL)
+ {
+ MemoryContextDelete(node->aggcontext);
+ node->aggcontext = NULL;
+ }
...
+ ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
So the patch wants to make the set-the-pointer-to-NULL-after-freeing
part more consistent. Robert had mentioned his preference for doing it
that way, which I agree with.
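
As an aside, here is a minimal sketch (my illustration, not code from
the patch) of why that convention matters once plan initialization can
bail out partway: cleanup may then run against a partially initialized
node, and freed-and-NULLed pointers keep that safe:

    /*
     * Sketch: ExecEnd*()-style cleanup that tolerates a partially
     * initialized node.  If initialization aborted early, aggcontext
     * may never have been created, and the outer child may be NULL
     * (ExecEndNode() is a no-op for a NULL node).
     */
    if (node->aggcontext != NULL)
    {
        MemoryContextDelete(node->aggcontext);
        node->aggcontext = NULL;    /* avoid double-free on repeat cleanup */
    }
    ExecEndNode(outerPlanState(node));
    outerPlanState(node) = NULL;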
Thanks,

--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v48-0005-Assert-that-relations-needing-their-permissions-.patch
From ff1dfbb0df6d86acb5d4d6dabee623d74df17ab7 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Mon, 25 Sep 2023 11:52:02 +0900
Subject: [PATCH v48 5/8] Assert that relations needing their permissions
checked are locked
---
src/backend/executor/execMain.c | 11 +++++++
src/backend/storage/lmgr/lmgr.c | 45 +++++++++++++++++++++++++++++
src/backend/utils/cache/lsyscache.c | 21 ++++++++++++++
src/include/storage/lmgr.h | 1 +
src/include/utils/lsyscache.h | 1 +
5 files changed, 79 insertions(+)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 5755336abd..ffc62e379a 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -626,6 +626,17 @@ ExecCheckPermissions(List *rangeTable, List *rteperminfos,
(rte->rtekind == RTE_SUBQUERY &&
rte->relkind == RELKIND_VIEW));
+ /*
+ * Relations whose permissions need to be checked must already
+ * have been locked by the parser or by GetCachedPlan() if a
+ * cached plan is being executed.
+ *
+ * XXX Maybe we should skip calling ExecCheckPermissions from
+ * InitPlan in a parallel worker.
+ */
+ Assert(IsParallelWorker() ||
+ CheckRelLockedByMe(rte->relid, AccessShareLock, true));
+
(void) getRTEPermissionInfo(rteperminfos, rte);
/* Many-to-one mapping not allowed */
Assert(!bms_is_member(rte->perminfoindex, indexset));
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index ee9b89a672..c807e9cdcc 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -27,6 +27,7 @@
#include "storage/procarray.h"
#include "storage/sinvaladt.h"
#include "utils/inval.h"
+#include "utils/lsyscache.h"
/*
@@ -364,6 +365,50 @@ CheckRelationLockedByMe(Relation relation, LOCKMODE lockmode, bool orstronger)
return false;
}
+/*
+ * CheckRelLockedByMe
+ *
+ * Returns true if current transaction holds a lock on the given relation of
+ * mode 'lockmode'. If 'orstronger' is true, a stronger lockmode is also OK.
+ * ("Stronger" is defined as "numerically higher", which is a bit
+ * semantically dubious but is OK for the purposes we use this for.)
+ */
+bool
+CheckRelLockedByMe(Oid relid, LOCKMODE lockmode, bool orstronger)
+{
+ Oid dbId = get_rel_relisshared(relid) ? InvalidOid : MyDatabaseId;
+ LOCKTAG tag;
+
+ SET_LOCKTAG_RELATION(tag, dbId, relid);
+
+ if (LockHeldByMe(&tag, lockmode))
+ return true;
+
+ if (orstronger)
+ {
+ LOCKMODE slockmode;
+
+ for (slockmode = lockmode + 1;
+ slockmode <= MaxLockMode;
+ slockmode++)
+ {
+ if (LockHeldByMe(&tag, slockmode))
+ {
+#ifdef NOT_USED
+ /* Sometimes this might be useful for debugging purposes */
+ elog(WARNING, "lock mode %s substituted for %s on relation %s",
+ GetLockmodeName(tag.locktag_lockmethodid, slockmode),
+ GetLockmodeName(tag.locktag_lockmethodid, lockmode),
+ get_rel_name(relid));
+#endif
+ return true;
+ }
+ }
+ }
+
+ return false;
+}
+
/*
* LockHasWaitersRelation
*
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index fc6d267e44..2725d02312 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -2095,6 +2095,27 @@ get_rel_persistence(Oid relid)
return result;
}
+/*
+ * get_rel_relisshared
+ *
+ * Returns whether the given relation is shared
+ */
+bool
+get_rel_relisshared(Oid relid)
+{
+ HeapTuple tp;
+ Form_pg_class reltup;
+ bool result;
+
+ tp = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+ if (!HeapTupleIsValid(tp))
+ elog(ERROR, "cache lookup failed for relation %u", relid);
+ reltup = (Form_pg_class) GETSTRUCT(tp);
+ result = reltup->relisshared;
+ ReleaseSysCache(tp);
+
+ return result;
+}
/* ---------- TRANSFORM CACHE ---------- */
diff --git a/src/include/storage/lmgr.h b/src/include/storage/lmgr.h
index 4ee91e3cf9..598bf2688a 100644
--- a/src/include/storage/lmgr.h
+++ b/src/include/storage/lmgr.h
@@ -48,6 +48,7 @@ extern bool ConditionalLockRelation(Relation relation, LOCKMODE lockmode);
extern void UnlockRelation(Relation relation, LOCKMODE lockmode);
extern bool CheckRelationLockedByMe(Relation relation, LOCKMODE lockmode,
bool orstronger);
+extern bool CheckRelLockedByMe(Oid relid, LOCKMODE lockmode, bool orstronger);
extern bool LockHasWaitersRelation(Relation relation, LOCKMODE lockmode);
extern void LockRelationIdForSession(LockRelId *relid, LOCKMODE lockmode);
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index f5fdbfe116..a024e5dcd0 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -140,6 +140,7 @@ extern char get_rel_relkind(Oid relid);
extern bool get_rel_relispartition(Oid relid);
extern Oid get_rel_tablespace(Oid relid);
extern char get_rel_persistence(Oid relid);
+extern bool get_rel_relisshared(Oid relid);
extern Oid get_transform_fromsql(Oid typid, Oid langid, List *trftypes);
extern Oid get_transform_tosql(Oid typid, Oid langid, List *trftypes);
extern bool get_typisdefined(Oid typid);
--
2.35.3
v48-0004-Teach-the-executor-to-lock-child-tables-in-some-.patchapplication/octet-stream; name=v48-0004-Teach-the-executor-to-lock-child-tables-in-some-.patchDownload
From 49787b3c0ed2a14ca5a68f33b47e892cf90b913f Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Fri, 22 Sep 2023 18:17:15 +0900
Subject: [PATCH v48 4/8] Teach the executor to lock child tables in some cases
An upcoming commit will move the locking of child tables referenced
in a cached plan tree from GetCachedPlan() to the executor
initialization of the plan tree in ExecutorStart(). This commit
teaches ExecGetRangeTableRelation() to lock child tables if
EState.es_cachedplan points to a CachedPlan.
The executor must now deal with the cases where an unlocked child
table might have been concurrently dropped, so this modifies
ExecGetRangeTableRelation() to use try_table_open(). All of its
callers (and those of ExecOpenScanRelation() that calls it) must
now account for the child table disappearing, which means to abort
initializing the table's Scan node in the middle.
ExecGetRangeTableRelation() now examines inFromCl field of an RTE
to determine that a given range table relation is a child table, so
this commit also makes the planner set inFromCl to false in the
child tables' RTEs that it manufactures.
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
src/backend/executor/README | 36 +++++++++++++++++++++++-
src/backend/executor/execPartition.c | 2 ++
src/backend/executor/execUtils.c | 41 +++++++++++++++++++++-------
src/backend/optimizer/util/inherit.c | 7 +++++
src/backend/parser/analyze.c | 7 ++---
src/include/nodes/parsenodes.h | 8 ++++--
6 files changed, 84 insertions(+), 17 deletions(-)
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 17775a49e2..6d2240610d 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -280,6 +280,34 @@ are typically reset to empty once per tuple. Per-tuple contexts are usually
associated with ExprContexts, and commonly each PlanState node has its own
ExprContext to evaluate its qual and targetlist expressions in.
+Relation Locking
+----------------
+
+Typically, when the executor initializes a plan tree for execution, it doesn't
+lock non-index relations if the plan tree is freshly generated and not derived
+from a CachedPlan. This is because such locks have already been established
+during the query's parsing, rewriting, and planning phases. However, with a
+cached plan tree, there can be relations that remain unlocked. The function
+GetCachedPlan() locks relations existing in the query's range table pre-planning
+but doesn't account for those added during the planning phase. Consequently,
+inheritance child tables, introduced to the query's range table during planning,
+won't be locked when the cached plan reaches the executor.
+
+The decision to defer locking child tables with GetCachedPlan() arises from the
+fact that not all might be accessed during plan execution. For instance, if
+child tables are partitions, some might be omitted due to pruning at
+execution-initialization-time. Thus, the responsibility of locking these child
+tables is pushed to execution-initialization-time, taking place in ExecInitNode()
+for plan nodes encompassing these tables.
+
+This approach opens a window where a cached plan tree with child tables could
+become outdated if another backend modifies these tables before ExecInitNode()
+locks them. Given this, the executor has the added duty to confirm the plan
+tree's validity whenever it locks a child table post execution-initialization-
+pruning. This validation is done by checking the CachedPlan.is_valid attribute
+of the CachedPlan provided. If the plan tree is outdated (is_valid=false), the
+executor halts any further initialization and alerts the caller that they should
+retry execution with another freshly created plan tree.
Query Processing Control Flow
-----------------------------
@@ -316,7 +344,13 @@ This is a sketch of control flow for full query processing:
FreeQueryDesc
-Per above comments, it's not really critical for ExecEndNode to free any
+As mentioned in the "Relation Locking" section, if the plan tree is found to
+be stale during one of the recursive calls of ExecInitNode() after taking a
+lock on a child table, control is immediately returned to the caller of
+ExecutorStart(), which must redo the steps from CreateQueryDesc with a new
+plan tree.
+
+Per above comments, it's not really critical for ExecEndPlan to free any
memory; it'll all go away in FreeExecutorState anyway. However, we do need to
be careful to close relations, drop buffer pins, etc, so we do need to scan
the plan state tree to find these sorts of resources.
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index eb8a87fd63..84978c5525 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1927,6 +1927,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* duration of this executor run.
*/
partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+ if (unlikely(partrel == NULL))
+ return NULL;
partkey = RelationGetPartitionKey(partrel);
partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
partrel);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index f0f5740c26..117773706a 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -697,6 +697,8 @@ ExecRelationIsTargetRelation(EState *estate, Index scanrelid)
*
* Open the heap relation to be scanned by a base-level scan plan node.
* This should be called during the node's ExecInit routine.
+ *
+ * NULL is returned if the relation is found to have been dropped.
* ----------------------------------------------------------------
*/
Relation
@@ -706,6 +708,8 @@ ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags)
/* Open the relation. */
rel = ExecGetRangeTableRelation(estate, scanrelid);
+ if (unlikely(rel == NULL))
+ return NULL;
/*
* Complain if we're attempting a scan of an unscannable relation, except
@@ -763,6 +767,9 @@ ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos)
* Open the Relation for a range table entry, if not already done
*
* The Relations will be closed again in ExecEndPlan().
+ *
+ * Returned value may be NULL if the relation is a child relation that is not
+ * already locked.
*/
Relation
ExecGetRangeTableRelation(EState *estate, Index rti)
@@ -779,7 +786,28 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rte->rtekind == RTE_RELATION);
- if (!IsParallelWorker())
+ if (IsParallelWorker() ||
+ (estate->es_cachedplan != NULL && !rte->inFromCl))
+ {
+ /*
+ * Take a lock if we are a parallel worker or if this is a child
+ * table referenced in a cached plan.
+ *
+ * Parallel workers need to have their own local lock on the
+ * relation. This ensures sane behavior in case the parent process
+ * exits before we do.
+ *
+ * When executing a cached plan, child tables must be locked
+ * here, because plancache.c (GetCachedPlan()) would only have
+ * locked tables mentioned in the query, that is, tables whose
+ * RTEs' inFromCl is true.
+ *
+ * Note that we use try_table_open() here, because without a lock
+ * held on the relation, it may have disappeared from under us.
+ */
+ rel = try_table_open(rte->relid, rte->rellockmode);
+ }
+ else
{
/*
* In a normal query, we should already have the appropriate lock,
@@ -792,15 +820,6 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rte->rellockmode == AccessShareLock ||
CheckRelationLockedByMe(rel, rte->rellockmode, false));
}
- else
- {
- /*
- * If we are a parallel worker, we need to obtain our own local
- * lock on the relation. This ensures sane behavior in case the
- * parent process exits before we do.
- */
- rel = table_open(rte->relid, rte->rellockmode);
- }
estate->es_relations[rti - 1] = rel;
}
@@ -823,6 +842,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
Relation resultRelationDesc;
resultRelationDesc = ExecGetRangeTableRelation(estate, rti);
+ if (unlikely(resultRelationDesc == NULL))
+ return;
InitResultRelInfo(resultRelInfo,
resultRelationDesc,
rti,
diff --git a/src/backend/optimizer/util/inherit.c b/src/backend/optimizer/util/inherit.c
index 94de855a22..1b30c0ff87 100644
--- a/src/backend/optimizer/util/inherit.c
+++ b/src/backend/optimizer/util/inherit.c
@@ -492,6 +492,13 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
}
else
childrte->inh = false;
+
+ /*
+ * Flag child tables as indirectly referenced in the query. This helps
+ * the executor's ExecGetRangeTableRelation() recognize them as
+ * inheritance children.
+ */
+ childrte->inFromCl = false;
childrte->securityQuals = NIL;
/*
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 7a1dfb6364..cf269f8c53 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -3305,10 +3305,9 @@ transformLockingClause(ParseState *pstate, Query *qry, LockingClause *lc,
/*
* Lock all regular tables used in query and its subqueries. We
* examine inFromCl to exclude auto-added RTEs, particularly NEW/OLD
- * in rules. This is a bit of an abuse of a mostly-obsolete flag, but
- * it's convenient. We can't rely on the namespace mechanism that has
- * largely replaced inFromCl, since for example we need to lock
- * base-relation RTEs even if they are masked by upper joins.
+ * in rules. We can't rely on the namespace mechanism since for
+ * example we need to lock base-relation RTEs even if they are masked
+ * by upper joins.
*/
i = 0;
foreach(rt, qry->rtable)
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index f637937cd2..acf87580a1 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -994,11 +994,15 @@ typedef struct PartitionCmd
*
* inFromCl marks those range variables that are listed in the FROM clause.
* It's false for RTEs that are added to a query behind the scenes, such
- * as the NEW and OLD variables for a rule, or the subqueries of a UNION.
+ * as the NEW and OLD variables for a rule, or the subqueries of a UNION,
+ * or the RTEs of inheritance child tables that are added by the planner.
* This flag is not used during parsing (except in transformLockingClause,
* q.v.); the parser now uses a separate "namespace" data structure to
* control visibility. But it is needed by ruleutils.c to determine
- * whether RTEs should be shown in decompiled queries.
+ * whether RTEs should be shown in decompiled queries. The executor uses
+ * this to ascertain whether an RTE_RELATION entry is for a table explicitly
+ * named in the query or for a child table added by the planner. This
+ * distinction is vital when child tables in a plan must be locked.
*
* securityQuals is a list of security barrier quals (boolean expressions),
* to be tested in the listed order before returning a row from the
--
2.35.3
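
To summarize the locking change above: with a cached plan,
ExecGetRangeTableRelation() now decides between the pre-locked and
lock-here paths roughly as follows (a condensed sketch of the patched
logic, not code copied verbatim from the patch):

    if (IsParallelWorker() ||
        (estate->es_cachedplan != NULL && !rte->inFromCl))
    {
        /*
         * Child tables in a cached plan, and all tables in a parallel
         * worker, are locked here; the relation may have been dropped
         * concurrently, so this can return NULL.
         */
        rel = try_table_open(rte->relid, rte->rellockmode);
    }
    else
    {
        /* The appropriate lock was already taken upstream. */
        rel = table_open(rte->relid, NoLock);
    }

Callers such as ExecOpenScanRelation() and ExecInitResultRelation()
then bail out on a NULL result, as the hunks above show.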
v48-0003-Adjustments-to-allow-ExecutorStart-to-sometimes-.patch (application/octet-stream)
From 36eb6e04907d4ab44ad424e11d54f9589309d0d2 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Wed, 6 Sep 2023 17:53:46 +0900
Subject: [PATCH v48 3/8] Adjustments to allow ExecutorStart() to sometimes
fail
Upon passing a plan tree from a CachedPlan to the executor, there's a
possibility that ExecutorStart() might return an incompletely set up
planstate tree. This can happen if the CachedPlan undergoes invalidation
during the ExecInitNode() initialization process. In such cases, the
execution should be reattempted using a fresh CachedPlan. Also, any
partially initialized EState must be cleaned up by invoking both
ExecutorEnd() and FreeExecutorState().
ExecutorStart() (and ExecutorStart_hook()) now return a boolean telling
the caller whether plan initialization failed.
To allow the replan loop in that case, it makes more sense to call
ExecutorStart() in the same scope as, or close to, where
GetCachedPlan() is invoked. So this commit modifies the following
call sites:
* The ExecutorStart() call in ExplainOnePlan() is moved into a new
function ExplainQueryDesc() along with CreateQueryDesc(). Callers
of ExplainOnePlan() should now call the new function first.
* The ExecutorStart() call in _SPI_pquery() is moved to its caller
_SPI_execute_plan().
* The ExecutorStart() call in PortalRunMulti() is moved to
PortalStart(). This requires a new List field in PortalData to
store the QueryDescs created in PortalStart() and a new memory
context for those. One unintended consequence is that
CommandCounterIncrement() between queries in the PORTAL_MULTI_QUERY
case is now done in the loop in PortalStart() and not in
PortalRunMulti(). That still works because the Snapshot registered
in QueryDesc/EState is updated to account for the CCI().
This commit also adds a new flag to EState called es_canceled that
complements es_finished to denote the new scenario where
ExecutorStart() returns with a partially setup planstate tree. Also,
to reset the AFTER trigger state that would have been set up in
ExecutorStart(), this adds a new function AfterTriggerCancelQuery()
which is called from ExecutorEnd() (not ExecutorFinish()) when
es_canceled is true.
Note that this commit by itself doesn't make any functional change,
because the CachedPlan is not passed into the executor yet.
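
To make the resulting protocol concrete, the call sites listed above
all converge on a loop of roughly this shape (a condensed sketch
modeled on the _SPI_execute_plan() hunk below; argument lists are
abbreviated and error handling is elided):

replan:
    cplan = GetCachedPlan(plansource, params, plan_owner, queryEnv);

    /* ... pick the PlannedStmt to run from cplan->stmt_list ... */

    qdesc = CreateQueryDesc(stmt, cplan, query_string, snapshot,
                            InvalidSnapshot, dest, params, queryEnv, 0);
    if (!ExecutorStart(qdesc, eflags))
    {
        /* CachedPlan got invalidated during ExecInitNode(); try again. */
        ExecutorEnd(qdesc);
        FreeQueryDesc(qdesc);
        ReleaseCachedPlan(cplan, plan_owner);
        goto replan;
    }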
---
contrib/auto_explain/auto_explain.c | 12 +-
.../pg_stat_statements/pg_stat_statements.c | 12 +-
src/backend/commands/copyto.c | 5 +-
src/backend/commands/createas.c | 9 +-
src/backend/commands/explain.c | 145 +++++---
src/backend/commands/extension.c | 6 +-
src/backend/commands/matview.c | 9 +-
src/backend/commands/portalcmds.c | 6 +-
src/backend/commands/prepare.c | 31 +-
src/backend/commands/trigger.c | 13 +
src/backend/executor/execMain.c | 44 ++-
src/backend/executor/execParallel.c | 6 +-
src/backend/executor/execUtils.c | 1 +
src/backend/executor/functions.c | 7 +-
src/backend/executor/spi.c | 48 ++-
src/backend/tcop/postgres.c | 18 +-
src/backend/tcop/pquery.c | 346 +++++++++---------
src/backend/utils/mmgr/portalmem.c | 9 +
src/include/commands/explain.h | 7 +-
src/include/commands/trigger.h | 1 +
src/include/executor/executor.h | 6 +-
src/include/nodes/execnodes.h | 3 +
src/include/tcop/pquery.h | 2 +-
src/include/utils/portal.h | 2 +
24 files changed, 466 insertions(+), 282 deletions(-)
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index c3ac27ae99..a0630d7944 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -78,7 +78,7 @@ static ExecutorRun_hook_type prev_ExecutorRun = NULL;
static ExecutorFinish_hook_type prev_ExecutorFinish = NULL;
static ExecutorEnd_hook_type prev_ExecutorEnd = NULL;
-static void explain_ExecutorStart(QueryDesc *queryDesc, int eflags);
+static bool explain_ExecutorStart(QueryDesc *queryDesc, int eflags);
static void explain_ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction,
uint64 count, bool execute_once);
@@ -258,9 +258,11 @@ _PG_init(void)
/*
* ExecutorStart hook: start up logging if needed
*/
-static void
+static bool
explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ bool plan_valid;
+
/*
* At the beginning of each top-level statement, decide whether we'll
* sample this statement. If nested-statement explaining is enabled,
@@ -296,9 +298,9 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
}
if (prev_ExecutorStart)
- prev_ExecutorStart(queryDesc, eflags);
+ plan_valid = prev_ExecutorStart(queryDesc, eflags);
else
- standard_ExecutorStart(queryDesc, eflags);
+ plan_valid = standard_ExecutorStart(queryDesc, eflags);
if (auto_explain_enabled())
{
@@ -316,6 +318,8 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcxt);
}
}
+
+ return plan_valid;
}
/*
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index a46f2db352..58cb62e872 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -330,7 +330,7 @@ static PlannedStmt *pgss_planner(Query *parse,
const char *query_string,
int cursorOptions,
ParamListInfo boundParams);
-static void pgss_ExecutorStart(QueryDesc *queryDesc, int eflags);
+static bool pgss_ExecutorStart(QueryDesc *queryDesc, int eflags);
static void pgss_ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction,
uint64 count, bool execute_once);
@@ -967,13 +967,15 @@ pgss_planner(Query *parse,
/*
* ExecutorStart hook: start up tracking if needed
*/
-static void
+static bool
pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ bool plan_valid;
+
if (prev_ExecutorStart)
- prev_ExecutorStart(queryDesc, eflags);
+ plan_valid = prev_ExecutorStart(queryDesc, eflags);
else
- standard_ExecutorStart(queryDesc, eflags);
+ plan_valid = standard_ExecutorStart(queryDesc, eflags);
/*
* If query has queryId zero, don't track it. This prevents double
@@ -996,6 +998,8 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcxt);
}
}
+
+ return plan_valid;
}
/*
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 0e3547c35b..f7730c8702 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -568,8 +568,11 @@ BeginCopyTo(ParseState *pstate,
* Call ExecutorStart to prepare the plan for execution.
*
* ExecutorStart computes a result tupdesc for us
+ *
+ * OK to ignore the return value; plan can't become invalid,
+ * because there's no CachedPlan.
*/
- ExecutorStart(cstate->queryDesc, 0);
+ (void) ExecutorStart(cstate->queryDesc, 0);
tupDesc = cstate->queryDesc->tupDesc;
}
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 18b07c0200..4a950c03ff 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -329,8 +329,13 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, GetIntoRelEFlags(into));
+ /*
+ * call ExecutorStart to prepare the plan for execution
+ *
+ * OK to ignore the return value; plan can't become invalid,
+ * because there's no CachedPlan.
+ */
+ (void) ExecutorStart(queryDesc, GetIntoRelEFlags(into));
/* run the plan to completion */
ExecutorRun(queryDesc, ForwardScanDirection, 0, true);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 281c47b2ee..8d1fe5738b 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -393,6 +393,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
else
{
PlannedStmt *plan;
+ QueryDesc *queryDesc;
instr_time planstart,
planduration;
BufferUsage bufusage_start,
@@ -415,12 +416,90 @@ ExplainOneQuery(Query *query, int cursorOptions,
BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
}
+ queryDesc = ExplainQueryDesc(plan, NULL, queryString, into, es,
+ params, queryEnv);
+ Assert(queryDesc);
+
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(queryDesc, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
+/*
+ * ExplainQueryDesc
+ * Set up QueryDesc for EXPLAINing a given plan
+ *
+ * This returns NULL if cplan is found to have been invalidated after
+ * calling ExecutorStart().
+ */
+QueryDesc *
+ExplainQueryDesc(PlannedStmt *stmt, CachedPlan *cplan,
+ const char *queryString, IntoClause *into, ExplainState *es,
+ ParamListInfo params, QueryEnvironment *queryEnv)
+{
+ QueryDesc *queryDesc;
+ DestReceiver *dest;
+ int eflags;
+ int instrument_option = 0;
+
+ /*
+ * Normally we discard the query's output, but if explaining CREATE TABLE
+ * AS, we'd better use the appropriate tuple receiver.
+ */
+ if (into)
+ dest = CreateIntoRelDestReceiver(into);
+ else
+ dest = None_Receiver;
+
+ if (es->analyze && es->timing)
+ instrument_option |= INSTRUMENT_TIMER;
+ else if (es->analyze)
+ instrument_option |= INSTRUMENT_ROWS;
+
+ if (es->buffers)
+ instrument_option |= INSTRUMENT_BUFFERS;
+ if (es->wal)
+ instrument_option |= INSTRUMENT_WAL;
+
+ /*
+ * Use a snapshot with an updated command ID to ensure this query sees
+ * results of any previously executed queries.
+ */
+ PushCopiedSnapshot(GetActiveSnapshot());
+ UpdateActiveSnapshotCommandId();
+
+ /* Create a QueryDesc for the query */
+ queryDesc = CreateQueryDesc(stmt, cplan, queryString,
+ GetActiveSnapshot(), InvalidSnapshot,
+ dest, params, queryEnv, instrument_option);
+
+ /* Select execution options */
+ if (es->analyze)
+ eflags = 0; /* default run-to-completion flags */
+ else
+ eflags = EXEC_FLAG_EXPLAIN_ONLY;
+ if (es->generic)
+ eflags |= EXEC_FLAG_EXPLAIN_GENERIC;
+ if (into)
+ eflags |= GetIntoRelEFlags(into);
+
+ /*
+ * Call ExecutorStart to prepare the plan for execution. A cached plan
+	 * may get invalidated during plan initialization.
+ */
+ if (!ExecutorStart(queryDesc, eflags))
+ {
+ /* Clean up. */
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+ PopActiveSnapshot();
+ return NULL;
+ }
+
+ return queryDesc;
+}
+
/*
* ExplainOneUtility -
* print out the execution plan for one utility statement
@@ -524,29 +603,16 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(QueryDesc *queryDesc,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
{
- DestReceiver *dest;
- QueryDesc *queryDesc;
instr_time starttime;
double totaltime = 0;
- int eflags;
- int instrument_option = 0;
-
- Assert(plannedstmt->commandType != CMD_UTILITY);
- if (es->analyze && es->timing)
- instrument_option |= INSTRUMENT_TIMER;
- else if (es->analyze)
- instrument_option |= INSTRUMENT_ROWS;
-
- if (es->buffers)
- instrument_option |= INSTRUMENT_BUFFERS;
- if (es->wal)
- instrument_option |= INSTRUMENT_WAL;
+ Assert(queryDesc->plannedstmt->commandType != CMD_UTILITY);
/*
* We always collect timing for the entire statement, even when node-level
@@ -555,40 +621,6 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
*/
INSTR_TIME_SET_CURRENT(starttime);
- /*
- * Use a snapshot with an updated command ID to ensure this query sees
- * results of any previously executed queries.
- */
- PushCopiedSnapshot(GetActiveSnapshot());
- UpdateActiveSnapshotCommandId();
-
- /*
- * Normally we discard the query's output, but if explaining CREATE TABLE
- * AS, we'd better use the appropriate tuple receiver.
- */
- if (into)
- dest = CreateIntoRelDestReceiver(into);
- else
- dest = None_Receiver;
-
- /* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, NULL, queryString,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, instrument_option);
-
- /* Select execution options */
- if (es->analyze)
- eflags = 0; /* default run-to-completion flags */
- else
- eflags = EXEC_FLAG_EXPLAIN_ONLY;
- if (es->generic)
- eflags |= EXEC_FLAG_EXPLAIN_GENERIC;
- if (into)
- eflags |= GetIntoRelEFlags(into);
-
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, eflags);
-
/* Execute the plan for statistics if asked for */
if (es->analyze)
{
@@ -4873,6 +4905,17 @@ ExplainDummyGroup(const char *objtype, const char *labelname, ExplainState *es)
}
}
+/*
+ * Discard output buffer for a fresh restart.
+ */
+void
+ExplainResetOutput(ExplainState *es)
+{
+ Assert(es->str);
+ resetStringInfo(es->str);
+ ExplainBeginOutput(es);
+}
+
/*
* Emit the start-of-output boilerplate.
*
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index b287a2e84c..127d2a3b0a 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -802,7 +802,11 @@ execute_sql_string(const char *sql)
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
- ExecutorStart(qdesc, 0);
+ /*
+ * OK to ignore the return value; plan can't become invalid,
+ * because there's no CachedPlan.
+ */
+ (void) ExecutorStart(qdesc, 0);
ExecutorRun(qdesc, ForwardScanDirection, 0, true);
ExecutorFinish(qdesc);
ExecutorEnd(qdesc);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 22b8b820c3..7083fb2350 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -412,8 +412,13 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, 0);
+ /*
+ * call ExecutorStart to prepare the plan for execution
+ *
+ * OK to ignore the return value; plan can't become invalid,
+ * because there's no CachedPlan.
+ */
+ (void) ExecutorStart(queryDesc, 0);
/* run the plan */
ExecutorRun(queryDesc, ForwardScanDirection, 0, true);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 73ed7aa2f0..a1ee5c0acd 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -142,9 +142,11 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
/*
* Start execution, inserting parameters if any.
+ *
+ * OK to ignore the return value; plan can't become invalid here,
+ * because there's no CachedPlan.
*/
- PortalStart(portal, params, 0, GetActiveSnapshot());
-
+ (void) PortalStart(portal, params, 0, GetActiveSnapshot());
Assert(portal->strategy == PORTAL_ONE_SELECT);
/*
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 18f70319fc..f8d0b0ee25 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -183,6 +183,7 @@ ExecuteQuery(ParseState *pstate,
paramLI = EvaluateParams(pstate, entry, stmt->params, estate);
}
+replan:
/* Create a new portal to run the query in */
portal = CreateNewPortal();
/* Don't display the portal in pg_cursors, it is for internal use only */
@@ -251,9 +252,15 @@ ExecuteQuery(ParseState *pstate,
}
/*
- * Run the portal as appropriate.
+ * Run the portal as appropriate. If the portal has a cached plan and
+ * it's found to be invalidated during the initialization of its plan
+ * trees, the plan must be regenerated.
*/
- PortalStart(portal, paramLI, eflags, GetActiveSnapshot());
+ if (!PortalStart(portal, paramLI, eflags, GetActiveSnapshot()))
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
(void) PortalRun(portal, count, false, true, dest, dest, qc);
@@ -574,7 +581,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
{
PreparedStatement *entry;
const char *query_string;
- CachedPlan *cplan;
+ CachedPlan *cplan = NULL;
List *plan_list;
ListCell *p;
ParamListInfo paramLI = NULL;
@@ -618,6 +625,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
+replan:
cplan = GetCachedPlan(entry->plansource, paramLI,
CurrentResourceOwner, queryEnv);
@@ -639,8 +647,21 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ {
+ QueryDesc *queryDesc;
+
+ queryDesc = ExplainQueryDesc(pstmt, cplan, queryString,
+ into, es, paramLI, queryEnv);
+ if (queryDesc == NULL)
+ {
+ ExplainResetOutput(es);
+ ReleaseCachedPlan(cplan, CurrentResourceOwner);
+ goto replan;
+ }
+ ExplainOnePlan(queryDesc, into, es, query_string, paramLI,
+ queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
+ }
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 52177759ab..dd139432b9 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -5009,6 +5009,19 @@ AfterTriggerBeginQuery(void)
afterTriggers.query_depth++;
}
+/* ----------
+ * AfterTriggerCancelQuery()
+ *
+ * Called from ExecutorEnd() if the query execution was canceled.
+ * ----------
+ */
+void
+AfterTriggerCancelQuery(void)
+{
+ /* Set to a value denoting that no query is active. */
+ afterTriggers.query_depth = -1;
+}
+
/* ----------
* AfterTriggerEndQuery()
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index de7bf7ca67..5755336abd 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -119,6 +119,13 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
*
* eflags contains flag bits as described in executor.h.
*
+ * Plan initialization may fail if the input plan tree is found to have been
+ * invalidated, which can happen if it comes from a CachedPlan.
+ *
+ * Returns true if the plan was successfully initialized, false otherwise. In
+ * the latter case, the caller must call ExecutorEnd() on 'queryDesc' to clean
+ * up after the failed plan initialization.
+ *
* NB: the CurrentMemoryContext when this is called will become the parent
* of the per-query context used for this Executor invocation.
*
@@ -128,7 +135,7 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
*
* ----------------------------------------------------------------
*/
-void
+bool
ExecutorStart(QueryDesc *queryDesc, int eflags)
{
/*
@@ -140,14 +147,15 @@ ExecutorStart(QueryDesc *queryDesc, int eflags)
pgstat_report_query_id(queryDesc->plannedstmt->queryId, false);
if (ExecutorStart_hook)
- (*ExecutorStart_hook) (queryDesc, eflags);
- else
- standard_ExecutorStart(queryDesc, eflags);
+ return (*ExecutorStart_hook) (queryDesc, eflags);
+
+ return standard_ExecutorStart(queryDesc, eflags);
}
-void
+bool
standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ bool plan_valid;
EState *estate;
MemoryContext oldcontext;
@@ -263,9 +271,14 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/*
* Initialize the plan state tree
*/
- (void) InitPlan(queryDesc, eflags);
+ plan_valid = InitPlan(queryDesc, eflags);
+
+ /* Mark execution as canceled if plan won't be executed. */
+ estate->es_canceled = !plan_valid;
MemoryContextSwitchTo(oldcontext);
+
+ return plan_valid;
}
/* ----------------------------------------------------------------
@@ -325,6 +338,7 @@ standard_ExecutorRun(QueryDesc *queryDesc,
estate = queryDesc->estate;
Assert(estate != NULL);
+ Assert(!estate->es_canceled);
Assert(!(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
/*
@@ -429,7 +443,7 @@ standard_ExecutorFinish(QueryDesc *queryDesc)
Assert(!(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
/* This should be run once and only once per Executor instance */
- Assert(!estate->es_finished);
+ Assert(!estate->es_finished && !estate->es_canceled);
/* Switch into per-query memory context */
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
@@ -488,11 +502,11 @@ standard_ExecutorEnd(QueryDesc *queryDesc)
Assert(estate != NULL);
/*
- * Check that ExecutorFinish was called, unless in EXPLAIN-only mode. This
- * Assert is needed because ExecutorFinish is new as of 9.1, and callers
- * might forget to call it.
+ * Check that ExecutorFinish was called, unless in EXPLAIN-only mode or if
+ * execution was canceled. This Assert is needed because ExecutorFinish is
+ * new as of 9.1, and callers might forget to call it.
*/
- Assert(estate->es_finished ||
+ Assert(estate->es_finished || estate->es_canceled ||
(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
/*
@@ -506,6 +520,14 @@ standard_ExecutorEnd(QueryDesc *queryDesc)
UnregisterSnapshot(estate->es_snapshot);
UnregisterSnapshot(estate->es_crosscheck_snapshot);
+ /*
+ * Cancel trigger execution too if the query execution was canceled.
+ */
+ if (estate->es_canceled &&
+ !(estate->es_top_eflags &
+ (EXEC_FLAG_SKIP_TRIGGERS | EXEC_FLAG_EXPLAIN_ONLY)))
+ AfterTriggerCancelQuery();
+
/*
* Must switch out of context before destroying it
*/
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 457ee46faf..13d2820a41 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1437,7 +1437,11 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Start up the executor */
queryDesc->plannedstmt->jitFlags = fpes->jit_flags;
- ExecutorStart(queryDesc, fpes->eflags);
+ /*
+ * OK to ignore the return value; plan can't become invalid,
+ * because there's no CachedPlan.
+ */
+ (void) ExecutorStart(queryDesc, fpes->eflags);
/* Special executor initialization steps for parallel workers */
queryDesc->planstate->state->es_query_dsa = area;
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 16704c0c2f..f0f5740c26 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -151,6 +151,7 @@ CreateExecutorState(void)
estate->es_top_eflags = 0;
estate->es_instrument = 0;
estate->es_finished = false;
+ estate->es_canceled = false;
estate->es_exprcontexts = NIL;
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 7e452ed743..606da72535 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -863,7 +863,12 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
eflags = EXEC_FLAG_SKIP_TRIGGERS;
else
eflags = 0; /* default run-to-completion flags */
- ExecutorStart(es->qd, eflags);
+
+ /*
+ * OK to ignore the return value; plan can't become invalid,
+ * because there's no CachedPlan.
+ */
+ (void) ExecutorStart(es->qd, eflags);
}
es->status = F_EXEC_RUN;
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index f2cca807ef..814ff1390f 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -71,7 +71,7 @@ static int _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
static ParamListInfo _SPI_convert_params(int nargs, Oid *argtypes,
Datum *Values, const char *Nulls);
-static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount);
+static int _SPI_pquery(QueryDesc *queryDesc, uint64 tcount);
static void _SPI_error_callback(void *arg);
@@ -1582,6 +1582,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
Snapshot snapshot;
MemoryContext oldcontext;
Portal portal;
+ bool plan_valid;
SPICallbackArg spicallbackarg;
ErrorContextCallback spierrcontext;
@@ -1623,6 +1624,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
_SPI_current->processed = 0;
_SPI_current->tuptable = NULL;
+replan:
/* Create the portal */
if (name == NULL || name[0] == '\0')
{
@@ -1766,15 +1768,23 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
}
/*
- * Start portal execution.
+ * Start portal execution. If the portal contains a cached plan, it must
+ * be recreated if the cached plan was found to have been invalidated when
+ * initializing one of the plan trees contained in it.
*/
- PortalStart(portal, paramLI, 0, snapshot);
+ plan_valid = PortalStart(portal, paramLI, 0, snapshot);
Assert(portal->strategy != PORTAL_MULTI_QUERY);
/* Pop the error context stack */
error_context_stack = spierrcontext.previous;
+ if (!plan_valid)
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
+
/* Pop the SPI stack */
_SPI_end_call(true);
@@ -2552,6 +2562,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
+replan:
cplan = GetCachedPlan(plansource, options->params,
plan_owner, _SPI_current->queryEnv);
@@ -2661,6 +2672,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
QueryDesc *qdesc;
Snapshot snap;
+ int eflags;
if (ActiveSnapshotSet())
snap = GetActiveSnapshot();
@@ -2675,8 +2687,23 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
options->params,
_SPI_current->queryEnv,
0);
- res = _SPI_pquery(qdesc, fire_triggers,
- canSetTag ? options->tcount : 0);
+
+ /* Select execution options */
+ if (fire_triggers)
+ eflags = 0; /* default run-to-completion flags */
+ else
+ eflags = EXEC_FLAG_SKIP_TRIGGERS;
+
+ if (!ExecutorStart(qdesc, eflags))
+ {
+ ExecutorEnd(qdesc);
+ FreeQueryDesc(qdesc);
+ Assert(cplan);
+ ReleaseCachedPlan(cplan, plan_owner);
+ goto replan;
+ }
+
+ res = _SPI_pquery(qdesc, canSetTag ? options->tcount : 0);
FreeQueryDesc(qdesc);
}
else
@@ -2851,10 +2878,9 @@ _SPI_convert_params(int nargs, Oid *argtypes,
}
static int
-_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
+_SPI_pquery(QueryDesc *queryDesc, uint64 tcount)
{
int operation = queryDesc->operation;
- int eflags;
int res;
switch (operation)
@@ -2898,14 +2924,6 @@ _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
ResetUsage();
#endif
- /* Select execution options */
- if (fire_triggers)
- eflags = 0; /* default run-to-completion flags */
- else
- eflags = EXEC_FLAG_SKIP_TRIGGERS;
-
- ExecutorStart(queryDesc, eflags);
-
ExecutorRun(queryDesc, ForwardScanDirection, tcount, true);
_SPI_current->processed = queryDesc->estate->es_processed;
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 21b9763183..4f923bbcae 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1230,7 +1230,12 @@ exec_simple_query(const char *query_string)
/*
* Start the portal. No parameters here.
*/
- PortalStart(portal, NULL, 0, InvalidSnapshot);
+ {
+ bool plan_valid PG_USED_FOR_ASSERTS_ONLY;
+
+ plan_valid = PortalStart(portal, NULL, 0, InvalidSnapshot);
+ Assert(plan_valid);
+ }
/*
* Select the appropriate output format: text unless we are doing a
@@ -1735,6 +1740,7 @@ exec_bind_message(StringInfo input_message)
"commands ignored until end of transaction block"),
errdetail_abort()));
+replan:
/*
* Create the portal. Allow silent replacement of an existing portal only
* if the unnamed portal is specified.
@@ -2026,9 +2032,15 @@ exec_bind_message(StringInfo input_message)
PopActiveSnapshot();
/*
- * And we're ready to start portal execution.
+ * Start portal execution. If the portal contains a cached plan, it must
+ * be recreated if the cached plan was found to have been invalidated when
+ * initializing one of the plan trees contained in it.
*/
- PortalStart(portal, params, 0, InvalidSnapshot);
+ if (!PortalStart(portal, params, 0, InvalidSnapshot))
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
/*
* Apply the result format requests to the portal.
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 4ef349df8b..fcf9925ed4 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -19,6 +19,7 @@
#include "access/xact.h"
#include "commands/prepare.h"
+#include "executor/execdesc.h"
#include "executor/tstoreReceiver.h"
#include "miscadmin.h"
#include "pg_trace.h"
@@ -35,12 +36,6 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
- const char *sourceText,
- ParamListInfo params,
- QueryEnvironment *queryEnv,
- DestReceiver *dest,
- QueryCompletion *qc);
static void FillPortalStore(Portal portal, bool isTopLevel);
static uint64 RunFromStore(Portal portal, ScanDirection direction, uint64 count,
DestReceiver *dest);
@@ -118,86 +113,6 @@ FreeQueryDesc(QueryDesc *qdesc)
}
-/*
- * ProcessQuery
- * Execute a single plannable query within a PORTAL_MULTI_QUERY,
- * PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
- *
- * plan: the plan tree for the query
- * sourceText: the source text of the query
- * params: any parameters needed
- * dest: where to send results
- * qc: where to store the command completion status data.
- *
- * qc may be NULL if caller doesn't want a status string.
- *
- * Must be called in a memory context that will be reset or deleted on
- * error; otherwise the executor's memory usage will be leaked.
- */
-static void
-ProcessQuery(PlannedStmt *plan,
- const char *sourceText,
- ParamListInfo params,
- QueryEnvironment *queryEnv,
- DestReceiver *dest,
- QueryCompletion *qc)
-{
- QueryDesc *queryDesc;
-
- /*
- * Create the QueryDesc object
- */
- queryDesc = CreateQueryDesc(plan, NULL, sourceText,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, 0);
-
- /*
- * Call ExecutorStart to prepare the plan for execution
- */
- ExecutorStart(queryDesc, 0);
-
- /*
- * Run the plan to completion.
- */
- ExecutorRun(queryDesc, ForwardScanDirection, 0, true);
-
- /*
- * Build command completion status data, if caller wants one.
- */
- if (qc)
- {
- switch (queryDesc->operation)
- {
- case CMD_SELECT:
- SetQueryCompletion(qc, CMDTAG_SELECT, queryDesc->estate->es_processed);
- break;
- case CMD_INSERT:
- SetQueryCompletion(qc, CMDTAG_INSERT, queryDesc->estate->es_processed);
- break;
- case CMD_UPDATE:
- SetQueryCompletion(qc, CMDTAG_UPDATE, queryDesc->estate->es_processed);
- break;
- case CMD_DELETE:
- SetQueryCompletion(qc, CMDTAG_DELETE, queryDesc->estate->es_processed);
- break;
- case CMD_MERGE:
- SetQueryCompletion(qc, CMDTAG_MERGE, queryDesc->estate->es_processed);
- break;
- default:
- SetQueryCompletion(qc, CMDTAG_UNKNOWN, queryDesc->estate->es_processed);
- break;
- }
- }
-
- /*
- * Now, we close down all the scans and free allocated resources.
- */
- ExecutorFinish(queryDesc);
- ExecutorEnd(queryDesc);
-
- FreeQueryDesc(queryDesc);
-}
-
/*
* ChoosePortalStrategy
* Select portal execution strategy given the intended statement list.
@@ -428,19 +343,21 @@ FetchStatementTargetList(Node *stmt)
* presently ignored for non-PORTAL_ONE_SELECT portals (it's only intended
* to be used for cursors).
*
- * On return, portal is ready to accept PortalRun() calls, and the result
- * tupdesc (if any) is known.
+ * True is returned if portal is ready to accept PortalRun() calls, and the
+ * result tupdesc (if any) is known. False if the plan tree is no longer
+ * valid, in which case the caller must retry after generating a new
+ * CachedPlan.
*/
-void
+bool
PortalStart(Portal portal, ParamListInfo params,
int eflags, Snapshot snapshot)
{
Portal saveActivePortal;
ResourceOwner saveResourceOwner;
- MemoryContext savePortalContext;
MemoryContext oldContext;
QueryDesc *queryDesc;
- int myeflags;
+ int myeflags = 0;
+ bool plan_valid = true;
Assert(PortalIsValid(portal));
Assert(portal->status == PORTAL_DEFINED);
@@ -450,15 +367,13 @@ PortalStart(Portal portal, ParamListInfo params,
*/
saveActivePortal = ActivePortal;
saveResourceOwner = CurrentResourceOwner;
- savePortalContext = PortalContext;
PG_TRY();
{
ActivePortal = portal;
if (portal->resowner)
CurrentResourceOwner = portal->resowner;
- PortalContext = portal->portalContext;
- oldContext = MemoryContextSwitchTo(PortalContext);
+ oldContext = MemoryContextSwitchTo(portal->queryContext);
/* Must remember portal param list, if any */
portal->portalParams = params;
@@ -474,6 +389,8 @@ PortalStart(Portal portal, ParamListInfo params,
switch (portal->strategy)
{
case PORTAL_ONE_SELECT:
+ case PORTAL_ONE_RETURNING:
+ case PORTAL_ONE_MOD_WITH:
/* Must set snapshot before starting executor. */
if (snapshot)
@@ -491,8 +408,8 @@ PortalStart(Portal portal, ParamListInfo params,
*/
/*
- * Create QueryDesc in portal's context; for the moment, set
- * the destination to DestNone.
+ * Create QueryDesc in portal->queryContext; for the moment,
+ * set the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
NULL,
@@ -504,30 +421,51 @@ PortalStart(Portal portal, ParamListInfo params,
portal->queryEnv,
0);
+ /* Remember for PortalRunMulti(). */
+ if (portal->strategy == PORTAL_ONE_RETURNING ||
+ portal->strategy == PORTAL_ONE_MOD_WITH)
+ portal->qdescs = list_make1(queryDesc);
+
/*
* If it's a scrollable cursor, executor needs to support
* REWIND and backwards scan, as well as whatever the caller
* might've asked for.
*/
- if (portal->cursorOptions & CURSOR_OPT_SCROLL)
+ if (portal->strategy == PORTAL_ONE_SELECT &&
+ (portal->cursorOptions & CURSOR_OPT_SCROLL))
myeflags = eflags | EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD;
else
myeflags = eflags;
/*
- * Call ExecutorStart to prepare the plan for execution
+ * Call ExecutorStart to prepare the plan for execution. A
+	 * cached plan may get invalidated during plan initialization.
*/
- ExecutorStart(queryDesc, myeflags);
+ if (!ExecutorStart(queryDesc, myeflags))
+ {
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+ PopActiveSnapshot();
+ plan_valid = false;
+ goto plan_init_failed;
+ }
/*
- * This tells PortalCleanup to shut down the executor
+ * This tells PortalCleanup to shut down the executor, though
+ * not needed for queries handled by PortalRunMulti().
*/
- portal->queryDesc = queryDesc;
+ if (portal->strategy == PORTAL_ONE_SELECT)
+ portal->queryDesc = queryDesc;
/*
- * Remember tuple descriptor (computed by ExecutorStart)
+ * Remember tuple descriptor (computed by ExecutorStart),
+ * though make it independent of QueryDesc for queries handled
+ * by PortalRunMulti().
*/
- portal->tupDesc = queryDesc->tupDesc;
+ if (portal->strategy != PORTAL_ONE_SELECT)
+ portal->tupDesc = CreateTupleDescCopy(queryDesc->tupDesc);
+ else
+ portal->tupDesc = queryDesc->tupDesc;
/*
* Reset cursor position data to "start of query"
@@ -539,29 +477,6 @@ PortalStart(Portal portal, ParamListInfo params,
PopActiveSnapshot();
break;
- case PORTAL_ONE_RETURNING:
- case PORTAL_ONE_MOD_WITH:
-
- /*
- * We don't start the executor until we are told to run the
- * portal. We do need to set up the result tupdesc.
- */
- {
- PlannedStmt *pstmt;
-
- pstmt = PortalGetPrimaryStmt(portal);
- portal->tupDesc =
- ExecCleanTypeFromTL(pstmt->planTree->targetlist);
- }
-
- /*
- * Reset cursor position data to "start of query"
- */
- portal->atStart = true;
- portal->atEnd = false; /* allow fetches */
- portal->portalPos = 0;
- break;
-
case PORTAL_UTIL_SELECT:
/*
@@ -584,7 +499,82 @@ PortalStart(Portal portal, ParamListInfo params,
break;
case PORTAL_MULTI_QUERY:
- /* Need do nothing now */
+ {
+ ListCell *lc;
+ bool first = true;
+
+ myeflags = eflags;
+ foreach(lc, portal->stmts)
+ {
+ PlannedStmt *plan = lfirst_node(PlannedStmt, lc);
+ bool is_utility = (plan->utilityStmt != NULL);
+
+ /*
+ * Push the snapshot to be used by the executor.
+ */
+ if (!is_utility)
+ {
+ /*
+ * Must copy the snapshot for all statements
+	 * except the first, as we'll need to update its
+ * command ID.
+ */
+ if (!first)
+ PushCopiedSnapshot(GetTransactionSnapshot());
+ else
+ PushActiveSnapshot(GetTransactionSnapshot());
+ }
+
+ /*
+ * From the 2nd statement onwards, update the command
+ * ID and the snapshot to match.
+ */
+ if (!first)
+ {
+ CommandCounterIncrement();
+ UpdateActiveSnapshotCommandId();
+ }
+
+ first = false;
+
+ /*
+ * Create the QueryDesc. DestReceiver will be set in
+ * PortalRunMulti() before calling ExecutorRun().
+ */
+ queryDesc = CreateQueryDesc(plan,
+ NULL,
+ portal->sourceText,
+ !is_utility ?
+ GetActiveSnapshot() :
+ InvalidSnapshot,
+ InvalidSnapshot,
+ NULL,
+ params,
+ portal->queryEnv, 0);
+
+ /* Remember for PortalRunMulti() */
+ portal->qdescs = lappend(portal->qdescs, queryDesc);
+
+ if (is_utility)
+ continue;
+
+ /*
+ * Call ExecutorStart to prepare the plan for
+ * execution. A cached plan may get invalidated
+	 * during plan initialization.
+ */
+ if (!ExecutorStart(queryDesc, myeflags))
+ {
+ PopActiveSnapshot();
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+ plan_valid = false;
+ goto plan_init_failed;
+ }
+ PopActiveSnapshot();
+ }
+ }
+
portal->tupDesc = NULL;
break;
}
@@ -597,19 +587,20 @@ PortalStart(Portal portal, ParamListInfo params,
/* Restore global vars and propagate error */
ActivePortal = saveActivePortal;
CurrentResourceOwner = saveResourceOwner;
- PortalContext = savePortalContext;
PG_RE_THROW();
}
PG_END_TRY();
+ portal->status = PORTAL_READY;
+
+plan_init_failed:
MemoryContextSwitchTo(oldContext);
ActivePortal = saveActivePortal;
CurrentResourceOwner = saveResourceOwner;
- PortalContext = savePortalContext;
- portal->status = PORTAL_READY;
+ return plan_valid;
}
/*
@@ -1196,7 +1187,7 @@ PortalRunMulti(Portal portal,
QueryCompletion *qc)
{
bool active_snapshot_set = false;
- ListCell *stmtlist_item;
+ ListCell *qdesc_item;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1217,9 +1208,10 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
- foreach(stmtlist_item, portal->stmts)
+ foreach(qdesc_item, portal->qdescs)
{
- PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ QueryDesc *qdesc = (QueryDesc *) lfirst(qdesc_item);
+ PlannedStmt *pstmt = qdesc->plannedstmt;
/*
* If we got a cancel signal in prior command, quit
@@ -1236,33 +1228,26 @@ PortalRunMulti(Portal portal,
if (log_executor_stats)
ResetUsage();
- /*
- * Must always have a snapshot for plannable queries. First time
- * through, take a new snapshot; for subsequent queries in the
- * same portal, just update the snapshot's copy of the command
- * counter.
- */
+ /* Push the snapshot for plannable queries. */
if (!active_snapshot_set)
{
- Snapshot snapshot = GetTransactionSnapshot();
+ Snapshot snapshot = qdesc->snapshot;
- /* If told to, register the snapshot and save in portal */
+ /*
+ * If told to, register the snapshot and save in portal
+ *
+	 * Note that the command ID of qdesc->snapshot for the 2nd query
+ * onwards would have been updated in PortalStart() to account
+ * for CCI() done between queries, but it's OK that here we
+ * don't likewise update holdSnapshot's command ID.
+ */
if (setHoldSnapshot)
{
snapshot = RegisterSnapshot(snapshot);
portal->holdSnapshot = snapshot;
}
- /*
- * We can't have the holdSnapshot also be the active one,
- * because UpdateActiveSnapshotCommandId would complain. So
- * force an extra snapshot copy. Plain PushActiveSnapshot
- * would have copied the transaction snapshot anyway, so this
- * only adds a copy step when setHoldSnapshot is true. (It's
- * okay for the command ID of the active snapshot to diverge
- * from what holdSnapshot has.)
- */
- PushCopiedSnapshot(snapshot);
+ PushActiveSnapshot(snapshot);
/*
* As for PORTAL_ONE_SELECT portals, it does not seem
@@ -1271,26 +1256,39 @@ PortalRunMulti(Portal portal,
active_snapshot_set = true;
}
- else
- UpdateActiveSnapshotCommandId();
+ /*
+ * Run the plan to completion.
+ */
+ qdesc->dest = dest;
+ ExecutorRun(qdesc, ForwardScanDirection, 0, true);
+
+ /*
+ * Build command completion status data if needed.
+ */
if (pstmt->canSetTag)
{
- /* statement can set tag string */
- ProcessQuery(pstmt,
- portal->sourceText,
- portal->portalParams,
- portal->queryEnv,
- dest, qc);
- }
- else
- {
- /* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
- portal->sourceText,
- portal->portalParams,
- portal->queryEnv,
- altdest, NULL);
+ switch (qdesc->operation)
+ {
+ case CMD_SELECT:
+ SetQueryCompletion(qc, CMDTAG_SELECT, qdesc->estate->es_processed);
+ break;
+ case CMD_INSERT:
+ SetQueryCompletion(qc, CMDTAG_INSERT, qdesc->estate->es_processed);
+ break;
+ case CMD_UPDATE:
+ SetQueryCompletion(qc, CMDTAG_UPDATE, qdesc->estate->es_processed);
+ break;
+ case CMD_DELETE:
+ SetQueryCompletion(qc, CMDTAG_DELETE, qdesc->estate->es_processed);
+ break;
+ case CMD_MERGE:
+ SetQueryCompletion(qc, CMDTAG_MERGE, qdesc->estate->es_processed);
+ break;
+ default:
+ SetQueryCompletion(qc, CMDTAG_UNKNOWN, qdesc->estate->es_processed);
+ break;
+ }
}
if (log_executor_stats)
@@ -1345,12 +1343,12 @@ PortalRunMulti(Portal portal,
if (portal->stmts == NIL)
break;
- /*
- * Increment command counter between queries, but not after the last
- * one.
- */
- if (lnext(portal->stmts, stmtlist_item) != NULL)
- CommandCounterIncrement();
+ if (qdesc->estate)
+ {
+ ExecutorFinish(qdesc);
+ ExecutorEnd(qdesc);
+ }
+ FreeQueryDesc(qdesc);
}
/* Pop the snapshot if we pushed one. */
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 06dfa85f04..0cad450dcd 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -201,6 +201,13 @@ CreatePortal(const char *name, bool allowDup, bool dupSilent)
portal->portalContext = AllocSetContextCreate(TopPortalContext,
"PortalContext",
ALLOCSET_SMALL_SIZES);
+ /*
+	 * Initialize the portal's query context to store QueryDescs created during
+ * PortalStart() and then used in PortalRun().
+ */
+ portal->queryContext = AllocSetContextCreate(TopPortalContext,
+ "PortalQueryContext",
+ ALLOCSET_SMALL_SIZES);
/* create a resource owner for the portal */
portal->resowner = ResourceOwnerCreate(CurTransactionResourceOwner,
@@ -224,6 +231,7 @@ CreatePortal(const char *name, bool allowDup, bool dupSilent)
/* for named portals reuse portal->name copy */
MemoryContextSetIdentifier(portal->portalContext, portal->name[0] ? portal->name : "<unnamed>");
+ MemoryContextSetIdentifier(portal->queryContext, portal->name[0] ? portal->name : "<unnamed>");
return portal;
}
@@ -594,6 +602,7 @@ PortalDrop(Portal portal, bool isTopCommit)
/* release subsidiary storage */
MemoryContextDelete(portal->portalContext);
+ MemoryContextDelete(portal->queryContext);
/* release portal struct (it's in TopPortalContext) */
pfree(portal);
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 3d3e632a0c..392abb5150 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -88,7 +88,11 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern QueryDesc *ExplainQueryDesc(PlannedStmt *stmt, struct CachedPlan *cplan,
+ const char *queryString, IntoClause *into, ExplainState *es,
+ ParamListInfo params, QueryEnvironment *queryEnv);
+extern void ExplainOnePlan(QueryDesc *queryDesc,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
@@ -104,6 +108,7 @@ extern void ExplainQueryParameters(ExplainState *es, ParamListInfo params, int m
extern void ExplainBeginOutput(ExplainState *es);
extern void ExplainEndOutput(ExplainState *es);
+extern void ExplainResetOutput(ExplainState *es);
extern void ExplainSeparatePlans(ExplainState *es);
extern void ExplainPropertyList(const char *qlabel, List *data,
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index 430e3ca7dd..d4f7c29301 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -257,6 +257,7 @@ extern void ExecASTruncateTriggers(EState *estate,
extern void AfterTriggerBeginXact(void);
extern void AfterTriggerBeginQuery(void);
+extern void AfterTriggerCancelQuery(void);
extern void AfterTriggerEndQuery(EState *estate);
extern void AfterTriggerFireDeferred(void);
extern void AfterTriggerEndXact(bool isCommit);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 72cbf120c5..10c5cda169 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -73,7 +73,7 @@
/* Hook for plugins to get control in ExecutorStart() */
-typedef void (*ExecutorStart_hook_type) (QueryDesc *queryDesc, int eflags);
+typedef bool (*ExecutorStart_hook_type) (QueryDesc *queryDesc, int eflags);
extern PGDLLIMPORT ExecutorStart_hook_type ExecutorStart_hook;
/* Hook for plugins to get control in ExecutorRun() */
@@ -198,8 +198,8 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
-extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
-extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern bool ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern bool standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count, bool execute_once);
extern void standard_ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 846eb32a1d..bb5734edb5 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -670,6 +670,9 @@ typedef struct EState
int es_top_eflags; /* eflags passed to ExecutorStart */
int es_instrument; /* OR of InstrumentOption flags */
bool es_finished; /* true when ExecutorFinish is done */
+ bool es_canceled; /* true when execution was canceled
* upon finding that the plan was invalidated
+ * during ExecInitNode() */
List *es_exprcontexts; /* List of ExprContexts within EState */
diff --git a/src/include/tcop/pquery.h b/src/include/tcop/pquery.h
index a5e65b98aa..577b81a9ee 100644
--- a/src/include/tcop/pquery.h
+++ b/src/include/tcop/pquery.h
@@ -29,7 +29,7 @@ extern List *FetchPortalTargetList(Portal portal);
extern List *FetchStatementTargetList(Node *stmt);
-extern void PortalStart(Portal portal, ParamListInfo params,
+extern bool PortalStart(Portal portal, ParamListInfo params,
int eflags, Snapshot snapshot);
extern void PortalSetResultFormat(Portal portal, int nFormats,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aa08b1e0fc..af059e30f8 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,8 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *qdescs; /* list of QueryDescs */
+ MemoryContext queryContext; /* memory for QueryDescs and children */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
--
2.35.3
v48-0001-Assorted-tightening-in-various-ExecEnd-routines.patch (application/octet-stream)
From 96983c4519fe018b10b0b8517d205cdebfcb95a2 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Thu, 28 Sep 2023 16:56:29 +0900
Subject: [PATCH v48 1/8] Assorted tightening in various ExecEnd*() routines
This includes adding NULLness checks on pointers before cleaning them
up. Many ExecEnd*() routines already perform this check, but a few
instances remain. These NULLness checks might seem redundant as
things stand since the ExecEnd*() routines operate under the
assumption that their matching ExecInit* routine would have fully
executed, ensuring pointers are set. However, a forthcoming patch will
modify ExecInit* routines to sometimes exit early, potentially leaving
some pointers in an undetermined state, so it will become crucial to
have these NULLness checks in place.
This also adds a guard at the beginning of EvalPlanQualEnd() to return
early if the EPQState does not appear to have been initialized. That
case can happen if the corresponding ExecInit*() routine returned
early without calling EvalPlanQualInit().
While at it, this commit ensures that pointers are consistently set
to NULL after cleanup in all ExecEnd*() routines.
Finally, for enhanced consistency, the format of NULLness checks has
been standardized to "if (pointer != NULL)", replacing the previous
"if (pointer)" style.
Reviewed-by: Robert Haas
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
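
Concretely, the shape applied throughout the ExecEnd*() routines is as
follows (a generic sketch; "resource" and release_resource() are
placeholders for the likes of sort_in and tuplesort_end()):

    if (node->resource != NULL)
    {
        release_resource(node->resource);
        node->resource = NULL;      /* harmless if cleanup runs again */
    }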
---
src/backend/executor/execMain.c | 4 ++
src/backend/executor/nodeAgg.c | 27 +++++++++----
src/backend/executor/nodeAppend.c | 3 ++
src/backend/executor/nodeBitmapAnd.c | 4 +-
src/backend/executor/nodeBitmapHeapscan.c | 47 +++++++++++++++-------
src/backend/executor/nodeBitmapIndexscan.c | 23 +++++------
src/backend/executor/nodeBitmapOr.c | 4 +-
src/backend/executor/nodeForeignscan.c | 17 ++++----
src/backend/executor/nodeGather.c | 1 +
src/backend/executor/nodeGatherMerge.c | 1 +
src/backend/executor/nodeGroup.c | 6 +--
src/backend/executor/nodeHash.c | 6 +--
src/backend/executor/nodeHashjoin.c | 4 +-
src/backend/executor/nodeIncrementalSort.c | 13 +++++-
src/backend/executor/nodeIndexonlyscan.c | 25 ++++++------
src/backend/executor/nodeIndexscan.c | 23 +++++------
src/backend/executor/nodeLimit.c | 1 +
src/backend/executor/nodeLockRows.c | 1 +
src/backend/executor/nodeMaterial.c | 5 ++-
src/backend/executor/nodeMemoize.c | 8 +++-
src/backend/executor/nodeMergeAppend.c | 3 ++
src/backend/executor/nodeMergejoin.c | 2 +
src/backend/executor/nodeModifyTable.c | 11 ++++-
src/backend/executor/nodeNestloop.c | 2 +
src/backend/executor/nodeProjectSet.c | 1 +
src/backend/executor/nodeRecursiveunion.c | 24 +++++++++--
src/backend/executor/nodeResult.c | 1 +
src/backend/executor/nodeSamplescan.c | 7 +++-
src/backend/executor/nodeSeqscan.c | 16 +++-----
src/backend/executor/nodeSetOp.c | 6 ++-
src/backend/executor/nodeSort.c | 5 ++-
src/backend/executor/nodeSubqueryscan.c | 1 +
src/backend/executor/nodeTableFuncscan.c | 4 +-
src/backend/executor/nodeTidrangescan.c | 12 ++++--
src/backend/executor/nodeTidscan.c | 8 +++-
src/backend/executor/nodeUnique.c | 1 +
src/backend/executor/nodeWindowAgg.c | 41 ++++++++++++++-----
37 files changed, 248 insertions(+), 120 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4c5a7bbf62..f7f18d3054 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -3010,6 +3010,10 @@ EvalPlanQualEnd(EPQState *epqstate)
MemoryContext oldcontext;
ListCell *l;
+ /* Nothing to do if no EvalPlanQualInit() was done to begin with. */
+ if (epqstate->parentestate == NULL)
+ return;
+
rtsize = epqstate->parentestate->es_range_table_size;
/*
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index f154f28902..af22b1676f 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -4304,7 +4304,6 @@ GetAggInitVal(Datum textInitVal, Oid transtype)
void
ExecEndAgg(AggState *node)
{
- PlanState *outerPlan;
int transno;
int numGroupingSets = Max(node->maxsets, 1);
int setno;
@@ -4314,7 +4313,7 @@ ExecEndAgg(AggState *node)
* worker back into shared memory so that it can be picked up by the main
* process to report in EXPLAIN ANALYZE.
*/
- if (node->shared_info && IsParallelWorker())
+ if (node->shared_info != NULL && IsParallelWorker())
{
AggregateInstrumentation *si;
@@ -4327,10 +4326,16 @@ ExecEndAgg(AggState *node)
/* Make sure we have closed any open tuplesorts */
- if (node->sort_in)
+ if (node->sort_in != NULL)
+ {
tuplesort_end(node->sort_in);
- if (node->sort_out)
+ node->sort_in = NULL;
+ }
+ if (node->sort_out != NULL)
+ {
tuplesort_end(node->sort_out);
+ node->sort_out = NULL;
+ }
hashagg_reset_spill_state(node);
@@ -4346,19 +4351,25 @@ ExecEndAgg(AggState *node)
for (setno = 0; setno < numGroupingSets; setno++)
{
- if (pertrans->sortstates[setno])
+ if (pertrans->sortstates[setno] != NULL)
tuplesort_end(pertrans->sortstates[setno]);
}
}
/* And ensure any agg shutdown callbacks have been called */
for (setno = 0; setno < numGroupingSets; setno++)
+ {
ReScanExprContext(node->aggcontexts[setno]);
- if (node->hashcontext)
+ node->aggcontexts[setno] = NULL;
+ }
+ if (node->hashcontext != NULL)
+ {
ReScanExprContext(node->hashcontext);
+ node->hashcontext = NULL;
+ }
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
+ ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
void
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 609df6b9e6..a2af221e05 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -399,7 +399,10 @@ ExecEndAppend(AppendState *node)
* shut down each of the subscans
*/
for (i = 0; i < nplans; i++)
+ {
ExecEndNode(appendplans[i]);
+ appendplans[i] = NULL;
+ }
}
void
diff --git a/src/backend/executor/nodeBitmapAnd.c b/src/backend/executor/nodeBitmapAnd.c
index 4c5eb2b23b..4abb0609a0 100644
--- a/src/backend/executor/nodeBitmapAnd.c
+++ b/src/backend/executor/nodeBitmapAnd.c
@@ -192,8 +192,8 @@ ExecEndBitmapAnd(BitmapAndState *node)
*/
for (i = 0; i < nplans; i++)
{
- if (bitmapplans[i])
- ExecEndNode(bitmapplans[i]);
+ ExecEndNode(bitmapplans[i]);
+ bitmapplans[i] = NULL;
}
}
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 2db0acfc76..d3f58c22f9 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -648,40 +648,59 @@ ExecReScanBitmapHeapScan(BitmapHeapScanState *node)
void
ExecEndBitmapHeapScan(BitmapHeapScanState *node)
{
- TableScanDesc scanDesc;
-
- /*
- * extract information from the node
- */
- scanDesc = node->ss.ss_currentScanDesc;
-
/*
* close down subplans
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
/*
* release bitmaps and buffers if any
*/
- if (node->tbmiterator)
+ if (node->tbmiterator != NULL)
+ {
tbm_end_iterate(node->tbmiterator);
- if (node->prefetch_iterator)
+ node->tbmiterator = NULL;
+ }
+ if (node->prefetch_iterator != NULL)
+ {
tbm_end_iterate(node->prefetch_iterator);
- if (node->tbm)
+ node->prefetch_iterator = NULL;
+ }
+ if (node->tbm != NULL)
+ {
tbm_free(node->tbm);
- if (node->shared_tbmiterator)
+ node->tbm = NULL;
+ }
+ if (node->shared_tbmiterator != NULL)
+ {
tbm_end_shared_iterate(node->shared_tbmiterator);
- if (node->shared_prefetch_iterator)
+ node->shared_tbmiterator = NULL;
+ }
+ if (node->shared_prefetch_iterator != NULL)
+ {
tbm_end_shared_iterate(node->shared_prefetch_iterator);
+ node->shared_prefetch_iterator = NULL;
+ }
if (node->vmbuffer != InvalidBuffer)
+ {
ReleaseBuffer(node->vmbuffer);
+ node->vmbuffer = InvalidBuffer;
+ }
if (node->pvmbuffer != InvalidBuffer)
+ {
ReleaseBuffer(node->pvmbuffer);
+ node->pvmbuffer = InvalidBuffer;
+ }
/*
- * close heap scan
+ * close heap scan (no-op if we didn't start it)
*/
- table_endscan(scanDesc);
+ if (node->ss.ss_currentScanDesc != NULL)
+ {
+ table_endscan(node->ss.ss_currentScanDesc);
+ node->ss.ss_currentScanDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeBitmapIndexscan.c b/src/backend/executor/nodeBitmapIndexscan.c
index 7cf8532bc9..488f11a3ff 100644
--- a/src/backend/executor/nodeBitmapIndexscan.c
+++ b/src/backend/executor/nodeBitmapIndexscan.c
@@ -175,22 +175,21 @@ ExecReScanBitmapIndexScan(BitmapIndexScanState *node)
void
ExecEndBitmapIndexScan(BitmapIndexScanState *node)
{
- Relation indexRelationDesc;
- IndexScanDesc indexScanDesc;
-
- /*
- * extract information from the node
- */
- indexRelationDesc = node->biss_RelationDesc;
- indexScanDesc = node->biss_ScanDesc;
+ /* close the scan (no-op if we didn't start it) */
+ if (node->biss_ScanDesc != NULL)
+ {
+ index_endscan(node->biss_ScanDesc);
+ node->biss_ScanDesc = NULL;
+ }
/*
* close the index relation (no-op if we didn't open it)
*/
- if (indexScanDesc)
- index_endscan(indexScanDesc);
- if (indexRelationDesc)
- index_close(indexRelationDesc, NoLock);
+ if (node->biss_RelationDesc != NULL)
+ {
+ index_close(node->biss_RelationDesc, NoLock);
+ node->biss_RelationDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeBitmapOr.c b/src/backend/executor/nodeBitmapOr.c
index 0bf8af9652..ace18593aa 100644
--- a/src/backend/executor/nodeBitmapOr.c
+++ b/src/backend/executor/nodeBitmapOr.c
@@ -210,8 +210,8 @@ ExecEndBitmapOr(BitmapOrState *node)
*/
for (i = 0; i < nplans; i++)
{
- if (bitmapplans[i])
- ExecEndNode(bitmapplans[i]);
+ ExecEndNode(bitmapplans[i]);
+ bitmapplans[i] = NULL;
}
}
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 73913ebb18..3aba28285a 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -301,17 +301,20 @@ ExecEndForeignScan(ForeignScanState *node)
EState *estate = node->ss.ps.state;
/* Let the FDW shut down */
- if (plan->operation != CMD_SELECT)
+ if (node->fdwroutine != NULL)
{
- if (estate->es_epq_active == NULL)
- node->fdwroutine->EndDirectModify(node);
+ if (plan->operation != CMD_SELECT)
+ {
+ if (estate->es_epq_active == NULL)
+ node->fdwroutine->EndDirectModify(node);
+ }
+ else
+ node->fdwroutine->EndForeignScan(node);
}
- else
- node->fdwroutine->EndForeignScan(node);
/* Shut down any outer plan. */
- if (outerPlanState(node))
- ExecEndNode(outerPlanState(node));
+ ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index bb2500a469..1a3c8abdad 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -249,6 +249,7 @@ void
ExecEndGather(GatherState *node)
{
ExecEndNode(outerPlanState(node)); /* let children clean up first */
+ outerPlanState(node) = NULL;
ExecShutdownGather(node);
}
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 7a71a58509..c6fb45fee0 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -289,6 +289,7 @@ void
ExecEndGatherMerge(GatherMergeState *node)
{
ExecEndNode(outerPlanState(node)); /* let children clean up first */
+ outerPlanState(node) = NULL;
ExecShutdownGatherMerge(node);
}
diff --git a/src/backend/executor/nodeGroup.c b/src/backend/executor/nodeGroup.c
index 8c650f0e46..6dfe5a1d23 100644
--- a/src/backend/executor/nodeGroup.c
+++ b/src/backend/executor/nodeGroup.c
@@ -226,10 +226,8 @@ ExecInitGroup(Group *node, EState *estate, int eflags)
void
ExecEndGroup(GroupState *node)
{
- PlanState *outerPlan;
-
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
+ ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
void
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index e72f0986c2..88ba336882 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -413,13 +413,11 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
void
ExecEndHash(HashState *node)
{
- PlanState *outerPlan;
-
/*
* shut down the subplan
*/
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
+ ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index aea44a9d56..6dc43b9ff2 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -861,7 +861,7 @@ ExecEndHashJoin(HashJoinState *node)
/*
* Free hash table
*/
- if (node->hj_HashTable)
+ if (node->hj_HashTable != NULL)
{
ExecHashTableDestroy(node->hj_HashTable);
node->hj_HashTable = NULL;
@@ -871,7 +871,9 @@ ExecEndHashJoin(HashJoinState *node)
* clean up subtrees
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
ExecEndNode(innerPlanState(node));
+ innerPlanState(node) = NULL;
}
/*
diff --git a/src/backend/executor/nodeIncrementalSort.c b/src/backend/executor/nodeIncrementalSort.c
index cd094a190c..28a0e81cb3 100644
--- a/src/backend/executor/nodeIncrementalSort.c
+++ b/src/backend/executor/nodeIncrementalSort.c
@@ -1079,8 +1079,16 @@ ExecEndIncrementalSort(IncrementalSortState *node)
{
SO_printf("ExecEndIncrementalSort: shutting down sort node\n");
- ExecDropSingleTupleTableSlot(node->group_pivot);
- ExecDropSingleTupleTableSlot(node->transfer_tuple);
+ if (node->group_pivot != NULL)
+ {
+ ExecDropSingleTupleTableSlot(node->group_pivot);
+ node->group_pivot = NULL;
+ }
+ if (node->transfer_tuple != NULL)
+ {
+ ExecDropSingleTupleTableSlot(node->transfer_tuple);
+ node->transfer_tuple = NULL;
+ }
/*
* Release tuplesort resources.
@@ -1100,6 +1108,7 @@ ExecEndIncrementalSort(IncrementalSortState *node)
* Shut down the subplan.
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
SO_printf("ExecEndIncrementalSort: sort node shutdown\n");
}
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index f1db35665c..1f3843abe9 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -364,15 +364,6 @@ ExecReScanIndexOnlyScan(IndexOnlyScanState *node)
void
ExecEndIndexOnlyScan(IndexOnlyScanState *node)
{
- Relation indexRelationDesc;
- IndexScanDesc indexScanDesc;
-
- /*
- * extract information from the node
- */
- indexRelationDesc = node->ioss_RelationDesc;
- indexScanDesc = node->ioss_ScanDesc;
-
/* Release VM buffer pin, if any. */
if (node->ioss_VMBuffer != InvalidBuffer)
{
@@ -380,13 +371,21 @@ ExecEndIndexOnlyScan(IndexOnlyScanState *node)
node->ioss_VMBuffer = InvalidBuffer;
}
+ /* close the scan (no-op if we didn't start it) */
+ if (node->ioss_ScanDesc != NULL)
+ {
+ index_endscan(node->ioss_ScanDesc);
+ node->ioss_ScanDesc = NULL;
+ }
+
/*
* close the index relation (no-op if we didn't open it)
*/
- if (indexScanDesc)
- index_endscan(indexScanDesc);
- if (indexRelationDesc)
- index_close(indexRelationDesc, NoLock);
+ if (node->ioss_RelationDesc != NULL)
+ {
+ index_close(node->ioss_RelationDesc, NoLock);
+ node->ioss_RelationDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 14b9c00217..32e1714f15 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -785,22 +785,21 @@ ExecIndexAdvanceArrayKeys(IndexArrayKeyInfo *arrayKeys, int numArrayKeys)
void
ExecEndIndexScan(IndexScanState *node)
{
- Relation indexRelationDesc;
- IndexScanDesc indexScanDesc;
-
- /*
- * extract information from the node
- */
- indexRelationDesc = node->iss_RelationDesc;
- indexScanDesc = node->iss_ScanDesc;
+ /* close the scan (no-op if we didn't start it) */
+ if (node->iss_ScanDesc != NULL)
+ {
+ index_endscan(node->iss_ScanDesc);
+ node->iss_ScanDesc = NULL;
+ }
/*
* close the index relation (no-op if we didn't open it)
*/
- if (indexScanDesc)
- index_endscan(indexScanDesc);
- if (indexRelationDesc)
- index_close(indexRelationDesc, NoLock);
+ if (node->iss_RelationDesc != NULL)
+ {
+ index_close(node->iss_RelationDesc, NoLock);
+ node->iss_RelationDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index 5654158e3e..a97bac9f6d 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -535,6 +535,7 @@ void
ExecEndLimit(LimitState *node)
{
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index e459971d32..26fbe95c57 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -387,6 +387,7 @@ ExecEndLockRows(LockRowsState *node)
/* We may have shut down EPQ already, but no harm in another call */
EvalPlanQualEnd(&node->lr_epqstate);
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
diff --git a/src/backend/executor/nodeMaterial.c b/src/backend/executor/nodeMaterial.c
index 753ea28915..03c514900b 100644
--- a/src/backend/executor/nodeMaterial.c
+++ b/src/backend/executor/nodeMaterial.c
@@ -243,13 +243,16 @@ ExecEndMaterial(MaterialState *node)
* Release tuplestore resources
*/
if (node->tuplestorestate != NULL)
+ {
tuplestore_end(node->tuplestorestate);
- node->tuplestorestate = NULL;
+ node->tuplestorestate = NULL;
+ }
/*
* shut down the subplan
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeMemoize.c b/src/backend/executor/nodeMemoize.c
index 94bf479287..ee4749c852 100644
--- a/src/backend/executor/nodeMemoize.c
+++ b/src/backend/executor/nodeMemoize.c
@@ -1043,6 +1043,7 @@ ExecEndMemoize(MemoizeState *node)
{
#ifdef USE_ASSERT_CHECKING
/* Validate the memory accounting code is correct in assert builds. */
+ if (node->hashtable != NULL)
{
int count;
uint64 mem = 0;
@@ -1089,12 +1090,17 @@ ExecEndMemoize(MemoizeState *node)
}
/* Remove the cache context */
- MemoryContextDelete(node->tableContext);
+ if (node->tableContext != NULL)
+ {
+ MemoryContextDelete(node->tableContext);
+ node->tableContext = NULL;
+ }
/*
* shut down the subplan
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
void
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 21b5726e6e..0a42a04b19 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -333,7 +333,10 @@ ExecEndMergeAppend(MergeAppendState *node)
* shut down each of the subscans
*/
for (i = 0; i < nplans; i++)
+ {
ExecEndNode(mergeplans[i]);
+ mergeplans[i] = NULL;
+ }
}
void
diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index ed3ebe92e5..c84f53e0bd 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -1647,7 +1647,9 @@ ExecEndMergeJoin(MergeJoinState *node)
* shut down the subplans
*/
ExecEndNode(innerPlanState(node));
+ innerPlanState(node) = NULL;
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
MJ1_printf("ExecEndMergeJoin: %s\n",
"node processing ended");
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index d21a178ad5..ea043c57c1 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -4430,7 +4430,9 @@ ExecEndModifyTable(ModifyTableState *node)
for (j = 0; j < resultRelInfo->ri_NumSlotsInitialized; j++)
{
ExecDropSingleTupleTableSlot(resultRelInfo->ri_Slots[j]);
+ resultRelInfo->ri_Slots[j] = NULL;
ExecDropSingleTupleTableSlot(resultRelInfo->ri_PlanSlots[j]);
+ resultRelInfo->ri_PlanSlots[j] = NULL;
}
}
@@ -4438,12 +4440,16 @@ ExecEndModifyTable(ModifyTableState *node)
* Close all the partitioned tables, leaf partitions, and their indices
* and release the slot used for tuple routing, if set.
*/
- if (node->mt_partition_tuple_routing)
+ if (node->mt_partition_tuple_routing != NULL)
{
ExecCleanupTupleRouting(node, node->mt_partition_tuple_routing);
+ node->mt_partition_tuple_routing = NULL;
- if (node->mt_root_tuple_slot)
+ if (node->mt_root_tuple_slot != NULL)
+ {
ExecDropSingleTupleTableSlot(node->mt_root_tuple_slot);
+ node->mt_root_tuple_slot = NULL;
+ }
}
/*
@@ -4455,6 +4461,7 @@ ExecEndModifyTable(ModifyTableState *node)
* shut down subplan
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
void
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index ebd1406843..1211d871ea 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -368,7 +368,9 @@ ExecEndNestLoop(NestLoopState *node)
* close down subplans
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
ExecEndNode(innerPlanState(node));
+ innerPlanState(node) = NULL;
NL1_printf("ExecEndNestLoop: %s\n",
"node processing ended");
diff --git a/src/backend/executor/nodeProjectSet.c b/src/backend/executor/nodeProjectSet.c
index b4bbdc89b1..e9b96416d3 100644
--- a/src/backend/executor/nodeProjectSet.c
+++ b/src/backend/executor/nodeProjectSet.c
@@ -324,6 +324,7 @@ ExecEndProjectSet(ProjectSetState *node)
* shut down subplans
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
void
diff --git a/src/backend/executor/nodeRecursiveunion.c b/src/backend/executor/nodeRecursiveunion.c
index e781003934..f6d60bcd6c 100644
--- a/src/backend/executor/nodeRecursiveunion.c
+++ b/src/backend/executor/nodeRecursiveunion.c
@@ -272,20 +272,36 @@ void
ExecEndRecursiveUnion(RecursiveUnionState *node)
{
/* Release tuplestores */
- tuplestore_end(node->working_table);
- tuplestore_end(node->intermediate_table);
+ if (node->working_table != NULL)
+ {
+ tuplestore_end(node->working_table);
+ node->working_table = NULL;
+ }
+ if (node->intermediate_table != NULL)
+ {
+ tuplestore_end(node->intermediate_table);
+ node->intermediate_table = NULL;
+ }
/* free subsidiary stuff including hashtable */
- if (node->tempContext)
+ if (node->tempContext != NULL)
+ {
MemoryContextDelete(node->tempContext);
- if (node->tableContext)
+ node->tempContext = NULL;
+ }
+ if (node->tableContext != NULL)
+ {
MemoryContextDelete(node->tableContext);
+ node->tableContext = NULL;
+ }
/*
* close down subplans
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
ExecEndNode(innerPlanState(node));
+ innerPlanState(node) = NULL;
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeResult.c b/src/backend/executor/nodeResult.c
index e9f5732f33..f15902e840 100644
--- a/src/backend/executor/nodeResult.c
+++ b/src/backend/executor/nodeResult.c
@@ -244,6 +244,7 @@ ExecEndResult(ResultState *node)
* shut down subplans
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
void
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index 41c1ea37ad..a6813559e6 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -185,14 +185,17 @@ ExecEndSampleScan(SampleScanState *node)
/*
* Tell sampling function that we finished the scan.
*/
- if (node->tsmroutine->EndSampleScan)
+ if (node->tsmroutine != NULL && node->tsmroutine->EndSampleScan)
node->tsmroutine->EndSampleScan(node);
/*
- * close heap scan
+ * close heap scan (no-op if we didn't start it)
*/
if (node->ss.ss_currentScanDesc)
+ {
table_endscan(node->ss.ss_currentScanDesc);
+ node->ss.ss_currentScanDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 49a5933aff..911266da07 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -183,18 +183,14 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
void
ExecEndSeqScan(SeqScanState *node)
{
- TableScanDesc scanDesc;
-
- /*
- * get information from node
- */
- scanDesc = node->ss.ss_currentScanDesc;
-
/*
- * close heap scan
+ * close heap scan (no-op if we didn't start it)
*/
- if (scanDesc != NULL)
- table_endscan(scanDesc);
+ if (node->ss.ss_currentScanDesc != NULL)
+ {
+ table_endscan(node->ss.ss_currentScanDesc);
+ node->ss.ss_currentScanDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index 98c1b84d43..5c2861d243 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -583,10 +583,14 @@ void
ExecEndSetOp(SetOpState *node)
{
/* free subsidiary stuff including hashtable */
- if (node->tableContext)
+ if (node->tableContext != NULL)
+ {
MemoryContextDelete(node->tableContext);
+ node->tableContext = NULL;
+ }
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index eea7f2ae15..c8a35b64a8 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -307,13 +307,16 @@ ExecEndSort(SortState *node)
* Release tuplesort resources
*/
if (node->tuplesortstate != NULL)
+ {
tuplesort_end((Tuplesortstate *) node->tuplesortstate);
- node->tuplesortstate = NULL;
+ node->tuplesortstate = NULL;
+ }
/*
* shut down the subplan
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
SO1_printf("ExecEndSort: %s\n",
"sort node shutdown");
diff --git a/src/backend/executor/nodeSubqueryscan.c b/src/backend/executor/nodeSubqueryscan.c
index 1ee6295660..91d7ae82ce 100644
--- a/src/backend/executor/nodeSubqueryscan.c
+++ b/src/backend/executor/nodeSubqueryscan.c
@@ -171,6 +171,7 @@ ExecEndSubqueryScan(SubqueryScanState *node)
* close down subquery
*/
ExecEndNode(node->subplan);
+ node->subplan = NULL;
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeTableFuncscan.c b/src/backend/executor/nodeTableFuncscan.c
index a60dcd4943..80ed4b26a8 100644
--- a/src/backend/executor/nodeTableFuncscan.c
+++ b/src/backend/executor/nodeTableFuncscan.c
@@ -217,8 +217,10 @@ ExecEndTableFuncScan(TableFuncScanState *node)
* Release tuplestore resources
*/
if (node->tupstore != NULL)
+ {
tuplestore_end(node->tupstore);
- node->tupstore = NULL;
+ node->tupstore = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index da622d3f5f..9147e4afa8 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -327,10 +327,14 @@ ExecReScanTidRangeScan(TidRangeScanState *node)
void
ExecEndTidRangeScan(TidRangeScanState *node)
{
- TableScanDesc scan = node->ss.ss_currentScanDesc;
-
- if (scan != NULL)
- table_endscan(scan);
+ /*
+ * close heap scan (no-op if we didn't start it)
+ */
+ if (node->ss.ss_currentScanDesc != NULL)
+ {
+ table_endscan(node->ss.ss_currentScanDesc);
+ node->ss.ss_currentScanDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index 15055077d0..74ec6afdcc 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -470,8 +470,14 @@ ExecReScanTidScan(TidScanState *node)
void
ExecEndTidScan(TidScanState *node)
{
- if (node->ss.ss_currentScanDesc)
+ /*
+ * close heap scan (no-op if we didn't start it)
+ */
+ if (node->ss.ss_currentScanDesc != NULL)
+ {
table_endscan(node->ss.ss_currentScanDesc);
+ node->ss.ss_currentScanDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeUnique.c b/src/backend/executor/nodeUnique.c
index 01f951197c..13c556326a 100644
--- a/src/backend/executor/nodeUnique.c
+++ b/src/backend/executor/nodeUnique.c
@@ -169,6 +169,7 @@ void
ExecEndUnique(UniqueState *node)
{
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 77724a6daa..c4c6f009ba 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -1351,11 +1351,14 @@ release_partition(WindowAggState *winstate)
* any aggregate temp data). We don't rely on retail pfree because some
* aggregates might have allocated data we don't have direct pointers to.
*/
- MemoryContextResetAndDeleteChildren(winstate->partcontext);
- MemoryContextResetAndDeleteChildren(winstate->aggcontext);
+ if (winstate->partcontext != NULL)
+ MemoryContextResetAndDeleteChildren(winstate->partcontext);
+ if (winstate->aggcontext != NULL)
+ MemoryContextResetAndDeleteChildren(winstate->aggcontext);
for (i = 0; i < winstate->numaggs; i++)
{
- if (winstate->peragg[i].aggcontext != winstate->aggcontext)
+ if (winstate->peragg[i].aggcontext != NULL &&
+ winstate->peragg[i].aggcontext != winstate->aggcontext)
MemoryContextResetAndDeleteChildren(winstate->peragg[i].aggcontext);
}
@@ -2681,24 +2684,40 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags)
void
ExecEndWindowAgg(WindowAggState *node)
{
- PlanState *outerPlan;
int i;
release_partition(node);
for (i = 0; i < node->numaggs; i++)
{
- if (node->peragg[i].aggcontext != node->aggcontext)
+ if (node->peragg[i].aggcontext != NULL &&
+ node->peragg[i].aggcontext != node->aggcontext)
MemoryContextDelete(node->peragg[i].aggcontext);
}
- MemoryContextDelete(node->partcontext);
- MemoryContextDelete(node->aggcontext);
+ if (node->partcontext != NULL)
+ {
+ MemoryContextDelete(node->partcontext);
+ node->partcontext = NULL;
+ }
+ if (node->aggcontext != NULL)
+ {
+ MemoryContextDelete(node->aggcontext);
+ node->aggcontext = NULL;
+ }
- pfree(node->perfunc);
- pfree(node->peragg);
+ if (node->perfunc != NULL)
+ {
+ pfree(node->perfunc);
+ node->perfunc = NULL;
+ }
+ if (node->peragg != NULL)
+ {
+ pfree(node->peragg);
+ node->peragg = NULL;
+ }
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
+ ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
/* -----------------
--
2.35.3
Attachment: v48-0002-Prepare-executor-to-support-detecting-CachedPlan.patch (application/octet-stream)
From a32d7916a4788d958a3dcd7e59a494916d78ddce Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Fri, 22 Sep 2023 18:12:04 +0900
Subject: [PATCH v48 2/8] Prepare executor to support detecting CachedPlan
invalidation
This adds checks at various points during the executor's
initialization of the plan tree to determine whether the originating
CachedPlan has become invalid as a result of taking locks on the
relations referenced in the plan. This includes adding a check
after every call to ExecOpenScanRelation() and to ExecInitNode(),
including the recursive calls that initialize child nodes.
If a given ExecInit*() function detects that the plan has become
invalid, it should return immediately, even though the PlanState
node it's building may be only partially valid. That is crucial
for two reasons, depending on where the check is:
* The checks following ExecOpenScanRelation() may find that the
plan has become invalid because the requested relation was
dropped or had its schema changed concurrently in a manner that
makes the code that follows unsafe. For example, if the relation
was dropped, that code might dereference a NULL pointer.
* For the checks following ExecInitNode(), the returned child
PlanState node might be only partially valid, so the code that
follows may misbehave if it depends on inspecting that child
PlanState (see the sketch after this list). Note that this
commit adds the check after every call of ExecInitNode() in the
code base, even at sites where no code that could misbehave
exists today, because such code might be added in the future.
It seems better to put the guards in place now than to wait
until the need arises.
To pass the CachedPlan that the executor will use for these checks,
this adds a new field to QueryDesc and a new parameter to
CreateQueryDesc(). No caller of CreateQueryDesc() passes an actual
CachedPlan yet, though, so there is no functional change.
Reviewed-by: Robert Haas
---
contrib/postgres_fdw/postgres_fdw.c | 10 +++++-
src/backend/commands/copyto.c | 3 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 2 +-
src/backend/commands/extension.c | 1 +
src/backend/commands/matview.c | 2 +-
src/backend/executor/execMain.c | 39 ++++++++++++++++++----
src/backend/executor/execParallel.c | 9 ++++-
src/backend/executor/execProcnode.c | 4 +++
src/backend/executor/functions.c | 1 +
src/backend/executor/nodeAgg.c | 2 ++
src/backend/executor/nodeAppend.c | 10 +++---
src/backend/executor/nodeBitmapAnd.c | 2 ++
src/backend/executor/nodeBitmapHeapscan.c | 4 +++
src/backend/executor/nodeBitmapOr.c | 2 ++
src/backend/executor/nodeCustom.c | 2 ++
src/backend/executor/nodeForeignscan.c | 4 +++
src/backend/executor/nodeGather.c | 2 ++
src/backend/executor/nodeGatherMerge.c | 2 ++
src/backend/executor/nodeGroup.c | 2 ++
src/backend/executor/nodeHash.c | 2 ++
src/backend/executor/nodeHashjoin.c | 4 +++
src/backend/executor/nodeIncrementalSort.c | 2 ++
src/backend/executor/nodeIndexonlyscan.c | 2 ++
src/backend/executor/nodeIndexscan.c | 2 ++
src/backend/executor/nodeLimit.c | 2 ++
src/backend/executor/nodeLockRows.c | 2 ++
src/backend/executor/nodeMaterial.c | 2 ++
src/backend/executor/nodeMemoize.c | 2 ++
src/backend/executor/nodeMergeAppend.c | 4 ++-
src/backend/executor/nodeMergejoin.c | 4 +++
src/backend/executor/nodeModifyTable.c | 13 ++++++++
src/backend/executor/nodeNestloop.c | 4 +++
src/backend/executor/nodeProjectSet.c | 2 ++
src/backend/executor/nodeRecursiveunion.c | 4 +++
src/backend/executor/nodeResult.c | 2 ++
src/backend/executor/nodeSamplescan.c | 2 ++
src/backend/executor/nodeSeqscan.c | 2 ++
src/backend/executor/nodeSetOp.c | 2 ++
src/backend/executor/nodeSort.c | 2 ++
src/backend/executor/nodeSubqueryscan.c | 2 ++
src/backend/executor/nodeTidrangescan.c | 2 ++
src/backend/executor/nodeTidscan.c | 2 ++
src/backend/executor/nodeUnique.c | 2 ++
src/backend/executor/nodeWindowAgg.c | 2 ++
src/backend/executor/spi.c | 1 +
src/backend/tcop/pquery.c | 5 ++-
src/include/executor/execdesc.h | 4 +++
src/include/executor/executor.h | 10 ++++++
src/include/nodes/execnodes.h | 2 ++
src/include/utils/plancache.h | 14 ++++++++
51 files changed, 194 insertions(+), 18 deletions(-)
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 1393716587..0af60463c2 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2126,7 +2126,11 @@ postgresEndForeignModify(EState *estate,
{
PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
- /* If fmstate is NULL, we are in EXPLAIN; nothing to do */
+ /*
+ * fmstate could be NULL under two conditions: during an EXPLAIN
+ * operation, or if BeginForeignModify() hasn't been invoked.
+ * In either case, no action is required.
+ */
if (fmstate == NULL)
return;
@@ -2660,7 +2664,11 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
/* Get info about foreign table. */
rtindex = node->resultRelInfo->ri_RangeTableIndex;
if (fsplan->scan.scanrelid == 0)
+ {
dmstate->rel = ExecOpenScanRelation(estate, rtindex, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return;
+ }
else
dmstate->rel = node->ss.ss_currentRelation;
table = GetForeignTable(RelationGetRelid(dmstate->rel));
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index eaa3172793..0e3547c35b 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -558,7 +558,8 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL,
+ pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index e91920ca14..18b07c0200 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 13217807ee..281c47b2ee 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -572,7 +572,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 535072d181..b287a2e84c 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -797,6 +797,7 @@ execute_sql_string(const char *sql)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index ac2e74fa3f..22b8b820c3 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -408,7 +408,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index f7f18d3054..de7bf7ca67 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -79,7 +79,7 @@ ExecutorEnd_hook_type ExecutorEnd_hook = NULL;
ExecutorCheckPerms_hook_type ExecutorCheckPerms_hook = NULL;
/* decls for local routines only used within this module */
-static void InitPlan(QueryDesc *queryDesc, int eflags);
+static bool InitPlan(QueryDesc *queryDesc, int eflags);
static void CheckValidRowMarkRel(Relation rel, RowMarkType markType);
static void ExecPostprocessPlan(EState *estate);
static void ExecEndPlan(PlanState *planstate, EState *estate);
@@ -263,7 +263,7 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/*
* Initialize the plan state tree
*/
- InitPlan(queryDesc, eflags);
+ (void) InitPlan(queryDesc, eflags);
MemoryContextSwitchTo(oldcontext);
}
@@ -829,9 +829,13 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
*
* Initializes the query plan: open files, allocate storage
* and start up the rule manager
+ *
+ * Returns true if the plan tree is successfully initialized for execution,
+ * false otherwise. The latter case may occur if the CachedPlan that provides
+ * the plan tree (queryDesc->cplan) got invalidated during the initialization.
* ----------------------------------------------------------------
*/
-static void
+static bool
InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
@@ -839,11 +843,14 @@ InitPlan(QueryDesc *queryDesc, int eflags)
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
- PlanState *planstate;
- TupleDesc tupType;
+ PlanState *planstate = NULL;
+ TupleDesc tupType = NULL;
ListCell *l;
int i;
+ Assert(queryDesc->planstate == NULL);
+ Assert(queryDesc->tupDesc == NULL);
+
/*
* Do permissions checks
*/
@@ -855,6 +862,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
estate->es_plannedstmt = plannedstmt;
+ estate->es_cachedplan = queryDesc->cplan;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
@@ -886,6 +894,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
case ROW_MARK_KEYSHARE:
case ROW_MARK_REFERENCE:
relation = ExecGetRangeTableRelation(estate, rc->rti);
+ if (unlikely(relation == NULL))
+ return false;
break;
case ROW_MARK_COPY:
/* no physical table access is required */
@@ -956,6 +966,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_subplanstates = lappend(estate->es_subplanstates,
subplanstate);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return false;
i++;
}
@@ -966,6 +978,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
* processing tuples.
*/
planstate = ExecInitNode(plan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return false;
/*
* Get the tuple descriptor describing the type of tuples to return.
@@ -1009,6 +1023,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
queryDesc->tupDesc = tupType;
queryDesc->planstate = planstate;
+
+ return true;
}
/*
@@ -2858,7 +2874,8 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
* Child EPQ EStates share the parent's copy of unchanging state such as
* the snapshot, rangetable, and external Param info. They need their own
* copies of local state, including a tuple table, es_param_exec_vals,
- * result-rel info, etc.
+ * result-rel info, etc. Also, we don't pass the parent's copy of the
+ * CachedPlan, because no new locks will be taken for EvalPlanQual().
*/
rcestate->es_direction = ForwardScanDirection;
rcestate->es_snapshot = parentestate->es_snapshot;
@@ -2947,6 +2964,13 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
subplanstate = ExecInitNode(subplan, rcestate, 0);
rcestate->es_subplanstates = lappend(rcestate->es_subplanstates,
subplanstate);
+
+ /*
+ * All the necessary locks must already have been taken when
+ * initializing the parent's copy of subplanstate, so the CachedPlan,
+ * if any, should not have become invalid during ExecInitNode().
+ */
+ Assert(ExecPlanStillValid(rcestate));
}
/*
@@ -2988,6 +3012,9 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
*/
epqstate->recheckplanstate = ExecInitNode(planTree, rcestate, 0);
+ /* See the comment above. */
+ Assert(ExecPlanStillValid(rcestate));
+
MemoryContextSwitchTo(oldcontext);
}
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index cc2b8ccab7..457ee46faf 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1248,8 +1248,15 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
- /* Create a QueryDesc for the query. */
+ /*
+ * Set up a QueryDesc for the query. While the leader might have
+ * sourced the plan tree from a CachedPlan, we don't have one here.
+ * That's not a problem: the leader already took the required locks,
+ * so our plan tree is valid. Though we acquire our own copies of the
+ * locks in ExecGetRangeTableRelation(), the leader already holds them all.
+ */
return CreateQueryDesc(pstmt,
+ NULL,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index b4b5c562c0..febaa194c4 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -136,6 +136,10 @@ static bool ExecShutdownNode_walker(PlanState *node, void *context);
* 'eflags' is a bitwise OR of flag bits described in executor.h
*
* Returns a PlanState node corresponding to the given Plan node.
+ *
+ * Upon return, callers should check that ExecPlanStillValid(estate)
+ * returns true before doing anything further with the returned
+ * PlanState, which might otherwise be only partially valid.
* ------------------------------------------------------------------------
*/
PlanState *
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index f55424eb5a..7e452ed743 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -838,6 +838,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest = None_Receiver;
es->qd = CreateQueryDesc(es->stmt,
+ NULL, /* fmgr_sql() doesn't use CachedPlans */
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index af22b1676f..597d68139e 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -3304,6 +3304,8 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
eflags &= ~EXEC_FLAG_REWIND;
outerPlan = outerPlan(node);
outerPlanState(aggstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return aggstate;
/*
* initialize source tuple type.
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index a2af221e05..53ca9dc85d 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -185,8 +185,10 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->ps.resultopsset = true;
appendstate->ps.resultopsfixed = false;
- appendplanstates = (PlanState **) palloc(nplans *
- sizeof(PlanState *));
+ appendplanstates = (PlanState **) palloc0(nplans *
+ sizeof(PlanState *));
+ appendstate->appendplans = appendplanstates;
+ appendstate->as_nplans = nplans;
/*
* call ExecInitNode on each of the valid plans to be executed and save
@@ -221,11 +223,11 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
firstvalid = j;
appendplanstates[j++] = ExecInitNode(initNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return appendstate;
}
appendstate->as_first_partial_plan = firstvalid;
- appendstate->appendplans = appendplanstates;
- appendstate->as_nplans = nplans;
/* Initialize async state */
appendstate->as_asyncplans = asyncplans;
diff --git a/src/backend/executor/nodeBitmapAnd.c b/src/backend/executor/nodeBitmapAnd.c
index 4abb0609a0..7556be713c 100644
--- a/src/backend/executor/nodeBitmapAnd.c
+++ b/src/backend/executor/nodeBitmapAnd.c
@@ -89,6 +89,8 @@ ExecInitBitmapAnd(BitmapAnd *node, EState *estate, int eflags)
{
initNode = (Plan *) lfirst(l);
bitmapplanstates[i] = ExecInitNode(initNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return bitmapandstate;
i++;
}
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index d3f58c22f9..f1f8e16b17 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -770,11 +770,15 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return scanstate;
/*
* initialize child nodes
*/
outerPlanState(scanstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return scanstate;
/*
* get the scan type from the relation descriptor.
diff --git a/src/backend/executor/nodeBitmapOr.c b/src/backend/executor/nodeBitmapOr.c
index ace18593aa..7d2bf45d9c 100644
--- a/src/backend/executor/nodeBitmapOr.c
+++ b/src/backend/executor/nodeBitmapOr.c
@@ -90,6 +90,8 @@ ExecInitBitmapOr(BitmapOr *node, EState *estate, int eflags)
{
initNode = (Plan *) lfirst(l);
bitmapplanstates[i] = ExecInitNode(initNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return bitmaporstate;
i++;
}
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
index 28b5bb9353..a0befbd0c6 100644
--- a/src/backend/executor/nodeCustom.c
+++ b/src/backend/executor/nodeCustom.c
@@ -61,6 +61,8 @@ ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
if (scanrelid > 0)
{
scan_rel = ExecOpenScanRelation(estate, scanrelid, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return css;
css->ss.ss_currentRelation = scan_rel;
}
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 3aba28285a..336acff719 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -173,6 +173,8 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
if (scanrelid > 0)
{
currentRelation = ExecOpenScanRelation(estate, scanrelid, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return scanstate;
scanstate->ss.ss_currentRelation = currentRelation;
fdwroutine = GetFdwRoutineForRelation(currentRelation, true);
}
@@ -264,6 +266,8 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
if (outerPlan(node))
outerPlanState(scanstate) =
ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return scanstate;
/*
* Tell the FDW to initialize the scan.
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 1a3c8abdad..c524022c04 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -89,6 +89,8 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
*/
outerNode = outerPlan(node);
outerPlanState(gatherstate) = ExecInitNode(outerNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return gatherstate;
tupDesc = ExecGetResultType(outerPlanState(gatherstate));
/*
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index c6fb45fee0..676faabef5 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -108,6 +108,8 @@ ExecInitGatherMerge(GatherMerge *node, EState *estate, int eflags)
*/
outerNode = outerPlan(node);
outerPlanState(gm_state) = ExecInitNode(outerNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return gm_state;
/*
* Leader may access ExecProcNode result directly (if
diff --git a/src/backend/executor/nodeGroup.c b/src/backend/executor/nodeGroup.c
index 6dfe5a1d23..efa1c44ab4 100644
--- a/src/backend/executor/nodeGroup.c
+++ b/src/backend/executor/nodeGroup.c
@@ -185,6 +185,8 @@ ExecInitGroup(Group *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(grpstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return grpstate;
/*
* Initialize scan slot and type.
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 88ba336882..1a4bd5504e 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -386,6 +386,8 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(hashstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return hashstate;
/*
* initialize our result slot and type. No need to build projection
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 6dc43b9ff2..c0919074b0 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -752,8 +752,12 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
hashNode = (Hash *) innerPlan(node);
outerPlanState(hjstate) = ExecInitNode(outerNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return hjstate;
outerDesc = ExecGetResultType(outerPlanState(hjstate));
innerPlanState(hjstate) = ExecInitNode((Plan *) hashNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return hjstate;
innerDesc = ExecGetResultType(innerPlanState(hjstate));
/*
diff --git a/src/backend/executor/nodeIncrementalSort.c b/src/backend/executor/nodeIncrementalSort.c
index 28a0e81cb3..621ffafe02 100644
--- a/src/backend/executor/nodeIncrementalSort.c
+++ b/src/backend/executor/nodeIncrementalSort.c
@@ -1041,6 +1041,8 @@ ExecInitIncrementalSort(IncrementalSort *node, EState *estate, int eflags)
* nodes may be able to do something more useful.
*/
outerPlanState(incrsortstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return incrsortstate;
/*
* Initialize scan slot and type.
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 1f3843abe9..c555c14888 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -495,6 +495,8 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return indexstate;
indexstate->ss.ss_currentRelation = currentRelation;
indexstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 32e1714f15..a3bd1f7fb0 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -908,6 +908,8 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return indexstate;
indexstate->ss.ss_currentRelation = currentRelation;
indexstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index a97bac9f6d..ab133f1580 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -476,6 +476,8 @@ ExecInitLimit(Limit *node, EState *estate, int eflags)
*/
outerPlan = outerPlan(node);
outerPlanState(limitstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return limitstate;
/*
* initialize child expressions
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 26fbe95c57..e1ef768571 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -322,6 +322,8 @@ ExecInitLockRows(LockRows *node, EState *estate, int eflags)
* then initialize outer plan
*/
outerPlanState(lrstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return lrstate;
/* node returns unmodified slots from the outer plan */
lrstate->ps.resultopsset = true;
diff --git a/src/backend/executor/nodeMaterial.c b/src/backend/executor/nodeMaterial.c
index 03c514900b..c38eef099d 100644
--- a/src/backend/executor/nodeMaterial.c
+++ b/src/backend/executor/nodeMaterial.c
@@ -214,6 +214,8 @@ ExecInitMaterial(Material *node, EState *estate, int eflags)
outerPlan = outerPlan(node);
outerPlanState(matstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return matstate;
/*
* Initialize result type and slot. No need to initialize projection info
diff --git a/src/backend/executor/nodeMemoize.c b/src/backend/executor/nodeMemoize.c
index ee4749c852..a6bf66029c 100644
--- a/src/backend/executor/nodeMemoize.c
+++ b/src/backend/executor/nodeMemoize.c
@@ -938,6 +938,8 @@ ExecInitMemoize(Memoize *node, EState *estate, int eflags)
outerNode = outerPlan(node);
outerPlanState(mstate) = ExecInitNode(outerNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return mstate;
/*
* Initialize return slot and type. No need to initialize projection info
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 0a42a04b19..52c3edf278 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -120,7 +120,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ms_prune_state = NULL;
}
- mergeplanstates = (PlanState **) palloc(nplans * sizeof(PlanState *));
+ mergeplanstates = (PlanState **) palloc0(nplans * sizeof(PlanState *));
mergestate->mergeplans = mergeplanstates;
mergestate->ms_nplans = nplans;
@@ -151,6 +151,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
Plan *initNode = (Plan *) list_nth(node->mergeplans, i);
mergeplanstates[j++] = ExecInitNode(initNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return mergestate;
}
mergestate->ps.ps_ProjInfo = NULL;
diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index c84f53e0bd..887f519b10 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -1490,11 +1490,15 @@ ExecInitMergeJoin(MergeJoin *node, EState *estate, int eflags)
mergestate->mj_SkipMarkRestore = node->skip_mark_restore;
outerPlanState(mergestate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return mergestate;
outerDesc = ExecGetResultType(outerPlanState(mergestate));
innerPlanState(mergestate) = ExecInitNode(innerPlan(node), estate,
mergestate->mj_SkipMarkRestore ?
eflags :
(eflags | EXEC_FLAG_MARK));
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return mergestate;
innerDesc = ExecGetResultType(innerPlanState(mergestate));
/*
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index ea043c57c1..95d909c1d0 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -3985,6 +3985,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
linitial_int(node->resultRelations));
}
+ /*
+ * ExecInitResultRelation() may have returned without initializing
+ * rootResultRelInfo if the plan got invalidated, so check.
+ */
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return mtstate;
+
/* set up epqstate with dummy subplan data for the moment */
EvalPlanQualInit(&mtstate->mt_epqstate, estate, NULL, NIL,
node->epqParam, node->resultRelations);
@@ -4013,6 +4020,10 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
{
ExecInitResultRelation(estate, resultRelInfo, resultRelation);
+ /* See the comment above. */
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return mtstate;
+
/*
* For child result relations, store the root result relation
* pointer. We do so for the convenience of places that want to
@@ -4039,6 +4050,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* Now we may initialize the subplan.
*/
outerPlanState(mtstate) = ExecInitNode(subplan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return mtstate;
/*
* Do additional per-result-relation initialization.
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index 1211d871ea..8d67d17e10 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -295,11 +295,15 @@ ExecInitNestLoop(NestLoop *node, EState *estate, int eflags)
* values.
*/
outerPlanState(nlstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return nlstate;
if (node->nestParams == NIL)
eflags |= EXEC_FLAG_REWIND;
else
eflags &= ~EXEC_FLAG_REWIND;
innerPlanState(nlstate) = ExecInitNode(innerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return nlstate;
/*
* Initialize result slot, type and projection.
diff --git a/src/backend/executor/nodeProjectSet.c b/src/backend/executor/nodeProjectSet.c
index e9b96416d3..706cc23a21 100644
--- a/src/backend/executor/nodeProjectSet.c
+++ b/src/backend/executor/nodeProjectSet.c
@@ -247,6 +247,8 @@ ExecInitProjectSet(ProjectSet *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(state) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return state;
/*
* we don't use inner plan
diff --git a/src/backend/executor/nodeRecursiveunion.c b/src/backend/executor/nodeRecursiveunion.c
index f6d60bcd6c..27dc318acb 100644
--- a/src/backend/executor/nodeRecursiveunion.c
+++ b/src/backend/executor/nodeRecursiveunion.c
@@ -244,7 +244,11 @@ ExecInitRecursiveUnion(RecursiveUnion *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(rustate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return rustate;
innerPlanState(rustate) = ExecInitNode(innerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return rustate;
/*
* If hashing, precompute fmgr lookup data for inner loop, and create the
diff --git a/src/backend/executor/nodeResult.c b/src/backend/executor/nodeResult.c
index f15902e840..6820d3bfd5 100644
--- a/src/backend/executor/nodeResult.c
+++ b/src/backend/executor/nodeResult.c
@@ -208,6 +208,8 @@ ExecInitResult(Result *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(resstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return resstate;
/*
* we don't use inner plan
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index a6813559e6..02051fea51 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -125,6 +125,8 @@ ExecInitSampleScan(SampleScan *node, EState *estate, int eflags)
ExecOpenScanRelation(estate,
node->scan.scanrelid,
eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return scanstate;
/* we won't set up the HeapScanDesc till later */
scanstate->ss.ss_currentScanDesc = NULL;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 911266da07..9e3ef94388 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -153,6 +153,8 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
ExecOpenScanRelation(estate,
node->scan.scanrelid,
eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return scanstate;
/* and create slot with the appropriate rowtype */
ExecInitScanTupleSlot(estate, &scanstate->ss,
diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index 5c2861d243..475af4df24 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -528,6 +528,8 @@ ExecInitSetOp(SetOp *node, EState *estate, int eflags)
if (node->strategy == SETOP_HASHED)
eflags &= ~EXEC_FLAG_REWIND;
outerPlanState(setopstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return setopstate;
outerDesc = ExecGetResultType(outerPlanState(setopstate));
/*
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index c8a35b64a8..9de717aa7c 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -263,6 +263,8 @@ ExecInitSort(Sort *node, EState *estate, int eflags)
eflags &= ~(EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK);
outerPlanState(sortstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return sortstate;
/*
* Initialize scan slot and type.
diff --git a/src/backend/executor/nodeSubqueryscan.c b/src/backend/executor/nodeSubqueryscan.c
index 91d7ae82ce..d9c10d1f6f 100644
--- a/src/backend/executor/nodeSubqueryscan.c
+++ b/src/backend/executor/nodeSubqueryscan.c
@@ -124,6 +124,8 @@ ExecInitSubqueryScan(SubqueryScan *node, EState *estate, int eflags)
* initialize subquery
*/
subquerystate->subplan = ExecInitNode(node->subplan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return subquerystate;
/*
* Initialize scan slot and type (needed by ExecAssignScanProjectionInfo)
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 9147e4afa8..a7482aee50 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -378,6 +378,8 @@ ExecInitTidRangeScan(TidRangeScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return tidrangestate;
tidrangestate->ss.ss_currentRelation = currentRelation;
tidrangestate->ss.ss_currentScanDesc = NULL; /* no table scan here */
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index 74ec6afdcc..657411ef19 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -523,6 +523,8 @@ ExecInitTidScan(TidScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return tidstate;
tidstate->ss.ss_currentRelation = currentRelation;
tidstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
diff --git a/src/backend/executor/nodeUnique.c b/src/backend/executor/nodeUnique.c
index 13c556326a..ee30688417 100644
--- a/src/backend/executor/nodeUnique.c
+++ b/src/backend/executor/nodeUnique.c
@@ -136,6 +136,8 @@ ExecInitUnique(Unique *node, EState *estate, int eflags)
* then initialize outer plan
*/
outerPlanState(uniquestate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return uniquestate;
/*
* Initialize result slot and type. Unique nodes do no projections, so
diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index c4c6f009ba..1246d7919a 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -2461,6 +2461,8 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags)
*/
outerPlan = outerPlan(node);
outerPlanState(winstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return winstate;
/*
* initialize source tuple type (which is also the tuple type that we'll
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 33975687b3..f2cca807ef 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -2668,6 +2668,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
snap = InvalidSnapshot;
qdesc = CreateQueryDesc(stmt,
+ NULL,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5565f200c3..4ef349df8b 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -65,6 +65,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +78,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->cplan = cplan; /* CachedPlan, if plan is from one */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -145,7 +147,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, NULL, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -493,6 +495,7 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ NULL,
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index af2bf36dfb..4b7368a0dc 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -32,9 +32,12 @@
*/
typedef struct QueryDesc
{
+ NodeTag type;
+
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ struct CachedPlan *cplan; /* CachedPlan, if plannedstmt is from one */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +60,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ struct CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index aeebe0e0ff..72cbf120c5 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -19,6 +19,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/plancache.h"
/*
@@ -256,6 +257,15 @@ extern void ExecEndNode(PlanState *node);
extern void ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * Is the CachedPlan in es_cachedplan still valid?
+ */
+static inline bool
+ExecPlanStillValid(EState *estate)
+{
+ return estate->es_cachedplan == NULL ? true :
+ CachedPlanStillValid(estate->es_cachedplan);
+}
/* ----------------------------------------------------------------
* ExecProcNode
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index cb714f4a19..846eb32a1d 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -623,6 +623,8 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ struct CachedPlan *es_cachedplan; /* CachedPlan if plannedstmt is from
+ * one, or NULL if not */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 916e59d9fe..0a9e041d51 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -221,6 +221,20 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
QueryEnvironment *queryEnv);
+
+/*
+ * CachedPlanStillValid
+ * Returns whether a cached generic plan is still valid
+ *
+ * Invoked by the executor for each relation lock acquired during the
+ * initialization of the plan tree within the CachedPlan.
+ */
+static inline bool
+CachedPlanStillValid(CachedPlan *cplan)
+{
+ return cplan->is_valid;
+}
+
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
--
2.35.3
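To tie the pieces of the patch above together: the caller hands its
CachedPlan (or NULL, e.g. for one-shot plans) to CreateQueryDesc(), the
executor stashes the pointer in the EState, and each ExecInit*() routine
re-checks plan validity after every step that can take a new lock. A
condensed sketch in C, with the QueryDesc-to-EState hand-off assumed to
happen during ExecutorStart()/InitPlan() (that part is not in the hunks
shown above):

    /* Caller: pass the CachedPlan, or NULL if not executing a cached plan */
    qdesc = CreateQueryDesc(plannedstmt, cplan, sourceText,
                            snapshot, InvalidSnapshot,
                            dest, params, queryEnv, 0);

    /* Executor startup (assumed, not shown in the hunks above): */
    estate->es_cachedplan = qdesc->cplan;

    /* Each ExecInit*() routine, after any lock-taking step: */
    outerPlanState(state) = ExecInitNode(outerPlan(node), estate, eflags);
    if (unlikely(!ExecPlanStillValid(estate)))
        return state;       /* invalidation detected; caller must replan */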
Attachment: v48-0007-Delay-locking-of-child-tables-in-cached-plans-un.patch
From 8c0ff924890d173ddd9c6c087a725aeaccd210a0 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Wed, 6 Sep 2023 17:54:15 +0900
Subject: [PATCH v48 7/8] Delay locking of child tables in cached plans until
ExecutorStart()
Currently, GetCachedPlan() takes a lock on all relations contained in
a cached plan before returning it as a valid plan to its callers for
execution. One disadvantage is that if the plan contains partitions
that are prunable with conditions involving EXTERN parameters and
other stable expressions (known as "initial pruning"), many of them
would be locked unnecessarily, because only those that survive
initial pruning need to have been locked. Locking all partitions this
way causes significant delay when there are many partitions. Note
that initial pruning occurs during executor's initialization of the
plan, that is, ExecInitNode().
Previous commits have made all the necessary adjustments to make the
executor lock child tables, to detect invalidation of the CachedPlan
resulting from that, and to retry the execution with a new CachedPlan.
So, this commit simply removes the code in plancache.c that does the
"for execution" locking, aka AcquireExecutorLocks().
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
src/backend/executor/spi.c | 2 +-
src/backend/tcop/pquery.c | 6 +-
src/backend/utils/cache/plancache.c | 154 +++++++----------
src/test/modules/delay_execution/Makefile | 3 +-
.../modules/delay_execution/delay_execution.c | 67 +++++++-
.../expected/cached-plan-replan.out | 158 ++++++++++++++++++
.../specs/cached-plan-replan.spec | 61 +++++++
7 files changed, 343 insertions(+), 108 deletions(-)
create mode 100644 src/test/modules/delay_execution/expected/cached-plan-replan.out
create mode 100644 src/test/modules/delay_execution/specs/cached-plan-replan.spec
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 814ff1390f..9c4ed74240 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -2680,7 +2680,7 @@ replan:
snap = InvalidSnapshot;
qdesc = CreateQueryDesc(stmt,
- NULL,
+ cplan,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index fcf9925ed4..8d0772ae29 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -412,7 +412,7 @@ PortalStart(Portal portal, ParamListInfo params,
* set the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
- NULL,
+ portal->cplan,
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -443,6 +443,7 @@ PortalStart(Portal portal, ParamListInfo params,
*/
if (!ExecutorStart(queryDesc, myeflags))
{
+ Assert(queryDesc->cplan);
ExecutorEnd(queryDesc);
FreeQueryDesc(queryDesc);
PopActiveSnapshot();
@@ -542,7 +543,7 @@ PortalStart(Portal portal, ParamListInfo params,
* PortalRunMulti() before calling ExecutorRun().
*/
queryDesc = CreateQueryDesc(plan,
- NULL,
+ portal->cplan,
portal->sourceText,
!is_utility ?
GetActiveSnapshot() :
@@ -566,6 +567,7 @@ PortalStart(Portal portal, ParamListInfo params,
if (!ExecutorStart(queryDesc, myeflags))
{
PopActiveSnapshot();
+ Assert(queryDesc->cplan);
ExecutorEnd(queryDesc);
FreeQueryDesc(queryDesc);
plan_valid = false;
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 7d4168f82f..35d903cb98 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -104,13 +104,13 @@ static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool GenericPlanIsValid(CachedPlan *cplan);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -792,8 +792,13 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
*
- * On a "true" return, we have acquired the locks needed to run the plan.
- * (We must do this for the "true" result to be race-condition-free.)
+ * If the plan includes child relations introduced by the planner, they
+ * wouldn't be locked yet. This is because AcquirePlannerLocks() only locks
+ * relations present in the original query's range table (before planner
+ * entry). Hence, the plan might become stale if child relations are modified
+ * concurrently. During the plan initialization, the executor must ensure the
+ * plan (CachedPlan) remains valid after locking each child table. If found
+ * invalid, the caller should be prompted to recreate the plan.
*/
static bool
CheckCachedPlan(CachedPlanSource *plansource)
@@ -807,60 +812,56 @@ CheckCachedPlan(CachedPlanSource *plansource)
if (!plan)
return false;
- Assert(plan->magic == CACHEDPLAN_MAGIC);
- /* Generic plans are never one-shot */
- Assert(!plan->is_oneshot);
+ if (GenericPlanIsValid(plan))
+ return true;
/*
- * If plan isn't valid for current role, we can't use it.
+ * Plan has been invalidated, so unlink it from the parent and release it.
*/
- if (plan->is_valid && plan->dependsOnRole &&
- plan->planRoleId != GetUserId())
- plan->is_valid = false;
+ ReleaseGenericPlan(plansource);
- /*
- * If it appears valid, acquire locks and recheck; this is much the same
- * logic as in RevalidateCachedQuery, but for a plan.
- */
- if (plan->is_valid)
+ return false;
+}
+
+/*
+ * GenericPlanIsValid
+ * Is a generic plan still valid?
+ *
+ * It may have gone stale due to concurrent schema modifications of relations
+ * mentioned in the plan or a couple of other things mentioned below.
+ */
+static bool
+GenericPlanIsValid(CachedPlan *cplan)
+{
+ Assert(cplan != NULL);
+ Assert(cplan->magic == CACHEDPLAN_MAGIC);
+ /* Generic plans are never one-shot */
+ Assert(!cplan->is_oneshot);
+
+ if (cplan->is_valid)
{
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
- Assert(plan->refcount > 0);
-
- AcquireExecutorLocks(plan->stmt_list, true);
+ Assert(cplan->refcount > 0);
/*
- * If plan was transient, check to see if TransactionXmin has
- * advanced, and if so invalidate it.
+ * If plan isn't valid for current role, we can't use it.
*/
- if (plan->is_valid &&
- TransactionIdIsValid(plan->saved_xmin) &&
- !TransactionIdEquals(plan->saved_xmin, TransactionXmin))
- plan->is_valid = false;
+ if (cplan->dependsOnRole && cplan->planRoleId != GetUserId())
+ cplan->is_valid = false;
/*
- * By now, if any invalidation has happened, the inval callback
- * functions will have marked the plan invalid.
+ * If plan was transient, check to see if TransactionXmin has
+ * advanced, and if so invalidate it.
*/
- if (plan->is_valid)
- {
- /* Successfully revalidated and locked the query. */
- return true;
- }
-
- /* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ if (TransactionIdIsValid(cplan->saved_xmin) &&
+ !TransactionIdEquals(cplan->saved_xmin, TransactionXmin))
+ cplan->is_valid = false;
}
- /*
- * Plan has been invalidated, so unlink it from the parent and release it.
- */
- ReleaseGenericPlan(plansource);
-
- return false;
+ return cplan->is_valid;
}
/*
@@ -1130,8 +1131,16 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
- * On return, the plan is valid and we have sufficient locks to begin
- * execution.
+ * Typically, the plan returned by this function is valid. However, a caveat
+ * arises with inheritance/partition child tables. These aren't locked by
+ * this function, as we only lock tables directly mentioned in the original
+ * query here. The task of locking these child tables falls to the executor
+ * during plan tree setup. If acquiring these locks invalidates the plan, the
+ * executor should inform the caller to regenerate the plan by invoking this
+ * function again. The reason for this deferred child table locking mechanism
+ * is efficiency: not all might need to be locked. Some could be pruned during
+ * executor initialization, especially if their corresponding plan nodes
+ * facilitate partition pruning.
*
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
@@ -1166,7 +1175,10 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
{
if (CheckCachedPlan(plansource))
{
- /* We want a generic plan, and we already have a valid one */
+ /*
+ * We want a generic plan, and we already have a valid one, though
+ * see the header comment.
+ */
plan = plansource->gplan;
Assert(plan->magic == CACHEDPLAN_MAGIC);
}
@@ -1364,8 +1376,8 @@ CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
}
/*
- * Reject if AcquireExecutorLocks would have anything to do. This is
- * probably unnecessary given the previous check, but let's be safe.
+ * Reject if the executor would need to take additional locks, that is, in
+ * addition to those taken by AcquirePlannerLocks() on a given query.
*/
foreach(lc, plan->stmt_list)
{
@@ -1741,58 +1753,6 @@ QueryListGetPrimaryStmt(List *stmts)
return NULL;
}
-/*
- * AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
- */
-static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
-{
- ListCell *lc1;
-
- foreach(lc1, stmt_list)
- {
- PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
-
- if (plannedstmt->commandType == CMD_UTILITY)
- {
- /*
- * Ignore utility statements, except those (such as EXPLAIN) that
- * contain a parsed-but-not-planned query. Note: it's okay to use
- * ScanQueryForLocks, even though the query hasn't been through
- * rule rewriting, because rewriting doesn't change the query
- * representation.
- */
- Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
-
- if (query)
- ScanQueryForLocks(query, acquire);
- continue;
- }
-
- foreach(lc2, plannedstmt->rtable)
- {
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
-
- if (!(rte->rtekind == RTE_RELATION ||
- (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
- continue;
-
- /*
- * Acquire the appropriate type of lock on each relation OID. Note
- * that we don't actually try to open the rel, and hence will not
- * fail if it's been dropped entirely --- we'll just transiently
- * acquire a non-conflicting lock.
- */
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
- }
- }
-}
-
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/test/modules/delay_execution/Makefile b/src/test/modules/delay_execution/Makefile
index 70f24e846d..2fca84d027 100644
--- a/src/test/modules/delay_execution/Makefile
+++ b/src/test/modules/delay_execution/Makefile
@@ -8,7 +8,8 @@ OBJS = \
delay_execution.o
ISOLATION = partition-addition \
- partition-removal-1
+ partition-removal-1 \
+ cached-plan-replan
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/src/test/modules/delay_execution/delay_execution.c b/src/test/modules/delay_execution/delay_execution.c
index 7cd76eb34b..ce189156ad 100644
--- a/src/test/modules/delay_execution/delay_execution.c
+++ b/src/test/modules/delay_execution/delay_execution.c
@@ -1,14 +1,18 @@
/*-------------------------------------------------------------------------
*
* delay_execution.c
- * Test module to allow delay between parsing and execution of a query.
+ * Test module to introduce delay at various points during execution of a
+ * query to test that execution proceeds safely in light of concurrent
+ * changes.
*
* The delay is implemented by taking and immediately releasing a specified
* advisory lock. If another process has previously taken that lock, the
* current process will be blocked until the lock is released; otherwise,
* there's no effect. This allows an isolationtester script to reliably
- * test behaviors where some specified action happens in another backend
- * between parsing and execution of any desired query.
+ * test behaviors where some specified action happens in another backend in
+ * a couple of cases: 1) between parsing and execution of any desired query
+ * when using the planner_hook, 2) between RevalidateCachedQuery() and
+ * ExecutorStart() when using the ExecutorStart_hook.
*
* Copyright (c) 2020-2023, PostgreSQL Global Development Group
*
@@ -22,6 +26,7 @@
#include <limits.h>
+#include "executor/executor.h"
#include "optimizer/planner.h"
#include "utils/builtins.h"
#include "utils/guc.h"
@@ -32,9 +37,11 @@ PG_MODULE_MAGIC;
/* GUC: advisory lock ID to use. Zero disables the feature. */
static int post_planning_lock_id = 0;
+static int executor_start_lock_id = 0;
-/* Save previous planner hook user to be a good citizen */
+/* Save previous hook users to be a good citizen */
static planner_hook_type prev_planner_hook = NULL;
+static ExecutorStart_hook_type prev_ExecutorStart_hook = NULL;
/* planner_hook function to provide the desired delay */
@@ -70,11 +77,45 @@ delay_execution_planner(Query *parse, const char *query_string,
return result;
}
+/* ExecutorStart_hook function to provide the desired delay */
+static bool
+delay_execution_ExecutorStart(QueryDesc *queryDesc, int eflags)
+{
+ bool plan_valid;
+
+ /* If enabled, delay by taking and releasing the specified lock */
+ if (executor_start_lock_id != 0)
+ {
+ DirectFunctionCall1(pg_advisory_lock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+ DirectFunctionCall1(pg_advisory_unlock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+
+ /*
+ * Ensure that we notice any pending invalidations, since the advisory
+ * lock functions don't do this.
+ */
+ AcceptInvalidationMessages();
+ }
+
+ /* Now start the executor, possibly via a previous hook user */
+ if (prev_ExecutorStart_hook)
+ plan_valid = prev_ExecutorStart_hook(queryDesc, eflags);
+ else
+ plan_valid = standard_ExecutorStart(queryDesc, eflags);
+
+ if (executor_start_lock_id != 0)
+ elog(NOTICE, "Finished ExecutorStart(): CachedPlan is %s",
+ plan_valid ? "valid" : "not valid");
+
+ return plan_valid;
+}
+
/* Module load function */
void
_PG_init(void)
{
- /* Set up the GUC to control which lock is used */
+ /* Set up GUCs to control which lock is used */
DefineCustomIntVariable("delay_execution.post_planning_lock_id",
"Sets the advisory lock ID to be locked/unlocked after planning.",
"Zero disables the delay.",
@@ -86,10 +127,22 @@ _PG_init(void)
NULL,
NULL,
NULL);
-
+ DefineCustomIntVariable("delay_execution.executor_start_lock_id",
+ "Sets the advisory lock ID to be locked/unlocked before starting execution.",
+ "Zero disables the delay.",
+ &executor_start_lock_id,
+ 0,
+ 0, INT_MAX,
+ PGC_USERSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
MarkGUCPrefixReserved("delay_execution");
- /* Install our hook */
+ /* Install our hooks. */
prev_planner_hook = planner_hook;
planner_hook = delay_execution_planner;
+ prev_ExecutorStart_hook = ExecutorStart_hook;
+ ExecutorStart_hook = delay_execution_ExecutorStart;
}
diff --git a/src/test/modules/delay_execution/expected/cached-plan-replan.out b/src/test/modules/delay_execution/expected/cached-plan-replan.out
new file mode 100644
index 0000000000..122d81f2ee
--- /dev/null
+++ b/src/test/modules/delay_execution/expected/cached-plan-replan.out
@@ -0,0 +1,158 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1prep s2lock s1exec s2dropi s2unlock
+step s1prep: SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1 FOR UPDATE;
+ EXPLAIN (COSTS OFF) EXECUTE q (1);
+QUERY PLAN
+----------------------------------------------
+LockRows
+ -> Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo11 foo_1
+ Recheck Cond: (a = $1)
+ -> Bitmap Index Scan on foo11_a
+ Index Cond: (a = $1)
+(7 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+-----------------------------------
+LockRows
+ -> Append
+ Subplans Removed: 1
+ -> Seq Scan on foo11 foo_1
+ Filter: (a = $1)
+(5 rows)
+
+
+starting permutation: s1prep2 s2lock s1exec2 s2dropi s2unlock
+step s1prep2: SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = 1;
+ EXPLAIN (COSTS OFF) EXECUTE q2;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+----------------------------------
+Bitmap Heap Scan on foo11 foo
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on foo11_a
+ Index Cond: (a = 1)
+(4 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec2: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec2: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------
+Seq Scan on foo11 foo
+ Filter: (a = 1)
+(2 rows)
+
+
+starting permutation: s1prep3 s2lock s1exec3 s2dropi s2unlock
+step s1prep3: SET plan_cache_mode = force_generic_plan;
+ SET enable_partitionwise_aggregate = on;
+ SET enable_partitionwise_join = on;
+ PREPARE q3 AS SELECT t1.a, count(t2.b) FROM foo t1, foo t2 WHERE t1.a = t2.a GROUP BY 1;
+ EXPLAIN (COSTS OFF) EXECUTE q3;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+------------------------------------------------------------
+Append
+ -> GroupAggregate
+ Group Key: t1.a
+ -> Merge Join
+ Merge Cond: (t1.a = t2.a)
+ -> Index Only Scan using foo11_a on foo11 t1
+ -> Materialize
+ -> Index Scan using foo11_a on foo11 t2
+ -> GroupAggregate
+ Group Key: t1_1.a
+ -> Merge Join
+ Merge Cond: (t1_1.a = t2_1.a)
+ -> Sort
+ Sort Key: t1_1.a
+ -> Seq Scan on foo2 t1_1
+ -> Sort
+ Sort Key: t2_1.a
+ -> Seq Scan on foo2 t2_1
+(18 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec3: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec3: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------------------------------
+Append
+ -> GroupAggregate
+ Group Key: t1.a
+ -> Merge Join
+ Merge Cond: (t1.a = t2.a)
+ -> Sort
+ Sort Key: t1.a
+ -> Seq Scan on foo11 t1
+ -> Sort
+ Sort Key: t2.a
+ -> Seq Scan on foo11 t2
+ -> GroupAggregate
+ Group Key: t1_1.a
+ -> Merge Join
+ Merge Cond: (t1_1.a = t2_1.a)
+ -> Sort
+ Sort Key: t1_1.a
+ -> Seq Scan on foo2 t1_1
+ -> Sort
+ Sort Key: t2_1.a
+ -> Seq Scan on foo2 t2_1
+(21 rows)
+
diff --git a/src/test/modules/delay_execution/specs/cached-plan-replan.spec b/src/test/modules/delay_execution/specs/cached-plan-replan.spec
new file mode 100644
index 0000000000..2d0607b176
--- /dev/null
+++ b/src/test/modules/delay_execution/specs/cached-plan-replan.spec
@@ -0,0 +1,61 @@
+# Test to check that invalidation of cached generic plans during ExecutorStart
+# correctly triggers replanning and re-execution.
+
+setup
+{
+ CREATE TABLE foo (a int, b text) PARTITION BY LIST(a);
+ CREATE TABLE foo1 PARTITION OF foo FOR VALUES IN (1) PARTITION BY LIST (a);
+ CREATE TABLE foo11 PARTITION OF foo1 FOR VALUES IN (1);
+ CREATE INDEX foo11_a ON foo11 (a);
+ CREATE TABLE foo2 PARTITION OF foo FOR VALUES IN (2);
+ CREATE VIEW foov AS SELECT * FROM foo;
+}
+
+teardown
+{
+ DROP VIEW foov;
+ DROP TABLE foo;
+}
+
+session "s1"
+# Append with run-time pruning
+step "s1prep" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1 FOR UPDATE;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+
+# no Append case (only one partition selected by the planner)
+step "s1prep2" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = 1;
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+
+# Append with partition-wise join aggregate and join plans as child subplans
+step "s1prep3" { SET plan_cache_mode = force_generic_plan;
+ SET enable_partitionwise_aggregate = on;
+ SET enable_partitionwise_join = on;
+ PREPARE q3 AS SELECT t1.a, count(t2.b) FROM foo t1, foo t2 WHERE t1.a = t2.a GROUP BY 1;
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+
+# Executes a generic plan
+step "s1exec" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+step "s1exec2" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+step "s1exec3" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+
+session "s2"
+step "s2lock" { SELECT pg_advisory_lock(12345); }
+step "s2unlock" { SELECT pg_advisory_unlock(12345); }
+step "s2dropi" { DROP INDEX foo11_a; }
+
+# While "s1exec", etc. wait to acquire the advisory lock, "s2drop" is able to
+# drop the index being used in the cached plan. When "s1exec" is then
+# unblocked and initializes the cached plan for execution, it detects the
+# concurrent index drop and causes the cached plan to be discarded and
+# recreated without the index.
+permutation "s1prep" "s2lock" "s1exec" "s2dropi" "s2unlock"
+permutation "s1prep2" "s2lock" "s1exec2" "s2dropi" "s2unlock"
+permutation "s1prep3" "s2lock" "s1exec3" "s2dropi" "s2unlock"
--
2.35.3
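Taken together with the executor-side changes, the caller-side control
flow that replaces AcquireExecutorLocks() amounts to a retry loop. Here
is a condensed sketch of what the patched PortalStart() and
_SPI_execute_plan() effectively do (details and error handling elided;
the retry mirrors the replan: label visible in the spi.c hunk above):

    replan:
        cplan = GetCachedPlan(plansource, params, owner, queryEnv);
        /* ... pick the PlannedStmt of interest out of cplan->stmt_list ... */
        qdesc = CreateQueryDesc(stmt, cplan, plansource->query_string,
                                snap, crosscheck_snapshot,
                                dest, params, queryEnv, 0);
        if (!ExecutorStart(qdesc, eflags))
        {
            /*
             * Plan tree initialization locked a child table and found the
             * CachedPlan invalidated; discard the half-built executor
             * state and build a fresh plan.
             */
            ExecutorEnd(qdesc);
            FreeQueryDesc(qdesc);
            ReleaseCachedPlan(cplan, owner);
            goto replan;
        }
        /* All locks needed by the surviving plan are now held; run it. */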
Attachment: v48-0006-Add-field-to-store-parent-relids-to-Append-Merge.patch
From bf5f136b6c6eaf278f2aefde525074afbef3958f Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Wed, 6 Sep 2023 17:54:02 +0900
Subject: [PATCH v48 6/8] Add field to store parent relids to
Append/MergeAppend
There's no way currently in the executor to tell if the child
subplans of Append/MergeAppend are scanning partitions, and if
they indeed do, what the RT indexes of their parent/ancestor tables
are. Executor doesn't need to see their RT indexes except for
run-time pruning, in which case they can can be found in the
PartitionPruneInfo. A future commit will create a need for them to
be available at all times for the purpose of locking those
parent/ancestor tables when executing a cached plan, so add a
field called allpartrelids to Append/MergeAppend to store those
RT indexes. This also adds a function called
ExecLockAppendNonLeafTables() to lock those tables.
The code to look up partitioned parent relids for a given list of
partition scan subpaths of an Append/MergeAppend is already present
in make_partition_pruneinfo() but it's local to partprune.c. This
commit refactors that code into its own function called
add_append_subpath_partrelids() defined in appendinfo.c and
generalizes it to consider child join and aggregate paths. To
facilitate looking up of parent rels of child grouping rels in
add_append_subpath_partrelids(), parent links are now also set in
the RelOptInfos of child grouping rels too, like they are in
those of child base and join rels.
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
src/backend/executor/execMain.c | 2 +-
src/backend/executor/execUtils.c | 33 ++++++
src/backend/executor/nodeAppend.c | 14 +++
src/backend/executor/nodeMergeAppend.c | 14 +++
src/backend/optimizer/plan/createplan.c | 41 ++++++--
src/backend/optimizer/plan/planner.c | 3 +
src/backend/optimizer/plan/setrefs.c | 4 +
src/backend/optimizer/util/appendinfo.c | 134 ++++++++++++++++++++++++
src/backend/partitioning/partprune.c | 124 +++-------------------
src/include/executor/executor.h | 1 +
src/include/nodes/plannodes.h | 14 +++
src/include/optimizer/appendinfo.h | 3 +
src/include/partitioning/partprune.h | 3 +-
13 files changed, 266 insertions(+), 124 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ffc62e379a..2804ec70f1 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1475,7 +1475,7 @@ ExecGetAncestorResultRels(EState *estate, ResultRelInfo *resultRelInfo)
/*
* All ancestors up to the root target relation must have been
- * locked by the planner or AcquireExecutorLocks().
+ * locked by the planner or ExecLockAppendNonLeafPartitions().
*/
ancRel = table_open(ancOid, NoLock);
rInfo = makeNode(ResultRelInfo);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 117773706a..2b7a08c9ba 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -827,6 +827,39 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
return rel;
}
+/*
+ * ExecLockAppendNonLeafPartitions
+ * Lock non-leaf partitions whose child partitions are scanned by a given
+ * Append/MergeAppend node
+ */
+void
+ExecLockAppendNonLeafPartitions(EState *estate, List *allpartrelids)
+{
+ ListCell *l;
+
+ /* This should get called only when executing cached plans. */
+ Assert(estate->es_cachedplan != NULL);
+ foreach(l, allpartrelids)
+ {
+ Bitmapset *partrelids = lfirst_node(Bitmapset, l);
+ int i = -1;
+
+ while ((i = bms_next_member(partrelids, i)) > 0)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(i, estate);
+
+ /*
+ * Don't lock the root parent mentioned in the query, because it
+ * should already have been locked before entering the executor.
+ */
+ if (!rte->inFromCl)
+ LockRelationOid(rte->relid, rte->rellockmode);
+ else
+ Assert(CheckRelLockedByMe(rte->relid, rte->rellockmode, true));
+ }
+ }
+}
+
/*
* ExecInitResultRelation
* Open relation given by the passed-in RT index and fill its
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 53ca9dc85d..4759511f87 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -133,6 +133,20 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_syncdone = false;
appendstate->as_begun = false;
+ /*
+ * Lock non-leaf partitions whose leaf children are present in
+ * node->appendplans. Only need to do so if executing a cached
+ * plan, because child tables present in cached plans are not
+ * locked before execution.
+ *
+ * XXX - some of the non-leaf partitions may also be mentioned in
+ * part_prune_info, in which case they would get locked again in
+ * ExecInitPartitionPruning() because it calls
+ * ExecGetRangeTableRelation() which locks child tables.
+ */
+ if (estate->es_cachedplan)
+ ExecLockAppendNonLeafPartitions(estate, node->allpartrelids);
+
/* If run-time partition pruning is enabled, then set that up now */
if (node->part_prune_info != NULL)
{
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 52c3edf278..158210aac1 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -81,6 +81,20 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.state = estate;
mergestate->ps.ExecProcNode = ExecMergeAppend;
+ /*
+ * Lock non-leaf partitions whose leaf children are present in
+ * node->mergeplans. Only need to do so if executing a cached
+ * plan, because child tables present in cached plans are not
+ * locked before execution.
+ *
+ * XXX - some of the non-leaf partitions may also be mentioned in
+ * part_prune_info, in which case they would get locked again in
+ * ExecInitPartitionPruning() because it calls
+ * ExecGetRangeTableRelation() which locks child tables.
+ */
+ if (estate->es_cachedplan)
+ ExecLockAppendNonLeafPartitions(estate, node->allpartrelids);
+
/* If run-time partition pruning is enabled, then set that up now */
if (node->part_prune_info != NULL)
{
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 34ca6d4ac2..d1f4f606bf 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -25,6 +25,7 @@
#include "nodes/extensible.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
+#include "optimizer/appendinfo.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/optimizer.h"
@@ -1229,6 +1230,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
Oid *nodeCollations = NULL;
bool *nodeNullsFirst = NULL;
bool consider_async = false;
+ List *allpartrelids = NIL;
/*
* The subpaths list could be empty, if every child was proven empty by
@@ -1370,15 +1372,23 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
++nasyncplans;
}
+ /*
+ * Find partitioned parent rel(s) of the subpath's rel(s).
+ */
+ allpartrelids = add_append_subpath_partrelids(root, subpath, rel,
+ allpartrelids);
+
subplans = lappend(subplans, subplan);
}
+ plan->allpartrelids = allpartrelids;
+
/*
- * If any quals exist, they may be useful to perform further partition
- * pruning during execution. Gather information needed by the executor to
- * do partition pruning.
+ * If scanning partitions, check if there are quals that may be useful to
+ * perform further partition pruning during execution. Gather information
+ * needed by the executor to do partition pruning.
*/
- if (enable_partition_pruning)
+ if (enable_partition_pruning && allpartrelids != NIL)
{
List *prunequal;
@@ -1399,7 +1409,8 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
partpruneinfo =
make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ allpartrelids);
}
plan->appendplans = subplans;
@@ -1445,6 +1456,7 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
PartitionPruneInfo *partpruneinfo = NULL;
+ List *allpartrelids = NIL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1534,15 +1546,23 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplan = (Plan *) sort;
}
+ /*
+ * Find partitioned parent rel(s) of the subpath's rel(s).
+ */
+ allpartrelids = add_append_subpath_partrelids(root, subpath, rel,
+ allpartrelids);
+
subplans = lappend(subplans, subplan);
}
+ node->allpartrelids = allpartrelids;
+
/*
- * If any quals exist, they may be useful to perform further partition
- * pruning during execution. Gather information needed by the executor to
- * do partition pruning.
+ * If scanning partitions, check if there are quals that may be useful to
+ * perform further partition pruning during execution. Gather information
+ * needed by the executor to do partition pruning.
*/
- if (enable_partition_pruning)
+ if (enable_partition_pruning && allpartrelids != NIL)
{
List *prunequal;
@@ -1554,7 +1574,8 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
if (prunequal != NIL)
partpruneinfo = make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ allpartrelids);
}
node->mergeplans = subplans;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 44efb1f4eb..f97bc09113 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -7855,8 +7855,11 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
agg_costs, gd, &child_extra,
&child_partially_grouped_rel);
+ /* Mark as child of grouped_rel. */
+ child_grouped_rel->parent = grouped_rel;
if (child_partially_grouped_rel)
{
+ child_partially_grouped_rel->parent = grouped_rel;
partially_grouped_live_children =
lappend(partially_grouped_live_children,
child_partially_grouped_rel);
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 7962200885..8e256dd779 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1766,6 +1766,8 @@ set_append_references(PlannerInfo *root,
set_dummy_tlist_references((Plan *) aplan, rtoffset);
aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
+ foreach(l, aplan->allpartrelids)
+ lfirst(l) = offset_relid_set((Relids) lfirst(l), rtoffset);
if (aplan->part_prune_info)
{
@@ -1842,6 +1844,8 @@ set_mergeappend_references(PlannerInfo *root,
set_dummy_tlist_references((Plan *) mplan, rtoffset);
mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
+ foreach(l, mplan->allpartrelids)
+ lfirst(l) = offset_relid_set((Relids) lfirst(l), rtoffset);
if (mplan->part_prune_info)
{
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index f456b3b0a4..5bd8e82b9b 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -41,6 +41,7 @@ static void make_inh_translation_list(Relation oldrelation,
AppendRelInfo *appinfo);
static Node *adjust_appendrel_attrs_mutator(Node *node,
adjust_appendrel_attrs_context *context);
+static List *add_part_relids(List *allpartrelids, Bitmapset *partrelids);
/*
@@ -1035,3 +1036,136 @@ distribute_row_identity_vars(PlannerInfo *root)
}
}
}
+
+/*
+ * add_append_subpath_partrelids
+ * Look up a child subpath's rel's partitioned parent relids up to
+ * parentrel and add the bitmapset containing those into
+ * 'allpartrelids'
+ */
+List *
+add_append_subpath_partrelids(PlannerInfo *root, Path *subpath,
+ RelOptInfo *parentrel,
+ List *allpartrelids)
+{
+ RelOptInfo *prel = subpath->parent;
+ Relids partrelids = NULL;
+
+ /* Nothing to do if there's no parent to begin with. */
+ if (!IS_OTHER_REL(prel))
+ return allpartrelids;
+
+ /*
+ * Traverse up to the pathrel's topmost partitioned parent, collecting
+ * parent relids as we go; but stop if we reach parentrel. (Normally, a
+ * pathrel's topmost partitioned parent is either parentrel or a UNION ALL
+ * appendrel child of parentrel. But when handling partitionwise joins of
+ * multi-level partitioning trees, we can see an append path whose
+ * parentrel is an intermediate partitioned table.)
+ */
+ do
+ {
+ Relids parent_relids = NULL;
+
+ /*
+ * For simple child rels, we can simply set the parent_relids to
+ * prel->parent->relids. But for partitionwise join and aggregate
+ * child rels, while we can use prel->parent to move up the tree,
+ * parent_relids must be found the hard way through AppendRelInfos,
+ * because 1) a joinrel's relids may point to RTE_JOIN entries,
+ * 2) topmost parent grouping rel's relids field is NULL.
+ */
+ if (IS_SIMPLE_REL(prel))
+ {
+ prel = prel->parent;
+ /* Stop once we reach the root partitioned rel. */
+ if (!IS_PARTITIONED_REL(prel))
+ break;
+ parent_relids = bms_add_members(parent_relids, prel->relids);
+ }
+ else
+ {
+ AppendRelInfo **appinfos;
+ int nappinfos,
+ i;
+
+ appinfos = find_appinfos_by_relids(root, prel->relids,
+ &nappinfos);
+ for (i = 0; i < nappinfos; i++)
+ {
+ AppendRelInfo *appinfo = appinfos[i];
+
+ parent_relids = bms_add_member(parent_relids,
+ appinfo->parent_relid);
+ }
+ pfree(appinfos);
+ prel = prel->parent;
+ }
+ /* accept this level as an interesting parent */
+ partrelids = bms_add_members(partrelids, parent_relids);
+ if (prel == parentrel)
+ break; /* don't traverse above parentrel */
+ } while (IS_OTHER_REL(prel));
+
+ if (partrelids == NULL)
+ return allpartrelids;
+
+ return add_part_relids(allpartrelids, partrelids);
+}
+
+/*
+ * add_part_relids
+ * Add new info to a list of Bitmapsets of partitioned relids.
+ *
+ * Within 'allpartrelids', there is one Bitmapset for each topmost parent
+ * partitioned rel. Each Bitmapset contains the RT indexes of the topmost
+ * parent as well as its relevant non-leaf child partitions. Since (by
+ * construction of the rangetable list) parent partitions must have lower
+ * RT indexes than their children, we can distinguish the topmost parent
+ * as being the lowest set bit in the Bitmapset.
+ *
+ * 'partrelids' contains the RT indexes of a parent partitioned rel, and
+ * possibly some non-leaf children, that are newly identified as parents of
+ * some subpath rel passed to make_partition_pruneinfo(). These are added
+ * to an appropriate member of 'allpartrelids'.
+ *
+ * Note that the list contains only RT indexes of partitioned tables that
+ * are parents of some scan-level relation appearing in the 'subpaths' that
+ * make_partition_pruneinfo() is dealing with. Also, "topmost" parents are
+ * not allowed to be higher than the 'parentrel' associated with the append
+ * path. In this way, we avoid expending cycles on partitioned rels that
+ * can't contribute useful pruning information for the problem at hand.
+ * (It is possible for 'parentrel' to be a child partitioned table, and it
+ * is also possible for scan-level relations to be child partitioned tables
+ * rather than leaf partitions. Hence we must construct this relation set
+ * with reference to the particular append path we're dealing with, rather
+ * than looking at the full partitioning structure represented in the
+ * RelOptInfos.)
+ */
+static List *
+add_part_relids(List *allpartrelids, Bitmapset *partrelids)
+{
+ Index targetpart;
+ ListCell *lc;
+
+ /* We can easily get the lowest set bit this way: */
+ targetpart = bms_next_member(partrelids, -1);
+ Assert(targetpart > 0);
+
+ /* Look for a matching topmost parent */
+ foreach(lc, allpartrelids)
+ {
+ Bitmapset *currpartrelids = (Bitmapset *) lfirst(lc);
+ Index currtarget = bms_next_member(currpartrelids, -1);
+
+ if (targetpart == currtarget)
+ {
+ /* Found a match, so add any new RT indexes to this hierarchy */
+ currpartrelids = bms_add_members(currpartrelids, partrelids);
+ lfirst(lc) = currpartrelids;
+ return allpartrelids;
+ }
+ }
+ /* No match, so add the new partition hierarchy to the list */
+ return lappend(allpartrelids, partrelids);
+}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 7179b22a05..213512a5f4 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -138,7 +138,6 @@ typedef struct PruneStepResult
} PruneStepResult;
-static List *add_part_relids(List *allpartrelids, Bitmapset *partrelids);
static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
RelOptInfo *parentrel,
List *prunequal,
@@ -218,33 +217,32 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
+ * 'allpartrelids' contains Bitmapsets of RT indexes of partitioned parents
+ * whose partitions' Paths are in 'subpaths'; there's one Bitmapset for every
+ * partition tree involved.
*/
PartitionPruneInfo *
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
- List *prunequal)
+ List *prunequal,
+ List *allpartrelids)
{
PartitionPruneInfo *pruneinfo;
Bitmapset *allmatchedsubplans = NULL;
- List *allpartrelids;
List *prunerelinfos;
int *relid_subplan_map;
ListCell *lc;
int i;
+ Assert(list_length(allpartrelids) > 0);
+
/*
- * Scan the subpaths to see which ones are scans of partition child
- * relations, and identify their parent partitioned rels. (Note: we must
- * restrict the parent partitioned rels to be parentrel or children of
- * parentrel, otherwise we couldn't translate prunequal to match.)
- *
- * Also construct a temporary array to map from partition-child-relation
- * relid to the index in 'subpaths' of the scan plan for that partition.
+ * Construct a temporary array to map from partition-child-relation relid
+ * to the index in 'subpaths' of the scan plan for that partition.
* (Use of "subplan" rather than "subpath" is a bit of a misnomer, but
* we'll let it stand.) For convenience, we use 1-based indexes here, so
* that zero can represent an un-filled array entry.
*/
- allpartrelids = NIL;
relid_subplan_map = palloc0(sizeof(int) * root->simple_rel_array_size);
i = 1;
@@ -253,50 +251,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Path *path = (Path *) lfirst(lc);
RelOptInfo *pathrel = path->parent;
- /* We don't consider partitioned joins here */
- if (pathrel->reloptkind == RELOPT_OTHER_MEMBER_REL)
- {
- RelOptInfo *prel = pathrel;
- Bitmapset *partrelids = NULL;
-
- /*
- * Traverse up to the pathrel's topmost partitioned parent,
- * collecting parent relids as we go; but stop if we reach
- * parentrel. (Normally, a pathrel's topmost partitioned parent
- * is either parentrel or a UNION ALL appendrel child of
- * parentrel. But when handling partitionwise joins of
- * multi-level partitioning trees, we can see an append path whose
- * parentrel is an intermediate partitioned table.)
- */
- do
- {
- AppendRelInfo *appinfo;
-
- Assert(prel->relid < root->simple_rel_array_size);
- appinfo = root->append_rel_array[prel->relid];
- prel = find_base_rel(root, appinfo->parent_relid);
- if (!IS_PARTITIONED_REL(prel))
- break; /* reached a non-partitioned parent */
- /* accept this level as an interesting parent */
- partrelids = bms_add_member(partrelids, prel->relid);
- if (prel == parentrel)
- break; /* don't traverse above parentrel */
- } while (prel->reloptkind == RELOPT_OTHER_MEMBER_REL);
-
- if (partrelids)
- {
- /*
- * Found some relevant parent partitions, which may or may not
- * overlap with partition trees we already found. Add new
- * information to the allpartrelids list.
- */
- allpartrelids = add_part_relids(allpartrelids, partrelids);
- /* Also record the subplan in relid_subplan_map[] */
- /* No duplicates please */
- Assert(relid_subplan_map[pathrel->relid] == 0);
- relid_subplan_map[pathrel->relid] = i;
- }
- }
+ /* No duplicates please */
+ Assert(relid_subplan_map[pathrel->relid] == 0);
+ relid_subplan_map[pathrel->relid] = i;
i++;
}
@@ -362,63 +319,6 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
return pruneinfo;
}
-/*
- * add_part_relids
- * Add new info to a list of Bitmapsets of partitioned relids.
- *
- * Within 'allpartrelids', there is one Bitmapset for each topmost parent
- * partitioned rel. Each Bitmapset contains the RT indexes of the topmost
- * parent as well as its relevant non-leaf child partitions. Since (by
- * construction of the rangetable list) parent partitions must have lower
- * RT indexes than their children, we can distinguish the topmost parent
- * as being the lowest set bit in the Bitmapset.
- *
- * 'partrelids' contains the RT indexes of a parent partitioned rel, and
- * possibly some non-leaf children, that are newly identified as parents of
- * some subpath rel passed to make_partition_pruneinfo(). These are added
- * to an appropriate member of 'allpartrelids'.
- *
- * Note that the list contains only RT indexes of partitioned tables that
- * are parents of some scan-level relation appearing in the 'subpaths' that
- * make_partition_pruneinfo() is dealing with. Also, "topmost" parents are
- * not allowed to be higher than the 'parentrel' associated with the append
- * path. In this way, we avoid expending cycles on partitioned rels that
- * can't contribute useful pruning information for the problem at hand.
- * (It is possible for 'parentrel' to be a child partitioned table, and it
- * is also possible for scan-level relations to be child partitioned tables
- * rather than leaf partitions. Hence we must construct this relation set
- * with reference to the particular append path we're dealing with, rather
- * than looking at the full partitioning structure represented in the
- * RelOptInfos.)
- */
-static List *
-add_part_relids(List *allpartrelids, Bitmapset *partrelids)
-{
- Index targetpart;
- ListCell *lc;
-
- /* We can easily get the lowest set bit this way: */
- targetpart = bms_next_member(partrelids, -1);
- Assert(targetpart > 0);
-
- /* Look for a matching topmost parent */
- foreach(lc, allpartrelids)
- {
- Bitmapset *currpartrelids = (Bitmapset *) lfirst(lc);
- Index currtarget = bms_next_member(currpartrelids, -1);
-
- if (targetpart == currtarget)
- {
- /* Found a match, so add any new RT indexes to this hierarchy */
- currpartrelids = bms_add_members(currpartrelids, partrelids);
- lfirst(lc) = currpartrelids;
- return allpartrelids;
- }
- }
- /* No match, so add the new partition hierarchy to the list */
- return lappend(allpartrelids, partrelids);
-}
-
/*
* make_partitionedrel_pruneinfo
* Build a List of PartitionedRelPruneInfos, one for each interesting
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 10c5cda169..74a471e3e3 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -601,6 +601,7 @@ exec_rt_fetch(Index rti, EState *estate)
extern Relation ExecGetRangeTableRelation(EState *estate, Index rti);
extern void ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
Index rti);
+extern void ExecLockAppendNonLeafPartitions(EState *estate, List *allpartrelids);
extern int executor_errposition(EState *estate, int location);
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 1b787fe031..7a5f3ba625 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -267,6 +267,13 @@ typedef struct Append
List *appendplans;
int nasyncplans; /* # of asynchronous plans */
+ /*
+ * RTIs of all partitioned tables whose children are scanned by
+ * appendplans. The list contains a bitmapset for every partition tree
+ * covered by this Append.
+ */
+ List *allpartrelids;
+
/*
* All 'appendplans' preceding this index are non-partial plans. All
* 'appendplans' from this index onwards are partial plans.
@@ -291,6 +298,13 @@ typedef struct MergeAppend
List *mergeplans;
+ /*
+ * RTIs of all partitioned tables whose children are scanned by
+ * mergeplans. The list contains a bitmapset for every partition tree
+ * covered by this MergeAppend.
+ */
+ List *allpartrelids;
+
/* these fields are just like the sort-key info in struct Sort: */
/* number of sort-key columns */
diff --git a/src/include/optimizer/appendinfo.h b/src/include/optimizer/appendinfo.h
index a05f91f77d..1621a7319a 100644
--- a/src/include/optimizer/appendinfo.h
+++ b/src/include/optimizer/appendinfo.h
@@ -46,5 +46,8 @@ extern void add_row_identity_columns(PlannerInfo *root, Index rtindex,
RangeTblEntry *target_rte,
Relation target_relation);
extern void distribute_row_identity_vars(PlannerInfo *root);
+extern List *add_append_subpath_partrelids(PlannerInfo *root, Path *subpath,
+ RelOptInfo *parentrel,
+ List *allpartrelids);
#endif /* APPENDINFO_H */
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 8636e04e37..caa774a111 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -73,7 +73,8 @@ typedef struct PartitionPruneContext
extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
struct RelOptInfo *parentrel,
List *subpaths,
- List *prunequal);
+ List *prunequal,
+ List *allpartrelids);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
--
2.35.3
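To make the shape of allpartrelids concrete, consider the partition
tree used by the cached-plan-replan spec in the previous patch: foo ->
foo1 -> foo11, plus the leaf foo2. With illustrative RT indexes 1 (foo),
2 (foo1), 3 (foo11), and 4 (foo2), an Append scanning foo11 and foo2
would carry allpartrelids = [{1, 2}]: one bitmapset per partition tree,
whose lowest member (1) identifies the topmost parent.
ExecLockAppendNonLeafPartitions() would then lock foo1 (RT index 2) and
merely assert that foo, being mentioned in the query (rte->inFromCl), is
already locked.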
Attachment: v48-0008-Track-opened-range-table-relations-in-a-List-in-.patch
From fef4457e294bcc6b48a910f148816b2d163905ec Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Wed, 6 Sep 2023 17:54:19 +0900
Subject: [PATCH v48 8/8] Track opened range table relations in a List in
EState
This makes ExecCloseRangeTableRelations faster when there are many
relations in the range table but only a few are opened during
execution, such as when run-time pruning kicks in on an Append
containing thousands of partition subplans.
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
src/backend/executor/execMain.c | 9 +++++----
src/backend/executor/execUtils.c | 3 +++
src/include/nodes/execnodes.h | 2 ++
3 files changed, 10 insertions(+), 4 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 2804ec70f1..d559c1de61 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1649,12 +1649,13 @@ ExecCloseResultRelations(EState *estate)
void
ExecCloseRangeTableRelations(EState *estate)
{
- int i;
+ ListCell *lc;
- for (i = 0; i < estate->es_range_table_size; i++)
+ foreach(lc, estate->es_opened_relations)
{
- if (estate->es_relations[i])
- table_close(estate->es_relations[i], NoLock);
+ Relation rel = lfirst(lc);
+
+ table_close(rel, NoLock);
}
}
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 2b7a08c9ba..1dfef44495 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -822,6 +822,9 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
}
estate->es_relations[rti - 1] = rel;
+ if (rel != NULL)
+ estate->es_opened_relations = lappend(estate->es_opened_relations,
+ rel);
}
return rel;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index bb5734edb5..8bbe1f6b14 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -619,6 +619,8 @@ typedef struct EState
Index es_range_table_size; /* size of the range table arrays */
Relation *es_relations; /* Array of per-range-table-entry Relation
* pointers, or NULL if not yet opened */
+ List *es_opened_relations; /* List of non-NULL entries in
+ * es_relations in no specific order */
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
--
2.35.3
On Thu, Sep 28, 2023 at 5:26 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Tue, Sep 26, 2023 at 10:06 PM Amit Langote <amitlangote09@gmail.com> wrote:
After sleeping on this, I think we do need the checks after all the
ExecInitNode() calls too, because we have many instances of the code
like the following one:

    outerPlanState(gatherstate) = ExecInitNode(outerNode, estate, eflags);
    tupDesc = ExecGetResultType(outerPlanState(gatherstate));
    <some code that dereferences tupDesc>

If outerNode is a SeqScan and ExecInitSeqScan() returned early because
ExecOpenScanRelation() detected that the plan was invalidated, then
tupDesc would be NULL in this case, causing the code to crash.

Now one might say that perhaps we should only add the
is-CachedPlan-valid test in the instances where there is an actual
risk of such misbehavior, but that could lead to confusion, now or
later. It seems better to add them after every ExecInitNode() call
while we're inventing the notion, because doing so relieves the
authors of future enhancements of the ExecInit*() routines from
worrying about any of this.

Attached 0003 should show how that turned out.
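For illustration, here is a minimal sketch of that pattern, assuming a
helper with a name like ExecPlanStillValid() that reports whether the
CachedPlan referenced by the EState is still valid; the helper name and
exact shape are assumptions for illustration, not the patch's literal code:

    /* Hedged sketch; ExecPlanStillValid() is an assumed helper name. */
    outerPlanState(gatherstate) = ExecInitNode(outerNode, estate, eflags);
    if (!ExecPlanStillValid(estate))
        return gatherstate;     /* bail out; caller triggers replanning */
    tupDesc = ExecGetResultType(outerPlanState(gatherstate));

The point is that the check sits immediately after the ExecInitNode()
call, so nothing downstream ever dereferences a result that the aborted
initialization never produced.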
Updated 0002 as mentioned in the previous reply -- setting pointers to
NULL after freeing them more consistently across various ExecEnd*()
routines and using the `if (pointer != NULL)` style over the `if
(pointer)` more consistently.

Updated 0001's commit message to remove the mention of its relation to
any future commits. I intend to push it tomorrow.

Pushed that one. Here are the rebased patches.
0001 seems ready to me, but I'll wait a couple more days for others to
weigh in. Just to highlight a kind of change that others may have
differing opinions on, consider this hunk from the patch:

-        MemoryContextDelete(node->aggcontext);
+        if (node->aggcontext != NULL)
+        {
+            MemoryContextDelete(node->aggcontext);
+            node->aggcontext = NULL;
+        }
...
+        ExecEndNode(outerPlanState(node));
+        outerPlanState(node) = NULL;

So the patch wants to enhance the consistency of the
set-pointer-to-NULL-after-freeing part. Robert mentioned his preference for doing
it in the patch, which I agree with.
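To spell out why that convention helps, here is a hedged sketch of an
ExecEnd*() routine written this way; ExampleState and ExecEndExampleNode
are hypothetical names invented for illustration, not taken from the
patch. With the pointers reset to NULL, running the cleanup again over a
partially torn-down node is harmless (ExecEndNode() is a no-op on NULL):

    /* Hypothetical sketch of the ExecEnd*() convention discussed above. */
    static void
    ExecEndExampleNode(ExampleState *node)
    {
        if (node->aggcontext != NULL)
        {
            MemoryContextDelete(node->aggcontext);
            node->aggcontext = NULL;    /* guard against double-free */
        }
        ExecEndNode(outerPlanState(node));
        outerPlanState(node) = NULL;    /* safe if cleanup runs again */
    }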
Rebased.
I haven't been able to reproduce and debug a crash reported by cfbot
that I see every now and then:
https://cirrus-ci.com/task/5673432591892480?logs=cores#L0
[22:46:12.328] Program terminated with signal SIGSEGV, Segmentation fault.
[22:46:12.328] Address not mapped to object.
[22:46:12.838] #0 afterTriggerInvokeEvents
(events=events@entry=0x836db0460, firing_id=1,
estate=estate@entry=0x842eec100, delete_ok=<optimized out>) at
../src/backend/commands/trigger.c:4656
[22:46:12.838] #1 0x00000000006c67a8 in AfterTriggerEndQuery
(estate=estate@entry=0x842eec100) at
../src/backend/commands/trigger.c:5085
[22:46:12.838] #2 0x000000000065bfba in CopyFrom (cstate=0x836df9038)
at ../src/backend/commands/copyfrom.c:1293
...
While a patch in this series does change
src/backend/commands/trigger.c, I'm not yet sure about its relation
with the backtrace shown there.
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v49-0006-Add-field-to-store-parent-relids-to-Append-Merge.patch
From bbde32f56bfbaba07d699dec4838f21d43e40384 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Wed, 6 Sep 2023 17:54:02 +0900
Subject: [PATCH v49 6/8] Add field to store parent relids to
Append/MergeAppend
There's no way currently in the executor to tell if the child
subplans of Append/MergeAppend are scanning partitions, and if
they indeed are, what the RT indexes of their parent/ancestor tables
are. The executor doesn't need to see their RT indexes except for
run-time pruning, in which case they can be found in the
PartitionPruneInfo. A future commit will create a need for them to
be available at all times for the purpose of locking those
parent/ancestor tables when executing a cached plan, so add a
field called allpartrelids to Append/MergeAppend to store those
RT indexes. This also adds a function called
ExecLockAppendNonLeafPartitions() to lock those tables.
The code to look up partitioned parent relids for a given list of
partition scan subpaths of an Append/MergeAppend is already present
in make_partition_pruneinfo() but it's local to partprune.c. This
commit refactors that code into its own function called
add_append_subpath_partrelids() defined in appendinfo.c and
generalizes it to consider child join and aggregate paths. To
facilitate looking up of parent rels of child grouping rels in
add_append_subpath_partrelids(), parent links are now also set in
the RelOptInfos of child grouping rels too, like they are in
those of child base and join rels.
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
src/backend/executor/execMain.c | 2 +-
src/backend/executor/execUtils.c | 33 ++++++
src/backend/executor/nodeAppend.c | 14 +++
src/backend/executor/nodeMergeAppend.c | 14 +++
src/backend/optimizer/plan/createplan.c | 41 ++++++--
src/backend/optimizer/plan/planner.c | 3 +
src/backend/optimizer/plan/setrefs.c | 4 +
src/backend/optimizer/util/appendinfo.c | 134 ++++++++++++++++++++++++
src/backend/partitioning/partprune.c | 124 +++-------------------
src/include/executor/executor.h | 1 +
src/include/nodes/plannodes.h | 14 +++
src/include/optimizer/appendinfo.h | 3 +
src/include/partitioning/partprune.h | 3 +-
13 files changed, 266 insertions(+), 124 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ffc62e379a..2804ec70f1 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1475,7 +1475,7 @@ ExecGetAncestorResultRels(EState *estate, ResultRelInfo *resultRelInfo)
/*
* All ancestors up to the root target relation must have been
- * locked by the planner or AcquireExecutorLocks().
+ * locked by the planner or ExecLockAppendNonLeafPartitions().
*/
ancRel = table_open(ancOid, NoLock);
rInfo = makeNode(ResultRelInfo);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 117773706a..2b7a08c9ba 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -827,6 +827,39 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
return rel;
}
+/*
+ * ExecLockAppendNonLeafPartitions
+ * Lock non-leaf partitions whose child partitions are scanned by a given
+ * Append/MergeAppend node
+ */
+void
+ExecLockAppendNonLeafPartitions(EState *estate, List *allpartrelids)
+{
+ ListCell *l;
+
+ /* This should get called only when executing cached plans. */
+ Assert(estate->es_cachedplan != NULL);
+ foreach(l, allpartrelids)
+ {
+ Bitmapset *partrelids = lfirst_node(Bitmapset, l);
+ int i = -1;
+
+ while ((i = bms_next_member(partrelids, i)) > 0)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(i, estate);
+
+ /*
+ * Don't lock the root parent mentioned in the query, because it
+ * should already have been locked before entering the executor.
+ */
+ if (!rte->inFromCl)
+ LockRelationOid(rte->relid, rte->rellockmode);
+ else
+ Assert(CheckRelLockedByMe(rte->relid, rte->rellockmode, true));
+ }
+ }
+}
+
/*
* ExecInitResultRelation
* Open relation given by the passed-in RT index and fill its
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 53ca9dc85d..4759511f87 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -133,6 +133,20 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_syncdone = false;
appendstate->as_begun = false;
+ /*
+ * Lock non-leaf partitions whose leaf children are present in
+ * node->appendplans. Only need to do so if executing a cached
+ * plan, because child tables present in cached plans are not
+ * locked before execution.
+ *
+ * XXX - some of the non-leaf partitions may also be mentioned in
+ * part_prune_info, which if they are would get locked again in
+ * ExecInitPartitionPruning() because it calls
+ * ExecGetRangeTableRelation() which locks child tables.
+ */
+ if (estate->es_cachedplan)
+ ExecLockAppendNonLeafPartitions(estate, node->allpartrelids);
+
/* If run-time partition pruning is enabled, then set that up now */
if (node->part_prune_info != NULL)
{
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 52c3edf278..158210aac1 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -81,6 +81,20 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.state = estate;
mergestate->ps.ExecProcNode = ExecMergeAppend;
+ /*
+ * Lock non-leaf partitions whose leaf children are present in
+ * node->mergeplans. Only need to do so if executing a cached
+ * plan, because child tables present in cached plans are not
+ * locked before execution.
+ *
+ * XXX - some of the non-leaf partitions may also be mentioned in
+ * part_prune_info, which if they are would get locked again in
+ * ExecInitPartitionPruning() because it calls
+ * ExecGetRangeTableRelation() which locks child tables.
+ */
+ if (estate->es_cachedplan)
+ ExecLockAppendNonLeafPartitions(estate, node->allpartrelids);
+
/* If run-time partition pruning is enabled, then set that up now */
if (node->part_prune_info != NULL)
{
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 34ca6d4ac2..d1f4f606bf 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -25,6 +25,7 @@
#include "nodes/extensible.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
+#include "optimizer/appendinfo.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/optimizer.h"
@@ -1229,6 +1230,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
Oid *nodeCollations = NULL;
bool *nodeNullsFirst = NULL;
bool consider_async = false;
+ List *allpartrelids = NIL;
/*
* The subpaths list could be empty, if every child was proven empty by
@@ -1370,15 +1372,23 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
++nasyncplans;
}
+ /*
+ * Find partitioned parent rel(s) of the subpath's rel(s).
+ */
+ allpartrelids = add_append_subpath_partrelids(root, subpath, rel,
+ allpartrelids);
+
subplans = lappend(subplans, subplan);
}
+ plan->allpartrelids = allpartrelids;
+
/*
- * If any quals exist, they may be useful to perform further partition
- * pruning during execution. Gather information needed by the executor to
- * do partition pruning.
+ * If scanning partitions, check if there are quals that may be useful to
+ * perform further partition pruning during execution. Gather information
+ * needed by the executor to do partition pruning.
*/
- if (enable_partition_pruning)
+ if (enable_partition_pruning && allpartrelids != NIL)
{
List *prunequal;
@@ -1399,7 +1409,8 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
partpruneinfo =
make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ allpartrelids);
}
plan->appendplans = subplans;
@@ -1445,6 +1456,7 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
PartitionPruneInfo *partpruneinfo = NULL;
+ List *allpartrelids = NIL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1534,15 +1546,23 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplan = (Plan *) sort;
}
+ /*
+ * Find partitioned parent rel(s) of the subpath's rel(s).
+ */
+ allpartrelids = add_append_subpath_partrelids(root, subpath, rel,
+ allpartrelids);
+
subplans = lappend(subplans, subplan);
}
+ node->allpartrelids = allpartrelids;
+
/*
- * If any quals exist, they may be useful to perform further partition
- * pruning during execution. Gather information needed by the executor to
- * do partition pruning.
+ * If scanning partitions, check if there are quals that may be useful to
+ * perform further partition pruning during execution. Gather information
+ * needed by the executor to do partition pruning.
*/
- if (enable_partition_pruning)
+ if (enable_partition_pruning && allpartrelids != NIL)
{
List *prunequal;
@@ -1554,7 +1574,8 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
if (prunequal != NIL)
partpruneinfo = make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ allpartrelids);
}
node->mergeplans = subplans;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index a8cea5efe1..63a11e511e 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -7850,8 +7850,11 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
agg_costs, gd, &child_extra,
&child_partially_grouped_rel);
+ /* Mark as child of grouped_rel. */
+ child_grouped_rel->parent = grouped_rel;
if (child_partially_grouped_rel)
{
+ child_partially_grouped_rel->parent = grouped_rel;
partially_grouped_live_children =
lappend(partially_grouped_live_children,
child_partially_grouped_rel);
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index fc3709510d..04910c301f 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1766,6 +1766,8 @@ set_append_references(PlannerInfo *root,
set_dummy_tlist_references((Plan *) aplan, rtoffset);
aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
+ foreach(l, aplan->allpartrelids)
+ lfirst(l) = offset_relid_set((Relids) lfirst(l), rtoffset);
if (aplan->part_prune_info)
{
@@ -1842,6 +1844,8 @@ set_mergeappend_references(PlannerInfo *root,
set_dummy_tlist_references((Plan *) mplan, rtoffset);
mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
+ foreach(l, mplan->allpartrelids)
+ lfirst(l) = offset_relid_set((Relids) lfirst(l), rtoffset);
if (mplan->part_prune_info)
{
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index f456b3b0a4..5bd8e82b9b 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -41,6 +41,7 @@ static void make_inh_translation_list(Relation oldrelation,
AppendRelInfo *appinfo);
static Node *adjust_appendrel_attrs_mutator(Node *node,
adjust_appendrel_attrs_context *context);
+static List *add_part_relids(List *allpartrelids, Bitmapset *partrelids);
/*
@@ -1035,3 +1036,136 @@ distribute_row_identity_vars(PlannerInfo *root)
}
}
}
+
+/*
+ * add_append_subpath_partrelids
+ * Look up a child subpath's rel's partitioned parent relids up to
+ * parentrel and add the bitmapset containing those into
+ * 'allpartrelids'
+ */
+List *
+add_append_subpath_partrelids(PlannerInfo *root, Path *subpath,
+ RelOptInfo *parentrel,
+ List *allpartrelids)
+{
+ RelOptInfo *prel = subpath->parent;
+ Relids partrelids = NULL;
+
+ /* Nothing to do if there's no parent to begin with. */
+ if (!IS_OTHER_REL(prel))
+ return allpartrelids;
+
+ /*
+ * Traverse up to the pathrel's topmost partitioned parent, collecting
+ * parent relids as we go; but stop if we reach parentrel. (Normally, a
+ * pathrel's topmost partitioned parent is either parentrel or a UNION ALL
+ * appendrel child of parentrel. But when handling partitionwise joins of
+ * multi-level partitioning trees, we can see an append path whose
+ * parentrel is an intermediate partitioned table.)
+ */
+ do
+ {
+ Relids parent_relids = NULL;
+
+ /*
+ * For simple child rels, we can simply set the parent_relids to
+ * prel->parent->relids. But for partitionwise join and aggregate
+ * child rels, while we can use prel->parent to move up the tree,
+ * parent_relids must be found the hard way through AppendRelInfos,
+ * because 1) a joinrel's relids may point to RTE_JOIN entries,
+ * 2) topmost parent grouping rel's relids field is NULL.
+ */
+ if (IS_SIMPLE_REL(prel))
+ {
+ prel = prel->parent;
+ /* Stop once we reach the root partitioned rel. */
+ if (!IS_PARTITIONED_REL(prel))
+ break;
+ parent_relids = bms_add_members(parent_relids, prel->relids);
+ }
+ else
+ {
+ AppendRelInfo **appinfos;
+ int nappinfos,
+ i;
+
+ appinfos = find_appinfos_by_relids(root, prel->relids,
+ &nappinfos);
+ for (i = 0; i < nappinfos; i++)
+ {
+ AppendRelInfo *appinfo = appinfos[i];
+
+ parent_relids = bms_add_member(parent_relids,
+ appinfo->parent_relid);
+ }
+ pfree(appinfos);
+ prel = prel->parent;
+ }
+ /* accept this level as an interesting parent */
+ partrelids = bms_add_members(partrelids, parent_relids);
+ if (prel == parentrel)
+ break; /* don't traverse above parentrel */
+ } while (IS_OTHER_REL(prel));
+
+ if (partrelids == NULL)
+ return allpartrelids;
+
+ return add_part_relids(allpartrelids, partrelids);
+}
+
+/*
+ * add_part_relids
+ * Add new info to a list of Bitmapsets of partitioned relids.
+ *
+ * Within 'allpartrelids', there is one Bitmapset for each topmost parent
+ * partitioned rel. Each Bitmapset contains the RT indexes of the topmost
+ * parent as well as its relevant non-leaf child partitions. Since (by
+ * construction of the rangetable list) parent partitions must have lower
+ * RT indexes than their children, we can distinguish the topmost parent
+ * as being the lowest set bit in the Bitmapset.
+ *
+ * 'partrelids' contains the RT indexes of a parent partitioned rel, and
+ * possibly some non-leaf children, that are newly identified as parents of
+ * some subpath rel passed to make_partition_pruneinfo(). These are added
+ * to an appropriate member of 'allpartrelids'.
+ *
+ * Note that the list contains only RT indexes of partitioned tables that
+ * are parents of some scan-level relation appearing in the 'subpaths' that
+ * make_partition_pruneinfo() is dealing with. Also, "topmost" parents are
+ * not allowed to be higher than the 'parentrel' associated with the append
+ * path. In this way, we avoid expending cycles on partitioned rels that
+ * can't contribute useful pruning information for the problem at hand.
+ * (It is possible for 'parentrel' to be a child partitioned table, and it
+ * is also possible for scan-level relations to be child partitioned tables
+ * rather than leaf partitions. Hence we must construct this relation set
+ * with reference to the particular append path we're dealing with, rather
+ * than looking at the full partitioning structure represented in the
+ * RelOptInfos.)
+ */
+static List *
+add_part_relids(List *allpartrelids, Bitmapset *partrelids)
+{
+ Index targetpart;
+ ListCell *lc;
+
+ /* We can easily get the lowest set bit this way: */
+ targetpart = bms_next_member(partrelids, -1);
+ Assert(targetpart > 0);
+
+ /* Look for a matching topmost parent */
+ foreach(lc, allpartrelids)
+ {
+ Bitmapset *currpartrelids = (Bitmapset *) lfirst(lc);
+ Index currtarget = bms_next_member(currpartrelids, -1);
+
+ if (targetpart == currtarget)
+ {
+ /* Found a match, so add any new RT indexes to this hierarchy */
+ currpartrelids = bms_add_members(currpartrelids, partrelids);
+ lfirst(lc) = currpartrelids;
+ return allpartrelids;
+ }
+ }
+ /* No match, so add the new partition hierarchy to the list */
+ return lappend(allpartrelids, partrelids);
+}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 3f31ecc956..42461a5b2c 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -138,7 +138,6 @@ typedef struct PruneStepResult
} PruneStepResult;
-static List *add_part_relids(List *allpartrelids, Bitmapset *partrelids);
static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
RelOptInfo *parentrel,
List *prunequal,
@@ -216,33 +215,32 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
+ * 'allpartrelids' contains Bitmapsets of RT indexes of partitioned parents
+ * whose partitions' Paths are in 'subpaths'; there's one Bitmapset for every
+ * partition tree involved.
*/
PartitionPruneInfo *
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
- List *prunequal)
+ List *prunequal,
+ List *allpartrelids)
{
PartitionPruneInfo *pruneinfo;
Bitmapset *allmatchedsubplans = NULL;
- List *allpartrelids;
List *prunerelinfos;
int *relid_subplan_map;
ListCell *lc;
int i;
+ Assert(list_length(allpartrelids) > 0);
+
/*
- * Scan the subpaths to see which ones are scans of partition child
- * relations, and identify their parent partitioned rels. (Note: we must
- * restrict the parent partitioned rels to be parentrel or children of
- * parentrel, otherwise we couldn't translate prunequal to match.)
- *
- * Also construct a temporary array to map from partition-child-relation
- * relid to the index in 'subpaths' of the scan plan for that partition.
+ * Construct a temporary array to map from partition-child-relation relid
+ * to the index in 'subpaths' of the scan plan for that partition.
* (Use of "subplan" rather than "subpath" is a bit of a misnomer, but
* we'll let it stand.) For convenience, we use 1-based indexes here, so
* that zero can represent an un-filled array entry.
*/
- allpartrelids = NIL;
relid_subplan_map = palloc0(sizeof(int) * root->simple_rel_array_size);
i = 1;
@@ -251,50 +249,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Path *path = (Path *) lfirst(lc);
RelOptInfo *pathrel = path->parent;
- /* We don't consider partitioned joins here */
- if (pathrel->reloptkind == RELOPT_OTHER_MEMBER_REL)
- {
- RelOptInfo *prel = pathrel;
- Bitmapset *partrelids = NULL;
-
- /*
- * Traverse up to the pathrel's topmost partitioned parent,
- * collecting parent relids as we go; but stop if we reach
- * parentrel. (Normally, a pathrel's topmost partitioned parent
- * is either parentrel or a UNION ALL appendrel child of
- * parentrel. But when handling partitionwise joins of
- * multi-level partitioning trees, we can see an append path whose
- * parentrel is an intermediate partitioned table.)
- */
- do
- {
- AppendRelInfo *appinfo;
-
- Assert(prel->relid < root->simple_rel_array_size);
- appinfo = root->append_rel_array[prel->relid];
- prel = find_base_rel(root, appinfo->parent_relid);
- if (!IS_PARTITIONED_REL(prel))
- break; /* reached a non-partitioned parent */
- /* accept this level as an interesting parent */
- partrelids = bms_add_member(partrelids, prel->relid);
- if (prel == parentrel)
- break; /* don't traverse above parentrel */
- } while (prel->reloptkind == RELOPT_OTHER_MEMBER_REL);
-
- if (partrelids)
- {
- /*
- * Found some relevant parent partitions, which may or may not
- * overlap with partition trees we already found. Add new
- * information to the allpartrelids list.
- */
- allpartrelids = add_part_relids(allpartrelids, partrelids);
- /* Also record the subplan in relid_subplan_map[] */
- /* No duplicates please */
- Assert(relid_subplan_map[pathrel->relid] == 0);
- relid_subplan_map[pathrel->relid] = i;
- }
- }
+ /* No duplicates please */
+ Assert(relid_subplan_map[pathrel->relid] == 0);
+ relid_subplan_map[pathrel->relid] = i;
i++;
}
@@ -360,63 +317,6 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
return pruneinfo;
}
-/*
- * add_part_relids
- * Add new info to a list of Bitmapsets of partitioned relids.
- *
- * Within 'allpartrelids', there is one Bitmapset for each topmost parent
- * partitioned rel. Each Bitmapset contains the RT indexes of the topmost
- * parent as well as its relevant non-leaf child partitions. Since (by
- * construction of the rangetable list) parent partitions must have lower
- * RT indexes than their children, we can distinguish the topmost parent
- * as being the lowest set bit in the Bitmapset.
- *
- * 'partrelids' contains the RT indexes of a parent partitioned rel, and
- * possibly some non-leaf children, that are newly identified as parents of
- * some subpath rel passed to make_partition_pruneinfo(). These are added
- * to an appropriate member of 'allpartrelids'.
- *
- * Note that the list contains only RT indexes of partitioned tables that
- * are parents of some scan-level relation appearing in the 'subpaths' that
- * make_partition_pruneinfo() is dealing with. Also, "topmost" parents are
- * not allowed to be higher than the 'parentrel' associated with the append
- * path. In this way, we avoid expending cycles on partitioned rels that
- * can't contribute useful pruning information for the problem at hand.
- * (It is possible for 'parentrel' to be a child partitioned table, and it
- * is also possible for scan-level relations to be child partitioned tables
- * rather than leaf partitions. Hence we must construct this relation set
- * with reference to the particular append path we're dealing with, rather
- * than looking at the full partitioning structure represented in the
- * RelOptInfos.)
- */
-static List *
-add_part_relids(List *allpartrelids, Bitmapset *partrelids)
-{
- Index targetpart;
- ListCell *lc;
-
- /* We can easily get the lowest set bit this way: */
- targetpart = bms_next_member(partrelids, -1);
- Assert(targetpart > 0);
-
- /* Look for a matching topmost parent */
- foreach(lc, allpartrelids)
- {
- Bitmapset *currpartrelids = (Bitmapset *) lfirst(lc);
- Index currtarget = bms_next_member(currpartrelids, -1);
-
- if (targetpart == currtarget)
- {
- /* Found a match, so add any new RT indexes to this hierarchy */
- currpartrelids = bms_add_members(currpartrelids, partrelids);
- lfirst(lc) = currpartrelids;
- return allpartrelids;
- }
- }
- /* No match, so add the new partition hierarchy to the list */
- return lappend(allpartrelids, partrelids);
-}
-
/*
* make_partitionedrel_pruneinfo
* Build a List of PartitionedRelPruneInfos, one for each interesting
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 4f183ec6cd..350a625107 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -601,6 +601,7 @@ exec_rt_fetch(Index rti, EState *estate)
extern Relation ExecGetRangeTableRelation(EState *estate, Index rti);
extern void ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
Index rti);
+extern void ExecLockAppendNonLeafPartitions(EState *estate, List *allpartrelids);
extern int executor_errposition(EState *estate, int location);
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index d40af8e59f..ffcc8490d7 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -267,6 +267,13 @@ typedef struct Append
List *appendplans;
int nasyncplans; /* # of asynchronous plans */
+ /*
+ * RTIs of all partitioned tables whose children are scanned by
+ * appendplans. The list contains a bitmapset for every partition tree
+ * covered by this Append.
+ */
+ List *allpartrelids;
+
/*
* All 'appendplans' preceding this index are non-partial plans. All
* 'appendplans' from this index onwards are partial plans.
@@ -291,6 +298,13 @@ typedef struct MergeAppend
List *mergeplans;
+ /*
+ * RTIs of all partitioned tables whose children are scanned by
+ * mergeplans. The list contains a bitmapset for every partition tree
+ * covered by this MergeAppend.
+ */
+ List *allpartrelids;
+
/* these fields are just like the sort-key info in struct Sort: */
/* number of sort-key columns */
diff --git a/src/include/optimizer/appendinfo.h b/src/include/optimizer/appendinfo.h
index a05f91f77d..1621a7319a 100644
--- a/src/include/optimizer/appendinfo.h
+++ b/src/include/optimizer/appendinfo.h
@@ -46,5 +46,8 @@ extern void add_row_identity_columns(PlannerInfo *root, Index rtindex,
RangeTblEntry *target_rte,
Relation target_relation);
extern void distribute_row_identity_vars(PlannerInfo *root);
+extern List *add_append_subpath_partrelids(PlannerInfo *root, Path *subpath,
+ RelOptInfo *parentrel,
+ List *allpartrelids);
#endif /* APPENDINFO_H */
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 8636e04e37..caa774a111 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -73,7 +73,8 @@ typedef struct PartitionPruneContext
extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
struct RelOptInfo *parentrel,
List *subpaths,
- List *prunequal);
+ List *prunequal,
+ List *allpartrelids);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
--
2.35.3
v49-0007-Delay-locking-of-child-tables-in-cached-plans-un.patch
From a8bc206d5367decfb04f95283a67645b85ba5c81 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Wed, 6 Sep 2023 17:54:15 +0900
Subject: [PATCH v49 7/8] Delay locking of child tables in cached plans until
ExecutorStart()
Currently, GetCachedPlan() takes a lock on all relations contained in
a cached plan before returning it as a valid plan to its callers for
execution. One disadvantage is that if the plan contains partitions
that are prunable with conditions involving EXTERN parameters and
other stable expressions (known as "initial pruning"), many of them
would be locked unnecessarily, because only those that survive
initial pruning need to have been locked. Locking all partitions this
way causes significant delay when there are many partitions. Note
that initial pruning occurs during executor's initialization of the
plan, that is, ExecInitNode().
Previous commits have made all the necessary adjustments to make the
executor lock child tables, to detect invalidation of the CachedPlan
resulting from that, and to retry the execution with a new CachedPlan.
So, this commit simply removes the code in plancache.c that does the
"for execution" locking, aka AcquireExecutorLocks().
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
src/backend/executor/spi.c | 2 +-
src/backend/tcop/pquery.c | 6 +-
src/backend/utils/cache/plancache.c | 154 +++++++----------
src/test/modules/delay_execution/Makefile | 3 +-
.../modules/delay_execution/delay_execution.c | 67 +++++++-
.../expected/cached-plan-replan.out | 158 ++++++++++++++++++
.../specs/cached-plan-replan.spec | 61 +++++++
7 files changed, 343 insertions(+), 108 deletions(-)
create mode 100644 src/test/modules/delay_execution/expected/cached-plan-replan.out
create mode 100644 src/test/modules/delay_execution/specs/cached-plan-replan.spec
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 60e2632cd2..99f4ed92fb 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -2680,7 +2680,7 @@ replan:
snap = InvalidSnapshot;
qdesc = CreateQueryDesc(stmt,
- NULL,
+ cplan,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index fcf9925ed4..8d0772ae29 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -412,7 +412,7 @@ PortalStart(Portal portal, ParamListInfo params,
* set the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
- NULL,
+ portal->cplan,
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -443,6 +443,7 @@ PortalStart(Portal portal, ParamListInfo params,
*/
if (!ExecutorStart(queryDesc, myeflags))
{
+ Assert(queryDesc->cplan);
ExecutorEnd(queryDesc);
FreeQueryDesc(queryDesc);
PopActiveSnapshot();
@@ -542,7 +543,7 @@ PortalStart(Portal portal, ParamListInfo params,
* PortalRunMulti() before calling ExecutorRun().
*/
queryDesc = CreateQueryDesc(plan,
- NULL,
+ portal->cplan,
portal->sourceText,
!is_utility ?
GetActiveSnapshot() :
@@ -566,6 +567,7 @@ PortalStart(Portal portal, ParamListInfo params,
if (!ExecutorStart(queryDesc, myeflags))
{
PopActiveSnapshot();
+ Assert(queryDesc->cplan);
ExecutorEnd(queryDesc);
FreeQueryDesc(queryDesc);
plan_valid = false;
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 8f95520f2f..fad1457192 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -104,13 +104,13 @@ static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool GenericPlanIsValid(CachedPlan *cplan);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -817,8 +817,13 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
*
- * On a "true" return, we have acquired the locks needed to run the plan.
- * (We must do this for the "true" result to be race-condition-free.)
+ * If the plan includes child relations introduced by the planner, they
+ * wouldn't be locked yet. This is because AcquirePlannerLocks() only locks
+ * relations present in the original query's range table (before planner
+ * entry). Hence, the plan might become stale if child relations are modified
+ * concurrently. During the plan initialization, the executor must ensure the
+ * plan (CachedPlan) remains valid after locking each child table. If found
+ * invalid, the caller should be prompted to recreate the plan.
*/
static bool
CheckCachedPlan(CachedPlanSource *plansource)
@@ -832,60 +837,56 @@ CheckCachedPlan(CachedPlanSource *plansource)
if (!plan)
return false;
- Assert(plan->magic == CACHEDPLAN_MAGIC);
- /* Generic plans are never one-shot */
- Assert(!plan->is_oneshot);
+ if (GenericPlanIsValid(plan))
+ return true;
/*
- * If plan isn't valid for current role, we can't use it.
+ * Plan has been invalidated, so unlink it from the parent and release it.
*/
- if (plan->is_valid && plan->dependsOnRole &&
- plan->planRoleId != GetUserId())
- plan->is_valid = false;
+ ReleaseGenericPlan(plansource);
- /*
- * If it appears valid, acquire locks and recheck; this is much the same
- * logic as in RevalidateCachedQuery, but for a plan.
- */
- if (plan->is_valid)
+ return false;
+}
+
+/*
+ * GenericPlanIsValid
+ * Is a generic plan still valid?
+ *
+ * It may have gone stale due to concurrent schema modifications of relations
+ * mentioned in the plan or a couple of other things mentioned below.
+ */
+static bool
+GenericPlanIsValid(CachedPlan *cplan)
+{
+ Assert(cplan != NULL);
+ Assert(cplan->magic == CACHEDPLAN_MAGIC);
+ /* Generic plans are never one-shot */
+ Assert(!cplan->is_oneshot);
+
+ if (cplan->is_valid)
{
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
- Assert(plan->refcount > 0);
-
- AcquireExecutorLocks(plan->stmt_list, true);
+ Assert(cplan->refcount > 0);
/*
- * If plan was transient, check to see if TransactionXmin has
- * advanced, and if so invalidate it.
+ * If plan isn't valid for current role, we can't use it.
*/
- if (plan->is_valid &&
- TransactionIdIsValid(plan->saved_xmin) &&
- !TransactionIdEquals(plan->saved_xmin, TransactionXmin))
- plan->is_valid = false;
+ if (cplan->dependsOnRole && cplan->planRoleId != GetUserId())
+ cplan->is_valid = false;
/*
- * By now, if any invalidation has happened, the inval callback
- * functions will have marked the plan invalid.
+ * If plan was transient, check to see if TransactionXmin has
+ * advanced, and if so invalidate it.
*/
- if (plan->is_valid)
- {
- /* Successfully revalidated and locked the query. */
- return true;
- }
-
- /* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ if (TransactionIdIsValid(cplan->saved_xmin) &&
+ !TransactionIdEquals(cplan->saved_xmin, TransactionXmin))
+ cplan->is_valid = false;
}
- /*
- * Plan has been invalidated, so unlink it from the parent and release it.
- */
- ReleaseGenericPlan(plansource);
-
- return false;
+ return cplan->is_valid;
}
/*
@@ -1155,8 +1156,16 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
- * On return, the plan is valid and we have sufficient locks to begin
- * execution.
+ * Typically, the plan returned by this function is valid. However, a caveat
+ * arises with inheritance/partition child tables. These aren't locked by
+ * this function, as we only lock tables directly mentioned in the original
+ * query here. The task of locking these child tables falls to the executor
+ * during plan tree setup. If acquiring these locks invalidates the plan, the
+ * executor should inform the caller to regenerate the plan by invoking this
+ * function again. The reason for this deferred child table locking mechanism
+ * is efficiency: not all of them may need to be locked. Some could be pruned during
+ * executor initialization, especially if their corresponding plan nodes
+ * facilitate partition pruning.
*
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
@@ -1191,7 +1200,10 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
{
if (CheckCachedPlan(plansource))
{
- /* We want a generic plan, and we already have a valid one */
+ /*
+ * We want a generic plan, and we already have a valid one, though
+ * see the header comment.
+ */
plan = plansource->gplan;
Assert(plan->magic == CACHEDPLAN_MAGIC);
}
@@ -1389,8 +1401,8 @@ CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
}
/*
- * Reject if AcquireExecutorLocks would have anything to do. This is
- * probably unnecessary given the previous check, but let's be safe.
+ * Reject if the executor would need to take additional locks, that is, in
+ * addition to those taken by AcquirePlannerLocks() on a given query.
*/
foreach(lc, plan->stmt_list)
{
@@ -1766,58 +1778,6 @@ QueryListGetPrimaryStmt(List *stmts)
return NULL;
}
-/*
- * AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
- */
-static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
-{
- ListCell *lc1;
-
- foreach(lc1, stmt_list)
- {
- PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
-
- if (plannedstmt->commandType == CMD_UTILITY)
- {
- /*
- * Ignore utility statements, except those (such as EXPLAIN) that
- * contain a parsed-but-not-planned query. Note: it's okay to use
- * ScanQueryForLocks, even though the query hasn't been through
- * rule rewriting, because rewriting doesn't change the query
- * representation.
- */
- Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
-
- if (query)
- ScanQueryForLocks(query, acquire);
- continue;
- }
-
- foreach(lc2, plannedstmt->rtable)
- {
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
-
- if (!(rte->rtekind == RTE_RELATION ||
- (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
- continue;
-
- /*
- * Acquire the appropriate type of lock on each relation OID. Note
- * that we don't actually try to open the rel, and hence will not
- * fail if it's been dropped entirely --- we'll just transiently
- * acquire a non-conflicting lock.
- */
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
- }
- }
-}
-
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/test/modules/delay_execution/Makefile b/src/test/modules/delay_execution/Makefile
index 70f24e846d..2fca84d027 100644
--- a/src/test/modules/delay_execution/Makefile
+++ b/src/test/modules/delay_execution/Makefile
@@ -8,7 +8,8 @@ OBJS = \
delay_execution.o
ISOLATION = partition-addition \
- partition-removal-1
+ partition-removal-1 \
+ cached-plan-replan
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/src/test/modules/delay_execution/delay_execution.c b/src/test/modules/delay_execution/delay_execution.c
index 7cd76eb34b..ce189156ad 100644
--- a/src/test/modules/delay_execution/delay_execution.c
+++ b/src/test/modules/delay_execution/delay_execution.c
@@ -1,14 +1,18 @@
/*-------------------------------------------------------------------------
*
* delay_execution.c
- * Test module to allow delay between parsing and execution of a query.
+ * Test module to introduce delay at various points during execution of a
+ * query to test that execution proceeds safely in light of concurrent
+ * changes.
*
* The delay is implemented by taking and immediately releasing a specified
* advisory lock. If another process has previously taken that lock, the
* current process will be blocked until the lock is released; otherwise,
* there's no effect. This allows an isolationtester script to reliably
- * test behaviors where some specified action happens in another backend
- * between parsing and execution of any desired query.
+ * test behaviors where some specified action happens in another backend in
+ * a couple of cases: 1) between parsing and execution of any desired query
+ * when using the planner_hook, 2) between RevalidateCachedQuery() and
+ * ExecutorStart() when using the ExecutorStart_hook.
*
* Copyright (c) 2020-2023, PostgreSQL Global Development Group
*
@@ -22,6 +26,7 @@
#include <limits.h>
+#include "executor/executor.h"
#include "optimizer/planner.h"
#include "utils/builtins.h"
#include "utils/guc.h"
@@ -32,9 +37,11 @@ PG_MODULE_MAGIC;
/* GUC: advisory lock ID to use. Zero disables the feature. */
static int post_planning_lock_id = 0;
+static int executor_start_lock_id = 0;
-/* Save previous planner hook user to be a good citizen */
+/* Save previous hook users to be a good citizen */
static planner_hook_type prev_planner_hook = NULL;
+static ExecutorStart_hook_type prev_ExecutorStart_hook = NULL;
/* planner_hook function to provide the desired delay */
@@ -70,11 +77,45 @@ delay_execution_planner(Query *parse, const char *query_string,
return result;
}
+/* ExecutorStart_hook function to provide the desired delay */
+static bool
+delay_execution_ExecutorStart(QueryDesc *queryDesc, int eflags)
+{
+ bool plan_valid;
+
+ /* If enabled, delay by taking and releasing the specified lock */
+ if (executor_start_lock_id != 0)
+ {
+ DirectFunctionCall1(pg_advisory_lock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+ DirectFunctionCall1(pg_advisory_unlock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+
+ /*
+ * Ensure that we notice any pending invalidations, since the advisory
+ * lock functions don't do this.
+ */
+ AcceptInvalidationMessages();
+ }
+
+ /* Now start the executor, possibly via a previous hook user */
+ if (prev_ExecutorStart_hook)
+ plan_valid = prev_ExecutorStart_hook(queryDesc, eflags);
+ else
+ plan_valid = standard_ExecutorStart(queryDesc, eflags);
+
+ if (executor_start_lock_id != 0)
+ elog(NOTICE, "Finished ExecutorStart(): CachedPlan is %s",
+ plan_valid ? "valid" : "not valid");
+
+ return plan_valid;
+}
+
/* Module load function */
void
_PG_init(void)
{
- /* Set up the GUC to control which lock is used */
+ /* Set up GUCs to control which lock is used */
DefineCustomIntVariable("delay_execution.post_planning_lock_id",
"Sets the advisory lock ID to be locked/unlocked after planning.",
"Zero disables the delay.",
@@ -86,10 +127,22 @@ _PG_init(void)
NULL,
NULL,
NULL);
-
+ DefineCustomIntVariable("delay_execution.executor_start_lock_id",
+ "Sets the advisory lock ID to be locked/unlocked before starting execution.",
+ "Zero disables the delay.",
+ &executor_start_lock_id,
+ 0,
+ 0, INT_MAX,
+ PGC_USERSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
MarkGUCPrefixReserved("delay_execution");
- /* Install our hook */
+ /* Install our hooks. */
prev_planner_hook = planner_hook;
planner_hook = delay_execution_planner;
+ prev_ExecutorStart_hook = ExecutorStart_hook;
+ ExecutorStart_hook = delay_execution_ExecutorStart;
}
diff --git a/src/test/modules/delay_execution/expected/cached-plan-replan.out b/src/test/modules/delay_execution/expected/cached-plan-replan.out
new file mode 100644
index 0000000000..122d81f2ee
--- /dev/null
+++ b/src/test/modules/delay_execution/expected/cached-plan-replan.out
@@ -0,0 +1,158 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1prep s2lock s1exec s2dropi s2unlock
+step s1prep: SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1 FOR UPDATE;
+ EXPLAIN (COSTS OFF) EXECUTE q (1);
+QUERY PLAN
+----------------------------------------------
+LockRows
+ -> Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo11 foo_1
+ Recheck Cond: (a = $1)
+ -> Bitmap Index Scan on foo11_a
+ Index Cond: (a = $1)
+(7 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+-----------------------------------
+LockRows
+ -> Append
+ Subplans Removed: 1
+ -> Seq Scan on foo11 foo_1
+ Filter: (a = $1)
+(5 rows)
+
+
+starting permutation: s1prep2 s2lock s1exec2 s2dropi s2unlock
+step s1prep2: SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = 1;
+ EXPLAIN (COSTS OFF) EXECUTE q2;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+----------------------------------
+Bitmap Heap Scan on foo11 foo
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on foo11_a
+ Index Cond: (a = 1)
+(4 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec2: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec2: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------
+Seq Scan on foo11 foo
+ Filter: (a = 1)
+(2 rows)
+
+
+starting permutation: s1prep3 s2lock s1exec3 s2dropi s2unlock
+step s1prep3: SET plan_cache_mode = force_generic_plan;
+ SET enable_partitionwise_aggregate = on;
+ SET enable_partitionwise_join = on;
+ PREPARE q3 AS SELECT t1.a, count(t2.b) FROM foo t1, foo t2 WHERE t1.a = t2.a GROUP BY 1;
+ EXPLAIN (COSTS OFF) EXECUTE q3;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+------------------------------------------------------------
+Append
+ -> GroupAggregate
+ Group Key: t1.a
+ -> Merge Join
+ Merge Cond: (t1.a = t2.a)
+ -> Index Only Scan using foo11_a on foo11 t1
+ -> Materialize
+ -> Index Scan using foo11_a on foo11 t2
+ -> GroupAggregate
+ Group Key: t1_1.a
+ -> Merge Join
+ Merge Cond: (t1_1.a = t2_1.a)
+ -> Sort
+ Sort Key: t1_1.a
+ -> Seq Scan on foo2 t1_1
+ -> Sort
+ Sort Key: t2_1.a
+ -> Seq Scan on foo2 t2_1
+(18 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec3: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec3: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------------------------------
+Append
+ -> GroupAggregate
+ Group Key: t1.a
+ -> Merge Join
+ Merge Cond: (t1.a = t2.a)
+ -> Sort
+ Sort Key: t1.a
+ -> Seq Scan on foo11 t1
+ -> Sort
+ Sort Key: t2.a
+ -> Seq Scan on foo11 t2
+ -> GroupAggregate
+ Group Key: t1_1.a
+ -> Merge Join
+ Merge Cond: (t1_1.a = t2_1.a)
+ -> Sort
+ Sort Key: t1_1.a
+ -> Seq Scan on foo2 t1_1
+ -> Sort
+ Sort Key: t2_1.a
+ -> Seq Scan on foo2 t2_1
+(21 rows)
+
diff --git a/src/test/modules/delay_execution/specs/cached-plan-replan.spec b/src/test/modules/delay_execution/specs/cached-plan-replan.spec
new file mode 100644
index 0000000000..2d0607b176
--- /dev/null
+++ b/src/test/modules/delay_execution/specs/cached-plan-replan.spec
@@ -0,0 +1,61 @@
+# Test to check that invalidation of cached generic plans during ExecutorStart
+# correctly triggers replanning and re-execution.
+
+setup
+{
+ CREATE TABLE foo (a int, b text) PARTITION BY LIST(a);
+ CREATE TABLE foo1 PARTITION OF foo FOR VALUES IN (1) PARTITION BY LIST (a);
+ CREATE TABLE foo11 PARTITION OF foo1 FOR VALUES IN (1);
+ CREATE INDEX foo11_a ON foo11 (a);
+ CREATE TABLE foo2 PARTITION OF foo FOR VALUES IN (2);
+ CREATE VIEW foov AS SELECT * FROM foo;
+}
+
+teardown
+{
+ DROP VIEW foov;
+ DROP TABLE foo;
+}
+
+session "s1"
+# Append with run-time pruning
+step "s1prep" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1 FOR UPDATE;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+
+# no Append case (only one partition selected by the planner)
+step "s1prep2" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = 1;
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+
+# Append with partition-wise join aggregate and join plans as child subplans
+step "s1prep3" { SET plan_cache_mode = force_generic_plan;
+ SET enable_partitionwise_aggregate = on;
+ SET enable_partitionwise_join = on;
+ PREPARE q3 AS SELECT t1.a, count(t2.b) FROM foo t1, foo t2 WHERE t1.a = t2.a GROUP BY 1;
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+
+# Executes a generic plan
+step "s1exec" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+step "s1exec2" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+step "s1exec3" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+
+session "s2"
+step "s2lock" { SELECT pg_advisory_lock(12345); }
+step "s2unlock" { SELECT pg_advisory_unlock(12345); }
+step "s2dropi" { DROP INDEX foo11_a; }
+
+# While "s1exec", etc. wait to acquire the advisory lock, "s2drop" is able to
+# drop the index being used in the cached plan. When "s1exec" is then
+# unblocked and initializes the cached plan for execution, it detects the
+# concurrent index drop and causes the cached plan to be discarded and
+# recreated without the index.
+permutation "s1prep" "s2lock" "s1exec" "s2dropi" "s2unlock"
+permutation "s1prep2" "s2lock" "s1exec2" "s2dropi" "s2unlock"
+permutation "s1prep3" "s2lock" "s1exec3" "s2dropi" "s2unlock"
--
2.35.3
v49-0005-Assert-that-relations-needing-their-permissions-.patch
From 703d7f12172a74c2ef26fa7723105846b4365b4c Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Mon, 25 Sep 2023 11:52:02 +0900
Subject: [PATCH v49 5/8] Assert that relations needing their permissions
checked are locked
---
src/backend/executor/execMain.c | 11 +++++++
src/backend/storage/lmgr/lmgr.c | 45 +++++++++++++++++++++++++++++
src/backend/utils/cache/lsyscache.c | 21 ++++++++++++++
src/include/storage/lmgr.h | 1 +
src/include/utils/lsyscache.h | 1 +
5 files changed, 79 insertions(+)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 5755336abd..ffc62e379a 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -626,6 +626,17 @@ ExecCheckPermissions(List *rangeTable, List *rteperminfos,
(rte->rtekind == RTE_SUBQUERY &&
rte->relkind == RELKIND_VIEW));
+ /*
+ * Relations whose permissions need to be checked must already
+ * have been locked by the parser or by GetCachedPlan() if a
+ * cached plan is being executed.
+ *
+ * XXX Maybe we should skip calling ExecCheckPermissions from
+ * InitPlan in a parallel worker.
+ */
+ Assert(IsParallelWorker() ||
+ CheckRelLockedByMe(rte->relid, AccessShareLock, true));
+
(void) getRTEPermissionInfo(rteperminfos, rte);
/* Many-to-one mapping not allowed */
Assert(!bms_is_member(rte->perminfoindex, indexset));
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index b447ddf11b..3c05d8d87d 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -27,6 +27,7 @@
#include "storage/procarray.h"
#include "storage/sinvaladt.h"
#include "utils/inval.h"
+#include "utils/lsyscache.h"
/*
@@ -364,6 +365,50 @@ CheckRelationLockedByMe(Relation relation, LOCKMODE lockmode, bool orstronger)
return false;
}
+/*
+ * CheckRelLockedByMe
+ *
+ * Returns true if current transaction holds a lock on the given relation of
+ * mode 'lockmode'. If 'orstronger' is true, a stronger lockmode is also OK.
+ * ("Stronger" is defined as "numerically higher", which is a bit
+ * semantically dubious but is OK for the purposes we use this for.)
+ */
+bool
+CheckRelLockedByMe(Oid relid, LOCKMODE lockmode, bool orstronger)
+{
+ Oid dbId = get_rel_relisshared(relid) ? InvalidOid : MyDatabaseId;
+ LOCKTAG tag;
+
+ SET_LOCKTAG_RELATION(tag, dbId, relid);
+
+ if (LockHeldByMe(&tag, lockmode))
+ return true;
+
+ if (orstronger)
+ {
+ LOCKMODE slockmode;
+
+ for (slockmode = lockmode + 1;
+ slockmode <= MaxLockMode;
+ slockmode++)
+ {
+ if (LockHeldByMe(&tag, slockmode))
+ {
+#ifdef NOT_USED
+ /* Sometimes this might be useful for debugging purposes */
+ elog(WARNING, "lock mode %s substituted for %s on relation %s",
+ GetLockmodeName(tag.locktag_lockmethodid, slockmode),
+ GetLockmodeName(tag.locktag_lockmethodid, lockmode),
+ get_rel_name(relid));
+#endif
+ return true;
+ }
+ }
+ }
+
+ return false;
+}
+
/*
* LockHasWaitersRelation
*
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index fc6d267e44..2725d02312 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -2095,6 +2095,27 @@ get_rel_persistence(Oid relid)
return result;
}
+/*
+ * get_rel_relisshared
+ *
+ * Returns whether the given relation is shared
+ */
+bool
+get_rel_relisshared(Oid relid)
+{
+ HeapTuple tp;
+ Form_pg_class reltup;
+ bool result;
+
+ tp = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+ if (!HeapTupleIsValid(tp))
+ elog(ERROR, "cache lookup failed for relation %u", relid);
+ reltup = (Form_pg_class) GETSTRUCT(tp);
+ result = reltup->relisshared;
+ ReleaseSysCache(tp);
+
+ return result;
+}
/* ---------- TRANSFORM CACHE ---------- */
diff --git a/src/include/storage/lmgr.h b/src/include/storage/lmgr.h
index 39f0e346b0..426614050d 100644
--- a/src/include/storage/lmgr.h
+++ b/src/include/storage/lmgr.h
@@ -48,6 +48,7 @@ extern bool ConditionalLockRelation(Relation relation, LOCKMODE lockmode);
extern void UnlockRelation(Relation relation, LOCKMODE lockmode);
extern bool CheckRelationLockedByMe(Relation relation, LOCKMODE lockmode,
bool orstronger);
+extern bool CheckRelLockedByMe(Oid relid, LOCKMODE lockmode, bool orstronger);
extern bool LockHasWaitersRelation(Relation relation, LOCKMODE lockmode);
extern void LockRelationIdForSession(LockRelId *relid, LOCKMODE lockmode);
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index c22cabdf42..177e5d7f12 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -140,6 +140,7 @@ extern char get_rel_relkind(Oid relid);
extern bool get_rel_relispartition(Oid relid);
extern Oid get_rel_tablespace(Oid relid);
extern char get_rel_persistence(Oid relid);
+extern bool get_rel_relisshared(Oid relid);
extern Oid get_transform_fromsql(Oid typid, Oid langid, List *trftypes);
extern Oid get_transform_tosql(Oid typid, Oid langid, List *trftypes);
extern bool get_typisdefined(Oid typid);
--
2.35.3
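A minimal usage sketch of the new helper (not part of the patch; 'relid' is
assumed to be a valid pg_class OID):

    /*
     * Because 'orstronger' is true, holding any mode numerically stronger
     * than AccessShareLock -- e.g. RowExclusiveLock taken for an UPDATE --
     * also satisfies the check.
     */
    if (!CheckRelLockedByMe(relid, AccessShareLock, true))
        elog(ERROR, "relation %u is not locked by this transaction", relid);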
v49-0004-Teach-the-executor-to-lock-child-tables-in-some-.patch
From dce6657e911afecef82cddda90e0d236c273fe18 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Fri, 22 Sep 2023 18:17:15 +0900
Subject: [PATCH v49 4/8] Teach the executor to lock child tables in some cases
An upcoming commit will move the locking of child tables referenced
in a cached plan tree from GetCachedPlan() to the executor
initialization of the plan tree in ExecutorStart(). This commit
teaches ExecGetRangeTableRelation() to lock child tables if
EState.es_cachedplan points to a CachedPlan.
The executor must now deal with the cases where an unlocked child
table might have been concurrently dropped, so this modifies
ExecGetRangeTableRelation() to use try_table_open(). All of its
callers (and those of ExecOpenScanRelation() that calls it) must
now account for the child table disappearing, which means to abort
initializing the table's Scan node in the middle.
ExecGetRangeTableRelation() now examines the inFromCl field of an RTE
to determine whether a given range table relation is a child table, so
this commit also makes the planner set inFromCl to false in the
child tables' RTEs that it manufactures.
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
src/backend/executor/README | 36 +++++++++++++++++++++++-
src/backend/executor/execPartition.c | 2 ++
src/backend/executor/execUtils.c | 41 +++++++++++++++++++++-------
src/backend/optimizer/util/inherit.c | 7 +++++
src/backend/parser/analyze.c | 7 ++---
src/include/nodes/parsenodes.h | 8 ++++--
6 files changed, 84 insertions(+), 17 deletions(-)
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 642d63be61..6cd840d3a7 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -280,6 +280,34 @@ are typically reset to empty once per tuple. Per-tuple contexts are usually
associated with ExprContexts, and commonly each PlanState node has its own
ExprContext to evaluate its qual and targetlist expressions in.
+Relation Locking
+----------------
+
+Typically, when the executor initializes a plan tree for execution, it doesn't
+lock non-index relations if the plan tree is freshly generated and not derived
+from a CachedPlan. This is because such locks have already been established
+during the query's parsing, rewriting, and planning phases. However, with a
+cached plan tree, there can be relations that remain unlocked. The function
+GetCachedPlan() locks the relations present in the query's range table before
+planning, but not those added during planning. Consequently,
+inheritance child tables, introduced to the query's range table during planning,
+won't be locked when the cached plan reaches the executor.
+
+GetCachedPlan() defers locking child tables because not all of them might be
+accessed during plan execution. For instance, if the child tables are
+partitions, some might be omitted by pruning done at execution-initialization
+time. Locking these child tables therefore becomes the responsibility of
+execution initialization, taking place in ExecInitNode() for the plan nodes
+that contain these tables.
+
+This approach opens a window where a cached plan tree with child tables could
+become outdated if another backend modifies these tables before ExecInitNode()
+locks them. The executor therefore has the added duty of confirming the plan
+tree's validity whenever it locks a child table after execution-initialization
+pruning. This validation is done by checking the is_valid flag of the provided
+CachedPlan. If the plan tree is outdated (is_valid=false), the executor halts
+any further initialization and alerts the caller that it should retry
+execution with a freshly created plan tree.
Query Processing Control Flow
-----------------------------
@@ -316,7 +344,13 @@ This is a sketch of control flow for full query processing:
FreeQueryDesc
-Per above comments, it's not really critical for ExecEndNode to free any
+As mentioned in the "Relation Locking" section, if the plan tree is found to
+be stale during one of the recursive calls of ExecInitNode() after taking a
+lock on a child table, control is immediately returned to the caller of
+ExecutorStart(), which must redo the steps from CreateQueryDesc with a new
+plan tree.
+
+Per above comments, it's not really critical for ExecEndPlan to free any
memory; it'll all go away in FreeExecutorState anyway. However, we do need to
be careful to close relations, drop buffer pins, etc, so we do need to scan
the plan state tree to find these sorts of resources.
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index f6c34328b8..532734d758 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1927,6 +1927,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* duration of this executor run.
*/
partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+ if (unlikely(partrel == NULL))
+ return NULL;
partkey = RelationGetPartitionKey(partrel);
partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
partrel);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index f0f5740c26..117773706a 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -697,6 +697,8 @@ ExecRelationIsTargetRelation(EState *estate, Index scanrelid)
*
* Open the heap relation to be scanned by a base-level scan plan node.
* This should be called during the node's ExecInit routine.
+ *
+ * NULL is returned if the relation is found to have been dropped.
* ----------------------------------------------------------------
*/
Relation
@@ -706,6 +708,8 @@ ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags)
/* Open the relation. */
rel = ExecGetRangeTableRelation(estate, scanrelid);
+ if (unlikely(rel == NULL))
+ return NULL;
/*
* Complain if we're attempting a scan of an unscannable relation, except
@@ -763,6 +767,9 @@ ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos)
* Open the Relation for a range table entry, if not already done
*
* The Relations will be closed again in ExecEndPlan().
+ *
+ * The returned value may be NULL if the relation is an unlocked child table
+ * that was concurrently dropped before it could be locked here.
*/
Relation
ExecGetRangeTableRelation(EState *estate, Index rti)
@@ -779,7 +786,28 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rte->rtekind == RTE_RELATION);
- if (!IsParallelWorker())
+ if (IsParallelWorker() ||
+ (estate->es_cachedplan != NULL && !rte->inFromCl))
+ {
+ /*
+ * Take a lock if we are a parallel worker or if this is a child
+ * table referenced in a cached plan.
+ *
+ * Parallel workers need to have their own local lock on the
+ * relation. This ensures sane behavior in case the parent process
+ * exits before we do.
+ *
+ * When executing a cached plan, child tables must be locked
+ * here, because plancache.c (GetCachedPlan()) would only have
+ * locked tables mentioned in the query, that is, tables whose
+ * RTEs' inFromCl is true.
+ *
+ * Note that we use try_table_open() here, because without a lock
+ * held on the relation, it may have disappeared from under us.
+ */
+ rel = try_table_open(rte->relid, rte->rellockmode);
+ }
+ else
{
/*
* In a normal query, we should already have the appropriate lock,
@@ -792,15 +820,6 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rte->rellockmode == AccessShareLock ||
CheckRelationLockedByMe(rel, rte->rellockmode, false));
}
- else
- {
- /*
- * If we are a parallel worker, we need to obtain our own local
- * lock on the relation. This ensures sane behavior in case the
- * parent process exits before we do.
- */
- rel = table_open(rte->relid, rte->rellockmode);
- }
estate->es_relations[rti - 1] = rel;
}
@@ -823,6 +842,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
Relation resultRelationDesc;
resultRelationDesc = ExecGetRangeTableRelation(estate, rti);
+ if (unlikely(resultRelationDesc == NULL))
+ return;
InitResultRelInfo(resultRelInfo,
resultRelationDesc,
rti,
diff --git a/src/backend/optimizer/util/inherit.c b/src/backend/optimizer/util/inherit.c
index f9d3ff1e7a..1b9d79e341 100644
--- a/src/backend/optimizer/util/inherit.c
+++ b/src/backend/optimizer/util/inherit.c
@@ -493,6 +493,13 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
}
else
childrte->inh = false;
+
+ /*
+ * Flag child tables as indirectly referenced in the query. This helps
+ * the executor's ExecGetRangeTableRelation() recognize them as
+ * inheritance children.
+ */
+ childrte->inFromCl = false;
childrte->securityQuals = NIL;
/* No permission checking for child RTEs. */
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 7a1dfb6364..cf269f8c53 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -3305,10 +3305,9 @@ transformLockingClause(ParseState *pstate, Query *qry, LockingClause *lc,
/*
* Lock all regular tables used in query and its subqueries. We
* examine inFromCl to exclude auto-added RTEs, particularly NEW/OLD
- * in rules. This is a bit of an abuse of a mostly-obsolete flag, but
- * it's convenient. We can't rely on the namespace mechanism that has
- * largely replaced inFromCl, since for example we need to lock
- * base-relation RTEs even if they are masked by upper joins.
+ * in rules. We can't rely on the namespace mechanism since for
+ * example we need to lock base-relation RTEs even if they are masked
+ * by upper joins.
*/
i = 0;
foreach(rt, qry->rtable)
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index e494309da8..642c7bdfea 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -987,11 +987,15 @@ typedef struct PartitionCmd
*
* inFromCl marks those range variables that are listed in the FROM clause.
* It's false for RTEs that are added to a query behind the scenes, such
- * as the NEW and OLD variables for a rule, or the subqueries of a UNION.
+ * as the NEW and OLD variables for a rule, or the subqueries of a UNION,
+ * or the RTEs of inheritance child tables that are added by the planner.
* This flag is not used during parsing (except in transformLockingClause,
* q.v.); the parser now uses a separate "namespace" data structure to
* control visibility. But it is needed by ruleutils.c to determine
- * whether RTEs should be shown in decompiled queries.
+ * whether RTEs should be shown in decompiled queries. The executor uses
+ * this to determine whether an RTE_RELATION entry is for a table explicitly
+ * named in the query or a child table added by the planner. This
+ * distinction is vital when child tables in a plan must be locked.
*
* securityQuals is a list of security barrier quals (boolean expressions),
* to be tested in the listed order before returning a row from the
--
2.35.3
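To make the intended control flow concrete, here is a rough sketch of the
retry loop a caller of ExecutorStart() ends up needing once child-table
locking is deferred to executor startup. This is only an illustration of
what the README hunk above describes, not code from the patch series; it
assumes 'plansource', 'params', 'owner', 'queryEnv', 'dest', and
'query_string' are set up as for any cached-plan execution, and it uses
ExecPlanStillValid() from patch 0002 below:

    for (;;)
    {
        CachedPlan *cplan = GetCachedPlan(plansource, params, owner, queryEnv);
        QueryDesc  *qdesc;

        qdesc = CreateQueryDesc(linitial_node(PlannedStmt, cplan->stmt_list),
                                cplan, query_string,
                                GetActiveSnapshot(), InvalidSnapshot,
                                dest, params, queryEnv, 0);

        ExecutorStart(qdesc, 0);
        if (ExecPlanStillValid(qdesc->estate))
            break;              /* all locks taken; safe to run the plan */

        /*
         * Concurrent DDL invalidated the plan while ExecInitNode() was
         * locking child tables; tear down the partially built state
         * (safe per patch 0001) and retry with a freshly built plan.
         */
        ExecutorEnd(qdesc);
        FreeQueryDesc(qdesc);
        ReleaseCachedPlan(cplan, owner);
    }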
v49-0008-Track-opened-range-table-relations-in-a-List-in-.patch
From bae4dc2096b673c1ff8ff7da2f4e068ae39d859e Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Wed, 6 Sep 2023 17:54:19 +0900
Subject: [PATCH v49 8/8] Track opened range table relations in a List in
EState
This makes ExecCloseRangeTableRelations faster when there are many
relations in the range table but only a few are opened during
execution, such as when run-time pruning kicks in on an Append
containing thousands of partition subplans.
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
src/backend/executor/execMain.c | 9 +++++----
src/backend/executor/execUtils.c | 3 +++
src/include/nodes/execnodes.h | 2 ++
3 files changed, 10 insertions(+), 4 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 2804ec70f1..d559c1de61 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1649,12 +1649,13 @@ ExecCloseResultRelations(EState *estate)
void
ExecCloseRangeTableRelations(EState *estate)
{
- int i;
+ ListCell *lc;
- for (i = 0; i < estate->es_range_table_size; i++)
+ foreach(lc, estate->es_opened_relations)
{
- if (estate->es_relations[i])
- table_close(estate->es_relations[i], NoLock);
+ Relation rel = lfirst(lc);
+
+ table_close(rel, NoLock);
}
}
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 2b7a08c9ba..1dfef44495 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -822,6 +822,9 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
}
estate->es_relations[rti - 1] = rel;
+ if (rel != NULL)
+ estate->es_opened_relations = lappend(estate->es_opened_relations,
+ rel);
}
return rel;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index cad2329ac9..c7bd235737 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -618,6 +618,8 @@ typedef struct EState
Index es_range_table_size; /* size of the range table arrays */
Relation *es_relations; /* Array of per-range-table-entry Relation
* pointers, or NULL if not yet opened */
+ List *es_opened_relations; /* List of non-NULL entries in
+ * es_relations in no specific order */
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
--
2.35.3
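For contrast, the complexity change this buys (the "before" loop is what the
hunk above removes):

    /* Before: closing relations walked the entire range table. */
    for (i = 0; i < estate->es_range_table_size; i++)
        if (estate->es_relations[i])
            table_close(estate->es_relations[i], NoLock);

    /* After: only relations actually opened are visited, so an Append
     * over thousands of partitions pruned down to a handful closes in
     * time proportional to that handful. */
    foreach(lc, estate->es_opened_relations)
        table_close((Relation) lfirst(lc), NoLock);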
v49-0002-Prepare-executor-to-support-detecting-CachedPlan.patch
From 1c134f99b792c8421d9d5f9590c2f761c6dd31c7 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Fri, 22 Sep 2023 18:12:04 +0900
Subject: [PATCH v49 2/8] Prepare executor to support detecting CachedPlan
invalidation
This adds checks at various points during the executor's
initialization of the plan tree to determine whether the originating
CachedPlan has become invalid as a result of taking locks on the
relations referenced in the plan. This includes addding the check
after every call to ExecOpenScanRelation() and to ExecInitNode(),
including the recursive ones to initialize child nodes.
If a given ExecInit*() function detects that the plan has become
invalid, it should return immediately even if the PlanState node
it's building may only be partially valid. That is crucial for
two reasons depending on where the check is:
* The checks following ExecOpenScanRelation() may find that the plan
has become invalid because the requested relation was dropped or had
its schema changed concurrently in a way that makes the code that
follows unsafe. For example, that code might dereference a NULL
pointer if the check failed because the relation was dropped.
* For the checks following ExecInitNode(), the returned child
PlanState node might be only partially valid, and the code that
follows may misbehave if it depends on inspecting that child
PlanState. Note that this commit adds the check after all calls
of ExecInitNode() in the code base, even at sites where no code
that might misbehave exists today, because such code might be
added in the future. It seems better to put the guards in place
now rather than later when the need arises.
To pass the CachedPlan that the executor will use for these checks,
this adds a new field to QueryDesc and a new parameter to
CreateQueryDesc(). No caller of CreateQueryDesc() is made to pass
an actual CachedPlan though, so there is no functional change.
Reviewed-by: Robert Haas
---
contrib/postgres_fdw/postgres_fdw.c | 10 +++++-
src/backend/commands/copyto.c | 3 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 2 +-
src/backend/commands/extension.c | 1 +
src/backend/commands/matview.c | 2 +-
src/backend/executor/execMain.c | 39 ++++++++++++++++++----
src/backend/executor/execParallel.c | 9 ++++-
src/backend/executor/execProcnode.c | 4 +++
src/backend/executor/functions.c | 1 +
src/backend/executor/nodeAgg.c | 2 ++
src/backend/executor/nodeAppend.c | 10 +++---
src/backend/executor/nodeBitmapAnd.c | 2 ++
src/backend/executor/nodeBitmapHeapscan.c | 4 +++
src/backend/executor/nodeBitmapOr.c | 2 ++
src/backend/executor/nodeCustom.c | 2 ++
src/backend/executor/nodeForeignscan.c | 4 +++
src/backend/executor/nodeGather.c | 2 ++
src/backend/executor/nodeGatherMerge.c | 2 ++
src/backend/executor/nodeGroup.c | 2 ++
src/backend/executor/nodeHash.c | 2 ++
src/backend/executor/nodeHashjoin.c | 4 +++
src/backend/executor/nodeIncrementalSort.c | 2 ++
src/backend/executor/nodeIndexonlyscan.c | 2 ++
src/backend/executor/nodeIndexscan.c | 2 ++
src/backend/executor/nodeLimit.c | 2 ++
src/backend/executor/nodeLockRows.c | 2 ++
src/backend/executor/nodeMaterial.c | 2 ++
src/backend/executor/nodeMemoize.c | 2 ++
src/backend/executor/nodeMergeAppend.c | 4 ++-
src/backend/executor/nodeMergejoin.c | 4 +++
src/backend/executor/nodeModifyTable.c | 13 ++++++++
src/backend/executor/nodeNestloop.c | 4 +++
src/backend/executor/nodeProjectSet.c | 2 ++
src/backend/executor/nodeRecursiveunion.c | 4 +++
src/backend/executor/nodeResult.c | 2 ++
src/backend/executor/nodeSamplescan.c | 2 ++
src/backend/executor/nodeSeqscan.c | 2 ++
src/backend/executor/nodeSetOp.c | 2 ++
src/backend/executor/nodeSort.c | 2 ++
src/backend/executor/nodeSubqueryscan.c | 2 ++
src/backend/executor/nodeTidrangescan.c | 2 ++
src/backend/executor/nodeTidscan.c | 2 ++
src/backend/executor/nodeUnique.c | 2 ++
src/backend/executor/nodeWindowAgg.c | 2 ++
src/backend/executor/spi.c | 1 +
src/backend/tcop/pquery.c | 5 ++-
src/include/executor/execdesc.h | 4 +++
src/include/executor/executor.h | 10 ++++++
src/include/nodes/execnodes.h | 2 ++
src/include/utils/plancache.h | 14 ++++++++
51 files changed, 194 insertions(+), 18 deletions(-)
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 6de2bec3b7..491200a1fd 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2126,7 +2126,11 @@ postgresEndForeignModify(EState *estate,
{
PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
- /* If fmstate is NULL, we are in EXPLAIN; nothing to do */
+ /*
+ * fmstate could be NULL under two conditions: during an EXPLAIN
+ * operation, or if BeginForeignModify() hasn't been invoked.
+ * In either case, no action is required.
+ */
if (fmstate == NULL)
return;
@@ -2660,7 +2664,11 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
/* Get info about foreign table. */
rtindex = node->resultRelInfo->ri_RangeTableIndex;
if (fsplan->scan.scanrelid == 0)
+ {
dmstate->rel = ExecOpenScanRelation(estate, rtindex, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return;
+ }
else
dmstate->rel = node->ss.ss_currentRelation;
table = GetForeignTable(RelationGetRelid(dmstate->rel));
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index c66a047c4a..0929ad929a 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -558,7 +558,8 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL,
+ pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index e91920ca14..18b07c0200 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index f1d71bc54e..c698e54fec 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -572,7 +572,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 535072d181..b287a2e84c 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -797,6 +797,7 @@ execute_sql_string(const char *sql)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index ac2e74fa3f..22b8b820c3 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -408,7 +408,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index f7f18d3054..de7bf7ca67 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -79,7 +79,7 @@ ExecutorEnd_hook_type ExecutorEnd_hook = NULL;
ExecutorCheckPerms_hook_type ExecutorCheckPerms_hook = NULL;
/* decls for local routines only used within this module */
-static void InitPlan(QueryDesc *queryDesc, int eflags);
+static bool InitPlan(QueryDesc *queryDesc, int eflags);
static void CheckValidRowMarkRel(Relation rel, RowMarkType markType);
static void ExecPostprocessPlan(EState *estate);
static void ExecEndPlan(PlanState *planstate, EState *estate);
@@ -263,7 +263,7 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/*
* Initialize the plan state tree
*/
- InitPlan(queryDesc, eflags);
+ (void) InitPlan(queryDesc, eflags);
MemoryContextSwitchTo(oldcontext);
}
@@ -829,9 +829,13 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
*
* Initializes the query plan: open files, allocate storage
* and start up the rule manager
+ *
+ * Returns true if the plan tree is successfully initialized for execution,
+ * false otherwise. The latter case may occur if the CachedPlan that provides
+ * the plan tree (queryDesc->cplan) got invalidated during the initialization.
* ----------------------------------------------------------------
*/
-static void
+static bool
InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
@@ -839,11 +843,14 @@ InitPlan(QueryDesc *queryDesc, int eflags)
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
- PlanState *planstate;
- TupleDesc tupType;
+ PlanState *planstate = NULL;
+ TupleDesc tupType = NULL;
ListCell *l;
int i;
+ Assert(queryDesc->planstate == NULL);
+ Assert(queryDesc->tupDesc == NULL);
+
/*
* Do permissions checks
*/
@@ -855,6 +862,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
estate->es_plannedstmt = plannedstmt;
+ estate->es_cachedplan = queryDesc->cplan;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
@@ -886,6 +894,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
case ROW_MARK_KEYSHARE:
case ROW_MARK_REFERENCE:
relation = ExecGetRangeTableRelation(estate, rc->rti);
+ if (unlikely(relation == NULL))
+ return false;
break;
case ROW_MARK_COPY:
/* no physical table access is required */
@@ -956,6 +966,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_subplanstates = lappend(estate->es_subplanstates,
subplanstate);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return false;
i++;
}
@@ -966,6 +978,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
* processing tuples.
*/
planstate = ExecInitNode(plan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return false;
/*
* Get the tuple descriptor describing the type of tuples to return.
@@ -1009,6 +1023,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
queryDesc->tupDesc = tupType;
queryDesc->planstate = planstate;
+
+ return true;
}
/*
@@ -2858,7 +2874,8 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
* Child EPQ EStates share the parent's copy of unchanging state such as
* the snapshot, rangetable, and external Param info. They need their own
* copies of local state, including a tuple table, es_param_exec_vals,
- * result-rel info, etc.
+ * result-rel info, etc. Also, we don't pass the parent's copy of the
+ * CachedPlan, because no new locks will be taken for EvalPlanQual().
*/
rcestate->es_direction = ForwardScanDirection;
rcestate->es_snapshot = parentestate->es_snapshot;
@@ -2947,6 +2964,13 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
subplanstate = ExecInitNode(subplan, rcestate, 0);
rcestate->es_subplanstates = lappend(rcestate->es_subplanstates,
subplanstate);
+
+ /*
+ * All the necessary locks must already have been taken when
+ * initializing the parent's copy of subplanstate, so the CachedPlan,
+ * if any, should not have become invalid during ExecInitNode().
+ */
+ Assert(ExecPlanStillValid(rcestate));
}
/*
@@ -2988,6 +3012,9 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
*/
epqstate->recheckplanstate = ExecInitNode(planTree, rcestate, 0);
+ /* See the comment above. */
+ Assert(ExecPlanStillValid(rcestate));
+
MemoryContextSwitchTo(oldcontext);
}
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index cc2b8ccab7..457ee46faf 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1248,8 +1248,15 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(¶mspace);
- /* Create a QueryDesc for the query. */
+ /*
+ * Set up a QueryDesc for the query. While the leader may have sourced
+ * the plan tree from a CachedPlan, we don't have one here. That isn't
+ * a problem, because the leader has already taken the required locks,
+ * so our plan tree is valid. Even though we take our own locks in
+ * ExecGetRangeTableRelation(), they are all already held by the leader.
+ */
return CreateQueryDesc(pstmt,
+ NULL,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index b4b5c562c0..febaa194c4 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -136,6 +136,10 @@ static bool ExecShutdownNode_walker(PlanState *node, void *context);
* 'eflags' is a bitwise OR of flag bits described in executor.h
*
* Returns a PlanState node corresponding to the given Plan node.
+ *
+ * On return, callers should check that ExecPlanStillValid(estate)
+ * returns true before doing anything further with the result, because
+ * the returned PlanState might be only partially valid otherwise.
* ------------------------------------------------------------------------
*/
PlanState *
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index bace25234c..66636b05a5 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -838,6 +838,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest = None_Receiver;
es->qd = CreateQueryDesc(es->stmt,
+ NULL, /* fmgr_sql() doesn't use CachedPlans */
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index af22b1676f..597d68139e 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -3304,6 +3304,8 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
eflags &= ~EXEC_FLAG_REWIND;
outerPlan = outerPlan(node);
outerPlanState(aggstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return aggstate;
/*
* initialize source tuple type.
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index a2af221e05..53ca9dc85d 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -185,8 +185,10 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->ps.resultopsset = true;
appendstate->ps.resultopsfixed = false;
- appendplanstates = (PlanState **) palloc(nplans *
- sizeof(PlanState *));
+ appendplanstates = (PlanState **) palloc0(nplans *
+ sizeof(PlanState *));
+ appendstate->appendplans = appendplanstates;
+ appendstate->as_nplans = nplans;
/*
* call ExecInitNode on each of the valid plans to be executed and save
@@ -221,11 +223,11 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
firstvalid = j;
appendplanstates[j++] = ExecInitNode(initNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return appendstate;
}
appendstate->as_first_partial_plan = firstvalid;
- appendstate->appendplans = appendplanstates;
- appendstate->as_nplans = nplans;
/* Initialize async state */
appendstate->as_asyncplans = asyncplans;
diff --git a/src/backend/executor/nodeBitmapAnd.c b/src/backend/executor/nodeBitmapAnd.c
index 4abb0609a0..7556be713c 100644
--- a/src/backend/executor/nodeBitmapAnd.c
+++ b/src/backend/executor/nodeBitmapAnd.c
@@ -89,6 +89,8 @@ ExecInitBitmapAnd(BitmapAnd *node, EState *estate, int eflags)
{
initNode = (Plan *) lfirst(l);
bitmapplanstates[i] = ExecInitNode(initNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return bitmapandstate;
i++;
}
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index d3f58c22f9..f1f8e16b17 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -770,11 +770,15 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return scanstate;
/*
* initialize child nodes
*/
outerPlanState(scanstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return scanstate;
/*
* get the scan type from the relation descriptor.
diff --git a/src/backend/executor/nodeBitmapOr.c b/src/backend/executor/nodeBitmapOr.c
index ace18593aa..7d2bf45d9c 100644
--- a/src/backend/executor/nodeBitmapOr.c
+++ b/src/backend/executor/nodeBitmapOr.c
@@ -90,6 +90,8 @@ ExecInitBitmapOr(BitmapOr *node, EState *estate, int eflags)
{
initNode = (Plan *) lfirst(l);
bitmapplanstates[i] = ExecInitNode(initNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return bitmaporstate;
i++;
}
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
index 28b5bb9353..a0befbd0c6 100644
--- a/src/backend/executor/nodeCustom.c
+++ b/src/backend/executor/nodeCustom.c
@@ -61,6 +61,8 @@ ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
if (scanrelid > 0)
{
scan_rel = ExecOpenScanRelation(estate, scanrelid, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return css;
css->ss.ss_currentRelation = scan_rel;
}
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 3aba28285a..336acff719 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -173,6 +173,8 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
if (scanrelid > 0)
{
currentRelation = ExecOpenScanRelation(estate, scanrelid, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return scanstate;
scanstate->ss.ss_currentRelation = currentRelation;
fdwroutine = GetFdwRoutineForRelation(currentRelation, true);
}
@@ -264,6 +266,8 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
if (outerPlan(node))
outerPlanState(scanstate) =
ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return scanstate;
/*
* Tell the FDW to initialize the scan.
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 1a3c8abdad..c524022c04 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -89,6 +89,8 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
*/
outerNode = outerPlan(node);
outerPlanState(gatherstate) = ExecInitNode(outerNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return gatherstate;
tupDesc = ExecGetResultType(outerPlanState(gatherstate));
/*
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index c6fb45fee0..676faabef5 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -108,6 +108,8 @@ ExecInitGatherMerge(GatherMerge *node, EState *estate, int eflags)
*/
outerNode = outerPlan(node);
outerPlanState(gm_state) = ExecInitNode(outerNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return gm_state;
/*
* Leader may access ExecProcNode result directly (if
diff --git a/src/backend/executor/nodeGroup.c b/src/backend/executor/nodeGroup.c
index 6dfe5a1d23..efa1c44ab4 100644
--- a/src/backend/executor/nodeGroup.c
+++ b/src/backend/executor/nodeGroup.c
@@ -185,6 +185,8 @@ ExecInitGroup(Group *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(grpstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return grpstate;
/*
* Initialize scan slot and type.
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 88ba336882..1a4bd5504e 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -386,6 +386,8 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(hashstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return hashstate;
/*
* initialize our result slot and type. No need to build projection
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 03dd931527..e3b3c2305f 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -752,8 +752,12 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
hashNode = (Hash *) innerPlan(node);
outerPlanState(hjstate) = ExecInitNode(outerNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return hjstate;
outerDesc = ExecGetResultType(outerPlanState(hjstate));
innerPlanState(hjstate) = ExecInitNode((Plan *) hashNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return hjstate;
innerDesc = ExecGetResultType(innerPlanState(hjstate));
/*
diff --git a/src/backend/executor/nodeIncrementalSort.c b/src/backend/executor/nodeIncrementalSort.c
index 28a0e81cb3..621ffafe02 100644
--- a/src/backend/executor/nodeIncrementalSort.c
+++ b/src/backend/executor/nodeIncrementalSort.c
@@ -1041,6 +1041,8 @@ ExecInitIncrementalSort(IncrementalSort *node, EState *estate, int eflags)
* nodes may be able to do something more useful.
*/
outerPlanState(incrsortstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return incrsortstate;
/*
* Initialize scan slot and type.
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 1f3843abe9..c555c14888 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -495,6 +495,8 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return indexstate;
indexstate->ss.ss_currentRelation = currentRelation;
indexstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 32e1714f15..a3bd1f7fb0 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -908,6 +908,8 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return indexstate;
indexstate->ss.ss_currentRelation = currentRelation;
indexstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index a97bac9f6d..ab133f1580 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -476,6 +476,8 @@ ExecInitLimit(Limit *node, EState *estate, int eflags)
*/
outerPlan = outerPlan(node);
outerPlanState(limitstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return limitstate;
/*
* initialize child expressions
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 26fbe95c57..e1ef768571 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -322,6 +322,8 @@ ExecInitLockRows(LockRows *node, EState *estate, int eflags)
* then initialize outer plan
*/
outerPlanState(lrstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return lrstate;
/* node returns unmodified slots from the outer plan */
lrstate->ps.resultopsset = true;
diff --git a/src/backend/executor/nodeMaterial.c b/src/backend/executor/nodeMaterial.c
index 03c514900b..c38eef099d 100644
--- a/src/backend/executor/nodeMaterial.c
+++ b/src/backend/executor/nodeMaterial.c
@@ -214,6 +214,8 @@ ExecInitMaterial(Material *node, EState *estate, int eflags)
outerPlan = outerPlan(node);
outerPlanState(matstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return matstate;
/*
* Initialize result type and slot. No need to initialize projection info
diff --git a/src/backend/executor/nodeMemoize.c b/src/backend/executor/nodeMemoize.c
index 9b13e2e552..dfd695ad07 100644
--- a/src/backend/executor/nodeMemoize.c
+++ b/src/backend/executor/nodeMemoize.c
@@ -957,6 +957,8 @@ ExecInitMemoize(Memoize *node, EState *estate, int eflags)
outerNode = outerPlan(node);
outerPlanState(mstate) = ExecInitNode(outerNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return mstate;
/*
* Initialize return slot and type. No need to initialize projection info
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 0a42a04b19..52c3edf278 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -120,7 +120,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ms_prune_state = NULL;
}
- mergeplanstates = (PlanState **) palloc(nplans * sizeof(PlanState *));
+ mergeplanstates = (PlanState **) palloc0(nplans * sizeof(PlanState *));
mergestate->mergeplans = mergeplanstates;
mergestate->ms_nplans = nplans;
@@ -151,6 +151,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
Plan *initNode = (Plan *) list_nth(node->mergeplans, i);
mergeplanstates[j++] = ExecInitNode(initNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return mergestate;
}
mergestate->ps.ps_ProjInfo = NULL;
diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 4b181621f9..e5d4f8e21d 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -1490,11 +1490,15 @@ ExecInitMergeJoin(MergeJoin *node, EState *estate, int eflags)
mergestate->mj_SkipMarkRestore = node->skip_mark_restore;
outerPlanState(mergestate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return mergestate;
outerDesc = ExecGetResultType(outerPlanState(mergestate));
innerPlanState(mergestate) = ExecInitNode(innerPlan(node), estate,
mergestate->mj_SkipMarkRestore ?
eflags :
(eflags | EXEC_FLAG_MARK));
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return mergestate;
innerDesc = ExecGetResultType(innerPlanState(mergestate));
/*
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index bc82fb033a..e75500a1f5 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -4002,6 +4002,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
linitial_int(node->resultRelations));
}
+ /*
+ * ExecInitResultRelation() may have returned without initializing
+ * rootResultRelInfo if the plan got invalidated, so check.
+ */
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return mtstate;
+
/* set up epqstate with dummy subplan data for the moment */
EvalPlanQualInit(&mtstate->mt_epqstate, estate, NULL, NIL,
node->epqParam, node->resultRelations);
@@ -4030,6 +4037,10 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
{
ExecInitResultRelation(estate, resultRelInfo, resultRelation);
+ /* See the comment above. */
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return mtstate;
+
/*
* For child result relations, store the root result relation
* pointer. We do so for the convenience of places that want to
@@ -4056,6 +4067,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* Now we may initialize the subplan.
*/
outerPlanState(mtstate) = ExecInitNode(subplan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return mtstate;
/*
* Do additional per-result-relation initialization.
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index 1211d871ea..8d67d17e10 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -295,11 +295,15 @@ ExecInitNestLoop(NestLoop *node, EState *estate, int eflags)
* values.
*/
outerPlanState(nlstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return nlstate;
if (node->nestParams == NIL)
eflags |= EXEC_FLAG_REWIND;
else
eflags &= ~EXEC_FLAG_REWIND;
innerPlanState(nlstate) = ExecInitNode(innerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return nlstate;
/*
* Initialize result slot, type and projection.
diff --git a/src/backend/executor/nodeProjectSet.c b/src/backend/executor/nodeProjectSet.c
index 514669558c..2074ba6683 100644
--- a/src/backend/executor/nodeProjectSet.c
+++ b/src/backend/executor/nodeProjectSet.c
@@ -255,6 +255,8 @@ ExecInitProjectSet(ProjectSet *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(state) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return state;
/*
* we don't use inner plan
diff --git a/src/backend/executor/nodeRecursiveunion.c b/src/backend/executor/nodeRecursiveunion.c
index 08637a64db..f07205f958 100644
--- a/src/backend/executor/nodeRecursiveunion.c
+++ b/src/backend/executor/nodeRecursiveunion.c
@@ -244,7 +244,11 @@ ExecInitRecursiveUnion(RecursiveUnion *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(rustate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return rustate;
innerPlanState(rustate) = ExecInitNode(innerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return rustate;
/*
* If hashing, precompute fmgr lookup data for inner loop, and create the
diff --git a/src/backend/executor/nodeResult.c b/src/backend/executor/nodeResult.c
index f15902e840..6820d3bfd5 100644
--- a/src/backend/executor/nodeResult.c
+++ b/src/backend/executor/nodeResult.c
@@ -208,6 +208,8 @@ ExecInitResult(Result *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(resstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return resstate;
/*
* we don't use inner plan
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index a6813559e6..02051fea51 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -125,6 +125,8 @@ ExecInitSampleScan(SampleScan *node, EState *estate, int eflags)
ExecOpenScanRelation(estate,
node->scan.scanrelid,
eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return scanstate;
/* we won't set up the HeapScanDesc till later */
scanstate->ss.ss_currentScanDesc = NULL;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 911266da07..9e3ef94388 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -153,6 +153,8 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
ExecOpenScanRelation(estate,
node->scan.scanrelid,
eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return scanstate;
/* and create slot with the appropriate rowtype */
ExecInitScanTupleSlot(estate, &scanstate->ss,
diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index 92bd2da8e0..46e294ba52 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -528,6 +528,8 @@ ExecInitSetOp(SetOp *node, EState *estate, int eflags)
if (node->strategy == SETOP_HASHED)
eflags &= ~EXEC_FLAG_REWIND;
outerPlanState(setopstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return setopstate;
outerDesc = ExecGetResultType(outerPlanState(setopstate));
/*
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index c8a35b64a8..9de717aa7c 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -263,6 +263,8 @@ ExecInitSort(Sort *node, EState *estate, int eflags)
eflags &= ~(EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK);
outerPlanState(sortstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return sortstate;
/*
* Initialize scan slot and type.
diff --git a/src/backend/executor/nodeSubqueryscan.c b/src/backend/executor/nodeSubqueryscan.c
index 91d7ae82ce..d9c10d1f6f 100644
--- a/src/backend/executor/nodeSubqueryscan.c
+++ b/src/backend/executor/nodeSubqueryscan.c
@@ -124,6 +124,8 @@ ExecInitSubqueryScan(SubqueryScan *node, EState *estate, int eflags)
* initialize subquery
*/
subquerystate->subplan = ExecInitNode(node->subplan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return subquerystate;
/*
* Initialize scan slot and type (needed by ExecAssignScanProjectionInfo)
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index d6b4ed2e42..831f514c4d 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -378,6 +378,8 @@ ExecInitTidRangeScan(TidRangeScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return tidrangestate;
tidrangestate->ss.ss_currentRelation = currentRelation;
tidrangestate->ss.ss_currentScanDesc = NULL; /* no table scan here */
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index 74ec6afdcc..657411ef19 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -523,6 +523,8 @@ ExecInitTidScan(TidScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return tidstate;
tidstate->ss.ss_currentRelation = currentRelation;
tidstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
diff --git a/src/backend/executor/nodeUnique.c b/src/backend/executor/nodeUnique.c
index 13c556326a..ee30688417 100644
--- a/src/backend/executor/nodeUnique.c
+++ b/src/backend/executor/nodeUnique.c
@@ -136,6 +136,8 @@ ExecInitUnique(Unique *node, EState *estate, int eflags)
* then initialize outer plan
*/
outerPlanState(uniquestate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return uniquestate;
/*
* Initialize result slot and type. Unique nodes do no projections, so
diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 0253c47448..809cedf187 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -2461,6 +2461,8 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags)
*/
outerPlan = outerPlan(node);
outerPlanState(winstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return winstate;
/*
* initialize source tuple type (which is also the tuple type that we'll
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 0e46c59d25..892b2853ed 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -2668,6 +2668,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
snap = InvalidSnapshot;
qdesc = CreateQueryDesc(stmt,
+ NULL,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5565f200c3..4ef349df8b 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -65,6 +65,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +78,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->cplan = cplan; /* CachedPlan, if plan is from one */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -145,7 +147,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, NULL, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -493,6 +495,7 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ NULL,
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index af2bf36dfb..4b7368a0dc 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -32,9 +32,12 @@
*/
typedef struct QueryDesc
{
+ NodeTag type;
+
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ struct CachedPlan *cplan; /* CachedPlan, if plannedstmt is from one */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +60,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ struct CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index e1eefb400b..3b33b38196 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -19,6 +19,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/plancache.h"
/*
@@ -256,6 +257,15 @@ extern void ExecEndNode(PlanState *node);
extern void ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * Is the CachedPlan in es_cachedplan still valid?
+ */
+static inline bool
+ExecPlanStillValid(EState *estate)
+{
+ return estate->es_cachedplan == NULL ? true :
+ CachedPlanStillValid(estate->es_cachedplan);
+}
/* ----------------------------------------------------------------
* ExecProcNode
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 5d7f17dee0..08670bc5ed 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -622,6 +622,8 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ struct CachedPlan *es_cachedplan; /* CachedPlan if plannedstmt is from
+ * one, or NULL if not */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 110e649fce..5a54f8a917 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -223,6 +223,20 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
QueryEnvironment *queryEnv);
+
+/*
+ * CachedPlanStillValid
+ * Returns whether a cached generic plan is still valid
+ *
+ * Invoked by the executor for each relation lock acquired during the
+ * initialization of the plan tree within the CachedPlan.
+ */
+static inline bool
+CachedPlanStillValid(CachedPlan *cplan)
+{
+ return cplan->is_valid;
+}
+
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
--
2.35.3
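Distilled, the pattern this patch installs throughout the ExecInit*()
routines looks like the following schematic (not a literal excerpt):

    /*
     * After any step that may have taken a new lock -- opening a scan
     * relation or initializing a child node -- bail out if that lock
     * invalidated the CachedPlan. The partially built PlanState is
     * returned as-is; the ExecEnd*() hardening in patch 0001 makes it
     * safe to tear down.
     */
    outerPlanState(state) = ExecInitNode(outerPlan(node), estate, eflags);
    if (unlikely(!ExecPlanStillValid(estate)))
        return state;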
v49-0001-Assorted-tightening-in-various-ExecEnd-routines.patch
From f7902c69704598e9dbbc1ee72d42bc9ca6c98a21 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Thu, 28 Sep 2023 16:56:29 +0900
Subject: [PATCH v49 1/8] Assorted tightening in various ExecEnd()* routines
This includes adding NULLness checks on pointers before cleaning them
up. Many ExecEnd*() routines already perform this check, but a few
instances remain. These NULLness checks might seem redundant as
things stand since the ExecEnd*() routines operate under the
assumption that their matching ExecInit* routine would have fully
executed, ensuring pointers are set. However, a forthcoming patch will
modify ExecInit* routines to sometimes exit early, potentially leaving
some pointers in an undetermined state, so it will become crucial to
have these NULLness checks in place.
This also adds a guard at the beginning of EvalPlanQualEnd() to return
early if the EPQState does not appear to have been initialized. That
case can happen if the corresponding ExecInit*() routine returned
early without calling EvalPlanQualInit().
While at it, this commit ensures that pointers are consistently set
to NULL after cleanup in all ExecEnd*() routines.
Finally, for enhanced consistency, the format of NULLness checks has
been standardized to "if (pointer != NULL)", replacing the previous
"if (pointer)" style.
Reviewed-by: Robert Haas
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
src/backend/executor/execMain.c | 4 ++
src/backend/executor/nodeAgg.c | 27 +++++++++----
src/backend/executor/nodeAppend.c | 3 ++
src/backend/executor/nodeBitmapAnd.c | 4 +-
src/backend/executor/nodeBitmapHeapscan.c | 47 +++++++++++++++-------
src/backend/executor/nodeBitmapIndexscan.c | 23 +++++------
src/backend/executor/nodeBitmapOr.c | 4 +-
src/backend/executor/nodeForeignscan.c | 17 ++++----
src/backend/executor/nodeGather.c | 1 +
src/backend/executor/nodeGatherMerge.c | 1 +
src/backend/executor/nodeGroup.c | 6 +--
src/backend/executor/nodeHash.c | 6 +--
src/backend/executor/nodeHashjoin.c | 4 +-
src/backend/executor/nodeIncrementalSort.c | 13 +++++-
src/backend/executor/nodeIndexonlyscan.c | 25 ++++++------
src/backend/executor/nodeIndexscan.c | 23 +++++------
src/backend/executor/nodeLimit.c | 1 +
src/backend/executor/nodeLockRows.c | 1 +
src/backend/executor/nodeMaterial.c | 5 ++-
src/backend/executor/nodeMemoize.c | 8 +++-
src/backend/executor/nodeMergeAppend.c | 3 ++
src/backend/executor/nodeMergejoin.c | 2 +
src/backend/executor/nodeModifyTable.c | 11 ++++-
src/backend/executor/nodeNestloop.c | 2 +
src/backend/executor/nodeProjectSet.c | 1 +
src/backend/executor/nodeRecursiveunion.c | 24 +++++++++--
src/backend/executor/nodeResult.c | 1 +
src/backend/executor/nodeSamplescan.c | 7 +++-
src/backend/executor/nodeSeqscan.c | 16 +++-----
src/backend/executor/nodeSetOp.c | 6 ++-
src/backend/executor/nodeSort.c | 5 ++-
src/backend/executor/nodeSubqueryscan.c | 1 +
src/backend/executor/nodeTableFuncscan.c | 4 +-
src/backend/executor/nodeTidrangescan.c | 12 ++++--
src/backend/executor/nodeTidscan.c | 8 +++-
src/backend/executor/nodeUnique.c | 1 +
src/backend/executor/nodeWindowAgg.c | 41 ++++++++++++++-----
37 files changed, 248 insertions(+), 120 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4c5a7bbf62..f7f18d3054 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -3010,6 +3010,10 @@ EvalPlanQualEnd(EPQState *epqstate)
MemoryContext oldcontext;
ListCell *l;
+ /* Nothing to do if no EvalPlanQualInit() was done to begin with. */
+ if (epqstate->parentestate == NULL)
+ return;
+
rtsize = epqstate->parentestate->es_range_table_size;
/*
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index f154f28902..af22b1676f 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -4304,7 +4304,6 @@ GetAggInitVal(Datum textInitVal, Oid transtype)
void
ExecEndAgg(AggState *node)
{
- PlanState *outerPlan;
int transno;
int numGroupingSets = Max(node->maxsets, 1);
int setno;
@@ -4314,7 +4313,7 @@ ExecEndAgg(AggState *node)
* worker back into shared memory so that it can be picked up by the main
* process to report in EXPLAIN ANALYZE.
*/
- if (node->shared_info && IsParallelWorker())
+ if (node->shared_info != NULL && IsParallelWorker())
{
AggregateInstrumentation *si;
@@ -4327,10 +4326,16 @@ ExecEndAgg(AggState *node)
/* Make sure we have closed any open tuplesorts */
- if (node->sort_in)
+ if (node->sort_in != NULL)
+ {
tuplesort_end(node->sort_in);
- if (node->sort_out)
+ node->sort_in = NULL;
+ }
+ if (node->sort_out != NULL)
+ {
tuplesort_end(node->sort_out);
+ node->sort_out = NULL;
+ }
hashagg_reset_spill_state(node);
@@ -4346,19 +4351,25 @@ ExecEndAgg(AggState *node)
for (setno = 0; setno < numGroupingSets; setno++)
{
- if (pertrans->sortstates[setno])
+ if (pertrans->sortstates[setno] != NULL)
tuplesort_end(pertrans->sortstates[setno]);
}
}
/* And ensure any agg shutdown callbacks have been called */
for (setno = 0; setno < numGroupingSets; setno++)
+ {
ReScanExprContext(node->aggcontexts[setno]);
- if (node->hashcontext)
+ node->aggcontexts[setno] = NULL;
+ }
+ if (node->hashcontext != NULL)
+ {
ReScanExprContext(node->hashcontext);
+ node->hashcontext = NULL;
+ }
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
+ ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
void
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 609df6b9e6..a2af221e05 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -399,7 +399,10 @@ ExecEndAppend(AppendState *node)
* shut down each of the subscans
*/
for (i = 0; i < nplans; i++)
+ {
ExecEndNode(appendplans[i]);
+ appendplans[i] = NULL;
+ }
}
void
diff --git a/src/backend/executor/nodeBitmapAnd.c b/src/backend/executor/nodeBitmapAnd.c
index 4c5eb2b23b..4abb0609a0 100644
--- a/src/backend/executor/nodeBitmapAnd.c
+++ b/src/backend/executor/nodeBitmapAnd.c
@@ -192,8 +192,8 @@ ExecEndBitmapAnd(BitmapAndState *node)
*/
for (i = 0; i < nplans; i++)
{
- if (bitmapplans[i])
- ExecEndNode(bitmapplans[i]);
+ ExecEndNode(bitmapplans[i]);
+ bitmapplans[i] = NULL;
}
}
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 2db0acfc76..d3f58c22f9 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -648,40 +648,59 @@ ExecReScanBitmapHeapScan(BitmapHeapScanState *node)
void
ExecEndBitmapHeapScan(BitmapHeapScanState *node)
{
- TableScanDesc scanDesc;
-
- /*
- * extract information from the node
- */
- scanDesc = node->ss.ss_currentScanDesc;
-
/*
* close down subplans
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
/*
* release bitmaps and buffers if any
*/
- if (node->tbmiterator)
+ if (node->tbmiterator != NULL)
+ {
tbm_end_iterate(node->tbmiterator);
- if (node->prefetch_iterator)
+ node->tbmiterator = NULL;
+ }
+ if (node->prefetch_iterator != NULL)
+ {
tbm_end_iterate(node->prefetch_iterator);
- if (node->tbm)
+ node->prefetch_iterator = NULL;
+ }
+ if (node->tbm != NULL)
+ {
tbm_free(node->tbm);
- if (node->shared_tbmiterator)
+ node->tbm = NULL;
+ }
+ if (node->shared_tbmiterator != NULL)
+ {
tbm_end_shared_iterate(node->shared_tbmiterator);
- if (node->shared_prefetch_iterator)
+ node->shared_tbmiterator = NULL;
+ }
+ if (node->shared_prefetch_iterator != NULL)
+ {
tbm_end_shared_iterate(node->shared_prefetch_iterator);
+ node->shared_prefetch_iterator = NULL;
+ }
if (node->vmbuffer != InvalidBuffer)
+ {
ReleaseBuffer(node->vmbuffer);
+ node->vmbuffer = InvalidBuffer;
+ }
if (node->pvmbuffer != InvalidBuffer)
+ {
ReleaseBuffer(node->pvmbuffer);
+ node->pvmbuffer = InvalidBuffer;
+ }
/*
- * close heap scan
+ * close heap scan (no-op if we didn't start it)
*/
- table_endscan(scanDesc);
+ if (node->ss.ss_currentScanDesc != NULL)
+ {
+ table_endscan(node->ss.ss_currentScanDesc);
+ node->ss.ss_currentScanDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeBitmapIndexscan.c b/src/backend/executor/nodeBitmapIndexscan.c
index 7cf8532bc9..488f11a3ff 100644
--- a/src/backend/executor/nodeBitmapIndexscan.c
+++ b/src/backend/executor/nodeBitmapIndexscan.c
@@ -175,22 +175,21 @@ ExecReScanBitmapIndexScan(BitmapIndexScanState *node)
void
ExecEndBitmapIndexScan(BitmapIndexScanState *node)
{
- Relation indexRelationDesc;
- IndexScanDesc indexScanDesc;
-
- /*
- * extract information from the node
- */
- indexRelationDesc = node->biss_RelationDesc;
- indexScanDesc = node->biss_ScanDesc;
+ /* close the scan (no-op if we didn't start it) */
+ if (node->biss_ScanDesc != NULL)
+ {
+ index_endscan(node->biss_ScanDesc);
+ node->biss_ScanDesc = NULL;
+ }
/*
* close the index relation (no-op if we didn't open it)
*/
- if (indexScanDesc)
- index_endscan(indexScanDesc);
- if (indexRelationDesc)
- index_close(indexRelationDesc, NoLock);
+ if (node->biss_RelationDesc != NULL)
+ {
+ index_close(node->biss_RelationDesc, NoLock);
+ node->biss_RelationDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeBitmapOr.c b/src/backend/executor/nodeBitmapOr.c
index 0bf8af9652..ace18593aa 100644
--- a/src/backend/executor/nodeBitmapOr.c
+++ b/src/backend/executor/nodeBitmapOr.c
@@ -210,8 +210,8 @@ ExecEndBitmapOr(BitmapOrState *node)
*/
for (i = 0; i < nplans; i++)
{
- if (bitmapplans[i])
- ExecEndNode(bitmapplans[i]);
+ ExecEndNode(bitmapplans[i]);
+ bitmapplans[i] = NULL;
}
}
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 73913ebb18..3aba28285a 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -301,17 +301,20 @@ ExecEndForeignScan(ForeignScanState *node)
EState *estate = node->ss.ps.state;
/* Let the FDW shut down */
- if (plan->operation != CMD_SELECT)
+ if (node->fdwroutine != NULL)
{
- if (estate->es_epq_active == NULL)
- node->fdwroutine->EndDirectModify(node);
+ if (plan->operation != CMD_SELECT)
+ {
+ if (estate->es_epq_active == NULL)
+ node->fdwroutine->EndDirectModify(node);
+ }
+ else
+ node->fdwroutine->EndForeignScan(node);
}
- else
- node->fdwroutine->EndForeignScan(node);
/* Shut down any outer plan. */
- if (outerPlanState(node))
- ExecEndNode(outerPlanState(node));
+ ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index bb2500a469..1a3c8abdad 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -249,6 +249,7 @@ void
ExecEndGather(GatherState *node)
{
ExecEndNode(outerPlanState(node)); /* let children clean up first */
+ outerPlanState(node) = NULL;
ExecShutdownGather(node);
}
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 7a71a58509..c6fb45fee0 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -289,6 +289,7 @@ void
ExecEndGatherMerge(GatherMergeState *node)
{
ExecEndNode(outerPlanState(node)); /* let children clean up first */
+ outerPlanState(node) = NULL;
ExecShutdownGatherMerge(node);
}
diff --git a/src/backend/executor/nodeGroup.c b/src/backend/executor/nodeGroup.c
index 8c650f0e46..6dfe5a1d23 100644
--- a/src/backend/executor/nodeGroup.c
+++ b/src/backend/executor/nodeGroup.c
@@ -226,10 +226,8 @@ ExecInitGroup(Group *node, EState *estate, int eflags)
void
ExecEndGroup(GroupState *node)
{
- PlanState *outerPlan;
-
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
+ ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
void
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index e72f0986c2..88ba336882 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -413,13 +413,11 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
void
ExecEndHash(HashState *node)
{
- PlanState *outerPlan;
-
/*
* shut down the subplan
*/
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
+ ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 25a2d78f15..03dd931527 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -861,7 +861,7 @@ ExecEndHashJoin(HashJoinState *node)
/*
* Free hash table
*/
- if (node->hj_HashTable)
+ if (node->hj_HashTable != NULL)
{
ExecHashTableDestroy(node->hj_HashTable);
node->hj_HashTable = NULL;
@@ -871,7 +871,9 @@ ExecEndHashJoin(HashJoinState *node)
* clean up subtrees
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
ExecEndNode(innerPlanState(node));
+ innerPlanState(node) = NULL;
}
/*
diff --git a/src/backend/executor/nodeIncrementalSort.c b/src/backend/executor/nodeIncrementalSort.c
index cd094a190c..28a0e81cb3 100644
--- a/src/backend/executor/nodeIncrementalSort.c
+++ b/src/backend/executor/nodeIncrementalSort.c
@@ -1079,8 +1079,16 @@ ExecEndIncrementalSort(IncrementalSortState *node)
{
SO_printf("ExecEndIncrementalSort: shutting down sort node\n");
- ExecDropSingleTupleTableSlot(node->group_pivot);
- ExecDropSingleTupleTableSlot(node->transfer_tuple);
+ if (node->group_pivot != NULL)
+ {
+ ExecDropSingleTupleTableSlot(node->group_pivot);
+ node->group_pivot = NULL;
+ }
+ if (node->transfer_tuple != NULL)
+ {
+ ExecDropSingleTupleTableSlot(node->transfer_tuple);
+ node->transfer_tuple = NULL;
+ }
/*
* Release tuplesort resources.
@@ -1100,6 +1108,7 @@ ExecEndIncrementalSort(IncrementalSortState *node)
* Shut down the subplan.
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
SO_printf("ExecEndIncrementalSort: sort node shutdown\n");
}
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index f1db35665c..1f3843abe9 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -364,15 +364,6 @@ ExecReScanIndexOnlyScan(IndexOnlyScanState *node)
void
ExecEndIndexOnlyScan(IndexOnlyScanState *node)
{
- Relation indexRelationDesc;
- IndexScanDesc indexScanDesc;
-
- /*
- * extract information from the node
- */
- indexRelationDesc = node->ioss_RelationDesc;
- indexScanDesc = node->ioss_ScanDesc;
-
/* Release VM buffer pin, if any. */
if (node->ioss_VMBuffer != InvalidBuffer)
{
@@ -380,13 +371,21 @@ ExecEndIndexOnlyScan(IndexOnlyScanState *node)
node->ioss_VMBuffer = InvalidBuffer;
}
+ /* close the scan (no-op if we didn't start it) */
+ if (node->ioss_ScanDesc != NULL)
+ {
+ index_endscan(node->ioss_ScanDesc);
+ node->ioss_ScanDesc = NULL;
+ }
+
/*
* close the index relation (no-op if we didn't open it)
*/
- if (indexScanDesc)
- index_endscan(indexScanDesc);
- if (indexRelationDesc)
- index_close(indexRelationDesc, NoLock);
+ if (node->ioss_RelationDesc != NULL)
+ {
+ index_close(node->ioss_RelationDesc, NoLock);
+ node->ioss_RelationDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 14b9c00217..32e1714f15 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -785,22 +785,21 @@ ExecIndexAdvanceArrayKeys(IndexArrayKeyInfo *arrayKeys, int numArrayKeys)
void
ExecEndIndexScan(IndexScanState *node)
{
- Relation indexRelationDesc;
- IndexScanDesc indexScanDesc;
-
- /*
- * extract information from the node
- */
- indexRelationDesc = node->iss_RelationDesc;
- indexScanDesc = node->iss_ScanDesc;
+ /* close the scan (no-op if we didn't start it) */
+ if (node->iss_ScanDesc != NULL)
+ {
+ index_endscan(node->iss_ScanDesc);
+ node->iss_ScanDesc = NULL;
+ }
/*
* close the index relation (no-op if we didn't open it)
*/
- if (indexScanDesc)
- index_endscan(indexScanDesc);
- if (indexRelationDesc)
- index_close(indexRelationDesc, NoLock);
+ if (node->iss_RelationDesc != NULL)
+ {
+ index_close(node->iss_RelationDesc, NoLock);
+ node->iss_RelationDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index 5654158e3e..a97bac9f6d 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -535,6 +535,7 @@ void
ExecEndLimit(LimitState *node)
{
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index e459971d32..26fbe95c57 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -387,6 +387,7 @@ ExecEndLockRows(LockRowsState *node)
/* We may have shut down EPQ already, but no harm in another call */
EvalPlanQualEnd(&node->lr_epqstate);
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
diff --git a/src/backend/executor/nodeMaterial.c b/src/backend/executor/nodeMaterial.c
index 753ea28915..03c514900b 100644
--- a/src/backend/executor/nodeMaterial.c
+++ b/src/backend/executor/nodeMaterial.c
@@ -243,13 +243,16 @@ ExecEndMaterial(MaterialState *node)
* Release tuplestore resources
*/
if (node->tuplestorestate != NULL)
+ {
tuplestore_end(node->tuplestorestate);
- node->tuplestorestate = NULL;
+ node->tuplestorestate = NULL;
+ }
/*
* shut down the subplan
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeMemoize.c b/src/backend/executor/nodeMemoize.c
index 1085b3c79d..9b13e2e552 100644
--- a/src/backend/executor/nodeMemoize.c
+++ b/src/backend/executor/nodeMemoize.c
@@ -1062,6 +1062,7 @@ ExecEndMemoize(MemoizeState *node)
{
#ifdef USE_ASSERT_CHECKING
/* Validate the memory accounting code is correct in assert builds. */
+ if (node->hashtable != NULL)
{
int count;
uint64 mem = 0;
@@ -1108,12 +1109,17 @@ ExecEndMemoize(MemoizeState *node)
}
/* Remove the cache context */
- MemoryContextDelete(node->tableContext);
+ if (node->tableContext != NULL)
+ {
+ MemoryContextDelete(node->tableContext);
+ node->tableContext = NULL;
+ }
/*
* shut down the subplan
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
void
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 21b5726e6e..0a42a04b19 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -333,7 +333,10 @@ ExecEndMergeAppend(MergeAppendState *node)
* shut down each of the subscans
*/
for (i = 0; i < nplans; i++)
+ {
ExecEndNode(mergeplans[i]);
+ mergeplans[i] = NULL;
+ }
}
void
diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 3cdab77dfc..4b181621f9 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -1647,7 +1647,9 @@ ExecEndMergeJoin(MergeJoinState *node)
* shut down the subplans
*/
ExecEndNode(innerPlanState(node));
+ innerPlanState(node) = NULL;
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
MJ1_printf("ExecEndMergeJoin: %s\n",
"node processing ended");
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index b16fbe9e22..bc82fb033a 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -4447,7 +4447,9 @@ ExecEndModifyTable(ModifyTableState *node)
for (j = 0; j < resultRelInfo->ri_NumSlotsInitialized; j++)
{
ExecDropSingleTupleTableSlot(resultRelInfo->ri_Slots[j]);
+ resultRelInfo->ri_Slots[j] = NULL;
ExecDropSingleTupleTableSlot(resultRelInfo->ri_PlanSlots[j]);
+ resultRelInfo->ri_PlanSlots[j] = NULL;
}
}
@@ -4455,12 +4457,16 @@ ExecEndModifyTable(ModifyTableState *node)
* Close all the partitioned tables, leaf partitions, and their indices
* and release the slot used for tuple routing, if set.
*/
- if (node->mt_partition_tuple_routing)
+ if (node->mt_partition_tuple_routing != NULL)
{
ExecCleanupTupleRouting(node, node->mt_partition_tuple_routing);
+ node->mt_partition_tuple_routing = NULL;
- if (node->mt_root_tuple_slot)
+ if (node->mt_root_tuple_slot != NULL)
+ {
ExecDropSingleTupleTableSlot(node->mt_root_tuple_slot);
+ node->mt_root_tuple_slot = NULL;
+ }
}
/*
@@ -4472,6 +4478,7 @@ ExecEndModifyTable(ModifyTableState *node)
* shut down subplan
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
void
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index ebd1406843..1211d871ea 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -368,7 +368,9 @@ ExecEndNestLoop(NestLoopState *node)
* close down subplans
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
ExecEndNode(innerPlanState(node));
+ innerPlanState(node) = NULL;
NL1_printf("ExecEndNestLoop: %s\n",
"node processing ended");
diff --git a/src/backend/executor/nodeProjectSet.c b/src/backend/executor/nodeProjectSet.c
index aee26d3813..514669558c 100644
--- a/src/backend/executor/nodeProjectSet.c
+++ b/src/backend/executor/nodeProjectSet.c
@@ -332,6 +332,7 @@ ExecEndProjectSet(ProjectSetState *node)
* shut down subplans
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
void
diff --git a/src/backend/executor/nodeRecursiveunion.c b/src/backend/executor/nodeRecursiveunion.c
index 3207643156..08637a64db 100644
--- a/src/backend/executor/nodeRecursiveunion.c
+++ b/src/backend/executor/nodeRecursiveunion.c
@@ -272,20 +272,36 @@ void
ExecEndRecursiveUnion(RecursiveUnionState *node)
{
/* Release tuplestores */
- tuplestore_end(node->working_table);
- tuplestore_end(node->intermediate_table);
+ if (node->working_table != NULL)
+ {
+ tuplestore_end(node->working_table);
+ node->working_table = NULL;
+ }
+ if (node->intermediate_table != NULL)
+ {
+ tuplestore_end(node->intermediate_table);
+ node->intermediate_table = NULL;
+ }
/* free subsidiary stuff including hashtable */
- if (node->tempContext)
+ if (node->tempContext != NULL)
+ {
MemoryContextDelete(node->tempContext);
- if (node->tableContext)
+ node->tempContext = NULL;
+ }
+ if (node->tableContext != NULL)
+ {
MemoryContextDelete(node->tableContext);
+ node->tableContext = NULL;
+ }
/*
* close down subplans
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
ExecEndNode(innerPlanState(node));
+ innerPlanState(node) = NULL;
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeResult.c b/src/backend/executor/nodeResult.c
index e9f5732f33..f15902e840 100644
--- a/src/backend/executor/nodeResult.c
+++ b/src/backend/executor/nodeResult.c
@@ -244,6 +244,7 @@ ExecEndResult(ResultState *node)
* shut down subplans
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
void
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index 41c1ea37ad..a6813559e6 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -185,14 +185,17 @@ ExecEndSampleScan(SampleScanState *node)
/*
* Tell sampling function that we finished the scan.
*/
- if (node->tsmroutine->EndSampleScan)
+ if (node->tsmroutine != NULL && node->tsmroutine->EndSampleScan)
node->tsmroutine->EndSampleScan(node);
/*
- * close heap scan
+ * close heap scan (no-op if we didn't start it)
*/
if (node->ss.ss_currentScanDesc)
+ {
table_endscan(node->ss.ss_currentScanDesc);
+ node->ss.ss_currentScanDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 49a5933aff..911266da07 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -183,18 +183,14 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
void
ExecEndSeqScan(SeqScanState *node)
{
- TableScanDesc scanDesc;
-
- /*
- * get information from node
- */
- scanDesc = node->ss.ss_currentScanDesc;
-
/*
- * close heap scan
+ * close heap scan (no-op if we didn't start it)
*/
- if (scanDesc != NULL)
- table_endscan(scanDesc);
+ if (node->ss.ss_currentScanDesc != NULL)
+ {
+ table_endscan(node->ss.ss_currentScanDesc);
+ node->ss.ss_currentScanDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index 5a84969cf8..92bd2da8e0 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -583,10 +583,14 @@ void
ExecEndSetOp(SetOpState *node)
{
/* free subsidiary stuff including hashtable */
- if (node->tableContext)
+ if (node->tableContext != NULL)
+ {
MemoryContextDelete(node->tableContext);
+ node->tableContext = NULL;
+ }
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index eea7f2ae15..c8a35b64a8 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -307,13 +307,16 @@ ExecEndSort(SortState *node)
* Release tuplesort resources
*/
if (node->tuplesortstate != NULL)
+ {
tuplesort_end((Tuplesortstate *) node->tuplesortstate);
- node->tuplesortstate = NULL;
+ node->tuplesortstate = NULL;
+ }
/*
* shut down the subplan
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
SO1_printf("ExecEndSort: %s\n",
"sort node shutdown");
diff --git a/src/backend/executor/nodeSubqueryscan.c b/src/backend/executor/nodeSubqueryscan.c
index 1ee6295660..91d7ae82ce 100644
--- a/src/backend/executor/nodeSubqueryscan.c
+++ b/src/backend/executor/nodeSubqueryscan.c
@@ -171,6 +171,7 @@ ExecEndSubqueryScan(SubqueryScanState *node)
* close down subquery
*/
ExecEndNode(node->subplan);
+ node->subplan = NULL;
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeTableFuncscan.c b/src/backend/executor/nodeTableFuncscan.c
index a60dcd4943..80ed4b26a8 100644
--- a/src/backend/executor/nodeTableFuncscan.c
+++ b/src/backend/executor/nodeTableFuncscan.c
@@ -217,8 +217,10 @@ ExecEndTableFuncScan(TableFuncScanState *node)
* Release tuplestore resources
*/
if (node->tupstore != NULL)
+ {
tuplestore_end(node->tupstore);
- node->tupstore = NULL;
+ node->tupstore = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 6f97c35daa..d6b4ed2e42 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -327,10 +327,14 @@ ExecReScanTidRangeScan(TidRangeScanState *node)
void
ExecEndTidRangeScan(TidRangeScanState *node)
{
- TableScanDesc scan = node->ss.ss_currentScanDesc;
-
- if (scan != NULL)
- table_endscan(scan);
+ /*
+ * close heap scan (no-op if we didn't start it)
+ */
+ if (node->ss.ss_currentScanDesc != NULL)
+ {
+ table_endscan(node->ss.ss_currentScanDesc);
+ node->ss.ss_currentScanDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index 15055077d0..74ec6afdcc 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -470,8 +470,14 @@ ExecReScanTidScan(TidScanState *node)
void
ExecEndTidScan(TidScanState *node)
{
- if (node->ss.ss_currentScanDesc)
+ /*
+ * close heap scan (no-op if we didn't start it)
+ */
+ if (node->ss.ss_currentScanDesc != NULL)
+ {
table_endscan(node->ss.ss_currentScanDesc);
+ node->ss.ss_currentScanDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeUnique.c b/src/backend/executor/nodeUnique.c
index 01f951197c..13c556326a 100644
--- a/src/backend/executor/nodeUnique.c
+++ b/src/backend/executor/nodeUnique.c
@@ -169,6 +169,7 @@ void
ExecEndUnique(UniqueState *node)
{
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 3258305f57..0253c47448 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -1351,11 +1351,14 @@ release_partition(WindowAggState *winstate)
* any aggregate temp data). We don't rely on retail pfree because some
* aggregates might have allocated data we don't have direct pointers to.
*/
- MemoryContextReset(winstate->partcontext);
- MemoryContextReset(winstate->aggcontext);
+ if (winstate->partcontext != NULL)
+ MemoryContextReset(winstate->partcontext);
+ if (winstate->aggcontext != NULL)
+ MemoryContextReset(winstate->aggcontext);
for (i = 0; i < winstate->numaggs; i++)
{
- if (winstate->peragg[i].aggcontext != winstate->aggcontext)
+ if (winstate->peragg[i].aggcontext != NULL &&
+ winstate->peragg[i].aggcontext != winstate->aggcontext)
MemoryContextReset(winstate->peragg[i].aggcontext);
}
@@ -2681,24 +2684,40 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags)
void
ExecEndWindowAgg(WindowAggState *node)
{
- PlanState *outerPlan;
int i;
release_partition(node);
for (i = 0; i < node->numaggs; i++)
{
- if (node->peragg[i].aggcontext != node->aggcontext)
+ if (node->peragg[i].aggcontext != NULL &&
+ node->peragg[i].aggcontext != node->aggcontext)
MemoryContextDelete(node->peragg[i].aggcontext);
}
- MemoryContextDelete(node->partcontext);
- MemoryContextDelete(node->aggcontext);
+ if (node->partcontext != NULL)
+ {
+ MemoryContextDelete(node->partcontext);
+ node->partcontext = NULL;
+ }
+ if (node->aggcontext != NULL)
+ {
+ MemoryContextDelete(node->aggcontext);
+ node->aggcontext = NULL;
+ }
- pfree(node->perfunc);
- pfree(node->peragg);
+ if (node->perfunc != NULL)
+ {
+ pfree(node->perfunc);
+ node->perfunc = NULL;
+ }
+ if (node->peragg != NULL)
+ {
+ pfree(node->peragg);
+ node->peragg = NULL;
+ }
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
+ ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
/* -----------------
--
2.35.3
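The shape that this patch standardizes on across the ExecEnd*() routines
can be summarized in one sketch (the field and function names here are
placeholders, not taken from any single routine):

    /* In a hypothetical ExecEndFoo(): release only what was set up. */
    if (node->resource != NULL)
    {
        release_resource(node->resource);   /* e.g. tuplesort_end() */
        node->resource = NULL;              /* make the cleanup idempotent */
    }

    /* Shut down child nodes last, resetting the links afterwards. */
    ExecEndNode(outerPlanState(node));
    outerPlanState(node) = NULL;

Resetting each pointer after freeing the resource is what makes it safe
for a later patch to let ExecInit*() bail out partway through: whatever
was never (or already) cleaned up will simply be seen as NULL here.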
v49-0003-Adjustments-to-allow-ExecutorStart-to-sometimes-.patch
From fa5e9ab0316bd57f0248982729f906ddfbdca80e Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Wed, 6 Sep 2023 17:53:46 +0900
Subject: [PATCH v49 3/8] Adjustments to allow ExecutorStart() to sometimes
fail
Upon passing a plan tree from a CachedPlan to the executor, there's a
possibility that ExecutorStart() might return an incompletely set up
planstate tree. This can happen if the CachedPlan undergoes invalidation
during the ExecInitNode() initialization process. In such cases, the
execution should be reattempted using a fresh CachedPlan. Also, any
partially initialized EState must be cleaned up by invoking both
ExecutorEnd() and FreeExecutorState().
ExecutorStart() (and ExecutorStart_hook()) now return a boolean telling
the caller whether plan initialization failed.
For the replan loop in that context, it makes more sense to have
ExecutorStart() either in the same scope or closer to where
GetCachedPlan() is invoked. So this commit modifies the following
sites:
* The ExecutorStart() call in ExplainOnePlan() is moved into a new
function ExplainQueryDesc() along with CreateQueryDesc(). Callers
of ExplainOnePlan() should now call the new function first.
* The ExecutorStart() call in _SPI_pquery() is moved to its caller
_SPI_execute_plan().
* The ExecutorStart() call in PortalRunMulti() is moved to
PortalStart(). This requires a new List field in PortalData to
store the QueryDescs created in PortalStart() and a new memory
context for those. One unintended consequence is that
CommandCounterIncrement() between queries in the PORTAL_MULTI_QUERY
case is now done in the loop in PortalStart() and not in
PortalRunMulti(). That still works because the Snapshot registered
in QueryDesc/EState is updated to account for the CCI().
This commit also adds a new flag to EState called es_canceled that
complements es_finished to denote the new scenario where
ExecutorStart() returns with a partially set up planstate tree. Also,
to reset the AFTER trigger state that would have been set up in
ExecutorStart(), this adds a new function AfterTriggerCancelQuery()
which is called from ExecutorEnd() (not ExecutorFinish()) when
es_canceled is true.
Note that this commit by itself doesn't make any functional change,
because the CachedPlan is not passed into the executor yet.
---
contrib/auto_explain/auto_explain.c | 12 +-
.../pg_stat_statements/pg_stat_statements.c | 12 +-
src/backend/commands/copyto.c | 5 +-
src/backend/commands/createas.c | 9 +-
src/backend/commands/explain.c | 145 +++++---
src/backend/commands/extension.c | 6 +-
src/backend/commands/matview.c | 9 +-
src/backend/commands/portalcmds.c | 6 +-
src/backend/commands/prepare.c | 31 +-
src/backend/commands/trigger.c | 13 +
src/backend/executor/execMain.c | 44 ++-
src/backend/executor/execParallel.c | 6 +-
src/backend/executor/execUtils.c | 1 +
src/backend/executor/functions.c | 7 +-
src/backend/executor/spi.c | 48 ++-
src/backend/tcop/postgres.c | 18 +-
src/backend/tcop/pquery.c | 346 +++++++++---------
src/backend/utils/mmgr/portalmem.c | 9 +
src/include/commands/explain.h | 7 +-
src/include/commands/trigger.h | 1 +
src/include/executor/executor.h | 6 +-
src/include/nodes/execnodes.h | 3 +
src/include/tcop/pquery.h | 2 +-
src/include/utils/portal.h | 2 +
24 files changed, 466 insertions(+), 282 deletions(-)
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index c3ac27ae99..a0630d7944 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -78,7 +78,7 @@ static ExecutorRun_hook_type prev_ExecutorRun = NULL;
static ExecutorFinish_hook_type prev_ExecutorFinish = NULL;
static ExecutorEnd_hook_type prev_ExecutorEnd = NULL;
-static void explain_ExecutorStart(QueryDesc *queryDesc, int eflags);
+static bool explain_ExecutorStart(QueryDesc *queryDesc, int eflags);
static void explain_ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction,
uint64 count, bool execute_once);
@@ -258,9 +258,11 @@ _PG_init(void)
/*
* ExecutorStart hook: start up logging if needed
*/
-static void
+static bool
explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ bool plan_valid;
+
/*
* At the beginning of each top-level statement, decide whether we'll
* sample this statement. If nested-statement explaining is enabled,
@@ -296,9 +298,9 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
}
if (prev_ExecutorStart)
- prev_ExecutorStart(queryDesc, eflags);
+ plan_valid = prev_ExecutorStart(queryDesc, eflags);
else
- standard_ExecutorStart(queryDesc, eflags);
+ plan_valid = standard_ExecutorStart(queryDesc, eflags);
if (auto_explain_enabled())
{
@@ -316,6 +318,8 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcxt);
}
}
+
+ return plan_valid;
}
/*
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 6c63acf989..6325eeaf46 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -326,7 +326,7 @@ static PlannedStmt *pgss_planner(Query *parse,
const char *query_string,
int cursorOptions,
ParamListInfo boundParams);
-static void pgss_ExecutorStart(QueryDesc *queryDesc, int eflags);
+static bool pgss_ExecutorStart(QueryDesc *queryDesc, int eflags);
static void pgss_ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction,
uint64 count, bool execute_once);
@@ -979,13 +979,15 @@ pgss_planner(Query *parse,
/*
* ExecutorStart hook: start up tracking if needed
*/
-static void
+static bool
pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ bool plan_valid;
+
if (prev_ExecutorStart)
- prev_ExecutorStart(queryDesc, eflags);
+ plan_valid = prev_ExecutorStart(queryDesc, eflags);
else
- standard_ExecutorStart(queryDesc, eflags);
+ plan_valid = standard_ExecutorStart(queryDesc, eflags);
/*
* If query has queryId zero, don't track it. This prevents double
@@ -1008,6 +1010,8 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcxt);
}
}
+
+ return plan_valid;
}
/*
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 0929ad929a..699060fd55 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -568,8 +568,11 @@ BeginCopyTo(ParseState *pstate,
* Call ExecutorStart to prepare the plan for execution.
*
* ExecutorStart computes a result tupdesc for us
+ *
+ * OK to ignore the return value; plan can't become invalid,
+ * because there's no CachedPlan.
*/
- ExecutorStart(cstate->queryDesc, 0);
+ (void) ExecutorStart(cstate->queryDesc, 0);
tupDesc = cstate->queryDesc->tupDesc;
}
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 18b07c0200..4a950c03ff 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -329,8 +329,13 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, GetIntoRelEFlags(into));
+ /*
+ * call ExecutorStart to prepare the plan for execution
+ *
+ * OK to ignore the return value; plan can't become invalid,
+ * because there's no CachedPlan.
+ */
+ (void) ExecutorStart(queryDesc, GetIntoRelEFlags(into));
/* run the plan to completion */
ExecutorRun(queryDesc, ForwardScanDirection, 0, true);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c698e54fec..7b57cd02aa 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -393,6 +393,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
else
{
PlannedStmt *plan;
+ QueryDesc *queryDesc;
instr_time planstart,
planduration;
BufferUsage bufusage_start,
@@ -415,12 +416,90 @@ ExplainOneQuery(Query *query, int cursorOptions,
BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
}
+ queryDesc = ExplainQueryDesc(plan, NULL, queryString, into, es,
+ params, queryEnv);
+ Assert(queryDesc);
+
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(queryDesc, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
+/*
+ * ExplainQueryDesc
+ * Set up QueryDesc for EXPLAINing a given plan
+ *
+ * This returns NULL if cplan is found to have been invalidated after
+ * calling ExecutorStart().
+ */
+QueryDesc *
+ExplainQueryDesc(PlannedStmt *stmt, CachedPlan *cplan,
+ const char *queryString, IntoClause *into, ExplainState *es,
+ ParamListInfo params, QueryEnvironment *queryEnv)
+{
+ QueryDesc *queryDesc;
+ DestReceiver *dest;
+ int eflags;
+ int instrument_option = 0;
+
+ /*
+ * Normally we discard the query's output, but if explaining CREATE TABLE
+ * AS, we'd better use the appropriate tuple receiver.
+ */
+ if (into)
+ dest = CreateIntoRelDestReceiver(into);
+ else
+ dest = None_Receiver;
+
+ if (es->analyze && es->timing)
+ instrument_option |= INSTRUMENT_TIMER;
+ else if (es->analyze)
+ instrument_option |= INSTRUMENT_ROWS;
+
+ if (es->buffers)
+ instrument_option |= INSTRUMENT_BUFFERS;
+ if (es->wal)
+ instrument_option |= INSTRUMENT_WAL;
+
+ /*
+ * Use a snapshot with an updated command ID to ensure this query sees
+ * results of any previously executed queries.
+ */
+ PushCopiedSnapshot(GetActiveSnapshot());
+ UpdateActiveSnapshotCommandId();
+
+ /* Create a QueryDesc for the query */
+ queryDesc = CreateQueryDesc(stmt, cplan, queryString,
+ GetActiveSnapshot(), InvalidSnapshot,
+ dest, params, queryEnv, instrument_option);
+
+ /* Select execution options */
+ if (es->analyze)
+ eflags = 0; /* default run-to-completion flags */
+ else
+ eflags = EXEC_FLAG_EXPLAIN_ONLY;
+ if (es->generic)
+ eflags |= EXEC_FLAG_EXPLAIN_GENERIC;
+ if (into)
+ eflags |= GetIntoRelEFlags(into);
+
+ /*
+ * Call ExecutorStart to prepare the plan for execution. A cached plan
+ * may get invalidated during plan initialization.
+ */
+ if (!ExecutorStart(queryDesc, eflags))
+ {
+ /* Clean up. */
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+ PopActiveSnapshot();
+ return NULL;
+ }
+
+ return queryDesc;
+}
+
/*
* ExplainOneUtility -
* print out the execution plan for one utility statement
@@ -524,29 +603,16 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(QueryDesc *queryDesc,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
{
- DestReceiver *dest;
- QueryDesc *queryDesc;
instr_time starttime;
double totaltime = 0;
- int eflags;
- int instrument_option = 0;
-
- Assert(plannedstmt->commandType != CMD_UTILITY);
- if (es->analyze && es->timing)
- instrument_option |= INSTRUMENT_TIMER;
- else if (es->analyze)
- instrument_option |= INSTRUMENT_ROWS;
-
- if (es->buffers)
- instrument_option |= INSTRUMENT_BUFFERS;
- if (es->wal)
- instrument_option |= INSTRUMENT_WAL;
+ Assert(queryDesc->plannedstmt->commandType != CMD_UTILITY);
/*
* We always collect timing for the entire statement, even when node-level
@@ -555,40 +621,6 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
*/
INSTR_TIME_SET_CURRENT(starttime);
- /*
- * Use a snapshot with an updated command ID to ensure this query sees
- * results of any previously executed queries.
- */
- PushCopiedSnapshot(GetActiveSnapshot());
- UpdateActiveSnapshotCommandId();
-
- /*
- * Normally we discard the query's output, but if explaining CREATE TABLE
- * AS, we'd better use the appropriate tuple receiver.
- */
- if (into)
- dest = CreateIntoRelDestReceiver(into);
- else
- dest = None_Receiver;
-
- /* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, NULL, queryString,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, instrument_option);
-
- /* Select execution options */
- if (es->analyze)
- eflags = 0; /* default run-to-completion flags */
- else
- eflags = EXEC_FLAG_EXPLAIN_ONLY;
- if (es->generic)
- eflags |= EXEC_FLAG_EXPLAIN_GENERIC;
- if (into)
- eflags |= GetIntoRelEFlags(into);
-
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, eflags);
-
/* Execute the plan for statistics if asked for */
if (es->analyze)
{
@@ -4895,6 +4927,17 @@ ExplainDummyGroup(const char *objtype, const char *labelname, ExplainState *es)
}
}
+/*
+ * Discard output buffer for a fresh restart.
+ */
+void
+ExplainResetOutput(ExplainState *es)
+{
+ Assert(es->str);
+ resetStringInfo(es->str);
+ ExplainBeginOutput(es);
+}
+
/*
* Emit the start-of-output boilerplate.
*
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index b287a2e84c..127d2a3b0a 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -802,7 +802,11 @@ execute_sql_string(const char *sql)
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
- ExecutorStart(qdesc, 0);
+ /*
+ * OK to ignore the return value; plan can't become invalid,
+ * because there's no CachedPlan.
+ */
+ (void) ExecutorStart(qdesc, 0);
ExecutorRun(qdesc, ForwardScanDirection, 0, true);
ExecutorFinish(qdesc);
ExecutorEnd(qdesc);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 22b8b820c3..7083fb2350 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -412,8 +412,13 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, 0);
+ /*
+ * call ExecutorStart to prepare the plan for execution
+ *
+ * OK to ignore the return value; plan can't become invalid,
+ * because there's no CachedPlan.
+ */
+ (void) ExecutorStart(queryDesc, 0);
/* run the plan */
ExecutorRun(queryDesc, ForwardScanDirection, 0, true);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 73ed7aa2f0..a1ee5c0acd 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -142,9 +142,11 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
/*
* Start execution, inserting parameters if any.
+ *
+ * OK to ignore the return value; plan can't become invalid here,
+ * because there's no CachedPlan.
*/
- PortalStart(portal, params, 0, GetActiveSnapshot());
-
+ (void) PortalStart(portal, params, 0, GetActiveSnapshot());
Assert(portal->strategy == PORTAL_ONE_SELECT);
/*
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 18f70319fc..f8d0b0ee25 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -183,6 +183,7 @@ ExecuteQuery(ParseState *pstate,
paramLI = EvaluateParams(pstate, entry, stmt->params, estate);
}
+replan:
/* Create a new portal to run the query in */
portal = CreateNewPortal();
/* Don't display the portal in pg_cursors, it is for internal use only */
@@ -251,9 +252,15 @@ ExecuteQuery(ParseState *pstate,
}
/*
- * Run the portal as appropriate.
+ * Run the portal as appropriate. If the portal has a cached plan and
+ * it's found to be invalidated during the initialization of its plan
+ * trees, the plan must be regenerated.
*/
- PortalStart(portal, paramLI, eflags, GetActiveSnapshot());
+ if (!PortalStart(portal, paramLI, eflags, GetActiveSnapshot()))
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
(void) PortalRun(portal, count, false, true, dest, dest, qc);
@@ -574,7 +581,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
{
PreparedStatement *entry;
const char *query_string;
- CachedPlan *cplan;
+ CachedPlan *cplan = NULL;
List *plan_list;
ListCell *p;
ParamListInfo paramLI = NULL;
@@ -618,6 +625,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
+replan:
cplan = GetCachedPlan(entry->plansource, paramLI,
CurrentResourceOwner, queryEnv);
@@ -639,8 +647,21 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ {
+ QueryDesc *queryDesc;
+
+ queryDesc = ExplainQueryDesc(pstmt, cplan, queryString,
+ into, es, paramLI, queryEnv);
+ if (queryDesc == NULL)
+ {
+ ExplainResetOutput(es);
+ ReleaseCachedPlan(cplan, CurrentResourceOwner);
+ goto replan;
+ }
+ ExplainOnePlan(queryDesc, into, es, query_string, paramLI,
+ queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
+ }
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 52177759ab..dd139432b9 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -5009,6 +5009,19 @@ AfterTriggerBeginQuery(void)
afterTriggers.query_depth++;
}
+/* ----------
+ * AfterTriggerCancelQuery()
+ *
+ * Called from ExecutorEnd() if the query execution was canceled.
+ * ----------
+ */
+void
+AfterTriggerCancelQuery(void)
+{
+ /* Set to a value denoting that no query is active. */
+ afterTriggers.query_depth = -1;
+}
+
/* ----------
* AfterTriggerEndQuery()
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index de7bf7ca67..5755336abd 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -119,6 +119,13 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
*
* eflags contains flag bits as described in executor.h.
*
+ * Plan initialization may fail if the input plan tree is found to have been
+ * invalidated, which can happen if it comes from a CachedPlan.
+ *
+ * Returns true if plan was successfully initialized and false otherwise. If
+ * the latter, the caller must call ExecutorEnd() on 'queryDesc' to clean up
+ * after failed plan initialization.
+ *
* NB: the CurrentMemoryContext when this is called will become the parent
* of the per-query context used for this Executor invocation.
*
@@ -128,7 +135,7 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
*
* ----------------------------------------------------------------
*/
-void
+bool
ExecutorStart(QueryDesc *queryDesc, int eflags)
{
/*
@@ -140,14 +147,15 @@ ExecutorStart(QueryDesc *queryDesc, int eflags)
pgstat_report_query_id(queryDesc->plannedstmt->queryId, false);
if (ExecutorStart_hook)
- (*ExecutorStart_hook) (queryDesc, eflags);
- else
- standard_ExecutorStart(queryDesc, eflags);
+ return (*ExecutorStart_hook) (queryDesc, eflags);
+
+ return standard_ExecutorStart(queryDesc, eflags);
}
-void
+bool
standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
+ bool plan_valid;
EState *estate;
MemoryContext oldcontext;
@@ -263,9 +271,14 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/*
* Initialize the plan state tree
*/
- (void) InitPlan(queryDesc, eflags);
+ plan_valid = InitPlan(queryDesc, eflags);
+
+ /* Mark execution as canceled if plan won't be executed. */
+ estate->es_canceled = !plan_valid;
MemoryContextSwitchTo(oldcontext);
+
+ return plan_valid;
}
/* ----------------------------------------------------------------
@@ -325,6 +338,7 @@ standard_ExecutorRun(QueryDesc *queryDesc,
estate = queryDesc->estate;
Assert(estate != NULL);
+ Assert(!estate->es_canceled);
Assert(!(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
/*
@@ -429,7 +443,7 @@ standard_ExecutorFinish(QueryDesc *queryDesc)
Assert(!(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
/* This should be run once and only once per Executor instance */
- Assert(!estate->es_finished);
+ Assert(!estate->es_finished && !estate->es_canceled);
/* Switch into per-query memory context */
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
@@ -488,11 +502,11 @@ standard_ExecutorEnd(QueryDesc *queryDesc)
Assert(estate != NULL);
/*
- * Check that ExecutorFinish was called, unless in EXPLAIN-only mode. This
- * Assert is needed because ExecutorFinish is new as of 9.1, and callers
- * might forget to call it.
+ * Check that ExecutorFinish was called, unless in EXPLAIN-only mode or if
+ * execution was canceled. This Assert is needed because ExecutorFinish is
+ * new as of 9.1, and callers might forget to call it.
*/
- Assert(estate->es_finished ||
+ Assert(estate->es_finished || estate->es_canceled ||
(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
/*
@@ -506,6 +520,14 @@ standard_ExecutorEnd(QueryDesc *queryDesc)
UnregisterSnapshot(estate->es_snapshot);
UnregisterSnapshot(estate->es_crosscheck_snapshot);
+ /*
+ * Cancel trigger execution too if the query execution was canceled.
+ */
+ if (estate->es_canceled &&
+ !(estate->es_top_eflags &
+ (EXEC_FLAG_SKIP_TRIGGERS | EXEC_FLAG_EXPLAIN_ONLY)))
+ AfterTriggerCancelQuery();
+
/*
* Must switch out of context before destroying it
*/
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 457ee46faf..13d2820a41 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1437,7 +1437,11 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Start up the executor */
queryDesc->plannedstmt->jitFlags = fpes->jit_flags;
- ExecutorStart(queryDesc, fpes->eflags);
+ /*
+ * OK to ignore the return value; plan can't become invalid,
+ * because there's no CachedPlan.
+ */
+ (void) ExecutorStart(queryDesc, fpes->eflags);
/* Special executor initialization steps for parallel workers */
queryDesc->planstate->state->es_query_dsa = area;
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 16704c0c2f..f0f5740c26 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -151,6 +151,7 @@ CreateExecutorState(void)
estate->es_top_eflags = 0;
estate->es_instrument = 0;
estate->es_finished = false;
+ estate->es_canceled = false;
estate->es_exprcontexts = NIL;
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 66636b05a5..27565a8b78 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -863,7 +863,12 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
eflags = EXEC_FLAG_SKIP_TRIGGERS;
else
eflags = 0; /* default run-to-completion flags */
- ExecutorStart(es->qd, eflags);
+
+ /*
+ * OK to ignore the return value; plan can't become invalid,
+ * because there's no CachedPlan.
+ */
+ (void) ExecutorStart(es->qd, eflags);
}
es->status = F_EXEC_RUN;
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 892b2853ed..60e2632cd2 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -71,7 +71,7 @@ static int _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
static ParamListInfo _SPI_convert_params(int nargs, Oid *argtypes,
Datum *Values, const char *Nulls);
-static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount);
+static int _SPI_pquery(QueryDesc *queryDesc, uint64 tcount);
static void _SPI_error_callback(void *arg);
@@ -1582,6 +1582,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
Snapshot snapshot;
MemoryContext oldcontext;
Portal portal;
+ bool plan_valid;
SPICallbackArg spicallbackarg;
ErrorContextCallback spierrcontext;
@@ -1623,6 +1624,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
_SPI_current->processed = 0;
_SPI_current->tuptable = NULL;
+replan:
/* Create the portal */
if (name == NULL || name[0] == '\0')
{
@@ -1766,15 +1768,23 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
}
/*
- * Start portal execution.
+ * Start portal execution. If the portal contains a cached plan, it must
+ * be recreated if the cached plan was found to have been invalidated when
+ * initializing one of the plan trees contained in it.
*/
- PortalStart(portal, paramLI, 0, snapshot);
+ plan_valid = PortalStart(portal, paramLI, 0, snapshot);
Assert(portal->strategy != PORTAL_MULTI_QUERY);
/* Pop the error context stack */
error_context_stack = spierrcontext.previous;
+ if (!plan_valid)
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
+
/* Pop the SPI stack */
_SPI_end_call(true);
@@ -2552,6 +2562,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
+replan:
cplan = GetCachedPlan(plansource, options->params,
plan_owner, _SPI_current->queryEnv);
@@ -2661,6 +2672,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
QueryDesc *qdesc;
Snapshot snap;
+ int eflags;
if (ActiveSnapshotSet())
snap = GetActiveSnapshot();
@@ -2675,8 +2687,23 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
options->params,
_SPI_current->queryEnv,
0);
- res = _SPI_pquery(qdesc, fire_triggers,
- canSetTag ? options->tcount : 0);
+
+ /* Select execution options */
+ if (fire_triggers)
+ eflags = 0; /* default run-to-completion flags */
+ else
+ eflags = EXEC_FLAG_SKIP_TRIGGERS;
+
+ if (!ExecutorStart(qdesc, eflags))
+ {
+ ExecutorEnd(qdesc);
+ FreeQueryDesc(qdesc);
+ Assert(cplan);
+ ReleaseCachedPlan(cplan, plan_owner);
+ goto replan;
+ }
+
+ res = _SPI_pquery(qdesc, canSetTag ? options->tcount : 0);
FreeQueryDesc(qdesc);
}
else
@@ -2851,10 +2878,9 @@ _SPI_convert_params(int nargs, Oid *argtypes,
}
static int
-_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
+_SPI_pquery(QueryDesc *queryDesc, uint64 tcount)
{
int operation = queryDesc->operation;
- int eflags;
int res;
switch (operation)
@@ -2898,14 +2924,6 @@ _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
ResetUsage();
#endif
- /* Select execution options */
- if (fire_triggers)
- eflags = 0; /* default run-to-completion flags */
- else
- eflags = EXEC_FLAG_SKIP_TRIGGERS;
-
- ExecutorStart(queryDesc, eflags);
-
ExecutorRun(queryDesc, ForwardScanDirection, tcount, true);
_SPI_current->processed = queryDesc->estate->es_processed;
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index e415cf1f34..70e7a023d5 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1231,7 +1231,12 @@ exec_simple_query(const char *query_string)
/*
* Start the portal. No parameters here.
*/
- PortalStart(portal, NULL, 0, InvalidSnapshot);
+ {
+ bool plan_valid PG_USED_FOR_ASSERTS_ONLY;
+
+ plan_valid = PortalStart(portal, NULL, 0, InvalidSnapshot);
+ Assert(plan_valid);
+ }
/*
* Select the appropriate output format: text unless we are doing a
@@ -1736,6 +1741,7 @@ exec_bind_message(StringInfo input_message)
"commands ignored until end of transaction block"),
errdetail_abort()));
+replan:
/*
* Create the portal. Allow silent replacement of an existing portal only
* if the unnamed portal is specified.
@@ -2023,9 +2029,15 @@ exec_bind_message(StringInfo input_message)
PopActiveSnapshot();
/*
- * And we're ready to start portal execution.
+ * Start portal execution. If the portal contains a cached plan, it must
+ * be recreated if the cached plan was found to have been invalidated when
+ * initializing one of the plan trees contained in it.
*/
- PortalStart(portal, params, 0, InvalidSnapshot);
+ if (!PortalStart(portal, params, 0, InvalidSnapshot))
+ {
+ PortalDrop(portal, false);
+ goto replan;
+ }
/*
* Apply the result format requests to the portal.
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 4ef349df8b..fcf9925ed4 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -19,6 +19,7 @@
#include "access/xact.h"
#include "commands/prepare.h"
+#include "executor/execdesc.h"
#include "executor/tstoreReceiver.h"
#include "miscadmin.h"
#include "pg_trace.h"
@@ -35,12 +36,6 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
- const char *sourceText,
- ParamListInfo params,
- QueryEnvironment *queryEnv,
- DestReceiver *dest,
- QueryCompletion *qc);
static void FillPortalStore(Portal portal, bool isTopLevel);
static uint64 RunFromStore(Portal portal, ScanDirection direction, uint64 count,
DestReceiver *dest);
@@ -118,86 +113,6 @@ FreeQueryDesc(QueryDesc *qdesc)
}
-/*
- * ProcessQuery
- * Execute a single plannable query within a PORTAL_MULTI_QUERY,
- * PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
- *
- * plan: the plan tree for the query
- * sourceText: the source text of the query
- * params: any parameters needed
- * dest: where to send results
- * qc: where to store the command completion status data.
- *
- * qc may be NULL if caller doesn't want a status string.
- *
- * Must be called in a memory context that will be reset or deleted on
- * error; otherwise the executor's memory usage will be leaked.
- */
-static void
-ProcessQuery(PlannedStmt *plan,
- const char *sourceText,
- ParamListInfo params,
- QueryEnvironment *queryEnv,
- DestReceiver *dest,
- QueryCompletion *qc)
-{
- QueryDesc *queryDesc;
-
- /*
- * Create the QueryDesc object
- */
- queryDesc = CreateQueryDesc(plan, NULL, sourceText,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, 0);
-
- /*
- * Call ExecutorStart to prepare the plan for execution
- */
- ExecutorStart(queryDesc, 0);
-
- /*
- * Run the plan to completion.
- */
- ExecutorRun(queryDesc, ForwardScanDirection, 0, true);
-
- /*
- * Build command completion status data, if caller wants one.
- */
- if (qc)
- {
- switch (queryDesc->operation)
- {
- case CMD_SELECT:
- SetQueryCompletion(qc, CMDTAG_SELECT, queryDesc->estate->es_processed);
- break;
- case CMD_INSERT:
- SetQueryCompletion(qc, CMDTAG_INSERT, queryDesc->estate->es_processed);
- break;
- case CMD_UPDATE:
- SetQueryCompletion(qc, CMDTAG_UPDATE, queryDesc->estate->es_processed);
- break;
- case CMD_DELETE:
- SetQueryCompletion(qc, CMDTAG_DELETE, queryDesc->estate->es_processed);
- break;
- case CMD_MERGE:
- SetQueryCompletion(qc, CMDTAG_MERGE, queryDesc->estate->es_processed);
- break;
- default:
- SetQueryCompletion(qc, CMDTAG_UNKNOWN, queryDesc->estate->es_processed);
- break;
- }
- }
-
- /*
- * Now, we close down all the scans and free allocated resources.
- */
- ExecutorFinish(queryDesc);
- ExecutorEnd(queryDesc);
-
- FreeQueryDesc(queryDesc);
-}
-
/*
* ChoosePortalStrategy
* Select portal execution strategy given the intended statement list.
@@ -428,19 +343,21 @@ FetchStatementTargetList(Node *stmt)
* presently ignored for non-PORTAL_ONE_SELECT portals (it's only intended
* to be used for cursors).
*
- * On return, portal is ready to accept PortalRun() calls, and the result
- * tupdesc (if any) is known.
+ * True is returned if portal is ready to accept PortalRun() calls, and the
+ * result tupdesc (if any) is known. False if the plan tree is no longer
+ * valid, in which case, the caller must retry after generating a new
+ * CachedPlan.
*/
-void
+bool
PortalStart(Portal portal, ParamListInfo params,
int eflags, Snapshot snapshot)
{
Portal saveActivePortal;
ResourceOwner saveResourceOwner;
- MemoryContext savePortalContext;
MemoryContext oldContext;
QueryDesc *queryDesc;
- int myeflags;
+ int myeflags = 0;
+ bool plan_valid = true;
Assert(PortalIsValid(portal));
Assert(portal->status == PORTAL_DEFINED);
@@ -450,15 +367,13 @@ PortalStart(Portal portal, ParamListInfo params,
*/
saveActivePortal = ActivePortal;
saveResourceOwner = CurrentResourceOwner;
- savePortalContext = PortalContext;
PG_TRY();
{
ActivePortal = portal;
if (portal->resowner)
CurrentResourceOwner = portal->resowner;
- PortalContext = portal->portalContext;
- oldContext = MemoryContextSwitchTo(PortalContext);
+ oldContext = MemoryContextSwitchTo(portal->queryContext);
/* Must remember portal param list, if any */
portal->portalParams = params;
@@ -474,6 +389,8 @@ PortalStart(Portal portal, ParamListInfo params,
switch (portal->strategy)
{
case PORTAL_ONE_SELECT:
+ case PORTAL_ONE_RETURNING:
+ case PORTAL_ONE_MOD_WITH:
/* Must set snapshot before starting executor. */
if (snapshot)
@@ -491,8 +408,8 @@ PortalStart(Portal portal, ParamListInfo params,
*/
/*
- * Create QueryDesc in portal's context; for the moment, set
- * the destination to DestNone.
+ * Create QueryDesc in portal->queryContext; for the moment,
+ * set the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
NULL,
@@ -504,30 +421,51 @@ PortalStart(Portal portal, ParamListInfo params,
portal->queryEnv,
0);
+ /* Remember for PortalRunMulti(). */
+ if (portal->strategy == PORTAL_ONE_RETURNING ||
+ portal->strategy == PORTAL_ONE_MOD_WITH)
+ portal->qdescs = list_make1(queryDesc);
+
/*
* If it's a scrollable cursor, executor needs to support
* REWIND and backwards scan, as well as whatever the caller
* might've asked for.
*/
- if (portal->cursorOptions & CURSOR_OPT_SCROLL)
+ if (portal->strategy == PORTAL_ONE_SELECT &&
+ (portal->cursorOptions & CURSOR_OPT_SCROLL))
myeflags = eflags | EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD;
else
myeflags = eflags;
/*
- * Call ExecutorStart to prepare the plan for execution
+ * Call ExecutorStart to prepare the plan for execution. A
+ * cached plan may get invalidated during plan initialization.
*/
- ExecutorStart(queryDesc, myeflags);
+ if (!ExecutorStart(queryDesc, myeflags))
+ {
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+ PopActiveSnapshot();
+ plan_valid = false;
+ goto plan_init_failed;
+ }
/*
- * This tells PortalCleanup to shut down the executor
+ * This tells PortalCleanup to shut down the executor, though
+ * not needed for queries handled by PortalRunMulti().
*/
- portal->queryDesc = queryDesc;
+ if (portal->strategy == PORTAL_ONE_SELECT)
+ portal->queryDesc = queryDesc;
/*
- * Remember tuple descriptor (computed by ExecutorStart)
+ * Remember tuple descriptor (computed by ExecutorStart),
+ * though make it independent of QueryDesc for queries handled
+ * by PortalRunMulti().
*/
- portal->tupDesc = queryDesc->tupDesc;
+ if (portal->strategy != PORTAL_ONE_SELECT)
+ portal->tupDesc = CreateTupleDescCopy(queryDesc->tupDesc);
+ else
+ portal->tupDesc = queryDesc->tupDesc;
/*
* Reset cursor position data to "start of query"
@@ -539,29 +477,6 @@ PortalStart(Portal portal, ParamListInfo params,
PopActiveSnapshot();
break;
- case PORTAL_ONE_RETURNING:
- case PORTAL_ONE_MOD_WITH:
-
- /*
- * We don't start the executor until we are told to run the
- * portal. We do need to set up the result tupdesc.
- */
- {
- PlannedStmt *pstmt;
-
- pstmt = PortalGetPrimaryStmt(portal);
- portal->tupDesc =
- ExecCleanTypeFromTL(pstmt->planTree->targetlist);
- }
-
- /*
- * Reset cursor position data to "start of query"
- */
- portal->atStart = true;
- portal->atEnd = false; /* allow fetches */
- portal->portalPos = 0;
- break;
-
case PORTAL_UTIL_SELECT:
/*
@@ -584,7 +499,82 @@ PortalStart(Portal portal, ParamListInfo params,
break;
case PORTAL_MULTI_QUERY:
- /* Need do nothing now */
+ {
+ ListCell *lc;
+ bool first = true;
+
+ myeflags = eflags;
+ foreach(lc, portal->stmts)
+ {
+ PlannedStmt *plan = lfirst_node(PlannedStmt, lc);
+ bool is_utility = (plan->utilityStmt != NULL);
+
+ /*
+ * Push the snapshot to be used by the executor.
+ */
+ if (!is_utility)
+ {
+ /*
+ * Must copy the snapshot for all statements
+ * except the first as we'll need to update its
+ * command ID.
+ */
+ if (!first)
+ PushCopiedSnapshot(GetTransactionSnapshot());
+ else
+ PushActiveSnapshot(GetTransactionSnapshot());
+ }
+
+ /*
+ * From the 2nd statement onwards, update the command
+ * ID and the snapshot to match.
+ */
+ if (!first)
+ {
+ CommandCounterIncrement();
+ UpdateActiveSnapshotCommandId();
+ }
+
+ first = false;
+
+ /*
+ * Create the QueryDesc. DestReceiver will be set in
+ * PortalRunMulti() before calling ExecutorRun().
+ */
+ queryDesc = CreateQueryDesc(plan,
+ NULL,
+ portal->sourceText,
+ !is_utility ?
+ GetActiveSnapshot() :
+ InvalidSnapshot,
+ InvalidSnapshot,
+ NULL,
+ params,
+ portal->queryEnv, 0);
+
+ /* Remember for PortalRunMulti() */
+ portal->qdescs = lappend(portal->qdescs, queryDesc);
+
+ if (is_utility)
+ continue;
+
+ /*
+ * Call ExecutorStart to prepare the plan for
+ * execution. A cached plan may get invalidated
+ * during plan initialization.
+ */
+ if (!ExecutorStart(queryDesc, myeflags))
+ {
+ PopActiveSnapshot();
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+ plan_valid = false;
+ goto plan_init_failed;
+ }
+ PopActiveSnapshot();
+ }
+ }
+
portal->tupDesc = NULL;
break;
}
@@ -597,19 +587,20 @@ PortalStart(Portal portal, ParamListInfo params,
/* Restore global vars and propagate error */
ActivePortal = saveActivePortal;
CurrentResourceOwner = saveResourceOwner;
- PortalContext = savePortalContext;
PG_RE_THROW();
}
PG_END_TRY();
+ portal->status = PORTAL_READY;
+
+plan_init_failed:
MemoryContextSwitchTo(oldContext);
ActivePortal = saveActivePortal;
CurrentResourceOwner = saveResourceOwner;
- PortalContext = savePortalContext;
- portal->status = PORTAL_READY;
+ return plan_valid;
}
/*
@@ -1196,7 +1187,7 @@ PortalRunMulti(Portal portal,
QueryCompletion *qc)
{
bool active_snapshot_set = false;
- ListCell *stmtlist_item;
+ ListCell *qdesc_item;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1217,9 +1208,10 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
- foreach(stmtlist_item, portal->stmts)
+ foreach(qdesc_item, portal->qdescs)
{
- PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ QueryDesc *qdesc = (QueryDesc *) lfirst(qdesc_item);
+ PlannedStmt *pstmt = qdesc->plannedstmt;
/*
* If we got a cancel signal in prior command, quit
@@ -1236,33 +1228,26 @@ PortalRunMulti(Portal portal,
if (log_executor_stats)
ResetUsage();
- /*
- * Must always have a snapshot for plannable queries. First time
- * through, take a new snapshot; for subsequent queries in the
- * same portal, just update the snapshot's copy of the command
- * counter.
- */
+ /* Push the snapshot for plannable queries. */
if (!active_snapshot_set)
{
- Snapshot snapshot = GetTransactionSnapshot();
+ Snapshot snapshot = qdesc->snapshot;
- /* If told to, register the snapshot and save in portal */
+ /*
+ * If told to, register the snapshot and save in portal
+ *
+ * Note that the command ID of qdesc->snapshot for 2nd query
+ * onwards would have been updated in PortalStart() to account
+ * for CCI() done between queries, but it's OK that here we
+ * don't likewise update holdSnapshot's command ID.
+ */
if (setHoldSnapshot)
{
snapshot = RegisterSnapshot(snapshot);
portal->holdSnapshot = snapshot;
}
- /*
- * We can't have the holdSnapshot also be the active one,
- * because UpdateActiveSnapshotCommandId would complain. So
- * force an extra snapshot copy. Plain PushActiveSnapshot
- * would have copied the transaction snapshot anyway, so this
- * only adds a copy step when setHoldSnapshot is true. (It's
- * okay for the command ID of the active snapshot to diverge
- * from what holdSnapshot has.)
- */
- PushCopiedSnapshot(snapshot);
+ PushActiveSnapshot(snapshot);
/*
* As for PORTAL_ONE_SELECT portals, it does not seem
@@ -1271,26 +1256,39 @@ PortalRunMulti(Portal portal,
active_snapshot_set = true;
}
- else
- UpdateActiveSnapshotCommandId();
+ /*
+ * Run the plan to completion.
+ */
+ qdesc->dest = dest;
+ ExecutorRun(qdesc, ForwardScanDirection, 0, true);
+
+ /*
+ * Build command completion status data if needed.
+ */
if (pstmt->canSetTag)
{
- /* statement can set tag string */
- ProcessQuery(pstmt,
- portal->sourceText,
- portal->portalParams,
- portal->queryEnv,
- dest, qc);
- }
- else
- {
- /* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
- portal->sourceText,
- portal->portalParams,
- portal->queryEnv,
- altdest, NULL);
+ switch (qdesc->operation)
+ {
+ case CMD_SELECT:
+ SetQueryCompletion(qc, CMDTAG_SELECT, qdesc->estate->es_processed);
+ break;
+ case CMD_INSERT:
+ SetQueryCompletion(qc, CMDTAG_INSERT, qdesc->estate->es_processed);
+ break;
+ case CMD_UPDATE:
+ SetQueryCompletion(qc, CMDTAG_UPDATE, qdesc->estate->es_processed);
+ break;
+ case CMD_DELETE:
+ SetQueryCompletion(qc, CMDTAG_DELETE, qdesc->estate->es_processed);
+ break;
+ case CMD_MERGE:
+ SetQueryCompletion(qc, CMDTAG_MERGE, qdesc->estate->es_processed);
+ break;
+ default:
+ SetQueryCompletion(qc, CMDTAG_UNKNOWN, qdesc->estate->es_processed);
+ break;
+ }
}
if (log_executor_stats)
@@ -1345,12 +1343,12 @@ PortalRunMulti(Portal portal,
if (portal->stmts == NIL)
break;
- /*
- * Increment command counter between queries, but not after the last
- * one.
- */
- if (lnext(portal->stmts, stmtlist_item) != NULL)
- CommandCounterIncrement();
+ if (qdesc->estate)
+ {
+ ExecutorFinish(qdesc);
+ ExecutorEnd(qdesc);
+ }
+ FreeQueryDesc(qdesc);
}
/* Pop the snapshot if we pushed one. */
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 06dfa85f04..0cad450dcd 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -201,6 +201,13 @@ CreatePortal(const char *name, bool allowDup, bool dupSilent)
portal->portalContext = AllocSetContextCreate(TopPortalContext,
"PortalContext",
ALLOCSET_SMALL_SIZES);
+ /*
+ * Initialize the portal's query context to store QueryDescs created during
+ * PortalStart() and then used in PortalRun().
+ */
+ portal->queryContext = AllocSetContextCreate(TopPortalContext,
+ "PortalQueryContext",
+ ALLOCSET_SMALL_SIZES);
/* create a resource owner for the portal */
portal->resowner = ResourceOwnerCreate(CurTransactionResourceOwner,
@@ -224,6 +231,7 @@ CreatePortal(const char *name, bool allowDup, bool dupSilent)
/* for named portals reuse portal->name copy */
MemoryContextSetIdentifier(portal->portalContext, portal->name[0] ? portal->name : "<unnamed>");
+ MemoryContextSetIdentifier(portal->queryContext, portal->name[0] ? portal->name : "<unnamed>");
return portal;
}
@@ -594,6 +602,7 @@ PortalDrop(Portal portal, bool isTopCommit)
/* release subsidiary storage */
MemoryContextDelete(portal->portalContext);
+ MemoryContextDelete(portal->queryContext);
/* release portal struct (it's in TopPortalContext) */
pfree(portal);
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index f9525fb572..054132823c 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -88,7 +88,11 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern QueryDesc *ExplainQueryDesc(PlannedStmt *stmt, struct CachedPlan *cplan,
+ const char *queryString, IntoClause *into, ExplainState *es,
+ ParamListInfo params, QueryEnvironment *queryEnv);
+extern void ExplainOnePlan(QueryDesc *queryDesc,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
@@ -104,6 +108,7 @@ extern void ExplainQueryParameters(ExplainState *es, ParamListInfo params, int m
extern void ExplainBeginOutput(ExplainState *es);
extern void ExplainEndOutput(ExplainState *es);
+extern void ExplainResetOutput(ExplainState *es);
extern void ExplainSeparatePlans(ExplainState *es);
extern void ExplainPropertyList(const char *qlabel, List *data,
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index 430e3ca7dd..d4f7c29301 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -257,6 +257,7 @@ extern void ExecASTruncateTriggers(EState *estate,
extern void AfterTriggerBeginXact(void);
extern void AfterTriggerBeginQuery(void);
+extern void AfterTriggerCancelQuery(void);
extern void AfterTriggerEndQuery(EState *estate);
extern void AfterTriggerFireDeferred(void);
extern void AfterTriggerEndXact(bool isCommit);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 3b33b38196..4f183ec6cd 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -73,7 +73,7 @@
/* Hook for plugins to get control in ExecutorStart() */
-typedef void (*ExecutorStart_hook_type) (QueryDesc *queryDesc, int eflags);
+typedef bool (*ExecutorStart_hook_type) (QueryDesc *queryDesc, int eflags);
extern PGDLLIMPORT ExecutorStart_hook_type ExecutorStart_hook;
/* Hook for plugins to get control in ExecutorRun() */
@@ -198,8 +198,8 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
-extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
-extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern bool ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern bool standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count, bool execute_once);
extern void standard_ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 08670bc5ed..cad2329ac9 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -669,6 +669,9 @@ typedef struct EState
int es_top_eflags; /* eflags passed to ExecutorStart */
int es_instrument; /* OR of InstrumentOption flags */
bool es_finished; /* true when ExecutorFinish is done */
+ bool es_canceled; /* true when execution was canceled
+ * upon finding that the plan was invalidated
+ * during ExecInitNode() */
List *es_exprcontexts; /* List of ExprContexts within EState */
diff --git a/src/include/tcop/pquery.h b/src/include/tcop/pquery.h
index a5e65b98aa..577b81a9ee 100644
--- a/src/include/tcop/pquery.h
+++ b/src/include/tcop/pquery.h
@@ -29,7 +29,7 @@ extern List *FetchPortalTargetList(Portal portal);
extern List *FetchStatementTargetList(Node *stmt);
-extern void PortalStart(Portal portal, ParamListInfo params,
+extern bool PortalStart(Portal portal, ParamListInfo params,
int eflags, Snapshot snapshot);
extern void PortalSetResultFormat(Portal portal, int nFormats,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index 8b4471cbe5..513e3c388d 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,8 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *qdescs; /* list of QueryDescs */
+ MemoryContext queryContext; /* memory for QueryDescs and children */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
--
2.35.3
Reviewing 0001:
Perhaps ExecEndCteScan needs an adjustment. What if node->leader was never set?
Other than that, I think this is in good shape. Maybe there are other
things we'd want to adjust here, or maybe there aren't, but there
doesn't seem to be any good reason to bundle more changes into the
same patch.
Reviewing 0002 and beyond:
I think it's good that you have tried to divide up a big change into
little pieces, but I'm finding the result difficult to understand. It
doesn't really seem like each patch stands on its own. I keep flipping
between patches to try to understand why other patches are doing
things, which kind of defeats the purpose of splitting stuff up. For
example, 0002 adds a NodeTag field to QueryDesc, but it doesn't even
seem to initialize that field, let alone use it for anything. It adds
a CachedPlan pointer to QueryDesc too, and adapts CreateQueryDesc to
allow one as an argument, but none of the callers actually pass
anything. I suspect that the first change (adding a NodeTag field)
is a bug, and that the second one is intentional, but it's hard
to tell without flipping through all of the other patches to see how
they build on what 0002 does. And even when something isn't a bug,
it's also hard to tell whether it's the right design, again because
you can't consider each patch in isolation. Ideally, splitting a patch
set should bring related changes together in a single patch and push
unrelated changes apart into different patches, but I don't really see
this particular split having that effect.
There is a chicken and egg problem here, to be fair. If we add code
that can make plan initialization fail without teaching the planner to
cope with failures, then we have broken the server, and if we do the
reverse, then we have a bunch of dead code that we can't test. Neither
is very satisfactory. But I still hope there's some better division
possible than what you have here currently. For instance, I wonder if
it would be possible to add all the stuff to cope with plan
initialization failing and then have a test patch that makes
initialization randomly fail with some probability (or maybe you can
even cause failures at specific points). Then you could test that
infrastructure by running the regression tests in a loop with various
values of the relevant setting.
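(One shape that test patch could take, as a sketch: a hypothetical
developer GUC, here called executor_start_failure_rate, consulted
early in ExecutorStart() when running a cached plan; pg_prng_double()
and pg_global_prng_state are the existing PRNG facilities:

    /* hypothetical injection point, near the top of ExecutorStart() */
    if (queryDesc->cplan != NULL &&
        pg_prng_double(&pg_global_prng_state) < executor_start_failure_rate)
        return false;       /* pretend the plan was just invalidated */

Running the regression tests in a loop with different rates would then
exercise the recovery paths without needing real concurrent DDL.)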
Another overall comment that I have is that it doesn't feel like
there's enough high-level explanation of the design. I don't know how
much of that should go in comments vs. commit messages vs. a README
that accompanies the patch set vs. whatever else, and I strongly
suspect that some of the stuff that seems confusing now is actually
stuff that at one point I understood and have just forgotten about.
But rediscovering it shouldn't be quite so hard. For example, consider
the question "why are we storing the CachedPlan in the QueryDesc?" I
eventually figured out that it's so that ExecPlanStillValid can call
CachedPlanStillValid which can then consult the cached plan's is_valid
flag. But is that the only access to the CachedPlan that we ever
expect to occur via the QueryDesc? If not, what else is allowable? If
so, why not just store a Boolean in the QueryDesc and arrange for the
plancache to be able to flip it when invalidating? I'm not saying
that's a better design -- I'm saying that it looks hard to understand
your thought process from the patch set. And also, you know, assuming
the current design is correct, could there be some way of dividing up
the patch set so that this one change, where we add the CachedPlan to
the QueryDesc, isn't so spread out across the whole series?
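For concreteness, the Boolean variant might look like this; a sketch
only, since it presupposes the plancache keeping track of which
QueryDescs reference a given plan, which is exactly the bookkeeping
that would need designing:

    typedef struct QueryDesc
    {
        ...
        bool    plan_valid;     /* cleared by plancache.c when the
                                 * underlying CachedPlan is invalidated */
    } QueryDesc;

ExecPlanStillValid() would then reduce to reading a flag rather than
chasing the CachedPlan pointer.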
Some more detailed review comments below. This isn't really a full
review because I don't understand the patches well enough for that,
but it's some stuff I noticed.
In 0002:
+ * result-rel info, etc. Also, we don't pass the parent't copy of the
Typo.
+ /*
+ * All the necessary locks must already have been taken when
+ * initializing the parent's copy of subplanstate, so the CachedPlan,
+ * if any, should not have become invalid during ExecInitNode().
+ */
+ Assert(ExecPlanStillValid(rcestate));
This -- and the other similar instance -- feel very uncomfortable.
There's a lot of action at a distance here. If this assertion ever
failed, how would anyone ever figure out what went wrong? You wouldn't
for example know which object got invalidated, presumably
corresponding to a lock that you failed to take. Unless the problem
were easily reproducible in a test environment, trying to guess what
happened might be pretty awful; imagine seeing this assertion failure
in a customer log file and trying to back-track to the find the
underlying bug. A further problem is that what would actually happen
is you *wouldn't* see this in the customer log file, because
assertions wouldn't be enabled, so you'd just see queries occasionally
returning wrong answers, I guess? Or crashing in some other random
part of the code? Which seems even worse. At a minimum I think this
should be upgraded to a test-and-elog, and maybe there's some value in
trying to think of what should get printed by that elog to facilitate
proper debugging, if it happens.
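For instance, something along these lines; a sketch, with the message
wording purely illustrative and the question of what identifying
detail can usefully be reported left open:

    if (!ExecPlanStillValid(rcestate))
        elog(ERROR, "cached plan became invalid during executor startup of query: %s",
             rcestate->es_sourceText);

That would at least turn silent misbehavior into a reportable error in
production builds.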
In 0003:
+ *
+ * OK to ignore the return value; plan can't become invalid,
+ * because there's no CachedPlan.
*/
- ExecutorStart(cstate->queryDesc, 0);
+ (void) ExecutorStart(cstate->queryDesc, 0);
This also feels awkward, for similar reasons. Sure, it shouldn't
return false, but also, if it did, you'd just blindly continue. Maybe
there should be test-and-elog here too. Or maybe this is an indication
that we need less action at a distance. Like, if ExecutorStart took
the CachedPlan as an argument instead of feeding it through the
QueryDesc, then you could document that ExecutorStart returns true if
that value is passed as NULL and true or false otherwise. Here,
whether ExecutorStart can return true or false depends on the contents
of the queryDesc ... which, granted, in this case is just built a line
or two before anyway, but if you just passed to to ExecutorStart then
you wouldn't need to feed it through the QueryDesc, it seems to me.
Even better, maybe there should be ExecutorStart() that continues
returning void and ExecutorStartExtended() that takes a cached plan as
an additional argument and returns a bool.
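Sketched out, that split might look like this; suggested shape only,
not patch code:

    extern bool ExecutorStartExtended(QueryDesc *queryDesc, int eflags,
                                      CachedPlan *cplan);

    void
    ExecutorStart(QueryDesc *queryDesc, int eflags)
    {
        bool        plan_valid PG_USED_FOR_ASSERTS_ONLY;

        /* With no CachedPlan, startup cannot observe an invalidation. */
        plan_valid = ExecutorStartExtended(queryDesc, eflags, NULL);
        Assert(plan_valid);
    }

Callers executing freshly made plans would keep calling
ExecutorStart() unchanged; only cached-plan call sites would use the
extended form and handle a false return.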
/*
- * Check that ExecutorFinish was called, unless in EXPLAIN-only mode. This
- * Assert is needed because ExecutorFinish is new as of 9.1, and callers
- * might forget to call it.
+ * Check that ExecutorFinish was called, unless in EXPLAIN-only mode or if
+ * execution was canceled. This Assert is needed because ExecutorFinish is
+ * new as of 9.1, and callers might forget to call it.
*/
Maybe we could drop the second sentence at this point.
In 0005:
+ * XXX Maybe we should skip calling ExecCheckPermissions from
+ * InitPlan in a parallel worker.
Why? If the thinking is to save overhead, then perhaps try to assess
the overhead. If the thinking is that we don't want it to fail
spuriously, then we have to weight that against the (security) risk of
succeeding spuriously.
+ * Returns true if current transaction holds a lock on the given relation of
+ * mode 'lockmode'. If 'orstronger' is true, a stronger lockmode is also OK.
+ * ("Stronger" is defined as "numerically higher", which is a bit
+ * semantically dubious but is OK for the purposes we use this for.)
I don't particularly enjoy seeing this comment cut and pasted into
some new place. Especially the tongue-in-cheek parenthetical part.
Better to refer to the original comment or something instead of
cut-and-pasting. Also, why is it appropriate to pass orstronger = true
here? Don't we expect the *exact* lock mode that we have planned to be
held, and isn't it a sure sign of a bug if it isn't? Maybe orstronger
should just be ripped out here (and the comment could then go away
too).
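In other words, if the invariant really is that the exact planned lock
mode is held, the check could be as strict as the following, using the
existing lmgr helper, where rel and rte stand in for whatever the
surrounding code has at hand (a sketch):

    Assert(CheckRelationLockedByMe(rel, rte->rellockmode, false));

so that holding a stronger-but-different lock is flagged as a bug
rather than silently accepted.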
In 0006:
+ /*
+ * RTIs of all partitioned tables whose children are scanned by
+ * appendplans. The list contains a bitmapset for every partition tree
+ * covered by this Append.
+ */
The first sentence of this comment makes this sound like a list of
integers, the RTIs of all partitioned tables that are scanned. The
second sentence makes it sound like a list of bitmapsets, but what
does it mean to talk about each partition tree covered by this Append?
This is far from a complete review but I'm running out of steam for
today. I hope that it's at least somewhat useful.
...Robert
On Mon, 20 Nov 2023 at 10:00, Amit Langote <amitlangote09@gmail.com> wrote:
On Thu, Sep 28, 2023 at 5:26 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Tue, Sep 26, 2023 at 10:06 PM Amit Langote <amitlangote09@gmail.com> wrote:
After sleeping on this, I think we do need the checks after all the
ExecInitNode() calls too, because we have many instances of code like
the following:

    outerPlanState(gatherstate) = ExecInitNode(outerNode, estate, eflags);
    tupDesc = ExecGetResultType(outerPlanState(gatherstate));
    <some code that dereferences tupDesc>

If outerNode is a SeqScan and ExecInitSeqScan() returned early because
ExecOpenScanRelation() detected that the plan was invalidated, then
tupDesc would be NULL in this case, causing the code to crash.

Now one might say that perhaps we should only add the
is-CachedPlan-valid test in the instances where there is an actual
risk of such misbehavior, but that could lead to confusion, now or
later. It seems better to add them after every ExecInitNode() call
while we're inventing the notion, because doing so relieves the
authors of future enhancements of the ExecInit*() routines from
worrying about any of this.

Attached 0003 should show how that turned out.

Updated 0002 as mentioned in the previous reply -- setting pointers to
NULL after freeing them more consistently across various ExecEnd*()
routines and using the `if (pointer != NULL)` style over the `if
(pointer)` style more consistently.

Updated 0001's commit message to remove the mention of its relation to
any future commits. I intend to push it tomorrow.

Pushed that one. Here are the rebased patches.

0001 seems ready to me, but I'll wait a couple more days for others to
weigh in. Just to highlight a kind of change that others may have
differing opinions on, consider this hunk from the patch:

    - MemoryContextDelete(node->aggcontext);
    + if (node->aggcontext != NULL)
    + {
    +     MemoryContextDelete(node->aggcontext);
    +     node->aggcontext = NULL;
    + }
    ...
    + ExecEndNode(outerPlanState(node));
    + outerPlanState(node) = NULL;

So the patch wants to enhance the consistency of the
set-the-pointer-to-NULL-after-freeing part. Robert mentioned his
preference for doing it in the patch, which I agree with.

Rebased.
There is a leak reported at [1]; details for the same are available at [2]:
diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/select_views.out
/tmp/cirrus-ci-build/build/testrun/regress-running/regress/results/select_views.out
--- /tmp/cirrus-ci-build/src/test/regress/expected/select_views.out
2023-12-19 23:00:04.677385000 +0000
+++ /tmp/cirrus-ci-build/build/testrun/regress-running/regress/results/select_views.out
2023-12-19 23:06:26.870259000 +0000
@@ -1288,6 +1288,7 @@
(102, '2011-10-12', 120),
(102, '2011-10-28', 200),
(103, '2011-10-15', 480);
+WARNING: resource was not closed: relation "customer_pkey"
CREATE VIEW my_property_normal AS
SELECT * FROM customer WHERE name = current_user;
CREATE VIEW my_property_secure WITH (security_barrier) A
[1]: https://cirrus-ci.com/task/6494009196019712
[2]: https://api.cirrus-ci.com/v1/artifact/task/6494009196019712/testrun/build/testrun/regress-running/regress/regression.diffs
Regards,
Vignesh
On 6 Dec 2023, at 23:52, Robert Haas <robertmhaas@gmail.com> wrote:
I hope that it's at least somewhat useful.
On 5 Jan 2024, at 15:46, vignesh C <vignesh21@gmail.com> wrote:
There is a leak reported
Hi Amit,
this is a kind reminder that some feedback on your patch [0] is
waiting for your reply.
Thank you for your work!
Best regards, Andrey Borodin.
[0]: https://commitfest.postgresql.org/47/3478/
Hi Andrey,
On Sun, Mar 31, 2024 at 2:03 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
On 6 Dec 2023, at 23:52, Robert Haas <robertmhaas@gmail.com> wrote:
I hope that it's at least somewhat useful.
On 5 Jan 2024, at 15:46, vignesh C <vignesh21@gmail.com> wrote:
There is a leak reported
Hi Amit,
this is a kind reminder that some feedback on your patch[0] is waiting for your reply.
Thank you for your work!
Thanks for moving this to the next CF.
My apologies (especially to Robert) for not replying on this thread
for a long time.
I plan to start working on this soon.
--
Thanks, Amit Langote
On Fri, 20 Jan 2023 at 08:39, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I spent some time re-reading this whole thread, and the more I read
the less happy I got. We are adding a lot of complexity and introducing
coding hazards that will surely bite somebody someday. And after awhile
I had what felt like an epiphany: the whole problem arises because the
system is wrongly factored. We should get rid of AcquireExecutorLocks
altogether, allowing the plancache to hand back a generic plan that
it's not certain of the validity of, and instead integrate the
responsibility for acquiring locks into executor startup. It'd have
to be optional there, since we don't need new locks in the case of
executing a just-planned plan; but we can easily add another eflags
bit (EXEC_FLAG_GET_LOCKS or so). Then there has to be a convention
whereby the ExecInitNode traversal can return an indicator that
"we failed because the plan is stale, please make a new plan".
I also reread the entire thread up to this point yesterday. I've also
been thinking about this recently as Amit has mentioned it to me a few
times over the past few months.
With the caveat of not yet having looked at the latest patch, my
thoughts are that having the executor startup responsible for taking
locks is a bad idea and I don't think we should go down this path. My
reasons are:
1. No ability to control the order that the locks are obtained. The
order in which the locks are taken will be at the mercy of the plan
the planner chooses.
2. It introduces lots of complexity regarding how to cleanly clean up
after a failed executor startup which is likely to make exec startup
slower and the code more complex
3. It puts us even further down the path of actually needing an
executor startup phase.
For #1, the locks taken for SELECT queries are less likely to conflict
with other locks obtained by PostgreSQL, but at least at the moment if
someone is getting deadlocks with a DDL type operation, they can
change their query or DDL script so that locks are taken in the same
order. If we allowed executor startup to do this then if someone
comes complaining that PG18 deadlocks when PG17 didn't we'd just have
to tell them to live with it. There's a comment at the bottom of
find_inheritance_children_extended() just above the qsort() which
explains about the deadlocking issue.
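(The convention that comment describes boils down to the following
line in pg_inherits.c, which makes all backends take conflicting
child-table locks in the same order:

    qsort(oidarr, numoids, sizeof(Oid), oid_cmp);   /* lock in OID order */

An executor-driven lock order, following whatever plan shape the
planner happened to choose, cannot give the same guarantee.)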
I don't have much extra to say about #2. As mentioned, I've not
looked at the patch. On paper, it sounds possible, but it also sounds
bug-prone and ugly.
For #3, I've been thinking about what improvements we can do to make
the executor more efficient. In [1], Andres talks about some very
interesting things. In particular, in his email items 3) and 5) are
relevant here. If we did move lots of executor startup code into the
planner, I think it would be possible to one day get rid of executor
startup and have the plan record how much memory is needed for the
non-readonly part of the executor state and tag each plan node with
the offset in bytes they should use for their portion of the executor
working state. This would be a single memory allocation for the entire
plan. The exact details are not important here, but I feel like if we
load up executor startup with more responsibilities, it'll just make
doing something like this harder. The init run-time pruning code that
I worked on likely already has done that, but I don't think it's
closed the door on it as it might just mean allocating more executor
state memory than we need to. Providing the plan node records the
offset into that memory, I think it could be made to work, just with
the inefficiency of having a (possibly) large unused hole in that
state memory.
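(A cartoon of that layout idea, purely to make it concrete; none of
these fields exist today:

    typedef struct Plan
    {
        ...
        Size        state_offset;   /* hypothetical: byte offset of this
                                     * node's writable state within one
                                     * big executor-state allocation */
    } Plan;

    /* executor startup: a single allocation for the whole plan tree */
    char       *state_mem = palloc0(plannedstmt->total_state_size);
    PlanState  *mystate = (PlanState *) (state_mem + plan->state_offset);

Init-time pruning would then just leave part of that allocation
unused, as described above.)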
As far as I understand it, your objection to the original proposal is
just on the grounds of concerns about introducing hazards that could
turn into bugs. I think we could come up with some way to make the
prior method of doing pruning before executor startup work. I think
what Amit had before your objection was starting to turn into
something workable and we should switch back to working on that.
David
[1]: /messages/by-id/20180525033538.6ypfwcqcxce6zkjj@alap3.anarazel.de
David Rowley <dgrowleyml@gmail.com> writes:
With the caveat of not yet having looked at the latest patch, my
thoughts are that having the executor startup responsible for taking
locks is a bad idea and I don't think we should go down this path.
OK, it's certainly still up for argument, but ...
1. No ability to control the order that the locks are obtained. The
order in which the locks are taken will be at the mercy of the plan
the planner chooses.
I do not think I buy this argument, because plancache.c doesn't
provide any "ability to control the order" today, and never has.
The order in which AcquireExecutorLocks re-gets relation locks is only
weakly related to the order in which the parser/planner got them
originally. The order in which AcquirePlannerLocks re-gets the locks
is even less related to the original. This doesn't cause any big
problems that I'm aware of, because these locks are fairly weak.
I think we do have a guarantee that for partitioned tables, parents
will be locked before children, and that's probably valuable.
But an executor-driven lock order could preserve that property too.
2. It introduces lots of complexity regarding how to cleanly clean up
after a failed executor startup which is likely to make exec startup
slower and the code more complex
Perhaps true, I'm not sure. But the patch we'd been discussing
before this proposal was darn complex as well.
3. It puts us even further down the path of actually needing an
executor startup phase.
Huh? We have such a thing already.
For #1, the locks taken for SELECT queries are less likely to conflict
with other locks obtained by PostgreSQL, but at least at the moment if
someone is getting deadlocks with a DDL type operation, they can
change their query or DDL script so that locks are taken in the same
order. If we allowed executor startup to do this then if someone
comes complaining that PG18 deadlocks when PG17 didn't we'd just have
to tell them to live with it. There's a comment at the bottom of
find_inheritance_children_extended() just above the qsort() which
explains about the deadlocking issue.
The reason it's important there is that that function is (sometimes)
used for lock modes that *are* exclusive.
For #3, I've been thinking about what improvements we can do to make
the executor more efficient. In [1], Andres talks about some very
interesting things. In particular, in his email items 3) and 5) are
relevant here. If we did move lots of executor startup code into the
planner, I think it would be possible to one day get rid of executor
startup and have the plan record how much memory is needed for the
non-readonly part of the executor state and tag each plan node with
the offset in bytes they should use for their portion of the executor
working state.
I'm fairly skeptical about that idea. The entire reason we have an
issue here is that we want to do runtime partition pruning, which
by definition can't be done at plan time. So I doubt it's going
to play nice with what we are trying to accomplish in this thread.
Moreover, while "replace a bunch of small pallocs with one big one"
would save some palloc effort, what are you going to do to ensure
that that memory has the right initial contents? I think this idea is
likely to make the executor a great deal more notationally complex
without actually buying all that much. Maybe Andres can make it work,
but I don't want to contort other parts of the system design on the
purely hypothetical basis that this might happen.
I think what Amit had before your objection was starting to turn into
something workable and we should switch back to working on that.
The reason I posted this idea was that I didn't think the previously
existing patch looked promising at all.
regards, tom lane
On Sun, 19 May 2024 at 13:27, Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Rowley <dgrowleyml@gmail.com> writes:
1. No ability to control the order that the locks are obtained. The
order in which the locks are taken will be at the mercy of the plan
the planner chooses.

I do not think I buy this argument, because plancache.c doesn't
provide any "ability to control the order" today, and never has.
The order in which AcquireExecutorLocks re-gets relation locks is only
weakly related to the order in which the parser/planner got them
originally. The order in which AcquirePlannerLocks re-gets the locks
is even less related to the original. This doesn't cause any big
problems that I'm aware of, because these locks are fairly weak.
It may not bite many people, it's just that if it does, I don't see
what we could do to help those people. At the moment we could tell
them to adjust their DDL script to obtain the locks in the same order
as their query. With your idea that cannot be done as the order could
change when the planner switches the join order.
I think we do have a guarantee that for partitioned tables, parents
will be locked before children, and that's probably valuable.
But an executor-driven lock order could preserve that property too.
I think you'd have to lock the parent before the child. That would
remain true and consistent anyway when taking locks during a
breadth-first plan traversal.
For #3, I've been thinking about what improvements we can do to make
the executor more efficient. In [1], Andres talks about some very
interesting things. In particular, in his email items 3) and 5) are
relevant here. If we did move lots of executor startup code into the
planner, I think it would be possible to one day get rid of executor
startup and have the plan record how much memory is needed for the
non-readonly part of the executor state and tag each plan node with
the offset in bytes they should use for their portion of the executor
working state.

I'm fairly skeptical about that idea. The entire reason we have an
issue here is that we want to do runtime partition pruning, which
by definition can't be done at plan time. So I doubt it's going
to play nice with what we are trying to accomplish in this thread.
I think we could have both, providing there was a way to still
traverse the executor state tree in EXPLAIN. We'd need a way to skip
portions of the plan that are not relevant or could be invalid for the
current execution, e.g. we can't show an Index Scan because the index
has been dropped.
I think what Amit had before your objection was starting to turn into
something workable and we should switch back to working on that.

The reason I posted this idea was that I didn't think the previously
existing patch looked promising at all.
Ok. It would be good if you could expand on that so we could
determine if there's some fundamental reason it can't work or if
that's because you were blinded by your epiphany and didn't give that
any thought after thinking of the alternative idea.
I've gone to effort to point out things that I think are concerning
with your idea. It would be good if you could do the same for the
previous patch other than "it didn't look promising". It's pretty hard
for me to argue with that level of detail.
David
On Sun, May 19, 2024 at 9:39 AM David Rowley <dgrowleyml@gmail.com> wrote:
For #1, the locks taken for SELECT queries are less likely to conflict
with other locks obtained by PostgreSQL, but at least at the moment if
someone is getting deadlocks with a DDL type operation, they can
change their query or DDL script so that locks are taken in the same
order. If we allowed executor startup to do this then if someone
comes complaining that PG18 deadlocks when PG17 didn't we'd just have
to tell them to live with it. There's a comment at the bottom of
find_inheritance_children_extended() just above the qsort() which
explains about the deadlocking issue.
Thought to chime in on this.
A deadlock may occur with the execution-time locking proposed in the
patch if the DDL script makes assumptions about how a cached plan's
execution determines the locking order for children of multiple parent
relations. Specifically, the deadlock can happen if the script tries
to lock the child relations directly, instead of locking them through
their respective parent relations. The patch doesn't change the order
of locking of relations mentioned in the query, because that's defined
in AcquirePlannerLocks().
--
Thanks, Amit Langote
I had occasion to run the same benchmark you described in the initial
email in this thread. To do so I applied patch series v49 on top of
07cb29737a4e, which is just one that happened to have the same date as
v49.
I then used a script like this (against a server having
plan_cache_mode=force_generic_plan)
for numparts in 0 1 2 4 8 16 32 48 64 80 81 96 127 128 160 200 256 257 288 300 384 512 1024 1536 2048; do
pgbench testdb -i --partitions=$numparts 2>/dev/null
echo -ne "$numparts\t"
pgbench -n testdb -S -T30 -Mprepared | grep "^tps" | sed -e 's/^tps = \([0-9.]*\) .*/\1/'
done
and did the same with the commit mentioned above (that is, unpatched).
I got this table as result
partitions │ patched │ 07cb29737a
────────────┼──────────────┼──────────────
0 │ 65632.090431 │ 68967.712741
1 │ 68096.641831 │ 65356.587223
2 │ 59456.507575 │ 60884.679464
4 │ 62097.426 │ 59698.747104
8 │ 58044.311175 │ 57817.104562
16 │ 59741.926563 │ 52549.916262
32 │ 59261.693449 │ 44815.317215
48 │ 59047.125629 │ 38362.123652
64 │ 59748.738797 │ 34051.158525
80 │ 59276.839183 │ 32026.135076
81 │ 62318.572932 │ 30418.122933
96 │ 59678.857163 │ 28478.113651
127 │ 58761.960028 │ 24272.303742
128 │ 59934.268306 │ 24275.214593
160 │ 56688.790899 │ 21119.043564
200 │ 56323.188599 │ 18111.212849
256 │ 55915.22466 │ 14753.953709
257 │ 57810.530461 │ 15093.497575
288 │ 56874.780092 │ 13873.332162
300 │ 57222.056549 │ 13463.768946
384 │ 54073.77295 │ 11183.558339
512 │ 37503.766847 │ 8114.32532
1024 │ 42746.866448 │ 4468.41359
1536 │ 39500.58411 │ 3049.984599
2048 │ 36988.519486 │ 2269.362006
where already at 16 partitions we can see that things are going downhill
with the unpatched code. (However, what happens when the table is not
partitioned looks a bit funny.)
I hope we can get this new executor code in 18.
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"La primera ley de las demostraciones en vivo es: no trate de usar el sistema.
Escriba un guión que no toque nada para no causar daños." (Jakob Nielsen)
On Thu, Jun 20, 2024 at 2:09 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
I hope we can get this new executor code in 18.
Thanks for doing the benchmark, Alvaro, and sorry for the late reply.
Yes, I'm hoping to get *some* version of this into v18. I've been
thinking how to move this forward and I'm starting to think that we
should go back to or at least consider as an option the old approach
of changing the plancache to do the initial runtime pruning instead of
changing the executor to take locks, which is the design that the
latest patch set tries to implement.
Here are the challenges facing the implementation of the current design:
1. I went through many iterations of the changes to ExecInitNode() to
return a partially initialized PlanState tree when it detects that the
CachedPlan was invalidated after locking a child table and to
ExecEndNode() to account for the PlanState tree sometimes being
partially initialized, but it still seems fragile and bug-prone to me.
It might be because this approach is fundamentally hard to get right
or I haven't invested enough effort in becoming more confident in its
robustness.
2. Refactoring needed due to the ExecutorStart() API change especially
that pertaining to portals does not seem airtight. I'm especially
worried about moving the ExecutorStart() call for the
PORTAL_MULTI_QUERY case from where it is currently to PortalStart().
That requires additional bookkeeping in PortalData and I am not
totally sure that the snapshot handling changes after that move are
entirely correct.
3. The need to add *back* the fields to store the RT indexes of
relations that are not looked at by ExecInitNode() traversal such as
root partitioned tables and non-leaf partitions.
I'm worried about #2 the most. One complaint about the previous
design was that the interface changes to capture and pass the result
of doing initial pruning in plancache.c to the executor did not look
great. However, after having tried doing #2, the changes to pass the
pruning result into the executor and changes to reuse it in
ExecInit[Merge]Append() seem a tad bit simpler than the refactoring
and adjustments needed to handle failed ExecutorStart() calls, at
multiple code sites.
About #1, I tend to agree with David that adding complexity around
PlanState tree construction may not be a good idea, because we might
want to rethink Plan initialization code and data structures in the
not too distant future. One idea I thought of is to take the
remaining locks (to wit, those on inheritance children if running a
cached plan) at the beginning of InitPlan(), that is before
ExecInitNode(), like we handle the permission checking, so that we
don't need to worry about ever returning a partially initialized
PlanState tree. However, we're still left with the tall task to
implement #2 such that it doesn't break anything.
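Sketched, that would put a loop like the following at the top of
InitPlan(), next to the permission checks; unprunedChildRTIs is a
hypothetical PlannedStmt field recording the child relations that
AcquirePlannerLocks() leaves unlocked:

    if (estate->es_cachedplan != NULL)
    {
        ListCell   *l;

        foreach(l, plannedstmt->unprunedChildRTIs)
        {
            RangeTblEntry *rte = exec_rt_fetch(lfirst_int(l), estate);

            LockRelationOid(rte->relid, rte->rellockmode);
            if (!ExecPlanStillValid(estate))
                return;     /* ExecutorStart reports the failure; the
                             * caller replans */
        }
    }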
Another concern about the old design was the unnecessary overhead of
initializing bitmapset fields in PlannedStmt that are meant for the
locking algorithm in AcquireExecutorLocks(). Andres suggested an idea
offlist to either piggyback on cursorOptions argument of
pg_plan_queries() or adding a new boolean parameter to let the planner
know if the plan is one that might get cached and thus have
AcquireExecutorLocks() called on it. Another idea David and I
discussed offlist is inventing a RTELockInfo (cf RTEPermissionInfo)
and only creating one for each RT entry that is un-prunable and do
away with PlannedStmt.rtable. For partitioned tables, that entry will
point to the PartitionPruneInfo that will contain the RT indexes of
partitions (or maybe just OIDs) mapped from their subplan indexes that
are returned by the pruning code. So AcquireExecutorLocks() will lock
all un-prunable relations by referring to their RTELockInfo entries
and for each entry that points to a PartitionPruneInfo with initial
pruning steps, will only lock the partitions that survive the pruning.
I am planning to polish that old patch set and post after playing with
those new ideas.
--
Thanks, Amit Langote
On Mon, Aug 12, 2024 at 8:54 AM Amit Langote <amitlangote09@gmail.com> wrote:
1. I went through many iterations of the changes to ExecInitNode() to
return a partially initialized PlanState tree when it detects that the
CachedPlan was invalidated after locking a child table and to
ExecEndNode() to account for the PlanState tree sometimes being
partially initialized, but it still seems fragile and bug-prone to me.
It might be because this approach is fundamentally hard to get right
or I haven't invested enough effort in becoming more confident in its
robustness.
Can you give some examples of what's going wrong, or what you think
might go wrong?
I didn't think there was a huge problem here based on previous
discussion, but I could very well be missing some important challenge.
2. Refactoring needed due to the ExecutorStart() API change especially
that pertaining to portals does not seem airtight. I'm especially
worried about moving the ExecutorStart() call for the
PORTAL_MULTI_QUERY case from where it is currently to PortalStart().
That requires additional bookkeeping in PortalData and I am not
totally sure that the snapshot handling changes after that move are
entirely correct.
Here again, it would help to see exactly what you had to do and what
consequences you think it might have. But it sounds like you're
talking about moving ExecutorStart() from PortalStart() to PortalRun()
and I agree that sounds like it might have user-visible behavioral
consequences that we don't want.
3. The need to add *back* the fields to store the RT indexes of
relations that are not looked at by ExecInitNode() traversal such as
root partitioned tables and non-leaf partitions.
I don't remember exactly why we removed those or what the benefit was,
so I'm not sure how big of a problem it is if we have to put them
back.
About #1, I tend to agree with David that adding complexity around
PlanState tree construction may not be a good idea, because we might
want to rethink Plan initialization code and data structures in the
not too distant future.
Like Tom, I don't really buy this. There might be a good reason not to
do this in ExecutorStart(), but the hypothetical possibility that we
might want to change something and that this patch might make it
harder is not it.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Aug 15, 2024 at 4:23 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Aug 12, 2024 at 8:54 AM Amit Langote <amitlangote09@gmail.com> wrote:
1. I went through many iterations of the changes to ExecInitNode() to
return a partially initialized PlanState tree when it detects that the
CachedPlan was invalidated after locking a child table and to
ExecEndNode() to account for the PlanState tree sometimes being
partially initialized, but it still seems fragile and bug-prone to me.
It might be because this approach is fundamentally hard to get right
or I haven't invested enough effort in becoming more confident in its
robustness.

Can you give some examples of what's going wrong, or what you think
might go wrong?

I didn't think there was a huge problem here based on previous
discussion, but I could very well be missing some important challenge.
TBH, it's more of a hunch that people who are not involved in this
development might find the new reality, whereby the execution is not
race-free until ExecutorRun(), hard to reason about.
That's perhaps true with the other approach too whereby one would need
to consult a separate data structure that records the result of
pruning done in plancache.c to be sure if a given node of the plan
tree coming from a CachedPlan is safe to execute or do something with.
2. Refactoring needed due to the ExecutorStart() API change especially
that pertaining to portals does not seem airtight. I'm especially
worried about moving the ExecutorStart() call for the
PORTAL_MULTI_QUERY case from where it is currently to PortalStart().
That requires additional bookkeeping in PortalData and I am not
totally sure that the snapshot handling changes after that move are
entirely correct.

Here again, it would help to see exactly what you had to do and what
consequences you think it might have. But it sounds like you're
talking about moving ExecutorStart() from PortalStart() to PortalRun()
and I agree that sounds like it might have user-visible behavioral
consequences that we don't want.
Let's look specifically at this block of code in PortalRunMulti():
/*
* Must always have a snapshot for plannable queries. First time
* through, take a new snapshot; for subsequent queries in the
* same portal, just update the snapshot's copy of the command
* counter.
*/
if (!active_snapshot_set)
{
Snapshot snapshot = GetTransactionSnapshot();
/* If told to, register the snapshot and save in portal */
if (setHoldSnapshot)
{
snapshot = RegisterSnapshot(snapshot);
portal->holdSnapshot = snapshot;
}
/*
* We can't have the holdSnapshot also be the active one,
* because UpdateActiveSnapshotCommandId would complain. So
* force an extra snapshot copy. Plain PushActiveSnapshot
* would have copied the transaction snapshot anyway, so this
* only adds a copy step when setHoldSnapshot is true. (It's
* okay for the command ID of the active snapshot to diverge
* from what holdSnapshot has.)
*/
PushCopiedSnapshot(snapshot);
/*
* As for PORTAL_ONE_SELECT portals, it does not seem
* necessary to maintain portal->portalSnapshot here.
*/
active_snapshot_set = true;
}
else
UpdateActiveSnapshotCommandId();
Without the patch, the code immediately following this does a
CreateQueryDesc(), which "registers" the above copied snapshot,
followed by ExecutorStart() immediately followed by ExecutorRun(), for
each query in the list in the PORTAL_MULTI_QUERY case.
With the patch, CreateQueryDesc() and ExecutorStart() are moved to
PortalStart() so that the QueryDescs, including the PlanState trees,
for all queries are built before any is run. Why? So that if
ExecutorStart() fails for any query in the list, we can simply throw
out the QueryDescs and the PlanState trees of the previous queries
(NOT run them) and ask the plancache for a new CachedPlan for the list
of queries. We don't have a way to ask plancache.c to replan only a
given query in the list.
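
In rough outline, that reshuffled logic looks something like the
sketch below. This is illustrative only: the bool-returning
ExecutorStart() variant, the portal->qdescs field, and the cleanup
helper are assumptions about the patch's API, not its final form:

    /* Build and start a QueryDesc for every statement before running any. */
    foreach(lc, portal->stmts)
    {
        PlannedStmt *pstmt = lfirst_node(PlannedStmt, lc);
        QueryDesc  *qdesc;

        qdesc = CreateQueryDesc(pstmt, portal->sourceText,
                                GetActiveSnapshot(), InvalidSnapshot,
                                None_Receiver, portal->portalParams,
                                portal->queryEnv, 0);

        if (!ExecutorStart(qdesc, 0))   /* hypothetical bool variant */
        {
            /*
             * CachedPlan was invalidated while locking: throw away all
             * QueryDescs built so far and get a fresh CachedPlan for
             * the whole list, then start over.
             */
            DiscardPortalQueryDescs(portal);    /* hypothetical helper */
            break;
        }
        portal->qdescs = lappend(portal->qdescs, qdesc);  /* new field */
    }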
Because of that reshuffling, the above block also needed to be moved
to PortalStart() along with the CommandCounterIncrement() between
queries. That requires the following non-trivial changes:
* A copy of the snapshot needs to be created for each statement after
the 1st one to be able to perform UpdateActiveSnapshotCommandId() on
it.
* In PortalRunMulti(), PushActiveSnapshot() must now be done for every
query because the executor expects the copy in the given query's
QueryDesc to match the ActiveSnapshot.
* There's no longer a CCI() between queries in PortalRunMulti(),
because the snapshot in each query's QueryDesc must already have been
adjusted to reflect the correct command counter. I've checked, but I
can't really be sure that the snapshot's copy of the command counter
is all anyone ever consults when they want its current value.
There is likely to be a performance regression in the multi-query
cases due to this handling of snapshots, and maybe even correctness
issues.
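
To make the snapshot bullet points above concrete, the per-statement
handling might be sketched as follows; the exact placement in the
patch differs, so treat this only as an illustration of the calls
involved:

    /* First statement: push a copy of the transaction snapshot. */
    PushCopiedSnapshot(GetTransactionSnapshot());

    /*
     * Each subsequent statement: advance the command counter, push a
     * fresh copy of the active snapshot for this statement, and stamp
     * it with the new command ID.
     */
    CommandCounterIncrement();
    PushCopiedSnapshot(GetActiveSnapshot());
    UpdateActiveSnapshotCommandId();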
3. The need to add *back* the fields to store the RT indexes of
relations that are not looked at by ExecInitNode() traversal such as
root partitioned tables and non-leaf partitions.

I don't remember exactly why we removed those or what the benefit was,
so I'm not sure how big of a problem it is if we have to put them
back.

We removed those in commit 52ed730d511b, after commit f2343653f5b2
removed redundant execution-time locking of non-leaf relations. So we
removed them because we realized that execution-time locking is
unnecessary given that AcquireExecutorLocks() exists, and now we want
to add them back because we'd like to get rid of
AcquireExecutorLocks(). :-)
I'm attaching a rebased version of the patch that implements the
current design, because the cfbot has been broken for a while and also
in case you or anyone else wants to take another look. I've combined
two patches into one -- one that dealt with the executor-side changes
to account for locking, and another that dealt with the caller-side
changes to handle the executor returning when the CachedPlan becomes
invalid.
--
Thanks,
Amit Langote
Attachments:
v50-0002-Add-field-to-store-parent-relids-to-Append-Merge.patchapplication/octet-stream; name=v50-0002-Add-field-to-store-parent-relids-to-Append-Merge.patchDownload
From f1ad8bea2621c5a7074742516b248c9efe9dd17e Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Wed, 6 Sep 2023 17:54:02 +0900
Subject: [PATCH v50 2/6] Add field to store parent relids to
Append/MergeAppend
There's currently no way in the executor to tell whether the child
subplans of Append/MergeAppend are scanning partitions, and if
they indeed are, what the RT indexes of their parent/ancestor tables
are. The executor doesn't need to see those RT indexes except for
run-time pruning, in which case they can be found in the
PartitionPruneInfo. An upcoming commit will create a need for them
to be available for the purpose of locking those parent/ancestor
tables when executing a cached plan, so add a field called
allpartrelids to Append/MergeAppend to store those RT indexes.
In the cases where an Append/MergeAppend node containing parent
RT indexes is eligible for elision in
set_{append|mergeappend}_references(), those RT indexes are now
transferred into PlannedStmt.elidedAppendPartRels.
The code to look up partitioned parent relids for a given list of
partition scan subpaths of an Append/MergeAppend is already present
in make_partition_pruneinfo() but it's local to partprune.c. This
commit refactors that code into its own function called
add_append_subpath_partrelids() defined in appendinfo.c and
generalizes it to consider child join and aggregation paths. To
facilitate looking up the parent rels of child grouping rels in
add_append_subpath_partrelids(), parent links are now also set in
the RelOptInfos of child grouping rels, like they are in those of
child base and join rels.
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
src/backend/executor/execParallel.c | 1 +
src/backend/optimizer/plan/createplan.c | 41 ++++++--
src/backend/optimizer/plan/planner.c | 5 +
src/backend/optimizer/plan/setrefs.c | 22 ++++
src/backend/optimizer/util/appendinfo.c | 127 ++++++++++++++++++++++++
src/backend/partitioning/partprune.c | 124 +++--------------------
src/include/nodes/pathnodes.h | 3 +
src/include/nodes/plannodes.h | 17 ++++
src/include/optimizer/appendinfo.h | 3 +
src/include/partitioning/partprune.h | 3 +-
10 files changed, 223 insertions(+), 123 deletions(-)
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index bfb3419efb..f995714d7f 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -185,6 +185,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->permInfos = estate->es_rteperminfos;
pstmt->resultRelations = NIL;
pstmt->appendRelations = NIL;
+ pstmt->elidedAppendPartRels = NIL;
/*
* Transfer only parallel-safe subplans, leaving a NULL "hole" in the list
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 28addc1129..49c193c237 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -25,6 +25,7 @@
#include "nodes/extensible.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
+#include "optimizer/appendinfo.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/optimizer.h"
@@ -1232,6 +1233,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
Oid *nodeCollations = NULL;
bool *nodeNullsFirst = NULL;
bool consider_async = false;
+ List *allpartrelids = NIL;
/*
* The subpaths list could be empty, if every child was proven empty by
@@ -1373,15 +1375,23 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
++nasyncplans;
}
+ /*
+ * Find partitioned parent rel(s) of the subpath's rel(s).
+ */
+ allpartrelids = add_append_subpath_partrelids(root, subpath, rel,
+ allpartrelids);
+
subplans = lappend(subplans, subplan);
}
+ plan->allpartrelids = allpartrelids;
+
/*
- * If any quals exist, they may be useful to perform further partition
- * pruning during execution. Gather information needed by the executor to
- * do partition pruning.
+ * If scanning partitions, check if there are quals that may be useful to
+ * perform further partition pruning during execution. Gather information
+ * needed by the executor to do partition pruning.
*/
- if (enable_partition_pruning)
+ if (enable_partition_pruning && allpartrelids != NIL)
{
List *prunequal;
@@ -1402,7 +1412,8 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
partpruneinfo =
make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ allpartrelids);
}
plan->appendplans = subplans;
@@ -1448,6 +1459,7 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
PartitionPruneInfo *partpruneinfo = NULL;
+ List *allpartrelids = NIL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1537,15 +1549,23 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplan = (Plan *) sort;
}
+ /*
+ * Find partitioned parent rel(s) of the subpath's rel(s).
+ */
+ allpartrelids = add_append_subpath_partrelids(root, subpath, rel,
+ allpartrelids);
+
subplans = lappend(subplans, subplan);
}
+ node->allpartrelids = allpartrelids;
+
/*
- * If any quals exist, they may be useful to perform further partition
- * pruning during execution. Gather information needed by the executor to
- * do partition pruning.
+ * If scanning partitions, check if there are quals that may be useful to
+ * perform further partition pruning during execution. Gather information
+ * needed by the executor to do partition pruning.
*/
- if (enable_partition_pruning)
+ if (enable_partition_pruning && allpartrelids != NIL)
{
List *prunequal;
@@ -1557,7 +1577,8 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
if (prunequal != NIL)
partpruneinfo = make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ allpartrelids);
}
node->mergeplans = subplans;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 948afd9094..2c2e38f589 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -522,6 +522,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
Assert(glob->finalrowmarks == NIL);
Assert(glob->resultRelations == NIL);
Assert(glob->appendRelations == NIL);
+ Assert(glob->elidedAppendPartRels == NIL);
top_plan = set_plan_references(root, top_plan);
/* ... and the subplans (both regular subplans and initplans) */
Assert(list_length(glob->subplans) == list_length(glob->subroots));
@@ -549,6 +550,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->permInfos = glob->finalrteperminfos;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
+ result->elidedAppendPartRels = glob->elidedAppendPartRels;
result->subplans = glob->subplans;
result->rewindPlanIDs = glob->rewindPlanIDs;
result->rowMarks = glob->finalrowmarks;
@@ -7941,8 +7943,11 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
agg_costs, gd, &child_extra,
&child_partially_grouped_rel);
+ /* Mark as child of grouped_rel. */
+ child_grouped_rel->parent = grouped_rel;
if (child_partially_grouped_rel)
{
+ child_partially_grouped_rel->parent = grouped_rel;
partially_grouped_live_children =
lappend(partially_grouped_live_children,
child_partially_grouped_rel);
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 7aed84584c..4fc5ed15aa 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1757,6 +1757,10 @@ set_append_references(PlannerInfo *root,
lfirst(l) = set_plan_refs(root, (Plan *) lfirst(l), rtoffset);
}
+ /* Do this before possibly removing the MergeAppend node below. */
+ foreach(l, aplan->allpartrelids)
+ lfirst(l) = offset_relid_set((Relids) lfirst(l), rtoffset);
+
/*
* See if it's safe to get rid of the Append entirely. For this to be
* safe, there must be only one child plan and that child plan's parallel
@@ -1770,7 +1774,14 @@ set_append_references(PlannerInfo *root,
Plan *p = (Plan *) linitial(aplan->appendplans);
if (p->parallel_aware == aplan->plan.parallel_aware)
+ {
+ if (aplan->allpartrelids)
+ root->glob->elidedAppendPartRels =
+ list_concat(root->glob->elidedAppendPartRels,
+ aplan->allpartrelids);
+
return clean_up_removed_plan_level((Plan *) aplan, p);
+ }
}
/*
@@ -1832,6 +1843,10 @@ set_mergeappend_references(PlannerInfo *root,
lfirst(l) = set_plan_refs(root, (Plan *) lfirst(l), rtoffset);
}
+ /* Do this before possibly removing the MergeAppend node below. */
+ foreach(l, mplan->allpartrelids)
+ lfirst(l) = offset_relid_set((Relids) lfirst(l), rtoffset);
+
/*
* See if it's safe to get rid of the MergeAppend entirely. For this to
* be safe, there must be only one child plan and that child plan's
@@ -1846,7 +1861,14 @@ set_mergeappend_references(PlannerInfo *root,
Plan *p = (Plan *) linitial(mplan->mergeplans);
if (p->parallel_aware == mplan->plan.parallel_aware)
+ {
+ if (mplan->allpartrelids)
+ root->glob->elidedAppendPartRels =
+ list_concat(root->glob->elidedAppendPartRels,
+ mplan->allpartrelids);
+
return clean_up_removed_plan_level((Plan *) mplan, p);
+ }
}
/*
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index 4989722637..0569cd00a5 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -1038,3 +1038,130 @@ distribute_row_identity_vars(PlannerInfo *root)
}
}
}
+
+/*
+ * add_append_subpath_partrelids
+ * Look up a child subpath's rel's partitioned parent relids up to
+ * parentrel and add the bitmapset containing those into
+ * 'allpartrelids'
+ */
+List *
+add_append_subpath_partrelids(PlannerInfo *root, Path *subpath,
+ RelOptInfo *parentrel,
+ List *allpartrelids)
+{
+ RelOptInfo *pathrel = subpath->parent;
+ Relids partrelids = NULL;
+ Index top_parent;
+ ListCell *lc;
+
+ /* Nothing to do if there's no parent to begin with. */
+ if (!IS_OTHER_REL(pathrel))
+ return allpartrelids;
+
+ /*
+ * Traverse up to the pathrel's topmost partitioned parent, collecting
+ * parent relids as we go; but stop if we reach parentrel. (Normally, a
+ * pathrel's topmost partitioned parent is either parentrel or a UNION ALL
+ * appendrel child of parentrel. But when handling partitionwise joins of
+ * multi-level partitioning trees, we can see an append path whose
+ * parentrel is an intermediate partitioned table.)
+ */
+ do
+ {
+ Relids parent_relids = NULL;
+
+ /*
+ * For simple child rels, we can simply set the parent_relids to
+ * pathrel->parent->relids. But for partitionwise join and aggregate
+ * child rels, while we can use pathrel->parent to move up the tree,
+ * parent_relids must be found the hard way through AppendRelInfos,
+ * because 1) a joinrel's relids may point to RTE_JOIN entries,
+ * 2) topmost parent grouping rel's relids field is NULL.
+ */
+ if (IS_SIMPLE_REL(pathrel))
+ {
+ pathrel = pathrel->parent;
+ /* Stop once we reach the root partitioned rel. */
+ if (!IS_PARTITIONED_REL(pathrel))
+ break;
+ parent_relids = bms_add_members(parent_relids, pathrel->relids);
+ }
+ else
+ {
+ AppendRelInfo **appinfos;
+ int nappinfos,
+ i;
+
+ appinfos = find_appinfos_by_relids(root, pathrel->relids,
+ &nappinfos);
+ for (i = 0; i < nappinfos; i++)
+ {
+ AppendRelInfo *appinfo = appinfos[i];
+
+ parent_relids = bms_add_member(parent_relids,
+ appinfo->parent_relid);
+ }
+ pfree(appinfos);
+ pathrel = pathrel->parent;
+ }
+ /* accept this level as an interesting parent */
+ partrelids = bms_add_members(partrelids, parent_relids);
+ if (pathrel == parentrel)
+ break; /* don't traverse above parentrel */
+ } while (IS_OTHER_REL(pathrel));
+
+ if (partrelids == NULL)
+ return allpartrelids;
+
+ /*
+ * Append the 'partrelids' RT index bitmapset to 'allpartrelids' or
+ * merge the RT indexes into an appropriate bitmapset already present
+ * in the list
+ *
+ * Within 'allpartrelids', there is one Bitmapset for each topmost parent
+ * partitioned rel mentioned in the query whose children's subpaths have
+ * been passed to add_append_subpath_partrelids. Each Bitmapset contains
+ * the RT indexes of the topmost parent as well as its relevant non-leaf
+ * child partitions. Since (by construction of the rangetable list) parent
+ * partitions must have lower RT indexes than their children, we can
+ * distinguish the topmost parent as being the lowest set bit in the
+ * Bitmapset.
+ *
+ * Note that the list contains only RT indexes of partitioned tables that
+ * are parents of some scan-level relation appearing in the 'subpaths' that
+ * add_append_subpath_partrelids() is dealing with. Also, "topmost"
+ * parents are not allowed to be higher than the 'parentrel' associated
+ * with the append path. In this way, we avoid expending cycles on
+ * partitioned rels that can't contribute useful pruning information for
+ * the problem at hand.
+ *
+ * (It is possible for 'parentrel' to be a child partitioned table, and it
+ * is also possible for scan-level relations to be child partitioned tables
+ * rather than leaf partitions. Hence we must construct this relation set
+ * with reference to the particular append path we're dealing with, rather
+ * than looking at the full partitioning structure represented in the
+ * RelOptInfos.)
+ */
+
+ /* We can easily get the lowest set bit this way: */
+ top_parent = bms_next_member(partrelids, -1);
+ Assert(top_parent > 0);
+
+ /* Look for a matching topmost parent */
+ foreach(lc, allpartrelids)
+ {
+ Bitmapset *currpartrelids = (Bitmapset *) lfirst(lc);
+ Index currtarget = bms_next_member(currpartrelids, -1);
+
+ if (top_parent == currtarget)
+ {
+ /* Found a match, so add any new RT indexes to this hierarchy */
+ currpartrelids = bms_add_members(currpartrelids, partrelids);
+ lfirst(lc) = currpartrelids;
+ return allpartrelids;
+ }
+ }
+ /* No match, so add the new partition hierarchy to the list */
+ return lappend(allpartrelids, partrelids);
+}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 9a1a7faac7..2afc10c40b 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -137,7 +137,6 @@ typedef struct PruneStepResult
} PruneStepResult;
-static List *add_part_relids(List *allpartrelids, Bitmapset *partrelids);
static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
RelOptInfo *parentrel,
List *prunequal,
@@ -215,33 +214,32 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
+ * 'allpartrelids' contains Bitmapsets of RT indexes of partitioned parents
+ * whose partitions' Paths are in 'subpaths'; there's one Bitmapset for every
+ * partition tree involved.
*/
PartitionPruneInfo *
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
- List *prunequal)
+ List *prunequal,
+ List *allpartrelids)
{
PartitionPruneInfo *pruneinfo;
Bitmapset *allmatchedsubplans = NULL;
- List *allpartrelids;
List *prunerelinfos;
int *relid_subplan_map;
ListCell *lc;
int i;
+ Assert(list_length(allpartrelids) > 0);
+
/*
- * Scan the subpaths to see which ones are scans of partition child
- * relations, and identify their parent partitioned rels. (Note: we must
- * restrict the parent partitioned rels to be parentrel or children of
- * parentrel, otherwise we couldn't translate prunequal to match.)
- *
- * Also construct a temporary array to map from partition-child-relation
- * relid to the index in 'subpaths' of the scan plan for that partition.
+ * Construct a temporary array to map from partition-child-relation relid
+ * to the index in 'subpaths' of the scan plan for that partition.
* (Use of "subplan" rather than "subpath" is a bit of a misnomer, but
* we'll let it stand.) For convenience, we use 1-based indexes here, so
* that zero can represent an un-filled array entry.
*/
- allpartrelids = NIL;
relid_subplan_map = palloc0(sizeof(int) * root->simple_rel_array_size);
i = 1;
@@ -250,50 +248,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Path *path = (Path *) lfirst(lc);
RelOptInfo *pathrel = path->parent;
- /* We don't consider partitioned joins here */
- if (pathrel->reloptkind == RELOPT_OTHER_MEMBER_REL)
- {
- RelOptInfo *prel = pathrel;
- Bitmapset *partrelids = NULL;
-
- /*
- * Traverse up to the pathrel's topmost partitioned parent,
- * collecting parent relids as we go; but stop if we reach
- * parentrel. (Normally, a pathrel's topmost partitioned parent
- * is either parentrel or a UNION ALL appendrel child of
- * parentrel. But when handling partitionwise joins of
- * multi-level partitioning trees, we can see an append path whose
- * parentrel is an intermediate partitioned table.)
- */
- do
- {
- AppendRelInfo *appinfo;
-
- Assert(prel->relid < root->simple_rel_array_size);
- appinfo = root->append_rel_array[prel->relid];
- prel = find_base_rel(root, appinfo->parent_relid);
- if (!IS_PARTITIONED_REL(prel))
- break; /* reached a non-partitioned parent */
- /* accept this level as an interesting parent */
- partrelids = bms_add_member(partrelids, prel->relid);
- if (prel == parentrel)
- break; /* don't traverse above parentrel */
- } while (prel->reloptkind == RELOPT_OTHER_MEMBER_REL);
-
- if (partrelids)
- {
- /*
- * Found some relevant parent partitions, which may or may not
- * overlap with partition trees we already found. Add new
- * information to the allpartrelids list.
- */
- allpartrelids = add_part_relids(allpartrelids, partrelids);
- /* Also record the subplan in relid_subplan_map[] */
- /* No duplicates please */
- Assert(relid_subplan_map[pathrel->relid] == 0);
- relid_subplan_map[pathrel->relid] = i;
- }
- }
+ /* No duplicates please */
+ Assert(relid_subplan_map[pathrel->relid] == 0);
+ relid_subplan_map[pathrel->relid] = i;
i++;
}
@@ -359,63 +316,6 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
return pruneinfo;
}
-/*
- * add_part_relids
- * Add new info to a list of Bitmapsets of partitioned relids.
- *
- * Within 'allpartrelids', there is one Bitmapset for each topmost parent
- * partitioned rel. Each Bitmapset contains the RT indexes of the topmost
- * parent as well as its relevant non-leaf child partitions. Since (by
- * construction of the rangetable list) parent partitions must have lower
- * RT indexes than their children, we can distinguish the topmost parent
- * as being the lowest set bit in the Bitmapset.
- *
- * 'partrelids' contains the RT indexes of a parent partitioned rel, and
- * possibly some non-leaf children, that are newly identified as parents of
- * some subpath rel passed to make_partition_pruneinfo(). These are added
- * to an appropriate member of 'allpartrelids'.
- *
- * Note that the list contains only RT indexes of partitioned tables that
- * are parents of some scan-level relation appearing in the 'subpaths' that
- * make_partition_pruneinfo() is dealing with. Also, "topmost" parents are
- * not allowed to be higher than the 'parentrel' associated with the append
- * path. In this way, we avoid expending cycles on partitioned rels that
- * can't contribute useful pruning information for the problem at hand.
- * (It is possible for 'parentrel' to be a child partitioned table, and it
- * is also possible for scan-level relations to be child partitioned tables
- * rather than leaf partitions. Hence we must construct this relation set
- * with reference to the particular append path we're dealing with, rather
- * than looking at the full partitioning structure represented in the
- * RelOptInfos.)
- */
-static List *
-add_part_relids(List *allpartrelids, Bitmapset *partrelids)
-{
- Index targetpart;
- ListCell *lc;
-
- /* We can easily get the lowest set bit this way: */
- targetpart = bms_next_member(partrelids, -1);
- Assert(targetpart > 0);
-
- /* Look for a matching topmost parent */
- foreach(lc, allpartrelids)
- {
- Bitmapset *currpartrelids = (Bitmapset *) lfirst(lc);
- Index currtarget = bms_next_member(currpartrelids, -1);
-
- if (targetpart == currtarget)
- {
- /* Found a match, so add any new RT indexes to this hierarchy */
- currpartrelids = bms_add_members(currpartrelids, partrelids);
- lfirst(lc) = currpartrelids;
- return allpartrelids;
- }
- }
- /* No match, so add the new partition hierarchy to the list */
- return lappend(allpartrelids, partrelids);
-}
-
/*
* make_partitionedrel_pruneinfo
* Build a List of PartitionedRelPruneInfos, one for each interesting
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 14ccfc1ac1..73c2a70028 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -128,6 +128,9 @@ typedef struct PlannerGlobal
/* "flat" list of AppendRelInfos */
List *appendRelations;
+ /* "flat list of Bitmapsets of RT indexes "*/
+ List *elidedAppendPartRels;
+
/* OIDs of relations the plan depends on */
List *relationOids;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 1aeeaec95e..634c1908ca 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -79,6 +79,11 @@ typedef struct PlannedStmt
List *appendRelations; /* list of AppendRelInfo nodes */
+ List *elidedAppendPartRels; /* list of Bitmapsets of RT indexes of
+ * partitioned tables from Append/
+ * MergeAppend nodes that were elided
+ * in setrefs.c */
+
List *subplans; /* Plan trees for SubPlan expressions; note
* that some could be NULL */
@@ -269,6 +274,15 @@ typedef struct Append
List *appendplans;
int nasyncplans; /* # of asynchronous plans */
+ /*
+ * List of bitmapsets containing RT indexes of all partitioned tables
+ * scanned by this Append, with one bitmapset for every partitioned
+ * table appearing in the query. Each bitmapset contains the RT indexes
+ * of all non-pruned non-leaf partitions in the tree with a given
+ * partitioned table as root.
+ */
+ List *allpartrelids;
+
/*
* All 'appendplans' preceding this index are non-partial plans. All
* 'appendplans' from this index onwards are partial plans.
@@ -293,6 +307,9 @@ typedef struct MergeAppend
List *mergeplans;
+ /* See the description in Append's definition. */
+ List *allpartrelids;
+
/* these fields are just like the sort-key info in struct Sort: */
/* number of sort-key columns */
diff --git a/src/include/optimizer/appendinfo.h b/src/include/optimizer/appendinfo.h
index cc12c9c743..8e3d61c708 100644
--- a/src/include/optimizer/appendinfo.h
+++ b/src/include/optimizer/appendinfo.h
@@ -46,5 +46,8 @@ extern void add_row_identity_columns(PlannerInfo *root, Index rtindex,
RangeTblEntry *target_rte,
Relation target_relation);
extern void distribute_row_identity_vars(PlannerInfo *root);
+extern List *add_append_subpath_partrelids(PlannerInfo *root, Path *subpath,
+ RelOptInfo *parentrel,
+ List *allpartrelids);
#endif /* APPENDINFO_H */
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index bd490d154f..1587298812 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -73,7 +73,8 @@ typedef struct PartitionPruneContext
extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
struct RelOptInfo *parentrel,
List *subpaths,
- List *prunequal);
+ List *prunequal,
+ List *allpartrelids);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
--
2.43.0
v50-0001-Assorted-tightening-in-various-ExecEnd-routines.patchapplication/octet-stream; name=v50-0001-Assorted-tightening-in-various-ExecEnd-routines.patchDownload
From 28cadedabb59e5e3386f4d9a309b77bdc7047143 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Thu, 28 Sep 2023 16:56:29 +0900
Subject: [PATCH v50 1/6] Assorted tightening in various ExecEnd*() routines
This includes adding NULLness checks on pointers before cleaning them
up. Many ExecEnd*() routines already perform this check, but a few
are missing them. These NULLness checks might seem redundant as
things stand since the ExecEnd*() routines operate under the
assumption that their matching ExecInit* routine would have fully
executed, ensuring pointers are set. However, that assumption seems a
bit shaky in the face of future changes.
This also adds a guard at the beginning of EvalPlanQualEnd() to return
early if the EPQState does not appear to have been initialized. That
case can happen if the corresponding ExecInit*() routine returned
early without calling EvalPlanQualInit().
While at it, this commit ensures that pointers are consistently set
to NULL after cleanup in all ExecEnd*() routines.
Finally, for enhanced consistency, the format of NULLness checks has
been standardized to "if (pointer != NULL)", replacing the previous
"if (pointer)" style.
Reviewed-by: Robert Haas
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
src/backend/executor/execMain.c | 4 ++
src/backend/executor/nodeAgg.c | 27 +++++++++----
src/backend/executor/nodeAppend.c | 3 ++
src/backend/executor/nodeBitmapAnd.c | 4 +-
src/backend/executor/nodeBitmapHeapscan.c | 46 ++++++++++++++--------
src/backend/executor/nodeBitmapIndexscan.c | 23 ++++++-----
src/backend/executor/nodeBitmapOr.c | 4 +-
src/backend/executor/nodeCtescan.c | 3 +-
src/backend/executor/nodeForeignscan.c | 17 ++++----
src/backend/executor/nodeGather.c | 1 +
src/backend/executor/nodeGatherMerge.c | 1 +
src/backend/executor/nodeGroup.c | 6 +--
src/backend/executor/nodeHash.c | 6 +--
src/backend/executor/nodeHashjoin.c | 4 +-
src/backend/executor/nodeIncrementalSort.c | 13 +++++-
src/backend/executor/nodeIndexonlyscan.c | 25 ++++++------
src/backend/executor/nodeIndexscan.c | 23 ++++++-----
src/backend/executor/nodeLimit.c | 1 +
src/backend/executor/nodeLockRows.c | 1 +
src/backend/executor/nodeMaterial.c | 5 ++-
src/backend/executor/nodeMemoize.c | 7 +++-
src/backend/executor/nodeMergeAppend.c | 3 ++
src/backend/executor/nodeMergejoin.c | 2 +
src/backend/executor/nodeModifyTable.c | 11 +++++-
src/backend/executor/nodeNestloop.c | 2 +
src/backend/executor/nodeProjectSet.c | 1 +
src/backend/executor/nodeRecursiveunion.c | 24 +++++++++--
src/backend/executor/nodeResult.c | 1 +
src/backend/executor/nodeSamplescan.c | 7 +++-
src/backend/executor/nodeSeqscan.c | 16 +++-----
src/backend/executor/nodeSetOp.c | 6 ++-
src/backend/executor/nodeSort.c | 5 ++-
src/backend/executor/nodeSubqueryscan.c | 1 +
src/backend/executor/nodeTableFuncscan.c | 4 +-
src/backend/executor/nodeTidrangescan.c | 12 ++++--
src/backend/executor/nodeTidscan.c | 8 +++-
src/backend/executor/nodeUnique.c | 1 +
src/backend/executor/nodeWindowAgg.c | 41 +++++++++++++------
38 files changed, 246 insertions(+), 123 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4d7c92d63c..2d5234dee3 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2986,6 +2986,10 @@ EvalPlanQualEnd(EPQState *epqstate)
MemoryContext oldcontext;
ListCell *l;
+ /* Nothing to do if no EvalPlanQualInit() was done to begin with. */
+ if (epqstate->parentestate == NULL)
+ return;
+
rtsize = epqstate->parentestate->es_range_table_size;
/*
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 53ead77ece..0dfba5ca16 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -4303,7 +4303,6 @@ GetAggInitVal(Datum textInitVal, Oid transtype)
void
ExecEndAgg(AggState *node)
{
- PlanState *outerPlan;
int transno;
int numGroupingSets = Max(node->maxsets, 1);
int setno;
@@ -4313,7 +4312,7 @@ ExecEndAgg(AggState *node)
* worker back into shared memory so that it can be picked up by the main
* process to report in EXPLAIN ANALYZE.
*/
- if (node->shared_info && IsParallelWorker())
+ if (node->shared_info != NULL && IsParallelWorker())
{
AggregateInstrumentation *si;
@@ -4326,10 +4325,16 @@ ExecEndAgg(AggState *node)
/* Make sure we have closed any open tuplesorts */
- if (node->sort_in)
+ if (node->sort_in != NULL)
+ {
tuplesort_end(node->sort_in);
- if (node->sort_out)
+ node->sort_in = NULL;
+ }
+ if (node->sort_out != NULL)
+ {
tuplesort_end(node->sort_out);
+ node->sort_out = NULL;
+ }
hashagg_reset_spill_state(node);
@@ -4345,19 +4350,25 @@ ExecEndAgg(AggState *node)
for (setno = 0; setno < numGroupingSets; setno++)
{
- if (pertrans->sortstates[setno])
+ if (pertrans->sortstates[setno] != NULL)
tuplesort_end(pertrans->sortstates[setno]);
}
}
/* And ensure any agg shutdown callbacks have been called */
for (setno = 0; setno < numGroupingSets; setno++)
+ {
ReScanExprContext(node->aggcontexts[setno]);
- if (node->hashcontext)
+ node->aggcontexts[setno] = NULL;
+ }
+ if (node->hashcontext != NULL)
+ {
ReScanExprContext(node->hashcontext);
+ node->hashcontext = NULL;
+ }
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
+ ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
void
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index ca0f54d676..86d75b1a7e 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -399,7 +399,10 @@ ExecEndAppend(AppendState *node)
* shut down each of the subscans
*/
for (i = 0; i < nplans; i++)
+ {
ExecEndNode(appendplans[i]);
+ appendplans[i] = NULL;
+ }
}
void
diff --git a/src/backend/executor/nodeBitmapAnd.c b/src/backend/executor/nodeBitmapAnd.c
index 9c9c666872..ae391222bf 100644
--- a/src/backend/executor/nodeBitmapAnd.c
+++ b/src/backend/executor/nodeBitmapAnd.c
@@ -192,8 +192,8 @@ ExecEndBitmapAnd(BitmapAndState *node)
*/
for (i = 0; i < nplans; i++)
{
- if (bitmapplans[i])
- ExecEndNode(bitmapplans[i]);
+ ExecEndNode(bitmapplans[i]);
+ bitmapplans[i] = NULL;
}
}
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 3c63bdd93d..19f18ab817 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -625,8 +625,6 @@ ExecReScanBitmapHeapScan(BitmapHeapScanState *node)
void
ExecEndBitmapHeapScan(BitmapHeapScanState *node)
{
- TableScanDesc scanDesc;
-
/*
* When ending a parallel worker, copy the statistics gathered by the
* worker back into shared memory so that it can be picked up by the main
@@ -650,38 +648,54 @@ ExecEndBitmapHeapScan(BitmapHeapScanState *node)
si->lossy_pages += node->stats.lossy_pages;
}
- /*
- * extract information from the node
- */
- scanDesc = node->ss.ss_currentScanDesc;
-
/*
* close down subplans
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
/*
* release bitmaps and buffers if any
*/
- if (node->tbmiterator)
+ if (node->tbmiterator != NULL)
+ {
tbm_end_iterate(node->tbmiterator);
- if (node->prefetch_iterator)
+ node->tbmiterator = NULL;
+ }
+ if (node->prefetch_iterator != NULL)
+ {
tbm_end_iterate(node->prefetch_iterator);
- if (node->tbm)
+ node->prefetch_iterator = NULL;
+ }
+ if (node->tbm != NULL)
+ {
tbm_free(node->tbm);
- if (node->shared_tbmiterator)
+ node->tbm = NULL;
+ }
+ if (node->shared_tbmiterator != NULL)
+ {
tbm_end_shared_iterate(node->shared_tbmiterator);
- if (node->shared_prefetch_iterator)
+ node->shared_tbmiterator = NULL;
+ }
+ if (node->shared_prefetch_iterator != NULL)
+ {
tbm_end_shared_iterate(node->shared_prefetch_iterator);
+ node->shared_prefetch_iterator = NULL;
+ }
if (node->pvmbuffer != InvalidBuffer)
+ {
ReleaseBuffer(node->pvmbuffer);
+ node->pvmbuffer = InvalidBuffer;
+ }
/*
- * close heap scan
+ * close heap scan (no-op if we didn't start it)
*/
- if (scanDesc)
- table_endscan(scanDesc);
-
+ if (node->ss.ss_currentScanDesc != NULL)
+ {
+ table_endscan(node->ss.ss_currentScanDesc);
+ node->ss.ss_currentScanDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeBitmapIndexscan.c b/src/backend/executor/nodeBitmapIndexscan.c
index 6df8e17ec8..4669e8d0ce 100644
--- a/src/backend/executor/nodeBitmapIndexscan.c
+++ b/src/backend/executor/nodeBitmapIndexscan.c
@@ -174,22 +174,21 @@ ExecReScanBitmapIndexScan(BitmapIndexScanState *node)
void
ExecEndBitmapIndexScan(BitmapIndexScanState *node)
{
- Relation indexRelationDesc;
- IndexScanDesc indexScanDesc;
-
- /*
- * extract information from the node
- */
- indexRelationDesc = node->biss_RelationDesc;
- indexScanDesc = node->biss_ScanDesc;
+ /* close the scan (no-op if we didn't start it) */
+ if (node->biss_ScanDesc != NULL)
+ {
+ index_endscan(node->biss_ScanDesc);
+ node->biss_ScanDesc = NULL;
+ }
/*
* close the index relation (no-op if we didn't open it)
*/
- if (indexScanDesc)
- index_endscan(indexScanDesc);
- if (indexRelationDesc)
- index_close(indexRelationDesc, NoLock);
+ if (node->biss_RelationDesc != NULL)
+ {
+ index_close(node->biss_RelationDesc, NoLock);
+ node->biss_RelationDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeBitmapOr.c b/src/backend/executor/nodeBitmapOr.c
index 7029536c64..de439235d2 100644
--- a/src/backend/executor/nodeBitmapOr.c
+++ b/src/backend/executor/nodeBitmapOr.c
@@ -210,8 +210,8 @@ ExecEndBitmapOr(BitmapOrState *node)
*/
for (i = 0; i < nplans; i++)
{
- if (bitmapplans[i])
- ExecEndNode(bitmapplans[i]);
+ ExecEndNode(bitmapplans[i]);
+ bitmapplans[i] = NULL;
}
}
diff --git a/src/backend/executor/nodeCtescan.c b/src/backend/executor/nodeCtescan.c
index 8081eed887..7cea943988 100644
--- a/src/backend/executor/nodeCtescan.c
+++ b/src/backend/executor/nodeCtescan.c
@@ -290,10 +290,11 @@ ExecEndCteScan(CteScanState *node)
/*
* If I am the leader, free the tuplestore.
*/
- if (node->leader == node)
+ if (node->leader != NULL && node->leader == node)
{
tuplestore_end(node->cte_table);
node->cte_table = NULL;
+ node->leader = NULL;
}
}
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index fe4ae55c0f..1357ccf3c9 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -300,17 +300,20 @@ ExecEndForeignScan(ForeignScanState *node)
EState *estate = node->ss.ps.state;
/* Let the FDW shut down */
- if (plan->operation != CMD_SELECT)
+ if (node->fdwroutine != NULL)
{
- if (estate->es_epq_active == NULL)
- node->fdwroutine->EndDirectModify(node);
+ if (plan->operation != CMD_SELECT)
+ {
+ if (estate->es_epq_active == NULL)
+ node->fdwroutine->EndDirectModify(node);
+ }
+ else
+ node->fdwroutine->EndForeignScan(node);
}
- else
- node->fdwroutine->EndForeignScan(node);
/* Shut down any outer plan. */
- if (outerPlanState(node))
- ExecEndNode(outerPlanState(node));
+ ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 5d4ffe989c..cae5ea1f92 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -244,6 +244,7 @@ void
ExecEndGather(GatherState *node)
{
ExecEndNode(outerPlanState(node)); /* let children clean up first */
+ outerPlanState(node) = NULL;
ExecShutdownGather(node);
}
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 45f6017c29..b36cd89e7d 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -284,6 +284,7 @@ void
ExecEndGatherMerge(GatherMergeState *node)
{
ExecEndNode(outerPlanState(node)); /* let children clean up first */
+ outerPlanState(node) = NULL;
ExecShutdownGatherMerge(node);
}
diff --git a/src/backend/executor/nodeGroup.c b/src/backend/executor/nodeGroup.c
index da32bec181..807429e504 100644
--- a/src/backend/executor/nodeGroup.c
+++ b/src/backend/executor/nodeGroup.c
@@ -225,10 +225,8 @@ ExecInitGroup(Group *node, EState *estate, int eflags)
void
ExecEndGroup(GroupState *node)
{
- PlanState *outerPlan;
-
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
+ ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
void
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 61480733a1..dbf4920363 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -412,13 +412,11 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
void
ExecEndHash(HashState *node)
{
- PlanState *outerPlan;
-
/*
* shut down the subplan
*/
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
+ ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 5429e68734..592c098b9f 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -870,7 +870,7 @@ ExecEndHashJoin(HashJoinState *node)
/*
* Free hash table
*/
- if (node->hj_HashTable)
+ if (node->hj_HashTable != NULL)
{
ExecHashTableDestroy(node->hj_HashTable);
node->hj_HashTable = NULL;
@@ -880,7 +880,9 @@ ExecEndHashJoin(HashJoinState *node)
* clean up subtrees
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
ExecEndNode(innerPlanState(node));
+ innerPlanState(node) = NULL;
}
/*
diff --git a/src/backend/executor/nodeIncrementalSort.c b/src/backend/executor/nodeIncrementalSort.c
index 2ce5ed5ec8..010bcfafa8 100644
--- a/src/backend/executor/nodeIncrementalSort.c
+++ b/src/backend/executor/nodeIncrementalSort.c
@@ -1078,8 +1078,16 @@ ExecEndIncrementalSort(IncrementalSortState *node)
{
SO_printf("ExecEndIncrementalSort: shutting down sort node\n");
- ExecDropSingleTupleTableSlot(node->group_pivot);
- ExecDropSingleTupleTableSlot(node->transfer_tuple);
+ if (node->group_pivot != NULL)
+ {
+ ExecDropSingleTupleTableSlot(node->group_pivot);
+ node->group_pivot = NULL;
+ }
+ if (node->transfer_tuple != NULL)
+ {
+ ExecDropSingleTupleTableSlot(node->transfer_tuple);
+ node->transfer_tuple = NULL;
+ }
/*
* Release tuplesort resources.
@@ -1099,6 +1107,7 @@ ExecEndIncrementalSort(IncrementalSortState *node)
* Shut down the subplan.
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
SO_printf("ExecEndIncrementalSort: sort node shutdown\n");
}
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 612c673895..481d479760 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -397,15 +397,6 @@ ExecReScanIndexOnlyScan(IndexOnlyScanState *node)
void
ExecEndIndexOnlyScan(IndexOnlyScanState *node)
{
- Relation indexRelationDesc;
- IndexScanDesc indexScanDesc;
-
- /*
- * extract information from the node
- */
- indexRelationDesc = node->ioss_RelationDesc;
- indexScanDesc = node->ioss_ScanDesc;
-
/* Release VM buffer pin, if any. */
if (node->ioss_VMBuffer != InvalidBuffer)
{
@@ -413,13 +404,21 @@ ExecEndIndexOnlyScan(IndexOnlyScanState *node)
node->ioss_VMBuffer = InvalidBuffer;
}
+ /* close the scan (no-op if we didn't start it) */
+ if (node->ioss_ScanDesc != NULL)
+ {
+ index_endscan(node->ioss_ScanDesc);
+ node->ioss_ScanDesc = NULL;
+ }
+
/*
* close the index relation (no-op if we didn't open it)
*/
- if (indexScanDesc)
- index_endscan(indexScanDesc);
- if (indexRelationDesc)
- index_close(indexRelationDesc, NoLock);
+ if (node->ioss_RelationDesc != NULL)
+ {
+ index_close(node->ioss_RelationDesc, NoLock);
+ node->ioss_RelationDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 8000feff4c..a8172d8b82 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -784,22 +784,21 @@ ExecIndexAdvanceArrayKeys(IndexArrayKeyInfo *arrayKeys, int numArrayKeys)
void
ExecEndIndexScan(IndexScanState *node)
{
- Relation indexRelationDesc;
- IndexScanDesc indexScanDesc;
-
- /*
- * extract information from the node
- */
- indexRelationDesc = node->iss_RelationDesc;
- indexScanDesc = node->iss_ScanDesc;
+ /* close the scan (no-op if we didn't start it) */
+ if (node->iss_ScanDesc != NULL)
+ {
+ index_endscan(node->iss_ScanDesc);
+ node->iss_ScanDesc = NULL;
+ }
/*
* close the index relation (no-op if we didn't open it)
*/
- if (indexScanDesc)
- index_endscan(indexScanDesc);
- if (indexRelationDesc)
- index_close(indexRelationDesc, NoLock);
+ if (node->iss_RelationDesc != NULL)
+ {
+ index_close(node->iss_RelationDesc, NoLock);
+ node->iss_RelationDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index e6f1fb1562..eb7b6e52be 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -534,6 +534,7 @@ void
ExecEndLimit(LimitState *node)
{
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 41754ddfea..0d3489195b 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -387,6 +387,7 @@ ExecEndLockRows(LockRowsState *node)
/* We may have shut down EPQ already, but no harm in another call */
EvalPlanQualEnd(&node->lr_epqstate);
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
diff --git a/src/backend/executor/nodeMaterial.c b/src/backend/executor/nodeMaterial.c
index 22e1787fbd..883e3f3933 100644
--- a/src/backend/executor/nodeMaterial.c
+++ b/src/backend/executor/nodeMaterial.c
@@ -243,13 +243,16 @@ ExecEndMaterial(MaterialState *node)
* Release tuplestore resources
*/
if (node->tuplestorestate != NULL)
+ {
tuplestore_end(node->tuplestorestate);
- node->tuplestorestate = NULL;
+ node->tuplestorestate = NULL;
+ }
/*
* shut down the subplan
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeMemoize.c b/src/backend/executor/nodeMemoize.c
index df8e3fff08..690dee1daa 100644
--- a/src/backend/executor/nodeMemoize.c
+++ b/src/backend/executor/nodeMemoize.c
@@ -1128,12 +1128,17 @@ ExecEndMemoize(MemoizeState *node)
}
/* Remove the cache context */
- MemoryContextDelete(node->tableContext);
+ if (node->tableContext != NULL)
+ {
+ MemoryContextDelete(node->tableContext);
+ node->tableContext = NULL;
+ }
/*
* shut down the subplan
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
void
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index e1b9b984a7..3236444cf1 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -333,7 +333,10 @@ ExecEndMergeAppend(MergeAppendState *node)
* shut down each of the subscans
*/
for (i = 0; i < nplans; i++)
+ {
ExecEndNode(mergeplans[i]);
+ mergeplans[i] = NULL;
+ }
}
void
diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 29c54fcd75..926e631d88 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -1647,7 +1647,9 @@ ExecEndMergeJoin(MergeJoinState *node)
* shut down the subplans
*/
ExecEndNode(innerPlanState(node));
+ innerPlanState(node) = NULL;
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
MJ1_printf("ExecEndMergeJoin: %s\n",
"node processing ended");
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 4913e49319..062a780f29 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -4721,7 +4721,9 @@ ExecEndModifyTable(ModifyTableState *node)
for (j = 0; j < resultRelInfo->ri_NumSlotsInitialized; j++)
{
ExecDropSingleTupleTableSlot(resultRelInfo->ri_Slots[j]);
+ resultRelInfo->ri_Slots[j] = NULL;
ExecDropSingleTupleTableSlot(resultRelInfo->ri_PlanSlots[j]);
+ resultRelInfo->ri_PlanSlots[j] = NULL;
}
}
@@ -4729,12 +4731,16 @@ ExecEndModifyTable(ModifyTableState *node)
* Close all the partitioned tables, leaf partitions, and their indices
* and release the slot used for tuple routing, if set.
*/
- if (node->mt_partition_tuple_routing)
+ if (node->mt_partition_tuple_routing != NULL)
{
ExecCleanupTupleRouting(node, node->mt_partition_tuple_routing);
+ node->mt_partition_tuple_routing = NULL;
- if (node->mt_root_tuple_slot)
+ if (node->mt_root_tuple_slot != NULL)
+ {
ExecDropSingleTupleTableSlot(node->mt_root_tuple_slot);
+ node->mt_root_tuple_slot = NULL;
+ }
}
/*
@@ -4746,6 +4752,7 @@ ExecEndModifyTable(ModifyTableState *node)
* shut down subplan
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
void
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index 7f4bf6c4db..01f3d56a3b 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -367,7 +367,9 @@ ExecEndNestLoop(NestLoopState *node)
* close down subplans
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
ExecEndNode(innerPlanState(node));
+ innerPlanState(node) = NULL;
NL1_printf("ExecEndNestLoop: %s\n",
"node processing ended");
diff --git a/src/backend/executor/nodeProjectSet.c b/src/backend/executor/nodeProjectSet.c
index e483730015..ca9a5e2ed2 100644
--- a/src/backend/executor/nodeProjectSet.c
+++ b/src/backend/executor/nodeProjectSet.c
@@ -331,6 +331,7 @@ ExecEndProjectSet(ProjectSetState *node)
* shut down subplans
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
void
diff --git a/src/backend/executor/nodeRecursiveunion.c b/src/backend/executor/nodeRecursiveunion.c
index c7f8a19fa4..7680142c7b 100644
--- a/src/backend/executor/nodeRecursiveunion.c
+++ b/src/backend/executor/nodeRecursiveunion.c
@@ -272,20 +272,36 @@ void
ExecEndRecursiveUnion(RecursiveUnionState *node)
{
/* Release tuplestores */
- tuplestore_end(node->working_table);
- tuplestore_end(node->intermediate_table);
+ if (node->working_table != NULL)
+ {
+ tuplestore_end(node->working_table);
+ node->working_table = NULL;
+ }
+ if (node->intermediate_table != NULL)
+ {
+ tuplestore_end(node->intermediate_table);
+ node->intermediate_table = NULL;
+ }
/* free subsidiary stuff including hashtable */
- if (node->tempContext)
+ if (node->tempContext != NULL)
+ {
MemoryContextDelete(node->tempContext);
- if (node->tableContext)
+ node->tempContext = NULL;
+ }
+ if (node->tableContext != NULL)
+ {
MemoryContextDelete(node->tableContext);
+ node->tableContext = NULL;
+ }
/*
* close down subplans
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
ExecEndNode(innerPlanState(node));
+ innerPlanState(node) = NULL;
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeResult.c b/src/backend/executor/nodeResult.c
index 348361e7f4..e3cfc9b772 100644
--- a/src/backend/executor/nodeResult.c
+++ b/src/backend/executor/nodeResult.c
@@ -243,6 +243,7 @@ ExecEndResult(ResultState *node)
* shut down subplans
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
void
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index 714b076e64..6ab91001bc 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -181,14 +181,17 @@ ExecEndSampleScan(SampleScanState *node)
/*
* Tell sampling function that we finished the scan.
*/
- if (node->tsmroutine->EndSampleScan)
+ if (node->tsmroutine != NULL && node->tsmroutine->EndSampleScan)
node->tsmroutine->EndSampleScan(node);
/*
- * close heap scan
+ * close heap scan (no-op if we didn't start it)
*/
if (node->ss.ss_currentScanDesc)
+ {
table_endscan(node->ss.ss_currentScanDesc);
+ node->ss.ss_currentScanDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 7cb12a11c2..b052775e5b 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -183,18 +183,14 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
void
ExecEndSeqScan(SeqScanState *node)
{
- TableScanDesc scanDesc;
-
- /*
- * get information from node
- */
- scanDesc = node->ss.ss_currentScanDesc;
-
/*
- * close heap scan
+ * close heap scan (no-op if we didn't start it)
*/
- if (scanDesc != NULL)
- table_endscan(scanDesc);
+ if (node->ss.ss_currentScanDesc != NULL)
+ {
+ table_endscan(node->ss.ss_currentScanDesc);
+ node->ss.ss_currentScanDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index a8ac68b482..fe34b2134f 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -583,10 +583,14 @@ void
ExecEndSetOp(SetOpState *node)
{
/* free subsidiary stuff including hashtable */
- if (node->tableContext)
+ if (node->tableContext != NULL)
+ {
MemoryContextDelete(node->tableContext);
+ node->tableContext = NULL;
+ }
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index 3fc925d7b4..af852464d0 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -307,13 +307,16 @@ ExecEndSort(SortState *node)
* Release tuplesort resources
*/
if (node->tuplesortstate != NULL)
+ {
tuplesort_end((Tuplesortstate *) node->tuplesortstate);
- node->tuplesortstate = NULL;
+ node->tuplesortstate = NULL;
+ }
/*
* shut down the subplan
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
SO1_printf("ExecEndSort: %s\n",
"sort node shutdown");
diff --git a/src/backend/executor/nodeSubqueryscan.c b/src/backend/executor/nodeSubqueryscan.c
index 782097eaf2..0b2612183a 100644
--- a/src/backend/executor/nodeSubqueryscan.c
+++ b/src/backend/executor/nodeSubqueryscan.c
@@ -171,6 +171,7 @@ ExecEndSubqueryScan(SubqueryScanState *node)
* close down subquery
*/
ExecEndNode(node->subplan);
+ node->subplan = NULL;
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeTableFuncscan.c b/src/backend/executor/nodeTableFuncscan.c
index f483221bb8..778d25d511 100644
--- a/src/backend/executor/nodeTableFuncscan.c
+++ b/src/backend/executor/nodeTableFuncscan.c
@@ -223,8 +223,10 @@ ExecEndTableFuncScan(TableFuncScanState *node)
* Release tuplestore resources
*/
if (node->tupstore != NULL)
+ {
tuplestore_end(node->tupstore);
- node->tupstore = NULL;
+ node->tupstore = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 9aa7683d7e..702ee884d2 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -326,10 +326,14 @@ ExecReScanTidRangeScan(TidRangeScanState *node)
void
ExecEndTidRangeScan(TidRangeScanState *node)
{
- TableScanDesc scan = node->ss.ss_currentScanDesc;
-
- if (scan != NULL)
- table_endscan(scan);
+ /*
+ * close heap scan (no-op if we didn't start it)
+ */
+ if (node->ss.ss_currentScanDesc != NULL)
+ {
+ table_endscan(node->ss.ss_currentScanDesc);
+ node->ss.ss_currentScanDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index 864a9013b6..f375951699 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -469,8 +469,14 @@ ExecReScanTidScan(TidScanState *node)
void
ExecEndTidScan(TidScanState *node)
{
- if (node->ss.ss_currentScanDesc)
+ /*
+ * close heap scan (no-op if we didn't start it)
+ */
+ if (node->ss.ss_currentScanDesc != NULL)
+ {
table_endscan(node->ss.ss_currentScanDesc);
+ node->ss.ss_currentScanDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeUnique.c b/src/backend/executor/nodeUnique.c
index a125923e93..b82d0e9ad5 100644
--- a/src/backend/executor/nodeUnique.c
+++ b/src/backend/executor/nodeUnique.c
@@ -168,6 +168,7 @@ void
ExecEndUnique(UniqueState *node)
{
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 3221fa1522..561d7e731d 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -1351,11 +1351,14 @@ release_partition(WindowAggState *winstate)
* any aggregate temp data). We don't rely on retail pfree because some
* aggregates might have allocated data we don't have direct pointers to.
*/
- MemoryContextReset(winstate->partcontext);
- MemoryContextReset(winstate->aggcontext);
+ if (winstate->partcontext != NULL)
+ MemoryContextReset(winstate->partcontext);
+ if (winstate->aggcontext != NULL)
+ MemoryContextReset(winstate->aggcontext);
for (i = 0; i < winstate->numaggs; i++)
{
- if (winstate->peragg[i].aggcontext != winstate->aggcontext)
+ if (winstate->peragg[i].aggcontext != NULL &&
+ winstate->peragg[i].aggcontext != winstate->aggcontext)
MemoryContextReset(winstate->peragg[i].aggcontext);
}
@@ -2681,24 +2684,40 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags)
void
ExecEndWindowAgg(WindowAggState *node)
{
- PlanState *outerPlan;
int i;
release_partition(node);
for (i = 0; i < node->numaggs; i++)
{
- if (node->peragg[i].aggcontext != node->aggcontext)
+ if (node->peragg[i].aggcontext != NULL &&
+ node->peragg[i].aggcontext != node->aggcontext)
MemoryContextDelete(node->peragg[i].aggcontext);
}
- MemoryContextDelete(node->partcontext);
- MemoryContextDelete(node->aggcontext);
+ if (node->partcontext != NULL)
+ {
+ MemoryContextDelete(node->partcontext);
+ node->partcontext = NULL;
+ }
+ if (node->aggcontext != NULL)
+ {
+ MemoryContextDelete(node->aggcontext);
+ node->aggcontext = NULL;
+ }
- pfree(node->perfunc);
- pfree(node->peragg);
+ if (node->perfunc != NULL)
+ {
+ pfree(node->perfunc);
+ node->perfunc = NULL;
+ }
+ if (node->peragg != NULL)
+ {
+ pfree(node->peragg);
+ node->peragg = NULL;
+ }
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
+ ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
/* -----------------
--
2.43.0
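The ExecEnd* changes above all follow one defensive pattern: release each
resource only if it was actually set up, and clear the pointer afterwards so
that cleanup stays safe on a partially initialized node. Here is a minimal
sketch of that pattern, assuming the usual executor headers; FooState and its
workcontext field are hypothetical stand-ins, not part of the patch:

	void
	ExecEndFoo(FooState *node)
	{
		/* close the scan only if it was started */
		if (node->ss.ss_currentScanDesc != NULL)
		{
			table_endscan(node->ss.ss_currentScanDesc);
			node->ss.ss_currentScanDesc = NULL;
		}

		/* delete the node's private memory context only if it was created */
		if (node->workcontext != NULL)
		{
			MemoryContextDelete(node->workcontext);
			node->workcontext = NULL;
		}

		/* recurse into the child, then clear the link so a repeat call is a no-op */
		ExecEndNode(outerPlanState(node));
		outerPlanState(node) = NULL;
	}

This is what lets ExecutorEnd() clean up a planstate tree whose
initialization was abandoned partway through.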
v50-0003-Assert-that-relations-needing-their-permissions-.patch
From 1657389c3c44b1765919773ad9cbaaab8a72fa64 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Mon, 25 Sep 2023 11:52:02 +0900
Subject: [PATCH v50 3/6] Assert that relations needing their permissions
checked are locked
---
src/backend/executor/execMain.c | 13 +++++++++++++
src/backend/storage/lmgr/lmgr.c | 1 +
src/backend/utils/cache/lsyscache.c | 1 -
3 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 2d5234dee3..5e1b8a42e8 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -52,6 +52,7 @@
#include "miscadmin.h"
#include "parser/parse_relation.h"
#include "rewrite/rewriteHandler.h"
+#include "storage/lmgr.h"
#include "tcop/utility.h"
#include "utils/acl.h"
#include "utils/backend_status.h"
@@ -602,6 +603,18 @@ ExecCheckPermissions(List *rangeTable, List *rteperminfos,
(rte->rtekind == RTE_SUBQUERY &&
rte->relkind == RELKIND_VIEW));
+ /*
+ * Relations whose permissions need to be checked must already
+ * have been locked by the parser or by GetCachedPlan() if a
+ * cached plan is being executed.
+ *
+ * XXX Maybe we should skip calling ExecCheckPermissions from
+ * InitPlan in a parallel worker.
+ */
+ Assert(IsParallelWorker() ||
+ CheckRelationOidLockedByMe(rte->relid, AccessShareLock,
+ true));
+
(void) getRTEPermissionInfo(rteperminfos, rte);
/* Many-to-one mapping not allowed */
Assert(!bms_is_member(rte->perminfoindex, indexset));
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index 094522acb4..a1c89f5d72 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -26,6 +26,7 @@
#include "storage/procarray.h"
#include "storage/sinvaladt.h"
#include "utils/inval.h"
+#include "utils/lsyscache.h"
/*
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 48a280d089..f647821382 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -2113,7 +2113,6 @@ get_rel_relam(Oid relid)
return result;
}
-
/* ---------- TRANSFORM CACHE ---------- */
Oid
--
2.43.0
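As an aside, the new Assert can be applied over a whole range table in one
pass; here is a minimal sketch with a hypothetical helper name, calling the
same functions used in the hunk above:

	/*
	 * Assert that every relation whose permissions will be checked is
	 * already locked (illustrative only).
	 */
	static void
	AssertPermCheckedRelsLocked(List *rangeTable)
	{
		ListCell   *lc;

		foreach(lc, rangeTable)
		{
			RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);

			if (rte->perminfoindex == 0)
				continue;		/* no permissions to check for this RTE */

			Assert(IsParallelWorker() ||
				   CheckRelationOidLockedByMe(rte->relid, AccessShareLock,
											  true));
		}
	}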
v50-0005-Don-t-lock-child-tables-in-GetCachedPlan.patch
From a1aef6d727c0a6c6f3305a623f6337e35d007be5 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Wed, 6 Sep 2023 17:54:15 +0900
Subject: [PATCH v50 5/6] Don't lock child tables in GetCachedPlan()
Currently, GetCachedPlan() takes a lock on all relations contained in
a cached plan before returning it as a valid plan to its callers for
execution. One disadvantage is that if the plan contains partitions
that are prunable with conditions involving EXTERN parameters and
other stable expressions (known as "initial pruning"), many of them
would be locked unnecessarily, because only those that survive
initial pruning need to have been locked. Locking all partitions this
way causes significant delay when there are many partitions. Note
that initial pruning occurs during executor's initialization of the
plan, that is, ExecInitNode().
Previous commits have made all the necessary adjustments to make the
executor lock child tables, to detect invalidation of the CachedPlan
resulting from that, and to retry the execution with a new CachedPlan.
So, this commit simply removes the code in plancache.c that does the
"for execution" locking, aka AcquireExecutorLocks().
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
src/backend/executor/execMain.c | 2 +-
src/backend/utils/cache/plancache.c | 154 +++++++----------
src/test/modules/delay_execution/Makefile | 3 +-
.../modules/delay_execution/delay_execution.c | 68 +++++++-
.../expected/cached-plan-replan.out | 158 ++++++++++++++++++
.../specs/cached-plan-replan.spec | 61 +++++++
6 files changed, 340 insertions(+), 106 deletions(-)
create mode 100644 src/test/modules/delay_execution/expected/cached-plan-replan.out
create mode 100644 src/test/modules/delay_execution/specs/cached-plan-replan.spec
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index cf059cb850..168ab553ac 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1508,7 +1508,7 @@ ExecGetAncestorResultRels(EState *estate, ResultRelInfo *resultRelInfo)
/*
* All ancestors up to the root target relation must have been
- * locked by the planner or AcquireExecutorLocks().
+ * locked by the planner or ExecLockAppendNonLeafPartitions().
*/
ancRel = table_open(ancOid, NoLock);
rInfo = makeNode(ResultRelInfo);
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 5af1a168ec..ad33d611f9 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -103,13 +103,13 @@ static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool GenericPlanIsValid(CachedPlan *cplan);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -815,8 +815,13 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
*
- * On a "true" return, we have acquired the locks needed to run the plan.
- * (We must do this for the "true" result to be race-condition-free.)
+ * If the plan includes child relations introduced by the planner, they
+ * wouldn't be locked yet. This is because AcquirePlannerLocks() only locks
+ * relations present in the original query's range table (before planner
+ * entry). Hence, the plan might become stale if child relations are modified
+ * concurrently. During the plan initialization, the executor must ensure the
+ * plan (CachedPlan) remains valid after locking each child table. If found
+ * invalid, the caller should be prompted to recreate the plan.
*/
static bool
CheckCachedPlan(CachedPlanSource *plansource)
@@ -830,60 +835,56 @@ CheckCachedPlan(CachedPlanSource *plansource)
if (!plan)
return false;
- Assert(plan->magic == CACHEDPLAN_MAGIC);
- /* Generic plans are never one-shot */
- Assert(!plan->is_oneshot);
+ if (GenericPlanIsValid(plan))
+ return true;
/*
- * If plan isn't valid for current role, we can't use it.
+ * Plan has been invalidated, so unlink it from the parent and release it.
*/
- if (plan->is_valid && plan->dependsOnRole &&
- plan->planRoleId != GetUserId())
- plan->is_valid = false;
+ ReleaseGenericPlan(plansource);
- /*
- * If it appears valid, acquire locks and recheck; this is much the same
- * logic as in RevalidateCachedQuery, but for a plan.
- */
- if (plan->is_valid)
+ return false;
+}
+
+/*
+ * GenericPlanIsValid
+ * Is a generic plan still valid?
+ *
+ * It may have gone stale due to concurrent schema modifications of relations
+ * mentioned in the plan or a couple of other things mentioned below.
+ */
+static bool
+GenericPlanIsValid(CachedPlan *cplan)
+{
+ Assert(cplan != NULL);
+ Assert(cplan->magic == CACHEDPLAN_MAGIC);
+ /* Generic plans are never one-shot */
+ Assert(!cplan->is_oneshot);
+
+ if (cplan->is_valid)
{
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
- Assert(plan->refcount > 0);
-
- AcquireExecutorLocks(plan->stmt_list, true);
+ Assert(cplan->refcount > 0);
/*
- * If plan was transient, check to see if TransactionXmin has
- * advanced, and if so invalidate it.
+ * If plan isn't valid for current role, we can't use it.
*/
- if (plan->is_valid &&
- TransactionIdIsValid(plan->saved_xmin) &&
- !TransactionIdEquals(plan->saved_xmin, TransactionXmin))
- plan->is_valid = false;
+ if (cplan->dependsOnRole && cplan->planRoleId != GetUserId())
+ cplan->is_valid = false;
/*
- * By now, if any invalidation has happened, the inval callback
- * functions will have marked the plan invalid.
+ * If plan was transient, check to see if TransactionXmin has
+ * advanced, and if so invalidate it.
*/
- if (plan->is_valid)
- {
- /* Successfully revalidated and locked the query. */
- return true;
- }
-
- /* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ if (TransactionIdIsValid(cplan->saved_xmin) &&
+ !TransactionIdEquals(cplan->saved_xmin, TransactionXmin))
+ cplan->is_valid = false;
}
- /*
- * Plan has been invalidated, so unlink it from the parent and release it.
- */
- ReleaseGenericPlan(plansource);
-
- return false;
+ return cplan->is_valid;
}
/*
@@ -1153,8 +1154,16 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
- * On return, the plan is valid and we have sufficient locks to begin
- * execution.
+ * Typically, the plan returned by this function is valid. However, a caveat
+ * arises with inheritance/partition child tables. These aren't locked by
+ * this function, as we only lock tables directly mentioned in the original
+ * query here. The task of locking these child tables falls to the executor
+ * during plan tree setup. If acquiring these locks invalidates the plan, the
+ * executor should inform the caller to regenerate the plan by invoking this
+ * function again. The reason for this deferred child table locking mechanism
+ * is efficiency: not all of them may need to be locked. Some could be pruned during
+ * executor initialization, especially if their corresponding plan nodes
+ * facilitate partition pruning.
*
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
@@ -1189,7 +1198,10 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
{
if (CheckCachedPlan(plansource))
{
- /* We want a generic plan, and we already have a valid one */
+ /*
+ * We want a generic plan, and we already have a valid one, though
+ * see the header comment.
+ */
plan = plansource->gplan;
Assert(plan->magic == CACHEDPLAN_MAGIC);
}
@@ -1387,8 +1399,8 @@ CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
}
/*
- * Reject if AcquireExecutorLocks would have anything to do. This is
- * probably unnecessary given the previous check, but let's be safe.
+ * Reject if the executor would need to take additional locks, that is, in
+ * addition to those taken by AcquirePlannerLocks() on a given query.
*/
foreach(lc, plan->stmt_list)
{
@@ -1764,58 +1776,6 @@ QueryListGetPrimaryStmt(List *stmts)
return NULL;
}
-/*
- * AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
- */
-static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
-{
- ListCell *lc1;
-
- foreach(lc1, stmt_list)
- {
- PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
-
- if (plannedstmt->commandType == CMD_UTILITY)
- {
- /*
- * Ignore utility statements, except those (such as EXPLAIN) that
- * contain a parsed-but-not-planned query. Note: it's okay to use
- * ScanQueryForLocks, even though the query hasn't been through
- * rule rewriting, because rewriting doesn't change the query
- * representation.
- */
- Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
-
- if (query)
- ScanQueryForLocks(query, acquire);
- continue;
- }
-
- foreach(lc2, plannedstmt->rtable)
- {
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
-
- if (!(rte->rtekind == RTE_RELATION ||
- (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
- continue;
-
- /*
- * Acquire the appropriate type of lock on each relation OID. Note
- * that we don't actually try to open the rel, and hence will not
- * fail if it's been dropped entirely --- we'll just transiently
- * acquire a non-conflicting lock.
- */
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
- }
- }
-}
-
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/test/modules/delay_execution/Makefile b/src/test/modules/delay_execution/Makefile
index 70f24e846d..2fca84d027 100644
--- a/src/test/modules/delay_execution/Makefile
+++ b/src/test/modules/delay_execution/Makefile
@@ -8,7 +8,8 @@ OBJS = \
delay_execution.o
ISOLATION = partition-addition \
- partition-removal-1
+ partition-removal-1 \
+ cached-plan-replan
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/src/test/modules/delay_execution/delay_execution.c b/src/test/modules/delay_execution/delay_execution.c
index 155c8a8d55..8bb6f0319a 100644
--- a/src/test/modules/delay_execution/delay_execution.c
+++ b/src/test/modules/delay_execution/delay_execution.c
@@ -1,14 +1,18 @@
/*-------------------------------------------------------------------------
*
* delay_execution.c
- * Test module to allow delay between parsing and execution of a query.
+ * Test module to introduce delay at various points during execution of a
+ * query to test that execution proceeds safely in light of concurrent
+ * changes.
*
* The delay is implemented by taking and immediately releasing a specified
* advisory lock. If another process has previously taken that lock, the
* current process will be blocked until the lock is released; otherwise,
* there's no effect. This allows an isolationtester script to reliably
- * test behaviors where some specified action happens in another backend
- * between parsing and execution of any desired query.
+ * test behaviors where some specified action happens in another backend in
+ * a couple of cases: 1) between parsing and execution of any desired query
+ * when using the planner_hook, 2) between RevalidateCachedQuery() and
+ * ExecutorStart() when using the ExecutorStart_hook.
*
* Copyright (c) 2020-2024, PostgreSQL Global Development Group
*
@@ -22,6 +26,7 @@
#include <limits.h>
+#include "executor/executor.h"
#include "optimizer/planner.h"
#include "utils/builtins.h"
#include "utils/guc.h"
@@ -32,9 +37,11 @@ PG_MODULE_MAGIC;
/* GUC: advisory lock ID to use. Zero disables the feature. */
static int post_planning_lock_id = 0;
+static int executor_start_lock_id = 0;
-/* Save previous planner hook user to be a good citizen */
+/* Save previous hook users to be a good citizen */
static planner_hook_type prev_planner_hook = NULL;
+static ExecutorStart_hook_type prev_ExecutorStart_hook = NULL;
/* planner_hook function to provide the desired delay */
@@ -70,11 +77,46 @@ delay_execution_planner(Query *parse, const char *query_string,
return result;
}
+/* ExecutorStart_hook function to provide the desired delay */
+static bool
+delay_execution_ExecutorStart(QueryDesc *queryDesc, CachedPlan *cplan,
+ int eflags)
+{
+ bool plan_valid;
+
+ /* If enabled, delay by taking and releasing the specified lock */
+ if (executor_start_lock_id != 0)
+ {
+ DirectFunctionCall1(pg_advisory_lock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+ DirectFunctionCall1(pg_advisory_unlock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+
+ /*
+ * Ensure that we notice any pending invalidations, since the advisory
+ * lock functions don't do this.
+ */
+ AcceptInvalidationMessages();
+ }
+
+ /* Now start the executor, possibly via a previous hook user */
+ if (prev_ExecutorStart_hook)
+ plan_valid = prev_ExecutorStart_hook(queryDesc, cplan, eflags);
+ else
+ plan_valid = standard_ExecutorStart(queryDesc, cplan, eflags);
+
+ if (executor_start_lock_id != 0)
+ elog(NOTICE, "Finished ExecutorStart(): CachedPlan is %s",
+ plan_valid ? "valid" : "not valid");
+
+ return plan_valid;
+}
+
/* Module load function */
void
_PG_init(void)
{
- /* Set up the GUC to control which lock is used */
+ /* Set up GUCs to control which lock is used */
DefineCustomIntVariable("delay_execution.post_planning_lock_id",
"Sets the advisory lock ID to be locked/unlocked after planning.",
"Zero disables the delay.",
@@ -86,10 +128,22 @@ _PG_init(void)
NULL,
NULL,
NULL);
-
+ DefineCustomIntVariable("delay_execution.executor_start_lock_id",
+ "Sets the advisory lock ID to be locked/unlocked before starting execution.",
+ "Zero disables the delay.",
+ &executor_start_lock_id,
+ 0,
+ 0, INT_MAX,
+ PGC_USERSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
MarkGUCPrefixReserved("delay_execution");
- /* Install our hook */
+ /* Install our hooks. */
prev_planner_hook = planner_hook;
planner_hook = delay_execution_planner;
+ prev_ExecutorStart_hook = ExecutorStart_hook;
+ ExecutorStart_hook = delay_execution_ExecutorStart;
}
diff --git a/src/test/modules/delay_execution/expected/cached-plan-replan.out b/src/test/modules/delay_execution/expected/cached-plan-replan.out
new file mode 100644
index 0000000000..122d81f2ee
--- /dev/null
+++ b/src/test/modules/delay_execution/expected/cached-plan-replan.out
@@ -0,0 +1,158 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1prep s2lock s1exec s2dropi s2unlock
+step s1prep: SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1 FOR UPDATE;
+ EXPLAIN (COSTS OFF) EXECUTE q (1);
+QUERY PLAN
+----------------------------------------------
+LockRows
+ -> Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo11 foo_1
+ Recheck Cond: (a = $1)
+ -> Bitmap Index Scan on foo11_a
+ Index Cond: (a = $1)
+(7 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+-----------------------------------
+LockRows
+ -> Append
+ Subplans Removed: 1
+ -> Seq Scan on foo11 foo_1
+ Filter: (a = $1)
+(5 rows)
+
+
+starting permutation: s1prep2 s2lock s1exec2 s2dropi s2unlock
+step s1prep2: SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = 1;
+ EXPLAIN (COSTS OFF) EXECUTE q2;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+----------------------------------
+Bitmap Heap Scan on foo11 foo
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on foo11_a
+ Index Cond: (a = 1)
+(4 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec2: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec2: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------
+Seq Scan on foo11 foo
+ Filter: (a = 1)
+(2 rows)
+
+
+starting permutation: s1prep3 s2lock s1exec3 s2dropi s2unlock
+step s1prep3: SET plan_cache_mode = force_generic_plan;
+ SET enable_partitionwise_aggregate = on;
+ SET enable_partitionwise_join = on;
+ PREPARE q3 AS SELECT t1.a, count(t2.b) FROM foo t1, foo t2 WHERE t1.a = t2.a GROUP BY 1;
+ EXPLAIN (COSTS OFF) EXECUTE q3;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+------------------------------------------------------------
+Append
+ -> GroupAggregate
+ Group Key: t1.a
+ -> Merge Join
+ Merge Cond: (t1.a = t2.a)
+ -> Index Only Scan using foo11_a on foo11 t1
+ -> Materialize
+ -> Index Scan using foo11_a on foo11 t2
+ -> GroupAggregate
+ Group Key: t1_1.a
+ -> Merge Join
+ Merge Cond: (t1_1.a = t2_1.a)
+ -> Sort
+ Sort Key: t1_1.a
+ -> Seq Scan on foo2 t1_1
+ -> Sort
+ Sort Key: t2_1.a
+ -> Seq Scan on foo2 t2_1
+(18 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec3: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; <waiting ...>
+step s2dropi: DROP INDEX foo11_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec3: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------------------------------
+Append
+ -> GroupAggregate
+ Group Key: t1.a
+ -> Merge Join
+ Merge Cond: (t1.a = t2.a)
+ -> Sort
+ Sort Key: t1.a
+ -> Seq Scan on foo11 t1
+ -> Sort
+ Sort Key: t2.a
+ -> Seq Scan on foo11 t2
+ -> GroupAggregate
+ Group Key: t1_1.a
+ -> Merge Join
+ Merge Cond: (t1_1.a = t2_1.a)
+ -> Sort
+ Sort Key: t1_1.a
+ -> Seq Scan on foo2 t1_1
+ -> Sort
+ Sort Key: t2_1.a
+ -> Seq Scan on foo2 t2_1
+(21 rows)
+
diff --git a/src/test/modules/delay_execution/specs/cached-plan-replan.spec b/src/test/modules/delay_execution/specs/cached-plan-replan.spec
new file mode 100644
index 0000000000..2d0607b176
--- /dev/null
+++ b/src/test/modules/delay_execution/specs/cached-plan-replan.spec
@@ -0,0 +1,61 @@
+# Test to check that invalidation of cached generic plans during ExecutorStart
+# correctly triggers replanning and re-execution.
+
+setup
+{
+ CREATE TABLE foo (a int, b text) PARTITION BY LIST(a);
+ CREATE TABLE foo1 PARTITION OF foo FOR VALUES IN (1) PARTITION BY LIST (a);
+ CREATE TABLE foo11 PARTITION OF foo1 FOR VALUES IN (1);
+ CREATE INDEX foo11_a ON foo11 (a);
+ CREATE TABLE foo2 PARTITION OF foo FOR VALUES IN (2);
+ CREATE VIEW foov AS SELECT * FROM foo;
+}
+
+teardown
+{
+ DROP VIEW foov;
+ DROP TABLE foo;
+}
+
+session "s1"
+# Append with run-time pruning
+step "s1prep" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1 FOR UPDATE;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+
+# no Append case (only one partition selected by the planner)
+step "s1prep2" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = 1;
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+
+# Append with partition-wise aggregate and join plans as child subplans
+step "s1prep3" { SET plan_cache_mode = force_generic_plan;
+ SET enable_partitionwise_aggregate = on;
+ SET enable_partitionwise_join = on;
+ PREPARE q3 AS SELECT t1.a, count(t2.b) FROM foo t1, foo t2 WHERE t1.a = t2.a GROUP BY 1;
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+
+# Executes a generic plan
+step "s1exec" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+step "s1exec2" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+step "s1exec3" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+
+session "s2"
+step "s2lock" { SELECT pg_advisory_lock(12345); }
+step "s2unlock" { SELECT pg_advisory_unlock(12345); }
+step "s2dropi" { DROP INDEX foo11_a; }
+
+# While "s1exec", etc. wait to acquire the advisory lock, "s2drop" is able to
+# drop the index being used in the cached plan. When "s1exec" is then
+# unblocked and initializes the cached plan for execution, it detects the
+# concurrent index drop and causes the cached plan to be discarded and
+# recreated without the index.
+permutation "s1prep" "s2lock" "s1exec" "s2dropi" "s2unlock"
+permutation "s1prep2" "s2lock" "s1exec2" "s2dropi" "s2unlock"
+permutation "s1prep3" "s2lock" "s1exec3" "s2dropi" "s2unlock"
--
2.43.0
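Seen from a call site, the retry protocol that replaces
AcquireExecutorLocks() reduces to the loop sketched below. The surrounding
declarations are hypothetical; GetCachedPlan(), TryExecutorStart(), and
ReleaseCachedPlan() behave as described in the patches:

	CachedPlan *cplan;
	QueryDesc  *queryDesc;

	for (;;)
	{
		/* Locks only the relations named in the original query */
		cplan = GetCachedPlan(plansource, params, CurrentResourceOwner, NULL);

		queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, cplan->stmt_list),
									query_string, GetActiveSnapshot(),
									InvalidSnapshot, dest, params, NULL, 0);

		/*
		 * Executor startup locks the child tables that survive initial
		 * pruning; if that invalidates the CachedPlan, TryExecutorStart()
		 * frees the partially initialized execution state and returns false.
		 */
		if (TryExecutorStart(queryDesc, cplan, 0))
			break;				/* plan is valid; go on to ExecutorRun() */

		/* The plan went stale while locking child tables: replan */
		ReleaseCachedPlan(cplan, CurrentResourceOwner);
	}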
v50-0004-Preparations-to-allow-executor-to-take-locks-in-.patch
From cef620d80a764fa0787d074c3cc4dacba73ed190 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Wed, 6 Sep 2023 17:53:46 +0900
Subject: [PATCH v50 4/6] Preparations to allow executor to take locks in some
cases
This does two things on the executor side:
* Make the executor take locks on the child tables if the plan
comes from a CachedPlan. To determine that a given range table
relation is a child table, ExecGetRangeTableRelation() now
examines RangeTblEntry.inFromCl. To that end, the planner now
sets inFromCl to false in the child tables' RTEs. Also, the
to deal with the cases where an unlocked child table might have
been concurrently dropped, ExecGetRangeTableRelation() is now
made to use try_table_open() replacing the existing table_open().
* Add checks at various points during the executor's initialization
of the plan tree to determine whether the originating CachedPlan
has become invalid as a result of taking locks on the relations
referenced in the plan. This includes addding the check after
every call to ExecOpenScanRelation() / ExecGetRangeTableRelation()
and to ExecInitNode(), including the recursive ones to initialize
child nodes.
If a given ExecInit*() function or any function called by it detects
that the plan has become invalid, it should return immediately even
though the PlanState node it's building may be only partially valid.
That is crucial for two reasons depending on where the check is:
* The checks following ExecOpenScanRelation() /
ExecGetRangeTableRelation() may find that the relation being
opened has been dropped concurrently or that the plan has become
invalid. In this case, some operations in the code that follows
may no longer be safe to do. For example, it might try to
dereference a NULL pointer in the case where the relation was
dropped.
* For the checks following ExecInitNode(), the returned child
PlanState node might be only partially valid. The code that
follows may misbehave if it depends on inspecting the child
PlanState. Note that this commit adds the check following all
calls of ExecInitNode() that exist in the code base, even at
sites where there is no code that might misbehave today, because
it might misbehave in the future. It seems like a good idea to
put the guards in place today rather than in the future when the
need arises.
To pass the CachedPlan that the executor will use for these checks,
ExecutorStart() (and ExecutorStart_hook) now gets a new parameter
CachedPlan *cplan.
Changes on the side of the callers of ExecutorStart():
ExecutorStart() (and ExecutorStart_hook()) now return a Boolean
telling the caller if the plan initialization failed. When it
returns false due to the CachedPlan becoming invalid, the execution
should be reattempted using a fresh CachedPlan. Actually, a new
function TryExecutorStart() is added for use by call sites that
supply the PlannedStmt to execute from a CachedPlan; it takes
care of cleaning up the partially initialized execution state,
including the planstate tree.
For the replan loop in that context, it makes more sense to have
ExecutorStart() either in the same scope or closer to where
GetCachedPlan() is invoked. So this commit modifies the following
sites:
* ExplainOnePlan() now returns a Boolean to tell the caller that
TryExecutorStart() failed, so the caller should retry with a new
plan.
* The ExecutorStart() call in _SPI_pquery() is moved to its caller
_SPI_execute_plan() and replaced by TryExecutorStart().
* The ExecutorStart() call in PortalRunMulti() is moved to
PortalStart() and replaced by TryExecutorStart(). This requires a
new List field in PortalData to store the QueryDescs created in
PortalStart() and a new memory context for those. One unintended
consequence is that CommandCounterIncrement() between queries in
the PORTAL_MULTI_QUERY case is now done in the loop in PortalStart()
and not in PortalRunMulti(). That still works because the Snapshot
registered in QueryDesc/EState is updated to account for the CCI().
This commit also adds a new flag to EState called es_canceled that
complements es_finished to denote the new scenario where
ExecutorStart() returns with a partially setup planstate tree. Also,
to reset the AFTER trigger state that would have been set up in the
ExecutorStart(), this adds a new function AfterTriggerCancelQuery()
which is called from ExecutorEnd() (not ExecutorFinish()) when
es_canceled is true.
Note that this commit by itself doesn't make any functional change,
because ExecutorStart() currently always returns true. The changes
to make it check if the CachedPlan has become invalid will be
added in the upcoming patches.
Reviewed-by: Robert Haas
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
contrib/auto_explain/auto_explain.c | 15 +-
.../pg_stat_statements/pg_stat_statements.c | 15 +-
contrib/postgres_fdw/postgres_fdw.c | 10 +-
src/backend/commands/copyto.c | 5 +-
src/backend/commands/createas.c | 9 +-
src/backend/commands/explain.c | 42 ++-
src/backend/commands/extension.c | 6 +-
src/backend/commands/matview.c | 10 +-
src/backend/commands/portalcmds.c | 6 +-
src/backend/commands/prepare.c | 30 +-
src/backend/commands/trigger.c | 13 +
src/backend/executor/README | 36 ++-
src/backend/executor/execMain.c | 144 ++++++++--
src/backend/executor/execParallel.c | 7 +-
src/backend/executor/execPartition.c | 2 +
src/backend/executor/execProcnode.c | 7 +
src/backend/executor/execUtils.c | 98 ++++++-
src/backend/executor/functions.c | 8 +-
src/backend/executor/nodeAgg.c | 2 +
src/backend/executor/nodeAppend.c | 24 +-
src/backend/executor/nodeBitmapAnd.c | 2 +
src/backend/executor/nodeBitmapHeapscan.c | 4 +
src/backend/executor/nodeBitmapIndexscan.c | 6 +-
src/backend/executor/nodeBitmapOr.c | 2 +
src/backend/executor/nodeCustom.c | 2 +
src/backend/executor/nodeForeignscan.c | 4 +
src/backend/executor/nodeGather.c | 2 +
src/backend/executor/nodeGatherMerge.c | 2 +
src/backend/executor/nodeGroup.c | 2 +
src/backend/executor/nodeHash.c | 2 +
src/backend/executor/nodeHashjoin.c | 4 +
src/backend/executor/nodeIncrementalSort.c | 2 +
src/backend/executor/nodeIndexonlyscan.c | 6 +-
src/backend/executor/nodeIndexscan.c | 8 +-
src/backend/executor/nodeLimit.c | 2 +
src/backend/executor/nodeLockRows.c | 2 +
src/backend/executor/nodeMaterial.c | 2 +
src/backend/executor/nodeMemoize.c | 2 +
src/backend/executor/nodeMergeAppend.c | 18 +-
src/backend/executor/nodeMergejoin.c | 4 +
src/backend/executor/nodeModifyTable.c | 13 +
src/backend/executor/nodeNestloop.c | 4 +
src/backend/executor/nodeProjectSet.c | 2 +
src/backend/executor/nodeRecursiveunion.c | 4 +
src/backend/executor/nodeResult.c | 2 +
src/backend/executor/nodeSamplescan.c | 3 +
src/backend/executor/nodeSeqscan.c | 3 +
src/backend/executor/nodeSetOp.c | 2 +
src/backend/executor/nodeSort.c | 2 +
src/backend/executor/nodeSubqueryscan.c | 2 +
src/backend/executor/nodeTidrangescan.c | 2 +
src/backend/executor/nodeTidscan.c | 2 +
src/backend/executor/nodeUnique.c | 2 +
src/backend/executor/nodeWindowAgg.c | 2 +
src/backend/executor/spi.c | 46 ++-
src/backend/optimizer/util/inherit.c | 7 +
src/backend/parser/analyze.c | 7 +-
src/backend/tcop/postgres.c | 17 +-
src/backend/tcop/pquery.c | 263 ++++++++++--------
src/backend/utils/mmgr/portalmem.c | 9 +
src/include/commands/explain.h | 4 +-
src/include/commands/trigger.h | 1 +
src/include/executor/executor.h | 21 +-
src/include/nodes/execnodes.h | 5 +
src/include/nodes/parsenodes.h | 8 +-
src/include/tcop/pquery.h | 5 +-
src/include/utils/plancache.h | 14 +
src/include/utils/portal.h | 2 +
68 files changed, 804 insertions(+), 217 deletions(-)
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index 677c135f59..1b4b8ad8b6 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -78,7 +78,8 @@ static ExecutorRun_hook_type prev_ExecutorRun = NULL;
static ExecutorFinish_hook_type prev_ExecutorFinish = NULL;
static ExecutorEnd_hook_type prev_ExecutorEnd = NULL;
-static void explain_ExecutorStart(QueryDesc *queryDesc, int eflags);
+static bool explain_ExecutorStart(QueryDesc *queryDesc, CachedPlan *cplan,
+ int eflags);
static void explain_ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction,
uint64 count, bool execute_once);
@@ -258,9 +259,11 @@ _PG_init(void)
/*
* ExecutorStart hook: start up logging if needed
*/
-static void
-explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
+static bool
+explain_ExecutorStart(QueryDesc *queryDesc, CachedPlan *cplan, int eflags)
{
+ bool plan_valid;
+
/*
* At the beginning of each top-level statement, decide whether we'll
* sample this statement. If nested-statement explaining is enabled,
@@ -296,9 +299,9 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
}
if (prev_ExecutorStart)
- prev_ExecutorStart(queryDesc, eflags);
+ plan_valid = prev_ExecutorStart(queryDesc, cplan, eflags);
else
- standard_ExecutorStart(queryDesc, eflags);
+ plan_valid = standard_ExecutorStart(queryDesc, cplan, eflags);
if (auto_explain_enabled())
{
@@ -316,6 +319,8 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcxt);
}
}
+
+ return plan_valid;
}
/*
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 362d222f63..74d359f61b 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -329,7 +329,8 @@ static PlannedStmt *pgss_planner(Query *parse,
const char *query_string,
int cursorOptions,
ParamListInfo boundParams);
-static void pgss_ExecutorStart(QueryDesc *queryDesc, int eflags);
+static bool pgss_ExecutorStart(QueryDesc *queryDesc, CachedPlan *cplan,
+ int eflags);
static void pgss_ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction,
uint64 count, bool execute_once);
@@ -984,13 +985,15 @@ pgss_planner(Query *parse,
/*
* ExecutorStart hook: start up tracking if needed
*/
-static void
-pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
+static bool
+pgss_ExecutorStart(QueryDesc *queryDesc, CachedPlan *cplan, int eflags)
{
+ bool plan_valid;
+
if (prev_ExecutorStart)
- prev_ExecutorStart(queryDesc, eflags);
+ plan_valid = prev_ExecutorStart(queryDesc, cplan, eflags);
else
- standard_ExecutorStart(queryDesc, eflags);
+ plan_valid = standard_ExecutorStart(queryDesc, cplan, eflags);
/*
* If query has queryId zero, don't track it. This prevents double
@@ -1013,6 +1016,8 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcxt);
}
}
+
+ return plan_valid;
}
/*
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index fc65d81e21..e7080f8953 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2137,7 +2137,11 @@ postgresEndForeignModify(EState *estate,
{
PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
- /* If fmstate is NULL, we are in EXPLAIN; nothing to do */
+ /*
+ * fmstate could be NULL under two conditions: during an EXPLAIN
+ * operation, or if BeginForeignModify() hasn't been invoked.
+ * In either case, no action is required.
+ */
if (fmstate == NULL)
return;
@@ -2671,7 +2675,11 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
/* Get info about foreign table. */
rtindex = node->resultRelInfo->ri_RangeTableIndex;
if (fsplan->scan.scanrelid == 0)
+ {
dmstate->rel = ExecOpenScanRelation(estate, rtindex, eflags);
+ if (unlikely(dmstate->rel == NULL || !ExecPlanStillValid(estate)))
+ return;
+ }
else
dmstate->rel = node->ss.ss_currentRelation;
table = GetForeignTable(RelationGetRelid(dmstate->rel));
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index eb1d3d8fbb..e88c3a6760 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -561,8 +561,11 @@ BeginCopyTo(ParseState *pstate,
* Call ExecutorStart to prepare the plan for execution.
*
* ExecutorStart computes a result tupdesc for us
+ *
+ * Plan can't become invalid, because there's no CachedPlan.
*/
- ExecutorStart(cstate->queryDesc, 0);
+ if (!ExecutorStart(cstate->queryDesc, NULL, 0))
+ elog(ERROR, "unexpected failure running ExecutorStart()");
tupDesc = cstate->queryDesc->tupDesc;
}
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 0b629b1f79..720bc1c72f 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -328,8 +328,13 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, GetIntoRelEFlags(into));
+ /*
+ * call ExecutorStart to prepare the plan for execution
+ *
+ * Plan can't become invalid, because there's no CachedPlan.
+ */
+ if (!ExecutorStart(queryDesc, NULL, GetIntoRelEFlags(into)))
+ elog(ERROR, "unexpected failure running ExecutorStart()");
/* run the plan to completion */
ExecutorRun(queryDesc, ForwardScanDirection, 0, true);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 5771aabf40..d2de259062 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -506,10 +506,11 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
}
- /* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL),
- es->memory ? &mem_counters : NULL);
+ /* run it (if needed) and produce output; no CachedPlan, no replanning! */
+ if (!ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
+ &planduration, (es->buffers ? &bufusage : NULL),
+ es->memory ? &mem_counters : NULL))
+ elog(ERROR, "unexpected failure to finish ExplainOnePlan()");
}
/*
@@ -613,9 +614,13 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* This is exported because it's called back from prepare.c in the
* EXPLAIN EXECUTE case, and because an index advisor plugin would need
* to call it.
+ *
+ * Returns true if execution succeeds, false otherwise. The latter is only
+ * possible if cplan != NULL and it gets invalidated during ExecutorStart().
*/
-void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+bool
+ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage,
@@ -685,8 +690,16 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
if (into)
eflags |= GetIntoRelEFlags(into);
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, eflags);
+ /*
+ * Call TryExecutorStart to prepare the plan for execution. A cached plan
+ * may get invalidated during plan initialization.
+ */
+ if (!TryExecutorStart(queryDesc, cplan, eflags))
+ {
+ /* Clean up. */
+ PopActiveSnapshot();
+ return false;
+ }
/* Execute the plan for statistics if asked for */
if (es->analyze)
@@ -798,6 +811,8 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
es);
ExplainCloseGroup("Query", NULL, true, es);
+
+ return true;
}
/*
@@ -5194,6 +5209,17 @@ ExplainDummyGroup(const char *objtype, const char *labelname, ExplainState *es)
}
}
+/*
+ * Discard output buffer for a fresh restart.
+ */
+void
+ExplainResetOutput(ExplainState *es)
+{
+ Assert(es->str);
+ resetStringInfo(es->str);
+ ExplainBeginOutput(es);
+}
+
/*
* Emit the start-of-output boilerplate.
*
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 1643c8c69a..c3003ae1c6 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -802,7 +802,11 @@ execute_sql_string(const char *sql)
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
- ExecutorStart(qdesc, 0);
+ /*
+ * Plan can't become invalid, because there's no CachedPlan.
+ */
+ if (!ExecutorStart(qdesc, NULL, 0))
+ elog(ERROR, "unexpected failure running ExecutorStart()");
ExecutorRun(qdesc, ForwardScanDirection, 0, true);
ExecutorFinish(qdesc);
ExecutorEnd(qdesc);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 91f0fd6ea3..ac4dfa9a71 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -442,8 +442,14 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, 0);
+ /*
+ * call ExecutorStart to prepare the plan for execution
+ *
+ * Plan can't become invalid, because there's no CachedPlan, so a
+ * failure here is unexpected.
+ */
+ if (!ExecutorStart(queryDesc, NULL, 0))
+ elog(ERROR, "unexpected failure running ExecutorStart()");
/* run the plan */
ExecutorRun(queryDesc, ForwardScanDirection, 0, true);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 4f6acf6719..cea3e8eca4 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -142,9 +142,11 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
/*
* Start execution, inserting parameters if any.
+ *
+ * Plan can't become invalid here, because there's no CachedPlan.
*/
- PortalStart(portal, params, 0, GetActiveSnapshot());
-
+ if (!PortalStart(portal, params, 0, GetActiveSnapshot(), NULL))
+ elog(ERROR, "unexpected failure running PortalStart()");
Assert(portal->strategy == PORTAL_ONE_SELECT);
/*
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 07257d4db9..bf4917f22f 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -180,6 +180,7 @@ ExecuteQuery(ParseState *pstate,
paramLI = EvaluateParams(pstate, entry, stmt->params, estate);
}
+replan:
/* Create a new portal to run the query in */
portal = CreateNewPortal();
/* Don't display the portal in pg_cursors, it is for internal use only */
@@ -248,9 +249,16 @@ ExecuteQuery(ParseState *pstate,
}
/*
- * Run the portal as appropriate.
+ * Run the portal as appropriate. If the portal has a cached plan and
+ * it's found to be invalidated during the initialization of its plan
+ * trees, the plan must be regenerated.
*/
- PortalStart(portal, paramLI, eflags, GetActiveSnapshot());
+ if (!PortalStart(portal, paramLI, eflags, GetActiveSnapshot(), cplan))
+ {
+ Assert(cplan != NULL);
+ PortalDrop(portal, false);
+ goto replan;
+ }
(void) PortalRun(portal, count, false, true, dest, dest, qc);
@@ -571,7 +579,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
{
PreparedStatement *entry;
const char *query_string;
- CachedPlan *cplan;
+ CachedPlan *cplan = NULL;
List *plan_list;
ListCell *p;
ParamListInfo paramLI = NULL;
@@ -628,6 +636,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
+replan:
cplan = GetCachedPlan(entry->plansource, paramLI,
CurrentResourceOwner, queryEnv);
@@ -655,9 +664,18 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL),
- es->memory ? &mem_counters : NULL);
+ {
+ if (!ExplainOnePlan(pstmt, cplan, into, es, query_string, paramLI,
+ queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL),
+ es->memory ? &mem_counters : NULL))
+ {
+ Assert(cplan != NULL);
+ ExplainResetOutput(es);
+ ReleaseCachedPlan(cplan, CurrentResourceOwner);
+ goto replan;
+ }
+ }
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 170360edda..bb410d7313 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -5023,6 +5023,19 @@ AfterTriggerBeginQuery(void)
afterTriggers.query_depth++;
}
+/* ----------
+ * AfterTriggerCancelQuery()
+ *
+ * Called from ExecutorEnd() if the query execution was canceled.
+ * ----------
+ */
+void
+AfterTriggerCancelQuery(void)
+{
+ /* Set to a value denoting that no query is active. */
+ afterTriggers.query_depth = -1;
+}
+
/* ----------
* AfterTriggerEndQuery()
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 642d63be61..6cd840d3a7 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -280,6 +280,34 @@ are typically reset to empty once per tuple. Per-tuple contexts are usually
associated with ExprContexts, and commonly each PlanState node has its own
ExprContext to evaluate its qual and targetlist expressions in.
+Relation Locking
+----------------
+
+Typically, when the executor initializes a plan tree for execution, it doesn't
+lock non-index relations if the plan tree is freshly generated and not derived
+from a CachedPlan. This is because such locks have already been established
+during the query's parsing, rewriting, and planning phases. However, with a
+cached plan tree, there can be relations that remain unlocked. The function
+GetCachedPlan() locks relations existing in the query's range table pre-planning
+but doesn't account for those added during the planning phase. Consequently,
+inheritance child tables, introduced to the query's range table during planning,
+won't be locked when the cached plan reaches the executor.
+
+The decision to defer locking child tables with GetCachedPlan() arises from the
+fact that not all might be accessed during plan execution. For instance, if
+child tables are partitions, some might be omitted due to pruning at
+execution-initialization-time. Thus, the responsibility of locking these child
+tables is pushed to execution-initialization-time, taking place in ExecInitNode()
+for plan nodes encompassing these tables.
+
+This approach opens a window where a cached plan tree with child tables could
+become outdated if another backend modifies these tables before ExecInitNode()
+locks them. Given this, the executor has the added duty to confirm the plan
+tree's validity whenever it locks a child table after execution-initialization-
+pruning. This validation is done by checking the CachedPlan.is_valid attribute
+of the CachedPlan provided. If the plan tree is outdated (is_valid=false), the
+executor halts any further initialization and alerts the caller that they should
+retry execution with another freshly created plan tree.
Query Processing Control Flow
-----------------------------
@@ -316,7 +344,13 @@ This is a sketch of control flow for full query processing:
FreeQueryDesc
-Per above comments, it's not really critical for ExecEndNode to free any
+As mentioned in the "Relation Locking" section, if the plan tree is found to
+be stale during one of the recursive calls of ExecInitNode() after taking a
+lock on a child table, control is immediately returned to the caller of
+ExecutorStart(), which must redo the steps from CreateQueryDesc with a new
+plan tree.
+
+Per above comments, it's not really critical for ExecEndPlan to free any
memory; it'll all go away in FreeExecutorState anyway. However, we do need to
be careful to close relations, drop buffer pins, etc, so we do need to scan
the plan state tree to find these sorts of resources.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 5e1b8a42e8..cf059cb850 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -58,6 +58,7 @@
#include "utils/backend_status.h"
#include "utils/lsyscache.h"
#include "utils/partcache.h"
+#include "utils/plancache.h"
#include "utils/rls.h"
#include "utils/snapmgr.h"
@@ -72,7 +73,7 @@ ExecutorEnd_hook_type ExecutorEnd_hook = NULL;
ExecutorCheckPerms_hook_type ExecutorCheckPerms_hook = NULL;
/* decls for local routines only used within this module */
-static void InitPlan(QueryDesc *queryDesc, int eflags);
+static bool InitPlan(QueryDesc *queryDesc, CachedPlan *cplan, int eflags);
static void CheckValidRowMarkRel(Relation rel, RowMarkType markType);
static void ExecPostprocessPlan(EState *estate);
static void ExecEndPlan(PlanState *planstate, EState *estate);
@@ -112,6 +113,13 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
*
* eflags contains flag bits as described in executor.h.
*
+ * Plan initialization may fail if the input plan tree is found to have been
+ * invalidated, which can happen if it comes from a CachedPlan ('cplan').
+ *
+ * Returns true if plan was successfully initialized and false otherwise. If
+ * the latter, the caller must call ExecutorEnd() on 'queryDesc' to clean up
+ * after failed plan initialization.
+ *
* NB: the CurrentMemoryContext when this is called will become the parent
* of the per-query context used for this Executor invocation.
*
@@ -121,8 +129,8 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
*
* ----------------------------------------------------------------
*/
-void
-ExecutorStart(QueryDesc *queryDesc, int eflags)
+bool
+ExecutorStart(QueryDesc *queryDesc, CachedPlan *cplan, int eflags)
{
/*
* In some cases (e.g. an EXECUTE statement) a query execution will skip
@@ -133,14 +141,34 @@ ExecutorStart(QueryDesc *queryDesc, int eflags)
pgstat_report_query_id(queryDesc->plannedstmt->queryId, false);
if (ExecutorStart_hook)
- (*ExecutorStart_hook) (queryDesc, eflags);
- else
- standard_ExecutorStart(queryDesc, eflags);
+ return (*ExecutorStart_hook) (queryDesc, cplan, eflags);
+
+ return standard_ExecutorStart(queryDesc, cplan, eflags);
}
-void
-standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
+/*
+ * Variant of ExecutorStart() that handles cleaning up if the input CachedPlan
+ * becomes invalid due to locks being taken during ExecutorStart().
+ */
+bool
+TryExecutorStart(QueryDesc *queryDesc, CachedPlan *cplan, int eflags)
{
+ bool plan_valid = ExecutorStart(queryDesc, cplan, eflags);
+
+ if (!plan_valid)
+ {
+ Assert(cplan != NULL);
+ ExecutorEnd(queryDesc);
+ FreeQueryDesc(queryDesc);
+ }
+
+ return plan_valid;
+}
+
+bool
+standard_ExecutorStart(QueryDesc *queryDesc, CachedPlan *cplan, int eflags)
+{
+ bool plan_valid;
EState *estate;
MemoryContext oldcontext;
@@ -259,9 +287,14 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/*
* Initialize the plan state tree
*/
- InitPlan(queryDesc, eflags);
+ plan_valid = InitPlan(queryDesc, cplan, eflags);
+
+ /* Mark execution as canceled if plan won't be executed. */
+ estate->es_canceled = !plan_valid;
MemoryContextSwitchTo(oldcontext);
+
+ return plan_valid;
}
/* ----------------------------------------------------------------
@@ -321,6 +354,7 @@ standard_ExecutorRun(QueryDesc *queryDesc,
estate = queryDesc->estate;
Assert(estate != NULL);
+ Assert(!estate->es_canceled);
Assert(!(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
/* caller must ensure the query's snapshot is active */
@@ -428,7 +462,7 @@ standard_ExecutorFinish(QueryDesc *queryDesc)
Assert(!(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
/* This should be run once and only once per Executor instance */
- Assert(!estate->es_finished);
+ Assert(!estate->es_finished && !estate->es_canceled);
/* Switch into per-query memory context */
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
@@ -487,11 +521,11 @@ standard_ExecutorEnd(QueryDesc *queryDesc)
Assert(estate != NULL);
/*
- * Check that ExecutorFinish was called, unless in EXPLAIN-only mode. This
- * Assert is needed because ExecutorFinish is new as of 9.1, and callers
- * might forget to call it.
+ * Check that ExecutorFinish was called, unless in EXPLAIN-only mode or if
+ * execution was canceled. This Assert is needed because ExecutorFinish is
+ * new as of 9.1, and callers might forget to call it.
*/
- Assert(estate->es_finished ||
+ Assert(estate->es_finished || estate->es_canceled ||
(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
/*
@@ -505,6 +539,14 @@ standard_ExecutorEnd(QueryDesc *queryDesc)
UnregisterSnapshot(estate->es_snapshot);
UnregisterSnapshot(estate->es_crosscheck_snapshot);
+ /*
+ * Cancel trigger execution too if the query execution was canceled.
+ */
+ if (estate->es_canceled &&
+ !(estate->es_top_eflags &
+ (EXEC_FLAG_SKIP_TRIGGERS | EXEC_FLAG_EXPLAIN_ONLY)))
+ AfterTriggerCancelQuery();
+
/*
* Must switch out of context before destroying it
*/
@@ -834,24 +876,55 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
PreventCommandIfParallelMode(CreateCommandName((Node *) plannedstmt));
}
+/* Lock the relations found in PlannedStmt.elidedAppendPartRels. */
+static void
+ExecLockElidedAppendPartRels(EState *estate, List *elidedAppendPartRels)
+{
+ ListCell *lc;
+
+ foreach(lc, elidedAppendPartRels)
+ {
+ Bitmapset *partrelids = castNode(Bitmapset, lfirst(lc));
+ int rti = -1;
+
+ while ((rti = bms_next_member(partrelids, rti)) > 0)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(rti, estate);
+
+ /*
+ * Don't lock any partitioned tables mentioned in the query,
+ * because they would already have been locked before entering the
+ * executor.
+ */
+ if (!rte->inFromCl)
+ LockRelationOid(rte->relid, rte->rellockmode);
+ else
+ Assert(CheckRelationOidLockedByMe(rte->relid, rte->rellockmode, true));
+ }
+ }
+}
/* ----------------------------------------------------------------
* InitPlan
*
* Initializes the query plan: open files, allocate storage
* and start up the rule manager
+ *
+ * Returns true if the plan tree is successfully initialized for execution,
+ * false otherwise. The latter case may occur if the CachedPlan that provides
+ * the plan tree ('cplan') got invalidated during the initialization.
* ----------------------------------------------------------------
*/
-static void
-InitPlan(QueryDesc *queryDesc, int eflags)
+static bool
+InitPlan(QueryDesc *queryDesc, CachedPlan *cplan, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
- PlanState *planstate;
- TupleDesc tupType;
+ PlanState *planstate = NULL;
+ TupleDesc tupType = NULL;
ListCell *l;
int i;
@@ -865,7 +938,16 @@ InitPlan(QueryDesc *queryDesc, int eflags)
*/
ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
+ /*
+ * With range table in place, lock partitioned tables whose
+ * Append/MergeAppend nodes have been removed by the planner.
+ */
+ if (cplan != NULL)
+ ExecLockElidedAppendPartRels(estate,
+ plannedstmt->elidedAppendPartRels);
+
estate->es_plannedstmt = plannedstmt;
+ estate->es_cachedplan = cplan;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
@@ -897,6 +979,9 @@ InitPlan(QueryDesc *queryDesc, int eflags)
case ROW_MARK_KEYSHARE:
case ROW_MARK_REFERENCE:
relation = ExecGetRangeTableRelation(estate, rc->rti);
+ if (unlikely(relation == NULL ||
+ !ExecPlanStillValid(estate)))
+ return false;
break;
case ROW_MARK_COPY:
/* no physical table access is required */
@@ -967,6 +1052,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_subplanstates = lappend(estate->es_subplanstates,
subplanstate);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return false;
i++;
}
@@ -977,6 +1064,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
* processing tuples.
*/
planstate = ExecInitNode(plan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return false;
/*
* Get the tuple descriptor describing the type of tuples to return.
@@ -1018,8 +1107,12 @@ InitPlan(QueryDesc *queryDesc, int eflags)
}
}
+ Assert(queryDesc->tupDesc == NULL);
queryDesc->tupDesc = tupType;
+ Assert(queryDesc->planstate == NULL);
queryDesc->planstate = planstate;
+
+ return true;
}
/*
@@ -2847,7 +2940,8 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
* Child EPQ EStates share the parent's copy of unchanging state such as
* the snapshot, rangetable, and external Param info. They need their own
* copies of local state, including a tuple table, es_param_exec_vals,
- * result-rel info, etc.
+ * result-rel info, etc. Also, we don't pass the parent's copy of the
+ * CachedPlan, because no new locks will be taken for EvalPlanQual().
*/
rcestate->es_direction = ForwardScanDirection;
rcestate->es_snapshot = parentestate->es_snapshot;
@@ -2936,6 +3030,14 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
subplanstate = ExecInitNode(subplan, rcestate, 0);
rcestate->es_subplanstates = lappend(rcestate->es_subplanstates,
subplanstate);
+
+ /*
+ * All the necessary locks must already have been taken when
+ * initializing the parent's copy of subplanstate, so the CachedPlan,
+ * if any, should not have become invalid during ExecInitNode().
+ */
+ if (!ExecPlanStillValid(rcestate))
+ elog(ERROR, "unexpected failure to initialize subplan in EvalPlanQualStart()");
}
/*
@@ -2977,6 +3079,10 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
*/
epqstate->recheckplanstate = ExecInitNode(planTree, rcestate, 0);
+ /* See the comment above. */
+ if (!ExecPlanStillValid(rcestate))
+ elog(ERROR, "unexpected failure to initialize main plantree in EvalPlanQualStart()");
+
MemoryContextSwitchTo(oldcontext);
}
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index f995714d7f..e8ca60d143 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1439,7 +1439,12 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Start up the executor */
queryDesc->plannedstmt->jitFlags = fpes->jit_flags;
- ExecutorStart(queryDesc, fpes->eflags);
+ /*
+ * The plan can't become invalid here, because there's no CachedPlan,
+ * so a failure return from ExecutorStart() is unexpected.
+ */
+ if (!ExecutorStart(queryDesc, NULL, fpes->eflags))
+ elog(ERROR, "unexpected failure running ExecutorStart()");
/* Special executor initialization steps for parallel workers */
queryDesc->planstate->state->es_query_dsa = area;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 7651886229..97a43513ce 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1935,6 +1935,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* duration of this executor run.
*/
partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+ if (unlikely(partrel == NULL || !ExecPlanStillValid(estate)))
+ return NULL;
partkey = RelationGetPartitionKey(partrel);
partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
partrel);
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 34f28dfece..7689d34dd0 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -136,6 +136,10 @@ static bool ExecShutdownNode_walker(PlanState *node, void *context);
* 'eflags' is a bitwise OR of flag bits described in executor.h
*
* Returns a PlanState node corresponding to the given Plan node.
+ *
+ * On return, callers should check that ExecPlanStillValid(estate) returns
+ * true before processing the result any further, because the returned
+ * PlanState might be only partially valid otherwise.
* ------------------------------------------------------------------------
*/
PlanState *
@@ -388,6 +392,9 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
break;
}
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return result;
+
ExecSetExecProcNode(result, result->ExecProcNode);
/*
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 5737f9f4eb..edf1c24e0e 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -146,6 +146,7 @@ CreateExecutorState(void)
estate->es_top_eflags = 0;
estate->es_instrument = 0;
estate->es_finished = false;
+ estate->es_canceled = false;
estate->es_exprcontexts = NIL;
@@ -691,6 +692,8 @@ ExecRelationIsTargetRelation(EState *estate, Index scanrelid)
*
* Open the heap relation to be scanned by a base-level scan plan node.
* This should be called during the node's ExecInit routine.
+ *
+ * NULL is returned if the relation is found to have been dropped.
* ----------------------------------------------------------------
*/
Relation
@@ -700,6 +703,8 @@ ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags)
/* Open the relation. */
rel = ExecGetRangeTableRelation(estate, scanrelid);
+ if (unlikely(rel == NULL || !ExecPlanStillValid(estate)))
+ return NULL;
/*
* Complain if we're attempting a scan of an unscannable relation, except
@@ -717,6 +722,26 @@ ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags)
return rel;
}
+/* ----------------------------------------------------------------
+ * ExecOpenScanIndexRelation
+ *
+ * Open the index relation to be scanned by an index scan plan node.
+ * This should be called during the node's ExecInit routine.
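+ *
+ * Note that taking a lock on the index here may invalidate a CachedPlan,
+ * so callers must check ExecPlanStillValid() after this returns.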
+ * ----------------------------------------------------------------
+ */
+Relation
+ExecOpenScanIndexRelation(EState *estate, Oid indexid, int lockmode)
+{
+ Relation rel;
+
+ /* Open the index. */
+ rel = index_open(indexid, lockmode);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ elog(DEBUG2, "CachedPlan invalidated on locking index %u", indexid);
+
+ return rel;
+}
+
/*
* ExecInitRangeTable
* Set up executor's range-table-related data
@@ -757,6 +782,9 @@ ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos)
* Open the Relation for a range table entry, if not already done
*
* The Relations will be closed again in ExecEndPlan().
+ *
+ * The returned value may be NULL if the relation is a child table that is
+ * not already locked and is found to have been concurrently dropped.
*/
Relation
ExecGetRangeTableRelation(EState *estate, Index rti)
@@ -773,7 +801,31 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rte->rtekind == RTE_RELATION);
- if (!IsParallelWorker())
+ if (IsParallelWorker() ||
+ (estate->es_cachedplan != NULL && !rte->inFromCl))
+ {
+ /*
+ * Take a lock if we are a parallel worker or if this is a child
+ * table referenced in a cached plan.
+ *
+ * Parallel workers need to have their own local lock on the
+ * relation. This ensures sane behavior in case the parent process
+ * exits before we do.
+ *
+ * When executing a cached plan, child tables must be locked
+ * here, because plancache.c (GetCachedPlan()) would only have
+ * locked tables mentioned in the query, that is, tables whose
+ * RTEs' inFromCl is true.
+ *
+ * Note that we use try_table_open() here, because without a lock
+ * held on the relation, it may have disappeared from under us.
+ */
+ rel = try_table_open(rte->relid, rte->rellockmode);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ elog(DEBUG2, "CachedPlan invalidated on locking relation %u",
+ rte->relid);
+ }
+ else
{
/*
* In a normal query, we should already have the appropriate lock,
@@ -786,15 +838,6 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rte->rellockmode == AccessShareLock ||
CheckRelationLockedByMe(rel, rte->rellockmode, false));
}
- else
- {
- /*
- * If we are a parallel worker, we need to obtain our own local
- * lock on the relation. This ensures sane behavior in case the
- * parent process exits before we do.
- */
- rel = table_open(rte->relid, rte->rellockmode);
- }
estate->es_relations[rti - 1] = rel;
}
@@ -802,6 +845,38 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
return rel;
}
+/*
+ * ExecLockAppendPartRels
+ * Lock non-leaf partitions whose child partitions are scanned by a given
+ * Append/MergeAppend node
+ */
+void
+ExecLockAppendPartRels(EState *estate, List *allpartrelids)
+{
+ ListCell *l;
+
+ foreach(l, allpartrelids)
+ {
+ Bitmapset *partrelids = lfirst_node(Bitmapset, l);
+ int rti = -1;
+
+ while ((rti = bms_next_member(partrelids, rti)) > 0)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(rti, estate);
+
+ /*
+ * Don't lock any partitioned tables mentioned in the query,
+ * because they would already have been locked before entering the
+ * executor.
+ */
+ if (!rte->inFromCl)
+ LockRelationOid(rte->relid, rte->rellockmode);
+ else
+ Assert(CheckRelationOidLockedByMe(rte->relid, rte->rellockmode, true));
+ }
+ }
+}
+
/*
* ExecInitResultRelation
* Open relation given by the passed-in RT index and fill its
@@ -817,6 +892,9 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
Relation resultRelationDesc;
resultRelationDesc = ExecGetRangeTableRelation(estate, rti);
+ if (unlikely(resultRelationDesc == NULL ||
+ !ExecPlanStillValid(estate)))
+ return;
InitResultRelInfo(resultRelInfo,
resultRelationDesc,
rti,
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 692854e2b3..8f65242f33 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -864,7 +864,13 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
eflags = EXEC_FLAG_SKIP_TRIGGERS;
else
eflags = 0; /* default run-to-completion flags */
- ExecutorStart(es->qd, eflags);
+
+ /*
+ * The plan can't become invalid here, because there's no CachedPlan,
+ * so a failure return from ExecutorStart() is unexpected.
+ */
+ if (!ExecutorStart(es->qd, NULL, eflags))
+ elog(ERROR, "unexpected failure running ExecutorStart()");
}
es->status = F_EXEC_RUN;
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 0dfba5ca16..8c40d8c520 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -3303,6 +3303,8 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
eflags &= ~EXEC_FLAG_REWIND;
outerPlan = outerPlan(node);
outerPlanState(aggstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return aggstate;
/*
* initialize source tuple type.
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 86d75b1a7e..72437f729f 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -133,6 +133,20 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_syncdone = false;
appendstate->as_begun = false;
+ /*
+ * Lock non-leaf partitions whose leaf children are present in
+ * node->appendplans. Only need to do so if executing a cached
+ * plan, because child tables present in cached plans are not
+ * locked before execution.
+ *
+ * XXX - some of the non-leaf partitions may also be mentioned in
+ * part_prune_info, which would get locked again in
+ * ExecInitPartitionPruning() because it calls
+ * ExecGetRangeTableRelation() which locks child tables.
+ */
+ if (estate->es_cachedplan)
+ ExecLockAppendPartRels(estate, node->allpartrelids);
+
/* If run-time partition pruning is enabled, then set that up now */
if (node->part_prune_info != NULL)
{
@@ -185,8 +199,10 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->ps.resultopsset = true;
appendstate->ps.resultopsfixed = false;
- appendplanstates = (PlanState **) palloc(nplans *
- sizeof(PlanState *));
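+ /*
+ * Use palloc0 and set appendplans/as_nplans before initializing the
+ * children, so that a partially filled array can be recognized if we
+ * must return early below on finding the plan invalidated.
+ */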
+ appendplanstates = (PlanState **) palloc0(nplans *
+ sizeof(PlanState *));
+ appendstate->appendplans = appendplanstates;
+ appendstate->as_nplans = nplans;
/*
* call ExecInitNode on each of the valid plans to be executed and save
@@ -221,11 +237,11 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
firstvalid = j;
appendplanstates[j++] = ExecInitNode(initNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return appendstate;
}
appendstate->as_first_partial_plan = firstvalid;
- appendstate->appendplans = appendplanstates;
- appendstate->as_nplans = nplans;
/* Initialize async state */
appendstate->as_asyncplans = asyncplans;
diff --git a/src/backend/executor/nodeBitmapAnd.c b/src/backend/executor/nodeBitmapAnd.c
index ae391222bf..168c440692 100644
--- a/src/backend/executor/nodeBitmapAnd.c
+++ b/src/backend/executor/nodeBitmapAnd.c
@@ -89,6 +89,8 @@ ExecInitBitmapAnd(BitmapAnd *node, EState *estate, int eflags)
{
initNode = (Plan *) lfirst(l);
bitmapplanstates[i] = ExecInitNode(initNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return bitmapandstate;
i++;
}
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 19f18ab817..b13cae1cbb 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -754,11 +754,15 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (unlikely(currentRelation == NULL || !ExecPlanStillValid(estate)))
+ return scanstate;
/*
* initialize child nodes
*/
outerPlanState(scanstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return scanstate;
/*
* get the scan type from the relation descriptor.
diff --git a/src/backend/executor/nodeBitmapIndexscan.c b/src/backend/executor/nodeBitmapIndexscan.c
index 4669e8d0ce..f04a53e9be 100644
--- a/src/backend/executor/nodeBitmapIndexscan.c
+++ b/src/backend/executor/nodeBitmapIndexscan.c
@@ -252,7 +252,11 @@ ExecInitBitmapIndexScan(BitmapIndexScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
- indexstate->biss_RelationDesc = index_open(node->indexid, lockmode);
+ indexstate->biss_RelationDesc = ExecOpenScanIndexRelation(estate,
+ node->indexid,
+ lockmode);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return indexstate;
/*
* Initialize index-specific scan state
diff --git a/src/backend/executor/nodeBitmapOr.c b/src/backend/executor/nodeBitmapOr.c
index de439235d2..980b68dd82 100644
--- a/src/backend/executor/nodeBitmapOr.c
+++ b/src/backend/executor/nodeBitmapOr.c
@@ -90,6 +90,8 @@ ExecInitBitmapOr(BitmapOr *node, EState *estate, int eflags)
{
initNode = (Plan *) lfirst(l);
bitmapplanstates[i] = ExecInitNode(initNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return bitmaporstate;
i++;
}
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
index e559cd2346..2a7c5dccd8 100644
--- a/src/backend/executor/nodeCustom.c
+++ b/src/backend/executor/nodeCustom.c
@@ -58,6 +58,8 @@ ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
if (scanrelid > 0)
{
scan_rel = ExecOpenScanRelation(estate, scanrelid, eflags);
+ if (unlikely(scan_rel == NULL || !ExecPlanStillValid(estate)))
+ return css;
css->ss.ss_currentRelation = scan_rel;
}
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 1357ccf3c9..90d5878ae3 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -172,6 +172,8 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
if (scanrelid > 0)
{
currentRelation = ExecOpenScanRelation(estate, scanrelid, eflags);
+ if (unlikely(currentRelation == NULL || !ExecPlanStillValid(estate)))
+ return scanstate;
scanstate->ss.ss_currentRelation = currentRelation;
fdwroutine = GetFdwRoutineForRelation(currentRelation, true);
}
@@ -263,6 +265,8 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
if (outerPlan(node))
outerPlanState(scanstate) =
ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return scanstate;
/*
* Tell the FDW to initialize the scan.
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index cae5ea1f92..67548aa7ba 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -84,6 +84,8 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
*/
outerNode = outerPlan(node);
outerPlanState(gatherstate) = ExecInitNode(outerNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return gatherstate;
tupDesc = ExecGetResultType(outerPlanState(gatherstate));
/*
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index b36cd89e7d..cf0e074359 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -103,6 +103,8 @@ ExecInitGatherMerge(GatherMerge *node, EState *estate, int eflags)
*/
outerNode = outerPlan(node);
outerPlanState(gm_state) = ExecInitNode(outerNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return gm_state;
/*
* Leader may access ExecProcNode result directly (if
diff --git a/src/backend/executor/nodeGroup.c b/src/backend/executor/nodeGroup.c
index 807429e504..6d0fd9e7b4 100644
--- a/src/backend/executor/nodeGroup.c
+++ b/src/backend/executor/nodeGroup.c
@@ -184,6 +184,8 @@ ExecInitGroup(Group *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(grpstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return grpstate;
/*
* Initialize scan slot and type.
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index dbf4920363..df55e697c0 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -385,6 +385,8 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(hashstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return hashstate;
/*
* initialize our result slot and type. No need to build projection
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 592c098b9f..5e11975702 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -760,8 +760,12 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
hashNode = (Hash *) innerPlan(node);
outerPlanState(hjstate) = ExecInitNode(outerNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return hjstate;
outerDesc = ExecGetResultType(outerPlanState(hjstate));
innerPlanState(hjstate) = ExecInitNode((Plan *) hashNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return hjstate;
innerDesc = ExecGetResultType(innerPlanState(hjstate));
/*
diff --git a/src/backend/executor/nodeIncrementalSort.c b/src/backend/executor/nodeIncrementalSort.c
index 010bcfafa8..af723ea755 100644
--- a/src/backend/executor/nodeIncrementalSort.c
+++ b/src/backend/executor/nodeIncrementalSort.c
@@ -1040,6 +1040,8 @@ ExecInitIncrementalSort(IncrementalSort *node, EState *estate, int eflags)
* nodes may be able to do something more useful.
*/
outerPlanState(incrsortstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return incrsortstate;
/*
* Initialize scan slot and type.
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 481d479760..109a90fe74 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -531,6 +531,8 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (unlikely(currentRelation == NULL || !ExecPlanStillValid(estate)))
+ return indexstate;
indexstate->ss.ss_currentRelation = currentRelation;
indexstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
@@ -583,7 +585,9 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
- indexRelation = index_open(node->indexid, lockmode);
+ indexRelation = ExecOpenScanIndexRelation(estate, node->indexid, lockmode);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return indexstate;
indexstate->ioss_RelationDesc = indexRelation;
/*
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index a8172d8b82..db28aeb3d6 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -907,6 +907,8 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (unlikely(currentRelation == NULL || !ExecPlanStillValid(estate)))
+ return indexstate;
indexstate->ss.ss_currentRelation = currentRelation;
indexstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
@@ -951,7 +953,11 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
- indexstate->iss_RelationDesc = index_open(node->indexid, lockmode);
+ indexstate->iss_RelationDesc = ExecOpenScanIndexRelation(estate,
+ node->indexid,
+ lockmode);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return indexstate;
/*
* Initialize index-specific scan state
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index eb7b6e52be..369c904577 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -475,6 +475,8 @@ ExecInitLimit(Limit *node, EState *estate, int eflags)
*/
outerPlan = outerPlan(node);
outerPlanState(limitstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return limitstate;
/*
* initialize child expressions
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 0d3489195b..9077858413 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -322,6 +322,8 @@ ExecInitLockRows(LockRows *node, EState *estate, int eflags)
* then initialize outer plan
*/
outerPlanState(lrstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return lrstate;
/* node returns unmodified slots from the outer plan */
lrstate->ps.resultopsset = true;
diff --git a/src/backend/executor/nodeMaterial.c b/src/backend/executor/nodeMaterial.c
index 883e3f3933..972962d44d 100644
--- a/src/backend/executor/nodeMaterial.c
+++ b/src/backend/executor/nodeMaterial.c
@@ -214,6 +214,8 @@ ExecInitMaterial(Material *node, EState *estate, int eflags)
outerPlan = outerPlan(node);
outerPlanState(matstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return matstate;
/*
* Initialize result type and slot. No need to initialize projection info
diff --git a/src/backend/executor/nodeMemoize.c b/src/backend/executor/nodeMemoize.c
index 690dee1daa..6aaab743b5 100644
--- a/src/backend/executor/nodeMemoize.c
+++ b/src/backend/executor/nodeMemoize.c
@@ -973,6 +973,8 @@ ExecInitMemoize(Memoize *node, EState *estate, int eflags)
outerNode = outerPlan(node);
outerPlanState(mstate) = ExecInitNode(outerNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return mstate;
/*
* Initialize return slot and type. No need to initialize projection info
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 3236444cf1..6e00c90916 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -81,6 +81,20 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.state = estate;
mergestate->ps.ExecProcNode = ExecMergeAppend;
+ /*
+ * Lock non-leaf partitions whose leaf children are present in
+ * node->mergeplans. Only need to do so if executing a cached
+ * plan, because child tables present in cached plans are not
+ * locked before execution.
+ *
+ * XXX - some of the non-leaf partitions may also be mentioned in
+ * part_prune_info, which would get locked again in
+ * ExecInitPartitionPruning() because it calls
+ * ExecGetRangeTableRelation() which locks child tables.
+ */
+ if (estate->es_cachedplan)
+ ExecLockAppendPartRels(estate, node->allpartrelids);
+
/* If run-time partition pruning is enabled, then set that up now */
if (node->part_prune_info != NULL)
{
@@ -120,7 +134,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ms_prune_state = NULL;
}
- mergeplanstates = (PlanState **) palloc(nplans * sizeof(PlanState *));
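+ /*
+ * As in ExecInitAppend, use palloc0 so that a partially filled array can
+ * be recognized if we must return early on plan invalidation below.
+ */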
+ mergeplanstates = (PlanState **) palloc0(nplans * sizeof(PlanState *));
mergestate->mergeplans = mergeplanstates;
mergestate->ms_nplans = nplans;
@@ -151,6 +165,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
Plan *initNode = (Plan *) list_nth(node->mergeplans, i);
mergeplanstates[j++] = ExecInitNode(initNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return mergestate;
}
mergestate->ps.ps_ProjInfo = NULL;
diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 926e631d88..53cb1ff207 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -1490,11 +1490,15 @@ ExecInitMergeJoin(MergeJoin *node, EState *estate, int eflags)
mergestate->mj_SkipMarkRestore = node->skip_mark_restore;
outerPlanState(mergestate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return mergestate;
outerDesc = ExecGetResultType(outerPlanState(mergestate));
innerPlanState(mergestate) = ExecInitNode(innerPlan(node), estate,
mergestate->mj_SkipMarkRestore ?
eflags :
(eflags | EXEC_FLAG_MARK));
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return mergestate;
innerDesc = ExecGetResultType(innerPlanState(mergestate));
/*
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 062a780f29..aff089799d 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -4274,6 +4274,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
linitial_int(node->resultRelations));
}
+ /*
+ * ExecInitResultRelation() may have returned without initializing
+ * rootResultRelInfo if the plan got invalidated, so check.
+ */
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return mtstate;
+
/* set up epqstate with dummy subplan data for the moment */
EvalPlanQualInit(&mtstate->mt_epqstate, estate, NULL, NIL,
node->epqParam, node->resultRelations);
@@ -4306,6 +4313,10 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
{
ExecInitResultRelation(estate, resultRelInfo, resultRelation);
+ /* See the comment above. */
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return mtstate;
+
/*
* For child result relations, store the root result relation
* pointer. We do so for the convenience of places that want to
@@ -4332,6 +4343,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* Now we may initialize the subplan.
*/
outerPlanState(mtstate) = ExecInitNode(subplan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return mtstate;
/*
* Do additional per-result-relation initialization.
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index 01f3d56a3b..34eafbb6e0 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -294,11 +294,15 @@ ExecInitNestLoop(NestLoop *node, EState *estate, int eflags)
* values.
*/
outerPlanState(nlstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return nlstate;
if (node->nestParams == NIL)
eflags |= EXEC_FLAG_REWIND;
else
eflags &= ~EXEC_FLAG_REWIND;
innerPlanState(nlstate) = ExecInitNode(innerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return nlstate;
/*
* Initialize result slot, type and projection.
diff --git a/src/backend/executor/nodeProjectSet.c b/src/backend/executor/nodeProjectSet.c
index ca9a5e2ed2..f834499479 100644
--- a/src/backend/executor/nodeProjectSet.c
+++ b/src/backend/executor/nodeProjectSet.c
@@ -254,6 +254,8 @@ ExecInitProjectSet(ProjectSet *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(state) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return state;
/*
* we don't use inner plan
diff --git a/src/backend/executor/nodeRecursiveunion.c b/src/backend/executor/nodeRecursiveunion.c
index 7680142c7b..5dd3285c41 100644
--- a/src/backend/executor/nodeRecursiveunion.c
+++ b/src/backend/executor/nodeRecursiveunion.c
@@ -244,7 +244,11 @@ ExecInitRecursiveUnion(RecursiveUnion *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(rustate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return rustate;
innerPlanState(rustate) = ExecInitNode(innerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return rustate;
/*
* If hashing, precompute fmgr lookup data for inner loop, and create the
diff --git a/src/backend/executor/nodeResult.c b/src/backend/executor/nodeResult.c
index e3cfc9b772..7d7c2aa786 100644
--- a/src/backend/executor/nodeResult.c
+++ b/src/backend/executor/nodeResult.c
@@ -207,6 +207,8 @@ ExecInitResult(Result *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(resstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return resstate;
/*
* we don't use inner plan
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index 6ab91001bc..3afdaeecd7 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -121,6 +121,9 @@ ExecInitSampleScan(SampleScan *node, EState *estate, int eflags)
ExecOpenScanRelation(estate,
node->scan.scanrelid,
eflags);
+ if (unlikely(scanstate->ss.ss_currentRelation == NULL ||
+ !ExecPlanStillValid(estate)))
+ return scanstate;
/* we won't set up the HeapScanDesc till later */
scanstate->ss.ss_currentScanDesc = NULL;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index b052775e5b..f7fb64a4a2 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -153,6 +153,9 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
ExecOpenScanRelation(estate,
node->scan.scanrelid,
eflags);
+ if (unlikely(scanstate->ss.ss_currentRelation == NULL ||
+ !ExecPlanStillValid(estate)))
+ return scanstate;
/* and create slot with the appropriate rowtype */
ExecInitScanTupleSlot(estate, &scanstate->ss,
diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index fe34b2134f..2231d8b82f 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -528,6 +528,8 @@ ExecInitSetOp(SetOp *node, EState *estate, int eflags)
if (node->strategy == SETOP_HASHED)
eflags &= ~EXEC_FLAG_REWIND;
outerPlanState(setopstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return setopstate;
outerDesc = ExecGetResultType(outerPlanState(setopstate));
/*
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index af852464d0..fb76e4c01b 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -263,6 +263,8 @@ ExecInitSort(Sort *node, EState *estate, int eflags)
eflags &= ~(EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK);
outerPlanState(sortstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return sortstate;
/*
* Initialize scan slot and type.
diff --git a/src/backend/executor/nodeSubqueryscan.c b/src/backend/executor/nodeSubqueryscan.c
index 0b2612183a..b5b538fa91 100644
--- a/src/backend/executor/nodeSubqueryscan.c
+++ b/src/backend/executor/nodeSubqueryscan.c
@@ -124,6 +124,8 @@ ExecInitSubqueryScan(SubqueryScan *node, EState *estate, int eflags)
* initialize subquery
*/
subquerystate->subplan = ExecInitNode(node->subplan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return subquerystate;
/*
* Initialize scan slot and type (needed by ExecAssignScanProjectionInfo)
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 702ee884d2..a76836d021 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -377,6 +377,8 @@ ExecInitTidRangeScan(TidRangeScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (unlikely(currentRelation == NULL || !ExecPlanStillValid(estate)))
+ return tidrangestate;
tidrangestate->ss.ss_currentRelation = currentRelation;
tidrangestate->ss.ss_currentScanDesc = NULL; /* no table scan here */
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index f375951699..088babf572 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -522,6 +522,8 @@ ExecInitTidScan(TidScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (unlikely(currentRelation == NULL || !ExecPlanStillValid(estate)))
+ return tidstate;
tidstate->ss.ss_currentRelation = currentRelation;
tidstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
diff --git a/src/backend/executor/nodeUnique.c b/src/backend/executor/nodeUnique.c
index b82d0e9ad5..cb46b2d5d0 100644
--- a/src/backend/executor/nodeUnique.c
+++ b/src/backend/executor/nodeUnique.c
@@ -135,6 +135,8 @@ ExecInitUnique(Unique *node, EState *estate, int eflags)
* then initialize outer plan
*/
outerPlanState(uniquestate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return uniquestate;
/*
* Initialize result slot and type. Unique nodes do no projections, so
diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 561d7e731d..1b96f51fe8 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -2464,6 +2464,8 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags)
*/
outerPlan = outerPlan(node);
outerPlanState(winstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return winstate;
/*
* initialize source tuple type (which is also the tuple type that we'll
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index d6516b1bca..5885a1d056 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -70,7 +70,7 @@ static int _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
static ParamListInfo _SPI_convert_params(int nargs, Oid *argtypes,
Datum *Values, const char *Nulls);
-static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount);
+static int _SPI_pquery(QueryDesc *queryDesc, uint64 tcount);
static void _SPI_error_callback(void *arg);
@@ -1581,6 +1581,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
Snapshot snapshot;
MemoryContext oldcontext;
Portal portal;
+ bool plan_valid;
SPICallbackArg spicallbackarg;
ErrorContextCallback spierrcontext;
@@ -1622,6 +1623,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
_SPI_current->processed = 0;
_SPI_current->tuptable = NULL;
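+/*
+ * If the cached plan is found to have been invalidated while starting the
+ * portal below, we come back here to recreate the portal with a fresh plan.
+ */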
+replan:
/* Create the portal */
if (name == NULL || name[0] == '\0')
{
@@ -1765,15 +1767,24 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
}
/*
- * Start portal execution.
+ * Start portal execution. If the portal uses a cached plan that is found
+ * to have been invalidated while initializing one of the plan trees it
+ * contains, the portal must be recreated.
*/
- PortalStart(portal, paramLI, 0, snapshot);
+ plan_valid = PortalStart(portal, paramLI, 0, snapshot, cplan);
Assert(portal->strategy != PORTAL_MULTI_QUERY);
/* Pop the error context stack */
error_context_stack = spierrcontext.previous;
+ if (!plan_valid)
+ {
+ Assert(cplan != NULL);
+ PortalDrop(portal, false);
+ goto replan;
+ }
+
/* Pop the SPI stack */
_SPI_end_call(true);
@@ -2568,6 +2579,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
+replan:
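+ /*
+ * We come back here from below if the CachedPlan is found to have been
+ * invalidated during ExecutorStart().
+ */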
cplan = GetCachedPlan(plansource, options->params,
plan_owner, _SPI_current->queryEnv);
@@ -2677,6 +2689,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
QueryDesc *qdesc;
Snapshot snap;
+ int eflags;
if (ActiveSnapshotSet())
snap = GetActiveSnapshot();
@@ -2690,8 +2703,20 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
options->params,
_SPI_current->queryEnv,
0);
- res = _SPI_pquery(qdesc, fire_triggers,
- canSetTag ? options->tcount : 0);
+
+ /* Select execution options */
+ if (fire_triggers)
+ eflags = 0; /* default run-to-completion flags */
+ else
+ eflags = EXEC_FLAG_SKIP_TRIGGERS;
+
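+ /*
+ * If the plan is found to have been invalidated during ExecutorStart,
+ * release it and retry with a freshly created plan.
+ */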
+ if (!TryExecutorStart(qdesc, cplan, eflags))
+ {
+ ReleaseCachedPlan(cplan, plan_owner);
+ goto replan;
+ }
+
+ res = _SPI_pquery(qdesc, canSetTag ? options->tcount : 0);
FreeQueryDesc(qdesc);
}
else
@@ -2865,10 +2890,9 @@ _SPI_convert_params(int nargs, Oid *argtypes,
}
static int
-_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
+_SPI_pquery(QueryDesc *queryDesc, uint64 tcount)
{
int operation = queryDesc->operation;
- int eflags;
int res;
switch (operation)
@@ -2915,14 +2939,6 @@ _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
ResetUsage();
#endif
- /* Select execution options */
- if (fire_triggers)
- eflags = 0; /* default run-to-completion flags */
- else
- eflags = EXEC_FLAG_SKIP_TRIGGERS;
-
- ExecutorStart(queryDesc, eflags);
-
ExecutorRun(queryDesc, ForwardScanDirection, tcount, true);
_SPI_current->processed = queryDesc->estate->es_processed;
diff --git a/src/backend/optimizer/util/inherit.c b/src/backend/optimizer/util/inherit.c
index 4797312ae5..be6e4ddfdf 100644
--- a/src/backend/optimizer/util/inherit.c
+++ b/src/backend/optimizer/util/inherit.c
@@ -493,6 +493,13 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
}
else
childrte->inh = false;
+
+ /*
+ * Flag child tables as indirectly referenced in the query. This allows
+ * ExecGetRangeTableRelation() to recognize them as inheritance child
+ * tables.
+ */
+ childrte->inFromCl = false;
childrte->securityQuals = NIL;
/* No permission checking for child RTEs. */
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index e901203424..fe7ad7b82f 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -3327,10 +3327,9 @@ transformLockingClause(ParseState *pstate, Query *qry, LockingClause *lc,
/*
* Lock all regular tables used in query and its subqueries. We
* examine inFromCl to exclude auto-added RTEs, particularly NEW/OLD
- * in rules. This is a bit of an abuse of a mostly-obsolete flag, but
- * it's convenient. We can't rely on the namespace mechanism that has
- * largely replaced inFromCl, since for example we need to lock
- * base-relation RTEs even if they are masked by upper joins.
+ * in rules. We can't rely on the namespace mechanism since for
+ * example we need to lock base-relation RTEs even if they are masked
+ * by upper joins.
*/
i = 0;
foreach(rt, qry->rtable)
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8bc6bea113..4f2196aaaf 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1241,8 +1241,11 @@ exec_simple_query(const char *query_string)
/*
* Start the portal. No parameters here.
+ *
+ * Plan can't become invalid here, because there's no CachedPlan.
*/
- PortalStart(portal, NULL, 0, InvalidSnapshot);
+ if (!PortalStart(portal, NULL, 0, InvalidSnapshot, NULL))
+ elog(ERROR, "unexpected failure running PortalStart()");
/*
* Select the appropriate output format: text unless we are doing a
@@ -1747,6 +1750,7 @@ exec_bind_message(StringInfo input_message)
"commands ignored until end of transaction block"),
errdetail_abort()));
+replan:
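+/*
+ * We come back here from below if PortalStart() reports that the cached
+ * plan was invalidated while initializing its plan trees.
+ */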
/*
* Create the portal. Allow silent replacement of an existing portal only
* if the unnamed portal is specified.
@@ -2034,9 +2038,16 @@ exec_bind_message(StringInfo input_message)
PopActiveSnapshot();
/*
- * And we're ready to start portal execution.
+ * Start portal execution. If the portal uses a cached plan that is found
+ * to have been invalidated while initializing one of the plan trees it
+ * contains, the portal must be recreated.
*/
- PortalStart(portal, params, 0, InvalidSnapshot);
+ if (!PortalStart(portal, params, 0, InvalidSnapshot, cplan))
+ {
+ Assert(cplan != NULL);
+ PortalDrop(portal, false);
+ goto replan;
+ }
/*
* Apply the result format requests to the portal.
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index a1f8d03db1..ea33da4c2a 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -19,6 +19,7 @@
#include "access/xact.h"
#include "commands/prepare.h"
+#include "executor/execdesc.h"
#include "executor/tstoreReceiver.h"
#include "miscadmin.h"
#include "pg_trace.h"
@@ -34,8 +35,7 @@
*/
Portal ActivePortal = NULL;
-
-static void ProcessQuery(PlannedStmt *plan,
+static void ProcessQuery(QueryDesc *queryDesc,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -115,13 +115,12 @@ FreeQueryDesc(QueryDesc *qdesc)
pfree(qdesc);
}
-
/*
* ProcessQuery
* Execute a single plannable query within a PORTAL_MULTI_QUERY,
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
- * plan: the plan tree for the query
+ * queryDesc: QueryDesc created in PortalStart()
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -133,26 +132,14 @@ FreeQueryDesc(QueryDesc *qdesc)
* error; otherwise the executor's memory usage will be leaked.
*/
static void
-ProcessQuery(PlannedStmt *plan,
+ProcessQuery(QueryDesc *queryDesc,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
DestReceiver *dest,
QueryCompletion *qc)
{
- QueryDesc *queryDesc;
-
- /*
- * Create the QueryDesc object
- */
- queryDesc = CreateQueryDesc(plan, sourceText,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, 0);
-
- /*
- * Call ExecutorStart to prepare the plan for execution
- */
- ExecutorStart(queryDesc, 0);
+ queryDesc->dest = dest;
/*
* Run the plan to completion.
@@ -426,19 +413,22 @@ FetchStatementTargetList(Node *stmt)
* presently ignored for non-PORTAL_ONE_SELECT portals (it's only intended
* to be used for cursors).
*
- * On return, portal is ready to accept PortalRun() calls, and the result
- * tupdesc (if any) is known.
+ * Returns true if the portal is ready to accept PortalRun() calls and the
+ * result tupdesc (if any) is known. False is returned if the plan tree is
+ * no longer valid, in which case the caller must retry after generating a
+ * new CachedPlan.
*/
-void
+bool
PortalStart(Portal portal, ParamListInfo params,
- int eflags, Snapshot snapshot)
+ int eflags, Snapshot snapshot,
+ CachedPlan *cplan)
{
Portal saveActivePortal;
ResourceOwner saveResourceOwner;
- MemoryContext savePortalContext;
MemoryContext oldContext;
QueryDesc *queryDesc;
- int myeflags;
+ int myeflags = 0;
+ bool plan_valid = true;
Assert(PortalIsValid(portal));
Assert(portal->status == PORTAL_DEFINED);
@@ -448,15 +438,13 @@ PortalStart(Portal portal, ParamListInfo params,
*/
saveActivePortal = ActivePortal;
saveResourceOwner = CurrentResourceOwner;
- savePortalContext = PortalContext;
PG_TRY();
{
ActivePortal = portal;
if (portal->resowner)
CurrentResourceOwner = portal->resowner;
- PortalContext = portal->portalContext;
- oldContext = MemoryContextSwitchTo(PortalContext);
+ oldContext = MemoryContextSwitchTo(portal->queryContext);
/* Must remember portal param list, if any */
portal->portalParams = params;
@@ -472,6 +460,8 @@ PortalStart(Portal portal, ParamListInfo params,
switch (portal->strategy)
{
case PORTAL_ONE_SELECT:
+ case PORTAL_ONE_RETURNING:
+ case PORTAL_ONE_MOD_WITH:
/* Must set snapshot before starting executor. */
if (snapshot)
@@ -489,8 +479,8 @@ PortalStart(Portal portal, ParamListInfo params,
*/
/*
- * Create QueryDesc in portal's context; for the moment, set
- * the destination to DestNone.
+ * Create QueryDesc in portal->queryContext; for the moment,
+ * set the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
portal->sourceText,
@@ -501,30 +491,49 @@ PortalStart(Portal portal, ParamListInfo params,
portal->queryEnv,
0);
+ /* Remember for PortalRunMulti(). */
+ if (portal->strategy == PORTAL_ONE_RETURNING ||
+ portal->strategy == PORTAL_ONE_MOD_WITH)
+ portal->qdescs = list_make1(queryDesc);
+
/*
* If it's a scrollable cursor, executor needs to support
* REWIND and backwards scan, as well as whatever the caller
* might've asked for.
*/
- if (portal->cursorOptions & CURSOR_OPT_SCROLL)
+ if (portal->strategy == PORTAL_ONE_SELECT &&
+ (portal->cursorOptions & CURSOR_OPT_SCROLL))
myeflags = eflags | EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD;
else
myeflags = eflags;
/*
- * Call ExecutorStart to prepare the plan for execution
+ * Call TryExecutorStart to prepare the plan for execution. A
+ * cached plan may get invalidated during plan initialization.
*/
- ExecutorStart(queryDesc, myeflags);
+ if (!TryExecutorStart(queryDesc, cplan, myeflags))
+ {
+ PopActiveSnapshot();
+ plan_valid = false;
+ goto plan_init_failed;
+ }
/*
- * This tells PortalCleanup to shut down the executor
+ * This tells PortalCleanup to shut down the executor; this is
+ * not needed for queries handled by PortalRunMulti().
*/
- portal->queryDesc = queryDesc;
+ if (portal->strategy == PORTAL_ONE_SELECT)
+ portal->queryDesc = queryDesc;
/*
- * Remember tuple descriptor (computed by ExecutorStart)
+ * Remember tuple descriptor (computed by ExecutorStart),
+ * making it independent of the QueryDesc for queries handled
+ * by PortalRunMulti().
*/
- portal->tupDesc = queryDesc->tupDesc;
+ if (portal->strategy != PORTAL_ONE_SELECT)
+ portal->tupDesc = CreateTupleDescCopy(queryDesc->tupDesc);
+ else
+ portal->tupDesc = queryDesc->tupDesc;
/*
* Reset cursor position data to "start of query"
@@ -536,29 +545,6 @@ PortalStart(Portal portal, ParamListInfo params,
PopActiveSnapshot();
break;
- case PORTAL_ONE_RETURNING:
- case PORTAL_ONE_MOD_WITH:
-
- /*
- * We don't start the executor until we are told to run the
- * portal. We do need to set up the result tupdesc.
- */
- {
- PlannedStmt *pstmt;
-
- pstmt = PortalGetPrimaryStmt(portal);
- portal->tupDesc =
- ExecCleanTypeFromTL(pstmt->planTree->targetlist);
- }
-
- /*
- * Reset cursor position data to "start of query"
- */
- portal->atStart = true;
- portal->atEnd = false; /* allow fetches */
- portal->portalPos = 0;
- break;
-
case PORTAL_UTIL_SELECT:
/*
@@ -581,7 +567,79 @@ PortalStart(Portal portal, ParamListInfo params,
break;
case PORTAL_MULTI_QUERY:
- /* Need do nothing now */
+ {
+ ListCell *lc;
+ bool first = true;
+
+ myeflags = eflags;
+ foreach(lc, portal->stmts)
+ {
+ PlannedStmt *plan = lfirst_node(PlannedStmt, lc);
+ bool is_utility = (plan->utilityStmt != NULL);
+
+ /*
+ * Push the snapshot to be used by the executor.
+ */
+ if (!is_utility)
+ {
+ /*
+ * Must copy the snapshot for all statements
+ * except the first, as we'll need to update its
+ * command ID.
+ */
+ if (!first)
+ PushCopiedSnapshot(GetTransactionSnapshot());
+ else
+ PushActiveSnapshot(GetTransactionSnapshot());
+ }
+
+ /*
+ * From the 2nd statement onwards, update the command
+ * ID and the snapshot to match.
+ */
+ if (!first)
+ {
+ CommandCounterIncrement();
+ UpdateActiveSnapshotCommandId();
+ }
+
+ first = false;
+
+ /*
+ * Create the QueryDesc. DestReceiver will be set in
+ * PortalRunMulti() before calling ExecutorRun().
+ */
+ queryDesc = CreateQueryDesc(plan,
+ portal->sourceText,
+ !is_utility ?
+ GetActiveSnapshot() :
+ InvalidSnapshot,
+ InvalidSnapshot,
+ NULL,
+ params,
+ portal->queryEnv, 0);
+
+ /* Remember for PortalRunMulti() */
+ portal->qdescs = lappend(portal->qdescs, queryDesc);
+
+ if (is_utility)
+ continue;
+
+ /*
+ * Call ExecutorStart to prepare the plan for
+ * execution. A cached plan may get invalidated
+ * during plan initialization.
+ */
+ if (!ExecutorStart(queryDesc, cplan, myeflags))
+ {
+ PopActiveSnapshot();
+ plan_valid = false;
+ goto plan_init_failed;
+ }
+ PopActiveSnapshot();
+ }
+ }
+
portal->tupDesc = NULL;
break;
}
@@ -594,19 +652,20 @@ PortalStart(Portal portal, ParamListInfo params,
/* Restore global vars and propagate error */
ActivePortal = saveActivePortal;
CurrentResourceOwner = saveResourceOwner;
- PortalContext = savePortalContext;
PG_RE_THROW();
}
PG_END_TRY();
+ portal->status = PORTAL_READY;
+
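+ /*
+ * We also jump here directly from above when plan initialization fails,
+ * skipping the PORTAL_READY assignment.
+ */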
+plan_init_failed:
MemoryContextSwitchTo(oldContext);
ActivePortal = saveActivePortal;
CurrentResourceOwner = saveResourceOwner;
- PortalContext = savePortalContext;
- portal->status = PORTAL_READY;
+ return plan_valid;
}
/*
@@ -1193,8 +1252,8 @@ PortalRunMulti(Portal portal,
DestReceiver *dest, DestReceiver *altdest,
QueryCompletion *qc)
{
- bool active_snapshot_set = false;
- ListCell *stmtlist_item;
+ bool holdSnapshotSet = false;
+ ListCell *qdesc_item;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1215,9 +1274,10 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
- foreach(stmtlist_item, portal->stmts)
+ foreach(qdesc_item, portal->qdescs)
{
- PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ QueryDesc *qdesc = (QueryDesc *) lfirst(qdesc_item);
+ PlannedStmt *pstmt = qdesc->plannedstmt;
/*
* If we got a cancel signal in prior command, quit
@@ -1234,48 +1294,33 @@ PortalRunMulti(Portal portal,
if (log_executor_stats)
ResetUsage();
+ /* Push the snapshot for plannable queries. */
+ PushActiveSnapshot(qdesc->snapshot);
+
/*
- * Must always have a snapshot for plannable queries. First time
- * through, take a new snapshot; for subsequent queries in the
- * same portal, just update the snapshot's copy of the command
- * counter.
+ * If told to, register the snapshot and save it in the portal.
+ *
+ * Note that the command ID of qdesc->snapshot for the 2nd query
+ * onwards would have been updated in PortalStart() to account
+ * for CCI() done between queries, but it's okay for the command
+ * ID of the active snapshot to diverge from what holdSnapshot
+ * has.
*/
- if (!active_snapshot_set)
+ if (setHoldSnapshot && !holdSnapshotSet)
{
- Snapshot snapshot = GetTransactionSnapshot();
-
- /* If told to, register the snapshot and save in portal */
- if (setHoldSnapshot)
- {
- snapshot = RegisterSnapshot(snapshot);
- portal->holdSnapshot = snapshot;
- }
-
- /*
- * We can't have the holdSnapshot also be the active one,
- * because UpdateActiveSnapshotCommandId would complain. So
- * force an extra snapshot copy. Plain PushActiveSnapshot
- * would have copied the transaction snapshot anyway, so this
- * only adds a copy step when setHoldSnapshot is true. (It's
- * okay for the command ID of the active snapshot to diverge
- * from what holdSnapshot has.)
- */
- PushCopiedSnapshot(snapshot);
-
- /*
- * As for PORTAL_ONE_SELECT portals, it does not seem
- * necessary to maintain portal->portalSnapshot here.
- */
-
- active_snapshot_set = true;
+ portal->holdSnapshot = RegisterSnapshot(qdesc->snapshot);
+ holdSnapshotSet = true;
}
- else
- UpdateActiveSnapshotCommandId();
+
+ /*
+ * As for PORTAL_ONE_SELECT portals, it does not seem
+ * necessary to maintain portal->portalSnapshot here.
+ */
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(qdesc,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1284,13 +1329,15 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(qdesc,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
altdest, NULL);
}
+ PopActiveSnapshot();
+
if (log_executor_stats)
ShowUsage("EXECUTOR STATISTICS");
@@ -1311,7 +1358,6 @@ PortalRunMulti(Portal portal,
*/
if (pstmt->canSetTag)
{
- Assert(!active_snapshot_set);
/* statement can set tag string */
PortalRunUtility(portal, pstmt, isTopLevel, false,
dest, qc);
@@ -1342,19 +1388,8 @@ PortalRunMulti(Portal portal,
*/
if (portal->stmts == NIL)
break;
-
- /*
- * Increment command counter between queries, but not after the last
- * one.
- */
- if (lnext(portal->stmts, stmtlist_item) != NULL)
- CommandCounterIncrement();
}
- /* Pop the snapshot if we pushed one. */
- if (active_snapshot_set)
- PopActiveSnapshot();
-
/*
* If a query completion data was supplied, use it. Otherwise use the
* portal's query completion data.
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 4a24613537..2d5ba638f7 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -200,6 +200,13 @@ CreatePortal(const char *name, bool allowDup, bool dupSilent)
portal->portalContext = AllocSetContextCreate(TopPortalContext,
"PortalContext",
ALLOCSET_SMALL_SIZES);
+ /*
+ * Initialize the portal's query context, used to store QueryDescs created
+ * during PortalStart() and then used in PortalRun().
+ */
+ portal->queryContext = AllocSetContextCreate(TopPortalContext,
+ "PortalQueryContext",
+ ALLOCSET_SMALL_SIZES);
/* create a resource owner for the portal */
portal->resowner = ResourceOwnerCreate(CurTransactionResourceOwner,
@@ -223,6 +230,7 @@ CreatePortal(const char *name, bool allowDup, bool dupSilent)
/* for named portals reuse portal->name copy */
MemoryContextSetIdentifier(portal->portalContext, portal->name[0] ? portal->name : "<unnamed>");
+ MemoryContextSetIdentifier(portal->queryContext, portal->name[0] ? portal->name : "<unnamed>");
return portal;
}
@@ -593,6 +601,7 @@ PortalDrop(Portal portal, bool isTopCommit)
/* release subsidiary storage */
MemoryContextDelete(portal->portalContext);
+ MemoryContextDelete(portal->queryContext);
/* release portal struct (it's in TopPortalContext) */
pfree(portal);
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 9b8b351d9a..78292c69f9 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -101,7 +101,8 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern bool ExplainOnePlan(PlannedStmt *stmt, CachedPlan *cplan,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
@@ -118,6 +119,7 @@ extern void ExplainQueryParameters(ExplainState *es, ParamListInfo params, int m
extern void ExplainBeginOutput(ExplainState *es);
extern void ExplainEndOutput(ExplainState *es);
+extern void ExplainResetOutput(ExplainState *es);
extern void ExplainSeparatePlans(ExplainState *es);
extern void ExplainPropertyList(const char *qlabel, List *data,
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index 8a5a9fe642..d3e6cf24cb 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -257,6 +257,7 @@ extern void ExecASTruncateTriggers(EState *estate,
extern void AfterTriggerBeginXact(void);
extern void AfterTriggerBeginQuery(void);
+extern void AfterTriggerCancelQuery(void);
extern void AfterTriggerEndQuery(EState *estate);
extern void AfterTriggerFireDeferred(void);
extern void AfterTriggerEndXact(bool isCommit);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 9770752ea3..1dbc0997ae 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -19,6 +19,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/plancache.h"
/*
@@ -72,7 +73,8 @@
/* Hook for plugins to get control in ExecutorStart() */
-typedef void (*ExecutorStart_hook_type) (QueryDesc *queryDesc, int eflags);
+typedef bool (*ExecutorStart_hook_type) (QueryDesc *queryDesc, CachedPlan *cplan,
+ int eflags);
extern PGDLLIMPORT ExecutorStart_hook_type ExecutorStart_hook;
/* Hook for plugins to get control in ExecutorRun() */
@@ -197,8 +199,10 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
-extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
-extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern bool ExecutorStart(QueryDesc *queryDesc, CachedPlan *cplan, int eflags);
+extern bool TryExecutorStart(QueryDesc *queryDesc, CachedPlan *cplan, int eflags);
+extern bool standard_ExecutorStart(QueryDesc *queryDesc, CachedPlan *cplan,
+ int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count, bool execute_once);
extern void standard_ExecutorRun(QueryDesc *queryDesc,
@@ -257,6 +261,15 @@ extern void ExecEndNode(PlanState *node);
extern void ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * Is the CachedPlan in es_cachedplan still valid?
+ */
+static inline bool
+ExecPlanStillValid(EState *estate)
+{
+ return estate->es_cachedplan == NULL ? true :
+ CachedPlanStillValid(estate->es_cachedplan);
+}
/* ----------------------------------------------------------------
* ExecProcNode
@@ -578,6 +591,7 @@ extern void ExecCreateScanSlotFromOuterPlan(EState *estate,
extern bool ExecRelationIsTargetRelation(EState *estate, Index scanrelid);
extern Relation ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags);
+extern Relation ExecOpenScanIndexRelation(EState *estate, Oid indexid, int lockmode);
extern void ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos);
extern void ExecCloseRangeTableRelations(EState *estate);
@@ -590,6 +604,7 @@ exec_rt_fetch(Index rti, EState *estate)
}
extern Relation ExecGetRangeTableRelation(EState *estate, Index rti);
+extern void ExecLockAppendPartRels(EState *estate, List *allpartrelids);
extern void ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
Index rti);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index c3670f7158..4bc6d9d461 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -631,6 +631,8 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ struct CachedPlan *es_cachedplan; /* CachedPlan if plannedstmt is from
+ * one, or NULL if not */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
@@ -676,6 +678,9 @@ typedef struct EState
int es_top_eflags; /* eflags passed to ExecutorStart */
int es_instrument; /* OR of InstrumentOption flags */
bool es_finished; /* true when ExecutorFinish is done */
+ bool es_canceled; /* true when execution was canceled
+ * upon detecting that the plan was invalidated
+ * during ExecInitNode() */
List *es_exprcontexts; /* List of ExprContexts within EState */
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 85a62b538e..e8f14982c0 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -1009,11 +1009,15 @@ typedef struct PartitionCmd
*
* inFromCl marks those range variables that are listed in the FROM clause.
* It's false for RTEs that are added to a query behind the scenes, such
- * as the NEW and OLD variables for a rule, or the subqueries of a UNION.
+ * as the NEW and OLD variables for a rule, or the subqueries of a UNION,
+ * or the RTEs of inheritance child tables that are added by the planner.
* This flag is not used during parsing (except in transformLockingClause,
* q.v.); the parser now uses a separate "namespace" data structure to
* control visibility. But it is needed by ruleutils.c to determine
- * whether RTEs should be shown in decompiled queries.
+ * whether RTEs should be shown in decompiled queries. The executor uses
+ * this to ascertain if an RTE_RELATION entry is for a table explicitly
+ * named in the query or a child table added by the planner. This
+ * distinction is vital when child tables in a plan must be locked.
*
* securityQuals is a list of security barrier quals (boolean expressions),
* to be tested in the listed order before returning a row from the
diff --git a/src/include/tcop/pquery.h b/src/include/tcop/pquery.h
index 073fb323bc..274510598f 100644
--- a/src/include/tcop/pquery.h
+++ b/src/include/tcop/pquery.h
@@ -29,8 +29,9 @@ extern List *FetchPortalTargetList(Portal portal);
extern List *FetchStatementTargetList(Node *stmt);
-extern void PortalStart(Portal portal, ParamListInfo params,
- int eflags, Snapshot snapshot);
+extern bool PortalStart(Portal portal, ParamListInfo params,
+ int eflags, Snapshot snapshot,
+ CachedPlan *cplan);
extern void PortalSetResultFormat(Portal portal, int nFormats,
int16 *formats);
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index a90dfdf906..f88e2abad2 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -223,6 +223,20 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
QueryEnvironment *queryEnv);
+
+/*
+ * CachedPlanStillValid
+ * Returns whether a cached generic plan is still valid
+ *
+ * Invoked by the executor for each relation lock acquired during the
+ * initialization of the plan tree within the CachedPlan.
+ */
+static inline bool
+CachedPlanStillValid(CachedPlan *cplan)
+{
+ return cplan->is_valid;
+}
+
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index 29f49829f2..c0707c4876 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,8 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *qdescs; /* list of QueryDescs */
+ MemoryContext queryContext; /* memory for QueryDescs and children */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
--
2.43.0
v50-0006-Track-opened-range-table-relations-in-a-List-in-.patch (application/octet-stream)
From c72ca47d9d9e9af4e7b422d217d126bc7cd14ba4 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Wed, 6 Sep 2023 17:54:19 +0900
Subject: [PATCH v50 6/6] Track opened range table relations in a List in
EState
This makes ExecCloseRangeTableRelations faster when there are many
relations in the range table but only a few are opened during
execution, such as when run-time pruning kicks in on an Append
containing thousands of partition subplans.
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
src/backend/executor/execMain.c | 9 +++++----
src/backend/executor/execUtils.c | 3 +++
src/include/nodes/execnodes.h | 2 ++
3 files changed, 10 insertions(+), 4 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 168ab553ac..8e067ab7d7 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1682,12 +1682,13 @@ ExecCloseResultRelations(EState *estate)
void
ExecCloseRangeTableRelations(EState *estate)
{
- int i;
+ ListCell *lc;
- for (i = 0; i < estate->es_range_table_size; i++)
+ foreach(lc, estate->es_opened_relations)
{
- if (estate->es_relations[i])
- table_close(estate->es_relations[i], NoLock);
+ Relation rel = lfirst(lc);
+
+ table_close(rel, NoLock);
}
}
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index edf1c24e0e..2e4e748559 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -840,6 +840,9 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
}
estate->es_relations[rti - 1] = rel;
+ if (rel != NULL)
+ estate->es_opened_relations = lappend(estate->es_opened_relations,
+ rel);
}
return rel;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 4bc6d9d461..655b6f1b8d 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -627,6 +627,8 @@ typedef struct EState
Index es_range_table_size; /* size of the range table arrays */
Relation *es_relations; /* Array of per-range-table-entry Relation
* pointers, or NULL if not yet opened */
+ List *es_opened_relations; /* List of non-NULL entries in
+ * es_relations in no specific order */
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
--
2.43.0
On Thu, Aug 15, 2024 at 8:57 AM Amit Langote <amitlangote09@gmail.com> wrote:
TBH, it's more of a hunch that people who are not involved in this
development might find the new reality, whereby the execution is not
racefree until ExecutorRun(), hard to reason about.
I'm confused by what you mean here by "racefree". A race means
multiple sessions are doing stuff at the same time and the result
depends on who does what first, but the executor stuff is all
backend-private. Heavyweight locks are not backend-private, but those
would be taken in ExecutorStart(), not ExecutorRun(), IIUC.
With the patch, CreateQueryDesc() and ExecutorStart() are moved to
PortalStart() so that QueryDescs including the PlanState trees for all
queries are built before any is run. Why? So that if ExecutorStart()
fails for any query in the list, we can simply throw out the QueryDesc
and the PlanState trees of the previous queries (NOT run them) and ask
plancache for a new CachedPlan for the list of queries. We don't have
a way to ask plancache.c to replan only a given query in the list.
I agree that moving this from PortalRun() to PortalStart() seems like
a bad idea, especially in view of what you write below.
* There's no longer CCI() between queries in PortalRunMulti() because
the snapshots in each query's QueryDesc must have been adjusted to
reflect the correct command counter. I've checked but can't really be
sure if the value in the snapshot is all anyone ever uses if they want
to know the current value of the command counter.
I don't think anything stops somebody wanting to look at the current
value of the command counter. I also don't think you can remove the
CommandCounterIncrement() calls between successive queries, because
then they won't see the effects of earlier calls. So this sounds
broken to me.
Also keep in mind that one of the queries could call a function which
does something that bumps the command counter again. I'm not sure if
that creates its own hazard separate from the lack of CCIs, or
whether it's just another part of that same issue. But you can't
assume that each query's snapshot should have a command counter value
one more than the previous query.
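For concreteness, the pattern PortalRunMulti() relies on looks roughly like the sketch below, where RunOneStatement() is a hypothetical stand-in for planning and executing one statement from the portal's list:
ListCell   *lc;
foreach(lc, portal->stmts)
{
    PlannedStmt *pstmt = lfirst_node(PlannedStmt, lc);
    RunOneStatement(portal, pstmt);     /* hypothetical helper */
    /*
     * Make this statement's effects visible to the next one's snapshot;
     * without this, a rule-added INSERT would not see the rows written
     * by the statement that triggered it.
     */
    if (lnext(portal->stmts, lc) != NULL)
        CommandCounterIncrement();
}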
While this all seems bad for the partially-initialized-execution-tree
approach, I wonder if you don't have problems here with the other
design, too. Let's say you have the multi-query case and there are 2
queries. The first one (Q1) is SELECT mysterious_function() and the
second one (Q2) is SELECT * FROM range_partitioned_table WHERE
key_column = 42. What if mysterious_function() performs DDL on
range_partitioned_table? I haven't tested this so maybe there are
things going on here that prevent trouble, but it seems like executing
Q1 can easily invalidate the plan for Q2. And then it seems like
you're basically back to the same problem.
3. The need to add *back* the fields to store the RT indexes of
relations that are not looked at by ExecInitNode() traversal such as
root partitioned tables and non-leaf partitions.
I don't remember exactly why we removed those or what the benefit was,
so I'm not sure how big of a problem it is if we have to put them
back.
We removed those in commit 52ed730d511b after commit f2343653f5b2
removed redundant execution-time locking of non-leaf relations. So we
removed them because we realized that execution time locking is
unnecessary given that AcquireExecutorLocks() exists and now we want
to add them back because we'd like to get rid of
AcquireExecutorLocks(). :-)
My bias is to believe that getting rid of AcquireExecutorLocks() is
probably the right thing to do, but that's not a strongly-held
position and I could be totally wrong about it. The thing is, though,
that AcquireExecutorLocks() is fundamentally stupid, and it's hard to
see how it can ever be any smarter. If we want to make smarter
decisions about what to lock, it seems reasonable to me to think that
the locking code needs to be closer to code that can evaluate
expressions and prune partitions and stuff like that.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Fri, Aug 16, 2024 at 12:35 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Aug 15, 2024 at 8:57 AM Amit Langote <amitlangote09@gmail.com> wrote:
TBH, it's more of a hunch that people who are not involved in this
development might find the new reality, whereby the execution is not
racefree until ExecutorRun(), hard to reason about.
I'm confused by what you mean here by "racefree". A race means
multiple sessions are doing stuff at the same time and the result
depends on who does what first, but the executor stuff is all
backend-private. Heavyweight locks are not backend-private, but those
would be taken in ExecutorStart(), not ExecutorRun(), IIUC.
Sorry, yes, I meant ExecutorStart(). A backend that wants to execute
a plan tree from a CachedPlan is in a race with other backends that
might modify tables before ExecutorStart() takes the remaining locks.
That race window is bigger when it is ExecutorStart() that will take
the locks, and I don't mean in terms of timing, but in terms of the
other code that can run in between GetCachedPlan() returning a
partially valid plan and ExecutorStart() taking the remaining locks,
which varies depending on the calling module.
With the patch, CreateQueryDesc() and ExecutorStart() are moved to
PortalStart() so that QueryDescs including the PlanState trees for all
queries are built before any is run. Why? So that if ExecutorStart()
fails for any query in the list, we can simply throw out the QueryDesc
and the PlanState trees of the previous queries (NOT run them) and ask
plancache for a new CachedPlan for the list of queries. We don't have
a way to ask plancache.c to replan only a given query in the list.
I agree that moving this from PortalRun() to PortalStart() seems like
a bad idea, especially in view of what you write below.
* There's no longer CCI() between queries in PortalRunMulti() because
the snapshots in each query's QueryDesc must have been adjusted to
reflect the correct command counter. I've checked but can't really be
sure if the value in the snapshot is all anyone ever uses if they want
to know the current value of the command counter.
I don't think anything stops somebody wanting to look at the current
value of the command counter. I also don't think you can remove the
CommandCounterIncrement() calls between successive queries, because
then they won't see the effects of earlier calls. So this sounds
broken to me.
I suppose you mean CCI between "running" (calling ExecutorRun on)
successive queries. Then the patch is indeed broken. If we're to
make that right, the number of CCIs for the multi-query portals will
have to double given the separation of ExecutorStart() and
ExecutorRun() phases.
Also keep in mind that one of the queries could call a function which
does something that bumps the command counter again. I'm not sure if
that creates its own hazard separate from the lack of CCIs, or
whether it's just another part of that same issue. But you can't
assume that each query's snapshot should have a command counter value
one more than the previous query.
While this all seems bad for the partially-initialized-execution-tree
approach, I wonder if you don't have problems here with the other
design, too. Let's say you have the multi-query case and there are 2
queries. The first one (Q1) is SELECT mysterious_function() and the
second one (Q2) is SELECT * FROM range_partitioned_table WHERE
key_column = 42. What if mysterious_function() performs DDL on
range_partitioned_table? I haven't tested this so maybe there are
things going on here that prevent trouble, but it seems like executing
Q1 can easily invalidate the plan for Q2. And then it seems like
you're basically back to the same problem.
A rule (but not views AFAICS) can lead to the multi-query case (there
might be other ways). I tried the following, and, yes, the plan for
the query queued by the rule is broken by the execution of the plan
for the first query:
create table foo (a int);
create table bar (a int);
create or replace function foo_trig_func () returns trigger as $$
begin drop table bar cascade; return new.*; end; $$ language plpgsql;
create trigger foo_trig before insert on foo execute function foo_trig_func();
create rule insert_foo AS ON insert TO foo do also insert into bar
values (new.*);
set plan_cache_mode to force_generic_plan ;
prepare q as insert into foo values (1);
execute q;
NOTICE: drop cascades to rule insert_foo on table foo
ERROR: relation with OID 16418 does not exist
The ERROR comes from trying to run (actually "initialize") the cached
plan for `insert into bar values (new.*);`, which comes from the rule.
That said, it doesn't have to be a cached plan for the breakage to
happen. You can see the same error without the prepared statement:
insert into foo values (1);
NOTICE: drop cascades to rule insert_foo on table foo
ERROR: relation with OID 16418 does not exist
Another example:
create or replace function foo_trig_func () returns trigger as $$
begin alter table bar add b int; return new.*; end; $$ language
plpgsql;
execute q;
ERROR: table row type and query-specified row type do not match
DETAIL: Query has too few columns.
insert into foo values (1);
ERROR: table row type and query-specified row type do not match
DETAIL: Query has too few columns.
This time the error occurs in ExecModifyTable(), so when "running" the
plan, but again the code that's throwing the error is just "lazy"
initialization of the ProjectionInfo when inserting into bar.
So it is possible for the executor to try to run a plan that has
become invalid since it was created, so...
3. The need to add *back* the fields to store the RT indexes of
relations that are not looked at by ExecInitNode() traversal such as
root partitioned tables and non-leaf partitions.
I don't remember exactly why we removed those or what the benefit was,
so I'm not sure how big of a problem it is if we have to put them
back.
We removed those in commit 52ed730d511b after commit f2343653f5b2
removed redundant execution-time locking of non-leaf relations. So we
removed them because we realized that execution time locking is
unnecessary given that AcquireExecutorLocks() exists and now we want
to add them back because we'd like to get rid of
AcquireExecutorLocks(). :-)
My bias is to believe that getting rid of AcquireExecutorLocks() is
probably the right thing to do, but that's not a strongly-held
position and I could be totally wrong about it. The thing is, though,
that AcquireExecutorLocks() is fundamentally stupid, and it's hard to
see how it can ever be any smarter. If we want to make smarter
decisions about what to lock, it seems reasonable to me to think that
the locking code needs to be closer to code that can evaluate
expressions and prune partitions and stuff like that.
One perhaps crazy idea [1]:
What if we remove AcquireExecutorLocks() and move the responsibility
of taking the remaining necessary locks into the executor (those on
any inheritance children that are added during planning and thus not
accounted for by AcquirePlannerLocks()), like the patch already does,
but don't make it also check if the plan has become invalid, which it
can't do anyway unless it's from a CachedPlan. That means we instead
let the executor throw any errors that occur when trying to either
initialize the plan because of the changes that have occurred to the
objects referenced in the plan, like what is happening in the above
example. If that case is going to be rare anyway, why spend energy on
checking validity and replanning, especially if that's not an easy
thing to do as we're finding out. In the above example, we could say
that it's a user error to create a rule like that, so it should not
happen in practice, but when it does, the executor seems to deal with
it correctly by refusing to execute a broken plan. Perhaps it's more
worthwhile to make the executor behave correctly in the face of plan
invalidation than to teach the rest of the system to deal with the
executor throwing its hands up when it runs into an invalid plan?
Again, I think this may be a crazy line of thinking but just wanted to
get it out there.
--
Thanks, Amit Langote
[1]: I recall Michael Paquier mentioning something like this to me once when I was describing this patch and thread to him.
On Fri, Aug 16, 2024 at 8:36 AM Amit Langote <amitlangote09@gmail.com> wrote:
So it is possible for the executor to try to run a plan that has
become invalid since it was created, so...
I'm not sure what the "so what" here is.
One perhaps crazy idea [1]:
What if we remove AcquireExecutorLocks() and move the responsibility
of taking the remaining necessary locks into the executor (those on
any inheritance children that are added during planning and thus not
accounted for by AcquirePlannerLocks()), like the patch already does,
but don't make it also check if the plan has become invalid, which it
can't do anyway unless it's from a CachedPlan. That means we instead
let the executor throw any errors that occur when trying to either
initialize the plan because of the changes that have occurred to the
objects referenced in the plan, like what is happening in the above
example. If that case is going to be rare anyway, why spend energy on
checking validity and replanning, especially if that's not an easy
thing to do as we're finding out. In the above example, we could say
that it's a user error to create a rule like that, so it should not
happen in practice, but when it does, the executor seems to deal with
it correctly by refusing to execute a broken plan. Perhaps it's more
worthwhile to make the executor behave correctly in the face of plan
invalidation than to teach the rest of the system to deal with the
executor throwing its hands up when it runs into an invalid plan?
Again, I think this may be a crazy line of thinking but just wanted to
get it out there.
I don't know whether this is crazy or not. I think there are two
issues. One, the set of checks that we have right now might not be
complete, and we might just not have realized that because it happens
infrequently enough that we haven't found all the bugs. If that's so,
then a change like this could be a good thing, because it might force
us to fix stuff we should be fixing anyway. I have a feeling that some
of the checks you hit there were added as bug fixes long after the
code was written originally, so my confidence that we don't have more
bugs isn't especially high.
And two, it matters a lot how frequent the errors will be in practice.
I think we normally try to replan rather than let a stale plan be used
because we want to not fail, because users don't like failure. If the
design you propose here would make failures more (or less) frequent,
then that's a problem (or awesome).
--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
On Fri, Aug 16, 2024 at 8:36 AM Amit Langote <amitlangote09@gmail.com> wrote:
So it is possible for the executor to try to run a plan that has
become invalid since it was created, so...
I'm not sure what the "so what" here is.
The fact that there are holes in our protections against that doesn't
make it a good idea to walk away from the protections. That path
leads to crashes and data corruption and unhappy users.
What the examples here are showing is that AcquireExecutorLocks
is incomplete because it only provides defenses against DDL
initiated by other sessions, not by our own session. We have
CheckTableNotInUse but I'm not sure if it could be applied here.
We certainly aren't calling that in anywhere near as systematic
a way as we have for acquiring locks.
Maybe we should rethink the principle that a session's locks
never conflict against itself, although I fear that might be
a nasty can of worms.
Could it work to do CheckTableNotInUse when acquiring an
exclusive table lock? I don't doubt that we'd have to fix some
code paths, but if the damage isn't extensive then that
might offer a more nearly bulletproof approach.
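A minimal sketch of that idea, assuming a hypothetical wrapper used wherever we escalate to AccessExclusiveLock; CheckTableNotInUse() and LockRelationOid() exist today, the wrapper does not:
#include "postgres.h"
#include "commands/tablecmds.h"     /* CheckTableNotInUse() */
#include "storage/lmgr.h"           /* LockRelationOid() */
#include "utils/rel.h"              /* RelationGetRelid() */
/* Hypothetical: refuse an exclusive lock if this session still has the
 * relation in use (open refcounts or pending AFTER trigger events). */
static void
LockTableExclusiveChecked(Relation rel, const char *stmt)
{
    CheckTableNotInUse(rel, stmt);  /* errors out if the rel is in use */
    LockRelationOid(RelationGetRelid(rel), AccessExclusiveLock);
}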
regards, tom lane
On Mon, Aug 19, 2024 at 12:54 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
What the examples here are showing is that AcquireExecutorLocks
is incomplete because it only provides defenses against DDL
initiated by other sessions, not by our own session. We have
CheckTableNotInUse but I'm not sure if it could be applied here.
We certainly aren't calling that in anywhere near as systematic
a way as we have for acquiring locks.
Maybe we should rethink the principle that a session's locks
never conflict against itself, although I fear that might be
a nasty can of worms.
It might not be that bad. It could replace the CheckTableNotInUse()
protections that we have today but maybe cover more cases, and it
could do so without needing any changes to the shared lock manager.
Say every time you start a query you give that query an ID number, and
all locks taken by that query are tagged with that ID number in the
local lock table, and maybe some flags indicating why the lock was
taken. When a new lock acquisition comes along you can say "oh, this
lock was previously taken so that we could do thus-and-so" and then
use that to fail with the appropriate error message. That seems like
it might be more powerful than the refcnt check within
CheckTableNotInUse().
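Sketched out, with every name below being an assumption rather than existing code, the tagged local lock table entry and its conflict check might look like:
#include "postgres.h"
typedef enum LockTakenReason
{
    LOCK_FOR_PLANNING,
    LOCK_FOR_EXECUTION,
    LOCK_FOR_DDL
} LockTakenReason;
/* Extra per-LOCALLOCK bookkeeping, all hypothetical. */
typedef struct LocalLockTag
{
    uint64          query_id;   /* ID assigned when the query started */
    LockTakenReason reason;     /* why the lock was taken */
} LocalLockTag;
/* Hypothetical check run when a new acquisition finds an existing entry. */
static void
CheckSelfLockConflict(const LocalLockTag *existing,
                      uint64 query_id, LockTakenReason reason)
{
    if (existing != NULL &&
        existing->query_id != query_id &&
        existing->reason == LOCK_FOR_EXECUTION &&
        reason == LOCK_FOR_DDL)
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_IN_USE),
                 errmsg("table is being used by an active query in this session")));
}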
But that seems somewhat incidental to what this thread is about. IIUC,
Amit's original design involved having the plan cache call some new
executor function to do partition pruning before lock acquisition, and
then passing that data structure around, including back to the
executor, so that we didn't repeat the pruning we already did, which
would be a bad thing to do not only because it would incur CPU cost
but also because really bad things would happen if we got a different
answer the second time. IIUC, you didn't think that was going to work
out nicely, and suggested instead moving the pruning+locking to
ExecutorStart() time. But now Amit is finding problems with that
approach, because by the time we reach PortalRun() for the
PORTAL_MULTI_QUERY case, it's too late to replan, because we can't ask
the plancache to replan just one query from the list; and if we try to
fix that by moving ExecutorStart() to PortalStart(), then there are
other problems. Do you have a view on what the way forward might be?
This thread has gotten a tad depressing, honestly. All of the opinions
about what we ought to do seem to be based on the firm conviction that
X or Y or Z will not work, rather than on the confidence that A or B
or C will work. Yet I'm inclined to believe this problem is solvable.
--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
But that seems somewhat incidental to what this thread is about.
Perhaps. But if we're running into issues related to that, it might
be good to set aside the long-term goal for a bit and come up with
a cleaner answer for intra-session locking. That could allow the
pruning problem to be solved more cleanly in turn, and it'd be
an improvement even if not.
Do you have a view on what the way forward might be?
I'm fresh out of ideas at the moment, other than having a hope that
divide-and-conquer (ie, solving subproblems first) might pay off.
This thread has gotten a tad depressing, honestly. All of the opinions
about what we ought to do seem to be based on the firm conviction that
X or Y or Z will not work, rather than on the confidence that A or B
or C will work. Yet I'm inclined to believe this problem is solvable.
Yeah. We are working in an extremely not-green field here, which
means it's a lot easier to see pre-existing reasons why X will not
work than to have confidence that it will work. But hey, if this
were easy then we'd have done it already.
regards, tom lane
On Mon, Aug 19, 2024 at 1:52 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
But that seems somewhat incidental to what this thread is about.
Perhaps. But if we're running into issues related to that, it might
be good to set aside the long-term goal for a bit and come up with
a cleaner answer for intra-session locking. That could allow the
pruning problem to be solved more cleanly in turn, and it'd be
an improvement even if not.
Maybe, but the pieces aren't quite coming together for me. Solving
this would mean that if we execute a stale plan, we'd be more likely
to get a good error and less likely to get a bad, nasty-looking
internal error, or a crash. That's good on its own terms, but we don't
really want user queries to produce errors at all, so I don't think
we'd feel any more free to rearrange the order of operations than we
do today.
Do you have a view on what the way forward might be?
I'm fresh out of ideas at the moment, other than having a hope that
divide-and-conquer (ie, solving subproblems first) might pay off.
Fair enough, but why do you think that the original approach of
creating a data structure from within the plan cache mechanism
(probably via a call into some new executor entrypoint) and then
feeding that through to ExecutorRun() time can't work? Is it possible
you latched onto some non-optimal decisions that the early versions of
the patch made, rather than there being a fundamental problem with the
concept?
I actually thought the do-it-at-executorstart-time approach sounded
pretty good, even though we might have to abandon planstate tree
initialization partway through, right up until Amit started talking
about moving ExecutorStart() from PortalRun() to PortalStart(), which
I have a feeling is going to create a bigger problem than we can
solve. I think if we want to save that approach, we should try to
figure out if we can teach the plancache to replan one query from a
list without replanning the others, which seems like it might allow us
to keep the order of major operations unchanged. Otherwise, it makes
sense to me to have another go at the other approach, at least to make
sure we understand clearly why it can't work.
Yeah. We are working in an extremely not-green field here, which
means it's a lot easier to see pre-existing reasons why X will not
work than to have confidence that it will work. But hey, if this
were easy then we'd have done it already.
Yeah, true.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Aug 20, 2024 at 1:39 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Aug 16, 2024 at 8:36 AM Amit Langote <amitlangote09@gmail.com> wrote:
So it is possible for the executor to try to run a plan that has
become invalid since it was created, so...
I'm not sure what the "so what" here is.
I meant that if the executor has to deal with broken plans anyway, we
might as well lean into that fact rather than giving the cached-plan
case special handling. Yes, I understand that that's not
a good justification.
One perhaps crazy idea [1]:
What if we remove AcquireExecutorLocks() and move the responsibility
of taking the remaining necessary locks into the executor (those on
any inheritance children that are added during planning and thus not
accounted for by AcquirePlannerLocks()), like the patch already does,
but don't make it also check if the plan has become invalid, which it
can't do anyway unless it's from a CachedPlan. That means we instead
let the executor throw any errors that occur when trying to either
initialize the plan because of the changes that have occurred to the
objects referenced in the plan, like what is happening in the above
example. If that case is going to be rare anyway, why spend energy on
checking validity and replanning, especially if that's not an easy
thing to do as we're finding out. In the above example, we could say
that it's a user error to create a rule like that, so it should not
happen in practice, but when it does, the executor seems to deal with
it correctly by refusing to execute a broken plan. Perhaps it's more
worthwhile to make the executor behave correctly in the face of plan
invalidation than to teach the rest of the system to deal with the
executor throwing its hands up when it runs into an invalid plan?
Again, I think this may be a crazy line of thinking but just wanted to
get it out there.
I don't know whether this is crazy or not. I think there are two
issues. One, the set of checks that we have right now might not be
complete, and we might just not have realized that because it happens
infrequently enough that we haven't found all the bugs. If that's so,
then a change like this could be a good thing, because it might force
us to fix stuff we should be fixing anyway. I have a feeling that some
of the checks you hit there were added as bug fixes long after the
code was written originally, so my confidence that we don't have more
bugs isn't especially high.
This makes sense.
And two, it matters a lot how frequent the errors will be in practice.
I think we normally try to replan rather than let a stale plan be used
because we want to not fail, because users don't like failure. If the
design you propose here would make failures more (or less) frequent,
then that's a problem (or awesome).
I think we'd modify plancache.c to postpone the locking of only
prunable relations (i.e., partitions), so we're looking at only a
handful of concurrent modifications that are going to cause execution
errors. That's because we disallow many DDL modifications of
partitions unless they are done via recursion from the parent, so the
space of errors in practice would be smaller compared to if we were to
postpone *all* cached plan locks to ExecInitNode() time. DROP INDEX
a_partition_only_index comes to mind as something that might cause an
error. I've not tested if other partition-only constraints can cause
unsafe behaviors.
Perhaps we can add a check of CachedPlan.is_valid after every
table_open() and index_open() in the executor that takes a lock, or at
all the places we discussed previously, and throw an error (say:
"cached plan is no longer valid") if it's false. That's better than
running into some random error by soldiering ahead with the plan's
initialization / execution, but it is still a loss in terms of user
experience because we're adding a new failure mode, however rare.
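As a sketch of where such a check would sit, using ExecGetRangeTableRelation() and the patch set's ExecPlanStillValid(); the wrapper itself and the choice of errcode are assumptions:
static Relation
ExecOpenRangeTableRelationChecked(EState *estate, Index rti)
{
    /* Opens the relation, taking a lock if one isn't already held. */
    Relation    rel = ExecGetRangeTableRelation(estate, rti);
    /*
     * Taking that lock may have processed invalidation messages that
     * marked the CachedPlan invalid; fail cleanly instead of soldiering
     * ahead with a stale plan.
     */
    if (!ExecPlanStillValid(estate))
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                 errmsg("cached plan is no longer valid")));
    return rel;
}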
--
Thanks, Amit Langote
On Tue, Aug 20, 2024 at 3:21 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Aug 19, 2024 at 1:52 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
But that seems somewhat incidental to what this thread is about.
Perhaps. But if we're running into issues related to that, it might
be good to set aside the long-term goal for a bit and come up with
a cleaner answer for intra-session locking. That could allow the
pruning problem to be solved more cleanly in turn, and it'd be
an improvement even if not.
Maybe, but the pieces aren't quite coming together for me. Solving
this would mean that if we execute a stale plan, we'd be more likely
to get a good error and less likely to get a bad, nasty-looking
internal error, or a crash. That's good on its own terms, but we don't
really want user queries to produce errors at all, so I don't think
we'd feel any more free to rearrange the order of operations than we
do today.
Yeah, it's unclear whether executing a potentially stale plan is an
acceptable tradeoff compared to replanning, especially if it occurs
rarely. Personally, I'd prefer to treat it as acceptable.
Do you have a view on what the way forward might be?
I'm fresh out of ideas at the moment, other than having a hope that
divide-and-conquer (ie, solving subproblems first) might pay off.
Fair enough, but why do you think that the original approach of
creating a data structure from within the plan cache mechanism
(probably via a call into some new executor entrypoint) and then
feeding that through to ExecutorRun() time can't work?
That would be ExecutorStart(). The data structure need not be
referenced after ExecInitNode().
Is it possible
you latched onto some non-optimal decisions that the early versions of
the patch made, rather than there being a fundamental problem with the
concept?
I actually thought the do-it-at-executorstart-time approach sounded
pretty good, even though we might have to abandon planstate tree
initialization partway through, right up until Amit started talking
about moving ExecutorStart() from PortalRun() to PortalStart(), which
I have a feeling is going to create a bigger problem than we can
solve. I think if we want to save that approach, we should try to
figure out if we can teach the plancache to replan one query from a
list without replanning the others, which seems like it might allow us
to keep the order of major operations unchanged. Otherwise, it makes
sense to me to have another go at the other approach, at least to make
sure we understand clearly why it can't work.
+1
--
Thanks, Amit Langote
On Tue, Aug 20, 2024 at 9:00 AM Amit Langote <amitlangote09@gmail.com> wrote:
I think we'd modify plancache.c to postpone the locking of only
prunable relations (i.e., partitions), so we're looking at only a
handful of concurrent modifications that are going to cause execution
errors. That's because we disallow many DDL modifications of
partitions unless they are done via recursion from the parent, so the
space of errors in practice would be smaller compared to if we were to
postpone *all* cached plan locks to ExecInitNode() time. DROP INDEX
a_partition_only_index comes to mind as something that might cause an
error. I've not tested if other partition-only constraints can cause
unsafe behaviors.
This seems like a valid point to some extent, but in other contexts
we've had discussions about how we don't actually guarantee all that
much uniformity between a partitioned table and its partitions, and
it's been questioned whether we made the right decisions there. So I'm
not entirely sure that the surface area for problems here will be as
narrow as you're hoping -- I think we'd need to go through all of the
ALTER TABLE variants and think it through. But maybe the problems
aren't that bad.
It does seem like constraints can change the plan. Imagine the
partition had a CHECK(false) constraint before and now doesn't, or
something.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Aug 20, 2024 at 11:53 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Aug 20, 2024 at 9:00 AM Amit Langote <amitlangote09@gmail.com> wrote:
I think we'd modify plancache.c to postpone the locking of only
prunable relations (i.e., partitions), so we're looking at only a
handful of concurrent modifications that are going to cause execution
errors. That's because we disallow many DDL modifications of
partitions unless they are done via recursion from the parent, so the
space of errors in practice would be smaller compared to if we were to
postpone *all* cached plan locks to ExecInitNode() time. DROP INDEX
a_partition_only_index comes to mind as something that might cause an
error. I've not tested if other partition-only constraints can cause
unsafe behaviors.
This seems like a valid point to some extent, but in other contexts
we've had discussions about how we don't actually guarantee all that
much uniformity between a partitioned table and its partitions, and
it's been questioned whether we made the right decisions there. So I'm
not entirely sure that the surface area for problems here will be as
narrow as you're hoping -- I think we'd need to go through all of the
ALTER TABLE variants and think it through. But maybe the problems
aren't that bad.
Many changeable properties that are reflected in the RelationData of a
partition after getting the lock on it seem to cause no issues as long
as the executor code only looks at RelationData, which is true for
most Scan nodes. It also seems true for ModifyTable, which looks into
RelationData for relation properties relevant to inserts and deletes.
The two things that don't cope are:
* Index Scan nodes with concurrent DROP INDEX of partition-only indexes.
* Concurrent DROP CONSTRAINT of partition-only CHECK and NOT NULL
constraints can lead to incorrect results, as I describe below.
It does seem like constraints can change the plan. Imagine the
partition had a CHECK(false) constraint before and now doesn't, or
something.
Yeah, if the CHECK constraint gets dropped concurrently, any new rows
that got added after that will not be returned by executing a stale
cached plan, because the plan would have been created based on the
assumption that such rows shouldn't be there due to the CHECK
constraint. We currently don't explicitly check that the constraints
that were used during planning still exist before executing the plan.
Overall, I'm starting to feel less enthused about the idea of throwing an
error in the executor due to known and unknown hazards of trying to
execute a stale plan. Even if we made a note in the docs of such
hazards, any users who run into these rare errors are likely to head
to -bugs or -hackers anyway.
Tom said we should perhaps look at the hazards caused by intra-session
locking, but we'd still be left with the hazards of missing indexes and
constraints, AFAICS, due to DROPs from other sessions.
So, the options:
* The replanning aspect of the lock-in-the-executor design would be
simpler if a CachedPlan contained the plan for a single query rather
than a list of queries, as previously mentioned. This is particularly
due to the requirements of the PORTAL_MULTI_QUERY case. However, this
option might be impractical.
* Polish the patch for the old design of doing the initial pruning
before AcquireExecutorLocks() and focus on hashing out any bugs and
issues of that design.
--
Thanks, Amit Langote
On Wed, Aug 21, 2024 at 8:45 AM Amit Langote <amitlangote09@gmail.com> wrote:
* The replanning aspect of the lock-in-the-executor design would be
simpler if a CachedPlan contained the plan for a single query rather
than a list of queries, as previously mentioned. This is particularly
due to the requirements of the PORTAL_MULTI_QUERY case. However, this
option might be impractical.
It might be, but maybe it would be worth a try? I mean,
GetCachedPlan() seems to just call pg_plan_queries() which just loops
over the list of query trees and does the same thing for each one. If
we wanted to replan a single query, why couldn't we do
fake_querytree_list = list_make1(list_nth(querytree_list, n)) and then
call pg_plan_queries(fake_querytree_list)? Or something equivalent to
that. We could have a new GetCachedSinglePlan(cplan, n) to do this.
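For illustration, a bare-bones sketch under those assumptions; it returns a bare PlannedStmt and skips plancache.c's revalidation, refcounting, and memory-context bookkeeping, all of which the real thing would need:
#include "postgres.h"
#include "nodes/plannodes.h"
#include "tcop/tcopprot.h"          /* pg_plan_queries() */
#include "utils/plancache.h"
/* Hypothetical: plan only the nth query of a CachedPlanSource. */
PlannedStmt *
GetCachedSinglePlan(CachedPlanSource *plansource, int n,
                    ParamListInfo boundParams)
{
    /* Copy, because the planner scribbles on its input Query. */
    Query      *query = copyObject(list_nth_node(Query, plansource->query_list, n));
    List       *fake_querytree_list = list_make1(query);
    List       *stmt_list;
    stmt_list = pg_plan_queries(fake_querytree_list,
                                plansource->query_string,
                                plansource->cursor_options,
                                boundParams);
    return linitial_node(PlannedStmt, stmt_list);
}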
* Polish the patch for the old design of doing the initial pruning
before AcquireExecutorLocks() and focus on hashing out any bugs and
issues of that design.
That's also an option. It probably has issues too, but I don't know
what they are exactly.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, Aug 21, 2024 at 10:10 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Aug 21, 2024 at 8:45 AM Amit Langote <amitlangote09@gmail.com> wrote:
* The replanning aspect of the lock-in-the-executor design would be
simpler if a CachedPlan contained the plan for a single query rather
than a list of queries, as previously mentioned. This is particularly
due to the requirements of the PORTAL_MULTI_QUERY case. However, this
option might be impractical.
It might be, but maybe it would be worth a try? I mean,
GetCachedPlan() seems to just call pg_plan_queries() which just loops
over the list of query trees and does the same thing for each one. If
we wanted to replan a single query, why couldn't we do
fake_querytree_list = list_make1(list_nth(querytree_list, n)) and then
call pg_plan_queries(fake_querytree_list)? Or something equivalent to
that. We could have a new GetCachedSinglePlan(cplan, n) to do this.
I've been hacking to prototype this, and it's showing promise. It
helps make the replan loop at the call sites that start the executor
with an invalidatable plan more localized and less prone to
action-at-a-distance issues. However, the interface and contract of
the new function in my prototype are pretty specialized for the replan
loop in this context—meaning it's not as general-purpose as
GetCachedPlan(). Essentially, what you get when you call it is a
'throwaway' CachedPlan containing only the plan for the query that
failed during ExecutorStart(), not a plan integrated into the original
CachedPlanSource's stmt_list. A call site entering the replan loop
will retry the execution with that throwaway plan, release it once
done, and resume looping over the plans in the original list. The
invalid plan that remains in the original list will be discarded and
replanned in the next call to GetCachedPlan() using the same
CachedPlanSource. While that may sound undesirable, I'm inclined to
think it's not something that needs optimization, given that we're
expecting this code path to be taken rarely.
I'll post a version of a revamped locks-in-the-executor patch set
using the above function after debugging some more.
--
Thanks, Amit Langote
On Fri, Aug 23, 2024 at 9:48 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Wed, Aug 21, 2024 at 10:10 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Aug 21, 2024 at 8:45 AM Amit Langote <amitlangote09@gmail.com> wrote:
* The replanning aspect of the lock-in-the-executor design would be
simpler if a CachedPlan contained the plan for a single query rather
than a list of queries, as previously mentioned. This is particularly
due to the requirements of the PORTAL_MULTI_QUERY case. However, this
option might be impractical.
It might be, but maybe it would be worth a try? I mean,
GetCachedPlan() seems to just call pg_plan_queries() which just loops
over the list of query trees and does the same thing for each one. If
we wanted to replan a single query, why couldn't we do
fake_querytree_list = list_make1(list_nth(querytree_list, n)) and then
call pg_plan_queries(fake_querytree_list)? Or something equivalent to
that. We could have a new GetCachedSinglePlan(cplan, n) to do this.
I've been hacking to prototype this, and it's showing promise. It
helps make the replan loop at the call sites that start the executor
with an invalidatable plan more localized and less prone to
action-at-a-distance issues. However, the interface and contract of
the new function in my prototype are pretty specialized for the replan
loop in this context—meaning it's not as general-purpose as
GetCachedPlan(). Essentially, what you get when you call it is a
'throwaway' CachedPlan containing only the plan for the query that
failed during ExecutorStart(), not a plan integrated into the original
CachedPlanSource's stmt_list. A call site entering the replan loop
will retry the execution with that throwaway plan, release it once
done, and resume looping over the plans in the original list. The
invalid plan that remains in the original list will be discarded and
replanned in the next call to GetCachedPlan() using the same
CachedPlanSource. While that may sound undesirable, I'm inclined to
think it's not something that needs optimization, given that we're
expecting this code path to be taken rarely.
I'll post a version of a revamped locks-in-the-executor patch set
using the above function after debugging some more.
Here it is.
0001 implements changes to defer the locking of runtime-prunable
relations to the executor. The new design introduces a bitmapset
field in PlannedStmt to distinguish at runtime between relations that
are prunable whose locking can be deferred until ExecInitNode() and
those that are not and must be locked in advance. The set of prunable
relations can be constructed by looking at all the PartitionPruneInfos
in the plan and checking which are subject to "initial" pruning steps.
The set of unprunable relations is obtained by subtracting those from
the set of all RT indexes. This design gets rid of one annoying
aspect of the old design, which was the need to add specialized fields
to store the RT indexes of partitioned relations that are not
otherwise referenced in the plan tree. That was necessary because in
the old design, I had removed the function AcquireExecutorLocks()
altogether to defer the locking of all child relations to execution.
In the new design such relations are still locked by
AcquireExecutorLocks().
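The subtraction step might look like the following sketch, where only the List and Bitmapset calls are existing API and the function itself is illustrative:
static Bitmapset *
compute_unprunable_relids(PlannedStmt *stmt, Bitmapset *prunable_relids)
{
    Bitmapset  *all_relids = NULL;
    /* RT indexes are 1-based. */
    for (int rti = 1; rti <= list_length(stmt->rtable); rti++)
        all_relids = bms_add_member(all_relids, rti);
    /* Whatever is not subject to initial pruning is locked up front. */
    return bms_del_members(all_relids, prunable_relids);
}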
0002 is the old patch to make ExecEndNode() robust against partially
initialized PlanState nodes by adding NULL checks.
0003 is the patch to add changes to deal with the CachedPlan becoming
invalid before the deferred locks on prunable relations are taken.
I've moved the replan loop into a new wrapper-over-ExecutorStart()
function instead of having the same logic at multiple sites. The
replan logic uses the GetSingleCachedPlan() described in the quoted
text. The callers of the new ExecutorStart()-wrapper, which I've
dubbed ExecutorStartExt(), need to pass the CachedPlanSource and a
query_index, which is the index of the query being executed in the
list CachedPlanSource.query_list. They are needed by
GetSingleCachedPlan(). The changes outside the executor are pretty
minimal in this design and all the difficulties of having to loop back
to GetCachedPlan() are now gone. I like how this turned out.
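The overall shape of that wrapper, going only by the description above; the cplan field on QueryDesc and the GetSingleCachedPlan() signature are guesses, not taken from the patch:
void
ExecutorStartExt(QueryDesc *queryDesc, int eflags,
                 CachedPlanSource *plansource, int query_index)
{
    /*
     * With the patch, ExecutorStart() returns false when the CachedPlan
     * was invalidated by a lock taken during plan initialization.
     */
    while (!ExecutorStart(queryDesc, queryDesc->cplan, eflags))
    {
        CachedPlan *replan;
        /* Build a throwaway plan for just this query and retry. */
        replan = GetSingleCachedPlan(plansource, query_index);
        queryDesc->cplan = replan;                  /* assumed field */
        queryDesc->plannedstmt = linitial_node(PlannedStmt,
                                               replan->stmt_list);
    }
}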
One idea that I think might be worth trying to reduce the footprint of
0003 is to try to lock the prunable relations in a step of InitPlan()
separate from ExecInitNode(), which can be implemented by doing the
initial runtime pruning in that separate step. That way, we'll have
all the necessary locks before calling ExecInitNode() and so we don't
need to sprinkle the CachedPlanStillValid() checks all over the place
and worry about missed checks and dealing with partially initialized
PlanState trees.
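Roughly, such a step could look like the sketch below; ExecDoInitialPruning() is an assumed helper, while partPruneInfos, bms_next_member(), ExecGetRangeTableRelation(), and ExecPlanStillValid() come from the tree and the patch set:
/* Returns false if a lock invalidated the plan; caller aborts and replans. */
static bool
ExecLockPrunableRelations(EState *estate, PlannedStmt *stmt)
{
    ListCell   *lc;
    foreach(lc, stmt->partPruneInfos)
    {
        PartitionPruneInfo *pinfo = lfirst_node(PartitionPruneInfo, lc);
        /* Assumed helper: perform initial pruning and return the RT
         * indexes of the surviving partitions. */
        Bitmapset  *survived = ExecDoInitialPruning(estate, pinfo);
        int         rti = -1;
        while ((rti = bms_next_member(survived, rti)) >= 0)
        {
            ExecGetRangeTableRelation(estate, rti);     /* opens and locks */
            if (!ExecPlanStillValid(estate))
                return false;
        }
    }
    return true;
}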
--
Thanks, Amit Langote
Attachments:
v51-0003-Handle-CachedPlan-invalidation-in-the-executor.patch (application/octet-stream)
From 887627ec4455a70a716ce56f386f71df953cdf64 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Thu, 22 Aug 2024 19:38:13 +0900
Subject: [PATCH v51 3/3] Handle CachedPlan invalidation in the executor
This commit makes changes to handle cases where a cached plan
becomes invalid before deferred locks on prunable relations are taken.
* Add checks at various points in ExecutorStart() and its called
functions to determine whether the plan has become invalid. If so,
the function and its callers return immediately. A previous commit
ensures any partially initialized PlanState tree objects are cleaned
up appropriately.
* Introduce ExecutorStartExt(), a wrapper over ExecutorStart(), to
handle cases where plan initialization is aborted due to invalidation.
ExecutorStartExt() creates a new transient CachedPlan if needed and
retries execution. This new entry point is only required for sites
using plancache.c. It requires passing the QueryDesc, eflags,
CachedPlanSource, and query_index (index in CachedPlanSource.query_list).
* Add GetSingleCachedPlan() in plancache.c to create a transient
CachedPlan for a specified query in the given CachedPlanSource.
This also adds isolation tests using the delay_execution test module
to verify scenarios where a CachedPlan becomes invalid before the
deferred locks are taken.
All ExecutorStart_hook implementations now must add the following
block after the ExecutorStart() call to ensure it doesn't work with an
invalid plan:
/* The plan may have become invalid during ExecutorStart() */
if (!ExecPlanStillValid(queryDesc->estate))
return;
Reviewed-by: Robert Haas
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
contrib/auto_explain/auto_explain.c | 4 +
.../pg_stat_statements/pg_stat_statements.c | 4 +
contrib/postgres_fdw/postgres_fdw.c | 36 +++-
src/backend/commands/explain.c | 8 +-
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 10 +-
src/backend/commands/trigger.c | 14 ++
src/backend/executor/README | 32 +++-
src/backend/executor/execMain.c | 91 ++++++++-
src/backend/executor/execParallel.c | 4 +-
src/backend/executor/execPartition.c | 10 +
src/backend/executor/execProcnode.c | 7 +
src/backend/executor/execUtils.c | 42 ++++-
src/backend/executor/nodeAgg.c | 2 +
src/backend/executor/nodeAppend.c | 12 +-
src/backend/executor/nodeBitmapAnd.c | 2 +
src/backend/executor/nodeBitmapHeapscan.c | 4 +
src/backend/executor/nodeBitmapIndexscan.c | 6 +-
src/backend/executor/nodeBitmapOr.c | 2 +
src/backend/executor/nodeCustom.c | 2 +
src/backend/executor/nodeForeignscan.c | 4 +
src/backend/executor/nodeGather.c | 2 +
src/backend/executor/nodeGatherMerge.c | 2 +
src/backend/executor/nodeGroup.c | 2 +
src/backend/executor/nodeHash.c | 2 +
src/backend/executor/nodeHashjoin.c | 4 +
src/backend/executor/nodeIncrementalSort.c | 2 +
src/backend/executor/nodeIndexonlyscan.c | 7 +-
src/backend/executor/nodeIndexscan.c | 8 +-
src/backend/executor/nodeLimit.c | 2 +
src/backend/executor/nodeLockRows.c | 2 +
src/backend/executor/nodeMaterial.c | 2 +
src/backend/executor/nodeMemoize.c | 2 +
src/backend/executor/nodeMergeAppend.c | 6 +-
src/backend/executor/nodeMergejoin.c | 4 +
src/backend/executor/nodeModifyTable.c | 13 ++
src/backend/executor/nodeNestloop.c | 4 +
src/backend/executor/nodeProjectSet.c | 2 +
src/backend/executor/nodeRecursiveunion.c | 4 +
src/backend/executor/nodeResult.c | 2 +
src/backend/executor/nodeSamplescan.c | 3 +
src/backend/executor/nodeSeqscan.c | 3 +
src/backend/executor/nodeSetOp.c | 2 +
src/backend/executor/nodeSort.c | 2 +
src/backend/executor/nodeSubqueryscan.c | 2 +
src/backend/executor/nodeTidrangescan.c | 2 +
src/backend/executor/nodeTidscan.c | 2 +
src/backend/executor/nodeUnique.c | 2 +
src/backend/executor/nodeWindowAgg.c | 2 +
src/backend/executor/spi.c | 19 +-
src/backend/tcop/postgres.c | 4 +-
src/backend/tcop/pquery.c | 31 +++-
src/backend/utils/cache/plancache.c | 50 +++++
src/backend/utils/mmgr/portalmem.c | 4 +-
src/include/commands/explain.h | 1 +
src/include/commands/trigger.h | 1 +
src/include/executor/execdesc.h | 1 +
src/include/executor/executor.h | 18 ++
src/include/nodes/execnodes.h | 1 +
src/include/utils/plancache.h | 18 ++
src/include/utils/portal.h | 4 +-
src/test/modules/delay_execution/Makefile | 3 +-
.../modules/delay_execution/delay_execution.c | 63 ++++++-
.../expected/cached-plan-inval.out | 175 ++++++++++++++++++
src/test/modules/delay_execution/meson.build | 1 +
.../specs/cached-plan-inval.spec | 65 +++++++
66 files changed, 790 insertions(+), 58 deletions(-)
create mode 100644 src/test/modules/delay_execution/expected/cached-plan-inval.out
create mode 100644 src/test/modules/delay_execution/specs/cached-plan-inval.spec
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index 677c135f59..3675ce9a88 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -300,6 +300,10 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
else
standard_ExecutorStart(queryDesc, eflags);
+ /* The plan may have become invalid during ExecutorStart() */
+ if (!ExecPlanStillValid(queryDesc->estate))
+ return;
+
if (auto_explain_enabled())
{
/*
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 362d222f63..98a328b79f 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -992,6 +992,10 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
else
standard_ExecutorStart(queryDesc, eflags);
+ /* The plan may have become invalid during ExecutorStart() */
+ if (!ExecPlanStillValid(queryDesc->estate))
+ return;
+
/*
* If query has queryId zero, don't track it. This prevents double
* counting of optimizable statements that are directly contained in
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index adc62576d1..65f4ffe5ee 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2144,7 +2144,11 @@ postgresEndForeignModify(EState *estate,
{
PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
- /* If fmstate is NULL, we are in EXPLAIN; nothing to do */
+ /*
+ * fmstate could be NULL under two conditions: during an EXPLAIN
+ * operation or if BeginForeignModify() hasn't been invoked.
+ * In either case, no action is required.
+ */
if (fmstate == NULL)
return;
@@ -2650,8 +2654,9 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
{
ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan;
EState *estate = node->ss.ps.state;
+ Relation rel = node->ss.ss_currentRelation;
PgFdwDirectModifyState *dmstate;
- Index rtindex;
+ Index rtindex = node->resultRelInfo->ri_RangeTableIndex;
Oid userid;
ForeignTable *table;
UserMapping *user;
@@ -2663,24 +2668,32 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
return;
+ /*
+ * Open the foreign table using the RT index given in the ResultRelInfo if
+ * the ScanState doesn't provide it. If the plan becomes invalid as a
+ * result of taking a lock in ExecOpenScanRelation(), return without doing
+ * anything more, leaving node->fdw_state NULL.
+ */
+ if (rel == NULL)
+ {
+ Assert(fsplan->scan.scanrelid == 0);
+ rel = ExecOpenScanRelation(estate, rtindex, eflags);
+ if (unlikely(rel == NULL || !ExecPlanStillValid(estate)))
+ return;
+ }
+
/*
* We'll save private state in node->fdw_state.
*/
dmstate = (PgFdwDirectModifyState *) palloc0(sizeof(PgFdwDirectModifyState));
node->fdw_state = (void *) dmstate;
+ dmstate->rel = rel;
/*
* Identify which user to do the remote access as. This should match what
* ExecCheckPermissions() does.
*/
userid = OidIsValid(fsplan->checkAsUser) ? fsplan->checkAsUser : GetUserId();
-
- /* Get info about foreign table. */
- rtindex = node->resultRelInfo->ri_RangeTableIndex;
- if (fsplan->scan.scanrelid == 0)
- dmstate->rel = ExecOpenScanRelation(estate, rtindex, eflags);
- else
- dmstate->rel = node->ss.ss_currentRelation;
table = GetForeignTable(RelationGetRelid(dmstate->rel));
user = GetUserMapping(userid, table->serverid);
@@ -2811,7 +2824,10 @@ postgresEndDirectModify(ForeignScanState *node)
{
PgFdwDirectModifyState *dmstate = (PgFdwDirectModifyState *) node->fdw_state;
- /* if dmstate is NULL, we are in EXPLAIN; nothing to do */
+ /*
+ * Nothing to do if dmstate is NULL, either because we are in EXPLAIN or
+ * because plan initialization was aborted before dmstate was set up.
+ */
if (dmstate == NULL)
return;
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index a83ea07db1..a7643360a7 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -507,7 +507,8 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, NULL, -1, into, es, queryString, params,
+ queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
}
@@ -616,6 +617,7 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
*/
void
ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
+ CachedPlanSource *plansource, int query_index,
IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
@@ -686,8 +688,8 @@ ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
if (into)
eflags |= GetIntoRelEFlags(into);
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, eflags);
+ /* Call ExecutorStartExt to prepare the plan for execution. */
+ ExecutorStartExt(queryDesc, eflags, plansource, query_index);
/* Execute the plan for statistics if asked for */
if (es->analyze)
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 4f6acf6719..4b1503c05e 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -107,6 +107,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ NULL,
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 311b9ebd5b..4cd79a6e3a 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -202,7 +202,8 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
- cplan);
+ cplan,
+ entry->plansource);
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
@@ -583,6 +584,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
MemoryContextCounters mem_counters;
MemoryContext planner_ctx = NULL;
MemoryContext saved_ctx = NULL;
+ int i = 0;
if (es->memory)
{
@@ -655,8 +657,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, cplan, into, es, query_string, paramLI,
- queryEnv,
+ ExplainOnePlan(pstmt, cplan, entry->plansource, i,
+ into, es, query_string, paramLI, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
else
@@ -668,6 +670,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Separate plans with an appropriate separator */
if (lnext(plan_list, p) != NULL)
ExplainSeparatePlans(es);
+
+ i++;
}
if (estate)
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 170360edda..91e4b821a0 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -5119,6 +5119,20 @@ AfterTriggerEndQuery(EState *estate)
afterTriggers.query_depth--;
}
+/* ----------
+ * AfterTriggerAbortQuery()
+ *
+ * Called by ExecutorEnd() if the query execution was aborted due to the
+ * plan becoming invalid during initialization.
+ * ----------
+ */
+void
+AfterTriggerAbortQuery(void)
+{
+ /* Revert the actions of AfterTriggerBeginQuery(). */
+ afterTriggers.query_depth--;
+}
+
/*
* AfterTriggerFreeQuery
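To make the AFTER-trigger bookkeeping concrete, here is a minimal sketch, not
part of the patch, of the pairing that AfterTriggerAbortQuery() establishes;
run_query_with_triggers and the aborted flag are hypothetical names:

static void
run_query_with_triggers(EState *estate, bool aborted)
{
	/*
	 * Every AfterTriggerBeginQuery() must now be balanced either by
	 * AfterTriggerEndQuery() on normal completion or by
	 * AfterTriggerAbortQuery() when executor startup abandoned the plan.
	 */
	AfterTriggerBeginQuery();

	if (aborted)
		AfterTriggerAbortQuery();	/* simply reverts query_depth++ */
	else
		AfterTriggerEndQuery(estate);
}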
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 642d63be61..e583df5be0 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -280,6 +280,28 @@ are typically reset to empty once per tuple. Per-tuple contexts are usually
associated with ExprContexts, and commonly each PlanState node has its own
ExprContext to evaluate its qual and targetlist expressions in.
+Relation Locking
+----------------
+
+Typically, when the executor initializes a plan tree for execution, it doesn't
+lock non-index relations if the plan tree is freshly generated and not derived
+from a CachedPlan. This is because such locks have already been established
+during the query's parsing, rewriting, and planning phases. However, with a
+cached plan tree, some relations may remain unlocked. The function
+AcquireExecutorLocks() only locks unprunable relations in the plan, deferring
+the locking of prunable ones to executor initialization. This avoids
+unnecessary locking of relations that will be pruned during "initial" runtime
+pruning in the ExecInitNode() routine of nodes containing the pruning info.
+
+This approach creates a window where a cached plan tree with child tables
+could become outdated if another backend modifies these tables before
+ExecInitNode() locks them. As a result, the executor has the added duty to
+verify the plan tree's validity whenever it locks a child table during
+initial pruning at executor startup. This validation is done by checking the
+CachedPlan.is_valid attribute. If the plan tree is outdated (is_valid=false),
+the executor halts further initialization, cleans up the partially initialized
+PlanState tree, and retries execution after creating a new transient
+CachedPlan.
Query Processing Control Flow
-----------------------------
@@ -288,7 +310,7 @@ This is a sketch of control flow for full query processing:
CreateQueryDesc
- ExecutorStart
+ ExecutorStart or ExecutorStartExt
CreateExecutorState
creates per-query context
switch to per-query context to run ExecInitNode
@@ -316,7 +338,13 @@ This is a sketch of control flow for full query processing:
FreeQueryDesc
-Per above comments, it's not really critical for ExecEndNode to free any
+As mentioned in the "Relation Locking" section, if the plan tree is found to
+be stale during one of the recursive calls of ExecInitNode() after taking a
+lock on a child table, control is immediately returned to
+ExecutorStartExt(), which will create a new plan tree and perform the
+steps starting from CreateExecutorState() again.
+
+Per above comments, it's not really critical for ExecEndPlan to free any
memory; it'll all go away in FreeExecutorState anyway. However, we do need to
be careful to close relations, drop buffer pins, etc, so we do need to scan
the plan state tree to find these sorts of resources.
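The per-node changes further down in the patch all follow the single
convention this README text describes. Roughly, as a sketch rather than
verbatim code from any one node:

	/*
	 * After any step that may take a fresh lock -- opening a scan relation
	 * or initializing a child node -- bail out if the CachedPlan has gone
	 * stale.  The partially initialized PlanState is still returned so
	 * that standard cleanup can close whatever was opened so far.
	 */
	outerPlanState(state) = ExecInitNode(outerPlan(node), estate, eflags);
	if (unlikely(!ExecPlanStillValid(estate)))
		return state;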
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 0f6dbd1e2b..92e0c9af9e 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -58,6 +58,7 @@
#include "utils/backend_status.h"
#include "utils/lsyscache.h"
#include "utils/partcache.h"
+#include "utils/plancache.h"
#include "utils/rls.h"
#include "utils/snapmgr.h"
@@ -133,6 +134,52 @@ ExecutorStart(QueryDesc *queryDesc, int eflags)
standard_ExecutorStart(queryDesc, eflags);
}
+/*
+ * A variant of ExecutorStart() that handles cleanup and replanning if the
+ * input CachedPlan becomes invalid due to locks being taken during
+ * ExecutorStart(). If that happens, a new CachedPlan is created
+ * only for the query at index 'query_index' in plansource->query_list,
+ * which is released separately from the original CachedPlan.
+ */
+void
+ExecutorStartExt(QueryDesc *queryDesc, int eflags,
+ CachedPlanSource *plansource,
+ int query_index)
+{
+ if (queryDesc->cplan == NULL)
+ ExecutorStart(queryDesc, eflags);
+ else
+ {
+ while (1)
+ {
+ ExecutorStart(queryDesc, eflags);
+ if (!CachedPlanStillValid(queryDesc->cplan))
+ {
+ CachedPlan *cplan_new;
+
+ /*
+ * Mark execution as aborted to ensure that AFTER trigger
+ * state is properly reset.
+ */
+ queryDesc->estate->es_aborted = true;
+
+ ExecutorEnd(queryDesc);
+
+ cplan_new = GetSingleCachedPlan(plansource, query_index,
+ queryDesc->params,
+ queryDesc->queryEnv);
+ Assert(list_length(cplan_new->stmt_list) == 1);
+ queryDesc->cplan = cplan_new;
+ queryDesc->release_cplan = true;
+ queryDesc->plannedstmt = linitial_node(PlannedStmt,
+ cplan_new->stmt_list);
+ }
+ else
+ break;
+ }
+ }
+}
+
void
standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
@@ -316,6 +363,7 @@ standard_ExecutorRun(QueryDesc *queryDesc,
estate = queryDesc->estate;
Assert(estate != NULL);
+ Assert(!estate->es_aborted);
Assert(!(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
/* caller must ensure the query's snapshot is active */
@@ -422,8 +470,11 @@ standard_ExecutorFinish(QueryDesc *queryDesc)
Assert(estate != NULL);
Assert(!(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
- /* This should be run once and only once per Executor instance */
- Assert(!estate->es_finished);
+ /*
+ * This should be run once and only once per Executor instance and never
+ * if the execution was aborted.
+ */
+ Assert(!estate->es_finished && !estate->es_aborted);
/* Switch into per-query memory context */
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
@@ -482,11 +533,10 @@ standard_ExecutorEnd(QueryDesc *queryDesc)
Assert(estate != NULL);
/*
- * Check that ExecutorFinish was called, unless in EXPLAIN-only mode. This
- * Assert is needed because ExecutorFinish is new as of 9.1, and callers
- * might forget to call it.
+ * Check that ExecutorFinish was called, unless in EXPLAIN-only mode or if
+ * execution was aborted.
*/
- Assert(estate->es_finished ||
+ Assert(estate->es_finished || estate->es_aborted ||
(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
/*
@@ -500,6 +550,14 @@ standard_ExecutorEnd(QueryDesc *queryDesc)
UnregisterSnapshot(estate->es_snapshot);
UnregisterSnapshot(estate->es_crosscheck_snapshot);
+ /*
+ * Reset AFTER trigger module if the query execution was aborted.
+ */
+ if (estate->es_aborted &&
+ !(estate->es_top_eflags &
+ (EXEC_FLAG_SKIP_TRIGGERS | EXEC_FLAG_EXPLAIN_ONLY)))
+ AfterTriggerAbortQuery();
+
/*
* Must switch out of context before destroying it
*/
@@ -832,7 +890,6 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
PreventCommandIfParallelMode(CreateCommandName((Node *) plannedstmt));
}
-
/* ----------------------------------------------------------------
* InitPlan
*
@@ -897,6 +954,9 @@ InitPlan(QueryDesc *queryDesc, int eflags)
case ROW_MARK_KEYSHARE:
case ROW_MARK_REFERENCE:
relation = ExecGetRangeTableRelation(estate, rc->rti);
+ if (unlikely(relation == NULL ||
+ !ExecPlanStillValid(estate)))
+ return;
break;
case ROW_MARK_COPY:
/* no physical table access is required */
@@ -967,6 +1027,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_subplanstates = lappend(estate->es_subplanstates,
subplanstate);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return;
i++;
}
@@ -977,6 +1039,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
* processing tuples.
*/
planstate = ExecInitNode(plan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return;
/*
* Get the tuple descriptor describing the type of tuples to return.
@@ -2858,6 +2922,7 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
rcestate->es_rowmarks = parentestate->es_rowmarks;
rcestate->es_rteperminfos = parentestate->es_rteperminfos;
rcestate->es_plannedstmt = parentestate->es_plannedstmt;
+ rcestate->es_cachedplan = parentestate->es_cachedplan;
rcestate->es_junkFilter = parentestate->es_junkFilter;
rcestate->es_output_cid = parentestate->es_output_cid;
rcestate->es_queryEnv = parentestate->es_queryEnv;
@@ -2936,6 +3001,14 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
subplanstate = ExecInitNode(subplan, rcestate, 0);
rcestate->es_subplanstates = lappend(rcestate->es_subplanstates,
subplanstate);
+
+ /*
+ * All necessary locks should have been taken when initializing the
+ * parent's copy of subplanstate, so the CachedPlan, if any, should
+ * not have become invalid during the above ExecInitNode().
+ */
+ if (!ExecPlanStillValid(rcestate))
+ elog(ERROR, "unexpected failure to initialize subplan in EvalPlanQualStart()");
}
/*
@@ -2977,6 +3050,10 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
*/
epqstate->recheckplanstate = ExecInitNode(planTree, rcestate, 0);
+ /* See the comment above. */
+ if (!ExecPlanStillValid(rcestate))
+ elog(ERROR, "unexpected failure to initialize main plantree in EvalPlanQualStart()");
+
MemoryContextSwitchTo(oldcontext);
}
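Callers that run statements out of a CachedPlanSource are switched from
ExecutorStart() to ExecutorStartExt() later in the patch (ProcessQuery(),
_SPI_pquery(), PortalStart(), ExplainOnePlan()). A sketch of the two call
shapes, assuming nothing beyond what ExecutorStartExt() itself checks:

	/*
	 * With a CachedPlan behind the QueryDesc: pass the CachedPlanSource and
	 * the statement's index in plansource->query_list, so that only this
	 * statement is replanned if its generic plan is invalidated.
	 */
	ExecutorStartExt(queryDesc, eflags, plansource, query_index);

	/*
	 * Without one (queryDesc->cplan == NULL), the extra arguments are
	 * unused and the call degenerates to plain ExecutorStart().
	 */
	ExecutorStartExt(queryDesc, eflags, NULL, -1);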
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 03b48e12b4..2017433c64 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1263,9 +1263,7 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
* if it should take locks on certain relations, but parallel workers
* always take locks anyway.
*/
- return CreateQueryDesc(pstmt,
- NULL,
- queryString,
+ return CreateQueryDesc(pstmt, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
}
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 7651886229..38cd97b59c 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1794,6 +1794,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* If subplans are indeed pruned, subplan_map arrays contained in the returned
* PartitionPruneState are re-sequenced to not count those, though only if the
* maps will be needed for subsequent execution pruning passes.
+ *
+ * Returns NULL if the plan has become invalid after taking the locks to
+ * create the PartitionPruneState in CreatePartitionPruneState().
*/
PartitionPruneState *
ExecInitPartitionPruning(PlanState *planstate,
@@ -1809,6 +1812,8 @@ ExecInitPartitionPruning(PlanState *planstate,
/* Create the working data structure for pruning */
prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Perform an initial partition prune pass, if required.
@@ -1860,6 +1865,9 @@ ExecInitPartitionPruning(PlanState *planstate,
* stored in each PartitionedRelPruningData can be re-used each time we
* re-evaluate which partitions match the pruning steps provided in each
* PartitionedRelPruneInfo.
+ *
+ * Returns NULL if the plan has become invalid after taking a lock to create
+ * a PartitionedRelPruningData.
*/
static PartitionPruneState *
CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
@@ -1935,6 +1943,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* duration of this executor run.
*/
partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+ if (unlikely(partrel == NULL || !ExecPlanStillValid(estate)))
+ return NULL;
partkey = RelationGetPartitionKey(partrel);
partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
partrel);
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 34f28dfece..7689d34dd0 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -136,6 +136,10 @@ static bool ExecShutdownNode_walker(PlanState *node, void *context);
* 'eflags' is a bitwise OR of flag bits described in executor.h
*
* Returns a PlanState node corresponding to the given Plan node.
+ *
+ * Callers should check upon return that ExecPlanStillValid(estate)
+ * returns true before processing the result any further, because the
+ * returned PlanState might be only partially initialized otherwise.
* ------------------------------------------------------------------------
*/
PlanState *
@@ -388,6 +392,9 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
break;
}
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return result;
+
ExecSetExecProcNode(result, result->ExecProcNode);
/*
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 6dfd5a26b7..39b388e6b4 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -146,6 +146,7 @@ CreateExecutorState(void)
estate->es_top_eflags = 0;
estate->es_instrument = 0;
estate->es_finished = false;
+ estate->es_aborted = false;
estate->es_exprcontexts = NIL;
@@ -691,6 +692,8 @@ ExecRelationIsTargetRelation(EState *estate, Index scanrelid)
*
* Open the heap relation to be scanned by a base-level scan plan node.
* This should be called during the node's ExecInit routine.
+ *
+ * NULL is returned if the relation is found to have been dropped.
* ----------------------------------------------------------------
*/
Relation
@@ -700,6 +703,8 @@ ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags)
/* Open the relation. */
rel = ExecGetRangeTableRelation(estate, scanrelid);
+ if (unlikely(rel == NULL || !ExecPlanStillValid(estate)))
+ return rel;
/*
* Complain if we're attempting a scan of an unscannable relation, except
@@ -717,6 +722,26 @@ ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags)
return rel;
}
+/* ----------------------------------------------------------------
+ * ExecOpenScanIndexRelation
+ *
+ * Open the index relation to be scanned by an index scan plan node.
+ * This should be called during the node's ExecInit routine.
+ * ----------------------------------------------------------------
+ */
+Relation
+ExecOpenScanIndexRelation(EState *estate, Oid indexid, int lockmode)
+{
+ Relation rel;
+
+ /* Open the index. */
+ rel = index_open(indexid, lockmode);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ elog(DEBUG2, "CachedPlan invalidated on locking index %u", indexid);
+
+ return rel;
+}
+
/*
* ExecInitRangeTable
* Set up executor's range-table-related data
@@ -776,8 +801,12 @@ ExecShouldLockRelation(EState *estate, Index rtindex)
* ExecGetRangeTableRelation
* Open the Relation for a range table entry, if not already done
*
- * The Relations will be closed again in ExecEndPlan().
+ * The Relations will be closed in ExecEndPlan().
+ *
+ * The returned value may be NULL if the relation is a prunable relation
+ * that has not been locked and may have been concurrently dropped.
*/
Relation
ExecGetRangeTableRelation(EState *estate, Index rti)
{
@@ -820,8 +849,14 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
* that of a prunable relation and we're running a cached generic
* plan. AcquireExecutorLocks() of plancache.c would have locked
* only the unprunable relations in the plan tree.
+ *
+ * Note that we use try_table_open() here, because without a lock
+ * held on the relation, it may have disappeared from under us.
*/
- rel = table_open(rte->relid, rte->rellockmode);
+ rel = try_table_open(rte->relid, rte->rellockmode);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ elog(DEBUG2, "CachedPlan invalidated on locking relation %u",
+ rte->relid);
}
estate->es_relations[rti - 1] = rel;
@@ -845,6 +880,9 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
Relation resultRelationDesc;
resultRelationDesc = ExecGetRangeTableRelation(estate, rti);
+ if (unlikely(resultRelationDesc == NULL ||
+ !ExecPlanStillValid(estate)))
+ return;
InitResultRelInfo(resultRelInfo,
resultRelationDesc,
rti,
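The switch from table_open() to try_table_open() above is the crux of the
execUtils.c change; a sketch of the distinction, for illustration only:

	/*
	 * A prunable relation is not yet locked at this point, so it may have
	 * been dropped concurrently.  table_open() would raise an error in
	 * that case; try_table_open() instead returns NULL, letting executor
	 * startup unwind and the plan be rebuilt.
	 */
	rel = try_table_open(rte->relid, rte->rellockmode);
	if (rel == NULL || !ExecPlanStillValid(estate))
		return NULL;		/* caller abandons initialization */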
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 0dfba5ca16..8c40d8c520 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -3303,6 +3303,8 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
eflags &= ~EXEC_FLAG_REWIND;
outerPlan = outerPlan(node);
outerPlanState(aggstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return aggstate;
/*
* initialize source tuple type.
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 86d75b1a7e..3c82a1ceab 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -147,6 +147,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
list_length(node->appendplans),
node->part_prune_info,
&validsubplans);
+ if (!ExecPlanStillValid(estate))
+ return appendstate;
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
@@ -185,8 +187,10 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->ps.resultopsset = true;
appendstate->ps.resultopsfixed = false;
- appendplanstates = (PlanState **) palloc(nplans *
- sizeof(PlanState *));
+ appendplanstates = (PlanState **) palloc0(nplans *
+ sizeof(PlanState *));
+ appendstate->appendplans = appendplanstates;
+ appendstate->as_nplans = nplans;
/*
* call ExecInitNode on each of the valid plans to be executed and save
@@ -221,11 +225,11 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
firstvalid = j;
appendplanstates[j++] = ExecInitNode(initNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return appendstate;
}
appendstate->as_first_partial_plan = firstvalid;
- appendstate->appendplans = appendplanstates;
- appendstate->as_nplans = nplans;
/* Initialize async state */
appendstate->as_asyncplans = asyncplans;
diff --git a/src/backend/executor/nodeBitmapAnd.c b/src/backend/executor/nodeBitmapAnd.c
index ae391222bf..168c440692 100644
--- a/src/backend/executor/nodeBitmapAnd.c
+++ b/src/backend/executor/nodeBitmapAnd.c
@@ -89,6 +89,8 @@ ExecInitBitmapAnd(BitmapAnd *node, EState *estate, int eflags)
{
initNode = (Plan *) lfirst(l);
bitmapplanstates[i] = ExecInitNode(initNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return bitmapandstate;
i++;
}
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 19f18ab817..b13cae1cbb 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -754,11 +754,15 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (unlikely(currentRelation == NULL || !ExecPlanStillValid(estate)))
+ return scanstate;
/*
* initialize child nodes
*/
outerPlanState(scanstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return scanstate;
/*
* get the scan type from the relation descriptor.
diff --git a/src/backend/executor/nodeBitmapIndexscan.c b/src/backend/executor/nodeBitmapIndexscan.c
index 4669e8d0ce..f04a53e9be 100644
--- a/src/backend/executor/nodeBitmapIndexscan.c
+++ b/src/backend/executor/nodeBitmapIndexscan.c
@@ -252,7 +252,11 @@ ExecInitBitmapIndexScan(BitmapIndexScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
- indexstate->biss_RelationDesc = index_open(node->indexid, lockmode);
+ indexstate->biss_RelationDesc = ExecOpenScanIndexRelation(estate,
+ node->indexid,
+ lockmode);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return indexstate;
/*
* Initialize index-specific scan state
diff --git a/src/backend/executor/nodeBitmapOr.c b/src/backend/executor/nodeBitmapOr.c
index de439235d2..980b68dd82 100644
--- a/src/backend/executor/nodeBitmapOr.c
+++ b/src/backend/executor/nodeBitmapOr.c
@@ -90,6 +90,8 @@ ExecInitBitmapOr(BitmapOr *node, EState *estate, int eflags)
{
initNode = (Plan *) lfirst(l);
bitmapplanstates[i] = ExecInitNode(initNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return bitmaporstate;
i++;
}
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
index e559cd2346..2a7c5dccd8 100644
--- a/src/backend/executor/nodeCustom.c
+++ b/src/backend/executor/nodeCustom.c
@@ -58,6 +58,8 @@ ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
if (scanrelid > 0)
{
scan_rel = ExecOpenScanRelation(estate, scanrelid, eflags);
+ if (unlikely(scan_rel == NULL || !ExecPlanStillValid(estate)))
+ return css;
css->ss.ss_currentRelation = scan_rel;
}
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 1357ccf3c9..90d5878ae3 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -172,6 +172,8 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
if (scanrelid > 0)
{
currentRelation = ExecOpenScanRelation(estate, scanrelid, eflags);
+ if (unlikely(currentRelation == NULL || !ExecPlanStillValid(estate)))
+ return scanstate;
scanstate->ss.ss_currentRelation = currentRelation;
fdwroutine = GetFdwRoutineForRelation(currentRelation, true);
}
@@ -263,6 +265,8 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
if (outerPlan(node))
outerPlanState(scanstate) =
ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return scanstate;
/*
* Tell the FDW to initialize the scan.
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index cae5ea1f92..67548aa7ba 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -84,6 +84,8 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
*/
outerNode = outerPlan(node);
outerPlanState(gatherstate) = ExecInitNode(outerNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return gatherstate;
tupDesc = ExecGetResultType(outerPlanState(gatherstate));
/*
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index b36cd89e7d..cf0e074359 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -103,6 +103,8 @@ ExecInitGatherMerge(GatherMerge *node, EState *estate, int eflags)
*/
outerNode = outerPlan(node);
outerPlanState(gm_state) = ExecInitNode(outerNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return gm_state;
/*
* Leader may access ExecProcNode result directly (if
diff --git a/src/backend/executor/nodeGroup.c b/src/backend/executor/nodeGroup.c
index 807429e504..6d0fd9e7b4 100644
--- a/src/backend/executor/nodeGroup.c
+++ b/src/backend/executor/nodeGroup.c
@@ -184,6 +184,8 @@ ExecInitGroup(Group *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(grpstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return grpstate;
/*
* Initialize scan slot and type.
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index a913d5b50c..e71d131d18 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -396,6 +396,8 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(hashstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return hashstate;
/*
* initialize our result slot and type. No need to build projection
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 901c9e9be7..3c870de1c5 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -758,8 +758,12 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
hashNode = (Hash *) innerPlan(node);
outerPlanState(hjstate) = ExecInitNode(outerNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return hjstate;
outerDesc = ExecGetResultType(outerPlanState(hjstate));
innerPlanState(hjstate) = ExecInitNode((Plan *) hashNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return hjstate;
innerDesc = ExecGetResultType(innerPlanState(hjstate));
/*
diff --git a/src/backend/executor/nodeIncrementalSort.c b/src/backend/executor/nodeIncrementalSort.c
index 010bcfafa8..af723ea755 100644
--- a/src/backend/executor/nodeIncrementalSort.c
+++ b/src/backend/executor/nodeIncrementalSort.c
@@ -1040,6 +1040,8 @@ ExecInitIncrementalSort(IncrementalSort *node, EState *estate, int eflags)
* nodes may be able to do something more useful.
*/
outerPlanState(incrsortstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return incrsortstate;
/*
* Initialize scan slot and type.
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 481d479760..0fba8f7d5a 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -531,6 +531,8 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (unlikely(currentRelation == NULL || !ExecPlanStillValid(estate)))
+ return indexstate;
indexstate->ss.ss_currentRelation = currentRelation;
indexstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
@@ -583,9 +585,12 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
- indexRelation = index_open(node->indexid, lockmode);
+ indexRelation = ExecOpenScanIndexRelation(estate, node->indexid, lockmode);
indexstate->ioss_RelationDesc = indexRelation;
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return indexstate;
+
/*
* Initialize index-specific scan state
*/
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index a8172d8b82..db28aeb3d6 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -907,6 +907,8 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (unlikely(currentRelation == NULL || !ExecPlanStillValid(estate)))
+ return indexstate;
indexstate->ss.ss_currentRelation = currentRelation;
indexstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
@@ -951,7 +953,11 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
- indexstate->iss_RelationDesc = index_open(node->indexid, lockmode);
+ indexstate->iss_RelationDesc = ExecOpenScanIndexRelation(estate,
+ node->indexid,
+ lockmode);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return indexstate;
/*
* Initialize index-specific scan state
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index eb7b6e52be..369c904577 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -475,6 +475,8 @@ ExecInitLimit(Limit *node, EState *estate, int eflags)
*/
outerPlan = outerPlan(node);
outerPlanState(limitstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return limitstate;
/*
* initialize child expressions
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 0d3489195b..9077858413 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -322,6 +322,8 @@ ExecInitLockRows(LockRows *node, EState *estate, int eflags)
* then initialize outer plan
*/
outerPlanState(lrstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return lrstate;
/* node returns unmodified slots from the outer plan */
lrstate->ps.resultopsset = true;
diff --git a/src/backend/executor/nodeMaterial.c b/src/backend/executor/nodeMaterial.c
index 883e3f3933..972962d44d 100644
--- a/src/backend/executor/nodeMaterial.c
+++ b/src/backend/executor/nodeMaterial.c
@@ -214,6 +214,8 @@ ExecInitMaterial(Material *node, EState *estate, int eflags)
outerPlan = outerPlan(node);
outerPlanState(matstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return matstate;
/*
* Initialize result type and slot. No need to initialize projection info
diff --git a/src/backend/executor/nodeMemoize.c b/src/backend/executor/nodeMemoize.c
index 690dee1daa..6aaab743b5 100644
--- a/src/backend/executor/nodeMemoize.c
+++ b/src/backend/executor/nodeMemoize.c
@@ -973,6 +973,8 @@ ExecInitMemoize(Memoize *node, EState *estate, int eflags)
outerNode = outerPlan(node);
outerPlanState(mstate) = ExecInitNode(outerNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return mstate;
/*
* Initialize return slot and type. No need to initialize projection info
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 3236444cf1..a82f0a71a0 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -95,6 +95,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
list_length(node->mergeplans),
node->part_prune_info,
&validsubplans);
+ if (!ExecPlanStillValid(estate))
+ return mergestate;
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
@@ -120,7 +122,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ms_prune_state = NULL;
}
- mergeplanstates = (PlanState **) palloc(nplans * sizeof(PlanState *));
+ mergeplanstates = (PlanState **) palloc0(nplans * sizeof(PlanState *));
mergestate->mergeplans = mergeplanstates;
mergestate->ms_nplans = nplans;
@@ -151,6 +153,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
Plan *initNode = (Plan *) list_nth(node->mergeplans, i);
mergeplanstates[j++] = ExecInitNode(initNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return mergestate;
}
mergestate->ps.ps_ProjInfo = NULL;
diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 926e631d88..53cb1ff207 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -1490,11 +1490,15 @@ ExecInitMergeJoin(MergeJoin *node, EState *estate, int eflags)
mergestate->mj_SkipMarkRestore = node->skip_mark_restore;
outerPlanState(mergestate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return mergestate;
outerDesc = ExecGetResultType(outerPlanState(mergestate));
innerPlanState(mergestate) = ExecInitNode(innerPlan(node), estate,
mergestate->mj_SkipMarkRestore ?
eflags :
(eflags | EXEC_FLAG_MARK));
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return mergestate;
innerDesc = ExecGetResultType(innerPlanState(mergestate));
/*
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 9e56f9c36c..8debfbd3ec 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -4277,6 +4277,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
linitial_int(node->resultRelations));
}
+ /*
+ * ExecInitResultRelation() may have returned without initializing
+ * rootResultRelInfo if the plan got invalidated, so check.
+ */
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return mtstate;
+
/* set up epqstate with dummy subplan data for the moment */
EvalPlanQualInit(&mtstate->mt_epqstate, estate, NULL, NIL,
node->epqParam, node->resultRelations);
@@ -4309,6 +4316,10 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
{
ExecInitResultRelation(estate, resultRelInfo, resultRelation);
+ /* See the comment above. */
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return mtstate;
+
/*
* For child result relations, store the root result relation
* pointer. We do so for the convenience of places that want to
@@ -4335,6 +4346,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* Now we may initialize the subplan.
*/
outerPlanState(mtstate) = ExecInitNode(subplan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return mtstate;
/*
* Do additional per-result-relation initialization.
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index 01f3d56a3b..34eafbb6e0 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -294,11 +294,15 @@ ExecInitNestLoop(NestLoop *node, EState *estate, int eflags)
* values.
*/
outerPlanState(nlstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return nlstate;
if (node->nestParams == NIL)
eflags |= EXEC_FLAG_REWIND;
else
eflags &= ~EXEC_FLAG_REWIND;
innerPlanState(nlstate) = ExecInitNode(innerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return nlstate;
/*
* Initialize result slot, type and projection.
diff --git a/src/backend/executor/nodeProjectSet.c b/src/backend/executor/nodeProjectSet.c
index ca9a5e2ed2..f834499479 100644
--- a/src/backend/executor/nodeProjectSet.c
+++ b/src/backend/executor/nodeProjectSet.c
@@ -254,6 +254,8 @@ ExecInitProjectSet(ProjectSet *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(state) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return state;
/*
* we don't use inner plan
diff --git a/src/backend/executor/nodeRecursiveunion.c b/src/backend/executor/nodeRecursiveunion.c
index 7680142c7b..5dd3285c41 100644
--- a/src/backend/executor/nodeRecursiveunion.c
+++ b/src/backend/executor/nodeRecursiveunion.c
@@ -244,7 +244,11 @@ ExecInitRecursiveUnion(RecursiveUnion *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(rustate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return rustate;
innerPlanState(rustate) = ExecInitNode(innerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return rustate;
/*
* If hashing, precompute fmgr lookup data for inner loop, and create the
diff --git a/src/backend/executor/nodeResult.c b/src/backend/executor/nodeResult.c
index e3cfc9b772..7d7c2aa786 100644
--- a/src/backend/executor/nodeResult.c
+++ b/src/backend/executor/nodeResult.c
@@ -207,6 +207,8 @@ ExecInitResult(Result *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(resstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return resstate;
/*
* we don't use inner plan
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index 6ab91001bc..3afdaeecd7 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -121,6 +121,9 @@ ExecInitSampleScan(SampleScan *node, EState *estate, int eflags)
ExecOpenScanRelation(estate,
node->scan.scanrelid,
eflags);
+ if (unlikely(scanstate->ss.ss_currentRelation == NULL ||
+ !ExecPlanStillValid(estate)))
+ return scanstate;
/* we won't set up the HeapScanDesc till later */
scanstate->ss.ss_currentScanDesc = NULL;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index b052775e5b..f7fb64a4a2 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -153,6 +153,9 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
ExecOpenScanRelation(estate,
node->scan.scanrelid,
eflags);
+ if (unlikely(scanstate->ss.ss_currentRelation == NULL ||
+ !ExecPlanStillValid(estate)))
+ return scanstate;
/* and create slot with the appropriate rowtype */
ExecInitScanTupleSlot(estate, &scanstate->ss,
diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index fe34b2134f..2231d8b82f 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -528,6 +528,8 @@ ExecInitSetOp(SetOp *node, EState *estate, int eflags)
if (node->strategy == SETOP_HASHED)
eflags &= ~EXEC_FLAG_REWIND;
outerPlanState(setopstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return setopstate;
outerDesc = ExecGetResultType(outerPlanState(setopstate));
/*
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index af852464d0..fb76e4c01b 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -263,6 +263,8 @@ ExecInitSort(Sort *node, EState *estate, int eflags)
eflags &= ~(EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK);
outerPlanState(sortstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return sortstate;
/*
* Initialize scan slot and type.
diff --git a/src/backend/executor/nodeSubqueryscan.c b/src/backend/executor/nodeSubqueryscan.c
index 0b2612183a..b5b538fa91 100644
--- a/src/backend/executor/nodeSubqueryscan.c
+++ b/src/backend/executor/nodeSubqueryscan.c
@@ -124,6 +124,8 @@ ExecInitSubqueryScan(SubqueryScan *node, EState *estate, int eflags)
* initialize subquery
*/
subquerystate->subplan = ExecInitNode(node->subplan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return subquerystate;
/*
* Initialize scan slot and type (needed by ExecAssignScanProjectionInfo)
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 702ee884d2..a76836d021 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -377,6 +377,8 @@ ExecInitTidRangeScan(TidRangeScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (unlikely(currentRelation == NULL || !ExecPlanStillValid(estate)))
+ return tidrangestate;
tidrangestate->ss.ss_currentRelation = currentRelation;
tidrangestate->ss.ss_currentScanDesc = NULL; /* no table scan here */
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index f375951699..088babf572 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -522,6 +522,8 @@ ExecInitTidScan(TidScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (unlikely(currentRelation == NULL || !ExecPlanStillValid(estate)))
+ return tidstate;
tidstate->ss.ss_currentRelation = currentRelation;
tidstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
diff --git a/src/backend/executor/nodeUnique.c b/src/backend/executor/nodeUnique.c
index b82d0e9ad5..cb46b2d5d0 100644
--- a/src/backend/executor/nodeUnique.c
+++ b/src/backend/executor/nodeUnique.c
@@ -135,6 +135,8 @@ ExecInitUnique(Unique *node, EState *estate, int eflags)
* then initialize outer plan
*/
outerPlanState(uniquestate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return uniquestate;
/*
* Initialize result slot and type. Unique nodes do no projections, so
diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 561d7e731d..1b96f51fe8 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -2464,6 +2464,8 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags)
*/
outerPlan = outerPlan(node);
outerPlanState(winstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return winstate;
/*
* initialize source tuple type (which is also the tuple type that we'll
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 902793b02b..b754827013 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -70,7 +70,8 @@ static int _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
static ParamListInfo _SPI_convert_params(int nargs, Oid *argtypes,
Datum *Values, const char *Nulls);
-static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount);
+static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount,
+ CachedPlanSource *plansource, int query_index);
static void _SPI_error_callback(void *arg);
@@ -1682,7 +1683,8 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
- cplan);
+ cplan,
+ plansource);
/*
* Set up options for portal. Default SCROLL type is chosen the same way
@@ -2494,6 +2496,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
ListCell *lc2;
+ int i = 0;
spicallbackarg.query = plansource->query_string;
@@ -2691,8 +2694,9 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
options->params,
_SPI_current->queryEnv,
0);
- res = _SPI_pquery(qdesc, fire_triggers,
- canSetTag ? options->tcount : 0);
+
+ res = _SPI_pquery(qdesc, fire_triggers, canSetTag ? options->tcount : 0,
+ plansource, i);
FreeQueryDesc(qdesc);
}
else
@@ -2789,6 +2793,8 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
my_res = res;
goto fail;
}
+
+ i++;
}
/* Done with this plan, so release refcount */
@@ -2866,7 +2872,8 @@ _SPI_convert_params(int nargs, Oid *argtypes,
}
static int
-_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
+_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount,
+ CachedPlanSource *plansource, int query_index)
{
int operation = queryDesc->operation;
int eflags;
@@ -2922,7 +2929,7 @@ _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
else
eflags = EXEC_FLAG_SKIP_TRIGGERS;
- ExecutorStart(queryDesc, eflags);
+ ExecutorStartExt(queryDesc, eflags, plansource, query_index);
ExecutorRun(queryDesc, ForwardScanDirection, tcount, true);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8bc6bea113..ccbc27b575 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1237,6 +1237,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ NULL,
NULL);
/*
@@ -2027,7 +2028,8 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
- cplan);
+ cplan,
+ psrc);
/* Done with the snapshot used for parameter I/O and parsing/planning */
if (snapshot_set)
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 6e8f6b1b8f..d9ae60579b 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -19,6 +19,7 @@
#include "access/xact.h"
#include "commands/prepare.h"
+#include "executor/execdesc.h"
#include "executor/tstoreReceiver.h"
#include "miscadmin.h"
#include "pg_trace.h"
@@ -37,6 +38,8 @@ Portal ActivePortal = NULL;
static void ProcessQuery(PlannedStmt *plan,
CachedPlan *cplan,
+ CachedPlanSource *plansource,
+ int query_index,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -80,6 +83,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
qd->cplan = cplan; /* CachedPlan supplying the plannedstmt */
+ qd->release_cplan = false;
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -114,6 +118,13 @@ FreeQueryDesc(QueryDesc *qdesc)
UnregisterSnapshot(qdesc->snapshot);
UnregisterSnapshot(qdesc->crosscheck_snapshot);
+ /*
+ * Release CachedPlan if requested. The CachedPlan is not associated with
+ * a ResourceOwner when release_cplan is true; see ExecutorStartExt().
+ */
+ if (qdesc->release_cplan)
+ ReleaseCachedPlan(qdesc->cplan, NULL);
+
/* Only the QueryDesc itself need be freed */
pfree(qdesc);
}
@@ -126,6 +137,8 @@ FreeQueryDesc(QueryDesc *qdesc)
*
* plan: the plan tree for the query
* cplan: CachedPlan supplying the plan
+ * plansource: CachedPlanSource supplying the cplan
+ * query_index: index of the query in plansource->query_list
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -139,6 +152,8 @@ FreeQueryDesc(QueryDesc *qdesc)
static void
ProcessQuery(PlannedStmt *plan,
CachedPlan *cplan,
+ CachedPlanSource *plansource,
+ int query_index,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -157,7 +172,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Call ExecutorStart to prepare the plan for execution
*/
- ExecutorStart(queryDesc, 0);
+ ExecutorStartExt(queryDesc, 0, plansource, query_index);
/*
* Run the plan to completion.
@@ -518,9 +533,12 @@ PortalStart(Portal portal, ParamListInfo params,
myeflags = eflags;
/*
- * Call ExecutorStart to prepare the plan for execution
+ * Call ExecutorStartExt() to prepare the plan for execution. If
+ * the portal is using a cached plan, it may get invalidated
+ * during plan initialization, in which case a new one is
+ * created and saved in the QueryDesc.
*/
- ExecutorStart(queryDesc, myeflags);
+ ExecutorStartExt(queryDesc, myeflags, portal->plansource, 0);
/*
* This tells PortalCleanup to shut down the executor
@@ -1201,6 +1219,7 @@ PortalRunMulti(Portal portal,
{
bool active_snapshot_set = false;
ListCell *stmtlist_item;
+ int i = 0;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1283,6 +1302,8 @@ PortalRunMulti(Portal portal,
/* statement can set tag string */
ProcessQuery(pstmt,
portal->cplan,
+ portal->plansource,
+ i,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1293,6 +1314,8 @@ PortalRunMulti(Portal portal,
/* stmt added by rewrite cannot set tag */
ProcessQuery(pstmt,
portal->cplan,
+ portal->plansource,
+ i,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1357,6 +1380,8 @@ PortalRunMulti(Portal portal,
*/
if (lnext(portal->stmts, stmtlist_item) != NULL)
CommandCounterIncrement();
+
+ i++;
}
/* Pop the snapshot if we pushed one. */
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 6d2e385fe8..6ae05175c6 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -1279,6 +1279,56 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
return plan;
}
+/*
+ * Check the validity of, and replan, only the query at the given 0-based index
+ * in the provided CachedPlanSource.
+ *
+ * Returns a CachedPlan for that specific query. The CachedPlan is not saved in
+ * the CachedPlanSource, so it is the caller's responsibility to free it by
+ * eventually calling ReleaseCachedPlan() on it.
+ */
+CachedPlan *
+GetSingleCachedPlan(CachedPlanSource *plansource, int query_index,
+ ParamListInfo boundParams, QueryEnvironment *queryEnv)
+{
+ List *query_list = plansource->query_list;
+ List *query_list_new;
+ CachedPlan *plan = plansource->gplan,
+ *newplan;
+ double generic_cost = plansource->generic_cost;
+ double total_custom_cost = plansource->total_custom_cost;
+
+ if (plan == NULL || plan->is_valid)
+ elog(ERROR, "GetSingleCachedPlan() called in the wrong context");
+
+ /*
+ * Create a new plan for the nth query after revalidating it.
+ *
+ * Temporarily reset gplan to ensure that the CachedPlan that it's pointing
+ * to is not released, because the caller might still need it.
+ */
+ query_list_new = list_make1(list_nth(plansource->query_list, query_index));
+ plansource->query_list = query_list_new;
+ plansource->gplan = NULL;
+ newplan = GetCachedPlan(plansource, boundParams, NULL, queryEnv);
+ plansource->gplan = plan;
+
+ /* Restore original query_list. */
+ plansource->query_list = query_list;
+ list_free(query_list_new);
+
+ /*
+ * Restore the original plan costs. The values after the GetCachedPlan()
+ * call represent the cost of only the nth query, whereas the original
+ * values represent the cumulative costs for all queries in
+ * plansource->query_list.
+ */
+ plansource->generic_cost = generic_cost;
+ plansource->total_custom_cost = total_custom_cost;
+
+ return newplan;
+}
+
/*
* ReleaseCachedPlan: release active use of a cached plan.
*
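A usage sketch for the new entry point (the real caller is
ExecutorStartExt(); the variable names here are illustrative). Because the
returned CachedPlan is saved neither in the CachedPlanSource nor under a
ResourceOwner, the caller must release it explicitly:

	/* Replan only statement 0 of the source. */
	CachedPlan *cplan = GetSingleCachedPlan(plansource, 0, params, queryEnv);
	PlannedStmt *stmt = linitial_node(PlannedStmt, cplan->stmt_list);

	/* ... execute stmt ... */

	ReleaseCachedPlan(cplan, NULL);		/* no ResourceOwner involved */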
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 4a24613537..bf70fd4ce7 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -284,7 +284,8 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
- CachedPlan *cplan)
+ CachedPlan *cplan,
+ CachedPlanSource *plansource)
{
Assert(PortalIsValid(portal));
Assert(portal->status == PORTAL_NEW);
@@ -299,6 +300,7 @@ PortalDefineQuery(Portal portal,
portal->commandTag = commandTag;
portal->stmts = stmts;
portal->cplan = cplan;
+ portal->plansource = plansource;
portal->status = PORTAL_DEFINED;
}
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index bf326eeb70..652e1afbf7 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -102,6 +102,7 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ParamListInfo params, QueryEnvironment *queryEnv);
extern void ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
+ CachedPlanSource *plansource, int plan_index,
IntoClause *into, ExplainState *es,
const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index 8a5a9fe642..db21561c8c 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -258,6 +258,7 @@ extern void ExecASTruncateTriggers(EState *estate,
extern void AfterTriggerBeginXact(void);
extern void AfterTriggerBeginQuery(void);
extern void AfterTriggerEndQuery(EState *estate);
+extern void AfterTriggerAbortQuery(void);
extern void AfterTriggerFireDeferred(void);
extern void AfterTriggerEndXact(bool isCommit);
extern void AfterTriggerBeginSubXact(void);
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index 0e7245435d..c6ad8fece7 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -36,6 +36,7 @@ typedef struct QueryDesc
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
CachedPlan *cplan; /* CachedPlan that supplies the plannedstmt */
+ bool release_cplan; /* Should FreeQueryDesc() release cplan? */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 69c3ebff00..ce2447a8cf 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -19,6 +19,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/plancache.h"
/*
@@ -198,6 +199,8 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
* prototypes from functions in execMain.c
*/
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern void ExecutorStartExt(QueryDesc *queryDesc, int eflags,
+ CachedPlanSource *plansource, int query_index);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count, bool execute_once);
@@ -261,6 +264,20 @@ extern void ExecEndNode(PlanState *node);
extern void ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * Is the CachedPlan in es_cachedplan still valid?
+ *
+ * Called at various points during ExecutorStart() because invalidation
+ * messages that affect the plan might be received after locks have been
+ * taken on runtime-prunable relations. The caller should take appropriate
+ * action if the plan has become invalid.
+ */
+static inline bool
+ExecPlanStillValid(EState *estate)
+{
+ return estate->es_cachedplan == NULL ? true :
+ CachedPlanStillValid(estate->es_cachedplan);
+}
/* ----------------------------------------------------------------
* ExecProcNode
@@ -589,6 +606,7 @@ extern void ExecCreateScanSlotFromOuterPlan(EState *estate,
extern bool ExecRelationIsTargetRelation(EState *estate, Index scanrelid);
extern Relation ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags);
+extern Relation ExecOpenScanIndexRelation(EState *estate, Oid indexid, int lockmode);
extern void ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos);
extern void ExecCloseRangeTableRelations(EState *estate);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index ee089505a0..2a8e5bd784 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -680,6 +680,7 @@ typedef struct EState
int es_top_eflags; /* eflags passed to ExecutorStart */
int es_instrument; /* OR of InstrumentOption flags */
bool es_finished; /* true when ExecutorFinish is done */
+ bool es_aborted; /* true when execution was aborted */
List *es_exprcontexts; /* List of ExprContexts within EState */
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 0b5ee007ca..f3ecbd279b 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -224,6 +224,11 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
QueryEnvironment *queryEnv);
+extern CachedPlan *GetSingleCachedPlan(CachedPlanSource *plansource,
+ int query_index,
+ ParamListInfo boundParams,
+ QueryEnvironment *queryEnv);
+
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
@@ -245,4 +250,17 @@ CachedPlanRequiresLocking(CachedPlan *cplan)
return cplan->is_generic;
}
+/*
+ * CachedPlanStillValid
+ * Returns whether a cached generic plan is still valid.
+ *
+ * Invoked by the executor to check if the plan has not been invalidated after
+ * taking locks during the initialization of the plan.
+ */
+static inline bool
+CachedPlanStillValid(CachedPlan *cplan)
+{
+ return cplan->is_valid;
+}
+
#endif /* PLANCACHE_H */
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index 29f49829f2..58c3828d2c 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,7 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ CachedPlanSource *plansource; /* CachedPlanSource, for cplan */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -241,7 +242,8 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
- CachedPlan *cplan);
+ CachedPlan *cplan,
+ CachedPlanSource *plansource);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
diff --git a/src/test/modules/delay_execution/Makefile b/src/test/modules/delay_execution/Makefile
index 70f24e846d..3eeb097fde 100644
--- a/src/test/modules/delay_execution/Makefile
+++ b/src/test/modules/delay_execution/Makefile
@@ -8,7 +8,8 @@ OBJS = \
delay_execution.o
ISOLATION = partition-addition \
- partition-removal-1
+ partition-removal-1 \
+ cached-plan-inval
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/src/test/modules/delay_execution/delay_execution.c b/src/test/modules/delay_execution/delay_execution.c
index 155c8a8d55..0b5f317cd1 100644
--- a/src/test/modules/delay_execution/delay_execution.c
+++ b/src/test/modules/delay_execution/delay_execution.c
@@ -1,14 +1,18 @@
/*-------------------------------------------------------------------------
*
* delay_execution.c
- * Test module to allow delay between parsing and execution of a query.
+ * Test module to introduce delay at various points during execution of a
+ * query to test that execution proceeds safely in light of concurrent
+ * changes.
*
* The delay is implemented by taking and immediately releasing a specified
* advisory lock. If another process has previously taken that lock, the
* current process will be blocked until the lock is released; otherwise,
* there's no effect. This allows an isolationtester script to reliably
- * test behaviors where some specified action happens in another backend
- * between parsing and execution of any desired query.
+ * test behaviors where some specified action happens in another backend in
+ * a couple of cases: 1) between parsing and execution of any desired query
+ * when using the planner_hook, 2) between RevalidateCachedQuery() and
+ * ExecutorStart() when using the ExecutorStart_hook.
*
* Copyright (c) 2020-2024, PostgreSQL Global Development Group
*
@@ -22,6 +26,7 @@
#include <limits.h>
+#include "executor/executor.h"
#include "optimizer/planner.h"
#include "utils/builtins.h"
#include "utils/guc.h"
@@ -32,9 +37,11 @@ PG_MODULE_MAGIC;
/* GUC: advisory lock ID to use. Zero disables the feature. */
static int post_planning_lock_id = 0;
+static int executor_start_lock_id = 0;
-/* Save previous planner hook user to be a good citizen */
+/* Save previous hook users to be a good citizen */
static planner_hook_type prev_planner_hook = NULL;
+static ExecutorStart_hook_type prev_ExecutorStart_hook = NULL;
/* planner_hook function to provide the desired delay */
@@ -70,11 +77,41 @@ delay_execution_planner(Query *parse, const char *query_string,
return result;
}
+/* ExecutorStart_hook function to provide the desired delay */
+static void
+delay_execution_ExecutorStart(QueryDesc *queryDesc, int eflags)
+{
+ /* If enabled, delay by taking and releasing the specified lock */
+ if (executor_start_lock_id != 0)
+ {
+ DirectFunctionCall1(pg_advisory_lock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+ DirectFunctionCall1(pg_advisory_unlock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+
+ /*
+ * Ensure that we notice any pending invalidations, since the advisory
+ * lock functions don't do this.
+ */
+ AcceptInvalidationMessages();
+ }
+
+ /* Now start the executor, possibly via a previous hook user */
+ if (prev_ExecutorStart_hook)
+ prev_ExecutorStart_hook(queryDesc, eflags);
+ else
+ standard_ExecutorStart(queryDesc, eflags);
+
+ if (executor_start_lock_id != 0)
+ elog(NOTICE, "Finished ExecutorStart(): CachedPlan is %s",
+ CachedPlanStillValid(queryDesc->cplan) ? "valid" : "not valid");
+}
+
/* Module load function */
void
_PG_init(void)
{
- /* Set up the GUC to control which lock is used */
+ /* Set up GUCs to control which lock is used */
DefineCustomIntVariable("delay_execution.post_planning_lock_id",
"Sets the advisory lock ID to be locked/unlocked after planning.",
"Zero disables the delay.",
@@ -86,10 +123,22 @@ _PG_init(void)
NULL,
NULL,
NULL);
-
+ DefineCustomIntVariable("delay_execution.executor_start_lock_id",
+ "Sets the advisory lock ID to be locked/unlocked before starting execution.",
+ "Zero disables the delay.",
+ &executor_start_lock_id,
+ 0,
+ 0, INT_MAX,
+ PGC_USERSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
MarkGUCPrefixReserved("delay_execution");
- /* Install our hook */
+ /* Install our hooks. */
prev_planner_hook = planner_hook;
planner_hook = delay_execution_planner;
+ prev_ExecutorStart_hook = ExecutorStart_hook;
+ ExecutorStart_hook = delay_execution_ExecutorStart;
}
diff --git a/src/test/modules/delay_execution/expected/cached-plan-inval.out b/src/test/modules/delay_execution/expected/cached-plan-inval.out
new file mode 100644
index 0000000000..e8efb6d9d9
--- /dev/null
+++ b/src/test/modules/delay_execution/expected/cached-plan-inval.out
@@ -0,0 +1,175 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1prep s2lock s1exec s2dropi s2unlock
+step s1prep: SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1 FOR UPDATE;
+ EXPLAIN (COSTS OFF) EXECUTE q (1);
+QUERY PLAN
+------------------------------------------------
+LockRows
+ -> Append
+ Subplans Removed: 2
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Recheck Cond: (a = $1)
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = $1)
+(7 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); <waiting ...>
+step s2dropi: DROP INDEX foo12_1_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+-------------------------------------
+LockRows
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on foo12_1 foo_1
+ Filter: (a = $1)
+(5 rows)
+
+
+starting permutation: s1prep2 s2lock s1exec2 s2dropi s2unlock
+step s1prep2: SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q2;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Recheck Cond: ((a = one()) OR (a = two()))
+ -> BitmapOr
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = one())
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = two())
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(11 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec2: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; <waiting ...>
+step s2dropi: DROP INDEX foo12_1_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec2: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Seq Scan on foo12_1 foo_1
+ Filter: ((a = one()) OR (a = two()))
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(6 rows)
+
+
+starting permutation: s1prep3 s2lock s1exec3 s2dropi s2unlock
+step s1prep3: SET plan_cache_mode = force_generic_plan;
+ PREPARE q3 AS UPDATE foov SET a = a WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q3;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Recheck Cond: ((a = one()) OR (a = two()))
+ -> BitmapOr
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = one())
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = two())
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+
+Update on foo
+ Update on foo12_1 foo_1
+ Update on foo12_2 foo_2
+ Update on foo3 foo
+ -> Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Recheck Cond: ((a = one()) OR (a = two()))
+ -> BitmapOr
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = one())
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = two())
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(27 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec3: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; <waiting ...>
+step s2dropi: DROP INDEX foo12_1_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec3: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Seq Scan on foo12_1 foo_1
+ Filter: ((a = one()) OR (a = two()))
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+
+Update on foo
+ Update on foo12_1 foo_1
+ Update on foo12_2 foo_2
+ Update on foo3 foo
+ -> Append
+ Subplans Removed: 1
+ -> Seq Scan on foo12_1 foo_1
+ Filter: ((a = one()) OR (a = two()))
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(17 rows)
+
diff --git a/src/test/modules/delay_execution/meson.build b/src/test/modules/delay_execution/meson.build
index 41f3ac0b89..5a70b183d0 100644
--- a/src/test/modules/delay_execution/meson.build
+++ b/src/test/modules/delay_execution/meson.build
@@ -24,6 +24,7 @@ tests += {
'specs': [
'partition-addition',
'partition-removal-1',
+ 'cached-plan-inval',
],
},
}
diff --git a/src/test/modules/delay_execution/specs/cached-plan-inval.spec b/src/test/modules/delay_execution/specs/cached-plan-inval.spec
new file mode 100644
index 0000000000..5b1f72b4a8
--- /dev/null
+++ b/src/test/modules/delay_execution/specs/cached-plan-inval.spec
@@ -0,0 +1,65 @@
+# Test to check that invalidation of cached generic plans during ExecutorStart
+# correctly triggers replanning and re-execution.
+
+setup
+{
+ CREATE TABLE foo (a int, b text) PARTITION BY LIST(a);
+ CREATE TABLE foo12 PARTITION OF foo FOR VALUES IN (1, 2) PARTITION BY LIST (a);
+ CREATE TABLE foo12_1 PARTITION OF foo12 FOR VALUES IN (1);
+ CREATE TABLE foo12_2 PARTITION OF foo12 FOR VALUES IN (2);
+ CREATE INDEX foo12_1_a ON foo12_1 (a);
+ CREATE TABLE foo3 PARTITION OF foo FOR VALUES IN (3);
+ CREATE VIEW foov AS SELECT * FROM foo;
+ CREATE FUNCTION one () RETURNS int AS $$ BEGIN RETURN 1; END; $$ LANGUAGE PLPGSQL STABLE;
+ CREATE FUNCTION two () RETURNS int AS $$ BEGIN RETURN 2; END; $$ LANGUAGE PLPGSQL STABLE;
+ CREATE RULE update_foo AS ON UPDATE TO foo DO ALSO SELECT 1;
+}
+
+teardown
+{
+ DROP VIEW foov;
+ DROP RULE update_foo ON foo;
+ DROP TABLE foo;
+ DROP FUNCTION one(), two();
+}
+
+session "s1"
+# Append with run-time pruning
+step "s1prep" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1 FOR UPDATE;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+
+# Another case with Append with run-time pruning
+step "s1prep2" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+
+# Case with a rule adding another query
+step "s1prep3" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q3 AS UPDATE foov SET a = a WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+
+# Executes a generic plan
+step "s1exec" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+step "s1exec2" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+step "s1exec3" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+
+session "s2"
+step "s2lock" { SELECT pg_advisory_lock(12345); }
+step "s2unlock" { SELECT pg_advisory_unlock(12345); }
+step "s2dropi" { DROP INDEX foo12_1_a; }
+
+# While "s1exec", etc. wait to acquire the advisory lock, "s2dropi" is able
+# to drop the index being used in the cached plan.  When "s1exec" is then
+# unblocked and initializes the cached plan for execution, it detects the
+# concurrent index drop, which causes the cached plan to be discarded and
+# recreated without the index.
+permutation "s1prep" "s2lock" "s1exec" "s2dropi" "s2unlock"
+permutation "s1prep2" "s2lock" "s1exec2" "s2dropi" "s2unlock"
+permutation "s1prep3" "s2lock" "s1exec3" "s2dropi" "s2unlock"
--
2.43.0
Attachment: v51-0001-Defer-locking-of-runtime-prunable-relations-to-e.patch (application/octet-stream)
From d766e737ade779de3da4addbf71a05bb2a74ab75 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Wed, 7 Aug 2024 18:25:51 +0900
Subject: [PATCH v51 1/3] Defer locking of runtime-prunable relations to
executor
When preparing a cached plan for execution, plancache.c locks the
relations contained in the plan's range table to ensure it is safe for
execution. However, this simplistic approach, implemented in
AcquireExecutorLocks(), results in unnecessarily locking relations that
might be pruned during "initial" runtime pruning.
To optimize this, the locking is now deferred for relations that are
subject to "initial" runtime pruning. The planner now provides a set
of "unprunable" relations, available through the new
PlannedStmt.unprunableRelids field. AcquireExecutorLocks() will now
only lock those relations.
PlannedStmt.unprunableRelids is populated by subtracting the set of
initially prunable relids from the set of all RT indexes. The prunable
relids set is constructed by examining all PartitionPruneInfos during
set_plan_refs() and storing the RT indexes of partitions subject to
"initial" pruning steps. While at it, some duplicated code in
set_append_references() and set_mergeappend_references() that
constructs the prunable relids set has been refactored into a common
function.
To enable the executor to determine whether the plan tree it's
executing is a cached one, the CachedPlan is now made available via
the QueryDesc. The executor can call CachedPlanRequiresLocking(),
which returns true if the CachedPlan is a reusable generic plan that
might contain relations needing to be locked. If so, the executor
will lock any relation that is not in PlannedStmt.unprunableRelids.
Finally, an Assert has been added in ExecCheckPermissions() to ensure
that all relations whose permissions are checked have been properly
locked. This helps catch any accidental omission of relations from the
unprunableRelids set that should have their permissions checked.
This deferment introduces a window in which prunable relations may be
altered by concurrent DDL, potentially causing the plan to become
invalid. As a result, the executor might attempt to run an invalid plan,
leading to errors such as being unable to locate a partition-only index
during ExecInitIndexScan(). Future commits will introduce changes to
ready the executor to check plan validity during ExecutorStart() and
retry with a newly created plan if the original one becomes invalid
after taking deferred locks.
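
To make the division of labor concrete, here is a condensed sketch of the
three cooperating pieces; it is illustrative only, and the hunks below are
authoritative:

    /* planner (standard_planner): record everything not initially prunable */
    result->unprunableRelids =
        bms_difference(bms_add_range(NULL, 1, list_length(result->rtable)),
                       glob->prunableRelids);

    /* plan cache (AcquireExecutorLocks): lock only unprunable relations */
    rtindex = -1;
    while ((rtindex = bms_next_member(plannedstmt->unprunableRelids,
                                      rtindex)) >= 0)
    {
        RangeTblEntry *rte = rt_fetch(rtindex, plannedstmt->rtable);

        LockRelationOid(rte->relid, rte->rellockmode);
    }

    /* executor (ExecGetRangeTableRelation): lock survivors of pruning */
    if (estate->es_cachedplan != NULL &&
        CachedPlanRequiresLocking(estate->es_cachedplan) &&
        !bms_is_member(rti, estate->es_plannedstmt->unprunableRelids))
        rel = table_open(rte->relid, rte->rellockmode);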
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 ++--
src/backend/commands/extension.c | 1 +
src/backend/commands/matview.c | 2 +-
src/backend/commands/prepare.c | 3 +-
src/backend/executor/execMain.c | 18 ++++++++
src/backend/executor/execParallel.c | 9 +++-
src/backend/executor/execUtils.c | 30 +++++++++++++-
src/backend/executor/functions.c | 1 +
src/backend/executor/spi.c | 1 +
src/backend/optimizer/plan/planner.c | 2 +
src/backend/optimizer/plan/setrefs.c | 62 +++++++++++++++-------------
src/backend/partitioning/partprune.c | 24 ++++++++++-
src/backend/storage/lmgr/lmgr.c | 1 +
src/backend/tcop/pquery.c | 10 ++++-
src/backend/utils/cache/lsyscache.c | 1 -
src/backend/utils/cache/plancache.c | 25 +++++++----
src/include/commands/explain.h | 5 ++-
src/include/executor/execdesc.h | 2 +
src/include/nodes/execnodes.h | 2 +
src/include/nodes/pathnodes.h | 6 +++
src/include/nodes/plannodes.h | 11 +++++
src/include/utils/plancache.h | 10 +++++
24 files changed, 186 insertions(+), 51 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 91de442f43..db976f928a 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -552,7 +552,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 0b629b1f79..57a3375cad 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -324,7 +324,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 11df4a04d4..a83ea07db1 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -507,7 +507,7 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
}
@@ -615,7 +615,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage,
@@ -671,7 +672,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, cplan, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 1643c8c69a..3f7f4306fe 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -798,6 +798,7 @@ execute_sql_string(const char *sql)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 91f0fd6ea3..a7a79583ec 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -438,7 +438,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 07257d4db9..311b9ebd5b 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -655,7 +655,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
+ ExplainOnePlan(pstmt, cplan, into, es, query_string, paramLI,
+ queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
else
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 29e186fa73..271f9d93fc 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -52,6 +52,7 @@
#include "miscadmin.h"
#include "parser/parse_relation.h"
#include "rewrite/rewriteHandler.h"
+#include "storage/lmgr.h"
#include "tcop/utility.h"
#include "utils/acl.h"
#include "utils/backend_status.h"
@@ -597,6 +598,21 @@ ExecCheckPermissions(List *rangeTable, List *rteperminfos,
(rte->rtekind == RTE_SUBQUERY &&
rte->relkind == RELKIND_VIEW));
+ /*
+ * Ensure that we have at least an AccessShareLock on relations
+ * whose permissions need to be checked.
+ *
+ * Skip this check in a parallel worker because locks won't be
+ * taken until ExecInitNode() performs plan initialization.
+ *
+ * XXX: ExecCheckPermissions() in a parallel worker may be
+ * redundant with the checks done in the leader process, so this
+ * should be reviewed to ensure it's necessary.
+ */
+ Assert(IsParallelWorker() ||
+ CheckRelationOidLockedByMe(rte->relid, AccessShareLock,
+ true));
+
(void) getRTEPermissionInfo(rteperminfos, rte);
/* Many-to-one mapping not allowed */
Assert(!bms_is_member(rte->perminfoindex, indexset));
@@ -829,6 +845,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ CachedPlan *cachedplan = queryDesc->cplan;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -848,6 +865,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
estate->es_plannedstmt = plannedstmt;
+ estate->es_cachedplan = cachedplan;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index bfb3419efb..03b48e12b4 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1256,8 +1256,15 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
- /* Create a QueryDesc for the query. */
+ /*
+ * Create a QueryDesc for the query. We pass NULL for cachedplan, because
+ * we don't have a pointer to the CachedPlan in the leader's process. It's
+ * fine because the only reason the executor needs to see it is to decide
+ * if it should take locks on certain relations, but parallel workers
+ * always take locks anyway.
+ */
return CreateQueryDesc(pstmt,
+ NULL,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 5737f9f4eb..6dfd5a26b7 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -752,6 +752,26 @@ ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos)
estate->es_rowmarks = NULL;
}
+/*
+ * ExecShouldLockRelation
+ * Determine if the relation should be locked.
+ *
+ * The relation does not need to be locked if we are not running a cached
+ * plan or if it has already been locked as an unprunable relation.
+ *
+ * Lock the relation if it might be one of the prunable relations mentioned
+ * in the cached plan.
+ */
+static bool
+ExecShouldLockRelation(EState *estate, Index rtindex)
+{
+ if (estate->es_cachedplan == NULL ||
+ bms_is_member(rtindex, estate->es_plannedstmt->unprunableRelids))
+ return false;
+
+ return CachedPlanRequiresLocking(estate->es_cachedplan);
+}
+
/*
* ExecGetRangeTableRelation
* Open the Relation for a range table entry, if not already done
@@ -773,7 +793,7 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rte->rtekind == RTE_RELATION);
- if (!IsParallelWorker())
+ if (!IsParallelWorker() && !ExecShouldLockRelation(estate, rti))
{
/*
* In a normal query, we should already have the appropriate lock,
@@ -789,9 +809,17 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
else
{
/*
+ * Lock relation either if we are a parallel worker or if
+ * ExecShouldLockRelation() says we should.
+ *
* If we are a parallel worker, we need to obtain our own local
* lock on the relation. This ensures sane behavior in case the
* parent process exits before we do.
+ *
+ * ExecShouldLockRelation() would return true if the RT index is
+ * that of a prunable relation and we're running a cached generic
+ * plan. AcquireExecutorLocks() of plancache.c would have locked
+ * only the unprunable relations in the plan tree.
*/
rel = table_open(rte->relid, rte->rellockmode);
}
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 692854e2b3..6f6f45e0ad 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -840,6 +840,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest = None_Receiver;
es->qd = CreateQueryDesc(es->stmt,
+ NULL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index d6516b1bca..902793b02b 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -2684,6 +2684,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
snap = InvalidSnapshot;
qdesc = CreateQueryDesc(stmt,
+ cplan,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index b5827d3980..cb9b6f0147 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -546,6 +546,8 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
result->rtable = glob->finalrtable;
+ result->unprunableRelids = bms_difference(bms_add_range(NULL, 1, list_length(result->rtable)),
+ glob->prunableRelids);
result->permInfos = glob->finalrteperminfos;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 7aed84584c..b6be0e5730 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -154,6 +154,9 @@ static Plan *set_append_references(PlannerInfo *root,
static Plan *set_mergeappend_references(PlannerInfo *root,
MergeAppend *mplan,
int rtoffset);
+static void set_part_prune_references(PartitionPruneInfo *pinfo,
+ PlannerGlobal *glob,
+ int rtoffset);
static void set_hash_references(PlannerInfo *root, Plan *plan, int rtoffset);
static Relids offset_relid_set(Relids relids, int rtoffset);
static Node *fix_scan_expr(PlannerInfo *root, Node *node,
@@ -1783,20 +1786,8 @@ set_append_references(PlannerInfo *root,
aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
if (aplan->part_prune_info)
- {
- foreach(l, aplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ set_part_prune_references(aplan->part_prune_info, root->glob,
+ rtoffset);
/* We don't need to recurse to lefttree or righttree ... */
Assert(aplan->plan.lefttree == NULL);
@@ -1859,20 +1850,8 @@ set_mergeappend_references(PlannerInfo *root,
mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
if (mplan->part_prune_info)
- {
- foreach(l, mplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ set_part_prune_references(mplan->part_prune_info, root->glob,
+ rtoffset);
/* We don't need to recurse to lefttree or righttree ... */
Assert(mplan->plan.lefttree == NULL);
@@ -1881,6 +1860,33 @@ set_mergeappend_references(PlannerInfo *root,
return (Plan *) mplan;
}
+/*
+ * Updates RT indexes in PartitionedRelPruneInfos contained in pinfo and adds
+ * the RT indexes of "prunable" relations into glob->prunableRelids.
+ */
+static void
+set_part_prune_references(PartitionPruneInfo *pinfo, PlannerGlobal *glob,
+ int rtoffset)
+{
+ ListCell *l;
+
+ foreach(l, pinfo->prune_infos)
+ {
+ List *prune_infos = lfirst(l);
+ ListCell *l2;
+
+ foreach(l2, prune_infos)
+ {
+ PartitionedRelPruneInfo *prelinfo = lfirst(l2);
+
+ prelinfo->rtindex += rtoffset;
+ if (prelinfo->initial_pruning_steps != NIL)
+ glob->prunableRelids = bms_add_members(glob->prunableRelids,
+ prelinfo->present_part_rtis);
+ }
+ }
+}
+
/*
* set_hash_references
* Do set_plan_references processing on a Hash node
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 9a1a7faac7..8e27e35df2 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -634,6 +634,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
PartitionedRelPruneInfo *pinfo = lfirst(lc);
RelOptInfo *subpart = find_base_rel(root, pinfo->rtindex);
Bitmapset *present_parts;
+ Bitmapset *present_part_rtis;
int nparts = subpart->nparts;
int *subplan_map;
int *subpart_map;
@@ -650,7 +651,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
- present_parts = NULL;
+ present_parts = present_part_rtis = NULL;
i = -1;
while ((i = bms_next_member(subpart->live_parts, i)) >= 0)
@@ -664,15 +665,35 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
+
+ /*
+ * Track the RT indexes of partitions to ensure they are included
+ * in the prunableRelids set of relations that are locked during
+ * execution. This ensures that if the plan is cached, these
+ * partitions are locked when the plan is reused.
+ *
+ * Partitions without a subplan and sub-partitioned partitions
+ * where none of the sub-partitions have a subplan due to
+ * constraint exclusion are not included in this set. Instead,
+ * they are added to the unprunableRelids set, and the relations
+ * in this set are locked by AcquireExecutorLocks() before
+ * executing a cached plan.
+ */
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
+ present_part_rtis = bms_add_member(present_part_rtis,
+ partrel->relid);
/* Record finding this subplan */
subplansfound = bms_add_member(subplansfound, subplanidx);
}
else if (subpartidx >= 0)
+ {
present_parts = bms_add_member(present_parts, i);
+ present_part_rtis = bms_add_member(present_part_rtis,
+ partrel->relid);
+ }
}
/*
@@ -684,6 +705,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
/* Record the maps and other information. */
pinfo->present_parts = present_parts;
+ pinfo->present_part_rtis = present_part_rtis;
pinfo->nparts = nparts;
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index 094522acb4..a1c89f5d72 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -26,6 +26,7 @@
#include "storage/procarray.h"
#include "storage/sinvaladt.h"
#include "utils/inval.h"
+#include "utils/lsyscache.h"
/*
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index a1f8d03db1..6e8f6b1b8f 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -36,6 +36,7 @@ Portal ActivePortal = NULL;
static void ProcessQuery(PlannedStmt *plan,
+ CachedPlan *cplan,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +66,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +79,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->cplan = cplan; /* CachedPlan supplying the plannedstmt */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +125,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * cplan: CachedPlan supplying the plan
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +138,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ CachedPlan *cplan,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +150,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, cplan, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -493,6 +498,7 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->cplan,
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1276,6 +1282,7 @@ PortalRunMulti(Portal portal,
{
/* statement can set tag string */
ProcessQuery(pstmt,
+ portal->cplan,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1285,6 +1292,7 @@ PortalRunMulti(Portal portal,
{
/* stmt added by rewrite cannot set tag */
ProcessQuery(pstmt,
+ portal->cplan,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 48a280d089..f647821382 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -2113,7 +2113,6 @@ get_rel_relam(Oid relid)
return result;
}
-
/* ---------- TRANSFORM CACHE ---------- */
Oid
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 5af1a168ec..6d2e385fe8 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -104,7 +104,8 @@ static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
static bool CheckCachedPlan(CachedPlanSource *plansource);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv);
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ bool generic);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
@@ -904,7 +905,8 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv)
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ bool generic)
{
CachedPlan *plan;
List *plist;
@@ -1026,6 +1028,7 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
plan->refcount = 0;
plan->context = plan_context;
plan->is_oneshot = plansource->is_oneshot;
+ plan->is_generic = generic;
plan->is_saved = false;
plan->is_valid = true;
@@ -1196,7 +1199,7 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
else
{
/* Build a new generic plan */
- plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv, true);
/* Just make real sure plansource->gplan is clear */
ReleaseGenericPlan(plansource);
/* Link the new generic plan into the plansource */
@@ -1241,7 +1244,7 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (customplan)
{
/* Build a custom plan */
- plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv, true);
/* Accumulate total costs of custom plans */
plansource->total_custom_cost += cached_plan_cost(plan, true);
@@ -1387,8 +1390,8 @@ CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
}
/*
- * Reject if AcquireExecutorLocks would have anything to do. This is
- * probably unnecessary given the previous check, but let's be safe.
+ * Reject if there are any lockable relations. This is probably
+ * unnecessary given the previous check, but let's be safe.
*/
foreach(lc, plan->stmt_list)
{
@@ -1776,7 +1779,7 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ int rtindex;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1794,9 +1797,13 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ rtindex = -1;
+ while ((rtindex = bms_next_member(plannedstmt->unprunableRelids,
+ rtindex)) >= 0)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ RangeTblEntry *rte = list_nth_node(RangeTblEntry,
+ plannedstmt->rtable,
+ rtindex - 1);
if (!(rte->rtekind == RTE_RELATION ||
(rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 9b8b351d9a..bf326eeb70 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -101,8 +101,9 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
- ExplainState *es, const char *queryString,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
+ IntoClause *into, ExplainState *es,
+ const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
const BufferUsage *bufusage,
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index 0a7274e26c..0e7245435d 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,7 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ CachedPlan *cplan; /* CachedPlan that supplies the plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +58,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index af7d8fd1e7..ee089505a0 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -42,6 +42,7 @@
#include "storage/condition_variable.h"
#include "utils/hsearch.h"
#include "utils/queryenvironment.h"
+#include "utils/plancache.h"
#include "utils/reltrigger.h"
#include "utils/sharedtuplestore.h"
#include "utils/snapshot.h"
@@ -633,6 +634,7 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ CachedPlan *es_cachedplan; /* CachedPlan supplying the plannedstmt */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 540d021592..2466157b25 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -116,6 +116,12 @@ typedef struct PlannerGlobal
/* "flat" rangetable for executor */
List *finalrtable;
+ /*
+ * RT indexes of relations subject to removal from the plan due to runtime
+ * pruning at plan initialization time
+ */
+ Bitmapset *prunableRelids;
+
/* "flat" list of RTEPermissionInfos */
List *finalrteperminfos;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 62cd6a6666..ae608812f1 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -71,6 +71,10 @@ typedef struct PlannedStmt
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *unprunableRelids; /* RT indexes of relations that are not
+ * subject to runtime pruning; for
+ * AcquireExecutorLocks() */
+
List *permInfos; /* list of RTEPermissionInfo nodes for rtable
* entries needing one */
@@ -1459,6 +1463,13 @@ typedef struct PartitionedRelPruneInfo
/* Indexes of all partitions which subplans or subparts are present for */
Bitmapset *present_parts;
+ /*
+ * RT indexes of all partitions which subplans or subparts are present
+ * for; only used during planning to help in the construction of
+ * PlannerGlobal.prunableRelids.
+ */
+ Bitmapset *present_part_rtis;
+
/* Length of the following arrays: */
int nparts;
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index a90dfdf906..0b5ee007ca 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -149,6 +149,7 @@ typedef struct CachedPlan
int magic; /* should equal CACHEDPLAN_MAGIC */
List *stmt_list; /* list of PlannedStmts */
bool is_oneshot; /* is it a "oneshot" plan? */
+ bool is_generic; /* is it a reusable generic plan? */
bool is_saved; /* is CachedPlan in a long-lived context? */
bool is_valid; /* is the stmt_list currently valid? */
Oid planRoleId; /* Role ID the plan was created for */
@@ -235,4 +236,13 @@ extern bool CachedPlanIsSimplyValid(CachedPlanSource *plansource,
extern CachedExpression *GetCachedExpression(Node *expr);
extern void FreeCachedExpression(CachedExpression *cexpr);
+/*
+ * CachedPlanRequiresLocking: should the executor acquire locks?
+ */
+static inline bool
+CachedPlanRequiresLocking(CachedPlan *cplan)
+{
+ return cplan->is_generic;
+}
+
#endif /* PLANCACHE_H */
--
2.43.0
Attachment: v51-0002-Assorted-tightening-in-various-ExecEnd-routines.patch (application/octet-stream)
From 509bdce6a875278385f47ba9184774bc9e57fb8b Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Thu, 28 Sep 2023 16:56:29 +0900
Subject: [PATCH v51 2/3] Assorted tightening in various ExecEnd*() routines
This includes adding NULLness checks on pointers before cleaning them
up. Many ExecEnd*() routines already perform this check, but a few
are missing them.  These NULLness checks might seem redundant as
things stand, since the ExecEnd*() routines operate under the
assumption that their matching ExecInit*() routine has fully
executed, ensuring the pointers are set.  However, that assumption
becomes shaky once later patches in this series allow ExecInit*()
routines to return early.
This also adds a guard at the beginning of EvalPlanQualEnd() to return
early if the EPQState does not appear to have been initialized. That
case can happen if the corresponding ExecInit*() routine returned
early without calling EvalPlanQualInit().
While at it, this commit ensures that pointers are consistently set
to NULL after cleanup in all ExecEnd*() routines.
Finally, for enhanced consistency, the format of NULLness checks has
been standardized to "if (pointer != NULL)", replacing the previous
"if (pointer)" style.
Reviewed-by: Robert Haas
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
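
In isolation, the pattern being standardized is the one nodeAgg.c now uses
for its tuplesort states, for example:

    if (node->sort_in != NULL)
    {
        tuplesort_end(node->sort_in);
        node->sort_in = NULL;   /* so repeated cleanup is harmless */
    }

Resetting the pointer after releasing the resource means that reaching the
same cleanup code again, or after a partially completed ExecInit*(), does
nothing rather than freeing the resource twice.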
---
src/backend/executor/execMain.c | 4 ++
src/backend/executor/nodeAgg.c | 27 +++++++++----
src/backend/executor/nodeAppend.c | 3 ++
src/backend/executor/nodeBitmapAnd.c | 4 +-
src/backend/executor/nodeBitmapHeapscan.c | 46 ++++++++++++++--------
src/backend/executor/nodeBitmapIndexscan.c | 23 ++++++-----
src/backend/executor/nodeBitmapOr.c | 4 +-
src/backend/executor/nodeCtescan.c | 3 +-
src/backend/executor/nodeForeignscan.c | 17 ++++----
src/backend/executor/nodeGather.c | 1 +
src/backend/executor/nodeGatherMerge.c | 1 +
src/backend/executor/nodeGroup.c | 6 +--
src/backend/executor/nodeHash.c | 6 +--
src/backend/executor/nodeHashjoin.c | 4 +-
src/backend/executor/nodeIncrementalSort.c | 13 +++++-
src/backend/executor/nodeIndexonlyscan.c | 25 ++++++------
src/backend/executor/nodeIndexscan.c | 23 ++++++-----
src/backend/executor/nodeLimit.c | 1 +
src/backend/executor/nodeLockRows.c | 1 +
src/backend/executor/nodeMaterial.c | 5 ++-
src/backend/executor/nodeMemoize.c | 7 +++-
src/backend/executor/nodeMergeAppend.c | 3 ++
src/backend/executor/nodeMergejoin.c | 2 +
src/backend/executor/nodeModifyTable.c | 11 +++++-
src/backend/executor/nodeNestloop.c | 2 +
src/backend/executor/nodeProjectSet.c | 1 +
src/backend/executor/nodeRecursiveunion.c | 24 +++++++++--
src/backend/executor/nodeResult.c | 1 +
src/backend/executor/nodeSamplescan.c | 7 +++-
src/backend/executor/nodeSeqscan.c | 16 +++-----
src/backend/executor/nodeSetOp.c | 6 ++-
src/backend/executor/nodeSort.c | 5 ++-
src/backend/executor/nodeSubqueryscan.c | 1 +
src/backend/executor/nodeTableFuncscan.c | 4 +-
src/backend/executor/nodeTidrangescan.c | 12 ++++--
src/backend/executor/nodeTidscan.c | 8 +++-
src/backend/executor/nodeUnique.c | 1 +
src/backend/executor/nodeWindowAgg.c | 41 +++++++++++++------
38 files changed, 246 insertions(+), 123 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 271f9d93fc..0f6dbd1e2b 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2999,6 +2999,10 @@ EvalPlanQualEnd(EPQState *epqstate)
MemoryContext oldcontext;
ListCell *l;
+ /* Nothing to do if EvalPlanQualInit() was never called. */
+ if (epqstate->parentestate == NULL)
+ return;
+
rtsize = epqstate->parentestate->es_range_table_size;
/*
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 53ead77ece..0dfba5ca16 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -4303,7 +4303,6 @@ GetAggInitVal(Datum textInitVal, Oid transtype)
void
ExecEndAgg(AggState *node)
{
- PlanState *outerPlan;
int transno;
int numGroupingSets = Max(node->maxsets, 1);
int setno;
@@ -4313,7 +4312,7 @@ ExecEndAgg(AggState *node)
* worker back into shared memory so that it can be picked up by the main
* process to report in EXPLAIN ANALYZE.
*/
- if (node->shared_info && IsParallelWorker())
+ if (node->shared_info != NULL && IsParallelWorker())
{
AggregateInstrumentation *si;
@@ -4326,10 +4325,16 @@ ExecEndAgg(AggState *node)
/* Make sure we have closed any open tuplesorts */
- if (node->sort_in)
+ if (node->sort_in != NULL)
+ {
tuplesort_end(node->sort_in);
- if (node->sort_out)
+ node->sort_in = NULL;
+ }
+ if (node->sort_out != NULL)
+ {
tuplesort_end(node->sort_out);
+ node->sort_out = NULL;
+ }
hashagg_reset_spill_state(node);
@@ -4345,19 +4350,25 @@ ExecEndAgg(AggState *node)
for (setno = 0; setno < numGroupingSets; setno++)
{
- if (pertrans->sortstates[setno])
+ if (pertrans->sortstates[setno] != NULL)
tuplesort_end(pertrans->sortstates[setno]);
}
}
/* And ensure any agg shutdown callbacks have been called */
for (setno = 0; setno < numGroupingSets; setno++)
+ {
ReScanExprContext(node->aggcontexts[setno]);
- if (node->hashcontext)
+ node->aggcontexts[setno] = NULL;
+ }
+ if (node->hashcontext != NULL)
+ {
ReScanExprContext(node->hashcontext);
+ node->hashcontext = NULL;
+ }
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
+ ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
void
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index ca0f54d676..86d75b1a7e 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -399,7 +399,10 @@ ExecEndAppend(AppendState *node)
* shut down each of the subscans
*/
for (i = 0; i < nplans; i++)
+ {
ExecEndNode(appendplans[i]);
+ appendplans[i] = NULL;
+ }
}
void
diff --git a/src/backend/executor/nodeBitmapAnd.c b/src/backend/executor/nodeBitmapAnd.c
index 9c9c666872..ae391222bf 100644
--- a/src/backend/executor/nodeBitmapAnd.c
+++ b/src/backend/executor/nodeBitmapAnd.c
@@ -192,8 +192,8 @@ ExecEndBitmapAnd(BitmapAndState *node)
*/
for (i = 0; i < nplans; i++)
{
- if (bitmapplans[i])
- ExecEndNode(bitmapplans[i]);
+ ExecEndNode(bitmapplans[i]);
+ bitmapplans[i] = NULL;
}
}
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 3c63bdd93d..19f18ab817 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -625,8 +625,6 @@ ExecReScanBitmapHeapScan(BitmapHeapScanState *node)
void
ExecEndBitmapHeapScan(BitmapHeapScanState *node)
{
- TableScanDesc scanDesc;
-
/*
* When ending a parallel worker, copy the statistics gathered by the
* worker back into shared memory so that it can be picked up by the main
@@ -650,38 +648,54 @@ ExecEndBitmapHeapScan(BitmapHeapScanState *node)
si->lossy_pages += node->stats.lossy_pages;
}
- /*
- * extract information from the node
- */
- scanDesc = node->ss.ss_currentScanDesc;
-
/*
* close down subplans
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
/*
* release bitmaps and buffers if any
*/
- if (node->tbmiterator)
+ if (node->tbmiterator != NULL)
+ {
tbm_end_iterate(node->tbmiterator);
- if (node->prefetch_iterator)
+ node->tbmiterator = NULL;
+ }
+ if (node->prefetch_iterator != NULL)
+ {
tbm_end_iterate(node->prefetch_iterator);
- if (node->tbm)
+ node->prefetch_iterator = NULL;
+ }
+ if (node->tbm != NULL)
+ {
tbm_free(node->tbm);
- if (node->shared_tbmiterator)
+ node->tbm = NULL;
+ }
+ if (node->shared_tbmiterator != NULL)
+ {
tbm_end_shared_iterate(node->shared_tbmiterator);
- if (node->shared_prefetch_iterator)
+ node->shared_tbmiterator = NULL;
+ }
+ if (node->shared_prefetch_iterator != NULL)
+ {
tbm_end_shared_iterate(node->shared_prefetch_iterator);
+ node->shared_prefetch_iterator = NULL;
+ }
if (node->pvmbuffer != InvalidBuffer)
+ {
ReleaseBuffer(node->pvmbuffer);
+ node->pvmbuffer = InvalidBuffer;
+ }
/*
- * close heap scan
+ * close heap scan (no-op if we didn't start it)
*/
- if (scanDesc)
- table_endscan(scanDesc);
-
+ if (node->ss.ss_currentScanDesc != NULL)
+ {
+ table_endscan(node->ss.ss_currentScanDesc);
+ node->ss.ss_currentScanDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeBitmapIndexscan.c b/src/backend/executor/nodeBitmapIndexscan.c
index 6df8e17ec8..4669e8d0ce 100644
--- a/src/backend/executor/nodeBitmapIndexscan.c
+++ b/src/backend/executor/nodeBitmapIndexscan.c
@@ -174,22 +174,21 @@ ExecReScanBitmapIndexScan(BitmapIndexScanState *node)
void
ExecEndBitmapIndexScan(BitmapIndexScanState *node)
{
- Relation indexRelationDesc;
- IndexScanDesc indexScanDesc;
-
- /*
- * extract information from the node
- */
- indexRelationDesc = node->biss_RelationDesc;
- indexScanDesc = node->biss_ScanDesc;
+ /* close the scan (no-op if we didn't start it) */
+ if (node->biss_ScanDesc != NULL)
+ {
+ index_endscan(node->biss_ScanDesc);
+ node->biss_ScanDesc = NULL;
+ }
/*
* close the index relation (no-op if we didn't open it)
*/
- if (indexScanDesc)
- index_endscan(indexScanDesc);
- if (indexRelationDesc)
- index_close(indexRelationDesc, NoLock);
+ if (node->biss_RelationDesc != NULL)
+ {
+ index_close(node->biss_RelationDesc, NoLock);
+ node->biss_RelationDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeBitmapOr.c b/src/backend/executor/nodeBitmapOr.c
index 7029536c64..de439235d2 100644
--- a/src/backend/executor/nodeBitmapOr.c
+++ b/src/backend/executor/nodeBitmapOr.c
@@ -210,8 +210,8 @@ ExecEndBitmapOr(BitmapOrState *node)
*/
for (i = 0; i < nplans; i++)
{
- if (bitmapplans[i])
- ExecEndNode(bitmapplans[i]);
+ ExecEndNode(bitmapplans[i]);
+ bitmapplans[i] = NULL;
}
}
diff --git a/src/backend/executor/nodeCtescan.c b/src/backend/executor/nodeCtescan.c
index 8081eed887..7cea943988 100644
--- a/src/backend/executor/nodeCtescan.c
+++ b/src/backend/executor/nodeCtescan.c
@@ -290,10 +290,11 @@ ExecEndCteScan(CteScanState *node)
/*
* If I am the leader, free the tuplestore.
*/
- if (node->leader == node)
+ if (node->leader != NULL && node->leader == node)
{
tuplestore_end(node->cte_table);
node->cte_table = NULL;
+ node->leader = NULL;
}
}
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index fe4ae55c0f..1357ccf3c9 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -300,17 +300,20 @@ ExecEndForeignScan(ForeignScanState *node)
EState *estate = node->ss.ps.state;
/* Let the FDW shut down */
- if (plan->operation != CMD_SELECT)
+ if (node->fdwroutine != NULL)
{
- if (estate->es_epq_active == NULL)
- node->fdwroutine->EndDirectModify(node);
+ if (plan->operation != CMD_SELECT)
+ {
+ if (estate->es_epq_active == NULL)
+ node->fdwroutine->EndDirectModify(node);
+ }
+ else
+ node->fdwroutine->EndForeignScan(node);
}
- else
- node->fdwroutine->EndForeignScan(node);
/* Shut down any outer plan. */
- if (outerPlanState(node))
- ExecEndNode(outerPlanState(node));
+ ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 5d4ffe989c..cae5ea1f92 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -244,6 +244,7 @@ void
ExecEndGather(GatherState *node)
{
ExecEndNode(outerPlanState(node)); /* let children clean up first */
+ outerPlanState(node) = NULL;
ExecShutdownGather(node);
}
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 45f6017c29..b36cd89e7d 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -284,6 +284,7 @@ void
ExecEndGatherMerge(GatherMergeState *node)
{
ExecEndNode(outerPlanState(node)); /* let children clean up first */
+ outerPlanState(node) = NULL;
ExecShutdownGatherMerge(node);
}
diff --git a/src/backend/executor/nodeGroup.c b/src/backend/executor/nodeGroup.c
index da32bec181..807429e504 100644
--- a/src/backend/executor/nodeGroup.c
+++ b/src/backend/executor/nodeGroup.c
@@ -225,10 +225,8 @@ ExecInitGroup(Group *node, EState *estate, int eflags)
void
ExecEndGroup(GroupState *node)
{
- PlanState *outerPlan;
-
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
+ ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
void
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 570a90ebe1..a913d5b50c 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -427,13 +427,11 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
void
ExecEndHash(HashState *node)
{
- PlanState *outerPlan;
-
/*
* shut down the subplan
*/
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
+ ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 2f7170604d..901c9e9be7 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -950,7 +950,7 @@ ExecEndHashJoin(HashJoinState *node)
/*
* Free hash table
*/
- if (node->hj_HashTable)
+ if (node->hj_HashTable != NULL)
{
ExecHashTableDestroy(node->hj_HashTable);
node->hj_HashTable = NULL;
@@ -960,7 +960,9 @@ ExecEndHashJoin(HashJoinState *node)
* clean up subtrees
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
ExecEndNode(innerPlanState(node));
+ innerPlanState(node) = NULL;
}
/*
diff --git a/src/backend/executor/nodeIncrementalSort.c b/src/backend/executor/nodeIncrementalSort.c
index 2ce5ed5ec8..010bcfafa8 100644
--- a/src/backend/executor/nodeIncrementalSort.c
+++ b/src/backend/executor/nodeIncrementalSort.c
@@ -1078,8 +1078,16 @@ ExecEndIncrementalSort(IncrementalSortState *node)
{
SO_printf("ExecEndIncrementalSort: shutting down sort node\n");
- ExecDropSingleTupleTableSlot(node->group_pivot);
- ExecDropSingleTupleTableSlot(node->transfer_tuple);
+ if (node->group_pivot != NULL)
+ {
+ ExecDropSingleTupleTableSlot(node->group_pivot);
+ node->group_pivot = NULL;
+ }
+ if (node->transfer_tuple != NULL)
+ {
+ ExecDropSingleTupleTableSlot(node->transfer_tuple);
+ node->transfer_tuple = NULL;
+ }
/*
* Release tuplesort resources.
@@ -1099,6 +1107,7 @@ ExecEndIncrementalSort(IncrementalSortState *node)
* Shut down the subplan.
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
SO_printf("ExecEndIncrementalSort: sort node shutdown\n");
}
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 612c673895..481d479760 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -397,15 +397,6 @@ ExecReScanIndexOnlyScan(IndexOnlyScanState *node)
void
ExecEndIndexOnlyScan(IndexOnlyScanState *node)
{
- Relation indexRelationDesc;
- IndexScanDesc indexScanDesc;
-
- /*
- * extract information from the node
- */
- indexRelationDesc = node->ioss_RelationDesc;
- indexScanDesc = node->ioss_ScanDesc;
-
/* Release VM buffer pin, if any. */
if (node->ioss_VMBuffer != InvalidBuffer)
{
@@ -413,13 +404,21 @@ ExecEndIndexOnlyScan(IndexOnlyScanState *node)
node->ioss_VMBuffer = InvalidBuffer;
}
+ /* close the scan (no-op if we didn't start it) */
+ if (node->ioss_ScanDesc != NULL)
+ {
+ index_endscan(node->ioss_ScanDesc);
+ node->ioss_ScanDesc = NULL;
+ }
+
/*
* close the index relation (no-op if we didn't open it)
*/
- if (indexScanDesc)
- index_endscan(indexScanDesc);
- if (indexRelationDesc)
- index_close(indexRelationDesc, NoLock);
+ if (node->ioss_RelationDesc != NULL)
+ {
+ index_close(node->ioss_RelationDesc, NoLock);
+ node->ioss_RelationDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 8000feff4c..a8172d8b82 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -784,22 +784,21 @@ ExecIndexAdvanceArrayKeys(IndexArrayKeyInfo *arrayKeys, int numArrayKeys)
void
ExecEndIndexScan(IndexScanState *node)
{
- Relation indexRelationDesc;
- IndexScanDesc indexScanDesc;
-
- /*
- * extract information from the node
- */
- indexRelationDesc = node->iss_RelationDesc;
- indexScanDesc = node->iss_ScanDesc;
+ /* close the scan (no-op if we didn't start it) */
+ if (node->iss_ScanDesc != NULL)
+ {
+ index_endscan(node->iss_ScanDesc);
+ node->iss_ScanDesc = NULL;
+ }
/*
* close the index relation (no-op if we didn't open it)
*/
- if (indexScanDesc)
- index_endscan(indexScanDesc);
- if (indexRelationDesc)
- index_close(indexRelationDesc, NoLock);
+ if (node->iss_RelationDesc != NULL)
+ {
+ index_close(node->iss_RelationDesc, NoLock);
+ node->iss_RelationDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index e6f1fb1562..eb7b6e52be 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -534,6 +534,7 @@ void
ExecEndLimit(LimitState *node)
{
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 41754ddfea..0d3489195b 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -387,6 +387,7 @@ ExecEndLockRows(LockRowsState *node)
/* We may have shut down EPQ already, but no harm in another call */
EvalPlanQualEnd(&node->lr_epqstate);
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
diff --git a/src/backend/executor/nodeMaterial.c b/src/backend/executor/nodeMaterial.c
index 22e1787fbd..883e3f3933 100644
--- a/src/backend/executor/nodeMaterial.c
+++ b/src/backend/executor/nodeMaterial.c
@@ -243,13 +243,16 @@ ExecEndMaterial(MaterialState *node)
* Release tuplestore resources
*/
if (node->tuplestorestate != NULL)
+ {
tuplestore_end(node->tuplestorestate);
- node->tuplestorestate = NULL;
+ node->tuplestorestate = NULL;
+ }
/*
* shut down the subplan
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeMemoize.c b/src/backend/executor/nodeMemoize.c
index df8e3fff08..690dee1daa 100644
--- a/src/backend/executor/nodeMemoize.c
+++ b/src/backend/executor/nodeMemoize.c
@@ -1128,12 +1128,17 @@ ExecEndMemoize(MemoizeState *node)
}
/* Remove the cache context */
- MemoryContextDelete(node->tableContext);
+ if (node->tableContext != NULL)
+ {
+ MemoryContextDelete(node->tableContext);
+ node->tableContext = NULL;
+ }
/*
* shut down the subplan
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
void
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index e1b9b984a7..3236444cf1 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -333,7 +333,10 @@ ExecEndMergeAppend(MergeAppendState *node)
* shut down each of the subscans
*/
for (i = 0; i < nplans; i++)
+ {
ExecEndNode(mergeplans[i]);
+ mergeplans[i] = NULL;
+ }
}
void
diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 29c54fcd75..926e631d88 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -1647,7 +1647,9 @@ ExecEndMergeJoin(MergeJoinState *node)
* shut down the subplans
*/
ExecEndNode(innerPlanState(node));
+ innerPlanState(node) = NULL;
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
MJ1_printf("ExecEndMergeJoin: %s\n",
"node processing ended");
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 8bf4c80d4a..9e56f9c36c 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -4724,7 +4724,9 @@ ExecEndModifyTable(ModifyTableState *node)
for (j = 0; j < resultRelInfo->ri_NumSlotsInitialized; j++)
{
ExecDropSingleTupleTableSlot(resultRelInfo->ri_Slots[j]);
+ resultRelInfo->ri_Slots[j] = NULL;
ExecDropSingleTupleTableSlot(resultRelInfo->ri_PlanSlots[j]);
+ resultRelInfo->ri_PlanSlots[j] = NULL;
}
}
@@ -4732,12 +4734,16 @@ ExecEndModifyTable(ModifyTableState *node)
* Close all the partitioned tables, leaf partitions, and their indices
* and release the slot used for tuple routing, if set.
*/
- if (node->mt_partition_tuple_routing)
+ if (node->mt_partition_tuple_routing != NULL)
{
ExecCleanupTupleRouting(node, node->mt_partition_tuple_routing);
+ node->mt_partition_tuple_routing = NULL;
- if (node->mt_root_tuple_slot)
+ if (node->mt_root_tuple_slot != NULL)
+ {
ExecDropSingleTupleTableSlot(node->mt_root_tuple_slot);
+ node->mt_root_tuple_slot = NULL;
+ }
}
/*
@@ -4749,6 +4755,7 @@ ExecEndModifyTable(ModifyTableState *node)
* shut down subplan
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
void
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index 7f4bf6c4db..01f3d56a3b 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -367,7 +367,9 @@ ExecEndNestLoop(NestLoopState *node)
* close down subplans
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
ExecEndNode(innerPlanState(node));
+ innerPlanState(node) = NULL;
NL1_printf("ExecEndNestLoop: %s\n",
"node processing ended");
diff --git a/src/backend/executor/nodeProjectSet.c b/src/backend/executor/nodeProjectSet.c
index e483730015..ca9a5e2ed2 100644
--- a/src/backend/executor/nodeProjectSet.c
+++ b/src/backend/executor/nodeProjectSet.c
@@ -331,6 +331,7 @@ ExecEndProjectSet(ProjectSetState *node)
* shut down subplans
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
void
diff --git a/src/backend/executor/nodeRecursiveunion.c b/src/backend/executor/nodeRecursiveunion.c
index c7f8a19fa4..7680142c7b 100644
--- a/src/backend/executor/nodeRecursiveunion.c
+++ b/src/backend/executor/nodeRecursiveunion.c
@@ -272,20 +272,36 @@ void
ExecEndRecursiveUnion(RecursiveUnionState *node)
{
/* Release tuplestores */
- tuplestore_end(node->working_table);
- tuplestore_end(node->intermediate_table);
+ if (node->working_table != NULL)
+ {
+ tuplestore_end(node->working_table);
+ node->working_table = NULL;
+ }
+ if (node->intermediate_table != NULL)
+ {
+ tuplestore_end(node->intermediate_table);
+ node->intermediate_table = NULL;
+ }
/* free subsidiary stuff including hashtable */
- if (node->tempContext)
+ if (node->tempContext != NULL)
+ {
MemoryContextDelete(node->tempContext);
- if (node->tableContext)
+ node->tempContext = NULL;
+ }
+ if (node->tableContext != NULL)
+ {
MemoryContextDelete(node->tableContext);
+ node->tableContext = NULL;
+ }
/*
* close down subplans
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
ExecEndNode(innerPlanState(node));
+ innerPlanState(node) = NULL;
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeResult.c b/src/backend/executor/nodeResult.c
index 348361e7f4..e3cfc9b772 100644
--- a/src/backend/executor/nodeResult.c
+++ b/src/backend/executor/nodeResult.c
@@ -243,6 +243,7 @@ ExecEndResult(ResultState *node)
* shut down subplans
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
void
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index 714b076e64..6ab91001bc 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -181,14 +181,17 @@ ExecEndSampleScan(SampleScanState *node)
/*
* Tell sampling function that we finished the scan.
*/
- if (node->tsmroutine->EndSampleScan)
+ if (node->tsmroutine != NULL && node->tsmroutine->EndSampleScan)
node->tsmroutine->EndSampleScan(node);
/*
- * close heap scan
+ * close heap scan (no-op if we didn't start it)
*/
if (node->ss.ss_currentScanDesc)
+ {
table_endscan(node->ss.ss_currentScanDesc);
+ node->ss.ss_currentScanDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 7cb12a11c2..b052775e5b 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -183,18 +183,14 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
void
ExecEndSeqScan(SeqScanState *node)
{
- TableScanDesc scanDesc;
-
- /*
- * get information from node
- */
- scanDesc = node->ss.ss_currentScanDesc;
-
/*
- * close heap scan
+ * close heap scan (no-op if we didn't start it)
*/
- if (scanDesc != NULL)
- table_endscan(scanDesc);
+ if (node->ss.ss_currentScanDesc != NULL)
+ {
+ table_endscan(node->ss.ss_currentScanDesc);
+ node->ss.ss_currentScanDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index a8ac68b482..fe34b2134f 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -583,10 +583,14 @@ void
ExecEndSetOp(SetOpState *node)
{
/* free subsidiary stuff including hashtable */
- if (node->tableContext)
+ if (node->tableContext != NULL)
+ {
MemoryContextDelete(node->tableContext);
+ node->tableContext = NULL;
+ }
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index 3fc925d7b4..af852464d0 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -307,13 +307,16 @@ ExecEndSort(SortState *node)
* Release tuplesort resources
*/
if (node->tuplesortstate != NULL)
+ {
tuplesort_end((Tuplesortstate *) node->tuplesortstate);
- node->tuplesortstate = NULL;
+ node->tuplesortstate = NULL;
+ }
/*
* shut down the subplan
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
SO1_printf("ExecEndSort: %s\n",
"sort node shutdown");
diff --git a/src/backend/executor/nodeSubqueryscan.c b/src/backend/executor/nodeSubqueryscan.c
index 782097eaf2..0b2612183a 100644
--- a/src/backend/executor/nodeSubqueryscan.c
+++ b/src/backend/executor/nodeSubqueryscan.c
@@ -171,6 +171,7 @@ ExecEndSubqueryScan(SubqueryScanState *node)
* close down subquery
*/
ExecEndNode(node->subplan);
+ node->subplan = NULL;
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeTableFuncscan.c b/src/backend/executor/nodeTableFuncscan.c
index f483221bb8..778d25d511 100644
--- a/src/backend/executor/nodeTableFuncscan.c
+++ b/src/backend/executor/nodeTableFuncscan.c
@@ -223,8 +223,10 @@ ExecEndTableFuncScan(TableFuncScanState *node)
* Release tuplestore resources
*/
if (node->tupstore != NULL)
+ {
tuplestore_end(node->tupstore);
- node->tupstore = NULL;
+ node->tupstore = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 9aa7683d7e..702ee884d2 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -326,10 +326,14 @@ ExecReScanTidRangeScan(TidRangeScanState *node)
void
ExecEndTidRangeScan(TidRangeScanState *node)
{
- TableScanDesc scan = node->ss.ss_currentScanDesc;
-
- if (scan != NULL)
- table_endscan(scan);
+ /*
+ * close heap scan (no-op if we didn't start it)
+ */
+ if (node->ss.ss_currentScanDesc != NULL)
+ {
+ table_endscan(node->ss.ss_currentScanDesc);
+ node->ss.ss_currentScanDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index 864a9013b6..f375951699 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -469,8 +469,14 @@ ExecReScanTidScan(TidScanState *node)
void
ExecEndTidScan(TidScanState *node)
{
- if (node->ss.ss_currentScanDesc)
+ /*
+ * close heap scan (no-op if we didn't start it)
+ */
+ if (node->ss.ss_currentScanDesc != NULL)
+ {
table_endscan(node->ss.ss_currentScanDesc);
+ node->ss.ss_currentScanDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeUnique.c b/src/backend/executor/nodeUnique.c
index a125923e93..b82d0e9ad5 100644
--- a/src/backend/executor/nodeUnique.c
+++ b/src/backend/executor/nodeUnique.c
@@ -168,6 +168,7 @@ void
ExecEndUnique(UniqueState *node)
{
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 3221fa1522..561d7e731d 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -1351,11 +1351,14 @@ release_partition(WindowAggState *winstate)
* any aggregate temp data). We don't rely on retail pfree because some
* aggregates might have allocated data we don't have direct pointers to.
*/
- MemoryContextReset(winstate->partcontext);
- MemoryContextReset(winstate->aggcontext);
+ if (winstate->partcontext != NULL)
+ MemoryContextReset(winstate->partcontext);
+ if (winstate->aggcontext != NULL)
+ MemoryContextReset(winstate->aggcontext);
for (i = 0; i < winstate->numaggs; i++)
{
- if (winstate->peragg[i].aggcontext != winstate->aggcontext)
+ if (winstate->peragg[i].aggcontext != NULL &&
+ winstate->peragg[i].aggcontext != winstate->aggcontext)
MemoryContextReset(winstate->peragg[i].aggcontext);
}
@@ -2681,24 +2684,40 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags)
void
ExecEndWindowAgg(WindowAggState *node)
{
- PlanState *outerPlan;
int i;
release_partition(node);
for (i = 0; i < node->numaggs; i++)
{
- if (node->peragg[i].aggcontext != node->aggcontext)
+ if (node->peragg[i].aggcontext != NULL &&
+ node->peragg[i].aggcontext != node->aggcontext)
MemoryContextDelete(node->peragg[i].aggcontext);
}
- MemoryContextDelete(node->partcontext);
- MemoryContextDelete(node->aggcontext);
+ if (node->partcontext != NULL)
+ {
+ MemoryContextDelete(node->partcontext);
+ node->partcontext = NULL;
+ }
+ if (node->aggcontext != NULL)
+ {
+ MemoryContextDelete(node->aggcontext);
+ node->aggcontext = NULL;
+ }
- pfree(node->perfunc);
- pfree(node->peragg);
+ if (node->perfunc != NULL)
+ {
+ pfree(node->perfunc);
+ node->perfunc = NULL;
+ }
+ if (node->peragg != NULL)
+ {
+ pfree(node->peragg);
+ node->peragg = NULL;
+ }
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
+ ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
/* -----------------
--
2.43.0
Hi,
On Thu, Aug 29, 2024 at 9:34 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Fri, Aug 23, 2024 at 9:48 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Wed, Aug 21, 2024 at 10:10 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Aug 21, 2024 at 8:45 AM Amit Langote <amitlangote09@gmail.com> wrote:
* The replanning aspect of the lock-in-the-executor design would be
simpler if a CachedPlan contained the plan for a single query rather
than a list of queries, as previously mentioned. This is particularly
due to the requirements of the PORTAL_MULTI_QUERY case. However, this
option might be impractical.

It might be, but maybe it would be worth a try? I mean,
GetCachedPlan() seems to just call pg_plan_queries() which just loops
over the list of query trees and does the same thing for each one. If
we wanted to replan a single query, why couldn't we do
fake_querytree_list = list_make1(list_nth(querytree_list, n)) and then
call pg_plan_queries(fake_querytree_list)? Or something equivalent to
that. We could have a new GetCachedSinglePlan(cplan, n) to do this.

I've been hacking to prototype this, and it's showing promise. It
helps make the replan loop at the call sites that start the executor
with an invalidatable plan more localized and less prone to
action-at-a-distance issues. However, the interface and contract of
the new function in my prototype are pretty specialized for the replan
loop in this context—meaning it's not as general-purpose as
GetCachedPlan(). Essentially, what you get when you call it is a
'throwaway' CachedPlan containing only the plan for the query that
failed during ExecutorStart(), not a plan integrated into the original
CachedPlanSource's stmt_list. A call site entering the replan loop
will retry the execution with that throwaway plan, release it once
done, and resume looping over the plans in the original list. The
invalid plan that remains in the original list will be discarded and
replanned in the next call to GetCachedPlan() using the same
CachedPlanSource. While that may sound undesirable, I'm inclined to
think it's not something that needs optimization, given that we're
expecting this code path to be taken rarely.

I'll post a version of a revamped locks-in-the-executor patch set
using the above function after debugging some more.

Here it is.
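To recap the single-query replanning idea quoted above in code form,
here is a minimal sketch; every detail, including the function's exact
shape, is assumed rather than taken from the attached patches:

/*
 * Hypothetical sketch only: plan just the n'th query of a
 * CachedPlanSource and wrap the result in a throwaway CachedPlan.  The
 * locking, snapshot, and memory-context bookkeeping that the real
 * BuildCachedPlan() performs is elided.
 */
static CachedPlan *
GetCachedSinglePlan(CachedPlanSource *plansource, int n,
                    ParamListInfo boundParams)
{
    List       *fake_querytree_list;
    List       *plist;
    CachedPlan *plan;

    /* Plan only the query that needs replanning. */
    fake_querytree_list = list_make1(list_nth(plansource->query_list, n));
    plist = pg_plan_queries(fake_querytree_list, plansource->query_string,
                            plansource->cursor_options, boundParams);

    /* Wrap it in a transient CachedPlan not linked to the plansource. */
    plan = (CachedPlan *) palloc0(sizeof(CachedPlan));
    plan->magic = CACHEDPLAN_MAGIC;
    plan->stmt_list = plist;
    plan->is_valid = true;
    plan->refcount = 1;         /* caller releases it when done */

    return plan;
}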
0001 implements changes to defer the locking of runtime-prunable
relations to the executor. The new design introduces a bitmapset
field in PlannedStmt to distinguish at runtime between relations that
are prunable, whose locking can be deferred until ExecInitNode(), and
those that are not, which must be locked in advance. The set of prunable
relations can be constructed by looking at all the PartitionPruneInfos
in the plan and checking which are subject to "initial" pruning steps.
The set of unprunable relations is obtained by subtracting those from
the set of all RT indexes. This design gets rid of one annoying
aspect of the old design which was the need to add specialized fields
to store the RT indexes of partitioned relations that are not
otherwise referenced in the plan tree. That was necessary because in
the old design, I had removed the function AcquireExecutorLocks()
altogether to defer the locking of all child relations to execution.
In the new design such relations are still locked by
AcquireExecutorLocks().

0002 is the old patch to make ExecEndNode() robust against partially
initialized PlanState nodes by adding NULL checks.

0003 is the patch to add changes to deal with the CachedPlan becoming
invalid before the deferred locks on prunable relations are taken.
I've moved the replan loop into a new wrapper-over-ExecutorStart()
function instead of having the same logic at multiple sites. The
replan logic uses the GetSingleCachedPlan() described in the quoted
text. The callers of the new ExecutorStart()-wrapper, which I've
dubbed ExecutorStartExt(), need to pass the CachedPlanSource and a
query_index, which is the index of the query being executed in the
list CachedPlanSource.query_list. They are needed by
GetSingleCachedPlan(). The changes outside the executor are pretty
minimal in this design and all the difficulties of having to loop back
to GetCachedPlan() are now gone. I like how this turned out.

One idea that I think might be worth trying to reduce the footprint of
0003 is to try to lock the prunable relations in a step of InitPlan()
separate from ExecInitNode(), which can be implemented by doing the
initial runtime pruning in that separate step. That way, we'll have
all the necessary locks before calling ExecInitNode() and so we don't
need to sprinkle the CachedPlanStillValid() checks all over the place
and worry about missed checks and dealing with partially initialized
PlanState trees.

--
Thanks,
Amit Langote
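The separate-locking-step idea at the end of the message above might
look roughly like the following inside InitPlan(). ExecDoInitialPruning()
is a hypothetical helper name; none of this is from the posted patches:

/*
 * Sketch only: run all "initial" pruning steps up front, then take the
 * deferred locks before ExecInitNode() builds the PlanState tree, so
 * that no CachedPlanStillValid() checks are needed inside individual
 * ExecInit*() routines.
 */
static void
ExecLockSurvivingRelations(PlannedStmt *plannedstmt, EState *estate)
{
    Bitmapset  *survivingRelids;
    int         rtindex = -1;

    /* Hypothetical helper performing the plan's initial pruning steps. */
    survivingRelids = ExecDoInitialPruning(plannedstmt, estate);

    while ((rtindex = bms_next_member(survivingRelids, rtindex)) >= 0)
    {
        RangeTblEntry *rte = exec_rt_fetch(rtindex, estate);

        /* Unprunable relations were locked by AcquireExecutorLocks(). */
        if (!bms_is_member(rtindex, plannedstmt->unprunableRelids))
            LockRelationOid(rte->relid, rte->rellockmode);
    }
}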
@@ -1241,7 +1244,7 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
     if (customplan)
     {
         /* Build a custom plan */
-        plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv);
+        plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv, true);

Is the *true* here a typo? Seems it should be *false* for custom plan?
--
Regards
Junwang Zhao
On Sat, Aug 31, 2024 at 9:30 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
@@ -1241,7 +1244,7 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
     if (customplan)
     {
         /* Build a custom plan */
-        plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv);
+        plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv, true);

Is the *true* here a typo? Seems it should be *false* for custom plan?
That's correct, thanks for catching that. Will fix.
--
Thanks,
Amit Langote
On Mon, Sep 2, 2024 at 5:19 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Sat, Aug 31, 2024 at 9:30 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
@@ -1241,7 +1244,7 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
     if (customplan)
     {
         /* Build a custom plan */
-        plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv);
+        plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv, true);

Is the *true* here a typo? Seems it should be *false* for custom plan?
That's correct, thanks for catching that. Will fix.
Done.
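For reference, the custom-plan branch in the attached v52-0001 now
reads:

    /* Build a custom plan */
    plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv, false);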
I've also rewritten the new GetSingleCachedPlan() function in 0003.
The most glaring bug in the previous version was that the transient
CachedPlan it creates cannot be seen by PlanCacheRelCallback() and the
other inval callback functions because it was intentionally not
linked to the CachedPlanSource, so the CachedPlan would not be
invalidated even
if some prunable relation got changed before it is locked during
ExecutorStart(). I've added a new list standalone_plan_list to add
these to and changed the inval callback functions to invalidate any
plans contained in them.
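A rough sketch of that callback change follows; standalone_plan_list is
the name from this message, while the loop details here are
assumptions, not the patch:

/*
 * Sketch only: also visit the transient plans, which are not reachable
 * from any CachedPlanSource, when invalidating by relation OID.  This
 * would be called from PlanCacheRelCallback() and friends.
 */
static List *standalone_plan_list = NIL;

static void
InvalidateStandalonePlans(Oid relid)
{
    ListCell   *lc;

    foreach(lc, standalone_plan_list)
    {
        CachedPlan *cplan = (CachedPlan *) lfirst(lc);
        ListCell   *lc2;

        foreach(lc2, cplan->stmt_list)
        {
            PlannedStmt *pstmt = lfirst_node(PlannedStmt, lc2);

            /* InvalidOid means "invalidate everything" */
            if (!OidIsValid(relid) ||
                list_member_oid(pstmt->relationOids, relid))
            {
                cplan->is_valid = false;
                break;
            }
        }
    }
}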
Another thing I found out through testing is that CachedPlanSource can
have become invalid since leaving GetCachedPlan() (actually even
before returning from that function) because of
PlanCacheSysCallback(), which drops/invalidates *all* plans when a
syscache is invalidated. There are comments in plancache.c (see
BuildCachedPlan()) saying that such invalidations are, in theory,
false positives, but that gave me pause nonetheless.
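In code terms, the window is something like this (a sketch using the
thread's ExecutorStartExt() naming; none of it is actual patch code):

    cplan = GetCachedPlan(plansource, boundParams, owner, queryEnv);

    /*
     * A syscache invalidation arriving here runs PlanCacheSysCallback(),
     * which can mark every cached plan invalid, including this one, even
     * though the relations it touches are unchanged (a false positive).
     */

    /* So the machinery below must tolerate !cplan->is_valid. */
    ExecutorStartExt(queryDesc, eflags, plansource, query_index);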
Finally, instead of calling GetCachedPlan() from GetSingleCachedPlan()
to create a plan for only the query whose plan got invalidated, which
required a bunch of care to ensure that the CachedPlanSource is not
overwritten with the information about this single-query planning,
I've made GetSingleCachedPlan() create the PlannedStmt and the
detached CachedPlan object on its own, borrowing the minimal necessary
code from BuildCachedPlan() to do so.
--
Thanks,
Amit Langote
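Putting the pieces together, the replan loop described in this thread
might look roughly like the sketch below. The bool-returning
ExecutorStart() and the exact GetSingleCachedPlan() signature are
assumptions, not the actual patch:

/*
 * Sketch of ExecutorStartExt(): assumes ExecutorStart() reports via
 * its return value whether the plan survived taking the deferred locks.
 */
void
ExecutorStartExt(QueryDesc *queryDesc, int eflags,
                 CachedPlanSource *plansource, int query_index)
{
    if (ExecutorStart(queryDesc, eflags))
        return;                 /* plan stayed valid; done */

    /* Retry with throwaway single-query plans until one survives. */
    for (;;)
    {
        CachedPlan *cplan = GetSingleCachedPlan(plansource, query_index,
                                                queryDesc->params);

        queryDesc->cplan = cplan;
        queryDesc->plannedstmt = linitial_node(PlannedStmt,
                                               cplan->stmt_list);
        if (ExecutorStart(queryDesc, eflags))
            break;              /* caller releases cplan after execution */

        ReleaseCachedPlan(cplan, NULL);
    }
}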
Attachments:
v52-0001-Defer-locking-of-runtime-prunable-relations-to-e.patch
From 74906c4bbc42b362f7a5608774af68615a299912 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Wed, 7 Aug 2024 18:25:51 +0900
Subject: [PATCH v52 1/3] Defer locking of runtime-prunable relations to
executor
When preparing a cached plan for execution, plancache.c locks the
relations contained in the plan's range table to ensure it is safe for
execution. However, this simplistic approach, implemented in
AcquireExecutorLocks(), results in unnecessarily locking relations that
might be pruned during "initial" runtime pruning.
To optimize this, the locking is now deferred for relations that are
subject to "initial" runtime pruning. The planner now provides a set
of "unprunable" relations, available through the new
PlannedStmt.unprunableRelids field. AcquireExecutorLocks() will now
only lock those relations.
PlannedStmt.unprunableRelids is populated by subtracting the set of
initially prunable relids from the set of all RT indexes. The prunable
relids set is constructed by examining all PartitionPruneInfos during
set_plan_refs() and storing the RT indexes of partitions subject to
"initial" pruning steps. While at it, some duplicated code in
set_append_references() and set_mergeappend_references() that
constructs the prunable relids set has been refactored into a common
function.
To enable the executor to determine whether the plan tree it's
executing is a cached one, the CachedPlan is now made available via
the QueryDesc. The executor can call CachedPlanRequiresLocking(),
which returns true if the CachedPlan is a reusable generic plan that
might contain relations needing to be locked. If so, the executor
will lock any relation that is not in PlannedStmt.unprunableRelids.
Finally, an Assert has been added in ExecCheckPermissions() to ensure
that all relations whose permissions are checked have been properly
locked. This helps catch any accidental omission of relations from the
unprunableRelids set that should have their permissions checked.
This deferment introduces a window in which prunable relations may be
altered by concurrent DDL, potentially causing the plan to become
invalid. As a result, the executor might attempt to run an invalid plan,
leading to errors such as being unable to locate a partition-only index
during ExecInitIndexScan(). Future commits will introduce changes to
ready the executor to check plan validity during ExecutorStart() and
retry with a newly created plan if the original one becomes invalid
after taking deferred locks.
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 ++--
src/backend/commands/extension.c | 1 +
src/backend/commands/matview.c | 2 +-
src/backend/commands/prepare.c | 3 +-
src/backend/executor/execMain.c | 18 ++++++++
src/backend/executor/execParallel.c | 9 +++-
src/backend/executor/execUtils.c | 30 +++++++++++++-
src/backend/executor/functions.c | 1 +
src/backend/executor/spi.c | 1 +
src/backend/optimizer/plan/planner.c | 2 +
src/backend/optimizer/plan/setrefs.c | 62 +++++++++++++++-------------
src/backend/partitioning/partprune.c | 24 ++++++++++-
src/backend/storage/lmgr/lmgr.c | 1 +
src/backend/tcop/pquery.c | 10 ++++-
src/backend/utils/cache/lsyscache.c | 1 -
src/backend/utils/cache/plancache.c | 40 +++++++++++-------
src/include/commands/explain.h | 5 ++-
src/include/executor/execdesc.h | 2 +
src/include/nodes/execnodes.h | 2 +
src/include/nodes/pathnodes.h | 6 +++
src/include/nodes/plannodes.h | 11 +++++
src/include/utils/plancache.h | 10 +++++
24 files changed, 195 insertions(+), 57 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 91de442f43..db976f928a 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -552,7 +552,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 0b629b1f79..57a3375cad 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -324,7 +324,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 11df4a04d4..a83ea07db1 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -507,7 +507,7 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
}
@@ -615,7 +615,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage,
@@ -671,7 +672,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, cplan, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 1643c8c69a..3f7f4306fe 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -798,6 +798,7 @@ execute_sql_string(const char *sql)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 91f0fd6ea3..a7a79583ec 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -438,7 +438,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 07257d4db9..311b9ebd5b 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -655,7 +655,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
+ ExplainOnePlan(pstmt, cplan, into, es, query_string, paramLI,
+ queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
else
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 29e186fa73..271f9d93fc 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -52,6 +52,7 @@
#include "miscadmin.h"
#include "parser/parse_relation.h"
#include "rewrite/rewriteHandler.h"
+#include "storage/lmgr.h"
#include "tcop/utility.h"
#include "utils/acl.h"
#include "utils/backend_status.h"
@@ -597,6 +598,21 @@ ExecCheckPermissions(List *rangeTable, List *rteperminfos,
(rte->rtekind == RTE_SUBQUERY &&
rte->relkind == RELKIND_VIEW));
+ /*
+ * Ensure that we have at least an AccessShareLock on relations
+ * whose permissions need to be checked.
+ *
+ * Skip this check in a parallel worker because locks won't be
+ * taken until ExecInitNode() performs plan initialization.
+ *
+ * XXX: ExecCheckPermissions() in a parallel worker may be
+ * redundant with the checks done in the leader process, so this
+	 * should be reviewed to ensure it's necessary.
+ */
+ Assert(IsParallelWorker() ||
+ CheckRelationOidLockedByMe(rte->relid, AccessShareLock,
+ true));
+
(void) getRTEPermissionInfo(rteperminfos, rte);
/* Many-to-one mapping not allowed */
Assert(!bms_is_member(rte->perminfoindex, indexset));
@@ -829,6 +845,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ CachedPlan *cachedplan = queryDesc->cplan;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -848,6 +865,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
estate->es_plannedstmt = plannedstmt;
+ estate->es_cachedplan = cachedplan;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index bfb3419efb..03b48e12b4 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1256,8 +1256,15 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(¶mspace);
- /* Create a QueryDesc for the query. */
+ /*
+ * Create a QueryDesc for the query. We pass NULL for cachedplan, because
+ * we don't have a pointer to the CachedPlan in the leader's process. It's
+ * fine because the only reason the executor needs to see it is to decide
+ * if it should take locks on certain relations, but parallel workers
+ * always take locks anyway.
+ */
return CreateQueryDesc(pstmt,
+ NULL,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 5737f9f4eb..6dfd5a26b7 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -752,6 +752,26 @@ ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos)
estate->es_rowmarks = NULL;
}
+/*
+ * ExecShouldLockRelation
+ * Determine if the relation should be locked.
+ *
+ * The relation does not need to be locked if we are not running a cached
+ * plan or if it has already been locked as an unprunable relation.
+ *
+ * Lock the relation if it might be one of the prunable relations mentioned
+ * in the cached plan.
+ */
+static bool
+ExecShouldLockRelation(EState *estate, Index rtindex)
+{
+ if (estate->es_cachedplan == NULL ||
+ bms_is_member(rtindex, estate->es_plannedstmt->unprunableRelids))
+ return false;
+
+ return CachedPlanRequiresLocking(estate->es_cachedplan);
+}
+
/*
* ExecGetRangeTableRelation
* Open the Relation for a range table entry, if not already done
@@ -773,7 +793,7 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rte->rtekind == RTE_RELATION);
- if (!IsParallelWorker())
+ if (!IsParallelWorker() && !ExecShouldLockRelation(estate, rti))
{
/*
* In a normal query, we should already have the appropriate lock,
@@ -789,9 +809,17 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
else
{
/*
+ * Lock relation either if we are a parallel worker or if
+ * ExecShouldLockRelation() says we should.
+ *
* If we are a parallel worker, we need to obtain our own local
* lock on the relation. This ensures sane behavior in case the
* parent process exits before we do.
+ *
+ * ExecShouldLockRelation() would return true if the RT index is
+ * that of a prunable relation and we're running a cached generic
+ * plan. AcquireExecutorLocks() of plancache.c would have locked
+ * only the unprunable relations in the plan tree.
*/
rel = table_open(rte->relid, rte->rellockmode);
}
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 692854e2b3..6f6f45e0ad 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -840,6 +840,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest = None_Receiver;
es->qd = CreateQueryDesc(es->stmt,
+ NULL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index d6516b1bca..902793b02b 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -2684,6 +2684,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
snap = InvalidSnapshot;
qdesc = CreateQueryDesc(stmt,
+ cplan,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index b5827d3980..cb9b6f0147 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -546,6 +546,8 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
result->rtable = glob->finalrtable;
+ result->unprunableRelids = bms_difference(bms_add_range(NULL, 1, list_length(result->rtable)),
+ glob->prunableRelids);
result->permInfos = glob->finalrteperminfos;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 7aed84584c..b6be0e5730 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -154,6 +154,9 @@ static Plan *set_append_references(PlannerInfo *root,
static Plan *set_mergeappend_references(PlannerInfo *root,
MergeAppend *mplan,
int rtoffset);
+static void set_part_prune_references(PartitionPruneInfo *pinfo,
+ PlannerGlobal *glob,
+ int rtoffset);
static void set_hash_references(PlannerInfo *root, Plan *plan, int rtoffset);
static Relids offset_relid_set(Relids relids, int rtoffset);
static Node *fix_scan_expr(PlannerInfo *root, Node *node,
@@ -1783,20 +1786,8 @@ set_append_references(PlannerInfo *root,
aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
if (aplan->part_prune_info)
- {
- foreach(l, aplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ set_part_prune_references(aplan->part_prune_info, root->glob,
+ rtoffset);
/* We don't need to recurse to lefttree or righttree ... */
Assert(aplan->plan.lefttree == NULL);
@@ -1859,20 +1850,8 @@ set_mergeappend_references(PlannerInfo *root,
mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
if (mplan->part_prune_info)
- {
- foreach(l, mplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ set_part_prune_references(mplan->part_prune_info, root->glob,
+ rtoffset);
/* We don't need to recurse to lefttree or righttree ... */
Assert(mplan->plan.lefttree == NULL);
@@ -1881,6 +1860,33 @@ set_mergeappend_references(PlannerInfo *root,
return (Plan *) mplan;
}
+/*
+ * Updates RT indexes in PartitionedRelPruneInfos contained in pinfo and adds
+ * the RT indexes of "prunable" relations into glob->prunableRelids.
+ */
+static void
+set_part_prune_references(PartitionPruneInfo *pinfo, PlannerGlobal *glob,
+ int rtoffset)
+{
+ ListCell *l;
+
+ foreach(l, pinfo->prune_infos)
+ {
+ List *prune_infos = lfirst(l);
+ ListCell *l2;
+
+ foreach(l2, prune_infos)
+ {
+ PartitionedRelPruneInfo *prelinfo = lfirst(l2);
+
+ prelinfo->rtindex += rtoffset;
+ if (prelinfo->initial_pruning_steps != NIL)
+ glob->prunableRelids = bms_add_members(glob->prunableRelids,
+ prelinfo->present_part_rtis);
+ }
+ }
+}
+
/*
* set_hash_references
* Do set_plan_references processing on a Hash node
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 9a1a7faac7..8e27e35df2 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -634,6 +634,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
PartitionedRelPruneInfo *pinfo = lfirst(lc);
RelOptInfo *subpart = find_base_rel(root, pinfo->rtindex);
Bitmapset *present_parts;
+ Bitmapset *present_part_rtis;
int nparts = subpart->nparts;
int *subplan_map;
int *subpart_map;
@@ -650,7 +651,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
- present_parts = NULL;
+ present_parts = present_part_rtis = NULL;
i = -1;
while ((i = bms_next_member(subpart->live_parts, i)) >= 0)
@@ -664,15 +665,35 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
+
+ /*
+ * Track the RT indexes of partitions to ensure they are included
+ * in the prunableRelids set of relations that are locked during
+ * execution. This ensures that if the plan is cached, these
+ * partitions are locked when the plan is reused.
+ *
+ * Partitions without a subplan and sub-partitioned partitions
+ * where none of the sub-partitions have a subplan due to
+ * constraint exclusion are not included in this set. Instead,
+ * they are added to the unprunableRelids set, and the relations
+ * in this set are locked by AcquireExecutorLocks() before
+ * executing a cached plan.
+ */
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
+ present_part_rtis = bms_add_member(present_part_rtis,
+ partrel->relid);
/* Record finding this subplan */
subplansfound = bms_add_member(subplansfound, subplanidx);
}
else if (subpartidx >= 0)
+ {
present_parts = bms_add_member(present_parts, i);
+ present_part_rtis = bms_add_member(present_part_rtis,
+ partrel->relid);
+ }
}
/*
@@ -684,6 +705,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
/* Record the maps and other information. */
pinfo->present_parts = present_parts;
+ pinfo->present_part_rtis = present_part_rtis;
pinfo->nparts = nparts;
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index 094522acb4..a1c89f5d72 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -26,6 +26,7 @@
#include "storage/procarray.h"
#include "storage/sinvaladt.h"
#include "utils/inval.h"
+#include "utils/lsyscache.h"
/*
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index a1f8d03db1..6e8f6b1b8f 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -36,6 +36,7 @@ Portal ActivePortal = NULL;
static void ProcessQuery(PlannedStmt *plan,
+ CachedPlan *cplan,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +66,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +79,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->cplan = cplan; /* CachedPlan supplying the plannedstmt */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +125,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * cplan: CachedPlan supplying the plan
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +138,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ CachedPlan *cplan,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +150,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, cplan, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -493,6 +498,7 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->cplan,
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1276,6 +1282,7 @@ PortalRunMulti(Portal portal,
{
/* statement can set tag string */
ProcessQuery(pstmt,
+ portal->cplan,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1285,6 +1292,7 @@ PortalRunMulti(Portal portal,
{
/* stmt added by rewrite cannot set tag */
ProcessQuery(pstmt,
+ portal->cplan,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 48a280d089..f647821382 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -2113,7 +2113,6 @@ get_rel_relam(Oid relid)
return result;
}
-
/* ---------- TRANSFORM CACHE ---------- */
Oid
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 5af1a168ec..5b75dadf13 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -104,7 +104,8 @@ static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
static bool CheckCachedPlan(CachedPlanSource *plansource);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv);
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ bool generic);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
@@ -815,8 +816,11 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
*
- * On a "true" return, we have acquired the locks needed to run the plan.
- * (We must do this for the "true" result to be race-condition-free.)
+ * On a "true" return, we have acquired locks on the "unprunableRelids" set
+ * for all plans in plansource->stmt_list. The plans are not completely
+ * race-condition-free until the executor takes locks on the set of prunable
+ * relations that survive initial runtime pruning during executor
+ * initialization.
*/
static bool
CheckCachedPlan(CachedPlanSource *plansource)
@@ -893,10 +897,10 @@ CheckCachedPlan(CachedPlanSource *plansource)
* or it can be set to NIL if we need to re-copy the plansource's query_list.
*
* To build a generic, parameter-value-independent plan, pass NULL for
- * boundParams. To build a custom plan, pass the actual parameter values via
- * boundParams. For best effect, the PARAM_FLAG_CONST flag should be set on
- * each parameter value; otherwise the planner will treat the value as a
- * hint rather than a hard constant.
+ * boundParams, and true for generic. To build a custom plan, pass the actual
+ * parameter values via boundParams, and false for generic. For best effect,
+ * the PARAM_FLAG_CONST flag should be set on each parameter value; otherwise
+ * the planner will treat the value as a hint rather than a hard constant.
*
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
@@ -904,7 +908,8 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv)
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ bool generic)
{
CachedPlan *plan;
List *plist;
@@ -1026,6 +1031,7 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
plan->refcount = 0;
plan->context = plan_context;
plan->is_oneshot = plansource->is_oneshot;
+ plan->is_generic = generic;
plan->is_saved = false;
plan->is_valid = true;
@@ -1196,7 +1202,7 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
else
{
/* Build a new generic plan */
- plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv, true);
/* Just make real sure plansource->gplan is clear */
ReleaseGenericPlan(plansource);
/* Link the new generic plan into the plansource */
@@ -1241,7 +1247,7 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (customplan)
{
/* Build a custom plan */
- plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv, false);
/* Accumulate total costs of custom plans */
plansource->total_custom_cost += cached_plan_cost(plan, true);
@@ -1387,8 +1393,8 @@ CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
}
/*
- * Reject if AcquireExecutorLocks would have anything to do. This is
- * probably unnecessary given the previous check, but let's be safe.
+ * Reject if there are any lockable relations. This is probably
+ * unnecessary given the previous check, but let's be safe.
*/
foreach(lc, plan->stmt_list)
{
@@ -1776,7 +1782,7 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ int rtindex;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1794,9 +1800,13 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ rtindex = -1;
+ while ((rtindex = bms_next_member(plannedstmt->unprunableRelids,
+ rtindex)) >= 0)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ RangeTblEntry *rte = list_nth_node(RangeTblEntry,
+ plannedstmt->rtable,
+ rtindex - 1);
if (!(rte->rtekind == RTE_RELATION ||
(rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 9b8b351d9a..bf326eeb70 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -101,8 +101,9 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
- ExplainState *es, const char *queryString,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
+ IntoClause *into, ExplainState *es,
+ const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
const BufferUsage *bufusage,
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index 0a7274e26c..0e7245435d 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,7 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ CachedPlan *cplan; /* CachedPlan that supplies the plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +58,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index af7d8fd1e7..ee089505a0 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -42,6 +42,7 @@
#include "storage/condition_variable.h"
#include "utils/hsearch.h"
#include "utils/queryenvironment.h"
+#include "utils/plancache.h"
#include "utils/reltrigger.h"
#include "utils/sharedtuplestore.h"
#include "utils/snapshot.h"
@@ -633,6 +634,7 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ CachedPlan *es_cachedplan; /* CachedPlan supplying the plannedstmt */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 540d021592..2466157b25 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -116,6 +116,12 @@ typedef struct PlannerGlobal
/* "flat" rangetable for executor */
List *finalrtable;
+ /*
+ * RT indexes of relations subject to removal from the plan due to runtime
+ * pruning at plan initialization time
+ */
+ Bitmapset *prunableRelids;
+
/* "flat" list of RTEPermissionInfos */
List *finalrteperminfos;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 62cd6a6666..ae608812f1 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -71,6 +71,10 @@ typedef struct PlannedStmt
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *unprunableRelids; /* RT indexes of relations that are not
+ * subject to runtime pruning; for
+ * AcquireExecutorLocks() */
+
List *permInfos; /* list of RTEPermissionInfo nodes for rtable
* entries needing one */
@@ -1459,6 +1463,13 @@ typedef struct PartitionedRelPruneInfo
/* Indexes of all partitions which subplans or subparts are present for */
Bitmapset *present_parts;
+ /*
+ * RT indexes of all partitions which subplans or subparts are present
+ * for; only used during planning to help in the construction of
+ * PlannerGlobal.prunableRelids.
+ */
+ Bitmapset *present_part_rtis;
+
/* Length of the following arrays: */
int nparts;
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index a90dfdf906..0b5ee007ca 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -149,6 +149,7 @@ typedef struct CachedPlan
int magic; /* should equal CACHEDPLAN_MAGIC */
List *stmt_list; /* list of PlannedStmts */
bool is_oneshot; /* is it a "oneshot" plan? */
+ bool is_generic; /* is it a reusable generic plan? */
bool is_saved; /* is CachedPlan in a long-lived context? */
bool is_valid; /* is the stmt_list currently valid? */
Oid planRoleId; /* Role ID the plan was created for */
@@ -235,4 +236,13 @@ extern bool CachedPlanIsSimplyValid(CachedPlanSource *plansource,
extern CachedExpression *GetCachedExpression(Node *expr);
extern void FreeCachedExpression(CachedExpression *cexpr);
+/*
+ * CachedPlanRequiresLocking: should the executor acquire locks?
+ */
+static inline bool
+CachedPlanRequiresLocking(CachedPlan *cplan)
+{
+ return cplan->is_generic;
+}
+
#endif /* PLANCACHE_H */
--
2.43.0
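To make the intended division of labor concrete, here is an editorial
sketch (not part of the patch) of how executor code could combine the new
unprunableRelids set with CachedPlanRequiresLocking() to decide whether a
relation still needs a deferred lock; the es_cachedplan field comes from
the later patches in this series, so treat the exact field names as
assumptions:

static bool
plan_needs_deferred_lock(EState *estate, Index rtindex)
{
	/* A freshly planned tree already holds all of its locks. */
	if (estate->es_cachedplan == NULL)
		return false;

	/* Only reusable generic plans defer locking of prunable relations. */
	if (!CachedPlanRequiresLocking(estate->es_cachedplan))
		return false;

	/* Unprunable relations were already locked by AcquireExecutorLocks(). */
	return !bms_is_member(rtindex, estate->es_plannedstmt->unprunableRelids);
}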
v52-0002-Assorted-tightening-in-various-ExecEnd-routines.patch
From a4077c425a1874036d2937bb93a6116cfd3640cb Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Thu, 28 Sep 2023 16:56:29 +0900
Subject: [PATCH v52 2/3] Assorted tightening in various ExecEnd()* routines
This includes adding NULLness checks on pointers before cleaning them
up. Many ExecEnd*() routines already perform these checks, but a few
are missing them. These NULLness checks might seem redundant as
things stand since the ExecEnd*() routines operate under the
assumption that their matching ExecInit* routine would have fully
executed, ensuring pointers are set. However, that assumption seems a
bit shaky in the face of future changes.
This also adds a guard at the beginning of EvalPlanQualEnd() to return
early if the EPQState does not appear to have been initialized. That
case can happen if the corresponding ExecInit*() routine returned
early without calling EvalPlanQualInit().
While at it, this commit ensures that pointers are consistently set
to NULL after cleanup in all ExecEnd*() routines.
Finally, for enhanced consistency, the format of NULLness checks has
been standardized to "if (pointer != NULL)", replacing the previous
"if (pointer)" style.
Reviewed-by: Robert Haas
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
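In short, the pattern this commit standardizes looks like the following
illustrative composite (FooState and its tuplesortstate member stand in
for whichever node and resource a given routine owns):

void
ExecEndFoo(FooState *node)
{
	/* Clean up subsidiary resources only if they were actually set up... */
	if (node->tuplesortstate != NULL)
	{
		tuplesort_end(node->tuplesortstate);
		node->tuplesortstate = NULL;	/* ...and make repeat calls safe */
	}

	/* Recurse into the subplan, then clear the pointer. */
	ExecEndNode(outerPlanState(node));
	outerPlanState(node) = NULL;
}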
---
src/backend/executor/execMain.c | 4 ++
src/backend/executor/nodeAgg.c | 27 +++++++++----
src/backend/executor/nodeAppend.c | 3 ++
src/backend/executor/nodeBitmapAnd.c | 4 +-
src/backend/executor/nodeBitmapHeapscan.c | 46 ++++++++++++++--------
src/backend/executor/nodeBitmapIndexscan.c | 23 ++++++-----
src/backend/executor/nodeBitmapOr.c | 4 +-
src/backend/executor/nodeCtescan.c | 3 +-
src/backend/executor/nodeForeignscan.c | 17 ++++----
src/backend/executor/nodeGather.c | 1 +
src/backend/executor/nodeGatherMerge.c | 1 +
src/backend/executor/nodeGroup.c | 6 +--
src/backend/executor/nodeHash.c | 6 +--
src/backend/executor/nodeHashjoin.c | 4 +-
src/backend/executor/nodeIncrementalSort.c | 13 +++++-
src/backend/executor/nodeIndexonlyscan.c | 25 ++++++------
src/backend/executor/nodeIndexscan.c | 23 ++++++-----
src/backend/executor/nodeLimit.c | 1 +
src/backend/executor/nodeLockRows.c | 1 +
src/backend/executor/nodeMaterial.c | 5 ++-
src/backend/executor/nodeMemoize.c | 7 +++-
src/backend/executor/nodeMergeAppend.c | 3 ++
src/backend/executor/nodeMergejoin.c | 2 +
src/backend/executor/nodeModifyTable.c | 11 +++++-
src/backend/executor/nodeNestloop.c | 2 +
src/backend/executor/nodeProjectSet.c | 1 +
src/backend/executor/nodeRecursiveunion.c | 24 +++++++++--
src/backend/executor/nodeResult.c | 1 +
src/backend/executor/nodeSamplescan.c | 7 +++-
src/backend/executor/nodeSeqscan.c | 16 +++-----
src/backend/executor/nodeSetOp.c | 6 ++-
src/backend/executor/nodeSort.c | 5 ++-
src/backend/executor/nodeSubqueryscan.c | 1 +
src/backend/executor/nodeTableFuncscan.c | 4 +-
src/backend/executor/nodeTidrangescan.c | 12 ++++--
src/backend/executor/nodeTidscan.c | 8 +++-
src/backend/executor/nodeUnique.c | 1 +
src/backend/executor/nodeWindowAgg.c | 41 +++++++++++++------
38 files changed, 246 insertions(+), 123 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 271f9d93fc..0f6dbd1e2b 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2999,6 +2999,10 @@ EvalPlanQualEnd(EPQState *epqstate)
MemoryContext oldcontext;
ListCell *l;
+ /* Nothing to do if no EvalPlanQualInit() was done to begin with. */
+ if (epqstate->parentestate == NULL)
+ return;
+
rtsize = epqstate->parentestate->es_range_table_size;
/*
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 53ead77ece..0dfba5ca16 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -4303,7 +4303,6 @@ GetAggInitVal(Datum textInitVal, Oid transtype)
void
ExecEndAgg(AggState *node)
{
- PlanState *outerPlan;
int transno;
int numGroupingSets = Max(node->maxsets, 1);
int setno;
@@ -4313,7 +4312,7 @@ ExecEndAgg(AggState *node)
* worker back into shared memory so that it can be picked up by the main
* process to report in EXPLAIN ANALYZE.
*/
- if (node->shared_info && IsParallelWorker())
+ if (node->shared_info != NULL && IsParallelWorker())
{
AggregateInstrumentation *si;
@@ -4326,10 +4325,16 @@ ExecEndAgg(AggState *node)
/* Make sure we have closed any open tuplesorts */
- if (node->sort_in)
+ if (node->sort_in != NULL)
+ {
tuplesort_end(node->sort_in);
- if (node->sort_out)
+ node->sort_in = NULL;
+ }
+ if (node->sort_out != NULL)
+ {
tuplesort_end(node->sort_out);
+ node->sort_out = NULL;
+ }
hashagg_reset_spill_state(node);
@@ -4345,19 +4350,25 @@ ExecEndAgg(AggState *node)
for (setno = 0; setno < numGroupingSets; setno++)
{
- if (pertrans->sortstates[setno])
+ if (pertrans->sortstates[setno] != NULL)
tuplesort_end(pertrans->sortstates[setno]);
}
}
/* And ensure any agg shutdown callbacks have been called */
for (setno = 0; setno < numGroupingSets; setno++)
+ {
ReScanExprContext(node->aggcontexts[setno]);
- if (node->hashcontext)
+ node->aggcontexts[setno] = NULL;
+ }
+ if (node->hashcontext != NULL)
+ {
ReScanExprContext(node->hashcontext);
+ node->hashcontext = NULL;
+ }
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
+ ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
void
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index ca0f54d676..86d75b1a7e 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -399,7 +399,10 @@ ExecEndAppend(AppendState *node)
* shut down each of the subscans
*/
for (i = 0; i < nplans; i++)
+ {
ExecEndNode(appendplans[i]);
+ appendplans[i] = NULL;
+ }
}
void
diff --git a/src/backend/executor/nodeBitmapAnd.c b/src/backend/executor/nodeBitmapAnd.c
index 9c9c666872..ae391222bf 100644
--- a/src/backend/executor/nodeBitmapAnd.c
+++ b/src/backend/executor/nodeBitmapAnd.c
@@ -192,8 +192,8 @@ ExecEndBitmapAnd(BitmapAndState *node)
*/
for (i = 0; i < nplans; i++)
{
- if (bitmapplans[i])
- ExecEndNode(bitmapplans[i]);
+ ExecEndNode(bitmapplans[i]);
+ bitmapplans[i] = NULL;
}
}
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 3c63bdd93d..19f18ab817 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -625,8 +625,6 @@ ExecReScanBitmapHeapScan(BitmapHeapScanState *node)
void
ExecEndBitmapHeapScan(BitmapHeapScanState *node)
{
- TableScanDesc scanDesc;
-
/*
* When ending a parallel worker, copy the statistics gathered by the
* worker back into shared memory so that it can be picked up by the main
@@ -650,38 +648,54 @@ ExecEndBitmapHeapScan(BitmapHeapScanState *node)
si->lossy_pages += node->stats.lossy_pages;
}
- /*
- * extract information from the node
- */
- scanDesc = node->ss.ss_currentScanDesc;
-
/*
* close down subplans
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
/*
* release bitmaps and buffers if any
*/
- if (node->tbmiterator)
+ if (node->tbmiterator != NULL)
+ {
tbm_end_iterate(node->tbmiterator);
- if (node->prefetch_iterator)
+ node->tbmiterator = NULL;
+ }
+ if (node->prefetch_iterator != NULL)
+ {
tbm_end_iterate(node->prefetch_iterator);
- if (node->tbm)
+ node->prefetch_iterator = NULL;
+ }
+ if (node->tbm != NULL)
+ {
tbm_free(node->tbm);
- if (node->shared_tbmiterator)
+ node->tbm = NULL;
+ }
+ if (node->shared_tbmiterator != NULL)
+ {
tbm_end_shared_iterate(node->shared_tbmiterator);
- if (node->shared_prefetch_iterator)
+ node->shared_tbmiterator = NULL;
+ }
+ if (node->shared_prefetch_iterator != NULL)
+ {
tbm_end_shared_iterate(node->shared_prefetch_iterator);
+ node->shared_prefetch_iterator = NULL;
+ }
if (node->pvmbuffer != InvalidBuffer)
+ {
ReleaseBuffer(node->pvmbuffer);
+ node->pvmbuffer = InvalidBuffer;
+ }
/*
- * close heap scan
+ * close heap scan (no-op if we didn't start it)
*/
- if (scanDesc)
- table_endscan(scanDesc);
-
+ if (node->ss.ss_currentScanDesc != NULL)
+ {
+ table_endscan(node->ss.ss_currentScanDesc);
+ node->ss.ss_currentScanDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeBitmapIndexscan.c b/src/backend/executor/nodeBitmapIndexscan.c
index 6df8e17ec8..4669e8d0ce 100644
--- a/src/backend/executor/nodeBitmapIndexscan.c
+++ b/src/backend/executor/nodeBitmapIndexscan.c
@@ -174,22 +174,21 @@ ExecReScanBitmapIndexScan(BitmapIndexScanState *node)
void
ExecEndBitmapIndexScan(BitmapIndexScanState *node)
{
- Relation indexRelationDesc;
- IndexScanDesc indexScanDesc;
-
- /*
- * extract information from the node
- */
- indexRelationDesc = node->biss_RelationDesc;
- indexScanDesc = node->biss_ScanDesc;
+ /* close the scan (no-op if we didn't start it) */
+ if (node->biss_ScanDesc != NULL)
+ {
+ index_endscan(node->biss_ScanDesc);
+ node->biss_ScanDesc = NULL;
+ }
/*
* close the index relation (no-op if we didn't open it)
*/
- if (indexScanDesc)
- index_endscan(indexScanDesc);
- if (indexRelationDesc)
- index_close(indexRelationDesc, NoLock);
+ if (node->biss_RelationDesc != NULL)
+ {
+ index_close(node->biss_RelationDesc, NoLock);
+ node->biss_RelationDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeBitmapOr.c b/src/backend/executor/nodeBitmapOr.c
index 7029536c64..de439235d2 100644
--- a/src/backend/executor/nodeBitmapOr.c
+++ b/src/backend/executor/nodeBitmapOr.c
@@ -210,8 +210,8 @@ ExecEndBitmapOr(BitmapOrState *node)
*/
for (i = 0; i < nplans; i++)
{
- if (bitmapplans[i])
- ExecEndNode(bitmapplans[i]);
+ ExecEndNode(bitmapplans[i]);
+ bitmapplans[i] = NULL;
}
}
diff --git a/src/backend/executor/nodeCtescan.c b/src/backend/executor/nodeCtescan.c
index 8081eed887..7cea943988 100644
--- a/src/backend/executor/nodeCtescan.c
+++ b/src/backend/executor/nodeCtescan.c
@@ -290,10 +290,11 @@ ExecEndCteScan(CteScanState *node)
/*
* If I am the leader, free the tuplestore.
*/
- if (node->leader == node)
+ if (node->leader != NULL && node->leader == node)
{
tuplestore_end(node->cte_table);
node->cte_table = NULL;
+ node->leader = NULL;
}
}
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index fe4ae55c0f..1357ccf3c9 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -300,17 +300,20 @@ ExecEndForeignScan(ForeignScanState *node)
EState *estate = node->ss.ps.state;
/* Let the FDW shut down */
- if (plan->operation != CMD_SELECT)
+ if (node->fdwroutine != NULL)
{
- if (estate->es_epq_active == NULL)
- node->fdwroutine->EndDirectModify(node);
+ if (plan->operation != CMD_SELECT)
+ {
+ if (estate->es_epq_active == NULL)
+ node->fdwroutine->EndDirectModify(node);
+ }
+ else
+ node->fdwroutine->EndForeignScan(node);
}
- else
- node->fdwroutine->EndForeignScan(node);
/* Shut down any outer plan. */
- if (outerPlanState(node))
- ExecEndNode(outerPlanState(node));
+ ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 5d4ffe989c..cae5ea1f92 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -244,6 +244,7 @@ void
ExecEndGather(GatherState *node)
{
ExecEndNode(outerPlanState(node)); /* let children clean up first */
+ outerPlanState(node) = NULL;
ExecShutdownGather(node);
}
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 45f6017c29..b36cd89e7d 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -284,6 +284,7 @@ void
ExecEndGatherMerge(GatherMergeState *node)
{
ExecEndNode(outerPlanState(node)); /* let children clean up first */
+ outerPlanState(node) = NULL;
ExecShutdownGatherMerge(node);
}
diff --git a/src/backend/executor/nodeGroup.c b/src/backend/executor/nodeGroup.c
index da32bec181..807429e504 100644
--- a/src/backend/executor/nodeGroup.c
+++ b/src/backend/executor/nodeGroup.c
@@ -225,10 +225,8 @@ ExecInitGroup(Group *node, EState *estate, int eflags)
void
ExecEndGroup(GroupState *node)
{
- PlanState *outerPlan;
-
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
+ ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
void
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 570a90ebe1..a913d5b50c 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -427,13 +427,11 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
void
ExecEndHash(HashState *node)
{
- PlanState *outerPlan;
-
/*
* shut down the subplan
*/
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
+ ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 2f7170604d..901c9e9be7 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -950,7 +950,7 @@ ExecEndHashJoin(HashJoinState *node)
/*
* Free hash table
*/
- if (node->hj_HashTable)
+ if (node->hj_HashTable != NULL)
{
ExecHashTableDestroy(node->hj_HashTable);
node->hj_HashTable = NULL;
@@ -960,7 +960,9 @@ ExecEndHashJoin(HashJoinState *node)
* clean up subtrees
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
ExecEndNode(innerPlanState(node));
+ innerPlanState(node) = NULL;
}
/*
diff --git a/src/backend/executor/nodeIncrementalSort.c b/src/backend/executor/nodeIncrementalSort.c
index 2ce5ed5ec8..010bcfafa8 100644
--- a/src/backend/executor/nodeIncrementalSort.c
+++ b/src/backend/executor/nodeIncrementalSort.c
@@ -1078,8 +1078,16 @@ ExecEndIncrementalSort(IncrementalSortState *node)
{
SO_printf("ExecEndIncrementalSort: shutting down sort node\n");
- ExecDropSingleTupleTableSlot(node->group_pivot);
- ExecDropSingleTupleTableSlot(node->transfer_tuple);
+ if (node->group_pivot != NULL)
+ {
+ ExecDropSingleTupleTableSlot(node->group_pivot);
+ node->group_pivot = NULL;
+ }
+ if (node->transfer_tuple != NULL)
+ {
+ ExecDropSingleTupleTableSlot(node->transfer_tuple);
+ node->transfer_tuple = NULL;
+ }
/*
* Release tuplesort resources.
@@ -1099,6 +1107,7 @@ ExecEndIncrementalSort(IncrementalSortState *node)
* Shut down the subplan.
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
SO_printf("ExecEndIncrementalSort: sort node shutdown\n");
}
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 612c673895..481d479760 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -397,15 +397,6 @@ ExecReScanIndexOnlyScan(IndexOnlyScanState *node)
void
ExecEndIndexOnlyScan(IndexOnlyScanState *node)
{
- Relation indexRelationDesc;
- IndexScanDesc indexScanDesc;
-
- /*
- * extract information from the node
- */
- indexRelationDesc = node->ioss_RelationDesc;
- indexScanDesc = node->ioss_ScanDesc;
-
/* Release VM buffer pin, if any. */
if (node->ioss_VMBuffer != InvalidBuffer)
{
@@ -413,13 +404,21 @@ ExecEndIndexOnlyScan(IndexOnlyScanState *node)
node->ioss_VMBuffer = InvalidBuffer;
}
+ /* close the scan (no-op if we didn't start it) */
+ if (node->ioss_ScanDesc != NULL)
+ {
+ index_endscan(node->ioss_ScanDesc);
+ node->ioss_ScanDesc = NULL;
+ }
+
/*
* close the index relation (no-op if we didn't open it)
*/
- if (indexScanDesc)
- index_endscan(indexScanDesc);
- if (indexRelationDesc)
- index_close(indexRelationDesc, NoLock);
+ if (node->ioss_RelationDesc != NULL)
+ {
+ index_close(node->ioss_RelationDesc, NoLock);
+ node->ioss_RelationDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 8000feff4c..a8172d8b82 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -784,22 +784,21 @@ ExecIndexAdvanceArrayKeys(IndexArrayKeyInfo *arrayKeys, int numArrayKeys)
void
ExecEndIndexScan(IndexScanState *node)
{
- Relation indexRelationDesc;
- IndexScanDesc indexScanDesc;
-
- /*
- * extract information from the node
- */
- indexRelationDesc = node->iss_RelationDesc;
- indexScanDesc = node->iss_ScanDesc;
+ /* close the scan (no-op if we didn't start it) */
+ if (node->iss_ScanDesc != NULL)
+ {
+ index_endscan(node->iss_ScanDesc);
+ node->iss_ScanDesc = NULL;
+ }
/*
* close the index relation (no-op if we didn't open it)
*/
- if (indexScanDesc)
- index_endscan(indexScanDesc);
- if (indexRelationDesc)
- index_close(indexRelationDesc, NoLock);
+ if (node->iss_RelationDesc != NULL)
+ {
+ index_close(node->iss_RelationDesc, NoLock);
+ node->iss_RelationDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index e6f1fb1562..eb7b6e52be 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -534,6 +534,7 @@ void
ExecEndLimit(LimitState *node)
{
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 41754ddfea..0d3489195b 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -387,6 +387,7 @@ ExecEndLockRows(LockRowsState *node)
/* We may have shut down EPQ already, but no harm in another call */
EvalPlanQualEnd(&node->lr_epqstate);
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
diff --git a/src/backend/executor/nodeMaterial.c b/src/backend/executor/nodeMaterial.c
index 22e1787fbd..883e3f3933 100644
--- a/src/backend/executor/nodeMaterial.c
+++ b/src/backend/executor/nodeMaterial.c
@@ -243,13 +243,16 @@ ExecEndMaterial(MaterialState *node)
* Release tuplestore resources
*/
if (node->tuplestorestate != NULL)
+ {
tuplestore_end(node->tuplestorestate);
- node->tuplestorestate = NULL;
+ node->tuplestorestate = NULL;
+ }
/*
* shut down the subplan
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeMemoize.c b/src/backend/executor/nodeMemoize.c
index df8e3fff08..690dee1daa 100644
--- a/src/backend/executor/nodeMemoize.c
+++ b/src/backend/executor/nodeMemoize.c
@@ -1128,12 +1128,17 @@ ExecEndMemoize(MemoizeState *node)
}
/* Remove the cache context */
- MemoryContextDelete(node->tableContext);
+ if (node->tableContext != NULL)
+ {
+ MemoryContextDelete(node->tableContext);
+ node->tableContext = NULL;
+ }
/*
* shut down the subplan
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
void
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index e1b9b984a7..3236444cf1 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -333,7 +333,10 @@ ExecEndMergeAppend(MergeAppendState *node)
* shut down each of the subscans
*/
for (i = 0; i < nplans; i++)
+ {
ExecEndNode(mergeplans[i]);
+ mergeplans[i] = NULL;
+ }
}
void
diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 29c54fcd75..926e631d88 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -1647,7 +1647,9 @@ ExecEndMergeJoin(MergeJoinState *node)
* shut down the subplans
*/
ExecEndNode(innerPlanState(node));
+ innerPlanState(node) = NULL;
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
MJ1_printf("ExecEndMergeJoin: %s\n",
"node processing ended");
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 8bf4c80d4a..9e56f9c36c 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -4724,7 +4724,9 @@ ExecEndModifyTable(ModifyTableState *node)
for (j = 0; j < resultRelInfo->ri_NumSlotsInitialized; j++)
{
ExecDropSingleTupleTableSlot(resultRelInfo->ri_Slots[j]);
+ resultRelInfo->ri_Slots[j] = NULL;
ExecDropSingleTupleTableSlot(resultRelInfo->ri_PlanSlots[j]);
+ resultRelInfo->ri_PlanSlots[j] = NULL;
}
}
@@ -4732,12 +4734,16 @@ ExecEndModifyTable(ModifyTableState *node)
* Close all the partitioned tables, leaf partitions, and their indices
* and release the slot used for tuple routing, if set.
*/
- if (node->mt_partition_tuple_routing)
+ if (node->mt_partition_tuple_routing != NULL)
{
ExecCleanupTupleRouting(node, node->mt_partition_tuple_routing);
+ node->mt_partition_tuple_routing = NULL;
- if (node->mt_root_tuple_slot)
+ if (node->mt_root_tuple_slot != NULL)
+ {
ExecDropSingleTupleTableSlot(node->mt_root_tuple_slot);
+ node->mt_root_tuple_slot = NULL;
+ }
}
/*
@@ -4749,6 +4755,7 @@ ExecEndModifyTable(ModifyTableState *node)
* shut down subplan
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
void
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index 7f4bf6c4db..01f3d56a3b 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -367,7 +367,9 @@ ExecEndNestLoop(NestLoopState *node)
* close down subplans
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
ExecEndNode(innerPlanState(node));
+ innerPlanState(node) = NULL;
NL1_printf("ExecEndNestLoop: %s\n",
"node processing ended");
diff --git a/src/backend/executor/nodeProjectSet.c b/src/backend/executor/nodeProjectSet.c
index e483730015..ca9a5e2ed2 100644
--- a/src/backend/executor/nodeProjectSet.c
+++ b/src/backend/executor/nodeProjectSet.c
@@ -331,6 +331,7 @@ ExecEndProjectSet(ProjectSetState *node)
* shut down subplans
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
void
diff --git a/src/backend/executor/nodeRecursiveunion.c b/src/backend/executor/nodeRecursiveunion.c
index c7f8a19fa4..7680142c7b 100644
--- a/src/backend/executor/nodeRecursiveunion.c
+++ b/src/backend/executor/nodeRecursiveunion.c
@@ -272,20 +272,36 @@ void
ExecEndRecursiveUnion(RecursiveUnionState *node)
{
/* Release tuplestores */
- tuplestore_end(node->working_table);
- tuplestore_end(node->intermediate_table);
+ if (node->working_table != NULL)
+ {
+ tuplestore_end(node->working_table);
+ node->working_table = NULL;
+ }
+ if (node->intermediate_table != NULL)
+ {
+ tuplestore_end(node->intermediate_table);
+ node->intermediate_table = NULL;
+ }
/* free subsidiary stuff including hashtable */
- if (node->tempContext)
+ if (node->tempContext != NULL)
+ {
MemoryContextDelete(node->tempContext);
- if (node->tableContext)
+ node->tempContext = NULL;
+ }
+ if (node->tableContext != NULL)
+ {
MemoryContextDelete(node->tableContext);
+ node->tableContext = NULL;
+ }
/*
* close down subplans
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
ExecEndNode(innerPlanState(node));
+ innerPlanState(node) = NULL;
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeResult.c b/src/backend/executor/nodeResult.c
index 348361e7f4..e3cfc9b772 100644
--- a/src/backend/executor/nodeResult.c
+++ b/src/backend/executor/nodeResult.c
@@ -243,6 +243,7 @@ ExecEndResult(ResultState *node)
* shut down subplans
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
void
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index 714b076e64..6ab91001bc 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -181,14 +181,17 @@ ExecEndSampleScan(SampleScanState *node)
/*
* Tell sampling function that we finished the scan.
*/
- if (node->tsmroutine->EndSampleScan)
+ if (node->tsmroutine != NULL && node->tsmroutine->EndSampleScan)
node->tsmroutine->EndSampleScan(node);
/*
- * close heap scan
+ * close heap scan (no-op if we didn't start it)
*/
if (node->ss.ss_currentScanDesc)
+ {
table_endscan(node->ss.ss_currentScanDesc);
+ node->ss.ss_currentScanDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 7cb12a11c2..b052775e5b 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -183,18 +183,14 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
void
ExecEndSeqScan(SeqScanState *node)
{
- TableScanDesc scanDesc;
-
- /*
- * get information from node
- */
- scanDesc = node->ss.ss_currentScanDesc;
-
/*
- * close heap scan
+ * close heap scan (no-op if we didn't start it)
*/
- if (scanDesc != NULL)
- table_endscan(scanDesc);
+ if (node->ss.ss_currentScanDesc != NULL)
+ {
+ table_endscan(node->ss.ss_currentScanDesc);
+ node->ss.ss_currentScanDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index a8ac68b482..fe34b2134f 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -583,10 +583,14 @@ void
ExecEndSetOp(SetOpState *node)
{
/* free subsidiary stuff including hashtable */
- if (node->tableContext)
+ if (node->tableContext != NULL)
+ {
MemoryContextDelete(node->tableContext);
+ node->tableContext = NULL;
+ }
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index 3fc925d7b4..af852464d0 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -307,13 +307,16 @@ ExecEndSort(SortState *node)
* Release tuplesort resources
*/
if (node->tuplesortstate != NULL)
+ {
tuplesort_end((Tuplesortstate *) node->tuplesortstate);
- node->tuplesortstate = NULL;
+ node->tuplesortstate = NULL;
+ }
/*
* shut down the subplan
*/
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
SO1_printf("ExecEndSort: %s\n",
"sort node shutdown");
diff --git a/src/backend/executor/nodeSubqueryscan.c b/src/backend/executor/nodeSubqueryscan.c
index 782097eaf2..0b2612183a 100644
--- a/src/backend/executor/nodeSubqueryscan.c
+++ b/src/backend/executor/nodeSubqueryscan.c
@@ -171,6 +171,7 @@ ExecEndSubqueryScan(SubqueryScanState *node)
* close down subquery
*/
ExecEndNode(node->subplan);
+ node->subplan = NULL;
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeTableFuncscan.c b/src/backend/executor/nodeTableFuncscan.c
index f483221bb8..778d25d511 100644
--- a/src/backend/executor/nodeTableFuncscan.c
+++ b/src/backend/executor/nodeTableFuncscan.c
@@ -223,8 +223,10 @@ ExecEndTableFuncScan(TableFuncScanState *node)
* Release tuplestore resources
*/
if (node->tupstore != NULL)
+ {
tuplestore_end(node->tupstore);
- node->tupstore = NULL;
+ node->tupstore = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 9aa7683d7e..702ee884d2 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -326,10 +326,14 @@ ExecReScanTidRangeScan(TidRangeScanState *node)
void
ExecEndTidRangeScan(TidRangeScanState *node)
{
- TableScanDesc scan = node->ss.ss_currentScanDesc;
-
- if (scan != NULL)
- table_endscan(scan);
+ /*
+ * close heap scan (no-op if we didn't start it)
+ */
+ if (node->ss.ss_currentScanDesc != NULL)
+ {
+ table_endscan(node->ss.ss_currentScanDesc);
+ node->ss.ss_currentScanDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index 864a9013b6..f375951699 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -469,8 +469,14 @@ ExecReScanTidScan(TidScanState *node)
void
ExecEndTidScan(TidScanState *node)
{
- if (node->ss.ss_currentScanDesc)
+ /*
+ * close heap scan (no-op if we didn't start it)
+ */
+ if (node->ss.ss_currentScanDesc != NULL)
+ {
table_endscan(node->ss.ss_currentScanDesc);
+ node->ss.ss_currentScanDesc = NULL;
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeUnique.c b/src/backend/executor/nodeUnique.c
index a125923e93..b82d0e9ad5 100644
--- a/src/backend/executor/nodeUnique.c
+++ b/src/backend/executor/nodeUnique.c
@@ -168,6 +168,7 @@ void
ExecEndUnique(UniqueState *node)
{
ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 3221fa1522..561d7e731d 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -1351,11 +1351,14 @@ release_partition(WindowAggState *winstate)
* any aggregate temp data). We don't rely on retail pfree because some
* aggregates might have allocated data we don't have direct pointers to.
*/
- MemoryContextReset(winstate->partcontext);
- MemoryContextReset(winstate->aggcontext);
+ if (winstate->partcontext != NULL)
+ MemoryContextReset(winstate->partcontext);
+ if (winstate->aggcontext != NULL)
+ MemoryContextReset(winstate->aggcontext);
for (i = 0; i < winstate->numaggs; i++)
{
- if (winstate->peragg[i].aggcontext != winstate->aggcontext)
+ if (winstate->peragg[i].aggcontext != NULL &&
+ winstate->peragg[i].aggcontext != winstate->aggcontext)
MemoryContextReset(winstate->peragg[i].aggcontext);
}
@@ -2681,24 +2684,40 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags)
void
ExecEndWindowAgg(WindowAggState *node)
{
- PlanState *outerPlan;
int i;
release_partition(node);
for (i = 0; i < node->numaggs; i++)
{
- if (node->peragg[i].aggcontext != node->aggcontext)
+ if (node->peragg[i].aggcontext != NULL &&
+ node->peragg[i].aggcontext != node->aggcontext)
MemoryContextDelete(node->peragg[i].aggcontext);
}
- MemoryContextDelete(node->partcontext);
- MemoryContextDelete(node->aggcontext);
+ if (node->partcontext != NULL)
+ {
+ MemoryContextDelete(node->partcontext);
+ node->partcontext = NULL;
+ }
+ if (node->aggcontext != NULL)
+ {
+ MemoryContextDelete(node->aggcontext);
+ node->aggcontext = NULL;
+ }
- pfree(node->perfunc);
- pfree(node->peragg);
+ if (node->perfunc != NULL)
+ {
+ pfree(node->perfunc);
+ node->perfunc = NULL;
+ }
+ if (node->peragg != NULL)
+ {
+ pfree(node->peragg);
+ node->peragg = NULL;
+ }
- outerPlan = outerPlanState(node);
- ExecEndNode(outerPlan);
+ ExecEndNode(outerPlanState(node));
+ outerPlanState(node) = NULL;
}
/* -----------------
--
2.43.0
v52-0003-Handle-CachedPlan-invalidation-in-the-executor.patch
From 533abeac5857c4ac2950c8eb1699485b46cce9c7 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Thu, 22 Aug 2024 19:38:13 +0900
Subject: [PATCH v52 3/3] Handle CachedPlan invalidation in the executor
This commit makes changes to handle cases where a cached plan
becomes invalid before deferred locks on prunable relations are taken.
* Add checks at various points in ExecutorStart() and its called
functions to determine if the plan becomes invalid. If detected,
the function and its callers return immediately. A previous commit
ensures any partially initialized PlanState tree objects are cleaned
up appropriately.
* Introduce ExecutorStartExt(), a wrapper over ExecutorStart(), to
handle cases where plan initialization is aborted due to invalidation.
ExecutorStartExt() creates a new transient CachedPlan if needed and
retries execution. This new entry point is only required for sites
using plancache.c. It requires passing the QueryDesc, eflags,
CachedPlanSource, and query_index (index in CachedPlanSource.query_list).
* Add GetSingleCachedPlan() in plancache.c to create a transient
CachedPlan for a specified query in the given CachedPlanSource.
Such CachedPlans are tracked in a separate global list for the
plancache invalidation callbacks to check.
This also adds isolation tests using the delay_execution test module
to verify scenarios where a CachedPlan becomes invalid before the
deferred locks are taken.
All ExecutorStart_hook implementations now must add the following
block after the ExecutorStart() call to ensure it doesn't work with an
invalid plan:
/* The plan may have become invalid during ExecutorStart() */
if (!ExecPlanStillValid(queryDesc->estate))
return;
Reviewed-by: Robert Haas
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
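For extension authors, a conforming hook would therefore look roughly like
this sketch (the my_/prev_ names are hypothetical; the validity check is
the block quoted above, as used by auto_explain and pg_stat_statements in
this patch):

static ExecutorStart_hook_type prev_ExecutorStart = NULL;

static void
my_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
	/* Chain to the previous hook or the standard routine. */
	if (prev_ExecutorStart)
		prev_ExecutorStart(queryDesc, eflags);
	else
		standard_ExecutorStart(queryDesc, eflags);

	/* The plan may have become invalid during ExecutorStart() */
	if (!ExecPlanStillValid(queryDesc->estate))
		return;

	/* ... extension-specific setup that inspects the plan goes here ... */
}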
---
contrib/auto_explain/auto_explain.c | 4 +
.../pg_stat_statements/pg_stat_statements.c | 4 +
contrib/postgres_fdw/postgres_fdw.c | 36 ++-
src/backend/commands/explain.c | 8 +-
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 10 +-
src/backend/commands/trigger.c | 14 ++
src/backend/executor/README | 32 ++-
src/backend/executor/execMain.c | 99 ++++++++-
src/backend/executor/execParallel.c | 4 +-
src/backend/executor/execPartition.c | 10 +
src/backend/executor/execProcnode.c | 7 +
src/backend/executor/execUtils.c | 42 +++-
src/backend/executor/nodeAgg.c | 2 +
src/backend/executor/nodeAppend.c | 12 +-
src/backend/executor/nodeBitmapAnd.c | 2 +
src/backend/executor/nodeBitmapHeapscan.c | 4 +
src/backend/executor/nodeBitmapIndexscan.c | 6 +-
src/backend/executor/nodeBitmapOr.c | 2 +
src/backend/executor/nodeCustom.c | 2 +
src/backend/executor/nodeForeignscan.c | 4 +
src/backend/executor/nodeGather.c | 2 +
src/backend/executor/nodeGatherMerge.c | 2 +
src/backend/executor/nodeGroup.c | 2 +
src/backend/executor/nodeHash.c | 2 +
src/backend/executor/nodeHashjoin.c | 4 +
src/backend/executor/nodeIncrementalSort.c | 2 +
src/backend/executor/nodeIndexonlyscan.c | 7 +-
src/backend/executor/nodeIndexscan.c | 8 +-
src/backend/executor/nodeLimit.c | 2 +
src/backend/executor/nodeLockRows.c | 2 +
src/backend/executor/nodeMaterial.c | 2 +
src/backend/executor/nodeMemoize.c | 2 +
src/backend/executor/nodeMergeAppend.c | 6 +-
src/backend/executor/nodeMergejoin.c | 4 +
src/backend/executor/nodeModifyTable.c | 13 ++
src/backend/executor/nodeNestloop.c | 4 +
src/backend/executor/nodeProjectSet.c | 2 +
src/backend/executor/nodeRecursiveunion.c | 4 +
src/backend/executor/nodeResult.c | 2 +
src/backend/executor/nodeSamplescan.c | 3 +
src/backend/executor/nodeSeqscan.c | 3 +
src/backend/executor/nodeSetOp.c | 2 +
src/backend/executor/nodeSort.c | 2 +
src/backend/executor/nodeSubqueryscan.c | 2 +
src/backend/executor/nodeTidrangescan.c | 2 +
src/backend/executor/nodeTidscan.c | 2 +
src/backend/executor/nodeUnique.c | 2 +
src/backend/executor/nodeWindowAgg.c | 2 +
src/backend/executor/spi.c | 19 +-
src/backend/tcop/postgres.c | 4 +-
src/backend/tcop/pquery.c | 31 ++-
src/backend/utils/cache/plancache.c | 206 ++++++++++++++++++
src/backend/utils/mmgr/portalmem.c | 4 +-
src/include/commands/explain.h | 1 +
src/include/commands/trigger.h | 1 +
src/include/executor/execdesc.h | 1 +
src/include/executor/executor.h | 18 ++
src/include/nodes/execnodes.h | 1 +
src/include/utils/plancache.h | 26 +++
src/include/utils/portal.h | 4 +-
src/test/modules/delay_execution/Makefile | 3 +-
.../modules/delay_execution/delay_execution.c | 63 +++++-
.../expected/cached-plan-inval.out | 175 +++++++++++++++
src/test/modules/delay_execution/meson.build | 1 +
.../specs/cached-plan-inval.spec | 65 ++++++
66 files changed, 962 insertions(+), 58 deletions(-)
create mode 100644 src/test/modules/delay_execution/expected/cached-plan-inval.out
create mode 100644 src/test/modules/delay_execution/specs/cached-plan-inval.spec
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index 677c135f59..3675ce9a88 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -300,6 +300,10 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
else
standard_ExecutorStart(queryDesc, eflags);
+ /* The plan may have become invalid during ExecutorStart() */
+ if (!ExecPlanStillValid(queryDesc->estate))
+ return;
+
if (auto_explain_enabled())
{
/*
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 362d222f63..98a328b79f 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -992,6 +992,10 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
else
standard_ExecutorStart(queryDesc, eflags);
+ /* The plan may have become invalid during ExecutorStart() */
+ if (!ExecPlanStillValid(queryDesc->estate))
+ return;
+
/*
* If query has queryId zero, don't track it. This prevents double
* counting of optimizable statements that are directly contained in
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index adc62576d1..65f4ffe5ee 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2144,7 +2144,11 @@ postgresEndForeignModify(EState *estate,
{
PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
- /* If fmstate is NULL, we are in EXPLAIN; nothing to do */
+ /*
+ * fmstate could be NULL under two conditions: during an EXPLAIN
+ * operation or if BeginForeignModify() hasn't been invoked.
+ * In either case, no action is required.
+ */
if (fmstate == NULL)
return;
@@ -2650,8 +2654,9 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
{
ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan;
EState *estate = node->ss.ps.state;
+ Relation rel = node->ss.ss_currentRelation;
PgFdwDirectModifyState *dmstate;
- Index rtindex;
+ Index rtindex = node->resultRelInfo->ri_RangeTableIndex;
Oid userid;
ForeignTable *table;
UserMapping *user;
@@ -2663,24 +2668,32 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
return;
+ /*
+ * Open the foreign table using the RT index given in the ResultRelInfo if
+ * the ScanState doesn't provide it. If the plan becomes invalid as a
+ * result of taking a lock in ExecOpenScanRelation(), do nothing, in which
+ * case node->fdw_state remains NULL.
+ */
+ if (rel == NULL)
+ {
+ Assert(fsplan->scan.scanrelid == 0);
+ rel = ExecOpenScanRelation(estate, rtindex, eflags);
+ if (unlikely(rel == NULL || !ExecPlanStillValid(estate)))
+ return;
+ }
+
/*
* We'll save private state in node->fdw_state.
*/
dmstate = (PgFdwDirectModifyState *) palloc0(sizeof(PgFdwDirectModifyState));
node->fdw_state = (void *) dmstate;
+ dmstate->rel = rel;
/*
* Identify which user to do the remote access as. This should match what
* ExecCheckPermissions() does.
*/
userid = OidIsValid(fsplan->checkAsUser) ? fsplan->checkAsUser : GetUserId();
-
- /* Get info about foreign table. */
- rtindex = node->resultRelInfo->ri_RangeTableIndex;
- if (fsplan->scan.scanrelid == 0)
- dmstate->rel = ExecOpenScanRelation(estate, rtindex, eflags);
- else
- dmstate->rel = node->ss.ss_currentRelation;
table = GetForeignTable(RelationGetRelid(dmstate->rel));
user = GetUserMapping(userid, table->serverid);
@@ -2811,7 +2824,10 @@ postgresEndDirectModify(ForeignScanState *node)
{
PgFdwDirectModifyState *dmstate = (PgFdwDirectModifyState *) node->fdw_state;
- /* if dmstate is NULL, we are in EXPLAIN; nothing to do */
+ /*
+ * Nothing to do if dmstate is NULL, either because we are in EXPLAIN or
+ * dmstate wasn't initialized due to aborted plan initialization.
+ */
if (dmstate == NULL)
return;
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index a83ea07db1..a7643360a7 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -507,7 +507,8 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, NULL, -1, into, es, queryString, params,
+ queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
}
@@ -616,6 +617,7 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
*/
void
ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
+ CachedPlanSource *plansource, int query_index,
IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
@@ -686,8 +688,8 @@ ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
if (into)
eflags |= GetIntoRelEFlags(into);
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, eflags);
+ /* Call ExecutorStartExt to prepare the plan for execution. */
+ ExecutorStartExt(queryDesc, eflags, plansource, query_index);
/* Execute the plan for statistics if asked for */
if (es->analyze)
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 4f6acf6719..4b1503c05e 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -107,6 +107,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ NULL,
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 311b9ebd5b..4cd79a6e3a 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -202,7 +202,8 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
- cplan);
+ cplan,
+ entry->plansource);
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
@@ -583,6 +584,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
MemoryContextCounters mem_counters;
MemoryContext planner_ctx = NULL;
MemoryContext saved_ctx = NULL;
+ int i = 0;
if (es->memory)
{
@@ -655,8 +657,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, cplan, into, es, query_string, paramLI,
- queryEnv,
+ ExplainOnePlan(pstmt, cplan, entry->plansource, i,
+ into, es, query_string, paramLI, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
else
@@ -668,6 +670,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Separate plans with an appropriate separator */
if (lnext(plan_list, p) != NULL)
ExplainSeparatePlans(es);
+
+ i++;
}
if (estate)
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 170360edda..91e4b821a0 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -5119,6 +5119,20 @@ AfterTriggerEndQuery(EState *estate)
afterTriggers.query_depth--;
}
+/* ----------
+ * AfterTriggerAbortQuery()
+ *
+ * Called by ExecutorEnd() if the query execution was aborted due to the
+ * plan becoming invalid during initialization.
+ * ----------
+ */
+void
+AfterTriggerAbortQuery(void)
+{
+ /* Revert the actions of AfterTriggerBeginQuery(). */
+ afterTriggers.query_depth--;
+}
+
/*
* AfterTriggerFreeQuery
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 642d63be61..e583df5be0 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -280,6 +280,28 @@ are typically reset to empty once per tuple. Per-tuple contexts are usually
associated with ExprContexts, and commonly each PlanState node has its own
ExprContext to evaluate its qual and targetlist expressions in.
+Relation Locking
+----------------
+
+Typically, when the executor initializes a plan tree for execution, it doesn't
+lock non-index relations if the plan tree is freshly generated and not derived
+from a CachedPlan. This is because such locks have already been established
+during the query's parsing, rewriting, and planning phases. However, with a
+cached plan tree, some relations may remain unlocked. The function
+AcquireExecutorLocks() only locks unprunable relations in the plan, deferring
+the locking of prunable ones to executor initialization. This avoids
+unnecessary locking of relations that will be pruned during "initial" runtime
+pruning in the ExecInitNode() routine of nodes containing the pruning info.
+
+This approach creates a window where a cached plan tree with child tables
+could become outdated if another backend modifies these tables before
+ExecInitNode() locks them. As a result, the executor has the added duty to
+verify the plan tree's validity whenever it locks a child table after
+execution-initialization-pruning. This validation is done by checking the
+CachedPlan.is_valid attribute. If the plan tree is outdated (is_valid=false),
+the executor halts further initialization, cleans up the partially initialized
+PlanState tree, and retries execution after creating a new transient
+CachedPlan.
Query Processing Control Flow
-----------------------------
@@ -288,7 +310,7 @@ This is a sketch of control flow for full query processing:
CreateQueryDesc
- ExecutorStart
+ ExecutorStart or ExecutorStartExt
CreateExecutorState
creates per-query context
switch to per-query context to run ExecInitNode
@@ -316,7 +338,13 @@ This is a sketch of control flow for full query processing:
FreeQueryDesc
-Per above comments, it's not really critical for ExecEndNode to free any
+As mentioned in the "Relation Locking" section, if the plan tree is found to
+be stale during one of the recursive calls of ExecInitNode() after taking a
+lock on a child table, control is immediately returned to
+ExecutorStartExt(), which will create a new plan tree and perform the
+steps starting from CreateExecutorState() again.
+
+Per above comments, it's not really critical for ExecEndPlan to free any
memory; it'll all go away in FreeExecutorState anyway. However, we do need to
be careful to close relations, drop buffer pins, etc, so we do need to scan
the plan state tree to find these sorts of resources.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 0f6dbd1e2b..000d02a337 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -58,6 +58,7 @@
#include "utils/backend_status.h"
#include "utils/lsyscache.h"
#include "utils/partcache.h"
+#include "utils/plancache.h"
#include "utils/rls.h"
#include "utils/snapmgr.h"
@@ -133,6 +134,60 @@ ExecutorStart(QueryDesc *queryDesc, int eflags)
standard_ExecutorStart(queryDesc, eflags);
}
+/*
+ * A variant of ExecutorStart() that handles cleanup and replanning if the
+ * input CachedPlan becomes invalid due to locks being taken during
+ * ExecutorStartInternal(). If that happens, a new CachedPlan is created
+ * only for the query at index 'query_index' in plansource->query_list, which
+ * is released separately from the original CachedPlan.
+ */
+void
+ExecutorStartExt(QueryDesc *queryDesc, int eflags,
+ CachedPlanSource *plansource,
+ int query_index)
+{
+ if (queryDesc->cplan == NULL)
+ {
+ ExecutorStart(queryDesc, eflags);
+ return;
+ }
+
+ while (1)
+ {
+ ExecutorStart(queryDesc, eflags);
+ if (!CachedPlanValid(queryDesc->cplan))
+ {
+ CachedPlan *cplan;
+
+ /*
+ * The plan got invalidated, so try with a new updated plan.
+ *
+ * But first undo what ExecutorStart() would've done. Mark
+ * execution as aborted to ensure that AFTER trigger state is
+ * properly reset.
+ */
+ queryDesc->estate->es_aborted = true;
+ ExecutorEnd(queryDesc);
+
+ cplan = GetSingleCachedPlan(plansource, query_index,
+ queryDesc->queryEnv);
+
+ /*
+ * Install the new transient cplan into the QueryDesc replacing
+ * the old one so that executor initialization code can see it.
+ * Mark it as in use by us and ask FreeQueryDesc() to release it.
+ */
+ cplan->refcount = 1;
+ queryDesc->cplan = cplan;
+ queryDesc->cplan_release = true;
+ queryDesc->plannedstmt = linitial_node(PlannedStmt,
+ queryDesc->cplan->stmt_list);
+ }
+ else
+ break; /* ExecutorStart() succeeded! */
+ }
+}
+
void
standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
@@ -316,6 +371,7 @@ standard_ExecutorRun(QueryDesc *queryDesc,
estate = queryDesc->estate;
Assert(estate != NULL);
+ Assert(!estate->es_aborted);
Assert(!(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
/* caller must ensure the query's snapshot is active */
@@ -422,8 +478,11 @@ standard_ExecutorFinish(QueryDesc *queryDesc)
Assert(estate != NULL);
Assert(!(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
- /* This should be run once and only once per Executor instance */
- Assert(!estate->es_finished);
+ /*
+ * This should be run once and only once per Executor instance and never
+ * if the execution was aborted.
+ */
+ Assert(!estate->es_finished && !estate->es_aborted);
/* Switch into per-query memory context */
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
@@ -482,11 +541,10 @@ standard_ExecutorEnd(QueryDesc *queryDesc)
Assert(estate != NULL);
/*
- * Check that ExecutorFinish was called, unless in EXPLAIN-only mode. This
- * Assert is needed because ExecutorFinish is new as of 9.1, and callers
- * might forget to call it.
+ * Check that ExecutorFinish was called, unless in EXPLAIN-only mode or if
+ * execution was aborted.
*/
- Assert(estate->es_finished ||
+ Assert(estate->es_finished || estate->es_aborted ||
(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
/*
@@ -500,6 +558,14 @@ standard_ExecutorEnd(QueryDesc *queryDesc)
UnregisterSnapshot(estate->es_snapshot);
UnregisterSnapshot(estate->es_crosscheck_snapshot);
+ /*
+ * Reset AFTER trigger module if the query execution was aborted.
+ */
+ if (estate->es_aborted &&
+ !(estate->es_top_eflags &
+ (EXEC_FLAG_SKIP_TRIGGERS | EXEC_FLAG_EXPLAIN_ONLY)))
+ AfterTriggerAbortQuery();
+
/*
* Must switch out of context before destroying it
*/
@@ -832,7 +898,6 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
PreventCommandIfParallelMode(CreateCommandName((Node *) plannedstmt));
}
-
/* ----------------------------------------------------------------
* InitPlan
*
@@ -897,6 +962,9 @@ InitPlan(QueryDesc *queryDesc, int eflags)
case ROW_MARK_KEYSHARE:
case ROW_MARK_REFERENCE:
relation = ExecGetRangeTableRelation(estate, rc->rti);
+ if (unlikely(relation == NULL ||
+ !ExecPlanStillValid(estate)))
+ return;
break;
case ROW_MARK_COPY:
/* no physical table access is required */
@@ -967,6 +1035,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_subplanstates = lappend(estate->es_subplanstates,
subplanstate);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return;
i++;
}
@@ -977,6 +1047,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
* processing tuples.
*/
planstate = ExecInitNode(plan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return;
/*
* Get the tuple descriptor describing the type of tuples to return.
@@ -2858,6 +2930,7 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
rcestate->es_rowmarks = parentestate->es_rowmarks;
rcestate->es_rteperminfos = parentestate->es_rteperminfos;
rcestate->es_plannedstmt = parentestate->es_plannedstmt;
+ rcestate->es_cachedplan = parentestate->es_cachedplan;
rcestate->es_junkFilter = parentestate->es_junkFilter;
rcestate->es_output_cid = parentestate->es_output_cid;
rcestate->es_queryEnv = parentestate->es_queryEnv;
@@ -2936,6 +3009,14 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
subplanstate = ExecInitNode(subplan, rcestate, 0);
rcestate->es_subplanstates = lappend(rcestate->es_subplanstates,
subplanstate);
+
+ /*
+ * All necessary locks should have been taken when initializing the
+ * parent's copy of subplanstate, so the CachedPlan, if any, should
+ * not have become invalid during the above ExecInitNode().
+ */
+ if (!ExecPlanStillValid(rcestate))
+ elog(ERROR, "unexpected failure to initialize subplan in EvalPlanQualStart()");
}
/*
@@ -2977,6 +3058,10 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
*/
epqstate->recheckplanstate = ExecInitNode(planTree, rcestate, 0);
+ /* See the comment above. */
+ if (!ExecPlanStillValid(rcestate))
+ elog(ERROR, "unexpected failure to initialize main plantree in EvalPlanQualStart()");
+
MemoryContextSwitchTo(oldcontext);
}
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 03b48e12b4..2017433c64 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1263,9 +1263,7 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
* if it should take locks on certain relations, but parallel workers
* always take locks anyway.
*/
- return CreateQueryDesc(pstmt,
- NULL,
- queryString,
+ return CreateQueryDesc(pstmt, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
}
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 7651886229..38cd97b59c 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1794,6 +1794,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* If subplans are indeed pruned, subplan_map arrays contained in the returned
* PartitionPruneState are re-sequenced to not count those, though only if the
* maps will be needed for subsequent execution pruning passes.
+ *
+ * Returns NULL if the plan has become invalid after taking the locks to
+ * create the PartitionPruneState in CreatePartitionPruneState().
*/
PartitionPruneState *
ExecInitPartitionPruning(PlanState *planstate,
@@ -1809,6 +1812,8 @@ ExecInitPartitionPruning(PlanState *planstate,
/* Create the working data structure for pruning */
prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ if (!ExecPlanStillValid(estate))
+ return NULL;
/*
* Perform an initial partition prune pass, if required.
@@ -1860,6 +1865,9 @@ ExecInitPartitionPruning(PlanState *planstate,
* stored in each PartitionedRelPruningData can be re-used each time we
* re-evaluate which partitions match the pruning steps provided in each
* PartitionedRelPruneInfo.
+ *
+ * Returns NULL if the plan has become invalid after taking a lock to create
+ * a PartitionedRelPruningData.
*/
static PartitionPruneState *
CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
@@ -1935,6 +1943,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* duration of this executor run.
*/
partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+ if (unlikely(partrel == NULL || !ExecPlanStillValid(estate)))
+ return NULL;
partkey = RelationGetPartitionKey(partrel);
partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
partrel);
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 34f28dfece..7689d34dd0 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -136,6 +136,10 @@ static bool ExecShutdownNode_walker(PlanState *node, void *context);
* 'eflags' is a bitwise OR of flag bits described in executor.h
*
* Returns a PlanState node corresponding to the given Plan node.
+ *
+ * On return, callers should check that ExecPlanStillValid(estate) returns
+ * true before processing the result further, because the returned PlanState
+ * might otherwise be only partially initialized.
* ------------------------------------------------------------------------
*/
PlanState *
@@ -388,6 +392,9 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
break;
}
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return result;
+
ExecSetExecProcNode(result, result->ExecProcNode);
/*
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 6dfd5a26b7..39b388e6b4 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -146,6 +146,7 @@ CreateExecutorState(void)
estate->es_top_eflags = 0;
estate->es_instrument = 0;
estate->es_finished = false;
+ estate->es_aborted = false;
estate->es_exprcontexts = NIL;
@@ -691,6 +692,8 @@ ExecRelationIsTargetRelation(EState *estate, Index scanrelid)
*
* Open the heap relation to be scanned by a base-level scan plan node.
* This should be called during the node's ExecInit routine.
+ *
+ * NULL is returned if the relation is found to have been dropped.
* ----------------------------------------------------------------
*/
Relation
@@ -700,6 +703,8 @@ ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags)
/* Open the relation. */
rel = ExecGetRangeTableRelation(estate, scanrelid);
+ if (unlikely(rel == NULL || !ExecPlanStillValid(estate)))
+ return rel;
/*
* Complain if we're attempting a scan of an unscannable relation, except
@@ -717,6 +722,26 @@ ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags)
return rel;
}
+/* ----------------------------------------------------------------
+ * ExecOpenScanIndexRelation
+ *
+ * Open the index relation to be scanned by an index scan plan node.
+ * This should be called during the node's ExecInit routine.
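+ *
+ * Callers should check ExecPlanStillValid(estate) on return, since taking
+ * the lock on the index may have invalidated the cached plan.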
+ * ----------------------------------------------------------------
+ */
+Relation
+ExecOpenScanIndexRelation(EState *estate, Oid indexid, int lockmode)
+{
+ Relation rel;
+
+ /* Open the index. */
+ rel = index_open(indexid, lockmode);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ elog(DEBUG2, "CachedPlan invalidated on locking index %u", indexid);
+
+ return rel;
+}
+
/*
* ExecInitRangeTable
* Set up executor's range-table-related data
@@ -776,8 +801,12 @@ ExecShouldLockRelation(EState *estate, Index rtindex)
* ExecGetRangeTableRelation
* Open the Relation for a range table entry, if not already done
*
- * The Relations will be closed again in ExecEndPlan().
+ * The Relations will be closed in ExecEndPlan().
+ *
+ * The returned value may be NULL if the relation is a prunable one that
+ * had not been locked beforehand and so may have been concurrently dropped.
*/
Relation
ExecGetRangeTableRelation(EState *estate, Index rti)
{
@@ -820,8 +849,14 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
* that of a prunable relation and we're running a cached generic
* plan. AcquireExecutorLocks() of plancache.c would have locked
* only the unprunable relations in the plan tree.
+ *
+ * Note that we use try_table_open() here, because without a lock
+ * held on the relation, it may have disappeared from under us.
*/
- rel = table_open(rte->relid, rte->rellockmode);
+ rel = try_table_open(rte->relid, rte->rellockmode);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ elog(DEBUG2, "CachedPlan invalidated on locking relation %u",
+ rte->relid);
}
estate->es_relations[rti - 1] = rel;
@@ -845,6 +880,9 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
Relation resultRelationDesc;
resultRelationDesc = ExecGetRangeTableRelation(estate, rti);
+ if (unlikely(resultRelationDesc == NULL ||
+ !ExecPlanStillValid(estate)))
+ return;
InitResultRelInfo(resultRelInfo,
resultRelationDesc,
rti,
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 0dfba5ca16..8c40d8c520 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -3303,6 +3303,8 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
eflags &= ~EXEC_FLAG_REWIND;
outerPlan = outerPlan(node);
outerPlanState(aggstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return aggstate;
/*
* initialize source tuple type.
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 86d75b1a7e..3c82a1ceab 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -147,6 +147,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
list_length(node->appendplans),
node->part_prune_info,
&validsubplans);
+ if (!ExecPlanStillValid(estate))
+ return appendstate;
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
@@ -185,8 +187,10 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->ps.resultopsset = true;
appendstate->ps.resultopsfixed = false;
- appendplanstates = (PlanState **) palloc(nplans *
- sizeof(PlanState *));
+ appendplanstates = (PlanState **) palloc0(nplans *
+ sizeof(PlanState *));
+ appendstate->appendplans = appendplanstates;
+ appendstate->as_nplans = nplans;
/*
* call ExecInitNode on each of the valid plans to be executed and save
@@ -221,11 +225,11 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
firstvalid = j;
appendplanstates[j++] = ExecInitNode(initNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return appendstate;
}
appendstate->as_first_partial_plan = firstvalid;
- appendstate->appendplans = appendplanstates;
- appendstate->as_nplans = nplans;
/* Initialize async state */
appendstate->as_asyncplans = asyncplans;
diff --git a/src/backend/executor/nodeBitmapAnd.c b/src/backend/executor/nodeBitmapAnd.c
index ae391222bf..168c440692 100644
--- a/src/backend/executor/nodeBitmapAnd.c
+++ b/src/backend/executor/nodeBitmapAnd.c
@@ -89,6 +89,8 @@ ExecInitBitmapAnd(BitmapAnd *node, EState *estate, int eflags)
{
initNode = (Plan *) lfirst(l);
bitmapplanstates[i] = ExecInitNode(initNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return bitmapandstate;
i++;
}
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 19f18ab817..b13cae1cbb 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -754,11 +754,15 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (unlikely(currentRelation == NULL || !ExecPlanStillValid(estate)))
+ return scanstate;
/*
* initialize child nodes
*/
outerPlanState(scanstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return scanstate;
/*
* get the scan type from the relation descriptor.
diff --git a/src/backend/executor/nodeBitmapIndexscan.c b/src/backend/executor/nodeBitmapIndexscan.c
index 4669e8d0ce..f04a53e9be 100644
--- a/src/backend/executor/nodeBitmapIndexscan.c
+++ b/src/backend/executor/nodeBitmapIndexscan.c
@@ -252,7 +252,11 @@ ExecInitBitmapIndexScan(BitmapIndexScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
- indexstate->biss_RelationDesc = index_open(node->indexid, lockmode);
+ indexstate->biss_RelationDesc = ExecOpenScanIndexRelation(estate,
+ node->indexid,
+ lockmode);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return indexstate;
/*
* Initialize index-specific scan state
diff --git a/src/backend/executor/nodeBitmapOr.c b/src/backend/executor/nodeBitmapOr.c
index de439235d2..980b68dd82 100644
--- a/src/backend/executor/nodeBitmapOr.c
+++ b/src/backend/executor/nodeBitmapOr.c
@@ -90,6 +90,8 @@ ExecInitBitmapOr(BitmapOr *node, EState *estate, int eflags)
{
initNode = (Plan *) lfirst(l);
bitmapplanstates[i] = ExecInitNode(initNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return bitmaporstate;
i++;
}
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
index e559cd2346..2a7c5dccd8 100644
--- a/src/backend/executor/nodeCustom.c
+++ b/src/backend/executor/nodeCustom.c
@@ -58,6 +58,8 @@ ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
if (scanrelid > 0)
{
scan_rel = ExecOpenScanRelation(estate, scanrelid, eflags);
+ if (unlikely(scan_rel == NULL || !ExecPlanStillValid(estate)))
+ return css;
css->ss.ss_currentRelation = scan_rel;
}
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 1357ccf3c9..90d5878ae3 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -172,6 +172,8 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
if (scanrelid > 0)
{
currentRelation = ExecOpenScanRelation(estate, scanrelid, eflags);
+ if (unlikely(currentRelation == NULL || !ExecPlanStillValid(estate)))
+ return scanstate;
scanstate->ss.ss_currentRelation = currentRelation;
fdwroutine = GetFdwRoutineForRelation(currentRelation, true);
}
@@ -263,6 +265,8 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
if (outerPlan(node))
outerPlanState(scanstate) =
ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return scanstate;
/*
* Tell the FDW to initialize the scan.
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index cae5ea1f92..67548aa7ba 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -84,6 +84,8 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
*/
outerNode = outerPlan(node);
outerPlanState(gatherstate) = ExecInitNode(outerNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return gatherstate;
tupDesc = ExecGetResultType(outerPlanState(gatherstate));
/*
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index b36cd89e7d..cf0e074359 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -103,6 +103,8 @@ ExecInitGatherMerge(GatherMerge *node, EState *estate, int eflags)
*/
outerNode = outerPlan(node);
outerPlanState(gm_state) = ExecInitNode(outerNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return gm_state;
/*
* Leader may access ExecProcNode result directly (if
diff --git a/src/backend/executor/nodeGroup.c b/src/backend/executor/nodeGroup.c
index 807429e504..6d0fd9e7b4 100644
--- a/src/backend/executor/nodeGroup.c
+++ b/src/backend/executor/nodeGroup.c
@@ -184,6 +184,8 @@ ExecInitGroup(Group *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(grpstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return grpstate;
/*
* Initialize scan slot and type.
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index a913d5b50c..e71d131d18 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -396,6 +396,8 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(hashstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return hashstate;
/*
* initialize our result slot and type. No need to build projection
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 901c9e9be7..3c870de1c5 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -758,8 +758,12 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
hashNode = (Hash *) innerPlan(node);
outerPlanState(hjstate) = ExecInitNode(outerNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return hjstate;
outerDesc = ExecGetResultType(outerPlanState(hjstate));
innerPlanState(hjstate) = ExecInitNode((Plan *) hashNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return hjstate;
innerDesc = ExecGetResultType(innerPlanState(hjstate));
/*
diff --git a/src/backend/executor/nodeIncrementalSort.c b/src/backend/executor/nodeIncrementalSort.c
index 010bcfafa8..af723ea755 100644
--- a/src/backend/executor/nodeIncrementalSort.c
+++ b/src/backend/executor/nodeIncrementalSort.c
@@ -1040,6 +1040,8 @@ ExecInitIncrementalSort(IncrementalSort *node, EState *estate, int eflags)
* nodes may be able to do something more useful.
*/
outerPlanState(incrsortstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return incrsortstate;
/*
* Initialize scan slot and type.
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 481d479760..0fba8f7d5a 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -531,6 +531,8 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (unlikely(currentRelation == NULL || !ExecPlanStillValid(estate)))
+ return indexstate;
indexstate->ss.ss_currentRelation = currentRelation;
indexstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
@@ -583,9 +585,12 @@ ExecInitIndexOnlyScan(IndexOnlyScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
- indexRelation = index_open(node->indexid, lockmode);
+ indexRelation = ExecOpenScanIndexRelation(estate, node->indexid, lockmode);
indexstate->ioss_RelationDesc = indexRelation;
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return indexstate;
+
/*
* Initialize index-specific scan state
*/
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index a8172d8b82..db28aeb3d6 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -907,6 +907,8 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (unlikely(currentRelation == NULL || !ExecPlanStillValid(estate)))
+ return indexstate;
indexstate->ss.ss_currentRelation = currentRelation;
indexstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
@@ -951,7 +953,11 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
/* Open the index relation. */
lockmode = exec_rt_fetch(node->scan.scanrelid, estate)->rellockmode;
- indexstate->iss_RelationDesc = index_open(node->indexid, lockmode);
+ indexstate->iss_RelationDesc = ExecOpenScanIndexRelation(estate,
+ node->indexid,
+ lockmode);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return indexstate;
/*
* Initialize index-specific scan state
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index eb7b6e52be..369c904577 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -475,6 +475,8 @@ ExecInitLimit(Limit *node, EState *estate, int eflags)
*/
outerPlan = outerPlan(node);
outerPlanState(limitstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return limitstate;
/*
* initialize child expressions
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 0d3489195b..9077858413 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -322,6 +322,8 @@ ExecInitLockRows(LockRows *node, EState *estate, int eflags)
* then initialize outer plan
*/
outerPlanState(lrstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return lrstate;
/* node returns unmodified slots from the outer plan */
lrstate->ps.resultopsset = true;
diff --git a/src/backend/executor/nodeMaterial.c b/src/backend/executor/nodeMaterial.c
index 883e3f3933..972962d44d 100644
--- a/src/backend/executor/nodeMaterial.c
+++ b/src/backend/executor/nodeMaterial.c
@@ -214,6 +214,8 @@ ExecInitMaterial(Material *node, EState *estate, int eflags)
outerPlan = outerPlan(node);
outerPlanState(matstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return matstate;
/*
* Initialize result type and slot. No need to initialize projection info
diff --git a/src/backend/executor/nodeMemoize.c b/src/backend/executor/nodeMemoize.c
index 690dee1daa..6aaab743b5 100644
--- a/src/backend/executor/nodeMemoize.c
+++ b/src/backend/executor/nodeMemoize.c
@@ -973,6 +973,8 @@ ExecInitMemoize(Memoize *node, EState *estate, int eflags)
outerNode = outerPlan(node);
outerPlanState(mstate) = ExecInitNode(outerNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return mstate;
/*
* Initialize return slot and type. No need to initialize projection info
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 3236444cf1..a82f0a71a0 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -95,6 +95,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
list_length(node->mergeplans),
node->part_prune_info,
&validsubplans);
+ if (!ExecPlanStillValid(estate))
+ return mergestate;
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
@@ -120,7 +122,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ms_prune_state = NULL;
}
- mergeplanstates = (PlanState **) palloc(nplans * sizeof(PlanState *));
+ mergeplanstates = (PlanState **) palloc0(nplans * sizeof(PlanState *));
mergestate->mergeplans = mergeplanstates;
mergestate->ms_nplans = nplans;
@@ -151,6 +153,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
Plan *initNode = (Plan *) list_nth(node->mergeplans, i);
mergeplanstates[j++] = ExecInitNode(initNode, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return mergestate;
}
mergestate->ps.ps_ProjInfo = NULL;
diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 926e631d88..53cb1ff207 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -1490,11 +1490,15 @@ ExecInitMergeJoin(MergeJoin *node, EState *estate, int eflags)
mergestate->mj_SkipMarkRestore = node->skip_mark_restore;
outerPlanState(mergestate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return mergestate;
outerDesc = ExecGetResultType(outerPlanState(mergestate));
innerPlanState(mergestate) = ExecInitNode(innerPlan(node), estate,
mergestate->mj_SkipMarkRestore ?
eflags :
(eflags | EXEC_FLAG_MARK));
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return mergestate;
innerDesc = ExecGetResultType(innerPlanState(mergestate));
/*
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 9e56f9c36c..8debfbd3ec 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -4277,6 +4277,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
linitial_int(node->resultRelations));
}
+ /*
+ * ExecInitResultRelation() may have returned without initializing
+ * rootResultRelInfo if the plan got invalidated, so check.
+ */
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return mtstate;
+
/* set up epqstate with dummy subplan data for the moment */
EvalPlanQualInit(&mtstate->mt_epqstate, estate, NULL, NIL,
node->epqParam, node->resultRelations);
@@ -4309,6 +4316,10 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
{
ExecInitResultRelation(estate, resultRelInfo, resultRelation);
+ /* See the comment above. */
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return mtstate;
+
/*
* For child result relations, store the root result relation
* pointer. We do so for the convenience of places that want to
@@ -4335,6 +4346,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* Now we may initialize the subplan.
*/
outerPlanState(mtstate) = ExecInitNode(subplan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return mtstate;
/*
* Do additional per-result-relation initialization.
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index 01f3d56a3b..34eafbb6e0 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -294,11 +294,15 @@ ExecInitNestLoop(NestLoop *node, EState *estate, int eflags)
* values.
*/
outerPlanState(nlstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return nlstate;
if (node->nestParams == NIL)
eflags |= EXEC_FLAG_REWIND;
else
eflags &= ~EXEC_FLAG_REWIND;
innerPlanState(nlstate) = ExecInitNode(innerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return nlstate;
/*
* Initialize result slot, type and projection.
diff --git a/src/backend/executor/nodeProjectSet.c b/src/backend/executor/nodeProjectSet.c
index ca9a5e2ed2..f834499479 100644
--- a/src/backend/executor/nodeProjectSet.c
+++ b/src/backend/executor/nodeProjectSet.c
@@ -254,6 +254,8 @@ ExecInitProjectSet(ProjectSet *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(state) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return state;
/*
* we don't use inner plan
diff --git a/src/backend/executor/nodeRecursiveunion.c b/src/backend/executor/nodeRecursiveunion.c
index 7680142c7b..5dd3285c41 100644
--- a/src/backend/executor/nodeRecursiveunion.c
+++ b/src/backend/executor/nodeRecursiveunion.c
@@ -244,7 +244,11 @@ ExecInitRecursiveUnion(RecursiveUnion *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(rustate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return rustate;
innerPlanState(rustate) = ExecInitNode(innerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return rustate;
/*
* If hashing, precompute fmgr lookup data for inner loop, and create the
diff --git a/src/backend/executor/nodeResult.c b/src/backend/executor/nodeResult.c
index e3cfc9b772..7d7c2aa786 100644
--- a/src/backend/executor/nodeResult.c
+++ b/src/backend/executor/nodeResult.c
@@ -207,6 +207,8 @@ ExecInitResult(Result *node, EState *estate, int eflags)
* initialize child nodes
*/
outerPlanState(resstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return resstate;
/*
* we don't use inner plan
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index 6ab91001bc..3afdaeecd7 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -121,6 +121,9 @@ ExecInitSampleScan(SampleScan *node, EState *estate, int eflags)
ExecOpenScanRelation(estate,
node->scan.scanrelid,
eflags);
+ if (unlikely(scanstate->ss.ss_currentRelation == NULL ||
+ !ExecPlanStillValid(estate)))
+ return scanstate;
/* we won't set up the HeapScanDesc till later */
scanstate->ss.ss_currentScanDesc = NULL;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index b052775e5b..f7fb64a4a2 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -153,6 +153,9 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
ExecOpenScanRelation(estate,
node->scan.scanrelid,
eflags);
+ if (unlikely(scanstate->ss.ss_currentRelation == NULL ||
+ !ExecPlanStillValid(estate)))
+ return scanstate;
/* and create slot with the appropriate rowtype */
ExecInitScanTupleSlot(estate, &scanstate->ss,
diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index fe34b2134f..2231d8b82f 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -528,6 +528,8 @@ ExecInitSetOp(SetOp *node, EState *estate, int eflags)
if (node->strategy == SETOP_HASHED)
eflags &= ~EXEC_FLAG_REWIND;
outerPlanState(setopstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return setopstate;
outerDesc = ExecGetResultType(outerPlanState(setopstate));
/*
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index af852464d0..fb76e4c01b 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -263,6 +263,8 @@ ExecInitSort(Sort *node, EState *estate, int eflags)
eflags &= ~(EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK);
outerPlanState(sortstate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return sortstate;
/*
* Initialize scan slot and type.
diff --git a/src/backend/executor/nodeSubqueryscan.c b/src/backend/executor/nodeSubqueryscan.c
index 0b2612183a..b5b538fa91 100644
--- a/src/backend/executor/nodeSubqueryscan.c
+++ b/src/backend/executor/nodeSubqueryscan.c
@@ -124,6 +124,8 @@ ExecInitSubqueryScan(SubqueryScan *node, EState *estate, int eflags)
* initialize subquery
*/
subquerystate->subplan = ExecInitNode(node->subplan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return subquerystate;
/*
* Initialize scan slot and type (needed by ExecAssignScanProjectionInfo)
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 702ee884d2..a76836d021 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -377,6 +377,8 @@ ExecInitTidRangeScan(TidRangeScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (unlikely(currentRelation == NULL || !ExecPlanStillValid(estate)))
+ return tidrangestate;
tidrangestate->ss.ss_currentRelation = currentRelation;
tidrangestate->ss.ss_currentScanDesc = NULL; /* no table scan here */
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index f375951699..088babf572 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -522,6 +522,8 @@ ExecInitTidScan(TidScan *node, EState *estate, int eflags)
* open the scan relation
*/
currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
+ if (unlikely(currentRelation == NULL || !ExecPlanStillValid(estate)))
+ return tidstate;
tidstate->ss.ss_currentRelation = currentRelation;
tidstate->ss.ss_currentScanDesc = NULL; /* no heap scan here */
diff --git a/src/backend/executor/nodeUnique.c b/src/backend/executor/nodeUnique.c
index b82d0e9ad5..cb46b2d5d0 100644
--- a/src/backend/executor/nodeUnique.c
+++ b/src/backend/executor/nodeUnique.c
@@ -135,6 +135,8 @@ ExecInitUnique(Unique *node, EState *estate, int eflags)
* then initialize outer plan
*/
outerPlanState(uniquestate) = ExecInitNode(outerPlan(node), estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return uniquestate;
/*
* Initialize result slot and type. Unique nodes do no projections, so
diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 561d7e731d..1b96f51fe8 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -2464,6 +2464,8 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags)
*/
outerPlan = outerPlan(node);
outerPlanState(winstate) = ExecInitNode(outerPlan, estate, eflags);
+ if (unlikely(!ExecPlanStillValid(estate)))
+ return winstate;
/*
* initialize source tuple type (which is also the tuple type that we'll
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 902793b02b..b754827013 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -70,7 +70,8 @@ static int _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
static ParamListInfo _SPI_convert_params(int nargs, Oid *argtypes,
Datum *Values, const char *Nulls);
-static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount);
+static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount,
+ CachedPlanSource *plansource, int query_index);
static void _SPI_error_callback(void *arg);
@@ -1682,7 +1683,8 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
- cplan);
+ cplan,
+ plansource);
/*
* Set up options for portal. Default SCROLL type is chosen the same way
@@ -2494,6 +2496,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
ListCell *lc2;
+ int i = 0;
spicallbackarg.query = plansource->query_string;
@@ -2691,8 +2694,9 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
options->params,
_SPI_current->queryEnv,
0);
- res = _SPI_pquery(qdesc, fire_triggers,
- canSetTag ? options->tcount : 0);
+
+ res = _SPI_pquery(qdesc, fire_triggers, canSetTag ? options->tcount : 0,
+ plansource, i);
FreeQueryDesc(qdesc);
}
else
@@ -2789,6 +2793,8 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
my_res = res;
goto fail;
}
+
+ i++;
}
/* Done with this plan, so release refcount */
@@ -2866,7 +2872,8 @@ _SPI_convert_params(int nargs, Oid *argtypes,
}
static int
-_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
+_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount,
+ CachedPlanSource *plansource, int query_index)
{
int operation = queryDesc->operation;
int eflags;
@@ -2922,7 +2929,7 @@ _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
else
eflags = EXEC_FLAG_SKIP_TRIGGERS;
- ExecutorStart(queryDesc, eflags);
+ ExecutorStartExt(queryDesc, eflags, plansource, query_index);
ExecutorRun(queryDesc, ForwardScanDirection, tcount, true);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8bc6bea113..ccbc27b575 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1237,6 +1237,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ NULL,
NULL);
/*
@@ -2027,7 +2028,8 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
- cplan);
+ cplan,
+ psrc);
/* Done with the snapshot used for parameter I/O and parsing/planning */
if (snapshot_set)
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 6e8f6b1b8f..c2ebddaa84 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -19,6 +19,7 @@
#include "access/xact.h"
#include "commands/prepare.h"
+#include "executor/execdesc.h"
#include "executor/tstoreReceiver.h"
#include "miscadmin.h"
#include "pg_trace.h"
@@ -37,6 +38,8 @@ Portal ActivePortal = NULL;
static void ProcessQuery(PlannedStmt *plan,
CachedPlan *cplan,
+ CachedPlanSource *plansource,
+ int query_index,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -80,6 +83,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
qd->cplan = cplan; /* CachedPlan supplying the plannedstmt */
+ qd->cplan_release = false;
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -114,6 +118,13 @@ FreeQueryDesc(QueryDesc *qdesc)
UnregisterSnapshot(qdesc->snapshot);
UnregisterSnapshot(qdesc->crosscheck_snapshot);
+ /*
+ * Release CachedPlan if requested. The CachedPlan is not associated with
+ * a ResourceOwner when cplan_release is true; see ExecutorStartExt().
+ */
+ if (qdesc->cplan_release)
+ ReleaseCachedPlan(qdesc->cplan, NULL);
+
/* Only the QueryDesc itself need be freed */
pfree(qdesc);
}
@@ -126,6 +137,8 @@ FreeQueryDesc(QueryDesc *qdesc)
*
* plan: the plan tree for the query
* cplan: CachedPlan supplying the plan
+ * plansource: CachedPlanSource supplying the cplan
+ * query_index: index of the query in plansource->query_list
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -139,6 +152,8 @@ FreeQueryDesc(QueryDesc *qdesc)
static void
ProcessQuery(PlannedStmt *plan,
CachedPlan *cplan,
+ CachedPlanSource *plansource,
+ int query_index,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -157,7 +172,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Call ExecutorStart to prepare the plan for execution
*/
- ExecutorStart(queryDesc, 0);
+ ExecutorStartExt(queryDesc, 0, plansource, query_index);
/*
* Run the plan to completion.
@@ -518,9 +533,12 @@ PortalStart(Portal portal, ParamListInfo params,
myeflags = eflags;
/*
- * Call ExecutorStart to prepare the plan for execution
+ * Call ExecutorStartExt() to prepare the plan for execution. If
+ * the portal is using a cached plan, it may get invalidated
+ * during plan initialization, in which case a new one is
+ * created and saved in the QueryDesc.
*/
- ExecutorStart(queryDesc, myeflags);
+ ExecutorStartExt(queryDesc, myeflags, portal->plansource, 0);
/*
* This tells PortalCleanup to shut down the executor
@@ -1201,6 +1219,7 @@ PortalRunMulti(Portal portal,
{
bool active_snapshot_set = false;
ListCell *stmtlist_item;
+ int i = 0;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1283,6 +1302,8 @@ PortalRunMulti(Portal portal,
/* statement can set tag string */
ProcessQuery(pstmt,
portal->cplan,
+ portal->plansource,
+ i,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1293,6 +1314,8 @@ PortalRunMulti(Portal portal,
/* stmt added by rewrite cannot set tag */
ProcessQuery(pstmt,
portal->cplan,
+ portal->plansource,
+ i,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1357,6 +1380,8 @@ PortalRunMulti(Portal portal,
*/
if (lnext(portal->stmts, stmtlist_item) != NULL)
CommandCounterIncrement();
+
+ i++;
}
/* Pop the snapshot if we pushed one. */
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 5b75dadf13..d33f871ea2 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -94,6 +94,14 @@
*/
static dlist_head saved_plan_list = DLIST_STATIC_INIT(saved_plan_list);
+/*
+ * Head of the backend's list of "standalone" CachedPlans that are not
+ * associated with a CachedPlanSource, created by GetSingleCachedPlan() for
+ * transient use by the executor when a plan is needed for only one
+ * execution.
+ */
+static dlist_head standalone_plan_list = DLIST_STATIC_INIT(standalone_plan_list);
+
/*
* This is the head of the backend's list of CachedExpressions.
*/
@@ -905,6 +913,8 @@ CheckCachedPlan(CachedPlanSource *plansource)
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
* (unless this is a one-shot plan, in which case we don't copy the plan).
+ *
+ * Note: When changing this, you should also look at GetSingleCachedPlan().
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
@@ -1034,6 +1044,7 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
plan->is_generic = generic;
plan->is_saved = false;
plan->is_valid = true;
+ plan->is_standalone = false;
/* assign generation number to new plan */
plan->generation = ++(plansource->generation);
@@ -1282,6 +1293,121 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
return plan;
}
+/*
+ * Create a fresh CachedPlan for the query_index'th query in the provided
+ * CachedPlanSource.
+ *
+ * The created CachedPlan is standalone, meaning it is not tracked in the
+ * CachedPlanSource. The CachedPlan and its plan trees are allocated in a
+ * child context of the caller's memory context. The caller must ensure they
+ * remain valid until execution is complete, after which the plan should be
+ * released by calling ReleaseCachedPlan().
+ *
+ * This function primarily supports ExecutorStartExt(), which handles cases
+ * where the original generic CachedPlan becomes invalid after prunable
+ * relations are locked.
+ */
+CachedPlan *
+GetSingleCachedPlan(CachedPlanSource *plansource, int query_index,
+ QueryEnvironment *queryEnv)
+{
+ List *query_list = plansource->query_list,
+ *plan_list;
+ CachedPlan *plan = plansource->gplan;
+ MemoryContext oldcxt = CurrentMemoryContext,
+ plan_context;
+ PlannedStmt *plannedstmt;
+
+ Assert(ActiveSnapshotSet());
+
+ /* Sanity checks */
+ if (plan == NULL)
+ elog(ERROR, "GetSingleCachedPlan() called in the wrong context: plansource->gplan is NULL");
+ else if (plan->is_valid)
+ elog(ERROR, "GetSingleCachedPlan() called in the wrong context: plansource->gplan->is_valid");
+
+ /*
+ * The plansource might have become invalid since GetCachedPlan(). See the
+ * comment in BuildCachedPlan() for details on why this might happen.
+ *
+ * The risk is greater here because this function is called from the
+ * executor, by which point much more processing may have occurred than
+ * when BuildCachedPlan() is called from GetCachedPlan().
+ */
+ if (!plansource->is_valid)
+ query_list = RevalidateCachedQuery(plansource, queryEnv);
+ Assert(query_list != NIL);
+
+ /*
+ * Build a new generic plan for the query_index'th query, making a copy
+ * of the query first for the planner to scribble on.
+ */
+ query_list = list_make1(copyObject(list_nth_node(Query, query_list,
+ query_index)));
+ plan_list = pg_plan_queries(query_list, plansource->query_string,
+ plansource->cursor_options, NULL);
+
+ list_free_deep(query_list);
+
+ /*
+ * Make a dedicated memory context for the CachedPlan and its subsidiary
+ * data so that it can be released by ReleaseCachedPlan(), which
+ * FreeQueryDesc() will call.
+ */
+ plan_context = AllocSetContextCreate(CurrentMemoryContext,
+ "Standalone CachedPlan",
+ ALLOCSET_START_SMALL_SIZES);
+ MemoryContextCopyAndSetIdentifier(plan_context, plansource->query_string);
+
+ /*
+ * Copy plan into the new context.
+ */
+ MemoryContextSwitchTo(plan_context);
+ plan_list = copyObject(plan_list);
+
+ /*
+ * Create and fill the CachedPlan struct within the new context.
+ */
+ plan = (CachedPlan *) palloc(sizeof(CachedPlan));
+ plan->magic = CACHEDPLAN_MAGIC;
+ plan->stmt_list = plan_list;
+
+ plan->planRoleId = GetUserId();
+ Assert(list_length(plan_list) == 1);
+ plannedstmt = linitial_node(PlannedStmt, plan_list);
+
+ /*
+ * CachedPlan is dependent on role either if RLS affected the rewrite
+ * phase or if a role dependency was injected during planning. And it's
+ * transient if any plan is marked so.
+ */
+ plan->dependsOnRole = plansource->dependsOnRLS || plannedstmt->dependsOnRole;
+ if (plannedstmt->transientPlan)
+ {
+ Assert(TransactionIdIsNormal(TransactionXmin));
+ plan->saved_xmin = TransactionXmin;
+ }
+ else
+ plan->saved_xmin = InvalidTransactionId;
+ plan->refcount = 0;
+ plan->context = plan_context;
+ plan->is_oneshot = false;
+ plan->is_generic = true;
+ plan->is_saved = false;
+ plan->is_valid = true;
+ plan->is_standalone = true;
+ plan->generation = 1;
+ MemoryContextSwitchTo(oldcxt);
+
+ /*
+ * Add the entry to the global list of "standalone" cached plans. It is
+ * removed from the list by ReleaseCachedPlan().
+ */
+ dlist_push_tail(&standalone_plan_list, &plan->node);
+
+ return plan;
+}
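+
+/*
+ * A rough sketch of the intended calling sequence, assuming the caller is
+ * ExecutorStartExt() (names as used elsewhere in this patch; the exact
+ * logic lives in execMain.c):
+ *
+ *		ExecutorStart(queryDesc, eflags);
+ *		if (queryDesc->cplan != NULL && !CachedPlanValid(queryDesc->cplan))
+ *		{
+ *			queryDesc->cplan = GetSingleCachedPlan(plansource, query_index,
+ *												   queryEnv);
+ *			queryDesc->cplan_release = true;	-- FreeQueryDesc() releases it
+ *			... redo ExecutorStart() with the fresh plan ...
+ *		}
+ */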
+
/*
* ReleaseCachedPlan: release active use of a cached plan.
*
@@ -1309,6 +1435,10 @@ ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner)
/* Mark it no longer valid */
plan->magic = 0;
+ /* Remove from the global list if we are a standalone plan. */
+ if (plan->is_standalone)
+ dlist_delete(&plan->node);
+
/* One-shot plans do not own their context, so we can't free them */
if (!plan->is_oneshot)
MemoryContextDelete(plan->context);
@@ -2066,6 +2196,33 @@ PlanCacheRelCallback(Datum arg, Oid relid)
cexpr->is_valid = false;
}
}
+
+ /* Finally, invalidate any standalone cached plans */
+ dlist_foreach(iter, &standalone_plan_list)
+ {
+ CachedPlan *cplan = dlist_container(CachedPlan,
+ node, iter.cur);
+
+ Assert(cplan->magic == CACHEDPLAN_MAGIC);
+
+ if (cplan->is_valid)
+ {
+ ListCell *lc;
+
+ foreach(lc, cplan->stmt_list)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc);
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ continue; /* Ignore utility statements */
+ if ((relid == InvalidOid) ? plannedstmt->relationOids != NIL :
+ list_member_oid(plannedstmt->relationOids, relid))
+ cplan->is_valid = false;
+ if (!cplan->is_valid)
+ break; /* out of stmt_list scan */
+ }
+ }
+ }
}
/*
@@ -2176,6 +2333,44 @@ PlanCacheObjectCallback(Datum arg, int cacheid, uint32 hashvalue)
}
}
}
+
+ /* Finally, invalidate any standalone cached plans */
+ dlist_foreach(iter, &standalone_plan_list)
+ {
+ CachedPlan *cplan = dlist_container(CachedPlan,
+ node, iter.cur);
+
+ Assert(cplan->magic == CACHEDPLAN_MAGIC);
+
+ if (cplan->is_valid)
+ {
+ ListCell *lc;
+
+ foreach(lc, cplan->stmt_list)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc);
+ ListCell *lc3;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ continue; /* Ignore utility statements */
+ foreach(lc3, plannedstmt->invalItems)
+ {
+ PlanInvalItem *item = (PlanInvalItem *) lfirst(lc3);
+
+ if (item->cacheId != cacheid)
+ continue;
+ if (hashvalue == 0 ||
+ item->hashValue == hashvalue)
+ {
+ cplan->is_valid = false;
+ break; /* out of invalItems scan */
+ }
+ }
+ if (!cplan->is_valid)
+ break; /* out of stmt_list scan */
+ }
+ }
+ }
}
/*
@@ -2235,6 +2430,17 @@ ResetPlanCache(void)
cexpr->is_valid = false;
}
+
+ /* Finally, invalidate any standalone cached plans */
+ dlist_foreach(iter, &standalone_plan_list)
+ {
+ CachedPlan *cplan = dlist_container(CachedPlan,
+ node, iter.cur);
+
+ Assert(cplan->magic == CACHEDPLAN_MAGIC);
+
+ cplan->is_valid = false;
+ }
}
/*
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 4a24613537..bf70fd4ce7 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -284,7 +284,8 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
- CachedPlan *cplan)
+ CachedPlan *cplan,
+ CachedPlanSource *plansource)
{
Assert(PortalIsValid(portal));
Assert(portal->status == PORTAL_NEW);
@@ -299,6 +300,7 @@ PortalDefineQuery(Portal portal,
portal->commandTag = commandTag;
portal->stmts = stmts;
portal->cplan = cplan;
+ portal->plansource = plansource;
portal->status = PORTAL_DEFINED;
}
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index bf326eeb70..652e1afbf7 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -102,6 +102,7 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ParamListInfo params, QueryEnvironment *queryEnv);
extern void ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
+ CachedPlanSource *plansource, int plan_index,
IntoClause *into, ExplainState *es,
const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index 8a5a9fe642..db21561c8c 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -258,6 +258,7 @@ extern void ExecASTruncateTriggers(EState *estate,
extern void AfterTriggerBeginXact(void);
extern void AfterTriggerBeginQuery(void);
extern void AfterTriggerEndQuery(EState *estate);
+extern void AfterTriggerAbortQuery(void);
extern void AfterTriggerFireDeferred(void);
extern void AfterTriggerEndXact(bool isCommit);
extern void AfterTriggerBeginSubXact(void);
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index 0e7245435d..f6cb6479c0 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -36,6 +36,7 @@ typedef struct QueryDesc
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
CachedPlan *cplan; /* CachedPlan that supplies the plannedstmt */
+ bool cplan_release; /* Should FreeQueryDesc() release cplan? */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 69c3ebff00..084d8d5d91 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -19,6 +19,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/plancache.h"
/*
@@ -198,6 +199,8 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
* prototypes from functions in execMain.c
*/
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern void ExecutorStartExt(QueryDesc *queryDesc, int eflags,
+ CachedPlanSource *plansource, int query_index);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count, bool execute_once);
@@ -261,6 +264,20 @@ extern void ExecEndNode(PlanState *node);
extern void ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * Is the CachedPlan in es_cachedplan still valid?
+ *
+ * Called at various points during ExecutorStart() because invalidation
+ * messages that affect the plan might be received after locks have been
+ * taken on runtime-prunable relations. The caller should take appropriate
+ * action if the plan has become invalid.
+ */
+static inline bool
+ExecPlanStillValid(EState *estate)
+{
+ return estate->es_cachedplan == NULL ? true :
+ CachedPlanValid(estate->es_cachedplan);
+}
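+
+/*
+ * A minimal usage sketch, mirroring the pattern the ExecInit* routines in
+ * this patch follow: bail out as soon as the plan is found to be invalid,
+ * returning the partially initialized PlanState for the caller to unwind.
+ *
+ *		outerPlanState(state) = ExecInitNode(outerPlan(node), estate, eflags);
+ *		if (unlikely(!ExecPlanStillValid(estate)))
+ *			return state;
+ */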
/* ----------------------------------------------------------------
* ExecProcNode
@@ -589,6 +606,7 @@ extern void ExecCreateScanSlotFromOuterPlan(EState *estate,
extern bool ExecRelationIsTargetRelation(EState *estate, Index scanrelid);
extern Relation ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags);
+extern Relation ExecOpenScanIndexRelation(EState *estate, Oid indexid, int lockmode);
extern void ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos);
extern void ExecCloseRangeTableRelations(EState *estate);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index ee089505a0..2a8e5bd784 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -680,6 +680,7 @@ typedef struct EState
int es_top_eflags; /* eflags passed to ExecutorStart */
int es_instrument; /* OR of InstrumentOption flags */
bool es_finished; /* true when ExecutorFinish is done */
+ bool es_aborted; /* true when execution was aborted */
List *es_exprcontexts; /* List of ExprContexts within EState */
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 0b5ee007ca..154f68f671 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -18,6 +18,7 @@
#include "access/tupdesc.h"
#include "lib/ilist.h"
#include "nodes/params.h"
+#include "nodes/parsenodes.h"
#include "tcop/cmdtag.h"
#include "utils/queryenvironment.h"
#include "utils/resowner.h"
@@ -152,6 +153,8 @@ typedef struct CachedPlan
bool is_generic; /* is it a reusable generic plan? */
bool is_saved; /* is CachedPlan in a long-lived context? */
bool is_valid; /* is the stmt_list currently valid? */
+ bool is_standalone; /* is it not associated with a
+ * CachedPlanSource? */
Oid planRoleId; /* Role ID the plan was created for */
bool dependsOnRole; /* is plan specific to that role? */
TransactionId saved_xmin; /* if valid, replan when TransactionXmin
@@ -159,6 +162,12 @@ typedef struct CachedPlan
int generation; /* parent's generation number for this plan */
int refcount; /* count of live references to this struct */
MemoryContext context; /* context containing this CachedPlan */
+
+ /*
+ * If the plan is not associated with a CachedPlanSource, it is saved in
+ * a separate global list.
+ */
+ dlist_node node; /* list link, if is_standalone */
} CachedPlan;
/*
@@ -224,6 +233,10 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
QueryEnvironment *queryEnv);
+extern CachedPlan *GetSingleCachedPlan(CachedPlanSource *plansource,
+ int query_index,
+ QueryEnvironment *queryEnv);
+
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
@@ -245,4 +258,17 @@ CachedPlanRequiresLocking(CachedPlan *cplan)
return cplan->is_generic;
}
+/*
+ * CachedPlanValid
+ * Returns whether a cached generic plan is still valid.
+ *
+ * Invoked by the executor to check if the plan has not been invalidated after
+ * taking locks during the initialization of the plan.
+ */
+static inline bool
+CachedPlanValid(CachedPlan *cplan)
+{
+ return cplan->is_valid;
+}
+
#endif /* PLANCACHE_H */
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index 29f49829f2..58c3828d2c 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,7 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ CachedPlanSource *plansource; /* CachedPlanSource, for cplan */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -241,7 +242,8 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
- CachedPlan *cplan);
+ CachedPlan *cplan,
+ CachedPlanSource *plansource);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
diff --git a/src/test/modules/delay_execution/Makefile b/src/test/modules/delay_execution/Makefile
index 70f24e846d..3eeb097fde 100644
--- a/src/test/modules/delay_execution/Makefile
+++ b/src/test/modules/delay_execution/Makefile
@@ -8,7 +8,8 @@ OBJS = \
delay_execution.o
ISOLATION = partition-addition \
- partition-removal-1
+ partition-removal-1 \
+ cached-plan-inval
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/src/test/modules/delay_execution/delay_execution.c b/src/test/modules/delay_execution/delay_execution.c
index 155c8a8d55..304ca77f7b 100644
--- a/src/test/modules/delay_execution/delay_execution.c
+++ b/src/test/modules/delay_execution/delay_execution.c
@@ -1,14 +1,18 @@
/*-------------------------------------------------------------------------
*
* delay_execution.c
- * Test module to allow delay between parsing and execution of a query.
+ * Test module to introduce delays at various points during the execution
+ * of a query, to check that execution proceeds safely in the face of
+ * concurrent changes.
*
* The delay is implemented by taking and immediately releasing a specified
* advisory lock. If another process has previously taken that lock, the
* current process will be blocked until the lock is released; otherwise,
* there's no effect. This allows an isolationtester script to reliably
- * test behaviors where some specified action happens in another backend
- * between parsing and execution of any desired query.
+ * test behaviors where some specified action happens in another backend in
+ * a couple of cases: 1) between parsing and execution of any desired query
+ * when using the planner_hook, 2) between RevalidateCachedQuery() and
+ * ExecutorStart() when using the ExecutorStart_hook.
*
* Copyright (c) 2020-2024, PostgreSQL Global Development Group
*
@@ -22,6 +26,7 @@
#include <limits.h>
+#include "executor/executor.h"
#include "optimizer/planner.h"
#include "utils/builtins.h"
#include "utils/guc.h"
@@ -32,9 +37,11 @@ PG_MODULE_MAGIC;
/* GUC: advisory lock ID to use. Zero disables the feature. */
static int post_planning_lock_id = 0;
+static int executor_start_lock_id = 0;
-/* Save previous planner hook user to be a good citizen */
+/* Save previous hook users to be a good citizen */
static planner_hook_type prev_planner_hook = NULL;
+static ExecutorStart_hook_type prev_ExecutorStart_hook = NULL;
/* planner_hook function to provide the desired delay */
@@ -70,11 +77,41 @@ delay_execution_planner(Query *parse, const char *query_string,
return result;
}
+/* ExecutorStart_hook function to provide the desired delay */
+static void
+delay_execution_ExecutorStart(QueryDesc *queryDesc, int eflags)
+{
+ /* If enabled, delay by taking and releasing the specified lock */
+ if (executor_start_lock_id != 0)
+ {
+ DirectFunctionCall1(pg_advisory_lock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+ DirectFunctionCall1(pg_advisory_unlock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+
+ /*
+ * Ensure that we notice any pending invalidations, since the advisory
+ * lock functions don't do this.
+ */
+ AcceptInvalidationMessages();
+ }
+
+ /* Now start the executor, possibly via a previous hook user */
+ if (prev_ExecutorStart_hook)
+ prev_ExecutorStart_hook(queryDesc, eflags);
+ else
+ standard_ExecutorStart(queryDesc, eflags);
+
+ if (executor_start_lock_id != 0)
+ elog(NOTICE, "Finished ExecutorStart(): CachedPlan is %s",
+ CachedPlanValid(queryDesc->cplan) ? "valid" : "not valid");
+}
+
/* Module load function */
void
_PG_init(void)
{
- /* Set up the GUC to control which lock is used */
+ /* Set up GUCs to control which lock is used */
DefineCustomIntVariable("delay_execution.post_planning_lock_id",
"Sets the advisory lock ID to be locked/unlocked after planning.",
"Zero disables the delay.",
@@ -86,10 +123,22 @@ _PG_init(void)
NULL,
NULL,
NULL);
-
+ DefineCustomIntVariable("delay_execution.executor_start_lock_id",
+ "Sets the advisory lock ID to be locked/unlocked before starting execution.",
+ "Zero disables the delay.",
+ &executor_start_lock_id,
+ 0,
+ 0, INT_MAX,
+ PGC_USERSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
MarkGUCPrefixReserved("delay_execution");
- /* Install our hook */
+ /* Install our hooks. */
prev_planner_hook = planner_hook;
planner_hook = delay_execution_planner;
+ prev_ExecutorStart_hook = ExecutorStart_hook;
+ ExecutorStart_hook = delay_execution_ExecutorStart;
}
diff --git a/src/test/modules/delay_execution/expected/cached-plan-inval.out b/src/test/modules/delay_execution/expected/cached-plan-inval.out
new file mode 100644
index 0000000000..e8efb6d9d9
--- /dev/null
+++ b/src/test/modules/delay_execution/expected/cached-plan-inval.out
@@ -0,0 +1,175 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1prep s2lock s1exec s2dropi s2unlock
+step s1prep: SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1 FOR UPDATE;
+ EXPLAIN (COSTS OFF) EXECUTE q (1);
+QUERY PLAN
+------------------------------------------------
+LockRows
+ -> Append
+ Subplans Removed: 2
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Recheck Cond: (a = $1)
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = $1)
+(7 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); <waiting ...>
+step s2dropi: DROP INDEX foo12_1_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+-------------------------------------
+LockRows
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on foo12_1 foo_1
+ Filter: (a = $1)
+(5 rows)
+
+
+starting permutation: s1prep2 s2lock s1exec2 s2dropi s2unlock
+step s1prep2: SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q2;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Recheck Cond: ((a = one()) OR (a = two()))
+ -> BitmapOr
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = one())
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = two())
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(11 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec2: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; <waiting ...>
+step s2dropi: DROP INDEX foo12_1_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec2: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Seq Scan on foo12_1 foo_1
+ Filter: ((a = one()) OR (a = two()))
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(6 rows)
+
+
+starting permutation: s1prep3 s2lock s1exec3 s2dropi s2unlock
+step s1prep3: SET plan_cache_mode = force_generic_plan;
+ PREPARE q3 AS UPDATE foov SET a = a WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q3;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Recheck Cond: ((a = one()) OR (a = two()))
+ -> BitmapOr
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = one())
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = two())
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+
+Update on foo
+ Update on foo12_1 foo_1
+ Update on foo12_2 foo_2
+ Update on foo3 foo
+ -> Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Recheck Cond: ((a = one()) OR (a = two()))
+ -> BitmapOr
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = one())
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = two())
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(27 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec3: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; <waiting ...>
+step s2dropi: DROP INDEX foo12_1_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec3: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Seq Scan on foo12_1 foo_1
+ Filter: ((a = one()) OR (a = two()))
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+
+Update on foo
+ Update on foo12_1 foo_1
+ Update on foo12_2 foo_2
+ Update on foo3 foo
+ -> Append
+ Subplans Removed: 1
+ -> Seq Scan on foo12_1 foo_1
+ Filter: ((a = one()) OR (a = two()))
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(17 rows)
+
diff --git a/src/test/modules/delay_execution/meson.build b/src/test/modules/delay_execution/meson.build
index 41f3ac0b89..5a70b183d0 100644
--- a/src/test/modules/delay_execution/meson.build
+++ b/src/test/modules/delay_execution/meson.build
@@ -24,6 +24,7 @@ tests += {
'specs': [
'partition-addition',
'partition-removal-1',
+ 'cached-plan-inval',
],
},
}
diff --git a/src/test/modules/delay_execution/specs/cached-plan-inval.spec b/src/test/modules/delay_execution/specs/cached-plan-inval.spec
new file mode 100644
index 0000000000..5b1f72b4a8
--- /dev/null
+++ b/src/test/modules/delay_execution/specs/cached-plan-inval.spec
@@ -0,0 +1,65 @@
+# Test to check that invalidation of cached generic plans during ExecutorStart
+# correctly triggers replanning and re-execution.
+
+setup
+{
+ CREATE TABLE foo (a int, b text) PARTITION BY LIST(a);
+ CREATE TABLE foo12 PARTITION OF foo FOR VALUES IN (1, 2) PARTITION BY LIST (a);
+ CREATE TABLE foo12_1 PARTITION OF foo12 FOR VALUES IN (1);
+ CREATE TABLE foo12_2 PARTITION OF foo12 FOR VALUES IN (2);
+ CREATE INDEX foo12_1_a ON foo12_1 (a);
+ CREATE TABLE foo3 PARTITION OF foo FOR VALUES IN (3);
+ CREATE VIEW foov AS SELECT * FROM foo;
+ CREATE FUNCTION one () RETURNS int AS $$ BEGIN RETURN 1; END; $$ LANGUAGE PLPGSQL STABLE;
+ CREATE FUNCTION two () RETURNS int AS $$ BEGIN RETURN 2; END; $$ LANGUAGE PLPGSQL STABLE;
+ CREATE RULE update_foo AS ON UPDATE TO foo DO ALSO SELECT 1;
+}
+
+teardown
+{
+ DROP VIEW foov;
+ DROP RULE update_foo ON foo;
+ DROP TABLE foo;
+ DROP FUNCTION one(), two();
+}
+
+session "s1"
+# Append with run-time pruning
+step "s1prep" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1 FOR UPDATE;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+
+# Another case with Append with run-time pruning
+step "s1prep2" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+
+# Case with a rule adding another query
+step "s1prep3" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q3 AS UPDATE foov SET a = a WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+
+# Executes a generic plan
+step "s1exec" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+step "s1exec2" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+step "s1exec3" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+
+session "s2"
+step "s2lock" { SELECT pg_advisory_lock(12345); }
+step "s2unlock" { SELECT pg_advisory_unlock(12345); }
+step "s2dropi" { DROP INDEX foo12_1_a; }
+
+# While "s1exec", etc. wait to acquire the advisory lock, "s2drop" is able to
+# drop the index being used in the cached plan. When "s1exec" is then
+# unblocked and initializes the cached plan for execution, it detects the
+# concurrent index drop and causes the cached plan to be discarded and
+# recreated without the index.
+permutation "s1prep" "s2lock" "s1exec" "s2dropi" "s2unlock"
+permutation "s1prep2" "s2lock" "s1exec2" "s2dropi" "s2unlock"
+permutation "s1prep3" "s2lock" "s1exec3" "s2dropi" "s2unlock"
--
2.43.0
On Thu, Aug 29, 2024 at 10:34 PM Amit Langote <amitlangote09@gmail.com> wrote:
One idea that I think might be worth trying to reduce the footprint of
0003 is to try to lock the prunable relations in a step of InitPlan()
separate from ExecInitNode(), which can be implemented by doing the
initial runtime pruning in that separate step. That way, we'll have
all the necessary locks before calling ExecInitNode() and so we don't
need to sprinkle the CachedPlanStillValid() checks all over the place
and worry about missed checks and dealing with partially initialized
PlanState trees.
I've worked on this and found that it results in a much simpler design.
Attached are patches 0001 and 0002, which refactor the runtime pruning
code. These changes move initial pruning outside of
ExecInitNode() and use the results during ExecInitNode() to determine
the set of child subnodes to initialize.
With that in place, the patches (0003, 0004) that move the locking of
prunable relations from plancache.c into the executor become simpler.
They no longer need to modify any code called by ExecInitNode(). Since
no new locks are taken during ExecInitNode(), I didn't have to worry
about changing all the code involved in PlanState tree initialization
to add checks for CachedPlan validity. The check is only needed after
performing initial pruning, and if the CachedPlan is invalid,
ExecInitNode() won’t be called in the first place.
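To make the control flow concrete, here is a toy, self-contained C model
of the retry loop that 0004's ExecutorStartExt() implements. The types
and the injected failure are stand-ins rather than PostgreSQL code; only
the shape of the loop matches the patch:

#include <stdbool.h>
#include <stdio.h>

/* Stand-in for CachedPlan; only the validity flag matters here. */
typedef struct ToyCachedPlan
{
	bool		is_valid;
} ToyCachedPlan;

/*
 * Stand-in for ExecutorStart(): taking the deferred locks during initial
 * pruning may reveal that concurrent DDL invalidated the plan.  The model
 * simply fails the first attempt to simulate that.
 */
static void
toy_executor_start(ToyCachedPlan *cplan, int attempt)
{
	cplan->is_valid = (attempt > 0);
}

int
main(void)
{
	ToyCachedPlan plan = {.is_valid = true};
	int			attempt = 0;

	for (;;)
	{
		toy_executor_start(&plan, attempt);
		if (plan.is_valid)
			break;				/* startup succeeded */

		/*
		 * The plan was invalidated while locking surviving partitions.
		 * The real code tears down the aborted EState and calls
		 * GetSingleCachedPlan() to build a replacement; the model just
		 * loops and retries.
		 */
		printf("attempt %d: plan invalidated, replanning\n", attempt);
		attempt++;
	}
	printf("executor started after %d attempt(s)\n", attempt + 1);
	return 0;
}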
--
Thanks, Amit Langote
Attachments:
v53-0001-Move-PartitionPruneInfo-out-of-plan-nodes-into-P.patch
From fe2eaf3a8047ce63318818f157e7d85754e38cc9 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Fri, 6 Sep 2024 13:11:05 +0900
Subject: [PATCH v53 1/4] Move PartitionPruneInfo out of plan nodes into
PlannedStmt
This change moves PartitionPruneInfo from individual plan nodes to
PlannedStmt, allowing runtime initial pruning to be performed across
the entire plan tree without traversing the tree to find nodes
containing PartitionPruneInfos.
The PartitionPruneInfo pointer fields in Append and MergeAppend nodes
have been replaced with an integer index that points to
PartitionPruneInfos in a list within PlannedStmt, which holds the
PartitionPruneInfos for all subqueries.
Reviewed-by: Alvaro Herrera
---
src/backend/executor/execMain.c | 1 +
src/backend/executor/execParallel.c | 1 +
src/backend/executor/execPartition.c | 19 +++++-
src/backend/executor/execUtils.c | 1 +
src/backend/executor/nodeAppend.c | 5 +-
src/backend/executor/nodeMergeAppend.c | 5 +-
src/backend/optimizer/plan/createplan.c | 24 +++----
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 86 ++++++++++++++++---------
src/backend/partitioning/partprune.c | 19 ++++--
src/include/executor/execPartition.h | 4 +-
src/include/nodes/execnodes.h | 1 +
src/include/nodes/pathnodes.h | 6 ++
src/include/nodes/plannodes.h | 14 ++--
src/include/partitioning/partprune.h | 8 +--
15 files changed, 133 insertions(+), 62 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 29e186fa73..8837d77c3e 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -848,6 +848,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
estate->es_plannedstmt = plannedstmt;
+ estate->es_part_prune_infos = plannedstmt->partPruneInfos;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index bfb3419efb..b01a2fdfdd 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -181,6 +181,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
pstmt->planTree = plan;
+ pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
pstmt->permInfos = estate->es_rteperminfos;
pstmt->resultRelations = NIL;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 7651886229..ec730674f2 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1786,6 +1786,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* Initialize data structure needed for run-time partition pruning and
* do initial pruning if needed
*
+ * 'root_parent_relids' identifies the relation to which both the parent plan
+ * and the PartitionPruneInfo given by 'part_prune_index' belong.
+ *
* On return, *initially_valid_subplans is assigned the set of indexes of
* child subplans that must be initialized along with the parent plan node.
* Initial pruning is performed here if needed and in that case only the
@@ -1798,11 +1801,25 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
PartitionPruneState *
ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
+ Bitmapset *root_parent_relids,
Bitmapset **initially_valid_subplans)
{
PartitionPruneState *prunestate;
EState *estate = planstate->state;
+ PartitionPruneInfo *pruneinfo;
+
+ /* Obtain the pruneinfo we need, and make sure it's the right one */
+ pruneinfo = list_nth_node(PartitionPruneInfo, estate->es_part_prune_infos,
+ part_prune_index);
+ if (!bms_equal(root_parent_relids, pruneinfo->root_parent_relids))
+ ereport(ERROR,
+ errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg_internal("mismatching PartitionPruneInfo found at part_prune_index %d",
+ part_prune_index),
+ errdetail_internal("plan node relids %s, pruneinfo relids %s",
+ bmsToString(root_parent_relids),
+ bmsToString(pruneinfo->root_parent_relids)));
/* We may need an expression context to evaluate partition exprs */
ExecAssignExprContext(estate, planstate);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 5737f9f4eb..67734979b0 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -118,6 +118,7 @@ CreateExecutorState(void)
estate->es_rowmarks = NULL;
estate->es_rteperminfos = NIL;
estate->es_plannedstmt = NULL;
+ estate->es_part_prune_infos = NIL;
estate->es_junkFilter = NULL;
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index ca0f54d676..de7ebab5c2 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -134,7 +134,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_begun = false;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -145,7 +145,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&appendstate->ps,
list_length(node->appendplans),
- node->part_prune_info,
+ node->part_prune_index,
+ node->apprelids,
&validsubplans);
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index e1b9b984a7..3ed91808dd 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -82,7 +82,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.ExecProcNode = ExecMergeAppend;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -93,7 +93,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&mergestate->ps,
list_length(node->mergeplans),
- node->part_prune_info,
+ node->part_prune_index,
+ node->apprelids,
&validsubplans);
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index bb45ef318f..6642d09a39 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1225,7 +1225,6 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
ListCell *subpaths;
int nasyncplans = 0;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
int nodenumsortkeys = 0;
AttrNumber *nodeSortColIdx = NULL;
Oid *nodeSortOperators = NULL;
@@ -1376,6 +1375,9 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ plan->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1399,16 +1401,14 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
}
if (prunequal != NIL)
- partpruneinfo =
- make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ plan->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
plan->appendplans = subplans;
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
- plan->part_prune_info = partpruneinfo;
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -1447,7 +1447,6 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
List *subplans = NIL;
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1540,6 +1539,9 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ node->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1555,13 +1557,13 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
Assert(best_path->path.param_info == NULL);
if (prunequal != NIL)
- partpruneinfo = make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ node->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
node->mergeplans = subplans;
- node->part_prune_info = partpruneinfo;
+
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index df35d1ff9c..1b9071c774 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -547,6 +547,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->dependsOnRole = glob->dependsOnRole;
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
+ result->partPruneInfos = glob->partPruneInfos;
result->rtable = glob->finalrtable;
result->permInfos = glob->finalrteperminfos;
result->resultRelations = glob->resultRelations;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 91c7c4fe2f..e2ea406c4e 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1732,6 +1732,48 @@ set_customscan_references(PlannerInfo *root,
cscan->custom_relids = offset_relid_set(cscan->custom_relids, rtoffset);
}
+/*
+ * register_partpruneinfo
+ * Subroutine for set_append_references and set_mergeappend_references
+ *
+ * Add the PartitionPruneInfo from root->partPruneInfos at the given index
+ * into PlannerGlobal->partPruneInfos and return its index there.
+ *
+ * Also update the RT indexes present in PartitionedRelPruneInfos to add the
+ * offset.
+ */
+static int
+register_partpruneinfo(PlannerInfo *root, int part_prune_index, int rtoffset)
+{
+ PlannerGlobal *glob = root->glob;
+ PartitionPruneInfo *pinfo;
+ ListCell *l;
+
+ Assert(part_prune_index >= 0 &&
+ part_prune_index < list_length(root->partPruneInfos));
+ pinfo = list_nth_node(PartitionPruneInfo, root->partPruneInfos,
+ part_prune_index);
+
+ pinfo->root_parent_relids = offset_relid_set(pinfo->root_parent_relids,
+ rtoffset);
+ foreach(l, pinfo->prune_infos)
+ {
+ List *prune_infos = lfirst(l);
+ ListCell *l2;
+
+ foreach(l2, prune_infos)
+ {
+ PartitionedRelPruneInfo *prelinfo = lfirst(l2);
+
+ prelinfo->rtindex += rtoffset;
+ }
+ }
+
+ glob->partPruneInfos = lappend(glob->partPruneInfos, pinfo);
+
+ return list_length(glob->partPruneInfos) - 1;
+}
+
/*
* set_append_references
* Do set_plan_references processing on an Append
@@ -1784,21 +1826,13 @@ set_append_references(PlannerInfo *root,
aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
- if (aplan->part_prune_info)
- {
- foreach(l, aplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * Add PartitionPruneInfo, if any, to PlannerGlobal and update the index.
+ * Also update the RT indexes present in it to add the offset.
+ */
+ if (aplan->part_prune_index >= 0)
+ aplan->part_prune_index =
+ register_partpruneinfo(root, aplan->part_prune_index, rtoffset);
/* We don't need to recurse to lefttree or righttree ... */
Assert(aplan->plan.lefttree == NULL);
@@ -1860,21 +1894,13 @@ set_mergeappend_references(PlannerInfo *root,
mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
- if (mplan->part_prune_info)
- {
- foreach(l, mplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * Add PartitionPruneInfo, if any, to PlannerGlobal and update the index.
+ * Also update the RT indexes present in it to add the offset.
+ */
+ if (mplan->part_prune_index >= 0)
+ mplan->part_prune_index =
+ register_partpruneinfo(root, mplan->part_prune_index, rtoffset);
/* We don't need to recurse to lefttree or righttree ... */
Assert(mplan->plan.lefttree == NULL);
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 9a1a7faac7..60fabb1734 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -207,16 +207,20 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
/*
* make_partition_pruneinfo
- * Builds a PartitionPruneInfo which can be used in the executor to allow
- * additional partition pruning to take place. Returns NULL when
- * partition pruning would be useless.
+ * Checks if the given set of quals can be used to build pruning steps
+ * that the executor can use to prune away unneeded partitions. If
+ * suitable quals are found then a PartitionPruneInfo is built and tagged
+ * onto the PlannerInfo's partPruneInfos list.
+ *
+ * The return value is the 0-based index of the item added to the
+ * partPruneInfos list or -1 if nothing was added.
*
* 'parentrel' is the RelOptInfo for an appendrel, and 'subpaths' is the list
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
*/
-PartitionPruneInfo *
+int
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
List *prunequal)
@@ -330,10 +334,11 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* quals, then we can just not bother with run-time pruning.
*/
if (prunerelinfos == NIL)
- return NULL;
+ return -1;
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
+ pruneinfo->root_parent_relids = parentrel->relids;
pruneinfo->prune_infos = prunerelinfos;
/*
@@ -356,7 +361,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
else
pruneinfo->other_subplans = NULL;
- return pruneinfo;
+ root->partPruneInfos = lappend(root->partPruneInfos, pruneinfo);
+
+ return list_length(root->partPruneInfos) - 1;
}
/*
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index c09bc83b2a..12aacc84ff 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -123,9 +123,9 @@ typedef struct PartitionPruneState
extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
+ Bitmapset *root_parent_relids,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
bool initial_prune);
-
#endif /* EXECPARTITION_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 516b948743..49f1d56a5d 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -635,6 +635,7 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 07e2415398..8d30b6e896 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -128,6 +128,9 @@ typedef struct PlannerGlobal
/* "flat" list of AppendRelInfos */
List *appendRelations;
+ /* List of PartitionPruneInfo contained in the plan */
+ List *partPruneInfos;
+
/* OIDs of relations the plan depends on */
List *relationOids;
@@ -559,6 +562,9 @@ struct PlannerInfo
/* Does this query modify any partition key columns? */
bool partColsUpdated;
+
+ /* PartitionPruneInfos added in this query's plan. */
+ List *partPruneInfos;
};
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 62cd6a6666..39d0281c23 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -69,6 +69,9 @@ typedef struct PlannedStmt
struct Plan *planTree; /* tree of Plan nodes */
+ List *partPruneInfos; /* List of PartitionPruneInfo contained in the
+ * plan */
+
List *rtable; /* list of RangeTblEntry nodes */
List *permInfos; /* list of RTEPermissionInfo nodes for rtable
@@ -276,8 +279,8 @@ typedef struct Append
*/
int first_partial_plan;
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index into PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} Append;
/* ----------------
@@ -311,8 +314,8 @@ typedef struct MergeAppend
/* NULLS FIRST/LAST directions */
bool *nullsFirst pg_node_attr(array_size(numCols));
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index into PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} MergeAppend;
/* ----------------
@@ -1414,6 +1417,8 @@ typedef struct PlanRowMark
* Then, since an Append-type node could have multiple partitioning
* hierarchies among its children, we have an unordered List of those Lists.
*
+ * root_parent_relids RelOptInfo.relids of the relation to which the parent
+ * plan node and this PartitionPruneInfo node belong
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
@@ -1426,6 +1431,7 @@ typedef struct PartitionPruneInfo
pg_node_attr(no_equal, no_query_jumble)
NodeTag type;
+ Bitmapset *root_parent_relids;
List *prune_infos;
Bitmapset *other_subplans;
} PartitionPruneInfo;
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index bd490d154f..c536a1fe19 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -70,10 +70,10 @@ typedef struct PartitionPruneContext
#define PruneCxtStateIdx(partnatts, step_id, keyno) \
((partnatts) * (step_id) + (keyno))
-extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
- struct RelOptInfo *parentrel,
- List *subpaths,
- List *prunequal);
+extern int make_partition_pruneinfo(struct PlannerInfo *root,
+ struct RelOptInfo *parentrel,
+ List *subpaths,
+ List *prunequal);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
--
2.43.0
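To illustrate the structural change in 0001 outside the patch itself:
plan nodes stop carrying a PartitionPruneInfo pointer and instead carry
an index into one flat list kept at the top of the plan. A toy,
compilable sketch with stand-in types (not the real nodes):

#include <stdio.h>

/* Stand-in for PartitionPruneInfo. */
typedef struct ToyPruneInfo
{
	const char *descr;
} ToyPruneInfo;

/* Stand-in for Append/MergeAppend: an index replaces the pointer. */
typedef struct ToyAppend
{
	int			part_prune_index;	/* -1 if no run-time pruning */
} ToyAppend;

int
main(void)
{
	/* Stand-in for PlannedStmt.partPruneInfos: one flat list. */
	ToyPruneInfo part_prune_infos[] = {
		{"pruning steps for appendrel 1"},
		{"pruning steps for appendrel 2"},
	};
	ToyAppend	a1 = {.part_prune_index = 0};
	ToyAppend	a2 = {.part_prune_index = -1};

	/*
	 * With everything in one list, initial pruning can be driven by a
	 * single walk of the list, with no plan tree traversal needed to
	 * find the nodes that own the prune infos.
	 */
	for (int i = 0; i < 2; i++)
		printf("prune info %d: %s\n", i, part_prune_infos[i].descr);

	if (a1.part_prune_index >= 0)
		printf("a1 uses prune info %d\n", a1.part_prune_index);
	if (a2.part_prune_index < 0)
		printf("a2 does no run-time pruning\n");
	return 0;
}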
v53-0002-Perform-runtime-initial-pruning-outside-ExecInit.patch
From 6e63bee3e1f306e7c618d3256c1f94780d325ce6 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Thu, 12 Sep 2024 15:44:43 +0900
Subject: [PATCH v53 2/4] Perform runtime initial pruning outside
ExecInitNode()
This commit follows up on the previous change that moved
PartitionPruneInfos out of individual plan nodes into a list in
PlannedStmt. It moves the initialization of PartitionPruneStates
and runtime initial pruning out of ExecInitNode() and into a new
routine, ExecDoInitialPruning(), which is called by InitPlan()
before ExecInitNode() is invoked on the main plan tree and subplans.
ExecDoInitialPruning() stores the PartitionPruneStates in a list
matching the length of es_part_prune_infos (which holds the
PartitionPruneInfos from PlannedStmt), allowing both lists to share
the same index. It also saves the initial pruning result -- a
bitmapset of indexes for surviving child subnodes -- in a similarly
indexed list.
While the initial pruning is done earlier, the execution pruning
context information (needed for runtime pruning) is initialized
later during ExecInitNode() for the parent plan node, as it requires
access to the parent node's PlanState struct.
---
src/backend/executor/execMain.c | 55 ++++++++++
src/backend/executor/execPartition.c | 146 +++++++++++++++++++++------
src/include/executor/execPartition.h | 3 +
src/include/nodes/execnodes.h | 2 +
4 files changed, 176 insertions(+), 30 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 8837d77c3e..dceef322af 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -46,6 +46,7 @@
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/executor.h"
+#include "executor/execPartition.h"
#include "executor/nodeSubplan.h"
#include "foreign/fdwapi.h"
#include "mb/pg_wchar.h"
@@ -816,6 +817,54 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
PreventCommandIfParallelMode(CreateCommandName((Node *) plannedstmt));
}
+/*
+ * ExecDoInitialPruning
+ * Perform runtime "initial" pruning, if necessary, to determine the set
+ * of child subnodes that need to be initialized during ExecInitNode()
+ * for plan nodes that support partition pruning.
+ *
+ * For each PartitionPruneInfo in estate->es_part_prune_infos, this function
+ * creates a PartitionPruneState (even if no initial pruning is done) and adds
+ * it to es_part_prune_states. For PartitionPruneInfo entries that include
+ * initial pruning steps, the result of those steps is saved as a bitmapset
+ * of indexes representing child subnodes that are "valid" and should be
+ * initialized for execution.
+ */
+static void
+ExecDoInitialPruning(EState *estate)
+{
+ ListCell *lc;
+
+ foreach(lc, estate->es_part_prune_infos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
+ PartitionPruneState *prunestate;
+ Bitmapset *validsubplans = NULL;
+
+ /*
+ * Create the working data structure for pruning, and save it for use
+ * later in ExecInitPartitionPruning(), which will be called by the
+ * parent plan node's ExecInit* function.
+ */
+ prunestate = CreatePartitionPruneState(estate, pruneinfo);
+ estate->es_part_prune_states = lappend(estate->es_part_prune_states,
+ prunestate);
+
+ /*
+ * Perform an initial partition pruning pass, if necessary, and save
+ * the bitmapset of valid subplans for use in
+ * ExecInitPartitionPruning(). If no initial pruning is performed, we
+ * still store a NULL so that es_part_prune_results stays the same
+ * length as es_part_prune_infos, letting ExecInitPartitionPruning()
+ * use the same index to locate the result.
+ */
+ if (prunestate->do_initial_prune)
+ validsubplans = ExecFindMatchingSubPlans(prunestate, true);
+ estate->es_part_prune_results = lappend(estate->es_part_prune_results,
+ validsubplans);
+ }
+}
/* ----------------------------------------------------------------
* InitPlan
@@ -848,7 +897,13 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
estate->es_plannedstmt = plannedstmt;
+
+ /*
+ * Perform runtime "initial" pruning to determine the plan nodes that will
+ * not be executed.
+ */
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ ExecDoInitialPruning(estate);
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index ec730674f2..08b1f3d030 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -181,8 +181,6 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
int maxfieldlen);
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
-static PartitionPruneState *CreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *pruneinfo);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -192,6 +190,9 @@ static void InitPartitionPruneContext(PartitionPruneContext *context,
static void PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
Bitmapset *initially_valid_subplans,
int n_total_subplans);
+static void PartitionPruneInitExecPruning(PartitionPruneInfo *pruneinfo,
+ PartitionPruneState *prunestate,
+ PlanState *planstate);
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
@@ -1821,17 +1822,21 @@ ExecInitPartitionPruning(PlanState *planstate,
bmsToString(root_parent_relids),
bmsToString(pruneinfo->root_parent_relids)));
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, planstate);
-
- /* Create the working data structure for pruning */
- prunestate = CreatePartitionPruneState(planstate, pruneinfo);
-
/*
- * Perform an initial partition prune pass, if required.
+ * ExecDoInitialPruning() must have initialized the PartitionPruneState to
+ * perform the initial pruning. Now we simply need to initialize the
+ * context information for exec pruning.
*/
+ prunestate = list_nth(estate->es_part_prune_states, part_prune_index);
+ Assert(prunestate != NULL);
+ if (prunestate->do_exec_prune)
+ PartitionPruneInitExecPruning(pruneinfo, prunestate, planstate);
+
+ /* Use the result of initial pruning done by ExecDoInitialPruning(). */
if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ *initially_valid_subplans = list_nth_node(Bitmapset,
+ estate->es_part_prune_results,
+ part_prune_index);
else
{
/* No pruning, so we'll need to initialize all subplans */
@@ -1878,15 +1883,15 @@ ExecInitPartitionPruning(PlanState *planstate,
* re-evaluate which partitions match the pruning steps provided in each
* PartitionedRelPruneInfo.
*/
-static PartitionPruneState *
-CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
+PartitionPruneState *
+CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
{
- EState *estate = planstate->state;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
+ /* We may need an expression context to evaluate partition exprs */
+ ExprContext *econtext = CreateExprContext(estate);
/* For data reading, executor always includes detached partitions */
if (estate->es_partition_directory == NULL)
@@ -1974,6 +1979,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* set to -1, as if they were pruned. By construction, both
* arrays are in partition bounds order.
*/
+ pprune->partrel = partrel;
pprune->nparts = partdesc->nparts;
pprune->subplan_map = palloc(sizeof(int) * partdesc->nparts);
@@ -2073,29 +2079,31 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
{
InitPartitionPruneContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
- partdesc, partkey, planstate,
+ partdesc, partkey, NULL,
econtext);
/* Record whether initial pruning is needed at any level */
prunestate->do_initial_prune = true;
}
- pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps &&
- !(econtext->ecxt_estate->es_top_eflags & EXEC_FLAG_EXPLAIN_GENERIC))
- {
- InitPartitionPruneContext(&pprune->exec_context,
- pinfo->exec_pruning_steps,
- partdesc, partkey, planstate,
- econtext);
- /* Record whether exec pruning is needed at any level */
- prunestate->do_exec_prune = true;
- }
/*
- * Accumulate the IDs of all PARAM_EXEC Params affecting the
- * partitioning decisions at this plan node.
+ * The exec pruning context will be initialized in
+ * ExecInitPartitionPruning() when called during the initialization
+ * of the parent plan node.
+ *
+ * pprune->exec_pruning_steps is set to NIL to prevent
+ * ExecFindMatchingSubPlans() from accessing an uninitialized
+ * pprune->exec_context during the initial pruning by
+ * ExecDoInitialPruning().
+ *
+ * prunestate->do_exec_prune is set to indicate whether
+ * PartitionPruneInitExecPruning() needs to be called by
+ * ExecInitPartitionPruning(). This optimization avoids
+ * unnecessary cycles when only initial pruning is required.
*/
- prunestate->execparamids = bms_add_members(prunestate->execparamids,
- pinfo->execparamids);
+ pprune->exec_pruning_steps = NIL;
+ if (pinfo->exec_pruning_steps &&
+ !(econtext->ecxt_estate->es_top_eflags & EXEC_FLAG_EXPLAIN_GENERIC))
+ prunestate->do_exec_prune = true;
j++;
}
@@ -2305,6 +2313,84 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
pfree(new_subplan_indexes);
}
+/*
+ * PartitionPruneInitExecPruning
+ * Initialize PartitionPruneState for exec pruning.
+ */
+static void
+PartitionPruneInitExecPruning(PartitionPruneInfo *pruneinfo,
+ PartitionPruneState *prunestate,
+ PlanState *planstate)
+{
+ EState *estate = planstate->state;
+ int i;
+ ExprContext *econtext;
+
+ /* CreatePartitionPruneState() must have initialized the partition directory. */
+ Assert(estate->es_partition_directory != NULL);
+
+ /* CreatePartitionPruneState() must have set this. */
+ Assert(prunestate->do_exec_prune);
+
+ /*
+ * Create ExprContext if not already done for the planstate. We may need
+ * an expression context to evaluate partition exprs.
+ */
+ ExecAssignExprContext(estate, planstate);
+ econtext = planstate->ps_ExprContext;
+ for (i = 0; i < prunestate->num_partprunedata; i++)
+ {
+ List *partrel_pruneinfos =
+ list_nth_node(List, pruneinfo->prune_infos, i);
+ PartitionPruningData *prunedata = prunestate->partprunedata[i];
+ int j;
+
+ for (j = 0; j < prunedata->num_partrelprunedata; j++)
+ {
+ PartitionedRelPruneInfo *pinfo =
+ list_nth_node(PartitionedRelPruneInfo, partrel_pruneinfos, j);
+ PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
+ Relation partrel = pprune->partrel;
+ PartitionDesc partdesc;
+ PartitionKey partkey;
+
+ /*
+ * Nothing to do if there are no exec pruning steps, but do set
+ * pprune->exec_pruning_steps, because
+ * find_matching_subplans_recurse() looks at it.
+ *
+ * Also skip if doing EXPLAIN (GENERIC_PLAN), since parameter
+ * values may be missing.
+ */
+ pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
+ if (pprune->exec_pruning_steps == NIL ||
+ (econtext->ecxt_estate->es_top_eflags & EXEC_FLAG_EXPLAIN_GENERIC))
+ continue;
+
+ /*
+ * We can rely on the copies of the partitioned table's partition
+ * key and partition descriptor appearing in its relcache entry,
+ * because that entry will be held open and locked for the
+ * duration of this executor run.
+ */
+ partkey = RelationGetPartitionKey(partrel);
+ partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
+ partrel);
+ InitPartitionPruneContext(&pprune->exec_context,
+ pprune->exec_pruning_steps,
+ partdesc, partkey, planstate,
+ econtext);
+
+ /*
+ * Accumulate the IDs of all PARAM_EXEC Params affecting the
+ * partitioning decisions at this plan node.
+ */
+ prunestate->execparamids = bms_add_members(prunestate->execparamids,
+ pinfo->execparamids);
+ }
+ }
+}
+
/*
* ExecFindMatchingSubPlans
* Determine which subplans match the pruning steps detailed in
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 12aacc84ff..dc73de8738 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -58,6 +58,7 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
*/
typedef struct PartitionedRelPruningData
{
+ Relation partrel;
int nparts;
int *subplan_map;
int *subpart_map;
@@ -128,4 +129,6 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
bool initial_prune);
+extern PartitionPruneState *CreatePartitionPruneState(EState *estate,
+ PartitionPruneInfo *pruneinfo);
#endif /* EXECPARTITION_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 49f1d56a5d..daf04dcf5c 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -636,6 +636,8 @@ typedef struct EState
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
+ List *es_part_prune_states; /* List of PartitionPruneState */
+ List *es_part_prune_results; /* List of Bitmapset */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
--
2.43.0
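Similarly, the key invariant in 0002 is that EState carries three lists
of equal length, so one index locates a PartitionPruneInfo, its
PartitionPruneState, and the saved initial-pruning result, with NULL
meaning "no initial pruning was done, initialize all subplans". A toy,
self-contained model of that indexing scheme (strings stand in for the
real structs and bitmapsets):

#include <stdio.h>
#include <stddef.h>

#define NPRUNE 2

int
main(void)
{
	/*
	 * Stand-ins for es_part_prune_infos, es_part_prune_states and
	 * es_part_prune_results, filled in lockstep by the equivalent of
	 * ExecDoInitialPruning().
	 */
	const char *infos[NPRUNE] = {"prune info 0", "prune info 1"};
	const char *states[NPRUNE] = {"prune state 0", "prune state 1"};
	const char *results[NPRUNE] = {"{0, 2}", NULL};

	for (int i = 0; i < NPRUNE; i++)
	{
		/*
		 * The equivalent of ExecInitPartitionPruning() later fetches
		 * all three entries with the same part_prune_index.
		 */
		printf("index %d: %s / %s / valid subplans: %s\n",
			   i, infos[i], states[i],
			   results[i] ? results[i] : "(all; no initial pruning)");
	}
	return 0;
}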
v53-0004-Handle-CachedPlan-invalidation-in-the-executor.patch
From 10d54077987f8532c1d2a04d6004c1ec03450845 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Thu, 22 Aug 2024 19:38:13 +0900
Subject: [PATCH v53 4/4] Handle CachedPlan invalidation in the executor
This commit makes changes to handle cases where a cached plan
becomes invalid before deferred locks on prunable relations are taken.
* Add checks at various points in ExecutorStart() and the functions it
calls to determine whether the plan has become invalid. If so,
the function and its callers return immediately. A previous commit
ensures any partially initialized PlanState tree objects are cleaned
up appropriately.
* Introduce ExecutorStartExt(), a wrapper over ExecutorStart(), to
handle cases where plan initialization is aborted due to invalidation.
ExecutorStartExt() creates a new transient CachedPlan if needed and
retries execution. This new entry point is only required for sites
using plancache.c. It requires passing the QueryDesc, eflags,
CachedPlanSource, and query_index (index in CachedPlanSource.query_list).
* Add GetSingleCachedPlan() in plancache.c to create a transient
CachedPlan for a specified query in the given CachedPlanSource.
Such CachedPlans are tracked in a separate global list for the
plancache invalidation callbacks to check.
This also adds isolation tests using the delay_execution test module
to verify scenarios where a CachedPlan becomes invalid before the
deferred locks are taken.
All ExecutorStart_hook implementations now must add the following
block after the ExecutorStart() call to ensure it doesn't work with an
invalid plan:
/* The plan may have become invalid during ExecutorStart() */
if (!ExecPlanStillValid(queryDesc->estate))
return;
Reviewed-by: Robert Haas
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
contrib/auto_explain/auto_explain.c | 4 +
.../pg_stat_statements/pg_stat_statements.c | 4 +
src/backend/commands/explain.c | 8 +-
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 10 +-
src/backend/commands/trigger.c | 14 ++
src/backend/executor/README | 35 ++-
src/backend/executor/execMain.c | 84 ++++++-
src/backend/executor/execUtils.c | 3 +-
src/backend/executor/spi.c | 19 +-
src/backend/tcop/postgres.c | 4 +-
src/backend/tcop/pquery.c | 31 ++-
src/backend/utils/cache/plancache.c | 206 ++++++++++++++++++
src/backend/utils/mmgr/portalmem.c | 4 +-
src/include/commands/explain.h | 1 +
src/include/commands/trigger.h | 1 +
src/include/executor/execdesc.h | 1 +
src/include/executor/executor.h | 17 ++
src/include/nodes/execnodes.h | 1 +
src/include/utils/plancache.h | 26 +++
src/include/utils/portal.h | 4 +-
src/test/modules/delay_execution/Makefile | 3 +-
.../modules/delay_execution/delay_execution.c | 63 +++++-
.../expected/cached-plan-inval.out | 175 +++++++++++++++
src/test/modules/delay_execution/meson.build | 1 +
.../specs/cached-plan-inval.spec | 65 ++++++
26 files changed, 749 insertions(+), 36 deletions(-)
create mode 100644 src/test/modules/delay_execution/expected/cached-plan-inval.out
create mode 100644 src/test/modules/delay_execution/specs/cached-plan-inval.spec
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index 677c135f59..9eb5e9a619 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -300,6 +300,10 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
else
standard_ExecutorStart(queryDesc, eflags);
+ /* The plan may have become invalid during standard_ExecutorStart() */
+ if (!ExecPlanStillValid(queryDesc->estate))
+ return;
+
if (auto_explain_enabled())
{
/*
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 362d222f63..026a3f1362 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -992,6 +992,10 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
else
standard_ExecutorStart(queryDesc, eflags);
+ /* The plan may have become invalid during standard_ExecutorStart() */
+ if (!ExecPlanStillValid(queryDesc->estate))
+ return;
+
/*
* If query has queryId zero, don't track it. This prevents double
* counting of optimizable statements that are directly contained in
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 13f5683cf6..ecae32c32b 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -507,7 +507,8 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, NULL, -1, into, es, queryString, params,
+ queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
}
@@ -616,6 +617,7 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
*/
void
ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
+ CachedPlanSource *plansource, int query_index,
IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
@@ -686,8 +688,8 @@ ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
if (into)
eflags |= GetIntoRelEFlags(into);
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, eflags);
+ /* Call ExecutorStartExt to prepare the plan for execution. */
+ ExecutorStartExt(queryDesc, eflags, plansource, query_index);
/* Execute the plan for statistics if asked for */
if (es->analyze)
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 4f6acf6719..4b1503c05e 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -107,6 +107,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ NULL,
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 311b9ebd5b..4cd79a6e3a 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -202,7 +202,8 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
- cplan);
+ cplan,
+ entry->plansource);
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
@@ -583,6 +584,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
MemoryContextCounters mem_counters;
MemoryContext planner_ctx = NULL;
MemoryContext saved_ctx = NULL;
+ int i = 0;
if (es->memory)
{
@@ -655,8 +657,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, cplan, into, es, query_string, paramLI,
- queryEnv,
+ ExplainOnePlan(pstmt, cplan, entry->plansource, i,
+ into, es, query_string, paramLI, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
else
@@ -668,6 +670,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Separate plans with an appropriate separator */
if (lnext(plan_list, p) != NULL)
ExplainSeparatePlans(es);
+
+ i++;
}
if (estate)
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 170360edda..91e4b821a0 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -5119,6 +5119,20 @@ AfterTriggerEndQuery(EState *estate)
afterTriggers.query_depth--;
}
+/* ----------
+ * AfterTriggerAbortQuery()
+ *
+ * Called by ExecutorEnd() if the query execution was aborted due to the
+ * plan becoming invalid during initialization.
+ * ----------
+ */
+void
+AfterTriggerAbortQuery(void)
+{
+ /* Revert the actions of AfterTriggerBeginQuery(). */
+ afterTriggers.query_depth--;
+}
+
/*
* AfterTriggerFreeQuery
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 642d63be61..c76a00b394 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -280,6 +280,28 @@ are typically reset to empty once per tuple. Per-tuple contexts are usually
associated with ExprContexts, and commonly each PlanState node has its own
ExprContext to evaluate its qual and targetlist expressions in.
+Relation Locking
+----------------
+
+Typically, when the executor initializes a plan tree for execution, it doesn't
+lock non-index relations if the plan tree is freshly generated and not derived
+from a CachedPlan. This is because such locks have already been established
+during the query's parsing, rewriting, and planning phases. However, with a
+cached plan tree, some relations may remain unlocked. The function
+AcquireExecutorLocks() only locks unprunable relations in the plan, deferring
+the locking of prunable ones to executor initialization. This avoids
+unnecessary locking of relations that will be pruned during "initial" runtime
+pruning in ExecDoInitialPruning().
+
+This approach creates a window where a cached plan tree with child tables
+could become outdated if another backend modifies these tables before
+ExecDoInitialPruning() locks them. As a result, the executor has the added duty
+to verify the plan tree's validity whenever it locks a child table after
+doing initial pruning. This validation is done by checking the CachedPlan.is_valid
+attribute. If the plan tree is outdated (is_valid=false), the executor halts
+further initialization, cleans up anything in EState that would have been
+allocated up to that point, and retries execution after recreating the
+invalid plan in the CachedPlan.
Query Processing Control Flow
-----------------------------
@@ -288,11 +310,13 @@ This is a sketch of control flow for full query processing:
CreateQueryDesc
- ExecutorStart
+ ExecutorStart or ExecutorStartExt
CreateExecutorState
creates per-query context
- switch to per-query context to run ExecInitNode
+ switch to per-query context to run ExecDoInitialPruning and ExecInitNode
AfterTriggerBeginQuery
+ ExecDoInitialPruning
+ does initial pruning and locks surviving partitions if needed
ExecInitNode --- recursively scans plan tree
ExecInitNode
recurse into subsidiary nodes
@@ -316,7 +340,12 @@ This is a sketch of control flow for full query processing:
FreeQueryDesc
-Per above comments, it's not really critical for ExecEndNode to free any
+As mentioned in the "Relation Locking" section, if the plan tree is found to
+be stale after locking partitions in ExecDoInitialPruning(), control is
+immediately returned to ExecutorStartExt(), which will create a new plan tree
+and perform the steps starting from CreateExecutorState() again.
+
+Per above comments, it's not really critical for ExecEndPlan to free any
memory; it'll all go away in FreeExecutorState anyway. However, we do need to
be careful to close relations, drop buffer pins, etc, so we do need to scan
the plan state tree to find these sorts of resources.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index cb5ed921d0..741801adb9 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -59,6 +59,7 @@
#include "utils/backend_status.h"
#include "utils/lsyscache.h"
#include "utils/partcache.h"
+#include "utils/plancache.h"
#include "utils/rls.h"
#include "utils/snapmgr.h"
@@ -135,6 +136,60 @@ ExecutorStart(QueryDesc *queryDesc, int eflags)
standard_ExecutorStart(queryDesc, eflags);
}
+/*
+ * A variant of ExecutorStart() that handles cleanup and replanning if the
+ * input CachedPlan becomes invalid due to locks being taken during
+ * ExecutorStart(). If that happens, a new CachedPlan is created only
+ * for the query at index 'query_index' in plansource->query_list, which
+ * is released separately from the original CachedPlan.
+ */
+void
+ExecutorStartExt(QueryDesc *queryDesc, int eflags,
+ CachedPlanSource *plansource,
+ int query_index)
+{
+ if (queryDesc->cplan == NULL)
+ {
+ ExecutorStart(queryDesc, eflags);
+ return;
+ }
+
+ while (1)
+ {
+ ExecutorStart(queryDesc, eflags);
+ if (!CachedPlanValid(queryDesc->cplan))
+ {
+ CachedPlan *cplan;
+
+ /*
+ * The plan got invalidated, so try with a new updated plan.
+ *
+ * But first undo what ExecutorStart() would've done. Mark
+ * execution as aborted to ensure that AFTER trigger state is
+ * properly reset.
+ */
+ queryDesc->estate->es_aborted = true;
+ ExecutorEnd(queryDesc);
+
+ cplan = GetSingleCachedPlan(plansource, query_index,
+ queryDesc->queryEnv);
+
+ /*
+ * Install the new transient cplan into the QueryDesc replacing
+ * the old one so that executor initialization code can see it.
+ * Mark it as in use by us and ask FreeQueryDesc() to release it.
+ */
+ cplan->refcount = 1;
+ queryDesc->cplan = cplan;
+ queryDesc->cplan_release = true;
+ queryDesc->plannedstmt = linitial_node(PlannedStmt,
+ queryDesc->cplan->stmt_list);
+ }
+ else
+ break; /* ExecutorStart() succeeded! */
+ }
+}
+
void
standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
@@ -318,6 +373,7 @@ standard_ExecutorRun(QueryDesc *queryDesc,
estate = queryDesc->estate;
Assert(estate != NULL);
+ Assert(!estate->es_aborted);
Assert(!(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
/* caller must ensure the query's snapshot is active */
@@ -424,8 +480,11 @@ standard_ExecutorFinish(QueryDesc *queryDesc)
Assert(estate != NULL);
Assert(!(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
- /* This should be run once and only once per Executor instance */
- Assert(!estate->es_finished);
+ /*
+ * This should be run once and only once per Executor instance and never
+ * if the execution was aborted.
+ */
+ Assert(!estate->es_finished && !estate->es_aborted);
/* Switch into per-query memory context */
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
@@ -484,11 +543,10 @@ standard_ExecutorEnd(QueryDesc *queryDesc)
Assert(estate != NULL);
/*
- * Check that ExecutorFinish was called, unless in EXPLAIN-only mode. This
- * Assert is needed because ExecutorFinish is new as of 9.1, and callers
- * might forget to call it.
+ * Check that ExecutorFinish was called, unless in EXPLAIN-only mode or if
+ * execution was aborted.
*/
- Assert(estate->es_finished ||
+ Assert(estate->es_finished || estate->es_aborted ||
(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
/*
@@ -502,6 +560,14 @@ standard_ExecutorEnd(QueryDesc *queryDesc)
UnregisterSnapshot(estate->es_snapshot);
UnregisterSnapshot(estate->es_crosscheck_snapshot);
+ /*
+ * Reset AFTER trigger module if the query execution was aborted.
+ */
+ if (estate->es_aborted &&
+ !(estate->es_top_eflags &
+ (EXEC_FLAG_SKIP_TRIGGERS | EXEC_FLAG_EXPLAIN_ONLY)))
+ AfterTriggerAbortQuery();
+
/*
* Must switch out of context before destroying it
*/
@@ -948,6 +1014,9 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
ExecDoInitialPruning(estate);
+ if (!ExecPlanStillValid(estate))
+ return;
+
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
*/
@@ -2929,6 +2998,9 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
* the snapshot, rangetable, and external Param info. They need their own
* copies of local state, including a tuple table, es_param_exec_vals,
* result-rel info, etc.
+ *
+ * es_cachedplan is not copied because EPQ plan execution does not acquire
+ * any new locks that could invalidate the CachedPlan.
*/
rcestate->es_direction = ForwardScanDirection;
rcestate->es_snapshot = parentestate->es_snapshot;
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 67734979b0..435ae0df7a 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -147,6 +147,7 @@ CreateExecutorState(void)
estate->es_top_eflags = 0;
estate->es_instrument = 0;
estate->es_finished = false;
+ estate->es_aborted = false;
estate->es_exprcontexts = NIL;
@@ -757,7 +758,7 @@ ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos)
* ExecGetRangeTableRelation
* Open the Relation for a range table entry, if not already done
*
- * The Relations will be closed again in ExecEndPlan().
+ * The Relations will be closed in ExecEndPlan().
*/
Relation
ExecGetRangeTableRelation(EState *estate, Index rti)
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 659bd6dcd9..f84f376c9c 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -70,7 +70,8 @@ static int _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
static ParamListInfo _SPI_convert_params(int nargs, Oid *argtypes,
Datum *Values, const char *Nulls);
-static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount);
+static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount,
+ CachedPlanSource *plansource, int query_index);
static void _SPI_error_callback(void *arg);
@@ -1682,7 +1683,8 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
- cplan);
+ cplan,
+ plansource);
/*
* Set up options for portal. Default SCROLL type is chosen the same way
@@ -2494,6 +2496,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
ListCell *lc2;
+ int i = 0;
spicallbackarg.query = plansource->query_string;
@@ -2691,8 +2694,9 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
options->params,
_SPI_current->queryEnv,
0);
- res = _SPI_pquery(qdesc, fire_triggers,
- canSetTag ? options->tcount : 0);
+
+ res = _SPI_pquery(qdesc, fire_triggers, canSetTag ? options->tcount : 0,
+ plansource, i);
FreeQueryDesc(qdesc);
}
else
@@ -2789,6 +2793,8 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
my_res = res;
goto fail;
}
+
+ i++;
}
/* Done with this plan, so release refcount */
@@ -2866,7 +2872,8 @@ _SPI_convert_params(int nargs, Oid *argtypes,
}
static int
-_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
+_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount,
+ CachedPlanSource *plansource, int query_index)
{
int operation = queryDesc->operation;
int eflags;
@@ -2922,7 +2929,7 @@ _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
else
eflags = EXEC_FLAG_SKIP_TRIGGERS;
- ExecutorStart(queryDesc, eflags);
+ ExecutorStartExt(queryDesc, eflags, plansource, query_index);
ExecutorRun(queryDesc, ForwardScanDirection, tcount, true);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8bc6bea113..ccbc27b575 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1237,6 +1237,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ NULL,
NULL);
/*
@@ -2027,7 +2028,8 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
- cplan);
+ cplan,
+ psrc);
/* Done with the snapshot used for parameter I/O and parsing/planning */
if (snapshot_set)
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 6e8f6b1b8f..dbb0ffb771 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -19,6 +19,7 @@
#include "access/xact.h"
#include "commands/prepare.h"
+#include "executor/execdesc.h"
#include "executor/tstoreReceiver.h"
#include "miscadmin.h"
#include "pg_trace.h"
@@ -37,6 +38,8 @@ Portal ActivePortal = NULL;
static void ProcessQuery(PlannedStmt *plan,
CachedPlan *cplan,
+ CachedPlanSource *plansource,
+ int query_index,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -80,6 +83,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
qd->cplan = cplan; /* CachedPlan supplying the plannedstmt */
+ qd->cplan_release = false;
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -114,6 +118,13 @@ FreeQueryDesc(QueryDesc *qdesc)
UnregisterSnapshot(qdesc->snapshot);
UnregisterSnapshot(qdesc->crosscheck_snapshot);
+ /*
+ * Release CachedPlan if requested. The CachedPlan is not associated with
+ * a ResourceOwner when cplan_release is true; see ExecutorStartExt().
+ */
+ if (qdesc->cplan_release)
+ ReleaseCachedPlan(qdesc->cplan, NULL);
+
/* Only the QueryDesc itself need be freed */
pfree(qdesc);
}
@@ -126,6 +137,8 @@ FreeQueryDesc(QueryDesc *qdesc)
*
* plan: the plan tree for the query
* cplan: CachedPlan supplying the plan
+ * plansource: CachedPlanSource supplying the cplan
+ * query_index: index of the query in plansource->query_list
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -139,6 +152,8 @@ FreeQueryDesc(QueryDesc *qdesc)
static void
ProcessQuery(PlannedStmt *plan,
CachedPlan *cplan,
+ CachedPlanSource *plansource,
+ int query_index,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -157,7 +172,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Call ExecutorStart to prepare the plan for execution
*/
- ExecutorStart(queryDesc, 0);
+ ExecutorStartExt(queryDesc, 0, plansource, query_index);
/*
* Run the plan to completion.
@@ -518,9 +533,12 @@ PortalStart(Portal portal, ParamListInfo params,
myeflags = eflags;
/*
- * Call ExecutorStart to prepare the plan for execution
+ * Call ExecutorStartExt() to prepare the plan for execution. If
+ * the portal is using a cached plan, it may get invalidated
+ * during plan initialization, in which case a new one is
+ * created and saved in the QueryDesc.
*/
- ExecutorStart(queryDesc, myeflags);
+ ExecutorStartExt(queryDesc, myeflags, portal->plansource, 0);
/*
* This tells PortalCleanup to shut down the executor
@@ -1201,6 +1219,7 @@ PortalRunMulti(Portal portal,
{
bool active_snapshot_set = false;
ListCell *stmtlist_item;
+ int i = 0;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1283,6 +1302,8 @@ PortalRunMulti(Portal portal,
/* statement can set tag string */
ProcessQuery(pstmt,
portal->cplan,
+ portal->plansource,
+ i,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1293,6 +1314,8 @@ PortalRunMulti(Portal portal,
/* stmt added by rewrite cannot set tag */
ProcessQuery(pstmt,
portal->cplan,
+ portal->plansource,
+ i,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1357,6 +1380,8 @@ PortalRunMulti(Portal portal,
*/
if (lnext(portal->stmts, stmtlist_item) != NULL)
CommandCounterIncrement();
+
+ i++;
}
/* Pop the snapshot if we pushed one. */
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 5b75dadf13..d33f871ea2 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -94,6 +94,14 @@
*/
static dlist_head saved_plan_list = DLIST_STATIC_INIT(saved_plan_list);
+/*
+ * Head of the backend's list of "standalone" CachedPlans that are not
+ * associated with a CachedPlanSource. They are created by
+ * GetSingleCachedPlan() for transient use by the executor when a plan is
+ * needed for only one execution.
+ */
+static dlist_head standalone_plan_list = DLIST_STATIC_INIT(standalone_plan_list);
+
/*
* This is the head of the backend's list of CachedExpressions.
*/
@@ -905,6 +913,8 @@ CheckCachedPlan(CachedPlanSource *plansource)
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
* (unless this is a one-shot plan, in which case we don't copy the plan).
+ *
+ * Note: When changing this, you should also look at GetSingleCachedPlan().
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
@@ -1034,6 +1044,7 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
plan->is_generic = generic;
plan->is_saved = false;
plan->is_valid = true;
+ plan->is_standalone = false;
/* assign generation number to new plan */
plan->generation = ++(plansource->generation);
@@ -1282,6 +1293,121 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
return plan;
}
+/*
+ * Create a fresh CachedPlan for the query_index'th query in the provided
+ * CachedPlanSource.
+ *
+ * The created CachedPlan is standalone, meaning it is not tracked in the
+ * CachedPlanSource. The CachedPlan and its plan trees are allocated in a
+ * child context of the caller's memory context. The caller must ensure they
+ * remain valid until execution is complete, after which the plan should be
+ * released by calling ReleaseCachedPlan().
+ *
+ * This function primarily supports ExecutorStartExt(), which handles cases
+ * where the original generic CachedPlan becomes invalid after prunable
+ * relations are locked.
+ */
+CachedPlan *
+GetSingleCachedPlan(CachedPlanSource *plansource, int query_index,
+ QueryEnvironment *queryEnv)
+{
+ List *query_list = plansource->query_list,
+ *plan_list;
+ CachedPlan *plan = plansource->gplan;
+ MemoryContext oldcxt = CurrentMemoryContext,
+ plan_context;
+ PlannedStmt *plannedstmt;
+
+ Assert(ActiveSnapshotSet());
+
+ /* Sanity checks */
+ if (plan == NULL)
+ elog(ERROR, "GetSingleCachedPlan() called in the wrong context: plansource->gplan is NULL");
+ else if (plan->is_valid)
+ elog(ERROR, "GetSingleCachedPlan() called in the wrong context: plansource->gplan->is_valid");
+
+ /*
+ * The plansource might have become invalid since GetCachedPlan(). See the
+ * comment in BuildCachedPlan() for details on why this might happen.
+ *
+ * The risk is greater here because this function is called from the
+ * executor, meaning much more processing may have occurred compared to
+ * when BuildCachedPlan() is called from GetCachedPlan().
+ */
+ if (!plansource->is_valid)
+ query_list = RevalidateCachedQuery(plansource, queryEnv);
+ Assert(query_list != NIL);
+
+ /*
+ * Build a new generic plan for the query_index'th query, but make a copy
+ * to be scribbled on by the planner
+ */
+ query_list = list_make1(copyObject(list_nth_node(Query, query_list,
+ query_index)));
+ plan_list = pg_plan_queries(query_list, plansource->query_string,
+ plansource->cursor_options, NULL);
+
+ list_free_deep(query_list);
+
+ /*
+ * Make a dedicated memory context for the CachedPlan and its subsidiary
+ * data so that it can be released by ReleaseCachedPlan(), which will be
+ * called from FreeQueryDesc().
+ */
+ plan_context = AllocSetContextCreate(CurrentMemoryContext,
+ "Standalone CachedPlan",
+ ALLOCSET_START_SMALL_SIZES);
+ MemoryContextCopyAndSetIdentifier(plan_context, plansource->query_string);
+
+ /*
+ * Copy plan into the new context.
+ */
+ MemoryContextSwitchTo(plan_context);
+ plan_list = copyObject(plan_list);
+
+ /*
+ * Create and fill the CachedPlan struct within the new context.
+ */
+ plan = (CachedPlan *) palloc(sizeof(CachedPlan));
+ plan->magic = CACHEDPLAN_MAGIC;
+ plan->stmt_list = plan_list;
+
+ plan->planRoleId = GetUserId();
+ Assert(list_length(plan_list) == 1);
+ plannedstmt = linitial_node(PlannedStmt, plan_list);
+
+ /*
+ * CachedPlan is dependent on role either if RLS affected the rewrite
+ * phase or if a role dependency was injected during planning. And it's
+ * transient if any plan is marked so.
+ */
+ plan->dependsOnRole = plansource->dependsOnRLS || plannedstmt->dependsOnRole;
+ if (plannedstmt->transientPlan)
+ {
+ Assert(TransactionIdIsNormal(TransactionXmin));
+ plan->saved_xmin = TransactionXmin;
+ }
+ else
+ plan->saved_xmin = InvalidTransactionId;
+ plan->refcount = 0;
+ plan->context = plan_context;
+ plan->is_oneshot = false;
+ plan->is_generic = true;
+ plan->is_saved = false;
+ plan->is_valid = true;
+ plan->is_standalone = true;
+ plan->generation = 1;
+ MemoryContextSwitchTo(oldcxt);
+
+ /*
+ * Add the entry to the global list of "standalone" cached plans. It is
+ * removed from the list by ReleaseCachedPlan().
+ */
+ dlist_push_tail(&standalone_plan_list, &plan->node);
+
+ return plan;
+}
+
/*
* ReleaseCachedPlan: release active use of a cached plan.
*
@@ -1309,6 +1435,10 @@ ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner)
/* Mark it no longer valid */
plan->magic = 0;
+ /* Remove from the global list if we are a standalone plan. */
+ if (plan->is_standalone)
+ dlist_delete(&plan->node);
+
/* One-shot plans do not own their context, so we can't free them */
if (!plan->is_oneshot)
MemoryContextDelete(plan->context);
@@ -2066,6 +2196,33 @@ PlanCacheRelCallback(Datum arg, Oid relid)
cexpr->is_valid = false;
}
}
+
+ /* Finally, invalidate any standalone cached plans */
+ dlist_foreach(iter, &standalone_plan_list)
+ {
+ CachedPlan *cplan = dlist_container(CachedPlan,
+ node, iter.cur);
+
+ Assert(cplan->magic == CACHEDPLAN_MAGIC);
+
+ if (cplan->is_valid)
+ {
+ ListCell *lc;
+
+ foreach(lc, cplan->stmt_list)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc);
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ continue; /* Ignore utility statements */
+ if ((relid == InvalidOid) ? plannedstmt->relationOids != NIL :
+ list_member_oid(plannedstmt->relationOids, relid))
+ cplan->is_valid = false;
+ if (!cplan->is_valid)
+ break; /* out of stmt_list scan */
+ }
+ }
+ }
}
/*
@@ -2176,6 +2333,44 @@ PlanCacheObjectCallback(Datum arg, int cacheid, uint32 hashvalue)
}
}
}
+
+ /* Finally, invalidate any standalone cached plans */
+ dlist_foreach(iter, &standalone_plan_list)
+ {
+ CachedPlan *cplan = dlist_container(CachedPlan,
+ node, iter.cur);
+
+ Assert(cplan->magic == CACHEDPLAN_MAGIC);
+
+ if (cplan->is_valid)
+ {
+ ListCell *lc;
+
+ foreach(lc, cplan->stmt_list)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc);
+ ListCell *lc3;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ continue; /* Ignore utility statements */
+ foreach(lc3, plannedstmt->invalItems)
+ {
+ PlanInvalItem *item = (PlanInvalItem *) lfirst(lc3);
+
+ if (item->cacheId != cacheid)
+ continue;
+ if (hashvalue == 0 ||
+ item->hashValue == hashvalue)
+ {
+ cplan->is_valid = false;
+ break; /* out of invalItems scan */
+ }
+ }
+ if (!cplan->is_valid)
+ break; /* out of stmt_list scan */
+ }
+ }
+ }
}
/*
@@ -2235,6 +2430,17 @@ ResetPlanCache(void)
cexpr->is_valid = false;
}
+
+ /* Finally, invalidate any standalone cached plans */
+ dlist_foreach(iter, &standalone_plan_list)
+ {
+ CachedPlan *cplan = dlist_container(CachedPlan,
+ node, iter.cur);
+
+ Assert(cplan->magic == CACHEDPLAN_MAGIC);
+
+ cplan->is_valid = false;
+ }
}
/*
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 4a24613537..bf70fd4ce7 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -284,7 +284,8 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
- CachedPlan *cplan)
+ CachedPlan *cplan,
+ CachedPlanSource *plansource)
{
Assert(PortalIsValid(portal));
Assert(portal->status == PORTAL_NEW);
@@ -299,6 +300,7 @@ PortalDefineQuery(Portal portal,
portal->commandTag = commandTag;
portal->stmts = stmts;
portal->cplan = cplan;
+ portal->plansource = plansource;
portal->status = PORTAL_DEFINED;
}
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 21c71e0d53..a39989a950 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -104,6 +104,7 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ParamListInfo params, QueryEnvironment *queryEnv);
extern void ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
+ CachedPlanSource *plansource, int plan_index,
IntoClause *into, ExplainState *es,
const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index 8a5a9fe642..db21561c8c 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -258,6 +258,7 @@ extern void ExecASTruncateTriggers(EState *estate,
extern void AfterTriggerBeginXact(void);
extern void AfterTriggerBeginQuery(void);
extern void AfterTriggerEndQuery(EState *estate);
+extern void AfterTriggerAbortQuery(void);
extern void AfterTriggerFireDeferred(void);
extern void AfterTriggerEndXact(bool isCommit);
extern void AfterTriggerBeginSubXact(void);
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index 0e7245435d..f6cb6479c0 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -36,6 +36,7 @@ typedef struct QueryDesc
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
CachedPlan *cplan; /* CachedPlan that supplies the plannedstmt */
+ bool cplan_release; /* Should FreeQueryDesc() release cplan? */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 69c3ebff00..5bc0edb5a0 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -19,6 +19,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/plancache.h"
/*
@@ -198,6 +199,8 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
* prototypes from functions in execMain.c
*/
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern void ExecutorStartExt(QueryDesc *queryDesc, int eflags,
+ CachedPlanSource *plansource, int query_index);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count, bool execute_once);
@@ -261,6 +264,19 @@ extern void ExecEndNode(PlanState *node);
extern void ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * Is the CachedPlan in es_cachedplan still valid?
+ *
+ * Called from InitPlan() because invalidation messages that affect the plan
+ * might be received after locks have been taken on runtime-prunable relations.
+ * The caller should take appropriate action if the plan has become invalid.
+ */
+static inline bool
+ExecPlanStillValid(EState *estate)
+{
+ return estate->es_cachedplan == NULL ? true :
+ CachedPlanValid(estate->es_cachedplan);
+}
/* ----------------------------------------------------------------
* ExecProcNode
@@ -589,6 +605,7 @@ extern void ExecCreateScanSlotFromOuterPlan(EState *estate,
extern bool ExecRelationIsTargetRelation(EState *estate, Index scanrelid);
extern Relation ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags);
+extern Relation ExecOpenScanIndexRelation(EState *estate, Oid indexid, int lockmode);
extern void ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos);
extern void ExecCloseRangeTableRelations(EState *estate);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 181cf5ad09..aa984eee0f 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -685,6 +685,7 @@ typedef struct EState
int es_top_eflags; /* eflags passed to ExecutorStart */
int es_instrument; /* OR of InstrumentOption flags */
bool es_finished; /* true when ExecutorFinish is done */
+ bool es_aborted; /* true when execution was aborted */
List *es_exprcontexts; /* List of ExprContexts within EState */
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 0b5ee007ca..154f68f671 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -18,6 +18,7 @@
#include "access/tupdesc.h"
#include "lib/ilist.h"
#include "nodes/params.h"
+#include "nodes/parsenodes.h"
#include "tcop/cmdtag.h"
#include "utils/queryenvironment.h"
#include "utils/resowner.h"
@@ -152,6 +153,8 @@ typedef struct CachedPlan
bool is_generic; /* is it a reusable generic plan? */
bool is_saved; /* is CachedPlan in a long-lived context? */
bool is_valid; /* is the stmt_list currently valid? */
+ bool is_standalone; /* is it not associated with a
+ * CachedPlanSource? */
Oid planRoleId; /* Role ID the plan was created for */
bool dependsOnRole; /* is plan specific to that role? */
TransactionId saved_xmin; /* if valid, replan when TransactionXmin
@@ -159,6 +162,12 @@ typedef struct CachedPlan
int generation; /* parent's generation number for this plan */
int refcount; /* count of live references to this struct */
MemoryContext context; /* context containing this CachedPlan */
+
+ /*
+ * If the plan is not associated with a CachedPlanSource, it is saved in
+ * a separate global list.
+ */
+ dlist_node node; /* list link, if is_standalone */
} CachedPlan;
/*
@@ -224,6 +233,10 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
QueryEnvironment *queryEnv);
+extern CachedPlan *GetSingleCachedPlan(CachedPlanSource *plansource,
+ int query_index,
+ QueryEnvironment *queryEnv);
+
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
@@ -245,4 +258,17 @@ CachedPlanRequiresLocking(CachedPlan *cplan)
return cplan->is_generic;
}
+/*
+ * CachedPlanValid
+ * Returns whether a cached generic plan is still valid.
+ *
+ * Invoked by the executor to check that the plan has not been invalidated
+ * by locks taken during plan initialization.
+ */
+static inline bool
+CachedPlanValid(CachedPlan *cplan)
+{
+ return cplan->is_valid;
+}
+
#endif /* PLANCACHE_H */
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index 29f49829f2..58c3828d2c 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,7 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ CachedPlanSource *plansource; /* CachedPlanSource, for cplan */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -241,7 +242,8 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
- CachedPlan *cplan);
+ CachedPlan *cplan,
+ CachedPlanSource *plansource);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
diff --git a/src/test/modules/delay_execution/Makefile b/src/test/modules/delay_execution/Makefile
index 70f24e846d..3eeb097fde 100644
--- a/src/test/modules/delay_execution/Makefile
+++ b/src/test/modules/delay_execution/Makefile
@@ -8,7 +8,8 @@ OBJS = \
delay_execution.o
ISOLATION = partition-addition \
- partition-removal-1
+ partition-removal-1 \
+ cached-plan-inval
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/src/test/modules/delay_execution/delay_execution.c b/src/test/modules/delay_execution/delay_execution.c
index 155c8a8d55..304ca77f7b 100644
--- a/src/test/modules/delay_execution/delay_execution.c
+++ b/src/test/modules/delay_execution/delay_execution.c
@@ -1,14 +1,18 @@
/*-------------------------------------------------------------------------
*
* delay_execution.c
- * Test module to allow delay between parsing and execution of a query.
+ * Test module to introduce delays at various points during the execution
+ * of a query, to check that execution proceeds safely in the face of
+ * concurrent changes.
*
* The delay is implemented by taking and immediately releasing a specified
* advisory lock. If another process has previously taken that lock, the
* current process will be blocked until the lock is released; otherwise,
* there's no effect. This allows an isolationtester script to reliably
- * test behaviors where some specified action happens in another backend
- * between parsing and execution of any desired query.
+ * test behaviors where some specified action happens in another backend at
+ * either of two points: 1) between parsing and execution of any desired
+ * query when using the planner_hook, 2) between RevalidateCachedQuery() and
+ * ExecutorStart() when using the ExecutorStart_hook.
*
* Copyright (c) 2020-2024, PostgreSQL Global Development Group
*
@@ -22,6 +26,7 @@
#include <limits.h>
+#include "executor/executor.h"
#include "optimizer/planner.h"
#include "utils/builtins.h"
#include "utils/guc.h"
@@ -32,9 +37,11 @@ PG_MODULE_MAGIC;
/* GUC: advisory lock ID to use. Zero disables the feature. */
static int post_planning_lock_id = 0;
+static int executor_start_lock_id = 0;
-/* Save previous planner hook user to be a good citizen */
+/* Save previous hook users to be a good citizen */
static planner_hook_type prev_planner_hook = NULL;
+static ExecutorStart_hook_type prev_ExecutorStart_hook = NULL;
/* planner_hook function to provide the desired delay */
@@ -70,11 +77,41 @@ delay_execution_planner(Query *parse, const char *query_string,
return result;
}
+/* ExecutorStart_hook function to provide the desired delay */
+static void
+delay_execution_ExecutorStart(QueryDesc *queryDesc, int eflags)
+{
+ /* If enabled, delay by taking and releasing the specified lock */
+ if (executor_start_lock_id != 0)
+ {
+ DirectFunctionCall1(pg_advisory_lock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+ DirectFunctionCall1(pg_advisory_unlock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+
+ /*
+ * Ensure that we notice any pending invalidations, since the advisory
+ * lock functions don't do this.
+ */
+ AcceptInvalidationMessages();
+ }
+
+ /* Now start the executor, possibly via a previous hook user */
+ if (prev_ExecutorStart_hook)
+ prev_ExecutorStart_hook(queryDesc, eflags);
+ else
+ standard_ExecutorStart(queryDesc, eflags);
+
+ if (executor_start_lock_id != 0)
+ elog(NOTICE, "Finished ExecutorStart(): CachedPlan is %s",
+ CachedPlanValid(queryDesc->cplan) ? "valid" : "not valid");
+}
+
/* Module load function */
void
_PG_init(void)
{
- /* Set up the GUC to control which lock is used */
+ /* Set up GUCs to control which lock is used */
DefineCustomIntVariable("delay_execution.post_planning_lock_id",
"Sets the advisory lock ID to be locked/unlocked after planning.",
"Zero disables the delay.",
@@ -86,10 +123,22 @@ _PG_init(void)
NULL,
NULL,
NULL);
-
+ DefineCustomIntVariable("delay_execution.executor_start_lock_id",
+ "Sets the advisory lock ID to be locked/unlocked before starting execution.",
+ "Zero disables the delay.",
+ &executor_start_lock_id,
+ 0,
+ 0, INT_MAX,
+ PGC_USERSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
MarkGUCPrefixReserved("delay_execution");
- /* Install our hook */
+ /* Install our hooks. */
prev_planner_hook = planner_hook;
planner_hook = delay_execution_planner;
+ prev_ExecutorStart_hook = ExecutorStart_hook;
+ ExecutorStart_hook = delay_execution_ExecutorStart;
}
diff --git a/src/test/modules/delay_execution/expected/cached-plan-inval.out b/src/test/modules/delay_execution/expected/cached-plan-inval.out
new file mode 100644
index 0000000000..e8efb6d9d9
--- /dev/null
+++ b/src/test/modules/delay_execution/expected/cached-plan-inval.out
@@ -0,0 +1,175 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1prep s2lock s1exec s2dropi s2unlock
+step s1prep: SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1 FOR UPDATE;
+ EXPLAIN (COSTS OFF) EXECUTE q (1);
+QUERY PLAN
+------------------------------------------------
+LockRows
+ -> Append
+ Subplans Removed: 2
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Recheck Cond: (a = $1)
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = $1)
+(7 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); <waiting ...>
+step s2dropi: DROP INDEX foo12_1_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+-------------------------------------
+LockRows
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on foo12_1 foo_1
+ Filter: (a = $1)
+(5 rows)
+
+
+starting permutation: s1prep2 s2lock s1exec2 s2dropi s2unlock
+step s1prep2: SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q2;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Recheck Cond: ((a = one()) OR (a = two()))
+ -> BitmapOr
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = one())
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = two())
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(11 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec2: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; <waiting ...>
+step s2dropi: DROP INDEX foo12_1_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec2: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Seq Scan on foo12_1 foo_1
+ Filter: ((a = one()) OR (a = two()))
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(6 rows)
+
+
+starting permutation: s1prep3 s2lock s1exec3 s2dropi s2unlock
+step s1prep3: SET plan_cache_mode = force_generic_plan;
+ PREPARE q3 AS UPDATE foov SET a = a WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q3;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Recheck Cond: ((a = one()) OR (a = two()))
+ -> BitmapOr
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = one())
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = two())
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+
+Update on foo
+ Update on foo12_1 foo_1
+ Update on foo12_2 foo_2
+ Update on foo3 foo
+ -> Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Recheck Cond: ((a = one()) OR (a = two()))
+ -> BitmapOr
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = one())
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = two())
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(27 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec3: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; <waiting ...>
+step s2dropi: DROP INDEX foo12_1_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec3: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Seq Scan on foo12_1 foo_1
+ Filter: ((a = one()) OR (a = two()))
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+
+Update on foo
+ Update on foo12_1 foo_1
+ Update on foo12_2 foo_2
+ Update on foo3 foo
+ -> Append
+ Subplans Removed: 1
+ -> Seq Scan on foo12_1 foo_1
+ Filter: ((a = one()) OR (a = two()))
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(17 rows)
+
diff --git a/src/test/modules/delay_execution/meson.build b/src/test/modules/delay_execution/meson.build
index 41f3ac0b89..5a70b183d0 100644
--- a/src/test/modules/delay_execution/meson.build
+++ b/src/test/modules/delay_execution/meson.build
@@ -24,6 +24,7 @@ tests += {
'specs': [
'partition-addition',
'partition-removal-1',
+ 'cached-plan-inval',
],
},
}
diff --git a/src/test/modules/delay_execution/specs/cached-plan-inval.spec b/src/test/modules/delay_execution/specs/cached-plan-inval.spec
new file mode 100644
index 0000000000..5b1f72b4a8
--- /dev/null
+++ b/src/test/modules/delay_execution/specs/cached-plan-inval.spec
@@ -0,0 +1,65 @@
+# Test to check that invalidation of cached generic plans during ExecutorStart
+# correctly triggers replanning and re-execution.
+
+setup
+{
+ CREATE TABLE foo (a int, b text) PARTITION BY LIST(a);
+ CREATE TABLE foo12 PARTITION OF foo FOR VALUES IN (1, 2) PARTITION BY LIST (a);
+ CREATE TABLE foo12_1 PARTITION OF foo12 FOR VALUES IN (1);
+ CREATE TABLE foo12_2 PARTITION OF foo12 FOR VALUES IN (2);
+ CREATE INDEX foo12_1_a ON foo12_1 (a);
+ CREATE TABLE foo3 PARTITION OF foo FOR VALUES IN (3);
+ CREATE VIEW foov AS SELECT * FROM foo;
+ CREATE FUNCTION one () RETURNS int AS $$ BEGIN RETURN 1; END; $$ LANGUAGE PLPGSQL STABLE;
+ CREATE FUNCTION two () RETURNS int AS $$ BEGIN RETURN 2; END; $$ LANGUAGE PLPGSQL STABLE;
+ CREATE RULE update_foo AS ON UPDATE TO foo DO ALSO SELECT 1;
+}
+
+teardown
+{
+ DROP VIEW foov;
+ DROP RULE update_foo ON foo;
+ DROP TABLE foo;
+ DROP FUNCTION one(), two();
+}
+
+session "s1"
+# Append with run-time pruning
+step "s1prep" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1 FOR UPDATE;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+
+# Another case with Append with run-time pruning
+step "s1prep2" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+
+# Case with a rule adding another query
+step "s1prep3" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q3 AS UPDATE foov SET a = a WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+
+# Executes a generic plan
+step "s1exec" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+step "s1exec2" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+step "s1exec3" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+
+session "s2"
+step "s2lock" { SELECT pg_advisory_lock(12345); }
+step "s2unlock" { SELECT pg_advisory_unlock(12345); }
+step "s2dropi" { DROP INDEX foo12_1_a; }
+
+# While "s1exec", etc. wait to acquire the advisory lock, "s2drop" is able to
+# drop the index being used in the cached plan. When "s1exec" is then
+# unblocked and initializes the cached plan for execution, it detects the
+# concurrent index drop and causes the cached plan to be discarded and
+# recreated without the index.
+permutation "s1prep" "s2lock" "s1exec" "s2dropi" "s2unlock"
+permutation "s1prep2" "s2lock" "s1exec2" "s2dropi" "s2unlock"
+permutation "s1prep3" "s2lock" "s1exec3" "s2dropi" "s2unlock"
--
2.43.0
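To illustrate how the pieces above fit together, here is a minimal,
hypothetical sketch of a caller driving the new entry point; it is not part
of the patch, run_cached_stmt() is an invented name, and everything else is
taken from the patched headers:

    static void
    run_cached_stmt(CachedPlanSource *plansource, CachedPlan *cplan,
                    int query_index, ParamListInfo params, DestReceiver *dest)
    {
        PlannedStmt *stmt = list_nth_node(PlannedStmt, cplan->stmt_list,
                                          query_index);
        QueryDesc  *qdesc;

        /* Pass the CachedPlan so the executor can check its validity. */
        qdesc = CreateQueryDesc(stmt, cplan, plansource->query_string,
                                GetActiveSnapshot(), InvalidSnapshot,
                                dest, params, NULL, 0);

        /* Replans via GetSingleCachedPlan() and retries if invalidated. */
        ExecutorStartExt(qdesc, 0, plansource, query_index);
        ExecutorRun(qdesc, ForwardScanDirection, 0, true);
        ExecutorFinish(qdesc);
        ExecutorEnd(qdesc);

        /* Also releases the standalone CachedPlan, if one was installed. */
        FreeQueryDesc(qdesc);
    }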
v53-0003-Defer-locking-of-runtime-prunable-relations-to-e.patch
From 7a3e86a079be3fb6e6c8fa4a2ed8f0101fe0ee5b Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Wed, 7 Aug 2024 18:25:51 +0900
Subject: [PATCH v53 3/4] Defer locking of runtime-prunable relations to
executor
When preparing a cached plan for execution, plancache.c locks the
relations contained in the plan's range table to ensure it is safe for
execution. However, this simplistic approach, implemented in
AcquireExecutorLocks(), results in unnecessarily locking relations that
might be pruned during "initial" runtime pruning.
To optimize this, the locking is now deferred for relations that are
subject to "initial" runtime pruning. The planner now provides a set
of "unprunable" relations, available through the new
PlannedStmt.unprunableRelids field. AcquireExecutorLocks() will now
only lock those relations.
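In sketch form (local variables invented for illustration; the actual
AcquireExecutorLocks() code differs in detail), the locking loop becomes:

    int         rti = 0;
    ListCell   *lc;

    foreach(lc, plannedstmt->rtable)
    {
        RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);

        rti++;
        if (rte->rtekind != RTE_RELATION)
            continue;
        if (!bms_is_member(rti, plannedstmt->unprunableRelids))
            continue;       /* locking deferred to the executor */
        LockRelationOid(rte->relid, rte->rellockmode);
    }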
PlannedStmt.unprunableRelids is populated by subtracting the set of
initially prunable relids from the set of all RT indexes. The prunable
relids set is constructed by examining all PartitionPruneInfos during
set_plan_refs() and storing the RT indexes of partitions subject to
"initial" pruning steps. While at it, some duplicated code in
set_append_references() and set_mergeappend_references() that
constructs the prunable relids set has been refactored into a common
function.
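For example, with a hypothetical plan having four range-table entries in
which the partitions at RT indexes 3 and 4 have initial pruning steps,
standard_planner() computes

    /* all relids {1,2,3,4} minus prunableRelids {3,4} => {1,2} */
    unprunableRelids = bms_difference(bms_add_range(NULL, 1, 4),
                                      prunableRelids);

so AcquireExecutorLocks() locks only the relations at RT indexes 1 and 2
up front.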
To enable the executor to determine whether the plan tree it's
executing is a cached one, the CachedPlan is now made available via
the QueryDesc. The executor can call CachedPlanRequiresLocking(),
which returns true if the CachedPlan is a reusable generic plan that
might contain relations needing to be locked. If so, the executor
will lock any relation that is not in PlannedStmt.unprunableRelids.
Finally, an Assert has been added in ExecCheckPermissions() to ensure
that all relations whose permissions are checked have been properly
locked. This helps catch any accidental omission of relations from the
unprunableRelids set that should have their permissions checked.
This deferment introduces a window in which prunable relations may be
altered by concurrent DDL, potentially causing the plan to become
invalid. As a result, the executor might attempt to run an invalid plan,
leading to errors such as being unable to locate a partition-only index
during ExecInitIndexScan(). Future commits will introduce changes to
ready the executor to check plan validity during ExecutorStart() and
retry with a newly created plan if the original one becomes invalid
after taking deferred locks.
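That retry machinery lives in another patch in this series (see the
execMain.c and plancache.c changes earlier in this message); the validity
check itself reduces to this sketch of the InitPlan() change:

    /* Initial pruning takes the deferred locks ... */
    ExecDoInitialPruning(estate);

    /* ... which may have invalidated the plan; let the caller retry. */
    if (!ExecPlanStillValid(estate))
        return;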
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 ++--
src/backend/commands/extension.c | 1 +
src/backend/commands/matview.c | 2 +-
src/backend/commands/prepare.c | 3 +-
src/backend/executor/execMain.c | 45 +++++++++++++++++++++++++-
src/backend/executor/execParallel.c | 9 +++++-
src/backend/executor/execPartition.c | 37 ++++++++++++++++++---
src/backend/executor/functions.c | 1 +
src/backend/executor/nodeAppend.c | 8 ++---
src/backend/executor/nodeMergeAppend.c | 2 +-
src/backend/executor/spi.c | 1 +
src/backend/optimizer/plan/planner.c | 2 ++
src/backend/optimizer/plan/setrefs.c | 11 +++++++
src/backend/partitioning/partprune.c | 20 +++++++++++-
src/backend/tcop/pquery.c | 10 +++++-
src/backend/utils/cache/plancache.c | 40 ++++++++++++++---------
src/include/commands/explain.h | 5 +--
src/include/executor/execPartition.h | 9 +++++-
src/include/executor/execdesc.h | 2 ++
src/include/nodes/execnodes.h | 2 ++
src/include/nodes/pathnodes.h | 6 ++++
src/include/nodes/plannodes.h | 11 +++++++
src/include/utils/plancache.h | 10 ++++++
25 files changed, 209 insertions(+), 39 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 91de442f43..db976f928a 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -552,7 +552,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 0b629b1f79..57a3375cad 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -324,7 +324,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 2819e479f8..13f5683cf6 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -507,7 +507,7 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
}
@@ -615,7 +615,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage,
@@ -671,7 +672,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, cplan, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index fab59ad5f6..bd169edeff 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -742,6 +742,7 @@ execute_sql_string(const char *sql)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 010097873d..69be74b4bd 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -438,7 +438,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 07257d4db9..311b9ebd5b 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -655,7 +655,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
+ ExplainOnePlan(pstmt, cplan, into, es, query_string, paramLI,
+ queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
else
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index dceef322af..cb5ed921d0 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -53,6 +53,7 @@
#include "miscadmin.h"
#include "parser/parse_relation.h"
#include "rewrite/rewriteHandler.h"
+#include "storage/lmgr.h"
#include "tcop/utility.h"
#include "utils/acl.h"
#include "utils/backend_status.h"
@@ -90,6 +91,7 @@ static bool ExecCheckPermissionsModified(Oid relOid, Oid userid,
AclMode requiredPerms);
static void ExecCheckXactReadOnly(PlannedStmt *plannedstmt);
static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
+static inline bool ExecShouldLockRelations(EState *estate);
/* end of local decls */
@@ -598,6 +600,21 @@ ExecCheckPermissions(List *rangeTable, List *rteperminfos,
(rte->rtekind == RTE_SUBQUERY &&
rte->relkind == RELKIND_VIEW));
+ /*
+ * Ensure that we have at least an AccessShareLock on relations
+ * whose permissions need to be checked.
+ *
+ * Skip this check in a parallel worker because locks won't be
+ * taken until ExecInitNode() performs plan initialization.
+ *
+ * XXX: ExecCheckPermissions() in a parallel worker may be
+ * redundant with the checks done in the leader process, so this
+ * should be reviewed to ensure it's necessary.
+ */
+ Assert(IsParallelWorker() ||
+ CheckRelationOidLockedByMe(rte->relid, AccessShareLock,
+ true));
+
(void) getRTEPermissionInfo(rteperminfos, rte);
/* Many-to-one mapping not allowed */
Assert(!bms_is_member(rte->perminfoindex, indexset));
@@ -860,11 +877,35 @@ ExecDoInitialPruning(EState *estate)
* result.
*/
if (prunestate->do_initial_prune)
- validsubplans = ExecFindMatchingSubPlans(prunestate, true);
+ {
+ List *leaf_partition_oids = NIL;
+
+ validsubplans = ExecFindMatchingSubPlans(prunestate, true,
+ &leaf_partition_oids);
+ if (ExecShouldLockRelations(estate))
+ {
+ ListCell *lc1;
+
+ foreach(lc1, leaf_partition_oids)
+ {
+ LockRelationOid(lfirst_oid(lc1), prunestate->lockmode);
+ }
+ }
+ }
estate->es_part_prune_results = lappend(estate->es_part_prune_results,
validsubplans);
}
}
+/*
+ * Locks might be needed only when running a cached plan that might contain
+ * unlocked relations, such as a reused generic plan.
+ */
+static inline bool
+ExecShouldLockRelations(EState *estate)
+{
+ return estate->es_cachedplan == NULL ? false :
+ CachedPlanRequiresLocking(estate->es_cachedplan);
+}
/* ----------------------------------------------------------------
* InitPlan
@@ -878,6 +919,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ CachedPlan *cachedplan = queryDesc->cplan;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -897,6 +939,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
estate->es_plannedstmt = plannedstmt;
+ estate->es_cachedplan = cachedplan;
/*
* Perform runtime "initial" pruning to determine the plan nodes that will
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index b01a2fdfdd..7519c9a860 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1257,8 +1257,15 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
- /* Create a QueryDesc for the query. */
+ /*
+ * Create a QueryDesc for the query. We pass NULL for cachedplan, because
+ * we don't have a pointer to the leader's CachedPlan. That's fine, because
+ * the only reason the executor needs to see it is to decide whether it
+ * should take locks on certain relations, but parallel workers always
+ * take locks anyway.
+ */
return CreateQueryDesc(pstmt,
+ NULL,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 08b1f3d030..861e64856d 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -26,6 +26,7 @@
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
#include "rewrite/rewriteManip.h"
+#include "storage/lmgr.h"
#include "utils/acl.h"
#include "utils/lsyscache.h"
#include "utils/partcache.h"
@@ -196,7 +197,8 @@ static void PartitionPruneInitExecPruning(PartitionPruneInfo *pruneinfo,
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans);
+ Bitmapset **validsubplans,
+ List **leaf_part_oids);
/*
@@ -1927,6 +1929,7 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
ALLOCSET_DEFAULT_SIZES);
i = 0;
+ prunestate->lockmode = NoLock;
foreach(lc, pruneinfo->prune_infos)
{
List *partrelpruneinfos = lfirst_node(List, lc);
@@ -1950,6 +1953,15 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
PartitionDesc partdesc;
PartitionKey partkey;
+ /*
+ * Use the lock mode of the first (root) partitioned table's RTE as
+ * the lock mode for locking leaf partitions that survive initial
+ * pruning, if that turns out to be necessary.
+ */
+ if (prunestate->lockmode == NoLock)
+ prunestate->lockmode = exec_rt_fetch(pinfo->rtindex, estate)->rellockmode;
+ Assert(prunestate->lockmode != NoLock);
+
/*
* We can rely on the copies of the partitioned table's partition
* key and partition descriptor appearing in its relcache entry,
@@ -1982,6 +1994,9 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
pprune->partrel = partrel;
pprune->nparts = partdesc->nparts;
pprune->subplan_map = palloc(sizeof(int) * partdesc->nparts);
+ pprune->relid_map = palloc(sizeof(Oid) * partdesc->nparts);
+ memcpy(pprune->relid_map, partdesc->oids,
+ sizeof(Oid) * partdesc->nparts);
if (partdesc->nparts == pinfo->nparts &&
memcmp(partdesc->oids, pinfo->relid_map,
@@ -2399,10 +2414,13 @@ PartitionPruneInitExecPruning(PartitionPruneInfo *pruneinfo,
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
* runtime pruning.
+ *
+ * leaf_part_oids must be non-NULL if initial_prune is true.
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ List **leaf_part_oids)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2437,7 +2455,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
*/
pprune = &prunedata->partrelprunedata[0];
find_matching_subplans_recurse(prunedata, pprune, initial_prune,
- &result);
+ &result, leaf_part_oids);
/* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
@@ -2451,6 +2469,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (leaf_part_oids)
+ *leaf_part_oids = list_copy(*leaf_part_oids);
MemoryContextReset(prunestate->prune_context);
@@ -2467,7 +2487,8 @@ static void
find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans)
+ Bitmapset **validsubplans,
+ List **leaf_part_oids)
{
Bitmapset *partset;
int i;
@@ -2494,8 +2515,13 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
while ((i = bms_next_member(partset, i)) >= 0)
{
if (pprune->subplan_map[i] >= 0)
+ {
*validsubplans = bms_add_member(*validsubplans,
pprune->subplan_map[i]);
+ if (leaf_part_oids)
+ *leaf_part_oids = lappend_oid(*leaf_part_oids,
+ pprune->relid_map[i]);
+ }
else
{
int partidx = pprune->subpart_map[i];
@@ -2503,7 +2529,8 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
if (partidx >= 0)
find_matching_subplans_recurse(prunedata,
&prunedata->partrelprunedata[partidx],
- initial_prune, validsubplans);
+ initial_prune, validsubplans,
+ leaf_part_oids);
else
{
/*
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 692854e2b3..6f6f45e0ad 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -840,6 +840,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest = None_Receiver;
es->qd = CreateQueryDesc(es->stmt,
+ NULL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index de7ebab5c2..006bdafaea 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -581,7 +581,7 @@ choose_next_subplan_locally(AppendState *node)
else if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
node->as_valid_subplans_identified = true;
}
@@ -648,7 +648,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
node->as_valid_subplans_identified = true;
/*
@@ -724,7 +724,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
node->as_valid_subplans_identified = true;
mark_invalid_subplans_as_finished(node);
@@ -877,7 +877,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
node->as_valid_subplans_identified = true;
classify_matching_subplans(node);
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 3ed91808dd..f7821aa178 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -219,7 +219,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 90d9834576..659bd6dcd9 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -2684,6 +2684,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
snap = InvalidSnapshot;
qdesc = CreateQueryDesc(stmt,
+ cplan,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 1b9071c774..9e47a7fd50 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -549,6 +549,8 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->planTree = top_plan;
result->partPruneInfos = glob->partPruneInfos;
result->rtable = glob->finalrtable;
+ result->unprunableRelids = bms_difference(bms_add_range(NULL, 1, list_length(result->rtable)),
+ glob->prunableRelids);
result->permInfos = glob->finalrteperminfos;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index e2ea406c4e..8ce6d1149d 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1764,8 +1764,19 @@ register_partpruneinfo(PlannerInfo *root, int part_prune_index, int rtoffset)
foreach(l2, prune_infos)
{
PartitionedRelPruneInfo *prelinfo = lfirst(l2);
+ Bitmapset *present_leafpart_rtis = prelinfo->present_leafpart_rtis;
prelinfo->rtindex += rtoffset;
+ present_leafpart_rtis = offset_relid_set(present_leafpart_rtis,
+ rtoffset);
+ if (prelinfo->initial_pruning_steps != NIL)
+ glob->prunableRelids = bms_add_members(glob->prunableRelids,
+ present_leafpart_rtis);
+ /*
+ * Don't need this anymore, so set to NULL to save space in the
+ * final plan tree.
+ */
+ prelinfo->present_leafpart_rtis = NULL;
}
}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 60fabb1734..c022c5ee0b 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -641,6 +641,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
PartitionedRelPruneInfo *pinfo = lfirst(lc);
RelOptInfo *subpart = find_base_rel(root, pinfo->rtindex);
Bitmapset *present_parts;
+ Bitmapset *present_leafpart_rtis;
int nparts = subpart->nparts;
int *subplan_map;
int *subpart_map;
@@ -657,7 +658,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
- present_parts = NULL;
+ present_parts = present_leafpart_rtis = NULL;
i = -1;
while ((i = bms_next_member(subpart->live_parts, i)) >= 0)
@@ -671,9 +672,25 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
+
+ /*
+ * Track the RT indexes of partitions to ensure they are included
+ * in the prunableRelids set of relations that are locked during
+ * execution. This ensures that if the plan is cached, these
+ * partitions are locked when the plan is reused.
+ *
+ * Partitions without a subplan and sub-partitioned partitions
+ * where none of the sub-partitions have a subplan due to
+ * constraint exclusion are not included in this set. Instead,
+ * they are added to the unprunableRelids set, and the relations
+ * in this set are locked by AcquireExecutorLocks() before
+ * executing a cached plan.
+ */
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
+ present_leafpart_rtis = bms_add_member(present_leafpart_rtis,
+ partrel->relid);
/* Record finding this subplan */
subplansfound = bms_add_member(subplansfound, subplanidx);
@@ -691,6 +708,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
/* Record the maps and other information. */
pinfo->present_parts = present_parts;
+ pinfo->present_leafpart_rtis = present_leafpart_rtis;
pinfo->nparts = nparts;
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index a1f8d03db1..6e8f6b1b8f 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -36,6 +36,7 @@ Portal ActivePortal = NULL;
static void ProcessQuery(PlannedStmt *plan,
+ CachedPlan *cplan,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +66,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +79,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->cplan = cplan; /* CachedPlan supplying the plannedstmt */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +125,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * cplan: CachedPlan supplying the plan
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +138,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ CachedPlan *cplan,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +150,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, cplan, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -493,6 +498,7 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->cplan,
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1276,6 +1282,7 @@ PortalRunMulti(Portal portal,
{
/* statement can set tag string */
ProcessQuery(pstmt,
+ portal->cplan,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1285,6 +1292,7 @@ PortalRunMulti(Portal portal,
{
/* stmt added by rewrite cannot set tag */
ProcessQuery(pstmt,
+ portal->cplan,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 5af1a168ec..5b75dadf13 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -104,7 +104,8 @@ static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
static bool CheckCachedPlan(CachedPlanSource *plansource);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv);
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ bool generic);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
@@ -815,8 +816,11 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
*
- * On a "true" return, we have acquired the locks needed to run the plan.
- * (We must do this for the "true" result to be race-condition-free.)
+ * On a "true" return, we have acquired locks on the "unprunableRelids" set
+ * for all plans in plansource->stmt_list. The plans are not completely
+ * race-condition-free until the executor takes locks on the set of prunable
+ * relations that survive initial runtime pruning during executor
+ * initialization.
*/
static bool
CheckCachedPlan(CachedPlanSource *plansource)
@@ -893,10 +897,10 @@ CheckCachedPlan(CachedPlanSource *plansource)
* or it can be set to NIL if we need to re-copy the plansource's query_list.
*
* To build a generic, parameter-value-independent plan, pass NULL for
- * boundParams. To build a custom plan, pass the actual parameter values via
- * boundParams. For best effect, the PARAM_FLAG_CONST flag should be set on
- * each parameter value; otherwise the planner will treat the value as a
- * hint rather than a hard constant.
+ * boundParams, and true for generic. To build a custom plan, pass the actual
+ * parameter values via boundParams, and false for generic. For best effect,
+ * the PARAM_FLAG_CONST flag should be set on each parameter value; otherwise
+ * the planner will treat the value as a hint rather than a hard constant.
*
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
@@ -904,7 +908,8 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv)
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ bool generic)
{
CachedPlan *plan;
List *plist;
@@ -1026,6 +1031,7 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
plan->refcount = 0;
plan->context = plan_context;
plan->is_oneshot = plansource->is_oneshot;
+ plan->is_generic = generic;
plan->is_saved = false;
plan->is_valid = true;
@@ -1196,7 +1202,7 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
else
{
/* Build a new generic plan */
- plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv, true);
/* Just make real sure plansource->gplan is clear */
ReleaseGenericPlan(plansource);
/* Link the new generic plan into the plansource */
@@ -1241,7 +1247,7 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (customplan)
{
/* Build a custom plan */
- plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv, false);
/* Accumulate total costs of custom plans */
plansource->total_custom_cost += cached_plan_cost(plan, true);
@@ -1387,8 +1393,8 @@ CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
}
/*
- * Reject if AcquireExecutorLocks would have anything to do. This is
- * probably unnecessary given the previous check, but let's be safe.
+ * Reject if there are any lockable relations. This is probably
+ * unnecessary given the previous check, but let's be safe.
*/
foreach(lc, plan->stmt_list)
{
@@ -1776,7 +1782,7 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ int rtindex;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1794,9 +1800,13 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ rtindex = -1;
+ while ((rtindex = bms_next_member(plannedstmt->unprunableRelids,
+ rtindex)) >= 0)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ RangeTblEntry *rte = list_nth_node(RangeTblEntry,
+ plannedstmt->rtable,
+ rtindex - 1);
if (!(rte->rtekind == RTE_RELATION ||
(rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 3ab0aae78f..21c71e0d53 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -103,8 +103,9 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
- ExplainState *es, const char *queryString,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
+ IntoClause *into, ExplainState *es,
+ const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
const BufferUsage *bufusage,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index dc73de8738..ca39fa1feb 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -45,6 +45,7 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
+ * relid_map Partition OID by partition index.
* present_parts A Bitmapset of the partition indexes that we
* have subplans or subparts for.
* initial_pruning_steps List of PartitionPruneSteps used to
@@ -62,6 +63,7 @@ typedef struct PartitionedRelPruningData
int nparts;
int *subplan_map;
int *subpart_map;
+ Oid *relid_map;
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
@@ -91,6 +93,9 @@ typedef struct PartitionPruningData
* the clauses being unable to match to any tuple that the subplan could
* possibly produce.
*
+ * lockmode Lock mode to lock the leaf partitions with, if needed;
+ * this is the same as the lock mode that the root
+ * partitioned table would be locked with.
* execparamids Contains paramids of PARAM_EXEC Params found within
* any of the partprunedata structs. Pruning must be
* done again each time the value of one of these
@@ -113,6 +118,7 @@ typedef struct PartitionPruningData
*/
typedef struct PartitionPruneState
{
+ int lockmode;
Bitmapset *execparamids;
Bitmapset *other_subplans;
MemoryContext prune_context;
@@ -128,7 +134,8 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
Bitmapset *root_parent_relids,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ List **leaf_part_oids);
extern PartitionPruneState *CreatePartitionPruneState(EState *estate,
PartitionPruneInfo *pruneinfo);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index 0a7274e26c..0e7245435d 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,7 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ CachedPlan *cplan; /* CachedPlan that supplies the plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +58,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index daf04dcf5c..181cf5ad09 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -42,6 +42,7 @@
#include "storage/condition_variable.h"
#include "utils/hsearch.h"
#include "utils/queryenvironment.h"
+#include "utils/plancache.h"
#include "utils/reltrigger.h"
#include "utils/sharedtuplestore.h"
#include "utils/snapshot.h"
@@ -635,6 +636,7 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ CachedPlan *es_cachedplan;
List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
List *es_part_prune_states; /* List of PartitionPruneState */
List *es_part_prune_results; /* List of Bitmapset */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 8d30b6e896..cc2190ea63 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -116,6 +116,12 @@ typedef struct PlannerGlobal
/* "flat" rangetable for executor */
List *finalrtable;
+ /*
+ * RT indexes of relations subject to removal from the plan due to runtime
+ * pruning at plan initialization time
+ */
+ Bitmapset *prunableRelids;
+
/* "flat" list of RTEPermissionInfos */
List *finalrteperminfos;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 39d0281c23..4f552550c8 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -74,6 +74,10 @@ typedef struct PlannedStmt
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *unprunableRelids; /* RT indexes of relations that are not
+ * subject to runtime pruning; for
+ * AcquireExecutorLocks() */
+
List *permInfos; /* list of RTEPermissionInfo nodes for rtable
* entries needing one */
@@ -1465,6 +1469,13 @@ typedef struct PartitionedRelPruneInfo
/* Indexes of all partitions which subplans or subparts are present for */
Bitmapset *present_parts;
+ /*
+ * RT indexes of all leaf partitions for which subplans are present;
+ * only used during planning to help construct PlannerGlobal.prunableRelids,
+ * and set to NULL afterwards to save space in the final plan tree.
+ */
+ Bitmapset *present_leafpart_rtis;
+
/* Length of the following arrays: */
int nparts;
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index a90dfdf906..0b5ee007ca 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -149,6 +149,7 @@ typedef struct CachedPlan
int magic; /* should equal CACHEDPLAN_MAGIC */
List *stmt_list; /* list of PlannedStmts */
bool is_oneshot; /* is it a "oneshot" plan? */
+ bool is_generic; /* is it a reusable generic plan? */
bool is_saved; /* is CachedPlan in a long-lived context? */
bool is_valid; /* is the stmt_list currently valid? */
Oid planRoleId; /* Role ID the plan was created for */
@@ -235,4 +236,13 @@ extern bool CachedPlanIsSimplyValid(CachedPlanSource *plansource,
extern CachedExpression *GetCachedExpression(Node *expr);
extern void FreeCachedExpression(CachedExpression *cexpr);
+/*
+ * CachedPlanRequiresLocking: should the executor acquire locks?
+ */
+static inline bool
+CachedPlanRequiresLocking(CachedPlan *cplan)
+{
+ return cplan->is_generic;
+}
+
#endif /* PLANCACHE_H */
--
2.43.0
On Tue, Sep 17, 2024 at 9:57 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Thu, Aug 29, 2024 at 10:34 PM Amit Langote <amitlangote09@gmail.com> wrote:
One idea that I think might be worth trying to reduce the footprint of
0003 is to try to lock the prunable relations in a step of InitPlan()
separate from ExecInitNode(), which can be implemented by doing the
initial runtime pruning in that separate step. That way, we'll have
all the necessary locks before calling ExecInitNode() and so we don't
need to sprinkle the CachedPlanStillValid() checks all over the place
and worry about missed checks and dealing with partially initialized
PlanState trees.

I've worked on this and found that it results in a much simpler design.
Attached are 0001 and 0002, which contain patches to refactor the
runtime pruning code. These changes move initial pruning outside of
ExecInitNode() and use the results during ExecInitNode() to determine
the set of child subnodes to initialize.

With that in place, the patches (0003, 0004) that move the locking of
prunable relations from plancache.c into the executor become simpler.
It no longer needs to modify any code called by ExecInitNode(). Since
no new locks are taken during ExecInitNode(), I didn't have to worry
about changing all the code involved in PlanState tree initialization
to add checks for CachedPlan validity. The check is only needed after
performing initial pruning, and if the CachedPlan is invalid,
ExecInitNode() won’t be called in the first place.
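To make the resulting control flow concrete, here is a rough sketch of
the reworked InitPlan() ordering (simplified pseudo-C, not the patch
itself: error handling and most setup are omitted, and
CachedPlanStillValid() stands in for whatever final shape the validity
check takes in the later patches; a NULL es_cachedplan would count as
trivially valid):

    static void
    InitPlan(QueryDesc *queryDesc, int eflags)
    {
        EState     *estate = queryDesc->estate;
        PlannedStmt *plannedstmt = queryDesc->plannedstmt;

        estate->es_plannedstmt = plannedstmt;
        estate->es_cachedplan = queryDesc->cplan;
        estate->es_part_prune_infos = plannedstmt->partPruneInfos;

        /*
         * Run "initial" pruning for every PartitionPruneInfo and, when
         * running a reusable generic plan, lock the leaf partitions that
         * survive.  No new locks are needed past this point.
         */
        ExecDoInitialPruning(estate);

        /*
         * If taking the deferred locks invalidated the plan, stop before
         * building any PlanState nodes; the caller can replan and retry.
         */
        if (!CachedPlanStillValid(estate->es_cachedplan))
            return;

        /* Only now is the plan tree initialized. */
        queryDesc->planstate = ExecInitNode(plannedstmt->planTree,
                                            estate, eflags);
    }

Because all deferred locks are taken in one place before any PlanState
exists, a failed validity check leaves nothing half-built.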
Sorry, I had missed merging some hunks into 0002 that fixed obsolete
comments. Fixed in the attached v54.
Regarding 0002, I was a bit bothered by the need to add a new function
just to iterate over the PartitionPruningDatas and the
PartitionedRelPruningData they contain, solely to initialize the
PartitionPruneContext needed for exec pruning. To address this, I
propose 0003, which moves the initialization of those contexts to be
done "lazily" in find_matching_subplan_recurse(), where they are
actually used. To make this work, I added an is_valid flag to
PartitionPruneContext, which is checked as follows in the code block
where it's initialized:
+ if (unlikely(!pprune->exec_context.is_valid))
I didn't notice any overhead from adding this check to
find_matching_subplans_recurse(), which is called for every instance
of exec pruning, so I think it's worthwhile to consider 0003.
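Concretely, the lazy setup amounts to something like the following near
the top of the pruning recursion (sketch only; the actual
context-building code is elided):

    if (pprune->exec_pruning_steps != NIL &&
        unlikely(!pprune->exec_context.is_valid))
    {
        /*
         * First time exec pruning runs for this node: build the
         * PartitionPruneContext here rather than in a separate pass
         * over all PartitionedRelPruningDatas at executor startup.
         */
        /* ... initialize pprune->exec_context (elided) ... */
        pprune->exec_context.is_valid = true;
    }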
I realized that I had missed considering, in the
delay-locking-to-executor patch (now 0004), that there may be plan
objects belonging to pruned partitions, such as RowMarks and
ResultRelInfos, which should not be initialized.
ExecGetRangeTableRelation() invoked with the RT indexes in these
objects would cause crashes in Assert builds since the pruned
partitions would not have been locked. I've updated the patch to
ignore RowMarks and result relations (in ModifyTable.resultRelations)
for pruned child relations, which required adding more accounting info
to EState to store the bitmapset of unpruned RT indexes. For
ResultRelInfos, I took the approach of memsetting them to 0 for pruned
result relations and adding checks at multiple sites to ensure the
ResultRelInfo being handled is valid. I recall previously proposing
lazy initialization for these objects when first needed [1]/messages/by-id/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp, which
would make the added code unnecessary, but I might save that for
another time.
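The gist of the RowMark handling, using es_unpruned_relids as a purely
illustrative name for the new EState bitmapset of unpruned RT indexes:

    /* In InitPlan(), while walking plannedstmt->rowMarks */
    foreach(l, plannedstmt->rowMarks)
    {
        PlanRowMark *rc = lfirst_node(PlanRowMark, l);

        /*
         * Skip RowMarks of leaf partitions removed by initial pruning;
         * those relations were never locked, so they must not be
         * opened here.
         */
        if (!bms_is_member(rc->rti, estate->es_unpruned_relids))
            continue;

        /* ... existing ExecRowMark setup ... */
    }

Result relations get the analogous treatment, except that, as described
above, their ResultRelInfo slots are zeroed rather than skipped, with
validity checks added at the sites that consume them.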
--
Thanks, Amit Langote
[1]: /messages/by-id/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp
Attachments:
v54-0003-Defer-locking-of-runtime-prunable-relations-to-e.patch (application/x-patch)
From 9ee8384daaff650ebea44e590ced0885fd2be8e3 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Wed, 7 Aug 2024 18:25:51 +0900
Subject: [PATCH v54 3/4] Defer locking of runtime-prunable relations to
executor
When preparing a cached plan for execution, plancache.c locks the
relations contained in the plan's range table to ensure it is safe for
execution. However, this simplistic approach, implemented in
AcquireExecutorLocks(), results in unnecessarily locking relations that
might be pruned during "initial" runtime pruning.
To optimize this, the locking is now deferred for relations that are
subject to "initial" runtime pruning. The planner now provides a set
of "unprunable" relations, available through the new
PlannedStmt.unprunableRelids field. AcquireExecutorLocks() will now
only lock those relations.
PlannedStmt.unprunableRelids is populated by subtracting the set of
initially prunable relids from the set of all RT indexes. The prunable
relids set is constructed by examining all PartitionPruneInfos during
set_plan_refs() and storing the RT indexes of partitions subject to
"initial" pruning steps. While at it, some duplicated code in
set_append_references() and set_mergeappend_references() that
constructs the prunable relids set has been refactored into a common
function.
To enable the executor to determine whether the plan tree it's
executing is a cached one, the CachedPlan is now made available via
the QueryDesc. The executor can call CachedPlanRequiresLocking(),
which returns true if the CachedPlan is a reusable generic plan that
might contain relations needing to be locked. If so, the executor
will lock any relation that is not in PlannedStmt.unprunableRelids.
Finally, an Assert has been added in ExecCheckPermissions() to ensure
that all relations whose permissions are checked have been properly
locked. This helps catch any accidental omission of relations from the
unprunableRelids set that should have their permissions checked.
This deferment introduces a window in which prunable relations may be
altered by concurrent DDL, potentially causing the plan to become
invalid. As a result, the executor might attempt to run an invalid plan,
leading to errors such as being unable to locate a partition-only index
during ExecInitIndexScan(). Future commits will introduce changes to
ready the executor to check plan validity during ExecutorStart() and
retry with a newly created plan if the original one becomes invalid
after taking deferred locks.
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 ++--
src/backend/commands/extension.c | 1 +
src/backend/commands/matview.c | 2 +-
src/backend/commands/prepare.c | 3 +-
src/backend/executor/execMain.c | 45 +++++++++++++++++++++++++-
src/backend/executor/execParallel.c | 9 +++++-
src/backend/executor/execPartition.c | 37 ++++++++++++++++++---
src/backend/executor/functions.c | 1 +
src/backend/executor/nodeAppend.c | 8 ++---
src/backend/executor/nodeMergeAppend.c | 2 +-
src/backend/executor/spi.c | 1 +
src/backend/optimizer/plan/planner.c | 2 ++
src/backend/optimizer/plan/setrefs.c | 11 +++++++
src/backend/partitioning/partprune.c | 20 +++++++++++-
src/backend/tcop/pquery.c | 10 +++++-
src/backend/utils/cache/plancache.c | 40 ++++++++++++++---------
src/include/commands/explain.h | 5 +--
src/include/executor/execPartition.h | 9 +++++-
src/include/executor/execdesc.h | 2 ++
src/include/nodes/execnodes.h | 2 ++
src/include/nodes/pathnodes.h | 6 ++++
src/include/nodes/plannodes.h | 11 +++++++
src/include/utils/plancache.h | 10 ++++++
25 files changed, 209 insertions(+), 39 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 91de442f43..db976f928a 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -552,7 +552,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 0b629b1f79..57a3375cad 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -324,7 +324,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index aaec439892..49f7370734 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -509,7 +509,7 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
}
@@ -617,7 +617,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage,
@@ -673,7 +674,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, cplan, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index fab59ad5f6..bd169edeff 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -742,6 +742,7 @@ execute_sql_string(const char *sql)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 010097873d..69be74b4bd 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -438,7 +438,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 07257d4db9..311b9ebd5b 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -655,7 +655,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
+ ExplainOnePlan(pstmt, cplan, into, es, query_string, paramLI,
+ queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
else
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 8fab8dbccd..cb7a2bc456 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -53,6 +53,7 @@
#include "miscadmin.h"
#include "parser/parse_relation.h"
#include "rewrite/rewriteHandler.h"
+#include "storage/lmgr.h"
#include "tcop/utility.h"
#include "utils/acl.h"
#include "utils/backend_status.h"
@@ -90,6 +91,7 @@ static bool ExecCheckPermissionsModified(Oid relOid, Oid userid,
AclMode requiredPerms);
static void ExecCheckXactReadOnly(PlannedStmt *plannedstmt);
static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
+static inline bool ExecShouldLockRelations(EState *estate);
/* end of local decls */
@@ -600,6 +602,21 @@ ExecCheckPermissions(List *rangeTable, List *rteperminfos,
(rte->rtekind == RTE_SUBQUERY &&
rte->relkind == RELKIND_VIEW));
+ /*
+ * Ensure that we have at least an AccessShareLock on relations
+ * whose permissions need to be checked.
+ *
+ * Skip this check in a parallel worker because locks won't be
+ * taken until ExecInitNode() performs plan initialization.
+ *
+ * XXX: ExecCheckPermissions() in a parallel worker may be
+ * redundant with the checks done in the leader process, so this
+ * should be reviewed to ensure it’s necessary.
+ */
+ Assert(IsParallelWorker() ||
+ CheckRelationOidLockedByMe(rte->relid, AccessShareLock,
+ true));
+
(void) getRTEPermissionInfo(rteperminfos, rte);
/* Many-to-one mapping not allowed */
Assert(!bms_is_member(rte->perminfoindex, indexset));
@@ -862,11 +879,35 @@ ExecDoInitialPruning(EState *estate)
* result.
*/
if (prunestate->do_initial_prune)
- validsubplans = ExecFindMatchingSubPlans(prunestate, true);
+ {
+ List *leaf_partition_oids = NIL;
+
+ validsubplans = ExecFindMatchingSubPlans(prunestate, true,
+ &leaf_partition_oids);
+ if (ExecShouldLockRelations(estate))
+ {
+ ListCell *lc1;
+
+ foreach(lc1, leaf_partition_oids)
+ {
+ LockRelationOid(lfirst_oid(lc1), prunestate->lockmode);
+ }
+ }
+ }
estate->es_part_prune_results = lappend(estate->es_part_prune_results,
validsubplans);
}
}
+/*
+ * Locks might be needed only if running a cached plan that might contain
+ * unlocked relations, such as reused generic plans.
+ */
+static inline bool
+ExecShouldLockRelations(EState *estate)
+{
+ return estate->es_cachedplan == NULL ? false :
+ CachedPlanRequiresLocking(estate->es_cachedplan);
+}
/* ----------------------------------------------------------------
* InitPlan
@@ -880,6 +921,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ CachedPlan *cachedplan = queryDesc->cplan;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -899,6 +941,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
estate->es_plannedstmt = plannedstmt;
+ estate->es_cachedplan = cachedplan;
/*
* Perform runtime "initial" pruning to determine the plan nodes that will
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index b01a2fdfdd..7519c9a860 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1257,8 +1257,15 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
- /* Create a QueryDesc for the query. */
+ /*
+ * Create a QueryDesc for the query. We pass NULL for cachedplan, because
+ * we don't have a pointer to the CachedPlan in the leader's process. It's
+ * fine because the only reason the executor needs to see it is to decide
+ * if it should take locks on certain relations, but parallel workers
+ * always take locks anyway.
+ */
return CreateQueryDesc(pstmt,
+ NULL,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index d205e64e84..f958973378 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -26,6 +26,7 @@
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
#include "rewrite/rewriteManip.h"
+#include "storage/lmgr.h"
#include "utils/acl.h"
#include "utils/lsyscache.h"
#include "utils/partcache.h"
@@ -196,7 +197,8 @@ static void PartitionPruneInitExecPruning(PartitionPruneInfo *pruneinfo,
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans);
+ Bitmapset **validsubplans,
+ List **leaf_part_oids);
/*
@@ -1940,6 +1942,7 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
ALLOCSET_DEFAULT_SIZES);
i = 0;
+ prunestate->lockmode = NoLock;
foreach(lc, pruneinfo->prune_infos)
{
List *partrelpruneinfos = lfirst_node(List, lc);
@@ -1963,6 +1966,15 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
PartitionDesc partdesc;
PartitionKey partkey;
+ /*
+ * Assign the lock mode of the first (root) partitioned table's RTE
+ * as the lock mode to lock leaf partitions after initial pruning,
+ * if needed.
+ */
+ if (prunestate->lockmode == NoLock)
+ prunestate->lockmode = exec_rt_fetch(pinfo->rtindex, estate)->rellockmode;
+ Assert(prunestate->lockmode != NoLock);
+
/*
* We can rely on the copies of the partitioned table's partition
* key and partition descriptor appearing in its relcache entry,
@@ -1995,6 +2007,9 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
pprune->partrel = partrel;
pprune->nparts = partdesc->nparts;
pprune->subplan_map = palloc(sizeof(int) * partdesc->nparts);
+ pprune->relid_map = palloc(sizeof(Oid) * partdesc->nparts);
+ memcpy(pprune->relid_map, partdesc->oids,
+ sizeof(Oid) * partdesc->nparts);
if (partdesc->nparts == pinfo->nparts &&
memcmp(partdesc->oids, pinfo->relid_map,
@@ -2412,10 +2427,13 @@ PartitionPruneInitExecPruning(PartitionPruneInfo *pruneinfo,
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
* runtime pruning.
+ *
+ * leaf_part_oids must be non-NULL if initial_prune is true.
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ List **leaf_part_oids)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2450,7 +2468,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
*/
pprune = &prunedata->partrelprunedata[0];
find_matching_subplans_recurse(prunedata, pprune, initial_prune,
- &result);
+ &result, leaf_part_oids);
/* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
@@ -2464,6 +2482,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (leaf_part_oids)
+ *leaf_part_oids = list_copy(*leaf_part_oids);
MemoryContextReset(prunestate->prune_context);
@@ -2480,7 +2500,8 @@ static void
find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans)
+ Bitmapset **validsubplans,
+ List **leaf_part_oids)
{
Bitmapset *partset;
int i;
@@ -2507,8 +2528,13 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
while ((i = bms_next_member(partset, i)) >= 0)
{
if (pprune->subplan_map[i] >= 0)
+ {
*validsubplans = bms_add_member(*validsubplans,
pprune->subplan_map[i]);
+ if (leaf_part_oids)
+ *leaf_part_oids = lappend_oid(*leaf_part_oids,
+ pprune->relid_map[i]);
+ }
else
{
int partidx = pprune->subpart_map[i];
@@ -2516,7 +2542,8 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
if (partidx >= 0)
find_matching_subplans_recurse(prunedata,
&prunedata->partrelprunedata[partidx],
- initial_prune, validsubplans);
+ initial_prune, validsubplans,
+ leaf_part_oids);
else
{
/*
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 692854e2b3..6f6f45e0ad 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -840,6 +840,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest = None_Receiver;
es->qd = CreateQueryDesc(es->stmt,
+ NULL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index de7ebab5c2..006bdafaea 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -581,7 +581,7 @@ choose_next_subplan_locally(AppendState *node)
else if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
node->as_valid_subplans_identified = true;
}
@@ -648,7 +648,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
node->as_valid_subplans_identified = true;
/*
@@ -724,7 +724,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
node->as_valid_subplans_identified = true;
mark_invalid_subplans_as_finished(node);
@@ -877,7 +877,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
node->as_valid_subplans_identified = true;
classify_matching_subplans(node);
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 3ed91808dd..f7821aa178 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -219,7 +219,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 90d9834576..659bd6dcd9 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -2684,6 +2684,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
snap = InvalidSnapshot;
qdesc = CreateQueryDesc(stmt,
+ cplan,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 1b9071c774..9e47a7fd50 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -549,6 +549,8 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->planTree = top_plan;
result->partPruneInfos = glob->partPruneInfos;
result->rtable = glob->finalrtable;
+ result->unprunableRelids = bms_difference(bms_add_range(NULL, 1, list_length(result->rtable)),
+ glob->prunableRelids);
result->permInfos = glob->finalrteperminfos;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index e2ea406c4e..8ce6d1149d 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1764,8 +1764,19 @@ register_partpruneinfo(PlannerInfo *root, int part_prune_index, int rtoffset)
foreach(l2, prune_infos)
{
PartitionedRelPruneInfo *prelinfo = lfirst(l2);
+ Bitmapset *present_leafpart_rtis = prelinfo->present_leafpart_rtis;
prelinfo->rtindex += rtoffset;
+ present_leafpart_rtis = offset_relid_set(present_leafpart_rtis,
+ rtoffset);
+ if (prelinfo->initial_pruning_steps != NIL)
+ glob->prunableRelids = bms_add_members(glob->prunableRelids,
+ present_leafpart_rtis);
+ /*
+ * Don't need this anymore, so set to NULL to save space in the
+ * final plan tree.
+ */
+ prelinfo->present_leafpart_rtis = NULL;
}
}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 60fabb1734..c022c5ee0b 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -641,6 +641,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
PartitionedRelPruneInfo *pinfo = lfirst(lc);
RelOptInfo *subpart = find_base_rel(root, pinfo->rtindex);
Bitmapset *present_parts;
+ Bitmapset *present_leafpart_rtis;
int nparts = subpart->nparts;
int *subplan_map;
int *subpart_map;
@@ -657,7 +658,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
- present_parts = NULL;
+ present_parts = present_leafpart_rtis = NULL;
i = -1;
while ((i = bms_next_member(subpart->live_parts, i)) >= 0)
@@ -671,9 +672,25 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
+
+ /*
+ * Track the RT indexes of partitions to ensure they are included
+ * in the prunableRelids set of relations that are locked during
+ * execution. This ensures that if the plan is cached, these
+ * partitions are locked when the plan is reused.
+ *
+ * Partitions without a subplan and sub-partitioned partitions
+ * where none of the sub-partitions have a subplan due to
+ * constraint exclusion are not included in this set. Instead,
+ * they are added to the unprunableRelids set, and the relations
+ * in this set are locked by AcquireExecutorLocks() before
+ * executing a cached plan.
+ */
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
+ present_leafpart_rtis = bms_add_member(present_leafpart_rtis,
+ partrel->relid);
/* Record finding this subplan */
subplansfound = bms_add_member(subplansfound, subplanidx);
@@ -691,6 +708,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
/* Record the maps and other information. */
pinfo->present_parts = present_parts;
+ pinfo->present_leafpart_rtis = present_leafpart_rtis;
pinfo->nparts = nparts;
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index a1f8d03db1..6e8f6b1b8f 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -36,6 +36,7 @@ Portal ActivePortal = NULL;
static void ProcessQuery(PlannedStmt *plan,
+ CachedPlan *cplan,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +66,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +79,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->cplan = cplan; /* CachedPlan supplying the plannedstmt */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +125,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * cplan: CachedPlan supplying the plan
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +138,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ CachedPlan *cplan,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +150,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, cplan, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -493,6 +498,7 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->cplan,
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1276,6 +1282,7 @@ PortalRunMulti(Portal portal,
{
/* statement can set tag string */
ProcessQuery(pstmt,
+ portal->cplan,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1285,6 +1292,7 @@ PortalRunMulti(Portal portal,
{
/* stmt added by rewrite cannot set tag */
ProcessQuery(pstmt,
+ portal->cplan,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 5af1a168ec..5b75dadf13 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -104,7 +104,8 @@ static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
static bool CheckCachedPlan(CachedPlanSource *plansource);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv);
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ bool generic);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
@@ -815,8 +816,11 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
*
- * On a "true" return, we have acquired the locks needed to run the plan.
- * (We must do this for the "true" result to be race-condition-free.)
+ * On a "true" return, we have acquired locks on the "unprunableRelids" set
+ * for all plans in plansource->stmt_list. The plans are not completely
+ * race-condition-free until the executor takes locks on the set of prunable
+ * relations that survive initial runtime pruning during executor
+ * initialization.
*/
static bool
CheckCachedPlan(CachedPlanSource *plansource)
@@ -893,10 +897,10 @@ CheckCachedPlan(CachedPlanSource *plansource)
* or it can be set to NIL if we need to re-copy the plansource's query_list.
*
* To build a generic, parameter-value-independent plan, pass NULL for
- * boundParams. To build a custom plan, pass the actual parameter values via
- * boundParams. For best effect, the PARAM_FLAG_CONST flag should be set on
- * each parameter value; otherwise the planner will treat the value as a
- * hint rather than a hard constant.
+ * boundParams, and true for generic. To build a custom plan, pass the actual
+ * parameter values via boundParams, and false for generic. For best effect,
+ * the PARAM_FLAG_CONST flag should be set on each parameter value; otherwise
+ * the planner will treat the value as a hint rather than a hard constant.
*
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
@@ -904,7 +908,8 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv)
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ bool generic)
{
CachedPlan *plan;
List *plist;
@@ -1026,6 +1031,7 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
plan->refcount = 0;
plan->context = plan_context;
plan->is_oneshot = plansource->is_oneshot;
+ plan->is_generic = generic;
plan->is_saved = false;
plan->is_valid = true;
@@ -1196,7 +1202,7 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
else
{
/* Build a new generic plan */
- plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv, true);
/* Just make real sure plansource->gplan is clear */
ReleaseGenericPlan(plansource);
/* Link the new generic plan into the plansource */
@@ -1241,7 +1247,7 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (customplan)
{
/* Build a custom plan */
- plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv, false);
/* Accumulate total costs of custom plans */
plansource->total_custom_cost += cached_plan_cost(plan, true);
@@ -1387,8 +1393,8 @@ CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
}
/*
- * Reject if AcquireExecutorLocks would have anything to do. This is
- * probably unnecessary given the previous check, but let's be safe.
+ * Reject if there are any lockable relations. This is probably
+ * unnecessary given the previous check, but let's be safe.
*/
foreach(lc, plan->stmt_list)
{
@@ -1776,7 +1782,7 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ int rtindex;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1794,9 +1800,13 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ rtindex = -1;
+ while ((rtindex = bms_next_member(plannedstmt->unprunableRelids,
+ rtindex)) >= 0)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ RangeTblEntry *rte = list_nth_node(RangeTblEntry,
+ plannedstmt->rtable,
+ rtindex - 1);
if (!(rte->rtekind == RTE_RELATION ||
(rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 3ab0aae78f..21c71e0d53 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -103,8 +103,9 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
- ExplainState *es, const char *queryString,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
+ IntoClause *into, ExplainState *es,
+ const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
const BufferUsage *bufusage,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index c0ba23097f..496ecef4c4 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -48,6 +48,7 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
+ * relid_map Partition OID by partition index.
* present_parts A Bitmapset of the partition indexes that we
* have subplans or subparts for.
* initial_pruning_steps List of PartitionPruneSteps used to
@@ -65,6 +66,7 @@ typedef struct PartitionedRelPruningData
int nparts;
int *subplan_map;
int *subpart_map;
+ Oid *relid_map;
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
@@ -94,6 +96,9 @@ typedef struct PartitionPruningData
* the clauses being unable to match to any tuple that the subplan could
* possibly produce.
*
+ * lockmode Lock mode to lock the leaf partitions with, if needed;
+ * this is the same as the lock mode that the root
+ * partitioned table would be locked with.
* execparamids Contains paramids of PARAM_EXEC Params found within
* any of the partprunedata structs. Pruning must be
* done again each time the value of one of these
@@ -116,6 +121,7 @@ typedef struct PartitionPruningData
*/
typedef struct PartitionPruneState
{
+ int lockmode;
Bitmapset *execparamids;
Bitmapset *other_subplans;
MemoryContext prune_context;
@@ -131,7 +137,8 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
Bitmapset *root_parent_relids,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ List **leaf_part_oids);
extern PartitionPruneState *CreatePartitionPruneState(EState *estate,
PartitionPruneInfo *pruneinfo);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index 0a7274e26c..0e7245435d 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,7 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ CachedPlan *cplan; /* CachedPlan that supplies the plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +58,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 518a9fcd15..1ed925b99b 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -42,6 +42,7 @@
#include "storage/condition_variable.h"
#include "utils/hsearch.h"
#include "utils/queryenvironment.h"
+#include "utils/plancache.h"
#include "utils/reltrigger.h"
#include "utils/sharedtuplestore.h"
#include "utils/snapshot.h"
@@ -636,6 +637,7 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ CachedPlan *es_cachedplan;
List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
List *es_part_prune_states; /* List of PartitionPruneState */
List *es_part_prune_results; /* List of Bitmapset */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 8d30b6e896..cc2190ea63 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -116,6 +116,12 @@ typedef struct PlannerGlobal
/* "flat" rangetable for executor */
List *finalrtable;
+ /*
+ * RT indexes of relations subject to removal from the plan due to runtime
+ * pruning at plan initialization time
+ */
+ Bitmapset *prunableRelids;
+
/* "flat" list of RTEPermissionInfos */
List *finalrteperminfos;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 39d0281c23..4f552550c8 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -74,6 +74,10 @@ typedef struct PlannedStmt
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *unprunableRelids; /* RT indexes of relations that are not
+ * subject to runtime pruning; for
+ * AcquireExecutorLocks() */
+
List *permInfos; /* list of RTEPermissionInfo nodes for rtable
* entries needing one */
@@ -1465,6 +1469,13 @@ typedef struct PartitionedRelPruneInfo
/* Indexes of all partitions which subplans or subparts are present for */
Bitmapset *present_parts;
+ /*
+ * RT indexes of all leaf partitions for which subplans are present;
+ * only used during planning to help construct PlannerGlobal.prunableRelids,
+ * and set to NULL afterwards to save space in the final plan tree.
+ */
+ Bitmapset *present_leafpart_rtis;
+
/* Length of the following arrays: */
int nparts;
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index a90dfdf906..0b5ee007ca 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -149,6 +149,7 @@ typedef struct CachedPlan
int magic; /* should equal CACHEDPLAN_MAGIC */
List *stmt_list; /* list of PlannedStmts */
bool is_oneshot; /* is it a "oneshot" plan? */
+ bool is_generic; /* is it a reusable generic plan? */
bool is_saved; /* is CachedPlan in a long-lived context? */
bool is_valid; /* is the stmt_list currently valid? */
Oid planRoleId; /* Role ID the plan was created for */
@@ -235,4 +236,13 @@ extern bool CachedPlanIsSimplyValid(CachedPlanSource *plansource,
extern CachedExpression *GetCachedExpression(Node *expr);
extern void FreeCachedExpression(CachedExpression *cexpr);
+/*
+ * CachedPlanRequiresLocking: should the executor acquire locks?
+ */
+static inline bool
+CachedPlanRequiresLocking(CachedPlan *cplan)
+{
+ return cplan->is_generic;
+}
+
#endif /* PLANCACHE_H */
--
2.43.0
v54-0002-Perform-runtime-initial-pruning-outside-ExecInit.patch (application/x-patch)
From 1deb363d5d0b7573d116198798a3e550be9a320f Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Thu, 12 Sep 2024 15:44:43 +0900
Subject: [PATCH v54 2/4] Perform runtime initial pruning outside
ExecInitNode()
This commit follows up on the previous change that moved
PartitionPruneInfos out of individual plan nodes into a list in
PlannedStmt. It moves the initialization of PartitionPruneStates
and runtime initial pruning out of ExecInitNode() and into a new
routine, ExecDoInitialPruning(), which is called by InitPlan()
before ExecInitNode() is invoked on the main plan tree and subplans.
ExecDoInitialPruning() stores the PartitionPruneStates in a list
matching the length of es_part_prune_infos (which holds the
PartitionPruneInfos from PlannedStmt), allowing both lists to share
the same index. It also saves the initial pruning result -- a
bitmapset of indexes for surviving child subnodes -- in a similarly
indexed list.
While the initial pruning is done earlier, the execution pruning
context information (needed for runtime pruning) is initialized
later during ExecInitNode() for the parent plan node, as it requires
access to the parent node's PlanState struct.
---
src/backend/executor/execMain.c | 55 ++++++++
src/backend/executor/execPartition.c | 179 +++++++++++++++++++++------
src/include/executor/execPartition.h | 6 +
src/include/nodes/execnodes.h | 2 +
4 files changed, 202 insertions(+), 40 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index e6197c165e..8fab8dbccd 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -46,6 +46,7 @@
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/executor.h"
+#include "executor/execPartition.h"
#include "executor/nodeSubplan.h"
#include "foreign/fdwapi.h"
#include "mb/pg_wchar.h"
@@ -818,6 +819,54 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
PreventCommandIfParallelMode(CreateCommandName((Node *) plannedstmt));
}
+/*
+ * ExecDoInitialPruning
+ * Perform runtime "initial" pruning, if necessary, to determine the set
+ * of child subnodes that need to be initialized during ExecInitNode()
+ * for plan nodes that support partition pruning.
+ *
+ * For each PartitionPruneInfo in estate->es_part_prune_infos, this function
+ * creates a PartitionPruneState (even if no initial pruning is done) and adds
+ * it to es_part_prune_states. For PartitionPruneInfo entries that include
+ * initial pruning steps, the result of those steps is saved as a bitmapset
+ * of indexes representing child subnodes that are "valid" and should be
+ * initialized for execution.
+ */
+static void
+ExecDoInitialPruning(EState *estate)
+{
+ ListCell *lc;
+
+ foreach(lc, estate->es_part_prune_infos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
+ PartitionPruneState *prunestate;
+ Bitmapset *validsubplans = NULL;
+
+ /*
+ * Create the working data structure for pruning, and save it for use
+ * later in ExecInitPartitionPruning(), which will be called by the
+ * parent plan node's ExecInit* function.
+ */
+ prunestate = CreatePartitionPruneState(estate, pruneinfo);
+ estate->es_part_prune_states = lappend(estate->es_part_prune_states,
+ prunestate);
+
+ /*
+ * Perform an initial partition pruning pass, if necessary, and save
+ * the bitmapset of valid subplans for use in
+ * ExecInitPartitionPruning(). If no initial pruning is performed, we
+ * still store a NULL to ensure that es_part_prune_results is the same
+ * length as es_part_prune_infos. This ensures that
+ * ExecInitPartitionPruning() can use the same index to locate the
+ * result.
+ */
+ if (prunestate->do_initial_prune)
+ validsubplans = ExecFindMatchingSubPlans(prunestate, true);
+ estate->es_part_prune_results = lappend(estate->es_part_prune_results,
+ validsubplans);
+ }
+}
/* ----------------------------------------------------------------
* InitPlan
@@ -850,7 +899,13 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
estate->es_plannedstmt = plannedstmt;
+
+ /*
+ * Perform runtime "initial" pruning to determine the plan nodes that will
+ * not be executed.
+ */
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ ExecDoInitialPruning(estate);
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index ec730674f2..d205e64e84 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -181,8 +181,6 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
int maxfieldlen);
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
-static PartitionPruneState *CreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *pruneinfo);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -192,6 +190,9 @@ static void InitPartitionPruneContext(PartitionPruneContext *context,
static void PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
Bitmapset *initially_valid_subplans,
int n_total_subplans);
+static void PartitionPruneInitExecPruning(PartitionPruneInfo *pruneinfo,
+ PartitionPruneState *prunestate,
+ PlanState *planstate);
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
@@ -1783,20 +1784,26 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
/*
* ExecInitPartitionPruning
- * Initialize data structure needed for run-time partition pruning and
- * do initial pruning if needed
+ * Initialize the data structures needed for runtime "exec" partition
+ * pruning and return the result of initial pruning, if available.
*
* 'root_parent_relids' identifies the relation to which both the parent plan
- * and the PartitionPruneInfo given by 'part_prune_index' belong.
+ * and the PartitionPruneInfo associated with 'part_prune_index' belong.
*
- * On return, *initially_valid_subplans is assigned the set of indexes of
- * child subplans that must be initialized along with the parent plan node.
- * Initial pruning is performed here if needed and in that case only the
- * surviving subplans' indexes are added.
+ * The PartitionPruneState would have been created by ExecDoInitialPruning()
+ * and stored as the part_prune_index'th element of EState.es_part_prune_states.
+ * Here, we initialize only the PartitionPruneContext necessary for execution
+ * pruning.
*
- * If subplans are indeed pruned, subplan_map arrays contained in the returned
- * PartitionPruneState are re-sequenced to not count those, though only if the
- * maps will be needed for subsequent execution pruning passes.
+ * On return, *initially_valid_subplans is assigned the set of indexes of child
+ * subplans that must be initialized alongside the parent plan node. Initial
+ * pruning would have been performed by ExecDoInitialPruning() if necessary,
+ * and the bitmapset of surviving subplans' indexes would have been stored as
+ * the part_prune_index'th element of EState.es_part_prune_results.
+ *
+ * If subplans are pruned, the subplan_map arrays in the returned
+ * PartitionPruneState are re-sequenced to exclude those subplans, but only if
+ * the maps will be needed for subsequent execution pruning passes.
*/
PartitionPruneState *
ExecInitPartitionPruning(PlanState *planstate,
@@ -1821,17 +1828,21 @@ ExecInitPartitionPruning(PlanState *planstate,
bmsToString(root_parent_relids),
bmsToString(pruneinfo->root_parent_relids)));
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, planstate);
-
- /* Create the working data structure for pruning */
- prunestate = CreatePartitionPruneState(planstate, pruneinfo);
-
/*
- * Perform an initial partition prune pass, if required.
+ * ExecDoInitialPruning() must have initialized the PartitionPruneState to
+ * perform the initial pruning. Now we simply need to initialize the
+ * context information for exec pruning.
*/
+ prunestate = list_nth(estate->es_part_prune_states, part_prune_index);
+ Assert(prunestate != NULL);
+ if (prunestate->do_exec_prune)
+ PartitionPruneInitExecPruning(pruneinfo, prunestate, planstate);
+
+ /* Use the result of initial pruning done by ExecDoInitialPruning(). */
if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ *initially_valid_subplans = list_nth_node(Bitmapset,
+ estate->es_part_prune_results,
+ part_prune_index);
else
{
/* No pruning, so we'll need to initialize all subplans */
@@ -1877,16 +1888,23 @@ ExecInitPartitionPruning(PlanState *planstate,
* stored in each PartitionedRelPruningData can be re-used each time we
* re-evaluate which partitions match the pruning steps provided in each
* PartitionedRelPruneInfo.
+ *
+ * Note that we only initialize the PartitionPruneContext (which is placed into
+ * each PartitionedRelPruningData) for initial pruning here. Execution pruning
+ * requires access to the parent plan node's PlanState, which is not available
+ * when this function is called from ExecDoInitialPruning(), so it is
+ * initialized later during ExecInitPartitionPruning() by calling
+ * PartitionPruneInitExecPruning().
*/
-static PartitionPruneState *
-CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
+PartitionPruneState *
+CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
{
- EState *estate = planstate->state;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
+ /* We may need an expression context to evaluate partition exprs */
+ ExprContext *econtext = CreateExprContext(estate);
/* For data reading, executor always includes detached partitions */
if (estate->es_partition_directory == NULL)
@@ -1974,6 +1992,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* set to -1, as if they were pruned. By construction, both
* arrays are in partition bounds order.
*/
+ pprune->partrel = partrel;
pprune->nparts = partdesc->nparts;
pprune->subplan_map = palloc(sizeof(int) * partdesc->nparts);
@@ -2073,29 +2092,31 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
{
InitPartitionPruneContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
- partdesc, partkey, planstate,
+ partdesc, partkey, NULL,
econtext);
/* Record whether initial pruning is needed at any level */
prunestate->do_initial_prune = true;
}
- pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps &&
- !(econtext->ecxt_estate->es_top_eflags & EXEC_FLAG_EXPLAIN_GENERIC))
- {
- InitPartitionPruneContext(&pprune->exec_context,
- pinfo->exec_pruning_steps,
- partdesc, partkey, planstate,
- econtext);
- /* Record whether exec pruning is needed at any level */
- prunestate->do_exec_prune = true;
- }
/*
- * Accumulate the IDs of all PARAM_EXEC Params affecting the
- * partitioning decisions at this plan node.
+ * The exec pruning context will be initialized in
+ * ExecInitPartitionPruning() when called during the initialization
+ * of the parent plan node.
+ *
+ * pprune->exec_pruning_steps is set to NIL to prevent
+ * ExecFindMatchingSubPlans() from accessing an uninitialized
+ * pprune->exec_context during the initial pruning by
+ * ExecDoInitialPruning().
+ *
+ * prunestate->do_exec_prune is set to indicate whether
+ * PartitionPruneInitExecPruning() needs to be called by
+ * ExecInitPartitionPruning(). This optimization avoids
+ * unnecessary cycles when only initial pruning is required.
*/
- prunestate->execparamids = bms_add_members(prunestate->execparamids,
- pinfo->execparamids);
+ pprune->exec_pruning_steps = NIL;
+ if (pinfo->exec_pruning_steps &&
+ !(econtext->ecxt_estate->es_top_eflags & EXEC_FLAG_EXPLAIN_GENERIC))
+ prunestate->do_exec_prune = true;
j++;
}
@@ -2305,6 +2326,84 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
pfree(new_subplan_indexes);
}
+/*
+ * PartitionPruneInitExecPruning
+ * Initialize PartitionPruneState for exec pruning.
+ */
+static void
+PartitionPruneInitExecPruning(PartitionPruneInfo *pruneinfo,
+ PartitionPruneState *prunestate,
+ PlanState *planstate)
+{
+ EState *estate = planstate->state;
+ int i;
+ ExprContext *econtext;
+
+ /* CreatePartitionPruneState() must have initialized the partition directory. */
+ Assert(estate->es_partition_directory != NULL);
+
+ /* CreatePartitionPruneState() must have set this. */
+ Assert(prunestate->do_exec_prune);
+
+ /*
+ * Create ExprContext if not already done for the planstate. We may need
+ * an expression context to evaluate partition exprs.
+ */
+ ExecAssignExprContext(estate, planstate);
+ econtext = planstate->ps_ExprContext;
+ for (i = 0; i < prunestate->num_partprunedata; i++)
+ {
+ List *partrel_pruneinfos =
+ list_nth_node(List, pruneinfo->prune_infos, i);
+ PartitionPruningData *prunedata = prunestate->partprunedata[i];
+ int j;
+
+ for (j = 0; j < prunedata->num_partrelprunedata; j++)
+ {
+ PartitionedRelPruneInfo *pinfo =
+ list_nth_node(PartitionedRelPruneInfo, partrel_pruneinfos, j);
+ PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
+ Relation partrel = pprune->partrel;
+ PartitionDesc partdesc;
+ PartitionKey partkey;
+
+ /*
+ * Nothing to do if there are no exec pruning steps, but do set
+ * pprune->exec_pruning_steps, because
+ * find_matching_subplans_recurse() looks at it.
+ *
+ * Also skip if doing EXPLAIN (GENERIC_PLAN), since parameter
+ * values may be missing.
+ */
+ pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
+ if (pprune->exec_pruning_steps == NIL ||
+ (econtext->ecxt_estate->es_top_eflags & EXEC_FLAG_EXPLAIN_GENERIC))
+ continue;
+
+ /*
+ * We can rely on the copies of the partitioned table's partition
+ * key and partition descriptor appearing in its relcache entry,
+ * because that entry will be held open and locked for the
+ * duration of this executor run.
+ */
+ partkey = RelationGetPartitionKey(partrel);
+ partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
+ partrel);
+ InitPartitionPruneContext(&pprune->exec_context,
+ pprune->exec_pruning_steps,
+ partdesc, partkey, planstate,
+ econtext);
+
+ /*
+ * Accumulate the IDs of all PARAM_EXEC Params affecting the
+ * partitioning decisions at this plan node.
+ */
+ prunestate->execparamids = bms_add_members(prunestate->execparamids,
+ pinfo->execparamids);
+ }
+ }
+}
+
/*
* ExecFindMatchingSubPlans
* Determine which subplans match the pruning steps detailed in
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 12aacc84ff..c0ba23097f 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -42,6 +42,9 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* PartitionedRelPruneInfo (see plannodes.h); though note that here,
* subpart_map contains indexes into PartitionPruningData.partrelprunedata[].
*
+ * partrel Partitioned table; points to
+ * EState.es_relations[rti-1], where rti is the
+ * table's RT index
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
@@ -58,6 +61,7 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
*/
typedef struct PartitionedRelPruningData
{
+ Relation partrel;
int nparts;
int *subplan_map;
int *subpart_map;
@@ -128,4 +132,6 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
bool initial_prune);
+extern PartitionPruneState *CreatePartitionPruneState(EState *estate,
+ PartitionPruneInfo *pruneinfo);
#endif /* EXECPARTITION_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 22b928e085..518a9fcd15 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -637,6 +637,8 @@ typedef struct EState
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
+ List *es_part_prune_states; /* List of PartitionPruneState */
+ List *es_part_prune_results; /* List of Bitmapset */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
--
2.43.0
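For clarity, here is a condensed sketch of the index-aligned lookup that the three parallel EState lists enable; the helper name is hypothetical, and the real logic lives in ExecInitPartitionPruning():

#include "postgres.h"
#include "executor/execPartition.h"
#include "nodes/execnodes.h"

/*
 * Hypothetical helper summarizing the contract: entry i of
 * es_part_prune_states and es_part_prune_results describes the same
 * PartitionPruneInfo as entry i of es_part_prune_infos.
 */
static PartitionPruneState *
fetch_initial_pruning_result(EState *estate, int part_prune_index,
                             Bitmapset **initially_valid_subplans)
{
    PartitionPruneState *prunestate =
        list_nth(estate->es_part_prune_states, part_prune_index);

    /*
     * A NULL bitmapset means no initial pruning was done for this node;
     * the caller then initializes all of its subplans.
     */
    *initially_valid_subplans =
        list_nth_node(Bitmapset, estate->es_part_prune_results,
                      part_prune_index);
    return prunestate;
}

The NULL convention is also why ExecDoInitialPruning() appends an entry even when it performs no pruning: the lists must stay the same length so that a single index locates all three pieces of state.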
v54-0001-Move-PartitionPruneInfo-out-of-plan-nodes-into-P.patch (application/x-patch)
From cf75d48323a3c28d272e34c942f123a2e04044fd Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Fri, 6 Sep 2024 13:11:05 +0900
Subject: [PATCH v54 1/4] Move PartitionPruneInfo out of plan nodes into
PlannedStmt
This change moves PartitionPruneInfo from individual plan nodes to
PlannedStmt, allowing runtime initial pruning to be performed across
the entire plan tree without traversing the tree to find nodes
containing PartitionPruneInfos.
The PartitionPruneInfo pointer fields in Append and MergeAppend nodes
have been replaced with an integer index that points to
PartitionPruneInfos in a list within PlannedStmt, which holds the
PartitionPruneInfos for all subqueries.
Reviewed-by: Alvaro Herrera
---
src/backend/executor/execMain.c | 1 +
src/backend/executor/execParallel.c | 1 +
src/backend/executor/execPartition.c | 19 +++++-
src/backend/executor/execUtils.c | 1 +
src/backend/executor/nodeAppend.c | 5 +-
src/backend/executor/nodeMergeAppend.c | 5 +-
src/backend/optimizer/plan/createplan.c | 24 +++----
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 86 ++++++++++++++++---------
src/backend/partitioning/partprune.c | 19 ++++--
src/include/executor/execPartition.h | 4 +-
src/include/nodes/execnodes.h | 1 +
src/include/nodes/pathnodes.h | 6 ++
src/include/nodes/plannodes.h | 14 ++--
src/include/partitioning/partprune.h | 8 +--
15 files changed, 133 insertions(+), 62 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 7042ca6c60..e6197c165e 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -850,6 +850,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
estate->es_plannedstmt = plannedstmt;
+ estate->es_part_prune_infos = plannedstmt->partPruneInfos;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index bfb3419efb..b01a2fdfdd 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -181,6 +181,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
pstmt->planTree = plan;
+ pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
pstmt->permInfos = estate->es_rteperminfos;
pstmt->resultRelations = NIL;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 7651886229..ec730674f2 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1786,6 +1786,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* Initialize data structure needed for run-time partition pruning and
* do initial pruning if needed
*
+ * 'root_parent_relids' identifies the relation to which both the parent plan
+ * and the PartitionPruneInfo given by 'part_prune_index' belong.
+ *
* On return, *initially_valid_subplans is assigned the set of indexes of
* child subplans that must be initialized along with the parent plan node.
* Initial pruning is performed here if needed and in that case only the
@@ -1798,11 +1801,25 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
PartitionPruneState *
ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
+ Bitmapset *root_parent_relids,
Bitmapset **initially_valid_subplans)
{
PartitionPruneState *prunestate;
EState *estate = planstate->state;
+ PartitionPruneInfo *pruneinfo;
+
+ /* Obtain the pruneinfo we need, and make sure it's the right one */
+ pruneinfo = list_nth_node(PartitionPruneInfo, estate->es_part_prune_infos,
+ part_prune_index);
+ if (!bms_equal(root_parent_relids, pruneinfo->root_parent_relids))
+ ereport(ERROR,
+ errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg_internal("mismatching PartitionPruneInfo found at part_prune_index %d",
+ part_prune_index),
+ errdetail_internal("plan node relids %s, pruneinfo relids %s",
+ bmsToString(root_parent_relids),
+ bmsToString(pruneinfo->root_parent_relids)));
/* We may need an expression context to evaluate partition exprs */
ExecAssignExprContext(estate, planstate);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 5737f9f4eb..67734979b0 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -118,6 +118,7 @@ CreateExecutorState(void)
estate->es_rowmarks = NULL;
estate->es_rteperminfos = NIL;
estate->es_plannedstmt = NULL;
+ estate->es_part_prune_infos = NIL;
estate->es_junkFilter = NULL;
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index ca0f54d676..de7ebab5c2 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -134,7 +134,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_begun = false;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -145,7 +145,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&appendstate->ps,
list_length(node->appendplans),
- node->part_prune_info,
+ node->part_prune_index,
+ node->apprelids,
&validsubplans);
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index e1b9b984a7..3ed91808dd 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -82,7 +82,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.ExecProcNode = ExecMergeAppend;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -93,7 +93,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&mergestate->ps,
list_length(node->mergeplans),
- node->part_prune_info,
+ node->part_prune_index,
+ node->apprelids,
&validsubplans);
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index bb45ef318f..6642d09a39 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1225,7 +1225,6 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
ListCell *subpaths;
int nasyncplans = 0;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
int nodenumsortkeys = 0;
AttrNumber *nodeSortColIdx = NULL;
Oid *nodeSortOperators = NULL;
@@ -1376,6 +1375,9 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ plan->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1399,16 +1401,14 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
}
if (prunequal != NIL)
- partpruneinfo =
- make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ plan->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
plan->appendplans = subplans;
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
- plan->part_prune_info = partpruneinfo;
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -1447,7 +1447,6 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
List *subplans = NIL;
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1540,6 +1539,9 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ node->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1555,13 +1557,13 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
Assert(best_path->path.param_info == NULL);
if (prunequal != NIL)
- partpruneinfo = make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ node->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
node->mergeplans = subplans;
- node->part_prune_info = partpruneinfo;
+
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index df35d1ff9c..1b9071c774 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -547,6 +547,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->dependsOnRole = glob->dependsOnRole;
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
+ result->partPruneInfos = glob->partPruneInfos;
result->rtable = glob->finalrtable;
result->permInfos = glob->finalrteperminfos;
result->resultRelations = glob->resultRelations;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 91c7c4fe2f..e2ea406c4e 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1732,6 +1732,48 @@ set_customscan_references(PlannerInfo *root,
cscan->custom_relids = offset_relid_set(cscan->custom_relids, rtoffset);
}
+/*
+ * register_partpruneinfo
+ * Subroutine for set_append_references and set_mergeappend_references
+ *
+ * Add the PartitionPruneInfo from root->partPruneInfos at the given index
+ * into PlannerGlobal->partPruneInfos and return its index there.
+ *
+ * Also update the RT indexes present in PartitionedRelPruneInfos to add the
+ * offset.
+ */
+static int
+register_partpruneinfo(PlannerInfo *root, int part_prune_index, int rtoffset)
+{
+ PlannerGlobal *glob = root->glob;
+ PartitionPruneInfo *pinfo;
+ ListCell *l;
+
+ Assert(part_prune_index >= 0 &&
+ part_prune_index < list_length(root->partPruneInfos));
+ pinfo = list_nth_node(PartitionPruneInfo, root->partPruneInfos,
+ part_prune_index);
+
+ pinfo->root_parent_relids = offset_relid_set(pinfo->root_parent_relids,
+ rtoffset);
+ foreach(l, pinfo->prune_infos)
+ {
+ List *prune_infos = lfirst(l);
+ ListCell *l2;
+
+ foreach(l2, prune_infos)
+ {
+ PartitionedRelPruneInfo *prelinfo = lfirst(l2);
+
+ prelinfo->rtindex += rtoffset;
+ }
+ }
+
+ glob->partPruneInfos = lappend(glob->partPruneInfos, pinfo);
+
+ return list_length(glob->partPruneInfos) - 1;
+}
+
/*
* set_append_references
* Do set_plan_references processing on an Append
@@ -1784,21 +1826,13 @@ set_append_references(PlannerInfo *root,
aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
- if (aplan->part_prune_info)
- {
- foreach(l, aplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * Add PartitionPruneInfo, if any, to PlannerGlobal and update the index.
+ * Also update the RT indexes present in it to add the offset.
+ */
+ if (aplan->part_prune_index >= 0)
+ aplan->part_prune_index =
+ register_partpruneinfo(root, aplan->part_prune_index, rtoffset);
/* We don't need to recurse to lefttree or righttree ... */
Assert(aplan->plan.lefttree == NULL);
@@ -1860,21 +1894,13 @@ set_mergeappend_references(PlannerInfo *root,
mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
- if (mplan->part_prune_info)
- {
- foreach(l, mplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * Add PartitionPruneInfo, if any, to PlannerGlobal and update the index.
+ * Also update the RT indexes present in it to add the offset.
+ */
+ if (mplan->part_prune_index >= 0)
+ mplan->part_prune_index =
+ register_partpruneinfo(root, mplan->part_prune_index, rtoffset);
/* We don't need to recurse to lefttree or righttree ... */
Assert(mplan->plan.lefttree == NULL);
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 9a1a7faac7..60fabb1734 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -207,16 +207,20 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
/*
* make_partition_pruneinfo
- * Builds a PartitionPruneInfo which can be used in the executor to allow
- * additional partition pruning to take place. Returns NULL when
- * partition pruning would be useless.
+ * Checks if the given set of quals can be used to build pruning steps
+ * that the executor can use to prune away unneeded partitions. If
+ * suitable quals are found then a PartitionPruneInfo is built and tagged
+ * onto the PlannerInfo's partPruneInfos list.
+ *
+ * The return value is the 0-based index of the item added to the
+ * partPruneInfos list or -1 if nothing was added.
*
* 'parentrel' is the RelOptInfo for an appendrel, and 'subpaths' is the list
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
*/
-PartitionPruneInfo *
+int
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
List *prunequal)
@@ -330,10 +334,11 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* quals, then we can just not bother with run-time pruning.
*/
if (prunerelinfos == NIL)
- return NULL;
+ return -1;
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
+ pruneinfo->root_parent_relids = parentrel->relids;
pruneinfo->prune_infos = prunerelinfos;
/*
@@ -356,7 +361,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
else
pruneinfo->other_subplans = NULL;
- return pruneinfo;
+ root->partPruneInfos = lappend(root->partPruneInfos, pruneinfo);
+
+ return list_length(root->partPruneInfos) - 1;
}
/*
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index c09bc83b2a..12aacc84ff 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -123,9 +123,9 @@ typedef struct PartitionPruneState
extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
+ Bitmapset *root_parent_relids,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
bool initial_prune);
-
#endif /* EXECPARTITION_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 88467977f8..22b928e085 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -636,6 +636,7 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 07e2415398..8d30b6e896 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -128,6 +128,9 @@ typedef struct PlannerGlobal
/* "flat" list of AppendRelInfos */
List *appendRelations;
+ /* List of PartitionPruneInfo contained in the plan */
+ List *partPruneInfos;
+
/* OIDs of relations the plan depends on */
List *relationOids;
@@ -559,6 +562,9 @@ struct PlannerInfo
/* Does this query modify any partition key columns? */
bool partColsUpdated;
+
+ /* PartitionPruneInfos added in this query's plan. */
+ List *partPruneInfos;
};
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 62cd6a6666..39d0281c23 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -69,6 +69,9 @@ typedef struct PlannedStmt
struct Plan *planTree; /* tree of Plan nodes */
+ List *partPruneInfos; /* List of PartitionPruneInfo contained in the
+ * plan */
+
List *rtable; /* list of RangeTblEntry nodes */
List *permInfos; /* list of RTEPermissionInfo nodes for rtable
@@ -276,8 +279,8 @@ typedef struct Append
*/
int first_partial_plan;
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} Append;
/* ----------------
@@ -311,8 +314,8 @@ typedef struct MergeAppend
/* NULLS FIRST/LAST directions */
bool *nullsFirst pg_node_attr(array_size(numCols));
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} MergeAppend;
/* ----------------
@@ -1414,6 +1417,8 @@ typedef struct PlanRowMark
* Then, since an Append-type node could have multiple partitioning
* hierarchies among its children, we have an unordered List of those Lists.
*
+ * root_parent_relids RelOptInfo.relids of the relation to which the parent
+ * plan node and this PartitionPruneInfo node belong
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
@@ -1426,6 +1431,7 @@ typedef struct PartitionPruneInfo
pg_node_attr(no_equal, no_query_jumble)
NodeTag type;
+ Bitmapset *root_parent_relids;
List *prune_infos;
Bitmapset *other_subplans;
} PartitionPruneInfo;
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index bd490d154f..c536a1fe19 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -70,10 +70,10 @@ typedef struct PartitionPruneContext
#define PruneCxtStateIdx(partnatts, step_id, keyno) \
((partnatts) * (step_id) + (keyno))
-extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
- struct RelOptInfo *parentrel,
- List *subpaths,
- List *prunequal);
+extern int make_partition_pruneinfo(struct PlannerInfo *root,
+ struct RelOptInfo *parentrel,
+ List *subpaths,
+ List *prunequal);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
--
2.43.0
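As a small illustration of what the flattening buys -- hypothetical function, not in the patch -- any whole-plan pass over the pruning information is now a simple list walk rather than a recursive search for Append/MergeAppend nodes:

#include "postgres.h"
#include "nodes/plannodes.h"

/* Hypothetical sketch of a whole-plan pass over the flattened list. */
static void
report_prune_infos(PlannedStmt *stmt)
{
    ListCell   *lc;
    int         idx = 0;

    foreach(lc, stmt->partPruneInfos)
    {
        PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);

        elog(DEBUG1, "pruneinfo %d: %d partition hierarchies, parent relids %s",
             idx++, list_length(pruneinfo->prune_infos),
             bmsToString(pruneinfo->root_parent_relids));
    }
}

ExecDoInitialPruning() in patch 0002 is exactly such a pass, which is why 0001 is a prerequisite for doing initial pruning before the plan tree is walked.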
v54-0004-Handle-CachedPlan-invalidation-in-the-executor.patch (application/x-patch)
From 3916c8617ba777317d01aa11c89b3276b46fe7a0 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Thu, 22 Aug 2024 19:38:13 +0900
Subject: [PATCH v54 4/4] Handle CachedPlan invalidation in the executor
This commit makes changes to handle cases where a cached plan
becomes invalid before deferred locks on prunable relations are taken.
* Add checks at various points in ExecutorStart() and its called
functions to determine if the plan becomes invalid. If detected,
the function and its callers return immediately. A previous commit
ensures any partially initialized PlanState tree objects are cleaned
up appropriately.
* Introduce ExecutorStartExt(), a wrapper over ExecutorStart(), to
handle cases where plan initialization is aborted due to invalidation.
ExecutorStartExt() creates a new transient CachedPlan if needed and
retries execution. This new entry point is only required for sites
using plancache.c. It requires passing the QueryDesc, eflags,
CachedPlanSource, and query_index (index in CachedPlanSource.query_list).
* Add GetSingleCachedPlan() in plancache.c to create a transient
CachedPlan for a specified query in the given CachedPlanSource.
Such CachedPlans are tracked in a separate global list for the
plancache invalidation callbacks to check.
This also adds isolation tests using the delay_execution test module
to verify scenarios where a CachedPlan becomes invalid before the
deferred locks are taken.
All ExecutorStart_hook implementations now must add the following
block after the ExecutorStart() call to ensure it doesn't work with an
invalid plan:
/* The plan may have become invalid during ExecutorStart() */
if (!ExecPlanStillValid(queryDesc->estate))
return;
Reviewed-by: Robert Haas
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
contrib/auto_explain/auto_explain.c | 4 +
.../pg_stat_statements/pg_stat_statements.c | 4 +
src/backend/commands/explain.c | 8 +-
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 10 +-
src/backend/commands/trigger.c | 14 ++
src/backend/executor/README | 35 ++-
src/backend/executor/execMain.c | 84 ++++++-
src/backend/executor/execUtils.c | 3 +-
src/backend/executor/spi.c | 19 +-
src/backend/tcop/postgres.c | 4 +-
src/backend/tcop/pquery.c | 31 ++-
src/backend/utils/cache/plancache.c | 206 ++++++++++++++++++
src/backend/utils/mmgr/portalmem.c | 4 +-
src/include/commands/explain.h | 1 +
src/include/commands/trigger.h | 1 +
src/include/executor/execdesc.h | 1 +
src/include/executor/executor.h | 17 ++
src/include/nodes/execnodes.h | 1 +
src/include/utils/plancache.h | 26 +++
src/include/utils/portal.h | 4 +-
src/test/modules/delay_execution/Makefile | 3 +-
.../modules/delay_execution/delay_execution.c | 63 +++++-
.../expected/cached-plan-inval.out | 175 +++++++++++++++
src/test/modules/delay_execution/meson.build | 1 +
.../specs/cached-plan-inval.spec | 65 ++++++
26 files changed, 749 insertions(+), 36 deletions(-)
create mode 100644 src/test/modules/delay_execution/expected/cached-plan-inval.out
create mode 100644 src/test/modules/delay_execution/specs/cached-plan-inval.spec
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index 677c135f59..9eb5e9a619 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -300,6 +300,10 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
else
standard_ExecutorStart(queryDesc, eflags);
+ /* The plan may have become invalid during standard_ExecutorStart() */
+ if (!ExecPlanStillValid(queryDesc->estate))
+ return;
+
if (auto_explain_enabled())
{
/*
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 3c72e437f7..76642b557a 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -985,6 +985,10 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
else
standard_ExecutorStart(queryDesc, eflags);
+ /* The plan may have become invalid during standard_ExecutorStart() */
+ if (!ExecPlanStillValid(queryDesc->estate))
+ return;
+
/*
* If query has queryId zero, don't track it. This prevents double
* counting of optimizable statements that are directly contained in
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 49f7370734..b7a0b8c05b 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -509,7 +509,8 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, NULL, -1, into, es, queryString, params,
+ queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
}
@@ -618,6 +619,7 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
*/
void
ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
+ CachedPlanSource *plansource, int query_index,
IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
@@ -688,8 +690,8 @@ ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
if (into)
eflags |= GetIntoRelEFlags(into);
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, eflags);
+ /* Call ExecutorStartExt to prepare the plan for execution. */
+ ExecutorStartExt(queryDesc, eflags, plansource, query_index);
/* Execute the plan for statistics if asked for */
if (es->analyze)
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 4f6acf6719..4b1503c05e 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -107,6 +107,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ NULL,
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 311b9ebd5b..4cd79a6e3a 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -202,7 +202,8 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
- cplan);
+ cplan,
+ entry->plansource);
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
@@ -583,6 +584,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
MemoryContextCounters mem_counters;
MemoryContext planner_ctx = NULL;
MemoryContext saved_ctx = NULL;
+ int i = 0;
if (es->memory)
{
@@ -655,8 +657,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, cplan, into, es, query_string, paramLI,
- queryEnv,
+ ExplainOnePlan(pstmt, cplan, entry->plansource, i,
+ into, es, query_string, paramLI, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
else
@@ -668,6 +670,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Separate plans with an appropriate separator */
if (lnext(plan_list, p) != NULL)
ExplainSeparatePlans(es);
+
+ i++;
}
if (estate)
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 29d30bfb6f..e33b8f573b 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -5120,6 +5120,20 @@ AfterTriggerEndQuery(EState *estate)
afterTriggers.query_depth--;
}
+/* ----------
+ * AfterTriggerAbortQuery()
+ *
+ * Called by ExecutorEnd() if the query execution was aborted due to the
+ * plan becoming invalid during initialization.
+ * ----------
+ */
+void
+AfterTriggerAbortQuery(void)
+{
+ /* Revert the actions of AfterTriggerBeginQuery(). */
+ afterTriggers.query_depth--;
+}
+
/*
* AfterTriggerFreeQuery
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 642d63be61..c76a00b394 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -280,6 +280,28 @@ are typically reset to empty once per tuple. Per-tuple contexts are usually
associated with ExprContexts, and commonly each PlanState node has its own
ExprContext to evaluate its qual and targetlist expressions in.
+Relation Locking
+----------------
+
+Typically, when the executor initializes a plan tree for execution, it doesn't
+lock non-index relations if the plan tree is freshly generated and not derived
+from a CachedPlan. This is because such locks have already been established
+during the query's parsing, rewriting, and planning phases. However, with a
+cached plan tree, some relations may remain unlocked. The function
+AcquireExecutorLocks() only locks unprunable relations in the plan, deferring
+the locking of prunable ones to executor initialization. This avoids
+unnecessary locking of relations that will be pruned during "initial" runtime
+pruning in ExecDoInitialPruning().
+
+This approach creates a window where a cached plan tree with child tables
+could become outdated if another backend modifies these tables before
+ExecDoInitialPruning() locks them. As a result, the executor has the added duty
+to verify the plan tree's validity whenever it locks a child table after
+doing initial pruning. This validation is done by checking the CachedPlan.is_valid
+attribute. If the plan tree is outdated (is_valid=false), the executor halts
+further initialization, cleans up anything in EState that would have been
+allocated up to that point, and retries execution after recreating the
+invalid plan in the CachedPlan.
Query Processing Control Flow
-----------------------------
@@ -288,11 +310,13 @@ This is a sketch of control flow for full query processing:
CreateQueryDesc
- ExecutorStart
+ ExecutorStart or ExecutorStartExt
CreateExecutorState
creates per-query context
- switch to per-query context to run ExecInitNode
+ switch to per-query context to run ExecDoInitialPruning and ExecInitNode
AfterTriggerBeginQuery
+ ExecDoInitialPruning
+ does initial pruning and locks surviving partitions if needed
ExecInitNode --- recursively scans plan tree
ExecInitNode
recurse into subsidiary nodes
@@ -316,7 +340,12 @@ This is a sketch of control flow for full query processing:
FreeQueryDesc
-Per above comments, it's not really critical for ExecEndNode to free any
+As mentioned in the "Relation Locking" section, if the plan tree is found to
+be stale after locking partitions in ExecDoInitialPruning(), the control is
+immediately returned to ExecutorStartExt(), which will create a new plan tree
+and perform the steps starting from CreateExecutorState() again.
+
+Per above comments, it's not really critical for ExecEndPlan to free any
memory; it'll all go away in FreeExecutorState anyway. However, we do need to
be careful to close relations, drop buffer pins, etc, so we do need to scan
the plan state tree to find these sorts of resources.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index cb7a2bc456..4065e01f10 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -59,6 +59,7 @@
#include "utils/backend_status.h"
#include "utils/lsyscache.h"
#include "utils/partcache.h"
+#include "utils/plancache.h"
#include "utils/rls.h"
#include "utils/snapmgr.h"
@@ -137,6 +138,60 @@ ExecutorStart(QueryDesc *queryDesc, int eflags)
standard_ExecutorStart(queryDesc, eflags);
}
+/*
+ * A variant of ExecutorStart() that handles cleanup and replanning if the
+ * input CachedPlan becomes invalid due to locks being taken during
+ * ExecutorStartInternal(). If that happens, a new CachedPlan is created
+ * only for the at the index 'query_index' in plansource->query_list, which
+ * is released separately from the original CachedPlan.
+ */
+void
+ExecutorStartExt(QueryDesc *queryDesc, int eflags,
+ CachedPlanSource *plansource,
+ int query_index)
+{
+ if (queryDesc->cplan == NULL)
+ {
+ ExecutorStart(queryDesc, eflags);
+ return;
+ }
+
+ while (1)
+ {
+ ExecutorStart(queryDesc, eflags);
+ if (!CachedPlanValid(queryDesc->cplan))
+ {
+ CachedPlan *cplan;
+
+ /*
+ * The plan got invalidated, so try with a new updated plan.
+ *
+ * But first undo what ExecutorStart() would've done. Mark
+ * execution as aborted to ensure that AFTER trigger state is
+ * properly reset.
+ */
+ queryDesc->estate->es_aborted = true;
+ ExecutorEnd(queryDesc);
+
+ cplan = GetSingleCachedPlan(plansource, query_index,
+ queryDesc->queryEnv);
+
+ /*
+ * Install the new transient cplan into the QueryDesc replacing
+ * the old one so that executor initialization code can see it.
+ * Mark it as in use by us and ask FreeQueryDesc() to release it.
+ */
+ cplan->refcount = 1;
+ queryDesc->cplan = cplan;
+ queryDesc->cplan_release = true;
+ queryDesc->plannedstmt = linitial_node(PlannedStmt,
+ queryDesc->cplan->stmt_list);
+ }
+ else
+ break; /* ExecutorStart() succeeded! */
+ }
+}
+
void
standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
@@ -320,6 +375,7 @@ standard_ExecutorRun(QueryDesc *queryDesc,
estate = queryDesc->estate;
Assert(estate != NULL);
+ Assert(!estate->es_aborted);
Assert(!(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
/* caller must ensure the query's snapshot is active */
@@ -426,8 +482,11 @@ standard_ExecutorFinish(QueryDesc *queryDesc)
Assert(estate != NULL);
Assert(!(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
- /* This should be run once and only once per Executor instance */
- Assert(!estate->es_finished);
+ /*
+ * This should be run once and only once per Executor instance and never
+ * if the execution was aborted.
+ */
+ Assert(!estate->es_finished && !estate->es_aborted);
/* Switch into per-query memory context */
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
@@ -486,11 +545,10 @@ standard_ExecutorEnd(QueryDesc *queryDesc)
Assert(estate != NULL);
/*
- * Check that ExecutorFinish was called, unless in EXPLAIN-only mode. This
- * Assert is needed because ExecutorFinish is new as of 9.1, and callers
- * might forget to call it.
+ * Check that ExecutorFinish was called, unless in EXPLAIN-only mode or if
+ * execution was aborted.
*/
- Assert(estate->es_finished ||
+ Assert(estate->es_finished || estate->es_aborted ||
(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
/*
@@ -504,6 +562,14 @@ standard_ExecutorEnd(QueryDesc *queryDesc)
UnregisterSnapshot(estate->es_snapshot);
UnregisterSnapshot(estate->es_crosscheck_snapshot);
+ /*
+ * Reset AFTER trigger module if the query execution was aborted.
+ */
+ if (estate->es_aborted &&
+ !(estate->es_top_eflags &
+ (EXEC_FLAG_SKIP_TRIGGERS | EXEC_FLAG_EXPLAIN_ONLY)))
+ AfterTriggerAbortQuery();
+
/*
* Must switch out of context before destroying it
*/
@@ -950,6 +1016,9 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
ExecDoInitialPruning(estate);
+ if (!ExecPlanStillValid(estate))
+ return;
+
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
*/
@@ -2931,6 +3000,9 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
* the snapshot, rangetable, and external Param info. They need their own
* copies of local state, including a tuple table, es_param_exec_vals,
* result-rel info, etc.
+ *
+ * es_cachedplan is not copied because EPQ plan execution does not acquire
+ * any new locks that could invalidate the CachedPlan.
*/
rcestate->es_direction = ForwardScanDirection;
rcestate->es_snapshot = parentestate->es_snapshot;
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 67734979b0..435ae0df7a 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -147,6 +147,7 @@ CreateExecutorState(void)
estate->es_top_eflags = 0;
estate->es_instrument = 0;
estate->es_finished = false;
+ estate->es_aborted = false;
estate->es_exprcontexts = NIL;
@@ -757,7 +758,7 @@ ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos)
* ExecGetRangeTableRelation
* Open the Relation for a range table entry, if not already done
*
- * The Relations will be closed again in ExecEndPlan().
+ * The Relations will be closed in ExecEndPlan().
*/
Relation
ExecGetRangeTableRelation(EState *estate, Index rti)
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 659bd6dcd9..f84f376c9c 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -70,7 +70,8 @@ static int _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
static ParamListInfo _SPI_convert_params(int nargs, Oid *argtypes,
Datum *Values, const char *Nulls);
-static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount);
+static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount,
+ CachedPlanSource *plansource, int query_index);
static void _SPI_error_callback(void *arg);
@@ -1682,7 +1683,8 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
- cplan);
+ cplan,
+ plansource);
/*
* Set up options for portal. Default SCROLL type is chosen the same way
@@ -2494,6 +2496,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
ListCell *lc2;
+ int i = 0;
spicallbackarg.query = plansource->query_string;
@@ -2691,8 +2694,9 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
options->params,
_SPI_current->queryEnv,
0);
- res = _SPI_pquery(qdesc, fire_triggers,
- canSetTag ? options->tcount : 0);
+
+ res = _SPI_pquery(qdesc, fire_triggers, canSetTag ? options->tcount : 0,
+ plansource, i);
FreeQueryDesc(qdesc);
}
else
@@ -2789,6 +2793,8 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
my_res = res;
goto fail;
}
+
+ i++;
}
/* Done with this plan, so release refcount */
@@ -2866,7 +2872,8 @@ _SPI_convert_params(int nargs, Oid *argtypes,
}
static int
-_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
+_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount,
+ CachedPlanSource *plansource, int query_index)
{
int operation = queryDesc->operation;
int eflags;
@@ -2922,7 +2929,7 @@ _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
else
eflags = EXEC_FLAG_SKIP_TRIGGERS;
- ExecutorStart(queryDesc, eflags);
+ ExecutorStartExt(queryDesc, eflags, plansource, query_index);
ExecutorRun(queryDesc, ForwardScanDirection, tcount, true);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index e394f1419a..b95c859655 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1237,6 +1237,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ NULL,
NULL);
/*
@@ -2039,7 +2040,8 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
- cplan);
+ cplan,
+ psrc);
/* Done with the snapshot used for parameter I/O and parsing/planning */
if (snapshot_set)
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 6e8f6b1b8f..dbb0ffb771 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -19,6 +19,7 @@
#include "access/xact.h"
#include "commands/prepare.h"
+#include "executor/execdesc.h"
#include "executor/tstoreReceiver.h"
#include "miscadmin.h"
#include "pg_trace.h"
@@ -37,6 +38,8 @@ Portal ActivePortal = NULL;
static void ProcessQuery(PlannedStmt *plan,
CachedPlan *cplan,
+ CachedPlanSource *plansource,
+ int query_index,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -80,6 +83,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
qd->cplan = cplan; /* CachedPlan supplying the plannedstmt */
+ qd->cplan_release = false;
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -114,6 +118,13 @@ FreeQueryDesc(QueryDesc *qdesc)
UnregisterSnapshot(qdesc->snapshot);
UnregisterSnapshot(qdesc->crosscheck_snapshot);
+ /*
+ * Release CachedPlan if requested. The CachedPlan is not associated with
+ * a ResourceOwner when cplan_release is true; see ExecutorStartExt().
+ */
+ if (qdesc->cplan_release)
+ ReleaseCachedPlan(qdesc->cplan, NULL);
+
/* Only the QueryDesc itself need be freed */
pfree(qdesc);
}
@@ -126,6 +137,8 @@ FreeQueryDesc(QueryDesc *qdesc)
*
* plan: the plan tree for the query
* cplan: CachedPlan supplying the plan
+ * plansource: CachedPlanSource supplying the cplan
+ * query_index: index of the query in plansource->query_list
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -139,6 +152,8 @@ FreeQueryDesc(QueryDesc *qdesc)
static void
ProcessQuery(PlannedStmt *plan,
CachedPlan *cplan,
+ CachedPlanSource *plansource,
+ int query_index,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -157,7 +172,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Call ExecutorStart to prepare the plan for execution
*/
- ExecutorStart(queryDesc, 0);
+ ExecutorStartExt(queryDesc, 0, plansource, query_index);
/*
* Run the plan to completion.
@@ -518,9 +533,12 @@ PortalStart(Portal portal, ParamListInfo params,
myeflags = eflags;
/*
- * Call ExecutorStart to prepare the plan for execution
+ * Call ExecutorStartExt() to prepare the plan for execution. If
+ * the portal is using a cached plan, it may get invalidated
+ * during plan initialization, in which case a new one is
+ * created and saved in the QueryDesc.
*/
- ExecutorStart(queryDesc, myeflags);
+ ExecutorStartExt(queryDesc, myeflags, portal->plansource, 0);
/*
* This tells PortalCleanup to shut down the executor
@@ -1201,6 +1219,7 @@ PortalRunMulti(Portal portal,
{
bool active_snapshot_set = false;
ListCell *stmtlist_item;
+ int i = 0;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1283,6 +1302,8 @@ PortalRunMulti(Portal portal,
/* statement can set tag string */
ProcessQuery(pstmt,
portal->cplan,
+ portal->plansource,
+ i,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1293,6 +1314,8 @@ PortalRunMulti(Portal portal,
/* stmt added by rewrite cannot set tag */
ProcessQuery(pstmt,
portal->cplan,
+ portal->plansource,
+ i,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1357,6 +1380,8 @@ PortalRunMulti(Portal portal,
*/
if (lnext(portal->stmts, stmtlist_item) != NULL)
CommandCounterIncrement();
+
+ i++;
}
/* Pop the snapshot if we pushed one. */
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 5b75dadf13..d33f871ea2 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -94,6 +94,14 @@
*/
static dlist_head saved_plan_list = DLIST_STATIC_INIT(saved_plan_list);
+/*
+ * Head of the backend's list of "standalone" CachedPlans that are not
+ * associated with a CachedPlanSource, created by GetSingleCachedPlan() for
+ * transient use by the executor in certain scenarios where they're needed
+ * only for one execution of the plan.
+ */
+static dlist_head standalone_plan_list = DLIST_STATIC_INIT(standalone_plan_list);
+
/*
* This is the head of the backend's list of CachedExpressions.
*/
@@ -905,6 +913,8 @@ CheckCachedPlan(CachedPlanSource *plansource)
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
* (unless this is a one-shot plan, in which case we don't copy the plan).
+ *
+ * Note: When changing this, you should also look at GetSingleCachedPlan().
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
@@ -1034,6 +1044,7 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
plan->is_generic = generic;
plan->is_saved = false;
plan->is_valid = true;
+ plan->is_standalone = false;
/* assign generation number to new plan */
plan->generation = ++(plansource->generation);
@@ -1282,6 +1293,121 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
return plan;
}
+/*
+ * Create a fresh CachedPlan for the query_index'th query in the provided
+ * CachedPlanSource.
+ *
+ * The created CachedPlan is standalone, meaning it is not tracked in the
+ * CachedPlanSource. The CachedPlan and its plan trees are allocated in a
+ * child context of the caller's memory context. The caller must ensure they
+ * remain valid until execution is complete, after which the plan should be
+ * released by calling ReleaseCachedPlan().
+ *
+ * This function primarily supports ExecutorStartExt(), which handles cases
+ * where the original generic CachedPlan becomes invalid after prunable
+ * relations are locked.
+ */
+CachedPlan *
+GetSingleCachedPlan(CachedPlanSource *plansource, int query_index,
+ QueryEnvironment *queryEnv)
+{
+ List *query_list = plansource->query_list,
+ *plan_list;
+ CachedPlan *plan = plansource->gplan;
+ MemoryContext oldcxt = CurrentMemoryContext,
+ plan_context;
+ PlannedStmt *plannedstmt;
+
+ Assert(ActiveSnapshotSet());
+
+ /* Sanity checks */
+ if (plan == NULL)
+ elog(ERROR, "GetSingleCachedPlan() called in the wrong context: plansource->gplan is NULL");
+ else if (plan->is_valid)
+ elog(ERROR, "GetSingleCachedPlan() called in the wrong context: plansource->gplan->is_valid");
+
+ /*
+ * The plansource might have become invalid since GetCachedPlan(). See the
+ * comment in BuildCachedPlan() for details on why this might happen.
+ *
+ * The risk is greater here because this function is called from the
+ * executor, meaning much more processing may have occurred compared to
+ * when BuildCachedPlan() is called from GetCachedPlan().
+ */
+ if (!plansource->is_valid)
+ query_list = RevalidateCachedQuery(plansource, queryEnv);
+ Assert(query_list != NIL);
+
+ /*
+ * Build a new generic plan for the query_index'th query, but make a copy
+ * to be scribbled on by the planner
+ */
+ query_list = list_make1(copyObject(list_nth_node(Query, query_list,
+ query_index)));
+ plan_list = pg_plan_queries(query_list, plansource->query_string,
+ plansource->cursor_options, NULL);
+
+ list_free_deep(query_list);
+
+ /*
+ * Make a dedicated memory context for the CachedPlan and its subsidiary
+ * data so that we can release it in ReleaseCachedPlan(), which will
+ * be called from FreeQueryDesc().
+ */
+ plan_context = AllocSetContextCreate(CurrentMemoryContext,
+ "Standalone CachedPlan",
+ ALLOCSET_START_SMALL_SIZES);
+ MemoryContextCopyAndSetIdentifier(plan_context, plansource->query_string);
+
+ /*
+ * Copy plan into the new context.
+ */
+ MemoryContextSwitchTo(plan_context);
+ plan_list = copyObject(plan_list);
+
+ /*
+ * Create and fill the CachedPlan struct within the new context.
+ */
+ plan = (CachedPlan *) palloc(sizeof(CachedPlan));
+ plan->magic = CACHEDPLAN_MAGIC;
+ plan->stmt_list = plan_list;
+
+ plan->planRoleId = GetUserId();
+ Assert(list_length(plan_list) == 1);
+ plannedstmt = linitial_node(PlannedStmt, plan_list);
+
+ /*
+ * CachedPlan is dependent on role either if RLS affected the rewrite
+ * phase or if a role dependency was injected during planning. And it's
+ * transient if any plan is marked so.
+ */
+ plan->dependsOnRole = plansource->dependsOnRLS || plannedstmt->dependsOnRole;
+ if (plannedstmt->transientPlan)
+ {
+ Assert(TransactionIdIsNormal(TransactionXmin));
+ plan->saved_xmin = TransactionXmin;
+ }
+ else
+ plan->saved_xmin = InvalidTransactionId;
+ plan->refcount = 0;
+ plan->context = plan_context;
+ plan->is_oneshot = false;
+ plan->is_generic = true;
+ plan->is_saved = false;
+ plan->is_valid = true;
+ plan->is_standalone = true;
+ plan->generation = 1;
+ MemoryContextSwitchTo(oldcxt);
+
+ /*
+ * Add the entry to the global list of "standalone" cached plans. It is
+ * removed from the list by ReleaseCachedPlan().
+ */
+ dlist_push_tail(&standalone_plan_list, &plan->node);
+
+ return plan;
+}
+
/*
* ReleaseCachedPlan: release active use of a cached plan.
*
@@ -1309,6 +1435,10 @@ ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner)
/* Mark it no longer valid */
plan->magic = 0;
+ /* Remove from the global list if we are a standalone plan. */
+ if (plan->is_standalone)
+ dlist_delete(&plan->node);
+
/* One-shot plans do not own their context, so we can't free them */
if (!plan->is_oneshot)
MemoryContextDelete(plan->context);
@@ -2066,6 +2196,33 @@ PlanCacheRelCallback(Datum arg, Oid relid)
cexpr->is_valid = false;
}
}
+
+ /* Finally, invalidate any standalone cached plans */
+ dlist_foreach(iter, &standalone_plan_list)
+ {
+ CachedPlan *cplan = dlist_container(CachedPlan,
+ node, iter.cur);
+
+ Assert(cplan->magic == CACHEDPLAN_MAGIC);
+
+ if (cplan->is_valid)
+ {
+ ListCell *lc;
+
+ foreach(lc, cplan->stmt_list)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc);
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ continue; /* Ignore utility statements */
+ if ((relid == InvalidOid) ? plannedstmt->relationOids != NIL :
+ list_member_oid(plannedstmt->relationOids, relid))
+ cplan->is_valid = false;
+ if (!cplan->is_valid)
+ break; /* out of stmt_list scan */
+ }
+ }
+ }
}
/*
@@ -2176,6 +2333,44 @@ PlanCacheObjectCallback(Datum arg, int cacheid, uint32 hashvalue)
}
}
}
+
+ /* Finally, invalidate any standalone cached plans */
+ dlist_foreach(iter, &standalone_plan_list)
+ {
+ CachedPlan *cplan = dlist_container(CachedPlan,
+ node, iter.cur);
+
+ Assert(cplan->magic == CACHEDPLAN_MAGIC);
+
+ if (cplan->is_valid)
+ {
+ ListCell *lc;
+
+ foreach(lc, cplan->stmt_list)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc);
+ ListCell *lc3;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ continue; /* Ignore utility statements */
+ foreach(lc3, plannedstmt->invalItems)
+ {
+ PlanInvalItem *item = (PlanInvalItem *) lfirst(lc3);
+
+ if (item->cacheId != cacheid)
+ continue;
+ if (hashvalue == 0 ||
+ item->hashValue == hashvalue)
+ {
+ cplan->is_valid = false;
+ break; /* out of invalItems scan */
+ }
+ }
+ if (!cplan->is_valid)
+ break; /* out of stmt_list scan */
+ }
+ }
+ }
}
/*
@@ -2235,6 +2430,17 @@ ResetPlanCache(void)
cexpr->is_valid = false;
}
+
+ /* Finally, invalidate any standalone cached plans */
+ dlist_foreach(iter, &standalone_plan_list)
+ {
+ CachedPlan *cplan = dlist_container(CachedPlan,
+ node, iter.cur);
+
+ Assert(cplan->magic == CACHEDPLAN_MAGIC);
+
+ cplan->is_valid = false;
+ }
}
/*
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 4a24613537..bf70fd4ce7 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -284,7 +284,8 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
- CachedPlan *cplan)
+ CachedPlan *cplan,
+ CachedPlanSource *plansource)
{
Assert(PortalIsValid(portal));
Assert(portal->status == PORTAL_NEW);
@@ -299,6 +300,7 @@ PortalDefineQuery(Portal portal,
portal->commandTag = commandTag;
portal->stmts = stmts;
portal->cplan = cplan;
+ portal->plansource = plansource;
portal->status = PORTAL_DEFINED;
}
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 21c71e0d53..a39989a950 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -104,6 +104,7 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ParamListInfo params, QueryEnvironment *queryEnv);
extern void ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
+ CachedPlanSource *plansource, int plan_index,
IntoClause *into, ExplainState *es,
const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index 8a5a9fe642..db21561c8c 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -258,6 +258,7 @@ extern void ExecASTruncateTriggers(EState *estate,
extern void AfterTriggerBeginXact(void);
extern void AfterTriggerBeginQuery(void);
extern void AfterTriggerEndQuery(EState *estate);
+extern void AfterTriggerAbortQuery(void);
extern void AfterTriggerFireDeferred(void);
extern void AfterTriggerEndXact(bool isCommit);
extern void AfterTriggerBeginSubXact(void);
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index 0e7245435d..f6cb6479c0 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -36,6 +36,7 @@ typedef struct QueryDesc
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
CachedPlan *cplan; /* CachedPlan that supplies the plannedstmt */
+ bool cplan_release; /* Should FreeQueryDesc() release cplan? */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 69c3ebff00..5bc0edb5a0 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -19,6 +19,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/plancache.h"
/*
@@ -198,6 +199,8 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
* prototypes from functions in execMain.c
*/
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern void ExecutorStartExt(QueryDesc *queryDesc, int eflags,
+ CachedPlanSource *plansource, int query_index);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count, bool execute_once);
@@ -261,6 +264,19 @@ extern void ExecEndNode(PlanState *node);
extern void ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * Is the CachedPlan in es_cachedplan still valid?
+ *
+ * Called from InitPlan() because invalidation messages that affect the plan
+ * might be received after locks have been taken on runtime-prunable relations.
+ * The caller should take appropriate action if the plan has become invalid.
+ */
+static inline bool
+ExecPlanStillValid(EState *estate)
+{
+ return estate->es_cachedplan == NULL ? true :
+ CachedPlanValid(estate->es_cachedplan);
+}
/* ----------------------------------------------------------------
* ExecProcNode
@@ -589,6 +605,7 @@ extern void ExecCreateScanSlotFromOuterPlan(EState *estate,
extern bool ExecRelationIsTargetRelation(EState *estate, Index scanrelid);
extern Relation ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags);
+extern Relation ExecOpenScanIndexRelation(EState *estate, Oid indexid, int lockmode);
extern void ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos);
extern void ExecCloseRangeTableRelations(EState *estate);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 1ed925b99b..3ca96a85b6 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -686,6 +686,7 @@ typedef struct EState
int es_top_eflags; /* eflags passed to ExecutorStart */
int es_instrument; /* OR of InstrumentOption flags */
bool es_finished; /* true when ExecutorFinish is done */
+ bool es_aborted; /* true when execution was aborted */
List *es_exprcontexts; /* List of ExprContexts within EState */
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 0b5ee007ca..154f68f671 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -18,6 +18,7 @@
#include "access/tupdesc.h"
#include "lib/ilist.h"
#include "nodes/params.h"
+#include "nodes/parsenodes.h"
#include "tcop/cmdtag.h"
#include "utils/queryenvironment.h"
#include "utils/resowner.h"
@@ -152,6 +153,8 @@ typedef struct CachedPlan
bool is_generic; /* is it a reusable generic plan? */
bool is_saved; /* is CachedPlan in a long-lived context? */
bool is_valid; /* is the stmt_list currently valid? */
+ bool is_standalone; /* is it not associated with a
+ * CachedPlanSource? */
Oid planRoleId; /* Role ID the plan was created for */
bool dependsOnRole; /* is plan specific to that role? */
TransactionId saved_xmin; /* if valid, replan when TransactionXmin
@@ -159,6 +162,12 @@ typedef struct CachedPlan
int generation; /* parent's generation number for this plan */
int refcount; /* count of live references to this struct */
MemoryContext context; /* context containing this CachedPlan */
+
+ /*
+ * If the plan is not associated with a CachedPlanSource, it is saved in
+ * a separate global list.
+ */
+ dlist_node node; /* list link, if is_standalone */
} CachedPlan;
/*
@@ -224,6 +233,10 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
QueryEnvironment *queryEnv);
+extern CachedPlan *GetSingleCachedPlan(CachedPlanSource *plansource,
+ int query_index,
+ QueryEnvironment *queryEnv);
+
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
@@ -245,4 +258,17 @@ CachedPlanRequiresLocking(CachedPlan *cplan)
return cplan->is_generic;
}
+/*
+ * CachedPlanValid
+ * Returns whether a cached generic plan is still valid.
+ *
+ * Invoked by the executor to check that the plan has not been invalidated
+ * after taking locks during plan initialization.
+ */
+static inline bool
+CachedPlanValid(CachedPlan *cplan)
+{
+ return cplan->is_valid;
+}
+
#endif /* PLANCACHE_H */
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index 29f49829f2..58c3828d2c 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,7 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ CachedPlanSource *plansource; /* CachedPlanSource, for cplan */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -241,7 +242,8 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
- CachedPlan *cplan);
+ CachedPlan *cplan,
+ CachedPlanSource *plansource);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
diff --git a/src/test/modules/delay_execution/Makefile b/src/test/modules/delay_execution/Makefile
index 70f24e846d..3eeb097fde 100644
--- a/src/test/modules/delay_execution/Makefile
+++ b/src/test/modules/delay_execution/Makefile
@@ -8,7 +8,8 @@ OBJS = \
delay_execution.o
ISOLATION = partition-addition \
- partition-removal-1
+ partition-removal-1 \
+ cached-plan-inval
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/src/test/modules/delay_execution/delay_execution.c b/src/test/modules/delay_execution/delay_execution.c
index 155c8a8d55..304ca77f7b 100644
--- a/src/test/modules/delay_execution/delay_execution.c
+++ b/src/test/modules/delay_execution/delay_execution.c
@@ -1,14 +1,18 @@
/*-------------------------------------------------------------------------
*
* delay_execution.c
- * Test module to allow delay between parsing and execution of a query.
+ * Test module to introduce delay at various points during execution of a
+ * query to test that execution proceeds safely in light of concurrent
+ * changes.
*
* The delay is implemented by taking and immediately releasing a specified
* advisory lock. If another process has previously taken that lock, the
* current process will be blocked until the lock is released; otherwise,
* there's no effect. This allows an isolationtester script to reliably
- * test behaviors where some specified action happens in another backend
- * between parsing and execution of any desired query.
+ * test behaviors where some specified action happens in another backend in
+ * a couple of cases: 1) between parsing and execution of any desired query
+ * when using the planner_hook, 2) between RevalidateCachedQuery() and
+ * ExecutorStart() when using the ExecutorStart_hook.
*
* Copyright (c) 2020-2024, PostgreSQL Global Development Group
*
@@ -22,6 +26,7 @@
#include <limits.h>
+#include "executor/executor.h"
#include "optimizer/planner.h"
#include "utils/builtins.h"
#include "utils/guc.h"
@@ -32,9 +37,11 @@ PG_MODULE_MAGIC;
/* GUC: advisory lock ID to use. Zero disables the feature. */
static int post_planning_lock_id = 0;
+static int executor_start_lock_id = 0;
-/* Save previous planner hook user to be a good citizen */
+/* Save previous hook users to be a good citizen */
static planner_hook_type prev_planner_hook = NULL;
+static ExecutorStart_hook_type prev_ExecutorStart_hook = NULL;
/* planner_hook function to provide the desired delay */
@@ -70,11 +77,41 @@ delay_execution_planner(Query *parse, const char *query_string,
return result;
}
+/* ExecutorStart_hook function to provide the desired delay */
+static void
+delay_execution_ExecutorStart(QueryDesc *queryDesc, int eflags)
+{
+ /* If enabled, delay by taking and releasing the specified lock */
+ if (executor_start_lock_id != 0)
+ {
+ DirectFunctionCall1(pg_advisory_lock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+ DirectFunctionCall1(pg_advisory_unlock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+
+ /*
+ * Ensure that we notice any pending invalidations, since the advisory
+ * lock functions don't do this.
+ */
+ AcceptInvalidationMessages();
+ }
+
+ /* Now start the executor, possibly via a previous hook user */
+ if (prev_ExecutorStart_hook)
+ prev_ExecutorStart_hook(queryDesc, eflags);
+ else
+ standard_ExecutorStart(queryDesc, eflags);
+
+ if (executor_start_lock_id != 0)
+ elog(NOTICE, "Finished ExecutorStart(): CachedPlan is %s",
+ CachedPlanValid(queryDesc->cplan) ? "valid" : "not valid");
+}
+
/* Module load function */
void
_PG_init(void)
{
- /* Set up the GUC to control which lock is used */
+ /* Set up GUCs to control which lock is used */
DefineCustomIntVariable("delay_execution.post_planning_lock_id",
"Sets the advisory lock ID to be locked/unlocked after planning.",
"Zero disables the delay.",
@@ -86,10 +123,22 @@ _PG_init(void)
NULL,
NULL,
NULL);
-
+ DefineCustomIntVariable("delay_execution.executor_start_lock_id",
+ "Sets the advisory lock ID to be locked/unlocked before starting execution.",
+ "Zero disables the delay.",
+ &executor_start_lock_id,
+ 0,
+ 0, INT_MAX,
+ PGC_USERSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
MarkGUCPrefixReserved("delay_execution");
- /* Install our hook */
+ /* Install our hooks. */
prev_planner_hook = planner_hook;
planner_hook = delay_execution_planner;
+ prev_ExecutorStart_hook = ExecutorStart_hook;
+ ExecutorStart_hook = delay_execution_ExecutorStart;
}
diff --git a/src/test/modules/delay_execution/expected/cached-plan-inval.out b/src/test/modules/delay_execution/expected/cached-plan-inval.out
new file mode 100644
index 0000000000..e8efb6d9d9
--- /dev/null
+++ b/src/test/modules/delay_execution/expected/cached-plan-inval.out
@@ -0,0 +1,175 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1prep s2lock s1exec s2dropi s2unlock
+step s1prep: SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1 FOR UPDATE;
+ EXPLAIN (COSTS OFF) EXECUTE q (1);
+QUERY PLAN
+------------------------------------------------
+LockRows
+ -> Append
+ Subplans Removed: 2
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Recheck Cond: (a = $1)
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = $1)
+(7 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); <waiting ...>
+step s2dropi: DROP INDEX foo12_1_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+-------------------------------------
+LockRows
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on foo12_1 foo_1
+ Filter: (a = $1)
+(5 rows)
+
+
+starting permutation: s1prep2 s2lock s1exec2 s2dropi s2unlock
+step s1prep2: SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q2;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Recheck Cond: ((a = one()) OR (a = two()))
+ -> BitmapOr
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = one())
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = two())
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(11 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec2: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; <waiting ...>
+step s2dropi: DROP INDEX foo12_1_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec2: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Seq Scan on foo12_1 foo_1
+ Filter: ((a = one()) OR (a = two()))
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(6 rows)
+
+
+starting permutation: s1prep3 s2lock s1exec3 s2dropi s2unlock
+step s1prep3: SET plan_cache_mode = force_generic_plan;
+ PREPARE q3 AS UPDATE foov SET a = a WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q3;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Recheck Cond: ((a = one()) OR (a = two()))
+ -> BitmapOr
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = one())
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = two())
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+
+Update on foo
+ Update on foo12_1 foo_1
+ Update on foo12_2 foo_2
+ Update on foo3 foo
+ -> Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Recheck Cond: ((a = one()) OR (a = two()))
+ -> BitmapOr
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = one())
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = two())
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(27 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec3: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; <waiting ...>
+step s2dropi: DROP INDEX foo12_1_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec3: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Seq Scan on foo12_1 foo_1
+ Filter: ((a = one()) OR (a = two()))
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+
+Update on foo
+ Update on foo12_1 foo_1
+ Update on foo12_2 foo_2
+ Update on foo3 foo
+ -> Append
+ Subplans Removed: 1
+ -> Seq Scan on foo12_1 foo_1
+ Filter: ((a = one()) OR (a = two()))
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(17 rows)
+
diff --git a/src/test/modules/delay_execution/meson.build b/src/test/modules/delay_execution/meson.build
index 41f3ac0b89..5a70b183d0 100644
--- a/src/test/modules/delay_execution/meson.build
+++ b/src/test/modules/delay_execution/meson.build
@@ -24,6 +24,7 @@ tests += {
'specs': [
'partition-addition',
'partition-removal-1',
+ 'cached-plan-inval',
],
},
}
diff --git a/src/test/modules/delay_execution/specs/cached-plan-inval.spec b/src/test/modules/delay_execution/specs/cached-plan-inval.spec
new file mode 100644
index 0000000000..5b1f72b4a8
--- /dev/null
+++ b/src/test/modules/delay_execution/specs/cached-plan-inval.spec
@@ -0,0 +1,65 @@
+# Test to check that invalidation of cached generic plans during ExecutorStart
+# correctly triggers replanning and re-execution.
+
+setup
+{
+ CREATE TABLE foo (a int, b text) PARTITION BY LIST(a);
+ CREATE TABLE foo12 PARTITION OF foo FOR VALUES IN (1, 2) PARTITION BY LIST (a);
+ CREATE TABLE foo12_1 PARTITION OF foo12 FOR VALUES IN (1);
+ CREATE TABLE foo12_2 PARTITION OF foo12 FOR VALUES IN (2);
+ CREATE INDEX foo12_1_a ON foo12_1 (a);
+ CREATE TABLE foo3 PARTITION OF foo FOR VALUES IN (3);
+ CREATE VIEW foov AS SELECT * FROM foo;
+ CREATE FUNCTION one () RETURNS int AS $$ BEGIN RETURN 1; END; $$ LANGUAGE PLPGSQL STABLE;
+ CREATE FUNCTION two () RETURNS int AS $$ BEGIN RETURN 2; END; $$ LANGUAGE PLPGSQL STABLE;
+ CREATE RULE update_foo AS ON UPDATE TO foo DO ALSO SELECT 1;
+}
+
+teardown
+{
+ DROP VIEW foov;
+ DROP RULE update_foo ON foo;
+ DROP TABLE foo;
+ DROP FUNCTION one(), two();
+}
+
+session "s1"
+# Append with run-time pruning
+step "s1prep" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1 FOR UPDATE;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+
+# Another case with Append with run-time pruning
+step "s1prep2" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+
+# Case with a rule adding another query
+step "s1prep3" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q3 AS UPDATE foov SET a = a WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+
+# Executes a generic plan
+step "s1exec" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+step "s1exec2" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+step "s1exec3" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+
+session "s2"
+step "s2lock" { SELECT pg_advisory_lock(12345); }
+step "s2unlock" { SELECT pg_advisory_unlock(12345); }
+step "s2dropi" { DROP INDEX foo12_1_a; }
+
+# While "s1exec", etc. wait to acquire the advisory lock, "s2drop" is able to
+# drop the index being used in the cached plan. When "s1exec" is then
+# unblocked and initializes the cached plan for execution, it detects the
+# concurrent index drop and causes the cached plan to be discarded and
+# recreated without the index.
+permutation "s1prep" "s2lock" "s1exec" "s2dropi" "s2unlock"
+permutation "s1prep2" "s2lock" "s1exec2" "s2dropi" "s2unlock"
+permutation "s1prep3" "s2lock" "s1exec3" "s2dropi" "s2unlock"
--
2.43.0
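To make the intended control flow concrete, here is a minimal sketch of the
replan-and-retry logic that ExecutorStartExt() is described as implementing.
Its actual body is not shown above, so the structure below is an assumption
pieced together from the patch's comments on GetSingleCachedPlan(),
CachedPlanValid(), and QueryDesc.cplan_release; cleanup of the partially
initialized executor state (the es_aborted / AfterTriggerAbortQuery()
additions) is elided.

/*
 * Hypothetical sketch, not the patch's actual ExecutorStartExt() body.
 */
static void
ExecutorStartExt_sketch(QueryDesc *queryDesc, int eflags,
                        CachedPlanSource *plansource, int query_index)
{
    ExecutorStart(queryDesc, eflags);

    /*
     * Locks taken during plan initialization may have invalidated the
     * cached plan; if so, build a fresh standalone plan for just this
     * query and retry until one survives initialization.
     */
    while (plansource != NULL &&
           queryDesc->cplan != NULL &&
           !CachedPlanValid(queryDesc->cplan))
    {
        CachedPlan *cplan = GetSingleCachedPlan(plansource, query_index,
                                                NULL /* queryEnv */ );

        queryDesc->plannedstmt = linitial_node(PlannedStmt, cplan->stmt_list);
        queryDesc->cplan = cplan;
        queryDesc->cplan_release = true;    /* FreeQueryDesc() releases it */

        ExecutorStart(queryDesc, eflags);
    }
}

The isolation test output above ("CachedPlan is not valid" followed by
"CachedPlan is valid") shows exactly one such retry after the concurrent
DROP INDEX.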
On Thu, Sep 19, 2024 at 5:39 PM Amit Langote <amitlangote09@gmail.com> wrote:
> For ResultRelInfos, I took the approach of memsetting them to 0 for
> pruned result relations and adding checks at multiple sites to ensure
> the ResultRelInfo being handled is valid.
After some reflection, I realized that that approach is not very
robust. In the attached, I’ve modified ExecInitModifyTable() to
allocate ResultRelInfos only for unpruned relations, instead of
allocating one for every relation in ModifyTable.resultRelations and
zeroing out the pruned ones. This approach feels more robust.
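To illustrate the revised approach concretely, here is a minimal sketch of
allocating ResultRelInfos only for surviving relations. This is not the
patch's actual ExecInitModifyTable() code; "unpruned_rels", a bitmapset of
the RT indexes that survived initial pruning, is a hypothetical input
standing in for whatever the patch derives from the pruning result.

/*
 * Sketch: create ResultRelInfos only for unpruned result relations, so
 * no zeroed placeholder entries exist that callers must check for.
 */
nunpruned = bms_num_members(unpruned_rels);
mtstate->resultRelInfo = (ResultRelInfo *)
    palloc(nunpruned * sizeof(ResultRelInfo));

i = 0;
foreach(lc, node->resultRelations)
{
    Index       rti = lfirst_int(lc);

    if (!bms_is_member(rti, unpruned_rels))
        continue;               /* pruned: no ResultRelInfo is created */

    ExecInitResultRelation(estate, &mtstate->resultRelInfo[i], rti);
    i++;
}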
--
Thanks, Amit Langote
Attachments:
v55-0001-Move-PartitionPruneInfo-out-of-plan-nodes-into-P.patch
From cf75d48323a3c28d272e34c942f123a2e04044fd Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Fri, 6 Sep 2024 13:11:05 +0900
Subject: [PATCH v55 1/5] Move PartitionPruneInfo out of plan nodes into
PlannedStmt
This change moves PartitionPruneInfo from individual plan nodes to
PlannedStmt, allowing runtime initial pruning to be performed across
the entire plan tree without traversing the tree to find nodes
containing PartitionPruneInfos.
The PartitionPruneInfo pointer fields in Append and MergeAppend nodes
have been replaced with an integer index that points to
PartitionPruneInfos in a list within PlannedStmt, which holds the
PartitionPruneInfos for all subqueries.
Reviewed-by: Alvaro Herrera
---
src/backend/executor/execMain.c | 1 +
src/backend/executor/execParallel.c | 1 +
src/backend/executor/execPartition.c | 19 +++++-
src/backend/executor/execUtils.c | 1 +
src/backend/executor/nodeAppend.c | 5 +-
src/backend/executor/nodeMergeAppend.c | 5 +-
src/backend/optimizer/plan/createplan.c | 24 +++----
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 86 ++++++++++++++++---------
src/backend/partitioning/partprune.c | 19 ++++--
src/include/executor/execPartition.h | 4 +-
src/include/nodes/execnodes.h | 1 +
src/include/nodes/pathnodes.h | 6 ++
src/include/nodes/plannodes.h | 14 ++--
src/include/partitioning/partprune.h | 8 +--
15 files changed, 133 insertions(+), 62 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 7042ca6c60..e6197c165e 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -850,6 +850,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
estate->es_plannedstmt = plannedstmt;
+ estate->es_part_prune_infos = plannedstmt->partPruneInfos;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index bfb3419efb..b01a2fdfdd 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -181,6 +181,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
pstmt->planTree = plan;
+ pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
pstmt->permInfos = estate->es_rteperminfos;
pstmt->resultRelations = NIL;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 7651886229..ec730674f2 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1786,6 +1786,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* Initialize data structure needed for run-time partition pruning and
* do initial pruning if needed
*
+ * 'root_parent_relids' identifies the relation to which both the parent plan
+ * and the PartitionPruneInfo given by 'part_prune_index' belong.
+ *
* On return, *initially_valid_subplans is assigned the set of indexes of
* child subplans that must be initialized along with the parent plan node.
* Initial pruning is performed here if needed and in that case only the
@@ -1798,11 +1801,25 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
PartitionPruneState *
ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
+ Bitmapset *root_parent_relids,
Bitmapset **initially_valid_subplans)
{
PartitionPruneState *prunestate;
EState *estate = planstate->state;
+ PartitionPruneInfo *pruneinfo;
+
+ /* Obtain the pruneinfo we need, and make sure it's the right one */
+ pruneinfo = list_nth_node(PartitionPruneInfo, estate->es_part_prune_infos,
+ part_prune_index);
+ if (!bms_equal(root_parent_relids, pruneinfo->root_parent_relids))
+ ereport(ERROR,
+ errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg_internal("mismatching PartitionPruneInfo found at part_prune_index %d",
+ part_prune_index),
+ errdetail_internal("plan node relids %s, pruneinfo relids %s",
+ bmsToString(root_parent_relids),
+ bmsToString(pruneinfo->root_parent_relids)));
/* We may need an expression context to evaluate partition exprs */
ExecAssignExprContext(estate, planstate);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 5737f9f4eb..67734979b0 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -118,6 +118,7 @@ CreateExecutorState(void)
estate->es_rowmarks = NULL;
estate->es_rteperminfos = NIL;
estate->es_plannedstmt = NULL;
+ estate->es_part_prune_infos = NIL;
estate->es_junkFilter = NULL;
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index ca0f54d676..de7ebab5c2 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -134,7 +134,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_begun = false;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -145,7 +145,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&appendstate->ps,
list_length(node->appendplans),
- node->part_prune_info,
+ node->part_prune_index,
+ node->apprelids,
&validsubplans);
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index e1b9b984a7..3ed91808dd 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -82,7 +82,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.ExecProcNode = ExecMergeAppend;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -93,7 +93,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&mergestate->ps,
list_length(node->mergeplans),
- node->part_prune_info,
+ node->part_prune_index,
+ node->apprelids,
&validsubplans);
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index bb45ef318f..6642d09a39 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1225,7 +1225,6 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
ListCell *subpaths;
int nasyncplans = 0;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
int nodenumsortkeys = 0;
AttrNumber *nodeSortColIdx = NULL;
Oid *nodeSortOperators = NULL;
@@ -1376,6 +1375,9 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ plan->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1399,16 +1401,14 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
}
if (prunequal != NIL)
- partpruneinfo =
- make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ plan->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
plan->appendplans = subplans;
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
- plan->part_prune_info = partpruneinfo;
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -1447,7 +1447,6 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
List *subplans = NIL;
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1540,6 +1539,9 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ node->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1555,13 +1557,13 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
Assert(best_path->path.param_info == NULL);
if (prunequal != NIL)
- partpruneinfo = make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ node->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
node->mergeplans = subplans;
- node->part_prune_info = partpruneinfo;
+
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index df35d1ff9c..1b9071c774 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -547,6 +547,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->dependsOnRole = glob->dependsOnRole;
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
+ result->partPruneInfos = glob->partPruneInfos;
result->rtable = glob->finalrtable;
result->permInfos = glob->finalrteperminfos;
result->resultRelations = glob->resultRelations;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 91c7c4fe2f..e2ea406c4e 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1732,6 +1732,48 @@ set_customscan_references(PlannerInfo *root,
cscan->custom_relids = offset_relid_set(cscan->custom_relids, rtoffset);
}
+/*
+ * register_partpruneinfo
+ * Subroutine for set_append_references and set_mergeappend_references
+ *
+ * Add the PartitionPruneInfo from root->partPruneInfos at the given index
+ * into PlannerGlobal->partPruneInfos and return its index there.
+ *
+ * Also update the RT indexes present in PartitionedRelPruneInfos to add the
+ * offset.
+ */
+static int
+register_partpruneinfo(PlannerInfo *root, int part_prune_index, int rtoffset)
+{
+ PlannerGlobal *glob = root->glob;
+ PartitionPruneInfo *pinfo;
+ ListCell *l;
+
+ Assert(part_prune_index >= 0 &&
+ part_prune_index < list_length(root->partPruneInfos));
+ pinfo = list_nth_node(PartitionPruneInfo, root->partPruneInfos,
+ part_prune_index);
+
+ pinfo->root_parent_relids = offset_relid_set(pinfo->root_parent_relids,
+ rtoffset);
+ foreach(l, pinfo->prune_infos)
+ {
+ List *prune_infos = lfirst(l);
+ ListCell *l2;
+
+ foreach(l2, prune_infos)
+ {
+ PartitionedRelPruneInfo *prelinfo = lfirst(l2);
+
+ prelinfo->rtindex += rtoffset;
+ }
+ }
+
+ glob->partPruneInfos = lappend(glob->partPruneInfos, pinfo);
+
+ return list_length(glob->partPruneInfos) - 1;
+}
+
/*
* set_append_references
* Do set_plan_references processing on an Append
@@ -1784,21 +1826,13 @@ set_append_references(PlannerInfo *root,
aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
- if (aplan->part_prune_info)
- {
- foreach(l, aplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * Add PartitionPruneInfo, if any, to PlannerGlobal and update the index.
+ * Also update the RT indexes present in it to add the offset.
+ */
+ if (aplan->part_prune_index >= 0)
+ aplan->part_prune_index =
+ register_partpruneinfo(root, aplan->part_prune_index, rtoffset);
/* We don't need to recurse to lefttree or righttree ... */
Assert(aplan->plan.lefttree == NULL);
@@ -1860,21 +1894,13 @@ set_mergeappend_references(PlannerInfo *root,
mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
- if (mplan->part_prune_info)
- {
- foreach(l, mplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * Add PartitionPruneInfo, if any, to PlannerGlobal and update the index.
+ * Also update the RT indexes present in it to add the offset.
+ */
+ if (mplan->part_prune_index >= 0)
+ mplan->part_prune_index =
+ register_partpruneinfo(root, mplan->part_prune_index, rtoffset);
/* We don't need to recurse to lefttree or righttree ... */
Assert(mplan->plan.lefttree == NULL);
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 9a1a7faac7..60fabb1734 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -207,16 +207,20 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
/*
* make_partition_pruneinfo
- * Builds a PartitionPruneInfo which can be used in the executor to allow
- * additional partition pruning to take place. Returns NULL when
- * partition pruning would be useless.
+ * Checks if the given set of quals can be used to build pruning steps
+ * that the executor can use to prune away unneeded partitions. If
+ * suitable quals are found then a PartitionPruneInfo is built and tagged
+ * onto the PlannerInfo's partPruneInfos list.
+ *
+ * The return value is the 0-based index of the item added to the
+ * partPruneInfos list or -1 if nothing was added.
*
* 'parentrel' is the RelOptInfo for an appendrel, and 'subpaths' is the list
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
*/
-PartitionPruneInfo *
+int
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
List *prunequal)
@@ -330,10 +334,11 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* quals, then we can just not bother with run-time pruning.
*/
if (prunerelinfos == NIL)
- return NULL;
+ return -1;
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
+ pruneinfo->root_parent_relids = parentrel->relids;
pruneinfo->prune_infos = prunerelinfos;
/*
@@ -356,7 +361,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
else
pruneinfo->other_subplans = NULL;
- return pruneinfo;
+ root->partPruneInfos = lappend(root->partPruneInfos, pruneinfo);
+
+ return list_length(root->partPruneInfos) - 1;
}
/*
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index c09bc83b2a..12aacc84ff 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -123,9 +123,9 @@ typedef struct PartitionPruneState
extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
+ Bitmapset *root_parent_relids,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
bool initial_prune);
-
#endif /* EXECPARTITION_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 88467977f8..22b928e085 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -636,6 +636,7 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 07e2415398..8d30b6e896 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -128,6 +128,9 @@ typedef struct PlannerGlobal
/* "flat" list of AppendRelInfos */
List *appendRelations;
+ /* List of PartitionPruneInfo contained in the plan */
+ List *partPruneInfos;
+
/* OIDs of relations the plan depends on */
List *relationOids;
@@ -559,6 +562,9 @@ struct PlannerInfo
/* Does this query modify any partition key columns? */
bool partColsUpdated;
+
+ /* PartitionPruneInfos added in this query's plan. */
+ List *partPruneInfos;
};
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 62cd6a6666..39d0281c23 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -69,6 +69,9 @@ typedef struct PlannedStmt
struct Plan *planTree; /* tree of Plan nodes */
+ List *partPruneInfos; /* List of PartitionPruneInfo contained in the
+ * plan */
+
List *rtable; /* list of RangeTblEntry nodes */
List *permInfos; /* list of RTEPermissionInfo nodes for rtable
@@ -276,8 +279,8 @@ typedef struct Append
*/
int first_partial_plan;
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} Append;
/* ----------------
@@ -311,8 +314,8 @@ typedef struct MergeAppend
/* NULLS FIRST/LAST directions */
bool *nullsFirst pg_node_attr(array_size(numCols));
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} MergeAppend;
/* ----------------
@@ -1414,6 +1417,8 @@ typedef struct PlanRowMark
* Then, since an Append-type node could have multiple partitioning
* hierarchies among its children, we have an unordered List of those Lists.
*
+ * root_parent_relids RelOptInfo.relids of the relation to which the parent
+ * plan node and this PartitionPruneInfo node belong
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
@@ -1426,6 +1431,7 @@ typedef struct PartitionPruneInfo
pg_node_attr(no_equal, no_query_jumble)
NodeTag type;
+ Bitmapset *root_parent_relids;
List *prune_infos;
Bitmapset *other_subplans;
} PartitionPruneInfo;
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index bd490d154f..c536a1fe19 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -70,10 +70,10 @@ typedef struct PartitionPruneContext
#define PruneCxtStateIdx(partnatts, step_id, keyno) \
((partnatts) * (step_id) + (keyno))
-extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
- struct RelOptInfo *parentrel,
- List *subpaths,
- List *prunequal);
+extern int make_partition_pruneinfo(struct PlannerInfo *root,
+ struct RelOptInfo *parentrel,
+ List *subpaths,
+ List *prunequal);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
--
2.43.0
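As a rough illustration of what the flat list enables (an assumed usage
pattern, not code from the patch): any code holding the PlannedStmt can now
visit every PartitionPruneInfo directly, without recursing through the plan
tree.

/*
 * Sketch (illustrative only): whole-plan initial pruning can walk
 * PlannedStmt.partPruneInfos instead of searching Append/MergeAppend
 * nodes for embedded PartitionPruneInfo pointers.
 */
ListCell   *lc;

foreach(lc, plannedstmt->partPruneInfos)
{
    PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);

    /* ... build a PartitionPruneState and run initial pruning steps ... */
}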
v55-0003-Initialize-PartitionPruneContext-for-exec-prunin.patch
From 92d87cdbb3ad675ac6ffa2767f1d7d5876bd5369 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Wed, 18 Sep 2024 11:16:48 +0900
Subject: [PATCH v55 3/5] Initialize PartitionPruneContext for exec pruning
lazily
Currently, ExecInitPartitionPruning() iterates over PartitionPruningDatas
and nested PartitionedRelPruningDatas in a PartitionPruneState solely
to initialize the exec_context of the PartitionedRelPruningData.
This commit moves the initialization to find_matching_subplans_recurse(),
where the exec_context is actually needed, eliminating the need for
the above iteration. To track whether the context has been initialized
and is ready for use, a boolean field is_valid is added to
PartitionPruneContext.
---
src/backend/executor/execPartition.c | 166 ++++++++++-----------------
src/include/executor/execPartition.h | 1 +
src/include/partitioning/partprune.h | 2 +
3 files changed, 65 insertions(+), 104 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 3c7c631867..d9fa593785 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -190,10 +190,8 @@ static void InitPartitionPruneContext(PartitionPruneContext *context,
static void PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
Bitmapset *initially_valid_subplans,
int n_total_subplans);
-static void PartitionPruneInitExecPruning(PartitionPruneInfo *pruneinfo,
- PartitionPruneState *prunestate,
- PlanState *planstate);
-static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
+static void find_matching_subplans_recurse(PlanState *parent_plan,
+ PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
Bitmapset **validsubplans);
@@ -1830,13 +1828,14 @@ ExecInitPartitionPruning(PlanState *planstate,
/*
* ExecDoInitialPruning() must have initialized the PartitionPruneState to
- * perform the initial pruning. Now we simply need to initialize the
- * context information for exec pruning.
+ * perform the initial pruning. Store the PlanState so that the exec_context
+ * can be initialized using it later when find_matching_subplans_recurse()
+ * needs it.
*/
prunestate = list_nth(estate->es_part_prune_states, part_prune_index);
Assert(prunestate != NULL);
if (prunestate->do_exec_prune)
- PartitionPruneInitExecPruning(pruneinfo, prunestate, planstate);
+ prunestate->parent_plan = planstate;
/* Use the result of initial pruning done by ExecDoInitialPruning(). */
if (prunestate->do_initial_prune)
@@ -1893,8 +1892,7 @@ ExecInitPartitionPruning(PlanState *planstate,
* each PartitionedRelPruningData) for initial pruning here. Execution pruning
* requires access to the parent plan node's PlanState, which is not available
* when this function is called from ExecDoInitialPruning(), so it is
- * initialized later during ExecInitPartitionPruning() by calling
- * PartitionPruneInitExecPruning().
+ * initialized lazily during find_matching_subplans_recurse().
*/
PartitionPruneState *
ExecCreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
@@ -2099,25 +2097,30 @@ ExecCreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
}
/*
- * The exec pruning context will be initialized in
- * ExecInitPartitionPruning() when called during the initialization
- * of the parent plan node.
+ * The exec pruning context will be initialized lazily when it is
+ * first used in find_matching_subplans_recurse().
*
- * pprune->exec_pruning_steps is set to NIL to prevent
- * ExecFindMatchingSubPlans() from accessing an uninitialized
- * pprune->exec_context during the initial pruning by
- * ExecDoInitialPruning().
- *
- * prunestate->do_exec_prune is set to indicate whether
- * PartitionPruneInitExecPruning() needs to be called by
- * ExecInitPartitionPruning(). This optimization avoids
- * unnecessary cycles when only initial pruning is required.
+ * prunestate->do_exec_prune is set to indicate whether exec
+ * pruning will actually be performed.  This tells
+ * ExecInitPartitionPruning() whether it should fix the
+ * subplan_map array based on the result of initial pruning,
+ * and also lets the parent node's code set up its data
+ * structures accordingly.
*/
- pprune->exec_pruning_steps = NIL;
+ pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
+ pprune->exec_context.is_valid = false;
if (pinfo->exec_pruning_steps &&
!(econtext->ecxt_estate->es_top_eflags & EXEC_FLAG_EXPLAIN_GENERIC))
prunestate->do_exec_prune = true;
+ /*
+ * Accumulate the IDs of all PARAM_EXEC Params affecting the
+ * partitioning decisions at this plan node.
+ */
+ prunestate->execparamids = bms_add_members(prunestate->execparamids,
+ pinfo->execparamids);
+
j++;
}
i++;
@@ -2208,6 +2211,8 @@ InitPartitionPruneContext(PartitionPruneContext *context,
}
}
}
+
+ context->is_valid = true;
}
/*
@@ -2326,84 +2331,6 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
pfree(new_subplan_indexes);
}
-/*
- * PartitionPruneInitExecPruning
- * Initialize PartitionPruneState for exec pruning.
- */
-static void
-PartitionPruneInitExecPruning(PartitionPruneInfo *pruneinfo,
- PartitionPruneState *prunestate,
- PlanState *planstate)
-{
- EState *estate = planstate->state;
- int i;
- ExprContext *econtext;
-
- /* CreatePartitionPruneState() must have initialized. */
- Assert(estate->es_partition_directory != NULL);
-
- /* CreatePartitionPruneState() must have set this. */
- Assert(prunestate->do_exec_prune);
-
- /*
- * Create ExprContext if not already done for the planstate. We may need
- * an expression context to evaluate partition exprs.
- */
- ExecAssignExprContext(estate, planstate);
- econtext = planstate->ps_ExprContext;
- for (i = 0; i < prunestate->num_partprunedata; i++)
- {
- List *partrel_pruneinfos =
- list_nth_node(List, pruneinfo->prune_infos, i);
- PartitionPruningData *prunedata = prunestate->partprunedata[i];
- int j;
-
- for (j = 0; j < prunedata->num_partrelprunedata; j++)
- {
- PartitionedRelPruneInfo *pinfo =
- list_nth_node(PartitionedRelPruneInfo, partrel_pruneinfos, j);
- PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
- Relation partrel = pprune->partrel;
- PartitionDesc partdesc;
- PartitionKey partkey;
-
- /*
- * Nothing to do if there are no exec pruning steps, but do set
- * pprune->exec_pruning_steps, becasue
- * find_matching_subplans_recurse() looks at it.
- *
- * Also skip if doing EXPLAIN (GENERIC_PLAN), since parameter
- * values may be missing.
- */
- pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pprune->exec_pruning_steps == NIL ||
- (econtext->ecxt_estate->es_top_eflags & EXEC_FLAG_EXPLAIN_GENERIC))
- continue;
-
- /*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
- */
- partkey = RelationGetPartitionKey(partrel);
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
- InitPartitionPruneContext(&pprune->exec_context,
- pprune->exec_pruning_steps,
- partdesc, partkey, planstate,
- econtext);
-
- /*
- * Accumulate the IDs of all PARAM_EXEC Params affecting the
- * partitioning decisions at this plan node.
- */
- prunestate->execparamids = bms_add_members(prunestate->execparamids,
- pinfo->execparamids);
- }
- }
-}
-
/*
* ExecFindMatchingSubPlans
* Determine which subplans match the pruning steps detailed in
@@ -2449,12 +2376,16 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* recursing to other (lower-level) parents as needed.
*/
pprune = &prunedata->partrelprunedata[0];
- find_matching_subplans_recurse(prunedata, pprune, initial_prune,
+ find_matching_subplans_recurse(prunestate->parent_plan,
+ prunedata, pprune, initial_prune,
&result);
/* Expression eval may have used space in ExprContext too */
- if (pprune->exec_pruning_steps)
+ if (pprune->exec_context.is_valid)
+ {
+ Assert(pprune->exec_pruning_steps != NIL);
ResetExprContext(pprune->exec_context.exprcontext);
+ }
}
/* Add in any subplans that partition pruning didn't account for */
@@ -2477,7 +2408,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* Adds valid (non-prunable) subplan IDs to *validsubplans
*/
static void
-find_matching_subplans_recurse(PartitionPruningData *prunedata,
+find_matching_subplans_recurse(PlanState *parent_plan,
+ PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
Bitmapset **validsubplans)
@@ -2497,8 +2429,33 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
partset = get_matching_partitions(&pprune->initial_context,
pprune->initial_pruning_steps);
else if (!initial_prune && pprune->exec_pruning_steps)
+ {
+ /* Initialize exec_context if not already done. */
+ if (unlikely(!pprune->exec_context.is_valid))
+ {
+ ExprContext *econtext;
+ EState *estate = parent_plan->state;
+ /* Must allocate the needed stuff in the query lifetime context. */
+ MemoryContext oldcxt = MemoryContextSwitchTo(estate->es_query_cxt);
+ Relation partrel = pprune->partrel;
+ PartitionKey partkey = RelationGetPartitionKey(partrel);
+ PartitionDesc partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
+ partrel);
+
+ if (parent_plan->ps_ExprContext == NULL)
+ ExecAssignExprContext(estate, parent_plan);
+ econtext = parent_plan->ps_ExprContext;
+
+ InitPartitionPruneContext(&pprune->exec_context,
+ pprune->exec_pruning_steps,
+ partdesc, partkey, parent_plan,
+ econtext);
+
+ MemoryContextSwitchTo(oldcxt);
+ }
partset = get_matching_partitions(&pprune->exec_context,
pprune->exec_pruning_steps);
+ }
else
partset = pprune->present_parts;
@@ -2514,7 +2471,8 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
int partidx = pprune->subpart_map[i];
if (partidx >= 0)
- find_matching_subplans_recurse(prunedata,
+ find_matching_subplans_recurse(parent_plan,
+ prunedata,
&prunedata->partrelprunedata[partidx],
initial_prune, validsubplans);
else
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 2f45ac1cc8..ef6d8b2d48 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -122,6 +122,7 @@ typedef struct PartitionPruneState
bool do_initial_prune;
bool do_exec_prune;
int num_partprunedata;
+ PlanState *parent_plan;
PartitionPruningData *partprunedata[FLEXIBLE_ARRAY_MEMBER];
} PartitionPruneState;
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index c536a1fe19..b7f48eefcc 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -26,6 +26,7 @@ struct RelOptInfo;
* Stores information needed at runtime for pruning computations
* related to a single partitioned table.
*
+ * is_valid Has the information in this struct been initialized?
* strategy Partition strategy, e.g. LIST, RANGE, HASH.
* partnatts Number of columns in the partition key.
* nparts Number of partitions in this partitioned table.
@@ -48,6 +49,7 @@ struct RelOptInfo;
*/
typedef struct PartitionPruneContext
{
+ bool is_valid;
char strategy;
int partnatts;
int nparts;
--
2.43.0
v55-0002-Perform-runtime-initial-pruning-outside-ExecInit.patch (application/octet-stream)
From 808126517d4b0018ee96de1ba28ea664566fd1aa Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Thu, 12 Sep 2024 15:44:43 +0900
Subject: [PATCH v55 2/5] Perform runtime initial pruning outside
ExecInitNode()
This commit follows up on the previous change that moved
PartitionPruneInfos out of individual plan nodes into a list in
PlannedStmt. It moves the initialization of PartitionPruneStates
and runtime initial pruning out of ExecInitNode() and into a new
routine, ExecDoInitialPruning(), which is called by InitPlan()
before ExecInitNode() is invoked on the main plan tree and subplans.
ExecDoInitialPruning() stores the PartitionPruneStates that it
creates for the initial pruning, and that are reused for exec
pruning, in a list matching the length of es_part_prune_infos (which
holds the PartitionPruneInfos from PlannedStmt), allowing both lists
to share the same index. It also saves the initial pruning result -- a
bitmapset of indexes for surviving child subnodes -- in a similarly
indexed list.
While the initial pruning is done earlier, the execution pruning
context information (needed for runtime pruning) is initialized
later during ExecInitNode() for the parent plan node, as it requires
access to the parent node's PlanState struct.
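To make the index-sharing contract concrete, here is a condensed
sketch (not the patch text itself) of the consumer side in
ExecInitPartitionPruning(); control flow is simplified from the hunks
below and error handling is omitted:

    /*
     * es_part_prune_states and es_part_prune_results are parallel
     * lists indexed by part_prune_index, so a single index fetches
     * the matching PartitionPruneState / pruning-result pair.
     */
    PartitionPruneState *prunestate =
        list_nth(estate->es_part_prune_states, part_prune_index);

    if (prunestate->do_initial_prune)
        *initially_valid_subplans =
            list_nth_node(Bitmapset, estate->es_part_prune_results,
                          part_prune_index);
    else
        /* No pruning was done, so all subplans must be initialized. */
        *initially_valid_subplans =
            bms_add_range(NULL, 0, n_total_subplans - 1);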
---
src/backend/executor/execMain.c | 55 ++++++++
src/backend/executor/execPartition.c | 179 +++++++++++++++++++++------
src/include/executor/execPartition.h | 6 +
src/include/nodes/execnodes.h | 2 +
4 files changed, 202 insertions(+), 40 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index e6197c165e..1994112b2e 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -46,6 +46,7 @@
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/executor.h"
+#include "executor/execPartition.h"
#include "executor/nodeSubplan.h"
#include "foreign/fdwapi.h"
#include "mb/pg_wchar.h"
@@ -818,6 +819,54 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
PreventCommandIfParallelMode(CreateCommandName((Node *) plannedstmt));
}
+/*
+ * ExecDoInitialPruning
+ * Perform runtime "initial" pruning, if necessary, to determine the set
+ * of child subnodes that need to be initialized during ExecInitNode()
+ * for plan nodes that support partition pruning.
+ *
+ * For each PartitionPruneInfo in estate->es_part_prune_infos, this function
+ * creates a PartitionPruneState (even if no initial pruning is done) and adds
+ * it to es_part_prune_states. For PartitionPruneInfo entries that include
+ * initial pruning steps, the result of those steps is saved as a bitmapset
+ * of indexes representing child subnodes that are "valid" and should be
+ * initialized for execution.
+ */
+static void
+ExecDoInitialPruning(EState *estate)
+{
+ ListCell *lc;
+
+ foreach(lc, estate->es_part_prune_infos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
+ PartitionPruneState *prunestate;
+ Bitmapset *validsubplans = NULL;
+
+ /*
+ * Create the working data structure for pruning, and save it for use
+ * later in ExecInitPartitionPruning(), which will be called by the
+ * parent plan node's ExecInit* function.
+ */
+ prunestate = ExecCreatePartitionPruneState(estate, pruneinfo);
+ estate->es_part_prune_states = lappend(estate->es_part_prune_states,
+ prunestate);
+
+ /*
+ * Perform an initial partition pruning pass, if necessary, and save
+ * the bitmapset of valid subplans for use in
+ * ExecInitPartitionPruning(). If no initial pruning is performed, we
+ * still store a NULL to ensure that es_part_prune_results is the same
+ * length as es_part_prune_infos, so that ExecInitPartitionPruning()
+ * can use the same index to locate the result.
+ */
+ if (prunestate->do_initial_prune)
+ validsubplans = ExecFindMatchingSubPlans(prunestate, true);
+ estate->es_part_prune_results = lappend(estate->es_part_prune_results,
+ validsubplans);
+ }
+}
/* ----------------------------------------------------------------
* InitPlan
@@ -850,7 +899,13 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
estate->es_plannedstmt = plannedstmt;
+
+ /*
+ * Perform runtime "initial" pruning to determine the plan nodes that will
+ * not be executed.
+ */
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ ExecDoInitialPruning(estate);
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index ec730674f2..3c7c631867 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -181,8 +181,6 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
int maxfieldlen);
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
-static PartitionPruneState *CreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *pruneinfo);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -192,6 +190,9 @@ static void InitPartitionPruneContext(PartitionPruneContext *context,
static void PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
Bitmapset *initially_valid_subplans,
int n_total_subplans);
+static void PartitionPruneInitExecPruning(PartitionPruneInfo *pruneinfo,
+ PartitionPruneState *prunestate,
+ PlanState *planstate);
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
@@ -1783,20 +1784,26 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
/*
* ExecInitPartitionPruning
- * Initialize data structure needed for run-time partition pruning and
- * do initial pruning if needed
+ * Initialize the data structures needed for runtime "exec" partition
+ * pruning and return the result of initial pruning, if available.
*
* 'root_parent_relids' identifies the relation to which both the parent plan
- * and the PartitionPruneInfo given by 'part_prune_index' belong.
+ * and the PartitionPruneInfo associated with 'part_prune_index' belong.
*
- * On return, *initially_valid_subplans is assigned the set of indexes of
- * child subplans that must be initialized along with the parent plan node.
- * Initial pruning is performed here if needed and in that case only the
- * surviving subplans' indexes are added.
+ * The PartitionPruneState would have been created by ExecDoInitialPruning()
+ * and stored as the part_prune_index'th element of EState.es_part_prune_states.
+ * Here, we initialize only the PartitionPruneContext necessary for execution
+ * pruning.
*
- * If subplans are indeed pruned, subplan_map arrays contained in the returned
- * PartitionPruneState are re-sequenced to not count those, though only if the
- * maps will be needed for subsequent execution pruning passes.
+ * On return, *initially_valid_subplans is assigned the set of indexes of child
+ * subplans that must be initialized alongside the parent plan node. Initial
+ * pruning would have been performed by ExecDoInitialPruning() if necessary,
+ * and the bitmapset of surviving subplans' indexes would have been stored as
+ * the part_prune_index'th element of EState.es_part_prune_results.
+ *
+ * If subplans are pruned, the subplan_map arrays in the returned
+ * PartitionPruneState are re-sequenced to exclude those subplans, but only if
+ * the maps will be needed for subsequent execution pruning passes.
*/
PartitionPruneState *
ExecInitPartitionPruning(PlanState *planstate,
@@ -1821,17 +1828,21 @@ ExecInitPartitionPruning(PlanState *planstate,
bmsToString(root_parent_relids),
bmsToString(pruneinfo->root_parent_relids)));
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, planstate);
-
- /* Create the working data structure for pruning */
- prunestate = CreatePartitionPruneState(planstate, pruneinfo);
-
/*
- * Perform an initial partition prune pass, if required.
+ * ExecDoInitialPruning() must have initialized the PartitionPruneState to
+ * perform the initial pruning. Now we simply need to initialize the
+ * context information for exec pruning.
*/
+ prunestate = list_nth(estate->es_part_prune_states, part_prune_index);
+ Assert(prunestate != NULL);
+ if (prunestate->do_exec_prune)
+ PartitionPruneInitExecPruning(pruneinfo, prunestate, planstate);
+
+ /* Use the result of initial pruning done by ExecDoInitialPruning(). */
if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ *initially_valid_subplans = list_nth_node(Bitmapset,
+ estate->es_part_prune_results,
+ part_prune_index);
else
{
/* No pruning, so we'll need to initialize all subplans */
@@ -1877,16 +1888,23 @@ ExecInitPartitionPruning(PlanState *planstate,
* stored in each PartitionedRelPruningData can be re-used each time we
* re-evaluate which partitions match the pruning steps provided in each
* PartitionedRelPruneInfo.
+ *
+ * Note that we only initialize the PartitionPruneContext (which is placed into
+ * each PartitionedRelPruningData) for initial pruning here. Execution pruning
+ * requires access to the parent plan node's PlanState, which is not available
+ * when this function is called from ExecDoInitialPruning(), so it is
+ * initialized later during ExecInitPartitionPruning() by calling
+ * PartitionPruneInitExecPruning().
*/
-static PartitionPruneState *
-CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
+PartitionPruneState *
+ExecCreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
{
- EState *estate = planstate->state;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
+ /* We may need an expression context to evaluate partition exprs */
+ ExprContext *econtext = CreateExprContext(estate);
/* For data reading, executor always includes detached partitions */
if (estate->es_partition_directory == NULL)
@@ -1974,6 +1992,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* set to -1, as if they were pruned. By construction, both
* arrays are in partition bounds order.
*/
+ pprune->partrel = partrel;
pprune->nparts = partdesc->nparts;
pprune->subplan_map = palloc(sizeof(int) * partdesc->nparts);
@@ -2073,29 +2092,31 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
{
InitPartitionPruneContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
- partdesc, partkey, planstate,
+ partdesc, partkey, NULL,
econtext);
/* Record whether initial pruning is needed at any level */
prunestate->do_initial_prune = true;
}
- pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps &&
- !(econtext->ecxt_estate->es_top_eflags & EXEC_FLAG_EXPLAIN_GENERIC))
- {
- InitPartitionPruneContext(&pprune->exec_context,
- pinfo->exec_pruning_steps,
- partdesc, partkey, planstate,
- econtext);
- /* Record whether exec pruning is needed at any level */
- prunestate->do_exec_prune = true;
- }
/*
- * Accumulate the IDs of all PARAM_EXEC Params affecting the
- * partitioning decisions at this plan node.
+ * The exec pruning context will be initialized in
+ * ExecInitPartitionPruning() when called during the initialization
+ * of the parent plan node.
+ *
+ * pprune->exec_pruning_steps is set to NIL to prevent
+ * ExecFindMatchingSubPlans() from accessing an uninitialized
+ * pprune->exec_context during the initial pruning by
+ * ExecDoInitialPruning().
+ *
+ * prunestate->do_exec_prune is set to indicate whether
+ * PartitionPruneInitExecPruning() needs to be called by
+ * ExecInitPartitionPruning(). This optimization avoids
+ * unnecessary cycles when only initial pruning is required.
*/
- prunestate->execparamids = bms_add_members(prunestate->execparamids,
- pinfo->execparamids);
+ pprune->exec_pruning_steps = NIL;
+ if (pinfo->exec_pruning_steps &&
+ !(econtext->ecxt_estate->es_top_eflags & EXEC_FLAG_EXPLAIN_GENERIC))
+ prunestate->do_exec_prune = true;
j++;
}
@@ -2305,6 +2326,84 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
pfree(new_subplan_indexes);
}
+/*
+ * PartitionPruneInitExecPruning
+ * Initialize PartitionPruneState for exec pruning.
+ */
+static void
+PartitionPruneInitExecPruning(PartitionPruneInfo *pruneinfo,
+ PartitionPruneState *prunestate,
+ PlanState *planstate)
+{
+ EState *estate = planstate->state;
+ int i;
+ ExprContext *econtext;
+
+ /* ExecCreatePartitionPruneState() must have initialized this. */
+ Assert(estate->es_partition_directory != NULL);
+
+ /* ExecCreatePartitionPruneState() must have set this. */
+ Assert(prunestate->do_exec_prune);
+
+ /*
+ * Create ExprContext if not already done for the planstate. We may need
+ * an expression context to evaluate partition exprs.
+ */
+ ExecAssignExprContext(estate, planstate);
+ econtext = planstate->ps_ExprContext;
+ for (i = 0; i < prunestate->num_partprunedata; i++)
+ {
+ List *partrel_pruneinfos =
+ list_nth_node(List, pruneinfo->prune_infos, i);
+ PartitionPruningData *prunedata = prunestate->partprunedata[i];
+ int j;
+
+ for (j = 0; j < prunedata->num_partrelprunedata; j++)
+ {
+ PartitionedRelPruneInfo *pinfo =
+ list_nth_node(PartitionedRelPruneInfo, partrel_pruneinfos, j);
+ PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
+ Relation partrel = pprune->partrel;
+ PartitionDesc partdesc;
+ PartitionKey partkey;
+
+ /*
+ * Nothing to do if there are no exec pruning steps, but do set
+ * pprune->exec_pruning_steps, because
+ * find_matching_subplans_recurse() looks at it.
+ *
+ * Also skip if doing EXPLAIN (GENERIC_PLAN), since parameter
+ * values may be missing.
+ */
+ pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
+ if (pprune->exec_pruning_steps == NIL ||
+ (econtext->ecxt_estate->es_top_eflags & EXEC_FLAG_EXPLAIN_GENERIC))
+ continue;
+
+ /*
+ * We can rely on the copies of the partitioned table's partition
+ * key and partition descriptor appearing in its relcache entry,
+ * because that entry will be held open and locked for the
+ * duration of this executor run.
+ */
+ partkey = RelationGetPartitionKey(partrel);
+ partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
+ partrel);
+ InitPartitionPruneContext(&pprune->exec_context,
+ pprune->exec_pruning_steps,
+ partdesc, partkey, planstate,
+ econtext);
+
+ /*
+ * Accumulate the IDs of all PARAM_EXEC Params affecting the
+ * partitioning decisions at this plan node.
+ */
+ prunestate->execparamids = bms_add_members(prunestate->execparamids,
+ pinfo->execparamids);
+ }
+ }
+}
+
/*
* ExecFindMatchingSubPlans
* Determine which subplans match the pruning steps detailed in
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 12aacc84ff..2f45ac1cc8 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -42,6 +42,9 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* PartitionedRelPruneInfo (see plannodes.h); though note that here,
* subpart_map contains indexes into PartitionPruningData.partrelprunedata[].
*
+ * partrel Partitioned table; points to
+ * EState.es_relations[rti-1], where rti is the
+ * table's RT index
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
@@ -58,6 +61,7 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
*/
typedef struct PartitionedRelPruningData
{
+ Relation partrel;
int nparts;
int *subplan_map;
int *subpart_map;
@@ -128,4 +132,6 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
bool initial_prune);
+extern PartitionPruneState *ExecCreatePartitionPruneState(EState *estate,
+ PartitionPruneInfo *pruneinfo);
#endif /* EXECPARTITION_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 22b928e085..518a9fcd15 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -637,6 +637,8 @@ typedef struct EState
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
+ List *es_part_prune_states; /* List of PartitionPruneState */
+ List *es_part_prune_results; /* List of Bitmapset */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
--
2.43.0
v55-0004-Defer-locking-of-runtime-prunable-relations-to-e.patch (application/octet-stream)
From ad047f0bb7b703c0d2079464622588138e64b117 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Wed, 18 Sep 2024 12:00:41 +0900
Subject: [PATCH v55 4/5] Defer locking of runtime-prunable relations to
executor
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
When preparing a cached plan for execution, plancache.c locks the
relations in the plan's range table to ensure they are safe for
execution. However, this approach, implemented in
AcquireExecutorLocks(), results in unnecessarily locking relations
that might be pruned during "initial" runtime pruning.
To optimize this, locking is now deferred for relations subject to
"initial" runtime pruning. The planner now provides a set of
"unprunable" relations through the new PlannedStmt.unprunableRelids
field. AcquireExecutorLocks() will only lock these unprunable
relations. PlannedStmt.unprunableRelids is populated by subtracting
the set of initially prunable relids from all RT indexes. The prunable
relids are identified by examining all PartitionPruneInfos during
set_plan_refs() and storing the RT indexes of partitions subject to
"initial" pruning steps. While at it, some duplicated code in
set_append_references() and set_mergeappend_references() that
constructs the prunable relids set has been refactored into a common
function.
Deferred locks are taken, if necessary, after ExecDoInitialPruning()
determines the set of unpruned partitions. To allow the executor to
determine whether the plan tree it’s executing is cached and may
contain unlocked relations, the CachedPlan is now made available via
the QueryDesc. The executor can call CachedPlanRequiresLocking(),
which returns true if the CachedPlan is a reusable generic plan that
might contain unlocked relations.
Plan nodes like Append have already been updated to consider only the
set of unpruned relations. However, there are cases, such as child
RowMarks and child result relations, where the code manipulating them
does not directly receive information about unpruned partitions.
Therefore, code handling child RowMarks and result relations has been
modified to ensure they don’t belong to pruned partitions. For this,
the RT indexes of unpruned partitions are added in
ExecDoInitialPruning() to es_unprunable_relids, which initially
contains PlannedStmt.unprunableRelids. The corresponding code now
processes only those child RowMarks and result relations whose owning
relations are in this set. For result relations managed by a
ModifyTable node, its resultRelations list is truncated in
ExecInitModifyTable to only consider unpruned relations and the
ResultRelInfo structs are created only for those.
Finally, an Assert has also been added in ExecCheckPermissions() to
ensure that all relations whose permissions are checked have been
properly locked, helping to catch any accidental omission of relations
from the unprunableRelids set that should have their permissions
checked.
This deferment introduces a window where prunable relations may be
altered by concurrent DDL, potentially causing the plan to become
invalid. Consequently, the executor might attempt to execute an
invalid plan, leading to errors such as failing to locate the index
of an unpruned partition that may have been dropped concurrently
during ExecInitIndexScan() (if it's partition-local, not inherited,
for example). Future commits will introduce changes to enable the
executor to check plan validity during ExecutorStart() and retry with
a newly created plan if the original becomes invalid after taking
deferred locks.
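Since the body of CachedPlanRequiresLocking() does not appear in the
hunks below, the following is only an assumed sketch of the test it
performs, based on the is_generic flag that this patch adds to
CachedPlan:

    /*
     * Assumed implementation, not the patch text: deferred
     * executor-time locking is needed only for a reusable
     * (non-oneshot) generic plan; custom and oneshot plans are
     * planned with all locks already held.
     */
    static inline bool
    CachedPlanRequiresLocking(CachedPlan *cplan)
    {
        return !cplan->is_oneshot && cplan->is_generic;
    }

The executor consults this through ExecShouldLockRelations(), shown in
the execMain.c hunk below.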
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +--
src/backend/commands/extension.c | 1 +
src/backend/commands/matview.c | 2 +-
src/backend/commands/prepare.c | 3 +-
src/backend/executor/execMain.c | 75 ++++++++++++++++++++++++--
src/backend/executor/execParallel.c | 9 +++-
src/backend/executor/execPartition.c | 36 ++++++++++---
src/backend/executor/functions.c | 1 +
src/backend/executor/nodeAppend.c | 8 +--
src/backend/executor/nodeLockRows.c | 10 +++-
src/backend/executor/nodeMergeAppend.c | 2 +-
src/backend/executor/nodeModifyTable.c | 38 ++++++++++---
src/backend/executor/spi.c | 1 +
src/backend/optimizer/plan/planner.c | 2 +
src/backend/optimizer/plan/setrefs.c | 7 +++
src/backend/partitioning/partprune.c | 18 +++++++
src/backend/tcop/pquery.c | 10 +++-
src/backend/utils/cache/plancache.c | 40 ++++++++------
src/include/commands/explain.h | 5 +-
src/include/executor/execPartition.h | 5 +-
src/include/executor/execdesc.h | 2 +
src/include/nodes/execnodes.h | 6 +++
src/include/nodes/pathnodes.h | 6 +++
src/include/nodes/plannodes.h | 7 +++
src/include/utils/plancache.h | 10 ++++
27 files changed, 263 insertions(+), 52 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 91de442f43..db976f928a 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -552,7 +552,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 0b629b1f79..57a3375cad 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -324,7 +324,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index aaec439892..49f7370734 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -509,7 +509,7 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
}
@@ -617,7 +617,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage,
@@ -673,7 +674,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, cplan, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index fab59ad5f6..bd169edeff 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -742,6 +742,7 @@ execute_sql_string(const char *sql)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 010097873d..69be74b4bd 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -438,7 +438,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 07257d4db9..311b9ebd5b 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -655,7 +655,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
+ ExplainOnePlan(pstmt, cplan, into, es, query_string, paramLI,
+ queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
else
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 1994112b2e..df1b5b2dc3 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -53,6 +53,7 @@
#include "miscadmin.h"
#include "parser/parse_relation.h"
#include "rewrite/rewriteHandler.h"
+#include "storage/lmgr.h"
#include "tcop/utility.h"
#include "utils/acl.h"
#include "utils/backend_status.h"
@@ -90,6 +91,7 @@ static bool ExecCheckPermissionsModified(Oid relOid, Oid userid,
AclMode requiredPerms);
static void ExecCheckXactReadOnly(PlannedStmt *plannedstmt);
static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
+static inline bool ExecShouldLockRelations(EState *estate);
/* end of local decls */
@@ -600,6 +602,21 @@ ExecCheckPermissions(List *rangeTable, List *rteperminfos,
(rte->rtekind == RTE_SUBQUERY &&
rte->relkind == RELKIND_VIEW));
+ /*
+ * Ensure that we have at least an AccessShareLock on relations
+ * whose permissions need to be checked.
+ *
+ * Skip this check in a parallel worker because locks won't be
+ * taken until ExecInitNode() performs plan initialization.
+ *
+ * XXX: ExecCheckPermissions() in a parallel worker may be
+ * redundant with the checks done in the leader process, so this
+ * should be reviewed to ensure it’s necessary.
+ */
+ Assert(IsParallelWorker() ||
+ CheckRelationOidLockedByMe(rte->relid, AccessShareLock,
+ true));
+
(void) getRTEPermissionInfo(rteperminfos, rte);
/* Many-to-one mapping not allowed */
Assert(!bms_is_member(rte->perminfoindex, indexset));
@@ -862,12 +879,46 @@ ExecDoInitialPruning(EState *estate)
* result.
*/
if (prunestate->do_initial_prune)
- validsubplans = ExecFindMatchingSubPlans(prunestate, true);
+ {
+ Bitmapset *validsubplan_rtis = NULL;
+
+ validsubplans = ExecFindMatchingSubPlans(prunestate, true,
+ &validsubplan_rtis);
+ if (ExecShouldLockRelations(estate))
+ {
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(validsubplan_rtis,
+ rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(rtindex, estate);
+
+ Assert(rte->rtekind == RTE_RELATION &&
+ rte->rellockmode != NoLock);
+ LockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
+ estate->es_unprunable_relids = bms_add_members(estate->es_unprunable_relids,
+ validsubplan_rtis);
+ }
+
estate->es_part_prune_results = lappend(estate->es_part_prune_results,
validsubplans);
}
}
+/*
+ * Locks might be needed only if running a cached plan that might contain
+ * unlocked relations, such as reused generic plans.
+ */
+static inline bool
+ExecShouldLockRelations(EState *estate)
+{
+ return estate->es_cachedplan == NULL ? false :
+ CachedPlanRequiresLocking(estate->es_cachedplan);
+}
+
/* ----------------------------------------------------------------
* InitPlan
*
@@ -880,6 +931,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ CachedPlan *cachedplan = queryDesc->cplan;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -899,10 +951,13 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
estate->es_plannedstmt = plannedstmt;
+ estate->es_cachedplan = cachedplan;
+ estate->es_unprunable_relids = bms_copy(plannedstmt->unprunableRelids);
/*
* Perform runtime "initial" pruning to determine the plan nodes that will
- * not be executed.
+ * not be executed. This will also add the RT indexes of surviving leaf
+ * partitions to es_unprunable_relids.
*/
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
ExecDoInitialPruning(estate);
@@ -921,8 +976,13 @@ InitPlan(QueryDesc *queryDesc, int eflags)
Relation relation;
ExecRowMark *erm;
- /* ignore "parent" rowmarks; they are irrelevant at runtime */
- if (rc->isParent)
+ /*
+ * Ignore "parent" rowmarks, because they are irrelevant at
+ * runtime. Also ignore the rowmarks belonging to child tables
+ * that have been pruned in ExecDoInitialPruning().
+ */
+ if (rc->isParent ||
+ !bms_is_member(rc->rti, estate->es_unprunable_relids))
continue;
/* get relation's OID (will produce InvalidOid if subquery) */
@@ -2959,6 +3019,13 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
}
}
+ /*
+ * Copy es_unprunable_relids so that RowMarks of pruned relations are
+ * ignored in ExecInitLockRows() and ExecInitModifyTable() when
+ * initializing the plan trees below.
+ */
+ rcestate->es_unprunable_relids = parentestate->es_unprunable_relids;
+
/*
* Initialize private state information for each SubPlan. We must do this
* before running ExecInitNode on the main query tree, since
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index b01a2fdfdd..7519c9a860 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1257,8 +1257,15 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(¶mspace);
- /* Create a QueryDesc for the query. */
+ /*
+ * Create a QueryDesc for the query. We pass NULL for cachedplan, because
+ * we don't have a pointer to the CachedPlan in the leader's process. It's
+ * fine because the only reason the executor needs to see it is to decide
+ * if it should take locks on certain relations, but parallel workers
+ * always take locks anyway.
+ */
return CreateQueryDesc(pstmt,
+ NULL,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index d9fa593785..551e0ce9b2 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -26,6 +26,7 @@
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
#include "rewrite/rewriteManip.h"
+#include "storage/lmgr.h"
#include "utils/acl.h"
#include "utils/lsyscache.h"
#include "utils/partcache.h"
@@ -194,7 +195,8 @@ static void find_matching_subplans_recurse(PlanState *parent_plan,
PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans);
+ Bitmapset **validsubplans,
+ Bitmapset **validsubplan_rtis);
/*
@@ -1978,8 +1980,8 @@ ExecCreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
* The set of partitions that exist now might not be the same that
* existed when the plan was made. The normal case is that it is;
* optimize for that case with a quick comparison, and just copy
- * the subplan_map and make subpart_map point to the one in
- * PruneInfo.
+ * the subplan_map and make subpart_map and rti_map point to
+ * the ones in PruneInfo.
*
* For the case where they aren't identical, we could have more
* partitions on either side; or even exactly the same number of
@@ -1999,6 +2001,7 @@ ExecCreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
sizeof(int) * partdesc->nparts) == 0)
{
pprune->subpart_map = pinfo->subpart_map;
+ pprune->rti_map = pinfo->rti_map;
memcpy(pprune->subplan_map, pinfo->subplan_map,
sizeof(int) * pinfo->nparts);
}
@@ -2019,6 +2022,7 @@ ExecCreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
* mismatches.
*/
pprune->subpart_map = palloc(sizeof(int) * partdesc->nparts);
+ pprune->rti_map = palloc(sizeof(int) * partdesc->nparts);
for (pp_idx = 0; pp_idx < partdesc->nparts; pp_idx++)
{
@@ -2036,6 +2040,8 @@ ExecCreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
pinfo->subplan_map[pd_idx];
pprune->subpart_map[pp_idx] =
pinfo->subpart_map[pd_idx];
+ pprune->rti_map[pp_idx] =
+ pinfo->rti_map[pd_idx];
pd_idx++;
continue;
}
@@ -2073,6 +2079,7 @@ ExecCreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
pprune->subpart_map[pp_idx] = -1;
pprune->subplan_map[pp_idx] = -1;
+ pprune->rti_map[pp_idx] = 0;
}
}
@@ -2339,10 +2346,13 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
* runtime pruning.
+ *
+ * validsubplan_rtis must be non-NULL if initial_prune is true.
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ Bitmapset **validsubplan_rtis)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2378,7 +2388,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
pprune = &prunedata->partrelprunedata[0];
find_matching_subplans_recurse(prunestate->parent_plan,
prunedata, pprune, initial_prune,
- &result);
+ &result, validsubplan_rtis);
/* Expression eval may have used space in ExprContext too */
if (pprune->exec_context.is_valid)
@@ -2395,6 +2405,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (validsubplan_rtis)
+ *validsubplan_rtis = bms_copy(*validsubplan_rtis);
MemoryContextReset(prunestate->prune_context);
@@ -2405,14 +2417,16 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* find_matching_subplans_recurse
* Recursive worker function for ExecFindMatchingSubPlans
*
- * Adds valid (non-prunable) subplan IDs to *validsubplans
+ * Adds valid (non-prunable) subplan IDs to *validsubplans and the RT indexes
+ * of their owning leaf partitions to *validsubplan_rtis if it's non-NULL.
*/
static void
find_matching_subplans_recurse(PlanState *parent_plan,
PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans)
+ Bitmapset **validsubplans,
+ Bitmapset **validsubplan_rtis)
{
Bitmapset *partset;
int i;
@@ -2464,8 +2478,13 @@ find_matching_subplans_recurse(PlanState *parent_plan,
while ((i = bms_next_member(partset, i)) >= 0)
{
if (pprune->subplan_map[i] >= 0)
+ {
*validsubplans = bms_add_member(*validsubplans,
pprune->subplan_map[i]);
+ if (validsubplan_rtis)
+ *validsubplan_rtis = bms_add_member(*validsubplan_rtis,
+ pprune->rti_map[i]);
+ }
else
{
int partidx = pprune->subpart_map[i];
@@ -2474,7 +2493,8 @@ find_matching_subplans_recurse(PlanState *parent_plan,
find_matching_subplans_recurse(parent_plan,
prunedata,
&prunedata->partrelprunedata[partidx],
- initial_prune, validsubplans);
+ initial_prune, validsubplans,
+ validsubplan_rtis);
else
{
/*
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 692854e2b3..6f6f45e0ad 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -840,6 +840,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest = None_Receiver;
es->qd = CreateQueryDesc(es->stmt,
+ NULL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index de7ebab5c2..006bdafaea 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -581,7 +581,7 @@ choose_next_subplan_locally(AppendState *node)
else if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
node->as_valid_subplans_identified = true;
}
@@ -648,7 +648,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
node->as_valid_subplans_identified = true;
/*
@@ -724,7 +724,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
node->as_valid_subplans_identified = true;
mark_invalid_subplans_as_finished(node);
@@ -877,7 +877,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
node->as_valid_subplans_identified = true;
classify_matching_subplans(node);
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 41754ddfea..b5b2cd53c5 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -28,6 +28,7 @@
#include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "utils/rel.h"
+#include "utils/lsyscache.h"
/* ----------------------------------------------------------------
@@ -347,8 +348,13 @@ ExecInitLockRows(LockRows *node, EState *estate, int eflags)
ExecRowMark *erm;
ExecAuxRowMark *aerm;
- /* ignore "parent" rowmarks; they are irrelevant at runtime */
- if (rc->isParent)
+ /*
+ * Ignore "parent" rowmarks, because they are irrelevant at
+ * runtime. Also ignore the rowmarks belonging to child tables
+ * that have been pruned in ExecDoInitialPruning().
+ */
+ if (rc->isParent ||
+ !bms_is_member(rc->rti, estate->es_unprunable_relids))
continue;
/* find ExecRowMark and build ExecAuxRowMark */
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 3ed91808dd..f7821aa178 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -219,7 +219,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 8bf4c80d4a..3c02782445 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -4176,12 +4176,17 @@ ExecLookupResultRelByOid(ModifyTableState *node, Oid resultoid,
hash_search(node->mt_resultOidHash, &resultoid, HASH_FIND, NULL);
if (mtlookup)
{
+ ResultRelInfo *resultRelInfo;
+
if (update_cache)
{
node->mt_lastResultOid = resultoid;
node->mt_lastResultIndex = mtlookup->relationIndex;
}
- return node->resultRelInfo + mtlookup->relationIndex;
+
+ resultRelInfo = node->resultRelInfo + mtlookup->relationIndex;
+
+ return resultRelInfo;
}
}
else
@@ -4218,7 +4223,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
ModifyTableState *mtstate;
Plan *subplan = outerPlan(node);
CmdType operation = node->operation;
- int nrels = list_length(node->resultRelations);
+ int nrels;
+ List *resultRelations = NIL;
ResultRelInfo *resultRelInfo;
List *arowmarks;
ListCell *l;
@@ -4228,6 +4234,20 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
/* check for unsupported flags */
Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
+ /*
+ * Only consider unpruned relations. In the future, it might be more
+ * efficient to store resultRelations as a bitmapset, which would make
+ * this operation cheaper.
+ */
+ foreach(l, node->resultRelations)
+ {
+ Index rti = lfirst_int(l);
+
+ if (bms_is_member(rti, estate->es_unprunable_relids))
+ resultRelations = lappend_int(resultRelations, rti);
+ }
+ nrels = list_length(resultRelations);
+
/*
* create state structure
*/
@@ -4265,6 +4285,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
*/
if (node->rootRelation > 0)
{
+ Assert(bms_is_member(node->rootRelation, estate->es_unprunable_relids));
mtstate->rootResultRelInfo = makeNode(ResultRelInfo);
ExecInitResultRelation(estate, mtstate->rootResultRelInfo,
node->rootRelation);
@@ -4279,7 +4300,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
/* set up epqstate with dummy subplan data for the moment */
EvalPlanQualInit(&mtstate->mt_epqstate, estate, NULL, NIL,
- node->epqParam, node->resultRelations);
+ node->epqParam, resultRelations);
mtstate->fireBSTriggers = true;
/*
@@ -4297,7 +4318,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
*/
resultRelInfo = mtstate->resultRelInfo;
i = 0;
- foreach(l, node->resultRelations)
+ foreach(l, resultRelations)
{
Index resultRelation = lfirst_int(l);
List *mergeActions = NIL;
@@ -4589,8 +4610,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
ExecRowMark *erm;
ExecAuxRowMark *aerm;
- /* ignore "parent" rowmarks; they are irrelevant at runtime */
- if (rc->isParent)
+ /*
+ * Ignore "parent" rowmarks, because they are irrelevant at
+ * runtime. Also ignore the rowmarks belonging to child tables
+ * that have been pruned in ExecDoInitialPruning().
+ */
+ if (rc->isParent ||
+ !bms_is_member(rc->rti, estate->es_unprunable_relids))
continue;
/* Find ExecRowMark and build ExecAuxRowMark */
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 90d9834576..659bd6dcd9 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -2684,6 +2684,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
snap = InvalidSnapshot;
qdesc = CreateQueryDesc(stmt,
+ cplan,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 1b9071c774..9e47a7fd50 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -549,6 +549,8 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->planTree = top_plan;
result->partPruneInfos = glob->partPruneInfos;
result->rtable = glob->finalrtable;
+ result->unprunableRelids = bms_difference(bms_add_range(NULL, 1, list_length(result->rtable)),
+ glob->prunableRelids);
result->permInfos = glob->finalrteperminfos;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index e2ea406c4e..283a61a972 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1764,8 +1764,15 @@ register_partpruneinfo(PlannerInfo *root, int part_prune_index, int rtoffset)
foreach(l2, prune_infos)
{
PartitionedRelPruneInfo *prelinfo = lfirst(l2);
+ int i;
prelinfo->rtindex += rtoffset;
+ for (i = 0; i < prelinfo->nparts; i++)
+ {
+ prelinfo->rti_map[i] += rtoffset;
+ glob->prunableRelids = bms_add_member(glob->prunableRelids,
+ prelinfo->rti_map[i]);
+ }
}
}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 60fabb1734..85894c87af 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -645,6 +645,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *subplan_map;
int *subpart_map;
Oid *relid_map;
+ int *rti_map;
/*
* Construct the subplan and subpart maps for this partitioning level.
@@ -657,6 +658,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
+ rti_map = (int *) palloc0(nparts * sizeof(int));
present_parts = NULL;
i = -1;
@@ -671,9 +673,24 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
+
+ /*
+ * Track the RT indexes of partitions to ensure they are included
+ * in the prunableRelids set of relations that are locked during
+ * execution. This ensures that if the plan is cached, these
+ * partitions are locked when the plan is reused.
+ *
+ * Partitions without a subplan and sub-partitioned partitions
+ * where none of the sub-partitions have a subplan due to
+ * constraint exclusion are not included in this set. Instead,
+ * they are added to the unprunableRelids set, and the relations
+ * in this set are locked by AcquireExecutorLocks() before
+ * executing a cached plan.
+ */
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
+ rti_map[i] = (int) partrel->relid;
/* Record finding this subplan */
subplansfound = bms_add_member(subplansfound, subplanidx);
@@ -695,6 +712,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
pinfo->relid_map = relid_map;
+ pinfo->rti_map = rti_map;
}
pfree(relid_subpart_map);
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index a1f8d03db1..6e8f6b1b8f 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -36,6 +36,7 @@ Portal ActivePortal = NULL;
static void ProcessQuery(PlannedStmt *plan,
+ CachedPlan *cplan,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +66,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +79,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->cplan = cplan; /* CachedPlan supplying the plannedstmt */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +125,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * cplan: CachedPlan supplying the plan
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +138,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ CachedPlan *cplan,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +150,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, cplan, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -493,6 +498,7 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->cplan,
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1276,6 +1282,7 @@ PortalRunMulti(Portal portal,
{
/* statement can set tag string */
ProcessQuery(pstmt,
+ portal->cplan,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1285,6 +1292,7 @@ PortalRunMulti(Portal portal,
{
/* stmt added by rewrite cannot set tag */
ProcessQuery(pstmt,
+ portal->cplan,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 5af1a168ec..5b75dadf13 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -104,7 +104,8 @@ static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
static bool CheckCachedPlan(CachedPlanSource *plansource);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv);
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ bool generic);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
@@ -815,8 +816,11 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
*
- * On a "true" return, we have acquired the locks needed to run the plan.
- * (We must do this for the "true" result to be race-condition-free.)
+ * On a "true" return, we have acquired locks on the "unprunableRelids" set
+ * for all plans in plansource->stmt_list. The plans are not completely
+ * race-condition-free until the executor takes locks on the set of prunable
+ * relations that survive initial runtime pruning during executor
+ * initialization.
*/
static bool
CheckCachedPlan(CachedPlanSource *plansource)
@@ -893,10 +897,10 @@ CheckCachedPlan(CachedPlanSource *plansource)
* or it can be set to NIL if we need to re-copy the plansource's query_list.
*
* To build a generic, parameter-value-independent plan, pass NULL for
- * boundParams. To build a custom plan, pass the actual parameter values via
- * boundParams. For best effect, the PARAM_FLAG_CONST flag should be set on
- * each parameter value; otherwise the planner will treat the value as a
- * hint rather than a hard constant.
+ * boundParams, and true for generic. To build a custom plan, pass the actual
+ * parameter values via boundParams, and false for generic. For best effect,
+ * the PARAM_FLAG_CONST flag should be set on each parameter value; otherwise
+ * the planner will treat the value as a hint rather than a hard constant.
*
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
@@ -904,7 +908,8 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv)
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ bool generic)
{
CachedPlan *plan;
List *plist;
@@ -1026,6 +1031,7 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
plan->refcount = 0;
plan->context = plan_context;
plan->is_oneshot = plansource->is_oneshot;
+ plan->is_generic = generic;
plan->is_saved = false;
plan->is_valid = true;
@@ -1196,7 +1202,7 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
else
{
/* Build a new generic plan */
- plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv, true);
/* Just make real sure plansource->gplan is clear */
ReleaseGenericPlan(plansource);
/* Link the new generic plan into the plansource */
@@ -1241,7 +1247,7 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (customplan)
{
/* Build a custom plan */
- plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv, false);
/* Accumulate total costs of custom plans */
plansource->total_custom_cost += cached_plan_cost(plan, true);
@@ -1387,8 +1393,8 @@ CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
}
/*
- * Reject if AcquireExecutorLocks would have anything to do. This is
- * probably unnecessary given the previous check, but let's be safe.
+ * Reject if there are any lockable relations. This is probably
+ * unnecessary given the previous check, but let's be safe.
*/
foreach(lc, plan->stmt_list)
{
@@ -1776,7 +1782,7 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ int rtindex;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1794,9 +1800,13 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ rtindex = -1;
+ while ((rtindex = bms_next_member(plannedstmt->unprunableRelids,
+ rtindex)) >= 0)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ RangeTblEntry *rte = list_nth_node(RangeTblEntry,
+ plannedstmt->rtable,
+ rtindex - 1);
if (!(rte->rtekind == RTE_RELATION ||
(rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 3ab0aae78f..21c71e0d53 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -103,8 +103,9 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
- ExplainState *es, const char *queryString,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
+ IntoClause *into, ExplainState *es,
+ const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
const BufferUsage *bufusage,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index ef6d8b2d48..7f2592e3b0 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -48,6 +48,7 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
+ * rti_map RT index by partition index, or 0.
* present_parts A Bitmapset of the partition indexes that we
* have subplans or subparts for.
* initial_pruning_steps List of PartitionPruneSteps used to
@@ -65,6 +66,7 @@ typedef struct PartitionedRelPruningData
int nparts;
int *subplan_map;
int *subpart_map;
+ int *rti_map pg_node_attr(array_size(nparts));
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
@@ -132,7 +134,8 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
Bitmapset *root_parent_relids,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ Bitmapset **validsubplan_rtis);
extern PartitionPruneState *ExecCreatePartitionPruneState(EState *estate,
PartitionPruneInfo *pruneinfo);
#endif /* EXECPARTITION_H */
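For reference, a caller-side sketch of the revised ExecFindMatchingSubPlans()
interface declared above (the prunestate variable and surrounding code are
illustrative, not part of the patch):

	Bitmapset  *validsubplan_rtis = NULL;
	Bitmapset  *validsubplans;

	/*
	 * Returns the set of surviving subplan indexes; the new out-parameter
	 * additionally reports the RT indexes of the leaf partitions those
	 * subplans scan, computed via the rti_map added above, so that only
	 * the surviving partitions need to be locked.
	 */
	validsubplans = ExecFindMatchingSubPlans(prunestate, true,
											 &validsubplan_rtis);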
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index 0a7274e26c..0e7245435d 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,7 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ CachedPlan *cplan; /* CachedPlan that supplies the plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +58,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 518a9fcd15..57170818c0 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -42,6 +42,7 @@
#include "storage/condition_variable.h"
#include "utils/hsearch.h"
#include "utils/queryenvironment.h"
+#include "utils/plancache.h"
#include "utils/reltrigger.h"
#include "utils/sharedtuplestore.h"
#include "utils/snapshot.h"
@@ -636,9 +637,14 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ CachedPlan *es_cachedplan;
List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
List *es_part_prune_states; /* List of PartitionPruneState */
List *es_part_prune_results; /* List of Bitmapset */
+ Bitmapset *es_unprunable_relids; /* PlannedStmt.unprunableRelids + RT
+ * indexes of leaf partitions that
+ * survive initial pruning; see
+ * ExecDoInitialPruning() */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 8d30b6e896..cc2190ea63 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -116,6 +116,12 @@ typedef struct PlannerGlobal
/* "flat" rangetable for executor */
List *finalrtable;
+ /*
+ * RT indexes of relations subject to removal from the plan due to runtime
+ * pruning at plan initialization time
+ */
+ Bitmapset *prunableRelids;
+
/* "flat" list of RTEPermissionInfos */
List *finalrteperminfos;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 39d0281c23..318e30fe2f 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -74,6 +74,10 @@ typedef struct PlannedStmt
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *unprunableRelids; /* RT indexes of relations that are not
+ * subject to runtime pruning; for
+ * AcquireExecutorLocks() */
+
List *permInfos; /* list of RTEPermissionInfo nodes for rtable
* entries needing one */
@@ -1474,6 +1478,9 @@ typedef struct PartitionedRelPruneInfo
/* subpart index by partition index, or -1 */
int *subpart_map pg_node_attr(array_size(nparts));
+ /* RT index by partition index, or 0 */
+ int *rti_map pg_node_attr(array_size(nparts));
+
/* relation OID by partition index, or 0 */
Oid *relid_map pg_node_attr(array_size(nparts));
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index a90dfdf906..0b5ee007ca 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -149,6 +149,7 @@ typedef struct CachedPlan
int magic; /* should equal CACHEDPLAN_MAGIC */
List *stmt_list; /* list of PlannedStmts */
bool is_oneshot; /* is it a "oneshot" plan? */
+ bool is_generic; /* is it a reusable generic plan? */
bool is_saved; /* is CachedPlan in a long-lived context? */
bool is_valid; /* is the stmt_list currently valid? */
Oid planRoleId; /* Role ID the plan was created for */
@@ -235,4 +236,13 @@ extern bool CachedPlanIsSimplyValid(CachedPlanSource *plansource,
extern CachedExpression *GetCachedExpression(Node *expr);
extern void FreeCachedExpression(CachedExpression *cexpr);
+/*
+ * CachedPlanRequiresLocking: should the executor acquire locks?
+ */
+static inline bool
+CachedPlanRequiresLocking(CachedPlan *cplan)
+{
+ return cplan->is_generic;
+}
+
#endif /* PLANCACHE_H */
--
2.43.0
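A note for readers of the AcquireExecutorLocks() hunk above: the RT indexes
stored in unprunableRelids are 1-based while List offsets are 0-based, which
is why the loop fetches rtable entries at rtindex - 1. A minimal standalone
sketch of the same pattern (the helper name lock_unprunable_rels is
hypothetical, and only the plain RTE_RELATION case is handled):

static void
lock_unprunable_rels(PlannedStmt *stmt)
{
	int			rtindex = -1;

	while ((rtindex = bms_next_member(stmt->unprunableRelids, rtindex)) >= 0)
	{
		/* RT indexes are 1-based, List offsets 0-based, hence the -1 */
		RangeTblEntry *rte = list_nth_node(RangeTblEntry, stmt->rtable,
										   rtindex - 1);

		if (rte->rtekind == RTE_RELATION)
			LockRelationOid(rte->relid, rte->rellockmode);
	}
}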
Attachment: v55-0005-Handle-CachedPlan-invalidation-in-the-executor.patch (application/octet-stream)
From 24eea4f10fa7129bc6284a7317d413bed2b177b5 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Thu, 22 Aug 2024 19:38:13 +0900
Subject: [PATCH v55 5/5] Handle CachedPlan invalidation in the executor
This commit makes changes to handle cases where a cached plan
becomes invalid before deferred locks on prunable relations are taken.
* Add checks at various points in ExecutorStart() and the functions it
calls to detect whether the plan has become invalid. If so, the
function and its callers return immediately. A previous commit
ensures any partially initialized PlanState tree objects are cleaned
up appropriately.
* Introduce ExecutorStartExt(), a wrapper over ExecutorStart(), to
handle cases where plan initialization is aborted due to invalidation.
ExecutorStartExt() creates a new transient CachedPlan if needed and
retries execution. This new entry point is only required for sites
using plancache.c. It requires passing the QueryDesc, eflags,
CachedPlanSource, and query_index (index in CachedPlanSource.query_list).
* Add GetSingleCachedPlan() in plancache.c to create a transient
CachedPlan for a specified query in the given CachedPlanSource.
Such CachedPlans are tracked in a separate global list for the
plancache invalidation callbacks to check.
This also adds isolation tests using the delay_execution test module
to verify scenarios where a CachedPlan becomes invalid before the
deferred locks are taken.
All ExecutorStart_hook implementations now must add the following
block after the standard_ExecutorStart() call to ensure they do not
proceed with an invalid plan:
/* The plan may have become invalid during ExecutorStart() */
if (!ExecPlanStillValid(queryDesc->estate))
return;
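For illustration, a minimal conforming hook might look like this
(my_ExecutorStart and prev_ExecutorStart are placeholder names; the pattern
matches the auto_explain and pg_stat_statements changes below):

static ExecutorStart_hook_type prev_ExecutorStart = NULL;

static void
my_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
	/* Start the executor, chaining to any previously installed hook */
	if (prev_ExecutorStart)
		prev_ExecutorStart(queryDesc, eflags);
	else
		standard_ExecutorStart(queryDesc, eflags);

	/* The plan may have become invalid during ExecutorStart() */
	if (!ExecPlanStillValid(queryDesc->estate))
		return;

	/* ... extension-specific post-start work goes here ... */
}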
Reviewed-by: Robert Haas
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
contrib/auto_explain/auto_explain.c | 4 +
.../pg_stat_statements/pg_stat_statements.c | 4 +
src/backend/commands/explain.c | 8 +-
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 10 +-
src/backend/commands/trigger.c | 14 ++
src/backend/executor/README | 35 ++-
src/backend/executor/execMain.c | 84 ++++++-
src/backend/executor/execUtils.c | 3 +-
src/backend/executor/spi.c | 19 +-
src/backend/tcop/postgres.c | 4 +-
src/backend/tcop/pquery.c | 31 ++-
src/backend/utils/cache/plancache.c | 206 ++++++++++++++++
src/backend/utils/mmgr/portalmem.c | 4 +-
src/include/commands/explain.h | 1 +
src/include/commands/trigger.h | 1 +
src/include/executor/execdesc.h | 1 +
src/include/executor/executor.h | 17 ++
src/include/nodes/execnodes.h | 1 +
src/include/utils/plancache.h | 26 ++
src/include/utils/portal.h | 4 +-
src/test/modules/delay_execution/Makefile | 3 +-
.../modules/delay_execution/delay_execution.c | 63 ++++-
.../expected/cached-plan-inval.out | 230 ++++++++++++++++++
src/test/modules/delay_execution/meson.build | 1 +
.../specs/cached-plan-inval.spec | 75 ++++++
26 files changed, 814 insertions(+), 36 deletions(-)
create mode 100644 src/test/modules/delay_execution/expected/cached-plan-inval.out
create mode 100644 src/test/modules/delay_execution/specs/cached-plan-inval.spec
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index 677c135f59..9eb5e9a619 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -300,6 +300,10 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
else
standard_ExecutorStart(queryDesc, eflags);
+ /* The plan may have become invalid during standard_ExecutorStart() */
+ if (!ExecPlanStillValid(queryDesc->estate))
+ return;
+
if (auto_explain_enabled())
{
/*
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 3c72e437f7..76642b557a 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -985,6 +985,10 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
else
standard_ExecutorStart(queryDesc, eflags);
+ /* The plan may have become invalid during standard_ExecutorStart() */
+ if (!ExecPlanStillValid(queryDesc->estate))
+ return;
+
/*
* If query has queryId zero, don't track it. This prevents double
* counting of optimizable statements that are directly contained in
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 49f7370734..b7a0b8c05b 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -509,7 +509,8 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, NULL, -1, into, es, queryString, params,
+ queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
}
@@ -618,6 +619,7 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
*/
void
ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
+ CachedPlanSource *plansource, int query_index,
IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
@@ -688,8 +690,8 @@ ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
if (into)
eflags |= GetIntoRelEFlags(into);
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, eflags);
+ /* Call ExecutorStartExt to prepare the plan for execution. */
+ ExecutorStartExt(queryDesc, eflags, plansource, query_index);
/* Execute the plan for statistics if asked for */
if (es->analyze)
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 4f6acf6719..4b1503c05e 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -107,6 +107,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ NULL,
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 311b9ebd5b..4cd79a6e3a 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -202,7 +202,8 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
- cplan);
+ cplan,
+ entry->plansource);
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
@@ -583,6 +584,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
MemoryContextCounters mem_counters;
MemoryContext planner_ctx = NULL;
MemoryContext saved_ctx = NULL;
+ int i = 0;
if (es->memory)
{
@@ -655,8 +657,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, cplan, into, es, query_string, paramLI,
- queryEnv,
+ ExplainOnePlan(pstmt, cplan, entry->plansource, i,
+ into, es, query_string, paramLI, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
else
@@ -668,6 +670,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Separate plans with an appropriate separator */
if (lnext(plan_list, p) != NULL)
ExplainSeparatePlans(es);
+
+ i++;
}
if (estate)
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 29d30bfb6f..e33b8f573b 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -5120,6 +5120,20 @@ AfterTriggerEndQuery(EState *estate)
afterTriggers.query_depth--;
}
+/* ----------
+ * AfterTriggerAbortQuery()
+ *
+ * Called by ExecutorEnd() if the query execution was aborted due to the
+ * plan becoming invalid during initialization.
+ * ----------
+ */
+void
+AfterTriggerAbortQuery(void)
+{
+ /* Revert the actions of AfterTriggerBeginQuery(). */
+ afterTriggers.query_depth--;
+}
+
/*
* AfterTriggerFreeQuery
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 642d63be61..c76a00b394 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -280,6 +280,28 @@ are typically reset to empty once per tuple. Per-tuple contexts are usually
associated with ExprContexts, and commonly each PlanState node has its own
ExprContext to evaluate its qual and targetlist expressions in.
+Relation Locking
+----------------
+
+Typically, when the executor initializes a plan tree for execution, it doesn't
+lock non-index relations if the plan tree is freshly generated and not derived
+from a CachedPlan. This is because such locks have already been established
+during the query's parsing, rewriting, and planning phases. However, with a
+cached plan tree, some relations may remain unlocked. The function
+AcquireExecutorLocks() only locks unprunable relations in the plan, deferring
+the locking of prunable ones to executor initialization. This avoids
+unnecessary locking of relations that will be pruned during "initial" runtime
+pruning in ExecDoInitialPruning().
+
+This approach creates a window where a cached plan tree with child tables
+could become outdated if another backend modifies these tables before
+ExecDoInitialPruning() locks them. As a result, the executor has the added duty
+to verify the plan tree's validity whenever it locks a child table after
+doing initial pruning. This validation is done by checking the CachedPlan.is_valid
+attribute. If the plan tree is outdated (is_valid=false), the executor halts
+further initialization, cleans up anything in EState that would have been
+allocated up to that point, and retries execution after recreating the
+invalid plan in the CachedPlan.
Query Processing Control Flow
-----------------------------
@@ -288,11 +310,13 @@ This is a sketch of control flow for full query processing:
CreateQueryDesc
- ExecutorStart
+ ExecutorStart or ExecutorStartExt
CreateExecutorState
creates per-query context
- switch to per-query context to run ExecInitNode
+ switch to per-query context to run ExecDoInitialPruning and ExecInitNode
AfterTriggerBeginQuery
+ ExecDoInitialPruning
+ does initial pruning and locks surviving partitions if needed
ExecInitNode --- recursively scans plan tree
ExecInitNode
recurse into subsidiary nodes
@@ -316,7 +340,12 @@ This is a sketch of control flow for full query processing:
FreeQueryDesc
-Per above comments, it's not really critical for ExecEndNode to free any
+As mentioned in the "Relation Locking" section, if the plan tree is found to
+be stale after locking partitions in ExecDoInitialPruning(), the control is
+immediately returned to ExecutorStartExt(), which will create a new plan tree
+and perform the steps starting from CreateExecutorState() again.
+
+Per above comments, it's not really critical for ExecEndPlan to free any
memory; it'll all go away in FreeExecutorState anyway. However, we do need to
be careful to close relations, drop buffer pins, etc, so we do need to scan
the plan state tree to find these sorts of resources.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index df1b5b2dc3..df117e9477 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -59,6 +59,7 @@
#include "utils/backend_status.h"
#include "utils/lsyscache.h"
#include "utils/partcache.h"
+#include "utils/plancache.h"
#include "utils/rls.h"
#include "utils/snapmgr.h"
@@ -137,6 +138,60 @@ ExecutorStart(QueryDesc *queryDesc, int eflags)
standard_ExecutorStart(queryDesc, eflags);
}
+/*
+ * A variant of ExecutorStart() that handles cleanup and replanning if the
+ * input CachedPlan becomes invalid due to locks being taken during
+ * ExecutorStart(). If that happens, a new CachedPlan is created only
+ * for the query at the index 'query_index' in plansource->query_list,
+ * and is released separately from the original CachedPlan.
+ */
+void
+ExecutorStartExt(QueryDesc *queryDesc, int eflags,
+ CachedPlanSource *plansource,
+ int query_index)
+{
+ if (queryDesc->cplan == NULL)
+ {
+ ExecutorStart(queryDesc, eflags);
+ return;
+ }
+
+ while (1)
+ {
+ ExecutorStart(queryDesc, eflags);
+ if (!CachedPlanValid(queryDesc->cplan))
+ {
+ CachedPlan *cplan;
+
+ /*
+ * The plan got invalidated, so try with a new updated plan.
+ *
+ * But first undo what ExecutorStart() would've done. Mark
+ * execution as aborted to ensure that AFTER trigger state is
+ * properly reset.
+ */
+ queryDesc->estate->es_aborted = true;
+ ExecutorEnd(queryDesc);
+
+ cplan = GetSingleCachedPlan(plansource, query_index,
+ queryDesc->queryEnv);
+
+ /*
+ * Install the new transient cplan into the QueryDesc replacing
+ * the old one so that executor initialization code can see it.
+ * Mark it as in use by us and ask FreeQueryDesc() to release it.
+ */
+ cplan->refcount = 1;
+ queryDesc->cplan = cplan;
+ queryDesc->cplan_release = true;
+ queryDesc->plannedstmt = linitial_node(PlannedStmt,
+ queryDesc->cplan->stmt_list);
+ }
+ else
+ break; /* ExecutorStart() succeeded! */
+ }
+}
+
void
standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
@@ -320,6 +375,7 @@ standard_ExecutorRun(QueryDesc *queryDesc,
estate = queryDesc->estate;
Assert(estate != NULL);
+ Assert(!estate->es_aborted);
Assert(!(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
/* caller must ensure the query's snapshot is active */
@@ -426,8 +482,11 @@ standard_ExecutorFinish(QueryDesc *queryDesc)
Assert(estate != NULL);
Assert(!(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
- /* This should be run once and only once per Executor instance */
- Assert(!estate->es_finished);
+ /*
+ * This should be run once and only once per Executor instance and never
+ * if the execution was aborted.
+ */
+ Assert(!estate->es_finished && !estate->es_aborted);
/* Switch into per-query memory context */
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
@@ -486,11 +545,10 @@ standard_ExecutorEnd(QueryDesc *queryDesc)
Assert(estate != NULL);
/*
- * Check that ExecutorFinish was called, unless in EXPLAIN-only mode. This
- * Assert is needed because ExecutorFinish is new as of 9.1, and callers
- * might forget to call it.
+ * Check that ExecutorFinish was called, unless in EXPLAIN-only mode or if
+ * execution was aborted.
*/
- Assert(estate->es_finished ||
+ Assert(estate->es_finished || estate->es_aborted ||
(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
/*
@@ -504,6 +562,14 @@ standard_ExecutorEnd(QueryDesc *queryDesc)
UnregisterSnapshot(estate->es_snapshot);
UnregisterSnapshot(estate->es_crosscheck_snapshot);
+ /*
+ * Reset AFTER trigger module if the query execution was aborted.
+ */
+ if (estate->es_aborted &&
+ !(estate->es_top_eflags &
+ (EXEC_FLAG_SKIP_TRIGGERS | EXEC_FLAG_EXPLAIN_ONLY)))
+ AfterTriggerAbortQuery();
+
/*
* Must switch out of context before destroying it
*/
@@ -962,6 +1028,9 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
ExecDoInitialPruning(estate);
+ if (!ExecPlanStillValid(estate))
+ return;
+
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
*/
@@ -2948,6 +3017,9 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
* the snapshot, rangetable, and external Param info. They need their own
* copies of local state, including a tuple table, es_param_exec_vals,
* result-rel info, etc.
+ *
+ * es_cachedplan is not copied because EPQ plan execution does not acquire
+ * any new locks that could invalidate the CachedPlan.
*/
rcestate->es_direction = ForwardScanDirection;
rcestate->es_snapshot = parentestate->es_snapshot;
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 67734979b0..435ae0df7a 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -147,6 +147,7 @@ CreateExecutorState(void)
estate->es_top_eflags = 0;
estate->es_instrument = 0;
estate->es_finished = false;
+ estate->es_aborted = false;
estate->es_exprcontexts = NIL;
@@ -757,7 +758,7 @@ ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos)
* ExecGetRangeTableRelation
* Open the Relation for a range table entry, if not already done
*
- * The Relations will be closed again in ExecEndPlan().
+ * The Relations will be closed in ExecEndPlan().
*/
Relation
ExecGetRangeTableRelation(EState *estate, Index rti)
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 659bd6dcd9..f84f376c9c 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -70,7 +70,8 @@ static int _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
static ParamListInfo _SPI_convert_params(int nargs, Oid *argtypes,
Datum *Values, const char *Nulls);
-static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount);
+static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount,
+ CachedPlanSource *plansource, int query_index);
static void _SPI_error_callback(void *arg);
@@ -1682,7 +1683,8 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
- cplan);
+ cplan,
+ plansource);
/*
* Set up options for portal. Default SCROLL type is chosen the same way
@@ -2494,6 +2496,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
ListCell *lc2;
+ int i = 0;
spicallbackarg.query = plansource->query_string;
@@ -2691,8 +2694,9 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
options->params,
_SPI_current->queryEnv,
0);
- res = _SPI_pquery(qdesc, fire_triggers,
- canSetTag ? options->tcount : 0);
+
+ res = _SPI_pquery(qdesc, fire_triggers, canSetTag ? options->tcount : 0,
+ plansource, i);
FreeQueryDesc(qdesc);
}
else
@@ -2789,6 +2793,8 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
my_res = res;
goto fail;
}
+
+ i++;
}
/* Done with this plan, so release refcount */
@@ -2866,7 +2872,8 @@ _SPI_convert_params(int nargs, Oid *argtypes,
}
static int
-_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
+_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount,
+ CachedPlanSource *plansource, int query_index)
{
int operation = queryDesc->operation;
int eflags;
@@ -2922,7 +2929,7 @@ _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
else
eflags = EXEC_FLAG_SKIP_TRIGGERS;
- ExecutorStart(queryDesc, eflags);
+ ExecutorStartExt(queryDesc, eflags, plansource, query_index);
ExecutorRun(queryDesc, ForwardScanDirection, tcount, true);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index e394f1419a..b95c859655 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1237,6 +1237,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ NULL,
NULL);
/*
@@ -2039,7 +2040,8 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
- cplan);
+ cplan,
+ psrc);
/* Done with the snapshot used for parameter I/O and parsing/planning */
if (snapshot_set)
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 6e8f6b1b8f..dbb0ffb771 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -19,6 +19,7 @@
#include "access/xact.h"
#include "commands/prepare.h"
+#include "executor/execdesc.h"
#include "executor/tstoreReceiver.h"
#include "miscadmin.h"
#include "pg_trace.h"
@@ -37,6 +38,8 @@ Portal ActivePortal = NULL;
static void ProcessQuery(PlannedStmt *plan,
CachedPlan *cplan,
+ CachedPlanSource *plansource,
+ int query_index,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -80,6 +83,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
qd->cplan = cplan; /* CachedPlan supplying the plannedstmt */
+ qd->cplan_release = false;
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -114,6 +118,13 @@ FreeQueryDesc(QueryDesc *qdesc)
UnregisterSnapshot(qdesc->snapshot);
UnregisterSnapshot(qdesc->crosscheck_snapshot);
+ /*
+ * Release CachedPlan if requested. The CachedPlan is not associated with
+ * a ResourceOwner when cplan_release is true; see ExecutorStartExt().
+ */
+ if (qdesc->cplan_release)
+ ReleaseCachedPlan(qdesc->cplan, NULL);
+
/* Only the QueryDesc itself need be freed */
pfree(qdesc);
}
@@ -126,6 +137,8 @@ FreeQueryDesc(QueryDesc *qdesc)
*
* plan: the plan tree for the query
* cplan: CachedPlan supplying the plan
+ * plansource: CachedPlanSource supplying the cplan
+ * query_index: index of the query in plansource->query_list
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -139,6 +152,8 @@ FreeQueryDesc(QueryDesc *qdesc)
static void
ProcessQuery(PlannedStmt *plan,
CachedPlan *cplan,
+ CachedPlanSource *plansource,
+ int query_index,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -157,7 +172,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Call ExecutorStart to prepare the plan for execution
*/
- ExecutorStart(queryDesc, 0);
+ ExecutorStartExt(queryDesc, 0, plansource, query_index);
/*
* Run the plan to completion.
@@ -518,9 +533,12 @@ PortalStart(Portal portal, ParamListInfo params,
myeflags = eflags;
/*
- * Call ExecutorStart to prepare the plan for execution
+ * Call ExecutorStartExt() to prepare the plan for execution. If
+ * the portal is using a cached plan, it may get invalidated
+ * during plan initialization, in which case a new one is
+ * created and saved in the QueryDesc.
*/
- ExecutorStart(queryDesc, myeflags);
+ ExecutorStartExt(queryDesc, myeflags, portal->plansource, 0);
/*
* This tells PortalCleanup to shut down the executor
@@ -1201,6 +1219,7 @@ PortalRunMulti(Portal portal,
{
bool active_snapshot_set = false;
ListCell *stmtlist_item;
+ int i = 0;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1283,6 +1302,8 @@ PortalRunMulti(Portal portal,
/* statement can set tag string */
ProcessQuery(pstmt,
portal->cplan,
+ portal->plansource,
+ i,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1293,6 +1314,8 @@ PortalRunMulti(Portal portal,
/* stmt added by rewrite cannot set tag */
ProcessQuery(pstmt,
portal->cplan,
+ portal->plansource,
+ i,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1357,6 +1380,8 @@ PortalRunMulti(Portal portal,
*/
if (lnext(portal->stmts, stmtlist_item) != NULL)
CommandCounterIncrement();
+
+ i++;
}
/* Pop the snapshot if we pushed one. */
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 5b75dadf13..d33f871ea2 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -94,6 +94,14 @@
*/
static dlist_head saved_plan_list = DLIST_STATIC_INIT(saved_plan_list);
+/*
+ * Head of the backend's list of "standalone" CachedPlans that are not
+ * associated with a CachedPlanSource, created by GetSingleCachedPlan() for
+ * transient use by the executor in certain scenarios where they're needed
+ * only for one execution of the plan.
+ */
+static dlist_head standalone_plan_list = DLIST_STATIC_INIT(standalone_plan_list);
+
/*
* This is the head of the backend's list of CachedExpressions.
*/
@@ -905,6 +913,8 @@ CheckCachedPlan(CachedPlanSource *plansource)
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
* (unless this is a one-shot plan, in which case we don't copy the plan).
+ *
+ * Note: When changing this, you should also look at GetSingleCachedPlan().
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
@@ -1034,6 +1044,7 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
plan->is_generic = generic;
plan->is_saved = false;
plan->is_valid = true;
+ plan->is_standalone = false;
/* assign generation number to new plan */
plan->generation = ++(plansource->generation);
@@ -1282,6 +1293,121 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
return plan;
}
+/*
+ * Create a fresh CachedPlan for the query_index'th query in the provided
+ * CachedPlanSource.
+ *
+ * The created CachedPlan is standalone, meaning it is not tracked in the
+ * CachedPlanSource. The CachedPlan and its plan trees are allocated in a
+ * child context of the caller's memory context. The caller must ensure they
+ * remain valid until execution is complete, after which the plan should be
+ * released by calling ReleaseCachedPlan().
+ *
+ * This function primarily supports ExecutorStartExt(), which handles cases
+ * where the original generic CachedPlan becomes invalid after prunable
+ * relations are locked.
+ */
+CachedPlan *
+GetSingleCachedPlan(CachedPlanSource *plansource, int query_index,
+ QueryEnvironment *queryEnv)
+{
+ List *query_list = plansource->query_list,
+ *plan_list;
+ CachedPlan *plan = plansource->gplan;
+ MemoryContext oldcxt = CurrentMemoryContext,
+ plan_context;
+ PlannedStmt *plannedstmt;
+
+ Assert(ActiveSnapshotSet());
+
+ /* Sanity checks */
+ if (plan == NULL)
+ elog(ERROR, "GetSingleCachedPlan() called in the wrong context: plansource->gplan is NULL");
+ else if (plan->is_valid)
+ elog(ERROR, "GetSingleCachedPlan() called in the wrong context: plansource->gplan->is_valid");
+
+ /*
+ * The plansource might have become invalid since GetCachedPlan(). See the
+ * comment in BuildCachedPlan() for details on why this might happen.
+ *
+ * The risk is greater here because this function is called from the
+ * executor, meaning much more processing may have occurred compared to
+ * when BuildCachedPlan() is called from GetCachedPlan().
+ */
+ if (!plansource->is_valid)
+ query_list = RevalidateCachedQuery(plansource, queryEnv);
+ Assert(query_list != NIL);
+
+ /*
+ * Build a new generic plan for the query_index'th query, but make a copy
+ * to be scribbled on by the planner
+ */
+ query_list = list_make1(copyObject(list_nth_node(Query, query_list,
+ query_index)));
+ plan_list = pg_plan_queries(query_list, plansource->query_string,
+ plansource->cursor_options, NULL);
+
+ list_free_deep(query_list);
+
+ /*
+ * Make a dedicated memory context for the CachedPlan and its subsidiary
+ * data so that it can be released in ReleaseCachedPlan(), which will
+ * be called from FreeQueryDesc().
+ */
+ plan_context = AllocSetContextCreate(CurrentMemoryContext,
+ "Standalone CachedPlan",
+ ALLOCSET_START_SMALL_SIZES);
+ MemoryContextCopyAndSetIdentifier(plan_context, plansource->query_string);
+
+ /*
+ * Copy plan into the new context.
+ */
+ MemoryContextSwitchTo(plan_context);
+ plan_list = copyObject(plan_list);
+
+ /*
+ * Create and fill the CachedPlan struct within the new context.
+ */
+ plan = (CachedPlan *) palloc(sizeof(CachedPlan));
+ plan->magic = CACHEDPLAN_MAGIC;
+ plan->stmt_list = plan_list;
+
+ plan->planRoleId = GetUserId();
+ Assert(list_length(plan_list) == 1);
+ plannedstmt = linitial_node(PlannedStmt, plan_list);
+
+ /*
+ * CachedPlan is dependent on role either if RLS affected the rewrite
+ * phase or if a role dependency was injected during planning. And it's
+ * transient if any plan is marked so.
+ */
+ plan->dependsOnRole = plansource->dependsOnRLS || plannedstmt->dependsOnRole;
+ if (plannedstmt->transientPlan)
+ {
+ Assert(TransactionIdIsNormal(TransactionXmin));
+ plan->saved_xmin = TransactionXmin;
+ }
+ else
+ plan->saved_xmin = InvalidTransactionId;
+ plan->refcount = 0;
+ plan->context = plan_context;
+ plan->is_oneshot = false;
+ plan->is_generic = true;
+ plan->is_saved = false;
+ plan->is_valid = true;
+ plan->is_standalone = true;
+ plan->generation = 1;
+ MemoryContextSwitchTo(oldcxt);
+
+ /*
+ * Add the entry to the global list of "standalone" cached plans. It is
+ * removed from the list by ReleaseCachedPlan().
+ */
+ dlist_push_tail(&standalone_plan_list, &plan->node);
+
+ return plan;
+}
+
/*
* ReleaseCachedPlan: release active use of a cached plan.
*
@@ -1309,6 +1435,10 @@ ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner)
/* Mark it no longer valid */
plan->magic = 0;
+ /* Remove from the global list if we are a standalone plan. */
+ if (plan->is_standalone)
+ dlist_delete(&plan->node);
+
/* One-shot plans do not own their context, so we can't free them */
if (!plan->is_oneshot)
MemoryContextDelete(plan->context);
@@ -2066,6 +2196,33 @@ PlanCacheRelCallback(Datum arg, Oid relid)
cexpr->is_valid = false;
}
}
+
+ /* Finally, invalidate any standalone cached plans */
+ dlist_foreach(iter, &standalone_plan_list)
+ {
+ CachedPlan *cplan = dlist_container(CachedPlan,
+ node, iter.cur);
+
+ Assert(cplan->magic == CACHEDPLAN_MAGIC);
+
+ if (cplan->is_valid)
+ {
+ ListCell *lc;
+
+ foreach(lc, cplan->stmt_list)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc);
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ continue; /* Ignore utility statements */
+ if ((relid == InvalidOid) ? plannedstmt->relationOids != NIL :
+ list_member_oid(plannedstmt->relationOids, relid))
+ cplan->is_valid = false;
+ if (!cplan->is_valid)
+ break; /* out of stmt_list scan */
+ }
+ }
+ }
}
/*
@@ -2176,6 +2333,44 @@ PlanCacheObjectCallback(Datum arg, int cacheid, uint32 hashvalue)
}
}
}
+
+ /* Finally, invalidate any standalone cached plans */
+ dlist_foreach(iter, &standalone_plan_list)
+ {
+ CachedPlan *cplan = dlist_container(CachedPlan,
+ node, iter.cur);
+
+ Assert(cplan->magic == CACHEDPLAN_MAGIC);
+
+ if (cplan->is_valid)
+ {
+ ListCell *lc;
+
+ foreach(lc, cplan->stmt_list)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc);
+ ListCell *lc3;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ continue; /* Ignore utility statements */
+ foreach(lc3, plannedstmt->invalItems)
+ {
+ PlanInvalItem *item = (PlanInvalItem *) lfirst(lc3);
+
+ if (item->cacheId != cacheid)
+ continue;
+ if (hashvalue == 0 ||
+ item->hashValue == hashvalue)
+ {
+ cplan->is_valid = false;
+ break; /* out of invalItems scan */
+ }
+ }
+ if (!cplan->is_valid)
+ break; /* out of stmt_list scan */
+ }
+ }
+ }
}
/*
@@ -2235,6 +2430,17 @@ ResetPlanCache(void)
cexpr->is_valid = false;
}
+
+ /* Finally, invalidate any standalone cached plans */
+ dlist_foreach(iter, &standalone_plan_list)
+ {
+ CachedPlan *cplan = dlist_container(CachedPlan,
+ node, iter.cur);
+
+ Assert(cplan->magic == CACHEDPLAN_MAGIC);
+
+ cplan->is_valid = false;
+ }
}
/*
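To summarize the lifecycle of a standalone plan introduced above, here is a
sketch using the patch's own functions, with the surrounding executor code
elided:

	/* Build a transient plan for just this query; not linked to plansource */
	cplan = GetSingleCachedPlan(plansource, query_index, queryEnv);

	/* Pin it manually; standalone plans are not tracked by a ResourceOwner */
	cplan->refcount = 1;

	/* ... the executor retries initialization with the new plan ... */

	/* Unlinks the plan from standalone_plan_list and frees its context */
	ReleaseCachedPlan(cplan, NULL);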
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 4a24613537..bf70fd4ce7 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -284,7 +284,8 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
- CachedPlan *cplan)
+ CachedPlan *cplan,
+ CachedPlanSource *plansource)
{
Assert(PortalIsValid(portal));
Assert(portal->status == PORTAL_NEW);
@@ -299,6 +300,7 @@ PortalDefineQuery(Portal portal,
portal->commandTag = commandTag;
portal->stmts = stmts;
portal->cplan = cplan;
+ portal->plansource = plansource;
portal->status = PORTAL_DEFINED;
}
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 21c71e0d53..a39989a950 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -104,6 +104,7 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ParamListInfo params, QueryEnvironment *queryEnv);
extern void ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
+ CachedPlanSource *plansource, int plan_index,
IntoClause *into, ExplainState *es,
const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index 8a5a9fe642..db21561c8c 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -258,6 +258,7 @@ extern void ExecASTruncateTriggers(EState *estate,
extern void AfterTriggerBeginXact(void);
extern void AfterTriggerBeginQuery(void);
extern void AfterTriggerEndQuery(EState *estate);
+extern void AfterTriggerAbortQuery(void);
extern void AfterTriggerFireDeferred(void);
extern void AfterTriggerEndXact(bool isCommit);
extern void AfterTriggerBeginSubXact(void);
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index 0e7245435d..f6cb6479c0 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -36,6 +36,7 @@ typedef struct QueryDesc
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
CachedPlan *cplan; /* CachedPlan that supplies the plannedstmt */
+ bool cplan_release; /* Should FreeQueryDesc() release cplan? */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 69c3ebff00..5bc0edb5a0 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -19,6 +19,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/plancache.h"
/*
@@ -198,6 +199,8 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
* prototypes from functions in execMain.c
*/
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern void ExecutorStartExt(QueryDesc *queryDesc, int eflags,
+ CachedPlanSource *plansource, int query_index);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count, bool execute_once);
@@ -261,6 +264,19 @@ extern void ExecEndNode(PlanState *node);
extern void ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * Is the CachedPlan in es_cachedplan still valid?
+ *
+ * Called from InitPlan() because invalidation messages that affect the plan
+ * might be received after locks have been taken on runtime-prunable relations.
+ * The caller should take appropriate action if the plan has become invalid.
+ */
+static inline bool
+ExecPlanStillValid(EState *estate)
+{
+ return estate->es_cachedplan == NULL ? true :
+ CachedPlanValid(estate->es_cachedplan);
+}
/* ----------------------------------------------------------------
* ExecProcNode
@@ -589,6 +605,7 @@ extern void ExecCreateScanSlotFromOuterPlan(EState *estate,
extern bool ExecRelationIsTargetRelation(EState *estate, Index scanrelid);
extern Relation ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags);
+extern Relation ExecOpenScanIndexRelation(EState *estate, Oid indexid, int lockmode);
extern void ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos);
extern void ExecCloseRangeTableRelations(EState *estate);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 57170818c0..f50b6b50a8 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -690,6 +690,7 @@ typedef struct EState
int es_top_eflags; /* eflags passed to ExecutorStart */
int es_instrument; /* OR of InstrumentOption flags */
bool es_finished; /* true when ExecutorFinish is done */
+ bool es_aborted; /* true when execution was aborted */
List *es_exprcontexts; /* List of ExprContexts within EState */
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 0b5ee007ca..154f68f671 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -18,6 +18,7 @@
#include "access/tupdesc.h"
#include "lib/ilist.h"
#include "nodes/params.h"
+#include "nodes/parsenodes.h"
#include "tcop/cmdtag.h"
#include "utils/queryenvironment.h"
#include "utils/resowner.h"
@@ -152,6 +153,8 @@ typedef struct CachedPlan
bool is_generic; /* is it a reusable generic plan? */
bool is_saved; /* is CachedPlan in a long-lived context? */
bool is_valid; /* is the stmt_list currently valid? */
+ bool is_standalone; /* is it not associated with a
+ * CachedPlanSource? */
Oid planRoleId; /* Role ID the plan was created for */
bool dependsOnRole; /* is plan specific to that role? */
TransactionId saved_xmin; /* if valid, replan when TransactionXmin
@@ -159,6 +162,12 @@ typedef struct CachedPlan
int generation; /* parent's generation number for this plan */
int refcount; /* count of live references to this struct */
MemoryContext context; /* context containing this CachedPlan */
+
+ /*
+ * If the plan is not associated with a CachedPlanSource, it is saved in
+ * a separate global list.
+ */
+ dlist_node node; /* list link, if is_standalone */
} CachedPlan;
/*
@@ -224,6 +233,10 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
QueryEnvironment *queryEnv);
+extern CachedPlan *GetSingleCachedPlan(CachedPlanSource *plansource,
+ int query_index,
+ QueryEnvironment *queryEnv);
+
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
@@ -245,4 +258,17 @@ CachedPlanRequiresLocking(CachedPlan *cplan)
return cplan->is_generic;
}
+/*
+ * CachedPlanValid
+ * Returns whether a cached generic plan is still valid.
+ *
+ * Invoked by the executor to check that the plan has not been invalidated
+ * by locks taken during its initialization.
+ */
+static inline bool
+CachedPlanValid(CachedPlan *cplan)
+{
+ return cplan->is_valid;
+}
+
#endif /* PLANCACHE_H */
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index 29f49829f2..58c3828d2c 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,7 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ CachedPlanSource *plansource; /* CachedPlanSource, for cplan */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -241,7 +242,8 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
- CachedPlan *cplan);
+ CachedPlan *cplan,
+ CachedPlanSource *plansource);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
diff --git a/src/test/modules/delay_execution/Makefile b/src/test/modules/delay_execution/Makefile
index 70f24e846d..3eeb097fde 100644
--- a/src/test/modules/delay_execution/Makefile
+++ b/src/test/modules/delay_execution/Makefile
@@ -8,7 +8,8 @@ OBJS = \
delay_execution.o
ISOLATION = partition-addition \
- partition-removal-1
+ partition-removal-1 \
+ cached-plan-inval
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/src/test/modules/delay_execution/delay_execution.c b/src/test/modules/delay_execution/delay_execution.c
index 155c8a8d55..304ca77f7b 100644
--- a/src/test/modules/delay_execution/delay_execution.c
+++ b/src/test/modules/delay_execution/delay_execution.c
@@ -1,14 +1,18 @@
/*-------------------------------------------------------------------------
*
* delay_execution.c
- * Test module to allow delay between parsing and execution of a query.
+ * Test module to introduce delay at various points during execution of a
+ * query to test that execution proceeds safely in light of concurrent
+ * changes.
*
* The delay is implemented by taking and immediately releasing a specified
* advisory lock. If another process has previously taken that lock, the
* current process will be blocked until the lock is released; otherwise,
* there's no effect. This allows an isolationtester script to reliably
- * test behaviors where some specified action happens in another backend
- * between parsing and execution of any desired query.
+ * test behaviors where some specified action happens in another backend in
+ * a couple of cases: 1) between parsing and execution of any desired query
+ * when using the planner_hook, 2) between RevalidateCachedQuery() and
+ * ExecutorStart() when using the ExecutorStart_hook.
*
* Copyright (c) 2020-2024, PostgreSQL Global Development Group
*
@@ -22,6 +26,7 @@
#include <limits.h>
+#include "executor/executor.h"
#include "optimizer/planner.h"
#include "utils/builtins.h"
#include "utils/guc.h"
@@ -32,9 +37,11 @@ PG_MODULE_MAGIC;
/* GUC: advisory lock ID to use. Zero disables the feature. */
static int post_planning_lock_id = 0;
+static int executor_start_lock_id = 0;
-/* Save previous planner hook user to be a good citizen */
+/* Save previous hook users to be a good citizen */
static planner_hook_type prev_planner_hook = NULL;
+static ExecutorStart_hook_type prev_ExecutorStart_hook = NULL;
/* planner_hook function to provide the desired delay */
@@ -70,11 +77,41 @@ delay_execution_planner(Query *parse, const char *query_string,
return result;
}
+/* ExecutorStart_hook function to provide the desired delay */
+static void
+delay_execution_ExecutorStart(QueryDesc *queryDesc, int eflags)
+{
+ /* If enabled, delay by taking and releasing the specified lock */
+ if (executor_start_lock_id != 0)
+ {
+ DirectFunctionCall1(pg_advisory_lock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+ DirectFunctionCall1(pg_advisory_unlock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+
+ /*
+ * Ensure that we notice any pending invalidations, since the advisory
+ * lock functions don't do this.
+ */
+ AcceptInvalidationMessages();
+ }
+
+ /* Now start the executor, possibly via a previous hook user */
+ if (prev_ExecutorStart_hook)
+ prev_ExecutorStart_hook(queryDesc, eflags);
+ else
+ standard_ExecutorStart(queryDesc, eflags);
+
+ if (executor_start_lock_id != 0)
+ elog(NOTICE, "Finished ExecutorStart(): CachedPlan is %s",
+ CachedPlanValid(queryDesc->cplan) ? "valid" : "not valid");
+}
+
/* Module load function */
void
_PG_init(void)
{
- /* Set up the GUC to control which lock is used */
+ /* Set up GUCs to control which lock is used */
DefineCustomIntVariable("delay_execution.post_planning_lock_id",
"Sets the advisory lock ID to be locked/unlocked after planning.",
"Zero disables the delay.",
@@ -86,10 +123,22 @@ _PG_init(void)
NULL,
NULL,
NULL);
-
+ DefineCustomIntVariable("delay_execution.executor_start_lock_id",
+ "Sets the advisory lock ID to be locked/unlocked before starting execution.",
+ "Zero disables the delay.",
+ &executor_start_lock_id,
+ 0,
+ 0, INT_MAX,
+ PGC_USERSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
MarkGUCPrefixReserved("delay_execution");
- /* Install our hook */
+ /* Install our hooks. */
prev_planner_hook = planner_hook;
planner_hook = delay_execution_planner;
+ prev_ExecutorStart_hook = ExecutorStart_hook;
+ ExecutorStart_hook = delay_execution_ExecutorStart;
}
diff --git a/src/test/modules/delay_execution/expected/cached-plan-inval.out b/src/test/modules/delay_execution/expected/cached-plan-inval.out
new file mode 100644
index 0000000000..e002cfbc9c
--- /dev/null
+++ b/src/test/modules/delay_execution/expected/cached-plan-inval.out
@@ -0,0 +1,230 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1prep s2lock s1exec s2dropi s2unlock
+step s1prep: SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1 FOR UPDATE;
+ EXPLAIN (COSTS OFF) EXECUTE q (1);
+QUERY PLAN
+------------------------------------------------
+LockRows
+ -> Append
+ Subplans Removed: 2
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Recheck Cond: (a = $1)
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = $1)
+(7 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); <waiting ...>
+step s2dropi: DROP INDEX foo12_1_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+-------------------------------------
+LockRows
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on foo12_1 foo_1
+ Filter: (a = $1)
+(5 rows)
+
+
+starting permutation: s1prep2 s2lock s1exec2 s2dropi s2unlock
+step s1prep2: SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q2;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Recheck Cond: ((a = one()) OR (a = two()))
+ -> BitmapOr
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = one())
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = two())
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(11 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec2: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; <waiting ...>
+step s2dropi: DROP INDEX foo12_1_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec2: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Seq Scan on foo12_1 foo_1
+ Filter: ((a = one()) OR (a = two()))
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(6 rows)
+
+
+starting permutation: s1prep3 s2lock s1exec3 s2dropi s2unlock
+step s1prep3: SET plan_cache_mode = force_generic_plan;
+ PREPARE q3 AS UPDATE foov SET a = a WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q3;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Recheck Cond: ((a = one()) OR (a = two()))
+ -> BitmapOr
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = one())
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = two())
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+
+Update on foo
+ Update on foo12_1 foo_1
+ Update on foo12_2 foo_2
+ -> Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Recheck Cond: ((a = one()) OR (a = two()))
+ -> BitmapOr
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = one())
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = two())
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(26 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec3: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; <waiting ...>
+step s2dropi: DROP INDEX foo12_1_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec3: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Seq Scan on foo12_1 foo_1
+ Filter: ((a = one()) OR (a = two()))
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+
+Update on foo
+ Update on foo12_1 foo_1
+ Update on foo12_2 foo_2
+ -> Append
+ Subplans Removed: 1
+ -> Seq Scan on foo12_1 foo_1
+ Filter: ((a = one()) OR (a = two()))
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(16 rows)
+
+
+starting permutation: s1prep4 s2lock s1exec4 s2dropi s2unlock
+step s1prep4: SET plan_cache_mode = force_generic_plan;
+ SET enable_seqscan TO off;
+ PREPARE q4 AS SELECT * FROM generate_series(1, 1) WHERE EXISTS (SELECT * FROM foov WHERE a = $1 FOR UPDATE);
+ EXPLAIN (COSTS OFF) EXECUTE q4 (1);
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------------------------------------------------
+Result
+ One-Time Filter: (InitPlan 1).col1
+ InitPlan 1
+ -> LockRows
+ Disabled Nodes: 2
+ -> Append
+ Disabled Nodes: 2
+ Subplans Removed: 2
+ -> Index Scan using foo12_1_a on foo12_1 foo_1
+ Index Cond: (a = $1)
+ -> Function Scan on generate_series
+(11 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec4: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q4 (1); <waiting ...>
+step s2dropi: DROP INDEX foo12_1_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec4: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------------------------------
+Result
+ One-Time Filter: (InitPlan 1).col1
+ InitPlan 1
+ -> LockRows
+ Disabled Nodes: 3
+ -> Append
+ Disabled Nodes: 3
+ Subplans Removed: 2
+ -> Seq Scan on foo12_1 foo_1
+ Disabled Nodes: 1
+ Filter: (a = $1)
+ -> Function Scan on generate_series
+(12 rows)
+
diff --git a/src/test/modules/delay_execution/meson.build b/src/test/modules/delay_execution/meson.build
index 41f3ac0b89..5a70b183d0 100644
--- a/src/test/modules/delay_execution/meson.build
+++ b/src/test/modules/delay_execution/meson.build
@@ -24,6 +24,7 @@ tests += {
'specs': [
'partition-addition',
'partition-removal-1',
+ 'cached-plan-inval',
],
},
}
diff --git a/src/test/modules/delay_execution/specs/cached-plan-inval.spec b/src/test/modules/delay_execution/specs/cached-plan-inval.spec
new file mode 100644
index 0000000000..820a843051
--- /dev/null
+++ b/src/test/modules/delay_execution/specs/cached-plan-inval.spec
@@ -0,0 +1,75 @@
+# Test to check that invalidation of cached generic plans during ExecutorStart
+# correctly triggers replanning and re-execution.
+
+setup
+{
+ CREATE TABLE foo (a int, b text) PARTITION BY LIST(a);
+ CREATE TABLE foo12 PARTITION OF foo FOR VALUES IN (1, 2) PARTITION BY LIST (a);
+ CREATE TABLE foo12_1 PARTITION OF foo12 FOR VALUES IN (1);
+ CREATE TABLE foo12_2 PARTITION OF foo12 FOR VALUES IN (2);
+ CREATE INDEX foo12_1_a ON foo12_1 (a);
+ CREATE TABLE foo3 PARTITION OF foo FOR VALUES IN (3);
+ CREATE VIEW foov AS SELECT * FROM foo;
+ CREATE FUNCTION one () RETURNS int AS $$ BEGIN RETURN 1; END; $$ LANGUAGE PLPGSQL STABLE;
+ CREATE FUNCTION two () RETURNS int AS $$ BEGIN RETURN 2; END; $$ LANGUAGE PLPGSQL STABLE;
+ CREATE RULE update_foo AS ON UPDATE TO foo DO ALSO SELECT 1;
+}
+
+teardown
+{
+ DROP VIEW foov;
+ DROP RULE update_foo ON foo;
+ DROP TABLE foo;
+ DROP FUNCTION one(), two();
+}
+
+session "s1"
+# Append with run-time pruning
+step "s1prep" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1 FOR UPDATE;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+
+# Another case with Append with run-time pruning
+step "s1prep2" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+
+# Case with a rule adding another query
+step "s1prep3" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q3 AS UPDATE foov SET a = a WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+
+# Another case with Append with run-time pruning in a subquery
+step "s1prep4" { SET plan_cache_mode = force_generic_plan;
+ SET enable_seqscan TO off;
+ PREPARE q4 AS SELECT * FROM generate_series(1, 1) WHERE EXISTS (SELECT * FROM foov WHERE a = $1 FOR UPDATE);
+ EXPLAIN (COSTS OFF) EXECUTE q4 (1); }
+
+# Executes a generic plan
+step "s1exec" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+step "s1exec2" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+step "s1exec3" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+step "s1exec4" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q4 (1); }
+
+session "s2"
+step "s2lock" { SELECT pg_advisory_lock(12345); }
+step "s2unlock" { SELECT pg_advisory_unlock(12345); }
+step "s2dropi" { DROP INDEX foo12_1_a; }
+
+# While "s1exec", etc. wait to acquire the advisory lock, "s2dropi" is able to
+# drop the index being used in the cached plan. When "s1exec" is then
+# unblocked and initializes the cached plan for execution, it detects the
+# concurrent index drop, which causes the cached plan to be discarded and
+# recreated without the index.
+permutation "s1prep" "s2lock" "s1exec" "s2dropi" "s2unlock"
+permutation "s1prep2" "s2lock" "s1exec2" "s2dropi" "s2unlock"
+permutation "s1prep3" "s2lock" "s1exec3" "s2dropi" "s2unlock"
+permutation "s1prep4" "s2lock" "s1exec4" "s2dropi" "s2unlock"
--
2.43.0
On Thu, Sep 19, 2024 at 9:10 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Thu, Sep 19, 2024 at 5:39 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > For ResultRelInfos, I took the approach of memsetting them to 0 for
> > pruned result relations and adding checks at multiple sites to ensure
> > the ResultRelInfo being handled is valid.
>
> After some reflection,

Not enough reflection, evidently...

> I realized that nobody would think that that approach is very robust.
> In the attached, I've modified ExecInitModifyTable() to allocate
> ResultRelInfos only for unpruned relations, instead of allocating them
> for all of ModifyTable.resultRelations and setting the pruned ones to 0.
> This approach feels more robust.
Except, I forgot that ModifyTable has other lists that parallel
resultRelations (of the same length), viz. withCheckOptionLists,
returningLists, and updateColnosLists, which need to be similarly
truncated to cover only the unpruned relations. I've updated 0004 to
do so; a rough sketch of the idea follows below. This was broken even
in the other design, where locking is delayed all the way until
ExecInitAppend() does initial pruning, because ResultRelInfos are
created before initializing the plan subtree containing the Append
node, which would try to lock and open *all* partitions.
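For concreteness, here is a rough sketch of the truncation, not the
actual patch code; "unpruned_indexes" is a hypothetical Bitmapset of
the indexes into node->resultRelations that survive initial pruning:

    /*
     * Keep only the entries of ModifyTable's parallel lists whose
     * result relation survived initial pruning.
     */
    List       *resultRelations = NIL;
    List       *withCheckOptionLists = NIL;
    List       *returningLists = NIL;
    List       *updateColnosLists = NIL;
    ListCell   *lc;
    int         i = 0;

    foreach(lc, node->resultRelations)
    {
        if (bms_is_member(i, unpruned_indexes))
        {
            resultRelations = lappend(resultRelations, lfirst(lc));
            if (node->withCheckOptionLists != NIL)
                withCheckOptionLists =
                    lappend(withCheckOptionLists,
                            list_nth(node->withCheckOptionLists, i));
            if (node->returningLists != NIL)
                returningLists =
                    lappend(returningLists,
                            list_nth(node->returningLists, i));
            if (node->updateColnosLists != NIL)
                updateColnosLists =
                    lappend(updateColnosLists,
                            list_nth(node->updateColnosLists, i));
        }
        i++;
    }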
Also, I've switched the order of 0002 and 0003 to avoid a situation
where I add a function in 0002 only to remove it in 0003. By doing
the refactoring to initialize PartitionPruneContexts lazily first, the
patch to move the initial pruning to occur before ExecInitNode()
became much simpler as it doesn't need to touch the code related to
exec pruning.
--
Thanks, Amit Langote
Attachments:
v56-0005-Handle-CachedPlan-invalidation-in-the-executor.patch
From 74830439945fb9d7b593bbea8b19a213aa4eb47c Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Thu, 22 Aug 2024 19:38:13 +0900
Subject: [PATCH v56 5/5] Handle CachedPlan invalidation in the executor
This commit makes changes to handle cases where a cached plan
becomes invalid before deferred locks on prunable relations are taken.
* Add checks at various points in ExecutorStart() and the functions it
calls to determine whether the plan has become invalid. If it has, the
function and its callers return immediately. A previous commit ensures
that any partially initialized PlanState tree objects are cleaned up
appropriately.
* Introduce ExecutorStartExt(), a wrapper over ExecutorStart(), to
handle cases where plan initialization is aborted due to invalidation.
ExecutorStartExt() creates a new transient CachedPlan if needed and
retries execution. This new entry point is only required for sites
using plancache.c. It requires passing the QueryDesc, eflags,
CachedPlanSource, and query_index (index in CachedPlanSource.query_list).
* Add GetSingleCachedPlan() in plancache.c to create a transient
CachedPlan for a specified query in the given CachedPlanSource.
Such CachedPlans are tracked in a separate global list for the
plancache invalidation callbacks to check.
This also adds isolation tests using the delay_execution test module
to verify scenarios where a CachedPlan becomes invalid before the
deferred locks are taken.
All ExecutorStart_hook implementations now must add the following
block after the ExecutorStart() call to ensure it doesn't work with an
invalid plan:
/* The plan may have become invalid during ExecutorStart() */
if (!ExecPlanStillValid(queryDesc->estate))
return;
Reviewed-by: Robert Haas
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
contrib/auto_explain/auto_explain.c | 4 +
.../pg_stat_statements/pg_stat_statements.c | 4 +
src/backend/commands/explain.c | 8 +-
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 10 +-
src/backend/commands/trigger.c | 14 ++
src/backend/executor/README | 35 ++-
src/backend/executor/execMain.c | 84 ++++++-
src/backend/executor/execUtils.c | 3 +-
src/backend/executor/spi.c | 19 +-
src/backend/tcop/postgres.c | 4 +-
src/backend/tcop/pquery.c | 31 ++-
src/backend/utils/cache/plancache.c | 206 ++++++++++++++++
src/backend/utils/mmgr/portalmem.c | 4 +-
src/include/commands/explain.h | 1 +
src/include/commands/trigger.h | 1 +
src/include/executor/execdesc.h | 1 +
src/include/executor/executor.h | 17 ++
src/include/nodes/execnodes.h | 1 +
src/include/utils/plancache.h | 26 ++
src/include/utils/portal.h | 4 +-
src/test/modules/delay_execution/Makefile | 3 +-
.../modules/delay_execution/delay_execution.c | 63 ++++-
.../expected/cached-plan-inval.out | 230 ++++++++++++++++++
src/test/modules/delay_execution/meson.build | 1 +
.../specs/cached-plan-inval.spec | 75 ++++++
26 files changed, 814 insertions(+), 36 deletions(-)
create mode 100644 src/test/modules/delay_execution/expected/cached-plan-inval.out
create mode 100644 src/test/modules/delay_execution/specs/cached-plan-inval.spec
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index 677c135f59..9eb5e9a619 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -300,6 +300,10 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
else
standard_ExecutorStart(queryDesc, eflags);
+ /* The plan may have become invalid during standard_ExecutorStart() */
+ if (!ExecPlanStillValid(queryDesc->estate))
+ return;
+
if (auto_explain_enabled())
{
/*
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 3c72e437f7..76642b557a 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -985,6 +985,10 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
else
standard_ExecutorStart(queryDesc, eflags);
+ /* The plan may have become invalid during standard_ExecutorStart() */
+ if (!ExecPlanStillValid(queryDesc->estate))
+ return;
+
/*
* If query has queryId zero, don't track it. This prevents double
* counting of optimizable statements that are directly contained in
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 49f7370734..b7a0b8c05b 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -509,7 +509,8 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, NULL, -1, into, es, queryString, params,
+ queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
}
@@ -618,6 +619,7 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
*/
void
ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
+ CachedPlanSource *plansource, int query_index,
IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
@@ -688,8 +690,8 @@ ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
if (into)
eflags |= GetIntoRelEFlags(into);
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, eflags);
+ /* Call ExecutorStartExt to prepare the plan for execution. */
+ ExecutorStartExt(queryDesc, eflags, plansource, query_index);
/* Execute the plan for statistics if asked for */
if (es->analyze)
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 4f6acf6719..4b1503c05e 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -107,6 +107,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ NULL,
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 311b9ebd5b..4cd79a6e3a 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -202,7 +202,8 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
- cplan);
+ cplan,
+ entry->plansource);
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
@@ -583,6 +584,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
MemoryContextCounters mem_counters;
MemoryContext planner_ctx = NULL;
MemoryContext saved_ctx = NULL;
+ int i = 0;
if (es->memory)
{
@@ -655,8 +657,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, cplan, into, es, query_string, paramLI,
- queryEnv,
+ ExplainOnePlan(pstmt, cplan, entry->plansource, i,
+ into, es, query_string, paramLI, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
else
@@ -668,6 +670,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Separate plans with an appropriate separator */
if (lnext(plan_list, p) != NULL)
ExplainSeparatePlans(es);
+
+ i++;
}
if (estate)
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 29d30bfb6f..e33b8f573b 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -5120,6 +5120,20 @@ AfterTriggerEndQuery(EState *estate)
afterTriggers.query_depth--;
}
+/* ----------
+ * AfterTriggerAbortQuery()
+ *
+ * Called by ExecutorEnd() if the query execution was aborted due to the
+ * plan becoming invalid during initialization.
+ * ----------
+ */
+void
+AfterTriggerAbortQuery(void)
+{
+ /* Revert the actions of AfterTriggerBeginQuery(). */
+ afterTriggers.query_depth--;
+}
+
/*
* AfterTriggerFreeQuery
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 642d63be61..c76a00b394 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -280,6 +280,28 @@ are typically reset to empty once per tuple. Per-tuple contexts are usually
associated with ExprContexts, and commonly each PlanState node has its own
ExprContext to evaluate its qual and targetlist expressions in.
+Relation Locking
+----------------
+
+Typically, when the executor initializes a plan tree for execution, it doesn't
+lock non-index relations if the plan tree is freshly generated and not derived
+from a CachedPlan. This is because such locks have already been established
+during the query's parsing, rewriting, and planning phases. However, with a
+cached plan tree, some relations may remain unlocked. The function
+AcquireExecutorLocks() only locks unprunable relations in the plan, deferring
+the locking of prunable ones to executor initialization. This avoids
+unnecessary locking of relations that will be pruned during "initial" runtime
+pruning in ExecDoInitialPruning().
+
+This approach creates a window where a cached plan tree with child tables
+could become outdated if another backend modifies these tables before
+ExecDoInitialPruning() locks them. As a result, the executor has the added duty
+to verify the plan tree's validity whenever it locks a child table after
+doing initial pruning. This validation is done by checking the CachedPlan.is_valid
+attribute. If the plan tree is outdated (is_valid=false), the executor halts
+further initialization, cleans up anything in EState that would have been
+allocated up to that point, and retries execution after recreating the
+invalid plan in the CachedPlan.
Query Processing Control Flow
-----------------------------
@@ -288,11 +310,13 @@ This is a sketch of control flow for full query processing:
CreateQueryDesc
- ExecutorStart
+ ExecutorStart or ExecutorStartExt
CreateExecutorState
creates per-query context
- switch to per-query context to run ExecInitNode
+ switch to per-query context to run ExecDoInitialPruning and ExecInitNode
AfterTriggerBeginQuery
+ ExecDoInitialPruning
+ does initial pruning and locks surviving partitions if needed
ExecInitNode --- recursively scans plan tree
ExecInitNode
recurse into subsidiary nodes
@@ -316,7 +340,12 @@ This is a sketch of control flow for full query processing:
FreeQueryDesc
-Per above comments, it's not really critical for ExecEndNode to free any
+As mentioned in the "Relation Locking" section, if the plan tree is found to
+be stale after locking partitions in ExecDoInitialPruning(), the control is
+immediately returned to ExecutorStartExt(), which will create a new plan tree
+and perform the steps starting from CreateExecutorState() again.
+
+Per above comments, it's not really critical for ExecEndPlan to free any
memory; it'll all go away in FreeExecutorState anyway. However, we do need to
be careful to close relations, drop buffer pins, etc, so we do need to scan
the plan state tree to find these sorts of resources.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 2c14ee2b6b..7a6954204e 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -60,6 +60,7 @@
#include "utils/backend_status.h"
#include "utils/lsyscache.h"
#include "utils/partcache.h"
+#include "utils/plancache.h"
#include "utils/rls.h"
#include "utils/snapmgr.h"
@@ -138,6 +139,60 @@ ExecutorStart(QueryDesc *queryDesc, int eflags)
standard_ExecutorStart(queryDesc, eflags);
}
+/*
+ * A variant of ExecutorStart() that handles cleanup and replanning if the
+ * input CachedPlan becomes invalid due to locks being taken during
+ * ExecutorStartInternal(). If that happens, a new CachedPlan is created
+ * only for the query at index 'query_index' in plansource->query_list, which
+ * is released separately from the original CachedPlan.
+ */
+void
+ExecutorStartExt(QueryDesc *queryDesc, int eflags,
+ CachedPlanSource *plansource,
+ int query_index)
+{
+ if (queryDesc->cplan == NULL)
+ {
+ ExecutorStart(queryDesc, eflags);
+ return;
+ }
+
+ while (1)
+ {
+ ExecutorStart(queryDesc, eflags);
+ if (!CachedPlanValid(queryDesc->cplan))
+ {
+ CachedPlan *cplan;
+
+ /*
+ * The plan got invalidated, so try with a new updated plan.
+ *
+ * But first undo what ExecutorStart() would've done. Mark
+ * execution as aborted to ensure that AFTER trigger state is
+ * properly reset.
+ */
+ queryDesc->estate->es_aborted = true;
+ ExecutorEnd(queryDesc);
+
+ cplan = GetSingleCachedPlan(plansource, query_index,
+ queryDesc->queryEnv);
+
+ /*
+ * Install the new transient cplan into the QueryDesc replacing
+ * the old one so that executor initialization code can see it.
+ * Mark it as in use by us and ask FreeQueryDesc() to release it.
+ */
+ cplan->refcount = 1;
+ queryDesc->cplan = cplan;
+ queryDesc->cplan_release = true;
+ queryDesc->plannedstmt = linitial_node(PlannedStmt,
+ queryDesc->cplan->stmt_list);
+ }
+ else
+ break; /* ExecutorStart() succeeded! */
+ }
+}
+
void
standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
@@ -324,6 +379,7 @@ standard_ExecutorRun(QueryDesc *queryDesc,
estate = queryDesc->estate;
Assert(estate != NULL);
+ Assert(!estate->es_aborted);
Assert(!(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
/* caller must ensure the query's snapshot is active */
@@ -433,8 +489,11 @@ standard_ExecutorFinish(QueryDesc *queryDesc)
Assert(estate != NULL);
Assert(!(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
- /* This should be run once and only once per Executor instance */
- Assert(!estate->es_finished);
+ /*
+ * This should be run once and only once per Executor instance and never
+ * if the execution was aborted.
+ */
+ Assert(!estate->es_finished && !estate->es_aborted);
/* Switch into per-query memory context */
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
@@ -496,11 +555,10 @@ standard_ExecutorEnd(QueryDesc *queryDesc)
Assert(estate != NULL);
/*
- * Check that ExecutorFinish was called, unless in EXPLAIN-only mode. This
- * Assert is needed because ExecutorFinish is new as of 9.1, and callers
- * might forget to call it.
+ * Check that ExecutorFinish was called, unless in EXPLAIN-only mode or if
+ * execution was aborted.
*/
- Assert(estate->es_finished ||
+ Assert(estate->es_finished || estate->es_aborted ||
(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
/*
@@ -514,6 +572,14 @@ standard_ExecutorEnd(QueryDesc *queryDesc)
UnregisterSnapshot(estate->es_snapshot);
UnregisterSnapshot(estate->es_crosscheck_snapshot);
+ /*
+ * Reset AFTER trigger module if the query execution was aborted.
+ */
+ if (estate->es_aborted &&
+ !(estate->es_top_eflags &
+ (EXEC_FLAG_SKIP_TRIGGERS | EXEC_FLAG_EXPLAIN_ONLY)))
+ AfterTriggerAbortQuery();
+
/*
* Must switch out of context before destroying it
*/
@@ -972,6 +1038,9 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
ExecDoInitialPruning(estate);
+ if (!ExecPlanStillValid(estate))
+ return;
+
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
*/
@@ -2958,6 +3027,9 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
* the snapshot, rangetable, and external Param info. They need their own
* copies of local state, including a tuple table, es_param_exec_vals,
* result-rel info, etc.
+ *
+ * es_cachedplan is not copied because EPQ plan execution does not acquire
+ * any new locks that could invalidate the CachedPlan.
*/
rcestate->es_direction = ForwardScanDirection;
rcestate->es_snapshot = parentestate->es_snapshot;
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 67734979b0..435ae0df7a 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -147,6 +147,7 @@ CreateExecutorState(void)
estate->es_top_eflags = 0;
estate->es_instrument = 0;
estate->es_finished = false;
+ estate->es_aborted = false;
estate->es_exprcontexts = NIL;
@@ -757,7 +758,7 @@ ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos)
* ExecGetRangeTableRelation
* Open the Relation for a range table entry, if not already done
*
- * The Relations will be closed again in ExecEndPlan().
+ * The Relations will be closed in ExecEndPlan().
*/
Relation
ExecGetRangeTableRelation(EState *estate, Index rti)
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 659bd6dcd9..f84f376c9c 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -70,7 +70,8 @@ static int _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
static ParamListInfo _SPI_convert_params(int nargs, Oid *argtypes,
Datum *Values, const char *Nulls);
-static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount);
+static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount,
+ CachedPlanSource *plansource, int query_index);
static void _SPI_error_callback(void *arg);
@@ -1682,7 +1683,8 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
- cplan);
+ cplan,
+ plansource);
/*
* Set up options for portal. Default SCROLL type is chosen the same way
@@ -2494,6 +2496,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
ListCell *lc2;
+ int i = 0;
spicallbackarg.query = plansource->query_string;
@@ -2691,8 +2694,9 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
options->params,
_SPI_current->queryEnv,
0);
- res = _SPI_pquery(qdesc, fire_triggers,
- canSetTag ? options->tcount : 0);
+
+ res = _SPI_pquery(qdesc, fire_triggers, canSetTag ? options->tcount : 0,
+ plansource, i);
FreeQueryDesc(qdesc);
}
else
@@ -2789,6 +2793,8 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
my_res = res;
goto fail;
}
+
+ i++;
}
/* Done with this plan, so release refcount */
@@ -2866,7 +2872,8 @@ _SPI_convert_params(int nargs, Oid *argtypes,
}
static int
-_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
+_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount,
+ CachedPlanSource *plansource, int query_index)
{
int operation = queryDesc->operation;
int eflags;
@@ -2922,7 +2929,7 @@ _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
else
eflags = EXEC_FLAG_SKIP_TRIGGERS;
- ExecutorStart(queryDesc, eflags);
+ ExecutorStartExt(queryDesc, eflags, plansource, query_index);
ExecutorRun(queryDesc, ForwardScanDirection, tcount, true);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index e394f1419a..b95c859655 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1237,6 +1237,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ NULL,
NULL);
/*
@@ -2039,7 +2040,8 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
- cplan);
+ cplan,
+ psrc);
/* Done with the snapshot used for parameter I/O and parsing/planning */
if (snapshot_set)
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 6e8f6b1b8f..dbb0ffb771 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -19,6 +19,7 @@
#include "access/xact.h"
#include "commands/prepare.h"
+#include "executor/execdesc.h"
#include "executor/tstoreReceiver.h"
#include "miscadmin.h"
#include "pg_trace.h"
@@ -37,6 +38,8 @@ Portal ActivePortal = NULL;
static void ProcessQuery(PlannedStmt *plan,
CachedPlan *cplan,
+ CachedPlanSource *plansource,
+ int query_index,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -80,6 +83,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
qd->cplan = cplan; /* CachedPlan supplying the plannedstmt */
+ qd->cplan_release = false;
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -114,6 +118,13 @@ FreeQueryDesc(QueryDesc *qdesc)
UnregisterSnapshot(qdesc->snapshot);
UnregisterSnapshot(qdesc->crosscheck_snapshot);
+ /*
+ * Release CachedPlan if requested. The CachedPlan is not associated with
+ * a ResourceOwner when cplan_release is true; see ExecutorStartExt().
+ */
+ if (qdesc->cplan_release)
+ ReleaseCachedPlan(qdesc->cplan, NULL);
+
/* Only the QueryDesc itself need be freed */
pfree(qdesc);
}
@@ -126,6 +137,8 @@ FreeQueryDesc(QueryDesc *qdesc)
*
* plan: the plan tree for the query
* cplan: CachedPlan supplying the plan
+ * plansource: CachedPlanSource supplying the cplan
+ * query_index: index of the query in plansource->query_list
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -139,6 +152,8 @@ FreeQueryDesc(QueryDesc *qdesc)
static void
ProcessQuery(PlannedStmt *plan,
CachedPlan *cplan,
+ CachedPlanSource *plansource,
+ int query_index,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -157,7 +172,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Call ExecutorStart to prepare the plan for execution
*/
- ExecutorStart(queryDesc, 0);
+ ExecutorStartExt(queryDesc, 0, plansource, query_index);
/*
* Run the plan to completion.
@@ -518,9 +533,12 @@ PortalStart(Portal portal, ParamListInfo params,
myeflags = eflags;
/*
- * Call ExecutorStart to prepare the plan for execution
+ * Call ExecutorStartExt() to prepare the plan for execution. If
+ * the portal is using a cached plan, it may get invalidated
+ * during plan initialization, in which case a new one is
+ * created and saved in the QueryDesc.
*/
- ExecutorStart(queryDesc, myeflags);
+ ExecutorStartExt(queryDesc, myeflags, portal->plansource, 0);
/*
* This tells PortalCleanup to shut down the executor
@@ -1201,6 +1219,7 @@ PortalRunMulti(Portal portal,
{
bool active_snapshot_set = false;
ListCell *stmtlist_item;
+ int i = 0;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1283,6 +1302,8 @@ PortalRunMulti(Portal portal,
/* statement can set tag string */
ProcessQuery(pstmt,
portal->cplan,
+ portal->plansource,
+ i,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1293,6 +1314,8 @@ PortalRunMulti(Portal portal,
/* stmt added by rewrite cannot set tag */
ProcessQuery(pstmt,
portal->cplan,
+ portal->plansource,
+ i,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1357,6 +1380,8 @@ PortalRunMulti(Portal portal,
*/
if (lnext(portal->stmts, stmtlist_item) != NULL)
CommandCounterIncrement();
+
+ i++;
}
/* Pop the snapshot if we pushed one. */
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 5b75dadf13..d33f871ea2 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -94,6 +94,14 @@
*/
static dlist_head saved_plan_list = DLIST_STATIC_INIT(saved_plan_list);
+/*
+ * Head of the backend's list of "standalone" CachedPlans that are not
+ * associated with a CachedPlanSource, created by GetSingleCachedPlan() for
+ * transient use by the executor in certain scenarios where they're needed
+ * only for one execution of the plan.
+ */
+static dlist_head standalone_plan_list = DLIST_STATIC_INIT(standalone_plan_list);
+
/*
* This is the head of the backend's list of CachedExpressions.
*/
@@ -905,6 +913,8 @@ CheckCachedPlan(CachedPlanSource *plansource)
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
* (unless this is a one-shot plan, in which case we don't copy the plan).
+ *
+ * Note: When changing this, you should also look at GetSingleCachedPlan().
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
@@ -1034,6 +1044,7 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
plan->is_generic = generic;
plan->is_saved = false;
plan->is_valid = true;
+ plan->is_standalone = false;
/* assign generation number to new plan */
plan->generation = ++(plansource->generation);
@@ -1282,6 +1293,121 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
return plan;
}
+/*
+ * Create a fresh CachedPlan for the query_index'th query in the provided
+ * CachedPlanSource.
+ *
+ * The created CachedPlan is standalone, meaning it is not tracked in the
+ * CachedPlanSource. The CachedPlan and its plan trees are allocated in a
+ * child context of the caller's memory context. The caller must ensure they
+ * remain valid until execution is complete, after which the plan should be
+ * released by calling ReleaseCachedPlan().
+ *
+ * This function primarily supports ExecutorStartExt(), which handles cases
+ * where the original generic CachedPlan becomes invalid after prunable
+ * relations are locked.
+ */
+CachedPlan *
+GetSingleCachedPlan(CachedPlanSource *plansource, int query_index,
+ QueryEnvironment *queryEnv)
+{
+ List *query_list = plansource->query_list,
+ *plan_list;
+ CachedPlan *plan = plansource->gplan;
+ MemoryContext oldcxt = CurrentMemoryContext,
+ plan_context;
+ PlannedStmt *plannedstmt;
+
+ Assert(ActiveSnapshotSet());
+
+ /* Sanity checks */
+ if (plan == NULL)
+ elog(ERROR, "GetSingleCachedPlan() called in the wrong context: plansource->gplan is NULL");
+ else if (plan->is_valid)
+ elog(ERROR, "GetSingleCachedPlan() called in the wrong context: plansource->gplan->is_valid");
+
+ /*
+ * The plansource might have become invalid since GetCachedPlan(). See the
+ * comment in BuildCachedPlan() for details on why this might happen.
+ *
+ * The risk is greater here because this function is called from the
+ * executor, meaning much more processing may have occurred compared to
+ * when BuildCachedPlan() is called from GetCachedPlan().
+ */
+ if (!plansource->is_valid)
+ query_list = RevalidateCachedQuery(plansource, queryEnv);
+ Assert(query_list != NIL);
+
+ /*
+ * Build a new generic plan for the query_index'th query, but make a copy
+ * to be scribbled on by the planner
+ */
+ query_list = list_make1(copyObject(list_nth_node(Query, query_list,
+ query_index)));
+ plan_list = pg_plan_queries(query_list, plansource->query_string,
+ plansource->cursor_options, NULL);
+
+ list_free_deep(query_list);
+
+ /*
+ * Make a dedicated memory context for the CachedPlan and its subsidiary
+ * data so that we can release it in ReleaseCachedPlan() that will be
+ * called in FreeQueryDesc().
+ */
+ plan_context = AllocSetContextCreate(CurrentMemoryContext,
+ "Standalone CachedPlan",
+ ALLOCSET_START_SMALL_SIZES);
+ MemoryContextCopyAndSetIdentifier(plan_context, plansource->query_string);
+
+ /*
+ * Copy plan into the new context.
+ */
+ MemoryContextSwitchTo(plan_context);
+ plan_list = copyObject(plan_list);
+
+ /*
+ * Create and fill the CachedPlan struct within the new context.
+ */
+ plan = (CachedPlan *) palloc(sizeof(CachedPlan));
+ plan->magic = CACHEDPLAN_MAGIC;
+ plan->stmt_list = plan_list;
+
+ plan->planRoleId = GetUserId();
+ Assert(list_length(plan_list) == 1);
+ plannedstmt = linitial_node(PlannedStmt, plan_list);
+
+ /*
+ * CachedPlan is dependent on role either if RLS affected the rewrite
+ * phase or if a role dependency was injected during planning. And it's
+ * transient if any plan is marked so.
+ */
+ plan->dependsOnRole = plansource->dependsOnRLS || plannedstmt->dependsOnRole;
+ if (plannedstmt->transientPlan)
+ {
+ Assert(TransactionIdIsNormal(TransactionXmin));
+ plan->saved_xmin = TransactionXmin;
+ }
+ else
+ plan->saved_xmin = InvalidTransactionId;
+ plan->refcount = 0;
+ plan->context = plan_context;
+ plan->is_oneshot = false;
+ plan->is_generic = true;
+ plan->is_saved = false;
+ plan->is_valid = true;
+ plan->is_standalone = true;
+ plan->generation = 1;
+ MemoryContextSwitchTo(oldcxt);
+
+ /*
+ * Add the entry to the global list of "standalone" cached plans. It is
+ * removed from the list by ReleaseCachedPlan().
+ */
+ dlist_push_tail(&standalone_plan_list, &plan->node);
+
+ return plan;
+}
+
/*
* ReleaseCachedPlan: release active use of a cached plan.
*
@@ -1309,6 +1435,10 @@ ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner)
/* Mark it no longer valid */
plan->magic = 0;
+ /* Remove from the global list if we are a standalone plan. */
+ if (plan->is_standalone)
+ dlist_delete(&plan->node);
+
/* One-shot plans do not own their context, so we can't free them */
if (!plan->is_oneshot)
MemoryContextDelete(plan->context);
@@ -2066,6 +2196,33 @@ PlanCacheRelCallback(Datum arg, Oid relid)
cexpr->is_valid = false;
}
}
+
+ /* Finally, invalidate any standalone cached plans */
+ dlist_foreach(iter, &standalone_plan_list)
+ {
+ CachedPlan *cplan = dlist_container(CachedPlan,
+ node, iter.cur);
+
+ Assert(cplan->magic == CACHEDPLAN_MAGIC);
+
+ if (cplan->is_valid)
+ {
+ ListCell *lc;
+
+ foreach(lc, cplan->stmt_list)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc);
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ continue; /* Ignore utility statements */
+ if ((relid == InvalidOid) ? plannedstmt->relationOids != NIL :
+ list_member_oid(plannedstmt->relationOids, relid))
+ cplan->is_valid = false;
+ if (!cplan->is_valid)
+ break; /* out of stmt_list scan */
+ }
+ }
+ }
}
/*
@@ -2176,6 +2333,44 @@ PlanCacheObjectCallback(Datum arg, int cacheid, uint32 hashvalue)
}
}
}
+
+ /* Finally, invalidate any standalone cached plans */
+ dlist_foreach(iter, &standalone_plan_list)
+ {
+ CachedPlan *cplan = dlist_container(CachedPlan,
+ node, iter.cur);
+
+ Assert(cplan->magic == CACHEDPLAN_MAGIC);
+
+ if (cplan->is_valid)
+ {
+ ListCell *lc;
+
+ foreach(lc, cplan->stmt_list)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc);
+ ListCell *lc3;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ continue; /* Ignore utility statements */
+ foreach(lc3, plannedstmt->invalItems)
+ {
+ PlanInvalItem *item = (PlanInvalItem *) lfirst(lc3);
+
+ if (item->cacheId != cacheid)
+ continue;
+ if (hashvalue == 0 ||
+ item->hashValue == hashvalue)
+ {
+ cplan->is_valid = false;
+ break; /* out of invalItems scan */
+ }
+ }
+ if (!cplan->is_valid)
+ break; /* out of stmt_list scan */
+ }
+ }
+ }
}
/*
@@ -2235,6 +2430,17 @@ ResetPlanCache(void)
cexpr->is_valid = false;
}
+
+ /* Finally, invalidate any standalone cached plans */
+ dlist_foreach(iter, &standalone_plan_list)
+ {
+ CachedPlan *cplan = dlist_container(CachedPlan,
+ node, iter.cur);
+
+ Assert(cplan->magic == CACHEDPLAN_MAGIC);
+
+ cplan->is_valid = false;
+ }
}
/*
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 4a24613537..bf70fd4ce7 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -284,7 +284,8 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
- CachedPlan *cplan)
+ CachedPlan *cplan,
+ CachedPlanSource *plansource)
{
Assert(PortalIsValid(portal));
Assert(portal->status == PORTAL_NEW);
@@ -299,6 +300,7 @@ PortalDefineQuery(Portal portal,
portal->commandTag = commandTag;
portal->stmts = stmts;
portal->cplan = cplan;
+ portal->plansource = plansource;
portal->status = PORTAL_DEFINED;
}
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 21c71e0d53..a39989a950 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -104,6 +104,7 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ParamListInfo params, QueryEnvironment *queryEnv);
extern void ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
+ CachedPlanSource *plansource, int plan_index,
IntoClause *into, ExplainState *es,
const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index 8a5a9fe642..db21561c8c 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -258,6 +258,7 @@ extern void ExecASTruncateTriggers(EState *estate,
extern void AfterTriggerBeginXact(void);
extern void AfterTriggerBeginQuery(void);
extern void AfterTriggerEndQuery(EState *estate);
+extern void AfterTriggerAbortQuery(void);
extern void AfterTriggerFireDeferred(void);
extern void AfterTriggerEndXact(bool isCommit);
extern void AfterTriggerBeginSubXact(void);
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index 0e7245435d..f6cb6479c0 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -36,6 +36,7 @@ typedef struct QueryDesc
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
CachedPlan *cplan; /* CachedPlan that supplies the plannedstmt */
+ bool cplan_release; /* Should FreeQueryDesc() release cplan? */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 69c3ebff00..5bc0edb5a0 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -19,6 +19,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/plancache.h"
/*
@@ -198,6 +199,8 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
* prototypes from functions in execMain.c
*/
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern void ExecutorStartExt(QueryDesc *queryDesc, int eflags,
+ CachedPlanSource *plansource, int query_index);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count, bool execute_once);
@@ -261,6 +264,19 @@ extern void ExecEndNode(PlanState *node);
extern void ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * Is the CachedPlan in es_cachedplan still valid?
+ *
+ * Called from InitPlan() because invalidation messages that affect the plan
+ * might be received after locks have been taken on runtime-prunable relations.
+ * The caller should take appropriate action if the plan has become invalid.
+ */
+static inline bool
+ExecPlanStillValid(EState *estate)
+{
+ return estate->es_cachedplan == NULL ? true :
+ CachedPlanValid(estate->es_cachedplan);
+}
/* ----------------------------------------------------------------
* ExecProcNode
@@ -589,6 +605,7 @@ extern void ExecCreateScanSlotFromOuterPlan(EState *estate,
extern bool ExecRelationIsTargetRelation(EState *estate, Index scanrelid);
extern Relation ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags);
+extern Relation ExecOpenScanIndexRelation(EState *estate, Oid indexid, int lockmode);
extern void ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos);
extern void ExecCloseRangeTableRelations(EState *estate);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index bd68c60a0b..c80ccf0349 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -690,6 +690,7 @@ typedef struct EState
int es_top_eflags; /* eflags passed to ExecutorStart */
int es_instrument; /* OR of InstrumentOption flags */
bool es_finished; /* true when ExecutorFinish is done */
+ bool es_aborted; /* true when execution was aborted */
List *es_exprcontexts; /* List of ExprContexts within EState */
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 0b5ee007ca..154f68f671 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -18,6 +18,7 @@
#include "access/tupdesc.h"
#include "lib/ilist.h"
#include "nodes/params.h"
+#include "nodes/parsenodes.h"
#include "tcop/cmdtag.h"
#include "utils/queryenvironment.h"
#include "utils/resowner.h"
@@ -152,6 +153,8 @@ typedef struct CachedPlan
bool is_generic; /* is it a reusable generic plan? */
bool is_saved; /* is CachedPlan in a long-lived context? */
bool is_valid; /* is the stmt_list currently valid? */
+ bool is_standalone; /* is it not associated with a
+ * CachedPlanSource? */
Oid planRoleId; /* Role ID the plan was created for */
bool dependsOnRole; /* is plan specific to that role? */
TransactionId saved_xmin; /* if valid, replan when TransactionXmin
@@ -159,6 +162,12 @@ typedef struct CachedPlan
int generation; /* parent's generation number for this plan */
int refcount; /* count of live references to this struct */
MemoryContext context; /* context containing this CachedPlan */
+
+ /*
+ * If the plan is not associated with a CachedPlanSource, it is saved in
+ * a separate global list.
+ */
+ dlist_node node; /* list link, if is_standalone */
} CachedPlan;
/*
@@ -224,6 +233,10 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
QueryEnvironment *queryEnv);
+extern CachedPlan *GetSingleCachedPlan(CachedPlanSource *plansource,
+ int query_index,
+ QueryEnvironment *queryEnv);
+
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
@@ -245,4 +258,17 @@ CachedPlanRequiresLocking(CachedPlan *cplan)
return cplan->is_generic;
}
+/*
+ * CachedPlanValid
+ * Returns whether a cached generic plan is still valid.
+ *
+ * Invoked by the executor to check that the plan has not been invalidated
+ * after taking locks during initialization of the plan.
+ */
+static inline bool
+CachedPlanValid(CachedPlan *cplan)
+{
+ return cplan->is_valid;
+}
+
#endif /* PLANCACHE_H */
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index 29f49829f2..58c3828d2c 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,7 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ CachedPlanSource *plansource; /* CachedPlanSource, for cplan */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -241,7 +242,8 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
- CachedPlan *cplan);
+ CachedPlan *cplan,
+ CachedPlanSource *plansource);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
diff --git a/src/test/modules/delay_execution/Makefile b/src/test/modules/delay_execution/Makefile
index 70f24e846d..3eeb097fde 100644
--- a/src/test/modules/delay_execution/Makefile
+++ b/src/test/modules/delay_execution/Makefile
@@ -8,7 +8,8 @@ OBJS = \
delay_execution.o
ISOLATION = partition-addition \
- partition-removal-1
+ partition-removal-1 \
+ cached-plan-inval
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/src/test/modules/delay_execution/delay_execution.c b/src/test/modules/delay_execution/delay_execution.c
index 155c8a8d55..304ca77f7b 100644
--- a/src/test/modules/delay_execution/delay_execution.c
+++ b/src/test/modules/delay_execution/delay_execution.c
@@ -1,14 +1,18 @@
/*-------------------------------------------------------------------------
*
* delay_execution.c
- * Test module to allow delay between parsing and execution of a query.
+ * Test module to introduce delay at various points during execution of a
+ * query to test that execution proceeds safely in light of concurrent
+ * changes.
*
* The delay is implemented by taking and immediately releasing a specified
* advisory lock. If another process has previously taken that lock, the
* current process will be blocked until the lock is released; otherwise,
* there's no effect. This allows an isolationtester script to reliably
- * test behaviors where some specified action happens in another backend
- * between parsing and execution of any desired query.
+ * test behaviors where some specified action happens in another backend in
+ * a couple of cases: 1) between parsing and execution of any desired query
+ * when using the planner_hook, 2) between RevalidateCachedQuery() and
+ * ExecutorStart() when using the ExecutorStart_hook.
*
* Copyright (c) 2020-2024, PostgreSQL Global Development Group
*
@@ -22,6 +26,7 @@
#include <limits.h>
+#include "executor/executor.h"
#include "optimizer/planner.h"
#include "utils/builtins.h"
#include "utils/guc.h"
@@ -32,9 +37,11 @@ PG_MODULE_MAGIC;
/* GUC: advisory lock ID to use. Zero disables the feature. */
static int post_planning_lock_id = 0;
+static int executor_start_lock_id = 0;
-/* Save previous planner hook user to be a good citizen */
+/* Save previous hook users to be a good citizen */
static planner_hook_type prev_planner_hook = NULL;
+static ExecutorStart_hook_type prev_ExecutorStart_hook = NULL;
/* planner_hook function to provide the desired delay */
@@ -70,11 +77,41 @@ delay_execution_planner(Query *parse, const char *query_string,
return result;
}
+/* ExecutorStart_hook function to provide the desired delay */
+static void
+delay_execution_ExecutorStart(QueryDesc *queryDesc, int eflags)
+{
+ /* If enabled, delay by taking and releasing the specified lock */
+ if (executor_start_lock_id != 0)
+ {
+ DirectFunctionCall1(pg_advisory_lock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+ DirectFunctionCall1(pg_advisory_unlock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+
+ /*
+ * Ensure that we notice any pending invalidations, since the advisory
+ * lock functions don't do this.
+ */
+ AcceptInvalidationMessages();
+ }
+
+ /* Now start the executor, possibly via a previous hook user */
+ if (prev_ExecutorStart_hook)
+ prev_ExecutorStart_hook(queryDesc, eflags);
+ else
+ standard_ExecutorStart(queryDesc, eflags);
+
+ if (executor_start_lock_id != 0)
+ elog(NOTICE, "Finished ExecutorStart(): CachedPlan is %s",
+ CachedPlanValid(queryDesc->cplan) ? "valid" : "not valid");
+}
+
/* Module load function */
void
_PG_init(void)
{
- /* Set up the GUC to control which lock is used */
+ /* Set up GUCs to control which lock is used */
DefineCustomIntVariable("delay_execution.post_planning_lock_id",
"Sets the advisory lock ID to be locked/unlocked after planning.",
"Zero disables the delay.",
@@ -86,10 +123,22 @@ _PG_init(void)
NULL,
NULL,
NULL);
-
+ DefineCustomIntVariable("delay_execution.executor_start_lock_id",
+ "Sets the advisory lock ID to be locked/unlocked before starting execution.",
+ "Zero disables the delay.",
+ &executor_start_lock_id,
+ 0,
+ 0, INT_MAX,
+ PGC_USERSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
MarkGUCPrefixReserved("delay_execution");
- /* Install our hook */
+ /* Install our hooks. */
prev_planner_hook = planner_hook;
planner_hook = delay_execution_planner;
+ prev_ExecutorStart_hook = ExecutorStart_hook;
+ ExecutorStart_hook = delay_execution_ExecutorStart;
}
diff --git a/src/test/modules/delay_execution/expected/cached-plan-inval.out b/src/test/modules/delay_execution/expected/cached-plan-inval.out
new file mode 100644
index 0000000000..e002cfbc9c
--- /dev/null
+++ b/src/test/modules/delay_execution/expected/cached-plan-inval.out
@@ -0,0 +1,230 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1prep s2lock s1exec s2dropi s2unlock
+step s1prep: SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1 FOR UPDATE;
+ EXPLAIN (COSTS OFF) EXECUTE q (1);
+QUERY PLAN
+------------------------------------------------
+LockRows
+ -> Append
+ Subplans Removed: 2
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Recheck Cond: (a = $1)
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = $1)
+(7 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); <waiting ...>
+step s2dropi: DROP INDEX foo12_1_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+-------------------------------------
+LockRows
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on foo12_1 foo_1
+ Filter: (a = $1)
+(5 rows)
+
+
+starting permutation: s1prep2 s2lock s1exec2 s2dropi s2unlock
+step s1prep2: SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q2;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Recheck Cond: ((a = one()) OR (a = two()))
+ -> BitmapOr
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = one())
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = two())
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(11 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec2: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; <waiting ...>
+step s2dropi: DROP INDEX foo12_1_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec2: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Seq Scan on foo12_1 foo_1
+ Filter: ((a = one()) OR (a = two()))
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(6 rows)
+
+
+starting permutation: s1prep3 s2lock s1exec3 s2dropi s2unlock
+step s1prep3: SET plan_cache_mode = force_generic_plan;
+ PREPARE q3 AS UPDATE foov SET a = a WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q3;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Recheck Cond: ((a = one()) OR (a = two()))
+ -> BitmapOr
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = one())
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = two())
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+
+Update on foo
+ Update on foo12_1 foo_1
+ Update on foo12_2 foo_2
+ -> Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Recheck Cond: ((a = one()) OR (a = two()))
+ -> BitmapOr
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = one())
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = two())
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(26 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec3: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; <waiting ...>
+step s2dropi: DROP INDEX foo12_1_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec3: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Seq Scan on foo12_1 foo_1
+ Filter: ((a = one()) OR (a = two()))
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+
+Update on foo
+ Update on foo12_1 foo_1
+ Update on foo12_2 foo_2
+ -> Append
+ Subplans Removed: 1
+ -> Seq Scan on foo12_1 foo_1
+ Filter: ((a = one()) OR (a = two()))
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(16 rows)
+
+
+starting permutation: s1prep4 s2lock s1exec4 s2dropi s2unlock
+step s1prep4: SET plan_cache_mode = force_generic_plan;
+ SET enable_seqscan TO off;
+ PREPARE q4 AS SELECT * FROM generate_series(1, 1) WHERE EXISTS (SELECT * FROM foov WHERE a = $1 FOR UPDATE);
+ EXPLAIN (COSTS OFF) EXECUTE q4 (1);
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------------------------------------------------
+Result
+ One-Time Filter: (InitPlan 1).col1
+ InitPlan 1
+ -> LockRows
+ Disabled Nodes: 2
+ -> Append
+ Disabled Nodes: 2
+ Subplans Removed: 2
+ -> Index Scan using foo12_1_a on foo12_1 foo_1
+ Index Cond: (a = $1)
+ -> Function Scan on generate_series
+(11 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec4: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q4 (1); <waiting ...>
+step s2dropi: DROP INDEX foo12_1_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec4: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------------------------------
+Result
+ One-Time Filter: (InitPlan 1).col1
+ InitPlan 1
+ -> LockRows
+ Disabled Nodes: 3
+ -> Append
+ Disabled Nodes: 3
+ Subplans Removed: 2
+ -> Seq Scan on foo12_1 foo_1
+ Disabled Nodes: 1
+ Filter: (a = $1)
+ -> Function Scan on generate_series
+(12 rows)
+
diff --git a/src/test/modules/delay_execution/meson.build b/src/test/modules/delay_execution/meson.build
index 41f3ac0b89..5a70b183d0 100644
--- a/src/test/modules/delay_execution/meson.build
+++ b/src/test/modules/delay_execution/meson.build
@@ -24,6 +24,7 @@ tests += {
'specs': [
'partition-addition',
'partition-removal-1',
+ 'cached-plan-inval',
],
},
}
diff --git a/src/test/modules/delay_execution/specs/cached-plan-inval.spec b/src/test/modules/delay_execution/specs/cached-plan-inval.spec
new file mode 100644
index 0000000000..820a843051
--- /dev/null
+++ b/src/test/modules/delay_execution/specs/cached-plan-inval.spec
@@ -0,0 +1,75 @@
+# Test to check that invalidation of cached generic plans during ExecutorStart
+# correctly triggers replanning and re-execution.
+
+setup
+{
+ CREATE TABLE foo (a int, b text) PARTITION BY LIST(a);
+ CREATE TABLE foo12 PARTITION OF foo FOR VALUES IN (1, 2) PARTITION BY LIST (a);
+ CREATE TABLE foo12_1 PARTITION OF foo12 FOR VALUES IN (1);
+ CREATE TABLE foo12_2 PARTITION OF foo12 FOR VALUES IN (2);
+ CREATE INDEX foo12_1_a ON foo12_1 (a);
+ CREATE TABLE foo3 PARTITION OF foo FOR VALUES IN (3);
+ CREATE VIEW foov AS SELECT * FROM foo;
+ CREATE FUNCTION one () RETURNS int AS $$ BEGIN RETURN 1; END; $$ LANGUAGE PLPGSQL STABLE;
+ CREATE FUNCTION two () RETURNS int AS $$ BEGIN RETURN 2; END; $$ LANGUAGE PLPGSQL STABLE;
+ CREATE RULE update_foo AS ON UPDATE TO foo DO ALSO SELECT 1;
+}
+
+teardown
+{
+ DROP VIEW foov;
+ DROP RULE update_foo ON foo;
+ DROP TABLE foo;
+ DROP FUNCTION one(), two();
+}
+
+session "s1"
+# Append with run-time pruning
+step "s1prep" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1 FOR UPDATE;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+
+# Another case with Append with run-time pruning
+step "s1prep2" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+
+# Case with a rule adding another query
+step "s1prep3" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q3 AS UPDATE foov SET a = a WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+
+# Another case with Append with run-time pruning in a subquery
+step "s1prep4" { SET plan_cache_mode = force_generic_plan;
+ SET enable_seqscan TO off;
+ PREPARE q4 AS SELECT * FROM generate_series(1, 1) WHERE EXISTS (SELECT * FROM foov WHERE a = $1 FOR UPDATE);
+ EXPLAIN (COSTS OFF) EXECUTE q4 (1); }
+
+# Executes a generic plan
+step "s1exec" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+step "s1exec2" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+step "s1exec3" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+step "s1exec4" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q4 (1); }
+
+session "s2"
+step "s2lock" { SELECT pg_advisory_lock(12345); }
+step "s2unlock" { SELECT pg_advisory_unlock(12345); }
+step "s2dropi" { DROP INDEX foo12_1_a; }
+
+# While "s1exec", etc. wait to acquire the advisory lock, "s2drop" is able to
+# drop the index being used in the cached plan. When "s1exec" is then
+# unblocked and initializes the cached plan for execution, it detects the
+# concurrent index drop and causes the cached plan to be discarded and
+# recreated without the index.
+permutation "s1prep" "s2lock" "s1exec" "s2dropi" "s2unlock"
+permutation "s1prep2" "s2lock" "s1exec2" "s2dropi" "s2unlock"
+permutation "s1prep3" "s2lock" "s1exec3" "s2dropi" "s2unlock"
+permutation "s1prep4" "s2lock" "s1exec4" "s2dropi" "s2unlock"
--
2.43.0
v56-0002-Initialize-PartitionPruneContexts-lazily.patchapplication/octet-stream; name=v56-0002-Initialize-PartitionPruneContexts-lazily.patchDownload
From 7ba748a1055880ee20f908a2cf2757f2ad82e9ef Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Wed, 18 Sep 2024 11:16:48 +0900
Subject: [PATCH v56 2/5] Initialize PartitionPruneContexts lazily
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
This commit moves the initialization of PartitionPruneContexts for
both initial and exec pruning steps from CreatePartitionPruneState()
to find_matching_subplans_recurse(), where they are actually needed.
To track whether the context has been initialized and is ready for
use, a boolean field is_valid has been added to PartitionPruneContext.
The primary motivation is to eliminate the need to perform
CreatePartitionPruneState() during ExecInitNode(), as creating the
exec pruning context requires access to the parent plan node’s
PlanState. By deferring context creation to where it’s needed, this
change enables calling CreatePartitionPruneState() before ExecInitNode().
This will be useful in a future commit, which will move initial
pruning to occur before ExecInitNode().
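
As an aside for readers skimming the patch, here is a minimal standalone
C sketch of the lazy-initialization pattern being applied; it is
illustrative only, and the type and function names are made up rather
than PostgreSQL's:

    #include <stdbool.h>
    #include <stdio.h>

    typedef struct PruneContext
    {
        bool    is_valid;   /* set once the fields below are usable */
        int     nsteps;     /* stands in for the real pruning state */
    } PruneContext;

    static void
    init_context(PruneContext *ctx, int nsteps)
    {
        ctx->nsteps = nsteps;
        ctx->is_valid = true;   /* flip last, once fully built */
    }

    static int
    run_pruning(PruneContext *ctx, int nsteps)
    {
        /* Build the context on first use, not during setup. */
        if (!ctx->is_valid)
            init_context(ctx, nsteps);
        return ctx->nsteps;
    }

    int
    main(void)
    {
        PruneContext ctx = {.is_valid = false};

        printf("first call: %d steps\n", run_pruning(&ctx, 3));
        printf("second call: %d steps\n", run_pruning(&ctx, 3));
        return 0;
    }

The point of flipping is_valid only at the end of initialization is that
a context is either fully usable or not yet built, so nothing in the
setup path needs access to the parent PlanState.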
---
src/backend/executor/execPartition.c | 150 +++++++++++++++++++--------
src/include/executor/execPartition.h | 11 ++
src/include/partitioning/partprune.h | 2 +
3 files changed, 120 insertions(+), 43 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index ec730674f2..63c3429fe7 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -181,18 +181,17 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
int maxfieldlen);
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
-static PartitionPruneState *CreatePartitionPruneState(PlanState *planstate,
+static PartitionPruneState *CreatePartitionPruneState(EState *estate,
PartitionPruneInfo *pruneinfo);
-static void InitPartitionPruneContext(PartitionPruneContext *context,
+static void InitPartitionPruneContext(PartitionedRelPruningData *pprune,
+ PartitionPruneContext *context,
List *pruning_steps,
- PartitionDesc partdesc,
- PartitionKey partkey,
- PlanState *planstate,
- ExprContext *econtext);
+ PlanState *planstate);
static void PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
Bitmapset *initially_valid_subplans,
int n_total_subplans);
-static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
+static void find_matching_subplans_recurse(PlanState *parent_plan,
+ PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
Bitmapset **validsubplans);
@@ -1825,7 +1824,14 @@ ExecInitPartitionPruning(PlanState *planstate,
ExecAssignExprContext(estate, planstate);
/* Create the working data structure for pruning */
- prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ prunestate = CreatePartitionPruneState(estate, pruneinfo);
+
+ /*
+ * Store the PlanState so that it can be used to initialize exec pruning
+ * contexts later in find_matching_subplans_recurse(), where they are needed.
+ */
+ if (prunestate->do_exec_prune)
+ prunestate->parent_plan = planstate;
/*
* Perform an initial partition prune pass, if required.
@@ -1865,8 +1871,6 @@ ExecInitPartitionPruning(PlanState *planstate,
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
*
- * 'planstate' is the parent plan node's execution state.
- *
* 'pruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
* PartitionPruningData for each partitioning hierarchy (i.e., each sublist of
@@ -1877,16 +1881,20 @@ ExecInitPartitionPruning(PlanState *planstate,
* stored in each PartitionedRelPruningData can be re-used each time we
* re-evaluate which partitions match the pruning steps provided in each
* PartitionedRelPruneInfo.
+ *
+ * Note that the PartitionPruneContexts for both initial and exec pruning
+ * (which are stored in each PartitionedRelPruningData) are initialized lazily
+ * in find_matching_subplans_recurse() when used for the first time.
*/
static PartitionPruneState *
-CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
+CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
{
- EState *estate = planstate->state;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
+ /* We may need an expression context to evaluate partition exprs */
+ ExprContext *econtext = CreateExprContext(estate);
/* For data reading, executor always includes detached partitions */
if (estate->es_partition_directory == NULL)
@@ -1908,6 +1916,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
prunestate->other_subplans = bms_copy(pruneinfo->other_subplans);
prunestate->do_initial_prune = false; /* may be set below */
prunestate->do_exec_prune = false; /* may be set below */
+ prunestate->parent_plan = NULL;
prunestate->num_partprunedata = n_part_hierarchies;
/*
@@ -1943,16 +1952,25 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
Relation partrel;
PartitionDesc partdesc;
- PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * Used for initializing the expressions in initial pruning steps.
+ * For exec pruning steps, the parent plan node's PlanState's
+ * ps_ExprContext will be used.
*/
+ pprune->estate = estate;
+ pprune->econtext = econtext;
+
+ /* Remember Relation for use in InitPartitionPruneContext. */
partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
- partkey = RelationGetPartitionKey(partrel);
+ pprune->partrel = partrel;
+
+ /*
+ * We can rely on the copy of the partitioned table's partition
+ * descriptor appearing in its relcache entry, because that entry
+ * will be held open and locked for the duration of this executor
+ * run.
+ */
partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
partrel);
@@ -2063,32 +2081,26 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pprune->present_parts = bms_copy(pinfo->present_parts);
/*
- * Initialize pruning contexts as needed. Note that we must skip
- * execution-time partition pruning in EXPLAIN (GENERIC_PLAN),
- * since parameter values may be missing.
+ * Pruning contexts (initial_context and exec_context) are
+ * initialized lazily in find_matching_subplans_recurse() when used
+ * for the first time.
+ *
+ * Note that we must skip execution-time partition pruning in
+ * EXPLAIN (GENERIC_PLAN), since parameter values may be missing.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
+ pprune->initial_context.is_valid = false;
if (pinfo->initial_pruning_steps &&
!(econtext->ecxt_estate->es_top_eflags & EXEC_FLAG_EXPLAIN_GENERIC))
- {
- InitPartitionPruneContext(&pprune->initial_context,
- pinfo->initial_pruning_steps,
- partdesc, partkey, planstate,
- econtext);
/* Record whether initial pruning is needed at any level */
prunestate->do_initial_prune = true;
- }
+
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
+ pprune->exec_context.is_valid = false;
if (pinfo->exec_pruning_steps &&
!(econtext->ecxt_estate->es_top_eflags & EXEC_FLAG_EXPLAIN_GENERIC))
- {
- InitPartitionPruneContext(&pprune->exec_context,
- pinfo->exec_pruning_steps,
- partdesc, partkey, planstate,
- econtext);
/* Record whether exec pruning is needed at any level */
prunestate->do_exec_prune = true;
- }
/*
* Accumulate the IDs of all PARAM_EXEC Params affecting the
@@ -2109,16 +2121,43 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* Initialize a PartitionPruneContext for the given list of pruning steps.
*/
static void
-InitPartitionPruneContext(PartitionPruneContext *context,
+InitPartitionPruneContext(PartitionedRelPruningData *pprune,
+ PartitionPruneContext *context,
List *pruning_steps,
- PartitionDesc partdesc,
- PartitionKey partkey,
- PlanState *planstate,
- ExprContext *econtext)
+ PlanState *planstate)
{
int n_steps;
int partnatts;
ListCell *lc;
+ ExprContext *econtext;
+ EState *estate = pprune->estate;
+ MemoryContext oldcxt;
+ Relation partrel = pprune->partrel;
+ PartitionKey partkey;
+ PartitionDesc partdesc;
+
+ /* Must allocate the needed stuff in the query lifetime context. */
+ oldcxt = MemoryContextSwitchTo(estate->es_query_cxt);
+
+ /* Use parent_plan's ExprContext when available. */
+ if (planstate)
+ {
+ if (planstate->ps_ExprContext == NULL)
+ ExecAssignExprContext(estate, planstate);
+ econtext = planstate->ps_ExprContext;
+ }
+ else
+ econtext = pprune->econtext;
+
+ /*
+ * We can rely on the copies of the partitioned table's partition
+ * key and partition descriptor appearing in its relcache entry,
+ * because that entry will be held open and locked for the
+ * duration of this executor run.
+ */
+ partkey = RelationGetPartitionKey(partrel);
+ partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
+ partrel);
n_steps = list_length(pruning_steps);
@@ -2187,6 +2226,9 @@ InitPartitionPruneContext(PartitionPruneContext *context,
}
}
}
+
+ MemoryContextSwitchTo(oldcxt);
+ context->is_valid = true;
}
/*
@@ -2350,12 +2392,16 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* recursing to other (lower-level) parents as needed.
*/
pprune = &prunedata->partrelprunedata[0];
- find_matching_subplans_recurse(prunedata, pprune, initial_prune,
+ find_matching_subplans_recurse(prunestate->parent_plan,
+ prunedata, pprune, initial_prune,
&result);
/* Expression eval may have used space in ExprContext too */
- if (pprune->exec_pruning_steps)
+ if (pprune->exec_context.is_valid)
+ {
+ Assert(pprune->exec_pruning_steps != NIL);
ResetExprContext(pprune->exec_context.exprcontext);
+ }
}
/* Add in any subplans that partition pruning didn't account for */
@@ -2378,7 +2424,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* Adds valid (non-prunable) subplan IDs to *validsubplans
*/
static void
-find_matching_subplans_recurse(PartitionPruningData *prunedata,
+find_matching_subplans_recurse(PlanState *parent_plan,
+ PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
Bitmapset **validsubplans)
@@ -2395,11 +2442,27 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
* level.
*/
if (initial_prune && pprune->initial_pruning_steps)
+ {
+ /* Initialize initial_context if not already done. */
+ if (unlikely(!pprune->initial_context.is_valid))
+ InitPartitionPruneContext(pprune,
+ &pprune->initial_context,
+ pprune->initial_pruning_steps,
+ parent_plan);
partset = get_matching_partitions(&pprune->initial_context,
pprune->initial_pruning_steps);
+ }
else if (!initial_prune && pprune->exec_pruning_steps)
+ {
+ /* Initialize exec_context if not already done. */
+ if (unlikely(!pprune->exec_context.is_valid))
+ InitPartitionPruneContext(pprune,
+ &pprune->exec_context,
+ pprune->exec_pruning_steps,
+ parent_plan);
partset = get_matching_partitions(&pprune->exec_context,
pprune->exec_pruning_steps);
+ }
else
partset = pprune->present_parts;
@@ -2415,7 +2478,8 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
int partidx = pprune->subpart_map[i];
if (partidx >= 0)
- find_matching_subplans_recurse(prunedata,
+ find_matching_subplans_recurse(parent_plan,
+ prunedata,
&prunedata->partrelprunedata[partidx],
initial_prune, validsubplans);
else
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 12aacc84ff..41afb522f3 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -42,6 +42,10 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* PartitionedRelPruneInfo (see plannodes.h); though note that here,
* subpart_map contains indexes into PartitionPruningData.partrelprunedata[].
*
+ * estate The EState for the query doing runtime pruning
+ * partrel Partitioned table Relation; points to
+ * estate->es_relations[rti-1] where rti is
+ * the table's RT index.
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
@@ -51,6 +55,7 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* perform executor startup pruning.
* exec_pruning_steps List of PartitionPruneSteps used to
* perform per-scan pruning.
+ * econtext ExprContext to use for initial pruning steps
* initial_context If initial_pruning_steps isn't NIL, contains
* the details needed to execute those steps.
* exec_context If exec_pruning_steps isn't NIL, contains
@@ -58,12 +63,15 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
*/
typedef struct PartitionedRelPruningData
{
+ EState *estate;
+ Relation partrel;
int nparts;
int *subplan_map;
int *subpart_map;
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
+ ExprContext *econtext;
PartitionPruneContext initial_context;
PartitionPruneContext exec_context;
} PartitionedRelPruningData;
@@ -105,6 +113,8 @@ typedef struct PartitionPruningData
* startup (at any hierarchy level).
* do_exec_prune true if pruning should be performed during
* executor run (at any hierarchy level).
+ * parent_plan Parent plan node's PlanState used to initialize exec
+ * pruning contexts
* num_partprunedata Number of items in "partprunedata" array.
* partprunedata Array of PartitionPruningData pointers for the plan's
* partitioned relation(s), one for each partitioning
@@ -117,6 +127,7 @@ typedef struct PartitionPruneState
MemoryContext prune_context;
bool do_initial_prune;
bool do_exec_prune;
+ PlanState *parent_plan;
int num_partprunedata;
PartitionPruningData *partprunedata[FLEXIBLE_ARRAY_MEMBER];
} PartitionPruneState;
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index c536a1fe19..b7f48eefcc 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -26,6 +26,7 @@ struct RelOptInfo;
* Stores information needed at runtime for pruning computations
* related to a single partitioned table.
*
+ * is_valid Has the information in this struct been initialized?
* strategy Partition strategy, e.g. LIST, RANGE, HASH.
* partnatts Number of columns in the partition key.
* nparts Number of partitions in this partitioned table.
@@ -48,6 +49,7 @@ struct RelOptInfo;
*/
typedef struct PartitionPruneContext
{
+ bool is_valid;
char strategy;
int partnatts;
int nparts;
--
2.43.0
v56-0001-Move-PartitionPruneInfo-out-of-plan-nodes-into-P.patchapplication/octet-stream; name=v56-0001-Move-PartitionPruneInfo-out-of-plan-nodes-into-P.patchDownload
From bfc250e76a13546e71e0ea5d95675065075aee42 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Fri, 6 Sep 2024 13:11:05 +0900
Subject: [PATCH v56 1/5] Move PartitionPruneInfo out of plan nodes into
PlannedStmt
This change moves PartitionPruneInfo from individual plan nodes to
PlannedStmt, allowing runtime initial pruning to be performed across
the entire plan tree without traversing the tree to find nodes
containing PartitionPruneInfos.
The PartitionPruneInfo pointer fields in Append and MergeAppend nodes
have been replaced with an integer index that points to
PartitionPruneInfos in a list within PlannedStmt, which holds the
PartitionPruneInfos for all subqueries.
Reviewed-by: Alvaro Herrera
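
For illustration only (not part of the patch), a standalone C sketch of
the refactoring pattern: per-node pointers become indexes into one flat
list kept on the top-level statement, so all entries can be visited
without walking the plan tree. All names below are made up:

    #include <stdio.h>

    #define MAX_INFOS 8

    typedef struct PruneInfo
    {
        int     root_relid;
    } PruneInfo;

    typedef struct Stmt
    {
        PruneInfo  *infos[MAX_INFOS];   /* flat list for the whole plan */
        int         ninfos;
    } Stmt;

    typedef struct AppendNode
    {
        int     prune_index;            /* index into Stmt.infos, or -1 */
    } AppendNode;

    /* Register centrally and return the index, instead of storing a
     * pointer in the plan node. */
    static int
    register_info(Stmt *stmt, PruneInfo *info)
    {
        stmt->infos[stmt->ninfos] = info;
        return stmt->ninfos++;
    }

    int
    main(void)
    {
        Stmt        stmt = {0};
        PruneInfo   pi = {.root_relid = 42};
        AppendNode  append = {.prune_index = register_info(&stmt, &pi)};

        /* Executor-side lookup by index. */
        if (append.prune_index >= 0)
            printf("prune info for relid %d\n",
                   stmt.infos[append.prune_index]->root_relid);

        /* The flat list can be scanned without any tree traversal. */
        for (int i = 0; i < stmt.ninfos; i++)
            printf("info %d: relid %d\n", i, stmt.infos[i]->root_relid);
        return 0;
    }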
---
src/backend/executor/execMain.c | 1 +
src/backend/executor/execParallel.c | 1 +
src/backend/executor/execPartition.c | 19 +++++-
src/backend/executor/execUtils.c | 1 +
src/backend/executor/nodeAppend.c | 5 +-
src/backend/executor/nodeMergeAppend.c | 5 +-
src/backend/optimizer/plan/createplan.c | 24 +++----
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 86 ++++++++++++++++---------
src/backend/partitioning/partprune.c | 19 ++++--
src/include/executor/execPartition.h | 4 +-
src/include/nodes/execnodes.h | 1 +
src/include/nodes/pathnodes.h | 6 ++
src/include/nodes/plannodes.h | 14 ++--
src/include/partitioning/partprune.h | 8 +--
15 files changed, 133 insertions(+), 62 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 713cf3e802..f263232c67 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -860,6 +860,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
estate->es_plannedstmt = plannedstmt;
+ estate->es_part_prune_infos = plannedstmt->partPruneInfos;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index bfb3419efb..b01a2fdfdd 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -181,6 +181,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
pstmt->planTree = plan;
+ pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
pstmt->permInfos = estate->es_rteperminfos;
pstmt->resultRelations = NIL;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 7651886229..ec730674f2 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1786,6 +1786,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* Initialize data structure needed for run-time partition pruning and
* do initial pruning if needed
*
+ * 'root_parent_relids' identifies the relation to which both the parent plan
+ * and the PartitionPruneInfo given by 'part_prune_index' belong.
+ *
* On return, *initially_valid_subplans is assigned the set of indexes of
* child subplans that must be initialized along with the parent plan node.
* Initial pruning is performed here if needed and in that case only the
@@ -1798,11 +1801,25 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
PartitionPruneState *
ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
+ Bitmapset *root_parent_relids,
Bitmapset **initially_valid_subplans)
{
PartitionPruneState *prunestate;
EState *estate = planstate->state;
+ PartitionPruneInfo *pruneinfo;
+
+ /* Obtain the pruneinfo we need, and make sure it's the right one */
+ pruneinfo = list_nth_node(PartitionPruneInfo, estate->es_part_prune_infos,
+ part_prune_index);
+ if (!bms_equal(root_parent_relids, pruneinfo->root_parent_relids))
+ ereport(ERROR,
+ errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg_internal("mismatching PartitionPruneInfo found at part_prune_index %d",
+ part_prune_index),
+ errdetail_internal("plan node relids %s, pruneinfo relids %s",
+ bmsToString(root_parent_relids),
+ bmsToString(pruneinfo->root_parent_relids)));
/* We may need an expression context to evaluate partition exprs */
ExecAssignExprContext(estate, planstate);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 5737f9f4eb..67734979b0 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -118,6 +118,7 @@ CreateExecutorState(void)
estate->es_rowmarks = NULL;
estate->es_rteperminfos = NIL;
estate->es_plannedstmt = NULL;
+ estate->es_part_prune_infos = NIL;
estate->es_junkFilter = NULL;
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index ca0f54d676..de7ebab5c2 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -134,7 +134,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_begun = false;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -145,7 +145,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&appendstate->ps,
list_length(node->appendplans),
- node->part_prune_info,
+ node->part_prune_index,
+ node->apprelids,
&validsubplans);
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index e1b9b984a7..3ed91808dd 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -82,7 +82,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.ExecProcNode = ExecMergeAppend;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -93,7 +93,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&mergestate->ps,
list_length(node->mergeplans),
- node->part_prune_info,
+ node->part_prune_index,
+ node->apprelids,
&validsubplans);
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index bb45ef318f..6642d09a39 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1225,7 +1225,6 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
ListCell *subpaths;
int nasyncplans = 0;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
int nodenumsortkeys = 0;
AttrNumber *nodeSortColIdx = NULL;
Oid *nodeSortOperators = NULL;
@@ -1376,6 +1375,9 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ plan->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1399,16 +1401,14 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
}
if (prunequal != NIL)
- partpruneinfo =
- make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ plan->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
plan->appendplans = subplans;
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
- plan->part_prune_info = partpruneinfo;
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -1447,7 +1447,6 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
List *subplans = NIL;
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1540,6 +1539,9 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ node->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1555,13 +1557,13 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
Assert(best_path->path.param_info == NULL);
if (prunequal != NIL)
- partpruneinfo = make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ node->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
node->mergeplans = subplans;
- node->part_prune_info = partpruneinfo;
+
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d92d43a17e..8cffa447fd 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -553,6 +553,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->dependsOnRole = glob->dependsOnRole;
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
+ result->partPruneInfos = glob->partPruneInfos;
result->rtable = glob->finalrtable;
result->permInfos = glob->finalrteperminfos;
result->resultRelations = glob->resultRelations;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 91c7c4fe2f..e2ea406c4e 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1732,6 +1732,48 @@ set_customscan_references(PlannerInfo *root,
cscan->custom_relids = offset_relid_set(cscan->custom_relids, rtoffset);
}
+/*
+ * register_partpruneinfo
+ * Subroutine for set_append_references and set_mergeappend_references
+ *
+ * Add the PartitionPruneInfo from root->partPruneInfos at the given index
+ * into PlannerGlobal->partPruneInfos and return its index there.
+ *
+ * Also update the RT indexes present in PartitionedRelPruneInfos to add the
+ * offset.
+ */
+static int
+register_partpruneinfo(PlannerInfo *root, int part_prune_index, int rtoffset)
+{
+ PlannerGlobal *glob = root->glob;
+ PartitionPruneInfo *pinfo;
+ ListCell *l;
+
+ Assert(part_prune_index >= 0 &&
+ part_prune_index < list_length(root->partPruneInfos));
+ pinfo = list_nth_node(PartitionPruneInfo, root->partPruneInfos,
+ part_prune_index);
+
+ pinfo->root_parent_relids = offset_relid_set(pinfo->root_parent_relids,
+ rtoffset);
+ foreach(l, pinfo->prune_infos)
+ {
+ List *prune_infos = lfirst(l);
+ ListCell *l2;
+
+ foreach(l2, prune_infos)
+ {
+ PartitionedRelPruneInfo *prelinfo = lfirst(l2);
+
+ prelinfo->rtindex += rtoffset;
+ }
+ }
+
+ glob->partPruneInfos = lappend(glob->partPruneInfos, pinfo);
+
+ return list_length(glob->partPruneInfos) - 1;
+}
+
/*
* set_append_references
* Do set_plan_references processing on an Append
@@ -1784,21 +1826,13 @@ set_append_references(PlannerInfo *root,
aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
- if (aplan->part_prune_info)
- {
- foreach(l, aplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * Add PartitionPruneInfo, if any, to PlannerGlobal and update the index.
+ * Also update the RT indexes present in it to add the offset.
+ */
+ if (aplan->part_prune_index >= 0)
+ aplan->part_prune_index =
+ register_partpruneinfo(root, aplan->part_prune_index, rtoffset);
/* We don't need to recurse to lefttree or righttree ... */
Assert(aplan->plan.lefttree == NULL);
@@ -1860,21 +1894,13 @@ set_mergeappend_references(PlannerInfo *root,
mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
- if (mplan->part_prune_info)
- {
- foreach(l, mplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * Add PartitionPruneInfo, if any, to PlannerGlobal and update the index.
+ * Also update the RT indexes present in it to add the offset.
+ */
+ if (mplan->part_prune_index >= 0)
+ mplan->part_prune_index =
+ register_partpruneinfo(root, mplan->part_prune_index, rtoffset);
/* We don't need to recurse to lefttree or righttree ... */
Assert(mplan->plan.lefttree == NULL);
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 9a1a7faac7..60fabb1734 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -207,16 +207,20 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
/*
* make_partition_pruneinfo
- * Builds a PartitionPruneInfo which can be used in the executor to allow
- * additional partition pruning to take place. Returns NULL when
- * partition pruning would be useless.
+ * Checks if the given set of quals can be used to build pruning steps
+ * that the executor can use to prune away unneeded partitions. If
+ * suitable quals are found then a PartitionPruneInfo is built and tagged
+ * onto the PlannerInfo's partPruneInfos list.
+ *
+ * The return value is the 0-based index of the item added to the
+ * partPruneInfos list or -1 if nothing was added.
*
* 'parentrel' is the RelOptInfo for an appendrel, and 'subpaths' is the list
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
*/
-PartitionPruneInfo *
+int
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
List *prunequal)
@@ -330,10 +334,11 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* quals, then we can just not bother with run-time pruning.
*/
if (prunerelinfos == NIL)
- return NULL;
+ return -1;
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
+ pruneinfo->root_parent_relids = parentrel->relids;
pruneinfo->prune_infos = prunerelinfos;
/*
@@ -356,7 +361,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
else
pruneinfo->other_subplans = NULL;
- return pruneinfo;
+ root->partPruneInfos = lappend(root->partPruneInfos, pruneinfo);
+
+ return list_length(root->partPruneInfos) - 1;
}
/*
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index c09bc83b2a..12aacc84ff 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -123,9 +123,9 @@ typedef struct PartitionPruneState
extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
+ Bitmapset *root_parent_relids,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
bool initial_prune);
-
#endif /* EXECPARTITION_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 88467977f8..22b928e085 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -636,6 +636,7 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 07e2415398..8d30b6e896 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -128,6 +128,9 @@ typedef struct PlannerGlobal
/* "flat" list of AppendRelInfos */
List *appendRelations;
+ /* List of PartitionPruneInfo contained in the plan */
+ List *partPruneInfos;
+
/* OIDs of relations the plan depends on */
List *relationOids;
@@ -559,6 +562,9 @@ struct PlannerInfo
/* Does this query modify any partition key columns? */
bool partColsUpdated;
+
+ /* PartitionPruneInfos added in this query's plan. */
+ List *partPruneInfos;
};
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 62cd6a6666..39d0281c23 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -69,6 +69,9 @@ typedef struct PlannedStmt
struct Plan *planTree; /* tree of Plan nodes */
+ List *partPruneInfos; /* List of PartitionPruneInfo contained in the
+ * plan */
+
List *rtable; /* list of RangeTblEntry nodes */
List *permInfos; /* list of RTEPermissionInfo nodes for rtable
@@ -276,8 +279,8 @@ typedef struct Append
*/
int first_partial_plan;
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} Append;
/* ----------------
@@ -311,8 +314,8 @@ typedef struct MergeAppend
/* NULLS FIRST/LAST directions */
bool *nullsFirst pg_node_attr(array_size(numCols));
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} MergeAppend;
/* ----------------
@@ -1414,6 +1417,8 @@ typedef struct PlanRowMark
* Then, since an Append-type node could have multiple partitioning
* hierarchies among its children, we have an unordered List of those Lists.
*
+ * root_parent_relids RelOptInfo.relids of the relation to which the parent
+ * plan node and this PartitionPruneInfo node belong
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
@@ -1426,6 +1431,7 @@ typedef struct PartitionPruneInfo
pg_node_attr(no_equal, no_query_jumble)
NodeTag type;
+ Bitmapset *root_parent_relids;
List *prune_infos;
Bitmapset *other_subplans;
} PartitionPruneInfo;
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index bd490d154f..c536a1fe19 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -70,10 +70,10 @@ typedef struct PartitionPruneContext
#define PruneCxtStateIdx(partnatts, step_id, keyno) \
((partnatts) * (step_id) + (keyno))
-extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
- struct RelOptInfo *parentrel,
- List *subpaths,
- List *prunequal);
+extern int make_partition_pruneinfo(struct PlannerInfo *root,
+ struct RelOptInfo *parentrel,
+ List *subpaths,
+ List *prunequal);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
--
2.43.0
v56-0004-Defer-locking-of-runtime-prunable-relations-to-e.patchapplication/octet-stream; name=v56-0004-Defer-locking-of-runtime-prunable-relations-to-e.patchDownload
From c6bc55ad693ea5a52936e4a0e1e105036efc8d02 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Wed, 18 Sep 2024 12:00:41 +0900
Subject: [PATCH v56 4/5] Defer locking of runtime-prunable relations to
executor
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
When preparing a cached plan for execution, plancache.c locks the
relations in the plan's range table to ensure they are safe for
execution. However, this approach, implemented in
AcquireExecutorLocks(), results in unnecessarily locking relations
that might be pruned during "initial" runtime pruning.
To optimize this, locking is now deferred for relations subject to
"initial" runtime pruning. The planner now provides a set of
"unprunable" relations through the new PlannedStmt.unprunableRelids
field. AcquireExecutorLocks() will only lock these unprunable
relations. PlannedStmt.unprunableRelids is populated by subtracting the
set of initially prunable relids from the set of all RT indexes. The prunable
relids are identified by examining all PartitionPruneInfos during
set_plan_refs() and storing the RT indexes of partitions subject to
"initial" pruning steps. While at it, some duplicated code in
set_append_references() and set_mergeappend_references() that
constructs the prunable relids set has been refactored into a common
function.
Deferred locks are taken, if necessary, after ExecDoInitialPruning()
determines the set of unpruned partitions. To allow the executor to
determine whether the plan tree it’s executing is cached and may
contain unlocked relations, the CachedPlan is now made available via
the QueryDesc. The executor can call CachedPlanRequiresLocking(),
which returns true if the CachedPlan is a reusable generic plan that
might contain unlocked relations.
Plan nodes like Append have already been updated to consider only the
set of unpruned relations. However, there are cases, such as child
RowMarks and child result relations, where the code manipulating those
does not directly receive information about unpruned partitions.
Therefore, code handling child RowMarks and result relations has been
modified to ensure they don’t belong to pruned partitions. For this,
the RT indexes of unpruned partitions are added in
ExecDoInitialPruning() to es_unprunable_relids, which initially
contains PlannedStmt.unprunableRelids. The corresponding code now
processes only those child RowMarks and result relations whose owning
relations are in this set. For result relations managed by a
ModifyTable node, its resultRelations list and other lists that
parallel it (withCheckOptionLists, returningLists, and
updateColnosLists) are truncated in ExecInitModifyTable to only
consider unpruned relations and the ResultRelInfo structs are created
only for those.
Finally, an Assert has also been added in ExecCheckPermissions() to
ensure that all relations whose permissions are checked have been
properly locked, helping to catch any accidental omission of relations
from the unprunableRelids set that should have their permissions
checked.
This deferment introduces a window where prunable relations may be
altered by concurrent DDL, potentially causing the plan to become
invalid. Consequently, the executor might attempt to execute an
invalid plan, leading to errors such as failing to locate the index
of an unpruned partition that may have been dropped concurrently
during ExecInitIndexScan() (if it's partition-local, not inherited,
for example). Future commits will introduce changes to enable the
executor to check plan validity during ExecutorStart() and retry with
a newly created plan if the original becomes invalid after taking
deferred locks.
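
To make the intended control flow concrete, here is a standalone C
sketch of the two-phase locking scheme described above; it is
illustrative only, with plain arrays standing in for Bitmapsets and
made-up names throughout:

    #include <stdbool.h>
    #include <stdio.h>

    #define NRELS 6

    static void
    lock_rel(int rti)
    {
        printf("locking rti %d\n", rti);
    }

    /* Stand-in for initial pruning: keep even-numbered partitions. */
    static bool
    survives_pruning(int rti)
    {
        return rti % 2 == 0;
    }

    int
    main(void)
    {
        /* rtis 0 and 3 are not subject to initial pruning. */
        bool    unprunable[NRELS] = {true, false, false, true, false, false};

        /* Phase 1: lock only the statically unprunable relations,
         * which is all that plancache locking would do up front. */
        for (int rti = 0; rti < NRELS; rti++)
            if (unprunable[rti])
                lock_rel(rti);

        /* Phase 2: once initial pruning has picked the survivors, lock
         * those too (needed only when running a reusable cached plan). */
        for (int rti = 0; rti < NRELS; rti++)
            if (!unprunable[rti] && survives_pruning(rti))
                lock_rel(rti);
        return 0;
    }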
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 1 +
src/backend/commands/matview.c | 2 +-
src/backend/commands/prepare.c | 3 +-
src/backend/executor/execMain.c | 75 +++++++++++++++++-
src/backend/executor/execParallel.c | 9 ++-
src/backend/executor/execPartition.c | 36 +++++++--
src/backend/executor/functions.c | 1 +
src/backend/executor/nodeAppend.c | 8 +-
src/backend/executor/nodeLockRows.c | 10 ++-
src/backend/executor/nodeMergeAppend.c | 2 +-
src/backend/executor/nodeModifyTable.c | 78 ++++++++++++++++---
src/backend/executor/spi.c | 1 +
src/backend/optimizer/plan/planner.c | 2 +
src/backend/optimizer/plan/setrefs.c | 7 ++
src/backend/partitioning/partprune.c | 18 +++++
src/backend/tcop/pquery.c | 10 ++-
src/backend/utils/cache/plancache.c | 40 ++++++----
src/include/commands/explain.h | 5 +-
src/include/executor/execPartition.h | 5 +-
src/include/executor/execdesc.h | 2 +
src/include/nodes/execnodes.h | 12 +++
src/include/nodes/pathnodes.h | 6 ++
src/include/nodes/plannodes.h | 7 ++
src/include/utils/plancache.h | 10 +++
src/test/regress/expected/partition_prune.out | 44 +++++++++++
src/test/regress/sql/partition_prune.sql | 18 +++++
29 files changed, 366 insertions(+), 57 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 91de442f43..db976f928a 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -552,7 +552,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 0b629b1f79..57a3375cad 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -324,7 +324,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index aaec439892..49f7370734 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -509,7 +509,7 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
}
@@ -617,7 +617,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage,
@@ -673,7 +674,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, cplan, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index fab59ad5f6..bd169edeff 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -742,6 +742,7 @@ execute_sql_string(const char *sql)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 010097873d..69be74b4bd 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -438,7 +438,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 07257d4db9..311b9ebd5b 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -655,7 +655,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
+ ExplainOnePlan(pstmt, cplan, into, es, query_string, paramLI,
+ queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
else
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 5222aa9ab3..2c14ee2b6b 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -54,6 +54,7 @@
#include "nodes/queryjumble.h"
#include "parser/parse_relation.h"
#include "rewrite/rewriteHandler.h"
+#include "storage/lmgr.h"
#include "tcop/utility.h"
#include "utils/acl.h"
#include "utils/backend_status.h"
@@ -91,6 +92,7 @@ static bool ExecCheckPermissionsModified(Oid relOid, Oid userid,
AclMode requiredPerms);
static void ExecCheckXactReadOnly(PlannedStmt *plannedstmt);
static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
+static inline bool ExecShouldLockRelations(EState *estate);
/* end of local decls */
@@ -610,6 +612,21 @@ ExecCheckPermissions(List *rangeTable, List *rteperminfos,
(rte->rtekind == RTE_SUBQUERY &&
rte->relkind == RELKIND_VIEW));
+ /*
+ * Ensure that we have at least an AccessShareLock on relations
+ * whose permissions need to be checked.
+ *
+ * Skip this check in a parallel worker because locks won't be
+ * taken until ExecInitNode() performs plan initialization.
+ *
+ * XXX: ExecCheckPermissions() in a parallel worker may be
+ * redundant with the checks done in the leader process, so this
+ * should be reviewed to ensure it’s necessary.
+ */
+ Assert(IsParallelWorker() ||
+ CheckRelationOidLockedByMe(rte->relid, AccessShareLock,
+ true));
+
(void) getRTEPermissionInfo(rteperminfos, rte);
/* Many-to-one mapping not allowed */
Assert(!bms_is_member(rte->perminfoindex, indexset));
@@ -872,12 +889,46 @@ ExecDoInitialPruning(EState *estate)
* result.
*/
if (prunestate->do_initial_prune)
- validsubplans = ExecFindMatchingSubPlans(prunestate, true);
+ {
+ Bitmapset *validsubplan_rtis = NULL;
+
+ validsubplans = ExecFindMatchingSubPlans(prunestate, true,
+ &validsubplan_rtis);
+ if (ExecShouldLockRelations(estate))
+ {
+ int rtindex;
+
+ rtindex = -1;
+ while ((rtindex = bms_next_member(validsubplan_rtis,
+ rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(rtindex, estate);
+
+ Assert(rte->rtekind == RTE_RELATION &&
+ rte->rellockmode != NoLock);
+ LockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
+ estate->es_unprunable_relids = bms_add_members(estate->es_unprunable_relids,
+ validsubplan_rtis);
+ }
+
estate->es_part_prune_results = lappend(estate->es_part_prune_results,
validsubplans);
}
}
+/*
+ * Locks need to be taken only when running a cached plan that might
+ * contain unlocked relations, such as a reused generic plan.
+ */
+static inline bool
+ExecShouldLockRelations(EState *estate)
+{
+ return estate->es_cachedplan == NULL ? false :
+ CachedPlanRequiresLocking(estate->es_cachedplan);
+}
+
/* ----------------------------------------------------------------
* InitPlan
*
@@ -890,6 +941,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ CachedPlan *cachedplan = queryDesc->cplan;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -909,10 +961,13 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
estate->es_plannedstmt = plannedstmt;
+ estate->es_cachedplan = cachedplan;
+ estate->es_unprunable_relids = bms_copy(plannedstmt->unprunableRelids);
/*
* Perform runtime "initial" pruning to determine the plan nodes that will
- * not be executed.
+ * not be executed. This will also add the RT indexes of surviving leaf
+ * partitions to es_unprunable_relids.
*/
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
ExecDoInitialPruning(estate);
@@ -931,8 +986,13 @@ InitPlan(QueryDesc *queryDesc, int eflags)
Relation relation;
ExecRowMark *erm;
- /* ignore "parent" rowmarks; they are irrelevant at runtime */
- if (rc->isParent)
+ /*
+ * Ignore "parent" rowmarks, because they are irrelevant at
+ * runtime. Also ignore the rowmarks belonging to child tables
+ * that have been pruned in ExecDoInitialPruning().
+ */
+ if (rc->isParent ||
+ !bms_is_member(rc->rti, estate->es_unprunable_relids))
continue;
/* get relation's OID (will produce InvalidOid if subquery) */
@@ -2969,6 +3029,13 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
}
}
+ /*
+ * Copy es_unprunable_relids so that RowMarks of pruned relations are
+ * ignored in ExecInitLockRows() and ExecInitModifyTable() when
+ * initializing the plan trees below.
+ */
+ rcestate->es_unprunable_relids = parentestate->es_unprunable_relids;
+
/*
* Initialize private state information for each SubPlan. We must do this
* before running ExecInitNode on the main query tree, since
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index b01a2fdfdd..7519c9a860 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1257,8 +1257,15 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
- /* Create a QueryDesc for the query. */
+ /*
+ * Create a QueryDesc for the query. We pass NULL for cachedplan, because
+ * we don't have a pointer to the CachedPlan in the leader's process. It's
+ * fine because the only reason the executor needs to see it is to decide
+ * if it should take locks on certain relations, but parallel workers
+ * always take locks anyway.
+ */
return CreateQueryDesc(pstmt,
+ NULL,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 40eb74d187..13d2542c48 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -26,6 +26,7 @@
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
#include "rewrite/rewriteManip.h"
+#include "storage/lmgr.h"
#include "utils/acl.h"
#include "utils/lsyscache.h"
#include "utils/partcache.h"
@@ -192,7 +193,8 @@ static void find_matching_subplans_recurse(PlanState *parent_plan,
PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans);
+ Bitmapset **validsubplans,
+ Bitmapset **validsubplan_rtis);
/*
@@ -1985,8 +1987,8 @@ ExecCreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
* The set of partitions that exist now might not be the same that
* existed when the plan was made. The normal case is that it is;
* optimize for that case with a quick comparison, and just copy
- * the subplan_map and make subpart_map point to the one in
- * PruneInfo.
+ * the subplan_map and make subpart_map and rti_map point to
+ * the ones in PruneInfo.
*
* For the case where they aren't identical, we could have more
* partitions on either side; or even exactly the same number of
@@ -2005,6 +2007,7 @@ ExecCreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
sizeof(int) * partdesc->nparts) == 0)
{
pprune->subpart_map = pinfo->subpart_map;
+ pprune->rti_map = pinfo->rti_map;
memcpy(pprune->subplan_map, pinfo->subplan_map,
sizeof(int) * pinfo->nparts);
}
@@ -2025,6 +2028,7 @@ ExecCreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
* mismatches.
*/
pprune->subpart_map = palloc(sizeof(int) * partdesc->nparts);
+ pprune->rti_map = palloc(sizeof(int) * partdesc->nparts);
for (pp_idx = 0; pp_idx < partdesc->nparts; pp_idx++)
{
@@ -2042,6 +2046,8 @@ ExecCreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
pinfo->subplan_map[pd_idx];
pprune->subpart_map[pp_idx] =
pinfo->subpart_map[pd_idx];
+ pprune->rti_map[pp_idx] =
+ pinfo->rti_map[pd_idx];
pd_idx++;
continue;
}
@@ -2079,6 +2085,7 @@ ExecCreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
pprune->subpart_map[pp_idx] = -1;
pprune->subplan_map[pp_idx] = -1;
+ pprune->rti_map[pp_idx] = 0;
}
}
@@ -2360,10 +2367,13 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
* runtime pruning.
+ *
+ * validsubplan_rtis must be non-NULL if initial_prune is true.
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ Bitmapset **validsubplan_rtis)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2399,7 +2409,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
pprune = &prunedata->partrelprunedata[0];
find_matching_subplans_recurse(prunestate->parent_plan,
prunedata, pprune, initial_prune,
- &result);
+ &result, validsubplan_rtis);
/* Expression eval may have used space in ExprContext too */
if (pprune->exec_context.is_valid)
@@ -2416,6 +2426,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (validsubplan_rtis)
+ *validsubplan_rtis = bms_copy(*validsubplan_rtis);
MemoryContextReset(prunestate->prune_context);
@@ -2426,14 +2438,16 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* find_matching_subplans_recurse
* Recursive worker function for ExecFindMatchingSubPlans
*
- * Adds valid (non-prunable) subplan IDs to *validsubplans
+ * Adds valid (non-prunable) subplan IDs to *validsubplans and the RT indexes
+ * of their owning leaf partitions to *validsubplan_rtis if it's non-NULL.
*/
static void
find_matching_subplans_recurse(PlanState *parent_plan,
PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans)
+ Bitmapset **validsubplans,
+ Bitmapset **validsubplan_rtis)
{
Bitmapset *partset;
int i;
@@ -2476,8 +2490,13 @@ find_matching_subplans_recurse(PlanState *parent_plan,
while ((i = bms_next_member(partset, i)) >= 0)
{
if (pprune->subplan_map[i] >= 0)
+ {
*validsubplans = bms_add_member(*validsubplans,
pprune->subplan_map[i]);
+ if (validsubplan_rtis)
+ *validsubplan_rtis = bms_add_member(*validsubplan_rtis,
+ pprune->rti_map[i]);
+ }
else
{
int partidx = pprune->subpart_map[i];
@@ -2486,7 +2505,8 @@ find_matching_subplans_recurse(PlanState *parent_plan,
find_matching_subplans_recurse(parent_plan,
prunedata,
&prunedata->partrelprunedata[partidx],
- initial_prune, validsubplans);
+ initial_prune, validsubplans,
+ validsubplan_rtis);
else
{
/*
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 692854e2b3..6f6f45e0ad 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -840,6 +840,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest = None_Receiver;
es->qd = CreateQueryDesc(es->stmt,
+ NULL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index de7ebab5c2..006bdafaea 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -581,7 +581,7 @@ choose_next_subplan_locally(AppendState *node)
else if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
node->as_valid_subplans_identified = true;
}
@@ -648,7 +648,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
node->as_valid_subplans_identified = true;
/*
@@ -724,7 +724,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
node->as_valid_subplans_identified = true;
mark_invalid_subplans_as_finished(node);
@@ -877,7 +877,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
node->as_valid_subplans_identified = true;
classify_matching_subplans(node);
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 41754ddfea..b5b2cd53c5 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -28,6 +28,7 @@
#include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "utils/rel.h"
+#include "utils/lsyscache.h"
/* ----------------------------------------------------------------
@@ -347,8 +348,13 @@ ExecInitLockRows(LockRows *node, EState *estate, int eflags)
ExecRowMark *erm;
ExecAuxRowMark *aerm;
- /* ignore "parent" rowmarks; they are irrelevant at runtime */
- if (rc->isParent)
+ /*
+ * Ignore "parent" rowmarks, because they are irrelevant at
+ * runtime. Also ignore the rowmarks belonging to child tables
+ * that have been pruned in ExecDoInitialPruning().
+ */
+ if (rc->isParent ||
+ !bms_is_member(rc->rti, estate->es_unprunable_relids))
continue;
/* find ExecRowMark and build ExecAuxRowMark */
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 3ed91808dd..f7821aa178 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -219,7 +219,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 8bf4c80d4a..652d70223c 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -636,7 +636,7 @@ ExecInitUpdateProjection(ModifyTableState *mtstate,
Assert(whichrel >= 0 && whichrel < mtstate->mt_nrels);
}
- updateColnos = (List *) list_nth(node->updateColnosLists, whichrel);
+ updateColnos = (List *) list_nth(mtstate->mt_updateColnosLists, whichrel);
/*
* For UPDATE, we use the old tuple to fill up missing values in the tuple
@@ -4176,12 +4176,17 @@ ExecLookupResultRelByOid(ModifyTableState *node, Oid resultoid,
hash_search(node->mt_resultOidHash, &resultoid, HASH_FIND, NULL);
if (mtlookup)
{
+ ResultRelInfo *resultRelInfo;
+
if (update_cache)
{
node->mt_lastResultOid = resultoid;
node->mt_lastResultIndex = mtlookup->relationIndex;
}
- return node->resultRelInfo + mtlookup->relationIndex;
+
+ resultRelInfo = node->resultRelInfo + mtlookup->relationIndex;
+
+ return resultRelInfo;
}
}
else
@@ -4218,7 +4223,11 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
ModifyTableState *mtstate;
Plan *subplan = outerPlan(node);
CmdType operation = node->operation;
- int nrels = list_length(node->resultRelations);
+ int nrels;
+ List *resultRelations = NIL;
+ List *withCheckOptionLists = NIL;
+ List *returningLists = NIL;
+ List *updateColnosLists = NIL;
ResultRelInfo *resultRelInfo;
List *arowmarks;
ListCell *l;
@@ -4228,6 +4237,46 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
/* check for unsupported flags */
Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
+ /*
+ * Only consider unpruned relations. In the future, it might be more
+ * efficient to store resultRelations as a bitmapset, which would make
+ * this operation cheaper.
+ */
+ i = 0;
+ foreach(l, node->resultRelations)
+ {
+ Index rti = lfirst_int(l);
+
+ if (bms_is_member(rti, estate->es_unprunable_relids))
+ {
+ resultRelations = lappend_int(resultRelations, rti);
+ if (node->withCheckOptionLists)
+ {
+ List *withCheckOptions = list_nth_node(List,
+ node->withCheckOptionLists,
+ i);
+
+ withCheckOptionLists = lappend(withCheckOptionLists, withCheckOptions);
+ }
+ if (node->returningLists)
+ {
+ List *returningList = list_nth_node(List,
+ node->returningLists,
+ i);
+
+ returningLists = lappend(returningLists, returningList);
+ }
+ if (node->updateColnosLists)
+ {
+ List *updateColnosList = list_nth(node->updateColnosLists, i);
+
+ updateColnosLists = lappend(updateColnosLists, updateColnosList);
+ }
+ }
+ i++;
+ }
+ nrels = list_length(resultRelations);
+
/*
* create state structure
*/
@@ -4248,6 +4297,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
mtstate->mt_merge_inserted = 0;
mtstate->mt_merge_updated = 0;
mtstate->mt_merge_deleted = 0;
+ mtstate->mt_updateColnosLists = updateColnosLists;
/*----------
* Resolve the target relation. This is the same as:
@@ -4265,6 +4315,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
*/
if (node->rootRelation > 0)
{
+ Assert(bms_is_member(node->rootRelation, estate->es_unprunable_relids));
mtstate->rootResultRelInfo = makeNode(ResultRelInfo);
ExecInitResultRelation(estate, mtstate->rootResultRelInfo,
node->rootRelation);
@@ -4279,7 +4330,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
/* set up epqstate with dummy subplan data for the moment */
EvalPlanQualInit(&mtstate->mt_epqstate, estate, NULL, NIL,
- node->epqParam, node->resultRelations);
+ node->epqParam, resultRelations);
mtstate->fireBSTriggers = true;
/*
@@ -4297,7 +4348,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
*/
resultRelInfo = mtstate->resultRelInfo;
i = 0;
- foreach(l, node->resultRelations)
+ foreach(l, resultRelations)
{
Index resultRelation = lfirst_int(l);
List *mergeActions = NIL;
@@ -4441,7 +4492,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* Initialize any WITH CHECK OPTION constraints if needed.
*/
resultRelInfo = mtstate->resultRelInfo;
- foreach(l, node->withCheckOptionLists)
+ foreach(l, withCheckOptionLists)
{
List *wcoList = (List *) lfirst(l);
List *wcoExprs = NIL;
@@ -4464,7 +4515,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
/*
* Initialize RETURNING projections if needed.
*/
- if (node->returningLists)
+ if (returningLists)
{
TupleTableSlot *slot;
ExprContext *econtext;
@@ -4473,7 +4524,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* Initialize result tuple slot and assign its rowtype using the first
* RETURNING list. We assume the rest will look the same.
*/
- mtstate->ps.plan->targetlist = (List *) linitial(node->returningLists);
+ mtstate->ps.plan->targetlist = (List *) linitial(returningLists);
/* Set up a slot for the output of the RETURNING projection(s) */
ExecInitResultTupleSlotTL(&mtstate->ps, &TTSOpsVirtual);
@@ -4488,7 +4539,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* Build a projection for each result rel.
*/
resultRelInfo = mtstate->resultRelInfo;
- foreach(l, node->returningLists)
+ foreach(l, returningLists)
{
List *rlist = (List *) lfirst(l);
@@ -4589,8 +4640,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
ExecRowMark *erm;
ExecAuxRowMark *aerm;
- /* ignore "parent" rowmarks; they are irrelevant at runtime */
- if (rc->isParent)
+ /*
+ * Ignore "parent" rowmarks, because they are irrelevant at
+ * runtime. Also ignore the rowmarks belonging to child tables
+ * that have been pruned in ExecDoInitialPruning().
+ */
+ if (rc->isParent ||
+ !bms_is_member(rc->rti, estate->es_unprunable_relids))
continue;
/* Find ExecRowMark and build ExecAuxRowMark */
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 90d9834576..659bd6dcd9 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -2684,6 +2684,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
snap = InvalidSnapshot;
qdesc = CreateQueryDesc(stmt,
+ cplan,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 8cffa447fd..3b50b767df 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -555,6 +555,8 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->planTree = top_plan;
result->partPruneInfos = glob->partPruneInfos;
result->rtable = glob->finalrtable;
+ result->unprunableRelids = bms_difference(bms_add_range(NULL, 1, list_length(result->rtable)),
+ glob->prunableRelids);
result->permInfos = glob->finalrteperminfos;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index e2ea406c4e..283a61a972 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1764,8 +1764,15 @@ register_partpruneinfo(PlannerInfo *root, int part_prune_index, int rtoffset)
foreach(l2, prune_infos)
{
PartitionedRelPruneInfo *prelinfo = lfirst(l2);
+ int i;
prelinfo->rtindex += rtoffset;
+ for (i = 0; i < prelinfo->nparts; i++)
+ {
+ prelinfo->rti_map[i] += rtoffset;
+ glob->prunableRelids = bms_add_member(glob->prunableRelids,
+ prelinfo->rti_map[i]);
+ }
}
}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 60fabb1734..85894c87af 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -645,6 +645,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *subplan_map;
int *subpart_map;
Oid *relid_map;
+ int *rti_map;
/*
* Construct the subplan and subpart maps for this partitioning level.
@@ -657,6 +658,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
+ rti_map = (int *) palloc0(nparts * sizeof(int));
present_parts = NULL;
i = -1;
@@ -671,9 +673,24 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
+
+ /*
+ * Track the RT indexes of partitions to ensure they are included
+ * in the prunableRelids set of relations that are locked during
+ * execution. This ensures that if the plan is cached, these
+ * partitions are locked when the plan is reused.
+ *
+ * Partitions without a subplan and sub-partitioned partitions
+ * where none of the sub-partitions have a subplan due to
+ * constraint exclusion are not included in this set. Instead,
+ * they are added to the unprunableRelids set, and the relations
+ * in this set are locked by AcquireExecutorLocks() before
+ * executing a cached plan.
+ */
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
+ rti_map[i] = (int) partrel->relid;
/* Record finding this subplan */
subplansfound = bms_add_member(subplansfound, subplanidx);
@@ -695,6 +712,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
pinfo->relid_map = relid_map;
+ pinfo->rti_map = rti_map;
}
pfree(relid_subpart_map);
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index a1f8d03db1..6e8f6b1b8f 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -36,6 +36,7 @@ Portal ActivePortal = NULL;
static void ProcessQuery(PlannedStmt *plan,
+ CachedPlan *cplan,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +66,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +79,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->cplan = cplan; /* CachedPlan supplying the plannedstmt */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +125,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * cplan: CachedPlan supplying the plan
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +138,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ CachedPlan *cplan,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +150,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, cplan, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -493,6 +498,7 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->cplan,
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1276,6 +1282,7 @@ PortalRunMulti(Portal portal,
{
/* statement can set tag string */
ProcessQuery(pstmt,
+ portal->cplan,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1285,6 +1292,7 @@ PortalRunMulti(Portal portal,
{
/* stmt added by rewrite cannot set tag */
ProcessQuery(pstmt,
+ portal->cplan,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 5af1a168ec..5b75dadf13 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -104,7 +104,8 @@ static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
static bool CheckCachedPlan(CachedPlanSource *plansource);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv);
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ bool generic);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
@@ -815,8 +816,11 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
*
- * On a "true" return, we have acquired the locks needed to run the plan.
- * (We must do this for the "true" result to be race-condition-free.)
+ * On a "true" return, we have acquired locks on the "unprunableRelids" set
+ * for all plans in plansource->stmt_list. The plans are not completely
+ * race-condition-free until the executor takes locks on the set of prunable
+ * relations that survive initial runtime pruning during executor
+ * initialization.
*/
static bool
CheckCachedPlan(CachedPlanSource *plansource)
@@ -893,10 +897,10 @@ CheckCachedPlan(CachedPlanSource *plansource)
* or it can be set to NIL if we need to re-copy the plansource's query_list.
*
* To build a generic, parameter-value-independent plan, pass NULL for
- * boundParams. To build a custom plan, pass the actual parameter values via
- * boundParams. For best effect, the PARAM_FLAG_CONST flag should be set on
- * each parameter value; otherwise the planner will treat the value as a
- * hint rather than a hard constant.
+ * boundParams, and true for generic. To build a custom plan, pass the actual
+ * parameter values via boundParams, and false for generic. For best effect,
+ * the PARAM_FLAG_CONST flag should be set on each parameter value; otherwise
+ * the planner will treat the value as a hint rather than a hard constant.
*
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
@@ -904,7 +908,8 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv)
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ bool generic)
{
CachedPlan *plan;
List *plist;
@@ -1026,6 +1031,7 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
plan->refcount = 0;
plan->context = plan_context;
plan->is_oneshot = plansource->is_oneshot;
+ plan->is_generic = generic;
plan->is_saved = false;
plan->is_valid = true;
@@ -1196,7 +1202,7 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
else
{
/* Build a new generic plan */
- plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv, true);
/* Just make real sure plansource->gplan is clear */
ReleaseGenericPlan(plansource);
/* Link the new generic plan into the plansource */
@@ -1241,7 +1247,7 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (customplan)
{
/* Build a custom plan */
- plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv, false);
/* Accumulate total costs of custom plans */
plansource->total_custom_cost += cached_plan_cost(plan, true);
@@ -1387,8 +1393,8 @@ CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
}
/*
- * Reject if AcquireExecutorLocks would have anything to do. This is
- * probably unnecessary given the previous check, but let's be safe.
+ * Reject if there are any lockable relations. This is probably
+ * unnecessary given the previous check, but let's be safe.
*/
foreach(lc, plan->stmt_list)
{
@@ -1776,7 +1782,7 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ int rtindex;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1794,9 +1800,13 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ rtindex = -1;
+ while ((rtindex = bms_next_member(plannedstmt->unprunableRelids,
+ rtindex)) >= 0)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ RangeTblEntry *rte = list_nth_node(RangeTblEntry,
+ plannedstmt->rtable,
+ rtindex - 1);
if (!(rte->rtekind == RTE_RELATION ||
(rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 3ab0aae78f..21c71e0d53 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -103,8 +103,9 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
- ExplainState *es, const char *queryString,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
+ IntoClause *into, ExplainState *es,
+ const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
const BufferUsage *bufusage,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 8d85fa990e..599ac0318d 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -49,6 +49,7 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
+ * rti_map RT index by partition index, or 0.
* present_parts A Bitmapset of the partition indexes that we
* have subplans or subparts for.
* initial_pruning_steps List of PartitionPruneSteps used to
@@ -68,6 +69,7 @@ typedef struct PartitionedRelPruningData
int nparts;
int *subplan_map;
int *subpart_map;
+ int *rti_map pg_node_attr(array_size(nparts));
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
@@ -138,7 +140,8 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
Bitmapset *root_parent_relids,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ Bitmapset **validsubplan_rtis);
extern PartitionPruneState *ExecCreatePartitionPruneState(EState *estate,
PartitionPruneInfo *pruneinfo);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index 0a7274e26c..0e7245435d 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,7 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ CachedPlan *cplan; /* CachedPlan that supplies the plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +58,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 518a9fcd15..bd68c60a0b 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -42,6 +42,7 @@
#include "storage/condition_variable.h"
#include "utils/hsearch.h"
#include "utils/queryenvironment.h"
+#include "utils/plancache.h"
#include "utils/reltrigger.h"
#include "utils/sharedtuplestore.h"
#include "utils/snapshot.h"
@@ -636,9 +637,14 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ CachedPlan *es_cachedplan;
List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
List *es_part_prune_states; /* List of PartitionPruneState */
List *es_part_prune_results; /* List of Bitmapset */
+ Bitmapset *es_unprunable_relids; /* PlannedStmt.unprunableRelids + RT
+ * indexes of leaf partitions that
+ * survive initial pruning; see
+ * ExecDoInitialPruning() */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
@@ -1419,6 +1425,12 @@ typedef struct ModifyTableState
double mt_merge_inserted;
double mt_merge_updated;
double mt_merge_deleted;
+
+ /*
+ * List of valid updateColnosLists. Contains only those belonging to
+ * unpruned relations from ModifyTable.updateColnosLists.
+ */
+ List *mt_updateColnosLists;
} ModifyTableState;
/* ----------------
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 8d30b6e896..cc2190ea63 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -116,6 +116,12 @@ typedef struct PlannerGlobal
/* "flat" rangetable for executor */
List *finalrtable;
+ /*
+ * RT indexes of relations subject to removal from the plan due to runtime
+ * pruning at plan initialization time
+ */
+ Bitmapset *prunableRelids;
+
/* "flat" list of RTEPermissionInfos */
List *finalrteperminfos;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 39d0281c23..318e30fe2f 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -74,6 +74,10 @@ typedef struct PlannedStmt
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *unprunableRelids; /* RT indexes of relations that are not
+ * subject to runtime pruning; for
+ * AcquireExecutorLocks() */
+
List *permInfos; /* list of RTEPermissionInfo nodes for rtable
* entries needing one */
@@ -1474,6 +1478,9 @@ typedef struct PartitionedRelPruneInfo
/* subpart index by partition index, or -1 */
int *subpart_map pg_node_attr(array_size(nparts));
+ /* RT index by partition index, or 0 */
+ int *rti_map pg_node_attr(array_size(nparts));
+
/* relation OID by partition index, or 0 */
Oid *relid_map pg_node_attr(array_size(nparts));
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index a90dfdf906..0b5ee007ca 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -149,6 +149,7 @@ typedef struct CachedPlan
int magic; /* should equal CACHEDPLAN_MAGIC */
List *stmt_list; /* list of PlannedStmts */
bool is_oneshot; /* is it a "oneshot" plan? */
+ bool is_generic; /* is it a reusable generic plan? */
bool is_saved; /* is CachedPlan in a long-lived context? */
bool is_valid; /* is the stmt_list currently valid? */
Oid planRoleId; /* Role ID the plan was created for */
@@ -235,4 +236,13 @@ extern bool CachedPlanIsSimplyValid(CachedPlanSource *plansource,
extern CachedExpression *GetCachedExpression(Node *expr);
extern void FreeCachedExpression(CachedExpression *cexpr);
+/*
+ * CachedPlanRequiresLocking: should the executor acquire locks?
+ */
+static inline bool
+CachedPlanRequiresLocking(CachedPlan *cplan)
+{
+ return cplan->is_generic;
+}
+
#endif /* PLANCACHE_H */
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 7a03b4e360..705cd922fc 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -4440,3 +4440,47 @@ drop table hp_contradict_test;
drop operator class part_test_int4_ops2 using hash;
drop operator ===(int4, int4);
drop function explain_analyze(text);
+-- Runtime pruning on UPDATE using WITH CHECK OPTIONS and RETURNING
+create table part_abc (a int, b text, c bool) partition by list (a);
+create table part_abc_1 (b text, a int, c bool);
+create table part_abc_2 (a int, c bool, b text);
+alter table part_abc attach partition part_abc_1 for values in (1);
+alter table part_abc attach partition part_abc_2 for values in (2);
+insert into part_abc values (1, 'b', true);
+insert into part_abc values (2, 'c', true);
+create view part_abc_view as select * from part_abc where b <> 'a' with check option;
+prepare update_part_abc_view as update part_abc_view set b = $2 where a = $1 returning *;
+explain (costs off) execute update_part_abc_view (1, 'd');
+ QUERY PLAN
+-------------------------------------------------------
+ Update on part_abc
+ Update on part_abc_1
+ -> Append
+ Subplans Removed: 1
+ -> Seq Scan on part_abc_1
+ Filter: ((b <> 'a'::text) AND (a = $1))
+(6 rows)
+
+execute update_part_abc_view (1, 'd');
+ a | b | c
+---+---+---
+ 1 | d | t
+(1 row)
+
+explain (costs off) execute update_part_abc_view (2, 'a');
+ QUERY PLAN
+-------------------------------------------------------
+ Update on part_abc
+ Update on part_abc_2 part_abc_1
+ -> Append
+ Subplans Removed: 1
+ -> Seq Scan on part_abc_2 part_abc_1
+ Filter: ((b <> 'a'::text) AND (a = $1))
+(6 rows)
+
+execute update_part_abc_view (2, 'a');
+ERROR: new row violates check option for view "part_abc_view"
+DETAIL: Failing row contains (2, a, t).
+deallocate update_part_abc_view;
+drop view part_abc_view;
+drop table part_abc;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 442428d937..af26ad2fb2 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -1339,3 +1339,21 @@ drop operator class part_test_int4_ops2 using hash;
drop operator ===(int4, int4);
drop function explain_analyze(text);
+
+-- Runtime pruning on UPDATE using WITH CHECK OPTIONS and RETURNING
+create table part_abc (a int, b text, c bool) partition by list (a);
+create table part_abc_1 (b text, a int, c bool);
+create table part_abc_2 (a int, c bool, b text);
+alter table part_abc attach partition part_abc_1 for values in (1);
+alter table part_abc attach partition part_abc_2 for values in (2);
+insert into part_abc values (1, 'b', true);
+insert into part_abc values (2, 'c', true);
+create view part_abc_view as select * from part_abc where b <> 'a' with check option;
+prepare update_part_abc_view as update part_abc_view set b = $2 where a = $1 returning *;
+explain (costs off) execute update_part_abc_view (1, 'd');
+execute update_part_abc_view (1, 'd');
+explain (costs off) execute update_part_abc_view (2, 'a');
+execute update_part_abc_view (2, 'a');
+deallocate update_part_abc_view;
+drop view part_abc_view;
+drop table part_abc;
--
2.43.0
v56-0003-Perform-runtime-initial-pruning-outside-ExecInit.patch
From 84b875b1ca3af89a9242cdaf9bea052223f9530e Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Thu, 12 Sep 2024 15:44:43 +0900
Subject: [PATCH v56 3/5] Perform runtime initial pruning outside
ExecInitNode()
This commit follows up on the previous change that moved
PartitionPruneInfos out of individual plan nodes into a list in
PlannedStmt. It moves the initialization of PartitionPruneStates
and runtime initial pruning out of ExecInitNode() and into a new
routine, ExecDoInitialPruning(), which is called by InitPlan()
before ExecInitNode() is invoked on the main plan tree and subplans.
ExecDoInitialPruning() stores the PartitionPruneStates it creates
for initial pruning, which are reused later during exec pruning, in
a list that matches es_part_prune_infos (which holds the
PartitionPruneInfos from PlannedStmt) in length, so that both lists
can be indexed the same way. It also saves the initial pruning
result -- a bitmapset of the indexes of surviving child subnodes --
in a similarly indexed list.
---
src/backend/executor/execMain.c | 55 ++++++++++++++++++++++++++++
src/backend/executor/execPartition.c | 51 ++++++++++++++------------
src/include/executor/execPartition.h | 2 +
src/include/nodes/execnodes.h | 2 +
4 files changed, 87 insertions(+), 23 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index f263232c67..5222aa9ab3 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -46,6 +46,7 @@
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/executor.h"
+#include "executor/execPartition.h"
#include "executor/nodeSubplan.h"
#include "foreign/fdwapi.h"
#include "mb/pg_wchar.h"
@@ -828,6 +829,54 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
PreventCommandIfParallelMode(CreateCommandName((Node *) plannedstmt));
}
+/*
+ * ExecDoInitialPruning
+ * Perform runtime "initial" pruning, if necessary, to determine the set
+ * of child subnodes that need to be initialized during ExecInitNode()
+ * for plan nodes that support partition pruning.
+ *
+ * For each PartitionPruneInfo in estate->es_part_prune_infos, this function
+ * creates a PartitionPruneState (even if no initial pruning is done) and adds
+ * it to es_part_prune_states. For PartitionPruneInfo entries that include
+ * initial pruning steps, the result of those steps is saved as a bitmapset
+ * of indexes representing child subnodes that are "valid" and should be
+ * initialized for execution.
+ */
+static void
+ExecDoInitialPruning(EState *estate)
+{
+ ListCell *lc;
+
+ foreach(lc, estate->es_part_prune_infos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
+ PartitionPruneState *prunestate;
+ Bitmapset *validsubplans = NULL;
+
+ /*
+ * Create the working data structure for pruning, and save it for use
+ * later in ExecInitPartitionPruning(), which will be called by the
+ * parent plan node's ExecInit* function.
+ */
+ prunestate = ExecCreatePartitionPruneState(estate, pruneinfo);
+ estate->es_part_prune_states = lappend(estate->es_part_prune_states,
+ prunestate);
+
+ /*
+ * Perform an initial partition pruning pass, if necessary, and save
+ * the bitmapset of valid subplans for use in
+ * ExecInitPartitionPruning(). If no initial pruning is performed, we
+ * still store a NULL to ensure that es_part_prune_results is the same
+ * length as es_part_prune_infos. This ensures that
+ * ExecInitPartitionPruning() can use the same index to locate the
+ * result.
+ */
+ if (prunestate->do_initial_prune)
+ validsubplans = ExecFindMatchingSubPlans(prunestate, true);
+ estate->es_part_prune_results = lappend(estate->es_part_prune_results,
+ validsubplans);
+ }
+}
/* ----------------------------------------------------------------
* InitPlan
@@ -860,7 +909,13 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
estate->es_plannedstmt = plannedstmt;
+
+ /*
+ * Perform runtime "initial" pruning to determine the plan nodes that will
+ * not be executed.
+ */
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ ExecDoInitialPruning(estate);
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 63c3429fe7..40eb74d187 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -181,8 +181,6 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
int maxfieldlen);
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
-static PartitionPruneState *CreatePartitionPruneState(EState *estate,
- PartitionPruneInfo *pruneinfo);
static void InitPartitionPruneContext(PartitionedRelPruningData *pprune,
PartitionPruneContext *context,
List *pruning_steps,
@@ -1782,20 +1780,26 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
/*
* ExecInitPartitionPruning
- * Initialize data structure needed for run-time partition pruning and
- * do initial pruning if needed
+ * Initialize the data structures needed for runtime "exec" partition
+ * pruning and return the result of initial pruning, if available.
*
* 'root_parent_relids' identifies the relation to which both the parent plan
- * and the PartitionPruneInfo given by 'part_prune_index' belong.
+ * and the PartitionPruneInfo associated with 'part_prune_index' belong.
*
- * On return, *initially_valid_subplans is assigned the set of indexes of
- * child subplans that must be initialized along with the parent plan node.
- * Initial pruning is performed here if needed and in that case only the
- * surviving subplans' indexes are added.
+ * The PartitionPruneState would have been created by ExecDoInitialPruning()
+ * and stored as the part_prune_index'th element of EState.es_part_prune_states.
+ * Here, we initialize only the PartitionPruneContext necessary for execution
+ * pruning.
*
- * If subplans are indeed pruned, subplan_map arrays contained in the returned
- * PartitionPruneState are re-sequenced to not count those, though only if the
- * maps will be needed for subsequent execution pruning passes.
+ * On return, *initially_valid_subplans is assigned the set of indexes of child
+ * subplans that must be initialized alongside the parent plan node. Initial
+ * pruning would have been performed by ExecDoInitialPruning() if necessary,
+ * and the bitmapset of surviving subplans' indexes would have been stored as
+ * the part_prune_index'th element of EState.es_part_prune_results.
+ *
+ * If subplans are pruned, the subplan_map arrays in the returned
+ * PartitionPruneState are re-sequenced to exclude those subplans, but only if
+ * the maps will be needed for subsequent execution pruning passes.
*/
PartitionPruneState *
ExecInitPartitionPruning(PlanState *planstate,
@@ -1820,11 +1824,12 @@ ExecInitPartitionPruning(PlanState *planstate,
bmsToString(root_parent_relids),
bmsToString(pruneinfo->root_parent_relids)));
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, planstate);
-
- /* Create the working data structure for pruning */
- prunestate = CreatePartitionPruneState(estate, pruneinfo);
+ /*
+ * ExecDoInitialPruning() must have initialized the PartitionPruneState to
+ * perform the initial pruning.
+ */
+ prunestate = list_nth(estate->es_part_prune_states, part_prune_index);
+ Assert(prunestate != NULL);
/*
* Store PlanState for using it to initialize exec pruning contexts later
@@ -1833,11 +1838,11 @@ ExecInitPartitionPruning(PlanState *planstate,
if (prunestate->do_exec_prune)
prunestate->parent_plan = planstate;
- /*
- * Perform an initial partition prune pass, if required.
- */
+ /* Use the result of initial pruning done by ExecDoInitialPruning(). */
if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ *initially_valid_subplans = list_nth_node(Bitmapset,
+ estate->es_part_prune_results,
+ part_prune_index);
else
{
/* No pruning, so we'll need to initialize all subplans */
@@ -1886,8 +1891,8 @@ ExecInitPartitionPruning(PlanState *planstate,
* (which are stored in each PartitionedRelPruningData) are initialized lazily
* in find_matching_subplans_recurse() when used for the first time.
*/
-static PartitionPruneState *
-CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
+PartitionPruneState *
+ExecCreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
{
PartitionPruneState *prunestate;
int n_part_hierarchies;
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 41afb522f3..8d85fa990e 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -139,4 +139,6 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
bool initial_prune);
+extern PartitionPruneState *ExecCreatePartitionPruneState(EState *estate,
+ PartitionPruneInfo *pruneinfo);
#endif /* EXECPARTITION_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 22b928e085..518a9fcd15 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -637,6 +637,8 @@ typedef struct EState
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
+ List *es_part_prune_states; /* List of PartitionPruneState */
+ List *es_part_prune_results; /* List of Bitmapset */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
--
2.43.0
Hi Amit,
This is not a full review (sorry!) but here are a few comments.
In general, I don't have a problem with this direction. I thought
Tom's previous proposal of abandoning ExecInitNode() in medias res if
we discover that we need to replan was doable and I still think that,
but ISTM that this approach needs to touch less code, because
abandoning ExecInitNode() partway through means we could have leftover
state to clean up in any node in the PlanState tree, and as we've
discussed, ExecEndNode() isn't necessarily prepared to clean up a
PlanState tree that was only partially processed by ExecInitNode(). As
far as I can see in the time I've spent looking at this today, 0001
looks pretty unobjectionable (with some exceptions that I've noted
below). I also think 0003 looks pretty safe. It seems like partition
pruning moves backward across a pretty modest amount of code that does
pretty well-defined things. Basically, initialization-time pruning now
happens before other types of node initialization, and before setting
up row marks. I do however find the changes in 0002 to be less
obviously correct and less obviously safe; see below for some notes
about that.
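A condensed view of what 0003 buys us, using only names from the patch:
after InitPlan() calls ExecDoInitialPruning(), the pruning-related
EState lists are index-aligned, so a plan node carrying
part_prune_index can later fetch both its PartitionPruneState and its
initial-pruning result. A minimal sketch:

    /* invariant established by ExecDoInitialPruning() */
    Assert(list_length(estate->es_part_prune_states) ==
           list_length(estate->es_part_prune_infos));
    Assert(list_length(estate->es_part_prune_results) ==
           list_length(estate->es_part_prune_infos));

    /* ...so ExecInitPartitionPruning() can simply do: */
    prunestate = list_nth(estate->es_part_prune_states, part_prune_index);
    validsubplans = list_nth_node(Bitmapset,
                                  estate->es_part_prune_results,
                                  part_prune_index);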
In 0001, the name root_parent_relids doesn't seem very clear to me,
and neither does the explanation of what it does. You say
"'root_parent_relids' identifies the relation to which both the parent
plan and the PartitionPruneInfo given by 'part_prune_index' belong."
But it's a set, so what does it mean to identify "the" relation? It's
a set of relations, not just one. And why does the name include the
word "root"? It's neither the PlannerGlobal object, which we often
call root, nor is it the root of the partitioning hierarchy. To me, it
looks like it's just the set of relids that we can potentially prune.
I don't see why this isn't just called "relids", like the field from
which it's copied:
+ pruneinfo->root_parent_relids = parentrel->relids;
It just doesn't seem very root-y or very parent-y.
- node->part_prune_info = partpruneinfo;
+
Extra blank line.
In 0002, the handling of ExprContexts seems a little bit hard to
understand. Sometimes we're using the PlanState's ExprContext, and
sometimes we're using a separate context owned by the
PartitionedRelPruningData's context, and it's not exactly clear why
that is or what the consequences are. Likewise I wouldn't mind some
more comments or explanation in the commit message of the changes in
this patch related to EState objects. I can't help wondering if the
changes here could have either semantic implications (like expression
evaluation can produce different results than before) or performance
implications (because we create objects that we didn't previously
create). As noted above, this is really my only design-level concern
about 0001-0003.
Typo: partrtitioned
Regrettably, I have not looked seriously at 0004 and 0005, so I can't
comment on those.
--
Robert Haas
EDB: http://www.enterprisedb.com
Robert,
On Fri, Oct 11, 2024 at 5:15 AM Robert Haas <robertmhaas@gmail.com> wrote:
Hi Amit,
This is not a full review (sorry!) but here are a few comments.
Thank you for taking a look.
In general, I don't have a problem with this direction. I thought
Tom's previous proposal of abandoning ExecInitNode() in medias res if
we discover that we need to replan was doable and I still think that,
but ISTM that this approach needs to touch less code, because
abandoning ExecInitNode() partly through means we could have leftover
state to clean up in any node in the PlanState tree, and as we've
discussed, ExecEndNode() isn't necessarily prepared to clean up a
PlanState tree that was only partially processed by ExecInitNode().
I will say that I feel more comfortable committing and being responsible
for the refactoring I'm proposing in 0001-0003 than the changes
required to take locks during ExecInitNode(), as seen in the patches
up to version v52.
As
far as I can see in the time I've spent looking at this today, 0001
looks pretty unobjectionable (with some exceptions that I've noted
below). I also think 0003 looks pretty safe. It seems like partition
pruning moves backward across a pretty modest amount of code that does
pretty well-defined things. Basically, initialization-time pruning now
happens before other types of node initialization, and before setting
up row marks. I do however find the changes in 0002 to be less
obviously correct and less obviously safe; see below for some notes
about that.In 0001, the name root_parent_relids doesn't seem very clear to me,
and neither does the explanation of what it does. You say
"'root_parent_relids' identifies the relation to which both the parent
plan and the PartitionPruneInfo given by 'part_prune_index' belong."
But it's a set, so what does it mean to identify "the" relation? It's
a set of relations, not just one.
The intention is to ensure that the bitmapset in PartitionPruneInfo
corresponds to the apprelids bitmapset in the Append or MergeAppend
node that owns the PartitionPruneInfo. Essentially, root_parent_relids
is used to cross-check that both sets align, ensuring that the pruning
logic applies to the same relations as the parent plan.
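Concretely, the cross-check that uses it (in ExecInitPartitionPruning(),
per 0001) looks roughly like this:

    if (!bms_equal(root_parent_relids, pruneinfo->root_parent_relids))
        ereport(ERROR,
                errcode(ERRCODE_INTERNAL_ERROR),
                errmsg_internal("mismatching PartitionPruneInfo found at part_prune_index %d",
                                part_prune_index),
                errdetail_internal("plan node relids %s, pruneinfo relids %s",
                                   bmsToString(root_parent_relids),
                                   bmsToString(pruneinfo->root_parent_relids)));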
And why does the name include the
word "root"? It's neither the PlannerGlobal object, which we often
call root, nor is it the root of the partitioning hierarchy. To me, it
looks like it's just the set of relids that we can potentially prune.
I don't see why this isn't just called "relids", like the field from
which it's copied:
+ pruneinfo->root_parent_relids = parentrel->relids;
It just doesn't seem very root-y or very parent-y.
Maybe just "relids" suffices with a comment updated like this:
* relids RelOptInfo.relids of the parent plan node (e.g. Append
* or MergeAppend) to which his PartitionPruneInfo node
* belongs. Used to ensure that the pruning logic matches
* the parent plan's apprelids.
- node->part_prune_info = partpruneinfo;
+
Extra blank line.
Fixed.
In 0002, the handling of ExprContexts seems a little bit hard to
understand. Sometimes we're using the PlanState's ExprContext, and
sometimes we're using a separate context owned by the
PartitionedRelPruningData's context, and it's not exactly clear why
that is or what the consequences are. Likewise I wouldn't mind some
more comments or explanation in the commit message of the changes in
this patch related to EState objects. I can't help wondering if the
changes here could have either semantic implications (like expression
evaluation can produce different results than before) or performance
implications (because we create objects that we didn't previously
create).
I have taken another look at whether there's any real need to use
separate ExprContexts for initial and runtime pruning and ISTM there
isn't, so we can make "exec" pruning use the same ExprContext as what
"init" would have used. There *is* a difference however in how we
initializing the partition key expressions for initial and runtime
pruning, but it's not problematic to use the same ExprContext.
I'll update the commentary a bit more.
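For the archives, the difference I mean is the branch in
InitPartitionPruneContext() that chooses how to initialize each step's
expressions; a sketch, assuming the shape it has on master:

    /* inside the loop over a pruning step's expressions */
    if (planstate == NULL)
        /* initial pruning: no parent PlanState, EXTERN params only */
        context->exprstates[stateidx] =
            ExecInitExprWithParams(expr,
                                   econtext->ecxt_param_list_info);
    else
        /* exec pruning: expressions may reference the parent node */
        context->exprstates[stateidx] =
            ExecInitExpr(expr, context->planstate);

Both branches can evaluate against the same ExprContext, which is why
sharing it between the two pruning phases is safe.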
Typo: partrtitioned
Fixed.
Regrettably, I have not looked seriously at 0004 and 0005, so I can't
comment on those.
Ok, I'm updating 0005 to change how the CachedPlan is handled when it
becomes invalid during InitPlan(). Currently (v56), a separate
transient CachedPlan is created for the query being initialized when
invalidation occurs. However, it seems better to update the original
CachedPlan in place to avoid extra bookkeeping for transient plans—an
approach Robert suggested in an off-list discussion.
Will post a new version next week.
--
Thanks, Amit Langote
On Fri, Oct 11, 2024 at 3:30 AM Amit Langote <amitlangote09@gmail.com> wrote:
Maybe just "relids" suffices with a comment updated like this:
* relids RelOptInfo.relids of the parent plan node (e.g. Append
* or MergeAppend) to which his PartitionPruneInfo node
* belongs. Used to ensure that the pruning logic matches
* the parent plan's apprelids.
LGTM.
--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
On Fri, Oct 11, 2024 at 3:30 AM Amit Langote <amitlangote09@gmail.com> wrote:
Maybe just "relids" suffices with a comment updated like this:
* relids RelOptInfo.relids of the parent plan node (e.g. Append
* or MergeAppend) to which his PartitionPruneInfo node
* belongs. Used to ensure that the pruning logic matches
* the parent plan's apprelids.
LGTM.
"his" -> "this", surely?
regards, tom lane
On Fri, Oct 11, 2024 at 4:30 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Fri, Oct 11, 2024 at 5:15 AM Robert Haas <robertmhaas@gmail.com> wrote:
Ok, I'm updating 0005 to change how the CachedPlan is handled when it
becomes invalid during InitPlan(). Currently (v56), a separate
transient CachedPlan is created for the query being initialized when
invalidation occurs. However, it seems better to update the original
CachedPlan in place to avoid extra bookkeeping for transient plans -- an
approach Robert suggested in an off-list discussion.
Will post a new version next week.
Sorry for the delay.
I've completed hacking on the approach to update the existing
CachedPlan in-place when it’s invalidated during plan initialization
in its stmt_list. Previously, we created transient (living for that
execution) CachedPlans for each query/plan, tracked separately from
the original CachedPlan, so that invalidation callbacks could
reference them. This meant that the original CachedPlan would continue
to hold invalid plans until the next GetCachedPlan() call.
With the new approach, the original CachedPlan is updated directly:
new PlannedStmts are installed into the existing stmt_list, allowing
any callers iterating over that list to continue unaffected. The new
UpdateCachedPlan() function now creates new plans for all queries in
the CachedPlan’s owning CachedPlanSource, replacing the previous
plans, and marks it valid. So the CachedPlan becomes valid
immediately, rather than at the next GetCachedPlan() call.
One caveat is that, without a dedicated memory context for the
PlannedStmts in stmt_list, the old ones leak into CacheMemoryContext.
However, since UpdateCachedPlan() is rarely invoked, I haven’t focused
on addressing this leak. If needed, we could introduce an additional
memory context next to CachedPlan.context, which would allow freeing
the PlannedStmts without affecting the stmt_list. For now, I’ve
ensured that stmt_list itself is not overwritten in
UpdateCachedPlan().
UpdateCachedPlan() is added in 0005.
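To make the in-place update concrete, here is a rough sketch of
UpdateCachedPlan(); the names follow the description above, but the
signature and details (snapshot setup, memory contexts, dependency
tracking) are elided or hypothetical:

    void
    UpdateCachedPlan(CachedPlanSource *plansource)
    {
        CachedPlan *plan = plansource->gplan;
        List       *plist;
        ListCell   *lc1,
                   *lc2;

        /* Replan all queries in the owning CachedPlanSource. */
        plist = pg_plan_queries(plansource->query_list,
                                plansource->query_string,
                                plansource->cursor_options, NULL);

        /*
         * Install the new PlannedStmts into the existing list cells so
         * that callers already iterating over plan->stmt_list continue
         * unaffected; the old PlannedStmts leak into CacheMemoryContext
         * as noted above.
         */
        Assert(list_length(plist) == list_length(plan->stmt_list));
        forboth(lc1, plan->stmt_list, lc2, plist)
            lfirst(lc1) = lfirst(lc2);

        /* The CachedPlan becomes valid again immediately. */
        plan->is_valid = true;
    }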
I've kept 0005, the patch to retry execution with an updated plan if
the plan becomes invalid after taking locks on prunable relations
(deferred until initial pruning), separate for now. However, I plan to
eventually merge it into 0004, the patch implementing deferred
locking.
I've also fixed the comment in 0003 about PartitionPruneInfo.relid as
Tom pointed out, which now reads:
* relids RelOptInfo.relids of the parent plan node (e.g. Append
* or MergeAppend) to which this PartitionPruneInfo node
* belongs. The pruning logic ensures that this matches
* the parent plan node's apprelids.
I've stared at the refactoring patches 0001-0003 long enough at this
point to think that they are good for committing.
--
Thanks, Amit Langote
Attachments:
v57-0002-Initialize-PartitionPruneContexts-lazily.patch
From 98efea44aaa0780d3be013c2ef4acdff5ff39d7b Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Wed, 23 Oct 2024 16:55:42 +0900
Subject: [PATCH v57 2/5] Initialize PartitionPruneContexts lazily
This commit moves the initialization of PartitionPruneContexts for
both initial and exec pruning steps from CreatePartitionPruneState()
to find_matching_subplans_recurse(), where they are actually needed.
To track whether the context has been initialized and is ready for
use, a boolean field is_valid has been added to PartitionPruneContext.
The primary motivation is to allow CreatePartitionPruneState() to be
called before ExecInitNode(). Right now, it's coupled with
ExecInitNode() because setting up the exec pruning context requires
access to the parent plan node's PlanState. By deferring context
creation to where it's actually needed, we break this dependency.
The ExprContext used for both pruning phases is now a standalone
context, independent of the parent PlanState.
This change will be useful in a future commit, which will move initial
pruning to occur outside ExecInitNode(), specifically before it is
called by InitPlan().
Reviewed-by: Robert Haas
Reviewed-by: Tom Lane
---
src/backend/executor/execPartition.c | 151 +++++++++++++++++++--------
src/backend/partitioning/partprune.c | 7 +-
src/include/executor/execPartition.h | 12 +++
src/include/partitioning/partprune.h | 2 +
4 files changed, 123 insertions(+), 49 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 323d5330ff..38311d2991 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -181,18 +181,17 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
int maxfieldlen);
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
-static PartitionPruneState *CreatePartitionPruneState(PlanState *planstate,
+static PartitionPruneState *CreatePartitionPruneState(EState *estate,
PartitionPruneInfo *pruneinfo);
-static void InitPartitionPruneContext(PartitionPruneContext *context,
+static void InitPartitionPruneContext(PartitionedRelPruningData *pprune,
+ PartitionPruneContext *context,
List *pruning_steps,
- PartitionDesc partdesc,
- PartitionKey partkey,
- PlanState *planstate,
- ExprContext *econtext);
+ PlanState *planstate);
static void PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
Bitmapset *initially_valid_subplans,
int n_total_subplans);
-static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
+static void find_matching_subplans_recurse(PlanState *parent_plan,
+ PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
Bitmapset **validsubplans);
@@ -1825,7 +1824,14 @@ ExecInitPartitionPruning(PlanState *planstate,
ExecAssignExprContext(estate, planstate);
/* Create the working data structure for pruning */
- prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ prunestate = CreatePartitionPruneState(estate, pruneinfo);
+
+ /*
+ * Store PlanState for using it to initialize exec pruning contexts later
+ * in find_matching_subplans_recurse() where they are needed.
+ */
+ if (prunestate->do_exec_prune)
+ prunestate->parent_plan = planstate;
/*
* Perform an initial partition prune pass, if required.
@@ -1865,8 +1871,6 @@ ExecInitPartitionPruning(PlanState *planstate,
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
*
- * 'planstate' is the parent plan node's execution state.
- *
* 'pruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
* PartitionPruningData for each partitioning hierarchy (i.e., each sublist of
@@ -1877,16 +1881,24 @@ ExecInitPartitionPruning(PlanState *planstate,
* stored in each PartitionedRelPruningData can be re-used each time we
* re-evaluate which partitions match the pruning steps provided in each
* PartitionedRelPruneInfo.
+ *
+ * Note that the PartitionPruneContexts for both initial and exec pruning
+ * (which are stored in each PartitionedRelPruningData) are initialized lazily
+ * in find_matching_subplans_recurse() when used for the first time.
*/
static PartitionPruneState *
-CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
+CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
{
- EState *estate = planstate->state;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
+
+ /*
+ * Expression context that will be used by partkey_datum_from_expr() to
+ * evaluate expressions for comparison against partition bounds.
+ */
+ ExprContext *econtext = CreateExprContext(estate);
/* For data reading, executor always includes detached partitions */
if (estate->es_partition_directory == NULL)
@@ -1908,6 +1920,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
prunestate->other_subplans = bms_copy(pruneinfo->other_subplans);
prunestate->do_initial_prune = false; /* may be set below */
prunestate->do_exec_prune = false; /* may be set below */
+ prunestate->parent_plan = NULL;
prunestate->num_partprunedata = n_part_hierarchies;
/*
@@ -1943,16 +1956,25 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
Relation partrel;
PartitionDesc partdesc;
- PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * Used for initializing the expressions in initial pruning steps.
+ * For exec pruning steps, the parent plan node's PlanState's
+ * ps_ExprContext will be used.
*/
+ pprune->estate = estate;
+ pprune->econtext = econtext;
+
+ /* Remember Relation for use in InitPartitionPruneContext. */
partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
- partkey = RelationGetPartitionKey(partrel);
+ pprune->partrel = partrel;
+
+ /*
+			 * We can rely on the copy of the partitioned table's partition
+ * descriptor appearing in its relcache entry, because that entry
+ * will be held open and locked for the duration of this executor
+ * run.
+ */
partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
partrel);
@@ -2063,32 +2085,26 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pprune->present_parts = bms_copy(pinfo->present_parts);
/*
- * Initialize pruning contexts as needed. Note that we must skip
- * execution-time partition pruning in EXPLAIN (GENERIC_PLAN),
- * since parameter values may be missing.
+ * Pruning contexts (initial_context and exec_context) are
+ * initialized lazily in find_matching_subplans_recurse() when
+ * used for the first time.
+ *
+ * Note that we must skip execution-time partition pruning in
+ * EXPLAIN (GENERIC_PLAN), since parameter values may be missing.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
+ pprune->initial_context.is_valid = false;
if (pinfo->initial_pruning_steps &&
!(econtext->ecxt_estate->es_top_eflags & EXEC_FLAG_EXPLAIN_GENERIC))
- {
- InitPartitionPruneContext(&pprune->initial_context,
- pinfo->initial_pruning_steps,
- partdesc, partkey, planstate,
- econtext);
/* Record whether initial pruning is needed at any level */
prunestate->do_initial_prune = true;
- }
+
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
+ pprune->exec_context.is_valid = false;
if (pinfo->exec_pruning_steps &&
!(econtext->ecxt_estate->es_top_eflags & EXEC_FLAG_EXPLAIN_GENERIC))
- {
- InitPartitionPruneContext(&pprune->exec_context,
- pinfo->exec_pruning_steps,
- partdesc, partkey, planstate,
- econtext);
/* Record whether exec pruning is needed at any level */
prunestate->do_exec_prune = true;
- }
/*
* Accumulate the IDs of all PARAM_EXEC Params affecting the
@@ -2109,17 +2125,41 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* Initialize a PartitionPruneContext for the given list of pruning steps.
*/
static void
-InitPartitionPruneContext(PartitionPruneContext *context,
+InitPartitionPruneContext(PartitionedRelPruningData *pprune,
+ PartitionPruneContext *context,
List *pruning_steps,
- PartitionDesc partdesc,
- PartitionKey partkey,
- PlanState *planstate,
- ExprContext *econtext)
+ PlanState *planstate)
{
int n_steps;
int partnatts;
ListCell *lc;
+ /*
+ * Use the ExprContext that CreatePartitionPruneState() should have
+ * created.
+ */
+ ExprContext *econtext = pprune->econtext;
+ EState *estate = pprune->estate;
+ MemoryContext oldcxt;
+ Relation partrel = pprune->partrel;
+ PartitionKey partkey;
+ PartitionDesc partdesc;
+
+ Assert(econtext != NULL);
+
+ /* Must allocate the needed stuff in the query lifetime context. */
+ oldcxt = MemoryContextSwitchTo(estate->es_query_cxt);
+
+ /*
+ * We can rely on the copies of the partitioned table's partition key and
+ * partition descriptor appearing in its relcache entry, because that
+ * entry will be held open and locked for the duration of this executor
+ * run.
+ */
+ partkey = RelationGetPartitionKey(partrel);
+ partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
+ partrel);
+
n_steps = list_length(pruning_steps);
context->strategy = partkey->strategy;
@@ -2187,6 +2227,9 @@ InitPartitionPruneContext(PartitionPruneContext *context,
}
}
}
+
+ MemoryContextSwitchTo(oldcxt);
+ context->is_valid = true;
}
/*
@@ -2350,12 +2393,16 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* recursing to other (lower-level) parents as needed.
*/
pprune = &prunedata->partrelprunedata[0];
- find_matching_subplans_recurse(prunedata, pprune, initial_prune,
+ find_matching_subplans_recurse(prunestate->parent_plan,
+ prunedata, pprune, initial_prune,
&result);
/* Expression eval may have used space in ExprContext too */
- if (pprune->exec_pruning_steps)
+ if (pprune->exec_context.is_valid)
+ {
+ Assert(pprune->exec_pruning_steps != NIL);
ResetExprContext(pprune->exec_context.exprcontext);
+ }
}
/* Add in any subplans that partition pruning didn't account for */
@@ -2378,7 +2425,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* Adds valid (non-prunable) subplan IDs to *validsubplans
*/
static void
-find_matching_subplans_recurse(PartitionPruningData *prunedata,
+find_matching_subplans_recurse(PlanState *parent_plan,
+ PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
Bitmapset **validsubplans)
@@ -2395,11 +2443,27 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
* level.
*/
if (initial_prune && pprune->initial_pruning_steps)
+ {
+ /* Initialize initial_context if not already done. */
+ if (unlikely(!pprune->initial_context.is_valid))
+ InitPartitionPruneContext(pprune,
+ &pprune->initial_context,
+ pprune->initial_pruning_steps,
+ parent_plan);
partset = get_matching_partitions(&pprune->initial_context,
pprune->initial_pruning_steps);
+ }
else if (!initial_prune && pprune->exec_pruning_steps)
+ {
+ /* Initialize exec_context if not already done. */
+ if (unlikely(!pprune->exec_context.is_valid))
+ InitPartitionPruneContext(pprune,
+ &pprune->exec_context,
+ pprune->exec_pruning_steps,
+ parent_plan);
partset = get_matching_partitions(&pprune->exec_context,
pprune->exec_pruning_steps);
+ }
else
partset = pprune->present_parts;
@@ -2415,7 +2479,8 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
int partidx = pprune->subpart_map[i];
if (partidx >= 0)
- find_matching_subplans_recurse(prunedata,
+ find_matching_subplans_recurse(parent_plan,
+ prunedata,
&prunedata->partrelprunedata[partidx],
initial_prune, validsubplans);
else
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 6f0ead1fa8..df767f9e5b 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -3784,13 +3784,8 @@ partkey_datum_from_expr(PartitionPruneContext *context,
/*
* We should never see a non-Const in a step unless the caller has
* passed a valid ExprContext.
- *
- * When context->planstate is valid, context->exprcontext is same as
- * context->planstate->ps_ExprContext.
*/
- Assert(context->planstate != NULL || context->exprcontext != NULL);
- Assert(context->planstate == NULL ||
- (context->exprcontext == context->planstate->ps_ExprContext));
+ Assert(context->exprcontext != NULL);
exprstate = context->exprstates[stateidx];
ectx = context->exprcontext;
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index ed2b019c09..5178c27743 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -42,6 +42,10 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* PartitionedRelPruneInfo (see plannodes.h); though note that here,
* subpart_map contains indexes into PartitionPruningData.partrelprunedata[].
*
+ * estate The EState for the query doing runtime pruning
+ * partrel Partitioned table Relation; obtained by
+ * ExecGetRangeTableRelation(estate, rti), where
+ * rti is PartitionedRelPruneInfo.rtindex.
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
@@ -51,6 +55,8 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* perform executor startup pruning.
* exec_pruning_steps List of PartitionPruneSteps used to
* perform per-scan pruning.
 * econtext							ExprContext to use for evaluating partition
 *									key expressions
* initial_context If initial_pruning_steps isn't NIL, contains
* the details needed to execute those steps.
* exec_context If exec_pruning_steps isn't NIL, contains
@@ -58,12 +64,15 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
*/
typedef struct PartitionedRelPruningData
{
+ EState *estate;
+ Relation partrel;
int nparts;
int *subplan_map;
int *subpart_map;
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
+ ExprContext *econtext;
PartitionPruneContext initial_context;
PartitionPruneContext exec_context;
} PartitionedRelPruningData;
@@ -105,6 +114,8 @@ typedef struct PartitionPruningData
* startup (at any hierarchy level).
* do_exec_prune true if pruning should be performed during
* executor run (at any hierarchy level).
+ * parent_plan Parent plan node's PlanState used to initialize exec
+ * pruning contexts
* num_partprunedata Number of items in "partprunedata" array.
* partprunedata Array of PartitionPruningData pointers for the plan's
* partitioned relation(s), one for each partitioning
@@ -117,6 +128,7 @@ typedef struct PartitionPruneState
MemoryContext prune_context;
bool do_initial_prune;
bool do_exec_prune;
+ PlanState *parent_plan;
int num_partprunedata;
PartitionPruningData *partprunedata[FLEXIBLE_ARRAY_MEMBER];
} PartitionPruneState;
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 6922e04430..b90c2e57a2 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -26,6 +26,7 @@ struct RelOptInfo;
* Stores information needed at runtime for pruning computations
* related to a single partitioned table.
*
+ * is_valid Has the information in this struct been initialized?
* strategy Partition strategy, e.g. LIST, RANGE, HASH.
* partnatts Number of columns in the partition key.
* nparts Number of partitions in this partitioned table.
@@ -48,6 +49,7 @@ struct RelOptInfo;
*/
typedef struct PartitionPruneContext
{
+ bool is_valid;
char strategy;
int partnatts;
int nparts;
--
2.43.0
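For readers skimming the thread: the core mechanism the v1 patch adds is
ordinary lazy, flag-guarded initialization of the pruning contexts. Here is a
minimal self-contained C sketch of just that pattern, with invented names
throughout (a toy model, not executor code):

#include <stdbool.h>
#include <stdio.h>

/* Illustrative stand-in for PartitionPruneContext. */
typedef struct PruneContext
{
    bool    is_valid;   /* has init_context() run for this struct? */
    int     nsteps;     /* stand-in for the expensive-to-build state */
} PruneContext;

static void
init_context(PruneContext *ctx, int nsteps)
{
    /* Expensive setup happens here, once, on first use. */
    ctx->nsteps = nsteps;
    ctx->is_valid = true;
}

static int
use_context(PruneContext *ctx, int nsteps)
{
    /* Initialize lazily, only when the context is actually needed. */
    if (!ctx->is_valid)
        init_context(ctx, nsteps);
    return ctx->nsteps;
}

int
main(void)
{
    PruneContext ctx = {.is_valid = false};

    /* First call initializes; subsequent calls reuse the cached state. */
    printf("%d\n", use_context(&ctx, 3));
    printf("%d\n", use_context(&ctx, 3));
    return 0;
}

In the patch, this is what lets CreatePartitionPruneState() run without a
PlanState: the parts of context setup that need one are deferred until
pruning actually runs.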
Attachment: v57-0003-Perform-runtime-initial-pruning-outside-ExecInit.patch (application/octet-stream)
From 0e18a7dd1e5b1e9aaea442b140e9591ccb25644a Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Wed, 23 Oct 2024 17:02:33 +0900
Subject: [PATCH v57 3/5] Perform runtime initial pruning outside
ExecInitNode()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
This commit follows up on the previous change that moved
PartitionPruneInfos out of individual plan nodes into a list in
PlannedStmt. It moves the initialization of PartitionPruneStates
and runtime initial pruning out of ExecInitNode() and into a new
routine, ExecDoInitialPruning(), which is called by InitPlan()
before ExecInitNode() is invoked on the main plan tree and subplans.
ExecDoInitialPruning() performs the initial pruning and saves the
result—a bitmapset of indexes for the surviving child subnodes—in
es_part_prune_results, a list in EState. The PartitionPruneStates
created for initial pruning are also saved in es_part_prune_states,
another list in EState, for later use during exec pruning. Both lists
are parallel to es_part_prune_infos (which holds the
PartitionPruneInfos from PlannedStmt), allowing them to share the
same index.
Reviewed-by: Robert Haas
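
Before the diff itself, a toy C model of the invariant described above may
help: one producer pass fills three parallel per-PruneInfo arrays, and each
plan node later reads all of them using its own part_prune_index. All names
here are invented for illustration; none of this is executor code:

#include <stdio.h>

#define NPRUNEINFOS 3

/* Minimal stand-ins for the three parallel per-PruneInfo lists. */
static const char *prune_infos[NPRUNEINFOS] = {"info0", "info1", "info2"};
static const char *prune_states[NPRUNEINFOS];   /* filled by "InitPlan" */
static int  prune_results[NPRUNEINFOS];         /* -1 = no initial pruning */

/* Producer: one pass over all PruneInfos, before plan initialization. */
static void
do_initial_pruning(void)
{
    for (int i = 0; i < NPRUNEINFOS; i++)
    {
        prune_states[i] = prune_infos[i];           /* "create the state" */
        prune_results[i] = (i % 2 == 0) ? i : -1;   /* "prune, or not" */
    }
}

/* Consumer: a plan node later looks everything up by its own index. */
static void
init_plan_node(int part_prune_index)
{
    printf("node %d: state=%s result=%d\n",
           part_prune_index,
           prune_states[part_prune_index],
           prune_results[part_prune_index]);
}

int
main(void)
{
    do_initial_pruning();
    for (int i = 0; i < NPRUNEINFOS; i++)
        init_plan_node(i);
    return 0;
}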
---
src/backend/executor/execMain.c | 59 ++++++++++++++++++++++++++++
src/backend/executor/execPartition.c | 51 +++++++++++++-----------
src/include/executor/execPartition.h | 2 +
src/include/nodes/execnodes.h | 2 +
4 files changed, 90 insertions(+), 24 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index c460c6aa32..2fcec32dcb 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -46,6 +46,7 @@
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/executor.h"
+#include "executor/execPartition.h"
#include "executor/nodeSubplan.h"
#include "foreign/fdwapi.h"
#include "mb/pg_wchar.h"
@@ -819,6 +820,53 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
PreventCommandIfParallelMode(CreateCommandName((Node *) plannedstmt));
}
+/*
+ * ExecDoInitialPruning
+ * Perform runtime "initial" pruning, if necessary, to determine the set
+ * of child subnodes that need to be initialized during ExecInitNode() for
+ * plan nodes that support partition pruning.
+ *
+ * This function iterates over each PartitionPruneInfo entry in
+ * estate->es_part_prune_infos. For each entry, it creates a PartitionPruneState
+ * and adds it to es_part_prune_states, where ExecInitPartitionPruning() can
+ * access it for use during "exec" pruning.
+ *
+ * If initial pruning steps exist for a PartitionPruneInfo entry, this function
+ * executes those pruning steps and stores the result as a bitmapset of valid
+ * child subplans, identifying which subplans should be initialized for
+ * execution. The results are saved in estate->es_part_prune_results.
+ *
+ * If no initial pruning is performed for a given PartitionPruneInfo, a NULL
+ * entry is still added to es_part_prune_results to maintain alignment with
+ * es_part_prune_infos. This ensures that ExecInitPartitionPruning() can use
+ * the same index to retrieve the pruning results.
+ */
+static void
+ExecDoInitialPruning(EState *estate)
+{
+ ListCell *lc;
+
+ foreach(lc, estate->es_part_prune_infos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
+ PartitionPruneState *prunestate;
+ Bitmapset *validsubplans = NULL;
+
+ /* Create and save the PartitionPruneState. */
+ prunestate = ExecCreatePartitionPruneState(estate, pruneinfo);
+ estate->es_part_prune_states = lappend(estate->es_part_prune_states,
+ prunestate);
+
+ /*
+ * Perform initial pruning steps, if any, and save the result
+ * bitmapset or NULL as described in the header comment.
+ */
+ if (prunestate->do_initial_prune)
+ validsubplans = ExecFindMatchingSubPlans(prunestate, true);
+ estate->es_part_prune_results = lappend(estate->es_part_prune_results,
+ validsubplans);
+ }
+}
/* ----------------------------------------------------------------
* InitPlan
@@ -851,7 +899,18 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
estate->es_plannedstmt = plannedstmt;
+
+ /*
+ * Perform runtime "initial" pruning to identify which child subplans,
+ * corresponding to the children of plan nodes that contain
+ * PartitionPruneInfo such as Append, will not be executed. The results,
+ * which are bitmapsets of indexes of the child subplans that will be
+ * executed, are saved in es_part_prune_results. These results correspond
+ * to each PartitionPruneInfo entry, and the es_part_prune_results list is
+ * parallel to es_part_prune_infos.
+ */
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ ExecDoInitialPruning(estate);
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 38311d2991..83d1b61101 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -181,8 +181,6 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
int maxfieldlen);
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
-static PartitionPruneState *CreatePartitionPruneState(EState *estate,
- PartitionPruneInfo *pruneinfo);
static void InitPartitionPruneContext(PartitionedRelPruningData *pprune,
PartitionPruneContext *context,
List *pruning_steps,
@@ -1782,20 +1780,24 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
/*
* ExecInitPartitionPruning
- * Initialize data structure needed for run-time partition pruning and
- * do initial pruning if needed
+ * Initialize the data structures needed for runtime "exec" partition
+ * pruning and return the result of initial pruning, if available.
*
* 'relids' identifies the relation to which both the parent plan and the
* PartitionPruneInfo given by 'part_prune_index' belong.
*
- * On return, *initially_valid_subplans is assigned the set of indexes of
- * child subplans that must be initialized along with the parent plan node.
- * Initial pruning is performed here if needed and in that case only the
- * surviving subplans' indexes are added.
+ * The PartitionPruneState would have been created by ExecDoInitialPruning()
+ * and stored as the part_prune_index'th element of EState.es_part_prune_states.
*
- * If subplans are indeed pruned, subplan_map arrays contained in the returned
- * PartitionPruneState are re-sequenced to not count those, though only if the
- * maps will be needed for subsequent execution pruning passes.
+ * On return, *initially_valid_subplans is assigned the set of indexes of child
+ * subplans that must be initialized alongside the parent plan node. Initial
+ * pruning would have been performed by ExecDoInitialPruning() if necessary,
+ * and the bitmapset of surviving subplans' indexes would have been stored as
+ * the part_prune_index'th element of EState.es_part_prune_results.
+ *
+ * If subplans were pruned during initial pruning, the subplan_map arrays in
+ * the returned PartitionPruneState are re-sequenced to exclude those subplans,
+ * but only if the maps will be needed for subsequent execution pruning passes.
*/
PartitionPruneState *
ExecInitPartitionPruning(PlanState *planstate,
@@ -1820,11 +1822,12 @@ ExecInitPartitionPruning(PlanState *planstate,
bmsToString(relids),
bmsToString(pruneinfo->relids)));
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, planstate);
-
- /* Create the working data structure for pruning */
- prunestate = CreatePartitionPruneState(estate, pruneinfo);
+ /*
+ * ExecDoInitialPruning() must have initialized the PartitionPruneState to
+ * perform the initial pruning.
+ */
+ prunestate = list_nth(estate->es_part_prune_states, part_prune_index);
+ Assert(prunestate != NULL);
/*
* Store PlanState for using it to initialize exec pruning contexts later
@@ -1833,11 +1836,11 @@ ExecInitPartitionPruning(PlanState *planstate,
if (prunestate->do_exec_prune)
prunestate->parent_plan = planstate;
- /*
- * Perform an initial partition prune pass, if required.
- */
+ /* Use the result of initial pruning done by ExecDoInitialPruning(). */
if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ *initially_valid_subplans = list_nth_node(Bitmapset,
+ estate->es_part_prune_results,
+ part_prune_index);
else
{
/* No pruning, so we'll need to initialize all subplans */
@@ -1848,8 +1851,8 @@ ExecInitPartitionPruning(PlanState *planstate,
/*
* Re-sequence subplan indexes contained in prunestate to account for any
- * that were removed above due to initial pruning. No need to do this if
- * no steps were removed.
+ * that were removed due to initial pruning. No need to do this if no
+ * partitions were removed.
*/
if (bms_num_members(*initially_valid_subplans) < n_total_subplans)
{
@@ -1886,8 +1889,8 @@ ExecInitPartitionPruning(PlanState *planstate,
* (which are stored in each PartitionedRelPruningData) are initialized lazily
* in find_matching_subplans_recurse() when used for the first time.
*/
-static PartitionPruneState *
-CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
+PartitionPruneState *
+ExecCreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
{
PartitionPruneState *prunestate;
int n_part_hierarchies;
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 5178c27743..1497aed533 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -140,4 +140,6 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
bool initial_prune);
+extern PartitionPruneState *ExecCreatePartitionPruneState(EState *estate,
+ PartitionPruneInfo *pruneinfo);
#endif /* EXECPARTITION_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 5deed9232a..b0ceb1ab05 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -640,6 +640,8 @@ typedef struct EState
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
+ List *es_part_prune_states; /* List of PartitionPruneState */
+ List *es_part_prune_results; /* List of Bitmapset */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
--
2.43.0
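One detail of the patch above that is easy to gloss over is the re-sequencing
of subplan indexes after initial pruning (PartitionPruneFixSubPlanMap). A toy
C sketch of that renumbering, assuming a plain bitmask in place of a
Bitmapset and invented names:

#include <stdio.h>

/*
 * Survivors get consecutive new indexes; pruned subplans map to -1.
 * 'survivors' is a bitmask standing in for the initial-pruning result.
 */
static void
build_new_indexes(int n_total, unsigned survivors, int *new_index)
{
    int     next = 0;

    for (int old = 0; old < n_total; old++)
        new_index[old] = (survivors & (1u << old)) ? next++ : -1;
}

int
main(void)
{
    int     new_index[5];

    /* Subplans 0, 2 and 3 survive initial pruning; 1 and 4 are gone. */
    build_new_indexes(5, 0x0d, new_index);
    for (int old = 0; old < 5; old++)
        printf("old subplan %d -> new subplan %d\n", old, new_index[old]);
    return 0;
}

As the comment in the hunk says, this work is skipped entirely when no
partitions were removed.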
Attachment: v57-0004-Defer-locking-of-runtime-prunable-relations-to-e.patch (application/octet-stream)
From beff511dd6d4b87b763a3f70de26988a37c82d31 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Fri, 25 Oct 2024 15:45:38 +0900
Subject: [PATCH v57 4/5] Defer locking of runtime-prunable relations to
executor
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
When preparing a cached plan for execution, plancache.c locks the
relations in the plan's range table to ensure they are safe for
execution. However, this approach, implemented in
AcquireExecutorLocks(), results in unnecessarily locking relations
that might be pruned during "initial" runtime pruning.
To optimize this, locking is now deferred for relations subject to
"initial" runtime pruning. The planner now provides a set of
"unprunable" relations through the new PlannedStmt.unprunableRelids
field. AcquireExecutorLocks() will only lock these unprunable
relations. PlannedStmt.unprunableRelids is populated by subtracting
the set of initially prunable relids from all RT indexes. The
prunable relids are identified by examining all PartitionPruneInfos
during set_plan_refs() and storing the RT indexes of partitions
subject to "initial" pruning steps in PlannerGlobal.prunableRelids.
Deferred locks are taken, if necessary, after ExecDoInitialPruning()
determines the set of unpruned partitions. To enable this, the
CachedPlan is now available via QueryDesc, allowing the executor to
determine if the plan tree it’s executing is cached and may contain
unlocked relations. The executor calls CachedPlanRequiresLocking()
to check whether a cached plan might contain such unlocked relations,
ensuring that appropriate locks are acquired before execution.
Plan nodes like Append are already updated to consider only unpruned
relations. However, child RowMarks and child result relations are not
directly informed about unpruned partitions. Code handling child
RowMarks and result relations has therefore been modified to skip
those belonging to pruned partitions. ExecDoInitialPruning() now
adds RT indexes of unpruned partitions to es_unpruned_relids,
initially populated with PlannedStmt.unprunableRelids. This ensures
only those child RowMarks and result relations whose owning relations
are in this set are processed.
For ModifyTable nodes, ExecInitModifyTable truncates the
resultRelations list (and parallel lists like withCheckOptionLists,
returningLists, and updateColnosLists) to consider only unpruned
relations, and creates ResultRelInfo structs only for those.
To obtain RT indexes of unpruned leaf partitions for
es_unpruned_relids, each PartitionedRelPruneInfo and the corresponding
PartitionedRelPruningData now includes a mapping from partition
indexes (from get_matching_partitions()) to their RT indexes in a
leafpart_rti_map[] array.
An Assert in ExecCheckPermissions() ensures that all relations
undergoing permission checks are properly locked, helping to catch
any missed additions to the unprunableRelids set.
Deferring locking introduces a window where prunable relations may be
altered by concurrent DDL, which can invalidate the plan. This might
cause errors if the executor attempts to use an invalid plan, such as
failing to locate a dropped partition index during
ExecInitIndexScan(). Future commits will add support for the executor
to validate the plan during ExecutorStart() and retry with a new plan
if the original becomes invalid after deferred locks.
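
The overall locking protocol described above fits in a few lines of toy C
(bitmasks stand in for Bitmapsets of RT indexes; all names are invented, and
none of this is plancache or executor code):

#include <stdio.h>

static void
lock_rels(const char *phase, unsigned relids)
{
    for (int rti = 0; rti < 32; rti++)
        if (relids & (1u << rti))
            printf("%s: lock rti %d\n", phase, rti);
}

int
main(void)
{
    unsigned    all_relids = 0x3e;      /* rtis 1..5 in the flat rtable */
    unsigned    prunable_relids = 0x18; /* rtis 3 and 4 may be pruned */
    unsigned    unprunable = all_relids & ~prunable_relids;
    unsigned    survivors = 0x08;       /* initial pruning keeps rti 3 */

    /* AcquireExecutorLocks() analogue: lock only unprunable relations. */
    lock_rels("plancache", unprunable);

    /* ExecDoInitialPruning() analogue: lock surviving prunable ones. */
    lock_rels("executor", survivors);
    return 0;
}

The window between the two phases is exactly where the concurrent-DDL hazard
mentioned in the last paragraph lives; the executor-side plan validation
promised for future commits is what closes it.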
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 1 +
src/backend/commands/matview.c | 2 +-
src/backend/commands/prepare.c | 3 +-
src/backend/executor/execMain.c | 75 ++++++++++++++++++-
src/backend/executor/execParallel.c | 9 ++-
src/backend/executor/execPartition.c | 36 +++++++--
src/backend/executor/functions.c | 1 +
src/backend/executor/nodeAppend.c | 8 +-
src/backend/executor/nodeLockRows.c | 9 ++-
src/backend/executor/nodeMergeAppend.c | 2 +-
src/backend/executor/nodeModifyTable.c | 71 +++++++++++++++---
src/backend/executor/spi.c | 1 +
src/backend/optimizer/plan/planner.c | 2 +
src/backend/optimizer/plan/setrefs.c | 29 ++++++-
src/backend/partitioning/partprune.c | 22 ++++++
src/backend/tcop/pquery.c | 10 ++-
src/backend/utils/cache/plancache.c | 47 +++++++-----
src/include/commands/explain.h | 5 +-
src/include/executor/execPartition.h | 6 +-
src/include/executor/execdesc.h | 2 +
src/include/nodes/execnodes.h | 12 +++
src/include/nodes/pathnodes.h | 8 ++
src/include/nodes/plannodes.h | 7 ++
src/include/utils/plancache.h | 18 +++++
src/test/regress/expected/partition_prune.out | 44 +++++++++++
src/test/regress/sql/partition_prune.sql | 18 +++++
29 files changed, 400 insertions(+), 59 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index f55e6d9675..27b6f6f069 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -556,7 +556,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 68ec122dbf..290c8bd240 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -324,7 +324,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 18a5af6b91..b699089bd8 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -515,7 +515,7 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
}
@@ -623,7 +623,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage,
@@ -679,7 +680,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, cplan, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 86ea9cd9da..cb168ab6dd 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -903,6 +903,7 @@ execute_sql_string(const char *sql, const char *filename)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 010097873d..69be74b4bd 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -438,7 +438,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 07257d4db9..311b9ebd5b 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -655,7 +655,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
+ ExplainOnePlan(pstmt, cplan, into, es, query_string, paramLI,
+ queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
else
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 2fcec32dcb..ed783236eb 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -54,6 +54,7 @@
#include "nodes/queryjumble.h"
#include "parser/parse_relation.h"
#include "rewrite/rewriteHandler.h"
+#include "storage/lmgr.h"
#include "tcop/utility.h"
#include "utils/acl.h"
#include "utils/backend_status.h"
@@ -91,6 +92,7 @@ static bool ExecCheckPermissionsModified(Oid relOid, Oid userid,
AclMode requiredPerms);
static void ExecCheckXactReadOnly(PlannedStmt *plannedstmt);
static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
+static inline bool ExecShouldLockRelations(EState *estate);
/* end of local decls */
@@ -601,6 +603,21 @@ ExecCheckPermissions(List *rangeTable, List *rteperminfos,
(rte->rtekind == RTE_SUBQUERY &&
rte->relkind == RELKIND_VIEW));
+ /*
+ * Ensure that we have at least an AccessShareLock on relations
+ * whose permissions need to be checked.
+ *
+ * Skip this check in a parallel worker because locks won't be
+ * taken until ExecInitNode() performs plan initialization.
+ *
+ * XXX: ExecCheckPermissions() in a parallel worker may be
+ * redundant with the checks done in the leader process, so this
+			 * should be reviewed to ensure it's necessary.
+ */
+ Assert(IsParallelWorker() ||
+ CheckRelationOidLockedByMe(rte->relid, AccessShareLock,
+ true));
+
(void) getRTEPermissionInfo(rteperminfos, rte);
/* Many-to-one mapping not allowed */
Assert(!bms_is_member(rte->perminfoindex, indexset));
@@ -862,12 +879,46 @@ ExecDoInitialPruning(EState *estate)
* bitmapset or NULL as described in the header comment.
*/
if (prunestate->do_initial_prune)
- validsubplans = ExecFindMatchingSubPlans(prunestate, true);
+ {
+ Bitmapset *validsubplan_rtis = NULL;
+
+ validsubplans = ExecFindMatchingSubPlans(prunestate, true,
+ &validsubplan_rtis);
+ if (ExecShouldLockRelations(estate))
+ {
+			int			rtindex = -1;
+
+ while ((rtindex = bms_next_member(validsubplan_rtis,
+ rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(rtindex, estate);
+
+ Assert(rte->rtekind == RTE_RELATION &&
+ rte->rellockmode != NoLock);
+ LockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
+ estate->es_unpruned_relids = bms_add_members(estate->es_unpruned_relids,
+ validsubplan_rtis);
+ }
+
estate->es_part_prune_results = lappend(estate->es_part_prune_results,
validsubplans);
}
}
+/*
+ * Locks need to be taken by the executor only when running a cached plan
+ * that might contain unlocked relations, that is, a reused generic plan.
+ */
+static inline bool
+ExecShouldLockRelations(EState *estate)
+{
+ return estate->es_cachedplan == NULL ? false :
+ CachedPlanRequiresLocking(estate->es_cachedplan);
+}
+
/* ----------------------------------------------------------------
* InitPlan
*
@@ -880,6 +931,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ CachedPlan *cachedplan = queryDesc->cplan;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -899,6 +951,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
estate->es_plannedstmt = plannedstmt;
+ estate->es_cachedplan = cachedplan;
+ estate->es_unpruned_relids = bms_copy(plannedstmt->unprunableRelids);
/*
* Perform runtime "initial" pruning to identify which child subplans,
@@ -908,6 +962,9 @@ InitPlan(QueryDesc *queryDesc, int eflags)
* executed, are saved in es_part_prune_results. These results correspond
* to each PartitionPruneInfo entry, and the es_part_prune_results list is
* parallel to es_part_prune_infos.
+ *
+ * This will also add the RT indexes of surviving leaf partitions to
+ * es_unpruned_relids.
*/
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
ExecDoInitialPruning(estate);
@@ -926,8 +983,13 @@ InitPlan(QueryDesc *queryDesc, int eflags)
Relation relation;
ExecRowMark *erm;
- /* ignore "parent" rowmarks; they are irrelevant at runtime */
- if (rc->isParent)
+ /*
+ * Ignore "parent" rowmarks, because they are irrelevant at
+ * runtime. Also ignore the rowmarks belonging to child tables
+ * that have been pruned in ExecDoInitialPruning().
+ */
+ if (rc->isParent ||
+ !bms_is_member(rc->rti, estate->es_unpruned_relids))
continue;
/* get relation's OID (will produce InvalidOid if subquery) */
@@ -2970,6 +3032,13 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
}
}
+ /*
+ * Copy es_unpruned_relids so that RowMarks of pruned relations are
+ * ignored in ExecInitLockRows() and ExecInitModifyTable() when
+ * initializing the plan trees below.
+ */
+ rcestate->es_unpruned_relids = parentestate->es_unpruned_relids;
+
/*
* Initialize private state information for each SubPlan. We must do this
* before running ExecInitNode on the main query tree, since
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index b01a2fdfdd..7519c9a860 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1257,8 +1257,15 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(¶mspace);
- /* Create a QueryDesc for the query. */
+ /*
+ * Create a QueryDesc for the query. We pass NULL for cachedplan, because
+ * we don't have a pointer to the CachedPlan in the leader's process. It's
+ * fine because the only reason the executor needs to see it is to decide
+	 * if it should take locks on certain relations, but parallel workers
+ * always take locks anyway.
+ */
return CreateQueryDesc(pstmt,
+ NULL,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 83d1b61101..802a16b6fa 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -192,7 +192,8 @@ static void find_matching_subplans_recurse(PlanState *parent_plan,
PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans);
+ Bitmapset **validsubplans,
+ Bitmapset **validsubplan_rtis);
/*
@@ -1987,8 +1988,8 @@ ExecCreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
* The set of partitions that exist now might not be the same that
* existed when the plan was made. The normal case is that it is;
* optimize for that case with a quick comparison, and just copy
- * the subplan_map and make subpart_map point to the one in
- * PruneInfo.
+ * the subplan_map and make subpart_map, leafpart_rti_map point to
+ * the ones in PruneInfo.
*
* For the case where they aren't identical, we could have more
* partitions on either side; or even exactly the same number of
@@ -2007,6 +2008,7 @@ ExecCreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
sizeof(int) * partdesc->nparts) == 0)
{
pprune->subpart_map = pinfo->subpart_map;
+ pprune->leafpart_rti_map = pinfo->leafpart_rti_map;
memcpy(pprune->subplan_map, pinfo->subplan_map,
sizeof(int) * pinfo->nparts);
}
@@ -2027,6 +2029,7 @@ ExecCreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
* mismatches.
*/
pprune->subpart_map = palloc(sizeof(int) * partdesc->nparts);
+ pprune->leafpart_rti_map = palloc(sizeof(int) * partdesc->nparts);
for (pp_idx = 0; pp_idx < partdesc->nparts; pp_idx++)
{
@@ -2044,6 +2047,8 @@ ExecCreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
pinfo->subplan_map[pd_idx];
pprune->subpart_map[pp_idx] =
pinfo->subpart_map[pd_idx];
+ pprune->leafpart_rti_map[pp_idx] =
+ pinfo->leafpart_rti_map[pd_idx];
pd_idx++;
continue;
}
@@ -2081,6 +2086,7 @@ ExecCreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
pprune->subpart_map[pp_idx] = -1;
pprune->subplan_map[pp_idx] = -1;
+ pprune->leafpart_rti_map[pp_idx] = 0;
}
}
@@ -2359,10 +2365,13 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
* runtime pruning.
+ *
+ * validsubplan_rtis must be non-NULL if initial_prune is true.
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ Bitmapset **validsubplan_rtis)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2398,7 +2407,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
pprune = &prunedata->partrelprunedata[0];
find_matching_subplans_recurse(prunestate->parent_plan,
prunedata, pprune, initial_prune,
- &result);
+ &result, validsubplan_rtis);
/* Expression eval may have used space in ExprContext too */
if (pprune->exec_context.is_valid)
@@ -2415,6 +2424,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (validsubplan_rtis)
+ *validsubplan_rtis = bms_copy(*validsubplan_rtis);
MemoryContextReset(prunestate->prune_context);
@@ -2425,14 +2436,17 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* find_matching_subplans_recurse
* Recursive worker function for ExecFindMatchingSubPlans
*
- * Adds valid (non-prunable) subplan IDs to *validsubplans
+ * Adds valid (non-prunable) subplan IDs to *validsubplans and the RT indexes
+ * of their corresponding leaf partitions to *validsubplan_rtis if
+ * it's non-NULL.
*/
static void
find_matching_subplans_recurse(PlanState *parent_plan,
PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans)
+ Bitmapset **validsubplans,
+ Bitmapset **validsubplan_rtis)
{
Bitmapset *partset;
int i;
@@ -2475,8 +2489,13 @@ find_matching_subplans_recurse(PlanState *parent_plan,
while ((i = bms_next_member(partset, i)) >= 0)
{
if (pprune->subplan_map[i] >= 0)
+ {
*validsubplans = bms_add_member(*validsubplans,
pprune->subplan_map[i]);
+ if (validsubplan_rtis)
+ *validsubplan_rtis = bms_add_member(*validsubplan_rtis,
+ pprune->leafpart_rti_map[i]);
+ }
else
{
int partidx = pprune->subpart_map[i];
@@ -2485,7 +2504,8 @@ find_matching_subplans_recurse(PlanState *parent_plan,
find_matching_subplans_recurse(parent_plan,
prunedata,
&prunedata->partrelprunedata[partidx],
- initial_prune, validsubplans);
+ initial_prune, validsubplans,
+ validsubplan_rtis);
else
{
/*
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 692854e2b3..6f6f45e0ad 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -840,6 +840,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest = None_Receiver;
es->qd = CreateQueryDesc(es->stmt,
+ NULL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index de7ebab5c2..006bdafaea 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -581,7 +581,7 @@ choose_next_subplan_locally(AppendState *node)
else if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
node->as_valid_subplans_identified = true;
}
@@ -648,7 +648,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
node->as_valid_subplans_identified = true;
/*
@@ -724,7 +724,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
node->as_valid_subplans_identified = true;
mark_invalid_subplans_as_finished(node);
@@ -877,7 +877,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
node->as_valid_subplans_identified = true;
classify_matching_subplans(node);
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 41754ddfea..cfead7ded2 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -347,8 +347,13 @@ ExecInitLockRows(LockRows *node, EState *estate, int eflags)
ExecRowMark *erm;
ExecAuxRowMark *aerm;
- /* ignore "parent" rowmarks; they are irrelevant at runtime */
- if (rc->isParent)
+ /*
+ * Ignore "parent" rowmarks, because they are irrelevant at
+ * runtime. Also ignore the rowmarks belonging to child tables
+ * that have been pruned in ExecDoInitialPruning().
+ */
+ if (rc->isParent ||
+ !bms_is_member(rc->rti, estate->es_unpruned_relids))
continue;
/* find ExecRowMark and build ExecAuxRowMark */
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 3ed91808dd..f7821aa178 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -219,7 +219,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 1161520f76..004273f868 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -636,7 +636,7 @@ ExecInitUpdateProjection(ModifyTableState *mtstate,
Assert(whichrel >= 0 && whichrel < mtstate->mt_nrels);
}
- updateColnos = (List *) list_nth(node->updateColnosLists, whichrel);
+ updateColnos = (List *) list_nth(mtstate->mt_updateColnosLists, whichrel);
/*
* For UPDATE, we use the old tuple to fill up missing values in the tuple
@@ -4245,6 +4245,7 @@ ExecLookupResultRelByOid(ModifyTableState *node, Oid resultoid,
node->mt_lastResultOid = resultoid;
node->mt_lastResultIndex = mtlookup->relationIndex;
}
+
return node->resultRelInfo + mtlookup->relationIndex;
}
}
@@ -4282,7 +4283,11 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
ModifyTableState *mtstate;
Plan *subplan = outerPlan(node);
CmdType operation = node->operation;
- int nrels = list_length(node->resultRelations);
+ int nrels;
+ List *resultRelations = NIL;
+ List *withCheckOptionLists = NIL;
+ List *returningLists = NIL;
+ List *updateColnosLists = NIL;
ResultRelInfo *resultRelInfo;
List *arowmarks;
ListCell *l;
@@ -4292,6 +4297,45 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
/* check for unsupported flags */
Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
+ /*
+ * Only consider unpruned relations for initializing their ResultRelInfo
+ * struct and other fields such as withCheckOptions, etc.
+ */
+ i = 0;
+ foreach(l, node->resultRelations)
+ {
+ Index rti = lfirst_int(l);
+
+ if (bms_is_member(rti, estate->es_unpruned_relids))
+ {
+ resultRelations = lappend_int(resultRelations, rti);
+ if (node->withCheckOptionLists)
+ {
+ List *withCheckOptions = list_nth_node(List,
+ node->withCheckOptionLists,
+ i);
+
+ withCheckOptionLists = lappend(withCheckOptionLists, withCheckOptions);
+ }
+ if (node->returningLists)
+ {
+ List *returningList = list_nth_node(List,
+ node->returningLists,
+ i);
+
+ returningLists = lappend(returningLists, returningList);
+ }
+ if (node->updateColnosLists)
+ {
+ List *updateColnosList = list_nth(node->updateColnosLists, i);
+
+ updateColnosLists = lappend(updateColnosLists, updateColnosList);
+ }
+ }
+ i++;
+ }
+ nrels = list_length(resultRelations);
+
/*
* create state structure
*/
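
The hunk above filters several parallel lists by membership in
es_unpruned_relids while keeping them aligned. The same idea in a
self-contained C sketch, with invented names and a bitmask instead of a
Bitmapset:

#include <stdio.h>

int
main(void)
{
    int         rtis[] = {3, 4, 5, 6};
    const char *returning[] = {"r3", "r4", "r5", "r6"};
    unsigned    unpruned = (1u << 3) | (1u << 5);   /* rtis 3 and 5 */

    int         kept_rtis[4];
    const char *kept_returning[4];
    int         nkept = 0;

    /* Keep entry i of every parallel list iff rtis[i] is unpruned. */
    for (int i = 0; i < 4; i++)
    {
        if (unpruned & (1u << rtis[i]))
        {
            kept_rtis[nkept] = rtis[i];
            kept_returning[nkept] = returning[i];
            nkept++;
        }
    }

    for (int i = 0; i < nkept; i++)
        printf("rti %d keeps %s\n", kept_rtis[i], kept_returning[i]);
    return 0;
}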
@@ -4312,6 +4356,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
mtstate->mt_merge_inserted = 0;
mtstate->mt_merge_updated = 0;
mtstate->mt_merge_deleted = 0;
+ mtstate->mt_updateColnosLists = updateColnosLists;
/*----------
* Resolve the target relation. This is the same as:
@@ -4329,6 +4374,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
*/
if (node->rootRelation > 0)
{
+ Assert(bms_is_member(node->rootRelation, estate->es_unpruned_relids));
mtstate->rootResultRelInfo = makeNode(ResultRelInfo);
ExecInitResultRelation(estate, mtstate->rootResultRelInfo,
node->rootRelation);
@@ -4343,7 +4389,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
/* set up epqstate with dummy subplan data for the moment */
EvalPlanQualInit(&mtstate->mt_epqstate, estate, NULL, NIL,
- node->epqParam, node->resultRelations);
+ node->epqParam, resultRelations);
mtstate->fireBSTriggers = true;
/*
@@ -4361,7 +4407,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
*/
resultRelInfo = mtstate->resultRelInfo;
i = 0;
- foreach(l, node->resultRelations)
+ foreach(l, resultRelations)
{
Index resultRelation = lfirst_int(l);
List *mergeActions = NIL;
@@ -4505,7 +4551,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* Initialize any WITH CHECK OPTION constraints if needed.
*/
resultRelInfo = mtstate->resultRelInfo;
- foreach(l, node->withCheckOptionLists)
+ foreach(l, withCheckOptionLists)
{
List *wcoList = (List *) lfirst(l);
List *wcoExprs = NIL;
@@ -4528,7 +4574,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
/*
* Initialize RETURNING projections if needed.
*/
- if (node->returningLists)
+ if (returningLists)
{
TupleTableSlot *slot;
ExprContext *econtext;
@@ -4537,7 +4583,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* Initialize result tuple slot and assign its rowtype using the first
* RETURNING list. We assume the rest will look the same.
*/
- mtstate->ps.plan->targetlist = (List *) linitial(node->returningLists);
+ mtstate->ps.plan->targetlist = (List *) linitial(returningLists);
/* Set up a slot for the output of the RETURNING projection(s) */
ExecInitResultTupleSlotTL(&mtstate->ps, &TTSOpsVirtual);
@@ -4552,7 +4598,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* Build a projection for each result rel.
*/
resultRelInfo = mtstate->resultRelInfo;
- foreach(l, node->returningLists)
+ foreach(l, returningLists)
{
List *rlist = (List *) lfirst(l);
@@ -4653,8 +4699,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
ExecRowMark *erm;
ExecAuxRowMark *aerm;
- /* ignore "parent" rowmarks; they are irrelevant at runtime */
- if (rc->isParent)
+ /*
+ * Ignore "parent" rowmarks, because they are irrelevant at
+ * runtime. Also ignore the rowmarks belonging to child tables
+ * that have been pruned in ExecDoInitialPruning().
+ */
+ if (rc->isParent ||
+ !bms_is_member(rc->rti, estate->es_unpruned_relids))
continue;
/* Find ExecRowMark and build ExecAuxRowMark */
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 2fb2e73604..e2b781e939 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -2690,6 +2690,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
snap = InvalidSnapshot;
qdesc = CreateQueryDesc(stmt,
+ cplan,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index cce226fff1..c98895976e 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -555,6 +555,8 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->planTree = top_plan;
result->partPruneInfos = glob->partPruneInfos;
result->rtable = glob->finalrtable;
+ result->unprunableRelids = bms_difference(glob->allRelids,
+ glob->prunableRelids);
result->permInfos = glob->finalrteperminfos;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 8deb012d8e..a6899e100f 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -565,7 +565,8 @@ add_rte_to_flat_rtable(PlannerGlobal *glob, List *rteperminfos,
/*
* If it's a plain relation RTE (or a subquery that was once a view
- * reference), add the relation OID to relationOids.
+ * reference), add the relation OID to relationOids. Also add its new RT
+ * index to the set of relations that need to be locked for execution.
*
* We do this even though the RTE might be unreferenced in the plan tree;
* this would correspond to cases such as views that were expanded, child
@@ -577,7 +578,11 @@ add_rte_to_flat_rtable(PlannerGlobal *glob, List *rteperminfos,
*/
if (newrte->rtekind == RTE_RELATION ||
(newrte->rtekind == RTE_SUBQUERY && OidIsValid(newrte->relid)))
+ {
glob->relationOids = lappend_oid(glob->relationOids, newrte->relid);
+ glob->allRelids = bms_add_member(glob->allRelids,
+ list_length(glob->finalrtable));
+ }
/*
* Add a copy of the RTEPermissionInfo, if any, corresponding to this RTE
@@ -1741,6 +1746,11 @@ set_customscan_references(PlannerInfo *root,
*
* Also update the RT indexes present in PartitionedRelPruneInfos to add the
* offset.
+ *
+ * Finally, if there are initial pruning steps, add the RT indexes of the
+ * leaf partitions to the set of relations prunable at execution startup time.
+ * This set indicates which relations should not be locked before executor
+ * startup, as they may be pruned during initial pruning.
*/
static int
register_partpruneinfo(PlannerInfo *root, int part_prune_index, int rtoffset)
@@ -1763,8 +1773,25 @@ register_partpruneinfo(PlannerInfo *root, int part_prune_index, int rtoffset)
foreach(l2, prune_infos)
{
PartitionedRelPruneInfo *prelinfo = lfirst(l2);
+ int i;
prelinfo->rtindex += rtoffset;
+
+ for (i = 0; i < prelinfo->nparts; i++)
+ {
+ /*
+			/*
+			 * Non-leaf partitions and partitions that do not have a
+			 * subplan are not included in this map, as mentioned in
+			 * make_partitionedrel_pruneinfo().
+ */
+ if (prelinfo->leafpart_rti_map[i])
+ {
+ prelinfo->leafpart_rti_map[i] += rtoffset;
+ if (prelinfo->initial_pruning_steps)
+ glob->prunableRelids = bms_add_member(glob->prunableRelids,
+ prelinfo->leafpart_rti_map[i]);
+ }
+ }
}
}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index df767f9e5b..5a518e99bc 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -645,6 +645,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *subplan_map;
int *subpart_map;
Oid *relid_map;
+ int *leafpart_rti_map;
/*
* Construct the subplan and subpart maps for this partitioning level.
@@ -657,6 +658,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
+ leafpart_rti_map = (int *) palloc0(nparts * sizeof(int));
present_parts = NULL;
i = -1;
@@ -671,9 +673,28 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
+
+ /*
+ * Track the RT indexes of "leaf" partitions so they can be
+ * included in the PlannerGlobal.prunableRelids set, indicating
+ * relations whose locking is deferred until executor startup.
+ *
+			 * We don't defer locking of sub-partitioned partitions, because
+			 * setting up PartitionedRelPruningData currently occurs before
+			 * initial pruning, so the relation must be locked at that stage
+			 * even if it may later be pruned.
+ *
+ * Only leaf partitions with a valid subplan that are prunable
+ * using initial pruning are added to prunableRelids. So
+ * partitions without a subplan due to constraint exclusion will
+ * remain in PlannedStmt.unprunableRelids and thus their locking
+ * will not be deferred even if they may ultimately be pruned due
+ * to initial pruning.
+ */
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
+ leafpart_rti_map[i] = (int) partrel->relid;
/* Record finding this subplan */
subplansfound = bms_add_member(subplansfound, subplanidx);
@@ -695,6 +716,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
pinfo->relid_map = relid_map;
+ pinfo->leafpart_rti_map = leafpart_rti_map;
}
pfree(relid_subpart_map);
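
To make the leafpart_rti_map[] convention above concrete (zero means "no
subplan for this partition"; any other value is the partition's RT index,
later shifted by rtoffset in setrefs.c), here is a toy C sketch with
invented values:

#include <stdio.h>

#define NPARTS 4

int
main(void)
{
    /* 0 = no subplan (constraint-excluded or sub-partitioned child). */
    int         leafpart_rti_map[NPARTS] = {2, 0, 3, 0};
    int         rtoffset = 10;  /* the set_plan_refs() adjustment */
    unsigned    prunable_relids = 0;

    for (int i = 0; i < NPARTS; i++)
    {
        if (leafpart_rti_map[i])
        {
            leafpart_rti_map[i] += rtoffset;
            prunable_relids |= 1u << leafpart_rti_map[i];
        }
    }

    for (int i = 0; i < NPARTS; i++)
        printf("part %d -> rti %d\n", i, leafpart_rti_map[i]);
    printf("prunableRelids mask: 0x%x\n", prunable_relids);
    return 0;
}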
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index a1f8d03db1..6e8f6b1b8f 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -36,6 +36,7 @@ Portal ActivePortal = NULL;
static void ProcessQuery(PlannedStmt *plan,
+ CachedPlan *cplan,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +66,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +79,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->cplan = cplan; /* CachedPlan supplying the plannedstmt */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +125,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * cplan: CachedPlan supplying the plan
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +138,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ CachedPlan *cplan,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +150,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, cplan, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -493,6 +498,7 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->cplan,
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1276,6 +1282,7 @@ PortalRunMulti(Portal portal,
{
/* statement can set tag string */
ProcessQuery(pstmt,
+ portal->cplan,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1285,6 +1292,7 @@ PortalRunMulti(Portal portal,
{
/* stmt added by rewrite cannot set tag */
ProcessQuery(pstmt,
+ portal->cplan,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 5af1a168ec..449fb8f4e2 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -104,7 +104,8 @@ static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
static bool CheckCachedPlan(CachedPlanSource *plansource);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv);
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ bool generic);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
@@ -815,8 +816,11 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
*
- * On a "true" return, we have acquired the locks needed to run the plan.
- * (We must do this for the "true" result to be race-condition-free.)
+ * On a "true" return, we have acquired locks on the "unprunableRelids" set
+ * for all plans in plansource->stmt_list. However, the plans are not fully
+ * race-condition-free until the executor acquires locks on the prunable
+ * relations that survive initial runtime pruning during executor
+ * initialization.
*/
static bool
CheckCachedPlan(CachedPlanSource *plansource)
@@ -893,10 +897,10 @@ CheckCachedPlan(CachedPlanSource *plansource)
* or it can be set to NIL if we need to re-copy the plansource's query_list.
*
* To build a generic, parameter-value-independent plan, pass NULL for
- * boundParams. To build a custom plan, pass the actual parameter values via
- * boundParams. For best effect, the PARAM_FLAG_CONST flag should be set on
- * each parameter value; otherwise the planner will treat the value as a
- * hint rather than a hard constant.
+ * boundParams, and true for generic. To build a custom plan, pass the actual
+ * parameter values via boundParams, and false for generic. For best effect,
+ * the PARAM_FLAG_CONST flag should be set on each parameter value; otherwise
+ * the planner will treat the value as a hint rather than a hard constant.
*
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
@@ -904,7 +908,8 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv)
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ bool generic)
{
CachedPlan *plan;
List *plist;
@@ -1026,6 +1031,7 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
plan->refcount = 0;
plan->context = plan_context;
plan->is_oneshot = plansource->is_oneshot;
+ plan->is_generic = generic;
plan->is_saved = false;
plan->is_valid = true;
@@ -1153,8 +1159,10 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
- * On return, the plan is valid and we have sufficient locks to begin
- * execution.
+ * On return, the plan is valid, but not all locks are acquired if a cached
+ * generic plan is being reused. In such cases, locks on relations subject
+ * to initial runtime pruning are deferred until the execution startup phase,
+ * specifically when ExecDoInitialPruning() performs initial pruning.
*
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
@@ -1196,7 +1204,7 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
else
{
/* Build a new generic plan */
- plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv, true);
/* Just make real sure plansource->gplan is clear */
ReleaseGenericPlan(plansource);
/* Link the new generic plan into the plansource */
@@ -1241,7 +1249,7 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (customplan)
{
/* Build a custom plan */
- plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv, false);
/* Accumulate total costs of custom plans */
plansource->total_custom_cost += cached_plan_cost(plan, true);
@@ -1776,7 +1784,7 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ int rtindex;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1794,13 +1802,16 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ rtindex = -1;
+ while ((rtindex = bms_next_member(plannedstmt->unprunableRelids,
+ rtindex)) >= 0)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ RangeTblEntry *rte = list_nth_node(RangeTblEntry,
+ plannedstmt->rtable,
+ rtindex - 1);
- if (!(rte->rtekind == RTE_RELATION ||
- (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
- continue;
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid)));
/*
* Acquire the appropriate type of lock on each relation OID. Note
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 3ab0aae78f..21c71e0d53 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -103,8 +103,9 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
- ExplainState *es, const char *queryString,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
+ IntoClause *into, ExplainState *es,
+ const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
const BufferUsage *bufusage,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 1497aed533..ec5cf4233e 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -49,6 +49,8 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
+ * leafpart_rti_map RT index by partition index, or 0 if not a leaf
+ * partition.
* present_parts A Bitmapset of the partition indexes that we
* have subplans or subparts for.
* initial_pruning_steps List of PartitionPruneSteps used to
@@ -69,6 +71,7 @@ typedef struct PartitionedRelPruningData
int nparts;
int *subplan_map;
int *subpart_map;
+ int *leafpart_rti_map;
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
@@ -139,7 +142,8 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
Bitmapset *relids,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ Bitmapset **validsubplan_rtis);
extern PartitionPruneState *ExecCreatePartitionPruneState(EState *estate,
PartitionPruneInfo *pruneinfo);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index 0a7274e26c..0e7245435d 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,7 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ CachedPlan *cplan; /* CachedPlan that supplies the plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +58,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index b0ceb1ab05..ac9be82e19 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -42,6 +42,7 @@
#include "storage/condition_variable.h"
#include "utils/hsearch.h"
#include "utils/queryenvironment.h"
+#include "utils/plancache.h"
#include "utils/reltrigger.h"
#include "utils/sharedtuplestore.h"
#include "utils/snapshot.h"
@@ -639,9 +640,14 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ CachedPlan *es_cachedplan;
List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
List *es_part_prune_states; /* List of PartitionPruneState */
List *es_part_prune_results; /* List of Bitmapset */
+ Bitmapset *es_unpruned_relids; /* PlannedStmt.unprunableRelids + RT
+ * indexes of leaf partitions that
+ * survive initial pruning; see
+ * ExecDoInitialPruning() */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
@@ -1427,6 +1433,12 @@ typedef struct ModifyTableState
double mt_merge_inserted;
double mt_merge_updated;
double mt_merge_deleted;
+
+ /*
+ * List of valid updateColnosLists. Contains only those belonging to
+ * unpruned relations from ModifyTable.updateColnosLists.
+ */
+ List *mt_updateColnosLists;
} ModifyTableState;
/* ----------------
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index c603a9bb1c..ab33b8faf9 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -116,6 +116,14 @@ typedef struct PlannerGlobal
/* "flat" rangetable for executor */
List *finalrtable;
+ /*
+ * RT indexes of all relation RTEs in finalrtable (RTE_RELATION and
+ * RTE_SUBQUERY RTEs of views) and of those that are subject to runtime
+ * pruning at plan initialization time ("initial" pruning).
+ */
+ Bitmapset *allRelids;
+ Bitmapset *prunableRelids;
+
/* "flat" list of RTEPermissionInfos */
List *finalrteperminfos;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index ef89927471..59699a1f86 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -74,6 +74,10 @@ typedef struct PlannedStmt
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *unprunableRelids; /* RT indexes of relations that are not
+ * subject to runtime pruning; set for
+ * AcquireExecutorLocks(). */
+
List *permInfos; /* list of RTEPermissionInfo nodes for rtable
* entries needing one */
@@ -1476,6 +1480,9 @@ typedef struct PartitionedRelPruneInfo
/* subpart index by partition index, or -1 */
int *subpart_map pg_node_attr(array_size(nparts));
+ /* RT index by partition index, or 0 if not a leaf partition */
+ int *leafpart_rti_map pg_node_attr(array_size(nparts));
+
/* relation OID by partition index, or 0 */
Oid *relid_map pg_node_attr(array_size(nparts));
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index a90dfdf906..e227c4f11b 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -149,6 +149,7 @@ typedef struct CachedPlan
int magic; /* should equal CACHEDPLAN_MAGIC */
List *stmt_list; /* list of PlannedStmts */
bool is_oneshot; /* is it a "oneshot" plan? */
+ bool is_generic; /* is it a reusable generic plan? */
bool is_saved; /* is CachedPlan in a long-lived context? */
bool is_valid; /* is the stmt_list currently valid? */
Oid planRoleId; /* Role ID the plan was created for */
@@ -235,4 +236,21 @@ extern bool CachedPlanIsSimplyValid(CachedPlanSource *plansource,
extern CachedExpression *GetCachedExpression(Node *expr);
extern void FreeCachedExpression(CachedExpression *cexpr);
+/*
+ * CachedPlanRequiresLocking: should the executor acquire additional locks?
+ *
+ * If the plan is a saved generic plan, the executor must acquire locks for
+ * relations that are not covered by AcquireExecutorLocks(), such as partitions
+ * that are subject to initial runtime pruning.
+ *
+ * Note: These locks are unnecessary if the plan is executed immediately after
+ * its creation, since the planner would have already acquired them. However,
+ * we do not optimize for that case.
+ */
+static inline bool
+CachedPlanRequiresLocking(CachedPlan *cplan)
+{
+ return !cplan->is_oneshot && cplan->is_generic;
+}
+
#endif /* PLANCACHE_H */
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 7a03b4e360..705cd922fc 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -4440,3 +4440,47 @@ drop table hp_contradict_test;
drop operator class part_test_int4_ops2 using hash;
drop operator ===(int4, int4);
drop function explain_analyze(text);
+-- Runtime pruning on UPDATE using WITH CHECK OPTION and RETURNING
+create table part_abc (a int, b text, c bool) partition by list (a);
+create table part_abc_1 (b text, a int, c bool);
+create table part_abc_2 (a int, c bool, b text);
+alter table part_abc attach partition part_abc_1 for values in (1);
+alter table part_abc attach partition part_abc_2 for values in (2);
+insert into part_abc values (1, 'b', true);
+insert into part_abc values (2, 'c', true);
+create view part_abc_view as select * from part_abc where b <> 'a' with check option;
+prepare update_part_abc_view as update part_abc_view set b = $2 where a = $1 returning *;
+explain (costs off) execute update_part_abc_view (1, 'd');
+ QUERY PLAN
+-------------------------------------------------------
+ Update on part_abc
+ Update on part_abc_1
+ -> Append
+ Subplans Removed: 1
+ -> Seq Scan on part_abc_1
+ Filter: ((b <> 'a'::text) AND (a = $1))
+(6 rows)
+
+execute update_part_abc_view (1, 'd');
+ a | b | c
+---+---+---
+ 1 | d | t
+(1 row)
+
+explain (costs off) execute update_part_abc_view (2, 'a');
+ QUERY PLAN
+-------------------------------------------------------
+ Update on part_abc
+ Update on part_abc_2 part_abc_1
+ -> Append
+ Subplans Removed: 1
+ -> Seq Scan on part_abc_2 part_abc_1
+ Filter: ((b <> 'a'::text) AND (a = $1))
+(6 rows)
+
+execute update_part_abc_view (2, 'a');
+ERROR: new row violates check option for view "part_abc_view"
+DETAIL: Failing row contains (2, a, t).
+deallocate update_part_abc_view;
+drop view part_abc_view;
+drop table part_abc;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 442428d937..af26ad2fb2 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -1339,3 +1339,21 @@ drop operator class part_test_int4_ops2 using hash;
drop operator ===(int4, int4);
drop function explain_analyze(text);
+
+-- Runtime pruning on UPDATE using WITH CHECK OPTION and RETURNING
+create table part_abc (a int, b text, c bool) partition by list (a);
+create table part_abc_1 (b text, a int, c bool);
+create table part_abc_2 (a int, c bool, b text);
+alter table part_abc attach partition part_abc_1 for values in (1);
+alter table part_abc attach partition part_abc_2 for values in (2);
+insert into part_abc values (1, 'b', true);
+insert into part_abc values (2, 'c', true);
+create view part_abc_view as select * from part_abc where b <> 'a' with check option;
+prepare update_part_abc_view as update part_abc_view set b = $2 where a = $1 returning *;
+explain (costs off) execute update_part_abc_view (1, 'd');
+execute update_part_abc_view (1, 'd');
+explain (costs off) execute update_part_abc_view (2, 'a');
+execute update_part_abc_view (2, 'a');
+deallocate update_part_abc_view;
+drop view part_abc_view;
+drop table part_abc;
--
2.43.0
Attachment: v57-0001-Move-PartitionPruneInfo-out-of-plan-nodes-into-P.patch (application/octet-stream)
From 0e7344da196e8f20ebe46c5b8104720e1e3725fa Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Wed, 23 Oct 2024 15:37:32 +0900
Subject: [PATCH v57 1/5] Move PartitionPruneInfo out of plan nodes into
PlannedStmt
This change moves PartitionPruneInfo from individual plan nodes to
PlannedStmt, enabling runtime initial pruning to be performed across
the entire plan tree without traversing it to find nodes containing
PartitionPruneInfos.
The PartitionPruneInfo pointer fields in Append and MergeAppend nodes
have been replaced with an integer index that points to a list of
PartitionPruneInfos within PlannedStmt, which now holds the
PartitionPruneInfos for all subqueries.
A bitmapset field has been added to PartitionPruneInfo to store the RT
indexes that correspond to the apprelids field in Append or
MergeAppend. This ensures that the execution pruning logic
cross-checks that it operates on the correct plan node.
Duplicated code in set_append_references() and
set_mergeappend_references() has been moved to a new function,
register_partpruneinfo(), which both updates the RT indexes by adding
rtoffset and adds the PartitionPruneInfo to the global list in
PlannerGlobal.
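
In short, Append/MergeAppend nodes no longer point at their pruning
info; they store an index into the PlannedStmt's list, which the
executor resolves and cross-checks. A condensed sketch of what the
executor-side lookup now amounts to (the real code is in the
execPartition.c hunk below):

    /* Before: the plan node carried the pointer directly. */
    PartitionPruneInfo *pruneinfo = node->part_prune_info;

    /* After: resolve part_prune_index against the list that InitPlan()
     * copies from PlannedStmt.partPruneInfos into the EState, then
     * verify it belongs to this plan node. */
    PartitionPruneInfo *pruneinfo =
        list_nth_node(PartitionPruneInfo, estate->es_part_prune_infos,
                      node->part_prune_index);
    if (!bms_equal(node->apprelids, pruneinfo->relids))
        elog(ERROR, "mismatching PartitionPruneInfo at index %d",
             node->part_prune_index);
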
Reviewed-by: Alvaro Herrera
Reviewed-by: Robert Haas
---
src/backend/executor/execMain.c | 1 +
src/backend/executor/execParallel.c | 1 +
src/backend/executor/execPartition.c | 19 +++++-
src/backend/executor/execUtils.c | 1 +
src/backend/executor/nodeAppend.c | 5 +-
src/backend/executor/nodeMergeAppend.c | 5 +-
src/backend/optimizer/plan/createplan.c | 23 +++----
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 85 ++++++++++++++++---------
src/backend/partitioning/partprune.c | 19 ++++--
src/include/executor/execPartition.h | 4 +-
src/include/nodes/execnodes.h | 1 +
src/include/nodes/pathnodes.h | 6 ++
src/include/nodes/plannodes.h | 16 +++--
src/include/partitioning/partprune.h | 8 +--
15 files changed, 133 insertions(+), 62 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index cc9a594cba..c460c6aa32 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -851,6 +851,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos);
estate->es_plannedstmt = plannedstmt;
+ estate->es_part_prune_infos = plannedstmt->partPruneInfos;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index bfb3419efb..b01a2fdfdd 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -181,6 +181,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
pstmt->planTree = plan;
+ pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
pstmt->permInfos = estate->es_rteperminfos;
pstmt->resultRelations = NIL;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 7651886229..323d5330ff 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1786,6 +1786,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* Initialize data structure needed for run-time partition pruning and
* do initial pruning if needed
*
+ * 'relids' identifies the relation to which both the parent plan and the
+ * PartitionPruneInfo given by 'part_prune_index' belong.
+ *
* On return, *initially_valid_subplans is assigned the set of indexes of
* child subplans that must be initialized along with the parent plan node.
* Initial pruning is performed here if needed and in that case only the
@@ -1798,11 +1801,25 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
PartitionPruneState *
ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
+ Bitmapset *relids,
Bitmapset **initially_valid_subplans)
{
PartitionPruneState *prunestate;
EState *estate = planstate->state;
+ PartitionPruneInfo *pruneinfo;
+
+ /* Obtain the pruneinfo we need, and make sure it's the right one */
+ pruneinfo = list_nth_node(PartitionPruneInfo, estate->es_part_prune_infos,
+ part_prune_index);
+ if (!bms_equal(relids, pruneinfo->relids))
+ ereport(ERROR,
+ errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg_internal("mismatching PartitionPruneInfo found at part_prune_index %d",
+ part_prune_index),
+ errdetail_internal("plan node relids %s, pruneinfo relids %s",
+ bmsToString(relids),
+ bmsToString(pruneinfo->relids)));
/* We may need an expression context to evaluate partition exprs */
ExecAssignExprContext(estate, planstate);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 740e8fb148..bc905a0cdc 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -118,6 +118,7 @@ CreateExecutorState(void)
estate->es_rowmarks = NULL;
estate->es_rteperminfos = NIL;
estate->es_plannedstmt = NULL;
+ estate->es_part_prune_infos = NIL;
estate->es_junkFilter = NULL;
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index ca0f54d676..de7ebab5c2 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -134,7 +134,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_begun = false;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -145,7 +145,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&appendstate->ps,
list_length(node->appendplans),
- node->part_prune_info,
+ node->part_prune_index,
+ node->apprelids,
&validsubplans);
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index e1b9b984a7..3ed91808dd 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -82,7 +82,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.ExecProcNode = ExecMergeAppend;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -93,7 +93,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&mergestate->ps,
list_length(node->mergeplans),
- node->part_prune_info,
+ node->part_prune_index,
+ node->apprelids,
&validsubplans);
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index f2ed0d81f6..fafcb8f1ad 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1227,7 +1227,6 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
ListCell *subpaths;
int nasyncplans = 0;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
int nodenumsortkeys = 0;
AttrNumber *nodeSortColIdx = NULL;
Oid *nodeSortOperators = NULL;
@@ -1378,6 +1377,9 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ plan->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1401,16 +1403,14 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
}
if (prunequal != NIL)
- partpruneinfo =
- make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ plan->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
plan->appendplans = subplans;
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
- plan->part_prune_info = partpruneinfo;
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -1449,7 +1449,6 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
List *subplans = NIL;
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1542,6 +1541,9 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ node->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1557,13 +1559,12 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
Assert(best_path->path.param_info == NULL);
if (prunequal != NIL)
- partpruneinfo = make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ node->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
node->mergeplans = subplans;
- node->part_prune_info = partpruneinfo;
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 0f423e9684..cce226fff1 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -553,6 +553,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->dependsOnRole = glob->dependsOnRole;
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
+ result->partPruneInfos = glob->partPruneInfos;
result->rtable = glob->finalrtable;
result->permInfos = glob->finalrteperminfos;
result->resultRelations = glob->resultRelations;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 91c7c4fe2f..8deb012d8e 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1732,6 +1732,47 @@ set_customscan_references(PlannerInfo *root,
cscan->custom_relids = offset_relid_set(cscan->custom_relids, rtoffset);
}
+/*
+ * register_partpruneinfo
+ * Subroutine for set_append_references and set_mergeappend_references
+ *
+ * Add the PartitionPruneInfo from root->partPruneInfos at the given index
+ * into PlannerGlobal->partPruneInfos and return its index there.
+ *
+ * Also update the RT indexes present in PartitionedRelPruneInfos to add the
+ * offset.
+ */
+static int
+register_partpruneinfo(PlannerInfo *root, int part_prune_index, int rtoffset)
+{
+ PlannerGlobal *glob = root->glob;
+ PartitionPruneInfo *pinfo;
+ ListCell *l;
+
+ Assert(part_prune_index >= 0 &&
+ part_prune_index < list_length(root->partPruneInfos));
+ pinfo = list_nth_node(PartitionPruneInfo, root->partPruneInfos,
+ part_prune_index);
+
+ pinfo->relids = offset_relid_set(pinfo->relids, rtoffset);
+ foreach(l, pinfo->prune_infos)
+ {
+ List *prune_infos = lfirst(l);
+ ListCell *l2;
+
+ foreach(l2, prune_infos)
+ {
+ PartitionedRelPruneInfo *prelinfo = lfirst(l2);
+
+ prelinfo->rtindex += rtoffset;
+ }
+ }
+
+ glob->partPruneInfos = lappend(glob->partPruneInfos, pinfo);
+
+ return list_length(glob->partPruneInfos) - 1;
+}
+
/*
* set_append_references
* Do set_plan_references processing on an Append
@@ -1784,21 +1825,13 @@ set_append_references(PlannerInfo *root,
aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
- if (aplan->part_prune_info)
- {
- foreach(l, aplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * Add PartitionPruneInfo, if any, to PlannerGlobal and update the index.
+ * Also update the RT indexes present in it to add the offset.
+ */
+ if (aplan->part_prune_index >= 0)
+ aplan->part_prune_index =
+ register_partpruneinfo(root, aplan->part_prune_index, rtoffset);
/* We don't need to recurse to lefttree or righttree ... */
Assert(aplan->plan.lefttree == NULL);
@@ -1860,21 +1893,13 @@ set_mergeappend_references(PlannerInfo *root,
mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
- if (mplan->part_prune_info)
- {
- foreach(l, mplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * Add PartitionPruneInfo, if any, to PlannerGlobal and update the index.
+ * Also update the RT indexes present in it to add the offset.
+ */
+ if (mplan->part_prune_index >= 0)
+ mplan->part_prune_index =
+ register_partpruneinfo(root, mplan->part_prune_index, rtoffset);
/* We don't need to recurse to lefttree or righttree ... */
Assert(mplan->plan.lefttree == NULL);
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 9a1a7faac7..6f0ead1fa8 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -207,16 +207,20 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
/*
* make_partition_pruneinfo
- * Builds a PartitionPruneInfo which can be used in the executor to allow
- * additional partition pruning to take place. Returns NULL when
- * partition pruning would be useless.
+ * Checks if the given set of quals can be used to build pruning steps
+ * that the executor can use to prune away unneeded partitions. If
+ * suitable quals are found then a PartitionPruneInfo is built and tagged
+ * onto the PlannerInfo's partPruneInfos list.
+ *
+ * The return value is the 0-based index of the item added to the
+ * partPruneInfos list or -1 if nothing was added.
*
* 'parentrel' is the RelOptInfo for an appendrel, and 'subpaths' is the list
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
*/
-PartitionPruneInfo *
+int
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
List *prunequal)
@@ -330,10 +334,11 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* quals, then we can just not bother with run-time pruning.
*/
if (prunerelinfos == NIL)
- return NULL;
+ return -1;
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
+ pruneinfo->relids = bms_copy(parentrel->relids);
pruneinfo->prune_infos = prunerelinfos;
/*
@@ -356,7 +361,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
else
pruneinfo->other_subplans = NULL;
- return pruneinfo;
+ root->partPruneInfos = lappend(root->partPruneInfos, pruneinfo);
+
+ return list_length(root->partPruneInfos) - 1;
}
/*
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index c09bc83b2a..ed2b019c09 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -123,9 +123,9 @@ typedef struct PartitionPruneState
extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
+ Bitmapset *relids,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
bool initial_prune);
-
#endif /* EXECPARTITION_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index e4698a28c4..5deed9232a 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -639,6 +639,7 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index add0f9e45f..c603a9bb1c 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -128,6 +128,9 @@ typedef struct PlannerGlobal
/* "flat" list of AppendRelInfos */
List *appendRelations;
+ /* List of PartitionPruneInfo contained in the plan */
+ List *partPruneInfos;
+
/* OIDs of relations the plan depends on */
List *relationOids;
@@ -559,6 +562,9 @@ struct PlannerInfo
/* Does this query modify any partition key columns? */
bool partColsUpdated;
+
+ /* PartitionPruneInfos added in this query's plan. */
+ List *partPruneInfos;
};
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 52f29bcdb6..ef89927471 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -69,6 +69,9 @@ typedef struct PlannedStmt
struct Plan *planTree; /* tree of Plan nodes */
+ List *partPruneInfos; /* List of PartitionPruneInfo contained in the
+ * plan */
+
List *rtable; /* list of RangeTblEntry nodes */
List *permInfos; /* list of RTEPermissionInfo nodes for rtable
@@ -276,8 +279,8 @@ typedef struct Append
*/
int first_partial_plan;
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} Append;
/* ----------------
@@ -311,8 +314,8 @@ typedef struct MergeAppend
/* NULLS FIRST/LAST directions */
bool *nullsFirst pg_node_attr(array_size(numCols));
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} MergeAppend;
/* ----------------
@@ -1414,6 +1417,10 @@ typedef struct PlanRowMark
* Then, since an Append-type node could have multiple partitioning
* hierarchies among its children, we have an unordered List of those Lists.
*
+ * relids RelOptInfo.relids of the parent plan node (e.g. Append
+ * or MergeAppend) to which this PartitionPruneInfo node
+ * belongs. The pruning logic ensures that this matches
+ * the parent plan node's apprelids.
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
@@ -1426,6 +1433,7 @@ typedef struct PartitionPruneInfo
pg_node_attr(no_equal, no_query_jumble)
NodeTag type;
+ Bitmapset *relids;
List *prune_infos;
Bitmapset *other_subplans;
} PartitionPruneInfo;
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index bd490d154f..6922e04430 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -70,10 +70,10 @@ typedef struct PartitionPruneContext
#define PruneCxtStateIdx(partnatts, step_id, keyno) \
((partnatts) * (step_id) + (keyno))
-extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
- struct RelOptInfo *parentrel,
- List *subpaths,
- List *prunequal);
+extern int make_partition_pruneinfo(struct PlannerInfo *root,
+ struct RelOptInfo *parentrel,
+ List *subpaths,
+ List *prunequal);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
--
2.43.0
Attachment: v57-0005-Handle-CachedPlan-invalidation-in-the-executor.patch (application/octet-stream)
From 7423b46ffe28f0def985f49932704504c71ca8e1 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Fri, 18 Oct 2024 22:06:02 +0900
Subject: [PATCH v57 5/5] Handle CachedPlan invalidation in the executor
This commit makes changes to handle cases where a cached plan
becomes invalid after deferred locks on prunable relations are taken.
InitPlan() now returns immediately without doing anything after
finding that the locks taken by ExecDoInitialPruning() have
invalidated the CachedPlan.
ExecutorStartExt(), a wrapper over ExecutorStart(), is added to
handle cases where InitPlan() returns early due to plan invalidation.
ExecutorStartExt() updates the CachedPlan to create fresh plans for
all queries contained it its owning CachedPlanSource and
retries execution with the new plan for the query. This new function
is only called by sites that use plancache.c for getting a plan.
To update an invalid CachedPlan, ExecutorStartExt() calls the new
plancache.c function UpdateCachedPlan(), which creates fresh plans
for each query in the CachedPlanSource and replaces the old stale
ones in CachedPlan.stmt_list in place. This causes the old plans to
leak into CachedPlan.plan_context, but UpdateCachedPlan() should be
called rarely enough that the leaked memory never amounts to much.
This also adds isolation tests using the delay_execution test module
to verify scenarios where a CachedPlan becomes invalid before the
deferred locks are taken.
All ExecutorStart_hook implementations must now add the following
block after calling standard_ExecutorStart() (or the previous hook) to
ensure they don't continue working with an invalid plan:
/* The plan may have become invalid during ExecutorStart() */
if (!ExecPlanStillValid(queryDesc->estate))
return;
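
For illustration, a hook then follows the same shape as the
auto_explain and pg_stat_statements changes below (my_ExecutorStart
and prev_ExecutorStart are placeholder names, not part of the patch):

    static void
    my_ExecutorStart(QueryDesc *queryDesc, int eflags)
    {
        if (prev_ExecutorStart)
            prev_ExecutorStart(queryDesc, eflags);
        else
            standard_ExecutorStart(queryDesc, eflags);

        /* The plan may have become invalid during ExecutorStart() */
        if (!ExecPlanStillValid(queryDesc->estate))
            return;

        /* ... any per-hook startup work goes here ... */
    }
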
Reviewed-by: Robert Haas
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
---
contrib/auto_explain/auto_explain.c | 4 +
.../pg_stat_statements/pg_stat_statements.c | 4 +
src/backend/commands/explain.c | 8 +-
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 10 +-
src/backend/commands/trigger.c | 14 +
src/backend/executor/README | 35 ++-
src/backend/executor/execMain.c | 103 ++++++-
src/backend/executor/execUtils.c | 1 +
src/backend/executor/spi.c | 19 +-
src/backend/tcop/postgres.c | 4 +-
src/backend/tcop/pquery.c | 20 +-
src/backend/utils/cache/plancache.c | 125 +++++++-
src/backend/utils/mmgr/portalmem.c | 4 +-
src/include/commands/explain.h | 1 +
src/include/commands/trigger.h | 1 +
src/include/executor/executor.h | 16 +
src/include/nodes/execnodes.h | 1 +
src/include/utils/plancache.h | 25 ++
src/include/utils/portal.h | 4 +-
src/test/modules/delay_execution/Makefile | 3 +-
.../modules/delay_execution/delay_execution.c | 63 +++-
.../expected/cached-plan-inval.out | 282 ++++++++++++++++++
src/test/modules/delay_execution/meson.build | 1 +
.../specs/cached-plan-inval.spec | 80 +++++
25 files changed, 787 insertions(+), 42 deletions(-)
create mode 100644 src/test/modules/delay_execution/expected/cached-plan-inval.out
create mode 100644 src/test/modules/delay_execution/specs/cached-plan-inval.spec
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index 677c135f59..9eb5e9a619 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -300,6 +300,10 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
else
standard_ExecutorStart(queryDesc, eflags);
+ /* The plan may have become invalid during standard_ExecutorStart() */
+ if (!ExecPlanStillValid(queryDesc->estate))
+ return;
+
if (auto_explain_enabled())
{
/*
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 21b26b7b6e..0bddcf8a48 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -997,6 +997,10 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
else
standard_ExecutorStart(queryDesc, eflags);
+ /* The plan may have become invalid during standard_ExecutorStart() */
+ if (!ExecPlanStillValid(queryDesc->estate))
+ return;
+
/*
* If query has queryId zero, don't track it. This prevents double
* counting of optimizable statements that are directly contained in
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index b699089bd8..07781ce915 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -515,7 +515,8 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, NULL, -1, into, es, queryString, params,
+ queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
}
@@ -624,6 +625,7 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
*/
void
ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
+ CachedPlanSource *plansource, int query_index,
IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
@@ -694,8 +696,8 @@ ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
if (into)
eflags |= GetIntoRelEFlags(into);
- /* call ExecutorStart to prepare the plan for execution */
- ExecutorStart(queryDesc, eflags);
+ /* Call ExecutorStartExt to prepare the plan for execution. */
+ ExecutorStartExt(queryDesc, eflags, plansource, query_index);
/* Execute the plan for statistics if asked for */
if (es->analyze)
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 4f6acf6719..4b1503c05e 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -107,6 +107,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ NULL,
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 311b9ebd5b..4cd79a6e3a 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -202,7 +202,8 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
- cplan);
+ cplan,
+ entry->plansource);
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
@@ -583,6 +584,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
MemoryContextCounters mem_counters;
MemoryContext planner_ctx = NULL;
MemoryContext saved_ctx = NULL;
+ int i = 0;
if (es->memory)
{
@@ -655,8 +657,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, cplan, into, es, query_string, paramLI,
- queryEnv,
+ ExplainOnePlan(pstmt, cplan, entry->plansource, i,
+ into, es, query_string, paramLI, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
else
@@ -668,6 +670,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Separate plans with an appropriate separator */
if (lnext(plan_list, p) != NULL)
ExplainSeparatePlans(es);
+
+ i++;
}
if (estate)
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 09356e46d1..79572ec8f1 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -5123,6 +5123,20 @@ AfterTriggerEndQuery(EState *estate)
afterTriggers.query_depth--;
}
+/* ----------
+ * AfterTriggerAbortQuery()
+ *
+ * Called by ExecutorEnd() if the query execution was aborted due to the
+ * plan becoming invalid during initialization.
+ * ----------
+ */
+void
+AfterTriggerAbortQuery(void)
+{
+ /* Revert the actions of AfterTriggerBeginQuery(). */
+ afterTriggers.query_depth--;
+}
+
/*
* AfterTriggerFreeQuery
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 642d63be61..c76a00b394 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -280,6 +280,28 @@ are typically reset to empty once per tuple. Per-tuple contexts are usually
associated with ExprContexts, and commonly each PlanState node has its own
ExprContext to evaluate its qual and targetlist expressions in.
+Relation Locking
+----------------
+
+Typically, when the executor initializes a plan tree for execution, it doesn't
+lock non-index relations if the plan tree is freshly generated and not derived
+from a CachedPlan. This is because such locks have already been established
+during the query's parsing, rewriting, and planning phases. However, with a
+cached plan tree, some relations may remain unlocked. The function
+AcquireExecutorLocks() only locks unprunable relations in the plan, deferring
+the locking of prunable ones to executor initialization. This avoids
+unnecessary locking of relations that will be pruned during "initial" runtime
+pruning in ExecDoInitialPruning().
+
+This approach creates a window where a cached plan tree with child tables
+could become outdated if another backend modifies these tables before
+ExecDoInitialPruning() locks them. As a result, the executor has the added duty
+to verify the plan tree's validity whenever it locks a child table after
+doing initial pruning. This validation is done by checking the CachedPlan.is_valid
+attribute. If the plan tree is outdated (is_valid=false), the executor halts
+further initialization, cleans up anything in EState that would have been
+allocated up to that point, and retries execution after recreating the
+invalid plan in the CachedPlan.
Query Processing Control Flow
-----------------------------
@@ -288,11 +310,13 @@ This is a sketch of control flow for full query processing:
CreateQueryDesc
- ExecutorStart
+ ExecutorStart or ExecutorStartExt
CreateExecutorState
creates per-query context
- switch to per-query context to run ExecInitNode
+ switch to per-query context to run ExecDoInitialPruning and ExecInitNode
AfterTriggerBeginQuery
+ ExecDoInitialPruning
+ does initial pruning and locks surviving partitions if needed
ExecInitNode --- recursively scans plan tree
ExecInitNode
recurse into subsidiary nodes
@@ -316,7 +340,12 @@ This is a sketch of control flow for full query processing:
FreeQueryDesc
-Per above comments, it's not really critical for ExecEndNode to free any
+As mentioned in the "Relation Locking" section, if the plan tree is found to
+be stale after locking partitions in ExecDoInitialPruning(), control is
+immediately returned to ExecutorStartExt(), which will create a new plan tree
+and perform the steps starting from CreateExecutorState() again.
+
+Per above comments, it's not really critical for ExecEndPlan to free any
memory; it'll all go away in FreeExecutorState anyway. However, we do need to
be careful to close relations, drop buffer pins, etc, so we do need to scan
the plan state tree to find these sorts of resources.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ed783236eb..5427bdfd4c 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -60,6 +60,7 @@
#include "utils/backend_status.h"
#include "utils/lsyscache.h"
#include "utils/partcache.h"
+#include "utils/plancache.h"
#include "utils/rls.h"
#include "utils/snapmgr.h"
@@ -138,6 +139,63 @@ ExecutorStart(QueryDesc *queryDesc, int eflags)
standard_ExecutorStart(queryDesc, eflags);
}
+/*
+ * ExecutorStartExt
+ * Start query execution, replanning if the plan is invalidated due to
+ * locks taken during initialization, which can occur when the plan is
+ * from a CachedPlan.
+ *
+ * This function is a variant of ExecutorStart() that handles cases where
+ * the CachedPlan might become invalid during initialization, particularly
+ * when prunable relations are locked. If locks taken during ExecutorStart()
+ * invalidate the plan, the function calls UpdateCachedPlan() to replan all
+ * queries in the CachedPlan, including the query at query_index, and then
+ * retries initialization.
+ *
+ * The function repeats the process until ExecutorStart() successfully
+ * initializes the query at query_index with a valid plan. If invalidation
+ * occurs, the current execution state is cleaned up by calling ExecutorEnd(),
+ * and the plan is updated by UpdateCachedPlan(). The loop exits once the
+ * query is successfully initialized with a valid CachedPlan.
+ */
+void
+ExecutorStartExt(QueryDesc *queryDesc, int eflags,
+ CachedPlanSource *plansource,
+ int query_index)
+{
+ if (queryDesc->cplan == NULL)
+ {
+ ExecutorStart(queryDesc, eflags);
+ return;
+ }
+
+ /*
+ * For a CachedPlan, locks acquired during ExecutorStart() may invalidate it.
+ * Therefore, we must loop and retry with an updated plan until no further
+ * invalidation occurs.
+ */
+ while (1)
+ {
+ ExecutorStart(queryDesc, eflags);
+ if (!CachedPlanValid(queryDesc->cplan))
+ {
+ /*
+ * Clean up the current execution state before creating the new
+ * plan to retry ExecutorStart(). Mark execution as aborted to
+ * ensure that AFTER trigger state is properly reset.
+ */
+ queryDesc->estate->es_aborted = true;
+ ExecutorEnd(queryDesc);
+
+ queryDesc->plannedstmt = UpdateCachedPlan(plansource, query_index,
+ queryDesc->queryEnv);
+ }
+ else
+ /* Exit the loop if the plan is initialized successfully. */
+ break;
+ }
+}
+
void
standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
@@ -321,6 +379,7 @@ standard_ExecutorRun(QueryDesc *queryDesc,
estate = queryDesc->estate;
Assert(estate != NULL);
+ Assert(!estate->es_aborted);
Assert(!(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
/* caller must ensure the query's snapshot is active */
@@ -427,8 +486,11 @@ standard_ExecutorFinish(QueryDesc *queryDesc)
Assert(estate != NULL);
Assert(!(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
- /* This should be run once and only once per Executor instance */
- Assert(!estate->es_finished);
+ /*
+ * This should be run once and only once per Executor instance and never
+ * if the execution was aborted.
+ */
+ Assert(!estate->es_finished && !estate->es_aborted);
/* Switch into per-query memory context */
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
@@ -487,11 +549,10 @@ standard_ExecutorEnd(QueryDesc *queryDesc)
Assert(estate != NULL);
/*
- * Check that ExecutorFinish was called, unless in EXPLAIN-only mode. This
- * Assert is needed because ExecutorFinish is new as of 9.1, and callers
- * might forget to call it.
+ * Check that ExecutorFinish was called, unless in EXPLAIN-only mode or if
+ * execution was aborted.
*/
- Assert(estate->es_finished ||
+ Assert(estate->es_finished || estate->es_aborted ||
(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
/*
@@ -505,6 +566,14 @@ standard_ExecutorEnd(QueryDesc *queryDesc)
UnregisterSnapshot(estate->es_snapshot);
UnregisterSnapshot(estate->es_crosscheck_snapshot);
+ /*
+ * Reset AFTER trigger module if the query execution was aborted.
+ */
+ if (estate->es_aborted &&
+ !(estate->es_top_eflags &
+ (EXEC_FLAG_SKIP_TRIGGERS | EXEC_FLAG_EXPLAIN_ONLY)))
+ AfterTriggerAbortQuery();
+
/*
* Must switch out of context before destroying it
*/
@@ -862,6 +931,7 @@ static void
ExecDoInitialPruning(EState *estate)
{
ListCell *lc;
+ List *locked_relids = NIL;
foreach(lc, estate->es_part_prune_infos)
{
@@ -897,6 +967,7 @@ ExecDoInitialPruning(EState *estate)
Assert(rte->rtekind == RTE_RELATION &&
rte->rellockmode != NoLock);
LockRelationOid(rte->relid, rte->rellockmode);
+ locked_relids = lappend_int(locked_relids, rtindex);
}
}
estate->es_unpruned_relids = bms_add_members(estate->es_unpruned_relids,
@@ -906,6 +977,20 @@ ExecDoInitialPruning(EState *estate)
estate->es_part_prune_results = lappend(estate->es_part_prune_results,
validsubplans);
}
+
+ /*
+ * Release the useless locks if the plan won't be executed. This is the
+ * same as what CheckCachedPlan() in plancache.c does.
+ */
+ if (!ExecPlanStillValid(estate))
+ {
+ foreach(lc, locked_relids)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(lfirst_int(lc), estate);
+
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
}
/*
@@ -969,6 +1054,9 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
ExecDoInitialPruning(estate);
+ if (!ExecPlanStillValid(estate))
+ return;
+
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
*/
@@ -2961,6 +3049,9 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
* the snapshot, rangetable, and external Param info. They need their own
* copies of local state, including a tuple table, es_param_exec_vals,
* result-rel info, etc.
+ *
+ * es_cachedplan is not copied because EPQ plan execution does not acquire
+ * any new locks that could invalidate the CachedPlan.
*/
rcestate->es_direction = ForwardScanDirection;
rcestate->es_snapshot = parentestate->es_snapshot;
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index bc905a0cdc..b7c914d66c 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -147,6 +147,7 @@ CreateExecutorState(void)
estate->es_top_eflags = 0;
estate->es_instrument = 0;
estate->es_finished = false;
+ estate->es_aborted = false;
estate->es_exprcontexts = NIL;
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index e2b781e939..70ab0ece1d 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -70,7 +70,8 @@ static int _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
static ParamListInfo _SPI_convert_params(int nargs, Oid *argtypes,
Datum *Values, const char *Nulls);
-static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount);
+static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount,
+ CachedPlanSource *plansource, int query_index);
static void _SPI_error_callback(void *arg);
@@ -1685,7 +1686,8 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
- cplan);
+ cplan,
+ plansource);
/*
* Set up options for portal. Default SCROLL type is chosen the same way
@@ -2500,6 +2502,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
ListCell *lc2;
+ int i = 0;
spicallbackarg.query = plansource->query_string;
@@ -2697,8 +2700,9 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
options->params,
_SPI_current->queryEnv,
0);
- res = _SPI_pquery(qdesc, fire_triggers,
- canSetTag ? options->tcount : 0);
+
+ res = _SPI_pquery(qdesc, fire_triggers, canSetTag ? options->tcount : 0,
+ plansource, i);
FreeQueryDesc(qdesc);
}
else
@@ -2795,6 +2799,8 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
my_res = res;
goto fail;
}
+
+ i++;
}
/* Done with this plan, so release refcount */
@@ -2872,7 +2878,8 @@ _SPI_convert_params(int nargs, Oid *argtypes,
}
static int
-_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
+_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount,
+ CachedPlanSource *plansource, int query_index)
{
int operation = queryDesc->operation;
int eflags;
@@ -2928,7 +2935,7 @@ _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
else
eflags = EXEC_FLAG_SKIP_TRIGGERS;
- ExecutorStart(queryDesc, eflags);
+ ExecutorStartExt(queryDesc, eflags, plansource, query_index);
ExecutorRun(queryDesc, ForwardScanDirection, tcount, true);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 7f5eada9d4..3b98248ad4 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1237,6 +1237,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ NULL,
NULL);
/*
@@ -2039,7 +2040,8 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
- cplan);
+ cplan,
+ psrc);
/* Done with the snapshot used for parameter I/O and parsing/planning */
if (snapshot_set)
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 6e8f6b1b8f..ee5eea4ce1 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -19,6 +19,7 @@
#include "access/xact.h"
#include "commands/prepare.h"
+#include "executor/execdesc.h"
#include "executor/tstoreReceiver.h"
#include "miscadmin.h"
#include "pg_trace.h"
@@ -37,6 +38,8 @@ Portal ActivePortal = NULL;
static void ProcessQuery(PlannedStmt *plan,
CachedPlan *cplan,
+ CachedPlanSource *plansource,
+ int query_index,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -126,6 +129,8 @@ FreeQueryDesc(QueryDesc *qdesc)
*
* plan: the plan tree for the query
* cplan: CachedPlan supplying the plan
+ * plansource: CachedPlanSource supplying the cplan
+ * query_index: index of the query in plansource->query_list
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -139,6 +144,8 @@ FreeQueryDesc(QueryDesc *qdesc)
static void
ProcessQuery(PlannedStmt *plan,
CachedPlan *cplan,
+ CachedPlanSource *plansource,
+ int query_index,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -157,7 +164,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Call ExecutorStart to prepare the plan for execution
*/
- ExecutorStart(queryDesc, 0);
+ ExecutorStartExt(queryDesc, 0, plansource, query_index);
/*
* Run the plan to completion.
@@ -518,9 +525,9 @@ PortalStart(Portal portal, ParamListInfo params,
myeflags = eflags;
/*
- * Call ExecutorStart to prepare the plan for execution
+ * Call ExecutorStartExt() to prepare the plan for execution.
*/
- ExecutorStart(queryDesc, myeflags);
+ ExecutorStartExt(queryDesc, myeflags, portal->plansource, 0);
/*
* This tells PortalCleanup to shut down the executor
@@ -1201,6 +1208,7 @@ PortalRunMulti(Portal portal,
{
bool active_snapshot_set = false;
ListCell *stmtlist_item;
+ int i = 0;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1283,6 +1291,8 @@ PortalRunMulti(Portal portal,
/* statement can set tag string */
ProcessQuery(pstmt,
portal->cplan,
+ portal->plansource,
+ i,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1293,6 +1303,8 @@ PortalRunMulti(Portal portal,
/* stmt added by rewrite cannot set tag */
ProcessQuery(pstmt,
portal->cplan,
+ portal->plansource,
+ i,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1357,6 +1369,8 @@ PortalRunMulti(Portal portal,
*/
if (lnext(portal->stmts, stmtlist_item) != NULL)
CommandCounterIncrement();
+
+ i++;
}
/* Pop the snapshot if we pushed one. */
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 449fb8f4e2..d3e78afd97 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -101,7 +101,8 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ bool release_generic);
static bool CheckCachedPlan(CachedPlanSource *plansource);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv,
@@ -579,10 +580,17 @@ ReleaseGenericPlan(CachedPlanSource *plansource)
* The result value is the transient analyzed-and-rewritten query tree if we
* had to do re-analysis, and NIL otherwise. (This is returned just to save
* a tree copying step in a subsequent BuildCachedPlan call.)
+ *
+ * This also releases and drops the generic plan (plansource->gplan), if any,
+ * as most callers will typically build a new CachedPlan for the plansource
+ * right after this. However, when called from UpdateCachedPlan(), the
+ * function does not release the generic plan, as UpdateCachedPlan() updates
+ * an existing CachedPlan in place.
*/
static List *
RevalidateCachedQuery(CachedPlanSource *plansource,
- QueryEnvironment *queryEnv)
+ QueryEnvironment *queryEnv,
+ bool release_generic)
{
bool snapshot_set;
RawStmt *rawtree;
@@ -679,8 +687,9 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
MemoryContextDelete(qcxt);
}
- /* Drop the generic plan reference if any */
- ReleaseGenericPlan(plansource);
+ /* Drop the generic plan reference, if any, and if requested */
+ if (release_generic)
+ ReleaseGenericPlan(plansource);
/*
* Now re-do parse analysis and rewrite. This not incidentally acquires
@@ -905,6 +914,8 @@ CheckCachedPlan(CachedPlanSource *plansource)
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
* (unless this is a one-shot plan, in which case we don't copy the plan).
+ *
+ * Note: When changing this, you should also look at UpdateCachedPlan().
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
@@ -933,7 +944,7 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
* let's treat it as real and redo the RevalidateCachedQuery call.
*/
if (!plansource->is_valid)
- qlist = RevalidateCachedQuery(plansource, queryEnv);
+ qlist = RevalidateCachedQuery(plansource, queryEnv, true);
/*
* If we don't already have a copy of the querytree list that can be
@@ -1188,7 +1199,7 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
elog(ERROR, "cannot apply ResourceOwner to non-saved cached plan");
/* Make sure the querytree list is valid and we have parse-time locks */
- qlist = RevalidateCachedQuery(plansource, queryEnv);
+ qlist = RevalidateCachedQuery(plansource, queryEnv, true);
/* Decide whether to use a custom plan */
customplan = choose_custom_plan(plansource, boundParams);
@@ -1284,6 +1295,106 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
return plan;
}
+/*
+ * UpdateCachedPlan
+ * Create fresh plans for all the queries in the plansource, replacing
+ * those in the generic plan's stmt_list, and return the plan for the
+ * query_index'th query.
+ *
+ * This function is primarily intended for ExecutorStartExt(), which handles
+ * cases where the original generic CachedPlan becomes invalid when prunable
+ * relations in the old plan for the query_index'th query are locked for
+ * execution.
+ *
+ * Note that even though this function is called due to invalidations received
+ * during the execution of the query_index'th query, they might affect both
+ * queries that have already finished execution (e.g., due to concurrent
+ * modifications on prunable relations that were not locked during their
+ * execution) and those that have not yet executed. Therefore, we must update
+ * all plans to safely set CachedPlan.is_valid to true.
+ */
+
+PlannedStmt *
+UpdateCachedPlan(CachedPlanSource *plansource, int query_index,
+ QueryEnvironment *queryEnv)
+{
+ List *query_list = plansource->query_list,
+ *plan_list;
+ ListCell *l1,
+ *l2;
+ CachedPlan *plan = plansource->gplan;
+ MemoryContext oldcxt;
+
+ Assert(ActiveSnapshotSet());
+
+ /* Sanity checks */
+ if (plan == NULL)
+ elog(ERROR, "UpdateCachedPlan() called in the wrong context: plansource->gplan is NULL");
+ else if (plan->is_valid)
+ elog(ERROR, "UpdateCachedPlan() called in the wrong context: plansource->gplan->is_valid is true");
+
+ /*
+ * The plansource might have become invalid since GetCachedPlan(). See the
+ * comment in BuildCachedPlan() for details on why this might happen.
+ *
+ * The risk of invalidation is higher here than when BuildCachedPlan()
+ * is called from GetCachedPlan(), because this function is called
+ * within the executor, where much more processing could have occurred
+ * since GetCachedPlan() initially returned the CachedPlan.
+ *
+ * Although invalidation is likely a false positive, we make the
+ * plan valid to ensure the query list used for planning is up to date.
+ *
+ * However, plansource->gplan must not be released, as the upstream
+ * callers (such as the callers of ExecutorStartExt()) still reference it.
+ * The freshly created plans will replace any potentially invalid ones in
+ * plansource->gplan->stmt_list.
+ */
+ if (!plansource->is_valid)
+ query_list = RevalidateCachedQuery(plansource, queryEnv, false);
+ Assert(query_list != NIL);
+
+ /*
+ * Build a new generic plan for all the queries, after making a copy
+ * of the query list to be scribbled on by the planner.
+ */
+ query_list = copyObject(query_list);
+
+ /*
+ * Planning work is done in the caller's memory context. The resulting
+ * PlannedStmt is then copied into plan->context.
+ */
+ plan_list = pg_plan_queries(query_list, plansource->query_string,
+ plansource->cursor_options, NULL);
+ Assert(list_length(plan_list) == list_length(plan->stmt_list));
+
+ oldcxt = MemoryContextSwitchTo(plan->context);
+ forboth (l1, plan_list, l2, plan->stmt_list)
+ {
+ PlannedStmt *plannedstmt = lfirst(l1);
+
+ lfirst(l2) = copyObject(plannedstmt);
+ }
+ MemoryContextSwitchTo(oldcxt);
+
+ /*
+ * XXX Should this also (re)set the properties of the CachedPlan that are
+ * set in BuildCachedPlan() after creating the fresh plans such as
+ * planRoleId, dependsOnRole, and save_xmin?
+ */
+
+ /*
+ * We've updated all the plans that might have been invalidated, so mark
+ * the CachedPlan as valid.
+ */
+ plan->is_valid = true;
+
+ /* Also update generic_cost because we just created a new generic plan. */
+ plansource->generic_cost = cached_plan_cost(plan, false);
+
+ return list_nth_node(PlannedStmt, plan->stmt_list, query_index);
+}
+
/*
* ReleaseCachedPlan: release active use of a cached plan.
*
@@ -1662,7 +1773,7 @@ CachedPlanGetTargetList(CachedPlanSource *plansource,
return NIL;
/* Make sure the querytree list is valid and we have parse-time locks */
- RevalidateCachedQuery(plansource, queryEnv);
+ RevalidateCachedQuery(plansource, queryEnv, true);
/* Get the primary statement and find out what it returns */
pstmt = QueryListGetPrimaryStmt(plansource->query_list);
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 93137820ac..ef4791bf65 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -284,7 +284,8 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
- CachedPlan *cplan)
+ CachedPlan *cplan,
+ CachedPlanSource *plansource)
{
Assert(PortalIsValid(portal));
Assert(portal->status == PORTAL_NEW);
@@ -299,6 +300,7 @@ PortalDefineQuery(Portal portal,
portal->commandTag = commandTag;
portal->stmts = stmts;
portal->cplan = cplan;
+ portal->plansource = plansource;
portal->status = PORTAL_DEFINED;
}
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 21c71e0d53..a39989a950 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -104,6 +104,7 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ParamListInfo params, QueryEnvironment *queryEnv);
extern void ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
+ CachedPlanSource *plansource, int plan_index,
IntoClause *into, ExplainState *es,
const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index 8a5a9fe642..db21561c8c 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -258,6 +258,7 @@ extern void ExecASTruncateTriggers(EState *estate,
extern void AfterTriggerBeginXact(void);
extern void AfterTriggerBeginQuery(void);
extern void AfterTriggerEndQuery(EState *estate);
+extern void AfterTriggerAbortQuery(void);
extern void AfterTriggerFireDeferred(void);
extern void AfterTriggerEndXact(bool isCommit);
extern void AfterTriggerBeginSubXact(void);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 69c3ebff00..1270af3be5 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -19,6 +19,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/plancache.h"
/*
@@ -198,6 +199,8 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
* prototypes from functions in execMain.c
*/
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern void ExecutorStartExt(QueryDesc *queryDesc, int eflags,
+ CachedPlanSource *plansource, int query_index);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count, bool execute_once);
@@ -261,6 +264,19 @@ extern void ExecEndNode(PlanState *node);
extern void ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * Is the CachedPlan in es_cachedplan still valid?
+ *
+ * Called from InitPlan() because invalidation messages that affect the plan
+ * might be received after locks have been taken on runtime-prunable relations.
+ * The caller should take appropriate action if the plan has become invalid.
+ */
+static inline bool
+ExecPlanStillValid(EState *estate)
+{
+ return estate->es_cachedplan == NULL ? true :
+ CachedPlanValid(estate->es_cachedplan);
+}
/* ----------------------------------------------------------------
* ExecProcNode
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index ac9be82e19..1ec1021808 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -693,6 +693,7 @@ typedef struct EState
int es_top_eflags; /* eflags passed to ExecutorStart */
int es_instrument; /* OR of InstrumentOption flags */
bool es_finished; /* true when ExecutorFinish is done */
+ bool es_aborted; /* true when execution was aborted */
List *es_exprcontexts; /* List of ExprContexts within EState */
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index e227c4f11b..7b2f3ced26 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -18,6 +18,8 @@
#include "access/tupdesc.h"
#include "lib/ilist.h"
#include "nodes/params.h"
+#include "nodes/parsenodes.h"
+#include "nodes/plannodes.h"
#include "tcop/cmdtag.h"
#include "utils/queryenvironment.h"
#include "utils/resowner.h"
@@ -159,6 +161,12 @@ typedef struct CachedPlan
int generation; /* parent's generation number for this plan */
int refcount; /* count of live references to this struct */
MemoryContext context; /* context containing this CachedPlan */
+
+ /*
+ * If the plan is not associated with a CachedPlanSource, it is saved in
+ * a separate global list.
+ */
+ dlist_node node; /* list link, if is_standalone */
} CachedPlan;
/*
@@ -224,6 +232,10 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
QueryEnvironment *queryEnv);
+extern PlannedStmt *UpdateCachedPlan(CachedPlanSource *plansource,
+ int query_index,
+ QueryEnvironment *queryEnv);
+
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
@@ -253,4 +265,17 @@ CachedPlanRequiresLocking(CachedPlan *cplan)
return !cplan->is_oneshot && cplan->is_generic;
}
+/*
+ * CachedPlanValid
+ * Returns whether a cached generic plan is still valid.
+ *
+ * Invoked by the executor to check if the plan has not been invalidated after
+ * taking locks during the initialization of the plan.
+ */
+static inline bool
+CachedPlanValid(CachedPlan *cplan)
+{
+ return cplan->is_valid;
+}
+
#endif /* PLANCACHE_H */
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index 29f49829f2..58c3828d2c 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,7 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ CachedPlanSource *plansource; /* CachedPlanSource, for cplan */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -241,7 +242,8 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
- CachedPlan *cplan);
+ CachedPlan *cplan,
+ CachedPlanSource *plansource);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
diff --git a/src/test/modules/delay_execution/Makefile b/src/test/modules/delay_execution/Makefile
index 70f24e846d..3eeb097fde 100644
--- a/src/test/modules/delay_execution/Makefile
+++ b/src/test/modules/delay_execution/Makefile
@@ -8,7 +8,8 @@ OBJS = \
delay_execution.o
ISOLATION = partition-addition \
- partition-removal-1
+ partition-removal-1 \
+ cached-plan-inval
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/src/test/modules/delay_execution/delay_execution.c b/src/test/modules/delay_execution/delay_execution.c
index 155c8a8d55..304ca77f7b 100644
--- a/src/test/modules/delay_execution/delay_execution.c
+++ b/src/test/modules/delay_execution/delay_execution.c
@@ -1,14 +1,18 @@
/*-------------------------------------------------------------------------
*
* delay_execution.c
- * Test module to allow delay between parsing and execution of a query.
+ * Test module to introduce delay at various points during execution of a
+ * query to test that execution proceeds safely in light of concurrent
+ * changes.
*
* The delay is implemented by taking and immediately releasing a specified
* advisory lock. If another process has previously taken that lock, the
* current process will be blocked until the lock is released; otherwise,
* there's no effect. This allows an isolationtester script to reliably
- * test behaviors where some specified action happens in another backend
- * between parsing and execution of any desired query.
+ * test behaviors where some specified action happens in another backend in
+ * a couple of cases: 1) between parsing and execution of any desired query
+ * when using the planner_hook, 2) between RevalidateCachedQuery() and
+ * ExecutorStart() when using the ExecutorStart_hook.
*
* Copyright (c) 2020-2024, PostgreSQL Global Development Group
*
@@ -22,6 +26,7 @@
#include <limits.h>
+#include "executor/executor.h"
#include "optimizer/planner.h"
#include "utils/builtins.h"
#include "utils/guc.h"
@@ -32,9 +37,11 @@ PG_MODULE_MAGIC;
/* GUC: advisory lock ID to use. Zero disables the feature. */
static int post_planning_lock_id = 0;
+static int executor_start_lock_id = 0;
-/* Save previous planner hook user to be a good citizen */
+/* Save previous hook users to be a good citizen */
static planner_hook_type prev_planner_hook = NULL;
+static ExecutorStart_hook_type prev_ExecutorStart_hook = NULL;
/* planner_hook function to provide the desired delay */
@@ -70,11 +77,41 @@ delay_execution_planner(Query *parse, const char *query_string,
return result;
}
+/* ExecutorStart_hook function to provide the desired delay */
+static void
+delay_execution_ExecutorStart(QueryDesc *queryDesc, int eflags)
+{
+ /* If enabled, delay by taking and releasing the specified lock */
+ if (executor_start_lock_id != 0)
+ {
+ DirectFunctionCall1(pg_advisory_lock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+ DirectFunctionCall1(pg_advisory_unlock_int8,
+ Int64GetDatum((int64) executor_start_lock_id));
+
+ /*
+ * Ensure that we notice any pending invalidations, since the advisory
+ * lock functions don't do this.
+ */
+ AcceptInvalidationMessages();
+ }
+
+ /* Now start the executor, possibly via a previous hook user */
+ if (prev_ExecutorStart_hook)
+ prev_ExecutorStart_hook(queryDesc, eflags);
+ else
+ standard_ExecutorStart(queryDesc, eflags);
+
+ if (executor_start_lock_id != 0)
+ elog(NOTICE, "Finished ExecutorStart(): CachedPlan is %s",
+ CachedPlanValid(queryDesc->cplan) ? "valid" : "not valid");
+}
+
/* Module load function */
void
_PG_init(void)
{
- /* Set up the GUC to control which lock is used */
+ /* Set up GUCs to control which lock is used */
DefineCustomIntVariable("delay_execution.post_planning_lock_id",
"Sets the advisory lock ID to be locked/unlocked after planning.",
"Zero disables the delay.",
@@ -86,10 +123,22 @@ _PG_init(void)
NULL,
NULL,
NULL);
-
+ DefineCustomIntVariable("delay_execution.executor_start_lock_id",
+ "Sets the advisory lock ID to be locked/unlocked before starting execution.",
+ "Zero disables the delay.",
+ &executor_start_lock_id,
+ 0,
+ 0, INT_MAX,
+ PGC_USERSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
MarkGUCPrefixReserved("delay_execution");
- /* Install our hook */
+ /* Install our hooks. */
prev_planner_hook = planner_hook;
planner_hook = delay_execution_planner;
+ prev_ExecutorStart_hook = ExecutorStart_hook;
+ ExecutorStart_hook = delay_execution_ExecutorStart;
}
diff --git a/src/test/modules/delay_execution/expected/cached-plan-inval.out b/src/test/modules/delay_execution/expected/cached-plan-inval.out
new file mode 100644
index 0000000000..5bfb2b33b3
--- /dev/null
+++ b/src/test/modules/delay_execution/expected/cached-plan-inval.out
@@ -0,0 +1,282 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1prep s2lock s1exec s2dropi s2unlock
+step s1prep: SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1 FOR UPDATE;
+ EXPLAIN (COSTS OFF) EXECUTE q (1);
+QUERY PLAN
+------------------------------------------------
+LockRows
+ -> Append
+ Subplans Removed: 2
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Recheck Cond: (a = $1)
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = $1)
+(7 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); <waiting ...>
+step s2dropi: DROP INDEX foo12_1_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+-------------------------------------
+LockRows
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on foo12_1 foo_1
+ Filter: (a = $1)
+(5 rows)
+
+
+starting permutation: s1prep2 s2lock s1exec2 s2dropi s2unlock
+step s1prep2: SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q2;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Recheck Cond: ((a = one()) OR (a = two()))
+ -> BitmapOr
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = one())
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = two())
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(11 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec2: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; <waiting ...>
+step s2dropi: DROP INDEX foo12_1_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec2: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------------
+Append
+ Subplans Removed: 1
+ -> Seq Scan on foo12_1 foo_1
+ Filter: ((a = one()) OR (a = two()))
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(6 rows)
+
+
+starting permutation: s1prep3 s2lock s1exec3 s2dropi s2unlock
+step s1prep3: SET plan_cache_mode = force_generic_plan;
+ PREPARE q3 AS UPDATE foov SET a = a WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q3;
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+--------------------------------------------------------------
+Nested Loop
+ -> Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Recheck Cond: ((a = one()) OR (a = two()))
+ -> BitmapOr
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = one())
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = two())
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+ -> Materialize
+ -> Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on bar1 bar_1
+ Recheck Cond: (a = one())
+ -> Bitmap Index Scan on bar1_a_idx
+ Index Cond: (a = one())
+
+Update on bar
+ Update on bar1 bar_1
+ -> Nested Loop
+ -> Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Recheck Cond: ((a = one()) OR (a = two()))
+ -> BitmapOr
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = one())
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = two())
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+ -> Materialize
+ -> Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on bar1 bar_1
+ Recheck Cond: (a = one())
+ -> Bitmap Index Scan on bar1_a_idx
+ Index Cond: (a = one())
+
+Update on foo
+ Update on foo12_1 foo_1
+ Update on foo12_2 foo_2
+ -> Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on foo12_1 foo_1
+ Recheck Cond: ((a = one()) OR (a = two()))
+ -> BitmapOr
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = one())
+ -> Bitmap Index Scan on foo12_1_a
+ Index Cond: (a = two())
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(56 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec3: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; <waiting ...>
+step s2dropi: DROP INDEX foo12_1_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec3: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+-------------------------------------------------------------
+Nested Loop
+ -> Append
+ Subplans Removed: 1
+ -> Seq Scan on foo12_1 foo_1
+ Filter: ((a = one()) OR (a = two()))
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+ -> Materialize
+ -> Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on bar1 bar_1
+ Recheck Cond: (a = one())
+ -> Bitmap Index Scan on bar1_a_idx
+ Index Cond: (a = one())
+
+Update on bar
+ Update on bar1 bar_1
+ -> Nested Loop
+ -> Append
+ Subplans Removed: 1
+ -> Seq Scan on foo12_1 foo_1
+ Filter: ((a = one()) OR (a = two()))
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+ -> Materialize
+ -> Append
+ Subplans Removed: 1
+ -> Bitmap Heap Scan on bar1 bar_1
+ Recheck Cond: (a = one())
+ -> Bitmap Index Scan on bar1_a_idx
+ Index Cond: (a = one())
+
+Update on foo
+ Update on foo12_1 foo_1
+ Update on foo12_2 foo_2
+ -> Append
+ Subplans Removed: 1
+ -> Seq Scan on foo12_1 foo_1
+ Filter: ((a = one()) OR (a = two()))
+ -> Seq Scan on foo12_2 foo_2
+ Filter: ((a = one()) OR (a = two()))
+(41 rows)
+
+
+starting permutation: s1prep4 s2lock s1exec4 s2dropi s2unlock
+step s1prep4: SET plan_cache_mode = force_generic_plan;
+ SET enable_seqscan TO off;
+ PREPARE q4 AS SELECT * FROM generate_series(1, 1) WHERE EXISTS (SELECT * FROM foov WHERE a = $1 FOR UPDATE);
+ EXPLAIN (COSTS OFF) EXECUTE q4 (1);
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------------------------------------------------
+Result
+ One-Time Filter: (InitPlan 1).col1
+ InitPlan 1
+ -> LockRows
+ -> Append
+ Subplans Removed: 2
+ -> Index Scan using foo12_1_a on foo12_1 foo_1
+ Index Cond: (a = $1)
+ -> Function Scan on generate_series
+(9 rows)
+
+step s2lock: SELECT pg_advisory_lock(12345);
+pg_advisory_lock
+----------------
+
+(1 row)
+
+step s1exec4: LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q4 (1); <waiting ...>
+step s2dropi: DROP INDEX foo12_1_a;
+step s2unlock: SELECT pg_advisory_unlock(12345);
+pg_advisory_unlock
+------------------
+t
+(1 row)
+
+step s1exec4: <... completed>
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is not valid
+s1: NOTICE: Finished ExecutorStart(): CachedPlan is valid
+QUERY PLAN
+---------------------------------------------
+Result
+ One-Time Filter: (InitPlan 1).col1
+ InitPlan 1
+ -> LockRows
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on foo12_1 foo_1
+ Disabled: true
+ Filter: (a = $1)
+ -> Function Scan on generate_series
+(10 rows)
+
diff --git a/src/test/modules/delay_execution/meson.build b/src/test/modules/delay_execution/meson.build
index 41f3ac0b89..5a70b183d0 100644
--- a/src/test/modules/delay_execution/meson.build
+++ b/src/test/modules/delay_execution/meson.build
@@ -24,6 +24,7 @@ tests += {
'specs': [
'partition-addition',
'partition-removal-1',
+ 'cached-plan-inval',
],
},
}
diff --git a/src/test/modules/delay_execution/specs/cached-plan-inval.spec b/src/test/modules/delay_execution/specs/cached-plan-inval.spec
new file mode 100644
index 0000000000..f27e8fb521
--- /dev/null
+++ b/src/test/modules/delay_execution/specs/cached-plan-inval.spec
@@ -0,0 +1,80 @@
+# Test to check that invalidation of cached generic plans during ExecutorStart
+# correctly triggers replanning and re-execution.
+
+setup
+{
+ CREATE TABLE foo (a int, b text) PARTITION BY LIST(a);
+ CREATE TABLE foo12 PARTITION OF foo FOR VALUES IN (1, 2) PARTITION BY LIST (a);
+ CREATE TABLE foo12_1 PARTITION OF foo12 FOR VALUES IN (1);
+ CREATE TABLE foo12_2 PARTITION OF foo12 FOR VALUES IN (2);
+ CREATE INDEX foo12_1_a ON foo12_1 (a);
+ CREATE TABLE foo3 PARTITION OF foo FOR VALUES IN (3);
+ CREATE VIEW foov AS SELECT * FROM foo;
+ CREATE FUNCTION one () RETURNS int AS $$ BEGIN RETURN 1; END; $$ LANGUAGE PLPGSQL STABLE;
+ CREATE FUNCTION two () RETURNS int AS $$ BEGIN RETURN 2; END; $$ LANGUAGE PLPGSQL STABLE;
+ CREATE TABLE bar (a int, b text) PARTITION BY LIST(a);
+ CREATE TABLE bar1 PARTITION OF bar FOR VALUES IN (1);
+ CREATE INDEX ON bar1(a);
+ CREATE TABLE bar2 PARTITION OF bar FOR VALUES IN (2);
+ CREATE RULE update_foo AS ON UPDATE TO foo DO ALSO UPDATE bar SET a = a WHERE a = one();
+ CREATE RULE update_bar AS ON UPDATE TO bar DO ALSO SELECT 1;
+}
+
+teardown
+{
+ DROP VIEW foov;
+ DROP RULE update_foo ON foo;
+ DROP TABLE foo, bar;
+ DROP FUNCTION one(), two();
+}
+
+session "s1"
+# Append with run-time pruning
+step "s1prep" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q AS SELECT * FROM foov WHERE a = $1 FOR UPDATE;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+
+# Another case with Append with run-time pruning
+step "s1prep2" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q2 AS SELECT * FROM foov WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+
+# Case with a rule adding another query
+step "s1prep3" { SET plan_cache_mode = force_generic_plan;
+ PREPARE q3 AS UPDATE foov SET a = a WHERE a = one() or a = two();
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+
+# Another case with Append with run-time pruning in a subquery
+step "s1prep4" { SET plan_cache_mode = force_generic_plan;
+ SET enable_seqscan TO off;
+ PREPARE q4 AS SELECT * FROM generate_series(1, 1) WHERE EXISTS (SELECT * FROM foov WHERE a = $1 FOR UPDATE);
+ EXPLAIN (COSTS OFF) EXECUTE q4 (1); }
+
+# Executes a generic plan
+step "s1exec" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q (1); }
+step "s1exec2" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q2; }
+step "s1exec3" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q3; }
+step "s1exec4" { LOAD 'delay_execution';
+ SET delay_execution.executor_start_lock_id = 12345;
+ EXPLAIN (COSTS OFF) EXECUTE q4 (1); }
+
+session "s2"
+step "s2lock" { SELECT pg_advisory_lock(12345); }
+step "s2unlock" { SELECT pg_advisory_unlock(12345); }
+step "s2dropi" { DROP INDEX foo12_1_a; }
+
+# While "s1exec", etc. wait to acquire the advisory lock, "s2dropi" is able to
+# drop the index being used in the cached plan. When "s1exec" is then
+# unblocked and initializes the cached plan for execution, it detects the
+# concurrent index drop and causes the cached plan to be discarded and
+# recreated without the index.
+permutation "s1prep" "s2lock" "s1exec" "s2dropi" "s2unlock"
+permutation "s1prep2" "s2lock" "s1exec2" "s2dropi" "s2unlock"
+permutation "s1prep3" "s2lock" "s1exec3" "s2dropi" "s2unlock"
+permutation "s1prep4" "s2lock" "s1exec4" "s2dropi" "s2unlock"
--
2.43.0
Hi,
I took a look at this patch, mostly to familiarize myself with the
pruning etc. I have a bunch of comments, but all of them are minor,
perhaps even nitpicking - with prior feedback from David, Tom and
Robert, I can't really compete with that.
FWIW the patch needs a rebase, there's minor bitrot - but it was
simple enough to fix for review / testing.
0001
----
1) But if we don't expect this error to actually happen, do we really
need to make it ereport()? Maybe it should be plain elog(). I mean, it's
"can't happen" and thus doesn't need translations etc.
    if (!bms_equal(relids, pruneinfo->relids))
        ereport(ERROR,
                errcode(ERRCODE_INTERNAL_ERROR),
                errmsg_internal("mismatching PartitionPruneInfo found at part_prune_index %d",
                                part_prune_index),
                errdetail_internal("plan node relids %s, pruneinfo relids %s",
                                   bmsToString(relids),
                                   bmsToString(pruneinfo->relids)));
Perhaps it should even be an assert?
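I mean something like this, using the same variables as in the quoted
snippet (just a sketch):

    if (!bms_equal(relids, pruneinfo->relids))
        elog(ERROR, "mismatching PartitionPruneInfo found at part_prune_index %d (plan node relids %s, pruneinfo relids %s)",
             part_prune_index, bmsToString(relids),
             bmsToString(pruneinfo->relids));

or, if it truly can't happen:

    Assert(bms_equal(relids, pruneinfo->relids));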
2) unnecessary newline added to execPartition.h
3) this comment in EState doesn't seem very helpful
List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
5) PlannerGlobal
/* List of PartitionPruneInfo contained in the plan */
List *partPruneInfos;
Why does this say "contained in the plan" unlike the other fields? Is
there some sort of difference? I'm not saying it's wrong.
0002
----
1) Isn't it weird/undesirable that partkey_datum_from_expr() loses some of
the asserts? Would the assert be incorrect in the new implementation, or
are we removing it simply because we happen to not have one of the fields?
2) inconsistent spelling: run-time vs. runtime
3) PartitionPruneContext.is_valid - I think I'd rename the flag to
"initialized" or something like that. The "is_valid" is a bit confusing,
because it might seem the context can get invalidated later, but AFAICS
that's not the case - we just initialize it lazily.
0003
----
1) In InitPlan I'd move
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
before the comment, which is more about ExecDoInitialPruning.
2) I'm not quite sure what "exec" partition pruning is?
/*
* ExecInitPartitionPruning
* Initialize the data structures needed for runtime "exec" partition
* pruning and return the result of initial pruning, if available.
Is that the same thing as "runtime pruning"?
0004
----
1) typo: paraller/parallel
2) What about adding an assert to ExecFindMatchingSubPlans, to check
validsubplan_rtis is not NULL? It's just mentioned in a comment, but
better to explicitly enforce that? (See the sketch after this list.)
3) It may not be quite clear why ExecInitUpdateProjection() switches to
mt_updateColnosLists. Should that be explained in a comment somewhere?
4) unnecessary newline in ExecLookupResultRelByOid
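For the assert in 2), I mean something like the following at the top of
ExecFindMatchingSubPlans (just a sketch - assuming the field is spelled
validsubplan_rtis and hangs off the PartitionPruneState argument):

    Assert(prunestate->validsubplan_rtis != NULL);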
0005
----
1) auto_explain.c - So what happens if the plan gets invalidated? The
hook explain_ExecutorStart returns early, but then what? Does that break
the user session somehow, or what?
2) Isn't it a bit fragile that this requires every extension to be
updated to add ExecPlanStillValid() calls in various places? What if an
extension doesn't do that? What weirdness will happen? Maybe it'd be
possible to at least check this in some other executor hook? Or at least
we could ensure the check was done in assert-enabled builds? Or
something to make extension authors aware of this?
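To illustrate - AFAICS an extension's ExecutorStart hook would need to
follow roughly this pattern (a rough sketch; the my_ext_* names are made
up, only ExecPlanStillValid() comes from the patch):

    static void
    my_ext_ExecutorStart(QueryDesc *queryDesc, int eflags)
    {
        if (prev_ExecutorStart)
            prev_ExecutorStart(queryDesc, eflags);
        else
            standard_ExecutorStart(queryDesc, eflags);

        /*
         * The plan may have been invalidated while InitPlan() was taking
         * locks on prunable relations. If so, skip the per-query setup;
         * ExecutorStartExt() will replan and call this hook again with
         * the fresh plan.
         */
        if (!ExecPlanStillValid(queryDesc->estate))
            return;

        my_ext_setup(queryDesc);    /* made-up per-query work */
    }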
Aside from going through the patches, I ran a simple benchmark to see
how this works in practice: pgbench -S with a variable number of
partitions/clients. I also varied the number of locks per transaction,
because I was wondering whether it might interact with the fast-path
improvements. See the attached xeon.sh script and the CSV with results
from the 44/88-core machine.
There are also two PDFs visualizing the results, showing the impact as a
difference between the "master" (no patches) and "pruning" (v57 applied)
builds. As usual, green is good (faster), red is bad (slower).
For most combinations of parameters, there's no impact on throughput.
Anything in 99-101% is just regular noise, possibly even more. I'm
trying to reduce the noise a bit more, but this seems acceptable. I'd
like to discuss three "cases" I see in the results:
1) bad #1
IIRC the patch should not affect results for "force_custom_plan" cache
mode (and "auto", which does mostly the same thing, I think). And for
most runs that's true, with results ~100% of master. But there are a
couple of curious exceptions - e.g. results for 0 partitions and 16 locks
show a consistent regression of ~10% (in the "-M prepared" mode).
I'm not terribly worried about this because it only shows for 16 locks,
and the default is 64. If someone reduces this GUC value, they should
expect some impact.
Still, it only shows up in the "auto" case. I wonder why that is. Strange.
2) bad #2
There's also a similar regression in the "force_generic_plan" mode
without partitions (with "-M prepared"). This seems more consistent and
affects
all the lock counts.
3) good
There's an area of massive improvements (in the 2-200x range) with 100+
partitions. The fast-path patch helped a bit, but this is much better,
of course.
costing / auto mode
-------------------
Anyway, this leads me to a related question - not quite a "bug" in the
patch, but something to perhaps think about. And that's costing, and
what "auto" should do.
There are two PNG charts, showing throughput for runs with -M prepared
and 1000 partitions. Each chart shows throughput for the three cache
modes, and different client counts. There's a clear distinction between
"master" and "patched" runs - the "generic" plans performed terribly, by
orders of magnitude. With the patches it beats the "custom" plans.
Which is great! But it also means that while "auto" used to do the right
thing, with the patches that's not the case.
AFAIK that's because we don't consider the runtime pruning when costing
the plans, so the cost is calculated as if no pruning happened. And so
it seems way more expensive than it should be ... and it loses to the
custom plans. Is that correct, or do I understand this wrong?
Just to be clear, I'm not claiming the patch has to deal with this. I
suppose it can be handled as a future improvement, and I'm not even sure
there's a good way to consider this during costing. For example, can we
estimate how many partitions will be pruned?
regards
--
Tomas Vondra
Attachments:
xeon-complete.pdfapplication/pdf; name=xeon-complete.pdfDownload