Support run-time partition pruning for hash join
If we have a hash join with an Append node on the outer side, something
like
   Hash Join
     Hash Cond: (pt.a = t.a)
     ->  Append
           ->  Seq Scan on pt_p1 pt_1
           ->  Seq Scan on pt_p2 pt_2
           ->  Seq Scan on pt_p3 pt_3
     ->  Hash
           ->  Seq Scan on t
We can actually prune those subnodes of the Append that cannot possibly
contain any matching tuples from the other side of the join. To do
that, when building the Hash table, for each row from the inner side we
can compute the minimum set of subnodes that can possibly match the join
condition. By the time the Hash table is built and we start executing the
Append node, we know which subnodes survive and can skip the rest.
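To make the idea concrete, here is a minimal standalone sketch (not code from the patch). It assumes range partitions described by a hypothetical RangePart struct and represents the set of subnodes as a 32-bit mask; the names parts_matching_key and surviving_parts are made up for illustration. Each inner-side key contributes the partitions it could match, and the union over all inner rows is the set of subnodes the Append still has to scan.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a range partition holding keys in [lo, hi). */
typedef struct RangePart
{
    int lo;
    int hi;
} RangePart;

/*
 * Return a bitmask with bit i set if partition i can contain 'key'.
 * Assumes at most 32 partitions.
 */
static uint32_t
parts_matching_key(const RangePart *parts, size_t nparts, int key)
{
    uint32_t mask = 0;

    assert(nparts <= 32);
    for (size_t i = 0; i < nparts; i++)
    {
        if (key >= parts[i].lo && key < parts[i].hi)
            mask |= (uint32_t) 1 << i;
    }
    return mask;
}

/*
 * Union the per-row results over every inner-side key, the way the Hash
 * build would while it loads rows into the hash table.
 */
static uint32_t
surviving_parts(const RangePart *parts, size_t nparts,
                const int *inner_keys, size_t nkeys)
{
    uint32_t survivors = 0;

    for (size_t k = 0; k < nkeys; k++)
        survivors |= parts_matching_key(parts, nparts, inner_keys[k]);
    return survivors;
}
```

Note that an empty inner side leaves no bits set, so every subnode can be skipped.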
This kind of partition pruning can be extended to happen across multiple
join levels. For instance,
   Hash Join
     Hash Cond: (pt.a = t2.a)
     ->  Hash Join
           Hash Cond: (pt.a = t1.a)
           ->  Append
                 ->  Seq Scan on pt_p1 pt_1
                 ->  Seq Scan on pt_p2 pt_2
                 ->  Seq Scan on pt_p3 pt_3
           ->  Hash
                 ->  Seq Scan on t1
     ->  Hash
           ->  Seq Scan on t2
We can compute the matching subnodes of the Append when building the Hash
table for 't1' according to the join condition 'pt.a = t1.a', and again
when building the Hash table for 't2' according to the join condition
'pt.a = t2.a'. The final surviving subnodes are the intersection of the two.
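As a rough illustration of the multi-level case (a sketch only, not the patch's code), suppose each join level has already produced a bitmask of subnodes that can match its own join condition. The helper name intersect_join_prune_results and the 32-bit mask representation are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Start from "all subnodes are valid" and intersect in the result of each
 * join level. A subnode survives only if every join level could match it.
 */
static uint32_t
intersect_join_prune_results(uint32_t all_parts,
                             const uint32_t *per_join, size_t njoins)
{
    uint32_t survivors = all_parts;

    assert(per_join != NULL || njoins == 0);
    for (size_t j = 0; j < njoins; j++)
        survivors &= per_join[j];
    return survivors;
}
```

With three subnodes (mask 0x7), a first join that keeps subnodes {0, 1} and a second join that keeps {1, 2} leave only subnode 1 to be scanned.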
Greenplum [1] has implemented this kind of partition pruning as
'Partition Selector'. Attached is a patch that refactors Greenplum's
implementation to make it work on PostgreSQL master. Here are some
details about the patch.
During planning:
1. When creating a hash join plan in create_hashjoin_plan() we first
collect information required to build PartitionPruneInfos at this
join, which includes the join's RestrictInfos and the join's inner
relids, and put this information in a stack.
2. When we call create_append_plan() for an appendrel, for each of the
joins we check whether join partition pruning is possible for this
appendrel, based on the information collected at that join, and if so
build a PartitionPruneInfo and add it to the stack entry.
3. After finishing the outer side of the hash join, we should have built
all the PartitionPruneInfos that can be used to perform join
partition pruning at this join. So we pop the stack entry to get
the PartitionPruneInfos and add them to the Hash node.
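The three planning steps above can be sketched as a small stack discipline. This is a simplified model rather than the planner code: Candidate, CandidateStack and the three functions are hypothetical names, and the sketch assumes that every enclosing join is able to prune the appendrel, whereas the patch checks this per join. Each registration hands out a fresh parameter id that is recorded both in the join's stack entry and in the Append's set of ids.

```c
#include <assert.h>

#define MAX_DEPTH 8
#define MAX_INFOS 8

/* Hypothetical per-join stack entry: which param ids this join will fill. */
typedef struct Candidate
{
    int join_id;
    int paramids[MAX_INFOS];
    int ninfos;
} Candidate;

typedef struct CandidateStack
{
    Candidate entries[MAX_DEPTH];
    int depth;
} CandidateStack;

/* Step 1: entering a hash join pushes an empty candidate. */
static void
push_candidate(CandidateStack *s, int join_id)
{
    assert(s->depth < MAX_DEPTH);
    s->entries[s->depth].join_id = join_id;
    s->entries[s->depth].ninfos = 0;
    s->depth++;
}

/*
 * Step 2: an appendrel on the outer side registers itself with every join
 * currently on the stack. Returns the bitmask of param ids that the Append
 * node should consult at execution time.
 */
static unsigned
register_appendrel(CandidateStack *s, int *next_paramid)
{
    unsigned append_paramids = 0;

    for (int i = 0; i < s->depth; i++)
    {
        Candidate *c = &s->entries[i];
        int paramid = (*next_paramid)++;

        assert(c->ninfos < MAX_INFOS && paramid < 32);
        c->paramids[c->ninfos++] = paramid;
        append_paramids |= 1u << paramid;
    }
    return append_paramids;
}

/* Step 3: once the outer side is planned, pop the entry for the Hash node. */
static Candidate
pop_candidate(CandidateStack *s)
{
    assert(s->depth > 0);
    return s->entries[--s->depth];
}
```

Because the entry is popped before the inner side is planned, an Append below the Hash never registers with the join whose hash table it feeds.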
During execution:
When building the hash table for a hash join, we perform partition
pruning for each row according to each of the JoinPartitionPruneStates
at this join, and store each result in a special executor parameter to
make it available to Append nodes. When executing an Append node, we
can directly use the pre-computed pruning results to skip subnodes that
cannot contain any matching rows.
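The hand-off between the Hash node and the Append node can be modelled roughly as below. This is a sketch under simplifying assumptions: PruneState and ExecParams are hypothetical stand-ins for the executor structures, subnode sets are 32-bit masks, and a slot that has not been filled yet is simply ignored (the patch emits a WARNING in that case).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_PARAMS 4

/* Hypothetical per-join pruning state accumulated by the Hash node. */
typedef struct PruneState
{
    int paramid;
    uint32_t result;    /* union of subnodes matched by inner rows so far */
} PruneState;

/* Hypothetical executor parameter slots; NULL means "not computed yet". */
typedef struct ExecParams
{
    const PruneState *slots[NUM_PARAMS];
} ExecParams;

/* Called once per inner row while the hash table is being built. */
static void
hash_accumulate_row(PruneState *st, uint32_t row_matches)
{
    st->result |= row_matches;
}

/* Called once after the build finishes, to publish the result. */
static void
hash_publish(ExecParams *params, const PruneState *st)
{
    assert(st->paramid >= 0 && st->paramid < NUM_PARAMS);
    params->slots[st->paramid] = st;
}

/* Called by the Append before it picks its first subnode. */
static uint32_t
append_valid_subplans(const ExecParams *params,
                      const int *paramids, size_t nparamids,
                      unsigned nplans)
{
    uint32_t valid;

    assert(nplans > 0 && nplans < 32);
    valid = ((uint32_t) 1 << nplans) - 1;

    for (size_t i = 0; i < nparamids; i++)
    {
        const PruneState *st = params->slots[paramids[i]];

        if (st != NULL)
            valid &= st->result;
    }
    return valid;
}
```

The ordering matters: the Append can only benefit if the Hash node has been executed first, which is why the empty-outer optimization in the hash join interacts with this feature.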
Here is a query that shows the effect of join partition pruning.
CREATE TABLE pt (a int, b int, c varchar) PARTITION BY RANGE(a);
CREATE TABLE pt_p1 PARTITION OF pt FOR VALUES FROM (0) TO (250);
CREATE TABLE pt_p2 PARTITION OF pt FOR VALUES FROM (250) TO (500);
CREATE TABLE pt_p3 PARTITION OF pt FOR VALUES FROM (500) TO (600);
INSERT INTO pt SELECT i, i % 25, to_char(i, 'FM0000') FROM
generate_series(0, 599) i WHERE i % 2 = 0;
CREATE TABLE t1 (a int, b int);
INSERT INTO t1 values (10, 10);
CREATE TABLE t2 (a int, b int);
INSERT INTO t2 values (300, 300);
ANALYZE pt, t1, t2;
SET enable_nestloop TO off;
explain (analyze, costs off, summary off, timing off)
select * from pt join t1 on pt.a = t1.a right join t2 on pt.a = t2.a;
                         QUERY PLAN
------------------------------------------------------------
 Hash Right Join (actual rows=1 loops=1)
   Hash Cond: (pt.a = t2.a)
   ->  Hash Join (actual rows=0 loops=1)
         Hash Cond: (pt.a = t1.a)
         ->  Append (actual rows=0 loops=1)
               ->  Seq Scan on pt_p1 pt_1 (never executed)
               ->  Seq Scan on pt_p2 pt_2 (never executed)
               ->  Seq Scan on pt_p3 pt_3 (never executed)
         ->  Hash (actual rows=1 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 9kB
               ->  Seq Scan on t1 (actual rows=1 loops=1)
   ->  Hash (actual rows=1 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Seq Scan on t2 (actual rows=1 loops=1)
(14 rows)
There are several points that need more consideration.
1. All the join partition pruning decisions are made in createplan.c,
after the best path tree has already been decided. This is not great.
Maybe it's better to make it happen when we build up the path tree, so
that we can take the partition pruning into consideration when
estimating the costs.
2. In order to make the join partition pruning take effect, the patch
hacks the empty-outer optimization in ExecHashJoinImpl(). Not sure
if this is a good practice.
3. This patch does not support parallel hash join yet. But it's not
hard to add the support.
4. Is it possible and worthwhile to extend the join partition pruning
mechanism to support nestloop and mergejoin as well?
Any thoughts or comments?
[1]: https://github.com/greenplum-db/gpdb
Thanks
Richard
Attachments:
v1-0001-Support-run-time-partition-pruning-for-hash-join.patch (application/octet-stream)
From 1b464d52d118a3b68734f076b5fd5e77413dcc50 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Mon, 14 Aug 2023 14:55:26 +0800
Subject: [PATCH v1] Support run-time partition pruning for hash join
If we have a hash join with an Append node on the outer side, something
like
   Hash Join
     Hash Cond: (pt.a = t.a)
     ->  Append
           ->  Seq Scan on pt_p1 pt_1
           ->  Seq Scan on pt_p2 pt_2
           ->  Seq Scan on pt_p3 pt_3
     ->  Hash
           ->  Seq Scan on t
We can actually prune those subnodes of the Append that cannot possibly
contain any matching tuples from the other side of the join. To do
that, when building the Hash table, for each row from the inner side we
can compute the minimum set of subnodes that can possibly match the join
condition. By the time the Hash table is built and we start executing the
Append node, we know which subnodes survive and can skip the rest.
This patch implements this idea.
---
.../postgres_fdw/expected/postgres_fdw.out | 8 +-
src/backend/commands/explain.c | 61 ++++
src/backend/executor/execPartition.c | 125 +++++++-
src/backend/executor/nodeAppend.c | 32 ++-
src/backend/executor/nodeHash.c | 59 ++++
src/backend/executor/nodeHashjoin.c | 10 +
src/backend/executor/nodeMergeAppend.c | 22 +-
src/backend/optimizer/plan/createplan.c | 45 ++-
src/backend/optimizer/plan/setrefs.c | 61 ++++
src/backend/partitioning/partprune.c | 269 ++++++++++++++++--
src/include/executor/execPartition.h | 15 +-
src/include/nodes/execnodes.h | 3 +
src/include/nodes/pathnodes.h | 3 +
src/include/nodes/plannodes.h | 28 ++
src/include/partitioning/partprune.h | 10 +-
src/test/regress/expected/partition_prune.out | 68 +++++
src/test/regress/sql/partition_prune.sql | 18 ++
17 files changed, 786 insertions(+), 51 deletions(-)
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 77df7eb8e4..e8463e8919 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8617,6 +8617,7 @@ update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *;
Output: 1, "*VALUES*".*, "*VALUES*".column1, utrtest.tableoid, utrtest.ctid, utrtest.*
Hash Cond: (utrtest.a = "*VALUES*".column1)
-> Append
+ Join Partition Pruning: $1
-> Foreign Scan on public.remp utrtest_1
Output: utrtest_1.a, utrtest_1.tableoid, utrtest_1.ctid, utrtest_1.*
Remote SQL: SELECT a, b, ctid FROM public.loct FOR UPDATE
@@ -8624,9 +8625,10 @@ update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *;
Output: utrtest_2.a, utrtest_2.tableoid, utrtest_2.ctid, NULL::record
-> Hash
Output: "*VALUES*".*, "*VALUES*".column1
+ Partition Prune: $1
-> Values Scan on "*VALUES*"
Output: "*VALUES*".*, "*VALUES*".column1
-(18 rows)
+(20 rows)
update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *;
ERROR: cannot route tuples into foreign table to be updated "remp"
@@ -8674,6 +8676,7 @@ update utrtest set a = 3 from (values (2), (3)) s(x) where a = s.x returning *;
Output: 3, "*VALUES*".*, "*VALUES*".column1, utrtest.tableoid, utrtest.ctid, (NULL::record)
Hash Cond: (utrtest.a = "*VALUES*".column1)
-> Append
+ Join Partition Pruning: $1
-> Seq Scan on public.locp utrtest_1
Output: utrtest_1.a, utrtest_1.tableoid, utrtest_1.ctid, NULL::record
-> Foreign Scan on public.remp utrtest_2
@@ -8681,9 +8684,10 @@ update utrtest set a = 3 from (values (2), (3)) s(x) where a = s.x returning *;
Remote SQL: SELECT a, b, ctid FROM public.loct FOR UPDATE
-> Hash
Output: "*VALUES*".*, "*VALUES*".column1
+ Partition Prune: $1
-> Values Scan on "*VALUES*"
Output: "*VALUES*".*, "*VALUES*".column1
-(18 rows)
+(20 rows)
update utrtest set a = 3 from (values (2), (3)) s(x) where a = s.x returning *; -- ERROR
ERROR: cannot route tuples into foreign table to be updated "remp"
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 8570b14f62..e244c93ff5 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -18,6 +18,7 @@
#include "commands/createas.h"
#include "commands/defrem.h"
#include "commands/prepare.h"
+#include "executor/execPartition.h"
#include "executor/nodeHash.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
@@ -118,6 +119,9 @@ static void show_instrumentation_count(const char *qlabel, int which,
PlanState *planstate, ExplainState *es);
static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
static void show_eval_params(Bitmapset *bms_params, ExplainState *es);
+static void show_join_pruning_result_info(Bitmapset *join_prune_paramids,
+ ExplainState *es);
+static void show_joinpartprune_info(HashState *hashstate, ExplainState *es);
static const char *explain_get_index_name(Oid indexId);
static void show_buffer_usage(ExplainState *es, const BufferUsage *usage,
bool planning);
@@ -2049,9 +2053,17 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_incremental_sort_info(castNode(IncrementalSortState, planstate),
es);
break;
+ case T_Append:
+ if (es->verbose)
+ show_join_pruning_result_info(((Append *) plan)->join_prune_paramids,
+ es);
+ break;
case T_MergeAppend:
show_merge_append_keys(castNode(MergeAppendState, planstate),
ancestors, es);
+ if (es->verbose)
+ show_join_pruning_result_info(((MergeAppend *) plan)->join_prune_paramids,
+ es);
break;
case T_Result:
show_upper_qual((List *) ((Result *) plan)->resconstantqual,
@@ -2067,6 +2079,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
break;
case T_Hash:
show_hash_info(castNode(HashState, planstate), es);
+ if (es->verbose)
+ show_joinpartprune_info(castNode(HashState, planstate), es);
break;
case T_Memoize:
show_memoize_info(castNode(MemoizeState, planstate), ancestors,
@@ -3507,6 +3521,53 @@ show_eval_params(Bitmapset *bms_params, ExplainState *es)
ExplainPropertyList("Params Evaluated", params, es);
}
+/*
+ * Show join partition pruning results at Append/MergeAppend nodes.
+ */
+static void
+show_join_pruning_result_info(Bitmapset *join_prune_paramids, ExplainState *es)
+{
+ int paramid = -1;
+ List *params = NIL;
+
+ if (bms_is_empty(join_prune_paramids))
+ return;
+
+ while ((paramid = bms_next_member(join_prune_paramids, paramid)) >= 0)
+ {
+ char param[32];
+
+ snprintf(param, sizeof(param), "$%d", paramid);
+ params = lappend(params, pstrdup(param));
+ }
+
+ ExplainPropertyList("Join Partition Pruning", params, es);
+}
+
+/*
+ * Show join partition pruning infos at Hash nodes.
+ */
+static void
+show_joinpartprune_info(HashState *hashstate, ExplainState *es)
+{
+ List *params = NIL;
+ ListCell *lc;
+
+ if (!hashstate->joinpartprune_state_list)
+ return;
+
+ foreach(lc, hashstate->joinpartprune_state_list)
+ {
+ JoinPartitionPruneState *jpstate = (JoinPartitionPruneState *) lfirst(lc);
+ char param[32];
+
+ snprintf(param, sizeof(param), "$%d", jpstate->paramid);
+ params = lappend(params, pstrdup(param));
+ }
+
+ ExplainPropertyList("Partition Prune", params, es);
+}
+
/*
* Fetch the name of an index in an EXPLAIN
*
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index eb8a87fd63..543107a3a7 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -199,6 +199,8 @@ static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
Bitmapset **validsubplans);
+static bool get_join_prune_matching_subplans(PlanState *planstate,
+ Bitmapset **partset);
/*
@@ -1806,7 +1808,7 @@ ExecInitPartitionPruning(PlanState *planstate,
* Perform an initial partition prune pass, if required.
*/
if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true, NULL);
else
{
/* No pruning, so we'll need to initialize all subplans */
@@ -1836,6 +1838,35 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecInitJoinpartpruneList
+ * Initialize data structures needed for join partition pruning
+ */
+List *
+ExecInitJoinpartpruneList(PlanState *planstate,
+ List *joinpartprune_info_list)
+{
+ ListCell *lc;
+ List *result = NIL;
+
+ foreach(lc, joinpartprune_info_list)
+ {
+ JoinPartitionPruneInfo *jpinfo = (JoinPartitionPruneInfo *) lfirst(lc);
+ JoinPartitionPruneState *jpstate = palloc(sizeof(JoinPartitionPruneState));
+
+ jpstate->part_prune_state =
+ CreatePartitionPruneState(planstate, jpinfo->part_prune_info);
+ Assert(jpstate->part_prune_state->do_exec_prune);
+
+ jpstate->paramid = jpinfo->paramid;
+ jpstate->part_prune_result = NULL;
+
+ result = lappend(result, jpstate);
+ }
+
+ return result;
+}
+
/*
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
@@ -2268,7 +2299,9 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
/*
* ExecFindMatchingSubPlans
* Determine which subplans match the pruning steps detailed in
- * 'prunestate' for the current comparison expression values.
+ * 'prunestate' if any for the current comparison expression values, and
+ * meanwhile match the join partition pruning results if any stored in
+ * Append/MergeAppend node's join_prune_paramids.
*
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
@@ -2276,11 +2309,30 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ PlanState *planstate)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
int i;
+ Bitmapset *join_prune_partset = NULL;
+ bool do_join_prune;
+
+ /* Retrieve the join partition pruning results if any */
+ do_join_prune =
+ get_join_prune_matching_subplans(planstate, &join_prune_partset);
+
+ /*
+ * Either we're here on partition prune done according to the pruning steps
+ * detailed in 'prunestate', or we have done join partition prune.
+ */
+ Assert(do_join_prune || prunestate != NULL);
+
+ /*
+ * If there is no 'prunestate', then rely entirely on join pruning.
+ */
+ if (prunestate == NULL)
+ return join_prune_partset;
/*
* Either we're here on the initial prune done during pruning
@@ -2321,6 +2373,10 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Add in any subplans that partition pruning didn't account for */
result = bms_add_members(result, prunestate->other_subplans);
+ /* Intersect join partition pruning results */
+ if (do_join_prune)
+ result = bms_intersect(result, join_prune_partset);
+
MemoryContextSwitchTo(oldcontext);
/* Copy result out of the temp context before we reset it */
@@ -2391,3 +2447,66 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
}
}
}
+
+/*
+ * get_join_prune_matching_subplans
+ * Retrieve the join partition pruning results if any stored in
+ * Append/MergeAppend node's join_prune_paramids. Return true if we can
+ * do join partition pruning, otherwise return false.
+ *
+ * Adds valid (non-prunable) subplan IDs to *partset
+ */
+static bool
+get_join_prune_matching_subplans(PlanState *planstate, Bitmapset **partset)
+{
+ Bitmapset *join_prune_paramids;
+ int nplans;
+ int paramid;
+
+ if (planstate == NULL)
+ return false;
+
+ if (IsA(planstate, AppendState))
+ {
+ join_prune_paramids =
+ ((Append *) planstate->plan)->join_prune_paramids;
+ nplans = ((AppendState *) planstate)->as_nplans;
+ }
+ else if (IsA(planstate, MergeAppendState))
+ {
+ join_prune_paramids =
+ ((MergeAppend *) planstate->plan)->join_prune_paramids;
+ nplans = ((MergeAppendState *) planstate)->ms_nplans;
+ }
+ else
+ {
+ elog(ERROR, "unrecognized node type: %d", (int) nodeTag(planstate));
+ return false;
+ }
+
+ if (bms_is_empty(join_prune_paramids))
+ return false;
+
+ Assert(nplans > 0);
+ *partset = bms_add_range(NULL, 0, nplans - 1);
+
+ paramid = -1;
+ while ((paramid = bms_next_member(join_prune_paramids, paramid)) >= 0)
+ {
+ ParamExecData *param;
+ JoinPartitionPruneState *jpstate;
+
+ param = &(planstate->state->es_param_exec_vals[paramid]);
+ Assert(param->execPlan == NULL);
+ Assert(!param->isnull);
+ jpstate = (JoinPartitionPruneState *) DatumGetPointer(param->value);
+
+ if (jpstate != NULL)
+ *partset = bms_intersect(*partset, jpstate->part_prune_result);
+ else /* the Hash node for this pruning has not been executed */
+ elog(WARNING, "Join partition pruning $%d has not been performed yet.",
+ paramid);
+ }
+
+ return true;
+}
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 609df6b9e6..c8dd8583d2 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -151,11 +151,13 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
nplans = bms_num_members(validsubplans);
/*
- * When no run-time pruning is required and there's at least one
- * subplan, we can fill as_valid_subplans immediately, preventing
- * later calls to ExecFindMatchingSubPlans.
+ * When no run-time pruning or join pruning is required and there's at
+ * least one subplan, we can fill as_valid_subplans immediately,
+ * preventing later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (!prunestate->do_exec_prune &&
+ bms_is_empty(node->join_prune_paramids) &&
+ nplans > 0)
{
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
appendstate->as_valid_subplans_identified = true;
@@ -170,10 +172,18 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplans as valid; they must also all be initialized.
*/
Assert(nplans > 0);
- appendstate->as_valid_subplans = validsubplans =
- bms_add_range(NULL, 0, nplans - 1);
- appendstate->as_valid_subplans_identified = true;
+ validsubplans = bms_add_range(NULL, 0, nplans - 1);
appendstate->as_prune_state = NULL;
+
+ /*
+ * When join pruning is not enabled we can fill as_valid_subplans
+ * immediately, preventing later calls to ExecFindMatchingSubPlans.
+ */
+ if (bms_is_empty(node->join_prune_paramids))
+ {
+ appendstate->as_valid_subplans = validsubplans;
+ appendstate->as_valid_subplans_identified = true;
+ }
}
/*
@@ -580,7 +590,7 @@ choose_next_subplan_locally(AppendState *node)
else if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, &node->ps);
node->as_valid_subplans_identified = true;
}
@@ -647,7 +657,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, &node->ps);
node->as_valid_subplans_identified = true;
/*
@@ -723,7 +733,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, &node->ps);
node->as_valid_subplans_identified = true;
mark_invalid_subplans_as_finished(node);
@@ -876,7 +886,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, &node->ps);
node->as_valid_subplans_identified = true;
classify_matching_subplans(node);
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 8b5c35b82b..80479715a1 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -31,6 +31,7 @@
#include "catalog/pg_statistic.h"
#include "commands/tablespace.h"
#include "executor/execdebug.h"
+#include "executor/execPartition.h"
#include "executor/hashjoin.h"
#include "executor/nodeHash.h"
#include "executor/nodeHashjoin.h"
@@ -48,6 +49,8 @@ static void ExecHashIncreaseNumBatches(HashJoinTable hashtable);
static void ExecHashIncreaseNumBuckets(HashJoinTable hashtable);
static void ExecParallelHashIncreaseNumBatches(HashJoinTable hashtable);
static void ExecParallelHashIncreaseNumBuckets(HashJoinTable hashtable);
+static void ExecJoinPartitionPrune(HashState *node);
+static void ExecStoreJoinPartitionPruneResult(HashState *node);
static void ExecHashBuildSkewHash(HashJoinTable hashtable, Hash *node,
int mcvsToUse);
static void ExecHashSkewTableInsert(HashJoinTable hashtable,
@@ -189,8 +192,14 @@ MultiExecPrivateHash(HashState *node)
}
hashtable->totalTuples += 1;
}
+
+ /* Perform join partition pruning */
+ ExecJoinPartitionPrune(node);
}
+ /* Store the surviving partitions for Append/MergeAppend nodes */
+ ExecStoreJoinPartitionPruneResult(node);
+
/* resize the hash table if needed (NTUP_PER_BUCKET exceeded) */
if (hashtable->nbuckets != hashtable->nbuckets_optimal)
ExecHashIncreaseNumBuckets(hashtable);
@@ -401,6 +410,12 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
hashstate->hashkeys =
ExecInitExprList(node->hashkeys, (PlanState *) hashstate);
+ /*
+ * initialize join partition pruning infos
+ */
+ hashstate->joinpartprune_state_list =
+ ExecInitJoinpartpruneList(&hashstate->ps, node->joinpartprune_info_list);
+
return hashstate;
}
@@ -1606,6 +1621,50 @@ ExecParallelHashIncreaseNumBuckets(HashJoinTable hashtable)
}
}
+/*
+ * ExecJoinPartitionPrune
+ * Perform join partition pruning at this join for each
+ * JoinPartitionPruneState.
+ */
+static void
+ExecJoinPartitionPrune(HashState *node)
+{
+ ListCell *lc;
+
+ foreach(lc, node->joinpartprune_state_list)
+ {
+ JoinPartitionPruneState *jpstate = (JoinPartitionPruneState *) lfirst(lc);
+ Bitmapset *matching_subPlans;
+
+ matching_subPlans =
+ ExecFindMatchingSubPlans(jpstate->part_prune_state, false, NULL);
+ jpstate->part_prune_result =
+ bms_add_members(jpstate->part_prune_result, matching_subPlans);
+ }
+}
+
+/*
+ * ExecStoreJoinPartitionPruneResult
+ * For each JoinPartitionPruneState, store the set of surviving partitions
+ * to make it available for the Append/MergeAppend node.
+ */
+static void
+ExecStoreJoinPartitionPruneResult(HashState *node)
+{
+ ListCell *lc;
+
+ foreach(lc, node->joinpartprune_state_list)
+ {
+ JoinPartitionPruneState *jpstate = (JoinPartitionPruneState *) lfirst(lc);
+ ParamExecData *param;
+
+ param = &(node->ps.state->es_param_exec_vals[jpstate->paramid]);
+ Assert(param->execPlan == NULL);
+ Assert(!param->isnull);
+ param->value = PointerGetDatum(jpstate);
+ }
+}
+
/*
* ExecHashTableInsert
* insert a tuple into the hash table depending on the hash value
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 980746128b..ec9a660635 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -311,6 +311,16 @@ ExecHashJoinImpl(PlanState *pstate, bool parallel)
*/
node->hj_FirstOuterTupleSlot = NULL;
}
+ else if (hashNode->joinpartprune_state_list != NIL)
+ {
+ /*
+ * Give the hash node a chance to run join partition
+ * pruning if there is any JoinPartitionPruneState that can
+ * be evaluated at it. So do not apply the empty-outer
+ * optimization in this case.
+ */
+ node->hj_FirstOuterTupleSlot = NULL;
+ }
else if (HJ_FILL_OUTER(node) ||
(outerNode->plan->startup_cost < hashNode->ps.plan->total_cost &&
!node->hj_OuterNotEmpty))
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 21b5726e6e..9eb276abc8 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -99,11 +99,13 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
nplans = bms_num_members(validsubplans);
/*
- * When no run-time pruning is required and there's at least one
- * subplan, we can fill ms_valid_subplans immediately, preventing
- * later calls to ExecFindMatchingSubPlans.
+ * When no run-time pruning or join pruning is required and there's at
+ * least one subplan, we can fill ms_valid_subplans immediately,
+ * preventing later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (!prunestate->do_exec_prune &&
+ bms_is_empty(node->join_prune_paramids) &&
+ nplans > 0)
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -115,9 +117,15 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplans as valid; they must also all be initialized.
*/
Assert(nplans > 0);
- mergestate->ms_valid_subplans = validsubplans =
- bms_add_range(NULL, 0, nplans - 1);
+ validsubplans = bms_add_range(NULL, 0, nplans - 1);
mergestate->ms_prune_state = NULL;
+
+ /*
+ * When join pruning is not enabled we can fill ms_valid_subplans
+ * immediately, preventing later calls to ExecFindMatchingSubPlans.
+ */
+ if (bms_is_empty(node->join_prune_paramids))
+ mergestate->ms_valid_subplans = validsubplans;
}
mergeplanstates = (PlanState **) palloc(nplans * sizeof(PlanState *));
@@ -218,7 +226,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, &node->ps);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 34ca6d4ac2..2c8c6ba208 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -242,7 +242,8 @@ static Hash *make_hash(Plan *lefttree,
List *hashkeys,
Oid skewTable,
AttrNumber skewColumn,
- bool skewInherit);
+ bool skewInherit,
+ List *joinpartprune_info_list);
static MergeJoin *make_mergejoin(List *tlist,
List *joinclauses, List *otherclauses,
List *mergeclauses,
@@ -342,6 +343,7 @@ create_plan(PlannerInfo *root, Path *best_path)
/* Initialize this module's workspace in PlannerInfo */
root->curOuterRels = NULL;
root->curOuterParams = NIL;
+ root->join_partition_prune_candidates = NIL;
/* Recursively process the path tree, demanding the correct tlist result */
plan = create_plan_recurse(root, best_path, CP_EXACT_TLIST);
@@ -369,6 +371,8 @@ create_plan(PlannerInfo *root, Path *best_path)
if (root->curOuterParams != NIL)
elog(ERROR, "failed to assign all NestLoopParams to plan nodes");
+ Assert(root->join_partition_prune_candidates == NIL);
+
/*
* Reset plan_params to ensure param IDs used for nestloop params are not
* re-used later
@@ -1223,6 +1227,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
int nasyncplans = 0;
RelOptInfo *rel = best_path->path.parent;
PartitionPruneInfo *partpruneinfo = NULL;
+ Bitmapset *join_prune_paramids = NULL;
int nodenumsortkeys = 0;
AttrNumber *nodeSortColIdx = NULL;
Oid *nodeSortOperators = NULL;
@@ -1377,6 +1382,8 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
* do partition pruning.
+ *
+ * Also gather information needed by the executor to do join pruning.
*/
if (enable_partition_pruning)
{
@@ -1399,13 +1406,18 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
partpruneinfo =
make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ NULL);
+
+ join_prune_paramids =
+ make_join_partition_pruneinfos(root, rel, best_path->subpaths);
}
plan->appendplans = subplans;
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
plan->part_prune_info = partpruneinfo;
+ plan->join_prune_paramids = join_prune_paramids;
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -1445,6 +1457,7 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
PartitionPruneInfo *partpruneinfo = NULL;
+ Bitmapset *join_prune_paramids = NULL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1541,6 +1554,8 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
* do partition pruning.
+ *
+ * Also gather information needed by the executor to do join pruning.
*/
if (enable_partition_pruning)
{
@@ -1554,11 +1569,16 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
if (prunequal != NIL)
partpruneinfo = make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ NULL);
+
+ join_prune_paramids =
+ make_join_partition_pruneinfos(root, rel, best_path->subpaths);
}
node->mergeplans = subplans;
node->part_prune_info = partpruneinfo;
+ node->join_prune_paramids = join_prune_paramids;
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
@@ -4734,6 +4754,13 @@ create_hashjoin_plan(PlannerInfo *root,
AttrNumber skewColumn = InvalidAttrNumber;
bool skewInherit = false;
ListCell *lc;
+ List *joinpartprune_info_list;
+
+ /*
+ * Collect information required to build JoinPartitionPruneInfos at this
+ * join.
+ */
+ prepare_join_partition_prune_candidate(root, &best_path->jpath);
/*
* HashJoin can project, so we don't have to demand exact tlists from the
@@ -4745,6 +4772,11 @@ create_hashjoin_plan(PlannerInfo *root,
outer_plan = create_plan_recurse(root, best_path->jpath.outerjoinpath,
(best_path->num_batches > 1) ? CP_SMALL_TLIST : 0);
+ /*
+ * Retrieve all the JoinPartitionPruneInfos for this join.
+ */
+ joinpartprune_info_list = get_join_partition_prune_candidate(root);
+
inner_plan = create_plan_recurse(root, best_path->jpath.innerjoinpath,
CP_SMALL_TLIST);
@@ -4850,7 +4882,8 @@ create_hashjoin_plan(PlannerInfo *root,
inner_hashkeys,
skewTable,
skewColumn,
- skewInherit);
+ skewInherit,
+ joinpartprune_info_list);
/*
* Set Hash node's startup & total costs equal to total cost of input
@@ -5977,7 +6010,8 @@ make_hash(Plan *lefttree,
List *hashkeys,
Oid skewTable,
AttrNumber skewColumn,
- bool skewInherit)
+ bool skewInherit,
+ List *joinpartprune_info_list)
{
Hash *node = makeNode(Hash);
Plan *plan = &node->plan;
@@ -5991,6 +6025,7 @@ make_hash(Plan *lefttree,
node->skewTable = skewTable;
node->skewColumn = skewColumn;
node->skewInherit = skewInherit;
+ node->joinpartprune_info_list = joinpartprune_info_list;
return node;
}
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 97fa561e4e..7013f7f656 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -156,6 +156,11 @@ static Plan *set_mergeappend_references(PlannerInfo *root,
MergeAppend *mplan,
int rtoffset);
static void set_hash_references(PlannerInfo *root, Plan *plan, int rtoffset);
+static void set_joinpartitionprune_references(PlannerInfo *root,
+ List *joinpartprune_info_list,
+ indexed_tlist *outer_itlist,
+ int rtoffset,
+ double num_exec);
static Relids offset_relid_set(Relids relids, int rtoffset);
static Node *fix_scan_expr(PlannerInfo *root, Node *node,
int rtoffset, double num_exec);
@@ -1897,6 +1902,62 @@ set_hash_references(PlannerInfo *root, Plan *plan, int rtoffset)
/* Hash nodes don't have their own quals */
Assert(plan->qual == NIL);
+
+ set_joinpartitionprune_references(root,
+ hplan->joinpartprune_info_list,
+ outer_itlist,
+ rtoffset,
+ NUM_EXEC_TLIST(plan));
+}
+
+/*
+ * set_joinpartitionprune_references
+ * Do set_plan_references processing on JoinPartitionPruneInfos
+ */
+static void
+set_joinpartitionprune_references(PlannerInfo *root,
+ List *joinpartprune_info_list,
+ indexed_tlist *outer_itlist,
+ int rtoffset,
+ double num_exec)
+{
+ ListCell *l;
+
+ foreach(l, joinpartprune_info_list)
+ {
+ JoinPartitionPruneInfo *jpinfo = (JoinPartitionPruneInfo *) lfirst(l);
+ ListCell *l1;
+
+ foreach(l1, jpinfo->part_prune_info->prune_infos)
+ {
+ List *prune_infos = lfirst(l1);
+ ListCell *l2;
+
+ foreach(l2, prune_infos)
+ {
+ PartitionedRelPruneInfo *pinfo = lfirst(l2);
+
+ pinfo->rtindex += rtoffset;
+
+ pinfo->initial_pruning_steps = (List *)
+ fix_upper_expr(root,
+ (Node *) pinfo->initial_pruning_steps,
+ outer_itlist,
+ OUTER_VAR,
+ rtoffset,
+ NRM_EQUAL,
+ num_exec);
+ pinfo->exec_pruning_steps = (List *)
+ fix_upper_expr(root,
+ (Node *) pinfo->exec_pruning_steps,
+ outer_itlist,
+ OUTER_VAR,
+ rtoffset,
+ NRM_EQUAL,
+ num_exec);
+ }
+ }
+ }
}
/*
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 7179b22a05..7240dc0b43 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -48,7 +48,9 @@
#include "optimizer/appendinfo.h"
#include "optimizer/cost.h"
#include "optimizer/optimizer.h"
+#include "optimizer/paramassign.h"
#include "optimizer/pathnode.h"
+#include "optimizer/restrictinfo.h"
#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partprune.h"
@@ -103,15 +105,16 @@ typedef enum PartClauseTarget
*
* gen_partprune_steps() initializes and returns an instance of this struct.
*
- * Note that has_mutable_op, has_mutable_arg, and has_exec_param are set if
- * we found any potentially-useful-for-pruning clause having those properties,
- * whether or not we actually used the clause in the steps list. This
- * definition allows us to skip the PARTTARGET_EXEC pass in some cases.
+ * Note that has_mutable_op, has_mutable_arg, has_exec_param and has_vars are
+ * set if we found any potentially-useful-for-pruning clause having those
+ * properties, whether or not we actually used the clause in the steps list.
+ * This definition allows us to skip the PARTTARGET_EXEC pass in some cases.
*/
typedef struct GeneratePruningStepsContext
{
/* Copies of input arguments for gen_partprune_steps: */
RelOptInfo *rel; /* the partitioned relation */
+ Bitmapset *available_rels; /* rels whose Vars may be used for pruning */
PartClauseTarget target; /* use-case we're generating steps for */
/* Result data: */
List *steps; /* list of PartitionPruneSteps */
@@ -119,6 +122,7 @@ typedef struct GeneratePruningStepsContext
bool has_mutable_arg; /* clauses include any mutable comparison
* values, *other than* exec params */
bool has_exec_param; /* clauses include any PARAM_EXEC params */
+ bool has_vars; /* clauses include any Vars from 'available_rels' */
bool contradictory; /* clauses were proven self-contradictory */
/* Working state: */
int next_step_id;
@@ -144,8 +148,10 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
+ Bitmapset *available_rels,
Bitmapset **matchedsubplans);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
+ Bitmapset *available_rels,
PartClauseTarget target,
GeneratePruningStepsContext *context);
static List *gen_partprune_steps_internal(GeneratePruningStepsContext *context,
@@ -206,6 +212,10 @@ static PartClauseMatchStatus match_boolean_partition_clause(Oid partopfamily,
static void partkey_datum_from_expr(PartitionPruneContext *context,
Expr *expr, int stateidx,
Datum *value, bool *isnull);
+static bool contain_forbidden_var_clause(Node *node,
+ GeneratePruningStepsContext *context);
+static bool contain_forbidden_var_clause_walker(Node *node,
+ GeneratePruningStepsContext *context);
/*
@@ -218,11 +228,14 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
+ * 'available_rels' is the relid set of rels whose Vars may be used for
+ * pruning.
*/
PartitionPruneInfo *
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
- List *prunequal)
+ List *prunequal,
+ Bitmapset *available_rels)
{
PartitionPruneInfo *pruneinfo;
Bitmapset *allmatchedsubplans = NULL;
@@ -315,6 +328,7 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
prunequal,
partrelids,
relid_subplan_map,
+ available_rels,
&matchedsubplans);
/* When pruning is possible, record the matched subplans */
@@ -362,6 +376,155 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
return pruneinfo;
}
+/*
+ * make_join_partition_pruneinfos
+ * Builds one JoinPartitionPruneInfo for each join at which join partition
+ * pruning is possible for this appendrel.
+ *
+ * 'parentrel' is the RelOptInfo for an appendrel, and 'subpaths' is the list
+ * of scan paths for its child rels.
+ */
+Bitmapset *
+make_join_partition_pruneinfos(PlannerInfo *root, RelOptInfo *parentrel,
+ List *subpaths)
+{
+ Bitmapset *result = NULL;
+ ListCell *lc;
+
+ foreach(lc, root->join_partition_prune_candidates)
+ {
+ JoinPartitionPruneCandidateInfo *candidate =
+ (JoinPartitionPruneCandidateInfo *) lfirst(lc);
+ PartitionPruneInfo *part_prune_info;
+ List *prunequal;
+ Relids joinrelids;
+ ListCell *l;
+
+ if (candidate == NULL)
+ continue;
+
+ /*
+ * Identify all joinclauses that are movable to this appendrel given
+ * this inner side relids. Only those clauses can be used for join
+ * partition pruning.
+ */
+ joinrelids = bms_union(parentrel->relids, candidate->inner_relids);
+ prunequal = NIL;
+ foreach(l, candidate->joinrestrictinfo)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
+
+ if (join_clause_is_movable_into(rinfo,
+ parentrel->relids,
+ joinrelids))
+ prunequal = lappend(prunequal, rinfo);
+ }
+
+ if (prunequal == NIL)
+ continue;
+
+ part_prune_info = make_partition_pruneinfo(root, parentrel,
+ subpaths,
+ prunequal,
+ candidate->inner_relids);
+
+ if (part_prune_info)
+ {
+ JoinPartitionPruneInfo *jpinfo;
+
+ jpinfo = palloc(sizeof(JoinPartitionPruneInfo));
+
+ jpinfo->part_prune_info = part_prune_info;
+ jpinfo->paramid = assign_special_exec_param(root);
+
+ candidate->joinpartprune_info_list =
+ lappend(candidate->joinpartprune_info_list, jpinfo);
+
+ result = bms_add_member(result, jpinfo->paramid);
+ }
+ }
+
+ return result;
+}
+
+/*
+ * prepare_join_partition_prune_candidate
+ * Check if join partition pruning is possible at this join and if so
+ * collect information required to build JoinPartitionPruneInfos.
+ *
+ * Note that we may build more than one JoinPartitionPruneInfo at one join, for
+ * different Append/MergeAppend paths.
+ */
+void
+prepare_join_partition_prune_candidate(PlannerInfo *root, JoinPath *jpath)
+{
+ JoinPartitionPruneCandidateInfo *candidate;
+
+ if (!enable_partition_pruning)
+ {
+ root->join_partition_prune_candidates =
+ lappend(root->join_partition_prune_candidates, NULL);
+ return;
+ }
+
+ /*
+ * We cannot perform join partition pruning if the outer is the
+ * non-nullable side.
+ */
+ if (!(jpath->jointype == JOIN_INNER ||
+ jpath->jointype == JOIN_SEMI ||
+ jpath->jointype == JOIN_RIGHT ||
+ jpath->jointype == JOIN_RIGHT_ANTI))
+ {
+ root->join_partition_prune_candidates =
+ lappend(root->join_partition_prune_candidates, NULL);
+ return;
+ }
+
+ /*
+ * For now we only support HashJoin.
+ */
+ if (jpath->path.pathtype != T_HashJoin)
+ {
+ root->join_partition_prune_candidates =
+ lappend(root->join_partition_prune_candidates, NULL);
+ return;
+ }
+
+ candidate = palloc(sizeof(JoinPartitionPruneCandidateInfo));
+ candidate->joinrestrictinfo = jpath->joinrestrictinfo;
+ candidate->inner_relids = jpath->innerjoinpath->parent->relids;
+ candidate->joinpartprune_info_list = NIL;
+
+ root->join_partition_prune_candidates =
+ lappend(root->join_partition_prune_candidates, candidate);
+}
+
+/*
+ * get_join_partition_prune_candidate
+ * Pop out the JoinPartitionPruneCandidateInfo for this join and retrieve
+ * the JoinPartitionPruneInfos.
+ */
+List *
+get_join_partition_prune_candidate(PlannerInfo *root)
+{
+ JoinPartitionPruneCandidateInfo *candidate;
+ List *result;
+
+ candidate = llast(root->join_partition_prune_candidates);
+ root->join_partition_prune_candidates =
+ list_delete_last(root->join_partition_prune_candidates);
+
+ if (candidate == NULL)
+ return NIL;
+
+ result = candidate->joinpartprune_info_list;
+
+ pfree(candidate);
+
+ return result;
+}
+
/*
* add_part_relids
* Add new info to a list of Bitmapsets of partitioned relids.
@@ -430,6 +593,8 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* partrelids: Set of RT indexes identifying relevant partitioned tables
* within a single partitioning hierarchy
* relid_subplan_map[]: maps child relation relids to subplan indexes
+ * available_rels: the relid set of rels whose Vars may be used for
+ * pruning.
* matchedsubplans: on success, receives the set of subplan indexes which
* were matched to this partition hierarchy
*
@@ -442,6 +607,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
+ Bitmapset *available_rels,
Bitmapset **matchedsubplans)
{
RelOptInfo *targetpart = NULL;
@@ -541,8 +707,8 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
*/
- gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
- &context);
+ gen_partprune_steps(subpart, partprunequal, available_rels,
+ PARTTARGET_INITIAL, &context);
if (context.contradictory)
{
@@ -569,14 +735,15 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
initial_pruning_steps = NIL;
/*
- * If no exec Params appear in potentially-usable pruning clauses,
- * then there's no point in even thinking about per-scan pruning.
+ * If no exec Params or available Vars appear in potentially-usable
+ * pruning clauses, then there's no point in even thinking about
+ * per-scan pruning.
*/
- if (context.has_exec_param)
+ if (context.has_exec_param || context.has_vars)
{
/* ... OK, we'd better think about it */
- gen_partprune_steps(subpart, partprunequal, PARTTARGET_EXEC,
- &context);
+ gen_partprune_steps(subpart, partprunequal, available_rels,
+ PARTTARGET_EXEC, &context);
if (context.contradictory)
{
@@ -589,11 +756,14 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
/*
* Detect which exec Params actually got used; the fact that some
* were in available clauses doesn't mean we actually used them.
- * Skip per-scan pruning if there are none.
*/
execparamids = get_partkey_exec_paramids(exec_pruning_steps);
- if (bms_is_empty(execparamids))
+ /*
+ * Skip per-scan pruning if no exec Params were actually used and
+ * there are no available Vars.
+ */
+ if (bms_is_empty(execparamids) && !context.has_vars)
exec_pruning_steps = NIL;
}
else
@@ -705,6 +875,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* Process 'clauses' (typically a rel's baserestrictinfo list of clauses)
* and create a list of "partition pruning steps".
*
+ * 'available_rels' is the relid set of rels whose Vars may be used for
+ * pruning.
+ *
* 'target' tells whether to generate pruning steps for planning (use
* immutable clauses only), or for executor startup (use any allowable
* clause except ones containing PARAM_EXEC Params), or for executor
@@ -714,12 +887,13 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* some subsidiary flags; see the GeneratePruningStepsContext typedef.
*/
static void
-gen_partprune_steps(RelOptInfo *rel, List *clauses, PartClauseTarget target,
- GeneratePruningStepsContext *context)
+gen_partprune_steps(RelOptInfo *rel, List *clauses, Bitmapset *available_rels,
+ PartClauseTarget target, GeneratePruningStepsContext *context)
{
/* Initialize all output values to zero/false/NULL */
memset(context, 0, sizeof(GeneratePruningStepsContext));
context->rel = rel;
+ context->available_rels = available_rels;
context->target = target;
/*
@@ -775,7 +949,7 @@ prune_append_rel_partitions(RelOptInfo *rel)
* If the clauses are found to be contradictory, we can return the empty
* set.
*/
- gen_partprune_steps(rel, clauses, PARTTARGET_PLANNER,
+ gen_partprune_steps(rel, clauses, NULL, PARTTARGET_PLANNER,
&gcontext);
if (gcontext.contradictory)
return NULL;
@@ -1962,9 +2136,10 @@ match_clause_to_partition_key(GeneratePruningStepsContext *context,
return PARTCLAUSE_UNSUPPORTED;
/*
- * We can never prune using an expression that contains Vars.
+ * We can never prune using an expression that contains Vars except
+ * for Vars belonging to context->available_rels.
*/
- if (contain_var_clause((Node *) expr))
+ if (contain_forbidden_var_clause((Node *) expr, context))
return PARTCLAUSE_UNSUPPORTED;
/*
@@ -2160,9 +2335,10 @@ match_clause_to_partition_key(GeneratePruningStepsContext *context,
return PARTCLAUSE_UNSUPPORTED;
/*
- * We can never prune using an expression that contains Vars.
+ * We can never prune using an expression that contains Vars except
+ * for Vars belonging to context->available_rels.
*/
- if (contain_var_clause((Node *) rightop))
+ if (contain_forbidden_var_clause((Node *) rightop, context))
return PARTCLAUSE_UNSUPPORTED;
/*
@@ -3712,3 +3888,54 @@ partkey_datum_from_expr(PartitionPruneContext *context,
*value = ExecEvalExprSwitchContext(exprstate, ectx, isnull);
}
}
+
+/*
+ * contain_forbidden_var_clause
+ * Recursively scan a clause to discover whether it contains any Var nodes
+ * (of the current query level) that do not belong to relations in
+ * context->available_rels.
+ *
+ * Returns true if any such varnode found.
+ *
+ * Does not examine subqueries, therefore must only be used after reduction
+ * of sublinks to subplans!
+ */
+static bool
+contain_forbidden_var_clause(Node *node, GeneratePruningStepsContext *context)
+{
+ return contain_forbidden_var_clause_walker(node, context);
+}
+
+static bool
+contain_forbidden_var_clause_walker(Node *node, GeneratePruningStepsContext *context)
+{
+ if (node == NULL)
+ return false;
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ if (var->varlevelsup != 0)
+ return false;
+
+ if (!bms_is_member(var->varno, context->available_rels))
+ return true; /* abort the tree traversal and return true */
+
+ context->has_vars = true;
+
+ if (context->target != PARTTARGET_EXEC)
+ return true; /* abort the tree traversal and return true */
+
+ return false;
+ }
+ if (IsA(node, CurrentOfExpr))
+ return true;
+ if (IsA(node, PlaceHolderVar))
+ {
+ if (((PlaceHolderVar *) node)->phlevelsup == 0)
+ return true; /* abort the tree traversal and return true */
+ /* else fall through to check the contained expr */
+ }
+ return expression_tree_walker(node, contain_forbidden_var_clause_walker,
+ (void *) context);
+}
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 15ec869ac8..e9a234b649 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -121,11 +121,24 @@ typedef struct PartitionPruneState
PartitionPruningData *partprunedata[FLEXIBLE_ARRAY_MEMBER];
} PartitionPruneState;
+/*
+ * JoinPartitionPruneState - State object required for plan nodes to perform
+ * join partition pruning.
+ */
+typedef struct JoinPartitionPruneState
+{
+ PartitionPruneState *part_prune_state;
+ int paramid;
+ Bitmapset *part_prune_result;
+} JoinPartitionPruneState;
+
extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
PartitionPruneInfo *pruneinfo,
Bitmapset **initially_valid_subplans);
+extern List *ExecInitJoinpartpruneList(PlanState *planstate, List *joinpartprune_info_list);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ PlanState *planstate);
#endif /* EXECPARTITION_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index cb714f4a19..9c8440d00c 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -2679,6 +2679,9 @@ typedef struct HashState
/* Parallel hash state. */
struct ParallelHashJoinState *parallel_state;
+
+ /* Infos for join partition pruning. */
+ List *joinpartprune_state_list;
} HashState;
/* ----------------
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 5702fbba60..297a683b4a 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -530,6 +530,9 @@ struct PlannerInfo
/* not-yet-assigned NestLoopParams */
List *curOuterParams;
+ /* a stack of JoinPartitionPruneCandidateInfos */
+ List *join_partition_prune_candidates;
+
/*
* These fields are workspace for setrefs.c. Each is an array
* corresponding to glob->subplans. (We could probably teach
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 1b787fe031..f85be1b3e7 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -275,6 +275,9 @@ typedef struct Append
/* Info for run-time subplan pruning; NULL if we're not doing that */
struct PartitionPruneInfo *part_prune_info;
+
+ /* Info for join partition pruning; NULL if we're not doing that */
+ Bitmapset *join_prune_paramids;
} Append;
/* ----------------
@@ -310,6 +313,9 @@ typedef struct MergeAppend
/* Info for run-time subplan pruning; NULL if we're not doing that */
struct PartitionPruneInfo *part_prune_info;
+
+ /* Info for join partition pruning; NULL if we're not doing that */
+ Bitmapset *join_prune_paramids;
} MergeAppend;
/* ----------------
@@ -1206,6 +1212,7 @@ typedef struct Hash
bool skewInherit; /* is outer join rel an inheritance tree? */
/* all other info is in the parent HashJoin node */
Cardinality rows_total; /* estimate total rows if parallel_aware */
+ List *joinpartprune_info_list; /* infos for join partition pruning */
} Hash;
/* ----------------
@@ -1552,6 +1559,27 @@ typedef struct PartitionPruneStepCombine
List *source_stepids;
} PartitionPruneStepCombine;
+/*
+ * JoinPartitionPruneCandidateInfo - Information required to build
+ * JoinPartitionPruneInfos.
+ */
+typedef struct JoinPartitionPruneCandidateInfo
+{
+ List *joinrestrictinfo;
+ Bitmapset *inner_relids;
+ List *joinpartprune_info_list;
+} JoinPartitionPruneCandidateInfo;
+
+/*
+ * JoinPartitionPruneInfo - Details required to allow the executor to prune
+ * partitions during join.
+ */
+typedef struct JoinPartitionPruneInfo
+{
+ PartitionPruneInfo *part_prune_info;
+ int paramid;
+} JoinPartitionPruneInfo;
+
/*
* Plan invalidation info
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 8636e04e37..41fa5172b3 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -19,6 +19,7 @@
struct PlannerInfo; /* avoid including pathnodes.h here */
struct RelOptInfo;
+struct JoinPath;
/*
@@ -73,7 +74,14 @@ typedef struct PartitionPruneContext
extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
struct RelOptInfo *parentrel,
List *subpaths,
- List *prunequal);
+ List *prunequal,
+ Bitmapset *available_rels);
+extern Bitmapset *make_join_partition_pruneinfos(struct PlannerInfo *root,
+ struct RelOptInfo *parentrel,
+ List *subpaths);
+extern void prepare_join_partition_prune_candidate(struct PlannerInfo *root,
+ struct JoinPath *jpath);
+extern List *get_join_partition_prune_candidate(struct PlannerInfo *root);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 1eb347503a..f1e5e33024 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -2983,6 +2983,74 @@ order by tbl1.col1, tprt.col1;
------+------
(0 rows)
+-- join partition pruning
+delete from tbl1;
+insert into tbl1 values (501), (505);
+analyze tbl1, tprt;
+set enable_nestloop = off;
+set enable_mergejoin = off;
+set enable_hashjoin = on;
+explain (analyze, verbose, costs off, summary off, timing off)
+select * from tprt p1
+ inner join tprt p2 on p1.col1 = p2.col1
+ right join tbl1 t on p1.col1 = t.col1 and p2.col1 = t.col1;
+ QUERY PLAN
+--------------------------------------------------------------------------------
+ Hash Right Join (actual rows=2 loops=1)
+ Output: p1.col1, p2.col1, t.col1
+ Hash Cond: ((p1.col1 = t.col1) AND (p2.col1 = t.col1))
+ -> Hash Join (actual rows=3 loops=1)
+ Output: p1.col1, p2.col1
+ Hash Cond: (p1.col1 = p2.col1)
+ -> Append (actual rows=3 loops=1)
+ Join Partition Pruning: $0, $1
+ -> Seq Scan on public.tprt_1 p1_1 (never executed)
+ Output: p1_1.col1
+ -> Seq Scan on public.tprt_2 p1_2 (actual rows=3 loops=1)
+ Output: p1_2.col1
+ -> Seq Scan on public.tprt_3 p1_3 (never executed)
+ Output: p1_3.col1
+ -> Seq Scan on public.tprt_4 p1_4 (never executed)
+ Output: p1_4.col1
+ -> Seq Scan on public.tprt_5 p1_5 (never executed)
+ Output: p1_5.col1
+ -> Seq Scan on public.tprt_6 p1_6 (never executed)
+ Output: p1_6.col1
+ -> Hash (actual rows=3 loops=1)
+ Output: p2.col1
+ Buckets: 1024 Batches: 1 Memory Usage: 9kB
+ Partition Prune: $1
+ -> Append (actual rows=3 loops=1)
+ Join Partition Pruning: $2
+ -> Seq Scan on public.tprt_1 p2_1 (never executed)
+ Output: p2_1.col1
+ -> Seq Scan on public.tprt_2 p2_2 (actual rows=3 loops=1)
+ Output: p2_2.col1
+ -> Seq Scan on public.tprt_3 p2_3 (never executed)
+ Output: p2_3.col1
+ -> Seq Scan on public.tprt_4 p2_4 (never executed)
+ Output: p2_4.col1
+ -> Seq Scan on public.tprt_5 p2_5 (never executed)
+ Output: p2_5.col1
+ -> Seq Scan on public.tprt_6 p2_6 (never executed)
+ Output: p2_6.col1
+ -> Hash (actual rows=2 loops=1)
+ Output: t.col1
+ Buckets: 1024 Batches: 1 Memory Usage: 9kB
+ Partition Prune: $0, $2
+ -> Seq Scan on public.tbl1 t (actual rows=2 loops=1)
+ Output: t.col1
+(44 rows)
+
+select * from tprt p1
+ inner join tprt p2 on p1.col1 = p2.col1
+ right join tbl1 t on p1.col1 = t.col1 and p2.col1 = t.col1;
+ col1 | col1 | col1
+------+------+------
+ 501 | 501 | 501
+ 505 | 505 | 505
+(2 rows)
+
drop table tbl1, tprt;
-- Test with columns defined in varying orders between each level
create table part_abc (a int not null, b int not null, c int not null) partition by list (a);
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index d1c60b8fe9..976a2e2b89 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -710,6 +710,24 @@ select tbl1.col1, tprt.col1 from tbl1
inner join tprt on tbl1.col1 = tprt.col1
order by tbl1.col1, tprt.col1;
+-- join partition pruning
+delete from tbl1;
+insert into tbl1 values (501), (505);
+analyze tbl1, tprt;
+
+set enable_nestloop = off;
+set enable_mergejoin = off;
+set enable_hashjoin = on;
+
+explain (analyze, verbose, costs off, summary off, timing off)
+select * from tprt p1
+ inner join tprt p2 on p1.col1 = p2.col1
+ right join tbl1 t on p1.col1 = t.col1 and p2.col1 = t.col1;
+
+select * from tprt p1
+ inner join tprt p2 on p1.col1 = p2.col1
+ right join tbl1 t on p1.col1 = t.col1 and p2.col1 = t.col1;
+
drop table tbl1, tprt;
-- Test with columns defined in varying orders between each level
--
2.31.0
On Mon, Aug 21, 2023 at 11:48 AM Richard Guo <guofenglinux@gmail.com> wrote:
If we have a hash join with an Append node on the outer side, something
like
Hash Join
Hash Cond: (pt.a = t.a)
-> Append
-> Seq Scan on pt_p1 pt_1
-> Seq Scan on pt_p2 pt_2
-> Seq Scan on pt_p3 pt_3
-> Hash
-> Seq Scan on t
We can actually prune those subnodes of the Append that cannot possibly
contain any matching tuples from the other side of the join. To do
that, when building the Hash table, for each row from the inner side we
can compute the minimum set of subnodes that can possibly match the join
condition. Once we have built the Hash table and start to execute the
Append node, we should know which subnodes have survived and thus
can skip the other subnodes.
This feature looks good, but is it possible to know if we can prune
any subnodes before we pay the extra effort (building the Hash
table, for each row... stuff)? IIUC, it looks like we can't. If so, I think this area
needs more attention. I can't provide any good suggestions yet.
Maybe at least, if we have found no subnodes can be skipped
during the hashing, we can stop doing such work anymore.
There are several points that need more consideration.
1. All the join partition pruning decisions are made in createplan.c
where the best path tree has been decided. This is not great. Maybe
it's better to make it happen when we build up the path tree, so that
we can take the partition pruning into consideration when estimating
the costs.
FWIW, current master totally ignores the cost reduction from run-time
partition pruning, even for initial partition pruning. So in some real
cases, PG chooses a hash join just because the cost of the nested loop
join is heavily overestimated.
4. Is it possible and worthwhile to extend the join partition pruning
mechanism to support nestloop and mergejoin also?
As far as I know, we have to build the inner table first for this
optimization, so hash join and sort merge should be OK, but nestloop should
be impossible unless I missed something.
--
Best Regards
Andy Fan
On Tue, 22 Aug 2023 at 00:34, Andy Fan <zhihui.fan1213@gmail.com> wrote:
On Mon, Aug 21, 2023 at 11:48 AM Richard Guo <guofenglinux@gmail.com> wrote:
1. All the join partition pruning decisions are made in createplan.c
where the best path tree has been decided. This is not great. Maybe
it's better to make it happen when we build up the path tree, so that
we can take the partition pruning into consideration when estimating
the costs.
FWIW, current master totally ignores the cost reduction from run-time
partition pruning, even for initial partition pruning. So in some real
cases, PG chooses a hash join just because the cost of the nested loop
join is heavily overestimated.
This is true about the existing code. It's a very tricky thing to cost
given that the parameter values are always unknown to the planner.
The best we have for these today is the various hardcoded constants in
selfuncs.h. While I do agree that it's not great that the costing code
knows nothing about run-time pruning, I also think that run-time
pruning during execution with parameterised nested loops is much more
likely to be able to prune partitions and save actual work than the
equivalent with Hash Joins. It's more common for the planner to
choose a Nested Loop when there are fewer outer rows, so the pruning
code is likely to be called fewer times with Nested Loop than with
Hash Join.
With Hash Join, it seems to me that the pruning must take place for
every row that makes it into the hash table. There may well be
cases where the unioned set of partitions simply yields every
partition and all the work results in no savings. Pruning on a scalar
value seems much more likely to be able to prune away unneeded
Append/MergeAppend subnodes.
Perhaps there can be something adaptive in Hash Join which stops
trying to prune when all partitions must be visited. From a quick
glance at the patch, I don't see any code in ExecJoinPartitionPrune()
which gives up trying to prune when the number of members in
part_prune_result is equal to the number of prunable Append/MergeAppend
subnodes.
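To make the "give up early" idea concrete, here is a minimal standalone sketch in C. It is not code from the patch: AdaptivePruneSketch, the adaptive_* functions, and the toy range partitioning (subnode i holds keys in [i*width, (i+1)*width), at most 64 subnodes) are all assumptions made for illustration. Only the idea of comparing the accumulated result against the full set of prunable subnodes comes from the discussion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-Hash pruning state; not the patch's struct. */
typedef struct AdaptivePruneSketch
{
    int         width;          /* key range covered by each subnode */
    int         nsubnodes;      /* number of prunable subnodes */
    uint64_t    matched;        /* union of subnodes matched so far */
    uint64_t    all_subnodes;   /* mask with one bit per prunable subnode */
    bool        finished;       /* true once further pruning is pointless */
    long        lookups;        /* per-row partition lookups performed */
} AdaptivePruneSketch;

static void
adaptive_init(AdaptivePruneSketch *st, int width, int nsubnodes)
{
    assert(width > 0 && nsubnodes > 0 && nsubnodes <= 64);
    st->width = width;
    st->nsubnodes = nsubnodes;
    st->matched = 0;
    st->all_subnodes = (nsubnodes == 64) ? UINT64_MAX
        : ((UINT64_C(1) << nsubnodes) - 1);
    st->finished = false;
    st->lookups = 0;
}

/* Called once per inner row while the hash table is being built. */
static void
adaptive_prune_row(AdaptivePruneSketch *st, int key)
{
    if (st->finished)
        return;                 /* every subnode already matched: skip the work */

    st->lookups++;
    if (key >= 0 && key / st->width < st->nsubnodes)
        st->matched |= UINT64_C(1) << (key / st->width);

    /* Give up as soon as no subnode can be pruned any more. */
    if (st->matched == st->all_subnodes)
        st->finished = true;
}
```

With three subnodes of width 10, feeding the keys 5, 15, 25 and then any number of further rows performs only three lookups; the remaining rows cost a single flag test each.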
It would be good to see some performance comparisons of the worst case
to see how much overhead the pruning code adds to Hash Join. It may
well be that we need to consider two Hash Join paths, one with and one
without run-time pruning. It's pretty difficult to meaningfully cost,
as I already mentioned, however.
4. Is it possible and worthwhile to extend the join partition pruning
mechanism to support nestloop and mergejoin also?
As far as I know, we have to build the inner table first for this
optimization, so hash join and sort merge should be OK, but nestloop should
be impossible unless I missed something.
But run-time pruning already works for Nested Loops... I must be
missing something here.
I imagine for Merge Joins a more generic approach would be better by
implementing parameterised Merge Joins (a.k.a zigzag merge joins).
The Append/MergeAppend node could then select the correct partition(s)
based on the current parameter value at rescan. I don't think any code
changes would be needed in node[Merge]Append.c for that to work. This
could also speed up Merge Joins to non-partitioned tables when an
index is providing presorted input to the join.
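As a rough illustration of the zigzag idea, here is a standalone sketch in C over two sorted integer arrays. It is an assumption-laden toy, not PostgreSQL code: zz_seek() and zz_merge_join() are hypothetical names, keys are assumed unique and ascending on both sides, and the binary-search seek merely stands in for what a rescan with a new parameter value (an index descent, or picking the matching partition) might do.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the first index >= start whose element is >= key, or n if none.
 * This plays the role of "rescan the other side with the current key".
 */
static size_t
zz_seek(const int *a, size_t n, size_t start, int key)
{
    size_t      lo = start;
    size_t      hi = n;

    while (lo < hi)
    {
        size_t      mid = lo + (hi - lo) / 2;

        if (a[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/*
 * Join two ascending arrays of unique keys.  Matching keys are written to
 * 'out', which must have room for min(n_outer, n_inner) entries.
 * Returns the number of matches.
 */
static size_t
zz_merge_join(const int *outer, size_t n_outer,
              const int *inner, size_t n_inner, int *out)
{
    size_t      i = 0;
    size_t      j = 0;
    size_t      nout = 0;

    assert(outer != NULL && inner != NULL && out != NULL);
    while (i < n_outer && j < n_inner)
    {
        if (outer[i] == inner[j])
        {
            out[nout++] = outer[i];
            i++;
            j++;
        }
        else if (outer[i] < inner[j])
            i = zz_seek(outer, n_outer, i + 1, inner[j]);   /* outer skips ahead */
        else
            j = zz_seek(inner, n_inner, j + 1, outer[i]);   /* inner skips ahead */
    }
    return nout;
}
```

Each side jumps directly to the other side's current key instead of stepping row by row, which is where a partitioned inner side could skip whole partitions.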
David
On Mon, Aug 21, 2023 at 8:34 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:
This feature looks good, but is it possible to know if we can prune
any subnodes before we pay the extra effort (building the Hash
table, for each row... stuff)?
It might be possible if we take the partition pruning into
consideration when estimating costs. But it does not seem easy to
calculate the costs accurately.
Maybe at least, if we have found no subnodes can be skipped
during the hashing, we can stop doing such work anymore.
Yeah, this is what we can do.
As far as I know, we have to build the inner table first for this
optimization, so hash join and sort merge should be OK, but nestloop should
be impossible unless I missed something.
For nestloop and mergejoin, we'd always execute the outer side first.
So the Append/MergeAppend nodes need to be on the inner side for the
join partition pruning to take effect. For a mergejoin that will
explicitly sort the outer side, the sort node would process all the
outer rows before scanning the inner side, so we can do the join
partition pruning with that. For a nestloop, if we have a Material
node on the outer side, we can do that too, but I wonder if we'd have
such a plan in the real world, because we only add Material to the inner
side of nestloop.
Thanks
Richard
On Tue, Aug 22, 2023 at 2:38 PM David Rowley <dgrowleyml@gmail.com> wrote:
With Hash Join, it seems to me that the pruning must take place for
every row that makes it into the hash table. There may well be
cases where the unioned set of partitions simply yields every
partition and all the work results in no savings. Pruning on a scalar
value seems much more likely to be able to prune away unneeded
Append/MergeAppend subnodes.
Yeah, you're right. If we have 'pt HashJoin t', for a subnode of 'pt'
to be pruned, it needs every row of 't' to be able to prune that
subnode. The situation may improve if we have more than 2-way hash
joins, because the final surviving subnodes would be the intersection of
matching subnodes in each Hash.
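A small standalone sketch in C of that union-then-intersect logic follows. It is illustrative only: the function names and the toy range partitioning (subnode i holds keys in [i*width, (i+1)*width), at most 64 subnodes) are assumptions, not anything taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Union step: a subnode survives one Hash if at least one inner row can
 * match it.  Returns a bitmask with bit i set when subnode i is matched.
 */
static uint64_t
isect_matching_subnodes(const int *keys, int nkeys, int width, int nsubnodes)
{
    uint64_t    matched = 0;
    int         i;

    assert(width > 0 && nsubnodes > 0 && nsubnodes <= 64);
    for (i = 0; i < nkeys; i++)
    {
        if (keys[i] >= 0 && keys[i] / width < nsubnodes)
            matched |= UINT64_C(1) << (keys[i] / width);
    }
    return matched;
}

/*
 * Intersection step: with several Hash nodes above the same Append, a
 * subnode must survive every one of them to be scanned at all.
 */
static uint64_t
isect_surviving_subnodes(const uint64_t *per_hash, int nhashes, int nsubnodes)
{
    uint64_t    surviving;
    int         i;

    assert(nsubnodes > 0 && nsubnodes <= 64);
    surviving = (nsubnodes == 64) ? UINT64_MAX
        : ((UINT64_C(1) << nsubnodes) - 1);
    for (i = 0; i < nhashes; i++)
        surviving &= per_hash[i];
    return surviving;
}
```

For example, with three subnodes of width 10, inner keys {5, 15} match subnodes 0 and 1, inner keys {12, 25} match subnodes 1 and 2, and only subnode 1 survives both joins.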
With parameterized nestloop I agree that it's more likely to be able to
prune subnodes at rescan of Append/MergeAppend nodes based on scalar
values.
Sometimes we may just not generate a parameterized nestloop as the final
plan, such as when there are no indexes and no lateral references in the
Append/MergeAppend node. In this case I think it would be great if we
can still do some partition pruning. So I think this new 'join
partition pruning mechanism' (maybe this is not a proper name) should
be treated as a supplement to, not a substitute for, the current
run-time partition pruning based on parameterized nestloop, and it is
implemented that way in the patch.
Perhaps there can be something adaptive in Hash Join which stops
trying to prune when all partitions must be visited. On a quick
glance of the patch, I don't see any code in ExecJoinPartitionPrune()
which gives up trying to prune when the number of members in
part_prune_result is equal to the prunable Append/MergeAppend
subnodes.
Yeah, we can do that.
But run-time pruning already works for Nested Loops... I must be
missing something here.
Here I mean nestloop with non-parameterized inner path. As I explained
upthread, we need to have a Material node on the outer side for that to
work, which does not seem possible in the real world.
Thanks
Richard
On Tue, Aug 22, 2023 at 5:43 PM Richard Guo <guofenglinux@gmail.com> wrote:
On Mon, Aug 21, 2023 at 8:34 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:
This feature looks good, but is it possible to know if we can prune
any subnodes before we pay the extra effort (building the Hash
table, for each row... stuff)?
It might be possible if we take the partition pruning into
consideration when estimating costs. But it seems not easy to calculate
the costs accurately.
This is the part where I am really worried about the future of this patch.
Personally, I do like this patch, but I am not sure what happens if this
issue can't be fixed in a way that makes everyone happy, and fixing it
perfectly looks hopeless to me. However, let's see what happens.
Maybe at least, if we have found that no subnodes can be skipped
during the hashing, we can stop doing such work.
Yeah, this is what we can do.
Cool.
As far as I know, we have to build the inner table first for this
optimization, so hash join and sort merge should be OK, but nestloop
should be impossible unless I missed something.
For nestloop and mergejoin, we'd always execute the outer side first.
So the Append/MergeAppend nodes need to be on the inner side for the
join partition pruning to take effect. For a mergejoin that will
explicitly sort the outer side, the sort node would process all the
outer rows before scanning the inner side, so we can do the join
partition pruning with that. For a nestloop, if we have a Material
node on the outer side, we can do that too, but I wonder if we'd have
such a plan in the real world, because we only add Material to the inner
side of nestloop.
This is more interesting than I expected, thanks for the explanation.
--
Best Regards
Andy Fan
FWIW, the current master totally ignores the cost reduction from run-time
partition pruning, even for initial partition pruning. So in some real
cases, pg chooses a hash join just because the cost of the nested loop
join is highly overestimated.
This is true about the existing code. It's a very tricky thing to cost
given that the parameter values are always unknown to the planner.
The best we have for these today is the various hardcoded constants in
selfuncs.h. While I do agree that it's not great that the costing code
knows nothing about run-time pruning, I also think that run-time
pruning during execution with parameterised nested loops is much more
likely to be able to prune partitions and save actual work than the
equivalent with Hash Joins. It's more common for the planner to
choose Nested Loop when there are fewer outer rows, so the pruning
code is likely to be called fewer times with Nested Loop than with
Hash Join.
Yes, I agree with this. In my 4 years with PostgreSQL, I have run into
only 2 cases of this issue, and 1 of them was joining 12+ tables with
run-time partition pruning for every join. But this situation causes more
issues than generating a wrong plan. For example, for a simple
SELECT * FROM p WHERE partkey = $1; the generic plan will never win, so
we have to pay the expensive planning cost for the partitioned table.
If we don't require very accurate costing for every case, e.g. if we only
care about the '=' operator, which is the most common case, it should be
easier than the case here, since we just need to know whether only 1
partition will survive after pruning, without caring which one it is. I'd
like to discuss that in another thread, and leave this thread for
Richard's patch only.
--
Best Regards
Andy Fan
On Tue, Aug 22, 2023 at 2:38 PM David Rowley <dgrowleyml@gmail.com> wrote:
It would be good to see some performance comparisons of the worst case
to see how much overhead the pruning code adds to Hash Join. It may
well be that we need to consider two Hash Join paths, one with and one
without run-time pruning. It's pretty difficult to meaningfully cost,
as I already mentioned, however.
I performed some performance comparisons of the worst case with two
tables as below:
1. The partitioned table has 1000 children, and 100,000 tuples in total.
2. The other table is designed so that
   a) its tuples occupy every partition of the partitioned table so
      that no partitions can be pruned during execution,
   b) tuples belonging to the same partition are placed together so that
      we need to scan all its tuples before we could know that no
      pruning would happen and we could stop trying to prune,
   c) the tuples are unique on the hash key so as to minimize the cost
      of hash probe, so that we can highlight the impact of the pruning
      code.
Here is the execution time (ms) I get with different sizes of the other
table. The last column is the relative overhead of the patched run.
 tuples   unpatched   patched   overhead
  10000       45.74     53.46      +0.17
  20000       54.48     70.18      +0.29
  30000       62.57     85.18      +0.36
  40000       69.14     99.19      +0.43
  50000       76.46    111.09      +0.45
  60000       82.68    126.37      +0.53
  70000       92.69    137.89      +0.49
  80000       94.49    151.46      +0.60
  90000      101.53    164.93      +0.62
 100000      107.22    178.44      +0.66
So the overhead the pruning code adds to Hash Join is too large to be
acceptable :(. I think we need to solve this problem first before we can
make this new partition pruning mechanism useful in practice, but
how? Some thoughts currently in my mind include
1) we try our best to estimate the cost of this partition pruning when
creating hash join paths, and decide based on the cost whether to use it
or not. But this does not seem to be an easy task.
2) we use some heuristics when executing hash join, such as when we
notice that a $threshold percentage of the partitions must be visited
we just abort the pruning and assume that no partitions can be pruned.
Any thoughts or comments?
Thanks
Richard
On Thu, 24 Aug 2023 at 21:27, Richard Guo <guofenglinux@gmail.com> wrote:
I performed some performance comparisons of the worst case with two
tables as below:
1. The partitioned table has 1000 children, and 100,000 tuples in total.
2. The other table is designed so that
   a) its tuples occupy every partition of the partitioned table so
      that no partitions can be pruned during execution,
   b) tuples belonging to the same partition are placed together so that
      we need to scan all its tuples before we could know that no
      pruning would happen and we could stop trying to prune,
   c) the tuples are unique on the hash key so as to minimize the cost
      of hash probe, so that we can highlight the impact of the pruning
      code.
Here is the execution time (ms) I get with different sizes of the other
table.
 tuples   unpatched   patched   overhead
  10000       45.74     53.46      +0.17
  20000       54.48     70.18      +0.29
  30000       62.57     85.18      +0.36
  40000       69.14     99.19      +0.43
  50000       76.46    111.09      +0.45
  60000       82.68    126.37      +0.53
  70000       92.69    137.89      +0.49
  80000       94.49    151.46      +0.60
  90000      101.53    164.93      +0.62
 100000      107.22    178.44      +0.66
So the overhead the pruning code adds to Hash Join is too large to be
acceptable :(.
Agreed. Run-time pruning is pretty fast to execute, but so is
inserting a row into a hash table.
I think we need to solve this problem first before we can
make this new partition pruning mechanism useful in practice, but
how? Some thoughts currently in my mind include
1) we try our best to estimate the cost of this partition pruning when
creating hash join paths, and decide based on the cost whether to use it
or not. But this does not seem to be an easy task.
I think we need to consider another Hash Join path when we detect that
the outer side of the Hash Join involves scanning a partitioned table.
I'd suggest writing some costing code that accounts for an execution of
run-time pruning. With LIST and RANGE you probably want something like
cpu_operator_cost * LOG2(nparts) once for each hashed tuple to account
for the binary search over the sorted datum array. For HASH
partitions, something like cpu_operator_cost * npartcols once for each
hashed tuple.
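As a rough illustration of the per-tuple pruning cost suggested here, a
minimal sketch in C follows. PruneStrategy, binary_search_steps() and
estimate_prune_cost() are hypothetical names made up for this example, not
planner functions, and LOG2(nparts) is approximated by an integer
ceil(log2(nparts)) so that the sketch needs no math library.

```c
#include <assert.h>

/* Hypothetical enum for this sketch only. */
typedef enum
{
    PRUNE_LIST,
    PRUNE_RANGE,
    PRUNE_HASH
} PruneStrategy;

/* Integer ceil(log2(nparts)): halving steps of a binary search. */
int
binary_search_steps(int nparts)
{
    unsigned long long reach = 1;
    int         steps = 0;

    assert(nparts > 0);
    while (reach < (unsigned long long) nparts)
    {
        reach <<= 1;
        steps++;
    }
    return steps;
}

/*
 * Cost of running the pruning steps once per hashed tuple.
 * LIST/RANGE: one binary search over the partition bounds per tuple.
 * HASH: one operator call per partition key column per tuple.
 */
double
estimate_prune_cost(PruneStrategy strategy, double cpu_operator_cost,
                    int nparts, int npartcols, double hashed_tuples)
{
    double      per_tuple;

    assert(npartcols > 0);
    if (strategy == PRUNE_HASH)
        per_tuple = cpu_operator_cost * npartcols;
    else
        per_tuple = cpu_operator_cost * binary_search_steps(nparts);

    return per_tuple * hashed_tuples;
}
```

For example, with cpu_operator_cost at its default of 0.0025, 1000 RANGE
partitions and 10000 hashed tuples, this charges 10 steps per tuple, i.e.
about 250 cost units in total.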
You'll need to then come up with some counter costs to subtract from
the Append/MergeAppend. This is tricky, as discussed. Just come up
with something crude for now.
To start with, it could just be as crude as:
total_costs *= (Min(expected_outer_rows, n_append_subnodes) /
n_append_subnodes);
i.e., assume that every outer joined row will require exactly 1 new
partition up to the total number of partitions. That's pretty much
worst-case, but it'll at least allow the optimisation to work for
cases like where the hash table is expected to contain just a tiny
number of rows (fewer than the number of partitions).
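The crude scaling above can be written out as a small C function. This is
only a sketch of the formula as stated: append_cost_after_pruning() is a
hypothetical name, and it assumes the Append's total cost is spread evenly
over its subnodes.

```c
#include <assert.h>

/*
 * Scale the Append/MergeAppend total cost by the fraction of subnodes
 * expected to survive, assuming each row needs one new partition
 * (the worst case), capped at the number of subnodes.
 */
double
append_cost_after_pruning(double total_cost, double expected_outer_rows,
                          int n_append_subnodes)
{
    double      nsub = (double) n_append_subnodes;
    double      surviving;

    assert(n_append_subnodes > 0);
    assert(expected_outer_rows >= 0.0);

    /* Min(expected_outer_rows, n_append_subnodes) */
    surviving = (expected_outer_rows < nsub) ? expected_outer_rows : nsub;

    return total_cost * (surviving / nsub);
}
```

With 1000 subnodes, 100 expected rows keep a tenth of the cost, while 1000
or more rows leave the cost unchanged, so no benefit is credited once the
row estimate reaches the number of partitions.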
To make it better, you might want to look at join selectivity
estimation and see if you can find something there to produce a
better estimate.
2) we use some heuristics when executing hash join, such as when we
notice that a $threshold percentage of the partitions must be visited
we just abort the pruning and assume that no partitions can be pruned.
You could likely code in something that checks
bms_num_members(jpstate->part_prune_result) to see if it still remains
below the total Append/MergeAppend subplans whenever, say, the
lower 8 bits of hashtable->totalTuples are all zero. You can just give
up doing any further pruning when all partitions are already required.
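A minimal sketch of that periodic check, in C, under stated assumptions:
PruneSketch is a simplified, hypothetical stand-in for the patch's
JoinPartitionPruneState, a 64-bit mask replaces the Bitmapset (so at most 64
subnodes), and popcount64() plays the role of bms_num_members().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified pruning state for one Append/Hash pair. */
typedef struct PruneSketch
{
    uint64_t    result;     /* bit i set: subnode i is required */
    int         nplans;     /* number of prunable subnodes, at most 64 */
    bool        finished;   /* true once further pruning cannot help */
} PruneSketch;

/* Count set bits; stands in for bms_num_members() in this sketch. */
static int
popcount64(uint64_t x)
{
    int         n = 0;

    while (x)
    {
        x &= x - 1;
        n++;
    }
    return n;
}

/*
 * Record the subnodes matched by one hashed tuple.  total_tuples is the
 * number of tuples hashed so far, including this one.
 */
void
prune_one_tuple(PruneSketch *ps, uint64_t matched, uint64_t total_tuples)
{
    assert(ps->nplans > 0 && ps->nplans <= 64);

    if (ps->finished)
        return;             /* already gave up; skip the pruning work */

    ps->result |= matched;

    /* Only pay for the membership count once every 256 tuples. */
    if ((total_tuples & 0xFF) == 0 &&
        popcount64(ps->result) == ps->nplans)
        ps->finished = true;
}
```

The check runs only when the low 8 bits of the tuple counter are zero, so
the cost of counting members is amortised over 256 insertions. Once
finished is set, later tuples skip the pruning work entirely.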
David
On Fri, Aug 25, 2023 at 11:03 AM David Rowley <dgrowleyml@gmail.com> wrote:
On Thu, 24 Aug 2023 at 21:27, Richard Guo <guofenglinux@gmail.com> wrote:
I think we need to solve this problem first before we can
make this new partition pruning mechanism useful in practice, but
how? Some thoughts currently in my mind include
1) we try our best to estimate the cost of this partition pruning when
creating hash join paths, and decide based on the cost whether to use it
or not. But this does not seem to be an easy task.
I think we need to consider another Hash Join path when we detect that
the outer side of the Hash Join involves scanning a partitioned table.
I'd suggest writing some costing code that accounts for an execution of
run-time pruning. With LIST and RANGE you probably want something like
cpu_operator_cost * LOG2(nparts) once for each hashed tuple to account
for the binary search over the sorted datum array. For HASH
partitions, something like cpu_operator_cost * npartcols once for each
hashed tuple.
You'll need to then come up with some counter costs to subtract from
the Append/MergeAppend. This is tricky, as discussed. Just come up
with something crude for now.
To start with, it could just be as crude as:
total_costs *= (Min(expected_outer_rows, n_append_subnodes) /
n_append_subnodes);
i.e., assume that every outer joined row will require exactly 1 new
partition up to the total number of partitions. That's pretty much
worst-case, but it'll at least allow the optimisation to work for
cases like where the hash table is expected to contain just a tiny
number of rows (fewer than the number of partitions).
To make it better, you might want to look at join selectivity
estimation and see if you can find something there to produce a
better estimate.
Thank you for the suggestion. I will take some time considering it.
When we have multiple join levels, it seems the situation becomes even
more complex. One Append/MergeAppend node might be pruned by more than
one Hash node, and one Hash node might provide pruning for more than one
Append/MergeAppend node. For instance, below is the plan from the test
case added in the v1 patch:
explain (analyze, costs off, summary off, timing off)
select * from tprt p1
   inner join tprt p2 on p1.col1 = p2.col1
   right join tbl1 t on p1.col1 = t.col1 and p2.col1 = t.col1;
                               QUERY PLAN
-------------------------------------------------------------------------
 Hash Right Join (actual rows=2 loops=1)
   Hash Cond: ((p1.col1 = t.col1) AND (p2.col1 = t.col1))
   ->  Hash Join (actual rows=3 loops=1)
         Hash Cond: (p1.col1 = p2.col1)
         ->  Append (actual rows=3 loops=1)
               ->  Seq Scan on tprt_1 p1_1 (never executed)
               ->  Seq Scan on tprt_2 p1_2 (actual rows=3 loops=1)
               ->  Seq Scan on tprt_3 p1_3 (never executed)
               ->  Seq Scan on tprt_4 p1_4 (never executed)
               ->  Seq Scan on tprt_5 p1_5 (never executed)
               ->  Seq Scan on tprt_6 p1_6 (never executed)
         ->  Hash (actual rows=3 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 9kB
               ->  Append (actual rows=3 loops=1)
                     ->  Seq Scan on tprt_1 p2_1 (never executed)
                     ->  Seq Scan on tprt_2 p2_2 (actual rows=3 loops=1)
                     ->  Seq Scan on tprt_3 p2_3 (never executed)
                     ->  Seq Scan on tprt_4 p2_4 (never executed)
                     ->  Seq Scan on tprt_5 p2_5 (never executed)
                     ->  Seq Scan on tprt_6 p2_6 (never executed)
   ->  Hash (actual rows=2 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Seq Scan on tbl1 t (actual rows=2 loops=1)
(23 rows)
In this plan, the Append node of 'p1' is pruned by two Hash nodes: Hash
node of 't' and Hash node of 'p2'. Meanwhile, the Hash node of 't'
provides pruning for two Append nodes: Append node of 'p1' and Append
node of 'p2'.
In this case, costing the partition pruning meaningfully seems even
more difficult. Do you have any suggestions on that?
2) we use some heuristics when executing hash join, such as when we
notice that a $threshold percentage of the partitions must be visited
we just abort the pruning and assume that no partitions can be pruned.
You could likely code in something that checks
bms_num_members(jpstate->part_prune_result) to see if it still remains
below the total Append/MergeAppend subplans whenever, say, the
lower 8 bits of hashtable->totalTuples are all zero. You can just give
up doing any further pruning when all partitions are already required.
Yeah, we can do that. This may not help in the tests I performed
for the worst case, because the table on the hash side is designed so that
tuples belonging to the same partition are placed together, so we need
to scan almost all its tuples before we can know that all partitions
are already required. But I think this might help a lot in the real world.
Thanks
Richard
On Fri, Aug 25, 2023 at 11:03 AM David Rowley <dgrowleyml@gmail.com> wrote:
I'd suggest writing some costing code that accounts for an execution of
run-time pruning. With LIST and RANGE you probably want something like
cpu_operator_cost * LOG2(nparts) once for each hashed tuple to account
for the binary search over the sorted datum array. For HASH
partitions, something like cpu_operator_cost * npartcols once for each
hashed tuple.
You'll need to then come up with some counter costs to subtract from
the Append/MergeAppend. This is tricky, as discussed. Just come up
with something crude for now.
To start with, it could just be as crude as:
total_costs *= (Min(expected_outer_rows, n_append_subnodes) /
n_append_subnodes);
i.e., assume that every outer joined row will require exactly 1 new
partition up to the total number of partitions. That's pretty much
worst-case, but it'll at least allow the optimisation to work for
cases like where the hash table is expected to contain just a tiny
number of rows (fewer than the number of partitions).
To make it better, you might want to look at join selectivity
estimation and see if you can find something there to produce a
better estimate.
I had a go at writing some costing code according to your suggestion.
That's compute_partprune_cost() in the v2 patch.
For the hash side, this function computes the pruning cost as
cpu_operator_cost * LOG2(nparts) * inner_rows for LIST and RANGE, and
cpu_operator_cost * nparts * inner_rows for HASH.
For the Append/MergeAppend side, this function first estimates the size
of outer side that matches, using the same idea as we estimate the
joinrel size for JOIN_SEMI. Then it assumes that each outer joined row
occupies one new partition (the worst case) and computes how much cost
can be saved from partition pruning.
If the cost saved from the Append/MergeAppend side is larger than the
pruning cost from the Hash side, then we say that partition pruning is a
win.
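The decision described above can be sketched as follows. This is not the
patch's compute_partprune_cost(); partprune_is_a_win() and its parameters
are hypothetical, and the sketch assumes the Append's cost is spread evenly
across its partitions so that each pruned partition saves an equal share.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Decide whether join partition pruning pays off for one Append-Hash pair.
 * matched_outer_rows is the estimated number of outer rows that find a
 * match; each is assumed to occupy one new partition (the worst case).
 */
bool
partprune_is_a_win(double append_total_cost, double matched_outer_rows,
                   int nparts, double prune_cost)
{
    double      n = (double) nparts;
    double      surviving;
    double      saved;

    assert(nparts > 0);

    /* Surviving partitions cannot exceed the number of partitions. */
    surviving = (matched_outer_rows < n) ? matched_outer_rows : n;

    /* Cost avoided by not scanning the pruned partitions. */
    saved = append_total_cost * (1.0 - surviving / n);

    return saved > prune_cost;
}
```

Under these assumptions the saving drops to zero as soon as the estimated
matching rows reach the number of partitions, so pruning is then never
chosen no matter how cheap the pruning steps are.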
Note that this costing logic runs for each Append-Hash pair, so it copes
with the case where we have multiple join levels.
With this costing logic added, I performed the same performance
comparisons of the worst case as in [1], and here is what I got.
 tuples   unpatched   patched   relative change
  10000       44.66     44.37      -0.006493506
  20000       52.41     52.29      -0.002289639
  30000       61.11     61.12      +0.000163639
  40000       67.87     68.24      +0.005451599
  50000       74.51     74.75      +0.003221044
  60000       82.3      81.55      -0.009113001
  70000       87.16     86.98      -0.002065168
  80000       93.49     93.89      +0.004278532
  90000      101.52    100.83      -0.00679669
 100000      108.34    108.56      +0.002030644
So the costing logic successfully avoids performing the partition
pruning in the worst case.
I also tested the cases where partition pruning is possible with
different sizes of the hash side.
 tuples   unpatched   patched   relative change
    100       36.86      2.4       -0.934888768
    200       35.87      2.37      -0.933928074
    300       35.95      2.55      -0.92906815
    400       36.4       2.63      -0.927747253
    500       36.39      2.85      -0.921681781
    600       36.32      2.97      -0.918226872
    700       36.6       3.23      -0.911748634
    800       36.88      3.44      -0.906724512
    900       37.02      3.46      -0.906537007
   1000       37.25     37.21      -0.001073826
The first 9 rows show that the costing logic allows the partition
pruning to be performed and the pruning turns out to be a big win. The
last row shows that the partition pruning is disallowed by the costing
logic because it thinks no partition can be pruned (we have 1000
partitions in total).
So it seems that the new costing logic is quite crude and tends to be
very conservative, but it can help avoid the large overhead in the worst
cases. I think this might be a good start to push this patch forward.
Any thoughts or comments?
[1]: /messages/by-id/CAMbWs49+p6hBxXJHFiSwOtPCSkAHwhJj3hTpCR_pmMiUUVLZ1Q@mail.gmail.com
Thanks
Richard
Attachments:
v2-0001-Support-run-time-partition-pruning-for-hash-join.patch
From 67c8b0096986444e4402661a74c362448e22a6d4 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Mon, 14 Aug 2023 14:55:26 +0800
Subject: [PATCH v2] Support run-time partition pruning for hash join
If we have a hash join with an Append node on the outer side, something
like
Hash Join
Hash Cond: (pt.a = t.a)
-> Append
-> Seq Scan on pt_p1 pt_1
-> Seq Scan on pt_p2 pt_2
-> Seq Scan on pt_p3 pt_3
-> Hash
-> Seq Scan on t
We can actually prune those subnodes of the Append that cannot possibly
contain any matching tuples from the other side of the join. To do
that, when building the Hash table, for each row from the inner side we
can compute the minimum set of subnodes that can possibly match the join
condition. Once we have built the Hash table and start to execute the
Append node, we know which subnodes have survived and can thus
skip the other subnodes.
This patch implements this idea.
---
src/backend/commands/explain.c | 61 ++++
src/backend/executor/execPartition.c | 127 +++++++-
src/backend/executor/nodeAppend.c | 32 +-
src/backend/executor/nodeHash.c | 75 +++++
src/backend/executor/nodeHashjoin.c | 10 +
src/backend/executor/nodeMergeAppend.c | 22 +-
src/backend/optimizer/path/costsize.c | 106 +++++++
src/backend/optimizer/plan/createplan.c | 49 ++-
src/backend/optimizer/plan/setrefs.c | 61 ++++
src/backend/partitioning/partprune.c | 288 ++++++++++++++++--
src/include/executor/execPartition.h | 17 +-
src/include/nodes/execnodes.h | 3 +
src/include/nodes/pathnodes.h | 3 +
src/include/nodes/plannodes.h | 30 ++
src/include/optimizer/cost.h | 4 +
src/include/partitioning/partprune.h | 12 +-
src/test/regress/expected/partition_prune.out | 67 ++++
src/test/regress/sql/partition_prune.sql | 18 ++
18 files changed, 936 insertions(+), 49 deletions(-)
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 8570b14f62..e244c93ff5 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -18,6 +18,7 @@
#include "commands/createas.h"
#include "commands/defrem.h"
#include "commands/prepare.h"
+#include "executor/execPartition.h"
#include "executor/nodeHash.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
@@ -118,6 +119,9 @@ static void show_instrumentation_count(const char *qlabel, int which,
PlanState *planstate, ExplainState *es);
static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
static void show_eval_params(Bitmapset *bms_params, ExplainState *es);
+static void show_join_pruning_result_info(Bitmapset *join_prune_paramids,
+ ExplainState *es);
+static void show_joinpartprune_info(HashState *hashstate, ExplainState *es);
static const char *explain_get_index_name(Oid indexId);
static void show_buffer_usage(ExplainState *es, const BufferUsage *usage,
bool planning);
@@ -2049,9 +2053,17 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_incremental_sort_info(castNode(IncrementalSortState, planstate),
es);
break;
+ case T_Append:
+ if (es->verbose)
+ show_join_pruning_result_info(((Append *) plan)->join_prune_paramids,
+ es);
+ break;
case T_MergeAppend:
show_merge_append_keys(castNode(MergeAppendState, planstate),
ancestors, es);
+ if (es->verbose)
+ show_join_pruning_result_info(((MergeAppend *) plan)->join_prune_paramids,
+ es);
break;
case T_Result:
show_upper_qual((List *) ((Result *) plan)->resconstantqual,
@@ -2067,6 +2079,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
break;
case T_Hash:
show_hash_info(castNode(HashState, planstate), es);
+ if (es->verbose)
+ show_joinpartprune_info(castNode(HashState, planstate), es);
break;
case T_Memoize:
show_memoize_info(castNode(MemoizeState, planstate), ancestors,
@@ -3507,6 +3521,53 @@ show_eval_params(Bitmapset *bms_params, ExplainState *es)
ExplainPropertyList("Params Evaluated", params, es);
}
+/*
+ * Show join partition pruning results at Append/MergeAppend nodes.
+ */
+static void
+show_join_pruning_result_info(Bitmapset *join_prune_paramids, ExplainState *es)
+{
+ int paramid = -1;
+ List *params = NIL;
+
+ if (bms_is_empty(join_prune_paramids))
+ return;
+
+ while ((paramid = bms_next_member(join_prune_paramids, paramid)) >= 0)
+ {
+ char param[32];
+
+ snprintf(param, sizeof(param), "$%d", paramid);
+ params = lappend(params, pstrdup(param));
+ }
+
+ ExplainPropertyList("Join Partition Pruning", params, es);
+}
+
+/*
+ * Show join partition pruning infos at Hash nodes.
+ */
+static void
+show_joinpartprune_info(HashState *hashstate, ExplainState *es)
+{
+ List *params = NIL;
+ ListCell *lc;
+
+ if (!hashstate->joinpartprune_state_list)
+ return;
+
+ foreach(lc, hashstate->joinpartprune_state_list)
+ {
+ JoinPartitionPruneState *jpstate = (JoinPartitionPruneState *) lfirst(lc);
+ char param[32];
+
+ snprintf(param, sizeof(param), "$%d", jpstate->paramid);
+ params = lappend(params, pstrdup(param));
+ }
+
+ ExplainPropertyList("Partition Prune", params, es);
+}
+
/*
* Fetch the name of an index in an EXPLAIN
*
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index eb8a87fd63..e9121c2a8e 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -199,6 +199,8 @@ static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
Bitmapset **validsubplans);
+static bool get_join_prune_matching_subplans(PlanState *planstate,
+ Bitmapset **partset);
/*
@@ -1806,7 +1808,7 @@ ExecInitPartitionPruning(PlanState *planstate,
* Perform an initial partition prune pass, if required.
*/
if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true, NULL);
else
{
/* No pruning, so we'll need to initialize all subplans */
@@ -1836,6 +1838,37 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecInitJoinpartpruneList
+ * Initialize data structures needed for join partition pruning
+ */
+List *
+ExecInitJoinpartpruneList(PlanState *planstate,
+ List *joinpartprune_info_list)
+{
+ ListCell *lc;
+ List *result = NIL;
+
+ foreach(lc, joinpartprune_info_list)
+ {
+ JoinPartitionPruneInfo *jpinfo = (JoinPartitionPruneInfo *) lfirst(lc);
+ JoinPartitionPruneState *jpstate = palloc(sizeof(JoinPartitionPruneState));
+
+ jpstate->part_prune_state =
+ CreatePartitionPruneState(planstate, jpinfo->part_prune_info);
+ Assert(jpstate->part_prune_state->do_exec_prune);
+
+ jpstate->paramid = jpinfo->paramid;
+ jpstate->nplans = jpinfo->nplans;
+ jpstate->finished = false;
+ jpstate->part_prune_result = NULL;
+
+ result = lappend(result, jpstate);
+ }
+
+ return result;
+}
+
/*
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
@@ -2268,7 +2301,9 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
/*
* ExecFindMatchingSubPlans
* Determine which subplans match the pruning steps detailed in
- * 'prunestate' for the current comparison expression values.
+ * 'prunestate' if any for the current comparison expression values, and
+ * meanwhile match the join partition pruning results if any stored in
+ * Append/MergeAppend node's join_prune_paramids.
*
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
@@ -2276,11 +2311,30 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ PlanState *planstate)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
int i;
+ Bitmapset *join_prune_partset = NULL;
+ bool do_join_prune;
+
+ /* Retrieve the join partition pruning results if any */
+ do_join_prune =
+ get_join_prune_matching_subplans(planstate, &join_prune_partset);
+
+ /*
+ * Either we're here on partition prune done according to the pruning steps
+ * detailed in 'prunestate', or we have done join partition prune.
+ */
+ Assert(do_join_prune || prunestate != NULL);
+
+ /*
+ * If there is no 'prunestate', then rely entirely on join pruning.
+ */
+ if (prunestate == NULL)
+ return join_prune_partset;
/*
* Either we're here on the initial prune done during pruning
@@ -2321,6 +2375,10 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Add in any subplans that partition pruning didn't account for */
result = bms_add_members(result, prunestate->other_subplans);
+ /* Intersect join partition pruning results */
+ if (do_join_prune)
+ result = bms_intersect(result, join_prune_partset);
+
MemoryContextSwitchTo(oldcontext);
/* Copy result out of the temp context before we reset it */
@@ -2391,3 +2449,66 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
}
}
}
+
+/*
+ * get_join_prune_matching_subplans
+ * Retrieve the join partition pruning results if any stored in
+ * Append/MergeAppend node's join_prune_paramids. Return true if we can
+ * do join partition pruning, otherwise return false.
+ *
+ * Adds valid (non-prunable) subplan IDs to *partset
+ */
+static bool
+get_join_prune_matching_subplans(PlanState *planstate, Bitmapset **partset)
+{
+ Bitmapset *join_prune_paramids;
+ int nplans;
+ int paramid;
+
+ if (planstate == NULL)
+ return false;
+
+ if (IsA(planstate, AppendState))
+ {
+ join_prune_paramids =
+ ((Append *) planstate->plan)->join_prune_paramids;
+ nplans = ((AppendState *) planstate)->as_nplans;
+ }
+ else if (IsA(planstate, MergeAppendState))
+ {
+ join_prune_paramids =
+ ((MergeAppend *) planstate->plan)->join_prune_paramids;
+ nplans = ((MergeAppendState *) planstate)->ms_nplans;
+ }
+ else
+ {
+ elog(ERROR, "unrecognized node type: %d", (int) nodeTag(planstate));
+ return false;
+ }
+
+ if (bms_is_empty(join_prune_paramids))
+ return false;
+
+ Assert(nplans > 0);
+ *partset = bms_add_range(NULL, 0, nplans - 1);
+
+ paramid = -1;
+ while ((paramid = bms_next_member(join_prune_paramids, paramid)) >= 0)
+ {
+ ParamExecData *param;
+ JoinPartitionPruneState *jpstate;
+
+ param = &(planstate->state->es_param_exec_vals[paramid]);
+ Assert(param->execPlan == NULL);
+ Assert(!param->isnull);
+ jpstate = (JoinPartitionPruneState *) DatumGetPointer(param->value);
+
+ if (jpstate != NULL)
+ *partset = bms_intersect(*partset, jpstate->part_prune_result);
+ else /* the Hash node for this pruning has not been executed */
+ elog(WARNING, "Join partition pruning $%d has not been performed yet.",
+ paramid);
+ }
+
+ return true;
+}
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 609df6b9e6..c8dd8583d2 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -151,11 +151,13 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
nplans = bms_num_members(validsubplans);
/*
- * When no run-time pruning is required and there's at least one
- * subplan, we can fill as_valid_subplans immediately, preventing
- * later calls to ExecFindMatchingSubPlans.
+ * When no run-time pruning or join pruning is required and there's at
+ * least one subplan, we can fill as_valid_subplans immediately,
+ * preventing later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (!prunestate->do_exec_prune &&
+ bms_is_empty(node->join_prune_paramids) &&
+ nplans > 0)
{
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
appendstate->as_valid_subplans_identified = true;
@@ -170,10 +172,18 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplans as valid; they must also all be initialized.
*/
Assert(nplans > 0);
- appendstate->as_valid_subplans = validsubplans =
- bms_add_range(NULL, 0, nplans - 1);
- appendstate->as_valid_subplans_identified = true;
+ validsubplans = bms_add_range(NULL, 0, nplans - 1);
appendstate->as_prune_state = NULL;
+
+ /*
+ * When join pruning is not enabled we can fill as_valid_subplans
+ * immediately, preventing later calls to ExecFindMatchingSubPlans.
+ */
+ if (bms_is_empty(node->join_prune_paramids))
+ {
+ appendstate->as_valid_subplans = validsubplans;
+ appendstate->as_valid_subplans_identified = true;
+ }
}
/*
@@ -580,7 +590,7 @@ choose_next_subplan_locally(AppendState *node)
else if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, &node->ps);
node->as_valid_subplans_identified = true;
}
@@ -647,7 +657,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, &node->ps);
node->as_valid_subplans_identified = true;
/*
@@ -723,7 +733,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, &node->ps);
node->as_valid_subplans_identified = true;
mark_invalid_subplans_as_finished(node);
@@ -876,7 +886,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, &node->ps);
node->as_valid_subplans_identified = true;
classify_matching_subplans(node);
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 8b5c35b82b..768ca8bf61 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -31,6 +31,7 @@
#include "catalog/pg_statistic.h"
#include "commands/tablespace.h"
#include "executor/execdebug.h"
+#include "executor/execPartition.h"
#include "executor/hashjoin.h"
#include "executor/nodeHash.h"
#include "executor/nodeHashjoin.h"
@@ -48,6 +49,8 @@ static void ExecHashIncreaseNumBatches(HashJoinTable hashtable);
static void ExecHashIncreaseNumBuckets(HashJoinTable hashtable);
static void ExecParallelHashIncreaseNumBatches(HashJoinTable hashtable);
static void ExecParallelHashIncreaseNumBuckets(HashJoinTable hashtable);
+static void ExecJoinPartitionPrune(HashState *node);
+static void ExecStoreJoinPartitionPruneResult(HashState *node);
static void ExecHashBuildSkewHash(HashJoinTable hashtable, Hash *node,
int mcvsToUse);
static void ExecHashSkewTableInsert(HashJoinTable hashtable,
@@ -189,8 +192,14 @@ MultiExecPrivateHash(HashState *node)
}
hashtable->totalTuples += 1;
}
+
+ /* Perform join partition pruning */
+ ExecJoinPartitionPrune(node);
}
+ /* Store the surviving partitions for Append/MergeAppend nodes */
+ ExecStoreJoinPartitionPruneResult(node);
+
/* resize the hash table if needed (NTUP_PER_BUCKET exceeded) */
if (hashtable->nbuckets != hashtable->nbuckets_optimal)
ExecHashIncreaseNumBuckets(hashtable);
@@ -401,6 +410,12 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
hashstate->hashkeys =
ExecInitExprList(node->hashkeys, (PlanState *) hashstate);
+ /*
+ * initialize join partition pruning infos
+ */
+ hashstate->joinpartprune_state_list =
+ ExecInitJoinpartpruneList(&hashstate->ps, node->joinpartprune_info_list);
+
return hashstate;
}
@@ -1606,6 +1621,56 @@ ExecParallelHashIncreaseNumBuckets(HashJoinTable hashtable)
}
}
+/*
+ * ExecJoinPartitionPrune
+ * Perform join partition pruning at this join for each
+ * JoinPartitionPruneState.
+ */
+static void
+ExecJoinPartitionPrune(HashState *node)
+{
+ ListCell *lc;
+
+ foreach(lc, node->joinpartprune_state_list)
+ {
+ JoinPartitionPruneState *jpstate = (JoinPartitionPruneState *) lfirst(lc);
+ Bitmapset *matching_subPlans;
+
+ if (jpstate->finished)
+ continue;
+
+ matching_subPlans =
+ ExecFindMatchingSubPlans(jpstate->part_prune_state, false, NULL);
+ jpstate->part_prune_result =
+ bms_add_members(jpstate->part_prune_result, matching_subPlans);
+
+ if (bms_num_members(jpstate->part_prune_result) == jpstate->nplans)
+ jpstate->finished = true;
+ }
+}
+
+/*
+ * ExecStoreJoinPartitionPruneResult
+ * For each JoinPartitionPruneState, store the set of surviving partitions
+ * to make it available for the Append/MergeAppend node.
+ */
+static void
+ExecStoreJoinPartitionPruneResult(HashState *node)
+{
+ ListCell *lc;
+
+ foreach(lc, node->joinpartprune_state_list)
+ {
+ JoinPartitionPruneState *jpstate = (JoinPartitionPruneState *) lfirst(lc);
+ ParamExecData *param;
+
+ param = &(node->ps.state->es_param_exec_vals[jpstate->paramid]);
+ Assert(param->execPlan == NULL);
+ Assert(!param->isnull);
+ param->value = PointerGetDatum(jpstate);
+ }
+}
+
/*
* ExecHashTableInsert
* insert a tuple into the hash table depending on the hash value
@@ -2350,6 +2415,16 @@ void
ExecReScanHash(HashState *node)
{
PlanState *outerPlan = outerPlanState(node);
+ ListCell *lc;
+
+ /* reset the state in JoinPartitionPruneStates */
+ foreach(lc, node->joinpartprune_state_list)
+ {
+ JoinPartitionPruneState *jpstate = (JoinPartitionPruneState *) lfirst(lc);
+
+ jpstate->finished = false;
+ jpstate->part_prune_result = NULL;
+ }
/*
* if chgParam of subnode is not null then plan will be re-scanned by
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 980746128b..ec9a660635 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -311,6 +311,16 @@ ExecHashJoinImpl(PlanState *pstate, bool parallel)
*/
node->hj_FirstOuterTupleSlot = NULL;
}
+ else if (hashNode->joinpartprune_state_list != NIL)
+ {
+ /*
+ * The hash node has JoinPartitionPruneStates to evaluate, so
+ * build the hash table first to let it run join partition
+ * pruning; do not apply the empty-outer optimization in this
+ * case.
+ */
+ node->hj_FirstOuterTupleSlot = NULL;
+ }
else if (HJ_FILL_OUTER(node) ||
(outerNode->plan->startup_cost < hashNode->ps.plan->total_cost &&
!node->hj_OuterNotEmpty))
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 21b5726e6e..9eb276abc8 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -99,11 +99,13 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
nplans = bms_num_members(validsubplans);
/*
- * When no run-time pruning is required and there's at least one
- * subplan, we can fill ms_valid_subplans immediately, preventing
- * later calls to ExecFindMatchingSubPlans.
+ * When no run-time pruning or join pruning is required and there's at
+ * least one subplan, we can fill ms_valid_subplans immediately,
+ * preventing later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (!prunestate->do_exec_prune &&
+ bms_is_empty(node->join_prune_paramids) &&
+ nplans > 0)
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -115,9 +117,15 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplans as valid; they must also all be initialized.
*/
Assert(nplans > 0);
- mergestate->ms_valid_subplans = validsubplans =
- bms_add_range(NULL, 0, nplans - 1);
+ validsubplans = bms_add_range(NULL, 0, nplans - 1);
mergestate->ms_prune_state = NULL;
+
+ /*
+ * When join pruning is not enabled we can fill ms_valid_subplans
+ * immediately, preventing later calls to ExecFindMatchingSubPlans.
+ */
+ if (bms_is_empty(node->join_prune_paramids))
+ mergestate->ms_valid_subplans = validsubplans;
}
mergeplanstates = (PlanState **) palloc(nplans * sizeof(PlanState *));
@@ -218,7 +226,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, &node->ps);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index d6ceafd51c..9bdc88a9db 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -173,6 +173,10 @@ static void get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
static bool has_indexed_join_quals(NestPath *path);
static double approx_tuple_count(PlannerInfo *root, JoinPath *path,
List *quals);
+static double get_joinrel_matching_outer_size(PlannerInfo *root,
+ RelOptInfo *outer_rel,
+ Relids inner_relids,
+ List *restrictlist);
static double calc_joinrel_size_estimate(PlannerInfo *root,
RelOptInfo *joinrel,
RelOptInfo *outer_rel,
@@ -5380,6 +5384,61 @@ get_parameterized_joinrel_size(PlannerInfo *root, RelOptInfo *rel,
return nrows;
}
+/*
+ * get_joinrel_matching_outer_size
+ * Make a size estimate for the outer side that matches the inner side.
+ */
+static double
+get_joinrel_matching_outer_size(PlannerInfo *root,
+ RelOptInfo *outer_rel,
+ Relids inner_relids,
+ List *restrictlist)
+{
+ double nrows;
+ Selectivity fkselec;
+ Selectivity jselec;
+ SpecialJoinInfo *sjinfo;
+ SpecialJoinInfo sjinfo_data;
+
+ sjinfo = &sjinfo_data;
+ sjinfo->type = T_SpecialJoinInfo;
+ sjinfo->min_lefthand = outer_rel->relids;
+ sjinfo->min_righthand = inner_relids;
+ sjinfo->syn_lefthand = outer_rel->relids;
+ sjinfo->syn_righthand = inner_relids;
+ sjinfo->jointype = JOIN_SEMI;
+ sjinfo->ojrelid = 0;
+ sjinfo->commute_above_l = NULL;
+ sjinfo->commute_above_r = NULL;
+ sjinfo->commute_below_l = NULL;
+ sjinfo->commute_below_r = NULL;
+ /* we don't bother trying to make the remaining fields valid */
+ sjinfo->lhs_strict = false;
+ sjinfo->semi_can_btree = false;
+ sjinfo->semi_can_hash = false;
+ sjinfo->semi_operators = NIL;
+ sjinfo->semi_rhs_exprs = NIL;
+
+ fkselec = get_foreign_key_join_selectivity(root,
+ outer_rel->relids,
+ inner_relids,
+ sjinfo,
+ &restrictlist);
+ jselec = clauselist_selectivity(root,
+ restrictlist,
+ 0,
+ sjinfo->jointype,
+ sjinfo);
+
+ nrows = outer_rel->rows * fkselec * jselec;
+ nrows = clamp_row_est(nrows);
+
+ /* For safety, make sure result is not more than the base estimate */
+ if (nrows > outer_rel->rows)
+ nrows = outer_rel->rows;
+ return nrows;
+}
+
/*
* calc_joinrel_size_estimate
* Workhorse for set_joinrel_size_estimates and
@@ -6495,3 +6554,50 @@ compute_bitmap_pages(PlannerInfo *root, RelOptInfo *baserel, Path *bitmapqual,
return pages_fetched;
}
+
+/*
+ * compute_partprune_cost
+ * Compute the net cost of join partition pruning: overhead minus savings.
+ */
+double
+compute_partprune_cost(PlannerInfo *root, RelOptInfo *appendrel,
+ Cost append_total_cost, int append_nplans,
+ Relids inner_relids, double inner_rows,
+ List *prunequal)
+{
+ Cost prune_cost;
+ Cost saved_cost;
+ double matching_outer_rows;
+ double unmatched_nplans;
+
+ switch (appendrel->part_scheme->strategy)
+ {
+
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ prune_cost = cpu_operator_cost * LOG2(append_nplans) * inner_rows;
+ break;
+ case PARTITION_STRATEGY_HASH:
+ prune_cost = cpu_operator_cost * append_nplans * inner_rows;
+ break;
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ (int) appendrel->part_scheme->strategy);
+ break;
+ }
+
+ matching_outer_rows =
+ get_joinrel_matching_outer_size(root,
+ appendrel,
+ inner_relids,
+ prunequal);
+
+ /*
+ * We assume that each outer joined row occupies one new partition. This
+ * is really the worst case.
+ */
+ unmatched_nplans = append_nplans - Min(matching_outer_rows, append_nplans);
+ saved_cost = (unmatched_nplans / append_nplans) * append_total_cost;
+
+ return prune_cost - saved_cost;
+}
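
The cost model in compute_partprune_cost() above can be tried out in isolation. What follows is only a minimal sketch under stated assumptions, not the patch's code: every sketch_* name is hypothetical, the constant 0.0025 is the default value of cpu_operator_cost, matching_outer_rows is passed in directly instead of being derived by get_joinrel_matching_outer_size(), and an integer ceil(log2(n)) stands in for the LOG2() macro so that the sketch does not need libm.

```c
#include <assert.h>

/* Hypothetical stand-in for the cpu_operator_cost GUC (PostgreSQL default). */
static const double sketch_cpu_operator_cost = 0.0025;

/* Hypothetical simplification of the partition strategies handled above. */
typedef enum { SKETCH_LIST_OR_RANGE, SKETCH_HASH } SketchStrategy;

/* Smallest k with 2^k >= n; an integer approximation of LOG2(n). */
static int
sketch_ceil_log2(int n)
{
    int k = 0;
    int p = 1;

    while (p < n)
    {
        p *= 2;
        k++;
    }
    return k;
}

/*
 * Net cost of pruning: per-row pruning overhead minus the share of the
 * Append's total cost attributed to subplans assumed to be pruned away.
 * A result above zero means pruning is not expected to pay off.
 */
static double
sketch_partprune_net_cost(SketchStrategy strategy, double append_total_cost,
                          int append_nplans, double inner_rows,
                          double matching_outer_rows)
{
    double per_row_ops;
    double prune_cost;
    double matched_nplans;
    double saved_cost;

    assert(append_nplans > 0);

    /* Hash partitioning is charged per subplan, list/range per search step. */
    per_row_ops = (strategy == SKETCH_HASH)
        ? (double) append_nplans
        : (double) sketch_ceil_log2(append_nplans);
    prune_cost = sketch_cpu_operator_cost * per_row_ops * inner_rows;

    /* Worst case: every matching outer row lands in a distinct subplan. */
    matched_nplans = (matching_outer_rows < append_nplans)
        ? matching_outer_rows
        : (double) append_nplans;
    saved_cost = ((append_nplans - matched_nplans) / append_nplans) *
        append_total_cost;

    return prune_cost - saved_cost;
}
```

With 8 subplans, an Append total cost of 1000, 100 inner rows and 2 matching outer rows, the sketch yields a clearly negative net cost, so make_join_partition_pruneinfos() would go ahead. Once matching_outer_rows reaches the number of subplans, saved_cost drops to zero and only the overhead remains, which is the case the `prune_cost > 0` check rejects.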
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 34ca6d4ac2..308ff452d3 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -242,7 +242,8 @@ static Hash *make_hash(Plan *lefttree,
List *hashkeys,
Oid skewTable,
AttrNumber skewColumn,
- bool skewInherit);
+ bool skewInherit,
+ List *joinpartprune_info_list);
static MergeJoin *make_mergejoin(List *tlist,
List *joinclauses, List *otherclauses,
List *mergeclauses,
@@ -342,6 +343,7 @@ create_plan(PlannerInfo *root, Path *best_path)
/* Initialize this module's workspace in PlannerInfo */
root->curOuterRels = NULL;
root->curOuterParams = NIL;
+ root->join_partition_prune_candidates = NIL;
/* Recursively process the path tree, demanding the correct tlist result */
plan = create_plan_recurse(root, best_path, CP_EXACT_TLIST);
@@ -369,6 +371,8 @@ create_plan(PlannerInfo *root, Path *best_path)
if (root->curOuterParams != NIL)
elog(ERROR, "failed to assign all NestLoopParams to plan nodes");
+ Assert(root->join_partition_prune_candidates == NIL);
+
/*
* Reset plan_params to ensure param IDs used for nestloop params are not
* re-used later
@@ -1223,6 +1227,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
int nasyncplans = 0;
RelOptInfo *rel = best_path->path.parent;
PartitionPruneInfo *partpruneinfo = NULL;
+ Bitmapset *join_prune_paramids = NULL;
int nodenumsortkeys = 0;
AttrNumber *nodeSortColIdx = NULL;
Oid *nodeSortOperators = NULL;
@@ -1377,6 +1382,8 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
* do partition pruning.
+ *
+ * Also gather information needed by the executor to do join pruning.
*/
if (enable_partition_pruning)
{
@@ -1399,13 +1406,20 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
partpruneinfo =
make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ NULL);
+
+ join_prune_paramids =
+ make_join_partition_pruneinfos(root, rel,
+ (Path *) best_path,
+ best_path->subpaths);
}
plan->appendplans = subplans;
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
plan->part_prune_info = partpruneinfo;
+ plan->join_prune_paramids = join_prune_paramids;
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -1445,6 +1459,7 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
PartitionPruneInfo *partpruneinfo = NULL;
+ Bitmapset *join_prune_paramids = NULL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1541,6 +1556,8 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
* do partition pruning.
+ *
+ * Also gather information needed by the executor to do join pruning.
*/
if (enable_partition_pruning)
{
@@ -1554,11 +1571,18 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
if (prunequal != NIL)
partpruneinfo = make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ NULL);
+
+ join_prune_paramids =
+ make_join_partition_pruneinfos(root, rel,
+ (Path *) best_path,
+ best_path->subpaths);
}
node->mergeplans = subplans;
node->part_prune_info = partpruneinfo;
+ node->join_prune_paramids = join_prune_paramids;
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
@@ -4734,6 +4758,13 @@ create_hashjoin_plan(PlannerInfo *root,
AttrNumber skewColumn = InvalidAttrNumber;
bool skewInherit = false;
ListCell *lc;
+ List *joinpartprune_info_list;
+
+ /*
+ * Collect information required to build JoinPartitionPruneInfos at this
+ * join.
+ */
+ prepare_join_partition_prune_candidate(root, &best_path->jpath);
/*
* HashJoin can project, so we don't have to demand exact tlists from the
@@ -4745,6 +4776,11 @@ create_hashjoin_plan(PlannerInfo *root,
outer_plan = create_plan_recurse(root, best_path->jpath.outerjoinpath,
(best_path->num_batches > 1) ? CP_SMALL_TLIST : 0);
+ /*
+ * Retrieve all the JoinPartitionPruneInfos for this join.
+ */
+ joinpartprune_info_list = get_join_partition_prune_candidate(root);
+
inner_plan = create_plan_recurse(root, best_path->jpath.innerjoinpath,
CP_SMALL_TLIST);
@@ -4850,7 +4886,8 @@ create_hashjoin_plan(PlannerInfo *root,
inner_hashkeys,
skewTable,
skewColumn,
- skewInherit);
+ skewInherit,
+ joinpartprune_info_list);
/*
* Set Hash node's startup & total costs equal to total cost of input
@@ -5977,7 +6014,8 @@ make_hash(Plan *lefttree,
List *hashkeys,
Oid skewTable,
AttrNumber skewColumn,
- bool skewInherit)
+ bool skewInherit,
+ List *joinpartprune_info_list)
{
Hash *node = makeNode(Hash);
Plan *plan = &node->plan;
@@ -5991,6 +6029,7 @@ make_hash(Plan *lefttree,
node->skewTable = skewTable;
node->skewColumn = skewColumn;
node->skewInherit = skewInherit;
+ node->joinpartprune_info_list = joinpartprune_info_list;
return node;
}
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 97fa561e4e..7013f7f656 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -156,6 +156,11 @@ static Plan *set_mergeappend_references(PlannerInfo *root,
MergeAppend *mplan,
int rtoffset);
static void set_hash_references(PlannerInfo *root, Plan *plan, int rtoffset);
+static void set_joinpartitionprune_references(PlannerInfo *root,
+ List *joinpartprune_info_list,
+ indexed_tlist *outer_itlist,
+ int rtoffset,
+ double num_exec);
static Relids offset_relid_set(Relids relids, int rtoffset);
static Node *fix_scan_expr(PlannerInfo *root, Node *node,
int rtoffset, double num_exec);
@@ -1897,6 +1902,62 @@ set_hash_references(PlannerInfo *root, Plan *plan, int rtoffset)
/* Hash nodes don't have their own quals */
Assert(plan->qual == NIL);
+
+ set_joinpartitionprune_references(root,
+ hplan->joinpartprune_info_list,
+ outer_itlist,
+ rtoffset,
+ NUM_EXEC_TLIST(plan));
+}
+
+/*
+ * set_joinpartitionprune_references
+ * Do set_plan_references processing on JoinPartitionPruneInfos
+ */
+static void
+set_joinpartitionprune_references(PlannerInfo *root,
+ List *joinpartprune_info_list,
+ indexed_tlist *outer_itlist,
+ int rtoffset,
+ double num_exec)
+{
+ ListCell *l;
+
+ foreach(l, joinpartprune_info_list)
+ {
+ JoinPartitionPruneInfo *jpinfo = (JoinPartitionPruneInfo *) lfirst(l);
+ ListCell *l1;
+
+ foreach(l1, jpinfo->part_prune_info->prune_infos)
+ {
+ List *prune_infos = lfirst(l1);
+ ListCell *l2;
+
+ foreach(l2, prune_infos)
+ {
+ PartitionedRelPruneInfo *pinfo = lfirst(l2);
+
+ pinfo->rtindex += rtoffset;
+
+ pinfo->initial_pruning_steps = (List *)
+ fix_upper_expr(root,
+ (Node *) pinfo->initial_pruning_steps,
+ outer_itlist,
+ OUTER_VAR,
+ rtoffset,
+ NRM_EQUAL,
+ num_exec);
+ pinfo->exec_pruning_steps = (List *)
+ fix_upper_expr(root,
+ (Node *) pinfo->exec_pruning_steps,
+ outer_itlist,
+ OUTER_VAR,
+ rtoffset,
+ NRM_EQUAL,
+ num_exec);
+ }
+ }
+ }
}
/*
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 7179b22a05..b8bacfd22b 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -48,7 +48,9 @@
#include "optimizer/appendinfo.h"
#include "optimizer/cost.h"
#include "optimizer/optimizer.h"
+#include "optimizer/paramassign.h"
#include "optimizer/pathnode.h"
+#include "optimizer/restrictinfo.h"
#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partprune.h"
@@ -103,15 +105,16 @@ typedef enum PartClauseTarget
*
* gen_partprune_steps() initializes and returns an instance of this struct.
*
- * Note that has_mutable_op, has_mutable_arg, and has_exec_param are set if
- * we found any potentially-useful-for-pruning clause having those properties,
- * whether or not we actually used the clause in the steps list. This
- * definition allows us to skip the PARTTARGET_EXEC pass in some cases.
+ * Note that has_mutable_op, has_mutable_arg, has_exec_param and has_vars are
+ * set if we found any potentially-useful-for-pruning clause having those
+ * properties, whether or not we actually used the clause in the steps list.
+ * This definition allows us to skip the PARTTARGET_EXEC pass in some cases.
*/
typedef struct GeneratePruningStepsContext
{
/* Copies of input arguments for gen_partprune_steps: */
RelOptInfo *rel; /* the partitioned relation */
+ Bitmapset *available_rels; /* rels whose Vars may be used for pruning */
PartClauseTarget target; /* use-case we're generating steps for */
/* Result data: */
List *steps; /* list of PartitionPruneSteps */
@@ -119,6 +122,7 @@ typedef struct GeneratePruningStepsContext
bool has_mutable_arg; /* clauses include any mutable comparison
* values, *other than* exec params */
bool has_exec_param; /* clauses include any PARAM_EXEC params */
+ bool has_vars; /* clauses include any Vars from 'available_rels' */
bool contradictory; /* clauses were proven self-contradictory */
/* Working state: */
int next_step_id;
@@ -144,8 +148,10 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
+ Bitmapset *available_rels,
Bitmapset **matchedsubplans);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
+ Bitmapset *available_rels,
PartClauseTarget target,
GeneratePruningStepsContext *context);
static List *gen_partprune_steps_internal(GeneratePruningStepsContext *context,
@@ -206,6 +212,10 @@ static PartClauseMatchStatus match_boolean_partition_clause(Oid partopfamily,
static void partkey_datum_from_expr(PartitionPruneContext *context,
Expr *expr, int stateidx,
Datum *value, bool *isnull);
+static bool contain_forbidden_var_clause(Node *node,
+ GeneratePruningStepsContext *context);
+static bool contain_forbidden_var_clause_walker(Node *node,
+ GeneratePruningStepsContext *context);
/*
@@ -218,11 +228,14 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
+ * 'available_rels' is the relid set of rels whose Vars may be used for
+ * pruning.
*/
PartitionPruneInfo *
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
- List *prunequal)
+ List *prunequal,
+ Bitmapset *available_rels)
{
PartitionPruneInfo *pruneinfo;
Bitmapset *allmatchedsubplans = NULL;
@@ -315,6 +328,7 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
prunequal,
partrelids,
relid_subplan_map,
+ available_rels,
&matchedsubplans);
/* When pruning is possible, record the matched subplans */
@@ -362,6 +376,174 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
return pruneinfo;
}
+/*
+ * make_join_partition_pruneinfos
+ * Builds one JoinPartitionPruneInfo for each join at which join partition
+ * pruning is possible for this appendrel.
+ *
+ * 'parentrel' is the RelOptInfo for an appendrel, and 'subpaths' is the list
+ * of scan paths for its child rels.
+ */
+Bitmapset *
+make_join_partition_pruneinfos(PlannerInfo *root, RelOptInfo *parentrel,
+ Path *best_path, List *subpaths)
+{
+ Bitmapset *result = NULL;
+ ListCell *lc;
+
+ if (!IS_PARTITIONED_REL(parentrel))
+ return NULL;
+
+ foreach(lc, root->join_partition_prune_candidates)
+ {
+ JoinPartitionPruneCandidateInfo *candidate =
+ (JoinPartitionPruneCandidateInfo *) lfirst(lc);
+ PartitionPruneInfo *part_prune_info;
+ List *prunequal;
+ Relids joinrelids;
+ ListCell *l;
+ double prune_cost;
+
+ if (candidate == NULL)
+ continue;
+
+ /*
+ * Identify all joinclauses that are movable to this appendrel given
+ * these inner-side relids. Only those clauses can be used for join
+ * partition pruning.
+ */
+ joinrelids = bms_union(parentrel->relids, candidate->inner_relids);
+ prunequal = NIL;
+ foreach(l, candidate->joinrestrictinfo)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
+
+ if (join_clause_is_movable_into(rinfo,
+ parentrel->relids,
+ joinrelids))
+ prunequal = lappend(prunequal, rinfo);
+ }
+
+ if (prunequal == NIL)
+ continue;
+
+ /*
+ * Skip this join unless pruning is expected to save more than it costs.
+ */
+ prune_cost = compute_partprune_cost(root,
+ parentrel,
+ best_path->total_cost,
+ list_length(subpaths),
+ candidate->inner_relids,
+ candidate->inner_rows,
+ prunequal);
+ if (prune_cost > 0)
+ continue;
+
+ part_prune_info = make_partition_pruneinfo(root, parentrel,
+ subpaths,
+ prunequal,
+ candidate->inner_relids);
+
+ if (part_prune_info)
+ {
+ JoinPartitionPruneInfo *jpinfo;
+
+ jpinfo = palloc(sizeof(JoinPartitionPruneInfo));
+
+ jpinfo->part_prune_info = part_prune_info;
+ jpinfo->paramid = assign_special_exec_param(root);
+ jpinfo->nplans = list_length(subpaths);
+
+ candidate->joinpartprune_info_list =
+ lappend(candidate->joinpartprune_info_list, jpinfo);
+
+ result = bms_add_member(result, jpinfo->paramid);
+ }
+ }
+
+ return result;
+}
+
+/*
+ * prepare_join_partition_prune_candidate
+ * Check if join partition pruning is possible at this join and if so
+ * collect information required to build JoinPartitionPruneInfos.
+ *
+ * Note that we may build more than one JoinPartitionPruneInfo at one join, for
+ * different Append/MergeAppend paths.
+ */
+void
+prepare_join_partition_prune_candidate(PlannerInfo *root, JoinPath *jpath)
+{
+ JoinPartitionPruneCandidateInfo *candidate;
+
+ if (!enable_partition_pruning)
+ {
+ root->join_partition_prune_candidates =
+ lappend(root->join_partition_prune_candidates, NULL);
+ return;
+ }
+
+ /*
+ * We cannot perform join partition pruning if the join must emit outer
+ * rows that have no inner match (e.g. JOIN_LEFT, JOIN_FULL, JOIN_ANTI).
+ */
+ if (!(jpath->jointype == JOIN_INNER ||
+ jpath->jointype == JOIN_SEMI ||
+ jpath->jointype == JOIN_RIGHT ||
+ jpath->jointype == JOIN_RIGHT_ANTI))
+ {
+ root->join_partition_prune_candidates =
+ lappend(root->join_partition_prune_candidates, NULL);
+ return;
+ }
+
+ /*
+ * For now we only support HashJoin.
+ */
+ if (jpath->path.pathtype != T_HashJoin)
+ {
+ root->join_partition_prune_candidates =
+ lappend(root->join_partition_prune_candidates, NULL);
+ return;
+ }
+
+ candidate = palloc(sizeof(JoinPartitionPruneCandidateInfo));
+ candidate->joinrestrictinfo = jpath->joinrestrictinfo;
+ candidate->inner_relids = jpath->innerjoinpath->parent->relids;
+ candidate->inner_rows = jpath->innerjoinpath->parent->rows;
+ candidate->joinpartprune_info_list = NIL;
+
+ root->join_partition_prune_candidates =
+ lappend(root->join_partition_prune_candidates, candidate);
+}
+
+/*
+ * get_join_partition_prune_candidate
+ * Pop out the JoinPartitionPruneCandidateInfo for this join and retrieve
+ * the JoinPartitionPruneInfos.
+ */
+List *
+get_join_partition_prune_candidate(PlannerInfo *root)
+{
+ JoinPartitionPruneCandidateInfo *candidate;
+ List *result;
+
+ candidate = llast(root->join_partition_prune_candidates);
+ root->join_partition_prune_candidates =
+ list_delete_last(root->join_partition_prune_candidates);
+
+ if (candidate == NULL)
+ return NIL;
+
+ result = candidate->joinpartprune_info_list;
+
+ pfree(candidate);
+
+ return result;
+}
+
/*
* add_part_relids
* Add new info to a list of Bitmapsets of partitioned relids.
@@ -430,6 +612,8 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* partrelids: Set of RT indexes identifying relevant partitioned tables
* within a single partitioning hierarchy
* relid_subplan_map[]: maps child relation relids to subplan indexes
+ * available_rels: the relid set of rels whose Vars may be used for
+ * pruning.
* matchedsubplans: on success, receives the set of subplan indexes which
* were matched to this partition hierarchy
*
@@ -442,6 +626,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
+ Bitmapset *available_rels,
Bitmapset **matchedsubplans)
{
RelOptInfo *targetpart = NULL;
@@ -541,8 +726,8 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
*/
- gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
- &context);
+ gen_partprune_steps(subpart, partprunequal, available_rels,
+ PARTTARGET_INITIAL, &context);
if (context.contradictory)
{
@@ -569,14 +754,15 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
initial_pruning_steps = NIL;
/*
- * If no exec Params appear in potentially-usable pruning clauses,
- * then there's no point in even thinking about per-scan pruning.
+ * If no exec Params or available Vars appear in potentially-usable
+ * pruning clauses, then there's no point in even thinking about
+ * per-scan pruning.
*/
- if (context.has_exec_param)
+ if (context.has_exec_param || context.has_vars)
{
/* ... OK, we'd better think about it */
- gen_partprune_steps(subpart, partprunequal, PARTTARGET_EXEC,
- &context);
+ gen_partprune_steps(subpart, partprunequal, available_rels,
+ PARTTARGET_EXEC, &context);
if (context.contradictory)
{
@@ -589,11 +775,14 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
/*
* Detect which exec Params actually got used; the fact that some
* were in available clauses doesn't mean we actually used them.
- * Skip per-scan pruning if there are none.
*/
execparamids = get_partkey_exec_paramids(exec_pruning_steps);
- if (bms_is_empty(execparamids))
+ /*
+ * Skip per-scan pruning if no exec Params were actually used and
+ * no available Vars were found.
+ */
+ if (bms_is_empty(execparamids) && !context.has_vars)
exec_pruning_steps = NIL;
}
else
@@ -705,6 +894,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* Process 'clauses' (typically a rel's baserestrictinfo list of clauses)
* and create a list of "partition pruning steps".
*
+ * 'available_rels' is the relid set of rels whose Vars may be used for
+ * pruning.
+ *
* 'target' tells whether to generate pruning steps for planning (use
* immutable clauses only), or for executor startup (use any allowable
* clause except ones containing PARAM_EXEC Params), or for executor
@@ -714,12 +906,13 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* some subsidiary flags; see the GeneratePruningStepsContext typedef.
*/
static void
-gen_partprune_steps(RelOptInfo *rel, List *clauses, PartClauseTarget target,
- GeneratePruningStepsContext *context)
+gen_partprune_steps(RelOptInfo *rel, List *clauses, Bitmapset *available_rels,
+ PartClauseTarget target, GeneratePruningStepsContext *context)
{
/* Initialize all output values to zero/false/NULL */
memset(context, 0, sizeof(GeneratePruningStepsContext));
context->rel = rel;
+ context->available_rels = available_rels;
context->target = target;
/*
@@ -775,7 +968,7 @@ prune_append_rel_partitions(RelOptInfo *rel)
* If the clauses are found to be contradictory, we can return the empty
* set.
*/
- gen_partprune_steps(rel, clauses, PARTTARGET_PLANNER,
+ gen_partprune_steps(rel, clauses, NULL, PARTTARGET_PLANNER,
&gcontext);
if (gcontext.contradictory)
return NULL;
@@ -1962,9 +2155,10 @@ match_clause_to_partition_key(GeneratePruningStepsContext *context,
return PARTCLAUSE_UNSUPPORTED;
/*
- * We can never prune using an expression that contains Vars.
+ * We can never prune using an expression that contains Vars except
+ * for Vars belonging to context->available_rels.
*/
- if (contain_var_clause((Node *) expr))
+ if (contain_forbidden_var_clause((Node *) expr, context))
return PARTCLAUSE_UNSUPPORTED;
/*
@@ -2160,9 +2354,10 @@ match_clause_to_partition_key(GeneratePruningStepsContext *context,
return PARTCLAUSE_UNSUPPORTED;
/*
- * We can never prune using an expression that contains Vars.
+ * We can never prune using an expression that contains Vars except
+ * for Vars belonging to context->available_rels.
*/
- if (contain_var_clause((Node *) rightop))
+ if (contain_forbidden_var_clause((Node *) rightop, context))
return PARTCLAUSE_UNSUPPORTED;
/*
@@ -3712,3 +3907,54 @@ partkey_datum_from_expr(PartitionPruneContext *context,
*value = ExecEvalExprSwitchContext(exprstate, ectx, isnull);
}
}
+
+/*
+ * contain_forbidden_var_clause
+ * Recursively scan a clause to discover whether it contains any Var nodes
+ * (of the current query level) that do not belong to relations in
+ * context->available_rels.
+ *
+ * Returns true if any such Var node is found.
+ *
+ * Does not examine subqueries, therefore must only be used after reduction
+ * of sublinks to subplans!
+ */
+static bool
+contain_forbidden_var_clause(Node *node, GeneratePruningStepsContext *context)
+{
+ return contain_forbidden_var_clause_walker(node, context);
+}
+
+static bool
+contain_forbidden_var_clause_walker(Node *node, GeneratePruningStepsContext *context)
+{
+ if (node == NULL)
+ return false;
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ if (var->varlevelsup != 0)
+ return false;
+
+ if (!bms_is_member(var->varno, context->available_rels))
+ return true; /* abort the tree traversal and return true */
+
+ context->has_vars = true;
+
+ if (context->target != PARTTARGET_EXEC)
+ return true; /* abort the tree traversal and return true */
+
+ return false;
+ }
+ if (IsA(node, CurrentOfExpr))
+ return true;
+ if (IsA(node, PlaceHolderVar))
+ {
+ if (((PlaceHolderVar *) node)->phlevelsup == 0)
+ return true; /* abort the tree traversal and return true */
+ /* else fall through to check the contained expr */
+ }
+ return expression_tree_walker(node, contain_forbidden_var_clause_walker,
+ (void *) context);
+}
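
The Var handling in contain_forbidden_var_clause_walker() above is easier to follow on a toy expression tree. This is only a sketch under stated assumptions: the Sketch* types and sketch_* names are hypothetical, a 64-bit mask stands in for the available_rels Bitmapset, and the varlevelsup, CurrentOfExpr and PlaceHolderVar cases are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy expression node; real code walks PostgreSQL Node trees. */
typedef enum { SK_VAR, SK_CONST, SK_OP } SketchKind;

typedef struct SketchNode
{
    SketchKind  kind;
    int         varno;      /* only meaningful for SK_VAR, must be < 64 */
    struct SketchNode *left;
    struct SketchNode *right;
} SketchNode;

typedef enum { SK_TARGET_INITIAL, SK_TARGET_EXEC } SketchTarget;

typedef struct SketchContext
{
    uint64_t     available_rels;    /* bit i set => Vars of rel i are usable */
    SketchTarget target;
    bool         has_vars;          /* set when a usable Var is seen */
} SketchContext;

/* Returns true if the expression cannot be used for pruning in this pass. */
static bool
sketch_contains_forbidden_var(const SketchNode *node, SketchContext *cx)
{
    if (node == NULL)
        return false;
    if (node->kind == SK_VAR)
    {
        /* A Var from a rel outside available_rels is always forbidden. */
        if (((cx->available_rels >> node->varno) & 1) == 0)
            return true;

        /* Remember that a usable Var exists, even if this pass rejects it. */
        cx->has_vars = true;

        /* Usable Vars can only be evaluated in the per-scan (exec) pass. */
        return cx->target != SK_TARGET_EXEC;
    }
    return sketch_contains_forbidden_var(node->left, cx) ||
        sketch_contains_forbidden_var(node->right, cx);
}
```

The point of the design is that has_vars is set even when the non-exec pass rejects the expression. That flag is what lets make_partitionedrel_pruneinfo() decide to run the PARTTARGET_EXEC pass afterwards.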
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 15ec869ac8..720bcc1149 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -121,11 +121,26 @@ typedef struct PartitionPruneState
PartitionPruningData *partprunedata[FLEXIBLE_ARRAY_MEMBER];
} PartitionPruneState;
+/*
+ * JoinPartitionPruneState - State object required for plan nodes to perform
+ * join partition pruning.
+ */
+typedef struct JoinPartitionPruneState
+{
+ PartitionPruneState *part_prune_state;
+ int paramid;
+ int nplans;
+ bool finished;
+ Bitmapset *part_prune_result;
+} JoinPartitionPruneState;
+
extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
PartitionPruneInfo *pruneinfo,
Bitmapset **initially_valid_subplans);
+extern List *ExecInitJoinpartpruneList(PlanState *planstate, List *joinpartprune_info_list);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ PlanState *planstate);
#endif /* EXECPARTITION_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index cb714f4a19..9c8440d00c 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -2679,6 +2679,9 @@ typedef struct HashState
/* Parallel hash state. */
struct ParallelHashJoinState *parallel_state;
+
+ /* Infos for join partition pruning. */
+ List *joinpartprune_state_list;
} HashState;
/* ----------------
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 5702fbba60..297a683b4a 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -530,6 +530,9 @@ struct PlannerInfo
/* not-yet-assigned NestLoopParams */
List *curOuterParams;
+ /* a stack of JoinPartitionPruneInfos */
+ List *join_partition_prune_candidates;
+
/*
* These fields are workspace for setrefs.c. Each is an array
* corresponding to glob->subplans. (We could probably teach
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 1b787fe031..2749206e0b 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -275,6 +275,9 @@ typedef struct Append
/* Info for run-time subplan pruning; NULL if we're not doing that */
struct PartitionPruneInfo *part_prune_info;
+
+ /* Info for join partition pruning; NULL if we're not doing that */
+ Bitmapset *join_prune_paramids;
} Append;
/* ----------------
@@ -310,6 +313,9 @@ typedef struct MergeAppend
/* Info for run-time subplan pruning; NULL if we're not doing that */
struct PartitionPruneInfo *part_prune_info;
+
+ /* Info for join partition pruning; NULL if we're not doing that */
+ Bitmapset *join_prune_paramids;
} MergeAppend;
/* ----------------
@@ -1206,6 +1212,7 @@ typedef struct Hash
bool skewInherit; /* is outer join rel an inheritance tree? */
/* all other info is in the parent HashJoin node */
Cardinality rows_total; /* estimate total rows if parallel_aware */
+ List *joinpartprune_info_list; /* infos for join partition pruning */
} Hash;
/* ----------------
@@ -1552,6 +1559,29 @@ typedef struct PartitionPruneStepCombine
List *source_stepids;
} PartitionPruneStepCombine;
+/*
+ * JoinPartitionPruneCandidateInfo - Information required to build
+ * JoinPartitionPruneInfos.
+ */
+typedef struct JoinPartitionPruneCandidateInfo
+{
+ List *joinrestrictinfo;
+ Bitmapset *inner_relids;
+ double inner_rows;
+ List *joinpartprune_info_list;
+} JoinPartitionPruneCandidateInfo;
+
+/*
+ * JoinPartitionPruneInfo - Details required to allow the executor to prune
+ * partitions during join.
+ */
+typedef struct JoinPartitionPruneInfo
+{
+ PartitionPruneInfo *part_prune_info;
+ int paramid;
+ int nplans;
+} JoinPartitionPruneInfo;
+
/*
* Plan invalidation info
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index bee090ffc2..4ab9accf70 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -211,5 +211,9 @@ extern void set_foreign_size_estimates(PlannerInfo *root, RelOptInfo *rel);
extern PathTarget *set_pathtarget_cost_width(PlannerInfo *root, PathTarget *target);
extern double compute_bitmap_pages(PlannerInfo *root, RelOptInfo *baserel,
Path *bitmapqual, int loop_count, Cost *cost, double *tuple);
+extern double compute_partprune_cost(PlannerInfo *root, RelOptInfo *appendrel,
+ Cost append_total_cost, int append_nplans,
+ Relids inner_relids, double inner_rows,
+ List *prunequal);
#endif /* COST_H */
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 8636e04e37..899aa61b34 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -19,6 +19,8 @@
struct PlannerInfo; /* avoid including pathnodes.h here */
struct RelOptInfo;
+struct Path;
+struct JoinPath;
/*
@@ -73,7 +75,15 @@ typedef struct PartitionPruneContext
extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
struct RelOptInfo *parentrel,
List *subpaths,
- List *prunequal);
+ List *prunequal,
+ Bitmapset *available_rels);
+extern Bitmapset *make_join_partition_pruneinfos(struct PlannerInfo *root,
+ struct RelOptInfo *parentrel,
+ struct Path *best_path,
+ List *subpaths);
+extern void prepare_join_partition_prune_candidate(struct PlannerInfo *root,
+ struct JoinPath *jpath);
+extern List *get_join_partition_prune_candidate(struct PlannerInfo *root);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 1eb347503a..80cc8cdae6 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -2983,6 +2983,73 @@ order by tbl1.col1, tprt.col1;
------+------
(0 rows)
+-- join partition pruning
+delete from tbl1;
+insert into tbl1 values (501), (505);
+analyze tbl1, tprt;
+set enable_nestloop = off;
+set enable_mergejoin = off;
+set enable_hashjoin = on;
+explain (analyze, verbose, costs off, summary off, timing off)
+select * from tprt p1
+ inner join tprt p2 on p1.col1 = p2.col1
+ right join tbl1 t on p1.col1 = t.col1 and p2.col1 = t.col1;
+ QUERY PLAN
+--------------------------------------------------------------------------------
+ Hash Right Join (actual rows=2 loops=1)
+ Output: p1.col1, p2.col1, t.col1
+ Hash Cond: ((p1.col1 = t.col1) AND (p2.col1 = t.col1))
+ -> Hash Join (actual rows=3 loops=1)
+ Output: p1.col1, p2.col1
+ Hash Cond: (p1.col1 = p2.col1)
+ -> Append (actual rows=3 loops=1)
+ Join Partition Pruning: $0
+ -> Seq Scan on public.tprt_1 p1_1 (never executed)
+ Output: p1_1.col1
+ -> Seq Scan on public.tprt_2 p1_2 (actual rows=3 loops=1)
+ Output: p1_2.col1
+ -> Seq Scan on public.tprt_3 p1_3 (never executed)
+ Output: p1_3.col1
+ -> Seq Scan on public.tprt_4 p1_4 (never executed)
+ Output: p1_4.col1
+ -> Seq Scan on public.tprt_5 p1_5 (never executed)
+ Output: p1_5.col1
+ -> Seq Scan on public.tprt_6 p1_6 (never executed)
+ Output: p1_6.col1
+ -> Hash (actual rows=3 loops=1)
+ Output: p2.col1
+ Buckets: 1024 Batches: 1 Memory Usage: 9kB
+ -> Append (actual rows=3 loops=1)
+ Join Partition Pruning: $1
+ -> Seq Scan on public.tprt_1 p2_1 (never executed)
+ Output: p2_1.col1
+ -> Seq Scan on public.tprt_2 p2_2 (actual rows=3 loops=1)
+ Output: p2_2.col1
+ -> Seq Scan on public.tprt_3 p2_3 (never executed)
+ Output: p2_3.col1
+ -> Seq Scan on public.tprt_4 p2_4 (never executed)
+ Output: p2_4.col1
+ -> Seq Scan on public.tprt_5 p2_5 (never executed)
+ Output: p2_5.col1
+ -> Seq Scan on public.tprt_6 p2_6 (never executed)
+ Output: p2_6.col1
+ -> Hash (actual rows=2 loops=1)
+ Output: t.col1
+ Buckets: 1024 Batches: 1 Memory Usage: 9kB
+ Partition Prune: $0, $1
+ -> Seq Scan on public.tbl1 t (actual rows=2 loops=1)
+ Output: t.col1
+(43 rows)
+
+select * from tprt p1
+ inner join tprt p2 on p1.col1 = p2.col1
+ right join tbl1 t on p1.col1 = t.col1 and p2.col1 = t.col1;
+ col1 | col1 | col1
+------+------+------
+ 501 | 501 | 501
+ 505 | 505 | 505
+(2 rows)
+
drop table tbl1, tprt;
-- Test with columns defined in varying orders between each level
create table part_abc (a int not null, b int not null, c int not null) partition by list (a);
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index d1c60b8fe9..976a2e2b89 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -710,6 +710,24 @@ select tbl1.col1, tprt.col1 from tbl1
inner join tprt on tbl1.col1 = tprt.col1
order by tbl1.col1, tprt.col1;
+-- join partition pruning
+delete from tbl1;
+insert into tbl1 values (501), (505);
+analyze tbl1, tprt;
+
+set enable_nestloop = off;
+set enable_mergejoin = off;
+set enable_hashjoin = on;
+
+explain (analyze, verbose, costs off, summary off, timing off)
+select * from tprt p1
+ inner join tprt p2 on p1.col1 = p2.col1
+ right join tbl1 t on p1.col1 = t.col1 and p2.col1 = t.col1;
+
+select * from tprt p1
+ inner join tprt p2 on p1.col1 = p2.col1
+ right join tbl1 t on p1.col1 = t.col1 and p2.col1 = t.col1;
+
drop table tbl1, tprt;
-- Test with columns defined in varying orders between each level
--
2.31.0
On Tue, Aug 29, 2023 at 6:41 PM Richard Guo <guofenglinux@gmail.com> wrote:
So it seems that the new costing logic is quite crude and tends to be
very conservative, but it can help avoid the large overhead in the worst
cases. I think this might be a good start to push this patch forward.
Any thoughts or comments?
I rebased this patch over the latest master. Nothing changed except
that I revised the newly added test case to make it more stable.
However, the cfbot indicates that some test cases fail on FreeBSD [1]
(no failures on other platforms). I set up a FreeBSD-13 environment
locally but cannot reproduce the failure, so I must be doing something
wrong. Can anyone give me some hints or suggestions?
FYI, the failure looks like:
explain (costs off)
select p2.a, p1.c from permtest_parent p1 inner join permtest_parent p2
on p1.a = p2.a and left(p1.c, 3) ~ 'a1$';
- QUERY PLAN
-----------------------------------------------------
- Hash Join
- Hash Cond: (p2.a = p1.a)
- -> Seq Scan on permtest_grandchild p2
- -> Hash
- -> Seq Scan on permtest_grandchild p1
- Filter: ("left"(c, 3) ~ 'a1$'::text)
-(6 rows)
-
+ERROR: unrecognized node type: 1130127496
[1]: https://api.cirrus-ci.com/v1/artifact/task/5334808075698176/testrun/build/testrun/regress/regress/regression.diffs
Thanks
Richard
Attachments:
v3-0001-Support-run-time-partition-pruning-for-hash-join.patch (application/octet-stream)
From 69602dd1062ea4eeecf0008b34b4fca3617d2620 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Mon, 14 Aug 2023 14:55:26 +0800
Subject: [PATCH v3] Support run-time partition pruning for hash join
If we have a hash join with an Append node on the outer side, something
like
Hash Join
Hash Cond: (pt.a = t.a)
-> Append
-> Seq Scan on pt_p1 pt_1
-> Seq Scan on pt_p2 pt_2
-> Seq Scan on pt_p3 pt_3
-> Hash
-> Seq Scan on t
We can actually prune those subnodes of the Append that cannot possibly
contain any matching tuples from the other side of the join. To do
that, when building the Hash table, for each row from the inner side we
can compute the minimum set of subnodes that can possibly match the join
condition. By the time we have built the Hash table and start to
execute the Append node, we know which subnodes have survived and can
thus skip the others.
This patch implements this idea.
---
src/backend/commands/explain.c | 61 ++++
src/backend/executor/execPartition.c | 127 +++++++-
src/backend/executor/nodeAppend.c | 32 +-
src/backend/executor/nodeHash.c | 75 +++++
src/backend/executor/nodeHashjoin.c | 10 +
src/backend/executor/nodeMergeAppend.c | 22 +-
src/backend/optimizer/path/costsize.c | 106 +++++++
src/backend/optimizer/plan/createplan.c | 49 ++-
src/backend/optimizer/plan/setrefs.c | 61 ++++
src/backend/partitioning/partprune.c | 288 ++++++++++++++++--
src/include/executor/execPartition.h | 17 +-
src/include/nodes/execnodes.h | 3 +
src/include/nodes/pathnodes.h | 3 +
src/include/nodes/plannodes.h | 30 ++
src/include/optimizer/cost.h | 4 +
src/include/partitioning/partprune.h | 12 +-
src/test/regress/expected/partition_prune.out | 86 ++++++
src/test/regress/sql/partition_prune.sql | 39 +++
18 files changed, 976 insertions(+), 49 deletions(-)
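The pruning idea described in the commit message above can be sketched outside the executor as follows. This is only a minimal illustration under stated assumptions, not the patch's code: the PartLayout struct, the range bounds, and all function names here are hypothetical, and a 32-bit mask stands in for the Bitmapset that the patch uses. While the inner rows are consumed, the partitions that can match are unioned into a survivor set (stopping early once nothing is left to prune, much like the 'finished' flag), and results from several join levels are intersected before the Append decides which subplans to scan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical range-partition layout: partition i holds keys in
 * [bounds[i], bounds[i + 1]).  At most 32 partitions, so a uint32_t can
 * play the role of a Bitmapset of subplan indexes.
 */
typedef struct PartLayout
{
	const int  *bounds;		/* nparts + 1 ascending bounds */
	int			nparts;
} PartLayout;

/* Bitmask with one bit set for every partition. */
static uint32_t
all_partitions(int nparts)
{
	assert(nparts >= 1 && nparts <= 32);
	return (nparts == 32) ? UINT32_MAX : ((UINT32_C(1) << nparts) - 1);
}

/* Binary search for the partition containing 'key'; -1 if there is none. */
static int
find_partition(const PartLayout *pl, int key)
{
	int			lo = 0;
	int			hi = pl->nparts - 1;

	if (key < pl->bounds[0] || key >= pl->bounds[pl->nparts])
		return -1;
	while (lo < hi)
	{
		int			mid = lo + (hi - lo + 1) / 2;

		if (pl->bounds[mid] <= key)
			lo = mid;
		else
			hi = mid - 1;
	}
	return lo;
}

/*
 * Union the partitions that can match each inner-side key, as would be
 * done row by row while the hash table is being built.
 */
static uint32_t
surviving_partitions(const PartLayout *pl, const int *inner_keys, size_t nkeys)
{
	uint32_t	all = all_partitions(pl->nparts);
	uint32_t	survivors = 0;

	for (size_t i = 0; i < nkeys; i++)
	{
		int			p = find_partition(pl, inner_keys[i]);

		if (p >= 0)
			survivors |= UINT32_C(1) << p;
		if (survivors == all)
			break;				/* every partition survives; stop pruning */
	}
	return survivors;
}

/* Intersect the survivor sets computed at several join levels. */
static uint32_t
combine_join_levels(const uint32_t *per_join, size_t njoins, int nparts)
{
	uint32_t	result = all_partitions(nparts);

	for (size_t i = 0; i < njoins; i++)
		result &= per_join[i];
	return result;
}

/* Would the Append scan this subplan? */
static bool
subplan_survives(uint32_t survivors, int subplan)
{
	return ((survivors >> subplan) & 1) != 0;
}
```

With assumed bounds {1, 501, 1001, 2001} and inner keys 501 and 505, only the second partition survives, which is the same shape as the EXPLAIN output in the new regression test, where a single child of each Append is actually scanned.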
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index f1d71bc54e..c51cf6beb6 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -18,6 +18,7 @@
#include "commands/createas.h"
#include "commands/defrem.h"
#include "commands/prepare.h"
+#include "executor/execPartition.h"
#include "executor/nodeHash.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
@@ -118,6 +119,9 @@ static void show_instrumentation_count(const char *qlabel, int which,
PlanState *planstate, ExplainState *es);
static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
static void show_eval_params(Bitmapset *bms_params, ExplainState *es);
+static void show_join_pruning_result_info(Bitmapset *join_prune_paramids,
+ ExplainState *es);
+static void show_joinpartprune_info(HashState *hashstate, ExplainState *es);
static const char *explain_get_index_name(Oid indexId);
static void show_buffer_usage(ExplainState *es, const BufferUsage *usage,
bool planning);
@@ -2057,9 +2061,17 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_incremental_sort_info(castNode(IncrementalSortState, planstate),
es);
break;
+ case T_Append:
+ if (es->verbose)
+ show_join_pruning_result_info(((Append *) plan)->join_prune_paramids,
+ es);
+ break;
case T_MergeAppend:
show_merge_append_keys(castNode(MergeAppendState, planstate),
ancestors, es);
+ if (es->verbose)
+ show_join_pruning_result_info(((MergeAppend *) plan)->join_prune_paramids,
+ es);
break;
case T_Result:
show_upper_qual((List *) ((Result *) plan)->resconstantqual,
@@ -2075,6 +2087,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
break;
case T_Hash:
show_hash_info(castNode(HashState, planstate), es);
+ if (es->verbose)
+ show_joinpartprune_info(castNode(HashState, planstate), es);
break;
case T_Memoize:
show_memoize_info(castNode(MemoizeState, planstate), ancestors,
@@ -3515,6 +3529,53 @@ show_eval_params(Bitmapset *bms_params, ExplainState *es)
ExplainPropertyList("Params Evaluated", params, es);
}
+/*
+ * Show join partition pruning results at Append/MergeAppend nodes.
+ */
+static void
+show_join_pruning_result_info(Bitmapset *join_prune_paramids, ExplainState *es)
+{
+ int paramid = -1;
+ List *params = NIL;
+
+ if (bms_is_empty(join_prune_paramids))
+ return;
+
+ while ((paramid = bms_next_member(join_prune_paramids, paramid)) >= 0)
+ {
+ char param[32];
+
+ snprintf(param, sizeof(param), "$%d", paramid);
+ params = lappend(params, pstrdup(param));
+ }
+
+ ExplainPropertyList("Join Partition Pruning", params, es);
+}
+
+/*
+ * Show join partition pruning infos at Hash nodes.
+ */
+static void
+show_joinpartprune_info(HashState *hashstate, ExplainState *es)
+{
+ List *params = NIL;
+ ListCell *lc;
+
+ if (!hashstate->joinpartprune_state_list)
+ return;
+
+ foreach(lc, hashstate->joinpartprune_state_list)
+ {
+ JoinPartitionPruneState *jpstate = (JoinPartitionPruneState *) lfirst(lc);
+ char param[32];
+
+ snprintf(param, sizeof(param), "$%d", jpstate->paramid);
+ params = lappend(params, pstrdup(param));
+ }
+
+ ExplainPropertyList("Partition Prune", params, es);
+}
+
/*
* Fetch the name of an index in an EXPLAIN
*
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index f6c34328b8..35a9149a39 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -199,6 +199,8 @@ static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
Bitmapset **validsubplans);
+static bool get_join_prune_matching_subplans(PlanState *planstate,
+ Bitmapset **partset);
/*
@@ -1806,7 +1808,7 @@ ExecInitPartitionPruning(PlanState *planstate,
* Perform an initial partition prune pass, if required.
*/
if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true, NULL);
else
{
/* No pruning, so we'll need to initialize all subplans */
@@ -1836,6 +1838,37 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecInitJoinpartpruneList
+ * Initialize data structures needed for join partition pruning
+ */
+List *
+ExecInitJoinpartpruneList(PlanState *planstate,
+ List *joinpartprune_info_list)
+{
+ ListCell *lc;
+ List *result = NIL;
+
+ foreach(lc, joinpartprune_info_list)
+ {
+ JoinPartitionPruneInfo *jpinfo = (JoinPartitionPruneInfo *) lfirst(lc);
+ JoinPartitionPruneState *jpstate = palloc(sizeof(JoinPartitionPruneState));
+
+ jpstate->part_prune_state =
+ CreatePartitionPruneState(planstate, jpinfo->part_prune_info);
+ Assert(jpstate->part_prune_state->do_exec_prune);
+
+ jpstate->paramid = jpinfo->paramid;
+ jpstate->nplans = jpinfo->nplans;
+ jpstate->finished = false;
+ jpstate->part_prune_result = NULL;
+
+ result = lappend(result, jpstate);
+ }
+
+ return result;
+}
+
/*
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
@@ -2273,7 +2306,9 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
/*
* ExecFindMatchingSubPlans
* Determine which subplans match the pruning steps detailed in
- * 'prunestate' for the current comparison expression values.
+ * 'prunestate' if any for the current comparison expression values, and
+ * meanwhile match the join partition pruning results if any stored in
+ * Append/MergeAppend node's join_prune_paramids.
*
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
@@ -2281,11 +2316,30 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ PlanState *planstate)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
int i;
+ Bitmapset *join_prune_partset = NULL;
+ bool do_join_prune;
+
+ /* Retrieve the join partition pruning results if any */
+ do_join_prune =
+ get_join_prune_matching_subplans(planstate, &join_prune_partset);
+
+ /*
+ * Either we're here on partition prune done according to the pruning steps
+ * detailed in 'prunestate', or we have done join partition prune.
+ */
+ Assert(do_join_prune || prunestate != NULL);
+
+ /*
+ * If there is no 'prunestate', then rely entirely on join pruning.
+ */
+ if (prunestate == NULL)
+ return join_prune_partset;
/*
* Either we're here on the initial prune done during pruning
@@ -2326,6 +2380,10 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Add in any subplans that partition pruning didn't account for */
result = bms_add_members(result, prunestate->other_subplans);
+ /* Intersect join partition pruning results */
+ if (do_join_prune)
+ result = bms_intersect(result, join_prune_partset);
+
MemoryContextSwitchTo(oldcontext);
/* Copy result out of the temp context before we reset it */
@@ -2396,3 +2454,66 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
}
}
}
+
+/*
+ * get_join_prune_matching_subplans
+ * Retrieve the join partition pruning results if any stored in
+ * Append/MergeAppend node's join_prune_paramids. Return true if we can
+ * do join partition pruning, otherwise return false.
+ *
+ * Adds valid (non-prunable) subplan IDs to *partset
+ */
+static bool
+get_join_prune_matching_subplans(PlanState *planstate, Bitmapset **partset)
+{
+ Bitmapset *join_prune_paramids;
+ int nplans;
+ int paramid;
+
+ if (planstate == NULL)
+ return false;
+
+ if (IsA(planstate, AppendState))
+ {
+ join_prune_paramids =
+ ((Append *) planstate->plan)->join_prune_paramids;
+ nplans = ((AppendState *) planstate)->as_nplans;
+ }
+ else if (IsA(planstate, MergeAppendState))
+ {
+ join_prune_paramids =
+ ((MergeAppend *) planstate->plan)->join_prune_paramids;
+ nplans = ((MergeAppendState *) planstate)->ms_nplans;
+ }
+ else
+ {
+ elog(ERROR, "unrecognized node type: %d", (int) nodeTag(planstate));
+ return false;
+ }
+
+ if (bms_is_empty(join_prune_paramids))
+ return false;
+
+ Assert(nplans > 0);
+ *partset = bms_add_range(NULL, 0, nplans - 1);
+
+ paramid = -1;
+ while ((paramid = bms_next_member(join_prune_paramids, paramid)) >= 0)
+ {
+ ParamExecData *param;
+ JoinPartitionPruneState *jpstate;
+
+ param = &(planstate->state->es_param_exec_vals[paramid]);
+ Assert(param->execPlan == NULL);
+ Assert(!param->isnull);
+ jpstate = (JoinPartitionPruneState *) DatumGetPointer(param->value);
+
+ if (jpstate != NULL)
+ *partset = bms_intersect(*partset, jpstate->part_prune_result);
+ else /* the Hash node for this pruning has not been executed */
+ elog(WARNING, "Join partition pruning $%d has not been performed yet.",
+ paramid);
+ }
+
+ return true;
+}
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 609df6b9e6..c8dd8583d2 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -151,11 +151,13 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
nplans = bms_num_members(validsubplans);
/*
- * When no run-time pruning is required and there's at least one
- * subplan, we can fill as_valid_subplans immediately, preventing
- * later calls to ExecFindMatchingSubPlans.
+ * When no run-time pruning or join pruning is required and there's at
+ * least one subplan, we can fill as_valid_subplans immediately,
+ * preventing later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (!prunestate->do_exec_prune &&
+ bms_is_empty(node->join_prune_paramids) &&
+ nplans > 0)
{
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
appendstate->as_valid_subplans_identified = true;
@@ -170,10 +172,18 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplans as valid; they must also all be initialized.
*/
Assert(nplans > 0);
- appendstate->as_valid_subplans = validsubplans =
- bms_add_range(NULL, 0, nplans - 1);
- appendstate->as_valid_subplans_identified = true;
+ validsubplans = bms_add_range(NULL, 0, nplans - 1);
appendstate->as_prune_state = NULL;
+
+ /*
+ * When join pruning is not enabled we can fill as_valid_subplans
+ * immediately, preventing later calls to ExecFindMatchingSubPlans.
+ */
+ if (bms_is_empty(node->join_prune_paramids))
+ {
+ appendstate->as_valid_subplans = validsubplans;
+ appendstate->as_valid_subplans_identified = true;
+ }
}
/*
@@ -580,7 +590,7 @@ choose_next_subplan_locally(AppendState *node)
else if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, &node->ps);
node->as_valid_subplans_identified = true;
}
@@ -647,7 +657,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, &node->ps);
node->as_valid_subplans_identified = true;
/*
@@ -723,7 +733,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, &node->ps);
node->as_valid_subplans_identified = true;
mark_invalid_subplans_as_finished(node);
@@ -876,7 +886,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, &node->ps);
node->as_valid_subplans_identified = true;
classify_matching_subplans(node);
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index e72f0986c2..9ca8bf49d9 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -31,6 +31,7 @@
#include "catalog/pg_statistic.h"
#include "commands/tablespace.h"
#include "executor/execdebug.h"
+#include "executor/execPartition.h"
#include "executor/hashjoin.h"
#include "executor/nodeHash.h"
#include "executor/nodeHashjoin.h"
@@ -48,6 +49,8 @@ static void ExecHashIncreaseNumBatches(HashJoinTable hashtable);
static void ExecHashIncreaseNumBuckets(HashJoinTable hashtable);
static void ExecParallelHashIncreaseNumBatches(HashJoinTable hashtable);
static void ExecParallelHashIncreaseNumBuckets(HashJoinTable hashtable);
+static void ExecJoinPartitionPrune(HashState *node);
+static void ExecStoreJoinPartitionPruneResult(HashState *node);
static void ExecHashBuildSkewHash(HashJoinTable hashtable, Hash *node,
int mcvsToUse);
static void ExecHashSkewTableInsert(HashJoinTable hashtable,
@@ -189,8 +192,14 @@ MultiExecPrivateHash(HashState *node)
}
hashtable->totalTuples += 1;
}
+
+ /* Perform join partition pruning */
+ ExecJoinPartitionPrune(node);
}
+ /* Store the surviving partitions for Append/MergeAppend nodes */
+ ExecStoreJoinPartitionPruneResult(node);
+
/* resize the hash table if needed (NTUP_PER_BUCKET exceeded) */
if (hashtable->nbuckets != hashtable->nbuckets_optimal)
ExecHashIncreaseNumBuckets(hashtable);
@@ -401,6 +410,12 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
hashstate->hashkeys =
ExecInitExprList(node->hashkeys, (PlanState *) hashstate);
+ /*
+ * initialize join partition pruning infos
+ */
+ hashstate->joinpartprune_state_list =
+ ExecInitJoinpartpruneList(&hashstate->ps, node->joinpartprune_info_list);
+
return hashstate;
}
@@ -1601,6 +1616,56 @@ ExecParallelHashIncreaseNumBuckets(HashJoinTable hashtable)
}
}
+/*
+ * ExecJoinPartitionPrune
+ * Perform join partition pruning at this join for each
+ * JoinPartitionPruneState.
+ */
+static void
+ExecJoinPartitionPrune(HashState *node)
+{
+ ListCell *lc;
+
+ foreach(lc, node->joinpartprune_state_list)
+ {
+ JoinPartitionPruneState *jpstate = (JoinPartitionPruneState *) lfirst(lc);
+ Bitmapset *matching_subPlans;
+
+ if (jpstate->finished)
+ continue;
+
+ matching_subPlans =
+ ExecFindMatchingSubPlans(jpstate->part_prune_state, false, NULL);
+ jpstate->part_prune_result =
+ bms_add_members(jpstate->part_prune_result, matching_subPlans);
+
+ if (bms_num_members(jpstate->part_prune_result) == jpstate->nplans)
+ jpstate->finished = true;
+ }
+}
+
+/*
+ * ExecStoreJoinPartitionPruneResult
+ * For each JoinPartitionPruneState, store the set of surviving partitions
+ * to make it available for the Append/MergeAppend node.
+ */
+static void
+ExecStoreJoinPartitionPruneResult(HashState *node)
+{
+ ListCell *lc;
+
+ foreach(lc, node->joinpartprune_state_list)
+ {
+ JoinPartitionPruneState *jpstate = (JoinPartitionPruneState *) lfirst(lc);
+ ParamExecData *param;
+
+ param = &(node->ps.state->es_param_exec_vals[jpstate->paramid]);
+ Assert(param->execPlan == NULL);
+ Assert(!param->isnull);
+ param->value = PointerGetDatum(jpstate);
+ }
+}
+
/*
* ExecHashTableInsert
* insert a tuple into the hash table depending on the hash value
@@ -2345,6 +2410,16 @@ void
ExecReScanHash(HashState *node)
{
PlanState *outerPlan = outerPlanState(node);
+ ListCell *lc;
+
+ /* reset the state in JoinPartitionPruneStates */
+ foreach(lc, node->joinpartprune_state_list)
+ {
+ JoinPartitionPruneState *jpstate = (JoinPartitionPruneState *) lfirst(lc);
+
+ jpstate->finished = false;
+ jpstate->part_prune_result = NULL;
+ }
/*
* if chgParam of subnode is not null then plan will be re-scanned by
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 25a2d78f15..ddca824206 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -311,6 +311,16 @@ ExecHashJoinImpl(PlanState *pstate, bool parallel)
*/
node->hj_FirstOuterTupleSlot = NULL;
}
+ else if (hashNode->joinpartprune_state_list != NIL)
+ {
+ /*
+ * Give the hash node a chance to run join partition
+ * pruning if there is any JoinPartitionPruneState that can
+ * be evaluated at it. So do not apply the empty-outer
+ * optimization in this case.
+ */
+ node->hj_FirstOuterTupleSlot = NULL;
+ }
else if (HJ_FILL_OUTER(node) ||
(outerNode->plan->startup_cost < hashNode->ps.plan->total_cost &&
!node->hj_OuterNotEmpty))
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 21b5726e6e..9eb276abc8 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -99,11 +99,13 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
nplans = bms_num_members(validsubplans);
/*
- * When no run-time pruning is required and there's at least one
- * subplan, we can fill ms_valid_subplans immediately, preventing
- * later calls to ExecFindMatchingSubPlans.
+ * When no run-time pruning or join pruning is required and there's at
+ * least one subplan, we can fill ms_valid_subplans immediately,
+ * preventing later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (!prunestate->do_exec_prune &&
+ bms_is_empty(node->join_prune_paramids) &&
+ nplans > 0)
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -115,9 +117,15 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplans as valid; they must also all be initialized.
*/
Assert(nplans > 0);
- mergestate->ms_valid_subplans = validsubplans =
- bms_add_range(NULL, 0, nplans - 1);
+ validsubplans = bms_add_range(NULL, 0, nplans - 1);
mergestate->ms_prune_state = NULL;
+
+ /*
+ * When join pruning is not enabled we can fill ms_valid_subplans
+ * immediately, preventing later calls to ExecFindMatchingSubPlans.
+ */
+ if (bms_is_empty(node->join_prune_paramids))
+ mergestate->ms_valid_subplans = validsubplans;
}
mergeplanstates = (PlanState **) palloc(nplans * sizeof(PlanState *));
@@ -218,7 +226,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, &node->ps);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index d6ceafd51c..9bdc88a9db 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -173,6 +173,10 @@ static void get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
static bool has_indexed_join_quals(NestPath *path);
static double approx_tuple_count(PlannerInfo *root, JoinPath *path,
List *quals);
+static double get_joinrel_matching_outer_size(PlannerInfo *root,
+ RelOptInfo *outer_rel,
+ Relids inner_relids,
+ List *restrictlist);
static double calc_joinrel_size_estimate(PlannerInfo *root,
RelOptInfo *joinrel,
RelOptInfo *outer_rel,
@@ -5380,6 +5384,61 @@ get_parameterized_joinrel_size(PlannerInfo *root, RelOptInfo *rel,
return nrows;
}
+/*
+ * get_joinrel_matching_outer_size
+ * Make a size estimate for the outer side that matches the inner side.
+ */
+static double
+get_joinrel_matching_outer_size(PlannerInfo *root,
+ RelOptInfo *outer_rel,
+ Relids inner_relids,
+ List *restrictlist)
+{
+ double nrows;
+ Selectivity fkselec;
+ Selectivity jselec;
+ SpecialJoinInfo *sjinfo;
+ SpecialJoinInfo sjinfo_data;
+
+ sjinfo = &sjinfo_data;
+ sjinfo->type = T_SpecialJoinInfo;
+ sjinfo->min_lefthand = outer_rel->relids;
+ sjinfo->min_righthand = inner_relids;
+ sjinfo->syn_lefthand = outer_rel->relids;
+ sjinfo->syn_righthand = inner_relids;
+ sjinfo->jointype = JOIN_SEMI;
+ sjinfo->ojrelid = 0;
+ sjinfo->commute_above_l = NULL;
+ sjinfo->commute_above_r = NULL;
+ sjinfo->commute_below_l = NULL;
+ sjinfo->commute_below_r = NULL;
+ /* we don't bother trying to make the remaining fields valid */
+ sjinfo->lhs_strict = false;
+ sjinfo->semi_can_btree = false;
+ sjinfo->semi_can_hash = false;
+ sjinfo->semi_operators = NIL;
+ sjinfo->semi_rhs_exprs = NIL;
+
+ fkselec = get_foreign_key_join_selectivity(root,
+ outer_rel->relids,
+ inner_relids,
+ sjinfo,
+ &restrictlist);
+ jselec = clauselist_selectivity(root,
+ restrictlist,
+ 0,
+ sjinfo->jointype,
+ sjinfo);
+
+ nrows = outer_rel->rows * fkselec * jselec;
+ nrows = clamp_row_est(nrows);
+
+ /* For safety, make sure result is not more than the base estimate */
+ if (nrows > outer_rel->rows)
+ nrows = outer_rel->rows;
+ return nrows;
+}
+
/*
* calc_joinrel_size_estimate
* Workhorse for set_joinrel_size_estimates and
@@ -6495,3 +6554,50 @@ compute_bitmap_pages(PlannerInfo *root, RelOptInfo *baserel, Path *bitmapqual,
return pages_fetched;
}
+
+/*
+ * compute_partprune_cost
+ * Compute the net cost of join partition pruning (overhead minus savings).
+ */
+double
+compute_partprune_cost(PlannerInfo *root, RelOptInfo *appendrel,
+ Cost append_total_cost, int append_nplans,
+ Relids inner_relids, double inner_rows,
+ List *prunequal)
+{
+ Cost prune_cost;
+ Cost saved_cost;
+ double matching_outer_rows;
+ double unmatched_nplans;
+
+ switch (appendrel->part_scheme->strategy)
+ {
+
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ prune_cost = cpu_operator_cost * LOG2(append_nplans) * inner_rows;
+ break;
+ case PARTITION_STRATEGY_HASH:
+ prune_cost = cpu_operator_cost * append_nplans * inner_rows;
+ break;
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ (int) appendrel->part_scheme->strategy);
+ break;
+ }
+
+ matching_outer_rows =
+ get_joinrel_matching_outer_size(root,
+ appendrel,
+ inner_relids,
+ prunequal);
+
+ /*
+ * We assume that each matching outer row falls into a different partition,
+ * which is the worst case for pruning.
+ */
+ unmatched_nplans = append_nplans - Min(matching_outer_rows, append_nplans);
+ saved_cost = (unmatched_nplans / append_nplans) * append_total_cost;
+
+ return prune_cost - saved_cost;
+}
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 34ca6d4ac2..308ff452d3 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -242,7 +242,8 @@ static Hash *make_hash(Plan *lefttree,
List *hashkeys,
Oid skewTable,
AttrNumber skewColumn,
- bool skewInherit);
+ bool skewInherit,
+ List *joinpartprune_info_list);
static MergeJoin *make_mergejoin(List *tlist,
List *joinclauses, List *otherclauses,
List *mergeclauses,
@@ -342,6 +343,7 @@ create_plan(PlannerInfo *root, Path *best_path)
/* Initialize this module's workspace in PlannerInfo */
root->curOuterRels = NULL;
root->curOuterParams = NIL;
+ root->join_partition_prune_candidates = NIL;
/* Recursively process the path tree, demanding the correct tlist result */
plan = create_plan_recurse(root, best_path, CP_EXACT_TLIST);
@@ -369,6 +371,8 @@ create_plan(PlannerInfo *root, Path *best_path)
if (root->curOuterParams != NIL)
elog(ERROR, "failed to assign all NestLoopParams to plan nodes");
+ Assert(root->join_partition_prune_candidates == NIL);
+
/*
* Reset plan_params to ensure param IDs used for nestloop params are not
* re-used later
@@ -1223,6 +1227,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
int nasyncplans = 0;
RelOptInfo *rel = best_path->path.parent;
PartitionPruneInfo *partpruneinfo = NULL;
+ Bitmapset *join_prune_paramids = NULL;
int nodenumsortkeys = 0;
AttrNumber *nodeSortColIdx = NULL;
Oid *nodeSortOperators = NULL;
@@ -1377,6 +1382,8 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
* do partition pruning.
+ *
+ * Also gather information needed by the executor to do join pruning.
*/
if (enable_partition_pruning)
{
@@ -1399,13 +1406,20 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
partpruneinfo =
make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ NULL);
+
+ join_prune_paramids =
+ make_join_partition_pruneinfos(root, rel,
+ (Path *) best_path,
+ best_path->subpaths);
}
plan->appendplans = subplans;
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
plan->part_prune_info = partpruneinfo;
+ plan->join_prune_paramids = join_prune_paramids;
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -1445,6 +1459,7 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
PartitionPruneInfo *partpruneinfo = NULL;
+ Bitmapset *join_prune_paramids = NULL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1541,6 +1556,8 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
* do partition pruning.
+ *
+ * Also gather information needed by the executor to do join pruning.
*/
if (enable_partition_pruning)
{
@@ -1554,11 +1571,18 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
if (prunequal != NIL)
partpruneinfo = make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ NULL);
+
+ join_prune_paramids =
+ make_join_partition_pruneinfos(root, rel,
+ (Path *) best_path,
+ best_path->subpaths);
}
node->mergeplans = subplans;
node->part_prune_info = partpruneinfo;
+ node->join_prune_paramids = join_prune_paramids;
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
@@ -4734,6 +4758,13 @@ create_hashjoin_plan(PlannerInfo *root,
AttrNumber skewColumn = InvalidAttrNumber;
bool skewInherit = false;
ListCell *lc;
+ List *joinpartprune_info_list;
+
+ /*
+ * Collect information required to build JoinPartitionPruneInfos at this
+ * join.
+ */
+ prepare_join_partition_prune_candidate(root, &best_path->jpath);
/*
* HashJoin can project, so we don't have to demand exact tlists from the
@@ -4745,6 +4776,11 @@ create_hashjoin_plan(PlannerInfo *root,
outer_plan = create_plan_recurse(root, best_path->jpath.outerjoinpath,
(best_path->num_batches > 1) ? CP_SMALL_TLIST : 0);
+ /*
+ * Retrieve all the JoinPartitionPruneInfos for this join.
+ */
+ joinpartprune_info_list = get_join_partition_prune_candidate(root);
+
inner_plan = create_plan_recurse(root, best_path->jpath.innerjoinpath,
CP_SMALL_TLIST);
@@ -4850,7 +4886,8 @@ create_hashjoin_plan(PlannerInfo *root,
inner_hashkeys,
skewTable,
skewColumn,
- skewInherit);
+ skewInherit,
+ joinpartprune_info_list);
/*
* Set Hash node's startup & total costs equal to total cost of input
@@ -5977,7 +6014,8 @@ make_hash(Plan *lefttree,
List *hashkeys,
Oid skewTable,
AttrNumber skewColumn,
- bool skewInherit)
+ bool skewInherit,
+ List *joinpartprune_info_list)
{
Hash *node = makeNode(Hash);
Plan *plan = &node->plan;
@@ -5991,6 +6029,7 @@ make_hash(Plan *lefttree,
node->skewTable = skewTable;
node->skewColumn = skewColumn;
node->skewInherit = skewInherit;
+ node->joinpartprune_info_list = joinpartprune_info_list;
return node;
}
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index fc3709510d..c416e7ccda 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -156,6 +156,11 @@ static Plan *set_mergeappend_references(PlannerInfo *root,
MergeAppend *mplan,
int rtoffset);
static void set_hash_references(PlannerInfo *root, Plan *plan, int rtoffset);
+static void set_joinpartitionprune_references(PlannerInfo *root,
+ List *joinpartprune_info_list,
+ indexed_tlist *outer_itlist,
+ int rtoffset,
+ double num_exec);
static Relids offset_relid_set(Relids relids, int rtoffset);
static Node *fix_scan_expr(PlannerInfo *root, Node *node,
int rtoffset, double num_exec);
@@ -1897,6 +1902,62 @@ set_hash_references(PlannerInfo *root, Plan *plan, int rtoffset)
/* Hash nodes don't have their own quals */
Assert(plan->qual == NIL);
+
+ set_joinpartitionprune_references(root,
+ hplan->joinpartprune_info_list,
+ outer_itlist,
+ rtoffset,
+ NUM_EXEC_TLIST(plan));
+}
+
+/*
+ * set_joinpartitionprune_references
+ * Do set_plan_references processing on JoinPartitionPruneInfos
+ */
+static void
+set_joinpartitionprune_references(PlannerInfo *root,
+ List *joinpartprune_info_list,
+ indexed_tlist *outer_itlist,
+ int rtoffset,
+ double num_exec)
+{
+ ListCell *l;
+
+ foreach(l, joinpartprune_info_list)
+ {
+ JoinPartitionPruneInfo *jpinfo = (JoinPartitionPruneInfo *) lfirst(l);
+ ListCell *l1;
+
+ foreach(l1, jpinfo->part_prune_info->prune_infos)
+ {
+ List *prune_infos = lfirst(l1);
+ ListCell *l2;
+
+ foreach(l2, prune_infos)
+ {
+ PartitionedRelPruneInfo *pinfo = lfirst(l2);
+
+ pinfo->rtindex += rtoffset;
+
+ pinfo->initial_pruning_steps = (List *)
+ fix_upper_expr(root,
+ (Node *) pinfo->initial_pruning_steps,
+ outer_itlist,
+ OUTER_VAR,
+ rtoffset,
+ NRM_EQUAL,
+ num_exec);
+ pinfo->exec_pruning_steps = (List *)
+ fix_upper_expr(root,
+ (Node *) pinfo->exec_pruning_steps,
+ outer_itlist,
+ OUTER_VAR,
+ rtoffset,
+ NRM_EQUAL,
+ num_exec);
+ }
+ }
+ }
}
/*
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 3f31ecc956..0125ab0fed 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -48,7 +48,9 @@
#include "optimizer/appendinfo.h"
#include "optimizer/cost.h"
#include "optimizer/optimizer.h"
+#include "optimizer/paramassign.h"
#include "optimizer/pathnode.h"
+#include "optimizer/restrictinfo.h"
#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partprune.h"
@@ -103,15 +105,16 @@ typedef enum PartClauseTarget
*
* gen_partprune_steps() initializes and returns an instance of this struct.
*
- * Note that has_mutable_op, has_mutable_arg, and has_exec_param are set if
- * we found any potentially-useful-for-pruning clause having those properties,
- * whether or not we actually used the clause in the steps list. This
- * definition allows us to skip the PARTTARGET_EXEC pass in some cases.
+ * Note that has_mutable_op, has_mutable_arg, has_exec_param and has_vars are
+ * set if we found any potentially-useful-for-pruning clause having those
+ * properties, whether or not we actually used the clause in the steps list.
+ * This definition allows us to skip the PARTTARGET_EXEC pass in some cases.
*/
typedef struct GeneratePruningStepsContext
{
/* Copies of input arguments for gen_partprune_steps: */
RelOptInfo *rel; /* the partitioned relation */
+ Bitmapset *available_rels; /* rels whose Vars may be used for pruning */
PartClauseTarget target; /* use-case we're generating steps for */
/* Result data: */
List *steps; /* list of PartitionPruneSteps */
@@ -119,6 +122,7 @@ typedef struct GeneratePruningStepsContext
bool has_mutable_arg; /* clauses include any mutable comparison
* values, *other than* exec params */
bool has_exec_param; /* clauses include any PARAM_EXEC params */
+ bool has_vars; /* clauses include any Vars from 'available_rels' */
bool contradictory; /* clauses were proven self-contradictory */
/* Working state: */
int next_step_id;
@@ -144,8 +148,10 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
+ Bitmapset *available_rels,
Bitmapset **matchedsubplans);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
+ Bitmapset *available_rels,
PartClauseTarget target,
GeneratePruningStepsContext *context);
static List *gen_partprune_steps_internal(GeneratePruningStepsContext *context,
@@ -204,6 +210,10 @@ static PartClauseMatchStatus match_boolean_partition_clause(Oid partopfamily,
static void partkey_datum_from_expr(PartitionPruneContext *context,
Expr *expr, int stateidx,
Datum *value, bool *isnull);
+static bool contain_forbidden_var_clause(Node *node,
+ GeneratePruningStepsContext *context);
+static bool contain_forbidden_var_clause_walker(Node *node,
+ GeneratePruningStepsContext *context);
/*
@@ -216,11 +226,14 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
+ * 'available_rels' is the relid set of rels whose Vars may be used for
+ * pruning.
*/
PartitionPruneInfo *
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
- List *prunequal)
+ List *prunequal,
+ Bitmapset *available_rels)
{
PartitionPruneInfo *pruneinfo;
Bitmapset *allmatchedsubplans = NULL;
@@ -313,6 +326,7 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
prunequal,
partrelids,
relid_subplan_map,
+ available_rels,
&matchedsubplans);
/* When pruning is possible, record the matched subplans */
@@ -360,6 +374,174 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
return pruneinfo;
}
+/*
+ * make_join_partition_pruneinfos
+ * Builds one JoinPartitionPruneInfo for each join at which join partition
+ * pruning is possible for this appendrel.
+ *
+ * 'parentrel' is the RelOptInfo for an appendrel, and 'subpaths' is the list
+ * of scan paths for its child rels.
+ */
+Bitmapset *
+make_join_partition_pruneinfos(PlannerInfo *root, RelOptInfo *parentrel,
+ Path *best_path, List *subpaths)
+{
+ Bitmapset *result = NULL;
+ ListCell *lc;
+
+ if (!IS_PARTITIONED_REL(parentrel))
+ return NULL;
+
+ foreach(lc, root->join_partition_prune_candidates)
+ {
+ JoinPartitionPruneCandidateInfo *candidate =
+ (JoinPartitionPruneCandidateInfo *) lfirst(lc);
+ PartitionPruneInfo *part_prune_info;
+ List *prunequal;
+ Relids joinrelids;
+ ListCell *l;
+ double prune_cost;
+
+ if (candidate == NULL)
+ continue;
+
+ /*
+ * Identify all join clauses that are movable to this appendrel given
+ * the inner side's relids. Only those clauses can be used for join
+ * partition pruning.
+ */
+ joinrelids = bms_union(parentrel->relids, candidate->inner_relids);
+ prunequal = NIL;
+ foreach(l, candidate->joinrestrictinfo)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
+
+ if (join_clause_is_movable_into(rinfo,
+ parentrel->relids,
+ joinrelids))
+ prunequal = lappend(prunequal, rinfo);
+ }
+
+ if (prunequal == NIL)
+ continue;
+
+ /*
+ * Skip this join if pruning is estimated to cost more than it saves.
+ */
+ prune_cost = compute_partprune_cost(root,
+ parentrel,
+ best_path->total_cost,
+ list_length(subpaths),
+ candidate->inner_relids,
+ candidate->inner_rows,
+ prunequal);
+ if (prune_cost > 0)
+ continue;
+
+ part_prune_info = make_partition_pruneinfo(root, parentrel,
+ subpaths,
+ prunequal,
+ candidate->inner_relids);
+
+ if (part_prune_info)
+ {
+ JoinPartitionPruneInfo *jpinfo;
+
+ jpinfo = palloc(sizeof(JoinPartitionPruneInfo));
+
+ jpinfo->part_prune_info = part_prune_info;
+ jpinfo->paramid = assign_special_exec_param(root);
+ jpinfo->nplans = list_length(subpaths);
+
+ candidate->joinpartprune_info_list =
+ lappend(candidate->joinpartprune_info_list, jpinfo);
+
+ result = bms_add_member(result, jpinfo->paramid);
+ }
+ }
+
+ return result;
+}
+
+/*
+ * prepare_join_partition_prune_candidate
+ * Check if join partition pruning is possible at this join and if so
+ * collect information required to build JoinPartitionPruneInfos.
+ *
+ * Note that we may build more than one JoinPartitionPruneInfo at one join, for
+ * different Append/MergeAppend paths.
+ */
+void
+prepare_join_partition_prune_candidate(PlannerInfo *root, JoinPath *jpath)
+{
+ JoinPartitionPruneCandidateInfo *candidate;
+
+ if (!enable_partition_pruning)
+ {
+ root->join_partition_prune_candidates =
+ lappend(root->join_partition_prune_candidates, NULL);
+ return;
+ }
+
+ /*
+ * We cannot perform join partition pruning if the outer side is the
+ * non-nullable side, since its unmatched rows must still be emitted.
+ */
+ if (!(jpath->jointype == JOIN_INNER ||
+ jpath->jointype == JOIN_SEMI ||
+ jpath->jointype == JOIN_RIGHT ||
+ jpath->jointype == JOIN_RIGHT_ANTI))
+ {
+ root->join_partition_prune_candidates =
+ lappend(root->join_partition_prune_candidates, NULL);
+ return;
+ }
+
+ /*
+ * For now we only support HashJoin.
+ */
+ if (jpath->path.pathtype != T_HashJoin)
+ {
+ root->join_partition_prune_candidates =
+ lappend(root->join_partition_prune_candidates, NULL);
+ return;
+ }
+
+ candidate = palloc(sizeof(JoinPartitionPruneCandidateInfo));
+ candidate->joinrestrictinfo = jpath->joinrestrictinfo;
+ candidate->inner_relids = jpath->innerjoinpath->parent->relids;
+ candidate->inner_rows = jpath->innerjoinpath->parent->rows;
+ candidate->joinpartprune_info_list = NIL;
+
+ root->join_partition_prune_candidates =
+ lappend(root->join_partition_prune_candidates, candidate);
+}
+
+/*
+ * get_join_partition_prune_candidate
+ * Pop out the JoinPartitionPruneCandidateInfo for this join and retrieve
+ * the JoinPartitionPruneInfos.
+ */
+List *
+get_join_partition_prune_candidate(PlannerInfo *root)
+{
+ JoinPartitionPruneCandidateInfo *candidate;
+ List *result;
+
+ candidate = llast(root->join_partition_prune_candidates);
+ root->join_partition_prune_candidates =
+ list_delete_last(root->join_partition_prune_candidates);
+
+ if (candidate == NULL)
+ return NIL;
+
+ result = candidate->joinpartprune_info_list;
+
+ pfree(candidate);
+
+ return result;
+}
+
/*
* add_part_relids
* Add new info to a list of Bitmapsets of partitioned relids.
@@ -428,6 +610,8 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* partrelids: Set of RT indexes identifying relevant partitioned tables
* within a single partitioning hierarchy
* relid_subplan_map[]: maps child relation relids to subplan indexes
+ * available_rels: the relid set of rels whose Vars may be used for
+ * pruning.
* matchedsubplans: on success, receives the set of subplan indexes which
* were matched to this partition hierarchy
*
@@ -440,6 +624,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
+ Bitmapset *available_rels,
Bitmapset **matchedsubplans)
{
RelOptInfo *targetpart = NULL;
@@ -539,8 +724,8 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
*/
- gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
- &context);
+ gen_partprune_steps(subpart, partprunequal, available_rels,
+ PARTTARGET_INITIAL, &context);
if (context.contradictory)
{
@@ -567,14 +752,15 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
initial_pruning_steps = NIL;
/*
- * If no exec Params appear in potentially-usable pruning clauses,
- * then there's no point in even thinking about per-scan pruning.
+ * If no exec Params or available Vars appear in potentially-usable
+ * pruning clauses, then there's no point in even thinking about
+ * per-scan pruning.
*/
- if (context.has_exec_param)
+ if (context.has_exec_param || context.has_vars)
{
/* ... OK, we'd better think about it */
- gen_partprune_steps(subpart, partprunequal, PARTTARGET_EXEC,
- &context);
+ gen_partprune_steps(subpart, partprunequal, available_rels,
+ PARTTARGET_EXEC, &context);
if (context.contradictory)
{
@@ -587,11 +773,14 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
/*
* Detect which exec Params actually got used; the fact that some
* were in available clauses doesn't mean we actually used them.
- * Skip per-scan pruning if there are none.
*/
execparamids = get_partkey_exec_paramids(exec_pruning_steps);
- if (bms_is_empty(execparamids))
+ /*
+ * Skip per-scan pruning if no exec Params were actually used and
+ * no available Vars were found.
+ */
+ if (bms_is_empty(execparamids) && !context.has_vars)
exec_pruning_steps = NIL;
}
else
@@ -703,6 +892,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* Process 'clauses' (typically a rel's baserestrictinfo list of clauses)
* and create a list of "partition pruning steps".
*
+ * 'available_rels' is the relid set of rels whose Vars may be used for
+ * pruning.
+ *
* 'target' tells whether to generate pruning steps for planning (use
* immutable clauses only), or for executor startup (use any allowable
* clause except ones containing PARAM_EXEC Params), or for executor
@@ -712,12 +904,13 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* some subsidiary flags; see the GeneratePruningStepsContext typedef.
*/
static void
-gen_partprune_steps(RelOptInfo *rel, List *clauses, PartClauseTarget target,
- GeneratePruningStepsContext *context)
+gen_partprune_steps(RelOptInfo *rel, List *clauses, Bitmapset *available_rels,
+ PartClauseTarget target, GeneratePruningStepsContext *context)
{
/* Initialize all output values to zero/false/NULL */
memset(context, 0, sizeof(GeneratePruningStepsContext));
context->rel = rel;
+ context->available_rels = available_rels;
context->target = target;
/*
@@ -773,7 +966,7 @@ prune_append_rel_partitions(RelOptInfo *rel)
* If the clauses are found to be contradictory, we can return the empty
* set.
*/
- gen_partprune_steps(rel, clauses, PARTTARGET_PLANNER,
+ gen_partprune_steps(rel, clauses, NULL, PARTTARGET_PLANNER,
&gcontext);
if (gcontext.contradictory)
return NULL;
@@ -1957,9 +2150,10 @@ match_clause_to_partition_key(GeneratePruningStepsContext *context,
return PARTCLAUSE_UNSUPPORTED;
/*
- * We can never prune using an expression that contains Vars.
+ * We can never prune using an expression that contains Vars except
+ * for Vars belonging to context->available_rels.
*/
- if (contain_var_clause((Node *) expr))
+ if (contain_forbidden_var_clause((Node *) expr, context))
return PARTCLAUSE_UNSUPPORTED;
/*
@@ -2155,9 +2349,10 @@ match_clause_to_partition_key(GeneratePruningStepsContext *context,
return PARTCLAUSE_UNSUPPORTED;
/*
- * We can never prune using an expression that contains Vars.
+ * We can never prune using an expression that contains Vars except
+ * for Vars belonging to context->available_rels.
*/
- if (contain_var_clause((Node *) rightop))
+ if (contain_forbidden_var_clause((Node *) rightop, context))
return PARTCLAUSE_UNSUPPORTED;
/*
@@ -3727,3 +3922,54 @@ partkey_datum_from_expr(PartitionPruneContext *context,
*value = ExecEvalExprSwitchContext(exprstate, ectx, isnull);
}
}
+
+/*
+ * contain_forbidden_var_clause
+ * Recursively scan a clause to discover whether it contains any Var nodes
+ * (of the current query level) that do not belong to relations in
+ * context->available_rels.
+ *
+ * Returns true if any such Var node is found.
+ *
+ * Does not examine subqueries, therefore must only be used after reduction
+ * of sublinks to subplans!
+ */
+static bool
+contain_forbidden_var_clause(Node *node, GeneratePruningStepsContext *context)
+{
+ return contain_forbidden_var_clause_walker(node, context);
+}
+
+static bool
+contain_forbidden_var_clause_walker(Node *node, GeneratePruningStepsContext *context)
+{
+ if (node == NULL)
+ return false;
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ if (var->varlevelsup != 0)
+ return false;
+
+ if (!bms_is_member(var->varno, context->available_rels))
+ return true; /* abort the tree traversal and return true */
+
+ context->has_vars = true;
+
+ if (context->target != PARTTARGET_EXEC)
+ return true; /* abort the tree traversal and return true */
+
+ return false;
+ }
+ if (IsA(node, CurrentOfExpr))
+ return true;
+ if (IsA(node, PlaceHolderVar))
+ {
+ if (((PlaceHolderVar *) node)->phlevelsup == 0)
+ return true; /* abort the tree traversal and return true */
+ /* else fall through to check the contained expr */
+ }
+ return expression_tree_walker(node, contain_forbidden_var_clause_walker,
+ (void *) context);
+}
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 15ec869ac8..720bcc1149 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -121,11 +121,26 @@ typedef struct PartitionPruneState
PartitionPruningData *partprunedata[FLEXIBLE_ARRAY_MEMBER];
} PartitionPruneState;
+/*
+ * JoinPartitionPruneState - State object required for plan nodes to perform
+ * join partition pruning.
+ */
+typedef struct JoinPartitionPruneState
+{
+ PartitionPruneState *part_prune_state;
+ int paramid;
+ int nplans;
+ bool finished;
+ Bitmapset *part_prune_result;
+} JoinPartitionPruneState;
+
extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
PartitionPruneInfo *pruneinfo,
Bitmapset **initially_valid_subplans);
+extern List *ExecInitJoinpartpruneList(PlanState *planstate, List *joinpartprune_info_list);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ PlanState *planstate);
#endif /* EXECPARTITION_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 5d7f17dee0..0aeafcabff 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -2678,6 +2678,9 @@ typedef struct HashState
/* Parallel hash state. */
struct ParallelHashJoinState *parallel_state;
+
+ /* Infos for join partition pruning. */
+ List *joinpartprune_state_list;
} HashState;
/* ----------------
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index ed85dc7414..d066b6105c 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -530,6 +530,9 @@ struct PlannerInfo
/* not-yet-assigned NestLoopParams */
List *curOuterParams;
+ /* a stack of JoinPartitionPruneCandidateInfos */
+ List *join_partition_prune_candidates;
+
/*
* These fields are workspace for setrefs.c. Each is an array
* corresponding to glob->subplans. (We could probably teach
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 24d46c76dc..2453ca39d9 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -276,6 +276,9 @@ typedef struct Append
/* Info for run-time subplan pruning; NULL if we're not doing that */
struct PartitionPruneInfo *part_prune_info;
+
+ /* Param IDs used for join partition pruning; NULL if we're not doing that */
+ Bitmapset *join_prune_paramids;
} Append;
/* ----------------
@@ -311,6 +314,9 @@ typedef struct MergeAppend
/* Info for run-time subplan pruning; NULL if we're not doing that */
struct PartitionPruneInfo *part_prune_info;
+
+ /* Param IDs used for join partition pruning; NULL if we're not doing that */
+ Bitmapset *join_prune_paramids;
} MergeAppend;
/* ----------------
@@ -1207,6 +1213,7 @@ typedef struct Hash
bool skewInherit; /* is outer join rel an inheritance tree? */
/* all other info is in the parent HashJoin node */
Cardinality rows_total; /* estimate total rows if parallel_aware */
+ List *joinpartprune_info_list; /* infos for join partition pruning */
} Hash;
/* ----------------
@@ -1553,6 +1560,29 @@ typedef struct PartitionPruneStepCombine
List *source_stepids;
} PartitionPruneStepCombine;
+/*
+ * JoinPartitionPruneCandidateInfo - Information required to build
+ * JoinPartitionPruneInfos.
+ */
+typedef struct JoinPartitionPruneCandidateInfo
+{
+ List *joinrestrictinfo;
+ Bitmapset *inner_relids;
+ double inner_rows;
+ List *joinpartprune_info_list;
+} JoinPartitionPruneCandidateInfo;
+
+/*
+ * JoinPartitionPruneInfo - Details required to allow the executor to prune
+ * partitions during join.
+ */
+typedef struct JoinPartitionPruneInfo
+{
+ PartitionPruneInfo *part_prune_info;
+ int paramid;
+ int nplans;
+} JoinPartitionPruneInfo;
+
/*
* Plan invalidation info
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 6d50afbf74..52de844f6d 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -211,5 +211,9 @@ extern void set_foreign_size_estimates(PlannerInfo *root, RelOptInfo *rel);
extern PathTarget *set_pathtarget_cost_width(PlannerInfo *root, PathTarget *target);
extern double compute_bitmap_pages(PlannerInfo *root, RelOptInfo *baserel,
Path *bitmapqual, int loop_count, Cost *cost, double *tuple);
+extern double compute_partprune_cost(PlannerInfo *root, RelOptInfo *appendrel,
+ Cost append_total_cost, int append_nplans,
+ Relids inner_relids, double inner_rows,
+ List *prunequal);
#endif /* COST_H */
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 8636e04e37..899aa61b34 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -19,6 +19,8 @@
struct PlannerInfo; /* avoid including pathnodes.h here */
struct RelOptInfo;
+struct Path;
+struct JoinPath;
/*
@@ -73,7 +75,15 @@ typedef struct PartitionPruneContext
extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
struct RelOptInfo *parentrel,
List *subpaths,
- List *prunequal);
+ List *prunequal,
+ Bitmapset *available_rels);
+extern Bitmapset *make_join_partition_pruneinfos(struct PlannerInfo *root,
+ struct RelOptInfo *parentrel,
+ struct Path *best_path,
+ List *subpaths);
+extern void prepare_join_partition_prune_candidate(struct PlannerInfo *root,
+ struct JoinPath *jpath);
+extern List *get_join_partition_prune_candidate(struct PlannerInfo *root);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 9a4c48c055..a08e7a1f0a 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -3003,6 +3003,92 @@ order by tbl1.col1, tprt.col1;
------+------
(0 rows)
+-- join partition pruning
+-- The 'Memory Usage' from the Hash node can vary between machines. Let's just
+-- replace the number with an 'N'.
+-- We need to run EXPLAIN ANALYZE because we need to see '(never executed)'
+-- notations because that's the only way to verify runtime pruning.
+create function explain_join_partition_pruning(text) returns setof text
+language plpgsql as
+$$
+declare
+ ln text;
+begin
+ for ln in
+ execute format('explain (analyze, verbose, costs off, summary off, timing off) %s',
+ $1)
+ loop
+ ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
+ return next ln;
+ end loop;
+end;
+$$;
+delete from tbl1;
+insert into tbl1 values (501), (505);
+analyze tbl1, tprt;
+set enable_nestloop = off;
+set enable_mergejoin = off;
+set enable_hashjoin = on;
+select explain_join_partition_pruning('
+select * from tprt p1
+ inner join tprt p2 on p1.col1 = p2.col1
+ right join tbl1 t on p1.col1 = t.col1 and p2.col1 = t.col1;');
+ explain_join_partition_pruning
+--------------------------------------------------------------------------------
+ Hash Right Join (actual rows=2 loops=1)
+ Output: p1.col1, p2.col1, t.col1
+ Hash Cond: ((p1.col1 = t.col1) AND (p2.col1 = t.col1))
+ -> Hash Join (actual rows=3 loops=1)
+ Output: p1.col1, p2.col1
+ Hash Cond: (p1.col1 = p2.col1)
+ -> Append (actual rows=3 loops=1)
+ Join Partition Pruning: $0
+ -> Seq Scan on public.tprt_1 p1_1 (never executed)
+ Output: p1_1.col1
+ -> Seq Scan on public.tprt_2 p1_2 (actual rows=3 loops=1)
+ Output: p1_2.col1
+ -> Seq Scan on public.tprt_3 p1_3 (never executed)
+ Output: p1_3.col1
+ -> Seq Scan on public.tprt_4 p1_4 (never executed)
+ Output: p1_4.col1
+ -> Seq Scan on public.tprt_5 p1_5 (never executed)
+ Output: p1_5.col1
+ -> Seq Scan on public.tprt_6 p1_6 (never executed)
+ Output: p1_6.col1
+ -> Hash (actual rows=3 loops=1)
+ Output: p2.col1
+ Buckets: 1024 Batches: 1 Memory Usage: NkB
+ -> Append (actual rows=3 loops=1)
+ Join Partition Pruning: $1
+ -> Seq Scan on public.tprt_1 p2_1 (never executed)
+ Output: p2_1.col1
+ -> Seq Scan on public.tprt_2 p2_2 (actual rows=3 loops=1)
+ Output: p2_2.col1
+ -> Seq Scan on public.tprt_3 p2_3 (never executed)
+ Output: p2_3.col1
+ -> Seq Scan on public.tprt_4 p2_4 (never executed)
+ Output: p2_4.col1
+ -> Seq Scan on public.tprt_5 p2_5 (never executed)
+ Output: p2_5.col1
+ -> Seq Scan on public.tprt_6 p2_6 (never executed)
+ Output: p2_6.col1
+ -> Hash (actual rows=2 loops=1)
+ Output: t.col1
+ Buckets: 1024 Batches: 1 Memory Usage: NkB
+ Partition Prune: $0, $1
+ -> Seq Scan on public.tbl1 t (actual rows=2 loops=1)
+ Output: t.col1
+(43 rows)
+
+select * from tprt p1
+ inner join tprt p2 on p1.col1 = p2.col1
+ right join tbl1 t on p1.col1 = t.col1 and p2.col1 = t.col1;
+ col1 | col1 | col1
+------+------+------
+ 501 | 501 | 501
+ 505 | 505 | 505
+(2 rows)
+
drop table tbl1, tprt;
-- Test with columns defined in varying orders between each level
create table part_abc (a int not null, b int not null, c int not null) partition by list (a);
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 7bf3920827..fc5982edcf 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -727,6 +727,45 @@ select tbl1.col1, tprt.col1 from tbl1
inner join tprt on tbl1.col1 = tprt.col1
order by tbl1.col1, tprt.col1;
+-- join partition pruning
+
+-- The 'Memory Usage' from the Hash node can vary between machines. Let's just
+-- replace the number with an 'N'.
+-- We need to run EXPLAIN ANALYZE because we need to see '(never executed)'
+-- notations because that's the only way to verify runtime pruning.
+create function explain_join_partition_pruning(text) returns setof text
+language plpgsql as
+$$
+declare
+ ln text;
+begin
+ for ln in
+ execute format('explain (analyze, verbose, costs off, summary off, timing off) %s',
+ $1)
+ loop
+ ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
+ return next ln;
+ end loop;
+end;
+$$;
+
+delete from tbl1;
+insert into tbl1 values (501), (505);
+analyze tbl1, tprt;
+
+set enable_nestloop = off;
+set enable_mergejoin = off;
+set enable_hashjoin = on;
+
+select explain_join_partition_pruning('
+select * from tprt p1
+ inner join tprt p2 on p1.col1 = p2.col1
+ right join tbl1 t on p1.col1 = t.col1 and p2.col1 = t.col1;');
+
+select * from tprt p1
+ inner join tprt p2 on p1.col1 = p2.col1
+ right join tbl1 t on p1.col1 = t.col1 and p2.col1 = t.col1;
+
drop table tbl1, tprt;
-- Test with columns defined in varying orders between each level
--
2.31.0
Hello Richard,
02.11.2023 14:19, Richard Guo wrote:
However, the cfbot indicates that there are test cases that fail on
FreeBSD [1] (no failure on other platforms). So I set up a FreeBSD-13
locally but just cannot reproduce the failure. I must be doing
something wrong. Can anyone give me some hints or suggestions?
FYI. The failure looks like:
explain (costs off) select p2.a, p1.c from permtest_parent p1 inner join permtest_parent p2 on p1.a = p2.a and left(p1.c, 3) ~ 'a1$';
- QUERY PLAN
-----------------------------------------------------
- Hash Join
- Hash Cond: (p2.a = p1.a)
- -> Seq Scan on permtest_grandchild p2
- -> Hash
- -> Seq Scan on permtest_grandchild p1
- Filter: ("left"(c, 3) ~ 'a1$'::text)
-(6 rows)
-
+ERROR: unrecognized node type: 1130127496
I've managed to reproduce that failure on my Ubuntu with:
CPPFLAGS="-Og -DWRITE_READ_PARSE_PLAN_TREES -DCOPY_PARSE_PLAN_TREES" ./configure ...
make check
...
SELECT t1, t2 FROM prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b WHERE t1.b = 0 ORDER BY t1.a, t2.b;
- QUERY PLAN
---------------------------------------------------
- Sort
- Sort Key: t1.a, t2.b
- -> Hash Right Join
- Hash Cond: (t2.b = t1.a)
- -> Append
- -> Seq Scan on prt2_p1 t2_1
- -> Seq Scan on prt2_p2 t2_2
- -> Seq Scan on prt2_p3 t2_3
- -> Hash
- -> Append
- -> Seq Scan on prt1_p1 t1_1
- Filter: (b = 0)
- -> Seq Scan on prt1_p2 t1_2
- Filter: (b = 0)
- -> Seq Scan on prt1_p3 t1_3
- Filter: (b = 0)
-(16 rows)
-
+ERROR: unrecognized node type: -1465804424
...
As far as I can see from https://cirrus-ci.com/task/6642692659085312,
the FreeBSD host has the following CPPFLAGS specified:
-DRELCACHE_FORCE_RELEASE
-DCOPY_PARSE_PLAN_TREES
-DWRITE_READ_PARSE_PLAN_TREES
-DRAW_EXPRESSION_COVERAGE_TEST
-DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS
Best regards,
Alexander
On Sat, Nov 4, 2023 at 6:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
02.11.2023 14:19, Richard Guo wrote:
However, the cfbot indicates that there are test cases that fail on
FreeBSD [1] (no failure on other platforms). So I set up a FreeBSD-13
locally but just cannot reproduce the failure. I must be doing
something wrong. Can anyone give me some hints or suggestions?
I've managed to reproduce that failure on my Ubuntu with:
CPPFLAGS="-Og -DWRITE_READ_PARSE_PLAN_TREES -DCOPY_PARSE_PLAN_TREES" ./configure ...
make check
Wow, thank you so much. You saved me a lot of time. It turns out that
it was caused by me not making JoinPartitionPruneInfo a node. The same
issue can also exist for JoinPartitionPruneCandidateInfo: if you
pprint(root) at some point, you'll see a 'could not dump unrecognized
node type' warning.
Fixed this issue in v4.
Thanks
Richard
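For readers following along, the failure mode above can be illustrated with a toy model. This is only a sketch under assumptions: with COPY_PARSE_PLAN_TREES or WRITE_READ_PARSE_PLAN_TREES defined, PostgreSQL passes plan trees through generic copy and write/read routines that dispatch on the node tag stored in the first field of each struct, so a struct that carries no tag presumably gets its first bytes misread as a tag. All names below (ToyPruneInfo, ToyPlainInfo, toy_node_tag, toy_is_copyable) are hypothetical and are not PostgreSQL code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical tag values; PostgreSQL's real NodeTag enum is not modeled here. */
enum { TOY_T_INVALID = 0, TOY_T_PRUNEINFO = 1 };

/* A proper "node": the tag is always the first field. */
typedef struct ToyPruneInfo
{
    int type;       /* must be TOY_T_PRUNEINFO */
    int paramid;
} ToyPruneInfo;

/* A plain struct with no tag; its first field is arbitrary data. */
typedef struct ToyPlainInfo
{
    int paramid;
    int nplans;
} ToyPlainInfo;

/* Read the first int of any object, the way a tag-based dispatcher would. */
int
toy_node_tag(const void *obj)
{
    int tag;

    assert(obj != NULL);
    memcpy(&tag, obj, sizeof(tag));
    return tag;
}

/* Return 1 if a generic copy routine would recognize this object, else 0. */
int
toy_is_copyable(const void *obj)
{
    switch (toy_node_tag(obj))
    {
        case TOY_T_PRUNEINFO:
            return 1;
        default:
            return 0;   /* corresponds to "unrecognized node type: <tag>" */
    }
}
```

In this model a ToyPruneInfo is recognized, while a ToyPlainInfo is rejected with whatever value happens to sit in its first field. That would be consistent with the two different garbage numbers reported above, although the exact values depend on memory layout.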
Attachments:
v4-0001-Support-run-time-partition-pruning-for-hash-join.patch (application/octet-stream)
From 9878f4d4ff6f3c4c9b4e5e17b2a2370d40ce047b Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Mon, 14 Aug 2023 14:55:26 +0800
Subject: [PATCH v4] Support run-time partition pruning for hash join
If we have a hash join with an Append node on the outer side, something
like
Hash Join
Hash Cond: (pt.a = t.a)
-> Append
-> Seq Scan on pt_p1 pt_1
-> Seq Scan on pt_p2 pt_2
-> Seq Scan on pt_p3 pt_3
-> Hash
-> Seq Scan on t
we can prune those subnodes of the Append that cannot possibly contain
any tuples matching the other side of the join. To do that, when
building the Hash table, for each row from the inner side we compute the
minimum set of subnodes that can possibly match the join condition. Once
the Hash table is built and we start to execute the Append node, we know
which subnodes have survived and can skip the others.
This patch implements this idea.
---
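The idea in the commit message can be sketched in a few lines of standalone C. This is a minimal illustration, not the patch's implementation: subnode sets are modeled as bits in an unsigned int instead of a Bitmapset, and the partition layout (equal-width ranges starting at 0) is an assumption made only for the example. The function names toy_partition_for_key, toy_surviving_subnodes and toy_intersect are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical layout: partition i holds keys in [i * width, (i + 1) * width).
 * Returns the partition index, or -1 if no partition can hold the key.
 */
int
toy_partition_for_key(int key, int width, int nparts)
{
    int idx;

    assert(width > 0);
    if (key < 0)
        return -1;
    idx = key / width;
    return (idx < nparts) ? idx : -1;
}

/*
 * While "building the hash table", union in the partition matched by each
 * inner row.  Stop early once every partition has survived, which mirrors
 * the 'finished' flag used in the patch.
 */
unsigned
toy_surviving_subnodes(const int *inner_keys, size_t nkeys,
                       int width, int nparts)
{
    unsigned all;
    unsigned survivors = 0;
    size_t i;

    assert(nparts > 0 && nparts < 32);
    all = (1u << nparts) - 1;

    for (i = 0; i < nkeys && survivors != all; i++)
    {
        int idx = toy_partition_for_key(inner_keys[i], width, nparts);

        if (idx >= 0)
            survivors |= 1u << idx;
    }
    return survivors;
}

/* With several joins above the same Append, the final set is the intersection. */
unsigned
toy_intersect(unsigned a, unsigned b)
{
    return a & b;
}
```

With six partitions of width 500, inner keys 501 and 505 leave only bit 1 set, so five of the six subnodes could be skipped. If a second join contributes keys 10 and 700 (bits 0 and 1), intersecting the two results still leaves only bit 1.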
src/backend/commands/explain.c | 61 ++++
src/backend/executor/execPartition.c | 127 +++++++-
src/backend/executor/nodeAppend.c | 32 +-
src/backend/executor/nodeHash.c | 75 +++++
src/backend/executor/nodeHashjoin.c | 10 +
src/backend/executor/nodeMergeAppend.c | 22 +-
src/backend/optimizer/path/costsize.c | 106 +++++++
src/backend/optimizer/plan/createplan.c | 49 ++-
src/backend/optimizer/plan/setrefs.c | 61 ++++
src/backend/partitioning/partprune.c | 288 ++++++++++++++++--
src/include/executor/execPartition.h | 17 +-
src/include/nodes/execnodes.h | 3 +
src/include/nodes/pathnodes.h | 3 +
src/include/nodes/plannodes.h | 36 +++
src/include/optimizer/cost.h | 4 +
src/include/partitioning/partprune.h | 12 +-
src/test/regress/expected/partition_prune.out | 86 ++++++
src/test/regress/sql/partition_prune.sql | 39 +++
18 files changed, 982 insertions(+), 49 deletions(-)
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index f1d71bc54e..c51cf6beb6 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -18,6 +18,7 @@
#include "commands/createas.h"
#include "commands/defrem.h"
#include "commands/prepare.h"
+#include "executor/execPartition.h"
#include "executor/nodeHash.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
@@ -118,6 +119,9 @@ static void show_instrumentation_count(const char *qlabel, int which,
PlanState *planstate, ExplainState *es);
static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
static void show_eval_params(Bitmapset *bms_params, ExplainState *es);
+static void show_join_pruning_result_info(Bitmapset *join_prune_paramids,
+ ExplainState *es);
+static void show_joinpartprune_info(HashState *hashstate, ExplainState *es);
static const char *explain_get_index_name(Oid indexId);
static void show_buffer_usage(ExplainState *es, const BufferUsage *usage,
bool planning);
@@ -2057,9 +2061,17 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_incremental_sort_info(castNode(IncrementalSortState, planstate),
es);
break;
+ case T_Append:
+ if (es->verbose)
+ show_join_pruning_result_info(((Append *) plan)->join_prune_paramids,
+ es);
+ break;
case T_MergeAppend:
show_merge_append_keys(castNode(MergeAppendState, planstate),
ancestors, es);
+ if (es->verbose)
+ show_join_pruning_result_info(((MergeAppend *) plan)->join_prune_paramids,
+ es);
break;
case T_Result:
show_upper_qual((List *) ((Result *) plan)->resconstantqual,
@@ -2075,6 +2087,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
break;
case T_Hash:
show_hash_info(castNode(HashState, planstate), es);
+ if (es->verbose)
+ show_joinpartprune_info(castNode(HashState, planstate), es);
break;
case T_Memoize:
show_memoize_info(castNode(MemoizeState, planstate), ancestors,
@@ -3515,6 +3529,53 @@ show_eval_params(Bitmapset *bms_params, ExplainState *es)
ExplainPropertyList("Params Evaluated", params, es);
}
+/*
+ * Show join partition pruning results at Append/MergeAppend nodes.
+ */
+static void
+show_join_pruning_result_info(Bitmapset *join_prune_paramids, ExplainState *es)
+{
+ int paramid = -1;
+ List *params = NIL;
+
+ if (bms_is_empty(join_prune_paramids))
+ return;
+
+ while ((paramid = bms_next_member(join_prune_paramids, paramid)) >= 0)
+ {
+ char param[32];
+
+ snprintf(param, sizeof(param), "$%d", paramid);
+ params = lappend(params, pstrdup(param));
+ }
+
+ ExplainPropertyList("Join Partition Pruning", params, es);
+}
+
+/*
+ * Show join partition pruning infos at Hash nodes.
+ */
+static void
+show_joinpartprune_info(HashState *hashstate, ExplainState *es)
+{
+ List *params = NIL;
+ ListCell *lc;
+
+ if (!hashstate->joinpartprune_state_list)
+ return;
+
+ foreach(lc, hashstate->joinpartprune_state_list)
+ {
+ JoinPartitionPruneState *jpstate = (JoinPartitionPruneState *) lfirst(lc);
+ char param[32];
+
+ snprintf(param, sizeof(param), "$%d", jpstate->paramid);
+ params = lappend(params, pstrdup(param));
+ }
+
+ ExplainPropertyList("Partition Prune", params, es);
+}
+
/*
* Fetch the name of an index in an EXPLAIN
*
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index f6c34328b8..35a9149a39 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -199,6 +199,8 @@ static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
Bitmapset **validsubplans);
+static bool get_join_prune_matching_subplans(PlanState *planstate,
+ Bitmapset **partset);
/*
@@ -1806,7 +1808,7 @@ ExecInitPartitionPruning(PlanState *planstate,
* Perform an initial partition prune pass, if required.
*/
if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true, NULL);
else
{
/* No pruning, so we'll need to initialize all subplans */
@@ -1836,6 +1838,37 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecInitJoinpartpruneList
+ * Initialize data structures needed for join partition pruning
+ */
+List *
+ExecInitJoinpartpruneList(PlanState *planstate,
+ List *joinpartprune_info_list)
+{
+ ListCell *lc;
+ List *result = NIL;
+
+ foreach(lc, joinpartprune_info_list)
+ {
+ JoinPartitionPruneInfo *jpinfo = (JoinPartitionPruneInfo *) lfirst(lc);
+ JoinPartitionPruneState *jpstate = palloc(sizeof(JoinPartitionPruneState));
+
+ jpstate->part_prune_state =
+ CreatePartitionPruneState(planstate, jpinfo->part_prune_info);
+ Assert(jpstate->part_prune_state->do_exec_prune);
+
+ jpstate->paramid = jpinfo->paramid;
+ jpstate->nplans = jpinfo->nplans;
+ jpstate->finished = false;
+ jpstate->part_prune_result = NULL;
+
+ result = lappend(result, jpstate);
+ }
+
+ return result;
+}
+
/*
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
@@ -2273,7 +2306,9 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
/*
* ExecFindMatchingSubPlans
* Determine which subplans match the pruning steps detailed in
- * 'prunestate' for the current comparison expression values.
+ * 'prunestate' if any for the current comparison expression values, and
+ * meanwhile match the join partition pruning results if any stored in
+ * Append/MergeAppend node's join_prune_paramids.
*
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
@@ -2281,11 +2316,30 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ PlanState *planstate)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
int i;
+ Bitmapset *join_prune_partset = NULL;
+ bool do_join_prune;
+
+ /* Retrieve the join partition pruning results if any */
+ do_join_prune =
+ get_join_prune_matching_subplans(planstate, &join_prune_partset);
+
+ /*
+ * Either we're here on partition prune done according to the pruning steps
+ * detailed in 'prunestate', or we have done join partition prune.
+ */
+ Assert(do_join_prune || prunestate != NULL);
+
+ /*
+ * If there is no 'prunestate', then rely entirely on join pruning.
+ */
+ if (prunestate == NULL)
+ return join_prune_partset;
/*
* Either we're here on the initial prune done during pruning
@@ -2326,6 +2380,10 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Add in any subplans that partition pruning didn't account for */
result = bms_add_members(result, prunestate->other_subplans);
+ /* Intersect join partition pruning results */
+ if (do_join_prune)
+ result = bms_intersect(result, join_prune_partset);
+
MemoryContextSwitchTo(oldcontext);
/* Copy result out of the temp context before we reset it */
@@ -2396,3 +2454,66 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
}
}
}
+
+/*
+ * get_join_prune_matching_subplans
+ * Retrieve the join partition pruning results if any stored in
+ * Append/MergeAppend node's join_prune_paramids. Return true if we can
+ * do join partition pruning, otherwise return false.
+ *
+ * Adds valid (non-prunable) subplan IDs to *partset
+ */
+static bool
+get_join_prune_matching_subplans(PlanState *planstate, Bitmapset **partset)
+{
+ Bitmapset *join_prune_paramids;
+ int nplans;
+ int paramid;
+
+ if (planstate == NULL)
+ return false;
+
+ if (IsA(planstate, AppendState))
+ {
+ join_prune_paramids =
+ ((Append *) planstate->plan)->join_prune_paramids;
+ nplans = ((AppendState *) planstate)->as_nplans;
+ }
+ else if (IsA(planstate, MergeAppendState))
+ {
+ join_prune_paramids =
+ ((MergeAppend *) planstate->plan)->join_prune_paramids;
+ nplans = ((MergeAppendState *) planstate)->ms_nplans;
+ }
+ else
+ {
+ elog(ERROR, "unrecognized node type: %d", (int) nodeTag(planstate));
+ return false;
+ }
+
+ if (bms_is_empty(join_prune_paramids))
+ return false;
+
+ Assert(nplans > 0);
+ *partset = bms_add_range(NULL, 0, nplans - 1);
+
+ paramid = -1;
+ while ((paramid = bms_next_member(join_prune_paramids, paramid)) >= 0)
+ {
+ ParamExecData *param;
+ JoinPartitionPruneState *jpstate;
+
+ param = &(planstate->state->es_param_exec_vals[paramid]);
+ Assert(param->execPlan == NULL);
+ Assert(!param->isnull);
+ jpstate = (JoinPartitionPruneState *) DatumGetPointer(param->value);
+
+ if (jpstate != NULL)
+ *partset = bms_intersect(*partset, jpstate->part_prune_result);
+ else /* the Hash node for this pruning has not been executed */
+ elog(WARNING, "Join partition pruning $%d has not been performed yet.",
+ paramid);
+ }
+
+ return true;
+}
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 609df6b9e6..c8dd8583d2 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -151,11 +151,13 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
nplans = bms_num_members(validsubplans);
/*
- * When no run-time pruning is required and there's at least one
- * subplan, we can fill as_valid_subplans immediately, preventing
- * later calls to ExecFindMatchingSubPlans.
+ * When no run-time pruning or join pruning is required and there's at
+ * least one subplan, we can fill as_valid_subplans immediately,
+ * preventing later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (!prunestate->do_exec_prune &&
+ bms_is_empty(node->join_prune_paramids) &&
+ nplans > 0)
{
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
appendstate->as_valid_subplans_identified = true;
@@ -170,10 +172,18 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplans as valid; they must also all be initialized.
*/
Assert(nplans > 0);
- appendstate->as_valid_subplans = validsubplans =
- bms_add_range(NULL, 0, nplans - 1);
- appendstate->as_valid_subplans_identified = true;
+ validsubplans = bms_add_range(NULL, 0, nplans - 1);
appendstate->as_prune_state = NULL;
+
+ /*
+ * When join pruning is not enabled we can fill as_valid_subplans
+ * immediately, preventing later calls to ExecFindMatchingSubPlans.
+ */
+ if (bms_is_empty(node->join_prune_paramids))
+ {
+ appendstate->as_valid_subplans = validsubplans;
+ appendstate->as_valid_subplans_identified = true;
+ }
}
/*
@@ -580,7 +590,7 @@ choose_next_subplan_locally(AppendState *node)
else if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, &node->ps);
node->as_valid_subplans_identified = true;
}
@@ -647,7 +657,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, &node->ps);
node->as_valid_subplans_identified = true;
/*
@@ -723,7 +733,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, &node->ps);
node->as_valid_subplans_identified = true;
mark_invalid_subplans_as_finished(node);
@@ -876,7 +886,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, &node->ps);
node->as_valid_subplans_identified = true;
classify_matching_subplans(node);
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index e72f0986c2..9ca8bf49d9 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -31,6 +31,7 @@
#include "catalog/pg_statistic.h"
#include "commands/tablespace.h"
#include "executor/execdebug.h"
+#include "executor/execPartition.h"
#include "executor/hashjoin.h"
#include "executor/nodeHash.h"
#include "executor/nodeHashjoin.h"
@@ -48,6 +49,8 @@ static void ExecHashIncreaseNumBatches(HashJoinTable hashtable);
static void ExecHashIncreaseNumBuckets(HashJoinTable hashtable);
static void ExecParallelHashIncreaseNumBatches(HashJoinTable hashtable);
static void ExecParallelHashIncreaseNumBuckets(HashJoinTable hashtable);
+static void ExecJoinPartitionPrune(HashState *node);
+static void ExecStoreJoinPartitionPruneResult(HashState *node);
static void ExecHashBuildSkewHash(HashJoinTable hashtable, Hash *node,
int mcvsToUse);
static void ExecHashSkewTableInsert(HashJoinTable hashtable,
@@ -189,8 +192,14 @@ MultiExecPrivateHash(HashState *node)
}
hashtable->totalTuples += 1;
}
+
+ /* Perform join partition pruning */
+ ExecJoinPartitionPrune(node);
}
+ /* Store the surviving partitions for Append/MergeAppend nodes */
+ ExecStoreJoinPartitionPruneResult(node);
+
/* resize the hash table if needed (NTUP_PER_BUCKET exceeded) */
if (hashtable->nbuckets != hashtable->nbuckets_optimal)
ExecHashIncreaseNumBuckets(hashtable);
@@ -401,6 +410,12 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
hashstate->hashkeys =
ExecInitExprList(node->hashkeys, (PlanState *) hashstate);
+ /*
+ * initialize join partition pruning infos
+ */
+ hashstate->joinpartprune_state_list =
+ ExecInitJoinpartpruneList(&hashstate->ps, node->joinpartprune_info_list);
+
return hashstate;
}
@@ -1601,6 +1616,56 @@ ExecParallelHashIncreaseNumBuckets(HashJoinTable hashtable)
}
}
+/*
+ * ExecJoinPartitionPrune
+ * Perform join partition pruning at this join for each
+ * JoinPartitionPruneState.
+ */
+static void
+ExecJoinPartitionPrune(HashState *node)
+{
+ ListCell *lc;
+
+ foreach(lc, node->joinpartprune_state_list)
+ {
+ JoinPartitionPruneState *jpstate = (JoinPartitionPruneState *) lfirst(lc);
+ Bitmapset *matching_subPlans;
+
+ if (jpstate->finished)
+ continue;
+
+ matching_subPlans =
+ ExecFindMatchingSubPlans(jpstate->part_prune_state, false, NULL);
+ jpstate->part_prune_result =
+ bms_add_members(jpstate->part_prune_result, matching_subPlans);
+
+ if (bms_num_members(jpstate->part_prune_result) == jpstate->nplans)
+ jpstate->finished = true;
+ }
+}
+
+/*
+ * ExecStoreJoinPartitionPruneResult
+ * For each JoinPartitionPruneState, store the set of surviving partitions
+ * to make it available for the Append/MergeAppend node.
+ */
+static void
+ExecStoreJoinPartitionPruneResult(HashState *node)
+{
+ ListCell *lc;
+
+ foreach(lc, node->joinpartprune_state_list)
+ {
+ JoinPartitionPruneState *jpstate = (JoinPartitionPruneState *) lfirst(lc);
+ ParamExecData *param;
+
+ param = &(node->ps.state->es_param_exec_vals[jpstate->paramid]);
+ Assert(param->execPlan == NULL);
+ Assert(!param->isnull);
+ param->value = PointerGetDatum(jpstate);
+ }
+}
+
/*
* ExecHashTableInsert
* insert a tuple into the hash table depending on the hash value
@@ -2345,6 +2410,16 @@ void
ExecReScanHash(HashState *node)
{
PlanState *outerPlan = outerPlanState(node);
+ ListCell *lc;
+
+ /* reset the state in JoinPartitionPruneStates */
+ foreach(lc, node->joinpartprune_state_list)
+ {
+ JoinPartitionPruneState *jpstate = (JoinPartitionPruneState *) lfirst(lc);
+
+ jpstate->finished = false;
+ jpstate->part_prune_result = NULL;
+ }
/*
* if chgParam of subnode is not null then plan will be re-scanned by
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 25a2d78f15..ddca824206 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -311,6 +311,16 @@ ExecHashJoinImpl(PlanState *pstate, bool parallel)
*/
node->hj_FirstOuterTupleSlot = NULL;
}
+ else if (hashNode->joinpartprune_state_list != NIL)
+ {
+ /*
+ * Give the hash node a chance to run join partition
+ * pruning if there is any JoinPartitionPruneState that can
+ * be evaluated at it. So do not apply the empty-outer
+ * optimization in this case.
+ */
+ node->hj_FirstOuterTupleSlot = NULL;
+ }
else if (HJ_FILL_OUTER(node) ||
(outerNode->plan->startup_cost < hashNode->ps.plan->total_cost &&
!node->hj_OuterNotEmpty))
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 21b5726e6e..9eb276abc8 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -99,11 +99,13 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
nplans = bms_num_members(validsubplans);
/*
- * When no run-time pruning is required and there's at least one
- * subplan, we can fill ms_valid_subplans immediately, preventing
- * later calls to ExecFindMatchingSubPlans.
+ * When no run-time pruning or join pruning is required and there's at
+ * least one subplan, we can fill ms_valid_subplans immediately,
+ * preventing later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (!prunestate->do_exec_prune &&
+ bms_is_empty(node->join_prune_paramids) &&
+ nplans > 0)
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -115,9 +117,15 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplans as valid; they must also all be initialized.
*/
Assert(nplans > 0);
- mergestate->ms_valid_subplans = validsubplans =
- bms_add_range(NULL, 0, nplans - 1);
+ validsubplans = bms_add_range(NULL, 0, nplans - 1);
mergestate->ms_prune_state = NULL;
+
+ /*
+ * When join pruning is not enabled we can fill ms_valid_subplans
+ * immediately, preventing later calls to ExecFindMatchingSubPlans.
+ */
+ if (bms_is_empty(node->join_prune_paramids))
+ mergestate->ms_valid_subplans = validsubplans;
}
mergeplanstates = (PlanState **) palloc(nplans * sizeof(PlanState *));
@@ -218,7 +226,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, &node->ps);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index d6ceafd51c..9bdc88a9db 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -173,6 +173,10 @@ static void get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
static bool has_indexed_join_quals(NestPath *path);
static double approx_tuple_count(PlannerInfo *root, JoinPath *path,
List *quals);
+static double get_joinrel_matching_outer_size(PlannerInfo *root,
+ RelOptInfo *outer_rel,
+ Relids inner_relids,
+ List *restrictlist);
static double calc_joinrel_size_estimate(PlannerInfo *root,
RelOptInfo *joinrel,
RelOptInfo *outer_rel,
@@ -5380,6 +5384,61 @@ get_parameterized_joinrel_size(PlannerInfo *root, RelOptInfo *rel,
return nrows;
}
+/*
+ * get_joinrel_matching_outer_size
+ * Make a size estimate for the outer side that matches the inner side.
+ */
+static double
+get_joinrel_matching_outer_size(PlannerInfo *root,
+ RelOptInfo *outer_rel,
+ Relids inner_relids,
+ List *restrictlist)
+{
+ double nrows;
+ Selectivity fkselec;
+ Selectivity jselec;
+ SpecialJoinInfo *sjinfo;
+ SpecialJoinInfo sjinfo_data;
+
+ sjinfo = &sjinfo_data;
+ sjinfo->type = T_SpecialJoinInfo;
+ sjinfo->min_lefthand = outer_rel->relids;
+ sjinfo->min_righthand = inner_relids;
+ sjinfo->syn_lefthand = outer_rel->relids;
+ sjinfo->syn_righthand = inner_relids;
+ sjinfo->jointype = JOIN_SEMI;
+ sjinfo->ojrelid = 0;
+ sjinfo->commute_above_l = NULL;
+ sjinfo->commute_above_r = NULL;
+ sjinfo->commute_below_l = NULL;
+ sjinfo->commute_below_r = NULL;
+ /* we don't bother trying to make the remaining fields valid */
+ sjinfo->lhs_strict = false;
+ sjinfo->semi_can_btree = false;
+ sjinfo->semi_can_hash = false;
+ sjinfo->semi_operators = NIL;
+ sjinfo->semi_rhs_exprs = NIL;
+
+ fkselec = get_foreign_key_join_selectivity(root,
+ outer_rel->relids,
+ inner_relids,
+ sjinfo,
+ &restrictlist);
+ jselec = clauselist_selectivity(root,
+ restrictlist,
+ 0,
+ sjinfo->jointype,
+ sjinfo);
+
+ nrows = outer_rel->rows * fkselec * jselec;
+ nrows = clamp_row_est(nrows);
+
+ /* For safety, make sure result is not more than the base estimate */
+ if (nrows > outer_rel->rows)
+ nrows = outer_rel->rows;
+ return nrows;
+}
+
/*
* calc_joinrel_size_estimate
* Workhorse for set_joinrel_size_estimates and
@@ -6495,3 +6554,50 @@ compute_bitmap_pages(PlannerInfo *root, RelOptInfo *baserel, Path *bitmapqual,
return pages_fetched;
}
+
+/*
+ * compute_partprune_cost
+ * Compute the overhead of join partition pruning.
+ */
+double
+compute_partprune_cost(PlannerInfo *root, RelOptInfo *appendrel,
+ Cost append_total_cost, int append_nplans,
+ Relids inner_relids, double inner_rows,
+ List *prunequal)
+{
+ Cost prune_cost;
+ Cost saved_cost;
+ double matching_outer_rows;
+ double unmatched_nplans;
+
+ switch (appendrel->part_scheme->strategy)
+ {
+
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ prune_cost = cpu_operator_cost * LOG2(append_nplans) * inner_rows;
+ break;
+ case PARTITION_STRATEGY_HASH:
+ prune_cost = cpu_operator_cost * append_nplans * inner_rows;
+ break;
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ (int) appendrel->part_scheme->strategy);
+ break;
+ }
+
+ matching_outer_rows =
+ get_joinrel_matching_outer_size(root,
+ appendrel,
+ inner_relids,
+ prunequal);
+
+ /*
+ * We assume that each outer joined row occupies one new partition. This
+ * is really the worst case.
+ */
+ unmatched_nplans = append_nplans - Min(matching_outer_rows, append_nplans);
+ saved_cost = (unmatched_nplans / append_nplans) * append_total_cost;
+
+ return prune_cost - saved_cost;
+}
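As a rough worked example of the arithmetic in compute_partprune_cost() above, here is a standalone sketch. It makes several assumptions: the matching outer row count is passed in directly (the patch derives it from get_joinrel_matching_outer_size()), the strategy enum and all toy_* names are hypothetical, and a small hand-written log2 stands in for the LOG2 macro so the example needs no math library.

```c
#include <assert.h>

typedef enum ToyStrategy { TOY_LIST_OR_RANGE, TOY_HASH } ToyStrategy;

/* Base-2 logarithm for x >= 1, computed bit by bit without libm. */
double
toy_log2(double x)
{
    double result = 0.0;
    double frac = 0.5;
    int i;

    assert(x >= 1.0);
    while (x >= 2.0)
    {
        x /= 2.0;
        result += 1.0;
    }
    for (i = 0; i < 40; i++)
    {
        x *= x;
        if (x >= 2.0)
        {
            x /= 2.0;
            result += frac;
        }
        frac /= 2.0;
    }
    return result;
}

/* Same formula as the patch: pruning overhead minus the scan cost saved. */
double
toy_partprune_cost(ToyStrategy strategy, double cpu_operator_cost,
                   double append_total_cost, int append_nplans,
                   double inner_rows, double matching_outer_rows)
{
    double prune_cost;
    double matched;
    double unmatched_nplans;
    double saved_cost;

    assert(append_nplans > 0);

    if (strategy == TOY_HASH)
        prune_cost = cpu_operator_cost * append_nplans * inner_rows;
    else
        prune_cost = cpu_operator_cost * toy_log2(append_nplans) * inner_rows;

    /* Worst case: every matching outer row lands in a different partition. */
    matched = (matching_outer_rows < append_nplans) ?
        matching_outer_rows : (double) append_nplans;
    unmatched_nplans = append_nplans - matched;
    saved_cost = (unmatched_nplans / append_nplans) * append_total_cost;

    return prune_cost - saved_cost;
}
```

For a range-partitioned Append with 8 subplans and total cost 100, 2 inner rows, 2 matching outer rows and cpu_operator_cost 0.0025, the pruning overhead is 0.0025 * 3 * 2 = 0.015 and the saved cost is (6 / 8) * 100 = 75, giving about -74.985. A negative result presumably means pruning is expected to pay off. With hash partitioning and 100 matching outer rows nothing is saved, and the result is the bare overhead 0.0025 * 8 * 2 = 0.04.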
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 34ca6d4ac2..308ff452d3 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -242,7 +242,8 @@ static Hash *make_hash(Plan *lefttree,
List *hashkeys,
Oid skewTable,
AttrNumber skewColumn,
- bool skewInherit);
+ bool skewInherit,
+ List *joinpartprune_info_list);
static MergeJoin *make_mergejoin(List *tlist,
List *joinclauses, List *otherclauses,
List *mergeclauses,
@@ -342,6 +343,7 @@ create_plan(PlannerInfo *root, Path *best_path)
/* Initialize this module's workspace in PlannerInfo */
root->curOuterRels = NULL;
root->curOuterParams = NIL;
+ root->join_partition_prune_candidates = NIL;
/* Recursively process the path tree, demanding the correct tlist result */
plan = create_plan_recurse(root, best_path, CP_EXACT_TLIST);
@@ -369,6 +371,8 @@ create_plan(PlannerInfo *root, Path *best_path)
if (root->curOuterParams != NIL)
elog(ERROR, "failed to assign all NestLoopParams to plan nodes");
+ Assert(root->join_partition_prune_candidates == NIL);
+
/*
* Reset plan_params to ensure param IDs used for nestloop params are not
* re-used later
@@ -1223,6 +1227,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
int nasyncplans = 0;
RelOptInfo *rel = best_path->path.parent;
PartitionPruneInfo *partpruneinfo = NULL;
+ Bitmapset *join_prune_paramids = NULL;
int nodenumsortkeys = 0;
AttrNumber *nodeSortColIdx = NULL;
Oid *nodeSortOperators = NULL;
@@ -1377,6 +1382,8 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
* do partition pruning.
+ *
+ * Also gather information needed by the executor to do join pruning.
*/
if (enable_partition_pruning)
{
@@ -1399,13 +1406,20 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
partpruneinfo =
make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ NULL);
+
+ join_prune_paramids =
+ make_join_partition_pruneinfos(root, rel,
+ (Path *) best_path,
+ best_path->subpaths);
}
plan->appendplans = subplans;
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
plan->part_prune_info = partpruneinfo;
+ plan->join_prune_paramids = join_prune_paramids;
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -1445,6 +1459,7 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
PartitionPruneInfo *partpruneinfo = NULL;
+ Bitmapset *join_prune_paramids = NULL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1541,6 +1556,8 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
* do partition pruning.
+ *
+ * Also gather information needed by the executor to do join pruning.
*/
if (enable_partition_pruning)
{
@@ -1554,11 +1571,18 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
if (prunequal != NIL)
partpruneinfo = make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ NULL);
+
+ join_prune_paramids =
+ make_join_partition_pruneinfos(root, rel,
+ (Path *) best_path,
+ best_path->subpaths);
}
node->mergeplans = subplans;
node->part_prune_info = partpruneinfo;
+ node->join_prune_paramids = join_prune_paramids;
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
@@ -4734,6 +4758,13 @@ create_hashjoin_plan(PlannerInfo *root,
AttrNumber skewColumn = InvalidAttrNumber;
bool skewInherit = false;
ListCell *lc;
+ List *joinpartprune_info_list;
+
+ /*
+ * Collect information required to build JoinPartitionPruneInfos at this
+ * join.
+ */
+ prepare_join_partition_prune_candidate(root, &best_path->jpath);
/*
* HashJoin can project, so we don't have to demand exact tlists from the
@@ -4745,6 +4776,11 @@ create_hashjoin_plan(PlannerInfo *root,
outer_plan = create_plan_recurse(root, best_path->jpath.outerjoinpath,
(best_path->num_batches > 1) ? CP_SMALL_TLIST : 0);
+ /*
+ * Retrieve all the JoinPartitionPruneInfos for this join.
+ */
+ joinpartprune_info_list = get_join_partition_prune_candidate(root);
+
inner_plan = create_plan_recurse(root, best_path->jpath.innerjoinpath,
CP_SMALL_TLIST);
@@ -4850,7 +4886,8 @@ create_hashjoin_plan(PlannerInfo *root,
inner_hashkeys,
skewTable,
skewColumn,
- skewInherit);
+ skewInherit,
+ joinpartprune_info_list);
/*
* Set Hash node's startup & total costs equal to total cost of input
@@ -5977,7 +6014,8 @@ make_hash(Plan *lefttree,
List *hashkeys,
Oid skewTable,
AttrNumber skewColumn,
- bool skewInherit)
+ bool skewInherit,
+ List *joinpartprune_info_list)
{
Hash *node = makeNode(Hash);
Plan *plan = &node->plan;
@@ -5991,6 +6029,7 @@ make_hash(Plan *lefttree,
node->skewTable = skewTable;
node->skewColumn = skewColumn;
node->skewInherit = skewInherit;
+ node->joinpartprune_info_list = joinpartprune_info_list;
return node;
}
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index fc3709510d..c416e7ccda 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -156,6 +156,11 @@ static Plan *set_mergeappend_references(PlannerInfo *root,
MergeAppend *mplan,
int rtoffset);
static void set_hash_references(PlannerInfo *root, Plan *plan, int rtoffset);
+static void set_joinpartitionprune_references(PlannerInfo *root,
+ List *joinpartprune_info_list,
+ indexed_tlist *outer_itlist,
+ int rtoffset,
+ double num_exec);
static Relids offset_relid_set(Relids relids, int rtoffset);
static Node *fix_scan_expr(PlannerInfo *root, Node *node,
int rtoffset, double num_exec);
@@ -1897,6 +1902,62 @@ set_hash_references(PlannerInfo *root, Plan *plan, int rtoffset)
/* Hash nodes don't have their own quals */
Assert(plan->qual == NIL);
+
+ set_joinpartitionprune_references(root,
+ hplan->joinpartprune_info_list,
+ outer_itlist,
+ rtoffset,
+ NUM_EXEC_TLIST(plan));
+}
+
+/*
+ * set_joinpartitionprune_references
+ * Do set_plan_references processing on JoinPartitionPruneInfos
+ */
+static void
+set_joinpartitionprune_references(PlannerInfo *root,
+ List *joinpartprune_info_list,
+ indexed_tlist *outer_itlist,
+ int rtoffset,
+ double num_exec)
+{
+ ListCell *l;
+
+ foreach(l, joinpartprune_info_list)
+ {
+ JoinPartitionPruneInfo *jpinfo = (JoinPartitionPruneInfo *) lfirst(l);
+ ListCell *l1;
+
+ foreach(l1, jpinfo->part_prune_info->prune_infos)
+ {
+ List *prune_infos = lfirst(l1);
+ ListCell *l2;
+
+ foreach(l2, prune_infos)
+ {
+ PartitionedRelPruneInfo *pinfo = lfirst(l2);
+
+ pinfo->rtindex += rtoffset;
+
+ pinfo->initial_pruning_steps = (List *)
+ fix_upper_expr(root,
+ (Node *) pinfo->initial_pruning_steps,
+ outer_itlist,
+ OUTER_VAR,
+ rtoffset,
+ NRM_EQUAL,
+ num_exec);
+ pinfo->exec_pruning_steps = (List *)
+ fix_upper_expr(root,
+ (Node *) pinfo->exec_pruning_steps,
+ outer_itlist,
+ OUTER_VAR,
+ rtoffset,
+ NRM_EQUAL,
+ num_exec);
+ }
+ }
+ }
}
/*
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 3f31ecc956..a6d6f0ad88 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -48,7 +48,9 @@
#include "optimizer/appendinfo.h"
#include "optimizer/cost.h"
#include "optimizer/optimizer.h"
+#include "optimizer/paramassign.h"
#include "optimizer/pathnode.h"
+#include "optimizer/restrictinfo.h"
#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partprune.h"
@@ -103,15 +105,16 @@ typedef enum PartClauseTarget
*
* gen_partprune_steps() initializes and returns an instance of this struct.
*
- * Note that has_mutable_op, has_mutable_arg, and has_exec_param are set if
- * we found any potentially-useful-for-pruning clause having those properties,
- * whether or not we actually used the clause in the steps list. This
- * definition allows us to skip the PARTTARGET_EXEC pass in some cases.
+ * Note that has_mutable_op, has_mutable_arg, has_exec_param and has_vars are
+ * set if we found any potentially-useful-for-pruning clause having those
+ * properties, whether or not we actually used the clause in the steps list.
+ * This definition allows us to skip the PARTTARGET_EXEC pass in some cases.
*/
typedef struct GeneratePruningStepsContext
{
/* Copies of input arguments for gen_partprune_steps: */
RelOptInfo *rel; /* the partitioned relation */
+ Bitmapset *available_rels; /* rels whose Vars may be used for pruning */
PartClauseTarget target; /* use-case we're generating steps for */
/* Result data: */
List *steps; /* list of PartitionPruneSteps */
@@ -119,6 +122,7 @@ typedef struct GeneratePruningStepsContext
bool has_mutable_arg; /* clauses include any mutable comparison
* values, *other than* exec params */
bool has_exec_param; /* clauses include any PARAM_EXEC params */
+ bool has_vars; /* clauses include any Vars from 'available_rels' */
bool contradictory; /* clauses were proven self-contradictory */
/* Working state: */
int next_step_id;
@@ -144,8 +148,10 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
+ Bitmapset *available_rels,
Bitmapset **matchedsubplans);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
+ Bitmapset *available_rels,
PartClauseTarget target,
GeneratePruningStepsContext *context);
static List *gen_partprune_steps_internal(GeneratePruningStepsContext *context,
@@ -204,6 +210,10 @@ static PartClauseMatchStatus match_boolean_partition_clause(Oid partopfamily,
static void partkey_datum_from_expr(PartitionPruneContext *context,
Expr *expr, int stateidx,
Datum *value, bool *isnull);
+static bool contain_forbidden_var_clause(Node *node,
+ GeneratePruningStepsContext *context);
+static bool contain_forbidden_var_clause_walker(Node *node,
+ GeneratePruningStepsContext *context);
/*
@@ -216,11 +226,14 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
+ * 'available_rels' is the relid set of rels whose Vars may be used for
+ * pruning.
*/
PartitionPruneInfo *
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
- List *prunequal)
+ List *prunequal,
+ Bitmapset *available_rels)
{
PartitionPruneInfo *pruneinfo;
Bitmapset *allmatchedsubplans = NULL;
@@ -313,6 +326,7 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
prunequal,
partrelids,
relid_subplan_map,
+ available_rels,
&matchedsubplans);
/* When pruning is possible, record the matched subplans */
@@ -360,6 +374,174 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
return pruneinfo;
}
+/*
+ * make_join_partition_pruneinfos
+ * Builds one JoinPartitionPruneInfo for each join at which join partition
+ * pruning is possible for this appendrel.
+ *
+ * 'parentrel' is the RelOptInfo for an appendrel, and 'subpaths' is the list
+ * of scan paths for its child rels.
+ */
+Bitmapset *
+make_join_partition_pruneinfos(PlannerInfo *root, RelOptInfo *parentrel,
+ Path *best_path, List *subpaths)
+{
+ Bitmapset *result = NULL;
+ ListCell *lc;
+
+ if (!IS_PARTITIONED_REL(parentrel))
+ return NULL;
+
+ foreach(lc, root->join_partition_prune_candidates)
+ {
+ JoinPartitionPruneCandidateInfo *candidate =
+ (JoinPartitionPruneCandidateInfo *) lfirst(lc);
+ PartitionPruneInfo *part_prune_info;
+ List *prunequal;
+ Relids joinrelids;
+ ListCell *l;
+ double prune_cost;
+
+ if (candidate == NULL)
+ continue;
+
+ /*
+ * Identify all join clauses that are movable to this appendrel given
+ * these inner-side relids. Only those clauses can be used for join
+ * partition pruning.
+ */
+ joinrelids = bms_union(parentrel->relids, candidate->inner_relids);
+ prunequal = NIL;
+ foreach(l, candidate->joinrestrictinfo)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
+
+ if (join_clause_is_movable_into(rinfo,
+ parentrel->relids,
+ joinrelids))
+ prunequal = lappend(prunequal, rinfo);
+ }
+
+ if (prunequal == NIL)
+ continue;
+
+ /*
+ * Check the overhead of this pruning
+ */
+ prune_cost = compute_partprune_cost(root,
+ parentrel,
+ best_path->total_cost,
+ list_length(subpaths),
+ candidate->inner_relids,
+ candidate->inner_rows,
+ prunequal);
+ if (prune_cost > 0)
+ continue;
+
+ part_prune_info = make_partition_pruneinfo(root, parentrel,
+ subpaths,
+ prunequal,
+ candidate->inner_relids);
+
+ if (part_prune_info)
+ {
+ JoinPartitionPruneInfo *jpinfo;
+
+ jpinfo = makeNode(JoinPartitionPruneInfo);
+
+ jpinfo->part_prune_info = part_prune_info;
+ jpinfo->paramid = assign_special_exec_param(root);
+ jpinfo->nplans = list_length(subpaths);
+
+ candidate->joinpartprune_info_list =
+ lappend(candidate->joinpartprune_info_list, jpinfo);
+
+ result = bms_add_member(result, jpinfo->paramid);
+ }
+ }
+
+ return result;
+}
+
+/*
+ * prepare_join_partition_prune_candidate
+ * Check if join partition pruning is possible at this join and if so
+ * collect information required to build JoinPartitionPruneInfos.
+ *
+ * Note that we may build more than one JoinPartitionPruneInfo at one join, for
+ * different Append/MergeAppend paths.
+ */
+void
+prepare_join_partition_prune_candidate(PlannerInfo *root, JoinPath *jpath)
+{
+ JoinPartitionPruneCandidateInfo *candidate;
+
+ if (!enable_partition_pruning)
+ {
+ root->join_partition_prune_candidates =
+ lappend(root->join_partition_prune_candidates, NULL);
+ return;
+ }
+
+ /*
+ * We cannot perform join partition pruning if the outer is the
+ * non-nullable side.
+ */
+ if (!(jpath->jointype == JOIN_INNER ||
+ jpath->jointype == JOIN_SEMI ||
+ jpath->jointype == JOIN_RIGHT ||
+ jpath->jointype == JOIN_RIGHT_ANTI))
+ {
+ root->join_partition_prune_candidates =
+ lappend(root->join_partition_prune_candidates, NULL);
+ return;
+ }
+
+ /*
+ * For now we only support HashJoin.
+ */
+ if (jpath->path.pathtype != T_HashJoin)
+ {
+ root->join_partition_prune_candidates =
+ lappend(root->join_partition_prune_candidates, NULL);
+ return;
+ }
+
+ candidate = makeNode(JoinPartitionPruneCandidateInfo);
+ candidate->joinrestrictinfo = jpath->joinrestrictinfo;
+ candidate->inner_relids = jpath->innerjoinpath->parent->relids;
+ candidate->inner_rows = jpath->innerjoinpath->parent->rows;
+ candidate->joinpartprune_info_list = NIL;
+
+ root->join_partition_prune_candidates =
+ lappend(root->join_partition_prune_candidates, candidate);
+}
+
+/*
+ * get_join_partition_prune_candidate
+ * Pop out the JoinPartitionPruneCandidateInfo for this join and retrieve
+ * the JoinPartitionPruneInfos.
+ */
+List *
+get_join_partition_prune_candidate(PlannerInfo *root)
+{
+ JoinPartitionPruneCandidateInfo *candidate;
+ List *result;
+
+ candidate = llast(root->join_partition_prune_candidates);
+ root->join_partition_prune_candidates =
+ list_delete_last(root->join_partition_prune_candidates);
+
+ if (candidate == NULL)
+ return NIL;
+
+ result = candidate->joinpartprune_info_list;
+
+ pfree(candidate);
+
+ return result;
+}
+
/*
* add_part_relids
* Add new info to a list of Bitmapsets of partitioned relids.
@@ -428,6 +610,8 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* partrelids: Set of RT indexes identifying relevant partitioned tables
* within a single partitioning hierarchy
* relid_subplan_map[]: maps child relation relids to subplan indexes
+ * available_rels: the relid set of rels whose Vars may be used for
+ * pruning.
* matchedsubplans: on success, receives the set of subplan indexes which
* were matched to this partition hierarchy
*
@@ -440,6 +624,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
+ Bitmapset *available_rels,
Bitmapset **matchedsubplans)
{
RelOptInfo *targetpart = NULL;
@@ -539,8 +724,8 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
*/
- gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
- &context);
+ gen_partprune_steps(subpart, partprunequal, available_rels,
+ PARTTARGET_INITIAL, &context);
if (context.contradictory)
{
@@ -567,14 +752,15 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
initial_pruning_steps = NIL;
/*
- * If no exec Params appear in potentially-usable pruning clauses,
- * then there's no point in even thinking about per-scan pruning.
+ * If no exec Params or available Vars appear in potentially-usable
+ * pruning clauses, then there's no point in even thinking about
+ * per-scan pruning.
*/
- if (context.has_exec_param)
+ if (context.has_exec_param || context.has_vars)
{
/* ... OK, we'd better think about it */
- gen_partprune_steps(subpart, partprunequal, PARTTARGET_EXEC,
- &context);
+ gen_partprune_steps(subpart, partprunequal, available_rels,
+ PARTTARGET_EXEC, &context);
if (context.contradictory)
{
@@ -587,11 +773,14 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
/*
* Detect which exec Params actually got used; the fact that some
* were in available clauses doesn't mean we actually used them.
- * Skip per-scan pruning if there are none.
*/
execparamids = get_partkey_exec_paramids(exec_pruning_steps);
- if (bms_is_empty(execparamids))
+ /*
+ * Skip per-scan pruning if no exec Params were actually used and
+ * no Vars from available_rels were found.
+ */
+ if (bms_is_empty(execparamids) && !context.has_vars)
exec_pruning_steps = NIL;
}
else
@@ -703,6 +892,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* Process 'clauses' (typically a rel's baserestrictinfo list of clauses)
* and create a list of "partition pruning steps".
*
+ * 'available_rels' is the relid set of rels whose Vars may be used for
+ * pruning.
+ *
* 'target' tells whether to generate pruning steps for planning (use
* immutable clauses only), or for executor startup (use any allowable
* clause except ones containing PARAM_EXEC Params), or for executor
@@ -712,12 +904,13 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* some subsidiary flags; see the GeneratePruningStepsContext typedef.
*/
static void
-gen_partprune_steps(RelOptInfo *rel, List *clauses, PartClauseTarget target,
- GeneratePruningStepsContext *context)
+gen_partprune_steps(RelOptInfo *rel, List *clauses, Bitmapset *available_rels,
+ PartClauseTarget target, GeneratePruningStepsContext *context)
{
/* Initialize all output values to zero/false/NULL */
memset(context, 0, sizeof(GeneratePruningStepsContext));
context->rel = rel;
+ context->available_rels = available_rels;
context->target = target;
/*
@@ -773,7 +966,7 @@ prune_append_rel_partitions(RelOptInfo *rel)
* If the clauses are found to be contradictory, we can return the empty
* set.
*/
- gen_partprune_steps(rel, clauses, PARTTARGET_PLANNER,
+ gen_partprune_steps(rel, clauses, NULL, PARTTARGET_PLANNER,
&gcontext);
if (gcontext.contradictory)
return NULL;
@@ -1957,9 +2150,10 @@ match_clause_to_partition_key(GeneratePruningStepsContext *context,
return PARTCLAUSE_UNSUPPORTED;
/*
- * We can never prune using an expression that contains Vars.
+ * We can never prune using an expression that contains Vars except
+ * for Vars belonging to context->available_rels.
*/
- if (contain_var_clause((Node *) expr))
+ if (contain_forbidden_var_clause((Node *) expr, context))
return PARTCLAUSE_UNSUPPORTED;
/*
@@ -2155,9 +2349,10 @@ match_clause_to_partition_key(GeneratePruningStepsContext *context,
return PARTCLAUSE_UNSUPPORTED;
/*
- * We can never prune using an expression that contains Vars.
+ * We can never prune using an expression that contains Vars except
+ * for Vars belonging to context->available_rels.
*/
- if (contain_var_clause((Node *) rightop))
+ if (contain_forbidden_var_clause((Node *) rightop, context))
return PARTCLAUSE_UNSUPPORTED;
/*
@@ -3727,3 +3922,54 @@ partkey_datum_from_expr(PartitionPruneContext *context,
*value = ExecEvalExprSwitchContext(exprstate, ectx, isnull);
}
}
+
+/*
+ * contain_forbidden_var_clause
+ * Recursively scan a clause to discover whether it contains any Var nodes
+ * (of the current query level) that do not belong to relations in
+ * context->available_rels.
+ *
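+ * Returns true if any such Var node is found. Vars that do belong to
+ * context->available_rels set context->has_vars, but are still treated as
+ * forbidden unless context->target is PARTTARGET_EXEC.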
+ *
+ * Does not examine subqueries, therefore must only be used after reduction
+ * of sublinks to subplans!
+ */
+static bool
+contain_forbidden_var_clause(Node *node, GeneratePruningStepsContext *context)
+{
+ return contain_forbidden_var_clause_walker(node, context);
+}
+
+static bool
+contain_forbidden_var_clause_walker(Node *node, GeneratePruningStepsContext *context)
+{
+ if (node == NULL)
+ return false;
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ if (var->varlevelsup != 0)
+ return false;
+
+ if (!bms_is_member(var->varno, context->available_rels))
+ return true; /* abort the tree traversal and return true */
+
+ context->has_vars = true;
+
+ if (context->target != PARTTARGET_EXEC)
+ return true; /* abort the tree traversal and return true */
+
+ return false;
+ }
+ if (IsA(node, CurrentOfExpr))
+ return true;
+ if (IsA(node, PlaceHolderVar))
+ {
+ if (((PlaceHolderVar *) node)->phlevelsup == 0)
+ return true; /* abort the tree traversal and return true */
+ /* else fall through to check the contained expr */
+ }
+ return expression_tree_walker(node, contain_forbidden_var_clause_walker,
+ (void *) context);
+}
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 15ec869ac8..720bcc1149 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -121,11 +121,26 @@ typedef struct PartitionPruneState
PartitionPruningData *partprunedata[FLEXIBLE_ARRAY_MEMBER];
} PartitionPruneState;
+/*
+ * JoinPartitionPruneState - State object required for plan nodes to perform
+ * join partition pruning.
+ */
+typedef struct JoinPartitionPruneState
+{
+ PartitionPruneState *part_prune_state;
+ int paramid;
+ int nplans;
+ bool finished;
+ Bitmapset *part_prune_result;
+} JoinPartitionPruneState;
+
extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
PartitionPruneInfo *pruneinfo,
Bitmapset **initially_valid_subplans);
+extern List *ExecInitJoinpartpruneList(PlanState *planstate, List *joinpartprune_info_list);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ PlanState *planstate);
#endif /* EXECPARTITION_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 5d7f17dee0..0aeafcabff 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -2678,6 +2678,9 @@ typedef struct HashState
/* Parallel hash state. */
struct ParallelHashJoinState *parallel_state;
+
+ /* Infos for join partition pruning. */
+ List *joinpartprune_state_list;
} HashState;
/* ----------------
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index ed85dc7414..d066b6105c 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -530,6 +530,9 @@ struct PlannerInfo
/* not-yet-assigned NestLoopParams */
List *curOuterParams;
+ /* a stack of JoinPartitionPruneCandidateInfos (NULL entries allowed) */
+ List *join_partition_prune_candidates;
+
/*
* These fields are workspace for setrefs.c. Each is an array
* corresponding to glob->subplans. (We could probably teach
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 24d46c76dc..00058a735e 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -276,6 +276,9 @@ typedef struct Append
/* Info for run-time subplan pruning; NULL if we're not doing that */
struct PartitionPruneInfo *part_prune_info;
+
+ /* Param IDs used for join partition pruning; NULL if we're not doing that */
+ Bitmapset *join_prune_paramids;
} Append;
/* ----------------
@@ -311,6 +314,9 @@ typedef struct MergeAppend
/* Info for run-time subplan pruning; NULL if we're not doing that */
struct PartitionPruneInfo *part_prune_info;
+
+ /* Param IDs used for join partition pruning; NULL if we're not doing that */
+ Bitmapset *join_prune_paramids;
} MergeAppend;
/* ----------------
@@ -1207,6 +1213,7 @@ typedef struct Hash
bool skewInherit; /* is outer join rel an inheritance tree? */
/* all other info is in the parent HashJoin node */
Cardinality rows_total; /* estimate total rows if parallel_aware */
+ List *joinpartprune_info_list; /* infos for join partition pruning */
} Hash;
/* ----------------
@@ -1553,6 +1560,35 @@ typedef struct PartitionPruneStepCombine
List *source_stepids;
} PartitionPruneStepCombine;
+/*
+ * JoinPartitionPruneCandidateInfo - Information required to build
+ * JoinPartitionPruneInfos.
+ */
+typedef struct JoinPartitionPruneCandidateInfo
+{
+ pg_node_attr(no_equal, no_query_jumble)
+
+ NodeTag type;
+ List *joinrestrictinfo;
+ Bitmapset *inner_relids;
+ double inner_rows;
+ List *joinpartprune_info_list;
+} JoinPartitionPruneCandidateInfo;
+
+/*
+ * JoinPartitionPruneInfo - Details required to allow the executor to prune
+ * partitions during join.
+ */
+typedef struct JoinPartitionPruneInfo
+{
+ pg_node_attr(no_equal, no_query_jumble)
+
+ NodeTag type;
+ PartitionPruneInfo *part_prune_info;
+ int paramid;
+ int nplans;
+} JoinPartitionPruneInfo;
+
/*
* Plan invalidation info
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 6d50afbf74..52de844f6d 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -211,5 +211,9 @@ extern void set_foreign_size_estimates(PlannerInfo *root, RelOptInfo *rel);
extern PathTarget *set_pathtarget_cost_width(PlannerInfo *root, PathTarget *target);
extern double compute_bitmap_pages(PlannerInfo *root, RelOptInfo *baserel,
Path *bitmapqual, int loop_count, Cost *cost, double *tuple);
+extern double compute_partprune_cost(PlannerInfo *root, RelOptInfo *appendrel,
+ Cost append_total_cost, int append_nplans,
+ Relids inner_relids, double inner_rows,
+ List *prunequal);
#endif /* COST_H */
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 8636e04e37..899aa61b34 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -19,6 +19,8 @@
struct PlannerInfo; /* avoid including pathnodes.h here */
struct RelOptInfo;
+struct Path;
+struct JoinPath;
/*
@@ -73,7 +75,15 @@ typedef struct PartitionPruneContext
extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
struct RelOptInfo *parentrel,
List *subpaths,
- List *prunequal);
+ List *prunequal,
+ Bitmapset *available_rels);
+extern Bitmapset *make_join_partition_pruneinfos(struct PlannerInfo *root,
+ struct RelOptInfo *parentrel,
+ struct Path *best_path,
+ List *subpaths);
+extern void prepare_join_partition_prune_candidate(struct PlannerInfo *root,
+ struct JoinPath *jpath);
+extern List *get_join_partition_prune_candidate(struct PlannerInfo *root);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 9a4c48c055..a08e7a1f0a 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -3003,6 +3003,92 @@ order by tbl1.col1, tprt.col1;
------+------
(0 rows)
+-- join partition pruning
+-- The 'Memory Usage' from the Hash node can vary between machines. Let's just
+-- replace the number with an 'N'.
+-- We need to run EXPLAIN ANALYZE because we need to see '(never executed)'
+-- notations because that's the only way to verify runtime pruning.
+create function explain_join_partition_pruning(text) returns setof text
+language plpgsql as
+$$
+declare
+ ln text;
+begin
+ for ln in
+ execute format('explain (analyze, verbose, costs off, summary off, timing off) %s',
+ $1)
+ loop
+ ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
+ return next ln;
+ end loop;
+end;
+$$;
+delete from tbl1;
+insert into tbl1 values (501), (505);
+analyze tbl1, tprt;
+set enable_nestloop = off;
+set enable_mergejoin = off;
+set enable_hashjoin = on;
+select explain_join_partition_pruning('
+select * from tprt p1
+ inner join tprt p2 on p1.col1 = p2.col1
+ right join tbl1 t on p1.col1 = t.col1 and p2.col1 = t.col1;');
+ explain_join_partition_pruning
+--------------------------------------------------------------------------------
+ Hash Right Join (actual rows=2 loops=1)
+ Output: p1.col1, p2.col1, t.col1
+ Hash Cond: ((p1.col1 = t.col1) AND (p2.col1 = t.col1))
+ -> Hash Join (actual rows=3 loops=1)
+ Output: p1.col1, p2.col1
+ Hash Cond: (p1.col1 = p2.col1)
+ -> Append (actual rows=3 loops=1)
+ Join Partition Pruning: $0
+ -> Seq Scan on public.tprt_1 p1_1 (never executed)
+ Output: p1_1.col1
+ -> Seq Scan on public.tprt_2 p1_2 (actual rows=3 loops=1)
+ Output: p1_2.col1
+ -> Seq Scan on public.tprt_3 p1_3 (never executed)
+ Output: p1_3.col1
+ -> Seq Scan on public.tprt_4 p1_4 (never executed)
+ Output: p1_4.col1
+ -> Seq Scan on public.tprt_5 p1_5 (never executed)
+ Output: p1_5.col1
+ -> Seq Scan on public.tprt_6 p1_6 (never executed)
+ Output: p1_6.col1
+ -> Hash (actual rows=3 loops=1)
+ Output: p2.col1
+ Buckets: 1024 Batches: 1 Memory Usage: NkB
+ -> Append (actual rows=3 loops=1)
+ Join Partition Pruning: $1
+ -> Seq Scan on public.tprt_1 p2_1 (never executed)
+ Output: p2_1.col1
+ -> Seq Scan on public.tprt_2 p2_2 (actual rows=3 loops=1)
+ Output: p2_2.col1
+ -> Seq Scan on public.tprt_3 p2_3 (never executed)
+ Output: p2_3.col1
+ -> Seq Scan on public.tprt_4 p2_4 (never executed)
+ Output: p2_4.col1
+ -> Seq Scan on public.tprt_5 p2_5 (never executed)
+ Output: p2_5.col1
+ -> Seq Scan on public.tprt_6 p2_6 (never executed)
+ Output: p2_6.col1
+ -> Hash (actual rows=2 loops=1)
+ Output: t.col1
+ Buckets: 1024 Batches: 1 Memory Usage: NkB
+ Partition Prune: $0, $1
+ -> Seq Scan on public.tbl1 t (actual rows=2 loops=1)
+ Output: t.col1
+(43 rows)
+
+select * from tprt p1
+ inner join tprt p2 on p1.col1 = p2.col1
+ right join tbl1 t on p1.col1 = t.col1 and p2.col1 = t.col1;
+ col1 | col1 | col1
+------+------+------
+ 501 | 501 | 501
+ 505 | 505 | 505
+(2 rows)
+
drop table tbl1, tprt;
-- Test with columns defined in varying orders between each level
create table part_abc (a int not null, b int not null, c int not null) partition by list (a);
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 7bf3920827..fc5982edcf 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -727,6 +727,45 @@ select tbl1.col1, tprt.col1 from tbl1
inner join tprt on tbl1.col1 = tprt.col1
order by tbl1.col1, tprt.col1;
+-- join partition pruning
+
+-- The 'Memory Usage' from the Hash node can vary between machines. Let's just
+-- replace the number with an 'N'.
+-- We need to run EXPLAIN ANALYZE because we need to see '(never executed)'
+-- notations because that's the only way to verify runtime pruning.
+create function explain_join_partition_pruning(text) returns setof text
+language plpgsql as
+$$
+declare
+ ln text;
+begin
+ for ln in
+ execute format('explain (analyze, verbose, costs off, summary off, timing off) %s',
+ $1)
+ loop
+ ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
+ return next ln;
+ end loop;
+end;
+$$;
+
+delete from tbl1;
+insert into tbl1 values (501), (505);
+analyze tbl1, tprt;
+
+set enable_nestloop = off;
+set enable_mergejoin = off;
+set enable_hashjoin = on;
+
+select explain_join_partition_pruning('
+select * from tprt p1
+ inner join tprt p2 on p1.col1 = p2.col1
+ right join tbl1 t on p1.col1 = t.col1 and p2.col1 = t.col1;');
+
+select * from tprt p1
+ inner join tprt p2 on p1.col1 = p2.col1
+ right join tbl1 t on p1.col1 = t.col1 and p2.col1 = t.col1;
+
drop table tbl1, tprt;
-- Test with columns defined in varying orders between each level
--
2.31.0
Hello Richard,
06.11.2023 06:05, Richard Guo wrote:
Fixed this issue in v4.
Please look at a warning and an assertion failure triggered by the
following script:
set parallel_setup_cost = 0;
set parallel_tuple_cost = 0;
set min_parallel_table_scan_size = '1kB';
create table t1 (i int) partition by range (i);
create table t1_1 partition of t1 for values from (1) to (2);
create table t1_2 partition of t1 for values from (2) to (3);
insert into t1 values (1), (2);
create table t2(i int);
insert into t2 values (1), (2);
analyze t1, t2;
select * from t1 right join t2 on t1.i = t2.i;
2023-11-06 14:11:37.398 UTC|law|regression|6548f419.392cf5|WARNING: Join partition pruning $0 has not been performed yet.
TRAP: failed Assert("node->as_prune_state"), File: "nodeAppend.c", Line: 846, PID: 3747061
Best regards,
Alexander
On Mon, Nov 6, 2023 at 11:00 PM Alexander Lakhin <exclusion@gmail.com>
wrote:
Please look at a warning and an assertion failure triggered by the
following script:
set parallel_setup_cost = 0;
set parallel_tuple_cost = 0;
set min_parallel_table_scan_size = '1kB';
create table t1 (i int) partition by range (i);
create table t1_1 partition of t1 for values from (1) to (2);
create table t1_2 partition of t1 for values from (2) to (3);
insert into t1 values (1), (2);
create table t2(i int);
insert into t2 values (1), (2);
analyze t1, t2;
select * from t1 right join t2 on t1.i = t2.i;
2023-11-06 14:11:37.398 UTC|law|regression|6548f419.392cf5|WARNING: Join partition pruning $0 has not been performed yet.
TRAP: failed Assert("node->as_prune_state"), File: "nodeAppend.c", Line: 846, PID: 3747061
Thanks for the report! I failed to take care of the parallel-hashjoin
case, and I have to admit that it's not clear to me yet how we should do
join partition pruning in that case.
For now I think it's better to just avoid performing join partition
pruning for parallel hashjoin, so that the patch doesn't become too
complex for review. We can always extend it in the future.
I have done that in v5. Thanks for testing!
Thanks
Richard
Attachments:
v5-0001-Support-run-time-partition-pruning-for-hash-join.patch (application/octet-stream)
From 470ebcee95f4c0b162b6826ce840bf17d8df5266 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Mon, 14 Aug 2023 14:55:26 +0800
Subject: [PATCH v5] Support run-time partition pruning for hash join
If we have a hash join with an Append node on the outer side, something
like
Hash Join
Hash Cond: (pt.a = t.a)
-> Append
-> Seq Scan on pt_p1 pt_1
-> Seq Scan on pt_p2 pt_2
-> Seq Scan on pt_p3 pt_3
-> Hash
-> Seq Scan on t
We can actually prune those subnodes of the Append that cannot possibly
contain any matching tuples from the other side of the join. To do
that, when building the Hash table, for each row from the inner side we
can compute the minimum set of subnodes that can possibly match the join
condition. Once the Hash table has been built and we start to execute the
Append node, we already know which subnodes have survived and can thus
skip the other subnodes.
This patch implements this idea.
---
src/backend/commands/explain.c | 61 ++++
src/backend/executor/execPartition.c | 127 +++++++-
src/backend/executor/nodeAppend.c | 32 +-
src/backend/executor/nodeHash.c | 75 +++++
src/backend/executor/nodeHashjoin.c | 10 +
src/backend/executor/nodeMergeAppend.c | 22 +-
src/backend/optimizer/path/costsize.c | 106 +++++++
src/backend/optimizer/plan/createplan.c | 49 ++-
src/backend/optimizer/plan/setrefs.c | 61 ++++
src/backend/partitioning/partprune.c | 298 ++++++++++++++++--
src/include/executor/execPartition.h | 17 +-
src/include/nodes/execnodes.h | 3 +
src/include/nodes/pathnodes.h | 3 +
src/include/nodes/plannodes.h | 36 +++
src/include/optimizer/cost.h | 4 +
src/include/partitioning/partprune.h | 12 +-
src/test/regress/expected/partition_prune.out | 86 +++++
src/test/regress/sql/partition_prune.sql | 39 +++
18 files changed, 992 insertions(+), 49 deletions(-)
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index f1d71bc54e..c51cf6beb6 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -18,6 +18,7 @@
#include "commands/createas.h"
#include "commands/defrem.h"
#include "commands/prepare.h"
+#include "executor/execPartition.h"
#include "executor/nodeHash.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
@@ -118,6 +119,9 @@ static void show_instrumentation_count(const char *qlabel, int which,
PlanState *planstate, ExplainState *es);
static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
static void show_eval_params(Bitmapset *bms_params, ExplainState *es);
+static void show_join_pruning_result_info(Bitmapset *join_prune_paramids,
+ ExplainState *es);
+static void show_joinpartprune_info(HashState *hashstate, ExplainState *es);
static const char *explain_get_index_name(Oid indexId);
static void show_buffer_usage(ExplainState *es, const BufferUsage *usage,
bool planning);
@@ -2057,9 +2061,17 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_incremental_sort_info(castNode(IncrementalSortState, planstate),
es);
break;
+ case T_Append:
+ if (es->verbose)
+ show_join_pruning_result_info(((Append *) plan)->join_prune_paramids,
+ es);
+ break;
case T_MergeAppend:
show_merge_append_keys(castNode(MergeAppendState, planstate),
ancestors, es);
+ if (es->verbose)
+ show_join_pruning_result_info(((MergeAppend *) plan)->join_prune_paramids,
+ es);
break;
case T_Result:
show_upper_qual((List *) ((Result *) plan)->resconstantqual,
@@ -2075,6 +2087,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
break;
case T_Hash:
show_hash_info(castNode(HashState, planstate), es);
+ if (es->verbose)
+ show_joinpartprune_info(castNode(HashState, planstate), es);
break;
case T_Memoize:
show_memoize_info(castNode(MemoizeState, planstate), ancestors,
@@ -3515,6 +3529,53 @@ show_eval_params(Bitmapset *bms_params, ExplainState *es)
ExplainPropertyList("Params Evaluated", params, es);
}
+/*
+ * Show join partition pruning results at Append/MergeAppend nodes.
+ */
+static void
+show_join_pruning_result_info(Bitmapset *join_prune_paramids, ExplainState *es)
+{
+ int paramid = -1;
+ List *params = NIL;
+
+ if (bms_is_empty(join_prune_paramids))
+ return;
+
+ while ((paramid = bms_next_member(join_prune_paramids, paramid)) >= 0)
+ {
+ char param[32];
+
+ snprintf(param, sizeof(param), "$%d", paramid);
+ params = lappend(params, pstrdup(param));
+ }
+
+ ExplainPropertyList("Join Partition Pruning", params, es);
+}
+
+/*
+ * Show join partition pruning infos at Hash nodes.
+ */
+static void
+show_joinpartprune_info(HashState *hashstate, ExplainState *es)
+{
+ List *params = NIL;
+ ListCell *lc;
+
+ if (!hashstate->joinpartprune_state_list)
+ return;
+
+ foreach(lc, hashstate->joinpartprune_state_list)
+ {
+ JoinPartitionPruneState *jpstate = (JoinPartitionPruneState *) lfirst(lc);
+ char param[32];
+
+ snprintf(param, sizeof(param), "$%d", jpstate->paramid);
+ params = lappend(params, pstrdup(param));
+ }
+
+ ExplainPropertyList("Partition Prune", params, es);
+}
+
/*
* Fetch the name of an index in an EXPLAIN
*
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index f6c34328b8..35a9149a39 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -199,6 +199,8 @@ static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
Bitmapset **validsubplans);
+static bool get_join_prune_matching_subplans(PlanState *planstate,
+ Bitmapset **partset);
/*
@@ -1806,7 +1808,7 @@ ExecInitPartitionPruning(PlanState *planstate,
* Perform an initial partition prune pass, if required.
*/
if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true, NULL);
else
{
/* No pruning, so we'll need to initialize all subplans */
@@ -1836,6 +1838,37 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecInitJoinpartpruneList
+ * Initialize data structures needed for join partition pruning
+ */
+List *
+ExecInitJoinpartpruneList(PlanState *planstate,
+ List *joinpartprune_info_list)
+{
+ ListCell *lc;
+ List *result = NIL;
+
+ foreach(lc, joinpartprune_info_list)
+ {
+ JoinPartitionPruneInfo *jpinfo = (JoinPartitionPruneInfo *) lfirst(lc);
+ JoinPartitionPruneState *jpstate = palloc(sizeof(JoinPartitionPruneState));
+
+ jpstate->part_prune_state =
+ CreatePartitionPruneState(planstate, jpinfo->part_prune_info);
+ Assert(jpstate->part_prune_state->do_exec_prune);
+
+ jpstate->paramid = jpinfo->paramid;
+ jpstate->nplans = jpinfo->nplans;
+ jpstate->finished = false;
+ jpstate->part_prune_result = NULL;
+
+ result = lappend(result, jpstate);
+ }
+
+ return result;
+}
+
/*
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
@@ -2273,7 +2306,9 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
/*
* ExecFindMatchingSubPlans
* Determine which subplans match the pruning steps detailed in
- * 'prunestate' for the current comparison expression values.
+ * 'prunestate' if any for the current comparison expression values, and
+ * meanwhile match the join partition pruning results if any stored in
+ * Append/MergeAppend node's join_prune_paramids.
*
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
@@ -2281,11 +2316,30 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ PlanState *planstate)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
int i;
+ Bitmapset *join_prune_partset = NULL;
+ bool do_join_prune;
+
+ /* Retrieve the join partition pruning results if any */
+ do_join_prune =
+ get_join_prune_matching_subplans(planstate, &join_prune_partset);
+
+ /*
+ * Either we're here on partition prune done according to the pruning steps
+ * detailed in 'prunestate', or we have done join partition prune.
+ */
+ Assert(do_join_prune || prunestate != NULL);
+
+ /*
+ * If there is no 'prunestate', then rely entirely on join pruning.
+ */
+ if (prunestate == NULL)
+ return join_prune_partset;
/*
* Either we're here on the initial prune done during pruning
@@ -2326,6 +2380,10 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Add in any subplans that partition pruning didn't account for */
result = bms_add_members(result, prunestate->other_subplans);
+ /* Intersect join partition pruning results */
+ if (do_join_prune)
+ result = bms_intersect(result, join_prune_partset);
+
MemoryContextSwitchTo(oldcontext);
/* Copy result out of the temp context before we reset it */
@@ -2396,3 +2454,66 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
}
}
}
+
+/*
+ * get_join_prune_matching_subplans
+ * Retrieve the join partition pruning results if any stored in
+ * Append/MergeAppend node's join_prune_paramids. Return true if we can
+ * do join partition pruning, otherwise return false.
+ *
+ * Adds valid (non-prunable) subplan IDs to *partset
+ */
+static bool
+get_join_prune_matching_subplans(PlanState *planstate, Bitmapset **partset)
+{
+ Bitmapset *join_prune_paramids;
+ int nplans;
+ int paramid;
+
+ if (planstate == NULL)
+ return false;
+
+ if (IsA(planstate, AppendState))
+ {
+ join_prune_paramids =
+ ((Append *) planstate->plan)->join_prune_paramids;
+ nplans = ((AppendState *) planstate)->as_nplans;
+ }
+ else if (IsA(planstate, MergeAppendState))
+ {
+ join_prune_paramids =
+ ((MergeAppend *) planstate->plan)->join_prune_paramids;
+ nplans = ((MergeAppendState *) planstate)->ms_nplans;
+ }
+ else
+ {
+ elog(ERROR, "unrecognized node type: %d", (int) nodeTag(planstate));
+ return false;
+ }
+
+ if (bms_is_empty(join_prune_paramids))
+ return false;
+
+ Assert(nplans > 0);
+ *partset = bms_add_range(NULL, 0, nplans - 1);
+
+ paramid = -1;
+ while ((paramid = bms_next_member(join_prune_paramids, paramid)) >= 0)
+ {
+ ParamExecData *param;
+ JoinPartitionPruneState *jpstate;
+
+ param = &(planstate->state->es_param_exec_vals[paramid]);
+ Assert(param->execPlan == NULL);
+ Assert(!param->isnull);
+ jpstate = (JoinPartitionPruneState *) DatumGetPointer(param->value);
+
+ if (jpstate != NULL)
+ *partset = bms_intersect(*partset, jpstate->part_prune_result);
+ else /* the Hash node for this pruning has not been executed */
+ elog(WARNING, "Join partition pruning $%d has not been performed yet.",
+ paramid);
+ }
+
+ return true;
+}
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 609df6b9e6..c8dd8583d2 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -151,11 +151,13 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
nplans = bms_num_members(validsubplans);
/*
- * When no run-time pruning is required and there's at least one
- * subplan, we can fill as_valid_subplans immediately, preventing
- * later calls to ExecFindMatchingSubPlans.
+ * When no run-time pruning or join pruning is required and there's at
+ * least one subplan, we can fill as_valid_subplans immediately,
+ * preventing later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (!prunestate->do_exec_prune &&
+ bms_is_empty(node->join_prune_paramids) &&
+ nplans > 0)
{
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
appendstate->as_valid_subplans_identified = true;
@@ -170,10 +172,18 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplans as valid; they must also all be initialized.
*/
Assert(nplans > 0);
- appendstate->as_valid_subplans = validsubplans =
- bms_add_range(NULL, 0, nplans - 1);
- appendstate->as_valid_subplans_identified = true;
+ validsubplans = bms_add_range(NULL, 0, nplans - 1);
appendstate->as_prune_state = NULL;
+
+ /*
+ * When join pruning is not enabled we can fill as_valid_subplans
+ * immediately, preventing later calls to ExecFindMatchingSubPlans.
+ */
+ if (bms_is_empty(node->join_prune_paramids))
+ {
+ appendstate->as_valid_subplans = validsubplans;
+ appendstate->as_valid_subplans_identified = true;
+ }
}
/*
@@ -580,7 +590,7 @@ choose_next_subplan_locally(AppendState *node)
else if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, &node->ps);
node->as_valid_subplans_identified = true;
}
@@ -647,7 +657,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, &node->ps);
node->as_valid_subplans_identified = true;
/*
@@ -723,7 +733,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, &node->ps);
node->as_valid_subplans_identified = true;
mark_invalid_subplans_as_finished(node);
@@ -876,7 +886,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, &node->ps);
node->as_valid_subplans_identified = true;
classify_matching_subplans(node);
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index e72f0986c2..9ca8bf49d9 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -31,6 +31,7 @@
#include "catalog/pg_statistic.h"
#include "commands/tablespace.h"
#include "executor/execdebug.h"
+#include "executor/execPartition.h"
#include "executor/hashjoin.h"
#include "executor/nodeHash.h"
#include "executor/nodeHashjoin.h"
@@ -48,6 +49,8 @@ static void ExecHashIncreaseNumBatches(HashJoinTable hashtable);
static void ExecHashIncreaseNumBuckets(HashJoinTable hashtable);
static void ExecParallelHashIncreaseNumBatches(HashJoinTable hashtable);
static void ExecParallelHashIncreaseNumBuckets(HashJoinTable hashtable);
+static void ExecJoinPartitionPrune(HashState *node);
+static void ExecStoreJoinPartitionPruneResult(HashState *node);
static void ExecHashBuildSkewHash(HashJoinTable hashtable, Hash *node,
int mcvsToUse);
static void ExecHashSkewTableInsert(HashJoinTable hashtable,
@@ -189,8 +192,14 @@ MultiExecPrivateHash(HashState *node)
}
hashtable->totalTuples += 1;
}
+
+ /* Perform join partition pruning */
+ ExecJoinPartitionPrune(node);
}
+ /* Store the surviving partitions for Append/MergeAppend nodes */
+ ExecStoreJoinPartitionPruneResult(node);
+
/* resize the hash table if needed (NTUP_PER_BUCKET exceeded) */
if (hashtable->nbuckets != hashtable->nbuckets_optimal)
ExecHashIncreaseNumBuckets(hashtable);
@@ -401,6 +410,12 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
hashstate->hashkeys =
ExecInitExprList(node->hashkeys, (PlanState *) hashstate);
+ /*
+ * initialize join partition pruning infos
+ */
+ hashstate->joinpartprune_state_list =
+ ExecInitJoinpartpruneList(&hashstate->ps, node->joinpartprune_info_list);
+
return hashstate;
}
@@ -1601,6 +1616,56 @@ ExecParallelHashIncreaseNumBuckets(HashJoinTable hashtable)
}
}
+/*
+ * ExecJoinPartitionPrune
+ * Perform join partition pruning at this join for each
+ * JoinPartitionPruneState.
+ */
+static void
+ExecJoinPartitionPrune(HashState *node)
+{
+ ListCell *lc;
+
+ foreach(lc, node->joinpartprune_state_list)
+ {
+ JoinPartitionPruneState *jpstate = (JoinPartitionPruneState *) lfirst(lc);
+ Bitmapset *matching_subPlans;
+
+ if (jpstate->finished)
+ continue;
+
+ matching_subPlans =
+ ExecFindMatchingSubPlans(jpstate->part_prune_state, false, NULL);
+ jpstate->part_prune_result =
+ bms_add_members(jpstate->part_prune_result, matching_subPlans);
+
+ if (bms_num_members(jpstate->part_prune_result) == jpstate->nplans)
+ jpstate->finished = true;
+ }
+}
+
+/*
+ * ExecStoreJoinPartitionPruneResult
+ * For each JoinPartitionPruneState, store the set of surviving partitions
+ * to make it available for the Append/MergeAppend node.
+ */
+static void
+ExecStoreJoinPartitionPruneResult(HashState *node)
+{
+ ListCell *lc;
+
+ foreach(lc, node->joinpartprune_state_list)
+ {
+ JoinPartitionPruneState *jpstate = (JoinPartitionPruneState *) lfirst(lc);
+ ParamExecData *param;
+
+ param = &(node->ps.state->es_param_exec_vals[jpstate->paramid]);
+ Assert(param->execPlan == NULL);
+ Assert(!param->isnull);
+ param->value = PointerGetDatum(jpstate);
+ }
+}
+
/*
* ExecHashTableInsert
* insert a tuple into the hash table depending on the hash value
@@ -2345,6 +2410,16 @@ void
ExecReScanHash(HashState *node)
{
PlanState *outerPlan = outerPlanState(node);
+ ListCell *lc;
+
+ /* reset the state in JoinPartitionPruneStates */
+ foreach(lc, node->joinpartprune_state_list)
+ {
+ JoinPartitionPruneState *jpstate = (JoinPartitionPruneState *) lfirst(lc);
+
+ jpstate->finished = false;
+ jpstate->part_prune_result = NULL;
+ }
/*
* if chgParam of subnode is not null then plan will be re-scanned by
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 25a2d78f15..ddca824206 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -311,6 +311,16 @@ ExecHashJoinImpl(PlanState *pstate, bool parallel)
*/
node->hj_FirstOuterTupleSlot = NULL;
}
+ else if (hashNode->joinpartprune_state_list != NIL)
+ {
+ /*
+ * Give the hash node a chance to run join partition
+ * pruning if there is any JoinPartitionPruneState that can
+ * be evaluated at it. So do not apply the empty-outer
+ * optimization in this case.
+ */
+ node->hj_FirstOuterTupleSlot = NULL;
+ }
else if (HJ_FILL_OUTER(node) ||
(outerNode->plan->startup_cost < hashNode->ps.plan->total_cost &&
!node->hj_OuterNotEmpty))
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 21b5726e6e..9eb276abc8 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -99,11 +99,13 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
nplans = bms_num_members(validsubplans);
/*
- * When no run-time pruning is required and there's at least one
- * subplan, we can fill ms_valid_subplans immediately, preventing
- * later calls to ExecFindMatchingSubPlans.
+ * When no run-time pruning or join pruning is required and there's at
+ * least one subplan, we can fill ms_valid_subplans immediately,
+ * preventing later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (!prunestate->do_exec_prune &&
+ bms_is_empty(node->join_prune_paramids) &&
+ nplans > 0)
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -115,9 +117,15 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplans as valid; they must also all be initialized.
*/
Assert(nplans > 0);
- mergestate->ms_valid_subplans = validsubplans =
- bms_add_range(NULL, 0, nplans - 1);
+ validsubplans = bms_add_range(NULL, 0, nplans - 1);
mergestate->ms_prune_state = NULL;
+
+ /*
+ * When join pruning is not enabled we can fill ms_valid_subplans
+ * immediately, preventing later calls to ExecFindMatchingSubPlans.
+ */
+ if (bms_is_empty(node->join_prune_paramids))
+ mergestate->ms_valid_subplans = validsubplans;
}
mergeplanstates = (PlanState **) palloc(nplans * sizeof(PlanState *));
@@ -218,7 +226,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, &node->ps);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index d6ceafd51c..9bdc88a9db 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -173,6 +173,10 @@ static void get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
static bool has_indexed_join_quals(NestPath *path);
static double approx_tuple_count(PlannerInfo *root, JoinPath *path,
List *quals);
+static double get_joinrel_matching_outer_size(PlannerInfo *root,
+ RelOptInfo *outer_rel,
+ Relids inner_relids,
+ List *restrictlist);
static double calc_joinrel_size_estimate(PlannerInfo *root,
RelOptInfo *joinrel,
RelOptInfo *outer_rel,
@@ -5380,6 +5384,61 @@ get_parameterized_joinrel_size(PlannerInfo *root, RelOptInfo *rel,
return nrows;
}
+/*
+ * get_joinrel_matching_outer_size
+ * Make a size estimate for the outer side that matches the inner side.
+ */
+static double
+get_joinrel_matching_outer_size(PlannerInfo *root,
+ RelOptInfo *outer_rel,
+ Relids inner_relids,
+ List *restrictlist)
+{
+ double nrows;
+ Selectivity fkselec;
+ Selectivity jselec;
+ SpecialJoinInfo *sjinfo;
+ SpecialJoinInfo sjinfo_data;
+
+ sjinfo = &sjinfo_data;
+ sjinfo->type = T_SpecialJoinInfo;
+ sjinfo->min_lefthand = outer_rel->relids;
+ sjinfo->min_righthand = inner_relids;
+ sjinfo->syn_lefthand = outer_rel->relids;
+ sjinfo->syn_righthand = inner_relids;
+ sjinfo->jointype = JOIN_SEMI;
+ sjinfo->ojrelid = 0;
+ sjinfo->commute_above_l = NULL;
+ sjinfo->commute_above_r = NULL;
+ sjinfo->commute_below_l = NULL;
+ sjinfo->commute_below_r = NULL;
+ /* we don't bother trying to make the remaining fields valid */
+ sjinfo->lhs_strict = false;
+ sjinfo->semi_can_btree = false;
+ sjinfo->semi_can_hash = false;
+ sjinfo->semi_operators = NIL;
+ sjinfo->semi_rhs_exprs = NIL;
+
+ fkselec = get_foreign_key_join_selectivity(root,
+ outer_rel->relids,
+ inner_relids,
+ sjinfo,
+ &restrictlist);
+ jselec = clauselist_selectivity(root,
+ restrictlist,
+ 0,
+ sjinfo->jointype,
+ sjinfo);
+
+ nrows = outer_rel->rows * fkselec * jselec;
+ nrows = clamp_row_est(nrows);
+
+ /* For safety, make sure result is not more than the base estimate */
+ if (nrows > outer_rel->rows)
+ nrows = outer_rel->rows;
+ return nrows;
+}
+
/*
* calc_joinrel_size_estimate
* Workhorse for set_joinrel_size_estimates and
@@ -6495,3 +6554,50 @@ compute_bitmap_pages(PlannerInfo *root, RelOptInfo *baserel, Path *bitmapqual,
return pages_fetched;
}
+
+/*
+ * compute_partprune_cost
+ * Compute the overhead of join partition pruning.
+ */
+double
+compute_partprune_cost(PlannerInfo *root, RelOptInfo *appendrel,
+ Cost append_total_cost, int append_nplans,
+ Relids inner_relids, double inner_rows,
+ List *prunequal)
+{
+ Cost prune_cost;
+ Cost saved_cost;
+ double matching_outer_rows;
+ double unmatched_nplans;
+
+ switch (appendrel->part_scheme->strategy)
+ {
+
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ prune_cost = cpu_operator_cost * LOG2(append_nplans) * inner_rows;
+ break;
+ case PARTITION_STRATEGY_HASH:
+ prune_cost = cpu_operator_cost * append_nplans * inner_rows;
+ break;
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ (int) appendrel->part_scheme->strategy);
+ break;
+ }
+
+ matching_outer_rows =
+ get_joinrel_matching_outer_size(root,
+ appendrel,
+ inner_relids,
+ prunequal);
+
+ /*
+ * We assume that each outer joined row occupies one new partition. This
+ * is really the worst case.
+ */
+ unmatched_nplans = append_nplans - Min(matching_outer_rows, append_nplans);
+ saved_cost = (unmatched_nplans / append_nplans) * append_total_cost;
+
+ return prune_cost - saved_cost;
+}
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 34ca6d4ac2..308ff452d3 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -242,7 +242,8 @@ static Hash *make_hash(Plan *lefttree,
List *hashkeys,
Oid skewTable,
AttrNumber skewColumn,
- bool skewInherit);
+ bool skewInherit,
+ List *joinpartprune_info_list);
static MergeJoin *make_mergejoin(List *tlist,
List *joinclauses, List *otherclauses,
List *mergeclauses,
@@ -342,6 +343,7 @@ create_plan(PlannerInfo *root, Path *best_path)
/* Initialize this module's workspace in PlannerInfo */
root->curOuterRels = NULL;
root->curOuterParams = NIL;
+ root->join_partition_prune_candidates = NIL;
/* Recursively process the path tree, demanding the correct tlist result */
plan = create_plan_recurse(root, best_path, CP_EXACT_TLIST);
@@ -369,6 +371,8 @@ create_plan(PlannerInfo *root, Path *best_path)
if (root->curOuterParams != NIL)
elog(ERROR, "failed to assign all NestLoopParams to plan nodes");
+ Assert(root->join_partition_prune_candidates == NIL);
+
/*
* Reset plan_params to ensure param IDs used for nestloop params are not
* re-used later
@@ -1223,6 +1227,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
int nasyncplans = 0;
RelOptInfo *rel = best_path->path.parent;
PartitionPruneInfo *partpruneinfo = NULL;
+ Bitmapset *join_prune_paramids = NULL;
int nodenumsortkeys = 0;
AttrNumber *nodeSortColIdx = NULL;
Oid *nodeSortOperators = NULL;
@@ -1377,6 +1382,8 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
* do partition pruning.
+ *
+ * Also gather information needed by the executor to do join pruning.
*/
if (enable_partition_pruning)
{
@@ -1399,13 +1406,20 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
partpruneinfo =
make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ NULL);
+
+ join_prune_paramids =
+ make_join_partition_pruneinfos(root, rel,
+ (Path *) best_path,
+ best_path->subpaths);
}
plan->appendplans = subplans;
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
plan->part_prune_info = partpruneinfo;
+ plan->join_prune_paramids = join_prune_paramids;
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -1445,6 +1459,7 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
PartitionPruneInfo *partpruneinfo = NULL;
+ Bitmapset *join_prune_paramids = NULL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1541,6 +1556,8 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
* do partition pruning.
+ *
+ * Also gather information needed by the executor to do join pruning.
*/
if (enable_partition_pruning)
{
@@ -1554,11 +1571,18 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
if (prunequal != NIL)
partpruneinfo = make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ NULL);
+
+ join_prune_paramids =
+ make_join_partition_pruneinfos(root, rel,
+ (Path *) best_path,
+ best_path->subpaths);
}
node->mergeplans = subplans;
node->part_prune_info = partpruneinfo;
+ node->join_prune_paramids = join_prune_paramids;
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
@@ -4734,6 +4758,13 @@ create_hashjoin_plan(PlannerInfo *root,
AttrNumber skewColumn = InvalidAttrNumber;
bool skewInherit = false;
ListCell *lc;
+ List *joinpartprune_info_list;
+
+ /*
+ * Collect information required to build JoinPartitionPruneInfos at this
+ * join.
+ */
+ prepare_join_partition_prune_candidate(root, &best_path->jpath);
/*
* HashJoin can project, so we don't have to demand exact tlists from the
@@ -4745,6 +4776,11 @@ create_hashjoin_plan(PlannerInfo *root,
outer_plan = create_plan_recurse(root, best_path->jpath.outerjoinpath,
(best_path->num_batches > 1) ? CP_SMALL_TLIST : 0);
+ /*
+ * Retrieve all the JoinPartitionPruneInfos for this join.
+ */
+ joinpartprune_info_list = get_join_partition_prune_candidate(root);
+
inner_plan = create_plan_recurse(root, best_path->jpath.innerjoinpath,
CP_SMALL_TLIST);
@@ -4850,7 +4886,8 @@ create_hashjoin_plan(PlannerInfo *root,
inner_hashkeys,
skewTable,
skewColumn,
- skewInherit);
+ skewInherit,
+ joinpartprune_info_list);
/*
* Set Hash node's startup & total costs equal to total cost of input
@@ -5977,7 +6014,8 @@ make_hash(Plan *lefttree,
List *hashkeys,
Oid skewTable,
AttrNumber skewColumn,
- bool skewInherit)
+ bool skewInherit,
+ List *joinpartprune_info_list)
{
Hash *node = makeNode(Hash);
Plan *plan = &node->plan;
@@ -5991,6 +6029,7 @@ make_hash(Plan *lefttree,
node->skewTable = skewTable;
node->skewColumn = skewColumn;
node->skewInherit = skewInherit;
+ node->joinpartprune_info_list = joinpartprune_info_list;
return node;
}
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index fc3709510d..c416e7ccda 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -156,6 +156,11 @@ static Plan *set_mergeappend_references(PlannerInfo *root,
MergeAppend *mplan,
int rtoffset);
static void set_hash_references(PlannerInfo *root, Plan *plan, int rtoffset);
+static void set_joinpartitionprune_references(PlannerInfo *root,
+ List *joinpartprune_info_list,
+ indexed_tlist *outer_itlist,
+ int rtoffset,
+ double num_exec);
static Relids offset_relid_set(Relids relids, int rtoffset);
static Node *fix_scan_expr(PlannerInfo *root, Node *node,
int rtoffset, double num_exec);
@@ -1897,6 +1902,62 @@ set_hash_references(PlannerInfo *root, Plan *plan, int rtoffset)
/* Hash nodes don't have their own quals */
Assert(plan->qual == NIL);
+
+ set_joinpartitionprune_references(root,
+ hplan->joinpartprune_info_list,
+ outer_itlist,
+ rtoffset,
+ NUM_EXEC_TLIST(plan));
+}
+
+/*
+ * set_joinpartitionprune_references
+ * Do set_plan_references processing on JoinPartitionPruneInfos
+ */
+static void
+set_joinpartitionprune_references(PlannerInfo *root,
+ List *joinpartprune_info_list,
+ indexed_tlist *outer_itlist,
+ int rtoffset,
+ double num_exec)
+{
+ ListCell *l;
+
+ foreach(l, joinpartprune_info_list)
+ {
+ JoinPartitionPruneInfo *jpinfo = (JoinPartitionPruneInfo *) lfirst(l);
+ ListCell *l1;
+
+ foreach(l1, jpinfo->part_prune_info->prune_infos)
+ {
+ List *prune_infos = lfirst(l1);
+ ListCell *l2;
+
+ foreach(l2, prune_infos)
+ {
+ PartitionedRelPruneInfo *pinfo = lfirst(l2);
+
+ pinfo->rtindex += rtoffset;
+
+ pinfo->initial_pruning_steps = (List *)
+ fix_upper_expr(root,
+ (Node *) pinfo->initial_pruning_steps,
+ outer_itlist,
+ OUTER_VAR,
+ rtoffset,
+ NRM_EQUAL,
+ num_exec);
+ pinfo->exec_pruning_steps = (List *)
+ fix_upper_expr(root,
+ (Node *) pinfo->exec_pruning_steps,
+ outer_itlist,
+ OUTER_VAR,
+ rtoffset,
+ NRM_EQUAL,
+ num_exec);
+ }
+ }
+ }
}
/*
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 3f31ecc956..d978093831 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -48,7 +48,9 @@
#include "optimizer/appendinfo.h"
#include "optimizer/cost.h"
#include "optimizer/optimizer.h"
+#include "optimizer/paramassign.h"
#include "optimizer/pathnode.h"
+#include "optimizer/restrictinfo.h"
#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partprune.h"
@@ -103,15 +105,16 @@ typedef enum PartClauseTarget
*
* gen_partprune_steps() initializes and returns an instance of this struct.
*
- * Note that has_mutable_op, has_mutable_arg, and has_exec_param are set if
- * we found any potentially-useful-for-pruning clause having those properties,
- * whether or not we actually used the clause in the steps list. This
- * definition allows us to skip the PARTTARGET_EXEC pass in some cases.
+ * Note that has_mutable_op, has_mutable_arg, has_exec_param and has_vars are
+ * set if we found any potentially-useful-for-pruning clause having those
+ * properties, whether or not we actually used the clause in the steps list.
+ * This definition allows us to skip the PARTTARGET_EXEC pass in some cases.
*/
typedef struct GeneratePruningStepsContext
{
/* Copies of input arguments for gen_partprune_steps: */
RelOptInfo *rel; /* the partitioned relation */
+ Bitmapset *available_rels; /* rels whose Vars may be used for pruning */
PartClauseTarget target; /* use-case we're generating steps for */
/* Result data: */
List *steps; /* list of PartitionPruneSteps */
@@ -119,6 +122,7 @@ typedef struct GeneratePruningStepsContext
bool has_mutable_arg; /* clauses include any mutable comparison
* values, *other than* exec params */
bool has_exec_param; /* clauses include any PARAM_EXEC params */
+ bool has_vars; /* clauses include any Vars from 'available_rels' */
bool contradictory; /* clauses were proven self-contradictory */
/* Working state: */
int next_step_id;
@@ -144,8 +148,10 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
+ Bitmapset *available_rels,
Bitmapset **matchedsubplans);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
+ Bitmapset *available_rels,
PartClauseTarget target,
GeneratePruningStepsContext *context);
static List *gen_partprune_steps_internal(GeneratePruningStepsContext *context,
@@ -204,6 +210,10 @@ static PartClauseMatchStatus match_boolean_partition_clause(Oid partopfamily,
static void partkey_datum_from_expr(PartitionPruneContext *context,
Expr *expr, int stateidx,
Datum *value, bool *isnull);
+static bool contain_forbidden_var_clause(Node *node,
+ GeneratePruningStepsContext *context);
+static bool contain_forbidden_var_clause_walker(Node *node,
+ GeneratePruningStepsContext *context);
/*
@@ -216,11 +226,14 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
+ * 'available_rels' is the relid set of rels whose Vars may be used for
+ * pruning.
*/
PartitionPruneInfo *
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
- List *prunequal)
+ List *prunequal,
+ Bitmapset *available_rels)
{
PartitionPruneInfo *pruneinfo;
Bitmapset *allmatchedsubplans = NULL;
@@ -313,6 +326,7 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
prunequal,
partrelids,
relid_subplan_map,
+ available_rels,
&matchedsubplans);
/* When pruning is possible, record the matched subplans */
@@ -360,6 +374,184 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
return pruneinfo;
}
+/*
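+ * make_join_partition_pruneinfos
+ * Builds one JoinPartitionPruneInfo for each join at which join partition
+ * pruning is possible for this appendrel.
+ *
+ * 'parentrel' is the RelOptInfo for an appendrel, 'best_path' is the path
+ * chosen for it (its total cost feeds the pruning cost check), and
+ * 'subpaths' is the list of scan paths for its child rels.
+ *
+ * Returns the set of exec param IDs assigned to the JoinPartitionPruneInfos
+ * built here, or NULL if none were built.
+ */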
+Bitmapset *
+make_join_partition_pruneinfos(PlannerInfo *root, RelOptInfo *parentrel,
+ Path *best_path, List *subpaths)
+{
+ Bitmapset *result = NULL;
+ ListCell *lc;
+
+ if (!IS_PARTITIONED_REL(parentrel))
+ return NULL;
+
+ foreach(lc, root->join_partition_prune_candidates)
+ {
+ JoinPartitionPruneCandidateInfo *candidate =
+ (JoinPartitionPruneCandidateInfo *) lfirst(lc);
+ PartitionPruneInfo *part_prune_info;
+ List *prunequal;
+ Relids joinrelids;
+ ListCell *l;
+ double prune_cost;
+
+ if (candidate == NULL)
+ continue;
+
+ /*
+ * Identify all joinclauses that are movable to this appendrel given
+ * this inner side relids. Only those clauses can be used for join
+ * partition pruning.
+ */
+ joinrelids = bms_union(parentrel->relids, candidate->inner_relids);
+ prunequal = NIL;
+ foreach(l, candidate->joinrestrictinfo)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
+
+ if (join_clause_is_movable_into(rinfo,
+ parentrel->relids,
+ joinrelids))
+ prunequal = lappend(prunequal, rinfo);
+ }
+
+ if (prunequal == NIL)
+ continue;
+
+ /*
+ * Check the overhead of this pruning
+ */
+ prune_cost = compute_partprune_cost(root,
+ parentrel,
+ best_path->total_cost,
+ list_length(subpaths),
+ candidate->inner_relids,
+ candidate->inner_rows,
+ prunequal);
+ if (prune_cost > 0)
+ continue;
+
+ part_prune_info = make_partition_pruneinfo(root, parentrel,
+ subpaths,
+ prunequal,
+ candidate->inner_relids);
+
+ if (part_prune_info)
+ {
+ JoinPartitionPruneInfo *jpinfo;
+
+ jpinfo = makeNode(JoinPartitionPruneInfo);
+
+ jpinfo->part_prune_info = part_prune_info;
+ jpinfo->paramid = assign_special_exec_param(root);
+ jpinfo->nplans = list_length(subpaths);
+
+ candidate->joinpartprune_info_list =
+ lappend(candidate->joinpartprune_info_list, jpinfo);
+
+ result = bms_add_member(result, jpinfo->paramid);
+ }
+ }
+
+ return result;
+}
+
+/*
+ * prepare_join_partition_prune_candidate
+ * Check if join partition pruning is possible at this join and if so
+ * collect information required to build JoinPartitionPruneInfos.
+ *
+ * Note that we may build more than one JoinPartitionPruneInfo at one join, for
+ * different Append/MergeAppend paths.
+ */
+void
+prepare_join_partition_prune_candidate(PlannerInfo *root, JoinPath *jpath)
+{
+ JoinPartitionPruneCandidateInfo *candidate;
+
+ if (!enable_partition_pruning)
+ {
+ root->join_partition_prune_candidates =
+ lappend(root->join_partition_prune_candidates, NULL);
+ return;
+ }
+
+ /*
+ * For now do not perform join partition pruning for parallel hashjoin.
+ */
+ if (jpath->path.parallel_workers > 0)
+ {
+ root->join_partition_prune_candidates =
+ lappend(root->join_partition_prune_candidates, NULL);
+ return;
+ }
+
+ /*
+ * We cannot perform join partition pruning if the outer is the
+ * non-nullable side.
+ */
+ if (!(jpath->jointype == JOIN_INNER ||
+ jpath->jointype == JOIN_SEMI ||
+ jpath->jointype == JOIN_RIGHT ||
+ jpath->jointype == JOIN_RIGHT_ANTI))
+ {
+ root->join_partition_prune_candidates =
+ lappend(root->join_partition_prune_candidates, NULL);
+ return;
+ }
+
+ /*
+ * For now we only support HashJoin.
+ */
+ if (jpath->path.pathtype != T_HashJoin)
+ {
+ root->join_partition_prune_candidates =
+ lappend(root->join_partition_prune_candidates, NULL);
+ return;
+ }
+
+ candidate = makeNode(JoinPartitionPruneCandidateInfo);
+ candidate->joinrestrictinfo = jpath->joinrestrictinfo;
+ candidate->inner_relids = jpath->innerjoinpath->parent->relids;
+ candidate->inner_rows = jpath->innerjoinpath->parent->rows;
+ candidate->joinpartprune_info_list = NIL;
+
+ root->join_partition_prune_candidates =
+ lappend(root->join_partition_prune_candidates, candidate);
+}
+
+/*
+ * get_join_partition_prune_candidate
+ * Pop out the JoinPartitionPruneCandidateInfo for this join and retrieve
+ * the JoinPartitionPruneInfos.
+ */
+List *
+get_join_partition_prune_candidate(PlannerInfo *root)
+{
+ JoinPartitionPruneCandidateInfo *candidate;
+ List *result;
+
+ candidate = llast(root->join_partition_prune_candidates);
+ root->join_partition_prune_candidates =
+ list_delete_last(root->join_partition_prune_candidates);
+
+ if (candidate == NULL)
+ return NIL;
+
+ result = candidate->joinpartprune_info_list;
+
+ pfree(candidate);
+
+ return result;
+}
+
/*
* add_part_relids
* Add new info to a list of Bitmapsets of partitioned relids.
@@ -428,6 +620,8 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* partrelids: Set of RT indexes identifying relevant partitioned tables
* within a single partitioning hierarchy
* relid_subplan_map[]: maps child relation relids to subplan indexes
+ * available_rels: the relid set of rels whose Vars may be used for
+ * pruning.
* matchedsubplans: on success, receives the set of subplan indexes which
* were matched to this partition hierarchy
*
@@ -440,6 +634,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
+ Bitmapset *available_rels,
Bitmapset **matchedsubplans)
{
RelOptInfo *targetpart = NULL;
@@ -539,8 +734,8 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
*/
- gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
- &context);
+ gen_partprune_steps(subpart, partprunequal, available_rels,
+ PARTTARGET_INITIAL, &context);
if (context.contradictory)
{
@@ -567,14 +762,15 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
initial_pruning_steps = NIL;
/*
- * If no exec Params appear in potentially-usable pruning clauses,
- * then there's no point in even thinking about per-scan pruning.
+ * If no exec Params or available Vars appear in potentially-usable
+ * pruning clauses, then there's no point in even thinking about
+ * per-scan pruning.
*/
- if (context.has_exec_param)
+ if (context.has_exec_param || context.has_vars)
{
/* ... OK, we'd better think about it */
- gen_partprune_steps(subpart, partprunequal, PARTTARGET_EXEC,
- &context);
+ gen_partprune_steps(subpart, partprunequal, available_rels,
+ PARTTARGET_EXEC, &context);
if (context.contradictory)
{
@@ -587,11 +783,14 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
/*
* Detect which exec Params actually got used; the fact that some
* were in available clauses doesn't mean we actually used them.
- * Skip per-scan pruning if there are none.
*/
execparamids = get_partkey_exec_paramids(exec_pruning_steps);
- if (bms_is_empty(execparamids))
+ /*
+ * Skip per-scan pruning if no exec Params were actually used and
+ * there are no available Vars.
+ */
+ if (bms_is_empty(execparamids) && !context.has_vars)
exec_pruning_steps = NIL;
}
else
@@ -703,6 +902,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* Process 'clauses' (typically a rel's baserestrictinfo list of clauses)
* and create a list of "partition pruning steps".
*
+ * 'available_rels' is the relid set of rels whose Vars may be used for
+ * pruning.
+ *
* 'target' tells whether to generate pruning steps for planning (use
* immutable clauses only), or for executor startup (use any allowable
* clause except ones containing PARAM_EXEC Params), or for executor
@@ -712,12 +914,13 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* some subsidiary flags; see the GeneratePruningStepsContext typedef.
*/
static void
-gen_partprune_steps(RelOptInfo *rel, List *clauses, PartClauseTarget target,
- GeneratePruningStepsContext *context)
+gen_partprune_steps(RelOptInfo *rel, List *clauses, Bitmapset *available_rels,
+ PartClauseTarget target, GeneratePruningStepsContext *context)
{
/* Initialize all output values to zero/false/NULL */
memset(context, 0, sizeof(GeneratePruningStepsContext));
context->rel = rel;
+ context->available_rels = available_rels;
context->target = target;
/*
@@ -773,7 +976,7 @@ prune_append_rel_partitions(RelOptInfo *rel)
* If the clauses are found to be contradictory, we can return the empty
* set.
*/
- gen_partprune_steps(rel, clauses, PARTTARGET_PLANNER,
+ gen_partprune_steps(rel, clauses, NULL, PARTTARGET_PLANNER,
&gcontext);
if (gcontext.contradictory)
return NULL;
@@ -1957,9 +2160,10 @@ match_clause_to_partition_key(GeneratePruningStepsContext *context,
return PARTCLAUSE_UNSUPPORTED;
/*
- * We can never prune using an expression that contains Vars.
+ * We can never prune using an expression that contains Vars except
+ * for Vars belonging to context->available_rels.
*/
- if (contain_var_clause((Node *) expr))
+ if (contain_forbidden_var_clause((Node *) expr, context))
return PARTCLAUSE_UNSUPPORTED;
/*
@@ -2155,9 +2359,10 @@ match_clause_to_partition_key(GeneratePruningStepsContext *context,
return PARTCLAUSE_UNSUPPORTED;
/*
- * We can never prune using an expression that contains Vars.
+ * We can never prune using an expression that contains Vars except
+ * for Vars belonging to context->available_rels.
*/
- if (contain_var_clause((Node *) rightop))
+ if (contain_forbidden_var_clause((Node *) rightop, context))
return PARTCLAUSE_UNSUPPORTED;
/*
@@ -3727,3 +3932,54 @@ partkey_datum_from_expr(PartitionPruneContext *context,
*value = ExecEvalExprSwitchContext(exprstate, ectx, isnull);
}
}
+
+/*
+ * contain_forbidden_var_clause
+ * Recursively scan a clause to discover whether it contains any Var nodes
+ * (of the current query level) that do not belong to relations in
+ * context->available_rels.
+ *
+ * Returns true if any such varnode found.
+ *
+ * Does not examine subqueries, therefore must only be used after reduction
+ * of sublinks to subplans!
+ */
+static bool
+contain_forbidden_var_clause(Node *node, GeneratePruningStepsContext *context)
+{
+ return contain_forbidden_var_clause_walker(node, context);
+}
+
+static bool
+contain_forbidden_var_clause_walker(Node *node, GeneratePruningStepsContext *context)
+{
+ if (node == NULL)
+ return false;
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ if (var->varlevelsup != 0)
+ return false;
+
+ if (!bms_is_member(var->varno, context->available_rels))
+ return true; /* abort the tree traversal and return true */
+
+ context->has_vars = true;
+
+ if (context->target != PARTTARGET_EXEC)
+ return true; /* abort the tree traversal and return true */
+
+ return false;
+ }
+ if (IsA(node, CurrentOfExpr))
+ return true;
+ if (IsA(node, PlaceHolderVar))
+ {
+ if (((PlaceHolderVar *) node)->phlevelsup == 0)
+ return true; /* abort the tree traversal and return true */
+ /* else fall through to check the contained expr */
+ }
+ return expression_tree_walker(node, contain_forbidden_var_clause_walker,
+ (void *) context);
+}
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 15ec869ac8..720bcc1149 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -121,11 +121,26 @@ typedef struct PartitionPruneState
PartitionPruningData *partprunedata[FLEXIBLE_ARRAY_MEMBER];
} PartitionPruneState;
+/*
+ * JoinPartitionPruneState - State object required for plan nodes to perform
+ * join partition pruning.
+ */
+typedef struct JoinPartitionPruneState
+{
+ PartitionPruneState *part_prune_state;
+ int paramid;
+ int nplans;
+ bool finished;
+ Bitmapset *part_prune_result;
+} JoinPartitionPruneState;
+
extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
PartitionPruneInfo *pruneinfo,
Bitmapset **initially_valid_subplans);
+extern List *ExecInitJoinpartpruneList(PlanState *planstate, List *joinpartprune_info_list);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ PlanState *planstate);
#endif /* EXECPARTITION_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 5d7f17dee0..0aeafcabff 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -2678,6 +2678,9 @@ typedef struct HashState
/* Parallel hash state. */
struct ParallelHashJoinState *parallel_state;
+
+ /* Infos for join partition pruning. */
+ List *joinpartprune_state_list;
} HashState;
/* ----------------
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index ed85dc7414..d066b6105c 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -530,6 +530,9 @@ struct PlannerInfo
/* not-yet-assigned NestLoopParams */
List *curOuterParams;
+ /* a stack of JoinPartitionPruneCandidateInfos */
+ List *join_partition_prune_candidates;
+
/*
* These fields are workspace for setrefs.c. Each is an array
* corresponding to glob->subplans. (We could probably teach
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 24d46c76dc..00058a735e 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -276,6 +276,9 @@ typedef struct Append
/* Info for run-time subplan pruning; NULL if we're not doing that */
struct PartitionPruneInfo *part_prune_info;
+
+ /* Info for join partition pruning; NULL if we're not doing that */
+ Bitmapset *join_prune_paramids;
} Append;
/* ----------------
@@ -311,6 +314,9 @@ typedef struct MergeAppend
/* Info for run-time subplan pruning; NULL if we're not doing that */
struct PartitionPruneInfo *part_prune_info;
+
+ /* Info for join partition pruning; NULL if we're not doing that */
+ Bitmapset *join_prune_paramids;
} MergeAppend;
/* ----------------
@@ -1207,6 +1213,7 @@ typedef struct Hash
bool skewInherit; /* is outer join rel an inheritance tree? */
/* all other info is in the parent HashJoin node */
Cardinality rows_total; /* estimate total rows if parallel_aware */
+ List *joinpartprune_info_list; /* infos for join partition pruning */
} Hash;
/* ----------------
@@ -1553,6 +1560,35 @@ typedef struct PartitionPruneStepCombine
List *source_stepids;
} PartitionPruneStepCombine;
+/*
+ * JoinPartitionPruneCandidateInfo - Information required to build
+ * JoinPartitionPruneInfos.
+ */
+typedef struct JoinPartitionPruneCandidateInfo
+{
+ pg_node_attr(no_equal, no_query_jumble)
+
+ NodeTag type;
+ List *joinrestrictinfo;
+ Bitmapset *inner_relids;
+ double inner_rows;
+ List *joinpartprune_info_list;
+} JoinPartitionPruneCandidateInfo;
+
+/*
+ * JoinPartitionPruneInfo - Details required to allow the executor to prune
+ * partitions during join.
+ */
+typedef struct JoinPartitionPruneInfo
+{
+ pg_node_attr(no_equal, no_query_jumble)
+
+ NodeTag type;
+ PartitionPruneInfo *part_prune_info;
+ int paramid;
+ int nplans;
+} JoinPartitionPruneInfo;
+
/*
* Plan invalidation info
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 6d50afbf74..52de844f6d 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -211,5 +211,9 @@ extern void set_foreign_size_estimates(PlannerInfo *root, RelOptInfo *rel);
extern PathTarget *set_pathtarget_cost_width(PlannerInfo *root, PathTarget *target);
extern double compute_bitmap_pages(PlannerInfo *root, RelOptInfo *baserel,
Path *bitmapqual, int loop_count, Cost *cost, double *tuple);
+extern double compute_partprune_cost(PlannerInfo *root, RelOptInfo *appendrel,
+ Cost append_total_cost, int append_nplans,
+ Relids inner_relids, double inner_rows,
+ List *prunequal);
#endif /* COST_H */
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 8636e04e37..899aa61b34 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -19,6 +19,8 @@
struct PlannerInfo; /* avoid including pathnodes.h here */
struct RelOptInfo;
+struct Path;
+struct JoinPath;
/*
@@ -73,7 +75,15 @@ typedef struct PartitionPruneContext
extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
struct RelOptInfo *parentrel,
List *subpaths,
- List *prunequal);
+ List *prunequal,
+ Bitmapset *available_rels);
+extern Bitmapset *make_join_partition_pruneinfos(struct PlannerInfo *root,
+ struct RelOptInfo *parentrel,
+ struct Path *best_path,
+ List *subpaths);
+extern void prepare_join_partition_prune_candidate(struct PlannerInfo *root,
+ struct JoinPath *jpath);
+extern List *get_join_partition_prune_candidate(struct PlannerInfo *root);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 9a4c48c055..a08e7a1f0a 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -3003,6 +3003,92 @@ order by tbl1.col1, tprt.col1;
------+------
(0 rows)
+-- join partition pruning
+-- The 'Memory Usage' from the Hash node can vary between machines. Let's just
+-- replace the number with an 'N'.
+-- We need to run EXPLAIN ANALYZE because we need to see '(never executed)'
+-- notations because that's the only way to verify runtime pruning.
+create function explain_join_partition_pruning(text) returns setof text
+language plpgsql as
+$$
+declare
+ ln text;
+begin
+ for ln in
+ execute format('explain (analyze, verbose, costs off, summary off, timing off) %s',
+ $1)
+ loop
+ ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
+ return next ln;
+ end loop;
+end;
+$$;
+delete from tbl1;
+insert into tbl1 values (501), (505);
+analyze tbl1, tprt;
+set enable_nestloop = off;
+set enable_mergejoin = off;
+set enable_hashjoin = on;
+select explain_join_partition_pruning('
+select * from tprt p1
+ inner join tprt p2 on p1.col1 = p2.col1
+ right join tbl1 t on p1.col1 = t.col1 and p2.col1 = t.col1;');
+ explain_join_partition_pruning
+--------------------------------------------------------------------------------
+ Hash Right Join (actual rows=2 loops=1)
+ Output: p1.col1, p2.col1, t.col1
+ Hash Cond: ((p1.col1 = t.col1) AND (p2.col1 = t.col1))
+ -> Hash Join (actual rows=3 loops=1)
+ Output: p1.col1, p2.col1
+ Hash Cond: (p1.col1 = p2.col1)
+ -> Append (actual rows=3 loops=1)
+ Join Partition Pruning: $0
+ -> Seq Scan on public.tprt_1 p1_1 (never executed)
+ Output: p1_1.col1
+ -> Seq Scan on public.tprt_2 p1_2 (actual rows=3 loops=1)
+ Output: p1_2.col1
+ -> Seq Scan on public.tprt_3 p1_3 (never executed)
+ Output: p1_3.col1
+ -> Seq Scan on public.tprt_4 p1_4 (never executed)
+ Output: p1_4.col1
+ -> Seq Scan on public.tprt_5 p1_5 (never executed)
+ Output: p1_5.col1
+ -> Seq Scan on public.tprt_6 p1_6 (never executed)
+ Output: p1_6.col1
+ -> Hash (actual rows=3 loops=1)
+ Output: p2.col1
+ Buckets: 1024 Batches: 1 Memory Usage: NkB
+ -> Append (actual rows=3 loops=1)
+ Join Partition Pruning: $1
+ -> Seq Scan on public.tprt_1 p2_1 (never executed)
+ Output: p2_1.col1
+ -> Seq Scan on public.tprt_2 p2_2 (actual rows=3 loops=1)
+ Output: p2_2.col1
+ -> Seq Scan on public.tprt_3 p2_3 (never executed)
+ Output: p2_3.col1
+ -> Seq Scan on public.tprt_4 p2_4 (never executed)
+ Output: p2_4.col1
+ -> Seq Scan on public.tprt_5 p2_5 (never executed)
+ Output: p2_5.col1
+ -> Seq Scan on public.tprt_6 p2_6 (never executed)
+ Output: p2_6.col1
+ -> Hash (actual rows=2 loops=1)
+ Output: t.col1
+ Buckets: 1024 Batches: 1 Memory Usage: NkB
+ Partition Prune: $0, $1
+ -> Seq Scan on public.tbl1 t (actual rows=2 loops=1)
+ Output: t.col1
+(43 rows)
+
+select * from tprt p1
+ inner join tprt p2 on p1.col1 = p2.col1
+ right join tbl1 t on p1.col1 = t.col1 and p2.col1 = t.col1;
+ col1 | col1 | col1
+------+------+------
+ 501 | 501 | 501
+ 505 | 505 | 505
+(2 rows)
+
drop table tbl1, tprt;
-- Test with columns defined in varying orders between each level
create table part_abc (a int not null, b int not null, c int not null) partition by list (a);
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 7bf3920827..fc5982edcf 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -727,6 +727,45 @@ select tbl1.col1, tprt.col1 from tbl1
inner join tprt on tbl1.col1 = tprt.col1
order by tbl1.col1, tprt.col1;
+-- join partition pruning
+
+-- The 'Memory Usage' from the Hash node can vary between machines. Let's just
+-- replace the number with an 'N'.
+-- We need to run EXPLAIN ANALYZE because we need to see '(never executed)'
+-- notations because that's the only way to verify runtime pruning.
+create function explain_join_partition_pruning(text) returns setof text
+language plpgsql as
+$$
+declare
+ ln text;
+begin
+ for ln in
+ execute format('explain (analyze, verbose, costs off, summary off, timing off) %s',
+ $1)
+ loop
+ ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
+ return next ln;
+ end loop;
+end;
+$$;
+
+delete from tbl1;
+insert into tbl1 values (501), (505);
+analyze tbl1, tprt;
+
+set enable_nestloop = off;
+set enable_mergejoin = off;
+set enable_hashjoin = on;
+
+select explain_join_partition_pruning('
+select * from tprt p1
+ inner join tprt p2 on p1.col1 = p2.col1
+ right join tbl1 t on p1.col1 = t.col1 and p2.col1 = t.col1;');
+
+select * from tprt p1
+ inner join tprt p2 on p1.col1 = p2.col1
+ right join tbl1 t on p1.col1 = t.col1 and p2.col1 = t.col1;
+
drop table tbl1, tprt;
-- Test with columns defined in varying orders between each level
--
2.31.0
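For readers following the planner side of the v5 patch above, here is a minimal sketch of the candidate stack protocol, assuming the call order described in the cover letter (push at the hash join, visit Appends below it, then pop). The names Candidate, CandidateStack, push_candidate, pop_candidate and attach_pruneinfos are hypothetical stand-ins for JoinPartitionPruneCandidateInfo, root->join_partition_prune_candidates, prepare_join_partition_prune_candidate(), get_join_partition_prune_candidate() and make_join_partition_pruneinfos(); a fixed array replaces the List. It is only meant to show why a NULL placeholder is pushed for joins that cannot prune: every push is matched by exactly one pop.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 16

/* Hypothetical stand-in for JoinPartitionPruneCandidateInfo. */
typedef struct Candidate
{
    int inner_relid;    /* illustrative: identifies the join's inner side */
    int npruneinfos;    /* how many prune infos were attached to this join */
} Candidate;

/* Hypothetical stand-in for root->join_partition_prune_candidates. */
typedef struct CandidateStack
{
    Candidate *items[MAX_DEPTH];
    int        depth;
} CandidateStack;

/* Push a candidate, or NULL when pruning is not possible at this join. */
static void
push_candidate(CandidateStack *s, Candidate *c)
{
    assert(s->depth < MAX_DEPTH);
    s->items[s->depth++] = c;
}

/* Pop the entry belonging to the join whose plan is being finished. */
static Candidate *
pop_candidate(CandidateStack *s)
{
    assert(s->depth > 0);
    return s->items[--s->depth];
}

/*
 * Called when an Append is reached: every enclosing join that is still on
 * the stack and is a real candidate gets one more prune info.  NULL
 * placeholders are skipped.  Returns the number of joins that will prune
 * this Append.
 */
static int
attach_pruneinfos(CandidateStack *s)
{
    int n = 0;

    for (int i = 0; i < s->depth; i++)
    {
        if (s->items[i] == NULL)
            continue;
        s->items[i]->npruneinfos++;
        n++;
    }
    return n;
}
```

With three nested joins where the middle one is not eligible, an Append at the bottom would be attached to two joins, and the three pops return the entries in reverse push order, including the NULL.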
On Tue, 7 Nov 2023 at 13:25, Richard Guo <guofenglinux@gmail.com> wrote:
On Mon, Nov 6, 2023 at 11:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
Please look at a warning and an assertion failure triggered by the
following script:
set parallel_setup_cost = 0;
set parallel_tuple_cost = 0;
set min_parallel_table_scan_size = '1kB';
create table t1 (i int) partition by range (i);
create table t1_1 partition of t1 for values from (1) to (2);
create table t1_2 partition of t1 for values from (2) to (3);
insert into t1 values (1), (2);
create table t2(i int);
insert into t2 values (1), (2);
analyze t1, t2;
select * from t1 right join t2 on t1.i = t2.i;
2023-11-06 14:11:37.398 UTC|law|regression|6548f419.392cf5|WARNING: Join partition pruning $0 has not been performed yet.
TRAP: failed Assert("node->as_prune_state"), File: "nodeAppend.c", Line: 846, PID: 3747061
Thanks for the report! I failed to take care of the parallel-hashjoin
case, and I have to admit that it's not clear to me yet how we should do
join partition pruning in that case.
For now I think it's better to just avoid performing join partition
pruning for parallel hashjoin, so that the patch doesn't become too
complex for review. We can always extend it in the future.
I have done that in v5. Thanks for testing!
CFBot shows that the patch does not apply anymore, as in [1]:
=== Applying patches on top of PostgreSQL commit ID
924d046dcf55887c98a1628675a30f4b0eebe556 ===
=== applying patch
./v5-0001-Support-run-time-partition-pruning-for-hash-join.patch
...
patching file src/include/nodes/plannodes.h
...
patching file src/include/optimizer/cost.h
Hunk #1 FAILED at 211.
1 out of 1 hunk FAILED -- saving rejects to file
src/include/optimizer/cost.h.rej
Please post an updated version for the same.
[1]: http://cfbot.cputube.org/patch_46_4512.log
Regards,
Vignesh
On Sat, Jan 27, 2024 at 11:29 AM vignesh C <vignesh21@gmail.com> wrote:
CFBot shows that the patch does not apply anymore as in [1]:
Please post an updated version for the same.
Attached is an updated patch. Nothing else has changed.
Thanks
Richard
Attachments:
v6-0001-Support-run-time-partition-pruning-for-hash-join.patch (application/octet-stream)
From 05d18d47135ba086fe416c02abdaa44b69612c03 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Mon, 14 Aug 2023 14:55:26 +0800
Subject: [PATCH v6] Support run-time partition pruning for hash join
If we have a hash join with an Append node on the outer side, something
like
Hash Join
Hash Cond: (pt.a = t.a)
-> Append
-> Seq Scan on pt_p1 pt_1
-> Seq Scan on pt_p2 pt_2
-> Seq Scan on pt_p3 pt_3
-> Hash
-> Seq Scan on t
We can actually prune those subnodes of the Append that cannot possibly
contain any matching tuples from the other side of the join. To do
that, when building the Hash table, for each row from the inner side we
can compute the minimum set of subnodes that can possibly match the join
condition. When we have built the Hash table and start to execute the
Append node, we already know which subnodes survive and can thus skip
the other subnodes.
This patch implements this idea.
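
As an illustration only, the idea can be sketched outside the executor as follows. This is not the patch's code: RangePart, find_partition and surviving_partitions are hypothetical names, a 32-bit mask stands in for a Bitmapset, and the partitions are assumed to be simple integer ranges. While the inner side is consumed (the hash build), each key marks the one partition it could match; afterwards the Append only needs to scan the marked partitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical range partition: holds keys in [lo, hi). */
typedef struct RangePart
{
    int lo;
    int hi;
} RangePart;

/* Return the index of the partition that could hold 'key', or -1 if none. */
static int
find_partition(const RangePart *parts, int nparts, int key)
{
    for (int i = 0; i < nparts; i++)
    {
        if (key >= parts[i].lo && key < parts[i].hi)
            return i;
    }
    return -1;
}

/*
 * Accumulate, over all inner-side keys, the set of partitions that can
 * possibly contain a join partner.  Bit i set means subnode i survives.
 */
static uint32_t
surviving_partitions(const RangePart *parts, int nparts,
                     const int *inner_keys, int nkeys)
{
    uint32_t survivors = 0;

    assert(nparts <= 32);   /* limit of the illustrative bitmask */

    for (int k = 0; k < nkeys; k++)
    {
        int p = find_partition(parts, nparts, inner_keys[k]);

        if (p >= 0)
            survivors |= (uint32_t) 1 << p;
    }
    return survivors;
}
```

For partitions [0,100), [100,200), [200,300) and inner keys 105, 150 and 199, only the second partition is marked, so the other two scans can be skipped. An empty inner side marks nothing, in which case every subnode can be skipped.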
---
src/backend/commands/explain.c | 61 ++++
src/backend/executor/execPartition.c | 127 +++++++-
src/backend/executor/nodeAppend.c | 32 +-
src/backend/executor/nodeHash.c | 75 +++++
src/backend/executor/nodeHashjoin.c | 10 +
src/backend/executor/nodeMergeAppend.c | 22 +-
src/backend/optimizer/path/costsize.c | 106 +++++++
src/backend/optimizer/plan/createplan.c | 49 ++-
src/backend/optimizer/plan/setrefs.c | 61 ++++
src/backend/partitioning/partprune.c | 298 ++++++++++++++++--
src/include/executor/execPartition.h | 17 +-
src/include/nodes/execnodes.h | 3 +
src/include/nodes/pathnodes.h | 3 +
src/include/nodes/plannodes.h | 36 +++
src/include/optimizer/cost.h | 4 +
src/include/partitioning/partprune.h | 12 +-
src/test/regress/expected/partition_prune.out | 86 +++++
src/test/regress/sql/partition_prune.sql | 39 +++
18 files changed, 992 insertions(+), 49 deletions(-)
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 83d00a4663..04450b9c37 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -18,6 +18,7 @@
#include "commands/createas.h"
#include "commands/defrem.h"
#include "commands/prepare.h"
+#include "executor/execPartition.h"
#include "executor/nodeHash.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
@@ -118,6 +119,9 @@ static void show_instrumentation_count(const char *qlabel, int which,
PlanState *planstate, ExplainState *es);
static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
static void show_eval_params(Bitmapset *bms_params, ExplainState *es);
+static void show_join_pruning_result_info(Bitmapset *join_prune_paramids,
+ ExplainState *es);
+static void show_joinpartprune_info(HashState *hashstate, ExplainState *es);
static const char *explain_get_index_name(Oid indexId);
static bool peek_buffer_usage(ExplainState *es, const BufferUsage *usage);
static void show_buffer_usage(ExplainState *es, const BufferUsage *usage);
@@ -2104,9 +2108,17 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_incremental_sort_info(castNode(IncrementalSortState, planstate),
es);
break;
+ case T_Append:
+ if (es->verbose)
+ show_join_pruning_result_info(((Append *) plan)->join_prune_paramids,
+ es);
+ break;
case T_MergeAppend:
show_merge_append_keys(castNode(MergeAppendState, planstate),
ancestors, es);
+ if (es->verbose)
+ show_join_pruning_result_info(((MergeAppend *) plan)->join_prune_paramids,
+ es);
break;
case T_Result:
show_upper_qual((List *) ((Result *) plan)->resconstantqual,
@@ -2122,6 +2134,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
break;
case T_Hash:
show_hash_info(castNode(HashState, planstate), es);
+ if (es->verbose)
+ show_joinpartprune_info(castNode(HashState, planstate), es);
break;
case T_Memoize:
show_memoize_info(castNode(MemoizeState, planstate), ancestors,
@@ -3562,6 +3576,53 @@ show_eval_params(Bitmapset *bms_params, ExplainState *es)
ExplainPropertyList("Params Evaluated", params, es);
}
+/*
+ * Show join partition pruning results at Append/MergeAppend nodes.
+ */
+static void
+show_join_pruning_result_info(Bitmapset *join_prune_paramids, ExplainState *es)
+{
+ int paramid = -1;
+ List *params = NIL;
+
+ if (bms_is_empty(join_prune_paramids))
+ return;
+
+ while ((paramid = bms_next_member(join_prune_paramids, paramid)) >= 0)
+ {
+ char param[32];
+
+ snprintf(param, sizeof(param), "$%d", paramid);
+ params = lappend(params, pstrdup(param));
+ }
+
+ ExplainPropertyList("Join Partition Pruning", params, es);
+}
+
+/*
+ * Show join partition pruning infos at Hash nodes.
+ */
+static void
+show_joinpartprune_info(HashState *hashstate, ExplainState *es)
+{
+ List *params = NIL;
+ ListCell *lc;
+
+ if (!hashstate->joinpartprune_state_list)
+ return;
+
+ foreach(lc, hashstate->joinpartprune_state_list)
+ {
+ JoinPartitionPruneState *jpstate = (JoinPartitionPruneState *) lfirst(lc);
+ char param[32];
+
+ snprintf(param, sizeof(param), "$%d", jpstate->paramid);
+ params = lappend(params, pstrdup(param));
+ }
+
+ ExplainPropertyList("Partition Prune", params, es);
+}
+
/*
* Fetch the name of an index in an EXPLAIN
*
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index b22040ae8e..95c791633c 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -199,6 +199,8 @@ static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
Bitmapset **validsubplans);
+static bool get_join_prune_matching_subplans(PlanState *planstate,
+ Bitmapset **partset);
/*
@@ -1806,7 +1808,7 @@ ExecInitPartitionPruning(PlanState *planstate,
* Perform an initial partition prune pass, if required.
*/
if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true, NULL);
else
{
/* No pruning, so we'll need to initialize all subplans */
@@ -1836,6 +1838,37 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecInitJoinpartpruneList
+ * Initialize data structures needed for join partition pruning
+ */
+List *
+ExecInitJoinpartpruneList(PlanState *planstate,
+ List *joinpartprune_info_list)
+{
+ ListCell *lc;
+ List *result = NIL;
+
+ foreach(lc, joinpartprune_info_list)
+ {
+ JoinPartitionPruneInfo *jpinfo = (JoinPartitionPruneInfo *) lfirst(lc);
+ JoinPartitionPruneState *jpstate = palloc(sizeof(JoinPartitionPruneState));
+
+ jpstate->part_prune_state =
+ CreatePartitionPruneState(planstate, jpinfo->part_prune_info);
+ Assert(jpstate->part_prune_state->do_exec_prune);
+
+ jpstate->paramid = jpinfo->paramid;
+ jpstate->nplans = jpinfo->nplans;
+ jpstate->finished = false;
+ jpstate->part_prune_result = NULL;
+
+ result = lappend(result, jpstate);
+ }
+
+ return result;
+}
+
/*
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
@@ -2273,7 +2306,9 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
/*
* ExecFindMatchingSubPlans
* Determine which subplans match the pruning steps detailed in
- * 'prunestate' for the current comparison expression values.
+ * 'prunestate', if any, for the current comparison expression values, and
+ * intersect them with the join partition pruning results, if any, recorded
+ * under the Append/MergeAppend node's join_prune_paramids.
*
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
@@ -2281,11 +2316,30 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ PlanState *planstate)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
int i;
+ Bitmapset *join_prune_partset = NULL;
+ bool do_join_prune;
+
+ /* Retrieve the join partition pruning results if any */
+ do_join_prune =
+ get_join_prune_matching_subplans(planstate, &join_prune_partset);
+
+ /*
+ * We must have something to prune with: either the pruning steps detailed
+ * in 'prunestate', or join partition pruning results, or both.
+ */
+ Assert(do_join_prune || prunestate != NULL);
+
+ /*
+ * If there is no 'prunestate', then rely entirely on join pruning.
+ */
+ if (prunestate == NULL)
+ return join_prune_partset;
/*
* Either we're here on the initial prune done during pruning
@@ -2326,6 +2380,10 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Add in any subplans that partition pruning didn't account for */
result = bms_add_members(result, prunestate->other_subplans);
+ /* Intersect join partition pruning results */
+ if (do_join_prune)
+ result = bms_intersect(result, join_prune_partset);
+
MemoryContextSwitchTo(oldcontext);
/* Copy result out of the temp context before we reset it */
@@ -2396,3 +2454,66 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
}
}
}
+
+/*
+ * get_join_prune_matching_subplans
+ * Retrieve the join partition pruning results, if any, for the params
+ * listed in the Append/MergeAppend node's join_prune_paramids. Return
+ * true if join partition pruning applies to this node, otherwise false.
+ *
+ * On success, *partset is set to the valid (non-prunable) subplan IDs.
+ */
+static bool
+get_join_prune_matching_subplans(PlanState *planstate, Bitmapset **partset)
+{
+ Bitmapset *join_prune_paramids;
+ int nplans;
+ int paramid;
+
+ if (planstate == NULL)
+ return false;
+
+ if (IsA(planstate, AppendState))
+ {
+ join_prune_paramids =
+ ((Append *) planstate->plan)->join_prune_paramids;
+ nplans = ((AppendState *) planstate)->as_nplans;
+ }
+ else if (IsA(planstate, MergeAppendState))
+ {
+ join_prune_paramids =
+ ((MergeAppend *) planstate->plan)->join_prune_paramids;
+ nplans = ((MergeAppendState *) planstate)->ms_nplans;
+ }
+ else
+ {
+ elog(ERROR, "unrecognized node type: %d", (int) nodeTag(planstate));
+ return false;
+ }
+
+ if (bms_is_empty(join_prune_paramids))
+ return false;
+
+ Assert(nplans > 0);
+ *partset = bms_add_range(NULL, 0, nplans - 1);
+
+ paramid = -1;
+ while ((paramid = bms_next_member(join_prune_paramids, paramid)) >= 0)
+ {
+ ParamExecData *param;
+ JoinPartitionPruneState *jpstate;
+
+ param = &(planstate->state->es_param_exec_vals[paramid]);
+ Assert(param->execPlan == NULL);
+ Assert(!param->isnull);
+ jpstate = (JoinPartitionPruneState *) DatumGetPointer(param->value);
+
+ if (jpstate != NULL)
+ *partset = bms_intersect(*partset, jpstate->part_prune_result);
+ else /* the Hash node for this pruning has not been executed */
+ elog(WARNING, "join partition pruning $%d has not been performed yet",
+ paramid);
+ }
+
+ return true;
+}
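
Note on get_join_prune_matching_subplans(): the way it combines the per-join results can be illustrated in isolation. What follows is a minimal standalone sketch, not code from the patch. It assumes at most 64 subplans and models each Bitmapset as a uint64_t. All toy_* names are hypothetical, and the has_result flag stands in for the NULL check on the param value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the pruning result published by one join. */
typedef struct ToyJoinPruneResult
{
    bool     has_result;   /* false: the Hash node has not run yet */
    uint64_t survivors;    /* bit i set: subplan i may contain matches */
} ToyJoinPruneResult;

/* Bitmask with bits 0 .. nplans-1 set; assumes 1 <= nplans <= 64. */
static uint64_t
toy_all_subplans(int nplans)
{
    return (nplans >= 64) ? UINT64_MAX : ((UINT64_C(1) << nplans) - 1);
}

/*
 * Start from "all subplans valid" and intersect with the result of every
 * join that has already produced one.  A join without a result yet prunes
 * nothing, so it is simply skipped.
 */
static uint64_t
toy_combine_join_prune(const ToyJoinPruneResult *results, size_t nresults,
                       int nplans)
{
    uint64_t valid = toy_all_subplans(nplans);

    for (size_t i = 0; i < nresults; i++)
    {
        if (results[i].has_result)
            valid &= results[i].survivors;
    }
    return valid;
}
```

With three subplans, a join that keeps subplans {0, 1} and another that keeps {1, 2} leave only subplan 1 to scan, which is the intersection behaviour described for multiple join levels.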
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index c7059e7528..b68a1e2eb2 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -151,11 +151,13 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
nplans = bms_num_members(validsubplans);
/*
- * When no run-time pruning is required and there's at least one
- * subplan, we can fill as_valid_subplans immediately, preventing
- * later calls to ExecFindMatchingSubPlans.
+ * When no run-time pruning or join pruning is required and there's at
+ * least one subplan, we can fill as_valid_subplans immediately,
+ * preventing later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (!prunestate->do_exec_prune &&
+ bms_is_empty(node->join_prune_paramids) &&
+ nplans > 0)
{
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
appendstate->as_valid_subplans_identified = true;
@@ -170,10 +172,18 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplans as valid; they must also all be initialized.
*/
Assert(nplans > 0);
- appendstate->as_valid_subplans = validsubplans =
- bms_add_range(NULL, 0, nplans - 1);
- appendstate->as_valid_subplans_identified = true;
+ validsubplans = bms_add_range(NULL, 0, nplans - 1);
appendstate->as_prune_state = NULL;
+
+ /*
+ * When no join pruning is required we can fill as_valid_subplans
+ * immediately, preventing later calls to ExecFindMatchingSubPlans.
+ */
+ if (bms_is_empty(node->join_prune_paramids))
+ {
+ appendstate->as_valid_subplans = validsubplans;
+ appendstate->as_valid_subplans_identified = true;
+ }
}
/*
@@ -580,7 +590,7 @@ choose_next_subplan_locally(AppendState *node)
else if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, &node->ps);
node->as_valid_subplans_identified = true;
}
@@ -647,7 +657,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, &node->ps);
node->as_valid_subplans_identified = true;
/*
@@ -723,7 +733,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, &node->ps);
node->as_valid_subplans_identified = true;
mark_invalid_subplans_as_finished(node);
@@ -876,7 +886,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, &node->ps);
node->as_valid_subplans_identified = true;
classify_matching_subplans(node);
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index f90e16ede8..00138a61a2 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -31,6 +31,7 @@
#include "catalog/pg_statistic.h"
#include "commands/tablespace.h"
#include "executor/execdebug.h"
+#include "executor/execPartition.h"
#include "executor/hashjoin.h"
#include "executor/nodeHash.h"
#include "executor/nodeHashjoin.h"
@@ -48,6 +49,8 @@ static void ExecHashIncreaseNumBatches(HashJoinTable hashtable);
static void ExecHashIncreaseNumBuckets(HashJoinTable hashtable);
static void ExecParallelHashIncreaseNumBatches(HashJoinTable hashtable);
static void ExecParallelHashIncreaseNumBuckets(HashJoinTable hashtable);
+static void ExecJoinPartitionPrune(HashState *node);
+static void ExecStoreJoinPartitionPruneResult(HashState *node);
static void ExecHashBuildSkewHash(HashJoinTable hashtable, Hash *node,
int mcvsToUse);
static void ExecHashSkewTableInsert(HashJoinTable hashtable,
@@ -189,8 +192,14 @@ MultiExecPrivateHash(HashState *node)
}
hashtable->totalTuples += 1;
}
+
+ /* Perform join partition pruning */
+ ExecJoinPartitionPrune(node);
}
+ /* Store the surviving partitions for Append/MergeAppend nodes */
+ ExecStoreJoinPartitionPruneResult(node);
+
/* resize the hash table if needed (NTUP_PER_BUCKET exceeded) */
if (hashtable->nbuckets != hashtable->nbuckets_optimal)
ExecHashIncreaseNumBuckets(hashtable);
@@ -401,6 +410,12 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
hashstate->hashkeys =
ExecInitExprList(node->hashkeys, (PlanState *) hashstate);
+ /*
+ * initialize join partition pruning infos
+ */
+ hashstate->joinpartprune_state_list =
+ ExecInitJoinpartpruneList(&hashstate->ps, node->joinpartprune_info_list);
+
return hashstate;
}
@@ -1610,6 +1625,56 @@ ExecParallelHashIncreaseNumBuckets(HashJoinTable hashtable)
}
}
+/*
+ * ExecJoinPartitionPrune
+ * Perform join partition pruning at this join for each
+ * JoinPartitionPruneState.
+ */
+static void
+ExecJoinPartitionPrune(HashState *node)
+{
+ ListCell *lc;
+
+ foreach(lc, node->joinpartprune_state_list)
+ {
+ JoinPartitionPruneState *jpstate = (JoinPartitionPruneState *) lfirst(lc);
+ Bitmapset *matching_subPlans;
+
+ if (jpstate->finished)
+ continue;
+
+ matching_subPlans =
+ ExecFindMatchingSubPlans(jpstate->part_prune_state, false, NULL);
+ jpstate->part_prune_result =
+ bms_add_members(jpstate->part_prune_result, matching_subPlans);
+
+ if (bms_num_members(jpstate->part_prune_result) == jpstate->nplans)
+ jpstate->finished = true;
+ }
+}
+
+/*
+ * ExecStoreJoinPartitionPruneResult
+ * For each JoinPartitionPruneState, store the set of surviving partitions
+ * to make it available for the Append/MergeAppend node.
+ */
+static void
+ExecStoreJoinPartitionPruneResult(HashState *node)
+{
+ ListCell *lc;
+
+ foreach(lc, node->joinpartprune_state_list)
+ {
+ JoinPartitionPruneState *jpstate = (JoinPartitionPruneState *) lfirst(lc);
+ ParamExecData *param;
+
+ param = &(node->ps.state->es_param_exec_vals[jpstate->paramid]);
+ Assert(param->execPlan == NULL);
+ Assert(!param->isnull);
+ param->value = PointerGetDatum(jpstate);
+ }
+}
+
/*
* ExecHashTableInsert
* insert a tuple into the hash table depending on the hash value
@@ -2354,6 +2419,16 @@ void
ExecReScanHash(HashState *node)
{
PlanState *outerPlan = outerPlanState(node);
+ ListCell *lc;
+
+ /* reset the state in JoinPartitionPruneStates */
+ foreach(lc, node->joinpartprune_state_list)
+ {
+ JoinPartitionPruneState *jpstate = (JoinPartitionPruneState *) lfirst(lc);
+
+ jpstate->finished = false;
+ jpstate->part_prune_result = NULL;
+ }
/*
* if chgParam of subnode is not null then plan will be re-scanned by
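
Note on the nodeHash.c changes: the per-row accumulation done by ExecJoinPartitionPrune(), including the early exit via the 'finished' flag, can be sketched outside the executor. This is an illustrative model only. It assumes a range-partitioned table on a single int key with at most 64 partitions, and every toy_* name is hypothetical rather than part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct ToyPruneState
{
    int      nplans;      /* number of partitions, assumed 1..64 */
    uint64_t survivors;   /* partitions matched by some inner row so far */
    bool     finished;    /* true once every partition has survived */
} ToyPruneState;

/*
 * Route a key to a range partition.  Partition i holds keys below upper[i]
 * that were not claimed by an earlier partition.  Returns -1 if the key
 * lies beyond the last bound, i.e. no partition can match it.
 */
static int
toy_route(const int *upper, int nplans, int key)
{
    for (int i = 0; i < nplans; i++)
    {
        if (key < upper[i])
            return i;
    }
    return -1;
}

/* Count the set bits in x. */
static int
toy_popcount(uint64_t x)
{
    int n = 0;

    for (; x != 0; x &= x - 1)
        n++;
    return n;
}

/* Feed one inner-side join key into the pruning state. */
static void
toy_prune_one_row(ToyPruneState *st, const int *upper, int key)
{
    int part;

    if (st->finished)
        return;             /* every partition survives; nothing to gain */

    part = toy_route(upper, st->nplans, key);
    if (part >= 0)
        st->survivors |= UINT64_C(1) << part;

    if (toy_popcount(st->survivors) == st->nplans)
        st->finished = true;
}
```

Once 'finished' is set, further inner rows cost nothing extra, which bounds the overhead when the inner side turns out to touch every partition.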
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 1cbec4647c..4f84799929 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -311,6 +311,16 @@ ExecHashJoinImpl(PlanState *pstate, bool parallel)
*/
node->hj_FirstOuterTupleSlot = NULL;
}
+ else if (hashNode->joinpartprune_state_list != NIL)
+ {
+ /*
+ * The hash node has JoinPartitionPruneStates to evaluate,
+ * so build the hash table first and let it run join
+ * partition pruning before the outer side is scanned. Do
+ * not apply the empty-outer optimization in this case.
+ */
+ node->hj_FirstOuterTupleSlot = NULL;
+ }
else if (HJ_FILL_OUTER(node) ||
(outerNode->plan->startup_cost < hashNode->ps.plan->total_cost &&
!node->hj_OuterNotEmpty))
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 0817868452..ec0ac29654 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -99,11 +99,13 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
nplans = bms_num_members(validsubplans);
/*
- * When no run-time pruning is required and there's at least one
- * subplan, we can fill ms_valid_subplans immediately, preventing
- * later calls to ExecFindMatchingSubPlans.
+ * When no run-time pruning or join pruning is required and there's at
+ * least one subplan, we can fill ms_valid_subplans immediately,
+ * preventing later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (!prunestate->do_exec_prune &&
+ bms_is_empty(node->join_prune_paramids) &&
+ nplans > 0)
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -115,9 +117,15 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplans as valid; they must also all be initialized.
*/
Assert(nplans > 0);
- mergestate->ms_valid_subplans = validsubplans =
- bms_add_range(NULL, 0, nplans - 1);
+ validsubplans = bms_add_range(NULL, 0, nplans - 1);
mergestate->ms_prune_state = NULL;
+
+ /*
+ * When no join pruning is required we can fill ms_valid_subplans
+ * immediately, preventing later calls to ExecFindMatchingSubPlans.
+ */
+ if (bms_is_empty(node->join_prune_paramids))
+ mergestate->ms_valid_subplans = validsubplans;
}
mergeplanstates = (PlanState **) palloc(nplans * sizeof(PlanState *));
@@ -218,7 +226,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, &node->ps);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 8b76e98529..ea3d5e57c1 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -173,6 +173,10 @@ static void get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
static bool has_indexed_join_quals(NestPath *path);
static double approx_tuple_count(PlannerInfo *root, JoinPath *path,
List *quals);
+static double get_joinrel_matching_outer_size(PlannerInfo *root,
+ RelOptInfo *outer_rel,
+ Relids inner_relids,
+ List *restrictlist);
static double calc_joinrel_size_estimate(PlannerInfo *root,
RelOptInfo *joinrel,
RelOptInfo *outer_rel,
@@ -5409,6 +5413,61 @@ get_parameterized_joinrel_size(PlannerInfo *root, RelOptInfo *rel,
return nrows;
}
+/*
+ * get_joinrel_matching_outer_size
+ * Estimate the number of outer-side rows that have a match on the inner side.
+ */
+static double
+get_joinrel_matching_outer_size(PlannerInfo *root,
+ RelOptInfo *outer_rel,
+ Relids inner_relids,
+ List *restrictlist)
+{
+ double nrows;
+ Selectivity fkselec;
+ Selectivity jselec;
+ SpecialJoinInfo *sjinfo;
+ SpecialJoinInfo sjinfo_data;
+
+ sjinfo = &sjinfo_data;
+ sjinfo->type = T_SpecialJoinInfo;
+ sjinfo->min_lefthand = outer_rel->relids;
+ sjinfo->min_righthand = inner_relids;
+ sjinfo->syn_lefthand = outer_rel->relids;
+ sjinfo->syn_righthand = inner_relids;
+ sjinfo->jointype = JOIN_SEMI;
+ sjinfo->ojrelid = 0;
+ sjinfo->commute_above_l = NULL;
+ sjinfo->commute_above_r = NULL;
+ sjinfo->commute_below_l = NULL;
+ sjinfo->commute_below_r = NULL;
+ /* we don't bother trying to make the remaining fields valid */
+ sjinfo->lhs_strict = false;
+ sjinfo->semi_can_btree = false;
+ sjinfo->semi_can_hash = false;
+ sjinfo->semi_operators = NIL;
+ sjinfo->semi_rhs_exprs = NIL;
+
+ fkselec = get_foreign_key_join_selectivity(root,
+ outer_rel->relids,
+ inner_relids,
+ sjinfo,
+ &restrictlist);
+ jselec = clauselist_selectivity(root,
+ restrictlist,
+ 0,
+ sjinfo->jointype,
+ sjinfo);
+
+ nrows = outer_rel->rows * fkselec * jselec;
+ nrows = clamp_row_est(nrows);
+
+ /* For safety, make sure result is not more than the base estimate */
+ if (nrows > outer_rel->rows)
+ nrows = outer_rel->rows;
+ return nrows;
+}
+
/*
* calc_joinrel_size_estimate
* Workhorse for set_joinrel_size_estimates and
@@ -6530,3 +6589,50 @@ compute_bitmap_pages(PlannerInfo *root, RelOptInfo *baserel,
return pages_fetched;
}
+
+/*
+ * compute_partprune_cost
+ * Estimate the net cost (overhead minus savings) of join partition pruning.
+ */
+double
+compute_partprune_cost(PlannerInfo *root, RelOptInfo *appendrel,
+ Cost append_total_cost, int append_nplans,
+ Relids inner_relids, double inner_rows,
+ List *prunequal)
+{
+ Cost prune_cost;
+ Cost saved_cost;
+ double matching_outer_rows;
+ double unmatched_nplans;
+
+ switch (appendrel->part_scheme->strategy)
+ {
+
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ prune_cost = cpu_operator_cost * LOG2(append_nplans) * inner_rows;
+ break;
+ case PARTITION_STRATEGY_HASH:
+ prune_cost = cpu_operator_cost * append_nplans * inner_rows;
+ break;
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ (int) appendrel->part_scheme->strategy);
+ break;
+ }
+
+ matching_outer_rows =
+ get_joinrel_matching_outer_size(root,
+ appendrel,
+ inner_relids,
+ prunequal);
+
+ /*
+ * Assume that each matching outer row lands in a distinct partition. This
+ * is the worst case: it leaves the most partitions unpruned.
+ */
+ unmatched_nplans = append_nplans - Min(matching_outer_rows, append_nplans);
+ saved_cost = (unmatched_nplans / append_nplans) * append_total_cost;
+
+ return prune_cost - saved_cost;
+}
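
Note on compute_partprune_cost(): the arithmetic can be checked with a small standalone function. This is a sketch under stated assumptions, not the patch's code. The caller passes steps_per_row directly (about log2(nplans) for LIST/RANGE, nplans for HASH) and matching_outer_rows, which the patch derives from selectivity estimates. The name toy_net_prune_cost is hypothetical.

```c
#include <assert.h>

/*
 * Net cost of join partition pruning: overhead minus expected savings.
 * A negative result means pruning is expected to pay for itself.
 *
 * steps_per_row is the assumed number of comparisons needed to route one
 * inner row to a partition.
 */
static double
toy_net_prune_cost(double cpu_operator_cost, double steps_per_row,
                   int nplans, double inner_rows,
                   double matching_outer_rows, double append_total_cost)
{
    /* Overhead: route every inner row through the partition bounds. */
    double prune_cost = cpu_operator_cost * steps_per_row * inner_rows;

    /* Worst case: each matching outer row keeps a distinct partition. */
    double surviving = (matching_outer_rows < nplans) ?
        matching_outer_rows : nplans;
    double unmatched = nplans - surviving;

    /* Savings: the share of the Append cost that is no longer scanned. */
    double saved_cost = (unmatched / nplans) * append_total_cost;

    return prune_cost - saved_cost;
}
```

For example, with 8 range partitions (3 steps per row), 100 inner rows, cpu_operator_cost = 0.0025, 2 matching outer rows and an Append total cost of 1000, the overhead is 0.0025 * 3 * 100 = 0.75 and the saving is (6 / 8) * 1000 = 750, so the net cost is about -749.25 and pruning is kept. If 50 outer rows match, no partition is assumed prunable, the net cost is the positive overhead 0.75, and make_join_partition_pruneinfos() would skip the candidate.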
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 610f4a56d6..55cd3bb616 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -242,7 +242,8 @@ static Hash *make_hash(Plan *lefttree,
List *hashkeys,
Oid skewTable,
AttrNumber skewColumn,
- bool skewInherit);
+ bool skewInherit,
+ List *joinpartprune_info_list);
static MergeJoin *make_mergejoin(List *tlist,
List *joinclauses, List *otherclauses,
List *mergeclauses,
@@ -342,6 +343,7 @@ create_plan(PlannerInfo *root, Path *best_path)
/* Initialize this module's workspace in PlannerInfo */
root->curOuterRels = NULL;
root->curOuterParams = NIL;
+ root->join_partition_prune_candidates = NIL;
/* Recursively process the path tree, demanding the correct tlist result */
plan = create_plan_recurse(root, best_path, CP_EXACT_TLIST);
@@ -369,6 +371,8 @@ create_plan(PlannerInfo *root, Path *best_path)
if (root->curOuterParams != NIL)
elog(ERROR, "failed to assign all NestLoopParams to plan nodes");
+ Assert(root->join_partition_prune_candidates == NIL);
+
/*
* Reset plan_params to ensure param IDs used for nestloop params are not
* re-used later
@@ -1223,6 +1227,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
int nasyncplans = 0;
RelOptInfo *rel = best_path->path.parent;
PartitionPruneInfo *partpruneinfo = NULL;
+ Bitmapset *join_prune_paramids = NULL;
int nodenumsortkeys = 0;
AttrNumber *nodeSortColIdx = NULL;
Oid *nodeSortOperators = NULL;
@@ -1377,6 +1382,8 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
* do partition pruning.
+ *
+ * Also gather information needed by the executor to do join pruning.
*/
if (enable_partition_pruning)
{
@@ -1399,13 +1406,20 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
partpruneinfo =
make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ NULL);
+
+ join_prune_paramids =
+ make_join_partition_pruneinfos(root, rel,
+ (Path *) best_path,
+ best_path->subpaths);
}
plan->appendplans = subplans;
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
plan->part_prune_info = partpruneinfo;
+ plan->join_prune_paramids = join_prune_paramids;
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -1445,6 +1459,7 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
PartitionPruneInfo *partpruneinfo = NULL;
+ Bitmapset *join_prune_paramids = NULL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1541,6 +1556,8 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
* do partition pruning.
+ *
+ * Also gather information needed by the executor to do join pruning.
*/
if (enable_partition_pruning)
{
@@ -1554,11 +1571,18 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
if (prunequal != NIL)
partpruneinfo = make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ NULL);
+
+ join_prune_paramids =
+ make_join_partition_pruneinfos(root, rel,
+ (Path *) best_path,
+ best_path->subpaths);
}
node->mergeplans = subplans;
node->part_prune_info = partpruneinfo;
+ node->join_prune_paramids = join_prune_paramids;
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
@@ -4743,6 +4767,13 @@ create_hashjoin_plan(PlannerInfo *root,
AttrNumber skewColumn = InvalidAttrNumber;
bool skewInherit = false;
ListCell *lc;
+ List *joinpartprune_info_list;
+
+ /*
+ * Collect information required to build JoinPartitionPruneInfos at this
+ * join.
+ */
+ prepare_join_partition_prune_candidate(root, &best_path->jpath);
/*
* HashJoin can project, so we don't have to demand exact tlists from the
@@ -4754,6 +4785,11 @@ create_hashjoin_plan(PlannerInfo *root,
outer_plan = create_plan_recurse(root, best_path->jpath.outerjoinpath,
(best_path->num_batches > 1) ? CP_SMALL_TLIST : 0);
+ /*
+ * Retrieve all the JoinPartitionPruneInfos for this join.
+ */
+ joinpartprune_info_list = get_join_partition_prune_candidate(root);
+
inner_plan = create_plan_recurse(root, best_path->jpath.innerjoinpath,
CP_SMALL_TLIST);
@@ -4859,7 +4895,8 @@ create_hashjoin_plan(PlannerInfo *root,
inner_hashkeys,
skewTable,
skewColumn,
- skewInherit);
+ skewInherit,
+ joinpartprune_info_list);
/*
* Set Hash node's startup & total costs equal to total cost of input
@@ -5986,7 +6023,8 @@ make_hash(Plan *lefttree,
List *hashkeys,
Oid skewTable,
AttrNumber skewColumn,
- bool skewInherit)
+ bool skewInherit,
+ List *joinpartprune_info_list)
{
Hash *node = makeNode(Hash);
Plan *plan = &node->plan;
@@ -6000,6 +6038,7 @@ make_hash(Plan *lefttree,
node->skewTable = skewTable;
node->skewColumn = skewColumn;
node->skewInherit = skewInherit;
+ node->joinpartprune_info_list = joinpartprune_info_list;
return node;
}
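
Note on the createplan.c changes: create_hashjoin_plan() pushes a candidate before planning its outer side and pops it before planning its inner side, so an Append only ever sees the joins that enclose it on the outer path. The sketch below models just that stack discipline. It is hypothetical code with toy_* names: a fixed-size array replaces the List in PlannerInfo, 'usable = false' mirrors the NULL placeholder, and it assumes every usable join passes the movable-clause and cost checks.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_DEPTH 16

typedef struct ToyCandidate
{
    bool usable;    /* false mirrors the NULL placeholder in the patch */
    int  ninfos;    /* prune infos attached by Append nodes below */
} ToyCandidate;

typedef struct ToyPlanner
{
    ToyCandidate stack[TOY_MAX_DEPTH];
    int          depth;
} ToyPlanner;

/* Entering a hash join, before its outer side is planned. */
static void
toy_push_candidate(ToyPlanner *root, bool usable)
{
    assert(root->depth < TOY_MAX_DEPTH);
    root->stack[root->depth].usable = usable;
    root->stack[root->depth].ninfos = 0;
    root->depth++;
}

/*
 * Planning an Append: attach one prune info to every usable enclosing
 * join.  Returns the number of joins that will prune this Append.
 */
static int
toy_visit_append(ToyPlanner *root)
{
    int n = 0;

    for (int i = 0; i < root->depth; i++)
    {
        if (root->stack[i].usable)
        {
            root->stack[i].ninfos++;
            n++;
        }
    }
    return n;
}

/* Outer side done: pop this join's entry and return what it collected. */
static int
toy_pop_candidate(ToyPlanner *root)
{
    ToyCandidate *top;

    assert(root->depth > 0);
    root->depth--;
    top = &root->stack[root->depth];
    return top->usable ? top->ninfos : 0;
}
```

Pushing a placeholder even for unusable joins keeps pushes and pops balanced, which is what the Assert added to create_plan() relies on.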
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 22a1fa29f3..c83d3bc0ad 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -156,6 +156,11 @@ static Plan *set_mergeappend_references(PlannerInfo *root,
MergeAppend *mplan,
int rtoffset);
static void set_hash_references(PlannerInfo *root, Plan *plan, int rtoffset);
+static void set_joinpartitionprune_references(PlannerInfo *root,
+ List *joinpartprune_info_list,
+ indexed_tlist *outer_itlist,
+ int rtoffset,
+ double num_exec);
static Relids offset_relid_set(Relids relids, int rtoffset);
static Node *fix_scan_expr(PlannerInfo *root, Node *node,
int rtoffset, double num_exec);
@@ -1897,6 +1902,62 @@ set_hash_references(PlannerInfo *root, Plan *plan, int rtoffset)
/* Hash nodes don't have their own quals */
Assert(plan->qual == NIL);
+
+ set_joinpartitionprune_references(root,
+ hplan->joinpartprune_info_list,
+ outer_itlist,
+ rtoffset,
+ NUM_EXEC_TLIST(plan));
+}
+
+/*
+ * set_joinpartitionprune_references
+ * Do set_plan_references processing on JoinPartitionPruneInfos
+ */
+static void
+set_joinpartitionprune_references(PlannerInfo *root,
+ List *joinpartprune_info_list,
+ indexed_tlist *outer_itlist,
+ int rtoffset,
+ double num_exec)
+{
+ ListCell *l;
+
+ foreach(l, joinpartprune_info_list)
+ {
+ JoinPartitionPruneInfo *jpinfo = (JoinPartitionPruneInfo *) lfirst(l);
+ ListCell *l1;
+
+ foreach(l1, jpinfo->part_prune_info->prune_infos)
+ {
+ List *prune_infos = lfirst(l1);
+ ListCell *l2;
+
+ foreach(l2, prune_infos)
+ {
+ PartitionedRelPruneInfo *pinfo = lfirst(l2);
+
+ pinfo->rtindex += rtoffset;
+
+ pinfo->initial_pruning_steps = (List *)
+ fix_upper_expr(root,
+ (Node *) pinfo->initial_pruning_steps,
+ outer_itlist,
+ OUTER_VAR,
+ rtoffset,
+ NRM_EQUAL,
+ num_exec);
+ pinfo->exec_pruning_steps = (List *)
+ fix_upper_expr(root,
+ (Node *) pinfo->exec_pruning_steps,
+ outer_itlist,
+ OUTER_VAR,
+ rtoffset,
+ NRM_EQUAL,
+ num_exec);
+ }
+ }
+ }
}
/*
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 6b635e8ad1..17725aa972 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -48,7 +48,9 @@
#include "optimizer/appendinfo.h"
#include "optimizer/cost.h"
#include "optimizer/optimizer.h"
+#include "optimizer/paramassign.h"
#include "optimizer/pathnode.h"
+#include "optimizer/restrictinfo.h"
#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partprune.h"
@@ -103,15 +105,16 @@ typedef enum PartClauseTarget
*
* gen_partprune_steps() initializes and returns an instance of this struct.
*
- * Note that has_mutable_op, has_mutable_arg, and has_exec_param are set if
- * we found any potentially-useful-for-pruning clause having those properties,
- * whether or not we actually used the clause in the steps list. This
- * definition allows us to skip the PARTTARGET_EXEC pass in some cases.
+ * Note that has_mutable_op, has_mutable_arg, has_exec_param and has_vars are
+ * set if we found any potentially-useful-for-pruning clause having those
+ * properties, whether or not we actually used the clause in the steps list.
+ * This definition allows us to skip the PARTTARGET_EXEC pass in some cases.
*/
typedef struct GeneratePruningStepsContext
{
/* Copies of input arguments for gen_partprune_steps: */
RelOptInfo *rel; /* the partitioned relation */
+ Bitmapset *available_rels; /* rels whose Vars may be used for pruning */
PartClauseTarget target; /* use-case we're generating steps for */
/* Result data: */
List *steps; /* list of PartitionPruneSteps */
@@ -119,6 +122,7 @@ typedef struct GeneratePruningStepsContext
bool has_mutable_arg; /* clauses include any mutable comparison
* values, *other than* exec params */
bool has_exec_param; /* clauses include any PARAM_EXEC params */
+ bool has_vars; /* clauses include any Vars from 'available_rels' */
bool contradictory; /* clauses were proven self-contradictory */
/* Working state: */
int next_step_id;
@@ -144,8 +148,10 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
+ Bitmapset *available_rels,
Bitmapset **matchedsubplans);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
+ Bitmapset *available_rels,
PartClauseTarget target,
GeneratePruningStepsContext *context);
static List *gen_partprune_steps_internal(GeneratePruningStepsContext *context,
@@ -204,6 +210,10 @@ static PartClauseMatchStatus match_boolean_partition_clause(Oid partopfamily,
static void partkey_datum_from_expr(PartitionPruneContext *context,
Expr *expr, int stateidx,
Datum *value, bool *isnull);
+static bool contain_forbidden_var_clause(Node *node,
+ GeneratePruningStepsContext *context);
+static bool contain_forbidden_var_clause_walker(Node *node,
+ GeneratePruningStepsContext *context);
/*
@@ -216,11 +226,14 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
+ * 'available_rels' is the relid set of rels whose Vars may be used for
+ * pruning.
*/
PartitionPruneInfo *
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
- List *prunequal)
+ List *prunequal,
+ Bitmapset *available_rels)
{
PartitionPruneInfo *pruneinfo;
Bitmapset *allmatchedsubplans = NULL;
@@ -313,6 +326,7 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
prunequal,
partrelids,
relid_subplan_map,
+ available_rels,
&matchedsubplans);
/* When pruning is possible, record the matched subplans */
@@ -360,6 +374,184 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
return pruneinfo;
}
+/*
+ * make_join_partition_pruneinfos
+ * Builds one JoinPartitionPruneInfo for each join at which join partition
+ * pruning is possible for this appendrel.
+ *
+ * 'parentrel' is the RelOptInfo for an appendrel, and 'subpaths' is the list
+ * of scan paths for its child rels.
+ */
+Bitmapset *
+make_join_partition_pruneinfos(PlannerInfo *root, RelOptInfo *parentrel,
+ Path *best_path, List *subpaths)
+{
+ Bitmapset *result = NULL;
+ ListCell *lc;
+
+ if (!IS_PARTITIONED_REL(parentrel))
+ return NULL;
+
+ foreach(lc, root->join_partition_prune_candidates)
+ {
+ JoinPartitionPruneCandidateInfo *candidate =
+ (JoinPartitionPruneCandidateInfo *) lfirst(lc);
+ PartitionPruneInfo *part_prune_info;
+ List *prunequal;
+ Relids joinrelids;
+ ListCell *l;
+ double prune_cost;
+
+ if (candidate == NULL)
+ continue;
+
+ /*
+ * Identify all join clauses that are movable to this appendrel given
+ * the inner side's relids. Only those clauses can be used for join
+ * partition pruning.
+ */
+ joinrelids = bms_union(parentrel->relids, candidate->inner_relids);
+ prunequal = NIL;
+ foreach(l, candidate->joinrestrictinfo)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
+
+ if (join_clause_is_movable_into(rinfo,
+ parentrel->relids,
+ joinrelids))
+ prunequal = lappend(prunequal, rinfo);
+ }
+
+ if (prunequal == NIL)
+ continue;
+
+ /*
+ * Skip this pruning if it is not expected to pay for itself
+ */
+ prune_cost = compute_partprune_cost(root,
+ parentrel,
+ best_path->total_cost,
+ list_length(subpaths),
+ candidate->inner_relids,
+ candidate->inner_rows,
+ prunequal);
+ if (prune_cost > 0)
+ continue;
+
+ part_prune_info = make_partition_pruneinfo(root, parentrel,
+ subpaths,
+ prunequal,
+ candidate->inner_relids);
+
+ if (part_prune_info)
+ {
+ JoinPartitionPruneInfo *jpinfo;
+
+ jpinfo = makeNode(JoinPartitionPruneInfo);
+
+ jpinfo->part_prune_info = part_prune_info;
+ jpinfo->paramid = assign_special_exec_param(root);
+ jpinfo->nplans = list_length(subpaths);
+
+ candidate->joinpartprune_info_list =
+ lappend(candidate->joinpartprune_info_list, jpinfo);
+
+ result = bms_add_member(result, jpinfo->paramid);
+ }
+ }
+
+ return result;
+}
+
+/*
+ * prepare_join_partition_prune_candidate
+ * Check if join partition pruning is possible at this join and if so
+ * collect information required to build JoinPartitionPruneInfos.
+ *
+ * Note that we may build more than one JoinPartitionPruneInfo at one join, for
+ * different Append/MergeAppend paths.
+ */
+void
+prepare_join_partition_prune_candidate(PlannerInfo *root, JoinPath *jpath)
+{
+ JoinPartitionPruneCandidateInfo *candidate;
+
+ if (!enable_partition_pruning)
+ {
+ root->join_partition_prune_candidates =
+ lappend(root->join_partition_prune_candidates, NULL);
+ return;
+ }
+
+ /*
+ * For now do not perform join partition pruning for parallel hashjoin.
+ */
+ if (jpath->path.parallel_workers > 0)
+ {
+ root->join_partition_prune_candidates =
+ lappend(root->join_partition_prune_candidates, NULL);
+ return;
+ }
+
+ /*
+ * We cannot perform join partition pruning if unmatched outer rows must
+ * be emitted, i.e. if the outer is the non-nullable side of the join.
+ */
+ if (!(jpath->jointype == JOIN_INNER ||
+ jpath->jointype == JOIN_SEMI ||
+ jpath->jointype == JOIN_RIGHT ||
+ jpath->jointype == JOIN_RIGHT_ANTI))
+ {
+ root->join_partition_prune_candidates =
+ lappend(root->join_partition_prune_candidates, NULL);
+ return;
+ }
+
+ /*
+ * For now we only support HashJoin.
+ */
+ if (jpath->path.pathtype != T_HashJoin)
+ {
+ root->join_partition_prune_candidates =
+ lappend(root->join_partition_prune_candidates, NULL);
+ return;
+ }
+
+ candidate = makeNode(JoinPartitionPruneCandidateInfo);
+ candidate->joinrestrictinfo = jpath->joinrestrictinfo;
+ candidate->inner_relids = jpath->innerjoinpath->parent->relids;
+ candidate->inner_rows = jpath->innerjoinpath->parent->rows;
+ candidate->joinpartprune_info_list = NIL;
+
+ root->join_partition_prune_candidates =
+ lappend(root->join_partition_prune_candidates, candidate);
+}
+
+/*
+ * get_join_partition_prune_candidate
+ * Pop out the JoinPartitionPruneCandidateInfo for this join and retrieve
+ * the JoinPartitionPruneInfos.
+ */
+List *
+get_join_partition_prune_candidate(PlannerInfo *root)
+{
+ JoinPartitionPruneCandidateInfo *candidate;
+ List *result;
+
+ candidate = llast(root->join_partition_prune_candidates);
+ root->join_partition_prune_candidates =
+ list_delete_last(root->join_partition_prune_candidates);
+
+ if (candidate == NULL)
+ return NIL;
+
+ result = candidate->joinpartprune_info_list;
+
+ pfree(candidate);
+
+ return result;
+}
+
/*
* add_part_relids
* Add new info to a list of Bitmapsets of partitioned relids.
@@ -428,6 +620,8 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* partrelids: Set of RT indexes identifying relevant partitioned tables
* within a single partitioning hierarchy
* relid_subplan_map[]: maps child relation relids to subplan indexes
+ * available_rels: the relid set of rels whose Vars may be used for
+ * pruning.
* matchedsubplans: on success, receives the set of subplan indexes which
* were matched to this partition hierarchy
*
@@ -440,6 +634,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
+ Bitmapset *available_rels,
Bitmapset **matchedsubplans)
{
RelOptInfo *targetpart = NULL;
@@ -539,8 +734,8 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
*/
- gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
- &context);
+ gen_partprune_steps(subpart, partprunequal, available_rels,
+ PARTTARGET_INITIAL, &context);
if (context.contradictory)
{
@@ -567,14 +762,15 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
initial_pruning_steps = NIL;
/*
- * If no exec Params appear in potentially-usable pruning clauses,
- * then there's no point in even thinking about per-scan pruning.
+ * If no exec Params or available Vars appear in potentially-usable
+ * pruning clauses, then there's no point in even thinking about
+ * per-scan pruning.
*/
- if (context.has_exec_param)
+ if (context.has_exec_param || context.has_vars)
{
/* ... OK, we'd better think about it */
- gen_partprune_steps(subpart, partprunequal, PARTTARGET_EXEC,
- &context);
+ gen_partprune_steps(subpart, partprunequal, available_rels,
+ PARTTARGET_EXEC, &context);
if (context.contradictory)
{
@@ -587,11 +783,14 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
/*
* Detect which exec Params actually got used; the fact that some
* were in available clauses doesn't mean we actually used them.
- * Skip per-scan pruning if there are none.
*/
execparamids = get_partkey_exec_paramids(exec_pruning_steps);
- if (bms_is_empty(execparamids))
+ /*
+ * Skip per-scan pruning if no exec Params were actually used and
+ * no available Vars were found.
+ */
+ if (bms_is_empty(execparamids) && !context.has_vars)
exec_pruning_steps = NIL;
}
else
@@ -703,6 +902,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* Process 'clauses' (typically a rel's baserestrictinfo list of clauses)
* and create a list of "partition pruning steps".
*
+ * 'available_rels' is the relid set of rels whose Vars may be used for
+ * pruning.
+ *
* 'target' tells whether to generate pruning steps for planning (use
* immutable clauses only), or for executor startup (use any allowable
* clause except ones containing PARAM_EXEC Params), or for executor
@@ -712,12 +914,13 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* some subsidiary flags; see the GeneratePruningStepsContext typedef.
*/
static void
-gen_partprune_steps(RelOptInfo *rel, List *clauses, PartClauseTarget target,
- GeneratePruningStepsContext *context)
+gen_partprune_steps(RelOptInfo *rel, List *clauses, Bitmapset *available_rels,
+ PartClauseTarget target, GeneratePruningStepsContext *context)
{
/* Initialize all output values to zero/false/NULL */
memset(context, 0, sizeof(GeneratePruningStepsContext));
context->rel = rel;
+ context->available_rels = available_rels;
context->target = target;
/*
@@ -773,7 +976,7 @@ prune_append_rel_partitions(RelOptInfo *rel)
* If the clauses are found to be contradictory, we can return the empty
* set.
*/
- gen_partprune_steps(rel, clauses, PARTTARGET_PLANNER,
+ gen_partprune_steps(rel, clauses, NULL, PARTTARGET_PLANNER,
&gcontext);
if (gcontext.contradictory)
return NULL;
@@ -1957,9 +2160,10 @@ match_clause_to_partition_key(GeneratePruningStepsContext *context,
return PARTCLAUSE_UNSUPPORTED;
/*
- * We can never prune using an expression that contains Vars.
+ * We can never prune using an expression that contains Vars except
+ * for Vars belonging to context->available_rels.
*/
- if (contain_var_clause((Node *) expr))
+ if (contain_forbidden_var_clause((Node *) expr, context))
return PARTCLAUSE_UNSUPPORTED;
/*
@@ -2155,9 +2359,10 @@ match_clause_to_partition_key(GeneratePruningStepsContext *context,
return PARTCLAUSE_UNSUPPORTED;
/*
- * We can never prune using an expression that contains Vars.
+ * We can never prune using an expression that contains Vars except
+ * for Vars belonging to context->available_rels.
*/
- if (contain_var_clause((Node *) rightop))
+ if (contain_forbidden_var_clause((Node *) rightop, context))
return PARTCLAUSE_UNSUPPORTED;
/*
@@ -3727,3 +3932,54 @@ partkey_datum_from_expr(PartitionPruneContext *context,
*value = ExecEvalExprSwitchContext(exprstate, ectx, isnull);
}
}
+
+/*
+ * contain_forbidden_var_clause
+ * Recursively scan a clause to discover whether it contains any Var nodes
+ * (of the current query level) that do not belong to relations in
+ * context->available_rels.
+ *
+ * Returns true if any such Var node is found.
+ *
+ * Does not examine subqueries, therefore must only be used after reduction
+ * of sublinks to subplans!
+ */
+static bool
+contain_forbidden_var_clause(Node *node, GeneratePruningStepsContext *context)
+{
+ return contain_forbidden_var_clause_walker(node, context);
+}
+
+static bool
+contain_forbidden_var_clause_walker(Node *node, GeneratePruningStepsContext *context)
+{
+ if (node == NULL)
+ return false;
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ if (var->varlevelsup != 0)
+ return false;
+
+ if (!bms_is_member(var->varno, context->available_rels))
+ return true; /* abort the tree traversal and return true */
+
+ context->has_vars = true;
+
+ if (context->target != PARTTARGET_EXEC)
+ return true; /* abort the tree traversal and return true */
+
+ return false;
+ }
+ if (IsA(node, CurrentOfExpr))
+ return true;
+ if (IsA(node, PlaceHolderVar))
+ {
+ if (((PlaceHolderVar *) node)->phlevelsup == 0)
+ return true; /* abort the tree traversal and return true */
+ /* else fall through to check the contained expr */
+ }
+ return expression_tree_walker(node, contain_forbidden_var_clause_walker,
+ (void *) context);
+}
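For readers who want to play with the Var check outside the tree, here is a minimal standalone sketch of the rule that contain_forbidden_var_clause_walker() applies to Var nodes. It is not code from the patch: the Expr and VarCheck types, the 32-bit relation mask and the function name contains_forbidden_var are assumptions made up for illustration, and the CurrentOfExpr and PlaceHolderVar cases are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, much simplified expression tree (not PostgreSQL's Node). */
typedef enum ExprKind { EXPR_CONST, EXPR_VAR, EXPR_OP } ExprKind;

typedef struct Expr
{
    ExprKind    kind;
    int         varno;          /* EXPR_VAR only: relation index, 1..31 */
    const struct Expr *left;    /* EXPR_OP operands; may be NULL */
    const struct Expr *right;
} Expr;

typedef struct VarCheck
{
    uint32_t    available_rels; /* bit n set => Vars of relation n allowed */
    bool        exec_target;    /* true when generating exec-time steps */
    bool        has_vars;       /* output: saw a Var of an available rel */
} VarCheck;

/*
 * Return true if 'node' contains a Var that may not be used for pruning:
 * either it belongs to a relation outside available_rels, or it is an
 * allowed Var but we are not generating exec-time pruning steps.
 */
static bool
contains_forbidden_var(const Expr *node, VarCheck *check)
{
    if (node == NULL)
        return false;

    if (node->kind == EXPR_VAR)
    {
        assert(node->varno > 0 && node->varno < 32);

        if ((check->available_rels & ((uint32_t) 1 << node->varno)) == 0)
            return true;        /* Var of a relation we cannot use */

        check->has_vars = true; /* remember that a usable Var exists */
        return !check->exec_target;
    }

    /* Constants have no children; operators recurse into both operands. */
    return contains_forbidden_var(node->left, check) ||
        contains_forbidden_var(node->right, check);
}
```

In this sketch, an expression over a Var of an available relation is accepted only when exec_target is true. For the other targets it is rejected, but has_vars is still set, which is what lets the caller decide that exec-time steps are worth generating.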
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index c09bc83b2a..7e46f0baf6 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -121,11 +121,26 @@ typedef struct PartitionPruneState
PartitionPruningData *partprunedata[FLEXIBLE_ARRAY_MEMBER];
} PartitionPruneState;
+/*
+ * JoinPartitionPruneState - State object required for plan nodes to perform
+ * join partition pruning.
+ */
+typedef struct JoinPartitionPruneState
+{
+ PartitionPruneState *part_prune_state;
+ int paramid;
+ int nplans;
+ bool finished;
+ Bitmapset *part_prune_result;
+} JoinPartitionPruneState;
+
extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
PartitionPruneInfo *pruneinfo,
Bitmapset **initially_valid_subplans);
+extern List *ExecInitJoinpartpruneList(PlanState *planstate, List *joinpartprune_info_list);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ PlanState *planstate);
#endif /* EXECPARTITION_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 444a5f0fd5..8f02af1f10 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -2685,6 +2685,9 @@ typedef struct HashState
/* Parallel hash state. */
struct ParallelHashJoinState *parallel_state;
+
+ /* Infos for join partition pruning. */
+ List *joinpartprune_state_list;
} HashState;
/* ----------------
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 534692bee1..da7803448a 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -530,6 +530,9 @@ struct PlannerInfo
/* not-yet-assigned NestLoopParams */
List *curOuterParams;
+ /* a stack of JoinPartitionPruneCandidateInfos */
+ List *join_partition_prune_candidates;
+
/*
* These fields are workspace for setrefs.c. Each is an array
* corresponding to glob->subplans. (We could probably teach
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index b4ef6bc44c..964bc85123 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -275,6 +275,9 @@ typedef struct Append
/* Info for run-time subplan pruning; NULL if we're not doing that */
struct PartitionPruneInfo *part_prune_info;
+
+ /* Info for join partition pruning; NULL if we're not doing that */
+ Bitmapset *join_prune_paramids;
} Append;
/* ----------------
@@ -310,6 +313,9 @@ typedef struct MergeAppend
/* Info for run-time subplan pruning; NULL if we're not doing that */
struct PartitionPruneInfo *part_prune_info;
+
+ /* Info for join partition pruning; NULL if we're not doing that */
+ Bitmapset *join_prune_paramids;
} MergeAppend;
/* ----------------
@@ -1206,6 +1212,7 @@ typedef struct Hash
bool skewInherit; /* is outer join rel an inheritance tree? */
/* all other info is in the parent HashJoin node */
Cardinality rows_total; /* estimate total rows if parallel_aware */
+ List *joinpartprune_info_list; /* infos for join partition pruning */
} Hash;
/* ----------------
@@ -1552,6 +1559,35 @@ typedef struct PartitionPruneStepCombine
List *source_stepids;
} PartitionPruneStepCombine;
+/*
+ * JoinPartitionPruneCandidateInfo - Information required to build
+ * JoinPartitionPruneInfos.
+ */
+typedef struct JoinPartitionPruneCandidateInfo
+{
+ pg_node_attr(no_equal, no_query_jumble)
+
+ NodeTag type;
+ List *joinrestrictinfo;
+ Bitmapset *inner_relids;
+ double inner_rows;
+ List *joinpartprune_info_list;
+} JoinPartitionPruneCandidateInfo;
+
+/*
+ * JoinPartitionPruneInfo - Details required to allow the executor to prune
+ * partitions during join.
+ */
+typedef struct JoinPartitionPruneInfo
+{
+ pg_node_attr(no_equal, no_query_jumble)
+
+ NodeTag type;
+ PartitionPruneInfo *part_prune_info;
+ int paramid;
+ int nplans;
+} JoinPartitionPruneInfo;
+
/*
* Plan invalidation info
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index b1c51a4e70..5ccd3c62d4 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -212,5 +212,9 @@ extern PathTarget *set_pathtarget_cost_width(PlannerInfo *root, PathTarget *targ
extern double compute_bitmap_pages(PlannerInfo *root, RelOptInfo *baserel,
Path *bitmapqual, double loop_count,
Cost *cost_p, double *tuples_p);
+extern double compute_partprune_cost(PlannerInfo *root, RelOptInfo *appendrel,
+ Cost append_total_cost, int append_nplans,
+ Relids inner_relids, double inner_rows,
+ List *prunequal);
#endif /* COST_H */
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index bd490d154f..5187485c88 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -19,6 +19,8 @@
struct PlannerInfo; /* avoid including pathnodes.h here */
struct RelOptInfo;
+struct Path;
+struct JoinPath;
/*
@@ -73,7 +75,15 @@ typedef struct PartitionPruneContext
extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
struct RelOptInfo *parentrel,
List *subpaths,
- List *prunequal);
+ List *prunequal,
+ Bitmapset *available_rels);
+extern Bitmapset *make_join_partition_pruneinfos(struct PlannerInfo *root,
+ struct RelOptInfo *parentrel,
+ struct Path *best_path,
+ List *subpaths);
+extern void prepare_join_partition_prune_candidate(struct PlannerInfo *root,
+ struct JoinPath *jpath);
+extern List *get_join_partition_prune_candidate(struct PlannerInfo *root);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 9a4c48c055..a08e7a1f0a 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -3003,6 +3003,92 @@ order by tbl1.col1, tprt.col1;
------+------
(0 rows)
+-- join partition pruning
+-- The 'Memory Usage' from the Hash node can vary between machines. Let's just
+-- replace the number with an 'N'.
+-- We need to run EXPLAIN ANALYZE because we need to see '(never executed)'
+-- notations because that's the only way to verify runtime pruning.
+create function explain_join_partition_pruning(text) returns setof text
+language plpgsql as
+$$
+declare
+ ln text;
+begin
+ for ln in
+ execute format('explain (analyze, verbose, costs off, summary off, timing off) %s',
+ $1)
+ loop
+ ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
+ return next ln;
+ end loop;
+end;
+$$;
+delete from tbl1;
+insert into tbl1 values (501), (505);
+analyze tbl1, tprt;
+set enable_nestloop = off;
+set enable_mergejoin = off;
+set enable_hashjoin = on;
+select explain_join_partition_pruning('
+select * from tprt p1
+ inner join tprt p2 on p1.col1 = p2.col1
+ right join tbl1 t on p1.col1 = t.col1 and p2.col1 = t.col1;');
+ explain_join_partition_pruning
+--------------------------------------------------------------------------------
+ Hash Right Join (actual rows=2 loops=1)
+ Output: p1.col1, p2.col1, t.col1
+ Hash Cond: ((p1.col1 = t.col1) AND (p2.col1 = t.col1))
+ -> Hash Join (actual rows=3 loops=1)
+ Output: p1.col1, p2.col1
+ Hash Cond: (p1.col1 = p2.col1)
+ -> Append (actual rows=3 loops=1)
+ Join Partition Pruning: $0
+ -> Seq Scan on public.tprt_1 p1_1 (never executed)
+ Output: p1_1.col1
+ -> Seq Scan on public.tprt_2 p1_2 (actual rows=3 loops=1)
+ Output: p1_2.col1
+ -> Seq Scan on public.tprt_3 p1_3 (never executed)
+ Output: p1_3.col1
+ -> Seq Scan on public.tprt_4 p1_4 (never executed)
+ Output: p1_4.col1
+ -> Seq Scan on public.tprt_5 p1_5 (never executed)
+ Output: p1_5.col1
+ -> Seq Scan on public.tprt_6 p1_6 (never executed)
+ Output: p1_6.col1
+ -> Hash (actual rows=3 loops=1)
+ Output: p2.col1
+ Buckets: 1024 Batches: 1 Memory Usage: NkB
+ -> Append (actual rows=3 loops=1)
+ Join Partition Pruning: $1
+ -> Seq Scan on public.tprt_1 p2_1 (never executed)
+ Output: p2_1.col1
+ -> Seq Scan on public.tprt_2 p2_2 (actual rows=3 loops=1)
+ Output: p2_2.col1
+ -> Seq Scan on public.tprt_3 p2_3 (never executed)
+ Output: p2_3.col1
+ -> Seq Scan on public.tprt_4 p2_4 (never executed)
+ Output: p2_4.col1
+ -> Seq Scan on public.tprt_5 p2_5 (never executed)
+ Output: p2_5.col1
+ -> Seq Scan on public.tprt_6 p2_6 (never executed)
+ Output: p2_6.col1
+ -> Hash (actual rows=2 loops=1)
+ Output: t.col1
+ Buckets: 1024 Batches: 1 Memory Usage: NkB
+ Partition Prune: $0, $1
+ -> Seq Scan on public.tbl1 t (actual rows=2 loops=1)
+ Output: t.col1
+(43 rows)
+
+select * from tprt p1
+ inner join tprt p2 on p1.col1 = p2.col1
+ right join tbl1 t on p1.col1 = t.col1 and p2.col1 = t.col1;
+ col1 | col1 | col1
+------+------+------
+ 501 | 501 | 501
+ 505 | 505 | 505
+(2 rows)
+
drop table tbl1, tprt;
-- Test with columns defined in varying orders between each level
create table part_abc (a int not null, b int not null, c int not null) partition by list (a);
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 7bf3920827..fc5982edcf 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -727,6 +727,45 @@ select tbl1.col1, tprt.col1 from tbl1
inner join tprt on tbl1.col1 = tprt.col1
order by tbl1.col1, tprt.col1;
+-- join partition pruning
+
+-- The 'Memory Usage' from the Hash node can vary between machines. Let's just
+-- replace the number with an 'N'.
+-- We need to run EXPLAIN ANALYZE because we need to see '(never executed)'
+-- notations because that's the only way to verify runtime pruning.
+create function explain_join_partition_pruning(text) returns setof text
+language plpgsql as
+$$
+declare
+ ln text;
+begin
+ for ln in
+ execute format('explain (analyze, verbose, costs off, summary off, timing off) %s',
+ $1)
+ loop
+ ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
+ return next ln;
+ end loop;
+end;
+$$;
+
+delete from tbl1;
+insert into tbl1 values (501), (505);
+analyze tbl1, tprt;
+
+set enable_nestloop = off;
+set enable_mergejoin = off;
+set enable_hashjoin = on;
+
+select explain_join_partition_pruning('
+select * from tprt p1
+ inner join tprt p2 on p1.col1 = p2.col1
+ right join tbl1 t on p1.col1 = t.col1 and p2.col1 = t.col1;');
+
+select * from tprt p1
+ inner join tprt p2 on p1.col1 = p2.col1
+ right join tbl1 t on p1.col1 = t.col1 and p2.col1 = t.col1;
+
drop table tbl1, tprt;
-- Test with columns defined in varying orders between each level
--
2.31.0
On Tue, Jan 30, 2024 at 10:33 AM Richard Guo <guofenglinux@gmail.com> wrote:
> Attached is an updated patch. Nothing else has changed.
Here is another rebase over master so it applies again. Nothing else
has changed.
Thanks
Richard
Attachments:
v7-0001-Support-run-time-partition-pruning-for-hash-join.patch (application/octet-stream)
From 465c96b2c84df1c6dd2ffa3001759c2f18ab3867 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Mon, 14 Aug 2023 14:55:26 +0800
Subject: [PATCH v7] Support run-time partition pruning for hash join
If we have a hash join with an Append node on the outer side, something
like

 Hash Join
   Hash Cond: (pt.a = t.a)
   ->  Append
         ->  Seq Scan on pt_p1 pt_1
         ->  Seq Scan on pt_p2 pt_2
         ->  Seq Scan on pt_p3 pt_3
   ->  Hash
         ->  Seq Scan on t

we can prune those subnodes of the Append that cannot possibly contain
any tuples matching the other side of the join. To do that, while
building the hash table we compute, for each row from the inner side,
the minimum set of subnodes that can possibly match the join condition.
By the time the hash table is built and we start to execute the Append
node, we know which subnodes have survived and can skip the rest.

This patch implements this idea.
---
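As a rough illustration of the idea in the commit message above (this is not code from the patch), the build-time pruning can be sketched in standalone C. The simple range-partition model, the 32-partition bitmask, and the names RangePartitions, partition_for_key and surviving_partitions are all assumptions made up for this sketch; the patch itself works with Bitmapsets and the existing partition pruning steps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: partition i holds keys in [bounds[i], bounds[i + 1]). */
typedef struct RangePartitions
{
    const int  *bounds;         /* nparts + 1 ascending bounds */
    int         nparts;         /* at most 32 in this sketch */
} RangePartitions;

/* Return the index of the partition that can hold 'key', or -1 if none. */
static int
partition_for_key(const RangePartitions *parts, int key)
{
    for (int i = 0; i < parts->nparts; i++)
    {
        if (key >= parts->bounds[i] && key < parts->bounds[i + 1])
            return i;
    }
    return -1;
}

/*
 * Look at every inner-side join key once, as the hash build would, and
 * return a bitmask of the outer-side partitions that can contain a match.
 * Partitions whose bit stays clear never need to be scanned.
 */
static uint32_t
surviving_partitions(const RangePartitions *parts,
                     const int *inner_keys, size_t ninner)
{
    uint32_t    survivors = 0;

    assert(parts->nparts > 0 && parts->nparts <= 32);

    for (size_t i = 0; i < ninner; i++)
    {
        int         p = partition_for_key(parts, inner_keys[i]);

        if (p >= 0)
            survivors |= (uint32_t) 1 << p;
    }
    return survivors;
}
```

With bounds {0, 100, 200, 300} and inner keys 150 and 160, only bit 1 ends up set, so an Append over three such partitions would need to scan just the second one.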
src/backend/commands/explain.c | 61 ++++
src/backend/executor/execPartition.c | 127 +++++++-
src/backend/executor/nodeAppend.c | 32 +-
src/backend/executor/nodeHash.c | 75 +++++
src/backend/executor/nodeHashjoin.c | 10 +
src/backend/executor/nodeMergeAppend.c | 22 +-
src/backend/optimizer/path/costsize.c | 106 +++++++
src/backend/optimizer/plan/createplan.c | 49 ++-
src/backend/optimizer/plan/setrefs.c | 61 ++++
src/backend/partitioning/partprune.c | 298 ++++++++++++++++--
src/include/executor/execPartition.h | 17 +-
src/include/nodes/execnodes.h | 3 +
src/include/nodes/pathnodes.h | 3 +
src/include/nodes/plannodes.h | 36 +++
src/include/optimizer/cost.h | 4 +
src/include/partitioning/partprune.h | 12 +-
src/test/regress/expected/partition_prune.out | 86 +++++
src/test/regress/sql/partition_prune.sql | 39 +++
18 files changed, 992 insertions(+), 49 deletions(-)
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index a9d5056af4..0a5591b95f 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -18,6 +18,7 @@
#include "commands/createas.h"
#include "commands/defrem.h"
#include "commands/prepare.h"
+#include "executor/execPartition.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
#include "nodes/extensible.h"
@@ -117,6 +118,9 @@ static void show_instrumentation_count(const char *qlabel, int which,
PlanState *planstate, ExplainState *es);
static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
static void show_eval_params(Bitmapset *bms_params, ExplainState *es);
+static void show_join_pruning_result_info(Bitmapset *join_prune_paramids,
+ ExplainState *es);
+static void show_joinpartprune_info(HashState *hashstate, ExplainState *es);
static const char *explain_get_index_name(Oid indexId);
static bool peek_buffer_usage(ExplainState *es, const BufferUsage *usage);
static void show_buffer_usage(ExplainState *es, const BufferUsage *usage);
@@ -2115,9 +2119,17 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_incremental_sort_info(castNode(IncrementalSortState, planstate),
es);
break;
+ case T_Append:
+ if (es->verbose)
+ show_join_pruning_result_info(((Append *) plan)->join_prune_paramids,
+ es);
+ break;
case T_MergeAppend:
show_merge_append_keys(castNode(MergeAppendState, planstate),
ancestors, es);
+ if (es->verbose)
+ show_join_pruning_result_info(((MergeAppend *) plan)->join_prune_paramids,
+ es);
break;
case T_Result:
show_upper_qual((List *) ((Result *) plan)->resconstantqual,
@@ -2133,6 +2145,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
break;
case T_Hash:
show_hash_info(castNode(HashState, planstate), es);
+ if (es->verbose)
+ show_joinpartprune_info(castNode(HashState, planstate), es);
break;
case T_Memoize:
show_memoize_info(castNode(MemoizeState, planstate), ancestors,
@@ -3573,6 +3587,53 @@ show_eval_params(Bitmapset *bms_params, ExplainState *es)
ExplainPropertyList("Params Evaluated", params, es);
}
+/*
+ * Show join partition pruning results at Append/MergeAppend nodes.
+ */
+static void
+show_join_pruning_result_info(Bitmapset *join_prune_paramids, ExplainState *es)
+{
+ int paramid = -1;
+ List *params = NIL;
+
+ if (bms_is_empty(join_prune_paramids))
+ return;
+
+ while ((paramid = bms_next_member(join_prune_paramids, paramid)) >= 0)
+ {
+ char param[32];
+
+ snprintf(param, sizeof(param), "$%d", paramid);
+ params = lappend(params, pstrdup(param));
+ }
+
+ ExplainPropertyList("Join Partition Pruning", params, es);
+}
+
+/*
+ * Show join partition pruning infos at Hash nodes.
+ */
+static void
+show_joinpartprune_info(HashState *hashstate, ExplainState *es)
+{
+ List *params = NIL;
+ ListCell *lc;
+
+ if (!hashstate->joinpartprune_state_list)
+ return;
+
+ foreach(lc, hashstate->joinpartprune_state_list)
+ {
+ JoinPartitionPruneState *jpstate = (JoinPartitionPruneState *) lfirst(lc);
+ char param[32];
+
+ snprintf(param, sizeof(param), "$%d", jpstate->paramid);
+ params = lappend(params, pstrdup(param));
+ }
+
+ ExplainPropertyList("Partition Prune", params, es);
+}
+
/*
* Fetch the name of an index in an EXPLAIN
*
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 64fcb012db..d715827972 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -196,6 +196,8 @@ static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
Bitmapset **validsubplans);
+static bool get_join_prune_matching_subplans(PlanState *planstate,
+ Bitmapset **partset);
/*
@@ -1805,7 +1807,7 @@ ExecInitPartitionPruning(PlanState *planstate,
* Perform an initial partition prune pass, if required.
*/
if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true, NULL);
else
{
/* No pruning, so we'll need to initialize all subplans */
@@ -1835,6 +1837,37 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecInitJoinpartpruneList
+ * Initialize data structures needed for join partition pruning
+ */
+List *
+ExecInitJoinpartpruneList(PlanState *planstate,
+ List *joinpartprune_info_list)
+{
+ ListCell *lc;
+ List *result = NIL;
+
+ foreach(lc, joinpartprune_info_list)
+ {
+ JoinPartitionPruneInfo *jpinfo = (JoinPartitionPruneInfo *) lfirst(lc);
+ JoinPartitionPruneState *jpstate = palloc(sizeof(JoinPartitionPruneState));
+
+ jpstate->part_prune_state =
+ CreatePartitionPruneState(planstate, jpinfo->part_prune_info);
+ Assert(jpstate->part_prune_state->do_exec_prune);
+
+ jpstate->paramid = jpinfo->paramid;
+ jpstate->nplans = jpinfo->nplans;
+ jpstate->finished = false;
+ jpstate->part_prune_result = NULL;
+
+ result = lappend(result, jpstate);
+ }
+
+ return result;
+}
+
/*
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
@@ -2272,7 +2305,9 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
/*
* ExecFindMatchingSubPlans
* Determine which subplans match the pruning steps detailed in
- * 'prunestate' for the current comparison expression values.
+ * 'prunestate', if any, for the current comparison expression values, and
+ * intersect them with the join partition pruning results, if any, that are
+ * referenced by the Append/MergeAppend node's join_prune_paramids.
*
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
@@ -2280,11 +2315,30 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ PlanState *planstate)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
int i;
+ Bitmapset *join_prune_partset = NULL;
+ bool do_join_prune;
+
+ /* Retrieve the join partition pruning results if any */
+ do_join_prune =
+ get_join_prune_matching_subplans(planstate, &join_prune_partset);
+
+ /*
+ * Either we're here to prune according to the pruning steps detailed in
+ * 'prunestate', or join partition pruning has been set up for this node.
+ */
+ Assert(do_join_prune || prunestate != NULL);
+
+ /*
+ * If there is no 'prunestate', then rely entirely on join pruning.
+ */
+ if (prunestate == NULL)
+ return join_prune_partset;
/*
* Either we're here on the initial prune done during pruning
@@ -2325,6 +2379,10 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Add in any subplans that partition pruning didn't account for */
result = bms_add_members(result, prunestate->other_subplans);
+ /* Intersect join partition pruning results */
+ if (do_join_prune)
+ result = bms_intersect(result, join_prune_partset);
+
MemoryContextSwitchTo(oldcontext);
/* Copy result out of the temp context before we reset it */
@@ -2395,3 +2453,66 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
}
}
}
+
+/*
+ * get_join_prune_matching_subplans
+ * Retrieve the join partition pruning results, if any, referenced by the
+ * Append/MergeAppend node's join_prune_paramids. Return true if we can
+ * do join partition pruning, otherwise return false.
+ *
+ * Adds valid (non-prunable) subplan IDs to *partset
+ */
+static bool
+get_join_prune_matching_subplans(PlanState *planstate, Bitmapset **partset)
+{
+ Bitmapset *join_prune_paramids;
+ int nplans;
+ int paramid;
+
+ if (planstate == NULL)
+ return false;
+
+ if (IsA(planstate, AppendState))
+ {
+ join_prune_paramids =
+ ((Append *) planstate->plan)->join_prune_paramids;
+ nplans = ((AppendState *) planstate)->as_nplans;
+ }
+ else if (IsA(planstate, MergeAppendState))
+ {
+ join_prune_paramids =
+ ((MergeAppend *) planstate->plan)->join_prune_paramids;
+ nplans = ((MergeAppendState *) planstate)->ms_nplans;
+ }
+ else
+ {
+ elog(ERROR, "unrecognized node type: %d", (int) nodeTag(planstate));
+ return false;
+ }
+
+ if (bms_is_empty(join_prune_paramids))
+ return false;
+
+ Assert(nplans > 0);
+ *partset = bms_add_range(NULL, 0, nplans - 1);
+
+ paramid = -1;
+ while ((paramid = bms_next_member(join_prune_paramids, paramid)) >= 0)
+ {
+ ParamExecData *param;
+ JoinPartitionPruneState *jpstate;
+
+ param = &(planstate->state->es_param_exec_vals[paramid]);
+ Assert(param->execPlan == NULL);
+ Assert(!param->isnull);
+ jpstate = (JoinPartitionPruneState *) DatumGetPointer(param->value);
+
+ if (jpstate != NULL)
+ *partset = bms_intersect(*partset, jpstate->part_prune_result);
+ else /* the Hash node for this pruning has not been executed */
+ elog(WARNING, "Join partition pruning $%d has not been performed yet.",
+ paramid);
+ }
+
+ return true;
+}
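The way get_join_prune_matching_subplans() combines results from several joins can be sketched on its own, assuming plain 32-bit masks instead of Bitmapsets. This is only an illustration, not code from the patch: JoinPruneResult, its 'available' flag (standing in for "the Hash node has stored its result") and combine_join_prune_results are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the per-join pruning result: 'available' tells
 * whether the corresponding hash build has stored its result yet, and
 * 'result' is the mask of subplans that survived that join's pruning.
 */
typedef struct JoinPruneResult
{
    int         available;
    uint32_t    result;
} JoinPruneResult;

/*
 * Start from "all subplans are valid" and intersect with every available
 * per-join result.  A result that is not available yet cannot be used to
 * prune anything, so it is skipped.
 */
static uint32_t
combine_join_prune_results(int nplans,
                           const JoinPruneResult *results, size_t nresults)
{
    uint32_t    valid;

    assert(nplans > 0 && nplans <= 32);

    /* Shifting a 32-bit value by 32 is undefined, so special-case it. */
    valid = (nplans == 32) ? UINT32_MAX : (((uint32_t) 1 << nplans) - 1);

    for (size_t i = 0; i < nresults; i++)
    {
        if (results[i].available)
            valid &= results[i].result;
    }
    return valid;
}
```

For an Append with three subplans, a join that keeps subplans {0, 1} and another that keeps {1, 2} leave only subplan 1, which matches the multi-level case where the final surviving subnodes are the intersection of the per-join results.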
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index ca0f54d676..cd5ec550fc 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -151,11 +151,13 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
nplans = bms_num_members(validsubplans);
/*
- * When no run-time pruning is required and there's at least one
- * subplan, we can fill as_valid_subplans immediately, preventing
- * later calls to ExecFindMatchingSubPlans.
+ * When no run-time pruning or join pruning is required and there's at
+ * least one subplan, we can fill as_valid_subplans immediately,
+ * preventing later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (!prunestate->do_exec_prune &&
+ bms_is_empty(node->join_prune_paramids) &&
+ nplans > 0)
{
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
appendstate->as_valid_subplans_identified = true;
@@ -170,10 +172,18 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplans as valid; they must also all be initialized.
*/
Assert(nplans > 0);
- appendstate->as_valid_subplans = validsubplans =
- bms_add_range(NULL, 0, nplans - 1);
- appendstate->as_valid_subplans_identified = true;
+ validsubplans = bms_add_range(NULL, 0, nplans - 1);
appendstate->as_prune_state = NULL;
+
+ /*
+ * When join pruning is not enabled we can fill as_valid_subplans
+ * immediately, preventing later calls to ExecFindMatchingSubPlans.
+ */
+ if (bms_is_empty(node->join_prune_paramids))
+ {
+ appendstate->as_valid_subplans = validsubplans;
+ appendstate->as_valid_subplans_identified = true;
+ }
}
/*
@@ -580,7 +590,7 @@ choose_next_subplan_locally(AppendState *node)
else if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, &node->ps);
node->as_valid_subplans_identified = true;
}
@@ -647,7 +657,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, &node->ps);
node->as_valid_subplans_identified = true;
/*
@@ -723,7 +733,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, &node->ps);
node->as_valid_subplans_identified = true;
mark_invalid_subplans_as_finished(node);
@@ -876,7 +886,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (!node->as_valid_subplans_identified)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, &node->ps);
node->as_valid_subplans_identified = true;
classify_matching_subplans(node);
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 61480733a1..39658b9fc5 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -30,6 +30,7 @@
#include "access/parallel.h"
#include "catalog/pg_statistic.h"
#include "commands/tablespace.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/hashjoin.h"
#include "executor/nodeHash.h"
@@ -47,6 +48,8 @@ static void ExecHashIncreaseNumBatches(HashJoinTable hashtable);
static void ExecHashIncreaseNumBuckets(HashJoinTable hashtable);
static void ExecParallelHashIncreaseNumBatches(HashJoinTable hashtable);
static void ExecParallelHashIncreaseNumBuckets(HashJoinTable hashtable);
+static void ExecJoinPartitionPrune(HashState *node);
+static void ExecStoreJoinPartitionPruneResult(HashState *node);
static void ExecHashBuildSkewHash(HashJoinTable hashtable, Hash *node,
int mcvsToUse);
static void ExecHashSkewTableInsert(HashJoinTable hashtable,
@@ -188,8 +191,14 @@ MultiExecPrivateHash(HashState *node)
}
hashtable->totalTuples += 1;
}
+
+ /* Perform join partition pruning */
+ ExecJoinPartitionPrune(node);
}
+ /* Store the surviving partitions for Append/MergeAppend nodes */
+ ExecStoreJoinPartitionPruneResult(node);
+
/* resize the hash table if needed (NTUP_PER_BUCKET exceeded) */
if (hashtable->nbuckets != hashtable->nbuckets_optimal)
ExecHashIncreaseNumBuckets(hashtable);
@@ -400,6 +409,12 @@ ExecInitHash(Hash *node, EState *estate, int eflags)
hashstate->hashkeys =
ExecInitExprList(node->hashkeys, (PlanState *) hashstate);
+ /*
+ * initialize join partition pruning infos
+ */
+ hashstate->joinpartprune_state_list =
+ ExecInitJoinpartpruneList(&hashstate->ps, node->joinpartprune_info_list);
+
return hashstate;
}
@@ -1609,6 +1624,56 @@ ExecParallelHashIncreaseNumBuckets(HashJoinTable hashtable)
}
}
+/*
+ * ExecJoinPartitionPrune
+ * Perform join partition pruning at this join for each
+ * JoinPartitionPruneState.
+ */
+static void
+ExecJoinPartitionPrune(HashState *node)
+{
+ ListCell *lc;
+
+ foreach(lc, node->joinpartprune_state_list)
+ {
+ JoinPartitionPruneState *jpstate = (JoinPartitionPruneState *) lfirst(lc);
+ Bitmapset *matching_subPlans;
+
+ if (jpstate->finished)
+ continue;
+
+ matching_subPlans =
+ ExecFindMatchingSubPlans(jpstate->part_prune_state, false, NULL);
+ jpstate->part_prune_result =
+ bms_add_members(jpstate->part_prune_result, matching_subPlans);
+
+ if (bms_num_members(jpstate->part_prune_result) == jpstate->nplans)
+ jpstate->finished = true;
+ }
+}
+
+/*
+ * ExecStoreJoinPartitionPruneResult
+ * For each JoinPartitionPruneState, store the set of surviving partitions
+ * to make it available for the Append/MergeAppend node.
+ */
+static void
+ExecStoreJoinPartitionPruneResult(HashState *node)
+{
+ ListCell *lc;
+
+ foreach(lc, node->joinpartprune_state_list)
+ {
+ JoinPartitionPruneState *jpstate = (JoinPartitionPruneState *) lfirst(lc);
+ ParamExecData *param;
+
+ param = &(node->ps.state->es_param_exec_vals[jpstate->paramid]);
+ Assert(param->execPlan == NULL);
+ Assert(!param->isnull);
+ param->value = PointerGetDatum(jpstate);
+ }
+}
+
/*
* ExecHashTableInsert
* insert a tuple into the hash table depending on the hash value
@@ -2353,6 +2418,16 @@ void
ExecReScanHash(HashState *node)
{
PlanState *outerPlan = outerPlanState(node);
+ ListCell *lc;
+
+ /* reset the state in JoinPartitionPruneStates */
+ foreach(lc, node->joinpartprune_state_list)
+ {
+ JoinPartitionPruneState *jpstate = (JoinPartitionPruneState *) lfirst(lc);
+
+ jpstate->finished = false;
+ jpstate->part_prune_result = NULL;
+ }
/*
* if chgParam of subnode is not null then plan will be re-scanned by
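
A note on the nodeHash.c changes above: the per-row work in ExecJoinPartitionPrune() amounts to OR-ing each inner row's matching subplans into a running set and stopping once every subplan is present, and ExecReScanHash() resets that set. Below is a minimal standalone sketch of that idea, not the patch's code. It assumes at most 64 subplans and uses a plain 64-bit mask in place of a Bitmapset; all Sketch*/sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for JoinPartitionPruneState. */
typedef struct SketchJoinPruneState
{
	int			nplans;		/* number of Append subplans (<= 64 here) */
	bool		finished;	/* true once every subplan has matched */
	uint64_t	result;		/* bit i set => subplan i survives */
} SketchJoinPruneState;

/* Merge the subplans matched by one inner row into the running result. */
static void
sketch_prune_add_row(SketchJoinPruneState *st, uint64_t matching)
{
	uint64_t	all;

	/* Nothing can be pruned any more, so skip the work. */
	if (st->finished)
		return;

	st->result |= matching;

	all = (st->nplans >= 64) ? UINT64_MAX
		: ((UINT64_C(1) << st->nplans) - 1);
	if ((st->result & all) == all)
		st->finished = true;
}

/* Reset the state, as a rescan of the Hash node would. */
static void
sketch_prune_reset(SketchJoinPruneState *st)
{
	st->finished = false;
	st->result = 0;
}
```

The early exit through 'finished' is what keeps the per-row overhead bounded once pruning can no longer remove anything.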
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index dbf114cd5e..2a9d2c6482 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -310,6 +310,16 @@ ExecHashJoinImpl(PlanState *pstate, bool parallel)
*/
node->hj_FirstOuterTupleSlot = NULL;
}
+ else if (hashNode->joinpartprune_state_list != NIL)
+ {
+ /*
+ * If any JoinPartitionPruneState is to be evaluated at this
+ * hash node, the hash table must be built so that join
+ * partition pruning can run. So do not apply the empty-outer
+ * optimization in this case.
+ */
+ node->hj_FirstOuterTupleSlot = NULL;
+ }
else if (HJ_FILL_OUTER(node) ||
(outerNode->plan->startup_cost < hashNode->ps.plan->total_cost &&
!node->hj_OuterNotEmpty))
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index e1b9b984a7..df921ad5ad 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -99,11 +99,13 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
nplans = bms_num_members(validsubplans);
/*
- * When no run-time pruning is required and there's at least one
- * subplan, we can fill ms_valid_subplans immediately, preventing
- * later calls to ExecFindMatchingSubPlans.
+ * When no run-time pruning or join pruning is required and there's at
+ * least one subplan, we can fill ms_valid_subplans immediately,
+ * preventing later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (!prunestate->do_exec_prune &&
+ bms_is_empty(node->join_prune_paramids) &&
+ nplans > 0)
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -115,9 +117,15 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplans as valid; they must also all be initialized.
*/
Assert(nplans > 0);
- mergestate->ms_valid_subplans = validsubplans =
- bms_add_range(NULL, 0, nplans - 1);
+ validsubplans = bms_add_range(NULL, 0, nplans - 1);
mergestate->ms_prune_state = NULL;
+
+ /*
+ * When join pruning is not enabled we can fill ms_valid_subplans
+ * immediately, preventing later calls to ExecFindMatchingSubPlans.
+ */
+ if (bms_is_empty(node->join_prune_paramids))
+ mergestate->ms_valid_subplans = validsubplans;
}
mergeplanstates = (PlanState **) palloc(nplans * sizeof(PlanState *));
@@ -218,7 +226,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, &node->ps);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 83a0aed051..382413e666 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -172,6 +172,10 @@ static void get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
static bool has_indexed_join_quals(NestPath *path);
static double approx_tuple_count(PlannerInfo *root, JoinPath *path,
List *quals);
+static double get_joinrel_matching_outer_size(PlannerInfo *root,
+ RelOptInfo *outer_rel,
+ Relids inner_relids,
+ List *restrictlist);
static double calc_joinrel_size_estimate(PlannerInfo *root,
RelOptInfo *joinrel,
RelOptInfo *outer_rel,
@@ -5408,6 +5412,61 @@ get_parameterized_joinrel_size(PlannerInfo *root, RelOptInfo *rel,
return nrows;
}
+/*
+ * get_joinrel_matching_outer_size
+ * Estimate the number of outer-side rows that match the inner side.
+ */
+static double
+get_joinrel_matching_outer_size(PlannerInfo *root,
+ RelOptInfo *outer_rel,
+ Relids inner_relids,
+ List *restrictlist)
+{
+ double nrows;
+ Selectivity fkselec;
+ Selectivity jselec;
+ SpecialJoinInfo *sjinfo;
+ SpecialJoinInfo sjinfo_data;
+
+ sjinfo = &sjinfo_data;
+ sjinfo->type = T_SpecialJoinInfo;
+ sjinfo->min_lefthand = outer_rel->relids;
+ sjinfo->min_righthand = inner_relids;
+ sjinfo->syn_lefthand = outer_rel->relids;
+ sjinfo->syn_righthand = inner_relids;
+ sjinfo->jointype = JOIN_SEMI;
+ sjinfo->ojrelid = 0;
+ sjinfo->commute_above_l = NULL;
+ sjinfo->commute_above_r = NULL;
+ sjinfo->commute_below_l = NULL;
+ sjinfo->commute_below_r = NULL;
+ /* we don't bother trying to make the remaining fields valid */
+ sjinfo->lhs_strict = false;
+ sjinfo->semi_can_btree = false;
+ sjinfo->semi_can_hash = false;
+ sjinfo->semi_operators = NIL;
+ sjinfo->semi_rhs_exprs = NIL;
+
+ fkselec = get_foreign_key_join_selectivity(root,
+ outer_rel->relids,
+ inner_relids,
+ sjinfo,
+ &restrictlist);
+ jselec = clauselist_selectivity(root,
+ restrictlist,
+ 0,
+ sjinfo->jointype,
+ sjinfo);
+
+ nrows = outer_rel->rows * fkselec * jselec;
+ nrows = clamp_row_est(nrows);
+
+ /* For safety, make sure result is not more than the base estimate */
+ if (nrows > outer_rel->rows)
+ nrows = outer_rel->rows;
+ return nrows;
+}
+
/*
* calc_joinrel_size_estimate
* Workhorse for set_joinrel_size_estimates and
@@ -6529,3 +6588,50 @@ compute_bitmap_pages(PlannerInfo *root, RelOptInfo *baserel,
return pages_fetched;
}
+
+/*
+ * compute_partprune_cost
+ * Compute the net cost of join partition pruning: the pruning overhead
+ * minus the estimated cost saved by skipping pruned subplans.
+ */
+double
+compute_partprune_cost(PlannerInfo *root, RelOptInfo *appendrel,
+ Cost append_total_cost, int append_nplans,
+ Relids inner_relids, double inner_rows,
+ List *prunequal)
+{
+ Cost prune_cost;
+ Cost saved_cost;
+ double matching_outer_rows;
+ double unmatched_nplans;
+
+ switch (appendrel->part_scheme->strategy)
+ {
+
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ prune_cost = cpu_operator_cost * LOG2(append_nplans) * inner_rows;
+ break;
+ case PARTITION_STRATEGY_HASH:
+ prune_cost = cpu_operator_cost * append_nplans * inner_rows;
+ break;
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ (int) appendrel->part_scheme->strategy);
+ break;
+ }
+
+ matching_outer_rows =
+ get_joinrel_matching_outer_size(root,
+ appendrel,
+ inner_relids,
+ prunequal);
+
+ /*
+ * We assume that each matching outer row falls into a different
+ * partition. This is really the worst case.
+ */
+ unmatched_nplans = append_nplans - Min(matching_outer_rows, append_nplans);
+ saved_cost = (unmatched_nplans / append_nplans) * append_total_cost;
+
+ return prune_cost - saved_cost;
+}
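
To make the arithmetic in compute_partprune_cost() concrete, here is a small standalone sketch of the same formula. It is only an illustration under stated assumptions: the sketch_* names are hypothetical, cpu_operator_cost is hard-wired to PostgreSQL's default of 0.0025, the number of matching outer rows is passed in directly rather than estimated from selectivities, and an integer ceiling of log2 replaces the planner's LOG2 macro so that the example needs no math library.

```c
#include <assert.h>

/* Hypothetical stand-ins for the partition strategies. */
typedef enum
{
	SKETCH_STRATEGY_LIST,
	SKETCH_STRATEGY_RANGE,
	SKETCH_STRATEGY_HASH
} SketchStrategy;

/* Assumption: PostgreSQL's default value for cpu_operator_cost. */
static const double sketch_cpu_operator_cost = 0.0025;

/* Integer ceiling of log2(n) for n >= 1 (simplification of LOG2). */
static int
sketch_ceil_log2(int n)
{
	int			steps = 0;
	int			v = 1;

	while (v < n)
	{
		v <<= 1;
		steps++;
	}
	return steps;
}

/* Net cost of pruning: overhead minus estimated savings. */
static double
sketch_partprune_net_cost(SketchStrategy strategy,
						  double append_total_cost, int append_nplans,
						  double inner_rows, double matching_outer_rows)
{
	double		prune_cost;
	double		matched;
	double		saved_cost;

	/* Hash partitioning is charged per subplan, list/range per lookup. */
	if (strategy == SKETCH_STRATEGY_HASH)
		prune_cost = sketch_cpu_operator_cost * append_nplans * inner_rows;
	else
		prune_cost = sketch_cpu_operator_cost *
			sketch_ceil_log2(append_nplans) * inner_rows;

	/* Worst case: every matching outer row hits a different subplan. */
	matched = (matching_outer_rows < append_nplans) ?
		matching_outer_rows : (double) append_nplans;
	saved_cost = ((append_nplans - matched) / append_nplans) *
		append_total_cost;

	return prune_cost - saved_cost;
}
```

With 8 range partitions, 100 inner rows, an Append cost of 1000 and 2 matching outer rows, the overhead is 0.75 and the saving is 750, so the result is negative and pruning would be attempted. If the matching outer rows are at least as many as the subplans, the saving is zero and the join is skipped.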
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 610f4a56d6..55cd3bb616 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -242,7 +242,8 @@ static Hash *make_hash(Plan *lefttree,
List *hashkeys,
Oid skewTable,
AttrNumber skewColumn,
- bool skewInherit);
+ bool skewInherit,
+ List *joinpartprune_info_list);
static MergeJoin *make_mergejoin(List *tlist,
List *joinclauses, List *otherclauses,
List *mergeclauses,
@@ -342,6 +343,7 @@ create_plan(PlannerInfo *root, Path *best_path)
/* Initialize this module's workspace in PlannerInfo */
root->curOuterRels = NULL;
root->curOuterParams = NIL;
+ root->join_partition_prune_candidates = NIL;
/* Recursively process the path tree, demanding the correct tlist result */
plan = create_plan_recurse(root, best_path, CP_EXACT_TLIST);
@@ -369,6 +371,8 @@ create_plan(PlannerInfo *root, Path *best_path)
if (root->curOuterParams != NIL)
elog(ERROR, "failed to assign all NestLoopParams to plan nodes");
+ Assert(root->join_partition_prune_candidates == NIL);
+
/*
* Reset plan_params to ensure param IDs used for nestloop params are not
* re-used later
@@ -1223,6 +1227,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
int nasyncplans = 0;
RelOptInfo *rel = best_path->path.parent;
PartitionPruneInfo *partpruneinfo = NULL;
+ Bitmapset *join_prune_paramids = NULL;
int nodenumsortkeys = 0;
AttrNumber *nodeSortColIdx = NULL;
Oid *nodeSortOperators = NULL;
@@ -1377,6 +1382,8 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
* do partition pruning.
+ *
+ * Also gather information needed by the executor to do join pruning.
*/
if (enable_partition_pruning)
{
@@ -1399,13 +1406,20 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
partpruneinfo =
make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ NULL);
+
+ join_prune_paramids =
+ make_join_partition_pruneinfos(root, rel,
+ (Path *) best_path,
+ best_path->subpaths);
}
plan->appendplans = subplans;
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
plan->part_prune_info = partpruneinfo;
+ plan->join_prune_paramids = join_prune_paramids;
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -1445,6 +1459,7 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
PartitionPruneInfo *partpruneinfo = NULL;
+ Bitmapset *join_prune_paramids = NULL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1541,6 +1556,8 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
* do partition pruning.
+ *
+ * Also gather information needed by the executor to do join pruning.
*/
if (enable_partition_pruning)
{
@@ -1554,11 +1571,18 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
if (prunequal != NIL)
partpruneinfo = make_partition_pruneinfo(root, rel,
best_path->subpaths,
- prunequal);
+ prunequal,
+ NULL);
+
+ join_prune_paramids =
+ make_join_partition_pruneinfos(root, rel,
+ (Path *) best_path,
+ best_path->subpaths);
}
node->mergeplans = subplans;
node->part_prune_info = partpruneinfo;
+ node->join_prune_paramids = join_prune_paramids;
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
@@ -4743,6 +4767,13 @@ create_hashjoin_plan(PlannerInfo *root,
AttrNumber skewColumn = InvalidAttrNumber;
bool skewInherit = false;
ListCell *lc;
+ List *joinpartprune_info_list;
+
+ /*
+ * Collect information required to build JoinPartitionPruneInfos at this
+ * join.
+ */
+ prepare_join_partition_prune_candidate(root, &best_path->jpath);
/*
* HashJoin can project, so we don't have to demand exact tlists from the
@@ -4754,6 +4785,11 @@ create_hashjoin_plan(PlannerInfo *root,
outer_plan = create_plan_recurse(root, best_path->jpath.outerjoinpath,
(best_path->num_batches > 1) ? CP_SMALL_TLIST : 0);
+ /*
+ * Retrieve all the JoinPartitionPruneInfos for this join.
+ */
+ joinpartprune_info_list = get_join_partition_prune_candidate(root);
+
inner_plan = create_plan_recurse(root, best_path->jpath.innerjoinpath,
CP_SMALL_TLIST);
@@ -4859,7 +4895,8 @@ create_hashjoin_plan(PlannerInfo *root,
inner_hashkeys,
skewTable,
skewColumn,
- skewInherit);
+ skewInherit,
+ joinpartprune_info_list);
/*
* Set Hash node's startup & total costs equal to total cost of input
@@ -5986,7 +6023,8 @@ make_hash(Plan *lefttree,
List *hashkeys,
Oid skewTable,
AttrNumber skewColumn,
- bool skewInherit)
+ bool skewInherit,
+ List *joinpartprune_info_list)
{
Hash *node = makeNode(Hash);
Plan *plan = &node->plan;
@@ -6000,6 +6038,7 @@ make_hash(Plan *lefttree,
node->skewTable = skewTable;
node->skewColumn = skewColumn;
node->skewInherit = skewInherit;
+ node->joinpartprune_info_list = joinpartprune_info_list;
return node;
}
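
The createplan.c changes rely on a stack discipline: create_hashjoin_plan() pushes a candidate (or NULL when the join is not eligible) before recursing into the outer side, any Append found below offers itself to every non-NULL candidate on the stack, and the join pops its entry again once the outer plan is built. The following standalone sketch illustrates only that discipline. The fixed-size array and all Sketch*/sketch_* names are hypothetical; the patch keeps the candidates in a List hanging off PlannerInfo.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_DEPTH 16

/* Hypothetical stand-in for JoinPartitionPruneCandidateInfo. */
typedef struct SketchCandidate
{
	int			join_id;	/* which join pushed this entry */
	int			ninfos;		/* prune infos collected from Appends below */
} SketchCandidate;

typedef struct SketchStack
{
	SketchCandidate *items[SKETCH_MAX_DEPTH];
	int			depth;
} SketchStack;

/* Called when entering a join; pass NULL for an ineligible join. */
static void
sketch_push(SketchStack *s, SketchCandidate *c)
{
	assert(s->depth < SKETCH_MAX_DEPTH);
	s->items[s->depth++] = c;
}

/* Called after the join's outer side has been planned. */
static SketchCandidate *
sketch_pop(SketchStack *s)
{
	assert(s->depth > 0);
	return s->items[--s->depth];
}

/* An Append registers one prune info with every eligible join above it. */
static int
sketch_visit_append(SketchStack *s)
{
	int			nregistered = 0;
	int			i;

	for (i = 0; i < s->depth; i++)
	{
		if (s->items[i] != NULL)
		{
			s->items[i]->ninfos++;
			nregistered++;
		}
	}
	return nregistered;
}
```

Pushing a NULL placeholder for ineligible joins keeps push and pop balanced, which is what allows create_plan() to assert that the list is empty at the end.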
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 42603dbc7c..573446b989 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -155,6 +155,11 @@ static Plan *set_mergeappend_references(PlannerInfo *root,
MergeAppend *mplan,
int rtoffset);
static void set_hash_references(PlannerInfo *root, Plan *plan, int rtoffset);
+static void set_joinpartitionprune_references(PlannerInfo *root,
+ List *joinpartprune_info_list,
+ indexed_tlist *outer_itlist,
+ int rtoffset,
+ double num_exec);
static Relids offset_relid_set(Relids relids, int rtoffset);
static Node *fix_scan_expr(PlannerInfo *root, Node *node,
int rtoffset, double num_exec);
@@ -1896,6 +1901,62 @@ set_hash_references(PlannerInfo *root, Plan *plan, int rtoffset)
/* Hash nodes don't have their own quals */
Assert(plan->qual == NIL);
+
+ set_joinpartitionprune_references(root,
+ hplan->joinpartprune_info_list,
+ outer_itlist,
+ rtoffset,
+ NUM_EXEC_TLIST(plan));
+}
+
+/*
+ * set_joinpartitionprune_references
+ * Do set_plan_references processing on JoinPartitionPruneInfos
+ */
+static void
+set_joinpartitionprune_references(PlannerInfo *root,
+ List *joinpartprune_info_list,
+ indexed_tlist *outer_itlist,
+ int rtoffset,
+ double num_exec)
+{
+ ListCell *l;
+
+ foreach(l, joinpartprune_info_list)
+ {
+ JoinPartitionPruneInfo *jpinfo = (JoinPartitionPruneInfo *) lfirst(l);
+ ListCell *l1;
+
+ foreach(l1, jpinfo->part_prune_info->prune_infos)
+ {
+ List *prune_infos = lfirst(l1);
+ ListCell *l2;
+
+ foreach(l2, prune_infos)
+ {
+ PartitionedRelPruneInfo *pinfo = lfirst(l2);
+
+ pinfo->rtindex += rtoffset;
+
+ pinfo->initial_pruning_steps = (List *)
+ fix_upper_expr(root,
+ (Node *) pinfo->initial_pruning_steps,
+ outer_itlist,
+ OUTER_VAR,
+ rtoffset,
+ NRM_EQUAL,
+ num_exec);
+ pinfo->exec_pruning_steps = (List *)
+ fix_upper_expr(root,
+ (Node *) pinfo->exec_pruning_steps,
+ outer_itlist,
+ OUTER_VAR,
+ rtoffset,
+ NRM_EQUAL,
+ num_exec);
+ }
+ }
+ }
}
/*
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 9006afd9d2..c0cf7c3c86 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -48,7 +48,9 @@
#include "optimizer/appendinfo.h"
#include "optimizer/cost.h"
#include "optimizer/optimizer.h"
+#include "optimizer/paramassign.h"
#include "optimizer/pathnode.h"
+#include "optimizer/restrictinfo.h"
#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partprune.h"
@@ -102,15 +104,16 @@ typedef enum PartClauseTarget
*
* gen_partprune_steps() initializes and returns an instance of this struct.
*
- * Note that has_mutable_op, has_mutable_arg, and has_exec_param are set if
- * we found any potentially-useful-for-pruning clause having those properties,
- * whether or not we actually used the clause in the steps list. This
- * definition allows us to skip the PARTTARGET_EXEC pass in some cases.
+ * Note that has_mutable_op, has_mutable_arg, has_exec_param and has_vars are
+ * set if we found any potentially-useful-for-pruning clause having those
+ * properties, whether or not we actually used the clause in the steps list.
+ * This definition allows us to skip the PARTTARGET_EXEC pass in some cases.
*/
typedef struct GeneratePruningStepsContext
{
/* Copies of input arguments for gen_partprune_steps: */
RelOptInfo *rel; /* the partitioned relation */
+ Bitmapset *available_rels; /* rels whose Vars may be used for pruning */
PartClauseTarget target; /* use-case we're generating steps for */
/* Result data: */
List *steps; /* list of PartitionPruneSteps */
@@ -118,6 +121,7 @@ typedef struct GeneratePruningStepsContext
bool has_mutable_arg; /* clauses include any mutable comparison
* values, *other than* exec params */
bool has_exec_param; /* clauses include any PARAM_EXEC params */
+ bool has_vars; /* clauses include any Vars from 'available_rels' */
bool contradictory; /* clauses were proven self-contradictory */
/* Working state: */
int next_step_id;
@@ -143,8 +147,10 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
+ Bitmapset *available_rels,
Bitmapset **matchedsubplans);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
+ Bitmapset *available_rels,
PartClauseTarget target,
GeneratePruningStepsContext *context);
static List *gen_partprune_steps_internal(GeneratePruningStepsContext *context,
@@ -203,6 +209,10 @@ static PartClauseMatchStatus match_boolean_partition_clause(Oid partopfamily,
static void partkey_datum_from_expr(PartitionPruneContext *context,
Expr *expr, int stateidx,
Datum *value, bool *isnull);
+static bool contain_forbidden_var_clause(Node *node,
+ GeneratePruningStepsContext *context);
+static bool contain_forbidden_var_clause_walker(Node *node,
+ GeneratePruningStepsContext *context);
/*
@@ -215,11 +225,14 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
+ * 'available_rels' is the relid set of rels whose Vars may be used for
+ * pruning.
*/
PartitionPruneInfo *
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
- List *prunequal)
+ List *prunequal,
+ Bitmapset *available_rels)
{
PartitionPruneInfo *pruneinfo;
Bitmapset *allmatchedsubplans = NULL;
@@ -312,6 +325,7 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
prunequal,
partrelids,
relid_subplan_map,
+ available_rels,
&matchedsubplans);
/* When pruning is possible, record the matched subplans */
@@ -359,6 +373,184 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
return pruneinfo;
}
+/*
+ * make_join_partition_pruneinfos
+ * Builds one JoinPartitionPruneInfo for each join at which join partition
+ * pruning is possible for this appendrel.
+ *
+ * 'parentrel' is the RelOptInfo for an appendrel, and 'subpaths' is the list
+ * of scan paths for its child rels.
+ */
+Bitmapset *
+make_join_partition_pruneinfos(PlannerInfo *root, RelOptInfo *parentrel,
+ Path *best_path, List *subpaths)
+{
+ Bitmapset *result = NULL;
+ ListCell *lc;
+
+ if (!IS_PARTITIONED_REL(parentrel))
+ return NULL;
+
+ foreach(lc, root->join_partition_prune_candidates)
+ {
+ JoinPartitionPruneCandidateInfo *candidate =
+ (JoinPartitionPruneCandidateInfo *) lfirst(lc);
+ PartitionPruneInfo *part_prune_info;
+ List *prunequal;
+ Relids joinrelids;
+ ListCell *l;
+ double prune_cost;
+
+ if (candidate == NULL)
+ continue;
+
+ /*
+ * Identify all joinclauses that are movable to this appendrel given
+ * these inner-side relids. Only those clauses can be used for join
+ * partition pruning.
+ */
+ joinrelids = bms_union(parentrel->relids, candidate->inner_relids);
+ prunequal = NIL;
+ foreach(l, candidate->joinrestrictinfo)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
+
+ if (join_clause_is_movable_into(rinfo,
+ parentrel->relids,
+ joinrelids))
+ prunequal = lappend(prunequal, rinfo);
+ }
+
+ if (prunequal == NIL)
+ continue;
+
+ /*
+ * Check the overhead of this pruning, and skip it if it does not pay off
+ */
+ prune_cost = compute_partprune_cost(root,
+ parentrel,
+ best_path->total_cost,
+ list_length(subpaths),
+ candidate->inner_relids,
+ candidate->inner_rows,
+ prunequal);
+ if (prune_cost > 0)
+ continue;
+
+ part_prune_info = make_partition_pruneinfo(root, parentrel,
+ subpaths,
+ prunequal,
+ candidate->inner_relids);
+
+ if (part_prune_info)
+ {
+ JoinPartitionPruneInfo *jpinfo;
+
+ jpinfo = makeNode(JoinPartitionPruneInfo);
+
+ jpinfo->part_prune_info = part_prune_info;
+ jpinfo->paramid = assign_special_exec_param(root);
+ jpinfo->nplans = list_length(subpaths);
+
+ candidate->joinpartprune_info_list =
+ lappend(candidate->joinpartprune_info_list, jpinfo);
+
+ result = bms_add_member(result, jpinfo->paramid);
+ }
+ }
+
+ return result;
+}
+
+/*
+ * prepare_join_partition_prune_candidate
+ * Check if join partition pruning is possible at this join and if so
+ * collect information required to build JoinPartitionPruneInfos.
+ *
+ * Note that we may build more than one JoinPartitionPruneInfo at one join, for
+ * different Append/MergeAppend paths.
+ */
+void
+prepare_join_partition_prune_candidate(PlannerInfo *root, JoinPath *jpath)
+{
+ JoinPartitionPruneCandidateInfo *candidate;
+
+ if (!enable_partition_pruning)
+ {
+ root->join_partition_prune_candidates =
+ lappend(root->join_partition_prune_candidates, NULL);
+ return;
+ }
+
+ /*
+ * For now do not perform join partition pruning for parallel hashjoin.
+ */
+ if (jpath->path.parallel_workers > 0)
+ {
+ root->join_partition_prune_candidates =
+ lappend(root->join_partition_prune_candidates, NULL);
+ return;
+ }
+
+ /*
+ * We cannot perform join partition pruning if unmatched outer rows must
+ * be emitted, i.e. if the outer is the non-nullable side.
+ */
+ if (!(jpath->jointype == JOIN_INNER ||
+ jpath->jointype == JOIN_SEMI ||
+ jpath->jointype == JOIN_RIGHT ||
+ jpath->jointype == JOIN_RIGHT_ANTI))
+ {
+ root->join_partition_prune_candidates =
+ lappend(root->join_partition_prune_candidates, NULL);
+ return;
+ }
+
+ /*
+ * For now we only support HashJoin.
+ */
+ if (jpath->path.pathtype != T_HashJoin)
+ {
+ root->join_partition_prune_candidates =
+ lappend(root->join_partition_prune_candidates, NULL);
+ return;
+ }
+
+ candidate = makeNode(JoinPartitionPruneCandidateInfo);
+ candidate->joinrestrictinfo = jpath->joinrestrictinfo;
+ candidate->inner_relids = jpath->innerjoinpath->parent->relids;
+ candidate->inner_rows = jpath->innerjoinpath->parent->rows;
+ candidate->joinpartprune_info_list = NIL;
+
+ root->join_partition_prune_candidates =
+ lappend(root->join_partition_prune_candidates, candidate);
+}
+
+/*
+ * get_join_partition_prune_candidate
+ * Pop out the JoinPartitionPruneCandidateInfo for this join and retrieve
+ * the JoinPartitionPruneInfos.
+ */
+List *
+get_join_partition_prune_candidate(PlannerInfo *root)
+{
+ JoinPartitionPruneCandidateInfo *candidate;
+ List *result;
+
+ candidate = llast(root->join_partition_prune_candidates);
+ root->join_partition_prune_candidates =
+ list_delete_last(root->join_partition_prune_candidates);
+
+ if (candidate == NULL)
+ return NIL;
+
+ result = candidate->joinpartprune_info_list;
+
+ pfree(candidate);
+
+ return result;
+}
+
/*
* add_part_relids
* Add new info to a list of Bitmapsets of partitioned relids.
@@ -427,6 +619,8 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* partrelids: Set of RT indexes identifying relevant partitioned tables
* within a single partitioning hierarchy
* relid_subplan_map[]: maps child relation relids to subplan indexes
+ * available_rels: the relid set of rels whose Vars may be used for
+ * pruning.
* matchedsubplans: on success, receives the set of subplan indexes which
* were matched to this partition hierarchy
*
@@ -439,6 +633,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
+ Bitmapset *available_rels,
Bitmapset **matchedsubplans)
{
RelOptInfo *targetpart = NULL;
@@ -538,8 +733,8 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
*/
- gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
- &context);
+ gen_partprune_steps(subpart, partprunequal, available_rels,
+ PARTTARGET_INITIAL, &context);
if (context.contradictory)
{
@@ -566,14 +761,15 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
initial_pruning_steps = NIL;
/*
- * If no exec Params appear in potentially-usable pruning clauses,
- * then there's no point in even thinking about per-scan pruning.
+ * If no exec Params or available Vars appear in potentially-usable
+ * pruning clauses, then there's no point in even thinking about
+ * per-scan pruning.
*/
- if (context.has_exec_param)
+ if (context.has_exec_param || context.has_vars)
{
/* ... OK, we'd better think about it */
- gen_partprune_steps(subpart, partprunequal, PARTTARGET_EXEC,
- &context);
+ gen_partprune_steps(subpart, partprunequal, available_rels,
+ PARTTARGET_EXEC, &context);
if (context.contradictory)
{
@@ -586,11 +782,14 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
/*
* Detect which exec Params actually got used; the fact that some
* were in available clauses doesn't mean we actually used them.
- * Skip per-scan pruning if there are none.
*/
execparamids = get_partkey_exec_paramids(exec_pruning_steps);
- if (bms_is_empty(execparamids))
+ /*
+ * Skip per-scan pruning if no exec Params were actually used and
+ * there are no available Vars.
+ */
+ if (bms_is_empty(execparamids) && !context.has_vars)
exec_pruning_steps = NIL;
}
else
@@ -702,6 +901,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* Process 'clauses' (typically a rel's baserestrictinfo list of clauses)
* and create a list of "partition pruning steps".
*
+ * 'available_rels' is the relid set of rels whose Vars may be used for
+ * pruning.
+ *
* 'target' tells whether to generate pruning steps for planning (use
* immutable clauses only), or for executor startup (use any allowable
* clause except ones containing PARAM_EXEC Params), or for executor
@@ -711,12 +913,13 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* some subsidiary flags; see the GeneratePruningStepsContext typedef.
*/
static void
-gen_partprune_steps(RelOptInfo *rel, List *clauses, PartClauseTarget target,
- GeneratePruningStepsContext *context)
+gen_partprune_steps(RelOptInfo *rel, List *clauses, Bitmapset *available_rels,
+ PartClauseTarget target, GeneratePruningStepsContext *context)
{
/* Initialize all output values to zero/false/NULL */
memset(context, 0, sizeof(GeneratePruningStepsContext));
context->rel = rel;
+ context->available_rels = available_rels;
context->target = target;
/*
@@ -772,7 +975,7 @@ prune_append_rel_partitions(RelOptInfo *rel)
* If the clauses are found to be contradictory, we can return the empty
* set.
*/
- gen_partprune_steps(rel, clauses, PARTTARGET_PLANNER,
+ gen_partprune_steps(rel, clauses, NULL, PARTTARGET_PLANNER,
&gcontext);
if (gcontext.contradictory)
return NULL;
@@ -2020,9 +2223,10 @@ match_clause_to_partition_key(GeneratePruningStepsContext *context,
return PARTCLAUSE_UNSUPPORTED;
/*
- * We can never prune using an expression that contains Vars.
+ * We can never prune using an expression that contains Vars except
+ * for Vars belonging to context->available_rels.
*/
- if (contain_var_clause((Node *) expr))
+ if (contain_forbidden_var_clause((Node *) expr, context))
return PARTCLAUSE_UNSUPPORTED;
/*
@@ -2218,9 +2422,10 @@ match_clause_to_partition_key(GeneratePruningStepsContext *context,
return PARTCLAUSE_UNSUPPORTED;
/*
- * We can never prune using an expression that contains Vars.
+ * We can never prune using an expression that contains Vars except
+ * for Vars belonging to context->available_rels.
*/
- if (contain_var_clause((Node *) rightop))
+ if (contain_forbidden_var_clause((Node *) rightop, context))
return PARTCLAUSE_UNSUPPORTED;
/*
@@ -3790,3 +3995,54 @@ partkey_datum_from_expr(PartitionPruneContext *context,
*value = ExecEvalExprSwitchContext(exprstate, ectx, isnull);
}
}
+
+/*
+ * contain_forbidden_var_clause
+ * Recursively scan a clause to discover whether it contains any Var nodes
+ * (of the current query level) that do not belong to relations in
+ * context->available_rels.
+ *
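+ * Returns true if any such Var is found. A Var that does belong to
+ * context->available_rels sets context->has_vars, and is still reported
+ * as forbidden unless context->target is PARTTARGET_EXEC.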
+ *
+ * Does not examine subqueries, therefore must only be used after reduction
+ * of sublinks to subplans!
+ */
+static bool
+contain_forbidden_var_clause(Node *node, GeneratePruningStepsContext *context)
+{
+ return contain_forbidden_var_clause_walker(node, context);
+}
+
+static bool
+contain_forbidden_var_clause_walker(Node *node, GeneratePruningStepsContext *context)
+{
+ if (node == NULL)
+ return false;
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ if (var->varlevelsup != 0)
+ return false;
+
+ if (!bms_is_member(var->varno, context->available_rels))
+ return true; /* abort the tree traversal and return true */
+
+ context->has_vars = true;
+
+ if (context->target != PARTTARGET_EXEC)
+ return true; /* abort the tree traversal and return true */
+
+ return false;
+ }
+ if (IsA(node, CurrentOfExpr))
+ return true;
+ if (IsA(node, PlaceHolderVar))
+ {
+ if (((PlaceHolderVar *) node)->phlevelsup == 0)
+ return true; /* abort the tree traversal and return true */
+ /* else fall through to check the contained expr */
+ }
+ return expression_tree_walker(node, contain_forbidden_var_clause_walker,
+ (void *) context);
+}
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index c09bc83b2a..7e46f0baf6 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -121,11 +121,26 @@ typedef struct PartitionPruneState
PartitionPruningData *partprunedata[FLEXIBLE_ARRAY_MEMBER];
} PartitionPruneState;
+/*
+ * JoinPartitionPruneState - State object required for plan nodes to perform
+ * join partition pruning.
+ */
+typedef struct JoinPartitionPruneState
+{
+ PartitionPruneState *part_prune_state;
+ int paramid;
+ int nplans;
+ bool finished;
+ Bitmapset *part_prune_result;
+} JoinPartitionPruneState;
+
extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
PartitionPruneInfo *pruneinfo,
Bitmapset **initially_valid_subplans);
+extern List *ExecInitJoinpartpruneList(PlanState *planstate, List *joinpartprune_info_list);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ PlanState *planstate);
#endif /* EXECPARTITION_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 9259352672..3cfd011a0a 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -2684,6 +2684,9 @@ typedef struct HashState
/* Parallel hash state. */
struct ParallelHashJoinState *parallel_state;
+
+ /* Infos for join partition pruning. */
+ List *joinpartprune_state_list;
} HashState;
/* ----------------
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 534692bee1..da7803448a 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -530,6 +530,9 @@ struct PlannerInfo
/* not-yet-assigned NestLoopParams */
List *curOuterParams;
+ /* a stack of JoinPartitionPruneInfos */
+ List *join_partition_prune_candidates;
+
/*
* These fields are workspace for setrefs.c. Each is an array
* corresponding to glob->subplans. (We could probably teach
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index b4ef6bc44c..964bc85123 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -275,6 +275,9 @@ typedef struct Append
/* Info for run-time subplan pruning; NULL if we're not doing that */
struct PartitionPruneInfo *part_prune_info;
+
+ /* Info for join partition pruning; NULL if we're not doing that */
+ Bitmapset *join_prune_paramids;
} Append;
/* ----------------
@@ -310,6 +313,9 @@ typedef struct MergeAppend
/* Info for run-time subplan pruning; NULL if we're not doing that */
struct PartitionPruneInfo *part_prune_info;
+
+ /* Info for join partition pruning; NULL if we're not doing that */
+ Bitmapset *join_prune_paramids;
} MergeAppend;
/* ----------------
@@ -1206,6 +1212,7 @@ typedef struct Hash
bool skewInherit; /* is outer join rel an inheritance tree? */
/* all other info is in the parent HashJoin node */
Cardinality rows_total; /* estimate total rows if parallel_aware */
+ List *joinpartprune_info_list; /* infos for join partition pruning */
} Hash;
/* ----------------
@@ -1552,6 +1559,35 @@ typedef struct PartitionPruneStepCombine
List *source_stepids;
} PartitionPruneStepCombine;
+/*
+ * JoinPartitionPruneCandidateInfo - Information required to build
+ * JoinPartitionPruneInfos.
+ */
+typedef struct JoinPartitionPruneCandidateInfo
+{
+ pg_node_attr(no_equal, no_query_jumble)
+
+ NodeTag type;
+ List *joinrestrictinfo;
+ Bitmapset *inner_relids;
+ double inner_rows;
+ List *joinpartprune_info_list;
+} JoinPartitionPruneCandidateInfo;
+
+/*
+ * JoinPartitionPruneInfo - Details required to allow the executor to prune
+ * partitions during join.
+ */
+typedef struct JoinPartitionPruneInfo
+{
+ pg_node_attr(no_equal, no_query_jumble)
+
+ NodeTag type;
+ PartitionPruneInfo *part_prune_info;
+ int paramid;
+ int nplans;
+} JoinPartitionPruneInfo;
+
/*
* Plan invalidation info
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index b1c51a4e70..5ccd3c62d4 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -212,5 +212,9 @@ extern PathTarget *set_pathtarget_cost_width(PlannerInfo *root, PathTarget *targ
extern double compute_bitmap_pages(PlannerInfo *root, RelOptInfo *baserel,
Path *bitmapqual, double loop_count,
Cost *cost_p, double *tuples_p);
+extern double compute_partprune_cost(PlannerInfo *root, RelOptInfo *appendrel,
+ Cost append_total_cost, int append_nplans,
+ Relids inner_relids, double inner_rows,
+ List *prunequal);
#endif /* COST_H */
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index bd490d154f..5187485c88 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -19,6 +19,8 @@
struct PlannerInfo; /* avoid including pathnodes.h here */
struct RelOptInfo;
+struct Path;
+struct JoinPath;
/*
@@ -73,7 +75,15 @@ typedef struct PartitionPruneContext
extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
struct RelOptInfo *parentrel,
List *subpaths,
- List *prunequal);
+ List *prunequal,
+ Bitmapset *available_rels);
+extern Bitmapset *make_join_partition_pruneinfos(struct PlannerInfo *root,
+ struct RelOptInfo *parentrel,
+ struct Path *best_path,
+ List *subpaths);
+extern void prepare_join_partition_prune_candidate(struct PlannerInfo *root,
+ struct JoinPath *jpath);
+extern List *get_join_partition_prune_candidate(struct PlannerInfo *root);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 9c20a24982..7f51a3986e 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -3144,6 +3144,92 @@ order by tbl1.col1, tprt.col1;
------+------
(0 rows)
+-- join partition pruning
+-- The 'Memory Usage' from the Hash node can vary between machines. Let's just
+-- replace the number with an 'N'.
+-- We need to run EXPLAIN ANALYZE because we need to see '(never executed)'
+-- notations because that's the only way to verify runtime pruning.
+create function explain_join_partition_pruning(text) returns setof text
+language plpgsql as
+$$
+declare
+ ln text;
+begin
+ for ln in
+ execute format('explain (analyze, verbose, costs off, summary off, timing off) %s',
+ $1)
+ loop
+ ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
+ return next ln;
+ end loop;
+end;
+$$;
+delete from tbl1;
+insert into tbl1 values (501), (505);
+analyze tbl1, tprt;
+set enable_nestloop = off;
+set enable_mergejoin = off;
+set enable_hashjoin = on;
+select explain_join_partition_pruning('
+select * from tprt p1
+ inner join tprt p2 on p1.col1 = p2.col1
+ right join tbl1 t on p1.col1 = t.col1 and p2.col1 = t.col1;');
+ explain_join_partition_pruning
+--------------------------------------------------------------------------------
+ Hash Right Join (actual rows=2 loops=1)
+ Output: p1.col1, p2.col1, t.col1
+ Hash Cond: ((p1.col1 = t.col1) AND (p2.col1 = t.col1))
+ -> Hash Join (actual rows=3 loops=1)
+ Output: p1.col1, p2.col1
+ Hash Cond: (p1.col1 = p2.col1)
+ -> Append (actual rows=3 loops=1)
+ Join Partition Pruning: $0
+ -> Seq Scan on public.tprt_1 p1_1 (never executed)
+ Output: p1_1.col1
+ -> Seq Scan on public.tprt_2 p1_2 (actual rows=3 loops=1)
+ Output: p1_2.col1
+ -> Seq Scan on public.tprt_3 p1_3 (never executed)
+ Output: p1_3.col1
+ -> Seq Scan on public.tprt_4 p1_4 (never executed)
+ Output: p1_4.col1
+ -> Seq Scan on public.tprt_5 p1_5 (never executed)
+ Output: p1_5.col1
+ -> Seq Scan on public.tprt_6 p1_6 (never executed)
+ Output: p1_6.col1
+ -> Hash (actual rows=3 loops=1)
+ Output: p2.col1
+ Buckets: 1024 Batches: 1 Memory Usage: NkB
+ -> Append (actual rows=3 loops=1)
+ Join Partition Pruning: $1
+ -> Seq Scan on public.tprt_1 p2_1 (never executed)
+ Output: p2_1.col1
+ -> Seq Scan on public.tprt_2 p2_2 (actual rows=3 loops=1)
+ Output: p2_2.col1
+ -> Seq Scan on public.tprt_3 p2_3 (never executed)
+ Output: p2_3.col1
+ -> Seq Scan on public.tprt_4 p2_4 (never executed)
+ Output: p2_4.col1
+ -> Seq Scan on public.tprt_5 p2_5 (never executed)
+ Output: p2_5.col1
+ -> Seq Scan on public.tprt_6 p2_6 (never executed)
+ Output: p2_6.col1
+ -> Hash (actual rows=2 loops=1)
+ Output: t.col1
+ Buckets: 1024 Batches: 1 Memory Usage: NkB
+ Partition Prune: $0, $1
+ -> Seq Scan on public.tbl1 t (actual rows=2 loops=1)
+ Output: t.col1
+(43 rows)
+
+select * from tprt p1
+ inner join tprt p2 on p1.col1 = p2.col1
+ right join tbl1 t on p1.col1 = t.col1 and p2.col1 = t.col1;
+ col1 | col1 | col1
+------+------+------
+ 501 | 501 | 501
+ 505 | 505 | 505
+(2 rows)
+
drop table tbl1, tprt;
-- Test with columns defined in varying orders between each level
create table part_abc (a int not null, b int not null, c int not null) partition by list (a);
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index a09b27d820..2d1e96d51e 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -764,6 +764,45 @@ select tbl1.col1, tprt.col1 from tbl1
inner join tprt on tbl1.col1 = tprt.col1
order by tbl1.col1, tprt.col1;
+-- join partition pruning
+
+-- The 'Memory Usage' from the Hash node can vary between machines. Let's just
+-- replace the number with an 'N'.
+-- We need to run EXPLAIN ANALYZE because we need to see '(never executed)'
+-- notations because that's the only way to verify runtime pruning.
+create function explain_join_partition_pruning(text) returns setof text
+language plpgsql as
+$$
+declare
+ ln text;
+begin
+ for ln in
+ execute format('explain (analyze, verbose, costs off, summary off, timing off) %s',
+ $1)
+ loop
+ ln := regexp_replace(ln, 'Memory Usage: \d+', 'Memory Usage: N');
+ return next ln;
+ end loop;
+end;
+$$;
+
+delete from tbl1;
+insert into tbl1 values (501), (505);
+analyze tbl1, tprt;
+
+set enable_nestloop = off;
+set enable_mergejoin = off;
+set enable_hashjoin = on;
+
+select explain_join_partition_pruning('
+select * from tprt p1
+ inner join tprt p2 on p1.col1 = p2.col1
+ right join tbl1 t on p1.col1 = t.col1 and p2.col1 = t.col1;');
+
+select * from tprt p1
+ inner join tprt p2 on p1.col1 = p2.col1
+ right join tbl1 t on p1.col1 = t.col1 and p2.col1 = t.col1;
+
drop table tbl1, tprt;
-- Test with columns defined in varying orders between each level
--
2.31.0
On 19/3/2024 07:12, Richard Guo wrote:
On Tue, Jan 30, 2024 at 10:33 AM Richard Guo <guofenglinux@gmail.com> wrote:
Attached is an updated patch. Nothing else has changed.
Here is another rebase over master so it applies again. Nothing else
has changed.
The patch no longer applies to master.
I wonder why this work has stalled - it looks highly profitable in
the case of foreign partitions. And the idea of cost-based enablement
makes it a must-have, I think.
I have just skimmed through the patch and have a couple of questions:
1. It makes sense to calculate the cost and remember the minimum number
of pruned partitions at which the HJ with probing is still profitable.
Why don't we disable this probing at run time once we see that the
number of partitions that could still be pruned is already too low?
2. Maybe I misunderstood the code, but once a hashed tuple has been
matched to a partition, it makes sense to reduce the number of probing
expressions evaluated for subsequent tuples, because we already know
that this partition will not be pruned.
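The two questions above can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not code from the patch: PruneTracker, tracker_note_tuple(), route_to_partition() and the min_prunable threshold are hypothetical names, and the partitions are assumed to be equal-width integer ranges so that routing is trivial. It shows probing being switched off once too few partitions remain prunable, and an already-surviving partition costing no further bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tracker for illustration; not a structure from the patch. */
typedef struct PruneTracker
{
	int			nparts;			/* total number of partitions (at most 64) */
	int			min_prunable;	/* give up when fewer than this stay prunable */
	uint64_t	surviving;		/* bit i set => partition i must be scanned */
	int			nsurviving;		/* number of bits set in 'surviving' */
	bool		probing;		/* false once probing has been abandoned */
} PruneTracker;

static void
tracker_init(PruneTracker *t, int nparts, int min_prunable)
{
	assert(nparts > 0 && nparts <= 64);
	t->nparts = nparts;
	t->min_prunable = min_prunable;
	t->surviving = 0;
	t->nsurviving = 0;
	t->probing = true;
}

/* Toy routing for equal-width ranges [0,width), [width,2*width), ... */
static int
route_to_partition(int key, int width, int nparts)
{
	if (key < 0 || key >= width * nparts)
		return -1;				/* no partition accepts this key */
	return key / width;
}

/*
 * Called once per inner tuple while the hash table is being built.
 * Returns true if the tuple was actually probed against the partitions.
 */
static bool
tracker_note_tuple(PruneTracker *t, int key, int width)
{
	int			part;

	if (!t->probing)
		return false;			/* question 1: probing already disabled */

	part = route_to_partition(key, width, t->nparts);
	if (part >= 0 && (t->surviving & (UINT64_C(1) << part)) == 0)
	{
		/* question 2: only a newly matched partition needs bookkeeping */
		t->surviving |= UINT64_C(1) << part;
		t->nsurviving++;
		if (t->nparts - t->nsurviving < t->min_prunable)
			t->probing = false;
	}
	return true;
}

/* If probing was abandoned the result is incomplete, so scan everything. */
static bool
tracker_must_scan(const PruneTracker *t, int part)
{
	assert(part >= 0 && part < t->nparts);
	return !t->probing || (t->surviving & (UINT64_C(1) << part)) != 0;
}
```

Note the design consequence: once probing is abandoned, the surviving set no longer reflects all inner tuples, so the Append has to fall back to scanning every child.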
--
regards, Andrei Lepikhov
On Tue, 22 Aug 2023 at 21:51, Richard Guo <guofenglinux@gmail.com> wrote:
Sometimes we may just not generate parameterized nestloop as final plan,
such as when there are no indexes and no lateral references in the
Append/MergeAppend node. In this case I think it would be great if we
can still do some partition pruning.
(I just read through this thread again to remind myself of where it's at.)
Here are my current thoughts: You've done some costing work which will
only prefer the part-prune hash join path in very conservative cases.
This is to reduce the risk of performance regressions caused by
running the pruning code too often in cases where it's less likely to
be able to prune any partitions.
Now, I'm not saying we shouldn't ever do this pruning hash join stuff,
but what I think might be better to do as a first step is to have
partitioned tables create a parameterized path on their partition key,
and a prefix thereof for RANGE partitioned tables. This would allow
parameterized nested loop joins when no index exists on the partition
key.
Right now you can get a plan that does this if you do:
create table p (col int);
create table pt (partkey int) partition by list(partkey);
create table pt1 partition of pt for values in(1);
create table pt2 partition of pt for values in(2);
insert into p values(1);
insert into pt values(1);
explain (analyze, costs off, timing off, summary off)
SELECT * FROM p, LATERAL (SELECT * FROM pt WHERE p.col = pt.partkey OFFSET 0);
QUERY PLAN
----------------------------------------------------------
Nested Loop (actual rows=0 loops=1)
-> Seq Scan on p (actual rows=1 loops=1)
-> Append (actual rows=0 loops=1)
-> Seq Scan on pt1 pt_1 (actual rows=0 loops=1)
Filter: (p.col = partkey)
-> Seq Scan on pt2 pt_2 (never executed)
Filter: (p.col = partkey)
You get the parameterized nested loop. Great! But, as soon as you drop
the OFFSET 0, the lateral join will be converted to an inner join and
Nested Loop won't look so great when it's not parameterized.
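As a rough illustration of why pt2 shows "(never executed)" in the parameterized plan, here is a toy model in C. It is a sketch under assumptions, not executor code: part_list_value and count_partition_scans() are hypothetical names, and the two LIST partitions are reduced to the single value each one accepts. Pruning is redone for every outer value, and only the matching child is scanned.

```c
#include <assert.h>
#include <stddef.h>

#define NPARTS 2

/* pt1 accepts partkey = 1 and pt2 accepts partkey = 2, as in the example. */
static const int part_list_value[NPARTS] = {1, 2};

/*
 * For each outer row, redo the pruning with that row's value as the
 * parameter and count how often each partition would be scanned.
 */
static void
count_partition_scans(const int *outer_vals, size_t nouter, int scans[NPARTS])
{
	assert(outer_vals != NULL || nouter == 0);

	for (int i = 0; i < NPARTS; i++)
		scans[i] = 0;

	for (size_t r = 0; r < nouter; r++)
	{
		for (int i = 0; i < NPARTS; i++)
		{
			/* only the partition whose list value matches survives */
			if (part_list_value[i] == outer_vals[r])
				scans[i]++;
		}
	}
}
```

With a single outer row holding the value 1, the first partition is scanned once and the second never, which mirrors the loops=1 and "(never executed)" annotations in the plan.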
explain (analyze, costs off, timing off, summary off)
SELECT * FROM p, LATERAL (SELECT * FROM pt WHERE p.col = pt.partkey);
QUERY PLAN
----------------------------------------------------------
Hash Join (actual rows=1 loops=1)
Hash Cond: (pt.partkey = p.col)
-> Append (actual rows=1 loops=1)
-> Seq Scan on pt1 pt_1 (actual rows=1 loops=1)
-> Seq Scan on pt2 pt_2 (actual rows=0 loops=1)
-> Hash (actual rows=1 loops=1)
Buckets: 4096 Batches: 2 Memory Usage: 32kB
-> Seq Scan on p (actual rows=1 loops=1)
Maybe instead of inventing a very pessimistic part prune Hash Join, it
might be better to make the above work without the LATERAL + OFFSET 0
by creating parameterized Seq Scan paths. That's going to be
an immense help when the non-partitioned relation just has a small
number of rows, which I think your costing favoured anyway.
What do you think?
David
On Fri, Sep 6, 2024 at 9:22 AM David Rowley <dgrowleyml@gmail.com> wrote:
Maybe instead of inventing a very pessimistic part prune Hash Join, it
might be better to make the above work without the LATERAL + OFFSET 0
by creating parameterized Seq Scan paths. That's going to be
an immense help when the non-partitioned relation just has a small
number of rows, which I think your costing favoured anyway.
What do you think?
This approach seems promising. It reminds me of the discussion about
pushing join clauses into a seqscan [1]. But I think there are two
problems that we need to address to make it work.
* Currently, the costing code does not take run-time pruning into
consideration. How should we calculate the costs of the parameterized
paths on partitioned tables?
* This approach generates additional paths at the scan level, which
may not be easily compared with regular scan paths. As a result, we
might need to retain these paths at every level of the join tree. I'm
afraid this could lead to a significant increase in planning time in
some cases. We need to find a way to avoid regressions in planning
time.
[1]: /messages/by-id/3478841.1724878067@sss.pgh.pa.us
Thanks
Richard
On Fri, 6 Sept 2024 at 19:19, Richard Guo <guofenglinux@gmail.com> wrote:
* Currently, the costing code does not take run-time pruning into
consideration. How should we calculate the costs of the parameterized
paths on partitioned tables?
Couldn't we assume total_cost = total_cost / n_append_children for
equality conditions and do something with DEFAULT_INEQ_SEL and
DEFAULT_RANGE_INEQ_SEL for more complex cases? I understand we
probably need to do something about this for the planner to have any
chance of actually choosing these Paths, so hacking something in there
to test the idea before going to the trouble of refining the cost
model seems like a good idea.
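One possible reading of that heuristic, written as a standalone C sketch. The function estimate_pruned_append_cost() and the PruneQualKind enum are hypothetical, and clamping the fraction so that at least one child is always assumed to be scanned is an added assumption, not something stated above. The two constants repeat the values of the same-named macros in PostgreSQL's utils/selfuncs.h so that the sketch compiles on its own.

```c
#include <assert.h>

/* Same values as the macros in PostgreSQL's utils/selfuncs.h. */
#define DEFAULT_INEQ_SEL		0.3333333333333333
#define DEFAULT_RANGE_INEQ_SEL	0.005

/* Hypothetical classification of the pruning qual. */
typedef enum PruneQualKind
{
	PRUNE_EQ,					/* partkey = $1 */
	PRUNE_INEQ,					/* partkey < $1 and similar */
	PRUNE_RANGE_INEQ			/* partkey > $1 AND partkey < $2 */
} PruneQualKind;

/*
 * Scale the unpruned Append cost by the fraction of children expected
 * to survive run-time pruning.
 */
static double
estimate_pruned_append_cost(double total_cost, int n_append_children,
							PruneQualKind kind)
{
	double		frac = 1.0;
	double		one_child;

	assert(n_append_children > 0);
	one_child = 1.0 / n_append_children;

	switch (kind)
	{
		case PRUNE_EQ:
			frac = one_child;	/* total_cost / n_append_children */
			break;
		case PRUNE_INEQ:
			frac = DEFAULT_INEQ_SEL;
			break;
		case PRUNE_RANGE_INEQ:
			frac = DEFAULT_RANGE_INEQ_SEL;
			break;
	}

	/* Assumption: never estimate less than one surviving child. */
	if (frac < one_child)
		frac = one_child;

	return total_cost * frac;
}
```

For example, an Append with four children and a total cost of 100 would be costed at 25 for an equality condition on the partition key.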
* This approach generates additional paths at the scan level, which
may not be easily compared with regular scan paths. As a result, we
might need to retain these paths at every level of the join tree. I'm
afraid this could lead to a significant increase in planning time in
some cases. We need to find a way to avoid regressions in planning
time.
How about just creating these Paths for partitioned tables (and
partitions) when there's an EquivalenceClass containing multiple
relids on the partition key? I think those are about the only cases
that could benefit, so I think it makes sense to restrict making the
additional Paths for that case.
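The restriction could be checked along these lines. This is a simplified standalone model, not planner code: ToyEC, ToyECMember and ec_allows_partkey_param_path() are hypothetical stand-ins for the real EquivalenceClass machinery, and each member is reduced to a plain (relid, attno) column reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for EquivalenceClass members; hypothetical types. */
typedef struct ToyECMember
{
	int			relid;			/* relation the column belongs to */
	int			attno;			/* column number within that relation */
} ToyECMember;

typedef struct ToyEC
{
	const ToyECMember *members;
	size_t		nmembers;
} ToyEC;

/*
 * True if the class equates the partition key of 'part_relid' with a
 * column of some other relation, i.e. the only case in which a path
 * parameterized on the partition key could enable run-time pruning.
 */
static bool
ec_allows_partkey_param_path(const ToyEC *ec, int part_relid,
							 int partkey_attno)
{
	bool		has_partkey = false;
	bool		has_other_rel = false;

	assert(ec != NULL);

	for (size_t i = 0; i < ec->nmembers; i++)
	{
		const ToyECMember *m = &ec->members[i];

		if (m->relid == part_relid)
		{
			if (m->attno == partkey_attno)
				has_partkey = true;
		}
		else
			has_other_rel = true;
	}

	return has_partkey && has_other_rel;
}
```

For the earlier p.col = pt.partkey example, the class contains one column from each relation, so the check passes for pt; a class that mentions only pt, or only a non-key column of pt, would not trigger the extra Paths.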
David